When exploring data or models we often examine variables one by one. This analysis is incomplete if the relationship between these variables is not taken into account. The corrgrapher package facilitates simultaneous exploration of the Partial Dependence Profiles and the correlation between variables in the model.
This package plots correlations between variables in the form of a graph. Each node is associated with a single variable. Strongly correlated variables (positively or negatively) are placed close together, and weakly correlated ones far apart.
This is achieved through a physical simulation in which the nodes are treated as points with mass that push each other away, and the edges are treated as massless springs. The length of a spring depends on the absolute value of the correlation between the connected nodes: the larger the absolute correlation, the shorter the spring.
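The idea behind the simulation can be illustrated with a toy force-directed layout in base R. This is only a sketch under stated assumptions: the rest-length mapping `1 - |r|`, the repulsion strength, the step size, and the number of iterations are all invented for illustration and are not taken from corrgrapher itself.

```r
# Toy force-directed layout: nodes repel each other, edges act as springs.
# ASSUMPTION: rest length = 1 - |correlation|; all constants are illustrative.
df <- as.data.frame(datasets::Seatbelts)
abs_cor <- abs(cor(df))      # absolute correlations, values in [0, 1]
rest_len <- 1 - abs_cor      # stronger correlation gives a shorter spring

set.seed(1)
n <- ncol(df)
pos <- matrix(rnorm(2 * n), ncol = 2,
              dimnames = list(colnames(df), c("x", "y")))

for (step in seq_len(500)) {
  force <- matrix(0, n, 2)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      delta <- pos[i, ] - pos[j, ]               # vector pointing from j to i
      d <- max(sqrt(sum(delta^2)), 0.05)         # clamp to avoid huge forces
      direction <- delta / d
      repulsion <- 0.01 / d^2                    # pushes i away from j
      spring <- -(d - rest_len[i, j])            # pulls i toward the rest length
      force[i, ] <- force[i, ] + (repulsion + spring) * direction
    }
  }
  pos <- pos + 0.01 * force                      # small damped step for all nodes
}

# Pairwise distances in the final layout
layout_dist <- as.matrix(dist(pos))
```

After the loop, `layout_dist` can be compared with `abs_cor` to check that strongly correlated variables ended up closer together than weakly correlated ones.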
When you click on a node of the graph, you can view the distribution or the Partial Dependence Plot for the selected variable.
The easiest way to get corrgrapher is to install it from CRAN with install.packages("corrgrapher"). The development version is available from GitHub.
To get started, load the package with library(corrgrapher).
For data frames, corrgrapher shows a correlation network and histograms or distributions of the features.
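```r
library(corrgrapher)

# Convert the Seatbelts time series to a data frame
df <- as.data.frame(datasets::Seatbelts)
# Build the correlation graph and print it to display the widget
cgr <- corrgrapher(df)
cgr
```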
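To see the numbers that such a network summarizes, the pairwise correlations can be listed as an edge table with base R alone. This is a minimal sketch; the `edges` data frame and its column names are hypothetical and do not reflect corrgrapher's internal representation.

```r
# List every pair of variables with its correlation, strongest first.
df <- as.data.frame(datasets::Seatbelts)
cors <- cor(df)

# Each cell above the diagonal corresponds to one undirected edge
idx <- which(upper.tri(cors), arr.ind = TRUE)
edges <- data.frame(
  from = rownames(cors)[idx[, "row"]],
  to = colnames(cors)[idx[, "col"]],
  correlation = cors[idx]
)

# Sort by absolute correlation so the strongest relationships come first
edges <- edges[order(-abs(edges$correlation)), ]
head(edges, 5)
```

Pairs at the top of this table are the ones expected to sit closest together in the graph.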
For models, corrgrapher shows partial dependencies. Use the DALEX::explain() function to create an adapter for any predictive model.
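```r
library(DALEX)
library(ranger)

# Fit a random forest classifier on the Titanic data shipped with DALEX
titanic_rgr <- ranger(survived ~ ., data = titanic_imputed, classification = TRUE)
# Wrap the model in a DALEX explainer
titanic_exp <- explain(titanic_rgr,
                       data = titanic_imputed,
                       y = titanic_imputed$survived,
                       verbose = FALSE)
# Build the correlation graph from the explainer
cgr <- corrgrapher(titanic_exp)
cgr
```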
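To make the notion of a partial dependence concrete, the profile for one variable can be computed by hand in base R. The sketch below uses a linear model on `mtcars` as a stand-in, and the helper `partial_dependence()` is hypothetical; it illustrates the general technique, not how DALEX or corrgrapher compute profiles internally.

```r
# Stand-in model (assumption for illustration): linear regression on mtcars
model <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: partial dependence of the prediction on one variable
partial_dependence <- function(model, data, variable, grid) {
  sapply(grid, function(value) {
    modified <- data
    modified[[variable]] <- value             # fix the variable at one grid value
    mean(predict(model, newdata = modified))  # average prediction over all rows
  })
}

# Evaluate the profile on an evenly spaced grid over the observed range of wt
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)
pd_wt <- partial_dependence(model, mtcars, "wt", wt_grid)
data.frame(wt = wt_grid, mean_prediction = pd_wt)
```

For a purely additive linear model like this one, the profile is a straight line whose slope equals the fitted coefficient of `wt`; for a model such as a random forest it can take an arbitrary shape.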