This is the main function of the corrgrapher package. It performs the necessary calculations and creates a corrgrapher object, which you can pass to plot, include in a knitr report, or use to generate a simple HTML page.

corrgrapher(x, ...)

# S3 method for explainer
corrgrapher(
  x,
  cutoff = 0.2,
  values = NULL,
  cor_functions = list(),
  ...,
  feature_importance = NULL,
  partial_dependence = NULL
)

# S3 method for matrix
corrgrapher(x, cutoff = 0.2, values = NULL, cor_functions = list(), ...)

# S3 method for default
corrgrapher(x, cutoff = 0.2, values = NULL, cor_functions = list(), ...)

Arguments

x

an object used to select the method. It must satisfy the following conditions:

  • if a data.frame (handled by the default method), numeric columns must contain numerical variables and factor columns must contain categorical variables. Columns of other types are ignored.

  • if an explainer, the functions feature_importance and partial_dependence must run on it without returning an error. See also the arguments feature_importance and partial_dependence.

  • if a matrix, it is converted with as.data.frame.
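The conditions above can usually be met with a little base R preprocessing. The sketch below is a minimal illustration built around a hypothetical helper, prepare_for_corrgrapher(), which is not part of the package. It converts a matrix with as.data.frame and turns character columns into factors so that they count as categorical variables. Columns that are neither numeric nor factor are left alone, since corrgrapher ignores them anyway.

```r
# Hypothetical helper (not part of corrgrapher): prepares an input so that
# numeric columns hold numerical variables and factor columns hold
# categorical variables.
prepare_for_corrgrapher <- function(x) {
  # A matrix would be converted with as.data.frame anyway; do it explicitly
  if (is.matrix(x)) {
    x <- as.data.frame(x)
  }
  # Treat character columns as categorical by converting them to factors
  is_chr <- vapply(x, is.character, logical(1))
  for (col in names(x)[is_chr]) {
    x[[col]] <- factor(x[[col]])
  }
  x
}

df <- data.frame(
  speed = c(10, 20, 30, 40),
  colour = c("red", "blue", "red", "green"),
  stringsAsFactors = FALSE
)
prepared <- prepare_for_corrgrapher(df)
str(prepared)
```

An unnamed matrix gets the default column names V1, V2, ... from as.data.frame. Those are the names any values labels must then match.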

...

other arguments.

cutoff

a number. Correlations below this value are treated as no correlation, and the corresponding edges are not included in the graph.
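The effect of cutoff can be pictured with base R alone. The sketch below is an illustration under stated assumptions, not the package's implementation. It uses Pearson correlation from cor() on numeric columns only, and it compares the absolute value with the cutoff because the Details section treats positive and negative correlations alike. It returns an edge list with from and to columns, in the style visNetwork expects. The helper name correlation_edges() is hypothetical.

```r
# Hypothetical helper: keep only variable pairs whose absolute Pearson
# correlation reaches the cutoff; weaker pairs get no edge.
correlation_edges <- function(df, cutoff = 0.2) {
  num <- df[vapply(df, is.numeric, logical(1))]
  r <- cor(num)
  # Each unordered pair once: positions above the diagonal
  idx <- which(upper.tri(r), arr.ind = TRUE)
  edges <- data.frame(
    from = colnames(r)[idx[, "row"]],
    to = colnames(r)[idx[, "col"]],
    value = r[idx],
    stringsAsFactors = FALSE
  )
  # Assumption: the sign is ignored when applying the cutoff
  edges[abs(edges$value) >= cutoff, , drop = FALSE]
}

edges <- correlation_edges(mtcars[, c("mpg", "wt", "qsec")], cutoff = 0.2)
print(edges)
```

With these three mtcars columns, the weak wt/qsec pair falls below 0.2 and gets no edge. Raising cutoff removes further edges and leaves a sparser graph.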

values

a data.frame with information about the size of the nodes, containing the columns value and label (with labels consistent with colnames of x). By default, all nodes have equal size or, for an explainer, a size given by variable importance.
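A values data.frame can be built directly from the column names of x. In the sketch below, the node sizes (the number of distinct values in each column) are an arbitrary choice made only for illustration. Any numeric measure of node importance could take their place.

```r
df <- as.data.frame(datasets::Seatbelts)
df$law <- factor(df$law)

# One row per variable; label must match colnames(df)
vals <- data.frame(
  label = colnames(df),
  # Illustrative node size: number of distinct values in each column
  value = unname(vapply(df, function(col) length(unique(col)), numeric(1))),
  stringsAsFactors = FALSE
)
print(vals)
```

The result could then be supplied as corrgrapher(df, values = vals).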

cor_functions

a named list of functions to pass to calculate_cors. It must contain the necessary functions among num_num_f, num_cat_f and cat_cat_f, and must also contain max_cor.

feature_importance

Either:

partial_dependence

a named list with 2 elements: numerical and categorical. Both of them should be either:

If only one kind of variable (numerical or categorical) is present, use a list with one element.

Value

A corrgrapher object: essentially a list consisting of the following fields:

  • nodes - a data.frame to pass as the nodes argument to the visNetwork function

  • edges - a data.frame to pass as the edges argument to the visNetwork function

  • pds (only if x was of class explainer) - a list with 2 elements: numerical and categorical. Each of them contains an object of class aggregated_profiles_explainer, used to create partial dependence plots.

  • data - data used to create the object.

Details

Data analysis (and building ML models) involves many stages. During early exploration, it is useful to understand not only the individual series (i.e. variables) available, but also the relations between them. Unfortunately, understanding the correlations between many variables at once is difficult. The corrgrapher package addresses this by plotting the correlations between variables as a graph. Each node represents a single variable. Variables that are strongly correlated with each other (positively or negatively alike) are placed close together, while weakly correlated variables are placed far apart.
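The idea that strongly correlated variables sit close together can be made concrete by turning correlation strength into a distance. The sketch below uses one simple choice, 1 - |r|, on a Pearson correlation matrix. This transformation is an assumption made for illustration, and the layout corrgrapher actually draws may be computed differently.

```r
# Pearson correlations between a few numeric mtcars variables
r <- cor(mtcars[, c("mpg", "wt", "hp", "qsec")])

# Assumed distance: 0 for perfect (positive or negative) correlation,
# 1 for no correlation, so the sign does not matter
d <- 1 - abs(r)

# Variable closest to mpg (excluding mpg itself)
others <- setdiff(colnames(d), "mpg")
closest <- others[which.min(d["mpg", others])]
print(closest)
```

Because the absolute value is taken, the strongly negative mpg/wt correlation makes wt the nearest neighbour of mpg. This matches the rule that positive and negative correlations are treated alike.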

See also

Examples

library(corrgrapher)

df <- as.data.frame(datasets::Seatbelts)
# convert the category variable (law) to a factor
df$law <- factor(df$law)
cgr <- corrgrapher(df)