Black-box models may have very different structures. This function creates a unified representation of a model, which can be further processed by various explainers.

explain.default(model, data = NULL, y = NULL,
  predict_function = NULL, residual_function = NULL, weights = NULL,
  ..., label = NULL, verbose = TRUE, precalculate = TRUE,
  colorize = TRUE, model_info = NULL)

explain(model, data = NULL, y = NULL, predict_function = NULL,
  residual_function = NULL, weights = NULL, ..., label = NULL,
  verbose = TRUE, precalculate = TRUE, colorize = TRUE,
  model_info = NULL)

Arguments

model

object - a model to be explained

data

data.frame or matrix - data that was used for fitting. If not provided, it will be extracted from the model. Data should be passed without the target column (the target shall be provided as the y argument). NOTE: If the target variable is present in the data, some functionalities may not work properly.

y

numeric vector with outputs / scores. If provided, it shall have one value per row of data

predict_function

function that takes two arguments, the model and new data, and returns a numeric vector with predictions

residual_function

function that takes three arguments: model, data and response vector y. It should return a numeric vector with model residuals for given data. If not provided, response residuals (\(y-\hat{y}\)) are calculated.

weights

numeric vector with sampling weights. By default it's NULL. If provided, it shall have one weight per row of data

...

other parameters

label

character - the name of the model. By default it's extracted from the 'class' attribute of the model

verbose

if TRUE (default) then diagnostic messages will be printed

precalculate

if TRUE (default) then predicted_values and residuals are calculated when the explainer is created. This also happens if verbose is TRUE. Set both verbose and precalculate to FALSE to omit the calculations.

colorize

if TRUE (default) then WARNINGS, ERRORS and NOTES are colorized. Will work only in the R console.

model_info

a named list (package, version, type) containing information about the model. If NULL, DALEX will look for this information on its own.
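
The contracts for predict_function and residual_function described above can be sketched as follows. This is a minimal illustration, not the package's own implementation: it uses the built-in mtcars data, and the names predict_prob, abs_residuals and glm_explainer are hypothetical. The `...` in the residual function is an assumption made so that the function tolerates extra arguments that some DALEX versions might pass.

```r
# Base R only; mtcars ships with R and stands in for the user's data.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# predict_function contract: (model, new data) -> numeric vector.
# Here it returns probabilities instead of the default link-scale values.
predict_prob <- function(model, newdata) {
  as.numeric(predict(model, newdata = newdata, type = "response"))
}

# residual_function contract: (model, data, y) -> numeric vector.
# `...` is an assumption: it absorbs any additional arguments.
abs_residuals <- function(model, data, y, ...) {
  abs(y - predict_prob(model, data))
}

X <- mtcars[, c("wt", "hp")]  # predictors only, no target column
y <- mtcars$am                # target passed separately

p <- predict_prob(fit, X)
r <- abs_residuals(fit, X, y)

# Passing both functions to explain(); skipped when DALEX is not installed.
if (requireNamespace("DALEX", quietly = TRUE)) {
  glm_explainer <- DALEX::explain(fit, data = X, y = y,
                                  predict_function = predict_prob,
                                  residual_function = abs_residuals,
                                  label = "glm_custom", verbose = FALSE)
}
```

Both functions return exactly one number per row of the data, which is what the explainer expects.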

Value

An object of the class explainer.

It's a list with the following fields:

  • model the explained model.

  • data the dataset used for training.

  • y response for observations from data.

  • weights sample weights for data. NULL if weights are not specified.

  • y_hat calculated predictions.

  • residuals calculated residuals.

  • predict_function function that may be used for model predictions, shall return a single numerical value for each observation.

  • residual_function function that returns residuals, shall return a single numerical value for each observation.

  • class class/classes of a model.

  • label label of explainer.

  • model_info named list containing basic information about the model, such as the package, the package version and the type.
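
Because the explainer is a list, the fields above can be read with `$`. The sketch below assumes DALEX is installed (the call is skipped otherwise) and uses the built-in mtcars data; the names fit, ex and has_dalex are hypothetical.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- mtcars[, c("wt", "hp")]  # predictors only
y <- mtcars$mpg               # target

has_dalex <- requireNamespace("DALEX", quietly = TRUE)
if (has_dalex) {
  ex <- DALEX::explain(fit, data = X, y = y,
                       label = "lm_mtcars", verbose = FALSE)

  print(class(ex))            # class of the returned object
  print(ex$label)             # label given above
  print(head(ex$y_hat))       # precalculated predictions
  print(head(ex$residuals))   # precalculated residuals (y - y_hat by default)
}
```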

Details

Please NOTE that the model is the only required argument. Some explainers, however, may require other arguments to be provided as well.
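
When data and y are supplied, the argument descriptions ask for the target to be passed separately from the predictors. One way to do that split is sketched below with the built-in mtcars data; the names target, X, y and ex are hypothetical, and the explain() call is skipped when DALEX is not installed.

```r
target <- "mpg"

# Predictors: every column except the target.
X <- mtcars[, setdiff(colnames(mtcars), target)]

# Target: a plain numeric vector with one value per row of X.
y <- mtcars[[target]]

fit <- lm(mpg ~ ., data = mtcars)

if (requireNamespace("DALEX", quietly = TRUE)) {
  ex <- DALEX::explain(fit, data = X, y = y,
                       label = "lm_full", verbose = FALSE)
}
```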

Examples

# simple explainer for regression problem
aps_lm_model4 <- lm(m2.price ~., data = apartments)
aps_lm_explainer4 <- explain(aps_lm_model4, data = apartments, label = "model_4v")
#> Preparation of a new explainer is initiated
#>   -> model label       :  model_4v
#>   -> data              :  1000 rows 6 cols
#>   -> target variable   :  not specified! ( WARNING )
#>   -> predict function  :  yhat.lm will be used ( default )
#>   -> predicted values  :  numerical, min = 1781.848 , mean = 3487.019 , max = 6176.032
#>   -> residual function :  difference between y and yhat ( default )
#>   -> model_info        :  package stats , ver. 3.6.1 , task regression ( default )
#> A new explainer has been created!
aps_lm_explainer4
#> Model label:  model_4v
#> Model class:  lm
#> Data head  :
#>   m2.price construction.year surface floor no.rooms    district
#> 1     5897              1953      25     3        1 Srodmiescie
#> 2     1818              1992     143     9        5     Bielany
# various parameters for the explain function
# all defaults
aps_lm <- explain(aps_lm_model4)
#> Preparation of a new explainer is initiated
#>   -> model label       :  lm ( default )
#>   -> data              :  1000 rows 6 cols (extracted from the model)
#>   -> target variable   :  not specified! ( WARNING )
#>   -> predict function  :  yhat.lm will be used ( default )
#>   -> predicted values  :  numerical, min = 1781.848 , mean = 3487.019 , max = 6176.032
#>   -> residual function :  difference between y and yhat ( default )
#>   -> model_info        :  package stats , ver. 3.6.1 , task regression ( default )
#> A new explainer has been created!
# silent execution
aps_lm <- explain(aps_lm_model4, verbose = FALSE)
# user provided predict_function
aps_lm <- explain(aps_lm_model4, data = apartments, label = "model_4v", predict_function = predict)
#> Preparation of a new explainer is initiated
#>   -> model label       :  model_4v
#>   -> data              :  1000 rows 6 cols
#>   -> target variable   :  not specified! ( WARNING )
#>   -> predict function  :  predict
#>   -> predicted values  :  numerical, min = 1781.848 , mean = 3487.019 , max = 6176.032
#>   -> residual function :  difference between y and yhat ( default )
#>   -> model_info        :  package stats , ver. 3.6.1 , task regression ( default )
#> A new explainer has been created!
# set target variable
aps_lm <- explain(aps_lm_model4, data = apartments, label = "model_4v", y = apartments$m2.price)
#> Preparation of a new explainer is initiated
#>   -> model label       :  model_4v
#>   -> data              :  1000 rows 6 cols
#>   -> target variable   :  1000 values
#>   -> data              :  A column identical to the target variable `y` has been found in the `data`. ( WARNING )
#>   -> data              :  It is highly recommended to pass `data` without the target variable column
#>   -> predict function  :  yhat.lm will be used ( default )
#>   -> predicted values  :  numerical, min = 1781.848 , mean = 3487.019 , max = 6176.032
#>   -> residual function :  difference between y and yhat ( default )
#>   -> residuals         :  numerical, min = -247.4728 , mean = 2.093656e-14 , max = 469.0023
#>   -> model_info        :  package stats , ver. 3.6.1 , task regression ( default )
#> A new explainer has been created!
aps_lm <- explain(aps_lm_model4, data = apartments, label = "model_4v", y = apartments$m2.price, predict_function = predict)
#> Preparation of a new explainer is initiated
#>   -> model label       :  model_4v
#>   -> data              :  1000 rows 6 cols
#>   -> target variable   :  1000 values
#>   -> data              :  A column identical to the target variable `y` has been found in the `data`. ( WARNING )
#>   -> data              :  It is highly recommended to pass `data` without the target variable column
#>   -> predict function  :  predict
#>   -> predicted values  :  numerical, min = 1781.848 , mean = 3487.019 , max = 6176.032
#>   -> residual function :  difference between y and yhat ( default )
#>   -> residuals         :  numerical, min = -247.4728 , mean = 2.093656e-14 , max = 469.0023
#>   -> model_info        :  package stats , ver. 3.6.1 , task regression ( default )
#> A new explainer has been created!
# set model_info
model_info <- list(package = "stats", ver = "3.6.1", type = "regression")
aps_lm_model4 <- lm(m2.price ~., data = apartments)
aps_lm_explainer4 <- explain(aps_lm_model4, data = apartments, label = "model_4v", model_info = model_info)
#> Preparation of a new explainer is initiated
#>   -> model label       :  model_4v
#>   -> data              :  1000 rows 6 cols
#>   -> target variable   :  not specified! ( WARNING )
#>   -> predict function  :  yhat.lm will be used ( default )
#>   -> predicted values  :  numerical, min = 1781.848 , mean = 3487.019 , max = 6176.032
#>   -> residual function :  difference between y and yhat ( default )
#>   -> model_info        :  package stats , ver. 3.6.1 , task regression
#> A new explainer has been created!
# \dontrun{
aps_lm_explainer4 <- explain(aps_lm_model4, data = apartments, label = "model_4v", weights = as.numeric(apartments$construction.year > 2000))
#> Preparation of a new explainer is initiated
#>   -> model label       :  model_4v
#>   -> data              :  1000 rows 6 cols
#>   -> target variable   :  not specified! ( WARNING )
#>   -> sampling weights  :  1000 values ( note that not all explanations handle weights )
#>   -> predict function  :  yhat.lm will be used ( default )
#>   -> predicted values  :  numerical, min = 1781.848 , mean = 3487.019 , max = 6176.032
#>   -> residual function :  difference between y and yhat ( default )
#>   -> model_info        :  package stats , ver. 3.6.1 , task regression ( default )
#> A new explainer has been created!
# more complex model
library("randomForest")
#> randomForest 4.6-14
#> Type rfNews() to see new features/changes/bug fixes.
aps_rf_model4 <- randomForest(m2.price ~., data = apartments)
aps_rf_explainer4 <- explain(aps_rf_model4, data = apartments, label = "model_rf")
#> Preparation of a new explainer is initiated
#>   -> model label       :  model_rf
#>   -> data              :  1000 rows 6 cols
#>   -> target variable   :  not specified! ( WARNING )
#>   -> predict function  :  yhat.randomForest will be used ( default )
#>   -> predicted values  :  numerical, min = 1969.945 , mean = 3487.686 , max = 5796.451
#>   -> residual function :  difference between y and yhat ( default )
#>   -> model_info        :  package randomForest , ver. 4.6.14 , task regression ( default )
#> A new explainer has been created!
aps_rf_explainer4
#> Model label:  model_rf
#> Model class:  randomForest.formula,randomForest
#> Data head  :
#>   m2.price construction.year surface floor no.rooms    district
#> 1     5897              1953      25     3        1 Srodmiescie
#> 2     1818              1992     143     9        5     Bielany
# }