LIME-like explanations based on Ceteris Paribus curves: individual_surrogate_model

This function fits a LIME-type explanation of a single prediction. First, interpretable binary features that describe the local impact of each feature on the prediction are created from Ceteris Paribus profiles. Then, a new dataset of similar observations is sampled, black-box model predictions (scores, in the case of classification) are calculated for this dataset, and a LASSO regression model is fitted to these predictions. Because LASSO shrinks weak effects to zero, the explanation is simplified to include only the most important features. More details about the methodology can be found in the package vignettes.
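The steps above can be illustrated with a small, self-contained base-R sketch. It is not the localModel implementation: the toy black box `f`, the rule that splits each Ceteris Paribus profile at its largest jump, uniform sampling on [0, 1], and the hand-written coordinate-descent LASSO (used here instead of a package such as glmnet) are all assumptions made for illustration. No kernel is applied, so every simulated observation gets equal weight, as with the default `identity_kernel`.

```r
# Illustrative sketch only, NOT localModel's code. Hypothetical pieces:
# toy black box `f`, "largest jump" split rule, uniform sampling on [0, 1],
# and a hand-written coordinate-descent LASSO.
set.seed(17)

# Toy black-box model: a large step in x1, a small step in x2
f <- function(d) 3 * (d$x1 > 0.5) + 0.1 * (d$x2 > 0.3)

new_obs <- data.frame(x1 = 0.8, x2 = 0.6)   # observation to explain
grid <- seq(0, 1, length.out = 101)

# Ceteris Paribus profile of one variable: vary it, keep the rest fixed,
# then split at the midpoint of the largest change in the prediction
cp_split <- function(var) {
  prof <- vapply(grid, function(g) {
    o <- new_obs
    o[[var]] <- g
    f(o)
  }, numeric(1))
  k <- which.max(abs(diff(prof)))
  (grid[k] + grid[k + 1]) / 2
}
splits <- vapply(c("x1", "x2"), cp_split, numeric(1))

# Sample a neighbourhood of similar observations and score it
n <- 500
sim <- data.frame(x1 = runif(n), x2 = runif(n))
y <- f(sim)

# Interpretable binary features: "x1 > split", "x2 > split"
Z <- sapply(names(splits), function(v) as.numeric(sim[[v]] > splits[[v]]))

# LASSO by coordinate descent, minimising
# (1 / (2m)) * ||y - b0 - Z b||^2 + lambda * ||b||_1
soft <- function(a, lambda) sign(a) * pmax(abs(a) - lambda, 0)
lasso_cd <- function(X, y, lambda, n_iter = 100) {
  m <- nrow(X)
  Xc <- sweep(X, 2, colMeans(X))          # centre columns (intercept handled separately)
  yc <- y - mean(y)
  beta <- setNames(numeric(ncol(X)), colnames(X))
  for (it in seq_len(n_iter)) {
    for (j in seq_len(ncol(X))) {
      r <- yc - Xc[, -j, drop = FALSE] %*% beta[-j]   # partial residual
      beta[j] <- soft(sum(Xc[, j] * r) / m, lambda) / (sum(Xc[, j]^2) / m)
    }
  }
  c("(Intercept)" = mean(y) - sum(colMeans(X) * beta), beta)
}

fit <- lasso_cd(Z, y, lambda = 0.05)
print(round(fit, 3))
```

With this setup the small step in `x2` should be shrunk to exactly zero, while `x1` keeps a coefficient a little below 3; a larger `lambda` drops more features, which is the simplification the LASSO step provides.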

individual_surrogate_model(
  x,
  new_observation,
  size,
  seed = NULL,
  kernel = identity_kernel,
  sampling = "uniform",
  ...
)

Arguments

x	an explainer created with the function DALEX::explain().
new_observation	an observation to be explained. Its columns should correspond to the columns of the data argument passed to x.
size	number of similar observations to be sampled.
seed	If not NULL, seed will be set to this value for reproducibility.
kernel	Kernel function which will be used to weight simulated observations.
sampling	Parameter that controls how new observations are sampled (default "uniform").
...	Additional arguments that will be passed to ingredients::ceteris_paribus.
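The kernel argument expects a function that turns the similarity between the explained observation and a simulated one into a weight; the default, identity_kernel, gives every simulated observation the same weight. The sketch below is a hypothetical exponential kernel. It assumes, without confirmation here, that kernels are called as kernel(explained_instance, simulated_instance) on numeric vectors and return a single non-negative number; the name exp_kernel and its width parameter are illustrative only.

```r
# Hypothetical kernel: weight decays with squared Euclidean distance.
# Assumed signature: kernel(explained_instance, simulated_instance) -> numeric(1)
exp_kernel <- function(explained_instance, simulated_instance, width = 1) {
  d2 <- sum((as.numeric(explained_instance) - as.numeric(simulated_instance))^2)
  exp(-d2 / width^2)
}

# Weights for three simulated observations in a binary (interpretable) space
explained <- c(1, 0, 1)
simulated <- rbind(c(1, 0, 1),   # identical: weight exp(0) = 1
                   c(1, 1, 1),   # one feature differs
                   c(0, 1, 0))   # all three differ
w <- apply(simulated, 1, function(s) exp_kernel(explained, s))
print(w)
```

If the signature assumption holds, such a kernel would be supplied as individual_surrogate_model(explainer, apartments[5, -1], size = 500, seed = 17, kernel = exp_kernel).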

Value

data.frame of class local_surrogate_explainer
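Because the result is a data.frame with an extra class, ordinary base-R subsetting can be used to pull out the feature effects. The sketch below works on a hand-built stand-in, model_lok_mock, that copies three columns of the model_lok output printed in the Examples section; it is not produced by localModel, and it assumes the blank original_variable entries of the "(Model mean)" and "(Intercept)" rows are empty strings.

```r
# Hand-built stand-in (hypothetical name) copying columns from the printed
# model_lok output; the real object is returned by individual_surrogate_model()
model_lok_mock <- data.frame(
  estimated = c(3486.9367, 3245.5658, -230.1591, 516.2736),
  variable = c("(Model mean)", "(Intercept)", "surface > 125",
               "district = Mokotow, Ochota, Srodmiescie, Zoliborz"),
  original_variable = c("", "", "surface", "district"),
  stringsAsFactors = FALSE
)

# Keep only rows that describe an interpretable feature, largest effect first
effects <- model_lok_mock[model_lok_mock$original_variable != "", ]
effects <- effects[order(-abs(effects$estimated)), ]
print(effects[, c("original_variable", "variable", "estimated")])
```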

Examples

# Example based on apartments data from DALEX package.
library(DALEX)
library(randomForest)
library(localModel)
data('apartments')
mrf <- randomForest(m2.price ~., data = apartments, ntree = 50)
explainer <- explain(model = mrf,
                     data = apartments[, -1])
#> Preparation of a new explainer is initiated
#>   -> model label       :  randomForest  (  default  )
#>   -> data              :  1000  rows  5  cols 
#>   -> target variable   :  not specified! (  WARNING  )
#>   -> predict function  :  yhat.randomForest  will be used (  default  )
#>   -> predicted values  :  numerical, min =  1917.419 , mean =  3486.937 , max =  5842.92  
#>   -> model_info        :  package randomForest , ver. 4.6.14 , task regression (  default  ) 
#>   -> residual function :  difference between y and yhat (  default  )
#>   A new explainer has been created!  
model_lok <- individual_surrogate_model(explainer, apartments[5, -1],
                                        size = 500, seed = 17)
model_lok
#>   estimated                                          variable original_variable
#> 1 3486.9367                                      (Model mean)                  
#> 2 3245.5658                                       (Intercept)                  
#> 3 -230.1591                                     surface > 125           surface
#> 4  516.2736 district = Mokotow, Ochota, Srodmiescie, Zoliborz          district
#>   dev_ratio response predicted_value        model
#> 1 0.5241099                 3145.381 randomForest
#> 2 0.5241099                 3145.381 randomForest
#> 3 0.5241099                 3145.381 randomForest
#> 4 0.5241099                 3145.381 randomForest
plot(model_lok)