R/local_attributions.R
local_attributions.Rd
This function finds variable attributions via sequential variable conditioning.
The complexity of this function is O(2*p), where p is the number of variables.
It works in a similar way to the step-up and step-down greedy approximations in the function break_down.
The main difference is that the order of variables is determined in the first step, and the impact of each variable is calculated in the second step.
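The two steps described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: the helper name sketch_attributions and the toy data are hypothetical, and each variable is scored by how far the mean prediction moves when that variable alone is fixed to the new observation's value.

```r
# Hypothetical helper, not part of iBreakDown: a simplified two-step sketch.
sketch_attributions <- function(model, data, new_observation,
                                predict_function = predict) {
  vars <- intersect(colnames(data), colnames(new_observation))
  baseline <- mean(predict_function(model, data))

  # Step 1: score each variable on its own (p batches of predictions)
  # and order the variables by decreasing score.
  scores <- vapply(vars, function(v) {
    tmp <- data
    tmp[[v]] <- new_observation[[v]]
    abs(mean(predict_function(model, tmp)) - baseline)
  }, numeric(1))
  ordered_vars <- vars[order(scores, decreasing = TRUE)]

  # Step 2: fix variables one by one in that order (another p batches)
  # and record the change in the mean prediction after each step.
  current <- data
  running <- baseline
  contribution <- numeric(0)
  for (v in ordered_vars) {
    current[[v]] <- new_observation[[v]]
    new_mean <- mean(predict_function(model, current))
    contribution[v] <- new_mean - running
    running <- new_mean
  }
  c(intercept = baseline, contribution, prediction = running)
}

# Toy data and model (assumptions made only for this illustration).
set.seed(1)
toy <- data.frame(a = rnorm(50), b = rnorm(50))
toy$y <- 2 * toy$a - toy$b + rnorm(50, sd = 0.1)
fit <- lm(y ~ a + b, data = toy)
res <- sketch_attributions(fit, toy[, c("a", "b")], toy[1, c("a", "b")])
res
```

In this sketch each step needs p batches of predictions, which is consistent with the O(2*p) complexity stated above. The intercept plus the contributions adds up to the model's prediction for the new observation.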
local_attributions(x, ...)
# S3 method for explainer
local_attributions(x, new_observation, keep_distributions = FALSE, ...)
# S3 method for default
local_attributions(
  x,
  data,
  predict_function = predict,
  new_observation,
  label = class(x)[1],
  keep_distributions = FALSE,
  order = NULL,
  ...
)
x: an explainer created with function explain or a model.
...: other parameters.
new_observation: a new observation with columns that correspond to variables used in the model.
keep_distributions: if TRUE, then distribution of partial predictions is stored and can be plotted with the generic plot().
data: validation dataset, will be extracted from x if it is an explainer.
predict_function: predict function, will be extracted from x if it is an explainer.
label: name of the model. By default it's extracted from the 'class' attribute of the model.
order: if not NULL, then it will be a fixed order of variables. It can be a numeric vector or vector with names of variables.
Value: an object of the break_down class.
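As an illustration of the default method and the order argument, the sketch below passes a plain model instead of an explainer. It assumes that DALEX (for the titanic_imputed data) and iBreakDown are installed; the predict wrapper and the chosen variable order are arbitrary choices made for this example.

```r
library("DALEX")       # provides the titanic_imputed data
library("iBreakDown")

model_glm <- glm(survived ~ gender + age + fare,
                 data = titanic_imputed, family = "binomial")

# Without an explainer, data and predict_function are supplied by hand.
bd_fixed <- local_attributions(
  model_glm,
  data = titanic_imputed[, c("gender", "age", "fare")],
  # wrapper so that predictions are probabilities (an assumption of this example)
  predict_function = function(m, d) predict(m, d, type = "response"),
  new_observation = titanic_imputed[1, c("gender", "age", "fare")],
  label = "glm",
  # fixed order given by variable names instead of the automatic ordering
  order = c("age", "gender", "fare")
)
bd_fixed
```

With order set, the variables should be conditioned in the given sequence rather than in the sequence found in the first step, so the individual contributions may differ from those of the default call.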
Explanatory Model Analysis. Explore, Explain and Examine Predictive Models. https://ema.drwhy.ai
library("DALEX")
library("iBreakDown")
set.seed(1313)
model_titanic_glm <- glm(survived ~ gender + age + fare,
                         data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed,
                               y = titanic_imputed$survived,
                               label = "glm")
#> Preparation of a new explainer is initiated
#> -> model label : glm
#> -> data : 2207 rows 8 cols
#> -> target variable : 2207 values
#> -> predict function : yhat.glm will be used ( default )
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package stats , ver. 4.1.2 , task classification ( default )
#> -> predicted values : numerical, min = 0.1490412 , mean = 0.3221568 , max = 0.9878987
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -0.8898433 , mean = 4.198546e-13 , max = 0.8448637
#> A new explainer has been created!
bd_glm <- local_attributions(explain_titanic_glm, titanic_imputed[1, ])
bd_glm
#> contribution
#> glm: intercept 0.322
#> glm: gender = male -0.107
#> glm: fare = 7.11 -0.018
#> glm: age = 42 -0.014
#> glm: class = 3rd 0.000
#> glm: embarked = Southampton 0.000
#> glm: sibsp = 0 0.000
#> glm: parch = 0 0.000
#> glm: survived = 0 0.000
#> glm: prediction 0.183
plot(bd_glm, max_features = 3)
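The printed table above can be read as a running sum: the intercept is the mean prediction, and adding the contributions one by one leads to the prediction for the new observation. A quick check with the rounded values copied from the printed output (the vector name contributions is used only for this illustration):

```r
# Rounded values copied from the printed bd_glm output above;
# variables with zero contribution are omitted.
contributions <- c(intercept = 0.322,
                   gender    = -0.107,
                   fare      = -0.018,
                   age       = -0.014)

# Running totals after each variable is fixed; the last value is the prediction.
cumsum(contributions)
total <- sum(contributions)
total
```

The total matches the printed prediction of 0.183 up to rounding.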
# \dontrun{
## Not run:
library("randomForest")
set.seed(1313)
# example with interaction
# classification for HR data
model <- randomForest(status ~ . , data = HR)
new_observation <- HR_test[1,]
explainer_rf <- explain(model,
                        data = HR[1:1000, 1:5])
#> Preparation of a new explainer is initiated
#> -> model label : randomForest ( default )
#> -> data : 1000 rows 5 cols
#> -> target variable : not specified! ( WARNING )
#> -> predict function : yhat.randomForest will be used ( default )
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package randomForest , ver. 4.7.1 , task multiclass ( default )
#> -> model_info : Model info detected multiclass task but 'y' is a NULL . ( WARNING )
#> -> model_info : By deafult multiclass tasks supports only factor 'y' parameter.
#> -> model_info : Consider changing to a factor vector with true class names.
#> -> model_info : Otherwise I will not be able to calculate residuals or loss function.
#> -> predicted values : predict function returns multiple columns: 3 ( default )
#> -> residual function : difference between 1 and probability of true class ( default )
#> A new explainer has been created!
bd_rf <- local_attributions(explainer_rf,
                            new_observation)
bd_rf
#> contribution
#> randomForest.fired: intercept 0.386
#> randomForest.fired: hours = 42.32 0.231
#> randomForest.fired: evaluation = 2 0.062
#> randomForest.fired: salary = 2 -0.272
#> randomForest.fired: age = 57.73 0.092
#> randomForest.fired: gender = male 0.281
#> randomForest.fired: prediction 0.778
#> randomForest.ok: intercept 0.278
#> randomForest.ok: hours = 42.32 -0.053
#> randomForest.ok: evaluation = 2 0.091
#> randomForest.ok: salary = 2 0.271
#> randomForest.ok: age = 57.73 -0.086
#> randomForest.ok: gender = male -0.283
#> randomForest.ok: prediction 0.218
#> randomForest.promoted: intercept 0.336
#> randomForest.promoted: hours = 42.32 -0.178
#> randomForest.promoted: evaluation = 2 -0.152
#> randomForest.promoted: salary = 2 0.001
#> randomForest.promoted: age = 57.73 -0.006
#> randomForest.promoted: gender = male 0.002
#> randomForest.promoted: prediction 0.004
plot(bd_rf)
plot(bd_rf, baseline = 0)
# example for regression - apartment prices
# here we do not have interactions
model <- randomForest(m2.price ~ . , data = apartments)
explainer_rf <- explain(model,
                        data = apartments_test[1:1000, 2:6],
                        y = apartments_test$m2.price[1:1000])
#> Preparation of a new explainer is initiated
#> -> model label : randomForest ( default )
#> -> data : 1000 rows 5 cols
#> -> target variable : 1000 values
#> -> predict function : yhat.randomForest will be used ( default )
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package randomForest , ver. 4.7.1 , task regression ( default )
#> -> predicted values : numerical, min = 2043.066 , mean = 3487.722 , max = 5773.976
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -630.6766 , mean = 1.057813 , max = 1256.239
#> A new explainer has been created!
bd_rf <- local_attributions(explainer_rf,
                            apartments_test[1, ])
bd_rf
#> contribution
#> randomForest: intercept 3487.722
#> randomForest: district = Srodmiescie 1034.737
#> randomForest: surface = 131 -315.991
#> randomForest: no.rooms = 5 -163.113
#> randomForest: floor = 3 150.529
#> randomForest: construction.year = 1976 -24.021
#> randomForest: prediction 4169.863
plot(bd_rf, digits = 1)
bd_rf <- local_attributions(explainer_rf,
                            apartments_test[1, ],
                            keep_distributions = TRUE)
plot(bd_rf, plot_distributions = TRUE)
#> Warning: `fun.y` is deprecated. Use `fun` instead.
# }