This function computes variable attributions via sequential variable conditioning. Its complexity is O(2*p), where p is the number of variables. It works in a similar way to the step-up and step-down greedy approximations in the function break_down; the main difference is that here the order of variables is determined in the first step, and their impact is calculated in the second step.
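The two-step idea can be sketched in a few lines of plain R. The snippet below is a minimal illustration on a linear model fitted to `mtcars`, not the package's implementation: the variable order is chosen by the size of each variable's individual effect on the mean prediction, and attributions are then read off by conditioning on the variables sequentially in that order.

```r
# Minimal sketch of sequential variable conditioning (illustration only)
model   <- lm(mpg ~ wt + hp + disp, data = mtcars)
data    <- mtcars
new_obs <- mtcars[1, ]

# baseline: mean prediction over the dataset
baseline <- mean(predict(model, data))

# Step 1: order variables by the size of their individual impact
# on the mean prediction when fixed at the new observation's value
vars <- c("wt", "hp", "disp")
single_impact <- sapply(vars, function(v) {
  tmp <- data
  tmp[[v]] <- new_obs[[v]]
  abs(mean(predict(model, tmp)) - baseline)
})
ord <- vars[order(single_impact, decreasing = TRUE)]

# Step 2: condition on variables sequentially in that order and
# record the incremental change in the mean prediction
tmp <- data
current <- baseline
contribution <- numeric(length(ord))
names(contribution) <- ord
for (v in ord) {
  tmp[[v]] <- new_obs[[v]]
  new_mean <- mean(predict(model, tmp))
  contribution[v] <- new_mean - current
  current <- new_mean
}
contribution
```

Because the conditioning telescopes, once all variables are fixed the contributions sum to the difference between the prediction for the new observation and the baseline (mean) prediction.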

local_attributions(x, ...)

# S3 method for explainer
local_attributions(x, new_observation,
  keep_distributions = FALSE, ...)

# S3 method for default
local_attributions(x, data, predict_function = predict,
  new_observation, label = class(x)[1], keep_distributions = FALSE,
  order = NULL, ...)

Arguments

x

an explainer created with function explain or a model.

...

other parameters.

new_observation

a new observation with columns that correspond to variables used in the model.

keep_distributions

if `TRUE`, then distribution of partial predictions is stored and can be plotted with the generic `plot()`.

data

a validation dataset; it will be extracted from `x` if `x` is an explainer.

predict_function

a predict function; it will be extracted from `x` if `x` is an explainer.

label

name of the model. By default it's extracted from the 'class' attribute of the model.

order

if not `NULL`, then variables are processed in this fixed order. It can be a numeric vector of variable indices or a character vector of variable names.
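A short sketch of both ways to fix the order, reusing the GLM explainer built in the Examples below. `order` appears in the default method's signature; the snippet assumes it is forwarded through `...` when `x` is an explainer.

```r
library("DALEX")
library("iBreakDown")
titanic <- na.omit(titanic)
set.seed(1313)
titanic_small <- titanic[sample(1:nrow(titanic), 500), c(1, 2, 6, 9)]
model_titanic_glm <- glm(survived == "yes" ~ gender + age + fare,
                         data = titanic, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_small[, -9],
                               y = titanic_small$survived == "yes")
# fix the conditioning order by variable name ...
bd_named <- local_attributions(explain_titanic_glm, titanic_small[1, ],
                               order = c("fare", "age", "gender"))
# ... or by column index in the data
bd_index <- local_attributions(explain_titanic_glm, titanic_small[1, ],
                               order = c(3, 2, 1))
```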

Value

an object of the `break_down` class.

References

Predictive Models: Visual Exploration, Explanation and Debugging https://pbiecek.github.io/PM_VEE

Examples

library("DALEX")
library("iBreakDown")
# Toy examples, because CRAN angels ask for them
titanic <- na.omit(titanic)
set.seed(1313)
titanic_small <- titanic[sample(1:nrow(titanic), 500), c(1, 2, 6, 9)]
model_titanic_glm <- glm(survived == "yes" ~ gender + age + fare,
                         data = titanic, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_small[, -9],
                               y = titanic_small$survived == "yes")
#> Preparation of a new explainer is initiated
#> -> model label : lm (default)
#> -> data : 500 rows 4 cols
#> -> target variable : 500 values
#> -> predict function : yhat.glm will be used (default)
#> -> predicted values : numerical, min = 0.1483104 , mean = 0.315637 , max = 0.9822194
#> -> residual function : difference between y and yhat (default)
#> -> residuals : numerical, min = -0.7976411 , mean = -0.01763704 , max = 0.8334961
#> A new explainer has been created!
bd_rf <- local_attributions(explain_titanic_glm, titanic_small[1, ])
bd_rf
#> contribution
#> lm: intercept 0.316
#> lm: gender = male -0.099
#> lm: age = 50 -0.026
#> lm: fare = 13 -0.009
#> lm: survived = no 0.000
#> lm: prediction 0.182
plot(bd_rf, max_features = 3)
# \donttest{
## Not run:
library("randomForest")
set.seed(1313)
# example with interaction
# classification for HR data
model <- randomForest(status ~ . , data = HR)
new_observation <- HR_test[1, ]
explainer_rf <- explain(model,
                        data = HR[1:1000, 1:5],
                        y = HR$status[1:1000])
#> Preparation of a new explainer is initiated
#> -> model label : randomForest (default)
#> -> data : 1000 rows 5 cols
#> -> target variable : 1000 values
#> -> target variable : Please note that 'y' is a factor. (WARNING)
#> -> target variable : Consider changing the 'y' to a logical or numerical vector.
#> -> target variable : Otherwise I will not be able to calculate residuals or loss function.
#> -> predict function : yhat.randomForest will be used (default)
#> Warning: the condition has length > 1 and only the first element will be used
#> -> predicted values : numerical, min = 0 , mean = 0.3333333 , max = 1
#> -> residual function : difference between y and yhat (default)
#> Warning: ‘-’ not meaningful for factors
#> -> residuals : numerical, min = NA , mean = NA , max = NA
#> A new explainer has been created!
bd_rf <- local_attributions(explainer_rf, new_observation)
bd_rf
#> contribution
#> randomForest.fired: intercept 0.386
#> randomForest.fired: hours = 42 0.231
#> randomForest.fired: evaluation = 2 0.062
#> randomForest.fired: salary = 2 -0.272
#> randomForest.fired: age = 58 0.092
#> randomForest.fired: gender = male 0.281
#> randomForest.fired: prediction 0.778
#> randomForest.ok: intercept 0.278
#> randomForest.ok: hours = 42 -0.053
#> randomForest.ok: evaluation = 2 0.091
#> randomForest.ok: salary = 2 0.271
#> randomForest.ok: age = 58 -0.086
#> randomForest.ok: gender = male -0.283
#> randomForest.ok: prediction 0.218
#> randomForest.promoted: intercept 0.336
#> randomForest.promoted: hours = 42 -0.178
#> randomForest.promoted: evaluation = 2 -0.152
#> randomForest.promoted: salary = 2 0.001
#> randomForest.promoted: age = 58 -0.006
#> randomForest.promoted: gender = male 0.002
#> randomForest.promoted: prediction 0.004
plot(bd_rf)
plot(bd_rf, baseline = 0)
# example for regression - apartment prices
# here we do not have interactions
model <- randomForest(m2.price ~ . , data = apartments)
explainer_rf <- explain(model,
                        data = apartments_test[1:1000, 2:6],
                        y = apartments_test$m2.price[1:1000])
#> Preparation of a new explainer is initiated
#> -> model label : randomForest (default)
#> -> data : 1000 rows 5 cols
#> -> target variable : 1000 values
#> -> predict function : yhat.randomForest will be used (default)
#> -> predicted values : numerical, min = 2043.066 , mean = 3487.722 , max = 5773.976
#> -> residual function : difference between y and yhat (default)
#> -> residuals : numerical, min = -630.6766 , mean = 1.057813 , max = 1256.239
#> A new explainer has been created!
bd_rf <- local_attributions(explainer_rf, apartments_test[1, ])
bd_rf
#> contribution
#> randomForest: intercept 3487.722
#> randomForest: district = Srodmiescie 1034.737
#> randomForest: surface = 130 -315.991
#> randomForest: no.rooms = 5 -163.113
#> randomForest: floor = 3 150.529
#> randomForest: construction.year = 2000 -24.021
#> randomForest: prediction 4169.863
plot(bd_rf, digits = 1)
bd_rf <- local_attributions(explainer_rf, apartments_test[1, ],
                            keep_distributions = TRUE)
plot(bd_rf, plot_distributions = TRUE)
# }