This package helps calculate instance-level variable importance (local sensitivity). The importance measure is based on Ceteris Paribus profiles and can be calculated in eight variants. Select the variant that suits your needs by setting three logical parameters: absolute_deviation, point and density.
vivo is part of the DrWhy collection of tools for Visual Exploration, Explanation and Debugging of Predictive Models.
Ceteris Paribus is a Latin phrase meaning "other things held constant" or "all else unchanged". Ceteris Paribus Plots show how the model response depends on changes in a single input variable, keeping all other variables unchanged. They work for any machine learning model and allow model comparisons, which helps to understand how a black-box model works.
The measure is based on the oscillations of Ceteris Paribus profiles. In particular, the larger the influence of an explanatory variable on the prediction at a particular instance, the larger the deviation along the corresponding Ceteris Paribus profile. For a variable that has little or no influence on the model prediction, the profile will be flat or will barely change.
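The idea can be illustrated with a minimal base-R sketch. It is not vivo's implementation: the toy model and the helper names (toy_predict, cp_profile, profile_deviation) are hypothetical, and the deviation here is a plain, unweighted mean absolute distance from the prediction at the observation.

```r
# Hypothetical toy model: strong dependence on x1, weak on x2, none on x3
toy_predict <- function(df) 3 * df$x1^2 + 0.1 * df$x2 + 0 * df$x3

# Ceteris Paribus profile: vary one variable over a grid, keep the others fixed
cp_profile <- function(predict_fun, observation, variable, grid) {
  newdata <- observation[rep(1, length(grid)), , drop = FALSE]
  newdata[[variable]] <- grid
  predict_fun(newdata)
}

# Mean absolute deviation of the profile from the prediction at the observation
profile_deviation <- function(predict_fun, observation, variable, grid) {
  profile <- cp_profile(predict_fun, observation, variable, grid)
  mean(abs(profile - predict_fun(observation)))
}

observation <- data.frame(x1 = 0.5, x2 = 0.5, x3 = 0.5)
grid <- seq(0, 1, length.out = 101)

deviations <- sapply(
  names(observation),
  function(v) profile_deviation(toy_predict, observation, v, grid)
)
print(round(deviations, 3))
```

In this sketch the flat profile of x3 gives a deviation of zero, while the steeper profile of x1 gives the largest value.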
Let us consider an example.
We define a random forest regression model.
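This step can be sketched as follows. The sketch assumes the apartments and apartments_test data sets shipped with DALEX and the m2.price target column; neither is named in the text here, and the object names are illustrative.

```r
library("DALEX")
library("randomForest")

# Assumption: DALEX's apartments (training) and apartments_test (9000 rows) data
predictors <- c("construction.year", "surface", "floor", "no.rooms")

set.seed(1)  # random forests are stochastic; fix the seed for repeatability
apartments_rf_model <- randomForest(
  m2.price ~ construction.year + surface + floor + no.rooms,
  data = apartments
)

# Wrap the model in a DALEX explainer, using held-out data for explanations
explainer_rf <- explain(
  apartments_rf_model,
  data = apartments_test[, predictors],
  y = apartments_test$m2.price
)
```

Exact predicted values and residuals will vary with the seed and the package versions.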
## Preparation of a new explainer is initiated
##   -> model label       : randomForest (default)
##   -> data              : 9000 rows 4 cols
##   -> target variable   : 9000 values
##   -> model_info        : package randomForest , ver. 4.6.14 , task regression (default)
##   -> predict function  : yhat.randomForest will be used (default)
##   -> predicted values  : numerical, min = 2085.883 , mean = 3514.857 , max = 5329.799
##   -> residual function : difference between y and yhat (default)
##   -> residuals         : numerical, min = -1244.621 , mean = -3.333807 , max = 2156.984
##   A new explainer has been created!
Now we calculate Ceteris Paribus profiles for a new observation.
The value of the colored area is our measure. The larger the area, the more important the variable.
We calculated the measure with the absolute_deviation, point and density parameters set to TRUE. This means that the deviation is calculated as a distance from the observation, not from the average, that the measure is weighted by the density of the variable, and that the absolute deviation is used.
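The profile and measure steps described above can be sketched end to end as follows. This is a sketch under assumptions: the data sets and target column come from DALEX, the values of the new observation are made up for illustration, and the local_variable_importance() call assumes the argument order (profiles, data) together with the three parameters named in the text. Check the package documentation before relying on it. The model is rebuilt here so that the snippet runs on its own.

```r
library("DALEX")
library("randomForest")
library("ingredients")
library("vivo")

predictors <- c("construction.year", "surface", "floor", "no.rooms")

set.seed(1)
model <- randomForest(
  m2.price ~ construction.year + surface + floor + no.rooms,
  data = apartments
)
explainer <- explain(
  model,
  data = apartments_test[, predictors],
  y = apartments_test$m2.price,
  verbose = FALSE
)

# Hypothetical new observation; these values are illustrative only
new_apartment <- data.frame(
  construction.year = 1998, surface = 88, floor = 2L, no.rooms = 3
)

# Ceteris Paribus profiles for the new observation
profiles <- ceteris_paribus(explainer, new_apartment)
plot(profiles) + show_observations(profiles)

# Local importance: absolute deviation, measured from the observation,
# weighted by the density of each variable in the supplied data
measure <- local_variable_importance(
  profiles, apartments[, predictors],
  absolute_deviation = TRUE, point = TRUE, density = TRUE
)
print(measure)
plot(measure)
```

With a different observation or seed, the ranking of the variables may differ from the one reported here.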
For the new observation, the most important variable is surface, followed by floor, construction.year and no.rooms.
The package was created by Anna Kozak as part of a master's thesis at the Faculty of Mathematics and Information Science, Warsaw University of Technology.