Intro

yardstick is a package that offers many measures for evaluating model performance. It follows the tidymodels/tidyverse philosophy: performance is calculated by functions that operate on a data.frame containing the model's results.

DALEX uses model performance measures to assess the importance of variables (in the model_parts function). These are typically calculated with loss functions (functions with the loss_ prefix) that operate on two vectors - the scores returned by the model and the true target variable.
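To make the difference in calling conventions concrete, here is a minimal sketch on a small made-up data.frame (the results object and its values are purely illustrative):

# yardstick: metrics operate on a data.frame holding the true values and the predictions
library("yardstick")
results <- data.frame(truth = c(3.2, 1.8, 2.5), estimate = c(3.0, 2.1, 2.4))
rmse(results, truth = truth, estimate = estimate)

# DALEX: loss functions operate directly on two vectors
library("DALEX")
loss_root_mean_square(observed = results$truth, predicted = results$estimate)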

Although these packages follow slightly different philosophies, you can use the measures available in yardstick when working with DALEX. Below we show how to use the loss_yardstick function to do this.

Prepare a classification model

The yardstick package supports both classification and regression models. We will start our example with a classification model for the titanic data that predicts the probability of surviving the disaster.

The following code trains a classification model.

library("DALEX")
library("yardstick")
titanic_glm <- glm(survived~., data = titanic_imputed, family = "binomial")

Class Probability Metrics

The Class Probability Metrics in the yardstick package assume that the true value is a factor and the model returns a numerical score. So let’s prepare an explainer that has a factor as y and whose predict_function returns the probability of the target class (the default behaviour).

NOTE: Performance measures will be calculated on the data supplied in the explainer. Put the test data here!

explainer_glm <- DALEX::explain(titanic_glm,
                        data = titanic_imputed[,-8],
                        y = factor(titanic_imputed$survived))
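The explainer above uses the full titanic_imputed data to keep the example short. With a separate test set, the setup could look like the sketch below (the 70/30 split and the explainer_glm_test name are illustrative, not part of the original example):

# hypothetical split: train the model on one part, evaluate the explainer on the other
set.seed(1)
in_test <- sample(c(TRUE, FALSE), nrow(titanic_imputed), replace = TRUE, prob = c(0.3, 0.7))
titanic_train <- titanic_imputed[!in_test, ]
titanic_test  <- titanic_imputed[in_test, ]

titanic_glm_split  <- glm(survived ~ ., data = titanic_train, family = "binomial")
explainer_glm_test <- DALEX::explain(titanic_glm_split,
                        data = titanic_test[, -8],
                        y = factor(titanic_test$survived))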

To make functions from the yardstick package compatible with DALEX we must use the loss_yardstick adapter. In the example below we use the roc_auc function (area under the receiver operating characteristic curve). The yardstick:: prefix is not necessary, but we include it to show explicitly where the functions come from.

NOTE: we set yardstick.event_first = FALSE because the model predicts the probability of survived = 1.

options(yardstick.event_first = FALSE)

glm_auc <- model_parts(explainer_glm, type = "raw",
                  loss_function = loss_yardstick(yardstick::roc_auc))
glm_auc
#>       variable mean_dropout_loss label
#> 1 _full_model_         0.8137302    lm
#> 2       gender         0.6174230    lm
#> 3        class         0.7166700    lm
#> 4          age         0.7863548    lm
#> 5        sibsp         0.8033765    lm
#> 6     embarked         0.8074437    lm
#> 7         fare         0.8133961    lm
#> 8        parch         0.8137905    lm
#> 9   _baseline_         0.4871857    lm
plot(glm_auc)

In a similar way, we can use the pr_auc function (area under the precision-recall curve).

glm_prauc <- model_parts(explainer_glm, type = "raw",
                  loss_function = loss_yardstick(yardstick::pr_auc))
glm_prauc
#>       variable mean_dropout_loss label
#> 1 _full_model_         0.7415507    lm
#> 2       gender         0.4459565    lm
#> 3        class         0.5865294    lm
#> 4          age         0.7162901    lm
#> 5     embarked         0.7273457    lm
#> 6        sibsp         0.7277256    lm
#> 7        parch         0.7416492    lm
#> 8         fare         0.7418198    lm
#> 9   _baseline_         0.3273634    lm
plot(glm_prauc)

Classification Metrics

The Classification Metrics in the yardstick package assume that the true value is a factor and the model returns a factor variable.

This is different from the behavior of most explanations in DALEX, because when explaining predictions we typically operate on class membership probabilities. If we want to use Classification Metrics we need to provide a predict function that returns classes instead of probabilities.

So let’s prepare an explainer that has a factor as y and whose predict_function returns classes.

explainer_glm <- DALEX::explain(titanic_glm,
                        data = titanic_imputed[,-8],
                        y = factor(titanic_imputed$survived),
                        predict_function = function(m,x) {
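                              # probabilities above 0.5 become class "1", otherwise "0"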
                              factor(as.numeric(predict(m, x, type = "response") > 0.5), 
                                     levels = c("0", "1"))
                        })

Again, let’s use the loss_yardstick adapter. In the example below we use the accuracy function.

glm_accuracy <- model_parts(explainer_glm, type = "raw",
                    loss_function = loss_yardstick(yardstick::accuracy))
glm_accuracy
#>       variable mean_dropout_loss label
#> 1 _full_model_            0.7977    lm
#> 2       gender            0.6334    lm
#> 3        class            0.7399    lm
#> 4          age            0.7875    lm
#> 5        sibsp            0.7923    lm
#> 6     embarked            0.7960    lm
#> 7        parch            0.7976    lm
#> 8         fare            0.7986    lm
#> 9   _baseline_            0.5867    lm
plot(glm_accuracy)

In a similar way, we can use the bal_accuracy function (balanced accuracy).

glm_bal_accuracy <- model_parts(explainer_glm, type = "raw",
                    loss_function = loss_yardstick(yardstick::bal_accuracy))
glm_bal_accuracy
#>       variable mean_dropout_loss label
#> 1 _full_model_         0.7397031    lm
#> 2       gender         0.5619733    lm
#> 3        class         0.6788103    lm
#> 4          age         0.7268739    lm
#> 5     embarked         0.7346292    lm
#> 6        sibsp         0.7377564    lm
#> 7        parch         0.7393319    lm
#> 8         fare         0.7396152    lm
#> 9   _baseline_         0.5051107    lm
plot(glm_bal_accuracy)

The lower the better?

For a loss function, the smaller the value, the better the model. Therefore, the importance of a variable is often calculated as loss(perturbed) - loss(original).

But many model performance measures have the opposite characteristic: the higher they are, the better (e.g. AUC, accuracy). To maintain a consistent analysis pipeline it is convenient to invert such measures, e.g. by converting them to 1 - AUC or 1 - accuracy.
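For intuition, the inversion can be done by hand with yardstick on a toy data.frame (a sketch only; the toy object and its columns are made up):

toy <- data.frame(truth = factor(c(0, 1, 1, 0)), pred = factor(c(0, 1, 0, 0)))
acc <- yardstick::accuracy(toy, truth = truth, estimate = pred)$.estimate
acc       # the higher the better
1 - acc   # behaves like a loss: the lower the better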

To do it, just add the reverse = TRUE argument.

glm_1accuracy <- model_parts(explainer_glm, 
                    loss_function = loss_yardstick(accuracy, reverse = TRUE))
glm_1accuracy
#>       variable mean_dropout_loss label
#> 1 _full_model_            0.1980    lm
#> 2         fare            0.1974    lm
#> 3        parch            0.1984    lm
#> 4     embarked            0.1986    lm
#> 5        sibsp            0.2035    lm
#> 6          age            0.2152    lm
#> 7        class            0.2525    lm
#> 8       gender            0.3658    lm
#> 9   _baseline_            0.4144    lm
plot(glm_1accuracy)

Calculate performance on the whole dataset

By default the performance is calculated on N = 1000 randomly selected observations (to speed up the calculations). Set N = NULL to use the whole dataset.

glm_1accuracy <- model_parts(explainer_glm, 
                    loss_function = loss_yardstick(accuracy, reverse = TRUE),
                    N = NULL)
plot(glm_1accuracy)

Prepare a regression model

The following code trains a regression model.

library("ranger")
apartments_ranger <- ranger(m2.price~., data = apartments, num.trees = 50)

Regression Metrics

The Regression Metrics in the yardstick package assume that the true value is a numeric variable and the model returns a numeric score.

explainer_ranger  <- DALEX::explain(apartments_ranger, data = apartments[,-1],
                             y = apartments$m2.price, label = "Ranger Apartments")
#> Preparation of a new explainer is initiated
#>   -> model label       :  Ranger Apartments 
#>   -> data              :  1000  rows  5  cols 
#>   -> target variable   :  1000  values 
#>   -> predict function  :  yhat.ranger  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package ranger , ver. 0.14.1 , task regression (  default  ) 
#>   -> predicted values  :  numerical, min =  1899.414 , mean =  3488.861 , max =  6046.561  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -445.4107 , mean =  -1.842099 , max =  736.702  
#>   A new explainer has been created!

To make functions from the yardstick package compatible with DALEX we must use the loss_yardstick adapter. In the example below we use the rmse function (root mean squared error).

ranger_rmse <- model_parts(explainer_ranger, type = "raw",
                      loss_function = loss_yardstick(rmse))
ranger_rmse
#>            variable mean_dropout_loss             label
#> 1      _full_model_          155.6184 Ranger Apartments
#> 2          no.rooms          316.5036 Ranger Apartments
#> 3 construction.year          392.5758 Ranger Apartments
#> 4             floor          431.6225 Ranger Apartments
#> 5           surface          524.9066 Ranger Apartments
#> 6          district          763.5492 Ranger Apartments
#> 7        _baseline_         1206.6411 Ranger Apartments
plot(ranger_rmse)

And one more example, for the rsq function (R squared).

ranger_rsq <- model_parts(explainer_ranger, type = "raw",
                      loss_function = loss_yardstick(rsq))
ranger_rsq
#>            variable mean_dropout_loss             label
#> 1      _full_model_       0.983377709 Ranger Apartments
#> 2          district       0.326764600 Ranger Apartments
#> 3           surface       0.667962198 Ranger Apartments
#> 4             floor       0.775455846 Ranger Apartments
#> 5 construction.year       0.816108921 Ranger Apartments
#> 6          no.rooms       0.920990517 Ranger Apartments
#> 7        _baseline_       0.001389768 Ranger Apartments
plot(ranger_rsq)

Summary

I hope that using the yardstick package with DALEX will now be easy and enjoyable. If you would like to share your experience with this package, please create an issue at https://github.com/ModelOriented/DALEX/issues.

Session info

#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ranger_0.14.1   yardstick_1.0.0 DALEX_2.4.2    
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.2  xfun_0.32         bslib_0.4.0       purrr_0.3.4      
#>  [5] lattice_0.20-45   colorspace_2.0-3  vctrs_0.4.1       generics_0.1.3   
#>  [9] htmltools_0.5.3   yaml_2.3.5        utf8_1.2.2        rlang_1.0.5      
#> [13] pkgdown_2.0.6     jquerylib_0.1.4   pillar_1.8.1      glue_1.6.2       
#> [17] lifecycle_1.0.1   stringr_1.4.1     munsell_0.5.0     gtable_0.3.1     
#> [21] ragg_1.2.2        memoise_2.0.1     evaluate_0.16     labeling_0.4.2   
#> [25] knitr_1.40        fastmap_1.1.0     fansi_1.0.3       highr_0.9        
#> [29] Rcpp_1.0.9        scales_1.2.1      cachem_1.0.6      desc_1.4.1       
#> [33] jsonlite_1.8.0    ingredients_2.2.0 farver_2.1.1      systemfonts_1.0.4
#> [37] fs_1.5.2          textshaping_0.3.6 ggplot2_3.3.6     digest_0.6.29    
#> [41] stringi_1.7.8     dplyr_1.0.10      grid_4.2.1        rprojroot_2.0.3  
#> [45] cli_3.3.0         tools_4.2.1       magrittr_2.0.3    sass_0.4.2       
#> [49] tibble_3.1.8      crayon_1.5.1      pkgconfig_2.0.3   Matrix_1.4-1     
#> [53] rmarkdown_2.16    R6_2.5.1          compiler_4.2.1