Intro

yardstick is a package that offers many measures for evaluating model performance. It follows the tidymodels/tidyverse philosophy: performance is calculated by functions that operate on a data.frame with the results of the model.

DALEX uses model performance measures to assess the importance of variables (in the model_parts function). These are typically calculated with loss functions (functions with the loss_ prefix) that operate on two vectors: the scores from the model and the true values of the target variable.
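
To show the contract such a loss function must satisfy, below is a minimal sketch of a mean-squared-error loss; loss_mse_sketch is a hypothetical name used only for illustration.

loss_mse_sketch <- function(observed, predicted) {
  # a DALEX-style loss: two vectors in, a single number out
  mean((observed - predicted)^2)
}
loss_mse_sketch(c(0, 1, 1), c(0.2, 0.8, 0.6))
#> [1] 0.08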

Although these packages follow slightly different philosophies, you can use the measures available in yardstick when working with DALEX. Below we show how to do this with the loss_yardstick function.

Prepare a classification model

The yardstick package supports both classification and regression models. We start with a classification model for the titanic data that predicts the probability of surviving the disaster.

The following code trains a classification model.

library("DALEX")
library("yardstick")
titanic_glm <- glm(survived ~ ., data = titanic_imputed, family = "binomial")

Class Probability Metrics

The Class Probability Metrics in the yardstick package assume that the true value is a factor and the model returns a numerical score. So let's prepare an explainer whose y is a factor and whose predict_function returns the probability of the target class (the default behaviour).

NOTE: Performance measures will be calculated on the data supplied in the explainer. Put the test data here!

explainer_glm <- DALEX::explain(titanic_glm,
                        data = titanic_imputed[,-8],
                        y = factor(titanic_imputed$survived))

To make functions from the yardstick package compatible with DALEX, we must use the loss_yardstick adapter. In the example below we use the roc_auc function (area under the receiver operating characteristic curve). The yardstick:: prefix is not necessary, but we use it here to show explicitly where the functions come from.

NOTE: we set yardstick.event_first = FALSE because the model predicts the probability of survived = 1.

options(yardstick.event_first = FALSE)

glm_auc <- model_parts(explainer_glm, type = "raw",
                  loss_function = loss_yardstick(yardstick::roc_auc))
glm_auc
#>       variable mean_dropout_loss label
#> 1 _full_model_         0.8176487    lm
#> 2       gender         0.6300156    lm
#> 3        class         0.7233351    lm
#> 4          age         0.7887847    lm
#> 5        sibsp         0.8087978    lm
#> 6     embarked         0.8104637    lm
#> 7         fare         0.8174282    lm
#> 8        parch         0.8177252    lm
#> 9   _baseline_         0.5004211    lm
plot(glm_auc)

In a similar way, we can use the pr_auc function (area under the precision-recall curve).

glm_prauc <- model_parts(explainer_glm, type = "raw",
                  loss_function = loss_yardstick(yardstick::pr_auc))
glm_prauc
#>       variable mean_dropout_loss label
#> 1 _full_model_         0.7343162    lm
#> 2       gender         0.4267867    lm
#> 3        class         0.5912337    lm
#> 4          age         0.7119210    lm
#> 5        sibsp         0.7202304    lm
#> 6     embarked         0.7232801    lm
#> 7        parch         0.7340958    lm
#> 8         fare         0.7342991    lm
#> 9   _baseline_         0.3216100    lm
plot(glm_prauc)

Classification Metrics

The Classification Metrics in the yardstick package assume that the true value is a factor and the model returns a factor variable.

This behavior differs from most explanations in DALEX, where we typically operate on class membership probabilities. To use Classification Metrics, we need to provide a predict function that returns classes instead of probabilities.

So let's prepare an explainer whose y is a factor and whose predict_function returns classes.

explainer_glm <- DALEX::explain(titanic_glm,
                        data = titanic_imputed[,-8],
                        y = factor(titanic_imputed$survived),
                        predict_function = function(m, x) {
                              # convert predicted probabilities to class labels
                              # using the 0.5 threshold
                              factor(as.numeric(predict(m, x, type = "response") > 0.5),
                                     levels = c("0", "1"))
                        })
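
As a quick, illustrative sanity check: DALEX's predict method for explainers calls this predict_function, so the explainer should now return class labels rather than probabilities.

# the explainer now returns a factor of class labels, not probabilities
head(predict(explainer_glm, titanic_imputed))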

Again, let’s use the loss_yardstick adapter. In the example below we use the accuracy function.

glm_accuracy <- model_parts(explainer_glm, type = "raw",
                    loss_function = loss_yardstick(yardstick::accuracy))
glm_accuracy
#>       variable mean_dropout_loss label
#> 1 _full_model_            0.7995    lm
#> 2       gender            0.6425    lm
#> 3        class            0.7338    lm
#> 4          age            0.7826    lm
#> 5        sibsp            0.7932    lm
#> 6     embarked            0.7974    lm
#> 7        parch            0.7989    lm
#> 8         fare            0.8000    lm
#> 9   _baseline_            0.5867    lm
plot(glm_accuracy)

In a similar way, we can use the bal_accuracy function (balanced accuracy).

glm_bal_accuracy <- model_parts(explainer_glm, type = "raw",
                    loss_function = loss_yardstick(yardstick::bal_accuracy))
glm_bal_accuracy
#>       variable mean_dropout_loss label
#> 1 _full_model_         0.7428542    lm
#> 2       gender         0.5688662    lm
#> 3        class         0.6798010    lm
#> 4          age         0.7276367    lm
#> 5        sibsp         0.7376837    lm
#> 6     embarked         0.7391225    lm
#> 7         fare         0.7420058    lm
#> 8        parch         0.7427821    lm
#> 9   _baseline_         0.5035898    lm
plot(glm_bal_accuracy)

The lower the better?

For a loss function, the smaller the value, the better the model. Therefore, the importance of a variable is often calculated as loss(perturbed) - loss(original).

But many model performance functions have the opposite characteristic: the higher they are, the better (e.g. AUC, accuracy). To maintain a consistent analysis pipeline, it is convenient to invert such functions, e.g. by converting them to 1 - AUC or 1 - accuracy.

To do this, just add the reverse = TRUE argument.

glm_1accuracy <- model_parts(explainer_glm, 
                    loss_function = loss_yardstick(accuracy, reverse = TRUE))
glm_1accuracy
#>       variable mean_dropout_loss label
#> 1 _full_model_            0.1965    lm
#> 2         fare            0.1954    lm
#> 3        parch            0.1969    lm
#> 4     embarked            0.1979    lm
#> 5        sibsp            0.2024    lm
#> 6          age            0.2143    lm
#> 7        class            0.2518    lm
#> 8       gender            0.3579    lm
#> 9   _baseline_            0.4081    lm
plot(glm_1accuracy)
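
If you prefer to read off the difference loss(perturbed) - loss(original) directly, model_parts also accepts type = "difference", which reports drop_loss - drop_loss_full_model. A short sketch combining it with the reversed accuracy:

# type = "difference" reports loss(perturbed) - loss(full model),
# so the full model sits at 0 and important variables score higher
glm_diff <- model_parts(explainer_glm, type = "difference",
                    loss_function = loss_yardstick(accuracy, reverse = TRUE))
plot(glm_diff)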

Calculate performance on the whole dataset

By default the performance is calculated on N = 1000 randomly selected observations (to speed up the calculations). Set N = NULL to use the whole dataset.

glm_1accuracy <- model_parts(explainer_glm, 
                    loss_function = loss_yardstick(accuracy, reverse = TRUE),
                    N = NULL)
plot(glm_1accuracy)
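
Independently of N, the number of permutation rounds per variable is controlled by the B argument of model_parts (10 by default); increasing it gives more stable importance estimates at the cost of longer computations. A sketch:

# larger B = more permutation rounds per variable = more stable estimates
glm_1accuracy_stable <- model_parts(explainer_glm,
                    loss_function = loss_yardstick(accuracy, reverse = TRUE),
                    N = NULL, B = 25)
plot(glm_1accuracy_stable)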

Prepare a regression model

The following code trains a regression model.

library("ranger")
apartments_ranger <- ranger(m2.price ~ ., data = apartments, num.trees = 50)

Regression Metrics

The Regression Metrics in the yardstick package assume that the true value is a numeric variable and the model returns a numeric score.

explainer_ranger  <- DALEX::explain(apartments_ranger, data = apartments[,-1],
                             y = apartments$m2.price, label = "Ranger Apartments")
#> Preparation of a new explainer is initiated
#>   -> model label       :  Ranger Apartments 
#>   -> data              :  1000  rows  5  cols 
#>   -> target variable   :  1000  values 
#>   -> predict function  :  yhat.ranger  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package ranger , ver. 0.14.1 , task regression (  default  ) 
#>   -> predicted values  :  numerical, min =  1880.33 , mean =  3489.547 , max =  6152.823  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -504.575 , mean =  -2.527879 , max =  631.0517  
#>   A new explainer has been created!

Again, we use the loss_yardstick adapter, this time with the rmse function (root mean squared error).

ranger_rmse <- model_parts(explainer_ranger, type = "raw",
                      loss_function = loss_yardstick(rmse))
ranger_rmse
#>            variable mean_dropout_loss             label
#> 1      _full_model_          155.9598 Ranger Apartments
#> 2          no.rooms          329.3016 Ranger Apartments
#> 3 construction.year          386.5747 Ranger Apartments
#> 4             floor          447.6254 Ranger Apartments
#> 5           surface          526.6621 Ranger Apartments
#> 6          district          759.7785 Ranger Apartments
#> 7        _baseline_         1201.4958 Ranger Apartments
plot(ranger_rmse)

And one more example, with the rsq function (R squared).

ranger_rsq <- model_parts(explainer_ranger, type = "raw",
                      loss_function = loss_yardstick(rsq))
ranger_rsq
#>            variable mean_dropout_loss             label
#> 1      _full_model_      0.9822143560 Ranger Apartments
#> 2          district      0.3283019847 Ranger Apartments
#> 3           surface      0.6832458690 Ranger Apartments
#> 4             floor      0.7655788254 Ranger Apartments
#> 5 construction.year      0.8268200544 Ranger Apartments
#> 6          no.rooms      0.9125441548 Ranger Apartments
#> 7        _baseline_      0.0007561657 Ranger Apartments
plot(ranger_rsq)
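
Note that rsq is a higher-is-better measure, so in the raw results above the most important variables have the lowest values. As with accuracy earlier, reverse = TRUE turns it into a proper loss; a short sketch:

# reverse = TRUE converts R squared into the loss 1 - R squared,
# so more important variables now get higher values
ranger_1rsq <- model_parts(explainer_ranger, type = "raw",
                      loss_function = loss_yardstick(rsq, reverse = TRUE))
plot(ranger_1rsq)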

Summary

I hope that using the yardstick package with DALEX will now be easy and enjoyable. If you would like to share your experience with this package, please create an issue at https://github.com/ModelOriented/DALEX/issues.

Session info

#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ranger_0.14.1   yardstick_1.1.0 DALEX_2.5.0    
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.2.0  xfun_0.36         bslib_0.4.2       purrr_1.0.1      
#>  [5] lattice_0.20-45   colorspace_2.1-0  vctrs_0.5.2       generics_0.1.3   
#>  [9] htmltools_0.5.4   yaml_2.3.7        utf8_1.2.2        rlang_1.0.6      
#> [13] pkgdown_2.0.7     jquerylib_0.1.4   pillar_1.8.1      glue_1.6.2       
#> [17] withr_2.5.0       lifecycle_1.0.3   stringr_1.5.0     munsell_0.5.0    
#> [21] gtable_0.3.1      ragg_1.2.5        memoise_2.0.1     evaluate_0.20    
#> [25] labeling_0.4.2    knitr_1.42        fastmap_1.1.0     fansi_1.0.4      
#> [29] highr_0.10        Rcpp_1.0.10       scales_1.2.1      cachem_1.0.6     
#> [33] desc_1.4.2        jsonlite_1.8.4    ingredients_2.3.0 farver_2.1.1     
#> [37] systemfonts_1.0.4 fs_1.6.0          textshaping_0.3.6 ggplot2_3.4.0    
#> [41] digest_0.6.31     stringi_1.7.12    dplyr_1.0.10      grid_4.2.2       
#> [45] rprojroot_2.0.3   cli_3.6.0         tools_4.2.2       magrittr_2.0.3   
#> [49] sass_0.4.5        tibble_3.1.8      crayon_1.5.2      pkgconfig_2.0.3  
#> [53] Matrix_1.5-1      rmarkdown_2.20    R6_2.5.1          compiler_4.2.2