`vignettes/vignette_yardstick.Rmd`

`yardstick` is a package that offers many measures for evaluating model performance. Following the `tidymodels`/`tidyverse` philosophy, performance is calculated by functions that work on a data frame with the model's results.

`DALEX` uses model performance measures to assess the importance of variables (in the `model_parts` function). These are typically calculated with loss functions (functions with the `loss_` prefix) that work on two vectors: the score from the model and the true target variable.

Although these packages have slightly different philosophies of operation, you can use the measures available in `yardstick` when working with `DALEX`. Below we show how to use the `loss_yardstick` adapter function to do this.
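To see why an adapter is needed at all, note that a `yardstick` metric operates on a data frame with truth and estimate columns, while a `DALEX` loss function operates on two vectors. A minimal sketch of such a bridge (for illustration only; the real `loss_yardstick` is more general) might look like this:

```
# Sketch of the idea only -- not the actual DALEX::loss_yardstick implementation.
library("yardstick")

as_loss <- function(metric) {
  # Return a DALEX-style loss: a function of two vectors
  function(observed, predicted) {
    df <- data.frame(truth = observed, estimate = predicted)
    # yardstick metrics return a tibble; pull out the numeric score
    metric(df, truth = truth, estimate = estimate)$.estimate
  }
}

loss_rmse <- as_loss(yardstick::rmse)
loss_rmse(c(1, 2, 3), c(1.1, 2.1, 2.9))  # approximately 0.1
```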

The `yardstick` package supports both classification models and regression models. We will start our example with a classification model for the titanic data: the probability of surviving the disaster.

The following instruction trains a classification model.

```
library("DALEX")
library("yardstick")

titanic_glm <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
```

The Class Probability Metrics in the `yardstick` package assume that the true value is a `factor` and that the model returns a numerical score. So let's prepare an `explainer` that has a `factor` as `y` and whose `predict_function` returns the probability of the target class (the default behaviour).

**NOTE**: Performance measures will be calculated on the data supplied in the explainer, so put the test data here!

```
explainer_glm <- DALEX::explain(titanic_glm,
                                data = titanic_imputed[, -8],
                                y = factor(titanic_imputed$survived))
```

To make functions from `yardstick` compatible with `DALEX`, we must use the `loss_yardstick` adapter. In the example below we use the `roc_auc` function (area under the receiver operating characteristic curve). The `yardstick::` prefix is not necessary, but we put it here to show explicitly where the functions we use are located.

**NOTE**: we set `yardstick.event_first = FALSE` because the model predicts the probability of `survived = 1`. In recent versions of `yardstick` this option is deprecated in favour of the `event_level` argument of the metric functions.

```
options(yardstick.event_first = FALSE)

glm_auc <- model_parts(explainer_glm, type = "raw",
                       loss_function = loss_yardstick(yardstick::roc_auc))
glm_auc
```

```
#>       variable mean_dropout_loss label
#> 1 _full_model_         0.8089452    lm
#> 2       gender         0.6238639    lm
#> 3        class         0.7099712    lm
#> 4          age         0.7813811    lm
#> 5        sibsp         0.7995240    lm
#> 6     embarked         0.8027081    lm
#> 7        parch         0.8089237    lm
#> 8         fare         0.8095338    lm
#> 9   _baseline_         0.4990260    lm
```

```
plot(glm_auc)
```

In a similar way, we can use the `pr_auc` function (area under the precision-recall curve).

```
glm_prauc <- model_parts(explainer_glm, type = "raw",
                         loss_function = loss_yardstick(yardstick::pr_auc))
glm_prauc
```

```
#>       variable mean_dropout_loss label
#> 1 _full_model_         0.7426051    lm
#> 2       gender         0.4370993    lm
#> 3        class         0.5874661    lm
#> 4          age         0.7246998    lm
#> 5        sibsp         0.7264267    lm
#> 6     embarked         0.7322981    lm
#> 7         fare         0.7417359    lm
#> 8        parch         0.7422524    lm
#> 9   _baseline_         0.3259686    lm
```

```
plot(glm_prauc)
```

The Classification Metrics in the `yardstick` package assume that the true value is a `factor` and that the model also returns a `factor` variable.

This is different behaviour than for most explanations in `DALEX`, because when explaining predictions we typically operate on class membership probabilities. If we want to use Classification Metrics, we need to provide a predict function that returns classes instead of probabilities.

So let's prepare an `explainer` that has a `factor` as `y` and whose `predict_function` returns classes.

```
explainer_glm <- DALEX::explain(titanic_glm,
    data = titanic_imputed[, -8],
    y = factor(titanic_imputed$survived),
    predict_function = function(m, x) {
      factor(as.numeric(predict(m, x, type = "response") > 0.5),
             levels = c("0", "1"))
    })
```

Again, let's use the `loss_yardstick` adapter. In the example below we use the `accuracy` function.

```
glm_accuracy <- model_parts(explainer_glm, type = "raw",
                            loss_function = loss_yardstick(yardstick::accuracy))
glm_accuracy
```

```
#>       variable mean_dropout_loss label
#> 1 _full_model_            0.7978    lm
#> 2       gender            0.6369    lm
#> 3        class            0.7454    lm
#> 4          age            0.7819    lm
#> 5        sibsp            0.7914    lm
#> 6     embarked            0.7957    lm
#> 7        parch            0.7971    lm
#> 8         fare            0.7980    lm
#> 9   _baseline_            0.5894    lm
```

```
plot(glm_accuracy)
```

In a similar way, we can use the `bal_accuracy` function (balanced accuracy).

```
glm_bal_accuracy <- model_parts(explainer_glm, type = "raw",
                                loss_function = loss_yardstick(yardstick::bal_accuracy))
glm_bal_accuracy
```

```
#>       variable mean_dropout_loss label
#> 1 _full_model_         0.7419681    lm
#> 2       gender         0.5716828    lm
#> 3        class         0.6715899    lm
#> 4          age         0.7263103    lm
#> 5     embarked         0.7361306    lm
#> 6        sibsp         0.7389633    lm
#> 7         fare         0.7413686    lm
#> 8        parch         0.7419721    lm
#> 9   _baseline_         0.4997630    lm
```

```
plot(glm_bal_accuracy)
```

For a loss function, the smaller the value, the better the model. Therefore, the importance of variables is often calculated as `loss(perturbed) - loss(original)`.

But many model performance measures have the opposite characteristic: the higher they are, the better (e.g. `AUC`, `accuracy`, etc.). To maintain a consistent analysis pipeline it is convenient to invert such functions, e.g. by converting them to `1 - AUC` or `1 - accuracy`.

To do this, just add the `reverse = TRUE` argument.

```
glm_1accuracy <- model_parts(explainer_glm,
                             loss_function = loss_yardstick(accuracy, reverse = TRUE))
glm_1accuracy
```

```
#>       variable mean_dropout_loss label
#> 1 _full_model_            0.2043    lm
#> 2         fare            0.2042    lm
#> 3        parch            0.2046    lm
#> 4     embarked            0.2083    lm
#> 5        sibsp            0.2106    lm
#> 6          age            0.2194    lm
#> 7        class            0.2614    lm
#> 8       gender            0.3694    lm
#> 9   _baseline_            0.4071    lm
```

```
plot(glm_1accuracy)
```

By default, the performance is calculated on `N = 1000` randomly selected observations (to speed up the calculations). Set `N = NULL` to use the whole dataset.

```
glm_1accuracy <- model_parts(explainer_glm,
                             loss_function = loss_yardstick(accuracy, reverse = TRUE),
                             N = NULL)
plot(glm_1accuracy)
```

For the regression example, we use a random forest model trained with the `ranger` package on the apartments data.
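A minimal version of such a training step, assuming the `ranger` package and the `apartments` data shipped with `DALEX`, might look like this:

```
library("DALEX")
library("ranger")

# train a random forest regression model for the price per square meter;
# the seed is chosen here only to make this sketch reproducible
set.seed(1313)
apartments_ranger <- ranger(m2.price ~ ., data = apartments)
```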

The Regression Metrics in the `yardstick` package assume that the true value is a `numeric` variable and that the model returns a `numeric` score.

```
explainer_ranger <- DALEX::explain(apartments_ranger,
                                   data = apartments[, -1],
                                   y = apartments$m2.price,
                                   label = "Ranger Apartments")
```

```
#> Preparation of a new explainer is initiated
#> -> model label : Ranger Apartments
#> -> data : 1000 rows 5 cols
#> -> target variable : 1000 values
#> -> predict function : yhat.ranger will be used ( default )
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package ranger , ver. 0.14.1 , task regression ( default )
#> -> predicted values : numerical, min = 1907.767 , mean = 3488.795 , max = 6181.198
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -643.31 , mean = -1.775808 , max = 708.0153
#> A new explainer has been created!
```

To make functions from `yardstick` compatible with `DALEX`, we again use the `loss_yardstick` adapter. In the example below we use the `rmse` function (root mean squared error).

```
ranger_rmse <- model_parts(explainer_ranger, type = "raw",
                           loss_function = loss_yardstick(rmse))
ranger_rmse
```

```
#>            variable mean_dropout_loss             label
#> 1      _full_model_          166.5718 Ranger Apartments
#> 2          no.rooms          320.9772 Ranger Apartments
#> 3 construction.year          394.2647 Ranger Apartments
#> 4             floor          442.0134 Ranger Apartments
#> 5           surface          541.2952 Ranger Apartments
#> 6          district          739.4888 Ranger Apartments
#> 7        _baseline_         1205.0356 Ranger Apartments
```

```
plot(ranger_rmse)
```

And one more example, for the `rsq` function (R squared).

```
ranger_rsq <- model_parts(explainer_ranger, type = "raw",
                          loss_function = loss_yardstick(rsq))
ranger_rsq
```
```

```
#>            variable mean_dropout_loss             label
#> 1      _full_model_       0.979198764 Ranger Apartments
#> 2          district       0.352448580 Ranger Apartments
#> 3           surface       0.650201959 Ranger Apartments
#> 4             floor       0.764333307 Ranger Apartments
#> 5 construction.year       0.821802873 Ranger Apartments
#> 6          no.rooms       0.922314139 Ranger Apartments
#> 7        _baseline_       0.000930641 Ranger Apartments
```

```
plot(ranger_rsq)
```

I hope that using the `yardstick` package with `DALEX` will now be easy and enjoyable. If you would like to share your experience with this package, please create an issue at https://github.com/ModelOriented/DALEX/issues.

```
#> R version 4.2.3 (2023-03-15)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ranger_0.14.1 yardstick_1.1.0 DALEX_2.5.1
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.0 xfun_0.37 bslib_0.4.2 purrr_1.0.1
#> [5] lattice_0.20-45 colorspace_2.1-0 vctrs_0.6.1 generics_0.1.3
#> [9] htmltools_0.5.4 yaml_2.3.7 utf8_1.2.3 rlang_1.1.0
#> [13] pkgdown_2.0.7 jquerylib_0.1.4 pillar_1.9.0 glue_1.6.2
#> [17] withr_2.5.0 lifecycle_1.0.3 stringr_1.5.0 munsell_0.5.0
#> [21] gtable_0.3.3 ragg_1.2.5 memoise_2.0.1 evaluate_0.20
#> [25] labeling_0.4.2 knitr_1.42 fastmap_1.1.1 fansi_1.0.4
#> [29] highr_0.10 Rcpp_1.0.10 scales_1.2.1 cachem_1.0.7
#> [33] desc_1.4.2 jsonlite_1.8.4 ingredients_2.3.0 farver_2.1.1
#> [37] systemfonts_1.0.4 fs_1.6.1 textshaping_0.3.6 ggplot2_3.4.1
#> [41] digest_0.6.31 stringi_1.7.12 dplyr_1.1.1 grid_4.2.3
#> [45] rprojroot_2.0.3 cli_3.6.0 tools_4.2.3 magrittr_2.0.3
#> [49] sass_0.4.5 tibble_3.2.1 crayon_1.5.2 pkgconfig_2.0.3
#> [53] Matrix_1.5-3 rmarkdown_2.20 R6_2.5.1 compiler_4.2.3
```