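Currently three checks are implemented: covariate drift (a shift in the distribution of each variable between `data_old` and `data_new`), residual drift (a shift in the distribution of model residuals) and model drift (a difference between the partial dependence profiles of `model_old` and `model_new`).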

check_drift(model_old, model_new, data_old, data_new, y_old, y_new,
  predict_function = predict, max_obs = 100, bins = 20,
  scale = sd(y_new, na.rm = TRUE))

Arguments

model_old

model trained on historical / `old` data

model_new

model trained on current / `new` data

data_old

data frame with historical / `old` data

data_new

data frame with current / `new` data

y_old

true values of the target variable for historical / `old` data

y_new

true values of the target variable for current / `new` data

predict_function

function that takes two arguments, a model and new data, and returns a numeric vector of predictions; by default `predict`
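A custom `predict_function` is useful when the default `predict` does not return the numbers you want to compare. The sketch below assumes a logistic regression fitted with base R `glm`, where `type = "response"` makes `predict` return probabilities instead of values on the log-odds scale; the wrapper name `predict_proba` and the `mtcars` model are illustrative only.

```r
# Illustrative model: probability of a manual transmission (am) from weight and horsepower.
model <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# Hypothetical wrapper with the (model, new data) signature described above.
# type = "response" returns probabilities rather than log-odds.
predict_proba <- function(m, x, ...) {
  as.numeric(predict(m, newdata = x, type = "response", ...))
}

p <- predict_proba(model, mtcars[1:5, ])
print(p)
```

Such a wrapper would be passed as `predict_function = predict_proba` in the call to `check_drift`.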

max_obs

if negative, all observations are used to calculate the partial dependence profiles (PDP); if positive, only `max_obs` observations are used
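The `max_obs` rule can be sketched with a small helper. `limit_rows` is a hypothetical name, not part of the package; the sketch assumes that a positive `max_obs` means a random sample of rows drawn without replacement, and it treats `max_obs = 0` like a negative value, because the description above leaves that case open.

```r
# Hypothetical helper illustrating the max_obs rule; not part of the package API.
limit_rows <- function(data, max_obs = 100) {
  # Non-positive max_obs (0 included by assumption), or too few rows: keep everything.
  if (max_obs <= 0 || nrow(data) <= max_obs) {
    return(data)
  }
  # Positive max_obs: keep a random subset of max_obs rows, drawn without replacement.
  data[sample(nrow(data), max_obs), , drop = FALSE]
}

set.seed(1)
nrow(limit_rows(mtcars, max_obs = 10))  # 10
nrow(limit_rows(mtcars, max_obs = -1))  # 32, all rows of mtcars
```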

bins

continuous variables are discretized into `bins` intervals of equal size
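One way to turn binned variables into a single shift value is sketched below. It rests on several assumptions: "equal size" is read as equal-width intervals over the pooled range of old and new values (equal-frequency, quantile-based bins would be an equally valid reading), categorical variables are compared level by level, and the shift is the non-intersection distance between the two histograms in percent (0 for identical distributions, 100 for disjoint ones). `variable_shift` is a hypothetical name, and the package may compute its Shift column differently.

```r
# Hypothetical helper: shift (in %) between old and new values of one variable.
variable_shift <- function(x_old, x_new, bins = 20) {
  x_old <- x_old[!is.na(x_old)]
  x_new <- x_new[!is.na(x_new)]
  if (is.numeric(x_old) && is.numeric(x_new)) {
    rng <- range(c(x_old, x_new))
    if (rng[1] == rng[2]) return(0)  # both samples are the same constant
    # Continuous variable: `bins` equal-width intervals over the pooled range.
    breaks <- seq(rng[1], rng[2], length.out = bins + 1)
    f_old <- cut(x_old, breaks = breaks, include.lowest = TRUE)
    f_new <- cut(x_new, breaks = breaks, include.lowest = TRUE)
  } else {
    # Categorical variable: one bin per level observed in either sample.
    lev <- union(as.character(x_old), as.character(x_new))
    f_old <- factor(as.character(x_old), levels = lev)
    f_new <- factor(as.character(x_new), levels = lev)
  }
  # Share of observations per bin; both vectors use the same bin order.
  p <- as.numeric(table(f_old)) / length(f_old)
  q <- as.numeric(table(f_new)) / length(f_new)
  # Non-intersection distance: 1 minus the overlap of the two histograms, in percent.
  100 * (1 - sum(pmin(p, q)))
}

variable_shift(mtcars$mpg, mtcars$mpg)                 # 0, up to floating-point rounding
variable_shift(1:10, 91:100)                          # 100, the samples do not overlap
variable_shift(c("a", "a", "b"), c("a", "b", "b"))    # about 33.3
```

The same function can be applied to two vectors of residuals (observed minus predicted values on the old and on the new data) to obtain a residual-drift figure of the same kind.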

scale

scale parameter used to calculate the scaled drift; by default the standard deviation of `y_new`
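The `max_obs` description indicates that model drift is based on partial dependence profiles (PDP), and `scale` turns a raw difference into a relative one. The sketch below makes these assumptions explicit: the PDP of a variable is the mean prediction over the new data with that variable fixed at each grid value, the raw drift is the mean absolute difference between the old and new profiles, and the scaled drift is `100 * drift / scale`. The last formula is an assumption, although the Shift and Scaled columns printed in the examples below are consistent with it for a `scale` of roughly 900. `pdp_profile`, `model_drift` and `grid_size` are hypothetical names.

```r
# Hypothetical helper: partial dependence profile of `variable` for one model.
# For each grid value, set the variable to that value in every row and average the predictions.
pdp_profile <- function(model, data, variable, grid, predict_function = predict) {
  vapply(grid, function(value) {
    data[[variable]] <- value  # modifies a local copy of `data` only
    mean(predict_function(model, data))
  }, numeric(1))
}

# Hypothetical helper: raw and scaled model drift for one variable.
model_drift <- function(model_old, model_new, data, variable, scale,
                        grid_size = 20, predict_function = predict) {
  grid <- seq(min(data[[variable]]), max(data[[variable]]), length.out = grid_size)
  pdp_old <- pdp_profile(model_old, data, variable, grid, predict_function)
  pdp_new <- pdp_profile(model_new, data, variable, grid, predict_function)
  drift <- mean(abs(pdp_old - pdp_new))
  c(drift = drift, scaled = 100 * drift / scale)
}

# Illustrative split: the first half of mtcars is "old", the second half is "new".
old <- mtcars[1:16, ]
new <- mtcars[17:32, ]
m_old <- lm(mpg ~ wt + hp, data = old)
m_new <- lm(mpg ~ wt + hp, data = new)
model_drift(m_old, m_new, new, "wt", scale = sd(new$mpg))
```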

Value

This function is called for its side effects: the results of all checks are printed to the console. Additionally, it returns a list with the results of the individual checks.

Examples

library("DALEX")
model_old <- lm(m2.price ~ ., data = apartments)
model_new <- lm(m2.price ~ ., data = apartments_test[1:1000, ])
check_drift(model_old, model_new,
            apartments, apartments_test,
            apartments$m2.price, apartments_test$m2.price)
#> -------------------------------------
#> Variable               Shift
#> -------------------------------------
#> m2.price                 4.9
#> construction.year        6.0
#> surface                  6.8
#> floor                    4.9
#> no.rooms                 2.8
#> district                 2.8
#> -------------------------------------
#> Variable               Shift
#> -------------------------------------
#> Residuals                8.3
#> -----------------------------------------------
#> Variable               Shift   Scaled
#> -----------------------------------------------
#> floor                  22.10      2.5
#> no.rooms               27.44      3.0
#> surface                30.12      3.3
#> m2.price               26.41      2.9
#> construction.year      29.49      3.3
library("ranger")
# predict() for ranger models returns an object; the predictions are stored in $predictions
predict_function <- function(m, x, ...) predict(m, x, ...)$predictions
model_old <- ranger(m2.price ~ ., data = apartments)
model_new <- ranger(m2.price ~ ., data = apartments_test)
check_drift(model_old, model_new,
            apartments, apartments_test,
            apartments$m2.price, apartments_test$m2.price,
            predict_function = predict_function)
#> -------------------------------------
#> Variable               Shift
#> -------------------------------------
#> m2.price                 4.9
#> construction.year        6.0
#> surface                  6.8
#> floor                    4.9
#> no.rooms                 2.8
#> district                 2.8
#> -------------------------------------
#> Variable               Shift
#> -------------------------------------
#> Residuals               34.1  **
#> -----------------------------------------------
#> Variable               Shift   Scaled
#> -----------------------------------------------
#> floor                  83.14      9.2
#> no.rooms              160.79     17.9  .
#> surface               164.33     18.2  .
#> m2.price              166.15     18.5  .
#> construction.year     201.95     22.4  *