Currently three checks are implemented: covariate drift, residual drift and model drift.
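Covariate drift and residual drift both boil down to comparing two distributions. The sketch below shows one way to do this in base R. It assumes that the shift is the non-intersection distance between binned histograms, reported in percent (0 means identical, 100 means disjoint). It reads "intervals of equal sizes" as equal-width bins, and it assumes that residual drift compares the old model's residuals on the old and new data. The helpers `non_intersection_distance()`, `covariate_drift()` and `residual_drift()` are hypothetical illustrations, not the package's API, and the package's own computation may differ.

```r
# Hypothetical helper (not drifter's API): percentage of probability mass
# that does not overlap between the binned distributions of two samples.
non_intersection_distance <- function(x_old, x_new, bins = 20) {
  if (is.numeric(x_old) && is.numeric(x_new)) {
    # Equal-width intervals over the pooled range of both samples
    pooled <- cut(c(x_old, x_new), breaks = bins)
  } else {
    # Categorical variables are compared level by level
    pooled <- factor(c(as.character(x_old), as.character(x_new)))
  }
  group <- rep(c("old", "new"), c(length(x_old), length(x_new)))
  counts <- table(pooled, group)
  p_old <- counts[, "old"] / sum(counts[, "old"])
  p_new <- counts[, "new"] / sum(counts[, "new"])
  # The overlap is sum(min(p_old, p_new)); whatever is left is the shift
  100 * (1 - sum(pmin(p_old, p_new)))
}

# Hypothetical covariate check: one shift value per column of the data
covariate_drift <- function(data_old, data_new, bins = 20) {
  sapply(names(data_old), function(v)
    non_intersection_distance(data_old[[v]], data_new[[v]], bins = bins))
}

# Hypothetical residual check (assumption): the same distance applied to
# the residuals of the old model on the old data and on the new data
residual_drift <- function(model_old, data_old, data_new, y_old, y_new,
                           predict_function = predict, bins = 20) {
  res_old <- y_old - predict_function(model_old, data_old)
  res_new <- y_new - predict_function(model_old, data_new)
  non_intersection_distance(res_old, res_new, bins = bins)
}

# Example with simulated data: x is shifted, g changes its proportions
set.seed(1)
old <- data.frame(x = rnorm(500),
                  g = sample(c("a", "b"), 500, replace = TRUE))
new <- data.frame(x = rnorm(500, mean = 1),
                  g = sample(c("a", "b"), 500, replace = TRUE,
                             prob = c(0.2, 0.8)))
covariate_drift(old, new)
```

Binning the pooled sample, rather than each sample separately, guarantees that both histograms share the same intervals. This makes the bin-by-bin minimum meaningful.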
```r
check_drift(model_old, model_new, data_old, data_new, y_old, y_new,
            predict_function = predict, max_obs = 100, bins = 20,
            scale = sd(y_new, na.rm = TRUE))
```
Argument | Description |
---|---|
model_old | model created on historical / `old` data |
model_new | model created on current / `new` data |
data_old | data frame with historical / `old` data |
data_new | data frame with current / `new` data |
y_old | true values of the target variable for historical / `old` data |
y_new | true values of the target variable for current / `new` data |
predict_function | function that takes two arguments (a model and new data) and returns a numeric vector of predictions; by default `predict` |
max_obs | if negative, all observations are used to calculate the partial dependence profiles (PDP); if positive, only `max_obs` observations are used |
bins | continuous variables are discretized into `bins` intervals of equal size |
scale | scale parameter used to calculate the scaled drift; by default `sd(y_new, na.rm = TRUE)` |
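The model-drift check compares partial dependence profiles (PDP) of the old and new models, which is where `predict_function`, `max_obs` and `scale` come into play. Below is a minimal base-R sketch under several stated assumptions:

- Profiles are computed on `data_new` over an equal-width grid of hypothetical size `grid_size`.
- The shift is the mean absolute difference between the two profiles.
- The scaled value is `100 * shift / scale`.

The helpers `pdp_profile()` and `model_drift()`, and all of these choices, are illustrative assumptions rather than the package's implementation.

```r
# Hypothetical helper (not drifter's API): partial dependence profile of
# `variable`, i.e. the mean prediction when that variable is fixed at each
# grid value for every row of `data`.
pdp_profile <- function(model, data, variable, grid,
                        predict_function = predict) {
  sapply(grid, function(value) {
    data[[variable]] <- value            # overwrite the column locally
    mean(predict_function(model, data))  # average prediction at this value
  })
}

# Hypothetical model-drift check for numeric variables. Assumptions:
# profiles use data_new, shift = mean absolute difference between profiles,
# scaled = 100 * shift / scale.
model_drift <- function(model_old, model_new, data_new, variables,
                        predict_function = predict, max_obs = 100,
                        scale = 1, grid_size = 20) {
  # Documented max_obs rule: negative means all rows, positive means a sample
  rows <- seq_len(nrow(data_new))
  if (max_obs > 0 && nrow(data_new) > max_obs) {
    rows <- sample(rows, max_obs)
  }
  sub <- data_new[rows, , drop = FALSE]
  shifts <- sapply(variables, function(v) {
    x <- sub[[v]]
    grid <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE),
                length.out = grid_size)
    # Same rows and grid for both models, so only the model differs
    p_old <- pdp_profile(model_old, sub, v, grid, predict_function)
    p_new <- pdp_profile(model_new, sub, v, grid, predict_function)
    mean(abs(p_old - p_new))
  })
  data.frame(variable = variables, shift = unname(shifts),
             scaled = 100 * unname(shifts) / scale)
}

# Example: two linear models fitted on data with different slopes for `a`
set.seed(42)
d_old <- data.frame(a = runif(200), b = runif(200))
d_old$y <- 2 * d_old$a + d_old$b + rnorm(200, sd = 0.1)
d_new <- data.frame(a = runif(200), b = runif(200))
d_new$y <- 3 * d_new$a + d_new$b + rnorm(200, sd = 0.1)
m_old <- lm(y ~ a + b, data = d_old)
m_new <- lm(y ~ a + b, data = d_new)
model_drift(m_old, m_new, d_new, c("a", "b"), scale = sd(d_new$y))
```

Both profiles use the same subsample of rows. As a result, a large shift reflects a change in the model rather than a change in which observations were drawn.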
This function is executed for its side effects: all checks are printed to the console. Additionally, it returns a list with the particular checks.
```r
library("drifter")  # provides check_drift()
library("DALEX")    # provides the apartments and apartments_test data

# Linear models: one fitted on the old data, one on a subset of the new data
model_old <- lm(m2.price ~ ., data = apartments)
model_new <- lm(m2.price ~ ., data = apartments_test[1:1000, ])
check_drift(model_old, model_new,
            apartments, apartments_test,
            apartments$m2.price, apartments_test$m2.price)
#> -------------------------------------
#>  Variable               Shift
#> -------------------------------------
#>  m2.price                 4.9
#>  construction.year        6.0
#>  surface                  6.8
#>  floor                    4.9
#>  no.rooms                 2.8
#>  district                 2.8
#> -------------------------------------
#>  Variable               Shift
#> -------------------------------------
#>  Residuals                8.3
#> -----------------------------------------------
#>  Variable               Shift   Scaled
#> -----------------------------------------------
#>  floor                  22.10      2.5
#>  no.rooms               27.44      3.0
#>  surface                30.12      3.3
#>  m2.price               26.41      2.9
#>  construction.year      29.49      3.3

library("ranger")
# ranger's predict() returns an object, so extract the numeric predictions
predict_function <- function(m, x, ...) predict(m, x, ...)$predictions
model_old <- ranger(m2.price ~ ., data = apartments)
model_new <- ranger(m2.price ~ ., data = apartments_test)
check_drift(model_old, model_new,
            apartments, apartments_test,
            apartments$m2.price, apartments_test$m2.price,
            predict_function = predict_function)
#> -------------------------------------
#>  Variable               Shift
#> -------------------------------------
#>  m2.price                 4.9
#>  construction.year        6.0
#>  surface                  6.8
#>  floor                    4.9
#>  no.rooms                 2.8
#>  district                 2.8
#> -------------------------------------
#>  Variable               Shift
#> -------------------------------------
#>  Residuals               34.1  **
#> -----------------------------------------------
#>  Variable               Shift   Scaled
#> -----------------------------------------------
#>  floor                  83.14      9.2
#>  no.rooms              160.79     17.9  .
#>  surface               164.33     18.2  .
#>  m2.price              166.15     18.5  .
#>  construction.year     201.95     22.4  *
```