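Here covariate drift is defined as the Non-Intersection Distance between two distributions. More formally, $$d(P,Q) = 1 - \sum_i \min(P_i, Q_i)$$ where the sum runs over the bins (or factor levels) i, and P_i and Q_i are the shares of old and new observations that fall into bin i. The larger the distance, the more the two distributions differ.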
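As a quick illustration of the formula above, the following minimal sketch computes the distance for two small probability vectors. The helper name `non_intersection_distance` is hypothetical and is not part of the package; it only mirrors the definition.

```r
# Hypothetical helper (not the package API): Non-Intersection Distance
# between two probability vectors defined over the same categories.
non_intersection_distance <- function(p, q) {
  stopifnot(length(p) == length(q))
  1 - sum(pmin(p, q))
}

p <- c(0.5, 0.3, 0.2)
q <- c(0.2, 0.3, 0.5)

d_same     <- non_intersection_distance(p, p)              # identical distributions: 0
d_diff     <- non_intersection_distance(p, q)              # overlap 0.2 + 0.3 + 0.2 = 0.7, distance 0.3
d_disjoint <- non_intersection_distance(c(1, 0), c(0, 1))  # no overlap: 1
```

The distance is 0 when the two distributions coincide and 1 when they share no probability mass.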

calculate_covariate_drift(data_old, data_new, bins = 20)

Arguments

data_old

data frame with `old` data

data_new

data frame with `new` data

bins

continuous variables are discretized into `bins` intervals of equal size
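To make the role of `bins` concrete, here is a rough sketch of how a distance for a single continuous variable could be computed. It assumes equal-width intervals over the pooled range of both samples; the package may choose its breaks differently, and the helper `binned_distance` is hypothetical.

```r
# Hypothetical helper (not the package API).
# Assumption: equal-width bins spanning the pooled range of both samples.
binned_distance <- function(x_old, x_new, bins = 20) {
  breaks <- seq(min(x_old, x_new), max(x_old, x_new), length.out = bins + 1)
  # Share of observations per bin, using the same breaks for both samples
  p <- as.numeric(table(cut(x_old, breaks, include.lowest = TRUE))) / length(x_old)
  q <- as.numeric(table(cut(x_new, breaks, include.lowest = TRUE))) / length(x_new)
  1 - sum(pmin(p, q))
}

set.seed(1)
x_old <- rnorm(1000)
d_same  <- binned_distance(x_old, rnorm(1000))            # same distribution: small distance
d_shift <- binned_distance(x_old, rnorm(1000, mean = 3))  # shifted mean: much larger distance
```

Using common breaks for both samples matters: binning each sample on its own range would make the bin shares incomparable.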

Value

an object of class `covariate_drift` (a data.frame) with Non-Intersection Distances

Examples

library("DALEX")
#> Welcome to DALEX (version: 0.4.3).
#> Find examples and detailed introduction at: https://pbiecek.github.io/PM_VEE/
# here we do not have any drift
d <- calculate_covariate_drift(apartments, apartments_test)
d
#>              Variable  Shift
#>  -------------------------------------
#>              m2.price    4.9
#>     construction.year    6.0
#>               surface    6.8
#>                 floor    4.9
#>              no.rooms    2.8
#>              district    2.8
# here we do have drift
d <- calculate_covariate_drift(dragons, dragons_test)
d
#>              Variable  Shift
#>  -------------------------------------
#>         year_of_birth    8.9
#>                height   15.3  .
#>                weight   14.7  .
#>                 scars    4.6
#>                colour   17.9  .
#>     year_of_discovery   97.5  ***
#>  number_of_lost_teeth    6.3
#>           life_length    8.6