Calculate Covariate Drift for two data frames

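Here covariate drift is defined as the Non-Intersection Distance between two distributions. More formally, $$d(P,Q) = 1 - \sum_i \min(P_i, Q_i),$$ where $P_i$ and $Q_i$ are the fractions of observations falling into the $i$-th bin (or level) of a variable in the old and new data, respectively. The larger the distance, the more the two distributions differ.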
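A minimal sketch of the formula, assuming `p` and `q` are probability vectors over the same bins (each summing to 1). The helper name `non_intersection_distance` is hypothetical and is not part of the package. The distance is 0 for identical distributions and 1 when they share no bins.

```r
# Hypothetical helper: Non-Intersection Distance d(P, Q) = 1 - sum_i min(P_i, Q_i)
# p and q: probabilities over the same bins, each vector summing to 1
non_intersection_distance <- function(p, q) {
  stopifnot(length(p) == length(q))
  1 - sum(pmin(p, q))  # pmin() takes the element-wise minimum of the two vectors
}

# Worked example: min(0.5, 0.2) + min(0.5, 0.8) = 0.7, so d = 1 - 0.7 = 0.3
non_intersection_distance(c(0.5, 0.5), c(0.2, 0.8))
```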

calculate_covariate_drift(data_old, data_new, bins = 20)

Arguments

data_old	data frame with `old` data
data_new	data frame with `new` data
bins	number of intervals of equal size into which continuous variables are discretized

Value

an object of class `covariate_drift` (a data.frame) with the Non-Intersection Distance for each variable
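The following is a rough sketch of how such a per-variable table could be computed. It is not the package's implementation. The function name `covariate_drift_sketch`, the equal-width binning over the combined range of both samples, the one-bin-per-level treatment of non-numeric columns, and the `variable`/`drift` column names are all assumptions made for illustration.

```r
# Hypothetical sketch: one Non-Intersection Distance per variable shared by both data frames
covariate_drift_sketch <- function(data_old, data_new, bins = 20) {
  # Only variables present in both data frames can be compared
  vars <- intersect(colnames(data_old), colnames(data_new))
  drift <- vapply(vars, function(v) {
    x_old <- data_old[[v]]
    x_new <- data_new[[v]]
    if (is.numeric(x_old) && is.numeric(x_new)) {
      # Assumption: `bins` equal-width intervals over the combined range
      rng <- range(c(x_old, x_new), na.rm = TRUE)
      if (rng[1] == rng[2]) return(0)  # a single value: the distributions coincide
      breaks <- seq(rng[1], rng[2], length.out = bins + 1)
      x_old <- cut(x_old, breaks = breaks, include.lowest = TRUE)
      x_new <- cut(x_new, breaks = breaks, include.lowest = TRUE)
    } else {
      # Assumption: non-numeric variables use one bin per level seen in either sample
      lev <- unique(c(as.character(x_old), as.character(x_new)))
      x_old <- factor(as.character(x_old), levels = lev)
      x_new <- factor(as.character(x_new), levels = lev)
    }
    # Bin fractions; table() drops NAs by default
    p <- as.vector(prop.table(table(x_old)))
    q <- as.vector(prop.table(table(x_new)))
    1 - sum(pmin(p, q))
  }, numeric(1))
  data.frame(variable = vars, drift = unname(drift), stringsAsFactors = FALSE)
}

old <- data.frame(x = 1:10, g = rep(c("a", "b"), 5))
new <- data.frame(x = 11:20, g = rep(c("a", "b"), 5))
covariate_drift_sketch(old, new, bins = 2)
```

Here `x` falls into disjoint bins in the two samples (distance 1), while `g` has identical level frequencies (distance 0). Because the binning details are assumed, the numbers from this sketch should not be expected to reproduce the `Shift` column in the examples below exactly.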

Examples

library("DALEX")
#> Welcome to DALEX (version: 0.4.3).
#> Find examples and detailed introduction at: https://pbiecek.github.io/PM_VEE/
# here we do not have any drift
d <- calculate_covariate_drift(apartments, apartments_test)
d
#>                   Variable  Shift
#>   -------------------------------------
#>                   m2.price    4.9  
#>          construction.year    6.0  
#>                    surface    6.8  
#>                      floor    4.9  
#>                   no.rooms    2.8  
#>                   district    2.8  
# here we do have drift
d <- calculate_covariate_drift(dragons, dragons_test)
d
#>                   Variable  Shift
#>   -------------------------------------
#>              year_of_birth    8.9  
#>                     height   15.3  .
#>                     weight   14.7  .
#>                      scars    4.6  
#>                     colour   17.9  .
#>          year_of_discovery   97.5  ***
#>       number_of_lost_teeth    6.3  
#>                life_length    8.6
