The safely_transform_continuous() function calculates a transformation function for a continuous variable using a PD/ALE plot obtained from a black-box model. Changepoints detected in that plot serve as break points that split the variable into intervals.
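Here is a minimal base-R sketch of the two profile types this function can use (see response_type). It is not the DALEX implementation. pd_curve() and ale_curve() are hypothetical helpers, predict_fn stands for any prediction function, and the ALE curve is left uncentered for simplicity. A PD curve fixes the feature at each grid value for every observation and averages the predictions. An ALE curve averages local prediction differences only over observations whose feature value falls in each bin, which is why it copes better with correlated features.

```r
# Hypothetical base-R sketch, not DALEX's implementation.
# predict_fn: any function mapping a data.frame to numeric predictions.

# Partial dependence: set the feature to each grid value for every row,
# then average the predictions.
pd_curve <- function(predict_fn, data, variable, grid) {
  yhat <- vapply(grid, function(g) {
    modified <- data
    modified[[variable]] <- g
    mean(predict_fn(modified))
  }, numeric(1))
  data.frame(x = grid, yhat = yhat)
}

# Accumulated local effects (uncentered): within each quantile bin, average the
# change in prediction when the feature moves from the lower to the upper bin
# edge, using only rows that fall in that bin; then accumulate the averages.
ale_curve <- function(predict_fn, data, variable, n_bins = 5) {
  x <- data[[variable]]
  edges <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1),
                           names = FALSE))
  bin <- findInterval(x, edges, rightmost.closed = TRUE, all.inside = TRUE)
  effects <- numeric(length(edges) - 1)
  for (k in seq_along(effects)) {
    rows <- data[bin == k, , drop = FALSE]
    if (nrow(rows) == 0) next
    lower <- rows; lower[[variable]] <- edges[k]
    upper <- rows; upper[[variable]] <- edges[k + 1]
    effects[k] <- mean(predict_fn(upper) - predict_fn(lower))
  }
  data.frame(x = edges, ale = c(0, cumsum(effects)))
}

# Usage on a toy additive model
toy <- data.frame(x = 1:10, z = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3))
f <- function(d) 2 * d$x + d$z
pd_curve(f, toy, "x", grid = c(1, 5, 10))
ale_curve(f, toy, "x", n_bins = 2)
```

For this additive toy model both curves recover the slope of 2 in x: the PD values are 2 * grid + mean(z), and the ALE values at the bin edges 1, 5.5 and 10 are 0, 9 and 18.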

safely_transform_continuous(
  explainer,
  variable,
  response_type = "ale",
  grid_points = 50,
  N = 200,
  penalty = "MBIC",
  nquantiles = 10,
  no_segments = 2
)

Arguments

explainer

a DALEX explainer created with the explain() function

variable

a feature for which the transformation function is to be computed

response_type

character, type of response to be calculated, one of "pdp" or "ale". If the features are uncorrelated, "pdp" can be used; otherwise "ale" is strongly recommended.

grid_points

number of points on the x-axis used to create the PD/ALE plot, default 50

N

number of observations from the dataset used to create the PD/ALE plot, default 200

penalty

penalty for introducing another changepoint: one of "AIC", "BIC", "SIC", "MBIC", "Hannan-Quinn", or a non-negative numeric value

nquantiles

the number of quantiles used in the integral approximation

no_segments

numeric, the number of segments the variable is divided into if no breakpoints are found
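The penalty argument controls how many changepoints are accepted in the PD/ALE profile: an extra changepoint is kept only if it lowers the fitting cost by more than the penalty. The following is a hypothetical base-R sketch of that trade-off, not rSAFE's implementation. optimal_partition() uses exact optimal partitioning with a squared-error (change-in-mean) cost and a numeric penalty beta per changepoint. Faster pruned searches, named penalties such as "MBIC", and the quantile-based integral approximation controlled by nquantiles are not reproduced.

```r
# Hypothetical sketch, not rSAFE's implementation: penalised change-in-mean
# detection by optimal partitioning. Minimises
#   (sum of segment costs) + beta * (number of changepoints).
optimal_partition <- function(y, beta) {
  n <- length(y)
  cs  <- c(0, cumsum(y))      # prefix sums
  cs2 <- c(0, cumsum(y^2))    # prefix sums of squares
  # squared-error cost of the segment y[(s + 1):e]
  seg_cost <- function(s, e) {
    (cs2[e + 1] - cs2[s + 1]) - (cs[e + 1] - cs[s + 1])^2 / (e - s)
  }
  best <- c(-beta, rep(Inf, n))  # best[e + 1]: optimal penalised cost of y[1:e]
  last <- integer(n)             # last[e]: end of the segment before y[e]'s segment
  for (e in seq_len(n)) {
    cand <- vapply(0:(e - 1), function(s) best[s + 1] + seg_cost(s, e) + beta,
                   numeric(1))
    last[e] <- which.min(cand) - 1L
    best[e + 1] <- min(cand)
  }
  # backtrack: collect the last index of every segment except the final one
  cpts <- integer(0)
  e <- n
  while (e > 0 && last[e] > 0) {
    cpts <- c(last[e], cpts)
    e <- last[e]
  }
  cpts
}

# Usage: a single jump in the mean after observation 10
y <- c(rep(0, 10), rep(5, 10))
optimal_partition(y, beta = 1)     # small penalty: changepoint found
optimal_partition(y, beta = 1000)  # large penalty: no changepoint
```

With beta = 1 the jump after the tenth observation is reported; with beta = 1000 the single-segment fit is cheaper, so no changepoint is returned. When no changepoint is found, safely_transform_continuous() instead divides the variable into no_segments segments, as described under no_segments.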

Value

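a list describing the transformation of the given variable; in the example below it contains the PD/ALE profile (sv), the detected break points (break_points) and the labels of the resulting intervals (new_levels)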
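The break points and interval labels in the returned list are related in a simple way. This hypothetical base-R sketch rebuilds labels in the same format as the new_levels output printed in the example and maps raw values onto them. make_levels() and assign_levels() are illustrative helpers, not rSAFE functions, and treating a value equal to a break point as belonging to the interval on its left is an assumption read off the "]" notation.

```r
# Hypothetical helpers, not part of rSAFE.
# Build right-closed interval labels such as "(1926, 1940]" from break points;
# the last interval is open at Inf, matching the printed new_levels.
make_levels <- function(break_points) {
  lower <- c("-Inf", break_points)
  upper <- c(break_points, "Inf")
  closing <- c(rep("]", length(break_points)), ")")
  paste0("(", lower, ", ", upper, closing)
}

# Map raw values to their interval label.
# left.open = TRUE counts break points strictly below x, so a value equal to a
# break point falls into the interval that ends at it.
assign_levels <- function(x, break_points) {
  idx <- findInterval(x, break_points, left.open = TRUE) + 1
  make_levels(break_points)[idx]
}

# Usage with the break points from the example
bp <- c(1926, 1940, 1964, 1983, 1994)
make_levels(bp)
assign_levels(c(1920, 1926, 1950, 2005), bp)
```

For these break points make_levels() reproduces the six labels shown in the example output, and the four sample years fall into "(-Inf, 1926]", "(-Inf, 1926]", "(1940, 1964]" and "(1994, Inf)".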

Examples


library(DALEX)
library(randomForest)
library(rSAFE)

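# use the first 500 apartments as training data
data <- apartments[1:500,]
set.seed(111)
# fit a random forest for the price per square metre
model_rf <- randomForest(m2.price ~ construction.year + surface + floor +
                           no.rooms + district, data = data)
# wrap the model in a DALEX explainer (column 1 is the target, m2.price)
explainer_rf <- explain(model_rf, data = data[,2:6], y = data[,1])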
#> Preparation of a new explainer is initiated
#>   -> model label       :  randomForest  (  default  )
#>   -> data              :  500  rows  5  cols 
#>   -> target variable   :  500  values 
#>   -> predict function  :  yhat.randomForest  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package randomForest , ver. 4.7.1.1 , task regression (  default  ) 
#>   -> predicted values  :  numerical, min =  2010.939 , mean =  3502.345 , max =  5764.513  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -387.9388 , mean =  -0.6372461 , max =  749.0998  
#>   A new explainer has been created!  
# transformation of construction.year with the default settings (ALE profile, MBIC penalty)
safely_transform_continuous(explainer_rf, "construction.year")
#> $sv
#> Top profiles    : 
#>             _vname_      _label_  _x_    _yhat_ _ids_
#> 1 construction.year randomForest 1920   0.00000     0
#> 2 construction.year randomForest 1923  23.52078     0
#> 3 construction.year randomForest 1924  19.48436     0
#> 4 construction.year randomForest 1926   1.71197     0
#> 5 construction.year randomForest 1927 -30.08008     0
#> 6 construction.year randomForest 1929 -44.82375     0
#> 
#> $break_points
#> [1] 1926 1940 1964 1983 1994
#> 
#> $new_levels
#> [1] "(-Inf, 1926]" "(1926, 1940]" "(1940, 1964]" "(1964, 1983]" "(1983, 1994]"
#> [6] "(1994, Inf)" 
#>