R/safely_transform_continuous.R
safely_transform_continuous.Rd
The safely_transform_continuous() function calculates a transformation function for a continuous variable using a PD/ALE plot obtained from a black-box model.
safely_transform_continuous(
  explainer,
  variable,
  response_type = "ale",
  grid_points = 50,
  N = 200,
  penalty = "MBIC",
  nquantiles = 10,
  no_segments = 2
)
explainer: a DALEX explainer created with the explain() function
variable: the feature for which the transformation function is to be computed
response_type: character, the type of response to be calculated, one of "pdp" or "ale". If the features are uncorrelated, "pdp" can be used; otherwise "ale" is strongly recommended.
grid_points: the number of points on the x-axis used for creating the PD/ALE plot, default 50
N: the number of observations from the dataset used for creating the PD/ALE plot, default 200
penalty: the penalty for introducing another changepoint, one of "AIC", "BIC", "SIC", "MBIC", "Hannan-Quinn", or a non-negative numeric value
nquantiles: the number of quantiles used in the integral approximation
no_segments: numeric, the number of segments the variable is divided into if no breakpoints are found
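To make the roles of grid_points and N more concrete, here is a minimal base R sketch of how a partial-dependence (PD) profile can be computed by hand. It is not rSAFE's or DALEX's implementation: the linear model on the built-in mtcars data, the evenly spaced grid, and the names obs, grid, and pd_profile are all assumptions made for illustration.

```r
# Illustrative only: a hand-rolled PD profile, not the rSAFE internals.
set.seed(1)
model <- lm(mpg ~ wt + hp, data = mtcars)

grid_points <- 5   # number of x-axis points (the function's default is 50)
N <- 20            # number of sampled observations (the function's default is 200)

# Sample N observations and build an evenly spaced grid over the variable
# (the even spacing is an assumption of this sketch).
obs <- mtcars[sample(nrow(mtcars), N), ]
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = grid_points)

# For each grid value, overwrite the variable in every sampled row
# and average the model predictions.
pd <- sapply(grid, function(x) {
  modified <- obs
  modified$wt <- x
  mean(predict(model, newdata = modified))
})

pd_profile <- data.frame(wt = grid, yhat = pd)
print(pd_profile)
```

A larger grid_points gives a finer curve, and a larger N gives a more stable average, both at the cost of more model predictions.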
a list of information on the transformation of the given variable (in the example below: sv, break_points, and new_levels)
library(DALEX)
library(randomForest)
library(rSAFE)
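# Use the first 500 rows of the DALEX apartments data
data <- apartments[1:500, ]
set.seed(111)
model_rf <- randomForest(m2.price ~ construction.year + surface + floor +
                           no.rooms + district, data = data)
# Columns 2:6 are the predictors; column 1 (m2.price) is the response
explainer_rf <- explain(model_rf, data = data[, 2:6], y = data[, 1])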
#> Preparation of a new explainer is initiated
#> -> model label : randomForest ( default )
#> -> data : 500 rows 5 cols
#> -> target variable : 500 values
#> -> predict function : yhat.randomForest will be used ( default )
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package randomForest , ver. 4.7.1.1 , task regression ( default )
#> -> predicted values : numerical, min = 2010.939 , mean = 3502.345 , max = 5764.513
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -387.9388 , mean = -0.6372461 , max = 749.0998
#> A new explainer has been created!
safely_transform_continuous(explainer_rf, "construction.year")
#> $sv
#> Top profiles :
#> _vname_ _label_ _x_ _yhat_ _ids_
#> 1 construction.year randomForest 1920 0.00000 0
#> 2 construction.year randomForest 1923 23.52078 0
#> 3 construction.year randomForest 1924 19.48436 0
#> 4 construction.year randomForest 1926 1.71197 0
#> 5 construction.year randomForest 1927 -30.08008 0
#> 6 construction.year randomForest 1929 -44.82375 0
#>
#> $break_points
#> [1] 1926 1940 1964 1983 1994
#>
#> $new_levels
#> [1] "(-Inf, 1926]" "(1926, 1940]" "(1940, 1964]" "(1964, 1983]" "(1983, 1994]"
#> [6] "(1994, Inf)"
#>
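The break points and level names in this output can be applied to raw values with base R's cut(). The sketch below is an illustration rather than rSAFE's own transformation code: the years vector is hypothetical, and treating the intervals as right-closed is an assumption read off the printed level names.

```r
# Break points and level names copied from the example output above.
break_points <- c(1926, 1940, 1964, 1983, 1994)
new_levels <- c("(-Inf, 1926]", "(1926, 1940]", "(1940, 1964]",
                "(1964, 1983]", "(1983, 1994]", "(1994, Inf)")

# Hypothetical construction years, chosen for illustration only.
years <- c(1920, 1926, 1935, 1970, 1994, 2005)

# cut() uses right-closed intervals by default (right = TRUE),
# which matches the "(a, b]" form of the level names.
year_binned <- cut(years,
                   breaks = c(-Inf, break_points, Inf),
                   labels = new_levels)

print(data.frame(years, year_binned))
```

Values equal to a break point, such as 1926 and 1994 here, fall into the interval that ends at that point.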