R/safely_select_variables.R
safely_select_variables.Rd
The safely_select_variables() function selects variables from the dataset returned by the safely_transform_data() function. For each original variable, exactly one variable is chosen:
either the original one or its transformed version. The choice is based on the AIC value of a linear model (regression) or a logistic regression model (classification).
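The AIC-based choice described above can be illustrated with a small base R sketch. This is a simplified, hypothetical illustration and not the rSAFE implementation: the helper `pick_by_aic()`, the `mtcars` data, and the hand-made binned variable `hp_new` are all assumptions introduced here, and the comparison uses single-predictor linear models only.

```r
# Hypothetical helper (not part of rSAFE): fit one linear model with the
# original variable and one with its transformed version, then keep the
# variable whose model has the lower AIC.
pick_by_aic <- function(data, response, original, transformed) {
  aic_original <- AIC(lm(reformulate(original, response), data = data))
  aic_transformed <- AIC(lm(reformulate(transformed, response), data = data))
  if (aic_transformed < aic_original) transformed else original
}

# Assumed example data: mtcars with a manually discretized copy of `hp`,
# standing in for a transformed variable.
cars <- mtcars
cars$hp_new <- cut(cars$hp, breaks = c(-Inf, 100, 200, Inf))

chosen <- pick_by_aic(cars, "mpg", "hp", "hp_new")
chosen
```

For a classification task, the same idea would apply with `glm(..., family = binomial())` in place of `lm()`.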
safely_select_variables(
  safe_extractor,
  data,
  y = NULL,
  which_y = NULL,
  class_pred = NULL,
  verbose = TRUE
)
safe_extractor: object containing information about variable transformations, created with the safe_extraction() function
data: the original dataset or the one returned by the safely_transform_data() function. If data does not contain transformed variables, the transformation is performed inside this function using the 'safe_extractor' argument. Data may or may not contain the response variable: if it does, the 'which_y' argument must be given; otherwise the 'y' argument should be provided.
y: vector of responses, must be given if data does not contain it
which_y: numeric or character (optional), must be given if data contains response values
class_pred: numeric or character, used only in multiclass classification problems. If the response vector has more than two levels, 'class_pred' should indicate the class of interest, which will denote failure; all other classes will stand for success.
verbose: logical, whether a progress bar should be printed
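The role of 'class_pred' in a multiclass problem can be pictured as collapsing the response into two levels. The sketch below is only an assumption about what such a recoding might look like, not code taken from rSAFE; it uses the built-in `iris` data, and the names `y_binary` and `fit` are hypothetical.

```r
# Assumed illustration: a three-level response collapsed into two levels.
y <- iris$Species        # levels: setosa, versicolor, virginica
class_pred <- "setosa"   # the class of interest

# The class of interest is coded as "failure", every other class as "success".
# "failure" is placed first because glm() with a binomial family treats the
# first factor level as failure and all other levels as success.
y_binary <- factor(
  ifelse(y == class_pred, "failure", "success"),
  levels = c("failure", "success")
)
table(y_binary)

# A logistic regression can then be fitted on the recoded response,
# and its AIC read off with AIC(fit).
fit <- glm(y_binary ~ Sepal.Length, data = iris, family = binomial())
```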
vector of variable names, selected based on AIC values
library(DALEX)
library(randomForest)
library(rSAFE)
data <- apartments[1:500,]
set.seed(111)
model_rf <- randomForest(m2.price ~ construction.year + surface + floor +
                           no.rooms + district, data = data)
explainer_rf <- explain(model_rf, data = data[,2:6], y = data[,1])
#> Preparation of a new explainer is initiated
#> -> model label : randomForest ( default )
#> -> data : 500 rows 5 cols
#> -> target variable : 500 values
#> -> predict function : yhat.randomForest will be used ( default )
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package randomForest , ver. 4.7.1.1 , task regression ( default )
#> -> predicted values : numerical, min = 2010.939 , mean = 3502.345 , max = 5764.513
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -387.9388 , mean = -0.6372461 , max = 749.0998
#> A new explainer has been created!
safe_extractor <- safe_extraction(explainer_rf, verbose = FALSE)
safely_select_variables(safe_extractor, data, which_y = "m2.price", verbose = FALSE)
#> [1] "surface" "floor" "no.rooms"
#> [4] "construction.year_new" "district_new"