The safely_select_variables() function selects variables from the dataset returned by the safely_transform_data() function. For each original variable, exactly one variable is chosen: either the original one or its transformed version. The choice is based on the AIC value of a linear model (regression) or a logistic regression model (classification).
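
To make the selection criterion concrete, here is a minimal base-R sketch of an AIC comparison between an original variable and a transformed counterpart. It is not the rSAFE implementation: the `mtcars` dataset, the hand-made binned variable `hp_new`, and the choice to hold `wt` fixed as the other predictor are all assumptions made purely for illustration.

```r
# Illustrative only: a hand-rolled AIC comparison on a built-in dataset,
# not the procedure rSAFE runs internally.
data(mtcars)

# Hypothetical transformed variable: hp binned into three intervals,
# similar in spirit to the discretisation SAFE applies to continuous features.
mtcars$hp_new <- cut(mtcars$hp, breaks = c(-Inf, 100, 180, Inf))

# Fit one linear model per candidate, keeping the other predictor fixed.
fit_original    <- lm(mpg ~ hp + wt, data = mtcars)
fit_transformed <- lm(mpg ~ hp_new + wt, data = mtcars)

aic_values <- c(hp = AIC(fit_original), hp_new = AIC(fit_transformed))

# Keep whichever version of the variable yields the lower AIC.
chosen <- names(which.min(aic_values))
print(aic_values)
print(chosen)
```

For a classification task the same comparison would use `glm(..., family = binomial)` in place of `lm()`.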

## Usage

safely_select_variables(
  safe_extractor,
  data,
  y = NULL,
  which_y = NULL,
  class_pred = NULL,
  verbose = TRUE
)

## Arguments

- safe_extractor: object containing information about variable transformations, created with the safe_extraction() function
- data: original dataset or the one returned by the safely_transform_data() function. If data does not contain transformed variables, the transformation is done inside this function using the 'safe_extractor' argument. Data may or may not contain the response variable: if it does, the 'which_y' argument must be given; otherwise the 'y' argument should be provided.
- y: vector of responses, must be given if data does not contain it
- which_y: numeric or character (optional), must be given if data contains response values
- class_pred: numeric or character, used only in multi-classification problems. If the response vector has more than two levels, 'class_pred' should indicate the class of interest, which will denote failure; all other classes will stand for success.
- verbose: logical, whether a progress bar is to be printed
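
The failure/success wording for 'class_pred' matches how base R's `glm()` treats a factor response under the binomial family, where the first level denotes failure and all others success. The sketch below shows one way a multi-class response could be collapsed that way before fitting a logistic regression. It is an assumption about the general idea, not rSAFE's internal code; the `iris` data, the `"other"` label, and the two predictors are chosen only for illustration.

```r
# Illustrative only: collapsing a three-class response into a binary one.
data(iris)

# Hypothetical class of interest, playing the role of the 'class_pred' argument.
class_pred <- "versicolor"

# Put the class of interest first so that glm() treats it as "failure"
# and every other class as "success".
y_binary <- factor(
  ifelse(iris$Species == class_pred, class_pred, "other"),
  levels = c(class_pred, "other")
)

d <- data.frame(iris[, c("Sepal.Length", "Sepal.Width")], y_binary = y_binary)

# Logistic regression on the binarised response; its AIC is the quantity
# that would be compared across candidate variables.
fit <- glm(y_binary ~ Sepal.Length + Sepal.Width, data = d, family = binomial)
print(AIC(fit))
```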

## Value

A vector of variable names, selected based on AIC values.

## See also

safely_transform_data()

## Examples


library(DALEX)
library(randomForest)
library(rSAFE)

data <- apartments[1:500,]
set.seed(111)
model_rf <- randomForest(m2.price ~ construction.year + surface + floor +
                           no.rooms + district, data = data)
explainer_rf <- explain(model_rf, data = data[,2:6], y = data[,1])
#> Preparation of a new explainer is initiated
#>   -> model label       :  randomForest  (  default  )
#>   -> data              :  500  rows  5  cols
#>   -> target variable   :  500  values
#>   -> predict function  :  yhat.randomForest  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package randomForest , ver. 4.6.14 , task regression (  default  )
#>   -> predicted values  :  numerical, min =  2010.939 , mean =  3502.345 , max =  5764.513
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -387.9388 , mean =  -0.6372461 , max =  749.0998
#>   A new explainer has been created!
safe_extractor <- safe_extraction(explainer_rf, verbose = FALSE)
safely_select_variables(safe_extractor, data, which_y = "m2.price", verbose = FALSE)
#> [1] "surface"               "floor"                 "no.rooms"
#> [4] "construction.year_new" "district_new"