The safely_select_variables() function selects variables from dataset returned by safely_transform_data() function. For each original variable exactly one variable is chosen

  • either original one or transformed one. The choice is based on the AIC value for linear model (regression) or logistic regression (classification).

safely_select_variables(
  safe_extractor,
  data,
  y = NULL,
  which_y = NULL,
  class_pred = NULL,
  verbose = TRUE
)

Arguments

safe_extractor

object containing information about variables transformations created with safe_extraction() function

data

data, original dataset or the one returned by safely_transform_data() function. If data do not contain transformed variables then transformation is done inside this function using 'safe_extractor' argument. Data may contain response variable or not - if it does then 'which_y' argument must be given, otherwise 'y' argument should be provided.

y

vector of responses, must be given if data does not contain it

which_y

numeric or character (optional), must be given if data contains response values

class_pred

numeric or character, used only in multi-classification problems. If response vector has more than two levels, then 'class_pred' should indicate the class of interest which will denote failure - all other classes will stand for success.

verbose

logical, if progress bar is to be printed

Value

vector of variables names, selected based on AIC values

See also

Examples


library(DALEX)
library(randomForest)
library(rSAFE)

data <- apartments[1:500,]
set.seed(111)
model_rf <- randomForest(m2.price ~ construction.year + surface + floor +
                           no.rooms + district, data = data)
explainer_rf <- explain(model_rf, data = data[,2:6], y = data[,1])
#> Preparation of a new explainer is initiated
#>   -> model label       :  randomForest  (  default  )
#>   -> data              :  500  rows  5  cols 
#>   -> target variable   :  500  values 
#>   -> predict function  :  yhat.randomForest  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package randomForest , ver. 4.6.14 , task regression (  default  ) 
#>   -> predicted values  :  numerical, min =  2010.939 , mean =  3502.345 , max =  5764.513  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -387.9388 , mean =  -0.6372461 , max =  749.0998  
#>   A new explainer has been created!  
safe_extractor <- safe_extraction(explainer_rf, verbose = FALSE)
safely_select_variables(safe_extractor, data, which_y = "m2.price", verbose = FALSE)
#> [1] "surface"               "floor"                 "no.rooms"             
#> [4] "construction.year_new" "district_new"