Prepare data in the format required by the selected model engines

prepare_data(
  data,
  y,
  engine = c("ranger", "xgboost", "decision_tree", "lightgbm", "catboost"),
  predict = FALSE,
  train = NULL
)

Arguments

data

A data source in one of the common R tabular formats, such as data.table, data.frame, or matrix.

y

A string giving the name of the target column.

engine

A character vector of tree-based models to create. Possible values are: `ranger`, `xgboost`, `lightgbm`, `catboost`, `decision_tree`. It determines which models will be trained later.
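As an illustration of how an `engine` vector could be checked against the supported values, here is a minimal base R sketch. The helper `check_engine()` and the vector `supported_engines` are hypothetical names used only for this example; they are not part of the package, and the package may validate its input differently.

```r
# Engine names as listed in the argument description.
supported_engines <- c("ranger", "xgboost", "decision_tree", "lightgbm", "catboost")

# Hypothetical helper (an assumption, not the package's own code):
# stop with a readable message if any requested engine is unsupported.
check_engine <- function(engine = supported_engines) {
  unknown <- setdiff(engine, supported_engines)
  if (length(unknown) > 0) {
    stop("Unsupported engine(s): ", paste(unknown, collapse = ", "))
  }
  unique(engine)
}

# A valid selection is returned unchanged.
selected <- check_engine(c("ranger", "xgboost"))

# An invalid selection raises an error; capture its message for inspection.
rejected <- tryCatch(
  check_engine(c("ranger", "randomforest")),
  error = function(e) conditionMessage(e)
)
```

An explicit `setdiff()` check is used here rather than `match.arg(several.ok = TRUE)`, because the latter only fails when none of the supplied values match and silently drops the rest.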

predict

A logical value that determines whether the data set will be used for prediction (TRUE) or for training (FALSE). It is needed because the lightgbm model can't predict on a training dataset.

train

The training data. If predict is TRUE, you must provide the training dataset from the split data here.
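The documentation does not say how the training data is used at prediction time. One common reason for needing it, shown here purely as an assumption, is to encode the prediction set with the same factor levels as the training set so that both yield matrices with identical columns. The sketch below uses only base R; the toy data frames `train` and `test` are made up for the example.

```r
# Toy data: the test set does not contain every level seen in training.
train <- data.frame(color = factor(c("red", "green", "blue")), x = c(1, 2, 3))
test  <- data.frame(color = factor(c("green", "red")),         x = c(4, 5))

# Encoding the test set on its own drops the column for the unseen level.
m_raw <- model.matrix(~ color + x, data = test)

# Re-declare the factor with the training levels before encoding.
test$color <- factor(test$color, levels = levels(train$color))

m_train   <- model.matrix(~ color + x, data = train)
m_aligned <- model.matrix(~ color + x, data = test)

# After alignment both matrices share the same column layout.
same_columns <- identical(colnames(m_train), colnames(m_aligned))
```

Without the alignment step, a model fitted on `m_train` would receive a prediction matrix with a different number of columns.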

Value

A dataset in the format required by the selected engines.
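To make "format required by the engine" concrete, the following base R sketch builds two representations of the same toy data: a data.frame with factors kept as they are (the kind of input ranger accepts) and a numeric feature matrix with dummy-coded factors plus a separate label vector (the kind of input xgboost-style engines expect). The helper `to_engine_formats()` and the structure of its return value are hypothetical; the actual object returned by `prepare_data` may be organised differently.

```r
# Toy data set with one numeric feature, one factor feature, and a target.
df <- data.frame(
  Price = c(100, 150, 120),
  Rooms = c(2, 3, 2),
  Type  = factor(c("flat", "house", "flat"))
)
y <- "Price"

# Hypothetical helper (an assumption, not the package implementation).
to_engine_formats <- function(data, y) {
  features <- data[, setdiff(names(data), y), drop = FALSE]
  list(
    # data.frame with factor columns left untouched.
    data_frame = data,
    # Numeric matrix: factors are dummy-coded and the intercept column is dropped.
    feature_matrix = model.matrix(~ ., data = features)[, -1, drop = FALSE],
    # Target values as a plain vector.
    label = data[[y]]
  )
}

formats <- to_engine_formats(df, y)
```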

Examples

data(lisbon)
type              <- guess_type(lisbon, 'Price')
# `type` is not passed to preprocessing(): its third positional argument
# is `advanced`, which must be a logical value.
preprocessed_data <- preprocessing(lisbon, 'Price')
preprocessed_data <- preprocessed_data$data
split_data <-
  train_test_balance(preprocessed_data,
                     'Price',
                     balance = FALSE)
set.seed(123)
train_data <-
  prepare_data(split_data$train,
               'Price',
               engine = c('ranger', 'xgboost', 'decision_tree', 'lightgbm', 'catboost'))
set.seed(123)
test_data <-
  prepare_data(split_data$test,
               'Price',
               engine = c('ranger', 'xgboost', 'decision_tree', 'lightgbm', 'catboost'),
               predict = TRUE,
               train = split_data$train)