Balance and split the dataset

train_test_balance(
  data,
  y,
  balance = TRUE,
  fractions = c(0.6, 0.2, 0.2),
  seed = NULL
)

Arguments

data

A data source in one of the common R tabular formats, such as data.table, data.frame or matrix.

y

A string giving the name of the target column.

balance

A logical value indicating whether the dataset should be balanced.

fractions

A numeric vector of 3 values that sum to 1, determining the sizes of the train, test and validation datasets. DEFAULT: c(0.6, 0.2, 0.2).

seed

An integer random seed. Setting it makes the split reproducible, so results can be compared across runs. If it is NULL, the split is not reproducible.
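
To illustrate how fractions and seed interact, here is a minimal base R sketch of a three-way random split. It is not the package's implementation: split_by_fractions and the element names first, second and third are hypothetical, and the sketch ignores balancing entirely.

```r
# Hypothetical helper, not part of the package: split rows into three parts.
split_by_fractions <- function(data, fractions = c(0.6, 0.2, 0.2), seed = NULL) {
  stopifnot(length(fractions) == 3, abs(sum(fractions) - 1) < 1e-8)
  if (!is.null(seed)) set.seed(seed)   # fix the RNG state so the split can be repeated
  n   <- nrow(data)
  idx <- sample.int(n)                 # random permutation of the row indices
  n1  <- round(fractions[1] * n)       # size of the first part
  n2  <- round(fractions[2] * n)       # size of the second part
  n3  <- n - n1 - n2                   # the third part takes the remaining rows
  list(
    first  = data[idx[seq_len(n1)], , drop = FALSE],
    second = data[idx[n1 + seq_len(n2)], , drop = FALSE],
    third  = data[idx[n1 + n2 + seq_len(n3)], , drop = FALSE]
  )
}

parts <- split_by_fractions(iris, c(0.6, 0.2, 0.2), seed = 123)
sapply(parts, nrow)                    # number of rows in each part
```

Calling the helper twice with the same seed returns identical parts, while seed = NULL leaves the random number generator untouched, so repeated calls generally differ.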

Value

A list of train, test and validation datasets.
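
A quick way to check any such list is to compare the share of rows in each element with the requested fractions. The sketch below assumes every element is a table with rows. split_shares is a hypothetical helper, and the stand-in list with elements a, b and c is illustrative only, since this page does not state the names of the returned elements.

```r
# Hypothetical helper: share of rows held by each element of a split list.
split_shares <- function(parts) {
  rows <- vapply(parts, nrow, integer(1))  # row count of every element
  rows / sum(rows)                         # convert counts to proportions
}

# Stand-in for a returned list; the element names are illustrative only.
parts <- list(a = iris[1:90, ], b = iris[91:120, ], c = iris[121:150, ])
split_shares(parts)
```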

Examples

data(lisbon)
b_lisbon <- train_test_balance(lisbon, 'Price', balance = FALSE,
                               fractions = c(train = 0.6, valid = 0.2, test = 0.2))