This function calculates candidate splits for each selected variable. For numerical variables, splits are calculated as percentiles (in general, as uniform quantiles on a grid of length grid_points). For all other variables, splits are the unique values.

calculate_variable_split(
  data,
  variables = colnames(data),
  grid_points = 101,
  variable_splits_type = "quantiles",
  new_observation = NA
)

# S3 method for default
calculate_variable_split(
  data,
  variables = colnames(data),
  grid_points = 101,
  variable_splits_type = "quantiles",
  new_observation = NA
)

Arguments

data

validation dataset, used to determine the distribution of observations

variables

names of variables for which splits shall be calculated

grid_points

number of points used for the response path; for numerical variables this is the length of the quantile grid

variable_splits_type

how variable grids shall be calculated. Use "quantiles" (default) for percentiles or "uniform" for a uniform grid of points

new_observation

if specified (not NA), all values in new_observation are included in variable_splits
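
To make the two variable_splits_type settings concrete, here is a minimal base R sketch for a single numeric vector. The helper sketch_numeric_splits is hypothetical and is not part of the package; it only assumes that "quantiles" means empirical quantiles at evenly spaced probabilities and that "uniform" means evenly spaced points between the minimum and the maximum.

```r
# Hypothetical helper (not the package's code): grid for one numeric vector.
sketch_numeric_splits <- function(x, grid_points = 101, type = "quantiles") {
  x <- x[!is.na(x)]
  if (type == "quantiles") {
    # evenly spaced probabilities, so the grid follows the data distribution
    probs <- seq(0, 1, length.out = grid_points)
    # ties in the data can yield repeated quantiles, so keep unique values
    unique(unname(stats::quantile(x, probs = probs)))
  } else {
    # evenly spaced values between the observed minimum and maximum
    seq(min(x), max(x), length.out = grid_points)
  }
}

x <- c(1, 2, 3, 4, 100)
q_grid <- sketch_numeric_splits(x, grid_points = 5, type = "quantiles")
u_grid <- sketch_numeric_splits(x, grid_points = 5, type = "uniform")
q_grid  # 1 2 3 4 100
u_grid  # 1.00 25.75 50.50 75.25 100.00
```

With a skewed variable the quantile grid stays where the observations are, while the uniform grid spends most of its points in the sparse region between 4 and 100.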

Value

A named list with splits for selected variables
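
As an illustration of the shape of the returned value, the sketch below builds a named list with one element per variable and merges the values of new_observation into each element. The helper sketch_variable_split is hypothetical, written in base R under the assumptions stated in the description (quantiles for numeric columns, unique values otherwise); it is not the package's implementation.

```r
# Hypothetical helper (not the package's code): splits for every selected column.
sketch_variable_split <- function(data, variables = colnames(data),
                                  grid_points = 101, new_observation = NA) {
  has_new <- is.data.frame(new_observation)
  splits <- lapply(variables, function(var) {
    column <- data[[var]]
    column <- column[!is.na(column)]
    if (is.numeric(column)) {
      probs <- seq(0, 1, length.out = grid_points)
      values <- unname(stats::quantile(column, probs = probs))
    } else {
      values <- as.character(column)
    }
    if (has_new) {
      # make sure the values of the new observation appear in the grid
      extra <- new_observation[[var]]
      if (!is.numeric(column)) extra <- as.character(extra)
      values <- c(values, extra[!is.na(extra)])
    }
    sort(unique(values))
  })
  names(splits) <- variables
  splits
}

df <- data.frame(age = c(20, 30, 40, 50, 60),
                 class = c("1st", "2nd", "2nd", "3rd", "1st"))
new_obs <- data.frame(age = 33, class = "crew")
res <- sketch_variable_split(df, grid_points = 3, new_observation = new_obs)
names(res)  # "age" "class"
res$age     # 20 33 40 60
```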

Details

Note that calculate_variable_split is an S3 generic. If you want to work with non-standard data sources (like H2O ddf or external databases), you should overload it by defining a method for the class of your data object.
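
A method for a custom data source might look like the sketch below. Everything about the lazy_source class is an assumption made for illustration: it is a hypothetical list with a variables field and a fetch(var) function that returns one column. The stand-in generic is defined only so that the sketch runs without the package attached, and the sketch ignores new_observation for brevity.

```r
# Stand-in generic so the sketch runs on its own; skipped if one already exists.
if (!exists("calculate_variable_split", mode = "function")) {
  calculate_variable_split <- function(data, ...) {
    UseMethod("calculate_variable_split")
  }
}

# Method for the hypothetical class "lazy_source".
calculate_variable_split.lazy_source <- function(data,
                                                 variables = data$variables,
                                                 grid_points = 101,
                                                 variable_splits_type = "quantiles",
                                                 new_observation = NA, ...) {
  splits <- lapply(variables, function(var) {
    column <- data$fetch(var)          # pull one column from the source
    column <- column[!is.na(column)]
    if (!is.numeric(column)) return(sort(unique(column)))
    if (variable_splits_type == "quantiles") {
      probs <- seq(0, 1, length.out = grid_points)
      unique(unname(stats::quantile(column, probs = probs)))
    } else {
      seq(min(column), max(column), length.out = grid_points)
    }
  })
  names(splits) <- variables
  splits
}

# An in-memory list plays the role of the external source here.
backing <- list(x = c(0, 5, 10), g = c("b", "a", "b"))
src <- structure(
  list(variables = names(backing), fetch = function(var) backing[[var]]),
  class = "lazy_source"
)
res <- calculate_variable_split(src, grid_points = 3)
```

For a real database or H2O frame, the body of fetch would be replaced by a query, and the quantiles would ideally be computed on the remote side rather than after pulling the whole column.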