All functions

adult

Adult dataset

basic_info()

Provide basic dataset information

best_model_predict()

Make predictions with a single model

binarize_target()

Binarize the target column

boruta_selection()

Run the Boruta algorithm to select the most important features

check_cor()

Search for strongly correlated columns (Spearman's rank correlation for numerical, Cramér's V for categorical)

check_data()

Run the data check pipeline to look for potential problems in the data

check_dim()

Search for dimensionality problems in the dataset

check_duplicate_col()

Search for duplicates between columns

check_missing()

Search for missing values in the target column and predictors

check_outliers()

Search for outliers using the mean and standard deviation, the median absolute deviation, and the interquartile range

check_static()

Search for columns dominated by a single value

check_y_balance()

Check whether the target column is imbalanced (for regression, the values are first binned by quantiles)

choose_best_models()

Choose the best models according to the score data frame

compas

Modified COMPAS dataset

create_ranked_list()

Create the final ranked_list

delete_correlated_values()

Delete correlated values

delete_id_columns()

Delete ID-like columns

detect_id_columns()

Detect ID-like columns

draw_boxplot()

Draw a boxplot of residuals (regression only)

draw_confusion_matrix()

Draw a confusion matrix for the model

draw_feature_importance()

Draw a feature importance plot

draw_radar_plot()

Plot a radar chart of one metric

draw_rmse_plot()

Draw a train vs test RMSE plot for the models

draw_roc_plot()

Draw the AUC ROC curve for the best model

draw_scatterplot()

Draw a scatterplot of true vs predicted target values on the training and test data for one model

explain()

Explain a forester model

fertility

Fertility dataset

forester_palette()

Return colors from the forester palette

format_models_details()

Format information about the models

guess_type()

Guess the task type from the target column of the dataset

lisbon

Lisbon dataset

lymph

Lymph dataset

manage_missing()

Manage missing values

predict_models()

Make predictions with models according to their engine

predict_models_all()

Make predictions for a list of models that may contain several models of the same type

predict_new()

Perform predictions on new data

prepare_data()

Convert data into the format required by the selected model engine

preprocessing()

Run the preprocessing steps

pre_rm_static_cols()

Remove columns that have the same value in every row

random_search()

Optimize hyperparameters with random search

report()

Generate a report after training

save()

Save elements from forester

save_deleted_columns()

Save the names of columns deleted during preprocessing

score_models()

Score models with suitable metrics

testing_data

Testing dataset

train()

Train models with forester

train_models()

Train models with the given engines

train_models_bayesopt()

Train models with Bayesian optimization

train_test_balance()

Balance and split the dataset

verbose_cat()

Print the provided cat-like input if verbose is TRUE
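check_cor() is described as using Cramér's V for pairs of categorical columns. The Python sketch below computes the textbook, bias-uncorrected form of that statistic, V = sqrt(chi2 / (n * (min(r, c) - 1))), from two equal-length sequences. It is an illustration only, not forester's R code: the name cramers_v and the choice to return 0.0 when either variable is constant are assumptions, and the cut-off forester uses to call a pair "strongly correlated" is not stated here.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Cramér's V between two categorical sequences (no bias correction).

    Hypothetical helper for illustration; not part of forester.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    row_totals = Counter(x)          # marginal counts of x categories
    col_totals = Counter(y)          # marginal counts of y categories
    observed = Counter(zip(x, y))    # joint counts (contingency table cells)

    # Pearson chi-squared statistic over every cell of the contingency table
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # assumption: a constant variable is treated as unassociated
    return sqrt(chi2 / (n * (k - 1)))
```

For example, cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"]) returns 1.0 because each category of one column determines the other, while using ["u", "v", "u", "v"] as the second argument gives 0.0.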
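The three outlier rules named for check_outliers() can be sketched as follows. The cut-offs (3 standard deviations from the mean, 3 unscaled median absolute deviations from the median, 1.5 interquartile ranges beyond the quartiles) are common conventions assumed for illustration, as is reporting each rule separately; forester's actual thresholds and the way it combines the rules may differ. The function name outlier_flags is hypothetical.

```python
import statistics


def outlier_flags(values, sd_k=3.0, mad_k=3.0, iqr_k=1.5):
    """Return, per rule, the set of indices flagged as outliers.

    Hypothetical helper; thresholds are assumed conventions, not forester's.
    Requires Python 3.8+ (statistics.quantiles) and at least two values.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    median = statistics.median(values)
    # Unscaled MAD (no 1.4826 consistency factor)
    mad = statistics.median(abs(v - median) for v in values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1

    return {
        "sd": {i for i, v in enumerate(values) if abs(v - mean) > sd_k * sd},
        "mad": {i for i, v in enumerate(values) if abs(v - median) > mad_k * mad},
        "iqr": {
            i
            for i, v in enumerate(values)
            if v < q1 - iqr_k * iqr or v > q3 + iqr_k * iqr
        },
    }
```

On [10, 11, 12, 11, 10, 12, 11, 100], the MAD and IQR rules both flag index 7 (the value 100), but the standard-deviation rule flags nothing: the outlier inflates the standard deviation, and with n values no sample z-score can exceed (n - 1) / sqrt(n), which is about 2.47 for n = 8.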
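A column "dominated by a single value", as checked by check_static(), can be detected from the share of its most frequent value. In this sketch the 0.95 default threshold and the name dominated_columns are assumptions, not forester's settings; passing threshold=1.0 reduces it to the case handled by pre_rm_static_cols(), where one value fills every row.

```python
from collections import Counter


def dominated_columns(columns, threshold=0.95):
    """Return names of columns whose most frequent value covers >= threshold of rows.

    `columns` maps column name -> list of values. Hypothetical helper;
    the default threshold is an assumption for illustration.
    """
    flagged = []
    for name, values in columns.items():
        if not values:
            continue  # skip empty columns rather than divide by zero
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) >= threshold:
            flagged.append(name)
    return flagged
```

With columns = {"a": [1] * 98 + [2, 3], "b": [1, 2, 3, 4], "c": ["x"] * 5}, the default threshold flags ["a", "c"], while threshold=1.0 flags only ["c"].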
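random_search() optimizes hyperparameters at random. The generic technique is sketched below: draw configurations uniformly from a discrete search space, score each one, and keep the best. The search-space format (a dict of candidate lists), the objective signature, and minimizing rather than maximizing the score are assumptions of this sketch; forester's own function tunes R model engines and its interface is not shown here.

```python
import random


def random_search(objective, space, n_iter=20, seed=0):
    """Minimal random search over a discrete hyperparameter space.

    objective: callable taking a dict of parameters and returning a score
               to minimize (assumed convention).
    space:     dict mapping parameter name -> list of candidate values.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        # Sample each hyperparameter independently and uniformly
        params = {name: rng.choice(options) for name, options in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

For instance, with space = {"x": [0, 1, 2, 3], "y": [0, 1]} and the toy objective (x - 2) ** 2 + y, 200 draws over only eight possible configurations will in practice reach the optimum {"x": 2, "y": 0} with score 0. Random search trades this exhaustive guarantee for scalability: its cost depends on n_iter, not on the size of the grid.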