- In the `check_data()` function:
  - Updated the documentation or examples in:
- In the `explain()` function:
  - Updated the documentation and the examples,
  - Fixed the handling of factors.
- In the `draw_radar_plot()` function:
  - Updated the documentation,
  - Deleted unnecessary prints.
- In the `prepare_data()` function:
  - Updated the documentation,
  - Fixed the process of creating an ‘other’ label during the preparation of train / test splits.
- In the `preprocessing()` function:
  - Updated the documentation,
  - Removed the unused type parameter,
  - Added factorization of the target for the binary classification task.
- In the `random_search()` function:
  - Updated the documentation,
  - Fixed the method for the ranger model,
  - Unified the grid with the Bayesian Optimization grid.
- In the `score_models()` function:
  - Updated the documentation,
  - Added RMSE as the main metric for the regression task,
  - Added accuracy as the main metric for the binary classification task,
  - Fixed the scoring method for regression.
- In the `train()` function:
  - Updated the documentation,
  - Enabled passing a tibble as the data parameter (with a warning),
  - Added a warning if the user tries to use an incorrect task type,
  - Added score_train and predictions train to the output of the function.
- In the `train_models_bayesopt()` function:
  - Updated the documentation,
  - Fixed the Bayesian Optimization. Previously it did not work at all for models other than ranger,
  - Changed the tuned hyperparameters and their values for all models, which improved the performance of the method.
- Updated the majority of `tests` so they match the current version.
- Added an alternative catboost installation method in the `README`.
- In the `report()` function:
  - Removed the engine and tuning columns from the ranked list table,
  - Explained how the model name is created, so the engine and tuning columns are not needed,
  - Fixed the radar plot issues.
- In the `draw_radar_plot()` function:
  - Fixed radar plots, so the models are named correctly (by model names, not engines),
  - Fixed an issue where the plot for binary classification had an empty observation name,
  - Removed unnecessary and invalid metric names (the unused ones, as well as engine, name and tuning).
- In the `check_data()` function:
  - Added a vector of possible outliers to the output,
  - The outlier indices are printed if there are fewer than 50 of them.
- In the `train()` function:
  - Added a vector of possible outliers to the output.
- In the `train()` function:
  - Added objects to the output of the function:
    - Vector: valid_observed.
    - Vectors: train_observed_labels, test_observed_labels, valid_observed_labels, predictions_all_labels, predictions_best_best as labels for the binary classification task.
  - Removed catboost from the default engines.
  - Added a check whether catboost is installed.
  - Removed the parameters loss, validation, tuning and keep, because they were not used.
  - Added the train_test_split parameter, which determines the proportions of the train, test and validation subsets.
  - Made the message about columns deleted during the advanced preprocessing conditional.
- In the `check_data()` function:
  - Fixed an issue with too many unique values for Cramér's V correlation. If the number of combinations exceeds 2^29, the correlation is not calculated and the user is informed about it (this is caused by R limitations).
  - Fixed an issue where strong negative correlations were omitted.
  - Added ticks and crosses that indicate whether something is of high or low quality.
  - Added information about which class is the dominant one in an unbalanced classification data set.
- In the `preprocessing()` function:
  - Added factorization of the target at the beginning of the `binarize_target()` subfunction.
  - Resolved issues with data frames where columns are not saved as factors.
- The `report()` function:
  - Raises an error if tinytex is not installed.
  - If ggradar is not installed, no error is raised; the plot is skipped and a note about installing the ggradar package is shown.
- Removed the engine parameter from `predict_models_all()`.
- Removed the misspelled function `predcit_models_all()`.
- The `train_models()` function no longer returns NULL objects if an engine is not selected.
- The `train_models_bayesopt()` function no longer returns NULL objects if an engine is not selected.
- Fixed a bug in the `format_models_details()` function where the method did not work for a classification task with an xgboost model.
- In `DESCRIPTION`, moved the catboost, ggradar and tinytex dependencies from Suggests to Imports and added crayon to Imports.
- Added installation guides for catboost, ggradar and tinytex in `README.md`, which solves installation issues on macOS.
- Fixed typos and inconsistent notation in the `train()` function.
- Added info that the Bayesian Optimization takes a lot of time.
- Fixed the ‘The best model details’ section in the report so that it is displayed.
- Added info about columns deleted during `preprocessing()`.
- Added more details to the `train()` function output.
- Added parameter ‘seed’ to keep the data division comparable.
- Cleaned up the structure of the `predict_new()` function.
- Kept numeric values of modelrawdatalightgbm_data as numeric, rather than as string values.
- Used different parameters for the xgboost model to remove the warnings.
- Changed [1,2] target values to [0,1] for the regression task.
- Added info about the type of correlation used between columns.
- In `save()`, the file name now contains the correct month.
- The `train()` function returns a table with metrics on the validation subset.
- The `score()` function returns tables with additional columns: `engine` and `tuning`.
- Added a new vignette ‘Knowledge Check’ with extended use cases.
- Setting `bayes_iter = 0` means that `Bayestian_opt()` is no longer run.
- Changed the parameter name from `random_iter` to `random_evals`.
- Setting `verbose = FALSE` disables `check_data` entirely.
- Changed the `train()` function by adding parameters to:
  - Choose metrics.
  - Choose the metric used to sort models.
  - Add a self-created metric function.

# forester 1.0.0
- General code quality and readability improvements as well as improving the quality of the documentation.
- `check_data()`: added detection of id columns and reformatted the outputs.
- `create_ranked_list()`: is now able to work with missing values.
- Improved data sets documentation.
- `explain()`: redesigned the function to work on single and multiple models.
- `plot_metrics()`: improved all plots and changed them into `ggplot` visualizations, added a `feature importance` plot.
- Renamed `predcit_models_all()` into `predict_models_all()` and enabled prediction on a non-fixed, larger number of models.
- Added the `predict_new()` function for new observations.
- Expanded the `preprocessing()` function with optional advanced preprocessing, consisting of deleting correlated values, deleting id columns, and selecting only the most important features via the BORUTA algorithm.
- Redesigned the file generated by the `report()` function by enhancing the quality of the plots.
- Added the `save()` function, which saves the output of the `train()` function.
- Redesigned the `score_models()` function so that users can add their own metric function for scoring models.
- Redesigned the `train()` function by:
  - Adding a verbose parameter for silent runs.
  - Adding parameters for limiting the iterations of random search and Bayesian optimization.
  - Turning on the advanced preprocessing.
  - Increasing the output information by adding engine, predictions of all models, predictions of the best models and the raw_training data set (unprocessed by `prepare_data()`).
- Added a `verbose_cat()` function for optional messages.