Get a data frame with various measures of importance of variables in a random forest
measure_importance(forest, mean_sample = "top_trees", measures = NULL)
forest: A random forest produced by the function randomForest with the option localImp = TRUE.
mean_sample: The sample of trees on which mean minimal depth is calculated; possible values are "all_trees", "top_trees" and "relevant_trees".
measures: A vector of names of importance measures to be calculated; if equal to NULL, all of them are calculated. If "p_value" is to be calculated, then "no_of_nodes" will be calculated too. Suitable measures for classification forests are: mean_min_depth, accuracy_decrease, gini_decrease, no_of_nodes, times_a_root. For regression forests choose from: mean_min_depth, mse_increase, node_purity_increase, no_of_nodes, times_a_root.
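As a sketch of how the mean_sample and measures arguments might be combined for a regression forest (assuming the randomForest and randomForestExplainer packages are installed; the mtcars data set, the seed and the particular choice of measures are only illustrative), one could request a subset of the regression measures and average minimal depth over all trees. Because "p_value" is requested, "no_of_nodes" should be computed as well, as described above.

```r
library(randomForest)
library(randomForestExplainer)

set.seed(42)  # make the illustrative forest reproducible

# Regression forest; localImp = TRUE is required by measure_importance
forest_reg <- randomForest(mpg ~ ., data = mtcars, localImp = TRUE, ntree = 300)

# Request only some of the regression measures and compute mean minimal
# depth over all trees; asking for "p_value" also brings in "no_of_nodes"
imp_reg <- measure_importance(forest_reg,
                              mean_sample = "all_trees",
                              measures = c("mean_min_depth", "mse_increase", "p_value"))
print(imp_reg)
```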
A data frame with rows corresponding to variables and columns corresponding to the various measures of their importance.
library(randomForestExplainer)
forest <- randomForest::randomForest(Species ~ ., data = iris, localImp = TRUE, ntree = 300)
measure_importance(forest)
#> variable mean_min_depth no_of_nodes accuracy_decrease gini_decrease
#> 1 Petal.Length 0.8993289 796 0.332686952 46.623867
#> 2 Petal.Width 1.1048546 721 0.272628994 39.578597
#> 3 Sepal.Length 2.2073714 499 0.039987377 10.525047
#> 4 Sepal.Width 3.2914989 348 0.009315119 2.478699
#> no_of_trees times_a_root p_value
#> 1 298 132 2.687474e-21
#> 2 296 107 8.694212e-10
#> 3 251 61 9.999961e-01
#> 4 218 0 1.000000e+00
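Since the result is an ordinary data frame, it can be post-processed with base R. The sketch below is an assumption about typical use rather than part of the package: it ranks variables by mean minimal depth (a smaller value means the variable tends to split closer to the root, i.e. is more important) and keeps those whose p_value falls below an illustrative 0.05 cutoff. The seed and the cutoff are arbitrary choices.

```r
library(randomForest)
library(randomForestExplainer)

set.seed(1)  # illustrative seed for reproducibility
forest <- randomForest(Species ~ ., data = iris, localImp = TRUE, ntree = 300)
imp <- measure_importance(forest)

# Rank variables: smaller mean minimal depth = used closer to the root
imp_sorted <- imp[order(imp$mean_min_depth), ]

# Keep variables below an illustrative 0.05 threshold on p_value
significant <- imp_sorted[imp_sorted$p_value < 0.05, "variable"]

print(imp_sorted)
print(significant)
```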