Get a data frame with various measures of importance of variables in a random forest

measure_importance(forest, mean_sample = "top_trees", measures = NULL)

Arguments

forest

A random forest produced by the function randomForest with option localImp = TRUE

mean_sample

The sample of trees on which mean minimal depth is calculated; possible values are "all_trees", "top_trees" and "relevant_trees"

measures

A vector of names of importance measures to be calculated; if NULL, all are calculated. If "p_value" is requested, "no_of_nodes" is calculated as well. Suitable measures for classification forests are: mean_min_depth, accuracy_decrease, gini_decrease, no_of_nodes, times_a_root. For regression forests choose from: mean_min_depth, mse_increase, node_purity_increase, no_of_nodes, times_a_root. When all measures are calculated, the result also contains the columns no_of_trees and p_value, as the output in Examples shows.
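The arguments above can be combined as in the following minimal sketch, which requests only three measures and averages minimal depth over all trees. The seed, the number of trees and the name imp_subset are illustrative assumptions, not part of the documented interface.

```r
library(randomForest)
library(randomForestExplainer)

# The seed is an arbitrary choice, used only to make repeated runs agree
set.seed(1)
forest <- randomForest(Species ~ ., data = iris, localImp = TRUE, ntree = 100)

# Ask for three measures only and average minimal depth over all trees
imp_subset <- measure_importance(
  forest,
  mean_sample = "all_trees",
  measures = c("mean_min_depth", "times_a_root", "gini_decrease")
)
print(imp_subset)
```

Requesting fewer measures should mainly be useful when the full table is not needed; the values themselves depend on the random seed.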

Value

A data frame with one row per variable and columns holding the various measures of variable importance

Examples

forest <- randomForest::randomForest(Species ~ ., data = iris, localImp = TRUE, ntree = 300)
measure_importance(forest)
#>       variable mean_min_depth no_of_nodes accuracy_decrease gini_decrease
#> 1 Petal.Length      0.8993289         796       0.332686952     46.623867
#> 2  Petal.Width      1.1048546         721       0.272628994     39.578597
#> 3 Sepal.Length      2.2073714         499       0.039987377     10.525047
#> 4  Sepal.Width      3.2914989         348       0.009315119      2.478699
#>   no_of_trees times_a_root      p_value
#> 1         298          132 2.687474e-21
#> 2         296          107 8.694212e-10
#> 3         251           61 9.999961e-01
#> 4         218            0 1.000000e+00
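Because the result is an ordinary data frame, it can be reordered with base R. The sketch below ranks variables by mean minimal depth, smallest first, so variables that tend to split closer to the root appear at the top. The seed, the number of trees and the names importance_frame and ranked are assumptions made for illustration, so the numbers will differ from the output above.

```r
library(randomForest)
library(randomForestExplainer)

# Arbitrary seed, used only so that repeated runs agree
set.seed(1)
forest <- randomForest(Species ~ ., data = iris, localImp = TRUE, ntree = 100)
importance_frame <- measure_importance(forest)

# Order rows so that the smallest mean minimal depth comes first
ranked <- importance_frame[order(importance_frame$mean_min_depth), ]
print(ranked[, c("variable", "mean_min_depth", "times_a_root", "p_value")])
```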