Compare machine learning models — champion

Determining if one model is better than the other one is a difficult task. Mostly because there is a lot of fields that have to be covered to make such a judgement. Overall performance, performance on the crucial subset, distribution of residuals, those are only few among many ideas related to that issue. Following function allow user to create a report based on various sections. Each says something different about relation between champion and challengers. DALEXtra package share 3 base sections which are funnel_measure overall_comparison and training_test_comparison but any object that has generic plot function can be included at report.

champion_challenger(
  sections,
  dot_size = 4,
  output_dir_path = getwd(),
  output_name = "Report",
  model_performance_table = FALSE,
  title = "ChampionChallenger",
  author = Sys.info()[["user"]],
  ...
)

Arguments

sections: - list of sections to be attached to report. Could be sections available with DALEXtra which are funnel_measure training_test_comparison, overall_comparison or any other explanation that can work with plot function. Please provide name for not standard sections, that will be presented as section titles. Otherwise class of the object will be used.
dot_size: - dot_size argument passed to plot.funnel_measure if funnel_measure section present
output_dir_path: - path to directory where Report should be created. By default it is current working directory.
output_name: - name of the Report. By default it is "Report"
model_performance_table: - If TRUE and overall_comparison section present, table of scores will be displayed.
title: - Title for report, by default it is "ChampionChallenger".
author: - Author of , report. By default it is current user name.
...: - other parameters passed to rmarkdown::render.

Value

rmarkdown report

Examples

# \donttest{
library("mlr")
#> Loading required package: ParamHelpers
#> Warning message: 'mlr' is in 'maintenance-only' mode since July 2019.
#> Future development will only happen in 'mlr3'
#> (<https://mlr3.mlr-org.com>). Due to the focus on 'mlr3' there might be
#> uncaught bugs meanwhile in {mlr} - please consider switching.
library("DALEXtra")
task <- mlr::makeRegrTask(
 id = "R",
  data = apartments,
   target = "m2.price"
 )
 learner_lm <- mlr::makeLearner(
   "regr.lm"
 )
 model_lm <- mlr::train(learner_lm, task)
 explainer_lm <- explain_mlr(model_lm, apartmentsTest, apartmentsTest$m2.price, label = "LM")
#> Preparation of a new explainer is initiated
#>   -> model label       :  LM 
#>   -> data              :  9000  rows  6  cols 
#>   -> target variable   :  9000  values 
#>   -> predict function  :  yhat.WrappedModel  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package mlr , ver. 2.19.0 , task regression (  default  ) 
#>   -> predicted values  :  numerical, min =  1792.597 , mean =  3506.836 , max =  6241.447  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -257.2555 , mean =  4.687686 , max =  472.356  
#>   A new explainer has been created!  

 learner_rf <- mlr::makeLearner(
 "regr.ranger"
 )
 model_rf <- mlr::train(learner_rf, task)
 explainer_rf <- explain_mlr(model_rf, apartmentsTest, apartmentsTest$m2.price, label = "RF")
#> Preparation of a new explainer is initiated
#>   -> model label       :  RF 
#>   -> data              :  9000  rows  6  cols 
#>   -> target variable   :  9000  values 
#>   -> predict function  :  yhat.WrappedModel  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package mlr , ver. 2.19.0 , task regression (  default  ) 
#>   -> predicted values  :  numerical, min =  1795.43 , mean =  3504.914 , max =  6237.863  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -573.2127 , mean =  6.609821 , max =  726.0512  
#>   A new explainer has been created!  

 learner_gbm <- mlr::makeLearner(
 "regr.gbm"
 )
 model_gbm <- mlr::train(learner_gbm, task)
 explainer_gbm <- explain_mlr(model_gbm, apartmentsTest, apartmentsTest$m2.price, label = "GBM")
#> Preparation of a new explainer is initiated
#>   -> model label       :  GBM 
#>   -> data              :  9000  rows  6  cols 
#>   -> target variable   :  9000  values 
#>   -> predict function  :  yhat.WrappedModel  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package mlr , ver. 2.19.0 , task regression (  default  ) 
#>   -> predicted values  :  numerical, min =  2118.951 , mean =  3502.078 , max =  6059.426  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -518.8356 , mean =  9.445597 , max =  735.4289  
#>   A new explainer has been created!  


 plot_data <- funnel_measure(explainer_lm, list(explainer_rf, explainer_gbm),
                          nbins = 5, measure_function = DALEX::loss_root_mean_square)
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |======================================================================| 100%

champion_challenger(list(plot_data), dot_size = 3, output_dir_path = tempdir())
# }