Generate a report after training
report(
train_output,
output_file = NULL,
output_format = "pdf_document",
output_dir = getwd(),
check_data = TRUE
)
The output of the `train()` function.
The output file name.
Format of the output file ('pdf_document', 'html_document' or both).
The directory where the report will be saved. Defaults to the working directory.
If TRUE, prints the results of the `check_data()` function.
The report is written to a local file. It contains a table of metrics for every model, a scatter plot for the chosen metric, and the results of the `check_data()` function, which describe the data used.
library(forester)
data('lisbon')
train_output <- train(lisbon, 'Price')
#> ✔ Type guessed as: regression
#>
#> -------------------- CHECK DATA REPORT --------------------
#>
#> The dataset has 246 observations and 17 columns, which names are:
#> Id; Condition; PropertyType; PropertySubType; Bedrooms; Bathrooms; AreaNet; AreaGross; Parking; Latitude; Longitude; Country; District; Municipality; Parish; Price.M2; Price;
#>
#> With the target value described by a column Price.
#>
#> ✖ Static columns are:
#> Country; District; Municipality;
#>
#> ✖ With dominating values:
#> Portugal; Lisboa; Lisboa;
#>
#> ✖ These column pairs are duplicate:
#> District - Municipality;
#>
#> ✔ No target values are missing.
#>
#> ✔ No predictor values are missing.
#>
#> ✔ No issues with dimensionality.
#>
#> ✖ Strongly correlated, by Spearman rank, pairs of numerical values are:
#>
#> Bedrooms - AreaNet: 0.77;
#> Bedrooms - AreaGross: 0.77;
#> Bathrooms - AreaNet: 0.78;
#> Bathrooms - AreaGross: 0.78;
#> AreaNet - AreaGross: 1;
#>
#> ✖ Strongly correlated, by Crammer's V rank, pairs of categorical values are:
#> PropertyType - PropertySubType: 1;
#>
#> ✖ These obserwation migth be outliers due to their numerical columns values:
#> 145 146 196 44 5 51 57 58 59 60 61 62 63 64 69 75 76 77 78 ;
#>
#> ✖ Target data is not evenly distributed with quantile bins: 0.25 0.35 0.14 0.26
#>
#> ✖ Columns names suggest that some of them are IDs, removing them can improve the model.
#> Suspicious columns are: Id .
#>
#> ✖ Columns data suggest that some of them are IDs, removing them can improve the model.
#> Suspicious columns are: Id .
#>
#> -------------------- CHECK DATA REPORT END --------------------
#>
#> ✔ Data preprocessed.
#> ✔ Data split and balanced.
#> ✔ Correct formats prepared.
#> ✔ Models successfully trained.
#> ✔ Predicted successfully.
report(train_output, 'regression.pdf')
#>
#>
#> processing file: report.Rmd
#>
|
| | 0%
|
|.... | 5%
#> inline R code fragments
#>
#>
|
|....... | 10%
#> label: setup (with options)
#> List of 1
#> $ include: logi FALSE
#>
#>
|
|.......... | 15%
#> inline R code fragments
#>
#>
|
|.............. | 20%
#> label: table
#>
|
|.................. | 25%
#> ordinary text without R code
#>
#>
|
|..................... | 30%
#> label: unnamed-chunk-1 (with options)
#> List of 1
#> $ echo: logi FALSE
#>
#>
|
|........................ | 35%
#> ordinary text without R code
#>
#>
|
|............................ | 40%
#> label: radar_plot (with options)
#> List of 1
#> $ out.width: chr "90%"
#>
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
#>
|
|................................ | 45%
#> ordinary text without R code
#>
#>
|
|................................... | 50%
#> label: boxplot (with options)
#> List of 1
#> $ out.width: chr "90%"
#>
#>
|
|...................................... | 55%
#> ordinary text without R code
#>
#>
|
|.......................................... | 60%
#> label: VS_plot (with options)
#> List of 1
#> $ out.width: chr "100%"
#>
#>
|
|.............................................. | 65%
#> inline R code fragments
#>
#>
|
|................................................. | 70%
#> label: plots_for_the_best_model (with options)
#> List of 1
#> $ out.width: chr "50%"
#>
#>
|
|.................................................... | 75%
#> inline R code fragments
#>
#>
|
|........................................................ | 80%
#> label: feature_importance
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
#>
|
|............................................................ | 85%
#> ordinary text without R code
#>
#>
|
|............................................................... | 90%
#> label: check_data (with options)
#> List of 1
#> $ results: chr "asis"
#>
#>
|
|.................................................................. | 95%
#> ordinary text without R code
#>
#>
|
|......................................................................| 100%
#> label: unnamed-chunk-2
#>
#> output file: report.knit.md
#> "C:/Program Files/RStudio/bin/quarto/bin/tools/pandoc" +RTS -K512m -RTS report.knit.md --to latex --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc206c4d7e4c.tex --lua-filter "C:\Users\AnnA\AppData\Local\R\win-library\4.2\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\AnnA\AppData\Local\R\win-library\4.2\rmarkdown\rmarkdown\lua\latex-div.lua" --self-contained --highlight-style tango --pdf-engine pdflatex --variable graphics --variable "geometry:margin=1in"
#>
#> Output created: regression.pdf
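The remaining arguments can be combined in the same way. The following is a minimal sketch, not verified against a specific forester version, that writes an HTML report to a separate folder and skips the `check_data()` section. The folder name `forester_reports` and the file name `regression.html` are illustrative assumptions. Only the argument names and the `'html_document'` value come from the usage section above.

```r
library(forester)

# Train as in the example above.
data('lisbon')
train_output <- train(lisbon, 'Price')

# Hypothetical output folder inside the session's temporary directory.
out_dir <- file.path(tempdir(), 'forester_reports')
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Write an HTML report and leave out the check_data() results.
report(
  train_output,
  output_file   = 'regression.html',
  output_format = 'html_document',
  output_dir    = out_dir,
  check_data    = FALSE
)
```

Writing to `tempdir()` keeps the working directory clean while experimenting. Point `output_dir` at a permanent location once the report should be kept.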