Summary method for xspliner object

# S3 method for xspliner
summary(object, predictor, ..., model = NULL,
  newdata = NULL, prediction_funs = list(function(object, newdata)
  predict(object, newdata)), env = parent.frame())

Arguments

object	xspliner object
predictor	predictor for xspliner model formula
...	Other arguments passed to the model-specific method.
model	Original black box model. Providing it enables comparison of the surrogate and the original model. See details.
newdata	Data used for model comparison. By default, the training data used to build the black box.
prediction_funs	List of prediction functions for the surrogate and black box models. For classification problems, different statistics are displayed depending on the prediction type. See the details section for more info.
env	Environment in which newdata is stored (if not provided as parameter).
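
As an illustration of the function(object, newdata) form expected in prediction_funs, the sketch below defines one function returning response probabilities and one returning class labels. It uses a plain stats::glm fitted on mtcars in place of a real surrogate or black box; the names predict_prob and predict_class and the 0.5 cutoff are assumptions made for this example, not part of xspliner.

```r
# Toy binary classifier standing in for a surrogate or black box model.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Hypothetical prediction function returning response probabilities.
predict_prob <- function(object, newdata) {
  predict(object, newdata, type = "response")
}

# Hypothetical prediction function returning class labels (0/1),
# using an assumed cutoff of 0.5.
predict_class <- function(object, newdata) {
  as.integer(predict_prob(object, newdata) > 0.5)
}

probs   <- predict_prob(fit, mtcars)
classes <- predict_class(fit, mtcars)
```

Functions of this shape could then be collected in a list and passed as prediction_funs; whether the output is probabilities or labels determines which classification statistics apply (see details).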

Details

The summary output depends on which arguments are provided.

When only the xspliner model (the object parameter) is provided, the default glm summary output (summary.glm) is returned.

Providing both the xspliner model and predictor returns summary details for the selected variable. The following rules apply:

When the variable is quantitative and was transformed with a fitted spline, the output contains the approximation details.
When the variable is qualitative and was transformed, the factor matching is displayed.
When the variable was not transformed, the glm summary output is displayed for the model.

If both the object parameter and model (the original black box) are provided, the summary compares the original and surrogate models. The following rules apply ($y_{s}$ and $y_{o}$ are the predictions of the surrogate and the original model, respectively, on the provided dataset). A comparison statistic close to 1 means the surrogate model is similar to the black box (according to that statistic).

For regression models:

1 - Maximum predictions normed-difference $$1 - \frac{\max_{i = 1}^{n} |y_{s}^{(i)} - y_{o}^{(i)}|}{\max_{i = 1}^{n} y_{o}^{(i)} - \min_{i = 1}^{n} y_{o}^{(i)}}$$
R^2 (https://christophm.github.io/interpretable-ml-book/global.html#theory-4) $$1 - \frac{\sum_{i = 1}^{n} ({y_{s}^{(i)} - y_{o}^{(i)}}) ^ {2}}{\sum_{i = 1}^{n} ({y_{o}^{(i)} - \overline{y_{o}}}) ^ {2}}$$
Mean squared error of each model.
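
The regression statistics above can be recomputed by hand from plain numeric vectors. The following is a minimal sketch rather than the package's own code: compare_regression is a hypothetical helper, and it assumes the mean squared errors are measured against the observed response (y_true), which the text does not state explicitly.

```r
# Hypothetical helper (not part of xspliner): regression comparison
# statistics for surrogate (y_s) and black box (y_o) predictions.
compare_regression <- function(y_s, y_o, y_true) {
  stopifnot(length(y_s) == length(y_o), length(y_o) == length(y_true))
  c(
    # 1 - max absolute difference, normed by the range of y_o
    max_normed_diff = 1 - max(abs(y_s - y_o)) / (max(y_o) - min(y_o)),
    # R^2 of the surrogate with respect to the black box predictions
    r_squared       = 1 - sum((y_s - y_o)^2) / sum((y_o - mean(y_o))^2),
    # MSE of each model against the observed response (assumption)
    mse_black_box   = mean((y_o - y_true)^2),
    mse_surrogate   = mean((y_s - y_true)^2)
  )
}

y_true <- c(1.0, 2.0, 3.0, 4.0)   # observed response (toy data)
y_o    <- c(1.1, 1.9, 3.2, 3.8)   # black box predictions
y_s    <- c(1.0, 2.1, 3.0, 4.0)   # surrogate predictions

stats <- compare_regression(y_s, y_o, y_true)
print(round(stats, 4))
```

On this toy data the largest prediction gap is 0.2 over a black box range of 2.7, so the first statistic is 1 - 0.2 / 2.7, roughly 0.926.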

For classification models the result depends on the prediction type. When predictions are class levels:

Mean predictions similarity $$\frac{1}{n} \sum_{i = 1}^{n} I_{y_{s}^{(i)} = y_{o}^{(i)}}$$
Accuracy of each model.
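
For class-level predictions, the two statistics reduce to simple proportions. The sketch below uses a hypothetical helper, compare_classes, on toy character vectors; it assumes accuracy is computed against the observed labels (y_true).

```r
# Hypothetical helper (not part of xspliner): comparison statistics
# for predicted class levels of surrogate (y_s) and black box (y_o).
compare_classes <- function(y_s, y_o, y_true) {
  # Compare as character so factors with different level sets work too.
  y_s <- as.character(y_s)
  y_o <- as.character(y_o)
  y_true <- as.character(y_true)
  c(
    mean_similarity = mean(y_s == y_o),     # share of matching predictions
    acc_black_box   = mean(y_o == y_true),  # accuracy of the black box
    acc_surrogate   = mean(y_s == y_true)   # accuracy of the surrogate
  )
}

y_true <- c("a", "b", "b", "a", "b")  # observed labels (toy data)
y_o    <- c("a", "b", "a", "a", "b")  # black box predictions
y_s    <- c("a", "b", "a", "b", "b")  # surrogate predictions

class_stats <- compare_classes(y_s, y_o, y_true)
print(class_stats)
```

Here the two models agree on four of five observations, giving a mean predictions similarity of 0.8.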

When predictions are response probabilities:

R^2 as for regression model.
1 - Maximum ROC difference $$1 - \max_{t \in T} ||ROC_{o}(t) - ROC_{s}(t)||_{2}$$ Calculates the maximum Euclidean distance between ROC points over the threshold set T. In this implementation, T is the union of the breakpoints of both ROC curves.
1 - Mean ROC difference: the same statistic as above, using the mean instead of the maximum.
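
The ROC-based statistics can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: roc_point and roc_difference are hypothetical helpers, an observation is predicted positive when its score is at least the threshold, and the breakpoints of each curve are taken to be the unique predicted probabilities.

```r
# ROC point (false positive rate, true positive rate) at threshold t.
# Assumption: predict positive when score >= t; labels is logical.
roc_point <- function(scores, labels, t) {
  pred <- scores >= t
  c(fpr = sum(pred & !labels) / sum(!labels),
    tpr = sum(pred & labels) / sum(labels))
}

# Hypothetical helper: 1 - max and 1 - mean Euclidean distance between
# the ROC points of black box (p_o) and surrogate (p_s) probabilities.
roc_difference <- function(p_s, p_o, labels) {
  # T: union of breakpoints of both curves (assumed to be the scores)
  thresholds <- sort(unique(c(p_s, p_o)))
  dists <- vapply(thresholds, function(t) {
    sqrt(sum((roc_point(p_o, labels, t) - roc_point(p_s, labels, t))^2))
  }, numeric(1))
  c(max_roc_diff = 1 - max(dists), mean_roc_diff = 1 - mean(dists))
}

labels <- c(TRUE, TRUE, FALSE, FALSE)  # observed classes (toy data)
p_o    <- c(0.9, 0.6, 0.4, 0.1)        # black box probabilities
p_s    <- c(0.8, 0.3, 0.4, 0.2)        # surrogate probabilities

roc_stats <- roc_difference(p_s, p_o, labels)
print(round(roc_stats, 4))
```

With seven thresholds in T, four of them yield a distance of 0.5 and three a distance of 0, so the maximum version gives 0.5 and the mean version gives 1 - 2/7.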

Examples

library(xspliner)
library(randomForest)
set.seed(1)
data <- iris
# regression model
iris.rf <- randomForest(Petal.Width ~  Sepal.Length + Petal.Length + Species, data = data)
iris.xs <- xspline(iris.rf)
# Summary of quantitative variable transition
summary(iris.xs, "Sepal.Length")
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> yhat ~ s(Sepal.Length)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 1.205104   0.002266   531.8   <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Approximate significance of smooth terms:
#>                   edf Ref.df     F p-value    
#> s(Sepal.Length) 5.784  6.939 313.4  <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> R-sq.(adj) =  0.985   Deviance explained = 98.7%
#> GCV = 0.00022296  Scale est. = 0.00017975  n = 35
# Summary of qualitative variable transition
summary(iris.xs, "Species")
#>         orig       pred
#> 1     setosa     setosa
#> 2 versicolor versicolor
#> 3  virginica  virginica
# Comparing surrogate with original model (regression)
summary(iris.xs, model = iris.rf, newdata = data)
#> Models comparison 
#>   1 - Max prediction normed-diff:  0.8840345 
#>   R^2:  0.9960708 
#>   MSE Black Box:  0.02251746 
#>   MSE Surrogate:  0.02989727 

# Classification model
