Summary method for xspliner object

# S3 method for xspliner
summary(object, predictor, ..., model = NULL,
  newdata = NULL, prediction_funs = list(function(object, newdata)
  predict(object, newdata)), env = parent.frame())

Arguments

object

xspliner object

predictor

predictor for xspliner model formula

...

Other arguments passed to the model-specific method.

model

Original black box model. Providing it enables model comparison. See Details.

newdata

Data used for model comparison. By default, the training data used to build the black box model.

prediction_funs

List of prediction functions for the surrogate and black box models. For classification problems, different statistics are displayed depending on the prediction type. See the Details section for more information.

env

Environment in which newdata is stored (used when newdata is not provided as a parameter).

Details

The summary output depends on which arguments are provided.

When only the xspliner model (object parameter) is provided, the default glm summary output is returned.

Providing both the xspliner model and predictor returns summary details for the selected variable. The following points describe the rules:

  • When the variable was quantitative and transformed with a fitted spline, the output contains the approximation details.

  • When the variable was qualitative and transformed, the factor matching is displayed.

  • When the variable was not transformed, the glm summary output is displayed for the model.

If both the object parameter and model (the original black box) are provided, the summary displays a comparison of the original and surrogate models. The following points describe the rules (\(y_{s}\) and \(y_{o}\) are the predictions of the surrogate and original model, respectively, on the provided dataset). A comparison statistic close to 1 means the surrogate model is similar to the black box model (according to that statistic).

For regression models:

  • 1 - Maximum predictions normed-difference $$1 - \frac{\max_{i = 1}^{n} |y_{s}^{(i)} - y_{o}^{(i)}|}{\max_{i = 1}^{n} y_{o}^{(i)} - \min_{i = 1}^{n} y_{o}^{(i)}}$$

  • R^2 (https://christophm.github.io/interpretable-ml-book/global.html#theory-4) $$1 - \frac{\sum_{i = 1}^{n} ({y_{s}^{(i)} - y_{o}^{(i)}}) ^ {2}}{\sum_{i = 1}^{n} ({y_{o}^{(i)} - \overline{y_{o}}}) ^ {2}}$$

  • Mean square errors for each model.
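The regression statistics above can be sketched in base R. This is a minimal illustration of the formulas, not the package's internal code: compare_regression is a hypothetical helper, and computing the mean square errors against an observed response y_true is an assumption, since the text does not say what the errors are measured against.

```r
# Hypothetical helper (not part of xspliner): evaluates the formulas above
# for surrogate predictions y_s and black box predictions y_o.
compare_regression <- function(y_s, y_o, y_true = NULL) {
  stopifnot(length(y_s) == length(y_o))
  out <- list(
    # 1 - maximum absolute difference, normed by the range of y_o
    max_diff = 1 - max(abs(y_s - y_o)) / (max(y_o) - min(y_o)),
    # R^2 of the surrogate predictions with respect to the black box ones
    r2 = 1 - sum((y_s - y_o)^2) / sum((y_o - mean(y_o))^2)
  )
  if (!is.null(y_true)) {
    # Assumption: MSE of each model is measured against the observed response
    out$mse_black_box <- mean((y_o - y_true)^2)
    out$mse_surrogate <- mean((y_s - y_true)^2)
  }
  out
}

y_o <- c(1, 2, 3, 4, 5)            # black box predictions (toy values)
y_s <- c(1.1, 1.9, 3.2, 4.0, 4.8)  # surrogate predictions (toy values)
res <- compare_regression(y_s, y_o)
print(res)
```

Both statistics equal 1 when the surrogate reproduces the black box predictions exactly, and decrease as the two sets of predictions diverge.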

For classification models the result depends on the prediction type. When predictions are class levels:

  • Mean predictions similarity $$\frac{1}{n} \sum_{i = 1}^{n} I_{y_{s}^{(i)} = y_{o}^{(i)}}$$

  • Accuracy of each model.
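For class-level predictions, the two statistics reduce to proportions of matching labels. The sketch below uses toy vectors; measuring accuracy against observed labels y_true is an assumption made here for illustration.

```r
# Toy class predictions; all three vectors are illustrative, not package output.
y_o    <- c("a", "b", "b", "a", "b")  # black box predictions
y_s    <- c("a", "b", "a", "a", "b")  # surrogate predictions
y_true <- c("a", "b", "b", "b", "b")  # observed labels (assumed available)

# Mean predictions similarity: share of observations where both models agree
similarity <- mean(y_s == y_o)

# Accuracy of each model against the observed labels
acc_black_box <- mean(y_o == y_true)
acc_surrogate <- mean(y_s == y_true)

print(c(similarity = similarity,
        acc_black_box = acc_black_box,
        acc_surrogate = acc_surrogate))
```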

When predictions are response probabilities:

  • R^2 as for regression model.

  • 1 - Maximum ROC difference $$1 - \max_{t \in T} ||ROC_{o}(t) - ROC_{s}(t)||_{2}$$ Calculates the maximum of the Euclidean distances between ROC points over the specified threshold set T. In this implementation T is the union of the breakpoints of each ROC curve.

  • 1 - Mean ROC difference. Same as the statistic above, using the mean instead of the maximum.
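The ROC-based statistics can be illustrated with a short base R sketch. roc_point and roc_differences are hypothetical helpers, not package functions. The sketch assumes that a ROC point at threshold t is the pair (false positive rate, true positive rate) obtained by predicting the positive class when the probability is at least t, and that the breakpoints of a ROC curve are the distinct predicted probabilities; the package may define these differently.

```r
# Hypothetical helper: ROC point (FPR, TPR) at threshold t.
# prob: predicted probabilities, label: logical vector of true classes.
roc_point <- function(prob, label, t) {
  pred <- prob >= t
  c(fpr = sum(pred & !label) / sum(!label),
    tpr = sum(pred & label) / sum(label))
}

# Hypothetical helper: 1 - max and 1 - mean Euclidean distance between
# the ROC points of the original (p_o) and surrogate (p_s) models.
roc_differences <- function(p_s, p_o, label) {
  # Assumption: T is the union of distinct predicted probabilities
  thresholds <- sort(unique(c(p_s, p_o)))
  dists <- vapply(thresholds, function(t) {
    sqrt(sum((roc_point(p_o, label, t) - roc_point(p_s, label, t))^2))
  }, numeric(1))
  c(max_diff = 1 - max(dists), mean_diff = 1 - mean(dists))
}

label <- c(TRUE, TRUE, FALSE, FALSE)  # toy true classes
p_o   <- c(0.9, 0.8, 0.3, 0.1)        # black box probabilities (toy values)
p_s   <- c(0.9, 0.2, 0.3, 0.1)        # surrogate probabilities (toy values)
res_roc <- roc_differences(p_s, p_o, label)
print(res_roc)
```

When both models produce identical probabilities, every distance is zero and both statistics equal 1.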

Examples

library(xspliner)
library(randomForest)
set.seed(1)
data <- iris
# regression model
iris.rf <- randomForest(Petal.Width ~ Sepal.Length + Petal.Length + Species, data = data)
iris.xs <- xspline(iris.rf)
# Summary of quantitative variable transition
summary(iris.xs, "Sepal.Length")
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> yhat ~ s(Sepal.Length)
#>
#> Parametric coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.205104   0.002266   531.8   <2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Approximate significance of smooth terms:
#>                   edf Ref.df     F p-value
#> s(Sepal.Length) 5.784  6.939 313.4  <2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> R-sq.(adj) = 0.985 Deviance explained = 98.7%
#> GCV = 0.00022296 Scale est. = 0.00017975 n = 35
# Summary of qualitative variable transition
summary(iris.xs, "Species")
#>         orig       pred
#> 1     setosa     setosa
#> 2 versicolor versicolor
#> 3  virginica  virginica
# Comparing surrogate with original model (regression)
summary(iris.xs, model = iris.rf, newdata = data)
#> Models comparison
#> 1 - Max prediction normed-diff: 0.8840345
#> R^2: 0.9960708
#> MSE Black Box: 0.02251746
#> MSE Surrogate: 0.02989727
# Classification model