Estimates the partial dependence function of feature(s) v over a
grid of values. Both multivariate and multivariable situations are supported.
The resulting object can be plotted via plot().
partial_dep(object, ...)
# Default S3 method
partial_dep(
object,
v,
X,
pred_fun = stats::predict,
BY = NULL,
by_size = 4L,
grid = NULL,
grid_size = 49L,
trim = c(0.01, 0.99),
strategy = c("uniform", "quantile"),
na.rm = TRUE,
n_max = 1000L,
w = NULL,
...
)
# S3 method for class 'ranger'
partial_dep(
object,
v,
X,
pred_fun = NULL,
BY = NULL,
by_size = 4L,
grid = NULL,
grid_size = 49L,
trim = c(0.01, 0.99),
strategy = c("uniform", "quantile"),
na.rm = TRUE,
n_max = 1000L,
w = NULL,
survival = c("chf", "prob"),
...
)
# S3 method for class 'explainer'
partial_dep(
object,
v,
X = object[["data"]],
pred_fun = object[["predict_function"]],
BY = NULL,
by_size = 4L,
grid = NULL,
grid_size = 49L,
trim = c(0.01, 0.99),
strategy = c("uniform", "quantile"),
na.rm = TRUE,
n_max = 1000L,
w = object[["weights"]],
...
)Fitted model object.
Additional arguments passed to pred_fun(object, X, ...),
for instance type = "response" in a glm() model, or reshape = TRUE in a
multiclass XGBoost model.
One or more column names over which you want to calculate the partial dependence.
A data.frame or matrix serving as background dataset.
Prediction function of the form function(object, X, ...),
providing \(K \ge 1\) predictions per row. Its first argument represents the
model object, its second argument a data structure like X. Additional arguments
(such as type = "response" in a GLM, or reshape = TRUE in a multiclass XGBoost
model) can be passed via .... The default, stats::predict(), will work in
most cases.
Optional grouping vector or column name. The partial dependence
function is calculated per BY group. Each BY group
uses the same evaluation grid to improve assessment of (non-)additivity.
Numeric BY variables with more than by_size disjoint values will be
binned into by_size quantile groups of similar size. To improve robustness,
subsampling of X is done within group. This only applies to BY groups with
more than n_max rows.
Numeric BY variables with more than by_size unique values will
be binned into quantile groups. Only relevant if BY is not NULL.
Evaluation grid. A vector (if length(v) == 1L), or a matrix/data.frame
otherwise. If NULL, calculated via multivariate_grid().
Controls the approximate grid size. If x has p columns, then each
(non-discrete) column will be reduced to about the p-th root of grid_size values.
The default c(0.01, 0.99) means that values outside the
1% and 99% quantiles of non-discrete numeric columns are removed before calculation
of grid values. Set to 0:1 for no trimming.
How to find grid values of non-discrete numeric columns?
Either "uniform" or "quantile", see description of univariate_grid().
Should missing values be dropped from the grid? Default is TRUE.
If X has more than n_max rows, a random sample of n_max rows is
selected from X. In this case, set a random seed for reproducibility.
Optional vector of case weights. Can also be a column name of X.
Should cumulative hazards ("chf", default) or survival
probabilities ("prob") per time be predicted? Only in ranger() survival models.
An object of class "partial_dep" containing these elements:
data: data.frame containing the partial dependencies.
v: Same as input v.
K: Number of columns of prediction matrix.
pred_names: Column names of prediction matrix.
by_name: Column name of grouping variable (or NULL).
partial_dep(default): Default method.
partial_dep(ranger): Method for "ranger" models.
partial_dep(explainer): Method for DALEX "explainer".
Let \(F: R^p \to R\) denote the prediction function that maps the \(p\)-dimensional feature vector \(\mathbf{x} = (x_1, \dots, x_p)\) to its prediction. Furthermore, let $$ F_s(\mathbf{x}_s) = E_{\mathbf{x}_{\setminus s}}(F(\mathbf{x}_s, \mathbf{x}_{\setminus s})) $$ be the partial dependence function of \(F\) on the feature subset \(\mathbf{x}_s\), where \(s \subseteq \{1, \dots, p\}\), as introduced in Friedman (2001). Here, the expectation runs over the joint marginal distribution of features \(\mathbf{x}_{\setminus s}\) not in \(\mathbf{x}_s\).
Given data, \(F_s(\mathbf{x}_s)\) can be estimated by the empirical partial dependence function
$$ \hat F_s(\mathbf{x}_s) = \frac{1}{n} \sum_{i = 1}^n F(\mathbf{x}_s, \mathbf{x}_{i\setminus s}), $$ where \(\mathbf{x}_{i\setminus s}\) \(i = 1, \dots, n\), are the observed values of \(\mathbf{x}_{\setminus s}\).
A partial dependence plot (PDP) plots the values of \(\hat F_s(\mathbf{x}_s)\) over a grid of evaluation points \(\mathbf{x}_s\).
Friedman, Jerome H. "Greedy Function Approximation: A Gradient Boosting Machine." Annals of Statistics 29, no. 5 (2001): 1189-1232.
# MODEL 1: Linear regression
fit <- lm(Sepal.Length ~ . + Species * Petal.Length, data = iris)
(pd <- partial_dep(fit, v = "Species", X = iris))
#> Partial dependence object (3 rows). Extract via $data. Top rows:
#>
#> Species y
#> 1 setosa 5.476540
#> 2 versicolor 5.748857
#> 3 virginica 5.207491
plot(pd)
if (FALSE) { # \dontrun{
# Stratified by BY variable (numerics are automatically binned)
pd <- partial_dep(fit, v = "Species", X = iris, BY = "Petal.Length")
plot(pd)
# Multivariable input
v <- c("Species", "Petal.Length")
pd <- partial_dep(fit, v = v, X = iris, grid_size = 100L)
plot(pd, rotate_x = TRUE)
plot(pd, d2_geom = "line") # often better to read
# With grouping
pd <- partial_dep(fit, v = v, X = iris, grid_size = 100L, BY = "Petal.Width")
plot(pd, rotate_x = TRUE)
plot(pd, rotate_x = TRUE, d2_geom = "line")
plot(pd, rotate_x = TRUE, d2_geom = "line", swap_dim = TRUE)
# MODEL 2: Multi-response linear regression
fit <- lm(as.matrix(iris[, 1:2]) ~ Petal.Length + Petal.Width * Species, data = iris)
pd <- partial_dep(fit, v = "Petal.Width", X = iris, BY = "Species")
plot(pd, show_points = FALSE)
pd <- partial_dep(fit, v = c("Species", "Petal.Width"), X = iris)
plot(pd, rotate_x = TRUE)
plot(pd, d2_geom = "line", rotate_x = TRUE)
plot(pd, d2_geom = "line", rotate_x = TRUE, swap_dim = TRUE)
# Multivariate, multivariable, and BY (no plot available)
pd <- partial_dep(
fit, v = c("Petal.Width", "Petal.Length"), X = iris, BY = "Species"
)
pd
} # }
# MODEL 3: Gamma GLM -> pass options to predict() via ...
fit <- glm(Sepal.Length ~ ., data = iris, family = Gamma(link = log))
plot(partial_dep(fit, v = "Petal.Length", X = iris), show_points = FALSE)
plot(partial_dep(fit, v = "Petal.Length", X = iris, type = "response"))