Creating custom explainers • survex

The survex package strives to include the functionality for the automatic creation of explainers for as many survival models as possible. However, it is impossible to include them all because of the great variety of available packages and models. We provide the functionality to create an explainer for any survival model manually.

Automatic explainer creation

In the best case, creating an explainer for your desired model is already implemented. This means that everything can be extracted from the model object (sometimes, you need to set additional parameters). For example, this is the case for the proportional hazards model from the survival package:

library(survex)
library(survival)

cph <- coxph(Surv(time, status) ~ ., data = veteran, model = TRUE, x = TRUE)

auto_cph_explainer <- explain(cph)
#> Preparation of a new explainer is initiated 
#>   -> model label       :  coxph (  default  ) 
#>   -> data              :  137  rows  6  cols (  extracted from the model  ) 
#>   -> target variable   :  137  values ( 128 events and 9 censored , censoring rate = 0.066 ) (  extracted from the model  ) 
#>   -> times             :  50 unique time points , min = 1.5 , median survival time = 80 , max = 999 
#>   -> times             :  (  generated from y as uniformly distributed survival quantiles based on Kaplan-Meier estimator  ) 
#>   -> predict function  :  predict.coxph with type = 'risk' will be used (  default  ) 
#>   -> predict survival function  :  predictSurvProb.coxph will be used (  default  ) 
#>   -> predict cumulative hazard function  :  -log(predict_survival_function) will be used (  default  ) 
#>   -> model_info        :  package survival , ver. 3.7.0 , task survival (  default  ) 
#>   A new explainer has been created!

It can be seen that the only required parameter of the explain() function is the proportional hazards model itself. However, we needed to set model = TRUE, x = TRUE while creating it. If you forget to set these arguments, an error will be shown.

Manual explainer creation

The next base case is when all types of predictions can be made using your desired model. It is then possible to manually create an explainer using the explain_survival() function. Let’s look at an example - how to set all parameters of a coxph explainer by hand:

cph <- coxph(Surv(time, status) ~ ., data=veteran)

# data must not include the target columns
veteran_data <- veteran[, -c(3,4)]
veteran_y <- Surv(veteran$time, veteran$status)

# set prediction functions of the required format
risk_pred <- function(model, newdata) predict(model, newdata, type = "risk")
surv_pred <- function(model, newdata, times) pec::predictSurvProb(model, newdata, times)
chf_pred <- function(model, newdata, times) -log(surv_pred(model, newdata, times))

manual_cph_explainer <- explain_survival(model = cph,
                                         data = veteran_data,
                                         y = veteran_y,
                                         predict_function = risk_pred,
                                         predict_survival_function = surv_pred,
                                         predict_cumulative_hazard_function = chf_pred,
                                         label="manual coxph")
#> Preparation of a new explainer is initiated 
#>   -> model label       :  manual coxph 
#>   -> data              :  137  rows  6  cols 
#>   -> target variable   :  137  values ( 128 events and 9 censored ) 
#>   -> times             :  50 unique time points , min = 1.5 , median survival time = 80 , max = 999 
#>   -> times             :  (  generated from y as uniformly distributed survival quantiles based on Kaplan-Meier estimator  ) 
#>   -> predict function  :  risk_pred 
#>   -> predict survival function  :  surv_pred 
#>   -> predict cumulative hazard function  :  chf_pred 
#>   -> model_info        :  package survival , ver. 3.7.0 , task survival (  default  ) 
#>   A new explainer has been created!

Helpful utility functions

Sometimes, the survival model provides only one type of prediction, for example, the survival function. This package provides helpful utilities for converting between types of prediction and standardizing them for explainer creation.

For example, the random survival forest from the randomForestSRC package only provides the survival function predictions at the unique times from the training data set (predict.rsfrc()$survival). Let’s try to create an explainer for it using the utility function for converting this type of prediction to a step function that can be used for prediction at any time points.

surv_pred_rsf <- transform_to_stepfunction(predict,
                                           type="survival",
                                           prediction_element = "survival",
                                           times_element = "time.interest")

Because the predict.rsfrc() returns a list with the times at which the survival function was evaluated, we provide the name of the list element which contains the prediction and the evaluation times. If in your case, the prediction function returns only a matrix, you can pass the times at which the predictions were evaluated in the eval_times argument.

We could use the same utility to get the cumulative hazard predictions (commented code below). Still, to demonstrate another function, we make use of the mathematical relationship between the survival function ($\hat{S}$) and the cumulative hazard function ($\hat{H}$), i.e., $\hat{H} = \exp(-\hat{S})$.

# would also work 
# chf_pred_rsf <- transform_to_stepfunction(predict,
#                                           type="chf",
#                                           prediction_element = "chf",
#                                           times_element = "time.interest")

chf_pred_rsf <- function(model, newdata, times) {
  survival_to_cumulative_hazard(surv_pred_rsf(model, newdata, times))
}

The reverse utility (cumulative_hazard_to_survival()) also exists.

If no risk prediction is provided for your model, you can use a utility to sum the cumulative hazard function for each observation to achieve risk scores. This approach is recommended by Ishwaran et al. (1) and Sonabend et al. (2).

times <- unique(veteran$times)
risk_pred_rsf <- risk_from_chf(chf_pred_rsf, times)

Now that we have all predictions prepared, we can finally create the explainer:

library(randomForestSRC)
#> 
#>  randomForestSRC 3.2.3 
#>  
#>  Type rfsrc.news() to see new features, changes, and bug fixes. 
#>

rsf <- rfsrc(Surv(time, status) ~ ., data = veteran)

manual_rsf_explainer <- explain_survival(model = rsf,
                                         data = veteran_data,
                                         y = veteran_y,
                                         predict_function = risk_pred_rsf,
                                         predict_survival_function = surv_pred_rsf,
                                         predict_cumulative_hazard_function = chf_pred_rsf,
                                         label = "manual rsf")
#> Preparation of a new explainer is initiated 
#>   -> model label       :  manual rsf 
#>   -> data              :  137  rows  6  cols 
#>   -> target variable   :  137  values ( 128 events and 9 censored ) 
#>   -> times             :  50 unique time points , min = 1.5 , median survival time = 80 , max = 999 
#>   -> times             :  (  generated from y as uniformly distributed survival quantiles based on Kaplan-Meier estimator  ) 
#>   -> predict function  :  risk_pred_rsf 
#>   -> predict survival function  :  surv_pred_rsf 
#>   -> predict cumulative hazard function  :  chf_pred_rsf 
#>   -> model_info        :  package randomForestSRC , ver. 3.2.3 , task survival (  default  ) 
#>   A new explainer has been created!

References

[1] Ishwaran, Hemant, et al. “Random survival forests.” The Annals of Applied Statistics 2.3 (2008): 841-860.
[2] Sonabend, Raphael, et al. “Avoiding C-hacking when evaluating survival distribution predictions with discrimination measures”, Bioinformatics 38.17 (2022): 4178–4184.