mergeFactors.formula

Method for mergeFactors() when the first argument is a formula.

# S3 method for formula
mergeFactors(response, factor, ..., data = NULL,
  weights = NULL, family = "gaussian", method = "fast-adaptive",
  abbreviate = TRUE)

Arguments

response	Formula containing column names from the `data` argument.
factor	A factor `vector` when the `response` argument is used; otherwise the name of the column from the `data` argument whose levels should be merged.
...	Other arguments corresponding to the type of the first argument.
data	A data frame to be used for modeling.
weights	A `vector` of weights, optional when the `response` argument is used. For more information see: lm, glm, coxph.
family	Model family to be used in merging. Available models are: `"gaussian"`, `"survival"`, `"binomial"`. By default `mergeFactors` uses the `"gaussian"` model.
method	A string specifying the method used during merging. Four methods are available:
`method = "adaptive"`. The objective function maximized throughout the procedure is the log-likelihood. The set of pairs eligible for merging contains all possible pairs of groups available at a given step. Pairwise LRT distances are recalculated at every step. This option is the slowest, since it requires the largest number of comparisons: O(k^3) model evaluations, where k is the initial number of groups.
`method = "fast-adaptive"`. For the Gaussian family of response, the groups are first ordered by increasing averages, and then only pairs of closest groups are compared. For other families the order corresponds to beta coefficients in a regression model. This option is much faster than `method = "adaptive"` and requires O(k^2) model evaluations.
`method = "fixed"`. This option is based on the DMR algorithm introduced in Proch. It was extended to cover survival models. The main difference from `method = "adaptive"` is that pairwise distances between all groups are calculated only once, in the first step, based on the LRT statistic. An agglomerative clustering algorithm is then used to merge consecutive pairs. Pairwise model differences are therefore not recalculated as LRT statistics at every step; `complete linkage` is used instead. This option is very fast and requires O(k^2) comparisons.
`method = "fast-fixed"`. This option may be considered a modification of `method = "fixed"`. As in the `fast-adaptive` version, we assume that if groups A, B and C are sorted by increasing beta coefficients, then the distance between A and B and the distance between B and C are not greater than the distance between A and C. This assumption makes it possible to implement `complete linkage` clustering more efficiently, in a dynamic manner. The main difference is that in the first step we do not calculate the whole matrix of pairwise differences, only the differences between consecutive groups. Then only a single distance is calculated in each step. This reduces the number of model evaluations to O(n).
The default option is `"fast-adaptive"`.
abbreviate	Logical. If `TRUE` (the default), factor level names are abbreviated.
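
The difference between the candidate sets described under `method` can be illustrated with a small base R sketch for the Gaussian case. This is not the package's implementation: the data, the helper `lrt_merge()`, and all variable names are hypothetical, and the sketch only shows one step (ordering groups by mean, restricting comparisons to neighbouring groups, and computing an LRT statistic for each candidate merge).

```r
set.seed(1)
k <- 5  # initial number of groups

# Hypothetical data: k groups, some with nearly identical means
df <- data.frame(
  group = factor(rep(LETTERS[1:k], each = 20)),
  y = rnorm(20 * k, mean = rep(c(0, 0.1, 1, 1.1, 3), each = 20))
)

# Order the groups by increasing average response
ordered_levels <- names(sort(tapply(df$y, df$group, mean)))

# "adaptive"-style candidates: every pair of groups
all_pairs <- t(combn(ordered_levels, 2))

# "fast-adaptive"-style candidates: only neighbouring groups in the ordering
adjacent_pairs <- cbind(head(ordered_levels, -1), tail(ordered_levels, -1))

# LRT statistic for merging groups a and b: twice the drop in
# log-likelihood between the full model and the model with a and b merged
lrt_merge <- function(data, a, b) {
  merged <- as.character(data$group)
  merged[merged %in% c(a, b)] <- paste0(a, b)
  data$merged_group <- factor(merged)
  full <- lm(y ~ group, data = data)
  reduced <- lm(y ~ merged_group, data = data)
  as.numeric(2 * (logLik(full) - logLik(reduced)))
}

# Evaluate only the k - 1 neighbouring pairs
adjacent_stats <- mapply(
  function(a, b) lrt_merge(df, a, b),
  adjacent_pairs[, 1], adjacent_pairs[, 2]
)

# The pair with the smallest statistic is the first candidate to merge
best_pair <- adjacent_pairs[which.min(adjacent_stats), ]

cat("All pairs:", nrow(all_pairs), " Adjacent pairs:", nrow(adjacent_pairs), "\n")
print(best_pair)
```

With k groups, the full candidate set has k(k-1)/2 pairs while the ordered version has only k-1, which is where the saving in model evaluations per step comes from.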
