Method for mergeFactors()
when the first argument is a formula.
# S3 method for formula
mergeFactors(response, factor, ..., data = NULL,
weights = NULL, family = "gaussian", method = "fast-adaptive",
abbreviate = TRUE)
Arguments
response |
Formula containing column names from the data argument. |
factor |
A factor vector when the response argument is used; otherwise, the name of the column in the data argument whose levels should be merged. |
... |
Other arguments corresponding to the type of the first argument. |
data |
A data frame to be used for modeling. |
weights |
A vector of weights; optional when the response argument is used. For more information see: lm, glm, coxph. |
family |
Model family to be used in merging. Available models are: "gaussian",
"survival", "binomial".
By default mergeFactors uses the "gaussian" model. |
method |
A string specifying method used during merging.
Four methods are available:
method = "adaptive". The objective function maximized
throughout the procedure is the log-likelihood. The set of pairs eligible for merging
contains all possible pairs of groups available in a given step.
Pairwise LRT distances are recalculated at every step.
This option is the slowest, since it requires the largest number
of comparisons: O(k^3) model evaluations, where k is the initial number of groups.
method = "fast-adaptive".
For the Gaussian family, the groups are first ordered by increasing
averages, and then only pairs of neighbouring groups are compared.
For other families the order corresponds to the beta coefficients in
a regression model.
This option is much faster than method = "adaptive" and requires O(k^2) model evaluations.
method = "fixed". This option is based on the DMR
algorithm introduced in Proch. It was extended to cover
survival models. The largest difference between this option and
method = "adaptive" is that, in the first
step, the pairwise distances between all groups are calculated
based on the LRT statistic. Then an agglomerative clustering algorithm
is used to merge consecutive pairs. This means that pairwise model differences
are not recalculated as LRT statistics in every step;
complete linkage is used instead.
This option is very fast and requires O(k^2) comparisons.
method = "fast-fixed". This option may be considered
a modification of method = "fixed".
Here, as in the fast-adaptive version,
we assume that if groups A, B and C are sorted according to their
increasing beta coefficients, then the distance between groups A and B
and the distance between groups B and C are not greater than the
distance between groups A and C. This assumption makes it possible to implement
complete linkage clustering more efficiently, in a dynamic manner.
The biggest difference is that in the first step we do not calculate
the whole matrix of pairwise differences, but only the differences
between consecutive groups. Then in each step only a single distance is
calculated. This reduces the number of model evaluations to O(k).
The default option is "fast-adaptive". |
abbreviate |
Logical. If TRUE (the default), factor level names
are abbreviated. |
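The idea behind method = "fast-adaptive" for the Gaussian family can be illustrated with a short base R sketch. This is not the package's implementation: the helper names lrt_merge() and fast_adaptive_sketch(), the simulated data, and the use of lm() with logLik() are all assumptions made for illustration. Groups are ordered once by their means, and at each step only neighbouring groups are compared with an LRT statistic, so roughly O(k^2) model fits are needed.

```r
# Illustrative sketch only; NOT the factorMerger implementation.
# Hypothetical toy data: four groups, two pairs with similar means.
set.seed(1)
g <- factor(rep(c("a", "b", "c", "d"), each = 30))
y <- rnorm(120, mean = c(a = 0, b = 0.1, c = 2, d = 2.1)[as.character(g)])

# LRT statistic for merging two levels of g:
# 2 * (logLik(model with all levels) - logLik(model with lev1 and lev2 merged)).
lrt_merge <- function(y, g, lev1, lev2) {
  merged <- g
  # Giving two levels the same label merges them.
  levels(merged)[levels(merged) %in% c(lev1, lev2)] <- paste0(lev1, lev2)
  full <- lm(y ~ g)
  # With a single remaining level the merged model is intercept-only.
  reduced <- if (nlevels(merged) > 1) lm(y ~ merged) else lm(y ~ 1)
  as.numeric(2 * (logLik(full) - logLik(reduced)))
}

fast_adaptive_sketch <- function(y, g) {
  # Order the levels by increasing group mean once, at the very beginning.
  g <- factor(g, levels = names(sort(tapply(y, g, mean))))
  path <- character(0)
  while (nlevels(g) > 1) {
    lv <- levels(g)
    # Compare only neighbouring groups in the current ordering.
    stats <- vapply(
      seq_len(length(lv) - 1),
      function(i) lrt_merge(y, g, lv[i], lv[i + 1]),
      numeric(1)
    )
    i <- which.min(stats)  # the pair whose merge costs the least likelihood
    path <- c(path, paste(lv[i], lv[i + 1], sep = "+"))
    # Merge the chosen pair; the level order is preserved.
    levels(g)[c(i, i + 1)] <- paste0("(", lv[i], lv[i + 1], ")")
  }
  path
}

path <- fast_adaptive_sketch(y, g)
print(path)
```

The returned vector records the merges in order, one per step, so k initial groups give k - 1 entries. Comparing all pairs instead of neighbours only inside the loop would correspond to the more expensive "adaptive" strategy described above.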