'Which model ...?' is the wrong question.
N.T. Longford
Abstract
This paper builds on the editorial "Model selection and efficiency:
is 'Which model ...?' the right question?" (N.T. Longford, 2005, Journal of
the Royal Statistical Society Series A 168, 469-472).
The weaknesses of the standard way of addressing model uncertainty,
by selecting one of the candidate models and then applying it for all the
intended inferences, are discussed, and an alternative, composition of
the estimators, is proposed. The weaknesses are generic to all model
selection methods, because the action taken after the selection, evaluating
an estimator, ignores the consequences of a possibly erroneous selection.
In the typical setting of comparing the fit of a model with the fit of its
submodel, choosing the submodel may introduce bias (of a model-based
estimator), but usually reduces the variance. Choosing the (more general)
model retains the larger variance, but may reduce the bias. We should
therefore weigh the relative merits of bias and variance reduction,
adhering to efficiency as our original criterion. This entails an admission
that a model may yield efficient estimators for some quantities, but not
for others.
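A minimal sketch of this trade-off (the notation is mine, not the paper's):
with mean squared error as the criterion,
\[ \mathrm{MSE}(\hat\theta) = \mathrm{var}(\hat\theta)
   + \bigl\{\mathrm{bias}(\hat\theta)\bigr\}^2 , \]
so if the submodel-based estimator has variance $v_0$ and bias $d$, and the
full-model estimator is unbiased with variance $v_1 > v_0$, the submodel is
the more efficient choice whenever $d^2 < v_1 - v_0$, irrespective of which
model is valid.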
The connection between model validity and efficiency is tenuous, unless we
are in asymptotia, where establishing model validity is trivial. In practice
we are never there; no model is correct unless we control the data-generating
process, and any claim that a model selection procedure is 'good' is out of
place, because we have no metric for the distance between the selected model
and the (ideal) valid model. Moreover, the aim to find the ideal model is
misguided, because it is not compatible with efficient estimation.
All attempts to minimise the probabilities of erroneous selection are
misguided when the ultimate goal of the analysis is efficient estimation,
because efficiency, defined as small mean squared error, is only distantly
related to the probability of selecting the valid model. Further, an act of
selection is a two-edged inferential sword: we may find a 'better' model,
but we incur a penalty for searching. This penalty is often ignored and, as
a problem, it is understood only selectively. Its concise diagnosis is that
the distribution of an estimator based on a selected model depends on the
process of selection: it is a mixture of the distributions of the estimators
based on all the candidate models. The properties of these mixtures are
difficult to explore because the mixing and the mixed distributions are
correlated.
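A minimal sketch of this diagnosis (my notation): with two candidate models
and the data-dependent event $S$ = {the submodel is selected}, the
post-selection estimator is
\[ \hat\theta^{*} = I_S\,\hat\theta_0 + (1 - I_S)\,\hat\theta_1 , \]
so its distribution is a mixture, with weights $\mathrm{P}(S)$ and
$1 - \mathrm{P}(S)$, of the conditional distributions of $\hat\theta_0$
given $S$ and of $\hat\theta_1$ given the complement of $S$. Because $I_S$
is computed from the same data as $\hat\theta_0$ and $\hat\theta_1$, these
conditional distributions differ, in general, from the unconditional ones.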
Supporting any theory of model selection by asymptotic results is not
helpful, because model selection is, in essence, a small-sample problem.
In small samples, some outright invalid models may yield far more efficient
estimators (of certain parameters or other quantities) than the valid model;
restricting ourselves to unbiased estimators is a severe handicap. In any
case, the act of selection can destroy the unbiasedness of an estimator,
even conditionally on having selected correctly.
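A textbook illustration of the handicap (my example, not drawn from the
paper): for a normal random sample $X_1, \ldots, X_n$, the variance estimator
\[ \tilde\sigma^2 = \frac{1}{n+1}\sum_{i=1}^{n} (X_i - \bar{X})^2 \]
is biased, yet has uniformly smaller mean squared error than the unbiased
estimator with divisor $n - 1$; insisting on unbiasedness forgoes this gain.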
I conclude that the search for the valid (or suitable) model leads into a
blind alley in finite time (and with finite samples), because the quantities
of interest can be estimated more efficiently by selecting models
specifically for each quantity. Simple examples from practice (ANOVA,
small-area estimation and clinical trials) will be discussed. Further gains,
sometimes substantial, can be made by linearly combining (composing)
alternative estimators.
Empirical Bayes estimators can be interpreted as a successful application
of this idea. Note that this differs from model averaging based on Bayes
factors, in which estimators are also combined, but with weights that depend
solely on the fits of the candidate models and not on the target.
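A minimal sketch of such a composition, under the simplifying assumption
(mine) that the two estimators are uncorrelated: let $\hat\theta_1$ be
unbiased with variance $v_1$ and let $\hat\theta_0$ have bias $d$ and
variance $v_0$. The composition
\[ \tilde\theta = b\,\hat\theta_0 + (1 - b)\,\hat\theta_1 \]
has mean squared error $b^2 (v_0 + d^2) + (1 - b)^2 v_1$, minimised by
$b^{*} = v_1 / (v_0 + d^2 + v_1)$; with this ideal coefficient the
composition is more efficient than either estimator on its own. Composite
estimators in small-area estimation, which combine a direct and a synthetic
estimator, and empirical Bayes shrinkage have this form, with $b$ estimated
from the data.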
Related articles:
N.T. Longford (2003). An alternative to model selection in ordinary regression.
Statistics and Computing 13, 67-80.
N.T. Longford (2008). An alternative analysis of variance.
SORT, Journal of the Catalan Institute of Statistics 32, 77-91.
January 2011.