Department of Civil and Environmental Engineering
1 Model Averaging - Introduction and Main Theory
Predictive uncertainty analysis is typically carried out using a single conceptual mathematical
model of the system of interest, rejecting a priori other plausible alternative models and possibly
underestimating uncertainty in the model itself (Hoeting et al., 1999; Neuman, 2003; Raftery and
Zheng, 2003; Raftery et al., 2005; Vrugt et al., 2006). Model averaging is a statistical methodology
that is frequently utilized in the statistical and meteorological literature to account explicitly for
conceptual model uncertainty. The motivating idea behind model averaging is that, with various
competing models at hand, each having its own strengths and weaknesses, it should be possible to
combine the individual model forecasts into a single new forecast that, up to one’s favorite standard,
is at least as good as any of the individual forecasts. As usual in statistical model building, the aim
is to use the available information efficiently, and to construct a predictive model with the right
balance between model flexibility and overfitting. Viewed as such, model averaging is a natural
generalization of the more traditional aim of model selection. Indeed, the model averaging literature
has its roots in the model selection literature, which continues to be a very active area of research.
Figure 1 provides a simple overview of model averaging. Consider that at a given time we have
available the output of multiple different models (calibrated or not). The goal now is to weight the
different models in such a way that the weighted estimate (model) is a better (point) predictor of
the observed system behavior (data) than any of the individual models. Moreover, the density of
the averaged model is hopefully a good estimator of the total predictive uncertainty.
Figure 1: Schematic overview of the concept of model averaging using the outcome of K = 3 different
computer models. The simulated values of each model are weighted in such a way that their average is a
better predictor of the observed data than any of the individual models. This weighted average constitutes
a point forecast (see middle panel). Some model averaging methods also estimate jointly a forecast density,
which allows for quantification of predictive uncertainty of the averaged model (right panel).
CEE 290 - Models Data Homework 7, Page 2
To formalize the various model averaging strategies considered herein, let us denote by $\widetilde{Y} =
\{\tilde{y}_1,\ldots,\tilde{y}_n\}$ a time series of measurements of a certain quantity of interest. Further assume that
there is an ensemble of $K$ different models available with associated point forecasts, $D_{tk}$, wherein
$t = 1,\ldots,n$ and $k = 1,\ldots,K$.
A popular way to combine point forecasts is to consider the following linear model combining the
individual predictions
\[ \tilde{y}_t = \boldsymbol{\beta}^{T}\mathbf{D}_t + \varepsilon_t = \sum_{k=1}^{K} \beta_k D_{tk} + \varepsilon_t, \qquad (1) \]
where $\boldsymbol{\beta} = \{\beta_1,\ldots,\beta_K\}$ signifies a horizontal vector with the weights of the $K$ models, the symbol
$T$ denotes transpose, and $\{\varepsilon_t\}$ is a $n$-sequence of zero-mean white noise with unknown variance.
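As a minimal numerical sketch of this linear combination, the weights and model outputs below are synthetic placeholders (not values from the homework):

```python
import numpy as np

# Synthetic example: n = 5 observation times, K = 3 models (placeholder values)
D = np.array([[1.0, 1.2, 0.8],
              [2.1, 1.9, 2.3],
              [3.0, 3.2, 2.7],
              [3.9, 4.1, 4.2],
              [5.2, 4.8, 5.1]])   # D[t, k]: point forecast of model k at time t
beta = np.array([0.5, 0.3, 0.2])  # weights on the unit simplex (sum to one)

# Averaged point forecast at each time t: sum_k beta[k] * D[t, k]
y_avg = D @ beta
```

The matrix-vector product reproduces the weighted sum for all $n$ times at once, which is the form used by the regression-based weighting schemes later in this homework.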
A bias correction step of the individual forecasts is performed prior to the construction of the
weights. For instance, a linear transformation
\[ D_{tk}^{\text{cor}} = a_k + b_k D_{tk}, \qquad (2) \]
will often suffice. The coefficients $a_k$ and $b_k$ of each of the models are found by ordinary least
squares (OLS) using simple linear regression
\[ \widetilde{Y} = a_k + b_k D_k + \varepsilon, \qquad (3) \]
of each model's output, $D_k$, against the training data, $\widetilde{Y}$, where $\varepsilon$ is a $n$-vector of zero-mean
residuals. This bias correction may lead to a small improvement in the predictive performance of
the individual models, with $a_k$ close to zero and $b_k$ close to unity. If the calibration set is very
small, the OLS estimates become noisy, and bias correction may destabilize the ensemble (Vrugt
and Robinson, 2007). Although a (linear) bias correction is recommended for each of the constituent
models of the ensemble, this is not made explicit in the subsequent notation. I simply continue to
use the notation $D_{tk}$ (rather than $D_{tk}^{\text{cor}}$) for the bias-corrected simulations of each model.
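The per-model bias correction can be sketched with NumPy's least-squares solver; the training data and model output below are synthetic, and the variable names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
y = np.linspace(0.0, 10.0, n) + rng.normal(0.0, 0.3, n)  # training data
D_k = 0.8 * y + 1.5 + rng.normal(0.0, 0.3, n)            # biased output of model k

# OLS fit of y = a_k + b_k * D_k + residual (simple linear regression)
A = np.column_stack([np.ones(n), D_k])                   # design matrix [1, D_k]
(a_k, b_k), *_ = np.linalg.lstsq(A, y, rcond=None)

# Bias-corrected forecasts of model k
D_cor = a_k + b_k * D_k
```

Because the regression includes an intercept, the corrected forecasts match the training data in the mean, which is exactly the systematic (additive and multiplicative) bias the transformation is meant to remove.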
The point (= averaged) forecasts associated with equation (1) are
\[ \tilde{y}_t^{\,e} = \boldsymbol{\beta}^{T}\mathbf{D}_t = \sum_{k=1}^{K} \beta_k D_{tk}, \qquad (4) \]
where the superscript 'e' is used to indicate the expected (predicted) value of the averaged model.
2 Model Averaging - Different Methods
In this homework, we will consider a variety of different methods for model averaging. This includes equal
weights averaging (EWA) with equal weights for each model of the ensemble, Bates-Granger av-
eraging (BGA) with model weights inversely proportional to the variance of their forecast errors, informa-
tion criterion averaging (ICA) with weights that trade off goodness-of-fit and model complexity,
Granger-Ramanathan averaging (GRA) with weights determined from linear regression using ordi-
nary least squares (OLS), and Mallows model averaging (MMA) with weights from OLS but with a
penalty for model complexity. We also consider, separately, Bayesian model averaging (BMA),
which estimates the weights and predictive densities of the models. This latter method not only
provides an averaged (point) forecast but also gives associated prediction intervals. Most of these
methods restrict the weights, $\boldsymbol{\beta}$, to the unit simplex, $S_K$, or
\[ S_K = \Big\{ \boldsymbol{\beta} \in \mathbb{R}^K : \beta_k \geq 0, \; \sum_{k=1}^{K} \beta_k = 1 \Big\}. \]
In words, the weights cannot be negative and must sum to unity. Methods such as GRA and
MMA relax this assumption and allow for any values of the weights, $\boldsymbol{\beta}$, including
negative values.
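Three of these weighting schemes can be sketched directly; the data are synthetic, and the BGA variant shown uses the common inverse error-variance form of the Bates-Granger weights:

```python
import numpy as np

rng = np.random.default_rng(7)
n, K = 200, 3
y = np.sin(np.linspace(0.0, 6.0, n))              # observations (synthetic)
D = np.column_stack([y + rng.normal(0.0, s, n)    # K model forecasts with
                     for s in (0.1, 0.2, 0.4)])   # increasing error levels

# EWA: equal weights for each model
beta_ewa = np.full(K, 1.0 / K)

# BGA: weights inversely proportional to the forecast error variance
var = np.var(D - y[:, None], axis=0)
beta_bga = (1.0 / var) / np.sum(1.0 / var)

# GRA: unconstrained OLS weights (need not lie on the simplex)
beta_gra, *_ = np.linalg.lstsq(D, y, rcond=None)

# Averaged point forecasts under each scheme
y_ewa, y_bga, y_gra = D @ beta_ewa, D @ beta_bga, D @ beta_gra
```

As expected, BGA assigns its largest weight to the least noisy model, and GRA attains the smallest residual sum of squares of the three, since OLS minimizes it over all linear combinations, including the EWA and BGA ones.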