In our climate example we used one fifth of all measurements in the year 1990. Figure 1.9 shows all measurements in 1990 together with periodic spline fits to all of them, with the smoothing parameter chosen by GCV, GML and UBR. Clearly, the GCV and UBR criteria underestimate the smoothing parameter, which leads to wiggly fits. What causes the GCV and UBR methods to break down?
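For reference, one standard way to write the three criteria is given below. The notation is assumed here rather than taken from this section: $\mathbf y$ is the vector of $n$ observations and $A(\lambda)$ is the hat matrix of the fit, so that $\hat{\mathbf f}_\lambda = A(\lambda)\mathbf y$. Each criterion is minimized over $\lambda$:

$$\mathrm{GCV}(\lambda) = \frac{\frac{1}{n}\|(I-A(\lambda))\mathbf y\|^2}{\left[\frac{1}{n}\operatorname{tr}(I-A(\lambda))\right]^2}, \qquad \mathrm{UBR}(\lambda) = \frac{1}{n}\|(I-A(\lambda))\mathbf y\|^2 + \frac{2\sigma^2}{n}\operatorname{tr}A(\lambda),$$

$$\mathrm{GML}(\lambda) = \frac{\mathbf y^\top (I-A(\lambda))\mathbf y}{\left[\det{}^{+}(I-A(\lambda))\right]^{1/(n-m)}},$$

where $\det^{+}$ is the product of the nonzero eigenvalues and $m$ is the dimension of the null space of the penalty. All three are derived under independent errors with a common variance $\sigma^2$, and UBR additionally treats $\sigma^2$ as known.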
In model (1.2) we have assumed that the random
errors are iid with mean zero and variance $\sigma^2$. The
middle panel of Fig. 1.1 indicates that the maximum
temperature varies more during the winter. Also, temperatures
close in time may be correlated. Thus the assumptions of
homoscedasticity and independence may not hold. What impact,
if any, do these potential violations have on the model
selection procedures?
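A short calculation suggests how such violations can enter through UBR. Assume, for this sketch only, that the fit is linear, $\hat{\mathbf f}_\lambda = A\mathbf y$ with $A = A(\lambda)$, that $\mathbf f$ is the vector of true function values, and that the errors have a hypothetical covariance $\operatorname{Cov}(\boldsymbol\epsilon) = \sigma^2 W$, where $W$ has unit diagonal so that $\operatorname{tr}W = n$. Then

$$\frac{1}{n}E\|A\mathbf y-\mathbf f\|^2 = \frac{1}{n}\|(I-A)\mathbf f\|^2 + \frac{\sigma^2}{n}\operatorname{tr}(AWA^\top),$$

$$\frac{1}{n}E\|(I-A)\mathbf y\|^2 = \frac{1}{n}\|(I-A)\mathbf f\|^2 + \sigma^2 - \frac{2\sigma^2}{n}\operatorname{tr}(AW) + \frac{\sigma^2}{n}\operatorname{tr}(AWA^\top),$$

so that

$$\frac{1}{n}E\|A\mathbf y-\mathbf f\|^2 = \frac{1}{n}E\|(I-A)\mathbf y\|^2 + \frac{2\sigma^2}{n}\operatorname{tr}(AW) - \sigma^2.$$

UBR uses the penalty $2\sigma^2\operatorname{tr}(A)/n$, which agrees with $2\sigma^2\operatorname{tr}(AW)/n$ when $W = I$. Under correlation it is off by the $\lambda$-dependent term $2\sigma^2[\operatorname{tr}(AW)-\operatorname{tr}(A)]/n$, so its minimizer need not track that of the risk. For independent heteroscedastic errors, $\sigma^2 W$ becomes $\operatorname{diag}(\sigma_1^2,\ldots,\sigma_n^2)$; if UBR uses the average variance $\bar\sigma^2$, the analogous $\lambda$-dependent term is $2\sum_i a_{ii}(\sigma_i^2-\bar\sigma^2)/n$, where $a_{ii}$ are the diagonal elements of $A$, and this term vanishes when the $a_{ii}$ are all equal.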
For illustration, we again consider two simulations, with
heteroscedastic and auto-correlated random errors respectively. We
use the same function and design points as the simulation in
Sect. 1.2, with the true function shown in the left panel of
Fig. 1.4. For heteroscedasticity, we generate random errors
$\epsilon_i$ with mean zero and variance $\sigma_i^2$,
$i = 1, \ldots, n$, where the variance $\sigma_i^2$ increases
with $i$. For correlation, we generate the $\epsilon_i$'s as a
first-order autoregressive process with mean zero, standard
deviation 0.5 and first-order correlation 0.5. The first and
second rows of Fig. 1.10 show the fits by the trigonometric
model with cross-validation, … and … choices of the order under
heteroscedastic and auto-correlated random errors respectively,
without any adjustment for the heteroscedasticity or correlation.
The third and fourth rows of Fig. 1.10 show the fits by the
periodic spline with GCV, GML and UBR choices of the smoothing
parameter under the same two error settings, again without
adjustment. These fits are typical under the two simulation
settings. Heteroscedasticity has some effect on model selection,
but far less than auto-correlation does. It is well known that
positive auto-correlation leads to under-smoothing for
nonparametric models with data-driven choices of the smoothing
parameter ([53,40]). Figure 1.10 shows that the same problem
exists for parametric regression models as well.
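The two error processes in this simulation can be sketched as follows, assuming NumPy. The AR(1) settings (marginal standard deviation 0.5, lag-one correlation 0.5) come from the text; the linear standard-deviation profile in `hetero_errors` and its default endpoints are hypothetical, since the exact variance function is not reproduced here. All function names are illustrative.

```python
import numpy as np


def ar1_errors(n, sd=0.5, rho=0.5, rng=None):
    """Stationary AR(1) errors: eps_i = rho * eps_{i-1} + u_i.

    The innovation sd is chosen so that every eps_i has standard deviation
    `sd` and corr(eps_i, eps_{i-1}) = rho.
    """
    rng = np.random.default_rng() if rng is None else rng
    innov_sd = sd * np.sqrt(1.0 - rho**2)
    eps = np.empty(n)
    eps[0] = rng.normal(0.0, sd)  # first value drawn from the stationary distribution
    for i in range(1, n):
        eps[i] = rho * eps[i - 1] + rng.normal(0.0, innov_sd)
    return eps


def hetero_errors(n, sd_min=0.2, sd_max=0.8, rng=None):
    """Independent N(0, sd_i^2) errors with sd_i increasing linearly in i.

    The linear profile and its endpoints are hypothetical choices.
    """
    rng = np.random.default_rng() if rng is None else rng
    sd = np.linspace(sd_min, sd_max, n)
    return rng.normal(0.0, sd), sd  # scale broadcasts elementwise over sd


def lag1_autocorr(e):
    """Sample lag-one autocorrelation of a series."""
    d = e - e.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))


rng = np.random.default_rng(0)
eps = ar1_errors(100_000, rng=rng)
print(round(eps.std(), 3), round(lag1_autocorr(eps), 3))  # both should be near 0.5
```

Drawing the first value from the stationary distribution, rather than starting the recursion at zero, gives every $\epsilon_i$ the same marginal variance, so no burn-in period is needed.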
The breakdown of the GCV and UBR criteria for the climate data is likely caused by auto-correlation, which is higher when daily measurements, rather than one fifth of them, are used as observations. Extensions of the GCV, GML and UBR criteria to correlated data can be found in [53].
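The under-smoothing effect of positive correlation on GCV can be checked in a small experiment. The sketch below, assuming NumPy, replaces the exact periodic spline with a simplified circulant smoother on equally spaced points $x_i = i/n$ that shrinks the $k$-th discrete Fourier coefficient by $1/(1+\lambda(2\pi k)^4)$, ignoring aliasing. The true function, sample size, $\lambda$ grid and number of replications are hypothetical choices. It selects $\lambda$ by minimizing GCV under iid errors and under AR(1) errors with the same marginal standard deviation 0.5 and lag-one correlation 0.5; under these assumptions the values chosen with correlated errors are expected to be smaller, that is, to give wigglier fits.

```python
import numpy as np


def shrink_factors(n, lam):
    """Eigenvalues 1 / (1 + lam * (2*pi*k)^4) of a circulant smoother on n points.

    Simplified stand-in for a periodic cubic spline: aliasing is ignored.
    """
    k = np.fft.fftfreq(n, d=1.0 / n)  # integer frequencies 0, 1, ..., -1 (as floats)
    return 1.0 / (1.0 + lam * (2.0 * np.pi * k) ** 4)


def gcv_score(y, lam):
    """GCV(lam) = mean(((I - A) y)^2) / (1 - tr(A) / n)^2 for the smoother above."""
    s = shrink_factors(len(y), lam)
    fit = np.real(np.fft.ifft(s * np.fft.fft(y)))  # A y, computed in the Fourier domain
    return np.mean((y - fit) ** 2) / (1.0 - np.mean(s)) ** 2  # tr(A) = sum(s)


def gcv_lambda(y, grid):
    """Grid value of lam that minimises the GCV score."""
    scores = [gcv_score(y, lam) for lam in grid]
    return grid[int(np.argmin(scores))]


def ar1(n, sd, rho, rng):
    """Stationary AR(1) errors with marginal sd `sd` and lag-one correlation `rho`."""
    eps = np.empty(n)
    eps[0] = rng.normal(0.0, sd)
    for i in range(1, n):
        eps[i] = rho * eps[i - 1] + rng.normal(0.0, sd * np.sqrt(1.0 - rho**2))
    return eps


def compare(n=100, reps=200, sd=0.5, rho=0.5, seed=1):
    """GCV choices of lam under iid and AR(1) errors (all settings hypothetical)."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1) / n
    f = np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)  # hypothetical true function
    grid = 10.0 ** np.linspace(-14, -2, 61)
    lam_iid, lam_ar = [], []
    for _ in range(reps):
        lam_iid.append(gcv_lambda(f + rng.normal(0.0, sd, n), grid))
        lam_ar.append(gcv_lambda(f + ar1(n, sd, rho, rng), grid))
    return np.array(lam_iid), np.array(lam_ar)


lam_iid, lam_ar = compare()
print("median log10(lam), iid errors:  ", np.median(np.log10(lam_iid)))
print("median log10(lam), AR(1) errors:", np.median(np.log10(lam_ar)))
```

Working in the Fourier domain keeps the example short: for a circulant smoother, $\operatorname{tr}A(\lambda)$ is just the sum of the shrinkage factors, so no $n \times n$ matrix is ever formed.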