One cannot use the RSS for model selection because it generally under-estimates the generalization error of a model ([16,25]). For example, (1.21) shows that the RSS under-estimates the prediction error. Thus, similar to the correction terms in the criteria discussed above, the second term in the criterion corrects this bias. The bias in the RSS results from using the same data for model fitting and model evaluation. Ideally, these two tasks should be separated using independent samples. This can be achieved by splitting the data into two subsamples: a training (calibration) sample for model fitting, and a test (validation) sample for model evaluation. This approach, however, is not efficient unless the sample size is large. The idea behind cross-validation is to recycle data by switching the roles of the training and test samples.
Suppose that one has decided on a measure of discrepancy for model evaluation, for example the prediction error. A $V$-fold cross-validation selects a model as follows.
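A minimal code sketch of the $V$-fold procedure just described. The simulated data, the polynomial candidate models, and the quadratic discrepancy are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a smooth signal plus noise (not the climate data of the text)
n = 100
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

def v_fold_cv(x, y, degree, V=5):
    """V-fold CV estimate of the prediction error for a polynomial model.

    The data are split into V folds; each fold in turn serves as the test
    (validation) sample while the remaining V-1 folds form the training sample.
    """
    folds = np.array_split(rng.permutation(len(y)), V)
    errs = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        coef = np.polyfit(x[train], y[train], degree)     # fit on training folds
        pred = np.polyval(coef, x[held_out])              # predict held-out fold
        errs.append(np.mean((y[held_out] - pred) ** 2))   # quadratic discrepancy
    return float(np.mean(errs))

# Select the model (polynomial degree) minimizing the CV score
cv_scores = {d: v_fold_cv(x, y, d) for d in range(1, 8)}
best_degree = min(cv_scores, key=cv_scores.get)
```

Each candidate model is scored by the same averaged discrepancy, so every observation is used both for fitting and for evaluation, but never for both at once.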
Cross-validation is a general procedure that can be applied to estimate tuning parameters in a wide variety of problems. To be specific, we now consider the regression model (1.2). For notational simplicity, we consider the delete-1 (leave-one-out) cross-validation with $V=n$. Suppose our objective is prediction. Let $y_{-i}$ be the vector with the $i$th observation, $y_i$, removed from the original response vector $y$. Let $\hat{f}_{-i}$ be the estimate based on the $n-1$ observations in $y_{-i}$. The ordinary cross-validation (OCV) estimate of the prediction error is
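In standard notation, and with the delete-one estimate $\hat{f}_{-i}$ defined above, the OCV criterion takes the form (a reconstruction, since the display does not survive in this copy):

```latex
\mathrm{OCV} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{f}_{-i}(x_i)\bigr)^2 .
```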
See [50] and [24] for proofs. Note that even though it is called the leaving-out-one lemma, similar results hold for the leaving-out-of-cluster cross-validation ([55]). See also [56], [59] and [33] for versions of the leaving-out-one lemma in more complicated problems.
For trigonometric regression and periodic spline models, the estimate is linear in the response: $\hat{f} = H(\lambda)y$ for any fixed $\lambda$, where $H(\lambda)$ is the hat (smoothing) matrix. Thus, when $y_i$ is replaced by $\hat{f}_{-i}(x_i)$, the leaving-out-one lemma gives $\hat{f}_{-i}(x_i) = \hat{f}_{\lambda}(x_i) + h_{ii}\{\hat{f}_{-i}(x_i) - y_i\}$. Denote the elements of $H(\lambda)$ as $h_{ij}$. Then
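The display that (1.25) refers to is, in standard notation, the well-known leave-one-out shortcut: solving the identity above for $y_i - \hat{f}_{-i}(x_i)$ yields

```latex
\mathrm{OCV}(\lambda)
  = \frac{1}{n}\sum_{i=1}^{n}
    \left(\frac{y_i - \hat{f}_{\lambda}(x_i)}{1 - h_{ii}}\right)^{2},
```

so the delete-one criterion can be computed from a single fit to the full data.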
Replacing $h_{ii}$ in (1.25) by the average of all diagonal elements, $\mathrm{tr}\,H(\lambda)/n$, [13] proposed the following generalized cross-validation (GCV) criterion
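In standard form, the GCV criterion of [13] reads (a reconstruction consistent with the substitution just described):

```latex
\mathrm{GCV}(\lambda)
  = \frac{n^{-1}\sum_{i=1}^{n}\bigl(y_i - \hat{f}_{\lambda}(x_i)\bigr)^{2}}
         {\bigl(1 - \mathrm{tr}\,H(\lambda)/n\bigr)^{2}} .
```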
For the trigonometric regression, the diagonal elements $h_{ii}$ are all equal, so GCV coincides with OCV.
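This can be checked numerically. The sketch below builds a trigonometric design on an equally spaced grid (the grid, sample size, and number of frequencies are illustrative choices), forms the hat matrix, and compares OCV with GCV:

```python
import numpy as np

n, K = 60, 3
x = np.arange(n) / n  # equally spaced design on [0, 1)

# Trigonometric design matrix: intercept plus sine/cosine pairs at frequencies 1..K
cols = [np.ones(n)]
for k in range(1, K + 1):
    cols += [np.cos(2 * np.pi * k * x), np.sin(2 * np.pi * k * x)]
X = np.column_stack(cols)

rng = np.random.default_rng(1)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

# Hat matrix of the linear smoother and its diagonal
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
resid = y - H @ y

# OCV via the leave-one-out shortcut; GCV via the average of the diagonal
ocv = float(np.mean((resid / (1 - h)) ** 2))
gcv = float(np.mean(resid ** 2) / (1 - np.trace(H) / n) ** 2)
```

On the equally spaced grid the trigonometric basis functions are orthogonal, every $h_{ii}$ equals $(2K+1)/n$, and `ocv` and `gcv` agree to machine precision.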
Instead of deleting one observation at a time, one may delete $d$ observations at a time, as described in the V-fold cross-validation. We will call such a method delete-d CV. [47] classified various model selection criteria into the following three classes:
Class 1: AIC, $C_p$, delete-1 CV and GCV.
Class 2: Criterion (1.20) with $\lambda_n \to \infty$ as $n \to \infty$, and delete-d CV with $d/n \to 1$.
Class 3: Criterion (1.20) with a fixed $\lambda_n > 2$, and delete-d CV with $d/n \to \rho \in (0,1)$.
BIC is a special case of Class 2. [47] showed that the criteria in Class 1 are asymptotically valid if there is no fixed-dimensional correct model, while the criteria in Class 2 are asymptotically valid when the opposite is true. Methods in Class 3 are compromises between those in Classes 1 and 2. Roughly speaking, criteria in the first class perform better when the true model is ``complex'', and criteria in the second class do better when the true model is ``simple''. See also [60] and [46].
The climate data subset was selected by first dividing the days of the year 1990 into five-day periods, and then taking the measurements on the third day of each period as observations. This is our training sample. Measurements on the remaining days may be used as the test sample; this test sample consists of the observations on all days not selected for training. For the trigonometric model with a fixed frequency, we calculate the prediction error using the test sample
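In standard notation, with $m$ denoting the size of the test sample, the prediction error referred to above is computed as (a reconstruction, since the display does not survive here):

```latex
\mathrm{PE} = \frac{1}{m}\sum_{i \in \text{test}}\bigl(y_i - \hat{f}(x_i)\bigr)^{2},
```

where $\hat{f}$ is the fit obtained from the training sample only.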
As a general methodology, cross-validation may also be used to select the penalty parameter in (1.20) ([43]). Let $\hat{f}_{-i}$ be the estimate based on the delete-one data $y_{-i}$, where the model is selected using (1.20), also based on $y_{-i}$. Then the OCV estimate of the prediction error is
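With $\hat{f}_{-i}$ defined as above (model selection by (1.20) repeated on each delete-one data set), the OCV estimate takes the same form as before; a sketch in standard notation:

```latex
\mathrm{OCV} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{f}_{-i}(x_i)\bigr)^{2} .
```

Note that here the entire selection procedure, not just the final fit, is cross-validated: both the criterion (1.20) and the estimation are applied to $y_{-i}$.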