3.5 Kriging

Let's return to the example in Fig. 3.1. If the analysts are interested in the input/output behavior within 'local area 1', then a first-order polynomial may be adequate. A second-order polynomial may be required to get a valid approximation in 'local area 2', which is larger and shows non-linear behavior of the input/output function. However, Kleijnen and Van Beers (2003a) present an example illustrating that the second-order polynomial gives very poor predictions compared with Kriging.

Kriging has often been applied to deterministic simulation models. Such simulations are used for the development of airplanes, automobiles, computer chips, computer monitors, etc.; see the pioneering article by Sacks et al. (1989) and, for an update, Simpson et al. (2001). For Monte Carlo experiments, I do not know of any applications yet. First I explain the basics of Kriging; then I discuss DOE aspects.


3.5.1 Kriging Basics

Kriging is an interpolation method that predicts unknown values of a random process; see the classic textbook on Kriging in spatial statistics, Cressie (1993). More precisely, a Kriging prediction is a weighted linear combination of all output values already observed. These weights depend on the distances between the input for which the output is to be predicted and the inputs already simulated. Kriging assumes that the closer the inputs are, the more positively correlated the outputs are. This assumption is modeled through the correlogram or the related variogram, discussed below.

Note that in deterministic simulation, Kriging has an important advantage over regression analysis: Kriging is an exact interpolator; that is, predicted values at observed input values are exactly equal to the observed (simulated) output values. In random simulation, however, the observed output values are only estimates of the true values, so exact interpolation loses its intuitive appeal. Therefore regression analysis uses OLS, which minimizes the sum of the squared residuals over all observations.

The simplest type of Kriging - to which I restrict myself in this chapter - assumes the following metamodel (also see (3.4) with $ \mu =\beta _{0}$ and $ \beta _{1} =\beta _{2} =\beta _{3} =0$):

\begin{subequations}\begin{align}y&=\mu +e \quad{\text{with}}\\ E(e)&=0,\quad {\text{var}}(e) = \sigma^{2}\,, \end{align}\end{subequations}

where $ \mu$ is the mean of the stochastic process $ y(.)$, and $ e$ is the additive noise, which is assumed to have zero mean and constant finite variance $ \sigma^{2}$ (furthermore, many authors assume normality). Kriging further assumes a stationary covariance process; i.e., $ \mu$ and $ \sigma $ in (3.13) are constants, and the covariances of $ y(\boldsymbol{x}+\boldsymbol{h})$ and $ y(\boldsymbol{x})$ depend only on the distance (or 'lag') between their inputs, namely $ \vert\boldsymbol{h}\vert = \vert(\boldsymbol{x}+\boldsymbol{h})-\boldsymbol{x}\vert$. (In deterministic simulation, the analysts assume that the deterministic input/output behavior can be adequately approximated by the random model in (3.13).)
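For example (this particular choice is my illustration; the chapter does not fix a covariance function), a popular stationary covariance function in the Kriging literature is the exponential one,

$\displaystyle {\text{cov}}\left( y(\boldsymbol{x}+\boldsymbol{h}), y(\boldsymbol{x}) \right) =\sigma ^{2}\exp (-\theta \vert \boldsymbol{h}\vert )\quad {\text{with }} \theta >0 ,$

so the correlation between two outputs decreases smoothly as the lag $ \vert \boldsymbol{h}\vert$ grows, in line with the assumption formulated at the start of this subsection.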

The Kriging predictor for the unobserved input $ \boldsymbol{x}_{0}$ - denoted by $ \widehat{y}(\boldsymbol{x}_{0})$ - is a weighted linear combination of all $ n$ outputs already observed, $ y(\boldsymbol{x}_{i})$ with $ i = 1, \ldots, n$:

\begin{subequations}\begin{align}\widehat{y}(\boldsymbol{x}_{0} )&=\sum\limits_{i = 1}^{n} \lambda _{i}\, y(\boldsymbol{x}_{i} )=\boldsymbol{\lambda }^{T}\boldsymbol{y} \quad{\text{with}}\\ \sum \limits_{i = 1}^{n} \lambda _{i} &= 1\,,\end{align}\end{subequations}

where $ \boldsymbol{\lambda }=(\lambda _{1} , \ldots , \lambda _{n} )^{T}$ and $ \boldsymbol{y}=(y_{1} , \ldots , y_{n} )^{T}$.
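The constraint in the second line of (3.14) is what makes the predictor unbiased; a one-line check (my addition, using the constant mean $ \mu$ from (3.13)) runs as follows:

$\displaystyle E\left( \widehat{y}(\boldsymbol{x}_{0} ) \right) =\sum\limits_{i = 1}^{n} \lambda _{i}\, E\left( y(\boldsymbol{x}_{i} ) \right) =\mu \sum\limits_{i = 1}^{n} \lambda _{i} =\mu =E\left( y(\boldsymbol{x}_{0} ) \right) ,$

where the third equality uses $ \sum_{i = 1}^{n} \lambda _{i} = 1$.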

To quantify the weights $ \boldsymbol{\lambda}$ in (3.14), Kriging derives the best linear unbiased estimator (BLUE), which minimizes the Mean Squared Error (MSE) of the predictor:

$\displaystyle \notag {\text{MSE}}\left( {\widehat{y}(\boldsymbol{x}_{0} )} \right) =E\left( \left( y(\boldsymbol{x}_{0} )-\widehat{y}(\boldsymbol{x}_{0} ) \right)^{2} \right)$

with respect to $ \boldsymbol{\lambda}$. Obviously, these weights depend on the covariances mentioned below (3.13). Cressie (1993) characterizes these covariances through the variogram, defined as $ 2\gamma (\boldsymbol{h})={\text{var}}(y(\boldsymbol{x}+\boldsymbol{h})-y(\boldsymbol{x}))$; under the stationarity assumption above, $ \gamma (\boldsymbol{h})=\sigma ^{2}-{\text{cov}}(y(\boldsymbol{x}+\boldsymbol{h}),y(\boldsymbol{x}))$, so variograms and covariances carry the same information. (I follow Cressie (1993), who uses variograms to express covariances, whereas Sacks et al. (1989) use correlation functions.) It can be proven that the optimal weights in (3.14) are

$\displaystyle \boldsymbol{\lambda }^{T}=\left( \boldsymbol{\gamma }+\boldsymbol{1}\,\frac{1-\boldsymbol{1}^{T}\boldsymbol{\Gamma }^{-1}\boldsymbol{\gamma }}{\boldsymbol{1}^{T}\boldsymbol{\Gamma }^{-1}\boldsymbol{1}} \right)^{T}\boldsymbol{\Gamma }^{-1}$ (3.15)

with the following symbols:

$ \boldsymbol{\gamma }$: vector of the $ n$ (co)variances between the output at the new input $ \boldsymbol{x}_{0}$ and the outputs at the $ n$ old inputs, so $ \boldsymbol{\gamma }=(\gamma (\boldsymbol{x}_{0} -\boldsymbol{x}_{1} ),\,\ldots ,\,\gamma (\boldsymbol{x}_{0} -\boldsymbol{x}_{n} ))^{T}$

$ \boldsymbol{\Gamma}$: $ n \times n$ matrix of the covariances between the outputs at the $ n$ old inputs - with element ($ i, j$) equal to $ \gamma (\boldsymbol{x}_{i} -\boldsymbol{x}_{j} )$

$ \boldsymbol{1}$: vector of $ n$ ones.

I point out that the optimal weights defined by (3.15) vary with the input value for which the output is to be predicted (see $ \boldsymbol{\gamma }$), whereas linear regression uses the same estimated parameters $ \boldsymbol{\widehat{\beta }}$ for all inputs to be predicted.
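To make the computation in (3.15) concrete, the following minimal sketch in Python/NumPy implements the predictor (3.14) with the weights (3.15); it is my illustration, not code from the chapter. The variogram model $ \gamma (h)=1-\exp (-\theta \vert h\vert )$, the value of $ \theta$, and the data are arbitrary assumptions; in practice the variogram is estimated from the simulation input/output data, and the inputs may be vectors rather than scalars.

\begin{verbatim}
# Ordinary Kriging prediction via equations (3.14)-(3.15); illustrative sketch.
import numpy as np

def variogram(h, theta=1.0):
    """Illustrative (semi)variogram: gamma(h) = 1 - exp(-theta * |h|)."""
    return 1.0 - np.exp(-theta * np.abs(h))

def kriging_predict(x0, x, y, theta=1.0):
    """Predict y at the new input x0 from old inputs x (1-D array) and outputs y."""
    gamma = variogram(x0 - x, theta)                   # vector: gamma(x0 - x_i)
    Gamma = variogram(x[:, None] - x[None, :], theta)  # matrix: gamma(x_i - x_j)
    one = np.ones(len(x))
    Gi_gamma = np.linalg.solve(Gamma, gamma)           # Gamma^{-1} gamma (Gamma assumed nonsingular)
    Gi_one = np.linalg.solve(Gamma, one)               # Gamma^{-1} 1
    # Optimal weights of (3.15); they sum to one by construction.
    lam = Gi_gamma + Gi_one * (1.0 - one @ Gi_gamma) / (one @ Gi_one)
    return lam @ y

# Usage: predict at a new input from four old simulated outputs.
x_old = np.array([0.0, 1.0, 2.0, 3.0])
y_old = np.array([1.2, 0.7, 0.9, 1.5])
print(kriging_predict(1.5, x_old, y_old))
\end{verbatim}

Note how the weights adapt to the prediction point $ \boldsymbol{x}_{0}$ through $ \boldsymbol{\gamma }$, in contrast to regression with its fixed $ \boldsymbol{\widehat{\beta }}$.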


3.5.2 Designs for Kriging

The most popular design type for Kriging is Latin hypercube sampling (LHS). This design type was invented by McKay, Beckman, and Conover (1979) for deterministic simulation models. Those authors did not analyze the input/output data by Kriging (but they did assume input/output functions more complicated than the low-order polynomials of classic DOE). Nevertheless, LHS is now widely applied in Kriging, because it is a simple technique (it is part of spreadsheet add-ons such as @Risk).

LHS offers flexible design sizes $ n$ (the number of scenarios simulated) for any number of simulation inputs, $ k$. A simple example with $ k = 2$ and $ n = 4$ is shown in Table 3.5 and Fig. 3.4, which are constructed as follows; a code sketch follows Table 3.5.

  1. The table illustrates that LHS divides each input range into $ n$ intervals of equal length, numbered from $ 1$ to $ n$ (in the example, we have $ n = 4$; see the numbers in the last two columns); i.e., the number of values per input can be much larger than in the designs discussed in Sect. 3.4.
  2. Next, LHS places these integers $ 1,\ldots, n$ such that each integer appears exactly once in each column of the design matrix, i.e., exactly once per input. (This explains the term 'Latin hypercube': it resembles the Latin square in classic DOE, in which each symbol occurs exactly once in each row and each column.)
  3. Within each cell of the design in the table, the exact input value may be sampled uniformly; see Fig. 3.4. (Alternatively, these values may be placed systematically in the middle of each cell. In risk analysis, this uniform sampling may be replaced by sampling from some other distribution for the input values.)

Figure 3.4: A LHS design for two factors and four scenarios
\includegraphics[clip]{text/3-3/fig4.eps}


Table 3.5: A LHS design for two factors and four scenarios
Scenario   Interval factor 1   Interval factor 2
    1              2                   1
    2              1                   4
    3              4                   3
    4              3                   2
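The construction in steps 1 through 3 can be sketched in a few lines of code (my illustration, not part of the chapter), here for $ k$ inputs scaled to the unit interval; setting midpoints=True gives the systematic variant mentioned in step 3.

\begin{verbatim}
# Latin hypercube sampling on the unit cube [0, 1]^k; illustrative sketch.
import numpy as np

def lhs(n, k, rng=None, midpoints=False):
    """Return an n x k Latin hypercube design on [0, 1]^k."""
    rng = np.random.default_rng(rng)
    design = np.empty((n, k))
    for j in range(k):
        perm = rng.permutation(n)       # each of the n intervals used exactly once per input
        if midpoints:
            u = 0.5 * np.ones(n)        # systematic: middle of each cell
        else:
            u = rng.uniform(size=n)     # sample uniformly within each cell
        design[:, j] = (perm + u) / n
    return design

print(lhs(4, 2, rng=123))   # a design of the type shown in Table 3.5 and Fig. 3.4
\end{verbatim}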

Because LHS implies randomness, the resulting design may happen to include outlier scenarios (to be simulated). Furthermore, it might happen - with small probability - that in Fig. 3.4 all scenarios lie on the main diagonal, so the values of the two inputs have a correlation coefficient of $ -1$. Therefore LHS may be adjusted to give (nearly) orthogonal designs; see Ye (1998).
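A simple pragmatic alternative (my sketch; this is not Ye's (1998) orthogonal construction) is to regenerate the LHS design until the correlations between its columns are small, using the lhs() function sketched above.

\begin{verbatim}
# Reject LHS designs whose columns are too strongly correlated; illustrative sketch.
import numpy as np

def nearly_uncorrelated_lhs(n, k, threshold=0.3, tries=1000, rng=None):
    rng = np.random.default_rng(rng)
    for _ in range(tries):
        design = lhs(n, k, rng=rng)               # lhs() as defined above
        corr = np.corrcoef(design, rowvar=False)  # k x k correlation matrix of the columns
        if np.all(np.abs(corr[~np.eye(k, dtype=bool)]) < threshold):
            return design
    raise RuntimeError("no design met the correlation threshold")
\end{verbatim}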

Let's compare classic designs and LHS geometrically. Figure 3.3 illustrates that many classic designs consist of the corners of $ k$-dimensional cubes. These designs imply simulation of extreme scenarios. LHS, however, has better space-filling properties.

This property has inspired many statisticians to develop other space-filling designs. One type maximizes the minimum Euclidean distance between any two points in the $ k$-dimensional experimental area. Related designs minimize the maximum distance. See Koehler and Owen (1996), Santner et al. (2003), and also Kleijnen et al. (2003a).
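As a rough illustration of the maximin idea (again my sketch, not an algorithm from the cited references), one can generate several LHS designs with the lhs() function sketched above and keep the one whose minimum pairwise Euclidean distance is largest.

\begin{verbatim}
# Crude maximin design: best of several random LHS designs; illustrative sketch.
import numpy as np

def maximin_lhs(n, k, tries=100, rng=None):
    rng = np.random.default_rng(rng)
    best, best_dist = None, -np.inf
    for _ in range(tries):
        design = lhs(n, k, rng=rng)                 # candidate LHS design
        diff = design[:, None, :] - design[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise Euclidean distances
        np.fill_diagonal(dist, np.inf)              # ignore zero self-distances
        if dist.min() > best_dist:
            best, best_dist = design, dist.min()
    return best
\end{verbatim}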

