6.3 Nonlinear Reduction of High-dimensional Data

In the previous section, we discussed linear methods, i.e., methods for dimension reduction through the use of linear projections. We will now move on to nonlinear methods for dimension reduction. First, we will describe a generalized principal component analysis (GPCA) method that is a nonlinear extension of PCA. Algebraic curve fitting methods will then be mentioned as a further extension of GPCA. Finally, we will introduce principal curves, i.e., parametric curves that pass through the middle of the data.

6.3.1 Generalized Principal Component Analysis

As long as the data have a near-linear structure, their singularities can be detected using PCA. If, however, the data have a nonlinear structure, PCA is not adequate for drawing conclusions regarding the nature of the data. To overcome this difficulty, GPCA has been proposed by [3]; it can discover functions that fit the data points.

Suppose that we have observations of $ p$ variables $ {\boldsymbol{x}}=(x_1,x_2,\ldots,x_p)$ on each of $ n$ individuals. Let $ f_i({\boldsymbol{x}})\ (i=1,2,\ldots,k)$ be $ k$ real-valued functions of the original variables.

The aim of GPCA is to discover a new set of variables (or functions of  $ {\boldsymbol{x}}$), denoted by $ z_1,z_2,\ldots,z_k$, which are mutually uncorrelated and whose variances decrease from first to last. Each $ z_j\ (\kern.5pt j=1,2,\ldots,k)$ is taken to be a linear combination of the $ f_i({\boldsymbol{x}})\ (i=1,2,\ldots,k)$, so that

$\displaystyle z_j=\sum_{i=1}^k l_{ij}f_i({\boldsymbol{x}})={\boldsymbol{l}}_j^{\top}{\boldsymbol{f}}({\boldsymbol{x}}){},$    

where $ {\boldsymbol{l}}_j=(l_{1j},l_{2j},\ldots,l_{kj})^{\top}$ are $ k$ constant vectors such that $ {\boldsymbol{l}}_j^{\top}{\boldsymbol{l}}_j=1$, and $ {\boldsymbol{f}}({\boldsymbol{x}})=(\kern.5pt f_1({\boldsymbol{x}}),f_2({\boldsymbol{x}}),\ldots,f_k({\boldsymbol{x}}))^{\top}$. The vectors $ {\boldsymbol{l}}_1, {\boldsymbol{l}}_2, \ldots, {\boldsymbol{l}}_k$ are the eigenvectors of the covariance matrix of $ (\kern.5pt f_1({\boldsymbol{x}}),f_2({\boldsymbol{x}}),\ldots,f_k({\boldsymbol{x}}))$, as in PCA. The function $ z_k$ defined by the ''smallest'' eigenvalue is considered to be one of the fitting functions to the data.

PCA is a special case of GPCA: real-valued functions $ f_i({\boldsymbol{x}})$ are reduced to $ x_i (i=1,2,\ldots,p)$.

Quadratic principal component analysis (QPCA) is specified by the following functions:

$\displaystyle \left\{
 \begin{array}{ll}
 f_i({\boldsymbol{x}})= x_i &\quad (i=1,2,\ldots,p)\\
 f_i({\boldsymbol{x}})= x_j x_m &\quad \left(i=p+1,\ldots,\left(p^2+3p\right)/2\right){},
 \end{array}
 \right .$    

where the pair $ (j, m)$ is uniquely determined by

$\displaystyle i = \left(2p-j+1\right)j/2+m{},$    
$\displaystyle 1 \leq j \leq m \leq p{},$    

for $ i\ (i=p+1,\ldots,(p^2+3p)/2)$.

QPCA for two-dimensional data is defined by

$\displaystyle f_1(x,y)=x$    
$\displaystyle f_2(x,y)=y$    
$\displaystyle f_3(x,y)=x^2$    
$\displaystyle f_4(x,y)=xy$    
$\displaystyle f_5(x,y)=y^2{}.$    
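To make the computation concrete, the following sketch (Python with NumPy is assumed; the function name and the test data are ours, not from [3]) carries out QPCA for two-dimensional data: it evaluates $ {\boldsymbol{f}}(x,y)=(x,y,x^2,xy,y^2)$ at every data point, eigen-decomposes the covariance matrix of these values, and inspects the component with the smallest eigenvalue as a candidate fitting function.

\begin{verbatim}
import numpy as np

def qpca_2d(data):
    """QPCA sketch for two-dimensional data (an n x 2 array):
    eigen-decompose Cov(f(x)) with f(x, y) = (x, y, x^2, xy, y^2)."""
    x, y = data[:, 0], data[:, 1]
    F = np.column_stack([x, y, x**2, x * y, y**2])  # f evaluated at each point
    C = np.cov(F, rowvar=False)                     # covariance matrix of f(x)
    eigvals, eigvecs = np.linalg.eigh(C)            # eigenvalues in ascending order
    Z = (F - F.mean(axis=0)) @ eigvecs              # components z_j = l_j^T f(x)
    return eigvals, eigvecs, Z

# Noisy points near the ellipse x^2 + 2y^2 = 1 (hypothetical test data).
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 200)
pts = np.column_stack([np.cos(t), np.sin(t) / np.sqrt(2.0)])
pts += 0.01 * rng.normal(size=pts.shape)
vals, vecs, _ = qpca_2d(pts)
# The eigenvector of the smallest eigenvalue is roughly proportional to
# (0, 0, 1, 0, 2), i.e. the quadratic x^2 + 2y^2 = const fitted to the data.
print(vals[0], vecs[:, 0])
\end{verbatim}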

Most GPCA methods are not invariant under orthogonal transformations and/or translations (parallel transformations) of the coordinate system, whereas PCA is invariant under both; QPCA, for example, is not. In this subsection, the expression ''the method is invariant'' means that the results of the method, expressed in the original coordinate system, are unchanged by a coordinate transformation. In the following, we determine, for the case of two variables, which GPCA methods are invariant under orthogonal transformations of the coordinate system. Translations of the coordinate system are disregarded here because the data can be standardized to have a zero mean vector.

Hereafter, let us assume the following conditions:

A1
$ f_1({\boldsymbol{x}}), f_2({\boldsymbol{x}}), \ldots, f_k({\boldsymbol{x}})$ are linearly independent as functions of  $ {\boldsymbol{x}}$.

A2
For any orthogonal matrix $ T$, there is a matrix $ W$ such that $ {\boldsymbol{f}}(T{\boldsymbol{x}})\equiv W{\boldsymbol{f}}({\boldsymbol{x}})$.

A3
$ f_i({\boldsymbol{x}})$ are continuous functions.

Conditions A1 and A3 are natural requirements for GPCA, and condition A2 is necessary for discussing the influence of orthogonal coordinate transformations. PCA and QPCA clearly satisfy these conditions.

A GPCA method is referred to as ''invariant'' if its results in the original coordinate system are not changed by an orthogonal transformation of the coordinate system. Mathematically, this can be described as follows. For any orthogonal coordinate transformation $ {\boldsymbol{x}}^{\ast}=T{\boldsymbol{x}}$,

$\displaystyle z_j^{\ast}$ $\displaystyle ={\boldsymbol{l}}_j^{*\top}{\boldsymbol{f}}({\boldsymbol{x}}^{\ast})$    
  $\displaystyle ={\boldsymbol{l}}_j^{*\top}{\boldsymbol{f}}(T{\boldsymbol{x}})\quad (\kern.5pt j=1,2,\ldots,k)$    

denote the results of the method for transformed variables $ {\boldsymbol{x}}^{\ast}$, where $ {\boldsymbol{l}}_j^{\ast}$ are eigenvectors of $ {\text{Cov}}({\boldsymbol{f}}({\boldsymbol{x}}^{\ast}))$. The method is ''invariant'' if it holds that

$\displaystyle {\boldsymbol{l}}_j^{\top}{\boldsymbol{f}}({\boldsymbol{x}})\equiv \pm\,{\boldsymbol{l}}_j^{*\top}{\boldsymbol{f}}(T{\boldsymbol{x}})\quad (\kern.5pt j=1,2,\ldots,k)$    

as functions of $ {\boldsymbol{x}}$ for any orthogonal matrix $ T$. The plus or minus sign reflects only the orientations of the eigenvectors.

The GPCA method specified by $ {\boldsymbol{f}}({\boldsymbol{x}})$ is invariant under orthogonal transformations if and only if the matrix $ W$ is orthogonal for every orthogonal matrix $ T$. The proof is outlined below. If the method is invariant, $ W$ can be taken as

$\displaystyle \left({\boldsymbol{l}}_1^{\ast},{\boldsymbol{l}}_2^{\ast},\ldots,{\boldsymbol{l}}_k^{\ast}\right)\left({\boldsymbol{l}}_1,{\boldsymbol{l}}_2,\ldots,{\boldsymbol{l}}_k\right)^{\top}{},$    

which is an orthogonal matrix. Conversely, if $ W$ is an orthogonal matrix, $ W^{\top}{\boldsymbol{l}}_j^{\ast}$ are eigenvectors of $ {\text{Cov}}({\boldsymbol{f}}({\boldsymbol{x}}))$. Therefore the following is obtained:

$\displaystyle {\boldsymbol{l}}_j^{\top} =\pm {\boldsymbol{l}}_j^{*\top}W{}.$    

[12] derived a theorem on invariant GPCA.

Theorem 1   Under the conditions A1, A2 and A3, the GPCA methods for two-dimensional data $ (x,y)$ that are invariant under rotations are exactly those specified by the following functions.

(1)
$ s$ pairs of functions:

$\displaystyle \left\{
 \begin{array}{lcl}
 f_{2i-1}(x,y) &=& g_i\left(\sqrt{x^2+y^2}\right)\left(x^{N_i}-\binom{N_i}{2}y^2x^{N_i-2}+\binom{N_i}{4}y^4x^{N_i-4}-\ldots \right)\\ [4mm]
 f_{2i}(x,y) &=& h_i\left(\sqrt{x^2+y^2}\right)\left(\binom{N_i}{1}\,y\,x^{N_i-1}-\binom{N_i}{3}y^3x^{N_i-3}+\ldots \right)
 \end{array}
 \right.$    
$\displaystyle (i=1,2,\ldots,s){},$    

where $ g_i$, $ h_i$ are arbitrary continuous functions of $ \sqrt{x^2+y^2}$ and $ N_i$ are arbitrary positive integers.

(2)
Continuous functions of $ \sqrt{x^2+y^2}$.

The above theorem can be extended to GPCA methods for $ p$-dimensional data, because a GPCA method for $ p$-dimensional data is invariant if and only if it is invariant under rotations of every pair of two variables.

We show some sets of functions that specify invariant GPCA here.

(1)
$ 3$-dimensional and degree $ 1$:

$\displaystyle x, y, z{}.$    

(2)
$ 3$-dimensional and degree $ 2$:

$\displaystyle x^2, y^2, z^2, \sqrt{2}xy, \sqrt{2}yz, \sqrt{2}zx{}.$    

(3)
$ 3$-dimensional and degree $ 3$:

$\displaystyle x^3, y^3, z^3, \sqrt{3}x^2y, \sqrt{3}y^2z, \sqrt{3}z^2x,
 \sqrt{3}xy^2, \sqrt{3}yz^2, \sqrt{3}zx^2, \sqrt{6}xyz{}.$    

(4)
$ 3$-dimensional and degree $ q$:

$\displaystyle \sqrt{\frac{q!}{i!j!k!}}x^iy^{\kern.5pt j}z^k$    
$\displaystyle (i+j+k=q{};\quad 0 \leq i,j,k){}.$    

(5)
$ p$-dimensional and degree $ q$:

$\displaystyle \sqrt{\frac{q!}{\prod_{t=1}^p k_t!}}\prod_{t=1}^p x_t^{k_t}$    
$\displaystyle \left(\sum_{t=1}^p k_t=q {};\quad 0 \leq k_t\right){}.$    
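As an illustration of item (5), the following short sketch (Python; the function name is ours) enumerates the weighted monomials $ \sqrt{q!/\prod_t k_t!}\,\prod_t x_t^{k_t}$ for given $ p$ and $ q$; for $ p=3$, $ q=2$ it reproduces the list in item (2).

\begin{verbatim}
from itertools import product
from math import factorial, prod, sqrt

def invariant_basis(p, q):
    """Exponent vectors (k_1,...,k_p) with k_1+...+k_p = q together with the
    weights sqrt(q! / (k_1! ... k_p!)) of the invariant GPCA functions."""
    basis = []
    for ks in product(range(q + 1), repeat=p):
        if sum(ks) == q:
            weight = sqrt(factorial(q) / prod(factorial(k) for k in ks))
            basis.append((ks, weight))
    return basis

# p = 3, q = 2: x^2, y^2, z^2 with weight 1 and xy, yz, zx with weight sqrt(2).
for ks, w in invariant_basis(3, 2):
    print(ks, round(w, 3))
\end{verbatim}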

6.3.2 Algebraic Curve and Surface Fitting

Next, we will discuss a method involving algebraic curve and surface fitting to multidimensional data.

The principal component line minimizes the sum of squared deviations in each of the variables, but PCA cannot find nonlinear structures in the data. GPCA can be used to discover an algebraic curve fitted to the data; the function $ z_k$ defined by the ''smallest'' eigenvalue is considered to be one of the fitting functions. However, it is difficult to interpret statistically the algebraic curves derived from GPCA.

We will now describe methods for estimating the algebraic curve or surface that minimizes the sum of squares of perpendicular distances from multidimensional data.

[18] developed an algorithm for discovering the algebraic curve for which the sum of approximate squared distances between the data points and the curve is minimized. The approximate squared distance does not always agree with the exact squared distance. Mizuta ([13], [14]) presented an algorithm for evaluating the exact distance between a data point and the curve, and a method for algebraic curve fitting with exact distances. In this subsection, we describe the method of algebraic surface fitting with exact distances. The method of algebraic curve fitting is nearly identical to that of surface fitting and is therefore omitted here.

6.3.2.1 Algebraic Curve and Surface

A $ p$-dimensional algebraic curve or surface is the set of zeros of $ k$-polynomials $ {\boldsymbol{f}}({\boldsymbol{x}})=(\kern.5pt f_1({\boldsymbol{x}}), \ldots, f_k({\boldsymbol{x}}))$ on $ {\boldsymbol{R}}^p$,

$\displaystyle Z({\boldsymbol{f}}) = \left\{{\boldsymbol{x}}:{\boldsymbol{f}}({\boldsymbol{x}})=0\right\} {}.$    

In the case of $ p = 2$ and $ k=1$, $ Z(\kern.5pt f)$ is a curve in the plane. For example, $ Z(x^2+2 y^2-1)$ is an ellipse and $ Z(y^2-x^2+1)$ is a hyperbola. In the case of $ p=3$ and $ k = 2$, $ Z({\boldsymbol{f}})$ is a curve in the space.

In the case of $ p=3$ and $ k=1$, $ Z(\kern.5pt f)$ is a surface:

$\displaystyle Z(\kern.5pt f) = \left\{(x,y,z):f(x,y,z)=0\right\}{}.$    

Hereafter, we will primarily discuss this case.

6.3.2.2 Approximate Distance

The distance from a point $ {\boldsymbol{a}}$ to the surface $ Z(\kern.5pt f)$ is usually defined by

$\displaystyle \mathrm{dist}\left({\boldsymbol{a}},Z(\kern.5pt f)\right)
 = \inf\left(\parallel {\boldsymbol{a}}-{\boldsymbol{y}} \parallel :{\boldsymbol{y}} \in Z(\kern.5pt f)\right){}.$    

It was said that the distance between a point and an algebraic curve or surface cannot be computed using direct methods. Taubin therefore proposed an approximate distance from $ {\boldsymbol{a}}$ to  $ Z(\kern.5pt f)$ ([18]). The point $ \hat{{\boldsymbol{y}}}$ that approximately minimizes the distance $ \parallel {\boldsymbol{y}}-{\boldsymbol{a}}\parallel$ is given by

$\displaystyle \hat{{\boldsymbol{y}}} = {\boldsymbol{a}}-\left(\nabla f({\boldsymbol{a}})^{\top}\right)^+f({\boldsymbol{a}}){},$    

where $ (\nabla f({\boldsymbol{a}})^{\top})^+$ is the pseudoinverse of $ \nabla f({\boldsymbol{a}})^{\top}$. The squared distance from $ {\boldsymbol{a}}$ to $ Z(\kern.5pt f)$ is then approximated by

$\displaystyle \mathrm{dist}\left({\boldsymbol{a}},Z(\kern.5pt f)\right)^2 \approx \frac{f({\boldsymbol{a}})^2}{\parallel \nabla f({\boldsymbol{a}}) \parallel ^2}{}.$    

Taubin also presented an algorithm to find the algebraic curve for which the sum of approximate squared distances between the data points and the curve is minimized.
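A minimal sketch of the approximate distance (Python with NumPy is assumed; a central-difference gradient is used instead of an analytic one, and the function names are ours):

\begin{verbatim}
import numpy as np

def approx_distance(f, a, eps=1e-6):
    """Taubin's first-order approximation |f(a)| / ||grad f(a)|| of the
    distance from the point a to Z(f)."""
    a = np.asarray(a, dtype=float)
    grad = np.array([(f(a + eps * e) - f(a - eps * e)) / (2.0 * eps)
                     for e in np.eye(a.size)])
    return abs(f(a)) / np.linalg.norm(grad)

# Unit sphere x^2 + y^2 + z^2 - 1 = 0: the exact distance from (2, 0, 0)
# is 1, while the approximation gives 3/4 = 0.75.
f = lambda v: v[0]**2 + v[1]**2 + v[2]**2 - 1.0
print(approx_distance(f, (2.0, 0.0, 0.0)))
\end{verbatim}

The sphere example also illustrates that the approximate squared distance can differ noticeably from the exact one.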

6.3.2.3 Exact Distance

In the following, we present a method for calculating the distance between a point $ {\boldsymbol{a}}=(\alpha,\beta,\gamma)$ and an algebraic surface  $ Z(\kern.5pt f)$.

If $ (x,y,z)$ is the nearest point to the point $ {\boldsymbol{a}}=(\alpha,\beta,\gamma)$ on $ Z(\kern.5pt f)$, $ (x,y,z)$ satisfies the following simultaneous equations:

$\displaystyle \left\{
 \begin{array}{lcl}
 \phi_1(x,y,z) & = & 0\\ 
 \phi_2(x,y,z) & = & 0\\ 
 f(x,y,z) & = & 0{},
 \end{array}
 \right.$ (6.1)

where $ \phi_1(x,y,z) = (x-\alpha) (\partial f/\partial y) - (y-\beta)
(\partial f/\partial x)$, and $ \phi_2(x,y,z) = (z-\gamma) (\partial
f/\partial y) - (y-\beta) (\partial f/\partial z)$.

Equations (6.1) can be solved using the Newton-Raphson method:

  1. Set $ x_0, y_0$ and $ z_0$ (see below).
  2. Solve the equations:

    $\displaystyle \left\{
 \begin{array}{lcl}
 {} h \frac{\partial \phi_1}{\partial x}+k \frac{\partial \phi_1}{\partial y}+l \frac{\partial \phi_1}{\partial z} &=& -\phi_1(x,y,z)\\
 {} h \frac{\partial \phi_2}{\partial x}+k \frac{\partial \phi_2}{\partial y}+l \frac{\partial \phi_2}{\partial z} &=& -\phi_2(x,y,z)\\
 {} h \frac{\partial f}{\partial x}+k \frac{\partial f}{\partial y}+l \frac{\partial f}{\partial z} &=& -f(x,y,z){}.
 \end{array}
 \right.$ (6.2)

  3. Replace $ x, y, z$:

    $\displaystyle \left\{
 \begin{array}{lcl}
 x_{i+1} &=& x_i + h\\ 
 y_{i+1} &=& y_i + k\\ 
 z_{i+1} &=& z_i + l{}.
 \end{array}
 \right.$    

  4. Stop if $ h^2+k^2+l^2$ is below a certain threshold. Otherwise, go to Step 2.

One of the important points to consider when applying the Newton-Raphson method is the choice of an initial point. We have a good initial point: $ (\alpha,\beta,\gamma)$.

When $ x_0=\alpha, y_0=\beta, z_0=\gamma$, we have $ \phi_1(\alpha,\beta,\gamma)=\phi_2(\alpha,\beta,\gamma)=0$, so (6.2) becomes

$\displaystyle \left\{
 \begin{array}{ll}
 {} h \frac{\partial \phi_1}{\partial x}+k \frac{\partial \phi_1}{\partial y}+l \frac{\partial \phi_1}{\partial z} &=~ 0\\
 {} h \frac{\partial \phi_2}{\partial x}+k \frac{\partial \phi_2}{\partial y}+l \frac{\partial \phi_2}{\partial z} &=~ 0\\
 {} h \frac{\partial f}{\partial x}+k \frac{\partial f}{\partial y}+l \frac{\partial f}{\partial z} &=~ -f(x,y,z){}.
 \end{array}
 \right.$    

It is very simple to show that the distance between $ (x_1,y_1,z_1)$ and $ (\alpha,\beta,\gamma)$ agrees with Taubin's approximate distance.
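The following sketch (Python with NumPy; the gradient of $ f$ is supplied analytically by the caller, and the Jacobian of the system is approximated by central differences, which is our simplification) implements the Newton-Raphson iteration (6.1)-(6.2) for the nearest point on $ Z(\kern.5pt f)$, starting from $ (\alpha,\beta,\gamma)$ itself:

\begin{verbatim}
import numpy as np

def nearest_point(f, grad_f, a, tol=1e-12, max_iter=50):
    """Newton-Raphson iteration for the point on Z(f) nearest to a,
    solving phi_1 = phi_2 = f = 0 as in (6.1), started at a."""
    a = np.asarray(a, dtype=float)
    alpha, beta, gamma = a

    def system(p):
        x, y, z = p
        fx, fy, fz = grad_f(p)
        return np.array([(x - alpha) * fy - (y - beta) * fx,   # phi_1
                         (z - gamma) * fy - (y - beta) * fz,   # phi_2
                         f(p)])

    def jacobian(p, eps=1e-6):
        J = np.empty((3, 3))
        for j in range(3):
            e = np.zeros(3)
            e[j] = eps
            J[:, j] = (system(p + e) - system(p - e)) / (2.0 * eps)
        return J

    p = a.copy()
    for _ in range(max_iter):
        step = np.linalg.solve(jacobian(p), -system(p))  # (h, k, l) of (6.2)
        p += step
        if step @ step < tol:                            # h^2 + k^2 + l^2 small
            break
    return p, np.linalg.norm(p - a)

# Unit sphere: the nearest point to (0, 2, 0) is (0, 1, 0), distance 1.
# The first iteration alone reproduces Taubin's approximate distance 0.75.
f = lambda v: v[0]**2 + v[1]**2 + v[2]**2 - 1.0
grad_f = lambda v: 2.0 * np.asarray(v, dtype=float)
print(nearest_point(f, grad_f, (0.0, 2.0, 0.0)))
\end{verbatim}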

6.3.2.4 Algebraic Surface Fitting

We have already described the method for calculating the distance between a point and a surface.

The problem of finding a fitting surface that minimizes the sum of squared distances from the data points can therefore be solved using an optimization method without derivatives. For computational efficiency, however, the partial derivatives of the sum of squared distances with respect to the coefficients of the algebraic surface are derived below.

In general, a polynomial $ f$ in a set is denoted by

$\displaystyle f\left(b_1,\ldots,b_q; x,y,z\right){},$    

where $ b_1, \ldots, b_q$ are the parameters of the set.

Let $ {\boldsymbol{a}}_i=(\alpha_i,\beta_i,\gamma_i) (i=1,2,\ldots,n)$ be $ n$ data points within the space. The point in $ Z(\kern.5pt f)$ that minimizes the distance from $ (\alpha_i,\beta_i,\gamma_i)$ is denoted by $ (x_i,y_i,z_i) (i=1,2,\ldots,n)$.

The sum of squares of distances is

$\displaystyle R = \sum_{i=1}^{n} ({\boldsymbol{x}}_{i}-{\boldsymbol{a}}_{i})^{\top}({\boldsymbol{x}}_{i}-{\boldsymbol{a}}_{i}){}.$    

$ R$ can be minimized with respect to the parameters of polynomial $ f$ with the Levenberg-Marquardt Method. This method requires partial derivatives of $ R$ with respect to $ b_j$:

$\displaystyle \frac{\partial R}{\partial b_j} = \sum_{i=1}^{n} \frac{\partial R_i}{\partial b_j}{},$ (6.3)

where

$\displaystyle \frac{\partial R_i}{\partial b_j} = 
 2 \left( (x_i-\alpha_i)\frac{\partial x_i}{\partial b_j}
 +(y_i-\beta_i)\frac{\partial y_i}{\partial b_j}
 +(z_i-\gamma_i)\frac{\partial z_i}{\partial b_j}
 \right){}.$ (6.4)

The only matter left to discuss is how to obtain $ \partial x_i/\partial b_j, \partial y_i/\partial b_j$ and $ \partial z_i/\partial b_j$. Hereafter, the subscript $ i$ is omitted. Differentiating both sides of $ f(b_1,\ldots,b_q;x,y,z)=0$ with respect to $ b_j \ (\kern.5pt j=1,\ldots,q)$, we obtain

$\displaystyle \frac{\partial f}{\partial x} \frac{\partial x}{\partial b_j} +
 \frac{\partial f}{\partial y} \frac{\partial y}{\partial b_j} +
 \frac{\partial f}{\partial z}\frac{\partial z}{\partial b_j} + \frac{\mathrm{d} f}{\mathrm{d} b_j}=0{},$ (6.5)

where $ \mathrm{d} f/\mathrm{d} b_j$ is the derivative of $ f$ with respect to $ b_j$ when $ x$, $ y$ and $ z$ are fixed.

Since $ {\boldsymbol{x}}_i$ is on the normal line from $ {\boldsymbol{a}}_i$,

$\displaystyle \left( \left. \frac{\partial f}{\partial x}
 \right\vert _{{\boldsymbol{x}}_i},
 \left. \frac{\partial f}{\partial y}
 \right\vert _{{\boldsymbol{x}}_i},
 \left. \frac{\partial f}{\partial z}
 \right\vert _{{\boldsymbol{x}}_i}
 \right) \times ({\boldsymbol{x}}_i-{\boldsymbol{a}}_i) = {\boldsymbol{0}}{}.$    

Differentiating

$\displaystyle (y- \beta)(z- \gamma) \left. \frac{\partial f}{\partial x} \right\vert _{{\boldsymbol{x}}} = t$    
$\displaystyle (x- \alpha)(z- \gamma) \left. \frac{\partial f}{\partial y} \right\vert _{{\boldsymbol{x}}} = t$    
$\displaystyle (x- \alpha)(y- \beta) \left. \frac{\partial f}{\partial z} \right\vert _{{\boldsymbol{x}}} = t$    

with respect to $ b_j$, we obtain linear equations in $ \partial x/\partial b_j, \partial y/\partial b_j$ and $ \partial z/\partial b_j$:

$\displaystyle c_{1m} \frac{\partial x}{\partial b_j}+c_{2m} \frac{\partial y}{\partial b_j}+c_{3m} \frac{\partial z}{\partial b_j} + c_{4m} = \frac{\partial t}{\partial b_j}{},$ (6.6)

where $ c_{1m},\ldots,c_{4m}$ are constants ( $ m=1,\ldots,3$).

Equations (6.5) and (6.6) are simultaneous linear equations in the four variables $ \partial x/\partial b_j, \partial y/\partial b_j, \partial z/\partial b_j$ and $ \partial t/\partial b_j$. Solving them, we obtain $ \partial x/\partial b_j, \partial y/\partial b_j$ and $ \partial z/\partial b_j$ at $ (x_i,y_i,z_i)$, and then, by (6.4), the partial derivative of $ R_i$ with respect to $ b_j$.

Therefore, we can obtain the algebraic surface that minimizes the sum of squared distances from the data points with the Levenberg-Marquardt method.
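A rough sketch of the fitting step (Python with NumPy and SciPy; for brevity the residual of each data point is Taubin's approximate distance rather than the exact distance of Sect. 6.3.2.3, and the degree-2 surface, test data and function names are our own choices):

\begin{verbatim}
import numpy as np
from scipy.optimize import least_squares

def monomials(p):
    """Monomial vector of a degree-2 surface f(b; x, y, z) = b^T X."""
    x, y, z = p
    return np.array([1.0, x, y, z, x*x, y*y, z*z, x*y, y*z, z*x])

def grad_monomials(p):
    """Gradients of the monomials with respect to (x, y, z)."""
    x, y, z = p
    return np.array([[0, 1, 0, 0, 2*x, 0, 0, y, 0, z],
                     [0, 0, 1, 0, 0, 2*y, 0, x, z, 0],
                     [0, 0, 0, 1, 0, 0, 2*z, 0, y, x]], dtype=float)

def residuals(b, points):
    # One residual per data point: the (approximate) distance to Z(f).
    r = np.empty(len(points))
    for i, a in enumerate(points):
        r[i] = (b @ monomials(a)) / (np.linalg.norm(grad_monomials(a) @ b) + 1e-12)
    return r

# Noisy points near the unit sphere (hypothetical test data).
rng = np.random.default_rng(1)
u = rng.normal(size=(200, 3))
pts = u / np.linalg.norm(u, axis=1, keepdims=True)
pts += 0.02 * rng.normal(size=pts.shape)

b0 = np.zeros(10)
b0[0], b0[4:7] = -1.0, 1.0            # initial surface x^2 + y^2 + z^2 - 1 = 0
fit = least_squares(residuals, b0, args=(pts,), method='lm')  # Levenberg-Marquardt
print(fit.x / np.linalg.norm(fit.x))  # fitted coefficients (up to scale)
\end{verbatim}

Replacing the residuals by the exact distances, and supplying the partial derivatives (6.3)-(6.6) as the analytic Jacobian, gives the procedure described in this subsection.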

6.3.2.5 Bounded and Stably Bounded Algebraic Curve and Surface

Although algebraic curves can fit the data very well, they usually contain points far removed from the given data set. In 1994, [9] and [20] independently developed algorithms for fitting a bounded (closed) algebraic curve with the approximate squared distance. We will now introduce the definition and properties of a bounded algebraic curve.

$ Z(\kern.5pt f)$ is referred to as bounded if there exists a constant $ r$ such that $ Z(\kern.5pt f) \subset
\{{\boldsymbol{x}}:\parallel {\boldsymbol{x}} \parallel < r\}$. For example, it is clear that $ Z(x^2+y^2-1)$ is bounded, but $ Z(x^2-y^2)$ is not bounded.

[9] defined $ Z(\kern.5pt f)$ to be stably bounded if a small perturbation of the coefficients of the polynomial leaves its zero set bounded. An algebraic curve $ Z((x-y)^4+x^2+y^2-1)$ is bounded but not stably bounded because $ Z((x-y)^4+x^2+y^2-1+\varepsilon x^3)$ is not bounded for any $ \varepsilon \neq 0$.

Let $ f_k(x,y)$ be the form of degree $ k$ of a polynomial $ f(x,y)$: $ f(x,y)=\sum_{k=0}^d f_k(x,y)$. The leading form of a polynomial $ f(x,y)$ of degree $ d$ is defined by $ f_d(x,y)$. For example, the leading form of $ f(x,y)=x^2+2xy-y^2+5x-y+3$ is $ f_2(x,y)=x^2+2xy-y^2$.

Lemma 1   For an even positive integer $ d$, any leading form $ f_d(x,y)$ can be represented by $ {\boldsymbol{x}}^{\top} A{\boldsymbol{x}}$, where $ A$ is a symmetric matrix and $ {\boldsymbol{x}} =(x^{d/2},x^{d/2-1}y,\ldots,xy^{d/2-1},y^{d/2})^{\top}$.

Theorem 2   ([9]): $ Z(\kern.5pt f)$ is stably bounded if and only if $ d$ is even and there exists a symmetric positive definite matrix $ A$ such that

$\displaystyle f_d(x,y)={\boldsymbol{x}}^{\top} A{\boldsymbol{x}}{},$    

where $ {\boldsymbol{x}} =(x^{d/2},x^{d/2-1}y,\ldots,xy^{d/2-1},y^{d/2})^{\top}$.

These definitions and theorem for algebraic curves are valid for algebraic surfaces. Hereafter, we will restrict our discussion to algebraic surfaces.
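For the simplest case $ d=2$, the matrix $ A$ in Theorem 2 is unique, and the stable-boundedness check reduces to a positive-definiteness test, as in the following sketch (Python with NumPy; for $ d \geq 4$ one would instead have to search over all symmetric matrices $ A$ representing the leading form):

\begin{verbatim}
import numpy as np

def stably_bounded_deg2(c20, c11, c02):
    """Theorem 2 for a degree-2 curve whose leading form is
    c20*x^2 + c11*x*y + c02*y^2 = (x, y) A (x, y)^T:
    Z(f) is stably bounded iff A is positive definite."""
    A = np.array([[c20, c11 / 2.0],
                  [c11 / 2.0, c02]])
    return bool(np.all(np.linalg.eigvalsh(A) > 0.0))

print(stably_bounded_deg2(1.0, 0.0, 2.0))    # x^2 + 2y^2 - 1 = 0 (ellipse): True
print(stably_bounded_deg2(-1.0, 0.0, 1.0))   # y^2 - x^2 + 1 = 0 (hyperbola): False
\end{verbatim}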

6.3.2.6 Parameterization

We parameterize the set of all polynomials of a given degree and the set of polynomials that induce (stably) bounded algebraic surfaces. In general, a polynomial $ f$ with $ q$ parameters can be denoted by $ f(b_1,\ldots,b_q; x,y,z)$, where $ b_1, \ldots, b_q$ are the parameters of the polynomial.

For example, all of the polynomials of degree at most $ 2$ in three variables can be represented by

$\displaystyle f\left(b_1,b_2,\ldots,b_{10}; x,y,z\right)=B^{\top} X{},$    

where $ X=(1,x,y,z,x^2,y^2,z^2,x y,y z,z x)^{\top},
B=(b_1,b_2,\ldots,b_{10})^{\top}$.

For stably bounded algebraic surfaces of degree $ 4$,

  $\displaystyle f\left(b_1,\ldots,b_{41}; x,y,z\right)
 = \left(x^2, y^2, z^2, x y, y z, z x\right) A^2\left(x^2, y^2, z^2, x y, y z, z x\right)^{\top}
 + \left(b_{22},\ldots,b_{41}\right)\left(1,x,y,z,\ldots,z^3\right)^{\top}{},$    

where

$\displaystyle A=\left(
 \begin{array}{cccccc}
 b_1 & b_2 & b_3 & b_4 & b_5 & b_6 \\ 
 b_2 & b_7 & b_8 & b_9 & b_{10} & b_{11} \\ 
 b_3 & b_8 & b_{12} & b_{13} & b_{14} & b_{15} \\ 
 b_4 & b_9 & b_{13} & b_{16} & b_{17} & b_{18} \\ 
 b_5 & b_{10} & b_{14} & b_{17} & b_{19} & b_{20} \\ 
 b_6 & b_{11} & b_{15} & b_{18} & b_{20} & b_{21}
 \end{array}
 \right).$    

6.3.2.7 Examples

Here we will show a numerical example of the algebraic surface and bounded algebraic surface fitting methods.

The data in this example are three-dimensional, of size $ 210$. The $ 210$ points lie approximately on a closed cylinder (Fig. 6.1). The result of GPCA is used as the initial surface, and the method then searches for a fitting algebraic surface of degree $ 4$ (Figs. 6.2, 6.3 and 6.4). The value of $ R$ is $ 0.924$.

Figure 6.1: Surface fitting for distributed cylinder data (Original Data Points)
\includegraphics[width=5cm]{text/3-6/can_noise_data.eps}

Figure 6.2: Surface fitting for distributed cylinder data (Unbounded Fitting Surface)
\includegraphics[width=6.2cm]{text/3-6/canN.eps}

Figure 6.3: Surface fitting for distributed cylinder data (Global View of 2)
\includegraphics[width=6.2cm]{text/3-6/canNGL.eps}

Figure 6.4: Surface fitting for distributed cylinder data (Cutting View of 2)
\includegraphics[width=6.2cm]{text/3-6/canNCUT.eps}

Figure 6.5 presents the result of a bounded algebraic surface fitted to the same data. The value of $ R$ is $ 1.239$, greater than that of the unbounded fitting. The bounded surface, however, directly reveals the outline of the data.

Figure 6.5: Surface fitting for distributed cylinder data (Bounded Fitting Surface)
\includegraphics[width=6.2cm]{text/3-6/canNo.eps}

In this subsection, we have discussed algebraic surface fitting to multidimensional data. Two sets of algebraic surfaces were described: an unbounded algebraic surface and a bounded algebraic surface. This method can be extended for use with any other family of algebraic surfaces.

[19] proposed the approximate distance of order $ k$ and presented algorithms for rasterizing algebraic curves. The proposed algorithm for exact distance can also be used for rasterizing algebraic curves and surfaces. [15] has successfully developed a program for rasterizing them with exact distances.

6.3.3 Principal Curves

Curve fitting to data is an important method for data analysis. When we obtain a fitting curve for data, the dimension of the data is nonlinearly reduced to one dimension. [5] proposed the concept of a principal curve and developed a concrete algorithm to find the principal curve, which is represented by a parametric curve. We can therefore obtain a new nonlinear coordinate for the data using the principal curve.

6.3.3.1 Definition of Principal Curve

First, we will define principal curves for a $ p$-dimensional distribution function $ h({\boldsymbol{x}})({\boldsymbol{x}}\in R^p)$, rather than a dataset.

The expectation of $ X$ with density function $ h$ in $ R^p$ is denoted by $ E_h(X)$. The parametric curve within the $ p$-dimensional space is represented by $ {\boldsymbol{f}}(\lambda)$, where $ \lambda $ is the parameter.

For each point $ {\boldsymbol{x}}$ in $ R^p$, the parameter $ \lambda $ of the nearest point on the curve $ {\boldsymbol{f}}(\lambda)$ is denoted by $ \lambda_{{\boldsymbol{f}}}({\boldsymbol{x}})$ and is referred to as the projection index. The projection index, which is different from the projection index in projection pursuit, is defined as follows:

$\displaystyle \lambda_{{\boldsymbol{f}}}({\boldsymbol{x}})=\sup_{\lambda}
 \left\{\lambda :\parallel{\boldsymbol{x}}-{\boldsymbol{f}}(\lambda)\parallel =
 \inf_{\mu}\parallel{\boldsymbol{x}}-{\boldsymbol{f}}(\mu)\parallel\right\}{}.$    

The curve $ {\boldsymbol{f}}(\lambda)$ is referred to as the principal curve of density function $ h$, if

$\displaystyle E_h\left({\boldsymbol{x}}\mid \lambda_{f}({\boldsymbol{x}})=\lambda\right) =
 {\boldsymbol{f}}(\lambda) \quad ($for a.e. $\displaystyle \lambda)$    

is satisfied. In other words, for any point $ {\boldsymbol{f}}(\lambda)$ on the curve, the mean of the conditional distribution of $ {\boldsymbol{x}}$ given $ \lambda_{f}({\boldsymbol{x}})=\lambda$ coincides with $ {\boldsymbol{f}}(\lambda)$, except for a set of measure  $ \mathrm{0}$.

The principal curves of a given distribution are not always unique. For example, both principal component lines of a two-dimensional normal distribution are principal curves.

The algorithm for finding the principal curves of a distribution is:

  1. Initialization. Put

    $\displaystyle {\boldsymbol{f}}^{(0)}(\lambda)=\bar{{\boldsymbol{x}}}+{\boldsymbol{a}}\lambda{},$    

    where $ {\boldsymbol{a}}$ is the first principal component of the distribution defined by the density function $ h$ and $ \bar{{\boldsymbol{x}}}$ is the average of  $ {\boldsymbol{x}}$.

  2. Expectation Step (update of $ {\boldsymbol{f}}(\lambda)$).

    $\displaystyle {\boldsymbol{f}}^{(\kern.5pt j)}(\lambda)=E\left({\boldsymbol{x}}\mid\lambda_{{\boldsymbol{f}}^{(\kern.5pt j-1)}}({\boldsymbol{x}})=\lambda\right)
 \quad \forall\lambda$    

  3. Projection Step (update of $ \lambda $).

    $\displaystyle \lambda^{(\kern.5pt j)}({\boldsymbol{x}})=\lambda_{{\boldsymbol{f}}^{(\kern.5pt j)}}({\boldsymbol{x}})\quad \forall{\boldsymbol{x}}\in R^{p}$    

    Then transform $ \lambda^{(\kern.5pt j)}$ to the arc-length parameterization.

  4. Evaluation. Calculate

    $\displaystyle D^{2}\left(h,{\boldsymbol{f}}^{(\kern.5pt j)}\right)=E_{\lambda^{(\kern.5pt j)}}
 E\left\{\parallel{\boldsymbol{x}}-{\boldsymbol{f}}^{(\kern.5pt j)}\left(\lambda^{(\kern.5pt j)}({\boldsymbol{x}})\right)\parallel^2\,\vert\,\lambda^{(\kern.5pt j)}({\boldsymbol{x}})\right\}{}.$    

    If the value

    $\displaystyle \frac{\left\vert D^{2}\left(h,{\boldsymbol{f}}^{(\kern.5pt j-1)}\right)-D^{2}\left(h,{\boldsymbol{f}}^{(\kern.5pt j)}\right)\right\vert}
 {D^2\left(h,{\boldsymbol{f}}^{(\kern.5pt j-1)}\right)}$    

    is smaller than $ \varepsilon$, then stop; otherwise set $ j=j+1$ and go to Step 2.

In the Expectation Step, we calculate the expectation of $ {\boldsymbol{x}}$ with respect to the distribution $ h$ over the set of $ {\boldsymbol{x}}$ satisfying $ \lambda_{{\boldsymbol{f}}^{(\kern.5pt j-1)}}({\boldsymbol{x}})=\lambda$ and assign it to $ {\boldsymbol{f}}^{(\kern.5pt j)}(\lambda)$. In the Projection Step, we project the data points in $ R^p$ onto the curve $ {\boldsymbol{f}}^{(\kern.5pt j)}(\lambda)$ and assign  $ \lambda^{(\kern.5pt j)}({\boldsymbol{x}})$.

For actual data analysis, only a set of data points is given and the distribution is unknown. [5] also proposed an algorithm with which to derive the principal curve for given $ p$-dimensional data of size $ N$: $ x_{ik}\ (i=1,2,\ldots,N;\ k=1,2,\ldots,p)$. In this case, the principal curve is represented by a polygonal line determined by the $ N$ points $ (\lambda_i,{\boldsymbol{f}}_i)$.

  1. Initialization.

    $\displaystyle {\boldsymbol{f}}^{(0)}(\lambda)=\bar{{\boldsymbol{x}}}+{\boldsymbol{u}}\lambda{},$    

    where $ {\boldsymbol{u}}$ is the first principal component of the data and $ \bar{{\boldsymbol{x}}}$ is the average of  $ {\boldsymbol{x}}$.

  2. Expectation Step. Smooth $ x_{ik}$ $ (i=1,2,\ldots,N)$ with respect to $ \lambda $ for each $ k$ independently and calculate $ {\boldsymbol{f}}^{(\kern.5pt j)}(\lambda)$.

  3. Projection Step. For each data point, search for the nearest point on the (polygonal) curve and assign its parameter as the value of $ \lambda $.

  4. Evaluation. If a terminal condition is satisfied, the algorithm is stopped. If not, $ j=j+1$ and go to Step 2.
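A compact sketch of this data version (Python with NumPy; the running-mean smoother and the polyline projection below are simplified stand-ins for the scatterplot smoother and projection used in [5], and the arc-length re-parameterization is omitted):

\begin{verbatim}
import numpy as np

def project_to_polyline(X, F, lam):
    """Parameter of the nearest point on the polyline through the rows of F
    (with parameters lam) for every row of X."""
    out = np.empty(len(X))
    for i, x in enumerate(X):
        best, best_d = lam[0], np.inf
        for a, b, la, lb in zip(F[:-1], F[1:], lam[:-1], lam[1:]):
            seg = b - a
            t = np.clip((x - a) @ seg / (seg @ seg + 1e-12), 0.0, 1.0)
            d = np.sum((a + t * seg - x) ** 2)
            if d < best_d:
                best_d, best = d, la + t * (lb - la)
        out[i] = best
    return out

def smooth(lam, v, span=0.2):
    """Running-mean smoother of v against lam (a stand-in for the smoother
    of Hastie and Stuetzle)."""
    order = np.argsort(lam)
    lam_s, v_s = lam[order], v[order]
    k = max(2, int(span * len(v)))
    sm = np.array([v_s[max(0, i - k):i + k + 1].mean() for i in range(len(v_s))])
    return lam_s, sm

def principal_curve(X, n_iter=10):
    # 1. Initialization: project onto the first principal component line.
    mean = X.mean(axis=0)
    u = np.linalg.svd(X - mean, full_matrices=False)[2][0]
    lam = (X - mean) @ u
    for _ in range(n_iter):
        # 2. Expectation step: smooth each coordinate against lambda.
        cols = [smooth(lam, X[:, k]) for k in range(X.shape[1])]
        lam_knots = cols[0][0]
        F = np.column_stack([c[1] for c in cols])
        # 3. Projection step: nearest point on the updated polyline.
        lam = project_to_polyline(X, F, lam_knots)
    return lam_knots, F

# Noisy points along a half circle (hypothetical test data).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, np.pi, 200))
X = np.column_stack([np.cos(t), np.sin(t)]) + 0.05 * rng.normal(size=(200, 2))
lam, F = principal_curve(X)   # F holds the fitted curve at the parameters lam
\end{verbatim}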

