15.2 Metric Multidimensional Scaling

Metric MDS begins with an $(n \times n)$ distance matrix $\data{D}$ with elements $d_{ij}$, $i, j=1,\ldots,n$. The objective of metric MDS is to find a configuration of $n$ points in $p$-dimensional space whose Euclidean distance matrix has elements as close as possible to the elements of the given distance matrix $\data{D}$.


15.2.1 The Classical Solution

The classical solution is based on a distance matrix that is computed from points in a Euclidean space.

DEFINITION 15.1   An $(n \times n)$ distance matrix $\data{D} = (d_{ij})$ is Euclidean if there exist points $x_{1},\ldots, x_n \in \mathbb{R}^p$ such that $d^2_{ij} = (x_i - x_j)^{\top} (x_i - x_j)$.

The following result tells us whether a distance matrix is Euclidean or not.

THEOREM 15.1   Define $\data{A}=(a_{ij})$ with $a_{ij} = -\frac{1}{2}d_{ij}^2$ and $\data{B}=\data{H}\data{A}\data{H}$ where $\data{H}$ is the centering matrix. $\data{D}$ is Euclidean if and only if $\data{B}$ is positive semidefinite. If $\data{D}$ is the distance matrix of a data matrix $\data{X}$, then $\data{B}=\data{H}\data{X}\data{X}^{\top}\data{H}$. $\data{B}$ is called the inner product matrix.
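Theorem 15.1 can be checked numerically along the following lines; this is only a minimal sketch in Python, where the function name is_euclidean, the tolerance and the example points are illustrative choices, not part of the text.
\begin{verbatim}
import numpy as np

def is_euclidean(D, tol=1e-8):
    # Theorem 15.1: D is Euclidean iff B = HAH is positive semidefinite
    n = D.shape[0]
    A = -0.5 * D**2                       # a_ij = -1/2 d_ij^2
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix H
    B = H @ A @ H                         # inner product matrix B
    return np.all(np.linalg.eigvalsh(B) >= -tol)

# distances computed from points in R^2 are Euclidean by construction
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D = np.sqrt(((X[:, None, :] - X[None, :, :])**2).sum(axis=2))
print(is_euclidean(D))   # True
\end{verbatim}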

Recovery of coordinates

The task of MDS is to find the original Euclidean coordinates from a given distance matrix. Let the coordinates of $n$ points in a $p$-dimensional Euclidean space be given by $x_i$ ($i=1,\ldots ,n$) where $x_i=(x_{i1},\ldots,x_{ip})^{\top}$. Call ${\data{X}}=(x_1,\dots,x_n)^{\top}$ the coordinate matrix and assume $\overline{x}=0$, i.e., the columns of $\data{X}$ are centered. The Euclidean distance between the $i$-th and $j$-th points is given by:
\begin{displaymath}
d_{ij}^2 = \sum_{k = 1}^p (x_{ik}-x_{jk})^2.
\end{displaymath} (15.1)

The general $b_{ij}$ term of $\data{B}$ is given by:
\begin{displaymath}
b_{ij} = \sum_{k=1}^p x_{ik} x_{jk} =x_i^{\top} x_j.
\end{displaymath} (15.2)

It is possible to derive $\data{B}$ from the known squared distances $d_{ij}^2$, and then from $\data{B}$ the unknown coordinates:
\begin{displaymath}
d_{ij}^2 = x_i^{\top} x_i + x_j^{\top} x_j - 2x_i^{\top} x_j = b_{ii} + b_{jj} - 2b_{ij}.
\end{displaymath} (15.3)

Centering of the coordinate matrix ${\data{X}}$ implies that $\sum_{i=1}^n b_{ij}=0$. Summing (15.3) over $i$, over $j$, and over $i$ and $j$, we find:
\begin{eqnarray*}
\frac{1}{n}\sum_{i=1}^n d_{ij}^2 &=& \frac{1}{n}\sum_{i=1}^n b_{ii} + b_{jj} \\
\frac{1}{n}\sum_{j=1}^n d_{ij}^2 &=& b_{ii} + \frac{1}{n}\sum_{j=1}^n b_{jj} \\
\frac{1}{n^2}\sum_{i=1}^n \sum_{j=1}^n d_{ij}^2 &=& \frac{2}{n} \sum_{i=1}^n b_{ii}.
\end{eqnarray*} (15.4)

Solving (15.3) and (15.4) gives:
\begin{displaymath}
b_{ij} = -\frac{1}{2}(d_{ij}^2-d_{i \bullet}^2
-d_{\bullet j}^2+d_{\bullet \bullet}^2),
\end{displaymath} (15.5)
where $d_{i \bullet}^2$, $d_{\bullet j}^2$ and $d_{\bullet \bullet}^2$ denote the row mean, the column mean and the overall mean of the squared distances, respectively.

With $a_{ij}=-\frac{1}{2}d_{ij}^2$, and
\begin{eqnarray*}
a_{i \bullet} &=& \frac{1}{n}\sum_{j=1}^n a_{ij} \\
a_{\bullet j} &=& \frac{1}{n}\sum_{i=1}^n a_{ij} \\
a_{\bullet \bullet} &=& \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n a_{ij}
\end{eqnarray*} (15.6)

we get:
\begin{displaymath}
b_{ij} = a_{ij}-a_{i \bullet} - a_{\bullet j} + a_{\bullet \bullet}.
\end{displaymath} (15.7)

Define the matrix $\data{A}$ as $(a_{ij})$, and observe that:
\begin{displaymath}
\data{B}=\data{H}\data{A}\data{H}.
\end{displaymath} (15.8)

The inner product matrix $\data{B}$ can be expressed as:
\begin{displaymath}
\data{B} = \data{X}\data{X}^{\top},
\end{displaymath} (15.9)

where $ \data{X}=(x_1, \ldots ,x_n)^{\top}$ is the $(n\times p)$ matrix of coordinates. The rank of $\data{B}$ is then
\begin{displaymath}
rank(\data{B})=rank(\data{X}\data{X}^{\top})=rank(\data{X}) = p.
\end{displaymath} (15.10)

As required in Theorem 15.1 the matrix $\data{B}$ is symmetric, positive semidefinite and of rank $p$, and hence it has $p$ positive eigenvalues and $n-p$ zero eigenvalues. $\data{B}$ can now be written as:
\begin{displaymath}
\data{B}=\Gamma \Lambda \Gamma^{\top}
\end{displaymath} (15.11)

where $\Lambda = \mathop{\hbox{diag}}(\lambda_1 ,\ldots,\lambda_p)$, the diagonal matrix of the eigenvalues of $\data{B}$, and $\Gamma = (\gamma_1 ,\ldots,\gamma_p )$, the matrix of corresponding eigenvectors. Hence the coordinate matrix $\data{X}$ containing the point configuration in $\mathbb{R}^p$ is given by:
\begin{displaymath}
\data{X} = \Gamma \Lambda^{\frac{1}{2}}.
\end{displaymath} (15.12)
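The coordinate recovery (15.5)--(15.12) can be sketched in a few lines of Python. The helper name classical_mds and the use of numpy are my own choices, and any negative eigenvalues are simply clipped to zero; this is a sketch, not a definitive implementation.
\begin{verbatim}
import numpy as np

def classical_mds(D, p=2):
    # Recover a p-dimensional configuration from a distance matrix D,
    # following (15.5)-(15.12).
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    B = H @ (-0.5 * D**2) @ H              # inner product matrix, (15.8)
    eigval, eigvec = np.linalg.eigh(B)     # eigenvalues in ascending order
    idx = np.argsort(eigval)[::-1][:p]     # indices of the p largest eigenvalues
    Gamma = eigvec[:, idx]
    # X = Gamma Lambda^{1/2}, (15.12); negative eigenvalues clipped to zero
    return Gamma * np.sqrt(np.maximum(eigval[idx], 0.0))
\end{verbatim}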

How many dimensions?

For practical interpretation the number of dimensions should be small; it is bounded by the rank of ${\data{B}}$, i.e., the number of nonzero eigenvalues $\lambda_i$. If $\data{B}$ is positive semidefinite, the number of nonzero eigenvalues equals the number of dimensions required to represent the distances $d_{ij}$ exactly.

The proportion of variation explained by the first $p$ dimensions is given by

\begin{displaymath}
\frac{\sum_{i=1}^p \lambda_{i}}{\sum_{i=1}^{n-1} \lambda_{i}}.
\end{displaymath} (15.13)

This proportion can be used to choose $p$. If $\data{B}$ is not positive semidefinite we can modify (15.13) to
\begin{displaymath}
\frac{\sum_{i=1}^p \lambda_{i}}{\sum (\textrm{``positive eigenvalues''})}.
\end{displaymath} (15.14)
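A possible numerical counterpart of (15.13) and (15.14), assuming the eigenvalues of $\data{B}$ are available as an array (the helper name is illustrative):
\begin{verbatim}
import numpy as np

def explained_proportion(eigenvalues, p):
    # Proportion of variation explained by the first p dimensions;
    # when B has negative eigenvalues, only the positive ones enter
    # the denominator, as in (15.14).
    lam = np.sort(np.asarray(eigenvalues))[::-1]   # descending order
    return lam[:p].sum() / lam[lam > 0].sum()
\end{verbatim}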

In practice the eigenvalues $\lambda_{i}$ are almost always different from zero. To represent the objects in a space of as low a dimension as possible we may modify the distance matrix to:

\begin{displaymath}
\data{D}^* = (d^*_{ij})
\end{displaymath} (15.15)

with
\begin{displaymath}
d^*_{ij} = \left\{ \begin{array}{lr}
0 & ;i=j\\
d_{ij}+e \geq 0 &;i \neq j
\end{array} \right.
\end{displaymath} (15.16)

where $e$ is determined such that the inner product matrix $\data{B}$ becomes positive semidefinite with a small rank.
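One simple, admittedly crude way to determine such an $e$ numerically is to increase it in small steps until the smallest eigenvalue of the resulting inner product matrix is non-negative. The sketch below (function name and step size are my assumptions) is only an illustration, not the analytic additive-constant solution, and the value it returns need not be the smallest possible $e$.
\begin{verbatim}
import numpy as np

def additive_constant(D, step=0.01, tol=1e-10):
    # Increase e until D* = (d_ij + e for i != j) yields a positive
    # semidefinite inner product matrix B.
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    e = 0.0
    while True:
        D_star = D + e * (1.0 - np.eye(n))   # add e off the diagonal only
        B = H @ (-0.5 * D_star**2) @ H
        if np.linalg.eigvalsh(B).min() >= -tol:
            return e
        e += step
\end{verbatim}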

Similarities

In some situations we do not start with distances but with similarities. The standard transformation (see Chapter 11) from a similarity matrix $\data{C}$ to a distance matrix $\data{D}$ is:
\begin{displaymath}
d_{ij} = (c_{ii}-2c_{ij}+c_{jj})^\frac{1}{2}.
\end{displaymath} (15.17)

THEOREM 15.2   If $\data{C} \geq 0$, i.e., $\data{C}$ is positive semidefinite, then the distance matrix $\data{D}$ defined by (15.17) is Euclidean with centered inner product matrix $\data{B}=\data{H}\data{C}\data{H}$.
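The transformation (15.17) is easy to apply directly; a short sketch assuming the similarity matrix is given as a numpy array (note that the argument of the square root can become negative if $\data{C}$ is not positive semidefinite):
\begin{verbatim}
import numpy as np

def similarity_to_distance(C):
    # Standard transformation (15.17): d_ij = (c_ii - 2 c_ij + c_jj)^(1/2)
    c = np.diag(C)
    return np.sqrt(c[:, None] - 2.0 * C + c[None, :])
\end{verbatim}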

Relation to Factorial Analysis

Suppose that the ($n\times p$) data matrix $\data{X}$ is centered so that $\data{X}^{\top}\data{X}$ is a multiple of the covariance matrix: $\data{X}^{\top}\data{X} = n\data{S}$. Suppose that the $p$ eigenvalues $\lambda_1,\ldots,\lambda_p$ of $n\data{S}$ are distinct and nonzero. Using the duality Theorem 8.4 of factorial analysis we see that $\lambda_1,\ldots,\lambda_p$ are also eigenvalues of $\data{X}\data{X}^{\top} = \data{B}$ when $\data{D}$ is the Euclidean distance matrix between the rows of $\data{X}$. The $k$-dimensional solution to the metric MDS problem is thus given by the $k$ first principal components of $\data{X}$.
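Under these assumptions the correspondence can be checked numerically: the classical MDS configuration obtained from the Euclidean distance matrix of a centered $\data{X}$ agrees with the first principal component scores of $\data{X}$ up to the signs of the columns. A self-contained sketch (all names and the simulated data are mine):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
X = X - X.mean(axis=0)                      # center the data matrix

# Euclidean distance matrix between the rows of X
D = np.sqrt(((X[:, None, :] - X[None, :, :])**2).sum(axis=2))

# classical MDS solution with k = 2
n = X.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
B = H @ (-0.5 * D**2) @ H                   # equals X X^T since X is centered
eigval, eigvec = np.linalg.eigh(B)
idx = np.argsort(eigval)[::-1][:2]
X_mds = eigvec[:, idx] * np.sqrt(eigval[idx])

# first two principal component scores of X
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:2].T

# the two configurations coincide up to the signs of the columns
print(np.allclose(np.abs(X_mds), np.abs(X_pca)))   # True
\end{verbatim}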

Optimality properties of the classical MDS solution

Let $\data{X}$ be an $(n\times p)$ data matrix with some inter-point distance matrix $\data{D}$. The objective of MDS is thus to find $\data{X}_1$, a representation of $\data{X}$ in a lower dimensional Euclidean space $\mathbb{R}^{k}$ whose inter-point distance matrix $\data{D}_1$ is not far from $\data{D}$. Let $\data{L} = (\data{L}_{1}, \data{L}_{2})$ be a $(p \times p)$ orthogonal matrix where $\data{L}_1$ is $(p\times k)$. $\data{X}_1=\data{XL}_1$ represents a projection of $\data{X}$ on the column space of $\data{L}_1$; in other words, $\data{X}_1$ may be viewed as a fitted configuration of $\data{X}$ in $\mathbb{R}^k$. A measure of discrepancy between $\data{D}$ and $\data{D}_1 =(d_{ij}^{(1)})$ is given by
\begin{displaymath}
\phi = \sum_{i,j=1}^n (d_{ij}-d_{ij}^{(1)})^2.
\end{displaymath} (15.18)
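Computed directly, for instance in numpy (assuming the two distance matrices are stored as arrays D and D1 of equal shape):
\begin{verbatim}
phi = ((D - D1)**2).sum()   # discrepancy (15.18) between D and D1
\end{verbatim}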

THEOREM 15.3   Among all projections $\data{XL}_{1}$ of $\data{X}$ onto $k$-dimensional subspaces of $\mathbb{R}^p$ the quantity $\phi$ in (15.18) is minimized when $\data{X}$ is projected onto its first $k$ principal factors.

We see therefore that metric MDS is identical to principal factor analysis as we have defined it in Chapter 8.

Summary
$\ast$
Metric MDS starts with a distance matrix $\data{D}$.
$\ast$
The aim of metric MDS is to construct a map in Euclidean space that corresponds to the given distances.
$\ast$
A practical algorithm is given as follows (a worked numerical sketch is given after this summary):
  1. start with distances $d_{ij}$
  2. define $\data{A}=(a_{ij})$ with $a_{ij}=-\frac{1}{2}d_{ij}^2$
  3. put $\data{B}=\left(a_{ij}-a_{i\bullet}-a_{\bullet j}+a_{\bullet \bullet}\right)$
  4. find the eigenvalues $\lambda_1,\ldots,\lambda_p$ and the associated eigenvectors $\gamma_1 ,\ldots,\gamma_p$, where the eigenvectors are normalized so that $\gamma_i^{\top} \gamma_i = 1$
  5. choose an appropriate number of dimensions $p$ (ideally $p=2$)
  6. the coordinates of the $n$ points in the Euclidean space are given by $x_{ij} = \gamma_{ij}\lambda_j^{1/2}$ for $i=1,\dots,n$ and $j=1,\dots,p$.
$\ast$
Metric MDS is identical to principal components analysis.
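As a usage illustration of the algorithm above, a minimal numerical sketch in Python; the $(4\times 4)$ distance matrix is an invented toy example, not data from the text.
\begin{verbatim}
import numpy as np

# toy symmetric distance matrix with zero diagonal (purely illustrative)
D = np.array([[0.0, 2.0, 5.0, 6.0],
              [2.0, 0.0, 4.0, 5.0],
              [5.0, 4.0, 0.0, 3.0],
              [6.0, 5.0, 3.0, 0.0]])

n = D.shape[0]
A = -0.5 * D**2                                    # step 2
H = np.eye(n) - np.ones((n, n)) / n
B = H @ A @ H                                      # step 3 (double centering)
eigval, eigvec = np.linalg.eigh(B)                 # step 4
idx = np.argsort(eigval)[::-1][:2]                 # step 5: p = 2
X = eigvec[:, idx] * np.sqrt(np.maximum(eigval[idx], 0.0))   # step 6
print(X)   # coordinates of the four points in the plane
\end{verbatim}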