15.3 Nonmetric Multidimensional Scaling

The object of nonmetric MDS, as well as of metric MDS, is to find the coordinates of the points in $p$-dimensional space so that there is good agreement between the observed proximities and the inter-point distances. The development of nonmetric MDS was motivated by two main weaknesses of metric MDS (Fahrmeir and Hamerle, 1984, p. 679):

  1. the need to specify an explicit functional relationship between dissimilarities and distances in order to derive distances from given dissimilarities, and
  2. the restriction to Euclidean geometry in order to determine the object configurations.

Figure 15.5: Ranks and distances. MVAMDSnonmstart.xpl
\includegraphics[width=1\defpicwidth]{MDS-monoreg.ps}

The idea of nonmetric MDS is to demand a less rigid relationship between the dissimilarities and the distances. Suppose that an unknown monotonically increasing function $f$,
\begin{displaymath}
d_{ij}=f (\delta_{ij}),
\end{displaymath} (15.19)

is used to generate a set of distances $d_{ij}$ as a function of given dissimilarities $\delta_{ij}$. Here $f$ has the property that if $\delta_{ij}<\delta_{rs}$, then $f (\delta_{ij})<f (\delta_{rs})$. The scaling is based on the rank order of the dissimilarities. Nonmetric MDS is therefore ordinal in character.

Figure 15.6: Pool-adjacent-violators algorithm. MVAMDSpooladj.xpl
\includegraphics[width=1\defpicwidth]{MDS-pooladj.ps}

The most common approach used to determine the elements $d_{ij}$ and to obtain the coordinates of the objects $x_1, x_2, \ldots, x_n$ given only rank order information is an iterative process commonly referred to as the Shepard-Kruskal algorithm.

15.3.1 Shepard-Kruskal algorithm

In a first step, called the initial phase, we calculate Euclidean distances $d_{ij}^{(0)}$ from an arbitrarily chosen initial configuration $\data{X}_0$ in dimension $p^*$, provided that all objects have different coordinates. One might use metric MDS to obtain these initial coordinates. The second step or nonmetric phase determines disparities $\hat{d}_{ij}^{(0)}$ from the distances $d_{ij}^{(0)}$ by constructing a monotone regression relationship between the $d_{ij}^{(0)}$'s and the $\delta_{ij}$'s, under the requirement that if $\delta_{ij}<\delta_{rs}$, then $\hat d_{ij}^{(0)} \leq \hat d_{rs}^{(0)}$. This is called the weak monotonicity requirement. To obtain the disparities $\hat{d}_{ij}^{(0)}$, a useful approximation method is the pool-adjacent-violators (PAV) algorithm (see Figure 15.6). Let
\begin{displaymath}
(i_1,j_1) > (i_2,j_2) > \cdots > (i_k,j_k)
\end{displaymath} (15.20)

be the rank order of dissimilarities of the $k=n(n-1)/2$ pairs of objects. This corresponds to the points in Figure 15.5. The PAV algorithm is described as follows: ``beginning with the lowest ranked value of $\delta_{ij}$, the adjacent $d_{ij}^{(0)}$ values are compared for each $\delta_{ij}$ to determine if they are monotonically related to the $\delta_{ij}$'s. Whenever a block of consecutive values of $d_{ij}^{(0)}$ are encountered that violate the required monotonicity property the $d_{ij}^{(0)}$ values are averaged together with the most recent non-violator $d_{ij}^{(0)}$ value to obtain an estimator. Eventually this value is assigned to all points in the particular block''.
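The averaging step of the PAV algorithm can be sketched in a few lines of Python (a minimal illustration, not one of the XploRe quantlets referenced above; the function name `pav` and the demo values are our own):

```python
import numpy as np

def pav(y):
    """Pool-adjacent-violators: least-squares fit of a weakly
    nondecreasing sequence to y. Adjacent blocks that violate
    monotonicity are merged and replaced by their average."""
    blocks = []                          # list of [block mean, block size]
    for v in y:
        blocks.append([float(v), 1])
        # merge backwards while weak monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, s2 = blocks.pop()
            m1, s1 = blocks.pop()
            blocks.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    return np.concatenate([[m] * s for m, s in blocks])

# The violating blocks (3, 1) and (5, 4) are averaged to 2 and 4.5:
print(pav([3.0, 1.0, 2.0, 5.0, 4.0]))
```

Because the merge is repeated backwards, a newly averaged block that still violates monotonicity against its predecessor is merged again, which is exactly the block-averaging described in the quotation above.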

In a third step, called the metric phase, the spatial configuration of $\data{X}_0$ is altered to obtain $\data{X}_1$. From $\data{X}_1$ the new distances $d_{ij}^{(1)}$ can be obtained which are more closely related to the disparities $\hat{d}_{ij}^{(0)}$ from step two.

EXAMPLE 15.3   Consider a small example with 4 objects based on the car marks data set.

Table 15.3: Dissimilarities $\delta_{ij}$ for car marks.
\begin{tabular}{ll|cccc}
 & $j$ & 1 & 2 & 3 & 4 \\
$i$ & & Mercedes & Jaguar & Ferrari & VW \\
\hline
1 & Mercedes & -- & & & \\
2 & Jaguar & 3 & -- & & \\
3 & Ferrari & 2 & 1 & -- & \\
4 & VW & 5 & 4 & 6 & -- \\
\end{tabular}


Our aim is to find a representation with $p^*=2$ via MDS. Suppose that we choose the initial configuration $\data{X}_0$ given in Table 15.4.

Table 15.4: Initial coordinates for MDS.
\begin{tabular}{llcc}
$i$ & & $x_{i1}$ & $x_{i2}$ \\
\hline
1 & Mercedes & 3 & 2 \\
2 & Jaguar & 2 & 7 \\
3 & Ferrari & 1 & 3 \\
4 & VW & 10 & 4 \\
\end{tabular}


The corresponding distances $d_{ij}=\sqrt{(x_i-x_j)^{\top}(x_i-x_j)}$ are calculated in Table 15.5.

Table 15.5: Ranks and distances.
\begin{tabular}{cccc}
$i,j$ & $d_{ij}$ & rank$(d_{ij})$ & $\delta_{ij}$ \\
\hline
1,2 & 5.1 & 3 & 3 \\
1,3 & 2.2 & 1 & 2 \\
1,4 & 7.3 & 4 & 5 \\
2,3 & 4.1 & 2 & 1 \\
2,4 & 8.5 & 5 & 4 \\
3,4 & 9.1 & 6 & 6 \\
\end{tabular}
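The distances and ranks of Table 15.5 follow directly from the initial coordinates of Table 15.4. A short Python check (a sketch; the variable names are our own):

```python
import numpy as np

# initial configuration X_0 from Table 15.4
X0 = np.array([[3, 2],    # Mercedes
               [2, 7],    # Jaguar
               [1, 3],    # Ferrari
               [10, 4]])  # VW

pairs = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
d = np.array([np.linalg.norm(X0[i - 1] - X0[j - 1]) for i, j in pairs])
ranks = d.argsort().argsort() + 1    # rank 1 = smallest distance

for (i, j), dij, r in zip(pairs, d, ranks):
    print(f"{i},{j}  d = {dij:.1f}  rank = {r}")
```

Rounded to one decimal, the six distances and their ranks reproduce the table above.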


Figure 15.7: Initial configuration of the MDS of the car data. MVAnmdscar1.xpl
\includegraphics[width=1.1\defpicwidth]{MVAnmdscar1.ps}

Figure 15.8: Scatterplot of dissimilarities against distances. MVAnmdscar2.xpl
\includegraphics[width=1.1\defpicwidth]{MVAnmdscar2.ps}

A plot of the dissimilarities of Table 15.5 against the distances yields Figure 15.8. This relation is not satisfactory since the ranking of the $\delta_{ij}$ did not result in a monotone relation of the corresponding distances $d_{ij}$. We therefore apply the PAV algorithm.

The first violator of monotonicity is the second point $(1,3)$. Therefore we average the distances $d_{13}$ and $d_{23}$ to obtain the disparities

\begin{displaymath}
\hat d_{13}=\hat d_{23}=\frac{d_{13}+d_{23}}{2}=\frac{2.2+4.1}{2}=3.15.
\end{displaymath}

Applying the same procedure to $(2,4)$ and $(1,4)$ we obtain $\hat d_{24}= \hat d_{14}=7.9$. The plot of $\delta_{ij}$ versus the disparities $\hat d_{ij}$ represents a monotone regression relationship.

In the initial configuration (Figure 15.7), the third point (Ferrari) could be moved so that its distance to object 2 (Jaguar) is reduced. This procedure, however, also alters the distance between objects 3 and 4. Care must therefore be taken when establishing a monotone relation between $\delta_{ij}$ and $d_{ij}$.

In order to assess how well the derived configuration fits the given dissimilarities Kruskal suggests a measure called STRESS1 that is given by

\begin{displaymath}
STRESS1= \left(\frac{\sum_{i<j}(d_{ij}-\hat{d}_{ij})^2}
{\sum_{i<j} d_{ij}^{2}}\right)^{\frac{1}{2}}.
\end{displaymath} (15.21)

An alternative stress measure is given by
\begin{displaymath}
STRESS2= \left(\frac{\sum_{i<j}(d_{ij}-\hat{d}_{ij})^2}
{\sum_{i<j} (d_{ij}-\overline{d})^2 }\right)^{\frac{1}{2}},
\end{displaymath} (15.22)

where $\overline{d}$ denotes the average distance.

EXAMPLE 15.4   Table 15.6 presents the STRESS calculations for the car example.

The average distance is $\overline{d}=36.3/6=6.05$. The corresponding STRESS measures are:

\begin{displaymath}
STRESS1=\sqrt{2.6/256}=0.1
\end{displaymath}


\begin{displaymath}
STRESS2=\sqrt{2.6/36.4}=0.27.
\end{displaymath}


Table 15.6: STRESS calculations for car marks example.
\begin{tabular}{ccccccc}
$(i,j)$ & $\delta_{ij}$ & $d_{ij}$ & $\hat d_{ij}$ & $(d_{ij}-\hat d_{ij})^2$ & $d_{ij}^2$ & $(d_{ij}-\overline{d})^2$ \\
\hline
(2,3) & 1 & 4.1 & 3.15 & 0.9 & 16.8 & 3.8 \\
(1,3) & 2 & 2.2 & 3.15 & 0.9 & 4.8 & 14.8 \\
(1,2) & 3 & 5.1 & 5.1 & 0 & 26.0 & 0.9 \\
(2,4) & 4 & 8.5 & 7.9 & 0.4 & 72.3 & 6.0 \\
(1,4) & 5 & 7.3 & 7.9 & 0.4 & 53.3 & 1.6 \\
(3,4) & 6 & 9.1 & 9.1 & 0 & 82.8 & 9.3 \\
\hline
$\Sigma$ & & 36.3 & & 2.6 & 256.0 & 36.4 \\
\end{tabular}
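The STRESS values of the example are easy to verify numerically. The following Python sketch uses the rounded distances and disparities from Table 15.6, so the results agree with the text up to rounding of the intermediate sums:

```python
import numpy as np

d    = np.array([4.1, 2.2, 5.1, 8.5, 7.3, 9.1])     # distances d_ij
dhat = np.array([3.15, 3.15, 5.1, 7.9, 7.9, 9.1])   # disparities from PAV

# STRESS1 normalizes by the sum of squared distances (15.21),
# STRESS2 by the squared deviations from the mean distance (15.22)
stress1 = np.sqrt(((d - dhat) ** 2).sum() / (d ** 2).sum())
stress2 = np.sqrt(((d - dhat) ** 2).sum() / ((d - d.mean()) ** 2).sum())

print(round(stress1, 2), round(stress2, 2))
```

Both values come out close to 0.1 and 0.27 as in the text; small discrepancies in the second decimal stem from carrying more digits through the sums than the table does.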


The goal is to find a point configuration that balances the STRESS and the non-monotonicity. This is achieved by an iterative procedure. More precisely, one defines a new position of object $i$ relative to object $j$ by

\begin{displaymath}
x_{il}^{NEW}=x_{il}+\alpha\left(1-\frac{\hat d_{ij}}{d_{ij}}\right)
(x_{jl}-x_{il}), \qquad l=1,\dots,p^*.
\end{displaymath} (15.23)

Here $\alpha$ denotes the step width of the iteration.

By (15.23) the configuration of object $i$ is improved relative to object $j$. In order to obtain an overall improvement relative to all remaining points one uses:

\begin{displaymath}
x_{il}^{NEW}=x_{il}+\frac{\alpha}{n-1}
\sum_{j=1,\,j\neq i}^n \left(1-\frac{\hat d_{ij}}{d_{ij}}\right)
(x_{jl}-x_{il}), \qquad l=1,\dots,p^*.
\end{displaymath} (15.24)

The choice of step width $\alpha$ is crucial. Kruskal proposes a starting value of $\alpha=0.2$. The iteration is continued by a numerical approximation procedure, such as steepest descent or the Newton-Raphson procedure.

In a fourth step, the evaluation phase, the STRESS measure is used to decide whether the change resulting from the last iteration is small enough for the procedure to be terminated. At this stage the optimal fit has been obtained for a given dimension. Hence, the whole procedure needs to be carried out for several dimensions.

EXAMPLE 15.5   Let us compute the new point configuration for $i=3$ (Ferrari). The initial coordinates from Table 15.4 are

\begin{displaymath}
x_{31}=1\textrm{ and } x_{32}=3.
\end{displaymath}

Applying (15.24) yields (for $\alpha=3$):
\begin{eqnarray*}
x_{31}^{NEW} & = & 1+\frac{3}{4-1}\sum_{j=1,\,j\neq 3}^{4}
\left(1-\frac{\hat d_{3j}}{d_{3j}}\right)(x_{j1}-1)\\
 & = & 1 + \left( 1-\frac{3.15}{2.2} \right)(3 - 1)
+ \left( 1-\frac{3.15}{4.1} \right)(2 - 1)
+ \left( 1-\frac{9.1}{9.1} \right)(10 - 1)\\
 & = & 1 - 0.86 + 0.23 + 0\\
 & = & 0.37.
\end{eqnarray*}

Similarly we obtain $x_{32}^{NEW}=4.36$.
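This single-point update is straightforward to check numerically. The Python sketch below applies (15.24) with $\alpha=3$ to the Ferrari, using the rounded distances and disparities of the example (the dictionary layout and variable names are our own):

```python
import numpy as np

# initial configuration from Table 15.4
X = np.array([[3.0, 2.0],    # Mercedes
              [2.0, 7.0],    # Jaguar
              [1.0, 3.0],    # Ferrari
              [10.0, 4.0]])  # VW

# rounded distances d_3j and disparities dhat_3j for j = 1, 2, 4
d    = {0: 2.2, 1: 4.1, 3: 9.1}
dhat = {0: 3.15, 1: 3.15, 3: 9.1}

alpha, n, i = 3.0, 4, 2      # i = 2 is the Ferrari (0-based index)
x_new = X[i] + alpha / (n - 1) * sum(
    (1 - dhat[j] / d[j]) * (X[j] - X[i]) for j in d)

print(np.round(x_new, 2))    # [0.37 4.36]
```

The third term vanishes because $\hat d_{34}=d_{34}=9.1$, so the VW does not pull the Ferrari at all in this step.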

Figure 15.9: First iteration for Ferrari. MVAnmdscar3.xpl
\includegraphics[width=1\defpicwidth]{MVAnmdscar3.ps}

To find the appropriate number of dimensions, $p^*$, a plot of the minimum STRESS value as a function of the dimensionality is made. One possible criterion in selecting the appropriate dimensionality is to look for an elbow in the plot. A rule of thumb that can be used to decide if a STRESS value is sufficiently small or not is provided by Kruskal:

\begin{displaymath}
S>20 \% ,\ \textrm{poor}; \ S=10 \%, \ \textrm{fair}; \ S<5\%,\ \textrm{good};\ S=0, \ \textrm{perfect}.
\end{displaymath} (15.25)

Summary
$\ast$
Nonmetric MDS is only based on the rank order of dissimilarities.
$\ast$
The object of nonmetric MDS is to create a spatial representation of the objects with low dimensionality.
$\ast$
A practical algorithm is given as:
  1. Choose an initial configuration.
  2. Find $d_{ij}$ from the configuration.
  3. Fit $\hat d_{ij}$, the disparities, by the PAV algorithm.
  4. Find a new configuration $\data{X}_{n+1}$ by using steepest descent.
  5. Go to 2.
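The steps above can be sketched as one compact Python loop (a minimal illustration, not the original XploRe quantlets; the function names, the random initial configuration, and the fixed iteration count are our own choices):

```python
import numpy as np

def pav(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit."""
    blocks = []                          # [block mean, block size]
    for v in y:
        blocks.append([float(v), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, s2 = blocks.pop()
            m1, s1 = blocks.pop()
            blocks.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    return np.concatenate([[m] * s for m, s in blocks])

def shepard_kruskal(delta, p=2, alpha=0.2, n_iter=100, seed=0):
    """delta: symmetric (n, n) dissimilarity matrix.
    Returns an (n, p) configuration fitted by the phases above."""
    n = delta.shape[0]
    iu = np.triu_indices(n, 1)
    order = np.argsort(delta[iu])        # rank order of the delta_ij
    X = np.random.default_rng(seed).normal(size=(n, p))  # initial phase
    for _ in range(n_iter):
        D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))  # distances
        dhat = np.empty(len(iu[0]))
        dhat[order] = pav(D[iu][order])  # nonmetric phase: disparities
        Dhat = np.zeros_like(D)
        Dhat[iu] = dhat
        Dhat += Dhat.T
        np.fill_diagonal(D, 1.0)         # avoid 0/0 on the diagonal
        F = 1.0 - Dhat / np.maximum(D, 1e-12)
        np.fill_diagonal(F, 0.0)
        # metric phase: update (15.24) for all points simultaneously
        X = X + alpha / (n - 1) * (F[:, :, None] * (X[None] - X[:, None])).sum(1)
    return X

# car marks dissimilarities from Table 15.3
delta = np.array([[0, 3, 2, 5],
                  [3, 0, 1, 4],
                  [2, 1, 0, 6],
                  [5, 4, 6, 0]], dtype=float)
X = shepard_kruskal(delta)
```

The loop fixes the number of iterations for simplicity; the evaluation phase described in the text would instead stop once the change in STRESS between iterations falls below a tolerance.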