11.5 Exercises

EXERCISE 11.1   Prove formula (11.16).

EXERCISE 11.2   Prove that $I_{R}=\mathop{\hbox{tr}}(\data{S}_{R})$, where $\data{S}_{R}$ denotes the empirical covariance matrix of the observations contained in $R$.
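
Before attempting the proof, the identity can be checked numerically. The sketch below is a sanity check and not a proof. It assumes that $I_{R}$ is the mean squared Euclidean distance of the observations in $R$ to their centroid and that $\data{S}_{R}$ uses the divisor $n_{R}$; the function names and the data are hypothetical.

```python
def mean_vector(points):
    """Coordinate-wise mean (centroid) of a list of equal-length points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def inertia(points):
    """Assumed definition: mean squared Euclidean distance to the centroid."""
    center = mean_vector(points)
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, center)) for p in points
    ) / len(points)

def trace_of_covariance(points):
    """Sum of the per-variable variances, using the divisor n (not n - 1)."""
    center = mean_vector(points)
    n = len(points)
    return sum(
        sum((p[j] - center[j]) ** 2 for p in points) / n
        for j in range(len(center))
    )

# Hypothetical observations in a group R (three variables).
group_r = [(1.0, 2.0, 0.5), (2.5, 0.0, 1.5), (0.0, 1.0, 3.0), (4.0, 3.0, 2.0)]
print(inertia(group_r), trace_of_covariance(group_r))
```

The two functions differ only in the order of summation (over observations first, or over variables first), which is the step the proof has to justify.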

EXERCISE 11.3   Prove that

\begin{displaymath}\Delta(R,P+Q)=\frac{n_{R}+n_{P}}{n_{R}+n_{P}+n_{Q}}\; \Delta(R,P)
+ \frac{n_{R}+n_{Q}}{n_{R}+n_{P}+n_{Q}}\; \Delta(R,Q)
- \frac{n_{R}}{n_{R}+n_{P}+n_{Q}}\; \Delta(P,Q),\end{displaymath}

when the centroid formula is used to define $d^2(R,P+Q)$.
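
A numerical check of this recurrence can help catch algebra slips before writing out the proof. The sketch below assumes the Ward quantity $\Delta(P,Q)=\frac{n_{P}n_{Q}}{n_{P}+n_{Q}}\,d^{2}(P,Q)$ with $d^{2}$ the squared Euclidean distance between group centroids; the helper names and the random data are hypothetical.

```python
import random

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def ward_delta(a, b):
    """Assumed Ward quantity: n_a * n_b / (n_a + n_b) * d^2(centroids)."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def random_cloud(n, dim=2):
    """Hypothetical data: n standard normal points in dim dimensions."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

random.seed(0)
R, P, Q = random_cloud(4), random_cloud(3), random_cloud(5)
n_r, n_p, n_q = len(R), len(P), len(Q)
total = n_r + n_p + n_q

# Left-hand side: merge P and Q first, then compare with R directly.
lhs = ward_delta(R, P + Q)

# Right-hand side: the recurrence, using only pairwise quantities.
rhs = (
    (n_r + n_p) * ward_delta(R, P)
    + (n_r + n_q) * ward_delta(R, Q)
    - n_r * ward_delta(P, Q)
) / total

print(lhs, rhs)
```

Agreement up to rounding error for several seeds and group sizes supports the formula but does not replace the proof.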

EXERCISE 11.4   Repeat the 8-point example (Example 11.5) using the complete linkage algorithm and the Ward algorithm. Explain how the results differ from those obtained with single linkage.
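
One way to compare the three algorithms is a naive agglomerative procedure that recomputes every group distance from its definition at each step. The sketch below makes several assumptions: squared Euclidean distances between points, the Ward quantity $\frac{n_{P}n_{Q}}{n_{P}+n_{Q}}\,d^{2}(P,Q)$ between centroids, and hypothetical toy points that are not the eight points of Example 11.5.

```python
from itertools import combinations

def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def centroid(group):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(group)
    return [sum(p[j] for p in group) / n for j in range(len(group[0]))]

def group_distance(a, b, method):
    """Distance between two groups of points under the chosen algorithm."""
    if method == "single":      # nearest pair of points
        return min(sq_dist(x, y) for x in a for y in b)
    if method == "complete":    # farthest pair of points
        return max(sq_dist(x, y) for x in a for y in b)
    if method == "ward":        # weighted squared distance between centroids
        weight = len(a) * len(b) / (len(a) + len(b))
        return weight * sq_dist(centroid(a), centroid(b))
    raise ValueError(f"unknown method: {method}")

def agglomerate(points, method):
    """Return the list of merged index sets, in the order they are formed."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        def dist(pair):
            a = [points[k] for k in clusters[pair[0]]]
            b = [points[k] for k in clusters[pair[1]]]
            return group_distance(a, b, method)
        # Pick the pair of current clusters with the smallest distance.
        i, j = min(combinations(range(len(clusters)), 2), key=dist)
        merged = sorted(clusters[i] + clusters[j])
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical toy data; replace with the points of the example at hand.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.3), (10.0, 0.0)]
for method in ("single", "complete", "ward"):
    print(method, agglomerate(toy_points, method))
```

Recomputing all distances at every step is wasteful but keeps each definition visible; an implementation based on the recurrence formulas for group distances would update them instead.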

EXERCISE 11.5   Explain the differences between various proximity measures by means of an example.

EXERCISE 11.6   Repeat the bank notes example (Example 11.6) with another random sample of 20 notes.

EXERCISE 11.7   Repeat the bank notes example (Example 11.6) with another clustering algorithm.

EXERCISE 11.8   Repeat the bank notes example (Example 11.6) or the 8-point example (Example 11.5) with the $L_{1}$-norm.

EXERCISE 11.9   Analyze the U.S. companies data set (Table B.5) using the Ward algorithm and the $L_{2}$-norm.

EXERCISE 11.10   Analyze the U.S. crime data set (Table B.10) with the Ward algorithm and the $L_{2}$-norm on standardized variables (use only the crime variables).

EXERCISE 11.11   Repeat Exercise 11.10 with the U.S. health data set (use only the variables giving the number of deaths).

EXERCISE 11.12   Redo Exercise 11.10 with the $\chi^{2}$-metric. Compare the results.

EXERCISE 11.13   Redo Exercise 11.11 with the $\chi^{2}$-metric and compare the results.
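
For the last two exercises, the $\chi^{2}$-metric between two rows of a table of counts can be sketched as follows. The code assumes the usual form, in which row profiles $x_{ij}/x_{i\bullet}$ are compared and each squared difference is divided by the relative column total $x_{\bullet j}/x_{\bullet\bullet}$; the function name and the counts are hypothetical.

```python
def chi2_sq_distance(table, i1, i2):
    """Squared chi-square distance between rows i1 and i2 of a count table.

    Assumed form: sum over columns j of
        (x[i1][j] / rowsum[i1] - x[i2][j] / rowsum[i2])**2 / (colsum[j] / total)
    """
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return sum(
        (table[i1][j] / row_sums[i1] - table[i2][j] / row_sums[i2]) ** 2
        / (col_sums[j] / total)
        for j in range(len(col_sums))
    )

# Hypothetical counts: rows 0 and 1 have proportional profiles, row 2 differs.
counts = [
    [10, 20, 30],
    [20, 40, 60],
    [30, 10, 5],
]
print(chi2_sq_distance(counts, 0, 1), chi2_sq_distance(counts, 0, 2))
```

Because only row profiles enter, rows with proportional counts are at distance zero. This is one reason the results can differ from those based on the $L_{2}$-norm on standardized variables.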