The aim of the book is to present multivariate data analysis in a way that is understandable to non-mathematicians and practitioners who are confronted with statistical data analysis. This is achieved by focusing on practical relevance and through the e-book character of this text. All practical examples may be recalculated and modified by the reader using a standard web browser, without reference to or application of any specific software.
The book is divided into three main parts. The first part is devoted to graphical techniques describing the distributions of the variables involved. The second part deals with multivariate random variables and presents, from a theoretical point of view, distributions, estimators and tests for various practical situations. The last part is on multivariate techniques and introduces the reader to the wide selection of tools available for multivariate data analysis. All data sets are given in the appendix and are downloadable from www.md-stat.com. The text contains a wide variety of exercises, the solutions to which are given in a separate textbook. In addition, a full set of transparencies is provided on www.md-stat.com, making it easier for an instructor to present the materials in this book. All transparencies contain hyperlinks to the statistical web service so that students and instructors alike may recompute all examples via a standard web browser.
The first section on descriptive techniques covers the construction of the boxplot. Here the standard data sets on genuine and counterfeit bank notes and on Boston housing are introduced. Flury faces are shown in Section 1.5, followed by the presentation of Andrews curves and parallel coordinate plots. Histograms, kernel densities and scatterplots complete the first part of the book. The reader is introduced to the concepts of skewness and correlation from a graphical point of view.
At the beginning of the second part of the book the reader goes on a short excursion into matrix algebra. Covariances, correlation and the linear model are introduced. This section is followed by the presentation of the ANOVA technique and its application to the multiple linear model. In Chapter 4 the multivariate distributions are introduced and thereafter specialized to the multinormal. The theory of estimation and testing ends the discussion on multivariate random variables.
The third and last part of this book starts with a geometric decomposition of data matrices. It is influenced by the French school of analyse de données. This geometric point of view is linked to principal components analysis in Chapter 9. An important discussion on factor analysis follows, with a variety of examples from psychology and economics. The section on cluster analysis deals with the various cluster techniques and leads naturally to the problem of discriminant analysis. The next chapter deals with the detection of correspondence between factors. The joint structure of data sets is presented in the chapter on canonical correlation analysis, and a practical study on prices and safety features of automobiles is given. Next the important topic of multidimensional scaling is introduced, followed by the tool of conjoint measurement analysis. Conjoint measurement analysis is often used in psychology and marketing to measure preference orderings for certain goods. The applications in finance (Chapter 17) are numerous. We present here the CAPM and discuss efficient portfolio allocations. The book closes with a presentation on highly interactive, computationally intensive techniques.
This book is designed for the advanced bachelor's and first-year graduate student as well as for the inexperienced data analyst who would like a tour of the various statistical tools in a multivariate data analysis workshop. The experienced reader with a sound knowledge of algebra will certainly skip some sections of the multivariate random variables part but will hopefully enjoy the various mathematical roots of the multivariate techniques. A graduate student might think that the first part on descriptive techniques is well known to him from his training in introductory statistics. The mathematical and the applied parts of the book (II, III) will certainly introduce him to the rich realm of multivariate statistical data analysis modules.
The inexperienced computer user of this e-book is slowly introduced to an interdisciplinary way of statistical thinking and will certainly enjoy the various practical examples. This e-book is designed as an interactive document with various links to other features. The complete e-book may be downloaded from www.xplore-stat.de using the license key given on the last page of this book. Our e-book design offers a complete PDF and HTML file with links to MD*Tech computing servers.
The reader of this book may therefore use all the presented methods and data via the local XploRe Quantlet Server (XQS) without downloading or buying additional software. Such XQ Servers may also be installed in a department or accessed freely on the web (see www.i-xplore.de for more information).
A book of this kind would not have been possible without the help of many friends, colleagues and students. For the technical production of the e-book we would like to thank Jörg Feuerhake, Zdenek Hlávka, Torsten Kleinow, Sigbert Klinke, Heiko Lehmann and Marlene Müller. The book has been carefully read by Christian Hafner, Mia Huber, Stefan Sperlich and Axel Werwatz. We would also like to thank Pavel Cízek, Isabelle De Macq, Holger Gerhardt, Alena Myšicková and Manh Cuong Vu for the solutions to various statistical problems and exercises. We thank Clemens Heine from Springer Verlag for continuous support and valuable suggestions on the style of writing and on the contents covered.
W. Härdle and L. Simar
Berlin and Louvain-la-Neuve, August 2003