The concept of smoothing is a central idea in statistics. Its role is to extract structural elements of variable complexity from patterns of random variation. The nonparametric smoothing concept is designed to simultaneously estimate and model the underlying structure. This involves high-dimensional objects, such as density functions, regression surfaces or conditional quantiles. Such objects are difficult to estimate for data sets with mixed, high-dimensional and partially unobservable variables. The semiparametric modeling technique compromises between the two aims, flexibility and simplicity of statistical procedures, by introducing partial parametric components. These (low-dimensional) components allow one to match structural conditions, such as linearity in some variables, and may be used to model the influence of discrete variables. The flexibility of semiparametric modeling has made it a widely accepted statistical technique.
The aim of this monograph is to present the statistical and mathematical principles of smoothing with a focus on applicable techniques. The necessary mathematical treatment is easily understandable and a wide variety of interactive smoothing examples are given. This text is an e-book; it is a downloadable entity ( http://www.i-xplore.de ) that allows the reader to recalculate all arguments and applications without reference to a specific software platform. This new technique for disseminating methods and ideas is specifically designed for beginners in nonparametric and semiparametric statistics. It is based on the XploRe quantlet technology, developed at Humboldt-Universität zu Berlin.
The text has evolved out of the courses ``Nonparametric Modeling'' and ``Semiparametric Modeling'' that the authors taught at Humboldt-Universität zu Berlin, ENSAE Paris, Charles University Prague, and Universidad de Cantabria, Santander. The book divides naturally into two parts:
The first part (Chapters 2-4) covers the methodological aspects of nonparametric function estimation for cross-sectional data, in particular kernel smoothing methods. Although our primary focus will be on flexible regression models, a closely related topic to consider is nonparametric density estimation. Since many techniques and concepts for the estimation of probability density functions are also relevant for regression function estimation, we first consider histograms (Chapter 2) and kernel density estimates (Chapter 3) in more detail. Finally, in Chapter 4 we introduce several methods of nonparametrically estimating regression functions. The main part of this chapter is devoted to kernel regression, but other approaches such as splines, orthogonal series and nearest neighbor methods are also covered.
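The book's own examples are built on XploRe quantlets. As a rough illustration outside that platform, the kernel density estimator of Chapter 3 can be sketched in Python (the simulated data, bandwidth h = 0.5, and grid are illustrative choices, not taken from the book):

```python
import numpy as np

def kernel_density(x_grid, data, h):
    """Gaussian kernel density estimate: f_h(x) = (1/(n*h)) * sum_i K((x - X_i)/h)."""
    u = (x_grid[:, None] - data[None, :]) / h          # pairwise scaled distances
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)     # Gaussian kernel K(u)
    return k.mean(axis=1) / h                          # average over observations

rng = np.random.default_rng(0)
data = rng.normal(size=200)                # 200 draws from a standard normal
grid = np.linspace(-3.0, 3.0, 61)
fh = kernel_density(grid, data, h=0.5)     # estimated density on the grid
```

The estimate is a sum of small Gaussian "bumps" centered at the observations; the bandwidth h governs the trade-off between smoothness and fidelity to the data, a theme developed at length in Chapter 3.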
The first part is intended for undergraduate students majoring in mathematics, statistics, econometrics or biometrics. It is assumed that the audience has a basic knowledge of mathematics (linear algebra and analysis) and statistics (inference and regression analysis). The material is easy to use, since the e-book character of the text allows maximum flexibility in learning (and teaching) intensity.
The second part (Chapters 5-9) is devoted to semiparametric regression models, in particular extensions of the parametric generalized linear model. In Chapter 5 we summarize the main ideas of the generalized linear model (GLM). Typical concepts are the logit and probit models. Nonparametric extensions of the GLM consider either the link function (single index models, Chapter 6) or the index argument (generalized partial linear models, additive and generalized additive models, Chapters 7-9). Single index models focus on the nonparametric error distribution in an underlying latent variable model. Partial linear models take the pragmatic point of fixing the error distribution but let the index be of non- or semiparametric structure. Generalized additive models concentrate on a (lower dimensional) additive structure of the index with fixed link function. This model class balances the difficulty of high-dimensional smoothing with the flexibility of nonparametrics.
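The logit model mentioned above is the simplest member of this GLM family. As a minimal sketch in Python (not the book's XploRe platform; the simulated coefficients are illustrative), it can be fitted by Newton-Raphson, which for the GLM coincides with iteratively reweighted least squares:

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Fit P(Y=1 | x) = 1 / (1 + exp(-x'beta)) by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                     # IRLS weights
        score = X.T @ (y - p)                 # gradient of the log-likelihood
        info = (X * w[:, None]).T @ X         # Fisher information
        beta = beta + np.linalg.solve(info, score)
    return beta

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one regressor
true_beta = np.array([-0.5, 1.5])                       # hypothetical coefficients
p_true = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (rng.random(n) < p_true).astype(float)
beta_hat = fit_logit(X, y)                              # close to true_beta for large n
```

The semiparametric extensions of Chapters 6-9 relax exactly the two ingredients fixed here: the known logistic link (single index models) and the linear index x'beta (partial linear and additive models).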
In addition to the methodological aspects, the second part also covers computational algorithms for the considered models. As in the first part, we focus on cross-sectional data. This part is intended for master's and PhD students and researchers.
This book would not have been possible without substantial support from many colleagues and students. It has benefited at several stages from useful remarks and suggestions of our students at Humboldt-Universität zu Berlin, ENSAE Paris and Charles University Prague. We are grateful to Lorens Helmchen, Stephanie Freese, Danilo Mercurio, Thomas Kühn, Ying Chen and Michal Benko for their support in text processing and programming, Caroline Condron for language checking and Pavel Čížek, Zdeněk Hlávka and Rainer Schulz for their assistance in teaching. We are indebted to Joel Horowitz (Northwestern University), Enno Mammen (Universität Heidelberg) and Helmut Rieder (Universität Bayreuth) for their valuable comments on earlier versions of the manuscript. Thanks go also to Clemens Heine, Springer Verlag, for being a very supportive and helpful editor.
Berlin/Kaiserslautern/Madrid