
1.2 The Emergence of a Field of Computational Statistics

Statistical computing is truly a multidisciplinary field and the diverse problems have created a yeasty atmosphere for research and development. This has been the case from the beginning. The roles of statistical laboratories and the applications that drove early developments in statistical computing are surveyed by Grier (1999)[7]. As digital computers began to be used, the field of statistical computing came to embrace not only numerical methods but also a variety of topics from computer science.

The development of the field of statistical computing was quite fragmented, with advances coming from many directions: some from persons with direct interest and expertise in computations, and others from persons whose research interests were in the applications but who needed to solve a computational problem. Through the 1950s the major facts relevant to statistical computing were scattered through a variety of journal articles and technical reports. Many results were incorporated into computer programs by their authors and never appeared in the open literature. Some persons who contributed to the development of the field of statistical computing were not aware of the work that was beginning to put numerical analysis on a sound footing. This hampered advances in the field.

1.2.1 Early Developments in Statistical Computing

An early book that assembled much of the extant information on digital computations in the important area of linear computations was by Dwyer (1951)[3]. In the same year, Von Neumann's (1951) NBS publication[13] described techniques of random number generation and applications in Monte Carlo. At the time of these publications, however, access to digital computers was not widespread. Dwyer (1951)[3] was also influential in regression computations performed on calculators. Some techniques, such as the use of "machine formulas", persisted into the age of digital computers.

Developments in statistical computing intensified in the 1960s, as access to digital computers became more widespread. Grier (1991)[6] describes some of the effects that the introduction of digital computers had on statistical practice, and how statistical applications motivated software developments. The problems of rounding errors in digital computations were discussed very carefully in a pioneering book by Wilkinson (1963)[15]. A number of books on numerical analysis using digital computers were beginning to appear. The techniques of random number generation and Monte Carlo were described by Hammersley and Handscomb (1964)[8]. In 1967 the first book specifically on statistical computing appeared, Hemmerle (1967)[9].

1.2.2 Early Conferences and Formation of Learned Societies

The 1960s also saw the beginnings of conferences on statistical computing and sections on statistical computing within the major statistical societies. The Royal Statistical Society sponsored a conference on statistical computing in December 1966. The papers from this conference were later published in the RSS's Applied Statistics journal. The conference led directly to the formation of a Working Party on Statistical Computing within the Royal Statistical Society. The first Symposium on the Interface of Computer Science and Statistics was held on February 1, 1967. This conference has continued as an annual event with only a few exceptions since that time (see Goodman, 1993[5], Billard and Gentle, 1993[1], and Wegman, 1993[14]). The attendance at the Interface Symposia initially grew rapidly year by year and peaked in 1979. In recent years the attendance has been below that peak. The proceedings of the Symposium on the Interface have been an important repository of developments in statistical computing. In April 1969, an important conference on statistical computing was held at the University of Wisconsin. The papers presented at that conference were published in a book edited by Milton and Nelder (1969)[11], which helped to make statisticians aware of the useful developments in computing and of their relevance to the work of applied statisticians.

In the 1970s two more important societies devoted to statistical computing were formed. The Statistical Computing Section of the ASA was formed in 1971 (see Chambers and Ryan, 1990[2]). The Statistical Computing Section organizes sessions at the annual meetings of the ASA, and publishes proceedings of those sessions. The International Association for Statistical Computing (IASC) was founded in 1977 as a Section of ISI. In the meantime, the first of the biennial COMPSTAT Conferences on computational statistics was held in Vienna in 1974. Much later, regional sections of the IASC were formed, one in Europe and one in Asia. The European Regional Section of the IASC is now responsible for the organization of the COMPSTAT conferences.

Also, beginning in the late 1960s and early 1970s, most major academic programs in statistics offered one or more courses in statistical computing. More importantly, perhaps, instruction in computational techniques has permeated many of the standard courses in applied statistics.

As mentioned above, there are several journals whose titles include some variant of both "computing" and "statistics". The first of these, the Journal of Statistical Computation and Simulation, was begun in 1972. There are dozens of journals in numerical analysis and in areas such as "computational physics", "computational biology", and so on, that publish articles relevant to the fields of statistical computing and computational statistics.

By 1980 the field of statistical computing, or computational statistics, was well established as a distinct scientific subdiscipline. Since then, there have been regular conferences in the field, there are scholarly societies devoted to the area, there are several technical journals in the field, and courses in the field are regularly offered in universities.

1.2.3 The PC

The 1980s were a period of great change in statistical computing. The personal computer brought computing capabilities to almost everyone. With the PC came a change not only in the number of participants in statistical computing, but, equally important, in attitudes toward computing, which became completely different. Formerly, computing required an account on a mainframe computer. It required laboriously entering arcane computer commands onto punched cards, taking these cards to a card reader, and waiting several minutes or perhaps a few hours for some output, which, quite often, was only a page stating that there was an error somewhere in the program. With a personal computer for the exclusive use of the statistician, there were no incremental costs for running programs. The interaction was personal, and generally much faster than with a mainframe. The software for PCs was friendlier and easier to use. As might be expected with many non-experts writing software, however, the general quality of software probably went down.

The democratization of computing resulted in rapid growth in the field, and rapid growth in software for statistical computing. It also contributed to the changing paradigm of the data sciences.

1.2.4 The Cross Currents of Computational Statistics

Computational statistics is, of course, more closely related to statistics than to any other discipline, and computationally intensive methods are becoming more commonly used in various areas of application of statistics. Developments in other areas, such as computer science and numerical analysis, are also often directly relevant to computational statistics, and the research worker in this field must scan a wide range of literature.

Numerical methods are often developed in an ad hoc way, and may be reported in the literature of any of a variety of disciplines. Other developments important for statistical computing may also be reported in a wide range of journals that statisticians are unlikely to read. Keeping abreast of relevant developments in statistical computing is difficult not only because of the diversity of the literature, but also because of the interrelationships between statistical computing and computer hardware and software.

An example of an area in computational statistics in which significant developments are often made by researchers in other fields is Monte Carlo simulation. This technique is widely used in all areas of science, and researchers in various areas often contribute to the development of the science and art of Monte Carlo simulation. Almost all of the methods of Monte Carlo, including random number generation, are important in computational statistics.

1.2.5 Literature

Some of the major periodicals in statistical computing and computational statistics are the following. Some of these journals and proceedings are refereed rather rigorously, some refereed less so, and some are not refereed.

ACM Transactions on Mathematical Software, published quarterly by the ACM (Association for Computing Machinery), includes algorithms in Fortran and C. Most of the algorithms are available through netlib. The ACM collection of algorithms is sometimes called CALGO.
www.acm.org/toms/
ACM Transactions on Modeling and Computer Simulation, published quarterly by the ACM.
www.acm.org/tomacs/
Applied Statistics, published quarterly by the Royal Statistical Society. (Until 1998, it included algorithms in Fortran. Some of these algorithms, with corrections, were collected by Griffiths and Hill, 1985. Most of the algorithms are available through statlib at Carnegie Mellon University.)
www.rss.org.uk/publications/
Communications in Statistics - Simulation and Computation, published quarterly by Marcel Dekker. (Until 1996, it included algorithms in Fortran. Until 1982, this journal was designated as Series B.)
www.dekker.com/servlet/product/productid/SAC/
Computational Statistics, published quarterly by Physica-Verlag (formerly called Computational Statistics Quarterly).
comst.wiwi.hu-berlin.de/
Computational Statistics. Proceedings of the xx-th Symposium on Computational Statistics (COMPSTAT), published biennially by Physica-Verlag/Springer.
Computational Statistics & Data Analysis, published by Elsevier Science. There are twelve issues per year. (This is also the official journal of the International Association for Statistical Computing and as such incorporates the Statistical Software Newsletter.)
www.cbs.nl/isi/csda.htm
Computing Science and Statistics. This is an annual publication containing papers presented at the Interface Symposium. Until 1992, these proceedings were named Computer Science and Statistics: Proceedings of the xx-th Symposium on the Interface. (The 24th symposium was held in 1992.) In 1997, Volume 29 was published in two issues: Number 1, which contains the papers of the regular Interface Symposium; and Number 2, which contains papers from another conference. The two numbers are not sequentially paginated. Since 1999, the proceedings have been published only in CD-ROM form, by the Interface Foundation of North America.
www.galaxy.gmu.edu/stats/IFNA.html
Journal of Computational and Graphical Statistics, published quarterly as a joint publication of ASA, the Institute of Mathematical Statistics, and the Interface Foundation of North America.
www.amstat.org/publications/jcgs/
Journal of the Japanese Society of Computational Statistics, published once a year by JSCS.
www.jscs.or.jp/oubun/indexE.html
Journal of Statistical Computation and Simulation, published in twelve issues per year by Taylor & Francis.
www.tandf.co.uk/journals/titles/00949655.asp
Proceedings of the Statistical Computing Section, published annually by ASA.
www.amstat.org/publications/
SIAM Journal on Scientific Computing, published bimonthly by SIAM. This journal was formerly SIAM Journal on Scientific and Statistical Computing.
www.siam.org/journals/sisc/sisc.htm
Statistical Computing & Graphics Newsletter, published quarterly by the Statistical Computing and the Statistical Graphics Sections of ASA.
www.statcomputing.org/
Statistics and Computing, published quarterly by Chapman & Hall.

In addition to literature and learned societies in the traditional forms, computer databases and forums are an important means of communication and repository of information. In some cases, the databases duplicate what is available in some other form, but often the material and the communications facilities provided by the computer are not available elsewhere.
