
8.1 Introduction

Parallel computing means dividing a job into several tasks and using more than one processor simultaneously to perform them. Assume you have developed a new estimation method for the parameters of a complicated statistical model. After proving the asymptotic properties of the method (for instance, the asymptotic distribution of the estimator), you wish to perform many simulations to confirm that the method works well for reasonable numbers of data values and for different values of the parameters. You must generate simulated data, for example, 100,000 times for each sample size and parameter value. The total simulation requires a huge number of random number generations and takes a long time on your PC. If you use 100 PCs in your institute to run these simulations simultaneously, you may expect the total execution time to be reduced to roughly 1/100. This is the simple idea of parallel computing.
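As a concrete, deliberately simplified illustration, the following C sketch divides such a simulation study among MPI processes; MPI itself is introduced in Sect. 8.3. The function run_one_simulation(), the uniform(0,2) toy model and the seeding scheme are hypothetical placeholders for the actual estimation procedure and random number generator one would use.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define TOTAL_REPLICATIONS 100000

/* Hypothetical stand-in for the real estimation procedure: simulate n
   observations with known mean 1, estimate the mean, and return the
   squared error of the estimate. */
static double run_one_simulation(unsigned int *seed)
{
    const int n = 100;
    double sum = 0.0;
    int j;
    for (j = 0; j < n; j++)            /* uniform(0,2) observations, mean 1 */
        sum += 2.0 * rand_r(seed) / (double) RAND_MAX;
    double estimate = sum / n;
    return (estimate - 1.0) * (estimate - 1.0);
}

int main(int argc, char **argv)
{
    int rank, size;
    long i;
    double local_sum = 0.0, total_sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id: 0 .. size-1 */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes, e.g. 100 */

    unsigned int seed = (unsigned int) rank + 1;   /* per-process seed */

    /* Each process handles every size-th replication, so the 100,000
       replications are divided (almost) evenly among the processes. */
    for (i = rank; i < TOTAL_REPLICATIONS; i += size)
        local_sum += run_one_simulation(&seed);

    /* Combine the per-process sums on process 0. */
    MPI_Reduce(&local_sum, &total_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("mean squared error over %d replications: %g\n",
               TOTAL_REPLICATIONS, total_sum / TOTAL_REPLICATIONS);

    MPI_Finalize();
    return 0;
}

With a standard MPI installation this sketch can be compiled with mpicc and started on, say, 100 processes with mpirun -np 100, each process then executing about 1000 of the 100,000 replications.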

Computer scientists noticed the importance of parallel computing many years ago ([10]). It is true that the recent development of computer hardware has been very rapid. For roughly 40 years from 1961, the so-called "Moore's law" has held: the number of transistors per silicon chip has doubled approximately every 18 months ([39]). This means that memory chip capacity and processor speed have also increased roughly exponentially. In addition, hard disk capacity has increased dramatically. Consequently, modern personal computers are more powerful than the "supercomputers" of a decade ago. Unfortunately, even such powerful personal computers are not sufficient for our requirements. In statistical analysis, for example, while computers are becoming more powerful, data volumes are becoming larger and statistical techniques are becoming more computer intensive. We are continually forced to seek more powerful computing environments for statistical analysis, and parallel computing is thought to be the most promising technique for providing them.

However, until recently parallel computing was not popular among statisticians ([33]). One reason is that parallel computing was available only on very expensive computers installed at computer centers in universities or research institutes, and few statisticians had easy access to such systems. Furthermore, software for parallel computing was not well prepared for general use.

Recently, cheap and powerful personal computers have changed this situation. The Beowulf project ([34]), which built a powerful computing system from many PCs connected by a network, was a milestone in parallel computer development. Freely available software products for parallel computing have also matured. Thus, parallel computing has become readily accessible to statisticians.

In this chapter, we give an overview of available technologies for parallel computing and give examples of their use in statistics. The next section considers the basic ideas of parallel computing, including memory architectures. Section 8.3 introduces the available software technologies: process forking, threading, OpenMP, PVM (Parallel Virtual Machine), MPI (Message Passing Interface) and HPF (High Performance Fortran). The last section describes some examples of parallel computing in statistics.

