13.2 Empirical Distribution Function

A natural estimate of the loss distribution is the observed (empirical) claim size distribution. However, if monetary values changed during the observation period, inflation-corrected data should be used. For a sample of observations $ \{x_1, \ldots, x_n\}$, the empirical distribution function (edf) is defined as:

$\displaystyle F_n(x) = \frac{1}{n} \#\{i: x_i \le x \},$ (13.1)

i.e. it is a piecewise constant function with jumps of size $ 1/n$ at points $ x_i$. Very often, especially if the sample is large, the edf is approximated by a continuous, piecewise linear function with the ``jump points'' connected by linear functions, see Figure 13.1.
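The definition (13.1) and its piecewise linear approximation can be illustrated with a minimal Python sketch. The function names `edf` and `edf_linear` and the five-element sample are hypothetical and chosen only for illustration. The sketch assumes that the observations are distinct and that the approximation joins the points $ (x_{(i)}, i/n)$, returning 0 below the smallest observation; other interpolation conventions are possible.

```python
import numpy as np

def edf(sample, x):
    """Empirical distribution function F_n(x) = #{i : x_i <= x} / n."""
    xs = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts the observations that are <= x
    return np.searchsorted(xs, x, side="right") / xs.size

def edf_linear(sample, x):
    """Piecewise linear approximation joining the jump points (x_(i), i/n).

    Assumption: distinct observations, value 0 below the smallest one.
    """
    xs = np.sort(np.asarray(sample, dtype=float))
    heights = np.arange(1, xs.size + 1) / xs.size
    # np.interp interpolates linearly between the jump points;
    # above the largest observation it returns the last height, i.e. 1
    return np.interp(x, xs, heights, left=0.0)

# Hypothetical sample, for illustration only
sample = [1.2, 0.8, 2.5, 1.9, 3.1]

print(edf(sample, 2.0))         # three of the five observations are <= 2.0
print(edf_linear(sample, 2.2))  # halfway between (1.9, 0.6) and (2.5, 0.8)
```

Both functions also accept arrays of evaluation points, which is convenient for plotting the step function and its linear approximation on a common grid.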

Figure 13.1: Left panel: Empirical distribution function (edf) of a 10-element log-normally distributed sample with parameters $ \mu =0.5$ and $ \sigma =0.5$, see Section 13.3.1. Right panel: Approximation of the edf by a continuous, piecewise linear function (black solid line) and the theoretical distribution function (red dotted line).
\includegraphics[width=.7\defpicwidth]{STFloss01a.ps} \includegraphics[width=.7\defpicwidth]{STFloss01b.ps}

The empirical distribution function approach is appropriate only when there is a sufficiently large volume of claim data. This is rarely the case for the tail of the distribution, especially in situations where exceptionally large claims are possible. It is often advisable to divide the range of relevant values of claims into two parts, treating the claim sizes up to some limit on a discrete basis, while the tail is replaced by an analytical cdf.
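One possible way to combine the two parts is sketched below in Python. It is an illustration under stated assumptions, not a prescribed method: the edf is used up to a limit $ u$, and beyond it the cdf is taken as $ F_n(u) + \{1 - F_n(u)\} G(x)$, where $ G$ is an analytical cdf of the claim size given that it exceeds $ u$. The exponential form of $ G$, the choice of the mean excess as its scale, and the names `spliced_cdf` and `exponential_excess_cdf` are all hypothetical.

```python
import numpy as np

def spliced_cdf(sample, x, limit, tail_cdf):
    """edf up to `limit`, analytical tail beyond it (illustrative sketch)."""
    xs = np.sort(np.asarray(sample, dtype=float))
    x = np.asarray(x, dtype=float)
    # Body: the edf, F_n(x) = #{i : x_i <= x} / n
    body = np.searchsorted(xs, x, side="right") / xs.size
    # Probability mass assigned to claims not exceeding the limit
    p_limit = np.searchsorted(xs, limit, side="right") / xs.size
    # Tail: the remaining mass is distributed according to tail_cdf
    tail = p_limit + (1.0 - p_limit) * tail_cdf(x)
    return np.where(x <= limit, body, tail)

def exponential_excess_cdf(limit, mean_excess):
    """Assumed tail model: exponential cdf of the excess over `limit`."""
    def cdf(x):
        return 1.0 - np.exp(-np.maximum(x - limit, 0.0) / mean_excess)
    return cdf

# Hypothetical sample and limit, for illustration only
sample = np.array([0.8, 1.2, 1.9, 2.5, 3.1])
limit = 2.0
# Scale of the assumed exponential tail: mean excess over the limit
mean_excess = np.mean(sample[sample > limit] - limit)
tail_cdf = exponential_excess_cdf(limit, mean_excess)

print(spliced_cdf(sample, [1.0, 2.0, 2.8, 50.0], limit, tail_cdf))
```

By construction the spliced function agrees with the edf at the limit and tends to 1 as $ x$ grows, so it is a proper distribution function. In practice both the limit and the tail family would be chosen from the data at hand.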