5.3 Functional Data Analysis

In the functional data framework, the objects are usually modelled as realizations of a stochastic process $ X(t), t\ \in J$, where $ J$ is a bounded interval in $ \mathbb{R}$. Thus, the set of functions

$\displaystyle x_i(t),\ i=1,2,\ldots n,\ t\in J,$

represents the data set. We assume the existence of the mean, variance, and covariance functions of the process $ X(t)$ and denote these by $ \textrm{E}X(t)$, $ \textrm{Var}(t)$ and $ \textrm{Cov}(s,t)$ respectively.

For the functional sample we can define the sample counterparts of $ \textrm{E}X(t)$, $ \textrm{Var}(t)$ and $ \textrm{Cov}(s,t)$ in a straightforward way:

\begin{displaymath}\begin{array}{rcl}
\bar{X}(t) &=& \frac{1}{n}\sum\limits_{i=1}^n x_i(t),\\
\widehat{\textrm{Var}}(t) &=& \frac{1}{n-1}\sum\limits_{i=1}^n \left\{x_i(t)-\bar{X}(t)\right\}^2,\\
\widehat{\textrm{Cov}}(s,t) &=& \frac{1}{n-1}\sum\limits_{i=1}^n \left\{x_i(s)-\bar{X}(s)\right\}\left\{x_i(t)-\bar{X}(t) \right\}.\\
\end{array} \end{displaymath}
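As an illustration, these estimators can be computed directly once the curves are evaluated on a common grid, so that the sample forms an $ n\times p$ array. The following Python sketch (the array and function names are only illustrative) implements the formulas above:

\begin{verbatim}
import numpy as np

def sample_moments(X):
    """Pointwise sample mean, variance and covariance of n curves
    evaluated on a common grid of p points; X has shape (n, p)."""
    n = X.shape[0]
    mean = X.mean(axis=0)                       # \bar{X}(t) on the grid
    centred = X - mean
    var = (centred ** 2).sum(axis=0) / (n - 1)  # estimated Var(t)
    cov = centred.T @ centred / (n - 1)         # estimated Cov(s, t), a (p, p) matrix
    return mean, var, cov
\end{verbatim}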

In practice, we observe the function values $ {\cal X} \stackrel{\mathrm{def}}{=}\{x_i(t_{i1}),x_i(t_{i2}),\ldots ,x_i(t_{ip_i});\ i=1,\ldots ,n\}$ only on a discrete grid $ \{t_{i1},t_{i2},\ldots,t_{ip_i}\}\subset J$, where $ p_i$ is the number of grid points for the $ i$th observation. One may estimate the functions $ x_1,\ldots ,x_n$ via standard nonparametric regression methods, see Härdle (1990). Another popular approach is a truncated functional basis expansion. More precisely, let us denote a functional basis on the interval $ J$ by $ \{\Theta_1,\Theta_2,\ldots\}$ and assume that the functions $ x_i$ are approximated by the first $ L$ basis functions $ \Theta_l$, $ l=1,2,\ldots,L$:

$\displaystyle x_i(t) = \sum\limits_{l=1}^L c_{il}\Theta_l(t)=\mathbf{c}_i^{\top}\mathbf{\Theta}(t),$ (5.2)

where $ \mathbf{\Theta} = \left(\Theta_1,\ldots , \Theta_L \right)^{\top}$ and $ \mathbf{c}_i = \left(c_{i1},\ldots ,c_{iL}\right)^{\top}$. The number of basis functions $ L$ determines the tradeoff between data fidelity and smoothness. The analysis of the functional objects is then carried out through the coefficient matrix

$\displaystyle \mathbf{C} = \{c_{il},\ i=1,\ldots,n,\ l=1,\ldots,L\}.$
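Given $ \mathbf{C}$ and the basis evaluated on a grid, the representation (5.2) reduces the evaluation of all curves to a single matrix product. A short Python sketch (the names C and Theta_grid are illustrative):

\begin{verbatim}
import numpy as np

def reconstruct_curves(C, Theta_grid):
    """Evaluate x_i(t) = c_i^T Theta(t) for all curves.

    C          : (n, L) coefficient matrix
    Theta_grid : (L, p) basis functions evaluated at p grid points
    Returns an (n, p) array whose i-th row holds x_i on the grid."""
    return np.asarray(C) @ np.asarray(Theta_grid)
\end{verbatim}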

The mean, variance, and covariance functions are calculated by:
\begin{displaymath}\begin{array}{rcl}
\bar{X}(t) &=& \bar{\mathbf{c}}^{\top}\mathbf{\Theta}(t),\\
\widehat{\textrm{Var}}(t) &=& \mathbf{\Theta}(t)^{\top}\textrm{Cov}(\mathbf{C})\mathbf{\Theta}(t),\\
\widehat{\textrm{Cov}}(s,t) &=& \mathbf{\Theta}(s)^{\top}\textrm{Cov}(\mathbf{C})\mathbf{\Theta}(t),
\end{array} \end{displaymath}

where $ \mathbf{\bar{c}}_l\stackrel{ \mathrm{def}}{=}\frac{1}{n} \sum\limits_{i=1}^n c_{il},\ l=1,\ldots ,L$ and $ \textrm{Cov}(\mathbf{C})\stackrel{ \mathrm{def}}{=}\frac{1}{n-1} \sum\limits_{i=1}^n (\mathbf{c}_i-\bar{\mathbf{c}})
(\mathbf{c}_i-\bar{\mathbf{c}})^{\top}$.
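In coefficient form these estimators require only $ \bar{\mathbf{c}}$ and $ \textrm{Cov}(\mathbf{C})$. A Python sketch under the same conventions as above, where Theta_s and Theta_t denote the vector $ \mathbf{\Theta}(\cdot)$ evaluated at single points (all names illustrative):

\begin{verbatim}
import numpy as np

def coefficient_moments(C):
    """Mean vector and (L, L) covariance matrix of the (n, L) coefficient matrix C."""
    C = np.asarray(C)
    c_bar = C.mean(axis=0)
    centred = C - c_bar
    cov_C = centred.T @ centred / (C.shape[0] - 1)
    return c_bar, cov_C

def mean_var_cov(c_bar, cov_C, Theta_s, Theta_t):
    """Estimated mean, variance and covariance functions evaluated at s and t."""
    x_bar_t = c_bar @ Theta_t            # mean function at t
    var_t   = Theta_t @ cov_C @ Theta_t  # variance function at t
    cov_st  = Theta_s @ cov_C @ Theta_t  # covariance function at (s, t)
    return x_bar_t, var_t, cov_st
\end{verbatim}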
The scalar product in the functional space is defined by:

$\displaystyle \langle x_i,x_j \rangle \ \stackrel{\mathrm{def}}{=}\int\limits_J x_i(t) x_j(t) dt=\mathbf{c}_i^{\top} \mathbf{W}\mathbf{c}_j,$

where

$\displaystyle \mathbf{W}\stackrel{ \mathrm{def}}{=} \int\limits_J\mathbf{\Theta}(t)\mathbf{\Theta}(t)^{\top}dt.$ (5.3)
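For a non-orthonormal basis, $ \mathbf{W}$ can be approximated by numerical quadrature of the basis products. A Python sketch using trapezoidal weights on a fine grid (names illustrative):

\begin{verbatim}
import numpy as np

def gram_matrix(Theta_grid, t_grid):
    """Approximate W = int_J Theta(t) Theta(t)^T dt by the trapezoidal rule.

    Theta_grid : (L, p) basis functions evaluated on t_grid
    t_grid     : (p,) increasing quadrature grid covering J"""
    dt = np.diff(t_grid)
    w = np.zeros_like(t_grid, dtype=float)   # trapezoidal quadrature weights
    w[:-1] += dt / 2.0
    w[1:]  += dt / 2.0
    return (Theta_grid * w) @ Theta_grid.T   # (L, L) scalar-product matrix

def inner_product(c_i, c_j, W):
    """<x_i, x_j> = c_i^T W c_j."""
    return c_i @ W @ c_j
\end{verbatim}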

In practice, the coefficient matrix $ \mathbf{C}$ needs to be estimated from the data set $ {\cal X}$.
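A standard way to do this is to regress, for each observation separately, the observed function values on the basis functions evaluated at that observation's grid points. A least-squares Python sketch (obs and basis are illustrative names):

\begin{verbatim}
import numpy as np

def estimate_coefficients(obs, basis):
    """Least-squares estimate of the (n, L) coefficient matrix C.

    obs   : list of (t_i, y_i) pairs; t_i is the grid of the i-th curve,
            y_i the observed values x_i(t_i1), ..., x_i(t_ip_i)
    basis : callable returning the length-L vector Theta(t) at a scalar t"""
    rows = []
    for t_i, y_i in obs:
        B = np.array([basis(t) for t in t_i])   # (p_i, L) design matrix
        c_i, *_ = np.linalg.lstsq(B, np.asarray(y_i), rcond=None)
        rows.append(c_i)
    return np.vstack(rows)
\end{verbatim}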

An example of a functional basis is the Fourier basis, defined on $ J$ by:

\begin{displaymath}\Theta_l(t)=\left\{
\begin{array}{rl}
1, &\ l=0,\\
\sin(r\omega t), &\ l=2r-1,\\
\cos(r\omega t), &\ l=2r,\\
\end{array}\right. \end{displaymath}

where the frequency $ \omega$ is determined by the length of the interval through $ \vert J\vert=2\pi/\omega$, so that the period of the basis equals $ \vert J\vert$. The Fourier basis defined above can easily be transformed into an orthonormal basis, hence the scalar-product matrix in (5.3) is simply the identity matrix.
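A Python sketch of this basis on a generic interval $ [a,b]$ with $ \omega=2\pi/(b-a)$ (the function name is illustrative; the rescaling that makes the basis orthonormal, dividing the constant by $ \sqrt{\vert J\vert}$ and the sine and cosine terms by $ \sqrt{\vert J\vert/2}$, is omitted):

\begin{verbatim}
import numpy as np

def fourier_basis(t, L, a, b):
    """First L Fourier basis functions on [a, b], evaluated at t (scalar or array).

    Ordering as in the text: 1, sin(omega t), cos(omega t), sin(2 omega t), ..."""
    omega = 2.0 * np.pi / (b - a)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    vals = [np.ones_like(t)]                    # l = 0
    r = 1
    while len(vals) < L:
        vals.append(np.sin(r * omega * t))      # l = 2r - 1
        if len(vals) < L:
            vals.append(np.cos(r * omega * t))  # l = 2r
        r += 1
    return np.vstack(vals)                      # (L, len(t)) array
\end{verbatim}

With $ L=9$ this yields the constant plus four sine--cosine pairs, the choice used for the IV strings below.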

Our aim is to estimate the IV functions for the fixed maturities $ \tau^*=$ 1 month (1M) and 2 months (2M) from the day-specific grid of maturities. We estimate the Fourier coefficients on the moneyness range $ \kappa \in [0.9,1.1]$ for the maturities observed on a particular day $ i$. For $ \tau^*=$ 1M, 2M we calculate $ \hat{\sigma}_i(\kappa,\tau^*)$ by linear interpolation between the two closest observable IV strings: $ \hat{\sigma}_i(\kappa,\tau^*_{i-})$ with maturity $ \tau^*_{i-} \leq \tau^*$ and $ \hat{\sigma}_i(\kappa,\tau^*_{i+})$ with maturity $ \tau^*_{i+} \geq \tau^*$:

$\displaystyle \hat{\sigma}_i(\kappa,\tau^*)=\hat{\sigma}_i(\kappa,\tau^*_{i-})\left(\frac{\tau^*_{i+}-\tau^*}{\tau^*_{i+}-\tau^*_{i-}}\right)+\hat{\sigma}_i(\kappa,\tau^*_{i+})\left(\frac{\tau^*-\tau^*_{i-}}{\tau^*_{i+}-\tau^*_{i-}}\right),$

for those days $ i$ for which both $ \tau^*_{i-}$ and $ \tau^*_{i+}$ exist. In Figure 5.2 we show the situation for $ \tau^*=$ 1M on May 30, 2001. The blue points and the blue finely dashed curve correspond to the transactions with $ \tau^*_{-}=$ 16 days, and the green points and the green dashed curve to the transactions with $ \tau^*_{+}=$ 51 days. The solid black line is the linear interpolation at $ \tau^*=$ 30 days.
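A Python sketch of this interpolation step, assuming the two neighbouring IV strings have already been evaluated on a common moneyness grid (names illustrative):

\begin{verbatim}
import numpy as np

def interpolate_iv(sigma_minus, sigma_plus, tau_minus, tau_plus, tau_star):
    """Linear interpolation in maturity between two IV strings on the same
    moneyness grid: sigma_minus observed at tau_minus, sigma_plus at tau_plus,
    with tau_minus <= tau_star <= tau_plus."""
    w = (tau_star - tau_minus) / (tau_plus - tau_minus)
    return (1.0 - w) * np.asarray(sigma_minus) + w * np.asarray(sigma_plus)

# e.g. for the 1M string on May 30, 2001: tau_minus = 16, tau_plus = 51, tau_star = 30 (days)
\end{verbatim}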

Figure 5.2: Linear interpolation of IV strings on May 30, 2001 with $ L=9$.

The choice of $ L=9$ delivers a good tradeoff between flexibility and smoothness of the strings. At this stage we exclude from our analysis those days where this procedure cannot be performed due to the complete absence of the needed maturities, as well as strings for which the estimated coefficients perform poorly, due to the small number of contracts in a particular string or the presence of strong outliers. Using this procedure we obtain 77 ``functional'' observations $ x^{1M}_{i_1}(\kappa) \stackrel{\mathrm{def}}{=}\hat{\sigma}_{i_1}(\kappa,1M),\ i_1=1,\ldots , 77$, for the 1M maturity and 66 observations $ x^{2M}_{i_2}(\kappa) \stackrel{\mathrm{def}}{=}\hat{\sigma}_{i_2}(\kappa,2M),\ i_2=1,\ldots , 66$, for the 2M maturity, as displayed in Figure 5.3.

Figure 5.3: Functional observations estimated using the Fourier basis with $ L=9$: $ \hat{\sigma}_{i_1}(\kappa,1M),\ i_1=1,\ldots , 77$, in the left panel, and $ \hat{\sigma}_{i_2}(\kappa,2M),\ i_2=1,\ldots , 66$, in the right panel.