12.2 Nonparametric Hull Methods

The production set $ \Psi$ and the production function $ g$ are usually unknown, but a sample of production units or decision making units (DMUs) is available instead:

$\displaystyle {\cal X}= \{(x_i, y_i), i = 1, \ldots, n \}.
$

The aim of productivity analysis is to estimate $ \Psi$ or $ g$ from the data $ {\cal X}$. Here we consider only the deterministic frontier model, i.e. no noise in the observations and hence $ {\cal X}\subset\Psi$ with probability $ 1$. For example, when $ q=1$ the structure of $ {\cal X}$ can be expressed as:

$\displaystyle y_i = g(x_i)-u_i,~i=1,\ldots,n
$

or

$\displaystyle y_i = g(x_i)v_i,~i=1,\ldots,n
$

where $ g$ is the frontier function, and $ u_i\ge 0$ and $ v_i\le 1$ are the random terms for inefficiency of the observed pair $ (x_i, y_i)$ for $ i=1,\ldots,n$.

The most popular nonparametric method is Data Envelopment Analysis (DEA), which assumes that the production set is convex and free disposable. This model extends the idea of Farrell (1957) and was popularized by Charnes, Cooper, and Rhodes (1978). Deprins, Simar, and Tulkens (1984), assuming only free disposability of the production set, proposed a more flexible model, the Free Disposal Hull (FDH) model. Statistical properties of these hull methods have been studied in the literature. Park (2001) and Simar and Wilson (2000) review statistical inference for existing nonparametric frontier models. For nonparametric frontier models in the presence of noise, the so-called nonparametric stochastic frontier models, we refer to Simar (2003), Kumbhakar, Park, Simar, and Tsionas (2004), and the references therein.


12.2.1 Data Envelopment Analysis

The Data Envelopment Analysis (DEA) of the observed sample $ {\cal X}$ is defined as the smallest free disposable and convex set containing $ {\cal X}$:

$\displaystyle \widehat{\Psi}_{\rm DEA} = \{(x, y)\in \mathbb{R}_+^p\times\mathbb{R}_+^q\,\vert\,
x \ge \sum_{i=1}^n\gamma_i x_i, ~ y \le \sum_{i=1}^n\gamma_i y_i
\textrm{ for some }(\gamma_1, \ldots, \gamma_n) \textrm{ such that }
\sum_{i=1}^n\gamma_i = 1, ~\gamma_i \ge 0 ~\forall i=1,\ldots,n\}.
$

The DEA efficiency scores for a given input-output level $ (x_0, y_0)$ are obtained via (12.3):

$\displaystyle \widehat{\theta}^{\rm IN}(x_0, y_0) = \min \{\theta>0\,\vert\,
(\theta x_0, y_0)\in \widehat{\Psi}_{\rm DEA}\},
$

$\displaystyle \widehat{\theta}^{\rm OUT}(x_0, y_0) = \max \{\theta>0\,\vert\,
(x_0, \theta y_0)\in \widehat{\Psi}_{\rm DEA}\}.
$
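
Because membership in $ \widehat{\Psi}_{\rm DEA}$ is described by linear constraints in the weights $ \gamma_1,\ldots,\gamma_n$, each score is the optimal value of a linear program. As a worked step, substituting the definition of $ \widehat{\Psi}_{\rm DEA}$ into the input efficiency score gives:

$\displaystyle \widehat{\theta}^{\rm IN}(x_0, y_0) = \min_{\theta,\,\gamma_1,\ldots,\gamma_n} \Big\{\theta \,\Big\vert\,
\theta x_0 \ge \sum_{i=1}^n\gamma_i x_i, ~ y_0 \le \sum_{i=1}^n\gamma_i y_i, ~
\sum_{i=1}^n\gamma_i = 1, ~\gamma_i \ge 0 ~\forall i=1,\ldots,n\Big\}.
$

The output efficiency score follows in the same way, maximizing $ \theta$ subject to $ x_0 \ge \sum_{i=1}^n\gamma_i x_i$ and $ \theta y_0 \le \sum_{i=1}^n\gamma_i y_i$ with the same constraints on the weights.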

The DEA efficient levels for a given level $ (x_0,y_0)$ are given by (12.1) and (12.2) as:

$\displaystyle \widehat{x^{\partial}}(y_0)
= \widehat{\theta}^{\rm IN}(x_0,y_0)x_0; \quad
\widehat{y^{\partial}}(x_0)
= \widehat{\theta}^{\rm OUT}(x_0,y_0)y_0.
$
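
For $ p=q=1$ the scores and efficient levels above can be computed without a linear programming solver: the underlying linear program has only two constraints besides nonnegativity (the weights sum to one, and one input or output restriction), so an optimum is attained by a single unit or by a mixture of two units. The following Python sketch relies on that observation. It is illustrative only: the function names `dea_output_score` and `dea_input_score` are hypothetical, the pairwise enumeration costs $ O(n^2)$ operations, and for general $ p$ and $ q$ a linear programming solver would be needed instead.

```python
def dea_output_score(x0, y0, units):
    """Largest theta with (x0, theta * y0) in the DEA hull, for p = q = 1.

    units is a list of observed (x_i, y_i) pairs; x0 and y0 must be positive.
    """
    best = None
    for xi, yi in units:
        if xi > x0:
            continue
        # A unit using no more input than x0 is feasible by itself.
        if best is None or yi > best:
            best = yi
        for xj, yj in units:
            if xj > x0:
                # Mix units i and j so that the combined input equals x0.
                w = (xj - x0) / (xj - xi)
                best = max(best, w * yi + (1.0 - w) * yj)
    if best is None:
        raise ValueError("x0 lies below every observed input level")
    return best / y0


def dea_input_score(x0, y0, units):
    """Smallest theta with (theta * x0, y0) in the DEA hull, for p = q = 1."""
    best = None
    for xi, yi in units:
        if yi < y0:
            continue
        # A unit producing at least y0 is feasible by itself.
        if best is None or xi < best:
            best = xi
        for xj, yj in units:
            if yj < y0:
                # Mix units i and j so that the combined output equals y0.
                w = (y0 - yj) / (yi - yj)
                best = min(best, w * xi + (1.0 - w) * xj)
    if best is None:
        raise ValueError("y0 lies above every observed output level")
    return best / x0


# Three hypothetical units; evaluate the inefficient unit (2, 1).
units = [(1.0, 1.0), (3.0, 3.0), (2.0, 1.0)]
theta_in = dea_input_score(2.0, 1.0, units)
theta_out = dea_output_score(2.0, 1.0, units)
print(theta_in * 2.0, theta_out * 1.0)  # efficient input and output levels
```

In this toy sample the unit $ (2,1)$ has input score 0.5 and output score 2: the efficient input level 1 is attained by the unit $ (1,1)$, and the efficient output level 2 lies on the segment joining $ (1,1)$ and $ (3,3)$.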

Figure 12.4 depicts 50 simulated production units and the frontier built by DEA efficient input levels. The simulated model is as follows:

$\displaystyle x_i\sim {\rm Uniform}[0,1],~y_i = g(x_i)e^{-z_i},
~g(x) = 1+\sqrt{x}, ~z_i\sim {\rm Exp(3)},
$

for $ i=1,\ldots, 50$, where $ {\rm Exp}(\nu)$ denotes the exponential distribution with mean $ 1/\nu$. Note that $ \mathop{\textrm{E}}[e^{-z_i}]=\nu/(\nu+1)=0.75$ for $ \nu=3$. A scenario with an exponential distribution for the negative logarithm of the inefficiency term and 0.75 as the average of the inefficiency term is reasonable in the productivity analysis literature (Gijbels, Mammen, Park, and Simar, 1999).
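
The simulated model can be reproduced with a few lines of Python. The sketch below is only an illustration: the function name `simulate_units` and the seed are arbitrary choices, not part of the original study, so the draws will not match those shown in Figure 12.4. It uses `random.expovariate`, whose argument is the rate $ \nu$, so that $ z_i$ has mean $ 1/\nu$ and the ratio $ y_i/g(x_i)=e^{-z_i}$ has expectation $ \nu/(\nu+1)=0.75$ for $ \nu=3$.

```python
import math
import random


def g(x):
    """True frontier function of the simulated model."""
    return 1.0 + math.sqrt(x)


def simulate_units(n=50, rate=3.0, seed=0):
    """Draw n production units (x_i, y_i) with y_i = g(x_i) * exp(-z_i)."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    units = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        z = rng.expovariate(rate)  # rate parameter: the mean of z is 1 / rate
        units.append((x, g(x) * math.exp(-z)))
    return units


if __name__ == "__main__":
    units = simulate_units()
    ratios = [y / g(x) for x, y in units]
    # Sample average of e^{-z_i}; it fluctuates around 0.75 for n = 50.
    print("mean of y_i / g(x_i):", sum(ratios) / len(ratios))
```

Since $ z_i\ge 0$, every simulated unit satisfies $ y_i\le g(x_i)$, which is the deterministic frontier assumption $ {\cal X}\subset\Psi$.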

Figure 12.4: 50 simulated production units (circles), the frontier of the DEA estimate (solid line), and the true frontier function $ g(x) = 1+\sqrt {x}$ (dotted line).

The DEA estimate is always downward biased in the sense that $ \widehat{\Psi}_{\rm DEA}\subset\Psi$. An asymptotic analysis quantifying the discrepancy between the true frontier and the DEA estimate is therefore of interest. The consistency and the convergence rate of DEA efficiency scores with multidimensional inputs and outputs were established analytically by Kneip, Park, and Simar (1998). For $ p=1$ and $ q=1$, Gijbels, Mammen, Park, and Simar (1999) obtained the limit distribution, which depends on the curvature of the frontier and the density at the boundary. Jeong and Park (2004) and Kneip, Simar, and Wilson (2003) extended this result to higher dimensions.


12.2.2 Free Disposal Hull

The Free Disposal Hull (FDH) of the observed sample $ {\cal X}$ is defined as the smallest free disposable set containing $ {\cal X}$:

$\displaystyle \widehat{\Psi}_{\rm FDH} = \{(x, y)\in \mathbb{R}_+^p\times\mathbb{R}_+^q\,\vert\,
x \ge x_i, ~ y \le y_i \textrm{ for some } i \in \{1, \ldots, n\} \}.
$

We can obtain the FDH estimates of efficiency scores for a given input-output level $ (x_0,y_0)$ by replacing $ \widehat{\Psi}_{\rm DEA}$ with $ \widehat{\Psi}_{\rm FDH}$ in the definition of the DEA efficiency scores. Note that, unlike the DEA estimates, their closed forms can be derived by a straightforward calculation:

$\displaystyle \widehat{\theta}^{\rm IN}(x_0, y_0) = \min_{i\,\vert\, y_0\le y_i} ~ \max_{1\le j \le p}
~ {x_i^j}\Big/{x_0^j},
$

$\displaystyle \widehat{\theta}^{\rm OUT}(x_0, y_0) = \max_{i\,\vert\, x_0\ge x_i} ~ \min_{1\le k \le q}
~ {y_i^k}\Big/{y_0^k},
$

where $ v^j$ is the $ j$th component of a vector $ v$. The efficient levels for a given level $ (x_0,y_0)$ are obtained in the same way as those for DEA. Figure 12.5 illustrates the FDH estimate for a simulated example:

$\displaystyle x_i\sim {\rm Uniform}[1,2], y_i = g(x_i)e^{-z_i},
g(x) = 3(x-1.5)^3+0.25x+1.125, z_i\sim {\rm Exp(3)},
$

for $ i=1,\ldots, 50$.  Park, Simar, and Weiner (1999) showed that the limit distribution of the FDH estimator in a multivariate setup is a Weibull distribution depending on the slope of the frontier and the density at the boundary.
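
The closed-form FDH efficiency scores translate directly into code. The following Python sketch is a minimal illustration under stated assumptions: inputs and outputs are given as tuples of length $ p$ and $ q$, all components of $ x_0$ and $ y_0$ are strictly positive, and the function names `fdh_input_score` and `fdh_output_score` are hypothetical. Vector inequalities are read componentwise.

```python
def fdh_input_score(x0, y0, xs, ys):
    """FDH input score: min over units with y_i >= y0 of max_j x_i^j / x_0^j.

    xs and ys are sequences of observed input and output vectors (tuples).
    """
    ratios = [
        max(xi_j / x0_j for xi_j, x0_j in zip(xi, x0))
        for xi, yi in zip(xs, ys)
        if all(yi_k >= y0_k for yi_k, y0_k in zip(yi, y0))
    ]
    if not ratios:
        raise ValueError("no observed unit produces at least y0")
    return min(ratios)


def fdh_output_score(x0, y0, xs, ys):
    """FDH output score: max over units with x_i <= x0 of min_k y_i^k / y_0^k."""
    ratios = [
        min(yi_k / y0_k for yi_k, y0_k in zip(yi, y0))
        for xi, yi in zip(xs, ys)
        if all(xi_j <= x0_j for xi_j, x0_j in zip(xi, x0))
    ]
    if not ratios:
        raise ValueError("no observed unit uses at most x0")
    return max(ratios)


# Three hypothetical units with p = 2 inputs and q = 1 output.
xs = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
ys = [(1.0,), (2.0,), (3.0,)]
print(fdh_input_score((4.0, 4.0), (2.0,), xs, ys))
print(fdh_output_score((4.0, 4.0), (2.0,), xs, ys))
```

Each score requires a single pass over the sample, in contrast to the DEA scores, which require solving a linear program for every evaluation point.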

Figure 12.5: 50 simulated production units (circles), the frontier of the FDH estimate (solid line), and the true frontier function $ g(x) = 3(x-1.5)^3+0.25x+1.125$ (dotted line).