Direction Selection in Stochastic Directional Distance Functions

Kevin Layer et al. · April 2, 2019

Researchers rely on the distance function to model multiple product production using multiple inputs. A stochastic directional distance function (SDDF) allows for noise in potentially all input and output variables. Yet, when estimated, the direction selected will affect the functional estimates because deviations from the estimated function are minimized in the specified direction. The set of identified parameters of a parametric SDDF can be narrowed via data-driven approaches to restrict the directions considered. We demonstrate a similar narrowing of the identified parameter set for a shape constrained nonparametric method, where the shape constraints impose standard features of a cost function such as monotonicity and convexity. Our Monte Carlo simulation studies reveal significant improvements, as measured by out of sample radial mean squared error, in functional estimates when we use a directional distance function with an appropriately selected direction. From our Monte Carlo simulations we conclude that selecting a direction that is approximately orthogonal to the estimated function in the central region of the data gives significantly better estimates relative to the directions commonly used in the literature. For practitioners, our results imply that selecting a direction vector that has non-zero components for all variables that may have measurement error provides a significant improvement in the estimator's performance. We illustrate these results using cost and production data from samples of approximately 500 US hospitals per year operating in 2007, 2008, and 2009, respectively, and find that the shape constrained nonparametric methods provide a significant increase in flexibility over second order local approximation parametric methods.






1 Introduction

The focus of this paper is direction selection in stochastic directional distance functions (SDDF). [Footnote 1: Here we use the term stochastic in reference to a model with a noise term.] While the DDF is typically used to measure efficiency, in this paper we use a nonparametric shape constrained SDDF to model the conditional mean behavior of production. The stochastic distance function (SDF) was introduced by Lovell et al. (1994) and was used in a series of early empirical studies by Coelli and Perelman (1999, 2000) and Sickles et al. (2002). The parameters of a parametric distance function are point identified; however, if the direction in the DDF is not specified, then the parameters of a parametric DDF are only set identified. [Footnote 2: Let φ be what is known (e.g., via assumptions and restrictions) about the data generating process (DGP). Let θ represent the parameters to be identified, let Θ denote all possible values of θ, and let θ₀ be the true but unknown value of θ. Then the vector of unknown parameters θ₀ is point identified if it is uniquely determined from φ. However, θ₀ is set identified if some of the possible values of θ are observationally equivalent to θ₀ (Lewbel (forthcoming)).] A set of axiomatic properties related to production and cost functions, such as monotonicity and convexity in the case of a cost function, is well established in the production literature (Shephard (1970), Chambers (1988)). Although the stochastic distance function literature acknowledges the axiomatic properties necessary for duality, it does not impose them globally. Instead, authors typically impose them only at a particular point in the data (e.g., Atkinson et al. (2003)). Recognizing these issues, we provide an axiomatic nonparametric estimator of the SDDF and a method to restrict the pool of directions to choose from for the SDDF, thereby reducing the size of the identified parameter set.
Most empirical studies that use establishment or hospital level data to estimate production or cost functions either assume a specific parametric form or ignore noise, or both (Hollingsworth, 2003). In contrast, we use an axiomatic nonparametric SDDF estimator and the proposed method to determine a set of acceptable directions to estimate a cost function that maintains global axiomatic properties for the US hospital industry. Furthermore, we demonstrate the importance of global axiomatic properties for the estimation of most productive scale size and marginal costs.

A few papers have attempted to implement the directional distance function in a stochastic setting (see, for example, Färe et al. (2005), Färe et al. (2010), and Färe and Vardanyan (2016)). The latter two papers discuss the challenges of selecting a parametric functional form that does not violate the axioms typically assumed in production economics. Based on their observations, Färe and Vardanyan (2016) use a quadratic functional specification. [Footnote 3: As Kuosmanen and Johnson (2017) note, the translog function used for multi-output production cannot satisfy the standard assumptions for the production technology globally for any parameter values. The quadratic functional form does not have this shortcoming.] Yet several papers show a loss of flexibility in parametric functional forms, such as the translog or the quadratic functional form, when shape constraints are imposed (e.g., Diewert and Wales (1987)). Also important to implementation, the selection of the direction vector in the SDDF has been discussed in Färe et al. (2017) and Atkinson and Tsionas (2016), among others. These papers focus on selecting the direction corresponding to a particular interpretation of the inefficiency measure, based on the distance to the economically efficient point. In contrast, we consider Kuosmanen and Johnson (2017)'s multi-step efficiency analysis and focus on the first step, estimating a conditional mean function. Our goal is to select the direction that best recovers the underlying technology while acknowledging that the data are likely to contain noise in potentially all variables. [Footnote 4: For researchers interested in productivity measurement and productivity variation (e.g., Syverson (2011)), the results from this paper can be used directly. For authors interested in efficiency analysis, the insights from this paper could be used to improve the estimates from the first stage of Kuosmanen and Johnson (2017)'s three-step procedure where efficiency is estimated in the third step.]

To model multi-product production, Kuosmanen and Johnson (2017) propose axiomatic nonparametric methods to estimate the SDDF, which they name Directional Convex Nonparametric Least Squares (CNLS-d), a type of sieve estimator. Their method retains the benefits of relaxing standard functional form assumptions for production, cost, or distance functions, while also improving interpretability and finite sample efficiency relative to nonparametric methods such as kernel regression (Yagi et al. (2018)). A variety of models can be interpreted as special cases of Kuosmanen and Johnson (2017); among these are models that specify the direction (e.g., Johnson and Kuosmanen (2011), Kuosmanen and Kortelainen (2012)). All CNLS models are sieve estimators and fall into the category of partially identified or set identified estimators discussed in Manski (2003) and Tamer (2010). The guidance our paper provides in selecting a direction will reduce the size of the identified set for CNLS-d and other DDF estimators with flexible direction specifications.

Much of the production function literature concerns endogeneity issues; see, for example, Olley and Pakes (1996), Levinsohn and Petrin (2003), and Ackerberg et al. (2015). These methods are often referred to as proxy variable approaches. The argument for endogeneity is typically that decisions regarding variable inputs such as labor are made with some knowledge of the factors included in the unobserved residuals. Recently, these methods have been reinterpreted as instrumental variable approaches (Wooldridge (2009)) or control function approaches (Ackerberg et al. (2015)). Unfortunately, the assumptions on the particular timing of input decisions are not innocuous. Indeed, every firm must adjust its inputs in exactly the same way; otherwise the moment restrictions needed for point identification are violated. For an alternative in the stochastic frontier setting, see Kutlu (2018).

Kuosmanen and Johnson (2017) have shown that a production function estimated using a stochastic distance function under a constant returns-to-scale assumption is robust to endogeneity issues because the normalization by one of the inputs or outputs causes the errors-in-variables to cancel each other. In this paper we consider the more general case of a convex technology that does not necessarily satisfy constant returns-to-scale, and show that when errors across variables are highly correlated, a specific type of endogeneity, the SDDF improves estimation performance significantly over the typical alternative of ignoring the endogeneity.

When considering alternative directions in the DDF, we show that the direction that performs best is often related to the particular performance measure used. To address this issue, we use an out-of-sample mean squared error (MSE) that is measured radially. This measure is motivated by the results of our Monte Carlo simulations and is natural for a function that satisfies monotonicity and convexity, assuring that the true function and the estimated function are close in the areas where most data are observed.

We analyze US hospital data and characterize the most productive scale size and marginal costs for the US hospital sector. We demonstrate that out-of-sample MSE is reduced significantly by relaxing parametric functional form restrictions. We also observe the advantage of imposing axioms that allow the estimated function to still be interpretable. Concerning the direction selection, we find, for this data set, that the exact direction selected is not very critical in terms of MSE performance, but some commonly used directions should be avoided.

The remainder of this paper is organized as follows. Section 2 introduces the statistical model and the production model. Section 3 describes the estimators used for the analysis. Section 4 outlines our reasons for the MSE measure we propose. Section 5 highlights the importance of the direction selection through Monte Carlo experiments. Section 6 describes our direction selection method. Section 7 demonstrates the benefits of using non-parametric shape-constrained estimators with an appropriately selected direction for US hospital data. Section 8 concludes.

2 Models

2.1 Statistical Model

We consider a statistical model that allows for measurement error in potentially all of the input and output variables. Let x_i, i = 1, …, n, be a vector of random input variables of length N and y_i, i = 1, …, n, be a vector of random output variables of length M, where i indexes observations. Let v_i, i = 1, …, n, be a vector of random error variables of length N and u_i, i = 1, …, n, be a vector of random error variables of length M. One way of modeling the errors-in-variables (EIV) is:


Equation (1) is only identified when multiple measurements exist for the same vector of regressors or when a subsample of observations exists in which the regressors are measured exactly (Carroll et al. (2006)). Carroll et al. (2006) discuss a standard regression setting, not a multi-input/multi-output production process. Thus, repeated measurement requires all but one of the netputs to be identical across at least two observations. [Footnote 5: Here we use the term netputs to describe the union of the input and output vectors.] Neither of these conditions is likely to hold for typical production data sets; therefore, we develop an alternative approach to identification.

As our starting point, we use the alternative, but equivalent, representation of the EIV model proposed by Kuosmanen and Johnson (2017):


Clearly, the representations of Carroll et al. (2006) and Kuosmanen and Johnson (2017) are equivalent if:


We define the following normalization:


which implies:


We refer to this vector as the true noise direction, and in the most general case we allow the direction to be observation specific. [Footnote 6: When the noise direction is observation specific and random, all inputs and outputs potentially contain noise and therefore are endogenous variables. If some components of the vector are zero, this implies the associated variables are exogenous and measured with certainty. See Kuosmanen and Johnson (2017) for more details.] The estimation methods that allow for noise in potentially all inputs depend on our assumptions about the production technology, which are discussed in the following subsection.

2.2 Production Model

Researchers use production function models, cost function models, or distance function models to characterize production technologies. Considering a general production process with multiple inputs used to produce multiple outputs, we define the production possibility set as:


Following Shephard (1970), we adopt the following standard assumptions to assure that T represents a production technology:

  1. T is closed;

  2. T is convex;

  3. Free disposability of inputs and outputs; i.e., if (x, y) ∈ T, x′ ≥ x, and y′ ≤ y, then (x′, y′) ∈ T.

For an alternative representation, see, for example, Frisch (1964).

Developing methods to estimate characteristics of the production technology while imposing these standard axioms was a popular and fruitful topic from the early 1950s until the early 1980s, generating such classic papers as Koopmans (1951), Shephard (1953, 1970), Afriat (1972), Charnes et al. (1978), [Footnote 7: Data Envelopment Analysis is perhaps one of the largest success stories and has become an extremely popular method in the OR toolbox for studying efficiency.] and Varian (1984). Unfortunately, these methods are deterministic in the sense that they rely on the strong assumption that the data do not contain any measurement errors, omitted variables, or other sources of random noise. Furthermore, for some research communities linear programs were seen as harder to implement than parametric regressions, which could be calculated via the normal equations. Thus, most econometricians and applied economists have chosen to use parametric models, sacrificing flexibility for ease of estimation and the inclusion of noise in the model.

Here we focus our attention on the distance function because it allows for the joint production of multiple outputs using multiple inputs. The production function and the cost function can be seen as special cases of the distance function in which there is either a single output or a single input (cost), respectively. Further, motivated by our discussion of EIV models above, we consider a directional distance function, which allows for measurement error in potentially all variables. We relax both the parametric and deterministic assumptions common in earlier approaches to modeling multi-output/multi-input technologies by building on an emerging literature that revisits the axiomatic nonparametric approach while incorporating standard statistical structures, including noise (Kuosmanen (2008); Kuosmanen and Johnson (2010)).

2.2.1 The Deterministic Directional Distance Function (DDF)

Luenberger (1992) and Chambers et al. (1996, 1998) introduced the directional distance function, defined for a technology T as:


where x and y are the observed input and output vectors, assumed to be observed without noise and to fully describe the resources used in production and the goods or services generated from production. Here g^x is the direction vector in the input space, g^y is the direction vector in the output space, and g = (g^x, g^y) defines the direction from the point (x, y) in which the distance function is measured. [Footnote 8: We assume g ≠ 0; i.e., at least one of the components of either g^x or g^y is non-zero.] The value of the distance function is commonly interpreted as a measure of inefficiency, quantifying the number of bundles of size g needed to move the observed point to the boundary of the technology in a deterministic setting.

Chambers et al. (1998) explained how the directional distance function characterizes the technology T for a given direction vector; specifically:


If T satisfies the assumptions stated in Section 2.2, then the directional distance function has the following properties (see Chambers et al. (1998)):

  (a) The directional distance function is upper semicontinuous in inputs and outputs (jointly);

  (b) ;

  (c) ;

  (d) ;

  (e) If T is convex, then the directional distance function is concave in inputs and outputs.

An additional property of the DDF is translation invariance:

  (f) .
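The translation property (f) can be checked numerically for a linear DDF: after normalizing the coefficients so that the inner product with the direction equals one, moving an observation by β bundles of the direction lowers the DDF value by exactly β. The coefficients, direction, and evaluation points below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Hypothetical linear DDF D(x, y) = a + b.x - c.y with the normalization
# b.gx + c.gy = 1, which delivers the translation property (f).
rng = np.random.default_rng(0)
b = rng.uniform(0.5, 2.0, size=3)          # illustrative input coefficients
c = rng.uniform(0.5, 2.0, size=2)          # illustrative output coefficients
gx = np.array([1.0, 0.0, 0.0])             # direction in input space
gy = np.array([0.0, 1.0])                  # direction in output space
scale = b @ gx + c @ gy
b, c = b / scale, c / scale                # now b.gx + c.gy = 1
a = 0.3

def ddf(x, y):
    return a + b @ x - c @ y

x = rng.uniform(1.0, 5.0, size=3)
y = rng.uniform(1.0, 5.0, size=2)
for beta in (0.0, 0.7, 2.5):
    # moving beta bundles of (gx, gy) lowers the DDF value by exactly beta
    assert np.isclose(ddf(x - beta * gx, y + beta * gy), ddf(x, y) - beta)
```

The check holds for any coefficients once the normalization is imposed, which is why the estimators below carry it as a constraint.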

Several theoretical contributions have been made to extend the deterministic DDF, see for example Färe and Grosskopf (2010), Aparicio et al. (2017), Kapelko and Oude Lansink (2017), and Roshdi et al. (2018). The deterministic DDF has been used in several recent applications, including Baležentis and De Witte (2015), Adler and Volta (2016), and Fukuyama and Matousek (2018).

2.2.2 The Stochastic Directional Distance Function

The properties of the deterministic DDF also apply to the stochastic DDF (Färe et al. (2017)). Here we focus on estimating a stochastic DDF with a residual that is mean zero. [Footnote 9: Two models are possible: 1) a mean zero residual, indicating that the residual contains only noise, used to pursue a productivity analysis; or 2) a composed residual with both inefficiency and noise. Our direction selection analysis is used in the first step of Kuosmanen and Johnson's three step procedure in which a conditional mean is estimated.] This is represented in Figure 1.

Figure 1: SDDF in mean zero case

Using the statistical model in Section 2.1 and the functional representation of technology in Section 2.2, we restate Proposition 2 in Kuosmanen and Johnson (2017) as:

Proposition.

If the observed data are generated according to the statistical model described in Section 2.1, then the value of the DDF at the observed data point is equal to the realization of a random variable with mean zero.
In the stochastic distance function literature, the translation property, (f) above, is commonly invoked to move an arbitrarily chosen netput variable out of the distance function to the left-hand side of the equation, yielding an equation that looks like a standard regression model; see, for example, Lovell et al. (1994) and Kuosmanen and Johnson (2017). Instead, we write the SDDF with all of the outputs on one side to emphasize that all netputs are treated symmetrically.

Under the assumption of constant returns to scale, normalizing by one of the netputs causes the noise terms to cancel for the regressors, thus eliminating the issue of endogeneity (e.g., Coelli (2000), Kuosmanen and Johnson (2017)). However, since we relax the constant returns to scale assumption, endogeneity can still be an issue. [Footnote 10: If the endogeneity is caused by correlations in the errors across variables, it can be addressed by selecting an appropriate direction for the directional distance function. This is the direction we explore in the Monte Carlo simulation below in Section 4.1.]

Färe et al. (2017), among others, have recognized that the selection of the direction vector affects the parameter estimates of the production function. In A.1, for the linear parametric DDF defined below, we prove that alternative directions lead to distinct parameter estimates.

3 Estimation

We now describe the estimation of the DDF under a specific parametric functional form and under nonparametric shape constrained methods.

3.1 Parametric Estimation and the DDF

Consider data composed of n observations, where the inputs are given by x_i and the outputs by y_i. The estimator minimizes the squared residuals for a DDF with an arbitrary prespecified direction. For a linear production function, we formulate the estimator as:


where α is the intercept, β and γ are the vectors of the marginal effects of the inputs and outputs, respectively, and the ε_i are the residuals.

Equation (9b) enforces the translation property described in Chambers et al. (1998); i.e., scaling the netput vector in the specified direction causes the distance function to decrease by the same amount. The combination of Equation (9a) and Equation (9b) ensures that the residual is computed along the specified direction. Intuitively, this is because the inputs and outputs are rescaled proportionally to the direction in Equation (9b). For a formal proof, see Kuosmanen and Johnson (2017), Proposition 2.
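A minimal sketch of this constrained least squares problem, specialized to the single-output cost case of the illustrative example in Section 4: with direction g = (g_y, g_c) and the normalization β·g_y + γ·g_c = 1, eliminating γ reduces the problem to ordinary least squares. The data generating line, noise scale, and variable names are illustrative assumptions:

```python
import numpy as np

# Sketch of the linear SDDF estimator (9) in the single-output cost case:
# minimize sum eps_i^2 with eps_i = alpha + beta*y_i - gamma*c_i, subject
# to the normalization beta*g_y + gamma*g_c = 1 (translation property).
rng = np.random.default_rng(1)
n = 200
y = rng.uniform(1.0, 10.0, n)
c = 2.0 + 1.5 * y + rng.normal(0.0, 0.5, n)   # illustrative cost line

def fit_linear_ddf(y, c, g):
    g_y, g_c = g                   # direction; g_c != 0 assumed here
    # Eliminate gamma = (1 - beta*g_y)/g_c, leaving unconstrained OLS:
    #   eps_i = alpha + beta*(y_i + (g_y/g_c)*c_i) - c_i/g_c
    z = y + (g_y / g_c) * c
    w = c / g_c
    A = np.column_stack([np.ones_like(z), z])
    (alpha, beta), *_ = np.linalg.lstsq(A, w, rcond=None)
    gamma = (1.0 - beta * g_y) / g_c
    return alpha, beta, gamma

# direction (0, 1): residuals measured vertically, i.e., ordinary OLS
alpha, beta, gamma = fit_linear_ddf(y, c, (0.0, 1.0))
```

Different directions g generally yield different fitted lines, which is exactly the sensitivity studied in the rest of the paper.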

3.2 The CNLS-d Estimator

Convex Nonparametric Least Squares (CNLS) is a nonparametric estimator that imposes axiomatic properties, such as monotonicity and concavity, on the production technology. The estimator CNLS-d is the directional distance function generalization of CNLS (Hildreth (1954), Kuosmanen (2008)). While CNLS allows only a single output, CNLS-d permits multiple outputs. In CNLS the direction along which residuals are computed is specified a priori and is typically measured in terms of the single output. This corresponds to the assumption that noise is present only in the output and that all other variables do not contain noise. CNLS-d allows the residual to be measured in an arbitrary prespecified direction. If all components of the direction vector are non-zero, this corresponds to an assumption that noise is present in all variables.

Using the same input-output data defined in Section 2.1, the CNLS-d estimator is given by:


where α is the vector of the intercept terms, β and γ are the matrices of the marginal effects of the inputs and the outputs, respectively, and ε is the vector of the residuals (Kuosmanen and Johnson, 2017).

Equation (10a) is similar to (9a), with the notable difference that the coefficients are indexed by i, indicating that each observation has its own hyperplane defined by the triplet (α_i, β_i, γ_i). Equation (10b), which corresponds to the Afriat inequalities, imposes concavity. Given Equation (10b), Equation (10c) imposes the monotonicity of the estimated frontier relative to the inputs. Equation (10d) enforces the translation property described in Chambers et al. (1998) and has the same interpretation as Equation (9b). Similar to Equation (10c), the combination of Equation (10b) and Equation (10e) imposes the monotonicity of the DDF relative to the outputs. In Equation (10), we specify the CNLS-d estimator with a single common direction. [Footnote 11: Alternatively, some researchers may be interested in using observation specific directions or perhaps group specific directions (Daraio and Simar (2016)). In A.3, we derive the conditions under which multiple directions can be used in CNLS-d while still maintaining the axiomatic property of global convexity of the production technology. Consider two groups, each with its own direction used in the directional distance function. Essentially, the convexity constraint holds as long as the noise is orthogonal to the difference of the two directions used in the estimation. A simple example of this situation is all the noise being in one dimension while the difference between the two directions in this dimension is zero. However, this condition is restrictive when noise is potentially present in all variables. Thus, specifying multiple directions in CNLS-d while maintaining the axiomatic properties of the estimator, specifically the convexity of the production possibility set, is still an open research question.]
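A small-scale sketch of the CNLS-d problem for one input and one output with a single common direction, solved here with a general-purpose solver; the data, direction, and names are illustrative assumptions, and a dedicated QP solver (as in the paper's MATLAB implementation) would be used in practice:

```python
import numpy as np
from scipy.optimize import minimize

# CNLS-d sketch: one input x, one output y, common direction (g_x, g_y).
# Each observation i has its own hyperplane (alpha_i, beta_i, gamma_i);
# residual eps_i = alpha_i + beta_i*x_i - gamma_i*y_i. Data illustrative.
rng = np.random.default_rng(2)
n = 8
x = np.sort(rng.uniform(1.0, 9.0, n))
y = np.sqrt(x) + rng.normal(0.0, 0.1, n)      # concave "true" technology
g_x, g_y = 1.0, 1.0

def unpack(theta):
    return theta[:n], theta[n:2 * n], theta[2 * n:]

def eps(theta):
    a, b, c = unpack(theta)
    return a + b * x - c * y

cons = [{"type": "eq",                        # translation normalization
         "fun": lambda th: unpack(th)[1] * g_x + unpack(th)[2] * g_y - 1.0}]
for i in range(n):                            # Afriat inequalities (concavity)
    for h in range(n):
        if i != h:
            cons.append({"type": "ineq",
                         "fun": lambda th, i=i, h=h:
                             (unpack(th)[0][h] + unpack(th)[1][h] * x[i]
                              - unpack(th)[2][h] * y[i]) - eps(th)[i]})
bounds = [(None, None)] * n + [(0.0, None)] * (2 * n)   # monotonicity
theta0 = np.concatenate([np.zeros(n), np.full(n, 0.5), np.full(n, 0.5)])
res = minimize(lambda th: np.sum(eps(th) ** 2), theta0,
               method="SLSQP", bounds=bounds, constraints=cons,
               options={"maxiter": 500})
```

The translation property implies that projecting each observation along the direction by its residual, (x_i - ε_i·g_x, y_i + ε_i·g_y), places it on the estimated frontier.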

4 Measuring MSE under Alternative Directions

4.1 Illustrative Example

Data Generation Process

For our illustrative example, we use a simple linear cost function and a directional distance linear parametric estimator. We consider two noise generation processes: a random noise direction and a fixed noise direction. Here we discuss the random noise direction case, but direct the reader to B for a discussion of the fixed noise direction case.

For our example, we consider a single output cost function where the observations are created by the Data Generation Process (DGP) outlined in Algorithm 1:

Algorithm 1

  1. Output is drawn from a continuous uniform distribution.
  2. Cost is calculated as a linear function of the output.

  3. The noise terms are constructed as follows:

    1. A scaling quantity is calculated from the means of the output and the cost without noise.

    2. The scalar length of the noise is rescaled by a vector in each dimension; these scaling factors are calculated from draws from a continuous uniform distribution.

    3. The noise term is the product of a scalar length, drawn from a normal distribution with a prespecified initial value for the standard deviation, and a normalized direction vector.

  4. The observations with noise are obtained by appending the noise terms to the generated data.

Figure 2: Algorithm 1: Linear function data generation process with random noise directions
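The steps of Algorithm 1 can be sketched as follows. Because several constants are not recoverable from the text, the cost line, uniform support, and noise scale below are illustrative assumptions:

```python
import numpy as np

# Hedged sketch of Algorithm 1; the cost line, uniform support, and noise
# scale are illustrative stand-ins for values not stated here.
rng = np.random.default_rng(3)
n, sigma0 = 100, 0.2

y = rng.uniform(1.0, 10.0, n)              # step 1: draw output
cost = 2.0 + 1.5 * y                       # step 2: noiseless linear cost

theta = rng.uniform(0.0, 2.0 * np.pi, n)   # step 3: random noise direction
direction = np.column_stack([np.cos(theta), np.sin(theta)])
scale = np.array([y.mean(), cost.mean()])  # 3a-3b: rescale per dimension
length = rng.normal(0.0, sigma0, n)        # 3c: scalar noise length
noise = length[:, None] * direction * scale[None, :]

y_obs, cost_obs = y + noise[:, 0], cost + noise[:, 1]   # step 4
```

Scaling each dimension by its sample mean keeps the noise magnitude comparable across output and cost, which is the role the per-dimension rescaling plays in step 3.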

Figure 3 illustrates the results for two cases of the data generating process; in the first case the direction of the noise is random, while in the second case the direction of the noise is fixed.

Figure 3: Linear Case with Random Noise Direction (left), Linear Case with Fixed Noise Direction (right)
Evaluating the Parametric Estimator’s Performance

We use two criteria to assess the performance of the parametric estimator: 1) Mean Squared Error (MSE) comparing the true function to the estimated function, and 2) MSE comparing the estimated function to a testing data set. While we can calculate both metrics for our Monte Carlo simulations, only the second metric can be used with our application data below.

To calculate deviations, we use a prespecified MSE direction. For any particular point of the testing set, we determine the corresponding estimate, defined as the intersection of the estimated function, characterized by the estimated coefficients, and the line passing through the testing point along the MSE direction vector. We evaluate the MSE as:


To compare the true function to the estimated function, we use the Linear Function Data Generation Process, Algorithm 1, steps 1 and 2, to construct our testing data set. To evaluate the estimated function without knowing the true function, the testing set is built using the full Linear Function Data Generation Process.
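For a linear estimate, the intersection described above has a closed form, so the directional MSE can be sketched directly; the function and variable names are illustrative:

```python
import numpy as np

# Sketch of the directional MSE: squared distance from each testing point
# to an estimated line c = A + B*y, measured along a unit direction d.
# Requires d not parallel to the estimated line (d2 - B*d1 != 0).
def directional_mse(y_test, c_test, A, B, d):
    d1, d2 = np.asarray(d, float) / np.linalg.norm(d)
    # intersection of the line through (y, c) with direction d and the
    # estimated function: c + t*d2 = A + B*(y + t*d1)  =>  solve for t
    t = (A + B * y_test - c_test) / (d2 - B * d1)
    return np.mean(t ** 2)
```

With d = (0, 1) this reduces to the ordinary vertical MSE; tilting d changes the measured deviations, which is why the evaluation direction matters in the tables below.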

Figure 4 shows the MSE computations.

Figure 4: MSE calculated relative to the True Function in the MSE direction (left), MSE calculated using a testing data set in the MSE direction (right)
Additional Information Describing the Simulations

We apply the DGP described above to generate a training set and a testing set, in which noise is introduced to the observations in random directions, fixing the noise scaling coefficient and the number of observations. We run 100 repetitions of the simulation for each experiment on a computer with an Intel Core i7 860 2.80 GHz processor and 8 GB of RAM. We use the quadratic programming solver in MATLAB 2017a.

For the estimator, we define the direction vector used in the parametric DDF as a function of an angular variable, which allows us to investigate alternative directions. We examine a set of directions corresponding to a grid of angles.

Results: Random Noise Directions

Table 1 and Table 2 show results corresponding to the two performance criteria introduced above and shown in Figure 4: the MSE relative to the true function and the MSE relative to a testing data set, respectively. Table 1 shows that a particular intermediate angle produces the smallest values of MSE (shown in bold in the table) regardless of the direction used for the MSE computation. However, the estimator's quality diminishes if we select the extreme directions at either end of the angle grid. In Table 2, which reports performance via a testing set, the direction corresponding to the smallest MSE value (shown in bold) is always the one matching the direction used in the MSE computation. In applications, using a testing set is necessary because the true function is unknown. Table 2 shows that the benefits of matching the MSE evaluation direction outweigh the benefits of selecting a direction based on the properties of the function being estimated.

Avg MSE: Comparison
to the True Function
DDF Angle
MSE Dir Angle
2.09 0.75 0.56 1.16 3.68
1.36 0.46 0.32 0.63 1.89
1.25 0.41 0.28 0.51 1.48
1.59 0.50 0.32 0.57 1.60
3.06 0.91 0.55 0.92 2.44
Note: Displayed values are measured values multiplied by a constant scale factor.
Table 1: Average MSE over 100 simulations for the Linear Estimator compared to the true function with a DGP using random noise directions
Avg MSE: Comparison
to Out-of-Sample
DDF Angle
MSE Dir Angle
28.28 29.43 31.29 34.23 40.67
18.03 17.79 18.19 19.09 21.32
16.38 15.55 15.45 15.77 16.90
20.50 18.67 18.04 17.90 18.46
38.63 33.07 30.68 29.29 28.70
Note: Displayed values are measured values multiplied by a constant scale factor.
Table 2: Average MSE over 100 simulations for the Linear Estimator compared to an out-of-sample testing set with a DGP using random noise directions

For the out-of-sample testing set, the direction that provides the smallest MSE value is the direction used for the MSE computation. Because the functional estimate is optimized for the direction specified in the SDDF, it is perhaps expected that using the direction that will also be used in the MSE evaluation produces a relatively low MSE compared to other directions. However, when the functional estimate is compared to the true function, the MSE values are around ten times smaller than in the out-of-sample testing case. In out-of-sample testing, the presence of noise in the observations causes a deviation regardless of the quality of the estimator or the number of observations. The DDF direction corresponding to the smallest MSE is the direction orthogonal to the true function in our DGP. This direction provides the shortest distance from the observations to the true function. We conclude that, in this experiment, it is preferable to select a direction orthogonal to the true function (see Section 5 for further experiments).

From the fixed noise direction experiments (see B.1), we observe that using a direction for the estimator that matches the direction used for the noise generation significantly reduces the MSE values compared to the true function. From this, we infer that when endogeneity is severe, using a direction that matches the characteristics of this endogeneity significantly improves the fit of the estimator; i.e., the MSE for the matching direction is smaller than for the second best direction (see Section 5 for the details).

Finally, we need to solve the problem of evaluating alternative directions when the true function is unknown so that we can evaluate alternative directions in the application data. Below, we describe our proposed alternative measure of fit.

4.2 Radial MSE Measure

MSE is typically measured by the average sum of squared errors in the dimension of a single variable, such as cost or output. As explained in Section 4.1, when we compare out-of-sample performance, we find that the best direction to use in estimating an SDDF is the direction used for MSE evaluation, regardless of the direction of noise in the DGP or any other characteristics of the DGP. To avoid this relationship between the direction of estimation and the direction of evaluation, we propose a radial MSE measure.

We begin by normalizing the data to a unit cube. The distance from a testing set observation to the estimated function is then measured along the ray from that observation toward a center point defined in the normalized space.
The radial MSE measure is the average of the squared distances from the testing set observations to the estimated function, measured radially. Figure 5 illustrates this measure. For a convex function, a radial measure reduces the bias in the measure for extreme values in the domain.
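A sketch of this measure for a curve c = f(y) in normalized coordinates, using a simple grid search for the ray-curve intersection. The center (0.5, 0.5) and the function f (which should accept array input) are illustrative assumptions:

```python
import numpy as np

# Sketch of the radial MSE in the normalized unit square: distance from
# each test point to the estimated curve c = f(y), measured along the ray
# toward an assumed center (0.5, 0.5).
def radial_mse(points, f, center=(0.5, 0.5), n_grid=10001):
    cy, cc = center
    sq = []
    for y0, c0 in points:
        ry, rc = cy - y0, cc - c0               # ray direction toward center
        t = np.linspace(-2.0, 2.0, n_grid)      # search parameter along ray
        h = (c0 + t * rc) - f(y0 + t * ry)      # sign change => crossing
        k = np.flatnonzero(np.diff(np.sign(h)))
        if k.size == 0:
            continue                             # ray misses the estimate
        t_star = t[k[np.argmin(np.abs(t[k]))]]   # nearest crossing
        sq.append(t_star ** 2 * (ry ** 2 + rc ** 2))
    return float(np.mean(sq))
```

Because every test point is evaluated along its own ray toward a common center, no single coordinate direction is privileged, which breaks the estimation/evaluation direction dependence described above.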

Figure 5: A Radial MSE Measure on a Cost Function with Two Outputs
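For the two-output case this measure has a simple polar form: each testing-set observation's radius and angle are computed relative to the center, and the squared gap between the observed radius and the estimated isoquant's radius at that angle is averaged. The sketch below is a minimal illustration under that assumption (the polar representation `r_hat` of the estimated function and all names are ours, not the paper's notation):

```python
import numpy as np

def radial_mse(y_test, r_hat, center):
    """Average squared radial distance from testing-set points to an
    estimated isoquant, measured along rays through `center`.

    y_test : (n, 2) array of (normalized) testing-set observations
    r_hat  : callable mapping an angle to the estimated isoquant's radius
    center : (2,) reference point for the radial measure
    """
    d = y_test - center                    # vectors from the center to the points
    r_obs = np.linalg.norm(d, axis=1)      # observed radii
    theta = np.arctan2(d[:, 1], d[:, 0])   # observed angles
    return np.mean((r_obs - r_hat(theta)) ** 2)

# Points lying exactly on a unit circle around the center give a radial MSE near 0.
center = np.array([0.0, 0.0])
angles = np.linspace(0.1, 1.4, 25)
pts = np.column_stack([np.cos(angles), np.sin(angles)])
```

Because the distance is measured toward a common center rather than in a fixed coordinate direction, the measure does not favor any particular estimation direction.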

5 Monte Carlo Simulations

We next examine how different DGPs affect the optimal direction for the DDF estimator based on a set of Monte Carlo simulations. We consider both random noise directions for each observation and a fixed noise direction representing a high-endogeneity case. We consider the effects of different variance levels for the noise and of changes in the underlying distribution of the production data. Using the simplest case of two outputs and a fixed cost level for all observed units allows us to separate the effects of the data and of the function.

5.1 CNLS-d Formulation for Cost Isoquant Estimation

Before describing our experiments, we first outline the CNLS-d formulation for estimating the iso-cost level set. It is based on the following optimization problem:


Note that all observations have a common cost level. This allows us to focus on a 2-dimensional estimation problem. For results related to 3-dimensional estimation problems, see B.2, Experiment 6.

We can recover the fitted values, , and the coefficient, , using:


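Since the optimization problem did not survive typesetting above, the following is a hedged numerical sketch of a CNLS-d-style quadratic program for a two-output iso-cost frontier, in the spirit of Kuosmanen and Johnson (2017). The hyperplane parameterization (a_i, c_i), the normalization c_i'g = 1, and the use of a general-purpose solver are our assumptions, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import minimize

def cnls_d_isoquant(Y, g):
    """Fit a concave, piecewise-linear iso-cost frontier as the lower
    envelope of n hyperplanes, with each observation's residual measured
    along direction g (so that y_i + eps_i * g lies on the fitted frontier).

    Y : (n, 2) output observations sharing a common cost level
    g : (2,) direction vector; each hyperplane is scaled so that c_i' g = 1
    """
    n = Y.shape[0]

    def unpack(z):
        # z = [a_1..a_n, c_11, c_12, ..., c_n1, c_n2]
        return z[:n], z[n:].reshape(n, 2)

    def obj(z):
        a, C = unpack(z)
        eps = a - np.sum(C * Y, axis=1)   # directional residuals
        return np.sum(eps ** 2)

    cons = []
    for i in range(n):
        # normalization: each hyperplane satisfies c_i' g = 1
        cons.append({"type": "eq",
                     "fun": lambda z, i=i: unpack(z)[1][i] @ g - 1.0})
    for i in range(n):
        for h in range(n):
            # Afriat-style inequalities: the own hyperplane attains the
            # minimum of the envelope at the own observation
            cons.append({"type": "ineq",
                         "fun": lambda z, i=i, h=h:
                             (unpack(z)[0][h] - unpack(z)[1][h] @ Y[i])
                             - (unpack(z)[0][i] - unpack(z)[1][i] @ Y[i])})

    bounds = [(None, None)] * n + [(0.0, None)] * (2 * n)  # monotonicity: c_i >= 0
    z0 = np.concatenate([np.ones(n), np.tile(g / (g @ g), n)])
    res = minimize(obj, z0, method="SLSQP", bounds=bounds, constraints=cons)
    a, C = unpack(res.x)
    return a, C, a - np.sum(C * Y, axis=1)
```

For observations lying exactly on a concave frontier, the residuals should be near zero; by construction, each observation's projection y_i + eps_i g lies on the fitted envelope.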
5.2 Experiments

We conducted several experiments to investigate the optimal direction for the DDF estimator. The results of four experiments appear in the main text, with two additional experiments described in the appendix.

Experiment 1 - Base case: A two output circular isoquant with uniformly distributed angle parameters and random noise direction

For the base case, we consider a fixed cost level and approximate a two output isoquant; i.e., . Indexing the outputs by and observations by , we generate the output variables as:


where is the observation on the isoquant and is the noise. We generate the output levels as:


where , is drawn randomly from a continuous uniform distribution, . The noise terms, , have the following expressions:


where the length is drawn from the normal distribution , the angle is observation specific and characterizes the noise direction for each observation, and is drawn from a continuous uniform distribution . The values considered for the directions in CNLS-d estimator are . The standard deviation of the normal distribution is . We perform the experiment times for each parameter setting.
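Since the symbols in the expressions above were lost in typesetting, the following sketch gives our reading of this DGP: points on a quarter-circle isoquant with uniformly distributed angles, plus noise whose length is normally distributed and whose direction is an observation-specific uniform angle. The unit radius, the ranges, and the parameter names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_base_case(n, sigma):
    """Observations on a quarter-circle isoquant plus noise whose length is
    normal and whose direction is an observation-specific uniform angle."""
    theta = rng.uniform(0.0, np.pi / 2, size=n)        # location on the isoquant
    y_true = np.column_stack([np.cos(theta), np.sin(theta)])
    length = rng.normal(0.0, sigma, size=n)            # noise magnitude ~ N(0, sigma)
    phi = rng.uniform(0.0, 2 * np.pi, size=n)          # per-observation noise angle
    noise = length[:, None] * np.column_stack([np.cos(phi), np.sin(phi)])
    return y_true + noise, y_true

y_obs, y_true = generate_base_case(100, 0.1)
```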

Table 3 reports the radial MSE values from a testing set of observations lying on the true function.

CNLS-d Direction Angle
Average MSE across simulations 13.90 4.65 3.32 4.49 13.93
Note: Displayed are measured values multiplied by .
Table 3: Experiment 1: Values of the radial MSE relative to the true function. The angle used in CNLS-d estimator varies and the noise direction is randomly selected. In the DGP, the standard deviation of the noise distribution, , is 0.1.

As shown in Table 3, the angle corresponding to the smallest MSE (shown in bold) is the one orthogonal to the true function at the center of the data, , and the MSE values differ significantly, increasing at similar rates as the direction angle deviates from this orthogonal direction in either direction.

Experiment 2 - The base case with fixed noise directions

In this experiment, , which characterizes the noise direction for each observation, is constant across all observations, . The values used for and for the directions in the CNLS-d estimator are the same, . The standard deviation of the normal distribution is again . We perform the experiment times for each parameter setting. Table 4 reports the results.

Each row in Table 4 corresponds to a different noise direction in the DGP. The bold numbers identify the CNLS-d directions that obtain the smallest MSE for each noise direction. Consistent with the parametric estimator and fixed-noise-direction case described in B.1, the bold values appear on the diagonal (from the upper left to the lower right of Table 4); that is, the best-performing CNLS-d direction is the one matching the noise direction. This result indicates that selecting a direction in the SDDF that matches the underlying noise direction in the DGP yields improved functional estimates.

CNLS-d Direction Angle
Noise Direction Angle
2.69 3.03 4.49 8.86 25.47
7.49 3.44 4.00 8.07 28.83
20.28 5.79 4.30 5.80 19.06
25.58 7.80 4.18 3.51 6.84
25.90 9.09 4.73 3.10 2.57
Note: Displayed are measured values multiplied by .
Table 4: Experiment 2: Values of radial MSE relative to the true function varying the DGP noise direction and the CNLS-d estimator direction. In the DGP, the standard deviation of the noise distribution, , is 0.1.

Experiment 3 - Base case with fixed noise direction and different noise levels

In Experiment 3, we vary the noise term by changing the coefficient. Table 5 reports the results for .

CNLS-d Direction Angle
Noise Direction Angle
0.92 0.82 0.96 1.53 5.12
1.83 1.09 1.09 1.47 5.45
3.70 1.41 1.29 1.43 3.93
5.75 1.68 1.27 1.18 1.86
4.61 1.40 0.95 0.79 0.90
Note: Displayed are measured values multiplied by .
Table 5: Experiment 3–Less Noise: Values of radial MSE relative to the true function varying the DGP noise direction and the CNLS-d direction. In the DGP, the standard deviation of the noise distribution, , is 0.05.

In Table 5 (Experiment 3, with ), we do not observe the same diagonal pattern as in Experiment 2, and the best direction for the CNLS-d estimator does not match the direction selected for the noise. This leads us to hypothesize that when the noise level is small, data characteristics, such as the distribution of the regressors or the shape of the function, drive the estimation, whereas when the noise level is large, the regressors' relative variability becomes the dominant factor in determining the best direction for the CNLS-d estimator.

However, with a larger noise level, the results of Experiment 3 are consistent with those from Experiment 2; i.e., the best direction always coincides with the noise direction selected. These results are reported in B, Table 15.

Experiment 4 - Base case with different distributions for the initial observations on the true function

In Experiment 4, we seek to understand how changing the DGP for the angle, , affects the optimal direction. We consider three normal distributions with different means: , and . We truncate the tails of each distribution so that the generated angles fall in the range . Noise is specified as in Experiment 1. Table 6 reports the results of this experiment.

Mean of the CNLS-d Direction angle
Normal Distribution ()
3.19 2.21 3.89 10.28 46.47
8.44 2.92 1.98 3.17 9.00
45.64 10.25 4.02 2.43 3.07
Note: Displayed are measured values multiplied by .
Table 6: Experiment 4: Values of radial MSE relative to the true function varying the CNLS-d direction and the mean of the normal distribution used in the DGP.

In Table 6, we observe that selecting the SDDF direction to match , the mean of the distribution of the angle variable in the DGP, yields the smallest MSE value. This suggests that the estimator's performance improves when the chosen direction points toward the "center" of the data.

B.2 presents additional experiments, varying the distribution of the observations and considering three outputs with a fixed cost level. These experiments lend further support to the strategy of selecting a direction that points toward the "center" of the data.

6 Proposed Approach to Direction Selection

Based on our Monte Carlo simulations, we find that the optimal direction depends on the shape of the function and the distribution of the observed data. This in itself is not surprising. However, assuming a unimodal distribution for the data generation process, a direction that points toward the "center" of the data and is perpendicular to the true function at that point tends to outperform other directions. To apply this finding to a data set with outputs and observations, we suggest selecting the direction for the DDF as follows:

  1. Normalize the data:

  2. Select the direction:


This provides a method for direction selection that can be used in applications when the true direction is unknown. (Note: a cost function is convex with respect to a reference point; therefore, to obtain a ray that points from that point to the median of the data, the corresponding directional vector is needed.) We test the proposed method by estimating a cost function for a US hospital data set.
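The two steps above can be sketched as follows. This is a minimal illustration: the reference point `ref` from which the ray points toward the median of the normalized data is our assumption, since the exact point was lost in typesetting:

```python
import numpy as np

def select_direction(Y, ref):
    """Step 1: min-max normalize the data to the unit cube.
    Step 2: point the direction from a reference point toward the
    componentwise median of the normalized data."""
    lo, hi = Y.min(axis=0), Y.max(axis=0)
    Z = (Y - lo) / (hi - lo)              # normalized observations in [0, 1]^d
    med = np.median(Z, axis=0)            # componentwise median ("center" of the data)
    d = med - ref                         # ray from the reference point to the median
    return d / np.linalg.norm(d)          # unit-length direction vector

# Toy data: two outputs on very different scales
Y = np.array([[1.0, 10.0], [2.0, 30.0], [4.0, 20.0], [8.0, 40.0]])
direction = select_direction(Y, ref=np.zeros(2))
```

Normalizing first keeps the direction from being dominated by whichever variable has the largest raw scale.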

7 Cost Function Estimation of the US Hospital Sector

We analyze the cost variation across US hospitals using a conditional mean estimate of the cost function. We estimate a multi-output cost function for the US hospital sector by implementing our data-driven method for selecting the direction vector for the DDF. We report most productive scale size and marginal cost estimates.

7.1 Description of the Data Set

We obtain cost data from the American Hospital Association's (AHA) Annual Survey Databases from 2007 to 2009. The costs reported include payroll, employee benefits, depreciation, interest, supply expenses, and other expenses. We estimate a cost function, which can be interpreted as a distance function with a single input when hospitals face the same input prices. (Unfortunately, we do not observe input prices; we chose to estimate a cost function under the assumption of common input prices rather than impose an arbitrary division of the cost.) We obtain hospital output data from the Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample (NIS) core file, which annually captures all discharges for a 20% sample of US community hospitals. The hospital sample changes every year. For each patient discharged, all procedures received are recorded as International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes. The typical US hospital relies on these detailed codes to quantify the medical services it provides (Zuckerman et al. (1994)). We map the codes to four procedure categories, "Minor Diagnostic," "Minor Therapeutic," "Major Diagnostic," and "Major Therapeutic," which are standard output categories in the literature (Pope and Johnson (2013)). The number of procedures in each category is summed for each hospital by year to construct the output variables. The total number of hospitals sampled is around 1,000 per year from 2007 to 2009. (The NIS survey is a stratified systematic random sample; the strata criteria are urban or rural location, teaching status, ownership, and bed size. This stratification ensures a more representative sample of discharges than a simple random sample would yield.)
However, mapping between the two databases is only possible for approximately 50% of the hospitals in the HCUP data, resulting in approximately 450 to 525 observations available each year.

2007 (523 observations)
Cost ($) MajDiag MajTher MinDiag MinTher
Mean 146M 162 4083 3499 7299
Skewness 3.51 2.89 2.63 5.19 3.28
25-percentile 24M 9 277 108 512
50-percentile 72M 73 1688 938 3108
75-percentile 182M 207 5443 4082 9628
2008 (511 observations)
Cost ($) MajDiag MajTher MinDiag MinTher
Mean 163M 175 4433 3688 7657
Skewness 4.19 3.80 2.97 4.87 2.82
25-percentile 28M 10 325 120 545
50-percentile 83M 76 1809 1013 3350
75-percentile 189M 246 5984 4569 10781
2009 (458 observations)
Cost ($) MajDiag MajTher MinDiag MinTher
Mean 175M 161 4471 3615 7905
Skewness 3.39 3.78 2.43 4.68 2.41
25-percentile 31M 12 420 148 713
50-percentile 91M 69 1737 1136 3458
75-percentile 220M 230 6402 4694 10989
Table 7: Summary Statistics of the Hospital Data Set

7.2 Pre-Analysis of the Data Set

7.2.1 Testing the Relevance of the Regressors

We begin by testing the statistical significance of our four output variables for predicting cost. While the selected variables have been used in previous studies, we use these tests to evaluate whether this variable specification can be rejected for the current data set of US hospitals from 2007 to 2009.

The null hypothesis for the th output is:

against the alternative hypothesis, where the notation denotes the vector excluding the th component:

We implement the test with the Local Constant Least Squares (LCLS) estimator described in Henderson and Parmeter (2015), calculating bandwidths by least-squares cross-validation and using 399 wild bootstrap replications. All output variables are highly statistically significant in all years.
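The building block of this test, the local constant estimator, can be sketched as follows. This is a minimal Nadaraya-Watson version with a Gaussian product kernel and a fixed common bandwidth; the cross-validated bandwidths and the wild bootstrap used in the actual test are omitted:

```python
import numpy as np

def lcls(X, y, x0, h):
    """Local constant least squares (Nadaraya-Watson) estimate of E[y | x = x0]
    with a Gaussian product kernel and a common bandwidth h."""
    u = (X - x0) / h
    w = np.exp(-0.5 * np.sum(u ** 2, axis=1))   # Gaussian product-kernel weights
    return np.sum(w * y) / np.sum(w)            # kernel-weighted average of y
```

In a least-squares cross-validated fit, an irrelevant regressor is effectively smoothed out by a very large bandwidth, which is what the significance test exploits.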

7.3 Results

CNLS-d and Different Directions

We analyze each year of data as a separate cross-section because, as noted above, the HCUP does not track the same set of hospitals across years. To illustrate the direction's effect on the functional estimates, we graph "Cost" as a function of "Major Diagnostic Procedures" and "Major Therapeutic Procedures," holding "Minor Diagnostic Procedures" and "Minor Therapeutic Procedures" constant at their median values. Figure 6 shows the estimates for three different directions: one with only a cost component, one with only a Major Therapeutic Procedures component, and one obtained from our median approach. Visual inspection shows that the different directions produce significantly different estimates, highlighting the importance of direction selection.

Figure 6: US Hospital Cost Function Estimates for Three Directions

We compare the estimator's performance using different directions. Table 8 reports the MSE for three sample directions in each year. We define our direction vector as . (We focus on the types of directions found to be competitive in our Monte Carlo simulations.)

Direction                        2007   2008   2009
(0.45, 0.45, 0.45, 0.45, 0.45)   2.10   1.30   1.50
(0.35, 0.35, 0.35, 0.35, 0.71)   2.15   1.65   1.29
Median Direction                 1.79   1.55   1.34
Note: Displayed are the measured values multiplied by .
Table 8: Radial MSE values for different directions by year

We pick two directions: one with equal components in all dimensions, and one whose cost component is double the value of the output components. The median vector is , which is very close to the cost-only direction. The MSE varies by 15-30% across the different directions. We observe no clearly dominant direction; however, the median direction performs reasonably well in all cases. We conclude that as long as the selected direction has non-zero components for all variables that could contain noise, the precise direction is not critical to obtaining improved estimation results.

Comparison with other estimators

We compare three methods for estimating a cost function: 1) a quadratic functional form (without the cross-product terms), Färe et al. (2010); 2) CNLS-d with the direction selection method proposed in Section 6; and 3) a lower bound estimate calculated using a local linear kernel regression with a Gaussian kernel and leave-one-out cross-validation for bandwidth selection, Li and Racine (2007). (For CNLS-d, we select an upper bound on the estimated slope coefficients through a tuning process (Lim, 2014).) We select these estimators because a quadratic functional form has been used to model production in recent productivity and efficiency analyses of healthcare; see, for example, Ferrier et al. (2018). The local linear kernel is selected because it is an extremely flexible nonparametric estimator and provides a lower bound for the performance of a functional estimate. Note, however, that the local linear kernel does not impose the standard properties of a cost function, i.e., that cost is monotonic in outputs and that marginal costs increase with output.
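The parametric benchmark, a quadratic in the outputs without cross-product terms, can be sketched as ordinary least squares on an intercept, each output, and each squared output (a generic illustration; the function and variable names are ours):

```python
import numpy as np

def fit_quadratic_no_cross(Y, cost):
    """OLS fit of cost on an intercept, each output, and each squared output
    (no interaction terms)."""
    X = np.c_[np.ones(len(Y)), Y, Y ** 2]
    coef, *_ = np.linalg.lstsq(X, cost, rcond=None)
    return coef

def predict_quadratic(coef, Y):
    """Predicted cost from the fitted no-cross-term quadratic."""
    return np.c_[np.ones(len(Y)), Y, Y ** 2] @ coef
```

Dropping the cross-product terms keeps the parameter count linear in the number of outputs, at the price of ruling out complementarities between outputs.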

We use K-fold average MSE with K = 5 as the comparison criterion: we split the data into 5 equal parts, use 4 of the 5 parts for estimation (training), and evaluate the estimator's performance on the remaining part (testing). We repeat this for all 5 parts and average the results. The values presented in Table 9 are the averages across folds.
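The procedure can be sketched generically as follows; `fit` and `predict` stand in for any of the three estimators (here ordinary least squares, purely for illustration):

```python
import numpy as np

def kfold_mse(X, y, fit, predict, K=5, seed=0):
    """Average out-of-sample MSE over K folds: train on K-1 folds,
    evaluate on the held-out fold, and average across folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, K)
    mses = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        model = fit(X[train], y[train])
        mses.append(np.mean((y[test] - predict(model, X[test])) ** 2))
    return np.mean(mses)

# Ordinary least squares as a stand-in estimator, purely for illustration
ols_fit = lambda X, y: np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)[0]
ols_predict = lambda b, X: np.c_[np.ones(len(X)), X] @ b
```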

Quadratic CNLS-d Lower Bound
Year Regression (Median Direction) Estimator
2007 3.43 2.44 2.35
2008 2.76 1.93 1.48
2009 2.43 1.80 1.53
Note: The MSE values displayed are the measured
values multiplied by
Table 9: US Hospital K-fold Average MSE in Cost to the Cost Function Estimates for the Three Functional Specifications by Year

While the lower bound estimator attains the lowest average MSE in every year, CNLS-d performs relatively well: it is close to the lower bound in fitting performance while imposing the standard axioms of a cost function. As is true of most production data, the hospital data are very noisy. The shape restrictions imposed in CNLS-d improve interpretability. The CNLS-d estimator also outperforms the parametric approach, indicating the general benefits of nonparametric estimators.

Description of Functional Estimates - MPSS and Marginal Costs

We report the most productive scale size (MPSS) and the marginal costs for a quadratic parametric estimator, the CNLS-d estimator with our proposed direction selection method, and an alternative direction. (MPSS is measured on each ray from the origin, fixing the output ratios, and is defined as the cost level that maximizes the ratio of aggregate output to cost; marginal cost is measured on the same rays and is defined as the cost of increasing aggregate output by one unit.) These metrics are computed from the averaged K-fold estimates for each estimation method. For MPSS, we present the cost levels obtained for different ratios of Minor Therapeutic procedures (MinTher) to Major Therapeutic procedures (MajTher), with minor and major diagnostics held constant at their median levels.

MPSS results are presented in Table 10, and the values for CNLS-d (Median Direction) are illustrated in Figure 7. We observe small variations across both years and estimators; the differences across years are partly due to the changing sample. Most hospitals are small and operate close to the MPSS, but several large hospitals operate significantly above it. Hospitals may choose to operate at larger scales and provide a large array of services, allowing consumers to fulfill multiple healthcare needs.

For marginal costs, we present the values for different percentiles of MinTher and MajTher, with minor and major diagnostics held constant at their median levels; a more exhaustive comparison across all outputs is presented in C. Marginal cost information can help hospital decision makers select the types of improvements that are likely to raise productivity with minimal cost increase. For example, consider a hospital in the percentile of the data set for all four outputs in 2008 whose manager has the option to expand either minor or major therapeutic procedures. The results reported in Tables 11 and 12 indicate the cost increases from one additional minor therapeutic procedure and from one additional major therapeutic procedure, respectively. A decision maker would also want to consider the revenue generated by the different procedures; however, these estimates provide insight into the incremental cost of additional major and minor therapeutic procedures.
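Given any fitted cost function, the incremental cost of one additional procedure of a given type can be approximated by a finite difference along that output dimension (a generic sketch; `cost_fn` and the toy coefficients are illustrative, not estimates from the paper):

```python
def marginal_cost(cost_fn, y, j, delta=1.0):
    """Approximate cost of one additional unit of output j at output vector y."""
    y_up = list(y)
    y_up[j] += delta
    return (cost_fn(y_up) - cost_fn(y)) / delta

# Toy cost function c(y) = 5*y1 + 2*y2^2 (illustrative coefficients only)
cost_fn = lambda y: 5 * y[0] + 2 * y[1] ** 2
mc_minor = marginal_cost(cost_fn, [10.0, 4.0], j=0)  # constant marginal cost of 5.0
mc_major = marginal_cost(cost_fn, [10.0, 4.0], j=1)  # increasing in the output level
```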

CNLS-d is the most flexible of the estimators, and its MPSS values fluctuate significantly across percentiles: CNLS-d does not smooth variation but rather minimizes the distance from each observation to the shape-constrained estimator. Results for the local linear kernel estimator are also presented in C. Even though its bandwidths are selected via cross-validation, relatively large values are chosen because the data are noisy and the output distribution is highly skewed. These large bandwidths, together with the parametric nature of the quadratic function, make those two estimators relatively less flexible than CNLS-d. A feature captured only by CNLS-d is that hospitals specializing in either minor or major therapeutics maximize productivity at larger scales of operation, as illustrated in Figure 7.

Ratio Quadratic Regression CNLS-d (median) CNLS-d (equal)
MajTher/MinTher 2007 2008 2009 2007 2008 2009 2007 2008 2009
20% 13 379 252 210 61 88 224 137 106
30% 17 861 640 146 66 83 134 129 148
40% 272 377 1090 107 56 77 127 85 135
50% 870 249 1552 112 64 85 124 126 134
60% 360 210 276 90 70 120 88 96 142
70% 205 182 187 111 66 184 132 104 104
80% 151 170 150 174 69 286 221 110 111
Note: The values displayed are in $M
Table 10: Most Productive Scale Size measured in cost () conditional on Minor Therapeutic procedures (MinTher) and Major Therapeutic procedures (MajTher), Minor Diagnostic procedures (MinDiag) and Major Diagnostic procedures (MajDiag) held constant at the 50th percentile
Figure 7: Most Productive Scale Size (in red) on the estimated function by “CNLS-d (Med)”, CNLS-d using the median approach for the direction, for different ratios of Major Therapeutic Procedures over Minor Therapeutic Procedures
Percentile Quadratic Regression CNLS-d (median) CNLS-d (equal)
MinTher MajTher 2007 2008 2009 2007 2008 2009 2007 2008 2009
25 25 8.9 6.5 13.2 0.03 0.03 0.03 0.2 0.02 0.1
25 50 8.9 6.5 13.2 0.05 0.1 0.1 0.04 0.1 0.04
25 75 8.9 6.5 13.2 0.2 0.04 0.03 0.1 0.02 0.02
50 25 8.1 6.1 12.4 6.9 5.5 7.4 5.9 6.3 7.8
50 50 8.1 6.1 12.4 4.3 4.9 7.8 2.1 3.7 7.4
50 75 8.1 6.1 12.4 0.2 0.4 0.03 0.1 0.02 0.02
75 25 6.0 5.0 10.4 9.6 13.5 14.0 9.5 10.9 14.1
75 50 6.0 5.0 10.4 9.6 13.5 14.3 9.6 10.9 13.8
75 75 6.0 5.0 10.4 5.7 10.1 6.4 4.6 8.7 6.4
Note: The values displayed are in $k
Table 11: Marginal Cost of Minor Therapeutic Procedures
Percentile Quadratic Regression CNLS-d (median) CNLS-d (equal)
MinTher MajTher 2007 2008 2009 2007 2008 2009 2007 2008 2009
25 25 10.5 11.5 9.8 0.1 0.04 0.1 0.2 0.03 0.1
25 50 11.7 13.0 10.8 11.3 11.8 15.7 10.5 10.3 14.6
25 75 15.1 17.2 14.5 19.8 22.1 24.6 19.8 21.8 24.0
50 25 10.5 11.5 9.8 0.4 0.2 0.5 0.1 0.1 0.4
50 50 11.7 13.0 10.8 3.7 7.7 1.7 6.9 7.1 3.7
50 75 15.1 17.2 14.5 19.8 22.0 24.6 19.8 21.8 24.0
75 25 10.5 11.5 9.8 0.2 0.03 0.1 0.0 0.1 0.1
75 50 11.7 13.0 10.8 0.2 0.2 0.4 0.8 0.1 0.3
75 75 15.1 17.2 14.5 18.3 12.4 19.8 16.2 11.0 15.2
Note: The values displayed are in $k
Table 12: Marginal Cost of Major Therapeutic Procedures

Figure 8: Marginal Cost of Minor Therapeutic procedures (left) and Marginal Cost of Major Therapeutic procedures (right). ("CNLS-d (Med)" denotes CNLS-d using the median approach for the direction; "CNLS-d (Eq)" denotes CNLS-d using the direction with equal components in all netputs.)

The marginal cost results for Minor Therapeutic procedures are presented in Table 11 and Figure 8 (left), and those for Major Therapeutic procedures in Table 12 and Figure 8 (right). As with MPSS (see Table 10), CNLS-d is more flexible, and its marginal cost estimates vary significantly across percentiles. CNLS-d with different directions yields very similar marginal cost estimates; however, these differ significantly from the marginal cost estimates obtained with the parametric estimator. For CNLS-d, the marginal cost results are consistent with the theory that marginal costs increase with scale. This property can be violated by an estimator without shape constraints imposed; for example, this can be seen in the marginal costs of minor therapeutic procedures for the parametric (quadratic) regression estimator in Figure 8.

Our data set, which combines AHA cost data with AHRQ output data for a broad sample of hospitals from across the US, is, to the best of our knowledge, unique. However, our marginal cost estimates are broadly in line with estimates for US hospitals in similar time periods. Gowrisankaran et al. (2015) studied a considerably smaller set of Northern Virginia hospitals observed in 2006 that, on average, were larger than the hospitals in our data set. Because the output measures differ, the marginal cost levels are not directly comparable; however, conditional on the size variation, the variation in marginal costs is similar to what we observe for the parametric (quadratic) regression specification applied to our data. Boussemart et al. (2015) analyzed data on nearly 150 Florida hospitals observed in 2005. The authors use a different output specification and a translog model; however, their distribution of hospital size is similar to ours, and we again observe similar variances in marginal costs with the parametric (quadratic) regression specification applied to our data.

8 Conclusions

This paper investigated the improvement in functional estimates obtained by specifying a particular direction in CNLS-d. Two primary findings emerged from our Monte Carlo experiments. First, directions close to the average direction orthogonal to the true function performed well. Second, when the data are noisy, selecting a direction that matches the noise direction of the DGP improves estimator performance. Thus, if the noise level is not too large, users can apply CNLS-d with a direction orthogonal to the data; if the noise level is large, they should select a direction close to the true noise direction, with non-zero components in all variables that potentially contain noise. Our application to US hospital data shows that CNLS-d performs similarly across directions, provided the direction vector has non-zero components for all variables that may be measured with noise.

In future research, we propose developing an alternative estimator that incorporates multiple directions in CNLS-d while maintaining the concavity axiom. This would permit treating subgroups within the data, allowing different assumptions to be made across subgroups (e.g., for-profit vs. not-for-profit hospitals).


  • Ackerberg et al. (2015) Ackerberg, D. A., Caves, K., Frazer, G., 2015. Identification properties of recent production function estimators. Econometrica 83 (6), 2411–2451.
  • Adler and Volta (2016) Adler, N., Volta, N., 2016. Accounting for externalities and disposability: a directional economic environmental distance function. European Journal of Operational Research 250 (1), 314–327.
  • Afriat (1972) Afriat, S. N., 1972. Efficiency estimation of production functions. International Economic Review 13 (3), 568–598.
  • Aparicio et al. (2017) Aparicio, J., Pastor, J., Zofio, J., 2017. Can Farrell’s allocative efficiency be generalized by the directional distance function approach? European Journal of Operational Research 257 (1), 345–351.
  • Atkinson et al. (2003) Atkinson, S., Cornwell, C., Honerkamp, O., 2003. Measuring and decomposing productivity change: stochastic distance function estimation versus data envelopment analysis. Journal of Business & Economic Statistics 21 (2), 284–294.
  • Atkinson and Tsionas (2016) Atkinson, S., Tsionas, M., 2016. Directional distance functions: optimal endogenous directions. Journal of Econometrics 190 (2), 301–314.
  • Baležentis and De Witte (2015) Baležentis, T., De Witte, K., 2015. One- and multi-directional conditional efficiency measurement: efficiency in Lithuanian family farms. European Journal of Operational Research 245 (2), 612–622.
  • Bertsekas (1999) Bertsekas, D. P., 1999. Nonlinear programming. Athena Scientific, Belmont, MA.
  • Boussemart et al. (2015) Boussemart, J.-P., Leleu, H., Valdmanis, V., 2015. A two-stage translog marginal cost pricing approach for Floridian hospital outputs. Applied Economics 47 (38), 4116–4127.
  • Carroll et al. (2006) Carroll, R., Ruppert, D., Stefanski, L., Crainiceanu, C., 2006. Measurement error in nonlinear models: a modern perspective, second edition. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. CRC Press, Boca Raton, FL.
  • Chambers (1988) Chambers, R. G., 1988. Applied production analysis. Cambridge University Press, New York, NY.
  • Chambers et al. (1996) Chambers, R. G., Chung, Y., Färe, R., 1996. Benefit and distance functions. Journal of Economic Theory 70 (2), 407–419.
  • Chambers et al. (1998) Chambers, R. G., Chung, Y., Färe, R., 1998. Profit, directional distance functions, and Nerlovian efficiency. Journal of Optimization Theory and Applications 98 (2), 351–364.
  • Charnes et al. (1978) Charnes, A., Cooper, W. W., Rhodes, E., 1978. Measuring the efficiency of decision making units. European Journal of Operational Research 2 (6), 429–444.
  • Coelli (2000) Coelli, T., 2000. On the econometric estimation of the distance function representation of a production technology. Université Catholique de Louvain. Center for Operations Research and Econometrics [CORE].
  • Coelli and Perelman (1999) Coelli, T., Perelman, S., 1999. A comparison of parametric and non-parametric distance functions: with application to European railways. European Journal of Operational Research 117 (2), 326–339.
  • Coelli and Perelman (2000) Coelli, T., Perelman, S., 2000. Technical efficiency of European railways: a distance function approach. Applied Economics 32 (15), 1967–1976.
  • Daraio and Simar (2016) Daraio, C., Simar, L., 2016. Efficiency and benchmarking with directional distances: a data-driven approach. Journal of the Operational Research Society 67 (7), 928–944.
  • Diewert and Wales (1987) Diewert, W. E., Wales, T. J., 1987. Flexible functional forms and global curvature conditions. Econometrica 55 (1), 43–68.
  • Färe and Grosskopf (2010) Färe, R., Grosskopf, S., 2010. Directional distance functions and slacks-based measures of efficiency. European Journal of Operational Research 200 (1), 320–322.
  • Färe et al. (2005) Färe, R., Grosskopf, S., Noh, D.-W., Weber, W., 2005. Characteristics of a polluting technology: theory and practice. Journal of Econometrics 126 (2), 469–492.
  • Färe et al. (2010) Färe, R., Martins-Filho, C., Vardanyan, M., 2010. On functional form representation of multi-output production technologies. Journal of Productivity Analysis 33 (2), 81–96.
  • Färe et al. (2017) Färe, R., Pasurka, C., Vardanyan, M., 2017. On endogenizing direction vectors in parametric directional distance function-based models. European Journal of Operational Research 262 (1), 361–369.
  • Färe and Vardanyan (2016) Färe, R., Vardanyan, M., 2016. A note on parameterizing input distance functions: does the choice of a functional form matter? Journal of Productivity Analysis 45 (2), 121–130.
  • Ferrier et al. (2018) Ferrier, G. D., Leleu, H., Valdmanis, V. G., Vardanyan, M., 2018. A directional distance function approach for identifying the input/output status of medical residents. Applied Economics 50 (9), 1006–1021.
  • Frisch (1964) Frisch, R., 1964. Theory of production. Springer Science & Business Media, Dordrecht, Netherlands.
  • Fukuyama and Matousek (2018) Fukuyama, H., Matousek, R., 2018. Nerlovian revenue inefficiency in a bank production context: evidence from Shinkin banks. European Journal of Operational Research 271 (1), 317–330.
  • Gowrisankaran et al. (2015) Gowrisankaran, G., Nevo, A., Town, R., 2015. Mergers when prices are negotiated: evidence from the hospital industry. American Economic Review 105 (1), 172–203.
  • Henderson and Parmeter (2015) Henderson, D. J., Parmeter, C. F., 2015. Applied nonparametric econometrics. Cambridge University Press, New York, NY.
  • Hildreth (1954) Hildreth, C., 1954. Point estimates of ordinates of concave functions. Journal of the American Statistical Association 49 (267), 598–619.
  • Hollingsworth (2003) Hollingsworth, B., 2003. Non-parametric and parametric applications measuring efficiency in health care. Health Care Management Science 6 (4), 203–218.
  • Johnson and Kuosmanen (2011) Johnson, A. L., Kuosmanen, T., 2011. One-stage estimation of the effects of operational conditions and practices on productive performance: asymptotically normal and efficient, root-n consistent StoNEZD method. Journal of Productivity Analysis 36 (2), 219–230.
  • Kapelko and Oude Lansink (2017) Kapelko, M., Oude Lansink, A., 2017. Dynamic multi-directional inefficiency analysis of European dairy manufacturing firms. European Journal of Operational Research 257 (1), 338–344.
  • Koopmans (1951) Koopmans, T. C., 1951. An analysis of production as an efficient combination of activities. In: Koopmans, T. C. (Ed.), Activity Analysis of Production and Allocation. John Wiley & Sons, Inc., New York, pp. 33–97.
  • Kuosmanen (2008) Kuosmanen, T., 2008. Representation theorem for convex nonparametric least squares. Econometrics Journal 11 (2), 308–325.
  • Kuosmanen and Johnson (2017) Kuosmanen, T., Johnson, A., 2017. Modeling joint production of multiple outputs in StoNED: directional distance function approach. European Journal of Operational Research 262 (2), 792–801.
  • Kuosmanen and Johnson (2010) Kuosmanen, T., Johnson, A. L., 2010. Data envelopment analysis as nonparametric least-squares regression. Operations Research 58 (1), 149–160.
  • Kuosmanen and Kortelainen (2012) Kuosmanen, T., Kortelainen, M., 2012. Stochastic non-smooth envelopment of data: semi-parametric frontier estimation subject to shape constraints. Journal of Productivity Analysis 38 (1), 11–28.
  • Kutlu (2018) Kutlu, L., 2018. A distribution-free stochastic frontier model with endogenous regressors. Economics Letters 163, 152–154.
  • Levinsohn and Petrin (2003) Levinsohn, J., Petrin, A., 2003. Estimating production functions using inputs to control for unobservables. Review of Economic Studies 70 (2), 317–341.
  • Lewbel (forthcoming) Lewbel, A., forthcoming. The identification zoo: meanings of identification in econometrics. Journal of Economic Literature.
  • Li and Racine (2007) Li, Q., Racine, J. S., 2007. Nonparametric econometrics: theory and practice. Princeton University Press, Princeton, NJ.
  • Lim (2014) Lim, E., 2014. On convergence rates of convex regression in multiple dimensions. INFORMS Journal on Computing 26 (3), 616–628.
  • Lovell et al. (1994) Lovell, C. K., Travers, P., Richardson, S., Wood, L., 1994. Resources and functionings: a new view of inequality in Australia. In: Eichhorn (Ed.), Models and Measurement of Welfare and Inequality. Springer, Berlin, Germany, pp. 787–807.
  • Luenberger (1992) Luenberger, D. G., 1992. Benefit functions and duality. Journal of Mathematical Economics 21 (5), 461–481.
  • Manski (2003) Manski, C., 2003. Partial identification of probability distributions. Springer Series in Statistics. Springer, New York, NY.
  • Olley and Pakes (1996) Olley, G. S., Pakes, A., 1996. The dynamics of productivity in the telecommunications equipment industry. Econometrica 64 (6), 1263–1297.
  • Pope and Johnson (2013) Pope, B., Johnson, A. L., 2013. Returns to scope: a metric for production synergies demonstrated for hospital production. Journal of Productivity Analysis 40 (2), 239–250.
  • Roshdi et al. (2018) Roshdi, I., Hasannasab, M., Margaritis, D., Rouse, P., 2018. Generalised weak disposability and efficiency measurement in environmental technologies. European Journal of Operational Research 266 (3), 1000–1012.
  • Shephard (1953) Shephard, R. W., 1953. Cost and production functions. Princeton University Press, Princeton, NJ.
  • Shephard (1970) Shephard, R. W., 1970. Theory of cost and production functions. Princeton University Press, Princeton, NJ.
  • Sickles et al. (2002) Sickles, R. C., Good, D. H., Getachew, L., 2002. Specification of distance functions using semi-and nonparametric methods with an application to the dynamic performance of Eastern and Western European air carriers. Journal of Productivity Analysis 17 (1-2), 133–155.
  • Syverson (2011) Syverson, C., 2011. What determines productivity? Journal of Economic Literature 49 (2), 326–365.
  • Tamer (2010) Tamer, E., 2010. Partial identification in econometrics. Annual Review of Economics 2, 167–195.
  • Varian (1984) Varian, H. R., 1984. The nonparametric approach to production analysis. Econometrica, 579–597.
  • Wooldridge (2009) Wooldridge, J. M., 2009. On estimating firm-level production functions using proxy variables to control for unobservables. Economics Letters 104 (3), 112–114.
  • Yagi et al. (2018) Yagi, D., Chen, Y., Johnson, A. L., Kuosmanen, T., 2018. Shape-constrained kernel-weighted least squares: estimating production functions for Chilean manufacturing industries. Journal of Business & Economic Statistics, 1–12.
  • Zuckerman et al. (1994) Zuckerman, S., Hadley, J., Iezzoni, L., 1994. Measuring hospital efficiency with frontier cost functions. Journal of Health Economics 13 (3), 255–280.

Appendix A Properties of Directional Distance Functions and CNLS-d

A.1 Direction Selection in Directional Distance Functions

In this appendix we prove that the direction vector affects the functional estimates. Let $g = (g^x, g^y)$ denote the direction vector; then we can state the following theorem:

Theorem 2.

Suppose that two direction vectors exist, $g^1$ and $g^2$, such that $g^1 \neq g^2$. Then the directional distance function estimates using these two different directions are not equal, $\hat{D}(\,\cdot\,; g^1) \neq \hat{D}(\,\cdot\,; g^2)$.


Proof. Rewrite Problem (10) from Section 3.2 as Problem (27).

Observe that all decision variables appear in the objective function, the objective function is quadratic, and the constraints define a convex feasible region; hence this optimization problem has a unique solution (Bertsekas, 1999). If we solve Problem (27) with the direction vector $g^1$, then the resulting solution vector is unique. Changing the direction vector from $g^1$ to $g^2$, the normalization constraint no longer holds for the coefficients estimated under $g^1$. However, the uniqueness argument above applies equally to the solution obtained under $g^2$. Thus, the two solutions, and hence the two functional estimates, differ: $\hat{D}(\,\cdot\,; g^1) \neq \hat{D}(\,\cdot\,; g^2)$. ∎
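The direction-dependence established by the theorem can be seen in a toy sketch. The code below (pure Python; the data points, parameter grid, and simple line model are invented for illustration and are not the paper's CNLS-d estimator) fits a line y = a + b·x by minimizing squared deviations measured along a direction vector g = (gx, gy), via a coarse grid search. For a point (xi, yi), the directional residual t solves yi + t·gy = a + b·(xi − t·gx), giving t = (a + b·xi − yi)/(gy + b·gx).

```python
# Toy illustration: the fitted coefficients change with the direction vector g.
# Not the paper's estimator; data, grid, and model are hypothetical.

def directional_sse(a, b, gx, gy, pts):
    # Sum of squared residuals measured along direction g = (gx, gy).
    return sum(((a + b * x - y) / (gy + b * gx)) ** 2 for x, y in pts)

def fit(gx, gy, pts, grid):
    # Grid search over candidate (intercept, slope) pairs.
    return min(grid, key=lambda ab: directional_sse(ab[0], ab[1], gx, gy, pts))

pts = [(0.0, 1.0), (1.0, 0.0), (2.0, 3.0), (3.0, 2.0)]
grid = [(a / 20, b / 20) for a in range(-20, 21) for b in range(1, 41)]

fit_vertical = fit(0.0, 1.0, pts, grid)  # g = (0, 1): ordinary least squares
fit_diagonal = fit(1.0, 1.0, pts, grid)  # g = (1, 1): diagonal deviations
print(fit_vertical, fit_diagonal)        # different (a, b) for the same data
```

With g = (0, 1) the deviations are vertical and the search recovers the ordinary least-squares line; with g = (1, 1) the same data yield a different intercept and slope, mirroring the theorem's conclusion that estimates obtained under distinct directions do not coincide.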

A.2 Details of CNLS-d

An alternative expression for CNLS-d (cf. equations (16)-(16c) from Section 5.1) is given by: