1 Introduction
The focus of this paper is direction selection in stochastic directional distance functions (SDDF).^{1} While the DDF is typically used to measure efficiency, in this paper we use a nonparametric shape-constrained SDDF to model the conditional mean behavior of production. The stochastic distance function (SDF) was introduced by Lovell et al. (1994) and was used in a series of early empirical studies by Coelli and Perelman (1999, 2000) and Sickles et al. (2002). The parameters of a parametric distance function are point identified; however, if the direction in the DDF is not specified, then the parameters of a parametric DDF are only set identified.^{2} A set of axiomatic properties related to production and cost functions, such as monotonicity and convexity in the case of a cost function, is well established in the production literature (Shephard (1970), Chambers (1988)). Although the stochastic distance function literature acknowledges the axiomatic properties necessary for duality, it does not impose them globally. Instead, authors typically impose them only at a particular point in the data (e.g., Atkinson et al. (2003)). Recognizing these issues, we provide an axiomatic nonparametric estimator of the SDDF and a method to restrict the pool of directions to choose from for the SDDF, thereby reducing the size of the identified set.

Footnote 1: Here we use the term stochastic in reference to a model with a noise term.
Footnote 2: Let $\mathcal{M}$ be what is known (e.g., via assumptions and restrictions) about the data generating process (DGP). Let $\theta$ represent the parameters to be identified, let $\Theta$ denote all possible values of $\theta$, and let $\theta^0$ be the true but unknown value of $\theta$. Then the vector of unknown parameters $\theta$ is point identified if it is uniquely determined from $\mathcal{M}$. However, $\theta$ is set identified if some of the possible values of $\theta$ are observationally equivalent to $\theta^0$ (Lewbel (forthcoming)).
Most empirical studies that use establishment- or hospital-level data to estimate production or cost functions either assume a specific parametric form or ignore noise, or both (Hollingsworth, 2003). In contrast, we use an axiomatic nonparametric SDDF estimator and the proposed method to determine a set of acceptable directions to estimate a cost function that maintains global axiomatic properties for the US hospital industry. Furthermore, we demonstrate the importance of global axiomatic properties for the estimation of most productive scale size and marginal costs.
A few papers have attempted to implement the directional distance function in a stochastic setting (see, for example, Färe et al. (2005), Färe et al. (2010), and Färe and Vardanyan (2016)). The latter two papers discuss the challenges of selecting a parametric functional form that does not violate the axioms typically assumed in production economics. Based on their observations, Färe and Vardanyan (2016) use a quadratic functional specification.^{3} Yet several papers show a loss of flexibility in parametric functional forms, such as the translog or the quadratic functional form, when shape constraints are imposed (e.g., Diewert and Wales (1987)). Also important to implementation, the selection of the direction vector in the SDDF has been discussed in Färe et al. (2017) and Atkinson and Tsionas (2016), among others. These papers focus on selecting the direction corresponding to a particular interpretation of the inefficiency measure, based on the distance to the economically efficient point. In contrast, we consider Kuosmanen and Johnson (2017)'s multistep efficiency analysis and focus on the first step, estimating a conditional mean function. Our goal is to select the direction that best recovers the underlying technology while acknowledging that the data are likely to contain noise in potentially all variables.^{4}

Footnote 3: As Kuosmanen and Johnson (2017) note, the translog function used for multi-output production cannot satisfy the standard assumptions for the production technology globally for any parameter values. The quadratic functional form does not have this shortcoming.
Footnote 4: For researchers interested in productivity measurement and productivity variation (e.g., Syverson (2011)), the results from this paper can be used directly. For authors interested in efficiency analysis, the insights from this paper could be used to improve the estimates from the first stage of Kuosmanen and Johnson (2017)'s three-step procedure, where efficiency is estimated in the third step.
To model multiproduct production, Kuosmanen and Johnson (2017) have proposed the use of axiomatic nonparametric methods to estimate the SDDF, which they name Directional Convex Nonparametric Least Squares (CNLSd), a type of sieve estimator. Their method has the benefit of relaxing standard functional form assumptions for production, cost, or distance functions, while also improving interpretability and finite-sample efficiency relative to nonparametric methods such as kernel regression (Yagi et al. (2018)). A variety of models can be interpreted as special cases of Kuosmanen and Johnson (2017); among these are a set of models that specify the direction (e.g., Johnson and Kuosmanen (2011), Kuosmanen and Kortelainen (2012)). All CNLS models are sieve estimators and fall into the category of partially identified or set identified estimators discussed in Manski (2003) and Tamer (2010). The guidance our paper provides in selecting a direction will reduce the size of the identified set for CNLSd and other DDF estimators with flexible direction specifications.
Much of the production function literature concerns endogeneity issues; see, for example, Olley and Pakes (1996), Levinsohn and Petrin (2003), and Ackerberg et al. (2015). These methods are often referred to as proxy variable approaches. The argument for endogeneity is typically that decisions regarding variable inputs such as labor are made with some knowledge of the factors included in the unobserved residuals. Recently, these methods have been reinterpreted as instrumental variable approaches (Wooldridge (2009)) or control function approaches (Ackerberg et al. (2015)). Unfortunately, the assumptions on the particular timing of input decisions are not innocuous. Indeed, every firm must adjust its inputs in exactly the same way; otherwise, the moment restrictions needed for point identification are violated. For an alternative in the stochastic frontier setting, see Kutlu (2018).

Kuosmanen and Johnson (2017) have shown that a production function estimated using a stochastic distance function under a constant returns-to-scale assumption is robust to endogeneity issues because the normalization by one of the inputs or outputs causes the errors-in-variables to cancel each other. In this paper we consider the more general case of a convex technology that does not necessarily satisfy constant returns-to-scale, and show that when errors across variables are highly correlated, a specific type of endogeneity, the SDDF improves estimation performance significantly over the typical alternative of ignoring the endogeneity.
When considering alternative directions in the DDF, we show that the direction that performs best is often related to the particular performance measure used. To address this issue, we use an out-of-sample mean squared error (MSE) that is measured radially. This measure is motivated by the results of our Monte Carlo simulations and is natural for a function that satisfies monotonicity and convexity, assuring that the true function and the estimated function are close in the areas where most data are observed.
We analyze US hospital data and characterize the most productive scale size and marginal costs for the US hospital sector. We demonstrate that out-of-sample MSE is reduced significantly by relaxing parametric functional form restrictions. We also observe the advantage of imposing axioms that keep the estimated function interpretable. Concerning direction selection, we find, for this data set, that the exact direction selected is not critical in terms of MSE performance, but some commonly used directions should be avoided.
The remainder of this paper is organized as follows. Section 2 introduces the statistical model and the production model. Section 3 describes the estimators used for the analysis. Section 4 outlines our reasons for the MSE measure we propose. Section 5 highlights the importance of the direction selection through Monte Carlo experiments. Section 6 describes our direction selection method. Section 7 demonstrates the benefits of using nonparametric shape-constrained estimators with an appropriately selected direction for US hospital data. Section 8 concludes.
2 Models
2.1 Statistical Model
We consider a statistical model that allows for measurement error in potentially all of the input and output variables. Let $x_i$ be a vector of random input variables of length $m$ and $y_i$ be a vector of random output variables of length $s$, where $i = 1, \dots, n$ indexes observations. Let $\varepsilon^x_i$ be a vector of random error variables of length $m$ and $\varepsilon^y_i$ be a vector of random error variables of length $s$. One way of modeling the errors-in-variables (EIV) is:

(1) $x_i = \tilde{x}_i + \varepsilon^x_i$, $\quad y_i = \tilde{y}_i + \varepsilon^y_i$,

where $\tilde{x}_i$ and $\tilde{y}_i$ denote the noiseless input and output vectors.
The parameters of Equation (1) are identified only when multiple measurements exist for the same vector of regressors or when a subsample of observations exists in which the regressors are measured exactly (Carroll et al. (2006)). Carroll et al. (2006) discuss a standard regression setting, not a multi-input/multi-output production process; thus, repeated measurement requires all but one of the netputs to be identical across at least two observations.^{5} Neither of these conditions is likely to hold for typical production data sets; therefore, we develop an alternative approach to identification.

Footnote 5: Here we use the term netputs to describe the union of the input and output vectors.
As our starting point, we use the alternative, but equivalent, representation of the EIV model proposed by Kuosmanen and Johnson (2017), in which the noise in all netputs is expressed as a scalar deviation along a direction:

(2) $x_i = \tilde{x}_i + \delta_i g^x_i$, $\quad y_i = \tilde{y}_i + \delta_i g^y_i$.
Clearly, the representations of Carroll et al. (2006) and Kuosmanen and Johnson (2017) are equivalent if:

(3) $\varepsilon^x_i = \delta_i g^x_i$ and $\varepsilon^y_i = \delta_i g^y_i$.
We define the following normalization:

(4) $\|(g^x_i, g^y_i)\| = 1$,

which implies:

(5) $|\delta_i| = \|(\varepsilon^x_i, \varepsilon^y_i)\|$.
We refer to $(g^x_i, g^y_i)$ as the true noise direction, and in the most general case we allow the direction to be observation specific.^{6} The estimation methods that consider noise in potentially all inputs will depend on our assumptions about the production technology, which are discussed in the following subsection.

Footnote 6: When the noise direction is observation specific and random, all inputs and outputs potentially contain noise and therefore are endogenous variables. If some components of the direction vector are zero, the associated variables are exogenous and measured with certainty. See Kuosmanen and Johnson (2017) for more details.
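To make the normalization concrete, the following sketch decomposes a hypothetical noise realization into a scalar length and a unit direction. The numbers and names (`eps`, `delta`, `g`) are illustrative, and the unit-Euclidean-length convention is an assumption of this sketch, not a result from the paper:

```python
import numpy as np

# Hypothetical noise realizations for one observation's netputs
# (two inputs and one output); values are purely illustrative.
eps = np.array([0.3, -0.1, 0.2])

# Normalization: extract the scalar noise length and the unit direction.
delta = np.linalg.norm(eps)   # scalar length of the noise
g = eps / delta               # unit direction, ||g|| = 1

# Equivalence: the component-wise errors are recovered as delta * g.
assert np.allclose(delta * g, eps)
assert np.isclose(np.linalg.norm(g), 1.0)
```

When the direction is observation specific, this decomposition is simply applied row by row to each observation's stacked error vector.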
2.2 Production Model
Researchers use production function models, cost function models, or distance function models to characterize production technologies. Considering a general production process with multiple inputs used to produce multiple outputs, we define the production possibility set as:

(6) $T = \{(x, y) \in \mathbb{R}^m_+ \times \mathbb{R}^s_+ : x \text{ can produce } y\}$.
Following Shephard (1970), we adopt the following standard assumptions to assure that $T$ represents a production technology:

(a) $T$ is closed;

(b) $T$ is convex;

(c) free disposability of inputs and outputs; i.e., if $(x, y) \in T$, $x' \geq x$, and $y' \leq y$, then $(x', y') \in T$.
For an alternative representation, see, for example, Frisch (1964).
Developing methods to estimate characteristics of the production technology while imposing these standard axioms was a popular and fruitful topic from the early 1950s until the early 1980s, generating such classic papers as Koopmans (1951), Shephard (1953, 1970), Afriat (1972), Charnes et al. (1978),^{7} and Varian (1984). Unfortunately, these methods are deterministic in the sense that they rely on the strong assumption that the data do not contain any measurement errors, omitted variables, or other sources of random noise. Furthermore, for some research communities linear programs were seen as harder to implement than parametric regressions, which could be calculated via the normal equations. Thus, most econometricians and applied economists have chosen to use parametric models, sacrificing flexibility for ease of estimation and the inclusion of noise in the model.

Footnote 7: Data Envelopment Analysis is perhaps one of the largest success stories and has become an extremely popular method in the OR toolbox for studying efficiency.
Here we focus our attention on the distance function because it allows the joint production of multiple outputs using multiple inputs. The production function and the cost function can be seen as special cases of the distance function in which there is either a single output or a single input (cost), respectively. Further, motivated by our discussion of EIV models above, we consider a directional distance function, which allows for measurement error in potentially all variables. We try to relax both the parametric and deterministic assumptions common in earlier approaches to modeling multi-output/multi-input technologies. We do this by building on an emerging literature that revisits the axiomatic nonparametric approach while incorporating standard statistical structures, including noise (Kuosmanen (2008); Kuosmanen and Johnson (2010)).
2.2.1 The Deterministic Directional Distance Function (DDF)
Luenberger (1992) and Chambers et al. (1996, 1998) introduced the directional distance function, defined for a technology $T$ as:

(7) $\vec{D}_T(x, y; g^x, g^y) = \sup\{\beta \in \mathbb{R} : (x - \beta g^x, y + \beta g^y) \in T\}$,
where $x$ and $y$ are the observed input and output vectors, such that $x$ and $y$ are assumed to be observed without noise and fully describe the resources used in production and the goods or services generated from production. Here $g^x$ is the direction vector in the input space, $g^y$ is the direction vector in the output space, and $(-g^x, g^y)$ defines the direction from the point $(x, y)$ in which the distance function is measured.^{8} The value $\vec{D}_T(x, y; g^x, g^y)$ is commonly interpreted as a measure of inefficiency, quantifying the number of bundles of size $(g^x, g^y)$ needed to move the observed point to the boundary of the technology in a deterministic setting.

Footnote 8: We assume $(g^x, g^y) \neq (0, 0)$; i.e., at least one of the components of either $g^x$ or $g^y$ is nonzero.
Chambers et al. (1998) explained how the directional distance function characterizes the technology $T$ for a given direction vector $(g^x, g^y)$; specifically:

(8) $\vec{D}_T(x, y; g^x, g^y) \geq 0 \iff (x, y) \in T$.
If $T$ satisfies the assumptions stated in Section 2.2, then the directional distance function has the following properties (see Chambers et al. (1998)):

(a) $\vec{D}_T(x, y; g^x, g^y)$ is upper semicontinuous in $x$ and $y$ (jointly);

(b) $\vec{D}_T(x, y; \lambda g^x, \lambda g^y) = \lambda^{-1} \vec{D}_T(x, y; g^x, g^y)$ for $\lambda > 0$;

(c) $y' \geq y$ implies $\vec{D}_T(x, y'; g^x, g^y) \leq \vec{D}_T(x, y; g^x, g^y)$;

(d) $x' \geq x$ implies $\vec{D}_T(x', y; g^x, g^y) \geq \vec{D}_T(x, y; g^x, g^y)$;

(e) if $T$ is convex, then $\vec{D}_T(x, y; g^x, g^y)$ is concave in $(x, y)$.
An additional property of the DDF is translation invariance:

(f) $\vec{D}_T(x - \alpha g^x, y + \alpha g^y; g^x, g^y) = \vec{D}_T(x, y; g^x, g^y) - \alpha$.
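These properties can be checked numerically on a toy technology. The sketch below assumes $T = \{(x, y) : 0 \leq x, \; y \leq \sqrt{x}\}$, a closed, convex, freely disposable single-input single-output set chosen purely for illustration, and evaluates the DDF by bisection. It then verifies the sign characterization in (8) and the translation property (f):

```python
import numpy as np

def in_T(x, y):
    # Toy convex technology: T = {(x, y): x >= 0, y <= sqrt(x)} (an assumption).
    return x >= 0 and y <= np.sqrt(x)

def ddf(x, y, gx, gy, lo=-10.0, hi=10.0):
    # Directional distance: sup{beta : (x - beta*gx, y + beta*gy) in T}.
    # Feasibility in beta is an interval, so bisection applies.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if in_T(x - mid * gx, y + mid * gy):
            lo = mid
        else:
            hi = mid
    return lo

gx, gy = 1.0, 1.0
d = ddf(4.0, 1.0, gx, gy)

# Characterization (8): D >= 0 iff the point belongs to T.
assert d > 0                        # (4, 1) is inside T
assert ddf(1.0, 2.0, gx, gy) < 0    # (1, 2) is outside T

# Translation property (f): moving alpha along (-gx, gy) lowers D by alpha.
alpha = 0.5
assert abs(ddf(4.0 - alpha * gx, 1.0 + alpha * gy, gx, gy) - (d - alpha)) < 1e-6
```

For this technology the distance at (4, 1) with direction (1, 1) solves $1 + \beta = \sqrt{4 - \beta}$, i.e., $\beta = (\sqrt{21} - 3)/2 \approx 0.79$, which the bisection reproduces.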
Several theoretical contributions have been made to extend the deterministic DDF, see for example Färe and Grosskopf (2010), Aparicio et al. (2017), Kapelko and Oude Lansink (2017), and Roshdi et al. (2018). The deterministic DDF has been used in several recent applications, including Baležentis and De Witte (2015), Adler and Volta (2016), and Fukuyama and Matousek (2018).
2.2.2 The Stochastic Directional Distance Function
The properties of the deterministic DDF also apply to the stochastic DDF (Färe et al. (2017)). Here we focus on estimating a stochastic DDF with a residual that is mean zero.^{9}

Footnote 9: Two models are possible: 1) a mean zero residual, indicating that the residual contains only noise, used to pursue a productivity analysis; or 2) a composed residual containing both inefficiency and noise. Our direction selection analysis is used in the first step of Kuosmanen and Johnson's three-step procedure, in which a conditional mean is estimated. This is represented in Figure 1.
Using the statistical model in Section 2.1 and the functional representation of technology in Section 2.2, we restate Proposition 2 in Kuosmanen and Johnson (2017) as:
Proposition 1.
If the observed data are generated according to the statistical model described in Section 2.1, then the value of the DDF at the observed data point $(x_i, y_i)$ is equal to the realization of a random variable $\varepsilon_i$ with mean zero.

In the stochastic distance function literature, the translation property, (f) above, is commonly invoked to move an arbitrarily chosen netput variable out of the distance function to the left-hand side of the equation, yielding an equation that looks like a standard regression model; see, for example, Lovell et al. (1994) and Kuosmanen and Johnson (2017). Instead, we write the SDDF with all of the outputs on one side to emphasize that all netputs are treated symmetrically.
Under the assumption of constant returns to scale, normalizing by one of the netputs causes the noise terms to cancel for the regressors, thus eliminating the issue of endogeneity (e.g., Coelli (2000), Kuosmanen and Johnson (2017)). However, since we relax the constant returns-to-scale assumption, endogeneity can still be an issue.^{10}

Footnote 10: If the endogeneity is caused by correlations in the errors across variables, it can be addressed by selecting an appropriate direction for the directional distance function. This is the setting we explore in the Monte Carlo simulation below in Section 4.1.
3 Estimation
We now describe the estimation of the DDF under a specific parametric functional form and under nonparametric shape constrained methods.
3.1 Parametric Estimation and the DDF
Consider data composed of $n$ observations where the inputs are defined by $x_i$ and the outputs by $y_i$. The estimator minimizes the squared residuals for a DDF with an arbitrary prespecified direction $(g^x, g^y)$. For a linear production function, we formulate the estimator as:

(9) $\min_{\alpha, \beta, \gamma, \varepsilon} \sum_{i=1}^{n} \varepsilon_i^2$

subject to

(9a) $\gamma' y_i = \alpha + \beta' x_i + \varepsilon_i, \quad i = 1, \dots, n$,

(9b) $\beta' g^x + \gamma' g^y = 1$,

where $\alpha$ is the intercept, $\beta$ and $\gamma$ are the vectors of the marginal effects of the inputs and outputs, respectively, and the $\varepsilon_i$ are the residuals.
Equation (9b) enforces the translation property described in Chambers et al. (1998); i.e., translating the netput vector by $\alpha$ in the direction $(-g^x, g^y)$ causes the distance function to decrease by $\alpha$. The combination of Equation (9a) and Equation (9b) ensures that the residual is computed along the direction $(-g^x, g^y)$. Intuitively, this is because $\beta$ and $\gamma$ are rescaled proportionally to the direction in Equation (9b). For a formal proof, see Kuosmanen and Johnson (2017), Proposition 2.
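As a sketch of this constrained least-squares problem, the code below fits a single-input, single-output linear DDF with SciPy's SLSQP solver. The data, direction, and true technology ($y = 2 + 0.5x$, noiseless) are assumptions made for illustration; with no noise, the fit is exact and the implied slope $\beta/\gamma$ recovers 0.5:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical noiseless data from a linear technology y = 2 + 0.5 x.
x = rng.uniform(1.0, 10.0, 30)
y = 2.0 + 0.5 * x

gx, gy = 1.0, 1.0  # pre-specified direction (an assumed choice)

def ssr(p):
    a, b, c = p                      # intercept, input slope, output weight
    resid = c * y - a - b * x        # residuals from constraint (9a)
    return np.sum(resid ** 2)

# Translation constraint (9b): beta * gx + gamma * gy = 1.
cons = {"type": "eq", "fun": lambda p: p[1] * gx + p[2] * gy - 1.0}
res = minimize(ssr, x0=[0.0, 0.5, 0.5], constraints=[cons],
               method="SLSQP", options={"ftol": 1e-12, "maxiter": 500})

a, b, c = res.x
assert abs(b / c - 0.5) < 1e-3   # implied production function slope
assert abs(a / c - 2.0) < 1e-2   # implied intercept
assert res.fun < 1e-6            # exact fit: all residuals are ~0
```

Because the objective is quadratic and the constraint linear, any quadratic programming solver could be used in place of SLSQP.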
3.2 The CNLSd Estimator
Convex Nonparametric Least Squares (CNLS) is a nonparametric estimator that imposes axiomatic properties, such as monotonicity and concavity, on the production technology. The estimator CNLSd is the directional distance function generalization of CNLS (Hildreth (1954), Kuosmanen (2008)). While CNLS allows for just a single output, CNLSd permits multiple outputs. In CNLS the direction along which residuals are computed is specified a priori and is typically measured in terms of the single output, $y$. This corresponds to the assumption that noise is present only in $y$ and that all other variables, $x$, do not contain noise. CNLSd allows the residual to be measured in an arbitrary prespecified direction. If all components of the direction vector are nonzero, this corresponds to an assumption that noise is present in all variables.
Using the same input-output data defined in Section 2.1, the CNLSd estimator is given by:

(10) $\min_{\alpha, \beta, \gamma, \varepsilon} \sum_{i=1}^{n} \varepsilon_i^2$

subject to

(10a) $\gamma_i' y_i = \alpha_i + \beta_i' x_i + \varepsilon_i, \quad \forall i$,

(10b) $\alpha_i + \beta_i' x_i - \gamma_i' y_i \leq \alpha_h + \beta_h' x_i - \gamma_h' y_i, \quad \forall h, i$,

(10c) $\beta_i \geq 0, \quad \forall i$,

(10d) $\beta_i' g^x + \gamma_i' g^y = 1, \quad \forall i$,

(10e) $\gamma_i \geq 0, \quad \forall i$,

where $\alpha$ is the vector of the intercept terms, $\beta$ and $\gamma$ are the matrices of the marginal effects of the inputs and the outputs, respectively, and $\varepsilon$ is the vector of the residuals (Kuosmanen and Johnson, 2017).
Equation (10a) is similar to (9a), with the notable difference that the coefficients are indexed by $i$, indicating that each observation has its own hyperplane defined by the triplet $(\alpha_i, \beta_i, \gamma_i)$. Equation (10b), which corresponds to the Afriat inequalities, imposes concavity. Given Equation (10b), Equation (10c) imposes the monotonicity of the estimated frontier relative to the inputs. Equation (10d) enforces the translation property described in Chambers et al. (1998) and has the same interpretation as Equation (9b). Similar to Equation (10c), the combination of Equation (10b) and Equation (10e) imposes the monotonicity of the DDF relative to the outputs. In Equation (10), we specify the CNLSd estimator with a single common direction, $(g^x, g^y)$.^{11}

Footnote 11: Alternatively, some researchers may be interested in using observation-specific directions or perhaps group-specific directions (Daraio and Simar (2016)). In Appendix A.3, we derive the conditions under which multiple directions can be used in CNLSd while still maintaining the axiomatic property of global convexity of the production technology. Consider two groups, each with its own direction used in the directional distance function. Essentially, the convexity constraint holds as long as the noise is orthogonal to the difference of the two directions used in the estimation. A simple example of this situation is all the noise lying in one dimension while the difference between the two directions in that dimension is zero. However, this condition is restrictive when noise is potentially present in all variables. Thus, specifying multiple directions in CNLSd while maintaining the axiomatic properties of the estimator, specifically the convexity of the production possibility set, remains an open research question.

4 Measuring MSE under Alternative Directions
4.1 Illustrative Example
Data Generation Process
For our illustrative example, we use a simple linear cost function and a directional distance linear parametric estimator. We consider two noise generation processes: a random noise direction and a fixed noise direction. Here we discuss the random noise direction case and direct the reader to Appendix B for a discussion of the fixed noise direction case.
For our example we consider a single output cost function where the observations $(y_i, C_i)$, $i = 1, \dots, n$, are created by the Data Generation Process (DGP) outlined in Algorithm 1.

Algorithm 1 (Linear Function Data Generation Process):

1. Cost is calculated as , where .

2. The noise terms, , are constructed as follows:

(a) is calculated as:

(11)

where and are the means of the output and cost without noise, respectively.

(b) The scalar length of the noise is rescaled by a vector in each dimension; these scaling factors are calculated as , where the are drawn from a continuous uniform distribution.

(c) , where the scalar length is drawn from a normal distribution whose standard deviation takes a prespecified initial value, and is a normalized direction vector.

3. The observations with noise are obtained by appending the noise terms to the generated data:

(12)
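A minimal sketch of a DGP in this spirit is given below. The true cost function, sample size, noise scale, and distribution ranges are all assumed values chosen for illustration, since the paper's exact constants are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 100, 0.2   # sample size and noise scale: assumed values

# Assumed true linear cost function: C = 1 + 0.8 * y (illustrative only).
y_out = rng.uniform(1.0, 10.0, n)
cost = 1.0 + 0.8 * y_out

# Random noise direction per observation: a unit vector at angle phi,
# scaled by a N(0, sigma^2) length draw.
phi = rng.uniform(0.0, 2.0 * np.pi, n)
length = rng.normal(0.0, sigma, n)
noise = length[:, None] * np.column_stack([np.cos(phi), np.sin(phi)])

# Observed (y, C) pairs: noise appended to the noiseless data.
data = np.column_stack([y_out, cost]) + noise
```

Fixing the angle `phi` to a single constant for all observations yields the fixed noise direction variant discussed in Appendix B.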
Figure 3 illustrates the results for two cases of the data generating process; in the first case the direction of the noise is random, while in the second case the direction of the noise is fixed.
Evaluating the Parametric Estimator’s Performance
We use two criteria to assess the performance of the parametric estimator: 1) the mean squared error (MSE) comparing the true function to the estimated function, and 2) the MSE comparing the estimated function to a testing data set. While we can calculate both metrics in our Monte Carlo simulations, only the second metric can be used with our application data below.
To calculate deviations, we use the MSE direction $g^{MSE}$. For any particular point $z_i$ of the testing set, we determine the estimate $\hat{z}_i$, defined as the intersection of the estimated function characterized by the estimated coefficients and the line passing through $z_i$ with direction vector $g^{MSE}$. We evaluate the MSE as:

(13) $MSE = \frac{1}{n} \sum_{i=1}^{n} \|\hat{z}_i - z_i\|^2$.
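A small numerical sketch of this directional MSE computation, assuming an estimated linear cost function and a handful of illustrative test points (the coefficients, points, and 60-degree angle are all hypothetical):

```python
import numpy as np

# Hypothetical estimated cost line C = alpha + beta * y and testing set.
alpha, beta = 1.0, 0.8
test = np.array([[2.0, 3.1], [5.0, 4.6], [8.0, 7.9]])   # (y, C) pairs

theta = np.deg2rad(60.0)
g = np.array([np.cos(theta), np.sin(theta)])            # unit MSE direction

# Intersection of the line z + t * g with the estimated function:
# C + t*g[1] = alpha + beta*(y + t*g[0])  =>  solve for t.
t = (alpha + beta * test[:, 0] - test[:, 1]) / (g[1] - beta * g[0])
mse = np.mean(t ** 2)   # squared directional distances; ||g|| = 1

# Sanity check: each projected point lies on the estimated line.
proj = test + t[:, None] * g
assert np.allclose(proj[:, 1], alpha + beta * proj[:, 0])
```

The denominator `g[1] - beta * g[0]` is nonzero whenever the chosen direction is not parallel to the estimated line; a direction parallel to the line would leave the intersection undefined.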
To compare the true function to the estimated function, we use the Linear Function Data Generation Process, Algorithm 1, steps 1 and 2, to construct our testing data set. To evaluate the estimated function without knowing the true function, the testing set is built using the full Linear Function Data Generation Process. Figure 4 shows the MSE computations.
Additional Information Describing the Simulations
We apply the DGP described above to generate a training set and a testing set in which noise is introduced to the observations in random directions. We set the noise scaling coefficient to and the number of observations to . We run repetitions of the simulation for each experiment on a computer with an Intel Core i7 860 2.80 GHz processor and 8 GB RAM. We use the quadratic solver in MATLAB 2017a.
For the estimator, we define the direction vector used in the parametric DDF as a function of an angular variable $\theta$, which allows us to investigate alternative directions. Specifically, the direction vector used in the DDF is the unit vector at angle $\theta$ in the $(y, C)$ plane. We examine five directions corresponding to different angles.
Results: Random Noise Directions
Table 1 and Table 2 show results corresponding to the two performance criteria introduced above and shown in Figure 4: the MSE relative to the true function and the MSE relative to a testing data set, respectively. Table 1 shows that a single DDF angle produces the smallest MSE values (shown in bold in the table) regardless of the direction used for the MSE computation, while the estimator's quality diminishes at the extreme angles. When performance is reported via a testing set, as in Table 2, the direction corresponding to the smallest MSE value (shown in bold) is always the one matching the direction used in the MSE computation. In applications, using a testing set is necessary because the true function is unknown. Table 2 shows that the benefits of matching the MSE evaluation direction outweigh the benefits of selecting a direction based on the properties of the function being estimated.
Table 1. Avg MSE: comparison to the true function (rows: MSE direction angle; columns: DDF angle)

2.09  0.75  0.56  1.16  3.68
1.36  0.46  0.32  0.63  1.89
1.25  0.41  0.28  0.51  1.48
1.59  0.50  0.32  0.57  1.60
3.06  0.91  0.55  0.92  2.44

Note: Displayed are measured values multiplied by .
Table 2. Avg MSE: comparison to out-of-sample data (rows: MSE direction angle; columns: DDF angle)

28.28  29.43  31.29  34.23  40.67
18.03  17.79  18.19  19.09  21.32
16.38  15.55  15.45  15.77  16.90
20.50  18.67  18.04  17.90  18.46
38.63  33.07  30.68  29.29  28.70

Note: Displayed are measured values multiplied by .
For the out-of-sample testing set, the direction that provides the smallest MSE value is the direction used for the MSE computation. Because the functional estimate is optimized for the direction specified in the SDDF, it is perhaps expected that using the same direction in the MSE evaluation produces a relatively low MSE compared to other directions. However, when the functional estimate is compared to the true function, the MSE values are around ten times smaller than in the out-of-sample testing case: in out-of-sample testing, the presence of noise in the observations causes a deviation regardless of the quality of the estimator or the number of observations. The DDF direction corresponding to the smallest MSE is the direction orthogonal to the true function (i.e., for our DGP). This direction provides the shortest distance from the observations to the true function. We conclude that, in this experiment, it is preferable to select a direction orthogonal to the true function (see Section 5 for further experiments).
From the fixed noise direction experiments (see Appendix B.1), we observe that using a direction for the estimator that matches the direction used for the noise generation significantly reduces the MSE values relative to the true function. From this, we infer that when endogeneity is severe, using a direction that matches the characteristics of this endogeneity significantly improves the fit of the estimator; i.e., the MSE is smaller for the matching direction than for the second best direction in of the cases (see Section 5 for details).
Finally, we need to solve the problem of evaluating alternative directions when the true function is unknown so that we can evaluate alternative directions in the application data. Below, we describe our proposed alternative measure of fit.
4.2 Radial MSE Measure
MSE is typically measured as the average sum of squared errors in the dimension of a single variable, such as cost or output. As explained in Section 4.1, when we compare out-of-sample performance, we find that the best direction to use in estimating an SDDF is the direction used for MSE evaluation, regardless of the direction of noise in the DGP or any other characteristics of the DGP. To avoid this dependence between the direction of estimation and the direction of evaluation, we propose a radial MSE measure.
We begin by normalizing the data to a unit cube. Consider a case of $s$ outputs and $n$ observations, where the original observations are $y_i = (y_{1i}, \dots, y_{si})$, $i = 1, \dots, n$. For each output dimension $j$, define:

(14) $\underline{y}_j = \min_h y_{jh}$ and $\overline{y}_j = \max_h y_{jh}$.

The normalized observations are:

(15) $\check{y}_{ji} = \dfrac{y_{ji} - \underline{y}_j}{\overline{y}_j - \underline{y}_j}$.

Our radial MSE measure is the distance from each testing set observation to the estimated function measured along a ray from the observation to the center $c$. Having normalized the data, the center for the radial measure is $c = (0.5, \dots, 0.5)$. The radial MSE measure is the average of the squared distances from each testing set observation to the estimated function measured radially. Figure 5 illustrates this measure. For a convex function, a radial measure reduces the bias in the measure for extreme values in the domain.
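A minimal sketch of the radial measure, assuming already-normalized two-output data, a linear estimated isoquant, and the unit-square center (0.5, 0.5) as the reference point; the isoquant coefficients and test points are illustrative:

```python
import numpy as np

# Assumed estimated linear isoquant y2 = a + b * y1 in normalized space.
a, b = 1.1, -1.0
c = np.array([0.5, 0.5])                     # center of the unit square
test = np.array([[0.9, 0.3], [0.2, 0.7]])    # normalized test observations

def radial_dist(z):
    # Move from z toward the center c until hitting the estimated line:
    # z[1] + t*(c[1]-z[1]) = a + b*(z[0] + t*(c[0]-z[0])); solve for t.
    d = c - z
    t = (a + b * z[0] - z[1]) / (d[1] - b * d[0])
    return abs(t) * np.linalg.norm(d)        # distance traveled along the ray

radial_mse = np.mean([radial_dist(z) ** 2 for z in test])
assert radial_mse > 0
```

For the first test point the ray hits the line halfway to the center, so the radial distance is $0.5\sqrt{0.2} \approx 0.224$; because every observation is measured toward the same center, the measure does not privilege any single coordinate direction.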
5 Monte Carlo Simulations
We next examine how different DGPs affect the optimal direction for the DDF estimator through a set of Monte Carlo simulations. We consider both random noise directions for each observation and a fixed noise direction representing a high-endogeneity case. We also consider the effects of different variance levels for the noise and of changes in the underlying distribution of the production data. Using the simplest case of two outputs and a fixed cost level for all observed units allows us to separate the effects of the data from those of the function.
5.1 CNLSd Formulation for Cost Isoquant Estimation
Before describing our experiments, we first outline the CNLSd formulation for estimating the isocost level set. It is based on the following optimization problem:
(16)  
(16a)  
(16b)  
(16c) 
Note that all observations, $i = 1, \dots, n$, have a common cost level. This allows us to focus on a two-dimensional estimation problem. For results related to three-dimensional estimation problems, see Appendix B.2, Experiment 6.
We can recover the fitted values and the coefficients using:
(17)  
(18) 
5.2 Experiments
We conduct several experiments to investigate the optimal direction for the DDF estimator. The results of four experiments are shown in the main text of the paper, with two additional experiments described in the appendix.
Experiment 1  Base case: A two output circular isoquant with uniformly distributed angle parameters and random noise direction
For the base case, we consider a fixed cost level and approximate a two output isoquant. Indexing the outputs by $j = 1, 2$ and observations by $i = 1, \dots, n$, we generate the output variables as:

(19) $y_{ji} = \tilde{y}_{ji} + v_{ji}$,

where $\tilde{y}_{ji}$ is the observation on the isoquant and $v_{ji}$ is the noise. We generate the output levels as:

(20) $\tilde{y}_{1i} = r \cos\theta_i$,

(21) $\tilde{y}_{2i} = r \sin\theta_i$,

where the angle $\theta_i$ is drawn randomly from a continuous uniform distribution. The noise terms, $v_{ji}$, have the following expressions:

(22) $v_{1i} = \ell_i \cos\varphi_i$,

(23) $v_{2i} = \ell_i \sin\varphi_i$,

where the length $\ell_i$ is drawn from a normal distribution, and the angle $\varphi_i$, which is observation specific and characterizes the noise direction for each observation, is drawn from a continuous uniform distribution. The values considered for the directions in the CNLSd estimator are . The standard deviation of the normal distribution is . We perform the experiment times for each parameter setting.
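A sketch of this DGP, with the radius, sample size, noise scale, and distribution ranges chosen for illustration (the paper's exact values are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, r = 100, 0.1, 1.0   # assumed sample size, noise scale, radius

# Noiseless points on a quarter-circle isoquant y1^2 + y2^2 = r^2.
theta = rng.uniform(0.0, np.pi / 2.0, n)
base = r * np.column_stack([np.cos(theta), np.sin(theta)])

# Observation-specific noise direction phi with N(0, sigma^2) length.
phi = rng.uniform(0.0, 2.0 * np.pi, n)
length = rng.normal(0.0, sigma, n)
obs = base + length[:, None] * np.column_stack([np.cos(phi), np.sin(phi)])

# The noiseless points lie exactly on the isoquant.
assert np.allclose(np.linalg.norm(base, axis=1), r)
```

Fixing `phi` to one constant value reproduces the fixed noise direction design of Experiment 2.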
Table 3 reports the radial MSE values from a testing set of observations lying on the true function.
CNLSd Direction Angle  
Average MSE across simulations  13.90  4.65  3.32  4.49  13.93 
Note: Displayed are measured values multiplied by . 
As shown in Table 3, the angle corresponding to the smallest MSE (shown in bold) is the one that gives a direction orthogonal to the center of the true function, and the MSE values differ significantly, increasing at similar rates as the direction angle deviates from this orthogonal direction in either direction.
Experiment 2  The base case with fixed noise directions
In this experiment, the angle that characterizes the noise direction is held constant across all observations. The values used for this angle and for the directions in the CNLSd estimator are the same. The standard deviation of the normal distribution is again . We perform the experiment times for each parameter setting. Table 4 reports the results.
Each row in Table 4 corresponds to a different noise direction in the DGP. The bold numbers identify the directions in the CNLSd estimator that obtain the smallest MSE for each noise direction. We confirm our previous insight from the parametric estimator and fixed noise direction case described in B.1: the bold values appearing on the diagonal (from the upper-left to the lower-right of Table 4) correspond to the directions used in CNLSd. This result indicates that selecting a direction in the SDDF that matches the underlying noise direction in the DGP results in improved functional estimates.
CNLSd Direction Angle  
Noise Direction Angle  
2.69  3.03  4.49  8.86  25.47  
7.49  3.44  4.00  8.07  28.83  
20.28  5.79  4.30  5.80  19.06  
25.58  7.80  4.18  3.51  6.84  
25.90  9.09  4.73  3.10  2.57  
Note: Displayed are measured values multiplied by . 
Experiment 3. Base case with fixed noise direction and different noise levels
In Experiment 3, we vary the noise term by changing the coefficient. Table 5 reports the results for .
CNLSd Direction Angle  
Noise Direction Angle  
0.92  0.82  0.96  1.53  5.12  
1.83  1.09  1.09  1.47  5.45  
3.70  1.41  1.29  1.43  3.93  
5.75  1.68  1.27  1.18  1.86  
4.61  1.40  0.95  0.79  0.90  
Note: Displayed are measured values multiplied by . 
In Table 5 (Experiment 3, with ), we do not observe the same diagonal pattern observed in Experiment 2, and the best direction for the CNLSd estimator does not match the direction selected for the noise. This leads us to hypothesize that when the noise level is small, data characteristics, such as the distribution of the regressors or the shape of the function, affect the estimation, whereas when the noise level is large, the regressors’ relative variability becomes a more dominant factor in determining the best direction for the CNLSd estimator.
Experiment 4: Base case with different distributions for the initial observations on the true function
In Experiment 4, we seek to understand how changing the DGP for the angle, , affects the optimal direction. We consider three normal distributions with different parameters: , and . We truncate the tails of the distribution so that the generated angles fall in the range . Noise is specified as in Experiment 1. Table 6 reports the results of this experiment.
Mean of the  CNLSd Direction angle  

Normal Distribution ()  
3.19  2.21  3.89  10.28  46.47  
8.44  2.92  1.98  3.17  9.00  
45.64  10.25  4.02  2.43  3.07  
Note: Displayed are measured values multiplied by . 
In Table 6, we observe that selecting a direction in the SDDF to match , the mean of the distribution for the angle variable used in the DGP, corresponds to the smallest MSE value. This result suggests that the estimator’s performance improves when we select a direction that points to the “center” of the data.
B.2 presents additional experiments, varying the distribution of the observations and considering three outputs with a fixed cost level. These experiments lend further support to the strategy of selecting a direction pointed to the “center” of the data.
6 Proposed Approach to Direction Selection
Based on Monte Carlo simulations, we found that the optimal direction depends on the shape of the function and the distribution of the observed data. This in itself is not surprising. However, assuming a unimodal distribution for the data generation process, a direction that aims towards the “center” of the data and is perpendicular to the true function at that point tends to outperform other directions. To apply this finding to a data set with outputs and observations, , we suggest selecting the direction for the DDF as follows:

Normalize the data:
(24) (25) 
Select the direction:
(26)
This provides a method for direction selection that can be used in applications when the true direction is unknown.^{12}^{12}12A cost function is convex with respect to the point . Therefore, to have a ray that points from the point to the median of the data, the directional vector is needed. We test the proposed method by estimating a cost function for a US hospital data set.
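The two-step procedure above can be sketched as follows. This is a minimal sketch: the exact normalization used in the paper is not reproduced in our copy, so scaling each variable by its sample standard deviation and normalizing the resulting component-wise median to unit length are assumptions made for illustration.

```python
import numpy as np

def median_direction(Y):
    """Direction pointed at the 'center' (component-wise median) of the data.

    Step 1 (assumed normalization): scale each variable by its sample
    standard deviation. Step 2: take the component-wise median of the
    normalized data and rescale it to unit length so it can serve as a
    direction vector for the DDF.
    """
    Z = Y / Y.std(axis=0, ddof=1)        # step 1: normalize the data
    med = np.median(Z, axis=0)           # step 2: component-wise median
    return med / np.linalg.norm(med)     # unit-length direction vector

# toy usage on skewed, strictly positive data (as production data often are)
rng = np.random.default_rng(1)
Y = rng.lognormal(size=(50, 4))
g = median_direction(Y)
```

The component-wise median (rather than the mean) keeps the direction from being pulled toward a few very large observations, which matters for heavily skewed data such as hospital outputs.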
7 Cost Function Estimation of the US Hospital Sector
We analyze the cost variation across US hospitals using a conditional mean estimate of the cost function. We estimate a multi-output cost function for the US hospital sector by implementing our data-driven method for selecting the direction vector for the DDF. We report most productive scale size and marginal cost estimates.
7.1 Description of the Data Set
We obtain cost data from the American Hospital Association’s (AHA) Annual Survey Databases from 2007 to 2009. The costs reported include payroll, employee benefits, depreciation, interest, supply expenses and other expenses. We estimate a cost function, which can be interpreted as a distance function with a single input when hospitals face the same input prices.^{13}^{13}13Unfortunately we do not observe input prices. We chose to estimate a cost function and make the assumption of common input prices rather than impose an arbitrary division of the cost. We obtain hospital output data from the Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample (NIS) core file, which captures data annually for all discharges from a 20% sample of US community hospitals. The hospital sample changes every year. For each patient discharged, all procedures received are recorded as International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes. The typical hospital in the US relies on these detailed codes to quantify the medical services it provides (Zuckerman et al. (1994)). We map the codes to four categories of procedures; specifically, the procedure categories are “Minor Diagnostic,” “Minor Therapeutic,” “Major Diagnostic,” and “Major Therapeutic,” which are standard output categories in the literature (Pope and Johnson (2013)). The number of procedures in each category is summed for each hospital by year to construct the output variables. The total number of hospitals sampled is around 1,000 per year from 2007 to 2009.^{14}^{14}14The NIS survey is a stratified systematic random sample. The strata criteria are urban or rural location, teaching status, ownership, and bed size. This stratification ensures a more representative sample of discharges than a simple random sample would yield. For details see https://www.hcup-us.ahrq.gov/tech_assist/sampledesign/508_compliance/508course.htm#{463754B8A30547E3B7EEA43953AA9478}.
However, mapping between the two databases is only possible for approximately 50% of the hospitals in the HCUP data, resulting in approximately 450 to 525 observations available each year.
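The construction of the output variables can be illustrated with a small sketch. The code-to-category map below is hypothetical; the actual ICD-9-CM groupings follow Pope and Johnson (2013) and are not reproduced here.

```python
from collections import Counter

# Hypothetical ICD-9-CM code-to-category map (illustrative only).
CATEGORY = {"87.03": "MinDiag", "93.94": "MinTher",
            "37.22": "MajDiag", "36.10": "MajTher"}

def hospital_outputs(procedure_codes):
    """Sum procedure counts into the four output categories for one
    hospital-year; `procedure_codes` lists the ICD-9-CM codes recorded
    across all discharges for that hospital in that year."""
    totals = Counter()
    for code in procedure_codes:
        cat = CATEGORY.get(code)
        if cat is not None:       # codes outside the map are ignored
            totals[cat] += 1
    return totals

counts = hospital_outputs(["87.03", "87.03", "36.10"])
```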
2007  

(523 observations)  
Cost ($)  MajDiag  MajTher  MinDiag  MinTher  
Mean  146M  162  4083  3499  7299 
Skewness  3.51  2.89  2.63  5.19  3.28 
25th percentile  24M  9  277  108  512 
50th percentile  72M  73  1688  938  3108 
75th percentile  182M  207  5443  4082  9628 
2008  
(511 observations)  
Cost ($)  MajDiag  MajTher  MinDiag  MinTher  
Mean  163M  175  4433  3688  7657 
Skewness  4.19  3.80  2.97  4.87  2.82 
25th percentile  28M  10  325  120  545 
50th percentile  83M  76  1809  1013  3350 
75th percentile  189M  246  5984  4569  10781 
2009  
(458 observations)  
Cost ($)  MajDiag  MajTher  MinDiag  MinTher  
Mean  175M  161  4471  3615  7905 
Skewness  3.39  3.78  2.43  4.68  2.41 
25th percentile  31M  12  420  148  713 
50th percentile  91M  69  1737  1136  3458 
75th percentile  220M  230  6402  4694  10989 
7.2 PreAnalysis of the Data Set
7.2.1 Testing the Relevance of the Regressors
We begin by testing the statistical significance of our four output variables, , for predicting cost. While the variables selected have been used in previous studies, we use these tests to evaluate whether this variable specification can be rejected for the current data set of US hospitals from 2007 to 2009.
The null hypothesis stated for the th output is: against: .^{15}^{15}15Where the notation implies the vector excluding the th component.
We implement the test with a Local Constant Least Squares (LCLS) estimator described in Henderson and Parmeter (2015), calculating bandwidths using least-squares cross-validation. We use 399 wild bootstrap replications. We found that all output variables were highly statistically significant for all years.
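The building blocks of this procedure can be sketched as follows: a local-constant (Nadaraya-Watson) estimator with a Gaussian product kernel, and a leave-one-out least-squares criterion for bandwidth selection. This is a simplified sketch, not the Henderson and Parmeter (2015) implementation; the wild-bootstrap significance step is omitted, and the candidate bandwidth grid is an assumption.

```python
import numpy as np

def nw_fit(X, y, x0, h):
    """Local-constant (Nadaraya-Watson) estimate at point x0 using a
    Gaussian product kernel; h is a vector of per-variable bandwidths."""
    u = (X - x0) / h
    w = np.exp(-0.5 * (u ** 2).sum(axis=1))
    return (w @ y) / w.sum()

def loo_cv_score(X, y, h):
    """Leave-one-out least-squares cross-validation criterion: predict
    each observation from all the others and average the squared errors."""
    n = len(y)
    errs = []
    for i in range(n):
        mask = np.arange(n) != i
        errs.append(y[i] - nw_fit(X[mask], y[mask], X[i], h))
    return float(np.mean(np.square(errs)))

# toy usage: choose the bandwidth minimizing the criterion over a small grid
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(60, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.05, size=60)
grid = [np.array([h]) for h in (0.05, 0.1, 0.2, 0.4)]
h_star = min(grid, key=lambda h: loo_cv_score(X, y, h))
```

In the relevance test, a regressor whose cross-validated bandwidth is driven to a very large value is effectively smoothed out, which is the intuition behind testing whether each output variable matters for predicting cost.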
7.3 Results
CNLSd and Different Directions
We analyze each year of data as a separate cross-section because, as noted above, the HCUP does not track the same set of hospitals across years. To illuminate the direction’s effect on the functional estimates, we graph “Cost” as a function of “Major Diagnostic Procedures” and “Major Therapeutic Procedures,” holding “Minor Diagnostic Procedures” and “Minor Therapeutic Procedures” constant at their median values. Figure 6 illustrates the estimates for three different directions: one with only a cost component, one with only a component in Major Therapeutic Procedures, and one that comes from our median approach. Visual inspection indicates that the different directions produce significantly different estimates, highlighting the importance of direction selection.
We compare the estimator’s performance when using different directions. Table 8 reports the MSE for three sample directions in each year. We define our direction vector as .^{16}^{16}16We focus on types of directions found to be competitive in our Monte Carlo simulations.
Direction  Year  

(  2007  2008  2009 
(0.45, 0.45, 0.45, 0.45, 0.45)  2.10  1.30  1.50 
(0.35, 0.35, 0.35, 0.35, 0.71)  2.15  1.65  1.29 
Median Direction  1.79  1.55  1.34 
Note: Displayed are the measured values multiplied by . 
We pick two directions: one with equal components in all dimensions, and a second direction with a cost component that is double the value of the output components. The median vector is , which is very close to the cost-only direction. The MSE varies by 15-30% over the different directions. We observe that there is no clear dominant direction; however, the median direction performs reasonably well in all cases. We conclude that as long as a direction with nonzero components for all variables that could contain noise is selected, the precise direction chosen is not critical to obtaining improved estimation results.
Comparison with other estimators
We compare three methods to estimate a cost function: 1) a quadratic functional form (without the cross-product terms), Färe et al. (2010); 2) CNLSd with the direction selection method proposed in Section 6; and 3) a lower bound estimate calculated using a local linear kernel regression with a Gaussian kernel and leave-one-out cross-validation for bandwidth selection, Li and Racine (2007).^{17}^{17}17For CNLSd, we select a value for an upper bound through a tuning process, , and impose the upper bound on the estimated slope coefficients (Lim, 2014). We select these estimators because a quadratic functional form to model production has been used in recent productivity and efficiency analyses of healthcare; see, for example, Ferrier et al. (2018). The local linear kernel is selected because it is an extremely flexible nonparametric estimator and provides a lower bound for the performance of a functional estimate. However, note that the local linear kernel does not satisfy standard properties of a cost function; i.e., that cost is monotonic in output and marginal costs are increasing as output increases.
We use the criterion of K-fold average MSE with to compare the approaches. This means we split the data equally into 5 parts. We use 4 of the 5 parts for estimation (training) and evaluate the performance of the estimator on the 5th part (testing). We repeat this for all 5 parts and average the results. The values presented in Table 9 correspond to the average across folds.
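The K-fold criterion can be sketched generically as follows; the random fold assignment and the toy least-squares model below are illustrative assumptions, not the estimators compared in Table 9.

```python
import numpy as np

def kfold_mse(X, y, fit, predict, k=5, seed=0):
    """Average test-set MSE over k folds: fit on k-1 parts, evaluate on
    the held-out part, and average the k test MSEs."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for test in folds:
        train = np.setdiff1d(idx, test)          # all indices not held out
        model = fit(X[train], y[train])
        resid = y[test] - predict(model, X[test])
        scores.append(np.mean(resid ** 2))
    return float(np.mean(scores))

# toy usage with an ordinary least-squares "model" (with an intercept)
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, -0.5]) + rng.normal(0, 0.1, size=100)
fit = lambda X, y: np.linalg.lstsq(np.c_[np.ones(len(y)), X], y, rcond=None)[0]
predict = lambda b, X: np.c_[np.ones(len(X)), X] @ b
mse = kfold_mse(X, y, fit, predict)
```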
Quadratic  CNLSd  Lower Bound  
Year  Regression  (Median Direction)  Estimator 
2007  3.43  2.44  2.35 
2008  2.76  1.93  1.48 
2009  2.43  1.80  1.53 
Note: The MSE values displayed are the measured values multiplied by . 
While the average MSEs for all years are lowest for the lower bound estimator, CNLSd performs relatively well: it is close to the lower bound in terms of fitting performance while imposing standard axioms of a cost function. As is true of most production data, the hospital data are very noisy. The shape restrictions imposed in CNLSd improve interpretability. The CNLSd estimator outperforms the parametric approach, indicating the general benefits of nonparametric estimators.
Description of Functional Estimates  MPSS and Marginal Costs
We report the most productive scale size (MPSS) and the marginal costs for the quadratic parametric estimator, the CNLSd estimator with our proposed direction selection method, and an alternative.^{18}^{18}18Here most productive scale size is measured on each ray from the origin (fixing the output ratios) and is defined as the cost level that maximizes the ratio of aggregate output to cost. Marginal cost is measured on each ray from the origin (fixing the output ratios) and is defined as the cost to increase aggregate output by one unit. These metrics are computed from the averaged K-fold estimates for each estimation method. For the MPSS, we present the cost levels obtained for different ratios of Minor Therapeutic procedures (MinTher) and Major Therapeutic procedures (MajTher), with the minor and major diagnostics held constant at their median levels.
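The footnoted definitions can be sketched numerically as follows. The cost function, the grid of ray scales, and the use of the ray scale as the aggregate output measure are assumptions made for illustration; in the paper these metrics are computed from the averaged K-fold estimates.

```python
import numpy as np

def mpss_on_ray(cost_fn, y_mix, t_grid):
    """Most productive scale size along the ray y = t * y_mix: the cost
    level maximizing aggregate output per dollar, with aggregate output
    taken as the ray scale t (an assumption for this sketch)."""
    costs = np.array([cost_fn(t * y_mix) for t in t_grid])
    ratios = t_grid / costs
    return costs[int(np.argmax(ratios))]

def marginal_cost_on_ray(cost_fn, y_mix, t, dt=1e-4):
    """Finite-difference cost of one extra unit of aggregate output
    along the ray (holding the output mix fixed)."""
    return (cost_fn((t + dt) * y_mix) - cost_fn(t * y_mix)) / dt

# toy usage: a technology with a U-shaped average cost, c(y) = 1 + 0.5||y||^2,
# so the output/cost ratio peaks at t = sqrt(2), where cost = 2.
cost_fn = lambda y: 1.0 + 0.5 * np.linalg.norm(y) ** 2
y_mix = np.array([1.0, 0.0])            # ray direction (fixed output mix)
t_grid = np.linspace(0.1, 5.0, 500)
c_star = mpss_on_ray(cost_fn, y_mix, t_grid)
mc = marginal_cost_on_ray(cost_fn, y_mix, 2.0)
```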
MPSS results are presented in Table 10 and the values for CNLSd (Median Direction) are illustrated in Figure 7. We observe small variations across both years and estimators. The differences across years are in part due to the sample changing across years. Most hospitals are small and operate close to the MPSS. However, there are several large hospitals that are operating significantly above MPSS. Hospitals might choose to operate at larger scales and provide a large array of services allowing consumers to fulfill multiple healthcare needs.
For marginal costs, we present the values for different percentiles of MinTher and MajTher, with the minor and major diagnostics held constant at their median levels. A more exhaustive comparison across all outputs is presented in C. Marginal cost information can be used by hospital decision makers to select the types of improvements that are likely to result in higher productivity with minimal cost increase. For example, consider a hospital that is in the percentile of the data set for all four outputs in 2008, whose manager has the option to expand operations for either minor or major therapeutic procedures. Results reported in Tables 11 and 12 indicate that an increase of 1 minor therapeutic procedure would result in a increase in cost. Alternatively, an increase of 1 major therapeutic procedure would result in a increase in cost. A decision maker would want to consider the revenue generated by the different procedures; however, these estimates provide insights regarding the incremental cost of additional major and minor therapeutic procedures.
CNLSd is the most flexible of the estimators and allows MPSS values to fluctuate significantly across percentiles. CNLSd does not smooth variation; rather, it minimizes the distance from each observation to the shape constrained estimator. In C, results for the local linear kernel estimator are also presented. Even though the local linear kernel bandwidths are selected via cross-validation, relatively large values are selected due to the relatively noisy data and the highly skewed distribution of output. These large bandwidths and the parametric nature of the quadratic function make these two estimators relatively less flexible than CNLSd. A feature of performance captured only by CNLSd is that hospitals specializing in either minor or major therapeutics maximize productivity at larger scales of operation, as illustrated in Figure 7.
Ratio  Quadratic Regression  CNLSd (median)  CNLSd (equal)  

MajTher/MinTher  2007  2008  2009  2007  2008  2009  2007  2008  2009 
20%  13  379  252  210  61  88  224  137  106 
30%  17  861  640  146  66  83  134  129  148 
40%  272  377  1090  107  56  77  127  85  135 
50%  870  249  1552  112  64  85  124  126  134 
60%  360  210  276  90  70  120  88  96  142 
70%  205  182  187  111  66  184  132  104  104 
80%  151  170  150  174  69  286  221  110  111 
Note: The values displayed are in $M 
Percentile  Quadratic Regression  CNLSd (median)  CNLSd (equal)  

MinTher  MajTher  2007  2008  2009  2007  2008  2009  2007  2008  2009 
25  25  8.9  6.5  13.2  0.03  0.03  0.03  0.2  0.02  0.1 
25  50  8.9  6.5  13.2  0.05  0.1  0.1  0.04  0.1  0.04 
25  75  8.9  6.5  13.2  0.2  0.04  0.03  0.1  0.02  0.02 
50  25  8.1  6.1  12.4  6.9  5.5  7.4  5.9  6.3  7.8 
50  50  8.1  6.1  12.4  4.3  4.9  7.8  2.1  3.7  7.4 
50  75  8.1  6.1  12.4  0.2  0.4  0.03  0.1  0.02  0.02 
75  25  6.0  5.0  10.4  9.6  13.5  14.0  9.5  10.9  14.1 
75  50  6.0  5.0  10.4  9.6  13.5  14.3  9.6  10.9  13.8 
75  75  6.0  5.0  10.4  5.7  10.1  6.4  4.6  8.7  6.4 
Note: The values displayed are in $k 
Percentile  Quadratic Regression  CNLSd (median)  CNLSd (equal)  

MinTher  MajTher  2007  2008  2009  2007  2008  2009  2007  2008  2009 
25  25  10.5  11.5  9.8  0.1  0.04  0.1  0.2  0.03  0.1 
25  50  11.7  13.0  10.8  11.3  11.8  15.7  10.5  10.3  14.6 
25  75  15.1  17.2  14.5  19.8  22.1  24.6  19.8  21.8  24.0 
50  25  10.5  11.5  9.8  0.4  0.2  0.5  0.1  0.1  0.4 
50  50  11.7  13.0  10.8  3.7  7.7  1.7  6.9  7.1  3.7 
50  75  15.1  17.2  14.5  19.8  22.0  24.6  19.8  21.8  24.0 
75  25  10.5  11.5  9.8  0.2  0.03  0.1  0.0  0.1  0.1 
75  50  11.7  13.0  10.8  0.2  0.2  0.4  0.8  0.1  0.3 
75  75  15.1  17.2  14.5  18.3  12.4  19.8  16.2  11.0  15.2 
Note: The values displayed are in $k 
The marginal cost results for Minor Therapeutic procedures are presented in Table 11 and Figure 8 (left), and the marginal cost results for Major Therapeutic procedures are reported in Table 12 and Figure 8 (right). As was the case for MPSS (see Table 10), CNLSd is more flexible and its marginal cost estimates vary significantly across percentiles. CNLSd with different directions provides very similar marginal cost estimates. However, the CNLSd estimates differ significantly from the marginal cost estimates obtained with the parametric estimator. For CNLSd, the marginal cost results are in line with the theory that marginal costs increase with scale. This property can be violated when using a nonparametric estimator without any shape constraints imposed; for example, this can be seen in the marginal costs of minor therapeutic procedures for the parametric (quadratic) regression estimator, Figure 8.
Our data set, which combines AHA cost data with AHRQ output data for a broad sample of hospitals from across the US, is unique to the best of our knowledge. However, the marginal cost estimates are broadly in line with marginal cost estimates for US hospitals over similar time periods. Gowrisankaran et al. (2015) studied a considerably smaller set of Northern Virginia hospitals observed in 2006 that, on average, were larger than the hospitals in our data set. Due to the differences in the measures of output, the marginal cost levels are not directly comparable. However, conditional on the size variation, the variation in marginal costs is similar to the variation we observe for the parametric (quadratic) regression specification applied to our data. Boussemart et al. (2015) analyzed data on nearly 150 hospitals located in Florida observed in 2005. The authors use a different output specification and a translog model; however, their distribution of hospital size is similar to our data set, and we observe similar variances in marginal costs with the parametric (quadratic) regression specification applied to our data.
8 Conclusions
This paper investigated the improvement in functional estimates when specifying a particular direction in CNLSd. Based on Monte Carlo experiments, two primary findings emerged from our analysis. First, directions close to the average orthogonal direction to the true function performed well. Second, when the data are noisy, selecting a direction that matches the noise direction of the DGP improves estimator performance. Our simulations indicate that CNLSd with a direction orthogonal to the data is preferable if the noise level is not too large, and that a direction matching the noise direction of the DGP is preferred if the noise level is large. Thus, if users know the shape of the data or the characteristics of the noise, they can use CNLSd with a direction orthogonal to the data when the noise coefficient is small; when the noise coefficient is large, they can select a direction close to the true noise direction, with nonzero components in all variables that potentially have noise. Our application to US hospital data shows that CNLSd performs similarly across different directions, provided the direction vector has nonzero components for variables that potentially have noise in their measurement.
In future research, we propose developing an alternative estimator that incorporates multiple directions in CNLSd while maintaining the concavity axiom. This would permit treating subgroups within the data, allowing different assumptions to be made across subgroups (e.g., for-profit vs. not-for-profit hospitals).
References
 Ackerberg et al. (2015) Ackerberg, D. A., Caves, K., Frazer, G., 2015. Identification properties of recent production function estimators. Econometrica 83 (6), 2411–2451.
 Adler and Volta (2016) Adler, N., Volta, N., 2016. Accounting for externalities and disposability: a directional economic environmental distance function. European Journal of Operational Research 250 (1), 314–327.
 Afriat (1972) Afriat, S. N., 1972. Efficiency estimation of production functions. International Economic Review 13 (3), 568–598.
 Aparicio et al. (2017) Aparicio, J., Pastor, J., Zofio, J., 2017. Can Farrell’s allocative efficiency be generalized by the directional distance function approach? European Journal of Operational Research 257 (1), 345–351.
 Atkinson et al. (2003) Atkinson, S., Cornwell, C., Honerkamp, O., 2003. Measuring and decomposing productivity change: stochastic distance function estimation versus data envelopment analysis. Journal of Business & Economic Statistics 21 (2), 284–294.
 Atkinson and Tsionas (2016) Atkinson, S., Tsionas, M., 2016. Directional distance functions: optimal endogenous directions. Journal of Econometrics 190 (2), 301–314.
 Baležentis and De Witte (2015) Baležentis, T., De Witte, K., 2015. One and multidirectional conditional efficiency measurement: efficiency in Lithuanian family farms. European Journal of Operational Research 245 (2), 612–622.
 Bertsekas (1999) Bertsekas, D. P., 1999. Nonlinear programming. Athena Scientific, Belmont, MA.
 Boussemart et al. (2015) Boussemart, J.-P., Leleu, H., Valdmanis, V., 2015. A two-stage translog marginal cost pricing approach for Floridian hospital outputs. Applied Economics 47 (38), 4116–4127.

 Carroll et al. (2006) Carroll, R., Ruppert, D., Stefanski, L., Crainiceanu, C., 2006. Measurement error in nonlinear models: a modern perspective, Second Edition. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. CRC Press, Boca Raton, FL.
 Chambers (1988) Chambers, R. G., 1988. Applied production analysis. Cambridge University Press, New York, NY.
 Chambers et al. (1996) Chambers, R. G., Chung, Y., Färe, R., 1996. Benefit and distance functions. Journal of Economic Theory 70 (2), 407–419.
 Chambers et al. (1998) Chambers, R. G., Chung, Y., Färe, R., 1998. Profit, directional distance functions, and Nerlovian efficiency. Journal of Optimization Theory and Applications 98 (2), 351–364.
 Charnes et al. (1978) Charnes, A., Cooper, W. W., Rhodes, E., 1978. Measuring the efficiency of decision making units. European Journal of Operational Research 2 (6), 429–444.
 Coelli (2000) Coelli, T., 2000. On the econometric estimation of the distance function representation of a production technology. Université Catholique de Louvain. Center for Operations Research and Econometrics [CORE].
 Coelli and Perelman (1999) Coelli, T., Perelman, S., 1999. A comparison of parametric and nonparametric distance functions: with application to European railways. European Journal of Operational Research 117 (2), 326–339.
 Coelli and Perelman (2000) Coelli, T., Perelman, S., 2000. Technical efficiency of European railways: a distance function approach. Applied Economics 32 (15), 1967–1976.
 Daraio and Simar (2016) Daraio, C., Simar, L., 2016. Efficiency and benchmarking with directional distances: a data-driven approach. Journal of the Operational Research Society 67 (7), 928–944.
 Diewert and Wales (1987) Diewert, W. E., Wales, T. J., 1987. Flexible functional forms and global curvature conditions. Econometrica 55 (1), 43–68.
 Färe and Grosskopf (2010) Färe, R., Grosskopf, S., 2010. Directional distance functions and slacks-based measures of efficiency. European Journal of Operational Research 200 (1), 320–322.
 Färe et al. (2005) Färe, R., Grosskopf, S., Noh, D.-W., Weber, W., 2005. Characteristics of a polluting technology: theory and practice. Journal of Econometrics 126 (2), 469–492.
 Färe et al. (2010) Färe, R., Martins-Filho, C., Vardanyan, M., 2010. On functional form representation of multi-output production technologies. Journal of Productivity Analysis 33 (2), 81–96.
 Färe et al. (2017) Färe, R., Pasurka, C., Vardanyan, M., 2017. On endogenizing direction vectors in parametric directional distance function-based models. European Journal of Operational Research 262 (1), 361–369.
 Färe and Vardanyan (2016) Färe, R., Vardanyan, M., 2016. A note on parameterizing input distance functions: does the choice of a functional form matter? Journal of Productivity Analysis 45 (2), 121–130.

 Ferrier et al. (2018) Ferrier, G. D., Leleu, H., Valdmanis, V. G., Vardanyan, M., 2018. A directional distance function approach for identifying the input/output status of medical residents. Applied Economics 50 (9), 1006–1021. URL https://doi.org/10.1080/00036846.2017.1349287
 Frisch (1964) Frisch, R., 1964. Theory of production. Springer Science & Business Media, Dordrecht, Netherlands.
 Fukuyama and Matousek (2018) Fukuyama, H., Matousek, R., 2018. Nerlovian revenue inefficiency in a bank production context: evidence from Shinkin banks. European Journal of Operational Research 271 (1), 317–330.
 Gowrisankaran et al. (2015) Gowrisankaran, G., Nevo, A., Town, R., 2015. Mergers when prices are negotiated: evidence from the hospital industry. American Economic Review 105 (1), 172–203.
 Henderson and Parmeter (2015) Henderson, D. J., Parmeter, C. F., 2015. Applied nonparametric econometrics. Cambridge University Press, New York, NY.
 Hildreth (1954) Hildreth, C., 1954. Point estimates of ordinates of concave functions. Journal of the American Statistical Association 49 (267), 598–619.
 Hollingsworth (2003) Hollingsworth, B., 2003. Nonparametric and parametric applications measuring efficiency in health care. Health Care Management Science 6 (4), 203–218.
 Johnson and Kuosmanen (2011) Johnson, A. L., Kuosmanen, T., 2011. One-stage estimation of the effects of operational conditions and practices on productive performance: asymptotically normal and efficient, root-n consistent StoNEZD method. Journal of Productivity Analysis 36 (2), 219–230.
 Kapelko and Oude Lansink (2017) Kapelko, M., Oude Lansink, A., 2017. Dynamic multi-directional inefficiency analysis of European dairy manufacturing firms. European Journal of Operational Research 257 (1), 338–344.
 Koopmans (1951) Koopmans, T. C., 1951. An analysis of production as an efficient combination of activities. In: Koopmans, T. C. (Ed.), Activity Analysis of Production and Allocation. John Wiley & Sons, Inc., New York, pp. 33–97.
 Kuosmanen (2008) Kuosmanen, T., 2008. Representation theorem for convex nonparametric least squares. Econometrics Journal 11 (2), 308–325.
 Kuosmanen and Johnson (2017) Kuosmanen, T., Johnson, A., 2017. Modeling joint production of multiple outputs in stoned: Directional distance function approach. European Journal of Operational Research 262 (2), 792–801.
 Kuosmanen and Johnson (2010) Kuosmanen, T., Johnson, A. L., 2010. Data envelopment analysis as nonparametric leastsquares regression. Operations Research 58 (1), 149–160.
 Kuosmanen and Kortelainen (2012) Kuosmanen, T., Kortelainen, M., 2012. Stochastic non-smooth envelopment of data: semiparametric frontier estimation subject to shape constraints. Journal of Productivity Analysis 38 (1), 11–28.
 Kutlu (2018) Kutlu, L., 2018. A distributionfree stochastic frontier model with endogenous regressors. Economics Letters 163, 152–154.
 Levinsohn and Petrin (2003) Levinsohn, J., Petrin, A., 2003. Estimating production functions using inputs to control for unobservables. Review of Economic Studies 70 (2), 317–341.
 Lewbel (forthcoming) Lewbel, A., forthcoming. The identification zoo: meanings of identification in econometrics. Journal of Economic Literature.
 Li and Racine (2007) Li, Q., Racine, J. S., 2007. Nonparametric econometrics: theory and practice. Princeton University Press, Princeton, NJ.
 Lim (2014) Lim, E., 2014. On convergence rates of convex regression in multiple dimensions. INFORMS Journal on Computing 26 (3), 616–628.
 Lovell et al. (1994) Lovell, C. K., Travers, P., Richardson, S., Wood, L., 1994. Resources and functionings: a new view of inequality in Australia. In: Eichhorn (Ed.), Models and measurement of welfare and inequality. Springer, Berlin, Germany, pp. 787–807.
 Luenberger (1992) Luenberger, D. G., 1992. Benefit functions and duality. Journal of Mathematical Economics 21 (5), 461–481.

 Manski (2003) Manski, C., 2003. Partial identification of probability distributions. Springer Series in Statistics. Springer, New York, NY.
 Olley and Pakes (1996) Olley, G. S., Pakes, A., 1996. The dynamics of productivity in the telecommunications equipment industry. Econometrica 64 (6), 1263–1297.
 Pope and Johnson (2013) Pope, B., Johnson, A. L., 2013. Returns to scope: a metric for production synergies demonstrated for hospital production. Journal of Productivity Analysis 40 (2), 239–250.
 Roshdi et al. (2018) Roshdi, I., Hasannasab, M., Margaritis, D., Rouse, P., 2018. Generalised weak disposability and efficiency measurement in environmental technologies. European Journal of Operational Research 266 (3), 1000–1012.
 Shephard (1953) Shephard, R. W., 1953. Cost and production functions. Princeton University Press, Princeton, NJ.
 Shephard (1970) Shephard, R. W., 1970. Theory of cost and production functions. Princeton University Press, Princeton, NJ.
 Sickles et al. (2002) Sickles, R. C., Good, D. H., Getachew, L., 2002. Specification of distance functions using semi- and nonparametric methods with an application to the dynamic performance of Eastern and Western European air carriers. Journal of Productivity Analysis 17 (1-2), 133–155.
 Syverson (2011) Syverson, C., 2011. What determines productivity? Journal of Economic Literature 49 (2), 326–365.
 Tamer (2010) Tamer, E., 2010. Partial identification in econometrics. Annual Review of Economics 2, 167–195.
 Varian (1984) Varian, H. R., 1984. The nonparametric approach to production analysis. Econometrica, 579–597.
 Wooldridge (2009) Wooldridge, J. M., 2009. On estimating firmlevel production functions using proxy variables to control for unobservables. Economics Letters 104 (3), 112–114.
 Yagi et al. (2018) Yagi, D., Chen, Y., Johnson, A. L., Kuosmanen, T., 2018. Shape-constrained kernel-weighted least squares: estimating production functions for Chilean manufacturing industries. Journal of Business & Economic Statistics, 1–12.
 Zuckerman et al. (1994) Zuckerman, S., Hadley, J., Iezzoni, L., 1994. Measuring hospital efficiency with frontier cost functions. Journal of Health Economics 13 (3), 255–280.
Appendix A Properties of Directional Distance Functions and CNLSd
A.1 Direction Selection in Directional Distance Functions
In this appendix we prove that the direction vector affects the functional estimates. Let , then we can state the following theorem:
Theorem 2.
Suppose that two direction vectors exist, and , such that . Then the directional distance function estimates using these two different directions are not equal, .
Proof.
Observe that all decision variables appear in the objective function, the objective function is quadratic, and the constraints define a convex solution space; i.e., this optimization problem has a unique solution (Bertsekas (1999)). If we solve Problem (27) with , then the resulting solution vector is . Changing the direction vector from to , the normalization constraint no longer holds for and . However, the previous argument holds for the uniqueness of . Thus, .
∎
A.2 Details of CNLSd
An alternative expression for CNLSd (cf. equations (16)–(16c) from Section 5.1) is given by:
(28)  
(28a)  
(28b)  