 # Analytic expressions for the Cumulative Distribution Function of the Composed Error Term in Stochastic Frontier Analysis with Truncated Normal and Exponential Inefficiencies

In the stochastic frontier model, the composed error term consists of the measurement error and the inefficiency term. A common assumption is that the inefficiency term follows a truncated normal or exponential distribution. A wide variety of models require evaluating the cumulative distribution function of the composed error term. This work introduces and proves four representation theorems for these distributions, two for each distributional assumption. These representations can be used for fast and accurate evaluation.

## Authors

## 1 Introduction

In the stochastic frontier model, the composed error term $\epsilon$ consists of the measurement error $v$ and the inefficiency term $u$. The inefficiency $u$ is assumed to be random and greater than or equal to zero, so a random variable with non-negative support is used to model $u$. If one assumes independence of $v$ and $u$, the composed error for production inefficiency is defined as $\epsilon = v - u$. The cumulative distribution function (cdf) of $\epsilon$ is specified as:

$$F_\epsilon(\kappa) = \int_{-\infty}^{\kappa} f_\epsilon(t)\,dt.$$

The cost inefficiency composed error term is defined as $\epsilon^* = v + u$. Thus the cdf of $\epsilon^*$ can be written as (the proof is analogous to the one of Theorem 3.2):

$$F_{\epsilon^*}(\kappa) = 1 - F_\epsilon(-\kappa).$$

meesters2014note notes that common assumptions for the distribution of the inefficiency term are the truncated (below zero) normal, exponential and half-normal distributions. If $u$ is assumed to follow a truncated (below zero) normal distribution, i.e. $u \sim N^+(\mu, \sigma_u^2)$, then the probability density function (pdf) of $\epsilon$ is derived by kumbhakar2015practitioner as:

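$$f_\epsilon(\epsilon) = \frac{1}{\sqrt{\sigma_v^2+\sigma_u^2}\,\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\,\phi\!\left(\frac{\epsilon+\mu}{\sqrt{\sigma_v^2+\sigma_u^2}}\right)\Phi\!\left(\frac{\mu\sigma_v^2-\epsilon\sigma_u^2}{\sqrt{\sigma_v^2+\sigma_u^2}\,\sigma_v\sigma_u}\right) \tag{1}$$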

with $v \sim N(0, \sigma_v^2)$ and $\sigma_v, \sigma_u > 0$. Further, $\phi$ and $\Phi$ are the pdf and cdf of the standard normal distribution, respectively. Setting $\mu = 0$ in the truncated normal distribution yields the half-normal distribution; the truncated normal distribution is thus a generalization of the half-normal distribution. Consequently, it is more flexible, at the cost of one additional parameter. It was first introduced by stevenson1980likelihood.

Alternatively, assume that the random variable $u$ follows an exponential distribution, i.e. $u \sim \mathrm{Exp}(\lambda)$ where $\lambda > 0$. Then the pdf of $\epsilon$ is given by:

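$$f_\epsilon(\epsilon) = \lambda\exp\left\{\lambda\epsilon+\frac{\sigma_v^2\lambda^2}{2}\right\}\Phi\!\left(-\frac{\epsilon}{\sigma_v}-\lambda\sigma_v\right) \tag{2}$$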

see kumbhakar2015practitioner. The mode of the exponential distribution is at $u = 0$, which implies that the modal producer is efficient. This approach to inefficiency modeling was first introduced by meeusen1977efficiency.
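As a numerical sanity check on Equations 1 and 2, the following minimal Python sketch evaluates both densities and integrates them with a composite Simpson rule. It uses only the standard library. The function names, the integration bounds and the parameter values are illustrative assumptions and are not taken from the original work, whose simulations were run in R.

```python
import math

def norm_pdf(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def pdf_truncnormal(eps, mu, sigma_u, sigma_v):
    """Equation 1: pdf of eps = v - u with u ~ N+(mu, sigma_u^2)."""
    s = math.sqrt(sigma_v**2 + sigma_u**2)
    arg = (mu * sigma_v**2 - eps * sigma_u**2) / (s * sigma_v * sigma_u)
    return norm_pdf((eps + mu) / s) * norm_cdf(arg) / (s * norm_cdf(mu / sigma_u))

def pdf_exponential(eps, lam, sigma_v):
    """Equation 2: pdf of eps = v - u with u ~ Exp(lam)."""
    return (lam * math.exp(lam * eps + 0.5 * (sigma_v * lam) ** 2)
            * norm_cdf(-eps / sigma_v - lam * sigma_v))

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

# Illustrative parameters (assumed): both integrals should be close to 1.
mass_tn = simpson(lambda e: pdf_truncnormal(e, 1.0, 1.0, 1.0), -15.0, 10.0)
mass_exp = simpson(lambda e: pdf_exponential(e, 1.0, 1.0), -30.0, 15.0)
print(mass_tn, mass_exp)
```

The finite bounds stand in for the infinite support; they are wide enough for these parameter values but would need to be adapted for other scales.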

Recently, more and more models have been developed that require not only the pdf but also the cdf of the random error to estimate the model parameters. Examples are genius2012measuring, lai2013maximum, tsay2013simple, amsler2014using, tran2015endogeneity and sriboonchitta2017double. The recent paper by amsler2019evaluating introduced a representation of the cdf of the composed error term assuming $u$ follows the half-normal distribution. Before that, one had to rely on numerical integration methods to evaluate $F_\epsilon(\kappa)$. (lai2013maximum introduced a numerical approximation, which breaks down for some parameter combinations. It also contained a typo, the correction of which is supplied by Lai and Huang on request.) Utilizing analytical representations of integrals is generally more accurate and faster to compute, which allows for the estimation of more complex models. This work introduces the cdfs of $\epsilon$ for inefficiency terms following a truncated normal or exponential distribution. Section 2 introduces two separate representation theorems for the composed error term involving a truncated normal distribution. In Section 3, two theorems are introduced that allow $F_\epsilon(\kappa)$ to be represented analytically if $u$ follows an exponential distribution. The proofs of all theorems and lemmas are provided. The findings are then validated through simulation in Section 4.

## 2 Truncated Normal Inefficiency Model

In the following section, two representations of $F_\epsilon(\kappa)$ with $u \sim N^+(\mu, \sigma_u^2)$ are introduced, together with their proofs and information on the limiting behavior.

### 2.1 Representation using Owen’s T function

###### Theorem 2.1.

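Let $v \sim N(0, \sigma_v^2)$ and $u \sim N^+(\mu, \sigma_u^2)$ be independent, then it holds that the cdf of $\epsilon = v - u$ can be represented as:

$$\begin{aligned}
F_\epsilon(\kappa) = \frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\Bigg[ & \frac12\Phi(\varphi(\kappa)) + \frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \frac12\mathbb{1}_{(-\infty,0)}\!\left(\frac{a}{\varphi(\kappa)\sqrt{1+b^2}}\right) \\
& - T\!\left(\varphi(\kappa), \frac{a+b\varphi(\kappa)}{\varphi(\kappa)}\right) - T\!\left(\frac{a}{\sqrt{1+b^2}}, \frac{ab+\varphi(\kappa)(1+b^2)}{a}\right)\Bigg]
\end{aligned}$$

where $a = \frac{\mu\sqrt{\sigma_v^2+\sigma_u^2}}{\sigma_v\sigma_u}$, $b = -\frac{\sigma_u}{\sigma_v}$ and $\varphi(\kappa) = \frac{\kappa+\mu}{\sqrt{\sigma_v^2+\sigma_u^2}}$. Further, $T$ denotes Owen's T function defined as:

$$T(h,g) = \frac{1}{2\pi}\int_0^g \frac{\exp\left\{-\frac12 h^2(1+t^2)\right\}}{1+t^2}\,dt$$

with $h, g \in \mathbb{R}$, see owen1956tables.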
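Owen's T function can be evaluated directly from its defining integral. The sketch below does so with a composite Simpson rule in plain Python; it is an illustrative assumption and not the implementation used in the paper (which relied on an R package). It is only suitable for moderate values of $g$, since the integration grid is fixed. The two printed comparisons use identities that hold for Owen's T: $T(0,g)=\arctan(g)/(2\pi)$ and $T(h,1)=\tfrac12\Phi(h)\,(1-\Phi(h))$.

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def owens_t(h, g, n=2000):
    """T(h, g) by composite Simpson quadrature of the defining integral."""
    def f(t):
        return math.exp(-0.5 * h * h * (1.0 + t * t)) / (1.0 + t * t)
    step = g / n  # a negative step for g < 0 yields T(h, -g) = -T(h, g)
    total = f(0.0) + f(g)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * step)
    return total * step / (3.0 * 2.0 * math.pi)

# Compare the quadrature with the two closed-form special cases.
print(owens_t(0.0, 1.0), math.atan(1.0) / (2.0 * math.pi))
h = 0.7
print(owens_t(h, 1.0), 0.5 * norm_cdf(h) * (1.0 - norm_cdf(h)))
```

For very large $|g|$, a production implementation would typically switch to a dedicated algorithm instead of a fixed grid.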

Theorem 2.1 is a direct consequence of the following Lemma:

###### Lemma 1.

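Let $v \sim N(0, \sigma_v^2)$ and $u \sim N^+(\mu, \sigma_u^2)$ be independent, then it holds that the cdf of $\epsilon = v - u$ can be represented as:

$$\int_{-\infty}^{\kappa} f_\epsilon(t)\,dt = \frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\int_{-\infty}^{\varphi(\kappa)}\phi(y)\,\Phi(a+by)\,dy$$

where $a = \frac{\mu\sqrt{\sigma_v^2+\sigma_u^2}}{\sigma_v\sigma_u}$, $b = -\frac{\sigma_u}{\sigma_v}$ and $\varphi(\kappa) = \frac{\kappa+\mu}{\sqrt{\sigma_v^2+\sigma_u^2}}$.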

#### 2.1.1 Proof of Lemma 1

Initially Lemma 1 is proven and then it is shown how Theorem 2.1 follows.

###### Proof.

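Given the cdf as constructed through the integral of Equation 1, the expression may be simplified by substitution:

$$y = \varphi(t) := \frac{t+\mu}{\sqrt{\sigma_v^2+\sigma_u^2}},$$

which can be rearranged as:

$$t = y\sqrt{\sigma_v^2+\sigma_u^2}-\mu. \tag{3}$$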

The derivative of $y$ w.r.t. $t$ is:

$$\frac{dy}{dt} = \frac{1}{\sqrt{\sigma_v^2+\sigma_u^2}} \;\Leftrightarrow\; dt = \sqrt{\sigma_v^2+\sigma_u^2}\,dy$$

Appropriately transforming the limits of the integral results in:

$$\varphi(\kappa) = \frac{\kappa+\mu}{\sqrt{\sigma_v^2+\sigma_u^2}}, \qquad \lim_{\kappa\to\infty}\varphi(-\kappa) = -\infty$$

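and finally introducing $a$ and $b$ for ease of representation:

$$\frac{\mu\sigma_v^2-t\sigma_u^2}{\sqrt{\sigma_v^2+\sigma_u^2}\,\sigma_v\sigma_u} \overset{(3)}{=} \frac{\mu\sigma_v^2-\left(y\sqrt{\sigma_v^2+\sigma_u^2}-\mu\right)\sigma_u^2}{\sqrt{\sigma_v^2+\sigma_u^2}\,\sigma_v\sigma_u} = \underbrace{\frac{\mu\sqrt{\sigma_v^2+\sigma_u^2}}{\sigma_v\sigma_u}}_{a} + \underbrace{\left(-\frac{\sigma_u}{\sigma_v}\right)}_{b}\,y$$

Substituting $y$, $a$ and $b$ into the integral of Equation 1 then yields Lemma 1:

$$\int_{-\infty}^{\kappa} f_\epsilon(t)\,dt = \frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\int_{-\infty}^{\varphi(\kappa)}\phi(y)\,\Phi(a+by)\,dy$$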
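The substitution can be checked numerically by integrating both sides of Lemma 1. The following Python sketch is a hedged illustration: the helper names, the finite lower bounds that stand in for $-\infty$, and the parameter values are assumptions made for this example only.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def cdf_direct(kappa, mu, sigma_u, sigma_v, lower=-25.0):
    """Left-hand side: integrate the pdf of Equation 1 up to kappa."""
    s = math.sqrt(sigma_v**2 + sigma_u**2)
    def pdf(t):
        arg = (mu * sigma_v**2 - t * sigma_u**2) / (s * sigma_v * sigma_u)
        return norm_pdf((t + mu) / s) * norm_cdf(arg) / (s * norm_cdf(mu / sigma_u))
    return simpson(pdf, lower, kappa)

def cdf_lemma1(kappa, mu, sigma_u, sigma_v, lower=-12.0):
    """Right-hand side: integral in y after the substitution y = phi(t)."""
    s = math.sqrt(sigma_v**2 + sigma_u**2)
    a = mu * s / (sigma_v * sigma_u)
    b = -sigma_u / sigma_v
    upper = (kappa + mu) / s
    integral = simpson(lambda y: norm_pdf(y) * norm_cdf(a + b * y), lower, upper)
    return integral / norm_cdf(mu / sigma_u)

# Illustrative parameter values (assumed): the two numbers should agree.
print(cdf_direct(0.5, 1.0, 1.0, 1.0), cdf_lemma1(0.5, 1.0, 1.0, 1.0))
```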

#### 2.1.2 Proof of Theorem 2.1

###### Proof.

Theorem 2.1 follows by applying Lemma 1 and Equation 10,010.3 by owen1956tables:

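$$\begin{aligned}
\int\phi(y)\,\Phi(a+by)\,dy = {} & T\!\left(y,\frac{a}{y\sqrt{1+b^2}}\right) + T\!\left(\frac{a}{\sqrt{1+b^2}},\frac{y\sqrt{1+b^2}}{a}\right) - T\!\left(y,\frac{a+by}{y}\right) \\
& - T\!\left(\frac{a}{\sqrt{1+b^2}},\frac{ab+y(1+b^2)}{a}\right) + \Phi(y)\,\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right)
\end{aligned} \tag{4}$$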

in order to solve the obtained integral. An alternative representation is achieved by utilizing the following identity from owen1956tables:

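$$T(h,g) = \frac12\Phi(h) + \frac12\Phi(gh) - \Phi(h)\,\Phi(gh) - T\!\left(gh,\frac{1}{g}\right) - \frac12\mathbb{1}_{(-\infty,0)}(g) \quad\text{with } g \neq 0$$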

to rewrite the first term of Equation 4 to:

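$$\begin{aligned}
T\!\left(y,\frac{a}{y\sqrt{1+b^2}}\right) = {} & \frac12\Phi(y) + \frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \Phi(y)\,\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) \\
& - T\!\left(\frac{a}{\sqrt{1+b^2}},\frac{y\sqrt{1+b^2}}{a}\right) - \frac12\mathbb{1}_{(-\infty,0)}\!\left(\frac{a}{y\sqrt{1+b^2}}\right)
\end{aligned}$$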

Equation 4 can therefore be rewritten as:

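$$\begin{aligned}
\int\phi(y)\,\Phi(a+by)\,dy = {} & \frac12\Phi(y) + \frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \frac12\mathbb{1}_{(-\infty,0)}\!\left(\frac{a}{y\sqrt{1+b^2}}\right) \\
& - T\!\left(y,\frac{a+by}{y}\right) - T\!\left(\frac{a}{\sqrt{1+b^2}},\frac{ab+y(1+b^2)}{a}\right)
\end{aligned}$$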

Thus resulting in a compact representation of the integral of Equation 1, which from here on is referred to as Owen’s T function CDF:

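$$\begin{aligned}
F_\epsilon(\kappa) = {} & \frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\Bigg[\frac12\Phi(y) + \frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \frac12\mathbb{1}_{(-\infty,0)}\!\left(\frac{a}{y\sqrt{1+b^2}}\right) \\
& \qquad - T\!\left(y,\frac{a+by}{y}\right) - T\!\left(\frac{a}{\sqrt{1+b^2}},\frac{ab+y(1+b^2)}{a}\right)\Bigg]_{-\infty}^{\varphi(\kappa)} \\
= {} & \frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\Bigg[\frac12\Phi(\varphi(\kappa)) + \frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \frac12\mathbb{1}_{(-\infty,0)}\!\left(\frac{a}{\varphi(\kappa)\sqrt{1+b^2}}\right) \\
& \qquad - T\!\left(\varphi(\kappa),\frac{a+b\varphi(\kappa)}{\varphi(\kappa)}\right) - T\!\left(\frac{a}{\sqrt{1+b^2}},\frac{ab+\varphi(\kappa)(1+b^2)}{a}\right)\Bigg]
\end{aligned}$$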

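Here it becomes clear that as $\mu$ tends towards $0$, $a$ does the same, leading to a singularity. For the case $\mu = 0$ the truncated normal distribution becomes the half-normal distribution, for which there is a closed form by amsler2019evaluating. For the sake of completeness it is provided below:

$$F_\epsilon(\kappa) = 2\,T\!\left(\frac{\kappa}{\sqrt{\sigma_v^2+\sigma_u^2}},\frac{\sigma_u}{\sigma_v}\right) + \Phi\!\left(\frac{\kappa}{\sqrt{\sigma_v^2+\sigma_u^2}}\right)$$

For $y = \varphi(\kappa) = 0$, i.e. $\kappa = -\mu$, the function exhibits a singularity:

$$\lim_{y\to 0}F_\epsilon(y) = \frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\left[\frac14 + \frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \frac12\mathbb{1}_{(-\infty,0)}(a) - \frac{1-\Phi(0)}{2} - T\!\left(\frac{a}{\sqrt{1+b^2}},b\right)\right]$$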
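The representation of Theorem 2.1 can be sketched in a few lines of Python and compared against a direct quadrature of the integral in Lemma 1. This is an illustration under stated assumptions: Owen's T is evaluated here by a simple Simpson rule on a fixed grid (adequate only for moderate arguments), the function names and parameter values are hypothetical, and the singular points $\mu = 0$ and $\kappa = -\mu$ are deliberately not handled.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule; also valid for hi < lo (signed integral)."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def owens_t(h, g):
    """Owen's T function from its defining integral (moderate g only)."""
    f = lambda t: math.exp(-0.5 * h * h * (1.0 + t * t)) / (1.0 + t * t)
    return simpson(f, 0.0, g) / (2.0 * math.pi)

def cdf_owen(kappa, mu, sigma_u, sigma_v):
    """Theorem 2.1; not defined for mu = 0 or kappa = -mu (division by zero)."""
    s = math.sqrt(sigma_v**2 + sigma_u**2)
    a = mu * s / (sigma_v * sigma_u)
    b = -sigma_u / sigma_v
    y = (kappa + mu) / s
    x = a / math.sqrt(1.0 + b * b)
    indicator = 1.0 if a / (y * math.sqrt(1.0 + b * b)) < 0.0 else 0.0
    bracket = (0.5 * norm_cdf(y) + 0.5 * norm_cdf(x) - 0.5 * indicator
               - owens_t(y, (a + b * y) / y)
               - owens_t(x, (a * b + y * (1.0 + b * b)) / a))
    return bracket / norm_cdf(mu / sigma_u)

def cdf_numeric(kappa, mu, sigma_u, sigma_v, lower=-12.0):
    """Reference value: the integral of Lemma 1 by Simpson quadrature."""
    s = math.sqrt(sigma_v**2 + sigma_u**2)
    a = mu * s / (sigma_v * sigma_u)
    b = -sigma_u / sigma_v
    g = lambda t: norm_pdf(t) * norm_cdf(a + b * t)
    return simpson(g, lower, (kappa + mu) / s, n=4000) / norm_cdf(mu / sigma_u)

# Illustrative cases (assumed), covering both signs of phi(kappa) and of a.
for kappa, mu in [(0.5, 1.0), (-2.0, 1.0), (0.5, -1.0)]:
    print(kappa, mu, cdf_owen(kappa, mu, 1.0, 1.0), cdf_numeric(kappa, mu, 1.0, 1.0))
```

Close to $\kappa = -\mu$ the second argument of the first $T$ term grows without bound, so the fixed-grid quadrature used here degrades in that region.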

#### 2.1.3 Limiting Behavior

The following equations, which can be found in owen1956tables can be utilized to find the limits of the integral:

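$$\begin{aligned}
\lim_{x\to-\infty}\Phi(x) &= 0, & \lim_{g\to\infty}T(h,g) &= \frac{1-\Phi(|h|)}{2}, \\
T(-h,g) &= T(h,g), & T(h,-g) &= -T(h,g), \\
\lim_{h\to\infty}T(h,g) &= 0, & T(0,g) &= \frac{\arctan(g)}{2\pi}
\end{aligned}$$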

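The functional value of the cdf as $\kappa$ tends towards $-\infty$ (writing $\kappa$ for $\varphi(\kappa)$, which tends towards $-\infty$ as well) is:

$$\begin{aligned}
\lim_{\kappa\to-\infty}F_\epsilon(\kappa) = {} & \lim_{\kappa\to-\infty}\frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\Bigg[\frac12\Phi(\kappa) + \frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \frac12\mathbb{1}_{(-\infty,0)}\!\left(\frac{a}{\kappa\sqrt{1+b^2}}\right) \\
& \qquad - T\!\left(\kappa,\frac{a+b\kappa}{\kappa}\right) - T\!\left(\frac{a}{\sqrt{1+b^2}},\frac{ab+\kappa(1+b^2)}{a}\right)\Bigg] \\
= {} & \lim_{\kappa\to-\infty}\frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\Bigg[\frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \frac12\mathbb{1}_{(-\infty,0)}\!\left(\frac{a}{\kappa}\right) - T(\kappa,b) - T\!\left(\frac{a}{\sqrt{1+b^2}},\frac{\kappa}{a}\right)\Bigg] \\
= {} & \lim_{\kappa\to-\infty}\frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\Bigg[\frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \frac12\mathbb{1}_{(-\infty,0)}\!\left(\frac{a}{\kappa}\right) + \operatorname{sgn}(a)\left(\frac12 - \frac12\Phi\!\left(\left|\frac{a}{\sqrt{1+b^2}}\right|\right)\right)\Bigg]
\end{aligned}$$

In the case of $a < 0$, the indicator vanishes and $\operatorname{sgn}(a) = -1$:

$$\frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \left(\frac12 - \frac12\Phi\!\left(\left|\frac{a}{\sqrt{1+b^2}}\right|\right)\right) = \frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \left(\frac12 - \frac12\left(1-\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right)\right)\right) = 0$$

If $a > 0$, the indicator equals one and $\operatorname{sgn}(a) = 1$:

$$\frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \frac12 + \frac12 - \frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) = 0$$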

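The functional value of the cdf as $\kappa$ tends towards $\infty$ is:

$$\begin{aligned}
\lim_{\kappa\to\infty}F_\epsilon(\kappa) = {} & \lim_{\kappa\to\infty}\frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\Bigg[\frac12\Phi(\kappa) + \frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \frac12\mathbb{1}_{(-\infty,0)}\!\left(\frac{a}{\kappa\sqrt{1+b^2}}\right) \\
& \qquad - T\!\left(\kappa,\frac{a+b\kappa}{\kappa}\right) - T\!\left(\frac{a}{\sqrt{1+b^2}},\frac{ab+\kappa(1+b^2)}{a}\right)\Bigg] \\
= {} & \frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\Bigg[\frac12 + \frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - \frac12\mathbb{1}_{(-\infty,0)}(a) - \operatorname{sgn}(a)\left(\frac12 - \frac12\Phi\!\left(\left|\frac{a}{\sqrt{1+b^2}}\right|\right)\right)\Bigg]
\end{aligned}$$

Note that $\frac{a}{\sqrt{1+b^2}} = \frac{\mu}{\sigma_u}$. If $a > 0$:

$$\frac{\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right)}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)} = 1$$

If $a < 0$:

$$\frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\left[\frac12 - \frac12 + \frac12\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - (-1)\left(\frac12 - \frac12\left(1-\Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right)\right)\right)\right] = 1$$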

For $\sigma_u \to 0$, $u$ becomes a degenerate random variable, i.e. $u$ deterministically assumes the value $\mu$. Thus $\epsilon = v - \mu$.
Further, if $\sigma_v \to 0$, $v$ becomes a degenerate random variable taking the value $0$. Thus $\epsilon = -u$.

### 2.2 Representation using the Bivariate Normal Distribution

###### Theorem 2.2.

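Let $v \sim N(0, \sigma_v^2)$ and $u \sim N^+(\mu, \sigma_u^2)$ be independent, then it holds that the cdf of $\epsilon = v - u$ can be represented as:

$$F_\epsilon(\kappa) = \frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\,\mathrm{BvN}\!\left(\frac{\frac{\mu\sqrt{\sigma_v^2+\sigma_u^2}}{\sigma_v\sigma_u}}{\sqrt{1+\left(-\frac{\sigma_u}{\sigma_v}\right)^2}},\ \frac{\kappa+\mu}{\sqrt{\sigma_v^2+\sigma_u^2}},\ \rho = \frac{-\left(-\frac{\sigma_u}{\sigma_v}\right)}{\sqrt{1+\left(-\frac{\sigma_u}{\sigma_v}\right)^2}}\right)$$

where $\mathrm{BvN}(\cdot,\cdot,\rho)$ is the cdf of a bivariate normal distribution with correlation parameter $\rho$.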

#### 2.2.1 Proof of Theorem 2.2

A similar approach to the proof of Theorem 2.1 can be used to prove Theorem 2.2.

###### Proof.

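Theorem 2.2 follows by applying the following equation from owen1980table:

$$\begin{aligned}
\int_{-\infty}^{\varphi(\kappa)}\phi(y)\,\Phi(a+by)\,dy &= \frac{1}{2\pi\sqrt{1-\rho^2}}\int_{-\infty}^{\varphi(\kappa)}\int_{-\infty}^{\frac{a}{\sqrt{1+b^2}}}\exp\left[-\frac{r^2-2\rho rs+s^2}{2(1-\rho^2)}\right]dr\,ds \\
&= \mathrm{BvN}\!\left(\frac{a}{\sqrt{1+b^2}},\ \varphi(\kappa),\ \rho = \frac{-b}{\sqrt{1+b^2}}\right)
\end{aligned}$$

to the result of Lemma 1, thus constructing a representation in terms of the cdf of the bivariate normal distribution. Utilizing the substitutions introduced in Lemma 1, the equation simplifies to

$$F_\epsilon(\kappa) = \frac{1}{\Phi\!\left(\frac{\mu}{\sigma_u}\right)}\left[\mathrm{BvN}\!\left(\frac{a}{\sqrt{1+b^2}},\ \varphi(\kappa),\ \rho = \frac{-b}{\sqrt{1+b^2}}\right)\right]$$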

which is referred to as BvN CDF. ∎
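The arguments of the BvN CDF can be simplified further. The short derivation below is added for illustration and is not part of the original statement; it uses only the definitions $a = \mu\sqrt{\sigma_v^2+\sigma_u^2}/(\sigma_v\sigma_u)$ and $b = -\sigma_u/\sigma_v$ from Lemma 1.

```latex
\begin{align*}
\sqrt{1+b^2} &= \frac{\sqrt{\sigma_v^2+\sigma_u^2}}{\sigma_v}, \\
\frac{a}{\sqrt{1+b^2}}
  &= \frac{\mu\sqrt{\sigma_v^2+\sigma_u^2}}{\sigma_v\sigma_u}
     \cdot \frac{\sigma_v}{\sqrt{\sigma_v^2+\sigma_u^2}}
   = \frac{\mu}{\sigma_u}, \\
\rho = \frac{-b}{\sqrt{1+b^2}}
  &= \frac{\sigma_u}{\sigma_v}\cdot\frac{\sigma_v}{\sqrt{\sigma_v^2+\sigma_u^2}}
   = \frac{\sigma_u}{\sqrt{\sigma_v^2+\sigma_u^2}}, \\
F_\epsilon(\kappa)
  &= \frac{1}{\Phi\left(\frac{\mu}{\sigma_u}\right)}\,
     \mathrm{BvN}\left(\frac{\mu}{\sigma_u},\
       \frac{\kappa+\mu}{\sqrt{\sigma_v^2+\sigma_u^2}},\
       \rho = \frac{\sigma_u}{\sqrt{\sigma_v^2+\sigma_u^2}}\right).
\end{align*}
```

In this form the upper limit is immediate: as $\kappa \to \infty$ the second argument drops out, the bivariate cdf reduces to $\Phi(\mu/\sigma_u)$, and the ratio tends to one.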

## 3 Exponential Inefficiency Model

In the following section, two representations of $F_\epsilon(\kappa)$ with $u \sim \mathrm{Exp}(\lambda)$ are introduced, together with their proofs and information on the limiting behavior.

### 3.1 Representation using exp function

###### Theorem 3.1.

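Let $v \sim N(0, \sigma_v^2)$ and $u \sim \mathrm{Exp}(\lambda)$ be independent, then it holds that the cdf of $\epsilon = v - u$ can be represented as:

$$F_\epsilon(\kappa) = 1 + \exp\left\{-\frac{a^2}{2}\right\}\left[\exp\{a\varphi(\kappa)\}\,\Phi(\varphi(\kappa)) - \exp\left\{\frac{a^2}{2}\right\}\Phi(\varphi(\kappa)-a)\right]$$

where $a = -\lambda\sigma_v$ and $\varphi(\kappa) = -\frac{\kappa+\lambda\sigma_v^2}{\sigma_v}$.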

Theorem 3.1 is a direct consequence of the following Lemma:

###### Lemma 2.

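Let $v \sim N(0, \sigma_v^2)$ and $u \sim \mathrm{Exp}(\lambda)$ be independent, then it holds that the cdf of $\epsilon = v - u$ can be represented as:

$$\int_{-\infty}^{\kappa} f_\epsilon(t)\,dt = -a\exp\left\{-\frac{a^2}{2}\right\}\int_{\varphi(\kappa)}^{\infty}\exp\{ay\}\,\Phi(y)\,dy$$

where $a = -\lambda\sigma_v$ and $\varphi(\kappa) = -\frac{\kappa+\lambda\sigma_v^2}{\sigma_v}$.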

Initially Lemma 2 is proven and then it is shown how Theorem 3.1 follows.

#### 3.1.1 Proof of Lemma 2

###### Proof.

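Using standard algebra, the integral of Equation 2 can be rearranged as follows:

$$\begin{aligned}
F_\epsilon(\kappa) &= \lambda\int_{-\infty}^{\kappa}\exp\{\lambda t\}\exp\left\{\frac{\sigma_v^2\lambda^2}{2}\right\}\exp\{-\lambda^2\sigma_v^2\}\exp\{\lambda^2\sigma_v^2\}\,\Phi\!\left(-\frac{t+\lambda\sigma_v^2}{\sigma_v}\right)dt \\
&= \lambda\exp\left\{\frac{\sigma_v^2\lambda^2}{2}-\lambda^2\sigma_v^2\right\}\int_{-\infty}^{\kappa}\exp\{\lambda t+\lambda^2\sigma_v^2\}\,\Phi\!\left(-\frac{t+\lambda\sigma_v^2}{\sigma_v}\right)dt \\
&= \lambda\exp\left\{-\frac{\sigma_v^2\lambda^2}{2}\right\}\int_{-\infty}^{\kappa}\exp\Bigg\{\underbrace{-\lambda\sigma_v}_{a}\left(-\frac{t+\lambda\sigma_v^2}{\sigma_v}\right)\Bigg\}\,\Phi\!\Bigg(\underbrace{1}_{b}\left(-\frac{t+\lambda\sigma_v^2}{\sigma_v}\right)\Bigg)dt
\end{aligned}$$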

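Given the cdf as constructed through the integral of Equation 2, the expression may be simplified by substitution:

$$y = \varphi(t) := -\frac{t+\lambda\sigma_v^2}{\sigma_v},$$

which can be rearranged as:

$$t = -y\sigma_v - \lambda\sigma_v^2$$

The derivative of $y$ w.r.t. $t$ is:

$$\frac{dy}{dt} = -\frac{1}{\sigma_v} \;\Leftrightarrow\; dt = -\sigma_v\,dy.$$

Appropriately transforming the limits of the integral results in:

$$\varphi(\kappa) = -\frac{\kappa+\lambda\sigma_v^2}{\sigma_v}, \qquad \lim_{\kappa\to-\infty}\varphi(\kappa) = \infty$$

Substituting $y$ and $a$ in the integral of Equation 2 then yields Lemma 2:

$$\int_{-\infty}^{\kappa} f_\epsilon(t)\,dt = -a\exp\left\{-\frac{a^2}{2}\right\}\int_{\varphi(\kappa)}^{\infty}\exp\{ay\}\,\Phi(y)\,dy$$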

#### 3.1.2 Proof of Theorem 3.1

###### Proof.

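Theorem 3.1 follows immediately by applying Lemma 2 and the following equation from owen1980table:

$$\int\exp\{ay\}\,\Phi(by)\,dy = \frac{1}{a}\exp\{ay\}\,\Phi(by) - \frac{1}{a}\exp\left\{\frac{a^2}{2b^2}\right\}\Phi\!\left(by-\frac{a}{b}\right)$$

With $b = 1$, this results in a compact representation of the integral in Equation 2:

$$\begin{aligned}
F_\epsilon(\kappa) &= -a\exp\left\{-\frac{a^2}{2}\right\}\left[\frac{1}{a}\exp\{ay\}\,\Phi(by) - \frac{1}{a}\exp\left\{\frac{a^2}{2b^2}\right\}\Phi\!\left(by-\frac{a}{b}\right)\right]_{\varphi(\kappa)}^{\infty} \\
&= 1 + \exp\left\{-\frac{a^2}{2}\right\}\left[\exp\{a\varphi(\kappa)\}\,\Phi(\varphi(\kappa)) - \exp\left\{\frac{a^2}{2}\right\}\Phi(\varphi(\kappa)-a)\right]
\end{aligned}$$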
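The closed form of Theorem 3.1 is straightforward to evaluate. The following Python sketch implements it with the standard library and compares it against a Simpson quadrature of the pdf in Equation 2; the function names, the finite lower bound and the parameter values are assumptions made for illustration.

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def simpson(f, lo, hi, n=8000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def cdf_exp(kappa, lam, sigma_v):
    """Theorem 3.1 with a = -lam*sigma_v, phi = -(kappa + lam*sigma_v^2)/sigma_v."""
    a = -lam * sigma_v
    phi_k = -(kappa + lam * sigma_v**2) / sigma_v
    return 1.0 + math.exp(-0.5 * a * a) * (
        math.exp(a * phi_k) * norm_cdf(phi_k)
        - math.exp(0.5 * a * a) * norm_cdf(phi_k - a))

def cdf_exp_numeric(kappa, lam, sigma_v, lower=-40.0):
    """Reference value: integrate the pdf of Equation 2 up to kappa."""
    def pdf(t):
        return (lam * math.exp(lam * t + 0.5 * (sigma_v * lam) ** 2)
                * norm_cdf(-t / sigma_v - lam * sigma_v))
    return simpson(pdf, lower, kappa)

# Illustrative parameter values (assumed).
for kappa in (-2.0, 0.0, 1.5):
    print(kappa, cdf_exp(kappa, 1.0, 1.0), cdf_exp_numeric(kappa, 1.0, 1.0))
```

For very large positive $\kappa$ the factor $\exp\{a\varphi(\kappa)\}$ can overflow in floating point even though its product with $\Phi(\varphi(\kappa))$ tends to zero, so a robust implementation would likely evaluate that product on the log scale.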

#### 3.1.3 Limiting Behaviour

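The limits can be simplified with

$$\lim_{\kappa\to-\infty}\exp(\kappa) = 0, \quad \lim_{\kappa\to-\infty}\Phi(\kappa) = 0, \quad \lim_{\kappa\to-\infty}\exp(-\kappa)\,\Phi(\kappa) = 0, \quad \lim_{\kappa\to\infty}\exp(\kappa)\,\Phi(-\kappa) = 0.$$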

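The functional value of the cdf as $\kappa$ tends towards $-\infty$ is:

$$\begin{aligned}
\lim_{\kappa\to-\infty}F_\epsilon(\kappa) &= \lim_{\kappa\to-\infty}\left(1 + \exp\left\{-\frac{a^2}{2}\right\}\left[\exp\{a\varphi(\kappa)\}\,\Phi(\varphi(\kappa)) - \exp\left\{\frac{a^2}{2}\right\}\Phi(\varphi(\kappa)-a)\right]\right) \\
&= 1 + \exp\left\{-\frac{a^2}{2}\right\}\left[-\exp\left\{\frac{a^2}{2}\right\}\right] \\
&= 1 + (-1) = 0
\end{aligned}$$

The functional value of the cdf as $\kappa$ tends towards $\infty$ is:

$$\begin{aligned}
\lim_{\kappa\to\infty}F_\epsilon(\kappa) &= \lim_{\kappa\to\infty}\left(1 + \exp\left\{-\frac{a^2}{2}\right\}\left[\exp\{a\varphi(\kappa)\}\,\Phi(\varphi(\kappa)) - \exp\left\{\frac{a^2}{2}\right\}\Phi(\varphi(\kappa)-a)\right]\right) \\
&= 1 + 0 = 1
\end{aligned}$$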

For $\lambda \to \infty$, $u$ becomes a degenerate random variable, i.e. $u$ deterministically assumes the value $0$. Thus $\epsilon = v$.

### 3.2 Representation using the Exponentially Modified Gaussian Distribution

###### Theorem 3.2.

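Let $v \sim N(0, \sigma_v^2)$ and $u \sim \mathrm{Exp}(\lambda)$ be independent, then it holds that the cdf of $\epsilon = v - u$ can be represented as:

$$F_\epsilon(\kappa) = 1 - F_{\epsilon^*}(-\kappa)$$

where $F_{\epsilon^*}$ is the cdf of the Exponentially Modified Gaussian Distribution with parameters $\sigma_v$ and $\lambda$.

An Exponentially Modified Gaussian (EMG) distributed random variable is the sum of independent normal and exponential random variables. Thus

$$\epsilon^* = v + u.$$

The cdf of the EMG with the mean of the Gaussian component being $0$ is:

$$F_{\epsilon^*}(\kappa) = \Phi(\lambda\kappa, 0, \lambda\sigma_v) - \exp\left\{-\lambda\kappa + \frac{(\lambda\sigma_v)^2}{2} + \log\Phi\!\left(\lambda\kappa, (\lambda\sigma_v)^2, \lambda\sigma_v\right)\right\}$$

where $\Phi(x, m, s)$ denotes the cdf of a normal distribution with mean $m$ and standard deviation $s$, evaluated at $x$.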

#### 3.2.1 Proof of Theorem 3.2

###### Proof.

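Since the random variable $v$ is symmetric around zero, $v$ and $-v$ follow the same distribution, i.e. $v \sim N(0, \sigma_v^2)$ and $-v \sim N(0, \sigma_v^2)$. Consequently

$$-\epsilon = -v + u \overset{d}{=} v + u = \epsilon^*.$$

Thus

$$F_\epsilon(\kappa) = P(\epsilon\le\kappa) = P(-\epsilon\ge-\kappa) = 1 - P(-\epsilon\le-\kappa) = 1 - F_{-\epsilon}(-\kappa) = 1 - F_{\epsilon^*}(-\kappa).$$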

This distribution is referred to as EmG CDF. ∎
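The equivalence of the two exponential representations can be illustrated numerically. In the Python sketch below, the EMG cdf with zero Gaussian mean is written in terms of the standard normal cdf, which is an assumption of this example: it corresponds to the formula above after standardizing both normal cdfs. The function names and parameter values are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def emg_cdf(x, lam, sigma):
    """cdf of v + u with v ~ N(0, sigma^2) and u ~ Exp(lam)."""
    return (norm_cdf(x / sigma)
            - math.exp(-lam * x + 0.5 * (lam * sigma) ** 2)
            * norm_cdf(x / sigma - lam * sigma))

def cdf_exp_via_emg(kappa, lam, sigma_v):
    """Theorem 3.2: F_eps(kappa) = 1 - F_eps*(-kappa)."""
    return 1.0 - emg_cdf(-kappa, lam, sigma_v)

def cdf_exp_theorem31(kappa, lam, sigma_v):
    """Theorem 3.1, used here only as a cross-check."""
    a = -lam * sigma_v
    phi_k = -(kappa + lam * sigma_v**2) / sigma_v
    return 1.0 + math.exp(-0.5 * a * a) * (
        math.exp(a * phi_k) * norm_cdf(phi_k)
        - math.exp(0.5 * a * a) * norm_cdf(phi_k - a))

# Illustrative parameter values (assumed): both columns should coincide.
for kappa in (-2.0, 0.0, 1.5):
    print(kappa, cdf_exp_via_emg(kappa, 2.0, 0.5), cdf_exp_theorem31(kappa, 2.0, 0.5))
```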

## 4 Simulation

The results are validated by comparing the values of the analytical cdfs to the values of the empirical cdf. The empirical cdf was constructed by drawing random numbers from the distributions of $v$ and $u$. The cdfs were evaluated at the empirical quantiles $Q^*(p)$ for all combinations of the following parameter values:

$$\begin{aligned}
\mu &\in \{-8,-4,-2,-1,1,2,4,8\}, & \sigma_u &\in \{0.25,0.5,1,2,4\}, \\
\lambda &\in \{0.25,0.5,1,2,4,8\}, & \sigma_v &\in \{0.25,0.5,1,2,4\}
\end{aligned}$$

The accuracy of the implementation is defined as:

$$F_\epsilon(Q^*(p)) - F^*_\epsilon(Q^*(p)) = F_\epsilon(Q^*(p)) - p,$$

where $F^*_\epsilon$ denotes the empirical cdf. For each simulation scenario, the same number of observations was generated as in amsler2019evaluating. Further, the accuracy and computation time of the representations of the cdfs are compared to numerical integration. Here, the double exponential integration method was chosen, as it is considered a fairly good numerical integration method by Weisstein.
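The validation design can be sketched for the exponential model as follows. This Python example is only an illustration under assumptions: the sample size, the seed, the simple order-statistic quantile and the chosen probability levels are hypothetical and do not reproduce the settings of the study, which was carried out in R.

```python
import math
import random

def norm_cdf(x):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def cdf_exp(kappa, lam, sigma_v):
    """Analytical cdf of eps = v - u according to Theorem 3.1."""
    a = -lam * sigma_v
    phi_k = -(kappa + lam * sigma_v**2) / sigma_v
    return 1.0 + math.exp(-0.5 * a * a) * (
        math.exp(a * phi_k) * norm_cdf(phi_k)
        - math.exp(0.5 * a * a) * norm_cdf(phi_k - a))

def accuracy(p, lam, sigma_v, n=20000, seed=1):
    """F_eps(Q*(p)) - p, with Q*(p) an empirical quantile of simulated eps."""
    rng = random.Random(seed)
    # eps = v - u with v ~ N(0, sigma_v^2) and u ~ Exp(lam) (rate lam)
    draws = sorted(rng.gauss(0.0, sigma_v) - rng.expovariate(lam)
                   for _ in range(n))
    q = draws[min(n - 1, int(p * n))]  # simple order-statistic quantile
    return cdf_exp(q, lam, sigma_v) - p

# Illustrative probability levels and parameters (assumed).
for p in (0.1, 0.5, 0.9):
    print(p, accuracy(p, lam=1.0, sigma_v=1.0))
```

The reported differences mix the error of the analytical formula with the sampling error of the empirical quantile, so they shrink only at the Monte Carlo rate as the sample size grows.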

### 4.1 Simulation Results for the Truncated Normal Inefficiency Model

For the truncated normal inefficiency model, the derived formulas are identical in theory, but the accuracy of the numerical implementation depends on the implementations of Owen's T function and of the bivariate normal cdf. (The statistical software R was utilized. The truncated normal distributed random numbers were generated using the implementation of an accept-reject sampler in the package truncnorm. The Owen's T function of the pracma package and the cdf of the bivariate normal distribution of the pbivnorm package were used.) Figure 1 summarises the differences over all parameter combinations for different values of $p$ for the representations Owen CDF and BvN CDF.

Figure 1: Out of a total of 1800 evaluation points, 269 outliers of the Owen CDF and 12 outliers of the BvN CDF are not displayed. The observed loss of accuracy for values close to $-\mu$ is due to the described singularity. The BvN CDF yields more accurate results.

The implementation of the BvN CDF is more accurate in this simulation. Thus, in the further analysis the Owen CDF is neglected.

Table 1 presents the accuracy and computation time of the numerical integration implementation relative to the BvN CDF. (For the numerical integration, the function quadinf of the pracma package was used. To measure the time, the package microbenchmark was utilized.)

The results show that the BvN CDF is faster in terms of computation time.

### 4.2 Simulation Results for the Exponential Inefficiency Model

For the exponential inefficiency model, both representations seem identical in terms of accuracy. The simulation results are shown in Figure 2. (For the cdf of the EmG distribution, the package emg was used.)

Figure 2: The accuracy of both implementations does not seem to differ.

The implementation of the EmG CDF is slightly faster. Thus the further analysis will focus on this representation.

The results in Table 2 show that the EmG CDF is faster to compute than the numerical integration, while its accuracy is close to that of the numerical integration.

A more detailed table of the simulation results for the BvN CDF and EmG CDF is presented in Section 6.1.

## 5 Conclusions

The contributions of this paper are analytical expressions for the cdf of the composed error term when the inefficiency term follows a truncated normal or exponential distribution. For the truncated normal inefficiency model, the cdf can be written as the Owen CDF or the BvN CDF, which are analytically the same, but the numerical implementation of the latter is more accurate. In the exponential inefficiency model, the cdf is written as the Erf CDF or the EmG CDF, which yield similar results in terms of accuracy, with the second being faster to compute. The analytical representations of the cdfs allow for accurate and fast evaluation.

###### Acknowledgements.
The authors would like to thank Alexander Ritz for his insights and helpful comments. His mathematical support played an integral part in the derivation of the presented work. The authors received financial support from the German Research Foundation (DFG) within the research project KN 922/9-1.

## Conflict of interest

The authors declare that they have no conflict of interest.