1 Introduction
On June 22, 1919, a law was passed in Czechoslovakia introducing the obligation to establish a library in each municipality. One hundred years have passed and the Czech Republic and Slovakia have one of the densest networks of public libraries in the world. There was one public library for every citizens in the Czech Republic in 2017. Such a large number of libraries requires well-advised management and careful allocation of public resources.
In this study, we analyze technical efficiency of Czech public libraries established by municipalities. We follow the data envelopment analysis (DEA) approach pioneered by Charnes et al. (1978) and Banker et al. (1984). DEA is a nonparametric method measuring how efficiently decision making units (DMUs) can transform a set of inputs to a set of outputs. We utilize the Chebyshev distance DEA model with variable returns to scale recently proposed by Hladík (2019). This model is based on the robust optimization viewpoint and has many desirable properties – super-efficiency, comparability of efficiency scores across different analyses, inclusion of zero inputs and outputs, units invariance, order of rankings identical to the classical approach and straightforward interpretability.
There are many studies in the literature assessing technical efficiency of libraries. We select input and output variables consistently with the literature. Specifically, we consider total expenditures, employees and book collection as inputs with registrations, book circulation, events attendance and collection additions as outputs. The most similar studies in terms of inputs and outputs are Reichmann (2004), Miidla and Kikas (2009) and Shahwan and Kaba (2013). Our paper is, however, unique in the sample size – we analyze municipal libraries in total. For comparison, the average sample size is 73 in the 16 studies we review in Tables 1 and 2. Such a large data sample allows us to thoroughly investigate the impact of the operating environment on performance of libraries. We consider three possible environmental variables – population of municipality, population density and distance to municipality with extended powers.^{1} Using regression analysis, we find that the efficiency score is significantly increasing with population. Extremely small villages are the exception as they tend to have higher efficiency score than villages with slightly higher population due to their very low and often zero inputs. We also find that for smaller villages the efficiency score is decreasing with distance to municipality with extended powers. Population density is insignificant in our analysis. Motivated by these results, we split the sample of libraries into 11 categories using decision tree analysis. We perform DEA separately for each category, filtering out the influence of heterogeneous operating environment. This also decreases the discriminatory power of DEA which is very high in the preliminary analysis due to large sample size. The effect of distance is removed but the effect of population is not completely eliminated although it is reduced. This means that the distance can be safely treated as an environmental variable while the population requires a more cautious approach as it is partially an environmental and partially an explanatory variable. Our proposed separation approach is quite suitable for this situation in contrast to the all-in-one, two-stage and multistage models that would take population as a strictly environmental variable (see e.g. Yang and Pollitt, 2009; De Witte and Marques, 2010). We also perform DEA for expert-defined categories and find that the proposed separation approach is robust to specification of subsamples to a certain degree. Our study contributes to the field of two-stage efficiency analysis – one of the four active research fronts in DEA according to Liu et al. (2016).
^{1} The Czech Republic is divided into 8 cohesion regions (NUTS 2 – region soudržnosti), 14 regions (NUTS 3 – kraj), 77 districts (LAU 1 – okres), 206 municipalities with extended powers (obec s rozšířenou působností), 393 municipalities with authorized municipal office (obec s pověřeným obecním úřadem) and municipalities (LAU 2 – obec) as of April 1, 2019.
The rest of the paper is structured as follows. In Section 2, we review the literature dealing with DEA and efficiency of libraries. In Section 3, we describe the Chebyshev distance DEA model used in the first stage and the regression model with decision tree model for analyzing efficiency scores used in the second stage. In Section 4, we compute efficiency scores of Czech public libraries in the year 2017 and investigate the impact of the operating environment. We conclude the paper in Section 5.
Paper:  Chen (1997) 

Sample:  23 University Libraries in Taipei, Taiwan 
Inputs:  Operating Expenditures, Employees, Area 
Outputs:  Visits, Circulation, Inter-Library Circulation, Consultations 
Paper:  Sharma et al. (1999) 
Sample:  47 Public Libraries in Hawaii, United States 
Inputs:  Operating Expenditures, Employees, Collection, Days Open 
Outputs:  Visits, Circulation, Consultations 
Paper:  Chen et al. (2005) 
Sample:  23 Public Libraries in Tokyo, Japan 
Inputs:  Employees, Collection, Area, Population 
Outputs:  Registrations, Circulation 
Paper:  Miidla and Kikas (2009) 
Sample:  20 Central Public Libraries in Estonia 
Inputs:  Operating Expenditures, Personnel Expenditures, Collection, Area 
Outputs:  Registrations, Circulation 
Paper:  Reichmann and Sommersguter-Reichmann (2010) 
Sample:  68 University Libraries in North America, Austria and Germany 
Inputs:  Employees, Collection 
Outputs:  Circulation, Collection Additions, Serial Subscriptions 
Paper:  Simon et al. (2011) 
Sample:  34 University Libraries in Spain 
Inputs:  Operating Expenditures, Employees, Area 
Inter.:  Collection, Serial Subscriptions, Opening Hours, Seats 
Outputs:  Circulation, Inter-Library Circulation, Downloads 
Paper:  De Carvalho et al. (2012) 
Sample:  37 University Libraries in Rio de Janeiro, Brazil 
Inputs:  Employees, Collection, Area 
Outputs:  Registrations, Visits, Circulation, Consultations 
Paper:  Shahwan and Kaba (2013) 
Sample:  11 Academic Libraries in the Arab States of the Gulf 
Inputs:  Total Expenditures, Employees, Collection 
Outputs:  Registrations, Circulation, Collection Additions 
Paper:  Stroobants and Bouckaert (2014) 
Sample:  13 Local Public Libraries in Flanders, Belgium 
Inputs:  Total Expenditures / Operating Expenditures, Employees 
Outputs:  Circulation / Circulation, Opening Hours 
Paper:  Srakar et al. (2017) 
Sample:  58 Public General Libraries in Slovenia 
Inputs:  Total Expenditures, Employees, Area, Ratio of Service Points to Potential Users 
Outputs:  Registrations, Visits / Circulation / Equipment / Events, Events Attendance 
Paper:  Guccio et al. (2018) 
Sample:  44 Public State Libraries in Italy 
Inputs:  Non-Personnel Expenditures, Employees, Shelf Size, Seats 
Inter.:  Book, Manuscript, Periodical and Other Collections, Assets Value 
Outputs:  Visits, Circulation, Inter-Library Circulation, Consultations 
Paper:  Vitaliano (1998) 

Sample:  184 Public Libraries in New York, United States 
Inputs:  Collection, Collection Additions, Serial Subscriptions, Opening Hours 
Outputs:  Circulation, Consultations 
Paper:  Hammond (2002) 
Sample:  99 Public Libraries in the United Kingdom 
Inputs:  Collection, Collection Additions, Serial Subscriptions, Opening Hours 
Outputs:  Circulation, Consultations, Requests 
Paper:  Reichmann (2004) 
Sample:  118 University Libraries in English-Speaking and German-Speaking Countries 
Inputs:  Employees, Collection 
Outputs:  Circulation, Opening Hours, Collection Additions, Serial Subscriptions 
Paper:  De Witte and Geys (2011) 
Sample:  290 Municipal Public Libraries in Flanders, Belgium 
Inputs:  Operating Expenditures, Personnel Expenditures, Infrastructure Expenditures 
Inter.:  Youth Book Collection, Book Collection, Media Collection, Opening Hours 
Paper:  Vrabková and Friedrich (2019) 
Sample:  92 Public Libraries in the Czech Republic and Slovakia 
Inputs:  Employees, Collection, Collection Additions, Events, Opening Hours 
Outputs:  Visits 
2 Literature Review
2.1 Data Envelopment Analysis
Data envelopment analysis (DEA) is a nonparametric method for the estimation of the production frontier (or, more precisely, the best-practice frontier) introduced by Charnes et al. (1978). It measures technical efficiency of a decision making unit (DMU) relative to other units in the sample. The units that form the frontier are classified as efficient while the units not on the frontier are considered inefficient. Inefficient units are further assigned an efficiency score measuring their shortcomings. The efficiency classification as well as the efficiency score is determined based on how efficiently a unit can transform a set of inputs to a set of outputs. The original model of Charnes et al. (1978), denoted as the CCR model, utilizes constant returns to scale (CRS), i.e. it is assumed that an increase in inputs results in a proportionate increase in outputs. Variable returns to scale (VRS) relax this assumption and are utilized in the model of Banker et al. (1984), denoted as the BCC model. Many more models addressing various issues in DEA are proposed in the literature. A particularly convenient and elegant model is the Chebyshev distance model of Hladík (2019). It is based on the robust optimization viewpoint and has many attractive properties such as super-efficiency, i.e. the ability to assign scores to efficient units, and natural normalization, i.e. comparability of efficiency scores across different analyses. For a survey of DEA theory, see Cook and Seiford (2009).
DEA is a very popular benchmarking tool in operations research and has a wide range of applications including but not limited to banking (Fukuyama and Matousek, 2017), business (Shabani et al., 2019), agriculture (Atici and Podinovski, 2015), transportation (Wu et al., 2016), health care (Ozcan and Khushalani, 2017), education (Jablonsky, 2016), research (Holý and Šafr, 2018) and sport (Jablonsky, 2018). For a survey of DEA applications, see Liu et al. (2013).
Procedures for the practical use of DEA with its pitfalls are presented in Golany and Roll (1989), Boussofiane et al. (1991), Dyson et al. (2001) and Cook et al. (2014). One particular issue many studies face is a heterogeneous operating environment. For DEA to make sense, however, the operating environment should be homogeneous. There exist several approaches for dealing with a heterogeneous operating environment in the literature. For a review of such methods, see Yang and Pollitt (2009) and De Witte and Marques (2010). We briefly describe the four most commonly used methods for DEA. The separation approach splits the heterogeneous data sample into several homogeneous subsamples according to one or more environmental variables and performs DEA separately for each subsample. The advantage of this approach is its simplicity and straightforward interpretability. However, it significantly reduces the sample size, making it unusable in many studies. The all-in-one model directly includes environmental variables in DEA as inputs or outputs. The two-stage model adjusts the efficiency scores based on the dependence between preliminary efficiency scores and environmental variables using regression analysis. The multistage model regresses input slacks on environmental variables, adjusts inputs and finally performs DEA with adjusted inputs. The latter three models are more sophisticated and do not reduce sample size but are more cumbersome to interpret.
Whether theoretical, applicational or practical, the literature dealing with DEA is very extensive and still growing. Emrouznejad and Yang (2018) report a listing of scientific articles related to DEA from the seminal paper of Charnes et al. (1978) to 2016. Liu et al. (2016) identify the research activities (or the research fronts) in DEA from 2000 to 2014.
2.2 Efficiency of Libraries
One of the possible uses of DEA is assessing the efficiency of public or university libraries in a given area at a given time. We review 16 papers dealing with efficiency of libraries. The overview of papers is presented in Tables 1 and 2. Most studies utilize the classical CCR or BCC DEA models although some studies adopt the free disposal hull (FDH) approach. Simon et al. (2011) and Guccio et al. (2018) consider intermediate outputs and adopt network DEA with two steps. De Witte and Geys (2011) focus only on the first step that produces intermediate outputs. We compare all 16 studies based on the sample size, selection of the inputs and outputs and treatment of the operating environment.
The sample size of the reviewed studies ranges from 11 to 290. Five papers, namely Vitaliano (1998), Hammond (2002), Reichmann (2004), De Witte and Geys (2011) and Vrabková and Friedrich (2019), have medium sample size ranging from 92 to 290 while the rest have small sample size ranging from 11 to 68.
The reviewed studies utilize up to 5 inputs and up to 4 outputs. The most common inputs are the number of employees or personnel expenditures (87.50% of studies), book or other collections (62.50% of studies), variables related to expenditures (56.25% of studies) and the area of library (37.50% of studies). The most common outputs are the circulation or the number of loans (93.33% of studies), the number of visits (40.00% of studies), the number of consultations (40.00% of studies) and the number of registrations (33.33% of studies). The shares of outputs are taken over the 15 studies that report final outputs, as De Witte and Geys (2011) consider only intermediate outputs. The number of additions to collection, the opening hours and the number of serial subscriptions appear less often in the literature and in some studies are considered as inputs while in others as outputs or intermediate outputs.
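The reported shares can be checked with a short calculation. The following Python sketch is only an illustration: the counts are our own tally of the rows of Tables 1 and 2 (an assumption, not figures stated by the reviewed papers), and the variable names are hypothetical.

```python
# Counts are a manual tally of Tables 1 and 2 (assumption of this sketch).
N_STUDIES = 16        # all reviewed studies (denominator for inputs)
N_WITH_OUTPUTS = 15   # De Witte and Geys (2011) list intermediate outputs only

input_counts = {
    "employees or personnel expenditures": 14,
    "collection": 10,
    "expenditures": 9,
    "area": 6,
}
output_counts = {
    "circulation": 14,
    "visits": 6,
    "consultations": 6,
    "registrations": 5,
}


def share(count, total):
    """Percentage of studies using a variable, rounded to two decimals."""
    return round(100.0 * count / total, 2)


input_shares = {name: share(c, N_STUDIES) for name, c in input_counts.items()}
output_shares = {name: share(c, N_WITH_OUTPUTS) for name, c in output_counts.items()}
```

Under this tally, the shares of inputs are fractions of 16 studies and the shares of outputs are fractions of 15 studies, which is why percentages such as 93.33% and 87.50% appear side by side.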
Some of the studies consider the operating environment to a certain degree. Sharma et al. (1999), Reichmann (2004), Chen et al. (2005), Miidla and Kikas (2009), Reichmann and Sommersguter-Reichmann (2010), Stroobants and Bouckaert (2014) and Vrabková and Friedrich (2019) analyze behavior of libraries in several predefined groups and compare their efficiency scores. Srakar et al. (2017) follow a similar approach but cluster libraries according to their efficiency and size with additional spatial constraints. Vitaliano (1998) uses the tobit regression to model efficiencies and finds that they are positively dependent on population, negatively on wages of the directors and positively on town or village associations. Hammond (2002) includes population density and accessibility measures in the DEA model as non-discretionary inputs. De Witte and Geys (2011) employ the conditional efficiency model and find that the efficiency increases with left-wing ideological stance of the local government, wealth of the population, population density and local funding.
3 Methodology
3.1 Chebyshev Distance Data Envelopment Analysis
To obtain technical efficiencies, we utilize the Chebyshev distance DEA with variable returns to scale (VRS) proposed by Hladík (2019). The inputs and the outputs of all DMUs are arranged in two nonnegative matrices with one row per DMU. For a given DMU, we work with its own row of inputs and outputs and with the two matrices from which this row is removed, i.e. the inputs and outputs of every other DMU.
As in classical DEA models, the problem of measuring efficiency of a DMU is formulated as finding the optimal weights of input and output variables with respect to the other DMUs. Note that each DMU has its own optimization problem. The idea of the Chebyshev distance DEA is to rank DMUs based on robustness of efficiency or inefficiency classification to variations of input and output data using the Chebyshev distance. Specifically, the resulting efficiency score of a DMU is obtained from the optimal value of δ_i in the optimization problem
δ_i  (1)  
such that  
where the decision variables are the weights of inputs, the weights of outputs and an auxiliary variable used for ensuring VRS. The above formulation is a nonlinear optimization problem which Hladík (2019) further proposes to linearize. Let us reparametrize the weights and the VRS variable as
(2) 
The linear approximation of (1) is then given by
δ_i  (3)  
such that  
Hladík (2019) shows in several examples that the linear approximation (3) is quite precise and can be effectively utilized in practice.
The efficiency scores lie in the interval [0, 2] whether given by the original nonlinear optimization problem (1) or its linear approximation (3). Values lower than 1 indicate inefficient DMUs while values of at least 1 indicate efficient DMUs. The Chebyshev distance DEA further possesses the following properties:

Robust Interpretation: The efficiency scores of the Chebyshev distance DEA indicate how DMUs are sensitive to changes in their inputs and outputs. Specifically, the efficiency scores for inefficient DMUs are the smallest possible variations of all inputs and outputs causing efficiency in terms of the Chebyshev distance while the efficiency scores for efficient DMUs are the largest possible variations of all inputs and outputs preserving efficiency.

Super-Efficiency: As noted above, the Chebyshev distance DEA ranks inefficient as well as efficient DMUs. In contrast, the basic formulation of the classical DEA allows only for ranking inefficient DMUs.

Normalization: The efficiency scores of the Chebyshev distance DEA are naturally normalized due to their robust interpretation. Therefore, the efficiency scores can be compared across different analyses.

Non-Negativeness: Unlike the classical DEA, the Chebyshev distance DEA allows for zero inputs and zero outputs.

Units Invariance: Similarly to the classical DEA, the inputs and outputs can be arbitrarily scaled without affecting the efficiency scores of the Chebyshev distance DEA model. Therefore, it does not matter in which units the inputs and outputs are measured.

Ranking Order: The classification into efficient and inefficient DMUs as well as the order of inefficient DMUs according to their efficiency score is exactly the same in the Chebyshev distance DEA model as in the classical CCR model (or the BCC model when assuming VRS). The values of the efficiency scores, however, differ.
3.2 Analysis of Efficiency Scores in the Second Stage
We utilize the linear regression for modeling efficiency scores in the same way as Holý and Šafr (2018). The values of the regressors are arranged in a design matrix with one row per DMU. As efficiency scores of the Chebyshev distance DEA are bounded from below by 0 and from above by 2, we resort to the regression model with the logistic function
(4) 
with unknown parameters. Next, we transform the efficiency scores and arrive at the linear regression model
(5) 
Note that we assume that the observations are independent. This is clearly not the case as there is inherent dependency between the efficiency scores obtained by DEA. Serial correlation affects mainly the inference while the estimate of coefficients remains unbiased and consistent. As studied by Simar and Wilson (2007), the dependency structure is complex and unknown but disappears asymptotically. Our data sample is quite large and we therefore resort to the independence simplification as most studies do.
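The idea of regressing a bounded score through a logistic link can be illustrated with a small Python sketch. It assumes that the transformation is the scaled logit ln(r / (2 - r)) for a score r between 0 and 2, with the scaled logistic function 2 / (1 + exp(-y)) as its inverse; this specific form, the clipping constant, the function names and the toy data are assumptions of the sketch and are not taken from the text.

```python
import math

UPPER = 2.0  # upper bound of the Chebyshev distance DEA score


def to_unbounded(score, eps=1e-6):
    """Map a score from (0, 2) to the real line via ln(score / (2 - score)).

    Scores exactly at 0 or 2 are clipped by eps (an assumption of this sketch).
    """
    s = min(max(score, eps), UPPER - eps)
    return math.log(s / (UPPER - s))


def to_score(value):
    """Inverse map: scaled logistic function 2 / (1 + exp(-value))."""
    return UPPER / (1.0 + math.exp(-value))


def ols_simple(x, y):
    """Least-squares intercept and slope for a single regressor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical toy data: one regressor and efficiency scores in (0, 2).
regressor = [5.0, 6.0, 7.0, 8.0, 9.0]
scores = [0.10, 0.15, 0.30, 0.55, 0.90]

transformed = [to_unbounded(s) for s in scores]
intercept, slope = ols_simple(regressor, transformed)
fitted = [to_score(intercept + slope * x) for x in regressor]
```

Fitting on the transformed scale and mapping back keeps every fitted score strictly between 0 and 2, which an ordinary linear regression on the raw scores would not guarantee.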
We further analyze efficiency scores using the decision tree approach. To build the decision tree, we adopt the RPART routine of Therneau and Atkinson (2019). Again, we analyze the dependency of the efficiency scores on the regressors.
4 Empirical Study
4.1 Data Sample
We analyze efficiency of public libraries established by Czech municipalities during the year 2017. In total, there are public libraries in 2017. Of these, are established by municipalities excluding Prague and by municipal and administrative districts of Prague. The remaining libraries include the National Library of the Czech Republic, the Moravian Library in Brno, the 13 regional libraries, libraries established by districts, etc. We focus only on the municipal libraries outside the capital. In our data, 2.71% of libraries have some observations missing. We remove these libraries from the analysis. Our data sample therefore consists of municipal libraries with no missing data. We have data available for the years 2016 and 2017. The two-year history allows us to utilize aggregated values and first differences in the analysis.
For in-depth statistics about public libraries in the Czech Republic, we refer to the National Information and Consulting Centre for Culture (NIPOS).
4.2 Variable Selection
In our study, we utilize 10 variables in total. Descriptive statistics of the variables are reported in Table 3. The correlation matrix is illustrated in Figure 1. All variables except the town distance are strongly positively correlated while the town distance is moderately negatively correlated with the others. For the efficiency analysis, we consider the following input variables:
Total Expenditures: The total expenditures in CZK of the municipality on library activities (class 3314 in the sectoral classification of budget structure) in 2016 and 2017. We aggregate the expenditures over two years to capture long-term investments and smooth out annual budget changes. The data source is the information portal MONITOR of the Ministry of Finance of the Czech Republic.

Employees: The number of full-time equivalents of library employees in 2017. Note that 64.07% of libraries have no employees of their own as very small libraries are run either by employees of the municipal office or by volunteers. The data source is NIPOS.

Collection: The total number of book units owned by the library in 2016. This variable represents the capital of the library. We use the value from the previous year as we consider the increase in book collection in the current year to be an output variable reflecting the performance of the library management. The data source is NIPOS.
We denote the input variables respectively as , and , . Further inputs such as the area of the library, the equipment, more detailed expenditures or more detailed collection could also be utilized. Unfortunately, we do not have these variables available in our data.
We consider the following output variables:

Registrations: The total number of users registered in the library in 2017. This variable captures the size of the reader base. The data source is NIPOS.

Circulation: The total number of book loans in 2017. This variable captures the main activity of libraries – book lending. The data source is NIPOS.

Events Attendance: The total number of visitors of events organized by the library in 2017. This variable captures the cultural role of libraries. Many libraries do not organize any events while others offer a regular cultural program. The data source is NIPOS.

Collection Additions: The positive part of the difference between the book collection in 2017 and 2016. This variable captures the increase of the capital of libraries. According to Table 3, the book collection of 50.56% of libraries remains the same as in 2016 or in some cases even decreases. The data source is NIPOS.
We denote the output variables respectively as , , and , . Further outputs such as the number of visits, the number of consultations, the opening hours, the interlibrary circulation or various measures of the internet activity could also be utilized. However, we do not have these variables available in our data.
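The construction of the inputs and outputs from the two years of raw data can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical record layout; the field names and the example values are invented for the sketch and do not come from the MONITOR or NIPOS data.

```python
def positive_part(value):
    """Return the value if it is positive and zero otherwise."""
    return max(value, 0)


def build_variables(record):
    """Derive DEA inputs and outputs from a raw library record.

    The keys of `record` are hypothetical names chosen for this sketch.
    """
    inputs = {
        # Expenditures are aggregated over 2016 and 2017.
        "total_expenditures": record["expenditures_2016"] + record["expenditures_2017"],
        "employees": record["employees_2017"],
        # The collection is taken from the previous year.
        "collection": record["collection_2016"],
    }
    outputs = {
        "registrations": record["registrations_2017"],
        "circulation": record["circulation_2017"],
        "events_attendance": record["events_attendance_2017"],
        # Only increases of the collection count as an output.
        "collection_additions": positive_part(
            record["collection_2017"] - record["collection_2016"]
        ),
    }
    return inputs, outputs


# Hypothetical library whose collection shrank between 2016 and 2017.
example = {
    "expenditures_2016": 120000, "expenditures_2017": 130000,
    "employees_2017": 0.0,
    "collection_2016": 5200, "collection_2017": 5100,
    "registrations_2017": 85, "circulation_2017": 2400,
    "events_attendance_2017": 0,
}
example_inputs, example_outputs = build_variables(example)
```

In the example, the shrinking collection yields zero collection additions rather than a negative output, and the library without its own employees has a zero input, a case the Chebyshev distance DEA is able to handle.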
Finally, we consider the following 3 variables potentially describing the environment in which libraries operate:

Population: The number of inhabitants of the municipality as of January 1, 2018. The data source is the Czech Statistical Office (CSO). We denote this variable as , .

Population Density: The number of inhabitants of the municipality per hectare as of January 1, 2018. The data source is CSO. We denote this variable as , .

Town Distance: The travel time by car in minutes to the municipality with extended powers.^{2} The data source is the web mapping service Mapy.cz. We denote this variable as , .
^{2} We have also considered different specifications of distance and reference town. Instead of the travel time, we have tried the air distance and road distance. Instead of the municipality with extended powers, we have tried the district capital (LAU 1 – okresní město), regional capital (NUTS 3 – krajské město), town with population higher than and city with general significance. All combinations of distances and reference towns have led to weaker results.
Not.  Variable  Min.  Max.  Mean  Std. Dev.  Zeros 

Total Expenditures  
Employees  
Collection  
Registrations  
Circulation  
Event Attendance  
Collection Additions  
Population  
Population Density  
Town Distance 
4.3 Preliminary Efficiency Analysis
First, we apply the presented Chebyshev distance DEA with selected inputs and outputs to the full dataset of libraries. We denote this as the preliminary efficiency analysis. Note that we consider VRS as there are huge differences in sizes of libraries and we do not assume proportional changes in inputs and outputs. Returns to scale can then be either increasing, decreasing or even constant. The estimated density function of preliminary efficiency scores is illustrated in Figure 2. For the estimation of the density, we utilize the Gaussian kernel. As expected for such a large dataset, most libraries are inefficient with a very low efficiency score. Specifically, 98.45% of all units are inefficient with mean score 0.1916 and median score 0.0999.
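A density estimate with the Gaussian kernel can be sketched in a few lines of Python. The bandwidth rule below (the normal reference rule 1.06 · sd · n^(-1/5)) and the toy scores are assumptions of the sketch, since the text does not state how the bandwidth is chosen.

```python
import math
import statistics


def normal_reference_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (an assumed choice)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)


def gaussian_kde(sample, bandwidth):
    """Return a function evaluating the Gaussian kernel density estimate."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return density


# Hypothetical efficiency scores, mostly low as in the preliminary analysis.
toy_scores = [0.05, 0.08, 0.10, 0.12, 0.20, 0.35, 0.60, 1.10]
bandwidth = normal_reference_bandwidth(toy_scores)
density = gaussian_kde(toy_scores, bandwidth)

# Riemann-sum check that the estimate integrates to approximately one.
step = 0.01
total_mass = sum(density(-3.0 + i * step) for i in range(800)) * step
```

Note that the Gaussian kernel places some mass outside the interval from 0 to 2 even though all scores lie inside it, so a plotted density may slightly extend beyond the admissible range.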
In the next steps, we improve this preliminary approach and focus on two issues – the operational environment and the discriminatory power. We investigate whether our sample of units is homogeneous (i.e. all libraries operate within the same environment) or heterogeneous (i.e. libraries operate under different conditions). Based on our findings, we divide the full sample into several smaller categories according to the environmental influences. This not only ensures homogeneity but also reduces the overly strict discriminatory power.
4.4 Dependence on Explanatory Variables
We study the influence of the population, the population density and the town distance on the transformed preliminary efficiency score using the linear regression. We arrive at the model formulation^{3}
(6) 
whose coefficients are the unknown parameters. Results of the regression model are reported in Table 4. For the preliminary efficiency scores, all regressors are statistically significant at any reasonable significance level. The model explains 22.90% of the variance in the dependent variable.
^{3} Before arriving at this final model, we have tried several specifications of the regression model including all three variables with logarithmic and power transformations as well as various interactions.
The above regression model has the following interpretation. The efficiency score increases with population as the coefficient of the population term is positive. For very small population, however, the efficiency score also increases as the coefficient of the second population term is also positive. Finally, the efficiency score increases with decreasing town distance as the coefficient of the distance term is negative. This relation is more distinctive for smaller population as the town distance is divided by the population. We do not include the population density in the final model as it is not significant in any transformation.
The regression model describes the relationship between the efficiency score and possible environmental factors. However, it does not tell us whether the population and town distance cause change in the efficiency and can be considered as environmental factors.
Model  Coeff.  Regressor  Estimate  Std. Error  t-Statistic  p-Value 

Preliminary  Intercept  24.0894  1.4866  16.2049  0.0000  
1.9496  0.1082  18.0230  0.0000  
54.3415  5.1511  10.5495  0.0000  
2.7975  0.7090  3.9456  0.0001  
Decision Tree  Intercept  15.6972  2.2017  7.1295  0.0000  
1.3447  0.1602  8.3933  0.0000  
35.5161  7.6292  4.6553  0.0000  
0.2859  1.0501  0.2723  0.7854  
Expert  Intercept  21.8288  2.2922  9.5233  0.0000  
1.8568  0.1668  11.1323  0.0000  
53.6474  7.9426  6.7544  0.0000  
1.0758  1.0932  0.9841  0.3251 
4.5 Efficiency Analysis with Decision Tree Categories
The regression model indicates dependency of the efficiency score on the population and town distance. We further support this claim by the decision tree analysis. Another motivation for the use of the decision tree is the separation of the data sample into several subsamples. As our goal is to use subsamples for separate efficiency analysis, we want them to have roughly the same number of units. Unfortunately, this is not guaranteed by the decision tree and we must therefore control the building of the tree by restricting the minimum number of units in a category. We find that in our case, the minimum of units leads to the most interpretable results. Another tuning parameter is the number of categories or the depth of the tree. We find that 11 categories with depth 7 is an adequate choice.
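The effect of restricting the minimum number of units in a category can be illustrated on a single split of a regression tree. The following Python sketch is a simplified stand-in and not the RPART routine itself: it searches one regressor for the threshold minimizing the within-group sum of squares, subject to a minimum group size. The function names and the toy data are hypothetical.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_bucket):
    """Find the best binary split of y by a threshold on x.

    Returns (threshold, total sum of squares) or None when no split
    leaves at least `min_bucket` units on both sides.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = None
    for i in range(min_bucket, n - min_bucket + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal regressor values cannot be separated
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sum_of_squares(left) + sum_of_squares(right)
        if best is None or total < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, total)
    return best


# Hypothetical toy data: population and efficiency score of eight units.
population = [100, 200, 300, 400, 5000, 6000, 7000, 8000]
score = [0.10, 0.12, 0.11, 0.13, 0.60, 0.65, 0.70, 0.62]

split_small_bucket = best_split(population, score, min_bucket=2)
split_large_bucket = best_split(population, score, min_bucket=5)
```

With a minimum of 2 units per group the toy data split between the populations 400 and 5000, while a minimum of 5 units admits no split of 8 units at all. This mirrors the trade-off in the text: a larger minimum gives more balanced categories but fewer admissible splits.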
The categories of libraries given by the decision tree together with mean values of preliminary efficiency scores are reported in Table 5. We denote the categories as D01–D11. The decision tree divides the units into small with population lower than (categories D01–D05), medium with population between and (categories D06–D09) and large with population higher than (categories D10 and D11). Small units are further divided according to the town distance, medium units according to the population and large units into municipalities with extended powers (category D11) and other towns (category D10). As in the regression model, the town distance is more important for the smaller units. However, the mean efficiency scores suggest that the relation might be more complex – likely due to dependence between population and town distance. The decision tree also finds that it is significant whether the town distance is zero (and the unit is therefore the reference town) or positive as it puts all municipalities with extended powers into the category D11. The building of the decision tree is illustrated in Figure 3.
Next, we calculate efficiency scores separately for each category given by the decision tree. The mean scores are reported in Table 5. The discriminatory power of this efficiency analysis is more reasonable as 92.30% of all units are inefficient with mean score 0.4371 and median score 0.3070. The shape of the score density function is relatively mild as illustrated in Figure 2. Note that the preliminary scores have a different interpretation than the decision tree scores as they use different samples. For example, the fact that the mean decision tree score of D05 is higher than the mean score of D04 does not imply that D05 is more efficient. On the contrary, preliminary scores show that D04 is on average more efficient. Only with the removal of D04 units and others from the efficiency analysis of D05 do the D05 units become more efficient on average.
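The difference between pooled and separated scores can be illustrated with a toy Python sketch. It does not use the Chebyshev distance DEA: as a deliberately simplified stand-in, it uses the classical single-input, single-output ratio efficiency under constant returns to scale, with scores between 0 and 1. All names and numbers are hypothetical.

```python
def ratio_efficiency(units):
    """Classical efficiency for one input and one output.

    `units` maps a unit name to (input, output). The score is the
    output/input ratio relative to the best ratio in the sample.
    """
    ratios = {name: out / inp for name, (inp, out) in units.items()}
    best = max(ratios.values())
    return {name: r / best for name, r in ratios.items()}


def separated_efficiency(units, category):
    """Compute efficiency separately within each category."""
    groups = {}
    for name, data in units.items():
        groups.setdefault(category[name], {})[name] = data
    scores = {}
    for members in groups.values():
        scores.update(ratio_efficiency(members))
    return scores


def mean_by_category(scores, category, label):
    """Mean score of the units belonging to one category."""
    selected = [s for name, s in scores.items() if category[name] == label]
    return sum(selected) / len(selected)


# Hypothetical units: (input, output) and their category.
units = {"L1": (4.0, 8.0), "L2": (4.0, 2.0), "S1": (4.0, 2.0), "S2": (4.0, 1.5)}
category = {"L1": "large", "L2": "large", "S1": "small", "S2": "small"}

pooled = ratio_efficiency(units)
separated = separated_efficiency(units, category)
```

In this toy sample, the small units have the lower mean score in the pooled analysis but the higher mean score in the separated analysis, because they are no longer compared with the best large unit. Mean scores of different categories obtained from separate analyses therefore do not rank the categories against each other.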
As for the preliminary scores, we use the regression model for the decision tree scores. Note that we can compare efficiency scores in different categories thanks to the normalization property of the Chebyshev distance DEA. Table 4 shows that town distance is no longer significant for the new scores. This suggests that the influence of the town distance is eliminated by the decision tree categories and the town distance is indeed an environmental factor. Our adjustment for the town distance in categories therefore leads to a fairer comparison of libraries. The effect of the population, however, remains significant although it is a bit lower as the model explains only 6.00% of the variance in the efficiency scores. It is also evident from Table 5 that more units have a higher efficiency score in categories with higher population. This suggests that the population has some partial environmental influence but we cannot attribute unilateral causal influence to it. Libraries in towns with larger population are simply far more efficient on average even if we treat smaller towns separately.
This is an important result supporting our separation approach. Unlike the all-in-one, two-stage and multistage models, we do not assume that exogenous variables fully determine the operating environment. We use them to measure similarity between DMUs and then retain only similar DMUs in the data sample. Our approach therefore diminishes the environmental influence of dissimilar DMUs while keeping the influence of similar DMUs unaltered.
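The separation idea can be sketched as follows: each unit is scored only against the units of its own category. This is a minimal illustration under stated assumptions, not the implementation used in this study. The names `score_by_category` and `ratio_scorer` are hypothetical, and `ratio_scorer` is a deliberately simple placeholder (a single output-to-input ratio relative to the best unit in the group), not the Chebyshev distance DEA model.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Unit = Tuple[Sequence[float], Sequence[float]]  # (inputs, outputs)
Scorer = Callable[[List[Unit]], List[float]]


def score_by_category(units: Sequence[Unit],
                      categories: Sequence[Hashable],
                      scorer: Scorer) -> List[float]:
    """Score every unit against the units of its own category only."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, cat in enumerate(categories):
        groups[cat].append(idx)
    scores = [0.0] * len(units)
    for members in groups.values():
        # The scorer sees only similar units, so dissimilar units
        # cannot shape the frontier of this category.
        group_scores = scorer([units[i] for i in members])
        for i, s in zip(members, group_scores):
            scores[i] = s
    return scores


def ratio_scorer(group: List[Unit]) -> List[float]:
    """Placeholder scorer (NOT Chebyshev DEA): first output divided by
    first input, relative to the best ratio within the group."""
    ratios = [outputs[0] / inputs[0] for inputs, outputs in group]
    best = max(ratios)
    return [r / best for r in ratios]


# Toy data (hypothetical): one input and one output per unit.
toy_units: List[Unit] = [([10.0], [5.0]), ([10.0], [10.0]),
                         ([100.0], [20.0]), ([100.0], [10.0])]
toy_categories = ["small", "small", "large", "large"]

pooled = ratio_scorer(toy_units)
separated = score_by_category(toy_units, toy_categories, ratio_scorer)
```

In the toy data, the third unit looks weak in the pooled analysis but is the best unit once it is compared only with the other unit of its own category. Any DEA model could be passed as `scorer` in place of the placeholder.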
Cat.  Population  Distance  Units  Preliminary  Dec. Tree  Expert
D01                         373    0.1147       0.4893     0.3163
D02                         408    0.1413       0.3685     0.3630
D03                         867    0.1077       0.2869     0.3047
D04                         481    0.1451       0.2916     0.3924
D05                         367    0.0923       0.4609     0.2866
D06                         165    0.1865       0.5353     0.4527
D07                         871    0.1519       0.3332     0.4265
D08                         380    0.2048       0.6471     0.6318
D09                         206    0.2999       0.7081     0.6497
D10                         404    0.4514       0.6130     0.7558
D11                         138    0.8012       0.9256     0.9256
All                                0.1916       0.4371     0.4458
4.6 Efficiency Analysis with Expert Categories
The categorization by the decision tree is a purely data-driven approach with its benefits and limitations. For example, it is a well-known fact that decision trees are quite sensitive to changes in data and have a tendency to overfit. We compare the categories given by the decision tree with categories selected by an expert. The expert categories can be useful in several ways. From the statistical point of view, their simpler rules can prevent sensitivity to data changes and offer a more robust approach. From the applicability point of view, they can be used in a variety of applications and time frames, in contrast with our decision tree, which is specifically designed for the efficiency analysis of public libraries in 2017. From the managerial point of view, it might be easier to convince the management of the decision making units that expert categories with "nicer looking" rules are fairer. Nevertheless, the data-driven categories offer valuable insight and should serve as the benchmark.
Our expert categories with their rules are described in Table 6. We keep the number of categories at 11 and denote them E01–E11. We divide units into 5 population levels and 2 distance levels, forming 10 categories of roughly the same size based on very simple rules. We keep municipalities with extended powers in a separate category E11, identical to the decision tree category D11.
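A rule of this shape can be sketched in a few lines. The sketch below is only an illustration: the function name `expert_category`, all threshold values, and the order in which the ten population and distance combinations are numbered are hypothetical assumptions and do not reproduce the rules of Table 6. The only element taken from the text is that a zero town distance identifies a municipality with extended powers.

```python
from bisect import bisect_right
from typing import Sequence


def expert_category(population: float,
                    distance: float,
                    population_breaks: Sequence[float],
                    distance_break: float) -> str:
    """Return a label E01-E11 from population and town distance.

    The numbering of E01-E10 and all thresholds are illustrative
    assumptions, not the rules used in the paper.
    """
    if len(population_breaks) != 4:
        raise ValueError("four breakpoints are needed for five population levels")
    if distance == 0:
        # Zero distance: the unit is itself a municipality with extended powers.
        return "E11"
    level = bisect_right(population_breaks, population)  # 0 (smallest) .. 4 (largest)
    far = 1 if distance > distance_break else 0          # 0 = near, 1 = far
    return f"E{2 * level + far + 1:02d}"


# Hypothetical thresholds chosen only for this example.
example_breaks = [500.0, 1000.0, 2000.0, 5000.0]
example_distance_break = 10.0
label = expert_category(300.0, 15.0, example_breaks, example_distance_break)
```

Rules of this kind are easy to communicate and to reuse in other years, which is the main appeal of the expert categories discussed above.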
We follow the same procedure as for the efficiency analysis based on the decision tree. Efficiency scores within expert categories are reported in Table 6. The discriminatory power is quite similar to the decision tree efficiency analysis as 92.04% of all units are inefficient with mean score 0.4458 and median score 0.3164. Furthermore, the kernel density functions of the scores are almost identical for the two categorizations as illustrated in Figure 2.
Finally, we fit the regression model and arrive at the same conclusion: the population remains significant while the distance is not significant. The model explains 8.34% of the variance of the efficiency scores, which is slightly more than the 6.00% explained in the decision tree model. Since less of the score variance remains attributable to the environmental variables under the decision tree categories, the decision tree captures environmental effects somewhat better, but the two models are quite comparable.
Cat.  Population  Distance  Units  Preliminary  Dec. Tree  Expert
E01                         270    0.1364       0.4274     0.4181
E02                         376    0.1094       0.3369     0.3275
E03                         785    0.1155       0.3513     0.3089
E04                         682    0.1144       0.3135     0.3001
E05                         741    0.1461       0.3833     0.3346
E06                         474    0.1543       0.3882     0.4641
E07                         463    0.2129       0.5681     0.5887
E08                         249    0.2032       0.5695     0.6862
E09                         281    0.4312       0.6546     0.6350
E10                         201    0.4185       0.5999     0.8787
E11                         138    0.8012       0.9256     0.9256
All                                0.1916       0.4371     0.4458
4.7 Comparison of Efficiency Scores
The preliminary efficiency analysis does not account for the heterogeneous environment and we therefore do not recommend using its efficiency scores to rank libraries. Efficiency analysis with either decision tree categories or expert categories accounts for the environmental effects of population and town distance and is suitable for ranking libraries. The categories given by the decision tree remove the influence of the operating environment better. Both approaches are, however, rather similar as the correlation coefficient between their efficiency scores is 0.8405. The preliminary efficiency scores differ more from both: their correlation coefficient is 0.7523 with the decision tree scores and 0.7609 with the expert scores.
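For readers who want to repeat such a comparison on their own scores, the following minimal sketch computes a correlation coefficient between two score vectors. It assumes the Pearson coefficient, which the text does not state explicitly; the function name `pearson` and the toy vectors are illustrative.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long score vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length with at least two entries")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy score vectors (hypothetical), one entry per library.
scores_a = [1.0, 2.0, 3.0, 4.0]
scores_b = [1.0, 3.0, 2.0, 4.0]
similarity = pearson(scores_a, scores_b)
```

If the interest lies in rankings rather than score levels, a rank correlation could be used instead; that choice is left open here.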
5 Conclusion
We assess the technical efficiency of public libraries established by municipalities in the Czech Republic in the year 2017. In the first stage, we adopt the Chebyshev distance DEA and utilize its many attractive properties including superefficiency and natural normalization. We consider total expenditures, employees and book collection as inputs with registrations, book circulation, event attendance and collection additions as outputs. In the second stage, we perform the regression analysis and find that the efficiency scores are significantly dependent on the population of the municipality and the distance to the municipality with extended powers. To remove the influence of the operating environment, we employ DEA for libraries separated into categories given by the decision tree analysis. Interestingly, the effect of population is not completely removed, suggesting it is partially an environmental variable and partially an explanatory variable. We also consider categories designed by an expert and find that the proposed separation approach is robust to the specification of categories to a certain degree. The proposed methodology can be used in similar applications when the data sample is large and the operating environment exhibits heterogeneity.
Acknowledgements
The author would like to thank Jan Kubát for his help with data preparation and Bojka Hamerníková, Vladimír Beneš and Marek Jetmar for their comments.
Funding
The work on this paper was supported by the Technology Agency of the Czech Republic under Grant TL01000463 in the Eta program.
References
Atici and Podinovski (2015) Atici, K. B., Podinovski, V. V. 2015. Using Data Envelopment Analysis for the Assessment of Technical Efficiency of Units with Different Specialisations: An Application to Agriculture. Omega. Volume 54. Pages 72–83. ISSN 0305-0483. https://doi.org/10.1016/j.omega.2015.01.015.
Banker et al. (1984) Banker, R. D., Charnes, A., Cooper, W. W. 1984. Some Models for Estimating Technical and Scale Inefficiencies in Data Envelopment Analysis. Management Science. Volume 30. Issue 9. Pages 1078–1092. ISSN 0025-1909. https://doi.org/10.1287/mnsc.30.9.1078.
Boussofiane et al. (1991) Boussofiane, A., Dyson, R. G., Thanassoulis, E. 1991. Applied Data Envelopment Analysis. European Journal of Operational Research. Volume 52. Issue 1. Pages 1–15. ISSN 0377-2217. https://doi.org/10.1016/0377-2217(91)90331-O.
Charnes et al. (1978) Charnes, A., Cooper, W. W., Rhodes, E. 1978. Measuring the Efficiency of Decision Making Units. European Journal of Operational Research. Volume 2. Issue 6. Pages 429–444. ISSN 0377-2217. https://doi.org/10.1016/0377-2217(78)90138-8.
Chen (1997) Chen, T.-Y. 1997. A Measurement of the Resource Utilization Efficiency of University Libraries. International Journal of Production Economics. Volume 53. Issue 1. Pages 71–80. ISSN 0925-5273. https://doi.org/10.1016/S0925-5273(97)00102-3.
Chen et al. (2005) Chen, Y., Morita, H., Zhu, J. 2005. Context-Dependent DEA with an Application to Tokyo Public Libraries. International Journal of Information Technology & Decision Making. Volume 4. Issue 3. Pages 385–394. ISSN 0219-6220. https://doi.org/10.1142/s0219622005001635.
Cook and Seiford (2009) Cook, W. D., Seiford, L. M. 2009. Data Envelopment Analysis (DEA) – Thirty Years On. European Journal of Operational Research. Volume 192. Issue 1. Pages 1–17. ISSN 0377-2217. https://doi.org/10.1016/j.ejor.2008.01.032.
Cook et al. (2014) Cook, W. D., Tone, K., Zhu, J. 2014. Data Envelopment Analysis: Prior to Choosing a Model. Omega. Volume 44. Pages 1–4. ISSN 0305-0483. https://doi.org/10.1016/j.omega.2013.09.004.
De Carvalho et al. (2012) De Carvalho, F. A., Jorge, M. J., Jorge, M. F., Russo, M., De Sa, N. O. 2012. Library Performance Management in Rio de Janeiro, Brazil: Applying DEA to a Sample of University Libraries in 2006–2007. Library Management. Volume 33. Issue 4–5. Pages 297–306. ISSN 0143-5124. https://doi.org/10.1108/01435121211242335.
De Witte and Geys (2011) De Witte, K., Geys, B. 2011. Evaluating Efficient Public Good Provision: Theory and Evidence from a Generalised Conditional Efficiency Model for Public Libraries. Journal of Urban Economics. Volume 69. Issue 3. Pages 319–327. ISSN 0094-1190. https://doi.org/10.1016/j.jue.2010.12.002.
De Witte and Marques (2010) De Witte, K., Marques, R. C. 2010. Incorporating Heterogeneity in Non-Parametric Models: A Methodological Comparison. International Journal of Operational Research. Volume 9. Issue 2. Pages 188–204. ISSN 1745-7645. https://doi.org/10.1504/ijor.2010.035044.
Dyson et al. (2001) Dyson, R. G., Allen, R., Camanho, A. S., Podinovski, V. V., Sarrico, C. S., Shale, E. A. 2001. Pitfalls and Protocols in DEA. European Journal of Operational Research. Volume 132. Issue 2. Pages 245–259. ISSN 0377-2217. https://doi.org/10.1016/S0377-2217(00)00149-1.
Emrouznejad and Yang (2018) Emrouznejad, A., Yang, G.-L. 2018. A Survey and Analysis of the First 40 Years of Scholarly Literature in DEA: 1978–2016. Socio-Economic Planning Sciences. Volume 61. Pages 4–8. ISSN 0038-0121. https://doi.org/10.1016/j.seps.2017.01.008.
Fukuyama and Matousek (2017) Fukuyama, H., Matousek, R. 2017. Modelling Bank Performance: A Network DEA Approach. European Journal of Operational Research. Volume 259. Issue 2. Pages 721–732. ISSN 0377-2217. https://doi.org/10.1016/j.ejor.2016.10.044.
Golany and Roll (1989) Golany, B., Roll, Y. 1989. An Application Procedure for DEA. Omega. Volume 17. Issue 3. Pages 237–250. ISSN 0305-0483. https://doi.org/10.1016/0305-0483(89)90029-7.
Guccio et al. (2018) Guccio, C., Mignosa, A., Rizzo, I. 2018. Are Public State Libraries Efficient? An Empirical Assessment Using Network Data Envelopment Analysis. Socio-Economic Planning Sciences. Volume 64. Pages 78–91. ISSN 0038-0121. https://doi.org/10.1016/j.seps.2018.01.001.
Hammond (2002) Hammond, C. J. 2002. Efficiency in the Provision of Public Services: A Data Envelopment Analysis of UK Public Library Systems. Applied Economics. Volume 34. Issue 5. Pages 649–657. ISSN 0003-6846. https://doi.org/10.1080/00036840110053252.
Hladík (2019) Hladík, M. 2019. Universal Efficiency Scores in Data Envelopment Analysis Based on a Robust Approach. Expert Systems with Applications. Volume 122. Pages 242–252. ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2019.01.019.
Holý and Šafr (2018) Holý, V., Šafr, K. 2018. Are Economically Advanced Countries More Efficient in Basic and Applied Research? Central European Journal of Operations Research. Volume 26. Issue 4. Pages 933–950. ISSN 1613-9178. https://doi.org/10.1007/s10100-018-0559-2.
Jablonsky (2016) Jablonsky, J. 2016. Efficiency Analysis in Multi-Period Systems: An Application to Performance Evaluation in Czech Higher Education. Central European Journal of Operations Research. Volume 24. Issue 2. Pages 283–296. ISSN 1435-246X. https://doi.org/10.1007/s10100-015-0401-z.
Jablonsky (2018) Jablonsky, J. 2018. Ranking of Countries in Sporting Events Using Two-Stage Data Envelopment Analysis Models: A Case of Summer Olympic Games 2016. Central European Journal of Operations Research. Volume 26. Issue 4. Pages 951–966. ISSN 1435-246X. https://doi.org/10.1007/s10100-018-0537-8.
Liu et al. (2013) Liu, J. S., Lu, L. Y. Y., Lu, W.-M., Lin, B. J. Y. 2013. A Survey of DEA Applications. Omega. Volume 41. Issue 5. Pages 893–902. ISSN 0305-0483. https://doi.org/10.1016/j.omega.2012.11.004.
Liu et al. (2016) Liu, J. S., Lu, L. Y. Y., Lu, W.-M. 2016. Research Fronts in Data Envelopment Analysis. Omega. Volume 58. Pages 33–45. ISSN 0305-0483. https://doi.org/10.1016/j.omega.2015.04.004.
Miidla and Kikas (2009) Miidla, P., Kikas, K. 2009. The Efficiency of Estonian Central Public Libraries. Performance Measurement and Metrics. Volume 10. Issue 1. Pages 49–58. ISSN 1467-8047. https://doi.org/10.1108/14678040910949684.
Ozcan and Khushalani (2017) Ozcan, Y. A., Khushalani, J. 2017. Assessing Efficiency of Public Health and Medical Care Provision in OECD Countries After a Decade of Reform. Central European Journal of Operations Research. Volume 25. Issue 2. Pages 325–343. ISSN 1435-246X. https://doi.org/10.1007/s10100-016-0440-0.
Reichmann (2004) Reichmann, G. 2004. Measuring University Library Efficiency Using Data Envelopment Analysis. Libri. Volume 54. Issue 2. Pages 136–146. ISSN 0024-2667. https://doi.org/10.1515/libr.2004.136.
Reichmann and Sommersguter-Reichmann (2010) Reichmann, G., Sommersguter-Reichmann, M. 2010. Efficiency Measures and Productivity Indexes in the Context of University Library Benchmarking. Applied Economics. Volume 42. Issue 3. Pages 311–323. ISSN 0003-6846. https://doi.org/10.1080/00036840701604511.
Shabani et al. (2019) Shabani, A., Visani, F., Barbieri, P., Dullaert, W., Vigo, D. 2019. Reliable Estimation of Suppliers' Total Cost of Ownership: An Imprecise Data Envelopment Analysis Model with Common Weights. Omega. Volume 87. Pages 57–70. ISSN 0305-0483. https://doi.org/10.1016/j.omega.2018.08.002.
Shahwan and Kaba (2013) Shahwan, T. M., Kaba, A. 2013. Efficiency Analysis of GCC Academic Libraries: An Application of Data Envelopment Analysis. Performance Measurement and Metrics. Volume 14. Issue 3. Pages 197–210. ISSN 1467-8047. https://doi.org/10.1108/pmm-07-2013-0023.
Sharma et al. (1999) Sharma, K. R., Leung, P.-S., Zane, L. 1999. Performance Measurement of Hawaii State Public Libraries: An Application of Data Envelopment Analysis (DEA). Agricultural and Resource Economics Review. Volume 28. Issue 2. Pages 190–198. ISSN 2372-2614. https://doi.org/10.1017/s1068280500008182.
Simar and Wilson (2007) Simar, L., Wilson, P. W. 2007. Estimation and Inference in Two-Stage, Semi-Parametric Models of Production Processes. Journal of Econometrics. Volume 136. Issue 1. Pages 31–64. ISSN 0304-4076. https://doi.org/10.1016/j.jeconom.2005.07.009.
Simon et al. (2011) Simon, J., Simon, C., Arias, A. 2011. Changes in Productivity of Spanish University Libraries. Omega. Volume 39. Issue 5. Pages 578–588. ISSN 0305-0483. https://doi.org/10.1016/j.omega.2010.12.003.
Srakar et al. (2017) Srakar, A., Kodrič-Dačić, E., Koman, K., Kavaš, D. 2017. Efficiency of Slovenian Public General Libraries: A Data Envelopment Analysis Approach. Lex Localis. Volume 15. Issue 3. Pages 559–581. ISSN 1581-5374. https://doi.org/10.4335/15.3.559-581(2017).
Stroobants and Bouckaert (2014) Stroobants, J., Bouckaert, G. 2014. Benchmarking Local Public Libraries Using Non-Parametric Frontier Methods: A Case Study of Flanders. Library & Information Science Research. Volume 36. Issue 3–4. Pages 211–224. ISSN 0740-8188. https://doi.org/10.1016/j.lisr.2014.06.002.
Therneau and Atkinson (2019) Therneau, T. M., Atkinson, E. J. 2019. An Introduction to Recursive Partitioning Using the RPART Routines. Technical Report. https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf.
Vitaliano (1998) Vitaliano, D. F. 1998. Assessing Public Library Efficiency Using Data Envelopment Analysis. Annals of Public and Cooperative Economics. Volume 69. Issue 1. Pages 107–122. ISSN 1370-4788. https://doi.org/10.1111/1467-8292.00075.
Vrabková and Friedrich (2019) Vrabková, I., Friedrich, V. 2019. The Productivity of Main Services of City Libraries: Using the Example from the Czech Republic and the Slovak Republic. Library & Information Science Research. Volume 41. Issue 3. Pages 100962/1–100962/11. ISSN 0740-8188. https://doi.org/10.1016/j.lisr.2019.100962.
Wu et al. (2016) Wu, J., Zhu, Q., Chu, J., Liu, H., Liang, L. 2016. Measuring Energy and Environmental Efficiency of Transportation Systems in China Based on a Parallel DEA Approach. Transportation Research, Part D: Transport and Environment. Volume 48. Pages 460–472. ISSN 1361-9209. https://doi.org/10.1016/j.trd.2015.08.001.
Yang and Pollitt (2009) Yang, H., Pollitt, M. 2009. Incorporating Both Undesirable Outputs and Uncontrollable Variables into DEA: The Performance of Chinese Coal-Fired Power Plants. European Journal of Operational Research. Volume 197. Issue 3. Pages 1095–1105. ISSN 0377-2217. https://doi.org/10.1016/j.ejor.2007.12.052.