The Impact of Operating Environment on Efficiency of Public Libraries

09/18/2019 ∙ by Vladimír Holý, et al. ∙ University of Economics, Prague (Vysoká škola ekonomická v Praze)

Analysis of technical efficiency is an important tool in management of public libraries. We assess the efficiency of 4660 public libraries established by municipalities in the Czech Republic in the year 2017. For this purpose, we utilize the data envelopment analysis (DEA) based on the Chebyshev distance. We pay special attention to the operating environment and find that the efficiency scores significantly depend on the population of the municipality and distance to the municipality with extended powers. To remove the effect of the operating environment, we perform DEA separately for categories based on the decision tree analysis as well as categories designed by an expert.




1 Introduction

On June 22, 1919, a law was passed in Czechoslovakia introducing the obligation to establish a library in each municipality. One hundred years have passed and the Czech Republic and Slovakia have one of the densest networks of public libraries in the world. There was one public library for every citizens in the Czech Republic in 2017. Such a large number of libraries requires well-advised management and careful allocation of public resources.

In this study, we analyze technical efficiency of Czech public libraries established by municipalities. We follow the data envelopment analysis (DEA) approach pioneered by Charnes et al. (1978) and Banker et al. (1984). DEA is a non-parametric method measuring how efficiently decision making units (DMUs) can transform a set of inputs to a set of outputs. We utilize the Chebyshev distance DEA model with variable returns to scale recently proposed by Hladík (2019). This model is based on the robust optimization viewpoint and has many desirable properties – super-efficiency, comparability of efficiency scores across different analyses, inclusion of zero inputs and outputs, units invariance, order of rankings identical to the classical approach and straightforward interpretability.

There are many studies in the literature assessing technical efficiency of libraries. We select input and output variables consistently with the literature. Specifically, we consider total expenditures, employees and book collection as inputs with registrations, book circulation, events attendance and collection additions as outputs. The most similar studies in terms of inputs and outputs are Reichmann (2004), Miidla and Kikas (2009) and Shahwan and Kaba (2013). Our paper is, however, unique in the sample size – we analyze 4660 municipal libraries in total. For comparison, the average sample size is 73 in the 16 studies we review in tables 1 and 2. Such a large data sample allows us to thoroughly investigate the impact of the operating environment on performance of libraries. We consider three possible environmental variables – population of municipality, population density and distance to the municipality with extended powers [1].

[1] The Czech Republic is divided into 8 cohesion regions (NUTS 2 – region soudržnosti), 14 regions (NUTS 3 – kraj), 77 districts (LAU 1 – okres), 206 municipalities with extended powers (obec s rozšířenou působností), 393 municipalities with authorized municipal office (obec s pověřeným obecním úřadem) and municipalities (LAU 2 – obec) as of April 1, 2019.

Using regression analysis, we find that the efficiency score is significantly increasing with population. Extremely small villages are the exception as they tend to have higher efficiency scores than villages with slightly higher population due to their very low and often zero inputs. We also find that for smaller villages the efficiency score is decreasing with distance to the municipality with extended powers. Population density is insignificant in our analysis. Motivated by these results, we split the sample of libraries into 11 categories using decision tree analysis. We perform DEA separately for each category, filtering out the influence of the heterogeneous operating environment. This also decreases the discriminatory power of DEA, which is very high in the preliminary analysis due to the large sample size. The effect of distance is removed but the effect of population is not completely eliminated although it is reduced. This means that the distance can be safely treated as an environmental variable while the population requires a more cautious approach as it is partially an environmental and partially an explanatory variable. Our proposed separation approach is quite suitable for this situation in contrast to the all-in-one, two-stage and multi-stage models that would take population as a strictly environmental variable (see e.g. Yang and Pollitt, 2009; De Witte and Marques, 2010). We also perform DEA for expert-defined categories and find that the proposed separation approach is robust to specification of subsamples to a certain degree. Our study contributes to the field of two-stage efficiency analysis – one of the four active research fronts in DEA according to Liu et al. (2016).

The rest of the paper is structured as follows. In Section 2, we review the literature dealing with DEA and efficiency of libraries. In Section 3, we describe the Chebyshev distance DEA model used in the first stage and the regression model with decision tree model for analyzing efficiency scores used in the second stage. In Section 4, we compute efficiency scores of Czech public libraries in the year 2017 and investigate the impact of the operating environment. We conclude the paper in Section 5.

Paper: Chen (1997)
Sample: 23 University Libraries in Taipei, Taiwan
Inputs: Operating Expenditures, Employees, Area
Outputs: Visits, Circulation, Inter-Library Circulation, Consultations
Paper: Sharma et al. (1999)
Sample: 47 Public Libraries in Hawaii, United States
Inputs: Operating Expenditures, Employees, Collection, Days Open
Outputs: Visits, Circulation, Consultations
Paper: Chen et al. (2005)
Sample: 23 Public Libraries in Tokyo, Japan
Inputs: Employees, Collection, Area, Population
Outputs: Registrations, Circulation
Paper: Miidla and Kikas (2009)
Sample: 20 Central Public Libraries in Estonia
Inputs: Operating Expenditures, Personnel Expenditures, Collection, Area
Outputs: Registrations, Circulation
Paper: Reichmann and Sommersguter-Reichmann (2010)
Sample: 68 University Libraries in North America, Austria and Germany
Inputs: Employees, Collection
Outputs: Circulation, Collection Additions, Serial Subscriptions
Paper: Simon et al. (2011)
Sample: 34 University Libraries in Spain
Inputs: Operating Expenditures, Employees, Area
Inter.: Collection, Serial Subscriptions, Opening Hours, Seats
Outputs: Circulation, Inter-Library Circulation, Downloads
Paper: De Carvalho et al. (2012)
Sample: 37 University Libraries in Rio de Janeiro, Brazil
Inputs: Employees, Collection, Area
Outputs: Registrations, Visits, Circulation, Consultations
Paper: Shahwan and Kaba (2013)
Sample: 11 Academic Libraries in the Arab States of the Gulf
Inputs: Total Expenditures, Employees, Collection
Outputs: Registrations, Circulation, Collection Additions
Paper: Stroobants and Bouckaert (2014)
Sample: 13 Local Public Libraries in Flanders, Belgium
Inputs: Total Expenditures / Operating Expenditures, Employees
Outputs: Circulation / Circulation, Opening Hours
Paper: Srakar et al. (2017)
Sample: 58 Public General Libraries in Slovenia
Inputs: Total Expenditures, Employees, Area, Ratio of Service Points to Potential Users
Outputs: Registrations, Visits / Circulation / Equipment / Events, Events Attendance
Paper: Guccio et al. (2018)
Sample: 44 Public State Libraries in Italy
Inputs: Non-Personnel Expenditures, Employees, Shelf Size, Seats
Inter.: Book, Manuscript, Periodical and Other Collections, Assets Value
Outputs: Visits, Circulation, Inter-Library Circulation, Consultations
Table 1: Overview of relevant studies with small sample size.
Paper: Vitaliano (1998)
Sample: 184 Public Libraries in New York, United States
Inputs: Collection, Collection Additions, Serial Subscriptions, Opening Hours
Outputs: Circulation, Consultations
Paper: Hammond (2002)
Sample: 99 Public Libraries in the United Kingdom
Inputs: Collection, Collection Additions, Serial Subscriptions, Opening Hours
Outputs: Circulation, Consultations, Requests
Paper: Reichmann (2004)
Sample: 118 University Libraries in English-Speaking and German-Speaking Countries
Inputs: Employees, Collection
Outputs: Circulation, Opening Hours, Collection Additions, Serial Subscriptions
Paper: De Witte and Geys (2011)
Sample: 290 Municipal Public Libraries in Flanders, Belgium
Inputs: Operating Expenditures, Personnel Expenditures, Infrastructure Expenditures
Inter.: Youth Book Collection, Book Collection, Media Collection, Opening Hours
Paper: Vrabková and Friedrich (2019)
Sample: 92 Public Libraries in the Czech Republic and Slovakia
Inputs: Employees, Collection, Collection Additions, Events, Opening Hours
Outputs: Visits
Table 2: Overview of relevant studies with medium sample size.

2 Literature Review

2.1 Data Envelopment Analysis

Data envelopment analysis (DEA) is a non-parametric method for the estimation of the production frontier (or, more precisely, the best-practice frontier) introduced by Charnes et al. (1978). It measures technical efficiency of a decision making unit (DMU) relative to other units in the sample. The units that form the frontier are classified as efficient while the units not on the frontier are considered inefficient. Inefficient units are further assigned an efficiency score measuring their shortcomings. The efficiency classification as well as the efficiency score is determined based on how efficiently a unit can transform a set of inputs to a set of outputs. The original model of Charnes et al. (1978), denoted as the CCR model, utilizes constant returns to scale (CRS), i.e. it is assumed that an increase in inputs results in a proportionate increase in outputs. Variable returns to scale (VRS) relax this assumption and are utilized in the model of Banker et al. (1984), denoted as the BCC model. Many more models addressing various issues in DEA are proposed in the literature. A particularly convenient and elegant model is the Chebyshev distance model of Hladík (2019). It is based on the robust optimization viewpoint and has many attractive properties such as super-efficiency, i.e. the ability to assign scores to efficient units, and natural normalization, i.e. comparability of efficiency scores across different analyses. For a survey of DEA theory, see Cook and Seiford (2009).
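To make the notion of relative efficiency concrete, the following minimal sketch covers only the special case of one input and one output under CRS, where the CCR score reduces to the output-to-input ratio of a unit divided by the best ratio in the sample. The data and the function name `crs_scores` are hypothetical, inputs are assumed to be strictly positive, and the sketch is not the Chebyshev distance model used in this paper.

```python
def crs_scores(inputs, outputs):
    """CCR efficiency for one input and one output.

    In this special case the score is the productivity (output / input)
    of each unit relative to the most productive unit in the sample.
    Inputs are assumed to be strictly positive.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data: employees (input) and loans in thousands (output).
employees = [2.0, 4.0, 5.0, 10.0]
loans = [10.0, 30.0, 20.0, 50.0]

scores = crs_scores(employees, loans)
for unit, score in enumerate(scores, start=1):
    print(f"DMU {unit}: {score:.3f}")
```

The unit with the highest ratio forms the frontier and receives the score 1. With several inputs and outputs, the ratio is replaced by a weighted ratio and the weights are found by linear programming.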

DEA is a very popular benchmarking tool in operations research and has a wide range of applications including but not limited to banking (Fukuyama and Matousek, 2017), business (Shabani et al., 2019), agriculture (Atici and Podinovski, 2015), transportation (Wu et al., 2016), health care (Ozcan and Khushalani, 2017), education (Jablonsky, 2016), research (Holý and Šafr, 2018) and sport (Jablonsky, 2018). For a survey of DEA applications, see Liu et al. (2013).

Procedures for the practical use of DEA with its pitfalls are presented in Golany and Roll (1989), Boussofiane et al. (1991), Dyson et al. (2001) and Cook et al. (2014). One particular issue many studies face is a heterogeneous operating environment. For DEA to make sense, however, the operating environment should be homogeneous. There exist several approaches for dealing with a heterogeneous operating environment in the literature. For a review of such methods, see Yang and Pollitt (2009) and De Witte and Marques (2010). We briefly describe the four most commonly used methods for DEA. The separation approach splits the heterogeneous data sample into several homogeneous subsamples according to one or more environmental variables and performs DEA separately for each subsample. The advantage of this approach is its simplicity and straightforward interpretability. However, it significantly reduces the sample size, making it unusable in many studies. The all-in-one model directly includes environmental variables in DEA as inputs or outputs. The two-stage model adjusts the efficiency scores based on the dependence between preliminary efficiency scores and environmental variables using regression analysis. The multi-stage model regresses input slacks on environmental variables, adjusts inputs and finally performs DEA with adjusted inputs. The latter three models are more sophisticated and do not reduce the sample size but are more cumbersome to interpret.

Whether theoretical, applicational or practical, the literature dealing with DEA is very extensive and still growing. Emrouznejad and Yang (2018) report a listing of scientific articles related to DEA from the seminal paper of Charnes et al. (1978) to 2016. Liu et al. (2016) identify the research activities (or the research fronts) in DEA from 2000 to 2014.

2.2 Efficiency of Libraries

One of the possible uses of DEA is assessing the efficiency of public or university libraries in a given area at a given time. We review 16 papers dealing with efficiency of libraries. The overview of papers is presented in tables 1 and 2. Most studies utilize the classical CCR or BCC DEA models although some studies adopt the free disposal hull (FDH) approach. Simon et al. (2011) and Guccio et al. (2018) consider intermediate outputs and adopt network DEA with two steps. De Witte and Geys (2011) focus only on the first step that produces intermediate outputs. We compare all 16 studies based on the sample size, selection of the inputs and outputs and treatment of the operating environment.

The sample size of the reviewed studies ranges from 11 to 290. Five papers, namely Vitaliano (1998), Hammond (2002), Reichmann (2004), De Witte and Geys (2011) and Vrabková and Friedrich (2019), have medium sample size ranging from 92 to 290 while the rest have small sample size ranging from 11 to 68.

The reviewed studies utilize up to 5 inputs and up to 4 outputs. The most common inputs are the number of employees or personnel expenditures (87.50% of studies), book or other collections (62.50% of studies), variables related to expenditures (56.25% of studies) and the area of library (37.50% of studies). The most common outputs are the circulation or the number of loans (93.33% of studies), the number of visits (40.00% of studies), the number of consultations (40.00% of studies) and the number of registrations (33.33% of studies). The number of additions to collection, the opening hours and the number of serial subscriptions appear less often in the literature and in some studies are considered as inputs while in others as outputs or intermediate outputs.
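The sample sizes and output shares quoted above can be re-derived from tables 1 and 2. The sketch below is a transcription of those tables restricted to the four most common outputs. It assumes that the output shares are taken over the 15 studies that report final outputs, since De Witte and Geys (2011) list only intermediate outputs, and that inter-library circulation is not counted as circulation. The variable and function names are illustrative.

```python
from statistics import mean

# (study, sample size, outputs among the four most common ones),
# transcribed from tables 1 and 2.
studies = [
    ("Chen (1997)", 23, {"Visits", "Circulation", "Consultations"}),
    ("Sharma et al. (1999)", 47, {"Visits", "Circulation", "Consultations"}),
    ("Chen et al. (2005)", 23, {"Registrations", "Circulation"}),
    ("Miidla and Kikas (2009)", 20, {"Registrations", "Circulation"}),
    ("Reichmann and Sommersguter-Reichmann (2010)", 68, {"Circulation"}),
    ("Simon et al. (2011)", 34, {"Circulation"}),
    ("De Carvalho et al. (2012)", 37,
     {"Registrations", "Visits", "Circulation", "Consultations"}),
    ("Shahwan and Kaba (2013)", 11, {"Registrations", "Circulation"}),
    ("Stroobants and Bouckaert (2014)", 13, {"Circulation"}),
    ("Srakar et al. (2017)", 58, {"Registrations", "Visits", "Circulation"}),
    ("Guccio et al. (2018)", 44, {"Visits", "Circulation", "Consultations"}),
    ("Vitaliano (1998)", 184, {"Circulation", "Consultations"}),
    ("Hammond (2002)", 99, {"Circulation", "Consultations"}),
    ("Reichmann (2004)", 118, {"Circulation"}),
    ("De Witte and Geys (2011)", 290, set()),  # intermediate outputs only
    ("Vrabková and Friedrich (2019)", 92, {"Visits"}),
]

sizes = [size for _, size, _ in studies]
with_outputs = [outs for _, _, outs in studies if outs]


def share(output_name):
    """Percentage of studies with final outputs that use the given output."""
    count = sum(output_name in outs for outs in with_outputs)
    return 100 * count / len(with_outputs)


print(min(sizes), max(sizes), round(mean(sizes)))
for name in ("Circulation", "Visits", "Consultations", "Registrations"):
    print(f"{name}: {share(name):.2f}%")
```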

Some of the studies consider the operating environment to a certain degree. Sharma et al. (1999), Reichmann (2004), Chen et al. (2005), Miidla and Kikas (2009), Reichmann and Sommersguter-Reichmann (2010), Stroobants and Bouckaert (2014) and Vrabková and Friedrich (2019) analyze behavior of libraries in several predefined groups and compare their efficiency scores. Srakar et al. (2017) follow a similar approach but cluster libraries according to their efficiency and size with additional spatial constraints. Vitaliano (1998) uses the tobit regression to model efficiencies and finds that they are positively dependent on population, negatively on wages of the directors and positively on town or village associations. Hammond (2002) includes population density and accessibility measures in the DEA model as non-discretionary inputs. De Witte and Geys (2011) employ the conditional efficiency model and find that the efficiency increases with left-wing ideological stance of the local government, wealth of the population, population density and local funding.

3 Methodology

3.1 Chebyshev Distance Data Envelopment Analysis

To obtain technical efficiencies, we utilize the Chebyshev distance DEA with variable returns to scale (VRS) proposed by Hladík (2019). Let $X$ be the non-negative matrix of inputs and $Y$ be the non-negative matrix of outputs. We denote by $x_i$ and $y_i$ the vectors corresponding to the $i$-th row of $X$ and $Y$ respectively. We also denote by $X_{-i}$ and $Y_{-i}$ the matrices with the $i$-th row missing, i.e. the inputs and outputs of every DMU but $i$.

As in classical DEA models, the problem of measuring efficiency of a DMU is formulated as finding the optimal weights of input and output variables with respect to the other DMUs. Note that each DMU has its own optimization problem. The idea of the Chebyshev distance DEA is to rank DMUs based on the robustness of their efficiency or inefficiency classification to variations of input and output data measured by the Chebyshev distance. Specifically, the resulting efficiency score for the $i$-th DMU is equal to $r_i = 1 + \delta_i^*$, where $\delta_i^*$ is the optimal solution to the optimization problem

δ_i (1)
such that

where are the weights of inputs, are the weights of outputs and is the auxiliary variable used for ensuring VRS. The above formulation is a non-linear optimization problem which Hladík (2019) further proposes to linearize. Let us reparametrize the weights and the VRS variable as


The linear approximation of (1) is then given by

δ_i (3)
such that

Hladík (2019) shows in several examples that the linear approximation (3) is quite precise and can be effectively utilized in practice.

The efficiency scores lie in the interval $[0, 2]$ whether given by the original non-linear optimization problem (1) or its linear approximation (3). Values below 1 indicate inefficient DMUs while values of at least 1 indicate efficient DMUs. The Chebyshev distance DEA further possesses the following properties:

  • Robust Interpretation: The efficiency scores of the Chebyshev distance DEA indicate how DMUs are sensitive to changes in their inputs and outputs. Specifically, the efficiency scores for inefficient DMUs are the smallest possible variations of all inputs and outputs causing efficiency in terms of the Chebyshev distance while the efficiency scores for efficient DMUs are the largest possible variations of all inputs and outputs preserving efficiency.

  • Super-Efficiency: As noted above, the Chebyshev distance DEA ranks inefficient as well as efficient DMUs. In contrast, the basic formulation of the classical DEA allows only for ranking inefficient DMUs.

  • Normalization: The efficiency scores of the Chebyshev distance DEA are naturally normalized due to their robust interpretation. Therefore, the efficiency scores can be compared across different analyses.

  • Non-Negativeness: Unlike classical DEA, the Chebyshev distance DEA allows for zero inputs and zero outputs as well.

  • Units Invariance: Similarly to the classical DEA, the inputs and outputs can be arbitrarily scaled without affecting the efficiency scores of the Chebyshev distance DEA model. Therefore, it does not matter in which units the inputs and outputs are measured.

  • Ranking Order: The classification to efficient and inefficient DMUs as well as the order of inefficient DMUs according to their efficiency score is exactly the same in the Chebyshev distance DEA model as in the classical CCR model (or the BCC model when assuming VRS). The values of the efficiency scores, however, differ.

3.2 Analysis of Efficiency Scores in the Second Stage

We utilize the linear regression for modeling efficiency scores in the same way as Holý and Šafr (2018). Let $k$ be the number of regressors and $Z$ the design matrix with the values of the regressors. We further denote by $z_i$ the vector corresponding to the $i$-th row of $Z$ and by $r_i$ the efficiency score of the $i$-th DMU. As efficiency scores of the Chebyshev distance DEA are bounded from below by 0 and from above by 2, we resort to the regression model with the logistic function

$$ r_i = \frac{2}{1 + \exp(-\beta_0 - z_i \beta - \varepsilon_i)}, $$

where $\beta_0$ and $\beta$ are the unknown parameters. Next, we use the transformation $\tilde{r}_i = \ln \left( r_i / (2 - r_i) \right)$ and arrive at the linear regression model

$$ \tilde{r}_i = \beta_0 + z_i \beta + \varepsilon_i. $$

Note that we assume that the error terms $\varepsilon_i$ are independent. This is clearly not the case as there is inherent dependency between the efficiency scores obtained by DEA. Serial correlation affects mainly the inference while the estimate of coefficients remains unbiased and consistent. As studied by Simar and Wilson (2007), the dependency structure is complex and unknown but disappears asymptotically. Our data sample is quite large and we therefore resort to the independence simplification, as most studies do.
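As an illustration of the transformation step, the sketch below maps scores bounded by 0 and 2 to the real line with a scaled logit and back with the scaled logistic function. The exact functional form is an assumption based on the stated bounds, and the function names and the clipping constant `eps`, which keeps scores equal to 0 or 2 finite, are hypothetical choices rather than part of the described method.

```python
import math

UPPER = 2.0  # upper bound of the Chebyshev distance DEA score


def to_unbounded(score, eps=1e-6):
    """Map a score in [0, 2] to the real line via the scaled logit.

    Scores are clipped to [eps, 2 - eps] so that the bounds themselves
    stay finite; the clipping constant is an illustrative assumption.
    """
    s = min(max(score, eps), UPPER - eps)
    return math.log(s / (UPPER - s))


def to_score(value):
    """Inverse map: scaled logistic function with range (0, 2)."""
    return UPPER / (1.0 + math.exp(-value))


# The threshold between inefficient and efficient units maps to zero.
print(to_unbounded(1.0))
# A low score maps to a negative value and is recovered by the inverse.
z = to_unbounded(0.1916)
print(z, to_score(z))
```

A linear regression fitted to the transformed values always yields fitted scores inside the admissible interval once they are mapped back with `to_score`.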

We further analyze efficiency scores using the decision tree approach. To build the decision tree, we adopt the RPART routine of Therneau and Atkinson (2019). Again, we analyze dependency of the efficiency scores on the regressors , .

4 Empirical Study

4.1 Data Sample

We analyze efficiency of public libraries established by Czech municipalities during the year 2017. In total, there are public libraries in 2017. Of these, are established by municipalities excluding Prague and by municipal and administrative districts of Prague. The remaining libraries include the National Library of the Czech Republic, the Moravian Library in Brno, the 13 regional libraries, libraries established by districts, etc. We focus only on the municipal libraries outside the capital. In our data, 2.71% of libraries have some observations missing. We remove these libraries from the analysis. Our data sample therefore consists of 4660 municipal libraries with no missing data. We have data available for the years 2016 and 2017. The two-year history allows us to utilize aggregated values and first differences in the analysis.

For in-depth statistics about public libraries in the Czech Republic, we refer to the National Information and Consulting Centre for Culture (NIPOS).

4.2 Variable Selection

In our study, we utilize 10 variables in total. Descriptive statistics of the variables are reported in Table 3. The correlation matrix is illustrated in Figure 1. All variables except the town distance are strongly positively correlated while the town distance is moderately negatively correlated with the others. For the efficiency analysis, we consider the following input variables:

  • Total Expenditures: The total expenditures in CZK of the municipality on library activities (class 3314 in the sectoral classification of budget structure) in 2016 and 2017. We aggregate the expenditures over two years to capture long-term investments and smooth out annual budget changes. The data source is the information portal MONITOR of the Ministry of Finance of the Czech Republic.

  • Employees: The number of full-time equivalents of library employees in 2017. Note that 64.07% of libraries have no employees of their own as very small libraries are run either by employees of the municipal office or by volunteers. The data source is NIPOS.

  • Collection: The total number of book units owned by the library in 2016. This variable represents the capital of the library. We use the value from the previous year as we consider the increase in book collection in the current year to be an output variable reflecting the performance of the library management. The data source is NIPOS.

We denote the input variables respectively as , and , . Further inputs such as the area of the library, the equipment, more detailed expenditures or more detailed collection could also be utilized. Unfortunately, we do not have these variables available in our data.

We consider the following output variables:

  • Registrations: The total number of users registered in the library in 2017. This variable captures the size of the reader base. The data source is NIPOS.

  • Circulation: The total number of book loans in 2017. This variable captures the main activity of libraries – book lending. The data source is NIPOS.

  • Events Attendance: The total number of visitors of events organized by the library in 2017. This variable captures the cultural role of libraries. Many libraries do not organize any events while others offer regular cultural program. The data source is NIPOS.

  • Collection Additions: The positive part of difference between the book collection in 2017 and 2016. This variable captures the increase of the capital of libraries. According to Table 3, the book collection of 50.56% libraries remains the same as in 2016 or in some cases even decreases. The data source is NIPOS.

We denote the output variables respectively as , , and , . Further outputs such as the number of visits, the number of consultations, the opening hours, the inter-library circulation or various measures of the internet activity could also be utilized. However, we do not have these variables available in our data.
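The construction of the inputs and outputs from the two yearly records can be sketched as follows. The record layout, the field names and the numbers are hypothetical; only the rules (expenditures summed over 2016 and 2017, collection taken from 2016, collection additions as the positive part of the year-on-year difference) follow the definitions above.

```python
def build_variables(rec_2016, rec_2017):
    """Build DEA inputs and outputs for one library from two yearly records."""
    inputs = {
        # Expenditures are aggregated over both years.
        "total_expenditures": rec_2016["expenditures"] + rec_2017["expenditures"],
        "employees": rec_2017["employees"],
        # The collection of the previous year represents the capital.
        "collection": rec_2016["collection"],
    }
    outputs = {
        "registrations": rec_2017["registrations"],
        "circulation": rec_2017["circulation"],
        "events_attendance": rec_2017["events_attendance"],
        # Positive part of the change in the collection.
        "collection_additions": max(
            rec_2017["collection"] - rec_2016["collection"], 0
        ),
    }
    return inputs, outputs


# Hypothetical records of a small library whose collection shrank in 2017.
library_2016 = {"expenditures": 120_000, "collection": 5_400}
library_2017 = {
    "expenditures": 135_000,
    "employees": 0.0,
    "collection": 5_350,
    "registrations": 210,
    "circulation": 4_800,
    "events_attendance": 0,
}

inputs, outputs = build_variables(library_2016, library_2017)
print(inputs)
print(outputs)
```

The example library has zero employees and zero collection additions, which is the kind of data the non-negativeness property of the Chebyshev distance DEA is meant to accommodate.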

Finally, we consider the following 3 variables potentially describing the environment in which libraries operate:

  • Population: The number of inhabitants of the municipality as of January 1, 2018. The data source is the Czech Statistical Office (CSO). We denote this variable as , .

  • Population Density: The number of inhabitants of the municipality per hectare as of January 1, 2018. The data source is CSO. We denote this variable as , .

  • Town Distance: The travel time by car in minutes to the municipality with extended powers [2]. The data source is a web mapping service. We denote this variable as , .

    [2] We have also considered different specifications of distance and reference town. Instead of the travel time, we have tried the air distance and road distance. Instead of the municipality with extended powers, we have tried the district capital (LAU 1 – okresní město), regional capital (NUTS 3 – krajské město), town with population higher than and city with general significance. All combinations of distances and reference towns have led to weaker results.

Table 3: Descriptive statistics of the input, output and environmental variables (columns: notation, variable, minimum, maximum, mean, standard deviation, zeros).
Figure 1: Correlation matrix of the input, output and environmental variables.
Figure 2: Kernel density functions of the efficiency scores.

4.3 Preliminary Efficiency Analysis

First, we apply the presented Chebyshev distance DEA with selected inputs and outputs to the full dataset of libraries. We refer to this as the preliminary efficiency analysis. Note that we consider VRS as there are huge differences in sizes of libraries and we do not assume proportional changes in inputs and outputs. Returns to scale can then be increasing, decreasing or constant. The estimated density function of preliminary efficiency scores is illustrated in Figure 2. For the estimation of the density, we utilize the Gaussian kernel. As expected for such a large dataset, most libraries are inefficient with a very low efficiency score. Specifically, 98.45% of all units are inefficient with mean score 0.1916 and median score 0.0999.
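A Gaussian kernel density estimate of the kind shown in Figure 2 can be sketched in a few lines. The bandwidth rule (the normal reference rule with factor 1.06) is an assumption, as the bandwidth used for the figure is not stated, and the sample of scores is hypothetical.

```python
import math
from statistics import stdev


def gaussian_kde(sample, bandwidth=None):
    """Return a Gaussian kernel density estimate as a function of x."""
    n = len(sample)
    if bandwidth is None:
        # Normal reference rule; an illustrative default, not the paper's choice.
        bandwidth = 1.06 * stdev(sample) * n ** (-1 / 5)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        total = sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )
        return total / norm

    return density


# Hypothetical efficiency scores, mostly low with a few higher values.
scores = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.35, 0.60, 1.10]
density = gaussian_kde(scores)

# Riemann sum over [-1, 3] as a sanity check that the density integrates to one.
step = 0.01
area = sum(density(-1 + k * step) for k in range(401)) * step
print(f"density at 0.1: {density(0.1):.3f}, area: {area:.4f}")
```

Because the Gaussian kernel has unbounded support, the estimate assigns some mass outside the admissible range of the scores, which is worth keeping in mind when reading the tails of such a plot.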

In the next steps, we improve this preliminary approach and focus on two issues – the operational environment and the discriminatory power. We investigate whether our sample of units is homogeneous (i.e. all libraries operate within the same environment) or heterogeneous (i.e. libraries operate under different conditions). Based on our findings, we divide the full sample into several smaller categories according to the environmental influences. This not only ensures homogeneity but also reduces the overly strict discriminatory power.

4.4 Dependence on Explanatory Variables

We study the influence of the population , the population density and the town distance on the transformed preliminary efficiency score of the unit using the linear regression. We arrive at the model formulation [3]

[3] Before arriving at this final model, we have tried several specifications of the regression model including all variables , and with logarithmic and power transformations as well as various interactions.


where , , , and are the parameters. Results of the regression model are reported in Table 4. For the preliminary efficiency scores, all regressors are statistically significant at any reasonable significance level. The model explains 22.90% of the variance in the dependent variable.

The above regression model has the following interpretation. The efficiency score increases with population as the coefficient is positive. For very small population, however, the efficiency score also increases as the coefficient is also positive. Finally, the efficiency score increases with decreasing town distance as the coefficient is negative. This relation is more distinctive for smaller population as the town distance is divided by the population . We do not include the population density in the final model as it is not significant in any transformation.

The regression model describes the relationship between the efficiency score and possible environmental factors. However, it does not tell us whether the population and town distance cause change in the efficiency and can be considered as environmental factors.

Model          Coeff.  Regressor  Estimate  Std. Error  t-Statistic  p-Value
Preliminary            Intercept  -24.0894  1.4866      -16.2049     0.0000
                                    1.9496  0.1082       18.0230     0.0000
                                   54.3415  5.1511       10.5495     0.0000
                                   -2.7975  0.7090       -3.9456     0.0001
Decision Tree          Intercept  -15.6972  2.2017       -7.1295     0.0000
                                    1.3447  0.1602        8.3933     0.0000
                                   35.5161  7.6292        4.6553     0.0000
                                   -0.2859  1.0501       -0.2723     0.7854
Expert                 Intercept  -21.8288  2.2922       -9.5233     0.0000
                                    1.8568  0.1668       11.1323     0.0000
                                   53.6474  7.9426        6.7544     0.0000
                                   -1.0758  1.0932       -0.9841     0.3251
Table 4: Summary of regression models.
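The t-statistics and p-values in Table 4 follow from the estimates and standard errors. The sketch below recomputes them for the last coefficient of each model, which appears to be the town distance term discussed in the text. It uses the standard normal approximation to the t-distribution, which is an assumption that should be harmless with several thousand observations. Small deviations from the table stem from the rounding of the reported inputs.

```python
import math


def two_sided_p(t_stat):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# Last coefficient of each model in Table 4:
# (estimate, standard error, reported p-value).
rows = {
    "Preliminary": (-2.7975, 0.7090, 0.0001),
    "Decision Tree": (-0.2859, 1.0501, 0.7854),
    "Expert": (-1.0758, 1.0932, 0.3251),
}

results = {}
for model, (estimate, std_error, reported_p) in rows.items():
    t_stat = estimate / std_error
    results[model] = (t_stat, two_sided_p(t_stat))
    print(f"{model}: t = {t_stat:.4f}, p = {results[model][1]:.4f} "
          f"(reported {reported_p:.4f})")
```

Only the preliminary model rejects a zero coefficient at conventional significance levels, which matches the discussion of the town distance in the following sections.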

4.5 Efficiency Analysis with Decision Tree Categories

The regression model indicates dependency of the efficiency score on the population and town distance. We further support this claim by the decision tree analysis. Another motivation for the use of the decision tree is the separation of the data sample into several subsamples. As our goal is to use subsamples for separate efficiency analysis, we want them to have roughly the same number of units. Unfortunately, this is not guaranteed by the decision tree and we must therefore control the building of the tree by restricting the minimum number of units in a category. We find that in our case, the minimum of units leads to the most interpretable results. Another tuning parameter is the number of categories or the depth of the tree. We find that 11 categories with depth 7 is an adequate choice.
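The effect of restricting the minimum number of units in a category can be illustrated by a single split of a regression tree. The sketch below searches for the threshold on one regressor that minimizes the within-node sum of squared errors, subject to a minimum number of units on each side. It is a simplified illustration of one splitting step in the spirit of a minimum leaf size control (such as `minbucket` in rpart), not the RPART routine itself, and the data are hypothetical.

```python
def best_split(x, y, min_units):
    """Best single split of y on x with at least min_units on each side.

    Returns (cost, threshold), where cost is the total within-node sum of
    squared errors, or None when no admissible split exists.
    """
    if min_units < 1:
        raise ValueError("min_units must be at least 1")
    pairs = sorted(zip(x, y))
    n = len(pairs)

    def sse(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best = None
    for k in range(min_units, n - min_units + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical regressor values
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (cost, threshold)
    return best


# Hypothetical data: population and efficiency score of eight libraries.
population = [100, 200, 300, 400, 2000, 3000, 5000, 8000]
score = [0.10, 0.12, 0.11, 0.13, 0.30, 0.35, 0.60, 0.80]

loose = best_split(population, score, min_units=2)
strict = best_split(population, score, min_units=3)
print(loose, strict)
```

With at least two units per side the best threshold separates the two largest towns, while requiring three units per side moves the threshold down. A full tree repeats this search recursively in each resulting category and over all regressors.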

The categories of libraries given by the decision tree together with mean values of preliminary efficiency scores are reported in Table 5. We denote the categories as D01–D11. The decision tree divides the units into small with population lower than (categories D01–D05), medium with population between and (categories D06–D09) and large with population higher than (categories D10 and D11). Small units are further divided according to the town distance, medium units according to the population and large units into municipalities with extended powers (category D11) and other towns (category D10). As in the regression model, the town distance is more important for the smaller units. However, the mean efficiency scores suggest that the relation might be more complex, likely due to the dependence between population and town distance. The decision tree also finds that it is significant whether the town distance is zero (and the unit is therefore the reference town) or positive as it puts all municipalities with extended powers into the category D11. The building of the decision tree is illustrated in Figure 3.

Next, we calculate efficiency scores separately for each category given by the decision tree. The mean scores are reported in Table 5. The discriminatory power of this efficiency analysis is more reasonable as 92.30% of all units are inefficient with mean score 0.4371 and median score 0.3070. The shape of the score density function is relatively mild as illustrated in Figure 2. Note that the preliminary scores have a different interpretation than the decision tree scores as they use different samples. For example, the fact that the mean decision tree score of D05 is higher than the mean score of D04 does not imply that D05 is more efficient. On the contrary, preliminary scores show that D04 is on average more efficient. Only with the removal of D04 units and others from the efficiency analysis of D05 do the D05 units become more efficient on average.

As for the preliminary scores, we use the regression model for the decision tree scores. Note that we can compare efficiency scores in different categories thanks to the normalization property of the Chebyshev distance DEA. Table 4 shows that town distance is no longer significant for the new scores. This suggests that the influence of the town distance is eliminated by the decision tree categories and the town distance is indeed an environmental factor. Our adjustment for the town distance in categories therefore leads to a fairer comparison of libraries. The effect of the population, however, remains significant although it is a bit lower, as the model explains only 6.00% of the variance in the efficiency scores. It is also evident from Table 5 that more units have a higher efficiency score in categories with higher population. This suggests that the population has some partial environmental influence but we cannot attribute unilateral causal influence to it. Libraries in towns with larger population are simply far more efficient on average even if we treat smaller towns separately.

This is an important result supporting our separation approach. Unlike the all-in-one model, two-stage and multi-stage models, we do not assume that exogenous variables fully determine the operating environment. We use them to measure similarity between DMUs and then retain only similar DMUs in the data sample. Our approach therefore diminishes the environmental influence of dissimilar DMUs while keeping the influence of similar DMUs unaltered.

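Figure 3: Decision tree of depth 3 with mean efficiency scores and numbers of units.

| Cat. | Population | Distance | Units | Preliminary | Dec. Tree | Expert |
|------|------------|----------|-------|-------------|-----------|--------|
| D01 | | | 373 | 0.1147 | 0.4893 | 0.3163 |
| D02 | | | 408 | 0.1413 | 0.3685 | 0.3630 |
| D03 | | | 867 | 0.1077 | 0.2869 | 0.3047 |
| D04 | | | 481 | 0.1451 | 0.2916 | 0.3924 |
| D05 | | | 367 | 0.0923 | 0.4609 | 0.2866 |
| D06 | | | 165 | 0.1865 | 0.5353 | 0.4527 |
| D07 | | | 871 | 0.1519 | 0.3332 | 0.4265 |
| D08 | | | 380 | 0.2048 | 0.6471 | 0.6318 |
| D09 | | | 206 | 0.2999 | 0.7081 | 0.6497 |
| D10 | | | 404 | 0.4514 | 0.6130 | 0.7558 |
| D11 | | | 138 | 0.8012 | 0.9256 | 0.9256 |
| All | | | | 0.1916 | 0.4371 | 0.4458 |

Table 5: Mean efficiency scores within each decision tree category.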

4.6 Efficiency Analysis with Expert Categories

The categorization by the decision tree is a purely data-driven approach with its benefits and limitations. For example, decision trees are known to be quite sensitive to changes in data and tend to overfit. We compare the categories given by the decision tree with categories selected by an expert. The expert categories can be useful in several ways. From the statistical point of view, their simpler rules can prevent sensitivity to data changes and offer a more robust approach. From the applicability point of view, they can be used in a variety of applications and time frames, in contrast with our decision tree specifically designed for the efficiency analysis of public libraries in 2017. From the managerial point of view, it might be easier to convince the management of the decision making units that expert categories with "nicer looking" rules are fairer. Nevertheless, the data-driven categories offer valuable insight and should serve as the benchmark.

Our expert categories with their rules are described in Table 6. We keep the number of categories at 11 and denote them E01–E11. We divide units into 5 population levels and 2 distance levels, forming 10 categories of roughly the same size based on very simple rules. We keep municipalities with extended powers in the separate category E11, identical to the decision tree category D11.
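A rule set of this shape is easy to express in code. The Python sketch below is only an illustration: the cut points in `POPULATION_CUTS` and `DISTANCE_CUT`, the distance unit and the order in which the labels E01 to E10 are assigned are all hypothetical assumptions, not the rules of Table 6. The only part taken from the text is the structure, namely 5 population levels times 2 distance levels plus a separate category for municipalities with extended powers, which are the units with zero town distance.

```python
from bisect import bisect_right

# Hypothetical cut points, NOT the thresholds used in the study.
POPULATION_CUTS = [500, 1000, 2000, 5000]  # 4 cuts give 5 population levels
DISTANCE_CUT = 10.0                        # 1 cut gives 2 distance levels


def expert_category(population, distance):
    """Assign a unit to one of 11 categories from two simple rules."""
    if distance == 0:
        # Zero distance means the unit is itself the reference town.
        return "E11"
    level = bisect_right(POPULATION_CUTS, population)  # 0, 1, 2, 3 or 4
    far = 1 if distance > DISTANCE_CUT else 0
    # Hypothetical label order: near and far alternate within each level.
    return f"E{2 * level + far + 1:02d}"


for population, distance in [(300, 5.0), (300, 15.0), (6000, 15.0), (20000, 0)]:
    print(population, distance, expert_category(population, distance))
```

Because the rules are fixed cut points rather than data-dependent splits, the same function can be applied unchanged to another year or another sample, which is the robustness argument made above.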

We follow the same procedure as for the efficiency analysis based on the decision tree. Efficiency scores within expert categories are reported in Table 6. The discriminatory power is quite similar to the decision tree efficiency analysis as 92.04% of all units are inefficient with mean score 0.4458 and median score 0.3164. Furthermore, the kernel density functions of the scores are almost identical for the two categorizations as illustrated in Figure 2.

Finally, we fit the regression model and arrive at the same conclusion: the population remains significant while the distance is not significant. The model explains 8.34% of the variance in the efficiency scores, which is slightly more than in the decision tree model. This means that the decision tree model captures environmental effects better but the two models are quite comparable.

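| Cat. | Population | Distance | Units | Preliminary | Dec. Tree | Expert |
|------|------------|----------|-------|-------------|-----------|--------|
| E01 | | | 270 | 0.1364 | 0.4274 | 0.4181 |
| E02 | | | 376 | 0.1094 | 0.3369 | 0.3275 |
| E03 | | | 785 | 0.1155 | 0.3513 | 0.3089 |
| E04 | | | 682 | 0.1144 | 0.3135 | 0.3001 |
| E05 | | | 741 | 0.1461 | 0.3833 | 0.3346 |
| E06 | | | 474 | 0.1543 | 0.3882 | 0.4641 |
| E07 | | | 463 | 0.2129 | 0.5681 | 0.5887 |
| E08 | | | 249 | 0.2032 | 0.5695 | 0.6862 |
| E09 | | | 281 | 0.4312 | 0.6546 | 0.6350 |
| E10 | | | 201 | 0.4185 | 0.5999 | 0.8787 |
| E11 | | | 138 | 0.8012 | 0.9256 | 0.9256 |
| All | | | | 0.1916 | 0.4371 | 0.4458 |

Table 6: Mean efficiency scores within each expert category.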

4.7 Comparison of Efficiency Scores

The preliminary efficiency analysis does not account for the heterogeneous environment and we therefore do not recommend using its efficiency scores to rank libraries. Efficiency analysis with either decision tree categories or expert categories considers the environmental effects of population and town distance and is suitable for ranking libraries. The categories given by the decision tree better remove the influence of the operating environment. Both approaches are, however, rather similar as the correlation coefficient between their efficiency scores is 0.8405. Preliminary efficiency scores differ more: their correlation coefficient is 0.7523 with the decision tree scores and 0.7609 with the expert scores.
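For readers who wish to reproduce this kind of comparison on their own scores, a minimal sketch follows. It assumes that the correlation coefficient is the Pearson coefficient, which the text does not state explicitly, and it uses short hypothetical score vectors rather than the study's data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores of the same five libraries under three analyses.
preliminary = [0.10, 0.15, 0.30, 0.45, 0.80]
tree = [0.40, 0.30, 0.55, 0.60, 0.95]
expert = [0.35, 0.32, 0.50, 0.70, 0.93]

print("tree vs expert:       ", round(pearson(tree, expert), 4))
print("preliminary vs tree:  ", round(pearson(preliminary, tree), 4))
print("preliminary vs expert:", round(pearson(preliminary, expert), 4))
```

If the interest lies in the ranking of libraries rather than in the score values, a rank correlation computed on the sorted positions would be a natural alternative.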

5 Conclusion

We assess the technical efficiency of public libraries established by municipalities in the Czech Republic in the year 2017. In the first stage, we adopt the Chebyshev distance DEA and utilize its many attractive properties including the super-efficiency and natural normalization. We consider total expenditures, employees and book collection as inputs with registrations, book circulation, event attendance and collection additions as outputs. In the second stage, we perform the regression analysis and find that the efficiency scores are significantly dependent on the population of the municipality and the distance to the municipality with extended powers. To remove the influence of the operating environment, we employ DEA for libraries separated into categories given by the decision tree analysis. Interestingly, the effect of population is not completely removed, suggesting it is partially an environmental variable and partially an explanatory variable. We also consider categories designed by an expert and find that the proposed separation approach is robust to the specification of categories to a certain degree. The proposed methodology can be used in similar applications when the data sample is large and the operating environment exhibits heterogeneity.


The author would like to thank Jan Kubát for his help with data preparation and Bojka Hamerníková, Vladimír Beneš and Marek Jetmar for their comments.


The work on this paper was supported by the Technology Agency of the Czech Republic under Grant TL01000463 in the Eta program.


References

  • Atici and Podinovski (2015) Atici, K. B., Podinovski, V. V. 2015. Using Data Envelopment Analysis for the Assessment of Technical Efficiency of Units with Different Specialisations: An Application to Agriculture. Omega. Volume 54. Pages 72–83. ISSN 0305-0483.
  • Banker et al. (1984) Banker, R. D., Charnes, A., Cooper, W. W. 1984. Some Models for Estimating Technical and Scale Inefficiencies in Data Envelopment Analysis. Management Science. Volume 30. Issue 9. Pages 1078–1092. ISSN 0025-1909.
  • Boussofiane et al. (1991) Boussofiane, A., Dyson, R. G., Thanassoulis, E. 1991. Applied Data Envelopment Analysis. European Journal of Operational Research. Volume 52. Issue 1. Pages 1–15. ISSN 0377-2217.
  • Charnes et al. (1978) Charnes, A., Cooper, W. W., Rhodes, E. 1978. Measuring the Efficiency of Decision Making Units. European Journal of Operational Research. Volume 2. Issue 6. Pages 429–444. ISSN 0377-2217.
  • Chen (1997) Chen, T.-Y. 1997. A Measurement of the Resource Utilization Efficiency of University Libraries. International Journal of Production Economics. Volume 53. Issue 1. Pages 71–80. ISSN 0925-5273.
  • Chen et al. (2005) Chen, Y., Morita, H., Zhu, J. 2005. Context-Dependent DEA with an Application to Tokyo Public Libraries. International Journal of Information Technology & Decision Making. Volume 4. Issue 3. Pages 385–394. ISSN 0219-6220.
  • Cook and Seiford (2009) Cook, W. D., Seiford, L. M. 2009. Data Envelopment Analysis (DEA) - Thirty Years On. European Journal of Operational Research. Volume 192. Issue 1. Pages 1–17. ISSN 0377-2217.
  • Cook et al. (2014) Cook, W. D., Tone, K., Zhu, J. 2014. Data Envelopment Analysis: Prior to Choosing a Model. Omega. Volume 44. Pages 1–4. ISSN 0305-0483.
  • De Carvalho et al. (2012) De Carvalho, F. A., Jorge, M. J., Jorge, M. F., Russo, M., De Sa, N. O. 2012. Library Performance Management in Rio de Janeiro, Brazil: Applying DEA to a Sample of University Libraries in 2006-2007. Library Management. Volume 33. Issue 4-5. Pages 297–306. ISSN 0143-5124.
  • De Witte and Geys (2011) De Witte, K., Geys, B. 2011. Evaluating Efficient Public Good Provision: Theory and Evidence from a Generalised Conditional Efficiency Model for Public Libraries. Journal of Urban Economics. Volume 69. Issue 3. Pages 319–327. ISSN 0094-1190.
  • De Witte and Marques (2010) De Witte, K., Marques, R. C. 2010. Incorporating Heterogeneity in Non-Parametric Models: A Methodological Comparison. International Journal of Operational Research. Volume 9. Issue 2. Pages 188–204. ISSN 1745-7645.
  • Dyson et al. (2001) Dyson, R. G., Allen, R., Camanho, A. S., Podinovski, V. V., Sarrico, C. S., Shale, E. A. 2001. Pitfalls and Protocols in DEA. European Journal of Operational Research. Volume 132. Issue 2. Pages 245–259. ISSN 0377-2217.
  • Emrouznejad and Yang (2018) Emrouznejad, A., Yang, G.-L. 2018. A Survey and Analysis of the First 40 Years of Scholarly Literature in DEA: 1978–2016. Socio-Economic Planning Sciences. Volume 61. Pages 4–8. ISSN 0038-0121.
  • Fukuyama and Matousek (2017) Fukuyama, H., Matousek, R. 2017. Modelling Bank Performance: A Network DEA Approach. European Journal of Operational Research. Volume 259. Issue 2. Pages 721–732. ISSN 0377-2217.
  • Golany and Roll (1989) Golany, B., Roll, Y. 1989. An Application Procedure for DEA. Omega. Volume 17. Issue 3. Pages 237–250. ISSN 0305-0483.
  • Guccio et al. (2018) Guccio, C., Mignosa, A., Rizzo, I. 2018. Are Public State Libraries Efficient? An Empirical Assessment Using Network Data Envelopment Analysis. Socio-Economic Planning Sciences. Volume 64. Pages 78–91. ISSN 0038-0121.
  • Hammond (2002) Hammond, C. J. 2002. Efficiency in the Provision of Public Services: A Data Envelopment Analysis of UK Public Library Systems. Applied Economics. Volume 34. Issue 5. Pages 649–657. ISSN 0003-6846.
  • Hladík (2019) Hladík, M. 2019. Universal Efficiency Scores in Data Envelopment Analysis Based on a Robust Approach. Expert Systems with Applications. Volume 122. Pages 242–252. ISSN 0957-4174.
  • Holý and Šafr (2018) Holý, V., Šafr, K. 2018. Are Economically Advanced Countries More Efficient in Basic and Applied Research? Central European Journal of Operations Research. Volume 26. Issue 4. Pages 933–950. ISSN 1613-9178.
  • Jablonsky (2016) Jablonsky, J. 2016. Efficiency Analysis in Multi-Period Systems: An Application to Performance Evaluation in Czech Higher Education. Central European Journal of Operations Research. Volume 24. Issue 2. Pages 283–296. ISSN 1435-246X.
  • Jablonsky (2018) Jablonsky, J. 2018. Ranking of Countries in Sporting Events Using Two-Stage Data Envelopment Analysis Models: A Case of Summer Olympic Games 2016. Central European Journal of Operations Research. Volume 26. Issue 4. Pages 951–966. ISSN 1435-246X.
  • Liu et al. (2013) Liu, J. S., Lu, L. Y. Y., Lu, W.-M., Lin, B. J. Y. 2013. A Survey of DEA Applications. Omega. Volume 41. Issue 5. Pages 893–902. ISSN 0305-0483.
  • Liu et al. (2016) Liu, J. S., Lu, L. Y. Y., Lu, W.-M. 2016. Research Fronts in Data Envelopment Analysis. Omega. Volume 58. Pages 33–45. ISSN 0305-0483.
  • Miidla and Kikas (2009) Miidla, P., Kikas, K. 2009. The Efficiency of Estonian Central Public Libraries. Performance Measurement and Metrics. Volume 10. Issue 1. Pages 49–58. ISSN 1467-8047.
  • Ozcan and Khushalani (2017) Ozcan, Y. A., Khushalani, J. 2017. Assessing Efficiency of Public Health and Medical Care Provision in OECD Countries After a Decade of Reform. Central European Journal of Operations Research. Volume 25. Issue 2. Pages 325–343. ISSN 1435-246X.
  • Reichmann (2004) Reichmann, G. 2004. Measuring University Library Efficiency Using Data Envelopment Analysis. Libri. Volume 54. Issue 2. Pages 136–146. ISSN 0024-2667.
  • Reichmann and Sommersguter-Reichmann (2010) Reichmann, G., Sommersguter-Reichmann, M. 2010. Efficiency Measures and Productivity Indexes in the Context of University Library Benchmarking. Applied Economics. Volume 42. Issue 3. Pages 311–323. ISSN 0003-6846.
  • Shabani et al. (2019) Shabani, A., Visani, F., Barbieri, P., Dullaert, W., Vigo, D. 2019. Reliable Estimation of Suppliers’ Total Cost of Ownership: An Imprecise Data Envelopment Analysis Model with Common Weights. Omega. Volume 87. Pages 57–70. ISSN 0305-0483.
  • Shahwan and Kaba (2013) Shahwan, T. M., Kaba, A. 2013. Efficiency Analysis of GCC Academic Libraries: An Application of Data Envelopment Analysis. Performance Measurement and Metrics. Volume 14. Issue 3. Pages 197–210. ISSN 1467-8047.
  • Sharma et al. (1999) Sharma, K. R., Leung, P.-S., Zane, L. 1999. Performance Measurement of Hawaii State Public Libraries: An Application of Data Envelopment Analysis (DEA). Agricultural and Resource Economics Review. Volume 28. Issue 2. Pages 190–198. ISSN 2372-2614.
  • Simar and Wilson (2007) Simar, L., Wilson, P. W. 2007. Estimation and Inference in Two-Stage, Semi-Parametric Models of Production Processes. Journal of Econometrics. Volume 136. Issue 1. Pages 31–64. ISSN 0304-4076.
  • Simon et al. (2011) Simon, J., Simon, C., Arias, A. 2011. Changes in Productivity of Spanish University Libraries. Omega. Volume 39. Issue 5. Pages 578–588. ISSN 0305-0483.
  • Srakar et al. (2017) Srakar, A., Kodrič-Dačić, E., Koman, K., Kavaš, D. 2017. Efficiency of Slovenian Public General Libraries: A Data Envelopment Analysis Approach. Lex Localis. Volume 15. Issue 3. Pages 559–581. ISSN 1581-5374.
  • Stroobants and Bouckaert (2014) Stroobants, J., Bouckaert, G. 2014. Benchmarking Local Public Libraries Using Non-Parametric Frontier Methods: A Case Study of Flanders. Library & Information Science Research. Volume 36. Issue 3-4. Pages 211–224. ISSN 0740-8188.
  • Therneau and Atkinson (2019) Therneau, T. M., Atkinson, E. J. 2019. An Introduction to Recursive Partitioning Using the RPART Routines. Technical Report.
  • Vitaliano (1998) Vitaliano, D. F. 1998. Assessing Public Library Efficiency Using Data Envelopment Analysis. Annals of Public and Cooperative Economics. Volume 69. Issue 1. Pages 107–122. ISSN 1370-4788.
  • Vrabková and Friedrich (2019) Vrabková, I., Friedrich, V. 2019. The Productivity of Main Services of City Libraries: Using the Example from the Czech Republic and the Slovak Republic. Library & Information Science Research. Volume 41. Issue 3. Pages 100962/1–100962/11. ISSN 0740-8188.
  • Wu et al. (2016) Wu, J., Zhu, Q., Chu, J., Liu, H., Liang, L. 2016. Measuring Energy and Environmental Efficiency of Transportation Systems in China Based on a Parallel DEA Approach. Transportation Research, Part D: Transport and Environment. Volume 48. Pages 460–472. ISSN 1361-9209.
  • Yang and Pollitt (2009) Yang, H., Pollitt, M. 2009. Incorporating Both Undesirable Outputs and Uncontrollable Variables into DEA: The Performance of Chinese Coal-Fired Power Plants. European Journal of Operational Research. Volume 197. Issue 3. Pages 1095–1105. ISSN 0377-2217.