Log In Sign Up

Analysing the causal effect of London cycle superhighways on traffic congestion

by   Prajamitra Bhuyan, et al.

Transport operators have a range of intervention options available to improve or enhance their networks. Such interventions are often made in the absence of sound evidence on resulting outcomes. Cycling superhighways were promoted as a sustainable and healthy travel mode, one of the aims of which was to reduce traffic congestion. Estimating the impacts that cycle superhighways have on congestion is complicated due to the non-random assignment of such intervention over the transport network. In this paper, we analyse the causal effect of cycle superhighways utilising pre-intervention and post-intervention information on traffic and road characteristics along with socio-economic factors. We propose a modeling framework based on the propensity score and outcome regression model. The method is also extended to the doubly robust set-up. Simulation results show the superiority of the performance of the proposed method over existing competitors. The method is applied to analyse a real dataset on the London transport network. The methodology proposed can assist in effective decision making to improve network performance.


Causal Analysis at Extreme Quantiles with Application to London Traffic Flow Data

Treatment effects on asymmetric and heavy tailed distributions are bette...

Estimating the average causal effect of intervention in continuous variables using machine learning

The most widely discussed methods for estimating the Average Causal Effe...

Prescriptive Process Monitoring for Cost-Aware Cycle Time Reduction

Reducing cycle time is a recurrent concern in the field of business proc...

Estimating and Mitigating the Congestion Effect of Curbside Pick-ups and Drop-offs: A Causal Inference Approach

Curb space is one of the busiest areas in urban road networks. Especiall...

Estimation of Time Delay caused by Point Geometry in Public Transport

Travel time prediction is a well-renowned topic of research. It is prima...

1 Introduction

The transport network acts as a lifeline for metropolitan cities across the world. Intelligent transportation systems can revolutionize traffic management and results in significant improvements in people’s mobility. They can offer an integrated approach to infrastructure development and traffic-mobility management. The absence of well functioning commuting channels can have a strongly negative impact on those residing in urban areas. In the last couple of decades, metropolitan areas in both developed and developing countries, have been affected by increasing traffic congestion and several other problems such as poor air quality from pollution. Most air pollution in cities can be attributed to road transport and domestic and commercial heating systems. In addition to the negative impacts on mobility and air quality, previous studies indicate that severe congestion has a negative impact on GDP and an efficient transport system significantly improves the city’s economic competitiveness (Slawson, 2017; Jin and Rafferty, 2017).

Effective design and management of the transport network has a significant impact on the quality of life in smart cities. In general, network interventions (i.e. treatment) are a widely used measure to control high-consequence events world-wide. But often such interventions are made in the absence of statistical evidence on the resulting outcomes. Consequently, it is common to find situations in which interventions fail to deliver their intended consequences and in which transport networks perform poorly in relation to traffic flow, speed, capacity utilisation, safety, and economic and environmental impacts. Such interventions often have unintended negative consequences. Due to the complex nature of transport networks it is difficult to disentangle drivers of good performance and identify the factors underpinning network failure. Furthermore, it is not easy to quantify how interventions impact on system performance because transport interventions are typically targeted to address specific network problems, and are therefore non-randomly assigned. The key consequence of this non-random treatment assignment is the possibility that the effect of the treatment is ‘confounded’ if the treated and control units differ systematically with respect to several characteristics which may affect the outcome of interest.

In recent years, cycling has been promoted as a healthy and sustainable mode of transport, with the additional benefits of reducing traffic congestion; frequency of road accidents; and emissions from vehicle exhausts. Schemes to promote cycling have been deemed effective and the number of cyclists has increased rapidly in major European cities, including Copenhagen, Amsterdam and London. Recent reports suggest a increase in daily journeys in Greater London over a period of ten years from 2004 to 2014 (Transport for London, 2014). The Mayor of London target to achieve 400% increase in cycling by 2026 and several policy decisions to facilitate cycling including the Cycle Superhighways (CS), Santander Cycles and Biking Boroughs have already been implemented (Transport for London, 2010). The CS are 1.5 meter wide barrier-free cycling-paths designed to connect outer London to central London. The blue surfacing on CS distinguishes them from the existing cycle-paths in London (See Figure 1). The CS routes are designed to provide adequate spatial capacity for existing cyclists and potential future commuters who adopt cycling as a mode of transport. With the aim of enabling faster and safer cycle journeys, the twelve Cycle Superhighways, were announced in 2008. As displayed in Figure 2, these routes were designed to radiate from the city center based on the clock face layout. In July 2010, the first two pilot routes, CS3 and CS7, were inaugurated. As reported by Transport for London (2011), in the first year, cycling has increased by along CS3 and along CS7. A new East-West route was introduced to replace CS10, while CS6 and CS12 have been cancelled. As of the end of 2015, only four routes are in operation, namely CS2 (Stratford to Aldgate); CS3 (Barking to Tower Gateway); CS7 (Merton to the City); and CS8 (Wandsworth to Westminster). Due to the lack of adequate data, the effects of CS on traffic congestion are not evaluated in the report by Transport for London (2011). Since their introduction, there has been considerable debate about the effects of CS on road traffic congestion (Norman, 2017; Blunden, 2016). The quantification of the effects of CS on traffic congestion is a complex problem due to the intricate nature of the transport-network, and various traffic and socioeconomic characteristics may act as confounders. For example, urban areas with high population density and narrow roads are expected to have more congestion compared to the outskirts. Similarly, the number of public road transport stops will have an effect on pedestrian activities resulting in changes in traffic congestion.

Figure 1: Cycling Superhighways
Figure 2: Route map of Cycling Superhighways

Congestion arises when the volume of traffic increases to a level that causes traffic speeds to fall below the free-flow (i.e. maximum) speed. There is a direct relationship between flow and traffic speed, both of which have an impact on the levels of congestion (Retallack and Ostendorf, 2019; Yuan et al., 2015)

. In order to investigate the impact of the introduction of Cycle Superhighways on congestion in London, we present a causal analysis of the links between traffic flow and speed and Cycle Superhighways. Our objective is to provide a modeling framework to obtain robust estimates of the causal quantities addressing issues of confounding and longitudinal dependence based on pre-intervention and post-intervention data. Using generalized linear mixed models (GLMMs), we incorporate time-invariant and time-varying covariates to adjust for potential sources of confounding and bias from longitudinal dependence by using random effect parameters. We also propose a novel estimator to deal with unknown interactions between time and covariates. The proposed methods are extended to a doubly robust set-up. First, we describes the existing models and methods used in the traditional potential outcome framework in Section

2. In Section 3, we propose new methods motivated by real-life data and explain the advantages compared to the existing methods available in the literature. Simulation studies are performed for assessing the effectiveness of the proposed methods and the results are summarized in Section 4. In Section 5, results from real data analysis on London Cycle Superhighways are discussed. We summarise the key findings and conclude with some discussion on future research in Section 6.

2 Potential Outcome Framework

In traditional causal inference problems, the primary interest is to estimate the average treatment effect (ATE) based on available data as realisations of a random vector,

, , where denotes a response, the exposure (or treatment status), and a vector of confounders or covariates for the ith unit. The treatment can be binary, multi-valued or continuous but essentially it is not assigned randomly. In this set-up, the simple comparisons of mean responses across different treatment groups lead to biased estimates and may not reveal a ‘causal’ effect of the treatment due to potential confounding. Confounding can be addressed if the vector of covariates is sufficient to ensure unconfoundedness, or conditional independence of potential responses and treatment assignment. In the context of binary treatments, the conditional independence assumption requires that , where is the indicator function for receiving the treatment and and indicate potential outcomes under treated or control status, respectively. An additional requirement for valid causal inference is that, conditional on covariates

, the probability of assignment to treatment is strictly positive for every combination of values of the covariates, i.e.

for all . See Rosenbaum and Rubin (1983) for more details. The main interest is to estimate the average treatment effect , which measures the difference in expected outcomes under treatment and control status.

Several estimators for the ATE are studied in the literature under the assumption that the covariate vector is sufficient to ensure the independence of potential responses and treatment assignment, see for example, Hernán and Robins (2020). Three estimators are of particular interest here. First, we model the expected response given the covariates and treatment, , using an outcome regression (OR) model , for known link function , regression function , and unknown parameter vector . If the OR model is correctly specified, then a consistent estimate of the ATE is given by

where is iteratively reweighted least squares estimate (Bang and Robins, 2005). Second, we could assume a model for , the conditional density of the treatment given the covariates and use this model to estimate propensity scores (PS), which we denote with parameter vector . A PS weighted estimator of the form attributed to Horvitz and Thompson (1952) is then given by

which can be used to estimate the ATE consistently. Finally, we could combine an OR and PS model and construct an estimate for the ATE. This is known as a doubly robust (DR) method in the sense that the resulting estimator is asymptotically consistent if either the OR or PS model is correctly specified. See Kang and Schafer (2007) for more details.

It is important to note that the aforementioned estimators can only incorporate post-intervention observations on the response variable. In our context, pre-intervention measurements are also available, which potentially contain important information on the causal quantities of interest. One can still use the existing methods for estimation of ATEs discarding the pre-intervention data, but the estimates are possibly less efficient compared to any estimator suitable for incorporating the entire dataset (See Section

4 later). In the next section, we attempt to address this issue and propose new estimator for estimation of ATEs based on both pre-intervention and post-intervention measurements.

3 Proposed Methods

In our context, the response variable is observed both before and after the intervention and it depends on time-independent as well as time-dependent covariates. The available data can be represented in a longitudinal structure as , , , where and denote pre-intervention and post-intervention-period, respectively. Note that for all , and contains both time-independent and time-dependent covariates. The policy decisions are formulated based on pre-intervention condition, that is the treatment status is only dependent on the . We assume for all , and , where and are potential outcomes under treated and control status, respectively, at time . In this set-up, our main interest is to estimate the average treatment effect , which measures the difference in expected outcomes under treatment and control status in the post-intervention period.

In the literature of Econometrics, a classical approach is to use the difference-in-difference (DID) method to adjust the effect of confounding based on pre-intervention and post-intervention data (Heckman et al., 1998). This method is popularly used to estimate the causal effect of policy decisions in the field of health economics (Harman et al., 2011; Wharam et al., 2007), epidemiology (Branas et al., 2011), market economics (Card and Krueger, 1994), and various other allied fields. This method is easy to compute and intuitively appealing, however, it assumes the ‘parallel trend’ assumption holds, that its, that the difference between the ‘treatment’ and ‘control’ group is constant over time in the absence of treatment. Under the parallel trend assumption, the DID estimator is given by

The parallel trend assumption may be implausible if pre-treatment characteristics that are thought to be associated with the response variable are unbalanced between the treated and the control group (Abadie, 2005)

. To address this issue, covariate adjusted DID estimators are proposed in the literature. However, almost all these method ignore the bias in the estimated standard errors due to the serially correlated response variable

(Bertrand and Mullainathan, 2004). Moreover, the presence of interactions between time and covariates may result in biased estimates. In the following subsections, we propose novel modeling approaches to address these key estimation issues.

3.1 A Generalized Mixed Model Approach

In this section we develop an estimator of the ATE which can account for the situation where the response variable depends on both time-independent and time-dependent covariates as well as exposure and, additionally, where the pre-intervention and post-intervention responses of the same unit may be serially correlated. To estimate the ATE incorporating these features, we consider the following GLLM model:


where is some known link function, and is the fixed effects with time effect , treatment effect , parameter vector corresponding to the design vector , and is the design vector for the random effects with being a positive definite covariance matrix. For the linear predictor , the conditional expectation is

and the conditional variance is var

, where is the variance function and is a dispersion parameter.

For our case study, the relationship between the response and treatment is likely to be confounded in the sense that both the response (i.e. traffic flow or speed) and the treatment (i.e. cycle superhighways) could depend on a set of pre-intervention characteristics. Moreover, several other factors in the post-intervention period could result in a spurious association between response and treatment. To address these issues, we include both time-independent and time-dependent covariates within the design matrix . Also, the proposed model relaxes the ‘parallel trend’ assumption and accounts for linear or non-linear time trend affecting the response. The response variable measured over different time points are serially correlated and the random effects account for the same and other sources of unobserved heterogeneity. It is important to note that the misspecification of the models associated with random effects may adversely affect the estimates of both random and fixed parameters. In real applications, normality of random effects is typically assumed for computational convenience, but it may lead to biased results. In this context, Huang (2011) and Abad et al. (2010) proposed useful diagnostics and discuss the consequences of misspecification.

The estimate of ATEs based on model (1) involves predictions for both the treatment statuses at the post-intervention period. For non-linear link functions, population-averaged expectations cannot be obtained by simply plugging the mean of the random effects in the expression for conditional expectation (Skrondal and Rabe-Hesketh, 2009). Using the double-expectation rule, the estimate of is obtained as

where is the density function of the random effects with estimated covariance matrix . The integral that is involved in the expression of must generally be evaluated numerically or by simulation.

3.2 Inverse Propensity Weighted Difference-in-Difference

Historically, researchers in various fields of application have used regression based methods to measure the differences between the treated and control group. More recently, PS based methods have become increasingly popular to eliminate the effects of confounding present in observational data. PS based models have several advantages, it is simpler to determine the adequacy of a PS model than to assess whether the regression model reasonably specifies the relationship between the exposure and covariates. Moreover, standard goodness-of-fit tests fail to identify whether the fitted regression model has successfully accounted for the systematic differences between treated and control groups for the estimation of ATE (Austin, 2011). In our context, the response variable not only depends on the pre-intervention confounders but is also affected by various factors, possibly unmeasured, in the post-intervention period. In particular, regression based method provide biased estimates in the presence of unknown interactions between time and covariates (See Section 4 later). In order to avoid the difficulties that arise in regression based approaches, we propose the following difference-in-difference estimator based on an inverse propensity weighting:

where is the estimated PS based on pre-intervention covariate vector

. One can use machine learning techniques such as random forests, neural network etc. to estimate the PS but logistic regression is widely used and performs reasonably well

(Austin, 2011). The inverse propensity score difference-in-difference estimate is consistent under the condition that the PS model is correctly specified, and the expected potential outcome under treatment and control status are equal in the pre-intervention period (i.e. ). The proof is outlined in the Appendix. In contrast to the conventional DID estimate, does not require the ‘parallel trend’ assumption for its consistency.

3.3 Doubly Robust Method

We have proposed two different methods that adjust the parameter estimates in the case where covariates may be related both to the response and treatment assignment mechanism. In Subsection 3.1, we model the relationships between the covariates and the response and use those relationships to predict for both the treatment statuses and obtain estimate ATEs. Another approach, discussed in Subsection 3.2, is to model the probabilities of treatment assignment given the covariates and incorporate them into a weighted difference-in-difference estimate. However, with observational data, one can never be sure that a model for the treatment assignment mechanism or an outcome regression model is correct, an alternative approach is to develop a doubly-robust (DR) estimator. Several DR estimation methodologies are proposed in the literature (Kang and Schafer, 2007). In this paper, we consider the augmented regression method and extend model (1) to a DR version as


where is a suitably chosen parametric function and is the associated coefficient. Scharfstein et al. (1999) considered inverse PS as an additional covariate, i.e. , and Bang and Robins (2005) proposed the so called ‘clever covariate’ which is also a function of the inverse PS and treatment status. Kang and Schafer (2007) compared the performance of DR estimates with various choices of and found that these two choices perform very poorly when some of the estimated propensities are small. The best performance is achieved by choosing

as a step-wise constant function with discontinuities at the sample quantiles of

. In other words, one can simply coarse classify the PS into some suitable number of categories and create dummy indicators to augment the regression model. With this choice, the model (

2) can be rewritten as



is the vector of dummy variables to indicate the category based on the coarse classification of

and is the associated parameter vector. The DR estimate of based on model (3) is given by

The performance of may depend on the dimension of . In practice, it is observed that the performance of the DR estimate is satisfactory with no more that four dummy variables (Kang and Schafer, 2007). The estimate of converges to if the GLMM is correctly specified. If the PS model is correct, but the GLMM is not, the augmented regression has a bias correction property. As expected, the DR estimate does not necessarily translate into good performance when neither model is correctly specified. See Scharfstein et al. (1999) for details. Also, it is important to note that the DR estimate may produce biased results in the presence of unknown interactions between time and covariates.

4 Simulation Study

In order to compare the performance of the proposed methods and existing competitors, we generate data comprising and units. We generate three independent covariates , and , where follows a bivariate normal with mean vector and covariance matrix ;

follows an exponential distribution with mean

; and

follows a normal distribution with mean

and variance . The covariate varies with time while and are time-invariant. We specify the following relationships between the covariates and the treatment and response :

where random effect and error are generated from independent normal distributions with mean and variance . Thus, and are confounders and is a non-confounding covarite. Approximately of the units are allocated treatment and the true ATE is 15 as a result of the aforementioned parameter choices. Under this set-up, the following estimators are tested:

  • - correctly specified OR model based on post-intervention measurements: .

  • - correctly specified GLMM model: .

  • - incorrectly specified GLMM model with erroneous exclusion of the time-invariant confounder : .

  • - difference of the differences between average values of treated and control group for pre-intervention and post-intervention periods.

  • - inverse propensity weighted estimate based on post-intervention response (i.e. ) with correctly specified PS model:

  • - inverse propensity weighted difference and difference estimate based on correctly specified PS model:

  • - DR estimate based on correctly specified GLMM model but augmented with incorrectly specified PS covariates with erroneous exclusion of the time-invariant confounder .

  • - DR estimate based on incorrectly specified GLMM with erroneous exclusion of the time-invariant confounder but augmented with correctly specified PS covariates.

  • - DR estimate based on incorrectly specified GLMM and augmented with incorrectly specified PS covariates where time-invariant confounder is excluded in both the models.

The mean estimates of ATE, variance(Var) and mean squared error (MSE) are presented in Table 1 based on 1000 replications. It is clearly seen that is biased as the effect of the treatment is counfounded due to the erroneous omission of the time-invariant confounder . As expected, the performance of is the best with respect to MSE. The DR estimates and performs as well as , but the performance of is similar to that of with respect to both bias and variance. The naive estimator is marginally biased but possesses high variability. The bias of and are similar but the former exhibits high variability due to the omission of the pre-intervention data. It is also interesting to observe that the variance and MSE of is much less than those of for the same reason. This clearly demonstrates that the pre-intervention data contains important information and provides more precise estimates of the ATE under correct model specifications. We have also observed that the estimates of ATEs are biased in the absence of random effect parameters to account the serial correlations in the data.

To analyse the sensitivity of the estimators associated with unknown interaction between time and covariates, we now consider the following relationships between the covariates and the treatment and response :

where we have considered an interaction effect of time and time-invariant confounder keeping all other specifications as in the previous case. The results are presented in Table 2. With the exception for the case of , it is clearly seen that all the estimators are biased due to the presence of an unknown interaction effect. In addition to bias, the variance of and is also very high. Interestingly, the correctly specified PS covariates fail to adjust the bias induced by interaction effect of time in the DR method, however, remains unaffected and performs the best.

Estimator Average Var MSE Average Var MSE
14.979 0.474 0.474 15.013 0.212 0.212
14.971 0.330 0.331 15.010 0.154 0.154
15.203 0.396 0.437 15.235 0.194 0.249
14.917 0.466 0.472 14.958 0.234 0.235
14.980 0.875 0.875 14.997 0.351 0.351
14.963 0.607 0.608 14.992 0.284 0.284
14.973 0.334 0.335 15.010 0.155 0.155
15.041 0.373 0.374 15.075 0.186 0.191
15.201 0.397 0.437 15.234 0.194 0.249
Table 1: Simulation results for comparison of the competitive estimators
Estimator Average Var MSE Average Var MSE
15.504 0.476 0.729 15.540 0.250 0.540
15.918 0.787 1.630 15.964 0.405 1.335
15.756 0.811 1.382 15.795 0.416 1.048
14.979 0.515 0.515 14.994 0.248 0.248
15.506 0.481 0.737 15.539 0.250 0.541
15.660 0.597 1.031 15.696 0.305 0.789
15.916 0.793 1.631 15.963 0.406 1.334
Table 2: Simulation results for analysing sensitivity of the estimators in presence of unknown time-interactions

5 Quantifying Causal Effects of Cycle Superhighways on Traffic Volume and Speed

As discussed in Section 1, we analyse the causal effect of CS on traffic congestion based on a dataset collected over the period 2007-2014. In this study, 75 treated zones and 375 control zones were selected randomly along the 40 km long main corridors radiating from central London to outer London.

In observational data the effect of CS is confounded due to various factors related to traffic dynamics, road characteristics, and socio-demographic conditions. The traffic data on the major road network as well as on the minor road network are collected by The Department for Transport (The Department of Transport, 2018). Also, additional information on traffic flow and speed are collected from the London Atmospheric Emissions Inventory (LAEI). It is observed that traffic congestion is associated with bus-stop density and road network density George (1970). An association between traffic congestion and socio-demographic characteristics, such as employment and land-use patterns, has also been indicated in previous studies (Badoe and Miller, 2000; Zhang et al., 2017; Chen et al., 2001). To incorporate these effects, we obtained relevant data on population and employment density, as well as the information of land-use patterns from the Office for National Statistics. The traffic characteristics are time-varying but the road characteristics, land-use patterns and employment density remain unchanged over the period of our study. The data that are available from the aforementioned sources and the logic to construct responses and covariates are described below.

  • Annual average daily traffic (AADT) – the total volume of vehicle traffic of a highway or road for a year divided by 365 days. To measure AADT on individual road segments, traffic data is collected by an automated traffic counter, hiring an observer to record traffic or licensing estimated counts from GPS data providers. AADT is a simple, but useful, measurement to indicate busyness of a road.

  • Traffic speed – calculated using time-mean-speed method based on the individual speed records for vehicles passing a point over a selected time period. Speed is also a fundamental measurement in transport engineering and used for maintaining a designated level of service.

  • Total Cycle Collisions (TCS) – total number of injured cycle collisions based on police records from the STATS19 accident reporting form and collected by the UK Department for Transport. The location of an accident is recorded using coordinates which are in accordance with the British National Grid coordinate system. The CS routes were intended to reduce the risk of accidents for cyclists and the route allocation is possibly influenced by TCS. It is expected that the accident rates will affect the traffic characteristics.

  • Bus-stop density – the ratio of the number of bus-stops to the road length. The presence of bus-stops is expected to affect the traffic flow and speed due to frequent bus-stops and pedestrian activities. The allocation of CS routes were designed to avoid areas with high bus-stop density for safety of the cyclists.

  • Road network density – with the available geographical information system we could also represent the road network density in each zone by using a measure of the number of network nodes per unit of area. A network node is defined as the meeting point of two or more links. To safeguard from conflicting turning movements the CS paths are routed through the areas with high road network density.

  • Road length – high capacity networks tend to depress land values which in turn will influence the socio-economic profile of the people who live close together. Data for road length for each zone was generated using geographical information system software.

  • Road type – a binary variable where ‘1’ represents dual-carriageway and ‘0’ represents single-carriage. This is an important feature since we might expect traffic congestion in single-carriage roads.

  • Density of domestic buildings – this is a potentially useful feature since we might expect congestion to be associated with the nature of land use and the degree of urbanization. Also, the allocation of the CS paths are possibly influenced by land use characteristics.

  • Density of non-domestic buildings – rising housing costs in business and office districts force people to live further away, lengthening commutes, and affecting traffic flow and speed. As mentioned before, this feature may influence allocation of the CS paths.

  • Road area density – the ratio of the area of the zone’s total road network to the land area of the zone. The road network includes all roads in the including motorways, highways, main or national roads, secondary or regional roads. It is expected that the traffic flow is associated with road density.

  • Employment density – traffic generation potential depends on economic activity and we proxy this by employment density. High employment density tends to influence pedestrian activity which in turn affects traffic speed. The CS paths are designed to provide coverage in the areas with high employment density and encourage commuters to use cycling as a regular mode of transport.

  • Time – in a longitudinal study, time itself may be a confounder because government policies or other interventions could simultaneously affect AADT and speed. Fuel taxation or motoring policies, for instance, provide relevant examples.

First, we analyse the causal effect of CS on AADT considering the sources of confounding as mentioned in (b)-(l). We consider a logistic regression model to estimate the PS based on pre-intervention measurements. We used backward elimination following Bang and Robins (2005) for covariate selection. At each step, the covariate with the largest p-value was dropped one at a time until all covariates are significant with a cut-point of p-value = 0.10. In this process, the final PS model include the following factors: TCS, speed, bus-stop density, road network density, road length, density of domestic buildings, density of non-domestic buildings, road area density, and employment density. To test our PS specification we check for balancing. In a similar manner to Flores et al. (2012) and Graham et al. (2016), we regress the exposure using a logit model on the covariates, and the estimated PS up to a cubic term. The AIC values obtained from the logit models with and without covariates are 280.47 and 262.57, respectively. This result suggests that the balancing property has been achieved for our PS specification as the inclusion of covariates leads to a deterioration in model adequacy.

To model AADT we consider a Gaussian GLMM and found that AIC values support identity rather than log link function. The GLMM model with gamma family encounters convergence issues. A completely analogous algorithm is used for variable selection and the final model included exposure along with traffic speed, bus-stop density, road network density, road length, road type, density of domestic buildings, density of non-domestic buildings, Road area density, and time. Similarly, we built a DR model using the approach described in Subsection 3.3. For this augmented GLMM model we consider four dummy variables based on equally distributed classification of the estimated PS. The models are estimated by maximum likelihood using the lme4 package in R. Some elements of are significantly different from zero, which indicates some deficiency in the GLMM model. We find similar indications based on the diagnostic tests conducted using the approach of Robins and Rotnitzky (2001). However, we find no evidence of misspecification for the PS model.

The estimated ATE of CS on AADT relative to the average AADT in the pre-intervention period are presented in Table 3

. The standard errors (SE) and the corresponding 95% confidence intervals (CI) for all the estimates, except

, are obtained from 10000 parametric bootstrap samples. The same for are computed using the nonparametric bootstrap. The results indicate a reduction in traffic flow compared to the pre-intervention period. The 95% bootstrap CI exclude , suggesting a significant effect of CS on AADT. The results from different estimators varies between % to % but the DID method provides results similar to the GLMM and DR methods. The SE of is higher than those of and , which is also reflected in the width of the confidence intervals. Interestingly, the bootstrap mean of all the estimators are similar.

Estimate (%) Mean SE CI
-7.306 -11.458 7.201 (-29.068, -3.801)
-11.856 -11.146 1.562 (-14.157, -8.077)
-11.929 -11.511 1.530 (-14.553, -8.554)
-12.741 -12.692 1.888 (-16.455, -9.065)
Table 3: Effect of CS on AADT relative to the average AADT in the pre-intervention period

Next, we perform a similar analysis to estimate the ATE of CS on traffic speed considering the sources of confounding as mentioned in (a) and (c)-(l). The final PS model includes the following factors: AADT, TCS, bus-stop density, road network density, density of domestic buildings, density of non-domestic buildings, and employment density. As mentioned before, we test our PS specification by regressing the exposure using a logit model on the covariates, and the estimated PS up to a cubic term. The AIC values obtained from the logit models with and without covariates are 252.03 and 240.46, respectively. This result indicate that the balancing property has been achieved for our PS specification. We model speed using a Gaussian GLMM with identity link function and the final model included exposure, AADT, TCS, bus-stop density, road length, road type, density of domestic buildings, density of non-domestic buildings, employment, and time. Here also we find some deficiency in the GLMM model based on diagnostic tests.

The estimated ATE of CS on traffic speed relative to the average speed in the pre-intervention period are presented in Table 4. It can be seen that the estimates of the change in traffic speed based on PS and DID are insignificant. A marginal increase in traffic speed is indicated by GLMM based methods. It is not unexpected that we do not observe changes in traffic speed, there are many factors associated with the transport network and other interventions can play a crucial role in mitigating potential traffic problems anticipated by the introduction of cycle lanes. Important factors contributing to the levels of congestion are traffic speed and flow. Considering both the analyses presented here, it can be inferred that Cycle Superhighways are an effective intervention that can potentially alleviate traffic congestion in London.

Estimate Mean SD CI
-2.647 -1.134 3.445 (-5.014 6.971)
2.950 3.501 1.114 (1.289, 5.661)
2.807 3.201 1.098 (1.184, 5.490)
0.539 0.518 1.186 (-1.658, 2.965)
Table 4: Effect of CS on speed relative to the average speed in the pre-intervention period

6 Concluding Remarks

This paper has presented a statistical framework that can be used to derive inference for causal quantities based on pre-intervention and post-intervention data motivated from a case study on the London transport network. However, the scope of the proposed methods go far beyond this particular application. The key methodological insight is that extending the traditional OR model within a GLMM set-up, which is able to represent both time-varying and time-invariant confounding and accounts for the serial correlations in the data. The Inverse propensity weighted difference in difference estimate is an attractive alternative because it avoids any parametric assumptions associated with the form of the regression and/or link function which is essential in the GLMM approach. Our results suggest that the introduction of Cycle Superhighways can reduce traffic flow, but we find marginal improvement in traffic speed. Providing evidence that Cycle Superhighways can be an effective intervention in metropolitan cities like London, which are heavily affected by congestion.

In recent years, London’s air quality has improved as a result of policies to reduce emissions, primarily from road transport, although significant areas still exceed NO2 EU limits. The Cycle Superhighways are one of the several interventions introduced which may results in an improvement in air quality. However, the effect of Cycle Superhighways on air quality is debatable, and is an interesting research problem that could be studied under the same causal analysis set-up outlined here.


The authors would like to acknowledge the Lloyd’s Register Foundation for funding this research through the programme on Data-Centric Engineering at the Alan Turing Institute.



The inverse propensity weighted difference-in-difference estimator is a consistent estimator of when the propensity score model is correctly specified, and .


Following Lunceford and Davidian (2004), we consider the following estimating equations




The maximum likelihood estimate , , and satisfies equation 4-6, for .

Note that , and for . Now we can write

where is the ture value of . Similarly, we have for . Therefore,


where and for . Note that by assumption. Now using consistency of the M-estimates (Huber, 1967; Serfling, 1980, p-249), we have


  • Abad et al. (2010) Abad, A. A., Litire, S., and Molenberghs, G. (2010). Testing for misspecification in generalized linear mixed models. Biostatistics, 11:771–786.
  • Abadie (2005) Abadie, A. (2005). Semiparametric difference-in-differences estimators. The Review of Economic Studies, 72(1):1–19.
  • Austin (2011) Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3):399–424.
  • Badoe and Miller (2000) Badoe, D. A. and Miller, E. J. (2000). Transportation-land-use interaction: empirical findings in north america, and their implications for modeling. Transportation Research Part D: Transport and Environment, 5(4):235–263.
  • Bang and Robins (2005) Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61:962–972.
  • Bertrand and Mullainathan (2004) Bertrand, M., E. D. and Mullainathan, S. (2004). How much should we trust difference-in-differences estimates? Quarterly Journal of Economics, 119(1):249–275.
  • Blunden (2016) Blunden, M. (2016). Cycle superhighways make traffic worse in the city, report reveals. EveningStandard, Oct 5.
  • Branas et al. (2011) Branas, C. C., Cheney, R. A., MacDonald, J. M., Tam, V. W., Jackson, T. D., and Ten Have, T. R. (2011). A difference-in-differences analysis of health, safety, and greening vacant urban space. American Journal of Epidemiology, 174(11):1296–1306.
  • Card and Krueger (1994) Card, D. and Krueger, A. B. (1994). Minimum wages and employment: A case study of the fast food industry in new jersey and pennsylvania. American Economic Review, 84(4):772–93.
  • Chen et al. (2001) Chen, C., Jia, Z., and Varaiya, P. (2001). Causes and cures of highway congestion. IEEE Control Systems Magazine, 21(6):26–32.
  • Flores et al. (2012) Flores, C. A., Flores-Lagunes, A., Gonzalez, A., and Neumann, T. C. (2012). Estimating the effects of length of exposure to instruction in a training program: The case of job corps. The Review of Economics and Statistics, 94(1):153–171.
  • George (1970) George, K. A. (1970). Transportation compatible land uses and bus-stop location. Transactions on The Built Environment, 44.
  • Graham et al. (2016) Graham, D. J., McCoy, E. J., and Stephens, D. A. (2016).

    Approximate bayesian inference for doubly robust estimation.

    Bayesian Analysis, 1(1):47–69.
  • Harman et al. (2011) Harman, J. S., Lemak, C. H., Al-Amin, M., Hall, A. G., and Duncan, R. P. (2011). Changes in per member per month expenditures after implementation of florida’s medicaid reform demonstration. Health Services Research, 43(3):787–804.
  • Heckman et al. (1998) Heckman, J., Ichimura, H., Smith, J., and Todd, P. (1998). Characterizing selection bias using experimental data. Econometrica, 66(5):1017–1098.
  • Hernán and Robins (2020) Hernán, M. A. and Robins, J. (2020). Causal Inference: What If. Chapman and Hall/CRC.
  • Horvitz and Thompson (1952) Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47:663–685.
  • Huang (2011) Huang, X. (2011). Detecting random-effects model misspecification via coarsened data. Computational Statistics and Data Analysis, 55:703–714.
  • Huber (1967) Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the 5th Berkeley Symposium, 1(373):221–233.
  • Jin and Rafferty (2017) Jin, J. and Rafferty, P. (2017). Does congestion negatively affect income growth and employment growth? empirical evidence from us metropolitan regions. Transport Policy, 55:1–8.
  • Kang and Schafer (2007) Kang, J. D. Y. and Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical, 22(4):523–539.
  • Lunceford and Davidian (2004) Lunceford, J. K. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23:2937–2960.
  • Norman (2017) Norman, W. (2017). Bike lanes don’t clog up our roads, they keep london moving. The Gaurdian, Dec 1.
  • Retallack and Ostendorf (2019) Retallack, A. E. and Ostendorf, B. (2019). Current understanding of the effects of congestion on traffic accidents. International Journal of Environmental Research and Public Health, 16(18).
  • Robins and Rotnitzky (2001) Robins, J. M. and Rotnitzky, A. (2001). A comment on “inference for semiparametric models: some questions and an answer”. Statistica Sinica, 11:920–936.
  • Rosenbaum and Rubin (1983) Rosenbaum, P. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effec. Biometrika, 40:41–55.
  • Scharfstein et al. (1999) Scharfstein, D. O., Rotnitzky, A., and Robins, J. M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of American Statistical Association, 94:1096–1120.
  • Serfling (1980) Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. John Wiley & Sons.
  • Skrondal and Rabe-Hesketh (2009) Skrondal, A. and Rabe-Hesketh, S. (2009). Prediction in multilevel generalized linear models. Journal of Royal Statistical Society, Series A, 172(3):659–687.
  • Slawson (2017) Slawson, N. (2017). Traffic jams on major uk roads cost economy around £9bn. The Gaurdian, Oct 18.
  • The Department of Transport (2018) The Department of Transport (2018). Traffic statistics-methodology review-alternative data sources.
  • Transport for London (2010) Transport for London (2010). Cycling revolution london.
  • Transport for London (2011) Transport for London (2011). Barclays cycle superhighways evaluation of pilot routes 3 and 7.
  • Transport for London (2014) Transport for London (2014). Number of daily cycle journeys in london.
  • Wharam et al. (2007) Wharam, J. F., Landon, B. E., Galbraith, A. A., Kleinman, K. P., Soumerai, S. B., and Ross-Degnan, D. (2007). Emergency department use and subsequent hospitalizations among members of a high-deductible health plan. JAMA, 297(10):1093–102.
  • Yuan et al. (2015) Yuan, K., Knoop, V. L., and Hoogendoorn, S. P. (2015). Capacity drop: Relationship between speed in congestion and the queue discharge rate. Transportation Research Record, 2491(1):72–80.
  • Zhang et al. (2017) Zhang, K., Sun, D. J., Shen, S., and Zhu, Y. (2017). Analyzing spatiotemporal congestion pattern on urban roads based on taxi gps data. Journal of Transport and Land Use, 10(1):675–694.