1 Introduction
The transport network acts as a lifeline for metropolitan cities across the world. Intelligent transportation systems can revolutionize traffic management and results in significant improvements in people’s mobility. They can offer an integrated approach to infrastructure development and trafficmobility management. The absence of well functioning commuting channels can have a strongly negative impact on those residing in urban areas. In the last couple of decades, metropolitan areas in both developed and developing countries, have been affected by increasing traffic congestion and several other problems such as poor air quality from pollution. Most air pollution in cities can be attributed to road transport and domestic and commercial heating systems. In addition to the negative impacts on mobility and air quality, previous studies indicate that severe congestion has a negative impact on GDP and an efficient transport system significantly improves the city’s economic competitiveness (Slawson, 2017; Jin and Rafferty, 2017).
Effective design and management of the transport network has a significant impact on the quality of life in smart cities. In general, network interventions (i.e. treatment) are a widely used measure to control highconsequence events worldwide. But often such interventions are made in the absence of statistical evidence on the resulting outcomes. Consequently, it is common to find situations in which interventions fail to deliver their intended consequences and in which transport networks perform poorly in relation to traffic flow, speed, capacity utilisation, safety, and economic and environmental impacts. Such interventions often have unintended negative consequences. Due to the complex nature of transport networks it is difficult to disentangle drivers of good performance and identify the factors underpinning network failure. Furthermore, it is not easy to quantify how interventions impact on system performance because transport interventions are typically targeted to address specific network problems, and are therefore nonrandomly assigned. The key consequence of this nonrandom treatment assignment is the possibility that the effect of the treatment is ‘confounded’ if the treated and control units differ systematically with respect to several characteristics which may affect the outcome of interest.
In recent years, cycling has been promoted as a healthy and sustainable mode of transport, with the additional benefits of reducing traffic congestion; frequency of road accidents; and emissions from vehicle exhausts. Schemes to promote cycling have been deemed effective and the number of cyclists has increased rapidly in major European cities, including Copenhagen, Amsterdam and London. Recent reports suggest a increase in daily journeys in Greater London over a period of ten years from 2004 to 2014 (Transport for London, 2014). The Mayor of London target to achieve 400% increase in cycling by 2026 and several policy decisions to facilitate cycling including the Cycle Superhighways (CS), Santander Cycles and Biking Boroughs have already been implemented (Transport for London, 2010). The CS are 1.5 meter wide barrierfree cyclingpaths designed to connect outer London to central London. The blue surfacing on CS distinguishes them from the existing cyclepaths in London (See Figure 1). The CS routes are designed to provide adequate spatial capacity for existing cyclists and potential future commuters who adopt cycling as a mode of transport. With the aim of enabling faster and safer cycle journeys, the twelve Cycle Superhighways, were announced in 2008. As displayed in Figure 2, these routes were designed to radiate from the city center based on the clock face layout. In July 2010, the first two pilot routes, CS3 and CS7, were inaugurated. As reported by Transport for London (2011), in the first year, cycling has increased by along CS3 and along CS7. A new EastWest route was introduced to replace CS10, while CS6 and CS12 have been cancelled. As of the end of 2015, only four routes are in operation, namely CS2 (Stratford to Aldgate); CS3 (Barking to Tower Gateway); CS7 (Merton to the City); and CS8 (Wandsworth to Westminster). Due to the lack of adequate data, the effects of CS on traffic congestion are not evaluated in the report by Transport for London (2011). Since their introduction, there has been considerable debate about the effects of CS on road traffic congestion (Norman, 2017; Blunden, 2016). The quantification of the effects of CS on traffic congestion is a complex problem due to the intricate nature of the transportnetwork, and various traffic and socioeconomic characteristics may act as confounders. For example, urban areas with high population density and narrow roads are expected to have more congestion compared to the outskirts. Similarly, the number of public road transport stops will have an effect on pedestrian activities resulting in changes in traffic congestion.
Congestion arises when the volume of traffic increases to a level that causes traffic speeds to fall below the freeflow (i.e. maximum) speed. There is a direct relationship between flow and traffic speed, both of which have an impact on the levels of congestion (Retallack and Ostendorf, 2019; Yuan et al., 2015)
. In order to investigate the impact of the introduction of Cycle Superhighways on congestion in London, we present a causal analysis of the links between traffic flow and speed and Cycle Superhighways. Our objective is to provide a modeling framework to obtain robust estimates of the causal quantities addressing issues of confounding and longitudinal dependence based on preintervention and postintervention data. Using generalized linear mixed models (GLMMs), we incorporate timeinvariant and timevarying covariates to adjust for potential sources of confounding and bias from longitudinal dependence by using random effect parameters. We also propose a novel estimator to deal with unknown interactions between time and covariates. The proposed methods are extended to a doubly robust setup. First, we describes the existing models and methods used in the traditional potential outcome framework in Section
2. In Section 3, we propose new methods motivated by reallife data and explain the advantages compared to the existing methods available in the literature. Simulation studies are performed for assessing the effectiveness of the proposed methods and the results are summarized in Section 4. In Section 5, results from real data analysis on London Cycle Superhighways are discussed. We summarise the key findings and conclude with some discussion on future research in Section 6.2 Potential Outcome Framework
In traditional causal inference problems, the primary interest is to estimate the average treatment effect (ATE) based on available data as realisations of a random vector,
, , where denotes a response, the exposure (or treatment status), and a vector of confounders or covariates for the ith unit. The treatment can be binary, multivalued or continuous but essentially it is not assigned randomly. In this setup, the simple comparisons of mean responses across different treatment groups lead to biased estimates and may not reveal a ‘causal’ effect of the treatment due to potential confounding. Confounding can be addressed if the vector of covariates is sufficient to ensure unconfoundedness, or conditional independence of potential responses and treatment assignment. In the context of binary treatments, the conditional independence assumption requires that , where is the indicator function for receiving the treatment and and indicate potential outcomes under treated or control status, respectively. An additional requirement for valid causal inference is that, conditional on covariates, the probability of assignment to treatment is strictly positive for every combination of values of the covariates, i.e.
for all . See Rosenbaum and Rubin (1983) for more details. The main interest is to estimate the average treatment effect , which measures the difference in expected outcomes under treatment and control status.Several estimators for the ATE are studied in the literature under the assumption that the covariate vector is sufficient to ensure the independence of potential responses and treatment assignment, see for example, Hernán and Robins (2020). Three estimators are of particular interest here. First, we model the expected response given the covariates and treatment, , using an outcome regression (OR) model , for known link function , regression function , and unknown parameter vector . If the OR model is correctly specified, then a consistent estimate of the ATE is given by
where is iteratively reweighted least squares estimate (Bang and Robins, 2005). Second, we could assume a model for , the conditional density of the treatment given the covariates and use this model to estimate propensity scores (PS), which we denote with parameter vector . A PS weighted estimator of the form attributed to Horvitz and Thompson (1952) is then given by
which can be used to estimate the ATE consistently. Finally, we could combine an OR and PS model and construct an estimate for the ATE. This is known as a doubly robust (DR) method in the sense that the resulting estimator is asymptotically consistent if either the OR or PS model is correctly specified. See Kang and Schafer (2007) for more details.
It is important to note that the aforementioned estimators can only incorporate postintervention observations on the response variable. In our context, preintervention measurements are also available, which potentially contain important information on the causal quantities of interest. One can still use the existing methods for estimation of ATEs discarding the preintervention data, but the estimates are possibly less efficient compared to any estimator suitable for incorporating the entire dataset (See Section
4 later). In the next section, we attempt to address this issue and propose new estimator for estimation of ATEs based on both preintervention and postintervention measurements.3 Proposed Methods
In our context, the response variable is observed both before and after the intervention and it depends on timeindependent as well as timedependent covariates. The available data can be represented in a longitudinal structure as , , , where and denote preintervention and postinterventionperiod, respectively. Note that for all , and contains both timeindependent and timedependent covariates. The policy decisions are formulated based on preintervention condition, that is the treatment status is only dependent on the . We assume for all , and , where and are potential outcomes under treated and control status, respectively, at time . In this setup, our main interest is to estimate the average treatment effect , which measures the difference in expected outcomes under treatment and control status in the postintervention period.
In the literature of Econometrics, a classical approach is to use the differenceindifference (DID) method to adjust the effect of confounding based on preintervention and postintervention data (Heckman et al., 1998). This method is popularly used to estimate the causal effect of policy decisions in the field of health economics (Harman et al., 2011; Wharam et al., 2007), epidemiology (Branas et al., 2011), market economics (Card and Krueger, 1994), and various other allied fields. This method is easy to compute and intuitively appealing, however, it assumes the ‘parallel trend’ assumption holds, that its, that the difference between the ‘treatment’ and ‘control’ group is constant over time in the absence of treatment. Under the parallel trend assumption, the DID estimator is given by
The parallel trend assumption may be implausible if pretreatment characteristics that are thought to be associated with the response variable are unbalanced between the treated and the control group (Abadie, 2005)
. To address this issue, covariate adjusted DID estimators are proposed in the literature. However, almost all these method ignore the bias in the estimated standard errors due to the serially correlated response variable
(Bertrand and Mullainathan, 2004). Moreover, the presence of interactions between time and covariates may result in biased estimates. In the following subsections, we propose novel modeling approaches to address these key estimation issues.3.1 A Generalized Mixed Model Approach
In this section we develop an estimator of the ATE which can account for the situation where the response variable depends on both timeindependent and timedependent covariates as well as exposure and, additionally, where the preintervention and postintervention responses of the same unit may be serially correlated. To estimate the ATE incorporating these features, we consider the following GLLM model:
(1) 
where is some known link function, and is the fixed effects with time effect , treatment effect , parameter vector corresponding to the design vector , and is the design vector for the random effects with being a positive definite covariance matrix. For the linear predictor , the conditional expectation is
and the conditional variance is var
, where is the variance function and is a dispersion parameter.For our case study, the relationship between the response and treatment is likely to be confounded in the sense that both the response (i.e. traffic flow or speed) and the treatment (i.e. cycle superhighways) could depend on a set of preintervention characteristics. Moreover, several other factors in the postintervention period could result in a spurious association between response and treatment. To address these issues, we include both timeindependent and timedependent covariates within the design matrix . Also, the proposed model relaxes the ‘parallel trend’ assumption and accounts for linear or nonlinear time trend affecting the response. The response variable measured over different time points are serially correlated and the random effects account for the same and other sources of unobserved heterogeneity. It is important to note that the misspecification of the models associated with random effects may adversely affect the estimates of both random and fixed parameters. In real applications, normality of random effects is typically assumed for computational convenience, but it may lead to biased results. In this context, Huang (2011) and Abad et al. (2010) proposed useful diagnostics and discuss the consequences of misspecification.
The estimate of ATEs based on model (1) involves predictions for both the treatment statuses at the postintervention period. For nonlinear link functions, populationaveraged expectations cannot be obtained by simply plugging the mean of the random effects in the expression for conditional expectation (Skrondal and RabeHesketh, 2009). Using the doubleexpectation rule, the estimate of is obtained as
where is the density function of the random effects with estimated covariance matrix . The integral that is involved in the expression of must generally be evaluated numerically or by simulation.
3.2 Inverse Propensity Weighted DifferenceinDifference
Historically, researchers in various fields of application have used regression based methods to measure the differences between the treated and control group. More recently, PS based methods have become increasingly popular to eliminate the effects of confounding present in observational data. PS based models have several advantages, it is simpler to determine the adequacy of a PS model than to assess whether the regression model reasonably specifies the relationship between the exposure and covariates. Moreover, standard goodnessoffit tests fail to identify whether the fitted regression model has successfully accounted for the systematic differences between treated and control groups for the estimation of ATE (Austin, 2011). In our context, the response variable not only depends on the preintervention confounders but is also affected by various factors, possibly unmeasured, in the postintervention period. In particular, regression based method provide biased estimates in the presence of unknown interactions between time and covariates (See Section 4 later). In order to avoid the difficulties that arise in regression based approaches, we propose the following differenceindifference estimator based on an inverse propensity weighting:
where is the estimated PS based on preintervention covariate vector
. One can use machine learning techniques such as random forests, neural network etc. to estimate the PS but logistic regression is widely used and performs reasonably well
(Austin, 2011). The inverse propensity score differenceindifference estimate is consistent under the condition that the PS model is correctly specified, and the expected potential outcome under treatment and control status are equal in the preintervention period (i.e. ). The proof is outlined in the Appendix. In contrast to the conventional DID estimate, does not require the ‘parallel trend’ assumption for its consistency.3.3 Doubly Robust Method
We have proposed two different methods that adjust the parameter estimates in the case where covariates may be related both to the response and treatment assignment mechanism. In Subsection 3.1, we model the relationships between the covariates and the response and use those relationships to predict for both the treatment statuses and obtain estimate ATEs. Another approach, discussed in Subsection 3.2, is to model the probabilities of treatment assignment given the covariates and incorporate them into a weighted differenceindifference estimate. However, with observational data, one can never be sure that a model for the treatment assignment mechanism or an outcome regression model is correct, an alternative approach is to develop a doublyrobust (DR) estimator. Several DR estimation methodologies are proposed in the literature (Kang and Schafer, 2007). In this paper, we consider the augmented regression method and extend model (1) to a DR version as
(2) 
where is a suitably chosen parametric function and is the associated coefficient. Scharfstein et al. (1999) considered inverse PS as an additional covariate, i.e. , and Bang and Robins (2005) proposed the so called ‘clever covariate’ which is also a function of the inverse PS and treatment status. Kang and Schafer (2007) compared the performance of DR estimates with various choices of and found that these two choices perform very poorly when some of the estimated propensities are small. The best performance is achieved by choosing
as a stepwise constant function with discontinuities at the sample quantiles of
. In other words, one can simply coarse classify the PS into some suitable number of categories and create dummy indicators to augment the regression model. With this choice, the model (
2) can be rewritten as(3) 
where
is the vector of dummy variables to indicate the category based on the coarse classification of
and is the associated parameter vector. The DR estimate of based on model (3) is given byThe performance of may depend on the dimension of . In practice, it is observed that the performance of the DR estimate is satisfactory with no more that four dummy variables (Kang and Schafer, 2007). The estimate of converges to if the GLMM is correctly specified. If the PS model is correct, but the GLMM is not, the augmented regression has a bias correction property. As expected, the DR estimate does not necessarily translate into good performance when neither model is correctly specified. See Scharfstein et al. (1999) for details. Also, it is important to note that the DR estimate may produce biased results in the presence of unknown interactions between time and covariates.
4 Simulation Study
In order to compare the performance of the proposed methods and existing competitors, we generate data comprising and units. We generate three independent covariates , and , where follows a bivariate normal with mean vector and covariance matrix ;
follows an exponential distribution with mean
; andfollows a normal distribution with mean
and variance . The covariate varies with time while and are timeinvariant. We specify the following relationships between the covariates and the treatment and response :where random effect and error are generated from independent normal distributions with mean and variance . Thus, and are confounders and is a nonconfounding covarite. Approximately of the units are allocated treatment and the true ATE is 15 as a result of the aforementioned parameter choices. Under this setup, the following estimators are tested:

 correctly specified OR model based on postintervention measurements: .

 correctly specified GLMM model: .

 incorrectly specified GLMM model with erroneous exclusion of the timeinvariant confounder : .

 difference of the differences between average values of treated and control group for preintervention and postintervention periods.

 inverse propensity weighted estimate based on postintervention response (i.e. ) with correctly specified PS model:

 inverse propensity weighted difference and difference estimate based on correctly specified PS model:

 DR estimate based on correctly specified GLMM model but augmented with incorrectly specified PS covariates with erroneous exclusion of the timeinvariant confounder .

 DR estimate based on incorrectly specified GLMM with erroneous exclusion of the timeinvariant confounder but augmented with correctly specified PS covariates.

 DR estimate based on incorrectly specified GLMM and augmented with incorrectly specified PS covariates where timeinvariant confounder is excluded in both the models.
The mean estimates of ATE, variance(Var) and mean squared error (MSE) are presented in Table 1 based on 1000 replications. It is clearly seen that is biased as the effect of the treatment is counfounded due to the erroneous omission of the timeinvariant confounder . As expected, the performance of is the best with respect to MSE. The DR estimates and performs as well as , but the performance of is similar to that of with respect to both bias and variance. The naive estimator is marginally biased but possesses high variability. The bias of and are similar but the former exhibits high variability due to the omission of the preintervention data. It is also interesting to observe that the variance and MSE of is much less than those of for the same reason. This clearly demonstrates that the preintervention data contains important information and provides more precise estimates of the ATE under correct model specifications. We have also observed that the estimates of ATEs are biased in the absence of random effect parameters to account the serial correlations in the data.
To analyse the sensitivity of the estimators associated with unknown interaction between time and covariates, we now consider the following relationships between the covariates and the treatment and response :
where we have considered an interaction effect of time and timeinvariant confounder keeping all other specifications as in the previous case. The results are presented in Table 2. With the exception for the case of , it is clearly seen that all the estimators are biased due to the presence of an unknown interaction effect. In addition to bias, the variance of and is also very high. Interestingly, the correctly specified PS covariates fail to adjust the bias induced by interaction effect of time in the DR method, however, remains unaffected and performs the best.
Estimator  Average  Var  MSE  Average  Var  MSE  

14.979  0.474  0.474  15.013  0.212  0.212  
14.971  0.330  0.331  15.010  0.154  0.154  
15.203  0.396  0.437  15.235  0.194  0.249  
14.917  0.466  0.472  14.958  0.234  0.235  
14.980  0.875  0.875  14.997  0.351  0.351  
14.963  0.607  0.608  14.992  0.284  0.284  
14.973  0.334  0.335  15.010  0.155  0.155  
15.041  0.373  0.374  15.075  0.186  0.191  
15.201  0.397  0.437  15.234  0.194  0.249 
Estimator  Average  Var  MSE  Average  Var  MSE  

15.504  0.476  0.729  15.540  0.250  0.540  
15.918  0.787  1.630  15.964  0.405  1.335  
15.756  0.811  1.382  15.795  0.416  1.048  
14.979  0.515  0.515  14.994  0.248  0.248  
15.506  0.481  0.737  15.539  0.250  0.541  
15.660  0.597  1.031  15.696  0.305  0.789  
15.916  0.793  1.631  15.963  0.406  1.334 
5 Quantifying Causal Effects of Cycle Superhighways on Traffic Volume and Speed
As discussed in Section 1, we analyse the causal effect of CS on traffic congestion based on a dataset collected over the period 20072014. In this study, 75 treated zones and 375 control zones were selected randomly along the 40 km long main corridors radiating from central London to outer London.
In observational data the effect of CS is confounded due to various factors related to traffic dynamics, road characteristics, and sociodemographic conditions. The traffic data on the major road network as well as on the minor road network are collected by The Department for Transport (The Department of Transport, 2018). Also, additional information on traffic flow and speed are collected from the London Atmospheric Emissions Inventory (LAEI). It is observed that traffic congestion is associated with busstop density and road network density George (1970). An association between traffic congestion and sociodemographic characteristics, such as employment and landuse patterns, has also been indicated in previous studies (Badoe and Miller, 2000; Zhang et al., 2017; Chen et al., 2001). To incorporate these effects, we obtained relevant data on population and employment density, as well as the information of landuse patterns from the Office for National Statistics. The traffic characteristics are timevarying but the road characteristics, landuse patterns and employment density remain unchanged over the period of our study. The data that are available from the aforementioned sources and the logic to construct responses and covariates are described below.

Annual average daily traffic (AADT) – the total volume of vehicle traffic of a highway or road for a year divided by 365 days. To measure AADT on individual road segments, traffic data is collected by an automated traffic counter, hiring an observer to record traffic or licensing estimated counts from GPS data providers. AADT is a simple, but useful, measurement to indicate busyness of a road.

Traffic speed – calculated using timemeanspeed method based on the individual speed records for vehicles passing a point over a selected time period. Speed is also a fundamental measurement in transport engineering and used for maintaining a designated level of service.

Total Cycle Collisions (TCS) – total number of injured cycle collisions based on police records from the STATS19 accident reporting form and collected by the UK Department for Transport. The location of an accident is recorded using coordinates which are in accordance with the British National Grid coordinate system. The CS routes were intended to reduce the risk of accidents for cyclists and the route allocation is possibly influenced by TCS. It is expected that the accident rates will affect the traffic characteristics.

Busstop density – the ratio of the number of busstops to the road length. The presence of busstops is expected to affect the traffic flow and speed due to frequent busstops and pedestrian activities. The allocation of CS routes were designed to avoid areas with high busstop density for safety of the cyclists.

Road network density – with the available geographical information system we could also represent the road network density in each zone by using a measure of the number of network nodes per unit of area. A network node is defined as the meeting point of two or more links. To safeguard from conflicting turning movements the CS paths are routed through the areas with high road network density.

Road length – high capacity networks tend to depress land values which in turn will influence the socioeconomic profile of the people who live close together. Data for road length for each zone was generated using geographical information system software.

Road type – a binary variable where ‘1’ represents dualcarriageway and ‘0’ represents singlecarriage. This is an important feature since we might expect traffic congestion in singlecarriage roads.

Density of domestic buildings – this is a potentially useful feature since we might expect congestion to be associated with the nature of land use and the degree of urbanization. Also, the allocation of the CS paths are possibly influenced by land use characteristics.

Density of nondomestic buildings – rising housing costs in business and office districts force people to live further away, lengthening commutes, and affecting traffic flow and speed. As mentioned before, this feature may influence allocation of the CS paths.

Road area density – the ratio of the area of the zone’s total road network to the land area of the zone. The road network includes all roads in the including motorways, highways, main or national roads, secondary or regional roads. It is expected that the traffic flow is associated with road density.

Employment density – traffic generation potential depends on economic activity and we proxy this by employment density. High employment density tends to influence pedestrian activity which in turn affects traffic speed. The CS paths are designed to provide coverage in the areas with high employment density and encourage commuters to use cycling as a regular mode of transport.

Time – in a longitudinal study, time itself may be a confounder because government policies or other interventions could simultaneously affect AADT and speed. Fuel taxation or motoring policies, for instance, provide relevant examples.
First, we analyse the causal effect of CS on AADT considering the sources of confounding as mentioned in (b)(l). We consider a logistic regression model to estimate the PS based on preintervention measurements. We used backward elimination following Bang and Robins (2005) for covariate selection. At each step, the covariate with the largest pvalue was dropped one at a time until all covariates are significant with a cutpoint of pvalue = 0.10. In this process, the final PS model include the following factors: TCS, speed, busstop density, road network density, road length, density of domestic buildings, density of nondomestic buildings, road area density, and employment density. To test our PS specification we check for balancing. In a similar manner to Flores et al. (2012) and Graham et al. (2016), we regress the exposure using a logit model on the covariates, and the estimated PS up to a cubic term. The AIC values obtained from the logit models with and without covariates are 280.47 and 262.57, respectively. This result suggests that the balancing property has been achieved for our PS specification as the inclusion of covariates leads to a deterioration in model adequacy.
To model AADT we consider a Gaussian GLMM and found that AIC values support identity rather than log link function. The GLMM model with gamma family encounters convergence issues. A completely analogous algorithm is used for variable selection and the final model included exposure along with traffic speed, busstop density, road network density, road length, road type, density of domestic buildings, density of nondomestic buildings, Road area density, and time. Similarly, we built a DR model using the approach described in Subsection 3.3. For this augmented GLMM model we consider four dummy variables based on equally distributed classification of the estimated PS. The models are estimated by maximum likelihood using the lme4 package in R. Some elements of are significantly different from zero, which indicates some deficiency in the GLMM model. We find similar indications based on the diagnostic tests conducted using the approach of Robins and Rotnitzky (2001). However, we find no evidence of misspecification for the PS model.
The estimated ATE of CS on AADT relative to the average AADT in the preintervention period are presented in Table 3
. The standard errors (SE) and the corresponding 95% confidence intervals (CI) for all the estimates, except
, are obtained from 10000 parametric bootstrap samples. The same for are computed using the nonparametric bootstrap. The results indicate a reduction in traffic flow compared to the preintervention period. The 95% bootstrap CI exclude , suggesting a significant effect of CS on AADT. The results from different estimators varies between % to % but the DID method provides results similar to the GLMM and DR methods. The SE of is higher than those of and , which is also reflected in the width of the confidence intervals. Interestingly, the bootstrap mean of all the estimators are similar.Estimate (%)  Mean  SE  CI  

7.306  11.458  7.201  (29.068, 3.801)  
11.856  11.146  1.562  (14.157, 8.077)  
11.929  11.511  1.530  (14.553, 8.554)  
12.741  12.692  1.888  (16.455, 9.065) 
Next, we perform a similar analysis to estimate the ATE of CS on traffic speed considering the sources of confounding as mentioned in (a) and (c)(l). The final PS model includes the following factors: AADT, TCS, busstop density, road network density, density of domestic buildings, density of nondomestic buildings, and employment density. As mentioned before, we test our PS specification by regressing the exposure using a logit model on the covariates, and the estimated PS up to a cubic term. The AIC values obtained from the logit models with and without covariates are 252.03 and 240.46, respectively. This result indicate that the balancing property has been achieved for our PS specification. We model speed using a Gaussian GLMM with identity link function and the final model included exposure, AADT, TCS, busstop density, road length, road type, density of domestic buildings, density of nondomestic buildings, employment, and time. Here also we find some deficiency in the GLMM model based on diagnostic tests.
The estimated ATE of CS on traffic speed relative to the average speed in the preintervention period are presented in Table 4. It can be seen that the estimates of the change in traffic speed based on PS and DID are insignificant. A marginal increase in traffic speed is indicated by GLMM based methods. It is not unexpected that we do not observe changes in traffic speed, there are many factors associated with the transport network and other interventions can play a crucial role in mitigating potential traffic problems anticipated by the introduction of cycle lanes. Important factors contributing to the levels of congestion are traffic speed and flow. Considering both the analyses presented here, it can be inferred that Cycle Superhighways are an effective intervention that can potentially alleviate traffic congestion in London.
Estimate  Mean  SD  CI  

2.647  1.134  3.445  (5.014 6.971)  
2.950  3.501  1.114  (1.289, 5.661)  
2.807  3.201  1.098  (1.184, 5.490)  
0.539  0.518  1.186  (1.658, 2.965) 
6 Concluding Remarks
This paper has presented a statistical framework that can be used to derive inference for causal quantities based on preintervention and postintervention data motivated from a case study on the London transport network. However, the scope of the proposed methods go far beyond this particular application. The key methodological insight is that extending the traditional OR model within a GLMM setup, which is able to represent both timevarying and timeinvariant confounding and accounts for the serial correlations in the data. The Inverse propensity weighted difference in difference estimate is an attractive alternative because it avoids any parametric assumptions associated with the form of the regression and/or link function which is essential in the GLMM approach. Our results suggest that the introduction of Cycle Superhighways can reduce traffic flow, but we find marginal improvement in traffic speed. Providing evidence that Cycle Superhighways can be an effective intervention in metropolitan cities like London, which are heavily affected by congestion.
In recent years, London’s air quality has improved as a result of policies to reduce emissions, primarily from road transport, although significant areas still exceed NO2 EU limits. The Cycle Superhighways are one of the several interventions introduced which may results in an improvement in air quality. However, the effect of Cycle Superhighways on air quality is debatable, and is an interesting research problem that could be studied under the same causal analysis setup outlined here.
Acknowledgement
The authors would like to acknowledge the Lloyd’s Register Foundation for funding this research through the programme on DataCentric Engineering at the Alan Turing Institute.
Appendix
Theorem.
The inverse propensity weighted differenceindifference estimator is a consistent estimator of when the propensity score model is correctly specified, and .
Proof.
Following Lunceford and Davidian (2004), we consider the following estimating equations
Note that , and for . Now we can write
where is the ture value of . Similarly, we have for . Therefore,
and
where and for . Note that by assumption. Now using consistency of the Mestimates (Huber, 1967; Serfling, 1980, p249), we have
∎
References
 Abad et al. (2010) Abad, A. A., Litire, S., and Molenberghs, G. (2010). Testing for misspecification in generalized linear mixed models. Biostatistics, 11:771–786.
 Abadie (2005) Abadie, A. (2005). Semiparametric differenceindifferences estimators. The Review of Economic Studies, 72(1):1–19.
 Austin (2011) Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3):399–424.
 Badoe and Miller (2000) Badoe, D. A. and Miller, E. J. (2000). Transportationlanduse interaction: empirical findings in north america, and their implications for modeling. Transportation Research Part D: Transport and Environment, 5(4):235–263.
 Bang and Robins (2005) Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61:962–972.
 Bertrand and Mullainathan (2004) Bertrand, M., E. D. and Mullainathan, S. (2004). How much should we trust differenceindifferences estimates? Quarterly Journal of Economics, 119(1):249–275.
 Blunden (2016) Blunden, M. (2016). Cycle superhighways make traffic worse in the city, report reveals. EveningStandard, Oct 5.
 Branas et al. (2011) Branas, C. C., Cheney, R. A., MacDonald, J. M., Tam, V. W., Jackson, T. D., and Ten Have, T. R. (2011). A differenceindifferences analysis of health, safety, and greening vacant urban space. American Journal of Epidemiology, 174(11):1296–1306.
 Card and Krueger (1994) Card, D. and Krueger, A. B. (1994). Minimum wages and employment: A case study of the fast food industry in new jersey and pennsylvania. American Economic Review, 84(4):772–93.
 Chen et al. (2001) Chen, C., Jia, Z., and Varaiya, P. (2001). Causes and cures of highway congestion. IEEE Control Systems Magazine, 21(6):26–32.
 Flores et al. (2012) Flores, C. A., FloresLagunes, A., Gonzalez, A., and Neumann, T. C. (2012). Estimating the effects of length of exposure to instruction in a training program: The case of job corps. The Review of Economics and Statistics, 94(1):153–171.
 George (1970) George, K. A. (1970). Transportation compatible land uses and busstop location. Transactions on The Built Environment, 44.

Graham et al. (2016)
Graham, D. J., McCoy, E. J., and Stephens, D. A. (2016).
Approximate bayesian inference for doubly robust estimation.
Bayesian Analysis, 1(1):47–69.  Harman et al. (2011) Harman, J. S., Lemak, C. H., AlAmin, M., Hall, A. G., and Duncan, R. P. (2011). Changes in per member per month expenditures after implementation of florida’s medicaid reform demonstration. Health Services Research, 43(3):787–804.
 Heckman et al. (1998) Heckman, J., Ichimura, H., Smith, J., and Todd, P. (1998). Characterizing selection bias using experimental data. Econometrica, 66(5):1017–1098.
 Hernán and Robins (2020) Hernán, M. A. and Robins, J. (2020). Causal Inference: What If. Chapman and Hall/CRC.
 Horvitz and Thompson (1952) Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47:663–685.
 Huang (2011) Huang, X. (2011). Detecting randomeffects model misspecification via coarsened data. Computational Statistics and Data Analysis, 55:703–714.
 Huber (1967) Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the 5th Berkeley Symposium, 1(373):221–233.
 Jin and Rafferty (2017) Jin, J. and Rafferty, P. (2017). Does congestion negatively affect income growth and employment growth? empirical evidence from us metropolitan regions. Transport Policy, 55:1–8.
 Kang and Schafer (2007) Kang, J. D. Y. and Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical, 22(4):523–539.
 Lunceford and Davidian (2004) Lunceford, J. K. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23:2937–2960.
 Norman (2017) Norman, W. (2017). Bike lanes don’t clog up our roads, they keep london moving. The Gaurdian, Dec 1.
 Retallack and Ostendorf (2019) Retallack, A. E. and Ostendorf, B. (2019). Current understanding of the effects of congestion on traffic accidents. International Journal of Environmental Research and Public Health, 16(18).
 Robins and Rotnitzky (2001) Robins, J. M. and Rotnitzky, A. (2001). A comment on “inference for semiparametric models: some questions and an answer”. Statistica Sinica, 11:920–936.
 Rosenbaum and Rubin (1983) Rosenbaum, P. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effec. Biometrika, 40:41–55.
 Scharfstein et al. (1999) Scharfstein, D. O., Rotnitzky, A., and Robins, J. M. (1999). Adjusting for nonignorable dropout using semiparametric nonresponse models. Journal of American Statistical Association, 94:1096–1120.
 Serfling (1980) Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. John Wiley & Sons.
 Skrondal and RabeHesketh (2009) Skrondal, A. and RabeHesketh, S. (2009). Prediction in multilevel generalized linear models. Journal of Royal Statistical Society, Series A, 172(3):659–687.
 Slawson (2017) Slawson, N. (2017). Traffic jams on major uk roads cost economy around £9bn. The Gaurdian, Oct 18.
 The Department of Transport (2018) The Department of Transport (2018). Traffic statisticsmethodology reviewalternative data sources.
 Transport for London (2010) Transport for London (2010). Cycling revolution london.
 Transport for London (2011) Transport for London (2011). Barclays cycle superhighways evaluation of pilot routes 3 and 7.
 Transport for London (2014) Transport for London (2014). Number of daily cycle journeys in london.
 Wharam et al. (2007) Wharam, J. F., Landon, B. E., Galbraith, A. A., Kleinman, K. P., Soumerai, S. B., and RossDegnan, D. (2007). Emergency department use and subsequent hospitalizations among members of a highdeductible health plan. JAMA, 297(10):1093–102.
 Yuan et al. (2015) Yuan, K., Knoop, V. L., and Hoogendoorn, S. P. (2015). Capacity drop: Relationship between speed in congestion and the queue discharge rate. Transportation Research Record, 2491(1):72–80.
 Zhang et al. (2017) Zhang, K., Sun, D. J., Shen, S., and Zhu, Y. (2017). Analyzing spatiotemporal congestion pattern on urban roads based on taxi gps data. Journal of Transport and Land Use, 10(1):675–694.