Log In Sign Up

MCMC for a hyperbolic Bayesian inverse problem in traffic flow modelling

by   Jeremie Coullon, et al.

As work on hyperbolic Bayesian inverse problems remains rare in the literature, we explore empirically the sampling challenges these offer which have to do with shock formation in the solution of the PDE. Furthermore, we provide a unified statistical model to estimate using motorway data both boundary conditions and fundamental diagram parameters in LWR, a well known motorway traffic flow model. This allows us to provide a traffic flow density estimation method that is shown to be superior to two methods found in the traffic flow literature. Finally, we highlight how Population Parallel Tempering - a modification of Parallel Tempering - is a scalable method that can increase the mixing speed of the sampler by a factor of 10.


page 2

page 11

page 13

page 17

page 19

page 21

page 22

page 23


Empirical analysis of the variability in the flow-density relationship for smart motorways

The fundamental diagram is an assumed functional relationship between tr...

Fundamental Diagram of Traffic Flow from Prigogine-Herman-Enskog Equation

Recent applications of a new methodology to measure fundamental traffic ...

Bayesian calibration of traffic flow fundamental diagrams using Gaussian processes

Modeling the relationship between vehicle speed and density on the road ...

Graph-based Prior and Forward Models for Inverse Problems on Manifolds with Boundaries

This paper develops manifold learning techniques for the numerical solut...

Anomaly detection and classification in traffic flow data from fluctuations in the flow-density relationship

We describe and validate a novel data-driven approach to the real time d...

New stochastic highway capacity estimation method and why product limit method is unsuitable

Kaplan-Meier estimate, commonly known as product limit method (PLM), and...

1 Introduction

Inverse problems - where one must typically fit differential equations to data - are an essential part of modelling in the sciences and engineering. It allows researchers to model complex systems to be able to understand and predict their behaviour. We can find applications in such varied fields as Geophysics [30], Hydrogeology [19], and fluid mechanics [8].

In this paper we consider the Bayesain framework as outlined in [31]

for partial differential equations. We consider observations

generated by an observation operator polluted with noise :


Starting from a prior belief on the parameter , the objective is to update this distribution based on observations represented by the likelihood.


With the prior mean of

. For nonlinear PDEs such as the one studied in this paper, the posterior is unavailable analytically, and so one must resort to numerical methods such as sampling methods (Markov Chain Monte Carlo, SMC) (see

[24] or [5]). Here we focus on MCMC.

A lot of work has been done for inverse problems involving elliptic and parabolic PDEs ([31], [19]), but little work has been done in the hyperbolic case.

We consider as application vehicular traffic flow on motorways. We show in figure (1) traffic density in space and time (a plot in the plane) for 49 minutes and 5 km of road.

In this plot time is denoted on the horizontal axis and space (distance on the road) on the vertical axis. Vehicles move from distance 0km to 5km (namely upwards in the plot) and forwards in time (namely to the right in the plot) so therefore move diagonally (upwards and to the right). We can see the vehicles’ movement in the first few minutes (approximately between minutes 381 and 405) where they take a few minutes to cover the 5km section of road. This is consistent with a vehicle speed of approximately 120km/h (ie 2km/min). This vehicle movement corresponds to the free flow waves. We also observe backward moving waves in the second half of the plot (from around minute 405 until the end) which are high density waves. These backwards moving waves correspond to the experience of needing to brake sharply when driving on motorways to avoid crashing into a traffic jam; congested flow waves move upstream in traffic.

Figure 1: Density estimated from occupancy for the section of M25 on the 8th January 2007 between 6:21am and 7:09am. We observe forward moving free flow waves between minutes 381 and 405 which correspond to the movement of vehicles. We also observe backwards moving high density waves in the second half of the plane.

Many approaches have been considered in the traffic flow literature to model such a system: systems of ODEs ([3], [13]), SDEs ([25]), or PDEs ([23], [36], [2]). We focus on the most well known model, the Lighthill-Whitham-Richards (LWR) model, which is a conservation law with a nonlinear flux function:


With density and the equilibrium density. We use subscripts to denote partial derivatives. Using the fundamental relationship relating flow to density and speed : , we can define the Fundamental Diagram (FD) .

The main contributions of the paper are as follows: as work on hyperbolic Bayesian inverse problems is rare in the literature, we explore empirically the sampling challenges these offer which have to do with shock formation in the solution of the PDE. Furthermore, we provide a unified statistical model to estimate both boundary conditions and fundamental diagram parameters in LWR. This allows us to provide a traffic flow density estimation method that is shown to be superior to two methods found in the traffic flow literature. Finally, we highlight how Population Parallel Tempering - a modification of Parallel Tempering - is a scalable method that can increase the mixing speed of the sampler by a factor of 10.

The structure of the paper is as follows. In section 2 we will give an overview of traffic flow theory, the LWR model, the traffic data, and previous work in this area. In section 3 we will describe the setup for the inverse problem, and sample from the Fundamental Diagram (FD) parameters keeping the boundary conditions (BCs) fixed. Then in section 4 we will keep the FD parameters fixed and sample the BCs. In this section we will highlight how a simple extension of parallel tempering can greatly improve mixing in such a problem. Finally, in section 5 we will sample both FD and BC parameters, and discuss the sampling challenges specific to hyperbolic Bayesian inverse problems. We will also compare this methodology to other methods both in the traffic engineering and statistical literature.

2 Motorway traffic flow and the LWR model

2.1 The Fundamental Diagram

We start by observing flow and density traffic data (flow being the number of vehicles per minute and density the number of vehicles per km) as can be seen in figure (2)), and noting that for the flow-density plot (plot (c) in the figure) the measurements on the left of the plot can be well described by a curve with a positive slope. We define traffic following this curve to be in the state of free flow. The rest of the points are scattered to the right of this curve which we call congested flow. A main difference is that vehicles do not interact much in free flow; adding a vehicle does not decrease the speed of the other ones. In contrast, adding a vehicle to congested flow will decrease the speed of the other vehicles as they are forced to slow down to avoid collisions. We can see the effect of increasing density on average vehicle speed in plot (b) of figure (2).

Figure 2: Figures showing the empirical relationship between vehicle flow , density , and velocity . The data is taken from inductance loops over one minute averages (data and figure from [35])

The fundamental diagram (FD), first studied by Greenshields [15] is the function used to described the relationship between density and flow as seen in figure (2). It is used to close the conservation of mass equation in PDE models of traffic, as will be described later.

Three variables are usually considered in the macroscopic modelling framework (where one models traffic flow as a fluid): density which describes the number of vehicles per unit length, speed which describes the average speed of vehicles at a point, and flow which is the number of vehicles that pass a point in space during a unit of time. These three variables are linked through the following relationship:


A popular example is Daganzo’s Triangular FD (also called the bi-linear FD) from [11] given in equation (5) and plotted in figure (2(a)). This FD has important traffic flow quantities as parameters: the capacity (maximum possible value of flow), the critical density which separates free flow from congested flow, and the jam density (maximum possible value of density). This FD is popular for its simplicity as well as for its computational efficiency when used in PDE models such as LWR (introduced in section (2.1.1)). We will see later how the wave speed propagation of traffic flow models depends on the shape of the fundamental diagram.


We will focus in this paper on the Fundamental Diagram introduced by del Castillo in [12] called the negative power model (which we will simply call del Castillo’s FD). This FD has 4 parameters (all in ): flow scaling term , jam density , shape parameter , and parameter relating to the critical density. The shape parameter determines the tightness of the peak and in the limit of we obtain the Triangular FD defined above. The parameter determines the vertical scaling of the FD, and is related to the critical density via the following relation:


We give the FD in equation (7) below and plot the FD for several values of in figure (2(b)) to illustrate how del Castillo’s FD includes the Triangular FD as a limiting case.

(a) The Triangular FD
(b) Del Castillo’s FD
Figure 3: (a) Daganzo’s triangular fundamental diagram plotted for dimensionless flow and density with . (b) Del Castillo’s fundamental diagram plotted for dimensionless flow and density with and for

2.1.1 Lwr

As mentioned in section (1), one of the ways to model traffic flow is to use partial differential equations (PDEs). This approach describes the macroscopic behaviour of traffic flow emerging from the individual interactions between vehicles, so we consider traffic flow as a continuum without representing the individual behaviour of vehicles. The differences between these models will therefore be in the assumptions made in how the system acts. We point to [4] and [17] for a critical review on the topic.

An important class of these PDE models are those that consist of a single equation, which corresponds to a conservation law of the form:


As mentioned earlier, we use the subscripts to denote and to denote . This conservation equation is derived from the obvious fact that vehicles are conserved on a length of road (assuming there are no on or off-ramps) (see [22] for a derivation). Different assumptions in how the system acts will lead to different ways of closing this conservation equation.

A standard way of doing this is to use the following form for the velocity: (see [10] for an overview of PDE models in traffic flow). This assumes that vehicles adapts their speed instantaneously to a change in local density; i.e. traffic flow is always in equilibrium described by the function . Multiplying this function by the density gives the fundamental diagram . This model is named the LWR model after Lighthill & Whitham [23] and Richards [29] who developed it independently.

A property of the LWR model is that it allows the formation of shock waves (discontinuities in the solution); more generally, nonlinear hyperbolic PDEs are prone to forming shocks. As a result, LWR with this choice of FD can model the propagation of upstream and downstream shock fronts on the highway. We can see these shock fronts in figure (1): the forwards moving waves occur for free flow traffic (namely for low density) and the backwards moving waves occur for congested flow traffic (namely for high density).

In hyperbolic PDEs these shock fronts will move at speeds depending on the density and this is given by the gradient of the FD at that value of density. More generally, a disturbance in the initial condition of the PDE moves along characteristics: curves in the plane where the solution is constant (with speed also given by the gradient of the FD). So using the FDs in figure (3), we can see that in free flow we have so characteristics have positive speed, while in the congested regime they have negative speed.

To solve LWR we need to choose the FD along with its parameters, and choose some inital conditions (density along the road at time ) and boundary conditions (density at the inlet and outlet of the road for the entire time period of the simulation).

2.2 Numerical Method

We would like to solve LWR for arbitrary FD parameters, initial conditions, and boundary conditions. However we cannot solve it analytically in the general case. We will use the open-source software Clawpack (

[7]) which is a package for solving conservation laws using finite volume methods (see [26] for information about the 5.0 release). Furthermore, an introduction to these finite volume methods along with an overview of the software can be found in [22].

We give here an overview of the Godunov method (a first-order finite volume method) and point out a limitation. The idea of this method is to solve the PDE in conservation form (as in equation (9)), which ensures that the method behaves correctly in the presence of shock waves:


We discretise space and time into cells, and consider methods of the form:



  • density at cell , time

  • flux (flow) at the the right boundary of cell at time

  • flux (flow) at the the left boundary of cell at time

  • and the time and space discretisation

If we consider the density to be constant in each cell, we obtain a Riemann Problem at each boundary: an initial value problem with piecewise constant data and a single discontinuity. We then solve this Riemann Problem - which can be done analytically - at each boundary to find and .

There are two main cases in the Riemann Problem:

Case 1:

In this case we will have a shock wave as seen in figure (5): the discontinuity will simply be advected with speed:

Case 2:

Here we will have a rarefaction wave as seen in figure (4).

Figure 4: The Riemann Problem with causes a rarefaction wave. The initial condition (namely at ) consists of a constant value of high density for and a constant value of low density for . As the simulation moves forward in time we observe a rarefaction wave, or a fanning out of density values between the low and high values of the initial condition.

Creating methods that operate on the PDE in conservation form ensures that we can accurately model the position of the shock wave at any point in time and space. However they introduce a great deal of numerical viscosity that smooths out the solution. One can then extend Godunov’s method to create second order numerical methods such as the Lax-Wendroff method (see [22] for a detailed account) that models smooth solutions more accurately than the first order Godunov scheme but fails at discontinuities. To correct for this, one needs to add so-called flux limiters. Using the Clawpack software, one only needs to solve the Riemann problem at each cell and the software automatically uses a Lax-Wendroff method with a flux limiter (the default one used is called the minmod limiter).

Figure 5: We plot the analytic solution to the Riemann Problem along with its numerical solution using Clawpack. As time progresses we observe that the discontinuity is smoothed slightly. However, we notice that the position of the shock wave remains accurate.

However these methods still do exhibit some numerical viscosity. To illustrate this effect, we consider a Riemann problem (case 1 above) and compare the true solution of LWR with del Castillo’s fundamental diagram for two different times to the output of Clawpack in figure (5).

To understand the relevance of shocks and this numerical viscosity to Bayesian inference, consider the problem of estimating the FD parameters in LWR using a Riemann Problem (case 1 above) as initial condition and a single detector in the x-t plane (for

). As we vary the speed of the shock wave (by varying the parameters in the FD), the discontinuity will pass over the detector and cause the likelihood to jump. This discontinuity in the likelihood means that we cannot use gradient based methods in MCMC and this may cause mixing difficulties. However, as the numerical method smooths the discontinuity slightly, mixing might be slightly improved. Thus numerical viscosity is expected to cause the posterior to be slightly different than if LWR was solved exactly. On the other hand, even though we observe jumps in density in motorway traffic, we do not expect them to be truly discontinuous: we rather expect them to be rapid but smooth transitions of density. Formally investigating the effect of this error would be an interesting area of research.

When defining the solver we must also choose the resolution and such that the CFL condition (Courant–Friedrichs–Lewy) - a necessary condition for convergence - is satisfied (see [22]). To understand this condition, first we recall that as information in hyperbolic PDEs propagates with finite speed [along its characteristics] the solution is only affected by an interval of the initial condition. Points outside this interval do not affect the solution . We define this interval to be the domain of dependence of the solution . The CFL condition then states that the numerical domain of dependence must contain the true domain of dependence (the numerical domain of dependence is similarly defined). For methods that only use adjacent cells to compute the solution at the next time step (for example equation (10)), this means that information must only come from the adjacent cells and not from cells further away. We then define to be the fastest wave speed of the PDE and we require , ie:


This condition means we require the time resolution to be smaller than the time it takes for the fastest wave speed to cross a cell of size . Thus information cannot propagate from non-adjacent cells to the current cell when calculating the density at the next time step.

The Clawpack software automatically chooses the time resolution for the solver based on the chosen spatial resolution and the CFL condition. We choose a spatial resolution of m (as we have cells for a km length of road), and pass in a new boundary condition density value every s (which corresponds to the time resolution required by CFL for typical FD parameter values found in data). Clawpack therefore chooses depending on the FD parameter values. We used this resolution for the square wave test in figure (5).

2.3 Data

We use MIDAS data from the Highways Agency on the M25 in 2007. The data is measured on loop detectors spaced every 500m on the road which take measurements averaged every minute. These loops measure count, occupancy, headway, and average speed. Count is the number of vehicles that have passed the detector in a minute, so therefore corresponds to flow (number of vehicles per unit time). Occupancy () is the percentage of time in a minute that the detector was recording the passing of a vehicle (so 100% is gridlock, and 0% means that no vehicles passed over the detector), and headway is the time difference between a vehicle leaving the detector and another one arriving.

As density is a variable that must be included in models but is not directly measured by MIDAS detectors, we must estimate its value. One way to do this is to estimate it from speed data: for each lane, multiply the count by 60 and divide it by average speed (which is in km/h). We obtain the estimate . The limitation of this approach is that it does not consider the size of vehicles (which can vary greatly; from small cars to lorries for example).

Another approach is to estimate density from occupancy. To do this we use the relationship between occupancy and density ([16]) , with the average vehicle length. We use this to obtain density estimated from occupancy:


There are several ways to estimate the average vehicle length (or its reciprocal ): in [16] the authors outline several methods to estimate or and find that these have different trade-offs.

However as we have the count data by vehicle type in the MIDAS data, we will use it to estimate the vehicle length at every minute. The flow by type is the overall count (for all lanes) of the number of vehicles that have passed the detector in the minute classified by type (type 1: 4m vehicles, type 2: 6m, type 3: 9m, type 4: 16m). As this count data is given over all lanes (rather than for each lane), we will have to assume that vehicles types are evenly distributed across lanes. Although this assumption allows us to estimate the average vehicle length in a practical way, it is unrealistic as motorways have lanes designated for slower vehicles (which includes longer vehicles). Using

to denote the count data for vehicles of type and the total count data we have: . We then use this value (calculated at every minute) to estimate density from occupancy for each lane: .

We plot flow vs density using these two methods in figure (6) to show that they give very different results. In particular the congested flow wave speeds vary greatly between the two methods, while the free flow wave speeds are approximately the same. In section (5) we will estimate density in the BCs (as well as estimate the FD parameters) and compare the resulting wave speeds to those arising from density from speed and occupancy.

Figure 6: Section of M25 on the 8th January 2007 between 6am and 10am. We plot flow vs density for two estimation methods: density from occupancy and density from speed summed over all lanes. These methods give very different estimates. In particular we note that the congested flow wave speeds vary greatly between methods, while the free flow wave speed is approximately the same.

We will need to choose an appropriate section of road for analysis. As we are dealing with a single lane macroscopic model, we need to choose a section of road with no in/out flows (junctions), the same number of lanes (which we will aggregate variables over), few detector faults, and a consistent flow-density relationship. The chosen road is a 5km section of the M25, and as the detectors are spaced every 500m there are 12 detectors (the endpoints of the section are included). However some of the detectors have faults, so we use 8 detectors in our inference at the following locations (in km): . We use the detector measurements between 6:21am and 7:09am (including both endpoints) on the 8th January 2007; we therefore use 48min of data which corresponds to 49 time points.

2.4 Previous related work

A study close to the topic of this paper is a Bayesian analysis of traffic flow tested on motorway data by Polson and Sokolov in [28]. The objective of the paper is to develop a methodology (using particle filtering) to estimate in real time traffic density and parameters in the LWR model. This allows real-time estimation of road capacity (maximum possible flow on the road) and critical density (density at which flow is maximised) that can adapt to the drop in capacity due to an accident on the road. We go over key points in their methodology and provide a critical review.

The motorway data they use includes occupancy which they use to estimate density (see formula (12)). As explained in section (2.3), they use this quantity to estimate density with the formula (see [16] ) with traffic density and the average vehicle length. They use a constant average vehicle length, but it is unclear from the paper how it has been obtained.

They discretise the road into cells and assume that the boundary conditions and initial conditions are known (they use density data from occupancy to construct these). Defining

to be the hidden state vector of traffic densities for each cell, the model they use in the particle filter is:


Where and are evolution and equation error respectively, is the vector of measured traffic density, is the LWR evolution equation with Fundamental Diagram parameters calculated using a Godunov scheme (see [22] for a comprehensive account of these numerical methods or recall section (2.2)). The observation matrix picks out the cells with the measurements. The objective of the methodology is to sample from the posteriors and with the current history of data.

They consider a length of road on an interstate outside Chicago which seems to have two detectors separated by 845m. They discretise this distance into 4 cells (so each cell corresponds to a distance of 211m) and use a time discretisation of 5 minutes (which seems to be the resolution of data available). They run the analysis for 24 hours worth of data and estimate the drop in capacity due to a traffic accident. They also test their methodology on simulated data assuming known initial and boundary conditions. For this simulated data they use a spatial resolution of 300m and a time resolution of 5 seconds over a total road length of 1.5km and a time horizon of 1600s.

As their objective is real time estimation of certain traffic flow quantities (such as traffic state and capacity) they use a particle filter rather than MCMC as it is more appropriate for real-time analysis. In contrast, we develop in this paper a more general methodology for estimating parameters in hyperbolic PDEs with a more rigorous treatment of the boundary and initial conditions. Indeed, they assume that the boundary and initial conditions are known using density estimated from occupancy. However estimating density from occupancy has problems (like all methods) which we summarise in section 2.3

; we will therefore impute the boundary conditions rather than estimate them directly from data.

Another issue with the methodology is the coarse discretisation of the LWR model (211m and 5min for space and time respectively for the analysis of real motorway data); at this resolution the numerical solver will rather coarsely approximate the underlying PDE. Moreover, the shock waves which are important features of these nonlinear PDEs will be slightly smoothed out, as we point out in section (2.2). Furthermore, each time step in the PDE solver includes a Gaussian error term as seen in equation (13); this could smooth out shock waves which could help the sampling methodology. However we point out that the numerical method which converges to the PDE (described in [22]

) does not include a random term. Adding a Gaussian error at each time step seems more like the discretisation of a stochastic PDE (the LRW model has no stochastic component), although it is unclear whether this discretisation would indeed converge to a specific SPDE under refinement of the grid. This may depend on the scaling of the variance in the Gaussian error with discretisation step. Moreover, adding a Gaussian error at each time step amounts to adding or removing a random fraction of a vehicle thus violating the conversation of total number of vehicles (ie: conservation of mass). The model used in the paper should therefore rather be considered as an ad-hoc discrete model inspired by the Godunov method for LWR rather than a discretisation of the PDE.

The methodology developed in this paper attempts to remedy these issues and be a more rigorous treatment of the Bayesian inverse problem. However, we reiterate that the objective of the paper described above is to develop a methodology for practical real-time estimation of certain traffic quantities. Given this objective, these simplifications and assumptions may be justified.

3 Conditional sampling: the Fundamental Diagram

3.1 Statistical Model

As the two methods to estimate density discussed in section (2.3

) give very different results, we would like to build our likelihood based on a quantity that has fewer assumptions built in, namely flow (which is simply vehicle counts per minute). We use a Poisson model for the statistical error, which is a standard model for count data. We choose this model because it has the correct domain, is unimodal, and because of its simplicity. A drawback of this model is that inter-arrival times in a Poisson process are exponentially distributed, whereas we expect to not have a vehicle immediately following another one (especially for high speeds). However as the detectors count vehicles over a minute, many vehicles will have passed before the next count and the model misfit for small time resolutions should not be apparent. Finally, the chosen section of road has four lanes, but our model is only for single-lane roads. We therefore sum flow values over all lanes to obtain the total number of vehicles, and if the individual flows are independent Poisson random variables, then the sum of flows also follows a Poisson distribution.

Letting be the vector of FD parameters in LWR, and let (with ) be observed density and flow. We assume that data is iid with a Poisson error model which leads to the log-likelihood:


With the predicted flow for the observation resulting from the observation operator from equation (1). In this section we sample from the parameters in del Castillo’s (equation (7)), namely . Recall that the parameter controls the tightness of the peak, so as the FD tends to the Triangular FD (equation (5). In order to allow this parameter to take high values but while also allowing the Markov chain to converge, we invert this parameter and hence sample (we continue to denote this inverted parameter as ).

Based on domain knowledge of the realistic shape the FD can take, we set uniform priors for the FD parameters shown in equation (15) below:


Code for reproducing the sampling results in this paper can be found at

3.2 Direct Fit

We start by fitting in this section del Castillo’s FD directly to flow-density data. So the predicted flow in equation (14) is simply flow predicted from observed density using del Castilo’s FD with parameters (so without using the LWR model).

As this is a simple model and the likelihood is suitably informative, there is no particular need for a proper prior and so we use a flat prior . We therefore obtain the posterior: .

We sample from the posterior using a random walk Metropolis Hastings algorithm. We run 3 chains for 5K iterations with covariance matrix given in the appendix in section (A). We obtain acceptance rates of , , and , and show the trace plots for the parameters in figure (7) which suggest good mixing.

To compare mixing speeds between different samplers we use the estimated decay time which we denote by , which is the lag that corresponds to an autocorrelation of . This estimate is more robust than some estimators of the integrated autocorrelated time (see [32] for a discussion of estimators of the integrated autocorrelation time) as the gradient of the ACF at that point usually still has a high magnitude: estimating this value by eye is therefore practical. As we will estimate this visually (from the ACF plots) we consider this to be a fairly approximate way of comparing chains.

For this MCMC run, the ACF (plot not shown) reveals that the decay time is approximately 9. We show FD plots with sampled parameters against flow-density data (using density from occupancy) in figure (7(a)).

We then show the output from LWR in the plane using the posterior mean samples in figure (7(b)). Comparing this plot to the real data in figure (1) we can clearly see that the congested flow wave speed is incorrect (namely, the congested flow waves in the case of the direct fit do not cross the domain at all). This suggests a misfit of the model.

Figure 7: Posterior samples from a direct fit of del Castillo’s FD to M25 data. Trace plots of sampled parameters against flow-density data. The 3 colours correspond to the 3 MCMC chains
(a) The Triangular FD
(b) Del Castillo’s FD
Figure 8: Posterior samples from a direct fit of del Castillo’s FD to M25 data. (b) Plotted FDs using the samples (c) Density in the plane from LWR. Parameters used are the posterior mean from samples. We notice that the congested flow waves do not cross the domain as they do in the data

3.3 Inverse problem

In this section we infer the FD parameters using LWR to predict flow values in equation (14). The observation operator solves the LWR model given some boundary and initial conditions as well as the FD parameters , maps the density output to flow using the FD with parameters , and picks out the flow at the values corresponding to observations.

To run the LWR solver we need some BCs: we fix them using density estimated from occupancy from M25 data. However, the observations are only available every minute but we need a

s resolution to insert in the solver. We therefore need to interpolate in between each minute to obtain BCs at the correct resolution. We do this by sampling from the BC prior (a "log Ornstein-Uhlenbeck" process) conditional on the density from occupancy data observed at every minute, and plot both BCs in figure (


The "log Ornstein-Uhlenbeck" (log-OU) process is defined as follows: For a given BC (ie: inlet or outlet), let be the logarithm of the BC at time and let with be the time-dependent mean. Then we choose to be the unique solution of the stochastic differential equation (with a Wiener process), with and the mean-reversion parameter and diffusivity parameter respectively (see [18]). The elicitation of this prior is described in section (4.1), and from it we fit the parameters to obtain and .

Figure 9: Inlet and outlet BCs to use in sampling. These are samples from the log-OU prior conditional on density from occupancy (M25 data) at every minute.

LWR requires the initial condition as well as the boundary conditions. The density will be propagated with either free flow or congested flow wave speed and so only the density measured for the first few minutes will be influenced by the initial condition. To avoid having to infer this initial condition, we simply do not use these first few detector times to build the likelihood; as a result the likelihood is unaffected by the initial condition and is only affected by the choice of boundary conditions and FD parameters. To be able to only remove a small number of points in the plane (ie: just the first few minutes) we assume that the FD parameters are such that density for these initial times corresponds to free flow (which is a reasonable assumption as can be seen in figure (1)). We further assume that the free flow wave speed lies within a reasonable range of speeds. We remove the influence of the initial condition on the likelihood in this way for all further inferences in the paper.

We now employ a RWMH sampler to fit del Castillo’s FD to M25 data using LWR as the forward model. We use the covariance matrix given in the appendix in section (A) and run 3 chains for 5K iterations. We obtain the acceptance rates , , and for the 3 chains. We show the trace plots in figure (10) which seems to show good mixing and from the ACF plots (not shown) we observe that the decay time is approximately 10. We also show the FD samples in figure (10(a)).

Figure 10: Posterior samples for del Castillo’s FD for 3 chains fit to M25 data using LWR as the forward problem. Trace plots of sampled parameters against flow-density data.
(a) The Triangular FD
(b) Del Castillo’s FD
Figure 11: Posterior samples for del Castillo’s FD for 3 chains fit to M25 data using LWR as the forward problem. (a) Plotted FDs used the samples (b) Density in the plane from LWR output with posterior mean parameters. Unlike in the case of the direct fit to data, the congested flow waves cross the domain as in real data.

Here we see in figure (10(b)) that the congested flow waves cross the domain (as in the real data), unlike in the direct fit to data. This suggests that the likelihood built with LWR (rather than the direct fit) penalises misfitting the congested flow wave speeds. As the direct fit does not consider time or space, this result confirms the intuition that using a likelihood that includes dynamic information about the system yields a better fit to data.

4 Conditional Sampling - Boundary Conditions

4.1 Prior elicitation

In the previous section we used density estimated from occupancy to define the boundary conditions (BCs) in LWR as this method was determined to be the most appropriate. However, density is a continuous quantity and is the result of a limiting procedure while traffic is composed of discrete vehicles. We therefore prefer to not impose it and rather treat it as a parameter to infer.

As density from occupancy (see section (2.3)) is considered an appropriate method for estimating density, we will use it to elicit the prior for the BCs. As discussed previously, it is estimated using the average vehicle length (unlike density estimated from speed). Eliciting the prior in this way will encode our prior belief that the estimated BCs should not deviate too far from density estimated from occupancy.

We will use time series of density from our section of road to fit a Gaussian process prior for the BCs (of course discarding the day that we use in inference).

We choose as prior a "log-OU" process. By that we mean that the logarithm of the centered BCs follow an Ornstein Uhlenbeck (OU) process. We choose this prior for 3 reasons:

  • We would like density to always be positive.

  • We would like the prior to allow sudden excursions in density corresponding to high density waves. Indeed, with a log-OU prior we model the logarithm of the centered BCs with an OU process. As a result, a high value of density will a priori have a higher variance which enables high density waves, and a low value of density will a priori have a low variance.

  • We would like to be able to sample easily from the prior as well as evaluate the probability of a sample under the prior. This is computationally inexpensive to do with a log-OU prior.

We first give a succinct overview of the OU process and then estimate the parameters of this process from traffic flow data.

For a given BC (ie: inlet or outlet), let be the logarithm of the BC at time and let with be the time-dependent mean. Then we choose to be the unique solution of the stochastic differential equation (with a Wiener process), with and the mean-reversion parameter and diffusivity parameter respectively (see [18]).

We fit an OU process to centred log-BCs for the inlet and outlet BC together, as fitting them seperately yielded similar OU parameters. The inlet BC is a function of time that returns density, and corresponds to the inlet of the studied stretch of road (ie: ) . Similarly, the outlet BC corresponds to the outlet of the road (ie: km).

For the inlet and outlet detector data we keep only weekdays, discard the 1st January, and keep only the 75 and 65 first days for the inlet and outlet detectors respectively. We removed these last days as they have unusual density curves. We also removed the 8th January as this is the dataset used in the inference. We fit the mean to the logarithm of traffic curves , and then fit the OU parameters and to (we fix to define a unit of time to be minute). We apply a very slight smoothing to the log-BC means to ensure that they are smooth.

We estimate the parameters for the inlet and outlet together using MCMC. Defining to be the i-th measured density curve and the precision matrix for the OU process, we write the likelihood as:


We use a flat prior for the parameters and use a random walk Metropolis sampler to sample from the posterior. The posterior mean is and .

We plot in figure (11(a)) inlet BCs from data (data used to fit the OU process) along with prior samples. To allow comparison to the BCs from data, the prior samples here have the same resolution: one point per minute. We can visually check that the log-OU prior fits fairly well the inlet BCs from data (prior samples for the outlet BCs are also similar to BCs from data).

We also plot in figure (11(b)) samples from the inlet BCs at full resolution: one point every 1.5 seconds, which is the resolution that we use in the inference. We use this resolution as it allows for a detailed description of the density waves that will get propagated by LWR. Indeed comparing the two figures, we can see that a resolution of 1min does not capture the details of the high density waves.

(a) Inlet BC prior samples vs data
(b) Inlet BC prior samples - high resolution
Figure 12: (a) Inlet BCs from data (using density from occupancy) along with prior samples at 1 min resolution (b) Samples from the prior for the inlet at full resolution: 1 point every 1.5 seconds

4.2 Sampling - Parallel Tempering

In this section we fix the FD to be the mean of the posterior estimated in the previous section and sample from the BCs. We try a first sampler which exhibits some mixing problems, and then show how a simple change to Parallel Tempering can drastically improve mixing.

4.2.1 A first sampler

The sampler employs as proposal a pCN algorithm with Gibbs blocking within a parallel tempering (PT) algorithm. We now give an overview of these methods:

The preconditioned Crank-Nicholson (pCN) algorithm (see [9]) is a Metropolis Hastings algorithm designed to sample a discretisation of a functional parameter. It uses as proposal:



  • : logarithm of the BC

  • a sample from the prior with mean (OU process)

  • : step size parameter to tune in the pCN algorithm

  • : prior mean (of the logarithm of BCs)

This proposal is such that the acceptance probability in the Metropolis Hastings algorithm is dimension independent. This means that unlike a Gaussian proposal, the sampler will not deteriorate as we refine the discretisation of the function.

However, running this algorithm on the BCs (either updating both BCs together, or one at a time) results in a badly mixing chain: the step size must be set to be very small for the sampler to have a reasonable acceptance rate. We also noticed that the posterior seems to have different variances for different sections of the BCs. Namely, sections of low and high density seem to have low variance, while sections of medium density seem to have high variance. If one maps density to flow using the FD (see figure (10(a))), this corresponds to low flow having low variance and high flow having high variance, which is consistent with the Poisson error model.

To obtain a faster sampler, we make the assumption that these different sections in the BCs are independent. Indeed from inspecting figure (1) we can see that for a fixed FD wave speed, a section in the outlet BC will influence the plane mainly within a thin strip with wave speed given by the FD. Another section in that BC will influence another strip parallel to the first section, so these two sections are therefore expected to be fairly independent. Of course the inlet and outlet BCs are correlated, as a wave leaving the outlet BC will hit the inlet BC and vice-versa. However if this assumption is approximately true we can sample these different sections separately using a Metropolis-within-Gibbs sampler. This allows us to tune the parameter in the pCN algorithm for each section. It would also allow us to make bigger jumps as each section would be much shorter than the entire BC. This idea of tuning differently for different sections of the BC is similar to operator weighted proposals discussed in [21].

We therefore use Gibbs blocking: we sample sections of each BC using pCN in a Metropolis-within-Gibbs step. We do this by sampling from the prior between two time points conditional on the rest of the BC being fixed. As the prior of the log-BCs is a Gaussian Process this is straightforward to do. We use the following guidelines to choose the Gibbs blocks (based on experimentation):

  1. Choose a block size between 10min and 30min. Choosing a size that is too big will result in a low acceptance rate, while choosing too small results in too many blocks (and hence slow mixing for the chain overall).

  2. Choose a section where the density - and hence the variance - is fairly similar. Tune based on the variance of the block.

  3. Check the acceptance rate and aim for around

This sampler results in drastically better performance than pCN on its own. However the BC samples appear to exhibit multimodality which causes the sampler to have difficulty mixing between the modes. We therefore employ Parallel Tempering.

Parallel Tempering (PT), also known as Replica Exchange MCMC ([24], [5]) is an algorithm used to sample from multimodal distributions. We augment the state space by introducing an additional discrete parameter with . Defining the unnormalised target posterior as we can augment the state space by tempering the likelihood (with the log-likelihood, the log-prior, and the pseudo-prior (as defined in [5])):


We therefore run

chains in parallel to target the joint distribution

where is the posterior tempered using the inverse temperature .

We define to be the probability of making a within-temperature move, sample , and run the algorithm:

  • if , perform a within-temperature move; this can be done using any MCMC sampling scheme

  • if , choose uniformly a pair of posteriors and (usually chosen so that the inverse temperatures are adjacent) and swap the states and with probability .

Many extensions of these tempering algorithms have been proposed in the literature, such as ones generalising the between-temperature moves in [33] or ones defining a continuous temperature schedule in [14].

A benefit of this algorithm is that the chains can be run on parallel cores rather than sequentially. However, one needs to choose a temperature schedule . To tune this schedule we use the following iterative tuning procedure (based on the one outlined in [1]): we raise the likelihood to the power with , and find a value of that allows the chain to mix well. We then find a colder temperature such that the swap acceptance rate is approximately . We then fit a geometric schedule between these two values and extrapolate to find the other temperatures. We then check that the acceptance rate for temperature swaps is around for these spacings.

A limitation of this algorithm is that as the number of temperatures increase, it takes exponentially longer for samples in the hottest chains to propagate upwards to the target distribution.

4.2.2 Sampling results and discussion

We fix the FD to the mean of the posterior found in the previous chapter, namely: .

We employ a PT sampler with inverse temperatures [0.58, 0.76, 1] and run 3 chains for around 220K iterations. We set the sampler to do 3 within-temperature proposals between each swap proposal. We report the blocks used and their acceptance rates in appendix B. We show the outlet and inlet BC samples in figures (12(a)) and (12(b)), and notice that the outlet BC exhibits bimodality between times for only one chain. Indeed this is confirmed in the trace plots for a few times in the outlet BC in figure (13(a)): we can see one of the chains jump to a mode at a density of around . This is due to a chain at a higher temperature finding this mode and a swap being accepted. We note that this only occurred after approximately 175K iterations (which takes approximately 72 hours to run). Running the chains for several days allowed one of them to find this low probability mode. However it is possible that more tuning would allow this mode to be explored more regularly, for example by adding more Gibbs blocks for those times for all temperatures. The trace plots for 3 typical times for the inlet BC is shown in figure (13(b)); this BC appears to not have any mixing problems.

Calculating the ACF for different times in the outlet and inlet BC we find that most of the BC times have decay times between 1K and 3K iterations which is a sign of a slowly mixing chain. We show the ACF plot for a typical BC time point in figure (17(a)).

The slow mixing speed can in part be explained by the strong correlations between the outlet and inlet BC parameters: as discussed in section (4.2.1) above, we can see in figure (1) that a forwards moving free flow wave leaving the inlet BC (with wave speed given by the slope of the FD) will hit the outlet BC (and vice versa for congested flow waves). Changing one of the BCs therefore requires changing the other one in a way that is compatible with the first. Furthermore, as hyperbolic PDEs are prone to shock formation, the BCs will have sharp changes in density: a small translation of the BC to the left or right (say) will therefore cause a large drop in likelihood. In contract, PDEs that exhibit diffusion will smooth out such discontinuities quickly, so such a misfit will be penalised much less by the likelihood. An algorithm such as pCN - which is simply random walk proposal - will therefore mix slowly.

(a) Outlet BC samples
(b) Inlet BC samples
Figure 13: BC samples for a PT sampler with del Castillo’s FD on M25 data. (a) BC samples for the outlet BC. We can see how one of the chains jumped to a mode in the first few minutes. (b) BC samples for the inlet BC.
(a) Outlet BC trace plots
(b) Inlet BC trace plots
Figure 14: (a) Traceplots for a few times in the outlet BC for 3 chains. We can see at minute 0 the green chain that explores the mode found in figure (12(a)); however the other chains never find this mode. (b) Trace plots for the inlet BC

To explain the bimodality present in figures (13) and (14), we recall that the likelihood is built from flow. This means that different values of density that map to the same value of flow will be equally likely. To illustrate this, we plot in figure (15) del Castillo’s FD with the same parameters values used in the sampler, and we plot two vertical lines for two values of density ( and ) that map to the same value of flow (the horizontal line). If we then inspect the sections of the outlet and inlet BCs that exhibit bimodality, we observe that some of the pairs of density branches approximately correspond to these two values. Of course there is dynamic behaviour in the plane so this explanation is an approximation. However, the bimodality observed in this posterior and in the later one studied in this paper is approximately accounted for by this explanation.

Figure 15: Del Castillo FD with parameter values used in the sampler for . The two vertical lines correspond to two values of density ( and ) that map to the same value of flow. As the likelihood is built from flow, these two values of density are equally likely and therefore the posterior exhibits bimodality.

The mode in the first few times of the outlet BC discovered by one of the chains raises some questions: why did the other chains not explore this mode? How much probability mass does this mode contribute to the total posterior mass?

We have noticed that the within-temperature proposals move the chains very slowly (see explanation above). We can see this in the trace plots in figure (14), and especially in time 0 of figure (13(a)): the chains swap back and forth between two sets of samples for sometimes several tens of thousands of iterations. This means that one chain will stay on the newly discovered mode for a large number of iterations, while the other chains have not yet found it.

To reframe this problem, we can think of the hotter chain in PT as a kind of independent sampler proposal. Indeed, tempering a distribution is a general way of finding a good proposal distribution that we can sample from in practice. Reframing the problem in this way, we notice that successive proposals in PT are highly correlated, which slows down the PT sampler. If, on the other hand, successive proposals from the hotter chains to the colder ones were more varied, the sampler would mix a lot more quickly.

4.3 Population Parallel Tempering

To fix the issue discussed above, we introduce a simple fix to Parallel Tempering which we call Population Parallel Tempering. This is simply a population version of Parallel Tempering that allows us to parallelise the algorithm even more: adding more chains will result in accelerated mixing for each of the chains. See [20] for an overview of population based sampling methods and [6] and [27] for some algorithms that parallelise MCMC.

Consider three PT samplers running in parallel as in figure (15(a)). Each PT sampler has four temperatures (with ) and so is an ensemble of four chains. In the figure each number denotes a MCMC chain. So the first PT sampler has chains , the second PT sampler has chains , and the third has chains . As the 3 PT samplers are run independently, swap proposals can only operate on chains within their ensemble, as shown in figure (15(a)). If the within-temperature proposals are slow, then successive swaps are likely to simply swap similar samples back and forth.

However, if we allow the three PT samplers to share chains (while still only swapping adjacent temperatures), then there are many more possible swaps allowed. We show one such swap proposal in figure (15(b)). In this scenario, the next time chain swaps with chain it is likely that this chain will have swapped samples with another chain. This reduces the probability of situations such as the one in figure (13(a)).

Consider further the situation where the number of PT samplers gets very large. In this situation, the number of MCMC iterations that ran between two consecutive swaps of 2 chains also gets large, which means that eventually the hotter chain will have run long enough to produce a new independent sample. So if we choose a large enough number of PT samplers, the swap proposals between the target distribution and the next hottest chains (namely the swaps between temperatures and ) will tend towards an independent sampler. This provides a simple way to parallelise PT: adding more samplers (and therefore more cores) should increase the mixing speed of the overall sampler.

(a) Parallel Tempering
(b) Population Parallel Tempering
Figure 16: Diagrams illustrating a swap proposal for 3 parallel tempering samplers each with 4 inverse temperatures . Each number represents a chain. (a) In standard parallel tempering the samplers are independent and so do not interact. (b) In population parallel tempering all the chains can swap samples

We test this by running a Population PT sampler on the BCs with exactly the same setup as in the previous section, but with 5 PT samplers in parallel (so there are 15 chains in total). In figure (17) we show the trace plots for the same BC time points displayed in the previous section. We can see from the trace plots that this sampler mixes much faster. We quantify this improvement by inspecting the ACF plots (a typical ACF plot for both PT and Population PT is shown in figure (18)) and finding that the decay time is around 100 for most of the BC time points. This is an order of magnitude improvement on the PT sampler in section (4.2.2), which had decay times betwen 1K and 3K.

(a) outlet BC trace plots
(b) Inlet BC trace plots
Figure 17: Trace plots for both BCs using a Population Parallel Tempering sampler. We have much better mixing than in figure (14

) (a) We notice that the mode in the first few minutes of the outlet BC is hit very rarely, showing that it contributes only a small amount to the total posterior probability mass. (b) Trace plots for the inlet BC: the chain mixes much quicker than in the standard PT sampler

We can also see that the mode in minute 0 of the outlet BC was still very rarely explored, but that the chain did not get stuck in it. From inspecting the hotter chains (not shown), we find that this mode was explored more regularly in those chains, but that swaps between these chains and coldest ones were rarely accepted. This suggests that the mode simply contributes a very small amount of probability mass to the posterior distribution.

(a) ACF for Parallel Tempering
(b) ACF for Population Parallel Tempering
Figure 18: Comparison of ACF plots for two typical BC times for PT and Population PT. The decay time for Population PT is an order of magnitude less than for PT. (a) Parallel tempering: the decay time is around 1000 for this time point (the decay time ranges between 1000 and 3000 for other time points (b) Population PT: the decay time here is around 100.

4.4 Discussion

In this section we built sampler that uses pCN proposals, Gibbs blocking, and Parallel Tempering, and it was found to exhibit mixing problems. We found that a Population Parallel Tempering sampler - a simple fix that allows PT samplers to "share" hot chains - drastically improved the performance of the sampler and resulted in good mixing. We reiterate that all other tuning parameters were kept the same between the two samplers. In the next section we sample both FD and BC parameters using the Population PT sampler.

5 FD and BC sampling

We now sample both FD and BC parameters using the building blocks defined in the previous sections. We alternate sampling both sets of parameters in a Metropolis-within-Gibbs sampler.

5.1 Sampling results

We employ a Population Parallel Tempering sampler as a standard PT sampler exhibited similar mixing problems as in section (4.2.2). Indeed, the correlations between the parameters discussed in section (4.2.2) are more pronounced in this case as we are sampling the FD parameters as well as both BCs. For example we can see from figure (1) that if a BC proposal translates the inlet BC to the left or right (say), then either the FD wavespeed will have to be changed accordingly, or the outlet BC will have to be translated accordingly. However, as we are employing a Metropolis-within-Gibbs sampler each parameter can only make small jumps. As discussed previously, the hyperbolic nature of the PDE causes the solution to have large jumps in density which accentuates even a slight misfit. In contract, if the PDE exhibited diffusion then this effect would be much less pronounced.

We use 4 inverse temperatures and 3 within-temperature moves. We run 4 PT samplers in parallel (so there are 16 chains in total) using the inverse temperatures [0.44, 0.58, 0.76, 1]. For the within-temperature proposals we propose a new FD and new BCs with probabilities 0.4 and 0.6 using a Gaussian proposal for the FD and Gibbs blocks for the BCs. The covariance matrices and blocks for each temperature are given in the appendix in section (B.1). We run 3 within-temperature moves for every between-temperature move. We run 3 chains for 230K iterations and we obtain acceptance rates of , , and for the FD proposals for each chain.

Figure 19: Trace plots for the FD parameters for a population PT sampler, which show good mixing.
Figure 20: FD samples plotted with M25 flow data and three density estimation methods: from occupancy, from speed, and from BCs. The samples are from FD and BC sampling for del Castillo’s FD for M25 data. The density estimated in the BCs seems to agree with density from speed, but the congested flow wave speed in the fitted model seems to be different to the wave speeds implied by the other two density estimation methods.
(a) outlet BC trace plots
(b) Inlet BC trace plots
Figure 21: BC samples for the population PT sampler sampling FD and BC parameters.

We show in figure (19) the trace plots for the FD parameters which show good mixing. We show in figure (20) the FD samples plotted with M25 flow data against 3 density estimates: density from speed, from occupancy, and estimated in the BCs. To obtain the latter we used the mean BCs (both inlet and outlet) and picked out the time points that correspond to measurements. We then plotted the M25 flow data at those time points against the density in the BC means. We first observe that density estimated in the BCs does not agree with density estimated from occupancy but is similar to density estimated from speed. In terms of the wave speeds, the free flow wave speeds implied by all three density estimates seem to agree, but the congested wave speeds do not. The congested flow wave speed in the fitted model seems to be in between the wave speed implied by the two other density estimates. As the congested flow waves in the fitted model (in figure (22(a))) seem to agree with the waves in M25 data (in figure (1)) , this suggests that estimating the density in LWR rather than estimating it in a preprocessing step yields a better fit of the wave speeds. Finally, we show the residuals in figure (22(b)) which suggests a good model fit.

We compare FD ACF plots for PT and Population PT in figure (22): the decay time for the FD parameters in the PT sampler is around 3000-5000, while the decay time for the Population PT sampler is around 100; as in the previous section, this is an order of magnitude improvement. The decay times for the BCs samples (ACF plots not shown) has a similar improvement: the decay times for the BC time points for the PT sampler are typically between 1000 and 3000. On the other hand, the decay time for the Population Sampler is around 100, which is again an order of magnitude speedup.

(a) FD ACF plots for PT
(b) FD ACF plots for Population PT
Figure 22: ACF plots for the FD parameter for both PT and Population PT. (a) ACF for the PT sampler: the decay time is around 3000-5000. (b) ACF for the Population PT sampler: the decay time is around 100, which is an order of magnitude faster than for the standard PT sampler.
(a) plane from posterior mean
(b) Residuals from posterior mean
Figure 23: Using the posterior mean parameters from FD and BC sampling with a Population PT sampler, we plot the output of LWR in the plan in figure (a) and the residuals in figure (b)

6 Conclusion and future work

After having given a brief review of motorway traffic flow modelling, we fit the Fundamental Diagram (FD) directly to flow-density data, but found that estimating them with LWR as the forward model resulted in a superior fit in terms of wave speeds. We then sampled from the boundary conditions (BCs) in LWR using a pCN algorithm with Gibbs blocking within a Parallel Tempering (PT) sampler, but found the sampler to have poor mixing. This is due to the strong correlations between the BCs and the hyperbolic nature of the PDE. We then highlighted how a simple modification - Population Parallel Tempering - sped up the sampler by an order of magnitude. We then use this sampler to sample both FD and BC parameters in LWR.

As work on Bayesian inverse problems for hyperbolic PDEs remains rare in the literature, we have provided a unified statistical model to estimate both boundary conditions and fundamental diagram parameters while respecting the character of LWR as a conservation law. Furthermore, we compare the density estimated in the BCs to two density estimates in the engineering literature (density from occupancy and speed) and find that although the free flow wave speeds implied by the three methods agree, only the congested flow wave speeds in the density from BCs (namely, in the fitted model) agree with the congested flow waves in M25 data. When inserted into LWR, the BCs estimated using our new method provides a fit superior to that obtained from BCs using engineering methods.

An avenue for future work could be to establish sufficient conditions for well-posedness of the Bayesian inverse problem in the continuous limit case (infinite PDE solver resolution) as well as rates of posterior contraction in the limit of the time of observation tending to infinity. Results in [34] could be built on here.

Furthermore, the sampling methodology developed in this paper could be used to fit more sophisticated traffic flow models to data. Examples of these are systems of PDEs that include conservation of momentum as well as mass (namely, 2 equation models), which are discussed in [10]. These classes of models could be quantitatively compared after fitting the parameters and boundary conditions for each of them. This would allow for a rigorous assessment of the strengths and weaknesses of these models.

Finally, to introduce another modification on parallel tempering, recall that in our implementation we tempered only the likelihood so that BC swaps between temperatures would get accepted. This is because if we tempered the whole posterior the priors on the log-BCs (ie: an OU prior) for each temperature would be incompatible. However to avoid shrinkage it would be of interest to temper the whole posterior rather than just the likelihood. One would therefore need to modify log-BCs proposals between temperatures to be compatible with the prior of new tempered posterior. Some analytic work as well as experimental results would be needed to define this version of parallel tempering on function space.


  • [1] Atchadé, Y. F., Roberts, G. O., Rosenthal, J. S., 2011. Towards optimal scaling of metropolis-coupled Markov chain Monte Carlo, Statistics and Computing 21(4), pp. 555-568.
  • [2] Aw, A., Rascle, M., 2000. Resurection of "second order" models of traffic flow. SIAM Journal on Applied Mathematics, 60(3), pp.916-938.
  • [3] Bando,M., Hasebe, K., Nakayama, A., Shibata, A., and Sugiyama, 1995. Dynamic Model of traffic congestion and numerical simulation. Physical Review E 51, 1035-1042. (doi:10.1103/PhysRevE.51.1035)
  • [4] Bonzani, I., Gramani Cumin, L. M., 2009. Critical analysis and perspectives on the hydrodynamic approach for the mathematical theory of vehicular traffic. Mathematical and Computer Modelling, 50 526-541. doi:10.1016/j.mcm.2009.03.007
  • [5] Brooks, S., Gelman, A., Jones, G. L., Meng, X, 2011 Handbook of Markov Chain Monte Carlo. CRC Press
  • [6] B. Calderhead. A general construction for parallelizing Metropolis- Hastings algorithms. Proceedings of the National Academy of Sciences, 111(49):17408- 17413, 2014.
  • [7] Clawpack Development Team, 2018. Clawpack Version 5.4.0,, doi:10.5281/zenodo.262111
  • [8] Cotter, S.L., Dashti, M., Robinson, J.C., Stuart, A.M., 2009. Bayesian inverse problems for functions and applications to fluid mechanics. Inverse Problems 25. doi:10.1088/0266-5611/25/11/115008
  • [9] Cotter, S. L., Roberts G. O., Stuart, A. M., White, D., 2013. MCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster. Statistical Science 28(3), pp. 424-446 DOI: 10.1214/13-STS421
  • [10] Coullon, J., 2019. MCMC for a hyperbolic Bayesian inverse problem in motorway traffic flow. PhD dissertation. University College London.
  • [11] Daganzo, C., F., 1994. The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory. Transportation Research Part B: Methodological, 28(4) pp. 269-287.
  • [12] del Castillo, J., 2012. Three new models for the flow-density relationship: derivation and testing for freeway and urban data. Transportmetrica 8(6): 443-465
  • [13] Gasser, I., Sirito, G., and Werner, B. 2004. Bifurcation analysis of a class of “car following” traffic models. Physica D 197, p.222-241.
  • [14] Graham, M. M., Storkey, A. J., 2017. Continuously tempered Hamiltonian Monte Carlo. arxiv Available at [Accessed 5th March 2019]
  • [15] Greenshields B.D., 1934. The photographic method of studying traffic behaviour. Proceedings of the 13th Annual Meeting of the Highway Research Board pp. 382-399.
  • [16] Heydecker, B.G., Addison, J. D., 2011. Measuring traffic flow using real-time data. In: Kuehne, R and Gartner, NA, (eds.) Transportation Research Circular, E-C149: 75 Years of the Fundamental Diagram for Traffic Flow Theory. (pp. 109-120).
  • [17] Hoogendoorn, S. P. and Bovy, P. H. L., 2001. State-of-the-art of vehicular traffic flow modelling. Proceedings of the Institution of Mechanical Engineers, Part 1: Journal of Systems and Control Engineering 215:283. doi: 10.1177/095965180121500402
  • [18] Iacus, M., Stefano, 2008. Simulation and Inference for Stochastic Differential Equations. Springer.
  • [19] Iglesia, M.A., Lin, K., Stuart, A. M., 2014. Well-posed Bayesian geometric inverse problems arising in subsurface flow. Inverse Problems 30. doi:10.1088/0266-5611/30/11/114001
  • [20] Jasra, A., Stephens, D. A., Homes, C. C, 2007. On population-based simulation for static inference. Statistics and Computing 17(3), pp. 263-279.
  • [21] Law, K. J. H, 2014. Proposals which speed up function-space MCMC. Journal of Computational and Applied Mathematics 262(15) pp 127-138
  • [22] LeVeque Randall, J. 2004. Finite Volume Methods for Hyperbolic Problems. Cambridge University Press.
  • [23] Lighthill, M. H., and Whitham, G.B., 1955. On Kinematic Waves 2: A Theory of Traffic Flow on Long Crowded Roads. Proceedings of the Royal Society of London A, 229, pp.317-345.
  • [24] Liu, J. S. 2001, Monte Carlo Strategies in Scientific Computing. Springer-Verlag New York
  • [25] Mahnke, R., Kaupuzs, J., Lubashevsky, I., 2005. Probabilistic description of traffic flow. Physics Reports, 408(2005), 1-130.
  • [26] Mandli, K.T., Ahmadia, A.J., Berger, M.J., Calhoun, D.A., George, D.L., Hadjimichael, Y., Ketcheson, D.I., Lemoine, G.I., LeVeque, R.J., 2016. Clawpack: building an open source ecosystem for solving hyperbolic PDEs. PeerJ Computer Science. doi:10.7717/peerj-cs.68
  • [27] Martino, L., Elvira, V., Luengo, D., Corander, J., Louzanda, F., 2016. Orthogonal parallel MCMC methods for sampling and optimization. Digital Signal Processing, 58 (2016), 64-84.
  • [28] Polson, N., Sokolov, V., 2015. Bayesian analysis of traffic flow on interstate I-55: the LWR model. The Annals of Applied Statistics 9(4), pp 1864-1888
  • [29] Richards, P.I., 1956. Shock Waves on the Highway. Operations Research, 4 (1) pp.42-51
  • [30] Sambridge, M., 2014. A Parallel Tempering algorithm for probabilistic sampling and multimodal optimization. Geophysical Journal International 196 pp 357-374. DOI: doi: 10.1093/gji/ggt342
  • [31] Stuart, A. M., 2010. Inverse problems: A Bayesian perspective. Acta Numerica, 19, pp 451-559 doi:10.1017/ S0962492910000061
  • [32] Sokal, A. D., 1989. Monte Carlo Methods in Statistical Mechanics: Foundations and New Algorithms, lecture notes (unpublished). In Cours de Troisieme Cyle de la Physique en Suisse Romande., Lausanne.
  • [33] Tawn, N. G., Roberts, G. O., 2018. Accelerating Parallel Tempering: Quantile Tempering Algorithm (QuanTA). arxiv [preprint] Stat.ME. Available at: [Accessed 5th March 2019]
  • [34] Thorsten Hohage, Frank Werner. 2016. Inverse Problems with Poisson Data: statistical regularization theory, applications and algorithms. Inverse Problems 32: 093001:56pp.
  • [35] Ward, J., 2009. Heterogeneity, Lane-Changing and Instability in Traffic: a Mathematical Approach. PhD dissertation. University of Bristol.
  • [36] Zhang, H.M., 2000. A non-equilibrium traffic model devoid of gas-like behaviour. Transportation Research Part B, 36 (2002) pp.275-290.

Appendix A FD sampling

Covariance matrix for the direct fit to data for :

Covariance matrix for sampling from LWR with parameters :

Appendix B BC sampling

BC type Start (min) End (min) acceptance probabilities (%)
inlet 0 49 0.07 22.6, 18.9, 18.7
inlet 0 15 0.15 12.1, 10.9, 11.0
inlet 7 35 0.13 23.9, 28.3, 28.5
inlet 33 49 0.4 16.8, 17.4, 21.1
outlet 0 49 0.04 23.7, 24.6, 23.6
outlet 0 15 0.25 16.0, 16.4, 16.6
outlet 12 24 0.21 9.5, 13.1, 10.1
outlet 25 39 0.16 23.6, 23.6, 22.7
outlet 29 49 0.12 22.1, 17.8, 21.0
outlet 6 18 0.4 10.6, 15.0, 18.5
outlet 39 49 0.4 26.8, 23.5, 27.3
inlet 0 12 0.27 3.8, 4.7, 3.2
inlet 20 36 0.25 28.5, 23.9, 27.3
outlet 2 12 0.9 2.7, 2.4, 5.2

Table 1: BC Gibbs blocks for for sampling for del Castillo’s FD on M25 data with parallel tempering. We give the start time, end time, and step size for the BC Gibbs blocks along with the acceptance probabilities for each of the 3 chains.
BC type Start (min) End (min) acceptance probabilities (%)
inlet 0 49 0.1 16.0, 19.9, 19.4
inlet 0 15 0.2 15.5, 12.8, 10.4
inlet 7 34 0.18 23.7, 21.8, 21.4
inlet 34 49 0.44 18.3, 24.1, 22.7
outlet 0 49 0.055 19.0, 17.2, 19.4
outlet 0 15 0.3 21.8, 19.7, 18.8
outlet 12 24 0.21 21.3, 22.2, 19.3
outlet 25 39 0.2 20.5, 23.0, 21.1
outlet 29 49 0.14 21.5, 22.0, 22.2
outlet 6 18 0.45 19.8, 17.0, 14.8
outlet 38 49 0.48 23.2, 21.6, 21.1
inlet 0 12 0.3 7.2, 7.5, 3.5
inlet 20 35 0.31 25.3, 25.4, 24.2
inlet 5 15 0.9 2.9, 3.2, 1.5
outlet 2 12 1 4.2, 2.1, 2.5
Table 2: BC Gibbs blocks for for sampling for del Castillo’s FD on M25 data with parallel tempering. We give the start time, end time, and step size for the BC Gibbs blocks along with the acceptance probabilities for each of the 3 chains.
BC type Start (min) End (min) acceptance probabilities (%)
inlet 0 49 0.11 28.9, 21.1, 17.2
inlet 0 15 0.21 22.1, 16.3, 16.0
inlet 7 34 0.2 24.4, 26.8, 30.8
inlet 34 49 0.47 23.6, 27.6, 31.1
outlet 0 49 0.06 22.4, 25.8, 23.1
outlet 0 15 0.39 20.8, 21.6, 17.7
outlet 12 24 0.26 23.6, 23.6, 22.2
outlet 25 39 0.2 26.6, 27.7, 27.5
outlet 29 49 0.14 24.2, 25.6, 24.4
outlet 6 18 0.5 30.2, 25.1, 22.1
outlet 38 49 0.51 24.6, 20.7, 22.8
inlet 0 12 0.4 5.4, 7.3, 8.7
inlet 20 35 0.36 25.3, 26.2, 25.7
inlet 5 15 0.9 6.3, 4.3, 3.9
outlet 3 12 1 14.0, 10.4, 11.8
Table 3: BC Gibbs blocks for for sampling for del Castillo’s FD on M25 data with parallel tempering. We give the start time, end time, and step size for the BC Gibbs blocks along with the acceptance probabilities for each of the 3 chains.

b.1 FD and BC sampling

b.1.1 Covariance matrices

Covariance matrix for for each temperature

  • For :

  • For :

  • For :

  • For :

b.1.2 Boundary condition blocks

BC type Start (min) End (min) acceptance probabilities (%)
inlet 0 49 0.07 28.4, 29.7, 23.4
inlet 0 15 0.15 14.9, 15.3, 18.4
inlet 7 35 0.13 34.2, 26.1, 31.9
inlet 33 49 0.4 29.6, 31.6, 26.0
outlet 0 49 0.04 28.8, 30.9, 31.0
outlet 0 15 0.25 24.1, 26.9, 24.2
outlet 12 24 0.21 20.6, 27.7, 19.6
outlet 25 39 0.16 27.7, 34.4, 24.4
outlet 29 49 0.12 20.1, 24.8, 21.2
outlet 6 18 0.4 35.0, 33.1, 36.4
outlet 39 49 0.4 20.0, 16.9, 17.2
inlet 0 12 0.2 11.8, 11.3, 9.6
inlet 20 36 0.25 44.5, 48.6, 45.4
outlet 2 12 0.9 6.4, 4.3, 2.9
outlet 16 26 0.15 24.4, 20.0, 22.4
inlet 10 18 0.45 34.4, 29.6, 38.9
Table 4: BC Gibbs blocks for for sampling FD and BCs for del Castillo’s FD on M25 data with parallel tempering. We give the start time, end time, and step size for the BC Gibbs blocks along with the acceptance probabilities for each of the 3 chains.
BC type Start (min) End (min) acceptance probabilities (%)
inlet 0 49 0.1 22.7, 15.4, 25.3
inlet 0 15 0.2 13.3, 11.1, 10.1
inlet 7 34 0.18 27.2, 26.8, 28.1
inlet 34 49 0.44 33.3, 27.7, 33.1
outlet 0 49 0.055 21.6, 24.5, 21.3
outlet 0 15 0.3 25.0, 30.6, 34.8
outlet 12 24 0.21 28.6, 21.9, 30.6
outlet 25 39 0.2 18.9, 30.6, 30.6
outlet 29 49 0.14 23.6, 19.5, 15.8
outlet 6 18 0.45 42.4, 43.2, 43.0
outlet 38 49 0.48 5.0, 4.8, 8.7
inlet 0 12 0.24 14.0, 21.1, 13.1
inlet 20 35 0.31 44.2, 46.5, 39.7
inlet 5 15 0.9 2.1, 0.7, 0.7
outlet 2 12 1 4.7, 4.1, 6.6
outlet 16 26 0.21 20.8, 21.3, 16.8
inlet 10 18 0.52 36.7, 37.9, 42.1
Table 5: BC Gibbs blocks for for sampling FD and BCs for del Castillo’s FD on M25 data with parallel tempering. We give the start time, end time, and step size for the BC Gibbs blocks along with the acceptance probabilities for each of the 3 chains.
BC type Start (min) End (min) acceptance probabilities (%)
inlet 0 49 0.11 25.6, 33.1, 24.5
inlet 0 15 0.21 16.5, 18.5, 11.8
inlet 7 34 0.2 32.6, 31.7, 37.1
inlet 34 49 0.47 39.4, 35.5, 42.3
outlet 0 49 0.06 23.2, 29.2, 29.6
outlet 0 15 0.39 26.9, 24.8, 21.4
outlet 12 24 0.26 35.0, 30.8, 28.9
outlet 25 39 0.2 31.8, 36.6, 27.5
outlet 29 49 0.14 24.8, 22.0, 28.1
outlet 6 18 0.5 46.2, 48.9, 35.9
outlet 38 49 0.51 7.9, 12.0, 6.7
inlet 0 12 0.31 12.1, 10.0, 17.0
inlet 20 35 0.36 47.1, 38.5, 46.5
inlet 5 15 0.8 7.7, 7.5, 7.1
outlet 0 13 1 2.2, 1.5, 2.4
outlet 0 13 1 4.1, 3.1, 3.4
outlet 16 26 0.26 25.7, 16.3, 27.4
inlet 10 18 0.55 35.2, 39.4, 37.6
Table 6: BC Gibbs blocks for for sampling FD and BCs for del Castillo’s FD on M25 data with parallel tempering. We give the start time, end time, and step size for the BC Gibbs blocks along with the acceptance probabilities for each of the 3 chains.
BC type Start (min) End (min) acceptance probabilities (%)
inlet 0 49 0.155 26.6, 31.0, 29.0
inlet 0 15 0.25 20.2, 17.4, 28.9
inlet 7 34 0.29 28.8, 28.1, 32.4
inlet 34 49 0.57 34.3, 44.0, 37.5
outlet 0 49 0.078 27.9, 27.9, 27.3
outlet 0 15 0.44 31.0, 37.4, 32.9
outlet 12 24 0.3 36.9, 31.8, 35.3
outlet 25 39 0.24 30.2, 34.8, 26.8
outlet 29 49 0.18 17.3, 23.0, 19.5
outlet 6 18 0.65 45.2, 40.9, 43.6
outlet 38 49 0.5 11.1, 12.5, 14.2
inlet 0 12 0.39 13.7, 16.1, 14.3
inlet 20 35 0.5 39.4, 36.4, 40.1
inlet 5 15 0.9 9.6, 12.1, 8.3
outlet 0 13 1 3.4, 6.1, 6.8
outlet 0 13 1 3.6, 4.8, 5.1
outlet 16 26 0.3 26.6, 27.8, 20.1
inlet 10 18 0.61 51.9, 45.5, 40.7

Table 7: BC Gibbs blocks for for sampling FD and BCs for del Castillo’s FD on M25 data with parallel tempering. We give the start time, end time, and step size for the BC Gibbs blocks along with the acceptance probabilities for each of the 3 chains.