Empirical analysis of the variability in the flow-density relationship for smart motorways

by   Kieran Kalair, et al.

The fundamental diagram is an assumed functional relationship between traffic flow and traffic density. In practice, this relationship is noisy and exhibits significant statistical variability. On smart motorways, this variability is increased by variable speed limits that are not captured by the fundamental diagram. To study this variability, it is appropriate to consider the joint probability distribution function (pdf) of density and flow. We perform an empirical study of the variability in the relationship between flow and density using 74 days of data from 64 sections of London's M25. The objectives are to determine how much of the variability in the flow-density relationship results from variable speed limits and to assess whether particular functional forms of the fundamental diagram are systematically preferred. Empirically, the joint pdf of flow and density is strongly bimodal, illustrating that traffic flows are often found in high-density or low-density regimes but rarely in between. We find that the high-density regime is strongly affected by variable speed limits whereas the low-density regime is not. The Daganzo-Newell (triangular) model of the fundamental diagram systematically fits best to the data. However, the optimal parameters vary with location. Clustering analysis of these parameters suggests three qualitatively different types of flow-density relationships applying to different sections of the M25. These clusters have natural interpretations in terms of the frequency and severity of flow breakdown. Accident rates also depend on cluster type suggesting possible links to other properties of traffic flows beyond the flow-density relationship.



There are no comments yet.


page 1


Anomaly detection and classification in traffic flow data from fluctuations in the flow-density relationship

We describe and validate a novel data-driven approach to the real time d...

Fundamental Diagram of Traffic Flow from Prigogine-Herman-Enskog Equation

Recent applications of a new methodology to measure fundamental traffic ...

A Comparative Study of Parametric Regression Models to Detect Breakpoint in Traffic Fundamental Diagram

A speed threshold is a crucial parameter in breakdown and capacity distr...

MCMC for a hyperbolic Bayesian inverse problem in traffic flow modelling

As work on hyperbolic Bayesian inverse problems remains rare in the lite...

A fatal point concept and a low-sensitivity quantitative measure for traffic safety analytics

The variability of the clusters generated by clustering techniques in th...

Modeling the Gaia Color-Magnitude Diagram with Bayesian Neural Flows to Constrain Distance Estimates

We demonstrate an algorithm for learning a flexible color-magnitude diag...

Investigating the Relationship between Freeway Rear-end Crash Rates and Macroscopically Modelled Reaction Time

This study explores the hypothesis that an analytically derived estimate...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

The fundamental diagram of traffic flow is an important tool in understanding, categorizing and analyzing traffic flow. Despite several models and functional forms of the density-flow and density-speed relationships being suggested over the years, understanding the complex structure of fundamental diagrams is a difficult task. Furthermore, analysis of real data suggests that the density-flow relationship in real traffic systems exhibits significant variability that is not captured by a single functional form. As a result, it is natural to consider a distributional interpretation of the density-flow relationship, considering it as a 2-dimensional probability distribution and analyzing features of this to provide a different perspective on the relationship between density and flow. In this work, we take such an approach to analyse the empirical variation in fundamental diagrams obtained from London’s M25, one of the busiest Smart Motorways in the UK. In addition to studying the variability of the density-flow relationship at a single location, we also consider variation between locations. We approach this by first determining the optimal functional form for a diagram across 64 sections of road on the M25, then use clustering analysis to assess what qualitatively different types of density-flow relationships emerge in the network. This provides insights into the variability in the density-flow relationship of smart-motorways on both a local and global scale.

Ii Literature Review

Fundamental diagrams have been widely studied, starting with the first model of the density-speed relationship formulated by Greenshields in 1933 [11]. As an observation of a complex system, the diagrams contain much complexity, commonly exhibiting different levels of variation and spread of data as density varies. Many attempts to study fundamental diagrams have been conducted since their inception. One such work is [18], where the authors took a year of data and proposed the optimal functional form of the fundamental diagram from this data was the logistic diagram. Whilst most works take a piecewise triangular functional representation of a fundamental diagram, their search for, in their words, a functional form with ‘mathematical elegance and empirical accuracy’ demonstrates that even a simple question of what curve fits a set of data points best is open to discussion.

Many sources of variation observed in the fundamental diagram are incorporated when combining the diagrams with models, either macroscopic or microscopic. One example of this is defining the flux function of a macroscopic model (which is simply the fundamental diagram) to have a random free-flow speed [12]. Whilst this captures some of the variation observed in real data and incorporates it in model results, it does not address the initial question of quantifying or better understanding the variation present in original data. Other works have argued that simply including random noise or terms in initially deterministic models can lead to behaviour inconsistent with the original model, which lead to [8] rephrasing this randomness in a Lagrangian framework model, essentially allowing driver headway choice to be stochastic. Increasingly, instead of random flux functions, some modelling approaches use real-time data to update models which attempts to account for the deviations from typical behaviour, for example in [5] and [20]. Whilst useful, especially as more data and processing power becomes readily available, this again does not attempt to better understand the variation present before any modelling of traffic is completed.

Whilst our work focuses on fundamental diagrams of smart motorways, there is much work focusing on deriving them for an urban setting, for example [16] and [17]. Whilst the first paper here attempts to create an algorithm to fit functions to the sparse probe data, and the second attempts to use the probe data to build a model of the traffic state, neither address the problem of variation in the urban sense. Whilst discovering appropriate ways to use new data sources is important in the context of traffic management in urban scenarios, it is also important to better understand the situations where data collection is more routine and simple, for example on sensor equipped motorways. It may be an interesting extension of our work to consider if there exist any significant differences in the urban level-variation compared to the more controlled and dense motorway traffic variation.

Perhaps a work most similar to ours is [15], where the authors investigate the impact of variable speed limits on the slope of a fundamental diagram, as well as the critical occupancy and capacity of a road. Our work differs substantially from this, as we take different interpretation of the fundamental diagram on a single link to analyse the impact of variable speed limits, through a 2d probability distribution and clustering approach, not a functional form approach. In addition, we consider the spatial consistency of different fundamental diagrams, something not covered in this paper

Iii Data Pre-Processing

The data for this study is taken from Highways England’s National Traffic Information Service (NTIS). We choose to focus on the M25 motorway, one of the busiest in the country, taking 74 days of data starting from April 7th 2017.

NTIS represents roads as a directed graph, with each motorway consisting of a number of ‘links’ comprising the edges of the directed graph. On each link, the number of lanes is constant and no slip roads join or leave the motorway. We first recreate this directed graph model for the M25, consisting of 77 links, varying in length between 200 and 10,000 meters.

Placed on each link are physical sensors called ‘loops’. Loops detect passing vehicles at various distances along links, and from this the following measurements of interest are recorded:

  • Speed: The average speed of vehicles crossing the loop in kilometers per hour

  • Flow: The number of vehicles crossing the loop in a time-period in vehicles per hour

We compute a time-series of density by dividing flow by speed. NTIS aggregates this loop-level data to produce link-level recordings of speed and flow, among other quantities not used in this study. The link-level data is the resolution we focus on here, meaning we extract one data-point measuring speed and flow every minute for 74 days, repeated for 77 different links.

In addition, we extract event and speed limit information. Events are any unusual activity on a link, separated into the following categories:

  • Accidents

  • Vehicle obstructions

  • General obstructions

  • Abnormal traffic

  • Maintenance, Road management & poor environmental conditions

We focus on accidents, vehicle obstructions, general obstructions and abnormal traffic events. NTIS defines an abnormal traffic event as a period when the true travel time on a link exceeds Highways England’s predicted travel time by at least 120 seconds, for at least 5 consecutive minutes. The algorithm used to compute travel times is not publicly known.

Speed limit information represents operator interventions on motorway traffic. On some links on the M25, adjustable speed limit signs are displayed overhead to drivers, which can be used to temporarily change the speed limit on a section of road to any of 40, 50, 60 or 70 miles per hour.

After processing, we have a set of time-series, with flags at each time there was an event or speed limit intervention on a link. Of the 77 links, 64 have working sensors and report data we can use for analysis.

Iv Exploratory Analysis

Typically, one views a fundamental diagram as a scatter plot of density and flow, as seen in Fig. IV.1.

Fig. IV.1: Example density-flow fundamental diagrams using 74 days of M25 data, taken from a link between junctions 2 and 3 (3636 metres long).

Inspecting Fig. IV.1, we would consider the data to have quite a high amount of variation, particularly at density values around and above 60 veh/km. However, different methods of visualization offer different interpretations. Suppose we now approximate the 2D probability distribution in IV.1

using kernel density estimation (KDE). Kernel density estimation is a non-parametric method to estimate an underlying distribution function when only samples from it are available. For data points

, and in this context , we write a kernel density estimate as:


where is some prior choice of kernel. A common choice is the Gaussian kernel, which leads to the kernel density estimate being written as:


The process of determining the optimal KDE is then to estimate the parameters of , which has 4 elements here. There are various methods to do this, typically based on minimizing the mean integrated square error (MISE), defined as:


with being some unknown true density function we wish to approximate. Whilst the exact computation of MISE is not typically possible, approximation methods to it for multivariate data are given in [1]. A KDE interpretation of a fundamental diagram is shown in Fig. IV.2.

Fig. IV.2: Example fundamental diagram using 74 days of M25 data, taken from a link between junctions 2 and 3 (3636 metres long). This is the same diagram as shown in Fig. IV.1, however we take a vastly different interpretation of it. From the original plot, we expect large variation, however from this plot we see the data is highly centered around 2 modes. Note the log-scale used on the coloring, further emphasising we have a very strong low-density peak, a less strong high density peak, and extremely rare excursions from these.

Inspecting Fig. IV.1 and Fig. IV.2, we conclude that although significant deviations from typical behaviour exist, they are rare comparative to the overall behaviour observed on the link.

V Analyzing the Single-Link Variation

To begin the analysis in this section, suppose we first take a single link. For this link, we know when speed limits are on, so we segment our series into these categories. In particular, as speed limits are inherently linked to events (and may be activated by operators when an event seems likely, or has occurred and traffic is disrupted), we can consider how much of the variation seen in each fundamental diagram can be explained by the variable speed limit active. Any link on the M25 may contain some number of speed limit signs. At any point where we have a set of signs, we will have one above each lane, currently all displaying the same speed limit for each lane. However, different groups of signs on a link may have different speed limits active, meaning there is some ambiguity as to what we specify the current speed limit on a link to be in these cases. As a result, we group the diagrams as follows: when we detect at-least one variable speed limit sign has been altered on a link, we calculate the average speed limit displayed on all of the changed signs, and set the current speed limit on a link to the nearest value in the feasible speed limit set to this average. The feasible speed limit set is given in miles per hour as (40, 50, 60, 70), and in kilometers per hour as (64.4, 80.5, 96.6, 112.7) Results for segmenting a diagram into different speed limit cases are shown in Fig. V.1.

(a) 40 mph - Flow
(b) 50 mph - Flow
(c) 60 mph - Flow
(d) 70 mph - Flow
Fig. V.1: Example fundamental diagrams using 74 days of M25 data, taken from a link between junctions 2 and 3 (3636 metres long). The data has separated into separate cases, each having a different variable speed limit active. We see the diagram when the speed limit on the link is 70 miles per hour is not at all representative of the diagram when the speed limit is set to 40 mph. For reference, we have the conversions: (40, 50, 60, 70) mph being (64.4, 80.5, 96.6, 112.7) km/hr. The colouring and marking of points refers to the clustering analysis discussed in section V-A. The medoids are shown in red (), with the points lying the in low-density mode shown by green and high density mode shown by blue .

Inspecting Fig. V.1, we see two significant behaviours. Firstly, there are no data points past densities around 80 vehicles per kilometer in the 70 mph set. Rather than this being a setting rule, it is likely that this is simply a consequence of some other setting rule, as density is not a quantity directly provided by NTIS data. Secondly, it is striking how the variation in the data, particularly flow, decreases as we increase the speed limit. Clearly, 40 mph speed limits will be active in the most extreme cases of events and traffic state, and 70 mph speed limits will be active in the most ideal of scenarios.

Whilst the speed diagrams still see vast decreases in variation as we increase the speed limit from 40mph to 70mph, they are perhaps not so extreme as in the flow case. It should also be noted that, although it appears many data points lie above the speed limit in each diagram, there are multiple reasons for this. Apart from drivers simply ignoring speed limits, it is likely that when only part of a section of road has a particular lower speed limit on, other sections may be higher, meaning the average speed measurement we receive from NTIS may be higher than the minimum speed limit. Also, in the case of 40 mph speed limits, these are often set for maintenance work late at night, so the opportunity for drivers to exceed them is far greater than say during an accident event, where a physical queue forces drivers to slow down.

V-a Single Link Clustering Analysis

Having identified this clear distinct behaviour for each speed limit case, we now try to quantify it. Through our various methods of exploratory analysis on the fundamental diagrams, we saw in Fig. IV.2

that the distribution of data in the fundamental diagrams was bi-modal. As a result, it is natural to consider measuring variation in the entire diagram relative to each mode. If some other measure, for example the 2-d analogy to standard deviation was used, we would neglect this clear behaviour present in our data. First, we must identify the locations of the modes, then come up with a measure of distance around them, finally comparing the spread around these modes for data from each speed limit case.

It should be noted that we could simply take the two maxima of our KDE fit to determine the mode locations, however using clustering ensures we have no approximation errors and the tails are correctly accounted for.

To determine the location of the modes, clustering is insightful and practical technique. In particular, as we are aware a-priori that our data is concentrated in two locations, we use clustering algorithms to determine these locations. There are various different clustering algorithms one could use, however we have a significantly simplified selection procure as we are searching for exactly 2 clusters in each diagram. Classic choices, based on the prior need for 2 clusters, are k-means and k-medoids. Whilst widely applied to various problems, k-means can be significantly influenced by outliers in the data, which are clearly present here. Instead, we can use k-medoids, which attempts to assign cluster centers that are themselves data-points, rather than averages of the points. It is widely known that k-medoids is a prohibitively expensive algorithm when using a naive implementation. Instead, we use the ‘clara’ variation

[10], clustering subsets of the data of a fixed size, and running over 50 random starting conditions to attempt to avoid initial condition bias. Before clustering, we scale the data to ensure all quantities have similar ranges, dividing the speed values by the maximum observed speed, flow by the maximum observed flow, and density by the critical density at which the maximum flow occurs.

Clustering provides approximate mode locations for each diagram, with an example shown in Fig. V.1. From Fig. V.1, it is clear that for all speed limit cases, the low density mode stays roughly constant in position, however the high-density model moves significantly. For the flow cases, we see it moves down in density as the speed limit increases, and slightly down in flow. The speed cases are similar, with the high density mode moving up in speed and down in density as speed limits increase, however one should also note the significant reduction in spread around the modes as a function of which speed limit is active.

To quantify this behaviour, we consider the location of each mode as a function of speed limit, with results shown in Fig. V.2.

Fig. V.2: A fundamental diagram, with the high and low density clustering modes at each speed limit value highlighted. Notice how the movement of the high-density mode is far greater than the movement of the low-density mode, which essentially remains in the same position when speed limits greater than 40mph are active. In the insert, we plot the standard deviation of each cluster for each speed limit value. We see when speed limits are set to 50mph, the low-density mode has larger standard deviation, however at all other values the high-density mode has higher standard deviation.

From Fig. V.2, we see clear verification of the visual conclusions previously made. Importantly, we see significant shifts in the density coordinate for the high density flow mode when compared to the low density case. Quantifying this, we see that the high-density mode moves from around 90 veh/km when 40mph speed limits are active to around 45 veh/km when 70mph speed limits are active, a relative change of . Comparatively, we see the low-density mode move from a density of around 20 veh/km when 40mph speed limits are active to a density of around 12 veh/km when 70mph speed limits are active, a similar relative change of . It should be noted however that most of the low-density mode movement occurs between the 40mph and 50mph cases, with its location essentially being fixed for the 50, 60 and 70mph cases. On the other hand, the medoid flow coordinates change from 5300 veh/hr to 4400 veh/hr in the high-density case and 2000 veh/hr to 1250 veh/hr in the low-density case. Again, most of the low-density mode changes occur between the 40 and 50 mph cases.

Additionally, the insert in Fig. V.2 allows us to compare the spread around each medoid for each of the speed limit values. We see the standard deviation for the low-density mode actually peaks when 50mph speed limits are active, however decreases from the 70 to 50 to 40mph cases. The peak is standard deviation of the low-density mode coincides with the lowest standard deviation of the high-density mode. Recall that all quantities were scaled by characteristic values before fitting, and we are comparing the euclidean distances between points in each cluster and the cluster medoid, so this standard deviation is computed on dimensionless values.

Inspecting the distance distributions themselves, we generally see heavy-tailed distributions, but over very short ranges. When considering appropriate fits to these distributions using the methods described in [2], the limited range of distances makes distinguishing between any heavy-tailed distributions difficult.

Vi Analyzing the Link-to-Link Variation

Vi-a Fitting Functional Forms to Data

Having investigated the variation in the fundamental diagram of a single link, we now consider appropriate functional fits to the data, aiming to determine which of the many available functional forms best describes the data. We outline the most common fits seen in the literature in Appendix A table I. To begin, we simply fit each of the functional forms given in table I, and show our results in Fig. VI.1.

(a) Realistic Parameter Space
Fig. VI.1: Results for fitting each functional form of fundamental diagram to a single links fundamental diagram. All inputs, limits and data are scaled before fitting.

Inspecting Fig. VI.1, we see generally similar fits to the data across all diagrams for low density values, with the Greenberg diagram clearly being the poorest fit at both low and high densities. Any diagram we use should be representative of all data however, not a single link. To identify which diagram is best representative of the data, or indeed if such a diagram exists, we fit all of the considered diagrams to each of our 64 links, again running 100 different optimizations for each diagram on each link to attain the optimal parameter sets of each diagram. In Fig. VI.2, we show the results of measuring the root mean square error (RMSE) for all links, using only the optimal parameter set in each case.

Fig. VI.2: Root mean square error results for all links considered. We see the Daganzo Newell and Smooth Triangle results are generally indistinguishable across links, however these two diagrams consistently have lower RMSE than the others considered.

From Fig. VI.2, it is clear that the most representative functional form for the density-flow relationship is triangular, with the discrete and continuous versions consistently achieving lower RMSE than any other consider functional form. An interesting point to consider is that the Daganzo-Newell fundamental diagram peak (representing maximum flow) is not placed at the largest observed flow value. Doing so actually decreases the goodness of fit, leading to a smaller value and larger root-mean square error and sum of squared error. We instead capture the relationship with the majority of the data by placing it at a position much in agreement with Fig. IV.2. As the Daganzo-Newell triangular diagram is by far the most common in the literature, we show results for this from now on.

Vi-B Determining the Optimal Diagram During Speed Limit Categories

Whilst we have seen that the optimal diagram, in the sense of minimum RMSE, for data from each of our links is triangular, it is of interest to investigate if the same is true when segmenting diagrams by speed limit. To do so, we again fit each candidate diagram to all of our links, however this time fitting them to subsets of data, each containing points only when specific speed limits are active. We then attain goodness of fit results for each subset across all links, 61 of which have variable speed limit signs on.

Fig. VI.3: Root mean square error results for all links considered. On top we fit only to data where 40mph speed limts are active. On the bottom we fit to data only when 70mph speed limits are active. As in the non-segmented cases, we see the Daganzo Newell and Smooth Triangle results are generally superior across links, however it is far less clear in many cases, and in some they are indistinguishable from various other models. Note that on the 70mph diagram, the Newell and Logistic diagrams are almost indistinguishable as they have such similar RMSE values.

Contrasting Fig. VI.2 to Fig. VI.3, we see that whilst for non-segmented data the triangular forms of diagram are clearly superior to all others, this becomes less clear when we segment by speed limit. All models perform similarly when 40mph speed limits are active, likely due to the large variation meaning we simply have outliers regardless of what functional form is used. In the 70mph case, we the smooth and discrete triangle diagrams are clearly superior to the Greenberg and Greenshields ones, however we see much more varied performance when considering the Logistic, North Western and Newell. Generally, the difference in quality of fit between each diagrams is reduced when we segment by speed limit, and particularly at lower speed limits one could argue all diagrams are reasonably similar in description of the data, however at higher speed limits the separation becomes more visible.

Vi-C Spatial Consistency

Using this functional form, we can now question how spatially consistent the fundamental diagrams of the M25 are. In particular, one may wonder if there exists one set of parameters that fits the M25, and all links vary around this, or if there are multiple subsets of parameters that best describe the traffic behaviour. To test this, we take the density-flow data for each diagram, again scaling it to ensure all links are independent of the number of lanes. We then fit the Daganzo-Newell fundamental diagram via least squares optimization. To ensure we do not get stuck in local minima, we start 100 such fits of each function, using random initial starting points and waiting for all to converge. Having done this for all links, we have a set of parameters for each link that best describe the fundamental diagram, and we then use clustering methods to discover natural patterns in the data.

As with all clustering methods, we re-scale the parameters to ensure our results are not dominated by a single input, and then using each links parameter set as an input, we cluster the fundamental diagrams, using Hierarchical Agglomerative Clustering (HAC) with Ward linkage [9]

and euclidean distance. Ward’s linkage ensures the clustering result minimizes the variance observed in clusters. Notice that we switch to HAC clustering rather than k-medoids here, as it allows us to visually see how our data separates into clusters, allowing an intuitive explanation of how many clusters are ultimately derived from the data.

After clustering, inspecting the dendrogram shows 3 natural parameter sets emerge in the data, suggesting 3 distinct fundamental diagrams when considering their optimal fit. We show the clustering dendrogram in Fig. VI.4 accompanied by a map showing the spatial locations on the M25.

Fig. VI.4: Left: The dendrogram for the Daganzo-Newell fundamental diagram when fit to each of the links considered on the M25. Right: A plot of the M25, with links coloured by cluster membership. We see a clear separation into 3 clusters when inspecting the dendrogram, with a large cluster (green, 2) covering most of the west side and north-east of the M25. Cluster 1 (blue) is mainly concentrated in the south-east of the M25, although scattered around some other areas, and third cluster (red) mainly covers 3 separate areas in the south-west, north-west and east.

Inspecting the map in Fig. VI.4, we see evidence of spatial structure: the colours representing the three types of link identified by the clustering analysis form long chains rather than being scattered randomly. Additionally, we see two links in cluster 1 (blue) directly before and after the M25 meets the Dartford tunnel and A282, where one would expect different behaviour to say, a link in the center of the M25.

Having identified 3 clear clusters of parameters and spatial consistency, we now consider why these groups exist. As we have clustered the diagram parameters, we can directly relate these to link behaviour. Additionally, we can consider the influence of events by counting the occurrences on each link in each cluster. Results for each parameter set, the number of accidents and obstruction events, and the number of abnormal traffic events are given in Fig. VI.5.

Fig. VI.5: Left: Box plot of the parameter values in each cluster observed during Daganzo-Newell diagram clustering. Center: Box plot of the number of accidents and obstructions on each link in each cluster. Right: Box plot of the number of abnormal traffic events on each link in each cluster. From the parameters box plot, we see a far larger values in cluster 1 than any other cluster. Similarly, there is clear separation of the maximum flow parameters in cluster 2 (green) compared to the other clusters. Cluster 3 appears to have lower maximum flow and values than any other cluster.

Inspecting Fig. VI.5, we see clear differences in the parameter values within each cluster. Note first that, as all density and flow values were scaled before fitting, these fits should be independent on the number of lanes a link has as well as the length. We see cluster 1 (blue) has by far the highest maximum density observed of any of the links. A high maximum density would imply that, during the breakdown phase of flow, higher densities do not cause as severe decreases in flow than in other links. As a result, these links may be those that are more resistant to severe congestion when breakdown does occur.

Cluster 2 (green) has significantly higher maximum flow parameters than any others, as well as critical densities. From this, we would infer that these links are typically reaching a high capacity more often than links in other clusters, and flow breakdown occurs at higher densities than other clusters links. In a spatial sense, these links are mostly located on the west and north of the M25.

In contrast, cluster 3 (red) has far lower maximum flow parameters than either other cluster, as well as lower critical densities. In a real-sense, this means we would expect these links to reach high capacities less often than links in other clusters, and have flow breakdown occur at earlier densities than normal. The majority of these links can be found in the south-west of the M25, as well as in the north-west and east. The eastern links are before the Dartford crossing, although note the single links directly before the crossing are both members of cluster 1 (blue). The south-western links are roughly near Crawley and Guildford.

It is also interesting to note that cluster 1 (blue) has far fewer accidents and obstructions observed than clusters 2 (green) and 3 (red), but the amount of abnormal traffic events is comparable across all clusters. Accidents and obstructions may lead to significant breakdown in the density-flow relationship, however as abnormal traffic events are defined by some arbitrary threshold, they may not necessarily describe the same severity of breakdown as an accident would. If so, this would explain the large parameters observed in this cluster.

A final, and potentially most useful visualization of the cluster differences is simply to plot all of the diagrams for a direct comparison. Such a plot is given in Fig. VI.6, where we plot each cluster as coloured in the previous plots, but include all other diagrams faded for reference.

Fig. VI.6: Plots of all optimal fundamental diagrams, coloured and line-styled by cluster membership. The top diagram highlights cluster 1 (blue), the center cluster 2 (green) and the bottom cluster 3 (red). Here we see how the differences in parameters observed in Fig. VI.5 relate directly to the functional fits of the diagrams.

Considering Fig. VI.6, we see cluster 1 (blue) clearly has the most elongated breakdown phase (large ), where as cluster 2 has all of the highest peaks and some of the most severe breakdowns (low ). Cluster 3 (red) typically has flow breakdown occur earlier than any cluster (low ), shown by having the lowest density maxima, but it is typically neither the most or least extreme (intermediate values).

Vii Discussion & Conclusions

In this paper, we have taken a large dataset from a smart motorway in the UK and considered ways to measure the variation found in fundamental diagrams of traffic flow. We have approached this by looking at the variation when segmented by speed limits active, as well as the spatial dependence of the parameter sets of the most appropriate identified functional forms to represent the data. Our results not only offer a better understanding of the dynamics of traffic behaviour as viewed from the standpoint of a fundamental diagram, but they also highlight how useful different interpretations may be, rather than simply visual checks of scatter plots.

We found two modes exist in typical fundamental diagrams, a high and low density mode, and quantified the spread of data around each mode. We found heavy tail distributions of deviations from these modes, and showed how the range of these decayed as we moved from a stable (70mph speed limit) to a less stable (40mph speed limit) case. Comparing various links, we found 3 clusters of parameter sets exist in fundamental diagrams, and investigated why this may be so. Our investigation of the parameters in each cluster showed spatially where high capacity, relative to other links is most often achieved on the M25, as well as where the earliest breakdowns in flow occurred. Finally, we considered the frequency of events that would impact flow on links, relating these to both the clustering found and also spatially to the M25.

In the future, our results may be useful when considering transportation planning and road management, particularly on where to focus improvements in infrastructure to the M25 or how to understand and quantify the usage of variable speed limits. Further work in this area could address a limitation of our dataset, the lack of measures of severity of events. Currently, an event could simply be debris blocking a single lane on a link, or an overturned vehicle blocking multiple lanes. Additional data to consider the severity of each event in each cluster, as well as some quantitative measure of severity, could provide further insight into this 3 fundamental diagram behaviour.

Appendix A Functional Forms of Diagrams

Table I lists a number of common functional forms of fundamental diagrams.


Flux Function


Greenshield [11]

- Free flow velocity - Maximum road density

Greenberg [7]

- Velocity when road is at full capacity

Northwestern [4]

- Density when road is at full crit

Newell [13]

- Decay calibration coefficient

Logistic Model [19]

- Decay calibration coefficient

Daganzo-Newell [3, 14]

- The maximal flow possible on the road

Continuous Triangle [6]

- Scaling parameter - Turning point parameter - Curvature parameter

TABLE I: Typical fundamental diagrams found in the literature.


The authors are grateful to Dr. Steve Hilditch and Thales UK for sharing expertise on the operation of NTIS and UK transportation systems more generally. We also thank Ayman Boustati and Alvaro Cabrejas Egea for help with data acquisition and pre-processing.


  • [1] J. E. Chacón and T. Duong. Multivariate plug-in bandwidth selection with unconstrained pilot bandwidth matrices. Test, 19(2):375–398, 2010.
  • [2] A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51(4):661–703, 2009.
  • [3] C F. Daganzo. The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory. Transportation Research Part B: Methodological, 28(4):269–287, 1994.
  • [4] J. L. Drake and J. L. Schofer. A statistical analysis of speed-density hypotheses. Highway Research Record 154, pages 53–87, 1966.
  • [5] A. Duret and Y. Yuan. Traffic state estimation based on eulerian and lagrangian observations in a mesoscopic modeling framework. Transportation Research Part B: Methodological, 101:51–71, 2017.
  • [6] S. Fan, M. Herty, and B. Seibold. Comparative model accuracy of a data-fitted generalized aw-rascle-zhang model. Networks and Heterogeneous Media, 9(2):239–268, 2014.
  • [7] H. Greenberg. An analysis of traffic flow. Operations Research, 7(1):79–85, 1959.
  • [8] S. E. Jabari and H. X. Liu. A stochastic model of traffic flow: Theoretical foundations. Transportation Research Part B: Methodological, 46(1):156–174, 2012.
  • [9] H. Joe and Jr. Ward. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
  • [10] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Series in Probability and Statistics. Wiley, West Sussex, 2008.
  • [11] R. D. Kühne. Greenshields’ legacy: Highway traffic. In 75 Years of the Fundamental Diagram for Traffic Flow Theory: The Fundamental Diagram, Woods Hole, Massachusetts, June 2008.
  • [12] J. Li, Q-Y. Chen, H. Wang, and D. Ni. Analysis of lwr model with fundamental diagram subject to uncertainties. Transportmetrica, 8(6):387–405, 2012.
  • [13] G. F. Newell. Nonlinear effects in the dynamics of car following. Operations Research, 9(2):209–229, 1961.
  • [14] G. F. Newell. A simplified theory of kinematic waves in highway traffic, part ii: Queueing at freeway bottlenecks. Transportation Research Part B: Methodological, 27(4):289–303, 1993.
  • [15] M. Papageorgiou, E. Kosmatopoulos, and I. Papamichail. Effects of variable speed limits on motorway traffic flow. Transportation Research Record: Journal of the Transportation Research Board, 2047:37–48, 2008.
  • [16] T. Seo, T. Kusakabe, and Y. Asakura.

    Calibration of fundamental diagram using trajectories of probe vehicles: Basic formulation and heuristic algorithm.

    Transportation Research Procedia, 21:6–17, 2017. International Symposia of Transport Simulation (ISTS) and the International Workshop on Traffic Data Collection and its Standardization (IWTDCS).
  • [17] L. Shoufeng, W. Jie, H. van Zuylen, and L. Ximin. Deriving the macroscopic fundamental diagram for an urban area using counted flows and taxi gps. In 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), pages 184–188, 10 2013.
  • [18] H. Wang, J. Li, Q-Y. Chen, and D. Ni. Representing the fundamental diagram: the pursuit of mathematical elegance and empirical accuracy. In Transportation Research Board 89th Annual Meeting, pages Paper 10–1354, Washington DC, 2010. Transportation Research Board.
  • [19] H. Wang, J. Li, Q. Y. Chen, and D. Ni. Logistic modeling of the equilibrium speed-density relationship. Transportation Research Part A: Policy and Practice, 45(6):554–566, 2011.
  • [20] Y. Yuan, A. Duret, and H. van Lint.

    Mesoscopic traffic state estimation based on a variational formulation of the lwr model in lagrangian-space coordinates and kalman filter.

    Transportation Research Procedia, 10:82–92, 2015. 18th Euro Working Group on Transportation, EWGT 2015, 14-16 July 2015, Delft, The Netherlands.