
Effects of Predictive Real-Time Traffic Signal Information

by   Vadim Sokolov, et al.
George Mason University

This paper analyzes the impact of providing car drivers with real-time predictive information on traffic signal timing, including time-to-green and green-wave speed recommendations. Over a period of six months, the behavior of 121 drivers in everyday urban driving was analyzed with and without access to live traffic signal information. In a first period, drivers had the information service disabled in order to establish baseline behavior; after that initial phase, the service was activated. In both cases, data from smartphone and vehicle sensors was collected, including speed, acceleration, fuel rate, and accelerator and brake pedal positions. We estimated the changes in driving behavior that result from drivers receiving the traffic signal timing information by comparing the distributions of acceleration/deceleration patterns through statistical analysis. Our analysis demonstrates that providing traffic signal timing information to drivers has a positive effect.



1 Introduction

Vehicle-to-Infrastructure (V2I) systems have been the subject of intense interest in recent years, offering the promise of significant reductions in fuel consumption and greenhouse gas and other emissions, as well as safety improvements. While there have been many testbed studies that support these promises, as well as anecdotal evidence from limited deployments, the effectiveness of various V2I mechanisms in the real world has not been fully demonstrated. Many factors that could impact, and possibly negate, the anticipated benefits, such as driver compliance, distraction, and the impact of other drivers’ behavior, are difficult to estimate before wide-scale trials. In order to increase transportation authorities’ willingness to deploy V2I systems, it is important that real-world studies be conducted to gather data that can serve to evaluate these factors. Such studies would, ideally, compare a statistically significant number of individual drivers’ performance with and without V2I assistance, under a variety of driving conditions, for long enough to explore “novelty” effects (e.g., whether drivers stop paying attention once they become habituated to the technology).

One example of V2I technology consists in providing real-time information about upcoming traffic signals, which could bring 8-15% energy savings [11, 12]. However, most studies have focused on modeling and simulation [1, 10] or “professional driver on closed course” studies [12], which may not capture complex real-world factors. This technology could also reduce the number of accidents at signalized intersections. However, only real-world implementation would demonstrate whether that would be the case, and would also uncover any potential unintended consequences: technology supposed to improve safety, such as red-light cameras, sometimes produces counterintuitive results [3].

To help achieve the anticipated benefits, DSRC (Dedicated Short-Range Communications) is a possible communication technology that would allow the infrastructure to communicate with approaching vehicles, using specialized roadside and in-vehicle equipment. While DSRC offers many benefits, including nearly instantaneous relay of information, this approach requires a significant investment in new infrastructure.

Connected Signals has developed and demonstrated a complementary approach to relaying signal information to vehicles that exploits existing connections between traffic signals and municipal traffic management systems (TMSs), and existing connections between TMSs and the Internet, to access signal data. Cellular technology is used to communicate with vehicles. This approach avoids the need to deploy special-purpose hardware at each intersection and in each vehicle. A number of pilot deployments have been completed in cities in the US, New Zealand, and Australia. Given that a large fraction of urban traffic signals are connected to TMSs, and that vehicles increasingly have built-in cellular connectivity, this approach offers the prospect of being able to connect many signals to many vehicles almost immediately at very low cost. Signal information can be accessed through Connected Signals’ EnLighten smartphone app, as well as directly through integrated systems that have been developed with a number of major vehicle OEMs, including BMW.

2 Study Design

In this study, Connected Signals (CS) data was broadcast to the drivers in one of two forms: through the vehicle infotainment unit for the drivers of BMWs equipped with the functionality, or through CS’ EnLighten smartphone app. Special versions of each were produced to accommodate the study’s needs. The initial fielded version supports signal count-downs and “green-light-assist” information that tells drivers whether they would make an upcoming signal at their current speed. A sample screenshot is shown in Figure 1. The display indicates that the light is currently green and will remain so for 28 seconds. The green arrow shows that, at the current speed, the car will arrive at the intersection during the green phase. The application is available in selected cities, including San Jose, CA, where the field study was conducted.

Figure 1: BMW EnLighten display: next light will be green, based on current speed

The study involved recruiting roughly 400 drivers. With the drivers’ consent, data on vehicle position and speed, acceleration, braking, and (where possible) fuel consumption was collected. This data was transmitted to servers in the cloud, where it was merged with contemporaneous signal-state information for later, offline, analysis.

Drivers who spend significant portions of their driving time both in and out of the covered areas were recruited. This allows their behavior to be compared longitudinally, making it possible to detect habituation effects and eliminate biases that might occur in simple sequential “without data/with data” trials. Data collection was run for approximately six months, to ensure that a meaningful amount of data was collected for each driver, and that each driver experienced a variety of driving conditions.

When in covered areas, drivers were provided with predictive signal information telling them, when possible, whether they would make or miss the next signal at their current speed, a recommended speed to make the next signal, and countdowns for red signal durations when they are stopped. For safety reasons, speed recommendations were limited by the current speed limit, and red-light countdowns stopped at 5 seconds before the signal changes to force drivers to rely on the physical traffic signal. At that time, a chime also sounded to alert drivers to return their focus to driving in case they may have become distracted while waiting for the signal to change.

The experimental design for the study is intended to maximize our ability to determine the effects of signal data availability, given the constraints of what can readily be obtained from a collection of privately owned vehicles and a self-nominated group of participants.

A number of steps were taken to minimize, as much as possible, the effects of such factors as driver and vehicle variability, habituation, and differing driving conditions in and out of signal coverage. First of all, during an initial period, drivers were not provided with signal information over a sufficient number of trips to establish a baseline. During that time, information was collected on drives and correlated with real-time signal state information. This allows determination of how drivers respond to the signal state information they get in the normal way (looking at the lights) without additional predictive or guidance information.

Secondly, throughout the study, data was collected from trips both inside and outside the signal-coverage area. Since the locations (but not the states) of signals outside of coverage are known, this helps distinguish between changes in drivers’ behavior that result from access to signal data and changes that result from other factors such as weather or traffic conditions. While this is not a perfect comparison, it should provide reasonable indicators of the significance of the observed results.

Finally, each driver was assigned a unique ID that was used to associate all their drives. This allows changes in driver behavior to be analyzed longitudinally over the course of the study, including between control and signal-informed driving conditions. To ensure the privacy of drivers in the study, the unique IDs were created so that they cannot be inverted to identify particular drivers, and all data was anonymized using these IDs as it was received.

Although the characteristics of the study’s drivers and their route selections cannot be controlled for, the ability to compare individual drivers longitudinally, both with and without signal data, over an extended period should minimize the influence of such factors.

For those vehicles with integrated signal information capabilities, the necessary information was captured directly from the participating vehicles. For vehicles without direct integration, a special-purpose version of EnLighten was developed to acquire the necessary data from the smartphones’ sensors. For each trip, time-series data was collected on:

  1. Vehicle position, heading, and speed

  2. Number and duration of stops

  3. Acceleration and deceleration profiles

  4. Energy consumption (if available)

  5. Availability, timing, and content of provided signal information

  6. Actual signal state (if known).

This information was associated with the vehicle type and driver ID. All data was sent to the cloud both to facilitate provision of signal-state predictions to the vehicle and for recording for subsequent analysis.

Since baseline and longitudinal data without signal provisioning was collected in addition to data with signal information, it should be possible to reliably estimate the effects on energy consumption and safety and estimate the impact of signal time information.

3 San Jose Data

We analyze data from two sources: smartphones and the CAN bus. The CAN bus data was collected via the OBDII interface. The data was observed during the period from 2016-09-13 to 2017-02-09. Table 1 provides the number of observations (in thousands) and the number of active and inactive trips from the CAN bus and the phone. A trip is active when the feedback system was on. Phone observations were collected from GPS and accelerometer sensors. CAN signals are vehicle dependent: different manufacturers broadcast different sets of signals via the OBDII interface.

| Dataset | Obs (A) | Obs (I) | Trips (A) | Trips (I) |
|---|---|---|---|---|
| CAN Speed | 6461 | 9941 | 2535 | 4406 |
| Phone Speed | 3172 | 3753 | 4661 | 6803 |
| HMM | 2408 | 2918 | 2961 | 4359 |
| Phone Acceleration | 2852 | 3861 | 4733 | 8033 |
| CAN Acc. Pedal D | 2171 | 3325 | 2306 | 3911 |
| CAN RPM | 1088 | 1726 | 2482 | 4315 |
| CAN Throttle | 1088 | 1727 | 2450 | 4249 |
| CAN Acc. Pedal E | 1085 | 1662 | 2307 | 3904 |
| CAN Throttle R | 1068 | 1636 | 2299 | 3844 |
| CAN Throttle B | 1051 | 1591 | 2268 | 3754 |
| CAN Fuel Rate | 19 | 62 | 79 | 165 |
| BMW RPM | 998 | 488 | 214 | 57 |
| BMW Fuel | 916 | 460 | 214 | 57 |
| BMW Acceleration | 9 | 171 | 41 | 13 |
| BMW Location | 73 | 44 | 165 | 56 |
| BMW Start/Stop | 13 | 2 | 214 | 57 |
| BMW Brake | 9 | 0 | 41 | 3 |
| Feedback Time | 414 | 0 | 2570 | 0 |
| Feedback Green | 152 | 0 | 3303 | 0 |

Table 1: Number of active (A) and inactive (I) observations and trips. Observation counts are in thousands.

CAN bus observations were collected at an irregular frequency, with most of the observations being collected at 3 Hz. Phone data, on the other hand, was observed at a regular frequency of 1 Hz. Figure 2 shows the empirical distribution of the frequencies at which data was collected from both sources.

Figure 2: Histogram of observations’ frequency from (a) CAN bus and (b) phone sensors

The irregularity of the CAN observations’ frequency is most likely due to WiFi connection disruptions. The data from the OBDII dongle was transmitted to the phone via a WiFi connection, and the phone then sent the data to the back-end server.

The data was observed on 13154 road segments. On 3620 of those, drivers would receive information about traffic light timing. We call road segments on which traffic light information is available active segments. As shown in Figure 3, most of the road segments in the central part of San Jose are active.

Figure 3: Active Road Segments. Road segments colored in red are active

Further, most of the data was collected in the central part of San Jose. The maps in Figure 4 show a 0.01% sample of observations from the analyzed data set.

(a) Sampled Locations (b) Heat map
Figure 4: Sample of locations where data was recorded

3.1 Data Processing

Data from the phone’s GPS sensor was matched to road segments using a Hidden Markov Model (HMM) [6, 4]. Missing observations were assumed to be missing at random. Figure 5 shows the distribution of the durations of missing observations. There is a heavy tail for durations of missing observations among CAN signals. This is likely due to connectivity issues between the OBDII dongle and the phone that collects the data.

(a) CAN (b) Phone
Figure 5: Histogram of duration of missing data from CAN and phone.

When we differentiate location to calculate speed, we simply remove the observations with time jumps. We also observed that some of the trips had long sequences of zero-speed observations. We removed those zero observations from the analysis.

To address the issue of noise, we truncated observations with values beyond physical limits: speed (in miles per hour) and acceleration were each truncated to a physically plausible range. Further, we used Kalman smoothing to remove sharp acceleration spikes and changes in speed that violate basic laws of vehicle dynamics.
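The cleaning steps above can be illustrated with a minimal sketch. The numeric limits, the minimum zero-run length, and the function names are all hypothetical, since the bounds used in the study are not given here. Truncation is interpreted as clipping, which is also an assumption, and the time-jump rule is applied when differentiating speed rather than location.

```python
from itertools import groupby

# Hypothetical limits: the bounds used in the study are not given here.
MAX_SPEED_MPH = 100.0
MAX_ABS_ACC = 5.0      # m/s^2
MAX_GAP_S = 2.0        # gaps longer than this count as time jumps
MIN_ZERO_RUN = 3       # zero-speed runs at least this long are dropped
MPH_TO_MS = 0.44704    # conversion factor, miles per hour -> m/s


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def clean_speeds(samples):
    """samples: list of (time_s, speed_mph) tuples, sorted by time."""
    kept = []
    for is_zero, run in groupby(samples, key=lambda s: s[1] == 0.0):
        run = list(run)
        if is_zero and len(run) >= MIN_ZERO_RUN:
            continue  # drop long stretches of zero speed
        kept.extend((t, clip(v, 0.0, MAX_SPEED_MPH)) for t, v in run)
    return kept


def accelerations(samples):
    """Finite-difference acceleration in m/s^2, skipping time jumps."""
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or dt > MAX_GAP_S:
            continue  # time jump: no derivative across the gap
        acc = (v1 - v0) * MPH_TO_MS / dt
        out.append((t1, clip(acc, -MAX_ABS_ACC, MAX_ABS_ACC)))
    return out


trip = [(0, 10.0), (1, 0.0), (2, 0.0), (3, 0.0), (4, 12.0), (5, 200.0)]
speeds = clean_speeds(trip)
print(speeds)
print(accelerations(speeds))
```

Dropping the zero-speed run leaves a four-second gap in this toy trip, so no acceleration is computed across it; only the last pair of samples yields a (clipped) value.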

3.1.1 Kalman Smoothing

We formulate the dynamics of the speed and acceleration observations as a state-space model whose state vector contains the speed and the acceleration. We consider the change in speed from one time step to the next to be normally distributed with mean 0 and a fixed variance, and the change in acceleration to be normally distributed with mean 0 and variance 1. Figure 6 shows the result of applying Kalman smoothing (red line) to noisy speed and acceleration signals from the phone GPS sensor (black line).

Figure 6: Observations smoothed with Kalman filter
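As an illustration of this kind of smoothing, the following is a minimal sketch of a Kalman filter followed by a Rauch-Tung-Striebel backward pass for a scalar random-walk (local-level) model, in which the change in the state between time steps is normal with mean 0. It is not the authors' implementation: the function name, the observation-noise variance, the speed state variance, and the initial variance are assumptions chosen for the example. Only the state variance of 1 for acceleration comes from the text.

```python
def local_level_smooth(y, obs_var, state_var, init_var=1.0):
    """Kalman filter plus RTS smoother for x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    w_t ~ N(0, state_var), v_t ~ N(0, obs_var). `y` must be non-empty.
    """
    m, c = y[0], init_var          # assumed prior mean and variance
    pred_var, filt_mean, filt_var = [], [], []
    for obs in y:
        r = c + state_var          # predicted variance (predicted mean is m)
        k = r / (r + obs_var)      # Kalman gain
        m = m + k * (obs - m)      # filtered mean
        c = (1.0 - k) * r          # filtered variance
        pred_var.append(r)
        filt_mean.append(m)
        filt_var.append(c)

    smoothed = filt_mean[:]
    for t in range(len(y) - 2, -1, -1):
        j = filt_var[t] / pred_var[t + 1]   # smoother gain
        smoothed[t] = filt_mean[t] + j * (smoothed[t + 1] - filt_mean[t])
    return smoothed


# Toy signals with a single spike (values are made up for illustration).
raw_speed = [10.0, 10.0, 10.0, 20.0, 10.0, 10.0, 10.0]
raw_acc = [0.0, 0.1, 4.0, -0.2, 0.0]
smooth_speed = local_level_smooth(raw_speed, obs_var=4.0, state_var=0.25)
smooth_acc = local_level_smooth(raw_acc, obs_var=4.0, state_var=1.0)
print([round(v, 2) for v in smooth_speed])
print([round(v, 2) for v in smooth_acc])
```

A larger `obs_var` relative to `state_var` gives heavier smoothing, so in practice the ratio of the two variances is the quantity to tune.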

3.1.2 Dynamic Time Warping

Signals collected from CAN did not have associated locations. To add a location attribute to each of the CAN signals, we joined CAN records with the phone records, which do have location attributes. Since the data was collected at different frequencies, a simple database join operation that looks for equal time stamps would not work. We used dynamic time warping (DTW) [5] to join the CAN and phone datasets. DTW finds an optimal alignment between two time series datasets. It “warps” one of the sequences to match the other one. Given two time-dependent sequences $X = (x_1, \ldots, x_N)$ and $Y = (y_1, \ldots, y_M)$, DTW calculates the optimal warping path that minimizes the total distance between the time series

$$p^* = \arg\min_{p} \sum_{l=1}^{L} c\left(x_{n_l}, y_{m_l}\right),$$

where $p = (p_1, \ldots, p_L)$, with $p_l = (n_l, m_l)$, is the path function, which for each index $l$ gives the corresponding indexes in the $X$ and $Y$ vectors. In our application the cost function $c$ is the distance between the time stamps of the individual observations.
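The alignment described above can be sketched with the textbook dynamic-programming form of DTW, using the absolute difference of time stamps as the cost. This is a minimal illustration, not the study's code: the function names and the sample time stamps are made up, and the quadratic-time table would need windowing or a library implementation for trips of realistic length.

```python
def dtw_path(a, b):
    """Return (total_cost, path) aligning time-stamp lists a and b.

    path is a list of (i, j) index pairs; cost is |a[i] - b[j]|.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j]: minimal accumulated cost aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path


def join_by_path(path):
    """Map each index of the first series to one matched index of the second."""
    matched = {}
    for i, j in path:
        matched.setdefault(i, j)   # keep the first match on the path
    return matched


can_times = [0.0, 0.3, 0.6, 1.0, 1.3]   # irregular, faster sampling
phone_times = [0.0, 1.0]                # regular 1 Hz sampling
total, path = dtw_path(can_times, phone_times)
print(total, path)
print(join_by_path(path))
```

Each CAN record can then inherit the location of the phone record it is matched to.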

4 Analysis of Information Effect on Driving Behavior

We applied data cleaning and filtering techniques described in the previous section to the datasets. Further, to remove bias, we only compared records from those roads where traffic signal information was available. The resulting cleaned phone speed data set contains 1,093,506 active observations and 1,311,666 inactive observations. An active observation was recorded when information about traffic signal timing was provided to the driver and inactive observations were recorded when no such information was provided.

First, we compare summary statistics for both active and inactive groups. The means for acceleration values are given in Table 2.

| Status | Mean positive acceleration | Mean negative acceleration |
|---|---|---|
| Inactive | 0.7626 | -0.767 |
| Active | 0.746 | -0.741 |

Table 2: Mean for positive and negative acceleration values

A one-sided Welch two-sample $t$-test for the difference in the means confirms that the difference is significant: for both positive and negative accelerations, the 95% confidence interval for the difference excludes zero, with a $p$-value close to zero in both cases.
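For readers who want to reproduce this kind of comparison, here is a minimal sketch of the Welch two-sample $t$ statistic with its Welch-Satterthwaite degrees of freedom. The one-sided $p$-value uses a normal approximation to the $t$ distribution, which is an assumption that is reasonable only for large samples such as those in this study. The data below is synthetic and the function names are illustrative.

```python
import math
import random
from statistics import mean, variance


def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny   # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def one_sided_p_normal(t):
    """P(T > t) under a standard normal approximation (large samples only)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


# Synthetic accelerations, NOT the study data.
rng = random.Random(1)
inactive = [rng.gauss(0.76, 0.5) for _ in range(2000)]
active = [rng.gauss(0.74, 0.4) for _ in range(2000)]

# Alternative hypothesis: mean(inactive) > mean(active)
t_stat, dof = welch_t(inactive, active)
print(t_stat, dof, one_sided_p_normal(t_stat))
```

Unlike the pooled-variance test, the Welch form does not assume that the two groups have equal variance, which suits groups of different size and spread.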

Further, we performed the $t$-test for the means of CAN signals. The results of the analysis of the CAN signals are shown in Table 3.

| Signal | Inactive Mean | Active Mean | Conf. Interval |
|---|---|---|---|
| Pedal D | 57.534 | 57.719 | (-0.225, -0.145) |
| RPM | 1285.8 | 1098.4 | (185.07, 189.77) |
| Throttle | 50.769 | 48.862 | (1.8435, 1.9692) |
| Acceleration Pedal | 55.944 | 62.959 | (-7.0984, -6.9321) |
| Throttle Relative | 20.458 | 17.207 | (3.185, 3.3175) |
| Throttle Position | 79.808 | 90.074 | (-10.366, -10.166) |
| Fuel Rate | 9420.4 | 3222.3 | (5743.2, 6653) |
| BMW RPM | 5183.5 | 5343.2 | (-165.73, -153.74) |
| BMW Fuel | 32708 | 32430 | (211.14, 344.98) |

Table 3: Analysis of the means of CAN signals.

We also performed the $t$-test for individual road segments; however, even for the most traveled roads of the network, the sample sizes were not large enough to make conclusive statements. Table 4 shows the means and the results of the $t$-test for the top 10 most traveled roads in San Jose.

| Road ID | Mean Inactive (# obs) | Mean Active (# obs) | $p$-value |
|---|---|---|---|
| 24166 | 0.826 (886) | 0.751 (2686) | 0.0013 |
| 77790 | 0.656 (1001) | 0.649 (1514) | 0.3904 |
| 22025 | 0.664 (813) | 0.668 (2733) | 0.5813 |
| 20566 | 0.626 (1392) | 0.738 (1595) | 1.0000 |
| 9028 | 0.722 (220) | 0.587 (440) | 0.0104 |
| 12686 | 0.708 (422) | 0.666 (1064) | 0.1287 |
| 80357 | 0.653 (3172) | 0.722 (257) | 0.9413 |
| 80356 | 0.494 (2054) | 0.677 (183) | 0.9999 |
| 9717 | 0.848 (655) | 0.922 (1162) | 0.9741 |
| 24552 | 0.510 (702) | 0.518 (2161) | 0.6511 |

Table 4: Comparison of mean acceleration values for individual roads.

The higher mean acceleration/deceleration values in the inactive group show that the information did have a positive effect. The difference in the means, though, can be the result either of many drivers accelerating “slightly faster” without information or of the presence of sharp acceleration patterns in the data. To distinguish between these explanations, we need to analyze the sharp acceleration patterns. Statistically speaking, we are interested in the behavior of the extreme acceleration/deceleration observations, also known as tail observations.

Thus, instead of analyzing means, we need to analyze the entire distribution of the observations. Figure 7 shows the empirical distribution (histogram) for acceleration/deceleration for both active and inactive groups.

(a) Positive (b) Negative
Figure 7: Histogram for (a) Positive and (b) Negative acceleration values

Active values have more mass around the modes of the distribution, while the distribution for inactive values has a heavier tail. We perform a Kolmogorov-Smirnov test to compare the empirical cumulative distributions, and it confirms the empirical observation that the distributions are different. Figure 8 shows the empirical cumulative distribution functions (CDF) for acceleration values for the two groups of trips (active/inactive).

Figure 8: Empirical cumulative distribution for acceleration observations from active group (solid line) and inactive group (dashed line).

The Kolmogorov-Smirnov $D$-statistic equals 0.016 and the $p$-value is close to zero. Thus, we accept the alternative hypothesis that the CDF of inactive values lies below that of active values for acceleration values.

Figure 9 shows the empirical cumulative distribution functions (CDF) for deceleration values for two groups of trips (active/inactive).

Figure 9: Empirical cumulative distribution for deceleration observations from active group (solid line) and inactive group (dashed line).

The Kolmogorov-Smirnov $D$-statistic equals 0.021 and the $p$-value is close to zero. Thus, we accept the alternative hypothesis that the CDF of inactive values lies above that of active values for deceleration observations.
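The one-sided two-sample Kolmogorov-Smirnov statistic used in these comparisons can be sketched as follows. The function computes the largest amount by which the empirical CDF of the first sample exceeds that of the second. The $p$-value uses only the leading term of the large-sample approximation, which is an assumption suited to large groups. Function names and sample values are illustrative.

```python
import math


def ks_one_sided(x, y):
    """D+ = sup_t (F_x(t) - F_y(t)) for the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(xs[i], ys[j])
        # advance both samples past every value <= t
        while i < n and xs[i] <= t:
            i += 1
        while j < m and ys[j] <= t:
            j += 1
        d = max(d, i / n - j / m)
    return d


def ks_one_sided_pvalue(d, n, m):
    """Leading-term asymptotic p-value for the one-sided statistic."""
    return math.exp(-2.0 * (n * m / (n + m)) * d * d)


# Toy samples: `active` is concentrated at lower values than `inactive`,
# so the CDF of `inactive` lies below the CDF of `active`.
active = [0.2, 0.4, 0.5, 0.7, 0.9]
inactive = [0.3, 0.6, 1.1, 1.6, 2.4]
d_plus = ks_one_sided(active, inactive)
print(d_plus, ks_one_sided_pvalue(d_plus, len(active), len(inactive)))
```

Swapping the two arguments tests the opposite one-sided alternative, which is the form used for the deceleration comparison.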

4.1 Extreme Value Theory

The Kolmogorov-Smirnov test confirms that the distributions of the active and inactive groups differ in their tails. To further quantify the difference in the tail observations, we use extreme value theory (EVT) [2]. EVT was developed to analyze extreme climate events and sharp market movements [9]. For example, Sigauke et al. [8] use EVT for electricity demand modeling, and Shenoy and Gorinevsky [7] provide a statistical model combined with EVT for electricity markets.

We are interested in predicting the frequency at which a variable exceeds a certain threshold. For example, we consider acceleration greater than 3 m/s$^2$ to be aggressive driving and are interested in understanding the frequency of those events. Let $X$ denote our variable of interest, for example acceleration, and consider the exceedance-over-threshold events $\{X > u\}$. Then, for a large threshold $u$, the distribution of $X$ given such an event has a limiting generalized Pareto (GP) distribution, so that

$$P(X \le x \mid X > u) \approx H(x \mid \mu, \sigma, \xi) = 1 - \left(1 + \xi\,\frac{x - \mu}{\sigma}\right)_+^{-1/\xi}.$$

Here $\mu$, $\sigma > 0$ and $\xi$ are the location, scale and shape parameters, and $z_+ = \max(z, 0)$. $H$ is called the generalized Pareto (GP) distribution. The exponential distribution is obtained by continuity as $\xi \to 0$.

To verify that this theoretical result holds for a given dataset, we can use the following property of the mean excess function: if $X$ follows a generalized Pareto distribution, then for $u > \mu$ and $\xi < 1$ we have

$$E(X - u \mid X > u) = \frac{\sigma + \xi\,(u - \mu)}{1 - \xi}.$$

An empirical plot of the mean excess against the threshold should, therefore, be close to a straight line with slope $\xi/(1-\xi)$. Figure 10 shows that for our dataset this identity holds. Thus, we can use the GP distribution to analyze the tail behavior.

(a) Acceleration (b) Deceleration
Figure 10: Mean excess plot for active and inactive trips.
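The points of a mean excess plot can be computed with a few lines of code. This is a minimal sketch with made-up values; the function name and the choice of thresholds are illustrative.

```python
def mean_excess(values, thresholds):
    """Empirical mean of (v - u) over observations v > u, for each threshold u."""
    points = []
    for u in thresholds:
        excesses = [v - u for v in values if v > u]
        # NaN marks thresholds above every observation
        points.append(sum(excesses) / len(excesses) if excesses else float("nan"))
    return points


accelerations = [0.4, 0.9, 1.3, 1.8, 2.2, 2.9, 3.5]   # illustrative values
thresholds = [0.5, 1.0, 1.5, 2.0, 2.5]
for u, e in zip(thresholds, mean_excess(accelerations, thresholds)):
    print(f"u = {u:.1f}  mean excess = {e:.3f}")
```

Plotting these pairs and checking for an approximately linear trend above some threshold is the usual way to choose where the generalized Pareto approximation starts to apply.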

We fit the GP distribution and use it to calculate what threshold is expected to be exceeded every 24 seconds. Table 5 provides the result.

| Group | Acceleration | Deceleration |
|---|---|---|
| Active | 2.18 | -2.26 |
| Inactive | 2.36 | -2.31 |

Table 5: Level exceeded every 24 seconds of observation
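A level of this kind can be computed from a fitted GP tail with the standard return-level formula $x_m = u + \frac{\sigma}{\xi}\left[(m\,\zeta_u)^{\xi} - 1\right]$, where $\zeta_u$ is the fraction of observations above the threshold $u$ and $m$ is the number of observations per return period. The sketch below is not the authors' procedure. It fits the excesses by the method of moments (the fitting method used in the study is not stated), it assumes one observation per second so that 24 seconds corresponds to $m = 24$, and it runs on synthetic data.

```python
import math
import random
from statistics import mean, variance


def fit_gp_moments(excesses):
    """Method-of-moments estimates (sigma, xi) for excesses over a threshold."""
    m, s2 = mean(excesses), variance(excesses)
    ratio = m * m / s2
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return sigma, xi


def return_level(u, sigma, xi, zeta_u, m):
    """Level exceeded on average once every m observations (needs m * zeta_u > 1)."""
    if abs(xi) < 1e-9:                      # exponential limit, xi -> 0
        return u + sigma * math.log(m * zeta_u)
    return u + sigma / xi * ((m * zeta_u) ** xi - 1.0)


# Synthetic positive accelerations, NOT the study data.
rng = random.Random(0)
data = [rng.expovariate(1.5) for _ in range(20000)]

u = 1.0                                      # assumed threshold
excesses = [x - u for x in data if x > u]
zeta_u = len(excesses) / len(data)
sigma, xi = fit_gp_moments(excesses)
print(return_level(u, sigma, xi, zeta_u, m=24))
```

Maximum likelihood is the more common choice for fitting; the method of moments is used here only because it has a closed form and keeps the example self-contained.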

The formal analysis of tails using GP distribution confirms our empirical observation that the inactive group has heavier tails.

5 Discussion

The main contribution of this paper is the development and application of statistical techniques for the analysis of driving data collected from smartphone and vehicle sensors. We analyzed two groups of observations: active and inactive. Active users received traffic information and inactive users did not. We compared the means of the acceleration/deceleration observations and found that active drivers have smoother driving patterns. Our analysis demonstrates that providing traffic signal timing information to drivers has a positive effect. Further, we analyzed extreme acceleration and deceleration patterns (tail observations). We showed that extreme accelerations and decelerations arise less frequently in the active group. The difference in the mean observations is not large. For example, the mean acceleration is 0.746 m/s$^2$ in the active group and 0.7626 m/s$^2$ in the inactive group, which is a 2.2% reduction. However, the difference among extreme accelerations/decelerations is more pronounced. We observed that the acceleration level expected to be exceeded every 24 seconds is 2.18 m/s$^2$ for the active group and 2.36 m/s$^2$ for the inactive group. Thus, the reduction is 7.6%.

There are many directions for future research. These include the development of Bayesian analysis techniques that allow data from multiple regions to be “pooled” to derive metrics specific to individual road segments or intersections. Further, the development of statistical models that use the type of information provided as an input would support understanding how drivers react to different types of messages. It will be useful to understand how changes in driving behavior impact vehicle emissions and safety metrics. For example, will smoother driving cycles lead to fewer accidents at intersections, and how will this impact fuel consumption and CO$_2$ emissions?

Finally, the data presented here are dependent on the level of accuracy of Connected Signals’ predictive signal data and the particular presentation(s) of that data to drivers used for the study. More study on the effects of varying presentations and various accuracy requirements would clearly be beneficial.

6 Acknowledgment

The study is supported in part by the US Department of Energy under its “Small Business Vouchers Pilot” program, and includes participation by Connected Signals, Inc., Argonne National Laboratory, BMW, and the cities of San Jose and Walnut Creek, California.


References

  • [1] Behrang Asadi and Ardalan Vahidi. Predictive cruise control: Utilizing upcoming traffic signal information for improving fuel economy and reducing trip time. IEEE Transactions on Control Systems Technology, 19(3):707–714, 2011.
  • [2] Stuart Coles. An Introduction to Statistical Modeling of Extreme Values, volume 208. Springer, 2001.
  • [3] Dominique Lord and Srinivas Reddy Geedipally. Safety effects of the red-light camera enforcement program in Chicago, Illinois, 2014.
  • [4] Qi Luo, Joshua Auld, and Vadim Sokolov. Addressing some issues of map-matching for large-scale, high-frequency GPS data sets. In TRB 2015 Annual Meeting, Washington, DC, 2015.
  • [5] Meinard Müller. Dynamic time warping. Information Retrieval for Music and Motion, pages 69–84, 2007.
  • [6] Paul Newson and John Krumm. Hidden Markov map matching through noise and sparseness. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 336–343. ACM, 2009.
  • [7] Saahil Shenoy and Dimitry Gorinevsky. Risk adjusted forecasting of electric power load. In American Control Conference (ACC), 2014, pages 914–919. IEEE, 2014.
  • [8] Caston Sigauke, Andréhette Verster, and Delson Chikobvu. Extreme daily increases in peak electricity demand: Tail-quantile estimation. Energy Policy, 53:90–96, 2013.
  • [9] Richard L Smith. Measuring risk with extreme value theory. Risk Management: Value at Risk and Beyond, 224, 2002.
  • [10] Tessa Tielert, Moritz Killat, Hannes Hartenstein, Raphael Luz, Stefan Hausberger, and Thomas Benz. The impact of traffic-light-to-vehicle communication on fuel consumption and emissions. In Internet of Things (IOT), 2010, pages 1–8. IEEE, 2010.
  • [11] Andreas Weber and Andreas Winckler. Advanced traffic signal control algorithms, appendix A: Exploratory advanced research project: BMW final report. Technical report, 2013.
  • [12] Haitao Xia, Kanok Boriboonsomsin, Friedrich Schweizer, Andreas Winckler, Kun Zhou, Wei-Bin Zhang, and Matthew Barth. Field operational testing of eco-approach technology at a fixed-time signalized intersection. In Intelligent Transportation Systems (ITSC), 2012 15th International IEEE Conference on, pages 188–193. IEEE, 2012.