1 Introduction
Vehicle-to-Infrastructure (V2I) systems have been the subject of intense interest in recent years, offering the promise of significant reductions in fuel consumption and greenhouse gas and other emissions, as well as safety improvements. While there have been many testbed studies that support these promises, as well as anecdotal evidence from limited deployments, the effectiveness of various V2I mechanisms in the real world has not been fully demonstrated. Many factors that could affect, and possibly negate, the anticipated benefits, such as driver compliance, distraction, and the behavior of other drivers, are difficult to estimate before wide-scale trials. In order to increase transportation authorities’ willingness to deploy V2I systems, it is important that real-world studies be conducted to gather data that can serve to evaluate these factors. Such studies would, ideally, compare a statistically significant number of individual drivers’ performance with and without V2I assistance, under a variety of driving conditions, for long enough to explore “novelty” effects (e.g., whether drivers stop paying attention once they become habituated to the technology).
One example of V2I technology is the provision of real-time information about upcoming traffic signals, which could bring 8-15% energy savings [11, 12]. However, most studies have focused on modeling and simulation [1, 10] or “professional driver on closed course” studies [12], which may not capture complex real-world factors. This technology could also reduce the number of accidents at signalized intersections. However, only a real-world implementation would demonstrate whether that is the case and uncover any unintended consequences: technologies intended to improve safety, such as red-light cameras, sometimes produce counterintuitive results [3].
To help achieve the anticipated benefits, DSRC (Dedicated Short-Range Communications) is a possible communication technology that would allow the infrastructure to communicate with approaching vehicles, using specialized roadside and in-vehicle equipment. While DSRC offers many benefits, including nearly instantaneous relay of information, this approach requires a significant investment in new infrastructure.
Connected Signals has developed and demonstrated a complementary approach to relaying signal information to vehicles that exploits existing connections between traffic signals and municipal traffic management systems (TMSs), and existing connections between TMSs and the Internet, to access signal data. Cellular technology is used to communicate with vehicles. This approach avoids the need to deploy special-purpose hardware at each intersection and in each vehicle. A number of pilot deployments have been completed in cities in the US, New Zealand, and Australia. Given that a large fraction of urban traffic signals are connected to TMSs, and that vehicles increasingly have built-in cellular connectivity, this approach offers the prospect of being able to connect many signals to many vehicles almost immediately at very low cost. Signal information can be accessed through Connected Signals’ EnLighten smartphone app, as well as directly through integrated systems that have been developed with a number of major vehicle OEMs, including BMW.
2 Study Design
In this study, Connected Signals (CS) data was broadcast to the drivers in one of two forms: through the vehicle infotainment unit for the drivers of BMWs equipped with the functionality, or through CS’ EnLighten smartphone app. Special versions of each were produced to accommodate the study’s needs. The initial fielded version supports signal count-downs and “green-light-assist” information that tells drivers whether they would make an upcoming signal at their current speed. A sample screenshot is shown in Figure 1. The display indicates that the light is currently green and will remain so for 28 seconds. The green arrow shows that, at the current speed, the car will arrive at the intersection during the green phase. The application is available in selected cities, including San Jose, CA, where the field study was conducted.

The study involved recruiting roughly 400 drivers. With the drivers’ consent, data on vehicle position and speed, acceleration, braking, and (where possible) fuel consumption was collected. This data was transmitted to servers in the cloud, where it was merged with contemporaneous signal-state information for later, offline, analysis.
Drivers who spend significant portions of their driving time both in and out of the covered areas were recruited. This allows their behavior to be compared longitudinally, making it possible to detect habituation effects and eliminate biases that might occur in simple sequential “without data/with data” trials. Data collection was run for approximately six months, to ensure that a meaningful amount of data was collected for each driver, and that each driver experienced a variety of driving conditions.
When in covered areas, drivers were provided with predictive signal information telling them, when possible, whether they would make or miss the next signal at their current speed, a recommended speed to make the next signal, and countdowns for red signal durations while they were stopped. For safety reasons, speed recommendations were capped at the current speed limit, and red-light countdowns stopped 5 seconds before the signal changed to force drivers to rely on the physical traffic signal. At that time, a chime also sounded to alert drivers to return their focus to driving in case they had become distracted while waiting for the signal to change.
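The advice logic described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `Advice` record, and the representation of the next green phase as a time window in seconds from now are hypothetical and are not taken from the EnLighten implementation. Only the speed-limit cap and the 5-second countdown cutoff come from the text.

```python
import math
from dataclasses import dataclass
from typing import Optional

COUNTDOWN_CUTOFF_S = 5.0  # countdown is hidden for the last 5 s of red (from the text)


@dataclass
class Advice:
    will_make_signal: bool                  # arrives during green at the current speed
    recommended_speed_mps: Optional[float]  # None when no legal speed reaches the green


def green_light_assist(distance_m: float, speed_mps: float, speed_limit_mps: float,
                       green_start_s: float, green_end_s: float) -> Advice:
    """Advise a driver approaching a signal (assumes distance_m > 0).

    green_start_s and green_end_s bound the next green window in seconds from
    now; green_start_s == 0 means the light is already green.
    """
    eta_s = distance_m / speed_mps if speed_mps > 0 else math.inf
    if green_start_s <= eta_s <= green_end_s:
        return Advice(True, speed_mps)
    # Earliest arrival that respects the speed limit and falls inside the window.
    earliest_legal_s = distance_m / speed_limit_mps
    target_s = max(green_start_s, earliest_legal_s)
    if target_s <= green_end_s:
        return Advice(False, distance_m / target_s)
    return Advice(False, None)


def red_countdown(seconds_to_green: float) -> Optional[int]:
    """Seconds to display while stopped at red, or None once the cutoff is reached."""
    if seconds_to_green <= COUNTDOWN_CUTOFF_S:
        return None
    return math.ceil(seconds_to_green)


# Light stays green for 28 more seconds; car is 200 m away at 15 m/s (limit 20 m/s).
print(green_light_assist(200.0, 15.0, 20.0, 0.0, 28.0))
print(red_countdown(12.3), red_countdown(4.0))
```

Returning `None` instead of a speed above the limit mirrors the safety constraint in the text: when the green phase cannot be reached legally, no recommendation is shown.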
The experimental design for the study is intended to maximize our ability to determine the effects of signal data availability, given the constraints of what can readily be obtained from a collection of privately owned vehicles and a self-nominated group of participants.
A number of steps were taken to minimize, as much as possible, the effects of such factors as driver and vehicle variability, habituation, and differing driving conditions in and out of signal coverage. First of all, during an initial period, drivers were not provided with signal information over a sufficient number of trips to establish a baseline. During that time, information was collected on drives and correlated with real-time signal state information. This allows determination of how drivers respond to the signal state information they get in the normal way (looking at the lights) without additional predictive or guidance information.
Secondly, throughout the study, data was collected from trips both inside and outside the signal-coverage area. Since the locations (but not the states) of signals outside of coverage are known, this helps distinguish between changes in drivers’ behavior that result from access to signal data and changes that result from other factors such as weather or traffic conditions. While this is not a perfect comparison, it should provide reasonable indicators of the significance of the observed results.
Finally, each driver was assigned a unique ID that was used to associate all their drives. This allows changes in driver behavior to be analyzed longitudinally over the course of the study, including between control and signal-informed driving conditions. To protect the privacy of drivers in the study, the unique IDs were created so that they cannot be inverted to identify particular drivers, and all data was anonymized using these IDs as it was received.
Although the characteristics of the study’s drivers and their route selections cannot be controlled for, the ability to compare individual drivers longitudinally, both with and without signal data, over an extended period should minimize the influence of such factors.
For those vehicles with integrated signal information capabilities, the necessary information was captured directly from the participating vehicles. For vehicles without direct integration, a special-purpose version of EnLighten was developed to acquire the necessary data from the smartphones’ sensors. For each trip, time-series data was collected on:
- Vehicle position, heading, and speed
- Number and duration of stops
- Acceleration and deceleration profiles
- Energy consumption (if available)
- Availability, timing, and content of provided signal information
- Actual signal state (if known).
This information was associated with the vehicle type and driver ID. All data was sent to the cloud both to facilitate provision of signal-state predictions to the vehicle and for recording for subsequent analysis.
Since baseline and longitudinal data without signal provisioning was collected in addition to data with signal information, it should be possible to reliably estimate the effects on energy consumption and safety and estimate the impact of signal time information.
3 San Jose Data
We analyze data from two sources: smartphone and CAN bus. The CAN bus data was collected via the OBDII interface. The data was observed during the period from 2016-09-13 to 2017-02-09. Table 1 provides the number of observations (in thousands) and the number of active (A) and inactive (I) trips from CAN and phone. An active trip is one during which the feedback system was on. Phone observations were collected from GPS and accelerometer sensors. CAN signals are vehicle dependent: different manufacturers broadcast different sets of signals via the OBDII interface.
Dataset | Obs (A) | Obs (I) | Trips (A) | Trips (I) |
---|---|---|---|---|
CAN Speed | 6461 | 9941 | 2535 | 4406 |
Phone Speed | 3172 | 3753 | 4661 | 6803 |
HMM | 2408 | 2918 | 2961 | 4359 |
Phone Acceleration | 2852 | 3861 | 4733 | 8033 |
CAN Acc. Pedal D | 2171 | 3325 | 2306 | 3911 |
CAN RPM | 1088 | 1726 | 2482 | 4315 |
CAN Throttle | 1088 | 1727 | 2450 | 4249 |
CAN Acc. Pedal E | 1085 | 1662 | 2307 | 3904 |
CAN Throttle R | 1068 | 1636 | 2299 | 3844 |
CAN Throttle B | 1051 | 1591 | 2268 | 3754 |
CAN Fuel Rate | 19 | 62 | 79 | 165 |
BMW RPM | 998 | 488 | 214 | 57 |
BMW Fuel | 916 | 460 | 214 | 57 |
BMW Acceleration | 9 | 171 | 41 | 13 |
BMW Location | 73 | 44 | 165 | 56 |
BMW Start/Stop | 13 | 2 | 214 | 57 |
BMW Brake | 9 | 0 | 41 | 3 |
Feedback Time | 414 | 0 | 2570 | 0 |
Feedback Green | 152 | 0 | 3303 | 0 |
CAN bus observations were collected at an irregular frequency, with most of the observations being collected at a frequency of 3 Hz. Phone data, on the other hand, was observed at a regular frequency of 1 Hz. Figure 2 shows the empirical distribution of the frequencies at which data was collected from both sources.
The irregularity of the CAN observations’ frequency is most likely due to WiFi connection disruptions. The data from the OBDII dongle was transmitted to the phone via a WiFi connection, and the phone then sent the data to the back-end server.
The data was observed on 13154 road segments. On 3620 of those, drivers would receive information about traffic light timing. We call road segments on which traffic light information is available active segments. As shown in Figure 3, most of the road segments in the central part of San Jose are active.

Further, most of the data was collected in the central part of San Jose. The maps in Figure 4 show a 0.01% sample of observations from the analyzed data set.
Figure 4: (a) Sampled locations, (b) Heat map.
3.1 Data Processing
Data from the phone’s GPS sensor was matched to road segments using a Hidden Markov Model (HMM) [6, 4]. Missing observations were assumed to be missing at random. Figure 5 shows the distribution of the durations of missing observations. There is a heavy tail for the durations of missing observations among CAN signals. This is likely due to connectivity issues between the OBDII dongle and the phone that collects the data.
Figure 5: (a) CAN, (b) Phone.
When we differentiate location to calculate the speed, we simply remove the observations with time jumps. We also observed that some of the trips had long sequences of zero-speed observations. We removed those zero observations from the analysis.
To address the issue of noise we truncated observations with values that are beyond physical limits. Speed was truncated to be inside miles per hour and acceleration was truncated to be inside . Further we used Kalman smoothing to remove sharp acceleration spikes and changes in speed that violate basic laws of vehicle dynamics.
3.1.1 Kalman Smoothing
We formulate dynamics of the speed and acceleration observations as a state-space model
(1) | ||||
(2) |
Where vector with speed and acceleration . Further,
We used
Thus, we consider that change in speed from time step to time step as normally distributed with mean 0 and variance
and change in acceleration follows a normal distribution with mean 0 and variance 1. For each of the trajectories, we used and . Figure 6 shows the result of applying Kalman smoothing (red line) to noisy speed and acceleration signals from the phone GPS sensor (black line).
Figure 6: Observations smoothed with Kalman filter.
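A minimal sketch of the smoothing step, assuming each signal (speed or acceleration) follows a scalar random walk observed with Gaussian noise, so that the change between time steps has mean 0. The function name, the process variance `q`, the observation variance `r`, and the diffuse initial variance `p0` are illustrative assumptions and not the settings used in the study. The backward pass is the standard Rauch-Tung-Striebel recursion.

```python
from typing import List, Optional, Sequence


def kalman_smooth(y: Sequence[float], q: float, r: float,
                  m0: Optional[float] = None, p0: float = 1e3) -> List[float]:
    """Smooth a noisy series under the model x_t = x_{t-1} + w_t, y_t = x_t + v_t,
    with w_t ~ N(0, q) and v_t ~ N(0, r)."""
    n = len(y)
    if n == 0:
        return []
    m_pred, p_pred = [0.0] * n, [0.0] * n  # one-step-ahead predictions
    m_filt, p_filt = [0.0] * n, [0.0] * n  # filtered means and variances
    m_prev = y[0] if m0 is None else m0
    p_prev = p0
    for t in range(n):  # forward Kalman filter
        m_pred[t] = m_prev
        p_pred[t] = p_prev + q
        gain = p_pred[t] / (p_pred[t] + r)
        m_filt[t] = m_pred[t] + gain * (y[t] - m_pred[t])
        p_filt[t] = (1.0 - gain) * p_pred[t]
        m_prev, p_prev = m_filt[t], p_filt[t]
    smoothed = m_filt[:]
    for t in range(n - 2, -1, -1):  # backward Rauch-Tung-Striebel pass
        c = p_filt[t] / p_pred[t + 1]
        smoothed[t] = m_filt[t] + c * (smoothed[t + 1] - m_pred[t + 1])
    return smoothed


# A speed trace (m/s) with one implausible spike; q and r are illustrative.
noisy_speed = [10.0, 10.2, 9.9, 25.0, 10.1, 10.0, 9.8]
print(kalman_smooth(noisy_speed, q=0.1, r=4.0))
```

A small `q` relative to `r` tells the smoother to trust the random-walk dynamics more than individual readings, which is what damps isolated spikes.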
3.1.2 Dynamic Time Warping
Signals collected from CAN did not have associated locations. To add a location attribute to each of the CAN signals, we joined CAN records with the phone records, which do have location attributes. Since the data was collected at different frequencies, a simple database join operation that looks for equal time stamps would not work. We used dynamic time warping (DTW) [5] to join the CAN and phone datasets. DTW finds an optimal alignment between two time series datasets. It “warps” one of the sequences to match the other one. Given two time-dependent sequences $X = (x_1, \ldots, x_N)$ and $Y = (y_1, \ldots, y_M)$, DTW calculates the optimal warping path that minimizes the total distance between the time series
$$p^* = \arg\min_{p} \sum_{l=1}^{L} c\left(x_{n_l}, y_{m_l}\right), \qquad p(l) = (n_l, m_l).$$
Here $p$ is the path function, which for each index $l$ calculates the corresponding indexes in the $X$ and $Y$ vectors. In our application the cost function $c$ is the distance between the time stamps of the individual observations.
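The alignment can be illustrated with a textbook dynamic-programming DTW whose cost is the absolute difference between time stamps. This is a sketch rather than the study's code: the function name and the toy time stamps are hypothetical, and the rule of keeping the last matched phone record for each CAN record is an assumption made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def dtw_path(tx: Sequence[float], ty: Sequence[float]) -> List[Tuple[int, int]]:
    """Optimal warping path between two time-stamp sequences (cost = |tx[i] - ty[j]|)."""
    n, m = len(tx), len(ty)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning tx[:i] with ty[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(tx[i - 1] - ty[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Backtrack from the end of both sequences to recover the path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((acc[i - 1][j - 1], i - 1, j - 1),
                      (acc[i - 1][j], i - 1, j),
                      (acc[i][j - 1], i, j - 1))
    path.reverse()
    return path


# Irregular CAN time stamps (about 3 Hz) joined to 1 Hz phone time stamps, in seconds.
can_t = [0.0, 0.33, 0.66, 1.0, 1.4, 2.0]
phone_t = [0.0, 1.0, 2.0]
path = dtw_path(can_t, phone_t)
# Keep one phone record (hence one location) per CAN record; the last match wins.
can_to_phone: Dict[int, int] = {i: j for i, j in path}
print(path)
print(can_to_phone)
```

The quadratic table is fine for a single trip; long trips would typically need a windowed variant that only fills cells near the diagonal.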
4 Analysis of Information Effect on Driving Behavior
We applied data cleaning and filtering techniques described in the previous section to the datasets. Further, to remove bias, we only compared records from those roads where traffic signal information was available. The resulting cleaned phone speed data set contains 1,093,506 active observations and 1,311,666 inactive observations. An active observation was recorded when information about traffic signal timing was provided to the driver and inactive observations were recorded when no such information was provided.
First, we compare summary statistics for both active and inactive groups. The means for acceleration values are given in Table 2.
Status | Mean positive acceleration | Mean negative acceleration |
---|---|---|
Inactive | 0.7626 | -0.767 |
Active | 0.746 | -0.741 |
A one-sided Welch two-sample $t$-test for the difference in the means
confirms that the difference is significant. For positive acceleration the 95% confidence interval for the difference is
and for negative accelerations, interval is with $p$-value 10 in both cases. Further, we performed the $t$-test for the means of CAN signals. The results of the analysis of the CAN signals are shown in Table 3.
Signal | Inactive Mean | Active Mean | Conf. Interval |
---|---|---|---|
Pedal D | 57.534 | 57.719 | (-0.225,-0.145) |
RPM | 1285.8 | 1098.4 | (185.07,189.77) |
Throttle | 50.769 | 48.862 | (1.8435,1.9692) |
Acceleration Pedal | 55.944 | 62.959 | (-7.0984,-6.9321) |
Throttle Relative | 20.458 | 17.207 | (3.185,3.3175) |
Throttle Position | 79.808 | 90.074 | (-10.366,-10.166) |
Fuel Rate | 9420.4 | 3222.3 | (5743.2,6653) |
BMW RPM | 5183.5 | 5343.2 | (-165.73,-153.74) |
BMW Fuel | 32708 | 32430 | (211.14,344.98) |
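The comparison of means can be sketched as below, with the difference taken as inactive minus active, which matches the sign of the intervals in Table 3. The function name is hypothetical and the data are synthetic. As a labeled simplification, the $p$-value and confidence interval use the normal approximation to the Student $t$ distribution, which is reasonable only when the Welch degrees of freedom are large, as with samples of the size reported here.

```python
import math
import random
from statistics import NormalDist, fmean, variance
from typing import Sequence, Tuple


def welch_test(x: Sequence[float], y: Sequence[float],
               level: float = 0.95) -> Tuple[float, float, Tuple[float, float], float]:
    """Welch statistic, degrees of freedom, CI for mean(x) - mean(y), and the
    one-sided p-value for the alternative mean(x) > mean(y)."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny  # squared standard errors
    diff = fmean(x) - fmean(y)
    se = math.sqrt(vx + vy)
    t_stat = diff / se
    # Welch-Satterthwaite degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    std_normal = NormalDist()
    z = std_normal.inv_cdf(0.5 + level / 2.0)  # large-df approximation
    ci = (diff - z * se, diff + z * se)
    p_one_sided = 1.0 - std_normal.cdf(t_stat)
    return t_stat, df, ci, p_one_sided


# Synthetic accelerations (m/s^2), for illustration only.
random.seed(0)
inactive = [random.gauss(0.76, 0.5) for _ in range(5000)]
active = [random.gauss(0.70, 0.5) for _ in range(5000)]
print(welch_test(inactive, active))
```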
We also performed the $t$-test for individual road segments. However, even for the most traveled roads of the network, the sample sizes were not large enough to make conclusive statements. Table 4 shows the means and the results of the $t$-test for the 10 most traveled roads in San Jose.
Road ID | Mean Inactive (# obs) | Mean Active (# obs) | $p$-value |
---|---|---|---|
24166 | 0.826 (886) | 0.751 (2686) | 0.0013 |
77790 | 0.656 (1001) | 0.649 (1514) | 0.3904 |
22025 | 0.664 (813) | 0.668 (2733) | 0.5813 |
20566 | 0.626 (1392) | 0.738 (1595) | 1.0000 |
9028 | 0.722 (220) | 0.587 (440) | 0.0104 |
12686 | 0.708 (422) | 0.666 (1064) | 0.1287 |
80357 | 0.653 (3172) | 0.722 (257) | 0.9413 |
80356 | 0.494 (2054) | 0.677 (183) | 0.9999 |
9717 | 0.848 (655) | 0.922 (1162) | 0.9741 |
24552 | 0.510 (702) | 0.518 (2161) | 0.6511 |
The higher mean acceleration/deceleration values in the inactive group indicate that the information did have a positive effect. The difference in the means, though, can be the result either of many drivers accelerating “slightly faster” without information or of the presence of sharp acceleration patterns in the data. To answer this question we need to analyze the sharp acceleration patterns. Statistically speaking, we are interested in the behavior of the extreme acceleration/deceleration observations, a.k.a. tail observations.
Thus, instead of analyzing means, we need to analyze the entire distribution of the observations. Figure 7 shows the empirical distribution (histogram) for acceleration/deceleration for both active and inactive groups.
Figure 7: (a) Positive, (b) Negative.
Active values have more mass around the modes of the distribution, while the distribution for inactive values has a heavier tail. We perform a Kolmogorov-Smirnov test to compare the empirical cumulative distributions, and it confirms the empirical observation that the distributions are different. Figure 8 shows the empirical cumulative distribution functions (CDF) for acceleration values for the two groups of trips (active/inactive).

The Kolmogorov-Smirnov $D$-statistic equals 0.016 and the $p$-value is 10. Thus, we accept the alternative hypothesis that the CDF of inactive values lies below that of active values for acceleration values.
Figure 9 shows the empirical cumulative distribution functions (CDF) for deceleration values for two groups of trips (active/inactive).

The Kolmogorov-Smirnov $D$-statistic equals 0.021 and the $p$-value is 10. Thus, we accept the alternative hypothesis that the CDF of inactive values lies above that of active values for deceleration observations.
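A one-sided two-sample Kolmogorov-Smirnov comparison of this kind can be sketched in a few lines. The function name is hypothetical and the samples are synthetic. The $p$-value uses the classical large-sample approximation $\exp(-2\,\frac{nm}{n+m}\,D^2)$ for the one-sided statistic, which is an assumption about how one might compute it and not a statement about the software used in the study.

```python
import math
import random
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_one_sided(upper: Sequence[float], lower: Sequence[float]) -> Tuple[float, float]:
    """One-sided KS statistic D+ = max_x [F_upper(x) - F_lower(x)] and its asymptotic
    p-value, for the alternative that the CDF of `lower` lies below that of `upper`."""
    a, b = sorted(upper), sorted(lower)
    n, m = len(a), len(b)
    d_plus = 0.0
    # F_upper - F_lower can only increase at points of `upper`, so checking them suffices.
    for x in a:
        d_plus = max(d_plus, bisect_right(a, x) / n - bisect_right(b, x) / m)
    p_value = math.exp(-2.0 * (n * m / (n + m)) * d_plus ** 2)
    return d_plus, p_value


# Synthetic positive accelerations: the "inactive" sample has the heavier tail,
# so its CDF lies below the "active" one.
random.seed(0)
active = [random.expovariate(1.0 / 0.7) for _ in range(2000)]
inactive = [random.expovariate(1.0) for _ in range(2000)]
print(ks_one_sided(active, inactive))
```

For decelerations (negative values) the roles swap, which corresponds to the alternative that the inactive CDF lies above the active one.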
4.1 Extreme Value Theory
The Kolmogorov-Smirnov test confirms that the distributions of the active and inactive groups differ in their tails. To further quantify the difference of the tail observations, we use extreme value theory (EVT) [2]. EVT was developed to analyze extreme climate events and sharp market movements [9]. For example, Sigauke et al. [8] use EVT for electricity demand modeling, and Shenoy and Gorinevsky [7] provide a statistical model combined with EVT for electricity markets.
We are interested in predicting the frequency at which a variable exceeds a certain threshold. For example, we consider acceleration greater than 3 m/s$^2$ to be aggressive driving and are interested in understanding the frequency of those events. Let $X$ denote our variable of interest, for example acceleration, and consider the exceedance-over-threshold events $\{X > u\}$. Then the probability of this event has a limiting generalized Pareto (GP) distribution, so that
$$P(X \le x \mid X > u) \approx H(x \mid \mu, \sigma, \xi) = 1 - \left(1 + \xi\,\frac{x - \mu}{\sigma}\right)_+^{-1/\xi}.$$
Here $\mu$, $\sigma$, and $\xi$ are the location, scale and shape parameters, with $\sigma > 0$ and $z_+ = \max(z, 0)$. $H$ is called the generalized Pareto (GP) distribution. The exponential distribution is obtained by continuity as $\xi \to 0$.
To verify that this theoretical result holds for a given dataset, we can use the following property of the mean excess function: if $X$ follows a generalized Pareto distribution, then for $\xi < 1$ and $u > \mu$ we have
$$E(X - u \mid X > u) = \frac{\sigma + \xi (u - \mu)}{1 - \xi}.$$
An empirical plot of the mean excess against the threshold should, therefore, be close to a straight line with slope $\xi / (1 - \xi)$. Figure 10 shows that for our dataset this identity holds. Thus, we can use the GP distribution to analyze the tail behavior.
Figure 10: (a) Acceleration, (b) Deceleration.
We fit the GP distribution and use it to calculate what threshold is expected to be exceeded every 24 seconds. Table 5 provides the result.
Status | Acceleration | Deceleration |
---|---|---|
Active | 2.18 | -2.26 |
Inactive | 2.36 | -2.31 |
The formal analysis of tails using GP distribution confirms our empirical observation that the inactive group has heavier tails.
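The tail analysis can be sketched as follows, under several labeled assumptions. The GP parameters are estimated by the method of moments, a simpler stand-in for whatever fitting procedure the study used. The return level follows the standard peaks-over-threshold formula in Coles [2], $x_m = u + \frac{\sigma}{\xi}\left[(m\,\zeta_u)^{\xi} - 1\right]$, where $\zeta_u$ is the fraction of observations above $u$ and $m$ is the return period in observations (for 1 Hz data, $m = 24$ would correspond to 24 seconds). The function names, threshold, and synthetic data are hypothetical.

```python
import math
import random
from statistics import fmean, variance
from typing import List, Sequence, Tuple


def excesses(data: Sequence[float], u: float) -> List[float]:
    """Amounts by which observations exceed the threshold u."""
    return [x - u for x in data if x > u]


def mean_excess(data: Sequence[float], u: float) -> float:
    """Empirical mean excess over u; roughly linear in u if the tail is GP."""
    exc = excesses(data, u)
    return fmean(exc) if exc else float("nan")


def fit_gpd_moments(exc: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments estimates (sigma, xi) of a GP fitted to excesses.
    Uses mean = sigma/(1-xi) and var = sigma^2/((1-xi)^2 (1-2 xi)), valid for xi < 1/2."""
    m, v = fmean(exc), variance(exc)
    ratio = m * m / v
    return 0.5 * m * (1.0 + ratio), 0.5 * (1.0 - ratio)


def return_level(u: float, sigma: float, xi: float,
                 exceed_prob: float, period: float) -> float:
    """Level exceeded on average once every `period` observations."""
    if abs(xi) < 1e-9:  # exponential limit as xi -> 0
        return u + sigma * math.log(period * exceed_prob)
    return u + sigma / xi * ((period * exceed_prob) ** xi - 1.0)


# Synthetic accelerations with an exponential tail, for illustration only.
random.seed(1)
accel = [random.expovariate(2.0) for _ in range(20000)]
u = 0.5
exc = excesses(accel, u)
sigma, xi = fit_gpd_moments(exc)
zeta_u = len(exc) / len(accel)
print(sigma, xi, return_level(u, sigma, xi, zeta_u, period=100))
```

Computing `mean_excess` over a grid of thresholds and plotting it against `u` reproduces the kind of diagnostic shown in Figure 10.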
5 Discussion
The main contribution of this paper is the development and application of statistical techniques for the analysis of driving data collected from smartphone and vehicle sensors. We analyzed two groups of observations: active and inactive. Active users received traffic signal information and inactive users did not. We compared the means of the acceleration/deceleration observations and found that active drivers have smoother driving patterns. Our analysis demonstrates that there is a positive effect of providing traffic signal timing information to drivers. Further, we analyzed extreme acceleration and deceleration patterns (tail observations). We showed that extreme accelerations and decelerations arise less frequently in the active group. The difference in the mean observations is not large. For example, the mean acceleration is 0.746 m/s$^2$ in the active group and 0.7626 m/s$^2$ in the inactive group (Table 2), which is a 2.2% reduction. However, the difference among extreme accelerations/decelerations is more pronounced. We observed that the acceleration level expected to be exceeded every 24 seconds is 2.18 m/s$^2$ for the active group and 2.36 m/s$^2$ for the inactive group (Table 5). Thus, the reduction is 7.6%.
There are many directions for future research. These include the development of Bayesian analysis techniques that allow data from multiple regions to be “pooled” to derive metrics specific to individual road segments or intersections. Further, the development of statistical models that take the type of information provided as input would support understanding how drivers react to different types of messages. It will be useful to understand how changes in driving behavior impact vehicle emissions and safety metrics. For example, will smoother driving cycles lead to fewer accidents at intersections, and how will this impact fuel consumption and CO$_2$ emissions?
Finally, the data presented here are dependent on the level of accuracy of Connected Signals’ predictive signal data and the particular presentation(s) of that data to drivers used for the study. More study on the effects of varying presentations and various accuracy requirements would clearly be beneficial.
6 Acknowledgment
The study is supported in part by the US Department of Energy under its “Small Business Vouchers Pilot” program, and includes participation by Connected Signals, Inc., Argonne National Laboratory, BMW, and the cities of San Jose and Walnut Creek, California.
References
- [1] Behrang Asadi and Ardalan Vahidi. Predictive cruise control: Utilizing upcoming traffic signal information for improving fuel economy and reducing trip time. IEEE transactions on control systems technology, 19(3):707–714, 2011.
- [2] Stuart Coles. An introduction to statistical modeling of extreme values, volume 208. Springer, 2001.
- [3] Dominique Lord and Srinivas Reddy Geedipally. Safety effects of the red-light camera enforcement program in Chicago, Illinois, 2014.
- [4] Qi Luo, Joshua Auld, and Vadim Sokolov. Addressing some issues of map-matching for large-scale, high-frequency GPS data sets. In TRB 2015 Annual Meeting, Washington, DC, 2015.
- [5] Meinard Müller. Dynamic time warping. Information retrieval for music and motion, pages 69–84, 2007.
- [6] Paul Newson and John Krumm. Hidden Markov map matching through noise and sparseness. In Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems, pages 336–343. ACM, 2009.
- [7] Saahil Shenoy and Dimitry Gorinevsky. Risk adjusted forecasting of electric power load. In American Control Conference (ACC), 2014, pages 914–919. IEEE, 2014.
- [8] Caston Sigauke, Andréhette Verster, and Delson Chikobvu. Extreme daily increases in peak electricity demand: Tail-quantile estimation. Energy Policy, 53:90–96, 2013.
- [9] Richard L Smith. Measuring risk with extreme value theory. Risk management: value at risk and beyond, 224, 2002.
- [10] Tessa Tielert, Moritz Killat, Hannes Hartenstein, Raphael Luz, Stefan Hausberger, and Thomas Benz. The impact of traffic-light-to-vehicle communication on fuel consumption and emissions. In Internet of Things (IOT), 2010, pages 1–8. IEEE, 2010.
- [11] Andreas Weber and Andreas Winckler. Advanced traffic signal control algorithms, appendix A: Exploratory advanced research project: BMW final report. Technical report, 2013.
- [12] Haitao Xia, Kanok Boriboonsomsin, Friedrich Schweizer, Andreas Winckler, Kun Zhou, Wei-Bin Zhang, and Matthew Barth. Field operational testing of eco-approach technology at a fixed-time signalized intersection. In Intelligent Transportation Systems (ITSC), 2012 15th International IEEE Conference on, pages 188–193. IEEE, 2012.