An adaptive cruise control (ACC) is a control system that allows vehicles to maintain a desired speed until it finds a slow-moving front vehicle based on radar, lasers, or cameras, after which ACC allows vehicles to keep a desired headway distance to the front vehicle by adjusting acceleration and deceleration vahidi2003research. More and more vehicles are being equipped with ACC manolis2020real. The market penetration rate of ACC is expected to increase continuously makridis2020empirical. ACC is becoming standard equipment in many recently available commercial vehicles.
While ACC is primarily designed for improving driving comfort, numerous studies have examined the effect of ACC on traffic flow efficiency, focusing especially on how the headway distance influences traffic flow davis2004effect. These works are based on either microscopic simulation ntousakis2015microscopic; bayar2016impact or macroscopic modelling delis2016simulation; nikolos2015macroscopic. The general consensus is that ACC has strong potential to improve traffic flow, although poorly configured ACC settings may degrade it delis2016simulation. However, these works are limited to analyzing the effect of ACC on traffic flow and do not present a solution for configuring the ACC setting optimally in response to traffic conditions to maximize traffic flow.
There are some early efforts that optimize the ACC setting to maximize traffic flow mba2016evaluation; liu2017fine. However, these works are based on off-line optimization, which does not cope well with dynamically changing traffic conditions. A few works address this limitation by adjusting the ACC setting adaptively in real time according to dynamic traffic conditions schakel2014improving; bekiaris2019feedback; goni2019using. Specifically, Schakel and van Arem schakel2014improving develop an in-car advisory system that changes the headway distance based on a traffic state prediction model (i.e., the extended generalized Treiber–Helbing filter (EGTF)) van2010robust. This model distinguishes between free flow and congestion based on speed data collected from a detector and aggregated over 1 min. Goni-Ros et al. develop an adaptive ACC that adapts the ACC setting based on the speed of preceding vehicles goni2019using. Bekiaris-Liberis and Delis propose a similar approach that adjusts the ACC setting in real time based on the average speed and vehicle density bekiaris2019feedback. Spiliopoulou et al. design a threshold-based adaptive ACC that adjusts the ACC setting of a connected vehicle based on a threshold of traffic flow measured using the speed of surrounding vehicles spiliopoulou2018adaptive. In line with this research, the most recent work at the time of writing is manolis2020real. Essentially, it extends spiliopoulou2018adaptive by also changing the acceleration bounds, considering that recent vehicles allow drivers to choose the acceleration strength, as inspired by Yuan et al. yuan2016capacity. However, these works take into account only the traffic conditions of the main road in determining the optimal ACC setting. As such, they fail to work properly for highways with on-ramps, where merging traffic is one of the major causes of traffic perturbations gupta2006phase.
A different approach of adaptively adjusting the ACC setting needs to be developed for highways with on-ramps that takes into account the dynamically changing traffic conditions of the merging lane van2017impacts.
In this paper, we demonstrate that a state-of-the-art real-time ACC manolis2020real performs poorly in terms of improving the traffic flow of highway segments with on-ramps, and we propose the Dynamic Adaptive Cruise Control (D-ACC) system, which allows vehicles to make an informed decision about the optimal ACC setting adaptively in response to dynamically changing traffic conditions of both the main road and the merging lane. In particular, to enable vehicles to adapt the ACC setting more effectively and in a fine-grained manner according to dynamically changing traffic conditions, we adopt deep reinforcement learning (RL), which is known to provide effective decision making in such complex environments, especially for autonomous vehicles chen2017end. We formulate the problem of determining the ACC setting, focusing on the headway distance, given the information about traffic conditions received via vehicle-to-everything communication (V2X), as a Markov Decision Process (MDP) sutton2018reinforcement, and we solve the problem by designing a deep Q-network mnih2013playing to address the challenge of dynamic adaptation in complex environments represented by a large and continuous state space. Extensive simulations are conducted with a combination of a microscopic road traffic simulator, SUMO fernandes2010platooning, and a V2X network simulator, Veins sommer2010bidirectionally, to train the deep Q-network and evaluate the performance of D-ACC. We demonstrate that D-ACC improves the average speed by up to 70% compared to a state-of-the-art real-time ACC system manolis2020real. The contributions of this paper are summarized as follows.
We demonstrate that existing real-time ACC systems perform poorly for highway segments with on-ramps due to not taking into account the dynamic traffic conditions of the merging lane.
We formulate the problem of adaptively configuring the ACC setting in response to dynamically changing traffic conditions of both the main road and the merging lane as an MDP.
We design a deep Q-network to allow vehicles to make informed decisions on adjusting the ACC setting more effectively and in a more fine-grained manner.
Extensive simulations are conducted to demonstrate that D-ACC outperforms a state-of-the-art real-time ACC system under different scenarios with varying penetration rates, merging traffic density, and lane-changing behaviors.
This paper is organized as follows. In Section 2, we conduct a motivational study to demonstrate that a state-of-the-art real-time ACC system performs poorly because it does not appropriately account for the dynamics of merging traffic. We then present the details of the proposed D-ACC, including the design of an MDP framework and our deep Q-network for optimizing traffic flow, in Section 3. The simulation setting and results are presented in Section 4. Finally, we conclude in Section 5.
In this section, we conduct a motivational study to establish the need for a new approach to ACC systems for highways with on-ramps by demonstrating that simple model-based solutions do not cope well with the dynamics of merging traffic. This study is conducted using SUMO fernandes2010platooning and Veins sommer2010bidirectionally. In particular, Veins, a framework for vehicular network simulation based on OMNeT++ varga2008overview, is integrated with SUMO to simulate V2X for vehicles to collect traffic information. The parameters used for the simulation are summarized in Table 2. We create a highway segment with an on-ramp where vehicles equipped with ACC drive on the main road while maintaining a headway distance to preceding vehicles that is determined based on the traffic conditions of the main road manolis2020real. Figure 1 illustrates that, in this scenario, vehicles on the main lane interfere with the merging traffic due to the small headway distance suggested by a state-of-the-art ACC system manolis2020real.
To understand why existing real-time ACC systems, which determine the headway distance from the traffic conditions of the main road only, fail to deal with merging traffic effectively, we vary the headway distance and measure the average fuel consumption and average vehicle delay in scenarios with and without merging traffic. Figure 2 displays the results. When there is no merging traffic, a small headway distance improves fuel efficiency and delay because more vehicles are allowed to pass the highway segment. This agrees with the results of most existing real-time ACC systems, which rely on the strong correlation between the traffic conditions of the main road and the headway distance spiliopoulou2018adaptive; manolis2020real.
However, these simple model-based ACC systems perform poorly when there is merging traffic. Specifically, in the given scenario with merging traffic, we demonstrate that a small headway distance determined based only on the correlation between the traffic conditions of the main road and the headway distance actually degrades the fuel efficiency and delay significantly. The reason is that merging vehicles have difficulty finding a space to change lanes due to the small headway distance of the vehicles in the main lane, and each lane change forces the vehicles in the main lane to brake, leading to higher fuel consumption and delay. This explains why both the fuel efficiency and delay improve as the headway distance increases, as shown in Figure 2. An interesting observation is that if the headway distance is increased too much, the fuel efficiency and delay start to degrade, i.e., there is a “sweet spot” for the headway distance that maximizes traffic flow, as indicated by stars in Figure 2. We also observe that this sweet spot changes when the traffic conditions of the merging lane change over time. This motivational study suggests that a new ACC system should be developed that automatically adapts the ACC settings by taking into account the dynamically changing traffic conditions of the merging lane for highway segments with on-ramps to optimize traffic flow.
3 Proposed Approach
This section presents the details of D-ACC. Specifically, in Section 3.1, we present an overview of D-ACC. We then formulate the problem of determining the headway distance to maximize traffic flow as a Markov Decision Process (MDP) in Section 3.2. In Section 3.3, we present the design of a deep Q-network to solve the problem more effectively considering the large and continuous state space for MDP in our problem.
D-ACC is designed for adapting the headway distance to maximize traffic flow with a special emphasis on supporting the use of ACC for highways with merging traffic. D-ACC targets vehicles equipped with necessary devices to enable ACC and V2X. Although V2X technology has yet to be widely adopted, D-ACC can be implemented with a smartphone that communicates with a roadside unit (or a remote traffic server via a cellular network) to collect real-time traffic information needed to determine headway distance. This smartphone can be connected to the vehicle’s default ACC system via the OBD port (e.g., Comma.AI santana2016learning) to control the headway distance generated by D-ACC, thereby enabling easy adoption of the technology.
A vehicle equipped with D-ACC, as it approaches a highway segment with an on-ramp, starts to communicate via V2X with a roadside unit (RSU), such as those already available in recent intelligent transportation systems (ITS) duret2020hierarchical. The RSU is used to collect real-time traffic information and broadcast the information to approaching vehicles. In this work, we utilize (1) traffic density of the main lanes, (2) average vehicle speed of the main lanes, (3) traffic density of the merging lane, (4) average vehicle speed of the merging lane, and (5) the length of the acceleration lane. A vehicle receives this information wirelessly and runs D-ACC to determine the optimal headway distance to maximize traffic flow.
3.2 Markov Decision Process (MDP) Framework
A core functionality of D-ACC is to determine the headway distance by taking into account the traffic conditions of both the main road and the merging lane. A Markov decision process (MDP) sutton2018reinforcement forms the basis for modeling the decision-making behavior of D-ACC. We formulate the dynamic decision-making process of D-ACC as an MDP in which D-ACC-equipped vehicles take actions that adjust the headway distance in response to dynamically changing traffic conditions. As such, the design of the MDP is the first step towards the development of D-ACC.
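We represent the MDP as a 4-tuple $(\mathcal{S}, \mathcal{A}, P, R)$, where $\mathcal{S}$ is a set of states; $\mathcal{A}$ is a set of actions; $P(s' \mid s, a)$ is the probability of the next state $s'$ given action $a$ and the current state $s$; and $R$ is the reward function, i.e., $R(s, a, s')$ is the reward for transitioning from a state $s$ to a new state $s'$ due to an action $a$. In particular, $\pi_t(a \mid s)$ is a policy at time $t$ that represents the probability of taking an action $a$ given state $s$. The objective is to find the optimal policy $\pi^*$ such that the cumulative sum of the expected reward
$$G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}, \qquad r_{t+k} = R(s_{t+k}, a_{t+k}, s_{t+k+1}),$$
is maximized over the long run, i.e., $\pi^* = \arg\max_{\pi} \mathbb{E}_{\pi}\left[G_t\right]$, where $\gamma$ is a discount factor.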
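As an illustration of the objective above, the following minimal sketch computes the discounted cumulative reward of one finite episode. The function name `discounted_return` and the example reward values are hypothetical and are not taken from the D-ACC implementation; only the discount factor symbol follows the MDP definition.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_k gamma**k * rewards[k] for one finite episode.

    `rewards` is a hypothetical list of per-step rewards; `gamma` is the
    discount factor of the MDP.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last step backwards so that every earlier step
    # applies one additional factor of gamma to the rewards that follow it.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

For example, with rewards `[1.0, 1.0, 1.0]` and a discount factor of 0.5, the return is 1 + 0.5 + 0.25 = 1.75. A policy is preferred when it yields a larger expected value of this quantity.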
3.2.1 State Space
Motivated by the work of Daamen et al. daamen2010empirical, we design the state space of the MDP to include the following traffic parameters: (a) traffic density of the main road, (b) traffic density of the merging lane, (c) average vehicle speed of the main road, (d) average vehicle speed of the merging lane, and (e) length of the acceleration lane. Specifically, Daamen et al. note that the traffic densities of both the main road and the merging lane influence merging behavior daamen2010empirical. For example, if the traffic density of the merging lane is high, then a larger headway distance is needed to allow vehicles to join quickly so that more vehicles can merge into the main lane. They also note that the vehicle speed, in particular the speed difference between the vehicles on the merging lane and the vehicles on the main lane, influences the efficiency of merging behavior daamen2010empirical. For example, if the vehicles on the merging lane are relatively faster, then a larger gap has to be created to ensure a safer lane change. Finally, the length of the acceleration lane is incorporated into our state space because a longer acceleration lane gives vehicles more time to complete merging.
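The five state variables can be packed into a fixed-length vector before being passed to a learning algorithm. The sketch below shows one possible encoding. The class name `TrafficState`, the units, and the normalization constants are assumptions made for illustration; the text does not specify how the state is scaled.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical normalization constants (not given in the text).
MAX_DENSITY_VEH_PER_KM = 150.0
MAX_SPEED_M_PER_S = 40.0
MAX_ACCEL_LANE_M = 500.0


def _clip01(value: float) -> float:
    """Clamp a normalized value into [0, 1]."""
    return min(max(value, 0.0), 1.0)


@dataclass(frozen=True)
class TrafficState:
    """The five traffic parameters (a)-(e) that make up one state."""

    main_density: float       # (a) veh/km on the main road
    merge_density: float      # (b) veh/km on the merging lane
    main_speed: float         # (c) average speed on the main road, m/s
    merge_speed: float        # (d) average speed on the merging lane, m/s
    accel_lane_length: float  # (e) length of the acceleration lane, m

    def to_vector(self) -> List[float]:
        """Return the state as five values scaled into [0, 1]."""
        return [
            _clip01(self.main_density / MAX_DENSITY_VEH_PER_KM),
            _clip01(self.merge_density / MAX_DENSITY_VEH_PER_KM),
            _clip01(self.main_speed / MAX_SPEED_M_PER_S),
            _clip01(self.merge_speed / MAX_SPEED_M_PER_S),
            _clip01(self.accel_lane_length / MAX_ACCEL_LANE_M),
        ]
```

Scaling every component to a common range is a common choice for neural network inputs, since density, speed, and lane length otherwise differ by orders of magnitude.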
3.2.2 Action Space
We design the action space for vehicles on the main lane. While this action space can be easily extended, in this work we focus on the most important ACC setting, the headway distance. Specifically, an action is defined as setting the headway distance to a certain value from the range of headway distances that a vehicle supports. An important aspect of D-ACC is that vehicles are advised to perform an action before they reach an on-ramp. This is motivated by recent research goni2019using demonstrating that creating a gap to the preceding vehicle before a vehicle reaches a congested (or potentially congested) area makes the following vehicles decelerate, consequently reducing the inflow into the congested area and preventing the congestion from worsening.
In this paper, we allow an ego vehicle to take an action, but we note that the decision can also be made by an RSU, which can leverage more powerful computing resources and broadcast the decision to approaching vehicles via V2X. An important aspect of D-ACC is that it allows vehicles to keep monitoring dynamically changing traffic conditions and to update their actions adaptively in order to maximize traffic flow.
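Because a Q-network outputs one value per action, the supported headway range has to be represented as a finite set of candidate values. The following sketch discretizes the range into evenly spaced levels. The helper name `build_headway_actions`, the upper bound of 52.5 m, and the number of levels are hypothetical; only the 2.5 m lower bound corresponds to the minimum headway used in the simulation setup.

```python
from typing import List


def build_headway_actions(min_headway_m: float,
                          max_headway_m: float,
                          num_actions: int) -> List[float]:
    """Return `num_actions` evenly spaced headway distances in metres.

    Each list index is one discrete action; the value at that index is
    the headway distance the vehicle would be asked to maintain.
    """
    if num_actions < 2:
        raise ValueError("need at least two actions")
    if max_headway_m <= min_headway_m:
        raise ValueError("max_headway_m must exceed min_headway_m")
    step = (max_headway_m - min_headway_m) / (num_actions - 1)
    return [min_headway_m + i * step for i in range(num_actions)]


# Example with assumed bounds: 11 levels between 2.5 m and 52.5 m.
HEADWAY_ACTIONS = build_headway_actions(2.5, 52.5, 11)
```

A finer grid gives more precise control at the cost of a larger output layer and slower learning, so the number of levels is a design choice rather than a fixed property of the approach.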
3.2.3 Reward Function
We design the reward function of the MDP such that the traffic flow is maximized. To this end, the reward function is designed to decrease the average vehicle delay required to pass a given highway segment. The reward is therefore computed from the information received from an RSU via V2X, including the traffic congestion speed and the length of the highway segment that the vehicle is expected to pass. Specifically, the reward function is defined as follows.
3.3 Deep Q-Network
Having formulated our problem of determining the headway distance to maximize traffic flow, this section explains how we solve the problem. The optimal action-value function $Q^*(s, a)$ follows the Bellman optimality equation:
$$Q^*(s, a) = \mathbb{E}_{s'}\left[ R(s, a, s') + \gamma \max_{a'} Q^*(s', a') \right],$$
which allows us to determine the optimal policy, i.e., to determine the actions that optimize the reward function at each state sutton2018reinforcement. The Q-function can be easily represented in tabular form when the state space is discrete and finite. However, if the state space is large and continuous, the tabular form quickly becomes intractable mnih2013playing. Since the state space of D-ACC is large and continuous, covering wide ranges of vehicle density, speed, length of acceleration lane, etc., we adopt the deep Q-network (DQN) algorithm mnih2013playing to address this challenge. The deep Q-network uses a neural network to approximate the optimal value function, where the input to the neural network is the state and the output is the Q-value for each action. This technique is well suited to our application, which requires dynamic adaptation in complex environments represented by a large and continuous state space. The key idea is that the Q-function is now approximated using a neural network al2019deeppool. Specifically, the approximate value function of the deep Q-network is denoted by $Q(s, a; \theta_i)$, where $\theta_i$ are the weights of the Q-network at the $i$-th iteration. The network is trained to update the weights $\theta_i$ with the loss function
$$L_i(\theta_i) = \mathbb{E}_{s, a, s'}\left[ \left( R(s, a, s') + \gamma \max_{a'} Q(s', a'; \theta_i^{-}) - Q(s, a; \theta_i) \right)^2 \right],$$
where $\theta_i^{-}$ are the parameters of the target network at the $i$-th iteration, which are updated based on the weights of the Q-network $\theta_i$.
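The loss above compares the current Q-value estimate with a bootstrapped target built from the observed reward and the target network's estimate for the next state. A minimal sketch of that computation for a single transition is given below. The function names and the `done` flag for terminal transitions are assumptions introduced for illustration and do not appear in the text.

```python
from typing import Sequence


def td_target(reward: float,
              next_q_values: Sequence[float],
              gamma: float,
              done: bool = False) -> float:
    """Return reward + gamma * max_a' Q(s', a') for one transition.

    `next_q_values` holds the target network's Q-value for every action
    in the next state. If the episode ended (`done`, an assumed flag),
    there is no next state to bootstrap from.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_value: float, target: float) -> float:
    """Return the squared difference between the target and Q(s, a)."""
    return (target - q_value) ** 2


def mean_squared_loss(q_values: Sequence[float],
                      targets: Sequence[float]) -> float:
    """Average the squared TD error over a mini-batch of transitions."""
    if len(q_values) != len(targets) or not q_values:
        raise ValueError("batches must be non-empty and of equal length")
    errors = [squared_td_error(q, t) for q, t in zip(q_values, targets)]
    return sum(errors) / len(errors)
```

In a full training loop the mini-batch would be sampled from the replay buffer and the gradient of this loss would be taken with respect to the Q-network weights only, while the target network weights are held fixed between periodic updates.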
Table 1: Deep Q-network parameters.
| Parameter | Value |
|---|---|
| Neural network architecture | 4 hidden layers of size 8, 12, 20, and 16, respectively |
| Activation functions | Rectified linear units (ReLU) |
| Replay buffer size | 20k samples |
| ($\gamma$, $\epsilon$, $\epsilon_{\min}$, $\epsilon_{\text{decay}}$) | (0.95, 1.0, 0.001, 0.9995) |
| Loss function | Mean squared error (MSE) |
| Optimization method | Stochastic gradient descent (SGD) with learning rate 0.001 |
| Target network update frequency | 1k episodes |
For our deep Q-network, we design a neural network consisting of 4 hidden layers of size 8, 12, 20, and 16, respectively, with rectified linear unit (ReLU) activation functions. The other parameters of our deep Q-network, chosen through trial and error, are summarized in Table 1. In particular, to balance exploration and exploitation, we adopt an $\epsilon$-greedy policy wunder2010classes in training our deep Q-network. Specifically, with probability $\epsilon$, an action is randomly selected from the action space; and with probability $1-\epsilon$, the optimal action is selected based on the greedy method. We allow the value of $\epsilon$ to decrease gradually as the algorithm iterates, i.e., $\epsilon \leftarrow \epsilon \cdot \epsilon_{\text{decay}}$ until a lower bound $\epsilon_{\min}$ is reached. The parameters for the $\epsilon$-greedy policy are also summarized in Table 1. Figure 3 displays the reward as a function of the number of episodes up to 1K, demonstrating the fast convergence of the reward function.
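The exploration strategy described above can be sketched in a few lines. The function names are hypothetical, and the default values 0.9995 and 0.001 are taken from Table 1 under the assumption that they denote a multiplicative decay rate and a lower bound for the exploration probability, respectively.

```python
import random
from typing import Sequence


def select_action(q_values: Sequence[float],
                  epsilon: float,
                  rng: random.Random) -> int:
    """Pick an action index with an epsilon-greedy rule.

    With probability `epsilon` a uniformly random action is explored;
    otherwise the action with the largest Q-value is exploited.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon: float,
                  decay: float = 0.9995,
                  floor: float = 0.001) -> float:
    """Shrink epsilon multiplicatively, never going below `floor`.

    The defaults are an assumed reading of the Table 1 values.
    """
    return max(floor, epsilon * decay)


# Usage: start fully exploratory and decay once per training step.
rng = random.Random(0)
epsilon = 1.0
for _ in range(5):
    action = select_action([0.1, 0.9, 0.3], epsilon, rng)
    epsilon = decay_epsilon(epsilon)
```

Under these assumed values, the exploration probability falls from 1.0 to the floor of 0.001 after roughly 14,000 decay steps, so early training is dominated by exploration and later training by the learned policy.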
4 Simulation Results
4.1 Simulation Setup
Table 2: Simulation parameters.
| Vehicle parameter | Value | Traffic parameter | Value |
|---|---|---|---|
| Action interval | 1 s | Traffic volume (main lane) | 2057 veh/h |
| Vehicle length | 4 m | Traffic volume (merging lane) | 200-900 veh/h |
| Min headway | 2.5 m | Highway length | 1.5 km |
| Lane change model | LC2013 erdmann2015sumo | Merging lane length | 360 m |
| Max acceleration | 2.6 m/s² | Acceleration lane length | 180 m |
| Max deceleration | 4.5 m/s² | | |
We conduct simulations to evaluate the performance of D-ACC using a traffic simulator (SUMO) integrated with a network simulator (Veins). Simulations are executed on a PC equipped with a 1.4 GHz quad-core Intel Core i5 CPU and 8 GB of RAM running macOS. Deep reinforcement learning for D-ACC is implemented in Python using Keras and TensorFlow abadi2016tensorflow, and is interfaced with SUMO via the Traffic Control Interface (TraCI) wegener2008traci. We consider a highway segment with an on-ramp. The lengths of the highway segment, ramp, and acceleration lane are 1.5 km, 360 m, and 180 m, respectively. Vehicles on the main road are generated at a rate of 2,057 veh/h, and the rate for the merging lane is varied from 200 to 900 veh/h to evaluate the effect of the merging traffic density. Each vehicle has a length of 4 m and performs lane changes according to a lane-changing model erdmann2015sumo. Each vehicle updates the headway distance every second in response to dynamically changing traffic conditions; however, the headway is not decreased below 2.5 m for safety. In increasing or decreasing the headway distance, an acceleration of up to 2.6 m/s² and a deceleration of up to 4.5 m/s² are used. All traffic parameters used for this simulation study are summarized in Table 2.
The main metric measured in this simulation study is the average vehicle speed, which represents the traffic flow given the fixed length of the highway segment. We vary several parameters: the penetration rate (i.e., the fraction of all vehicles that are equipped with D-ACC or a state-of-the-art real-time ACC), the traffic density of the merging lane, and the lane-changing behavior.
4.2 Effect of Penetration Rate
We measure the average vehicle speed for both D-ACC and Manolis et al. manolis2020real while varying the penetration rate to understand its effect on performance. The results are depicted in Figure 4. Overall, the results demonstrate that higher penetration rates result in better traffic flow for D-ACC. Interestingly, the opposite is observed for Manolis et al.: with higher penetration rates, more vehicles try to maintain a smaller headway distance in response to the increased traffic on the main road caused by the merging traffic, which makes it more difficult for merging vehicles to change lanes. Another interesting observation is that D-ACC improves traffic flow by 21% compared with the state-of-the-art real-time ACC system even with a very small penetration rate of 5%. More significant improvement in traffic flow is observed for higher penetration rates, i.e., over 30%. Specifically, D-ACC performs 70% better than Manolis et al. when the penetration rate is 60%. In other words, the average speed for D-ACC is increased by 11 km/h compared to Manolis et al. when the penetration rate is increased from 5% to 60%. The reason is simply that more D-ACC-equipped vehicles create larger gaps that allow merging vehicles to change lanes more easily.
4.3 Effect of Merging Traffic Density
In this section, we evaluate how effectively D-ACC copes with varying traffic conditions of the merging lane. Specifically, we vary the number of vehicles injected into the merging lane per hour and measure the average speed of vehicles on the main lane with different penetration rates. The results are depicted in Figure 5. It is shown that as the merging traffic increases, the average speed for both D-ACC and Manolis et al. decreases regardless of the penetration rate. An interesting observation is that D-ACC is more robust against merging traffic which is demonstrated by the increasing gap between the average speed of D-ACC and that of Manolis et al. as the merging traffic increases for all penetration rates. Most notably, the gap increases by 210% as the merging traffic increases from 400 veh/h to 800 veh/h, when the penetration rate is 60%. It is also observed that the benefits of using D-ACC increase as both the merging traffic and penetration rate increase. Specifically, the average speed difference between D-ACC and Manolis et al. for PR=60% with merging traffic of 800 veh/h is about 140% greater compared with that for PR=5% with 800 veh/h. Overall, in all penetration rates, D-ACC outperforms the state-of-the-art real-time ACC system regardless of the degree of merging traffic, and more significantly for higher merging traffic.
4.4 Effect of Lane Change Behavior
In this section, we analyze the effect of the lane-changing behavior of merging vehicles on the performance of D-ACC. For this simulation study, we adopt the default lane-changing model of SUMO erdmann2015sumo. In this lane-changing model, there is a parameter called “lcAssertive” which represents the willingness of a driver to accept lower front and rear gaps on the target lane when executing a lane change berrazouane2019analysis. This parameter is defined in the range between 0 and 1 (with the default value of 1 for SUMO), where a greater number means more aggressive lane-changing behavior.
We measure the average speed by varying the value of “lcAssertive” while fixing the penetration rate and the merging traffic density to 60% and 800 veh/h, respectively. The results are depicted in Figure 6. As shown, D-ACC performs better for larger “lcAssertive”, i.e., more aggressive lane-changing behavior. The reason is that a larger value of “lcAssertive” allows vehicles to change lanes quickly whenever there is a space available to join, thereby giving following vehicles on the merging lane more time to make a lane-change decision. On the other hand, if a vehicle misses a chance to merge quickly due to a small value of “lcAssertive”, i.e., less aggressive lane-changing behavior, the vehicle needs to slow down as it approaches the end of the acceleration lane. Consequently, the vehicle changes lanes at a very low speed, which greatly perturbs the traffic on the main lane and degrades the average vehicle speed.
We have presented a dynamic adaptive cruise control system (D-ACC) that is designed to adjust the headway distance adaptively based on reinforcement learning in response to dynamically changing traffic conditions for highways with on-ramps. We demonstrated that D-ACC improves traffic flow significantly even with a small penetration rate compared with a state-of-the-art ACC system especially for highways with merging traffic. The broader impact of D-ACC is noteworthy due to its immediate applicability to real-world vehicles by a simple firmware update for existing ACC systems and availability of V2X based on a smartphone which can be connected to the vehicle’s ACC system via the OBD port. Our future work is to implement the system for an actual vehicle platform and evaluate the performance in real-world traffic environments.
6 Broader Impact
Traffic congestion is a serious social problem in many countries. More than 5.5 billion hours and 2.9 billion gallons of fuel are wasted every year due to congestion highwayStat. Traffic congestion on a highway is often induced by heavy merging traffic, especially during commute hours. D-ACC, a real-time ACC system specifically designed for a highway with an on-ramp, has strong potential to reduce such traffic congestion even with a very small penetration rate of 5%. Such a penetration rate can be easily achieved considering that there are a large number of electric vehicles equipped with LTE, such as Tesla vehicles, for which a simple software update should be sufficient to install D-ACC. Even traditional vehicles will be able to utilize D-ACC by connecting a smartphone with a “D-ACC” app to the standard ACC system via the OBD port (e.g., Comma.AI santana2016learning). At the same time, however, this work may have some negative consequences in terms of efficiency because of the challenge of fully understanding and modelling the intention of merging vehicles, e.g., some cars on a merging lane may not change lanes at all and just exit the highway.