Learning to Help Emergency Vehicles Arrive Faster: A Cooperative Vehicle-Road Scheduling Approach

02/20/2022
by   Lige Ding, et al.
MIT

The ever-increasing heavy traffic congestion potentially impedes the accessibility of emergency vehicles (EVs), resulting in detrimental impacts on critical services and even safety of people's lives. Hence, it is significant to propose an efficient scheduling approach to help EVs arrive faster. Existing vehicle-centric scheduling approaches aim to recommend the optimal paths for EVs based on the current traffic status while the road-centric scheduling approaches aim to improve the traffic condition and assign a higher priority for EVs to pass an intersection. With the intuition that real-time vehicle-road information interaction and strategy coordination can bring more benefits, we propose LEVID, a LEarning-based cooperative VehIcle-roaD scheduling approach including a real-time route planning module and a collaborative traffic signal control module, which interact with each other and make decisions iteratively. The real-time route planning module adapts the artificial potential field method to address the real-time changes of traffic signals and avoid falling into a local optimum. The collaborative traffic signal control module leverages a graph attention reinforcement learning framework to extract the latent features of different intersections and abstract their interplay to learn cooperative policies. Extensive experiments based on multiple real-world datasets show that our approach outperforms the state-of-the-art baselines.

1 Introduction

With the continual growth of population and vehicles in cities, we have been facing increasingly serious traffic congestion. Heavy traffic congestion not only causes extra air pollution and energy/time waste, but also potentially impedes the accessibility of Emergency Vehicles (EVs), such as ambulances, fire engines and police cars, when facing unexpected accidents, resulting in detrimental impacts on critical services and even safety of people’s lives. In medical emergencies such as cardiac arrest, every one-minute delay causes mortality rate to increase by and imposes additional $1542 medical cost in USA [30]. The building fires typically grow by per minute, causing an average $4000 of additional damages [30]. Therefore, it is of great significance to design an efficient scheduling approach to help EVs arrive faster, especially in congested traffic conditions.

Different from Ordinary Vehicles (OVs), EVs may be exempted from some conventional road rules, such as driving through an intersection when the traffic light is red, or exceeding the speed limit. Nevertheless, EVs may still be obstructed by numerous OVs on roads with heavy traffic. To address this issue, one research line resorts to vehicle-centric approaches, which aim at scheduling EVs with the best routes using route optimisation techniques such as the A* algorithm [26], Dijkstra’s algorithm [6] and evolution strategy [3]. Some studies on route planning for OVs, which fall into two categories, trajectory-based approaches [46, 8, 13, 44] and cost-centric approaches [15, 47, 22, 29, 42], could also be adapted to address the EV routing problem. However, the vehicle-centric approaches only avoid congested roads passively and fail to proactively improve traffic conditions to shorten the travel time of EVs. Another research line focuses on road-centric approaches [17, 45, 31], which aim at actively improving local traffic conditions to help EVs pass intersections quickly by granting traffic signal priority. However, these approaches rarely consider the dynamic overall traffic condition and the impact of a scheduling strategy on OVs. If we blindly keep the traffic light green for EVs arriving at intersections, the traffic congestion may not be effectively alleviated, and traffic flows in other directions may even be obstructed, which in turn causes a greater negative impact on the overall traffic condition and also on EVs.

Recent years have witnessed the great advance in Cooperative Vehicle-Infrastructure Systems (CVIS), wherein the sensing infrastructure (e.g., cameras, GPS) monitors traffic conditions and vehicles’ locations in real time, and the communication infrastructure enables vehicles and road infrastructure to exchange real-time information [12]. It provides a new opportunity to design a cooperative vehicle-road scheduling approach. Along this research line, we aim to dynamically optimize the route and concomitantly coordinate the traffic signals along the dynamically updated path for better handling the dynamic traffic flow. However, it is a very challenging task as the route planning and traffic signal control have complex interactions as follows:

  • Impact of frequently changing traffic signals on real-time route planning. The real-time route planning needs to consider not only dynamic traffic flows over the road network but also frequently changing traffic signals. Some cost-centric approaches [40, 5, 24] are able to address dynamic traffic flows by a stochastic graph based on historical data. However, the traffic signal changes are so frequent (minute-level) that it is hard to accurately predict the travel time costs of different routes with only historical data.

  • Collaborative traffic signal control based on dynamic routing. Firstly, the dynamic route planning causes different intersections to become upstream and downstream intersections of EVs, which have different influences on EVs according to the kinematic-wave theory [14]. It is significant but difficult to extract the latent features and dynamic influences of these intersections. Secondly, multiple traffic lights should learn to cooperate with each other to balance the traffic demands of both EVs and OVs. The joint optimization may enlarge the scale of the problem and increase the computational complexity. Although extensive studies focus on traffic signal control for OVs, they cannot handle our problem well [32, 41, 36].

To this end, we propose LEVID, a LEarning-based cooperative VehIcle-roaD scheduling approach, consisting of a real-time route planning module and a collaborative traffic signal control module, which influence each other and make decisions iteratively. The real-time route planning module adapts the artificial potential field method to address the real-time changes of traffic signals and avoid falling into a local optimum by considering the long-term cumulative benefit of a route. The traffic signal control module leverages a graph attention reinforcement learning framework, which models the traffic environment as a dynamic directed graph to present the influences of dynamic routes and increase the receptive field of each agent (traffic signal controller). By employing the multi-head attention as relation kernel, this framework is able to extract the latent features of different intersections and abstract their interplay to learn cooperative policies. Meanwhile, the asynchronous parameter-sharing method is adopted to reduce the computational complexity. Specifically, our contributions are three-fold as follows:

  • We investigate the cooperative vehicle-road scheduling paradigm for helping EVs arrive faster. To the best of our knowledge, this is the first work to simultaneously optimize the route planning and traffic signal control in real time (Sect. 2, Sect. 3).

  • We propose the LEVID approach, which considers the long-term cumulative benefit of a dynamically planned route and leverages graph attention reinforcement learning for better cooperation between neighboring intersections (Sect. 4).

  • We evaluate LEVID using both synthetic and real-world datasets from multiple cities. Experimental results demonstrate that our approach greatly reduces the average travel time of both EVs and OVs compared with the state-of-the-art baselines (Sect. 5).

2 Motivation

Fig. 1: The architecture of a CVIS.

In this section, we first introduce the supporting devices and technologies of a CVIS, which provides opportunities for designing effective scheduling approaches. Second, we point out the defects of the existing approaches from two separated perspectives, i.e., traffic signal control and route planning, which motivates us to design a cooperative vehicle-road scheduling approach LEVID for sufficiently leveraging the ability of a CVIS.

CVIS. Fig. 1 shows the architecture of a CVIS, which consists of EVs, road infrastructure and a control center. On the EV side, a GPS module is used to collect real-time locations of an EV; a communication module is used to interact with the road infrastructure via Vehicle-to-Roadside (V2R) and Roadside-to-Vehicle (R2V) communications according to the Dedicated Short Range Communication (DSRC) standard, and also interact with the control center via the cellular communications (e.g., 4G/5G). On the road infrastructure side, traffic cameras and traffic signal controllers have been widely deployed on major roads of many cities. For example, there are over 3,000 major intersections in the urban area of Hefei city, China, of which 1,338 intersections have traffic signal controllers that can be adjusted by the control center, and there are 14,967 traffic cameras deployed at intersections and other locations such as entrances/exits of expressways and key locations along arterial roads, as partly shown in Fig. 2. The trajectories of all the vehicles are recorded when they pass through cameras, and can be extracted by the advanced vehicle identification technologies [33]. The traffic volume can also be obtained by counting the number of vehicles passing through intersections. Finally, the control center can obtain real-time locations of EVs and traffic conditions from the road infrastructure; in turn, it can provide a driving plan to the EV and determine a traffic control strategy to the traffic signal controller.

Fig. 2: Distribution of cameras (denoted by white dots) and traffic signal controllers (denoted by red dots).
Fig. 3: Three candidate routes with the given origin and destination in Hefei city.

Traffic signal control. GreenWave [17] is the most commonly used traffic signal control approach, which allows all the traffic lights in the route to turn green so that EVs can pass intersections continuously along the emergency corridor. The “green wave” is achieved by signal coordination setting. However, if there is a traffic jam on one road segment, the signal offset time between intersections will be changed such that EVs cannot pass intersections continuously. In other words, GreenWave cannot handle a dynamic and heavy traffic flow well. To address this issue, we are working with the traffic police department in Hefei city to improve GreenWave. More specifically, since the real-time locations of an EV are available, we can adjust the traffic signal to turn green automatically whenever the distance between the EV and the intersection is less than a certain threshold. Nevertheless, it is still a non-trivial task to determine a proper threshold. If the threshold is too large, it may cause the vehicles in the opposite direction to be blocked for a long time. Conversely, if the threshold is too small, it may fail to clear the way for the EV at a congested intersection. To analyze this phenomenon, we collect the trajectory data of 5,448 vehicles from traffic cameras during 9-11 a.m. on one working day in a region of Hefei city (Fig. 3). We simulate the movement of an EV through an intersection. When the threshold is set as for an intersection with a low traffic pressure, the traffic flow in the opposite direction has to wait for extra seconds. When the threshold is set as for an intersection with a heavy traffic pressure, the EV is blocked by queuing vehicles at the intersection for about seconds.
This phenomenon motivates us to design a more effective approach from two perspectives: 1) utilize a learning-based traffic signal control strategy instead of a rule-based strategy, and 2) integrate it with a route planning strategy to further reduce the waiting time at intersections with a heavy traffic.

Route planning. To preliminarily demonstrate the importance of route planning, we generate an EV to move along different routes by data-driven simulations. As shown in Fig. 3, given the same origin and destination, if the EV moves along Route 1 with the shortest distance, the travel time is ; if it moves along Route 2 with the least congestion, the travel time would be ; whereas, when we further take the real-time status of traffic signals into account, Route 3 is the best choice with the travel time of , as the phase in the east-west direction is allowed and the left-turn phase is forbidden at Intersection A. This shows the significance of considering both traffic conditions and changes of traffic signals for route planning.

3 Problem Formulation

Definition 1 (Road Network).

The road network is defined as a directed graph , where is the set of nodes (i.e., intersections) and is the set of edges (i.e., road segments). An edge represents a directed road segment from intersection to intersection .

Definition 2 (Route).

A route connects the origin location and the destination location with an ordered sequence of intersections, i.e., , where each pair of consecutive locations corresponds to a road segment .
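Definitions 1 and 2 can be made concrete with a small sketch. The adjacency-dictionary layout, the function names and the toy network below are illustrative assumptions, not the paper's data structures.

```python
from typing import Dict, List

# Assumed representation: intersection id -> {successor id: segment length in metres}.
RoadNetwork = Dict[str, Dict[str, float]]


def is_valid_route(graph: RoadNetwork, route: List[str],
                   origin: str, destination: str) -> bool:
    """Check that a route starts at the origin, ends at the destination,
    and that every consecutive pair of intersections is a directed edge."""
    if not route or route[0] != origin or route[-1] != destination:
        return False
    return all(b in graph.get(a, {}) for a, b in zip(route, route[1:]))


def route_length(graph: RoadNetwork, route: List[str]) -> float:
    """Sum the lengths of the road segments along a valid route."""
    return sum(graph[a][b] for a, b in zip(route, route[1:]))


# Hypothetical toy network with four intersections.
toy: RoadNetwork = {
    "A": {"B": 300.0, "C": 500.0},
    "B": {"D": 400.0},
    "C": {"D": 300.0},
    "D": {},
}
```

With this toy network, `["A", "B", "D"]` is a valid route from A to D, whereas `["A", "D"]` is not, because no directed segment connects A to D.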

Definition 3 (Incoming/Outgoing Lanes and Traffic Movement).

For a specific intersection, we define that (i) a lane where vehicles enter the intersection is called an incoming lane; (ii) a lane where vehicles leave the intersection is called an outgoing lane; (iii) the traffic traveling across the intersection from an incoming lane to an outgoing lane is called a traffic movement, denoted by . Each road segment contains one or multiple lanes. The sets of incoming lanes and outgoing lanes are denoted by and .

Definition 4 (Movement Signal and Phase).

A movement signal is defined based on the corresponding traffic movement . Specifically, indicates that the green light is on for movement , and indicates that the red light is on for movement . A phase is defined as a combination of the legal green movement signals, denoted by , where and .

Fig. 4: Illustration of an intersection with eight mutually exclusive phases. In this case, phase is activated to allow the S-Left and N-left traffic movements.
Fig. 5: Framework of LEVID

Fig. 4 illustrates a typical intersection with twelve incoming lanes and twelve outgoing lanes. Correspondingly, there are eight movement signals (red and green dots around the intersection) for controlling traffic movements: E-Straight (Go Straight from East), W-Straight, S-Straight, N-Straight, E-Left (Turn Left from East), W-Left, S-Left, and N-Left. Four right-turn signals are omitted as the traffic on the right-turn lanes is always allowed in the real world. Furthermore, there are eight mutually exclusive phases, each of which is a combination of two traffic movements. In this example, the phase is activated, indicating that the traffic on the left-turn lanes from south and north is allowed to turn left.
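The relation between movement signals and phases can be sketched as follows. The compatibility rule used here (two movements may share a phase if they lie on the same axis and either come from the same approach or make the same turn) is an assumption about which pairs are conflict-free; the text only states that there are eight phases of two movements each. The 1/0 encoding of green/red is likewise an assumed convention.

```python
from itertools import combinations

APPROACHES = ("E", "W", "S", "N")
TURNS = ("Straight", "Left")
MOVEMENTS = [f"{a}-{t}" for a in APPROACHES for t in TURNS]
AXIS = {"E": "EW", "W": "EW", "S": "NS", "N": "NS"}


def compatible(m1: str, m2: str) -> bool:
    """Assumed rule: same axis, and either same approach or same turn type."""
    a1, t1 = m1.split("-")
    a2, t2 = m2.split("-")
    if AXIS[a1] != AXIS[a2]:
        return False
    return a1 == a2 or t1 == t2


# Every conflict-free pair of movements forms one phase.
PHASES = [frozenset(pair) for pair in combinations(MOVEMENTS, 2)
          if compatible(*pair)]


def movement_signals(phase: frozenset) -> dict:
    """Map each movement to 1 (green) if it belongs to the phase, else 0 (red)."""
    return {m: int(m in phase) for m in MOVEMENTS}
```

Under this rule the enumeration yields exactly eight phases, including the one that allows S-Left and N-Left together, as in Fig. 4.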

Definition 5 (Travel time).

Given a route of an EV, its travel time consists of the driving time on each road segment and the waiting time at each intersection, i.e., the time spent waiting for the queued OVs to pass through the intersection. Note that, although an EV may be exempted from some conventional road rules, such as driving through an intersection when the traffic light is red, or exceeding the speed limit, it may still be obstructed by OVs on roads with heavy traffic. We denote the travel time of one EV by:

(1)

Problem Statement. Given the origin location and the destination location of an EV and the dynamic traffic condition at each time step , a real-time route planning strategy is utilized to determine a driving route . Meanwhile, given the observation , such as vehicle distribution and current traffic signal phase, of each intersection at each time step , a collaborative traffic signal control strategy is utilized to choose a control action (i.e., which phase to set). The objective of this work is expressed as follows:

(2)

4 Design of LEVID

4.1 Framework

As shown in Fig. 5, LEVID contains a real-time route planning module and a collaborative traffic signal control module, which influence each other and make decisions iteratively.

  • Real-time route planning module adapts the artificial potential field method [19] by modeling the estimated travel time as the repulsion and the trend of an EV moving towards the destination as the gravity. Furthermore, it introduces the long-term repulsion to handle the changing traffic lights and avoid falling into a local optimum. At every time interval , the route with the maximum long-term cumulative benefit is selected according to the current traffic signal phases and traffic condition near the EV. Meanwhile, the length of any candidate route is limited to reduce search depth and computational complexity.

  • Traffic signal control module models the traffic environment as a dynamic directed graph and adjusts the relational distance between intersections according to the dynamically updated route and upstream/downstream relationships. The receptive field of each agent contains its top-K relevant neighboring intersections to differentiate valuable local information from global information. The observed features of the top-K relevant intersections are transformed into hidden features with a Multi-Layer Perceptron (MLP). Then the multi-head attention is employed as relation kernel to extract the latent features of different intersections and abstract their interplay to learn cooperative policies. Finally, the long-term impacts of different traffic signal phases are evaluated by the centralized critic model whose parameters are shared by all distributed actors (traffic signal controllers).

4.2 Real-time route planning module

The detailed calculation process of gravity and long-term repulsion is introduced in the following part.

Gravity. The gravity indicates the trend of an EV moving towards the destination. The greater the gravity, the faster the EV can reach the destination. Suppose an EV is arriving at the current intersection , and will go to the final destination intersection . Then the gravity of ’s neighbor to the EV is calculated as:

(3)

where denotes the road network distance between and , denotes the road network distance between and , and is the real-time average traffic speed on the road .

Immediate Repulsion. The immediate repulsion represents the estimated travel time of a candidate route. It contains the driving time of an EV on the road segments and the waiting time at the intersections along the route. Suppose an EV is arriving at the current intersection . Then the immediate repulsion of ’s neighbor to the EV contains time to wait at the intersection and driving time on the road segment , calculated as follows:

(4)

where denotes the length of the queue about to drive from intersection to intersection . is the real-time average traffic speed on the road and denotes the maximum speed for vehicles passing through an intersection allowed by law.
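A minimal sketch of the two quantities follows. Both formulas are assumptions pieced together from the variable descriptions above and are not claimed to reproduce Eq. (3) and Eq. (4): the gravity is taken as the progress towards the destination scaled by the average speed, and the immediate repulsion as queue-clearing time plus driving time. The parameter `segment_length` is a hypothetical name for the length of the road segment.

```python
def gravity(dist_cur_to_dest: float, dist_nbr_to_dest: float,
            avg_speed: float) -> float:
    """Assumed form: distance gained towards the destination, scaled by the
    real-time average speed. Negative when the neighbor leads away."""
    return (dist_cur_to_dest - dist_nbr_to_dest) * avg_speed


def immediate_repulsion(queue_length: float, segment_length: float,
                        avg_speed: float, max_intersection_speed: float) -> float:
    """Assumed form: estimated time (seconds) to wait for the queue to clear
    plus the time to drive the road segment."""
    waiting_time = queue_length / max_intersection_speed
    driving_time = segment_length / avg_speed
    return waiting_time + driving_time
```

For instance, a 40 m queue cleared at 8 m/s followed by a 300 m segment driven at 10 m/s gives `immediate_repulsion(40.0, 300.0, 10.0, 8.0) == 35.0` seconds under these assumptions.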

Long-term Repulsion. Long-term repulsion helps approximate the long-term cumulative benefit of one route. Some routes with less immediate repulsion may guide vehicles towards a congested road segment because they only look a short distance ahead. Therefore, we expand the search depth and calculate the long-term repulsion along different routes with a discount factor as follows:

(5)

where denotes the set of ’s neighbors, and this iterative calculation will stop when the search depth reaches the maximum search depth limit . The repulsion is approximated based on the current traffic condition, which may have changed by the time an EV reaches a road segment far away from the current location. The greater the distance between intersections, the larger the error of the estimated long-term impact. Therefore, a smaller discount factor will be assigned to a farther intersection. We limit the depth of search space and calculate the long-term discounted repulsion according to Eq. (5).

Fig. 6 illustrates the detailed process of route planning. The orange circle denotes the current location of one EV. There are three candidate routes, i.e. -red, -green, -yellow. Based on a specific traffic condition, the gravity of intersection towards EV is as this intersection will lead the EV to move away from the destination. The green numbers show the immediate repulsion of each road segment in route . Taking route as an example, we introduce the discount factor. As the distance increases, the discount factor decreases exponentially. We limit the depth of search space to 4 and set the discount factor as 0.8 in this example to show the detailed process. The complete algorithm is shown in Algorithm 1.

Fig. 6: The example of gravity and repulsion of different intersections. The red number shows the gravity of intersection towards EV. The value is negative as it will lead EV away from destination. The green numbers denote the repulsion while the black numbers show the discount factors.
Input: Current location of EV, destination , the state of each road segment.
Initialize the discount factor , and the depth of search space
for   do
      for   do
            /* Calculate ’s gravity towards EV */
            /* Calculate ’s immediate repulsion */
            /* Calculate ’s long-term repulsion */
            /* Calculate long-term cumulative benefit */
      end for
end for
Output: The route with the maximum long-term cumulative benefit
Algorithm 1 The real-time route planning module
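The steps of Algorithm 1 can be sketched as a depth-limited search. Several choices below are assumptions rather than the paper's exact formulation: the long-term repulsion of a segment is taken as its immediate repulsion plus the discounted repulsion of the best continuation (a minimum over neighbors), the benefit is taken as gravity minus long-term repulsion, and the function returns only the next intersection of the best route. The dictionaries `repulsion`, `gravity` and `neighbors` are hypothetical inputs.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def long_term_repulsion(edge: Edge, repulsion: Dict[Edge, float],
                        neighbors: Dict[str, List[str]],
                        gamma: float, depth: int, dest: str) -> float:
    """Immediate repulsion of `edge` plus the discounted repulsion of the
    cheapest continuation, searched down to `depth` segments."""
    i, j = edge
    r = repulsion[edge]
    if depth <= 1 or j == dest:
        return r
    # Exclude an immediate U-turn back to the previous intersection.
    onward = [k for k in neighbors.get(j, []) if k != i]
    if not onward:
        return r
    return r + gamma * min(
        long_term_repulsion((j, k), repulsion, neighbors, gamma, depth - 1, dest)
        for k in onward
    )


def choose_next(current: str, dest: str, repulsion: Dict[Edge, float],
                gravity: Dict[Edge, float], neighbors: Dict[str, List[str]],
                gamma: float = 0.8, depth: int = 4) -> Optional[str]:
    """Pick the neighbor whose route has the maximum long-term benefit."""
    best, best_benefit = None, float("-inf")
    for j in neighbors[current]:
        edge = (current, j)
        benefit = gravity[edge] - long_term_repulsion(
            edge, repulsion, neighbors, gamma, depth, dest)
        if benefit > best_benefit:
            best, best_benefit = j, benefit
    return best


# Hypothetical example: A->B looks cheap now but leads to a congested B->D.
nbrs = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
rep = {("A", "B"): 10.0, ("A", "C"): 20.0, ("B", "D"): 50.0, ("C", "D"): 5.0}
grav = {("A", "B"): 0.0, ("A", "C"): 0.0}
```

In this toy case a search of depth 1 picks B, the locally cheaper segment, while the default depth of 4 picks C, which illustrates how the long-term repulsion steers the EV away from a local optimum.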

4.3 Traffic signal control module

The control of traffic signals can be formulated as a decentralized partially observable Markov decision process, where each agent chooses its phase action based on local observation at each time interval .

4.3.1 Agent design

The state, action and reward for an agent which controls the signal of one intersection are as follows:

State (Observation). State denotes the traffic conditions of the whole urban environment, while the observation of one agent in multi-agent RL equals the state of its intersection. The observation of one agent at intersection includes the current phase , the number of OVs on each entering lane , the number of OVs on each exiting lane of this intersection and the corresponding number of EVs on each entering lane and exiting lane , which are denoted as and .

Action. At time , each agent chooses different legal available phase set according to the structure of road network and traffic demand. In our problem, we consider four phases (WE-Straight, NS-Straight, WE-Left and NS-Left) for an intersection.

Reward. The traffic light control method should consider both OVs and EVs. Therefore, we design the reward with an evaluation mechanism which considers these two types of traffic demands. We utilize the pressures to help OVs go through intersections more smoothly. As for EVs, they need to pass as soon as possible. Thus we leverage the queue length to measure the benefit of one action. The pressure [35, 37] of a movement for OVs is defined as the difference of OV density between the entering lanes and the exiting lanes. The pressure of intersection for OVs is the sum of absolute pressures over all traffic movements, which can be defined as:

(6)

where is the number of OVs on an entering lane and is the number of OVs on an exiting lane . What’s more, considering the traffic priority of different types of vehicles, we utilize their proportions in the traffic flow to assign the weights in the reward function. Then we define the reward as:

(7)

where is the number of EVs on the entering lanes of intersection and is the proportion of EVs in all vehicles.
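The pressure term can be sketched directly from its description, using per-lane OV counts. The reward combination below is only an assumed illustration: `ev_weight` is a hypothetical parameter standing in for the proportion-based weights, whose exact form in Eq. (7) is not reproduced here.

```python
from typing import Dict, List, Tuple


def intersection_pressure(movements: List[Tuple[str, str]],
                          ov_count: Dict[str, int]) -> int:
    """Sum of absolute differences between the OV counts on the entering
    and exiting lanes of every traffic movement."""
    return sum(abs(ov_count[l_in] - ov_count[l_out])
               for l_in, l_out in movements)


def reward(pressure: float, ev_queue: float, ev_weight: float) -> float:
    """Assumed form: negative weighted sum of OV pressure and EV queue
    length, so that lower pressure and shorter EV queues score higher."""
    return -((1.0 - ev_weight) * pressure + ev_weight * ev_queue)


# Hypothetical lane names and counts.
moves = [("in_e", "out_w"), ("in_s", "out_n")]
counts = {"in_e": 8, "out_w": 3, "in_s": 2, "out_n": 6}
```

Here `intersection_pressure(moves, counts)` evaluates to 9, and `reward(9, 2, 0.5)` evaluates to -5.5.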

4.3.2 Dynamic Directed Graph

The dynamic directed graph helps capture the dynamic impacts of neighboring intersections due to real-time route planning. We construct the road network as a graph in which the weight of each edge is calculated as the corresponding real-time road network distance between two intersections. Then, we get the top- relevant neighboring intersections of intersection based on the dynamic relational distance. Dynamic relational distance helps the current intersection pay more attention to the traffic flow at the upstream intersection when an EV will come from the upstream of the current intersection. Specifically, according to the planned route, the relational distance between upstream intersections and current intersection is set smaller by assigning a relational factor to these intersections. For an edge in the route of EV , the relational distance from intersection to is calculated as:

(8)

where is the road network distance from intersection to and is the relational factor.

Fig. 7: Top-K relevant neighbor intersections based on dynamic relational distance. The black numbers denote the relational distances between current intersection and other intersections.

Fig. 7 illustrates the top- relevant neighboring intersections and the corresponding relational distance to the current intersection based on different routes of the EV. We set the road network distance between adjacent intersections as , the discount factor = and = . Please note that the current intersection itself is also included in the top- neighbors.
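The top-K selection can be sketched with a shortest-path search over relational distances. The sketch assumes that the relational distance of a route edge is its road network distance multiplied by a relational factor smaller than one, and that distances are measured from a neighbor towards the target intersection; the function names and the toy graph are illustrative.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]


def relational_weights(graph: Graph, route: List[str], factor: float) -> Graph:
    """Scale the weight of every edge on the EV route by the relational factor."""
    route_edges = set(zip(route, route[1:]))
    return {u: {v: (w * factor if (u, v) in route_edges else w)
                for v, w in nbrs.items()}
            for u, nbrs in graph.items()}


def top_k_relevant(weights: Graph, target: str, k: int) -> List[str]:
    """Return the k intersections with the smallest relational distance TO
    `target` (the target itself included), via Dijkstra on reversed edges."""
    reverse: Graph = {}
    for u, nbrs in weights.items():
        for v, w in nbrs.items():
            reverse.setdefault(v, {})[u] = w
    dist: Dict[str, float] = {}
    heap = [(0.0, target)]
    while heap and len(dist) < k:
        d, node = heapq.heappop(heap)
        if node in dist:
            continue
        dist[node] = d
        for prev, w in reverse.get(node, {}).items():
            if prev not in dist:
                heapq.heappush(heap, (d + w, prev))
    return list(dist)  # insertion order equals increasing distance


# Hypothetical star-shaped network around target intersection T.
g: Graph = {
    "U": {"T": 100.0}, "X": {"T": 100.0}, "Y": {"T": 100.0},
    "T": {"U": 100.0, "X": 100.0, "Y": 100.0},
}
```

If the EV route enters T from Y, `top_k_relevant(relational_weights(g, ["Y", "T"], 0.5), "T", 2)` returns `["T", "Y"]`, so the upstream intersection on the route is pulled into the receptive field ahead of the equally distant U and X.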

4.3.3 Multi-head attention relation kernel

The -dimensional observation data of intersection are transformed into the -dimensional hidden features via a MLP:

(9)

where and are the weight matrix and bias vector respectively. Then we embed the representation of the current intersection and neighbor from the previous layer to get different types of importance score of one neighbor. Specifically, we utilize the multi-head attention mechanism where attention functions with different linear projections are performed in parallel to jointly attend to a neighbor from different representation subspaces with the following operations:

(10)

where is the index of different representation subspaces and is the importance score of neighbor to current intersection in the subspace . Please note that is usually different from due to dynamically updated route planning. We retrieve the general attention score between neighbors and the current intersection by normalizing the importance score of different neighbors in the same subspace:

(11)

where is the temperature factor and is the top- relevant neighboring intersections of intersection . Finally we model the overall influence of neighbors to the current intersection in different subspaces by combining the hidden feature representations of all the top- relevant neighbors with their respective general attention scores :

(12)

The averaging operation of multi-head attention is one of the most feasible ways to conclude the neighborhood cooperation.
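The attention steps above can be sketched in plain Python. This is a common graph-attention form and is assumed rather than taken from Eq. (10) to Eq. (12): each head scores a neighbor by the dot product of projected target and neighbor features, normalizes the scores with a temperature softmax (here dividing by `tau`), and the head outputs are averaged. The projection matrices `w_t`, `w_s` and `w_c` are hypothetical names.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: Vector, tau: float) -> Vector:
    scaled = [s / tau for s in scores]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attend(h_target: Vector, h_neighbors: List[Vector],
           heads: List[Tuple[Matrix, Matrix, Matrix]],
           tau: float = 1.0) -> Vector:
    """Average, over heads, of the attention-weighted neighbor features."""
    outputs = []
    for w_t, w_s, w_c in heads:
        query = matvec(w_t, h_target)
        scores = [sum(a * b for a, b in zip(query, matvec(w_s, h)))
                  for h in h_neighbors]
        alpha = softmax(scores, tau)
        values = [matvec(w_c, h) for h in h_neighbors]
        outputs.append([sum(a * v[d] for a, v in zip(alpha, values))
                        for d in range(len(values[0]))])
    return [sum(col) / len(outputs) for col in zip(*outputs)]


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
```

With a single identity head, a target feature `[1.0, 0.0]` and neighbors `[[1.0, 0.0], [0.0, 1.0]]`, the neighbor aligned with the target receives the larger attention weight, so the first output component exceeds the second.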

4.3.4 Centralized Critic Model

The key idea of RL is to utilize Bellman equation to estimate the long-term discounted cumulative reward of an action, which is significant for the transportation system with strong spatio-temporal correlations. The long-term impact of a signal control action is defined as follows:

(13)

where is the immediate reward of action based on the observation at intersection . Based on the processed real-time observation information , we leverage the deep RL to estimate the expected reward of the given state-action pair as , which can be calculated as:

(14)

where and are the training parameters, is the number of phases (action space) and represents all the trainable variables in our centralized critic model. The phase action with the maximum long-term reward will be chosen. We optimize our control policy by minimizing the loss function as follows:

(15)

where is the number of time steps, is the number of intersections and is the target value defined as:

(16)
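The training objective can be illustrated with a small sketch. It assumes the standard one-step Bellman target (immediate reward plus the discounted maximum Q-value of the next observation) and a mean squared error averaged over all time steps and intersections; the tuple layout of `batch` is a hypothetical convention.

```python
from typing import List, Tuple

# Each sample: (Q-value of the action taken, immediate reward,
#               Q-values of all phases at the next observation).
Sample = Tuple[float, float, List[float]]


def td_target(reward: float, next_q_values: List[float], gamma: float) -> float:
    """Assumed one-step target: reward plus discounted best next Q-value."""
    return reward + gamma * max(next_q_values)


def critic_loss(batch: List[Sample], gamma: float) -> float:
    """Mean squared error between targets and predicted Q-values over all
    (time step, intersection) samples in the batch."""
    errors = [(td_target(r, next_q, gamma) - q) ** 2
              for q, r, next_q in batch]
    return sum(errors) / len(errors)


def greedy_phase(q_values: List[float]) -> int:
    """Index of the phase with the maximum estimated long-term reward."""
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Because all actors share the critic's parameters, samples from every intersection can be pooled into the same batch before the loss is computed.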
Fig. 8: The spatial distribution of traffic flows of different datasets during the experiment period: (a) Hefei dataset, (b) Jinan dataset, (c) Hangzhou dataset, (d) New York dataset. Each node on the road network represents the traffic flow passing through an intersection. The red, yellow, green and blue colors indicate decreasing traffic volume, in that order.

4.4 Complexity analysis

In this subsection, we analyze the scalability of LEVID_Dy, namely the RL part of LEVID. Specifically, we suppose our model gets the -dimensional input data and each layer has neurons; the scale of traffic signal phase space is . The time and space complexities are analyzed based on the following assumptions: (a) all the distributed actors leverage the centralized critic model to predict the long-term discounted reward of a traffic signal action; (b) each target intersection gets the top- relevant neighbors based on a breadth first search with a total search number of 2, as excessive search range may cause unnecessary computational consumption; (c) the multi-head attentions are computed independently with the same time consumption as that of single-head attention, and the embedding process of either source or target intersection can be executed simultaneously; (d) all the actors can execute the prediction process independently. Then the time complexity in each component is: (a) top- search: ; (b) MLP: ; (c) Graph Attentional layers: ; (d) Q-value Prediction layer: . And the total time complexity is , which is approximately equal to .

As for space complexity, the sizes of the weight matrices and bias vectors in each component are as follows: (a) top-K search: ; (b) Observation Embedding layer: ; (c) Graph Attention layers: ; (d) Q-value Prediction layer: . Then the total number of parameters to store is . Normally, the size of the hidden layer is far greater than that of the data dimension and state space . Therefore, the space complexity of LEVID_Dy is approximately equal to . For a method with separate RL models (without parameter sharing) to control traffic signals in intersections, the space complexity is approximately equal to .

5 Performance Evaluation

We conduct experiments with an open-source traffic simulator called CityFlow [49]. After the traffic trajectory data with specific route and start time are fed into the simulator, each OV moves towards its destination according to the environmental setting and the phase of traffic lights. The simulator provides states to a traffic signal control strategy and performs traffic signal actions from the control strategy. Meanwhile, we add a route planning module to the simulator, which controls an EV towards its destination along a dynamic route.

5.1 Setting

Both synthetic and real-world datasets are utilized to evaluate the effectiveness and efficiency of different approaches. One synthetic dataset is used to generate uniform traffic flows to test the performance of various approaches in steady traffic conditions. Four real-world traffic flow datasets are collected from four cities for evaluations on realistic and dynamic traffic conditions, and the road networks are imported to the simulator from OpenStreetMap (https://www.openstreetmap.org). We randomly select some vehicles in the traffic flows as EVs whose routes can be dynamically changed by a route planning module, and the rest of the vehicles still follow their original routes. Indeed, EVs usually account for a very low proportion of the overall traffic flows in the real world. Nevertheless, if the proportion of EVs is set too small, there will be only a small number of transition experiences of EVs interacting with the environment, which will cause sparse rewards in the training of an RL model. Therefore, we set the proportion of EVs as for model training, which not only generates the necessary interaction experiences but also simulates conflicting situations of multiple EVs at the same intersection. For model testing, the proportion of EVs is set according to real-world conditions (around ‰). Table I lists the statistics of different datasets. Fig. 8 further shows the spatio-temporal distribution of traffic flows. The detailed descriptions on how we set or preprocess these datasets are as follows:

Dataset     # intersections   Arrival rate (vehicles/300s)
                              Mean     Std      Max      Min
Synthetic   36                97.5     0        97.5     97.5
Hefei       11                437.92   51.11    514      341
Jinan       12                457.83   46.22    544      363
Hangzhou    16                513.75   242.34   875      203
Manhattan   196               879.34   315.14   1314     416
TABLE I: Statistics of five datasets

  • Synthetic: Following the setting of [38], this dataset contains a grid network where each intersection has 4 directions (West→East, East→West, South→North, North→South) and 3 lanes (300 meters long and 3 meters wide) for each direction. In the traffic flow, vehicles arrive uniformly at 300 vehicles/lane/hour in the East-West direction and 90 vehicles/lane/hour in the South-North direction.

  • Hefei: There are 11 intersections in one region of Baohe district, Hefei city, China. The traffic flow data are collected by roadside surveillance cameras during 9-11 a.m. on the working days of April 2021. The cameras record the time, location and vehicle ID. We set the traffic volume as the number of vehicles passing through these intersections for experiments.

  • Jinan [38]: There are 12 intersections in Dongfeng Sub-district, Jinan, China. The traffic flow data are collected by cameras in a way similar to the preceding dataset.

  • Hangzhou [38]: There are 16 intersections in Gudang Sub-district, Hangzhou, China. The traffic flow data are collected by cameras in a way similar to the preceding dataset.

  • Manhattan [38]: There are 192 intersections in the Upper East Side of Manhattan. The traffic flow data are collected based on the taxi trip data containing the origin and destination geo-locations of each trip. The geo-locations are mapped to intersections and the corresponding shortest path between them is obtained. The trips falling within the selected areas are chosen for experiments.
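The arrival-rate statistics reported in Table I can be derived from vehicle entry times with a computation along the following lines. This is a minimal sketch: the function name, the input format, and the use of the population standard deviation are assumptions, since the exact preprocessing is not specified.

```python
from statistics import mean, pstdev

def arrival_rate_stats(start_times, horizon, window=300):
    """Summarise vehicle arrivals per fixed-length window.

    start_times: iterable of vehicle entry times in seconds.
    horizon: total simulated duration in seconds.
    window: window length in seconds (300 matches the unit of Table I).
    """
    n_windows = horizon // window
    counts = [0] * n_windows
    for t in start_times:
        idx = int(t // window)
        if 0 <= idx < n_windows:
            counts[idx] += 1
    return {"mean": mean(counts), "std": pstdev(counts),
            "max": max(counts), "min": min(counts)}

# Toy uniform flow: one vehicle every 10 s for 15 minutes.
stats = arrival_rate_stats(range(0, 900, 10), horizon=900)
```

A perfectly uniform flow yields a standard deviation of zero and identical mean, maximum and minimum, which is the pattern of the synthetic dataset in Table I.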

5.2 Compared methods

We compare our LEVID approach with various baselines and variants of LEVID. We summarize these approaches from two aspects in Table II. On the one hand, considering the key technologies (e.g., vehicle-centric or road-centric, whether or not to use the RL method), they can be classified as: conventional traffic signal control approaches, RL-based traffic signal control approaches, route planning approaches and cooperative vehicle-road scheduling approaches. On the other hand, they can also be classified according to whether an approach is specially designed for EVs or just for OVs. All RL-based approaches are learned without any pre-trained parameters for fair comparison. The evaluation metric is the average travel time of all the EVs or OVs between origin and destination (in seconds).
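The evaluation metric can be computed as in the following minimal sketch. The trip record format and the function name are assumptions, and only completed trips are considered because the handling of unfinished trips is not stated.

```python
def average_travel_time(trips):
    """Average origin-to-destination travel time in seconds.

    trips: list of dicts with 'start', 'end' (seconds) and 'is_ev' (bool).
    Returns (avg_ov, avg_ev); a value is None when its group is empty.
    """
    def avg(values):
        values = list(values)
        return sum(values) / len(values) if values else None

    avg_ov = avg(t["end"] - t["start"] for t in trips if not t["is_ev"])
    avg_ev = avg(t["end"] - t["start"] for t in trips if t["is_ev"])
    return avg_ov, avg_ev

# Toy trips (illustrative values only).
trips = [
    {"start": 0, "end": 200, "is_ev": False},
    {"start": 30, "end": 250, "is_ev": False},
    {"start": 10, "end": 160, "is_ev": True},
]
avg_ov, avg_ev = average_travel_time(trips)
```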

                                     OVs                               EVs
Conventional Traffic Signal Control  FixedTime, MaxPressure            GreenWave
RL-based Traffic Signal Control      Individual RL, OneModel, CoLight  LEVID_UnDy, LEVID_Dy
Route Planning                       -                                 Dijkstra
CVRS (loosely coupled)               -                                 AAF
CVRS (tightly coupled)               -                                 LEVID_APF, LEVID
TABLE II: Classification of various approaches

Baselines:


  • FixedTime [20]: It’s the most commonly used traffic signal control method with preset offsets in the real world. It utilizes a pre-determined schedule plan considering the cycle length and phase time to handle the traffic flow.

  • MaxPressure [23]: It’s the most popular network-level traffic signal control approach in the transportation field, which greedily selects the phase with the maximum pressure.

  • GreenWave [17]: It allows all the traffic lights in the route to turn green so that EVs can pass intersections continuously along the emergency corridor. All the intersections share the same green phase length for each movement.

  • Dijkstra [6]: It’s a vehicle-centric scheduling approach, which builds a dynamic road network model for vehicle evacuation based on the Dijkstra algorithm.

  • AAF [9]: It’s an advanced adaptive and fuzzy approach to reduce emergency services response time. It selects the fastest path for an EV in advance and gives priority to the EV as soon as it approaches the traffic lights on the preset route. Note that this is only a loosely coupled cooperation, as the route planning and traffic signal control modules are conducted sequentially, whereas our LEVID has a tightly coupled cooperation in which the two modules are conducted simultaneously.

  • Individual RL [39]: It’s the individual deep RL approach without considering the information of neighbors. Each intersection is controlled by one heterogeneous agent which updates its own network independently.

  • OneModel [7]: It designs the state and reward of the agent in the same way as Individual RL. Each agent only considers the state of the roads connecting the controlled intersections and all agents share the same centralized critic model.

  • CoLight [38]: It’s an RL-based traffic signal control approach utilizing graph attention networks to automatically extract traffic features of adjacent intersections for facilitating communication.
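The greedy rule behind the MaxPressure baseline can be sketched as follows. The sketch assumes the common definition of pressure as the queue on an incoming lane minus the queue on the corresponding outgoing lane, summed over the movements a phase permits; the lane names, phase names and data layout are hypothetical, and the exact variant in [23] may differ.

```python
def max_pressure_phase(phases, queue):
    """Pick the phase whose permitted movements have the largest total pressure.

    phases: dict phase_id -> list of (incoming_lane, outgoing_lane) movements.
    queue: dict lane_id -> number of vehicles currently on that lane.
    """
    def pressure(movements):
        return sum(queue.get(i, 0) - queue.get(o, 0) for i, o in movements)

    return max(phases, key=lambda p: pressure(phases[p]))

# Toy intersection with two candidate phases (illustrative values only).
phases = {
    "WE_straight": [("w_in", "e_out"), ("e_in", "w_out")],
    "NS_straight": [("n_in", "s_out"), ("s_in", "n_out")],
}
queue = {"w_in": 8, "e_in": 5, "e_out": 2, "w_out": 1,
         "n_in": 3, "s_in": 4, "s_out": 0, "n_out": 6}
chosen = max_pressure_phase(phases, queue)
```

In this toy case the West-East phase has pressure 10 and the North-South phase has pressure 1, so the West-East phase is selected. The rule has no notion of vehicle priority, which is why OVs and EVs are treated alike.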

Variants of LEVID:


  • LEVID_UnDy: It removes the real-time route planning module from LEVID. Meanwhile, its traffic signal control module removes the design of the dynamic directed graph, and chooses the top- relevant neighbors based on the fixed geographic distance. This variant shows the improvements brought by the design of our state and reward function.

  • LEVID_Dy: It removes the real-time route planning module from LEVID and retains the traffic signal control module, which selects the top- relevant neighbors based on a dynamic directed graph.

  • LEVID_APF: It replaces the real-time path planning module of LEVID with the artificial potential field method, which only considers the gravity and immediate repulsion.

Model          Synthetic         Hefei             Jinan             Hangzhou          Manhattan
               OVs      EVs      OVs      EVs      OVs      EVs      OVs      EVs      OVs      EVs
FixedTime      209.7    209.1    1175.9   1097.6   867.7    869.3    654.6    645.5    2239.1   1399.4
MaxPressure    194.5    192.4    660.2    642.45   387.4    394.7    514.2    523.4    1666.1   1113.5
GreenWave      203.8    169.6    1384.9   529.2    832.2    245.9    796.8    336.4    2497.1   611.6
Dijkstra       210.4    209.8    1247.9   1114.7   893.5    841.1    690.9    572.4    2020.5   1152.7
AAF            205.2    167.2    1417.8   544.3    829.3    241.7    676.7    343.1    2107.5   581.3
Individual RL  189.36   188.5    569.3    570.6    343.1    347.3    404.3    403.7    -        -
OneModel       211.8    211.2    1411.3   1459.2   724.9    725.8    570.8    520.5    1979.1   1219.8
CoLight        192.5    188.4    625.7    603.3    293.3    291.6    534.4    504.82   1459.5   906.6
LEVID_UnDy     197.4    160.6    667.2    549.6    354.6    254.1    567.9    339.9    1596.3   732.8
LEVID_Dy       193.1    157.8    674.3    490.5    352.5    235.6    586.7    307.7    1574.5   624.2
LEVID_APF      198.6    158.5    664.0    507.5    347.4    248.4    556.6    330.5    1434.7   667.9
LEVID          195.4    155.4    654.0    443.9    341.8    220.2    571.7    291.1    1431.8   546.5
TABLE III: Comparisons of average travel time (in seconds) of both OVs and EVs on the five datasets.

5.3 Overall Performance Comparison

Table III compares the average travel time of both OVs and EVs achieved by LEVID and various baselines/variants on the five datasets.

5.3.1 Advantages of LEVID over Conventional Traffic Signal Control Approaches

From Table III, we observe that FixedTime has similar performance to MaxPressure on the synthetic dataset with a uniformly simulated traffic flow, while MaxPressure performs much better than FixedTime on the four real-world datasets, indicating that MaxPressure has a stronger ability to handle dynamic traffic flows. However, neither FixedTime nor MaxPressure considers the priorities of EVs, so OVs and EVs have similar travel times. By contrast, GreenWave is specially designed for EVs and greatly reduces the average travel time of EVs on all the datasets (at most 71.71% and 45.07% reductions compared with FixedTime and MaxPressure, respectively). Nevertheless, GreenWave increases the average travel time of OVs (at most 258.0s longer than FixedTime), and there is a large performance gap between OVs and EVs (at most 1885.5s difference). By contrast, our LEVID and its three variants not only greatly reduce the average travel time of EVs on all the datasets (at most 74.71% and 50.94% reductions compared with FixedTime and MaxPressure, respectively, by LEVID), but also shorten the average travel time of OVs in most cases (at most 807.3s difference by LEVID) and do not deteriorate OVs too much (at most 125.8s difference by LEVID). The results demonstrate the obvious advantages of our LEVID from two aspects: 1) it utilizes a learning-based traffic signal control strategy instead of a rule-based strategy for handling dynamic traffic flows, and 2) it integrates this strategy with a route planning strategy to further reduce the waiting time at intersections with heavy traffic.
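The relative reductions quoted above follow from Table III by a simple computation, sketched below for the GreenWave figures. The helper name is an assumption; the travel times are copied from Table III.

```python
def reduction_pct(baseline, method):
    """Relative reduction of `method` with respect to `baseline`, in percent."""
    return round(100.0 * (baseline - method) / baseline, 2)

# EV travel times (s) from Table III.
greenwave_vs_fixedtime = reduction_pct(869.3, 245.9)     # third dataset
greenwave_vs_maxpressure = reduction_pct(1113.5, 611.6)  # fifth dataset

# OV travel times (s) from Table III, fifth dataset.
ov_penalty_greenwave = round(2497.1 - 2239.1, 1)  # GreenWave minus FixedTime
ov_ev_gap_greenwave = round(2497.1 - 611.6, 1)    # GreenWave OVs minus EVs
```

These four values reproduce the 71.71%, 45.07%, 258.0s and 1885.5s figures reported for GreenWave.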

5.3.2 Advantages of LEVID over RL-based Traffic Signal Control Approaches

From Table III, we observe that Individual RL performs better than the other two RL-based baselines, OneModel and CoLight, and even performs best for OVs on four small-scale datasets, , , and . This is because Individual RL trains an exclusive agent for each intersection, which can evaluate the intersection state more accurately and reduce the performance loss. However, Individual RL cannot be applied to a large-scale dataset (e.g., ) due to its low computational efficiency. By contrast, OneModel utilizes a shared centralized critic network, which may ignore the differences between individuals, resulting in an inevitable performance loss. Compared with OneModel, CoLight reduces the average travel time of EVs by , , , and on , , , and , respectively, because it considers the state of neighboring intersections and leverages the GAT to model the interactions between neighboring intersections. Compared with CoLight, our LEVID and its three variants achieve consistent and obvious performance improvements for EVs, and achieve similar performance for OVs. More specifically, LEVID_UnDy reduces the average travel time of EVs by 14.76%, 8.91%, 12.94%, 32.72% and 19.24% relative to CoLight on the five datasets, respectively, which demonstrates the importance of the reward design considering both EVs and OVs simultaneously. LEVID_Dy further reduces the average travel time of EVs by 1.70%, 10.76%, 7.36%, 9.51% and 14.82% relative to LEVID_UnDy on the five datasets, respectively, which demonstrates the importance of utilizing a dynamic directed graph. Moreover, the advantage of our proposed approaches grows with the scale of the road network.

Fig. 9: Convergence speed on each of the five datasets (panels (a)-(e)) and running time (panel (f)) of LEVID_Dy (green continuous curves) and the other 4 RL-based traffic signal control approaches (dashed curves) during training. In most cases, LEVID_Dy starts with the best performance (Jumpstart), reaches the pre-defined performance the fastest (Time to Threshold), and ends with the optimal policy (Asymptotic). Curves are smoothed with a moving average of 5 points. Note that the convergence curve of Individual RL is not provided on the large-scale dataset, as it cannot be applied there due to its low computational efficiency.

5.3.3 Advantages of integrating a real-time route planning module

From Table III, we observe that Dijkstra reduces the average travel time of EVs by 0.29%, , , and relative to OVs in the same environment on the five datasets, respectively, which demonstrates the importance of route planning. Nevertheless, this performance improvement is far smaller than that of the CVRS approaches, because a vehicle-centric approach just avoids congested roads in a passive way while failing to proactively improve traffic conditions. By contrast, AAF reduces the average travel time of EVs by 18.53%, 61.61%, 70.97%, 49.37% and 72.36% relative to OVs in the same environment on the five datasets, respectively. Compared with GreenWave, AAF greatly reduces the average travel time of OVs due to the merit of integrating a route planning module. However, AAF cannot achieve consistent advantages for EVs, especially on small-scale datasets, demonstrating that planning routes in advance has limited ability to handle frequently changing traffic flows. Compared with AAF, LEVID_APF reduces the average travel time of OVs and EVs by at most 58.19% and 6.81%, respectively, on the five datasets, which demonstrates the importance of the tightly coupled cooperation between route planning and traffic signal control. In spite of this, we observe that LEVID_APF performs worse than AAF for EVs on , because LEVID_APF does not consider the limitation of the immediate repulsion in a large-scale road network. By contrast, LEVID reduces the average travel time of EVs by 1.96%, 12.54%, 11.48%, 11.94% and 18.26% relative to LEVID_APF on the five datasets, respectively, which demonstrates the importance of considering the long-term repulsion.

5.4 Convergence comparison

In Fig. 9, we plot the average travel time of EVs evaluated at each episode as learning curves for the five RL-based traffic signal control approaches. The results show that our method has better performance in both time to threshold (learning time to achieve a pre-specified performance level) and asymptotic performance (final learned performance). The convergence curves also show the influence of dynamic traffic flows on convergence: the curves on the synthetic dataset are smoother, while the dynamic real-world datasets bring some fluctuations to the convergence curves of most RL-based approaches. The training time (total time for 100 training episodes) of all RL-based traffic signal control approaches is also presented. For fair comparison, each model is trained individually. As shown in Fig. 9(f), the time consumption of LEVID_Dy is much less than that of Individual RL, and all the approaches with a centralized model are efficient. This is consistent with the complexity analysis of the centralized model. In addition, the actual time consumption of Individual RL on the large-scale dataset is not provided, as each episode takes more than 100 hours when all models are trained centrally on one server.
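The three convergence indicators used above (jumpstart, time to threshold and asymptotic performance) can be read off a smoothed learning curve as in the following minimal sketch. The trailing form of the 5-point moving average, the function names, and the travel times are assumptions for illustration; the exact smoothing and threshold used for Fig. 9 are not specified.

```python
def moving_average(values, window=5):
    """Trailing moving average; early points average over what is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def time_to_threshold(curve, threshold):
    """First episode index whose value is at or below threshold, else None."""
    for episode, value in enumerate(curve):
        if value <= threshold:
            return episode
    return None

# Made-up EV travel times (s) per training episode; lower is better.
raw = [600, 520, 480, 400, 350, 330, 300, 310, 295, 290]
smooth = moving_average(raw)
jumpstart = smooth[0]        # performance at the start of training
asymptotic = smooth[-1]      # performance at the end of training
episodes_needed = time_to_threshold(smooth, threshold=400)
```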

Fig. 10: Space-time diagram with signal timing plan to illustrate the effect of learned coordination strategy on dataset .

5.5 Case study

A time-space diagram is utilized to show the trajectory of one EV and the corresponding traffic signal control plan on dataset . In Fig. 10, the left part shows the real-world network structure and the right part shows the specific driving process of the EV. The x-axis is the time and the y-axis denotes the distance. One gray line denotes the trajectory of the EV, and bands with green-yellow-red colors denote the changing phases of the intersections along this trajectory. This EV turns right at the 9th intersection, where the right turn signal is always green. Fig. 10 illustrates that the EV takes to pass through intersections. We can observe that LEVID can automatically form a green wave for the EV to help it pass quickly.

6 Related Work

6.1 Vehicle-centric Scheduling

The vehicle-centric scheduling methods aim at scheduling vehicles with the best routes that can minimize the travel cost or satisfy personalized preferences. Most studies focus on route planning for OVs, which can be broadly divided into two categories, cost-centric routing [43, 15, 47, 22, 29, 42, 28] and trajectory-based routing [46, 8, 13, 44]. Some cost-centric studies mainly focus on route planning on a dynamic stochastic graph with time-dependent, uncertain edge weights [43, 15, 47, 22, 29]. Other cost-centric studies [42, 28] take the dependencies among time distributions of different roads into account to improve the accuracy of travel time estimation. The trajectory-based studies focus on leveraging historical trajectories for path recommendation [46, 8, 13, 44]. However, these methods may not apply to EVs, as the main concern of EVs should be time sensitivity rather than personalized preferences.

Only a few early studies focus on route planning for EVs [26, 6, 3]. Nordin et al. [26] utilize A* algorithm to determine the shortest path for dispatching an ambulance to a specific ambulance station or emergency site. Chen et al. [6] analyze three different emergency evacuation cases and build a dynamic road network model for vehicles evacuation based on the Dijkstra algorithm. Barrachina et al. [3] utilize vehicular communications to accurately estimate the traffic density in a certain area and help reduce the emergency services arrival time with evolution strategies. However, these methods just avoid congested roads in a passive way, while failing to proactively improve the traffic condition to shorten the travel time of EVs.

6.2 Road-centric Scheduling

The road-centric scheduling methods aim at actively improving traffic conditions by traffic signal control technologies [41, 32]. Extensive studies focus on traffic signal control for OVs, and the mainstream technologies have undergone a development from rule-based methods to learning-based methods [39, 34, 37, 4, 36]. The conventional traffic signal control method MaxPressure [23, 35] measures the traffic flow in real time and changes the current phase according to a rule-based preset scheme. Reinforcement learning based methods attempt to address the traffic signal control problem by interacting with the environment and learning from real-time data. Some studies leverage tabular Q-learning [1, 11] and deep reinforcement learning [39] for the traffic signal control of a single intersection. For multi-intersection traffic signal control, the centralized RL method [34, 21] models the actions of all agents jointly and negotiates the traffic signal control with centralized optimization, which is computationally expensive. By contrast, the decentralized RL method makes decisions based on the observation of each independent agent. Some methods [27, 4, 2, 10, 48] handle the non-stationary impacts of other agents in complicated environments with exquisite reward design, which requires more human expert experience. In contrast, other methods [25, 50, 36, 38] add neighbors’ traffic conditions into the observation and enable agents to behave as a group and form coordination. However, these methods do not consider the priority of EVs.

By contrast, the existing traffic signal control studies for EVs are still limited to rule-based methods [17, 45, 31]. Kang et al. [17] propose an EV signal coordination approach to provide a “green wave” for EVs. Younes et al. [45] design a real-time dynamic traffic signal control method which can handle the presence of one or more EVs over the road networks. Rosayyan et al. [31] leverage a global navigation satellite system based on geo-fencing techniques to identify the entry of EVs and provide a green signal automatically. However, these rule-based methods rarely consider the impact of the scheduling strategy on OVs and cannot interact with the environment in real time.

6.3 Cooperative Vehicle-Road Scheduling

The cooperative vehicle-road scheduling provides route planning and traffic light control for EVs simultaneously. Djahel et al. [9] design a traffic signal controller, which finds the quickest path for an EV in advance, and utilize RFID to give priority to an EV as soon as it approaches the traffic lights on this route. Karmakar et al. [18] determine the signal lights to be green based on the current traffic condition and calculate the priority levels of different EVs based on the type and the severity of an incident in case of the conflict between EVs. This work also considers the impact on the traffic in the neighboring roads surrounding the EV’s travel route. However, these methods are mainly based on pre-set fixed rules and simplified assumptions. They cannot be updated synchronously as the real-time dynamic traffic flow changes [16].

7 Conclusion

In this paper, we consider a cooperative vehicle-infrastructure system to help EVs arrive faster. Based on the key insight that real-time vehicle-road information interaction and strategy coordination can bring more benefits, we propose LEVID, a learning-based cooperative vehicle-road scheduling approach. LEVID contains a real-time route planning module and a collaborative traffic signal control module, which influence each other and make decisions iteratively. The first module adapts the artificial potential field method to handle the real-time changes of traffic signals and escape local optima. The second module utilizes the multi-agent reinforcement learning framework to handle traffic features that are hard to combine linearly based on human experience and predefined rules. It further leverages graph attention networks based on a dynamic directed graph to model the interactions between intersections. Extensive experiments based on multiple real-world datasets demonstrate that our approach outperforms the state-of-the-art baselines.

References

  • [1] B. Abdulhai, R. Pringle, and G. J. Karakoulas (2003) Reinforcement learning for true adaptive traffic signal control. Journal of Transportation Engineering 129 (3), pp. 278–285.
  • [2] I. Arel, C. Liu, T. Urbanik, and A. G. Kohls (2010) Reinforcement learning-based multi-agent system for network traffic signal control. IET Intelligent Transport Systems 4 (2), pp. 128–135.
  • [3] J. Barrachina, P. Garrido, M. Fogue, F. J. Martinez, J. Cano, C. T. Calafate, and P. Manzoni (2014) Reducing emergency services arrival time by using vehicular communications and evolution strategies. Expert Systems with Applications 41 (4), pp. 1206–1217.
  • [4] C. Chen, H. Wei, N. Xu, G. Zheng, M. Yang, Y. Xiong, K. Xu, and Z. Li (2020) Toward a thousand lights: decentralized deep reinforcement learning for large-scale traffic signal control. In Proc. of AAAI, Vol. 34, pp. 3414–3421.
  • [5] D. Chen, C. S. Ong, and L. Xie (2016) Learning points and routes to recommend trajectories. In Proc. of ACM CIKM, pp. 2227–2232.
  • [6] Y. Chen, S. Shen, T. Chen, and R. Yang (2014) Path optimization study for vehicles evacuation based on dijkstra algorithm. Procedia Engineering 71, pp. 159–165.
  • [7] T. Chu, J. Wang, L. Codecà, and Z. Li (2019) Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Transactions on Intelligent Transportation Systems 21 (3), pp. 1086–1095.
  • [8] J. Dai, B. Yang, C. Guo, and Z. Ding (2015) Personalized route recommendation using big trajectory data. In Proc. of IEEE ICDE, pp. 543–554.
  • [9] S. Djahel, N. Smith, S. Wang, and J. Murphy (2015) Reducing emergency services response time in smart cities: an advanced adaptive and fuzzy approach. In Proc. of IEEE ISC2, pp. 1–8.
  • [10] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad (2013) Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (marlin-atsc): methodology and large-scale application on downtown toronto. IEEE Transactions on Intelligent Transportation Systems 14 (3), pp. 1140–1150.
  • [11] S. El-Tantawy and B. Abdulhai (2010) An agent-based learning towards decentralized and coordinated traffic signal control. In Proc. of IEEE ITSC, pp. 665–670.
  • [12] M. Goldsmith (2019) V2X can help emergency vehicles get there faster. https://www.danlawinc.com/v2x-can-help-emergency-vehicles-get-faster/
  • [13] C. Guo, B. Yang, J. Hu, and C. Jensen (2018) Learning to route with sparse trajectory sets. In Proc. of IEEE ICDE, pp. 1073–1084.
  • [14] W. D. Hayes (1970) Kinematic wave theory. Proc. of the Royal Society of London. A. Mathematical and Physical Sciences 320 (1541), pp. 209–226.
  • [15] J. Hu, B. Yang, C. Guo, and C. S. Jensen (2018) Risk-aware path selection with time-varying, uncertain travel costs: a time series approach. The VLDB Journal 27 (2), pp. 179–200.
  • [16] S. Humagain, R. Sinha, E. Lai, and P. Ranjitkar (2020) A systematic review of route optimisation and pre-emption methods for emergency vehicles. Transport Reviews 40 (1), pp. 35–53.
  • [17] W. Kang, G. Xiong, Y. Lv, X. Dong, F. Zhu, and Q. Kong (2014) Traffic signal coordination for emergency vehicles. In Proc. of IEEE ITSC, pp. 157–161.
  • [18] G. Karmakar, A. Chowdhury, J. Kamruzzaman, and I. Gondal (2020) A smart priority based traffic control system for emergency vehicles. IEEE Sensors Journal.
  • [19] O. Khatib (1985) Real-time obstacle avoidance for manipulators and mobile robots. In Proc. of IEEE ICRA, Vol. 2, pp. 500–505.
  • [20] P. Koonce and L. Rodegerdts (2008) Traffic signal timing manual. Technical report, United States Federal Highway Administration.
  • [21] L. Kuyer, S. Whiteson, B. Bakker, and N. Vlassis (2008) Multiagent reinforcement learning for urban traffic control using coordination graphs. In Proc. of ECML PKDD, pp. 656–671.
  • [22] L. Li, S. Wang, and X. Zhou (2019) Time-dependent hop labeling on road network. In Proc. of IEEE ICDE, pp. 902–913.
  • [23] J. Lioris, A. Kurzhanskiy, and P. Varaiya (2016) Adaptive max pressure control of network of signalized intersections. IFAC-PapersOnLine 49 (22), pp. 19–24.
  • [24] H. Liu, Y. Li, Y. Fu, H. Mei, J. Zhou, X. Ma, and H. Xiong (2020) Polestar: an intelligent, efficient and national-wide public transportation routing engine. In Proc. of ACM SIGKDD, pp. 2321–2329.
  • [25] T. Nishi, K. Otaki, K. Hayakawa, and T. Yoshimura (2018) Traffic signal control based on reinforcement learning with graph convolutional neural nets. In Proc. of IEEE ITSC, pp. 877–883.
  • [26] N. A. M. Nordin, Z. A. Zaharudin, M. A. Maasar, and N. A. Nordin (2012) Finding shortest path of the ambulance routing: interface of algorithm using programming. In IEEE Symposium on Humanities, Science and Engineering Research, pp. 1569–1573.
  • [27] A. Nowé, P. Vrancx, and Y. De Hauwere (2012) Game theory and multi-agent reinforcement learning. pp. 441–470.
  • [28] S. A. Pedersen, B. Yang, and C. S. Jensen (2020) A hybrid learning approach to stochastic routing. In Proc. of IEEE ICDE, pp. 1910–1913.
  • [29] S. A. Pedersen, B. Yang, and C. S. Jensen (2020) Fast stochastic routing under time-varying uncertainty. The VLDB Journal 29 (4), pp. 819–839.
  • [30] RapidSOS (2015) Quantifying the impact of emergency response times. www.RapidSOS.com
  • [31] P. Rosayyan, S. Subramaniam, and S. I. Ganesan (2020) Decentralized emergency service vehicle pre-emption system using rf communication and gnss-based geo-fencing. IEEE Transactions on Intelligent Transportation Systems.
  • [32] C. Sommer, R. German, and F. Dressler (2010) Bidirectionally coupled network and road traffic simulation for improved ivc analysis. IEEE Transactions on Mobile Computing 10 (1), pp. 3–15.
  • [33] P. Tong, M. Li, M. Li, J. Huang, and X. Hua (2021) Large-scale vehicle trajectory reconstruction with camera sensing network. In MobiCom, pp. 188–200.
  • [34] E. Van der Pol and F. A. Oliehoek (2016) Coordinated deep reinforcement learners for traffic light control. Proc. of NeurIPS.
  • [35] P. Varaiya (2013) Max pressure control of a network of signalized intersections. Transportation Research Part C: Emerging Technologies 36, pp. 177–195.
  • [36] Y. Wang, T. Xu, X. Niu, C. Tan, E. Chen, and H. Xiong (2020) STMARL: a spatio-temporal multi-agent reinforcement learning approach for cooperative traffic light control. IEEE Transactions on Mobile Computing.
  • [37] H. Wei, C. Chen, G. Zheng, K. Wu, V. Gayah, K. Xu, and Z. Li (2019) Presslight: learning max pressure control to coordinate traffic signals in arterial network. In Proc. of ACM SIGKDD, pp. 1290–1298.
  • [38] H. Wei, N. Xu, H. Zhang, G. Zheng, X. Zang, C. Chen, W. Zhang, Y. Zhu, K. Xu, and Z. Li (2019) Colight: learning network-level cooperation for traffic signal control. In Proc. of CIKM, pp. 1913–1922.
  • [39] H. Wei, G. Zheng, H. Yao, and Z. Li (2018) Intellilight: a reinforcement learning approach for intelligent traffic light control. In Proc. of ACM SIGKDD, pp. 2496–2505.
  • [40] L. Wei, Y. Zheng, and W. Peng (2012) Constructing popular routes from uncertain trajectories. In Proc. of ACM SIGKDD, pp. 195–203.
  • [41] T. Xu, H. Zhu, H. Xiong, H. Zhong, and E. Chen (2019) Exploring the social learning of taxi drivers in latent vehicle-to-vehicle networks. IEEE Transactions on Mobile Computing 19 (8), pp. 1804–1817.
  • [42] B. Yang, J. Dai, C. Guo, C. S. Jensen, and J. Hu (2018) PACE: a path-centric paradigm for stochastic path finding. The VLDB Journal 27 (2), pp. 153–178.
  • [43] B. Yang, C. Guo, C. S. Jensen, M. Kaul, and S. Shang (2014) Stochastic skyline route planning under time-varying uncertainty. In Proc. of IEEE ICDE, pp. 136–147.
  • [44] S. B. Yang and B. Yang (2020) Learning to rank paths in spatial networks. In Proc. of IEEE ICDE, pp. 2006–2009.
  • [45] M. B. Younes and A. Boukerche (2018) An efficient dynamic traffic light scheduling algorithm considering emergency vehicles for intelligent transportation systems. Wireless Networks 24 (7), pp. 2451–2463.
  • [46] J. Yuan, Y. Zheng, C. Zhang, W. Xie, X. Xie, G. Sun, and Y. Huang (2010) T-drive: driving directions based on taxi trajectories. In Proc. of ACM SIGSPATIAL, pp. 99–108.
  • [47] Y. Yuan, X. Lian, G. Wang, L. Chen, Y. Ma, and Y. Wang (2019) Weight-constrained route planning over time-dependent graphs. In Proc. of IEEE ICDE, pp. 914–925.
  • [48] X. Zang, H. Yao, G. Zheng, N. Xu, K. Xu, and Z. Li (2020) MetaLight: value-based meta-reinforcement learning for traffic signal control. In Proc. of AAAI, Vol. 34, pp. 1153–1160.
  • [49] H. Zhang, S. Feng, C. Liu, Y. Ding, Y. Zhu, Z. Zhou, W. Zhang, Y. Yu, H. Jin, and Z. Li (2019) Cityflow: a multi-agent reinforcement learning environment for large scale city traffic scenario. In World Wide Web Conference, pp. 3620–3624.
  • [50] G. Zheng, X. Zang, N. Xu, H. Wei, Z. Yu, V. Gayah, K. Xu, and Z. Li (2019) Diagnosing reinforcement learning for traffic signal control. arXiv preprint arXiv:1905.04716.