CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario

by   Huichu Zhang, et al.
Shanghai Jiao Tong University

Traffic signal control is an emerging application scenario for reinforcement learning. Besides being an important problem that affects people's daily commutes, traffic signal control poses unique challenges for reinforcement learning: adapting to a dynamic traffic environment and coordinating thousands of agents, including vehicles and pedestrians. A key factor in the success of modern reinforcement learning is a good simulator that can generate a large number of data samples for learning. The most commonly used open-source traffic simulator, SUMO, however, does not scale to large road networks and heavy traffic flows, which hinders the study of reinforcement learning on traffic scenarios. This motivates us to create a new traffic simulator, CityFlow, with fundamentally optimized data structures and efficient algorithms. CityFlow supports flexible definitions of road networks and traffic flows based on synthetic and real-world data, and it provides a user-friendly interface for reinforcement learning. Most importantly, CityFlow is more than twenty times faster than SUMO and can support city-wide traffic simulation with an interactive renderer for monitoring. Beyond traffic signal control, CityFlow can serve as a base for other transportation studies and create new possibilities for testing machine learning methods in the intelligent transportation domain.





1. Introduction

The traffic signal control problem, one of the biggest urban problems, has drawn increasing attention in recent years (Wei et al., 2018; Li et al., 2016; Van der Pol and Oliehoek, 2016). Recent advances are enabled by large-scale real-time traffic data collected from various sources such as vehicle tracking devices, location-based mobile services, and road surveillance cameras through advanced sensing technology and web infrastructure. Traffic signal control is interesting but complex because of the dynamics of traffic flow and the difficulty of coordinating thousands of traffic signals. Reinforcement learning has become one of the promising approaches to optimizing traffic signal plans, as shown in several recent studies (Wei et al., 2018; Li et al., 2016; Van der Pol and Oliehoek, 2016). At the same time, traffic signal control is also one of the major real-world application scenarios for reinforcement learning (Li, 2017).

To successfully deploy reinforcement learning techniques for traffic signal control, the traffic simulator becomes the most important factor, because the learning method relies on a large set of data samples, and these data samples can hardly be collected directly from the real world. Aside from the consequences of bad decisions, a city simply cannot generate enough data samples for learning: if we treat each minute as a data sample, a city generates only 1,440 (24 hours × 60 minutes) samples per day. Such a small sample size is not enough to train a deep reinforcement learning model powerful enough to make good decisions. Thus, it is crucial to have a simulator fast enough to generate a large set of data samples.

The most popular open-source traffic simulator, SUMO (Simulation of Urban MObility) (Lopez et al., 2018), has been frequently used in many recent studies. SUMO, however, does not scale with the size of the road network or the size of the traffic flow. For example, it can only perform around three simulation steps per second on a grid network with tens of thousands of vehicles, and the situation is even worse if we use the Python interface to query information about the system to support reinforcement learning. A city, however, often has on the order of a thousand intersections of major roads (e.g., Hangzhou, China) and hundreds of thousands of vehicles, which is beyond the current simulation capacity of SUMO.

To enable reinforcement learning for intelligent transportation, we create a traffic simulator, CityFlow, which scales to support city-wide traffic simulation. One of the major improvements over SUMO is that CityFlow supports multithreaded computation. To the best of our knowledge, this is the first open-source simulator that can support city-wide traffic simulation. CityFlow is flexible in defining road networks, vehicle models, and traffic signal plans, and it is more than twenty times faster than SUMO. We also provide a friendly interface for a reinforcement learning testbed. We plan to demonstrate these functions at the demo session.

Finally, our scalable traffic simulator CityFlow will open many new possibilities beyond the traffic signal control scenario. First, it can support various large-scale transportation studies, such as vehicle routing through mobile apps and traffic jam prevention. Second, similar to OpenAI Gym, which provides a set of benchmark environments for reinforcement learning, CityFlow can serve as a benchmark reinforcement learning environment for transportation studies. Besides traffic signal control, reinforcement learning has been used in transportation studies such as taxi dispatching (Xu et al., 2018) and mixed-autonomy systems (Wu et al., 2017), but all existing studies use either SUMO or over-simplified traffic simulators. Third, we plan to better calibrate the simulation parameters by learning from real-world observations. This will make the simulator not only generate data samples fast but also generate "real" data samples.

2. Brief Description

2.1. System Design

CityFlow is a microscopic traffic simulator that simulates the behavior of each vehicle at each time step, providing the highest level of detail in the evolution of traffic. Microscopic traffic simulators, however, tend to suffer from slow simulation speed (Yin and Qiu, 2011). Unlike SUMO, CityFlow uses multithreading to accelerate the simulation, and its data structures and simulation algorithms are also optimized to further speed up the process.

2.1.1. Road Network

The road network is the basic data structure in CityFlow. A Road represents a directed road from one intersection to another with road-specific properties, and it may contain multiple lanes. Each Lane holds a linked list of vehicles, which supports fast insertion and fast lookup of leading vehicles. Segments are small fragments of a lane; we design segments to efficiently find all vehicles within a certain range of a lane, which is crucial for fast lane-change operations. An Intersection is where roads intersect. An intersection contains several roadlinks; each roadlink connects two roads of the intersection and can be controlled by traffic signals. A roadlink contains several lanelinks; each lanelink represents a specific path from one lane of an incoming road to one lane of an outgoing road. A Cross represents the cross point between two lanelinks, which is crucial for fast intersection logic.
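The lane/segment design above can be sketched as follows. This Python snippet is only an illustrative model of the idea (the simulator itself is implemented in C++), and all names and parameters here are ours, not the engine's:

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Vehicle:
    vid: str
    position: float  # distance from the lane start, in meters

@dataclass
class Lane:
    length: float
    segment_length: float = 10.0
    # vehicles kept in order; a linked/deque structure gives O(1) insertion
    # at either end and fast access to the leading vehicle
    vehicles: deque = field(default_factory=deque)
    segments: list = field(default_factory=list)  # per-segment vehicle-id sets

    def __post_init__(self):
        n = int(self.length // self.segment_length) + 1
        self.segments = [set() for _ in range(n)]

    def add_vehicle(self, v: Vehicle):
        self.vehicles.append(v)
        self.segments[int(v.position // self.segment_length)].add(v.vid)

    def vehicles_in_range(self, start: float, end: float):
        """Find vehicle ids between `start` and `end` meters by scanning only
        the segments overlapping the range, instead of the whole lane."""
        lo = int(start // self.segment_length)
        hi = int(end // self.segment_length)
        found = set()
        for seg in self.segments[lo:hi + 1]:
            found |= seg
        return found
```

The point of the segment index is that a range query touches only a handful of segments, regardless of how many vehicles are on the lane.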

2.1.2. Car Following Model

The car-following model is the core component of CityFlow. It computes the desired speed of each vehicle at the next step using information such as traffic signals and leading vehicles, and it ensures that no collisions occur in the system. Currently, the car-following model used in CityFlow is a modification of the model proposed by Stefan Krauß (Krauß, 1998). The key idea is that the vehicle drives as fast as possible subject to perfect safety (e.g., being able to stop even if the leading vehicle brakes at maximum deceleration). Unlike SUMO (Lopez et al., 2018), we use a ballistic position update rule instead of the Euler position update. The ballistic update yields more realistic dynamics for car-following models based on continuous dynamics, especially for larger time steps (e.g., 1 second) (Treiber and Kanagaraj, 2015).

Basically, vehicles are subject to several speed constraints, and the maximum speed that meets all of them is chosen. Currently, the following constraints are considered:

  • vehicle and driver’s maximum acceleration

  • road speed limit

  • collision free following speed

  • headway time following speed

  • intersection related speed

Due to the page limit, we only present the details of the collision-free speed computation. It takes the current speed of the following vehicle, the current speed of the leading vehicle, the maximum decelerations of both vehicles, the current gap between the two vehicles, and the length of each time step as parameters, and it computes the no-collision speed by solving a quadratic equation.
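A rough sketch of this computation is given below. It uses a simplified formulation that ignores step-length effects: the follower's stopping distance must not exceed the current gap plus the leading vehicle's stopping distance, and solving that quadratic inequality for the follower's speed gives the bound. The actual engine solves a more detailed quadratic; all names here are illustrative:

```python
import math

def collision_free_speed(v_leader, b_follower, b_leader, gap):
    """Upper bound on the follower's next speed such that, even if the leader
    brakes at its maximum deceleration, the follower can still stop in time.
    Simplified: v^2 / (2*b_follower) <= gap + v_leader^2 / (2*b_leader)."""
    reachable = gap + v_leader ** 2 / (2.0 * b_leader)
    return math.sqrt(2.0 * b_follower * reachable)

def next_speed(v_now, a_max, dt, v_limit, v_safe, v_headway, v_intersection):
    # the vehicle drives as fast as possible subject to every constraint:
    # acceleration limit, road speed limit, collision-free speed,
    # headway-time speed, and intersection-related speed
    return min(v_now + a_max * dt, v_limit, v_safe, v_headway, v_intersection)
```

The `next_speed` helper simply takes the minimum over the constraint list given above, which is how the "drive as fast as possible, subject to all constraints" rule is realized.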


The intersection-related speed is handled by the intersection logic and is described in the next section.

2.1.3. Intersection Logic

The behavior of vehicles in an intersection is complex, and careful design is required to efficiently mimic real-world behavior (Krajzewicz and Erdmann, 2013; Fellendorf and Vortisch, 2010). Basically, vehicles in an intersection should obey the following two rules:

  • fully stop at a red signal; stop if possible at a yellow signal

  • yield to vehicles with higher priority (e.g., turning vehicles should yield to straight-moving vehicles)

To avoid collisions at an intersection, it is non-trivial to check whether there are conflicting vehicles on other lanes. The simplest method is a brute-force search that finds all vehicles within a certain range and checks whether they will collide within a certain time period, but this is very time consuming. Instead, we precompute all the cross points between lanelinks in the intersection. When a vehicle approaches the intersection, it notifies all cross points in the intersection about its arrival, and each cross point is responsible for deciding which vehicle may pass and which should yield. This yields a much lower time complexity than the brute-force search; due to the page limit, we omit the details of the algorithm.
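The cross-point mechanism might be sketched like this. This is an illustrative model, not the engine's actual code; the priority rule is simplified here to a single right-of-way flag with arrival-time tie-breaking:

```python
class CrossPoint:
    """Conflict point between two lanelinks, precomputed when the road
    network is built. Approaching vehicles notify the cross point, which then
    decides who passes first, so no pairwise search over vehicles is needed."""

    def __init__(self):
        self.waiting = []  # list of (arrival_time, has_priority, vehicle_id)

    def notify(self, vehicle_id, arrival_time, has_priority):
        # called by a vehicle as it approaches the intersection
        self.waiting.append((arrival_time, has_priority, vehicle_id))

    def who_passes(self):
        # higher-priority movements (e.g. straight over turning) go first;
        # ties are broken by earlier arrival at the cross point
        order = sorted(self.waiting, key=lambda w: (not w[1], w[0]))
        return [vid for _, _, vid in order]
```

Because the cross points are precomputed once per road network, each simulation step only needs to consult the vehicles registered at each conflict point rather than scanning all vehicles near the intersection.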

2.1.4. Lane Change Model

The lane-change model addresses two questions for a vehicle: when and how to change lanes. Vehicles may change lanes when there is more free space on an adjacent lane, or when a lane change is required to follow their route. Note that it is slow to traverse all vehicles in adjacent lanes. Instead, by maintaining vehicle information in segments, which are small fragments of each lane, we only need to search for related vehicles in adjacent segments in constant time (up to three segments per lane), which largely reduces the time complexity.

When a vehicle decides to change lanes, it needs a way to notify other vehicles. Here we use a mechanism similar to SUMO's: when a vehicle changes lanes, the simulation engine puts a copy of it, called a shadow vehicle, on its destination lane. A shadow vehicle functions like a normal vehicle and can become the leader of other vehicles in the car-following model. The vehicle and its shadow move consistently; the simulation engine guarantees this by applying each one's speed constraints to the other. After the lane change finishes, the simulation engine removes the original vehicle and lets the shadow vehicle replace it.
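A minimal sketch of the shadow-vehicle idea follows. All names are illustrative, not the engine's API; the key point is only that during a lane change the pair adopts the tighter of the two lanes' speed constraints, so vehicle and shadow stay aligned:

```python
class ShadowedVehicle:
    """During a lane change the engine places a copy (the shadow) on the
    destination lane; both copies act as leaders for followers on their
    respective lanes, and each step they take the minimum of the two lanes'
    speed constraints so that they move in lockstep."""

    def __init__(self, position):
        self.position = position  # shared by the vehicle and its shadow
        self.changing = False

    def start_lane_change(self):
        self.changing = True      # shadow appears on the destination lane

    def step(self, v_orig_lane, v_dest_lane, dt):
        # apply each lane's constraint to both copies while changing
        v = min(v_orig_lane, v_dest_lane) if self.changing else v_orig_lane
        self.position += v * dt   # vehicle and shadow advance together
        return v

    def finish_lane_change(self):
        self.changing = False     # original removed, shadow takes over
```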

2.2. Python Interface

To support multi-agent reinforcement learning, we provide a Python interface via pybind11 (Jakob et al., 2017). Users can run the simulation step by step and retrieve various kinds of information about the current state, e.g., the number of vehicles on each lane and the speed of each vehicle. We also provide interfaces to control the elements of the simulator at each time step: currently, users can control traffic signals and add vehicles on the fly, and we plan to support more types of control, such as vehicle behavior and road properties, in the future. Below is a sample usage of the Python interface.

import engine

eng = engine.Engine(config_file)
phases = [...]  # the traffic signal phase for each time step
for step in range(3600):
    eng.set_tl_phase("intersection_1_1", phases[step])
    eng.next_step()  # advance the simulation by one step
    # retrieve state, compute reward, etc.
Listing 1: Usage of the Python interface

2.3. Frontend

We provide a web-based graphical user interface where users can view replays produced by the simulator. To support viewing large-scale simulations, we use the WebGL-based library PixiJS for fast rendering of vehicles and traffic signals. Figure 1 shows screenshots of the GUI under several scenarios.

(a) road network with different green ratios
(b) grid road network
(c) grid road network
(d) part of the Manhattan road network
Figure 1. Screenshots of CityFlow in different scenarios

3. Performance

Figure 2. Speedup of CityFlow compared to SUMO

3.1. Efficiency

We compare the performance of SUMO and CityFlow under different scenarios. The experiments run on an Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz. As Figure 2 shows, CityFlow outperforms SUMO in all scenarios, from light to heavy traffic, even with a single thread. The speedup is even more significant with more threads: we achieve about a 25× speedup on large-scale road networks with tens of thousands of vehicles using 8 threads, i.e., 72 simulation steps per second. Besides, CityFlow is more efficient when retrieving information about the simulation via the Python interface. This is mainly because SUMO uses sockets for interaction while CityFlow uses pybind11 for seamless C++/Python integration.

3.2. Effectiveness

We evaluate the effectiveness of CityFlow by comparing it with SUMO, since SUMO is already a widely used traffic simulator whose effectiveness is accepted by domain experts. We compare the average duration of vehicles (the time for a vehicle to enter and leave the road network) under different traffic volume settings. As Table 1 shows, the difference is within a reasonable range.

Vehicles/Hour    100     200     300     400     500
SUMO           40.76   41.57   42.75   44.08   45.93
CityFlow       40.79   41.58   42.62   43.84   45.45
Difference     0.07%   0.04%   0.30%   0.54%   1.06%
Table 1. Average duration of vehicles under different traffic volumes

4. Demo Detail

We plan to demonstrate CityFlow in different traffic scenarios and show its capability to serve as a reinforcement learning testbed.

The demo consists of the following parts:

  • Simulating traffic in various scenarios, from synthetic grid scenarios to real-world scenarios, and from small road networks with dozens of vehicles to large-scale networks with tens of thousands of vehicles.

  • Showing the effectiveness of the car-following model, intersection logic, and lane-change behavior of the simulator.

  • Showing a complete reinforcement learning training episode that optimizes a traffic signal plan. Participants can observe the gradual improvement of traffic conditions during training.

  • Letting participants control the cycle length and green ratio of traffic signals, change the traffic volume, and see instant feedback on how traffic conditions change.

We have published a video on YouTube that demonstrates the expected effect. The project is under active development; we are likely to add other features (e.g., more map options, vehicle controls) and demonstrate more functions at the conference.

No special hardware is required since we are demonstrating a software project (a learning platform). We will bring our own laptop; it would be great if a monitor could be provided.

5. Summary

We propose CityFlow, an efficient multi-agent reinforcement learning environment for large-scale city traffic scenarios. Researchers can use it as a testbed for the traffic signal control problem and for research on urban mobility. We will demonstrate its usage and some results of RL-controlled traffic signal plans. We are also actively developing the project and plan to support more RL scenarios, such as dynamic vehicle routing and policies for reversible or restricted lanes, as well as to open-source the project in the near future.


  • Fellendorf and Vortisch (2010) Martin Fellendorf and Peter Vortisch. 2010. Microscopic traffic flow simulator VISSIM. In Fundamentals of traffic simulation. Springer, 63–93.
  • Jakob et al. (2017) Wenzel Jakob, Jason Rhinelander, and Dean Moldovan. 2017. pybind11 – Seamless operability between C++11 and Python.
  • Krajzewicz and Erdmann (2013) Daniel Krajzewicz and Jakob Erdmann. 2013. Road intersection model in SUMO. In 1st SUMO User Conference-SUMO, Vol. 21. 212–220.
  • Krauß (1998) Stefan Krauß. 1998. Microscopic modeling of traffic flow: Investigation of collision free vehicle dynamics. Ph.D. Dissertation. Universität zu Köln.
  • Li et al. (2016) Li Li, Yisheng Lv, and Fei-Yue Wang. 2016. Traffic signal timing via deep reinforcement learning. IEEE/CAA Journal of Automatica Sinica 3, 3 (2016), 247–254.
  • Li (2017) Yuxi Li. 2017. Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274 (2017).
  • Lopez et al. (2018) Pablo Alvarez Lopez, Michael Behrisch, Laura Bieker-Walz, Jakob Erdmann, Yun-Pang Flötteröd, Robert Hilbrich, Leonhard Lücken, Johannes Rummel, Peter Wagner, and Evamarie Wießner. 2018. Microscopic Traffic Simulation using SUMO, In The 21st IEEE International Conference on Intelligent Transportation Systems. IEEE Intelligent Transportation Systems Conference (ITSC).
  • Treiber and Kanagaraj (2015) Martin Treiber and Venkatesan Kanagaraj. 2015. Comparing numerical integration schemes for time-continuous car-following models. Physica A: Statistical Mechanics and its Applications 419 (2015), 183–195.
  • Van der Pol and Oliehoek (2016) Elise Van der Pol and Frans A Oliehoek. 2016. Coordinated deep reinforcement learners for traffic light control. Proceedings of Learning, Inference and Control of Multi-Agent Systems (at NIPS 2016) (2016).
  • Wei et al. (2018) Hua Wei, Guanjie Zheng, Huaxiu Yao, and Zhenhui Li. 2018. IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control. In ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD). 2496–2505.
  • Wu et al. (2017) Cathy Wu, Aboudy Kreidieh, Eugene Vinitsky, and Alexandre M Bayen. 2017. Emergent Behaviors in Mixed-Autonomy Traffic. In Conference on Robot Learning. 398–407.
  • Xu et al. (2018) Zhe Xu, Zhixin Li, Qingwen Guan, Dingshui Zhang, Qiang Li, Junxiao Nan, Chunyang Liu, Wei Bian, and Jieping Ye. 2018. Large-Scale Order Dispatch in On-Demand Ride-Hailing Platforms: A Learning and Planning Approach. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 905–913.
  • Yin and Qiu (2011) Derek Yin and Tony Qiu. 2011. Comparison of macroscopic and microscopic simulation models in modern roundabout analysis. Transportation Research Record: Journal of the Transportation Research Board 2265 (2011), 244–252.