Mcity Data Collection for Automated Vehicles Study

12/12/2019 ∙ by Yiqun Dong, et al. ∙ University of Michigan

The main goal of this paper is to introduce the data collection effort at Mcity targeting automated vehicle development. We captured a comprehensive set of data from a suite of perception sensors (Lidars, Radars, Cameras) as well as vehicle steering/brake/throttle inputs and an RTK unit. Two in-cabin cameras record the human driver's behaviors for possible future use. Naturalistic driving on selected open roads is recorded at different times of day and under different weather conditions. We also perform designed choreography data collection inside the Mcity test facility, focusing on vehicle-to-vehicle and vehicle-to-vulnerable-road-user interactions, which is quite unique among existing open-source datasets. The vehicle platform, data content, tags/labels, and selected analysis results are shown in this paper.




I Introduction

Automated vehicles (AVs) have the potential to radically impact our society[societal_impact] by improving safety and reducing congestion and energy consumption. Reliable AV operation requires reliable sensing and perception of the surrounding environment, e.g., to understand the presence and future motions of road users and the governing traffic rules. Robust perception is the basis of safe and proper trajectory planning and control. To achieve reliable perception, deep neural networks are frequently used, and these require large sets of data. In recent years, many open datasets have been created and shared, first by universities and more recently by companies. Not all datasets include all three of the common AV sensor types, and the tags/labels vary considerably among those datasets.

All three types of commonly used AV sensors (cameras, lidars and radars) have strengths and weaknesses. Even within the lidar family, sensors differ considerably: the mechanical scanning Velodyne lidar we used has 32 beams and covers a much wider horizontal and vertical field of view, while the Ibeo lidar has only 4 beams and a limited field of view. Because both lidar sensors are widely used in automotive applications but commonly for different purposes (e.g., Level 4 vs. Level 2), comparing and contrasting their performance is of interest [dingzhao_multiple_lidar].

In the past, vehicle controls were largely designed based on model-based algorithms through mathematically rigorous processes [Peng_preview_control, dong_control_1, dong_control_2]. Recent advances, however, have indicated the potential of data-driven approaches [dingzhao_driving_behavior, xianan_paper], which require a large amount of training (and validation) data. Quite a few open datasets have been published, and they have helped to elevate the state of the art of data-driven approaches tremendously. Nevertheless, many of them seem to capture only naturalistic driving, i.e., they do not deliberately focus on challenging scenarios. Based on our previous work on accelerated evaluation, we believe that challenging driving behaviors should be emphasized more, yet collecting such data naturalistically is time-consuming and costly. A deliberate, choreography-designed set of scenarios conducted inside a safe and closed test facility can provide a different and useful set of data that is complementary to naturalistic data.

The overall guiding principle of our data collection effort is completeness: deploying a wide set of sensors and covering a wide array of weather and lighting conditions, diverse lane markings on diverse road topologies, and situations involving challenging interactions with other road users (vehicles, bicycles, pedestrians). In addition to collecting naturalistic data on open roads, we also capture designed scenarios inside the Mcity test facility, with a focus on intersections.

Table I compares our dataset with several other open datasets. Note that several commercial entities published some data recently, but some of these datasets have very restrictive terms of use (e.g., Waymo[waymo_dataset]), and we choose not to include them in our comparison. We summarize our contributions below:

  • Controlled diversity: We repeatedly collect naturalistic driving data on fixed routes with deliberate variation in lighting, weather, traffic, and human driver characteristics.

  • Designed choreography: We designed representative urban driving scenarios with the host vehicle interacting with another vehicle, pedestrian, or cyclist inside the Mcity test facility. The test case parameters were selected to cover both normal (courteous, law-abiding) and abnormal (aggressive, against traffic law) conditions.

  • Completeness: As shown in Table I, we use a comprehensive set of sensors.

Dataset                | Year | Locations    | Size (hr/mi) | Labeled Frames
Mcity dataset          | 2019 | AA, Mcity    | 50/3k        | 17.5k
Lyft[lyft_dataset]     | 2019 | CA           | -/-          | 55k
nuScenes[nuscenes]     | 2019 | Boston, SG   | 5.5/55       | 40k
H3D[h3d]               | 2019 | CA           | 0.77/-       | 27k
HDD[honda_dataset]     | 2018 | CA           | 104/-        | 0
AS[appolo]             | 2018 | China        | 100/-        | 144k
AS lidar[aslidar]      | 2018 | China        | 2/-          | 20k
KAIST[kaist]           | 2018 | Seoul        | -/-          | 8.9k
Vistas[mapillary]      | 2017 | 6 continents | -/-          | 25k
BDD100k[bdd]           | 2017 | NY, SF       | 1k/-         | 100k
Cityscapes[Cityscapes] | 2016 | 50 cities    | -/-          | 25k
RobotCar[oxford]       | 2015 | Oxford       | 210/620      | 0
KITTI[kitti]           | 2012 | Karlsruhe    | 1.5/-        | 15k
CamVid[camvid]         | 2008 | Cambridge    | 0.4/-        | 701
Remaining columns (per-dataset Yes/No marks not preserved): 360° FOV Lidar, Limited FOV Lidar, Lighting Diversity, Weather Diversity, Driving Behavior, Designed Choreography.
Notes: (1) All listed datasets have front-facing camera(s). We define Lighting Diversity as whether both daytime and night data were collected, and Weather Diversity as whether both clear and rainy/snowy/foggy weather are involved. Driving Behavior covers the driver's facial/posture/steering commands. Designed Choreography refers to designed vehicle-vehicle/pedestrian/bicyclist interactions. (2) In the table, "-" indicates that no information is provided. (3) AA: Ann Arbor, SG: Singapore, CA: California, NY: New York, SF: San Francisco, AZ: Arizona, WA: Washington, AS: ApolloScape, H3D: Honda Research Institute 3D Dataset, HDD: Honda Research Institute Driving Dataset.
TABLE I: Comparison of existing open datasets

The remainder of this paper is organized as follows: Section II describes related literature. Section III introduces our vehicle platform and sensor setup. Section IV outlines our effort in data calibration, tagging, and labeling. Section V presents example data analytics. Finally, Section VI gives general conclusions and the ongoing/future efforts of our work.

II Related Work

II-A Image Datasets

Many image datasets have been openly released for AV development. Examples such as ImageNet[imagenet] and COCO[coco] provide a seminal starting point for large-scale AI study. CamVid[camvid] offers semantic segmentation for 701 images, and Cityscapes[Cityscapes], captured in 50 cities, includes pixel-level annotations for 5k images. More recent datasets include Vistas[mapillary], BDD100k[bdd], and ApolloScape[appolo]. Some datasets were designed to capture particular diversities/challenges in driving. Vistas and BDD100k target large-scale naturalistic driving from many drivers with wide varieties of weather and lighting, [scnn] focuses on data for lane lines, and [cityperson, nightperson] focus on pedestrians. In the literature there were also efforts that rely exclusively on camera images for AV perception. However, 3D localization using images only is challenging [39, 54, 46, 50]. This leads to a more comprehensive sensor setup that uses both semantic sensors (cameras) and ranging sensors (Radar/Lidar/Ibeo). This combination provides better performance, or redundancy under hardware failure [nuscenes]. Many datasets released recently include both semantic and ranging sensors.

II-B Multimodal Datasets

The seminal work that conveys the strength of multimodal sensors is KITTI[kitti], which provides Lidar scans as well as stereo images and GPS/IMU data. The H3D dataset[h3d] provides annotations in a 360° view, not just for objects in front. The KAIST dataset also uses a thermal camera for night-time perception [kaist], Oxford RobotCar studies repeated driving on the same route[oxford], ApolloScape captures Lidar scans in dense traffic [aslidar], and nuScenes focuses on 360° semantic views[nuscenes]. Very recent datasets also include work from industrial entities such as Waymo[waymo_dataset] and Lyft[lyft_dataset].

II-C Driving Behavior Datasets

The aforementioned datasets primarily focused on data collection for different road environments. We believe another important aspect is the interaction with other road users and the driver's steering or speed control inputs. A prominent example of data collection not focusing on AV development is the University of Michigan safety pilot project data. This dataset captures vehicle speed, location, and front perception using a MobilEye camera. Many data analysis results have been published[dingzhao_driving_behavior, xianan_paper, xianan_paper_2]. Multimodal datasets also usually include accurate GPS positions, thus providing the possibility of extracting vehicle speed, acceleration, and heading angle for human driver modeling [kitti, nuscenes, oxford]. Datasets focused on human driver behavior also include [honda_dataset, brain4car].

II-D Annotations

In the literature, different annotation and labeling strategies have been used. For images, 2D bounding boxes[imagenet, bdd], 3D bounding boxes[kitti, nuscenes, Cityscapes], and pixel-level segmentation[coco, cam_radar_fusion, bdd] are the most common formats. When (360°) Lidar points are available, 3D bounding boxes may be provided[kitti, aslidar]. For Ibeo and Radar, annotations are usually not provided because the sensor outputs are too sparse or too complicated to annotate.

Fig. 1: Sensors on the vehicle platform.
Fig. 2: Example sensor outputs: (a) 30° FOV, (b) 60° FOV, (c) Rear, (d) Head/Eyeball, (e) Body pose, (f) Lidar (red/yellow/green point clouds), Radar (green thin cuboids), and Ibeo (white dots)

III Vehicle Platform and Sensors

We collect the data by manually driving an instrumented Lincoln MKZ. This vehicle is equipped with the following sensors:

  • 3 Velodyne Ultra Puck VLP-32C Lidars, horizontal angular resolution 0.2°, vertical 0.33°, range 200m, 10Hz.

  • 2 forward-facing cameras, 60° and 30° FOV, 1080P, 30Hz.

  • 1 backward-facing camera, 90° FOV, 1080P, 30Hz.

  • 1 Cabin pose camera, 1280×1080, 30Hz.

  • 1 Cabin head/eyeball camera, 640P, 30Hz.

  • 1 Ibeo four beam LUX sensor, horizontal angular resolution 0.25°, vertical 0.8°, range 50m, 25Hz.

  • 1 Delphi ESR 2.5 Radar, range 60m, 90° FOV, 20Hz.

  • 1 NovAtel FlexPak6 with IMU-IGM-S1 and 4G cellular for RTK GPS, single antenna, 1Hz.

The locations and example outputs of the sensors are shown in Figs. 1-2. We use two cameras for forward perception, one with a wide FOV for general object detection/tracking, the other with a narrower FOV for traffic signs and signals. For backward monitoring we use a Logitech BRIO camera with a wide FOV. The internal cabin cameras capture the body pose and head/eye movement of the human driver. We use three mechanical scanning Lidars, all on the rooftop, to capture objects in front, rear left, and rear right of the vehicle.

Fig. 3: Data collection routes. We start from Mcity, then proceed along US-23 north (#1), US-23 south (#2), M-153 east (#3) and US-94 east (#4)
Fig. 4: Data collection and image labeling examples (from the 60° FOV camera). Left to right, top to bottom: clear, night-time, rain, fog, tunnel, bridge, curved road, ramp, intersection.

We record all sensors and critical vehicle CAN bus data. The CAN bus reports the throttle, brake, and steering commands from the human driver, as well as the turn signal and high/low beam states. All the external cameras are connected to a laptop with a GPU, and the videos are recorded via the FFmpeg software. Other sensors (including the head/eyeball movement camera) and the CAN bus data are logged in ROS format.

IV Data Collection and Annotation

IV-A Data Collection Overview

The data is collected both on open roads and inside Mcity. On open roads, we focus on highways and major local roads. Three human drivers drive manually on these routes under different lighting, weather, road, and traffic conditions. See Figs. 3-4 for the four routes and examples of the collected scenes. We select routes that take roughly 1 hour round-trip. In total, over 3,000 miles have been covered. In the near future, we plan to focus on urban environments.

Fig. 5: Vehicle-vehicle interactions. (1) Low speed merge (2) Cuts in (3) Door ajar (4) Pass parallel parked vehicle (5) Roadside parked vehicle (6) Angle parked vehicle (7) Right turn (8) Left turn (9) Round-about
Fig. 6: Vehicle-pedestrian/bicyclist interactions. (1) Driving straight at an intersection (2) Right turn at an intersection (3) Left turn at an intersection (4) Follows pedestrian/bicyclist (5) Passing pedestrian/bicyclist on road (6) Pedestrian/Bicyclist yields to vehicle (7) Pedestrian/Bicyclist emerges from behind occlusion (8) Entering round-about (9) Exiting round-about.

The second set of data focuses on designed choreography inside Mcity. We refer particularly to the challenging scenarios in the Mcity ABC testing [mcity_abc], and design 18 scenarios to study vehicle-to-vehicle and vehicle-to-pedestrian/bicyclist interactions; see Figs. 5-6 and Table II. We record the sensor data for both normal (obeying traffic rules) and abnormal (disobeying traffic rules) driving behaviors. We swap the roles of the interacting vehicles when appropriate. We also repeat the collection runs three times for each scenario.

Vehicle–vehicle interactions
Scenario 1   | Low speed merge
Scenario 2   | Vehicle cuts in
Scenario 3   | Parked vehicle door ajar
Scenario 4   | Pass parallel parked vehicle
Scenario 5   | Roadside parked vehicle start up
Scenario 6   | Inclined parked vehicle start up
Scenario 7   | Intersection right turn, other straight
Scenario 8   | Intersection left turn, other right turn/straight
Scenario 9   | Vehicle entering round-about
Vehicle–pedestrian (P)/bicyclist (B) interactions
Scenario 1   | Vehicle driving straight at intersection
Scenario 2   | Vehicle right turn at intersection
Scenario 3   | Vehicle left turn at intersection
Scenario 4&5 | Vehicle follows & passes P/B on road
Scenario 6   | Pedestrian yields to vehicle driving on road
Scenario 7   | P/B emerges from behind occlusion
Scenario 8&9 | Vehicle entering & exiting round-about
Notes: Vehicle–vehicle Scenario 3: the other vehicle's door is ajar; no role swap for the recording MKZ. Vehicle–P/B Scenarios 1–3, 8, and 9: P/B uses the crosswalk to cross the road.
TABLE II: Designed choreography data collection inside Mcity.

Overall we have collected more than 50 hours of naturalistic driving data covering more than 3,000 miles, and 255 runs for the designed choreography. In total we have roughly 8 TB of ROS files and 3 TB of FFmpeg video.

IV-B Synchronization

The synchronization of the data mainly consists of temporal and spatial calibration. For the temporal calibration, we synchronize using UTC timestamps. For video recording, we tweak the FFmpeg software to report the UTC timestamp when each frame is written to disk. For ROS-formatted files, the ROS time is equivalent to the UTC timestamp.
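With every stream stamped in UTC, data from sensors running at different rates can be paired by nearest timestamp. The following is a minimal sketch of that idea, not the authors' tooling: the function name `match_nearest`, the `max_gap` tolerance, and the sample timestamps are all assumptions made for illustration.

```python
from bisect import bisect_left


def match_nearest(frame_times, scan_times, max_gap=0.05):
    """Pair each camera frame with the closest Lidar scan in time.

    Both lists hold UTC timestamps in seconds; scan_times must be sorted.
    Returns one scan index per frame, or None when no scan lies within
    max_gap seconds (a hypothetical tolerance, not a value from the paper).
    """
    matches = []
    for t in frame_times:
        i = bisect_left(scan_times, t)
        # The nearest scan is either the one just before or just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(scan_times)]
        if not candidates:
            matches.append(None)
            continue
        best = min(candidates, key=lambda j: abs(scan_times[j] - t))
        matches.append(best if abs(scan_times[best] - t) <= max_gap else None)
    return matches


# Hypothetical timestamps: a 30 Hz camera against a 10 Hz Lidar.
frames = [100.000, 100.033, 100.067, 100.100, 100.400]
scans = [100.00, 100.10, 100.20]
print(match_nearest(frames, scans))  # [0, 0, 1, 1, None]
```

The last frame maps to `None` because the closest scan is 0.2 s away, which exceeds the assumed tolerance; in practice the tolerance would be chosen from the slowest sensor's period.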

Fig. 7: Camera-Lidar calibration results.
Fig. 8: Data tagging hierarchy.

The spatial calibration mainly includes camera intrinsic/extrinsic parameter calibration and camera-Lidar, camera-Radar, and camera-Ibeo calibrations. For the camera parameters, we adopt open tools in ROS and use chessboards for the calibration. For camera-Lidar and camera-Ibeo, we follow a methodology similar to KITTI, i.e., using marker boards, which requires manual effort [yuanxin_calib]. As for camera-Radar calibration, we follow [cam_radar_fusion]. See Fig. 7 for an illustration of the camera and front Lidar alignment results.
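Once the intrinsic and extrinsic parameters are known, a Lidar point can be overlaid on the image as in Fig. 7. The sketch below shows the standard pinhole projection under simplifying assumptions: lens distortion is ignored, and the rotation, translation, focal lengths, and principal point are made-up values rather than the calibration results of this dataset.

```python
def project_point(p_lidar, R, t, fx, fy, cx, cy):
    """Project one 3D Lidar point to pixel coordinates (pinhole model).

    R (3x3 nested lists) and t (length 3) are the extrinsics that map Lidar
    coordinates into the camera frame; fx, fy, cx, cy are the intrinsics.
    Returns (u, v), or None when the point lies behind the image plane.
    """
    # Rigid transform: p_cam = R * p_lidar + t
    x, y, z = (
        sum(R[r][c] * p_lidar[c] for c in range(3)) + t[r] for r in range(3)
    )
    if z <= 0:
        return None
    # Perspective division followed by the intrinsic mapping.
    return (fx * x / z + cx, fy * y / z + cy)


# Assumed axes: Lidar x forward, y left, z up; camera x right, y down, z forward.
R = [[0, -1, 0], [0, 0, -1], [1, 0, 0]]
t = [0.0, 0.0, 0.0]  # hypothetical: sensors placed at the same origin

# A point 10 m ahead, 1 m to the left, and 0.5 m below the sensor.
pixel = project_point([10.0, 1.0, -0.5], R, t, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
print(pixel)  # (860.0, 590.0)
```

The point lands left of and below the assumed image center (960, 540), which matches its position relative to the vehicle.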

IV-C Data Tagging

We tag the data (images) for two purposes: to ease data queries, and to balance diversity against the labor of labeling during annotation. We devise four tags for each frame: road type, road (surface) condition, weather condition, and image quality. Associated labels are then assigned under each tag. See Fig. 8 for the tagging hierarchy. More explanation and a tag distribution analysis can be found in Section V.
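A per-frame tag record makes the query use case concrete. This is an illustrative sketch only: the class `FrameTags`, the helper `query`, the frame identifiers, and the label strings are hypothetical and do not reproduce the dataset's actual schema or the label set in Fig. 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameTags:
    """One record per image frame, holding the four tags described above."""
    frame_id: str
    road_type: str
    road_condition: str
    weather: str
    image_quality: str


def query(frames, **wanted):
    """Return the frames whose tags equal every keyword argument given."""
    return [
        f for f in frames
        if all(getattr(f, name) == value for name, value in wanted.items())
    ]


# Hypothetical records; the label strings are placeholders.
frames = [
    FrameTags("f0001", "straight", "normal", "sunny", "normal"),
    FrameTags("f0002", "curved", "snow coverage", "snowing", "blur"),
    FrameTags("f0003", "straight", "normal", "rainy", "lens-condensation"),
]

rainy_straight = query(frames, road_type="straight", weather="rainy")
print([f.frame_id for f in rainy_straight])  # ['f0003']
```

Filtering on tags before annotation is one way to pick a diverse subset of frames without labeling every image.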

IV-D Data Annotations

We divide our open road data over the year into 5 stages (batches). Currently we provide results primarily for the front 60° FOV camera. We have annotated more than 17.5k image frames. The annotation class list mainly consists of different objects and traffic signs. Individual files are generated for each frame, illustrating the segmentation boundaries of listed objects/traffic signs. See Fig. 4 for example results.

Fig. 9: Statistics of different labels in each group. From left to right: object, traffic lights, traffic signs, lanes.
Fig. 10: Label density (per image) of the object group.

Fig. 11: Data tagging statistics. From left to right: Road type, Road surface, Weather condition, Image quality tags.
Fig. 12: Time distribution of the 7 scenarios.
Fig. 13: Human driving commands and ego motion distribution for highway exit (blue) and right lane change (green) scenarios.

V Data Analysis and Use

V-A Image Annotations

We perform a diversity analysis of our current image annotations. We divide our annotations into 4 groups, i.e., objects, traffic lights, traffic signs, and lanes. Fig. 9 shows the statistics of different labels in each group, and Fig. 10 shows example results for the number of labels per image in the object group. For the other three annotation groups, we label traffic lights and traffic signs, visually discernible lane lines/markers/road curbs/stop lines, and each lane line segment, which makes this among the most elaborate datasets we are aware of.

Statistics pertaining to the image tagging are shown in Fig. 11. For weather conditions, most of the data were captured in normal or sunny weather. However, rainy, foggy, and snowy days were also included. Road condition mainly depicts the lighting/friction condition on the road surface. We mainly consider surface deterioration, material change, and snow coverage on the road in our tagging. While many other datasets include data only when the camera works perfectly, poor image quality due to weather or hardware malfunction should be considered. We include normal, adverse-lighting, lens-condensation, and blurred images. Road type describes the shape of the collected road/lane lines. The tagging for this property is also quite elaborate among all open datasets.

V-B Driving Behaviors

Our data includes naturalistic driving on open roads and designed choreography inside the Mcity test facility. For the latter, the behaviors are illustrated in Table II and Figs. 5-6; the analysis in this section focuses on the open road data. Following [kitti, honda_dataset], we discuss ego motions of the recording vehicle. However, we analyze the driving behaviors separately for different scenarios. Following the previous research efforts for highway entrance/exit in [highway_enter_1, highway_exit], lane change in [LC_2], and intersection interactions in [Intersection_1], we split the recorded data in each run into 7 different scenarios, i.e., left turn, left lane change (LC), ramp entrance, right turn, right LC, highway exit, and lane keeping. We then organize the data following these scenarios. See the distribution plot of the 7 scenarios in Fig. 12. We also mark the total number of recorded instances of each scenario in the figure. To the best of our knowledge, our dataset is the only one organized according to driving scenarios.
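The scenario-based organization can be pictured as a set of labeled time segments per run, from which counts and total durations like those in Fig. 12 follow directly. The sketch below assumes a hypothetical segment format of `(scenario, start_s, end_s)` tuples and uses invented times; it is not the dataset's file layout.

```python
SCENARIOS = (
    "left turn", "left LC", "ramp entrance", "right turn",
    "right LC", "highway exit", "lane keeping",
)


def scenario_summary(segments):
    """Count segments and sum their durations for each of the 7 scenarios.

    segments: iterable of (scenario, start_s, end_s) tuples, a hypothetical
    format chosen for this sketch. Returns {scenario: (count, total_seconds)}.
    """
    acc = {name: [0, 0.0] for name in SCENARIOS}
    for scenario, start_s, end_s in segments:
        if scenario not in acc:
            raise ValueError(f"unknown scenario: {scenario!r}")
        acc[scenario][0] += 1
        acc[scenario][1] += end_s - start_s
    return {name: (count, total) for name, (count, total) in acc.items()}


# Invented segments from a single run, times in seconds.
run = [
    ("lane keeping", 0.0, 120.0),
    ("right LC", 120.0, 126.0),
    ("lane keeping", 126.0, 300.0),
    ("highway exit", 300.0, 315.0),
]
summary = scenario_summary(run)
print(summary["lane keeping"])  # (2, 294.0)
print(summary["highway exit"])  # (1, 15.0)
```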

The results of human commands and vehicle motion for the right LC and highway exit scenarios can be seen in Fig. 13. Although both scenarios should use the right turn signal, the distributions of the states and commands are distinctly separated. This indicates the need to organize data based on driving scenarios. We are currently annotating the collected data to summarize the perception data.
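A comparison like the one in Fig. 13 starts from grouping command samples by scenario and summarizing each group. The following sketch uses invented sample values and a hypothetical `(scenario, steering_rad, speed_mph)` record layout; it illustrates the grouping step only and does not reproduce the paper's results.

```python
from statistics import mean, stdev


def command_stats(samples):
    """Summarize steering and speed per scenario.

    samples: list of (scenario, steering_rad, speed_mph) tuples, a
    hypothetical layout for this sketch. Returns a dict keyed by scenario.
    """
    grouped = {}
    for scenario, steering, speed in samples:
        grouped.setdefault(scenario, []).append((steering, speed))

    stats = {}
    for scenario, rows in grouped.items():
        steering = [row[0] for row in rows]
        speed = [row[1] for row in rows]
        stats[scenario] = {
            "steering_mean": mean(steering),
            # The sample standard deviation needs at least two points.
            "steering_std": stdev(steering) if len(steering) > 1 else 0.0,
            "speed_mean": mean(speed),
        }
    return stats


# Invented values for illustration only.
samples = [
    ("right LC", 0.02, 60.0),
    ("right LC", 0.04, 62.0),
    ("highway exit", 0.10, 40.0),
    ("highway exit", 0.14, 44.0),
]
stats = command_stats(samples)
for name, values in stats.items():
    print(name, values)
```

With real data, clearly different per-scenario means and spreads would be the numerical counterpart of the separated distributions described above.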

V-C The Complete Driving Flow

Fig. 14: Obeying traffic rules: a complete unprotected right turn on open roads. This figure depicts the same scene as Fig. 2. The driver (black vehicle) coasts down to the intersection (1-2), brakes to a full stop at the stop line, and turns their head to the left to check oncoming traffic (3). Once a safe gap is found, the driver turns their head back, steers the vehicle (4-6), and accelerates (7-8). Note that we draw the vehicle state and the driver's commands in each plot; bar on the left: speed (MPH), right: normalized throttle (up)/brake (down), bottom: steering angle (rad, positive to the right).
Fig. 15: Disobeying traffic rules: a complete unprotected left turn inside Mcity. The human driver (black vehicle) did not yield to the oncoming vehicle. In (1-3), the driver coasts down to the intersection, turns their head, visually checks the oncoming traffic, and evaluates the safe gap. In (4-7), the driver does not yield, accelerates, and turns. In (8), both vehicles accelerate to leave the intersection. In this turn, the gap was small: the oncoming vehicle had to brake to avoid a collision. The driver's commands are: bar on the left: speed (MPH), right: normalized throttle (up)/brake (down), bottom: steering angle (rad, positive to the right).

Our data also provides the complete flow of the human driver's actions. We show two examples in Figs. 14-15. Fig. 14 depicts an unprotected right turn on open roads, in which the human driver finds a safe gap to make the turn. In Fig. 15, we illustrate a trip wherein the driver disobeys traffic rules in an unprotected left turn. In both figures, we record and plot the visual perception direction (gray arrow) and the throttle, brake, and steering commands; the ego vehicle speed is also shown. We believe that, in addition to being efficient and safe, being naturalistic is also a desired trait for AVs. The complete data capture will be useful for such analysis.

VI Conclusion and Future Works

This paper presents the ongoing data collection effort at Mcity. Compared to existing datasets, our data is complete in that it covers all commonly used sensor types. We collect the data both naturalistically on open roads and inside the Mcity test facility with designed choreography. We perform a preliminary analysis of our data, which uses tags to indicate different driving scenarios and conditions.


We want to thank the many students and engineers at Mcity for the vehicle platform development and their thoughtful suggestions on data analysis. We also thank Seres for providing the vehicle and funding the project, and Mighty AI for image labeling.