In recent years, various companies and organizations have been working on one of the most challenging problems in transportation: a fully autonomous, self-driving vehicle. To navigate autonomously and make decisions, vehicles need to make sense of their environment. Many research groups have relied on camera-based solutions because cameras offer high color resolution at low cost. On the other hand, 3D range sensors such as light detection and ranging (LIDAR) scanners have become more appealing and feasible with advances in consumer-grade technology. Each sensor has its own strengths and weaknesses, so for a vehicle to handle a variety of operating conditions, a more robust solution involving the fusion of multiple sensors may be required. Systems today use a combination of 3D scanners, high-resolution cameras and GPS/INS to enable autonomy. Whatever sensors they use, autonomous vehicles will have to navigate dynamic urban environments and successfully handle a variety of scenarios and operating conditions. They will also have to negotiate with other autonomous and non-autonomous vehicles on the road, which opens up research avenues in multi-agent autonomous systems. The Multi-AV Seasonal dataset can provide a basis for improving state-of-the-art robotics algorithms for multi-agent autonomous systems and for making them more robust to seasonal and urban variations. We hope that this dataset will be very useful to the robotics and artificial intelligence community and will provide new research opportunities in collaborative autonomous driving.
2 Related Work
Within the past decade, there has been significant advancement in autonomy. Much of this development has been data-centric, with researchers relying on publicly available datasets. Pandey et al. (2011) retrofitted a Ford F-250 pickup truck with a Velodyne HDL-64E lidar, a Point Grey Ladybug3 omnidirectional camera, a Riegl LMS-Q120 lidar, an Applanix POS LV and an Xsens MTi-G IMU to release one of the first publicly available datasets. This dataset includes small and large loop-closure events, feature-rich downtown areas and parking lots, which make it useful for computer vision and simultaneous localization and mapping (SLAM) algorithms.
The KITTI dataset (Geiger et al., 2013) is another benchmark dataset collected from an autonomous vehicle platform. Its sensor data aims to advance research in optical flow, visual odometry, and 3D object detection and tracking. Along with the raw data, the dataset includes ground truth and benchmark metrics (Geiger et al., 2012), which enable evaluating a new algorithm against the state of the art. The University of Oxford's RobotCar dataset (Maddern et al., 2017) is rich in both sensor variety and seasonal coverage. Its focus is long-term localization and mapping: it contains data from stereo cameras, monocular cameras, lidars and GPS/IMU collected over a span of 17 months, covering all weather conditions and construction scenarios. None of the datasets mentioned above provide map information. The nuScenes dataset (Caesar et al., 2019) contains semantic maps with information on roads, sidewalks and crosswalks, to be used as a prior for object detection, tracking and localization. Argoverse (Chang et al., 2019), along with vehicle trajectories and 3D object bounding boxes, also contains maps of the drivable area and vector maps of lane centerlines and their connectivity. ApolloScape (Huang et al., 2018), Cityscapes (Cordts et al., 2016) and Mapillary (Neuhold et al., 2017) are other datasets that focus on semantic segmentation using a combination of images and lidar. Such datasets also exist in other robotics domains, such as the CMU-YCB object and model set for benchmarking in robotic manipulation research (Sun et al., 2018). One limitation of the datasets mentioned so far is that they are mostly collected from a single autonomous vehicle.
Here we present a large-scale multi-AV dataset augmented with 3D maps of the environment. It provides a significant database for autonomous vehicle research as well as for multi-agent research, including cooperative localization (Zhang et al., 2016).
The major contributions of this research are:
A Multi-Agent dataset with seasonal variations in weather, lighting, construction and traffic conditions
Full-resolution, time-stamped data from 4 lidars and 7 cameras
GPS data, IMU pose trajectory and SLAM-corrected ground truth pose
High-resolution 2D ground-plane lidar reflectivity maps and 3D point cloud maps
All data can be visualized, modified and applied using the open-source Robot Operating System (ROS) (Quigley et al., 2009)
The Ford Multi-AV Seasonal dataset consists of diverse scenarios that include:
Seasonal/weather variation - sunny, cloudy, fall, snow
Traffic conditions - construction, oversized vehicles, pedestrians, congestion, underpasses, bridges, tunnels
Driving environments - highway, university campus, residential areas, airport pick-up/drop-off
3 Hardware Platform
We used a fleet of 2014 Ford Fusion Hybrids as the base platform. All sensors were strategically placed on the vehicle as shown in Figure 4. The trunk of the car was used to install four quad-core i7 processors with 16 GB RAM, networking devices and a cooling mechanism. All post-processing was done on a Dell Precision 7710 laptop. Each vehicle was equipped with the following sensors:
3.1 Velodyne HDL-32E Lidars
Each vehicle is equipped with a set of 4 Velodyne HDL-32E lidars (Velodyne, 2011) mounted on the roof. This lidar consists of 32 laser diodes providing an effective 40° vertical field of view. The entire unit spins about its vertical axis to provide a full 360° azimuthal field of view. We captured our dataset with the lidars spinning at 10 Hz. The combined point cloud provides 360° coverage around the car. We eliminate the points that hit the vehicle itself by designing a custom mask for each lidar.
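The form of the per-lidar masks is not described here. As a minimal sketch, assuming a mask can be approximated by an axis-aligned box around the car in the body frame (the box limits `BOX_MIN`/`BOX_MAX` are hypothetical placeholders, not the dataset's values), self-hits can be removed once a scan has been transformed into the body frame:

```python
import numpy as np

# Hypothetical self-hit region: an axis-aligned box around the car, expressed in
# the vehicle body frame (origin at the rear-axle center). These limits are
# illustrative placeholders, not the dataset's actual mask parameters.
BOX_MIN = np.array([-1.5, -1.2, -0.5])  # metres
BOX_MAX = np.array([4.5, 1.2, 1.8])     # metres


def remove_self_hits(points_body, box_min=BOX_MIN, box_max=BOX_MAX):
    """Return the rows of an (N, 3) body-frame point array that lie outside the box."""
    pts = np.asarray(points_body, dtype=float)
    # A point is a self-hit if every coordinate lies within the box limits.
    inside = np.all((pts >= box_min) & (pts <= box_max), axis=1)
    return pts[~inside]
```

A real mask may instead be defined per lidar in the sensor frame (for example as blocked azimuth/beam ranges); the box form is only the simplest choice for illustration.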
3.2 Flea3 GigE Point Grey Cameras
Each vehicle in our fleet is equipped with 7 Flea3 GigE Point Grey cameras (FLIR, 2017). These are CCD color cameras with a 12-bit ADC and a global shutter. Two stereo pairs of 1.3 MP cameras, a front left-right pair and a rear left-right pair, are mounted on the roof of each vehicle. In addition, there are two 1.3 MP cameras on the sides and one 5 MP center dash camera mounted near the rear-view mirror. Together, these 7 cameras provide wide coverage around the vehicle (an 80° field of view for each 1.3 MP camera and 40° for the 5 MP camera). We captured images at 15 Hz for the 1.3 MP stereo pairs and side cameras, and at 6 Hz for the center 5 MP camera.
3.3 Applanix POS LV
The Applanix POS LV (Applanix, 2016) is a professional-grade, compact, fully integrated, turnkey position and orientation system combining a differential GPS, an inertial measurement unit (IMU) rated with 1° of drift per hour, and a 1024-count wheel encoder to estimate the relative position, orientation, velocity, angular rate and acceleration of the vehicle. In our dataset we provide the 6-DOF pose estimates obtained by integrating the acceleration and velocity.
4 Sensor Calibration
All the sensors on each vehicle are fixed and calibrated with respect to the origin of the vehicle, i.e. the center of the rear axle. We provide the intrinsic calibration and the extrinsic rigid-body transformation of each sensor with respect to this origin, also called the body frame of the vehicle. The transformation is represented by the 6-DOF pose of a sensor coordinate frame, where $X_{AB} = [x_{AB}, y_{AB}, z_{AB}, \phi_{AB}, \theta_{AB}, \psi_{AB}]^\top$ (Smith et al., 1988) denotes the 6-DOF pose of frame A (child) relative to frame B (parent), with translation $(x, y, z)$ and roll, pitch and yaw angles $(\phi, \theta, \psi)$. We follow the same procedure for each respective sensor on all the vehicles in the fleet. The calibration procedures are summarized below.
4.1 Applanix to Body Frame
The Applanix is installed close to the origin of the body frame, which is defined as the center of the rear axle of the vehicle. A research-grade coordinate measuring machine (CMM) was used to precisely obtain the position of the Applanix with respect to the body frame. The typical precision of a CMM is on the order of micrometers, so for all practical purposes we treat the relative position obtained from the CMM as exact. We provide the 6-DOF position and orientation of the Applanix relative to the body frame for each vehicle, given by $X_{ab} = [x_{ab}, y_{ab}, z_{ab}, \phi_{ab}, \theta_{ab}, \psi_{ab}]^\top$.
4.2 Lidar Calibration
The dataset includes the beam offsets provided by Velodyne (Velodyne, 2011) and reflectivity calibration for each HDL-32E lidar on each vehicle (Levinson and Thrun, 2014). Because each laser beam of the lidar has its own reflectivity response, the reflectivity calibration is provided as a separate file for each lidar.
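The layout of the calibration files is not described here. Purely as an illustration, assume each lidar's calibration has been loaded into a hypothetical 32 × 256 table `lut` whose entry `[b, i]` is the calibrated reflectivity for beam `b` returning raw intensity `i`, in the spirit of the per-beam mapping of Levinson and Thrun (2014). Applying it is then a vectorized lookup:

```python
import numpy as np

NUM_BEAMS = 32         # HDL-32E: 32 laser diodes
NUM_INTENSITIES = 256  # raw intensity assumed to be an 8-bit value


def apply_reflectivity_calibration(beam_ids, raw_intensity, lut):
    """Look up the calibrated reflectivity of each point.

    beam_ids      -- int array (N,), laser index in [0, NUM_BEAMS)
    raw_intensity -- int array (N,), raw intensity in [0, NUM_INTENSITIES)
    lut           -- float array (NUM_BEAMS, NUM_INTENSITIES); hypothetical layout,
                     not the dataset's file format
    """
    beam_ids = np.asarray(beam_ids, dtype=int)
    raw_intensity = np.asarray(raw_intensity, dtype=int)
    if lut.shape != (NUM_BEAMS, NUM_INTENSITIES):
        raise ValueError("unexpected lookup-table shape")
    # Fancy indexing picks lut[beam_ids[k], raw_intensity[k]] for every point k.
    return lut[beam_ids, raw_intensity]
```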
To obtain an initial estimate of the lidar poses in 6-DOF, we used the CMM to precisely measure the position of some known reference points on the car with respect to the body frame. The measured CMM points are denoted $X_{bp}$. We then manually measured the position of each lidar from one of these reference points to obtain $X_{pl}$. The transformation of the lidar with respect to the body frame is then obtained by compounding these two transformations (Smith et al., 1988). This estimate initializes the Generalized Iterative Closest Point (GICP) algorithm (Segal et al., 2009), which matches the scans from each lidar against those of the other lidars to obtain a centimeter-level accurate lidar-to-body transformation $X_{lb} = [x_{lb}, y_{lb}, z_{lb}, \phi_{lb}, \theta_{lb}, \psi_{lb}]^\top$ for each lidar.
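A minimal sketch of the compounding step, assuming poses are stored as `[x, y, z, roll, pitch, yaw]` with angles in radians and rotations composed as R = Rz(yaw)·Ry(pitch)·Rx(roll) (a common convention, assumed here rather than confirmed by the text). `compound(p_ij, p_jk)` returns the pose of frame k in frame i, i.e. the head-to-tail operation of Smith et al. (1988) evaluated with 4 × 4 homogeneous matrices; the GICP refinement is not shown.

```python
import numpy as np


def pose_to_matrix(pose):
    """[x, y, z, roll, pitch, yaw] -> 4x4 homogeneous transform (R = Rz @ Ry @ Rx)."""
    x, y, z, roll, pitch, yaw = pose
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    t = np.eye(4)
    t[:3, :3] = rz @ ry @ rx
    t[:3, 3] = [x, y, z]
    return t


def matrix_to_pose(t):
    """4x4 homogeneous transform -> [x, y, z, roll, pitch, yaw] (|pitch| < 90 deg assumed)."""
    r = t[:3, :3]
    roll = np.arctan2(r[2, 1], r[2, 2])
    pitch = np.arcsin(np.clip(-r[2, 0], -1.0, 1.0))
    yaw = np.arctan2(r[1, 0], r[0, 0])
    return np.array([t[0, 3], t[1, 3], t[2, 3], roll, pitch, yaw])


def compound(pose_ij, pose_jk):
    """Head-to-tail compounding: pose of frame k in frame i."""
    return matrix_to_pose(pose_to_matrix(pose_ij) @ pose_to_matrix(pose_jk))
```

For example, with hypothetical numbers, a reference point at `[1.0, 0.0, 1.5, 0, 0, pi/2]` in the body frame and a lidar at `[0.5, 0.0, 0.2, 0, 0, 0]` relative to that point give a lidar pose of `[1.0, 0.5, 1.7, 0, 0, pi/2]` in the body frame.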
4.3 Camera Calibration
Intrinsic calibration was performed on each of the cameras using the method described in AprilCal (Richardson et al., 2013), an interactive, suggestion-based calibrator that performs real-time detection of fiducial markers. All images in the dataset are undistorted using the camera intrinsic matrix $K$ and distortion coefficients $D$:

$$K = \begin{bmatrix} f_x & 0 & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad D = \begin{bmatrix} d_1 & d_2 & \cdots & d_n \end{bmatrix}$$

where $f_x$ and $f_y$ are the focal lengths, $x_0$ and $y_0$ are the principal point offsets and $D$ is the set of distortion coefficients. We provide ROS-format YAML files with the camera intrinsic, rotation and projection matrices.
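Because the images are already undistorted, a pixel location can be predicted from the 3 × 3 intrinsic matrix (built from fx, fy, x0, y0) alone. The sketch below assumes the usual pinhole convention with z along the optical axis; the example intrinsic values in the usage are hypothetical.

```python
import numpy as np


def project_to_pixels(K, points_cam):
    """Project (N, 3) points given in the camera frame onto the image plane.

    Assumes rectified, undistorted images (so distortion is ignored) and the
    pinhole convention with z along the optical axis. Points behind the camera
    are dropped. Returns (uv, keep): (M, 2) pixel coordinates and a boolean mask.
    """
    pts = np.asarray(points_cam, dtype=float)
    keep = pts[:, 2] > 0.0
    # Each row of uvw equals K @ p for one kept point p.
    uvw = pts[keep] @ np.asarray(K, dtype=float).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return uv, keep
```

With fx = fy = 800 and a principal point at (640, 480), a point 10 m ahead at (1.0, 0.5, 10.0) lands at pixel (720, 520).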
Extrinsic calibration was performed to find the pose of each camera with respect to the body frame of the car. We use the method described in Pandey et al. (2012), a mutual-information-based algorithm that provides the relative transformation from the camera to the lidar ($X_{cl}$). We then use the lidar extrinsics ($X_{lb}$) to compute the pose of the camera relative to the body frame ($X_{cb}$).
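As a notational sketch, assume $T_{AB}$ is the 4 × 4 homogeneous matrix corresponding to $X_{AB}$, i.e. it maps a point expressed in frame A into frame B. The camera-to-body chain, and its use for moving points between the body and camera frames, then reads:

$$T_{cb} = T_{lb}\,T_{cl}, \qquad \tilde{\mathbf{p}}_b = T_{cb}\,\tilde{\mathbf{p}}_c, \qquad \tilde{\mathbf{p}}_c = T_{cb}^{-1}\,\tilde{\mathbf{p}}_b$$

where $\tilde{\mathbf{p}}$ denotes a point in homogeneous coordinates. A lidar return mapped into the body frame with $X_{lb}$ can therefore be expressed in the camera frame with $T_{cb}^{-1}$ and then projected with the camera intrinsic matrix.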
All data was collected by a fleet of Ford Fusion vehicles outfitted with an Applanix POS LV inertial measurement unit (IMU), four Velodyne HDL-32E 3D lidar scanners, two Point Grey 1.3 MP stereo camera pairs, two Point Grey 1.3 MP side cameras and one Point Grey 5 MP dash camera. The vehicles traversed an average route of 66 km in Michigan that included a mix of driving scenarios such as the Detroit Airport, freeways, city centers, a university campus and suburban neighbourhoods. A sample trajectory of one vehicle in one of the runs is shown in Figure 6.
This multi-agent autonomous vehicle data captures the seasonal variation in weather, lighting, construction and traffic conditions experienced in dynamic urban environments, and can help in designing robust algorithms for autonomous vehicles and multi-agent systems. Each log in the dataset is time-stamped and contains raw data from all sensors, calibration values, the pose trajectory, the ground truth pose and 3D maps. All data is available in rosbag format, which can be visualized, modified and applied using ROS. We also provide the output of state-of-the-art reflectivity-based localization (Levinson et al., 2007; Levinson and Thrun, 2010) with centimeter-level accuracy for benchmarking purposes. Each parent folder in the data directory represents a drive and is named by its date. Each drive has sub-directories for the vehicles used and for the corresponding 3D maps. Each vehicle sub-directory contains a rosbag with all the data associated with that vehicle in that drive. We also provide scripts and commands to convert all of this data to a human-readable format. The dataset is freely available to download at avdata.ford.com, a collaboration with the AWS Public Dataset Program (AWS Open Data, 2018).
Each rosbag contains the standard ROS messages described in Table 1.
| Source | Type | ROS Message | ROS Topic | Max Frequency (Hz) |
|---|---|---|---|---|
| Red Lidar | 3D Scan | velodyne_msgs/VelodyneScan | /lidar_red_scan | 10 |
| Green Lidar | 3D Scan | velodyne_msgs/VelodyneScan | /lidar_green_scan | 10 |
| Blue Lidar | 3D Scan | velodyne_msgs/VelodyneScan | /lidar_blue_scan | 10 |
| Yellow Lidar | 3D Scan | velodyne_msgs/VelodyneScan | /lidar_yellow_scan | 10 |
| Front-Left Camera | 1.3 MP Image Thumbnail | sensor_msgs/Image | /image_front_left | 15 |
| GPS Time | GPS Time | sensor_msgs/TimeReference | /gps_time | 200 |
| Raw Pose | 3D Pose | geometry_msgs/PoseStamped | /pose_raw | 200 |
| Localized Pose | 3D Pose | geometry_msgs/PoseStamped | /pose_localized | 20 |
| Ground Truth Pose | 3D Pose | geometry_msgs/PoseStamped | /pose_ground_truth | 200 |
| Raw Linear Velocity | 3D Velocity | geometry_msgs/Vector3Stamped | /velocity_raw | 200 |
5.1 3D Lidar Scans
Each vehicle is equipped with 4 HDL-32E lidars, whose data is provided as standard VelodyneScan ROS messages. We designate these lidars Yellow, Red, Blue and Green, going from left to right when viewed from above.
5.2 Camera Images
Each rosbag contains images from all 7 cameras on the vehicle. The two front and the two rear cameras are 1.3 MP stereo pairs operating at a maximum rate of 15 Hz. The two 1.3 MP side cameras also operate at 15 Hz. The front dash camera produces 5 MP images at a maximum rate of 6 Hz. All images have been rectified using the intrinsic parameters and stored as PNG images. All 1.3 MP cameras are triggered simultaneously, so there is an image from each of these cameras with the same timestamp. An automated tool (Understand AI, 2018) was used to blur vehicle licence plates and people's faces in all camera images.
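Because the 1.3 MP cameras share trigger timestamps, synchronized image sets can be formed by exact timestamp matching. A minimal sketch, assuming images have already been read from the rosbag into `(camera_name, stamp_ns, image)` tuples, with stamps as integer nanoseconds (for example `header.stamp.to_nsec()` in rospy) to avoid floating-point comparisons; the camera names are hypothetical:

```python
from collections import defaultdict


def group_synchronized_frames(frames, cameras):
    """Group images that share an identical timestamp.

    frames  -- iterable of (camera_name, stamp_ns, image) tuples
    cameras -- camera names that must all be present for a set to be kept
    Returns {stamp_ns: {camera_name: image}} ordered by stamp, keeping only
    complete sets.
    """
    by_stamp = defaultdict(dict)
    for camera, stamp_ns, image in frames:
        by_stamp[stamp_ns][camera] = image
    required = set(cameras)
    return {stamp: images for stamp, images in sorted(by_stamp.items())
            if required.issubset(images)}
```

The 5 MP dash camera runs at a different rate, so its frames would need nearest-timestamp matching rather than exact grouping.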
The IMU data consists of linear acceleration in m/s² and angular velocity in rad/s. These values describe the rate of change of the body frame's velocity and orientation. The IMU frame is oriented exactly like the body frame.
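To illustrate how these quantities relate to the vehicle pose, here is a minimal planar dead-reckoning sketch. It assumes the body x axis points forward and that the yaw rate about the vertical axis is positive counter-clockwise; gravity compensation, sensor biases and lateral slip are ignored, so it drifts quickly and is not how the Applanix solution is computed.

```python
import math


def dead_reckon(samples, dt, x=0.0, y=0.0, yaw=0.0, v=0.0):
    """Planar dead reckoning with forward Euler integration.

    samples -- iterable of (a_forward [m/s^2], yaw_rate [rad/s]) tuples
    dt      -- sample period in seconds (0.005 s for a 200 Hz stream)
    Returns the final (x, y, yaw, v).
    """
    for a_forward, yaw_rate in samples:
        yaw += yaw_rate * dt          # integrate angular velocity -> heading
        v += a_forward * dt           # integrate acceleration -> speed
        x += v * math.cos(yaw) * dt   # integrate velocity -> position
        y += v * math.sin(yaw) * dt
    return x, y, yaw, v
```

For example, one second of 200 Hz samples with zero acceleration and zero yaw rate, starting at 10 m/s, moves the vehicle 10 m along x.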
The GPS data provides the latitude, longitude and altitude of the body frame with respect to the WGS84 frame. The data is published at 200 Hz but can be sub-sampled to simulate a cheaper GPS receiver. We also provide the GPS time as seconds since the start of the GPS week, which helps in synchronizing logs from multiple vehicles driven at the same time.
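Both uses can be sketched briefly, assuming the time-of-week stamps have already been read from each vehicle's `/gps_time` topic into sorted lists of seconds. The 10 ms pairing tolerance is an arbitrary choice for illustration, and the sketch assumes both logs fall within the same GPS week (no week rollover).

```python
import bisect


def subsample(samples, src_hz=200, dst_hz=1):
    """Keep every (src_hz // dst_hz)-th sample to mimic a lower-rate receiver."""
    step = max(1, src_hz // dst_hz)
    return samples[::step]


def nearest_index(sorted_times, t):
    """Index of the element of sorted_times closest to t (sorted_times non-empty)."""
    i = bisect.bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return i - 1
    return i if sorted_times[i] - t < t - sorted_times[i - 1] else i - 1


def align_vehicles(tow_a, tow_b, max_gap_s=0.01):
    """Pair each GPS time-of-week stamp of vehicle A with the closest stamp of vehicle B.

    Both inputs are sorted seconds since the start of the GPS week; pairs further
    apart than max_gap_s are dropped. Returns a list of (index_a, index_b).
    """
    pairs = []
    for ia, t in enumerate(tow_a):
        ib = nearest_index(tow_b, t)
        if abs(tow_b[ib] - t) <= max_gap_s:
            pairs.append((ia, ib))
    return pairs
```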
5.5 Global 3D Maps
Each drive in the dataset is accompanied by two types of global 3D maps, a ground-plane reflectivity map and a 3D point cloud map of non-road points, as shown in Figure 14. These maps are provided in the open-source PCD format (Point Cloud Library, 2010). The global prior map of the environment is estimated using the standard maximum a posteriori (MAP) estimate given the position measurements from odometry and various sensors (Durrant-Whyte and Bailey, 2006a, b). We use pose-graph SLAM with lidars to produce a 3D map of the environment. The pose graph is built from odometry, 3D lidar scan matching and GPS constraints, where the lidar constraints are imposed via generalized iterative closest point (GICP) (Segal et al., 2009). To solve the resulting least-squares problem, we use the incremental smoothing and mapping (iSAM) implementation (Kaess et al., 2008) in an offline fashion. From the optimized pose graph, we create a dense ground plane and a full 3D point cloud, which includes buildings, vegetation, road signs, etc., by accumulating points in the corresponding cells of a 3D grid. Since lidar reflectivity ranges from 0 to 100, we scale the values linearly to 0-255, where 0 represents no data. The same convention is used in the local reflectivity map.
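The accumulation and scaling steps for a 2D reflectivity grid can be sketched as follows, assuming the points have already been transformed into the map frame with the optimized poses (the pose-graph optimization itself is not shown). Averaging per cell is our assumption, and so is clamping observed cells to a minimum of 1: the text does not say how a measured reflectivity of 0 is stored, and clamping keeps 0 reserved for "no data".

```python
import numpy as np


def reflectivity_grid(points_xy, reflectivity, origin, cell_size, shape):
    """Average lidar reflectivity (0-100) into a 2D grid and scale it to uint8.

    points_xy    -- (N, 2) map-frame x, y coordinates in metres
    reflectivity -- (N,) reflectivity values in [0, 100]
    origin       -- (2,) map coordinates of the grid corner
    cell_size    -- cell edge length in metres (e.g. 0.1)
    shape        -- (rows, cols) of the grid; row index follows x, column follows y
    """
    ij = np.floor((np.asarray(points_xy, dtype=float) - origin) / cell_size).astype(int)
    valid = np.all((ij >= 0) & (ij < np.array(shape)), axis=1)
    ij, refl = ij[valid], np.asarray(reflectivity, dtype=float)[valid]

    total = np.zeros(shape)
    count = np.zeros(shape)
    np.add.at(total, (ij[:, 0], ij[:, 1]), refl)  # unbuffered sum per cell
    np.add.at(count, (ij[:, 0], ij[:, 1]), 1)

    grid = np.zeros(shape, dtype=np.uint8)        # 0 = no data
    seen = count > 0
    mean = total[seen] / count[seen]
    grid[seen] = np.clip(np.round(mean * 255.0 / 100.0), 1, 255).astype(np.uint8)
    return grid
```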
Our localization framework uses measurements from the 3D lidar scanners to build a 2D local grid map of reflectivity around the vehicle. Localization runs online by matching this local grid map with the prior map in the x-y plane, a process also called image registration. The local grid has a 10 cm cell resolution, and its cell values represent the same property of the world as those stored in the global maps. The local grid map is computed and updated online from the accumulated lidar points, which are motion-compensated using inertial measurements.
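The planar GPS constraint of zero height in the optimization of the global prior map described in section 5.5 simplifies the registration problem to a 3-DOF search over the $x$, $y$ and yaw ($\psi$) of the vehicle pose. Here, we propose to maximize the normalized mutual information (NMI) (Studholme et al., 1999) between the reflectivity of candidate patches from the prior map and the local grid map (Wolcott and Eustice, 2014):

$$(\hat{x}, \hat{y}, \hat{\psi}) = \underset{x \in \mathcal{X},\; y \in \mathcal{Y},\; \psi \in \Psi}{\arg\max}\ \mathrm{NMI}\big(M(x, y, \psi),\, L\big), \qquad \mathrm{NMI}(A, B) = \frac{\mathcal{H}(A) + \mathcal{H}(B)}{\mathcal{H}(A, B)}$$

where $M(x, y, \psi)$ is the prior-map patch at the candidate pose, $L$ is the local grid map, $\mathcal{H}(\cdot)$ denotes (joint) entropy, and $\mathcal{X}$, $\mathcal{Y}$ and $\Psi$ represent our search space in all 3 degrees of freedom. The map matching is agnostic to the filter used to fuse data from the various sensors. We use an EKF mainly to localize the vehicle in 3-DOF with state vector $\mu = [x, y, \psi]^\top$. In addition, the correction in the $z$ direction is obtained from a prior-map lookup once the image registration corrections in $x$ and $y$ are available. We use the standard EKF predict and update equations:

$$\bar{\mu}_t = F_t\,\mu_{t-1}, \qquad \bar{\Sigma}_t = F_t\,\Sigma_{t-1}\,F_t^\top + Q_t$$

$$K_t = \bar{\Sigma}_t H_t^\top \left(H_t \bar{\Sigma}_t H_t^\top + R_t\right)^{-1}, \qquad \mu_t = \bar{\mu}_t + K_t\left(\mathbf{z}_t - H_t\,\bar{\mu}_t\right), \qquad \Sigma_t = \left(I - K_t H_t\right)\bar{\Sigma}_t$$

where $F_t$ represents the state transition matrix obtained from the Applanix GNSS solution, $H_t$ represents the linearized measurement model, $\mathbf{z}_t$ and $R_t$ represent the measurement and its uncertainty obtained from the image registration process, $Q_t$ is the process noise covariance, and $K_t$ represents the Kalman gain. We show the results of this localization technique for a sample log, measured against the ground truth, in Figure 10. The localization error is well within the localization requirements for autonomous vehicles defined by Reid et al. (2019).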
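A minimal sketch of the registration score and search, assuming uint8 reflectivity grids in which 0 means no data. It computes Studholme's normalized mutual information, (H(A) + H(B)) / H(A, B), from a joint histogram and performs an exhaustive search over integer cell shifts. The 32-bin histogram is an arbitrary choice; the heading search, sub-cell refinement and the authors' actual implementation are not shown.

```python
import numpy as np


def _entropy(counts):
    """Shannon entropy (nats) of a histogram given as an array of counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))


def nmi(a, b, bins=32):
    """Normalized mutual information (H(A) + H(B)) / H(A, B) of two equally sized
    uint8 grids. Cells that are 0 ("no data") in either grid are ignored."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    joint, _, _ = np.histogram2d(a[mask], b[mask], bins=bins,
                                 range=[[0, 256], [0, 256]])
    h_joint = _entropy(joint.ravel())
    if h_joint == 0.0:
        return 0.0  # both patches constant: nothing to register on
    return (_entropy(joint.sum(axis=1)) + _entropy(joint.sum(axis=0))) / h_joint


def register_xy(prior, local, max_shift):
    """Exhaustive search over integer cell shifts (dx, dy) in [-max_shift, max_shift].

    `prior` is a prior-map patch centred on the predicted pose and must be
    2 * max_shift cells larger than `local` in each dimension.
    Returns (best_nmi, dx, dy).
    """
    h, w = local.shape
    if prior.shape != (h + 2 * max_shift, w + 2 * max_shift):
        raise ValueError("prior patch has the wrong size")
    best = (-np.inf, 0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            r0, c0 = max_shift + dy, max_shift + dx
            score = nmi(prior[r0:r0 + h, c0:c0 + w], local)
            if score > best[0]:
                best = (score, dx, dy)
    return best
```

NMI ranges from 1 (independent patches) to 2 (identical patches), which makes scores comparable across candidate poses regardless of how much the patches overlap.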
Sample output from visualization and development tools. (a) RViz visualization of the map and sensor data. (b) Reflectivity map and 3D pointcloud map. (c) Multi lidar live data visualization
5.7 Ground Truth Pose
An important contribution of this work is that we provide the ground truth pose for all our logs as standard ROS pose messages. This can help researchers design their own localization algorithms and compute 6-DOF errors against the ground truth. The ground truth pose is generated using full bundle adjustment, a method widely used in the community (Castorena and Agarwal, 2017; Wolcott et al., 2017).
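A minimal sketch of such a 6-DOF error, assuming the estimated and ground truth poses have been paired by timestamp and unpacked from their ROS pose messages into a position and an `(x, y, z, w)` quaternion (the field order of `geometry_msgs/Quaternion`). Reporting the translation error in the ground truth frame and the residual rotation as ZYX roll/pitch/yaw is our choice for illustration, not a metric prescribed by the dataset.

```python
import numpy as np


def quat_to_rot(q):
    """Unit quaternion (x, y, z, w) -> 3x3 rotation matrix."""
    x, y, z, w = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def pose_error(p_est, q_est, p_gt, q_gt):
    """6-DOF error [dx, dy, dz, droll, dpitch, dyaw] of an estimate w.r.t. ground truth.

    Translation error is expressed in the ground truth frame; the angles are the
    ZYX roll/pitch/yaw of the residual rotation R_gt^T @ R_est.
    """
    r_gt, r_est = quat_to_rot(q_gt), quat_to_rot(q_est)
    d_t = r_gt.T @ (np.asarray(p_est, dtype=float) - np.asarray(p_gt, dtype=float))
    d_r = r_gt.T @ r_est
    roll = np.arctan2(d_r[2, 1], d_r[2, 2])
    pitch = np.arcsin(np.clip(-d_r[2, 0], -1.0, 1.0))
    yaw = np.arctan2(d_r[1, 0], d_r[0, 0])
    return np.concatenate([d_t, [roll, pitch, yaw]])
```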
6 Software and Tools
For easy visualization and development, we provide a set of ROS packages on our GitHub repository. These have been tested on Ubuntu 16.04 with 32 GB RAM and ROS Kinetic Kame. Sample output of these easy-to-use tools is shown in Figure 14. We also provide scripts and commands to convert all the data to a human-readable format.
This package contains the roslaunch files, RViz plugins and extrinsic calibration scripts. The demo.launch file loads the maps, RViz and the vehicle model. It also loads the sensor-to-body extrinsic calibration files from the specified folder. Usage: roslaunch ford_demo demo.launch map_dir:=/path/to/your/map calibration_dir:=/path/to/your/calibration/folder/
This package contains the Ford Fusion URDF for visualization in RViz. The physical parameters in the URDF are for representation and visualization only and do not reflect the actual properties of a Ford Fusion vehicle.
The map loader package loads the ground-plane reflectivity and 3D point cloud maps as ROS PointCloud2 messages. It subscribes to the vehicle pose to decide which section of the map to display. Its dynamic parameters include publish_rate, pcd_topic, pose_topic and neighbor_dist. By default, the reflectivity map is published on the /reflectivity_map topic and the 3D point cloud on the /pointcloud_map topic. Usage: roslaunch map_loader reflectivity_map_demo.launch map_folder:=/path/to/your/map
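The package's selection logic is not described beyond the parameter names. Purely as an illustration, assume `neighbor_dist` acts as a planar radius around the vehicle pose (a hypothetical interpretation); selecting the section of a loaded map to publish then reduces to a distance filter:

```python
import numpy as np


def select_neighborhood(map_xyz, vehicle_xy, neighbor_dist):
    """Return the map points whose planar (x, y) distance to the vehicle is
    within neighbor_dist metres. Hypothetical stand-in for map_loader's logic."""
    pts = np.asarray(map_xyz, dtype=float)
    d = np.linalg.norm(pts[:, :2] - np.asarray(vehicle_xy, dtype=float), axis=1)
    return pts[d <= neighbor_dist]
```

A real implementation would more likely pre-split the map into tiles and publish whole tiles at `publish_rate` rather than filter every point on each pose update.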
This dataset captures the seasonal variation in weather and lighting conditions experienced throughout the year in urban environments such as the Detroit Metro Area. As shown in Figure 19, the same scene can look very different depending on the weather. Any self-driving vehicle platform should be able to operate safely throughout the year, and this variation can help researchers design algorithms that are robust to such weather and lighting changes. The dataset also captures different traffic conditions and environments, such as construction zones, underpasses, tunnels, the airport, residential areas, highways and the countryside, as shown in Figure 28. Most importantly, the data was collected simultaneously by multiple autonomous vehicle platforms, which will help open new research avenues in collaborative autonomous driving.
We present multi-agent, time-synchronized perception (camera/lidar) and navigation (GPS/INS/IMU) data from a fleet of autonomous vehicle platforms travelling through a variety of scenarios over a period of one year. The dataset also includes 3D point cloud and ground reflectivity maps of the environment, along with the ground truth pose of the host vehicle obtained from an offline SLAM algorithm. We also provide ROS-based tools for visualization and easy data manipulation for scientific research. We believe that this dataset will be very useful to the research community working on various aspects of autonomous vehicle navigation. As a first-of-its-kind dataset containing data from multiple vehicles driving through an environment simultaneously, it will open new research opportunities in collaborative autonomous driving.
This dataset is the outcome of a joint effort, starting with the foresight and guidance provided by our Senior Technical Leader Dr. Jim McBride and our manager Tony Lockwood. This work was made possible by the diligence and persistence of the Ford / AV LLC team members. Preparing the vehicles, maintaining the code and sensor calibration, and taking a half day on every run to collect the data reflects the values of this team and the desire to make a lasting contribution to the field. During this time, starting in June 2017 and extending to July 2018, developers were also qualified safety drivers and test engineers, so these data drives represented a significant investment of time for colleagues also delivering on immediate team objectives. We wish to thank those without whom this dataset could not have been made available to the community: Peng-yu Chen, Thaddeus Townsend, Jakob Hoellerbauer, Thomas Iverson, Sharath Nair, Kevin Walker, Matt Warner, Matt Wilmes and Lu Xu. We thank Rob Lupa and his team at Quantum Signal AI for designing the 3D Ford Fusion model released with this dataset. We would also like to thank Daniel Pierce and his communications team at Ford AV LLC for helping with the public-facing website and information articles.
- Applanix (2016).
- AWS Open Data (2018).
- Caesar et al. (2019). nuScenes: a multimodal dataset for autonomous driving. arXiv preprint arXiv:1903.11027.
- Castorena and Agarwal (2017). Ground-edge-based lidar localization without a reflectivity calibration for autonomous driving. IEEE Robotics and Automation Letters 3 (1), pp. 344–351.
- Chang et al. (2019). Argoverse: 3D tracking and forecasting with rich maps. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Cordts et al. (2016). The Cityscapes dataset for semantic urban scene understanding. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Durrant-Whyte and Bailey (2006a). Simultaneous localization and mapping: part I. IEEE Robotics and Automation Magazine 13 (2), pp. 99–110.
- Durrant-Whyte and Bailey (2006b). Simultaneous localization and mapping: part II. IEEE Robotics and Automation Magazine 13 (3), pp. 108–117.
- FLIR (2017).
- Geiger et al. (2013). Vision meets robotics: the KITTI dataset. International Journal of Robotics Research (IJRR).
- Geiger et al. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In Conference on Computer Vision and Pattern Recognition (CVPR).
- Huang et al. (2018). The ApolloScape open dataset for autonomous driving and its application.
- Kaess et al. (2008). iSAM: incremental smoothing and mapping. IEEE Transactions on Robotics 24 (6), pp. 1365–1378.
- Levinson et al. (2007). Map-based precision vehicle localization in urban environments. In Robotics: Science and Systems.
- Levinson and Thrun (2010). Robust vehicle localization in urban environments using probabilistic maps. In IEEE International Conference on Robotics and Automation, pp. 4372–4378.
- Levinson and Thrun (2014). Unsupervised calibration for multi-beam lasers. In Experimental Robotics, pp. 179–193.
- Maddern et al. (2017). 1 Year, 1000 km: the Oxford RobotCar dataset. The International Journal of Robotics Research (IJRR) 36 (1), pp. 3–15.
- Neuhold et al. (2017). The Mapillary Vistas dataset for semantic understanding of street scenes. In International Conference on Computer Vision (ICCV).
- Pandey et al. (2011). Ford campus vision and lidar data set. International Journal of Robotics Research (IJRR).
- Pandey et al. (2012). Automatic targetless extrinsic calibration of a 3D lidar and camera by maximizing mutual information. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI'12), pp. 2053–2059.
- Point Cloud Library (2010).
- Quigley et al. (2009). ROS: an open-source robot operating system. Vol. 3.
- Reid et al. (2019). Localization requirements for autonomous vehicles. SAE International Journal of Connected and Automated Vehicles 2 (3).
- Richardson et al. (2013). AprilCal: assisted and repeatable camera calibration. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1814–1821.
- Segal et al. (2009). Generalized-ICP. In Robotics: Science and Systems, Vol. 2, p. 435.
- Smith et al. (1988). A stochastic map for uncertain spatial relationships. In Proceedings of the 4th International Symposium on Robotics Research, Cambridge, MA, USA, pp. 467–474.
- Studholme et al. (1999). An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition 32 (1), pp. 71–86.
- Sun et al. (2018). Pix3D: dataset and methods for single-image 3D shape modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2974–2983.
- Understand AI (2018).
- Velodyne (2011).
- Wolcott et al. (2017). Robust lidar localization using multiresolution Gaussian mixture maps for autonomous driving. The International Journal of Robotics Research 36 (3), pp. 292–319.
- Wolcott and Eustice (2014). Visual localization within lidar maps for automated urban driving. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), pp. 176–183.
- Zhang et al. (2016). Cooperative vehicle-infrastructure localization based on the symmetric measurement equation filter. GeoInformatica 20 (2), pp. 159–178.