Autonomous driving has long been seen as one of the ultimate solutions to transportation problems such as traffic jams and traffic accidents[15, 14], and it is believed to be able to reform the way we travel[20, 9, 3, 14]. In the past decade, the well-known DARPA Grand Challenge proved its feasibility and demonstrated technical frameworks for autonomous driving[15, 19, 2]. Later, led by universities as well as their industrial counterparts, autonomous driving research has witnessed dramatic growth[1, 22, 8, 5, 17, 6]. The latest prototypes, such as Waymo, have already shown the capability of driving more safely than human beings[20, 9]. Nevertheless, autonomous driving is in general still in its early stages, especially in complex urban scenes where human drivers can easily interpret the traffic and act accordingly based on their experience.
Sponsored by NSFC, the Intelligent Vehicle Future Challenge (IVFC), China’s counterpart of the DARPA Urban Challenge, began in 2009. In the last eight years, more than thirty universities and companies have participated in this annual challenge, which is now recognized as the most influential event for the research and development of autonomous driving in China.
As a newcomer to IVFC, the Tongji Intelligent Electric Vehicle (TiEV) project (cs1.tongji.edu.cn/tiev), funded by Tongji University, was started in 2015. The driverless prototype TiEV is built on a modified electric vehicle (Fig. 1). It is equipped with vision sensors, laser scanners, and an integrated localization system. The computing system consists of two x86 IPCs and one embedded system.
Most of the TiEV software is written from scratch in C++ on top of the flexible, cross-platform ZeroCM/LCM middleware (https://github.com/ZeroCM/zcm) and does not rely on off-the-shelf implementations such as ROS and Autoware (https://github.com/CPFL/Autoware). Moreover, the TiEV software will be made open-source in the future. TiEV proposes novel modules for probabilistic perception fusion (Sec. III), large-scale mapping and updating (Sec. IV), and the 1st and 2nd planning (Sec. V), which are detailed in the following sections. Overall safety is of great importance in our system design (Sec. VI), which is designed to keep the vehicle collision-free even if the planning module makes a wrong decision. TiEV participated in the 2016 and 2017 IVFC and passed most of the tasks, including simulated traffic, tunnels, and blockages, without human intervention.
The main contributions of this paper are as follows:
We introduce the architectural design and innovative algorithms of our autonomous driving prototype TiEV.
We share our experiences and views on the IVFC competitions in China and on the future development of autonomous vehicles.
II TiEV: The Architecture
II-A The vehicle
TiEV is modified from a Roewe E50 of SAIC Motor (Fig. 1). The EPS and the motor can be controlled by wire through the CAN bus. We installed an electrohydraulic brake system (EHB) developed by the Institute of Intelligent Vehicles of Tongji University to enable control-by-wire of the braking system.
Fig. 2 presents the overall architecture of the hardware system of TiEV. The vehicle is equipped with seven vision sensors. Two of the three forward-looking cameras compose a stereo vision system, and the third is connected to the embedded system for the detection task. The four fish-eye cameras are calibrated to provide a top-down panoramic view of about 10 meters by 10 meters (Fig. 7). The Velodyne HDL64 scanner is responsible for the segmentation of the drivable area and for the detection and tracking of moving obstacles such as pedestrians and vehicles. An IBEO Lux4 and a Sick LMS511 scanner are installed to cover the blind area of the HDL64. Fig. 3 illustrates the spatial coverage of TiEV’s sensors.
All the above sensors are calibrated interactively and transformed to a vehicle coordinate frame centered at the front axle, which we found helpful for obstacle avoidance. A high-precision DGPS+IMU system, integrating a Novatel simpak6 GPS receiver and an Oxts RT2000 IMU, provides localization accurate to about 10 centimeters in outdoor environments. We devised an EKF-based fusion method that integrates the vehicle kinematics and the IMU to keep the vehicle on track in GPS-denied areas such as tunnels.
Two of the three computers installed on TiEV are Advantech IPCs, and the third is an embedded system based on the NVIDIA Jetson TX2. The TX2 provides a Pascal GPU and CAN communication capability. Accordingly, one camera and the CAN bus are connected to the TX2, on which the deep learning module and the CAN actuation module are implemented.
II-B Modules overview
The software modules of TiEV are highly distributed, and communication between modules is decentralized. Many similar systems have adopted this flexible and robust structure. We employ the ZeroCM middleware, which is lightweight, multifunctional, and cross-platform. Exact synchronization between modules is not required in our system; instead, a spatiotemporal stamp is introduced for the fusion of asynchronous information. As a result, each module runs at its own operation cycle, which is constrained by the upstream and downstream modules.
Fig. 4 presents the software modules and the communication between them. Different colors indicate the computers on which the modules are implemented: orange indicates IPC I, gray indicates IPC II, and green indicates the Jetson. This configuration stems from the bandwidths and interface types of the different sensors. Messages are categorized into four classes: grid-based maps, tracked objects, detected signals, and generated trajectories. The spatiotemporal stamp message is received by all modules. The following sections describe the featured modules of our system.
III Perception Under Uncertainty
Perception is the basis of autonomous systems. Multiple sensor readings should be processed and fused into a unified representation for decision-making. TiEV adopts a 2D grid-based representation for obstacles located within the decision region (80 meters by 30 meters, with a grid resolution of 0.2 meters). However, sensor readings can be noisy, so we fuse them in a probabilistic form. In this section, we first address the modules for laser scanners and vision sensors and then explain the fusion methods.
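The stated region and resolution imply a grid of 400 by 150 cells. The following minimal sketch shows how a point in the vehicle frame could be mapped to a cell index. The axis convention (x forward, y left) and the constant `REAR_MARGIN_M`, which places the vehicle inside the region, are assumptions made for illustration and are not taken from the paper.

```python
import math

LENGTH_M = 80.0      # longitudinal extent of the decision region (from the text)
WIDTH_M = 30.0       # lateral extent of the decision region (from the text)
RESOLUTION_M = 0.2   # cell size (from the text)

# Assumption: how much of the region lies behind the vehicle origin.
REAR_MARGIN_M = 20.0

ROWS = round(LENGTH_M / RESOLUTION_M)   # 400 cells along the driving direction
COLS = round(WIDTH_M / RESOLUTION_M)    # 150 cells across


def to_cell(x_forward, y_left):
    """Return (row, col) for a vehicle-frame point, or None if outside the region."""
    row = math.floor((x_forward + REAR_MARGIN_M) / RESOLUTION_M)
    col = math.floor((y_left + WIDTH_M / 2.0) / RESOLUTION_M)
    if 0 <= row < ROWS and 0 <= col < COLS:
        return row, col
    return None
```

Points outside the region return `None`, so a caller can simply skip them when rasterizing a scan.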
III-A Laser perception
III-A1 Multibeam Laser
This module processes the 3D point clouds sent from the HDL64. We implement obstacle segmentation, classification, and tracking consecutively.
The segmentation is conducted in two steps. First, the average of the lowest point heights in an upsampled grid cell (we adopt 4 or 5 times upsampling, which gives a resolution of about 1 meter) is taken as the reference height of that cell. All the points in the upsampled grid cell that are higher than this reference by a certain amount are classified as obstacle points. This step results in a coarse segmentation of obstacles, which preserves flat surfaces such as car roofs better than directly using the finer grid. In the next step, we traverse each of the original grid cells. A cell that contains obstacle points located within the vertical span of the vehicle is marked as an obstacle. In this way, the influence of overhanging tree branches is eliminated.
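The two steps can be sketched as follows. This is only an illustration under stated assumptions: the height margin, the number of lowest points that are averaged, and the upper end of the vehicle's vertical span (`HEIGHT_MARGIN_M`, `N_LOWEST`, `VEHICLE_TOP_M`) are hypothetical values, and the ground is assumed to lie near z = 0 in the vehicle frame.

```python
import math
from collections import defaultdict

FINE_M = 0.2            # grid resolution from the text
COARSE_M = 1.0          # about 5x upsampling, as in the text
HEIGHT_MARGIN_M = 0.3   # assumption: "a certain amount" above the reference
VEHICLE_TOP_M = 2.0     # assumption: upper end of the vehicle's vertical span
N_LOWEST = 3            # assumption: how many lowest points are averaged


def cell(x, y, size):
    return (math.floor(x / size), math.floor(y / size))


def segment(points):
    """points: iterable of (x, y, z) in the vehicle frame, z pointing up."""
    coarse = defaultdict(list)
    for p in points:
        coarse[cell(p[0], p[1], COARSE_M)].append(p)

    # Step 1: coarse segmentation against a per-cell reference height.
    obstacle_points = []
    for pts in coarse.values():
        lowest = sorted(p[2] for p in pts)[:N_LOWEST]
        reference = sum(lowest) / len(lowest)
        obstacle_points += [p for p in pts if p[2] > reference + HEIGHT_MARGIN_M]

    # Step 2: mark fine cells, ignoring points above the vehicle (e.g. branches).
    occupied = set()
    for x, y, z in obstacle_points:
        if z <= VEHICLE_TOP_M:
            occupied.add(cell(x, y, FINE_M))
    return occupied
```

In this sketch a point at 3 m height is classified as an obstacle point in step 1 but never marks a fine cell in step 2, which mirrors the treatment of tree branches described above.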
Meanwhile, we classify each of the obstacle clusters with a previously proposed multiboost classifier, which labels them as cars, bicycles, or pedestrians. A Kalman filter is then created for each object to filter and predict its movement across adjacent frames (Fig. 5). The detection and tracking processes are further constrained by the Forward vision module.
III-A2 Sick and IBEO
These sensors are mainly used as complements to HDL64, especially in the near and far front of the vehicle. We compensate the extrinsic parameters of both sensors based on readings from IMU and project the points onto the grid map. Afterward, all the grid maps generated from laser perception modules will be fused in the Fusion module (Sec. III-C).
III-B Visual perception
III-B1 Forward vision
We obtain rich visual information from three Basler Ace cameras mounted behind the windshield and above the rear-view mirror of the vehicle. Two of them compose a stereo rig for coarse depth vision (Fig. 6a), and the third is used for detection. We trained a model based on YOLOv2[16] to detect cars, pedestrians, and traffic signs. The results are shown in Fig. 6b. The 2D boxes of cars and pedestrians are further mapped to the 3D point cloud in the Multibeam Laser module by inverse perspective projection based on the camera-laser calibration[unnikrishnan2005fast]. They provide semantic references for object detection and tracking in the Multibeam Laser module.
III-B2 Surround vision
III-C Synchronization and Fusion
Although our modules do not require hard synchronization, as mentioned in Sec. II-B, the perceptual modules observe the surroundings at different frequencies. Their observations should therefore be aligned and fused.
Systems based purely on timestamps require a universal extrapolator to interpolate poses. Instead, each sensory module explicitly attaches a spatiotemporal stamp, published by the localization module, to its messages once a measurement is made. The pose information is then used directly in the fusion stage to transform the observations to the current pose. The overall mismatch is several centimeters, which is due to the millisecond-level delay in receiving the stamped message and is acceptable for driving even at relatively high speed. The stamp also provides a temporal constraint for the fusion, which ensures that the latest observation is adopted.
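A minimal sketch of how such a stamp could be used to re-express an earlier observation in the current vehicle frame is given below. The `Stamp` fields and the restriction to a planar pose are assumptions for illustration; the actual message layout of TiEV is not given in the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Stamp:
    """Hypothetical spatiotemporal stamp: time plus planar pose in a world frame."""
    t: float        # seconds
    x: float        # metres
    y: float        # metres
    heading: float  # radians, counter-clockwise from the world x axis


def to_world(stamp, px, py):
    """Vehicle-frame point -> world frame, using the pose stored in the stamp."""
    c, s = math.cos(stamp.heading), math.sin(stamp.heading)
    return stamp.x + c * px - s * py, stamp.y + s * px + c * py


def to_vehicle(stamp, wx, wy):
    """World-frame point -> vehicle frame at the pose stored in the stamp."""
    dx, dy = wx - stamp.x, wy - stamp.y
    c, s = math.cos(stamp.heading), math.sin(stamp.heading)
    return c * dx + s * dy, -s * dx + c * dy


def realign(observed_at, current, px, py):
    """Re-express a point observed at an earlier stamp in the current vehicle frame."""
    return to_vehicle(current, *to_world(observed_at, px, py))


def newest(observations):
    """Temporal constraint: keep the (stamp, payload) pair with the latest stamp."""
    return max(observations, key=lambda obs: obs[0].t)
```

Because every observation carries the pose at which it was made, no pose interpolation is needed at fusion time; only the two rigid transforms above are applied.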
Sparse-beam laser scanners, e.g., those with 1, 4, or 8 beams, lack the context information that multibeam laser scanners offer, so the false alarms and noise generated during severe maneuvers of the vehicle, such as emergency braking, cannot be easily removed. Therefore, a probabilistic fusion of multiple sensors is employed in our system. The fusion is performed in the overlapping region of the different laser measurements, defined by the intersection of their fields of view in the plane. A 2D virtual scan is first generated from the Multibeam Laser module to be aligned with the IBEO and Sick scans. An occupancy-based representation is then adopted for the fusion of obstacles, in which the odds of a cell, i.e., the ratio between the probability that the cell is an obstacle and the probability of the complementary state, are derived from the measurements of all sensors, each represented by the map from the corresponding sensory module.
We choose a belief threshold of 0.75 to conservatively extract the maximum likelihood map (Fig. 8a). This fused map represents the drivable area close to the vehicle and covers the blind area of the HDL64. The influence of noise and false alarms generated by the Sick or IBEO Lux sensors is drastically reduced.
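One common way to realize an odds-based fusion is sketched below. It assumes that the sensors are conditionally independent given the state of the cell and that the prior is uniform unless stated otherwise; these are assumptions of this sketch and may differ from the exact formulation used in TiEV. Only the threshold of 0.75 is taken from the text.

```python
BELIEF_THRESHOLD = 0.75  # from the text


def odds(p):
    return p / (1.0 - p)


def fuse_cell(probabilities, prior=0.5):
    """Fuse per-sensor occupancy probabilities of one cell into a single belief.

    Assumes the sensors are conditionally independent given the cell state.
    """
    prior_odds = odds(prior)
    fused_odds = prior_odds
    for p in probabilities:
        p = min(max(p, 1e-6), 1.0 - 1e-6)   # keep the odds finite
        fused_odds *= odds(p) / prior_odds
    return fused_odds / (1.0 + fused_odds)


def fuse_maps(maps):
    """maps: equally sized 2D lists of probabilities; returns a boolean obstacle map."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[fuse_cell([m[r][c] for m in maps]) >= BELIEF_THRESHOLD
             for c in range(cols)] for r in range(rows)]
```

With these assumptions, two moderately confident sensors (0.7 each) reinforce each other above the threshold, whereas a strong reading from one sensor (0.9) that another sensor contradicts (0.2) stays below it, which is the behavior wanted for suppressing isolated false alarms.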
IV Large-Scale Mapping and Updating
Planning demands a historical map of the static driving environment, e.g., when the view of the sensors is occluded. However, existing methods for generating HD driving maps are usually labor-intensive and expensive. We extend the local probabilistic fusion to build the historical map of the whole driving environment fully automatically.
State-of-the-art SLAM systems, e.g., Cartographer[11], store all the local maps and scans in memory, which can be burdensome in large-scale mapping. Instead, we use an R-tree to index all the locally fused maps. Only the visible portion of the map is kept in memory while the rest is streamed out. In this way, we can map an area of almost arbitrary size. Fig. 9 shows a 12 km long path mapped using our method in 30 minutes. The memory footprint stays bounded at around 150 MB. Moreover, the R-tree keeps maintenance and search efficient.
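The streaming idea can be illustrated with the small sketch below. To keep it self-contained, a linear scan over bounding boxes stands in for the R-tree query, and a dictionary stands in for disk storage; the class and method names are hypothetical.

```python
class StreamingMapIndex:
    """Keeps only local maps that intersect the visible window in memory.

    A linear scan over bounding boxes stands in for the R-tree query here.
    """

    def __init__(self):
        self.boxes = {}    # map id -> (min_x, min_y, max_x, max_y), always in memory
        self.loaded = {}   # map id -> map payload, only for visible maps
        self.disk = {}     # stand-in for on-disk storage

    def add(self, map_id, box, payload):
        self.boxes[map_id] = box
        self.disk[map_id] = payload

    @staticmethod
    def _intersects(a, b):
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    def update_visible(self, window):
        visible = {i for i, box in self.boxes.items() if self._intersects(box, window)}
        for map_id in list(self.loaded):
            if map_id not in visible:
                self.disk[map_id] = self.loaded.pop(map_id)   # stream out
        for map_id in visible:
            if map_id not in self.loaded:
                self.loaded[map_id] = self.disk[map_id]       # stream in
        return visible
```

The memory held in `loaded` depends only on how many local maps intersect the window, not on the total mapped area, which is the property that bounds the footprint.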
In practice, a large area cannot be mapped in one go because of the limits of battery life or the traffic conditions during mapping. Therefore, we implemented an incremental mapping strategy that builds on the streaming design. When one mapping run finishes, all the local maps are streamed to disk. In the next run, our mapping module loads the surrounding previously built maps and resumes the mapping process. This mechanism brings the extra benefit of automatic map updating. We allow local maps to overlap, both spatially and temporally. When retrieving the visible maps during driving, we fuse the surrounding overlapping local maps with a weighted averaging strategy: the probability of a fused grid cell is the weighted average of the probabilities of that cell in the overlapping local maps, where the weight of each local map is given according to its date of acquisition. Fig. 10 shows a local map with a car parked on the roadside (a), which is updated by a second round of mapping on another day (b).
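The weighted averaging can be sketched as below. The averaging itself follows the description above; the particular weighting function `recency_weight`, with its half-life parameter, is a hypothetical choice, since the text only states that the weight depends on the date of acquisition.

```python
def fuse_overlapping(cells):
    """cells: list of (probability, weight) for one grid cell seen in several local maps."""
    total = sum(w for _, w in cells)
    if total == 0:
        raise ValueError("at least one local map needs a positive weight")
    return sum(p * w for p, w in cells) / total


def recency_weight(age_days, half_life_days=30.0):
    """Hypothetical weighting: newer maps count more, halving every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)
```

For a cell that held a parked car in an older map (probability 0.9, 30 days old) and is free in a newer one (probability 0.1, acquired today), the fused value under this weighting is about 0.37, so the newer observation dominates.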
V The 1st and 2nd Planning
We term path planning the 1st planning and trajectory planning the 2nd planning because of their temporal relation. The 1st planning is only triggered when the current path cannot be continued, such as when the road is blocked. The 2nd planning operates in real time while the vehicle moves along the path. We implement the less frequent 1st planning in an open-source spatial database system and propose a novel unified 2nd planning for both structured and unstructured environments.
V-A Path planning - The 1st planning
The 1st planning is based on HD maps captured with our vehicle and edited using QGIS (https://www.qgis.org/). All the lanes and intersections are sampled and topologically connected to form a lane-based road network. The road network also records path-related information such as speed limits and lane-change permissions. We choose to manage the HD map with a spatial database rather than map files because such a map is mostly constant and infrequently updated. Most importantly, such an HD map should be accessible by multiple users simultaneously.
We adopt the open-source spatial database PostGIS (http://www.postgis.net/) and use pgRouting (http://pgrouting.org/) to find the shortest path (Fig. 11). This database-based implementation performs well and can serve multiple autonomous vehicles at once if the database is hosted on a remote server.
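To illustrate what the shortest-path query computes on a lane-based road network, the following sketch runs Dijkstra's algorithm on a small in-memory graph. It is a stand-in written for illustration, not the pgRouting interface; the graph layout (a dictionary of outgoing lane connections with costs) is an assumption.

```python
import heapq


def shortest_path(edges, source, target):
    """edges: dict node -> list of (neighbour, cost); returns (cost, [nodes]) or None."""
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, edge_cost in edges.get(node, []):
            if neighbour not in settled:
                heapq.heappush(queue, (cost + edge_cost, neighbour, path + [neighbour]))
    return None
```

A blocked road can be modeled by removing or penalizing the corresponding edge before the query is repeated, which matches the event-triggered nature of the 1st planning.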
V-B Trajectory planning - The 2nd planning
TiEV introduces a unified planning module for both structured and unstructured driving environments. An enhanced real-time A* algorithm is proposed to find the optimal path on the grid-based representation. We model the lanes, the static and dynamic obstacles, the parking spaces, and the path from the 1st planning as weights according to a unified weighting policy based on breadth-first search (Fig. 12, middle). Optimizations, including simplified collision detection, discretization of the angle states, and pre-calculation of kinematics-aware heuristics, bound the planning time to between 20 and 80 milliseconds.
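The sketch below shows plain A* on a weighted grid, where lanes, obstacles, and the reference path would all enter through the per-cell weights. It deliberately omits the enhancements named above (angle states, kinematics-aware heuristics, simplified collision checks) and uses a 4-connected grid with a Manhattan heuristic, so it should be read as a baseline illustration rather than the TiEV planner.

```python
import heapq


def astar(weights, start, goal):
    """weights: 2D list, None = blocked, otherwise an extra non-negative cost per cell."""
    rows, cols = len(weights), len(weights[0])

    def heuristic(cell):
        # Manhattan distance; admissible because every step costs at least 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0.0, start)]
    best = {start: 0.0}
    parent = {}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        if cost > best[cell]:
            continue   # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if weights[nxt[0]][nxt[1]] is None:
                continue
            new_cost = cost + 1.0 + weights[nxt[0]][nxt[1]]
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

Expressing every constraint as a cell weight is what lets a single search handle both lane following and free-space maneuvers without switching planners.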
As a result, our unified planner could greatly simplify the finite state machine in our decision-making engine. Moreover, it offers more flexible and intelligent planning than conventional trajectory generation methods while TiEV is running on complicated urban roads.
VI The Safety Concern
Safety is the highest priority in the design of the TiEV system. We consider the safety of all traffic participants on the road as well as of the vehicle itself. The implementation is described in terms of vehicle behavior and system design, respectively.
VI-A Safety concern of TiEV’s behavior
One of the primary goals of autonomous driving is to minimize the rate of traffic accidents. To our understanding, the foremost safety guarantee of an autonomous vehicle is to prevent the vehicle from actively colliding with any traffic participant, including pedestrians, cyclists, and other vehicles, or with infrastructure. In TiEV, this potentially fatal behavior is tightly constrained by a dual ACC/AEB implementation.
The ACC/AEB function is implemented within two modules. The 2nd planning module calculates a safe speed in real time from the highest permitted speed, the distance between the ACC/AEB target on the referenced path and the vehicle, and a set of model parameters.
In the meantime, an independent ACC/AEB module is introduced as a shadow planner, which bypasses the main planning module and communicates directly with the actuator module (Fig. 4). This module receives the observed obstacle maps from each of the sensory modules as well as the historical maps sent by the perception fusion module. The runtime steering angle and velocity are integrated to estimate a future trajectory according to the current control states (Fig. 13). A safe speed is then calculated in the same way as in the 2nd planning module.
The target speed for the actuator is then restricted by both the speed limits extracted from the map and the above two safe speeds. We run the ACC/AEB module on two different computers to increase redundancy. As a result, TiEV seldom collided with static or moving objects during our experiments.
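The restriction of the target speed can be sketched as taking the minimum over all limits. The function `safe_speed` below is a hypothetical stand-in based on a constant-deceleration stopping model; it is not the speed model of the paper, and its parameters (`standstill_gap_m`, `decel`) are illustrative assumptions.

```python
import math


def safe_speed(distance_m, v_max, standstill_gap_m=5.0, decel=2.0):
    """Hypothetical model: the speed from which the vehicle can stop within the free distance.

    This is NOT the equation of the paper; gap and deceleration are illustrative.
    """
    free = max(distance_m - standstill_gap_m, 0.0)
    return min(v_max, math.sqrt(2.0 * decel * free))


def target_speed(desired, map_limit, safe_planner, safe_shadow):
    """The actuator speed never exceeds any of the three limits."""
    return max(0.0, min(desired, map_limit, safe_planner, safe_shadow))
```

Because the minimum is taken, the shadow module can only lower the commanded speed, never raise it, so a wrong decision of the main planner cannot override the independent safety estimate.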
VI-B Safety concern of TiEV’s system
As an autonomous car will eventually carry families and drive on real roads every day, system faults cannot be tolerated, especially those related to the core functionality. Although TiEV is an experimental prototype, we designed it to fulfill these system-level safety requirements.
The daemon modules that listen to the heartbeats of all other modules are implemented redundantly. On a local computer, a daemon module tries to restart local modules that no longer send heartbeats or that send heartbeats with frozen spatiotemporal stamps. The daemon module itself can also be monitored by a forked child thread. Remotely, the daemon modules on separate computers monitor each other’s heartbeats. Once a daemon module is judged to have failed, either the computer or its networking service has failed. In this case, TiEV tries to stop the vehicle immediately. To guarantee robust communication between the computers and the vehicle, we also introduce redundant CAN communication interfaces. By default, CAN control messages are sent by modules on the TX2 embedded system. In addition, another CAN interface is installed on one of the IPCs as a backup. Both computers can monitor and send CAN messages independently.
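The two failure conditions checked by a daemon, missing heartbeats and heartbeats with frozen stamps, can be sketched as follows. The class name, the timeout, and the number of tolerated repeats are hypothetical; restarting a module and stopping the vehicle are left to the caller.

```python
class Daemon:
    """Tracks heartbeats; a module is unhealthy if silent or if its stamp stops changing."""

    def __init__(self, timeout_s=1.0, max_frozen=3):
        self.timeout_s = timeout_s      # assumption: allowed silence
        self.max_frozen = max_frozen    # assumption: allowed repeats of the same stamp
        self.state = {}                 # module -> [last_seen, last_stamp, frozen_count]

    def heartbeat(self, module, now, stamp_time):
        last = self.state.get(module)
        frozen = last[2] + 1 if last and last[1] == stamp_time else 0
        self.state[module] = [now, stamp_time, frozen]

    def unhealthy(self, now):
        """Names of modules that should be restarted."""
        return sorted(
            name for name, (seen, _, frozen) in self.state.items()
            if now - seen > self.timeout_s or frozen >= self.max_frozen
        )
```

Checking the stamp as well as the arrival time catches a module that is still alive as a process but has stopped consuming fresh localization data.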
In practice, the TiEV system has an extremely low probability of failure. Redundant computers could be introduced to decrease the risk further.
VII Experiences Gained From the Future Challenge
The on-road competition of the IVFC is composed of two events: the express road competition (around 12 km with more than 10 tasks) and the urban road competition (around 3 km with more than 20 tasks) (Fig. 14). The exact task points of both events are not released until 30 minutes before the competition. As a result, the participants have to build their road maps in advance. In the competitions, vehicles have to recognize various situations on the road, e.g., signals, blockages, tunnels, pedestrians, and other vehicles, and behave appropriately. The final score is based on evaluations of task achievement, traffic violations, and time cost. TiEV ranked 11th (out of 25) and 13th (out of 28) in the IVFC 2016 and 2017, respectively.
Our main lessons are threefold. The first is the lack of comprehensive perception ability. TiEV can recognize traffic signals and the three kinds of traffic participants stated in Sec. III, but it treats everything else only as obstacles. This causes problems in scenes containing barriers made of, e.g., reflective triangles or cones, which warn the driver of blockages ahead. A human driver would interpret the scene based on the meaning of these objects rather than regarding them as generic obstacles. Our detector should therefore be enhanced by learning an enriched set of traffic signs and objects.
Secondly, the tight coupling of autonomous driving and a high-precision lane-based map can be problematic if the map is erroneous, e.g., simply because it is out of date. Using a lane-based map also demands highly precise localization. Our planning method is designed not to follow the lane-based path precisely. TiEV treats the path as one reference alongside the detected lanes and obstacles and decides the best planning goal for the 2nd planning. Nevertheless, a coarse path shifted by meters from the correct position still causes problems, especially when the planning goal cannot be decided wisely. In contrast, human drivers can drive with inaccurate maps or even with only directional instructions.
Finally, because of its optimal-search nature, the 2nd planning keeps trying to find the best trajectory and abruptly turns the wheel when the current trajectory is deformed. This behavior results in an agile but uncomfortable and risky riding experience. To alleviate this effect, we introduce a piecewise planning strategy that keeps a local window of the trajectory constant and concatenates the dynamic trajectory smoothly at the point beyond the look-ahead region. In practice, this method successfully stabilizes the vehicle when driving at up to 60 kph (the highest speed limit of IVFC).
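The piecewise strategy can be illustrated with the sketch below, which keeps the first part of the current trajectory untouched and appends a replanned tail. It assumes that the replanned trajectory starts from the last kept point so that the two pieces join there; the function name and the representation of a trajectory as a list of points are hypothetical.

```python
def splice(current, replanned, keep_points):
    """Keep the first `keep_points` of the current trajectory and append the replanned tail.

    Assumes the replanned trajectory was planned from current[keep_points - 1],
    so that the two pieces join at that point.
    """
    if keep_points <= 0 or keep_points > len(current):
        raise ValueError("keep_points must lie inside the current trajectory")
    head = current[:keep_points]
    tail = replanned[1:] if replanned and replanned[0] == head[-1] else list(replanned)
    return head + tail
```

Since the points inside the kept window never change between planning cycles, the steering command derived from them stays continuous even when the tail is replaced.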
VIII Concluding Remarks
We realize that the ability of TiEV is still far from what practical applications on complicated urban roads demand. The system can already cope with many scenarios and drive the vehicle safely. Nevertheless, it still cannot match human drivers in adaptivity to environmental variations and robustness to noise.
At present, deep learning-based methods have already proved their ability to detect and segment different types of objects in images almost in real time. However, such perception still cannot understand the characteristics of objects or their relations; for example, a human can recognize a novice driver ahead from his or her reactions to certain traffic conditions. Much of this in-depth understanding of traffic scenes contributes to appropriate driving behavior, and it can only be realized by modeling driving experiences. The SLAM community intensively studies the memorization of the static driving environment. However, the modeling of the contextual semantics of driving experiences, such as the relations and interactions between objects in a driving environment, is still an open question.
Besides, the online perception burden can easily exceed the onboard computing resources. Human drivers relieve this burden by focusing on specific objects according to their current driving intentions, which is known as the attention mechanism. This calls for a tighter coupling between the planning and perception functions in the future.
Moreover, human drivers usually do not have to behave "optimally" as algorithms do. Planning optimization should therefore be relaxed to allow temporally suboptimal solutions and should be regularized to react robustly to disturbances.
Finally, we argue that the technical route of autonomous driving is still evolving drastically. The diversity of implementations worldwide will eventually benefit the maturation of autonomous driving systems.
Special thanks to Deyi Li, Wei Han, Changzhu Zhang, Lifeng An, Dan Hai, Jing Zhu and Peizhi Zhang for the provided help and discussions during the implementations.
-  M. Aeberhard, S. Rauch, M. Bahram, G. Tanzmeister, J. Thomas, Y. Pilat, F. Homm, W. Huber, and N. Kaempchen, “Experience, results and lessons learned from automated driving on germany’s highways,” IEEE Intelligent Transportation Systems Magazine, vol. 7, no. 1, pp. 42–57, 2015.
-  C. Berger and B. Rumpe, “Autonomous driving - 5 years after the urban challenge: The anticipatory vehicle as a cyber-physical system,” ArXiv e-prints, 2014.
-  J.-F. Bonnefon, A. Shariff, and I. Rahwan, “The social dilemma of autonomous vehicles,” Science, vol. 352, no. 6293, pp. 1573–1576, 2016.
-  A. Broggi, P. Cerri, S. Debattisti, M. C. Laghi, P. Medici, M. Panciroli, and A. Prioletti, “Proud-public road urban driverless test: Architecture and results,” in 2014 IEEE Intelligent Vehicles Symposium Proceedings, June 2014, pp. 648–654.
-  A. Broggi, S. Debattisti, P. Grisleri, and M. Panciroli, “The deeva autonomous vehicle platform,” pp. 692–699, June 2015.
-  M. Buechel, J. Frtunikj, K. Becker, S. Sommer, C. Buckl, M. Armbruster, A. Marek, A. Zirkler, C. Klein, and A. Knoll, “An automated electric vehicle prototype showing new trends in automotive architectures,” in 2015 IEEE 18th International Conference on Intelligent Transportation Systems, Sept 2015, pp. 1274–1279.
-  C. Chen, A. Seff, A. Kornhauser, and J. Xiao, “Deepdriving: Learning affordance for direct perception in autonomous driving,” in 2015 IEEE International Conference on Computer Vision (ICCV), Dec 2015, pp. 2722–2730.
-  A. Cosgun, L. Ma, J. Chiu, J. Huang, M. Demir, A. M. Anon, T. Lian, H. Tafish, and S. Al-Stouhi, “Towards full automated drive in urban environments: A demonstration in gomentum station, california,” ArXiv e-prints, 2017.
-  DMV, “Autonomous vehicles in california,” DMV, Tech. Rep., 2016.
-  D. Dolgov, S. Thrun, M. Montemerlo, and J. Diebel, “Path planning for autonomous vehicles in unknown semi-structured environments,” The International Journal of Robotics Research, vol. 29, no. 5, pp. 485–501, 2010.
-  W. Hess, D. Kohler, H. Rapp, and D. Andor, “Real-time loop closure in 2d lidar slam,” in 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 1271–1278.
-  A. S. Huang, E. Olson, and D. C. Moore, “Lcm: Lightweight communications and marshalling,” in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010, pp. 4057–4062.
-  L. Li, W. L. Huang, Y. Liu, N. N. Zheng, and F. Y. Wang, “Intelligence testing for autonomous vehicles: A new approach,” IEEE Transactions on Intelligent Vehicles, vol. 1, no. 2, pp. 158–166, June 2016.
-  M. Maurer, J. C. Gerdes, B. Lenz, and H. Winner, Autonomous Driving Technical, Legal and Social Aspects. Springer, Berlin, Heidelberg, 2016.
-  M. Montemerlo, J. Becker, S. Bhat, H. Dahlkamp, D. Dolgov, S. Ettinger, D. Haehnel, T. Hilden, G. Hoffmann, B. Huhnke, and Others, “Junior: The stanford entry in the urban challenge,” Journal of field Robotics, vol. 25, no. 9, pp. 569–597, 2008.
-  J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” ArXiv e-prints, 2016.
-  I. Shim, J. Choi, S. Shin, T. H. Oh, U. Lee, B. Ahn, D. G. Choi, D. H. Shim, and I. S. Kweon, “An autonomous driving system for unknown environments using a unified map,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 4, pp. 1999–2013, 2015.
-  A. Teichman and S. Thrun, “Tracking-based semi-supervised learning,” The International Journal of Robotics Research, vol. 31, no. 7, pp. 804–818, 2012.
-  C. Urmson, J. Anhalt, D. Bagnell, C. Baker, R. Bittner, M. N. Clark, J. Dolan, D. Duggins, T. Galatali, C. Geyer, M. Gittleman, S. Harbaugh, M. Hebert, T. M. Howard, S. Kolski, A. Kelly, M. Likhachev, M. McNaughton, N. Miller, K. Peterson, B. Pilnick, R. Rajkumar, P. Rybski, B. Salesky, Y.-W. Seo, S. Singh, J. Snider, A. Stentz, W. R. Whittaker, Z. Wolkowicki, and J. Ziglar, “Autonomous driving in urban environments boss and the urban challenge,” Journal of Field Robotics, vol. 25, no. 8, pp. 425–466, 2008.
-  Waymo, “On the road to fully self-driving,” Waymo, Tech. Rep., 2017.
-  T. Yang, Y. Wu, J. Zhao, and L. Guan, “Semantic segmentation via highly fused convolutional network with multiple soft cost functions,” arXiv preprint arXiv:1801.01317, 2018.
-  J. Ziegler, P. Bender, M. Schreiber, H. Lategahn, T. Strauss, C. Stiller, T. Dang, U. Franke, N. Appenrodt, C. Keller, and Others, “Making bertha drive - an autonomous journey on a historic route,” IEEE Intelligent Transportation Systems Magazine, vol. 6, no. 2, pp. 8–20, 2014.