Precise vehicle localization is critical to intelligent vehicle systems and aims to minimize traffic accidents, congestion, and the overall costs of road traffic. For safety, highway operation and local city roads require reliable knowledge of vehicle coordinates. Further, accurate positioning enables driving assistance modules to predict the future trajectory of the vehicle and thus supports Advanced Driver Assistance Systems (ADAS) such as adaptive cruise control (ACC), forward collision warning (FCW), and lane departure warning (LDW).
In this work, we propose infrastructure node-based vehicle localization to estimate vehicle coordinates for autonomous vehicles. The infrastructure node [25, 13], which we call the ix-node, is an infrastructure-based smart sensing system built to perceive nearby vehicles and pedestrians. Just as autonomous vehicles carry their own sensors to perceive their surroundings, the ix-node has its own sensor heads to run perception algorithms and detect passing objects. As shown in Fig. 1, we fuse the information collected from the ix-node and the autonomous vehicles' sensing systems to improve the overall localization performance of these vehicles. We use Extended Kalman Filter (EKF)-based multisensor fusion that combines GPS and IMU data from the vehicle with the ix-node perception output from camera and LiDAR. To the best of our knowledge, this is the first work that performs infrastructure node-based localization with real-world testing and validation.
The smart ix-node is a stationary infrastructure-based sensing solution that provides some unique benefits to the robotics industry, as listed below:
- These systems have their own sensing suite and perception stack, which can be communicated to and used by nearby passing AVs. This forms a unique sensor fusion-based approach in which we can assimilate information from two different viewpoints and increase the total accuracy of the system as a whole.
- Depending on the need at a particular road intersection, these ix-nodes can be made sensor-heavy or sensor-light. This flexibility results in a cost-effective autonomy solution because the AVs operating in that region can reduce some of their sensors, becoming lightweight, and instead use the rich information from these nodes.
- These nodes increase the effective sensing field of view of AVs. If placed at strategic locations, they can fill in blind spots and reduce the probability of accidents and near-miss incidents. In addition, they can stand in as a proxy for vehicle sensors in case a failure occurs.
A number of works [6, 17, 3] have implemented vehicle-to-vehicle (V2V) communication-based platforms where the sensor measurements depend on relative localization between vehicles. Since the ix-node is static, it can represent information in a global reference frame with higher accuracy; as a result, we use it for localization in our work. This work builds on the aforementioned benefits and implements ix-based vehicle localization. In short, our main contributions are:
Realization of ix-based vehicle localization. Our proposed method was tested in a real-world environment where we built our infrastructure node with sensor heads. The ix-node collects perception data of autonomous vehicles driving near the node.
Improvement of localization performance by incorporating the ix-node perception output with the vehicle's sensor data. We use Extended Kalman Filter (EKF)-based multisensor fusion that combines GPS and IMU data from the vehicle with the ix-node perception output from camera and LiDAR.
II Related Work
Vehicle localization has been extensively studied in the past [16, 5, 7, 11, 10, 9, 1, 20]. Traditional methods [5, 7] integrate GPS with odometry. Wei et al. fuse GPS with visual odometry using an EKF for vehicle localization in urban environments. Although combining GPS with odometry can improve coordinate accuracy, these methods provide degraded results when sensors are noisy or sensitive to the environment. Visual odometry is feature-dependent and assumes surroundings with relevant features. Therefore, fusing only GPS and odometry is unsuitable for reliable localization.
To achieve robust vehicle localization, LiDAR-based localization has also been explored. Kummerle et al. use multi-level surface maps to localize the vehicle with a particle filter. Levinson et al. address a LiDAR intensity-based localization approach where intensity offers more cues than traditional 3D shape matching provides. More recently, Wan et al. combine both LiDAR intensity and altitude information to demonstrate a more robust system that can adapt to environmental changes.
Further, learning-based methods using LiDAR have been discussed. Wolcott et al. use a Gaussian mixture map that captures LiDAR intensity and altitude, and the resulting system performs well under challenging weather conditions. Schaupp et al. utilize deep learning to extract a point cloud descriptor that improves localization results. Lu et al. perform end-to-end point cloud registration for better localization. All of these LiDAR-based approaches require the vehicle to be equipped with an expensive LiDAR sensor and to perform computationally intensive point cloud registration.
Map matching-based vehicle localization has also been popular in recent years. Li et al. propose a particle filter-based map matching algorithm to compute the likelihood of the matched road with the topology of a lane-level map. Asghar et al. also execute lane-level map matching and perform topological map matching using GPS, odometry, and IMU data. Vora et al. tackle the problem of reliance on prior maps by using freely available aerial imagery and matching it against live LiDAR intensity. With prior knowledge of the map, these map matching-based methods result in accurate localization, but ICP-based matching assumes good initialization, and the algorithm depends on the map.
To remedy the aforementioned problems, vehicle-to-vehicle (V2V) communication-based localization has been studied. V2V communication is robust to GPS signal outages, and cooperative sensing capability adds reliability to localization accuracy. Challita et al. use V2V communication with GPS and a vision-based ranging system to localize the ego vehicle. Rife et al. utilize V2V, GPS, and vision-based lane-boundary detection for accurate vehicle positioning. Bento et al. achieve V2V-based sensor fusion using encoders and inter-vehicle absolute positioning data from magnet detection. Although the V2V approach performs better than GPS-odometry fusion, its robustness relies on relative measurements. In the case of the lane-boundary detection method, the system is also sensitive to proper lane markings.
Since the V2V-based approach depends on relative measurements, infrastructure-based localization has recently been introduced. Wang et al. combine the location information of traffic lights from a high-precision map. This approach no longer depends on relative location and generates the traffic light map by collecting prior geographical information of traffic lights. The vision information is then fused with INS using an EKF to generate high-precision localization results. Soatti et al. jointly localize non-cooperative physical features in the surroundings, and the information on sensed features is fused over V2V by a consensus procedure; this work presents a theoretical solution to the localization problem with simulation. Our work also addresses infrastructure-based localization, and the proposed method is validated by real-world experiments. Our ix-node is capable of detecting vehicles with its sensor heads and thus independently provides absolute measurements. This method need not assume good initialization or prior knowledge of a map.
III Ix-based Localization
III-A System overview
Let $\mathbf{x}_t = [x_t, y_t]^\top$ denote the vehicle state vector. This state vector is in the East, North, Up (ENU) coordinate frame and is relevant to our application since the utilized sensors (e.g., GPS and ix-node) directly measure the $x$ and $y$ position of the vehicle. We assume 2D motion, and thus the resulting output corresponds to the East and North components of the ENU coordinate frame.
The overall system is summarized in Fig. 3. The vehicle collects linear velocity using the IMU as well as its position measured by GPS. The ix-node captures images using the camera and generates point clouds using the LiDAR. The vehicle candidate generation algorithm produces a set of tuples that indicate the vehicle positions predicted by the ix-node. The vehicle and ix-node data are associated and synchronized in the spatial and time domains. Then, EKF-based multisensor fusion is performed to combine the information from both the vehicle and the ix-node.
III-B Vehicle candidate generation
This work utilizes sensors mounted on an infrastructure node, which perceives the vehicle from another perspective. We have deployed multiple infrastructure nodes at each intersection, but for the purpose of this study we focus on a single ix-node that includes two sensor heads mounted on a single pole connected to one compute node. Each sensor head comprises a non-spinning LiDAR, a 5 MP camera, and infrastructure to power the sensors. The compute node is equipped with an Nvidia TITAN V GPU and a Xeon 2.5 GHz CPU, and all the processing is done on this compute node. The vehicle and the sensor heads of the ix-node are shown in Fig. 2(a) and Fig. 2(b), respectively.
The ix-node perceives nearby objects via the pipeline shown in Fig. 4. This pipeline implements a late fusion method where detection is run on each sensor modality separately; the 3D objects detected from LiDAR are then tracked and fused with the 2D camera detections to add object classification. Once the tracked objects from each sensor head are obtained, the tracks from the two sensor heads are fused to obtain a single view of the environment. We use YOLOv4 for detecting objects in the images. The 3D detection on LiDAR is done using background removal followed by clustering in a voxel grid representation. For tracking, we use an EKF that includes the center, dimensions, and velocity of the object in the state. The fusion is performed by reprojecting the 3D bounding cuboids onto the image and matching each reprojected box with the 2D detections using IoU. Finally, the fusion between the two sensor heads is done by transforming the tracks to UTM coordinates and matching them using 3D IoU to obtain the final fused tracks from the ix-node.
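The track-to-detection association step can be sketched as follows. This is a minimal illustration, not the deployed pipeline: the greedy matching strategy and the 0.5 IoU threshold are assumptions, and boxes are given as axis-aligned `(x1, y1, x2, y2)` tuples.

```python
import numpy as np

def iou_2d(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_tracks_to_detections(reprojected, detections, iou_threshold=0.5):
    """Greedily pair reprojected 3D track boxes with 2D camera detections.

    Each pair (track_index, detection_index) lets the track inherit the
    detection's class label, as in the late-fusion step described above.
    """
    pairs, used = [], set()
    for ti, tbox in enumerate(reprojected):
        best_j, best_iou = None, iou_threshold
        for dj, dbox in enumerate(detections):
            if dj in used:
                continue
            iou = iou_2d(tbox, dbox)
            if iou > best_iou:
                best_j, best_iou = dj, iou
        if best_j is not None:
            used.add(best_j)
            pairs.append((ti, best_j))
    return pairs
```

The same IoU-based matching idea extends to the 3D case used for fusing the two sensor heads, with volumes in place of areas.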
Fig. 2(c) shows the ix-node perception as well as the generated vehicle candidates. The left figure shows the camera view of a moving vehicle, and the right figure shows the point cloud from the LiDAR sensors with colored markers indicating detected vehicle candidates.
III-C Dynamic model (EKF prediction)
The motion prediction equation can be written as

$$\mathbf{x}_t = f(\mathbf{x}_{t-1}, \mathbf{u}_t) + \mathbf{w}_t$$

where $\mathbf{w}_t$ is the process noise and is Gaussian-distributed. We choose a linear constant velocity model for state prediction, expressed as

$$\begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} x_{t-1} + v_x \Delta t \\ y_{t-1} + v_y \Delta t \end{bmatrix}$$

where $v_x$ and $v_y$ are the $x$ and $y$ velocities obtained from the IMU, and $\Delta t$ is the time interval.

The covariance matrix of the motion prediction is represented by

$$P_t^- = F_t P_{t-1} F_t^\top + Q_t$$

where $F_t$ is the Jacobian matrix of $f$ with respect to $\mathbf{x}$, $P_{t-1}$ is the covariance matrix of the estimate at time $t-1$, and $Q_t$ is the covariance matrix of the Gaussian noise that affects the vehicle state.
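Under the constant velocity model, the prediction step reduces to a small routine. This is a sketch of the standard EKF prediction equations with an assumed 2D `[east, north]` state layout; the function and variable names are illustrative.

```python
import numpy as np

def ekf_predict(x, P, v_imu, dt, Q):
    """Constant-velocity EKF prediction for a 2D position state.

    x     : current state estimate [east, north]
    P     : 2x2 state covariance
    v_imu : (vx, vy) velocity from the vehicle IMU, treated as a known input
    dt    : time interval (0.1 s in Table II)
    Q     : 2x2 process-noise covariance
    """
    x_pred = x + dt * np.asarray(v_imu)
    # The motion model is linear in the state, so the Jacobian F is identity.
    F = np.eye(2)
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred
```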
III-D Data association
Since all the sensors have different timestamps and coordinate systems, we need to synchronize time and associate the data in the same spatial domain. As shown in Fig. 3, we have the GPS measurement and the vehicle candidates from the ix-node. We synchronize timestamps with respect to the POSIX time of the GPS since it has the lowest frequency (10 Hz), transform the sensor measurements to the ENU coordinate frame, and offset the East and North positions by the map origin. We assume the vehicle has negligible acceleration, so the GPS and ix-node outputs are linearly interpolated to have the same timestamps.
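Under the negligible-acceleration assumption, aligning the ix-node outputs onto the 10 Hz GPS clock amounts to linear interpolation. The following sketch assumes positions are stored as an N×2 array of (east, north) pairs; the function name and array layout are illustrative.

```python
import numpy as np

def sync_to_gps_times(gps_times, src_times, src_positions):
    """Linearly interpolate source positions onto the GPS timestamps.

    gps_times     : 1D array of POSIX timestamps from the 10 Hz GPS
    src_times     : 1D increasing array of source (e.g. ix-node) timestamps
    src_positions : N x 2 array of (east, north) positions at src_times
    """
    src_positions = np.asarray(src_positions, dtype=float)
    east = np.interp(gps_times, src_times, src_positions[:, 0])
    north = np.interp(gps_times, src_times, src_positions[:, 1])
    return np.stack([east, north], axis=1)
```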
For initial filtering of the generated vehicle candidates, we set a threshold to remove candidates that are too far from the predicted state and also filter out any noise that is outside the ix-node detection range. For multiple candidates that overlap each other, we choose the nearest neighbor to the predicted state. Further, GPS measurements that are too far from the predicted state are ignored. The filtered GPS and ix-node outputs are fed into the EKF-based multisensor fusion module as measurements.
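A minimal sketch of this candidate filtering, assuming Euclidean distance gating with the 5 m threshold from Table II; the detection-range value and the ix-node position argument are illustrative assumptions, since the actual range is not stated.

```python
import numpy as np

def select_candidate(x_pred, candidates, gate=5.0, det_range=60.0, ix_pos=(0.0, 0.0)):
    """Gate ix-node vehicle candidates and keep the nearest to the prediction.

    gate      : max distance from the predicted state (5 m in Table II)
    det_range : assumed maximum ix-node detection radius (illustrative)
    Returns the chosen candidate as an array, or None if all were rejected.
    """
    x_pred = np.asarray(x_pred, dtype=float)
    ix_pos = np.asarray(ix_pos, dtype=float)
    valid = [np.asarray(c, dtype=float) for c in candidates
             if np.linalg.norm(np.asarray(c, dtype=float) - x_pred) < gate
             and np.linalg.norm(np.asarray(c, dtype=float) - ix_pos) < det_range]
    if not valid:
        return None
    # Overlapping candidates resolve to the nearest neighbor of the prediction.
    return min(valid, key=lambda c: np.linalg.norm(c - x_pred))
```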
III-E Multisensor fusion (EKF measurement update)
In the system architecture, the GPS and the ix-node perception output are used as vehicle position measurements. The GPS mounted on the vehicle measures the vehicle state in the ENU coordinate frame, and the observation equation can be expressed as

$$\mathbf{z}_{g,t} = h_g(\mathbf{x}_t) + \mathbf{v}_{g,t}$$

where $\mathbf{z}_{g,t}$ is the GPS measurement and $\mathbf{v}_{g,t}$ is the measurement noise. The covariance matrix of the GPS observation error is estimated by

$$R_g = \begin{bmatrix} \sigma_{g,x}^2 & 0 \\ 0 & \sigma_{g,y}^2 \end{bmatrix}$$

where $\sigma_{g,x}$ and $\sigma_{g,y}$ are the standard deviations of the GPS measurement noise in the $x$ and $y$ directions.
As for the ix-node perception, the sensor heads including camera and LiDAR run vehicle tracking algorithms that estimate the vehicle state, and the observation equation is given by

$$\mathbf{z}_{ix,t} = h_{ix}(\mathbf{x}_t) + \mathbf{v}_{ix,t}$$

where $\mathbf{z}_{ix,t}$ is the ix-node perception output and $\mathbf{v}_{ix,t}$ is the measurement noise. Similar to the case of GPS, the covariance matrix of the ix-node perception error is given by

$$R_{ix} = \begin{bmatrix} \sigma_{ix,x}^2 & 0 \\ 0 & \sigma_{ix,y}^2 \end{bmatrix}$$

where $\sigma_{ix,x}$ and $\sigma_{ix,y}$ are the standard deviations of the ix-node perception algorithm noise in the $x$ and $y$ directions. We refer to Najjar et al. for a detailed derivation of the covariance matrix.
We first estimate the vehicle position using the GPS measurement and then incorporate the ix-node perception output. The innovation of the GPS measurement is

$$\mathbf{y}_{g,t} = \mathbf{z}_{g,t} - h_g(\hat{\mathbf{x}}_t^-)$$

The covariance matrix of the innovation is

$$S_{g,t} = H_{g,t} P_t^- H_{g,t}^\top + R_g$$

where $H_{g,t}$ is the Jacobian matrix of $h_g$ with respect to $\mathbf{x}$. The Kalman gain, corrected position, and associated covariance matrix of the GPS measurement are

$$K_{g,t} = P_t^- H_{g,t}^\top S_{g,t}^{-1}, \qquad \hat{\mathbf{x}}_{g,t} = \hat{\mathbf{x}}_t^- + K_{g,t} \mathbf{y}_{g,t}, \qquad P_{g,t} = (I - K_{g,t} H_{g,t}) P_t^-$$

Next, the innovation of the ix-perception output is

$$\mathbf{y}_{ix,t} = \mathbf{z}_{ix,t} - h_{ix}(\hat{\mathbf{x}}_{g,t})$$

The covariance matrix of the innovation is

$$S_{ix,t} = H_{ix,t} P_{g,t} H_{ix,t}^\top + R_{ix}$$

where $H_{ix,t}$ is the Jacobian matrix of $h_{ix}$ with respect to $\mathbf{x}$. The Kalman gain of the ix-node measurement is

$$K_{ix,t} = P_{g,t} H_{ix,t}^\top S_{ix,t}^{-1}$$

The final output of the Extended Kalman Filter for the vehicle position is given as

$$\hat{\mathbf{x}}_t = \hat{\mathbf{x}}_{g,t} + K_{ix,t} \mathbf{y}_{ix,t}$$

with the associated covariance matrix

$$P_t = (I - K_{ix,t} H_{ix,t}) P_{g,t}$$
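The sequential measurement update described above can be sketched as follows. Since GPS and the ix-node both observe position directly, the observation Jacobians reduce to the identity in this 2D position-only illustration; the function names are assumptions.

```python
import numpy as np

def ekf_update(x, P, z, R):
    """One EKF measurement update; H is identity because the sensor
    observes the (east, north) state directly."""
    H = np.eye(2)
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(2) - K @ H) @ P
    return x_new, P_new

def fuse(x_pred, P_pred, z_gps, R_gps, z_ix, R_ix):
    """Sequential update: correct with GPS first, then with the ix-node
    perception output, mirroring the derivation above."""
    x_g, P_g = ekf_update(x_pred, P_pred, z_gps, R_gps)
    return ekf_update(x_g, P_g, z_ix, R_ix)
```

Processing the two measurements sequentially with their own noise covariances is equivalent to a joint update when the measurement noises are independent, which is the standard assumption for unrelated sensors.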
IV-A Experimental setup
To validate our algorithm, we run Ford's autonomous vehicle (Fig. 2(a)) at the Mcity test track, a state-of-the-art facility in Ann Arbor, Michigan, designed for testing the performance of autonomous vehicles. The vehicle carries a GPS unit and a Novatel ProPak6 that provides a ground-truth trajectory by fusing its RTK-corrected GPS and IMU measurements. Fig. 5 shows the entire trajectory of the vehicle, which is run for 300 seconds on the test track. The ground-truth Novatel system is mounted on the vehicle at a position a distance $d$ ahead of the GPS. The ground-truth measurement in the ENU coordinate frame can then be expressed as

$$\begin{bmatrix} x_{gt} \\ y_{gt} \end{bmatrix} = \begin{bmatrix} x_{raw} + d \sin\psi \\ y_{raw} + d \cos\psi \end{bmatrix}$$

where $(x_{raw}, y_{raw})$ is the raw measurement of the vehicle position and $\psi$ is the vehicle orientation measured by Novatel's device. Fig. 5 shows the ix-node position, the overall trajectories for GPS and ground truth, and a satellite-view map of Mcity.
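The lever-arm correction for the ground-truth system amounts to shifting the raw position along the vehicle heading by the mounting offset. This helper is a sketch: the heading convention (measured clockwise from north, in radians) is an assumption, and the default offset is the Table II value.

```python
import math

def lever_arm_correct(x_raw, y_raw, heading, d=0.0381):
    """Shift the raw Novatel position by the mounting offset d (meters)
    along the vehicle heading, assumed clockwise from north (ENU frame)."""
    return x_raw + d * math.sin(heading), y_raw + d * math.cos(heading)
```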
| | Mean | Std. dev. |
| --- | --- | --- |
| Longitudinal error (m) | 1.17 | 0.65 |
| Lateral error (m) | 1.20 | 1.05 |
| Total error (m) | 1.91 | 0.82 |
Fig. 6 shows the error analysis for the entire trajectory from Fig. 5. The top plot shows the root mean square error (RMSE) for the estimated trajectory, and the bottom shows the localization error histogram. It can be seen that the raw GPS error is non-trivial (e.g., the maximum error is greater than 5 m), so fusion with the ix-node is necessary and improves the localization performance. Table I summarizes the statistics of the GPS error for the entire trajectory.
IV-B Multisensor fusion for ix-based localization
To evaluate ix-based localization performance, we study three cases where the vehicle drives near an ix-node. The camera image in Fig. 2(c) shows an instance of such a case. For each case, the vehicle is within the ix-node's detection range, so we can execute the multisensor fusion algorithm combining GPS, IMU, and the ix-node perception.
| Parameter | Value |
| --- | --- |
| Time interval for dynamic model | 0.1 s |
| East position of map origin | 277495 m |
| North position of map origin | 4686600 m |
| Threshold for filtering vehicle candidates | 5 m |
| Longitudinal std. deviation of GPS noise | 0.8 m |
| Lateral std. deviation of GPS noise | 2 m |
| Longitudinal std. deviation of ix-node noise | distance-dependent (see text) |
| Lateral std. deviation of ix-node noise | 0.3 m |
| Distance of ground truth system from GPS | 0.0381 m |
| Longitudinal ix-node position w.r.t. map origin | 7.13 m |
| Lateral ix-node position w.r.t. map origin | 62.19 m |
Table II shows the details of the parameters used for ix-based localization. The longitudinal standard deviation of the ix-node perception noise is defined as a function of the distance between the ix-node and the vehicle. We have observed that the ix-node perception error increases as the detected object gets farther from the ix-node; thus, we model a non-linear covariance matrix that changes dynamically based on the distance between the ix-node and the vehicle.
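As an illustration only, a distance-dependent standard deviation could take a simple affine form; the paper's exact formula is not reproduced here, and `base_sigma` and `growth` are hypothetical parameters chosen for the sketch.

```python
import math

def ix_longitudinal_sigma(ix_pos, vehicle_pos, base_sigma=0.3, growth=0.05):
    """Illustrative model (NOT the paper's formula): the longitudinal
    standard deviation grows with the ix-node-to-vehicle distance."""
    d = math.dist(ix_pos, vehicle_pos)
    return base_sigma + growth * d
```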
Fig. 7 shows the multisensor fusion results for the three cases. We plot the longitudinal and lateral positions provided by the ground truth, GPS, ix-node, and the proposed fusion method, along with the RMSE of GPS, ix-node, and the fusion method. Circled regions in red indicate the intervals when the vehicle runs in a tunnel. Fig. 8 displays the corresponding localization error histograms for each case. Quantitative statistics for the position error of each case are detailed in Table III. These errors measure the performance of our localization approach.
| | GPS mean | GPS std. | Ix-node mean | Ix-node std. | Fusion mean | Fusion std. |
| --- | --- | --- | --- | --- | --- | --- |
| Case 1 (m) | 2.09 | 0.78 | 1.40 | 0.50 | 1.05 | 0.25 |
| Case 2 (m) | 2.30 | 0.43 | 1.62 | 0.53 | 0.89 | 0.22 |
| Case 3 (m) | 1.15 | 0.33 | 1.05 | 0.59 | 0.92 | 0.49 |
From all the cases, we observe that the GPS measurement is noisy in both the longitudinal and lateral directions; the mean position errors (raw GPS) for cases 1 and 2 exceed two meters. The ix-node perception has a lower position error than the GPS output in all cases but a higher average standard deviation of the position errors. Cases 2 and 3 in Fig. 7 show that there are instances where the ix-node perception output fails but recovers quickly. The ix-node perception error comes from vehicle occlusion and sensor noise, so the position of a generated vehicle candidate may deviate from the ground truth further than the position estimated by GPS, whose error is Gaussian-distributed around the ground truth. By fusing these two sources of measurement, we see that the mean position errors are improved in all cases, achieving an average of 0.95 meters. The localization error histograms in Fig. 8 also demonstrate that the mean errors shift to the left when fusing GPS and ix data in cases 1 and 2. Although the mean errors in case 3 appear similar for ix and fusion, the fused data has a smaller standard deviation. In all cases, the mean errors of fusion are reduced compared to using GPS alone.
It is also worth noting that the ix-node perception works even when the vehicle is moving through a tunnel. As shown in Fig. 2(c), our ix-node is built so that its detection range covers the case when a vehicle is running in a tunnel. The circled regions in Fig. 7 show high peaks of GPS error inside the tunnel due to GPS attenuation. The ix-node still detects the vehicle with its sensor heads, and the fusion result shows that the ix-node compensates for GPS failure in a tunnel.
This paper realizes infrastructure node-based vehicle localization. This work utilizes additional sensors mounted on an infrastructure node, namely the ix-node, which perceives the vehicle from another perspective. An ix-node has two sensor heads, where each sensor head consists of a camera and a LiDAR. The ix-node perceives nearby vehicles, and the vehicles can localize themselves better via communication with the ix-node.
We execute the ix-based localization framework with an Extended Kalman Filter, including data association and time synchronization between the vehicle and the ix-node. The ix-node has its own perception algorithm to generate vehicle candidates, and the outputs from the dynamic model are associated with the generated candidates and fed into the multisensor fusion system. We test the proposed algorithm using a Ford autonomous vehicle and an ix-node built on a test track at Mcity in Ann Arbor, Michigan. The results show that the proposed ix-based localization performs better than GPS and ix perception alone in all cases, achieving less than one meter of mean position error. It is also observed that the ix-node compensates for GPS failure in a tunnel.
Future work will implement the ix-based localization framework with multiple ix-nodes. We also intend to explore combining information at the sensor level, such as raw images and point clouds, instead of the perception output.
-  (2018-February 2) Localization technique selection. Google Patents. Note: US Patent App. 15/907,623 Cited by: §II.
-  (2020) Vehicle localization based on visual lane marking and topological map matching. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 258–264. Cited by: §II.
-  (2012) Inter-vehicle sensor fusion for accurate vehicle localization supported by v2v and v2i communications. In 2012 15th International IEEE Conference on Intelligent Transportation Systems, pp. 907–914. Cited by: §I, §II.
-  (2020) Yolov4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934. Cited by: §III-B.
-  (2001) Data fusion of four abs sensors and gps for an enhanced localization of car-like vehicles. In Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No. 01CH37164), Vol. 2, pp. 1597–1602. Cited by: §II.
-  (2009) An application of v2v communications: cooperation of vehicles for a better car tracking using gps and vision systems. In 2009 IEEE Vehicular Networking Conference (VNC), pp. 1–6. Cited by: §I, §II.
-  (2005) A road-matching method for precise vehicle localization using belief theory and kalman filtering. Autonomous Robots 19 (2), pp. 173–191. Cited by: §II, §III-E.
-  (2009) Autonomous driving in a multi-level parking structure. In 2009 IEEE International Conference on Robotics and Automation, pp. 3395–3400. Cited by: §II.
-  (2019) Bird’s eye view localization of surrounding vehicles: longitudinal and lateral distance estimation with partial appearance. Robotics and Autonomous Systems 112, pp. 178–189. Cited by: §II.
-  (2017) Feature-based lateral position estimation of surrounding vehicles using stereo vision. In 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 779–784. Cited by: §II.
-  (2010) Robust vehicle localization in urban environments using probabilistic maps. In 2010 IEEE international conference on robotics and automation, pp. 4372–4378. Cited by: §II, §II.
-  (2018) Map-aided dead-reckoning with lane-level maps and integrity monitoring. IEEE Transactions on Intelligent Vehicles 3 (1), pp. 81–91. Cited by: §II.
-  (2021)(Website) External Links: Cited by: §I.
-  (2019) DeepVCP: an end-to-end deep neural network for point cloud registration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12–21. Cited by: §II.
-  (2016)(Website) External Links: Cited by: §IV-A.
-  (2019) Localization requirements for autonomous vehicles. arXiv preprint arXiv:1906.01061. Cited by: §I, §II.
-  (2011) Collaborative vision-integrated pseudorange error removal: team-estimated differential gnss corrections with no stationary reference receiver. IEEE Transactions on Intelligent Transportation Systems 13 (1), pp. 15–24. Cited by: §I, §II.
-  (2019) OREOS: oriented recognition of 3d point clouds in outdoor scenarios. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3255–3261. Cited by: §II.
-  (2018) Implicit cooperative positioning in vehicular networks. IEEE Transactions on Intelligent Transportation Systems 19 (12), pp. 3964–3980. Cited by: §II.
-  (2020) Aerial imagery based lidar localization for autonomous vehicles. arXiv preprint arXiv:2003.11192. Cited by: §II, §II.
-  (2018) Robust and precise vehicle localization based on multi-sensor fusion in diverse city scenes. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 4670–4677. Cited by: §II.
-  (2018) Vehicle localization at an intersection using a traffic light map. IEEE Transactions on Intelligent Transportation Systems 20 (4), pp. 1432–1441. Cited by: §II.
-  (2011) Intelligent vehicle localization in urban environments using ekf-based visual odometry and gps fusion. IFAC Proceedings Volumes 44 (1), pp. 13776–13781. Cited by: §II.
-  (2017) Robust lidar localization using multiresolution gaussian mixture maps for autonomous driving. The International Journal of Robotics Research 36 (3), pp. 292–319. Cited by: §II.
-  (2018-June 6) Transportation infrastructure communication and control. Google Patents. Note: US Patent App. 16/018,144 Cited by: §I.