Fast and Accurate Extrinsic Calibration for Multiple LiDARs and Cameras

09/14/2021 ∙ by Xiyuan Liu, et al. ∙ The University of Hong Kong

In this letter, we propose a fast, accurate, and targetless extrinsic calibration method for multiple LiDARs and cameras based on adaptive voxelization. On the theory level, we formulate the LiDAR extrinsic calibration as a bundle adjustment (BA) problem and derive the second-order derivatives of the cost function w.r.t. the extrinsic parameters to accelerate the optimization. On the implementation level, we apply adaptive voxelization to dynamically segment the LiDAR point cloud into voxels of non-identical sizes, which reduces the computation time of feature correspondence matching. The robustness and accuracy of our proposed method have been verified with experiments in outdoor test scenes under multiple LiDAR-camera configurations.


I Introduction

Multiple LiDARs and cameras have been increasingly used on mobile robots for missions such as autonomous navigation [7] and mapping [13, 15, 14]. This is due to the superior characteristics of the LiDAR in three-dimensional range detection and point cloud density, and the rich color information from the camera. The integration of multiple sensors could facilitate the state estimation of the robot [12] while producing a dense and colorized map (see Fig. 1). To better perceive the surrounding environment, it is worthwhile to transform the perceptions from multiple sensors into the same coordinate frame, i.e., to know the rigid transformation between each pair of sensors. In this letter, our work deals with the extrinsic calibration between multiple LiDARs and cameras.

Several challenges reside in multi-sensor extrinsic calibration: (1) Limited field-of-view (FoV) overlap among the sensors. Current methods usually require a common FoV between each pair of sensors [2, 23, 10, 18], such that each feature is viewed by all sensors. However, this FoV overlap might be very small or not even exist (e.g., when each sensor only focuses on a dedicated area of interest), making these methods less practical. (2) Computation time demands. For general ICP-based LiDAR extrinsic calibration approaches [6, 14], the extrinsic is optimized by aligning the point clouds from all LiDARs and maximizing the point cloud's consistency. An increase in the number of LiDARs implies that the feature point correspondence search becomes more time-consuming, because each feature point needs to search for and match with nearby feature points using a k-d tree that contains the whole point cloud. In LiDAR-camera extrinsic calibration, a larger number of LiDAR points also leads to more computation time in LiDAR feature extraction.

Fig. 1: A) The dense colorized point cloud with the LiDAR poses and extrinsic parameters optimized by our proposed method. The views from other perspectives are exhibited in B) left side and C) right side. Our experiment video is available at https://youtu.be/PaiYgAXl9iY.

To address the above challenges, we propose a fast and targetless extrinsic calibration method for multiple LiDARs and cameras. To create co-visible features among all sensors, we introduce motions to the sensor platform such that each sensor will scan the same area (hence features) at different times. We first calibrate the extrinsic among LiDARs by registering their point cloud using an efficient bundle adjustment (BA) method. To produce multi-view correspondence, we implement the adaptive voxelization to dynamically segment the point cloud into multiple voxels with non-identical sizes. This process greatly reduces the time consumption during the feature correspondence matching as only one plane feature exists in each voxel. We then calibrate the extrinsic between the cameras and LiDARs by matching the co-visible features between the images and the above-reconstructed point cloud. To further accelerate the feature correspondence matching, we inherit the above adaptive voxel map to extract LiDAR edge features. Moreover, we utilize depth-continuous edges from the point cloud to avoid foreground inflation and bleeding points issues. In summary, our contributions are listed as follows:

  • We formulate the multi-LiDAR extrinsic calibration into a BA problem and derive the Jacobian and Hessian matrix of the cost function w.r.t. the LiDAR extrinsic to accelerate the optimization.

  • We implement the adaptive voxelization to accelerate the process of feature correspondence matching and LiDAR depth-continuous edge extraction.

  • We propose a fast, reliable, and targetless extrinsic calibration method for multiple LiDARs and cameras. Our method can handle configurations in which there is little or even no FoV overlap among the sensors. The precision and robustness of our proposed method are comparable to those of target-based methods and have been validated in outdoor test scenes.

  • We open source our implementation on GitHub (https://github.com/hku-mars/mlcc) to benefit the community.

II Related Works

II-A LiDAR-LiDAR Extrinsic Calibration

The extrinsic calibration methods between multiple LiDARs could be divided into two categories: motion-based and motionless approaches. In motion-based approaches, the sensor suite is usually required to move along such a trajectory that the onboard sensors perceive the same region of interest [14, 4, 22, 12]. In this manner, more constraints are considered in the optimization problem, including those between the base and auxiliary LiDARs and those between different poses of the base LiDAR during the motion. In [11, 16, 1], the authors also introduce external inertial navigation sensors to ease the motion estimation of the LiDARs. The extrinsic parameters could then be calibrated by optimizing the consistency of the point cloud map with ICP-based [6, 14] or entropy-based [16] indicators. The issue with these methods is that they generally consider the correlation between each pair of features, which becomes computationally expensive when the number of LiDARs increases. In [12], the authors maintain a state vector containing the extrinsic parameter of each LiDAR w.r.t. the robot center and update it with an EKF whenever a new measurement is available. This approach relies heavily on the accuracy of the LiDAR odometry, so its calibration precision might be unreliable.

Motionless methods have been discussed in [2, 23], where the authors attach retro-reflective tape to the surface of calibration targets to facilitate feature extraction among multiple LiDARs. These methods require prior preparation work and FoV overlap between LiDARs, which limits their wide implementation.

II-B LiDAR-Camera Extrinsic Calibration

The extrinsic calibration between LiDAR and camera could be mainly divided into target-based and targetless methods. In target-based approaches, geometric features, e.g., edges and surfaces, are extracted from natural geometric solids [9, 3, 19] or chessboards [8, 25] using intensity and color information. These features are matched either automatically or manually, and the extrinsic is solved with general non-linear optimization tools. Since extra calibration targets and manual work are needed, these methods are less practical than targetless solutions.

The targetless methods could be further divided into motion-based and motionless approaches. In motion-based methods, the extrinsic parameter is usually initialized with the motion information and refined with the appearance information. In [17], the authors reconstruct the point cloud from images using structure from motion (SfM) to determine the initial extrinsic parameter and refine it by back-projecting LiDAR points onto the image plane. In [22], the authors initialize the extrinsic parameter by Hand-eye calibration [20] and optimize it by minimizing the re-projection error between each image and the two adjacent LiDAR scans. Though the introduction of motion provides extra constraints between sensors, these methods require the sensor suite to move along a sufficiently excited trajectory. In motionless approaches, only the edge features that co-exist in both sensors' FoV are extracted and matched. Then the extrinsic parameter is optimized by minimizing the re-projected edge-to-edge distances [24, 26, 21, 10] or by maximizing the mutual information between the back-projected LiDAR points and the images [18].

Our proposed work is targetless and requires no FoV overlap between any sensor pairs to be calibrated. We create co-visible features by moving the sensor suite to multiple poses, which eliminates the requirement of FoV overlap. Unlike [14, 22], which also optimize the LiDAR poses and extrinsic by minimizing the summed point-to-plane distances, our work directly operates on the raw point cloud without feature extraction and performs better in terms of both time consumption and precision. Compared with [24, 18], which also optimize the LiDAR-camera extrinsic by minimizing re-projection errors, our work can additionally handle the configuration where the LiDAR and camera share no common FoV and consumes less computation time.

III Methodology

III-A Overview

Let represent the rigid transformation from frame to frame , where and are the rotation and translation. We denote the set of LiDARs, where represents the base LiDAR for reference, the set of cameras, the set of LiDAR extrinsic parameters and the set of LiDAR-camera extrinsic parameters. To create co-visible features between multiple LiDARs and cameras that may share no FoV overlap, we rotate the robot platform to poses such that the same region of interest is scanned by all sensors (see Fig. 2). Denote the time for each of the poses and the pose of the base LiDAR at the initial time as the global frame, i.e., . Denote the set of the base LiDAR poses in global frame. The point cloud patch scanned by LiDAR at time is denoted by , which is in ’s local frame. This point cloud patch could be transformed to global frame by

(1)
Fig. 2: FoV overlap created by rotation between two opposite pointing sensors. The original setup of two sensors and share no FoV overlap. With the introduction of rotational motion, the same region is scanned by all sensors across different times.
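To make the composition in (1) concrete, the following minimal numpy sketch (written in our own notation, since the symbols are not reproduced above) chains the LiDAR extrinsic and the base LiDAR pose to bring a point cloud patch into the global frame. The function and argument names are hypothetical, not those of the released implementation.

```python
import numpy as np

def transform_patch_to_global(points_Li, T_L0_Li, T_G_L0_tj):
    """Bring a point cloud patch scanned by LiDAR L_i at time t_j into the
    global frame, as in (1): first apply the LiDAR extrinsic (L_i -> L_0),
    then the base LiDAR pose at t_j (L_0 -> G).

    points_Li : (N, 3) points in L_i's local frame
    T_L0_Li   : (R, t) extrinsic of L_i w.r.t. the base LiDAR L_0
    T_G_L0_tj : (R, t) pose of the base LiDAR in the global frame at time t_j
    """
    R_e, t_e = T_L0_Li
    R_p, t_p = T_G_L0_tj
    pts_L0 = points_Li @ R_e.T + t_e   # L_i frame -> base LiDAR frame
    return pts_L0 @ R_p.T + t_p        # base LiDAR frame -> global frame
```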

In our proposed approach of multi-sensor calibration, we sequentially calibrate the and . In the first step, we simultaneously estimate the LiDAR extrinsic and the base LiDAR pose trajectory based on an efficient multi-view registration (see Sec. III-C). In the second step, we calibrate the by matching the depth-continuous edges extracted from the images and the above-reconstructed point cloud (see Sec. III-D). At the center of both the LiDAR and camera extrinsic calibration lies an adaptive voxel map, which finds correspondences among the LiDAR and camera measurements efficiently (Sec. III-B).

III-B Adaptive Voxelization

To find the correspondences among different LiDAR scans, we assume the initial base LiDAR trajectory , LiDAR extrinsic , and camera extrinsic are available. The initial base LiDAR trajectory could be obtained by an online LiDAR SLAM (e.g., [13]), and the initial extrinsic could be obtained from the CAD design or a rough Hand-eye calibration [20]. Our previous work [14] extracts edge and plane feature points from each LiDAR scan and matches them to nearby edge and plane points in the map by a k-nearest neighbor (kNN) search. This repeatedly builds a k-d tree of the global map at each iteration. In this letter, we use a more efficient voxel map proposed in [15] to create correspondences among all LiDAR scans.

The voxel map is built by cutting the point cloud (registered using the current and ) into small voxels such that all points in a voxel roughly lie on a plane (with some adjustable tolerance). The main problem with a fixed-resolution voxel map is that if the resolution is high, the segmentation is too time-consuming, while if the resolution is too low, multiple small planes in the environment falling into the same voxel would not be segmented. To best adapt to the environment, we implement an adaptive voxelization process. More specifically, the entire map is first cut into voxels with a pre-set size (usually large, e.g., 4m). Then, for each voxel, if the contained points from all LiDAR scans roughly form a plane (determined by checking the ratio between eigenvalues), it is treated as a planar voxel; otherwise, it is divided into eight octants, each of which is examined again until the contained points roughly form a plane or the voxel size reaches the pre-set lower bound. Moreover, the adaptive voxelization is performed directly on the raw LiDAR points, so no feature point extraction is needed as in [14].

Fig. 3 shows a typical result of the adaptive voxelization process in a complicated campus environment. As can be seen, this process is able to segment planes of different sizes, including large planes on the ground, medium planes on the building walls, and tiny planes on tree crowns.
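As a rough illustration of this recursive splitting, the sketch below (Python/numpy, with an assumed planarity ratio and point-count cutoff rather than the paper's exact settings) accepts a voxel as planar when the smallest eigenvalue of its point covariance is small relative to the largest one, and otherwise recurses into eight octants down to the minimum voxel size.

```python
import numpy as np

def adaptive_voxelize(points, origin, size, min_size=0.25, plane_ratio=0.01):
    """Recursively split a voxel until its points roughly form a plane.

    points : (N, 3) array of points falling inside the voxel
    origin : (3,) lower corner of the voxel; size : its edge length
    plane_ratio and the point-count cutoff are assumed values, not the
    paper's exact settings. Returns a list of (points, normal) pairs.
    """
    if len(points) < 10:                         # too few points to fit a plane
        return []
    cov = np.cov(points.T)
    eigval, eigvec = np.linalg.eigh(cov)         # eigenvalues in ascending order
    if size <= min_size or eigval[0] < plane_ratio * eigval[2]:
        return [(points, eigvec[:, 0])]          # keep as a planar voxel
    # Otherwise split into eight octants and examine each one again.
    voxels, half = [], size / 2.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                sub_origin = origin + half * np.array([dx, dy, dz])
                mask = np.all((points >= sub_origin) & (points < sub_origin + half), axis=1)
                voxels += adaptive_voxelize(points[mask], sub_origin, half,
                                            min_size, plane_ratio)
    return voxels
```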

Fig. 3: A) LiDAR point cloud segmented with the adaptive voxelization. Points within the same voxel are colored identically. The detailed adaptive voxelization of points in the dashed white rectangle could be viewed in B) colored points and C) original points. The default size for the initial voxelization is 4m, and the minimum voxel size is 0.25m.

III-C Multi-LiDAR Extrinsic Calibration

Fig. 4: (a) The -th factor item relating to and with and . (b) The distance from the point to the plane .

With the adaptive voxelization, we can obtain a set of voxels of different sizes, where each voxel contains points that roughly lie on a plane and creates a planar constraint for all LiDAR poses that have points in this voxel. More specifically, consider the -th voxel consisting of a group of points scanned by at times . We define a point cloud consistency indicator which forms a factor on and , as shown in Fig. 4. Then, the base LiDAR trajectory and extrinsic are estimated by optimizing the factor graph. A natural choice for the consistency indicator would be the summed Euclidean distance between each point and the plane to be estimated (see Fig. 4). Taking into account all such indicators within the voxel map, we formulate the problem as

(2)

where , is the total number of points in , is the normal vector of the plane and is a point on this plane. The optimization dimension in (2) is too high due to the dependence on the planar parameters . Fortunately, since one plane's parameters are independent of another's, we can optimize over first, i.e.,

(3)

The inner optimization over in (3) could be further performed first on and then on , i.e.,

(4)

As can be seen, the cost function in (4) is quadratic w.r.t. . Hence the inner optimization can be solved analytically by setting the derivative to zero, i.e.,

(5)

It is seen that the solution to (5) is not unique as long as is perpendicular to , which allows to move freely along any direction perpendicular to . Since this free movement of does not change the plane parameterized by it, nor affect the cost function in (4), any solution of satisfying (5) would be an optimal solution to the inner optimization problem of (4). One such solution could be

(6)

Substituting the optimal solution of in (6) back to (4) leads to

(7)

Again, this optimization problem has the well-known analytical optimal solution , which is the eigenvector corresponding to the smallest eigenvalue of the matrix . As a result, substituting the optimal back to (3) leads to

(8)

As can be seen, the optimization variables are analytically solved before the optimization, which significantly reduces the optimization dimension. The resultant optimization in (8) is over the LiDAR pose (hence the base LiDAR trajectory and extrinsic ) only. To see this, we note that depends on (directly or via in (6)), which is observed locally by pose .
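The closed-form inner solutions described around (5)-(8) can be summarized in a few lines of numpy: the optimal plane point is the centroid of the voxel's points, the optimal normal is the eigenvector of the point scatter matrix with the smallest eigenvalue, and the minimized cost of the voxel is that eigenvalue itself (up to the sum-versus-average convention of the cost). This is a sketch in our own notation, not the released implementation.

```python
import numpy as np

def voxel_plane_cost(points):
    """Closed-form inner solution for one planar voxel.

    points : (N, 3) points, already transformed into the global frame with
             the current pose and extrinsic estimates.
    Returns the optimal plane point q*, normal n*, and the minimized
    summed squared point-to-plane distance.
    """
    q_star = points.mean(axis=0)               # optimal point on the plane
    centered = points - q_star
    A = centered.T @ centered                  # 3x3 scatter matrix
    eigval, eigvec = np.linalg.eigh(A)         # ascending eigenvalues
    n_star = eigvec[:, 0]                      # normal = smallest-eigenvalue eigenvector
    return q_star, n_star, eigval[0]           # cost = smallest eigenvalue

# Sanity check: explicit residuals reproduce the eigenvalue-based cost.
pts = np.random.randn(200, 3) * np.array([2.0, 1.0, 0.02])
q, n, cost = voxel_plane_cost(pts)
residuals = (pts - q) @ n                      # signed point-to-plane distances
assert np.isclose((residuals ** 2).sum(), cost)
```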

The optimization in (8) is nonlinear and solved iteratively. In each iteration, the cost function is approximated to the second order. More specifically, we view as a function of all the contained points which is the column vector containing each :

The in (8) could be approximated by

(9)

where and are the first and second derivatives of w.r.t. . The detailed derivation of and could be found in [15] and is omitted here due to limited space.

Suppose the -th point in is scanned by LiDAR at time , then

(10)

which implies is dependent on and . To perturb , we perturb a pose in its tangent plane with the as defined in [5], i.e.,

(11)

Based on the error parameterization in (11) for both and extrinsic , the perturbed point location in (10) is

(12)

Then, subtracting (10) from (12), we obtain

(13)

and

(14)

where

is a small perturbation of the entire optimization vector

and

(15)

Substituting (14) to (9) leads to

(16)

Then the optimal could be determined by iteratively solving (17) with the Levenberg-Marquardt (LM) method and updating the to .

(17)
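A bare-bones sketch of this iterative update is given below (Python/numpy). It assumes the stacked gradient and Hessian of (16) are already assembled, applies a right-multiplied rotation perturbation via Rodrigues' formula, and adds a simple Levenberg-Marquardt damping term; the exact perturbation convention and damping schedule follow [5] and the paper and are only approximated here.

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    a = phi / theta
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def lm_step(poses, J, H, mu=1e-3):
    """One damped second-order step over a list of (R, t) poses, i.e. the
    stacked base LiDAR trajectory and LiDAR extrinsic (6 DoF each).

    J : (6M,) gradient of the cost w.r.t. the stacked perturbation
    H : (6M, 6M) Hessian approximation of the cost
    """
    delta = np.linalg.solve(H + mu * np.eye(H.shape[0]), -J)
    new_poses = []
    for k, (R, t) in enumerate(poses):
        d_rot = delta[6 * k: 6 * k + 3]
        d_trans = delta[6 * k + 3: 6 * k + 6]
        new_poses.append((R @ so3_exp(d_rot), t + d_trans))   # boxplus-style update
    return new_poses
```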

III-D LiDAR-Camera Extrinsic Calibration

With the calibrated and , we could obtain a dense point cloud in the global frame. This global point cloud could then be used to find the optimal extrinsic by matching edge features from the point cloud and the images. Two types of edges could be extracted from the point cloud and images: depth-discontinuous edges between foreground and background objects, and depth-continuous edges between two neighboring non-parallel planes. As explained in [24], depth-discontinuous edges suffer from the foreground inflation and bleeding points phenomena; we hence use depth-continuous edges to match the point cloud and images.

In [24], the point cloud is segmented into uniform-size voxels and the planes inside each voxel are estimated by the RANSAC algorithm. In contrast, our method uses the same adaptive voxel map obtained in Sec. III-B. For every two adjacent voxels, we calculate the angle between their contained planes. If this angle exceeds a threshold, the intersection line of these two planes is extracted as a depth-continuous edge. As shown in Fig. 5, our method effectively removes false estimations and saves computation time.
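A minimal sketch of this plane-intersection test is given below, assuming each adaptive voxel provides a unit normal and a point on its plane; the 30-degree angle threshold is illustrative, not the paper's value.

```python
import numpy as np

def plane_intersection_edge(n1, q1, n2, q2, min_angle_deg=30.0):
    """Extract the depth-continuous edge between two adjacent voxel planes.

    Each plane is given by a unit normal n and a point q on the plane. The
    intersection line is returned as (point, direction) when the angle
    between the planes exceeds the threshold; otherwise None.
    """
    cos_angle = np.clip(abs(float(n1 @ n2)), 0.0, 1.0)
    if np.degrees(np.arccos(cos_angle)) < min_angle_deg:
        return None                                  # nearly parallel: no edge
    direction = np.cross(n1, n2)
    direction /= np.linalg.norm(direction)
    # A point on the line: satisfy both plane equations, plus a gauge
    # constraint (zero component along the line direction) to pin it down.
    A = np.vstack([n1, n2, direction])
    b = np.array([n1 @ q1, n2 @ q2, 0.0])
    return np.linalg.solve(A, b), direction
```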

Fig. 5: Depth-continuous edge extraction comparison. A) Real-world image. B) Raw point cloud of this scene. C) Edges extracted using the method in [24], where the yellow circles indicate false estimations. D) Edges extracted with adaptive voxelization. The time consumption of edge extraction in this scene is 38s for [24] versus 5s for our proposed method.

For image edge extraction, we use the Canny algorithm. To further accelerate the calibration process and avoid mismatching, we conduct an FoV check such that only the depth-continuous edges within the current camera's FoV are used. The correspondence between a LiDAR edge and a camera edge is built by projecting the LiDAR edge onto the image plane with the current extrinsic estimate. We then optimize the extrinsic parameters by minimizing the residuals of point-to-edge distances on the image plane, similar to [24].
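The residual being minimized can be sketched as follows (Python/numpy): a LiDAR edge point is projected with a pinhole model and the current extrinsic estimate, and its perpendicular distance to the matched 2D image edge (represented here as a point plus a unit direction) is the residual. The 2D edge representation and the absence of lens distortion are simplifying assumptions of this sketch.

```python
import numpy as np

def project_point(p_lidar, R_cl, t_cl, K):
    """Project a 3D LiDAR point into the image with extrinsic (R_cl, t_cl)
    (LiDAR -> camera) and intrinsic matrix K (pinhole, no distortion)."""
    p_cam = R_cl @ p_lidar + t_cl
    u = K @ (p_cam / p_cam[2])          # homogeneous pixel coordinates
    return u[:2]

def point_to_edge_residual(p_lidar, edge_pt, edge_dir, R_cl, t_cl, K):
    """Distance on the image plane between a projected LiDAR edge point and
    the 2D line (edge_pt, unit edge_dir) fitted to the matched Canny edge."""
    u = project_point(p_lidar, R_cl, t_cl, K)
    d = u - edge_pt
    return np.linalg.norm(d - (d @ edge_dir) * edge_dir)   # perpendicular component
```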

III-E Calibration Pipeline

The workflow of our proposed multi-sensor calibration is illustrated in Fig. 6. At the beginning of the calibration, the base LiDAR's raw point cloud is processed by LOAM [13] to obtain the initial base LiDAR trajectory . Then, the raw point clouds of all LiDARs are segmented by time into point cloud patches, i.e., collected under the pose .

In the multi-LiDAR extrinsic calibration, the base LiDAR poses are first optimized using the base LiDAR's point cloud patches . Notice that only is involved and optimized in (3). Then the extrinsic are calibrated by aligning the point clouds from the LiDAR to be calibrated with those from the base LiDAR. In this stage's problem formulation (3), is fixed at the optimized values from the previous stage, and only is optimized. Finally, both and are jointly optimized using the entire set of point cloud patches. In each iteration of the optimization (over , , or both), the adaptive voxelization (as described in Sec. III-B) is performed with the current values of and .

In the multi-LiDAR-camera extrinsic calibration, the adaptive voxel map obtained with the and in the previous step is used to extract 3D depth-continuous edges (Sec. III-D). Those 3D edges are then projected onto each image using the extrinsic parameter and are matched with the 2D Canny edges extracted from the image. By minimizing the residuals defined by these two types of edges, we iteratively solve for the optimal with the Ceres Solver (http://ceres-solver.org/).

Fig. 6: The workflow of our proposed method: multi-LiDAR extrinsic calibration (light blue region) and LiDAR-camera extrinsic calibration (light green region). The adaptive voxelization takes effect in the steps surrounded by the yellow lines.

IV Experiments and Results

To test the proposed algorithm, we customized a remotely operated vehicle platform (https://www.agilex.ai/product/3?lang=en-us) with one Livox AVIA LiDAR (https://www.livoxtech.com/avia, 70.4° FoV), one Livox MID-100 LiDAR (https://www.livoxtech.com/mid-40-and-mid-100), and two MV-CA013-21UC cameras (https://www.rmaelectronics.com/hikrobot-mv-ca013-21uc/, 82.9° FoV each), as illustrated in Fig. 7. The MID-100 LiDAR consists of three MID-40 LiDAR units (38.4° FoV each), whose extrinsic parameters are calibrated by the manufacturer and could be used as the ground truth for the calibration evaluation. Note that the two types of LiDAR units (i.e., AVIA and MID-40) have different scanning patterns, densities, and FoVs.

We have verified our proposed algorithm with data collected in two randomly selected test scenes on our campus, as shown in Fig. 8. Scene-1 is a square in front of the library with moving pedestrians, and scene-2 is an area near a garden. In Sec. IV-A, the data collected in our previous work [14] is also used for comparison with the previous method. All experiments are conducted on a high-performance PC with an i7-9700K processor and 32GB RAM.

For our proposed multi-LiDAR extrinsic calibration, we first conduct a standard Hand-eye calibration [20] with a figure-8 path to initialize the extrinsic . Then we rotate our multi-sensor platform by 360° and keep the robot platform still every few degrees, such that we can acquire a dense enough point cloud from each LiDAR at each pose. Keeping the robot platform still during data collection also eliminates the problems caused by motion distortion and time synchronization. The timestamps are manually selected such that only the point cloud and image data collected when the robot platform is still are used.

Fig. 7: Our customized multi-sensor vehicle platform. Left: the FoV coverage of each sensor with their FoV specs. Right: the orientation of each sensor is denoted in the right-handed coordinate system.
Fig. 8: Our experiment test scenes: (a) Scene-1; (b) Scene-2.

IV-A Convergence and Computation Time Comparison

In this section, we demonstrate that the proposed algorithm converges faster than our previous work [14] in terms of both the number of iterations and the computation time, while remaining accurate. We use the dataset collected in [14] on the MID-100 and choose the middle MID-40 as the base LiDAR to calibrate the two adjacent LiDARs. We perform 10 independent trials; in each trial, the extrinsic is initialized by randomly perturbing each Euler angle of by and each axis's offset by from the manufacturer's calibrated values.
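For reference, one way to generate such random initial perturbations is sketched below (Python with scipy); the ±5° and ±0.05m magnitudes are placeholders, since the exact perturbation ranges are not reproduced in the text above.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def perturb_extrinsic(R, t, ang_deg=5.0, trans_m=0.05, rng=None):
    """Randomly perturb a reference extrinsic (R, t) to initialize one trial.

    ang_deg and trans_m are hypothetical bounds: each Euler angle is drawn
    uniformly within +/- ang_deg degrees and each translation axis within
    +/- trans_m meters around the reference values.
    """
    rng = np.random.default_rng() if rng is None else rng
    d_euler = rng.uniform(-ang_deg, ang_deg, size=3)            # degrees
    d_trans = rng.uniform(-trans_m, trans_m, size=3)            # meters
    R_perturbed = Rotation.from_euler('xyz', d_euler, degrees=True).as_matrix() @ R
    return R_perturbed, t + d_trans
```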

The extrinsic rotation and translation errors of both methods versus iteration count are plotted in Fig. 9, and the averaged time cost per iteration of both methods is summarized in Table I. Fig. 9 shows that the proposed work makes both the extrinsic translation and rotation errors converge quickly to the appropriate values. This is due to the second-order optimization used in Sec. III-C, where the Jacobian and Hessian matrix with respect to the optimization variables ( and ) are exactly derived. In contrast, the previous work [14] only considers the Jacobian of the residual w.r.t. one LiDAR, causing inaccurate Jacobian computation. The calibration results of both methods in the above 10 trials are plotted in Fig. 10, which indicates that the increased speed of our proposed method does not come at the cost of accuracy. Considering the computation time per iteration and the convergence rate, our proposed algorithm reduces the total calibration time to at most one-tenth of that of the previous work.

Fig. 9: Convergence comparison of the proposed method and previous work [14]. Each box-plot consists of 40 values from 10 trials, two test scenes, and two LiDAR pairs, i.e., . The mean and standard deviation of the initial extrinsic errors are 0.0929m and 0.0262m for translation and 0.0553rad and 0.0257rad for rotation, respectively.

TABLE I: COMPUTATION TIME COMPARISON
Method                         previous work [14]    proposed
time cost per iteration (s)    7.6372                1.4516
Fig. 10: Extrinsic calibration results of three MID-40 LiDARs. Each box plot consists of 20 values respectively from 10 trials and two pairs of LiDARs.

IV-B Multi-LiDAR Calibration

IV-B1 MID-100 LiDAR Self Calibration

In this section, we compare our algorithm with the motion-based method [22] using the MID-100 LiDAR and the data collected in both test scenes. The middle MID-40 is chosen as the base LiDAR to calibrate the extrinsic of the other MID-40s, i.e., . For both methods, the extrinsic are initialized by Hand-eye calibration, and the results are summarized in Table II. Since the MID-40 LiDAR has a small FoV and the vehicle's movements in both test scenes are limited to planar motions, the pure motion-based method compares unfavorably with our proposed method.

TABLE II: EXTRINSIC CALIBRATION RESULTS OF LIDARS INSIDE MID-100 IN TWO TEST SCENES
Methods              Rotation Error (mean / sd)    Translation Error (mean / sd)
motion based [22]    2.7223 / 2.4137               0.3955m / 0.1267m
proposed              /                            0.0075m / 0.0016m

IV-B2 AVIA and MID-100 LiDAR

In this section, we demonstrate that our method works well given two types of LiDARs with different FoVs and point cloud densities, and we compare the results with those from the motion-based method [22]. The AVIA is chosen as the base LiDAR to calibrate the extrinsic between the AVIA and each MID-40, i.e., and . Then we calculate the from the above results and compare them with the known values obtained from the manufacturer. For both methods, the extrinsic are initialized by Hand-eye calibration, and the results from both test scenes are summarized in Table III. It is shown that the proposed method's performance is less affected by the distinct characteristics introduced by different types of LiDARs.

TABLE III: EXTRINSIC CALIBRATION RESULTS BETWEEN AVIA AND MID-100 IN TWO TEST SCENES
Methods              Rotation Error (mean / sd)    Translation Error (mean / sd)
motion based [22]    5.0876 / 4.3721               0.9945m / 0.5701m
proposed              /                            0.0084m / 0.0023m

IV-C Multiple LiDAR-Camera Calibration

IV-C1 Among AVIA, MID-100 and Cameras

In this section, we compare our proposed LiDAR-camera extrinsic calibration method with the motion-based [22] and mutual-information-based [18] methods. Both [22, 18] require the sensors to share a common FoV; they utilize the intensity information of the LiDAR point cloud and match it with the edge features extracted from the image. Here, we select the AVIA as the base LiDAR. The initial extrinsic are calculated by adding disturbances to the values measured from the CAD model. We perform 20 independent trials with the data collected in scene-2; in each trial, we randomly perturb each Euler angle of by and each axis's offset of by from the CAD model's measurements. We calibrate the extrinsic of each camera individually (i.e., and ), then we calculate the and compare it with that obtained by the standard chessboard method, which serves as the ground truth. The calibration results are illustrated in Fig. 11. It is shown that our proposed method outperforms [18, 22] both quantitatively and qualitatively.

Fig. 11: Extrinsic calibration results of [22, 18] and the proposed method. Each box-plot illustrates the results of 20 trials using the data collected in scene-2. The average and standard deviation of the initial rotation error are and . The average and standard deviation of the initial translation error are 0.1389m and 0.0778m, respectively.

IV-C2 MID-100 and Cameras

Fig. 12: Extrinsic calibration results of MID-100 and opposite pointing cameras in two test scenes. Each box-plot illustrates the results of 20 trials. The average and standard deviation of the initial rotation error are and . The average and standard deviation of the initial translation error are 0.1007m and 0.0588m, respectively.

In this section, we demonstrate that the proposed method could calibrate the extrinsic between a LiDAR and cameras without FoV overlap. We choose the middle MID-40 of the MID-100 as the base LiDAR and calibrate the extrinsic of each LiDAR-camera pair (i.e., ). The initial extrinsic are calculated by adding disturbances to the values measured from the CAD model. We perform 20 independent trials with the data collected in both scenes; in each trial, we randomly perturb each Euler angle of by and each axis's offset of by from the CAD model's measurements. Then we calculate the and compare it with that obtained by the standard chessboard method. The calibration results and the corresponding colorized point cloud are illustrated in Fig. 12 and Fig. 13.

It is seen that the general extrinsic calibration performance between the MID-40 and cameras is less competitive than that between the AVIA and cameras. This is due to the fact that the AVIA has larger FoV coverage (70.4° versus 38.4°) and thus higher point cloud density (six laser beams versus one) than the MID-40, which provides more edge correspondences in all directions. The performance of the MID-40-camera extrinsic calibration in scene-2 is also more robust and acceptable than in scene-1. This is probably because the extracted LiDAR edges are mismatched with, and trapped in, the image edges that largely exist on the ground of scene-1.

Fig. 13: Colorized point cloud using MID-100 LiDAR and opposite pointing cameras. The left camera’s images color the point clouds in both test scenes. The brightness of the window is due to the reflection of the sunlight. Top: scene-1; Bottom: scene-2.

V Conclusion

In this letter, we propose a fast and targetless extrinsic calibration method for multiple LiDARs and cameras. We analytically derive the derivatives of the cost function w.r.t. the LiDAR extrinsic and implement adaptive voxelization, which greatly shortens the total calibration time. Experimental results under multiple LiDAR-camera configurations in outdoor test scenes demonstrate the robustness and reliability of our proposed method, even when no FoV overlap exists between the sensor pairs.

Acknowledgment

The authors thank Livox Technology and AgileX Robotics for their product support.

References

  • [1] M. Billah and J. A. Farrell (2019) Calibration of multi-lidar systems: application to bucket wheel reclaimers. IEEE Transactions on Control Systems Technology, pp. 1–12.
  • [2] C. Gao and J. R. Spletzer (2010) On-line calibration of multiple lidars on a mobile vehicle platform. In 2010 IEEE International Conference on Robotics and Automation, pp. 279–284.
  • [3] X. Gong, Y. Lin, and J. Liu (2013) 3D lidar-camera extrinsic calibration using an arbitrary trihedron. Sensors 13 (2), pp. 1902–1918.
  • [4] L. Heng (2020) Automatic targetless extrinsic calibration of multiple 3d lidars and radars. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10669–10675.
  • [5] C. Hertzberg, R. Wagner, U. Frese, and L. Schröder (2013) Integrating generic sensor fusion algorithms with sound state representations through encapsulation of manifolds. Information Fusion 14 (1), pp. 57–77.
  • [6] J. Jiao, H. Ye, Y. Zhu, and M. Liu (2021) Robust odometry and mapping for multi-lidar systems with online extrinsic calibration. IEEE Transactions on Robotics, pp. 1–10.
  • [7] F. Kong, W. Xu, Y. Cai, and F. Zhang (2021) Avoiding dynamic small obstacles with onboard sensing and computation on aerial robots. IEEE Robotics and Automation Letters 6 (4), pp. 7869–7876.
  • [8] G. Koo, J. Kang, B. Jang, and N. Doh (2020) Analytic plane covariances construction for precise planarity-based extrinsic calibration of camera and lidar. In 2020 IEEE International Conference on Robotics and Automation (ICRA).
  • [9] J. Kummerle and T. Kuhner (2020) Unified intrinsic and extrinsic camera and lidar calibration under uncertainties. In 2020 IEEE International Conference on Robotics and Automation (ICRA).
  • [10] J. Levinson and S. Thrun (2013) Automatic online calibration of cameras and lasers. In Robotics: Science and Systems, Vol. 2, pp. 7.
  • [11] J. Levinson and S. Thrun (2014) Unsupervised calibration for multi-beam lasers. Experimental Robotics, Springer Tracts in Advanced Robotics 79, pp. 179–193.
  • [12] J. Lin, X. Liu, and F. Zhang (2020) A decentralized framework for simultaneous calibration, localization and mapping with multiple lidars. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4870–4877.
  • [13] J. Lin and F. Zhang (2020) Loam-livox: a fast, robust, high-precision lidar odometry and mapping package for lidars of small fov. In Proc. of the International Conference on Robotics and Automation (ICRA).
  • [14] X. Liu and F. Zhang (2021) Extrinsic calibration of multiple lidars of small fov in targetless environments. IEEE Robotics and Automation Letters 6 (2), pp. 2036–2043.
  • [15] Z. Liu and F. Zhang (2021) BALM: bundle adjustment for lidar mapping. IEEE Robotics and Automation Letters 6 (2), pp. 3184–3191.
  • [16] W. Maddern, A. Harrison, and P. Newman (2012) Lost in translation (and rotation): rapid extrinsic calibration for 2d and 3d lidars. In 2012 IEEE International Conference on Robotics and Automation, pp. 3096–3102.
  • [17] B. Nagy, L. Kovács, and C. Benedek (2019) Online targetless end-to-end camera-lidar self-calibration. In 2019 16th International Conference on Machine Vision Applications (MVA), pp. 1–6.
  • [18] G. Pandey, J. R. McBride, S. Savarese, and R. Eustice (2015) Automatic extrinsic calibration of vision and lidar by maximizing mutual information. Journal of Field Robotics 32, pp. 696–722.
  • [19] Y. Park, S. Yun, C. Won, K. Cho, K. Um, and S. Sim (2014) Calibration between color camera and 3d lidar instruments with a polygonal planar board. Sensors 14 (3), pp. 5333–5353.
  • [20] R. Horaud and F. Dornaika (1995) Hand-eye calibration. The International Journal of Robotics Research 14 (3), pp. 195–210.
  • [21] D. Scaramuzza, A. Harati, and R. Siegwart (2007) Extrinsic self calibration of a camera and a 3d laser range finder from natural scenes. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4164–4169.
  • [22] Z. Taylor and J. Nieto (2016) Motion-based calibration of multimodal sensor extrinsics and timing offset estimation. IEEE Transactions on Robotics 32 (5), pp. 1215–1229.
  • [23] B. Xue, J. Jiao, Y. Zhu, L. Zhen, D. Han, M. Liu, and R. Fan (2019) Automatic calibration of dual-lidars using two poles stickered with retro-reflective tape. In 2019 IEEE International Conference on Imaging Systems and Techniques (IST), pp. 1–6.
  • [24] C. Yuan, X. Liu, X. Hong, and F. Zhang (2021) Pixel-level extrinsic self calibration of high resolution lidar and camera in targetless environments. IEEE Robotics and Automation Letters 6 (4), pp. 7517–7524.
  • [25] L. Zhou, Z. Li, and M. Kaess (2018) Automatic extrinsic calibration of a camera and a 3d lidar using line and plane correspondences. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
  • [26] Y. Zhu, C. Zheng, C. Yuan, X. Huang, and X. Hong (2020) CamVox: a low-cost and accurate lidar-assisted visual slam system. arXiv preprint arXiv:2011.11357.