
Lasers to Events: Automatic Extrinsic Calibration of Lidars and Event Cameras

07/03/2022
by Kevin Ta, et al.
ETH Zurich

Despite significant academic and corporate efforts, autonomous driving under adverse visual conditions still proves challenging. As neuromorphic technology has matured, its application to robotics and autonomous vehicle systems has become an area of active research. Low-light and latency-demanding situations in particular stand to benefit from event-based sensing. To enable event cameras to operate alongside staple sensors like lidar in perception tasks, we propose a direct, temporally-decoupled calibration method between event cameras and lidars. The high dynamic range and low-light operation of event cameras are exploited to directly register lidar laser returns, allowing information-based correlation methods to optimize for the 6-DoF extrinsic calibration between the two sensors. This paper presents the first direct calibration method between event cameras and lidars, removing dependencies on frame-based camera intermediaries and/or highly-accurate hand measurements. Code will be made publicly available.


1 Introduction

Figure 1: (Left) Uncalibrated scene with lidar points projected using initial values. (Right) Calibrated scene showing improved lidar point projection alignment.

In recent years, autonomous vehicle development has accelerated due to both academic and industrial research. Many sensing modalities have become part of the standard autonomous driving sensor suite, including cameras, lidars, and radars. With adverse conditions still a challenging problem in autonomous driving, event-based vision offers many advantages in low-light and latency-demanding situations.

Event-based vision derives from neuromorphic engineering, aimed at replicating fundamental, biological neural functions [9]. An event camera asynchronously extracts individual pixel-wise events that correspond to luminosity changes. This contrasts with traditional cameras that capture entire frames at regular intervals, even in static scenes.

The use of multiple sensors has enabled significant advances in perception, localization, and odometry [16]. These tasks may rely on a single sensor or may fuse the output of multiple sensors. This said, a key aspect of enabling these sensors is to accurately calibrate the intrinsic and extrinsic parameters of the sensors. These parameters specify how each modality represents the environment and how each sensor is positioned relative to the others [32]. Without calibration, sensor fusion methods incorrectly associate spatial features, as seen in Fig. 1, which negatively impacts downstream perception.

Despite the increasing interest in event cameras, few large-scale multi-sensor datasets include event cameras as part of their sensor suite. Rather, combinations of more typical sensors—traditional cameras, lidars, and radars—are available in large-scale datasets like KITTI [12], the Oxford RobotCar Dataset [2, 18], and Boreas [4]. To the best of our knowledge, there exist only two datasets which contain both lidar and event cameras, MVSEC [35] and DSEC [11]. However, neither performs direct calibration between event cameras and lidars.

As event cameras represent a still immature technology, methods have yet to be explored for the direct calibration between them and non-imaging sensors. This work addresses this gap by proposing a direct extrinsic calibration between a MEMS lidar and a state-of-the-art event camera.

Event cameras are typically designed as monochromatic detectors, operating across the visible light spectrum. To achieve operation over a high dynamic range, the use of optical filters and the design of the imaging sensor node [14] are key considerations. Lidars operate at near-infrared (NIR) wavelengths within the typical sensitivity of silicon photodiodes. Consequently, lidar signals should be detectable by event cameras in the absence of IR cut filters. Indeed, we found that current state-of-the-art event cameras can directly register lidar laser signals as discrete events.

In this paper, we present an automatic information-based method for calibrating a MEMS lidar with respect to an event-based camera. It leverages event-based structured light for static correspondence matching. In particular, we show how a dense lidar scan collected by a RoboSense RS-LiDAR-M1 [27] can be easily and reliably correlated to a series of events registered by a Prophesee GEN4.1 [24] event camera for robust calibration using accumulated event maps. Our key contributions can be summarized as follows:

  1. We propose a novel method to create an image reconstruction of a static scene using accumulated event maps from the lidar-correlated event activity.

  2. We present the first direct 6-DoF calibration between an event camera and a lidar using accumulated event maps and lidar point clouds.

  3. We investigate the impact of intrinsic calibration and scene selection on the accuracy of calibration results.

2 Related Work

Neuromorphic sensors have rarely been integrated into the perception stack of autonomous vehicles, largely owing to the immaturity of neuromorphic technology. Yet, event cameras offer distinct advantages beyond the current suite of standard sensors. In particular, event cameras can capture moving objects at the speed of the motion itself, rather than at the fixed update rates of traditional frame-based cameras and lidars. With their high dynamic range, these sensors can operate in low-light and over-saturated environments that may pose problems for frame-based cameras [9]. The event-based paradigm has the potential to reduce data bandwidth in static scenes without sacrificing the advantageous dense representation of imaging technology.

As event-driven technology has advanced in recent years, new datasets have emerged that incorporate event camera systems. There are two main datasets which contain both event cameras and lidar sensors. MVSEC [35] uses two DAVIS346 event cameras and a Velodyne VLP-16 lidar. The DAVIS346 camera captures a relatively low spatial resolution (346 × 260 or 0.1 MP) compared to the Prophesee GEN4.1 event camera (1280 × 720 or 0.92 MP) we consider. DSEC [11] uses two Prophesee GEN3.1 event cameras and a Velodyne VLP-16 lidar. The Prophesee GEN3.1 operates at a higher spatial resolution (640 × 480 or 0.3 MP) compared to the DAVIS346 camera, but at a lower spatial resolution than the current GEN4.1 sensors.

For calibration, MVSEC attempted to use the grayscale image produced by the DAVIS346’s Active Pixel Sensor (APS) with the Camera and Range Calibration Toolbox [13]. The APS design is able to simultaneously act as a traditional monochrome frame-based camera and asynchronous event camera. The calibration results were found to be inaccurate, however, and the authors resorted to relying on CAD measurements and manual adjustment of extrinsic parameters. DSEC uses the Prophesee event cameras which do not have the APS. Instead, the authors rely on stereo methods for rotation-only calibration. Both datasets take fixed translation parameters from the CAD model and only calibrate for the 3-DoF rotation.

Intrinsic and stereo calibration of cameras is a well-established field with widely available tools such as OpenCV [10] or Kalibr [21]. These standard methods have ROS implementations and are available in multiple programming languages, allowing for ease-of-use in robotics and autonomous vehicle development. Extensions of these methods have recently been made for event cameras, where state-of-the-art event-to-video reconstruction is used to interpolate frames of moving checkerboard patterns for the purpose of intrinsic and stereo calibration [20]. DSEC uses this image reconstruction method to perform stereo calibration between the event and frame-based cameras using the Kalibr toolbox. The authors then perform extrinsic calibration to optimize for the rotation between a frame-based camera and a lidar sensor by performing modified point-to-plane ICP [5] between a stereo point cloud generated by SGM [15] and the lidar point cloud.

Calibration methods have been explored quite intensively between traditional cameras and lidars. These include automatic mutual information maximization schemes [22, 30], edge alignment methods [34], and plane alignment methods [13, 34]. These methods vary from being completely unstructured to using manufactured targets, such as standard planar checkerboards [33] or custom multi-modal targets [6]. Kang and Doh [17] propose a probabilistic edge detection method to maximize the detected edge overlap between lidar scans and images. An et al. [1] fused 3D-2D and 3D-3D point correspondences with structured planar targets and unstructured environmental objects for more robust calibration. These methods are designed with traditional camera images in mind and are not directly applicable to an event-based data stream.

Our basic observation is that lidar laser returns can trigger events, generating highly correlated signals in both sensors. Previous work has explored the use of event-based structured light in the visible spectrum using line or raster-pattern scans for terrain and 3D reconstruction [3, 19]. We can naturally formulate the one-to-one correspondence matching under a mutual information framework, such as the one proposed by Pandey et al. [22]. Our work thus extends the use of mutual information frameworks from traditional cameras to event cameras.

3 Sensors Overview

Our experimental sensor setup consists of a RoboSense RS-LiDAR-M1 MEMS lidar [27] and a Prophesee GEN4.1 event camera  [24]. These sensors are mounted on top of a car, facing forward. Their nominal parameters are listed in Table 1 and the sensor installation is shown in Fig. 2.

Sensor        Resolution    HFoV   VFoV   Freq.
RS-LiDAR-M1   75k points    120°   25°    10 Hz
GEN4.1        1280 × 720    63°    38°    50 Mev/s
Table 1: Specifications of the MEMS lidar and event camera.
Figure 2: Annotated image of the as-built sensor platform.

The Prophesee GEN4.1 event-based vision system, which utilizes the Sony IMX636 sensor [8], was recently released for evaluation and comes with a higher spatial resolution and smaller pixel size (4.86 × 4.86 µm) than the previous Prophesee GEN3.1 (15 × 15 µm) [23]. One of the key advantages exhibited by the IMX636 event-driven sensor is its high dynamic range, covering illuminance between 5 and 100,000 lux. The sensor exhibits an operating dynamic range of 86 dB and a reported full dynamic range in excess of 120 dB.

Silicon-based image sensors are naturally sensitive to wavelengths between 400-1000 nm, a range that contains both visible light and NIR [14]. Such devices typically employ IR cut filter arrays to restrict the response of each photo-detector to RGB imaging. Unfortunately, this process reduces the quantum efficiency (QE) and hurts applications in low-light settings. Monochrome or night-mode cameras often forego IR cut filters to improve low-light operation. Additional improvements such as back illumination (BI) have also improved QE, thus enhancing night vision with NIR [7] and low-light operation in event-based neuromorphic vision [29]. The Prophesee GEN4.1 employs monochrome vision with an IR correcting lens to prevent aberrations, but does not incorporate any IR cut filters that would inhibit NIR sensitivity.

The RS-LiDAR-M1 is a 905 nm MEMS lidar that captures a large frontal field-of-view (120° horizontal) while maintaining a very small profile. With an angular resolution of 0.2°, the generated point cloud densely overlaps the event camera view. The Prophesee GEN4.1 event camera can thus directly register the laser signals generated by the RS-LiDAR-M1.

4 Methodology

For intrinsic camera calibration, we use the standard pinhole camera model and the Brown–Conrady distortion model. The pinhole model, K, is described by the focal length (f_x, f_y) and principal point (c_x, c_y) parameters. The distortion model is described by the five distortion parameters (k_1, k_2, p_1, p_2, k_3), where k_1, k_2, k_3 are the even-ordered radial correction terms for barrel distortion and p_1, p_2 are the tangential correction terms for image skew.

In order to represent our extrinsic calibrations as homogeneous transformations, we collapse the transformations down to a 6-parameter vector given by the translation parameters (t_x, t_y, t_z) and the rotational vector (axis-angle) representation (r_x, r_y, r_z), where the norm of the rotational vector is the angle of rotation in radians. Thus the full representation of the extrinsic transformation, θ, is given by θ = [t_x, t_y, t_z, r_x, r_y, r_z]. The translation parameters will be referred to as the translation vector t. The rotation expanded into its matrix form will be referred to as R.
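To make this parameterization concrete, the following is a minimal sketch (ours, not the paper's code) of converting between the 6-parameter vector and the (R, t) pair using SciPy's rotation utilities; the function names are illustrative.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def extrinsics_from_vector(theta):
    """Split a 6-parameter extrinsic vector [tx, ty, tz, rx, ry, rz] into (R, t).

    The last three entries are an axis-angle (rotation-vector) encoding whose
    norm is the rotation angle in radians.
    """
    theta = np.asarray(theta, dtype=float)
    t = theta[:3]                                     # translation vector t
    R = Rotation.from_rotvec(theta[3:]).as_matrix()   # 3x3 rotation matrix R
    return R, t


def vector_from_extrinsics(R, t):
    """Inverse mapping: collapse (R, t) back into the 6-parameter vector."""
    return np.concatenate([np.asarray(t, dtype=float),
                           Rotation.from_matrix(R).as_rotvec()])
```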

4.1 Accumulated Event Map

The events registered by the event camera are provided as tuples e = (t, x, y, p): each event records the time t of the event, the pixel location (x, y) of the event, and the polarity p of the event, i.e., whether the intensity has increased or decreased. While this polarity can be used to identify edges during motion, a laser pulse triggers both a positive and a negative polarity event. With the lidar performing complete scans at 10 Hz, we can simply increment the corresponding location in the pixel canvas at each triggered event, regardless of polarity, to obtain an accumulated event map, E. This process is completed with a fully static scene lacking active elements, decoupling any temporal dependencies. By accumulating events over a short period of time (three seconds), we can also ensure greater robustness to random event noise without the use of optical filters.

We clip the accumulated event map values to prevent scaling issues with abnormally high event activity at a pixel location. Additionally, we apply Gaussian smoothing to the synthetic image for smoother optimization.
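As an illustration, a minimal sketch of how such an accumulated event map could be constructed is given below; the event array layout, clip value, and smoothing width are our assumptions rather than the paper's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def accumulate_event_map(events, width=1280, height=720, clip_value=10, sigma=1.0):
    """Accumulate events into a per-pixel activity map E.

    events: (N, 4) array with columns (t, x, y, p). Polarity is ignored since a
    lidar pulse triggers both a positive and a negative event. clip_value and
    sigma are illustrative choices, not the paper's settings.
    """
    event_map = np.zeros((height, width), dtype=np.float64)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    np.add.at(event_map, (y, x), 1.0)                  # increment each triggered pixel
    event_map = np.clip(event_map, 0, clip_value)      # suppress abnormally hot pixels
    return gaussian_filter(event_map, sigma=sigma)     # smooth for stable optimization
```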

An accumulated event map for a sample static scene is shown in Fig. 3. The images exhibit a clear correlation between high-intensity lidar signals and areas of significant event activity.

Figure 3: (Top) Accumulated event map serving as a scene image reconstruction. (Bottom) Cropped projected lidar scans into synthetic image space.

4.2 Sensor Intrinsic Calibration

The RoboSense lidar is calibrated by the manufacturer and assumed to be accurate. To calibrate the intrinsic parameters for the event camera, we use the process described in [20], where a moving checkerboard is captured as a series of events. These events are then reconstructed as video frames at a fixed frequency. The reconstructed checkerboard patterns can then be extracted using standard intrinsic calibration methods.
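For reference, a hedged sketch of this reconstruction-based intrinsic calibration step using OpenCV's standard checkerboard pipeline is shown below; the board geometry, square size, and the source of the reconstructed frames are placeholders, and the actual toolchain of [20] may differ.

```python
import cv2
import numpy as np


def calibrate_from_reconstructions(frames, board_size=(9, 6), square=0.03):
    """Standard checkerboard intrinsic calibration on event-reconstructed frames.

    frames: list of grayscale images reconstructed from events (e.g. with E2VID).
    board_size / square are placeholder board dimensions (inner corners, metres).
    """
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square

    obj_pts, img_pts = [], []
    for img in frames:
        found, corners = cv2.findChessboardCorners(img, board_size)
        if found:
            corners = cv2.cornerSubPix(
                img, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_pts.append(objp)
            img_pts.append(corners)

    # Returns the RMS reprojection error, the camera matrix K, and the five
    # Brown-Conrady distortion coefficients (k1, k2, p1, p2, k3).
    rms, K, dist, _, _ = cv2.calibrateCamera(
        obj_pts, img_pts, frames[0].shape[::-1], None, None)
    return rms, K, dist
```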

4.3 Event Camera-Lidar Extrinsic Calibration

With the Prophesee GEN4.1 event camera’s low-light sensitivity to NIR, the intensity of lidar returns is highly correlated to the synthetic grayscale values found in the accumulated event map. This correlation between the lidar return intensities and the event map can be exploited as a mutual information maximization problem to accurately calibrate the event camera to the lidar.

4.3.1 Mutual Information Formulation

Mutual information (MI) is a measure of statistical dependence between two random variables, indicating how much information one variable contains regarding the other. MI can be described in multiple ways, but we take the same entropy-based representation used in [22]. MI is defined in terms of the entropies of the random variables X and Y and their joint entropy H(X, Y):

MI(X, Y) = H(X) + H(Y) - H(X, Y)    (1)

The entropy denotes a measure of uncertainty within one variable, while the joint entropy represents the uncertainty present in the event of a co-observation of X and Y. We take the random variables X and Y to be the lidar return intensities and the event activity in the event map, respectively. The entropies of X and Y and their joint entropy are given in Eqs. (2), (3), and (4):

H(X) = -\sum_{x \in X} p_X(x) \log p_X(x)    (2)

H(Y) = -\sum_{y \in Y} p_Y(y) \log p_Y(y)    (3)

H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} p_{XY}(x, y) \log p_{XY}(x, y)    (4)
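As a concrete illustration of Eqs. (1)-(4), the sketch below computes MI from a (smoothed) joint histogram; the variable names are ours, not the paper's.

```python
import numpy as np


def mutual_information(joint_hist):
    """Compute MI(X, Y) = H(X) + H(Y) - H(X, Y) from a 2D joint histogram.

    joint_hist: unnormalized (possibly smoothed) 2D histogram over
    (lidar intensity, event-map activity) bins.
    """
    p_xy = joint_hist / joint_hist.sum()   # joint distribution p_XY
    p_x = p_xy.sum(axis=1)                 # marginal over intensities, p_X
    p_y = p_xy.sum(axis=0)                 # marginal over event activities, p_Y

    def entropy(p):
        p = p[p > 0]                       # ignore empty bins (0 log 0 = 0)
        return -np.sum(p * np.log(p))

    return entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())
```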

4.3.2 Probability Distribution Formulation

We approximate the probability distributions from the intensity and activity histograms of the lidar scan and event map respectively, as was done in [22]. Let P = {P_1, ..., P_N} be the set of homogeneous 3D points in the 3D scan and X = {X_1, ..., X_N} be the set of intensity returns for each point in the set P. The lidar points can be projected into the image space through the extrinsics (R, t) and the intrinsics K:

p_i = \pi(K [R | t] P_i)    (5)

where \pi(\cdot) denotes the perspective division by the depth coordinate. The location of the projected point, p_i, is then used to acquire the associated event activity Y_i from the accumulated event map E, as shown in Eq. (6):

Y_i = E(p_i)    (6)

From the accumulated points, a normalized histogram is generated from the discretized intensity values and the total number of points, n, that lie within the valid image region:

p_X(x) = n_x / n    (7)

where n_x is the number of lidar points with intensity x that lie within the projected image space. A histogram for the event map is generated in the same way with the associated pixel location for each lidar point. A joint 2D histogram is also generated for every intensity-activity pair.

These raw histograms, however, are quite noisy, and the optimization benefits from a smooth function to find suitable solutions. To address this issue, we perform a kernel density estimation (KDE) using Silverman's rule-of-thumb [28]. In practice, we smooth our histograms using Gaussian blurring convolutions, which are nearly equivalent to KDE when the histogram bins are equally spaced and the point values are already discretized, differing only marginally in the finite versus infinite support of the two methods. Gaussian blurring runs orders of magnitude faster than true KDE algorithms. Fig. 4 shows the joint probability distribution of the lidar intensities and the event map grayscale values.

Figure 4: Smoothed joint histogram as the approximate joint probability distribution between the event map and lidar intensities.
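Putting Eqs. (5)-(7) together, a sketch of the projection and smoothed-histogram construction is given below; the bin count, smoothing width, and the assumption that intensities are already discretized to the bin range are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def joint_histogram(points, intensities, event_map, K, R, t, bins=256, sigma=2.0):
    """Project lidar points into the event map and build a smoothed joint histogram.

    points:      (N, 3) lidar points in the lidar frame
    intensities: (N,) lidar return intensities, assumed discretized to [0, bins)
    event_map:   accumulated event map image E
    K, R, t:     camera intrinsics and lidar-to-camera extrinsics
    """
    # Eq. (5): transform into the camera frame and project with the pinhole model.
    cam = (R @ points.T).T + t
    in_front = cam[:, 2] > 0
    cam, X = cam[in_front], intensities[in_front]
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]

    # Keep only projections that land inside the valid image region.
    h, w = event_map.shape
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    X = X[ok]

    # Eq. (6): read the event activity at every projected pixel.
    Y = event_map[v[ok], u[ok]]

    # Eq. (7): joint histogram, then Gaussian blurring as a fast stand-in for KDE.
    hist, _, _ = np.histogram2d(X, Y, bins=bins,
                                range=[[0, bins], [0, event_map.max() + 1]])
    return gaussian_filter(hist, sigma=sigma)
```

The marginal histograms for p_X and p_Y can then be obtained by summing this joint histogram along either axis, consistent with the MI sketch above.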

4.4 Optimization Formulation

With the formulated MI objective function and our approximations of the probability distributions, we formulate an optimization problem as follows:

\hat{\theta} = \arg\max_{\theta} \sum_{s=1}^{N_s} MI(X_s, Y_s; \theta)    (8)

where θ = [t_x, t_y, t_z, r_x, r_y, r_z] and the MI summed over the N_s scenes is maximized at the correct extrinsic parameters given an arbitrary set of scenes. Effective optimization of this function is aided by greater smoothness and convexity in the objective function. Fig. 5 shows the clear optimum in the cost landscape when varying the x-y translation parameters.

Figure 5: Cost surface when varying the x-y translation parameters around the optimal values.

The optimization process can be accomplished by any minimization algorithm. In this work, we evaluate multiple optimizers including the Nelder-Mead simplex-refinement algorithm and the Powell shooting method, as well as gradient-based methods like the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, conjugate-gradient methods, and Sequential Least SQuares Programming (SLSQP) as implemented in SciPy [31].
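As an illustrative usage sketch only, the maximization in Eq. (8) can be driven through SciPy's minimize by negating the averaged MI; joint_histogram and mutual_information refer to the sketches above, scenes is assumed to be a list of (points, intensities, event_map) tuples, and the zero seed is a placeholder for a rough hand-measured initial guess.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation


def negative_mi(theta, scenes, K):
    """Average negative MI over all calibration scenes (to be minimized)."""
    R = Rotation.from_rotvec(theta[3:]).as_matrix()
    t = theta[:3]
    total = 0.0
    for points, intensities, event_map in scenes:
        hist = joint_histogram(points, intensities, event_map, K, R, t)
        total += mutual_information(hist)
    return -total / len(scenes)


# theta0: seed [tx, ty, tz, rx, ry, rz]; zeros are a placeholder for a
# hand-measured initial guess.
theta0 = np.zeros(6)
result = minimize(negative_mi, theta0, args=(scenes, K), method="SLSQP")
theta_hat = result.x  # optimized 6-DoF extrinsic parameters
```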

5 Results

5.1 Calibration Dataset Acquisition

Our calibration dataset consists of indoor static scenes recorded over 3 seconds divided into two major classes: garage scenes and checkerboard scenes. These scenes were collected in an underground site with the garage scenes taken at multiple locations inside the garage. The checkerboard scenes were taken with the vehicle platform in a set position with a checkerboard target placed in close proximity to both the event camera and lidar. The dataset consists of 35 checkerboard scenes, 5 blank-faced board scenes, and 53 garage scenes for a total of 93 scenes.

5.2 Intrinsic Calibration

Prior to extrinsic calibration, accurate intrinsic calibration of the event cameras is required. We performed calibration using the image frame reconstruction method discussed in [20] using checkerboard calibration targets. Fig. 6 shows the checkerboard detection on an image reconstructed with E2VID [25, 26].

Figure 6: Checkerboard detection (blue) and reprojection (red) of detected corners in the reconstructed event images.

We first performed calibration using an in-house fabricated target made from a printed checkerboard pattern. This calibration, however, exhibited a very high reprojection error of 0.904 pixels. Consequently, we re-performed calibration using a commercially-available metrology-grade checkerboard target (https://calib.io). This target has a reported accuracy within 0.1 + 0.3 (at 20°C). The results of these two calibrations are shown in Tables 2 and 3.

We notice a significant reduction in the reprojection error from 0.904 to 0.667 pixels, which is an indicator of improved calibration. Additionally, the estimated focal length differs between the two calibrations by 15-17 pixels or approximately 1.5%.

Target      Reproj. Error (px)   f_x       f_y       c_x      c_y
Prototype   0.904                1027.79   1029.03   615.96   342.07
Calib.io    0.667                1043.98   1044.39   620.35   343.76
Table 2: Camera intrinsic calibration pinhole results.
Target      k_1       k_2      p_1      p_2      k_3
Prototype   -0.4408   0.2570   0.0012   0.0003   -0.0903
Calib.io    -0.4558   0.2994   0.0001   0.0001   -0.1391
Table 3: Camera intrinsic calibration distortion results.

5.3 Optimizer Choice

One of our experimental goals is the search for efficient and robust optimizers. We analyzed the results of noise-induced experiments with gradient-free and gradient-based optimizers. Specifically, we calculate the mean result and the associated standard deviation of 40 calibrations, where 40 scenes are sub-sampled from the total of 93 scenes, to evaluate the convergence and computational speed. In each of these experiments, the seed parameters are perturbed with uniform noise in the range of 0.1 m and 0.1 radians.
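A small sketch of this noise-injection protocol as we interpret it follows; the zero seed and the assumption of symmetric perturbations are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_seed = np.zeros(6)      # placeholder hand-measured seed [t (m), r (rad)]

noisy_seeds = []
for _ in range(40):           # 40 repeated calibrations, as in the experiments
    noise = np.concatenate([
        rng.uniform(-0.1, 0.1, 3),   # up to 0.1 m perturbation per translation axis
        rng.uniform(-0.1, 0.1, 3),   # up to 0.1 rad perturbation per rotation axis
    ])
    noisy_seeds.append(theta_seed + noise)
```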

Robustness Analysis. The gradient-free Nelder-Mead simplex refinement method and the gradient-based, but unbounded, conjugate-gradient and BFGS methods perform poorly compared to the three other optimizers.

Fig. 7 shows the optimization robustness for the evaluated optimizers. We particularly note the poor performance of the Nelder-Mead method when optimizing translation parameters in the presence of uniform noise applied to the seed values. Despite the method's general optimization robustness, it performs poorly when solving for the extrinsic calibration. Some results from the unbounded BFGS and CG methods diverge wildly from the median and lie outside the bounds of the repeatability plots.

Figure 7: Optimized extrinsic parameters for the event camera-lidar calibration where grey squares are the initial seed values. Dashed line shows the hand-measured calibration values.

For the effective methods, the convergence of calibration results in the presence of significant noise indicates a convergence basin robust against errors in the seed calibration. The rotation calibration is highly consistent, converging to results within a standard deviation of 0.0007 rad (0.04°) against uniformly induced stochastic noise of 0.1 rad (5.7°) for each axis. The translation calibration is also consistent, but exhibits a measurable standard deviation of 3 mm against the 100 mm of uniform stochastic noise induced in each translation axis. Precise rotation calibration is arguably more important than translation calibration due to the magnifying effects of rotation error, which lead to greater absolute error in real space at further distances. Table 4 summarizes the full optimizer evaluation results.

Optimizer     t_x (m)             t_y (m)              t_z (m)              r_x (rad)           r_y (rad)            r_z (rad)
Nelder-Mead   0.18294 (0.01789)   0.00680 (0.02597)    -0.04530 (0.02212)   1.20455 (0.00343)   -1.20819 (0.00299)   1.21293 (0.00293)
CG            0.18306 (0.01809)   0.00244 (0.03502)    -0.03208 (0.00620)   1.18456 (0.11725)   -1.20409 (0.02080)   1.22869 (0.09062)
BFGS          0.18331 (0.02308)   0.00225 (0.02622)    -0.03208 (0.00603)   1.18561 (0.11116)   -1.20620 (0.00854)   1.23089 (0.10341)
Powell        0.18671 (0.00334)   -0.00156 (0.00315)   -0.03144 (0.00229)   1.20345 (0.00078)   -1.20760 (0.00075)   1.21416 (0.00068)
L-BFGS-B      0.18612 (0.00313)   -0.00176 (0.00325)   -0.03174 (0.00330)   1.20352 (0.00055)   -1.20748 (0.00068)   1.21419 (0.00047)
SLSQP         0.18671 (0.00267)   -0.00217 (0.00282)   -0.03141 (0.00250)   1.20347 (0.00060)   -1.20751 (0.00059)   1.21426 (0.00037)
Table 4: Extrinsic calibration mean parameters in the noise-induced calibration experiments with the standard deviations reported in brackets.

Computational Time Analysis. Another consideration regarding performance is computational speed of the different optimizers. Fig. 8 shows the comparison between the average compute times across optimizers. We find that of the three most effective methods, SLSQP performs significantly faster than the limited memory and bounded BFGS (L-BFGS-B) and Powell shooting methods. The Powell shooting method takes up to an order of magnitude longer than SLSQP and multiple times longer than L-BFGS-B, reducing its effective use as an optimizer at scale.

Figure 8: Comparison of the average compute time between different optimizers.

We can also analyze the computational time in terms of the number of optimized scenes, N. More scenes lead to increased robustness, but longer computation. We can theorize that the optimization process has a complexity of O(N), as the expected number of optimization steps should not depend on the scene count. This assumes that the underlying cost landscape structure is independent of the number of scenes used, since the scene sampling is a random process. Consequently, the computation is dominated by the mutual information calculation, which must compute the projection and mutual information for each individual scene. This calculation should scale linearly with the number of scenes used. The linear relationship can be observed in experiments where we vary the number of scenes, as shown in Fig. 9.

Figure 9: Graph of computation time as a function of the number of scenes used in SLSQP optimization.

5.4 Scene Selection

Impact of Number of Scenes. One of the key requirements for optimization is the collection of a sufficient number of scenes. We investigate the number of scenes required for consistent calibration in Fig. 10, which shows the variance reduction with an increasing number of scenes when using the SLSQP optimizer. As expected, increasing the number of scenes used in optimization decreases repeatability uncertainty. The variance of calibration results substantially decreases when using 30 or more scenes in the optimization process. We performed 10 repetitions for each sub-selection quantity experiment, where we analyzed results at every 5-scene interval.

Figure 10: Calibration results in red from SLSQP optimization for increasing number of scenes from left to right. The grey squares are the initial calibration seeds.

Garage vs. Checkerboard Scenes. We perform repeated experiments using 20 sub-sampled scenes within the full, checkerboard-only, and garage-only subsets. We evaluate the per-scene MI for scenes excluded from calibration, and show the averaged MI scores across each scene subset in Table 5. We report a higher average MI when calibrating on the checkerboard-only subset than on the garage-only subset.

Calibration Set Full Eval. Checker Eval. Garage Eval.
Full 0.44808 0.45331 0.43825
Checker-only 0.44633 0.45547 0.43368
Garage-only 0.44281 0.43678 0.44064
Measured 0.42750 0.41968 0.42598
Table 5: Averaged MI scores across different scene subsets.

6 Discussion

Impact of Intrinsic Calibration. Intrinsic calibration and extrinsic calibration are closely connected domains for sensor calibration. As the projection of 3D points relies on the intrinsic calibration, extrinsic calibration results are directly affected by the intrinsics. For example, we noted a difference of approximately 1.5% in the focal lengths between intrinsic calibration results.

The focal length acts as a magnification factor and dictates the field-of-view of the projection. With the optimization using the in-house target for calibration, we noticed that the lidar position relative to the event camera was shifted backwards (z-direction) by 7  compared to the revised intrinsic calibration. Physically, this is improbable based on the as-built dimensions and is a compensation effect in the extrinsics for poor intrinsics.

For extrinsic calibration, the intrinsic calibration must be accurately performed. Given the significant improvement in reprojection error obtained with the commercial target, high-quality calibration hardware is a crucial consideration for structured calibration methods.

Considerations in Scene Selection. Our automatic method can be used in an unstructured environment with any combination of static scenes. However, this method tends to perform better when including checkerboard scenes. This phenomenon is likely a result of the proximity of the checkerboard targets to both sensors, providing stronger rotation and translation constraints to the optimization. Poorer optimization in outdoor scenes was observed in [22], where Pandey et al. partially attributed the error to having fewer near-field 3D points in outdoor scenes. Consequently, the use of close-proximity objects with a reasonable amount of texture should have a beneficial effect on the calibration results.

Checkerboard targets provide one such example of a textured object which can be placed close to both sensors. However, the use of structured targets may be undesirable in an otherwise automatic framework. Fortunately, these textured objects do not need to be precisely constructed as the automatic nature of this method does not rely on accurate identification of 3D points. One could theoretically use any reasonably textured object in the scene at close proximity to improve the calibration constraints.

Managing NIR Sensitivity. The proposed method relies on the registration of lidar laser signals by the event camera. However, NIR sensitivity is undesirable during standard operation. To address this concern, the Prophesee GEN4.1 contains bias tuning parameters that can eliminate high frequency flicker effects. We note that in outdoor daytime settings, the event camera does not register lidar returns, likely due to the ambient NIR saturation from the sun.

7 Conclusion

This work presents the first direct extrinsic calibration method between event-based cameras and lidars using event-based structured light. Our method offers a flexible automatic approach to extrinsic calibration that leverages direct correlation between the lidar active signals and the corresponding registered events without precise time synchronization. We have showcased the robustness of this method against errors in the seed calibration and the aspects of scene selection that improve the optimization constraints.

As event-driven vision technology continues to mature with more applications being explored, multi-sensor setups with event cameras may become more prominent in the field of robotics. Accordingly, we hope that our direct calibration method will enable further research on event cameras and sensor fusion.

References

  • [1] P. An, T. Ma, K. Yu, B. Fang, J. Zhang, W. Fu, and J. Ma (2020) Geometric calibration for LiDAR-camera system fusing 3D-2D and 3D-3D point correspondences. Optics Express 28 (2), pp. 2122. External Links: Document, ISSN 10944087 Cited by: §2.
  • [2] D. Barnes, M. Gadd, P. Murcutt, P. Newman, and I. Posner (2020) The Oxford Radar RobotCar Dataset: A Radar Extension to the Oxford RobotCar Dataset. Proceedings - IEEE International Conference on Robotics and Automation, pp. 6433–6438. External Links: ISBN 9781728173955, Document, ISSN 10504729 Cited by: §1.
  • [3] C. Brandli, T. A. Mantel, M. Hutter, M. A. Höpflinger, R. Berner, R. Siegwart, and T. Delbruck (2014) Adaptive pulsed laser line extraction for terrain reconstruction using a dynamic vision sensor. Frontiers in Neuroscience 7 (8 JAN), pp. 1–9. External Links: Document, ISSN 1662453X Cited by: §2.
  • [4] K. Burnett, D. J. Yoon, Y. Wu, A. Z. Li, H. Zhang, S. Lu, J. Qian, W. Tseng, A. Lambert, K. Y. K. Leung, A. P. Schoellig, and T. D. Barfoot (2022) Boreas: A Multi-Season Autonomous Driving Dataset. arXiv. External Links: Link Cited by: §1.
  • [5] Y. Chen and G. Medioni (1991) Object modeling by registration of multiple range images. In Proceedings. 1991 IEEE International Conference on Robotics and Automation, pp. 2724–2729. External Links: Link, ISBN 0-8186-2163-X, Document Cited by: §2.
  • [6] J. Domhof, J. F.P. Kooij, and D. M. Gavrila (2021) A Joint Extrinsic Calibration Tool for Radar, Camera and Lidar. IEEE Transactions on Intelligent Vehicles 6 (3), pp. 571–582. External Links: Document, ISSN 23798858 Cited by: §2.
  • [7] P. Fereyre, M. Guilon, V. Prevost, F. Ramus, and T. Ligozat (2012) Back Illuminated System-on-Chip for Night Vision. 5th International Symposium on Optronics in Defence and Security 10, pp. 2012–074. External Links: Link Cited by: §3.
  • [8] T. Finateu, A. Niwa, D. Matolin, K. Tsuchimoto, A. Mascheroni, E. Reynaud, P. Mostafalu, F. Brady, L. Chotard, F. LeGoff, H. Takahashi, H. Wakabayashi, Y. Oike, and C. Posch (2020-02) 1280×720 Back-Illuminated Stacked Temporal Contrast Event-Based Vision Sensor with 4.86µm Pixels, 1.066GEPS Readout, Programmable Event-Rate Controller and Compressive Data-Formatting Pipeline. In 2020 IEEE International Solid- State Circuits Conference - (ISSCC), pp. 112–114. External Links: Link, ISBN 978-1-7281-3205-1, Document Cited by: §3.
  • [9] G. Gallego, T. Delbruck, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. J. Davison, J. Conradt, K. Daniilidis, and D. Scaramuzza (2022) Event-Based Vision: A Survey. IEEE transactions on pattern analysis and machine intelligence 44 (1), pp. 154–180. External Links: Document, ISSN 19393539 Cited by: §1, §2.
  • [10] G. Bradski (2008) The OpenCV Library. Dr. Dobb’s Journal of Software Tools 25 (2236121), pp. 120–123. External Links: ISSN 1044-789X Cited by: §2.
  • [11] M. Gehrig, W. Aarents, D. Gehrig, and D. Scaramuzza (2021) DSEC: A Stereo Event Camera Dataset for Driving Scenarios. IEEE Robotics and Automation Letters 6 (3), pp. 4947–4954. External Links: Document, ISSN 23773766 Cited by: §1, §2.
  • [12] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun (2013) Vision meets robotics: The KITTI dataset. International Journal of Robotics Research 32 (11), pp. 1231–1237. External Links: Document, ISSN 02783649 Cited by: §1.
  • [13] A. Geiger, F. Moosmann, Ö. Car, and B. Schuster (2012) Automatic camera and range sensor calibration using a single shot. Proceedings - IEEE International Conference on Robotics and Automation, pp. 3936–3943. External Links: ISBN 9781467314039, Document, ISSN 10504729 Cited by: §2, §2.
  • [14] L. C. P. Gouveia and B. Choubey (2016) Advances on CMOS image sensors. Sensor Review 36 (3), pp. 231–239. External Links: Document, ISSN 02602288 Cited by: §1, §3.
  • [15] H. Hirschmuller (2008-02) Stereo Processing by Semiglobal Matching and Mutual Information. IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (2), pp. 328–341. External Links: Link, Document, ISSN 0162-8828 Cited by: §2.
  • [16] S. Jusoh and S. Almajali (2020) A Systematic Review on Fusion Techniques and Approaches Used in Applications. IEEE Access 8, pp. 14424–14439. External Links: Document, ISSN 21693536 Cited by: §1.
  • [17] J. Kang and N. L. Doh (2020) Automatic targetless camera–LIDAR calibration by aligning edge with Gaussian mixture model. Journal of Field Robotics 37 (1), pp. 158–179. External Links: Document, ISSN 15564967 Cited by: §2.
  • [18] W. Maddern, G. Pascoe, C. Linegar, and P. Newman (2017-01) 1 year, 1000 km: The Oxford RobotCar dataset. International Journal of Robotics Research 36 (1), pp. 3–15. External Links: Link, Document, ISSN 17413176 Cited by: §1.
  • [19] M. Muglikar, G. Gallego, and D. Scaramuzza (2021-12) ESL: Event-based Structured Light. In 2021 International Conference on 3D Vision (3DV), pp. 1165–1174. External Links: Link, ISBN 978-1-6654-2688-6, Document Cited by: §2.
  • [20] M. Muglikar, M. Gehrig, D. Gehrig, and D. Scaramuzza (2021) How to calibrate your event camera. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1403–1409. External Links: ISBN 9781665448994, Document, ISSN 21607516 Cited by: §2, §4.2, §5.2.
  • [21] L. Oth, P. Furgale, L. Kneip, and R. Siegwart (2013) Rolling shutter camera calibration. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1360–1367. External Links: Document, ISSN 10636919 Cited by: §2.
  • [22] G. Pandey, J. R. McBride, S. Savarese, and R. M. Eustice (2015-08) Automatic Extrinsic Calibration of Vision and Lidar by Maximizing Mutual Information. Journal of Field Robotics 32 (5), pp. 696–722. External Links: Link, Document, ISSN 15564967 Cited by: §2, §2, §4.3.1, §4.3.2, §6.
  • [23] Prophesee (2019) Prophesee Metavision Packaged Sensor Product Brief. External Links: Link Cited by: §3.
  • [24] Prophesee (2021) Prophesee Evaluation Kit 2 - HD Brochure. External Links: Link Cited by: §1, §3.
  • [25] H. Rebecq, R. Ranftl, V. Koltun, and D. Scaramuzza (2019) Events-to-video: Bringing modern computer vision to event cameras. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2019-June, pp. 3852–3861. External Links: ISBN 9781728132938, Document, ISSN 10636919 Cited by: §5.2.
  • [26] H. Rebecq, R. Ranftl, V. Koltun, and D. Scaramuzza (2021) High Speed and High Dynamic Range Video with an Event Camera. IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (6), pp. 1964–1980. External Links: Document, ISSN 19393539 Cited by: §5.2.
  • [27] RoboSense (2021) RS-LiDAR-M1 Brochure EN. External Links: Link Cited by: §1, §3.
  • [28] B. W. Silverman (1986) Density Estimation for Statistics and Data Analysis. Monographs on statistics and applied probability, Kluwer Academic Publishers. External Links: Link Cited by: §4.3.2.
  • [29] G. Taverni, D. Paul Moeys, C. Li, C. Cavaco, V. Motsnyi, D. San Segundo Bello, and T. Delbruck (2018-05) Front and Back Illuminated Dynamic and Active Pixel Vision Sensors Comparison. IEEE Transactions on Circuits and Systems II: Express Briefs 65 (5), pp. 677–681. External Links: Link, ISBN 9781538648810, Document, ISSN 1549-7747 Cited by: §3.
  • [30] Z. Taylor and J. Nieto (2012) A mutual information approach to automatic calibration of camera and lidar in natural environments. Australasian Conference on Robotics and Automation, ACRA (December). External Links: ISBN 9780980740431, Document, ISSN 14482053 Cited by: §2.
  • [31] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R.J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, A. Vijaykumar, A. P. Bardelli, A. Rothberg, A. Hilboll, A. Kloeckner, A. Scopatz, A. Lee, A. Rokem, C. N. Woods, C. Fulton, C. Masson, C. Häggström, C. Fitzgerald, D. A. Nicholson, D. R. Hagen, D. V. Pasechnik, E. Olivetti, E. Martin, E. Wieser, F. Silva, F. Lenders, F. Wilhelm, G. Young, G. A. Price, G. L. Ingold, G. E. Allen, G. R. Lee, H. Audren, I. Probst, J. P. Dietrich, J. Silterra, J. T. Webber, J. Slavič, J. Nothman, J. Buchner, J. Kulick, J. L. Schönberger, J. V. de Miranda Cardoso, J. Reimer, J. Harrington, J. L. C. Rodríguez, J. Nunez-Iglesias, J. Kuczynski, K. Tritz, M. Thoma, M. Newville, M. Kümmerer, M. Bolingbroke, M. Tartre, M. Pak, N. J. Smith, N. Nowaczyk, N. Shebanov, O. Pavlyk, P. A. Brodtkorb, P. Lee, R. T. McGibbon, R. Feldbauer, S. Lewis, S. Tygier, S. Sievert, S. Vigna, S. Peterson, S. More, T. Pudlik, T. Oshima, T. J. Pingel, T. P. Robitaille, T. Spura, T. R. Jones, T. Cera, T. Leslie, T. Zito, T. Krauss, U. Upadhyay, Y. O. Halchenko, and Y. Vázquez-Baeza (2020) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17 (3), pp. 261–272. External Links: Document, ISSN 15487105 Cited by: §4.4.
  • [32] D. J. Yeong, G. Velasco-hernandez, J. Barry, and J. Walsh (2021) Sensor and sensor fusion technology in autonomous vehicles: A review. Sensors 21 (6), pp. 1–37. External Links: Document, ISSN 14248220 Cited by: §1.
  • [33] Z. Zhang (2000) A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (11), pp. 1330–1334. External Links: Link, Document, ISSN 01628828 Cited by: §2.
  • [34] L. Zhou, Z. Li, and M. Kaess (2018) Automatic Extrinsic Calibration of a Camera and a 3D LiDAR Using Line and Plane Correspondences. IEEE International Conference on Intelligent Robots and Systems, pp. 5562–5569. External Links: ISBN 9781538680940, Document, ISSN 21530866 Cited by: §2.
  • [35] A. Z. Zhu, D. Thakur, T. Özaslan, B. Pfrommer, V. Kumar, and K. Daniilidis (2018) The Multivehicle Stereo Event Camera Dataset: An Event Camera Dataset for 3D Perception. IEEE Robotics and Automation Letters 3 (3), pp. 2032–2039. External Links: Document, ISSN 23773766 Cited by: §1, §2.