An important recent development in visual information acquisition is the emergence of low-cost, fast cameras for measuring depth. With the advent of Microsoft Kinect [kinect_website], PMD CamCube [pmd_website] and WAVI Xtion [wavi_xtion_website], depth cameras are being used extensively in applications such as gaming and virtual reality. While the PMD camera measures the time of flight (TOF) of infrared light, Kinect uses structured light to estimate depth at each pixel. With the development of these depth cameras, structural information about the scene can be captured at high speed, and their portability allows them to be incorporated in many applications. Obtaining such information is crucial in many 3D applications; examples include image-based rendering [Kolb2009], 3D reconstruction [Izadi2011], and motion capture [shotton2013real].
In order to perform such tasks, depth cameras need to be properly calibrated. Camera calibration refers to performing a set of controlled experiments to determine initial parameters of the camera that affect the imaging process of the scene. Thus, camera calibration is an extremely important step in 2D and 3D computer vision. Unfortunately, the imaging capabilities of some TOF cameras are very limited when compared to conventional color sensors. They can only provide a low-resolution intensity image and depth map containing significant depth noise. This causes the traditional calibration scheme to be inaccurate. Hence, both the camera calibration and the depth denoising need to be significantly improved to obtain satisfactory calibration results.
In this paper, we propose a novel algorithm that takes in a few calibration images and utilizes them to simultaneously denoise and calibrate TOF depth cameras. Our formulation is based on two key elements. First, we use depth planarization in 3D to denoise the depth at each corner pixel. Then, in the second stage, we use these improved depth measurements along with the corner pixel information to estimate the calibration parameters using a non-linear estimation algorithm. We demonstrate that our framework estimates the intrinsic and extrinsic calibration parameters more accurately using fewer images and corners than are needed for traditional camera calibration. We evaluate our approach on both a synthetic dataset, where groundtruth information is available, and real data taken from a PMD camera. In both cases, we demonstrate that our proposed framework outperforms the traditional calibration technique without a significant increase in computational complexity. Moreover, our framework requires fewer images and corners, which makes it easier to use for the general public.
2 Related Work
Color camera calibration: A lot of work has been done in the computer vision and photogrammetry communities [zhang2000flexible, Maybank1992] on color camera calibration. The traditional approaches use a set of checkerboard images taken at various positions and exploit planar geometry to estimate the calibration parameters.
PMD camera calibration: Since PMD cameras are relatively new, most of the current approaches borrow heavily from traditional camera calibration techniques. Kahlmann et al. [kahlmann2006calibration] explore the depth-related errors at various exposure times. They use a look-up table to correct for the depth noise. This approach is time consuming and entails creating a look-up table each time. Lindner et al. [lindner2006lateral] use a controlled set of measurements to perform depth camera calibration. The checkerboard is put on a very precise optical measurement rack which is moved away from the camera iteratively, and this prior knowledge is used to correct the depth at corner points. Fuchs and Hirzinger [fuchs2008extrinsic] use a color and a depth camera rigidly set up on a robotic arm and move the arm through a pre-determined set of poses to estimate the calibration parameters using a checkerboard. They do not estimate the lens distortion parameters, assuming that the camera contains insignificant radial and tangential distortion. Beder and Koch [beder2008calibration] estimate the focal length and extrinsic parameters of the PMD camera using the intensity map and depth measurements from a single checkerboard image. They assume the camera to be distortion free with the optical center lying at the image center.
Kinect camera calibration: Kim et al. [kim2011depth] present a method to calibrate and enhance depth measurements for Kinect. They project the depth onto the color sensor’s camera plane and use a Weighted Joint Bilateral Filter that considers the color and depth information at the same time to reduce the depth noise. Herrera et al. [herrera2011accurate] use a depth and color camera pair to perform camera calibration with a planar checkerboard, utilizing the camera’s depth to improve the calibration. However, they assume the depth camera to be distortion free and only estimate two disparity-mapping parameters for the Kinect camera. Hence, their method is unable to estimate the actual intrinsic parameters of the depth camera. In a recent work, Herrera et al. [herrera2012joint] propose an algorithm that performs calibration with a Kinect depth sensor and two color cameras using checkerboard images. While their algorithm accounts for depth noise, they assume the depth sensor to be distortion free. Our approach closely resembles theirs. However, we use a PMD camera that contains significant photon noise and has a much lower resolution than Kinect.
Most of these techniques either require multiple cameras or a controlled set-up to exploit some prior knowledge to estimate the calibration parameters. Moreover, most of these approaches ignore lens distortion which is significant in PMD cameras. We aim to provide a simple approach that estimates lens distortion and performs calibration while simultaneously denoising the depth map by exploiting scene planarity using as few images and corners as possible.
3 Standard Camera Calibration
In this section, we describe the basics of traditional color camera calibration and a commonly used algorithmic approach to estimate the camera calibration parameters.
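Color Camera Calibration Parameters: The intrinsic calibration matrix of a camera, $K$, contains five parameters: the focal lengths in the $x$ and $y$ directions, $(f_x, f_y)$; the skew, $\gamma$; and the location of the optical center, $(c_x, c_y)$, as defined in [zhang2000flexible]. The skew is commonly set to zero for non fish-eye lenses. Usually a lens is more “spherical” than perfectly parabolic. This leads to radial distortion. Another common distortion, seen in some cameras, occurs when the sensor and lens do not align properly. This results in tangential distortion, which usually arises from manufacturing defects that leave the imaging plane of the camera not perfectly parallel to the lens. The radial and tangential distortion coefficients are normally bundled together as $k_c = (k_1, \dots, k_5)$. We represent a 3D point in the camera coordinate frame as $X_c = [X, Y, Z]^T$. The 3D points are projected onto the camera plane at the normalized pixel position, $x_n$, as:
$$x_n = \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} X/Z \\ Y/Z \end{bmatrix} \tag{1}$$
The distorted pixel value of this point, $x_d$, is obtained after adding the forward distortion model as:
$$x_d = \left(1 + k_1 r^2 + k_2 r^4 + k_5 r^6\right) x_n + \begin{bmatrix} 2 k_3 x y + k_4 (r^2 + 2x^2) \\ k_3 (r^2 + 2y^2) + 2 k_4 x y \end{bmatrix} \tag{2}$$
Here, $r = \sqrt{x^2 + y^2}$ refers to the magnitude of the normalized pixel position. Let us call this function $f$, i.e., $x_d = f(x_n, k_c)$. Eventually, the final pixel position, $x_p$, recorded by the camera is obtained by using the intrinsic calibration matrix as:
$$\begin{bmatrix} x_p \\ 1 \end{bmatrix} = K \begin{bmatrix} x_d \\ 1 \end{bmatrix}, \qquad K = \begin{bmatrix} f_x & \gamma & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \tag{3}$$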
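The projection chain described above (normalize, distort, apply the intrinsic matrix) can be sketched in a few lines of Python. This is a minimal illustration rather than the toolbox implementation: the function name `project_point` and the coefficient ordering `kc = (k1, k2, k3, k4, k5)`, with `k1`, `k2`, `k5` radial and `k3`, `k4` tangential, are assumptions modeled on the cited toolbox [bouguet2004camera].

```python
def project_point(X, Y, Z, fx, fy, cx, cy,
                  kc=(0.0, 0.0, 0.0, 0.0, 0.0), skew=0.0):
    """Project a 3D camera-frame point to a pixel position.

    kc is assumed to be ordered (k1, k2, k3, k4, k5): k1, k2, k5 are the
    radial coefficients and k3, k4 the tangential ones.
    """
    # Normalized pixel position (pinhole projection).
    xn, yn = X / Z, Y / Z
    r2 = xn * xn + yn * yn

    # Forward distortion model: radial scaling plus tangential offset.
    k1, k2, k3, k4, k5 = kc
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k5 * r2 ** 3
    dx = 2.0 * k3 * xn * yn + k4 * (r2 + 2.0 * xn * xn)
    dy = k3 * (r2 + 2.0 * yn * yn) + 2.0 * k4 * xn * yn
    xd = radial * xn + dx
    yd = radial * yn + dy

    # Apply the intrinsic matrix (skew enters as the off-diagonal term).
    u = fx * xd + skew * yd + cx
    v = fy * yd + cy
    return u, v
```

With all distortion coefficients at zero, the function reduces to the plain pinhole model, so a point on the optical axis lands exactly on the optical center.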
Color Camera Calibration Scheme: There are various ways to perform color camera calibration with lens distortion taken into account. A widely used calibration toolbox [bouguet2004camera] uses a planar checkerboard pattern with $N$ corners to perform the calibration. The user holds a checkerboard in front of the camera and takes $M$ images with the checkerboard held in various positions. The 3D points that lie on the checkerboard are expressed in terms of a world coordinate frame, $X_w$. For every image $i$, the world and camera coordinate frames are related via a rotation matrix, $R_i$, and a translation vector, $t_i$:
$$X_c = R_i X_w + t_i \tag{4}$$
The rotation matrix and the translation vector contain three parameters each. They are bundled together for each image and calibrated together with the intrinsic parameters. We denote all the calibration parameters (the intrinsic matrix $K$, the distortion coefficients $k_c$, and the per-image extrinsics $R_i$, $t_i$) as $\Theta$.
Global Optimization: The following objective function is used in traditional calibration to obtain the calibration parameters, $\Theta$, by minimizing the 2D distance between the measured corners and the projected corners over all $M$ images and $N$ corners:
$$\Theta^* = \arg\min_{\Theta} \sum_{i=1}^{M} \sum_{j=1}^{N} \left\| \hat{x}_{ij}(\Theta) - x_{ij} \right\|^2 \tag{5}$$
Here, $\hat{x}_{ij}$ refers to the $j$-th corner of image $i$ projected on the camera plane and $x_{ij}$ refers to the corresponding corner measured using a corner detection algorithm. This is usually solved using a non-linear estimator such as gradient descent or the Levenberg-Marquardt algorithm (LMA) with a user-defined Jacobian matrix.
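The reprojection objective can be written as a plain cost function that a non-linear solver would then minimize. The sketch below only evaluates the cost; it does not implement LMA. The names `reprojection_cost` and `pinhole`, and the toy parameterization (a single focal length, identity rotation, per-image translation, no distortion), are assumptions made purely for illustration.

```python
def reprojection_cost(project, params, world_corners, measured_corners):
    """Sum of squared 2D distances between projected and measured corners.

    project(params, i, Xw) -> (u, v) is a caller-supplied projection model.
    world_corners[i][j] and measured_corners[i][j] refer to corner j of image i.
    """
    total = 0.0
    for i, (corners_w, corners_m) in enumerate(zip(world_corners, measured_corners)):
        for Xw, (um, vm) in zip(corners_w, corners_m):
            u, v = project(params, i, Xw)
            total += (u - um) ** 2 + (v - vm) ** 2
    return total


def pinhole(params, i, Xw):
    """Toy projection: distortion-free pinhole with identity rotation."""
    f, cx, cy, translations = params
    tx, ty, tz = translations[i]
    X, Y, Z = Xw[0] + tx, Xw[1] + ty, Xw[2] + tz
    return f * X / Z + cx, f * Y / Z + cy
```

Passing the projection model in as a callable keeps the cost independent of how the parameters are packed, which is convenient when the same cost is reused for the per-image (local) and bundled (global) optimizations.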
Initialization: Most non-linear solvers such as LMA require a good initialization. The distortion, $k_c$, is initialized as zero. A planar homography, per image, between the interior corners of the checkerboard in the world coordinate frame and the imaging plane is estimated. These matrices are combined to initialize the intrinsic calibration matrix, $K$, using the Direct Linear Transformation (DLT) algorithm. Then, $K$ is used to reinitialize the rotation and translation for each image individually by decomposing the homography matrices [Hartley2004, Bradski2008]. The extrinsic parameters are usually re-estimated for each image individually using LMA for better accuracy. This is known as local optimization. After performing local optimization, the parameters are bundled together and global optimization is performed on the entire dataset as seen in Eq. (5).
4 Depth Camera Calibration
PMD depth cameras provide not only an estimated intensity image but also another measurable quantity: the depth at each pixel. This is the 3D scalar distance between the camera center and the point in 3D corresponding to that pixel. Using Eq. (4), we can represent the depth, $d$, of a checkerboard point $X_w$ in image $i$ as:
$$d = \left\| X_c \right\|_2 = \left\| R_i X_w + t_i \right\|_2 \tag{6}$$
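We use this additional set of measurements per corner pixel to perform the global optimization process by minimizing the following function over all calibration parameters, $\Theta$, using LMA with a user-defined Jacobian matrix:
$$\Theta^* = \arg\min_{\Theta} \sum_{i} \sum_{j} \left( \frac{\left\| \hat{x}_{ij} - x_{ij} \right\|^2}{\sigma_{x,i}^2} + \frac{\left( \hat{d}_{ij} - d_{ij} \right)^2}{\sigma_{d,i}^2} \right) \tag{7}$$
Here, $\hat{x}_{ij}$ and $x_{ij}$ are the projected and measured positions of the $j$-th corner of image $i$, $\hat{d}_{ij}$ refers to the estimated depth of that corner, and $d_{ij}$ refers to the depth measured by the depth camera. We normalize the error terms in Eq. (7) with their respective variances, ($\sigma_{x,i}^2$, $\sigma_{d,i}^2$), for every image, as they have different measurement units.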
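A small sketch makes the role of the variance normalization concrete: pixel residuals and depth residuals are each divided by their own variance before being summed, so neither unit dominates. The function names `radial_depth` and `joint_cost` and the per-image calling convention are illustrative assumptions, not part of the original method description.

```python
import math


def radial_depth(X, Y, Z):
    """Depth as measured by a TOF camera: distance from the camera center."""
    return math.sqrt(X * X + Y * Y + Z * Z)


def joint_cost(pixel_residuals, depth_residuals, sigma_pixel, sigma_depth):
    """Variance-normalized pixel-plus-depth cost for a single image.

    pixel_residuals: list of (du, dv) differences in pixels.
    depth_residuals: list of depth differences in metric units.
    sigma_pixel, sigma_depth: standard deviations of the two error types.
    """
    pixel_term = sum(du * du + dv * dv for du, dv in pixel_residuals) / sigma_pixel ** 2
    depth_term = sum(dd * dd for dd in depth_residuals) / sigma_depth ** 2
    return pixel_term + depth_term
```

For example, pixel residuals of one or two pixels with a standard deviation of 0.5 pixels and a depth residual of 0.02 with a standard deviation of 0.01 contribute terms of comparable magnitude, even though the raw numbers differ by two orders of magnitude.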
Depth noise: Like every sensing device, the PMD camera exhibits various error sources that affect the accuracy of the depth information it captures. There are three major sources of error in PMD cameras. First, the wiggling error is caused by hardware and manufacturing limitations. The outgoing signal is assumed to be perfectly sinusoidal. However, in reality, this signal is more “box-shaped” than sinusoidal [lindner2010time]. Second, the flying-pixel error occurs at depth discontinuities. The depth at each pixel is computed by using four readings at that pixel. The information captured at each smart-pixel in the PMD camera can come from either the background or the foreground object, which leads to an unreliable depth measurement at these pixels. Third, the Poisson shot noise error occurs due to the reflectivity of the scene [lindner2010time]. This inherent noise in the capturing process leads to an unsteady 3D point cloud. The noise can be partly reduced by spatial averaging using bilateral filters, but we cannot use this process for applications requiring an accurate depth map, as smoothing a depth map is highly undesirable. Thus, before we use the depth measurements, we pre-process the depth image to ensure that the depth at corner pixels is as accurate as possible.
4.1 Optimization Algorithm
In this section, we describe, step by step, how our calibration scheme works. Algorithm 1 delineates our depth based calibration process.
Color Image Calibration (line 2): We perform traditional calibration as described in Section 3. This provides us an initial estimate for the calibration parameters.
Planarizing the depth image (line 3): Since we only look at the interior corner points of a planar checkerboard, there is insignificant flying-pixel noise. Instead of denoising the depth measurement through spatial filtering, we employ prior knowledge about the scene, which is a checkerboard in our case. We account for the wiggling error and reflectivity-based noise by performing image segmentation and 3D plane estimation. We use the corner pixel information to segment out the white squares, where depth is more accurate than in the black squares. This is because the Poisson shot noise is higher in darker regions (black squares) than in lighter regions (white squares), as seen in Fig. 1. We use the depth of the segmented white squares along with the initial calibration parameter estimates to project the points in 3D. Thereafter, we use RANSAC along with a gradient threshold to find the best plane using SVD. We estimate the depth at the sub-pixel corners by finding the intersection of this estimated plane with the line passing through each sub-pixel corner when projected in 3D using the traditional calibration results. This provides us a more accurate depth at the sub-pixel corners, as seen in Fig. 1. The wiggling error is non-systematic and can lead to both under- and over-estimation of depth [lindner2010time]. We claim that the 3D planarization eliminates the wiggling error in these regions once we have enough white checkerboard regions. We denote this denoised depth as $\tilde{d}$.
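The planarization step can be illustrated with a simplified sketch: fit a plane to 3D points with RANSAC, then intersect the viewing ray of a corner with that plane to read off a denoised depth. This sketch deliberately simplifies the procedure described above: it keeps the best three-point plane hypothesis and omits both the gradient threshold and the SVD-based refinement. All function names and the inlier threshold are hypothetical.

```python
import math
import random


def plane_from_points(p, q, r):
    """Return (unit normal n, offset d) with n . x = d, or None if collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    return n, n[0] * p[0] + n[1] * p[1] + n[2] * p[2]


def ransac_plane(points, threshold, iterations=200, seed=0):
    """Keep the three-point plane hypothesis with the most inliers."""
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = sum(
            1 for x in points
            if abs(n[0] * x[0] + n[1] * x[1] + n[2] * x[2] - d) < threshold
        )
        if inliers > best_inliers:
            best, best_inliers = plane, inliers
    return best


def planar_depth(xn, yn, plane):
    """Distance from the camera center to where the ray through the
    normalized pixel (xn, yn) meets the plane; None if the ray is parallel."""
    n, d = plane
    denom = n[0] * xn + n[1] * yn + n[2]
    if abs(denom) < 1e-12:
        return None
    s = d / denom  # the ray is s * (xn, yn, 1)
    return abs(s) * math.sqrt(xn * xn + yn * yn + 1.0)
```

In this form the denoised depth at a corner depends only on the fitted plane and the corner's normalized pixel position, so per-pixel depth noise at the corner itself no longer enters the estimate.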
Updating $K$ (lines 6-8): The calibration parameters provided by traditional calibration when using a small set of images and corners are very unreliable. Since the calibration procedure involves non-linear estimation, a good initialization of the calibration parameters is extremely important. Hence, it is critical to re-initialize these parameters before using them for global optimization. Due to the coupling of the intrinsic matrix $K$ with the extrinsic parameters $R_i$ and $t_i$, as seen in Eqs. (1-4), traditional calibration often fails to provide a good estimate for the intrinsic calibration matrix, as we lose a degree of freedom by projecting 3D coordinates onto the 2D camera plane. First, we use the estimated distortion parameters to obtain the normalized pixel position for each corner, $x_n$. We use the denoised depth, $\tilde{d}$, to obtain the 3D coordinates for each corner by projecting the 2D corner locations in 3D:
$$X_c = \tilde{d}\, \frac{[x_n^T,\ 1]^T}{\left\| [x_n^T,\ 1]^T \right\|_2} \tag{8}$$
We use a non-linear optimizer to re-estimate $K$ by enforcing that the checkerboard squares projected in 3D using the denoised depth data are the same size as the actual checkerboard squares for each image:
$$K^* = \arg\min_{K} \sum_{i} \sum_{j} \sum_{k \in \mathcal{N}(j)} \left( \left\| X_c^{ij} - X_c^{ik} \right\|_2 - s \right)^2 \tag{9}$$
where $X_c^{ij}$ is the 3D position of corner $j$ in image $i$, $s$ refers to the checkerboard square size, and $\mathcal{N}(j)$ represents the neighbors of corner $j$. We repeat this process until at least of the images have an average distance between points within of the checkerboard size. This provides us a reliable initial estimate for $K$, which is crucial for the optimization process.
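To illustrate how the square-size constraint pins down the intrinsics, the sketch below re-estimates a single focal length from corner pixels and denoised depths. It is a strongly reduced version of the step described above, under stated assumptions: one shared focal length, a known optical center, no lens distortion, and a one-dimensional golden-section search standing in for the non-linear optimizer. The names `backproject`, `square_size_cost`, and `refine_focal` are hypothetical.

```python
import math


def backproject(u, v, depth, f, cx, cy):
    """Lift a pixel with a radial depth to a 3D camera-frame point."""
    xn, yn = (u - cx) / f, (v - cy) / f
    scale = depth / math.sqrt(xn * xn + yn * yn + 1.0)
    return (scale * xn, scale * yn, scale)


def square_size_cost(f, corners, depths, neighbors, cx, cy, square):
    """Squared deviation of neighbor distances from the true square size."""
    pts = [backproject(u, v, d, f, cx, cy) for (u, v), d in zip(corners, depths)]
    return sum((math.dist(pts[a], pts[b]) - square) ** 2 for a, b in neighbors)


def refine_focal(corners, depths, neighbors, cx, cy, square, lo, hi, tol=1e-6):
    """Golden-section search for the focal length in [lo, hi].

    Assumes the cost has a single minimum inside the bracket.
    """
    def cost(f):
        return square_size_cost(f, corners, depths, neighbors, cx, cy, square)

    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if cost(c) < cost(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 0.5 * (a + b)
```

Because the depths fix the distance of each corner from the camera, a focal length that is too small spreads the back-projected corners too far apart and one that is too large packs them too closely, which is what makes the square size informative about the focal length.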
Re-initialization (lines 11-12): We use the updated intrinsic matrix, $K$, to re-initialize our extrinsic parameters in the same fashion as in the traditional calibration process. We also update the distortion parameters by treating the remaining parameters as ground truth and minimizing the objective function in Eq. (7).
Global Optimization (line 13): Finally, we bundle everything together and perform a global optimization of Eq. (7), using LMA as our non-linear solver with our new Jacobian matrix.
5 Experimental results
In this section, we perform synthetic and real experiments with a PMD camera and compare our calibration scheme with the traditional calibration scheme.
Synthetic data results: We synthesized a checkerboard with checker size containing interior corners. We used up to images and corners for calibration. We added white Gaussian noise to corner pixels and depth data with a standard deviation of pixels and respectively to generate noisy data. This amount of noise resembles the noise present in real data in corner estimation and depth measurements captured by PMD cameras. We used varying subsets of images and corners to estimate the calibration parameters, to highlight the fact that our approach outperforms the traditional approach when little information is available for calibration. We tested the calibration results on the entire checkerboard region ( corners). Both the traditional and our calibration approaches achieved perfect results for the noiseless dataset when more than corners and images are available. Table 1 shows the mean 3D error, as defined in Eq. (10), between the groundtruth corners and the corners computed using the estimated calibration parameters from the two methods and the groundtruth depth. Our approach outperforms traditional calibration in every test. Fig. 2 shows the relative error in focal length (=) for noisy synthetic data. Our approach consistently provided significantly better results than the traditional calibration approach. We observed similar improvements in the optical center and extrinsic calibration parameters.
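As an illustration of the evaluation metric, a mean 3D error between reconstructed and groundtruth corners can be computed as below. The exact form of the paper's error equation is not reproduced here; this sketch assumes it is the average Euclidean distance over corresponding corners, and the function name `mean_3d_error` is hypothetical.

```python
import math


def mean_3d_error(estimated_points, groundtruth_points):
    """Average Euclidean distance between corresponding 3D corners."""
    if len(estimated_points) != len(groundtruth_points):
        raise ValueError("point lists must have the same length")
    total = sum(math.dist(p, q) for p, q in zip(estimated_points, groundtruth_points))
    return total / len(groundtruth_points)
```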
Real data (PMD) results: We used a checkerboard with checker size to capture images using a PMD camera. Each checkerboard contains corners. We used up to images and corners to estimate the intrinsic and extrinsic calibration parameters. We compare the focal length, , obtained from both approaches to the focal length specified by the manufacturer of the PMD camera, pixels. We treat this value as ground truth. As seen in Fig. 3, our approach consistently provides a reasonably accurate focal length, while traditional calibration estimates a highly inaccurate focal length in most cases. One significant deviation from this behaviour happens when only four corners are available for calibration. In that case the estimation process diverges because the initial estimates are far from the ground truth, a regime in which the non-linear estimation process (LMA) is known to fail frequently. However, once we use nine or more corners per image, our approach consistently provides significantly better results.
We presented a simple and accurate method to simultaneously denoise depth data and calibrate depth cameras. The presented method excels at estimating calibration parameters when only a handful of corners and calibration images are available, a setting in which the traditional approach struggles. While this approach is simple and easily applicable, it still relies on a checkerboard pattern to perform calibration. In the future, we intend to exploit planarity in generic scenes to perform calibration so that any user at home can use our calibration procedure.