A Light Field Camera Calibration Method Using Sub-Aperture Related Bipartition Projection Model and 4D Corner Detection

01/11/2020 · by Dongyang Jin, et al. · Xidian University

Accurate calibration of the intrinsic parameters of a light field (LF) camera is a key issue in many applications, especially 3D reconstruction. In this paper, we propose the Sub-Aperture Related Bipartition (SARB) projection model to characterize the LF camera. This projection model is composed of two sets of parameters, one targeting the center-view sub-aperture and the other the relations between sub-apertures. Moreover, we propose a corner point detection algorithm which fully utilizes the 4D LF information in the raw image. Experimental results demonstrate the accuracy and robustness of the corner detection method. Both the 2D re-projection errors in the lateral direction and the errors in the depth direction are minimized because the two sets of parameters in the SARB projection model are solved separately.


1 Introduction

Recent years have witnessed tremendous development in light field (LF) technologies. LF cameras, devices that collect light rays on the image sensor in both the spatial and angular dimensions in a single photographic exposure, are designed to bring the LF technique into various engineering applications. Due to their capability of recording the angular information of the light rays entering from the external world, LF cameras have boosted the development of computational photography and computer vision in a wide variety of applications, including refocusing [1], depth estimation [2], [3], [4], synthetic aperture imaging [5], [6], and visual Simultaneous Localization and Mapping (SLAM) [7], [8].

In the early years, many types of LF cameras were proposed. Wilburn et al. [9] designed a large camera array with high spatial and angular resolutions to acquire the 4D LF. Veeraraghavan et al. [10] proposed a mask-equipped LF camera to capture conventional 2D photos at full sensor resolution. Liang et al. [11] demonstrated a type of LF camera that captures the LF at full sensor resolution through multiple exposures with an adjustable aperture. Marwah et al. [12] proposed a compressive LF camera architecture, which allows for higher-resolution LF recovery. Taguchi et al. [13] described a spherical catadioptric imaging system using axial-cone cameras to acquire LF images with a broader field of view.

In contrast to the above designs, micro-lens array (MLA) based LF cameras soon became popular due to their low manufacturing cost and the high quality of the LF images they produce. State-of-the-art MLA-based LF cameras include the conventional LF camera designed by Ng et al. [14] and the focused LF camera designed by Georgiev and Lumsdaine [15]. Ng et al. significantly shortened the optical path and designed a portable camera by inserting an MLA between the sensor and the main lens, creating the conventional LF camera, which enables shorter exposures and lower image noise. Georgiev and Lumsdaine presented a modified LF camera in which the MLA is interpreted as an imaging system focused on the focal plane of the main lens; this camera captures the LF with higher spatial resolution but lower angular resolution. Well-known commercial MLA-based LF cameras developed on the basis of [14] and [15] are Lytro [16] and Raytrix [17].

To achieve optimal performance, it is of fundamental importance to accurately calibrate LF cameras before applying them in any of these applications. In a typical LF camera calibration pipeline, the camera projection model and corner detection are two key components. A projection model describes the relationship between pixels on the internal camera sensor and the rays coming into the camera or the 3D scene geometry. A corner detection algorithm is applied to accurately detect feature points on the checkerboard in order to build the correspondence between 3D points on the checkerboard grid and their corresponding 2D image points. Great progress has been made in these two areas in recent years. Well-known projection models include those describing the relationship between pixels and rays [18][19], the one describing the relationship between pixels and the 3D scene geometry [20], and the one describing 3D features of the raw data and the 3D scene geometry [21]. For corner detection, the research in [18][20] extracted checkerboard corners in each individual sub-aperture image, whereas the algorithms proposed in [22][23] and [24] directly utilize the raw image to detect corners. Improving the accuracy of projection models and corner detection methods remains an open problem.

In this paper, a Sub-Aperture Related Bipartition (SARB) projection model is proposed to describe the LF camera with two sets of parameters, one targeting the center-view sub-aperture and the other the relations between sub-apertures. Correspondingly, a two-step calibration method is proposed, with each step dealing with one set of parameters. Moreover, a corner detection algorithm that fully utilizes the 4D LF data in the raw image is also proposed.

Our main contributions are as follows:

We propose a projection model in a simple form that characterizes the LF camera without redundancy. This projection model consists of two sets of parameters, one targeting the center-view sub-aperture and the other the relations between sub-apertures. By doing so, calibration methods designed for traditional cameras can be reused to estimate the intrinsic matrix of the pinhole camera at the center-view sub-aperture. Another advantage of separating the parameters is that both lateral-direction errors and depth-direction errors can be reduced, since the depth-direction error is only affected by the parameters representing the relations between sub-apertures.

We propose a corner point detection method for LF camera calibration which jointly uses the 4D LF information. Both the detection accuracy and the robustness of the method are significantly improved.

The rest of this paper is organized as follows. Related works are investigated in Section 2. The proposed SARB projection model is detailed in Section 3. The procedures of the proposed calibration method are described in Section 4. The details of the proposed corner detection algorithm are provided in Section 5. Experimental results are given in Section 6. Finally, Section 7 concludes the paper.

2 Related Works

Different algorithms have been proposed to solve the LF camera calibration problem, covering both the projection model and corner detection [18][22][23][20][21][25][19][24][26]. For the camera projection model, Dansereau et al. [18] were the first to deliver an end-to-end geometric calibration method for the conventional LF camera. A 12-parameter homogeneous matrix was proposed to express the projection model, which connects every pixel on the sensor to the corresponding ray coming into the camera. Nevertheless, this method involves too many parameters, and its redundancy was analyzed in [20]. Duan et al. [19] modeled the imaging process of conventional LF cameras and proposed a homogeneous intrinsic matrix, which describes the relationship between pixels and rays in a more compact form than [18]. Zhang et al. [20] proposed a multi-projection-center (MPC) model with six intrinsic parameters to characterize both conventional and focused LF cameras; this projection model efficiently connects the 3D geometry to the recorded light field in a simple form. O'Brien et al. [21] proposed a novel projection model which relates the plenoptic disc features extracted from the raw image to 3D scene points, i.e., a 3D-to-3D correspondence is made between the image and the 3D scene geometry. It should, however, be noted that in the above projection models, the parameters related to the information inside a sub-aperture and the ones corresponding to the information between sub-apertures are deduced together. This non-separate process can lead to large errors in the depth direction in the final result, as discussed in Section 6.3. Different from the above research, where the parameters in the projection models are determined together, Zhou et al. [25] proposed a camera model with a clear separation of the involved parameters, namely the ones describing the main lens and the ones describing the MLA, and a two-step calibration method was proposed accordingly. This projection model is, however, of higher dimensionality and requires additional camera parameters (e.g., the pixel size) to be included.

For corner detection, the algorithms described in [18], [20] and [25] are based on sub-aperture images, which are generated by sampling pixels on regular grids. Under this circumstance, corner detection is performed in each individual sub-aperture image. The corner locations across sub-aperture images are not guaranteed to lie along a consistent disparity line; in other words, the relations between different sub-apertures are ignored in these algorithms. Other calibration algorithms, including [23] and [26], are based on raw images. Noury et al. [23] proposed a two-step detection algorithm: the micro-lens images in the raw image are first classified by content type, and a pattern registration method is then applied to detect sub-pixel corner locations in each micro-lens image. Nousias et al. [26] detected corner locations in each micro-lens image by calculating the intersection of two potential saddle axes retrieved beforehand. However, these two methods are only applicable to the focused LF camera, which captures clear and high-resolution micro-lens images. For conventional LF cameras, it is difficult to directly detect corner locations in micro-lens images, which are of extremely low resolution and blurred. Instead of detecting corners, Bok et al. [22] innovatively extracted line features from micro-lens images and proposed a geometric calibration method. Nevertheless, the line feature template was designed for each single micro-lens image, which ignores the relations between different micro-lens images.

Fig. 1: Optical paths of the MLA-based LF camera (2D space)

3 SARB Projection Model

In this section, the Sub-Aperture Related Bipartition (SARB) projection model is proposed to establish the relationship between the rays coming into the camera and the pixels on the sensor. To do so, we first parameterize the rays with the space-angle TPP (two-plane parameterization) model [27][28][29] and the pixels with 4D indices. Then two sets of parameters are proposed to build the SARB projection model, one forming the intrinsic matrix of the pinhole camera at the center-view sub-aperture and the other related to generating the sub-apertures of the other views.

3.1 Parameterization of Rays and Pixels

Fig. 2: Space-angle TPP model

To parameterize light rays, the space-angle TPP model used in [27][28][29] is employed. In this model, a light ray is characterized by its intersections with two parallel planes whose distance is one unit length, as shown in Fig. 2. One intersection lies on the first plane and the other on the second plane. The ray is described by the 2D coordinates of the first intersection on the first plane, together with the horizontal and vertical offsets of the second intersection relative to the first. In this paper, the main lens plane is regarded as the first plane, and the second plane is the plane at one unit length from the main lens, as shown in Fig. 1.
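For illustration, the following sketch shows how a ray can be reduced to this four-parameter form; the function and symbol names (x, y, u, v) are our own, not the paper's notation.

```python
import numpy as np

def ray_to_tpp(p_lens, direction):
    """Sketch of the space-angle TPP parameterization (Fig. 2), assuming the
    first plane is the main-lens plane z = 0 and the second plane sits at
    z = 1 (one unit length away). Names are illustrative only."""
    x, y, _ = p_lens            # intersection with the first plane
    dx, dy, dz = direction      # ray direction, dz != 0
    u = dx / dz                 # horizontal offset after one unit of propagation
    v = dy / dz                 # vertical offset after one unit of propagation
    return np.array([x, y, u, v])   # space-angle TPP coordinates
```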

In the proposed algorithm, the main lens is modelled as a thin lens and the micro-lens array is treated as a pinhole array. Without loss of generality, the optical center of the main lens is defined as the origin of the coordinate system and the optical axis as one coordinate axis, pointing towards the outside of the camera; a second axis points upwards, and the third is determined by the right-hand rule. The optical paths of the micro-lens-based LF camera in 2D space are shown in Fig. 1. With the space-angle TPP model, the light ray outside the camera emitted from a 3D scene point and the light ray refracted by the main lens are each described by their space-angle TPP coordinates.

After the parameterization of rays, pixels on the image sensor are expressed by 4D coordinates. On one hand, a pixel on the sensor can be expressed using its spatial location in the camera coordinate system, composed of the center of the sub-image (the micro-lens image) that the pixel belongs to and the offset of that pixel from this center, as shown in Fig. 1. On the other hand, a pixel can also be represented using its pixel indices, composed of the center of the sub-image in image coordinates and the offset from this center. The relationship between the two representations is

(1)

where and are the indices of the pixel at which the optical axis intersects the sensor plane, and is the physical distance between two adjacent pixels.
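A minimal sketch of the index-to-physical conversion behind (1) is given below; the argument names (i0, j0 for the pixel where the optical axis meets the sensor, and pitch for the inter-pixel distance) are assumed stand-ins for the paper's symbols.

```python
def pixel_index_to_physical(i, j, i0, j0, pitch):
    """Convert pixel indices to physical sensor coordinates: the physical
    offset from the optical axis is the index offset scaled by the pixel
    pitch, which is the relation (1) expresses (sketch, assumed notation)."""
    x = (i - i0) * pitch
    y = (j - j0) * pitch
    return x, y
```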

3.2 SARB Projection Model

In this subsection, to introduce the SARB model in a readily comprehensible way, the subsequent analysis is performed in 2D space. The 4D version of the proposed concept can be easily deduced from this analysis.

According to the Gaussian formula in ray transfer matrix theory [30], the relationship between the ray outside the camera and the ray refracted by the main lens inside the camera is given by

(2)

where denotes the focal length of the main lens.
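For reference, the standard thin-lens refraction in ray transfer matrix theory [30] has the form below; the paper's (2) states this relation in the space-angle TPP coordinates, and the symbols used here are generic rather than the paper's.

$$
\begin{bmatrix} x_{\text{out}} \\ \theta_{\text{out}} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}
\begin{bmatrix} x_{\text{in}} \\ \theta_{\text{in}} \end{bmatrix}
$$

where $f$ is the focal length of the main lens.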

Then, inside the camera, according to Fig. 1, the ray corresponding to a pixel can be expressed using a similar-triangles argument as

(3)

and

(4)

where is the distance between the main lens and the MLA, and is the distance between the MLA and the sensor plane. Substituting (3) and (4) into (2), we can further obtain

(5)

After replacing the physical coordinates of the pixel with the pixel indices using (1), we obtain

(6)

For the whole 4D LF,  (6)  can be easily expanded to  (7).

(7)

We have now expressed the relationship between the rays and the pixels using the projective matrix in (7), which is formed by the physical parameters of the LF camera. Based on (7), two sets of parameters are further proposed, i.e., one forming the intrinsic matrix of the pinhole camera at the center-view sub-aperture and the other related to generating the sub-apertures of the other views.

3.2.1 Intrinsic matrix of the pinhole camera at center view sub-aperture

As described in [31], rays passing through the pixels with the same offset in different sub-images come from a single sub-aperture on the main lens. In particular, the pixels whose offset is zero exactly form the image of the sub-aperture in the center view. Moreover, when , equals . In this circumstance, (6) can be simplified to

(8)

The inverse form of  (8)  is represented as

(9)

By defining

(10)

and

(11)

equation  (9)  can be rewritten as

(12)

where denotes a 3D scene point that the ray passes through. Equation (12) indicates that the matrix block composed of the four elements at the bottom right of the projective matrix is exactly the inverse of the intrinsic matrix of the pinhole camera at the center-view sub-aperture.
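As a reminder of the role such an intrinsic matrix plays, the generic pinhole projection and its inverse are shown below; the symbols $f_x, f_y, c_x, c_y$ are conventional stand-ins rather than the paper's notation, whose concrete values here would follow from (10) and (11).

$$
K=\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},
\qquad
Z\begin{bmatrix} i \\ j \\ 1 \end{bmatrix}=K\begin{bmatrix} X \\ Y \\ Z \end{bmatrix},
\qquad
\begin{bmatrix} X/Z \\ Y/Z \\ 1 \end{bmatrix}=K^{-1}\begin{bmatrix} i \\ j \\ 1 \end{bmatrix},
$$

so the inverse of the intrinsic matrix maps a pixel back to the direction of its ray, which is exactly the role played by the bottom-right block in (12).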

3.2.2 Parameters to generate images of non-center views sub-apertures

For a simple derivation, the following analysis is first carried out in 2D space. Suppose the ray is emitted from a scene point; using a similar-triangles argument in Fig. 1, the relationship between the scene point and the ray is

(13)

Substituting (6) into (13), the relationship between the scene point and the pixel is

(14)

where

(15)

and

(16)

Furthermore, based on (14), we derive the relationship between the depth of a scene point and its disparity in the raw image. Concretely, considering two pixels which record rays coming from the same scene point, we can derive

(17)

which is further expressed as

(18)

By defining

(19)

(18)  can be rewritten as

(20)

In fact, this quantity can be regarded as the disparity between adjacent sub-aperture images, up to a marginal scale factor. A simple interpretation is as follows: the offset difference can be regarded as the difference of the indices of two sub-apertures, due to the one-to-one correspondence between the offset and the sub-aperture, and it is proportional to the disparity of the image points of the scene point in the two sub-aperture images. To be more precise, the difference of the pixel coordinates between two sub-aperture images is the disparity, and it equals this quantity after changing the unit from pixels in the raw image to indices of micro-lenses. Given the image of the center-view sub-aperture, the images of the other view sub-apertures can be obtained using (19) and (20). More specifically, the disparity of a 3D scene point is first calculated by (20). Then, the projected point that this 3D scene point corresponds to in the image of a given sub-aperture is calculated by (19), where one index corresponds to the center-view sub-aperture and the other to the given sub-aperture. The pixel of the projected point in the center-view sub-aperture is a known quantity, while the pixel of the projected point in the given sub-aperture is the quantity to be determined.
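The following sketch illustrates the two-stage use of (20) and (19) described above, under an assumed affine disparity-versus-inverse-depth law; the function names and the coefficients (a, b) are illustrative and do not reproduce the paper's actual expressions.

```python
def disparity_from_depth(z, a, b):
    """Illustrative stand-in for (20): the disparity parameter of a scene
    point is assumed to vary as an affine function of inverse depth z.
    The coefficients a and b are hypothetical, not the calibrated ones."""
    return a + b / z

def project_to_subaperture(x_c, y_c, lam, du, dv):
    """Illustrative stand-in for (19): shift the center-view image point
    (x_c, y_c) by the disparity lam times the sub-aperture index offset
    (du, dv) from the center view."""
    return x_c + lam * du, y_c + lam * dv
```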

3.2.3 SARB projection model

Using two sets of parameters as discussed above, the relationship between pixels and rays can be re-represented. To be specific, with  (15)  and  (16) , elements in the first two columns of the matrix in  (7)  can be rewritten as

(21)

and

(22)

With (21) and (22), the SARB projection model in this paper is eventually represented in Fig. 3.

Fig. 3: The SARB projection model

4 Calibration

In this paper, we utilize the proposed SARB projection model for calibration. As it consists of two sets of parameters targeting different aspects, our calibration method is divided into two steps, with each step corresponding to one set of parameters. In the literature, a typical calibration pipeline includes feature detection, an initial solution and nonlinear optimization [18][21][32]. Different from existing research, the nonlinear optimization in the proposed calibration only exists in Step 1. Our two-step method acquires the parameters of the center-view sub-aperture in the first step and the parameters used to generate the images of the non-center-view sub-apertures in the second step. Besides, all extrinsic parameters and all distortion parameters are also estimated in the first step.

4.1 Preparations

Before calibration, for every raw image, the sub-pixel locations of corner points in the center view sub-aperture image and their disparities should be acquired. Details related to how to obtain reliable and accurate corner points are introduced in Section 5. Moreover, the entire 4D raw image data is utilized for calibration.

4.2 Step 1

The intrinsic parameters of the pinhole camera at the center-view sub-aperture can be estimated by calibration methods designed for traditional cameras. In this step, the method in [32] is utilized to estimate them. Both the closed-form solution and the maximum likelihood estimation of this method are performed. Besides, the extrinsic parameters as well as all distortion parameters are also refined by the maximum likelihood estimation.
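A minimal sketch of Step 1 using the OpenCV implementation of Zhang's method [32] on the center-view corners is shown below; the variable names (obj_pts, img_pts) are hypothetical and stand for the per-image checkerboard corners and their detected center-view locations from Section 5.

```python
import cv2

def calibrate_center_view(obj_pts, img_pts, image_size):
    """Step 1 sketch: estimate the center-view pinhole intrinsics, the
    extrinsics of every shot and the distortion coefficients with Zhang's
    method [32] (closed-form initialization plus maximum likelihood
    refinement), here via OpenCV rather than the paper's own implementation."""
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, image_size, None, None)
    return K, dist, rvecs, tvecs
```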

4.3 Step 2

After acquiring the disparity parameter of every corner point in every raw image by the method in Section 5, the depth value, i.e. , of the corner point is calculated as

(23)

where is the coordinate of the corner on the checkerboard, i.e. the coordinate of the corner in the world coordinate system. After obtaining the disparity and depth of every corner point in every shot, a linear equation is obtained as

(24)

By stacking all the linear equations into a matrix, the unknown parameters are obtained from the Singular Value Decomposition (SVD) of this matrix.
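A sketch of the SVD step is given below, under the assumption that stacking (24) over all corners and shots yields a homogeneous system; the exact layout of the stacked matrix follows the paper and is not reproduced here.

```python
import numpy as np

def solve_step2(A):
    """Solve the stacked linear system of Step 2 in the least-squares sense:
    the unknowns are taken (up to scale) from the right-singular vector
    associated with the smallest singular value of A."""
    _, _, vt = np.linalg.svd(A)
    x = vt[-1]
    return x / x[-1]   # assumed normalization: last entry fixes the scale
```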

4.4 Distortion Model

In this paper, only the distortion of the main lens is considered; we treat the micro-lens array as a pinhole array in which distortion does not exist. The proposed distortion model is based on the assumption that rays emitted from one 3D scene point still converge to one point after refraction by the main lens. To put it differently, the distortion correction procedure should not make the rays coming from the same scene point diverge. Under this assumption, the distortion components in all sub-aperture images are the same, so it is only necessary to build a distortion model for the image of the center-view sub-aperture. Second-order radial distortion and tangential distortion are considered in this section. The radial distortion term is denoted as

(25)

and

(26)

where and are the normalized image coordinates of the undistorted pixel locations, and and are the radial distortion coefficients. The tangential distortion term is

(27)

and

(28)

where and are the tangential distortion coefficients. The total distortion term is the sum of the two components,

(29)

and

(30)
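For concreteness, the radial and tangential terms described by (25)-(30) follow the standard second-order Brown model; a sketch is given below, with coefficient names (k1, k2, p1, p2) following common convention and assumed to correspond to the paper's symbols.

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply second-order radial plus tangential distortion to normalized,
    undistorted image coordinates (x, y) of the center-view sub-aperture."""
    r2 = x * x + y * y
    radial = k1 * r2 + k2 * r2 * r2
    dx_r, dy_r = x * radial, y * radial                  # radial terms, cf. (25)-(26)
    dx_t = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)    # tangential term, cf. (27)
    dy_t = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y    # tangential term, cf. (28)
    return x + dx_r + dx_t, y + dy_r + dy_t              # total distortion, cf. (29)-(30)
```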

Different from previous works [18][22] and [25], our method does not include a global optimization procedure. The distortion coefficients as well as the center-view intrinsic parameters are established in Step 1.

5 Detection of LF-Points of Corner Points from 4D Raw Data

Different from traditional cameras, a 3D scene point does not correspond to only one pixel in an LF camera. Instead, LF cameras capture the 4D LF of the 3D scene point. Thus, the output of corner detection for an LF camera should be a description of the entire 4D LF that the 3D corner corresponds to. Moreover, the recorded 4D LF, i.e. the raw data on the sensor of the LF camera, should be jointly utilized.

Specifically, with regard to calibration, the 3D scene points and 3D lines of interest are the corners and the line segments connecting two adjacent corners of the checkerboard, as shown in Fig. 4. In this section, a detection method is proposed to solve for the "LF-point", which is defined as the representation of the 4D LF of a checkerboard corner. Similarly, an "LF-line" is defined as the representation of the 4D LF of a line segment in the checkerboard. The detection method consists of the generation of 4D templates, the calculation of the normalized cross-correlation (NCC), nonlinear optimization, and the calculation of line intersections.

Fig. 4: Part of checkerboard and its image of the center view sub-aperture

5.1 LF-Point

The X and Y coordinates of the projected point of a 3D scene point in the center-view sub-aperture image, together with the disparity parameter defined in Section 3.2.2, are adopted to represent the 4D LF of the 3D scene point. The underlying reason is that, given these three quantities, the location of the projected point of the 3D scene point in every 2D slice of the 4D LF (i.e., every micro-lens image and every sub-aperture image) can be calculated by (19), which indicates that this triple is a complete representation of the 4D LF of a 3D scene point. For brevity, it is termed an "LF-point".

5.2 LF-Line

In this paper, 3D lines are divided into two categories according to the angle between the 2D image of the 3D line and the X-axis in the center-view sub-aperture image. The first category is the horizontal line, whose angle is smaller than 45°, and the other is the vertical line, whose angle is greater than 45°. Referring to [33], a 3D line has 4 degrees of freedom; in other words, at least four parameters are required to completely represent the 4D LF of a 3D line. We thus use one parameter set to describe the 4D LF of the horizontal line and another to describe that of the vertical line. Note that one coordinate is eliminated for the horizontal line segment and the other coordinate is eliminated for the vertical line segment. Besides, as shown in Fig. 4, the parameter sets are built from the "LF-points" of the two endpoints of the 3D horizontal line and of the two endpoints of the 3D vertical line, respectively. Taking the horizontal line as an example, given its parameters we can obtain the projected 2D image of this horizontal line in every 2D slice of the 4D LF with the following steps: (1) given the center of a sub-image, the locations of the projected points of the two endpoints of the horizontal line in this sub-image are calculated by (19); (2) denoting the two projected points, the line equation of the 2D image of this line segment is calculated as

(31)

where the result denotes the coefficients of the 2D line. A similar procedure can be applied to the vertical line. We can therefore draw the conclusion that this parameter set is a complete representation of the 4D LF of a 3D line. For brevity, it is termed an "LF-line".

Fig. 5: 3D slice of the 4D template of a 3D line

5.3 Detection of LF-Points of Corner Points

Evidence shows that it is difficult to directly detect the locations of the projected points of checkerboard corners in sub-images of extremely low resolution [22]. To cope with this issue, the algorithm in [22] detects the line feature in every sub-image separately. It should be noted that this algorithm fails to take full advantage of the 4D LF data. In this section, we propose 4D templates that make the best of the 4D LF data to recognize the LF-lines of 3D line segments. After solving for the intersection points of the horizontal and vertical line segments by the SVD method, the LF-points of the checkerboard corners are acquired.

5.3.1 Generation of 4D Template

Take the horizontal line segment as an example. By changing the values of the free parameters (the remaining ones are fixed), we can obtain a series of 4D templates that fully exploit the 4D LF data. Concretely, given a set of parameter values, we can calculate the line equation of the corresponding line segment in every 2D angular slice by the method detailed in Section 5.2. Then 2D templates are generated using the method described in [22]. After obtaining the 2D templates of all these angular slices, an entire 4D template is obtained. Fig. 5 illustrates a 3D slice of a 4D template. We can see that the 2D line in every angular slice changes gradually along the spatial axis.

5.3.2 Calculation of NCC

We first reshape the 4D templates into a sequence of 2D angular slices, and then calculate the NCC between each 2D angular template and its corresponding sub-image in the raw data. The total NCC of the entire 4D template is the sum of the NCC values of all slices.
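A minimal sketch of this computation is shown below; the windowing and slice ordering are assumptions, not the paper's exact implementation.

```python
import numpy as np

def ncc(template, patch):
    """Normalized cross-correlation between one 2D angular slice of the 4D
    template and its corresponding sub-image in the raw data."""
    t = template - template.mean()
    p = patch - patch.mean()
    denom = np.sqrt((t * t).sum() * (p * p).sum())
    return float((t * p).sum() / denom) if denom > 0 else 0.0

def total_ncc(template_slices, sub_images):
    """Total NCC of the entire 4D template: sum over all angular slices."""
    return sum(ncc(t, s) for t, s in zip(template_slices, sub_images))
```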

5.3.3 Nonlinear Optimization

Since calculating the NCC between all these 4D templates and the actual raw data is a time-consuming task, a nonlinear optimization procedure is utilized to find the optimal template. The total NCC value is regarded as the objective function, and the LF-line parameters are regarded as the optimization variables. Starting from an initial solution, a direct search method [33] that does not use numerical or analytic gradients is applied to find the optimal solution.
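As a sketch, a gradient-free search of this kind could be run as follows; total_ncc_of is a hypothetical wrapper that builds the 4D template for a candidate LF-line parameter vector and returns its total NCC against the raw data, and Nelder-Mead stands in here for the direct search method referenced above.

```python
from scipy.optimize import minimize

def fit_lf_line(theta0, total_ncc_of):
    """Maximize the total NCC over the LF-line parameters starting from an
    initial solution theta0, using a derivative-free simplex search."""
    res = minimize(lambda theta: -total_ncc_of(theta), theta0,
                   method="Nelder-Mead")
    return res.x
```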

5.3.4 Calculating intersection of lines

A horizontal line and a vertical line intersect at a corner point of the checkerboard. As shown in Fig. 6, we first use the raw data in the red area to detect the vertical line, and then use the raw data in the blue area to detect the horizontal line. Denoting the "LF-lines" of the vertical and horizontal line segments respectively, we first calculate the null spaces of the coefficient matrices in

(32)

and

(33)

each of which represents a collection of planes that pass through the two "LF-points" of the corresponding line segment. The intersection of the two null spaces, i.e., the intersection of the two plane collections, must then be the intersection corner of the vertical and horizontal line segments. It is calculated by solving

(34)

where the first two vectors are basis vectors of the null space obtained by solving (32), and the other two are basis vectors of the null space obtained by solving (33). All three linear systems discussed above, i.e. (32)-(34), are solved by the SVD operation, and the result acquired by solving (34) is exactly the "LF-point" of the intersection corner.
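A sketch of this null-space intersection is given below, under the assumption that each LF-line contributes a two-dimensional null space and that LF-points are handled as homogeneous vectors; the matrix layouts of (32)-(34) themselves are the paper's and are not reproduced.

```python
import numpy as np

def null_space_basis(A, dim):
    """Approximate null-space basis of A via SVD: the right-singular vectors
    belonging to the smallest singular values (columns of the result)."""
    _, _, vt = np.linalg.svd(A)
    return vt[-dim:].T

def intersect_lf_lines(A_vert, A_horz):
    """Find the common element of the two plane collections: solve
    Bv @ a = Bh @ b, i.e. [Bv, -Bh] @ [a; b] = 0, again by SVD, and map the
    coefficients back through Bv to obtain the LF-point of the corner."""
    Bv = null_space_basis(A_vert, dim=2)     # basis from (32)
    Bh = null_space_basis(A_horz, dim=2)     # basis from (33)
    coeffs = null_space_basis(np.hstack([Bv, -Bh]), dim=1)[:, 0]
    lf_point = Bv @ coeffs[:Bv.shape[1]]
    return lf_point / lf_point[-1]           # assumed homogeneous normalization
```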

6 Experimental Results

Fig. 6: Results of line detection. (a) shows the detection result of the method in [22]; (b) shows the re-projection of the "LF-lines" detected by the proposed method onto the raw data.
Fig. 7: Results of corner detection by the proposed method and the method in [22]. The green dots show the projected corner locations of the method in [22], and the red dots show the corner locations projected from the "LF-points" detected by the proposed method. They all come from the corner area in Fig. 6(a) and Fig. 6(b).
Fig. 8: Results of corner detection using the Harris detector: (a) before eliminating false corners; (b) after eliminating false corners.

Six datasets are used to evaluate the performance of the proposed calibration method. Three of them are from [18], captured by one Lytro camera with different focal settings, and are denoted D-A, D-B and D-E in this paper. Their checkerboard sizes are a grid of 3.61 mm cells, a grid of 3.61 mm cells, and a grid of mm cells, respectively. The fourth dataset is taken from [22], captured by a Lytro Illum camera, and is denoted G-A in this paper; its checkerboard is a grid of 26.25 mm cells. The remaining two datasets were captured by ourselves using a Lytro Illum camera, with checkerboards of 29.92 mm cells and 22.25 mm cells respectively; these two datasets are denoted P-A and P-B in this paper. Note that only 5, 9 and 14 images in D-A, D-B and D-E, respectively, could be used, i.e., those in which the checkerboard is out of focus and line features are visible [22].

We compare the proposed method with other well-known algorithms in three aspects. First, to evaluate the performance of the corner detection algorithm described in Section 5, we compare the detection result of the proposed method with the state-of-the-art Harris detection method [34] and the projected-corner-location detection method in [22]. Second, to evaluate the 2D lateral re-projection errors of the intrinsic and extrinsic parameters, we compare the proposed method with two state-of-the-art methods, namely the method of Dansereau et al. [18] and the method of Bok et al. [22], using the point-to-ray error (P2RE) and point-to-point error (P2PE) metrics. Third, to evaluate the accuracy of the depth value calculated from the intrinsic parameters, we compare the relative depth error (RDE) of the proposed method with those of the method of Dansereau et al. [18] and the method of Bok et al. [22].

6.1 Corner Detection Results

Similar to [22], the first step of our corner detection algorithm is line detection, whose result directly influences the corner point detection. Thus, before comparing the final results of corner detection, we first make a comparison between the 2D line images detected by [22] and the 2D line images projected from the "LF-lines" detected in Section 5. The result is shown in Fig. 6. It shows that the 2D line images detected by [22] are more cluttered, as stronger noise disturbs the regular pattern between the 2D line images in different sub-images. In contrast, the proposed method outputs a group of regularly distributed 2D line images even at a high noise level. The relationship between the positions of the 2D line images in different sub-images is preserved, since the entire 4D light field is utilized.

Then we compare the proposed corner detection method with the Harris method and the method in [22]. Note that methods based on sub-aperture images are not included in this section, since they cannot directly give the locations of the corners in the raw image. Fig. 8(a) shows the result of the Harris detection method, in which areas between two adjacent sub-images are often detected as "false corners". Other detection errors remain even after these false corners are eliminated, as shown in Fig. 8(b). Due to the low resolution and blur of the sub-images, directly applying a traditional corner detection method to the LF raw image is problematic. Fig. 7 shows the corner locations projected from the "LF-points" and the projected corner locations of the method in [22]. The corner locations detected by the method of [22] diverge from the actual locations, whilst the proposed method generates an accurate detection result.

6.2 Re-Projection Errors in Lateral Direction

To compare the re-projection errors of the calibration methods, a variety of benchmark metrics are used in different calibration methods [18][22] and [21]. For fairness, in this paper, we evaluate the proposed method using the P2RE and P2PE measures, which are not used in the optimization step of the proposed algorithm. P2RE, proposed in [18], measures the distance between the ideal corner point and the re-projected ray of the detected corner point. P2PE measures the 2D distance between the ideal corner point and the re-projected point of the detected corner on the checkerboard plane. Two state-of-the-art methods, namely [18] and [22], are compared against. Experimental results are shown in Table I and Table II. The errors of "Bok-org" are provided directly by the paper [22], and the errors of "Bok-run" are obtained by running their latest released code.

Theoretically, for one calibration method, its P2PE value should be slightly larger than its P2RE value, because in most cases the checkerboard plane is not perpendicular to the ray emitted from a point on this plane. However, for actual data, due to factors including image noise, these two values can be significantly different, and minimizing one metric in the non-linear optimization procedure may not ensure that the other metric is optimized simultaneously. From Table I and Table II, the method in [18] performs well on datasets D-A, D-B and D-E on the P2RE metric, whereas a weaker performance measured by P2PE is witnessed, especially for dataset D-E. Similarly, the method in [22] also has a large P2RE error for the D-E dataset.

For both metrics, the method of [18] has larger errors on datasets P-A and P-B. This may be because the method of [18] is a sub-aperture-based method, and inaccuracy occurs in its sub-aperture extraction procedure. As shown in Fig. 9, some ghosting appears in the extracted image, which comes from a sub-aperture far from the optical axis.

Compared to the other two methods, the proposed method performs well on both the P2RE and P2PE metrics on all six datasets, whether the images are noisy or clean, which reflects the stability and accuracy of the method.

Fig. 9: Image of a sub-aperture far from the optical axis, extracted by [18].
Fig. 10: Image of the center-view sub-aperture, extracted by [18].
Dataset Proposed Dansereau [18] Bok-run [22]
D-A 0.0977225 0.0984018 0.2711
D-B 0.0411096 0.0442647 0.1525
D-E 0.173376 0.146232 0.5404
P-A(2) 0.0186368 0.0622009 0.2076
P-B(3) 0.0157983 0.0398623 0.1392
G-A 0.101822 - 0.2349
TABLE I: Point-to-point errors (mm)
Dataset Proposed Dansereau [18] Bok-org [22] Bok-run
D-A 0.0806737 0.0815884 0.1076 -
D-B 0.0389486 0.0420258 0.0714 -
D-E 0.163982 0.134463 0.454 -
P-A(2) 0.0179212 0.0598808 - 0.1972
P-B(3) 0.0149911 0.0381868 - 0.1330
G-A 0.0895863 - 0.2066 -
TABLE II: Point-to-ray errors (mm)

6.3 Errors in depth direction

Different from a traditional camera, a calibrated LF camera can provide the depth value corresponding to each pixel. Therefore, we also compare the depth values calculated from the intrinsic parameters calibrated by the methods of [18], [22] and the proposed method.

Dataset Proposed Dansereau  [18] Bok  [22]
D-A 1.85 3.14 8.96
D-B 1.75 1.82 9.44
D-E 12.94 25.29 29.22
P-A 1.94 10.35 12.71
P-B 2.33 4.95 13.52
G-A 3.33 - 17.98
TABLE III: Relative depth errors
Fig. 11: 3D corner points calculated from the raw image using the intrinsic parameters (blue points), and 3D corner points calculated from their actual world coordinates using the extrinsic parameters (red points). (For a clearer illustration, this figure only shows the 2D projection onto the XZ coordinate plane.) (a)(c)(e) come from dataset D-E, and (b)(d)(f) come from dataset P-B. (a)(b) are the results of [18], (c)(d) are the results of [22], and (e)(f) are the results of the proposed method.

For every method, the depth value calculated from the raw image using the intrinsic parameters and the depth value calculated from the world coordinates of the actual 3D corner point using the extrinsic parameters are both obtained. For the proposed method, the former is calculated by (20). For the methods of [18] and [22], we derived a simple procedure to calculate it, since these methods do not directly estimate a depth value from the raw image. We first choose two pixels corresponding to the same 3D corner point, belonging to two sub-apertures with a large baseline. Then the location of this 3D corner point is determined by calculating the intersection of the two rays obtained from these pixels using the intrinsic parameters of [18] or [22]; the estimated depth is simply the Z coordinate of this 3D corner point. Finally, the relative depth error (RDE) is calculated by (35)

(35)
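As a sketch of the metric described above, the relative depth error can be computed as follows; the percentage scaling and the averaging over corners are assumptions about (35), whose exact form is the paper's.

```python
import numpy as np

def relative_depth_error(z_est, z_ref):
    """Relative deviation of the depths estimated from the raw image (z_est)
    from the depths implied by the extrinsics and the known checkerboard
    geometry (z_ref), averaged over all corner points."""
    z_est = np.asarray(z_est, dtype=float)
    z_ref = np.asarray(z_ref, dtype=float)
    return float(np.mean(np.abs(z_est - z_ref) / np.abs(z_ref)) * 100.0)
```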

Table III shows the RDE results of the proposed method against [18] and [22]. The depth errors of [18] and [22] are much larger than those of the proposed method. More details are shown in Fig. 11, which indicates that a corner point with a small re-projection error may still have a large depth error.

Based on the results listed in Tables I to III, it can be concluded that the proposed model outperforms the ones described in [18] and [22] in terms of calibration accuracy in both the lateral and depth directions. The underlying reason is that the parameters describing the relations between sub-apertures are estimated separately from the other parameters.

7 Conclusion And Future Work

In this paper, we have presented a Sub-Aperture Related Bipartition (SARB) projection model for calibrating LF cameras. Due to the two-part structure of the proposed model, calibration methods for traditional cameras can be effectively reused. Meanwhile, both the 2D re-projection errors in the lateral direction and the errors in the depth direction are efficiently reduced owing to this structure. Besides, an accurate and robust corner detection method is also proposed. To the best of our knowledge, this is the first work in which 4D light field data is fully utilized to detect corners. Future work may include adding a global optimization step to the proposed method to account for the misalignment of the MLA in the calibration pipeline.

References

  • [1] R. Ng, “Fourier slice photography,” in ACM transactions on graphics (TOG). ACM, 2005, vol. 24, pp. 735–744.
  • [2] H.-G. Jeon, J. Park, G. Choe, J. Park, Y. Bok, Y. Tai, and I. So Kweon, “Accurate depth map estimation from a lenslet light field camera,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1547–1555.
  • [3] Michael W Tao, Sunil Hadap, Jitendra Malik, and Ravi Ramamoorthi, “Depth from combining defocus and correspondence using light-field cameras,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 673–680.
  • [4] Sven Wanner and Bastian Goldluecke, “Globally consistent depth labeling of 4d light fields,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 41–48.
  • [5] Vaibhav Vaish, Marc Levoy, Richard Szeliski, C Lawrence Zitnick, and Sing Bing Kang, “Reconstructing occluded surfaces using synthetic apertures: Stereo, focus and robust measures,” in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06). IEEE, 2006, vol. 2, pp. 2331–2338.
  • [6] Tao Yang, Yanning Zhang, Jingyi Yu, Jing Li, Wenguang Ma, Xiaomin Tong, Rui Yu, and Lingyan Ran, “All-in-focus synthetic aperture imaging,” in European Conference on Computer Vision. Springer, 2014, pp. 1–15.
  • [7] F. Dong, S.-H. Ieng, X. Savatier, R. Etienne-Cummings, and R. Benosman, “Plenoptic cameras in real-time robotics,” The International Journal of Robotics Research, vol. 32, no. 2, pp. 206–217, 2013.
  • [8] Niclas Zeller, Franz Quint, and Uwe Stilla, “From the calibration of a light-field camera to direct plenoptic odometry,” IEEE Journal of selected topics in signal processing, vol. 11, no. 7, pp. 1004–1019, 2017.
  • [9] B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, “High performance imaging using large camera arrays,” in ACM Transactions on Graphics (TOG). ACM, 2005, vol. 24, pp. 765–776.
  • [10] A. Veeraraghavan, R. Raskar, A. Agrawal, A. Mohan, and J. Tumblin, “Dappled photography: Mask enhanced cameras for heterodyned light fields and coded aperture refocusing,” in ACM transactions on graphics (TOG). ACM, 2007, vol. 26, p. 69.
  • [11] C.-K. Liang, T.-H. Lin, B.-Y. Wong, C. Liu, and H.H. Chen, “Programmable aperture photography: multiplexed light field acquisition,” in ACM Transactions on Graphics (TOG). ACM, 2008, vol. 27, p. 55.
  • [12] K. Marwah, G. Wetzstein, Y. Bando, and R. Raskar, “Compressive light field photography using overcomplete dictionaries and optimized projections,” ACM Transactions on Graphics (TOG), vol. 32, no. 4, pp. 46, 2013.
  • [13] Y. Taguchi, A. Agrawal, A. Veeraraghavan, S. Ramalingam, and R. Raskar, “Axial-cones: Modeling spherical catadioptric cameras for wide-angle light field rendering,” ACM Trans. Graph., vol. 29, no. 6, pp. 172, 2010.
  • [14] R. Ng, Digital light field photography, Ph.D. thesis, Stanford University, 2006.
  • [15] A. Lumsdaine and T. Georgiev, “The focused plenoptic camera,” in 2009 IEEE International Conference on Computational Photography (ICCP). IEEE, 2009, pp. 1–8.
  • [16] Lytro, “The lytro camera [online]. available,” http://www.lytro.com/, 2016.
  • [17] Raytrix, “3d light field camera technology [online]. available,” http://www.raytrix.de/, 2016.
  • [18] D. G. Dansereau, O. Pizarro, and S.B. Williams, “Decoding, calibration and rectification for lenselet-based plenoptic cameras,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 1027–1034.
  • [19] H. Duan, L. Mei, J. Wang, L. Song, and N. Liu, “A new imaging model of lytro light field camera and its calibration,” Neurocomputing, vol. 328, pp. 189–194, 2019.
  • [20] Q. Zhang, C. Zhang, J. Ling, Q. Wang, and J. Yu, “A generic multi-projection-center model and calibration method for light field cameras,” IEEE transactions on pattern analysis and machine intelligence, 2018.
  • [21] S. O’brien, J. Trumpf, V. Ila, and R. Mahony, “Calibrating light-field cameras using plenoptic disc features,” in 2018 International Conference on 3D Vision (3DV). IEEE, 2018, pp. 286–294.
  • [22] Y. Bok, H.-G. Jeon, and I.S. Kweon, “Geometric calibration of micro-lens-based light field cameras using line features,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 2, pp. 287–300, 2016.
  • [23] C.-A. Noury, C. Teuliere, and M. Dhome, “Light-field camera calibration from raw images,” in 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA). IEEE, 2017, pp. 1–8.
  • [24] C. Heinze, S. Spyropoulos, S. Hussmann, and C. Perwass, “Automated robust metric calibration algorithm for multifocus plenoptic cameras,” IEEE Transactions on Instrumentation and Measurement, vol. 65, no. 5, pp. 1197–1205, 2016.
  • [25] P. Zhou, W. Cai, Y. Yu, Y. Zhang, and G. Zhou, “A two-step calibration method of lenslet-based light field cameras,” Optics and Lasers in Engineering, vol. 115, pp. 190–196, 2019.
  • [26] S. Nousias, F. Chadebecq, J. Pichat, P. Keane, S. Ourselin, and C. Bergeles, “Corner-based geometric calibration of multi-focus plenoptic cameras,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 957–965.
  • [27] C.-K. Liang and R. Ramamoorthi, “A light transport framework for lenslet light field cameras,” ACM Transactions on Graphics (TOG), vol. 34, no. 2, pp. 16, 2015.
  • [28] T.G. Georgiev and A. Lumsdaine, “Focused plenoptic camera and rendering,” Journal of Electronic Imaging, vol. 19, no. 2, pp. 021106, 2010.
  • [29] C.-K. Liang, Analysis, acquisition, and processing of light field for computational photography, Ph.D. thesis, National Taiwan University, Taipei, Taiwan, 2008.
  • [30] A. Gerrard and J.M. Burch, Introduction to matrix methods in optics, Courier Corporation, 1994.
  • [31] R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, P. Hanrahan, et al., “Light field photography with a hand-held plenoptic camera,” Computer Science Technical Report CSTR, vol. 2, no. 11, pp. 1–11, 2005.
  • [32] Z. Zhang et al., “Flexible camera calibration by viewing a plane from unknown orientations.,” in Iccv, 1999, vol. 99, pp. 666–673.
  • [33] R. Hartley and A. Zisserman, Multiple view geometry in computer vision, Cambridge university press, 2003.
  • [34] C. G. Harris, M. Stephens, et al., “A combined corner and edge detector.,” in Alvey vision conference. Citeseer, 1988, vol. 15, pp. 10–5244.