A Generic Multi-Projection-Center Model and Calibration Method for Light Field Cameras

08/07/2018 · by Qi Zhang, et al.

Light field cameras can capture both spatial and angular information of light rays, enabling 3D reconstruction from a single exposure. The geometry of the 3D reconstruction is significantly affected by the intrinsic parameters of a light field camera. In this paper, we propose a multi-projection-center (MPC) model with 6 intrinsic parameters to characterize light field cameras, based on the traditional two-parallel-plane (TPP) representation. The MPC model generally parameterizes light fields under different imaging formations, including conventional and focused light field cameras. Using the constraints between 4D rays and 3D geometry, a 3D projective transformation is deduced to describe the relationship between scene structure and the MPC coordinates. Based on the MPC model and this projective transformation, we propose a calibration algorithm to verify our light field camera model. Our calibration method includes a closed-form solution and a non-linear optimization that minimizes re-projection errors. Experimental results on both simulated and real-scene data verify the performance of our algorithm.


1 Introduction

The micro-lens array (MLA) based light field cameras, including the conventional light field camera [1] and the focused light field camera [2], can capture the radiance of light rays in both spatial and angular dimensions, i.e., the 4D light field [3, 4]. The data from a light field camera are equivalent to narrow-baseline images from traditional cameras with coplanar projection centers. Measuring the same point from multiple directions enables or strengthens applications in computational photography and computer vision, such as digital refocusing [5], depth estimation [6], segmentation [7] and so on. Recent work has also proposed methods for light field registration [8] and stitching [9, 10] to expand the field of view (FOV). To support these applications, it is crucial to calibrate light field cameras accurately and to establish an exact relationship between the ray space and the 3D scene.

Building a model that describes the ray sampling pattern of a light field camera plays an important role. Previous approaches have dealt with imaging models of light field cameras under different optical designs [11, 12, 13, 14, 15]. These models share the assumption that each micro-lens is regarded as a pinhole and the main lens is described by a thin-lens model. However, several open issues remain in these models and methods. Firstly, the proposed models focus on the angular and spatial information of rays, but the relationship between the light field and 3D scene geometry is not explored. Secondly, little previous work has considered a generic model that describes light field cameras with different image formations [1, 2]. Thirdly, the existing intrinsic parameters of light field camera models are either redundant or incomplete, so that the corresponding solutions are neither effective nor efficient.

In this paper, we first propose a multi-projection-center (MPC) model based on the two-parallel-plane (TPP) representation [3, 4]. Then we deduce the transformations between 3D scene geometry and 4D light rays. Based on the geometric transformations in the MPC model, we characterize various light field cameras with a generic 6-intrinsic-parameter model and present an effective intrinsic parameter estimation algorithm. Experimental results on both virtual (simulated data) and physical (Lytro, Illum and a self-assembled focused camera) light fields verify the effectiveness and efficiency of our model.

Our main contributions are threefold:

(1) We deduce the transformations to describe the relationship between light field and scene structure.

(2) We describe light field cameras with different image formations as a generic 6-parameter model without redundancy.

(3) We propose an effective intrinsic parameter estimation algorithm for light field cameras, including a closed-form linear solution and a nonlinear optimization.

The remainder of the paper is organized as follows. Section 2 summarizes related work on the models of light field cameras and calibration methods. Section 3 introduces our MPC model and the transformations between the 3D structure and 4D light field. Based on the theory of light field parameterization, a generic 6-intrinsic-parameter light field camera model is proposed. Section 4 provides the details of our calibration method and analyzes computational complexity of the closed-form solution. In Section 5, we present extensive results on the simulated and real scene light fields, demonstrating more accurate intrinsic parameter estimation than previous work [11, 13].

2 Related Work

Various imaging systems have been developed from traditional cameras to acquire 4D light fields. Wilburn et al. [16] present a camera array to obtain light fields with high spatial and angular resolution, and a classic calibration approach is employed for the camera array [17]. More generally, in the traditional multi-view geometry framework, multiple cameras in different poses are modeled as a set of unconstrained rays, which is known as the Generalized Camera Model (GCM) [18]; the ambiguity of the reconstructed scene is discussed in the traditional literature [19]. However, applications of camera arrays are limited by their high cost and complex control. In contrast, the MLA enables a single camera to record the 4D light field more conveniently and efficiently, though the baseline and spatial resolution are smaller than those of a camera array. Compared with a camera array, the multiple projection centers of an MLA-based light field camera are strictly aligned on a plane due to the physical design. Recent work is devoted to intrinsic parameter calibration of light field cameras of the two designs [1, 2], which differ considerably in the image pattern of their micro-lenses.

The main difference between light field camera designs is the relative position of the main lens's imaging plane and the MLA plane [20]. This determines the distribution of rays from the same point and thus the way sub-apertures are extracted from the raw image, i.e., from the micro-lens images [21, 22]. Nevertheless, both types of light field cameras measure the same point in multiple directions, which is equivalent to the data of the GCM. Therefore, a light field camera model can draw on classic multi-view geometry theory.

Recently, several state-of-the-art methods have proposed models for the conventional light field camera, in which multiple viewpoints or sub-aperture images are easy to synthesize. Dansereau et al. [11] present a model to decode pixels into rays for a Lytro camera, in which a 12-free-parameter transformation matrix relates pixels to a reference plane outside the camera (in the nonlinear optimization, 10 intrinsic parameters and 5 distortion coefficients are finally estimated). However, their calibration method, which relies on a traditional camera calibration algorithm, is not effective, and the decoding matrix contains redundant parameters. Bok et al. [13] formulate a geometric projection model consisting of a main lens and an MLA (their extended work has been published in IEEE TPAMI [23]); intrinsic parameters are estimated directly from raw images and an analytical solution is derived. Moreover, Thomason et al. [15] address the misalignment of the MLA and estimate its position and orientation.

Other researchers have explored models for the focused light field camera, in which multiple projections of the same point are easy to identify. Johannsen et al. [12] propose to calibrate the intrinsic parameters of the focused light field camera: by reconstructing 3D points from the parallax in adjacent micro-lens images, the parameters (including depth distortion) are estimated. However, their camera model assumes that the geometric center of each micro-lens image lies on the optical axis of its micro-lens. This assumption causes inaccuracy in the reconstructed points, which is ultimately compensated by the depth distortion coefficients. Hahne et al. [24] further discuss the influence of this assumption, i.e., the deviation between a micro-lens and its image. Heinze et al. [25] apply a model similar to that of Johannsen et al. [12] and derive a linear initialization of the intrinsic parameters.

In a word, previous light field camera models are either redundant or overly complex, which leads to non-unique solutions in intrinsic parameter estimation or to inaccurately decoded light fields. An unreliable camera model is also a bottleneck that may impede light field applications in computer vision and computational photography, especially light field registration, stitching and enhancement. To support further applications, a general light field camera model that represents rays and scene geometry more concisely is urgently needed.

3 Multi-Projection-Center Model

In this section, we first propose the MPC model based on the TPP representation of the light field. Then we deduce the transformation matrices that relate 3D scene geometry to 4D rays. Finally, we utilize the MPC model to describe the image formation of light field cameras, including conventional and focused light field cameras, and define their generic intrinsic parameters. Table I gives the notation used in the following sections.

Term Definition
Indexed pixel of raw image inside the camera
Virtual (conjugate) light field outside the camera
Decoded physical light field
Intrinsic parameters
3D point in the world coordinates
3D point reconstructed by
3D point reconstructed by
Rotation matrix of extrinsic parameter

Translation vector of extrinsic parameter

Measurement matrix of rays
Homogeneous projection matrix
Non-homogeneous projection matrix partitioned from
Homography matrix determined by intrinsic and extrinsic parameters only
Distortion vector
TABLE I: Notation of symbols in the paper

3.1 The Coordinates of MPC Model

As shown in Fig. 1, there are three coordinate systems in the MPC model, i.e., the 3D world coordinates, the 3D camera coordinates, and the 4D TPP coordinates (one pair of coordinates on the view plane and one on the image plane). In general, the transformation between the world and camera coordinates is given by the extrinsic parameters. In the traditional TPP representation, the spacing between the two parallel planes is normalized to 1 to describe a set of rays [3, 4]. Although this is complete and concise, to derive the transformation between 3D structure and 4D rays in light field cameras we prefer a model consisting of two parallel planes with a general spacing.

Let denote light field in the MPC model with the spacing . Then the ray is parameterized by two planes, i.e., and . Let denote the view plane and denote the image plane . In the MPC model, defines a ray passing and , where is the projection center and is the corresponding projection.

Fig. 1: An illustration of three coordinates in the MPC model.

Given a projection center (i.e., the - view or sub-aperture) and the 3D point , we can get the image projection in the local coordinate of the - view,

(1)

Since there are multiple projection centers, the 3D point can be observed multiple times. Obviously, when there is only one projection center on the view plane (the spacing then playing the role of a focal length), the image formation degenerates into the traditional central-projective camera model [19].
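To make the projection concrete, the following minimal Python sketch (not the authors' code) assumes the usual MPC/TPP convention that the view plane is z = 0, the image plane is z = f, and the local-coordinate projection of a point (x, y, z) in the (i, j) view is u = f(x - i)/z, v = f(y - j)/z; the symbol names are ours.

```python
import numpy as np

def mpc_project(point, center, f):
    """Project a 3D point through the MPC view whose center is (i, j, 0).

    Returns the local image-plane coordinates (u, v) on the plane z = f,
    assuming u = f*(x - i)/z and v = f*(y - j)/z.
    """
    x, y, z = point
    i, j = center
    return np.array([f * (x - i) / z, f * (y - j) / z])

# One scene point observed by a 3x3 grid of projection centers.
X = np.array([0.2, -0.1, 5.0])
f = 1.0
for i in (-1.0, 0.0, 1.0):
    for j in (-1.0, 0.0, 1.0):
        u, v = mpc_project(X, (i, j), f)
        print(f"view ({i:+.0f},{j:+.0f}) -> (u, v) = ({u:+.4f}, {v:+.4f})")
```

With a single projection center and unit spacing this reduces to the familiar normalized pinhole projection, matching the degenerate case noted above.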

3.2 Transformation between Geometry and Rays

It is known that rays in different directions from one point enable 3D reconstruction. Letting the rays intersect at a point in 3D space, we obtain the relationship between the rays and the 3D point by triangulation,

(2)

where is a matrix consisting of rays and the MPC parameter .

If two rays and are from one 3D point , they can be represented by the following two equivalent forms,

(3)

and

(4)
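As a schematic counterpart of Eq. (2), the sketch below triangulates a point from several MPC rays by stacking the linear constraints x - (u/f)z = i and y - (v/f)z = j into a least-squares system; it follows the parameterization assumed in the previous sketch, not the paper's exact measurement matrix.

```python
import numpy as np

def triangulate(rays, f):
    """Least-squares intersection of MPC rays (i, j, u, v) with plane spacing f.

    Each ray contributes the two linear constraints
        x - (u/f) * z = i   and   y - (v/f) * z = j,
    assuming local image coordinates u = f*(x - i)/z, v = f*(y - j)/z.
    """
    A, b = [], []
    for i, j, u, v in rays:
        A.append([1.0, 0.0, -u / f]); b.append(i)
        A.append([0.0, 1.0, -v / f]); b.append(j)
    X, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return X

f = 1.0
X_true = np.array([0.3, -0.2, 4.0])
centers = [(-1.0, 0.0), (0.0, 0.0), (1.0, 1.0)]
rays = [(i, j, f * (X_true[0] - i) / X_true[2], f * (X_true[1] - j) / X_true[2])
        for i, j in centers]
print(triangulate(rays, f))   # ~ [0.3, -0.2, 4.0]
```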

3.3 3D Projective Transformation

In fact, a linear transformation of the light field coordinates causes a 3D projective distortion of the reconstructed point [19], as deduced from Eqs. (3) and (4). Fig. 2 shows three examples of linear transformations: changing the plane spacing, scaling the image plane (in general there are four scaling factors, two in the view plane and two in the image plane), and translating the image plane of a specific view (in general, translation applies to both planes). The details are derived as follows.

(1) If we change into , the imaging point passed by becomes and the intersection of rays becomes . Substituting it into Eqs.(3) and (4), we have

(5)

where and are in the homogeneous coordinates.

(2) Let become , thus there is a transformation on the rays caused by the offset . Substituting it into Eqs.(3) and (4), we can get the transformation matrix between and ,

(6)

(3) Let become , thus there is a transformation caused by the scaling vector . Then the transformation matrix between and is,

(7)

and

(8)

In particular, Eqs.(7) and (8) hold when .
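The effect of such a linear change of coordinates can also be checked numerically. The toy sketch below (same assumed parameterization as the earlier sketches, with a hypothetical common scale factor k applied to the image-plane coordinates) re-triangulates a bundle of rays after rescaling; the reconstructed point changes in a structured way, here in its recovered depth, which is the kind of geometric distortion discussed above.

```python
import numpy as np

def triangulate(rays, f):
    # Same least-squares intersection as in the previous sketch.
    A = [[1.0, 0.0, -u / f] for _, _, u, _ in rays] + \
        [[0.0, 1.0, -v / f] for _, _, _, v in rays]
    b = [i for i, _, _, _ in rays] + [j for _, j, _, _ in rays]
    return np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)[0]

f = 1.0
X = np.array([0.3, -0.2, 4.0])
centers = [(-1.0, 0.0), (0.0, 0.0), (1.0, 1.0)]
rays = [(i, j, f * (X[0] - i) / X[2], f * (X[1] - j) / X[2]) for i, j in centers]

k = 2.0                                            # hypothetical common image-plane scaling
scaled = [(i, j, k * u, k * v) for i, j, u, v in rays]
print(triangulate(rays, f))     # original reconstruction
print(triangulate(scaled, f))   # reconstruction after rescaling the coordinates
```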

As shown in the leftmost part of Fig. 2, a scene containing a Lambertian cube is recorded by an MPC model. Observing the cube from multiple directions yields a 4D light field. If the coordinates are linearly transformed while the light intensity remains constant, the intersections of the rays are transformed by a 3D projective matrix. The cube is therefore distorted according to the transformation parameters (the three right-hand parts of Fig. 2).

Fig. 2: Examples of transformations between 3D structure and 4D light field of a Lambertian cube. The leftmost is an original cube and others are the projected ones with the changing of parameter , scaling in the image plane () and translation respectively.

3.4 The MPC Model in Light Field Cameras

Light field cameras evolved from traditional cameras and record the real-world scene in different but related ways. In a traditional camera, the central projection onto a 2D image is a dimension reduction of 3D space [19]. In a light field camera, the 3D structure projected by the main lens is arranged on the image sensor by the design of the light path. The processes of these multiple central projections are analyzed as follows.

On the one hand, for a conventional light field camera, the sampling pattern of the light field is shown in Fig. 3. Each pixel of a sub-aperture image is extracted from a micro-lens image, and the sub-aperture image of a given view is assembled from the pixels at the same position in the local micro-lens image coordinates, as shown in Fig. 3. Obviously, there are two light fields, one inside the camera and one in the outer world. Considering the projection of the main lens, there is a 3D projective distortion between the 3D points reconstructed from these two light fields.

Fig. 3: Optical path of a conventional light field camera [1]. There are two MPC coordinates inside the camera and in the outer world with linear transformation, i.e., and respectively.

On the other hand, for the focused light field cameras, the sampling patterns of the light field in two different optical paths are shown in Fig. 4. The micro-lenses project the distorted 3D scene inside the camera onto the image sensor, where the imaging range is controlled by the aperture of the main lens and the distances between the components. The light field inside the camera can be decoded from the pixels of the image sensor and the corresponding optical centers of the micro-lenses. In addition, these optical centers are determined by the layout of the MLA, as shown in Fig. 5b. By the coordinate transformation discussed in Sec. 3.3, the outside light field is obtained, which is the conjugate MPC coordinate outside the camera. The real-world scene can then be reconstructed from this light field without projective distortion.

Fig. 4: Optical paths of focused light field cameras with different designs [2]. There is a conjugate MPC coordinate in the outer world with the inner one .

Let the indexed pixels of a light field camera denote the raw measurements; they form a set of indexed pixels, not a physical light field. In a conventional light field camera, the indexed pixels form sub-aperture images indexed by view. In the focused light field cameras, they form micro-lens images indexed by their relative positions on the raw image. Obviously, by a linear transformation of the indexed pixels, we can recover the light field outside the camera and eliminate the 3D projective distortion caused by the main lens. However, to parameterize the 4D light field without redundancy, the spacing of the two parallel planes should be 1. According to Eqs. (5) to (8), this normalization is a linear operation on the coordinates, and the corresponding transformation matrices are all identity matrices. It means that indexed pixels can be transformed into physical rays in the real-world scene by the linear transformations discussed before. The indexed pixels and the decoded physical light field of light field cameras of the two designs are shown in Fig. 5, where pixels and physical rays are related by the intrinsic parameters.

In summary, we can transform an indexed pixel of the raw image into a normalized physical light field by a decoding matrix consisting of the intrinsic parameters.

Fig. 5: The indexed pixels and decoded physical light field of light field cameras in two designs.
(9)

Let and denote two 3D points reconstructed by and respectively. According to Eq.(9), the relationship between and is

(10)

where the transformation matrix is determined by the intrinsic parameters in the decoding matrix; its entries are entirely decided by the mapping from indexed pixels to real-world light rays.
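The decoding step of Eq. (9) can be pictured as a small matrix acting on homogeneous indexed pixels. The sketch below is only illustrative: the parameter names (k_i, k_j, k_u, k_v, u_0, v_0), the diagonal-plus-offset layout, and the sample values (chosen with magnitudes similar to Table II) are our assumptions, not the paper's exact decoding matrix.

```python
import numpy as np

# Hypothetical 6 intrinsic parameters (names and layout are our assumption).
k_i, k_j = 2.4e-4, 2.5e-4      # view-plane scaling
k_u, k_v = 2.0e-3, 1.9e-3      # image-plane scaling
u_0, v_0 = -0.32, -0.33        # image-plane offset

# Decoding matrix acting on homogeneous indexed pixels (i, j, u, v, 1).
D = np.array([
    [k_i, 0.0, 0.0, 0.0, 0.0],
    [0.0, k_j, 0.0, 0.0, 0.0],
    [0.0, 0.0, k_u, 0.0, u_0],
    [0.0, 0.0, 0.0, k_v, v_0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])

def decode(pixel):
    """Map an indexed raw pixel (i, j, u, v) to a normalized physical ray."""
    i, j, u, v = pixel
    return (D @ np.array([i, j, u, v, 1.0]))[:4]

print(decode((3, -2, 120.0, 85.0)))
```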

In addition, the light field inside a conventional light field camera (Fig. 3) can also be parameterized by an MPC model consisting of the image sensor and the MLA. However, considering the convenience of extracting sub-aperture images and the difficulty of detecting points on the raw image of a conventional light field camera, we prefer to treat the data as a set of sub-aperture images. Conversely, for the focused camera, we place the parameterization plane on the raw image plane and work with the raw image directly.

4 Light Field Camera Calibration

We verify our light field camera model by intrinsic parameter calibration. Below we provide the details of how to solve for the intrinsic parameters, including a linear closed-form solution and a nonlinear optimization that minimizes the re-projection error. In our method, the prior scene points are provided by a planar calibration board in different poses.

4.1 Linear Initialization

After necessary preprocessing, the micro-lens images are recognized [11, 21, 26], i.e., . We assume that the prior 3D point in the world coordinates is related to the 3D point in the MPC coordinates by a rigid motion, , with the rotation and translation . Let denote - column vector of . The relationship among , , and intrinsic parameters is obtained by Eqs.(2) and (10).

(11)

where is a measurement matrix of rays and . These rays are derived from the indexed pixels as mentioned in Eq.(1).

Suppose that the calibration board is on the plane of in the world coordinates, thus . To solve the unknown parameters, we simplify Eq.(11) as,

(12)

where the first term is a matrix stretched row-wise from the measurement matrix, the operator denotes the direct product, and the last term is a matrix consisting only of intrinsic and extrinsic parameters, defined as

(13)

In addition, is a matrix containing at least 2 rays from light field , according to Eq.(2). By stacking measurements from at least 3 non-collinear points , the homography can be estimated by Eq.(12).
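The homogeneous system of Eq. (12) is solved in the standard way: stack the linear constraints into a matrix and take the right singular vector corresponding to the smallest singular value. The sketch below shows that solver on a generic stand-in problem (recovering a 3x3 matrix from correspondences x' ~ Hx); the paper's actual rows come from the ray measurement matrix, not from 2D point pairs.

```python
import numpy as np

def solve_homogeneous(A):
    """Solve A @ h = 0 in the least-squares sense (DLT-style).

    Returns the right singular vector associated with the smallest
    singular value; h is defined only up to scale.
    """
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

# Toy stand-in: recover a known 3x3 matrix H from correspondences x' ~ H x.
rng = np.random.default_rng(0)
H_true = rng.normal(size=(3, 3))
rows = []
for _ in range(12):
    x = np.append(rng.normal(size=2), 1.0)
    xp = H_true @ x
    # Two independent rows of the cross-product constraint x' x (H x) = 0.
    rows.append(np.concatenate([np.zeros(3), -xp[2] * x, xp[1] * x]))
    rows.append(np.concatenate([xp[2] * x, np.zeros(3), -xp[0] * x]))
h = solve_homogeneous(np.asarray(rows))
print((h.reshape(3, 3) / h[-1]) * H_true[-1, -1])   # ~ H_true
```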

In order to derive the intrinsic parameters from the homography, we partition it to extract an upper triangular matrix. Letting its elements be indexed by row and column, we rewrite Eq. (13) as follows,

(14)

where is a matrix, i.e., top-left of .

Let the column vectors of the rotation be given. Utilizing the orthogonality and unit norm of these column vectors, we have

(15)

where .

Let a symmetric matrix denote . The analytical form of is

(16)

Note that there are only 5 distinct non-zero elements in , denoted by . To solve , we rewrite Eq.(15) as follows,

(17)

By stacking at least two such equations (from two poses) as in Eq. (17), we can obtain a unique non-zero solution, which is defined up to an unknown scale factor.

Once the symmetric matrix is determined, it is an easy matter to solve for the upper triangular factor using Cholesky factorization [27]. Denoting the estimate of this factor and indexing its elements by row and column, all intrinsic parameters except two are estimated from ratios of these elements

(18)
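The Cholesky step can be illustrated in isolation. The sketch below assumes, for illustration only, a Gram-type relation B ~ s * K^T K with K upper triangular and s an unknown scale (the paper's exact relation is given by Eqs. (15)-(16)); the Cholesky factor recovers K up to that scale, and ratios of its elements cancel the scale, which is why Eq. (18) estimates parameters from element ratios.

```python
import numpy as np

# Illustration only: assume B ~ s * K.T @ K with K upper triangular and s an
# unknown positive scale; the matrix values below are arbitrary examples.
K_true = np.array([[1.8, 0.0, 0.3],
                   [0.0, 1.7, 0.2],
                   [0.0, 0.0, 1.0]])
s = 2.5
B = s * K_true.T @ K_true

U = np.linalg.cholesky(B).T          # upper-triangular factor: B = U.T @ U
print(U / U[2, 2])                   # equals K_true (since K_true[2, 2] = 1)
print(U[0, 0] / U[1, 1], K_true[0, 0] / K_true[1, 1])   # ratios cancel the unknown scale
```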

Apart from intrinsic parameters, extrinsic parameters in different poses can be extracted as follows,

(19)

where the norm fixes the scale and the sign factor takes the value 1 or -1, decided by the image formation. In the conventional light field camera and the focused one with the shorter light path (as shown in Figs. 3 and 4b), one sign is used; otherwise, in the focused light field camera with the longer light path (see Fig. 4a), the opposite sign is used.
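For readers unfamiliar with this extraction step, the sketch below shows the generic planar-target decomposition used in pinhole calibration (fix the scale by the norm of the first column, pick the +/-1 sign, complete the rotation by a cross product). It only illustrates the roles of the norm and the sign in Eq. (19); the paper's own formula is the authoritative one.

```python
import numpy as np

def extrinsics_from_homography_columns(h1, h2, h3, sign=1.0):
    """Generic planar-target decomposition (illustrative, not the paper's Eq. (19)).

    The scale is fixed by the norm of the first column, the sign (+1 or -1)
    corresponds to the image-formation choice discussed above, and the third
    rotation column is completed by a cross product.
    """
    lam = sign / np.linalg.norm(h1)
    r1, r2 = lam * h1, lam * h2
    r3 = np.cross(r1, r2)
    R = np.column_stack([r1, r2, r3])
    # Re-project onto the closest rotation matrix to absorb estimation noise.
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, lam * h3
```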

To obtain the other two intrinsic parameters, we substitute the results of Eq. (19) into Eq. (11) using the estimated extrinsic parameters. Then Eq. (2) is rewritten as,

(20)

By stacking the measurements from different poses, we obtain a unique non-zero solution for the remaining two parameters.

4.2 Nonlinear optimization

The most common distortion of a traditional camera is radial distortion. The optical properties of the main lens and machining errors of the MLA may distort the rays in a light field camera. Theoretically, owing to the two-level imaging design with a main lens and a micro-lens array, radial distortion on the image plane and sampling distortion on the view plane exist simultaneously. In this paper, we only consider the distortion on the image plane and omit the sampling distortion on the view plane (i.e., the angular sampling grid is assumed ideal and distortion-free).

(21)

where the ray is transformed from the measurement by the intrinsic parameters according to Eq. (9), the distortion vector collects the distortion coefficients, and the undistorted projection is obtained from the distorted one in the local image coordinates of the corresponding view. In the distortion vector, the first two coefficients regulate radial distortion on the image plane, while the remaining two represent the distortion of the image plane affected by the sampling view, which is caused by non-paraxial rays of the main lens.
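One plausible reading of this 4-coefficient model is sketched below; the functional form, the coefficient names and the direction of the mapping are our assumptions (the authoritative form is Eq. (21)).

```python
def rectify_local(u_d, v_d, i, j, d):
    """Map a distorted local image point (u_d, v_d) of view (i, j) to an
    undistorted one, under an assumed 4-coefficient model: two radial terms
    on the image plane plus two view-dependent terms modulated by (i, j)."""
    d1, d2, d3, d4 = d
    r2 = u_d * u_d + v_d * v_d
    radial = 1.0 + d1 * r2 + d2 * r2 * r2
    return radial * u_d + d3 * i, radial * v_d + d4 * j

print(rectify_local(0.12, -0.08, 3, -2, (1e-2, -2e-3, 5e-4, 5e-4)))
```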

We minimize the following cost function, initialized with the solution from Section 4.1, to refine the parameters, including the intrinsic parameters, the distortion vector, and the extrinsic rotation and translation of each of the poses.

(22)

where the first term is the image point obtained from the measurement according to Eq. (9), followed by distortion rectification according to Eq. (21), and the second term is the projection of the 3D point in the world coordinates according to Eq. (1).

In Eq. (22), the rotation is parameterized by the Rodrigues formula [28]. In addition, the Jacobian matrix of the cost function is simple and sparse. This nonlinear minimization problem can be solved with the Levenberg-Marquardt algorithm based on a trust-region method [29]. We adopt MATLAB's built-in solver to perform the optimization.
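A schematic of this refinement stage in Python (the paper uses MATLAAB's built-in solver in MATLAB; this is not the authors' code): pack the intrinsics, distortion coefficients and per-pose Rodrigues/translation parameters into one vector and hand the stacked reprojection residuals to a trust-region least-squares solver. The projection stub and the parameter packing are our assumptions, shown only to make the structure of Eq. (22) concrete.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project_point(X_cam, ij, intr, dist):
    # Stub standing in for Eqs. (1), (9) and (21): local pinhole projection of
    # view (i, j) using only a spacing-like parameter intr[0]; distortion omitted.
    f, (i, j) = intr[0], ij
    return np.array([f * (X_cam[0] - i) / X_cam[2], f * (X_cam[1] - j) / X_cam[2]])

def residuals(params, observations, n_poses):
    # params = [6 intrinsics, 4 distortion coefficients, then per pose a
    # Rodrigues rotation vector and a translation vector].
    intr, dist = params[:6], params[6:10]
    res = []
    for p in range(n_poses):
        pose = params[10 + 6 * p: 16 + 6 * p]
        R = Rotation.from_rotvec(pose[:3]).as_matrix()
        t = pose[3:]
        for X_world, ij, uv_meas in observations[p]:
            res.append(project_point(R @ X_world + t, ij, intr, dist) - uv_meas)
    return np.concatenate(res)

# Usage (observations and x0 come from the corner detector and Section 4.1):
# result = least_squares(residuals, x0, args=(observations, n_poses), method="trf")
```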

0:  3D prior points and corresponding rays .
0:  Intrinsic parameters ;Extrinsic parameters , ;Distortion vector .
1:  for  to  do
2:     for each 3D point  do
3:         Generate the measurement matrix from indexed pixel Eq.(2)
4:     end for
5:     Calculate the homography matrix according to and Eq.(12)
6:  end for
7:  Calculate the matrix Eq.(17)
8:  Calculate projection matrix from using Cholesky factorization
9:  Obtain four intrinsic parameters Eq.(18)
10:  for  to  do
11:     Get extrinsic parameters and Eq.(19)
12:  end for
13:  Obtain other two intrinsic parameters Eq.(20)
14:  Initialize distortion coefficient
15:  Create the cost function according to intrinsic parameters, extrinsic parameters and distortion coefficient Eq.(22)
16:  Obtain optimized results using nonlinear LM algorithm
Algorithm 1 Light Field Camera Calibration Algorithm.

4.3 Computational Complexity

The calibration algorithm for the light field camera is summarized in Alg. 1. Let the number of samples on the view plane, the number of prior points on the calibration board, and the number of poses be given. For the measurement of each pose, a set of linear equations is solved to obtain the homography. Further linear equations are then solved to obtain the intrinsic parameters. The main complexity lies in solving the homographies of the different poses.

By contrast, the algorithm of Dansereau et al. [11] calculates a homography for every sub-aperture view, which results in higher complexity and lower accuracy in parameter initialization. The algorithm of Bok et al. [13] solves a linear system for every pose; however, intrinsic and extrinsic parameters are coupled in these equations, which causes inaccuracy in the solution.

5 Experimental Results

In this section, we verify our light field camera model by the calibration of intrinsic parameters. We present various experimental results both on simulated and real datasets. The performance is analyzed by comparing with the ground truth or baseline algorithms [11] and [13].

5.1 Simulated data

In this subsection, we verify our calibration method on simulated data. The simulated light field camera uses the intrinsic parameters of Eq. (9) listed in Table II. These parameters are close to the settings of a Lytro camera, so that the simulated input is close to real-world scenarios. The checkerboard is a pattern of corner points arranged in square cells.

5.1.1 Performance w.r.t the number of poses and views

Firstly, we test the performance with respect to the number of poses and the number of views. We vary the number of poses from 2 to 8 and the number of views over several grid sizes. For each combination of poses and views, 200 trials with independently generated calibration-board poses are performed. The rotation angles are generated randomly within a fixed range, and Gaussian noise with zero mean and a standard deviation of 0.5 pixels is added to all measurements.

The calibration results with increasing numbers of measurements are shown in Fig. 6. We find that the relative errors decrease as the number of poses increases. When the number of poses is greater than 2, all relative errors are at an acceptable level, as summarized in Table III. Meanwhile, for a fixed number of poses, the errors decrease as the number of views grows. Furthermore, the standard deviations of the relative errors in Fig. 6 are shown in Fig. 7, from which we can see that the standard deviations decrease significantly when the number of poses is greater than 2 and remain stably at a low level for larger numbers of poses and views. The results in Figs. 6 and 7 verify the effectiveness of the proposed calibration algorithm.

2.4000e-04 2.5000e-04 2.0000e-03 1.9000e-03 -0.3200 -0.3300
TABLE II: Intrinsic parameter configuration of the simulated light field camera.
Min 0.0842 0.0795 0.1019 0.1020 0.1633 0.1295
Max 2.0376 1.9238 0.6871 0.6881 1.0511 0.9298
TABLE III: Min and Max relative errors of intrinsic parameters (unit: %) on the simulated data when the number of poses is greater than 2.
Fig. 6: Relative errors of intrinsic parameters on the simulated data with different number of poses and views.

5.1.2 Performance w.r.t the measurement noise

Secondly, we employ the measurements of 3 poses and a fixed set of views to verify the robustness of the calibration algorithm. The rotation angles of the 3 poses are fixed, and Gaussian noise with zero mean and a given standard deviation is added to the projected image points. We vary the standard deviation from 0.1 to 1.5 pixels. For each noise level, we perform 150 independent trials. The mean results compared with the ground truth are shown in Fig. 8, which demonstrates that the errors increase almost linearly with the noise level. Even for noise levels larger than the normal noise in practical calibration, the errors of most parameters remain small; although the relative error of one parameter is larger, its absolute error is less than a pixel (in Eq. (9), the corresponding quantity involves the principal point of sub-aperture imaging), which further shows that the proposed algorithm is robust to higher noise levels.

Fig. 7: Standard deviations of relative errors of intrinsic parameters on the simulated data with different number of poses and views.

5.2 Physical camera

We also verify the calibration method on real-scene light fields captured by conventional and focused light field cameras. For the conventional light field camera, we use a Lytro and an Illum to obtain measurements. For the focused one, we use a self-assembled camera built according to the optical design in Fig. 4a.

5.2.1 Conventional Light Field Camera

The sub-aperture images are obtained by the method of Dansereau et al. [11]. We compare the proposed method in terms of ray re-projection error with state-of-the-art methods, including DPW by Dansereau et al. [11] and BJW by Bok et al. [13].

Fig. 8: Relative errors of intrinsic parameters on the simulated data with different noise levels from 0.1 to 1.5 pixels.

Firstly, we carry out calibration on the datasets collected by [11]. For every pose, the middle sub-apertures are utilized, similar to DPW. Table IV summarizes the root mean square (RMS) ray re-projection errors of our method and DPW [11]. In Table IV, the errors of DPW[11]-1 are taken directly from the paper, while the errors of DPW[11]-2 are obtained by running the latest released code. For the initial solution, the proposed method provides a smaller ray re-projection error than DPW except on datasets A and B. The result on dataset A is worse because of bad corner extraction in several poses (i.e., the 7th, 8th, 9th and 10th light fields). After optimization, compared with DPW [11], which employs 12 intrinsic parameters, the proposed MPC model employs only half as many parameters but achieves similar ray re-projection errors (the results on datasets A, B and D are better, while the results on datasets C and E are worse). Light fields within each dataset are taken over a range of depths and orientations, as shown in Fig. 9, and the ranges differ among datasets A, B, C, D and E. Large ranges are reasonable in all datasets, but they reduce the accuracy of the distortion model, which accounts for the shifted view; this is the main reason why the performance of the proposed method is worse than that of DPW on dataset E. From dataset A, we select 6 light fields whose corners are extracted accurately for the proposed method; the ray re-projection error then decreases markedly in Table IV.

Considering that the errors reported for DPW are the quantity minimized in its own optimization (i.e., the ray re-projection error), we additionally evaluate the mean re-projection error of DPW and BJW. As exhibited in Table V, the errors of the proposed method are clearly smaller than those of DPW and BJW. In addition, calibration with fewer poses is conducted on the datasets of [11]. For dataset D, we randomly select 6 light fields, and for datasets B, C and E, 5 light fields are randomly selected for calibration. Table VI summarizes the RMS ray re-projection errors and RMS re-projection errors of the proposed method and DPW [11]. In Table VI, the proposed method achieves smaller errors than DPW. Besides, the calibration results on datasets D and E are clearly improved by reducing the number of poses; we find that a smaller range of poses contributes to the performance improvement on datasets D and E, as shown in Fig. 9. Table VII lists the intrinsic parameter estimation results. The re-projection errors of the sub-aperture images of dataset B are summarized in Table VIII; the distribution of errors is almost homogeneous. All results verify the effectiveness of the proposed method.

A A(6) B C D E
Initial DPW[11]-1 3.2000 - 5.0600 8.6300 5.9200 13.8000
Initial DPW[11]-2 0.5190 0.4229 0.5403 0.8832 1.1021 5.9567
Initial Ours 15.3753 0.5400 0.5952 0.5837 0.7473 2.6235
Optimized DPW[11]-1 0.0835 - 0.0628 0.1060 0.1050 0.3630
Optimized DPW[11]-2 0.0822 0.0903 0.0598 0.1300 0.1149 0.3843
Optimized Ours 0.0810 0.0810 0.0572 0.1123 0.1046 0.5390
TABLE IV: RMS ray re-projection errors of initial parameter estimation and optimization with distortion rectification (unit: ). The datasets are from [11]. The (N) indicates the number of light fields used for calibration among 10 light fields of dataset A. The errors of DPW[11]-1 are provided by the paper directly, and the errors of DPW[11]-2 are obtained by running the latest released code.
A A(6) B C D E
DPW[11] 0.2284 0.3338 0.1582 0.1948 0.1674 0.3360
BJW[23] 0.3736 - 0.2589 - - 0.2742
Ours 0.2200 0.2375 0.1568 0.1752 0.1475 0.2731
TABLE V: Mean re-projection errors of optimization with distortion rectification (unit: ). The results of DPW[11] are obtained by running their latest released code. The results of BJW[23] are from their latest paper.
Ray re-projection error Re-projection error
unit: unit:
DPW[11] Ours DPW[11] Ours
B(5) 0.0643 0.0622 0.2380 0.1458
C(5) 0.1260 0.1250 0.2323 0.1705
D(6) 0.0941 0.0622 0.2024 0.1458
E(5) 0.2967 0.2888 0.3525 0.2049
TABLE VI: RMS errors of optimization with distortion rectification using fewer poses. The (N) indicates the number of light fields used for calibration.
A B C D E
2.6998e-04 2.7937e-04 2.4569e-04 2.6833e-04 2.3004e-04
2.7608e-04 2.8874e-04 2.5359e-04 2.6930e-04 2.3073e-04
1.8572e-03 1.8357e-03 1.8122e-03 1.8342e-03 1.7585e-03
1.8692e-03 1.8323e-03 1.8133e-03 1.8352e-03 1.7634e-03
-0.3417 -0.3415 -0.3550 -0.3343 -0.3520
-0.3449 -0.3344 -0.3382 -0.3275 -0.3615
0.2288 0.1829 0.1639 0.1719 0.1612
-0.0928 0.0875 0.0174 0.0213 -0.0483
-4.5308 -3.6330 -3.3591 -3.5122 2.7747
-4.4428 -3.6064 -3.3394 -3.4662 2.8320
TABLE VII: Intrinsic parameter estimation results of datasets captured by [11].
-3 -2 -1 0 1 2 3
-3 0.1930 0.1820 0.1781 0.1759 0.1759 0.1812 0.2372
-2 0.1836 0.1763 0.1700 0.1687 0.1718 0.1786 0.1813
-1 0.1815 0.1724 0.1669 0.1658 0.1692 0.1761 0.1826
0 0.1783 0.1731 0.1683 0.1662 0.1713 0.1798 0.1897
1 0.1772 0.1733 0.1706 0.1705 0.1748 0.1837 0.1851
2 0.1769 0.1761 0.1757 0.1768 0.1809 0.1836 0.1815
3 0.2039 0.1746 0.1728 0.1730 0.1798 0.1833 0.2755
TABLE VIII: RMS re-projection error of sub-apertures in dataset B (unit: ).
Fig. 9: Pose estimation results of datasets captured by [11]. Light fields used for calibration in Table VI are indicated with bold red indexes of corresponding camera poses in Figs. (c-f).
Illum-1 Illum-2 Lytro-1 Lytro-2
Initial DPW[11] 0.9355 0.6274 0.6201 0.5057
Initial BJW[13] 1.0765 0.8330 1.6676 1.0201
Initial Ours 0.7104 0.4899 0.3538 0.2364
Optimized without rectification DPW[11] 0.5909 0.4866 0.1711 0.1287
Optimized without rectification BJW[13] - - - -
Optimized without rectification Ours 0.5654 0.4139 0.1703 0.1316
Optimized with rectification DPW[11] 0.2461 0.2497 0.1459 0.1228
Optimized with rectification BJW[13] 0.3966 0.3199 0.4411 0.2673
Optimized with rectification Ours 0.1404 0.0936 0.1400 0.1124
TABLE IX: RMS ray re-projection errors of initial parameter estimation, optimizations without and with distortion rectification (unit: ).
Illum-1 Illum-2 Lytro-1 Lytro-2
3.5721e-04 2.2464e-04 5.9386e-04 3.8915e-04
3.5455e-04 2.3299e-04 5.7870e-04 3.8247e-04
1.4309e-03 1.6670e-03 9.5083e-04 1.3195e-03
1.4303e-03 1.6657e-03 9.4794e-04 1.3261e-03
-0.4565 -0.5178 -0.1964 -0.2775
-0.2827 -0.3557 -0.1865 -0.2521
0.3001 0.3562 -0.4559 0.0254
0.2779 0.2595 6.8221 0.8469
-1.4109 -0.6185 -1.3060 -2.2441
-1.4204 -0.8879 -1.3234 -2.2684
TABLE X: Intrinsic parameter estimation results of our collected datasets.
-5 -3 -1 0 1 3 5
-5 0.7880 0.2997 0.3008 0.3041 0.3058 0.3178 0.8070
-3 0.2992 0.3003 0.3033 0.3025 0.3015 0.2930 0.3077
-1 0.2988 0.3086 0.3176 0.3182 0.3141 0.2996 0.2827
0 0.2942 0.3139 0.3115 0.3058 0.3064 0.3024 0.2772
1 0.2934 0.3178 0.3118 0.2963 0.3077 0.3057 0.2784
3 0.3002 0.2966 0.3170 0.3093 0.3115 0.2856 0.2843
5 0.3283 0.2961 0.2851 0.2854 0.2841 0.2852 0.3102
TABLE XI: RMS re-projection error of sub-apertures in dataset Illum-1 (unit: ).

Unlike the core idea of DPW, BJW directly utilizes the raw data instead of sub-apertures. However, it imposes a stricter requirement on the acquisition of the calibration board: the data used for calibration must be unfocused so that the measurements are detectable, and thus some datasets provided by DPW cannot be processed by BJW, as shown in Table V (i.e., datasets C and D). In order to compare directly with DPW and BJW, we collect four additional datasets (http://www.npu-cvpg.org/opensource) using Lytro and Illum cameras. The dataset Illum-1 captures a corner pattern over 9 poses, the dataset Lytro-1 over 8 poses, and the datasets Illum-2 and Lytro-2 over 10 poses. For the Illum-1 and Illum-2 datasets, the middle views are used; for Lytro-1 and Lytro-2, the middle views are likewise used. Table IX summarizes the RMS ray re-projection errors compared with DPW and BJW at three calibration stages. As exhibited in Table IX, the proposed method obtains smaller ray re-projection errors for the initial solution, which verifies the effectiveness of the linear initialization of both intrinsic and extrinsic parameters. Moreover, the proposed method provides similar or even smaller ray re-projection errors after optimization without rectification compared with DPW. The result on dataset Lytro-2 is slightly larger than that of DPW, mainly because the distortion coefficients in our model play a role similar to elements of the decoding matrix in [11]. Considering that the MPC model employs fewer parameters (6) than DPW (12), the proposed method is competitive, with acceptable calibration performance. More importantly, we achieve smaller ray re-projection errors when distortion rectification is introduced in the optimization; the resulting errors are encouraging, and the proposed method outperforms DPW and BJW. Consequently, the 6-parameter MPC model and the 4-parameter distortion model are effective for representing light field cameras.

Fig. 10: Pose estimation results of our collected datasets.
Fig. 11: The stitching results of Illum-1 and Illum-2 datasets (the first pose is regarded as the reference view).
Fig. 12: The central view sub-aperture and distortion rectification results of first pose light field in Illum-2 dataset. The re-projection error (unit: ) of central view sub-aperture image is represented in parentheses.

The reason we compare ray re-projection errors here is to eliminate differences between the camera models. The decoding matrix in [11] is similar to Eq. (9) except for its non-diagonal elements. The non-zero off-diagonal elements indicate that pixels of the same sub-aperture image have specific relationships across different views. If we compute the estimated rays for a given measurement, the resulting views may differ, which indicates that [11] has errors on both the view plane and the image plane. As a result, it is not reasonable to compare the re-projection error alone.

Moreover, the results of intrinsic parameter estimation and pose estimation on our datasets are shown in Table X and Fig. 10, respectively. After calibration, we measure the RMS re-projection errors of the sub-aperture images using the estimated parameters, as shown in Table XI. To further verify the accuracy of intrinsic and extrinsic parameter estimation, we stitch all other light fields onto the first pose, as shown in Fig. 11; all views are registered and stitched very well. Finally, it is worth noting that the Illum camera exhibits distinct distortion. In Fig. 12, we show the original central-view sub-aperture image and the rectification results using the distortion models of the proposed method, DPW [11] and BJW [13], respectively. Since the re-projection error measures the image distance between a projected point and a rectified one, it can quantify the quality of distortion rectification. In Fig. 12, the RMS re-projection error of the central-view sub-aperture image for each method is listed in parentheses, which further verifies that the rectification results of the proposed method are better than those of the baseline algorithms.

High-precision calibration is essential in the early stages of a light field processing pipeline. To verify the accuracy of the geometric reconstruction of the proposed method compared with the baseline methods, we capture two real-scene light fields, reconstruct several typical corner points and estimate the distances between them. Fig. 13 shows the reconstruction results on the central-view sub-aperture images. As exhibited in Fig. 13, the distances between points reconstructed by the proposed method are nearly equal to the lengths measured from the real objects with rulers. In addition, Table