Micro-lens array (MLA) based light field cameras, including the conventional light field camera and the focused light field camera, can capture the radiance of light rays in both the spatial and angular dimensions, i.e., the 4D light field [3, 4]. The data from a light field camera are equivalent to narrow-baseline images from traditional cameras with coplanar projection centers. Measuring the same point from multiple directions enables or strengthens applications in computational photography and computer vision, such as digital refocusing, depth estimation, and segmentation. Recent work has also proposed methods for light field registration and stitching [9, 10] to expand the field of view (FOV). To support these applications, it is crucial to calibrate light field cameras accurately and to establish an exact relationship between the ray space and the 3D scene.
Building a model that describes the ray sampling pattern of light field cameras plays an important role. Previous approaches have addressed imaging models for light field cameras with different optical designs [11, 12, 13, 14, 15]. What they have in common is that the micro-lens is regarded as a pinhole and the main lens is described by a thin-lens model. However, several open issues remain in these models and methods. First, the proposed models focus on the angular and spatial information of rays, but the relationship between the light field and 3D scene geometry is not explored. Second, very little prior work has considered a generic model that describes light field cameras with different image formations [1, 2]. Third, the intrinsic parameters of existing light field camera models are either redundant or incomplete, so the corresponding solutions are neither effective nor efficient.
In this paper, we first propose a multi-projection-center (MPC) model based on the two-parallel-plane (TPP) representation [3, 4]. Then we derive the transformations between 3D scene geometry and 4D light rays. Based on the geometric transformations in the MPC model, we characterize various light field cameras in a generic 6-intrinsic-parameter model and present an effective intrinsic parameter estimation algorithm. Experimental results on both virtual (simulated data) and physical (Lytro, Illum, and a self-assembled focused) light field cameras verify the effectiveness and efficiency of our model.
Our main contributions are threefold:
(1) We deduce the transformations to describe the relationship between light field and scene structure.
(2) We describe light field cameras with different image formations as a generic 6-parameter model without redundancy.
(3) We propose an effective intrinsic parameter estimation algorithm for light field cameras, including a closed-form linear solution and a nonlinear optimization.
The remainder of the paper is organized as follows. Section 2 summarizes related work on the models of light field cameras and calibration methods. Section 3 introduces our MPC model and the transformations between the 3D structure and 4D light field. Based on the theory of light field parameterization, a generic 6-intrinsic-parameter light field camera model is proposed. Section 4 provides the details of our calibration method and analyzes computational complexity of the closed-form solution. In Section 5, we present extensive results on the simulated and real scene light fields, demonstrating more accurate intrinsic parameter estimation than previous work [11, 13].
2 Related Work
To acquire a 4D light field, various imaging systems have been developed from traditional cameras. Wilburn et al.  present a camera array to obtain light fields with high spatial and angular resolutions. A classic calibration approach is employed for the camera array . More generally, in the traditional multi-view geometry framework, multiple cameras in different poses are defined as a set of unconstrained rays, which is known as the Generalized Camera Model (GCM) . The ambiguity of the reconstructed scene is discussed in traditional topics . However, applications of the camera array are limited by its high cost and complex control. In contrast, the MLA enables a single camera to record a 4D light field more conveniently and efficiently, though the baseline and spatial resolution are smaller than those of a camera array. Compared with the camera array, the multiple projection centers of an MLA-based light field camera are strictly aligned on a plane due to the physical design. Recent work is devoted to intrinsic parameter calibration of light field cameras of two designs [1, 2], which differ considerably in the image pattern of the micro-lenses.
The main difference among light field cameras is the relative position of the main lens's imaging plane and the MLA plane . It determines the distribution of rays from the same point, which affects the way sub-apertures are extracted from the raw image, i.e., from the micro-lens images [21, 22]. However, measurements of the same point from multiple directions are obtained in both types of light field cameras, equivalent to the data of the GCM. Therefore, a light field camera model can draw on classic multi-view geometry theory.
Recently, several state-of-the-art methods have proposed models for the conventional light field camera, where multiple viewpoints or sub-apertures are easy to synthesize. Dansereau et al.  present a model to decode pixels into rays for a Lytro camera, where a 12-free-parameter transformation matrix relates rays to a reference plane outside the camera (in the nonlinear optimization, 10 intrinsic parameters and 5 distortion coefficients are finally estimated). However, the calibration method, which uses a traditional camera calibration algorithm, is not effective, and there are redundant parameters in the decoding matrix. Bok et al.  formulate a geometric projection model consisting of a main lens and an MLA (their extended work has been published in IEEE TPAMI ). Intrinsic parameters are estimated directly from raw images, and an analytical solution is deduced. Moreover, Thomason et al.  address the misalignment of the MLA and estimate its position and orientation.
Apart from this, other researchers have explored models of the focused light field camera, where multiple projections of the same point are easy to recognize. Johannsen et al.  propose to calibrate the intrinsic parameters of the focused light field camera. By reconstructing 3D points from the parallax between adjacent micro-lens images, the parameters (including depth distortion) are estimated. However, their camera model assumes that the geometric center of each micro image lies on its micro-lens's optical axis. This assumption causes inaccuracy in the reconstructed points, and the estimated results are finally compensated by depth distortion coefficients. Hahne et al.  further discuss the influence of this assumption, i.e., the deviation between a micro-lens and its image. Heinze et al.  apply a model similar to that of Johannsen et al.  and deduce a linear initialization for the intrinsic parameters.
In summary, previous light field camera models are either redundant or complex, which leads to a non-unique solution of the intrinsic parameter estimation or to inaccuracy in decoding the light field. An unreliable camera model is also a bottleneck that might impede light field applications in computer vision and computational photography, especially light field registration, stitching, and enhancement. To support further applications, a general light field camera model that represents rays and scene geometry more concisely is urgently needed.
3 Multi-Projection-Center Model
In this section, we first propose the MPC model based on the TPP representation of light field. Then we deduce the transformation matrix to relate 3D scene geometry and 4D rays. Finally, we utilize the MPC model to describe the image formation of light field cameras and define generic intrinsic parameters, including conventional and focused light field cameras. Table I gives the notation of symbols used in the following sections.
|Indexed pixel of raw image inside the camera|
|Virtual (conjugate) light field outside the camera|
|Decoded physical light field|
|3D point in the world coordinates|
|3D point reconstructed by|
|3D point reconstructed by|
|Rotation matrix of extrinsic parameter|
|Translation vector of extrinsic parameter|
|Measurement matrix of rays|
|Homogenous projection matrix|
|Non-homogenous projection matrix partitioned from|
|Homography matrix decided by intrinsic and extrinsic parameters only|
3.1 The Coordinates of MPC Model
As shown in Fig. 1, there are three coordinate systems in the MPC model, i.e., the 3D world coordinates , the 3D camera coordinates , and the 4D TPP coordinates ( for the view plane and for the image plane). In general, the transformation between the world and camera coordinates is given by the extrinsic parameters . In the traditional TPP representation, the spacing between the two parallel planes is normalized as to describe a set of rays [3, 4]. Although this representation is complete and concise, to derive the transformation between 3D structure and 4D rays in light field cameras, we prefer a model consisting of two parallel planes with spacing .
Let denote light field in the MPC model with the spacing . Then the ray is parameterized by two planes, i.e., and . Let denote the view plane and denote the image plane . In the MPC model, defines a ray passing and , where is the projection center and is the corresponding projection.
Given a projection center (i.e., the - view or sub-aperture) and the 3D point , we can get the image projection in the local coordinate of the - view,
Since there are multiple projection centers , , the 3D point can be observed times. Obviously, when the spacing changes to and there is only one projection center on the view plane, the image formation degenerates into the traditional central-projection camera model .
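To make this projection concrete, the sketch below computes the local image coordinates of a 3D point seen from one projection center, assuming the TPP parameterization with spacing D described above; the function and variable names are illustrative, not the paper's.

```python
def mpc_project(point, center, spacing):
    """Project a 3D point through one projection center of the MPC model.

    The ray from the projection center (s, t, 0) on the view plane through
    the point P = (X, Y, Z) meets the image plane z = spacing at a local
    offset (u, v) measured relative to that view's own origin.
    All names here are illustrative; the paper's symbols are assumed.
    """
    X, Y, Z = point
    s, t = center
    u = spacing * (X - s) / Z   # local image coordinate, x-direction
    v = spacing * (Y - t) / Z   # local image coordinate, y-direction
    return u, v
```

For a point on the optical axis of a view, the local offset is zero; with a single projection center this reduces to the central-projection case mentioned above.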
3.2 Transformation between Geometry and Rays
It is known that rays from one point in different directions enable 3D reconstruction. Let the rays intersect at a point in 3D space; we can obtain the relationship between the rays and the 3D point by triangulation,
where is a matrix consisting of rays and the MPC parameter .
If two rays and are from one 3D point , they can be represented by the following two equivalent forms,
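The triangulation above can be sketched as a small least-squares problem: each ray (s, t, u, v), with (u, v) taken as local image coordinates and D as the plane spacing, contributes two linear equations in the unknown point. This is an illustrative reconstruction under the assumed MPC notation, not the paper's exact measurement matrix.

```python
import numpy as np

def triangulate(rays, spacing):
    """Least-squares intersection of MPC rays (s, t, u, v).

    Each ray gives two linear equations in the point (X, Y, Z):
        X - (u/D) Z = s,   Y - (v/D) Z = t,
    where (u, v) are local image coordinates and D is the plane spacing.
    Notation is assumed from the surrounding text.
    """
    A, b = [], []
    for s, t, u, v in rays:
        A.append([1.0, 0.0, -u / spacing]); b.append(s)
        A.append([0.0, 1.0, -v / spacing]); b.append(t)
    X, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return X
```

With two or more rays from the same point, the stacked system is (over-)determined and the least-squares solution recovers the intersection.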
3.3 3D Projective Transformation
In fact, a linear transformation on the light field coordinates causes a 3D projective distortion on the reconstructed point , as deduced from Eqs. (3) and (4). As shown in Fig. 2, we give three examples of linear transformations: changing the spacing , scaling in the image plane () (in general there are 4 scaling factors , two in the view plane and two in the image plane), and translation in the image plane of a specific view (generally in both planes). The details are derived as follows.
where and are in the homogeneous coordinates.
(3) Let become , thus there is a transformation caused by the scaling vector . Then the transformation matrix between and is,
As shown in the left-most panel of Fig. 2, a scene with a Lambertian cube is recorded by an MPC model. The observation of the cube from multiple directions is a 4D light field. If the coordinates are linearly transformed and the light intensity stays constant, the intersections of the rays are transformed by a 3D projective matrix. Therefore, the cube is transformed according to the transformation parameters (the right three panels of Fig. 2).
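A minimal numeric illustration of this projective distortion, under the assumed two-view depth relation Z = D (s2 - s1)/(u1 - u2) implied by the triangulation above: scaling the local image coordinates by a factor k divides the recovered depth by k, so the reconstructed scene is distorted rather than rigidly moved.

```python
def depth_from_two_views(s1, u1, s2, u2, spacing):
    """Depth of the intersection of two MPC rays in the x-z plane:
    Z = D * (s2 - s1) / (u1 - u2). Notation assumed from the text."""
    return spacing * (s2 - s1) / (u1 - u2)

# A point at Z = 5 seen from centers s = 0 and s = 1 with D = 1:
# u1 = (1 - 0)/5 = 0.2, u2 = (1 - 1)/5 = 0.0.
z = depth_from_two_views(0.0, 0.2, 1.0, 0.0, 1.0)          # depth 5.0
# Scaling the image-plane coordinates by k = 2 halves the recovered depth,
# i.e. the reconstruction is projectively distorted.
z_scaled = depth_from_two_views(0.0, 0.4, 1.0, 0.0, 1.0)   # depth 2.5
```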
3.4 The MPC Model in Light Field Cameras
Light field cameras are developed from traditional cameras. They record real-world scenes in different but similar ways. In a traditional camera, the central projection onto a 2D image is a dimension reduction of 3D space . In a light field camera, the 3D structure projected by the main lens is arranged on the image sensor by the design of the light path. The processes of multiple central projections are analyzed as follows.
On the one hand, for a conventional light field camera, the sampling pattern of the light field is shown in Fig. 3. The pixel of a sub-aperture image is extracted from the micro-lens of . The sub-aperture image of the view is extracted from the pixels in the local micro-lens image coordinates, as shown in Fig. 3. Obviously, there are two light fields, i.e., inside the camera and in the outer world. Considering the projection of the main lens, there is a 3D projective distortion between the 3D points reconstructed from and .
On the other hand, for the focused light field cameras, two sampling patterns of the light field in two different optical paths are shown in Fig. 4. The micro-lenses project the distorted 3D scene inside the camera onto the image sensor, where the image range is controlled by the aperture of the main lens and the distance between components. The light field inside the camera can be decoded from the pixels of the image sensor and the corresponding optical centers of the micro-lenses , i.e., . In addition, is determined by the layout of the MLA, as shown in Fig. 5b. By the coordinate transformation discussed in Sec. 3.3, the outside light field is obtained, which is the conjugate MPC coordinate system outside the camera. The real-world scene can be reconstructed from this light field without projective distortion.
Let denote the indexed pixels of light field cameras with . Note that is a set of indexed pixels, not a physical light field. In a conventional light field camera, are the sub-aperture images indexed by view. In the focused light field cameras, are micro-lens images indexed by their relative positions in the raw image. Obviously, by a linear transformation on , we can obtain and eliminate the 3D projective distortion caused by the main lens. However, to parameterize the 4D light field without redundancy, the spacing of the two parallel planes should be 1. Let denote the normalized light field. According to Eqs. (5) to (8), the normalization is a linear operation on the coordinates, and the transformation matrices , and are all identity matrices. This means that indexed pixels can be transformed into physical rays in the real-world scene by the linear transformations discussed above. The indexed pixels and the decoded physical light field of light field cameras of the two designs are shown in Fig. 5, where pixels and physical rays are related by the intrinsic parameters.
In summary, we can transform an indexed pixel of the raw image into a normalized physical light field by a decoding matrix consisting of the intrinsic parameters .
Let and denote two 3D points reconstructed by and respectively. According to Eq.(9), the relationship between and is
where is determined by the intrinsic parameters in the decoding matrix . Here, and , which are entirely determined by the mapping from indexed pixels to real-world light rays.
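As an illustration of such a decoding, the sketch below maps an indexed pixel (i, j, k, l) to a ray (s, t, u, v) with a 6-parameter diagonal-plus-offset matrix. The exact layout of the paper's decoding matrix is assumed, and the parameter names (k_i, k_j, k_u, k_v, u0, v0) are hypothetical.

```python
import numpy as np

def decode(pixel_index, k_i, k_j, k_u, k_v, u0, v0):
    """Map an indexed pixel (i, j, k, l) to a physical ray (s, t, u, v).

    A plausible 6-parameter diagonal-plus-offset decoding matrix, sketched
    from the text; the paper's exact matrix layout is assumed.
    """
    i, j, k, l = pixel_index
    D = np.array([
        [k_i, 0.0, 0.0, 0.0, 0.0],   # view-plane scaling
        [0.0, k_j, 0.0, 0.0, 0.0],
        [0.0, 0.0, k_u, 0.0, u0 ],   # image-plane scaling and offset
        [0.0, 0.0, 0.0, k_v, v0 ],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ])
    return (D @ np.array([i, j, k, l, 1.0]))[:4]
```

Because the matrix is linear in the pixel indices, it induces exactly the kind of 3D projective transformation on reconstructed points discussed in Sec. 3.3.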
In addition, the light field inside a conventional light field camera (Fig. 3) can also be parameterized by the MPC model consisting of the image sensor and the MLA. However, considering the convenience of extracting sub-aperture images and the difficulty of detecting points in the raw image of a conventional light field camera, we prefer to treat the data as a set of sub-aperture images. Conversely, for the focused camera, we model the parameterization plane by the raw image plane and process the raw image directly.
4 Light Field Camera Calibration
We verify our light field camera model by intrinsic parameter calibration. We provide the details of how to solve for the intrinsic parameters, including a linear closed-form solution and a nonlinear optimization that minimizes the re-projection error. In our method, the prior scene points are provided by a planar calibration board in different poses.
4.1 Linear Initialization
After the necessary preprocessing, the micro-lens images are recognized [11, 21, 26], i.e., . We assume that the prior 3D point in the world coordinates is related to the 3D point in the MPC coordinates by a rigid motion, , with rotation and translation . Let denote the - column vector of . The relationship among , , and the intrinsic parameters is obtained from Eqs. (2) and (10).
where is a measurement matrix of rays and . These rays are derived from the indexed pixels as mentioned in Eq.(1).
Suppose the calibration board lies on the plane in the world coordinates, so . To solve for the unknown parameters, we simplify Eq. (11) as,
where is a matrix stretched on row from . is a direct product operator. is a matrix only consisting of intrinsic and extrinsic parameters, defined as
In order to derive intrinsic parameters from , we can partition to extract a upper triangle matrix . Let denote the element on the - row and - column of , we rewrite Eq.(13) as follows,
where is a matrix, i.e., top-left of .
Let denote the - column vector of . Utilizing the orthonormality of , we have
Let a symmetric matrix denote . The analytical form of is
Note that there are only 5 distinct non-zero elements in , denoted by . To solve , we rewrite Eq.(15) as follows,
By stacking at least two such equations (from two poses) as in Eq. (17), we can obtain a unique nonzero solution for , defined up to an unknown scale factor.
Once is determined, it is straightforward to solve for using Cholesky factorization . Let denote the estimate of , i.e., . Let denote the element in the - row and - column of ; the intrinsic parameters except and are estimated from ratios of these elements
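The Cholesky-based recovery step can be sketched generically: given a symmetric matrix B known only up to scale, with B proportional to K^{-T} K^{-1} for an upper-triangular K, factor B and invert, as in Zhang-style calibration. The paper's exact partitioning of the projection matrix is assumed; this is a generic sketch of the algebra, not the paper's specific formulas.

```python
import numpy as np

def intrinsics_from_B(B):
    """Recover an upper-triangular K from a symmetric B ~ K^{-T} K^{-1}
    known up to an unknown (possibly negative) scale.

    Generic Zhang-style step; the paper's exact matrix partitioning
    is assumed.
    """
    # Fix the unknown sign so that B is positive definite.
    if np.linalg.eigvalsh(B)[0] < 0:
        B = -B
    L = np.linalg.cholesky(B)       # B = L L^T, L lower triangular
    K = np.linalg.inv(L.T)          # K^{-1} = L^T  =>  K upper triangular
    return K / K[-1, -1]            # normalize away the remaining scale
```

The normalization by the last diagonal entry removes the residual scale ambiguity, mirroring the "ratio of elements" step described above.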
Apart from intrinsic parameters, extrinsic parameters in different poses can be extracted as follows,
where denotes the norm. takes the value 1 or -1, determined by the image formation. In the conventional light field camera and the focused one with the shorter light path (as shown in Figs. 3 and 4b), makes . Otherwise, in the focused light field camera with the longer light path (see Fig. 4a), makes .
Stacking the measurements from different poses, we obtain a unique nonzero solution for and .
4.2 Nonlinear Optimization
The most common distortion in a traditional camera is radial distortion. The optical properties of the main lens and the physical machining error of the MLA may lead to ray distortion in a light field camera. Theoretically, due to the two-level imaging design with a main lens and a micro-lens array, radial distortion on the image plane and sampling distortion on the view plane should exist simultaneously. In this paper, we only consider distortion on the image plane and omit sampling distortion on the view plane (i.e., the angular sampling grid is assumed ideal and distortion-free).
where and is the ray transformed from the measurement by the intrinsic parameters according to Eq. (9). denotes the distortion vector, and is the undistorted projection of the distorted one in the local image coordinates of the view. In the distortion vector , and regulate radial distortion on the image plane, while and represent the distortion of the image plane affected by the sampling view , which is caused by non-paraxial rays of the main lens.
We minimize the following cost function, initialized with the solution of Section 4.1, to refine the parameters, including the intrinsic parameters , the distortion vector , and the extrinsic parameters and , , where is the number of poses.
In Eq.(22), is parameterized by Rodrigues formula . In addition, the Jacobian matrix of cost function is simple and sparse. This nonlinear minimization problem can be solved with the Levenberg-Marquardt algorithm based on trust region method . We adopt MATLAB’s function to complete the optimization.
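The structure of this refinement can be sketched on a toy problem: recover a plane spacing D and an image-plane offset u0 from stacked projection residuals with a few Gauss-Newton steps. The actual optimization refines all intrinsic, distortion, and pose parameters with a Levenberg-Marquardt/trust-region solver; the model and all names below are deliberately simplified and illustrative.

```python
import numpy as np

# Toy refinement in the spirit of the paper's nonlinear step: recover the
# plane spacing D and an image-plane offset u0 from observed projections
# u = D * (X - s)/Z + u0 by Gauss-Newton on the stacked residuals.
rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, 50)               # view-plane projection centers
X = rng.uniform(-2.0, 2.0, 50)               # scene points (x-coordinate)
Z = rng.uniform(4.0, 8.0, 50)                # scene points (depth)
D_true, u0_true = 1.5, 0.2
u_obs = D_true * (X - s) / Z + u0_true       # noise-free observations

p = np.array([1.0, 0.0])                     # initial guess for (D, u0)
for _ in range(20):
    r = p[0] * (X - s) / Z + p[1] - u_obs    # stacked residual vector
    J = np.column_stack([(X - s) / Z, np.ones_like(s)])  # Jacobian of r
    p -= np.linalg.lstsq(J, r, rcond=None)[0]            # Gauss-Newton step
# p is now close to (D_true, u0_true)
```

As in the paper, the Jacobian here is simple and sparse; in this linear toy model a single Gauss-Newton step already reaches the minimizer.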
4.3 Computational Complexity
The calibration algorithm for light field cameras is summarized in Alg. 1. Let denote the number of samples on the view plane, the number of prior points on the calibration board, and the number of poses, respectively. For the measurements of each pose, there are linear equations to solve for . Then linear equations and equations are solved to obtain the intrinsic parameters. The main complexity lies in the solution of from the different poses, i.e., .
By contrast, the algorithm of Dansereau et al.  calculates a homography for every sub-aperture image in each view, i.e., . It suffers from higher complexity and lower accuracy in parameter initialization. The algorithm of Bok et al.  solves a linear equation for every pose, and its computational complexity is . However, the equations contain both intrinsic and extrinsic parameters, which causes inaccuracy in the solution.
5 Experimental Results
In this section, we verify our light field camera model by the calibration of intrinsic parameters. We present various experimental results both on simulated and real datasets. The performance is analyzed by comparing with the ground truth or baseline algorithms  and .
5.1 Simulated data
In this subsection, we verify our calibration method on simulated data. The simulated light field camera has the properties listed in Table II, referring to Eq. (9). These parameters are close to the settings of a Lytro camera, so we obtain plausible input close to real-world scenarios. The checkerboard is a pattern of points with cells.
5.1.1 Performance w.r.t the number of poses and views
Firstly, we test the performance with respect to the number of poses and the number of views. We vary the number of poses from 2 to 8 and the number of views from to . For each combination of poses and views, 200 independent trials of calibration board poses are generated. The rotation angles are randomly drawn from to , and the measurements are perturbed with Gaussian noise of zero mean and standard deviation 0.5 pixels.
The calibration results with increasing measurements are shown in Fig. 6. We find that the relative errors decrease as the number of poses increases. When the number of poses is greater than 2, all the relative errors are within an acceptable level, as summarized in Table III. Meanwhile, the errors decrease as the number of views grows when the number of poses is fixed. In particular, when and , all the relative errors are less than . Furthermore, the standard deviations of the relative errors in Fig. 6 are shown in Fig. 7, from which we can see that they decrease significantly when the number of poses is greater than 2. In particular, when and , the standard deviations remain stably at a low level. The results in Figs. 6 and 7 verify the effectiveness of the proposed calibration algorithm.
5.1.2 Performance w.r.t the measurement noise
Secondly, we use the measurements of 3 poses and views to verify the robustness of the calibration algorithm. The rotation angles of the 3 poses are , and , respectively. Gaussian noise with zero mean and standard deviation is added to the projected image points. We vary from to pixels with a step. For each noise level, we performed 150 independent trials. The mean results compared with the ground truth are shown in Fig. 8, which demonstrates that the errors increase almost linearly with the noise level. For pixels, which is larger than the normal noise in practical calibration, the errors of and are less than . Although the relative error of is , the absolute error of is less than pixel (in Eq. (9), and , where is the principal point of sub-aperture imaging), which further shows that the proposed algorithm is robust to higher noise levels.
5.2 Physical camera
We also verify the calibration method on real-scene light fields captured by conventional and focused light field cameras. For the conventional light field camera, we use a Lytro and an Illum to obtain measurements. For the focused one, we use a self-assembled camera following the optical design in Fig. 4a.
5.2.1 Conventional Light Field Camera
The sub-aperture images are obtained by the method of Dansereau et al. . We compare the proposed method in terms of ray re-projection error with state-of-the-art methods, including DPW by Dansereau et al.  and BJW by Bok et al. .
Firstly, we carry out calibration on the datasets collected with . For every pose, the middle sub-apertures are utilized, similar to DPW. Table IV summarizes the root mean square (RMS) ray re-projection errors of our method and DPW. In Table IV, the errors of DPW-1 are taken directly from the paper, while the errors of DPW-2 are obtained by running their latest released code. For the initial solution, the proposed method yields a smaller ray re-projection error than DPW except on datasets A and B. The result on dataset A is worse because of bad corner extraction in several poses (i.e., the 7th, 8th, 9th and 10th light fields). For the optimized solution, compared with DPW , which employs 12 intrinsic parameters, the proposed MPC model employs only half as many parameters but achieves similar ray re-projection errors (better on datasets A, B and D, worse on datasets C and E). Light fields within each dataset are taken over a range of depths and orientations, as shown in Fig. 9. The ranges of datasets A and B are , whereas the ranges of datasets C and D do not exceed , and the range of dataset E does not exceed . Large pose ranges in a dataset reduce accuracy only with respect to the distortion model considering the shifted view; this is the main reason why the proposed method performs worse than DPW on dataset E. From dataset A, we select 6 light fields in which the corners are extracted exactly for the proposed method; the ray re-projection error then decreases markedly in Table IV. Considering that the errors reported by DPW are the quantity minimized in its own optimization (i.e., the ray re-projection error), we additionally evaluate performance in terms of the mean re-projection error of DPW and BJW. As shown in Table V, the errors of the proposed method are clearly smaller than those of DPW and BJW. In addition, we perform calibration with fewer poses on these datasets .
For dataset D, we randomly select 6 light fields, and for datasets B, C and E, 5 light fields are randomly selected for calibration. Table VI summarizes the RMS ray re-projection errors and RMS re-projection errors of the proposed method and DPW , respectively. In Table VI, the proposed method achieves smaller errors than DPW. Besides, the calibration results on datasets D and E are clearly improved by reducing the number of poses; we find that a smaller range of poses contributes to the performance improvement on these datasets, as shown in Fig. 9. Table VII lists the intrinsic parameter estimation results. The re-projection errors of the sub-aperture images of B are summarized in Table VIII; the distribution of errors is almost homogeneous. All results verify the effectiveness of the proposed method.
|Ray re-projection error||Re-projection error|
Unlike the core idea of DPW, BJW directly utilizes raw data instead of sub-apertures. However, it has a stricter requirement on the acquisition of the calibration board: the data for calibration must be unfocused so that the measurements are detectable; thus some datasets provided by DPW are incalculable for BJW, as shown in Table V (i.e., datasets C and D). To compare directly with DPW and BJW, we collect 4 additional datasets (http://www.npu-cvpg.org/opensource) using Lytro and Illum cameras. The dataset Illum-1 shoots corners with cells, including 9 poses. The dataset Lytro-1 shoots corners with cells, including 8 poses. The datasets Illum-2 and Lytro-2 shoot corners with cells, including 10 poses. For the Illum-1 and Illum-2 datasets, the middle views are used ( views in total). For Lytro-1 and Lytro-2, the middle views are used ( views in total). Table IX summarizes the RMS ray re-projection errors compared with DPW and BJW at three calibration stages. As exhibited in Table IX, the proposed method obtains smaller ray re-projection errors for the initial solution, which verifies the effectiveness of the linear initialization for both intrinsic and extrinsic parameters. Besides, the proposed method provides similar or even smaller ray re-projection errors after optimization without rectification compared with DPW. Note that the result on dataset Lytro-2 is relatively larger than that of DPW; the main reason is that the distortion coefficients and in our model play a role similar to the elements of the decoding matrix in . Considering that the MPC model employs fewer parameters (6) than DPW (12), the proposed method is competitive with acceptable calibration performance. More importantly, we achieve smaller ray re-projection errors when distortion rectification is introduced in the optimization. The ray re-projection errors are encouraging: the proposed method outperforms DPW and BJW.
Consequently, the 6-parameter MPC model and 4-parameter distortion model are effective to represent light field cameras.
The reason we compare ray re-projection errors here is to eliminate differences between the camera models. The decoding matrix in  is similar to Eq. (9), except for the non-diagonal elements. The non-zero elements and indicate that pixels in the same sub-aperture image have specific relationships across views. If we compute the estimated rays by for the measurement , the views may differ, indicating that there are errors on both the view plane and the image plane in . As a result, it is not reasonable to compare the re-projection error alone.
Moreover, the results of intrinsic parameter estimation and pose estimation on our datasets are shown in Table X and Fig. 10, respectively. After the calibration process, we measure the RMS re-projection errors of the sub-aperture images using the estimated parameters, as shown in Table XI. To further verify the accuracy of the intrinsic and extrinsic parameter estimation, we stitch all the other light fields onto the first pose, as shown in Fig. 11, from which we can see that the light fields of all views are registered and stitched very well. Finally, it is worth noting that there is distinct distortion in the Illum camera. In Fig. 12, we show the original central-view sub-aperture image and the rectification results using the distortion models of the proposed method, DPW  and BJW , respectively. Since the re-projection error measures the image distance between a projected point and a rectified one, it can be used to quantify the error of the distortion rectification. In Fig. 12, we list in parentheses the RMS re-projection error of the central-view sub-aperture image for each method, which further verifies that the rectification results of the proposed method are better than those of the baseline algorithms.
High-precision calibration is essential in the early stages of the light field processing pipeline. To verify the accuracy of the geometric reconstruction of the proposed method compared with the baseline methods, we capture two real-scene light fields, reconstruct several typical corner points, and estimate the distances between them. Fig. 13 shows the reconstruction results on the central-view sub-aperture images. As exhibited in Fig. 13, the distances between points reconstructed by the proposed method are nearly equal to the lengths measured on the real objects with rulers. In addition, Table