1 Introduction
Recent years have witnessed tremendous development in light field (LF) technologies. LF cameras, devices that collect light rays on the image sensor in both spatial and angular dimensions within a single photographic exposure, are designed to realize the LF technique in various engineering applications. Owing to their capability of recording the angular information of light rays entering from the external world, LF cameras have boosted the development of computational photography and computer vision in a wide variety of applications, including refocusing
[1], depth estimation
[2], [3], [4], synthetic aperture imaging [5], [6] and visual Simultaneous Localization and Mapping (SLAM) [7], [8]. In early years, many types of LF cameras were proposed. Wilburn et al. [9] designed a large camera array with high spatial and angular resolutions to acquire 4D LF. Veeraraghavan et al. [10] proposed a mask-equipped LF camera to capture conventional 2D photos at full sensor resolution. Liang et al. [11] demonstrated a type of LF camera that captures LF at full sensor resolution through multiple exposures with an adjustable aperture. Marwah et al. [12] proposed a compressive LF camera architecture, which allows for higher-resolution LF recovery. Taguchi et al. [13] described a spherical catadioptric imaging system using axial-cone cameras to acquire LF images with a broader field of view.
In contrast to the above designs, microlens array (MLA) based LF cameras soon became popular due to their low manufacturing cost and the high quality of the LF images they produce. State-of-the-art MLA-based LF cameras include the conventional LF camera designed by Ng et al. [14] and the focused LF camera designed by Georgiev and Lumsdaine [15]. Ng et al. significantly shortened the optical path and designed a portable camera by inserting an MLA between the sensor and the main lens, creating a conventional LF camera that enables shorter exposures and lower image noise. Georgiev and Lumsdaine presented a modified LF camera, in which the MLA is interpreted as an imaging system focused on the focal plane of the main lens. This camera captures LF with higher spatial resolution but lower angular resolution. Well-known commercial MLA-based LF cameras developed on the basis of [14] and [15] are Lytro [16] and Raytrix [17].
To achieve optimal performance in these applications, it is of fundamental importance to accurately calibrate LF cameras beforehand. In a typical LF camera calibration pipeline, the camera projection model and corner detection are two key components. A projection model describes the relationship between pixels on the camera sensor and the rays coming into the camera, or the 3D scene geometry. A corner detection algorithm is applied to accurately detect feature points on the checkerboard, building the correspondence between 3D points on the checkerboard grid and their corresponding 2D image points. Great progress has been made in these two areas in recent years. Well-known projection models include the ones describing the relationship between pixels and rays [18], [19], the one describing the relationship between pixels and 3D scene geometry [20], and the one describing the relationship between 3D features of the raw data and 3D scene geometry [21]. For corner detection, research in [18], [20] extracted checkerboard corners in each individual subaperture image. Differently, the algorithms proposed in [22], [23] and [24] directly utilized the raw image to detect corners. Improving the accuracy of projection models and corner detection methods remains an open problem.
In this paper, a Sub-Aperture Related Bipartition (SARB) projection model is proposed to describe the LF camera with two sets of parameters, targeting the center view subaperture and the relations between subapertures, respectively. Correspondingly, a two-step calibration method is proposed, with each step dealing with one set of parameters. Moreover, a corner detection algorithm that fully utilizes the 4D LF data from the raw image is also proposed.
Our main contributions are as follows:
We propose a projection model in a simple form with the capability to characterize LF cameras without redundancy. This projection model consists of two sets of parameters targeting the center view subaperture and the relations between subapertures. By doing so, calibration methods designed for traditional cameras can be reused to estimate the intrinsic matrix of the pinhole camera at the center view subaperture. Another advantage of separating the parameters is that both lateral-direction errors and depth-direction errors can be reduced, since the depth-direction error is only affected by the parameters representing the relations between subapertures.
We propose a corner detection method for LF camera calibration which jointly uses the 4D LF information. The detection accuracy as well as the robustness of the method are significantly improved.
The rest of this paper is organized as follows. Related works are investigated in Section 2. The proposed SARB projection model is detailed in Section 3. The procedures of the proposed calibration method are described in Section 4. The details of the proposed corner detection algorithm are provided in Section 5. Experimental results are given in Section 6. Finally, Section 7 concludes the paper.
2 Related Works
Different algorithms have been proposed to solve the LF camera calibration problem related to the projection model and corner detection [18], [22], [23], [20], [21], [25], [19], [24], [26]. For the camera projection model, Dansereau et al. [18] were the first to deliver an end-to-end geometric calibration method for the conventional LF camera. A 12-parameter homogeneous matrix was proposed to express the projection model, which connects every pixel on the sensor with the corresponding ray coming into the camera. Nevertheless, this method involves too many parameters, and its redundancy was analyzed in [20]. Duan et al. [19] modeled the imaging process of conventional LF cameras and proposed a homogeneous intrinsic matrix, which describes the relationship between pixels and rays in a more compact form compared with [18]. Zhang et al. [20] proposed a multi-projection-center (MPC) model with six intrinsic parameters to characterize both conventional and focused LF cameras. The projection model efficiently connects 3D geometry to the recorded light field with a simple form. O'Brien et al. [21]
proposed a novel projection model which relates the plenoptic disc features extracted from the raw image to 3D scene points, i.e., a 3D-to-3D correspondence is made between the image and the 3D scene geometry. It should, however, be noted that the parameters involved in the above projection models, both the ones related to information inside a subaperture and the ones corresponding to information between subapertures, are deduced together. This non-separate process can trigger large errors in the depth direction in the final result, as discussed in Section 6.3. Different from the above research, where parameters in the projection models are determined together, Zhou et al.
[25] proposed a camera model with a clear separation of the involved parameters, namely the ones describing the main lens and the ones describing the MLA. A two-step calibration method was proposed accordingly. This projection model is, however, of higher dimensionality and requires additional camera parameters (e.g., pixel size) to be included. For corner detection, the algorithms described in [18], [20] and [25] are based on subaperture images, which are generated by sampling pixels at regular grids. Under this circumstance, corner detection is performed in each individual subaperture image. The corner locations across all subaperture images are not guaranteed to lie along a specific disparity line. In other words, relations between different subapertures are ignored in these algorithms. Other calibration algorithms, including [23] and [26], are based on raw images. Noury et al. [23]
proposed a detection algorithm with two steps. They first classify the microlens images in the raw image by their content type; a pattern registration method then detects subpixel corner locations in each microlens image. Nousias et al.
[26] detected corner locations in each microlens image by calculating the intersection of two potential saddle axes which are retrieved beforehand. However, these two methods are only applicable to the focused LF camera, which captures clear and high-resolution microlens images. For conventional LF cameras, it is difficult to directly detect corner locations in microlens images that are of extremely low resolution and blurred. Instead of detecting corners, Bok et al. [22] innovatively extracted line features from microlens images and proposed a geometric calibration method. Nevertheless, the line feature template was designed for each single microlens image, which ignores the relations between different microlens images.

3 SARB Projection Model
In this section, the Sub-Aperture Related Bipartition (SARB) projection model is proposed to establish the relationship between the rays coming into the camera and the pixels on the sensor. To do so, we first parameterize the rays with the space-angle TPP (two-plane parameterization) model [27], [28], [29] and the pixels with 4D indices. Then two sets of parameters are proposed to build the SARB projection model, with one set forming the intrinsic matrix of the pinhole camera at the center view subaperture, and the other related to generating the subapertures of other views.
3.1 Parameterization of Rays and Pixels
To parameterize light rays, the space-angle TPP model as used in [27], [28], [29] is employed. In this model, a light ray is characterized by its intersections with two parallel planes whose distance apart is one unit length, as shown in Fig. 2. Let P denote the intersection between the ray and the first plane, and Q the intersection between the ray and the second plane. (x, y) denotes the 2D coordinates of P on the first plane, and (u, v) denotes the offset of Q relative to P in the horizontal and vertical directions, respectively. In the space-angle TPP model, a ray is thus parameterized by (x, y, u, v). In this paper, the main lens plane is regarded as the first plane, and the second plane is the plane at one unit length from the main lens, as shown in Fig. 1.
In the proposed algorithm, the main lens is modeled as a thin lens and the microlenses are treated as a pinhole array. Without loss of generality, the optical center of the main lens is defined as the origin of the coordinate system and the optical axis as its Z axis, with the Z axis pointing towards the outside of the camera. The Y axis points upwards, and the X axis is determined by the right-hand rule. The optical path of the microlens-based LF camera in 2D space is shown in Fig. 1. With the space-angle TPP model, the light ray outside the camera emitted from a 3D scene point is described by (x, y, u, v), and the light ray refracted by the main lens is described by (x, y, u', v').
After the parameterization of rays, pixels on the image sensor are expressed by 4D coordinates. On the one hand, a pixel on the sensor can be expressed using its spatial location in the camera coordinate system, consisting of the center of the subimage (the microlens image) that the pixel belongs to and the offset of the pixel relative to that center, as shown in Fig. 1. On the other hand, a pixel can also be represented using its pixel indices, consisting of the center of the subimage in image coordinates and the corresponding offset. The relationship between the two representations is
(1)  
where the reference indices are those of the pixel where the optical axis intersects the sensor plane, and the scale factor is the physical distance between two adjacent pixels.
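The index-to-physical conversion in (1) can be sketched as follows; since the paper's symbols are not reproduced above, all names here (`pitch`, `i0`, `j0`, and the index arguments) are our own illustrative choices:

```python
# Hypothetical sketch of the index-to-physical conversion in (1).
# 'pitch' is the physical distance between two adjacent pixels; (i0, j0) is the
# index of the pixel where the optical axis intersects the sensor plane.
def index_to_physical(ic, jc, di, dj, i0, j0, pitch):
    """Convert 4D pixel indices (subimage center + offset) to physical coords."""
    xc = (ic - i0) * pitch   # physical subimage-center coordinates
    yc = (jc - j0) * pitch
    dx = di * pitch          # physical offset within the subimage (relative,
    dy = dj * pitch          # so only scaled, not shifted)
    return xc, yc, dx, dy
```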
3.2 SARB Projection Model
In this subsection, to introduce the SARB model in a readily comprehensible way, the subsequent analysis is performed in 2D space. The 4D version of the proposed concept can be easily deduced from this analysis.
According to the Gaussian formula in ray transfer matrix theory [30], the relationship between a ray (x, u) outside the camera and the ray (x, u') refracted by the main lens inside the camera is given by

u' = u - x / f,    (2)

where f denotes the focal length of the main lens.
Then, inside the camera, according to Fig. 2, the ray corresponding to a pixel can be expressed using a similar-triangles argument as
(3) 
and
(4)  
where L is the distance between the main lens and the MLA, and l is the distance between the MLA and the sensor plane. Substituting (3) and (4) into (2), we can further obtain
(5) 
After replacing the physical coordinates of the pixel with the pixel indices using (1), we obtain
(6)  
(7) 
We have now expressed the relationship between the rays and pixels using the projective matrix in (7), which is formed by the physical parameters of the LF camera. Based on (7), two sets of parameters are further derived, i.e., one forming the intrinsic matrix of the pinhole camera at the center view subaperture and the other related to generating the subapertures of other views.
3.2.1 Intrinsic matrix of the pinhole camera at center view subaperture
As described in [31], rays passing through the pixels with the same offset in different subimages come from a single subaperture on the main lens. In particular, the pixels with zero offset exactly form the image of the subaperture in the center view; for these pixels the physical offset on the sensor also vanishes. In this circumstance, (6) can be simplified to
(8)  
The inverse form of (8) is represented as
(9) 
By defining
(10) 
and
(11) 
equation (9) can be rewritten as
(12)  
where the homogeneous vector denotes a 3D scene point that the ray passes through. Equation (12) indicates that the matrix block composed of the four elements at the bottom right of the projective matrix is exactly the inverse of the intrinsic matrix of the pinhole camera at the center view subaperture.
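As an illustration of the center-view pinhole relation implied by (12), the following sketch builds a standard intrinsic matrix and back-projects a center-view pixel to a ray direction; the parameter names `fx`, `fy`, `cx`, `cy` are the conventional ones, not necessarily the paper's notation:

```python
import numpy as np

def intrinsic_matrix(fx, fy, cx, cy):
    """Standard pinhole intrinsic matrix for the center-view subaperture
    (illustrative parameter names)."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def backproject(K, pixel):
    """Back-project a center-view pixel through the inverse intrinsic matrix,
    as in (12); returns a ray direction with unit z-component."""
    p = np.array([pixel[0], pixel[1], 1.0])
    return np.linalg.inv(K) @ p
```

The principal point maps to the optical axis: back-projecting it yields the direction (0, 0, 1).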
3.2.2 Parameters to generate images of non-center view subapertures
For a simple derivation, the following analysis is first carried out in 2D space. Suppose a ray is emitted from a scene point; using a similar-triangles argument in Fig. 1, the relationship between the scene point and the ray is
(13) 
Substituting (6) into (13), the relationship between the scene point and the pixel is
(14) 
where
(15) 
and
(16) 
Furthermore, we derive the relationship between the depth of a scene point and its disparity in the raw image based on (14). Concretely, considering two pixels which record rays coming from the same scene point, we can derive
(17)  
which is further expressed as
(18) 
By defining
(19) 
(18) can be rewritten as
(20) 
In fact, the quantity defined in (19) can be regarded as the disparity between adjacent subaperture images if a marginal scale factor is neglected. A simple interpretation is as follows: the difference of pixel offsets can be regarded as the difference of indices of two subapertures, due to the one-to-one correspondence between the offset and the subaperture, and this difference multiplied by the disparity parameter is proportional to the disparity of the image points of the scene point in the two subaperture images. To be more precise, the difference of the pixel coordinates between two subaperture images is the disparity, and it equals the quantity above after changing the unit from pixels in the raw image to indices of microlenses. Given the image of the center view subaperture, the images of the other view subapertures can be obtained using (19) and (20). More specifically, the disparity of a 3D scene point is first calculated by (20). Then the projected point that this 3D scene point corresponds to in the image of a given subaperture is calculated by (19), where the center view subaperture has index offset zero; the pixel of the projected point in the center view subaperture is a known quantity, and the pixel of the projected point in the given subaperture is to be determined.
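The behavior described above can be sketched under two assumptions that are ours, not reproduced from the paper: that (20) has the common affine disparity-depth form `lam = a + b / z`, and that (19) shifts the center-view point by the disparity times the subaperture index offset:

```python
import numpy as np

def disparity_from_depth(z, a, b):
    """Assumed form of (20): disparity between adjacent subapertures vs. depth;
    a and b are hypothetical camera parameters."""
    return a + b / z

def project_to_subaperture(p_center, lam, s, s_center=(0.0, 0.0)):
    """Assumed form of (19): shift the center-view image point by the disparity
    times the subaperture index offset (s - s_center)."""
    p = np.asarray(p_center, dtype=float)
    off = np.asarray(s, dtype=float) - np.asarray(s_center, dtype=float)
    return p + lam * off
```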
3.2.3 SARB projection model
Using the two sets of parameters discussed above, the relationship between pixels and rays can be re-represented. To be specific, with (15) and (16), the elements in the first two columns of the matrix in (7) can be rewritten as
(21) 
and
(22) 
With (21) and (22), the SARB projection model in this paper is eventually represented in Fig. 3.
4 Calibration
In this paper, we utilize the proposed SARB projection model for calibration. As it consists of two sets of parameters targeting different aspects, our calibration method is accordingly divided into two steps, with each step corresponding to one set of parameters. In the literature, a typical calibration pipeline includes feature detection, an initial solution and nonlinear optimization [18], [21], [32]. Different from existing research, the nonlinear optimization in the proposed calibration only exists in step 1. Our two-step method acquires the parameters of the center view subaperture in the first step, and the parameters to generate images of the non-center view subapertures in the second step. Besides, all extrinsic parameters and all distortion parameters are also estimated in the first step.
4.1 Preparations
Before calibration, for every raw image, the subpixel locations of corner points in the center view subaperture image and their disparities should be acquired. Details on how to obtain reliable and accurate corner points are introduced in Section 5. Moreover, the entire 4D raw image data is utilized for calibration.
4.2 Step 1
The intrinsic parameters of the pinhole camera at the center view subaperture can be estimated by calibration methods designed for traditional cameras. In this section, the method in [32] is utilized to estimate them. Both the closed-form solution and the maximum likelihood estimation of this method are performed. Besides, the extrinsic parameters as well as all distortion parameters are also refined by the maximum likelihood estimation.
4.3 Step 2
After acquiring the disparity parameter of every corner point in every raw image by the method in Section 5, the depth value of the corner point is calculated as
(23) 
where the right-hand side involves the coordinates of the corner on the checkerboard, i.e., its coordinates in the world coordinate system. After obtaining the disparity and depth of every corner point in every shot, a linear equation is obtained as
(24) 
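The stacking-and-SVD step described next can be sketched as follows, assuming (hypothetically, since the paper's form of (24) is not reproduced above) that each corner's disparity `lam` and depth `z` satisfy `lam * z = a * z + b` in the two unknown parameters:

```python
import numpy as np

def solve_step2(lams, zs):
    """Estimate the two step-2 parameters from corner disparities and depths
    by SVD of the stacked homogeneous system, assuming lam*z = a*z + b."""
    lams, zs = np.asarray(lams, float), np.asarray(zs, float)
    # Each row encodes a*z + b - lam*z = 0 for the homogeneous vector (a, b, 1).
    A = np.stack([zs, np.ones_like(zs), -lams * zs], axis=1)
    _, _, vh = np.linalg.svd(A)
    v = vh[-1]  # right singular vector of the smallest singular value
    return v[0] / v[2], v[1] / v[2]
```

The right singular vector associated with the smallest singular value gives the homogeneous solution, which is then de-homogenized by its last component.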
By stacking all the linear equations into a single matrix, the two sought parameters are obtained from the Singular Value Decomposition (SVD) of this matrix.

4.4 Distortion Model
In this paper, only the distortion of the main lens is considered. We treat the microlens array as a pinhole array in which distortion is nonexistent. The proposed distortion model is based on the assumption that rays emitted from one 3D scene point still converge to one point after the refraction of the main lens. Put differently, the distortion correction procedure should not introduce divergence for rays coming from the same scene point. Under this assumption, the distortion components in all subaperture images are the same. In this case, it is only necessary to model distortion for the image of the center view subaperture. Second-order radial distortion and tangential distortion are considered in this section. The radial distortion term is denoted as
(25) 
and
(26) 
where x and y are the normalized image coordinates of the undistorted pixel location, and r^2 = x^2 + y^2. k1 and k2 are the radial distortion coefficients. The tangential distortion term is
(27) 
and
(28) 
where p1 and p2 are the tangential distortion coefficients. The total distortion term is the sum of the two components,
(29) 
and
(30) 
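The distortion terms in (25)-(30) follow the standard second-order radial plus tangential (Brown) model; a minimal sketch in that standard form, with the conventional coefficient names k1, k2, p1, p2:

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply second-order radial + tangential distortion to normalized
    undistorted image coordinates, cf. (25)-(30)."""
    r2 = x * x + y * y
    radial = k1 * r2 + k2 * r2 * r2                  # radial factor, (25)-(26)
    dx_t = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)    # tangential term, (27)
    dy_t = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y    # tangential term, (28)
    xd = x + x * radial + dx_t                       # total distortion, (29)
    yd = y + y * radial + dy_t                       # total distortion, (30)
    return xd, yd
```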
Different from previous works [18], [22] and [25], our method does not have a global optimization procedure. The distortion coefficients as well as the center view intrinsic parameters are all established in Step 1.
5 Detection of LF-Points of Corner Points from 4D Raw Data
Different from traditional cameras, a 3D scene point does not correspond to only one pixel in an LF camera. Instead, LF cameras capture the 4D LF of the 3D scene point. Thus, the output of corner detection for an LF camera should be a description of the entire 4D LF that the 3D corner corresponds to. Moreover, the recorded 4D LF, i.e., the raw data on the sensor of the LF camera, should be jointly utilized.
Specifically, with regard to calibration, the 3D scene points and 3D lines of interest are the corners and the line segments connecting two adjacent corners in the checkerboard, as shown in Fig. 4. In this section, a detection method is proposed to solve for the "LF-point", which is defined as the representation of the 4D LF of a corner in the checkerboard. Similarly, an "LF-line" is also defined, which is the representation of the 4D LF of a line segment in the checkerboard. The detection method consists of the generation of 4D templates, the calculation of normalized cross-correlation (NCC), nonlinear optimization, and the calculation of line intersections.
5.1 LF-Point
The X and Y coordinates of the projected point of a 3D scene point in the center view subaperture image, together with the disparity parameter defined in Section 3.2.2, are adopted to represent the 4D LF of the 3D scene point. The underlying reason is that, given these three quantities, the location of the projected point of the 3D scene point in every 2D slice of the 4D LF (i.e., every microlens image and every subaperture image) can be calculated by (19), which indicates that this triplet is a complete representation of the 4D LF of a 3D scene point. For brevity, it is termed an "LF-point".
5.2 LF-Line
In this paper, 3D lines are divided into two categories according to the angle between the 2D image of the 3D line and the X axis in the center subaperture image. The first category is the horizontal line, whose angle is smaller than 45 degrees, and the other is the vertical line, whose angle is greater than 45 degrees. Referring to [33], a 3D line has 4 degrees of freedom. In other words, at least four parameters are required to completely represent the 4D LF of a 3D line. We thus use four parameters to describe the 4D LF of the horizontal line and of the vertical line, respectively; one redundant coordinate is eliminated for the horizontal line segment, and likewise for the vertical line segment. Besides, as shown in Fig. 4, the "LF-points" of the two terminals of the 3D horizontal line and of the 3D vertical line are used. Taking the horizontal line as an example and given its four parameters, we can obtain the projected 2D image of this horizontal line in every 2D slice of the 4D LF with the following steps: (1) Given the center of a subimage, the locations of the projected points of the two terminals of the horizontal line in this subimage are calculated by (19). (2) Denoting the two projected points as endpoints, the line equation of the 2D image of this line segment is calculated as

(31)
where the vector of coefficients describes the 2D line. A similar procedure can be applied to the vertical line. We can therefore conclude that the four parameters form a complete representation of the 4D LF of a 3D line. For brevity, this representation is termed an "LF-line".
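Step (2) above, recovering the 2D line through the two projected endpoints, can be sketched with homogeneous coordinates: the cross product of two homogeneous points yields the line coefficients, as in (31):

```python
import numpy as np

def line_through(p1, p2):
    """Homogeneous line coefficients (a, b, c) with a*x + b*y + c = 0
    through two 2D points, via the cross product of the homogeneous points."""
    h1 = np.array([p1[0], p1[1], 1.0])
    h2 = np.array([p2[0], p2[1], 1.0])
    return np.cross(h1, h2)
```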
5.3 Detection of LF-Points of Corner Points
Evidence shows that it is difficult to directly detect the locations of the projected points of checkerboard corners in subimages of extremely low resolution [22]. To cope with this issue, the algorithm in [22] detects the line feature in every subimage separately. It should be noted that this algorithm fails to take full advantage of the 4D LF data. In this section, we propose 4D templates that make the best of the 4D LF data to recognize the LF-lines of 3D line segments. After solving for the intersection points of horizontal and vertical line segments by the SVD method, the LF-points of the checkerboard corners are acquired.
5.3.1 Generation of 4D Template
Take the horizontal line segment as an example. By changing its free parameter values (with the remaining ones fixed), we can obtain a series of 4D templates fully exploiting the 4D LF data. Concretely, given a set of parameter values, we can calculate the line equation of the corresponding line segment in every 2D angular slice by the method detailed in Section 5.2. Then 2D templates are generated using the method described in [22]. After obtaining the 2D templates of all these angular slices, an entire 4D template is obtained. Fig. 5 illustrates a 3D slice of a 4D template. We can see that the 2D line in every angular slice changes gradually along the spatial axis.
5.3.2 Calculation of NCC
We first reshape the 4D template into a sequence of 2D angular slices, and then calculate the NCC between each 2D angular template and its corresponding subimage in the raw data. The total NCC of the entire 4D template is the sum of the NCC values of all slices.
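A minimal sketch of the per-slice NCC and its summation (zero-mean, unit-norm correlation per angular slice, summed over slices):

```python
import numpy as np

def ncc(a, b, eps=1e-12):
    """Normalized cross-correlation between two equally sized 2D patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b) + eps
    return float((a * b).sum() / denom)

def total_ncc(template_slices, image_slices):
    """Total NCC of a 4D template: sum of per-angular-slice NCC values."""
    return sum(ncc(t, s) for t, s in zip(template_slices, image_slices))
```

NCC is invariant to affine brightness changes of a patch, which is why it is robust to per-subimage gain and offset variations.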
5.3.3 Nonlinear Optimization
Considering that calculating the NCC between all candidate 4D templates and the actual raw data is time-consuming, a nonlinear optimization procedure is utilized to find the optimal template. The total NCC value is regarded as the objective function, and the template parameters are the optimization variables. Starting from an initial solution, a direct search method [33] that does not use numerical or analytic gradients is applied to find the optimal solution.
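A direct search of this kind can be sketched as a simple pattern search: probe each optimization variable with a step in both directions, keep any improvement, and shrink the step when no probe improves. The objective passed in would be the total NCC of Section 5.3.2; this generic maximizer is our illustration, not the paper's exact routine:

```python
import numpy as np

def pattern_search(f, x0, step=1.0, tol=1e-6, max_iter=1000):
    """Maximize f by coordinate-wise probing, using no gradients at all."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for i in range(x.size):
            for delta in (step, -step):
                cand = x.copy()
                cand[i] += delta
                fc = f(cand)
                if fc > fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step *= 0.5          # shrink the step when no probe improves
            if step < tol:
                break
    return x, fx
```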
5.3.4 Calculating Intersections of Lines
A horizontal line and a vertical line intersect at a corner point of the checkerboard. As shown in Fig. 6, we first use the raw data in the red area to detect the vertical line, and then use the raw data in the blue area to detect the horizontal line. Denoting the "LF-lines" of the vertical and horizontal line segments accordingly, we first calculate the null spaces of the coefficient matrices in
(32) 
and
(33) 
which represent two collections of planes, each passing through two "LF-points". Then the intersection of the two null spaces, i.e., the intersection of the two plane collections, must be the intersection corner of the vertical and horizontal line segments. It is calculated by solving
(34) 
where the two bases of each null space are obtained by solving (32) and (33), respectively. All three linear systems discussed above, i.e., (32)-(34), are solved by the SVD operation, and the result acquired by solving (34) is exactly the "LF-point" of the intersection corner.
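All three systems (32)-(34) reduce to null-space computations, which can be sketched generically with SVD; the actual coefficient matrices, built from the detected LF-lines, are not reproduced here, so the example below intersects two generic subspaces:

```python
import numpy as np

def null_space(A, rcond=1e-10):
    """Orthonormal basis of the null space of A via SVD (as columns)."""
    _, s, vh = np.linalg.svd(A)
    rank = int((s > rcond * s.max()).sum())
    return vh[rank:].T

def intersect_subspaces(B1, B2):
    """One vector in span(B1) ∩ span(B2): solve B1 c1 = B2 c2 through the
    null space of [B1, -B2], analogous to (34)."""
    M = np.hstack([B1, -B2])
    c = null_space(M)[:, 0]
    return B1 @ c[:B1.shape[1]]
```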
6 Experimental Results
Six datasets are used to evaluate the performance of the proposed calibration method. Three of them are from [18], captured by one Lytro camera with different focal settings, and are denoted DA, DB and DE in this paper; the checkerboard sizes are a grid of 3.61 mm cells, a grid of 3.61 mm cells and a grid of mm cells, respectively. The fourth dataset is taken from [22], captured by a Lytro Illum camera, and is denoted GA in this paper; its checkerboard is a grid of 26.25 mm cells. The remaining two datasets were captured by ourselves using a Lytro Illum camera, with checkerboards of 29.92 mm cells and 22.25 mm cells respectively; these two datasets are denoted PA and PB in this paper. Note that only 5, 9 and 14 images in DA, DB and DE respectively could be used, in which the checkerboard is out of focus and the line features are visible [22].
We compare the proposed method with other well-known algorithms in three aspects. First, to evaluate the performance of the corner detection algorithm described in Section 5, we compare the detection results of the proposed method with the well-known Harris detection method [34] and the projected corner location detection method in [22]. Second, to evaluate the 2D lateral reprojection errors of the intrinsic and extrinsic parameters, we compare the proposed method to two state-of-the-art methods, namely the method of Dansereau et al. [18] and the method of Bok et al. [22], using the point-to-ray error (P2RE) and point-to-point error (P2PE) metrics. Third, to evaluate the accuracy of the depth value calculated from the intrinsic parameters, we compare the relative depth error (RDE) of the proposed method with the method of Dansereau et al. [18] and the method of Bok et al. [22].
6.1 Corner Detection Results
Similar to [22], the first step of our corner detection algorithm is line detection, whose result directly influences the corner point detection. Thus, before comparing the final corner detection results, we first compare the 2D line images detected by [22] with the 2D line images projected from the "LF-lines" detected in Section 5. The results are shown in Fig. 6. The 2D line images produced by [22] are more cluttered, where stronger noise disturbs the regular pattern between 2D line images in different subimages. In contrast, the proposed method outputs a group of regularly distributed 2D line images even at high noise levels. The relationship between the positions of 2D line images in different subimages is preserved, since the entire 4D light field is utilized.
Then we compare the proposed corner detection method with the Harris method and the method in [22]. Note that methods based on subaperture images are not included in this comparison, since they cannot directly give the locations of the corners in the raw image. Fig. 8(a) shows the result of the Harris detection method, in which the area between two adjacent subimages is often detected as a "false corner". Other detection errors also exist even when the above-mentioned false corners are eliminated, as shown in Fig. 8(b). Due to the low resolution and blur of the subimages, directly applying a traditional corner detection method to the LF raw image is problematic. Fig. 7 shows the corner locations projected from the detected "LF-points" and the projected corner locations of the method in [22]. The corner locations detected by the method of [22] diverge from the actual locations, whilst the proposed method generates accurate detection results.
6.2 Reprojection Errors in Lateral Direction
To compare the reprojection errors of the calibration methods, a variety of benchmark metrics have been used in different calibration works [18], [22] and [21]. For fairness, in this paper we evaluate the proposed method using the P2RE and P2PE measures, which are not used in the optimization step of the proposed algorithm. P2RE, proposed in [18], measures the distance between the ideal corner point and the reprojected ray of the detected corner point. P2PE measures the 2D distance between the ideal corner point and the reprojection of the detected corner onto the checkerboard plane. Two state-of-the-art methods, namely [18] and [22], are compared against. Experimental results are shown in Table I and Table II, respectively. The errors of "Bok-org" are provided directly by the paper [22], and the errors of "Bok-run" are obtained by running their latest released code.
Theoretically, for one calibration method, its P2PE value should be slightly larger than its P2RE value, because in most cases the checkerboard plane is not perpendicular to the ray emitted from a point on this plane. However, for actual data, due to factors including image noise, these two values can be significantly different: minimizing one metric in the nonlinear optimization procedure does not ensure the simultaneous optimization of the other. From Table I and Table II, the method in [18] performs well on datasets DA, DB and DE on the P2RE metric, whereas a weak performance measured by P2PE is witnessed, especially for dataset DE. Similarly, the method in [22] also has a large P2RE error on the DE dataset.
For both metrics, the method of [18] has larger errors on datasets PA and PB. This may be because the method of [18] is a subaperture-based method, with inaccuracy occurring in its subaperture extraction procedure. As shown in Fig. 8, there exist some ghosts in this image, which come from a subaperture far from the optical axis.
Compared to the other two methods, the proposed method performs well on both the P2RE and P2PE metrics on all six datasets, whether the images are noisy or clean. This reflects the stability and accuracy of the method.
Table I. P2RE results.

Algorithm | Proposed | Dansereau [18] | Bok-run [22]
DA | 0.0977225 | 0.0984018 | 0.2711
DB | 0.0411096 | 0.0442647 | 0.1525
DE | 0.173376 | 0.146232 | 0.5404
PA(2) | 0.0186368 | 0.0622009 | 0.2076
PB(3) | 0.0157983 | 0.0398623 | 0.1392
GA | 0.101822 | - | 0.2349
TABLE II: P2PE results

Dataset   Proposed    Dansereau [18]   Bokorg [22]   Bokrun [22]
DA        0.0806737   0.0815884        0.1076        —
DB        0.0389486   0.0420258        0.0714        —
DE        0.163982    0.134463         0.454         —
PA(2)     0.0179212   0.0598808        —             0.1972
PB(3)     0.0149911   0.0381868        —             0.1330
GA        0.0895863   —                0.2066        —
6.3 Errors in depth direction
Different from a traditional camera, a calibrated LF camera can provide a depth value corresponding to each pixel. Therefore, we also compare the depth values computed from the intrinsic parameters calibrated by the methods of [18], [22] and by the proposed method.
TABLE III: RDE results

Dataset   Proposed   Dansereau [18]   Bok [22]
DA        1.85       3.14             8.96
DB        1.75       1.82             9.44
DE        12.94      25.29            29.22
PA        1.94       10.35            12.71
PB        2.33       4.95             13.52
GA        3.33       —                17.98
For every method, two depth values are obtained for each corner: the depth calculated from the raw image using the intrinsic parameters, denoted by $\hat{Z}$, and the depth calculated from the world coordinates of the actual 3D corner point using the extrinsic parameters, denoted by $Z$. For the proposed method, $\hat{Z}$ is calculated by (20). Since the methods of [18] and [22] do not directly estimate depth from the raw image, we derive a simple procedure to calculate $\hat{Z}$ for them. We first choose two pixels corresponding to the same 3D corner point, belonging to two sub-apertures with a large baseline. The location of the 3D corner point is then determined by intersecting the two rays computed from these pixels using the intrinsic parameters of [18] or [22], and $\hat{Z}$ is taken as the Z coordinate of the estimated point. Finally, the relative depth error (RDE) is calculated by

$$\mathrm{RDE} = \frac{|\hat{Z} - Z|}{Z}. \qquad (35)$$
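The two-ray procedure described above can be sketched as follows. This is an illustrative reconstruction, not the authors' released code; in particular, the origin/direction ray representation and the midpoint-of-closest-points formulation (needed because two rays from noisy pixels are generally skew and do not exactly intersect) are our assumptions:

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Estimate a 3D corner from two (possibly skew) rays, one from each
    of two widely separated sub-apertures: find the pair of closest
    points on the rays and return their midpoint.
    Assumes the rays are not parallel."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = o2 - o1
    a = np.dot(d1, d2)
    # Closest-approach parameters from the 2x2 normal equations of
    # min |(o1 + t1*d1) - (o2 + t2*d2)|^2.
    t1 = (np.dot(b, d1) - a * np.dot(b, d2)) / (1.0 - a * a)
    t2 = a * t1 - np.dot(b, d2)
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))

def relative_depth_error(z_est, z_true):
    """Relative depth error as in (35)."""
    return abs(z_est - z_true) / abs(z_true)
```

The Z coordinate of the triangulated point then serves as the raw-image depth estimate entering (35).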
Table III shows the RDE results of the proposed method against [18] and [22]. The depth errors of [18] and [22] are much larger than those of the proposed method; more details are shown in Fig. 11. This indicates that a corner point with a small reprojection error may still have a large depth error.
Based on the results listed in Tables I to III, it can be concluded that the proposed model outperforms those of [18] and [22] in terms of calibration accuracy in both the lateral and depth directions. The underlying reason is that the depth-related parameters are estimated separately from the other parameters.
7 Conclusion and Future Work
In this paper, we have presented a Sub-Aperture Related Bipartition (SARB) projection model for calibrating LF cameras. Owing to the two-part structure of the proposed model, calibration methods developed for traditional cameras can be effectively reused. Meanwhile, both the 2D reprojection errors in the lateral direction and the errors in the depth direction are efficiently reduced by this structure. In addition, an accurate and robust corner detection method is proposed; to the best of our knowledge, this is the first work in which 4D light field data is fully utilized to detect corners. Future work includes adding a global optimization step to the proposed method so that the misalignment of the MLA is considered in the calibration pipeline.
References
 [1] R. Ng, “Fourier slice photography,” in ACM transactions on graphics (TOG). ACM, 2005, vol. 24, pp. 735–744.

 [2] H.-G. Jeon, J. Park, G. Choe, J. Park, Y. Bok, Y. Tai, and I. S. Kweon, “Accurate depth map estimation from a lenslet light field camera,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1547–1555.
 [3] M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi, “Depth from combining defocus and correspondence using light-field cameras,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 673–680.
 [4] S. Wanner and B. Goldluecke, “Globally consistent depth labeling of 4D light fields,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 41–48.
 [5] V. Vaish, M. Levoy, R. Szeliski, C. L. Zitnick, and S. B. Kang, “Reconstructing occluded surfaces using synthetic apertures: Stereo, focus and robust measures,” in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06). IEEE, 2006, vol. 2, pp. 2331–2338.
 [6] T. Yang, Y. Zhang, J. Yu, J. Li, W. Ma, X. Tong, R. Yu, and L. Ran, “All-in-focus synthetic aperture imaging,” in European Conference on Computer Vision. Springer, 2014, pp. 1–15.
 [7] F. Dong, S.-H. Ieng, X. Savatier, R. Etienne-Cummings, and R. Benosman, “Plenoptic cameras in real-time robotics,” The International Journal of Robotics Research, vol. 32, no. 2, pp. 206–217, 2013.
 [8] N. Zeller, F. Quint, and U. Stilla, “From the calibration of a light-field camera to direct plenoptic odometry,” IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 7, pp. 1004–1019, 2017.
 [9] B. Wilburn, N. Joshi, V. Vaish, E.V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, “High performance imaging using large camera arrays,” in ACM Transactions on Graphics (TOG). ACM, 2005, vol. 24, pp. 765–776.
 [10] A. Veeraraghavan, R. Raskar, A. Agrawal, A. Mohan, and J. Tumblin, “Dappled photography: Mask enhanced cameras for heterodyned light fields and coded aperture refocusing,” in ACM transactions on graphics (TOG). ACM, 2007, vol. 26, p. 69.
 [11] C.-K. Liang, T.-H. Lin, B.-Y. Wong, C. Liu, and H. H. Chen, “Programmable aperture photography: multiplexed light field acquisition,” in ACM Transactions on Graphics (TOG). ACM, 2008, vol. 27, p. 55.
 [12] K. Marwah, G. Wetzstein, Y. Bando, and R. Raskar, “Compressive light field photography using overcomplete dictionaries and optimized projections,” ACM Transactions on Graphics (TOG), vol. 32, no. 4, pp. 46, 2013.
 [13] Y. Taguchi, A. Agrawal, A. Veeraraghavan, S. Ramalingam, and R. Raskar, “Axial-cones: Modeling spherical catadioptric cameras for wide-angle light field rendering,” ACM Trans. Graph., vol. 29, no. 6, pp. 172, 2010.
 [14] R. Ng, Digital Light Field Photography, Ph.D. thesis, Stanford University, 2006.
 [15] A. Lumsdaine and T. Georgiev, “The focused plenoptic camera,” in 2009 IEEE International Conference on Computational Photography (ICCP). IEEE, 2009, pp. 1–8.
 [16] Lytro, “The lytro camera [online]. available,” http://www.lytro.com/, 2016.
 [17] Raytrix, “3d light field camera technology [online]. available,” http://www.raytrix.de/, 2016.
 [18] D. G. Dansereau, O. Pizarro, and S. B. Williams, “Decoding, calibration and rectification for lenselet-based plenoptic cameras,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1027–1034.
 [19] H. Duan, L. Mei, J. Wang, L. Song, and N. Liu, “A new imaging model of lytro light field camera and its calibration,” Neurocomputing, vol. 328, pp. 189–194, 2019.
 [20] Q. Zhang, C. Zhang, J. Ling, Q. Wang, and J. Yu, “A generic multi-projection-center model and calibration method for light field cameras,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
 [21] S. O’Brien, J. Trumpf, V. Ila, and R. Mahony, “Calibrating light-field cameras using plenoptic disc features,” in 2018 International Conference on 3D Vision (3DV). IEEE, 2018, pp. 286–294.
 [22] Y. Bok, H.-G. Jeon, and I. S. Kweon, “Geometric calibration of micro-lens-based light field cameras using line features,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 2, pp. 287–300, 2016.
 [23] C.-A. Noury, C. Teulière, and M. Dhome, “Light-field camera calibration from raw images,” in 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA). IEEE, 2017, pp. 1–8.
 [24] C. Heinze, S. Spyropoulos, S. Hussmann, and C. Perwass, “Automated robust metric calibration algorithm for multifocus plenoptic cameras,” IEEE Transactions on Instrumentation and Measurement, vol. 65, no. 5, pp. 1197–1205, 2016.
 [25] P. Zhou, W. Cai, Y. Yu, Y. Zhang, and G. Zhou, “A twostep calibration method of lensletbased light field cameras,” Optics and Lasers in Engineering, vol. 115, pp. 190–196, 2019.
 [26] S. Nousias, F. Chadebecq, J. Pichat, P. Keane, S. Ourselin, and C. Bergeles, “Corner-based geometric calibration of multi-focus plenoptic cameras,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 957–965.
 [27] C.-K. Liang and R. Ramamoorthi, “A light transport framework for lenslet light field cameras,” ACM Transactions on Graphics (TOG), vol. 34, no. 2, pp. 16, 2015.
 [28] T.G. Georgiev and A. Lumsdaine, “Focused plenoptic camera and rendering,” Journal of Electronic Imaging, vol. 19, no. 2, pp. 021106, 2010.
 [29] C.-K. Liang, Analysis, Acquisition, and Processing of Light Field for Computational Photography, Ph.D. thesis, National Taiwan University, Taipei, Taiwan, 2008.
 [30] A. Gerrard and J.M. Burch, Introduction to matrix methods in optics, Courier Corporation, 1994.
 [31] R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, P. Hanrahan, et al., “Light field photography with a handheld plenoptic camera,” Computer Science Technical Report CSTR, vol. 2, no. 11, pp. 1–11, 2005.
 [32] Z. Zhang, “Flexible camera calibration by viewing a plane from unknown orientations,” in ICCV, 1999, vol. 99, pp. 666–673.
 [33] R. Hartley and A. Zisserman, Multiple view geometry in computer vision, Cambridge university press, 2003.
 [34] C. G. Harris and M. Stephens, “A combined corner and edge detector,” in Alvey Vision Conference, 1988, vol. 15, pp. 147–151.