1 Introduction
A key advantage of multi-camera systems is their ability to recover the 3D information lost during projection. However, a formal, quantitative analysis of the accuracy of this recovered information is still lacking. This is surprising given the longstanding interest in these systems [1, 2, 3] and the improvements they can bring to vision algorithms [4, 5, 6, 7, 8].
Recently, a range of novel camera-array architectures, such as the Lytro light-field camera [9] and the PiCam mobile camera-array architecture [10], have been introduced, altering the direction of modern consumer digital photography. In light of such developments, a rigorous treatment of this topic is needed.
In the two-view (stereo-imaging) case, probabilistic error analyses have been carried out [11, 12]; however, no such analysis exists for more than two views. Excellent multi-view scene reconstruction algorithms have been proposed [13, 14], but they do not include a quantitative error analysis. Chai et al. [15] studied the frequency spectrum of the plenoptic function [16], which has been used to find the critical sampling rate for error-free image-based rendering of different classes of scenes [17, 18, 19]. Raynor et al. [20] analysed the error in range finding with a plenoptic camera, using a model based on Gaussian noise.
In this paper, we study circular camera-arrays, in an algorithm-independent way, through different error criteria. We relate the problem to frame quantisation [21, 22, 23, 24], which is well studied in the information theory community and has a rigorous mathematical foundation. In particular, Cvetković [25] has analysed the partitions induced by uniform scalar quantisation of frame expansions of $\mathbb{R}^2$.
We show that these results can be utilised to bound the localisation error of circular camera systems under orthogonal projections. Furthermore, we extend these results to the perspective projection case.
More precisely, we show that, when the number of cameras is sufficiently large, the localisation error is upper bounded by a term that decreases quadratically with respect to the number of cameras; hence, the error tends to zero as the number of cameras tends to infinity. These results hold for the majority of the region of interest; the exception is a small central region, which we fully quantify. Finally, we provide numerical simulations, which verify the quadratic bound.
2 Problem Setup
We are interested in the error in reconstructing a 3D point from different multi-camera configurations. For simplicity, we will limit our analysis to circular-camera setups, such as the configuration shown in Fig. 1. Furthermore, we will assume that the world is 2D rather than 3D, resulting in 1D images. This allows us to easily visualise the results, and the extension to 3D is relatively straightforward.
More formally, we wish to reconstruct a point $\boldsymbol{x}$, located within a circle of radius $r$, from $N$ images, captured with $N$ cameras positioned uniformly on the perimeter of the circle and oriented towards the origin. We will refer to the interior of this circle as the region of interest. Let us index the cameras from $0$ to $N-1$ and associate each one with its angular location $\theta_i = 2\pi i/N$, measured anticlockwise from the horizontal axis. Figure 1 depicts this configuration for the case of eight cameras.
Our goal is to find an estimator, $\hat{\boldsymbol{x}}$, for the location, $\boldsymbol{x}$, and to analyse its behaviour and error bounds. To do this, we need to understand how the point is captured by each camera. To this end, we utilise the camera acquisition model depicted in Fig. 2. Here, $P_{\theta_i}$ is a projection operator, which maps the 2D point onto the 1D image plane of the $i$-th sensor. The 1D projection is then subjected to a convolution with a point-spread function (PSF), $h$, which models the blurring of the camera lens. Finally, the continuous-domain to discrete-domain sampling is achieved by a convolution with the integration kernel, $\varphi$, before ideal sampling. The details and justification of each of these steps are given in the following subsections.
2.1 The Projection Operator
The first and most critical stage of the acquisition model is the projection of the point onto the image plane of the $i$-th sensor. This image plane is the line tangent to the circle at the camera's location.

We will consider both orthogonal and non-orthogonal projections, as depicted in Fig. 3.
The orthogonal projection case has been widely studied and lays the foundations of computed tomography and magnetic resonance imaging. In this case, the projection point, $p_{\theta_i}$, is given by the following simple inner product:

$p_{\theta_i} = \langle \boldsymbol{x}, \boldsymbol{e}_{\theta_i} \rangle,$   (1)

where $\boldsymbol{e}_{\theta_i} = (-\sin\theta_i, \cos\theta_i)$ is the unit vector parallel to the $i$-th image plane that points anticlockwise around the circle.

The non-orthogonal, perspective, projection is a more accurate model of traditional cameras and is equivalent to the pinhole camera model. In this case, the projections of $\boldsymbol{x}$ onto the sensors are no longer inner products. In fact, the location of the projected point, $\tilde{p}_{\theta_i}$, is given by

$\tilde{p}_{\theta_i} = \dfrac{f \, \langle \boldsymbol{x}, \boldsymbol{e}_{\theta_i} \rangle}{f + r - \langle \boldsymbol{x}, \boldsymbol{u}_{\theta_i} \rangle},$   (2)

where $f$ is the focal length of the camera, and $\boldsymbol{u}_{\theta_i}$ is the unit vector, orthogonal to $\boldsymbol{e}_{\theta_i}$, that points towards the centre of the $i$-th image plane. Note that, as $f \to \infty$, $\tilde{p}_{\theta_i} \to p_{\theta_i}$.

Finally, we will assume that the cameras have equal-length image planes, denoted $L$, chosen so that each camera covers the whole region of interest; i.e., $L = 2r$ in the orthogonal case, while a shorter, focal-length-dependent, sensor suffices in the non-orthogonal case.
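To make the two projection models concrete, here is a minimal numerical sketch. The symbols and the exact perspective formula are our own reconstruction; in particular, we assume the pinhole sits at a distance $f$ behind the tangent image plane, so this should be read as an illustration rather than the paper's exact parametrisation.

```python
import numpy as np

def camera_frame(theta):
    """Unit vectors for a camera at angle theta: e is parallel to the
    image plane (anticlockwise), u is orthogonal to e and points from
    the origin towards the image plane."""
    return (np.array([-np.sin(theta), np.cos(theta)]),
            np.array([np.cos(theta), np.sin(theta)]))

def project_orthogonal(x, theta):
    """Orthogonal projection: a simple inner product with e."""
    e, _ = camera_frame(theta)
    return float(x @ e)

def project_perspective(x, theta, r, f):
    """Perspective (pinhole) projection onto the plane tangent to the
    circle of radius r, assuming the pinhole lies a distance f behind
    that plane (our assumption)."""
    e, u = camera_frame(theta)
    return float(f * (x @ e) / (f + r - x @ u))
```

As the focal length grows, the perspective projection converges to the orthogonal one, matching the limit noted above.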
2.2 The Image Acquisition Pipeline
The pinhole camera model is equivalent to a perspective projection; however, a more realistic model, such as the one depicted in Fig. 2, must also account for the camera optics and sampling process.
Let us represent the projection of the point $\boldsymbol{x}$ onto the image plane of the $i$-th sensor as the 1D function $\delta(t - p_{\theta_i})$; i.e., a Dirac at the location $p_{\theta_i}$. The effect of the camera lens can then be modelled by the following convolution:

$g_{\theta_i}(t) = \left(\delta(\cdot - p_{\theta_i}) * h\right)(t) = h(t - p_{\theta_i}),$   (3)

where $h$ is the point-spread function [26], which is commonly modelled by the Airy disc.
To model the sampling operation of an imaging sensor, we apply a further convolution followed by ideal sampling. For an $M$-pixel sensor, we choose the sampling kernel, $\varphi$, to be a box function with support $\Delta = L/M$; i.e., the pixel width. This models the fact that image sensors usually work by integrating all the light rays that fall into each pixel region [27]. Finally, the ideal sampling has period $\Delta$.
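The pipeline for a single projected point can be sketched numerically. This is a toy model under our own assumptions: a Gaussian stands in for the Airy-disc PSF, and each pixel value is the mass of the blurred Dirac over that pixel's interval.

```python
import math
import numpy as np

def sample_sensor(p, L, M, sigma):
    """Pixel values for a Dirac projected at coordinate p on a sensor of
    length L with M pixels, blurred by a Gaussian PSF of width sigma.
    Each pixel integrates the blurred intensity over its own interval."""
    edges = np.linspace(-L / 2, L / 2, M + 1)  # pixel boundaries
    cdf = lambda t: 0.5 * (1 + math.erf((t - p) / (sigma * math.sqrt(2))))
    return np.array([cdf(edges[m + 1]) - cdf(edges[m]) for m in range(M)])
```

For a PSF that is narrow relative to the pixel width, essentially all of the mass lands in the single pixel containing $p$, which is what motivates the single-pixel assumption used in Section 3.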
3 Point localisation as a frame quantisation problem
In the previous section, we reviewed a traditional image acquisition model which is applicable to a wide range of imaging devices, including traditional cameras and computed tomography (CT) devices.
We will now show how this model can be interpreted as a quantised frame expansion, which will allow us to derive closed-form expressions for the worst-case localisation error.
To see that this interpretation is valid, note that the camera vectors, $\boldsymbol{e}_{\theta_0}, \ldots, \boldsymbol{e}_{\theta_{N-1}}$, form a frame in $\mathbb{R}^2$, which we will denote $\Phi$. Therefore, we can consider the projections of the point onto all the image planes as the expansion of $\boldsymbol{x}$ in the frame $\Phi$; i.e., the coefficients $\langle \boldsymbol{x}, \boldsymbol{e}_{\theta_i} \rangle$, for $i = 0, \ldots, N-1$.
In addition, if we assume that the point activates only a single pixel in each camera (if the PSF spreads the point across multiple pixels, one could exploit more advanced sampling techniques that achieve subpixel accuracy), the acquisition model, depicted in Fig. 2, is equivalent to the following quantisation:

$\hat{y}_{\theta_i} = Q\left(\langle \boldsymbol{x}, \boldsymbol{e}_{\theta_i} \rangle\right),$   (4)

where $Q$ is the quantisation function defined, for even $M$, as

$Q(y) = \Delta\left(\lfloor y/\Delta \rfloor + 1/2\right);$   (5)

for odd $M$, $Q(y) = \Delta\lfloor y/\Delta + 1/2 \rfloor$.
In quantisation terminology, the pixel width, $\Delta$, is the quantisation step size.
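In code, the quantiser and its odd-$M$ variant read as follows (a sketch in our own notation; `delta` is the pixel width):

```python
import math

def quantise(y, delta, even_pixels=True):
    """Uniform scalar quantiser of step delta. With an even pixel count,
    the cell boundaries sit at integer multiples of delta; with an odd
    count, there is a cell centred on zero."""
    if even_pixels:
        return delta * (math.floor(y / delta) + 0.5)
    return delta * math.floor(y / delta + 0.5)
```

Every input is mapped to the centre of the cell that contains it, so the quantisation error is at most `delta / 2`.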
3.1 Orthogonal Projection


We will first investigate whether we can localise the point with infinite precision by using an infinite number of cameras. In the orthogonal projection case, it was shown in [25] that this is possible iff $\boldsymbol{x}$ lies outside a central circle whose radius is determined by the pixel width, $\Delta$.
This condition can be intuitively understood from Fig. 4: as we add more cameras, there is a central circle which, in the case of odd $M$, is not further divided and, in the case of even $M$, is only divided through the origin.
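The central circle is easy to verify numerically for the odd-pixel-count quantiser: any point sufficiently close to the origin produces identical (all-zero) quantised projections at every camera angle, so no arrangement of cameras can distinguish such points. (A sketch in our own notation.)

```python
import numpy as np

def quantised_projection(x, theta, delta):
    """Orthogonal projection onto the image plane at angle theta,
    followed by an odd-pixel-count quantiser (cell centred on zero)."""
    p = -x[0] * np.sin(theta) + x[1] * np.cos(theta)
    return delta * np.floor(p / delta + 0.5)

delta = 0.1
x = np.array([0.03, -0.02])  # a point close to the origin
thetas = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
# every camera, at any angle, reports the same (zero) measurement
assert all(quantised_projection(x, t, delta) == 0.0 for t in thetas)
```

A point further from the origin, by contrast, produces different quantised values at different angles, which is what makes localisation possible.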
To understand this more precisely, let us consider the projection of the point onto an image plane at angle $\alpha$, as a function of $\alpha$:

$p(\alpha) = \langle \boldsymbol{x}, \boldsymbol{e}_{\alpha} \rangle = \|\boldsymbol{x}\| \sin(\varphi - \alpha),$   (6)

where $\varphi$ is the angular position of $\boldsymbol{x}$. For even $M$, the quantised version,

$Q(p(\alpha)) = \Delta\left(\lfloor \|\boldsymbol{x}\| \sin(\varphi - \alpha)/\Delta \rfloor + 1/2\right),$   (7)

has discontinuities at angles $\alpha$, where $\|\boldsymbol{x}\| \sin(\varphi - \alpha) = q\Delta$ for some integer $q$. When $\|\boldsymbol{x}\| < \Delta$, this only occurs at one threshold ($q = 0$), which occurs at the two angles $\varphi$ and $\varphi + \pi$. As $N \to \infty$, we are guaranteed to have cameras arbitrarily close to these locations; however, since the angles are $\pi$ radians apart, the image planes are parallel and we can only localise $\boldsymbol{x}$ to be in the central circle of radius $\Delta$ on the line connecting the two camera centres, i.e. on the line with a quantisation threshold of zero.
When $\|\boldsymbol{x}\| > \Delta$, there are more than two discontinuity angles and, since $\boldsymbol{x} \in \mathbb{R}^2$, it can be perfectly reconstructed from its projections onto two non-parallel image planes:

$\boldsymbol{x} = \langle \boldsymbol{x}, \boldsymbol{e}_{\alpha_1} \rangle \tilde{\boldsymbol{e}}_{\alpha_1} + \langle \boldsymbol{x}, \boldsymbol{e}_{\alpha_2} \rangle \tilde{\boldsymbol{e}}_{\alpha_2},$   (8)

where $\alpha_1$ and $\alpha_2$ are the angles of the image planes, $\tilde{\boldsymbol{e}}_{\alpha_1}$ is the dual of $\boldsymbol{e}_{\alpha_1}$ and $\tilde{\boldsymbol{e}}_{\alpha_2}$ is the dual of $\boldsymbol{e}_{\alpha_2}$.
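Without quantisation, this two-plane reconstruction is exact; the following sketch (our notation) builds the dual basis numerically:

```python
import numpy as np

def plane_vector(alpha):
    """Unit vector parallel to the image plane at angle alpha."""
    return np.array([-np.sin(alpha), np.cos(alpha)])

def reconstruct(x, alpha1, alpha2):
    """Recover x from its projections onto two non-parallel image
    planes: the dual vectors are the columns of the inverse of the
    matrix whose rows are the plane vectors."""
    E = np.vstack([plane_vector(alpha1), plane_vector(alpha2)])
    duals = np.linalg.inv(E)    # columns: dual basis vectors
    return duals @ (E @ x)      # sum of projections times dual vectors
```

With quantisation, the projections `E @ x` are replaced by their quantised versions, which is exactly the source of the localisation error analysed next.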
Of course, we can only approximate $\boldsymbol{x}$, because we only have access to quantised versions of these projections.
To investigate the accuracy of this approximation, consider two imaginary cameras placed at unknown angles $\beta_1$ and $\beta_2$ such that $\beta_1 = \alpha_1 + \delta_1$ and $\beta_2 = \alpha_2 + \delta_2$; i.e., $\beta_j$ is slightly shifted from $\alpha_j$, $j = 1, 2$, so that the projection of $\boldsymbol{x}$ is exactly between the two quantisation boundaries.
Using these cameras, we can write the approximated point, $\hat{\boldsymbol{x}}$, as

$\hat{\boldsymbol{x}} = Q\left(\langle \boldsymbol{x}, \boldsymbol{e}_{\beta_1} \rangle\right) \tilde{\boldsymbol{e}}_{\beta_1} + Q\left(\langle \boldsymbol{x}, \boldsymbol{e}_{\beta_2} \rangle\right) \tilde{\boldsymbol{e}}_{\beta_2}.$   (9)

The localisation error, $\|\boldsymbol{x} - \hat{\boldsymbol{x}}\|$, is thus given by

$\|\boldsymbol{x} - \hat{\boldsymbol{x}}\| = \left\| \left(\langle \boldsymbol{x}, \boldsymbol{e}_{\beta_1} \rangle - Q\left(\langle \boldsymbol{x}, \boldsymbol{e}_{\beta_1} \rangle\right)\right) \tilde{\boldsymbol{e}}_{\beta_1} + \left(\langle \boldsymbol{x}, \boldsymbol{e}_{\beta_2} \rangle - Q\left(\langle \boldsymbol{x}, \boldsymbol{e}_{\beta_2} \rangle\right)\right) \tilde{\boldsymbol{e}}_{\beta_2} \right\|.$   (10)
Substituting the expressions for the image plane vectors and their duals into (10) and applying basic algebraic manipulations yields (11).
By applying $|\delta_1| \le \pi/N$ and $|\delta_2| \le \pi/N$ to (11), we can bound the localisation error as
(12) 
This means that $\|\boldsymbol{x} - \hat{\boldsymbol{x}}\|$ is upper bounded by a term that decreases quadratically with respect to the number of cameras, $N$.
3.2 Non-Orthogonal Projection
Figure 5 shows an example partition for non-orthogonal, perspective, projection with an odd number of pixels. Visually, we can see a similar central circle, which is not subdivided by adding further cameras. In the orthogonal case, the radius of this circle was equal to the pixel width, $\Delta$. In the non-orthogonal case, the radius is larger, but it approaches $\Delta$ as $f \to \infty$.
To bound the approximation error, $\|\boldsymbol{x} - \hat{\boldsymbol{x}}\|$, we again analyse the partitionings of quantised frame expansions. In the orthogonal case, the partitioning is created from intersecting rectangles whose shorter edges have length $\Delta$. In the non-orthogonal case, we have trapezoids, instead of rectangles, whose parallel sides have length $\Delta$ and a larger, focal-length-dependent, value.

It follows that each trapezoid has a greater area than the corresponding rectangle with side length $\Delta$. Therefore, the following bound holds in the non-orthogonal case:
(13) 
4 Simulation Results
In the previous section, we showed how a circular-camera array divides the region of interest into a finite number of partitions. Examples of such partitionings are shown in Figs. 4 and 5. (You can generate your own partitionings using the web app located at http://rr.epfl.ch/demos/multicam.)
The interesting semantics of the constituent regions is that, after projection, we cannot distinguish between two points that fall into the same region; consequently, there is an inverse relationship between the number of partitions in the region of interest and the mean squared localisation error.
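This relationship can be probed directly by counting the distinct quantised measurement vectors observed over a dense grid of the region of interest; more cameras induce more cells. (A rough sketch in our own notation, with arbitrarily chosen parameters.)

```python
import numpy as np

def count_cells(N, delta, step=0.01):
    """Number of distinct quantised signatures (i.e. partition cells)
    observed over a grid covering the unit-radius region of interest."""
    g = np.arange(-1, 1, step)
    xx, yy = np.meshgrid(g, g)
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    pts = pts[np.linalg.norm(pts, axis=1) < 1]
    thetas = 2 * np.pi * np.arange(N) / N
    E = np.stack([-np.sin(thetas), np.cos(thetas)], axis=1)
    sig = np.floor(pts @ E.T / delta + 0.5)   # odd-M quantiser indices
    return len(np.unique(sig, axis=0))
```

Doubling the number of cameras refines the partition, so the cell count grows and, correspondingly, the mean squared localisation error falls.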
Figure 6 depicts the mean squared localisation error for a circular-camera array with three-pixel sensors and a varying number of cameras. For non-orthogonal projection, the focal length was chosen to give a realistic field of view for pinhole cameras. The error values are well approximated by the reciprocal of a degree-2 polynomial, which suggests that the error decreases quadratically as the number of cameras increases.
Moreover, in Fig. 6, the error decreases slightly faster for orthogonal projection, which corresponds to the theory of Section 3.2. Although this shows a benefit of orthogonal projection, we should note that the minimum sensor size required to cover the region of interest is much smaller when using perspective projection, as noted in Section 2.1.
5 Conclusion
We have reformulated the problem of localising a 3D point from its images in a circular camera-array as a frame quantisation problem. This reformulation allows us to derive closed-form worst-case bounds for the localisation error.
We showed that the localisation error can be made arbitrarily small by increasing only the number of cameras. Moreover, we extended the results for orthogonal projections to non-orthogonal projections, which are more common in camera architectures.
In our reformulation, we assumed that the point activates only a single pixel in each camera. We believe that this assumption could be relaxed, by adjusting the quantisation noise model appropriately.
Furthermore, we believe that the frame quantisation interpretation could be used to derive similar results for other multicamera setups.
References
[1] J. Y. S. Luh and J. A. Klaasen, “A three-dimensional vision by off-shelf system with multi-cameras,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 7, no. 1, 1985.
 [2] H. P. Trivedi, “Can multiple views make up for lack of camera registration?,” Image and Vision Computing, vol. 6, no. 1, 1988.
 [3] H. Aghajan and A. Cavallaro, Multicamera networks: principles and applications, Academic press, 2009.
 [4] F. Farshidi, S. Sirouspour, and T. Kirubarajan, “Active multicamera object recognition in presence of occlusion,” in Proceedings of IROS 2005. IEEE, 2005, vol. 1.
 [5] F. Porikli and A. Divakaran, “Multicamera calibration, object tracking and query generation,” in Proceedings of ICME 2003. IEEE, 2003, vol. 1.
[6] A. Ghasemi and M. Vetterli, “Scale-invariant representation of light field images for object recognition and tracking,” in Proceedings of EI 2014. IS&T/SPIE, 2014, vol. 1.
[7] A. Ghasemi and M. Vetterli, “Detecting planar surface using a light-field camera with application to distinguishing real scenes from printed photos,” in Proceedings of ICASSP 2014. IEEE, 2014, vol. 1.
 [8] A. Ghasemi, N. Afonso, and M. Vetterli, “LCAV31: a dataset for light field object recognition,” in Proceedings of EI 2014. IS&T/SPIE, 2014, vol. 1.
[9] T. Georgiev, Z. Yu, A. Lumsdaine, and S. Goma, “Lytro camera technology: theory, algorithms, performance analysis,” in Proceedings of EI 2013. IS&T/SPIE, 2013, vol. 1.
 [10] K. Venkataraman, D. Lelescu, J. Duparré, A. McMahon, G. Molina, P. Chatterjee, and R. Mullis, “PiCam: an ultrathin high performance monolithic camera array,” Graphics, ACM Transactions on, vol. 32, no. 6, 2013.
 [11] S. Blostein and T. S. Huang, “Error analysis in stereo determination of 3D point positions,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 9, no. 6, 1987.
 [12] J. J. Rodriguez and J. K. Aggarwal, “Stochastic analysis of stereo quantization error,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 12, no. 5, 1990.
[13] V. Kolmogorov and R. Zabih, “Multi-camera scene reconstruction via graph cuts,” in Proceedings of ECCV 2002, vol. 1. Springer, 2002.
 [14] B. Wilburn, V. Vaish, E. V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, “High performance imaging using large camera arrays,” Graphics, ACM Transactions on, vol. 24, no. 3, 2005.
 [15] J. X. Chai, X. Tong, S. C. Chan, and H. Y. Shum, “Plenoptic sampling,” in Proceedings of SIGGRAPH 2000. ACM, 2000, vol. 1.
 [16] E. H. Adelson and J. R. Bergen, “The plenoptic function and the elements of early vision,” Computational models of visual processing, vol. 1, no. 2, 1991.
 [17] C. Zhang and T. Chen, “Spectral analysis for sampling imagebased rendering data,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 13, no. 11, 2003.
[18] M. N. Do, D. Marchand-Maillet, and M. Vetterli, “On the bandwidth of the plenoptic function,” Image Processing, IEEE Transactions on, vol. 21, no. 2, 2012.
 [19] C. Gilliam, P. L. Dragotti, and M. Brookes, “On the spectrum of the plenoptic function,” Image Processing, IEEE Transactions on, vol. 23, no. 2, 2014.
 [20] R. Raynor and K. Walli, “Plenoptic camera range finding,” in Proceedings of AIPR Workshops. IEEE, 2013, vol. 1.
[21] V. K. Goyal, M. Vetterli, and N. T. Nguyen, “Quantized overcomplete expansions in $\mathbb{R}^N$: analysis, synthesis, and algorithms,” Information Theory, IEEE Transactions on, vol. 44, no. 1, 1998.
[22] B. Beferull-Lozano and A. Ortega, “Efficient quantization for overcomplete expansions in $\mathbb{R}^N$,” Information Theory, IEEE Transactions on, vol. 49, no. 1, 2003.
 [23] S. Rangan and V. K. Goyal, “Recursive consistent estimation with bounded noise,” Information Theory, IEEE Transactions on, vol. 47, no. 1, 2001.
[24] N. T. Thao and M. Vetterli, “Reduction of the MSE in R-times oversampled A/D conversion from $O(1/R)$ to $O(1/R^2)$,” Signal Processing, IEEE Transactions on, vol. 42, no. 1, 1994.
 [25] Z. Cvetkovic, “Source coding with quantized redundant expansions: accuracy and reconstruction,” in Proceedings of DCC 99. IEEE, 1999, vol. 1.
 [26] G. Westheimer and F. W. Campbell, “Light distribution in the image formed by the living human eye,” Journal of the Optical Society of America, vol. 52, no. 9, 1962.
 [27] J. C. Russ, The image processing handbook, CRC press, 2010.