1 Introduction
Recent improvements in the resolution, brightness and cost effectiveness of projectors have extended their use beyond flat-screen projection to projection mapping, AR and MR systems [1]. In those systems, the precalculated scene geometry is used to project an appropriately warped image that is mapped onto the scene as an artificial texture, with impressive visual results. However, it is difficult to cope with moving objects or dynamic scenes in general, because the projected pattern is constant along each ray: what is viewed is spatially invariant up to a projective transformation. Conversely, the range of practical applications could be significantly broadened if different patterns could be projected at different depths simultaneously; such volumetric displays are now intensely investigated [2, 3, 4, 5]. However, these techniques are still at the research stage and not yet suitable for practical systems. Recently, a simpler system was proposed, consisting of just two projectors that simultaneously project independent images on screens at different depths [6]. The system demonstrated the ability to project simultaneous movies on multiple semi-transparent screens at different depths with no particular setup requirements, which is a promising research avenue to explore.
In this paper, we significantly extend the system of [6] by 1) removing the planarity assumption: we are able to project depth-dependent patterns on complex surfaces with arbitrary geometry, 2) expanding the dynamic range by adding a constraint to the pattern optimization, and 3) introducing an epipolar constraint to keep the problem at a tractable size. With these extensions, the range of potential applications is significantly broadened. For instance, the system can be used for object placement or assembly purposes, as shown in Fig. 1: since the pattern is sharply visible only at predefined depths, human or robotic workers can precisely align complex parts using only qualitative visual information, without distance scanners. While the visibility of the pattern depends on its 3D position and orientation in space, translational alignment can also be achieved by including markers in the pattern. The new system also allows dynamic projection mapping on complex and semi-transparent objects. In the experiments, we show the effectiveness of the proposed technique, which is also demonstrated on multiple dynamic 3D solid objects.
2 Related work
Applications of multiple video projectors have a long history in VR systems, such as wide or surround-screen projections like CAVE [7]. These systems require precise calibration between projectors: geometric calibration to establish the correspondences between the projected images and the screens, and photometric calibration to compensate for non-linearities between the projected intensities and the pixel values. To this end, automated calibration techniques based on projector-camera feedback systems were developed [8, 9]. Since some of the screens considered were curved, some of these works inevitably dealt with non-planar screens. In other works, multiple projectors were tiled together to improve the resolution of the projected images [10, 11]. Godin et al., inspired by the human vision system, proposed a dual-resolution display where the central part of the projection is rendered in high resolution, while the peripheral area is rendered in low resolution [12]. Other works on multi-projector systems focused on increasing the depth of field, since this is normally narrow and can cause defocusing issues on non-planar screens. Bimber and Emmerling [13] proposed to widen the depth of field by using multiple projectors with different focal planes. Nagase et al. [14] used an array of mirrors, equivalent to multiple projectors with different focal planes, to correct defocus, occlusion and stretching artifacts. Levoy et al. also used an array of mirrors and a projector [15]; the mirror array was used to avoid occlusions from objects placed in front of the screen. Each of the aforementioned works is intended to project a single image onto a single display surface, which may or may not be planar. Conversely, the proposed method projects multiple independent images onto surfaces placed at different depths, which may have complex non-planar geometry.
Multiple projectors are also used in light-field display applications [3, 4]. In these applications, in order to create the large number of rays needed for the light field, each ray is projected separately for a specific viewpoint and is not intended to mix with other rays. This is in contrast with our proposed method, where multiple independent images are created at the intended depths and on the intended surfaces precisely by leveraging the mixing of rays from the projectors.
A few works have explored the concept of “depth-dependent projection”. Kagami [16] projected Moiré patterns to visually show the depth of a scene, similarly to active-stereo methods for range finding. Nakamura et al. [17] used a linear algebra formulation to highlight predetermined volume sections with specific colors. Their technique assumes the volume is discretized into multiple parallel planes and cannot produce detailed patterns or images on non-planar surfaces. Moreover, similarly to the work below, the underlying mathematical formulation suffers from a limited dynamic range. Recently, Visentini-Scarzanella et al. proposed a method to display detailed images on distinct planes in space by actively exploiting interference patterns from multiple projectors [6]. In their experiments, two independent videos are simultaneously streamed onto two screens made of semi-transparent material. However, the matrix factorisation used is similar to [17], so the method suffers from a limited dynamic range. Moreover, the method is designed to project images onto planar screens, which limits its applications. These limitations are removed in our proposed method, where the patterns can be projected onto arbitrary shapes and a novel optimization procedure addresses the dynamic range issues.
3 Algorithm Overview
We provide an outline of the algorithm using the example of projecting virtual face masks on dynamic subjects, as shown in Fig. 2. Figs. 2a and 2b show two different virtual masks, which are to be projected onto the subject’s face as it moves from Position 1 to Position 2, as shown in Figs. 2c and 2d respectively. The fundamental research question is what images should be generated for Projector 1 and Projector 2 so that the projections recombine into the patterns of Figs. 2a and 2b at the desired positions: at Position 1, only the mask of Fig. 2a should appear, with no trace of the mask of Fig. 2b. It should be noted that a similar problem was posed for multiple LCDs and efficiently solved in [18].
Visentini-Scarzanella et al. realised a similar system [6], but assumed planar screens. Moreover, a significant reduction in dynamic range was observed due to their formulation. Because of the sensitivity of [6] to the exact screen placement, it is not possible to directly apply the method to objects with complex geometry by simply considering a piecewise planar approximation.
In order for the desired images to appear at the desired locations, three tasks are necessary. First, the mapping between points on the object surfaces and the pixels of the projector images is obtained (geometric calibration). Then, given the mappings, the generation of the projection images can be cast as a constrained optimization problem. In [6], this was solved globally with a sparse matrix solver that distributes the error throughout the images. In this paper, we propose an efficient optimization method that solves the problem locally, only for related rays, allowing us to impose additional constraints on the solution, resulting in an improved dynamic range, and also enabling parallelisation. Finally, the generated patterns are post-processed according to the photometric characteristics obtained during photometric calibration, to correct the non-linearity of the projectors. The main phases of the algorithm are shown in Fig. 2(b).
The system consists of two standard LCD projectors stacked vertically, as shown in Fig. 2(a), with 3D target objects that can be placed at arbitrary positions.

4 Depth-dependent simultaneous image projections
4.1 System calibration
To achieve depth-dependent projections onto arbitrarily shaped objects, it is necessary to estimate the pixel correspondences between projector images and the object surfaces. Contrary to the simple planar case in [6], where the mapping can be calculated as a homography using only four corresponding points, we need to obtain correspondences between the projector pixels and points on complex-shaped 3D surfaces. To this end, we use Gray code pattern projection [19]. The actual process is as follows. First, an additional camera is placed in the scene. Then, the Gray code patterns are projected onto the object and the projections are synchronously captured by the camera, as shown in Figs. 4a, 4b, 4d and 4e. By decoding the patterns from the captured image sequences, the correspondences between the projected patterns and the camera image coordinates are obtained. The decoded values are shown in Figs. 4c and 4f, where the pixel value represents the projector coordinate in the x and y direction, respectively. The decoded results are represented by a map $U_n$ for each projector $n$, which stores the projector coordinate values at each pixel of the camera image, as shown in Figs. 4c and 4f. An inverse map $U_n^{-1}$ is also obtained to efficiently extract corresponding points along the epipolar lines, as described in the following section. Note that, because of the mismatch between the resolutions of the camera image and of the projection on the object, many correspondences are inevitably dropped from the maps, which degrades the final results. Such artifacts are largely avoided by preparing a high-resolution map when the projectors and the camera can be placed close to each other. The remaining holes are in the order of a few pixels and are removed with a simple hole-filling algorithm.
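As a concrete illustration, the Gray code step above can be sketched as follows. This is a minimal NumPy sketch of standard Gray code stripe generation and decoding, not the exact implementation used in the paper; the captured images are assumed to have already been thresholded into binary bit planes.

```python
import numpy as np

def gray_code_patterns(width, n_bits):
    """One stripe pattern per bit plane (MSB first) encoding projector columns."""
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)  # binary index -> Gray code
    return [((gray >> b) & 1).astype(np.uint8) for b in range(n_bits - 1, -1, -1)]

def decode_gray(bit_planes):
    """Recover projector column indices from thresholded captured bit planes."""
    gray = np.zeros_like(bit_planes[0], dtype=np.int64)
    for plane in bit_planes:      # fold MSB-first planes into Gray-code values
        gray = (gray << 1) | plane
    binary = gray.copy()          # Gray -> binary via prefix XOR
    for shift in (1, 2, 4, 8, 16, 32):
        binary ^= binary >> shift
    return binary
```

Running the decoder over both the column and row stripe sets yields the per-pixel projector coordinates stored in the map $U_n$.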

In the process of estimating the projected image, we assume a linear relationship between the nominal intensity and the actual projected intensity. To achieve this, a linearly increasing grayscale pattern (Fig. 4(c), top) is captured by a camera with a known linear response. The recorded values are plotted against their nominal intensities, as shown in Figs. 4(a) and 4(b), and fitted to a model of the projector response. The fitted function is then inverted and stored to compensate the generated patterns prior to projection. The calibration pattern superimposed with its own mirrored version should have constant intensity; after intensity correction, this constraint is shown to be fulfilled (Fig. 4(c), middle and bottom).
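The fitting and inversion can be sketched as follows. A power-law (gamma) response with illustrative parameters is assumed here purely for the sketch, since the exact response model is a fitting choice not specified above.

```python
import numpy as np

# Synthetic measurements of a projector with a gamma-like response
# (a_true and g_true are illustrative, not measured values).
nominal = np.linspace(0.05, 1.0, 32)      # nominal input intensities
a_true, g_true = 1.1, 2.2
measured = a_true * nominal ** g_true     # what the linear camera records

# Fit log-linearly: log(measured) = g * log(nominal) + log(a).
g_fit, log_a = np.polyfit(np.log(nominal), np.log(measured), 1)
a_fit = np.exp(log_a)

def compensate(target, a=a_fit, g=g_fit):
    """Pre-distort a target intensity so the projected output is linear."""
    return (target / a) ** (1.0 / g)
```

Projecting `compensate(x)` then yields a measured intensity proportional to `x`, which is exactly the linearity the superimposed mirrored pattern test verifies.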
4.2 A simple linear-algebra-based pattern generation method
We model the problem by first extending the formulation in [6] to the case of non-planar surfaces. The variables involved are shown in Fig. 6. While for clarity we illustrate the process in the case of two projectors and two different images projected onto two different objects, the system can be extended to a higher number of projectors and objects.
The projected patterns of the $N$ projectors are denoted as $P_n$, where $n = 1, \ldots, N$ ($N = 2$ in the case of Fig. 6), and the images projected on the $M$ 3D objects are denoted as $I_m$, where $m = 1, \ldots, M$ ($M = 2$ in the case of Fig. 6). Let pixels on $P_n$ be expressed as $p$ and pixels on $I_m$ as $q$.
We also need to provide the mapping from the desired input images $D_m$ to the camera coordinates, which we denote as $T_m$. Practically, in this paper we use the identity map for $T_m$, meaning that we perform simple “projection mapping” as shown in the case of Fig. 6, where the coordinates of the desired images and of the projected images coincide with the camera image coordinates.
In the calibration step, the mappings $U_n$ from camera coordinates to the coordinates of projector $n$ have been obtained. Using $U_n$ and $T_m$, we can map between the pixels of the projected pattern $P_n$ and the desired input image $D_m$ through the camera coordinates.
From these assumptions, we can define an inverse projection mapping $f_{n,m}(q)$: if pixel $q$ of $I_m$ is illuminated by pixel $p$ of $P_n$ (i.e., the pixels of $I_m$ and $P_n$ are related through the calibrated mappings), then $f_{n,m}(q) = p$; if $q$ is not illuminated by any pixel of $P_n$, then $f_{n,m}(q) = \emptyset$. In the example of Fig. 6, pixel $q$ on the first object is illuminated by both projectors, so both $f_{1,1}(q)$ and $f_{2,1}(q)$ are defined, while $f_{n,m}(q) = \emptyset$ for any pixel not illuminated by $P_n$.
Let us define imaginary pixels by setting $P_n(\emptyset) = 0$ to simplify the formulas. Then, using these definitions, the constraints on the projections are expressed as follows:

$$I_m(q) = \sum_{n=1}^{N} \frac{\cos\theta_{n,q}}{d_{n,q}^2}\, P_n(f_{n,m}(q)) \qquad (1)$$

where $d_{n,q}$ is the distance between pixel $q$ on the object and projector $n$, introduced to compensate for the light fall-off, and $\theta_{n,q}$ is the angle between the surface normal at $q$ and the incoming light vector from projector $n$, introduced to compensate for the Lambertian reflectance of the matte surface. If $f_{n,m}(q) = \emptyset$, we define $d_{n,q} = 1$ and $\theta_{n,q} = 0$, so that the corresponding term vanishes since $P_n(\emptyset) = 0$. By collecting these equations, the linear equations

$$\mathbf{b}_1 = A_1 \mathbf{x} \qquad (2)$$

$$\mathbf{b}_2 = A_2 \mathbf{x} \qquad (3)$$

follow, where $\mathbf{x}$ is a vector collecting all pattern pixels $P_n(p)$, $\mathbf{b}_m$ is a vector collecting the target pixels $I_m(q)$, and the matrix $A_m$ is defined element-wise as

$$A_m[q,(n,p)] = \begin{cases}\dfrac{\cos\theta_{n,q}}{d_{n,q}^2} & \text{if } f_{n,m}(q) = p\\[4pt] 0 & \text{otherwise}\end{cases} \qquad (4)$$

By stacking $\mathbf{b}_1$, $\mathbf{b}_2$, $A_1$ and $A_2$, we obtain the complete linear system

$$A\mathbf{x} = \mathbf{b}, \qquad A = \begin{bmatrix}A_1\\A_2\end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix}\mathbf{b}_1\\\mathbf{b}_2\end{bmatrix}. \qquad (5)$$
This system can be solved with linear algebra techniques as in [6]. Simple patterns are used to confirm that our algorithm can simultaneously project onto more than two planes, as shown in Fig. 7.
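A minimal sketch of assembling and solving the sparse system of equation (5); the correspondences and the cos(theta)/d^2 weights below are illustrative toy values, not outputs of a real calibration.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

# Toy instance: 4 pattern pixels over two projectors, 4 object pixels over
# two targets. Each object pixel receives light from one pixel per pattern.
contributions = {            # object pixel -> [(pattern pixel, weight), ...]
    0: [(0, 0.9), (2, 0.7)],
    1: [(1, 0.9), (3, 0.7)],
    2: [(0, 0.6), (3, 0.8)],
    3: [(1, 0.6), (2, 0.8)],
}
A = lil_matrix((4, 4))
for row, terms in contributions.items():
    for col, w in terms:
        A[row, col] = w      # cos(theta)/d^2 weight of this ray

b = np.array([0.8, 0.4, 0.6, 0.5])   # desired intensities on the two objects
x = lsqr(A.tocsr(), b)[0]            # least-squares pattern pixel values
```

Note that nothing in this solve constrains `x` to valid intensity values, which is precisely the dynamic range problem addressed in Section 4.3.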
In this paper, we assume there is no defocus blur from the projectors, which is true only if the surfaces of the 3D screens are near the focal planes. When the defocus blur is not negligible, the projected image becomes a convolution of the original image with a blur kernel. Let us assume a typical setup, where we fix a plane in 3D space, place two projectors with the same aperture size at the same distance from the fixed plane, and focus both projectors on that plane. Note that these conditions are often approximately fulfilled in a real setup. In this configuration, the size of the blur kernel is the same for both projectors, even for off-focus surfaces, and thus the projected images from both projectors are convolved with the same blur kernel. Then, by the distributivity of convolution over addition, the summed image is the convolution of the non-blurred summed image with the blur kernel. This means that, in this case, the defocus blur of each projected image does not disturb the image addition process of equation (5), but only blurs the resulting projected image. In practice, we did not observe a large divergence between simulations under the no-blur assumption and real experiments with blur.
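The distributivity argument can be checked numerically: blurring each pattern with the same kernel and then adding gives the same image as blurring the sum. A uniform blur stands in for the (unknown) defocus kernel in this toy check.

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(0)
p1 = rng.random((32, 32))   # pattern projected by projector 1
p2 = rng.random((32, 32))   # pattern projected by projector 2

# Same defocus kernel for both projectors (matched apertures and distances):
blur_then_add = uniform_filter(p1, size=5) + uniform_filter(p2, size=5)
add_then_blur = uniform_filter(p1 + p2, size=5)
```

The two results agree to floating-point precision, so a shared defocus kernel only blurs the recombined image without breaking the additive model.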
4.3 Problem reduction using epipolar constraints and constraint-aware optimization
The problem to be solved is to obtain $\mathbf{x}$ given $A$ and $\mathbf{b}$. The length of $\mathbf{x}$ equals the total number of pattern pixels, while the length of $\mathbf{b}$ equals the total number of target image pixels, so $A$ is a very large sparse matrix. As a model of the real system, this simple linear formulation has two problems. First, it implies a global solution through pseudo-inversion of a very large matrix. Second, since $\mathbf{x}$ and $\mathbf{b}$ represent images, their elements should be non-negative values within a fixed dynamic range. However, the lack of positivity constraints in the solution of the sparse system means that $\mathbf{x}$ may include negative or very large positive elements. This was handled in [6] by normalising $\mathbf{x}$ so that its elements lie in the range [0, 255] after solving with a sparse linear algebra solver. However, the effect of this is a compression of the resulting dynamic range and a lowering of the contrast.
Here, we consider the case $N = M = 2$: the two-projector scenario shown in Fig. 8, with two screens $S_1$ and $S_2$ onto which the images are to be projected. Given the optical centers $C_1$ and $C_2$ of the two projectors, as well as any pixel $p$ on the first pattern (without loss of generality), the epipolar plane defined by these three points intersects the projected patterns $P_1$ and $P_2$ at epipolar lines $l_1$ and $l_2$, and the 3D screens $S_1$ and $S_2$ at curves $c_1$ and $c_2$, respectively. The pixel compensation between $P_1$ and $P_2$ occurs only within the epipolar lines $l_1$ and $l_2$. This means that the problem of optimizing the pixels of the projected images can be solved for each epipolar line independently, instead of solving for all pixels of $P_1$ and $P_2$ at once.
To obtain a finite set of pixels to optimize, we use the following steps. Any pixel along $l_1$ corresponds to points on the curves $c_1$ and $c_2$ where its ray intersects the two screens. The pixels of the second pattern that project to these same points along $l_2$ are added to the list of variables, and the process is iterated from those pixels in turn. In this way, we obtain the list of variables involved in the pixel compensation with respect to $l_1$ and $l_2$. The spacing between consecutive points in the sequence on each intersection curve depends on the distance and the vergence angle between the two projectors.
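The iterative collection of coupled pixels along a pair of epipolar lines can be sketched as follows. The maps `f12` and `f21`, which link pattern pixels through the two screens, are hypothetical stand-ins for lookups built from the calibration maps.

```python
def collect_chain(seed, f12, f21, max_len=10000):
    """Gather the pattern pixels coupled to `seed` along one epipolar line pair.

    f12 maps a pattern-1 pixel to the pattern-2 pixel hitting the same point
    of the first screen (None if the ray misses it); f21 maps a pattern-2
    pixel back to pattern 1 via the second screen.
    """
    chain1, chain2 = [seed], []
    p, on_first = seed, True
    for _ in range(max_len):
        q = f12(p) if on_first else f21(p)
        if q is None:            # ray leaves the screens: chain terminates
            break
        target = chain2 if on_first else chain1
        if q in target:          # chain closed on itself
            break
        target.append(q)
        p, on_first = q, not on_first
    return chain1, chain2
```

Each returned pair of pixel lists forms one independent local optimization problem.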


Hence, instead of formulating the problem as a large global optimization, we decompose it into a series of small local problems of the form:
$$\mathbf{b}_{\mathrm{loc}} = A_{\mathrm{loc}}\, \mathbf{x}_{\mathrm{loc}} \qquad (6)$$

where $\mathbf{b}_{\mathrm{loc}}$ and $\mathbf{x}_{\mathrm{loc}}$ are the image and pattern pixels related by epipolar geometry and mapped to each other by the ray-tracing matrix $A_{\mathrm{loc}}$. Contrary to the large sparse system, the number of elements involved in each local optimisation is small, depending on the projector setup. This allows us to expand the above equation into a series of explicit sums:

$$b_j = \sum_{i=1}^{K} a_{j,i}\, x_i, \qquad j = 1, \ldots, J \qquad (7)$$

where $J$ is the number of points on the image planes and $K$ is the number of pattern points involved in the sequence. For each of these chains, we solve for the optimal pattern pixels through the constrained optimisation problem:

$$\min_{x_1, \ldots, x_K} \sum_{j=1}^{J} \Big(b_j - \sum_{i=1}^{K} a_{j,i}\, x_i\Big)^2 \quad \text{s.t.} \quad x_i \in [v_{\min}, v_{\max}] \qquad (8)$$

where $[v_{\min}, v_{\max}]$ is the allowed range of intensities during pattern generation. With this strategy, we are able to solve the pattern generation problem independently for small sets of points at a time, which in turn allows us to impose constraints and to optimise each chain with fewer chances of getting stuck in a local minimum.
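Each local problem of equation (8) is a small bounded least-squares problem. A sketch using SciPy's bounded solver, with illustrative weights in place of real cos(theta)/d^2 values:

```python
import numpy as np
from scipy.optimize import lsq_linear

# One epipolar chain with J = 3 object pixels and K = 3 pattern pixels.
A_local = np.array([
    [0.9, 0.0, 0.7],
    [0.0, 0.8, 0.6],
    [0.5, 0.5, 0.0],
])
b_local = np.array([0.9, 0.2, 0.6])   # desired intensities along the chain

# Equation (8): minimise ||A x - b||^2 subject to v_min <= x_i <= v_max.
res = lsq_linear(A_local, b_local, bounds=(0.0, 1.0))
x = res.x    # pattern pixels, guaranteed inside the allowed intensity range
```

Unlike the global unconstrained solve followed by normalisation, the bound constraints are enforced during optimisation, so no post-hoc rescaling (and loss of contrast) is needed.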
5 Experiments
Our setup consists of two stacked EPSON LCD projectors and a single CCD camera, as shown in Fig. 2(a). We calibrate the system both geometrically and photometrically, and test it on three tasks. First, the image quality of the proposed algorithm is numerically evaluated against the Linear Factorization (LF) method in [6]. Second, the visual positioning application scenario shown in Fig. 1 is tested. Finally, projection results on complex surfaces are shown in the virtual mask scenario of Fig. 2.
5.1 Image quality assessment
We first assess the quality improvement of our dynamic range expansion technique using planar screens and the image combinations Lena/Mandrill, Lena/Peppers, Peppers/House and Peppers/Lena as targets. The two screens were placed at approximately 80 cm and 100 cm from the projectors. For each combination, we projected the original image on the plane and used it as the baseline for PSNR evaluation of our proposed method, parameterised by the range of allowed intensity values, as well as of the method in [6]. Sample results are shown in Fig. 10, while exhaustive numerical results are given in Fig. 9. The results show that the images obtained with the widest allowed intensity range have a much wider dynamic range of colours than those obtained with narrower ranges or with LF. However, in certain cases some artifacts are visible, even in the best configuration. Interestingly, the nature of the artifacts is the same regardless of the method used, even though they may appear less pronounced due to the overall lower contrast of LF and of the narrower intensity ranges. This highlights the trade-off between artifacts, contrast levels and number of projectors in the scene. As part of our future work, we will investigate redundant systems with more projectors than targets to better characterize this trade-off. Comparing the proposed method with LF, the latter suffers from drastically lower PSNR and SSIM levels due to the difference in image quality. The techniques were also qualitatively compared on non-planar objects, as shown in Fig. 11, highlighting similar improvements in dynamic range when the proposed method is used.
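For reference, the PSNR figure of merit used in this evaluation can be computed as follows (standard definition, not code from the paper):

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    diff = np.asarray(reference, float) - np.asarray(test, float)
    mse = np.mean(diff ** 2)   # mean squared error over all pixels
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```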
5.2 Visual positioning accuracy evaluation
As another application scenario, we use the system to place objects at the correct position and orientation based exclusively on visual feedback. Such a system could be used by human as well as robotic workers without extra sensors. For our tests, we asked several subjects to place a box in two predetermined positions using just the visual feedback from the projected pattern. The location of the box was then captured with a 3D scanner and compared with the ground truth position. The test scene, projected patterns and reconstructed shapes are shown in Fig. 12. The average RMSE was 1.47% and 1.26% of the distance between the box and the projector for the positions closer to and farther from the projectors, respectively. From the results, we can confirm that the proposed technique can be used for 3D positioning using only static passive pattern projectors.
5.3 Independent image projection on multiple 3D objects
For our third experiment, the system was tested on 3D objects with a more complex geometry, such as a mannequin head, as well as the combination of a square box and cylinders for the two scenarios mentioned in the introduction.
In the first case, we projected the virtual masks in Figs. 12(a) and 12(b) onto a mannequin placed at two different positions. Figs. 12(c) and 12(d) show the results of the LF method, and Figs. 12(e) to 12(h) show the results of the proposed method. The figures show that the two images projected on the mannequin are clearly visible from all angles. Moreover, our proposed optimization significantly improves on the LF results.
Finally, we show how the system can be used for the object assembly workflow of Fig. 1. Figs. 13(a) and 13(b) are the calculated patterns for the two projectors, Fig. 13(c) is the projected image on the two cylinders and Fig. 13(d) is the projected image on the large box placed outside the cylinders. We can confirm that Lena and Mandrill are clearly shown on the respective objects, confirming that the technique has the potential to be used for correct positioning during object assembly.
6 Conclusion
In this paper, we proposed a new pattern projection method that can simultaneously project independent images onto objects with complex 3D geometry placed at different positions. The system is realized with multiple projectors, geometrically calibrated using Gray codes, together with a simple formulation that creates suitably distributed interference patterns for each projector. In addition, we proposed an efficient calculation method that incorporates epipolar geometry constraints, enabling parallelisation and a higher colour dynamic range. Experiments demonstrated the performance of a working prototype, its improvement over the state of the art, and two application scenarios: object distance assessment and depth-dependent projection mapping. Our future work will concentrate on extending the system to more complicated scenes involving a higher number of projectors, and on studying their optimality characteristics.
References
 [1] Bimber, O., Raskar, R.: Spatial Augmented Reality: Merging Real and Virtual Worlds. A. K. Peters, Ltd., Natick, MA, USA (2005)
 [2] Barnum, P.C., Narasimhan, S.G., Kanade, T.: A multilayered display with water drops. ACM Transactions on Graphics (TOG) 29(4) (2010) 76

 [3] Jurik, J., Jones, A., Bolas, M., Debevec, P.: Prototyping a light field display involving direct observation of a video projector array. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). (2011) 15–20
 [4] Nagano, K., Jones, A., Liu, J., Busch, J., Yu, X., Bolas, M., Debevec, P.: An autostereoscopic projector array optimized for 3d facial display. In: ACM SIGGRAPH 2013 Emerging Technologies. SIGGRAPH ’13 (2013) 3:1–3:1
 [5] Hirsch, M., Wetzstein, G., Raskar, R.: A compressive light field projection system. ACM Transactions on Graphics (TOG) 33(4) (2014) 58
 [6] Visentini-Scarzanella, M., Hirukawa, T., Kawasaki, H., Furukawa, R., Hiura, S.: A two plane volumetric display for simultaneous independent images at multiple depths. In: PSIVT Workshop Vision Meets Graphics. (2015) 1–8
 [7] Cruz-Neira, C., Sandin, D.J., DeFanti, T.A.: Surround-screen projection-based virtual reality: the design and implementation of the CAVE. In: Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, ACM (1993) 135–142
 [8] Raskar, R., Welch, G., Fuchs, H.: Seamless projection overlaps using image warping and intensity blending. In: Fourth International Conference on Virtual Systems and Multimedia, Gifu, Japan. (1998)
 [9] Yang, R., Gotz, D., Hensley, J., Towles, H., Brown, M.S.: PixelFlex: A reconfigurable multi-projector display system. In: Proceedings of the Conference on Visualization ’01, IEEE Computer Society (2001) 167–174
 [10] Chen, Y., Clark, D.W., Finkelstein, A., Housel, T.C., Li, K.: Automatic alignment of high-resolution multi-projector display using an uncalibrated camera. In: Proceedings of the Conference on Visualization ’00, IEEE Computer Society Press (2000) 125–130
 [11] Schikore, D.R., Fischer, R.A., Frank, R., Gaunt, R., Hobson, J., Whitlock, B.: High-resolution multi-projector display walls. IEEE Computer Graphics and Applications 20(4) (Jul 2000) 38–44
 [12] Godin, G., Massicotte, P., Borgeat, L.: High-resolution insets in projector-based display: Principle and techniques. In: SPIE Proceedings: Stereoscopic Displays and Virtual Reality Systems XIII. Volume 6055. (2006)
 [13] Bimber, O., Emmerling, A.: Multifocal projection: a multiprojector technique for increasing focal depth. IEEE Transactions on Visualization and Computer Graphics 12(4) (July 2006) 658–667
 [14] Nagase, M., Iwai, D., Sato, K.: Dynamic defocus and occlusion compensation of projected imagery by model-based optimal projector selection in multi-projection environment. Virtual Reality 15(2-3) (2011) 119–132
 [15] Levoy, M., Chen, B., Vaish, V., Horowitz, M., McDowall, I., Bolas, M.: Synthetic aperture confocal imaging. In: ACM Transactions on Graphics (TOG). Volume 23., ACM (2004) 825–834
 [16] Kagami, S.: Range-finding projectors: Visualizing range information without sensors. In: IEEE International Symposium on Mixed and Augmented Reality (ISMAR). (Oct 2010) 239–240
 [17] Nakamura, R., Sakaue, F., Sato, J.: Emphasizing 3d structure visually using coded projection from multiple projectors. In: Computer Vision–ACCV 2010. Springer (2011) 109–122
 [18] Wetzstein, G., Lanman, D., Hirsch, M., Raskar, R.: Tensor Displays: Compressive Light Field Synthesis using Multilayer Displays with Directional Backlighting. ACM Trans. Graph. (Proc. SIGGRAPH) 31(4) (2012) 1–11
 [19] Sato, K., Inokuchi, S.: Three-dimensional surface measurement by space encoding range imaging. Journal of Robotic Systems 2 (1985) 27–39