Breaking the Spatio-Angular Trade-off for Light Field Super-Resolution via LSTM Modelling on Epipolar Plane Images

02/15/2019 ∙ by Mantang Guo, et al.

Light-field cameras (LFCs) have received increasing attention due to their wide-spread applications. However, current LFCs suffer from the well-known spatio-angular trade-off, which is considered an inherent and fundamental limit of LFC designs. In this paper, by performing a detailed geometrical-optics analysis of the sampling process in an LFC, we show that the effective sampling resolution is generally higher than the number of micro-lenses. This contribution makes it theoretically possible to break the resolution trade-off. Our second contribution is an epipolar plane image (EPI) based super-resolution method, which can super-resolve the spatial and angular dimensions simultaneously. We prove that the light field is a 2D series; thus, a specifically designed CNN-LSTM network is proposed to capture the continuity property of the EPI. Rather than leveraging semantic information, our network focuses on extracting geometric continuity in the EPI. This gives our method an improved generalization ability and makes it applicable to a wide range of previously unseen scenes. Experiments on both synthetic and real light fields demonstrate the improvements over the state of the art, especially in large disparity areas.


1 Introduction

The light-field camera[26, 31] has become increasingly popular. Due to its ability to capture the whole 4D light field[24, 12] in a single shot, it enables new imaging capabilities such as refocusing[28] and free-viewpoint roaming. However, the performance of current LFCs is limited by the well-known spatio-angular trade-off[9], namely, the constraint that the product of the spatial resolution and the angular resolution must not exceed the sensor resolution.

Figure 1: Comparison of light field super-resolution on the Amethyst light field[1]. Given a low-resolution (sparsely sampled) input light field (), our method is able to produce a high-resolution (densely sampled) light field (). The larger image in each sub-figure shows the reconstructed center view as obtained by a number of different methods. In the bottom row of each sub-figure, the left panel shows a close-up of an image region (indicated by the red box in the full image). On the right, we show the reconstructed horizontal and vertical EPIs.

To break the trade-off, several methods have been proposed to recover a high angular resolution light field from a low-resolution input (Fig.1). However, there are still several challenges in current solutions. For depth-based methods[8, 11, 6, 5, 29, 38], the results are prone to errors in the depth estimation, which may cause artifacts at occlusion boundaries. Additionally, since each view is reconstructed independently, the geometric consistency between views cannot be guaranteed.

Recently, learning-based light field reconstruction has also been explored. Kalantari et al.[16] proposed two convolutional neural networks (CNNs)[21, 19] to estimate depth and predict colors sequentially. However, since an explicit depth map has to be estimated, their method is still prone to estimation errors. Wu et al.[39] tackled the issues with depth-based approaches by focusing on learning EPI super-resolution. They eliminated the information asymmetry[42] between the spatial and angular dimensions by applying a blur operation to the EPI. However, such a blur operation cannot handle large disparity areas, where the continuous epipolar lines become discrete points. In this case, the information asymmetry still exists after the blur operation. Moreover, the EPI consistency is lost during the super-resolution process, which causes fine structures in the image to be lost or over-smoothed in the reconstructed views.

In this paper, we first analyze the effective sampling resolution of an LFC and then propose a learning-based method to super-resolve light fields in both their angular and spatial dimensions. One of our key insights is that the spatio-angular trade-off only holds when the LFC is in the generalized focus case (Sec.3). In the defocused case, the effective spatial sampling rate can be higher than the number of micro-lenses in the Plenoptic 1.0[28]. This insight is important since it provides the theoretical basis for light field super-resolution beyond the resolution trade-off.

Another key insight is a learning-based framework for EPI super-resolution (Sec.4). Although the appearance of epipolar lines varies (continuous vs. discrete) across disparities, they can be uniformly described with a 2D series model, which is the basis for introducing the well-known convolutional long short-term memory (c-LSTM)[15, 40] for light field super-resolution. In contrast with previous super-resolution methods[16, 39], which leverage semantic information and content-based inpainting, our network focuses on extracting and interpolating geometric continuity in the EPI. This gives our method a better generalization ability and makes it applicable to a wide range of previously unseen scenes. Experiments (Sec.5) on both synthetic and real light fields demonstrate the contribution of the proposed LSTM layers and show significant improvements (about 3 dB) over state-of-the-art learning-based methods, especially in large disparity areas.

2 Related Work

Light field sampling: Since the two-parallel-plane representation for light field sampling[24, 12] was proposed, two types of LFCs have been developed, namely the Plenoptic 1.0[28] and the Plenoptic 2.0[10]. However, both suffer from the spatio-angular trade-off. Bishop et al.[2] analyzed the optical path in the Plenoptic 2.0 and pointed out that the aliasing effect in the spatial image contains new information, so the resolution trade-off can be broken. A similar conclusion was reached by Broxton et al.[3], who showed, using wave optics, that diffraction effects can help improve the lateral resolution of the light field microscope. In contrast to [2, 3], we focus on whole pixels rather than aliasing or diffraction, and prove that multiple views in the Plenoptic 1.0 record different point sets. The resolution of a light field can hence be improved by combining these point sets accordingly.

Depth-based methods: Light field reconstruction can be viewed as a special case of image-based rendering in which the input and reconstructed novel views are all restricted to a 2D grid. Hence, previous depth-based rendering techniques [8, 11, 6, 5, 29] can also be applied directly to light field reconstruction [16, 33, 36]. However, depth-based algorithms have two problems. Firstly, depth is ambiguous in shadowed, reflective and refractive areas, where even a correct depth may not be a good depth for rendering. Secondly, as each view is reconstructed independently, the view consistency may be broken in the reconstructed light field.

Non-depth-based methods: Considering the special grid structure of light field sampling, several signal processing cues have been used for light field reconstruction. These include, but are not limited to, the dimensionality gap between the 3D focal stack and the 4D light field [22], the sparsity of light field sampling in the continuous Fourier domain[32], and the sparse representation of EPIs in the shearlet transform domain[35].

Recently, CNNs have been used for light field reconstruction. Wu et al.[39] treated the light field reconstruction task as a one-dimensional EPI super-resolution problem and proposed a “blur-restoration-deblur” method. Wang et al.[37] introduced a 4D CNN to directly super-resolve the 4D light field instead of 2D EPIs. Yeung et al.[41] exploited the coarse characteristics of the sparsely-sampled light field and proposed spatial-angular alternating convolutions to accelerate the reconstruction process. All of these CNNs treat the light field (or EPI) as a traditional 2D image, where each pixel is correlated with its standard 4-neighborhood. However, the size of the relevant neighborhood depends on the direction of the epipolar line in the light field, i.e., on its disparity; pixels with large disparities have a large neighborhood. As a result, previous CNN-based methods work well for narrow-baseline light fields, while they often fail when applied to wide-baseline light fields (see Fig.1).

3 Optical Path Analysis in LFC

In this section, we prove that the well-known spatio-angular trade-off only exists when an LFC is in the generalized focus case, i.e., when the disparities of all pixels in the recorded light field are integer values. In this case, all views in an LFC capture the same point set. Otherwise, different views record different point sets, which are aliased with respect to one another. As a result, the effective spatial resolution of the Plenoptic 1.0 is larger than the number of micro-lenses.

(a) Generalized Focus case
(b) Defocused case
Figure 2: Optical Path in the Plenoptic 1.0.

3.1 On the Number of Recorded Scene Points

Generalized focused case: Fig.2(a) shows the optical path of an ideal Plenoptic 1.0 camera, where all the pixels covered by a micro-lens record different views of the same point in space. In this case, the depth $z$ of the scene point and the distance $l$ between the micro-lens array (MLA) and the main lens must satisfy the Gaussian imaging principle, i.e., $1/z + 1/l = 1/f$, where $f$ is the focal length of the main lens. Here, the spatio-angular trade-off holds and all recorded pixels are sharp images of objects at depth $z$. That is, the recorded light field describes a consistent point set observed from each of the different views.

If the scene depth varies, the LFC is in the defocused case, where the pixels covered by a micro-lens become a uniform sampling of a circular area in space (the gray areas in Fig.2(a)). In this case, different pixels under a micro-lens record different points in space. Note, however, that the above trade-off also holds in some defocused situations. When the point in space is moved to the depth shown in Fig.2(a), the point is still recorded only once by the micro-lens from view ; other views also record it, at different positions, e.g., view records it at the micro-lens . In such a case, the images of the point from different angles are recorded at different micro-lenses (boundary pixels are ignored here). In other words, the recorded light field is still a multi-view description of the same point set.

The above defocused case is similar to the focused case in the sense that different views of the recorded light field describe the same set of scene points. We refer to both the defocused and focused cases above as the generalized focus case, which can be formulated as

$d(\mathbf{x}) \in \mathbb{Z}, \quad \forall \mathbf{x}$,   (1)

where $d(\mathbf{x})$ is the disparity of pixel $\mathbf{x}$ and $\mathbb{Z}$ is the set of integers. If all pixels in an LFC have integer disparities, the LFC is in the generalized focus case and the previous spatio-angular trade-off holds.
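As a small illustration of the two conditions just described, the Python sketch below computes the MLA distance that brings a given depth into focus via the thin-lens relation and checks the integer-disparity condition of Eqn. (1). The function names, units and tolerance are hypothetical choices of ours, not values from the paper.

```python
import numpy as np

def in_focus_mla_distance(z, f):
    """Distance l between main lens and MLA that brings depth z into focus,
    from the Gaussian (thin-lens) relation 1/z + 1/l = 1/f."""
    return 1.0 / (1.0 / f - 1.0 / z)

def is_generalized_focus(disparity_map, tol=1e-3):
    """Check Eqn. (1): the LFC is in the generalized focus case only if
    every pixel's disparity is (numerically) an integer."""
    d = np.asarray(disparity_map, dtype=np.float64)
    return bool(np.all(np.abs(d - np.round(d)) < tol))

# Example: a 50 mm main lens brings depth z = 2000 mm into focus when the
# MLA is placed at l = in_focus_mla_distance(2000.0, 50.0) ~= 51.28 mm.
```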

Defocused case: Apart from the above generalized focus case, different views of the recorded light field generally depict different scene point sets. As a result, the actual number of captured scene points is larger than the number of micro-lenses; roughly speaking, the resolution trade-off is broken in this case. Fig.2(b) illustrates the defocused case. Pixels and under the micro-lens record two different points and from views and , respectively. Note that the ray passing from view through the point to the MLA (the orange areas in Fig.2(b)) is “aliased” by the micro-lenses and . We can also trace the ray for the micro-lens from view to the space point , which falls between points and . Because and are the nearest points recorded in view , the point , which is recorded by view , is not recorded by view . Thus, the number of effectively recorded points in the LFC is larger than the resolution of view . Since the resolution of each view equals the number of micro-lenses in the Plenoptic 1.0, the effective sampling resolution becomes larger than the number of micro-lenses.

Figure 3: The number of recorded points changes when the baseline of the light field sampling changes. From top to bottom we show the EPIs, the reconstructed point clouds and the sketch maps of the light field sampling. From left to right we show the light fields with , and pixel disparity. Note that the light field with pixel disparity records the largest number of points.

We also provide an intuitive explanation of the above analysis on the EPI. Fig.3 shows EPIs and the corresponding point clouds under different disparity levels. Three light fields are captured with different baselines. Notice that the number of recorded points differs across these light fields, and the one with disparity records the most points. The sketches in the third row of Fig.3 reveal the reason. When the disparity is , the red line passes through an entire pixel once every 5 views; in other words, only every fifth view samples the same point set, while the views in between sample other point sets. When the disparity equals , all views sample the same set of points. Thus, the light field with disparity records the largest number of scene points. To summarize:

Proposition 1.

The spatio-angular trade-off in LFCs only holds when the LFC is in the generalized focus case. Since depth has a continuous and complex distribution in a real-world scene, this case rarely holds exactly in practice. Thus, the effective spatial resolution of the Plenoptic 1.0 is larger than the number of micro-lenses, and the light field can be super-resolved.
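To make the counting argument behind Proposition 1 concrete, the following toy Python sketch back-projects the pixels of every view onto a reference line and counts the distinct positions. The grid size, number of views, tolerance and the Lambertian, occlusion-free setting are arbitrary simplifying assumptions of ours, not values from the paper.

```python
import numpy as np

def count_sampled_points(disparity, n_views=5, width=64, tol=1e-6):
    """Count distinct scene positions recorded by all views of a toy EPI.

    Pixel x of view u back-projects to the reference-view coordinate
    x - disparity * u. Integer disparities make all views land on the same
    grid; fractional disparities interleave the grids and record more points.
    """
    coords = []
    for u in range(n_views):
        coords.extend(np.arange(width) - disparity * u)
    coords = np.sort(np.asarray(coords))
    # Merge positions closer than tol into a single sample.
    return 1 + int(np.sum(np.diff(coords) > tol))

for d in [1.0, 0.8, 0.75]:
    print(d, count_sampled_points(d))
# With an integer disparity all views hit the same integer grid (68 distinct
# positions here, i.e. the 64-pixel grid plus a small border); with 0.8 or
# 0.75 the per-view grids interleave and several times more positions appear.
```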

4 Approach

While the above section shows that it is possible to recover a high-resolution light field beyond the conventional spatio-angular trade-off, doing so is still not a straightforward task. The main difficulty stems from super-resolving a light field while keeping the consistency across different views. Most existing light field super-resolution methods are based either on depth recovery [43, 16, 33] or on EPI analysis [39, 37, 41]. The former are overly sensitive to errors in depth estimation and often fail to maintain cross-view consistency. The latter treat EPIs as regular digital images, failing to capture their nature as continuous traces of pixels across multiple views.

In this section, we first discuss the issue of continuity preservation in light field super-resolution and prove that these continuities can be uniformly described as a 2D series. We then propose a novel CNN-LSTM architecture tailored for EPI super-resolution.

Figure 4: Different types of continuity. The first, second, third and fourth rows show the input low-resolution EPI, the super-resolved EPI from [7], ours, and the ground truth, respectively. Previous image super-resolution CNNs cope well with the “continuous continuity” (green boxes) but fail on the “jumping continuity” (red boxes).

4.1 Different continuities

There are two types of continuity in a light field, i.e., the ‘continuous continuity’ and the ‘jumping continuity’ (Fig.4). When the disparity is small, epipolar lines are continuous (green boxes), and previous single-image super-resolution methods[7, 18, 20] can be applied directly. However, the continuous epipolar lines become discrete points (red boxes) when the disparity increases. Previous image super-resolution methods treat a discrete epipolar line as independent points, so the epipolar line may be lost or over-smoothed in the super-resolved views, causing thin structures to go missing or become blurred. Compared with the ‘continuous continuity’, the ‘jumping continuity’ is more relevant to the light field reconstruction task: when the disparity is small, novel views can be synthesized directly by interpolation in 4D space[24, 4, 25], and expensive reconstruction techniques are unnecessary.

Figure 5: Each epipolar line can be projected onto the angular or spatial axis by fixing one of the axes, for a given disparity.

4.2 Light field as 2D series

Although the above two continuity cases appear to be distinct, they share a common characteristic: the pixels along both kinds of epipolar lines are connected through their disparities.

For each light ray $L(x, u)$ in free space (without occlusion), there is a corresponding ray describing the same 3D point in any other view $u'$, such that

$L(x, u) = L\big(x + d(x, u)\,(u' - u),\ u'\big)$,   (2)

where $d(x, u)$ refers to the disparity of the ray $L(x, u)$. In this case, the light field is a series in the angular space.

From another point of view, there is a corresponding ray at any other pixel position $x'$, such that

$L(x, u) = L\big(x',\ u + (x' - x)/d(x, u)\big)$.   (3)

Here, the light field is also a series, but in the spatial (Cartesian) space.

Fig.5 illustrates the above assertions: each epipolar line can be projected onto the angular or spatial axis by fixing one of the axes, given the disparity. In other words, a ray in the light field is predictable if its disparity is known. Hence, a light field may be treated as a 2D series, which allows for the use of LSTMs.
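The series relations (2) and (3) can be read directly as a prediction rule: given a ray and its disparity, its position in any other view follows. The short Python sketch below propagates a single reference scanline into a toy EPI using this rule; the constant disparity and the Lambertian, occlusion-free setting are simplifying assumptions of ours.

```python
import numpy as np

def predict_ray(x, u, d, u_new):
    """Eqn. (2) as code: the ray seen at (x, u) with disparity d reappears
    in view u_new at spatial position x + d * (u_new - u)."""
    return x + d * (u_new - u)

def synthesize_epi(scanline, disparity, n_views):
    """Build a toy EPI from one reference scanline (view 0) by propagating
    every pixel along its epipolar line."""
    xs = np.arange(len(scanline))
    epi = np.zeros((n_views, len(scanline)))
    for u in range(n_views):
        # Position in the reference view that maps onto pixel x of view u.
        src = predict_ray(xs, u, disparity, 0)
        epi[u] = np.interp(src, xs, scanline)
    return epi

epi = synthesize_epi(np.random.rand(64), disparity=1.6, n_views=9)
```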

Figure 6: The architecture of our neural network.

4.3 CNN-LSTM for EPI super-resolution

To handle large disparities in the light field, we propose a CNN-LSTM network whose architecture is shown in Fig.6. The overall network is inspired by the U-shaped networks used in EPI analysis[13, 14]. Our network has four “levels”, each of which processes the EPI at a different resolution. In contrast with previous work, four c-LSTM[40] layers are added at each level (the purple blocks in Fig.6) to model the series nature of the EPI in the top-down, bottom-up, left-right and right-left directions, respectively. In our network, each c-LSTM has 100 channels, while the kernel size is .

When processing a low-resolution input EPI, we first upscale it by a factor of 2 in both the angular and spatial dimensions using bicubic interpolation. Before the LSTM analysis at each level, 4 convolutional layers are applied; these layers have kernels of size , and their number of channels equals for the -th level. After the LSTM analysis, three convolutional layers are added with kernel size and channels , and . Each convolutional layer is followed by a ReLU[27]. Different levels in Fig.6 are connected by down- and up-convolutional layers with kernel size .
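Since the text describes only the high-level layout of the c-LSTM blocks (four directional scans per level, 100 channels each), the following PyTorch sketch shows one plausible way such a block could be written. The class names, the 1D-convolutional gate design and the 1x1 fusion convolution are our own assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConvLSTM1DCell(nn.Module):
    """A convolutional LSTM cell whose states are 1D feature maps; the EPI
    feature map is scanned slice by slice, so each step sees a 1D slice."""
    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        # One convolution produces all four gates (i, f, o, g) at once.
        self.gates = nn.Conv1d(in_ch + hid_ch, 4 * hid_ch, kernel_size,
                               padding=kernel_size // 2)
        self.hid_ch = hid_ch

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

def scan(cell, feat, axis, reverse=False):
    """Run one ConvLSTM scan over one axis of a [B, C, H, W] EPI feature map."""
    steps = feat.unbind(dim=axis)            # sequence of [B, C, L] slices
    if reverse:
        steps = steps[::-1]
    b, _, h, w = feat.shape
    length = w if axis == 2 else h
    hid = feat.new_zeros(b, cell.hid_ch, length)
    cstate = torch.zeros_like(hid)
    outs = []
    for s in steps:
        hid, cstate = cell(s, (hid, cstate))
        outs.append(hid)
    if reverse:
        outs = outs[::-1]
    return torch.stack(outs, dim=axis)       # back to [B, hid_ch, H, W]

class FourWayEPILSTM(nn.Module):
    """Four c-LSTM scans (top-down, bottom-up, left-right, right-left),
    concatenated and fused: a sketch of one purple block in Fig.6."""
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.cells = nn.ModuleList([ConvLSTM1DCell(in_ch, hid_ch) for _ in range(4)])
        self.fuse = nn.Conv2d(4 * hid_ch, in_ch, 1)

    def forward(self, feat):                  # feat: [B, C, angular, spatial]
        outs = [
            scan(self.cells[0], feat, axis=2, reverse=False),  # top-down
            scan(self.cells[1], feat, axis=2, reverse=True),   # bottom-up
            scan(self.cells[2], feat, axis=3, reverse=False),  # left-right
            scan(self.cells[3], feat, axis=3, reverse=True),   # right-left
        ]
        return self.fuse(torch.cat(outs, dim=1))

# Example: one block over a batch of EPI feature maps with 100-channel c-LSTMs.
# block = FourWayEPILSTM(in_ch=64, hid_ch=100)
# out = block(torch.randn(1, 64, 9, 64))      # [batch, channels, views, width]
```

In such a sketch, the block would sit between the per-level convolutional layers described above, with the surrounding convolutions changing the channel count as needed.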

It is worth noting that the proposed light field super-resolution network takes a 2D EPI as input. Although effective, this induces an inherent ambiguity in the spatial super-resolution: each EPI covers only one spatial dimension, whereas every pixel is related to two spatial dimensions in the super-resolution process ( pixel to pixels). As we will see in Section 5, this does not overly affect the performance of our method in practice.

4.4 Datasets

To train and evaluate our network, we built an automatic light field generator based on POV-Ray[30, 34] to render light fields; Fig.7 shows some examples. For training and testing, we include various challenging conditions in our dataset, such as inter-reflection, occlusion, shadowing, varying illumination and structures with fine detail.

We augment the training data in two ways. The first is exchanging the RGB channels. The second is shearing the EPIs[28] using the expression

$E_s(x, u) = E(x + s \cdot u,\ u)$,   (4)

where $E$ and $E_s$ are the original and sheared EPIs, respectively, and $s$ is the shear value. The main goal of the shearing operation is to enhance the performance in negative disparity areas.
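A minimal Python sketch of the shearing augmentation in Eqn. (4) is given below, assuming the EPI is stored with the angular dimension first; scipy.ndimage.shift performs the sub-pixel row shifts. This is our illustration, not the authors' code.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def shear_epi(epi, s):
    """Shear an EPI by shifting each angular row u by s * u pixels.

    `epi` is assumed to be a (U, X) or (U, X, 3) array with the angular
    dimension first; `s` is the shear value in pixels per view.
    """
    sheared = np.empty_like(epi)
    for u in range(epi.shape[0]):
        # Sub-pixel horizontal shift of row u; linear interpolation (order=1).
        shift_vec = (s * u,) + (0,) * (epi.ndim - 2)
        sheared[u] = nd_shift(epi[u], shift=shift_vec, order=1, mode='nearest')
    return sheared

# Example: a negative shear moves the disparity range toward negative values.
# epi_aug = shear_epi(epi, s=-1.0)
```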

Figure 7: Light field examples. Top row: central views; Bottom row: corresponding occlusion maps.
(a) Recorded light field
(b) Allowed occ.
(c) Forbidden occ.
Figure 8: The flip operation causes the network to learn incorrect occlusions. (a) Recorded light field[23]: the red lines mark the sampled views while the blue/green lines mark the EPI lines; (b) occlusion after reconstruction; (c) incorrect occlusion caused by the flip operation.

Note that the flip operation, commonly used for data augmentation in traditional image super-resolution[7], cannot be applied in EPI super-resolution, since the network would learn incorrect occlusions. As shown in Fig.8(a), the intersection between foreground and background is lost after light field sampling. The flip operation would therefore cause the network to learn the wrong occlusion ordering, producing forbidden occlusions in the reconstructed light fields. This is shown in Fig.8(c), where an incorrect light field is reconstructed in which the foreground is occluded by the background.

5 Experimental Results

We compare our method with combinations of state-of-the-art light field reconstruction methods, namely EPICNN[39] and LFST[35], and the deep-learning-based image super-resolution method SRCNN[7]. All the results shown here are obtained using the code released by the respective authors.

We evaluate the performance of the proposed method on both synthetic and real light fields. All the quantitative comparisons shown here are average values over all the views under study. Since our network is trained on synthetic data, for fairness the data introduced in Sec.4.4 is only used to validate the efficacy of the LSTM layers, while real-world data from camera arrays [1, 17] is used for comparison with other methods. We do not use light fields from the Lytro Illum camera because of their small disparity.
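As a concrete reading of the evaluation protocol above (PSNR and SSIM averaged over all views), the following Python sketch uses scikit-image metrics; the function name and the assumption of float images in [0, 1] are ours, not the authors' evaluation code.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_light_field(gt_views, rec_views):
    """Average PSNR/SSIM over all reconstructed views.

    `gt_views` and `rec_views` are assumed to be lists of HxWx3 float arrays
    in [0, 1]. channel_axis requires scikit-image >= 0.19 (older versions
    use multichannel=True instead).
    """
    psnrs, ssims = [], []
    for gt, rec in zip(gt_views, rec_views):
        psnrs.append(peak_signal_noise_ratio(gt, rec, data_range=1.0))
        ssims.append(structural_similarity(gt, rec, data_range=1.0, channel_axis=-1))
    return float(np.mean(psnrs)), float(np.mean(ssims))
```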

5.1 Synthetic data

In Tab.1, we show a quantitative comparison between the results yielded by our network with and without the LSTM layers. Fig.9 plots the PSNR for both settings as a function of disparity. Note that, as expected, the network with LSTM layers outperforms the one without over almost the entire disparity range.

Furthermore, Fig.10 shows qualitative results. Notice that the network with LSTM recovers more detail in the reconstructed views than the one without. This is mainly because, without LSTM, the discrete EPI lines in these areas are over-smoothed, as shown in the bottom EPI comparison of Fig.10. This is consistent with the notion that LSTM can better cope with the “jumping continuity” in EPIs.

            PSNR (dB)   SSIM
w. LSTM       28.34     0.886
w/o. LSTM     27.75     0.863
Table 1: Quantitative comparison between the results yielded by our network with and without LSTM layers.
Figure 9: Performance of our network with and without LSTM layers as a function of disparity.
Figure 10: Qualitative comparison between the ground truth and the results of the networks with and without LSTM layers. Compared with the network without LSTM, the one with LSTM provides clearer novel views and more accurate EPI lines.

5.2 Real data

5.2.1 Comparison with State-of-the-arts

For the comparison on real data, we use the Stanford light field dataset (SLFD)[1]. To compare the performance of the different methods more fairly, we zoom out each view of the SLFD to of its original size. Recall that the disparity range decreases with the zoom-out factor. For reference, the disparity ranges of the original-sized views are shown in Tab. 2.

Data Amethyst Bulldozer Bunny Chess Lego Truck
Dis.
Table 2: Disparity ranges of the SLFD [1].
Factor   EPICNN[39]+SRCNN[7]    LFST[35]+SRCNN[7]     Ours
         PSNR (dB)    SSIM      PSNR (dB)    SSIM     PSNR (dB)    SSIM
0.2        28.77      0.898       32.38      0.945      32.17      0.956
0.3        28.28      0.891       32.20      0.940      32.85      0.958
0.4        27.97      0.889       32.27      0.944      33.44      0.961
0.5        27.71      0.888       31.19      0.935      33.70      0.960
Table 3: Quantitative comparison on the SLFD.
(a) Bulldozer
(b) Legoknights
Figure 11: Qualitative comparison on the Bulldozer and the Lego Knights. For each light field, the first and second rows show results for inputs at different resolutions. For each of the zoomed-in areas in red and green, the left panel shows the reconstructed view, while the two rectangular panels on the right show the horizontal and vertical EPIs computed from the reconstructed light field.

Tab. 3 shows a quantitative comparison of our method with respect to the alternatives on the SLFD. Note that our method outperforms the alternatives at almost all zoom-out factors. Although our network is trained only on synthetic data, it generalizes well to unseen camera array data. Fig.11 shows qualitative results (more results are provided in the supplementary material). For each scene in Fig.11, the first and second rows correspond to zoom-out factors and , respectively. Note that EPICNN and LFST achieve performance similar to ours at small zoom-out factors. However, their performance decreases at large zoom-out factors, and they tend to over-smooth object boundaries. Ours, on the other hand, maintains sharp object boundaries at both small and large zoom-out factors. For example, in Fig.11(a), the boundaries of the Bulldozer are all well preserved, whereas the previous methods often fail at large zoom-out factors.

Ours vs. EPICNN: Compared with this state-of-the-art method, ours achieves at least a 3 dB lead (32.17 vs. 28.77), and the advantage increases with the disparity. Fig.11(a) gives a clearer comparison in larger disparity areas: there is serious ghosting at the shovel boundaries recovered by EPICNN. Since the maximum disparity in these areas is about pixels, the EPI consistency on the shovel is lost by EPICNN, while our result remains sharp.

Ours vs. LFST: Although LFST achieves good results in large positive disparity regions (such as the shovel boundaries in Fig.11(a)), its results in negative disparity regions are mediocre. The clearest example is Fig.11(b), where the areas in front of and behind the toy warriors have positive and negative disparities, respectively. LFST produces ghosting in large negative disparity areas, whereas our method produces consistent results in both positive and negative disparity areas, thanks to the shearing data augmentation (Eqn.4) and the LSTM's ability to model EPIs. Furthermore, in contrast with our approach, LFST often generates artifacts at texture boundaries, as shown in some of the green boxes in Fig. 11(a).

5.2.2 Disparity vs Resolution

As shown in Tab. 3, the performance of our method increases with the zoom-out factor. To better understand this, we conduct two additional experiments in which we fix the spatial resolution and the disparity range, respectively. To this end, we use the Disney light field dataset[17] (details of the light fields are provided in the supplementary material). Since it has high angular and spatial resolution, we control the experiments through the number of skipped views: generally, a larger number of skipped views leads to a larger disparity range and a lower resolution.

Tab. 4 and Tab. 5 show quantitative results for the proposed method in these two experiments. The performance of our method decreases as the disparity increases, and increases with the resolution. In large disparity areas with complex textures, the EPI consistency is very weak, so a small error in the EPI reconstruction leads to heavy artifacts in the reconstructed view. This can be seen in Fig.12: the reconstructed EPIs are very close to the ground truth in both the small and large disparity cases, yet the corresponding view is well recovered at small disparity but overly smoothed at large disparity. Nevertheless, both the angular and spatial resolution are improved by our network, and the defects induced in the angular domain are compensated by the super-resolution in the spatial domain. As a result, the performance of our method increases as the zoom-out factor increases in Tab. 3.

Skipped views Bikes Couch
PSNR(dB) SSIM PSNR SSIM
1 (low dis.)
3
5
7 (high dis.)
Table 4: Results of our method with fixed resolution and increasing disparity.
Skipped views Bikes Couch
PSNR(dB) SSIM PSNR SSIM
7 (low res.)
5
3
1 (high res.)
Table 5: Results of our method with fixed disparity and increasing resolution.
Figure 12: Results yielded by our method with fixed resolution for several disparity values.

6 Conclusion

In this paper, we have shown that, since most 3D points in a scene are generally defocused in an LFC, the effective spatial resolution of an LFC is in fact larger than its number of micro-lenses, contrary to the common belief expressed by the spatio-angular resolution trade-off. This new insight provides a theoretical basis for overcoming the barrier of the “spatio-angular trade-off”. By analyzing the light path in an LFC, we have identified two different types of “continuity” in EPIs and proposed a novel CNN-LSTM network to super-resolve the light field along both the spatial and angular axes. Experiments on synthetic and real-world light fields validate the superiority of the proposed method in large disparity areas, where it outperforms most of the state-of-the-art methods.

References

  • [1] The new stanford light field archive. http://lightfield.stanford.edu/lfs.html.
  • [2] T. E. Bishop and P. Favaro. The light field camera: Extended depth of field, aliasing, and superresolution. TPAMI, 34(5):972–986, 2012.
  • [3] M. Broxton, L. Grosenick, S. Yang, N. Cohen, A. Andalman, K. Deisseroth, and M. Levoy. Wave optics theory and 3-d deconvolution for the light field microscope. Optics express, 21(21):25418–25439, 2013.
  • [4] J.-X. Chai, X. Tong, S.-C. Chan, and H.-Y. Shum. Plenoptic sampling. In SIGGRAPH, pages 307–318. ACM Press/Addison-Wesley Publishing Co., 2000.
  • [5] G. Chaurasia, S. Duchene, O. Sorkine-Hornung, and G. Drettakis. Depth synthesis and local warps for plausible image-based navigation. TOG, 32(3):30, 2013.
  • [6] G. Chaurasia, O. Sorkine, and G. Drettakis. Silhouette-aware warping for image-based rendering. In CGF, volume 30, pages 1223–1232. Wiley Online Library, 2011.
  • [7] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. TPAMI, 38(2):295–307, 2016.
  • [8] M. Eisemann, B. De Decker, M. Magnor, P. Bekaert, E. De Aguiar, N. Ahmed, C. Theobalt, and A. Sellent. Floating textures. In CGF, volume 27, pages 409–418. Wiley Online Library, 2008.
  • [9] T. Georgiev, K. C. Zheng, B. Curless, D. Salesin, S. K. Nayar, and C. Intwala. Spatio-angular resolution tradeoffs in integral photography. Rendering Techniques, 2006(263-272):21, 2006.
  • [10] T. G. Georgiev and A. Lumsdaine. Superresolution with plenoptic 2.0 cameras. In Signal recovery and synthesis, page STuA6. Optical Society of America, 2009.
  • [11] M. Goesele, J. Ackermann, S. Fuhrmann, C. Haubold, R. Klowsky, D. Steedly, and R. Szeliski. Ambient point clouds for view interpolation. In TOG, volume 29, page 95. ACM, 2010.
  • [12] S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen. The lumigraph. In SIGGRAPH, pages 43–54. ACM, 1996.
  • [13] S. Heber, W. Yu, and T. Pock. U-shaped networks for shape from light field. In BMVC, volume 3, page 5, 2016.
  • [14] S. Heber, W. Yu, and T. Pock. Neural epi-volume networks for shape from light field. In IEEE CVPR, pages 2252–2260, 2017.
  • [15] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
  • [16] N. K. Kalantari, T.-C. Wang, and R. Ramamoorthi. Learning-based view synthesis for light field cameras. TOG, 35(6), 2016.
  • [17] C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. H. Gross. Scene reconstruction from high spatio-angular resolution light fields. TOG, 32(4):73–1, 2013.
  • [18] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In IEEE CVPR, pages 1646–1654, 2016.
  • [19] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012.
  • [20] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep laplacian pyramid networks for fast and accurate super-resolution. In IEEE CVPR, pages 624–632, 2017.
  • [21] Y. LeCun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, and L. D. Jackel. Handwritten digit recognition with a back-propagation network. In NIPS, pages 396–404, 1990.
  • [22] A. Levin and F. Durand. Linear view synthesis using a dimensionality gap light field prior. In IEEE CVPR, pages 1831–1838, 2010.
  • [23] A. Levin, W. T. Freeman, and F. Durand. Understanding camera trade-offs through a bayesian analysis of light field projections. In Springer ECCV, pages 88–101, 2008.
  • [24] M. Levoy and P. Hanrahan. Light field rendering. In SIGGRAPH, pages 31–42. ACM, 1996.
  • [25] Z. Lin and H.-Y. Shum. A geometric analysis of light field rendering. IJCV, 58(2):121–138, 2004.
  • [26] Lytro. Lytro redefines photography with light field cameras. http://www.lytro.com, 2011.
  • [27] V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In ICML, pages 807–814, 2010.
  • [28] R. Ng. Digital Light Field Photography. PhD thesis, Stanford University, Stanford, CA, 2006.
  • [29] E. Penner and L. Zhang. Soft 3d reconstruction for view synthesis. TOG, 36(6):235, 2017.
  • [30] POV-ray. http://www.povray.org/.
  • [31] Raytrix. raytrix. http://www.raytrix.de, 2012.
  • [32] L. Shi, H. Hassanieh, A. Davis, D. Katabi, and F. Durand. Light field reconstruction using sparsity in the continuous fourier domain. TOG, 34(1):12, 2014.
  • [33] P. P. Srinivasan, T. Wang, A. Sreelal, R. Ramamoorthi, and R. Ng. Learning to synthesize a 4d rgbd light field from a single image. In IEEE ICCV, volume 2, page 6, 2017.
  • [34] G. Tran. Oyonale - 3d art and graphic experiments. http://www.oyonale.com/.
  • [35] S. Vagharshakyan, R. Bregovic, and A. Gotchev. Light field reconstruction using shearlet transform. TPAMI, 40(1):133–147, 2018.
  • [36] T.-C. Wang, J.-Y. Zhu, N. K. Kalantari, A. A. Efros, and R. Ramamoorthi. Light field video capture using a learning-based hybrid imaging system. TOG, 36(4):133, 2017.
  • [37] Y. Wang, F. Liu, Z. Wang, G. Hou, Z. Sun, and T. Tan. End-to-end view synthesis for light field imaging with pseudo 4dcnn. In Springer ECCV, pages 340–355, 2018.
  • [38] S. Wanner and B. Goldluecke. Variational light field analysis for disparity estimation and super-resolution. TPAMI, 36(3):606–619, 2014.
  • [39] G. Wu, M. Zhao, L. Wang, Q. Dai, T. Chai, and Y. Liu. Light field reconstruction using deep convolutional network on epi. In IEEE CVPR, volume 2017, page 2, 2017.
  • [40] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In NIPS, pages 802–810, 2015.
  • [41] H. W. F. Yeung, J. Hou, J. Chen, Y. Y. Chung, and X. Chen. Fast light field reconstruction with deep coarse-to-fine modelling of spatial-angular clues. In Springer ECCV, pages 137–152, 2018.
  • [42] Y. Yoon, H.-G. Jeon, D. Yoo, J.-Y. Lee, and I. So Kweon. Learning a deep convolutional network for light-field image super-resolution. In IEEE ICCV Workshops, pages 24–32, 2015.
  • [43] T. Zhou, R. Tucker, J. Flynn, G. Fyffe, and N. Snavely. Stereo magnification: Learning view synthesis using multiplane images. arXiv preprint arXiv:1805.09817, 2018.