1 Introduction
Light field (LF) cameras provide multiple views of a scene, and thus enable many attractive applications such as post-capture refocusing [32], depth sensing [26], saliency detection [19], and de-occlusion [31]. However, LF cameras face a trade-off between spatial and angular resolution [49]. That is, they either provide dense angular samplings with a low image resolution (e.g., Lytro (https://www.lytro.com) and RayTrix (https://www.raytrix.de)), or capture high-resolution (HR) sub-aperture images (SAIs) with sparse angular samplings (e.g., camera arrays [37, 30]). Consequently, many efforts have been made to improve the angular resolution through LF reconstruction [39, 38], or the spatial resolution through LF image super-resolution (SR) [1, 46, 23, 33, 41]. In this paper, we focus on the LF image SR problem, namely, reconstructing HR SAIs from their corresponding low-resolution (LR) SAIs.

Image SR is a long-standing problem in computer vision. To achieve high reconstruction performance, SR methods need to incorporate as much useful information as possible from LR inputs. In the area of single image SR, good performance can be achieved by fully exploiting the neighborhood context (
i.e., spatial information) in an image. Using the spatial information, single image SR methods [5, 15, 20, 47] can successfully hallucinate missing details. In contrast, LF cameras capture a scene from multiple views. The complementary information among different views (i.e., angular information) can be used to further improve the performance of LF image SR. However, due to the complicated 4D structure of LFs [18], it is highly challenging to incorporate both the spatial and the angular information in an LF. Existing LF image SR methods fail to fully exploit both kinds of information, resulting in limited SR performance. Specifically, in [43, 42, 44], SAIs are first super-resolved separately using single image SR methods [5, 20], and then fine-tuned together to incorporate the angular information. Consequently, the angular information is ignored by these two-stage methods [43, 42, 44] during their upsampling process. In [33, 46], only part of the SAIs are used to super-resolve one view, and the angular information in the discarded views is not incorporated. In contrast, Rossi et al. proposed a graph-based method [23] that considers all angular views in an optimization process. However, this method [23]
cannot fully use the spatial information, and is inferior to deep learning-based SR methods
[20, 47, 46, 41]. It is worth noting that, even if all views are fed to a deep network, it is still challenging to achieve superior performance. Yeung et al. proposed a deep network named LFSSR [41] that considers all views for LF image SR. However, as shown in Fig. 1, LFSSR [41] is inferior to resLF [46], EDSR [20], and RCAN [47]. The spatial information and the angular information are highly coupled in 4D LFs, and contribute to LF image SR in different manners. Consequently, it is difficult for networks to perform well by directly processing this coupled information. To efficiently incorporate spatial and angular information, we propose a spatial-angular interactive network (i.e., LF-InterNet) for LF image SR. We first specifically design two convolutions to decouple spatial and angular features from an input LF. Then, we develop LF-InterNet to repetitively interact and incorporate spatial and angular information. Extensive ablation studies have been conducted to validate our designs. We compare our method to state-of-the-art single and LF image SR methods on 6 public LF datasets. As shown in Fig. 1, our LF-InterNet substantially improves the PSNR and SSIM performance as compared to existing SR methods.
2 Related Works
In this section, we review several major works on single image SR and LF image SR.
2.1 Single Image SR
In the area of single image SR, deep learning-based methods have been extensively explored. Readers are referred to recent surveys [34, 3, 40] for more details on single image SR. Here, we only review several milestone works. Dong et al. proposed the first CNN-based SR method (i.e., SRCNN [5]) by cascading 3 convolutional layers. Although SRCNN [5] is shallow and simple, it achieves significant improvements over traditional SR methods [28, 14, 45]. Afterwards, SR networks became increasingly deep and complex, and thus more powerful in exploiting spatial information. Kim et al. proposed a very deep SR network (i.e., VDSR [15]) with 20 convolutional layers. Global residual learning is applied in VDSR [15] to avoid slow convergence. Lim et al. proposed an enhanced deep SR network (i.e., EDSR [20]) with 65 convolutional layers by cascading residual blocks [9]. EDSR substantially improves performance by applying both local and global residual learning, and won the NTIRE 2017 challenge on single image SR [27]. More recently, Zhang et al. proposed a residual dense network (i.e., RDN [48]) with 149 convolutional layers by combining ResNet [9] with DenseNet [12]. Using residual dense connections, RDN [48] can fully extract hierarchical features for image SR, and thus achieves further improvements over EDSR [20]. Subsequently, Zhang et al. proposed a residual channel attention network (i.e., RCAN [47]) by applying both a recursive residual mechanism and a channel attention module [11]. RCAN [47] has 500 convolutional layers, and is one of the most powerful SR methods to date.
2.2 LF image SR
In the area of LF image SR, different paradigms have been proposed. Most early works follow the traditional paradigm. Bishop et al. [4]
first estimated the scene depth and then used a de-convolution approach to estimate HR SAIs. Wanner
et al. [35] proposed a variational LF image SR framework using estimated disparity maps. Farrugia et al. [7] decomposed HR-LR patches into several subspaces, and achieved LF image SR via PCA analysis. Alain et al. extended SR-BM3D [6] to LFs, and super-resolved SAIs using LFBM5D filtering [1]. Rossi et al. [23] formulated LF image SR as a graph optimization problem. These traditional methods [4, 35, 7, 1, 23] use different approaches to exploit angular information, but cannot fully exploit spatial information. In contrast, deep learning-based SR methods are more effective in exploiting spatial information, and thus achieve promising performance. Many deep learning-based methods have recently been developed for LF image SR. In the pioneering work by Yoon et al. (i.e., LFCNN [43]), SAIs are first super-resolved separately using SRCNN [5], and then fine-tuned in pairs to incorporate angular information. Similarly, Yuan et al. proposed LF-DCNN [44], in which EDSR [20] is used to super-resolve each SAI before the results are fine-tuned. Both LFCNN [43] and LF-DCNN [44] handle the LF image SR problem in two stages, and do not use angular information in the first stage. Different from [43, 44], Wang et al. proposed LFNet [33] by extending BRCN [13] to LF image SR. In their method, SAIs from the same row (or column) are fed to a recurrent network to incorporate the angular information. Zhang et al. stacked SAIs along different angular directions to generate input volumes, and proposed a multi-stream residual network named resLF [46]. Both LFNet [33] and resLF [46] reduce the 4D LF to 3D by using only part of the SAIs to super-resolve one view. Consequently, the angular information in the discarded views cannot be incorporated. To consider all views for LF image SR, Yeung et al. proposed LFSSR [41], which alternately shuffles LF features between the SAI pattern and the macro-pixel image (MPI) pattern for convolution. However, the complicated LF structure and the coupled information hinder the performance gain of LFSSR [41].
3 Method
In this section, we first introduce the approach used to decouple spatial and angular features in Section 3.1, and then present our network in detail in Section 3.2.
3.1 Spatial-Angular Feature Decoupling

An LF has a 4D structure and can be denoted as $\mathcal{L} \in \mathbb{R}^{U \times V \times H \times W}$, where $U$ and $V$ represent the angular dimensions, and $H$ and $W$ represent the height and width of each SAI. Intuitively, an LF can be considered as a 2D angular collection of SAIs, where the SAI at angular coordinate $(u, v)$ can be denoted as $\mathcal{L}(u, v, :, :) \in \mathbb{R}^{H \times W}$. Similarly, an LF can also be organized into an MPI, namely, a 2D spatial collection of macro-pixels, where the macro-pixel at spatial coordinate $(h, w)$ can be denoted as $\mathcal{L}(:, :, h, w) \in \mathbb{R}^{U \times V}$. An illustration of these two LF representations is shown in Fig. 2. Note that, when an LF is organized as a 2D SAI array, the angular information is implicitly contained among different SAIs and is thus hard to extract. Therefore, we use the MPI representation in our method, and specifically design two special convolutions, namely, an Angular Feature Extractor (AFE) and a Spatial Feature Extractor (SFE), to extract and decouple angular and spatial features.
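To make the two representations concrete, the following PyTorch sketch shows one possible way to rearrange a 4D LF tensor between the SAI-array and MPI layouts. The tensor layout [U, V, H, W] and the function names are our own assumptions for illustration and are not part of the released code.

```python
import torch

def lf_to_mpi(lf):
    """Rearrange a 4D light field into a macro-pixel image (MPI).

    lf: tensor of shape [U, V, H, W] (angular dimensions first, then spatial).
    Returns a tensor of shape [H*U, W*V], where each U x V block collects all
    angular samples of one spatial location (i.e., one macro-pixel).
    """
    U, V, H, W = lf.shape
    # [U, V, H, W] -> [H, U, W, V] -> [H*U, W*V]
    return lf.permute(2, 0, 3, 1).reshape(H * U, W * V)

def mpi_to_lf(mpi, U, V):
    """Inverse rearrangement: MPI of shape [H*U, W*V] back to [U, V, H, W]."""
    H, W = mpi.shape[0] // U, mpi.shape[1] // V
    return mpi.reshape(H, U, W, V).permute(1, 3, 0, 2)
```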
Since most methods use SAIs distributed in a square array as their input, we follow [1, 23, 43, 42, 41, 46] and set $U = V = A$ in our method, where $A$ denotes the angular resolution. Given an LF of size $A \times A \times H \times W$, an MPI of size $AH \times AW$ can be generated by organizing the macro-pixels of size $A \times A$ according to their spatial coordinates. Here, we use a toy example in Fig. 3 to illustrate the processes of angular and spatial feature extraction. Specifically, AFE is defined as a convolution with a kernel size of $A \times A$ and a stride of $A$. Padding is not performed, so that the features generated by AFE have a size of $H \times W \times C$, where $C$ represents the feature depth. In contrast, SFE is defined as a convolution with a kernel size of $3 \times 3$, a stride of $1$, and a dilation of $A$. We perform zero padding to ensure that the output features have the same spatial size as the input MPI. It is worth noting that, during angular feature extraction, each macro-pixel is convolved exactly once by AFE, so the information across different macro-pixels is not aliased. Similarly, during spatial feature extraction, the pixels within each SAI are convolved by the SFE, while the angular information is not involved. In this way, the spatial and angular information in an LF is decoupled. Due to the 3D nature of real scenes, objects at different depths have different disparity values in an LF. Consequently, the pixels of an object in different views do not always fall within a single macro-pixel. To handle this problem, we apply AFE and SFE multiple times (i.e., we perform spatial-angular interaction) in our network. As shown in Fig. 4, the receptive field is thereby enlarged to cover pixels with different disparities.
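The two extractors can be realized with standard strided and dilated 2D convolutions. The snippet below is a minimal sketch of this idea; the angular resolution A = 5 and feature depth C = 64 used here are illustrative assumptions, not values taken from the text.

```python
import torch
import torch.nn as nn

A, C = 5, 64                               # assumed angular resolution and feature depth
mpi = torch.randn(1, 1, A * 32, A * 32)    # a toy single-channel MPI with H = W = 32

# AFE: kernel A x A, stride A, no padding -> one output pixel per macro-pixel,
# so only the angular information inside each macro-pixel is mixed.
afe = nn.Conv2d(1, C, kernel_size=A, stride=A, padding=0)

# SFE: kernel 3 x 3, stride 1, dilation A, padding A -> the dilated taps fall on
# pixels sharing the same angular coordinate, so only spatial information within
# each SAI is mixed and the MPI size is preserved.
sfe = nn.Conv2d(1, C, kernel_size=3, stride=1, dilation=A, padding=A)

print(afe(mpi).shape)  # torch.Size([1, 64, 32, 32])   -> angular feature of size H x W
print(sfe(mpi).shape)  # torch.Size([1, 64, 160, 160]) -> spatial feature of size AH x AW
```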



3.2 Network Design
Our LF-InterNet takes an LR MPI of size $AH \times AW$ as its input and produces an HR SAI array of size $\alpha AH \times \alpha AW$, where $\alpha$ denotes the upsampling factor. Following [46, 41, 33, 44], we convert RGB images into the YCbCr color space and only super-resolve the Y channel images. An overview of our network is shown in Fig. 5.
3.2.1 Overall Architecture
Given an LR MPI $\mathcal{L}_{\mathrm{MPI}}$, the angular and spatial features are first extracted by AFE and SFE, respectively:
$\mathcal{F}_A^{(0)} = \mathcal{H}_{\mathrm{AFE}}\left(\mathcal{L}_{\mathrm{MPI}}\right), \quad \mathcal{F}_S^{(0)} = \mathcal{H}_{\mathrm{SFE}}\left(\mathcal{L}_{\mathrm{MPI}}\right),$ (1)
where $\mathcal{F}_A^{(0)}$ and $\mathcal{F}_S^{(0)}$ respectively represent the extracted angular and spatial features, and $\mathcal{H}_{\mathrm{AFE}}$ and $\mathcal{H}_{\mathrm{SFE}}$ respectively represent the angular and spatial feature extractors (as described in Section 3.1). After this initial feature extraction, the features $\mathcal{F}_A^{(0)}$ and $\mathcal{F}_S^{(0)}$ are further processed by a series of interaction groups (i.e., Inter-Groups, see Section 3.2.2) to achieve spatial-angular feature interaction:
$\left(\mathcal{F}_A^{(g)}, \mathcal{F}_S^{(g)}\right) = \mathcal{H}_{\mathrm{IG}}^{(g)}\left(\mathcal{F}_A^{(g-1)}, \mathcal{F}_S^{(g-1)}\right), \quad g = 1, 2, \cdots, G,$ (2)
where $\mathcal{H}_{\mathrm{IG}}^{(g)}$ denotes the $g$-th Inter-Group and $G$ denotes the total number of Inter-Groups.
Inspired by RDN [48], we cascade all these Inter-Groups to fully use the information interacted at different stages. Specifically, the features generated by each Inter-Group are concatenated and fed to a bottleneck block to fuse the interacted information. The feature generated by the bottleneck block is further added to the initial spatial feature to achieve global residual learning. The fused feature $\mathcal{F}$ can be obtained by
$\mathcal{F} = \mathcal{H}_{\mathrm{B}}\left(\left[\mathcal{F}^{(1)}, \mathcal{F}^{(2)}, \cdots, \mathcal{F}^{(G)}\right]\right) + \mathcal{F}_S^{(0)},$ (3)
where $\mathcal{H}_{\mathrm{B}}$ denotes the bottleneck block, $[\cdot]$ denotes the concatenation operation, and $\mathcal{F}^{(g)}$ collectively denotes the angular and spatial features produced by the $g$-th Inter-Group. Finally, the fused feature $\mathcal{F}$ is fed to the reconstruction module, and an HR SAI array $\mathcal{I}^{\mathrm{SR}}$ can be obtained by
$\mathcal{I}^{\mathrm{SR}} = \mathcal{H}_{\mathrm{conv}}\left(\mathcal{H}_{\mathrm{PS}}\left(\mathcal{H}_{\mathrm{LS}}\left(\mathcal{F}\right)\right)\right),$ (4)
where $\mathcal{H}_{\mathrm{LS}}$, $\mathcal{H}_{\mathrm{PS}}$, and $\mathcal{H}_{\mathrm{conv}}$ represent LF shuffle, pixel shuffle, and the final convolution, respectively. More details about feature fusion and reconstruction are introduced in Section 3.2.3.
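To summarize how Eqs. (1)-(4) fit together, a deliberately simplified forward pass could be organized as below. Every argument name here is a placeholder for the corresponding module described in Sections 3.1-3.2.3; this is a structural sketch under our own assumptions, not the authors' released implementation.

```python
def lf_internet_forward(mpi_lr, afe, sfe, inter_groups, bottleneck,
                        channel_ext, lf_shuffle, pixel_shuffle, final_conv):
    """Schematic forward pass of LF-InterNet following Eqs. (1)-(4).
    All arguments are callables assumed to implement the described modules."""
    # Eq. (1): initial angular and spatial feature extraction from the LR MPI
    f_a, f_s = afe(mpi_lr), sfe(mpi_lr)
    f_s0 = f_s                              # kept for global residual learning
    # Eq. (2): cascaded Inter-Groups repeatedly interact the two kinds of features
    group_feats = []
    for inter_group in inter_groups:
        f_a, f_s = inter_group(f_a, f_s)
        group_feats.append((f_a, f_s))
    # Eq. (3): bottleneck fusion of all interacted features + global residual
    # (the residual is assumed to be the initial spatial feature)
    f_fused = bottleneck(group_feats) + f_s0
    # Eq. (4) plus the channel-extension SFE of Section 3.2.3:
    # channel extension, LF shuffle, pixel shuffle, final reconstruction
    return final_conv(pixel_shuffle(lf_shuffle(channel_ext(f_fused))))
```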
3.2.2 Spatial-Angular Feature Interaction
The basic module for spatial-angular interaction is the interaction block (i.e., Inter-Block). As shown in Fig. 5 (b), the Inter-Block takes a pair of angular and spatial features as inputs to achieve interaction. Specifically, the input angular feature is first upsampled by a factor of $A$ so that it matches the spatial size of the input spatial feature. Here, a convolution followed by a pixel shuffle layer is used for upsampling. Then, the upsampled angular feature is concatenated with the input spatial feature and fed to an SFE to incorporate the spatial and angular information. In this way, the complementary angular information can be used to guide spatial feature extraction. Simultaneously, a new angular feature is extracted from the input spatial feature by an AFE and concatenated with the input angular feature. The concatenated angular feature is further fed to a convolution to integrate and update the angular information. Note that the fused angular and spatial features are added to their respective input features to achieve local residual learning. In our network, we cascade $B$ Inter-Blocks in each Inter-Group, i.e., the output of an Inter-Block forms the input of its subsequent Inter-Block. In summary, the spatial-angular feature interaction can be formulated as
$\mathcal{F}_S^{(g,b)} = \mathcal{H}_{\mathrm{SFE}}\left(\left[\mathcal{F}_S^{(g,b-1)}, \mathcal{H}_{\uparrow}\left(\mathcal{F}_A^{(g,b-1)}\right)\right]\right) + \mathcal{F}_S^{(g,b-1)},$
$\mathcal{F}_A^{(g,b)} = \mathcal{H}_{\mathrm{fuse}}\left(\left[\mathcal{F}_A^{(g,b-1)}, \mathcal{H}_{\mathrm{AFE}}\left(\mathcal{F}_S^{(g,b-1)}\right)\right]\right) + \mathcal{F}_A^{(g,b-1)},$ (5)
where $\mathcal{F}_S^{(g,b)}$ and $\mathcal{F}_A^{(g,b)}$ ($b = 1, 2, \cdots, B$) represent the output spatial and angular features of the $b$-th Inter-Block in the $g$-th Inter-Group, respectively, $\mathcal{H}_{\uparrow}$ represents the upsampling operation, and $\mathcal{H}_{\mathrm{fuse}}$ represents the convolution used to integrate the concatenated angular features.
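A compact PyTorch sketch of such an Inter-Block is given below. The layer choices (a 1x1 convolution for angular fusion, and convolution plus pixel shuffle for upsampling) follow the textual description above; the exact kernel sizes and module names are our assumptions.

```python
import torch
import torch.nn as nn

class InterBlock(nn.Module):
    """Sketch of the Inter-Block in Fig. 5 (b): angular and spatial features
    guide each other and are updated with local residual connections."""
    def __init__(self, channels, ang_res):
        super().__init__()
        # upsample the angular feature by a factor of A (conv + pixel shuffle)
        self.up = nn.Sequential(
            nn.Conv2d(channels, channels * ang_res * ang_res, kernel_size=1),
            nn.PixelShuffle(ang_res))
        # SFE that fuses [spatial feature, upsampled angular feature]
        self.sfe = nn.Conv2d(2 * channels, channels, kernel_size=3,
                             stride=1, dilation=ang_res, padding=ang_res)
        # AFE that extracts a new angular feature from the spatial feature
        self.afe = nn.Conv2d(channels, channels, kernel_size=ang_res,
                             stride=ang_res, padding=0)
        # convolution that integrates the concatenated angular features
        self.ang_fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_a, f_s):
        # f_a: [B, C, H, W] angular feature; f_s: [B, C, A*H, A*W] spatial feature
        f_s_out = self.sfe(torch.cat([f_s, self.up(f_a)], dim=1)) + f_s
        f_a_out = self.ang_fuse(torch.cat([f_a, self.afe(f_s)], dim=1)) + f_a
        return f_a_out, f_s_out
```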
3.2.3 Feature Fusion and Reconstruction
The objective of this stage is to fuse the interacted features to reconstruct an HR SAI array. The fusion and reconstruction stage mainly consists of bottleneck fusion (Fig. 5 (c)), channel extension, LF shuffle (Fig. 5 (d)), pixel shuffle (Fig. 5 (e)), and final reconstruction.
In the bottleneck, the concatenated angular features $\left[\mathcal{F}_A^{(1)}, \cdots, \mathcal{F}_A^{(G)}\right]$ are first fed to a convolution and a ReLU layer to generate a squeezed angular feature $\mathcal{F}_A^{\mathrm{B}}$. Then, the squeezed angular feature is upsampled and concatenated with the spatial features. The output of the bottleneck block in Eq. (3) can thus be written as
$\mathcal{H}_{\mathrm{B}}\left(\left[\mathcal{F}^{(1)}, \cdots, \mathcal{F}^{(G)}\right]\right) = \left[\mathcal{H}_{\uparrow}\left(\mathcal{F}_A^{\mathrm{B}}\right), \mathcal{F}_S^{(1)}, \cdots, \mathcal{F}_S^{(G)}\right].$ (6)
After the bottleneck, we apply another SFE layer to extend the channel size of the fused feature for pixel shuffle [25]. However, since the extended feature is organized in the MPI pattern, we apply LF shuffle to convert it into the SAI-array representation before pixel shuffle. To achieve LF shuffle, we first extract the pixels with the same angular coordinates in the MPI feature, and then re-organize these pixels according to their spatial coordinates, which can be formulated as
$\mathcal{F}_{\mathrm{SAI}}(i, j) = \mathcal{F}_{\mathrm{MPI}}(x, y),$ (7)
where
$x = A\left(i - H\left\lfloor i/H \right\rfloor\right) + \left\lfloor i/H \right\rfloor, \quad y = A\left(j - W\left\lfloor j/W \right\rfloor\right) + \left\lfloor j/W \right\rfloor.$ (8)
Here, $i$ and $j$ denote the (zero-based) pixel coordinates in the shuffled SAI array, $x$ and $y$ denote the corresponding coordinates in the input MPI, and $\lfloor \cdot \rfloor$ represents the round-down operation. The derivation of Eqs. (7) and (8) is presented in the supplemental material.
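In practice, the index mapping of Eqs. (7)-(8) can be realized with tensor reshapes rather than explicit loops. The sketch below assumes a feature layout of [B, C, A*H, A*W] and zero-based indices; it is an illustration of the rearrangement, not the released code.

```python
import torch

def lf_shuffle(feat, ang_res):
    """Rearrange a feature map from the MPI pattern to the SAI-array pattern.

    In the MPI pattern, row index x = A*h + u and column index y = A*w + v;
    in the SAI-array pattern, row index i = H*u + h and column index j = W*v + w.
    feat: tensor of shape [B, C, A*H, A*W] organized as an MPI.
    """
    b, c, ah, aw = feat.shape
    a = ang_res
    h, w = ah // a, aw // a
    # split MPI rows/cols into (h, u) and (w, v) index pairs
    feat = feat.reshape(b, c, h, a, w, a)
    # move the angular indices in front of the spatial ones within each axis
    feat = feat.permute(0, 1, 3, 2, 5, 4)
    # back to [B, C, A*H, A*W], now an SAI array where each H x W block is one view
    return feat.reshape(b, c, ah, aw)
```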
Finally, a convolution is applied to squeeze the number of feature channels to one for HR SAI reconstruction.
4 Experiments
In this section, we first introduce the datasets and our implementation details, then conduct ablation studies to investigate our network. Finally, we compare our LF-InterNet to recent LF image SR and single image SR methods.
4.1 Datasets and Implementation Details
Datasets | Type | Training | Test |
EPFL [22] | real-world | ||
HCInew [10] | synthetic | ||
HCIold [36] | synthetic | ||
INRIA [17] | real-world | ||
STFgantry [29] | real-world | ||
STFlytro [21] | real-world | ||
Total | — |
As listed in Table 1, we used 6 public LF datasets in our experiments. All the LFs in the training and test sets share the same angular resolution. In the training stage, we first cropped each SAI into patches, and then used bicubic downsampling with the corresponding factor to generate LR patches. The generated LR patches were re-organized into the MPI pattern to form the input of our network. Our training loss was chosen because it generates good results for the SR task and is robust to outliers [2]. Following the recent works [46, 26], we augmented the training data using random horizontal flipping, vertical flipping, and 90-degree rotation. Note that, during each augmentation, all SAIs need to be flipped and rotated along both the spatial and the angular directions to maintain the LF structure.
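The key point of LF-consistent augmentation is that every flip or rotation must be applied to the angular dimensions as well as the spatial ones, otherwise the epipolar geometry of the light field is destroyed. The following sketch illustrates this, assuming LR/HR light fields stored as [U, V, H, W] tensors (layout and function names are our assumptions).

```python
import random
import torch

def augment_lf(lr, hr):
    """Random LF-consistent augmentation (a sketch): flips and 90-degree
    rotations are applied to BOTH the spatial and the angular dimensions.
    lr, hr: light field tensors of shape [U, V, H, W]."""
    if random.random() < 0.5:                    # horizontal flip
        lr, hr = torch.flip(lr, dims=[1, 3]), torch.flip(hr, dims=[1, 3])
    if random.random() < 0.5:                    # vertical flip
        lr, hr = torch.flip(lr, dims=[0, 2]), torch.flip(hr, dims=[0, 2])
    if random.random() < 0.5:                    # 90-degree rotation
        lr = torch.rot90(torch.rot90(lr, 1, dims=[2, 3]), 1, dims=[0, 1])
        hr = torch.rot90(torch.rot90(hr, 1, dims=[2, 3]), 1, dims=[0, 1])
    return lr, hr
```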
By default, we used the same configuration of Inter-Groups, Inter-Blocks, feature depth, and angular resolution for both ×2 and ×4 SR. We also investigate the performance of other variants of our LF-InterNet in Section 4.2. We used PSNR and SSIM as quantitative metrics for performance evaluation. Note that PSNR and SSIM were calculated separately on the Y channel of each SAI. To obtain the overall metric score for a dataset, we first obtain the score of each scene by averaging the scores over all its SAIs, and then obtain the overall score by averaging the scores of all scenes in the dataset.
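The two-level averaging described above (views within a scene, then scenes within a dataset) can be written as a short helper; the function and variable names here are ours.

```python
import numpy as np

def dataset_score(per_view_scores):
    """Aggregate a metric (PSNR or SSIM) over a dataset.

    per_view_scores: list of 2D arrays, one per scene, holding the metric of
    every SAI (shape [U, V]). The score of a scene is the mean over its views,
    and the dataset score is the mean over scenes.
    """
    scene_scores = [float(np.mean(views)) for views in per_view_scores]
    return float(np.mean(scene_scores))
```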
Our LF-InterNet was implemented in PyTorch on a PC with an Nvidia RTX 2080Ti GPU. Our model was initialized using the Xavier method [8] and optimized using the Adam method [16]. The batch size and initial learning rate were kept fixed, with the learning rate decreased by a constant factor at regular epoch intervals; training took about one day.
4.2 Ablation Study
In this subsection, we compare the performance of our LF-InterNet with different architectures and angular resolutions to investigate the potential benefits introduced by different modules.
4.2.1 Network Architecture
Model | PSNR | SSIM | Params. |
LF-InterNet-onlySpatial | |||
LF-InterNet-onlyAngular | |||
LF-InterNet-SAcoupled | |||
LF-InterNet | |||
Bicubic | — | ||
VDSR [15] | |||
EDSR [20] |
Note that the results of bicubic interpolation, VDSR [15], and EDSR [20] are also listed in Table 2 as baselines.
Angular information. We investigated the benefit of angular information by removing the angular path in LF-InterNet, i.e., only SFEs are used for LF image SR. Consequently, the network is identical to a single image SR network, and can only incorporate the spatial information within each SAI. As shown in Table 2, using spatial information only, this variant (i.e., LF-InterNet-onlySpatial) still achieves reasonable PSNR and SSIM scores, with both its performance and its number of parameters lying between those of VDSR [15] and EDSR [20].
Spatial information. To investigate the benefit introduced by spatial information, we changed the kernel size of all SFEs from $3 \times 3$ to $1 \times 1$. In this case, the spatial information cannot be exploited and integrated by the convolutions. As shown in Table 2, the performance of LF-InterNet-onlyAngular is even inferior to that of bicubic interpolation. That is because the neighborhood context in an image is highly important for recovering details. Consequently, spatial information plays a major role in LF image SR, while angular information can only complement the spatial information and cannot be used alone.
Information decoupling. To investigate the benefit of spatial-angular information decoupling, we stacked all SAIs along the channel dimension as input, and used ordinary convolutions to extract both spatial and angular information from these stacked images. Note that the cascaded framework with global and local residual learning was maintained to keep the overall network architecture unchanged, and the feature depth was adjusted to keep the number of parameters comparable to that of LF-InterNet. As shown in Table 2, LF-InterNet-SAcoupled is inferior to LF-InterNet. That is, with a comparable number of parameters, LF-InterNet handles the 4D LF structure and achieves LF image SR in a more efficient way.
IG_1 | IG_2 | IG_3 | IG_4 | PSNR | SSIM | Params. |
AngRes | Scale | PSNR | SSIM | Params. |
(Each cell reports PSNR/SSIM.)
Method | Scale | Params. | EPFL [22] | HCInew [10] | HCIold [36] | INRIA [17] | STFgantry [29] | STFlytro [21] | Average
Bicubic | ×2 | — | 29.50/0.935 | 31.69/0.934 | 37.46/0.978 | 31.10/0.956 | 30.82/0.947 | 33.02/0.950 | 32.27/0.950
VDSR [15] | ×2 | | 32.01/0.959 | 34.37/0.956 | 40.34/0.985 | 33.80/0.972 | 35.80/0.980 | 35.91/0.970 | 35.37/0.970
EDSR [20] | ×2 | | 32.86/0.965 | 35.02/0.961 | 41.11/0.988 | 34.61/0.977 | 37.08/0.985 | 36.87/0.975 | 36.26/0.975
RCAN [47] | ×2 | | 33.46/0.967 | 35.56/0.963 | 41.59/0.989 | 35.18/0.978 | 38.18/0.988 | 37.32/0.977 | 36.88/0.977
LFBM5D [1] | ×2 | — | 31.15/0.955 | 33.72/0.955 | 39.62/0.985 | 32.85/0.969 | 33.55/0.972 | 35.01/0.966 | 34.32/0.967
GBSQ [23] | ×2 | — | 31.22/0.959 | 35.25/0.969 | 40.21/0.988 | 32.76/0.972 | 35.44/0.983 | 35.04/0.956 | 34.99/0.971
LFSSR_4D [41] | ×2 | | 32.56/0.967 | 34.47/0.960 | 41.04/0.989 | 34.06/0.976 | 34.08/0.975 | 36.62/0.976 | 35.47/0.974
resLF [46] | ×2 | | 33.22/0.969 | 35.79/0.969 | 42.30/0.991 | 34.86/0.979 | 36.28/0.985 | 35.80/0.970 | 36.38/0.977
LF-InterNet_32 | ×2 | | 34.43/0.975 | 36.96/0.974 | 43.99/0.994 | 36.31/0.983 | 37.40/0.989 | 38.47/0.982 | 37.88/0.983
LF-InterNet_64 | ×2 | | 34.76/0.976 | 37.20/0.976 | 44.65/0.995 | 36.64/0.984 | 38.48/0.991 | 38.81/0.983 | 38.42/0.984
Bicubic | ×4 | — | 25.14/0.831 | 27.61/0.851 | 32.42/0.934 | 26.82/0.886 | 25.93/0.843 | 27.84/0.855 | 27.63/0.867
VDSR [15] | ×4 | | 26.82/0.869 | 29.12/0.876 | 34.01/0.943 | 28.87/0.914 | 28.31/0.893 | 29.17/0.880 | 29.38/0.896
EDSR [20] | ×4 | | 27.82/0.892 | 29.94/0.893 | 35.53/0.957 | 29.86/0.931 | 29.43/0.921 | 30.29/0.903 | 30.48/0.916
RCAN [47] | ×4 | | 28.31/0.899 | 30.25/0.896 | 35.89/0.959 | 30.36/0.936 | 30.25/0.934 | 30.66/0.909 | 30.95/0.922
LFBM5D [1] | ×4 | — | 26.61/0.869 | 29.13/0.882 | 34.23/0.951 | 28.49/0.914 | 28.30/0.900 | 29.07/0.881 | 29.31/0.900
GBSQ [23] | ×4 | — | 26.02/0.863 | 28.92/0.884 | 33.74/0.950 | 27.73/0.909 | 28.11/0.901 | 28.37/0.973 | 28.82/0.913
LFSSR_4D [41] | ×4 | | 27.39/0.894 | 29.61/0.893 | 35.40/0.962 | 29.26/0.930 | 28.53/0.908 | 30.26/0.908 | 30.08/0.916
resLF [46] | ×4 | | 27.86/0.899 | 30.37/0.907 | 36.12/0.966 | 29.72/0.936 | 29.64/0.927 | 28.94/0.891 | 30.44/0.921
LF-InterNet_32 | ×4 | | 29.16/0.912 | 30.74/0.913 | 36.78/0.970 | 31.30/0.947 | 29.92/0.934 | 31.49/0.923 | 31.57/0.933
LF-InterNet_64 | ×4 | | 29.52/0.917 | 31.01/0.917 | 37.23/0.972 | 31.65/0.950 | 30.44/0.941 | 31.84/0.927 | 31.95/0.937
Note: since the ×4 SR model of resLF [46] is unavailable, we cascaded two of its ×2 SR models to perform ×4 SR.
Spatial-angular interaction. We investigated the benefits introduced by our spatial-angular interaction mechanism. Specifically, we canceled the feature interaction in each Inter-Group by removing the upsampling and AFE modules in each Inter-Block (see Fig. 5 (b)). In this case, the spatial and angular features can only be processed separately; when all interactions are removed, they can only be incorporated by the bottleneck block. Table 3 presents the results achieved by our LF-InterNet with different numbers of interactions. It can be observed that, without any feature interaction, our network achieves performance comparable to the LF-InterNet-onlySpatial model in both PSNR and SSIM. That is, the angular and spatial information cannot be effectively incorporated by the bottleneck block alone. As the number of interactions increases, the performance is steadily improved, which clearly demonstrates the effectiveness of our spatial-angular feature interaction mechanism.
4.2.2 Angular Resolution
We compared the performance of LF-InterNet with different angular resolutions. Specifically, we extracted the central SAIs from the input LFs to form inputs with smaller angular resolutions, and trained different models for both ×2 and ×4 SR. As shown in Table 4, the PSNR and SSIM values for both ×2 and ×4 SR improve as the angular resolution increases, because additional views provide richer angular information for LF image SR. It is also notable that the improvements tend to saturate at the largest angular resolutions, with only a marginal PSNR gain from the final increase. That is because the complementary information provided by the additional views is already sufficient: once the angular information is fully exploited, further increasing the number of views provides only minor performance improvements.
4.3 Comparison to the State-of-the-arts


We compare our method to three milestone single image SR methods (i.e., VDSR [15], EDSR [20], and RCAN [47]) and four state-of-the-art LF image SR methods (i.e., LFBM5D [1], GBSQ [23], LFSSR [41], and resLF [46]). All these methods were run using their released codes and pre-trained models. We also present the results of bicubic interpolation as a baseline. For simplicity, we only present results for ×2 and ×4 SR under our default angular setting. Since the angular resolution of LFSSR [41] is fixed, we use its original version with its original number of input SAIs.
Quantitative Results. Quantitative results are presented in Table 5. For both ×2 and ×4 SR, our method (i.e., LF-InterNet_64) achieves the best results on all the datasets and surpasses existing methods by a large margin. For example, clear average PSNR improvements over the state-of-the-art LF image SR method resLF [46] can be observed for both ×2 and ×4 SR. It is worth noting that, even when the feature depth of our model is halved (i.e., LF-InterNet_32), our method still achieves the highest SSIM scores on all the datasets and the highest PSNR scores on 5 of the 6 datasets as compared to existing methods. Moreover, the number of parameters of LF-InterNet_32 is significantly smaller than those of recent deep learning-based SR methods [47, 41, 46].
Qualitative Results. Qualitative results for ×2 and ×4 SR are shown in Figs. 6 and 7, with more visual comparisons provided in our supplemental material. It can be observed from Fig. 6 that our method preserves textures and details well (e.g., the horizontal stripes in the scene HCInew_origami and the stairway in the scene INRIA_Sculpture) in the super-resolved images. In contrast, although the single image SR method RCAN [47] achieves high PSNR and SSIM scores, the images generated by RCAN [47] are over-smoothed and poor in details. It can be observed from Fig. 7 that the visual superiority of our method is even more obvious for ×4 SR. Since the input LR images are severely degraded by the downsampling operation, ×4 SR is highly ill-posed. Single image SR methods use spatial information only to hallucinate missing details, and thus tend to generate ambiguous and even fake textures (e.g., the window frame in the scene EPFL_Palais generated by RCAN [47]). In contrast, LF image SR methods can use the complementary angular information among different views to produce faithful results. However, the results generated by existing LF image SR methods [23, 41, 46] are relatively blurry. Compared to these single image and LF image SR methods, the results produced by our LF-InterNet are much closer to the ground-truth images.

Performance w.r.t. Perspectives. Since our LF-InterNet super-resolves all SAIs in an LF, we further investigate the reconstruction quality with respect to different perspectives. We used the central views of the scene HCIold_MonasRoom [36] as inputs to perform both ×2 and ×4 SR. The PSNR and SSIM values are calculated for each perspective and visualized in Fig. 8. Since resLF [46] uses only part of the views to super-resolve different perspectives, its reconstruction quality for non-central views is relatively low. In contrast, our LF-InterNet jointly uses the angular information from all input views to super-resolve each perspective, and thus achieves much higher reconstruction quality with a more balanced distribution across perspectives.
5 Conclusion
In this paper, we proposed a deep convolutional network, LF-InterNet, for LF image SR. We first introduced an approach to extract and decouple spatial and angular features, and then designed a feature interaction mechanism to incorporate spatial and angular information. Experimental results have clearly demonstrated the superiority of our method. Our LF-InterNet outperforms state-of-the-art SR methods by a large margin in terms of PSNR and SSIM, and can recover rich details in the reconstructed images.
6 Acknowledgement
This work was partially supported by the National Natural Science Foundation of China (Nos. 61972435 and 61602499), the Natural Science Foundation of Guangdong Province, and the Fundamental Research Funds for the Central Universities (No. 18lgzd06).
References
- [1] (2018) Light field super-resolution via LFBM5D sparse coding. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 2501–2505.
- [2] (2019) SRLibrary: comparing different loss functions for super-resolution over various convolutional architectures. Journal of Visual Communication and Image Representation 61, pp. 178–187.
- [3] (2019) A deep journey into super-resolution: a survey. arXiv preprint arXiv:1904.07523.
- [4] (2011) The light field camera: extended depth of field, aliasing, and superresolution. IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (5), pp. 972–986.
- [5] (2014) Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision (ECCV), pp. 184–199.
- [6] (2015) Single image super-resolution via BM3D sparse coding. In European Signal Processing Conference (EUSIPCO), pp. 2849–2853.
- [7] (2017) Super resolution of light field images using linear subspace projection of patch-volumes. IEEE Journal of Selected Topics in Signal Processing 11 (7), pp. 1058–1071.
- [8] (2010) Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pp. 249–256.
- [9] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
- [10] (2016) A dataset and evaluation methodology for depth estimation on 4D light fields. In Asian Conference on Computer Vision (ACCV), pp. 19–34.
- [11] (2018) Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141.
- [12] (2017) Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700–4708.
- [13] (2015) Bidirectional recurrent convolutional networks for multi-frame super-resolution. In Advances in Neural Information Processing Systems (NeurIPS), pp. 235–243.
- [14] (2010) Image super-resolution via sparse representation. IEEE Transactions on Image Processing 19 (11), pp. 2861–2873.
- [15] (2016) Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1646–1654.
- [16] (2015) Adam: a method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR).
- [17] (2018) Light field inpainting propagation via low rank matrix completion. IEEE Transactions on Image Processing 27 (4), pp. 1981–1993.
- [18] (1996) Light field rendering. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pp. 31–42.
- [19] (2014) Saliency detection on light field. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2806–2813.
- [20] (2017) Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 136–144.
- [21] (2016) Stanford Lytro light field archive.
- [22] (2016) New light field image dataset. In International Conference on Quality of Multimedia Experience (QoMEX).
- [23] (2018) Geometry-consistent light field super-resolution via graph-based regularization. IEEE Transactions on Image Processing 27 (9), pp. 4207–4218.
- [24] (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 618–626.
- [25] (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1874–1883.
- [26] (2018) EPINET: a fully-convolutional neural network using epipolar geometry for depth from light field images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4748–4757.
- [27] (2017) NTIRE 2017 challenge on single image super-resolution: methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 114–125.
- [28] (2013) Anchored neighborhood regression for fast example-based super-resolution. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1920–1927.
- [29] (2008) The (new) Stanford light field archive. Computer Graphics Laboratory, Stanford University 6 (7).
- [30] (2013) PiCam: an ultra-thin high performance monolithic camera array. ACM Transactions on Graphics 32 (6), pp. 166.
- [31] (2020) DeOccNet: learning to see through foreground occlusions in light fields. In Winter Conference on Applications of Computer Vision (WACV).
- [32] (2018) Selective light field refocusing for camera arrays using bokeh rendering and superresolution. IEEE Signal Processing Letters 26 (1), pp. 204–208.
- [33] (2018) LFNet: a novel bidirectional recurrent convolutional neural network for light-field image super-resolution. IEEE Transactions on Image Processing 27 (9), pp. 4274–4286.
- [34] (2019) Deep learning for image super-resolution: a survey. arXiv preprint arXiv:1902.06068.
- [35] (2013) Variational light field analysis for disparity estimation and super-resolution. IEEE Transactions on Pattern Analysis and Machine Intelligence 36 (3), pp. 606–619.
- [36] (2013) Datasets and benchmarks for densely sampled 4D light fields. In Vision, Modelling and Visualization (VMV), Vol. 13, pp. 225–226.
- [37] (2005) High performance imaging using large camera arrays. In ACM Transactions on Graphics, Vol. 24, pp. 765–776.
- [38] (2019) Learning sheared EPI structure for light field reconstruction. IEEE Transactions on Image Processing 28 (7), pp. 3261–3273.
- [39] (2017) Light field reconstruction using deep convolutional network on EPI. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6319–6327.
- [40] (2019) Deep learning for single image super-resolution: a brief review. IEEE Transactions on Multimedia.
- [41] (2018) Light field spatial super-resolution using deep efficient spatial-angular separable convolution. IEEE Transactions on Image Processing 28 (5), pp. 2319–2330.
- [42] (2017) Light-field image super-resolution using convolutional neural network. IEEE Signal Processing Letters 24 (6), pp. 848–852.
- [43] (2015) Learning a deep convolutional network for light-field image super-resolution. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 24–32.
- [44] (2018) Light-field image superresolution using a combined deep CNN based on EPI. IEEE Signal Processing Letters 25 (9), pp. 1359–1363.
- [45] (2010) On single image scale-up using sparse-representations. In International Conference on Curves and Surfaces, pp. 711–730.
- [46] (2019) Residual networks for light field image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11046–11055.
- [47] (2018) Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 286–301.
- [48] (2018) Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2472–2481.
- [49] (2019) Revisiting spatio-angular trade-off in light field cameras and extended applications in super-resolution. IEEE Transactions on Visualization and Computer Graphics.