1 Introduction
A light field camera is a powerful tool for capturing the 4D light field in a single shot. Compared with previous equipment, one of its main advantages is passive depth estimation, which provides greater freedom for image segmentation in the movie authoring industry. Due to the huge volume of the 4D light field and the redundancy between different views, previous approaches [35, 23] are either time-consuming or not designed for full 4D data. On the other hand, image segmentation is a fundamental task in computer vision and a key step from low-level image processing to high-level image understanding. Accurate segmentation is useful for face detection [3], visual recognition [27], medical imaging [12] and so on. Since the regions of interest differ between users and tasks, interactive segmentation is necessary. In this paper, we develop a fast and accurate segmentation technique on full 4D light field data using the recently proposed light field superpixel (LFSP) algorithm [40].
A light field image records the spatial and angular information of the scene as a 4D function, which has benefited many problems in computer vision, such as refocusing [24], depth and scene flow estimation [32, 30, 39, 28], material recognition [31, 38, 37] and super-resolution [4, 33]. By processing the rays recorded by the light field, we can analyze the scene and solve problems that may be intractable with traditional 2D images, such as segmentation ambiguity [40]. Some approaches [17, 35, 23] have been proposed to segment or edit light fields, but their performance is limited by two major difficulties mentioned by Jarabo et al. [16]. First, the volume of 4D data leads to poor efficiency in terms of running time and memory consumption. Second, the segmentation results of different views should be coherent to preserve the redundancy between different views.
Light rays emitted from a region of a real object with similar characteristics are recorded and imaged by the light field; together they constitute an LFSP. The LFSP decreases the amount of data to be processed in both the spatial and the angular domain. In the spatial domain, the slice of an LFSP in each view contains many pixels from similar regions. In the angular domain, the patch in each view is the part of an LFSP obtained by fixing the angular dimensions. Furthermore, since the 2D slice of the LFSP in each view records the light rays emitted from the same region, the LFSP remains coherent between different views and helps preserve the redundancy.
In this paper, we propose a novel light field graph structure based on LFSP. The user can interactively add labels on the central view image to specify the object of interest. The LFSP segmentation results and the user labels make up the vertices of the graph. An energy function fusing position, appearance and disparity [35] information is then established and optimized using graph cuts [7].
The experiments on both synthetic data and real light field data captured by Lytro [21] and ILLUM [15] demonstrate the effectiveness of the proposed algorithm. Quantitative comparisons show that the accuracy and robustness of our method are competitive with state-of-the-art methods. Moreover, thanks to its lower computational complexity, our method could be useful for some real-time applications.
Our main contributions are summarized as follows:
(1) We propose a novel graph-based light field segmentation algorithm using LFSP. To the best of our knowledge, this is the first such solution using LFSP on full 4D light field data.
(2) Our method achieves competitive segmentation performance at a much lower running time.
2 Related Work
As a fundamental problem in computer vision, segmentation has been extensively studied in recent decades. Various solutions have been proposed to tackle the problem, such as region-based segmentation [26], threshold-based segmentation [20], graph-based segmentation [10] and learning-based segmentation [9]. Among these methods, the graph-cut algorithm [6, 7, 18, 10], which has proved to be an effective multi-label segmentation method, is closely related to our work. In graph-cut approaches, a graph consisting of vertices and edges is built, and a cut that minimizes the energy is found with a min-cut/max-flow algorithm [5]. Although these segmentation methods perform well under specific conditions, a thorough solution that resolves the ambiguity in defocused areas and at occlusion boundaries is still an open challenge.
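To make the min-cut formulation concrete, the following is a minimal sketch on a toy two-node graph. It is not the solver evaluated in [5]: it uses the simpler Edmonds-Karp augmenting-path method, and the node names, costs and the function name `max_flow_min_cut` are illustrative assumptions.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow. Returns (flow value, nodes on the source side of the min cut)."""
    # Residual graph: copy the capacities and add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No path left: the nodes reached by the search form the source side.
            return flow, set(parent)

        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Toy binary labeling. Terminal "s" stands for foreground, "t" for background.
# Edge s->n carries the cost of labeling n as background,
# edge n->t carries the cost of labeling n as foreground,
# and the a<->b edges carry the smoothness penalty for different labels.
capacity = {
    "s": {"a": 5, "b": 3},
    "a": {"t": 1, "b": 2},
    "b": {"t": 4, "a": 2},
}
min_energy, foreground_side = max_flow_min_cut(capacity, "s", "t")
```

In this toy case the cheapest labeling puts both nodes on the foreground side: node `b` alone would slightly prefer background, but the smoothness edge makes it cheaper to follow its neighbor.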
A light field [19] records the angular and spatial information of a scene as 4D data. Disparity [35], which is closely related to the depth map, is one of the most important features of a light field. Several algorithms [32, 30, 39] have been proposed to estimate disparity accurately. In this work, we use the algorithm developed by Zhu et al. [39] to estimate the disparity. Furthermore, the structure of the scene can be analyzed through ray tracing, which is an advantage for segmentation. However, the volume of 4D data and the redundancy of the light field make light field segmentation extremely difficult [16]. As a result, segmentation approaches built on traditional 2D images are not suitable for 4D light fields.
There are a few works on light field editing. In [17], the authors propagate the input edits through the full light field using novel multi-dimensional downsampling and upsampling techniques. Although the light field can be edited effectively to some extent, the results are greatly influenced by clustering, and the method does not cope well with complex scenes. Jarabo et al. [16] systematically analyze the ways of editing light fields and construct corresponding interfaces, tools and workflows. This work gives detailed answers about how to edit light fields from a user perspective. Most of these methods rely on user inputs indicating the regions of interest to improve segmentation performance. In this paper, we likewise use interactive user inputs to accurately segment complex scenes.
In recent years, several approaches have been proposed to segment light fields. Wanner et al. [35] propose the first algorithm for 4D light field segmentation, which introduces an effective disparity cue. The authors use a set of input scribbles on the central view to train a random forest classifier based on disparity and appearance cues. Then, the global consistency of the segmentation is optimized by using all views simultaneously. However, this method only produces the segmentation result of the central view, and it is time-consuming. Xu et al. [36] use the consistency and distortion properties of the light field to segment transparent objects from the background. However, the method is restricted to certain types of background and degrees of reflection. Mihara et al. [23] define two types of neighboring rays (spatial and angular) in light field data and design a learning-based likelihood function. They then build a graph in 4D space and use the graph-cut algorithm to obtain an optimized segmentation result. Since the complexity of their method is very high, only a subset of the viewpoints of the light field is used in their experiments to reduce the data size. For faster light field segmentation, Hog et al. [14] use a ray-based graph structure that exploits the redundancy in ray space to reduce the graph size, decreasing the running time of MRF-based optimization tasks. However, the proposed free rays and ray bundles are greatly influenced by the depth measurement. Moreover, the algorithm needs depth maps for all views, which are difficult to obtain.
Superpixels are small regions consisting of pixels that are similar to each other in position, appearance, brightness, etc. Most of these regions retain the information needed for further image segmentation. Generally, superpixels do not destroy the boundaries of an object. Many excellent algorithms [1, 29] have been proposed for traditional 2D images. Zhu et al. [40] introduce the light field superpixel and LFSP self-similarity for the first time. Unlike a traditional superpixel, an LFSP is a 4D structure that keeps the redundancy in the angular dimensions, which is its key characteristic. Hog et al. [22] propose super-rays for efficient light field processing, which are similar to LFSP to some extent.
In contrast to previous approaches, we build a novel graph structure based on LFSP and treat each LFSP as a unit of processing. Our algorithm segments the 4D light field as a whole and improves both the accuracy and the efficiency of light field segmentation. In the next sections, we describe the graph model and the energy function designed to cope with the above-mentioned problems.
3 Segmentation Based on LFSP
In this section, we first introduce the representation of light field and the definition of LFSP. Then a graph structure based on LFSP is built. Finally, we formulate light field segmentation problem as an energy minimization problem.
The pipeline of the proposed algorithm is shown in Figure 2. The input consists of a 4D light field and user scribbles in the central viewpoint (or an arbitrary viewpoint). First, the disparity map of the central view image is estimated, which benefits the LFSP segmentation. Next, the LFSPs are computed with the algorithm of Zhu et al. [40]. At the same time, the features of each LFSP are initialized. Based on the scribbles, disparity and LFSPs, the initial segmentation result of the light field is obtained. Finally, a 4D graph structure is built and the optimized light field segmentation is obtained using graph cuts. We use overlays and EPI images to show the optimized segmentation results.
3.1 Representation of LFSP
To describe the physical world more objectively and realistically, Adelson and Bergen [2] propose the plenoptic function. The light field is obtained by reducing the dimension of the plenoptic function and uses a 4D function to record a light ray. A light ray can be described by the two-parallel-plane (TPP) model [19], in which one plane is the view plane and the other is the image plane. A ray emitted from the scene is uniquely determined by its intersections with these two planes. Another way of representing the light field is the multi-view representation, which divides the 4D light field into different views and their corresponding images. For a better understanding, we show the LFSP and the graph structure in the multi-view representation.
A superpixel is a small patch composed of pixels with similar appearance, brightness and texture in a 2D image, and it loses the structural information of the scene. In contrast, an LFSP is composed of light rays emitted from a small region of a real object with similar characteristics. Light rays are influenced by the structure of the scene; that is to say, an LFSP has a physical meaning. Suppose a light field is recorded and R is a region of 3D space with similar features. Mathematically, the LFSP is defined as,
(1) 
where the number of combined terms is the number of elements in region R, and each term is the bundle of rays emitted from one point in R [40].
Obviously, an LFSP is 4-dimensional and consists of two angular dimensions and two spatial dimensions. In each view, there is a patch corresponding to a 3D object. Because an LFSP represents a region of a 3D object, its patches in different viewpoints have similar appearance, which is called the self-similarity of the LFSP. Notably, the redundancy of the light field in the angular dimensions is preserved by the LFSP. By fixing one angular and one spatial dimension, we obtain an EPI image of the light field. An LFSP is a combination of similar EPI lines: EPI lines with similar gradient and appearance make up an LFSP.
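The EPI construction mentioned above can be illustrated with a minimal sketch. The array layout `lf[v][u][y][x]`, the toy scene and the helper names `make_light_field` and `horizontal_epi` are assumptions made for illustration only; they are not taken from [40].

```python
def make_light_field(n_views, width, height, disparity):
    """Toy light field lf[v][u][y][x]: one bright column that shifts
    by `disparity` pixels between horizontally adjacent views."""
    light_field = []
    for v in range(n_views):
        views_in_row = []
        for u in range(n_views):
            image = [[0] * width for _ in range(height)]
            column = width // 2 + disparity * (u - n_views // 2)
            for y in range(height):
                image[y][column] = 1
            views_in_row.append(image)
        light_field.append(views_in_row)
    return light_field


def horizontal_epi(light_field, v, y):
    """Fix one angular index (v) and one spatial index (y); keep (u, x)."""
    return [list(light_field[v][u][y]) for u in range(len(light_field[v]))]


lf = make_light_field(n_views=5, width=9, height=3, disparity=1)
epi = horizontal_epi(lf, v=2, y=1)
# Column of the bright pixel in each EPI row: it traces a straight line
# whose slope is the disparity of the scene point.
trace = [row.index(1) for row in epi]
```

Each scene point appears as a line in the EPI, so rays with similar disparity and appearance form bands of similar EPI lines, which is the structure an LFSP groups together.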
3.2 Graph structure based on LFSP
Unlike a traditional 2D image, each pixel in a light field has two kinds of neighborhoods, spatial and angular, which can be formulated as,
(2) 
where and represent spatial and angular neighbors respectively, represents the disparity. Since the shape of LFSP is irregular in most cases, the adjacent relationship of LFSP needs to be redefined. Let denotes the size of LFSP, and denote the 2D slices of LFSP and on view respectively. Then the spatial relationship of two LFSPs could be defined as,
(3) 
where is the Euclidean distance between the center of and . For angular relationship of LFSP, there are two kinds of angular neighbors, direct and indirect. Slices of one LFSP at different viewpoints are the direct angular neighbors of the LFSP. Due to the selfsimilarity of LFSP, the direct angular neighbors are similar in shape, color and position. Supposing and are spatial neighbors in viewpoint , and and are direct angular neighbors, then indirect angle adjacent can be defined as,
(4) 
In most cases, two LFSPs that are spatial neighbors in one viewpoint are very likely to be spatial neighbors in another viewpoint as well. To handle the special cases in which new spatial neighbors appear in another viewpoint, we introduce the indirect angular adjacency.
After defining the neighboring relationships between LFSPs, we build the graph of the 4D light field based on LFSP. The vertex set is composed of the non-labeled LFSPs and the labeled LFSPs. The labeled LFSPs are the terminals of the graph and are initialized by propagating the user inputs to the LFSPs. The edge set is defined by spatial adjacency and angular adjacency,
(5) 
The graph structure in the multi-view representation is shown in Figure 3, where each picture is a partial magnification of LFSP slices. To simplify the illustration, we show a subset of the views instead of all views. In the central view image, an LFSP is represented by a red patch and its spatial neighbors are marked in yellow. The red patches in the other views are its direct angular neighbors. In addition, the orange LFSPs next to the red LFSP are its indirect angular neighbors. The labeled LFSPs are shown on the right side in gradually darkening cyan. The pink lines connect the LFSP to the labeled LFSPs, representing the similarity between them. The blue dotted lines represent the influence of the spatial neighbors. The purple dotted lines connect the LFSP to its indirect angular neighbors, representing their influence.
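The edge construction described in this section can be sketched as follows. This is a simplified illustration rather than the definitions of Eqns. (3) to (5): spatial adjacency is approximated here by 4-connected pixel contact between LFSP slices instead of a distance between slice centers, and the label-map input and the function names are assumptions.

```python
def spatial_pairs(label_map):
    """Pairs of LFSP ids whose slices touch (4-connectivity) in one view."""
    pairs = set()
    height, width = len(label_map), len(label_map[0])
    for y in range(height):
        for x in range(width):
            a = label_map[y][x]
            for dy, dx in ((0, 1), (1, 0)):
                yy, xx = y + dy, x + dx
                if yy < height and xx < width and label_map[yy][xx] != a:
                    b = label_map[yy][xx]
                    pairs.add((min(a, b), max(a, b)))
    return pairs


def build_edges(label_maps, central_view):
    """Spatial edges from the central view, plus indirect angular edges:
    pairs that are neighbors in another view but not in the central view."""
    spatial = spatial_pairs(label_maps[central_view])
    indirect_angular = set()
    for view, label_map in label_maps.items():
        if view != central_view:
            indirect_angular |= spatial_pairs(label_map) - spatial
    return spatial, indirect_angular


# Toy example: LFSP 1 is occluded in the second view, so LFSPs 0 and 2,
# which do not touch in the central view, become neighbors there.
label_maps = {
    (0, 0): [[0, 0, 1, 2]],
    (0, 1): [[0, 0, 2, 2]],
}
spatial_edges, indirect_edges = build_edges(label_maps, central_view=(0, 0))
```

Direct angular neighbors need no explicit edges in this sketch because all slices of one LFSP share the same id and therefore the same graph vertex.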
4 Energy Function
After building the graph structure, the 4D light field segmentation is treated as an energy minimization problem. Suppose a label vector assigns a label to each LFSP; the energy function is then defined as follows,
(6)
where the three data terms measure the color, position and disparity distances between a non-labeled LFSP and the labeled LFSPs, and the two smooth terms measure the connectivity between adjacent LFSPs in spatial and angular space respectively. We give the detailed definitions of the energy terms in the following sections.
4.1 Data term
As an interactive segmentation method, our approach requires the user to draw scribbles over the objects of interest on a reference view of the light field. In this paper, we choose the central view for drawing scribbles for ease of calculation; another view can be chosen by propagating the disparity from the central view to that view. We then propagate the user inputs to the LFSPs, obtaining the labeled LFSPs. These serve as seeds for calculating the similarity with each non-labeled LFSP, using color, disparity and position cues.
Color, disparity and position represent different physical characteristics of objects, so it is important to integrate these cues for light field segmentation. There are effective classifiers for this purpose, such as SVMs [13] or random forests [11]. However, when users are interested in a particular object or want to edit the light field, they frequently change the input scribbles, and a trained classifier would have to be retrained after each change. We therefore calculate the similarity of color, disparity and position individually and use weights to balance the influence of the different cues. Experimental results show that this simple classifier works effectively.
The color, disparity and position energy terms are defined as,
(7)  
where the three features denote the color, position and disparity information of an LFSP respectively. The color distance combines the influence of the different channels, and a normalization function rescales each distance to a common range. In our experiments, LFSPs are the basic unit of processing and need to be initialized. The color and disparity of an LFSP are the mean values of its components, and are defined as,
(8)  
where represents the channels of color space (here we use color space). is the pixel belonging to in view.
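The initialization and data term described above can be sketched as follows. Only the averaging step follows the text directly. The exponential normalization, the Euclidean distances, the default weights and scales, and the names `lfsp_features`, `normalize` and `data_cost` are assumptions chosen for illustration, since the exact form of Eqn. (7) is not reproduced here.

```python
import math


def lfsp_features(samples):
    """Average color, disparity and position over all rays of each LFSP.
    samples: iterable of (lfsp_id, color, disparity, (x, y))."""
    sums = {}
    for lfsp_id, color, disparity, position in samples:
        acc = sums.setdefault(
            lfsp_id,
            {"color": [0.0] * len(color), "disparity": 0.0,
             "position": [0.0, 0.0], "count": 0},
        )
        acc["color"] = [s + c for s, c in zip(acc["color"], color)]
        acc["disparity"] += disparity
        acc["position"] = [s + p for s, p in zip(acc["position"], position)]
        acc["count"] += 1
    return {
        lfsp_id: {
            "color": [s / acc["count"] for s in acc["color"]],
            "disparity": acc["disparity"] / acc["count"],
            "position": [s / acc["count"] for s in acc["position"]],
        }
        for lfsp_id, acc in sums.items()
    }


def normalize(distance, scale):
    """Assumed normalization: maps a non-negative distance into [0, 1)."""
    return 1.0 - math.exp(-distance / scale)


def data_cost(feat, seed, w_position=1.0, w_disparity=1.0,
              scales=(50.0, 20.0, 1.0)):
    """Hypothetical data term: weighted sum of normalized cue distances."""
    d_color = math.dist(feat["color"], seed["color"])
    d_position = math.dist(feat["position"], seed["position"])
    d_disparity = abs(feat["disparity"] - seed["disparity"])
    return (normalize(d_color, scales[0])
            + w_position * normalize(d_position, scales[1])
            + w_disparity * normalize(d_disparity, scales[2]))


samples = [
    (0, (10, 20, 30), 1.0, (0, 0)),
    (0, (30, 40, 50), 2.0, (2, 0)),
    (1, (200, 200, 200), 0.2, (10, 4)),
    (2, (22, 28, 41), 1.4, (3, 1)),
]
features = lfsp_features(samples)
cost_to_similar_seed = data_cost(features[2], features[0])
cost_to_different_seed = data_cost(features[2], features[1])
```

Because each cue is scored independently, changing the scribbles only changes which LFSPs act as seeds; no model has to be retrained.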
4.2 Smooth term
When building the graph structure, we introduced the two adjacency relationships of the light field. Spatial neighbors and angular neighbors both act on the smooth term, which has a great influence on the optimization result. Specifically, spatial neighbors encourage the segmentation to be regular in each 2D slice of the light field, while angular neighbors ensure consistency between the results in different views.
The definition of smooth term for spatial neighbors is as follows,
(9) 
where is a penalty factor of two LFSPs.
(10) 
When the two labels are not the same, a penalty is added. The penalty is weighted by the similarity between the two LFSPs, which is evaluated by,
(11)  
where the two variance parameters are the variances of color and disparity, and a weight balances the influence of the two cues. The position cue is not used because the adjacency relationship is already determined by position information. The smooth term from all angular neighbors is defined as,
(12)
where an indicator determines whether an LFSP is a new angular neighbor. If it is not a new angular neighbor, it would be counted twice in the smoothing, which is not allowed. Concretely,
(13) 
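The smooth term can be sketched under stated assumptions. The Gaussian form of the similarity, the `balance` weight and the function names are hypothetical stand-ins for Eqns. (9) to (13); only the overall structure follows the text: a label-disagreement penalty weighted by LFSP similarity, with angular neighbors counted only when they are new.

```python
import math


def potts_penalty(label_a, label_b):
    """1 when two neighboring LFSPs receive different labels, else 0."""
    return 0.0 if label_a == label_b else 1.0


def similarity(feat_a, feat_b, var_color, var_disparity, balance=0.5):
    """Assumed Gaussian similarity on color and disparity differences."""
    color_sq = sum((a - b) ** 2 for a, b in zip(feat_a["color"], feat_b["color"]))
    disparity_sq = (feat_a["disparity"] - feat_b["disparity"]) ** 2
    return math.exp(-(balance * color_sq / var_color
                      + (1.0 - balance) * disparity_sq / var_disparity))


def smooth_energy(labels, feats, spatial_edges, angular_edges,
                  var_color, var_disparity, w_spatial=1.0, w_angular=1.0):
    def edge_sum(edges):
        return sum(
            potts_penalty(labels[a], labels[b])
            * similarity(feats[a], feats[b], var_color, var_disparity)
            for a, b in edges
        )

    # Indicator for "new" angular neighbors: skip pairs that are already
    # spatial neighbors, so that no pair is penalized twice.
    new_angular_edges = set(angular_edges) - set(spatial_edges)
    return w_spatial * edge_sum(spatial_edges) + w_angular * edge_sum(new_angular_edges)


feats = {i: {"color": [0.0, 0.0, 0.0], "disparity": 0.0} for i in range(3)}
labels = {0: "fg", 1: "bg", 2: "bg"}
energy = smooth_energy(
    labels, feats,
    spatial_edges={(0, 1)},
    angular_edges={(0, 1), (0, 2)},
    var_color=1.0, var_disparity=1.0,
)
```

In the toy call, the pair (0, 1) appears in both edge sets but contributes only once, which is the double-counting rule the indicator enforces.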
The α-expansion algorithm [7] is used to solve Eqn. (6). The complete process of our algorithm is shown in Algo. 1. The 4D light field and the user scribbles are the input of our algorithm. First, the disparity map of the central view is calculated by [39]. Then the LFSPs are obtained by the algorithm of [40]. Next, the LFSP information is initialized, yielding the number, color and disparity of each LFSP. At the same time, the user inputs are propagated to the LFSPs, and some LFSPs are labeled as seeds in the 4D graph. After that, the penalty of giving each non-labeled LFSP a label is calculated by the data term, while the influence of its spatial and angular neighbors is calculated by the smooth term. We obtain the total energy and use a graph cut algorithm to minimize it. Finally, we obtain the optimized segmentation result.
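The optimization step can be illustrated with a small sketch. It deliberately swaps in iterated conditional modes (ICM), a much simpler local optimizer, in place of the expansion moves of [7], so it only shows how seeds, data costs and smoothness weights interact. All names and numbers below are illustrative assumptions.

```python
def icm_segment(unary, edges, seeds, max_iters=20):
    """Greedy label refinement on an LFSP graph.
    unary: {node: {label: data cost}}, edges: {(i, j): smoothness weight},
    seeds: {node: fixed label from user scribbles}."""
    # Start from the cheapest data cost per node, then clamp the seeds.
    labels = {i: min(costs, key=costs.get) for i, costs in unary.items()}
    labels.update(seeds)

    neighbors = {i: [] for i in unary}
    for (i, j), weight in edges.items():
        neighbors[i].append((j, weight))
        neighbors[j].append((i, weight))

    for _ in range(max_iters):
        changed = False
        for i in unary:
            if i in seeds:
                continue  # user-labeled LFSPs never change

            def local_energy(label, node=i):
                # Data cost plus a penalty for every neighbor with another label.
                disagreement = sum(w for j, w in neighbors[node] if labels[j] != label)
                return unary[node][label] + disagreement

            best = min(unary[i], key=local_energy)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Chain of four LFSPs: 0 is a foreground seed, 3 is a background seed.
unary = {
    0: {"fg": 0.0, "bg": 1.0},
    1: {"fg": 0.2, "bg": 0.8},
    2: {"fg": 0.55, "bg": 0.45},  # noisy data cost, slightly prefers background
    3: {"fg": 1.0, "bg": 0.0},
}
edges = {(0, 1): 0.5, (1, 2): 0.5, (2, 3): 0.1}
result = icm_segment(unary, edges, seeds={0: "fg", 3: "bg"})
```

Node 2 starts as background because of its noisy data cost, but the strong edge to node 1 outweighs the weak edge to node 3, so smoothing pulls it to foreground. ICM can stop in a local minimum, which is one reason a graph-cut solver is preferable for the full problem.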
5 Experiments
In this section, we compare our segmentation algorithm with three state-of-the-art light field segmentation algorithms: GCMLA (globally consistent multi-label assignment) [35], SACS (spatial and angular consistent segmentation) [23] and RBGSS (ray-based graph structure segmentation) [14]. Synthetic data and real data are used to demonstrate the performance of our algorithm. For synthetic data, we use the HCI dataset proposed in [34], which contains 4 light fields with known depth, ground truth labeling and user input scribbles, because few light field datasets other than the HCI benchmark provide ground truth segmentation results. For real data, two popular light field cameras, Lytro and Illum, are used to capture light fields of scenes, which are decoded by LFToolbox [8] and Lytro Power Tools respectively. We conduct comparative experiments on segmentation accuracy and running time. In addition, we verify the validity of the smooth term. Finally, we analyze the limiting cases of our algorithm. The GCMLA results are produced with the implementation from the authors' website, and the results of SACS and RBGSS are taken from their respective papers.
Parameter settings are important to the algorithm, so we introduce the main parameters and their significance. In the data term, two weights balance the influence of color, position and disparity. Because the three distances are normalized, it is reasonable to assign similar values to these weights. Since real data and its disparity are relatively noisy, the disparity weight is set to a smaller value to provide some robustness. A larger leads to a more regular result. Two further weights control the smooth term to avoid over-smoothing. It is not recommended to assign large values to them because the initial results are already good. During the experiments, their values reflect that spatial neighbors play a dominant role and indirect angular neighbors are relatively rare.
5.1 Synthetic data
Figure 4 shows the segmentation results on the HCI dataset obtained by state-of-the-art light field segmentation algorithms. The first row displays the central view of each scene with the user's scribbles. The second row displays the LFSP slices in the central view. The third row is the ground truth segmentation of the central view provided by the HCI dataset. The next rows are the segmentation results of the four algorithms (RBGSS, GCMLA, SACS and ours) respectively. The last column shows the EPI images of our segmentation results. The EPI images show that our algorithm segments the full 4D light field and that the segmentation results in different views are coherent. While several algorithms have similar performance, the results of our method are more accurate at boundaries and in detailed areas. The quantitative analysis is provided in Table 1, where the accuracy of each scene and the average accuracy are reported. Both the ground truth depth and the estimated depth are used to compare the different methods. The best results are highlighted in bold. In most cases, our method outperforms all the other evaluated techniques. It is worth noting that our method achieves the highest average accuracy.
Table 1: Segmentation accuracy (%) on the HCI dataset with ground-truth (GT) and estimated (EST) depth.
Algorithm   GCMLA   RBGSS   Ours    w/o smooth   GCMLA   SACS    Ours
Depth       GT      GT      GT      GT           EST     EST     EST
Papillon    99.4    99.5    99.5    99.4         98.9    98.3    99.4
Buddha      99.1    99.1    99.2    96.3         98.8    96.4    99.0
Stilllife   99.2    99.2    99.1    98.2         98.9    97.7    98.9
Horses      99.1    99.1    99.3    99.1         98.3    95.9    98.9
Average     99.2    99.2    99.3    98.3         98.7    97.1    99.0
In addition to high accuracy, a significant advantage of our approach is its computational efficiency. GCMLA only produces the segmentation result of the central view, and its running time is long. SACS uses only a subset of the views to reduce the data size, but its algorithmic complexity is high. By using ray bundles and free rays, RBGSS reduces the running time to some extent; the authors perform the optimisation in 4 to 6s on an Intel Xeon E5640. Our method greatly reduces the data size and achieves a large improvement in computational efficiency. The preprocessing of our method (depth estimation and light field superpixel segmentation) takes around 70s, while the average segmentation time is about 1.4s, which is close to the requirements of real-time processing. Furthermore, the preprocessing and segmentation time could be reduced further by GPU acceleration. Note that our algorithm is evaluated on a desktop computer with a 3.6 GHz i7 CPU.
Supposing there is a light field with views and the size of LFSP is , theoretically our method can simplify the data size by times. Specifically, taking Buddha data as an example, [14] can reduce data size from to , while our method can reduce data size from to . In contrast, [35, 23] cannot reduce the data size in their algorithms, so they are time-consuming.
Furthermore, we evaluate the effectiveness of the two smooth constraints, spatial adjacency and angular adjacency. Some of the results are shown in Figure 5. The right picture is the optimized result with the smooth constraints and the left one is the initial segmentation result without them. The initial segmentation result is noisy and suffers from a high error rate. The optimized result is more accurate for both the central view image and the 4D light field, which confirms the validity of the two smooth constraints.
5.2 Real data
We use multiple sets of real light field data to verify the effectiveness of our algorithm. For each set of real data, we show its LFSPs in the central view image, the user input, the disparity map, the segmentation result in the central view and the EPI result respectively. The disparity map for Illum can be obtained from the camera. For Lytro, the disparity map has to be estimated separately; here we use the algorithm proposed in [39]. The EPI results show the accuracy and consistency of the segmentation result across all views.
Figure 6 shows the segmentation results on real data captured by Illum. The light field data from 'Cherry' to 'Hide' are provided by [25]. 'People' and 'Road' are captured by our Illum camera. Figure 8 shows the segmentation results on real data captured by Lytro. Due to the low quality of the light field data, the disparity map is poor and noisy, especially for Lytro data, so the segmentation algorithm ought to be robust against noise and errors. RBGSS is sensitive to depth map quality because ray bundles and free rays are obtained from depth information. In contrast, our method is robust to depth map quality, as shown in the segmentation and EPI results. Figure 9 compares the segmentation of real data by different algorithms. When segmenting real light field data, we reduce the disparity weight to decrease the influence of depth map errors, and the segmentation results remain good.
In addition, the proposed method is a complete 4D light field segmentation. As shown in Figure 1 and Figure 6, the EPI lines of the raw data and of the segmentation result are consistent, which demonstrates the validity of the proposed algorithm in different views. Coherent segmentation across all views is thus available, which is important for light field editing. Moreover, the segmentation result can be used to change the color of a specific region, remove an occluding object from a scene and so on.
5.3 Limitations
Our algorithm has some limitations. The segmentation results are affected by the LFSP quality to some extent. For example, when tiny objects are similar to the background, or when a non-Lambertian object reflects the rays of adjacent objects, our method is likely to perform poorly. Figure 7 shows two such situations. In Figure 7(a), some tiny objects (the feet and wings of bees) have similar or identical texture characteristics to the background, so it is difficult to distinguish them from it. In Figure 7(b), a smooth glass column reflects the color of the Buddha, which confuses the segmentation boundary. These two limitations are tough problems of image segmentation that have not been solved well so far.
6 Conclusions
In this paper, we propose to utilize LFSP to interactively segment light fields. The characteristics of LFSP in the spatial and angular domains help improve segmentation accuracy and efficiency. We propose a novel 4D graph structure based on LFSP and define the spatial and angular neighbors accordingly. The data term and the smooth term of the energy function are then defined according to the graph structure, and a graph cut algorithm is used to optimize the segmentation result. Experiments on synthetic data show that the proposed method has both high accuracy and high computational efficiency compared to state-of-the-art algorithms. Moreover, we apply our method to real light fields, which shows the effectiveness and robustness of our algorithm. In the future, we will adapt the proposed method to light field video segmentation.
Acknowledgement
We thank H. Mihara and T. Karin for their help with real scene light field segmentation.
References
 [1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2274–2282, 2012.
 [2] E. H. Adelson and J. R. Bergen. The plenoptic function and the elements of early vision. Vision and Modeling Group, Media Laboratory, Massachusetts Institute of Technology, 1991.
 [3] A. Albiol, L. Torres, and E. J. Delp. An unsupervised color image segmentation algorithm for face detection applications. In Image Processing, 2001. Proceedings. 2001 International Conference on, volume 2, pages 681–684. IEEE, 2001.
 [4] T. E. Bishop and P. Favaro. The light field camera: Extended depth of field, aliasing, and superresolution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(5):972–986, 2012.
 [5] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE transactions on pattern analysis and machine intelligence, 26(9):1124–1137, 2004.
 [6] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Transactions on pattern analysis and machine intelligence, 23(11):1222–1239, 2001.
 [7] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. The Proceedings of the Seventh IEEE International Conference on Computer Vision, pages 377–384 vol.1, 2002.

 [8] D. G. Dansereau, O. Pizarro, and S. B. Williams. Decoding, calibration and rectification for lenselet-based plenoptic cameras. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1027–1034, 2013.
 [9] G. Dong and M. Xie. Color clustering and learning for image segmentation based on neural networks. IEEE Transactions on Neural Networks, 16(4):925, 2005.
 [10] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181, 2004.
 [11] P. O. Gislason, J. A. Benediktsson, and J. R. Sveinsson. Random forests for land cover classification. Pattern Recognition Letters, 27(4):294–300, 2006.
 [12] L. Grady and G. Funka-Lea. Multi-label image segmentation for medical applications based on graph-theoretic electrical potentials. In ECCV Workshops CVAMIA and MMBIA, volume 3117, pages 230–245. Springer, 2004.
 [13] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf. Support vector machines. IEEE Intelligent Systems and their applications, 13(4):18–28, 1998.
 [14] M. Hog, N. Sabater, and C. Guillemot. Light field segmentation using a ray-based graph structure. In European Conference on Computer Vision, pages 35–50. Springer, 2016.
 [15] Illum. Lytro illum a new way to capture your stories. http://www.illum.lytro.com, 2014.
 [16] A. Jarabo, B. Masia, A. Bousseau, F. Pellacini, and D. Gutierrez. How do people edit light fields? ACM Trans. Graph., 33(4):146–1, 2014.
 [17] A. Jarabo, B. Masia, and D. Gutierrez. Efficient propagation of light field edits. Proceedings of the SIACG, 2011.
 [18] V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? IEEE transactions on pattern analysis and machine intelligence, 26(2):147–159, 2004.
 [19] M. Levoy and P. Hanrahan. Light field rendering. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 31–42. ACM, 1996.
 [20] W.-N. Lie. An efficient threshold-evaluation algorithm for image segmentation based on spatial graylevel co-occurrences. Signal Processing, 33(1):121–126, 1993.
 [21] Lytro. Lytro redefines photography with light field cameras. http://www.lytro.com, 2011.
 [22] M. Hog, N. Sabater, and C. Guillemot. Super-rays for efficient light field processing. IEEE Journal of Selected Topics in Signal Processing, PP(99):1–1, 2017.
 [23] H. Mihara, T. Funatomi, K. Tanaka, H. Kubo, Y. Mukaigawa, and H. Nagahara. 4d light field segmentation with spatial and angular consistencies. In Computational Photography (ICCP), 2016 IEEE International Conference on, pages 1–8. IEEE, 2016.
 [24] R. Ng. Fourier slice photography. In ACM Transactions on Graphics (TOG), volume 24, pages 735–744. ACM, 2005.
 [25] A. S. Raj, M. Lowney, and R. Shah. Lightfield database creation and depth estimation. 2016.
 [26] P. Salembier and F. Marqués. Region-based representations of image and video: segmentation tools for multimedia services. IEEE Transactions on circuits and systems for video technology, 9(8):1147–1169, 1999.
 [27] J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In European conference on computer vision, pages 1–15. Springer, 2006.
 [28] P. P. Srinivasan, M. W. Tao, R. Ng, and R. Ramamoorthi. Oriented light-field windows for scene flow. In Proceedings of the IEEE International Conference on Computer Vision, pages 3496–3504, 2015.
 [29] O. Veksler, Y. Boykov, and P. Mehrani. Superpixels and supervoxels in an energy optimization framework. Computer Vision–ECCV 2010, pages 211–224, 2010.
 [30] T.-C. Wang, A. A. Efros, and R. Ramamoorthi. Occlusion-aware depth estimation using light-field cameras. In Proceedings of the IEEE International Conference on Computer Vision, pages 3487–3495, 2015.
 [31] T.-C. Wang, J.-Y. Zhu, E. Hiroaki, M. Chandraker, A. A. Efros, and R. Ramamoorthi. A 4d light-field dataset and cnn architectures for material recognition. In European Conference on Computer Vision, pages 121–138. Springer, 2016.
 [32] S. Wanner and B. Goldluecke. Globally consistent depth labeling of 4d light fields. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 41–48. IEEE, 2012.
 [33] S. Wanner and B. Goldluecke. Variational light field analysis for disparity estimation and super-resolution. IEEE transactions on pattern analysis and machine intelligence, 36(3):606–619, 2014.
 [34] S. Wanner, S. Meister, and B. Goldluecke. Datasets and benchmarks for densely sampled 4d light fields. In VMV, pages 225–226, 2013.
 [35] S. Wanner, C. Straehle, and B. Goldluecke. Globally consistent multi-label assignment on the ray space of 4d light fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1011–1018, 2013.
 [36] Y. Xu, H. Nagahara, A. Shimada, and R.-i. Taniguchi. Transcut: transparent object segmentation from a light-field image. In Proceedings of the IEEE International Conference on Computer Vision, pages 3442–3450, 2015.
 [37] J. Xue, H. Zhang, K. Dana, and K. Nishino. Differential angular imaging for material recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 6940–6949, 2017.
 [38] H. Zhang, K. Dana, and K. Nishino. Friction from reflectance: Deep reflectance codes for predicting physical surface properties from one-shot in-field reflectance. In European Conference on Computer Vision, pages 808–824. Springer, 2016.
 [39] H. Zhu, Q. Wang, and J. Yu. Occlusion-model guided anti-occlusion depth estimation in light field. IEEE Journal of Selected Topics in Signal Processing, 2017.
 [40] H. Zhu, Q. Zhang, and Q. Wang. 4d light field superpixel and segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6709–6717. IEEE, 2017.