Separate from Observation: Unsupervised Single Image Layer Separation

06/03/2019
by Yunfei Liu, et al.
Beihang University

Unsupervised single image layer separation aims at extracting two layers from an input image, where the layers follow different distributions. This problem arises most notably in reflection removal and intrinsic image decomposition. Since there exists an infinite set of layer combinations that can construct a given input image, one can infer nothing about the solutions without additional assumptions. To address the problem, we make the shared information consistency assumption and the separated layer independence assumption to constrain the solutions. To this end, we propose an unsupervised single image separation framework based on cycle GANs and self-supervised learning. The proposed framework is applied to the reflection removal and intrinsic image problems. Numerical and visual results show that the proposed method achieves state-of-the-art performance among unsupervised methods that require a single image as input. Based on a slightly modified version of the presented framework, we also demonstrate promising results on decomposing an image into three layers.


1 Introduction

Images of indoor and outdoor scenes usually mix several kinds of meaningful information. Scenes contaminated by reflections and objects under different shadings are such phenomena, caused by illumination passing through different media or materials in the environment. Specifically, the irradiance received by the camera from a scene point is blended with different cues along the line of sight. These factors make an image look real, but they also make it harder for a computer to understand the image Bi et al. (2015); Yu and Koltun (2016); Long et al. (2015). Even worse, some factors degrade images and decrease the visibility of scenes Li and Brown (2014); He et al. (2011).

The separation of an image into multiple layers is desired in both computational photography and various vision tasks like surface re-texturing, 3D object compositing Bi et al. (2015), and 3D point cloud processing Yun and Sim (2018). In this regard, single image separation aims to extract two independent layers from an image, where the input image x can be constructed as the pixel-wise addition of an image y_1 and another image y_2, i.e.,

x = y_1 + y_2.    (1)

Many traditional image separation problems can be formulated as Eqn. 1. For instance, reflection interference often arises when a photo of a scene is taken behind a glass window. This is a typical image separation problem and can be expressed as a linear combination of a background scene y_1 and a reflection layer y_2. The intrinsic image model assumes an input image x is the pixel-wise product of an albedo (or reflectance) image A and a shading image S. This can be reformulated into the form of Eqn. 1 by taking the logarithm, i.e., log x = log A + log S.
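
Both physical models thus reduce to the additive form of Eqn. 1. A minimal sketch of the two blending models, assuming PyTorch tensors with values in [0, 1]; the epsilon used to stabilize the logarithm is our own choice:

```python
import torch

def blend_reflection(background: torch.Tensor, reflection: torch.Tensor) -> torch.Tensor:
    # Reflection model: the observed image is the pixel-wise sum of the layers.
    return background + reflection

def blend_intrinsic(albedo: torch.Tensor, shading: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Intrinsic model: pixel-wise product; taking logs turns it into the
    # additive form of Eqn. 1: log x = log A + log S.
    return torch.exp(torch.log(albedo + eps) + torch.log(shading + eps))
```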

While obviously useful, estimating such layers is fundamentally ill-posed, as there exist infinitely many feasible solutions to Eqn. 1. To constrain the space of feasible solutions, prior information is often instantiated through carefully tailored image filters or energy terms Li and Brown (2014); Shih et al. (2015). For example, Li and Brown (2014) assume that one output layer is smoother than the other; based on this assumption, they proposed a relative smoothness prior to separate an image into two layers. However, when the scene becomes complicated, such hand-crafted priors are no longer sufficient to describe the difference between the two output layers. On the other hand, given access to an aligned ground-truth dataset, deep convolutional neural networks (CNNs) provide a data-driven candidate for solving the ill-posed inverse problem with fewer potentially heuristic or hand-crafted assumptions. However, existing databases are limited for the single image separation problem in several respects: (1) it is hard to transfer information among different datasets for the same task, since existing synthetic datasets differ from each other due to their different application scenarios Fan et al. (2017); Wan (2019); Zhang et al. (2018); Fan et al. (2018); (2) ground-truth data for real images is extremely hard to acquire for training a general CNN model Fan et al. (2018). Consequently, each existing dataset is limited in different ways, and thus far, supervised deep network models built on them likewise display a high degree of dataset-tailored architectural variance.

To this end, we propose USIS (Unpaired Single Image Separation), a method that takes three image domains without ground-truth pairings for training. Based on the difference of distributions across the image sets and on cycle consistency, USIS learns the relationship among the different domains in a generative adversarial manner. After learning the feature distributions of the given output image sets, USIS separates a single input image into two meaningful components that are independent of each other.

Experimental results show that our framework separates the input image into two desired images that properly follow the distributions of the output image sets. The proposed architecture can be applied to single image reflection removal and intrinsic image decomposition without access to ground truth. We also extend the proposed method, with slight modifications, to a more challenging single image three-layer separation task. Results demonstrate that USIS handles such problems properly.

2 Previous Work

Unsupervised domain translation methods receive two sets of samples and learn a function that maps between a sample of one domain and the analogous sample of the other domain Radford et al. (2016); Yi et al. (2017); Zhu et al. (2017); Liu et al. (2017). However, because the relationships among the three sets of images are fundamentally based on the physical model described in Sec. 1, previous unpaired image-to-image methods cannot be adapted to the image separation task directly.

There are various image separation problems in computer vision, and different priors and physical models are applied to different sub-problems. For instance, relative smoothness Li and Brown (2014), ghosting cues Shih et al. (2015), and layer independence priors have been introduced for the reflection removal problem. Although such hypotheses work in many cases, they are all low-level priors constructed from image gradients or color changes, and they do not adapt well to complex scenes.

Recently, many deep learning based methods have been carefully tailored to different datasets that contain ground truth. For single image reflection removal, fully convolutional networks have been designed with different guidance branches (such as image gradient information Fan et al. (2017) or face structure priors Wan (2019)) or losses (such as perceptual losses Zhang et al. (2018)). For intrinsic image decomposition, various U-Net-like Ronneberger et al. (2015) encoder-decoders with skip connections have been proposed to tackle the decomposition.

Because it is hard to collect a real image database with ground-truth labels, unsupervised learning for single image separation is appealing. However, single image separation is even tougher when the training images come without ground truth. Janner et al. (2017) proposed self-supervised intrinsic image decomposition by training on a few images with ground-truth data and then transferring the model to other unpaired images; however, their method requires the training images in the same group to share the same reflectance layer. Ma et al. (2018) and Li and Snavely (2018) proposed unsupervised intrinsic image decomposition methods, but these methods need multiple inputs with the same reflectance layer to train the model.

3 Unsupervised Single Image Layer Separation

Figure 1: (a) The shared information consistency assumption and layer independence assumption. E_1 and E_2 are two encoding functions mapping images to latent codes. We assume that, for a set of corresponding images in the three domains, all the information in either separated layer can be found in the input image (green arrows). The separated samples are independent of each other in latent space, i.e., the joint probability density between the two latent distributions is zero. (b) The proposed USIS framework. c_1 and c_2 are the disentangled codes of the input x in latent space. G_1 and G_2 are two generation functions, mapping latent codes back to images in the corresponding output domains. B is the image blend function based on Eqn. 1. We represent E_1, E_2, G_1 and G_2 using CNNs. We implement the shared information consistency assumption using a cycle consistency constraint, and the layer independence assumption via self-supervised learning. D_1 and D_2 denote adversarial discriminators for the respective domains, in charge of evaluating whether the separated images are realistic.

3.1 Problem Formulation

Let X, Y_1 and Y_2 be three image domains. In supervised image-to-image translation, we are given samples (x, y_1, y_2) drawn from a joint distribution P(x, y_1, y_2). In unsupervised single image separation, we are instead given samples drawn from the marginal distributions P(x), P(y_1) and P(y_2). Besides, based on Eqn. 1, a corresponding triple satisfies x = y_1 + y_2. However, as explained in Sec. 1, solving the problem is highly ill-posed: we can infer nothing about the joint distribution from the marginal samples without additional assumptions.

Assumption 1. Shared information consistency. x is blended from y_1 and y_2, and x shares a latent space with each of y_1 and y_2, i.e., E_1(x), E_1(y_1) ∈ C_1 and E_2(x), E_2(y_2) ∈ C_2, where E_k is a function mapping an image from color space to latent space and C_k is the corresponding latent space.

Computationally, to implement the information consistency assumption, the original image separation pipeline can be rewritten as:

ŷ_1 = G_1(E_1(x)),  ŷ_2 = G_2(E_2(x)),    (2)

where c_k = E_k(x) is the feature sample in latent space that is analogous to y_k, k ∈ {1, 2}, and G_1 and G_2 are mapping functions used to project the feature samples back to color space, reconstructing the analogous separated images ŷ_1 and ŷ_2.
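
A sketch of the pipeline in Eqn. 2, assuming E1, E2, G1 and G2 are neural network modules (PyTorch is our assumed framework; names follow the notation above):

```python
import torch
import torch.nn as nn

def separate(E1: nn.Module, E2: nn.Module, G1: nn.Module, G2: nn.Module,
             x: torch.Tensor):
    c1, c2 = E1(x), E2(x)   # project the input into the two latent spaces
    y1_hat = G1(c1)         # decode the first latent code back to color space
    y2_hat = G2(c2)         # decode the second latent code back to color space
    return y1_hat, y2_hat
```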

Assumption 2. Layer independence. The separated images y_1 and y_2 are independent of each other in latent space, i.e., samples from the same domain share more similar features in latent space than samples from different domains.

The layer independence assumption can be implemented by minimizing the distance between any two samples from the same domain in latent space, while maximizing the distance between any two samples from different domains. Computationally, for any two samples c_1, c_1' from latent space C_1 and any other two samples c_2, c_2' from latent space C_2, we have:

d(c_1, c_1') ≤ d(c_1, c_2)  and  d(c_2, c_2') ≤ d(c_1, c_2),    (3)

where d(c_1, c_2) denotes the distance between samples from different latent spaces, and d(c_1, c_1') and d(c_2, c_2') denote the distance between two samples of the same latent space.

3.2 Unsupervised Single Image Separation Learning

Self Supervising (SS). Based on the shared information consistency assumption, x contains all the information that constructs y_1 and y_2. Furthermore, the features of x in latent space still contain all the information of the features of y_1 and y_2. As shown in Fig. 1 (b), an ideal encoder E_1 can encode x from RGB space to the analogous feature code c_1 in latent space C_1, which contains all the information of y_1. Adding the layer independence assumption, c_1 contains all the information of y_1 and none of the information of y_2. In this way, we can disentangle y_1 and y_2 from x through the two encoders E_1 and E_2, into two independent analogous features of y_1 and y_2 in latent spaces C_1 and C_2.

Following this idea, for E_1 we minimize the L1 distance between E_1(x) and E_1(y_1), where x and y_1 are unpaired samples from X and Y_1. For E_2 we minimize the L1 distance between E_2(x) and E_2(y_2), where x and y_2 are unpaired samples from X and Y_2. To make the features of the two layers more distinguishable, we also maximize the distance between the features yielded by E_1 and E_2.
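
A sketch of the two L1 consistency terms just described, assuming PyTorch modules for the encoders; variable names are ours:

```python
import torch
import torch.nn.functional as F

def self_supervision_consistency(E1, E2, x, y1, y2):
    # L1 consistency: features of the blended image x should match features
    # of unpaired samples from each output domain.
    loss_c1 = F.l1_loss(E1(x), E1(y1))
    loss_c2 = F.l1_loss(E2(x), E2(y2))
    return loss_c1, loss_c2
```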

GANs. Note that the self-supervising constraint and the shared information consistency assumption do not guarantee that the two output images look like real images in domains Y_1 and Y_2. Hence, we adopt a generative adversarial framework to make the outputs ŷ_1 and ŷ_2 look as real, in a perceptual sense, as samples from the corresponding domains Y_1 and Y_2.

There are two parts of sub-networks in the generative adversarial framework. The generator aims to solve the single image separation problem, i.e., to obtain the mapping from x to (y_1, y_2). Following the encoder-decoder based mapping technique Kingma and Welling (2014), two encoders and two decoders are learned. We denote by E_1 and E_2 the encoders mapping image color space to latent space, and by G_1 and G_2 the decoders of the two output layers. Specifically, we write G = {E_1, E_2, G_1, G_2} as the generator of our framework. We use two discriminators D_1 and D_2 to discriminate whether the separated images belong to the respective domains; e.g., for real images from domain Y_1, D_1 should output true, while for images from the generator it should output false.

Cycle Consistency (CC). Since the shared information consistency assumption and Eqn. 1 imply a cycle-consistency constraint, we also enforce cycle consistency in the proposed framework to further regularize the ill-posed unsupervised single image separation problem. Specifically, we obtain ŷ_1 and ŷ_2 from x through the mapping functions; their blend B(ŷ_1, ŷ_2) should then be consistent with x. Furthermore, applying the mapping functions again to the blend yields ŷ_1' and ŷ_2', which should be consistent with ŷ_1 and ŷ_2, respectively.
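
A sketch of the resulting streams, assuming blend() implements Eqn. 1 and (E1, G1), (E2, G2) are the encoder/decoder pairs; names are ours:

```python
def cycle_streams(E1, E2, G1, G2, blend, x):
    # Separate, re-blend, and separate again; each pair below should agree.
    y1_hat, y2_hat = G1(E1(x)), G2(E2(x))
    x_hat = blend(y1_hat, y2_hat)                   # should be consistent with x
    y1_cyc, y2_cyc = G1(E1(x_hat)), G2(E2(x_hat))   # should match y1_hat, y2_hat
    return x_hat, y1_hat, y1_cyc, y2_hat, y2_cyc
```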

Learning. We jointly solve the learning problems of self-supervision, GANs and cycle consistency for the image separation streams, the image reconstruction streams and the cycle-reconstruction streams:

min_{E_1, E_2, G_1, G_2} max_{D_1, D_2} ( L_SS + L_GAN + L_CC ).    (4)

Self-supervised training aims to split a sample x from X in the latent space, specifically into a latent code c_1 for the analogous sample in Y_1 and another latent code c_2 for the analogous sample in Y_2. Based on Eqn. 3, the self-supervising objective is:

L_SS = λ_1 ||E_1(x) − E_1(y_1)||_1 + λ_2 ||E_2(x) − E_2(y_2)||_1 + λ_3 d_z(E_1(x), E_2(x)),    (5)

where the hyper-parameters λ_1, λ_2 and λ_3 control the weights of the different objective terms. We adopt the L1 norm in practice, i.e., we compute the L1 distance between the two inputs of each consistency term. d_z is the distance between the latent codes from C_1 and C_2, for which we use a modified sigmoid function:

d_z(c_1, c_2) = 1 / (1 + exp(α ||c_1 − c_2||_1)),    (6)

where α controls the shape of the distance curve; experimental results are provided at the top of Fig. 2 (b).
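
The exact form of the "modified sigmoid" in Eqn. 6 is not fully recoverable from the text; the sketch below mirrors the reconstruction above (a sigmoid of the negatively scaled L1 distance, rescaled to start at 1) and should be read as an assumption:

```python
import torch

def d_z(c1: torch.Tensor, c2: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # Decreases monotonically as the codes move apart, so minimizing it in
    # Eqn. 5 pushes the two latent spaces away from each other.
    dist = (c1 - c2).abs().mean()               # mean L1 distance between codes
    return 2.0 * torch.sigmoid(-alpha * dist)   # 1 at dist = 0, -> 0 when far apart
```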

In Eqn. 4, the GAN objective functions are given by

L_GAN = λ_4 ( E_{y_1}[log D_1(y_1)] + E_x[log(1 − D_1(ŷ_1))] + E_{y_2}[log D_2(y_2)] + E_x[log(1 − D_2(ŷ_2))] ).    (7)

The objective functions in Eqn. 7 are conditional GAN objective functions. They are used to ensure that the separated images resemble images in the respective target domains. The hyper-parameter λ_4 controls the impact of the GAN objective functions.
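
A sketch of one adversarial term using the standard binary cross-entropy GAN formulation; whether the paper uses this exact variant (rather than, e.g., a least-squares loss) is an assumption. The discriminator is assumed to end in a sigmoid, as described in Sec. 4:

```python
import torch
import torch.nn.functional as F

def d_loss(D, real, fake):
    # Discriminator wants real -> 1 and separated (fake) -> 0.
    out_real, out_fake = D(real), D(fake.detach())
    return (F.binary_cross_entropy(out_real, torch.ones_like(out_real)) +
            F.binary_cross_entropy(out_fake, torch.zeros_like(out_fake)))

def g_loss(D, fake, weight: float = 1.0):
    # Generator wants its separated image to be classified as real.
    out_fake = D(fake)
    return weight * F.binary_cross_entropy(out_fake, torch.ones_like(out_fake))
```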

We use the L1 difference function to model the cycle-consistency constraint, which is given by

L_CC = λ_5 ||B(ŷ_1, ŷ_2) − x||_1 + λ_6 ||ŷ_1' − ŷ_1||_1 + λ_7 ||ŷ_2' − ŷ_2||_1,    (8)

where ŷ_k = G_k(E_k(x)) and ŷ_k' = G_k(E_k(B(ŷ_1, ŷ_2))) for k ∈ {1, 2}. The hyper-parameters λ_5, λ_6 and λ_7 control the weights of these three objective terms.

Inheriting from GANs, training the proposed framework amounts to solving a min-max problem whose optimization aims to find a saddle point. To make the training process stable, we apply a gradient update scheme similar to the one described in Goodfellow et al. (2014), together with gradient penalization, to solve Eqn. 4. Specifically, we first apply a gradient descent step to update E_1, E_2, G_1 and G_2 with D_1 and D_2 fixed. We then apply a gradient ascent step to update D_1 and D_2 with E_1, E_2, G_1 and G_2 fixed.
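
A minimal sketch of the alternating update, assuming two PyTorch optimizers, one over the generator parameters and one over the discriminator parameters; generator_loss and discriminator_loss stand in for Eqns. 5-8, and the gradient penalty is omitted:

```python
import torch

def train_step(opt_G: torch.optim.Optimizer, opt_D: torch.optim.Optimizer,
               generator_loss, discriminator_loss):
    # (1) Descent step on E1, E2, G1, G2 with D1, D2 fixed.
    opt_G.zero_grad()
    generator_loss().backward()
    opt_G.step()
    # (2) Ascent step on D1, D2 (implemented as descent on the usual
    #     discriminator loss) with the generator parameters fixed.
    opt_D.zero_grad()
    discriminator_loss().backward()
    opt_D.step()
```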

4 Network Architecture

Our model USIS follows the generative adversarial network pipeline introduced by Goodfellow et al. (2014) and consists of one generator and two discriminators. The generator separates the input sample x into the analogous ŷ_1 and ŷ_2. The discriminators share the same architecture with different parameters; each discriminates whether a sample ŷ_1 (or ŷ_2) is real or fake, i.e., whether it belongs to the distribution of real images.

Generator. As illustrated in Fig. 1 (b), there are two types of generators in USIS.

The first, the separate generator, is constructed from two convolutional encoder-decoder networks {E_1-G_1, E_2-G_2} with skip connections. These two networks share the same structure but have different parameters. Depending on the experimental task, we designed two variants:

Toy problem task. See Fig. 2 (a) for the task description. Both networks employ mirror-link connections introduced by Ronneberger et al. (2015), which connect layers of the encoder and decoder of the same size. These connections yield sharper results than the blurred outputs characteristic of many deconvolutional models. The encoder has 5 convolutional layers with {16, 32, 64, 128, 256} filters of size 4×4 and stride 2. Batch normalization Ioffe and Szegedy (2015) and leaky ReLU activation are applied after every convolutional layer. The layers in the two decoders have the same numbers of features as the encoder but in reverse order, plus a final layer with 3 channels.
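
A sketch of this encoder as described (five 4×4 stride-2 convolutions with {16, 32, 64, 128, 256} filters, each followed by batch normalization and leaky ReLU); the padding and negative slope are our own assumptions, and the mirror-link connections to the decoder are omitted:

```python
import torch.nn as nn

def toy_encoder(in_ch: int = 3) -> nn.Sequential:
    layers, prev = [], in_ch
    for ch in (16, 32, 64, 128, 256):
        layers += [
            nn.Conv2d(prev, ch, kernel_size=4, stride=2, padding=1),  # halves H, W
            nn.BatchNorm2d(ch),
            nn.LeakyReLU(0.2, inplace=True),
        ]
        prev = ch
    return nn.Sequential(*layers)
```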

Reflection removal & intrinsic image decomposition task. We adopt a pretrained VGG-19 Simonyan and Zisserman (2015), selecting 'conv1_2', 'conv2_2' and 'conv3_2' as skip-connected features, which has been shown to be successful for image synthesis and enhancement Zhang et al. (2018). The first 4 blocks in the decoder are cascaded convolution layers and upsampling operations that fuse features from the encoder. The next, contextual block is a fully convolutional network with 64 filters of size 3×3, stride 1 and dilation rates of {2, 4, 8, 16, 32, 1}, followed by an output layer, a convolution with 3 filters of size 1×1. Instance normalization Ulyanov et al. (2016) and leaky ReLU activation are applied after every convolutional layer.
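
A sketch of the contextual block (3×3 stride-1 convolutions with 64 filters at dilation rates {2, 4, 8, 16, 32, 1}, each followed by instance normalization and leaky ReLU, then a 1×1 convolution to 3 channels); the padding values are our own choice, picked to preserve spatial size:

```python
import torch.nn as nn

def contextual_block(in_ch: int = 64) -> nn.Sequential:
    layers, prev = [], in_ch
    for d in (2, 4, 8, 16, 32, 1):
        # padding == dilation keeps H, W fixed for a 3x3 kernel
        layers += [
            nn.Conv2d(prev, 64, kernel_size=3, stride=1, padding=d, dilation=d),
            nn.InstanceNorm2d(64),
            nn.LeakyReLU(0.2, inplace=True),
        ]
        prev = 64
    layers.append(nn.Conv2d(64, 3, kernel_size=1))  # 1x1 output layer, 3 channels
    return nn.Sequential(*layers)
```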

The other generator, the combine generator B, has no learnable parameters. It combines the predicted analogous ŷ_1 and ŷ_2 based on Eqn. 1.

Discriminator. We adopt a multi-scale discriminator, as in UNIT Liu et al. (2017), to distinguish real from fake images. Each discriminator network is constructed from multiple sub-network branches that distinguish real and fake images at different scales. For the toy problem task the number of branches is 1; otherwise, we set it to 3. Specifically, each sub-branch has 4 convolutional layers with {32, 64, 128, 32} filters of size 4×4 and stride 2. Instance normalization Ulyanov et al. (2016) and leaky ReLU activation are applied after every convolutional layer. For the k-th branch, the input image is downsampled by a factor depending on k via average pooling. Finally, the features yielded by the different branches are fused together and followed by a sigmoid activation.
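
A sketch of one discriminator sub-branch and the average-pooling downsampling; how exactly the branch features are fused before the final sigmoid is not specified, so that step is omitted here:

```python
import torch
import torch.nn as nn

def disc_branch(in_ch: int = 3) -> nn.Sequential:
    # One sub-branch: four 4x4 stride-2 convolutions with {32, 64, 128, 32}
    # filters, instance normalization and leaky ReLU after each.
    layers, prev = [], in_ch
    for ch in (32, 64, 128, 32):
        layers += [
            nn.Conv2d(prev, ch, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(ch),
            nn.LeakyReLU(0.2, inplace=True),
        ]
        prev = ch
    return nn.Sequential(*layers)

def downsample(x: torch.Tensor, times: int) -> torch.Tensor:
    # Halve the resolution `times` times via average pooling before feeding
    # the image to the corresponding branch.
    for _ in range(times):
        x = nn.functional.avg_pool2d(x, kernel_size=2)
    return x
```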

5 Experiments

We first analyze different components of the proposed framework based on a toy problem. We then present visual and numerical results on real image separation tasks. Finally, we extend our framework to a more challenging single image separation task.

5.1 Performance Analysis

Method | MSE (ŷ_1) | MSE (ŷ_2)
w/o CC | 71.71 | 71.52
w/o SS | 60.39 | 54.84
Proposed | 28.93 | 27.35

Figure 2: (a) Toy problem data samples. Top row: input sample x (blended from a square and a circle); middle row: y_1 (only square); bottom row: y_2 (only circle). We separate the blended sample into ŷ_1 and ŷ_2 and measure the MSE scores achieved by different configurations of the proposed framework. (b) Top: curves showing how different values of α affect the distance loss d_z, where the distance denotes ||c_1 − c_2||_1 as defined in Eqn. 6. Bottom: separated layers' MSE on the evaluation set versus different internal parameters of the self-supervision loss. (c) Impact of the self-supervision and cycle consistency constraints on image separation accuracy (the table above).

We used ADAM Kingma and Ba (2015) for training, with the learning rate set to 0.0001 and momentums set to 0.0 and 0.9. Each mini-batch consisted of an image from domain X, an image from domain Y_1 and an image from domain Y_2. Our framework has several hyper-parameters, as shown in Eqns. 5-8, which were fixed to default values for all experiments.
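
The stated optimizer settings map directly onto Adam's betas; a sketch with placeholder modules standing in for the actual networks:

```python
import torch
import torch.nn as nn

gen = nn.Linear(1, 1)    # placeholder for the E1, E2, G1, G2 parameters
disc = nn.Linear(1, 1)   # placeholder for the D1, D2 parameters
opt_G = torch.optim.Adam(gen.parameters(), lr=1e-4, betas=(0.0, 0.9))
opt_D = torch.optim.Adam(disc.parameters(), lr=1e-4, betas=(0.0, 0.9))
```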

We introduce a toy problem (visualized in Fig. 2 (a)) in which domain Y_1 contains gray images with a resolution of 128×128, each showing a square of varying {lightness, position, size}. Shapes in domain Y_2 are circles, and each image in domain X is generated via Eqn. 1. Based on this toy problem, we generate a dataset of 5K image sets across the three domains (blended, square-only and circle-only), which is convenient for quantitative evaluation. The goal is to separate the blended image into an image containing only the square and a residual image containing only the circle. We use the unsupervised scheme: we randomly choose 4K images in each domain, without grouping the analogous images, for training. We train for 200 epochs and use the final model; specifically, we test the separate generator on the test set. We then compare the separated images with the corresponding ground-truth images via mean square error (MSE). Note that image pixel values are in [0, 255] in our experiments.

The hyper-parameter analysis of the self-supervision loss and the ablation study results are illustrated in the bottom of Fig. 2 (b) and in Fig. 2 (c), respectively.

5.2 Qualitative and Quantitative Results

Figs. 3 and 4 show the results of the proposed framework on two image separation tasks against state-of-the-art unsupervised methods with single input.

Reflection removal. We apply the proposed framework to the single image reflection removal task. Most previous reflection removal works use synthesized data to train their CNN models Fan et al. (2017); Zhang et al. (2018), and the published real data with ground truth is of very limited size (Zhang et al. (2018) proposed a small dataset containing 110 {input, background} image pairs without ground-truth reflection). Wan et al. (2017) proposed a real dataset containing 454 image sets with the corresponding background and reflection.

In this experiment, we train our model directly on the benchmark of Wan et al. (2017). Note that we use 400 image sets for training and the remaining images for evaluation. In each training iteration, we randomly choose non-corresponding samples x, y_1 and y_2 from the training set. We train the network to separate reflection-contaminated images of size 256×256 obtained by randomly cropping patches from the images. We show the effectiveness of the encoders E_1 and E_2 at separating images in the latent space by clustering the features from the encoders in Fig. 3 (a), followed by several qualitative separation comparisons. Cycle GAN and UNIT only provide a prediction of the background layer given the input.

Intrinsic image decomposition. We use the 220 images in the MIT intrinsic dataset Grosse et al. (2011), as extended by Narihira et al. (2015). This dataset contains only 20 different objects, each with 11 images. We train the network to decompose intrinsic images of size 256×256. The clustering results and visual comparisons are shown in Fig. 4, and the numerical comparison results are reported in Table 1.

Figure 3: (a) Visualization of the training images based on their corresponding PCA vectors (extracted by the encoders in the proposed USIS). t-SNE van der Maaten and Hinton (2008) is used to aid visualization of the latent space. x (input) and y_1 (background layer) share the same latent space under encoder E_1, and x and y_2 (reflection layer) share the same latent space under encoder E_2. (b) Visual results achieved by different unsupervised methods. Note that Cyc-GAN Zhu et al. (2017) and UNIT Liu et al. (2017) can only translate images between domain pairs.
Figure 4: Visual results of intrinsic image decomposition. (a) Clustering of features from the encoders in USIS. (b) Qualitative comparison against two previous unsupervised methods. Here Cyc-GAN Zhu et al. (2017) and UNIT Liu et al. (2017) provide the input-to-reflectance translation.
Method | Cyc-GAN Zhu et al. (2017) | UNIT Liu et al. (2017) | USIS (proposed)
 | SSIM / MSE | SSIM / MSE | SSIM / MSE
Reflection removal | 0.622 / 97.04 | 0.738 / 68.15 | 0.842 / 51.14
Intrinsic decomposition | 0.572 / 48.95 | 0.821 / 37.41 | 0.893 / 30.10
Table 1: Numerical comparison against two other unsupervised methods on the reflection removal and intrinsic image decomposition tasks. We compare MSE and SSIM Wang et al. (2004) between predictions and ground truth on both tasks.

5.3 Multi-layer Separation

In this section, we design another toy but more challenging problem to evaluate the proposed USIS framework: separating a single image into three analogous layers; see Fig. 5 (a) for a visualization of the task. As with separating an image into two layers, the task here aims to separate the image into three layers: square, circle and triangle. We generate 5K image sets and use the same settings as described in Sec. 5.1. We add one more generator and one more discriminator to USIS to split an input image into three layers. Results in Fig. 5 (b) show that the proposed USIS can still handle such a challenging problem.

Figure 5: The extended experiment on the single image to three-layer separation task. (a) The extended toy problem data samples: the input sample and three samples from three different domains. (b) Two visual results of the proposed USIS. From left to right: input and separated layers.

6 Conclusion

In this paper, we propose an Unsupervised Single Image Separation (USIS) network for the single image separation task. We show that by learning both image consistency and the independence of distributions across layers, USIS can exploit the information of different layer distributions to separate a single image into its analogous layers. USIS allows unlabeled data to be used in training. Experimental results show that the proposed framework performs unsupervised single image reflection removal and single-image intrinsic decomposition properly.

References

  • [1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, Cited by: §3.2, §4.
  • [2] S. Bi, X. Han, and Y. Yu (2015) An L1 image transform for edge-preserving smoothing and scene-level intrinsic decomposition. ACM Transactions on Graphics (TOG) 34 (4), pp. 78. Cited by: §1, §1.
  • [3] A. Radford, L. Metz, and S. Chintala (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, Cited by: §2.
  • [4] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022. Cited by: §4, §4.
  • [5] Q. Fan, J. Yang, G. Hua, B. Chen, and D. Wipf (2017) A generic deep architecture for single image reflection removal and image smoothing. In ICCV, Cited by: §1, §2, §5.2.
  • [6] Q. Fan, J. Yang, G. Hua, B. Chen, and D. Wipf (2018) Revisiting deep intrinsic image decompositions. In CVPR, pp. 8944–8952. Cited by: §1.
  • [7] R. Grosse, M. K. Johnson, E. H. Adelson, and W. T. Freeman (2011) Ground truth dataset and baseline evaluations for intrinsic image algorithms. In ICCV, Cited by: §5.2.
  • [8] K. He, J. Sun, and X. Tang (2011) Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (12), pp. 2341–2353. Cited by: §1.
  • [9] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In ICML, Cited by: §4.
  • [10] J. Yun and J. Sim (2018) Reflection removal for large-scale 3D point clouds. In CVPR, Cited by: §1.
  • [11] M. Janner, J. Wu, T. D. Kulkarni, I. Yildirim, and J. B. Tenenbaum (2017) Self-supervised intrinsic image decomposition. In Advances in Neural Information Processing Systems, Cited by: §2.
  • [12] D. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In ICLR, Cited by: §5.1.
  • [13] D. P. Kingma and M. Welling (2014) Auto-encoding variational bayes. In ICLR, Cited by: §3.2.
  • [14] Y. Li and M. S. Brown (2014) Single image layer separation using relative smoothness. In CVPR, Cited by: §1, §1, §2.
  • [15] M. Liu, T. Breuel, and J. Kautz (2017) Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems 30, Cited by: §2, §4, Figure 3, Figure 4, Table 1.
  • [16] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In CVPR, Cited by: §1.
  • [17] T. Narihira, M. Maire, and S. X. Yu (2015) Direct intrinsics: learning albedo-shading decomposition by convolutional regression. In CVPR, Cited by: §5.2.
  • [18] O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, Cited by: §2, §4.
  • [19] Y. Shih, D. Krishnan, F. Durand, and W. T. Freeman (2015) Reflection removal using ghosting cues. In CVPR, Cited by: §1, §2.
  • [20] K. Simonyan and A. Zisserman (2015) Very deep convolutional networks for large-scale image recognition. In ICLR, Cited by: §4.
  • [21] W. Ma, H. Chu, B. Zhou, R. Urtasun, and A. Torralba (2018) Single image intrinsic decomposition without a single intrinsic image. In ECCV, Cited by: §2.
  • [22] L. van der Maaten and G. Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (12), pp. 2579–2605. Cited by: Figure 3.
  • [23] R. Wan, B. Shi, L. Y. Duan, A. H. Tan, and A. C. Kot (2017) Benchmarking single-image reflection removal algorithms. In IEEE ICCV, Cited by: §5.2, §5.2.
  • [24] R. Wan (2019) Face image reflection removal. In CVPR, Cited by: §1, §2.
  • [25] Z. Yi, H. Zhang, P. Tan, and M. Gong (2017) DualGAN: unsupervised dual learning for image-to-image translation. In ICCV, Cited by: §2.
  • [26] F. Yu and V. Koltun (2016) Multi-scale context aggregation by dilated convolutions. In ICLR, Cited by: §1.
  • [27] X. Zhang, R. Ng, and Q. Chen (2018) Single image reflection separation with perceptual losses. In CVPR, Cited by: §1, §2, §4, §5.2.
  • [28] Z. Li and N. Snavely (2018) Learning intrinsic image decomposition from watching the world. In CVPR, Cited by: §2.
  • [29] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. Cited by: Table 1.
  • [30] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, Cited by: §2, Figure 3, Figure 4, Table 1.