Garment Design with Generative Adversarial Networks

07/21/2020 · by Chenxi Yuan, et al. · Northeastern University

The designers' tendency to adhere to a specific mental set and heavy emotional investment in their initial ideas often hinder their ability to innovate during the design thinking and ideation process. In the fashion industry, in particular, the growing diversity of customers' needs, the intense global competition, and the shrinking time-to-market (a.k.a., "fast fashion") further exacerbate this challenge for designers. Recent advances in deep generative models have created new possibilities to overcome the cognitive obstacles of designers through automated generation and/or editing of design concepts. This paper explores the capabilities of generative adversarial networks (GAN) for automated attribute-level editing of design concepts. Specifically, attribute GAN (AttGAN)—a generative model proven successful for attribute editing of human faces—is utilized for automated editing of the visual attributes of garments and tested on a large fashion dataset. The experiments support the hypothesized potential of GAN for attribute-level editing of design concepts, and underscore several key limitations and research questions to be addressed in future work.




1. Introduction

Technology-driven innovation through AI and machine learning has become an essential success factor for fashion design firms in the 21st century. According to McKinsey & Company (Amed et al., 2018), over 140% of the global fashion industry's economic profit is generated by the leading 20% of fashion brands. As a result, significant recent progress has been made in adopting AI and machine learning techniques for augmented and personalized design. Examples range from style matching (Kalantidis et al., 2013; Liu et al., 2016; Xu et al., 2020) to trend forecasting (Al-Halah et al., 2017), interactive search (Zhao et al., 2017; Kovashka et al., 2012), style recommendation (Lin et al., 2018; Simo-Serra et al., 2015), virtual try-on (Han et al., 2018), and clothing type and style classification (Liang et al., 2016; Zhu et al., 2017b). AI and machine learning research in the fashion industry promises to directly influence the purchasing behavior of customers, the garment design thinking and ideation process, user-centered design and mass-personalization, and the ability of the fashion industry to adapt its product development strategies accordingly.

This article investigates how generative adversarial networks (GAN) (Goodfellow et al., 2014) can be adopted to enable automated attribute-level editing of past successful products to inform new product design and development processes. Different from conventional adversarial attacks (Xu et al., 2019c, b, a), attribute editing involves making translations/adjustments to images based on the target attributes to generate a new sample with the desired attributes while preserving other original details. Current GAN-based attribute editing research is predominantly centered on human face images (Ehrlich et al., 2016; Schroff et al., 2015). The facial attribute editing task allows editing a face image by manipulating single or multiple attributes of interest such as hair color, expression, mustache, and age (He et al., 2019). For fashion products, the analogous visual attributes of interest may include style type, sleeve length, color, and pattern, among others. The ability to manipulate the attributes of a prior design is particularly useful in a variety of situations where customers are not satisfied with certain attributes or would like to explore various combinations of them (Zhu et al., 2017b).

Conditional GAN (Mirza and Osindero, 2014) is an extension of the original GAN formulation that allows generating images conditioned on user-defined features controlling the generative process. Among the various versions of conditional GAN proposed to date (Reed et al., 2016; Perarnau et al., 2016; Kaneko et al., 2017), attribute GAN (AttGAN) (He et al., 2019) has proven effective in generating realistic edited images with desired attributes on human face datasets. AttGAN generates visually more pleasing results with fine facial details in comparison with state-of-the-art GAN models. However, there is no proof or indication that AttGAN can be directly applied to attribute-level editing of fashion data, such as garment images, with acceptable performance.

To tackle this problem, this article develops and tests a novel AttGAN model that enables attribute-level editing of fashion items while preserving other visual aspects and attributes. First, the original AttGAN model (He et al., 2019) is implemented on a large fashion dataset consisting of 13,221 garment images along with 22 attribute values. Numerical experiments are then conducted to edit the images with respect to five desired attributes: “vest”, “polo”, “hoodie”, “blouse”, and “T-shirt” (e.g., selecting the attribute “vest” is intended to turn any type of shirt into a vest). The experiments show that the great performance of AttGAN on the human face editing task cannot be replicated on the fashion editing task. The authors hypothesize that the underlying reason stems from the relative area of editing, which, unlike human faces, corresponds to a large area of a garment image (e.g., an entire sleeve or collar). A new version of AttGAN is thus developed to address this limitation. Numerical experiments indicate significant improvement in the successful editing of different attributes such as sleeve length, color, pattern, and clothing type, while preserving the remainder of the original garment image.

2. Related Work

Since its introduction in 2014, GAN (Goodfellow et al., 2014) continues to attract growing interest in the deep learning community and has been applied to various domains such as computer vision, natural language processing (Nie et al., 2018; Che et al., 2017), time series synthesis (Esteban et al., 2017; Luo et al., 2018), and semantic segmentation (Luc et al., 2016). Specifically, GAN has shown significant recent success in the field of computer vision on a variety of tasks such as image generation (Choe et al., 2017; Zhang et al., 2017), image-to-image translation (Zhu et al., 2017a; Isola et al., 2017), and image super-resolution (Ledig et al., 2017; Dong et al., 2017), among others. The standard GAN structure comprises two neural networks, a generator G and a discriminator D, iteratively trained by competing against each other in a minimax game with the following learning objective:

\min_G \max_D \; \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \quad (1)

where z is a random or encoded vector, p_{data}(x) is the empirical distribution of the input training images, and p_z(z) is the prior distribution of z (e.g., normal distribution).
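As a sanity check, the GAN value function can be evaluated directly on discriminator outputs. A minimal NumPy sketch (variable names are illustrative, not from the paper):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Standard GAN value function V(D, G): the discriminator D maximizes
    E[log D(x)] + E[log(1 - D(G(z)))], while the generator G minimizes the
    second term. d_real/d_fake are D's outputs in (0, 1) on a batch of
    real and generated images, respectively."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# An uninformative discriminator outputting 0.5 everywhere yields the
# equilibrium value log(1/2) + log(1/2) = -2 log 2.
v = gan_value(np.full(8, 0.5), np.full(8, 0.5))
```

At the theoretical equilibrium, where generated samples are indistinguishable from real ones, the discriminator can do no better than output 0.5.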

In the standard GAN model, there is no control over the modes of the data being generated. In conditional GAN (cGAN) (Mirza and Osindero, 2014), however, the generative process is conditioned to generate images based on a user-defined vector of features. The learning objective of cGAN is as follows:

\min_G \max_D \; \mathbb{E}_{x \sim p_{data}(x)}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z \mid y)))] \quad (2)

where y is the extra information (e.g., class labels, attribute information) for a given real sample x as input. cGAN allows controlling the generation of samples using y.
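In its simplest form, cGAN conditioning amounts to concatenating the extra information y to the inputs of both the generator and the discriminator. A small illustrative sketch (the shapes are hypothetical):

```python
import numpy as np

def conditioned_input(z, y):
    """Simplest cGAN conditioning: append the condition vector y (e.g.,
    one-hot class labels or binary attributes) to the latent vector z
    before feeding it to the generator (and, analogously, append y to
    the image features fed to the discriminator)."""
    return np.concatenate([z, y], axis=-1)

z = np.random.randn(4, 100)      # batch of 4 latent vectors
y = np.eye(10)[[0, 3, 3, 7]]     # one-hot conditions for 10 classes
g_in = conditioned_input(z, y)   # shape (4, 110)
```

More elaborate conditioning schemes (e.g., projection or attention-based) exist, but concatenation is the formulation used in the original cGAN paper.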

In cGAN, the generation of samples can be conditioned on class information (Odena et al., 2017), text descriptions (Reed et al., 2016; Zhang et al., 2018; Xu et al., 2018), audio (Chen et al., 2019, 2017), skeletons (Ma et al., 2017; Raj et al., 2018), and attributes (Shen and Liu, 2017). In the fashion industry, researchers have applied GANs to a variety of applications such as: (i) automated garment texture filling (Xian et al., 2018), (ii) texture transfer (Jiang and Fu, 2017), where a fashion style image is applied to a basic clothing image, (iii) virtual try-on (Zhu et al., 2017b), aimed at creating new clothing on a human body based on textual descriptions, (iv) interactive image editing (Cheng et al., 2018), where users can guide an agent to edit images via multi-turn conversational language, and (v) fashion recommendation (Kang et al., 2017), in which the model can be used for both personalized recommendation and personalized fashion design.

3. Methodology

This section first introduces the original formulation of a cGAN for attribute-level editing: attribute GAN (AttGAN). AttGAN (He et al., 2019) has shown great performance on facial image editing with binary attributes (e.g., {mustache, no-mustache}) and is used as our baseline model. Next, an in-depth analysis of the AttGAN formulation is conducted and a modified version of AttGAN is developed to achieve comparable performance on garment image editing.

3.1. Attribute Generative Adversarial Networks

A limitation of conventional cGAN is that the user-defined attributes/labels affect the editing of the entire image, including the parts unrelated to the desired attribute. To avoid this limitation, AttGAN (He et al., 2019) builds an effective framework for high-quality facial attribute editing that simultaneously preserves attribute-excluding details.

The learning objectives of the AttGAN generator (encoder G_enc and decoder G_dec) and the AttGAN discriminator D and classifier C are respectively as follows:

\min_{G_{enc}, G_{dec}} \; \lambda_1 L_{rec} + \lambda_2 L_{cls_g} + L_{adv_g} \quad (3)

\min_{D, C} \; \lambda_3 L_{cls_c} + L_{adv_d} \quad (4)

where L_{rec} is the reconstruction loss for satisfactory preservation of attribute-excluding details, L_{cls_g} (and its discriminator-side counterpart L_{cls_c}) is the classification constraint to guarantee the correct editing of the desired attributes, and L_{adv_g} and L_{adv_d} are the adversarial losses employed for visually realistic editing. λ1, λ2, and λ3 are hyperparameters that control the importance of different terms and are tuned experimentally.
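The weighted sum in Eq. (3) can be sketched as a one-line helper; the default weights shown below are illustrative placeholders, not values reported in this paper:

```python
def attgan_generator_loss(l_rec, l_cls_g, l_adv_g, lam1=100.0, lam2=10.0):
    """Generator objective of Eq. (3): a weighted sum of the reconstruction
    loss, the attribute classification loss, and the adversarial loss.
    lam1 and lam2 are illustrative defaults; in practice they are tuned
    experimentally for the task at hand."""
    return lam1 * l_rec + lam2 * l_cls_g + l_adv_g
```

A large reconstruction weight relative to the other terms reflects the priority AttGAN places on preserving attribute-excluding details.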

Inspired by AttGAN’s success in human facial attribute editing, the authors first attempted to utilize the original AttGAN model for attribute-level editing of fashion product images. The preliminary observation was that the AttGAN model does not perform as expected on fashion data such as garment images. Specifically, although AttGAN can reconstruct original fashion images, it is unable to generate new, vivid images with the desired attributes modified. The underlying reason behind such poor performance on fashion data is elaborated and addressed in the remainder of this section.

In Eq. (3), L_{cls_g} is the attribute classification loss, employed to guide the generative process to learn and edit the desired attributes. The reconstruction loss L_{rec}, on the other hand, is intended to enable the decoder to reconstruct the original input images so that the generated samples can preserve the attribute-excluding details. In the original AttGAN, these two loss functions are both trained on the generator. The aforementioned problem contributing to the poor performance of AttGAN on fashion data stems from an inherent conflict between these two loss functions. The classification loss wants the generator to distinguish the desired attributes b from the original images x^a, by minimizing the summation of the binary cross entropy of the desired attributes and input images as follows:

L_{cls_g} = \mathbb{E}_{x^a \sim p_{data},\, b \sim p_{attr}} \left[ \sum_{i=1}^{n} -b_i \log C_i(\hat{x}^b) - (1 - b_i) \log(1 - C_i(\hat{x}^b)) \right] \quad (5)

where \hat{x}^b is the edited image expected to change the attributes a of x^a to another set of attributes b. This is achieved by decoding the latent representation z conditioned on the attributes b: \hat{x}^b = G_{dec}(z, b), where z is encoded from the image x^a with binary attributes a and is denoted by z = G_{enc}(x^a). Therefore the generated image is formulated as \hat{x}^b = G_{dec}(G_{enc}(x^a), b).
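The attribute classification loss is a summed binary cross-entropy, which can be written directly in NumPy. A minimal sketch (the clipping constant is an implementation detail for numerical stability, not from the paper):

```python
import numpy as np

def attr_classification_loss(b, c_pred, eps=1e-12):
    """Summed binary cross-entropy between the target attribute vector b
    (one 0/1 entry per attribute) and the classifier's predictions
    C(x_hat_b) on the edited image, averaged over the batch. eps avoids
    log(0)."""
    c_pred = np.clip(c_pred, eps, 1.0 - eps)
    bce = -b * np.log(c_pred) - (1.0 - b) * np.log(1.0 - c_pred)
    return bce.sum(axis=-1).mean()

b = np.array([[1.0, 0.0, 1.0]])        # desired attributes
perfect = np.array([[1.0, 0.0, 1.0]])  # classifier agrees exactly
loss = attr_classification_loss(b, perfect)  # near zero: editing succeeded
```

The loss is near zero only when the classifier judges the edited image to carry exactly the desired attribute combination, which is what pushes the generator away from merely reproducing the input.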

The reconstruction loss, on the other hand, wants the generator to preserve the original images as much as possible, by minimizing the following Manhattan distance function:

L_{rec} = \mathbb{E}_{x^a \sim p_{data}} \left[ \| x^a - \hat{x}^a \|_1 \right] \quad (6)

where \hat{x}^a = G_{dec}(G_{enc}(x^a), a). The reconstruction loss enables the decoder to restore the original image x^a, conditioned on its own attributes a, from the latent representation z.
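The reconstruction term is just a mean Manhattan (L1) distance between the input image and its reconstruction under the original attributes; a minimal sketch:

```python
import numpy as np

def reconstruction_loss(x, x_rec):
    """Mean L1 (Manhattan) distance between the input image x^a and its
    reconstruction x_hat^a decoded under the original attributes a."""
    return np.abs(x - x_rec).mean()

x = np.ones((2, 4, 4, 3))            # a dummy batch of 4x4 RGB images
loss = reconstruction_loss(x, x)     # 0.0 for a perfect reconstruction
```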

The aforementioned conflict limits the use of AttGAN for attribute-level editing of fashion data, where some attributes account for larger relative areas of the image (e.g., an entire sleeve or collar). It was observed that the classification loss is unable to generate samples distinct from the original garment images. The AttGAN model is thus reformulated by taking the classification loss out of the joint objective and training it on the generator independently. This, in turn, enables more flexibility for the generator to edit larger areas of the input image. Accordingly, Eq. (3) is recast as the following two independently minimized objectives:

\min_{G_{enc}, G_{dec}} \; \lambda_1 L_{rec} + L_{adv_g} \quad (8)

\min_{G_{enc}, G_{dec}} \; \lambda_2 L_{cls_g} \quad (9)

The modified model is referred to as Design-AttGAN, with the training procedure elaborated in Algorithm 1.

1: Input: images x^a and their attributes a, and step number T
2: for step t = 0 to T do
3:     Sample batch x^a, a, and randomly generate target attributes b
4:     for inner step 0 to 5 do
5:         Minimize Eq. (4)
6:     end for
7:     Minimize Eq. (8)
8:     Minimize Eq. (9)
9: end for
10: Output: well-trained G_enc, G_dec, D, and C
Algorithm 1 Design-AttGAN
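Algorithm 1 can be sketched as a plain Python training loop. The three update callbacks stand in for gradient steps on the discriminator/classifier objective (Eq. 4), the generator's reconstruction + adversarial objective (Eq. 8), and the decoupled classification objective (Eq. 9); the counting stubs below only illustrate the update schedule:

```python
def train(batches, update_d, update_g_rec_adv, update_g_cls, d_steps=5):
    """Design-AttGAN training schedule (Algorithm 1): several
    discriminator/classifier updates (Eq. 4) per batch, followed by one
    generator update on the reconstruction + adversarial objective
    (Eq. 8) and one separate update on the classification objective
    (Eq. 9)."""
    for x_a, a, b in batches:          # images, their attributes, targets
        for _ in range(d_steps):       # inner loop: minimize Eq. (4)
            update_d(x_a, a, b)
        update_g_rec_adv(x_a, a, b)    # minimize Eq. (8)
        update_g_cls(x_a, a, b)        # minimize Eq. (9), decoupled

# Counting stubs to illustrate the schedule on 3 dummy batches.
calls = {"d": 0, "g_rec": 0, "g_cls": 0}
train([(None, None, None)] * 3,
      lambda x, a, b: calls.__setitem__("d", calls["d"] + 1),
      lambda x, a, b: calls.__setitem__("g_rec", calls["g_rec"] + 1),
      lambda x, a, b: calls.__setitem__("g_cls", calls["g_cls"] + 1))
```

The key structural difference from the original AttGAN loop is that Eq. (8) and Eq. (9) are minimized as separate steps rather than as one weighted sum.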

4. Experiments

4.1. Dataset & Training

The Design-AttGAN model is tested on a fashion dataset (Ping et al., 2019), which contains 13,221 images, each annotated with 22 binary attributes (with/without). High-frequency attributes are chosen in all experiments, including “vest”, “polo”, “stripe”, “short sleeve”, “long sleeve”, “red”, “yellow”, “blue”, “purple”, “black”, and “white”. The dataset is separated into a training set for model training and a testing set for evaluation. The experiments are conducted in TensorFlow using the open-source code provided by (He et al., 2019). The model is trained with the Adam optimizer, a batch size of 32, and a learning rate of 0.0002.
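The paper does not report the exact train/test split ratio. Assuming a simple held-out split, the data preparation step can be sketched as follows (the 10% test fraction and the seed are assumptions for illustration):

```python
import numpy as np

def split_dataset(n_images, test_frac=0.1, seed=0):
    """Shuffle image indices and hold out a fraction for evaluation.
    The 10% test fraction is an assumed value, not from the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_images)
    n_test = int(n_images * test_frac)
    return idx[n_test:], idx[:n_test]  # train indices, test indices

train_idx, test_idx = split_dataset(13221)
```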

Figure 1. AttGAN on fashion images.
Figure 2. Design-AttGAN on fashion images.
Figure 3. Design-AttGAN for different attribute editing tasks.

4.2. Results

Figure 1 shows the binary attribute editing results obtained from the original AttGAN model. As can be seen, AttGAN performs poorly on the garment images: the model does not preserve the garment patterns and is not even able to properly edit the desired attributes. The underlying reason is that the classification learning task is negatively influenced by the reconstruction learning task in the original AttGAN formulation. The Design-AttGAN model is applied to address this problem (Figure 2). In the Design-AttGAN model, the classification loss is trained as an independent objective function to enhance the ability of the generator for attribute editing. As Figure 2 shows, the Design-AttGAN model outperforms the baseline AttGAN model in learning multiple attributes and changing the type of garment to “vest” or “polo”.

In the Design-AttGAN model, the classification loss is trained separately, without the restriction of the reconstruction and adversarial losses in the minimax game. This provides the model with more flexibility to generate higher-quality “fake” samples. This is necessary for the fashion attribute editing task because, unlike facial attributes, the attributes of garment products typically account for a relatively larger area of the image. This way, the generative model is free to generate more “wild” samples that incorporate the edited attributes into the original images.

Attribute Extension

Figure 3 shows the performance of the Design-AttGAN model on the attribute-editing task with eleven distinct attributes, including clothing type (“vest”, “polo”), clothing pattern (“slim horizontal stripes”), sleeve length (“short sleeves”, “long sleeves”), and multiple colors (“red”, “yellow”, “blue”, “purple”, “black”, “white”). As shown, the model can successfully edit the color and stripes on clothes and change the length of sleeves. However, it is not able to learn the latent pattern associated with the “polo” attribute.

To solve this problem, another experiment was conducted on a narrowed dataset. The original dataset has 13,221 images; however, unlike sleeve length and color, which are indispensable attributes of any garment, attributes such as “polo” are relatively rare and thus cause the data to be imbalanced. Hence, 5,782 images with attributes in the clothing type category (e.g., vest, polo, blouse, t-shirt, hoodie, one-piece dress) are selected from the original dataset. The Design-AttGAN model is then retrained on this narrowed dataset to generate samples with the desired attributes “vest” and “polo”. Results showed that the Design-AttGAN model yields better performance on the narrowed dataset (Figure 4, right) than on the original dataset (Figure 4, left). With the narrowed dataset, where each image is guaranteed to contain clothing type attributes, the model is more likely to capture these attributes and edit accordingly. This is further evidence that training a cGAN model is highly sensitive to the balance of different attributes in the dataset: an imbalanced distribution of attributes hinders the ability of the model to learn the attributes with low frequency in the dataset.

Figure 4. Design-AttGAN trained on the entire dataset (left) and narrowed dataset (right).

5. Conclusions and Future Work

This paper introduces a deep learning model, Design-AttGAN, which can automatically edit garment images conditioned on certain user-defined attributes. The performance of the generative model is tested on a large fashion dataset. The original formulation of AttGAN is modified to avoid the inherent conflict between the reconstruction loss and the attribute classification loss. An important observation was that generative adversarial networks are sensitive to different domains and therefore need careful revision and hand-engineering of the algorithms based on the dataset and target task.

Future work will seek to generate higher-resolution images, improve the stability of the Design-AttGAN model, and broaden the scope of the proposed methodology. Evaluation of GAN performance will also be an important area to explore.


  • Z. Al-Halah, R. Stiefelhagen, and K. Grauman (2017) Fashion forward: forecasting visual style in fashion. In Proceedings of the IEEE International Conference on Computer Vision, pp. 388–397. Cited by: §1.
  • I. Amed, J. Andersson, A. Berg, M. Drageset, S. Hedrich, and S. Kappelmark (2018) The State of Fashion: Renewed optimism for the fashion industry. External Links: Link Cited by: §1.
  • T. Che, Y. Li, R. Zhang, R. D. Hjelm, W. Li, Y. Song, and Y. Bengio (2017) Maximum-likelihood augmented discrete generative adversarial networks. arXiv preprint arXiv:1702.07983. Cited by: §2.
  • L. Chen, R. K. Maddox, Z. Duan, and C. Xu (2019) Hierarchical cross-modal talking face generation with dynamic pixel-wise loss. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7832–7841. Cited by: §2.
  • L. Chen, S. Srivastava, Z. Duan, and C. Xu (2017) Deep cross-modal audio-visual generation. In Proceedings of the on Thematic Workshops of ACM Multimedia 2017, pp. 349–357. Cited by: §2.
  • Y. Cheng, Z. Gan, Y. Li, J. Liu, and J. Gao (2018) Sequential attention gan for interactive image editing via dialogue. arXiv preprint arXiv:1812.08352. Cited by: §2.
  • J. Choe, S. Park, K. Kim, J. Hyun Park, D. Kim, and H. Shim (2017) Face generation for low-shot learning using generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 1940–1948. Cited by: §2.
  • H. Dong, S. Yu, C. Wu, and Y. Guo (2017) Semantic image synthesis via adversarial learning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5706–5714. Cited by: §2.
  • M. Ehrlich, T. J. Shields, T. Almaev, and M. R. Amer (2016) Facial attributes classification using multi-task representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 47–55. Cited by: §1.
  • C. Esteban, S. L. Hyland, and G. Rätsch (2017) Real-valued (medical) time series generation with recurrent conditional gans. arXiv preprint arXiv:1706.02633. Cited by: §2.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §1, §2.
  • X. Han, Z. Wu, Z. Wu, R. Yu, and L. S. Davis (2018) Viton: an image-based virtual try-on network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7543–7552. Cited by: §1.
  • Z. He, W. Zuo, M. Kan, S. Shan, and X. Chen (2019) Attgan: facial attribute editing by only changing what you want. IEEE Transactions on Image Processing 28 (11), pp. 5464–5478. Cited by: §1, §1, §1, §3.1, §3, §4.1.
  • P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134. Cited by: §2.
  • S. Jiang and Y. Fu (2017) Fashion style generator. In IJCAI, pp. 3721–3727. Cited by: §2.
  • Y. Kalantidis, L. Kennedy, and L. Li (2013) Getting the look: clothing recognition and segmentation for automatic product suggestions in everyday photos. In Proceedings of the 3rd ACM conference on International conference on multimedia retrieval, pp. 105–112. Cited by: §1.
  • T. Kaneko, K. Hiramatsu, and K. Kashino (2017) Generative attribute controller with conditional filtered generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6089–6098. Cited by: §1.
  • W. Kang, C. Fang, Z. Wang, and J. McAuley (2017) Visually-aware fashion recommendation and design with generative image models. In 2017 IEEE International Conference on Data Mining (ICDM), pp. 207–216. Cited by: §2.
  • A. Kovashka, D. Parikh, and K. Grauman (2012) Whittlesearch: image search with relative attribute feedback. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2973–2980. Cited by: §1.
  • C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. (2017) Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681–4690. Cited by: §2.
  • X. Liang, L. Lin, W. Yang, P. Luo, J. Huang, and S. Yan (2016) Clothes co-parsing via joint image segmentation and labeling with application to clothing retrieval. IEEE Transactions on Multimedia 18 (6), pp. 1175–1186. Cited by: §1.
  • Y. Lin, P. Ren, Z. Chen, Z. Ren, J. Ma, and M. de Rijke (2018) Explainable fashion recommendation with joint outfit matching and comment generation. arXiv preprint arXiv:1806.08977. Cited by: §1.
  • Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang (2016) Deepfashion: powering robust clothes recognition and retrieval with rich annotations. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1096–1104. Cited by: §1.
  • P. Luc, C. Couprie, S. Chintala, and J. Verbeek (2016) Semantic segmentation using adversarial networks. arXiv preprint arXiv:1611.08408. Cited by: §2.
  • Y. Luo, X. Cai, Y. Zhang, J. Xu, et al. (2018) Multivariate time series imputation with generative adversarial networks. In Advances in Neural Information Processing Systems, pp. 1596–1607. Cited by: §2.
  • L. Ma, X. Jia, Q. Sun, B. Schiele, T. Tuytelaars, and L. Van Gool (2017) Pose guided person image generation. In Advances in Neural Information Processing Systems, pp. 406–416. Cited by: §2.
  • M. Mirza and S. Osindero (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784. Cited by: §1, §2.
  • W. Nie, N. Narodytska, and A. Patel (2018) Relgan: relational generative adversarial networks for text generation. Cited by: §2.
  • A. Odena, C. Olah, and J. Shlens (2017) Conditional image synthesis with auxiliary classifier gans. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2642–2651. Cited by: §2.
  • G. Perarnau, J. Van De Weijer, B. Raducanu, and J. M. Álvarez (2016) Invertible conditional gans for image editing. arXiv preprint arXiv:1611.06355. Cited by: §1.
  • Q. Ping, B. Wu, W. Ding, and J. Yuan (2019) Fashion-attgan: attribute-aware fashion editing with multi-objective gan. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 0–0. Cited by: §4.1.
  • A. Raj, P. Sangkloy, H. Chang, J. Lu, D. Ceylan, and J. Hays (2018) Swapnet: garment transfer in single view images. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 666–682. Cited by: §2.
  • S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee (2016) Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396. Cited by: §1, §2.
  • F. Schroff, D. Kalenichenko, and J. Philbin (2015) Facenet: a unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 815–823. Cited by: §1.
  • W. Shen and R. Liu (2017) Learning residual images for face attribute manipulation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4030–4038. Cited by: §2.
  • E. Simo-Serra, S. Fidler, F. Moreno-Noguer, and R. Urtasun (2015) Neuroaesthetics in fashion: modeling the perception of fashionability. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 869–877. Cited by: §1.
  • W. Xian, P. Sangkloy, V. Agrawal, A. Raj, J. Lu, C. Fang, F. Yu, and J. Hays (2018) Texturegan: controlling deep image synthesis with texture patches. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8456–8465. Cited by: §2.
  • K. Xu, H. Chen, S. Liu, P. Chen, T. Weng, M. Hong, and X. Lin (2019a) Topology attack and defense for graph neural networks: an optimization perspective. In International Joint Conference on Artificial Intelligence (IJCAI), Cited by: §1.
  • K. Xu, S. Liu, G. Zhang, M. Sun, P. Zhao, Q. Fan, C. Gan, and X. Lin (2019b) Interpreting adversarial examples by activation promotion and suppression. arXiv preprint arXiv:1904.02057. Cited by: §1.
  • K. Xu, S. Liu, P. Zhao, P. Chen, H. Zhang, Q. Fan, D. Erdogmus, Y. Wang, and X. Lin (2019c) Structured adversarial attack: towards general implementation and better interpretability. In International Conference on Learning Representations, Cited by: §1.
  • K. Xu, G. Zhang, S. Liu, Q. Fan, M. Sun, H. Chen, P. Chen, Y. Wang, and X. Lin (2020) Adversarial t-shirt! evading person detectors in a physical world. In Proceedings of the European Conference on Computer Vision (ECCV), Cited by: §1.
  • T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He (2018) Attngan: fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1316–1324. Cited by: §2.
  • G. Zhang, M. Kan, S. Shan, and X. Chen (2018) Generative adversarial network with spatial attention for face attribute editing. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 417–432. Cited by: §2.
  • H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Metaxas (2017) Stackgan: text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 5907–5915. Cited by: §2.
  • B. Zhao, J. Feng, X. Wu, and S. Yan (2017) Memory-augmented attribute manipulation networks for interactive fashion search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1520–1528. Cited by: §1.
  • J. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman (2017a) Toward multimodal image-to-image translation. In Advances in neural information processing systems, pp. 465–476. Cited by: §2.
  • S. Zhu, R. Urtasun, S. Fidler, D. Lin, and C. Change Loy (2017b) Be your own prada: fashion synthesis with structural coherence. In Proceedings of the IEEE international conference on computer vision, pp. 1680–1688. Cited by: §1, §1, §2.