Designing a logo is a lengthy process that requires continuous collaboration between the designers and their clients. Each drafted logo takes time and effort to be designed, which turns this creative process into a tedious and expensive endeavor.
Recent advancements in generative models suggest a possible use of artificial intelligence as a solution to this problem, specifically Generative Adversarial Neural Networks (GANs), which learn how to mimic any distribution of data. They consist of two neural networks, a generator and a discriminator, that are pitted against each other, trying to reach a Nash Equilibrium in a minimax game.
Whilst GANs and other generative models have been used to generate almost anything, from MNIST digits to anime faces, Mario levels and fake celebrities, logo generation has yet to receive thorough exploration. One possible explanation is that logos do not contain a hierarchy of nested segments, which networks can learn and try to reproduce. Furthermore, their latent space is not continuous, meaning that not every generated logo will necessarily be aesthetically pleasing. To the best of our knowledge, Sage et al. are the only ones to have tackled this problem thus far. They propose a clustered approach for dealing with multi-modal data, specifically logos. Logos get assigned synthetic labels, defined by the cluster they belong to, and a GAN is trained, conditioned on these labels. However, synthetic labels do not provide enough flexibility. The need for more descriptive labels arises: labels that could better convey what is in the logos, and present designers with more detailed selection criteria.
This paper provides a first step to more expressive and informative labels instead of synthetic ones, which is achieved by defining logos using their most prominent color. Twelve color classes are introduced: black, blue, brown, cyan, gray, green, orange, pink, purple, red, white, and yellow. We propose LoGAN: an improved Auxiliary Classifier Wasserstein Generative Adversarial Neural Network (with Gradient Penalty) (AC-WGAN-GP) that is able to generate logos conditioned on the aforementioned twelve colors.
The rest of this paper is structured as follows: Section 2 discusses the related work and background, Section 3 presents the proposed architecture, and Section 4 explains the dataset and labeling process. Section 5 presents the experimental results, consisting of the training details and results, and finally we conclude and discuss some possible extensions of this work in Section 6.
II Background & Related Work
First proposed in 2014 by Goodfellow et al., GANs have seen a rise in popularity during the last couple of years, after possible solutions to training instability and mode collapse were introduced [7, 8, 9].
II-A Generative Adversarial Networks
GANs (architecture depicted in Fig.1(a)) consist of two different neural networks, a generator and a discriminator, that are trained simultaneously in a competitive manner.
The generator is fed a noise vector ($z$) from a probability distribution ($p_z$), and outputs a generated data-point (fake image). The discriminator takes its input either from the generator (the fake image) ($G(z)$) or from the training set (the real image) ($x$) and is trained to distinguish between the two. The discriminator and the generator play a two-player minimax game with value function (1), where the discriminator tries to maximize V, while the generator tries to minimize it.
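For reference, a sketch of value function (1), assuming it takes the standard form introduced by Goodfellow et al. Here $z$ is the noise vector drawn from $p_z$, $x$ is a real image, $G$ and $D$ are the generator and discriminator, and $p_{data}$ (a symbol assumed here) denotes the distribution of the training set:

```latex
\begin{equation}
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
\tag{1}
\end{equation}
```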
II-A1 Objective functions
While GANs are able to generate high-quality images, they are notoriously difficult to train. They suffer from problems like training instability, non-convergence and mode collapse. Multiple improvements have been suggested to fix these problems [7, 8, 9], including using deep convolutional layers for the networks and modified objective functions, e.g. least-squares or the Wasserstein distance between the distributions [12, 13, 14].
Conditional generation with GANs entails using labeled data to generate images based on a certain class. The two types that will be discussed in the subsections below are Conditional GANs and Auxiliary Classifier GANs.
II-B1 Conditional GANs
In a conditional GAN (CGAN) (architecture depicted in Fig.1(b)) the discriminator and the generator are conditioned on $y$, which could be a class label or some data from another modality. The input and $y$ are combined in a joint hidden representation and fed as an additional input layer in both networks.
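As a minimal sketch of this kind of conditioning, the snippet below assumes the simplest possible combination: a one-hot encoding of the class label is concatenated to a Gaussian noise vector before being fed to a network. The helper names and the noise dimension are hypothetical and are not taken from the LoGAN code; only the twelve class names come from this paper.

```python
import random

# The twelve color classes used in this paper.
COLOR_CLASSES = [
    "black", "blue", "brown", "cyan", "gray", "green",
    "orange", "pink", "purple", "red", "white", "yellow",
]


def one_hot(label, classes=COLOR_CLASSES):
    """Encode a class label as a one-hot list of floats."""
    if label not in classes:
        raise ValueError(f"unknown class: {label}")
    return [1.0 if name == label else 0.0 for name in classes]


def conditioned_input(label, noise_dim=100, seed=None):
    """Concatenate a Gaussian noise vector with the one-hot label.

    noise_dim=100 is an illustrative assumption, not a value from the paper.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label)


vector = conditioned_input("red", noise_dim=8, seed=0)
print(len(vector))   # 8 noise values followed by 12 label values
print(vector[-12:])  # the one-hot part, with a 1.0 at the position of "red"
```

Concatenation is only one way of forming the joint representation; learned label embeddings are a common alternative.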
II-B2 Auxiliary Classifier GANs
II-C GAN Applications
, image super-resolution and object detection .
Up until the writing of this paper, logo generation has only been investigated by Sage et al., who accomplish three main things:
Define the Large Logo Dataset (LLD) 
Synthetically label the logos by clustering them using the ResNet Classifier network
Build a GAN conditioned on these synthetic labels
However, as these labels are defined from computer-generated clusters, they do not necessarily provide intuitive classes for a human designer. More descriptive labels are needed: labels that could provide designers with more detailed selection criteria and better convey what is in the logos.
This paper offers a solution to the previous problem by:
Using twelve colors to define the logo classes
Defining LoGAN: An AC-WGAN-GP that can conditionally generate logos
III Proposed Architecture
The proposed architecture for LoGAN is an Auxiliary Classifier Wasserstein Generative Adversarial Neural Network with gradient penalty (AC-WGAN-GP), depicted in Fig.2. LoGAN is based on the ACGAN architecture, with the main difference being that it consists of three neural networks, namely the discriminator, the generator and an additional classification network (code for this paper is available at https://github.com/ajki/LoGAN). The latter is responsible for assisting the discriminator in classifying the logos, as the original classification loss from AC-GAN was dominated by the Wasserstein distance used by the WGAN-GP.
In an original AC-GAN the discriminator is forced to optimize the classification loss together with the real-fake loss. This is shown in equations (2) & (3), which describe the loss from defining the source of the image (training set or generator) and the loss from defining the class of the image respectively. The adversarial aspect of AC-GAN comes as the discriminator tries to maximize $L_C + L_S$, whilst the generator tries to maximize $L_C - L_S$.
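A sketch of equations (2) and (3), assuming they follow the formulation of Odena et al. Here $L_S$ is the source (real-fake) loss, $L_C$ is the class loss, $S$ is the predicted source, $C$ is the predicted class, and $X_{real}$ and $X_{fake}$ denote training and generated images:

```latex
\begin{align}
L_S &= \mathbb{E}\left[\log P(S = real \mid X_{real})\right]
     + \mathbb{E}\left[\log P(S = fake \mid X_{fake})\right] \tag{2} \\
L_C &= \mathbb{E}\left[\log P(C = c \mid X_{real})\right]
     + \mathbb{E}\left[\log P(C = c \mid X_{fake})\right] \tag{3}
\end{align}
```

Under this formulation the discriminator maximizes $L_C + L_S$ and the generator maximizes $L_C - L_S$.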
Since WGAN-GP is more stable during training, the proposed architecture makes use of the WGAN-GP loss instead of the AC-GAN loss. The loss functions for the discriminator and generator of LoGAN are the same as those of a WGAN-GP, stated in equations (4) & (5).
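A sketch of equations (4) and (5), assuming they follow the WGAN-GP formulation of Gulrajani et al. The symbols are assumptions introduced here: $\mathbb{P}_r$ and $\mathbb{P}_g$ are the real and generated distributions, $\hat{x}$ is sampled uniformly along straight lines between pairs of real and generated samples, and $\lambda$ is the gradient penalty coefficient:

```latex
\begin{align}
L_D &= \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\left[D(\tilde{x})\right]
     - \mathbb{E}_{x \sim \mathbb{P}_r}\left[D(x)\right]
     + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}
       \left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right] \tag{4} \\
L_G &= - \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\left[D(\tilde{x})\right] \tag{5}
\end{align}
```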
The loss of the additional classifier network is defined as:
To avoid instability during training and mode collapse, certain measures were taken. The generator and classifier were trained once for every 5 iterations of discriminator training, as suggested by Gulrajani et al., $z$ was sampled from a Gaussian distribution, and batch normalization was used.
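The update schedule described above can be sketched as follows. This is only an illustration of the 5-to-1 ratio between discriminator updates and generator/classifier updates; the function name is hypothetical and the actual network updates are left out.

```python
N_CRITIC = 5  # discriminator iterations per generator/classifier update (from the text)


def update_schedule(num_iterations, n_critic=N_CRITIC):
    """List, for every iteration, the networks that would be updated."""
    schedule = []
    for step in range(1, num_iterations + 1):
        networks = ["discriminator"]  # the discriminator is updated every iteration
        if step % n_critic == 0:
            # every n_critic-th iteration also updates generator and classifier
            networks += ["generator", "classifier"]
        schedule.append(networks)
    return schedule


for step, networks in enumerate(update_schedule(10), start=1):
    print(step, networks)
```

In a real training loop each entry would correspond to one optimizer step on the respective loss.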
The dataset used to train LoGAN is the LLD-icons dataset, which consists of 486,377 icons.
To extract the most prominent color from the image, a k-Means algorithm was used to define the RGB values of the centroids. The algorithm makes use of the MiniBatchKMeans implementation from the scikit-learn package, and the package webcolors was used to turn the RGB values into color words. The preliminary colors consisted of X11 color names (a list of the X11 color names, and the class grouping of RGB values, can be found on Wikipedia at https://en.wikipedia.org/wiki/Web_colors#X11_color_names), which were grouped into 12 main classes: black, blue, brown, cyan, gray, green, orange, pink, purple, red, white, and yellow. The class distribution can be observed in Fig.3.
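The labeling idea can be illustrated with a self-contained sketch. To keep it dependency-free, a plain Lloyd-style k-means in pure Python stands in for MiniBatchKMeans, and a nearest-anchor lookup stands in for the webcolors/X11 grouping. The RGB anchors in PALETTE, the default k=3 and all function names are assumptions made for illustration, not values from this paper.

```python
import random

# Hypothetical RGB anchors for the twelve classes (illustrative only).
PALETTE = {
    "black": (0, 0, 0), "blue": (0, 0, 255), "brown": (139, 69, 19),
    "cyan": (0, 255, 255), "gray": (128, 128, 128), "green": (0, 128, 0),
    "orange": (255, 165, 0), "pink": (255, 192, 203), "purple": (128, 0, 128),
    "red": (255, 0, 0), "white": (255, 255, 255), "yellow": (255, 255, 0),
}


def sq_dist(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def assign(pixels, centroids):
    """Group pixels by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for px in pixels:
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(px, centroids[i]))
        clusters[nearest].append(px)
    return clusters


def kmeans(pixels, k, iterations=10, seed=0):
    """Plain k-means; returns the centroids and the size of each cluster."""
    distinct = sorted(set(pixels))  # initialise from distinct colors only
    centroids = random.Random(seed).sample(distinct, min(k, len(distinct)))
    for _ in range(iterations):
        clusters = assign(pixels, centroids)
        centroids = [
            tuple(sum(channel) / len(cluster) for channel in zip(*cluster))
            if cluster else old  # keep the old centroid if its cluster is empty
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, [len(cluster) for cluster in assign(pixels, centroids)]


def dominant_color(pixels, k=3):
    """Name of the palette color closest to the largest cluster's centroid."""
    centroids, sizes = kmeans(pixels, k)
    biggest = centroids[sizes.index(max(sizes))]
    return min(PALETTE, key=lambda name: sq_dist(biggest, PALETTE[name]))


# Toy image: 70 reddish pixels and 30 white pixels, given as RGB tuples.
toy_pixels = [(250, 10, 10)] * 70 + [(255, 255, 255)] * 30
print(dominant_color(toy_pixels, k=2))  # red
```

Sorting the clusters by size instead of taking only the largest one would give the top colors of a logo rather than a single label.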
V Experimental Results
In this section, the training process, quality evaluation and the results obtained from the model will be presented. The model was trained on a Windows 10 machine, with a Tesla K80 GPU, for 400 epochs, which lasted around three full days.
Fig.4 shows the loss graphs per batch for the discriminator (a), generator (b) and classifier (c). It can be observed that neither the discriminator nor the generator has converged, as the loss graphs have a downward trend. This does not, however, imply improper training, as neither WGAN nor WGAN-GP is guaranteed to reach convergence.
The classifier, on the other hand, has converged with a loss value close to 1. Further investigation into the classification loss shows that the classification loss for fake images has converged to zero (loss depicted in Fig.5). This means that the generated images get classified correctly.
V-B Quality evaluation
The network is expected to generate logo-resembling images that have clear color definition. In each epoch, 64 logos will be generated per class. The top three most prominent colors in the logos will be extracted, and the generated pairs and triplets will be analyzed. For each class, the precision (7), recall (8) and F1-score (9) will be measured for the most prominent color in the logo.
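The per-class metrics can be sketched as below, assuming equations (7) to (9) are the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as the harmonic mean of the two. In this sketch the label a logo was conditioned on is treated as the true class and its extracted most prominent color as the prediction. The function name and the toy labels are made up for illustration and are not results from this paper.

```python
def per_class_scores(conditioned, detected, classes):
    """Return {class: (precision, recall, f1)} for paired label lists."""
    scores = {}
    for c in classes:
        pairs = list(zip(conditioned, detected))
        tp = sum(1 for y, p in pairs if y == c and p == c)  # generated as c, detected as c
        fp = sum(1 for y, p in pairs if y != c and p == c)  # other class, detected as c
        fn = sum(1 for y, p in pairs if y == c and p != c)  # generated as c, detected as other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


# Toy example: four generated logos, one pink logo detected as white.
conditioned = ["red", "red", "pink", "pink"]
detected = ["red", "red", "white", "pink"]
for name, values in per_class_scores(conditioned, detected, ["red", "pink", "white"]).items():
    print(name, values)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low.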
The results for the class conditioned generation after 400 epochs are shown in Fig.6. As expected, a slight blurriness can be noticed on the generated logos. This is because the training images are only 32×32 pixels. Despite the blurriness, the generated logos resemble feasible ones. The generated images are mainly dominated by round and square shapes. Even irregular shapes have been generated, for example the heart and the x in the array of white logos. A look-alike of the Google Chrome logo is also present in the midst of the cyan class.
The precision, recall and F1-score of the class conditioned generation after 400 epochs are shown in Table I. The precision scores are relatively high with the exception of white and gray. This is because most predominantly white logos are small logos with a lot of white or transparent space around them. Consequently, many small logos generated in other colors are classified as white. Similarly, because of its neutrality, gray also appears a lot in other classes.
The recall values are lower than the precision overall, with red having the highest recall at 0.92 and pink having the lowest at 0.3. Most of the generated pink-labeled logos belong to the white class.
The F1-score marks black on top with 0.90, and white on the bottom with 0.38. Based on Equation (9), and the fact that white has the lowest precision, its F1-score is also expected to be low.
Fig.7 shows the distribution of the top three generated colors in each class. This distribution is calculated by extracting the top three colors from each logo, and distinguishing the most prominent ones in each category. As expected from the results in Table I, white and gray are present in the top three for many of the classes. Some interesting combinations include the orange class, where the color brown appears, and the yellow class, where the color blue appears. There are three classes which get generated using only shades from their own class (blue, brown, purple). However, if we consider black and white as shades of gray, that number rises to five.
VI Conclusion & Future work
In this paper, it was shown how designers can be assisted in their creative process for logo design by modern artificial intelligence techniques, namely Generative Adversarial Networks. The proposed model can successfully create logos if given a certain keyword, which in our case consisted of the most prominent color in the logo. This class of keywords can be considered descriptive as it provides a property of the logo that is easy for humans to distinguish.
The proposed architecture consists of an Auxiliary Classifier Wasserstein GAN with gradient penalty (AC-WGAN-GP) and generates logos conditioned on twelve colors. This helps designers in their creative process and while brainstorming, making logo design cheaper and less time-consuming. As the generated logos have very low resolution, they can serve as a very rough first draft of a final logo, or as a means of inspiration for the designer. Regarding the results, the classifier converged, and the generated logos meet the requirements of the class they belong to. This is backed up by precision, recall, and comparison with other logos.
Despite the promising results, a limitation of the approach is the blurriness of the generated logos. At the same time, color is not a stand-alone keyword for defining a logo. Higher resolution training images and other labels such as the shape of the logo or the focus of the company would provide valuable input, thus improving the results.
Possible extensions to this work include conditioning on more labels, such as the shape of the logo. However, as logos do not always have clear geometrical shapes, possible issues could include logos with text only or logos with irregular shapes. These issues could be overcome by splitting the dataset into two main groups: logos with an obvious geometrical shape, e.g. quadrilateral, circular or triangular, and logos with a non-regular shape, e.g. text, letters, hearts, etc. Extracting shapes from the first group can be easily done using packages like OpenCV. The non-regular group of shapes may be a bit more challenging to label. A possible path is to use a tool such as tesseract-ocr to perform optical character recognition, extract the text in the logo, and use it as the label. As for the irregularly shaped logos, those could be put in an extra class with a label such as ’others’.
Finally, another possible set of labels for the logos could be gathered for the LLD-logos dataset (which also provides higher quality images) by gathering the most used words to describe the company the logo belongs to. Combining these labels with word embedding models could potentially introduce a semantic meaning to the logos, further boosting the interpretability of the current approach.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
-  J. F. Nash et al., “Equilibrium points in n-person games,” Proceedings of the national academy of sciences, vol. 36, no. 1, pp. 48–49, 1950.
-  Y. Jin, J. Zhang, M. Li, Y. Tian, H. Zhu, and Z. Fang, “Towards the automatic anime characters creation with generative adversarial networks,” unpublished.
-  V. Volz, J. Schrum, J. Liu, S. M. Lucas, A. M. Smith, and S. Risi, “Evolving mario levels in the latent space of a deep convolutional generative adversarial network,” in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2018). New York, NY, USA: ACM, July 2018. [Online]. Available: http://doi.acm.org/10.1145/3205455.3205517
-  T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of gans for improved quality, stability, and variation,” unpublished.
-  A. Sage, E. Agustsson, R. Timofte, and L. Van Gool, “Logo synthesis and manipulation with clustered generative adversarial networks,” unpublished.
-  N. Kodali, J. Abernethy, J. Hays, and Z. Kira, “On convergence and stability of gans,” unpublished.
-  L. Mescheder, A. Geiger, and S. Nowozin, “Which training methods for gans do actually converge?” unpublished.
-  T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training gans,” in Advances in Neural Information Processing Systems, 2016, pp. 2234–2242.
-  A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” unpublished.
-  X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, “Least squares generative adversarial networks,” in 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017, pp. 2813–2821.
-  M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein gan,” unpublished.
-  D. Berthelot, T. Schumm, and L. Metz, “Began: Boundary equilibrium generative adversarial networks,” unpublished.
-  I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of wasserstein gans,” in Advances in Neural Information Processing Systems, 2017, pp. 5769–5779.
-  M. Mirza and S. Osindero, “Conditional generative adversarial nets,” unpublished.
-  A. Odena, C. Olah, and J. Shlens, “Conditional image synthesis with auxiliary classifier gans,” unpublished.
-  H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. Metaxas, “Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks,” in IEEE Int. Conf. Comput. Vision (ICCV), 2017, pp. 5907–5915.
-  H. Wu, S. Zheng, J. Zhang, and K. Huang, “Gp-gan: Towards realistic high-resolution image blending,” unpublished.
-  C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” unpublished.
-  J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, and S. Yan, “Perceptual generative adversarial networks for small object detection,” in IEEE CVPR, 2017.
-  A. Sage, E. Agustsson, R. Timofte, and L. Van Gool, “Lld - large logo dataset - version 0.1,” https://data.vision.ee.ethz.ch/cvl/lld, 2017.
-  T. White, “Sampling generative networks: Notes on a few effective techniques,” unpublished.
-  S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” unpublished.