1 Introduction
Symmetry is a prominent property that has gained attention in face recognition
[16, 1, 5] and other applications [15, 7]. Considering symmetry in the context of image generation with GANs, the core research questions that we ask in this work are: (1) how to control the symmetry of generated images and (2) how to rotate an object, when the training set did not contain the relevant supervision. Question one is answered by proposing two alternative architectures for generative networks. In the first architecture, the first few elements of the input vector serve as the antisymmetric component, while the others serve as the symmetric component. In the second architecture, the generated image is symmetric if the input vector has a palindrome structure, i.e., remains unchanged when flipping the order of its elements. Both architectures are shown to work much better than an approach in which a loss is used in order to control the symmetry property. Fig. 2 illustrates how an input vector is converted to its mirror-inducing counterpart in the symmetric GANs. The second question is answered by a process in which the generator network is adapted in order to generate a specific face. This powerful technique for preserving a given identity in the generated images is a general one and can be applied to many other GAN-based methods.
As far as we know, we are the first to manipulate the structure of the generator in order to enforce properties on the generated image. Due to space constraints, we have placed in the appendices a different structure manipulation, one that creates tileable textured patches. The core idea of our work therefore also applies to completely different tasks.
1.1 Related Work
The task of generating realistic looking images has challenged computer vision for a long time. Recently, a major leap has been made with the development of the Generative Adversarial Network (GAN)
[10]. These architectures employ two networks, G and D, which provide training signals to each other. The network D tries to distinguish between the “fake” images generated by G and the real images provided as a training set. Network G tries to fool D and creates images that look as realistic as possible. The specific architecture that we employ is based in part on the DCGAN [17] method. This architecture uses deconvolution and batch normalization in order to create attractive outputs. Specifically, the input vector in DCGAN is a 100-element vector, whose elements are sampled i.i.d. from a uniform distribution. In our case, we encode symmetry into this vector and, through the use of specific architectures, enforce the generated output to display the required level of symmetry.
2 Symmetric GANs
We present two architecture-based methods, which differ in the manipulation that the input undergoes in order to create a mirrored version; see Fig. 1. We also present a loss-based method to serve as a baseline. The difficulty in training this loss-based method successfully emphasizes the effectiveness of the architecture-based methods.
2.1 The Symmetric Architectures
Our GANs are symmetric end-to-end, including both the generator G and the discriminator D.
Both G and D contain convolutional layers, as well as fully connected ones. In order to maintain symmetry, both these layer types are augmented. The fully connected parts are handled differently in G and in D. We also present two architectures for G, which differ exactly in the way in which the fully connected layers are constructed. The convolutional layers are treated the same in all variants of G and in D; in all of these cases, the same symmetric kernels are introduced. The flow of the symmetric generator is presented in Fig. 3.
The two alternative architectures differ in the very first layer. The first architecture creates symmetric images for inputs in which the first part is zero. The second architecture produces symmetric outputs for inputs that are themselves symmetric, i.e., flip(z) = z, where the flip operator switches the first element with the last one, the second element with the one before the last, and so on.
2.1.1 The Generator of the z’ Architecture
The first generator architecture splits the input vector z into two parts. The first, z', is the antisymmetric part, while the second is the symmetric part. For input vectors that have z' = 0, the output is completely symmetric.
The architecture that ensures the symmetric and antisymmetric properties enforces this structure on the first feature map, and maintains it thereafter by employing symmetric convolutional kernels.
The generation of the first feature map is depicted in Fig. 4. The random input vector z, which contains 100 i.i.d. uniformly distributed elements, is split into two parts. The first part, z', consisting of five elements, is mapped through a fully connected layer (an affine transformation) to a vector of size 5120. This vector is then reshaped into 512 kernels of size 5×2, which are transformed to antisymmetric kernels of size 5×5 by taking the last column to be the negative of the first column and the column before it to be the negative of the second column (the middle column is zero), see Fig. 5(c).
The remaining 95 elements of z are mapped to a vector of size 7680, which is then reshaped to 512 kernels of size 5×3. By performing the symmetric reflection depicted in Fig. 5(b), i.e., copying the first column to the fifth column and the second column to the fourth column, 5×5 kernels are obtained.
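The two reflections above can be sketched in numpy. The 5×2 and 5×3 free-parameter shapes are our inference from the stated 5120-element vector and the 512 kernels (5120 = 512·10), the zero middle column of the antisymmetric kernels is an assumption forced by antisymmetry, and the fully connected weights here are random placeholders, not trained values:

```python
import numpy as np

def antisymmetric_kernels(params):
    """Expand (512, 5, 2) free parameters into left-right antisymmetric
    5x5 kernels: column 5 is minus column 1, column 4 is minus column 2,
    and the middle column is zero (assumed, so that flipping negates)."""
    n, h, _ = params.shape
    k = np.zeros((n, h, 5), dtype=params.dtype)
    k[:, :, :2] = params
    k[:, :, 4] = -params[:, :, 0]
    k[:, :, 3] = -params[:, :, 1]
    return k

def symmetric_kernels(params):
    """Expand (512, 5, 3) free parameters into left-right symmetric
    5x5 kernels by mirroring the first two columns."""
    n, h, _ = params.shape
    k = np.zeros((n, h, 5), dtype=params.dtype)
    k[:, :, :3] = params
    k[:, :, 4] = params[:, :, 0]
    k[:, :, 3] = params[:, :, 1]
    return k

# z' architecture: 5 elements -> 5120 -> 512 antisymmetric kernels;
# 95 elements -> 7680 -> 512 symmetric kernels (weights illustrative).
rng = np.random.default_rng(0)
z = rng.uniform(-1, 1, 100)
W_a = rng.standard_normal((5120, 5))
W_s = rng.standard_normal((7680, 95))
k_anti = antisymmetric_kernels((W_a @ z[:5]).reshape(512, 5, 2))
k_sym = symmetric_kernels((W_s @ z[5:]).reshape(512, 5, 3))
assert np.allclose(k_anti, -k_anti[:, :, ::-1])  # flips to its negation
assert np.allclose(k_sym, k_sym[:, :, ::-1])     # invariant under flip
```

With z' = 0, the antisymmetric kernels vanish and only the symmetric component remains, which is why the output is then fully symmetric.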
The rest of the network follows the DCGAN architecture, except that the 5×5 kernels, of 25 parameters each, are replaced with symmetric kernels that contain only 15 distinct parameters.
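The reason such tied kernels preserve symmetry is that a left-right symmetric kernel commutes with image mirroring, so a symmetric feature map stays symmetric from layer to layer. A minimal numpy check (the 5×3 tying is our reading of the 15-parameter count):

```python
import numpy as np

def tied_symmetric_kernel(free):
    """Build a 5x5 left-right symmetric kernel from its 15 free
    parameters (a 5x3 block; the first two columns are mirrored)."""
    k = np.zeros((5, 5))
    k[:, :3] = free
    k[:, 3] = free[:, 1]
    k[:, 4] = free[:, 0]
    return k

def conv2d_valid(img, k):
    """Plain 'valid' 2D correlation, enough to illustrate the point."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

rng = np.random.default_rng(1)
k = tied_symmetric_kernel(rng.standard_normal((5, 3)))
x = rng.standard_normal((16, 16))
# A left-right symmetric kernel commutes with mirroring:
assert np.allclose(conv2d_valid(x[:, ::-1], k), conv2d_valid(x, k)[:, ::-1])
```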
Fig. 5(c) shows the antisymmetric reflection of the symmetry-breaking part of the first feature map of G.
2.1.2 The Generator of the flip Architecture
In the alternative symmetric architecture, the generator produces symmetric images when, for a given input z, it holds that flip(z) = z. Here, too, the symmetric kernels throughout the layers make sure that a symmetric feature map in one layer leads to a symmetric feature map in the next.
The architecture of the first layer is depicted in Fig. 6. The same matrix is applied, as the weights of a fully connected layer, to both z and flip(z), to obtain two vectors of length 12,800. These are reshaped to 512 feature maps of size 5×5. The feature map that results from the flipped vector is itself flipped left and right (the first column is replaced with the fifth, and the second with the fourth). The matching pairs, one from each branch of the network, are then summed.
Given that flip(z) = z, the two branches produce identical feature maps before the left-right flipping; after the flipping and summation, symmetry is therefore obtained.
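A numpy sketch of this first layer; the weight matrix here is a random placeholder whose shape is inferred from the 12,800-element vectors:

```python
import numpy as np

def flip_first_layer(z, W):
    """First layer of the flip architecture: the same matrix W maps both
    z and flip(z); the map from the flipped branch is mirrored
    left-right, and the two branches are summed."""
    a = (W @ z).reshape(512, 5, 5)
    b = (W @ z[::-1]).reshape(512, 5, 5)
    return a + b[:, :, ::-1]

rng = np.random.default_rng(2)
W = rng.standard_normal((12800, 100))  # illustrative, untrained weights
# A palindromic input (flip(z) == z) yields left-right symmetric maps.
half = rng.uniform(-1, 1, 50)
z_pal = np.concatenate([half, half[::-1]])
maps = flip_first_layer(z_pal, W)
assert np.allclose(maps, maps[:, :, ::-1])
```

For a palindromic input, both branches produce the same map a, and a + mirror(a) is symmetric by construction; the symmetric kernels then preserve this through the remaining layers.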
2.1.3 The Symmetric Discriminator Architecture
A desired property in our framework is that the discriminator D would return the same probability of “real” for an image and its mirrored version. This is not the case in conventional architectures; for example, during training, a conventional discriminator can overfit on a training sample while failing on its mirror image, see Fig. 7. We therefore enforce symmetry by using a specific architecture. First, we replace the discriminator of the DCGAN with one that has left-right symmetric kernels, similar to those we have used in the generators. At the last layer, we then obtain feature maps of size 5×5. Each feature map undergoes a symmetric folding, as depicted in Fig. 5(d). Namely, the first (second) column is replaced by the sum of this column and the fifth (fourth) column. In addition, the third column is doubled. The result can be seen as a vector of size 5×3 per map. Through a fully connected layer, a single output is then obtained. The succession of symmetric kernels and the symmetric folding ensure that mirror images obtain the same score from D. The flow of the symmetric discriminator is shown in Fig. 8.
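The folding step can be sketched as follows; the 5×5 map size is our assumption, consistent with the 5×5 symmetric kernels used elsewhere in the paper:

```python
import numpy as np

def symmetric_fold(fmap):
    """Fold 5x5 maps of shape (C, 5, 5) into 5x3: columns 0 and 1 are
    summed with their mirrors (columns 4 and 3), the middle column is
    doubled, so mirror images fold to the same flattened vector."""
    folded = np.empty(fmap.shape[:-1] + (3,))
    folded[..., 0] = fmap[..., 0] + fmap[..., 4]
    folded[..., 1] = fmap[..., 1] + fmap[..., 3]
    folded[..., 2] = 2 * fmap[..., 2]
    return folded.reshape(fmap.shape[0], -1)

rng = np.random.default_rng(3)
f = rng.standard_normal((512, 5, 5))
# Mirroring the maps leaves the folded vector unchanged, so the final
# fully connected layer assigns mirror images the same score.
assert np.allclose(symmetric_fold(f), symmetric_fold(f[:, :, ::-1]))
```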
2.2 Alternative Loss-Based Method
In addition to the architecture-based approaches, we evaluated baseline methods in which the symmetry is enforced by adding a loss term. Multiple experiments were conducted, each with a different emphasis (weight) on the added loss term. The overall conclusion is that such training is highly unstable and that the model tends to collapse: it either ignores the term (when the weight is low) or produces symmetric images for every input (when the weight is high).
As above, the vector z encodes symmetry either by having its first five elements equal to zero, or by being invariant to element flipping.
The loss term we add is based on pairs of inputs (z1, z2). For the z'-based symmetry encoding, we create pairs in which z2 is identical to z1, except that the first elements are replaced by their negated values. For the flipping-based symmetry encoding, we take z2 = flip(z1). The loss term aggregates λ‖G(z1) − mirror(G(z2))‖² over all such pairs, for a weight λ, where mirror flips the order of the columns of the image.
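A sketch of this baseline term in numpy. The negation of the first elements in the z' encoding and the squared-difference form are our reading of the (partially garbled) text; a mirror-equivariant generator makes the term vanish:

```python
import numpy as np

def symmetry_loss(G, z1, lam=1.0, mode="zprime", n_anti=5):
    """Baseline loss (sketch): for the z' encoding, z2 negates the first
    n_anti elements of z1; for the flip encoding, z2 = flip(z1). The
    term penalizes the difference between G(z1) and mirror(G(z2))."""
    if mode == "zprime":
        z2 = z1.copy()
        z2[:n_anti] = -z2[:n_anti]
    else:
        z2 = z1[::-1].copy()
    diff = G(z1) - G(z2)[:, ::-1]  # mirror(.) flips image columns
    return lam * np.mean(diff ** 2)

# A trivial stand-in generator (constant, hence symmetric) makes the
# term exactly zero under both encodings.
z = np.linspace(-1, 1, 100)
assert symmetry_loss(lambda z: np.ones((8, 8)), z) == 0.0
assert symmetry_loss(lambda z: np.ones((8, 8)), z, mode="flip") == 0.0
```

In training, this term would be added with weight λ to the usual adversarial loss; the instability discussed above comes from balancing the two.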
3 Processing an Existing Image
In order to manipulate an existing image I, one needs to first recover the “underlying” input vector z by employing a reconstruction loss. Employing this vector in order to recover I and rotated versions of it suffers from noticeable reconstruction errors. Most notably, the reconstructed face does not maintain the identity of the person in I, despite their similarity. The same phenomenon is also apparent when observing the approximation results of the most recent generation schemes, such as BEGAN [3] (see Fig. 9).
We therefore propose to fine-tune G using the same loss, while focusing on the recovered z. By doing so, we obtain an image-specific network that is able to generate the input image I, but is no longer as general as G.
We assume a symmetric generator that employs one of the methods above. Having the tuned generator and the recovered z, we are then able to alter the amount of symmetry by modifying z. First, the mirror image is generated from the mirrored input (as above). The spectrum in between the image and the mirror image, which provides a virtual yaw effect, is then spanned by interpolating between z and its mirrored counterpart. The results are shown in Fig. 14(d). As shown in the experiment, the results are superior to those obtained using the unmodified generator G. The process of recovering z from I is illustrated in Algo. 1. First, we iteratively optimize z to minimize the reconstruction term ‖G(z) − I‖². We found it necessary to employ weight decay on z, and also to use a hinge loss to encourage the values to remain in the valid input range.
In the second phase, we allow G to be optimized as well, creating a version that is tailored to the specific sample I. In order to prevent this network from becoming degenerate and too specific to the problem of reconstructing I, we alternated between GAN iterations performed on the data that was used for training G and iterations optimizing z and G to minimize the reconstruction risk.
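The first optimization phase can be sketched on a toy linear "generator" with hand-written gradients; all weights, the [-1, 1] input range, and the hyperparameters here are illustrative assumptions, not the paper's values:

```python
import numpy as np

# Toy stand-in for G: a fixed linear map, so gradients are explicit.
rng = np.random.default_rng(4)
A = rng.standard_normal((64, 100))
G = lambda z, A=A: A @ z
target = G(rng.uniform(-1, 1, 100))  # plays the role of the image I

def hinge_out_of_range(z):
    # hinge penalty pushing z back toward the assumed [-1, 1] range
    return np.maximum(np.abs(z) - 1.0, 0.0)

# Phase 1: optimize z alone, with reconstruction loss + weight decay
# on z + hinge penalty (loss weights are illustrative).
z = np.zeros(100)
lr, wd, hinge_w = 1e-3, 1e-4, 1.0
for _ in range(2000):
    grad = 2 * A.T @ (G(z) - target)                     # d/dz ||G(z)-I||^2
    grad += 2 * wd * z                                   # weight decay on z
    grad += hinge_w * np.sign(z) * (hinge_out_of_range(z) > 0)
    z -= lr * grad

# Phase 2 (not run here) would alternate GAN updates on the original
# training data with joint updates of z and the generator's weights,
# yielding the image-specific network described above.
assert np.mean((G(z) - target) ** 2) < 1e-2
```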
4 Experiments
We present empirical evaluation results for both types of structured GANs studied: z' and flip.
4.1 Applying Symmetric GANs to Face Images
For evaluating the symmetric GAN methods, we have compared the following methods:
- A simple DCGAN with no symmetry properties.
- DCGAN with a soft (low-weight) symmetric loss term.
- DCGAN with a strong (high-weight) symmetric loss term.
- Our symmetric GAN using the z' architecture.
- Our symmetric GAN using the flip architecture.
In each method, we generated a series of nine images while attempting to enforce symmetry on the series, so that the first image is a mirror of the last, the second of the one before the last, and so on. Since the number of images is odd, the middle image is expected to be symmetric to itself. The way of obtaining the symmetry is determined by the method and follows the description in Fig. 2. Sample results are presented in Fig. 10. As can be seen, DCGAN creates high-quality images but, as expected, has no mirroring effect. The soft symmetric loss was not strong enough to enforce symmetry on the generated images; on the other hand, it introduced light deformations caused by the unstable training. DCGAN with a strong symmetric loss was very unstable during training; the generated images were mostly symmetric to themselves and of poor quality. The results of both of our symmetric training techniques were much more convincing and presented the desired effect.
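The nine-image input series can be constructed as follows. This is our reading of the encoding described above (scale z' through zero, or interpolate between z and flip(z)); the function and parameter names are ours:

```python
import numpy as np

def nine_image_series(z, mode="zprime", n_anti=5):
    """Inputs for a nine-image series in which image i and image 8-i
    should come out as mirror pairs, with a self-symmetric middle."""
    series = []
    for alpha in np.linspace(1.0, -1.0, 9):
        zi = z.copy()
        if mode == "zprime":
            zi[:n_anti] = alpha * z[:n_anti]      # z' scaled through zero
        else:
            zi = (1 + alpha) / 2 * z + (1 - alpha) / 2 * z[::-1]
        series.append(zi)
    return np.stack(series)

rng = np.random.default_rng(5)
z = rng.uniform(-1, 1, 100)
s = nine_image_series(z)
# z' encoding: the ends differ only in the sign of the antisymmetric
# part, and the middle input has z' = 0 (hence a self-symmetric image).
assert np.allclose(s[0][:5], -s[8][:5]) and np.allclose(s[4][:5], 0)
assert np.allclose(s[0][5:], s[8][5:])
```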
We then averaged each generated image with the corresponding image on the other side of the series, after the latter was mirrored. If the mirroring effect is exact, no artifacts are expected. The results can be seen in Fig. 11. This visualization clearly demonstrates that both our methods (z' and flip) create mirror images when the input dictates this.
Finally, we measure the MSE between each image and its mirrored version. The results are shown in Fig. 22 in the appendices. As can be seen, for the proposed methods, the MSE drops to nearly 0 for the middle image, indicating that this image is symmetric to itself. The MSE of the other methods is relatively constant and does not drop to zero. The loss-based method with the strong symmetric constraint creates images that are symmetric throughout the entire range of inputs. An even stronger symmetry loss would lead to an MSE close to zero along the entire curve, with an image that is barely recognizable as a face.
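The per-image quantity behind this curve, as we read the text, is the MSE between an image and its own left-right mirror:

```python
import numpy as np

def self_mirror_mse(images):
    """MSE between each image and its own left-right mirror; it is zero
    exactly for self-symmetric images, so a curve over the series dips
    at the middle image when mirroring is enforced correctly."""
    return np.array([np.mean((im - im[:, ::-1]) ** 2) for im in images])

rng = np.random.default_rng(6)
asym = rng.standard_normal((8, 8))
sym = (asym + asym[:, ::-1]) / 2          # symmetrized image
curve = self_mirror_mse([asym, sym, asym[:, ::-1]])
# The symmetric image scores zero; an image and its mirror score equally.
assert curve[1] == 0 and curve[0] > 0 and curve[0] == curve[2]
```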
Manipulating a Face Image
In order to manipulate a given image I, we have recovered the vector z that best matches the image and then manipulated it, as explained in Sec. 3. Fig. 13 depicts the results obtained by recovering this vector and then generating images from manipulated versions of it. There are noticeable artifacts. These artifacts are largely reduced when performing the per-image tuning of G described in Sec. 3, as can be seen in Fig. 14.
4.2 Symmetrical Views of Man-Made Scenes
To show that our method is general, the network was also trained on the LSUN bedrooms dataset [22]. Unlike a face, a bedroom is not symmetric. However, since mirror images of rooms belong to the same class, the method fits this kind of data well.
In our experiments, we focused on the z' architecture. The first experiment shows how a generated image is affected when setting its z' component closer to or further away from zero. The results, depicted in Fig. 12, show that the closer z' is to zero, the more symmetric the generated image is, and that the images associated with z' and with −z' are mirror images. The second experiment is similar, with the single change that we fix z and change one coordinate at a time. This way, we can study the effect of each individual dimension on the output. The results are shown in Fig. 15. It is clear that each dimension controls a different mode of variability. However, the dimensions are not independent, and the same objects emerge when using different coordinates.
5 Conclusions
DCGANs are being used today for a wide range of applications, such as domain transfer networks [19], photo editing [4], denoising, data creation, and more. We demonstrate how, by manipulating the structure of the generator, we can directly control the symmetry of the output. A second application, to tiling, presented in the appendices, shows that a similar structure-modifying design provides a solution for a completely different task.
Acknowledgements
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant ERC CoG 725974).
We are grateful to Barak Itkin for proposing the tiling application.
References
 [1] (1997) Face recognition: the problem of compensating for changes in illumination direction. TPAMI 19(7), pp. 721–732.
 [2] (2017) Learning texture manifolds with the periodic spatial GAN. arXiv preprint arXiv:1705.06566.
 [3] (2017) BEGAN: boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717.
 [4] (2016) Neural photo editing with introspective adversarial networks. arXiv preprint arXiv:1609.07093.
 [5] (1993) Face recognition: features versus templates. TPAMI 15(10), pp. 1042–1052.
 [6] (2016) InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In NIPS.
 [7] (2016) Group equivariant convolutional networks. In ICML, pp. 2990–2999.
 [8] (2015) A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.
 [9] (2015) Texture synthesis using convolutional neural networks. In NIPS.
 [10] (2014) Generative adversarial nets. In NIPS, pp. 2672–2680.
 [11] (2016) Texture synthesis with spatial generative adversarial networks. arXiv preprint arXiv:1611.08207.
 [12] (2016) Perceptual losses for real-time style transfer and super-resolution. In ECCV, pp. 694–711.
 [13] (2017) Fader networks: manipulating images by sliding attributes. arXiv preprint arXiv:1706.00409.
 [14] (2016) Precomputed real-time texture synthesis with markovian generative adversarial networks. In ECCV, pp. 702–716.
 [15] (2016) Linear time symmetric axis search based on palindrome detection. In ICIP, pp. 1799–1803.
 [16] (1994) View-based and modular eigenspaces for face recognition. In CVPR, pp. 84–91.
 [17] (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
 [18] (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
 [19] (2016) Unsupervised cross-domain image generation. arXiv preprint arXiv:1611.02200.
 [20] (2017) Improved texture networks: maximizing quality and diversity in feed-forward stylization and texture synthesis. In CVPR.
 [21] (2016) Texture networks: feed-forward synthesis of textures and stylized images. In ICML.
 [22] (2015) LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365.
Appendix A Generating Tiles
As a second application for manipulating the structure of GANs, we present methods for creating tiles that can be arranged repeatedly in 2D in a variety of predetermined patterns. Just like symmetry, tiling enforces a specific structure on the output. For example, in the case of simple tiling, where tiles are being placed in the same orientation on a grid, the top (left) part of the tile should merge smoothly with the bottom (right) part.
A.1 Previous Work on Texture Synthesis
Gatys et al. [9] demonstrated how to capture the texture properties of a given image and generate new images with the same texture properties. The descriptor is based on a pretrained network, usually VGG [18]: a Gram matrix is extracted from the feature maps of certain layers, and the objective compares the descriptors of the target image to those of the source image. [8, 20] perform style transfer by combining this with a content loss computed on a feature map of a deep layer of VGG. Later on, works such as [12, 21, 14, 11, 2] showed how to train generative networks that are able to generate images whose texture properties were embedded during the training process. Works such as [14, 11, 2] do so as GAN implementations.
In contrast to previous work, we focus on the tiles and not on the textured image. This allows us to develop GANs that create tiles for complex tiling patterns.
A.2 An Architecture for Generating Tiles
The idea of enforcing structure by constructing a suitable architecture, as opposed to modifying just the loss, extends beyond symmetry to the problem of tiling. The input to the tiling problem is an image of some texture. The goal is to synthesize a patch that:
- has texture properties that are indistinguishable from those of patches of the source image; and
- has a periodic structure, such that when the patch is concatenated to itself, there is no texture discontinuity at the boundary.
The most basic tiling pattern repeats each tile, as is, in multiple columns and rows. However, as Fig. 16 illustrates, there are many alternatives in which the tiles might be rotated or placed in more complex arrangements.
As in the symmetry case, we employ a modified version of the generator of the DCGAN method [17] in order to transform a random vector z into a patch image of a fixed size. Unlike the symmetry-encoding case, in which the vector z encodes whether the output image is symmetric or not, for tiling we expect all outputs to maintain the two desired properties, and z is completely random.
Since it is the texture properties of the patch that we are concerned with, we encode the patch using Gram matrices extracted from the generated image, as well as from all layers of D, right after the convolution and before adding the bias, performing batch normalization, and applying ReLU. Specifically, G^l_{ij} = Σ_k F^l_{ik} F^l_{jk}, where F^l denotes the (spatially flattened) feature map of layer l. A virtual layer of ones is added in order to capture first-order statistics, and the size of the Gram matrix computed for layer l is, therefore, (N_l + 1) × (N_l + 1), where N_l is the number of filters in this layer. All Gram matrices are then normalized.
All Gram matrices from all the layers of D are concatenated into one descriptor, which is fed to the fully connected part of D. At each batch, crops out of the source image are used as the “real” samples, and generated samples of the same size are used as the “fake” samples. The architecture of D for capturing textures is depicted in Fig. 18.
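The descriptor computation sketched in numpy. The choice of normalizer (the number of spatial positions) is our assumption, since the paper's constant is not specified here:

```python
import numpy as np

def gram_descriptor(fmaps):
    """Concatenated Gram descriptor: for each feature map of shape
    (C, H, W), append a virtual all-ones channel (capturing first-order
    statistics), flatten spatially, and compute the (C+1)x(C+1) Gram
    matrix, normalized by the number of positions (assumed normalizer)."""
    parts = []
    for f in fmaps:
        C, H, W = f.shape
        F = np.concatenate([f, np.ones((1, H, W))]).reshape(C + 1, -1)
        parts.append((F @ F.T / (H * W)).ravel())
    return np.concatenate(parts)

rng = np.random.default_rng(7)
# Two hypothetical layers: e.g. the 3-channel image and an 8-channel map.
maps = [rng.standard_normal((3, 32, 32)), rng.standard_normal((8, 16, 16))]
d = gram_descriptor(maps)
assert d.shape == (4 * 4 + 9 * 9,)  # (C+1)^2 entries per layer
```

The last row and column of each Gram matrix hold the per-channel means, which is how the ones channel captures first-order statistics.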
We propose two different tiling GAN methods. The first employs cyclic deconvolutions and the second tiles and crops.
Cyclic deconvolution
In order to support horizontal tiling, for example, it is necessary for the leftmost part of the patch to be similar to the rightmost part. This is enforced by replacing the deconvolution blocks of G with cyclic deconvolution blocks, in which the convolution's support extends beyond the edges of the feature map and wraps around to the other end of the map. This is done for all layers of G. Note that for complex tiling patterns, the cyclic deconvolution takes more complex forms (see below).
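The wrap-around idea can be illustrated with a plain convolution (the paper applies it to the generator's deconvolution blocks; this numpy sketch only demonstrates the boundary behavior, via circular padding):

```python
import numpy as np

def cyclic_conv(fmap, k):
    """Convolution whose support wraps around the feature-map edges,
    implemented as circular padding followed by 'valid' correlation."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(fmap, ((ph, ph), (pw, pw)), mode="wrap")
    H, W = fmap.shape
    out = np.zeros_like(fmap)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i+kh, j:j+kw] * k)
    return out

rng = np.random.default_rng(8)
x, k = rng.standard_normal((10, 10)), rng.standard_normal((5, 5))
y = cyclic_conv(x, k)
# Shifting the input on the torus just shifts the output: the map has
# no privileged boundary, which is why tiles meet seamlessly.
assert np.allclose(cyclic_conv(np.roll(x, 3, axis=1), k),
                   np.roll(y, 3, axis=1))
```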
Tile and randomly crop
In this method, it is the discriminator that enforces the tiling property. This is done by taking the generated image, tiling it in the plane, and cropping a patch from the result. This patch is then fed to D. If there are tiling artifacts in the crop, the discriminator will pick up on them. During backpropagation, G is adjusted in a way that reduces the artifacts, and it thereby learns the tiling pattern implicitly. See Fig. 18.
A.3 Tiling Experiments
We first present, in Fig. 19, the results obtained for the simple grid tiling. As can be seen, tiling using tiles generated by the baseline DCGAN leads to noticeable artifacts at the boundaries of the tiles, while either one of the two methods we propose avoids these artifacts.
We further experimented with less conventional tiling approaches. The results are shown in Fig. 20. The proposed methods perform well, except that the cyclic convolution method is not appropriate for the spherical topology, since it requires the conversion of a row to a column and vice versa.
A closer look at the various artifacts can be observed in Fig. 21.
Appendix B MSE Plot for Symmetric GANs
We measure the MSE between each image and its mirrored version. The results are shown in Fig. 22. As can be seen, for the proposed methods, the MSE drops to nearly 0 for the middle image, indicating that those images are symmetric to themselves. The MSE of the other methods is relatively constant and does not drop to zero. The loss-based method with the strong symmetric constraint creates images that are symmetric throughout the entire range of inputs. An even stronger symmetry loss would lead to an MSE close to zero along the entire curve, with an image that is barely recognizable as a face.