SSD-GAN: Measuring the Realness in the Spatial and Spectral Domains

12/10/2020
by   Yuanqi Chen, et al.
Tencent
Peking University

This paper observes that high frequencies are missing in the discriminator of standard GANs, and we reveal that this issue stems from the downsampling layers employed in the network architecture. It leaves the generator without an incentive from the discriminator to learn the high-frequency content of the data, resulting in a significant spectrum discrepancy between generated images and real images. Since the Fourier transform is a bijective mapping, we argue that reducing this spectrum discrepancy would boost the performance of GANs. To this end, we introduce SSD-GAN, an enhancement of GANs that alleviates the spectral information loss in the discriminator. Specifically, we propose to embed a frequency-aware classifier into the discriminator to measure the realness of the input in both the spatial and spectral domains. With the enhanced discriminator, the generator of SSD-GAN is encouraged to learn the high-frequency content of real data and generate exact details. The proposed method is general and can be easily integrated into most existing GAN frameworks without excessive cost. The effectiveness of SSD-GAN is validated on various network architectures, objective functions, and datasets. Code will be available at https://github.com/cyq373/SSD-GAN.


Code Repositories

SSD-GAN

SSD-GAN: Measuring the Realness in the Spatial and Spectral Domains. AAAI2021.



Introduction

Generative Adversarial Networks (GANs) Goodfellow et al. (2014) involve training a generator and a discriminator network in an adversarial manner, such that the generator learns to reproduce the desired data distribution. Despite the remarkable achievements in image generation tasks Isola et al. (2017); Karras et al. (2019), recent works Zhang et al. (2019); Durall et al. (2020); Dzanic and Witherden (2019); Frank et al. (2020) show that GAN-generated images can be efficiently distinguished from real images in the frequency domain, which indicates that existing GANs fail to learn the spectral distributions.

Recent studies Dzanic and Witherden (2019); Frank et al. (2020) show that the frequency spectrum discrepancy mainly exists at high frequencies. Because high-frequency components of images influence the exactness of details, the discrepancy cannot be ignored for generative tasks where details matter. As shown in Fig. 1, when real data contains significant high frequencies, standard GAN might fail to reproduce the desired data distribution. Moreover, since the Fourier transform is a bijective mapping, the frequency spectrum discrepancy between real data and the generated samples also indicates that the data distribution in image space is not well captured. We believe that reducing the spectrum discrepancy would boost the performance of GANs.

Figure 1: Standard GAN (SGAN) fails to learn high frequencies of real data. In this toy example, the real data is a single image with a checkerboard pattern, which contains significant high frequencies. Unlike SGAN, our proposed SSD-GAN alleviates the spectral information loss in the discriminator and can well reproduce the real data. The detailed experimental setup is described in the supplemental materials.

In this paper, we first attempt to explore why there is a spectrum discrepancy between real data and the generated samples. By investigating the downsampling techniques widely used in the discriminator networks, we reveal that both of these downsampling strategies, downsampling with anti-aliasing and downsampling without anti-aliasing, would lead to high frequencies missing in the discriminator. Since the training of GANs is a two-player minimax game, the generator lacks incentives from the discriminator to learn the high-frequency information of the data.

Figure 2: The StyleGAN discriminator cannot distinguish differences in high frequencies. The numbers are estimates of the mean discriminator output $\mathbb{E}[D(x)]$, obtained by averaging over 1,000 samples. Unless we change the spectrum over a large bandwidth, the outputs of the discriminator are roughly the same, which makes StyleGAN fail to reproduce the spectral distribution.

To address the issue of high frequencies missing in the discriminator, we propose SSD-GAN, whose discriminator can measure the realness of the input in both the spatial and spectral domains. To instantiate the idea, we introduce an additional spectral classifier that detects the frequency spectrum discrepancy between real and generated images, and integrate it into the discriminator of GANs. With the enhanced discriminator, the generator of SSD-GAN is encouraged to reduce the frequency spectrum discrepancy and generate realistic images in both the spatial and spectral domains. Since a lightweight spectral classifier can be effective, the proposed method is general and can be easily integrated into most existing GAN frameworks without excessive cost. In the experiments, the effectiveness of the proposed method is validated on various network architectures, objective functions, and datasets.

The contributions of the paper can be summarized as follows:

  • We observe there is an issue of high frequencies missing in the discriminator of GANs and reveal it stems from downsampling layers employed in the network architecture, which results in a significant spectrum discrepancy between generated images and real images.

  • We introduce SSD-GAN, an enhancement of GANs to alleviate spectral information loss in the discriminator. With a discriminator that can measure both the spatial and spectral realness of an input sample, SSD-GAN can better capture the data distribution than standard GANs.

  • We show experimentally that the quality of the generations can be improved by reducing the frequency spectrum discrepancy, which emphasizes the necessity of learning in the frequency domain.

Related Work

Generative Adversarial Networks

Recent rapid advances in Generative Adversarial Networks (GANs) Goodfellow et al. (2014) have greatly benefited the computer vision and image processing community, e.g., image inpainting Yang et al. (2018); Ren et al. (2019), image colorization Isola et al. (2017), image-to-image translation Yu et al. (2019), etc. To enhance the quality of generated samples, PG-GAN Karras et al. (2018) introduces a progressive growing scheme for the training process to increase the resolution of synthesized images. StyleGAN Karras et al. (2019) proposes a style-based generator for finer control over the image synthesis. Other lines of work focus mainly on improving the discriminator of GANs. As details are important for generative tasks, the PatchGAN discriminator Isola et al. (2017) utilizes local discriminator feedback to better capture local structures. SNGAN Miyato et al. (2018) limits the spectral norm of the weight matrices in the discriminator to enforce a Lipschitz constraint. To counter discriminator forgetting and stabilize the training process, SS-GAN Chen et al. (2019) proposes to rotate the image and ask the discriminator to predict the rotation angle. To effectively balance the performance of the generator and discriminator, the variational discriminator bottleneck Peng et al. (2019) constrains information flow in the discriminator. In this paper, after observing the issue of high frequencies missing in the discriminator, we aim to enhance the discriminator's ability in the frequency domain, so that the generator is encouraged to learn the spectral distribution of real data.

Frequency Analysis for CNNs

Even though some GAN-generated images seem flawless to human perception, recent studies Zhang et al. (2019); Dzanic and Witherden (2019); Frank et al. (2020); Durall et al. (2020) find that frequency analysis is effective for image forensics. They also show that existing GAN-based models always fail to reproduce the spectral distribution of real data. AutoGAN Zhang et al. (2019) first identifies that spectral artifacts stem from the upsampling modules included in the GAN pipeline. To compensate for spectral distortions, Durall et al. (2020) propose adding a spectral regularization term to the generator loss. Frank et al. (2020) examine StyleGAN instances using different upsampling techniques and find that bilinear sampling followed by anti-aliasing filters helps to alleviate the problem. In this paper, we investigate another source of spectral distortions, the issue of high frequencies missing in the discriminator.

Apart from frequency analysis for image forensics, researchers have shown that neural networks exhibit a spectral bias Rahaman et al. (2019); Xu et al. (2019): they learn filters with a strong bias towards lower frequencies. Based on this observation, the band-limited convolutional layer Dziedzic et al. (2019) is introduced to control resource usage by constraining the frequency spectra of filters and data, while retaining high performance for classification tasks. However, high-frequency components cannot be ignored for generative tasks where details matter. To guarantee that all the information is kept in the model, MWCNN Liu et al. (2018) utilizes the discrete wavelet transform (DWT) as a downsampling module in the network architecture for image restoration. In contrast, we introduce a spectral classifier to compensate for the high-frequency information loss of the GAN discriminator.

High Frequencies Missing in the Discriminator

In the standard GANs Goodfellow et al. (2014), the adversarial loss for the discriminator is defined as:

$$L_D = -\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] - \mathbb{E}_{x \sim p_g(x)}[\log(1 - D(x))], \tag{1}$$

and $D(x)$ represents the probability that $x$ comes from the real data distribution $p_{data}$ rather than the generator's distribution $p_g$. In other words, $D(x)$ measures the realness of the sample $x$. If $x$ is realistic, then it is realistic in all aspects, such as in the spatial and frequency domains. However, as pointed out in recent works Zhang et al. (2019); Durall et al. (2020); Dzanic and Witherden (2019); Wang et al. (2020b); Frank et al. (2020), existing GAN-based models usually fail to synthesize samples that are realistic in the frequency domain. This suggests that measuring the realness only in the spatial domain is not sufficient.

Why do these GAN-based models fail to reproduce the spectral distributions? We suspect that the generator lacks incentives from the discriminator to learn the high-frequency information of the data, since the training of GANs is a two-player minimax game. To validate the assumption, we first randomly sample 1,000 images from the real dataset. Then we modulate the amplitude of high frequencies in different bands. We apply the inverse Fourier transform to the modified spectra and compute the mean output of the StyleGAN Karras et al. (2019) discriminator on the resulting images. As shown in Fig. 2, unless we change the spectrum over a large bandwidth, the discriminator cannot tell these spectra apart and its outputs are roughly the same. As a result, if the generated images contain some unusual high-frequency components, the discriminator may not identify them as fake, which makes StyleGAN fail to reproduce the spectral distribution.
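The band modulation described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: the function name `modulate_band`, the radial band limits, and the gain are assumptions introduced here, and the discriminator itself is not modelled.

```python
import numpy as np

def modulate_band(image, r_low, r_high, gain):
    """Scale the Fourier amplitude of a 2D image inside the radial band [r_low, r_high).

    Hypothetical helper: band limits are in frequency bins measured from DC.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))    # move DC to the centre
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2)       # distance of each bin from DC
    mask = (radius >= r_low) & (radius < r_high)
    spectrum[mask] *= gain                            # real gain: amplitude scaled, phase kept
    # The mask is symmetric about DC, so the result is real up to rounding error.
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real

rng = np.random.default_rng(0)
image = rng.random((32, 32))
# Double the amplitude of the band between 12 and 16 bins from DC.
modified = modulate_band(image, r_low=12, r_high=16, gain=2.0)
```

Feeding `modified` and `image` to a trained discriminator and comparing the mean outputs over many samples would reproduce the kind of comparison reported in Fig. 2.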

Figure 3: Downsampling causes high frequencies missing. For these two downsampling strategies, the left column is the output of the raw image and its spectrum, while the right column is the output of the modulated image and its spectrum. After Blur + AvgPool, there is no significant difference between the spectra before and after the amplitude modulation. For AvgPool, the results exhibit aliasing (e.g., see shirt in the images), which indicates high-frequency details become invalid.

Why does the discriminator fail to distinguish the high-frequency contents of the images? We believe that there is an issue of high frequencies missing in the architecture of the discriminator. Specifically, this issue stems from the downsampling modules of the discriminator. When downsampling an input image, based on the classical sampling criterion Nyquist (1928), a reasonable approach is to anti-alias by low-pass filtering the image. Some networks adopt this form of blurred downsampling LeCun et al. (1990); Karras et al. (2019). However, the low-pass filter removes the high frequencies of the input images. Since low-pass filtering before downsampling usually results in performance degradation, it is rarely used today. Another line of downsampling methods, such as max-pooling, strided convolution, and average-pooling, abandons the use of the low-pass filter. However, these commonly used downsampling methods ignore the sampling theorem Zhang (2019), and high-frequency components are aliased and become invalid. To sum up, both of these downsampling strategies, downsampling with anti-aliasing and downsampling without anti-aliasing, lead to high frequencies missing in the discriminator.

As shown in Fig. 3, we provide evidence for the above statement. For an input image, we first enhance the amplitude of high frequencies using a sharpening filter. Then we downsample the raw image and the modulated image and compare the results. For Gaussian blur followed by average-pooling, which belongs to downsampling with anti-aliasing, the high-frequency components are attenuated, and there is no significant difference between the spectra before and after the amplitude modulation. For average-pooling, since it does not provide the anti-aliasing capability, the results exhibit aliasing (e.g., see shirt in the images), which indicates high-frequency details become invalid. The more downsampling modules the deep network has, the wider the bandwidth of the lost high frequencies, which indicates that the high frequencies missing issue cannot be ignored, especially for generative tasks where details matter.
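A small numerical illustration of this information loss, under simplifying assumptions: instead of the sharpening filter used for Fig. 3, the sketch below adds a checkerboard perturbation at the highest representable frequency to a random image, and uses a plain 2×2 average-pooling helper (`avg_pool2x2`, a name introduced here). After pooling, the raw and the modulated image become indistinguishable.

```python
import numpy as np

def avg_pool2x2(image):
    """2x2 average pooling with stride 2 on a 2D array with even side lengths."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)
raw = rng.random((16, 16))

# Checkerboard of amplitude 0.2: energy only at the Nyquist frequency.
yy, xx = np.indices(raw.shape)
checker = 0.2 * (-1.0) ** (yy + xx)
modulated = raw + checker

gap_before = np.abs(modulated - raw).max()                           # clearly visible difference
gap_after = np.abs(avg_pool2x2(modulated) - avg_pool2x2(raw)).max()  # vanishes after pooling
print(gap_before, gap_after)
```

Each 2×2 window contains two positive and two negative checkerboard entries, so the perturbation averages to zero and no later layer can recover it.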

Methodology

In this section, we first introduce a spectral classifier to detect the frequency spectrum discrepancy between real and generated images. Then we integrate it into the discriminator of GANs to enhance its ability in the spectral domain, thereby reducing the spectrum discrepancy.

Detecting Frequency Spectrum Discrepancy

To address the issue of high frequencies missing in the discriminator, a straightforward approach is to discriminate in the frequency domain rather than the spatial domain. For a discrete two-dimensional signal $f(m, n)$ representing an image of size $M \times N$, we first compute its discrete Fourier transform,

$$F(k, \ell) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n)\, e^{-2\pi i \frac{km}{M}}\, e^{-2\pi i \frac{\ell n}{N}}, \tag{2}$$

for the spectral coordinates $k = 0, \dots, M-1$ and $\ell = 0, \dots, N-1$. Then we convert it from Cartesian coordinates $k$ and $\ell$ to polar coordinates $r$ and $\theta$ to better represent the frequencies of different bands,

$$F(r, \theta) = F(k, \ell): \quad r = \sqrt{k^2 + \ell^2}, \quad \theta = \operatorname{atan2}(\ell, k). \tag{3}$$

Recent works Durall et al. (2020); Dzanic and Witherden (2019) have shown that a simple 1D representation of the Fourier power spectrum is effective in highlighting the difference between the spectral characteristics of real and deep-network-generated images. Following these works, we get the reduced spectral representation by azimuthally averaging over $\theta$,

$$v(r) = \frac{1}{2\pi} \int_0^{2\pi} |F(r, \theta)|\, d\theta, \tag{4}$$

which represents the mean intensity of the signal with respect to the radial distance $r$. The reduced spectral representation smooths the fluctuations in the spectrum at high frequencies.
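The reduced spectral representation can be approximated on a discrete grid as follows. This is a minimal sketch, not the authors' implementation: the function name `azimuthal_average` is introduced here, and binning frequencies into integer-radius rings is an assumed discretization of the azimuthal average.

```python
import numpy as np

def azimuthal_average(image):
    """1D reduced spectrum: mean Fourier amplitude over rings of integer radius.

    Assumption: each frequency bin is assigned to the ring floor(r),
    with r measured in bins from the DC component.
    """
    h, w = image.shape
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(image)))   # DC at the centre
    yy, xx = np.indices((h, w))
    ring = np.hypot(yy - h // 2, xx - w // 2).astype(int)     # ring index per bin
    ring_sum = np.bincount(ring.ravel(), weights=amplitude.ravel())
    ring_count = np.bincount(ring.ravel())
    return ring_sum / np.maximum(ring_count, 1)               # guard against empty rings

rng = np.random.default_rng(0)
gray = rng.random((16, 16))           # stand-in for the grayscale component of an image
v = azimuthal_average(gray)           # one value per radial distance
print(v.shape)
```

The last entries of `v` correspond to the corners of the 2D spectrum, i.e. the highest frequencies the image can represent.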

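For an input image $x$, we use its grayscale component to get its spectral vector $v$ and denote the process as $v = \phi(x)$. The spectral classification loss is:

$$L_C = -\mathbb{E}_{x \sim p_{data}(x)}[\log C(\phi(x))] - \mathbb{E}_{x \sim p_g(x)}[\log(1 - C(\phi(x)))], \tag{5}$$

where $C(\phi(x))$ measures the spectral realness of $x$, and $p_g$ is the generator $G$'s distribution.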
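As a hedged illustration of the spectral classification loss, the sketch below assumes the usual binary cross-entropy form and a classifier consisting of a single fully connected layer followed by a sigmoid. The names `spectral_classifier` and `spectral_loss`, the parameter shapes, and the `eps` stabilizer are assumptions made for this example.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def spectral_classifier(v, weight, bias):
    """One fully connected layer plus sigmoid: spectral realness in (0, 1)."""
    return sigmoid(v @ weight + bias)

def spectral_loss(v_real, v_fake, weight, bias, eps=1e-12):
    """Binary cross entropy: real spectral vectors labelled 1, generated ones 0."""
    c_real = spectral_classifier(v_real, weight, bias)
    c_fake = spectral_classifier(v_fake, weight, bias)
    return -np.mean(np.log(c_real + eps)) - np.mean(np.log(1.0 - c_fake + eps))

rng = np.random.default_rng(0)
v_real = rng.random((4, 12))          # batch of 4 spectral vectors of length 12
v_fake = rng.random((4, 12))
weight = np.zeros(12)                 # untrained classifier: outputs 0.5 everywhere
loss = spectral_loss(v_real, v_fake, weight, bias=0.0)
print(loss)
```

With zero weights the classifier is uninformative and the loss equals 2 log 2; training would lower it by separating the two sets of spectra.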

Reducing Frequency Spectrum Discrepancy

Figure 4: The enhanced discriminator measures both the spectral realness and the spatial realness.

Since a sample $x$ is realistic if and only if it is realistic in both the spatial and frequency domains, we propose to measure the realness of $x$ with the combination of spatial realness and spectral realness. We integrate the spectral classifier into the discriminator of GANs to encourage the generator to learn the high-frequency content of the data. As shown in Fig. 4, our enhanced discriminator $D^{ss}$ consists of two modules, a vanilla discriminator $D$ which measures the spatial realness, and a spectral classifier $C$. Thus, $D^{ss}$ is a discriminator measuring the realness of the input in both the spatial and spectral domains, and the overall realness of a sample $x$ is represented as:

$$D^{ss}(x) = \lambda D(x) + (1 - \lambda)\, C(\phi(x)), \tag{6}$$

where $\lambda$ is a hyperparameter that controls the relative importance of the spatial realness and the spectral realness. The adversarial loss of the framework can be written as:

$$L_{D^{ss}} = -\mathbb{E}_{x \sim p_{data}(x)}[\log D^{ss}(x)] - \mathbb{E}_{x \sim p_g(x)}[\log(1 - D^{ss}(x))], \tag{7}$$

where $p_g$ represents the generator $G$'s distribution.
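The combination of spatial and spectral realness can be sketched as below. The example assumes a convex combination weighted by a hyperparameter `lam` and the standard binary cross-entropy adversarial loss; the function names and the sample probabilities are made up for illustration.

```python
import numpy as np

def combined_realness(d_spatial, c_spectral, lam=0.5):
    """Overall realness as a convex combination of spatial and spectral realness."""
    return lam * d_spatial + (1.0 - lam) * c_spectral

def adversarial_loss(d_real, c_real, d_fake, c_fake, lam=0.5, eps=1e-12):
    """Cross-entropy loss of the enhanced discriminator on real and generated batches."""
    dss_real = combined_realness(d_real, c_real, lam)
    dss_fake = combined_realness(d_fake, c_fake, lam)
    return -np.mean(np.log(dss_real + eps)) - np.mean(np.log(1.0 - dss_fake + eps))

# Hypothetical outputs of the spatial discriminator and the spectral classifier.
d_real, c_real = np.array([0.9, 0.8]), np.array([0.7, 0.9])
d_fake, c_fake = np.array([0.2, 0.4]), np.array([0.6, 0.1])

loss_ssd = adversarial_loss(d_real, c_real, d_fake, c_fake, lam=0.5)
loss_spatial_only = adversarial_loss(d_real, c_real, d_fake, c_fake, lam=1.0)
print(loss_ssd, loss_spatial_only)
```

Because both inputs lie in [0, 1], the convex combination is again a valid probability, and setting `lam=1.0` recovers the loss of the vanilla discriminator.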

To train our model, we alternately update the spectral classifier $C$, the discriminator $D$, and the generator $G$ with the following gradients:

(8)

Analyzing the Effect of the Spectral Classifier

Since much of the information in the image is discarded in the spectral vector $v$, we found that it cannot provide an effective gradient for the adversarial training, which degrades the performance of the model. We therefore propose that the backpropagation of Eq. 7 does not pass through the spectral classifier $C$, so that $C(\phi(x))$ serves as a spectral modulating factor for the adversarial loss.

Figure 5: FFHQ generations. Generations of SSD-StyleGAN trained on FFHQ at 1024×1024.

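We compare the gradients of standard GAN (SGAN) and the proposed method for further insight. For a generated image $x$, the gradients of the discriminator and generator in non-saturating SGAN are respectively:

$$\nabla_{\theta_D} L_D = \frac{1}{1 - D(x)} \nabla_{\theta_D} D(x), \tag{9}$$

$$\nabla_{\theta_G} L_G = -\frac{1}{D(x)} \nabla_x D(x)\, J_{\theta_G} x, \tag{10}$$

where $\theta_D$ and $\theta_G$ are the parameters of $D$ and $G$, $L_G = -\log D(x)$ is the non-saturating generator loss, and $J_{\theta_G} x$ is the Jacobian of $x$ with respect to $\theta_G$. As for our method, with the overall realness $D^{ss}(x) = \lambda D(x) + (1 - \lambda) C(\phi(x))$ and the spectral classifier $C$ held fixed, the gradients are:

$$\nabla_{\theta_D} L_{D^{ss}} = \frac{\lambda}{1 - D^{ss}(x)} \nabla_{\theta_D} D(x), \tag{11}$$

$$\nabla_{\theta_G} L_G^{ss} = -\frac{\lambda}{D^{ss}(x)} \nabla_x D(x)\, J_{\theta_G} x, \tag{12}$$

where $L_G^{ss} = -\log D^{ss}(x)$. From these gradients, it can be observed that our method performs hard example mining, where "hard" is defined in the frequency domain. For the discriminator, if $C(\phi(x)) > D(x)$, the generated sample $x$ has good spectral characteristics and is a hard example to be classified as fake. For the generator, $x$ is a hard example when $C(\phi(x)) < D(x)$. This means that $x$ has poor spectral realness and needs more attention from the generator. In our model, when $x$ is a hard example in the frequency domain, the gradients of the discriminator and generator are up-weighted, which induces the model to learn the spectral distribution of the real data.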
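The hard-example weighting can be checked numerically. The sketch below assumes that the overall realness is the convex combination `lam * d + (1 - lam) * c`, that the spectral score `c` is treated as a constant during backpropagation, and that the generator uses the non-saturating loss; the function names are introduced here for illustration only.

```python
import numpy as np

def discriminator_weight(d, c, lam):
    """Factor multiplying the gradient of D(x) in the discriminator loss -log(1 - Dss(x))."""
    return lam / (1.0 - (lam * d + (1.0 - lam) * c))

def generator_weight(d, c, lam):
    """Magnitude of the factor multiplying the gradient of D(x) in the generator loss -log(Dss(x))."""
    return lam / (lam * d + (1.0 - lam) * c)

lam, d = 0.5, 0.5
# Discriminator: a fake sample with a high spectral score (c > d) gets a larger weight.
w_d_hard, w_d_easy = discriminator_weight(d, 0.9, lam), discriminator_weight(d, 0.1, lam)
# Generator: a fake sample with a low spectral score (c < d) gets a larger weight.
w_g_hard, w_g_easy = generator_weight(d, 0.1, lam), generator_weight(d, 0.9, lam)
print(w_d_hard, w_d_easy, w_g_hard, w_g_easy)

# Finite-difference check of the discriminator factor with respect to d.
c, h = 0.7, 1e-6
loss = lambda t: -np.log(1.0 - (lam * t + (1.0 - lam) * c))
numeric = (loss(d + h) - loss(d - h)) / (2.0 * h)
```

The weights are largest exactly for the samples described as hard in the text, which is the sense in which the spectral classifier acts as a modulating factor.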

Experiments

Since $\phi(x)$ is easy to compute and $C$ can be a lightweight classifier Durall et al. (2020), the proposed method is general and can be easily integrated into most existing GAN frameworks without excessive cost. In the experiments, we show that across GAN frameworks with different objective functions, network architectures, and datasets, our method reduces the frequency spectrum discrepancy and improves the performance in the spatial domain.

SSD-StyleGAN

Implementation

Based on StyleGAN Karras et al. (2019), we evaluate the effectiveness of our method on the FFHQ Karras et al. (2019) dataset, which consists of 70,000 high-quality images at 1024×1024 resolution. We use the same implementation as StyleGAN. In the discriminator, the activations are blurred before each downsampling layer for anti-aliasing. Training follows the progressive growing scheme Karras et al. (2018), starting from 8×8 and ending at 1024×1024. We apply the non-saturating loss Goodfellow et al. (2014) as our adversarial loss with $R_1$ regularization Mescheder et al. (2018). We train all our models with the Adam optimizer Kingma and Ba (2015). The total training length is 25M images. The hyperparameter $\lambda$ is set to 0.5. The experiments are conducted on 4 Tesla V100 GPUs.

Figure 6: The absolute difference of the average spectra.

Evaluation in the frequency domain

We first validate whether our proposed SSD-StyleGAN can reduce spectral distortions. We estimate the average spectrum by averaging over 5,000 images. Then we plot the absolute difference between the two average spectra. As depicted in Fig. 6, compared to StyleGAN, the frequency spectrum discrepancy between images generated by SSD-StyleGAN and real data is significantly reduced. We also notice that both models have spectral distortions at the corners of the spectra, which represent the extremely high frequencies of images. In practice, due to the image compression algorithms applied to real data, these high-frequency bands contain little information. Therefore, to discourage overfitting, GANs tend not to learn these extremely high frequencies, and these components of the generated images behave like white noise, which has constant power density Dzanic and Witherden (2019).
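The comparison of average spectra can be sketched as follows. This is an assumed formulation for illustration: the spectrum is taken as the Fourier amplitude of grayscale images with DC shifted to the centre, and the helper names `average_spectrum` and `spectrum_discrepancy` are introduced here; the exact normalization used for Fig. 6 is not specified in the text.

```python
import numpy as np

def average_spectrum(images):
    """Mean 2D Fourier amplitude over a batch of grayscale images of shape (B, H, W)."""
    spectra = np.fft.fftshift(np.fft.fft2(images, axes=(-2, -1)), axes=(-2, -1))
    return np.abs(spectra).mean(axis=0)

def spectrum_discrepancy(real_images, fake_images):
    """Absolute difference between the average spectra of two image sets."""
    return np.abs(average_spectrum(real_images) - average_spectrum(fake_images))

rng = np.random.default_rng(0)
real = rng.random((8, 16, 16))        # stand-ins for real images
fake = rng.random((8, 16, 16))        # stand-ins for generated images
gap = spectrum_discrepancy(real, fake)
print(gap.shape, gap.max())
```

In the resulting map, entries near the centre correspond to low frequencies and the corners to the extremely high frequencies discussed above.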

Evaluation in the spatial domain

Table 1 reports the performance in the spatial domain. We adopt Fréchet Inception Distance (FID) Heusel et al. (2017) to evaluate the perceptual quality of generated images, and perceptual path length (PPL) Karras et al. (2019) to measure the degree of disentanglement of representations. Because the feature extractors used in these metrics are neural networks that map from a high-dimensional input space to a low-dimensional space, they also suffer from some degree of high-frequency loss and mainly measure the characteristics in the spatial domain. It is evident that our method performs better than StyleGAN on both these metrics. We attribute the performance improvement to alleviating the high frequencies missing problem in the discriminator. By reducing spectral distortions, it helps to reproduce the spatial distribution of real data, since the Fourier transform is a bijective mapping.

For qualitative evaluation, we utilize the recent embedding algorithm Abdal et al. (2019) to map a given image into the latent space of a pre-trained StyleGAN and then reconstruct it for comparison. We note that the models may memorize images during training and therefore produce good reconstructions. To rule out such overfitting, we propose to compare the interpolations of StyleGAN and SSD-StyleGAN. As shown in Fig. 7, compared to StyleGAN, the results of our proposed method show smoother morphing and better details, which is consistent with the quantitative evaluation. Fig. 5 shows a collection of generations obtained from SSD-StyleGAN.

Estimating the spectral quality of samples

Figure 7: Interpolations in the latent space. For each comparison, the top row represents the reconstructions of the interpolations of StyleGAN, while the bottom row represents the results of SSD-StyleGAN.

In this section, we evaluate the performance of the spectral classifier for estimating the spectral quality of samples. As shown in Fig. 8, for real data and generations of SSD-StyleGAN, we present two images each with high and low spectral quality scores. In general, the samples with high spectral quality scores display a clear portrait. In contrast, the images with low spectral quality scores are often overexposed and lose details, or have some unusual high-frequency components (e.g., see the background and headwear in the right column). This observation shows the effectiveness of the spectral classifier $C$, which is beneficial to the learning process of SSD-StyleGAN.

Figure 8: Spectral quality of real data and generations of SSD-StyleGAN.

Method          FID     Path length (full)   Path length (end)
StyleGAN        4.40    234.0                195.9
SSD-StyleGAN    4.06    229.1                189.7
Table 1: FID scores and perceptual path lengths (PPLs) on FFHQ (lower is better). PPLs are measured in the $\mathcal{W}$ space.

SSD-SNGAN

Implementation

Based on SNGAN Miyato et al. (2018), we evaluate the proposed method on a range of datasets including CIFAR100 Krizhevsky and Hinton (2009), STL10 Coates et al. (2011), and LSUN-bedroom Yu et al. (2015). Since these datasets have different resolutions, we mark them with the resolution: CIFAR100-32, STL10-48, and LSUN-128. We use the same training configurations as Lee and Town (2020). We train all our models with the Adam optimizer Kingma and Ba (2015). The learning rate is set to 0.0002, and the minibatch size is 64. The hyperparameter $\lambda$ is set to 0.5. All models are trained on a single Tesla V100 GPU.

Baseline Models

We compare our method against three baselines, including:

  • SNGAN Miyato et al. (2018) limits the spectral norm of the weight matrices in the discriminator to enforce a Lipschitz constraint. It adopts a ResNet He et al. (2016) backbone and uses average-pooling as the downsampling layer in the discriminator. Different from StyleGAN, it utilizes the hinge version Miyato et al. (2018) of the adversarial loss.

  • SNGAN+REG Durall et al. (2020) adds a spectral regularization loss to the generator loss to penalize the synthesis of samples with abnormal spectra. The term is a binary cross entropy computed between spectral representations.

  • SNGAN+DWT Liu et al. (2018) adopts the discrete wavelet transform (DWT) as the downsampling layer to avoid information loss. We use the 2D Haar wavelet transform to decompose an input into a low-pass representation and high-frequency coefficients in three directions. Specifically, this DWT downsampling layer transforms the input raw images or a group of feature maps with height H, width W and C channels into a tensor of shape (H/2, W/2, 4C).
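To make the DWT baseline concrete, the sketch below implements a single-level 2D Haar transform as a downsampling step in NumPy. It is an illustrative version under stated assumptions, not the code used in the experiments: the function name `haar_dwt_downsample`, the channel-last layout, and the ordering of the four sub-bands are choices made here.

```python
import numpy as np

def haar_dwt_downsample(x):
    """Single-level 2D Haar DWT of an (H, W, C) array, returned as (H/2, W/2, 4C).

    H and W are assumed to be even. The four sub-bands are stacked along
    the channel axis: one low-pass band followed by three detail bands.
    """
    a = x[0::2, 0::2]   # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]   # top-right
    c = x[1::2, 0::2]   # bottom-left
    d = x[1::2, 1::2]   # bottom-right
    low = (a + b + c + d) / 2.0       # low-pass representation
    det_w = (a - b + c - d) / 2.0     # differences between left and right columns
    det_h = (a + b - c - d) / 2.0     # differences between top and bottom rows
    det_d = (a - b - c + d) / 2.0     # diagonal differences
    return np.concatenate([low, det_w, det_h, det_d], axis=-1)

rng = np.random.default_rng(0)
x = rng.random((8, 8, 3))
y = haar_dwt_downsample(x)
print(y.shape)
```

The transform is orthonormal, so the output carries exactly the same energy as the input: nothing is discarded, which is why this layer keeps both the details and the noise of its input.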

Figure 9: Bedroom generations. Generations of SSD-SNGAN+REG trained on LSUN-bedroom at 128×128.

Method           CIFAR100-32   STL10-48   LSUN-128
SNGAN            22.61         39.56      25.87
SNGAN+REG        21.39         38.16      16.95
SNGAN+DWT        25.95         40.46      148.05
SSD-SNGAN        19.28         36.41      15.17
SSD-SNGAN+REG    19.25         35.41      10.61
Table 2: FID scores on CIFAR100-32, STL10-48, and LSUN-128 (lower is better).

Results

Table 2 reports the FID scores on CIFAR100-32, STL10-48, and LSUN-128. SNGAN+REG improves over the baseline SNGAN, indicating that utilizing spectral information is effective. Compared with SNGAN, the scores of SNGAN+DWT are higher (worse), especially on LSUN-128. We conjecture that because the DWT downsampling layer retains all the high-frequency information of the input, which contains both details and noise, it is difficult for the model to learn meaningful semantic representations. Moreover, as pointed out by Wang et al. (2020a), learning high-frequency information may degrade the robustness and generalization of a model. Since LSUN-128 has a higher resolution and contains more high-frequency information, the performance degradation of SNGAN+DWT on this dataset is more dramatic. Our method obtains lower FID scores than SNGAN+REG and SNGAN+DWT, indicating that it is a more effective way to utilize the high-frequency information of images. Notably, SSD-SNGAN+REG achieves the best FID scores on all the datasets. The two techniques have complementary advantages, since they encourage the model to utilize high-frequency information from the generator side and the discriminator side respectively. Fig. 9 shows a collection of generations obtained from SSD-SNGAN+REG on LSUN-128.

Robustness of the Hyperparameter

In Eq. 6, we introduce a hyperparameter $\lambda$ to control the relative importance of spatial realness and spectral realness. Here, we evaluate different hyperparameter settings on CIFAR100-32 to investigate the robustness to $\lambda$. The baseline model is SNGAN, and the other training settings remain the same as in the previous section. Fig. 10 compares FID scores over the course of training for different values of $\lambda$. Note that the case of $\lambda = 1$ is the baseline model. We observe that the proposed approach yields consistent performance improvements and is tolerant to the choice of the hyperparameter $\lambda$.

Figure 10: FID scores over the course of training with different values of the hyperparameter $\lambda$.

Conclusion and Outlook

In this paper, we delve into why existing GANs fail to reproduce the spectral distribution of real data and reveal the issue of high frequencies missing in the discriminator. To alleviate the issue, we introduce SSD-GAN, whose discriminator is enhanced to measure the realness of samples in both the spatial and spectral domains. We provide empirical evidence that the proposed SSD-GAN can reduce frequency spectrum discrepancy, thus achieving performance improvement in the image domain.

Frequency analysis provides a novel perspective for analyzing and understanding GANs. It also opens some avenues for future research. First, although recent GAN-based models achieve good performance under existing metrics, the generated samples of these models can easily be distinguished from real data in the frequency domain. A metric that quantitatively measures the performance of generative models in the frequency domain would benefit the image synthesis community. Moreover, besides the discriminator in GANs, many machine learning tasks involve learning a mapping from a high-dimensional input space to a low-dimensional space. Constructing a general network architecture that learns semantic representations while taking high frequencies into consideration is an interesting and challenging direction for future work.

References

Experimental Setup of the Toy Example

In the toy example, we aim to describe a simple yet prototypical counterexample to show that standard GAN (SGAN) fails to learn the high frequencies of real data. The real data distribution is given by a Dirac distribution concentrated at a single image, which has 16×16 pixels and a checkerboard pattern with significant high-frequency information. We train SGAN and SSD-GAN with the Adam optimizer. The learning rate is set to 0.0002. The models are trained for 10K iterations. For SSD-GAN, the hyperparameter $\lambda$ is 0.5.

The network architectures of SGAN and SSD-GAN are shown in Table 3 and Table 4. The spectral classifier $C$ in SSD-GAN has only one fully connected layer. The following notation is used: N: the number of output channels, K: kernel size, S: stride size, P: padding size, FC: fully connected layer, BN: batch normalization, SN: spectral normalization, Up: upsampling using bilinear interpolation.

Layer                                          Input         Output Shape
FC-(8,1024), Reshape                           (8)           (64,4,4)
ResBlock: CONV-(N64,K3,S1,P1), BN, ReLU, Up    (64,4,4)      (64,8,8)
ResBlock: CONV-(N64,K3,S1,P1), BN, ReLU, Up    (64,8,8)      (64,16,16)
BN, ReLU, CONV-(N1,K3,S1,P1), Tanh             (64,16,16)    (1,16,16)
Table 3: Architecture of the generator $G$.

Layer                                                    Input        Output Shape
ResBlock: CONV-(N128,K3,S1,P1), ReLU, AvgPool-(K2,S2)    (1,16,16)    (128,8,8)
ResBlock: CONV-(N128,K3,S1,P1), ReLU, AvgPool-(K2,S2)    (128,8,8)    (128,4,4)
ResBlock: CONV-(N128,K3,S1,P1), ReLU                     (128,4,4)    (128,4,4)
ReLU, GlobalSumPool                                      (128,4,4)    (128)
FC-(128,1), SN                                           (128)        (1)
Table 4: Architecture of the discriminator $D$.

Additional Qualitative Results

We provide more qualitative results for interpolations in Fig. 11 and generations on multiple datasets in Fig. 12. As shown in Fig. 11, compared to StyleGAN, the results of our proposed method show smoother morphing and better details. Fig. 12 shows more generated samples of our proposed method trained on multiple datasets including LSUN-CAT, CIFAR100 and STL-10.

Figure 11: Interpolations in the latent space. For each comparison, the top row represents the reconstructions of the interpolations of StyleGAN, while the bottom row represents the results of SSD-StyleGAN.
Figure 12: Generations of multiple datasets. Top: Generations of SSD-StyleGAN trained on LSUN-CAT dataset. Bottom-left: Generations of SSD-SNGAN trained on CIFAR100 dataset. Bottom-right: Generations of SSD-SNGAN trained on STL-10 dataset.