
GDCA: GAN-based single image super resolution with Dual discriminators and Channel Attention

by   Thanh Nguyen, et al.

Single Image Super-Resolution (SISR) is a very active research field. This paper addresses SISR with a GAN-based approach that uses dual discriminators and combines them with an attention mechanism. The experimental results show that GDCA can generate sharper and more visually pleasing images than other conventional methods.


1 Introduction

Figure 1: The generator architecture consists of channel attention blocks and local extraction blocks with one long skip connection.

Single Image Super Resolution (SISR) is the problem of reconstructing an accurate high-resolution (HR) image from its low-resolution (LR) counterpart. The reconstructed image is referred to as the super-resolution (SR) image. Recent deep learning approaches produce results of much higher quality than conventional approaches. These results have drawn more researchers to the problem and made deep learning SISR an active research field alongside other canonical topics [nguyen2021pre, 9413446, nguyen2021sample].

CNN-based approaches have achieved significant improvements over conventional methods. Dong et al. [dong2014learning] pioneered this direction with a three-layer CNN (SRCNN) for SISR. VDSR [kim2016accurate] then increased the network depth to 20 layers and outperformed SRCNN… These approaches aim to maximize the Peak Signal-to-Noise Ratio (PSNR) with deeper, more efficient networks. Their drawback is that the resulting SR images are blurry and therefore visually unpleasing.
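PSNR is a decreasing function of the mean squared error (MSE), so a network trained to minimize pixel-wise MSE is in effect trained to maximize PSNR. The short sketch below illustrates that relationship. The function name `psnr` and the 8-bit peak value of 255 are assumptions made for illustration, not details taken from this work.

```python
import math


def psnr(mse, max_value=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error.

    max_value is the largest possible pixel value (255 assumed for 8-bit images).
    """
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


# Lowering the MSE always raises the PSNR.
for mse in (400.0, 100.0, 25.0):
    print(f"MSE={mse:6.1f} -> PSNR={psnr(mse):.2f} dB")
```

Averaging over all plausible HR images minimizes MSE, which is one common explanation for why PSNR-oriented outputs look smooth.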

Recently, GAN-based methods have emerged as an effective way to overcome this blurriness. GANs make it possible to reconstruct SR images with high-frequency details and high perceptual quality. A GAN-based approach usually consists of a generator and a discriminator. The discriminator tries to tell HR images from SR images, whereas the generator tries to fool the discriminator into classifying its generated SR image as an HR image. SRGAN [ledig2017photo] employs an adversarial loss term to improve visual quality. SRFeat [park2018srfeat] uses two discriminators and applies adversarial loss terms in both the image and feature domains, resulting in better perceptual quality.

Although previous approaches achieve high-quality results, they do not take attention into account. In SISR, attention can help the network focus on important regions with rich texture in order to achieve higher performance. In this work, we incorporate attention into previous approaches and demonstrate the effectiveness of the attention mechanism in SISR. Specifically, we propose “GAN-based single image super resolution with Dual discriminators and Channel Attention” (GDCA). GDCA builds on the GAN framework and introduces a new generator with an integrated attention module to boost performance. We also employ two discriminators, an image discriminator and a feature discriminator. With these two discriminators, the network discriminates in both image space and feature space, resulting in better perceptual quality. Our contributions are listed below:

  • We propose a new generator architecture that combines skip connections, channel attention, and batch norm removal, and that is trained with mean absolute error loss rather than the conventional mean square loss.

  • We utilize dual discriminators, an image discriminator and a feature discriminator, and train the network under the GAN framework.

  • We evaluate our network on Root Mean Square Error (RMSE) and the “Perceptual Index” based on Ma + NIQE (PIRM2018). Experimental results show that our GDCA method achieves high performance on several benchmarks, demonstrating the effectiveness of attention in SISR. GDCA also generates much more visually pleasing images. We expect this finding to encourage the use of attention in other networks in order to obtain strong results on SISR.

2 Methodology

In this section, we present our proposed generator architecture and the rationale behind our design. We then describe the loss functions used to train the network efficiently.

Figure 2: The discriminator architecture. The number above a convolution represents the number of filters

2.1 Network architecture

As in other GAN-based architectures, our architecture consists of two parts: a generator and discriminators. The discriminators comprise an image discriminator and a feature discriminator. The generator network is shown in Figure 1. It receives a low-resolution (LR) image and produces the corresponding super-resolution (SR) image. The overall network consists of multiple channel attention (CA) blocks and local extraction (LE) blocks with a weighted long skip connection. First, a 5x5 convolution layer is applied to the input to extract a coarse, low-level feature representation. Second, multiple CA blocks and LE blocks are employed to learn higher-level features with more non-linearity and larger receptive fields. Finally, two sub-pixel convolution layers (SPL), proposed in [shi2016real], up-sample the feature map to produce the target output.

Both discriminators share the same network architecture but receive different inputs. The architecture combines a feed-forward convolutional neural network with a fully connected layer, as shown in Figure 2. The inputs to the image discriminator are the SR and HR images, whereas the inputs to the feature discriminator are the feature maps of the SR and HR images. These feature maps are extracted from the Conv5 layer of a pre-trained VGG19 network. Both discriminators try to classify their input as real or fake. The dual discriminators are the crucial factor of the whole framework: the image discriminator operates in the image domain, whereas the feature discriminator operates in the feature domain. Together, the two discriminators make the network stronger than one with a single conventional discriminator, resulting in better output.
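The text does not spell out the internals of the CA blocks or the sub-pixel layers, so the following is only a minimal NumPy sketch under stated assumptions. The channel attention here follows the common squeeze-and-excitation pattern (global average pooling followed by a small bottleneck with a sigmoid gate), which may differ from the block used in GDCA. The sub-pixel step is the usual channel-to-space rearrangement. The names `channel_attention`, `pixel_shuffle`, `w_down`, and `w_up`, as well as all shapes, are hypothetical.

```python
import numpy as np


def channel_attention(features, w_down, w_up):
    """Rescale each channel of a (C, H, W) feature map by a gate in (0, 1)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = features.mean(axis=(1, 2))            # shape (C,)
    # Excite: bottleneck with ReLU, then a sigmoid gate per channel.
    hidden = np.maximum(w_down @ squeezed, 0.0)      # shape (C // reduction,)
    gate = 1.0 / (1.0 + np.exp(-(w_up @ hidden)))    # shape (C,)
    return features * gate[:, None, None]


def pixel_shuffle(features, scale):
    """Rearrange (C * scale**2, H, W) into (C, H * scale, W * scale)."""
    c_in, h, w = features.shape
    c_out = c_in // (scale * scale)
    x = features.reshape(c_out, scale, scale, h, w)
    x = x.transpose(0, 3, 1, 4, 2)                   # (C, H, scale, W, scale)
    return x.reshape(c_out, h * scale, w * scale)


# Random stand-ins for learned weights and an intermediate feature map.
rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8, 8))
w_down = rng.standard_normal((4, 16)) * 0.1          # reduction ratio 4 (assumed)
w_up = rng.standard_normal((16, 4)) * 0.1

attended = channel_attention(feats, w_down, w_up)
upsampled = pixel_shuffle(attended, scale=2)
print(attended.shape, upsampled.shape)
```

In a real sub-pixel layer a convolution first expands the channel count by `scale**2`; here the rearrangement is applied directly to keep the sketch short.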

2.2 Loss function

There are three loss terms contributing to the total loss: perceptual similarity loss, image GAN loss, and feature GAN loss. Our losses follow the general GAN loss framework and aim to improve the perceptual quality.

In these losses, g(x) is the output of the generator network, d is a discriminator network, x is the input, and y is a sample from the real data distribution.
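In this notation, a generic GAN objective can be written as follows. This is a hedged sketch of the general framework rather than the exact formulation of this work. It assumes the binary cross-entropy form of the adversarial loss, a squared distance in feature space for the perceptual similarity term, and a weighted sum for the total. The symbols \phi (a pre-trained feature extractor), d_i and d_f (image and feature discriminators), and \lambda (a weighting factor) are introduced here for illustration only.

```latex
\begin{aligned}
\mathcal{L}_{d} &= -\mathbb{E}_{y}\left[\log d(y)\right] - \mathbb{E}_{x}\left[\log\left(1 - d(g(x))\right)\right] \\
\mathcal{L}_{\mathrm{adv}}(d) &= -\mathbb{E}_{x}\left[\log d(g(x))\right] \\
\mathcal{L}_{p} &= \mathbb{E}_{x,y}\left[\lVert \phi(y) - \phi(g(x)) \rVert_2^2\right] \\
\mathcal{L}_{g} &= \mathcal{L}_{p} + \lambda\left(\mathcal{L}_{\mathrm{adv}}(d_{i}) + \mathcal{L}_{\mathrm{adv}}(d_{f})\right)
\end{aligned}
```

Each discriminator minimizes \mathcal{L}_{d}, while the generator minimizes \mathcal{L}_{g}. For the feature discriminator d_f, the inputs y and g(x) are replaced by their feature maps \phi(y) and \phi(g(x)).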

3 Experiments and Results

3.1 Quantitative result

We evaluated our final GAN-based generator on the Set 5 test set. The final generator was obtained by training the pre-trained generator with GAN-based losses. Root Mean Square Error (RMSE) and Perceptual Index (PI) are the evaluation metrics. PI measures the perceptual quality of the output image, and RMSE measures its reconstruction quality. The higher the PI, the better the perceptual quality; conversely, the lower the RMSE, the better the reconstruction quality. Detailed results are shown in Table 1.

Metric             Super-SR [dong2014learning]   SRFeat [park2018srfeat]   Ours
Perceptual Index   1.98                          2.25                      2.47
RMSE               15.30                         14.95                     13.95
Table 1: Results of our method compared to Super-SR and SRFeat on the Set 5 evaluation set. Perceptual Index and RMSE are the evaluation metrics.

As Table 1 shows, our generator achieves a lower RMSE than the other state-of-the-art models while obtaining a comparable perceptual index (2.47).
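For reference, RMSE between an HR image and its SR reconstruction can be computed as in the sketch below. The evaluation protocol (color space, border cropping, value range) is not specified here, so the function `rmse` and the tiny 2x2 example images are purely illustrative assumptions.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equally sized images.

    Each image is given as a list of rows of pixel values.
    """
    ref = [p for row in reference for p in row]
    est = [p for row in estimate for p in row]
    if len(ref) != len(est):
        raise ValueError("images must have the same number of pixels")
    squared_error = sum((r - e) ** 2 for r, e in zip(ref, est))
    return math.sqrt(squared_error / len(ref))


# Hypothetical 2x2 grayscale patches standing in for HR and SR images.
hr = [[10, 20], [30, 40]]
sr = [[12, 18], [33, 36]]
print(rmse(hr, sr))
```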

3.2 Qualitative result

Our method produces impressive qualitative results, as shown in Figure 3. Our GAN-based method achieves sharper output, while mean-square-based methods produce blurry results.

(a) Mean Square
(b) GDCA
(c) Ground truth
Figure 3: Qualitative results on the Set 5 test set. GDCA produces sharper results than the mean-square-based method.

4 Conclusion

GDCA addresses SISR by using dual discriminators and incorporating an attention mechanism. The experimental results show that GDCA can generate sharper and more visually pleasing images than other conventional methods.