Defect-GAN: High-Fidelity Defect Synthesis for Automated Defect Inspection

03/28/2021 ∙ by Gongjie Zhang, et al. ∙ Nanyang Technological University

Automated defect inspection is critical for effective and efficient maintenance, repair, and operations in advanced manufacturing. However, automated defect inspection is often constrained by the lack of defect samples, especially when deep neural networks are adopted for this task. This paper presents Defect-GAN, an automated defect synthesis network that generates realistic and diverse defect samples for training accurate and robust defect inspection networks. Defect-GAN learns through defacement and restoration processes, where the defacement generates defects on normal surface images while the restoration removes defects to generate normal images. It employs a novel compositional layer-based architecture for generating realistic defects within various image backgrounds with different textures and appearances. It can also mimic the stochastic variations of defects and offer flexible control over the locations and categories of the generated defects within the image background. Extensive experiments show that Defect-GAN is capable of synthesizing various defects with superior diversity and fidelity. In addition, the synthesized defect samples demonstrate their effectiveness in training better defect inspection networks.


1 Introduction

Figure 1: Mimicking the defacement and restoration processes over the easily collected normal samples, Defect-GAN generates large-scale defect samples with superior fidelity and diversity. The generated defect samples demonstrate great effectiveness in training accurate and robust defect inspection network models.

Automated visual defect inspection aims to automatically detect and recognize various image defects, and is highly demanded in different industrial sectors, such as manufacturing and construction. In manufacturing, it is a key component of maintenance, repair, and operations (MRO) that aims to minimize machinery breakdown and maximize production. It is also important for quality control, spotting anomalies at different stages of the production pipeline. In construction, it is critical to public safety, identifying potential dangers in various infrastructure such as buildings and bridges. Although automated visual defect inspection has been studied for years, it remains a challenging task with a number of open research problems.

One key challenge in automated visual defect inspection lies with the training data, which usually manifests in two ways. First, collecting a large number of labeled defect samples is often expensive and time-consuming. The situation becomes much worse due to the poor reusability and transferability of defect samples, i.e., we often have to re-collect and re-label defect samples when dealing with new defect inspection tasks. Second, collecting defect samples is not just a matter of effort and cost. In many situations, defect samples are simply rare, and the amount available is far from what is required, especially when training deep neural network models. The limited availability of large-scale defect samples has become a bottleneck for the effective and efficient design and development of automated defect inspection systems.

An intuitive way to mitigate the defect-insufficiency issue is to synthesize defect samples. Though Generative Adversarial Networks (GANs) have achieved superior image synthesis in recent years, synthesizing defect samples using GANs still faces several challenges.

First, existing GANs usually require large-scale training data, but large-scale defect samples are not available in many situations. Second, GANs tend to generate simpler structures and patterns by nature [3] and so are not good at synthesizing defects, which often have complex and irregular patterns with large stochastic variations. Third, defect samples with different backgrounds are very difficult to collect, and GANs thus tend to generate defect samples with backgrounds similar to the collected reference samples. As a result, the GAN-synthesized defect samples often have feature representations and distributions similar to those reference samples and offer little help when facing new defect samples on different backgrounds.

Inspired by [37], which collects defect samples by manually damaging the surface of normal work-pieces, we design Defect-GAN, which aims for automated generation of high-fidelity defect samples for training accurate and robust defect inspection networks. Defect-GAN simulates the defacement and restoration processes, which greatly mitigates the defect-insufficiency constraint by leveraging large-scale normal samples that are often readily available. We design novel control mechanisms that enable Defect-GAN to generate different types of defects at different locations of background images flexibly and realistically. We also introduce randomness to the defacement process to capture the stochastic variation of defects, which improves the diversity of the generated defect samples significantly. Additionally, we design a compositional layer-based network architecture that allows for generating defects over different normal samples with minimal change to the normal samples' background appearance. As a result, a model trained with such generated defect samples is more capable of handling new defect samples with widely different backgrounds. Extensive experiments show that Defect-GAN can generate large-scale defect samples with superior fidelity and diversity, as well as effectiveness when applied to training deep defect inspection networks.

The contributions of this work can be summarized in three aspects. First, we design a compositional layer-based network architecture to generate defects from normal samples while preserving the appearance of the normal samples, which improves defect diversity by simulating how defects appear on various normal samples. Second, we propose a Defect-GAN that synthesizes defects by simulating defacement and restoration processes. It offers superior flexibility and control over the category and spatial locations of the generated defects within the image background, achieves great defect diversity by introducing stochastic variations into the generation process, and is capable of generating high-fidelity defects via the defacement and restoration of normal samples. Third, extensive experiments show that the Defect-GAN generated defect samples help to train more accurate defect inspection networks effectively.

Figure 2: Generation pipeline of the proposed Defect-GAN: It adopts an encoder-decoder structure to synthesize defects by mimicking defacement and restoration processes. The Spatial & Categorical Control Map generated from category vectors controls where and what kind of defects to generate within the provided normal sample. The Adaptive Noise Injection introduces stochastic variations into the generated defects to improve their diversity. In addition, Defect-GAN adopts a Layer-Wise Composition strategy that produces defect and repaint foregrounds according to the corresponding spatial distribution maps. This helps preserve the style and appearance of the normal samples and achieve superior realism in defect synthesis.

2 Related Works

Image Synthesis. GANs [13] are a powerful generative model that simultaneously trains a generator to produce realistic fake images and a discriminator to distinguish between real and fake images. Early attempts [13, 46, 2, 23, 4] focus on synthesizing images unconditionally. Recently, more and more works perform image synthesis conditioned on input images, with wide applications including style translation [33, 21, 26, 76, 35, 28], facial expression editing [7, 45, 6, 60, 59], super-resolution [30, 58, 48], image inpainting [66, 67, 44, 63], etc. Another trend is multi-modal image synthesis [24, 25, 19, 8, 77]. However, existing methods fail to generalize well to defect synthesis. Our Defect-GAN is designed to generate defect samples by simulating the defacement and restoration processes and incorporating randomness to mimic the stochastic variations within defects. Besides, inspired by [64, 50, 42, 69], it deems defects as a special foreground and adopts a layer-based architecture to compose defects on normal samples, thus preserving the normal samples' style and appearance and achieving superior synthesis realism and diversity.

Learning From Limited Data. Deep learning based techniques [47, 71, 68] usually require a large amount of annotated training samples, which are not always available. Recent research has proposed many approaches to mitigate the data-insufficiency issue, which can be broadly categorized as few-shot learning and data augmentation.

Few-shot learning [51, 52, 12, 5, 31, 22, 62, 57, 70, 10, 55, 72] refers to learning from extremely limited training samples (e.g., 1 or 3) for an unseen class. However, its performance is quite limited and thus far from practical application. Besides, few-shot learning techniques usually require large amounts of samples from the same domain, which does not lift the data-insufficiency constraint. Data augmentation aims to enrich training datasets in terms of quantity and diversity so that better deep learning models can be trained. Several recent works [1, 56, 61, 40] adopt GANs as data augmentation methods to synthesize realistic training samples. The proposed Defect-GAN also works as a data augmentation method, training better defect inspection networks by synthesizing defect samples with superior diversity and fidelity.

Defect Inspection. Surface defect inspection refers to the process of identifying and localizing surface defects based on machine vision, which is an important task with extensive real-life applications in industrial manufacturing, safety inspection, building construction, etc. Before the deep learning era, traditional methods [39, 54, 29, 75, 53] design hand-crafted feature extractors and heuristic pipelines, which require specialized expertise and are not robust. In the deep learning era, many works [32, 9, 41, 38] adopt Convolutional Neural Network (CNN) based models for defect inspection and achieve remarkable performance.

However, in practical scenarios, the limited number of defect samples has always been a bottleneck. To mitigate this defect-insufficiency issue, [37] manually destroys work-pieces to collect defect samples; [36, 18] further adopt Computer-Aided Design (CAD) to synthesize defect samples. However, such methods can only handle simple cases. The recently proposed SDGAN [40] adopts GANs to perform defect sample synthesis for data augmentation. We also propose to synthesize defect samples with GANs for training better defect inspection networks. By simulating the defacement and restoration processes with a layer-wise composition strategy, our proposed Defect-GAN can generate defect samples with superior realism, diversity, and flexibility. It further provides better transferability by imposing learnt defect patterns on unseen surfaces.

3 Methodology

In this section, we discuss the proposed method in detail. As illustrated in Fig. 1, our proposed method consists of two parts: (1) Defect-GAN for automated synthesis of defect samples, and (2) defect inspection using the synthesized defect samples.

3.1 Defect-GAN for Defect Synthesis

We hypothesize that there is a sufficient amount of normal samples but only a limited number of defect samples, since defects are usually rare and difficult to capture. Based on this hypothesis of data availability, we propose to perform defect synthesis following the paradigm of unpaired image-to-image translation [76, 7], which usually requires less training data and can produce better synthesis fidelity. Our proposed Defect-GAN is based on the intuition that defects do not appear out of thin air, i.e., there is always a defacement process that generates defects over normal samples, and there also exists a restoration process that restores defect samples back to normal samples. By mimicking these defacement and restoration processes, we are able to leverage the large number of normal samples to generate the required defect samples.

The Defect-GAN architecture consists of a generator G and a discriminator D. During the training stage, Defect-GAN performs image translation using G in two cycles: x_n → x̂_d → x̂_n and x_d → x̂_n → x̂_d, where x_n denotes a normal sample, x_d denotes a defect sample, and x̂_n and x̂_d denote the restored normal and defect samples, respectively. Since the two cycles are identical and simultaneously conducted, we only describe the x_n → x̂_d → x̂_n cycle in the following sections for simplicity.

The generator G is illustrated in Fig. 2. It employs an encoder-decoder architecture. The major architecture of G mainly follows the commonly used image-to-image translation networks [21, 76, 7], which first downsample the input image by a stride of 4 and then decode it back to its original size. To improve synthesis realism and diversity for defect generation, we specifically design spatial and categorical control, stochastic variation, and layer-based composition for G. The network architecture of D is the same as that of StarGAN [7], which includes a source branch D_src to distinguish fake samples from real ones using PatchGAN [20] and a classification branch D_cls to predict the categories of generated defects.

Spatial and Categorical Control for Defect Generation. Different types of defects can exist at different locations of normal samples. To provide better attribute (spatial and categorical) control over the generated defects, we feed an attribute controlling map A_d of size H×W×C into G to add a specific kind of defect at a specific location, where each entry of A_d represents the presence of a defect of the corresponding category at the corresponding location, and C denotes the number of defect categories. A_d is imposed into the network via SPADE normalization [43] and is fed into every block in the decoder part of G.
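To make this conditioning mechanism concrete, the following is a minimal PyTorch sketch of a SPADE-style normalization block, assuming instance normalization as the parameter-free base normalization and a small convolutional head that predicts per-pixel modulation from the control map; the module and argument names are illustrative, not taken from the released code.

```python
import torch.nn as nn
import torch.nn.functional as F

class SPADEBlock(nn.Module):
    """Spatially-adaptive normalization [43]: the control map predicts
    per-pixel scale (gamma) and shift (beta) for the normalized features."""
    def __init__(self, feat_channels, map_channels, hidden=128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(map_channels, hidden, kernel_size=3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)

    def forward(self, x, control_map):
        # control_map: (B, C, H, W); resize it to the current feature resolution
        ctrl = F.interpolate(control_map, size=x.shape[2:], mode='nearest')
        h = self.shared(ctrl)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```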

Note that since we only assume image-level annotations are available, during the training stage the attribute controlling map A_d is constant over all locations of the image, i.e., A_d is acquired by spatially repeating the target defect label c_d. This restriction can be lifted during the inference stage, which enables Defect-GAN to add defects at different location(s) in a context-compatible manner.
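As an illustration of this control mechanism, the sketch below builds the attribute controlling map by spatially repeating a target defect label vector; the optional region mask used at inference time is a hypothetical argument showing how spatial control can be imposed, not an interface from the paper.

```python
import torch

def build_control_map(defect_label, height, width, region_mask=None):
    """Repeat a C-dim defect label vector over an H x W grid.

    During training the map is constant over all locations (image-level labels).
    At inference, a binary (H, W) region_mask can restrict where defects appear."""
    c = defect_label.numel()
    control = defect_label.view(c, 1, 1).expand(c, height, width).clone()
    if region_mask is not None:
        control = control * region_mask.unsqueeze(0)
    return control  # shape (C, H, W); add a batch dim before feeding it via SPADE
```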

Stochastic Variation of Defects. Unlike general objects, defects are known to possess complex and irregular patterns with high stochastic variations, which are extremely challenging to model using GANs. To mitigate this issue, we employ an adaptive noise insertion module in each block of the encoder-decoder architecture, which explicitly injects Gaussian noise into the feature maps after each convolutional block. Each noise injection learns an exclusive scalar to adjust the intensity of the injected noise. By explicitly mirroring the stochastic variations within defects, Defect-GAN can generate more realistic defect samples with much higher diversity.
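A minimal PyTorch sketch of such an adaptive noise injection module is given below, assuming a single learned scalar per injection point that scales spatially-varying Gaussian noise shared across channels; the exact placement and parameterization in the released model may differ.

```python
import torch
import torch.nn as nn

class AdaptiveNoiseInjection(nn.Module):
    """Add Gaussian noise after a convolutional block, scaled by a learned scalar."""
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.zeros(1))  # learned noise intensity

    def forward(self, x):
        # one noise value per spatial location, broadcast across channels
        noise = torch.randn(x.size(0), 1, x.size(2), x.size(3), device=x.device)
        return x + self.scale * noise
```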

Layer-Wise Composition. As illustrated in Fig. 2, Defect-GAN also differs from existing image-to-image translation GANs [21, 76, 7, 40] in that we consider the final generation as the composition of two layers. Specifically, in the defacement process, final defect samples are generated by adding a defect foreground layer on top of the provided normal samples. Similarly, in the restoration process, final restored normal samples are generated by adding a repaint foreground layer on top of the defect samples.

The defacement process can be formulated as:

(1) (f_d, m_d) = G(x_n, A_d)
(2) x̂_d = m_d ⊙ f_d + (1 − m_d) ⊙ x_n

where ⊙ denotes spatial-wise multiplication, f_d denotes the generated defect foreground, and m_d denotes the corresponding spatial distribution map of f_d. Similarly, the restoration process can be formulated as:

(3) (f_r, m_r) = G(x̂_d, A_n)
(4) x̂_n = m_r ⊙ f_r + (1 − m_r) ⊙ x̂_d

where x̂_n denotes the restored normal sample without defects, f_r denotes the repaint foreground, m_r denotes its spatial distribution map, and A_n denotes the control map for the normal (defect-free) category.

The intuition behind this layer-wise composition strategy is that defects can be deemed a special kind of foreground composed on the background (normal samples). Similarly, the restoration process that removes defects from the background can also be considered a 'repainting' process that covers the defect areas. Instead of generating synthesized images directly, Defect-GAN separately generates defect foregrounds along with the corresponding spatial distribution maps, and then performs a layer-wise composition to produce the synthesized defect samples.
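In code, the composition of Eqs. 1-4 reduces to an alpha-blend of the generated foreground with the input image, guided by the predicted spatial distribution map; the sketch below uses hypothetical tensor names to illustrate both the defacement and restoration steps.

```python
def compose(background, foreground, spatial_map):
    """Blend a generated foreground onto a background image.

    spatial_map is in [0, 1] and has the same spatial size as the images;
    this implements Eq. 2 (defacement) and Eq. 4 (restoration)."""
    return spatial_map * foreground + (1.0 - spatial_map) * background

# Defacement: add defects onto a normal sample.
#   defect_fg, m_d = G(x_normal, control_map_defect)
#   x_defect = compose(x_normal, defect_fg, m_d)
#
# Restoration: repaint the defect regions back to normal.
#   repaint_fg, m_r = G(x_defect, control_map_normal)
#   x_restored = compose(x_defect, repaint_fg, m_r)
```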

The novel compositional layer-based synthesis can significantly improve defect synthesis in terms of both realism and diversity. This is mainly because, by taking normal samples as the background, our model implicitly focuses on the generation of defects, without considering the generation of backgrounds. This provides our model with more capacity to generate realistic defect samples. Furthermore, defects can potentially exist on various backgrounds. Due to the rarity of defect samples, we can only collect specific defects on a very limited number of backgrounds. As a result, typical image synthesis methods lack defect transferability, i.e., they can only synthesize defect samples under a constrained number of contexts. Our proposed layer-wise composition strategy mitigates this issue, because it sufficiently preserves the identities (appearances, styles, etc.) of backgrounds, which forces the model to simulate how defects would interact with the exact provided backgrounds. This significantly improves defect transferability, meaning our model is capable of generating new defect samples on a wide variety of backgrounds.

Training Objective. To generate visually realistic images, we adopt an adversarial loss to make the generated defect sample x̂_d indistinguishable from real defect samples x_d:

(5) L_adv = E_{x_d}[log D_src(x_d)] + E_{x_n, A_d}[log(1 − D_src(x̂_d))]

Our Defect-GAN aims to generate defects conditioned on the target defect label c_d. To make the generated defects align with the target category, we impose a category classification loss, which consists of two terms: L_cls^r to optimize D by classifying the real defect sample x_d into the corresponding category c_d, and L_cls^f to optimize G to generate defect samples of the target category c_d:

(6) L_cls^r = E_{x_d, c_d}[−log D_cls(c_d | x_d)]
(7) L_cls^f = E_{x_n, c_d}[−log D_cls(c_d | x̂_d)]

Additionally, we impose a reconstruction loss that helps preserve the content of input images as much as possible. We adopt an L1 loss for the reconstruction:

(8) L_rec = E_{x_n, A_d}[‖x_n − x̂_n‖_1]

The layer-wise composition strategy generates spatial distribution maps in both the defacement and restoration processes to guide the final compositions. We further improve the composition by introducing two additional spatial constraints (beyond the spatial distribution maps), namely a spatial distribution cycle-consistency loss and a region constraint loss.

To precisely restore the generated defect samples to normal samples, the repaint spatial distribution map should ideally be the same as the defect spatial distribution map. Thus, we design a spatial distribution cycle-consistency loss between the defect spatial distribution map m_d and the repaint spatial distribution map m_r:

(9) L_sd-cyc = E[‖m_d − m_r‖_1]

Meanwhile, to prevent the defect foreground and the repaint foreground from taking over the whole image area, we introduce a region constraint loss to penalize excessively large defect and repaint distribution maps:

(10) L_region = E[‖m_d‖_1 + ‖m_r‖_1]
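A short PyTorch sketch of these two spatial-map constraints (Eqs. 9-10), assuming both maps take values in [0, 1]:

```python
import torch

def spatial_map_losses(m_defect, m_repaint):
    """Spatial distribution cycle-consistency (Eq. 9) and region constraint (Eq. 10)."""
    l_sd_cyc = torch.mean(torch.abs(m_defect - m_repaint))  # the two maps should agree
    l_region = torch.mean(torch.abs(m_defect)) + torch.mean(torch.abs(m_repaint))  # keep maps compact
    return l_sd_cyc, l_region
```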

The overall training objectives for G and D are:

(11) L_D = −L_adv + λ_cls L_cls^r
(12) L_G = L_adv + λ_cls L_cls^f + λ_rec L_rec + λ_sd-cyc L_sd-cyc + λ_region L_region

where λ_cls, λ_rec, λ_sd-cyc, and λ_region are hyper-parameters that are set empirically.

3.2 Boosting Defect Inspection Performance

Figure 3: We introduce a source classifier (connected to the network backbone through a Gradient Reversal Layer) to explicitly distinguish synthesized from real samples. With this design, the defect inspection network will not undesirably learn to perform this task.

The large amount of defect samples generated by the aforementioned Defect-GAN can be further used to train state-of-the-art visual recognition models for defect inspection. We adopt the most commonly used image recognition models, ResNet [15] and DenseNet [17], to perform defect inspection. The generated defect samples are mixed with the original dataset to train the recognition models.

However, we notice that although Defect-GAN can synthesize realistic defect samples, there still exists a domain gap between the generated samples and the original samples. Naively training a recognition model over the augmented data will undesirably lead the model to learn to distinguish these two domains. We therefore attach an additional source classifier that explicitly distinguishes synthesized samples from real ones, and connect this domain classifier to the network backbone through a Gradient Reversal Layer (GRL) [11], as illustrated in Fig. 3. As a result, there is no distinguishable difference between the features extracted by the backbone for the synthesized samples and the real samples, which ensures all training data are effectively learnt.
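For reference, a gradient reversal layer [11] can be written as a custom autograd function that acts as the identity in the forward pass and negates (and scales) gradients in the backward pass; the sketch below is one common formulation, not the exact code used here.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage (hypothetical names): source_logits = source_classifier(grad_reverse(backbone_features))
```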

4 Experiments

This section presents experiments with our methods. We first evaluate Defect-GAN's defect synthesis performance, and then demonstrate its capacity for boosting defect inspection performance as a data augmentation method.

Dataset. We evaluate Defect-GAN on CODEBRIM [38] (dataset available at https://doi.org/10.5281/zenodo.2620293) – a defect inspection dataset in the context of concrete bridges, which features six mutually non-exclusive classes: crack, spallation, efflorescence, exposed bars, corrosion, and normal. It provides image patches for multi-label classification as well as the full-resolution images from which the patches are cropped. Compared with existing open datasets for defect inspection [49, 65, 34], CODEBRIM is, to the best of our knowledge, the most challenging and complex one, and better reflects practical scenarios.

Methods FID Scores
StackGAN++ [73] 111.1
Conditional StackGAN++ [73] 132.1
StyleGAN v2 [25] 148.2
StyleGAN v2 [25] + DiffAug [74] 142.4
CycleGAN [76] 94.5
StarGAN [7] 295.1
StarGAN [7] + SPADE [43] 103.0
Defect-GAN (Ours) 65.6
Ideal Defect Synthesizer 25.0
Table 1: Quantitative comparison of Defect-GAN with existing image synthesis methods in Fréchet Inception Distance (FID).

4.1 Defect Synthesis

Implementation Details. We use all images from the classification dataset to train Defect-GAN. Besides, we collect an extra 50,000 normal image patches by simply cropping them from the original full-resolution images. All images are resized to a fixed resolution for training. To stabilize the training and generate better images, we replace Eq. 5 with the Wasserstein GAN objective with gradient penalty [2, 14] and perform one generator update every five discriminator updates. We use the Adam optimizer [27] to train Defect-GAN, with the learning rate decayed over the course of training. We set the batch size to 4 and the total number of training iterations to 500,000. The training takes about one day on a single NVIDIA 2080Ti GPU.
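For completeness, the gradient penalty term [14] used with the Wasserstein objective can be computed as in the sketch below, where d_src is assumed to be the discriminator's real/fake scoring branch:

```python
import torch

def gradient_penalty(d_src, real, fake):
    """WGAN-GP: penalize deviation of the gradient norm from 1 at random interpolates."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1.0 - alpha) * fake).requires_grad_(True)
    scores = d_src(interp)
    grads = torch.autograd.grad(outputs=scores.sum(), inputs=interp, create_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return ((grad_norm - 1.0) ** 2).mean()
```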

Evaluation Metric. We adopt the commonly used Fréchet Inception Distance (FID) [16] to evaluate the realism of synthesized defect samples. Lower FID scores indicate better synthesis realism.
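FID compares Gaussians fitted to Inception features of real and synthesized samples; given the feature means and covariances, it can be computed as in this sketch (the function name is ours):

```python
import numpy as np
from scipy import linalg

def fid(mu_real, sigma_real, mu_fake, sigma_fake):
    """||mu_r - mu_f||^2 + Tr(S_r + S_f - 2 (S_r S_f)^(1/2))."""
    diff = mu_real - mu_fake
    covmean, _ = linalg.sqrtm(sigma_real @ sigma_fake, disp=False)
    covmean = np.real(covmean)  # drop tiny imaginary parts from numerical error
    return float(diff @ diff + np.trace(sigma_real + sigma_fake - 2.0 * covmean))
```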

Quantitative Experimental Results. Table 1 shows quantitative results on defect synthesis fidelity, in which the first block includes direct synthesis methods (image synthesis from a randomly sampled latent code) and the second block includes image-to-image translation methods. We also present the FID score of an ideal defect synthesizer in the third block, obtained by randomly separating the real defect samples into two sets and computing the FID score between them. As Table 1 shows, the direct synthesis methods generally perform unsatisfactorily due to the lack of defect training samples as well as their limited capacity to capture the complex and irregular patterns of defects. In comparison, by mimicking the defacement and restoration processes as Defect-GAN does, existing image-to-image translation methods can generate defects with significantly better quality. This is because, with more information as input, such methods are generally more data-efficient; besides, they can utilize the large amount of normal samples during training. Defect-GAN achieves a significantly better FID score still, which demonstrates its superiority in defect synthesis. Interestingly, models with categorical control tend to obtain worse FID scores than models without. We believe introducing additional categorical control can limit a model's synthesis realism. However, even with such a constraint, Defect-GAN still achieves the best performance.

Design Choices (SCC / ANI / LWC / SC) FID Scores
None 295.1
SCC 103.0
SCC + ANI 99.7
SCC + LWC 76.8
SCC + ANI + LWC 69.5
SCC + ANI + LWC + SC 65.6
Table 2: Ablation studies of the proposed Defect-GAN: Our designed Spatial and Categorical Control (SCC), Adaptive Noise Injection (ANI), Layer-Wise Composition (LWC), and additional Spatial Constraints (SC) are complementary and jointly beneficial to the quality of the synthesized defects.

We further demonstrate the effectiveness of our proposed designs in Defect-GAN through the quantitative ablative experiments in Table 2. Without our designed components, Defect-GAN degrades to StarGAN [7] – a widely used multi-domain image-to-image translation model. However, it fails to converge on this task and cannot synthesize any defect-like patterns. By incorporating Spatial and Categorical Control (SCC), it converges and generates defect samples of comparable quality to existing methods. On top of this, the Layer-Wise Composition (LWC) significantly improves the synthesis realism. We believe the reason is twofold: (1) it lifts the defect-insufficiency constraint by allowing the network to fully focus on defect generation; (2) it generates contextually more natural defects. Furthermore, Adaptive Noise Injection (ANI) and the additional Spatial Constraints (SC) for training also boost defect synthesis performance. These proposed components prove to be complementary to each other, enabling Defect-GAN to achieve state-of-the-art defect synthesis quality.

Figure 4: Qualitative comparison of Defect-GAN with state-of-the-art image synthesis methods: Rows 1-2 show direct defect synthesis from random noises by two latest image synthesis methods. Rows 4-5 compare defect generation over Normal Samples in Row 3 (used in network training) by StarGAN with SPADE and our Defect-GAN, while Rows 7-8 compare defect generation over Unseen Normal Samples in Row 6 (not used in network training) by StarGAN with SPADE and our Defect-GAN.
Figure 5: Illustration of categorical control in defect generation by Defect-GAN: For each normal sample in Row 1, Rows 2-3 and 4-5 show the generated defect samples conditioned on a single and multiple target categories, respectively.
Figure 6: Illustration of spatial control in defect generation by Defect-GAN: Each row shows defect samples generated with different normal samples but the same spatial control, while each column shows defect samples generated with the same normal sample but different spatial controls.

Qualitative Experimental Results. Fig. 4 shows qualitative results of Defect-GAN and comparisons with other synthesis methods. Rows 1-2 show the synthesis by state-of-the-art direct synthesis methods: StackGAN++ [73] and StyleGAN v2 [25] with DiffAug [74]. We can see that many generated samples do not contain clear defects, and some samples are not visually natural. This verifies the aforementioned limitation of GANs for defect synthesis. For image-to-image translation methods, we choose StarGAN [7] with SPADE [43] as the competing method since it offers categorical control like Defect-GAN; other methods such as CycleGAN [76] and SDGAN [40] produce visually similar results. As shown in Rows 4-5, both StarGAN w/ SPADE and Defect-GAN can produce visually realistic and diverse defect samples conditioned on normal samples. Defect samples by StarGAN w/ SPADE look comparable to those of Defect-GAN, except that StarGAN w/ SPADE tends to alter the background identity, while Defect-GAN preserves the appearance and style of the normal samples thanks to the layer-wise composition strategy. On the other hand, StarGAN w/ SPADE completely fails to transfer the learnt defect patterns to novel backgrounds that are not seen during training, while Defect-GAN shows superb defect transferability and synthesis realism, as shown in Rows 7-8. This property is essential for introducing new information into the training data.

In addition, we show Defect-GAN's categorical control in defect generation in Fig. 5, where different types of defects can be generated conditioned on the same normal image. Fig. 6 also shows Defect-GAN's spatial control in defect generation, where red boxes denote the intended places to generate defects. Defect-GAN can generate defects at specific locations while keeping the context natural.

4.2 Defect Inspection

Implementation Details. We use the training set and an additional 50,000 normal images to train Defect-GAN. Then, Defect-GAN expands the training samples by synthesizing 50,000 defect samples. The generated defect samples are mixed with the original training data to train the defect inspection networks, with the extra restored normal samples also included to avoid data imbalance. All images are resized to a fixed resolution, and we adopt SGD to train the network until convergence. We use the validation set to select the best performing model and report the performance on the test set.
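Mixing the synthesized and real samples can be done with a simple dataset concatenation; the sketch below uses placeholder tensors and a hypothetical batch size and image resolution, and assumes each item also carries a real/synthetic source flag consumed by the GRL-attached source classifier described in Sec. 3.2.

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

# Placeholder tensors standing in for CODEBRIM patches and Defect-GAN samples:
# images (N, 3, H, W), multi-label targets (N, 6), and a real/synthetic flag (N,).
real_set = TensorDataset(torch.rand(100, 3, 224, 224), torch.zeros(100, 6), torch.zeros(100))
synth_set = TensorDataset(torch.rand(100, 3, 224, 224), torch.zeros(100, 6), torch.ones(100))

augmented_set = ConcatDataset([real_set, synth_set])  # mix real and synthesized samples
train_loader = DataLoader(augmented_set, batch_size=32, shuffle=True, num_workers=4)
```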

Quantitative Experimental Results. As CODEBRIM features a multi-label classification task, we can only adopt methods with categorical control to expand the training samples. Results for defect inspection are shown in Table 3. The first row of each block shows defect inspection performance with only the original training data, and the remaining three rows of each block present performance with the original training data plus the augmented samples generated by different synthesis methods. For a fair comparison, 50,000 synthesized defect samples are used for all synthesis methods. As the results show, the synthesized defect samples from Conditional StackGAN++ [73] greatly degrade the defect inspection performance. This is because StackGAN++ cannot generate realistic defect samples due to its limited capacity in defect modeling, and its generated defect samples are harmful to network training. On the other hand, samples generated by StarGAN [7] + SPADE [43] slightly boost the inspection performance, and our proposed Defect-GAN further improves the accuracy of the trained defect inspection networks significantly. Although both methods can generate defect samples with good visual realism, our proposed Defect-GAN is capable of simulating the learnt defects on backgrounds that are not seen during training. This makes the Defect-GAN generated samples much more diverse, thus introducing new information into the training data and significantly improving the performance of the trained models. The results also demonstrate the superiority of Defect-GAN for defect synthesis in terms of fidelity, diversity, and transferability.

Networks Augmentation Methods Accuracy(%)
ResNet34 [15] None 70.25
Conditional StackGAN++[73] 62.59
StarGAN[7]+SPADE[43] 71.90
Defect-GAN (Ours) 75.48
DenseNet121 [17] None 70.77
Conditional StackGAN++[73] 58.68
StarGAN[7]+SPADE[43] 72.61
Defect-GAN (Ours) 75.79
Table 3: Quantitative experimental results for defect inspection.

5 Conclusion

This paper presents a novel Defect-GAN for defect sample generation by mimicking the defacement and restoration processes. It can capture the stochastic variations within defects and offers flexible control over the locations and categories of the generated defects. Furthermore, with a novel compositional layer-based architecture, it is able to generate defects while preserving the style and appearance of the provided backgrounds. The proposed Defect-GAN is capable of generating defect samples with superior fidelity and diversity, which can further significantly boost the performance of defect inspection networks.

Acknowledgments: This work was conducted within the Delta-NTU Corporate Lab for Cyber-Physical Systems with funding support from Delta Electronics Inc. and the National Research Foundation (NRF) Singapore under the Corp Lab @ University Scheme (Project No.: DELTA-NTU CORP-SMA-RP15).

References

  • [1] A. Antoniou, A. Storkey, and H. Edwards (2017) Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340. Cited by: §2.
  • [2] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein generative adversarial networks. In ICML, pp. 214–223. Cited by: §2, §4.1.
  • [3] D. Bau, J. Zhu, J. Wulff, W. Peebles, H. Strobelt, B. Zhou, and A. Torralba (2019) Seeing what a gan cannot generate. In ICCV, pp. 4502–4511. Cited by: §1.
  • [4] A. Brock, J. Donahue, and K. Simonyan (2019) Large scale GAN training for high fidelity natural image synthesis. In ICLR, Cited by: §2.
  • [5] W. Chen, Y. Liu, Z. Kira, Y. Wang, and J. Huang (2019) A closer look at few-shot classification. In ICLR, Cited by: §2.
  • [6] Y. Chen, X. Shen, Z. Lin, X. Lu, I. Pao, J. Jia, et al. (2019) Semantic component decomposition for face attribute manipulation. In CVPR, pp. 9859–9867. Cited by: §2.
  • [7] Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim, and J. Choo (2018) StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. In CVPR, pp. 8789–8797. Cited by: §2, §3.1, §3.1, §3.1, §4.1, §4.1, §4.2, Table 1, Table 3.
  • [8] Y. Choi, Y. Uh, J. Yoo, and J. Ha (2020) StarGAN v2: diverse image synthesis for multiple domains. In CVPR, pp. 8188–8197. Cited by: §2.
  • [9] S. Faghih-Roohi, S. Hajizadeh, A. Núñez, R. Babuska, and B. De Schutter (2016) Deep convolutional neural networks for detection of rail surface defects. In International joint conference on neural networks (IJCNN), pp. 2584–2589. Cited by: §2.
  • [10] Q. Fan, W. Zhuo, C. Tang, and Y. Tai (2020) Few-shot object detection with attention-rpn and multi-relation detector. In CVPR, Cited by: §2.
  • [11] Y. Ganin and V. Lempitsky (2015) Unsupervised domain adaptation by backpropagation. In ICML, pp. 1180–1189. Cited by: §3.2.
  • [12] S. Gidaris and N. Komodakis (2018) Dynamic few-shot visual learning without forgetting. In CVPR 2018, pp. 4367–4375. Cited by: §2.
  • [13] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In NeurIPS, pp. 2672–2680. Cited by: §2.
  • [14] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of wasserstein GANs. In NeurIPS, pp. 5767–5777. Cited by: §4.1.
  • [15] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In CVPR, pp. 770–778. Cited by: §3.2, Table 3.
  • [16] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter (2017) GANs trained by a two time-scale update rule converge to a local nash equilibrium. In NeurIPS, pp. 6626–6637. Cited by: §4.1.
  • [17] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In CVPR, pp. 4700–4708. Cited by: §3.2, Table 3.
  • [18] Q. Huang, Y. Wu, J. Baruch, P. Jiang, and Y. Peng (2009) A template model for defect simulation for evaluating nondestructive testing in x-radiography. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 39 (2), pp. 466–475. Cited by: §2.
  • [19] X. Huang, M. Liu, S. Belongie, and J. Kautz (2018) Multimodal unsupervised image-to-image translation. In ECCV, Cited by: §2.
  • [20] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In CVPR, pp. 1125–1134. Cited by: §3.1.
  • [21] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In CVPR, pp. 1125–1134. Cited by: §2, §3.1, §3.1.
  • [22] B. Kang, Z. Liu, X. Wang, F. Yu, J. Feng, and T. Darrell (2019) Few-shot object detection via feature reweighting. In ICCV, pp. 8420–8429. Cited by: §2.
  • [23] T. Karras, T. Aila, S. Laine, and J. Lehtinen (2018) Progressive growing of gans for improved quality, stability, and variation. In ICLR, Cited by: §2.
  • [24] T. Karras, S. Laine, and T. Aila (2019) A style-based generator architecture for generative adversarial networks. In CVPR, pp. 4401–4410. Cited by: §2.
  • [25] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila (2020) Analyzing and improving the image quality of StyleGAN. In CVPR, pp. 8110–8119. Cited by: §2, §4.1, Table 1.
  • [26] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim (2017) Learning to discover cross-domain relations with generative adversarial networks. arXiv preprint arXiv:1703.05192. Cited by: §2.
  • [27] D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In ICLR, Cited by: §4.1.
  • [28] A. Koksal and S. Lu (2020) RF-gan: a light and reconfigurable network for unpaired image-to-image translation. In ACCV, Cited by: §2.
  • [29] C. J. Kuo, C. M. Hsu, Z. Liu, and H. Wu (2014) Automatic inspection system of led chip using two-stages back-propagation neural network. Journal of Intelligent Manufacturing 25 (6), pp. 1235–1243. Cited by: §2.
  • [30] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. (2017) Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, pp. 4681–4690. Cited by: §2.
  • [31] K. Lee, S. Maji, A. Ravichandran, and S. Soatto (2019) Meta-learning with differentiable convex optimization. In CVPR, Cited by: §2.
  • [32] Y. Li, W. Zhao, and J. Pan (2016) Deformable patterned fabric defect detection with fisher criterion-based deep learning. IEEE Transactions on Automation Science and Engineering 14 (2), pp. 1256–1264. Cited by: §2.
  • [33] M. Liu, T. Breuel, and J. Kautz (2017) Unsupervised image-to-image translation networks. In NeurIPS, pp. 700–708. Cited by: §2.
  • [34] M. Maguire, S. Dorafshan, and R. J. Thomas (2018) SDNET2018: a concrete crack image dataset for machine learning applications. Cited by: §4.
  • [35] Y. A. Mejjati, C. Richardt, J. Tompkin, D. Cosker, and K. I. Kim (2018) Unsupervised attention-guided image-to-image translation. In NeurIPS, pp. 3693–3703. Cited by: §2.
  • [36] D. Mery, D. Hahn, and N. Hitschfeld (2005) Simulation of defects in aluminium castings using cad models of flaws and real x-ray images. Insight-Non-Destructive Testing and Condition Monitoring 47 (10), pp. 618–624. Cited by: §2.
  • [37] D. Mery and D. Filbert (2002) Automated flaw detection in aluminum castings based on the tracking of potential defects in a radioscopic image sequence. IEEE Transactions on Robotics and Automation 18 (6), pp. 890–901. Cited by: §1, §2.
  • [38] M. Mundt, S. Majumder, S. Murali, P. Panetsos, and V. Ramesh (2019) Meta-learning convolutional neural architectures for multi-target concrete defect classification with the COncrete DEfect BRidge IMage dataset. In CVPR, pp. 11196–11205. Cited by: §2, §4.
  • [39] H. Y. Ngan, G. K. Pang, and N. H. Yung (2011) Automated fabric defect detection—a review. Image and vision computing 29 (7), pp. 442–458. Cited by: §2.
  • [40] S. Niu, B. Li, X. Wang, and H. Lin (2020) Defect image sample generation with GAN for improving defect recognition. IEEE Transactions on Automation Science and Engineering. Cited by: §2, §2, §3.1, §4.1.
  • [41] S. Niu, H. Lin, T. Niu, B. Li, and X. Wang (2019) DefectGAN: weakly-supervised defect detection using generative adversarial network. In CASE, pp. 127–132. Cited by: §2.
  • [42] D. P. Papadopoulos, Y. Tamaazousti, F. Ofli, I. Weber, and A. Torralba (2019) How to make a pizza: learning a compositional layer-based gan model. In CVPR, pp. 8002–8011. Cited by: §2.
  • [43] T. Park, M. Liu, T. Wang, and J. Zhu (2019) Semantic image synthesis with spatially-adaptive normalization. In CVPR, pp. 2337–2346. Cited by: §3.1, §4.1, §4.2, Table 1, Table 3.
  • [44] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros (2016) Context encoders: feature learning by inpainting. In CVPR, pp. 2536–2544. Cited by: §2.
  • [45] A. Pumarola, A. Agudo, A. M. Martinez, A. Sanfeliu, and F. Moreno-Noguer (2018) GANimation: anatomically-aware facial animation from a single image. In ECCV, pp. 818–833. Cited by: §2.
  • [46] A. Radford, L. Metz, and S. Chintala (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. Cited by: §2.
  • [47] S. Ren, K. He, R. Girshick, and J. Sun (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In NeurIPS, pp. 91–99. Cited by: §2.
  • [48] M. S. Sajjadi, B. Scholkopf, and M. Hirsch (2017) Enhancenet: single image super-resolution through automated texture synthesis. In ICCV, pp. 4491–4500. Cited by: §2.
  • [49] Y. Shi, L. Cui, Z. Qi, F. Meng, and Z. Chen (2016) Automatic road crack detection using random structured forests. IEEE Transactions on Intelligent Transportation Systems 17 (12), pp. 3434–3445. Cited by: §4.
  • [50] K. K. Singh, U. Ojha, and Y. J. Lee (2019) FineGAN: unsupervised hierarchical disentanglement for fine-grained object generation and discovery. In CVPR, pp. 6490–6499. Cited by: §2.
  • [51] J. Snell, K. Swersky, and R. Zemel (2017) Prototypical networks for few-shot learning. In NeurIPS 2017, pp. 4077–4087. Cited by: §2.
  • [52] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales (2018) Learning to compare: relation network for few-shot learning. In CVPR 2018, pp. 1199–1208. Cited by: §2.
  • [53] A. Tolba and H. M. Raafat (2015) Multiscale image quality measures for defect detection in thin films. The International Journal of Advanced Manufacturing Technology 79 (1-4), pp. 113–122. Cited by: §2.
  • [54] D. Tsai and C. Hsieh (1999) Automated surface inspection for directional textures. Image and Vision computing 18 (1), pp. 49–62. Cited by: §2.
  • [55] X. Wang, T. E. Huang, T. Darrell, J. E. Gonzalez, and F. Yu (2020) Frustratingly simple few-shot object detection. Cited by: §2.
  • [56] Y. Wang, R. Girshick, M. Hebert, and B. Hariharan (2018) Low-shot learning from imaginary data. In CVPR, pp. 7278–7286. Cited by: §2.
  • [57] Y. Wang, D. Ramanan, and M. Hebert (2019) Meta-learning to detect rare objects. ICCV, pp. 9924–9933. Cited by: §2.
  • [58] Z. Wang, M. Ye, F. Yang, X. Bai, and S. Satoh (2018) Cascaded sr-gan for scale-adaptive low resolution person re-identification.. In IJCAI, pp. 3891–3897. Cited by: §2.
  • [59] R. Wu and S. Lu (2020) LEED: label-free expression editing via disentanglement. In ECCV, pp. 781–798. Cited by: §2.
  • [60] R. Wu, G. Zhang, S. Lu, and T. Chen (2020) Cascade EF-GAN: progressive facial expression editing with local focuses. In CVPR, pp. 5021–5030. Cited by: §2.
  • [61] Y. Xian, S. Sharma, B. Schiele, and Z. Akata (2019) f-VAEGAN-D2: a feature generating framework for any-shot learning. In CVPR, pp. 10275–10284. Cited by: §2.
  • [62] X. Yan, Z. Chen, A. Xu, X. Wang, X. Liang, and L. Lin (2019) Meta R-CNN: towards general solver for instance-level low-shot learning. In ICCV, Cited by: §2.
  • [63] Z. Yan, X. Li, M. Li, W. Zuo, and S. Shan (2018) Shift-net: image inpainting via deep feature rearrangement. In ECCV, pp. 1–17. Cited by: §2.
  • [64] J. Yang, A. Kannan, D. Batra, and D. Parikh (2017) LR-GAN: layered recursive generative adversarial networks for image generation. In ICLR, Cited by: §2.
  • [65] L. Yang, B. Li, W. Li, Z. Liu, G. Yang, and J. Xiao (2017) Deep concrete inspection using unmanned aerial vehicle towards CSSC database. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Cited by: §4.
  • [66] R. A. Yeh, C. Chen, T. Yian Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do (2017) Semantic image inpainting with deep generative models. In CVPR, pp. 5485–5493. Cited by: §2.
  • [67] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang (2018) Generative image inpainting with contextual attention. In CVPR, pp. 5505–5514. Cited by: §2.
  • [68] F. Zhan, S. Lu, and C. Xue (2018) Verisimilar image synthesis for accurate detection and recognition of texts in scenes. Cited by: §2.
  • [69] F. Zhan, H. Zhu, and S. Lu (2019) Spatial fusion GAN for image synthesis. In CVPR, pp. 3653–3662. Cited by: §2.
  • [70] G. Zhang, K. Cui, R. Wu, S. Lu, and Y. Tian (2021) PNPDet: efficient few-shot detection without forgetting via plug-and-play sub-networks. In WACV, Cited by: §2.
  • [71] G. Zhang, S. Lu, and W. Zhang (2019) CAD-Net: a context-aware detection network for objects in remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing 57 (12), pp. 10015–10024. Cited by: §2.
  • [72] G. Zhang, Z. Luo, K. Cui, and S. Lu (2021) Meta-DETR: few-shot object detection via unified image-level meta-learning. ArXiv abs/2103.11731. Cited by: §2.
  • [73] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Metaxas (2018) StackGAN++: realistic image synthesis with stacked generative adversarial networks. IEEE transactions on pattern analysis and machine intelligence 41 (8), pp. 1947–1962. Cited by: §4.1, §4.2, Table 1, Table 3.
  • [74] S. Zhao, Z. Liu, J. Lin, J. Zhu, and S. Han (2020) Differentiable augmentation for data-efficient gan training. arXiv preprint arXiv:2006.10738. Cited by: §4.1, Table 1.
  • [75] W. Zhou, M. Fei, H. Zhou, and K. Li (2014) A sparse representation based fast detection method for surface defect detection of bottle caps. Neurocomputing 123, pp. 406–414. Cited by: §2.
  • [76] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, pp. 2223–2232. Cited by: §2, §3.1, §3.1, §3.1, §4.1, Table 1.
  • [77] Z. Zhu, Z. Xu, A. You, and X. Bai (2020) Semantically multi-modal image synthesis. In CVPR, pp. 5467–5476. Cited by: §2.