Imagery-based small defection segmentation (DS) plays an important role in modern assembly lines. It is attractive for its sensory convenience, especially given the prevalence of high-resolution camera equipment. Small defections can be either of known types or random objects, which gives rise to two fashions of DS. To deal with the former, vision-based classifiers and generators [akcay2018ganomaly, andrews2016transfer, napoletano2018anomaly, schlegl2017unsupervised] have emerged, but they fail on unseen random objects. To recognize the latter, defected and defection-free templates are compared [ding2019tdd, gaidhane2018efficient, moganti1995automatic, wang2016framework], which still leaves a great many problems to be solved.
Consider one specific circumstance in a printed circuit board (PCB) workshop, where machines are expected to find random defections on each PCB by comparing it to its template. Such a procedure is challenging because: i) condition changes between observations hinder direct comparison, e.g. lighting changes and subtle pose changes between PCBs; ii) pixel-wise errors lead to false small defection segmentation, e.g. non-overlapping edges caused by assembly errors of electronic components may be recognized as small defections; iii) PCBs come in different styles, which leads either to overloaded data and training or to poor generalization for most existing defection segmentation networks; iv) insufficient real-world training data for DS makes practical implementation hard. With these difficulties in mind, an ideal approach for PCB defection segmentation should require little expensive real-world training data and should work with multiple styles of PCBs under a variety of conditions. This task on PCBs is representative of many defection segmentation tasks on random objects, e.g. fingerprint detection in screen manufacturing. The question we ask in this work is: with limited real-world training data and varying PCB styles, how can a network remain functional?
We answer this question in a sim2real fashion. Instead of training a network to recognize defections in each specific PCB context, we want to endow the robot with the universal ability to distinguish only defections, regardless of contextual variance, via simulated training. This ability enables the method to generalize to practical applications and eases the burden of providing the network with comprehensive data and excessive training, since such data can never be exhaustive.
To tackle this challenge, we propose a context-insensitive small defection segmentation network that is trained once for all in a simulated environment, named SSDS for "Sim2Real Small Defection Segmentation Network". SSDS is designed to recognize foreign objects given two images that may also be shifted in space relative to each other, as shown in Fig. 1. The defection-free template is first pose-transformed according to the pose estimator so that it is pose-aligned with the defected image. To learn to recognize differences between the two images without being misled by the context, a phase-correlation-based feature extractor is further designed on a UNet backbone. We exploit the fact that when two images without shared content are fed into deep phase correlation (DPC) [chen2020deep] to calculate the relative pose, the angle output gives a constant false value, yielding a strong self-supervision constraint that forces the UNet to extract only the defection. Finally, a masking layer further reduces noise introduced by "static shifting", such as assembly errors of electronic components in PCB inspection.
The key contribution of this paper is guiding a network to distinguish differences between a pair of images to achieve defection segmentation, rather than forcing it to memorize the context information in the training data. Specific contributions are summarized as follows:
SSDS is proposed to distinguish small defections between an image pair with minimal training data. We guide the network to specifically learn differences between images by exploiting the pose sensitivity of phase correlation, so that even when trained with cheap simulated data it can still generalize to multiple practical applications.
A new dataset, HiDefection [HiDefection], containing both generated and actual defections on PCB boards, is released for defection segmentation research. This dataset supports the experiments of the proposed method, which demonstrate its superior generalization ability.
II Related Works
Existing methods for defection detection can be divided into two fashions: template-free and template-based. Template-free methods usually require one input for detection, while template-based methods need two: the defected image and the defection-free template. Each has its own strengths.
II-A Template Free Defection Segmentation
Methods based on pretrained feature extractors usually rely on widely adopted backbones and classifiers. Napoletano et al. [napoletano2018anomaly] used ResNet-18 [he2016deep] trained on ImageNet [krizhevsky2012imagenet] to obtain a clustered feature extractor for recognizing defections. Andrews et al. [andrews2016transfer] achieved segmentation with a VGG [simonyan2014very] backbone and a One-Class SVM [chen2001one] to recognize the defection. These methods are limited by the classifier and have gradually been replaced by the following generative methods.
GAN-based and autoencoder-based methods are generative, built on the assumption that defections should not be generated since they are unseen in the training set. Schlegl et al. [schlegl2017unsupervised] proposed a GAN-based manifold extractor trained on defection-free images. It learned a representation of the manifold of clean images, expecting it to recognize and localize defections in a defected image. Akcay et al. [akcay2018ganomaly] proposed "GANomaly", using an autoencoder as the generating part of the GAN followed by a decoder to produce a positive image. By comparing the generated image to the input image pixel-wise, defections are expected to be revealed. Bergmann et al. [bergmann2018improving] generated positive images with an autoencoder and constructed a specific SSIM loss so that the generative error is related to pixel-wise similarity. Zhai et al. [zhai2016deep] introduced energy into the regularized autoencoder so that higher energy in the generated output represents defections. Rudolph et al. [rudolph2021same] detected anomalies via likelihoods provided by a normalizing flow on multi-scale image features. Compared with these generative methods, the proposed SSDS adopts the generative fashion but differs in the content of the generated output: SSDS generates the defection directly from the defected and defection-free inputs, so that even unseen defections can be recognized.
II-B Template Based Defection Segmentation
Different from template-free approaches, template-based methods usually cope with tiny, vital, but untrained defections that can only be recognized by comparing the defected image with the defection-free template. The origin of this line of work is image subtraction [moganti1995automatic] via the XOR logical operator; it is easy to apply but has extremely high requirements on pixel-wise alignment of the images. Considering this, Ding et al. [ding2019tdd] proposed feature matching as an improvement, since it is more robust to pixel errors. Beyond feature matching of two images, Gaidhane et al. [gaidhane2018efficient] utilized similarity measurement with a symmetric matrix to localize the defection. Although these methods succeed in detecting tiny untrained defections, they are heavily influenced by the context, which means their generalization capability is unsatisfying.
SSDS in this paper aims at detecting untrained, multi-scale defections with template-based inputs, but in a generative way. The proposed method generates defection maps between defected and defection-free images regardless of the context, so that it can be trained on simple simulated data and generalize to multiple practical applications where a defection-free template is available.
III Sim2Real Small Defection Segmentation
One typical example of the encountered problem is shown in Fig. 1, where the defected image and the defection-free template are the inputs. Since there are multiple mature methods in industry (mechanical as well as algorithmic [chen2020deep, cheng2019qatm, mughal2021assisting]) that are able to align the images, we pass over the pose alignment and assume that the two input images are approximately pose-aligned. Following the pipeline in Fig. 2, we design a difference learner for small defection segmentation that aims at recognizing only the differences, even when they are small. It is insensitive to context, so it can be generalized to practical applications after simple training in simulation. Then, a masking module is proposed to eliminate outliers. Since differentiable pose estimation plays an important role in the difference learner, we introduce it as a prerequisite in Section III-A.
III-A Differentiable Phase Correlation
Given two images $I_t$ (the defection-free template) and $I_d$ (the defected image) with a pose transformation between them, a variation of the deep phase correlator [chen2020deep] is utilized to estimate the overall relative pose between $I_t$ and $I_d$. The rotation and scale $(\theta, s)$ are calculated by

$(\theta, s) = C\big(LP(|F(I_t)|),\; LP(|F(I_d)|)\big),$

where $(\theta, s)$ is the rotation and scale part of the relative pose between $I_t$ and $I_d$, $F$ is the discrete Fourier transform, $LP$ is the log-polar transform, and $C$ is the phase correlation solver. The Fourier transform $F$ maps images into the frequency domain, whose magnitude has the property of translational insensitivity; therefore rotation and scale are decoupled from displacements and represented in the magnitude. The log-polar transform $LP$ remaps Cartesian coordinates into log-polar coordinates, so that rotation and scale in the Fourier magnitude become displacements in the new coordinates, making them solvable with the subsequent phase correlation solver. The phase correlation solver $C$ outputs a heatmap indicating the displacement between the two log-polar images, which eventually stands for the rotation and scale between the two input images $I_t$ and $I_d$. To make the solver differentiable, we use the expectation over the heatmap as the estimate of $(\theta, s)$.
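As an illustration, the differentiable solver can be sketched in a few lines of PyTorch. The snippet below shows the translation-stage solver with a soft-argmax expectation; the rotation/scale stage applies the same solver to the log-polar-remapped Fourier magnitudes (remap omitted here). The function name and the temperature value are our own assumptions for this sketch, not the authors' released implementation:

```python
import torch
import torch.nn.functional as F

def soft_phase_correlation(a, b, temperature=0.01):
    """Differentiable phase correlation between two single-channel images.

    Returns the correlation heatmap and the expected (soft-argmax) shift,
    so gradients can flow through the pose estimate.
    """
    A = torch.fft.fft2(a)
    B = torch.fft.fft2(b)
    # Cross-power spectrum, normalized so only phase information remains.
    R = A * torch.conj(B)
    R = R / (R.abs() + 1e-8)
    heatmap = torch.fft.ifft2(R).real
    heatmap = torch.fft.fftshift(heatmap)      # peak at center = zero shift
    probs = F.softmax(heatmap.flatten() / temperature, dim=0).view_as(heatmap)
    h, w = heatmap.shape
    ys = torch.arange(h, dtype=a.dtype) - h // 2
    xs = torch.arange(w, dtype=a.dtype) - w // 2
    # Expectation instead of argmax keeps the estimate differentiable.
    dy = (probs.sum(dim=1) * ys).sum()
    dx = (probs.sum(dim=0) * xs).sum()
    return heatmap, dy, dx
```

For two circularly shifted copies of the same image the heatmap is a sharp peak at the true displacement, and the expectation recovers it almost exactly.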
Then $I_t$ is rotated and scaled according to the result $(\theta, s)$, yielding the aligned template $I_t'$:

$I_t' = T_{\theta, s}(I_t).$

In the same manner, the translation $(x, y)$ between $I_t'$ and $I_d$ is calculated by applying the phase correlation solver directly to the images:

$(x, y) = C(I_t', I_d).$
One unique property of deep phase correlation is its ability to identify whether the two input images are related in content. When and only when the two inputs of the phase correlation in the rotation estimation stage are completely irrelevant, the phase correlation map becomes a one-peak distribution centered at a fixed, known location [oppenheim1981importance]. With this property in mind, we move on to the difference learning module.
III-B Difference Learning
In this module, we aim at designing a learnable layer $D$ (a UNet in our implementation) that can recognize differences between the two input images $I_t'$ and $I_d$ without being disrupted by the context; in practical applications these differences could be defections on a PCB, fingerprints in screen manufacturing, etc. The layer produces two maps

$(O_1, O_2) = D(I_t' \oplus I_d),$

where $\oplus$ is the concatenation, and $O_1$ and $O_2$ are the desired outputs of $D$ with respect to $I_t'$ and $I_d$. The content of $O_1$ and $O_2$ is further supervised by the proposed "Irrelevance Loss" and by the outlier masking layer introduced in Section III-C.
The "Irrelevance Loss" $L_{irr}$ forces the content of $O_1$ and $O_2$ to be completely different by utilizing the similarity-identifying ability of deep phase correlation introduced in Section III-A:

$H = C\big(LP(|F(O_1)|),\; LP(|F(O_2)|)\big),$

where $H$ is the phase correlation result on rotation and scale with respect to $O_1$ and $O_2$. Since we expect $O_1$ and $O_2$ to have completely irrelevant content, $H$ is supervised by the Kullback–Leibler divergence loss against the Gaussian-blurred one-peak distribution $H^*$ centered at the fixed false-peak location:

$L_{irr} = KL(H^* \,\|\, H).$
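A minimal sketch of such an irrelevance term, assuming the raw heatmap is normalized with a softmax and pushed toward a Gaussian-blurred one-peak target via KL divergence (the function name, the epsilon, and the default sigma are illustrative choices, not the paper's exact hyperparameters):

```python
import torch
import torch.nn.functional as F

def irrelevance_loss(heatmap, center, sigma=2.0):
    """KL(target || prediction): drives a phase-correlation heatmap toward a
    one-peak Gaussian at `center`, i.e. the signature of unrelated inputs.

    heatmap: (H, W) raw correlation map; center: (cy, cx) of the false peak.
    """
    h, w = heatmap.shape
    ys = torch.arange(h, dtype=heatmap.dtype).view(-1, 1)
    xs = torch.arange(w, dtype=heatmap.dtype).view(1, -1)
    target = torch.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2)
                       / (2 * sigma ** 2))
    target = (target / target.sum()).flatten()   # blurred one-peak target
    log_pred = F.log_softmax(heatmap.flatten(), dim=0)
    # Small epsilon guards log(0) for pixels far from the peak.
    return (target * (torch.log(target + 1e-12) - log_pred)).sum()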
By now the content of $O_1$ and $O_2$ is expected to be completely different from each other. As a result, there are two possibilities for $O_1$ and $O_2$: i) both are simply noise; ii) one of them contains useful information about the differences between $I_t'$ and $I_d$. This leads to the next module, "Outlier Masking", which also constrains the content of $O_1$ and $O_2$ to be useful.
III-C Outlier Masking
Motivated by the ambiguity raised in Section III-B, in this module we pin down the content of $O_1$ and $O_2$. We start by focusing on the final defection segmentation result $S$, which is derived from $O_1$ and $O_2$ with a learnable masking layer $M$:

$S = M(O_1, O_2).$

The result $S$ should be exactly the same as the ground-truth defection $S^*$ in the coordinate system of $I_d$, shown as "Defection Estimated" and "Defection GT" in Fig. 1 respectively. Therefore, a hard supervision based on the mean square error between $S$ and the ground truth $S^*$ is applied:

$L_{mse} = \|S - S^*\|_2^2,$

where $L_{mse}$ is the loss of $S$. The final loss is then:

$L = L_{mse} + \lambda L_{irr},$

where $\lambda$ balances the two terms.
Since a masking layer cannot generate defections out of nowhere, at least one of $O_1$ and $O_2$ must contain the defection features. However, with $L_{irr}$ introduced (Section III-B), $O_1$ and $O_2$ must not contain the same objects, so the defection features can only be present in one of them; then we have:

$\exists!\, O \in \{O_1, O_2\}:\; O \approx S,$

which states that there exists exactly one variable in the set $\{O_1, O_2\}$ that is similar to the output $S$.
To this end, it is clear that one of $O_1$ and $O_2$ is a noisy defection map, but we are not concerned with which one, since both are intermediate results to be processed by $M$. This explains how the difference learner learns differences between the two input images while neglecting the context. With the proposed masking layer, the content of $O_1$ and $O_2$ is defined and the segmentation is completed.
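The masking step can be sketched as a small gating head over the two intermediate maps. The design below (channel width, max-fusion of the two maps, and the loss weight) is our own assumption for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class OutlierMask(nn.Module):
    """Illustrative masking head: fuses the difference-learner outputs
    O1, O2 into a single defection map while suppressing pixel-level noise
    (e.g. non-overlapping component edges) with a learned per-pixel gate."""
    def __init__(self, ch=8):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),  # gate in [0, 1]
        )

    def forward(self, o1, o2):
        gate = self.fuse(torch.cat([o1, o2], dim=1))   # (B, 1, H, W)
        # Keep whichever intermediate carries the defection; the gate learns
        # to zero out the noisy counterpart.
        return gate * torch.max(o1, o2)

def total_loss(pred, gt, l_irr, w_irr=0.1):
    """Final objective: hard MSE supervision on the masked output plus the
    irrelevance term (the weight w_irr is an assumed hyperparameter)."""
    return nn.functional.mse_loss(pred, gt) + w_irr * l_irr
```

In training, `total_loss` combines the hard supervision on the masked output with the irrelevance constraint from the previous section.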
III-D Training in Simulation
The training procedure also plays an important role in the methodology, since we are designing a once-for-all network that needs no further training. A simulated dataset is specifically designed for SSDS's training with randomization, shown in Fig. 3 (Simulation Training Set). A defection-free template is first generated with a random number of random shapes at random locations. For the defected image, compared with the defection-free one, the same shapes are generated around the same locations with slight random translational disturbance, imitating practical conditions such as assembly errors of electronic components. The generated defected image is then rotated and scaled randomly to simulate the overall pose misalignment in practical applications. The simulated data is completely randomized so that the network will not overfit and will focus on difference learning. Note that the shapes are partially borrowed from [Unet-pytorch].
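A randomized pair generator in this spirit might look as follows; the shape type (rectangles), size ranges, and jitter magnitude are illustrative choices, not the exact released generator:

```python
import numpy as np

def make_pair(size=128, n_shapes=6, jitter=2, rng=None):
    """Generate one (template, defected, gt) training triple: shared random
    rectangles with small per-shape translational disturbance, plus one extra
    foreign shape only in the defected image (the defection)."""
    if rng is None:
        rng = np.random.default_rng()
    template = np.zeros((size, size), np.float32)
    defected = np.zeros((size, size), np.float32)
    gt = np.zeros((size, size), np.float32)
    for _ in range(n_shapes):
        h, w = rng.integers(5, 20, 2)
        y, x = rng.integers(0, size - 20, 2)
        template[y:y + h, x:x + w] = 1.0
        # Same shape, slightly displaced: imitates component assembly error.
        dy, dx = rng.integers(-jitter, jitter + 1, 2)
        yy = np.clip(y + dy, 0, size - h)
        xx = np.clip(x + dx, 0, size - w)
        defected[yy:yy + h, xx:xx + w] = 1.0
    # One foreign shape present only in the defected image: the defection.
    h, w = rng.integers(3, 8, 2)
    y, x = rng.integers(0, size - 8, 2)
    defected[y:y + h, x:x + w] = 1.0
    gt[y:y + h, x:x + w] = 1.0
    return template, defected, gt
```

A random global rotation and scale would then be applied to the defected image (omitted here) to simulate the overall pose misalignment.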
IV Experiments: Dataset And Setup
Our method is trained in simulation (Fig. 3: Simulation Training Set) and evaluated on the sim&real dataset HiDefection [HiDefection] for PCB defection segmentation. As one of the contributions of this paper, HiDefection contains two packages: HiDefection:SIM and HiDefection:REAL. HiDefection:SIM is constructed with randomly generated shapes as defections on two different types of PCB backgrounds, while HiDefection:REAL contains realistic photos of actual objects on PCBs.
HiDefection: HiDefection is a real-world dataset collected in a PCB workshop, containing both simulated defections on different types of PCBs and actually defected boards, together with their corresponding defection-free templates.
HiDefection:SIM: This sub-dataset is recorded from an overhead perspective over a large number of PCBs and is leveraged in this paper. The random defections in this part of the dataset comprise random shapes and randomly dropped electronic components. Defections are small, with the largest occupying only a small fraction of the whole image. This part of the dataset contains 15000 pairs for training and 3000 pairs for validation. One sample pair is shown in Fig. 3 as "HiDefection:SIM".
HiDefection:REAL: This sub-dataset is recorded under the same camera conditions as HiDefection:SIM above, but with real defections and PCBs shot together, so that it represents what defection segmentation actually encounters in real-world applications. When collecting, we also introduce lighting changes in some pairs of images to make the data more realistic, so that results on this part indicate what a method will achieve in practice. Moreover, assembly errors of electronic components are introduced into every pair of data, which makes small defection segmentation more challenging. Note that the largest defection in this part of the data does not exceed a small fraction of the whole image. This part of the dataset is designed for validating the generalization ability of defection segmentation methods and contains 100 pairs of images. One sample pair is shown in Fig. 3 as "HiDefection:REAL".
For all datasets above, defected images are pose-transformed from the defection-free template with bounded translations along both axes, bounded rotation changes, and bounded scale changes between the two images.
We evaluate the performance of defection segmentation with Intersection-over-Union (IoU) based average precision AP, recall rate R, and the harmonic mean (MaxF1) of AP and R. In Section V, AP and MaxF1 are adopted for quantitative demonstration.
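For reference, the underlying pixel-wise quantities can be computed as follows; this is a straightforward sketch at a single threshold, whereas the paper's AP/MaxF1 protocol may sweep thresholds:

```python
import numpy as np

def seg_metrics(pred, gt, thresh=0.5):
    """Pixel-wise IoU, precision, recall, and F1 for a binary defection map.
    `pred` is a soft map in [0, 1]; `thresh` binarizes it."""
    p = pred >= thresh
    g = gt >= 0.5
    tp = np.logical_and(p, g).sum()
    fp = np.logical_and(p, ~g).sum()
    fn = np.logical_and(~p, g).sum()
    iou = tp / max(tp + fp + fn, 1)
    prec = tp / max(tp + fp, 1)
    rec = tp / max(tp + fn, 1)
    f1 = 2 * prec * rec / max(prec + rec, 1e-8)   # harmonic mean of P and R
    return iou, prec, rec, f1
```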
IV-C Comparative Methods
Baselines in the experiments include the template-free methods GANomaly [akcay2018ganomaly], SSIM [bergmann2018improving], and DSEBM [zhai2016deep], and the template-based method TDD-net [ding2019tdd]. All comparative methods share the same training conditions and are trained both on the simulated dataset and on HiDefection. They are evaluated on HiDefection as comparisons to demonstrate the generalization capability of SSDS.
V Experiments: Results
V-A PCB with Simulated Defection
In this experiment, we evaluate SSDS on HiDefection:SIM, where "Scene I" has simulated shapes on PCBs and "Scene II" has randomly placed electronic components on PCBs. We train SSDS and the baselines in simulation and test them on HiDefection:SIM. To further probe generalization, we additionally train all baselines except SSDS on HiDefection:SIM. The qualitative results in Fig. 4 and the quantitative results in TABLE I demonstrate the superior performance of SSDS in generalizing from simulation training to practical inference; it outperforms the other baselines in generalization. Even when trained in simulation with a completely different context, SSDS is on par with baselines trained on HiDefection:SIM when inferring defections on PCBs. When inferring in a context different from training, the template-free generative baselines (GANomaly, SSIM, DSEBM) are easily confused by the new context and inevitably miss important defections, while the template-based baseline TDD-net is misled by the assembly errors of components, so its output is blurry.
On HiDefection:SIM, where defections are simulated and similar to those in the simulated training, the baselines' generalization still yields acceptable segmentation. However, when defections come from the real world, their performance starts to drop, which leads to Section V-B.
V-B PCB with Real Defection
In this experiment, we evaluate SSDS on HiDefection:REAL, where "Scene III" and "Scene IV" both contain real-world particles and PCBs with lighting changes. Different from Section V-A, no method, including SSDS, has been trained on HiDefection:REAL, so generalization is studied further. The qualitative results are shown in Fig. 5 and the quantitative results in TABLE II. The results indicate that with real-world defections involved, which have not been seen in any of the training data, the comparative methods trained either in simulation or on HiDefection:SIM fail to produce successful predictions of the defections. In contrast, SSDS trained in simulation remains a competent approach for recognizing defections, even when lighting changes are involved in a pair of images. This proves the generalization ability of SSDS and shows that the method can be directly adopted in practical applications without any further training.
V-C Ablation Study
Several experiments are designed as ablation studies to verify that each module plays its own important role. First, we validate the role of the difference learning layer by removing this layer and the corresponding loss $L_{irr}$ in the training stage (denoted as "w/o $L_{irr}$"). In this study, "w/o $L_{irr}$" is trained both in simulation and on HiDefection, and evaluated on HiDefection. The qualitative results (Fig. 6) and the quantitative results (TABLE III) show that while "w/o $L_{irr}$" trained on HiDefection still gives a satisfying result, it fails at defection segmentation when trained in simulation. This proves that without the difference learning layer, the method still works when training and inference share the same context, but fails when they differ. It also proves that the difference learning layer is a vital component of the context-insensitive behavior.
We further study the importance of the masking layer by removing it and calculating the defection MSE loss directly between the difference learner outputs and the ground truth (denoted as "w/o Mask"). The qualitative results (Fig. 6) and the quantitative results (TABLE III) show that "w/o Mask" trained on both HiDefection and simulation provides acceptable results, but with more noise generated from the assembly errors of components. This proves that the masking layer efficiently eliminates the unfavorable noise caused by pixel errors.
V-D Case Study
We visualize the intermediate results of the whole method to support our interpretation of the network, as shown in Fig. 7. The first case (upper part of Fig. 7) is conducted on a set of PCB images shot under different lighting conditions, and the second case (lower part of Fig. 7) involves silicon grease whose shape varies between the images but should not be counted as a defection. We trained the network twice in simulation, resulting in two independent models. The first model generates the defections in "Intermediate 2" while the second model has them in "Intermediate 1". This shows that one of the two outputs of the "Difference Learner" contains the specific defections together with noise introduced by pixel-wise error between the two images, and that the "Outlier Mask" is able to filter the noise out.
While it is satisfying that SSDS handles lighting changes, its ability to handle images with false defections is even more rewarding. Note that in the second case, when defections appear in "Intermediate 1", the differences in silicon grease appear in "Intermediate 2" and are filtered out by the "Outlier Mask".
We presented an approach for small defection segmentation with simple training in a simulated environment, namely SSDS. To achieve this, we designed a difference learner based on deep phase correlation, so that it recognizes defections regardless of the context, and a masking layer for outlier elimination. In various experiments, the proposed SSDS showed satisfying generalization performance, i.e. under background and lighting changes, demonstrating its potential for practical defection segmentation applications.