Spatially-weighted Anomaly Detection with Regression Model

03/23/2019
by Daiki Kimura, et al.
IBM
The University of Tokyo




Abstract

Visual anomaly detection is common in several applications, including medical screening and production quality checks. Although an anomaly is by definition an unknown trend in the data, in many cases some hints or samples of the anomaly class can be given in advance. Conventional methods cannot use this available anomaly data and are not robust to noise. In this paper, we propose a novel spatially-weighted, reconstruction-loss-based anomaly detection method combined with a likelihood value from a regression model trained on all known data. The spatial weights are calculated from a region of interest generated by visualizing the regression model. We also introduce several strategies for combining the two, yielding a state-of-the-art method. Comparing with other methods on three different datasets, we empirically verify that the proposed method performs better than the others.

I Introduction

Anomaly detection [1] has been widely used in many fields. For example, it is now common in medical diagnosis [2, 3]. In many real-world anomaly detection problems, some of the anomaly patterns may already be known. For example, some typical patterns that deviate from healthy older adults in a screening test for dementia called the Yamaguchi Fox-Pigeon Imitation Test (YFPIT) [4, 5] have been reported. A detector should be able to leverage this information to improve accuracy. Since the manual check of anomaly conditions is mostly done by visual inspection, and recent computer vision research has led to significant breakthroughs [6, 7], using image information is a natural choice for developing an automatic system [8]. In this paper, we focus on visual anomaly detection for problems where some anomalies are known.

A straightforward method when normal and anomaly patterns are given is to train a regression function over these classes with a convolutional neural network (CNN). However, this method suffers from the data imbalance problem, and the regression value for unknown patterns is intrinsically unpredictable. Hence, a structure for detecting unknown anomaly patterns is required.

Visual anomaly detection methods [9, 10] mostly use a reconstruction loss computed by a generative model trained to minimize the loss between normal inputs and the reconstructed images. However, noise in the image will be averaged out or eliminated by the generative model; thus, these methods misclassify noise as an anomaly. The “Raw loss” image in Fig. 1 shows this issue, where the “Unexpected” loss is the problem. Furthermore, these methods cannot use the known anomaly patterns.
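As a minimal illustration of this noise sensitivity, the conventional reconstruction-loss score can be sketched as follows (a toy stand-in for the generative model, not the paper's VAE; shapes and noise level are illustrative):

```python
import numpy as np

def reconstruction_loss_image(x, x_hat):
    """Per-pixel squared reconstruction loss (the 'raw loss' image)."""
    return (x - x_hat) ** 2

def anomaly_score(x, x_hat):
    """Conventional score: mean reconstruction loss over all pixels."""
    return reconstruction_loss_image(x, x_hat).mean()

rng = np.random.default_rng(0)
clean = np.zeros((8, 8))
clean[2:6, 2:6] = 1.0            # a simple "normal" pattern
x_hat = clean                    # ideal generative output: noise-free

noisy = clean + rng.normal(0.0, 0.3, clean.shape)  # same pattern plus noise

# The noisy-but-normal input gets a clearly higher score than the clean
# one, so a plain reconstruction-loss detector would flag it as an anomaly.
print(anomaly_score(clean, x_hat))   # exactly 0
print(anomaly_score(noisy, x_hat))   # roughly the noise variance, well above 0
```

Because the reconstruction removes the noise, the loss image contains the noise itself, which inflates the score everywhere rather than only in anomalous regions.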

Fig. 1: Overview of the proposed method (SPADER)

A region of interest (ROI) can reduce this effect. However, defining the ROI manually is tricky, error-prone, and sub-optimal. Recently, Grad-CAM [11] was proposed as a method that computes the ROI from the gradients of a CNN; it does not require any region annotation.
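A rough sketch of the Grad-CAM computation, with stand-in activation and gradient arrays in place of a real CNN's feature maps:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heat map from conv activations and their gradients.

    activations, gradients: arrays of shape (K, H, W) for K channels.
    Returns an (H, W) map; the ReLU keeps only positively
    contributing regions.
    """
    # Channel importance alpha_k: global-average-pool the gradients.
    alphas = gradients.mean(axis=(1, 2))              # shape (K,)
    cam = np.tensordot(alphas, activations, axes=1)   # weighted sum -> (H, W)
    return np.maximum(cam, 0.0)                       # ReLU

rng = np.random.default_rng(1)
acts = rng.random((4, 7, 7))        # stand-in conv feature maps
grads = rng.normal(size=(4, 7, 7))  # stand-in gradients of the class score
roi = grad_cam(acts, grads)
print(roi.shape)  # (7, 7)
```

In practice the map is upsampled to the input resolution before being used as spatial weights.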

We propose a hybrid of spatially-weighted, reconstruction-loss-based anomaly detection and a likelihood value from a regression model trained on the known data. The weights are computed by Grad-CAM [11] to decrease noise effects, and the combination with the regression model improves accuracy by exploiting the known information. Moreover, we introduce various strategies for the combination. We name the SPatially-weighted Anomaly DEtection “SPADE” [12], and the method combining SPADE with Regression “SPADER”. Fig. 1 shows the flow of SPADER. We compare the method with several baselines on three datasets. The major contributions are: (1) we propose a method for the condition where some anomaly patterns are known; (2) we propose a spatially-weighted method for noise reduction; (3) we propose a hybrid strategy that achieves state-of-the-art results.

II Proposed Method

As the problem statement, there are three classes: a normal class, a known anomaly class, and an unknown anomaly class. Given a set of training patterns from the normal and known anomaly classes, the method performs detection on test patterns from all three classes.

II-A Training

The proposed method trains two networks: a variational auto-encoder (VAE) [13] for reconstructing normal patterns, and a CNN for normalness regression.

The VAE network is trained with the following objective,

  min_{θ,φ} E_{x ∈ X_normal} [ L_VAE(x; θ, φ) ]   (1)

During this minimization, the network optimizes,

  L_VAE(x; θ, φ) = D_KL( q_φ(z|x) ‖ p_θ(z) ) − E_{q_φ(z|x)} [ log p_θ(x|z) ]   (2)

where θ denotes the generative parameters, φ the variational parameters, and z a random latent variable. In this paper, we use a normal distribution for the latent space; thus, the generative loss is a mean squared error. In the remainder of the paper, we write θ to refer to both θ and φ.
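Under the Gaussian assumptions above, the objective can be sketched numerically as follows (the latent dimension, inputs, and encoder outputs `mu`, `log_var` are illustrative stand-ins):

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var):
    """Per-example VAE objective: reconstruction term + KL term.

    With a Gaussian latent and decoder, the generative (reconstruction)
    loss reduces to a mean squared error, and the KL divergence between
    q(z|x) = N(mu, exp(log_var)) and the prior N(0, I) has a closed form.
    """
    recon = ((x - x_hat) ** 2).mean()
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl

x = np.array([0.0, 1.0, 1.0, 0.0])
x_hat = np.array([0.1, 0.9, 0.8, 0.1])
mu = np.zeros(2)
log_var = np.zeros(2)   # q(z|x) = N(0, I), so the KL term vanishes
print(vae_loss(x, x_hat, mu, log_var))  # ≈ 0.0175, pure reconstruction error
```

The KL term regularizes the latent space toward the prior; at the optimum for a normal input, only the reconstruction error remains.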

The CNN network is trained with the following objective,

  min_θ Σ_{x} ( y_x − f_CNN(x; θ) )²   (3)

where y_x is the label value for x: y_x = 1 when x is normal, and y_x = 0 when x is a known anomaly.

1: Given image x, trained f_VAE, and f_CNN
2: x̂ ← f_VAE(x) ▷ Reconstructed image
3: L ← (x − x̂)² ▷ Reconstruction-loss image
4: L ← (1/N) Σ_i L_i over the N reconstruction trials
5: A ← |Grad-CAM(x)| ▷ Grad-CAM for input image
6: W ← A
7: A ← ReLU(Grad-CAM(x̂)) ▷ Grad-CAM for VAE output
8: W ← W + A
9: s ← Σ(W ⊙ L) / Σ W ▷ Spatially-weighted loss
10: r ← f_CNN(x)
11: e ← combine(s, 1 − r) ▷ Combine loss value and the likelihood
12: Anomaly detection by thresholding e
Algorithm 1 SPADER
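The detection steps above can be sketched compactly, with the networks replaced by their precomputed outputs and one assumed combination strategy (a simple average; the paper's exact combination rule is not reproduced here):

```python
import numpy as np

def spader_score(x, x_hat, cam_in, cam_out, likelihood):
    """Sketch of the SPADER detection score.

    x, x_hat   : input image and its VAE reconstruction, shape (H, W)
    cam_in     : Grad-CAM map for the input (signed; abs keeps both classes)
    cam_out    : Grad-CAM map for the VAE output (ReLU keeps normal regions)
    likelihood : normalness value from the regression CNN, in [0, 1]
    """
    loss = (x - x_hat) ** 2                            # reconstruction-loss image
    roi = np.abs(cam_in) + np.maximum(cam_out, 0.0)    # combined ROI weights
    s = (roi * loss).sum() / roi.sum()                 # normalized weighted loss
    # Assumed combination: average the weighted loss with the
    # "anomalousness" (1 - likelihood) from the regression model.
    return 0.5 * s + 0.5 * (1.0 - likelihood)

x = np.array([[1.0, 0.0], [0.0, 0.0]])
x_hat = np.zeros((2, 2))
cam_in = np.array([[1.0, -0.5], [0.0, 0.0]])
cam_out = np.array([[0.5, 0.0], [0.0, 0.0]])
print(spader_score(x, x_hat, cam_in, cam_out, likelihood=0.9))  # 0.425
```

A higher score means more anomalous; detection then reduces to thresholding this value.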

II-B Detection

The proposed method computes a detection score in three steps: computing the reconstruction-loss image, calculating an ROI image, and combining the spatially-weighted loss with a likelihood value from the regression model. Algorithm 1 gives the details.

The reason for adding an ROI for the VAE output (lines 6-8 in Algorithm 1) is that, if we used only the Grad-CAM of the input and the input were an anomaly, the area related to the normal class would not be focused on, even though the loss might appear in that region. Adding it enables the method to focus on areas related to the normal class as well.

The reason for using an absolute function (line 5) is that the input may belong not only to the normal class but also to an anomaly class. If we used the same ReLU function as the original Grad-CAM [11], the ROI would focus only on the normal class. However, since the image reconstructed by the VAE appears similar to the normal patterns, the function for the VAE output (line 7) must be a ReLU.

The reason for the normalization (line 9) is that weighting without it is affected by the size and strength of the ROI: when the ROI is wide or has high values, the weighted loss easily becomes large. Therefore, we include the normalization in this equation.
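A small numeric example of why the normalization matters (the arrays are illustrative):

```python
import numpy as np

def weighted_loss(roi, loss, normalize):
    """Spatially-weighted loss, with and without ROI normalization."""
    w = (roi * loss).sum()
    return w / roi.sum() if normalize else w

loss = np.full((4, 4), 0.1)                 # uniform small loss everywhere
narrow = np.zeros((4, 4)); narrow[0, 0] = 1.0
wide = np.ones((4, 4))                      # same strength, 16x the area

# Without normalization the score scales with the ROI size ...
print(weighted_loss(narrow, loss, False))   # 0.1
print(weighted_loss(wide, loss, False))     # ≈ 1.6
# ... with normalization both ROIs give the same per-pixel value.
print(weighted_loss(narrow, loss, True))    # 0.1
print(weighted_loss(wide, loss, True))      # ≈ 0.1
```

Dividing by the total ROI weight makes scores comparable across images with differently sized or scaled attention maps.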

III Experiments

We prepared the following three datasets, all of which include noise: handwritten digit images [14] with added noise, a public hand gesture dataset [15], and images of human gestures as described in [5]. The first and second datasets are used for quantitative evaluation, and the third for assessing effectiveness.

III-A Methods

We compared the following methods on each dataset:

  • VAE [10]: anomaly detection by the VAE reconstruction loss
  • Naïve VAE + GradCAM: reconstruction loss naïvely weighted by the ROI
  • SPADE w/o norm.: SPADE without the normalization
  • SPADE [12]: spatially-weighted anomaly detection
  • CNN-Reg: anomaly detection by the regression model alone
  • VAE + CNN-Reg: VAE-based detection combined with the regression
  • SPADER: SPADE combined with the regression

The variables are defined in Algorithm 1, and N is the number of trials for the VAE reconstruction.
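One plausible reading of the compared scores, using the notation of Algorithm 1 (the exact per-method equations are missing from this text, so the naïve weighting and the combination rules below are assumptions):

```python
import numpy as np

def method_scores(loss, cam_in, cam_out, likelihood):
    """Anomaly scores for the compared variants (one plausible reading).

    loss       : reconstruction-loss image, already averaged over the
                 N VAE reconstruction trials
    cam_in/out : Grad-CAM maps for the input and for the VAE output
    likelihood : normalness regression output in [0, 1]
    """
    roi = np.abs(cam_in) + np.maximum(cam_out, 0.0)
    spade = (roi * loss).sum() / roi.sum()
    return {
        "VAE": loss.mean(),
        "Naive VAE + GradCAM": (np.abs(cam_in) * loss).sum(),
        "SPADE w/o norm.": (roi * loss).sum(),
        "SPADE": spade,
        "CNN-Reg": 1.0 - likelihood,
        "VAE + CNN-Reg": 0.5 * loss.mean() + 0.5 * (1.0 - likelihood),
        "SPADER": 0.5 * spade + 0.5 * (1.0 - likelihood),
    }

loss = np.array([[0.2, 0.0], [0.0, 0.0]])
cam_in = np.array([[1.0, 0.0], [0.0, 0.0]])
cam_out = np.zeros((2, 2))
scores = method_scores(loss, cam_in, cam_out, likelihood=0.8)
print(scores["SPADE"])  # 0.2
```

All variants share the same trained networks; only the scoring function differs, which mirrors how the experiments are set up.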

III-B MNIST with Noise

Fig. 2: Noisy MNIST (0: normal digit, 1-9: anomaly digit)
Fig. 3: Public hand gesture dataset [15] (top: class descriptions from [16], bottom: image examples)

Since MNIST [14] does not contain noise, the reconstruction loss poses no problem on the original data; An et al. have already reported the performance [10]. In this experiment, we added Gaussian noise whose standard deviation is drawn randomly for each image. Such a condition is common in real cases; for example, each image has different noise according to differences in the person or background in Fig. 3 and Fig. 4. We also enlarged the image, and the digit was placed at a random size and position. Fig. 2 shows examples of the generated images. Here, ‘0’ is the normal class, odd numbers are candidates for the known anomaly class, and the others are the unknown anomaly classes. The encoder and decoder of the VAE have four convolutional (conv) layers, and the latent space has 128 units. The CNN used for the regression value and Grad-CAM has three conv layers. Note that we did not change the networks among the methods; we only changed the scoring function.
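The noisy-sample generation described above can be sketched as follows (the canvas size, digit-size range, and noise range are illustrative stand-ins; the paper's exact values are not given in this text):

```python
import numpy as np

def make_noisy_sample(digit, canvas_size=64, rng=None):
    """Place a digit at a random size/position and add per-image
    Gaussian noise with a randomly drawn standard deviation."""
    rng = rng or np.random.default_rng()
    canvas = np.zeros((canvas_size, canvas_size))
    side = int(rng.integers(20, 41))                  # random digit size
    y, x = rng.integers(0, canvas_size - side, 2)     # random position
    # Nearest-neighbour resize of the digit onto the canvas.
    idx = np.arange(side) * digit.shape[0] // side
    canvas[y:y + side, x:x + side] = digit[np.ix_(idx, idx)]
    sigma = rng.uniform(0.05, 0.3)                    # per-image noise level
    return canvas + rng.normal(0.0, sigma, canvas.shape)

digit = np.zeros((28, 28))
digit[8:20, 8:20] = 1.0   # stand-in for a 28x28 MNIST digit
sample = make_noisy_sample(digit, rng=np.random.default_rng(0))
print(sample.shape)  # (64, 64)
```

Drawing the noise level per image, rather than fixing it, is what makes the plain reconstruction loss unreliable across samples.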

Noisy MNIST (original dataset [14]) — known anomaly digit:

Method               |    1    |    3    |    5    |    7    |    9    | Average
VAE [10]             | .63±.01 | .63±.01 | .63±.01 | .63±.01 | .63±.01 | .632±.01
Naïve VAE + GradCAM  | .67±.04 | .59±.03 | .65±.01 | .76±.02 | .53±.02 | .640±.02
SPADE w/o norm.      | .85±.01 | .83±.02 | .88±.02 | .87±.02 | .85±.02 | .857±.02
SPADE [12]           | .85±.04 | .87±.01 | .92±.02 | .86±.04 | .90±.03 | .880±.03
CNN-Reg              | .73±.04 | .88±.02 | .96±.02 | .88±.02 | .96±.01 | .881±.02
VAE + CNN-Reg        | .73±.02 | .81±.03 | .87±.03 | .85±.02 | .89±.03 | .831±.03
Ours: SPADER         | .94±.02 | .92±.01 | .97±.00 | .95±.01 | .96±.01 | .947±.01

Pigeon (the problem is from [5]) — known anomaly pose:

Method                 |    c     |    d     |    e     | Average
VAE [10]               | .95±.01  | .95±.01  | .95±.01  | .948±.01
Naïve VAE + GradCAM    | .80±.02  | .71±.09  | .78±.10  | .762±.07
SPADE w/o norm.        | .96±.02  | .96±.03  | .82±.09  | .911±.04
SPADE [12]             | .98±.01  | .97±.01  | .96±.03  | .970±.02
CNN-Reg (Pigeon: [6])  | .86±.03  | .97±.02  | .90±.03  | .908±.03
VAE + CNN-Reg          | .97±.00  | .99±.00  | .98±.01  | .980±.00
Ours: SPADER           | .99±.00  | 1.00±.00 | .98±.01  | .988±.01

TABLE I: AUROC for the noisy MNIST and pigeon datasets. Each value is an average ± standard deviation over 5 trials; the best value in each column is SPADER's.

The left side of Table I shows the average area under the ROC curve (AUROC) over 5 trials for each condition. For example, the second column shows the result for the condition where 0 is the normal class, 1 is the known anomaly, and the others (2-9) are unknown anomaly classes. The proposed method (SPADER) performs best.
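AUROC as reported here can be computed directly from the scores of normal and anomalous test patterns; a minimal rank-based implementation (equivalent to the Mann-Whitney U statistic):

```python
import numpy as np

def auroc(scores_normal, scores_anomaly):
    """AUROC = probability that a randomly chosen anomaly scores higher
    than a randomly chosen normal sample (ties count half)."""
    sn = np.asarray(scores_normal, dtype=float)[:, None]
    sa = np.asarray(scores_anomaly, dtype=float)[None, :]
    wins = (sa > sn).sum() + 0.5 * (sa == sn).sum()
    return wins / (sn.size * sa.size)

# Two anomalies outscore all normals; one overlaps the normal range.
print(auroc([0.1, 0.2, 0.3], [0.25, 0.4, 0.5]))  # 8/9 ≈ 0.889
```

This formulation is threshold-free, which is why it suits comparing scoring functions whose scales differ.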

III-C Hand Gesture

We used a public hand gesture dataset because we plan to apply gesture detection for YFPIT [5] in the next section. We used the depth images in this dataset, which are taken from multiple people with various backgrounds. Fig. 3 shows the class definitions and examples. We set the ‘1’-gesture as the normal class, and the others as anomaly classes. The encoder and decoder again have four conv layers, the latent space has 256 units, and the CNN again consists of three conv layers.

Hand gesture [15] — known anomaly gesture:

Method               |    2    |    3    |    4    |    5    |    6    |    7    |    8    |    9    |   10    | Average
VAE [10]             | .82±.17 | .82±.17 | .82±.17 | .82±.17 | .82±.17 | .82±.17 | .82±.17 | .82±.17 | .82±.17 | .822±.17
Naïve VAE + GradCAM  | .82±.17 | .80±.20 | .77±.30 | .78±.17 | .76±.23 | .80±.16 | .80±.23 | .81±.16 | .77±.17 | .790±.20
SPADE w/o norm.      | .83±.17 | .81±.16 | .87±.09 | .83±.16 | .80±.22 | .83±.15 | .84±.18 | .82±.17 | .82±.18 | .828±.17
SPADE [12]           | .84±.18 | .82±.16 | .86±.13 | .85±.15 | .85±.16 | .85±.15 | .85±.16 | .84±.18 | .85±.16 | .845±.16
CNN-Reg              | .77±.25 | .94±.03 | .92±.03 | .96±.01 | .95±.02 | .87±.03 | .94±.01 | .85±.20 | .97±.01 | .908±.06
VAE + CNN-Reg        | .87±.20 | .95±.04 | .95±.04 | .97±.04 | .96±.03 | .92±.05 | .95±.04 | .88±.20 | .97±.02 | .934±.07
Ours: SPADER         | .87±.20 | .95±.03 | .95±.04 | .96±.04 | .96±.03 | .92±.05 | .95±.04 | .88±.20 | .97±.02 | .937±.07

TABLE II: AUROC for the hand gesture dataset [15]. Each value is an average ± standard deviation over 5 trials; SPADER has the best average.

Table II shows the AUROC for the detection. The values of the proposed method are sometimes less than or similar to the ‘VAE + CNN-Reg’ results; however, SPADER shows the best results overall.

III-D Pigeon Gesture

Fig. 4: Pigeon dataset (left: class descriptions, the image is from [5], right: samples of captured images for each class and some difficult cases)

We captured images for the “pigeon”-pose test of YFPIT [5] using a Kinect depth camera, recording multiple people. During the shooting, we varied the position and angle of the hands, closed/open fingers, right/left-handedness, sitting position, and stooping angle. In total, we took around 189,000 images of 7 poses. Fig. 4 shows the definitions of the poses, samples of the captured images, and some difficult cases. We set the ‘b’-pose as the normal class, ‘c, d, e’ as candidates for the known anomaly class (because Yamaguchi et al. reported these as typical patterns among the subjects), and the others as unknown anomaly classes. The encoder and decoder again have four conv layers, and the latent space again has 256 units. The CNN has the ResNet [6] structure, but with a different final layer.

The right side of Table I shows that the proposed method achieves the best results. We hope this result helps to implement an automatic YFPIT [5]. Future work includes a comparison with different normal classes and a combination with the latest visual explanation methods [17].

IV Conclusion

We proposed a novel hybrid method combining spatially-weighted anomaly detection with a regression model. We conducted experiments on three different datasets, and the proposed method produced the best performance compared to previous methods. We hope this hybrid architecture will contribute to various applications, including reinforcement learning [18].

References

  • [1] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM computing surveys, 2009.
• [2] M. Prastawa et al., “A brain tumor segmentation framework based on outlier detection,” Medical Image Analysis, 2004.
  • [3] T. Schlegl et al., “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” Information Processing in Medical Imaging, 2017.
  • [4] H. Yamaguchi et al., “Yamaguchi fox-pigeon imitation test for dementia in clinical practice,” Psychogeriatrics, 2011.
  • [5] H. Yamaguchi, Y. Maki, and T. Yamagami, “Yamaguchi fox-pigeon imitation test: a rapid test for dementia,” Dementia and geriatric cognitive disorders, 2010.
• [6] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” CVPR, 2016.
• [7] D. Kimura, K. Pichai, A. Kawewong, and O. Hasegawa, “Ultra-fast and online incremental transfer learning,” in Symposium on Sensing via Image Information, 2011.
  • [8] A. Taboada-Crispi et al., “Anomaly detection in medical image analysis,” Advanced Techniques in Diagnostic Imaging and Biomedical Applications, 2009.
• [9] M. Sakurada and T. Yairi, “Anomaly detection using autoencoders with nonlinear dimensionality reduction,” in ACM MLSDA, 2014.
• [10] J. An and S. Cho, “Variational autoencoder based anomaly detection using reconstruction probability,” SNU Data Mining Center, Tech. Rep., 2015.
• [11] R. R. Selvaraju et al., “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in ICCV, 2017.
  • [12] M. Narita, D. Kimura, and R. Tachibana, “Spatially-weighted anomaly detection,” arXiv:1810.02607, 2018.
  • [13] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” ICLR, 2014.
  • [14] Y. LeCun et al., “MNIST handwritten digit database,” 2010.
  • [15] G. Marin et al., “Hand gesture recognition with leap motion and kinect devices,” in ICIP, 2014.
  • [16] University of Padova, “Hand gesture datasets, http://lttm.dei.unipd.it/downloads/gesture/,” 2014.
• [17] A. Chattopadhyay et al., “Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks,” WACV, 2018.
• [18] D. Kimura, “DAQN: Deep auto-encoder and Q-network,” arXiv preprint arXiv:1806.00630, 2018.