Anomaly detection is an increasingly important area within visual image understanding. Following recent trends in the field, there has been a significant increase in the availability of large datasets. However, in most cases such data resources are highly imbalanced towards examples of normality (non-anomalous), whilst lacking in examples of abnormality (anomalous) and offering only partial coverage of all the possibilities that this latter class could encompass. This variation, and the somewhat unknown nature, of the anomalous class means such datasets lack the capacity and diversity to train traditional supervised detection approaches. In many application scenarios, such as the X-ray screening example illustrated in Figure 1, the availability of anomalous cases may be limited and may evolve over time due to external factors. Within such scenarios, unsupervised anomaly detection has become instrumental in modeling such data distributions, whereby the model is trained only on normal (non-anomalous) samples to capture the distribution of normality, and then evaluated on both unseen normal and abnormal (anomalous) examples to find their deviation from the distribution.
A significant body of prior work exists within anomaly detection for visual scene understanding [1, 2, 3, 4, 5] with a wide range of application domains [6, 7, 8, 9, 10]. A common hypothesis in such anomaly detection approaches is that abnormal samples differ from normality not only in high-dimensional image space but also within the lower-dimensional latent space encoding. Hence, mapping high-dimensional images to a lower-dimensional latent space becomes essential. The critical issue here is that capturing the distribution of the normal samples is rather challenging. Recent developments in Generative Adversarial Networks (GAN), shown to be highly capable of capturing the input data distribution, have led to a renewed interest in the anomaly detection problem. Several contemporary studies demonstrate that the use of GAN has great promise for addressing this problem, since they are inherently adept at mapping high-dimensional images to a lower-dimensional latent encoding and vice-versa with minimal information loss [9, 12, 13].
Schlegl et al. train a pre-trained GAN backwardly to map from image space to lower-dimensional latent space, hypothesizing that differences in latent space would yield anomalies. Zenati et al. jointly train two networks to capture the normal distribution by mapping from image space to latent space, and vice-versa. Akçay et al. train an encoder-decoder-encoder network with the adversarial scheme to capture the normal distribution within the image and latent space. Sabokrou et al. also train an adversarial network to capture the normal distribution, hypothesizing that the model would fail to generate abnormal samples, so that the difference between the original and generated images would yield the abnormality. This prior work in the field [9, 12, 13, 14] empirically illustrates both the importance and promise of detecting anomalies within both image and latent space.
Here we propose a new method for anomaly detection via adversarial training over a skip-connected encoder-decoder (convolutional neural) network architecture. Whilst adversarial training has shown the promise of GAN in this domain, skip-connections within such UNet-style (encoder-decoder) generator networks are known to enable the multi-scale capture of image space detail with sufficient capacity to generate high-quality normal images drawn from the distribution the model has learned. Similar to [9, 12, 13], the proposed approach also seeks to learn the normal distribution in both the image and latent spaces via a GAN generator-discriminator paradigm. The discriminator network not only forces the generator to learn an improved model of the distribution but also works as a feature extractor such that it learns the reconstruction of the normal distribution within a lower-dimensional latent space. Evaluation of the model on various established benchmarks [16, 17] statistically illustrates superior anomaly detection performance over prior work [9, 12, 13]. The main contributions of this paper are as follows:
unsupervised anomaly detection — a unique unsupervised adversarial training regime, over a skip-connected encoder-decoder convolutional network architecture, yields superior reconstruction within the image and latent vector spaces.
efficacy — an efficient anomaly detection algorithm achieving quantitatively and qualitatively superior performance against prior state-of-the-art approaches.
reproducibility — a simple yet effective algorithmic approach that can be readily reproduced.
II Related Work
Anomaly detection is a major area of interest within the field of machine learning, with various real-world applications spanning from biomedical imaging to video surveillance. Recently, a considerable literature has grown up in the field, leading to a proliferation of taxonomy papers [1, 2, 3, 4, 5]. Given current trends, the review in this paper primarily focuses on reconstruction-based anomaly detection approaches.
One of the most influential accounts of anomaly detection using adversarial training comes from Schlegl et al. The authors hypothesize that the latent vector of the GAN represents the distribution of the data. However, mapping to the vector space of the GAN is not straightforward. To achieve this, the authors first train a generator and discriminator using only normal images. In the next stage, they utilize the pre-trained generator and discriminator by freezing the weights and remap to the latent vector by optimizing the GAN based on the vector. During inference, the model pinpoints an anomaly by outputting a high anomaly score, reporting significant improvement over the previous work. The main limitation of this work is its computational complexity, since the model employs a two-stage approach and remapping the latent vector is extremely expensive. In a follow-up study, Zenati et al. investigate the use of BiGAN in an anomaly detection task, examining joint training to map from image space to latent space simultaneously, and vice-versa. Training the model in this manner yields superior results on the MNIST dataset. In a similar study in which image and latent vector spaces are optimized for anomaly detection, Akçay et al. propose an adversarial network in which the generator comprises encoder-decoder-encoder sub-networks. The objective of the model is not only to minimize the distance between the real and fake normal images, but also to jointly minimize the distance between their latent vector representations. The proposed approach achieves state-of-the-art performance both statistically and computationally.
Taken together, these studies support the notion that the use of reconstruction-based approaches shows promise within the field [10, 9, 12, 13, 14]. Motivated by the previous methods in which latent vectors are optimized [9, 12, 13], we propose an anomaly detection approach that utilizes adversarial autoencoders with skip connections. The proposed approach learns representations within both image and latent vector space jointly and achieves numerically superior performance.
III Proposed Approach
Before proceeding to explain our proposed approach, it is important to introduce the fundamental concepts.
III-A1 Generative Adversarial Networks (GAN)
GAN are unsupervised deep neural architectures that learn to capture any input data distribution by predicting features from an initially hidden representation. Initially proposed by Goodfellow et al., the theory behind GAN is based on a competition between two networks within a zero-sum game framework, as used in game theory. The task of the first network, called the Generator, is to capture the distribution of the input dataset for a given class label by predicting features (or images) from a hidden representation, which is commonly a random noise vector. Hence the generator network has a decoder network architecture such that it up-samples the arbitrary input latent representation to generate high-dimensional features. The task of the second network, called the Discriminator, on the other hand, is to predict the correct class (i.e., real vs. fake) based on the given features (or images). The discriminator network usually adopts an encoder network architecture such that, for a given high-dimensional feature, it predicts its class label. With optimization based on a zero-sum game framework, each network strengthens its prediction capability until they reach an equilibrium.
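For reference, the zero-sum game described above is commonly written as the minimax objective of the original GAN formulation. The notation below (generator G, discriminator D, real sample x, noise vector z, data distribution p_data, noise prior p_z, value function V) is the conventional one and is assumed here rather than taken from this paper.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximizes V by assigning high probability to real samples and low probability to generated ones, while the generator minimizes V by producing samples that the discriminator accepts as real.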
Due to their inherent potential for capturing data distributions, there is a growing body of literature that recognizes the importance of GAN. Training two networks jointly to reach an equilibrium, however, is not a straightforward procedure, causing training instability issues. Recently, there has been a surge of interest in addressing the instability issues via several empirical methodologies [21, 22]. The seminal work of Radford et al. pioneered a new approach to stabilize GAN training by using fully convolutional layers and batch normalization throughout the network. Another well-known attempt to stabilize GAN training is the use of the Wasserstein loss in the training objective, which significantly alleviates the training issues [25, 26].
III-A2 Adversarial Auto-Encoders (AAE)
Conceptually similar to GAN, AAE consist of a generator and a discriminator network. The generator has a bow-tie network architecture comprising both an encoder and a decoder. The task of the generator is to reconstruct the input data by first down-sampling it into a latent representation, and then up-sampling the latent vector into the reconstructed data (image). The task of the discriminator network is to predict whether the input is a latent vector from the auto-encoder or from the arbitrarily initialized prior distribution. Training AAE provides superior reconstruction as well as the capability of controlling the latent space [27, 28, 20].
III-A3 Inference within GAN
A strong correlation has been demonstrated between the manipulation of the input noise vector and the output of the generator network [23, 29]. Similar latent space variables have demonstrably produced visually similar high-dimensional images. One approach to finding the optimal latent vectors to create similar images is to inversely map images back to their hidden space via their gradients. Alternatively, with an additional encoder network that down-samples high-dimensional images into a lower-dimensional latent space, vanilla GAN are reported to be capable of learning the inverse mapping. Another way to learn inference via inverse mapping is to jointly train two networks such that the former maps images to latent space, while the latter maps this latent space representation back into the higher-dimensional image space. Based on these previous findings, the primary aim of this paper is to explore inference within GAN by exploiting the latent vector representation in order to find a unique representation for a normal (non-anomalous) data distribution such that it can be statistically differentiated from unseen, unknown and varying abnormal (anomalous) data samples.
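As a toy illustration of the gradient-based inverse mapping mentioned above, the sketch below recovers the latent vector behind an output by gradient descent on the squared reconstruction error. The hand-written linear `generator`, the step size and the iteration count are hypothetical choices made purely for illustration; a real GAN generator is non-linear, and this is not the procedure of any cited work.

```python
def generator(z):
    """Toy linear 'generator' mapping a 2-D latent vector to a 3-D output."""
    return [2.0 * z[0], z[0] + z[1], 3.0 * z[1]]


def invert(x, steps=500, lr=0.05):
    """Recover z such that generator(z) is close to x, by gradient descent.

    Minimizes 0.5 * ||generator(z) - x||^2. For this linear map the
    Jacobian is J = [[2, 0], [1, 1], [0, 3]], so the gradient is J^T r,
    where r is the residual generator(z) - x.
    """
    z = [0.0, 0.0]
    for _ in range(steps):
        r = [g - t for g, t in zip(generator(z), x)]
        grad = [2.0 * r[0] + r[1], r[1] + 3.0 * r[2]]
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


if __name__ == "__main__":
    target = generator([0.5, -1.0])
    print(invert(target))
```

Because the toy map is linear and well conditioned, plain gradient descent converges; for a deep generator the same idea requires automatic differentiation and offers no such guarantee.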
III-B Proposed Approach
III-B1 Problem Definition
This work proposes an unsupervised approach for anomaly detection.
We adversarially train our proposed convolutional network architecture in an unsupervised manner such that the model is trained on normal samples only, and yet tested on both normal and abnormal ones. We define and formulate our problem as follows:
An input dataset is split into train and test sets such that the train set contains only normal samples, each labeled as the normal class. The test set comprises both normal and abnormal samples, labeled as the normal and abnormal classes, respectively.
Based on the dataset defined above, we train our model on the train set and evaluate its performance on the test set. The training objective of the model is to capture the distribution of the normal training data within not only image space but also hidden latent vector space. Capturing the distribution within both spaces by minimizing this objective enables the network to learn higher- and lower-level features that are unique to normal images. We hypothesize that defining an anomaly score based on the training objective would yield minimum anomaly scores for training samples (normal samples), but higher scores for abnormal images. Hence a higher anomaly score for a given sample would indicate that it is abnormal with respect to the distribution of normal data learned by the model during training.
Figure 2 shows a high-level overview of the proposed approach, which comprises a generator and a discriminator network. The generator adopts a bow-tie architecture consisting of an encoder and a decoder network. The encoder network captures the distribution of the input data by mapping a high-dimensional image into a lower-dimensional latent representation. As illustrated in Figure 3, the encoder reads the input through five blocks containing Convolutional and BatchNorm layers as well as a LeakyReLU activation function, and outputs the latent representation, also known as the bottleneck features, which carry a unique representation of the input.
Being symmetrical to the encoder, the decoder network up-samples the latent vector back to the input image dimension and reconstructs the output. Motivated by the U-Net architecture, the decoder adopts a skip-connection approach such that each down-sampling layer in the encoder network is concatenated to its corresponding up-sampling decoder layer (Figure 3). This use of skip connections provides substantial advantages via direct information transfer between the layers, preserving both local and global (multi-scale) information, and hence yielding better reconstruction.
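To make the concatenation pattern concrete, the following sketch only tracks feature-map shapes through a five-block encoder and its mirrored, skip-connected decoder. The 32x32 RGB input, the 4x4 stride-2 convolutions, the base width of 64 channels and the channel-doubling rule are all assumptions chosen for illustration; the text above fixes only the number of blocks and the use of concatenation.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after one strided convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def skip_connection_shapes(size=32, in_ch=3, base_ch=64, blocks=5):
    """Return (encoder, decoder) shape summaries for a U-Net style generator."""
    # Encoder: each block halves the spatial size and doubles the channels.
    encoder = []
    s = size
    for i in range(blocks):
        s = conv_out(s)
        encoder.append((base_ch * 2 ** i, s))

    # Decoder: each block doubles the spatial size. Every block except the
    # last concatenates the encoder feature map of matching size along the
    # channel axis, so the next block sees twice as many channels.
    decoder = []
    s = encoder[-1][1]
    for i in range(blocks - 1, 0, -1):
        skip_ch, skip_size = encoder[i - 1]
        s *= 2
        assert s == skip_size, "skip and up-sampled maps must align"
        decoder.append({"size": s, "upsampled": skip_ch,
                        "skip": skip_ch, "concat": 2 * skip_ch})
    # Final block restores the input resolution and channel count.
    decoder.append({"size": s * 2, "upsampled": in_ch,
                    "skip": 0, "concat": in_ch})
    return encoder, decoder


if __name__ == "__main__":
    enc, dec = skip_connection_shapes()
    print(enc)
    for block in dec:
        print(block)
```

The `concat` entry shows why skip connections widen the decoder: each up-sampled map is stacked with the encoder map of the same resolution before the next block.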
The second network within the pipeline, shown in Figure 3 (b) and called the discriminator, predicts the class label of the given input. In this context, its task is to distinguish real images from the fake ones generated by the generator. The network architecture of the discriminator follows the same structure as the discriminator of the DCGAN approach. Besides being a classifier, the network is also used as a feature extractor such that latent representations of the input image and the reconstructed image are computed. Extracting the features from the discriminator to perform inference within the latent space is the novel part of the proposed approach compared to the previous approaches [9, 12, 13].
Based on this multi-network architecture, explained above and shown in Figure 3, the next section describes the proposed training objective and inference scheme.
III-C Training Objective
As explained in Section III-B1, the idea proposed in this work is to train the model only on normal samples, and test on both normal and abnormal ones. The motivation is that we expect the model to be able to correctly reconstruct the normal samples in both image and latent vector space. Conversely, the network is expected to fail to reconstruct the abnormal samples, as it is never trained on such examples. Hence, for abnormal samples, one would expect a higher loss for the reconstruction of the output image or of the latent representation. To validate this, we propose to combine three loss values (Adversarial, Contextual, Latent), each of which makes its own contribution to the overall training objective.
III-C1 Adversarial Loss
In order to maximize the reconstruction capability for the normal images during training, we utilize the adversarial loss proposed by Goodfellow et al. This loss, shown in Equation 1, ensures that the generator reconstructs a normal image as realistically as possible, while the discriminator classifies the real and the (fake) generated samples. The task here is to minimize this objective for the generator and maximize it for the discriminator.
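One plausible written form of this adversarial loss, assuming the standard GAN objective with the generator conditioned on the input image rather than on a noise vector, is sketched below. The symbols (input image x drawn from the normal data distribution p_x, reconstruction x̂, generator G, discriminator D) are assumed notation for illustration.

```latex
\mathcal{L}_{adv} =
  \mathbb{E}_{x \sim p_{x}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{x \sim p_{x}}\bigl[\log\bigl(1 - D(\hat{x})\bigr)\bigr],
\qquad \hat{x} = G(x)
```

Under this form, the generator lowers the second term by producing reconstructions that the discriminator scores as real, while the discriminator raises both terms by separating real inputs from reconstructions.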
III-C2 Contextual Loss
The adversarial loss defined in Section III-C1 encourages the model to generate realistic samples, but does not guarantee that it learns contextual information regarding the input. To explicitly learn this contextual information and sufficiently capture the input data distribution for the normal samples, we apply normalization to the input and the reconstructed output. This normalization ensures that the model is capable of generating images that are contextually similar to normal samples. The contextual loss of the training objective is given in Equation 2.
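A minimal sketch of such a contextual term follows, assuming it is an L1 (absolute-difference) distance between an image and its reconstruction; the choice of norm and the averaging over pixels are assumptions, and `contextual_loss` is a hypothetical helper operating on flattened pixel lists.

```python
def contextual_loss(x, x_hat):
    """Mean absolute (L1) difference between an image and its reconstruction.

    Both arguments are flattened sequences of pixel intensities of equal
    length. Averaging rather than summing is an assumption of this sketch.
    """
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same size")
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)


if __name__ == "__main__":
    image = [0.0, 0.5, 1.0, 0.25]
    reconstruction = [0.1, 0.5, 0.8, 0.25]
    print(contextual_loss(image, reconstruction))
```

A perfect reconstruction gives a loss of zero, and the value grows with every pixel that the generator fails to reproduce.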
III-C3 Latent Loss
With the adversarial and contextual losses defined above, the model is able to generate realistic and contextually similar images. In addition to these objectives, we aim to make the latent representations of the input and of the generated normal samples as similar as possible. This is to ensure that the network is capable of producing contextually sound latent representations for common examples. As depicted in Figure 3(b), we use the final convolutional layer of the discriminator to extract the features of the input and of the reconstructed output, which serve as their latent representations. The latent representation loss is given in Equation 3.
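The latent term can be sketched as a distance between two feature vectors. The example below assumes a Euclidean (L2) distance and takes the discriminator features as plain lists; both the norm and the hypothetical `latent_loss` helper are assumptions for illustration, and the feature extraction itself is not modeled.

```python
import math


def latent_loss(features_x, features_x_hat):
    """Euclidean (L2) distance between two feature vectors.

    The vectors stand in for discriminator features of the input and of
    its reconstruction; how they are extracted is outside this sketch.
    """
    if len(features_x) != len(features_x_hat):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(features_x, features_x_hat)))


if __name__ == "__main__":
    print(latent_loss([0.2, 0.4, 0.1], [0.2, 0.1, 0.5]))
```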
Finally, the total training objective (Equation 4) is a weighted sum of the losses above, where the weighting parameters adjust the dominance of the individual losses within the overall objective function.
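Written out, a weighted sum of the three losses takes the following form. The symbol names for the losses and their weights are assumed here for illustration, and no particular weight values are implied.

```latex
\mathcal{L} = \lambda_{adv}\,\mathcal{L}_{adv}
            + \lambda_{con}\,\mathcal{L}_{con}
            + \lambda_{lat}\,\mathcal{L}_{lat}
```

Raising one weight relative to the others makes the corresponding term dominate the gradient, which is how the balance between realism, pixel-level fidelity and latent-space fidelity is tuned.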
The anomaly score of a test sample (Equation 5) combines two terms. The reconstruction score measures the contextual similarity between the input and the generated images based on Equation 2. The latent representation score measures the difference between the input and generated images based on Equation 3. A weighting parameter controls the relative importance of the two score functions.
Based on Equation 5, we then compute the anomaly score for each individual test sample in the test set and collect the scores into an anomaly score vector. Finally, following the same procedure as prior work, we also apply feature scaling to this vector to scale the anomaly scores within the probabilistic range of [0, 1]. Hence, the updated anomaly score for an individual test sample is its feature-scaled value.
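The scoring and scaling steps can be sketched as follows. The example assumes that the two scores are combined as a convex combination controlled by a single weight `lam`, and that feature scaling means min-max scaling; both are assumptions, and the function names are hypothetical.

```python
def anomaly_score(reconstruction_score, latent_score, lam):
    """Combine the two scores with one weight (assumed convex combination)."""
    return lam * reconstruction_score + (1.0 - lam) * latent_score


def min_max_scale(scores):
    """Rescale a list of scores to the range [0, 1]."""
    low, high = min(scores), max(scores)
    if high == low:
        # All scores are identical; map them to 0 to avoid dividing by zero.
        return [0.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


if __name__ == "__main__":
    # Hypothetical (reconstruction, latent) score pairs for four test samples.
    pairs = [(0.10, 0.20), (0.12, 0.25), (0.60, 0.90), (0.35, 0.40)]
    raw = [anomaly_score(r, l, lam=0.5) for r, l in pairs]
    print(min_max_scale(raw))
```

After scaling, the sample with the lowest raw score maps to 0 and the one with the highest maps to 1, so the values can be thresholded or fed directly into a ranking metric.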
IV Experimental Setup
This section introduces the datasets, the training and implementation details, and the evaluation criteria used within the experimentation.
To demonstrate the proof of concept of the proposed approach, we validate the model on three different datasets, each of which is explained in the following subsections.
We perform our evaluation using the benchmark CIFAR-10 dataset and also the UBA and FFOB datasets. Using CIFAR-10, we formulate a leave-one-class-out anomaly detection problem. For the application context of X-ray baggage screening, the UBA and FFOB datasets are used to formulate an anomaly detection problem based on the concept of weapon threat items being an anomaly within the security screening process.
Experiments for the CIFAR-10 dataset follow a one-versus-the-rest approach. Following this procedure yields ten different anomaly cases for CIFAR-10, each of which has its own set of normal training samples and a test set containing both normal and abnormal samples.
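The one-versus-the-rest protocol can be sketched as below: one class is held out as abnormal, training keeps only the remaining (normal) classes, and the test labels are binarized. The helper `make_case` and the tiny label lists are hypothetical, and how many samples end up in each split is not modeled here.

```python
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def make_case(train_labels, test_labels, abnormal_class):
    """Build one anomaly case with `abnormal_class` held out.

    Returns the indices of training samples to keep (normal classes only)
    and binary test targets (0 = normal, 1 = abnormal).
    """
    train_idx = [i for i, y in enumerate(train_labels) if y != abnormal_class]
    test_targets = [1 if y == abnormal_class else 0 for y in test_labels]
    return train_idx, test_targets


if __name__ == "__main__":
    train = ["cat", "dog", "cat", "ship"]
    test = ["dog", "cat", "ship"]
    # One case per class gives ten anomaly cases in total.
    for abnormal in CIFAR10_CLASSES:
        kept, targets = make_case(train, test, abnormal)
        print(abnormal, kept, targets)
```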
IV-A2 University Baggage Dataset (UBA)
This in-house dataset comprises 230,275 dual-energy X-ray security image patches extracted via an overlapping sliding window approach. The dataset contains 3 abnormal sub-classes: knife (63,496), gun (45,855) and gun component (13,452). The normal class comprises 107,472 benign X-ray patches, split via an 80:20 train-test ratio.
IV-A3 Full Firearm vs Operational Benign (FFOB)
We also evaluate the performance of the model on the UK government evaluation dataset, comprising both expertly concealed firearm (threat) items and operational benign (non-threat) imagery from commercial X-ray security screening operations (baggage/parcels). Denoted as FFOB, this dataset comprises 4,680 firearm full-weapon images as the abnormal class and 67,672 operational benign images as the normal class.
IV-B Training Details
The training objective from Equation 4 is optimized via the Adam optimizer, configured with an initial learning rate, a lambda decay and two momentum terms. The weighting parameters of the objective are chosen empirically to yield the optimal performance (see Figure 9). The model is initially set to be trained for 15 epochs; however, in most cases it learns sufficient information within fewer training cycles. Therefore, we save the parameters of the network when the performance of the model starts to decrease, since this reduction is a strong indication of over-fitting. The model is implemented using PyTorch (v0.5.1, Python 3.7.1, CUDA 9.3 and CUDNN 7.1). Experiments are performed using an NVIDIA Titan X GPU.
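The checkpointing rule described above can be sketched as a simple early-stopping loop. The monitored metric is assumed to be a per-epoch AUC, and the `patience` parameter and the function name `select_checkpoint` are hypothetical additions for illustration.

```python
def select_checkpoint(auc_per_epoch, patience=1):
    """Pick the epoch whose parameters would be kept.

    Scans per-epoch scores in order, remembers the best one, and stops
    once the score has failed to improve for `patience` consecutive epochs.
    Returns (best_epoch_index, best_score).
    """
    best_epoch, best_auc, stale = -1, float("-inf"), 0
    for epoch, auc in enumerate(auc_per_epoch):
        if auc > best_auc:
            # In a real training loop the network parameters are saved here.
            best_epoch, best_auc, stale = epoch, auc, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_auc


if __name__ == "__main__":
    history = [0.61, 0.70, 0.78, 0.74, 0.73]  # hypothetical values
    print(select_checkpoint(history))
```

A larger `patience` tolerates short dips in performance at the cost of more training epochs before stopping.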
Table II presents the experimental results for the UBA and FFOB datasets. It is apparent from this table that the proposed method significantly outperforms the prior work in each anomaly case of the datasets. Of significance, for the most challenging abnormality case (knife), the method proposed here achieves a higher AUC than the best result of the prior work.
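For readers unfamiliar with the metric, AUC can be computed directly from anomaly scores as the probability that a randomly chosen abnormal sample scores higher than a randomly chosen normal one, with ties counted as one half. The pairwise implementation below is a minimal sketch for small score lists, with hypothetical values; it is not the evaluation code used for Table II.

```python
def auc(normal_scores, abnormal_scores):
    """Area under the ROC curve from two lists of anomaly scores.

    Counts, over all (abnormal, normal) pairs, how often the abnormal
    sample receives the higher score; ties contribute one half.
    """
    if not normal_scores or not abnormal_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(normal_scores) * len(abnormal_scores))


if __name__ == "__main__":
    print(auc([0.1, 0.8], [0.5, 0.9]))
```

The pairwise loop is quadratic in the number of samples, so a rank-based implementation is preferable for full test sets.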
A notable observation is that the proposed model is capable of generating both normal and abnormal reconstructed outputs at test time, meaning that it captures the distribution of both domains. This is probably due to the use of skip connections, which enable reconstruction even for the abnormal test samples.
The qualitative results of Figure 7, supported by the quantitative results of Table II, reveal that abnormality detection is successfully made in the latent object space of the model that emerges from our adversarial training over the proposed skip-connected architecture.
Figures 5 and 6 show the histogram plot (a) of the normal and abnormal scores for the test data, and the t-SNE plot (b) of the normal and abnormal features extracted from the last convolutional layer of the discriminator (see Figure 3). Closer inspection of the figures reveals that the model yields promising separation within both the output anomaly (reconstruction) score and the preceding convolutional feature spaces.
Overall, these results indicate that the proposed approach yields superior anomaly detection performance to the previous state-of-the-art approaches.
This paper introduces a novel unsupervised anomaly detection architecture within an adversarial training scheme. The proposed approach examines the role of skip connections within the generator and of feature extraction from the discriminator for the manipulation of hidden features. Based on an evaluation across multiple datasets from different domains and of varying complexity, the findings indicate that skip connections provide more stable training, and that inference learning from the discriminator achieves numerically superior results compared to the previous state-of-the-art methods. The empirical findings in this study provide an insight into the generalization capability of the proposed method to any anomaly detection task. Further research could also be conducted to determine the effectiveness of the proposed approach on both higher resolution images and various other anomaly detection tasks containing temporal information.
-  M. Markou and S. Singh, “Novelty detection: a review—part 1: statistical approaches,” Signal Processing, vol. 83, no. 12, pp. 2481–2497, dec 2003. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0165168403002020
-  ——, “Novelty detection: a review—part 2: neural network based approaches,” Signal Processing, vol. 83, no. 12, pp. 2499–2521, dec 2003. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0165168403002032
-  V. Hodge and J. Austin, “A Survey of Outlier Detection Methodologies,” Artificial Intelligence Review, vol. 22, no. 2, pp. 85–126, oct 2004. [Online]. Available: http://link.springer.com/10.1023/B:AIRE.0000045502.10941.a9
-  V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection,” ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, jul 2009.
-  M. A. Pimentel, D. A. Clifton, L. Clifton, and L. Tarassenko, “A review of novelty detection,” Signal Processing, vol. 99, pp. 215–249, 2014.
-  A. Abdallah, M. A. Maarof, and A. Zainal, “Fraud detection system: A survey,” Journal of Network and Computer Applications, vol. 68, pp. 90–113, jun 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1084804516300571
-  M. Ahmed, A. Naser Mahmood, and J. Hu, “A survey of network anomaly detection techniques,” Journal of Network and Computer Applications, vol. 60, pp. 19–31, jan 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1084804515002891
-  M. Ahmed, A. N. Mahmood, and M. R. Islam, “A survey of anomaly detection techniques in financial domain,” Future Generation Computer Systems, vol. 55, pp. 278–288, feb 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X15000023
-  T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10265 LNCS, pp. 146–147, 2017.
-  B. R. Kiran, D. M. Thomas, and R. Parakkal, “An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos,” Journal of Imaging, vol. 4, no. 2, p. 36, 2018.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
-  H. Zenati, C. S. Foo, B. Lecouat, G. Manek, and V. R. Chandrasekhar, “Efficient gan-based anomaly detection,” arXiv preprint arXiv:1802.06222, 2018.
-  S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “Ganomaly: Semi-supervised anomaly detection via adversarial training,” arXiv preprint arXiv:1805.06725, 2018.
-  M. Sabokrou, M. Khalooei, M. Fathy, and E. Adeli, “Adversarially learned one-class classifier for novelty detection,” in , 2018, pp. 3379–3388.
-  O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
-  A. Krizhevsky, V. Nair, and G. Hinton, “The cifar-10 dataset,” online: http://www.cs.toronto.edu/~kriz/cifar.html, 2014.
-  “OSCT Borders X-ray Image Library, UK Home Office Centre for Applied Science and Technology (CAST),” Publication Number: 146/16, 2016.
-  J. Donahue, P. Krähenbühl, and T. Darrell, “Adversarial Feature Learning,” in International Conference on Learning Representations (ICLR), Toulon, France, apr 2017. [Online]. Available: http://arxiv.org/abs/1605.09782
-  Y. LeCun and C. Cortes, “MNIST handwritten digit database,” 2010. [Online]. Available: http://yann.lecun.com/exdb/mnist/
-  A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, “Generative adversarial networks: An overview,” IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 53–65, 2018.
-  T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training gans,” in Advances in Neural Information Processing Systems, 2016, pp. 2234–2242.
-  M. Arjovsky and L. Bottou, “Towards Principled Methods for Training Generative Adversarial Networks,” in 2017 ICLR, April 2017. [Online]. Available: http://arxiv.org/abs/1701.04862
-  A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” in ICLR, 2016.
-  S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 07–09 Jul 2015, pp. 448–456. [Online]. Available: http://proceedings.mlr.press/v37/ioffe15.html
-  M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 06–11 Aug 2017, pp. 214–223. [Online]. Available: http://proceedings.mlr.press/v70/arjovsky17a.html
-  I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of wasserstein gans,” in Advances in Neural Information Processing Systems, 2017, pp. 5767–5777.
-  M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.
-  A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey, “Adversarial autoencoders,” in ICLR, 2016.
-  X. Chen, X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets,” in Advances in Neural Information Processing Systems, 2016, pp. 2172–2180.
-  A. Creswell and A. A. Bharath, “Inverting the generator of a generative adversarial network (ii),” arXiv preprint arXiv:1802.05701, 2018.
-  Z. C. Lipton and S. Tripathi, “Precise recovery of latent vectors from generative adversarial networks,” in ICLR Workshop, 2017.
-  V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Arjovsky, and A. Courville, “Adversarially learned inference,” in ICLR, 2017.
-  S. Akcay, M. E. Kundegorski, C. G. Willcocks, and T. P. Breckon, “Using deep convolutional neural network architectures for object classification and detection within x-ray baggage security imagery,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 9, pp. 2203–2215, Sept 2018.
-  D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations (ICLR), vol. 5, 2015.
-  A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” 2017.
-  C. X. Ling, J. Huang, H. Zhang et al., “Auc: a statistically consistent and more discriminating measure than accuracy,” in IJCAI, vol. 3, 2003, pp. 519–524.