Outliers are observations whose characteristics differ significantly from those of the majority of the data. These observations are so unusual as to arouse suspicion that they were generated by illegal acts or undetected errors. To reveal the critical and interesting information in them, many outlier detection techniques have been studied and applied in various applications, such as fraud detection in credit card transactions [1, 2, 3], fake rating and review detection on e-commerce platforms [4, 5], intrusion detection in network service requests [6, 7], and abnormal moving object detection in traffic monitoring.
In general, according to the availability of data labels, existing methods can be divided into three categories: unsupervised, supervised, and semi-supervised outlier detection. Unsupervised algorithms are among the most widely studied because they do not require additional labels or prior information; they include statistical-based [9, 10], cluster-based, regression-based, proximity-based [13, 14, 15], reconstruction-based [16, 17, 18], and other approaches. They assume, explicitly or implicitly, that outliers are not as concentrated as normal data. Thus, discrete anomalies can be detected effectively. However, in many cases, multiple anomalies (e.g., those produced by a DoS attack) may be generated by the same mechanism. Such anomalies become so concentrated that unsupervised outlier detection incorrectly identifies these group anomalies as normal data. Moreover, without prior knowledge, the selection of models and parameters is a considerable challenge for unsupervised methods. As for supervised algorithms, higher detection rates and optimal parameters can usually be obtained because the labels are complete and correct during training. However, obtaining sufficient anomalies and correct labels is a time-consuming task. In addition, detection models trained on fully labeled data have considerable uncertainty when dealing with emerging anomalies.
To address these issues, semi-supervised outlier detection with few identified anomalies and abundant unlabeled data was proposed. Although labeling all normal examples or outliers is rarely feasible, a few abnormal behaviors that have triggered an alarm can be collected easily in many applications. Examples include DoS attacks that have caused a system crash and insurance applications that have been proven to be fraudulent. In addition to their own labels, these identified anomalies can also provide a priori information about other samples that have the same generation mechanism. If this information is fully utilized, a semi-supervised model can not only identify discrete anomalies but also detect partially identified group anomalies. Moreover, a few anomalies can also provide valuable guidance for the selection of models and parameters, which is a significant advantage over unsupervised outlier detection. Thus, this paper focuses on this special anomaly detection setting, in the hope of using limited labels to achieve high detection accuracy.
The initial model [22, 23] first extracts reliable normal examples through a heuristic method, which is completely consistent with the first step of PU-learning. Then a modified outlier detection model is trained on the newly labeled dataset to identify the other anomalies. However, since outliers are usually discrete or belong to different clusters, the extracted samples that are significantly different from the identified anomalies are not necessarily normal. As a result, the potential information in the identified anomalies is not used effectively, and erroneous information may also be introduced into the newly labeled dataset. Therefore, to make better use of known information and reduce the introduction of erroneous information, several soft versions of the above strategy were established. For example, LBS-SVDD assigns an abnormal likelihood value to each sample based on the proportion of anomalies among its neighbors, while ADOA attaches a weight to each instance according to its own isolation and its similarity to the identified anomalies. However, the calculation of neighbors and similarity usually has a high computational cost and is likely to be affected by irrelevant variables.
In this paper, we propose a one-step method for semi-supervised outlier detection with few identified anomalies, which can directly utilize the potential information in identified anomalies without calculating the abnormal degree of each instance. Specifically, the Dual Generative Adversarial Networks (Dual-GAN) model contains two Multiple-Objective Generative Adversarial Networks (i.e., UMO-GAN and AMO-GAN) and an overall discriminator. The Unlabeled MO-GAN (UMO-GAN) is used to learn the generation mechanism of unlabeled data and gradually generates informative potential outliers to provide a reasonable reference distribution for unlabeled data. By contrast, the Abnormal MO-GAN (AMO-GAN) is used to learn the deep representation of identified anomalies and generates numerous potential anomalies with the same generation mechanism as known anomalies to enhance the minority class. Thus, in order to distinguish these identified and synthesized anomalies from the unlabeled data, the overall discriminator will not only describe a division boundary that encloses the concentrated data, but will also separate partially identified group anomalies from the concentrated data. In addition, considering that instances with similar output values are not necessarily close to one another in the sample space, we replace the MO-GAN with Multiple Generative Adversarial Networks (M-GAN). More specifically, the modified model RCC-Dual-GAN first divides the identified anomalies and unlabeled data into different subsets through Robust Continuous Clustering (RCC). Then, multiple GANs are utilized to learn their generation mechanisms directly. Compared with the original model Dual-GAN, RCC-Dual-GAN can create the reference distribution and augment the minority class more robustly in various situations. The main contributions of this work are summarized as follows:
We propose a semi-supervised outlier detection method, Dual-GAN, which consists of two MO-GANs and an overall discriminator. The method utilizes the potential information in identified anomalies directly to detect discrete anomalies and partially identified group anomalies simultaneously.
Considering that instances with similar output values may not all be similar in a complex data structure, we change the original model Dual-GAN to RCC-Dual-GAN by replacing MO-GAN with the combination of RCC and M-GAN. Compared with Dual-GAN, the modified model can create the reference distribution and augment the minority class more robustly.
Considering the difficulty in finding the Nash equilibrium and optimal model during iteration, two evaluation indicators are created and introduced into the two models to make the detection process more intelligent.
We conduct extensive experiments on both benchmark datasets and two practical tasks to investigate the performance of our proposed approaches. The results show that even with only a few identified anomalies, our proposed approaches can significantly improve the accuracy of outlier detection.
The rest of this paper is organized as follows. Section 2 provides a brief review of related works. Section 3.1 introduces the detection principle and model details of MO-GAN, and the proposed models are described in Section 3.2. We report extensive experiment results in Section 4 and the whole paper is concluded in Section 5.
2 Related Work
Numerous overviews on outlier detection algorithms for different kinds of data and applications are available in the literature [21, 27]. Here, we briefly discuss the common outlier detection methods (i.e., unsupervised and supervised approaches) and then focus on the semi-supervised outlier detection with limited labels, which is most relevant to our research. Finally, the GAN-based outlier detection algorithms are reviewed in Section 2.3.
2.1 Common Outlier Detection Methods
Unsupervised outlier detection methods have been studied widely because they require no additional labels. Specific algorithms include proximity- [13, 14, 15], statistical- [9, 10], cluster-, regression-, and reconstruction-based models [16, 17, 18]. Proximity-based models assume that outliers are points far away from other data and operate by measuring the distance or density of each point. By contrast, the remaining models assume that outliers are observations that deviate strongly from the normal profiles and operate by fitting a model to the majority of samples. However, all these algorithms are based on the assumption that outliers are not as concentrated as the normal data, such that group anomalies with higher density levels cannot be detected correctly. In addition, most of them must be provided with model assumptions or parameters in advance, which is a major challenge for unsupervised methods without the help of prior knowledge.
Supervised outlier detection can be considered a special classification problem, and many classification algorithms have been applied to it. However, in most practical applications, outliers are far less common than normal data, so the direct use of off-the-shelf classifiers may produce biased results. Hence, cost-sensitive learning and adaptive re-sampling [1, 29, 30] were later incorporated into the classification process. Cost-sensitive learning increases the misclassification cost of outliers by weighting the classification errors, whereas adaptive re-sampling increases the relative proportion of the minority class by under- or over-sampling. Supervised algorithms usually achieve good parameters and high detection rates because the labels are complete during training. The difficulty lies in obtaining sufficient anomalies and correct labels, which is a time-consuming and expensive task. Moreover, a detection model trained on fully labeled data has significant uncertainty in dealing with emerging anomalies.
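The two remedies for class imbalance mentioned above can be illustrated with a minimal sketch. The function names, the default weight of 10, and the convention that label 1 marks an outlier are assumptions made for illustration only; they are not taken from the cited works.

```python
import math
import random


def weighted_log_loss(labels, probs, outlier_weight=10.0):
    """Cross-entropy in which errors on outliers (label 1) are weighted more heavily."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        w = outlier_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


def random_oversample(rows, labels, rng):
    """Duplicate random outlier rows (label 1) until both classes have equal size.

    Assumes that outliers are the minority class.
    """
    minority = [i for i, y in enumerate(labels) if y == 1]
    n_majority = len(labels) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_majority - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]
```

Weighting changes the loss without touching the data, whereas over-sampling changes the data without touching the loss; GAN-based augmentation, discussed later, is a learned alternative to the random duplication used here.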
2.2 Semi-Supervised Outlier Detection Methods
According to the available labels, semi-supervised outlier detection can be divided into three categories: one-class learning with only normal examples, semi-supervised outlier detection with a small amount of labeled data, and semi-supervised outlier detection with few identified anomalies. One-class learning is only slightly different from unsupervised outlier detection, and most unsupervised approaches (e.g., OC-SVM and SVDD) can be used in this case. An outlier detection model established on a one-class dataset tends to be more robust because of the absence of additional interference from anomalies. However, considerable time must be spent verifying the collected samples to ensure that the training data contain only normal data. Semi-supervised outlier detection with a small amount of labeled data usually optimizes an outlier detection model (e.g., k-means and fuzzy rough c-means) under the constraint that the labels of the labeled data remain almost unchanged. Compared with unsupervised models, the performance of these methods is improved by the small amount of labeled data. However, the potential information in the labeled examples is not used effectively, and the normal examples in the labeled data may still require additional confirmation because of undetected anomalies.
Compared with these settings, semi-supervised outlier detection with few identified anomalies is much easier to set up because a few abnormal behaviors can be easily collected in many applications. The initial model [22, 23] first extracts reliable normal examples through a heuristic method, and then trains a semi-supervised outlier detection model of the kind described above on the newly labeled dataset. However, since outliers are usually discrete or belong to different clusters, the extracted samples that are far from the identified anomalies are not necessarily normal. As a result, the potential information in the identified anomalies is not fully utilized, and erroneous information may also be introduced into the new dataset. To address this, several soft versions of the above strategy were then established. For example, before training the detection model, LBS-SVDD first evaluates the abnormal probability of each sample based on the proportion of anomalies among its neighbors, whereas ADOA assigns a likelihood value to each sample according to its own isolation and its similarity to the identified anomalies. They enhance the use of known information and simultaneously reduce the introduction of erroneous information. However, the calculation of these probabilities may incur high computational costs on large datasets and tends to be affected by the “curse of dimensionality” on high-dimensional datasets. Therefore, we propose a GAN-based model for semi-supervised outlier detection with few identified anomalies, which can utilize the potential information in identified anomalies directly.
2.3 GAN-Based Outlier Detection Methods
GAN is an adversarial representation learning model that has achieved state-of-the-art performance in various applications. For unsupervised outlier detection and semi-supervised outlier detection with only normal examples, both GAN-based reconstruction models and GAN-based generation models have been studied. GAN-based reconstruction models usually learn the generation mechanism of normal data by training a regular GAN or a combination of a GAN and an autoencoder [36, 37, 18], and then measure the abnormal degree of an example based on the reconstruction loss or discriminator loss. Moreover, in order to prevent slight anomalies from being reconstructed, Bian et al. also perform active negative training to limit the generative capability of the network. GAN-based generation models usually use the GAN to generate informative potential outliers [25, 39] or infrequent normal samples, such that subsequent detectors can describe a correct boundary. For supervised outlier detection, GAN [1, 30] is often used to synthesize minority class examples to balance the relative proportion between the two classes. In addition, Zheng et al. take advantage of an adversarial deep denoising autoencoder to better extract the latent representation of labeled transactions, which can greatly improve the accuracy of fraud detection. However, so far, few GAN-based studies have focused on semi-supervised outlier detection with few identified anomalies. Although Kimura et al. utilize both noisy normal images and given abnormal images for visual inspection, their main purpose is to eliminate the impact of abnormal pixels in some normal images during the reconstruction process, which differs considerably from our model.
In this section, we first introduce the detection principle of MO-GAN, an unsupervised outlier detection method based on Artificially Generating Potential Outliers (AGPO), which is necessary to comprehend our proposed methods. Then two semi-supervised outlier detection models (i.e., Dual-GAN and RCC-Dual-GAN) are proposed to effectively improve the detection rate of outliers.
3.1 Background on MO-GAN
Unsupervised outlier detection can be regarded as a density level detection process due to its default assumption. Unlike existing model- or proximity-based outlier detection, AGPO-based algorithms approach density level detection as a classification problem. First, numerous data points are randomly sampled as potential outliers (shown with gray dots in Fig. 1) to construct a reference distribution. Then a classifier is trained on the new dataset to separate the potential outliers from the original data (shown with blue dots and stars in Fig. 1). To minimize the loss function, the classifier should assign a higher value to original data with a higher relative density and a lower value in the opposite case. Thus, when the reference distribution is uniform, the classifier can describe a division boundary that encloses the concentrated normal samples (as shown in Fig. 1(a)).
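The AGPO principle can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the reference distribution is uniform over the bounding box of the data, and a plain k-nearest-neighbor vote stands in for the classifier (the original method is not tied to this choice). The function name and parameters are hypothetical.

```python
import math
import random


def agpo_scores(data, n_reference, k, rng):
    """Score each point by the share of real points among its k nearest neighbours
    in the pooled set of real data and uniformly sampled potential outliers."""
    dims = range(len(data[0]))
    low = [min(p[d] for p in data) for d in dims]
    high = [max(p[d] for p in data) for d in dims]
    # Potential outliers: uniform samples inside the bounding box of the data.
    reference = [[rng.uniform(low[d], high[d]) for d in dims] for _ in range(n_reference)]
    pooled = [(p, 1) for p in data] + [(p, 0) for p in reference]  # 1 = real, 0 = generated
    scores = []
    for i, point in enumerate(data):
        # Distances to every pooled point except the point itself.
        neighbours = sorted(
            (math.dist(point, other), label)
            for j, (other, label) in enumerate(pooled)
            if j != i
        )[:k]
        scores.append(sum(label for _, label in neighbours) / k)
    return scores
```

Points in dense regions are surrounded mostly by real data and receive scores near 1, whereas isolated points are surrounded mostly by generated points and receive scores near 0, which mirrors the relative-density argument above.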
Fig. 1: Illustration of the detection performance of AGPO and MO-GAN. Normal points, outliers, and potential outliers are shown with blue dots, blue stars, and gray dots, respectively. High-dimensional data are presented as cross-sectional data, and data points closer to the green area are more likely to be outliers.
However, when the dimension increases, a limited number of potential outliers cannot provide sufficient information for the classifier to describe a correct boundary (as shown in Fig. 1(b)). Therefore, MO-GAN was proposed to generate informative potential outliers directly to construct a reasonable reference distribution, which ensures that the relative density level of a normal case is greater than that of an outlier.
MO-GAN (shown in Fig. 2) consists of multiple sub-generators and a discriminator. Its central idea is to let each sub-generator actively learn the generation mechanism of the data in a specific subset and gradually generate potential outliers that occur inside or close to those data. Thus, the integration of different numbers of potential outliers can provide a reasonable reference distribution for the whole dataset. More specifically, because samples with similar outputs are more likely to be similar, MO-GAN first divides the original dataset equally into subsets based on their outputs. Then, a dynamic game is executed between the sub-generators and the discriminator. Each sub-generator attempts to learn the generation mechanism of its subset by making the generated samples produce output values similar to those of that subset, whereas the discriminator attempts to distinguish the generated outliers from the original data, like the classifier in AGPO. Eventually, MO-GAN reaches a Nash equilibrium after several iterations. The integrated potential outliers, generated in different numbers for different subsets, construct a reasonable reference distribution, and the discriminator can describe a correct boundary that encloses the concentrated original data (as shown in Fig. 1(c) and Fig. 2).
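The equal division by similar outputs can be sketched as follows. This is a minimal illustration, assuming that the discriminator outputs are available as a plain list of floats; the function name is hypothetical.

```python
def split_by_output(outputs, n_subsets):
    """Partition sample indices into near-equal groups of similar discriminator output."""
    order = sorted(range(len(outputs)), key=lambda i: outputs[i])
    base, extra = divmod(len(order), n_subsets)
    subsets, start = [], 0
    for s in range(n_subsets):
        size = base + (1 if s < extra else 0)  # spread any remainder over the first groups
        subsets.append(order[start:start + size])
        start += size
    return subsets
```

For example, `split_by_output([0.9, 0.1, 0.5, 0.3, 0.7, 0.2], 3)` groups the two lowest-scoring samples, the two middle ones, and the two highest-scoring ones.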
3.2 Outlier Detection with Few Identified Anomalies
The largest problem with unsupervised outlier detection (including MO-GAN) is that it cannot detect group anomalies in the absence of additional information. Complete labels and sufficient anomalies are difficult to obtain, but a few common anomalous behaviors (e.g., DoS and DDoS attacks) that have triggered alarms can be collected easily in many applications. These identified anomalies not only carry their own labels, but also potentially provide a priori information about other samples that share the same generation mechanism. If this information is fully utilized, partially identified group anomalies will be detected accurately along with the discrete anomalies. Therefore, this section proposes two semi-supervised outlier detection approaches, namely, Dual-GAN and RCC-Dual-GAN, which can improve the detection accuracy by directly utilizing the potential information in identified anomalies.
Assume a dataset that consists of a few identified anomalies and abundant unlabeled samples, where each data point carries a label indicating whether it is an identified anomaly. Our goal is to identify a scoring function that assigns a higher value (close to 1) to normal data and a lower value (close to 0) to outliers. Given the few identified anomalies, this scoring function should satisfy two conditions: (i) Based on the default assumption that outliers are not concentrated, the scoring function should output higher values for samples with higher density levels and lower values for discrete data. (ii) Assuming that samples with the same generation mechanisms as identified anomalies are more likely to be outliers, the scoring function should output a value close to 0 for them and for the identified anomalies. Thus, we first propose the Dual-GAN (shown in Fig. 3), which consists of two MO-GANs (i.e., UMO-GAN and AMO-GAN) and an overall discriminator.
The UMO-GAN attempts to generate samples that occur inside or around the target data to construct a reasonable reference distribution for the unlabeled data. It takes the unlabeled samples as input and includes multiple sub-generators and a discriminator. Each sub-generator learns the generation mechanism of the data in its subset by making the generated samples produce output values similar to those of that subset, whereas the discriminator guides the learning of the sub-generators by distinguishing the generated samples from the unlabeled data. The optimization framework of UMO-GAN is formulated as follows:
where the target of each sub-generator is a representative statistic of the discriminator outputs for its subset (e.g., the minimum value). As the sub-generators and the discriminator are updated alternately, each sub-generator gradually generates informative potential outliers. Ultimately, when the dynamic game reaches the Nash equilibrium, the integration of different numbers of potential outliers (shown with gray dots in Fig. 3) provides a reasonable reference distribution for the unlabeled dataset.
The AMO-GAN is used to generate samples similar to the identified anomalies to prevent the overall discriminator from overfitting or forgetting when dealing with the minority class. It takes the identified anomalies as input and includes multiple sub-generators and a discriminator. Each sub-generator learns the generation mechanism of the data in its subset, and the discriminator distinguishes the generated samples from the identified anomalies. The optimization framework of AMO-GAN is formulated as follows:
where the target of each sub-generator is a representative statistic of the discriminator outputs for its subset. Unlike UMO-GAN, AMO-GAN continues training after it reaches the Nash equilibrium because its purpose is to generate data points that are as similar as possible to the identified anomalies. Finally, the integration of numerous potential outliers (shown with gray stars in Fig. 3) can augment the minority class to ensure that partially identified group anomalies are detected as anomalies.
The overall discriminator, which takes all original data and generated potential outliers as input, attempts to describe an accurate division boundary by distinguishing all potential outliers (i.e., those generated by UMO-GAN and AMO-GAN) and the identified anomalies from the unlabeled data. The optimization function of the overall discriminator is formulated as follows:
where the two counts denote the numbers of potential outliers generated by the sub-generators of UMO-GAN and AMO-GAN, respectively. More potential outliers must be generated for a less concentrated subset to create a reasonable reference distribution. At the beginning of the iteration, randomly generated potential outliers may not provide sufficient information for the overall discriminator. However, when the two MO-GANs reach the Nash equilibrium, the integration of different numbers of potential outliers can provide a reasonable reference distribution for the unlabeled data, whereas the integration of numerous potential outliers can augment the minority class. Thus, in order to minimize its optimization function, the overall discriminator will not only assign a higher value (close to 1) to concentrated unlabeled data, but also assign a lower value (close to 0) to discrete anomalies and partially identified group anomalies (shown in Fig. 3(a)), which is the scoring function we are looking for. Compared with unsupervised detection using only UMO-GAN (shown in Fig. 3(b)), Dual-GAN can also detect group anomalies with the help of a few identified anomalies. Compared with supervised detection using only AMO-GAN (shown in Fig. 3(c)), Dual-GAN can also detect previously unknown discrete anomalies.
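The role of the overall discriminator can be illustrated with a simple loss sketch. It assumes a standard binary cross-entropy in which unlabeled data are pushed toward 1 and both the generated potential outliers and the identified anomalies are pushed toward 0, with every sample weighted equally. The exact objective and weighting used in the paper may differ, and the function name is hypothetical.

```python
import math


def overall_discriminator_loss(d_unlabeled, d_generated, d_identified):
    """Mean binary cross-entropy of the overall discriminator (illustrative form).

    d_unlabeled:  discriminator outputs on unlabeled data (target 1)
    d_generated:  outputs on potential outliers from both generators (target 0)
    d_identified: outputs on identified anomalies (target 0)
    """
    eps = 1e-12

    def clip(p):
        return min(max(p, eps), 1 - eps)  # keep log() finite

    toward_one = [math.log(clip(p)) for p in d_unlabeled]
    toward_zero = [math.log(1 - clip(p)) for p in list(d_generated) + list(d_identified)]
    return -(sum(toward_one) + sum(toward_zero)) / (len(toward_one) + len(toward_zero))
```

Under this assumed form, the loss is small only when concentrated unlabeled data score close to 1 while both kinds of anomalies score close to 0, which matches the two conditions placed on the scoring function.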
In addition, two issues that have a substantial effect on the results, namely, the evaluation of Nash equilibrium and the selection of optimal model, must be discussed to ensure a more intelligent and reliable detection.
Nash equilibrium in a GAN means that the generator has learned the distribution of the real data and the discriminator cannot recognize the difference between the two distributions. The original GAN uses the classification error to evaluate the similarity between the generated data and the real data; that is, the Nash equilibrium is reached when the error is close to 0.5. However, an absolute Nash equilibrium cannot be guaranteed when the objective function is non-convex. The previously proposed MO-GAN utilizes the trend of the generator loss to evaluate this similarity; that is, the Nash equilibrium is considered reached when the decline of the generator loss slows down. However, accurate assessment of the trend requires human intervention because the loss fluctuates. Therefore, we propose an evaluation indicator, the Nearest Neighbor Ratio (NNR), to directly measure the similarity between the two distributions. First, a number of samples are selected randomly from one subset, and for each sample the ratio of data belonging to the other subset among its nearest neighbors is calculated. If this ratio is greater than a certain threshold, the sample can be regarded as having a generation mechanism similar to that of the data in the other subset. Then, the NNR is calculated as the proportion of the randomly selected samples that have such a similar generation mechanism. If the NNR is greater than a second threshold, the two subsets are considered to be generated from similar distributions and the dynamic game is considered to have reached the Nash equilibrium.
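The NNR procedure can be sketched as follows. The parameter names (`n_samples`, `k`, `ratio_threshold`) are hypothetical, Euclidean distance is assumed, and the sketch uses a brute-force neighbor search for clarity rather than efficiency.

```python
import math
import random


def nearest_neighbor_ratio(subset_a, subset_b, n_samples, k, ratio_threshold, rng):
    """Share of sampled points from subset_a whose k nearest neighbours contain
    a proportion of subset_b points greater than ratio_threshold."""
    picked = rng.sample(range(len(subset_a)), min(n_samples, len(subset_a)))
    similar = 0
    for i in picked:
        # Candidate neighbours: all other points of subset_a (flag 0) and all of subset_b (flag 1).
        candidates = [(math.dist(subset_a[i], p), 0) for j, p in enumerate(subset_a) if j != i]
        candidates += [(math.dist(subset_a[i], p), 1) for p in subset_b]
        nearest = sorted(candidates)[:k]
        ratio = sum(flag for _, flag in nearest) / k
        if ratio > ratio_threshold:
            similar += 1
    return similar / len(picked)
```

When the generated samples overlap the real subset, most sampled points find many generated points among their neighbors and the NNR is high; when the two sets are far apart, the NNR drops to zero. Comparing the returned value with the second threshold then gives the equilibrium decision.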
The optimal model is the model that most effectively identifies outliers in the whole dataset during iteration. Given no additional information, the evaluation of detection performance and the selection of the optimal model are difficult for unsupervised outlier detection. Fortunately, the data used to train the semi-supervised outlier detection model usually contain a few identified anomalies, which can provide valuable guidance for the selection of the final model. In this paper, we use the Average Position (AP) of the known anomalies in the ascending order of the outputs for all real data to measure the performance of the overall discriminator. A lower AP means that the model assigns lower values to the identified anomalies than to other data, and the model with the lowest AP is used as the final model for subsequent detection.
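A minimal sketch of the AP computation follows, assuming positions are counted from 1 and the identified anomalies are given as a list of indices; the function name is hypothetical.

```python
def average_position(outputs, identified):
    """Average 1-based rank of the identified anomalies when all real samples
    are sorted by discriminator output in ascending order."""
    order = sorted(range(len(outputs)), key=lambda i: outputs[i])
    position = {idx: rank for rank, idx in enumerate(order, start=1)}
    return sum(position[i] for i in identified) / len(identified)
```

For outputs `[0.9, 0.1, 0.8, 0.2, 0.7]` with identified anomalies at indices 1 and 3, the anomalies occupy positions 1 and 2, so the AP is 1.5. Model selection then amounts to keeping the checkpoint whose AP is smallest.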
In general, Dual-GAN can achieve good detection performance. However, as the cluster structure of the data becomes more complex, instances with similar output values may not all be similar to one another in the sample space, that is, the data points divided according to their similar outputs are not necessarily close to each other, and the generated data whose outputs are similar to that of target data are not necessarily similar to the target data. Therefore, we then propose a modified model RCC-Dual-GAN based on Dual-GAN to create the reference distribution and augment the minority class more robustly. The network structure and detection process of RCC-Dual-GAN are illustrated in Fig. 4, where the unlabeled data and identified anomalies are first divided into different subsets by RCC.
RCC is a non-parametric clustering method that can achieve high clustering accuracy across multiple domains without knowing the number of clusters. Taking the unlabeled data as an example, RCC first constructs a connectivity structure based on mutual k-nearest neighbor connectivity. Then, a set of representatives of the unlabeled data is optimized to reveal the cluster structure latent in the data. Each representative should be as similar as possible to the corresponding unlabeled data point, and the representatives of interconnected data points should be as similar as possible to each other. The optimization objective is formulated as follows:
where a coefficient balances the strength of the different objective terms, a pairwise weight balances the contribution of each point to the pairwise terms, and a penalty function is applied to the regularization terms. Finally, based on the optimized representatives, RCC constructs a graph in which a pair of points is connected if their representatives are sufficiently close, and the connected components of this graph are output as the different unlabeled subsets. Compared with the subsets divided by similar outputs, the subsets partitioned by RCC can accurately reflect the cluster structure latent in the data even in the case of complex data structures.
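Two building blocks of this procedure, the mutual k-nearest-neighbor connectivity and the extraction of subsets as connected components of a graph, can be sketched as follows. The optimization of the representatives is deliberately omitted, so this is not an implementation of RCC itself; the function names are hypothetical and Euclidean distance is assumed.

```python
import math


def mutual_knn_edges(points, k):
    """Edges (i, j) with i < j such that i is among the k nearest neighbours of j and vice versa."""
    n = len(points)
    neighbours = []
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        neighbours.append(set(ranked[:k]))
    return [(i, j) for i in range(n) for j in neighbours[i] if i < j and i in neighbours[j]]


def connected_components(n_points, edges):
    """Label each point with the root of its component (union-find with path halving)."""
    parent = list(range(n_points))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    return [find(i) for i in range(n_points)]
```

In the full method, the component extraction would be applied to the graph built from the optimized representatives rather than to the initial mutual k-nearest-neighbor graph shown here.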
After the unlabeled data and the identified anomalies are each divided into subsets, RCC-Dual-GAN replaces MO-GAN with M-GAN to create the reference distribution and augment the minority class in more detail. The UM-GAN includes a set of sub-generators and sub-discriminators. Each sub-GAN directly learns the generation mechanism of the data in its subset through the dynamic game between its sub-generator and sub-discriminator,
where the count denotes the number of samples in the corresponding unlabeled subset. The AM-GAN includes a set of sub-generators and sub-discriminators. Each sub-GAN directly learns the deep representation of the data in its subset through the dynamic game between its sub-generator and sub-discriminator,
where the count denotes the number of samples in the corresponding subset of identified anomalies. The overall discriminator still attempts to distinguish all potential outliers and identified anomalies from the unlabeled data,
where the two counts denote the numbers of potential outliers generated by the sub-generators of UM-GAN and AM-GAN, respectively. Unlike the UMO-GAN, the UM-GAN generates the same number of potential outliers for every unlabeled subset, because each unlabeled subset partitioned by RCC contains a different number of samples and the concentrated data are usually grouped into large subsets.
At the beginning of the iteration, the two M-GANs randomly generate potential outliers in the sample space, whereas the overall discriminator describes a rough boundary to separate them from the unlabeled data. However, when all sub-GANs reach the Nash equilibrium, the integration of the same number of potential outliers (shown with gray dots in Fig. 4) generated by the UM-GAN can provide a reasonable reference distribution for the unlabeled data, and the integration of numerous potential outliers (shown with gray stars in Fig. 4) generated by the AM-GAN can augment the minority class. Consequently, the overall discriminator will not only describe a division boundary that encloses the concentrated data but also separate the partially identified group anomalies from the concentrated data (shown with the red lines in Fig. 4(a)). Compared with the potential outliers generated by matching output values, the potential outliers generated by direct learning can more effectively assist the overall discriminator in describing a correct boundary, even in the case of complex data structures.
4 Experiments and Applications
Extensive experiments are conducted on synthetic data and real-world data to investigate the importance of the effective use of identified anomalies. In addition, we apply the proposed models to two practical tasks (i.e., credit card fraud detection and network intrusion detection) to study the performance of different algorithms in complex situations.
4.1.1 Baselines and Parameter Settings
We compare the proposed models (i.e., Dual-GAN and RCC-Dual-GAN) with several representative outlier detection algorithms. (i) Three of the most common unsupervised approaches (kNN, LOF, and k-means) are selected first because their effectiveness and robustness have been demonstrated in multiple performance evaluations. (ii) The basic model MO-GAN, which utilizes the explicit information and guidance information in identified anomalies, is included to investigate the significance of the data augmentation in Dual-GAN. (iii) The supervised Sup-GAN, which uses GAN to increase the relative proportion of the minority class, is used to explore the importance of the unsupervised module in Dual-GAN. (iv) The extended supervised Sup-RCC-GAN, where the single GAN in Sup-GAN is replaced by our proposed combination of RCC and M-GAN, is compared to further demonstrate the performance advantages of multiple GANs. (v) The semi-supervised ADOA, which attaches a weight to each instance, is used to evaluate the performance of our proposed semi-supervised models.
For non-GAN-based models, we attempt to find the optimal parameters in a range of values. For example, the parameters in kNN and LOF are searched from 2 to , the in k-means is selected from 1 to , and the in ADOA is adjusted from 0.1 to 0.9. For all GAN-based models, we adopt a unified network structure: (i) five sub-generators against one discriminator for MO-GAN, UMO-GAN, and AMO-GAN; (ii) a three-layer network () for the generator and a four-layer network () for the discriminator; (iii) an Orthogonal initializer for the generator and a Variance-Scaling initializer for the discriminator; (v), and are set to 0.5, 0.4 and 1000, respectively; and (vi) the final model in Sup-GAN is selected by the accuracy, and the is for others.
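The parameter search for the proximity-based baselines can be illustrated with a minimal sketch. It assumes that the outlier score is the distance to the k-th nearest neighbour and that candidates are compared by a rank-based AUC. The function names, the toy data, and the candidate range 2 to 10 are illustrative assumptions, not the settings used in the experiments.

```python
import math
import random


def knn_scores(points, k):
    """Outlier score of each point: distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def auc(labels, scores):
    """Rank-based AUC: chance that a random outlier outscores a random inlier.

    Ties between an outlier and an inlier count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_k(points, labels, k_values):
    """Return the (k, AUC) pair with the highest AUC; the first k wins ties."""
    candidates = ((k, auc(labels, knn_scores(points, k))) for k in k_values)
    return max(candidates, key=lambda pair: pair[1])


# Toy data (invented): 40 inliers in one tight cluster and 3 distant outliers.
rng = random.Random(0)
inliers = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(40)]
outliers = [(10.0, 10.0), (-12.0, 8.0), (12.0, -10.0)]
points = inliers + outliers
labels = [0] * len(inliers) + [1] * len(outliers)

# Hypothetical candidate range; the upper bound is not taken from the paper.
k, score = best_k(points, labels, range(2, 11))
```

Isolated outliers such as these are easy for a nearest-neighbour score. The same score is expected to fail on tightly grouped anomalies whenever k is smaller than the group size, which is the failure mode discussed for the synthetic data.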
4.1.2 Experiments on Synthetic Data
We generate a pair of datasets (i.e., a training dataset and a test dataset) based on the usual assumptions about outliers to study the performance characteristics of different algorithms in more detail. The training dataset (shown on the left in Fig. 5(a)) consists of two sets of normal data, two sets of group anomalies, and two discrete anomalies. To match the setting of anomaly detection with few identified anomalies, five examples are randomly sampled from all anomalies as the identified anomalies (shown with red stars). The test dataset (shown on the right in Fig. 5(a)) contains two sets of normal data, two sets of group anomalies, and five discrete anomalies. The normal data and group anomalies have exactly the same generation mechanisms as those in the training data, whereas the five discrete outliers are unidentified or emerging anomalies.
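A training set with this composition could be generated along the following lines. This is only a sketch: the cluster centres, standard deviations, and sample counts are hypothetical choices made for illustration, and only the overall structure (two normal sets, two group-anomaly sets, two discrete anomalies, five identified anomalies) follows the description above.

```python
import random


def gaussian_blob(rng, center, std, n):
    """Draw n two-dimensional points from an isotropic Gaussian."""
    return [(rng.gauss(center[0], std), rng.gauss(center[1], std)) for _ in range(n)]


def make_training_set(seed=0):
    """Return (identified_anomalies, unlabeled_points) for a toy training set."""
    rng = random.Random(seed)
    # Two sets of normal data (hypothetical centres and sizes).
    normal = gaussian_blob(rng, (0.0, 0.0), 1.0, 200) + gaussian_blob(rng, (8.0, 8.0), 1.0, 200)
    # Two sets of group anomalies: small, tightly concentrated clusters.
    group = gaussian_blob(rng, (0.0, 8.0), 0.2, 15) + gaussian_blob(rng, (8.0, 0.0), 0.2, 15)
    # Two discrete anomalies placed away from every cluster.
    discrete = [(4.0, 4.0), (-5.0, 12.0)]
    anomalies = group + discrete
    # Five anomalies are sampled at random as identified; the rest stay unlabeled.
    identified_idx = set(rng.sample(range(len(anomalies)), 5))
    identified = [a for i, a in enumerate(anomalies) if i in identified_idx]
    hidden = [a for i, a in enumerate(anomalies) if i not in identified_idx]
    return identified, normal + hidden


identified, unlabeled = make_training_set(seed=0)
```

Because the five identified anomalies are sampled from all anomalies, a given draw may cover only one of the two groups or none of the discrete anomalies, which is why the remaining anomalies have to be treated as unlabeled data.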
The experimental results of our proposed methods and the seven competitors are shown in Fig. 6. Dual-GAN and RCC-Dual-GAN obtain the best detection results (AUC=1), whereas kNN and LOF achieve very poor results because these two proximity-based methods, with their parameters in a specific range, cannot identify group anomalies. To clearly illustrate the performance characteristics of the other five competitors, we provide a visual representation of their detection results in Fig. 5. The cluster-based k-means (shown in Fig. 5(b)) achieves the optimal result when . However, the cluster centers of the two sets of normal data are not accurately identified because of the interference of unidentified anomalies. The basic model MO-GAN (shown in Fig. 5(c)) describes a division boundary that encloses the concentrated data, such that the discrete anomalies can be accurately identified. However, partially identified group anomalies cannot be separated from the concentrated normal data because only the explicit information in identified anomalies is used. The supervised Sup-GAN and Sup-RCC-GAN (shown in Fig. 5(f) and 5(g)), which use GANs to enhance the minority class, can identify the group anomalies represented by identified anomalies. However, they face substantial challenges in detecting discrete and emerging anomalies because the patterns of normal data are not established. The semi-supervised ADOA (shown in Fig. 5(h)), which obtains the suboptimal AUC value, can identify all anomalies in the training data. However, ADOA only separates the weighted normal data from the weighted anomalies, such that its detection of emerging anomalies in the test data cannot be guaranteed. By contrast, our proposed models (shown in Fig. 5(d) and 5(e)) describe a division boundary that encloses the normal data, showing evident advantages in identifying the partially identified group anomalies and all discrete anomalies.
4.1.3 Experiments on Real-world Data
Ten real-world datasets that often appear in the outlier detection literature are selected for the following experiments to obtain an overall assessment of different algorithms. These datasets are first processed as outlier evaluation datasets according to the procedure described in . We then divide each dataset into a training dataset and a test dataset in the ratio of 2 to 1. Furthermore, 10% of the anomalies in the training data are randomly selected as identified anomalies to match the setting of few identified anomalies. Detailed information on these datasets is listed in Table I, where NoC. indicates the number of clusters into which RCC divides the identified anomalies.
|Dataset||Dim.||Training Data||Test Data|
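The splitting protocol described above can be sketched as follows. The function name, the use of a simple random (non-stratified) split, and the rounding rule for the 10% selection are assumptions made for illustration; the toy label vector is invented.

```python
import random


def split_and_identify(labels, train_fraction=2 / 3, identified_ratio=0.1, seed=0):
    """Split indices 2:1 and mark a fraction of the training anomalies as identified.

    labels is a list of 0/1 values (1 = anomaly). Returns three index lists:
    (train_idx, test_idx, identified_idx).
    """
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    cut = round(train_fraction * len(idx))
    train_idx, test_idx = idx[:cut], idx[cut:]
    train_anomalies = [i for i in train_idx if labels[i] == 1]
    # Assumed rounding rule: nearest integer, but at least one identified anomaly.
    wanted = max(1, round(identified_ratio * len(train_anomalies)))
    n_identified = min(len(train_anomalies), wanted)
    identified_idx = rng.sample(train_anomalies, n_identified)
    return train_idx, test_idx, identified_idx


# Toy labels (invented): 270 normal records and 30 anomalies.
labels = [0] * 270 + [1] * 30
train_idx, test_idx, identified_idx = split_and_identify(labels)
```

All training records outside `identified_idx`, including the unselected anomalies, would then be passed to a detector as unlabeled data.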
Experimental results on real-world datasets are shown in Table II. The highest AUC for each dataset is highlighted in bold. The average ranks of nine algorithms on ten datasets are provided in the last row of Table II.
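For reference, an average rank of this kind can be computed as in the sketch below. It assumes that rank 1 is assigned to the highest AUC on each dataset and that tied algorithms share the mean of their ranks; the tie rule is an assumption, and the AUC values in the toy table are invented placeholders rather than results from Table II.

```python
def average_ranks(auc_table):
    """Average per-dataset ranks of each algorithm.

    auc_table maps dataset -> {algorithm: AUC}. Every algorithm is assumed to
    appear in every dataset. Rank 1 is the highest AUC; ties share the mean rank.
    """
    totals = {}
    for scores in auc_table.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over the run of algorithms tied with position i.
            while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
            for name in ordered[i:j + 1]:
                totals[name] = totals.get(name, 0.0) + shared
            i = j + 1
    return {name: total / len(auc_table) for name, total in totals.items()}


# Placeholder AUC values for three hypothetical algorithms on two datasets.
toy = {
    "dataset_1": {"A": 0.90, "B": 0.80, "C": 0.80},
    "dataset_2": {"A": 0.70, "B": 0.95, "C": 0.60},
}
ranks = average_ranks(toy)
```

A lower average rank indicates better overall performance across datasets, independent of the absolute AUC scale of any single dataset.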
Compared with unsupervised methods (i.e., kNN, LOF, and k-means), algorithms that use identified anomalies achieve substantially higher accuracy on most datasets, showing that reasonable use of these limited labels can effectively improve the performance of outlier detection even with only a few identified anomalies. Moreover, to further evaluate the effect of the number of identified anomalies on different algorithms, semi-supervised and supervised approaches are performed on these datasets with different identification ratios. The results are shown in Fig. 7, where the ratio of identified anomalies in each dataset is adjusted from 0% to 100%. The accuracy of MO-GAN (shown with blue lines in Fig. 7) generally increases linearly with the identification ratio, and satisfactory results can only be obtained if there are many identified anomalies. By contrast, Dual-GAN, RCC-Dual-GAN, and Sup-RCC-GAN (shown with yellow, red, and orange lines, respectively, in Fig. 7) can utilize a few identified anomalies (i.e., a 10% identification ratio) to achieve excellent results that approach those obtained when all labels are known (i.e., a 100% identification ratio) on multiple datasets.
Compared with supervised methods, the overall performance (i.e., average ranks) of Dual-GAN and RCC-Dual-GAN is superior to that of Sup-GAN and Sup-RCC-GAN, respectively. Although the suboptimal Sup-RCC-GAN achieves the best performance on three datasets (i.e., Thyroid, Waveform, and Har), the identified anomalies in these datasets belong to one cluster (i.e., NoC.=1). This means that all anomalies in each dataset are most likely generated by the same generation mechanism, and identified anomalies may represent all of them. If unidentified and emerging anomalies exist in the later detection, the accuracy of the supervised detector may not always be guaranteed. By contrast, the proposed semi-supervised methods, which also use the unsupervised modules (i.e., UMO-GAN and UM-GAN) to establish the patterns of normal data, can simultaneously detect the partially identified group anomalies and all discrete anomalies.
The semi-supervised ADOA, which uses isolation and similarity to calculate the confidence of each instance, can identify partially identified group anomalies and discrete anomalies in the training data. However, because ADOA faces a significant challenge in detecting emerging anomalies, the overall performance of Dual-GAN and RCC-Dual-GAN is better than that of ADOA. Regarding the comparison between the two proposed methods, RCC-Dual-GAN outperforms Dual-GAN on nine of the ten datasets. This shows that the network structure combining RCC and M-GAN has greater stability across datasets, which is also reflected in the comparison between Sup-GAN and Sup-RCC-GAN.
4.2.1 Credit Card Fraud Detection
With the fast development of e-commerce, more and more kinds of credit card fraud arise, which pose a serious threat to all organizations issuing credit cards or managing online transactions. Thus, many machine learning and computational intelligence techniques have been proposed to reduce economic losses and simultaneously enhance customer confidence. However, they mainly focus on the supervised or unsupervised setting, ignoring the verifiability of fraud and the verification latency. That is, a small set of frauds can be checked promptly by the investigator, whereas the remaining transactions will be unlabeled until customers discover fraud. Therefore, we apply our proposed models to the issue of credit card fraud detection.
Since banks are reluctant to disclose such data, we perform the experiment on a publicly available Credit-card dataset. The Credit-card dataset contains 284,807 credit card records that occurred in two days of September 2013, of which 492 records are fraudulent transactions. Each record consists of the transaction time, amount, class (i.e., normal or fraud), and 28 numerical features, which are the principal components extracted from the original features. On this basis, we further remove the transaction time and rescale the other features to the interval [0, 1]. We then divide the dataset into two datasets in the ratio of 2 to 1. The training dataset contains 328 fraudulent transactions out of 189,871 records, while the test dataset contains 164 fraudulent transactions out of 94,936 records. Finally, to match the special semi-supervised setting, we randomly select 10% of the fraudulent transactions (i.e., 33 frauds) from the training dataset as identified frauds, and the remaining records are used as unlabeled transactions.
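The preprocessing described above can be sketched as follows. The helper name and the three toy records are invented for illustration, and mapping a constant column to 0 is an assumed convention; only the record counts in the final checks are taken from the text.

```python
def min_max_rescale(rows):
    """Rescale every column of equal-length numeric rows to the interval [0, 1].

    A constant column is mapped to 0.0 (assumed convention) to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


# Toy records (invented): (transaction time, amount, one principal component).
records = [(0.0, 10.0, -1.0), (60.0, 250.0, 0.0), (120.0, 130.0, 3.0)]
features = [row[1:] for row in records]  # drop the transaction time
scaled = min_max_rescale(features)

# Consistency checks on the counts reported in the text.
assert 189_871 + 94_936 == 284_807  # training + test records
assert 328 + 164 == 492             # training + test frauds
n_identified = round(0.1 * 328)     # 10% of the training frauds
```

In a deployed setting, the minimum and span of each column would normally be estimated on the training split only and then reused for the test split; the sketch rescales a single table for brevity.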
Experimental results on the Credit-card dataset are shown in Fig. 8. Similar to the results on real-world datasets, RCC-Dual-GAN and Dual-GAN obtain good performance, which demonstrates the effectiveness of our proposed methods on credit card fraud detection. The supervised Sup-RCC-GAN yields a suboptimal result because the identified frauds may represent the vast majority of fraudulent transactions. However, the detection accuracy of the supervised Sup-GAN is even worse than that of the unsupervised kNN and LOF. This indicates that a single GAN cannot accurately learn multiple generation mechanisms simultaneously, which further supports the performance advantages of the combination of RCC and M-GAN.
4.2.2 Network Intrusion Detection
Cybersecurity is another important application area for outlier detection, and a considerable number of machine learning techniques, including cluster-based, classification-based, and hybrid methods, have been developed for intrusion detection. However, although only part of intrusions can be detected in practice, semi-supervised methods are still rarely studied and applied to this issue, as discussed above. Thus, in this section, we apply our proposed methods to NSL-KDD, which is one of the most widely used datasets for performance evaluation of intrusion detection.
NSL-KDD solves several inherent problems of KDD’99 by removing redundant records and readjusting its size. To better reflect the fact that attacks are relatively uncommon, we further adjust the proportion of attacks by deleting 90% of the attack records. Thus, the training dataset contains 67,343 normal records and 5,872 attacks, which belong to 21 attack types in four main categories (i.e., DoS, Probe, R2L, and U2R); the test set contains 9,711 normal records and 1,304 attacks, which fall into 37 attack types. Finally, we randomly select 10% of the network intrusions (i.e., 597 attacks in the 21 attack types) from each attack type in the training data as the identified attacks, and the remaining records are used as unlabeled behaviors.
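Selecting a fixed share from each attack type is a stratified sampling step, which can be sketched as below. The function name and the toy type counts are hypothetical, and rounding each per-type quota up (so that every type contributes at least one record) is an assumption rather than a documented detail of the procedure.

```python
import math
import random
from collections import defaultdict


def sample_per_type(attack_types, ratio=0.1, seed=0):
    """Return sorted indices of records sampled separately within each attack type."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for i, attack_type in enumerate(attack_types):
        by_type[attack_type].append(i)
    chosen = []
    for attack_type in sorted(by_type):
        members = by_type[attack_type]
        # Assumed rule: round the quota up, so rare types keep one representative.
        quota = max(1, math.ceil(ratio * len(members)))
        chosen.extend(rng.sample(members, quota))
    return sorted(chosen)


# Toy attack labels (invented counts): one frequent, one medium, one rare type.
types = ["dos"] * 47 + ["probe"] * 23 + ["u2r"] * 4
chosen = sample_per_type(types)
```

Sampling within each type keeps rare categories such as U2R represented among the identified attacks, which a single draw over all attacks would not guarantee.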
The experimental results on the NSL-KDD dataset are shown in Fig. 9. The semi-supervised RCC-Dual-GAN and Dual-GAN achieve the optimal and suboptimal outcomes, respectively, whereas the supervised Sup-GAN and Sup-RCC-GAN obtain results similar to those of the unsupervised kNN and k-means. This is most likely because only 19 of the 37 attack types in the test data are identified, so the detection of emerging intrusions is as important as the effective use of identified attacks. The semi-supervised RCC-Dual-GAN and Dual-GAN, by contrast, can exploit the potential information in identified intrusions and simultaneously detect emerging discrete attacks.
5 Conclusions and Future Works
In this paper, we first propose a one-step method, Dual-GAN, for semi-supervised outlier detection with few identified anomalies, which can directly utilize the potential information in identified anomalies to detect partially identified group anomalies. In addition, since instances with similar output values may not all be similar in a complex data structure, we propose a modified model, RCC-Dual-GAN, based on Dual-GAN to create the reference distribution and augment the minority class more robustly. Considering the difficulty of finding the Nash equilibrium and the optimal model during iteration, two evaluation indicators (i.e., and ) are provided to make the detection process more intelligent and reliable. Extensive experiments on synthetic data and real-world data show that even with only a few identified anomalies, our proposed approaches can substantially improve the accuracy of outlier detection. Moreover, credit card fraud detection and network intrusion detection are performed to demonstrate the effectiveness of our proposed methods in complex practical situations. In the future, we will attempt to introduce incremental learning into the training process to continuously learn new knowledge at a lower computational cost, and we will conduct more intensive research on the evaluation of the Nash equilibrium.
This work is supported by the Major Program of the National Natural Science Foundation of China (91846201, 71490725), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (71521001), the National Natural Science Foundation of China (71722010, 91546114, 91746302, 71872060), and the National Key Research and Development Program of China (2017YFB0803303).
-  U. Fiore, A. D. Santis, F. Perla, P. Zanetti, and F. Palmieri, “Using generative adversarial networks for improving classification effectiveness in credit card fraud detection,” Information Sciences, 2017.
-  A. D. Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi, “Credit card fraud detection: A realistic modeling and a novel learning strategy,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3784–3797, 2017.
-  S. Makki, Z. Assaghir, Y. Taher, R. Haque, M.-S. Hacid, and H. Zeineddine, “An experimental study with imbalanced classification approaches for credit card fraud detection,” IEEE Access, vol. 7, pp. 93 010–93 022, 2019.
-  M. Rahman, M. Rahman, B. Carbunar, and D. H. Chau, “Search rank fraud and malware detection in google play,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 8, pp. 1329–1342, 2017.
-  S. Liu, B. Hooi, and C. Faloutsos, “A contrast metric for fraud detection in rich graphs,” IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 12, pp. 2235–2248, 2019.
-  P. Mishra, V. Varadharajan, U. Tupakula, and E. S. Pilli, “A detailed investigation and analysis of using machine learning techniques for intrusion detection,” IEEE Communications Surveys and Tutorials, vol. 21, no. 1, pp. 686–728, 2019.
-  M. R. G. Raman, N. Somu, K. Kirthivasan, and V. S. Sriram, “A hypergraph and arithmetic residue-based probabilistic neural network for classification in intrusion detection systems,” Neural Networks, vol. 92, pp. 89–97, 2017.
-  J. Mao, T. Wang, C. Jin, and A. Zhou, “Feature grouping-based outlier detection upon streaming trajectories,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 12, pp. 2696–2709, 2017.
-  X. Yang, L. J. Latecki, and D. Pokrajac, “Outlier detection with globally optimal exemplar-based gmm,” in SIAM International Conference on Data Mining, 2009, pp. 145–154.
-  B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen, “Deep autoencoding gaussian mixture model for unsupervised anomaly detection,” in International Conference on Learning Representations, 2018.
-  E. Manzoor, S. M. Milajerdi, and L. Akoglu, “Fast memory-efficient anomaly detection in streaming heterogeneous graphs,” in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1035–1044.
-  H. Paulheim and R. Meusel, “A decomposition of the outlier detection problem into a set of supervised learning problems,” Machine Learning, vol. 100, no. 2-3, pp. 509–531, 2015.
-  M. Salehi, C. Leckie, J. C. Bezdek, T. Vaithianathan, and X. Zhang, “Fast memory efficient local outlier detection in data streams,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 12, pp. 3246–3260, 2016.
-  M. H. Chehreghani, “K-nearest neighbor search and outlier detection via minimax distances,” in SIAM International Conference on Data Mining, 2016, pp. 405–413.
-  Y. Djenouri, A. Belhadi, J. C.-W. Lin, and A. Cano, “Adapted k-nearest neighbors for detecting anomalies on spatio–temporal traffic flow,” IEEE Access, vol. 7, pp. 10 015–10 027, 2019.
-  C. Zhou and R. C. Paffenroth, “Anomaly detection with robust deep autoencoders,” in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 665–674.
-  T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in International Conference on Information Processing in Medical Imaging, 2017, pp. 146–157.
-  M. Sabokrou, M. Khalooei, M. Fathy, and E. Adeli, “Adversarially learned one-class classifier for novelty detection,” in
-  I. Steinwart, “A classification framework for anomaly detection,” Journal of Machine Learning Research, vol. 6, no. 1, pp. 211–232, 2005.
-  Y. Zhang, L. Li, J. Zhou, X. Li, and Z. Zhou, “Anomaly detection with partially observed anomalies,” in WWW: International World Wide Web Conference, 2018, pp. 639–646.
-  C. C. Aggarwal, Outlier Analysis. Springer International Publishing, 2017.
-  A. Daneshpazhouh and A. Sami, “Entropy-based outlier detection using semi-supervised approach with few positive examples,” Pattern Recognition Letters, vol. 49, pp. 77–84, 2014.
-  ——, “Semi-supervised outlier detection with only positive and unlabeled data based on fuzzy clustering,” International Journal on Artificial Intelligence Tools, vol. 24, no. 3, 2015.
-  B. Liu, Y. Xiao, P. S. Yu, Z. Hao, and L. Cao, “An efficient approach for outlier detection with imperfect data labels,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 7, pp. 1602–1616, 2014.
-  Y. Liu, Z. Li, C. Zhou, Y. Jiang, J. Sun, M. Wang, and X. He, “Generative adversarial active learning for unsupervised outlier detection,” IEEE Transactions on Knowledge and Data Engineering, 2019.
-  S. A. Shah and V. Koltun, “Robust continuous clustering,” Proceedings of the National Academy of Sciences, vol. 114, no. 37, pp. 9814–9819, 2017.
-  H. Wang, M. Bah, and M. Hammad, “Progress in outlier detection techniques: A survey,” IEEE Access, vol. 7, pp. 107 964–108 000, 2019.
-  B. X. Wang and N. Japkowicz, “Boosting support vector machines for imbalanced data sets,” Knowledge and Information Systems, vol. 25, no. 1, pp. 1–20, 2010.
-  R. F. Lima and A. C. M. Pereira, “Feature selection approaches to fraud detection in e-payment systems,” in International Conference on Electronic Commerce and Web Technologies, 2017, pp. 111–126.
-  J. L. P. Lima, D. Macêdo, and C. Zanchettin, “Heartbeat anomaly detection using adversarial oversampling,” in IEEE International Joint Conference on Neural Networks, 2019.
-  S. M. Erfani, S. Rajasegarar, S. Karunasekera, and C. Leckie, “High-dimensional and large-scale anomaly detection using a linear one-class svm with deep learning,” Pattern Recognition, vol. 58, pp. 128–134, 2016.
-  B. Liu, Y. Xiao, L. Cao, Z. Hao, and F. Deng, “Svdd-based outlier detection on uncertain data,” Knowledge and Information Systems, vol. 34, no. 3, pp. 597–618, 2013.
-  J. Gao, H. Cheng, and P. Tan, “Semi-supervised outlier detection,” in ACM symposium on Applied computing, 2006, pp. 635–636.
-  Z. Xue, Y. Shang, and A. Feng, “Semi-supervised outlier detection based on fuzzy rough c-means clustering,” Mathematics and Computers in Simulation, vol. 80, no. 9, pp. 1911–1921, 2010.
-  I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Advances in Neural Information Processing Systems, vol. 3, pp. 2672–2680, 2014.
-  S. Akcay, A. Atapourabarghouei, and T. P. Breckon, “Ganomaly: Semi-supervised anomaly detection via adversarial training,” arXiv:1805.06725, 2018.
-  H. Zenati, C. S. Foo, B. Lecouat, G. Manek, and V. R. Chandrasekhar, “Efficient gan-based anomaly detection,” in The Workshop on International Conference on Learning Representations, 2018.
-  J. Bian, X. Hui, S. Sun, X. Zhao, and M. Tan, “A novel and efficient cvae-gan-based approach with informative manifold for semi-supervised anomaly detection,” IEEE Access, vol. 7, pp. 88 903–88 916, 2019.
-  C. Wang, Y. Zhang, and C. Liu, “Anomaly detection via minimum likelihood generative adversarial networks,” in International Conference on Pattern Recognition, 2018.
-  S. K. Lim, Y. Loo, N.-T. Tran, N.-M. Cheung, G. Roig, and Y. Elovici, “Doping: Generative data augmentation for unsupervised anomaly detection with gan,” in IEEE International Conference on Data Mining, 2018.
-  Y. J. Zheng, X. Zhou, W. Sheng, Y. Xue, and S. Chen, “Generative adversarial network based telecom fraud detection at the receiving bank,” Neural Networks, vol. 102, pp. 78–86, 2018.
-  M. Kimura and T. Yanagihara, “Semi-supervised anomaly detection using gans for visual inspection in noisy training data,” arXiv:1807.01136, 2018.
-  H. Gao, Z. Shou, A. Zareian, H. Zhang, and S. Chang, “Low-shot learning via covariance-preserving adversarial augmentation networks,” in Advances in Neural Information Processing Systems, 2018, pp. 981–991.
-  G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent, and M. E. Houle, “On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study,” Data Mining and Knowledge Discovery, vol. 30, no. 4, pp. 891–927, 2016.
-  A. D. Pozzolo, O. Caelen, R. A. Johnson, and G. Bontempi, “Calibrating probability with undersampling for unbalanced classification,” in IEEE Symposium Series on Computational Intelligence, 2015, pp. 159–166.