1 Introduction
Outliers refer to observations whose characteristics differ significantly from those of the majority of the data. These observations are so unusual that they arouse suspicion of being generated by illegal acts or undetected errors. To reveal the critical and interesting information they contain, many outlier detection techniques have been studied and applied in various applications, such as fraud detection in credit card transactions [1, 2, 3], fake rating and review detection on e-commerce service platforms [4, 5], intrusion detection in network service requests [6, 7], and abnormal moving object detection in traffic monitoring [8].
In general, according to the availability of data labels, existing methods can be divided into three categories: unsupervised, supervised, and semi-supervised outlier detection. Unsupervised algorithms are among the most widely studied because they do not require additional labels or prior information; they include statistical-based [9, 10], cluster-based [11], regression-based [12], proximity-based [13, 14, 15], reconstruction-based [16, 17, 18], and other approaches. They assume, explicitly or implicitly, that outliers are not as concentrated as normal data [19]; thus, discrete anomalies can be detected effectively. However, in many cases, multiple anomalies (e.g., those produced by a DoS attack) may be generated by the same mechanism. They become increasingly concentrated, such that unsupervised outlier detection incorrectly labels these group anomalies as normal data. Moreover, the selection of models and parameters is a considerable challenge for unsupervised methods without the help of prior knowledge. As for supervised algorithms, higher detection rates and optimal parameters can usually be obtained because the labels are complete and correct during training [20]. However, obtaining sufficient anomalies and correct labels is a time-consuming task. In addition, detection models trained on fully labeled data carry considerable uncertainty when dealing with emerging anomalies.
To address these issues, semi-supervised outlier detection with few identified anomalies and abundant unlabeled data was proposed [21]. Despite insufficient capacity to label all normal examples or outliers, a few abnormal behaviors that have triggered an alarm can be collected easily in many applications [20]. Examples include DoS attacks that have caused a system crash and insurance applications that have been proven fraudulent. In addition to their own labels, these identified anomalies can also provide a priori information about other samples that share the same generation mechanism. If this information is utilized fully, a semi-supervised model can not only identify discrete anomalies but also detect partially identified group anomalies. Moreover, a few anomalies can also provide valuable guidance for the selection of models and parameters, which is a significant advantage over unsupervised outlier detection. Thus, this paper focuses on this special anomaly detection setting, in the hope of using limited labels to achieve high detection accuracy.
The initial model [22, 23] first extracts reliable normal examples through a heuristic method, which is completely consistent with the first step of PU-learning. Then a modified outlier detection model is trained on the newly tagged dataset to identify the other anomalies. However, since outliers are usually discrete or belong to different clusters, the extracted samples that differ significantly from the identified anomalies are not necessarily normal. As a result, the potential information in the identified anomalies is not used effectively, and erroneous information may also be introduced into the newly tagged dataset. Therefore, to augment the use of known information and reduce the introduction of erroneous information, several soft versions of the above strategy have been established. For example, LBSSVDD
[24] assigns abnormal likelihood values to each sample based on the proportion of anomalies among its neighbors, while ADOA [20] attaches a weight to each instance according to its own isolation and its similarity to the identified anomalies. However, the calculation of neighbors and similarities usually has a high computational cost and is likely to be affected by irrelevant variables.

In this paper, we propose a one-step method for semi-supervised outlier detection with few identified anomalies, which can directly utilize the potential information in the identified anomalies without calculating the abnormal degree of each instance. Specifically, the Dual Generative Adversarial Networks (DualGAN) model contains two Multiple-Objective Generative Adversarial Networks [25] (i.e., UMOGAN and AMOGAN) and an overall discriminator. The Unlabeled MOGAN (UMOGAN) is used to learn the generation mechanism of the unlabeled data and gradually generates informative potential outliers to provide a reasonable reference distribution for the unlabeled data. By contrast, the Abnormal MOGAN (AMOGAN) is used to learn the deep representation of the identified anomalies and generates numerous potential anomalies with the same generation mechanism as the known anomalies to enhance the minority class. Thus, in order to distinguish these identified and synthesized anomalies from the unlabeled data, the overall discriminator will not only describe a division boundary that encloses the concentrated data, but will also separate partially identified group anomalies from the concentrated data. In addition, considering that instances with similar output values are not necessarily close to one another in the sample space, we replace the MOGAN with Multiple Generative Adversarial Networks (MGAN). More specifically, the modified model RCCDualGAN first divides the identified anomalies and unlabeled data into different subsets through Robust Continuous Clustering (RCC) [26].
Then, multiple GANs are utilized to learn their generation mechanisms directly. Compared with the original model DualGAN, RCCDualGAN can create the reference distribution and augment the minority class more robustly in various situations. The main contributions of this work are summarized as follows:

We propose a semi-supervised outlier detection method, DualGAN, which consists of two MOGANs and an overall discriminator. The method directly utilizes the potential information in identified anomalies to detect discrete anomalies and partially identified group anomalies simultaneously.

Considering that instances with similar output values may not all be similar in a complex data structure, we extend the original model DualGAN to RCCDualGAN by replacing MOGAN with the combination of RCC and MGAN. Compared with DualGAN, the modified model can create the reference distribution and augment the minority class more robustly.

Considering the difficulty in finding the Nash equilibrium and optimal model during iteration, two evaluation indicators are created and introduced into the two models to make the detection process more intelligent.

We conduct extensive experiments on both benchmark datasets and two practical tasks to investigate the performance of our proposed approaches. The results show that even with only a few identified anomalies, our proposed approaches can significantly improve the accuracy of outlier detection.
The rest of this paper is organized as follows. Section 2 provides a brief review of related work. Section 3.1 introduces the detection principle and model details of MOGAN, and the proposed models are described in Section 3.2. We report extensive experimental results in Section 4, and the whole paper is concluded in Section 5.
2 Related Work
Numerous overviews of outlier detection algorithms for different kinds of data and applications are available in the literature [21, 27]. Here, we briefly discuss the common outlier detection methods (i.e., unsupervised and supervised approaches) and then focus on semi-supervised outlier detection with limited labels, which is most relevant to our research. Finally, GAN-based outlier detection algorithms are reviewed in Section 2.3.
2.1 Common Outlier Detection Methods
Unsupervised outlier detection methods have been studied widely because they require no additional labels. Specific algorithms include proximity-based [13, 14, 15], statistical-based [9, 10], cluster-based [11], regression-based [12], and reconstruction-based models [16, 17, 18]. Proximity-based models assume that outliers are points far away from the other data and operate by measuring the distance or density of each point. By contrast, the remaining models assume that outliers are observations with large deviations from the normal profiles and operate by creating a model for the majority of samples. However, all of these algorithms are based on the assumption that outliers are not as concentrated as the normal data, so group anomalies with higher density levels cannot be detected correctly. Moreover, most of them must be provided with model assumptions or parameters in advance, which is a huge challenge for unsupervised methods without the help of prior knowledge.
Supervised outlier detection can be considered a special classification problem, and many classification algorithms have been applied to it. However, in most practical applications, outliers are far less common than normal data, so the direct use of off-the-shelf classifiers may produce biased results. Hence, cost-sensitive learning [28] and adaptive resampling [1, 29, 30] were later incorporated into the classification process. Cost-sensitive learning increases the misclassification costs of outliers by weighting the classification errors, whereas adaptive resampling increases the relative proportion of the minority class by under- or over-sampling. Supervised algorithms usually achieve good parameters and high detection rates because the labels are complete during training. The question, however, is how to obtain sufficient anomalies and correct labels, which is a time-consuming and expensive task. Moreover, a detection model trained on fully labeled data has significant uncertainty in dealing with emerging anomalies.

2.2 Semi-Supervised Outlier Detection Methods
According to the available labels, semi-supervised outlier detection can be divided into three categories: one-class learning with only normal examples, semi-supervised outlier detection with a small amount of labeled data, and semi-supervised outlier detection with few identified anomalies. One-class learning differs only slightly from unsupervised outlier detection, and most unsupervised approaches (e.g., OC-SVM [31] and SVDD [32]) can be used in this case [21]. The outlier detection model established on a one-class dataset tends to be more robust because of the absence of additional interference from anomalies. However, considerable time must be spent verifying the collected samples to ensure that the training data contain only normal data. Semi-supervised outlier detection with a small amount of labeled data usually optimizes an outlier detection model (e.g., k-means [33] and fuzzy rough c-means [34]) under the constraint that the labels of the labeled data remain almost unchanged. Compared with unsupervised models, their performance is improved through the small amount of labeled data. However, the potential information in the labeled examples is not used effectively, and the normal examples in the labeled data may still require additional confirmation because of undetected anomalies.
Compared with these settings, the case of semi-supervised outlier detection with few identified anomalies is much simpler because a few abnormal behaviors can easily be collected in many applications. The initial model [22, 23] first extracts reliable normal examples through a heuristic method and then trains a semi-supervised outlier detection model, as described above, on the new labeled dataset. However, since outliers are usually discrete or belong to different clusters, the extracted samples that are far from the identified anomalies are not necessarily normal. As a result, the potential information in the identified anomalies is not utilized fully, and erroneous information may also be introduced into the new dataset. To address this, several soft versions of the above strategy were then established. For example, before training the detection model, LBSSVDD [24]
first evaluates the abnormal probability of each sample based on the proportion of anomalies in its neighbors, whereas ADOA
[20] assigns likelihood values to each sample according to its own isolation and its similarity to the identified anomalies. These methods enhance the use of known information while reducing the introduction of erroneous information. However, calculating these probabilities may incur high computational costs on large datasets and tends to be affected by the "curse of dimensionality" on high-dimensional datasets. Therefore, we propose a GAN-based model for semi-supervised outlier detection with few identified anomalies, which can utilize the potential information in the identified anomalies directly.
2.3 GAN-Based Outlier Detection Methods
GAN [35] is an adversarial representation learning model that has achieved state-of-the-art performance in various applications. For unsupervised outlier detection and semi-supervised outlier detection with only normal examples, GAN-based reconstruction models and generation models have been studied. GAN-based reconstruction models usually learn the generation mechanism of normal data by training a regular GAN [17]
or a combination of GAN and autoencoder
[36, 37, 18], and then measure the abnormal degree of an example based on the reconstruction loss or discriminator loss. Moreover, in order to prevent slight anomalies from being reconstructed, Bian et al. [38] also perform active negative training to limit the network's generative capability. GAN-based generation models usually use the GAN to generate informative potential outliers [25, 39] or infrequent normal samples [40], such that subsequent detectors can describe a correct boundary. For supervised outlier detection, GAN [1, 30] is often used to synthesize minority class examples to balance the relative proportion between the two classes. In addition, Zheng et al. [41] also take advantage of an adversarial deep denoising autoencoder to better extract the latent representation of labeled transactions, which can greatly improve the accuracy of fraud detection. However, few GAN-based studies so far have focused on semi-supervised outlier detection with few identified anomalies. Although Kimura
et al. [42] utilize both noisy normal images and given abnormal images for visual inspection, their main purpose is to eliminate the impact of abnormal pixels in some normal images during the reconstruction process, which differs considerably from our model.

3 Methodology
In this section, we first introduce the detection principle of the Artificially Generating Potential Outliers (AGPO)-based unsupervised outlier detection method MOGAN [25], which is necessary to comprehend our proposed methods. Then two semi-supervised outlier detection models (i.e., DualGAN and RCCDualGAN) are proposed to effectively improve the detection rate of outliers.
3.1 Background on MOGAN
Unsupervised outlier detection can be regarded as a density-level detection process due to its default assumption. Unlike existing model-based or proximity-based outlier detection, AGPO-based algorithms approach density-level detection as a classification problem. First, numerous data points are randomly sampled as potential outliers (shown with gray dots in Fig. 1) to construct a reference distribution. Then a classifier \(f\) is trained on the new dataset to separate the potential outliers from the original data (shown with blue dots and stars in Fig. 1). In order to minimize the loss function

\[ \min_{f}\; -\frac{1}{n}\sum_{x \in \mathcal{X}} \log f(x) \;-\; \frac{1}{n'}\sum_{x' \in \mathcal{X}'} \log\big(1 - f(x')\big), \tag{1} \]

where \(\mathcal{X}\) denotes the \(n\) original data points and \(\mathcal{X}'\) the \(n'\) sampled potential outliers, the classifier should assign a higher value to original data with a higher relative density and a lower value in the opposite case. Thus, when faced with a uniform reference distribution, the classifier can describe a division boundary that encloses the concentrated normal samples (as shown in Fig. 1(a)).
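As a minimal sketch of the AGPO principle (assuming a NumPy/scikit-learn setting, with a k-NN classifier standing in for the classifier used in practice), the following scores each point by the probability that it belongs to the real data rather than to the sampled potential outliers:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def agpo_scores(X, n_potential=500, seed=0):
    """AGPO sketch: sample potential outliers uniformly over the data's
    bounding box, train a classifier (real data -> 1, potential
    outliers -> 0), and use its probability output as a normality score."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    reference = rng.uniform(lo, hi, size=(n_potential, X.shape[1]))
    Z = np.vstack([X, reference])
    y = np.r_[np.ones(len(X)), np.zeros(len(reference))]
    clf = KNeighborsClassifier(n_neighbors=10).fit(Z, y)
    return clf.predict_proba(X)[:, 1]   # high = dense region = likely normal

# A dense cluster plus one far-away point: the discrete outlier
# (index 100) receives the lowest score.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, size=(100, 2)), [[4.0, 4.0]]])
scores = agpo_scores(X)
print(int(np.argmin(scores)))
```

In higher dimensions, the uniform sampling used here becomes exactly the weakness the text describes next, which motivates generating the reference distribution instead.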
Fig. 1: Illustration of the detection performance of AGPO and MOGAN. Normal points, outliers, and potential outliers are shown with blue dots, blue stars, and gray dots, respectively. High-dimensional data are presented as cross-sectional data, and data points closer to the green area are more likely to be outliers.
However, when the dimension increases, a limited number of randomly sampled potential outliers cannot provide sufficient information for the classifier to describe a correct boundary (as shown in Fig. 1(b)). Therefore, MOGAN was proposed to directly generate informative potential outliers and thus construct a reasonable reference distribution, which ensures that the relative density level of normal cases is greater than that of outliers.
MOGAN (shown in Fig. 2) consists of multiple sub-generators and a discriminator. Its central idea is to let each sub-generator actively learn the generation mechanism of the data in a specific subset and gradually generate potential outliers that occur inside or close to those data. Thus, the integration of different numbers of potential outliers can provide a reasonable reference distribution for the whole dataset. More specifically, because samples with similar outputs are more likely to be similar, MOGAN first divides the original dataset equally into subsets based on their output values. Then, a dynamic game is executed between the sub-generators and the discriminator. Each sub-generator attempts to learn the generation mechanism of its subset by making the generated samples output values similar to those of the subset, whereas the discriminator attempts to identify the generated outliers from the original data, like the classifier in AGPO. Eventually, MOGAN reaches a Nash equilibrium after several iterations. The integration of different numbers of informative potential outliers can then construct a reasonable reference distribution, and the discriminator can describe a correct boundary enclosing the concentrated original data (as shown in Fig. 1(c) and Fig. 2).
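The division-by-output step can be sketched as follows (a NumPy toy: the equal split by sorted discriminator outputs follows the description above, and the minimum output is used here as an example of a representative target statistic):

```python
import numpy as np

def split_by_output(scores, k):
    """MOGAN-style division: order samples by their discriminator
    outputs and cut them into k equally sized subsets, so that each
    sub-generator targets data with similar output values."""
    order = np.argsort(scores)          # indices sorted by ascending output
    return np.array_split(order, k)

scores = np.array([0.9, 0.1, 0.8, 0.2, 0.85, 0.15])
subsets = split_by_output(scores, 3)
# Each sub-generator's target can be a representative statistic of its
# subset's outputs, e.g. the minimum.
targets = [scores[s].min() for s in subsets]
print([s.tolist() for s in subsets], targets)
```

This output-based grouping is exactly what Section 3.2.2 later replaces with RCC, since similar outputs do not guarantee proximity in the sample space.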
3.2 Outlier Detection with Few Identified Anomalies
The largest problem with unsupervised outlier detection (including MOGAN) is that it cannot detect group anomalies in the absence of additional information. Complete labels and sufficient anomalies are difficult to obtain, but a few common anomalous behaviors (e.g., DoS and DDoS attacks) that have triggered alarms can be collected easily in many applications. These identified anomalies not only carry their own labels, but also potentially provide a priori information about other samples with the same generation mechanism. If this information is utilized fully, partially identified group anomalies can be detected accurately along with the discrete anomalies. Therefore, this section proposes two semi-supervised outlier detection approaches, namely DualGAN and RCCDualGAN, which improve detection accuracy by directly utilizing the potential information in the identified anomalies.
3.2.1 DualGAN
Assume a dataset consisting of \(m\) identified anomalies \(\mathcal{X}_a\) and \(n\) unlabeled samples \(\mathcal{X}_u\), where each \(x\) represents a data point and \(y\) its label. Our goal is to identify a scoring function that assigns a higher value (close to 1) to normal data and a lower value (close to 0) to outliers. Because only a few anomalies are identified, this scoring function should satisfy two conditions: (i) based on the default assumption that outliers are not concentrated, it should output higher values for samples with higher density levels and lower values for discrete data; (ii) assuming that samples with the same generation mechanism as the identified anomalies are more likely to be outliers, it should output values close to 0 for them and for the identified anomalies. Thus, we first propose DualGAN (shown in Fig. 3), which consists of two MOGANs (i.e., UMOGAN and AMOGAN) and an overall discriminator \(D\).
The UMOGAN attempts to generate samples that occur inside or around the target data to construct a reasonable reference distribution for the unlabeled data. It takes the unlabeled samples \(\mathcal{X}_u\) as input and includes \(k\) sub-generators \(G_{u,i}\) and a discriminator \(D_u\). Each sub-generator \(G_{u,i}\) learns the generation mechanism of the data subset \(\mathcal{X}_{u,i}\) by making its generated samples output values similar to those of \(\mathcal{X}_{u,i}\), whereas the discriminator \(D_u\) guides the learning of the sub-generators by identifying the generated samples from the unlabeled data \(\mathcal{X}_u\). The optimization framework of UMOGAN is formulated as follows:
\[ \max_{D_u}\; \mathbb{E}_{x \sim \mathcal{X}_u}\big[\log D_u(x)\big] \;+\; \sum_{i=1}^{k} \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D_u(G_{u,i}(z))\big)\big], \tag{2} \]

\[ \min_{G_{u,i}}\; \mathbb{E}_{z \sim p_z}\Big[\big(D_u(G_{u,i}(z)) - T_{u,i}\big)^{2}\Big], \tag{3} \]

where \(T_{u,i}\) is a representative statistic of \(D_u(\mathcal{X}_{u,i})\) (e.g., the minimum value). Through the iteration between \(G_{u,i}\) and \(D_u\), each sub-generator gradually generates informative potential outliers. Ultimately, when the dynamic game reaches the Nash equilibrium, the integration of different numbers of potential outliers (shown with gray dots in Fig. 3) provides a reasonable reference distribution for the unlabeled dataset \(\mathcal{X}_u\).
The AMOGAN is used to generate samples similar to the identified anomalies to prevent the overall discriminator from overfitting or forgetting when dealing with the minority class [43]. It takes the identified anomalies \(\mathcal{X}_a\) as input and includes \(k'\) sub-generators \(G_{a,j}\) and a discriminator \(D_a\). Each sub-generator \(G_{a,j}\) learns the generation mechanism of the data subset \(\mathcal{X}_{a,j}\), and the discriminator \(D_a\) identifies the generated samples from the identified anomalies \(\mathcal{X}_a\). The optimization framework of AMOGAN is formulated as follows:
\[ \max_{D_a}\; \mathbb{E}_{x \sim \mathcal{X}_a}\big[\log D_a(x)\big] \;+\; \sum_{j=1}^{k'} \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D_a(G_{a,j}(z))\big)\big], \tag{4} \]

\[ \min_{G_{a,j}}\; \mathbb{E}_{z \sim p_z}\Big[\big(D_a(G_{a,j}(z)) - T_{a,j}\big)^{2}\Big], \tag{5} \]

where \(T_{a,j}\) is a representative statistic of \(D_a(\mathcal{X}_{a,j})\). Unlike UMOGAN, AMOGAN continues training after reaching the Nash equilibrium because its purpose is to generate data points as similar as possible to the identified anomalies. Finally, the integration of the numerous potential outliers (shown with gray stars in Fig. 3) can augment the minority class and ensure that partially identified group anomalies are detected as anomalies.
The overall discriminator \(D\), which takes all original data and generated potential outliers as input, attempts to describe an accurate division boundary by identifying all potential outliers (i.e., those generated by UMOGAN and AMOGAN) and the identified anomalies \(\mathcal{X}_a\) from the unlabeled data \(\mathcal{X}_u\). The optimization function of \(D\) is formulated as follows:
\[ \min_{D}\; -\frac{1}{n}\sum_{x \in \mathcal{X}_u} \log D(x) \;-\; \frac{1}{m + \sum_i n_{u,i} + \sum_j n_{a,j}} \Big( \sum_{x \in \mathcal{X}_a} \log\big(1 - D(x)\big) + \sum_{i=1}^{k} \sum_{\hat{x} \in \hat{\mathcal{X}}_{u,i}} \log\big(1 - D(\hat{x})\big) + \sum_{j=1}^{k'} \sum_{\hat{x} \in \hat{\mathcal{X}}_{a,j}} \log\big(1 - D(\hat{x})\big) \Big), \tag{6} \]

where \(\hat{\mathcal{X}}_{u,i}\) and \(\hat{\mathcal{X}}_{a,j}\) denote the potential outliers generated by \(G_{u,i}\) and \(G_{a,j}\), and \(n_{u,i}\) and \(n_{a,j}\) represent their numbers, respectively. More potential outliers must be generated for the less concentrated subsets to create a reasonable reference distribution [25]. At the beginning of the iteration, the randomly generated potential outliers may not provide sufficient information for \(D\). However, when the two MOGAN models reach the Nash equilibrium, the integration of different numbers of potential outliers can provide a reasonable reference distribution for the unlabeled data, whereas the integration of numerous potential outliers can augment the minority class. Thus, in order to minimize the optimization function, the overall discriminator will not only assign a higher value (close to 1) to concentrated unlabeled data, but will also assign a lower value (close to 0) to discrete anomalies and partially identified group anomalies (shown in Fig. 3(a)), which is exactly the scoring function we are looking for. Compared with unsupervised detection using only UMOGAN (shown in Fig. 3(b)), DualGAN can also detect group anomalies with the help of the few identified anomalies. Compared with supervised detection using only AMOGAN (shown in Fig. 3(c)), DualGAN can also detect previously unknown discrete anomalies.
In addition, two issues that have a substantial effect on the results, namely, the evaluation of Nash equilibrium and the selection of optimal model, must be discussed to ensure a more intelligent and reliable detection.
Nash equilibrium in a GAN means that the distribution of the real data has been learned by the generator and the discriminator cannot recognize the difference between the two distributions. The original GAN uses the classification error to evaluate the similarity between the generated data and the real data; that is, the Nash equilibrium is reached when the error is close to 0.5. However, the absolute Nash equilibrium cannot be guaranteed when the objective function is non-convex. The previously proposed MOGAN utilizes the trend of the generator loss to evaluate this similarity; that is, the Nash equilibrium is reached when the downward trend of the generator loss slows. However, accurately assessing the trend requires human intervention due to fluctuations in the loss. Therefore, we propose an evaluation indicator, the Nearest Neighbor Ratio (NNR), to directly measure the similarity between the two distributions. First, several samples are selected randomly from one subset, and for each sample the ratio of data belonging to the other subset among its nearest neighbors is calculated. If this ratio is greater than a certain threshold, the sample can be regarded as having a generation mechanism similar to that of the data in the other subset. The NNR is then the ratio of such samples among the randomly selected samples. If the NNR is greater than a second threshold, the two subsets are considered to be generated from similar distributions and the dynamic game has reached the Nash equilibrium.
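A minimal version of this NNR check might look as follows (illustrative: the sample size, neighbor count `k`, and the per-sample threshold `lam` are placeholders for the actual settings):

```python
import numpy as np

def nearest_neighbor_ratio(A, B, n_samples=50, k=10, lam=0.5, seed=0):
    """NNR sketch: sample points of A, inspect their k nearest neighbors
    in A ∪ B, and return the fraction of sampled points whose neighbor
    ratio from B exceeds lam (points that look 'mixed' into B)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(A), size=min(n_samples, len(A)), replace=False)
    pool = np.vstack([A, B])
    from_b = np.r_[np.zeros(len(A)), np.ones(len(B))]   # 1 = belongs to B
    mixed = 0
    for i in idx:
        d = np.linalg.norm(pool - A[i], axis=1)
        d[i] = np.inf                       # exclude the point itself
        nn = np.argsort(d)[:k]
        if from_b[nn].mean() > lam:         # ratio of B among neighbors
            mixed += 1
    return mixed / len(idx)

rng = np.random.default_rng(0)
A = rng.normal(0, 1, size=(100, 2))
far = nearest_neighbor_ratio(A, A + 100.0)                          # disjoint sets
near = nearest_neighbor_ratio(A, A + rng.normal(0, 0.05, size=A.shape))
print(far, near)
```

The returned fraction would then be compared with the second threshold to decide whether the two subsets come from similar distributions.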
The optimal model refers to the model that can most effectively identify outliers from the whole dataset during the iteration. Given no additional information, evaluating detection performance and selecting the optimal model are difficult for unsupervised outlier detection. Fortunately, the data used to train the semi-supervised outlier detection model usually contain a few identified anomalies, which can provide valuable guidance for the selection of the final model. In this paper, we use the Average Position (AP) of the known anomalies in the ascending order of the output values of all real data to measure the performance of the overall discriminator. A lower AP means that the model assigns lower values to the identified anomalies than to other samples, and the model corresponding to the lowest AP is used as the final model for subsequent detection.
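The AP indicator reduces to a small ranking computation; a sketch:

```python
import numpy as np

def average_position(scores, anomaly_idx):
    """Average Position (AP): rank all samples by the discriminator
    output in ascending order (rank 1 = lowest output) and average the
    ranks of the identified anomalies. Lower AP = anomalies pushed to
    the bottom, so the iteration with the lowest AP is kept."""
    ranks = np.empty(len(scores), dtype=int)
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    return ranks[anomaly_idx].mean()

scores = np.array([0.9, 0.05, 0.8, 0.1, 0.95])   # low output = anomalous
ap = average_position(scores, [1, 3])
print(ap)
```

Here the two known anomalies occupy ranks 1 and 2, so the AP is 1.5, the best possible value for two anomalies.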
3.2.2 RCCDualGAN
In general, DualGAN can achieve good detection performance. However, as the cluster structure of the data becomes more complex, instances with similar output values may not all be similar to one another in the sample space; that is, the data points divided according to their similar outputs are not necessarily close to each other, and the generated data whose outputs are similar to those of the target data are not necessarily similar to the target data. Therefore, we propose a modified model, RCCDualGAN, based on DualGAN to create the reference distribution and augment the minority class more robustly. The network structure and detection process of RCCDualGAN are illustrated in Fig. 4, where the unlabeled data and identified anomalies are first divided into different subsets by RCC.
RCC [26] is a nonparametric clustering algorithm that achieves high clustering accuracy across multiple domains without knowing the number of clusters. Taking the unlabeled data \(\mathcal{X}_u\) as an example, RCC first constructs a connectivity structure \(\mathcal{E}\) based on mutual nearest neighbor connectivity. Then, a set of representatives \(U = \{u_1, \dots, u_n\}\) of the unlabeled data is optimized to reveal the cluster structure latent in \(\mathcal{X}_u\). Each representative \(u_i\) should be as similar as possible to the corresponding unlabeled point \(x_i\), and the representatives of interconnected points should be as similar as possible to one another. The optimization objective is formulated as follows:
\[ \min_{U}\; \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^{2} \;+\; \frac{\lambda}{2} \sum_{(p,q) \in \mathcal{E}} w_{p,q}\, \rho\big(\lVert u_p - u_q \rVert_2\big), \tag{7} \]

where \(\lambda\) is used to balance the strength of the two objective terms, \(w_{p,q}\) is used to balance the contribution of each point to the pairwise terms, and \(\rho(\cdot)\) is a robust penalty on the regularization terms. Finally, based on the optimized representatives \(U\), RCC constructs a graph in which a pair \(u_p\) and \(u_q\) is connected if the distance between them falls below a threshold, such that different unlabeled subsets are output. Compared with the subsets divided by similar outputs, the subsets partitioned by RCC accurately reflect the cluster structure latent in the data even in the case of complex data structures.
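The mutual nearest-neighbor connectivity that RCC starts from can be sketched as follows (a brute-force O(n²) illustration; the subsequent optimization of the representatives is omitted):

```python
import numpy as np

def mutual_knn_edges(X, k=3):
    """Mutual k-nearest-neighbor connectivity: points p and q are
    linked only if each is among the other's k nearest neighbors."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbor
    neigh = [set(row) for row in np.argsort(d, axis=1)[:, :k]]
    return {(p, q) for p in range(len(X)) for q in neigh[p]
            if p < q and p in neigh[q]}

# Two well-separated triads: mutual k-NN links stay within each cluster,
# so the connected components already reveal the two subsets.
X = np.array([[0, 0], [0.1, 0], [0, 0.1],
              [5, 5], [5.1, 5], [5, 5.1]], dtype=float)
edges = mutual_knn_edges(X, k=2)
print(sorted(edges))
```

The connected components of this graph are what the generated subsets ultimately follow, which is why the partition respects cluster structure rather than output values.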
After the unlabeled data and identified anomalies are divided into \(k_u\) and \(k_a\) subsets, respectively, RCCDualGAN replaces MOGAN with MGAN to create the reference distribution and augment the minority class in more detail. The UMGAN includes \(k_u\) sub-generators and \(k_u\) sub-discriminators. Each sub-GAN can directly learn the generation mechanism of the data \(\mathcal{X}_{u,i}\) through the dynamic game between \(G_{u,i}\) and \(D_{u,i}\),
\[ \min_{G_{u,i}} \max_{D_{u,i}}\; \frac{1}{n_{u,i}} \sum_{x \in \mathcal{X}_{u,i}} \log D_{u,i}(x) \;+\; \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D_{u,i}(G_{u,i}(z))\big)\big], \tag{8} \]

where \(n_{u,i}\) represents the number of samples in the \(i\)-th unlabeled subset. The AMGAN includes \(k_a\) sub-generators and \(k_a\) sub-discriminators. Each sub-GAN directly learns the deep representation of the data \(\mathcal{X}_{a,j}\) through the dynamic game between \(G_{a,j}\) and \(D_{a,j}\),
\[ \min_{G_{a,j}} \max_{D_{a,j}}\; \frac{1}{n_{a,j}} \sum_{x \in \mathcal{X}_{a,j}} \log D_{a,j}(x) \;+\; \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D_{a,j}(G_{a,j}(z))\big)\big], \tag{9} \]

where \(n_{a,j}\) represents the number of samples in \(\mathcal{X}_{a,j}\). The overall discriminator \(D\) still attempts to identify all potential outliers and identified anomalies from the unlabeled data,
\[ \min_{D}\; -\frac{1}{n} \sum_{x \in \mathcal{X}_u} \log D(x) \;-\; \frac{1}{m + \sum_i n'_{u,i} + \sum_j n'_{a,j}} \Big( \sum_{x \in \mathcal{X}_a} \log\big(1 - D(x)\big) + \sum_{\hat{x} \in \hat{\mathcal{X}}} \log\big(1 - D(\hat{x})\big) \Big), \tag{10} \]

where \(\hat{\mathcal{X}}\) denotes all generated potential outliers, and \(n'_{u,i}\) and \(n'_{a,j}\) represent the numbers of potential outliers generated by \(G_{u,i}\) and \(G_{a,j}\), respectively. Unlike UMOGAN, the UMGAN generates the same number of potential outliers for each unlabeled subset, because each subset partitioned by RCC contains a different number of samples and the concentrated data are usually divided into large subsets.
At the beginning of the iteration, the two MGANs randomly generate potential outliers in the sample space, whereas the overall discriminator describes a rough boundary to separate them from the unlabeled data. However, when all sub-GANs reach the Nash equilibrium, the integration of the same number of potential outliers (shown with gray dots in Fig. 4) generated by the UMGAN can provide a reasonable reference distribution for the unlabeled data, and the integration of numerous potential outliers (shown with gray stars in Fig. 4) generated by the AMGAN can augment the minority class. Consequently, the overall discriminator will not only describe a division boundary that encloses the concentrated data but also separate the partially identified group anomalies from the concentrated data (shown with the red lines in Fig. 4(a)). Compared with the potential outliers generated by matching output values, the potential outliers generated by direct learning can more effectively assist the overall discriminator in describing a correct boundary, even in the case of complex data structures.
4 Experiments and Applications
Extensive experiments are conducted on synthetic data and realworld data to investigate the importance of the effective use of identified anomalies. In addition, we apply the proposed models to two practical tasks (i.e., credit card fraud detection and network intrusion detection) to study the performance of different algorithms in complex situations.
4.1 Experiments
4.1.1 Baselines and Parameter Settings
We compare the proposed models (i.e., DualGAN and RCCDualGAN) with several representative outlier detection algorithms. (i) Three of the most common unsupervised approaches (kNN, LOF, and k-means) are selected first because their effectiveness and robustness have been proven in multiple performance evaluations. (ii) The basic model MOGAN, which utilizes only the explicit information and guidance information in the identified anomalies, is run to investigate the significance of the data augmentation in DualGAN. (iii) The supervised SupGAN [1], which uses a GAN to increase the relative proportion of the minority class, is used to explore the importance of the unsupervised module in DualGAN. (iv) The extended supervised SupRCCGAN, in which the single GAN in SupGAN is replaced by our proposed combination of RCC and MGAN, is compared to further demonstrate the performance advantages of multiple GANs. (v) The semi-supervised ADOA [20], which attaches a weight to each instance, is used to evaluate the performance of our proposed semi-supervised models.
For non-GAN-based models, we attempt to find the optimal parameters within a range of values. For example, the neighborhood parameters in kNN and LOF are searched upward from 2, the number of clusters in k-means is selected upward from 1, and the parameter in ADOA is adjusted from 0.1 to 0.9. For all GAN-based models, we adopt a unified network structure: (i) five sub-generators against one discriminator for MOGAN, UMOGAN, and AMOGAN; (ii) a three-layer network for the generator and a four-layer network for the discriminator; (iii) the Orthogonal initializer for the generator and VarianceScaling for the discriminator; (iv) three further parameters set to 0.5, 0.4, and 1000, respectively; and (v) the final model in SupGAN is selected by the accuracy, whereas the Average Position is used for the others.

4.1.2 Experiments on Synthetic Data
We generate a pair of datasets (i.e., a training dataset and a test dataset) based on the usual assumptions about outliers to study the performance characteristics of the different algorithms in more detail. The training dataset (shown on the left in Fig. 5(a)) consists of two sets of normal data, two sets of group anomalies, and two discrete anomalies. To match the setting of anomaly detection with few identified anomalies, five examples are randomly sampled from all anomalies as the identified anomalies (shown with red stars). The test dataset (shown on the right in Fig. 5(a)) contains two sets of normal data, two sets of group anomalies, and five discrete anomalies. The normal data and group anomalies share exactly the same generation mechanisms as the training data, whereas the five discrete outliers are unidentified or emerging anomalies.
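A construction of this kind can be mimicked with a short script; the cluster locations, spreads, and sizes below are illustrative guesses, not the paper's exact values:

```python
import random

random.seed(42)

def gaussian_cluster(cx, cy, n, spread=0.3):
    """Sample n two-dimensional points around centre (cx, cy)."""
    return [(random.gauss(cx, spread), random.gauss(cy, spread)) for _ in range(n)]

# Two sets of normal data, two sets of group anomalies, two discrete anomalies.
normal = gaussian_cluster(0, 0, 200) + gaussian_cluster(4, 4, 200)
group_anoms = gaussian_cluster(0, 4, 20) + gaussian_cluster(4, 0, 20)
discrete_anoms = [(8.0, 8.0), (-3.0, 7.0)]

train = normal + group_anoms + discrete_anoms
# Five randomly sampled anomalies serve as the identified anomalies.
identified = random.sample(group_anoms + discrete_anoms, 5)
```

Rerunning the same generators (with fresh randomness and extra discrete points) yields a test set that shares the normal and group-anomaly mechanisms while containing emerging outliers.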
The experimental results of our proposed methods and the seven competitors are shown in Fig. 6. DualGAN and RCCDualGAN obtain the best detection results (AUC = 1), whereas kNN and LOF achieve very poor results because these two proximity-based methods cannot identify group anomalies for any parameter value in a reasonable range. For the other five competitors, we provide a visual representation of the detection results in Fig. 5 to illustrate their performance characteristics clearly. The cluster-based k-means (shown in Fig. 5(b)) achieves its best result at an appropriate number of clusters; however, the cluster centers of the two sets of normal data are not accurately identified due to the interference of unidentified anomalies. The basic model MOGAN (shown in Fig. 5(c)) describes a division boundary that encloses the concentrated data, so the discrete anomalies can be accurately identified; however, the partially identified group anomalies cannot be separated from the concentrated normal data because only the explicit information in identified anomalies is used. The supervised SupGAN and SupRCCGAN (shown in Fig. 5(f) and 5(g)), which use GANs to enhance the minority class, can identify the group anomalies represented by the identified anomalies; however, detecting discrete and emerging anomalies remains a substantial challenge because the patterns of the normal data are not established. The semi-supervised ADOA (shown in Fig. 5(h)), which obtains the suboptimal AUC value, can identify all anomalies in the training data; but because ADOA only divides the weighted normal data from the weighted anomalies, its detection of emerging anomalies in the test data cannot be guaranteed. By contrast, our proposed models (shown in Fig. 5(d) and 5(e)) describe a division boundary that encloses the normal data, showing clear advantages in identifying both the partially identified group anomalies and all discrete anomalies.
4.1.3 Experiments on Real-world Data
Ten real-world datasets that frequently appear in the outlier detection literature are selected for the following experiments to obtain an overall assessment of the different algorithms. These datasets are first processed into outlier evaluation datasets according to the procedure described in [44]. We then divide each dataset into a training dataset and a test dataset at a ratio of 2 to 1. Furthermore, 10% of the anomalies in the training data are randomly selected as identified anomalies to match the setting of few identified anomalies. Detailed information on these datasets is listed in Table I, where NoC. indicates the number of identified-anomaly clusters produced by RCC.
Dataset      Dim.   Training Data                        Test Data
                    Nor.    Ano.   Ide.   NoC.(Ide.)
Thyroid        6    2451      64     7       1             1255
Pima           8     328     179    18       3              261
Stamps         9     206      21     3       1              113
Pageblocks    10    3263     340    34      16             1790
Cardio        21    1103     118    12       3              610
Waveform      21    2229      67     7       1             2322
Spambase      57    1681    1120   112      12             1406
Optdigits     64    3377     100    10       1             2179
Mnist        100    4602     467    47      14             2534
Har          561    1868      20     2       1              972
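The split-and-identify protocol above (2:1 train/test split, then 10% of the training anomalies marked as identified) can be sketched in a few lines; the helper name `prepare` and the floor-based rounding are assumptions, not the paper's exact procedure:

```python
import random

def prepare(normals, anomalies, seed=0):
    """2:1 train/test split; ~10% of training anomalies become 'identified'."""
    rng = random.Random(seed)
    normals, anomalies = normals[:], anomalies[:]
    rng.shuffle(normals)
    rng.shuffle(anomalies)
    n_cut, a_cut = 2 * len(normals) // 3, 2 * len(anomalies) // 3
    train = normals[:n_cut] + anomalies[:a_cut]
    test = normals[n_cut:] + anomalies[a_cut:]
    identified = anomalies[:max(1, a_cut // 10)]
    return train, test, identified

# Toy run: 300 normal records and 30 anomalies.
train, test, identified = prepare(list(range(300)), list(range(300, 330)))
```

With 30 anomalies, 20 land in the training set and 2 of them are flagged as identified, mirroring the proportions in Table I.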
Experimental results on realworld datasets are shown in Table II. The highest AUC for each dataset is highlighted in bold. The average ranks of nine algorithms on ten datasets are provided in the last row of Table II.
Dataset       kNN     LOF     k-means  MOGAN   DualGAN  RCCDualGAN  SupRCCGAN  SupGAN  ADOA
Thyroid      0.9365  0.9527   0.9381   0.9606  0.9775    0.9915      0.9970    0.9872  0.9927
Pima         0.7385  0.7154   0.6927   0.7366  0.7326    0.7460      0.6769    0.5823  0.7021
Stamps       0.9223  0.9010   0.9077   0.9236  0.9906    0.9922      0.9097    0.9022  0.9509
Pageblocks   0.8866  0.9232   0.9195   0.8456  0.9024    0.9317      0.9230    0.8066  0.9156
Cardio       0.9606  0.9639   0.9577   0.9516  0.9871    0.9892      0.9891    0.9105  0.9675
Waveform     0.8102  0.8071   0.7275   0.8658  0.9140    0.9184      0.9186    0.8821  0.8474
Spambase     0.5724  0.5391   0.5972   0.8947  0.9152    0.8785      0.9131    0.8753  0.8108
Optdigits    0.8303  0.9100   0.8843   0.9020  0.9926    0.9941      0.9960    0.9959  1.0000
Mnist        0.8647  0.8562   0.8467   0.9114  0.9517    0.9748      0.9738    0.9579  0.9731
Har          0.9756  0.9827   0.9718   0.9892  0.9933    0.9943      0.9943    0.9923  0.9915
Average Rank  6.9     6.0      7.6      5.7     3.7       1.9         2.9       6.0     4.2
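The average ranks in the last row can be reproduced by ranking the algorithms per dataset (rank 1 = highest AUC) and averaging across datasets; the sketch below ignores tie handling for simplicity:

```python
def average_ranks(auc_table):
    """auc_table maps dataset name -> list of AUCs (one per method).
    Returns the mean rank of each method across all datasets."""
    n_methods = len(next(iter(auc_table.values())))
    totals = [0.0] * n_methods
    for aucs in auc_table.values():
        # Methods ordered from highest to lowest AUC on this dataset.
        order = sorted(range(n_methods), key=lambda i: -aucs[i])
        for rank, method in enumerate(order, start=1):
            totals[method] += rank
    return [t / len(auc_table) for t in totals]

# Toy example with two datasets and three methods.
ranks = average_ranks({"d1": [0.90, 0.80, 0.70],
                       "d2": [0.70, 0.95, 0.60]})
```

For tied AUCs (e.g., the 0.9943 pair on Har), a full implementation would assign the mean of the tied ranks instead of breaking ties by position.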
Compared with the unsupervised methods (i.e., kNN, LOF, and k-means), the algorithms that use identified anomalies achieve substantially higher accuracy on most datasets, showing that the reasonable use of these limited labels can effectively improve outlier detection performance even with only a few identified anomalies. Moreover, to further evaluate the effect of the number of identified anomalies on the different algorithms, the semi-supervised and supervised approaches are run on these datasets with different identification ratios. The results are shown in Fig. 7, where the ratio of identified anomalies in each dataset is adjusted from 0% to 100%. The accuracy of MOGAN (shown with blue lines in Fig. 7) generally increases linearly with the identification ratio, and satisfactory results can only be obtained when many anomalies are identified. By contrast, DualGAN, RCCDualGAN, and SupRCCGAN (shown with yellow, red, and orange lines, respectively, in Fig. 7) can exploit a few identified anomalies (i.e., a 10% identification ratio) to achieve excellent results on multiple datasets, approaching the results obtained when all labels are known (i.e., a 100% identification ratio).
Compared with the supervised methods, the overall performance (i.e., average rank) of DualGAN and RCCDualGAN is superior to that of SupGAN and SupRCCGAN, respectively. Although the suboptimal SupRCCGAN achieves the best performance on three datasets (i.e., Thyroid, Waveform, and Har), the identified anomalies in these datasets belong to a single cluster (i.e., NoC. = 1). This means that all anomalies in each of these datasets are most likely generated by the same mechanism, so the identified anomalies may represent all of them. If unidentified or emerging anomalies appear in later detection, the accuracy of the supervised detector cannot always be guaranteed. By contrast, the proposed semi-supervised methods, which also use the unsupervised modules (i.e., UMOGAN and UMGAN) to establish the patterns of the normal data, can simultaneously detect the partially identified group anomalies and all discrete anomalies.
The semi-supervised ADOA, which uses isolation and similarity to calculate the confidence of each instance, can identify the partially identified group anomalies and discrete anomalies in the training data. However, because ADOA faces a significant challenge in detecting emerging anomalies, the overall performance of DualGAN and RCCDualGAN is better than that of ADOA. As for the comparison between the two proposed methods, RCCDualGAN outperforms DualGAN on nine of the ten datasets. This shows that the network structure combining RCC and MGAN is more stable across various datasets, which is also reflected in the comparison between SupGAN and SupRCCGAN.
4.2 Applications
4.2.1 Credit Card Fraud Detection
With the rapid development of e-commerce, credit card fraud has become increasingly diverse, posing a serious threat to all organizations that issue credit cards or manage online transactions. Thus, many machine learning and computational intelligence techniques have been proposed to reduce economic losses and simultaneously enhance customer confidence. However, they mainly focus on the supervised or unsupervised setting, ignoring the verifiability of fraud and verification latency; that is, a small set of frauds can be checked by investigators in a timely manner, whereas the remaining transactions stay unlabeled until customers report fraud. Therefore, we apply our proposed models to credit card fraud detection.
Because banks are reluctant to disclose such data, we perform the experiment on the publicly available Creditcard dataset [45]. The dataset contains 284,807 credit card records from two days in September 2013, of which 492 are fraudulent transactions. Each record consists of the transaction time, amount, class (i.e., normal or fraud), and 28 numerical features, which are the principal components extracted from the original features. On this basis, we further remove the transaction time and rescale the other features to the interval [0, 1]. We then divide the dataset into two parts at a ratio of 2 to 1: the training dataset contains 328 fraudulent transactions out of 189,871 records, while the test dataset contains 164 fraudulent transactions out of 94,936 records. Finally, to match the special semi-supervised setting, we randomly select 10% of the fraudulent transactions (i.e., 33 frauds) in the training dataset as identified frauds, and the remaining records are used as unlabeled transactions.
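Rescaling the features to [0, 1] is a standard column-wise min-max transform; a pure-Python sketch (the paper does not specify its exact implementation) follows:

```python
def minmax_rescale(rows):
    """Rescale each feature column of a list of tuples to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Toy example: two features with different ranges.
scaled = minmax_rescale([(0.0, 10.0), (5.0, 20.0), (10.0, 30.0)])
```

Constant columns are mapped to 0.0 to avoid division by zero; in practice, the scaling parameters should be fitted on the training split and reused for the test split.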
The experimental results on the Creditcard dataset are shown in Fig. 8. Similar to the results on the real-world datasets, RCCDualGAN and DualGAN achieve good performance, which demonstrates the effectiveness of our proposed methods for credit card fraud detection. The supervised SupRCCGAN yields a suboptimal result because the identified frauds may represent the vast majority of fraudulent transactions. However, the detection accuracy of the supervised SupGAN is even worse than that of the unsupervised kNN and LOF, indicating that a single GAN cannot accurately learn multiple generation mechanisms simultaneously, which further confirms the performance advantages of the combination of RCC and MGAN.
4.2.2 Network Intrusion Detection
Cybersecurity is another important application area of outlier detection, and a considerable number of machine learning techniques, including cluster-based, classification-based, and hybrid methods, have been developed for intrusion detection. However, although only part of the intrusions can be detected in practice, semi-supervised methods are still rarely studied and applied to this problem, as discussed above. Thus, in this section, we apply our proposed methods to NSL-KDD, one of the most widely used datasets for the performance evaluation of intrusion detection.
NSL-KDD solves several inherent problems of KDD'99 by removing redundant records and readjusting the dataset size. To better reflect the fact that attacks are relatively uncommon, we further reduce the proportion of attacks by deleting 90% of the attack records. The resulting training dataset contains 67,343 normal records and 5,872 attacks belonging to 21 attack types in four main categories (i.e., DoS, Probe, R2L, and U2R); the test set contains 9,711 normal records and 1,304 attacks falling into 37 attack types. Finally, we randomly select 10% of the network intrusions (i.e., 597 attacks across the 21 attack types) from each attack type in the training data as identified attacks, and the remaining records are used as unlabeled behaviors.
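The attack-downsampling step (keep all normal records, delete 90% of the attacks) can be sketched as below; the record representation and helper name are hypothetical:

```python
import random

def downsample_attacks(records, is_attack, keep_ratio=0.1, seed=0):
    """Keep every normal record but only ~keep_ratio of attack records."""
    rng = random.Random(seed)
    attacks = [r for r in records if is_attack(r)]
    normals = [r for r in records if not is_attack(r)]
    kept = rng.sample(attacks, int(len(attacks) * keep_ratio))
    return normals + kept

# Toy example: records 0-19 are attacks, 20-99 are normal.
reduced = downsample_attacks(list(range(100)), lambda r: r < 20)
```

Sampling per attack type (rather than globally, as here) would additionally preserve the relative frequencies of the 21 attack types.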
The experimental results on the NSL-KDD dataset are shown in Fig. 9. The semi-supervised RCCDualGAN and DualGAN achieve the optimal and suboptimal outcomes, respectively, whereas the supervised SupGAN and SupRCCGAN obtain results similar to those of the unsupervised kNN and k-means. This is most likely because only 19 of the 37 attack types in the test data are identified, so detecting emerging intrusions is as important as making effective use of the identified attacks. The semi-supervised RCCDualGAN and DualGAN can exploit the potential information in the identified intrusions and simultaneously detect emerging discrete attacks.
5 Conclusions and Future Work
In this paper, we first propose a one-step method, DualGAN, for semi-supervised outlier detection with few identified anomalies, which can directly utilize the potential information in identified anomalies to detect partially identified group anomalies. In addition, because instances with similar output values may not all be similar in a complex data structure, we propose a modified model, RCCDualGAN, based on DualGAN to create the reference distribution and augment the minority class more robustly. Considering the difficulty of finding the Nash equilibrium and the optimal model during iteration, two evaluation indicators are provided to make the detection process more intelligent and reliable. Extensive experiments on synthetic and real-world data show that, even with only a few identified anomalies, our proposed approaches can substantially improve the accuracy of outlier detection. Moreover, credit card fraud detection and network intrusion detection are performed to demonstrate the effectiveness of our proposed methods in complex practical situations. In the future, we will attempt to introduce incremental learning into the training process to continuously learn new knowledge at a lower computational cost, and we will conduct more intensive research on the evaluation of the Nash equilibrium.
Acknowledgments
This work is supported by the Major Program of the National Natural Science Foundation of China (91846201, 71490725), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (71521001), the National Natural Science Foundation of China (71722010, 91546114, 91746302, 71872060), and the National Key Research and Development Program of China (2017YFB0803303).
References
 [1] U. Fiore, A. D. Santis, F. Perla, P. Zanetti, and F. Palmieri, “Using generative adversarial networks for improving classification effectiveness in credit card fraud detection,” Information Sciences, 2017.

[2] A. D. Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi, "Credit card fraud detection: A realistic modeling and a novel learning strategy," IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3784–3797, 2017.
[3] S. Makki, Z. Assaghir, Y. Taher, R. Haque, M.-S. Hacid, and H. Zeineddine, "An experimental study with imbalanced classification approaches for credit card fraud detection," IEEE Access, vol. 7, pp. 93010–93022, 2019.
 [4] M. Rahman, M. Rahman, B. Carbunar, and D. H. Chau, “Search rank fraud and malware detection in google play,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 8, pp. 1329–1342, 2017.
 [5] S. Liu, B. Hooi, and C. Faloutsos, “A contrast metric for fraud detection in rich graphs,” IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 12, pp. 2235–2248, 2019.
 [6] P. Mishra, V. Varadharajan, U. Tupakula, and E. S. Pilli, “A detailed investigation and analysis of using machine learning techniques for intrusion detection,” IEEE Communications Surveys and Tutorials, vol. 21, no. 1, pp. 686–728, 2019.
 [7] M. R. G. Raman, N. Somu, K. Kirthivasan, and V. S. Sriram, “A hypergraph and arithmetic residuebased probabilistic neural network for classification in intrusion detection systems,” Neural Networks, vol. 92, pp. 89–97, 2017.
 [8] J. Mao, T. Wang, C. Jin, and A. Zhou, “Feature groupingbased outlier detection upon streaming trajectories,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 12, pp. 2696–2709, 2017.
 [9] X. Yang, L. J. Latecki, and D. Pokrajac, “Outlier detection with globally optimal exemplarbased gmm,” in SIAM International Conference on Data Mining, 2009, pp. 145–154.

[10]
B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen, “Deep autoencoding gaussian mixture model for unsupervised anomaly detection,” in
International Conference on Learning Representations, 2018.  [11] E. Manzoor, S. M. Milajerdi, and L. Akoglu, “Fast memoryefficient anomaly detection in streaming heterogeneous graphs,” in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1035–1044.

[12]
H. Paulheim and R. Meusel, “A decomposition of the outlier detection problem into a set of supervised learning problems,”
Machine Learning, vol. 100, no. 23, pp. 509–531, 2015.  [13] M. Salehi, C. Leckie, J. C. Bezdek, T. Vaithianathan, and X. Zhang, “Fast memory efficient local outlier detection in data streams,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 12, pp. 3246–3260, 2016.
[14] M. H. Chehreghani, "K-nearest neighbor search and outlier detection via minimax distances," in SIAM International Conference on Data Mining, 2016, pp. 405–413.
[15] Y. Djenouri, A. Belhadi, J. C.-W. Lin, and A. Cano, "Adapted k-nearest neighbors for detecting anomalies on spatio-temporal traffic flow," IEEE Access, vol. 7, pp. 10015–10027, 2019.
 [16] C. Zhou and R. C. Paffenroth, “Anomaly detection with robust deep autoencoders,” in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 665–674.
 [17] T. Schlegl, P. Seeböck, S. M. Waldstein, U. SchmidtErfurth, and G. Langs, “Unsupervised anomaly detection with generative adversarial networks to guide marker discovery,” in International Conference on Information Processing in Medical Imaging, 2017, pp. 146–157.

[18] M. Sabokrou, M. Khalooei, M. Fathy, and E. Adeli, "Adversarially learned one-class classifier for novelty detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3379–3388.
[19] I. Steinwart, "A classification framework for anomaly detection," Journal of Machine Learning Research, vol. 6, no. 1, pp. 211–232, 2005.
 [20] Y. Zhang, L. Li, J. Zhou, X. Li, and Z. Zhou, “Anomaly detection with partially observed anomalies,” in WWW: International World Wide Web Conference, 2018, pp. 639–646.
 [21] C. C. Aggarwal, Outlier Analysis. Springer International Publishing, 2017.
 [22] A. Daneshpazhouh and A. Sami, "Entropy-based outlier detection using semi-supervised approach with few positive examples," Pattern Recognition Letters, vol. 49, pp. 77–84, 2014.

[23] A. Daneshpazhouh and A. Sami, "Semi-supervised outlier detection with only positive and unlabeled data based on fuzzy clustering," International Journal on Artificial Intelligence Tools, vol. 24, no. 3, 2015.
[24] B. Liu, Y. Xiao, P. S. Yu, Z. Hao, and L. Cao, "An efficient approach for outlier detection with imperfect data labels," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 7, pp. 1602–1616, 2014.
 [25] Y. Liu, Z. Li, C. Zhou, Y. Jiang, J. Sun, M. Wang, and X. He, “Generative adversarial active learning for unsupervised outlier detection,” IEEE Transactions on Knowledge and Data Engineering, 2019.
 [26] S. A. Shah and V. Koltun, “Robust continuous clustering,” Proceedings of the National Academy of Sciences, vol. 114, no. 37, p. 9814–9819, 2017.
 [27] H. Wang, M. Bah, and M. Hammad, "Progress in outlier detection techniques: A survey," IEEE Access, vol. 7, pp. 107964–108000, 2019.

[28] B. X. Wang and N. Japkowicz, "Boosting support vector machines for imbalanced data sets," Knowledge and Information Systems, vol. 25, no. 1, pp. 1–20, 2010.
[29] R. F. Lima and A. C. M. Pereira, "Feature selection approaches to fraud detection in e-payment systems," in International Conference on Electronic Commerce and Web Technologies, 2017, pp. 111–126.
[30] J. L. P. Lima, D. Macêdo, and C. Zanchettin, "Heartbeat anomaly detection using adversarial oversampling," in IEEE International Joint Conference on Neural Networks, 2019.

[31] S. M. Erfani, S. Rajasegarar, S. Karunasekera, and C. Leckie, "High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning," Pattern Recognition, vol. 58, pp. 128–134, 2016.
[32] B. Liu, Y. Xiao, L. Cao, Z. Hao, and F. Deng, "SVDD-based outlier detection on uncertain data," Knowledge and Information Systems, vol. 34, no. 3, pp. 597–618, 2013.
 [33] J. Gao, H. Cheng, and P. Tan, "Semi-supervised outlier detection," in ACM Symposium on Applied Computing, 2006, pp. 635–636.
 [34] Z. Xue, Y. Shang, and A. Feng, "Semi-supervised outlier detection based on fuzzy rough c-means clustering," Mathematics and Computers in Simulation, vol. 80, no. 9, pp. 1911–1921, 2010.
 [35] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial networks," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
 [36] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, "Ganomaly: Semi-supervised anomaly detection via adversarial training," arXiv:1805.06725, 2018.
 [37] H. Zenati, C. S. Foo, B. Lecouat, G. Manek, and V. R. Chandrasekhar, “Efficient ganbased anomaly detection,” in The Workshop on International Conference on Learning Representations, 2018.
 [38] J. Bian, X. Hui, S. Sun, X. Zhao, and M. Tan, "A novel and efficient CVAE-GAN-based approach with informative manifold for semi-supervised anomaly detection," IEEE Access, vol. 7, pp. 88903–88916, 2019.
 [39] C. Wang, Y. Zhang, and C. Liu, “Anomaly detection via minimum likelihood generative adversarial networks,” in International Conference on Pattern Recognition, 2018.
 [40] S. K. Lim, Y. Loo, N.T. Tran, N.M. Cheung, G. Roig, and Y. Elovici, “Doping: Generative data augmentation for unsupervised anomaly detection with gan,” in IEEE International Conference on Data Mining, 2018.
 [41] Y. J. Zheng, X. Zhou, W. Sheng, Y. Xue, and S. Chen, “Generative adversarial network based telecom fraud detection at the receiving bank,” Neural Networks, vol. 102, pp. 78–86, 2018.
 [42] M. Kimura and T. Yanagihara, “Semisupervised anomaly detection using gans for visual inspection in noisy training data,” arXiv:1807.01136, 2018.
 [43] H. Gao, Z. Shou, A. Zareian, H. Zhang, and S. Chang, “Lowshot learning via covariancepreserving adversarial augmentation networks,” in Advances in Neural Information Processing Systems, 2018, pp. 981–991.
 [44] G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent, and M. E. Houle, “On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study,” Data Mining and Knowledge Discovery, vol. 30, no. 4, pp. 891–927, 2016.
 [45] A. D. Pozzolo, O. Caelen, R. A. Johnson, and G. Bontempi, “Calibrating probability with undersampling for unbalanced classification,” in IEEE Symposium Series on Computational Intelligence, 2015, pp. 159–166.