Attributed networks [li2019adaptive] are ubiquitous in the real world, with examples including social networks [liao2018attributed], communication networks [2019DPernesIJCNN], and product co-purchase networks [shi2018heterogeneous]; in such networks, each node is associated with a rich set of attributes or characteristics in addition to the raw network topology.
Anomaly detection on attributed networks aims at finding nodes whose patterns or behaviors deviate significantly from those of the reference nodes, and it has a broad impact on various domains such as network intrusion detection [ding2012intrusion], system fault diagnosis [cheng2016ranking], and social spammer detection [fakhraei2015collective]. Recently, there has been growing research interest in anomaly detection on attributed networks. Some works study community-level anomaly detection by comparing the current node with other reference nodes within the same community [perozzi2016scalable] or by measuring the quality of connected subgraphs [perozzi2018discovering]. Others conduct anomaly analysis through subspace selection of node features [sanchez2013subspaces, perozzi2014focused]. Some recent residual-analysis-based methods attempt to find anomalies under the assumption that anomalies cannot be approximated from other reference nodes [li2017radar, peng2018anomalous].
Although the above-mentioned algorithms have had their fair share of success, these methods either suffer from severe computational overhead caused by shallow learning mechanisms and subspace selection, or neglect the complex interactions between nodes and attributes by learning representations for nodes only [ding2019deep]; yet the interactions between these two modalities are of great importance for the anomaly detection task, since they capture both structure-induced and attribute-induced anomalies. To alleviate these problems, in this paper we propose a deep joint representation learning framework for anomaly detection through a dual autoencoder (AnomalyDAE), which captures the complex interactions between network structure and node attributes for high-quality embeddings. Different from [ding2019deep], which employs a single graph convolutional network (GCN) [kipf2017semi] based encoder for node embedding, AnomalyDAE consists of a structure autoencoder and an attribute autoencoder that jointly learn the latent representations of nodes and attributes by reconstructing the original network topology and node attributes, respectively. Anomalies in the network are then detected by measuring the reconstruction errors of nodes from both the structure and the attribute perspectives.
In sum, the main contributions of this paper are as follows:
We propose a deep joint representation learning framework for anomaly detection on attributed networks via a dual autoencoder where the complex cross-modality interactions between the network structure and node attribute are captured, and the anomalies are measured from both the structure and attribute perspectives.
We conduct extensive experiments on multiple real-world datasets, and the results show that AnomalyDAE consistently and significantly outperforms the state-of-the-art deep model, with up to a 22.32% improvement in ROC AUC score. The source code is publicly available at https://github.com/haoyfan/AnomalyDAE.
2 Notations and Problem Statement
In this section, we formally define the frequently-used notations and the studied problem. The notations used in this paper are summarized in Table 1.
Attributed Network. An attributed network $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{X})$ is defined as an undirected graph with $M = |\mathcal{V}|$ nodes and $|\mathcal{E}|$ edges, in which each node is associated with an $N$-dimensional attribute vector.
Given an attributed network $\mathcal{G}$, our goal is to detect the nodes that are rare and that differ significantly from the majority of the reference nodes in terms of both the structure and the attribute information of the nodes. More formally, we aim to learn a score function $f(\cdot)$ and to classify a node $v_i$ based on a threshold $\lambda$:

$$y_i = \begin{cases} 1, & \text{if } f(v_i) \geq \lambda, \\ 0, & \text{otherwise,} \end{cases}$$

where $y_i$ denotes the label of node $v_i$, with 0 being the normal class and 1 the anomalous class.
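The decision rule above can be sketched as follows; the score values and the threshold here are arbitrary illustrations, not values from the paper:

```python
import numpy as np

def classify(scores, threshold):
    """Label each node: 1 (anomalous) if its anomaly score reaches the
    threshold lambda, 0 (normal) otherwise."""
    scores = np.asarray(scores, dtype=float)
    return (scores >= threshold).astype(int)

# Toy scores for five nodes; the threshold is a free parameter here.
labels = classify([0.1, 0.9, 0.3, 1.2, 0.2], threshold=0.8)
print(labels)  # [0 1 0 1 0]
```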
| Notation | Description |
| --- | --- |
| $\mathcal{V}$ | The set of nodes in the network. |
| $\mathcal{X}$ | The set of node attributes in the network. |
| $\mathcal{E}$ | The set of edges in the network. |
| $M$ | The number of nodes. |
| $N$ | The dimension of the attributes. |
| $D$ | The dimension of the embeddings. |
| $\mathbf{A}$ | The adjacency matrix of the network. |
| $\mathbf{X}$ | The attribute matrix of all nodes. |
| $\mathbf{Z}^{V}$ | The latent embedding of nodes. |
| $\mathbf{Z}^{A}$ | The latent embedding of attributes. |
In this section, we introduce the proposed AnomalyDAE in detail. As shown in Fig. 1, AnomalyDAE is an end-to-end joint representation learning framework that consists of a structure autoencoder for network structure reconstruction and an attribute autoencoder for node attribute reconstruction. Taking the node embedding learned by the structure encoder and the attribute embedding learned by the attribute encoder as inputs, the interactions between the network structure and the node attributes are jointly captured by both the structure decoder and the attribute decoder during training. Finally, anomalies in the network are measured by the reconstruction errors of the network structure and the node attributes.
3.1 Structure Autoencoder
In order to obtain sufficiently representative high-level node features, the structure encoder first transforms the original observed node attributes $\mathbf{X}$ into a low-dimensional latent representation $\mathbf{Z}^{(1)}$ as follows:

$$\mathbf{Z}^{(1)} = \sigma\big(\mathbf{X}\mathbf{W}^{(1)} + \mathbf{b}^{(1)}\big),$$

where $\mathbf{W}^{(1)} \in \mathbb{R}^{N \times D^{(1)}}$ and $\mathbf{b}^{(1)} \in \mathbb{R}^{D^{(1)}}$ are the weight and bias learned by the encoder, $\sigma(\cdot)$ is a non-linear activation function, and $N$ and $D^{(1)}$ are the dimensionalities of $\mathbf{X}$ and $\mathbf{Z}^{(1)}$, respectively.
Given the transformed node embedding $\mathbf{Z}^{(1)}$, a graph attention layer [velickovic2018graph] is then employed to aggregate representations from neighboring nodes by performing a shared attentional mechanism on the nodes:

$$e_{ij} = \mathrm{attn}\big(\mathbf{z}^{(1)}_i, \mathbf{z}^{(1)}_j\big) = \sigma\big(\mathbf{a}^{\top}\big[\mathbf{W}\mathbf{z}^{(1)}_i \oplus \mathbf{W}\mathbf{z}^{(1)}_j\big]\big),$$

where $e_{ij}$ is the importance weight of node $v_j$ to node $v_i$, $\mathrm{attn}(\cdot)$ denotes the neural network parameterized by the weights $\mathbf{a}$ and $\mathbf{W}$ shared by all nodes, and $\oplus$ denotes the concatenation operation. The importance weights are then normalized through the softmax function:

$$\gamma_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})},$$

where $\mathcal{N}_i$ denotes the neighbors of node $v_i$, as given by the adjacency matrix $\mathbf{A}$. The final embedding of node $v_i$ is obtained as a weighted sum based on the learned importance weights:

$$\mathbf{z}^{V}_i = \sum_{k \in \mathcal{N}_i} \gamma_{ik}\,\mathbf{z}^{(1)}_k.$$
Finally, the structure decoder takes the final node embeddings $\mathbf{Z}^{V}$ as input and decodes them to reconstruct the original network structure:

$$\hat{\mathbf{A}} = \mathrm{sigmoid}\big(\mathbf{Z}^{V}{\mathbf{Z}^{V}}^{\top}\big).$$
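The structure autoencoder can be sketched in NumPy as follows. This is a minimal single-head illustration with random weights standing in for learned parameters; the function and variable names are our own, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(v - v.max())
    return e / e.sum()

def structure_autoencoder(X, A, W_in, W_att, a_att, sigma=np.tanh):
    """Sketch of the structure AE forward pass: a non-linear feature
    transform, a single graph-attention aggregation over the neighbors
    given by A, and a decoder reconstructing A as sigmoid(Zv Zv^T)."""
    M = A.shape[0]
    Z1 = sigma(X @ W_in)                       # low-dimensional node features
    H = Z1 @ W_att                             # shared linear map for attention
    Zv = np.zeros_like(Z1)
    for i in range(M):
        nbrs = np.flatnonzero(A[i])            # neighborhood N_i from A
        # e_ij = sigma(a^T [W z_i (+) W z_j]) for each neighbor j
        e = np.array([np.tanh(a_att @ np.concatenate([H[i], H[j]]))
                      for j in nbrs])
        gamma = softmax(e)                     # normalized importance weights
        Zv[i] = gamma @ Z1[nbrs]               # weighted sum over neighbors
    A_hat = 1.0 / (1.0 + np.exp(-(Zv @ Zv.T))) # decoder: sigmoid(Zv Zv^T)
    return Zv, A_hat

# Toy graph: 4 nodes, 6-dimensional attributes, 3-dimensional embedding.
M, N, D = 4, 6, 3
X = rng.normal(size=(M, N))
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
Zv, A_hat = structure_autoencoder(X, A, rng.normal(size=(N, D)),
                                  rng.normal(size=(D, D)),
                                  rng.normal(size=2 * D))
```

Note that the reconstruction `A_hat` is symmetric by construction, matching the undirected-graph setting.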
3.2 Attribute Autoencoder
In the attribute encoder, two non-linear feature transform layers are employed to map the observed attribute data $\mathbf{X}$ to the latent attribute embedding $\mathbf{Z}^{A}$, which can be formulated as follows:

$$\mathbf{Z}^{(2)} = \sigma\big(\mathbf{X}^{\top}\mathbf{W}^{(2)} + \mathbf{b}^{(2)}\big), \qquad \mathbf{Z}^{A} = \mathbf{Z}^{(2)}\mathbf{W}^{(3)} + \mathbf{b}^{(3)},$$

where $\mathbf{W}^{(2)}$, $\mathbf{W}^{(3)}$, $\mathbf{b}^{(2)}$, and $\mathbf{b}^{(3)}$ are the weights and biases learned by the two layers, and $M$, $D^{(2)}$, and $D$ are the dimensionalities of $\mathbf{X}^{\top}$, $\mathbf{Z}^{(2)}$, and $\mathbf{Z}^{A}$, respectively.
Finally, the attribute decoder takes both the node embeddings $\mathbf{Z}^{V}$ learned by the structure encoder and the attribute embeddings $\mathbf{Z}^{A}$ as inputs to decode the original node attributes:

$$\hat{\mathbf{X}} = \mathbf{Z}^{V}{\mathbf{Z}^{A}}^{\top},$$

in which the interactions between the network structure and the node attributes are jointly captured. Different from the structure decoder, no activation function is used in the attribute decoder, since the attributes can take arbitrary values.
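A minimal NumPy sketch of the attribute autoencoder and its cross-modality decoder, with random weights standing in for learned parameters (the names and layer sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def attribute_autoencoder(X, Zv, W2, b2, W3, b3, sigma=np.tanh):
    """Sketch of the attribute AE: two transform layers map X^T (one row
    per attribute) to the attribute embedding Za, and the decoder combines
    node and attribute embeddings as X_hat = Zv Za^T. No activation is
    applied to the decoder output, since attributes are arbitrary-valued."""
    Z2 = sigma(X.T @ W2 + b2)   # first (non-linear) transform layer
    Za = Z2 @ W3 + b3           # attribute embedding, one row per attribute
    X_hat = Zv @ Za.T           # cross-modality interaction decoder
    return Za, X_hat

# Toy sizes: 4 nodes, 6 attributes, embedding dim 3, hidden dim 5.
M, N, D, H = 4, 6, 3, 5
X = rng.normal(size=(M, N))
Zv = rng.normal(size=(M, D))    # node embeddings from the structure encoder
Za, X_hat = attribute_autoencoder(X, Zv,
                                  rng.normal(size=(M, H)), rng.normal(size=H),
                                  rng.normal(size=(H, D)), rng.normal(size=D))
```

Because `X_hat` is the product of the two embedding matrices, gradients of the reconstruction error flow into both encoders, which is how the two modalities are coupled during training.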
In AnomalyDAE, the dominant computational costs of the structure autoencoder and the attribute autoencoder are $O(MND + |\mathcal{E}|D + M^{2}D)$ and $O(MND)$, respectively, where $M$ is the number of nodes, $|\mathcal{E}|$ is the number of edges, $N$ is the dimension of the attributes, and $D$ is the dimension of the embeddings.
3.3 Loss function
The training objective of AnomalyDAE is to minimize the reconstruction errors of both the network structure and the node attributes:

$$\mathcal{L} = \alpha\,\big\|(\mathbf{A} - \hat{\mathbf{A}}) \odot \boldsymbol{\theta}\big\|_{F}^{2} + (1-\alpha)\,\big\|(\mathbf{X} - \hat{\mathbf{X}}) \odot \boldsymbol{\eta}\big\|_{F}^{2},$$

where $\alpha$ is the parameter that controls the trade-off between structure reconstruction and attribute reconstruction, $\odot$ is the Hadamard product, and the penalty matrices $\boldsymbol{\theta}$ and $\boldsymbol{\eta}$ are defined as:

$$\theta_{ij} = \begin{cases} 1, & \text{if } \mathbf{A}_{ij} = 0, \\ \theta, & \text{otherwise,} \end{cases} \qquad \eta_{ij} = \begin{cases} 1, & \text{if } \mathbf{X}_{ij} = 0, \\ \eta, & \text{otherwise,} \end{cases}$$

where $\theta > 1$ and $\eta > 1$, which impose a larger penalty on the reconstruction error of the non-zero elements, to account for missing edges or attributes in real-world networks.
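The weighted objective can be sketched as follows, assuming penalty matrices that take the value theta (resp. eta) on the non-zero entries of A (resp. X) and 1 elsewhere:

```python
import numpy as np

def anomalydae_loss(A, A_hat, X, X_hat, alpha, theta, eta):
    """Weighted reconstruction loss: errors on the non-zero entries of
    A and X are scaled by theta and eta (> 1), and alpha balances the
    structure and attribute terms. A minimal sketch of the objective."""
    Theta = np.where(A != 0, theta, 1.0)   # per-entry structure penalties
    Eta = np.where(X != 0, eta, 1.0)       # per-entry attribute penalties
    struct_err = np.sum(((A - A_hat) * Theta) ** 2)  # squared Frobenius norm
    attr_err = np.sum(((X - X_hat) * Eta) ** 2)
    return alpha * struct_err + (1.0 - alpha) * attr_err

A = np.array([[0., 1.], [1., 0.]])
X = np.array([[1., 0.], [0., 2.]])
# Perfect reconstruction gives zero loss regardless of the penalties.
assert anomalydae_loss(A, A, X, X, alpha=0.7, theta=5.0, eta=40.0) == 0.0
```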
3.4 Anomaly Detection
Motivated by the observation that the patterns of abnormal nodes deviate from those of the majority of nodes in either structure or attributes, the anomaly score of node $v_i$ is defined as its reconstruction error from both the network structure and the node attribute perspectives:

$$score(v_i) = \alpha\,\big\|(\mathbf{a}_i - \hat{\mathbf{a}}_i) \odot \boldsymbol{\theta}_i\big\|_{2}^{2} + (1-\alpha)\,\big\|(\mathbf{x}_i - \hat{\mathbf{x}}_i) \odot \boldsymbol{\eta}_i\big\|_{2}^{2},$$

where $\mathbf{a}_i$ and $\mathbf{x}_i$ denote the $i$-th rows of $\mathbf{A}$ and $\mathbf{X}$, respectively. Based on the measured anomaly scores, the threshold $\lambda$ in Eq. 1 can be determined according to the distribution of the scores; e.g., the nodes with the top-$k$ scores are classified as anomalous.
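The per-node scoring and top-k selection can be sketched as follows (a minimal illustration consistent with the loss above; the names are our own):

```python
import numpy as np

def anomaly_scores(A, A_hat, X, X_hat, alpha, theta, eta):
    """Per-node anomaly score: the alpha-weighted, penalty-scaled
    reconstruction error of each node's structure row and attribute row."""
    Theta = np.where(A != 0, theta, 1.0)
    Eta = np.where(X != 0, eta, 1.0)
    s_struct = np.sum(((A - A_hat) * Theta) ** 2, axis=1)
    s_attr = np.sum(((X - X_hat) * Eta) ** 2, axis=1)
    return alpha * s_struct + (1.0 - alpha) * s_attr

def top_k_anomalies(scores, k):
    """Indices of the k highest-scoring nodes, classified as anomalous."""
    return np.argsort(scores)[::-1][:k]

# Toy scores for four nodes; nodes 1 and 3 have the largest errors.
scores = np.array([0.2, 3.5, 0.1, 1.7])
print(top_k_anomalies(scores, 2))  # [1 3]
```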
4.1 Experimental Setup
Three commonly used real-world datasets [ding2019deep] are used in this paper to evaluate the proposed method: BlogCatalog, Flickr, and ACM. The statistics of the datasets are shown in Table 2. In the experiments, we train AnomalyDAE for 100, 100, and 80 iterations on BlogCatalog, Flickr, and ACM, respectively. The Adam [kingma2015adam] algorithm is used for optimization with a learning rate of 0.001. The embedding dimension is set to 128 for all datasets. The parameters $(\alpha, \theta, \eta)$ are empirically set to (0.7, 5, 40), (0.9, 8, 90), and (0.7, 3, 10) for BlogCatalog, Flickr, and ACM, respectively.
4.2 Result Analysis
4.2.1 Performance Evaluation
We compare AnomalyDAE with state-of-the-art methods including LOF [breunig2000lof], SCAN [xu2007scan], AMEN [perozzi2016scalable], Radar [li2017radar], Anomalous [peng2018anomalous], and Dominant [ding2019deep]. The AUC scores (Area Under the receiver operating characteristic Curve) for anomaly detection are reported in Table 3.
The experimental results show that the proposed AnomalyDAE significantly outperforms all baselines on all datasets. AnomalyDAE performs much better than traditional anomaly detection methods: on BlogCatalog, it outperforms LOF by 48.66%, SCAN by 70.54%, and AMEN by 44.44% in AUC, because LOF and SCAN consider only the network structure or the node attributes, and AMEN is designed to detect anomalous neighborhoods rather than anomalous nodes themselves. Besides, on Flickr, AnomalyDAE increases the AUC score by 24.36% compared with Radar and by 25.63% compared with Anomalous; this is because methods based on residual analysis or CUR decomposition [mahoney2009cur] are not only sensitive to network sparsity but also have limited learning ability on large graphs. Compared with the more recent Dominant, AnomalyDAE achieves gains of 19.68%, 22.32%, and 15.11% on BlogCatalog, Flickr, and ACM, respectively. Although the GCN-based encoder in Dominant is capable of learning discriminative node embeddings by aggregating neighbor features, a single graph encoder cannot jointly capture the complex interactions between network structure and node attributes, whereas AnomalyDAE employs two separate encoders for the joint learning of node embeddings and attribute embeddings, thereby modeling the cross-modality interactions of network structure and node attributes.
4.2.2 Parameter Sensitivity
In this section, we investigate the sensitivity of anomaly detection performance to the embedding dimension and the trade-off parameter $\alpha$. The experimental results are shown in Fig. 2. We can see that a relatively high dimension such as 128 or 256 facilitates high performance, because higher-dimensional embeddings are capable of encoding more information. However, a dimension that is too low or too high degrades performance, due to weak modeling ability or overfitting, respectively. In terms of $\alpha$, considering only attribute reconstruction ($\alpha = 0$) or only structure reconstruction ($\alpha = 1$) results in poor performance, which demonstrates the importance of the interactions between network structure and node attributes for anomaly detection on attributed networks.
In this paper, we study the problem of anomaly detection on attributed networks by considering the complex modality interactions between network structure and node attributes. To cope with this problem, we propose a deep joint representation learning framework for anomaly detection via a dual autoencoder. By introducing two separate autoencoders for the joint learning of node embeddings and attribute embeddings, AnomalyDAE performs better than current state-of-the-art methods on multiple real-world datasets.