1. Introduction
Anomaly detection aims to identify the rare instances that behave significantly differently from the majority of instances. Conventional detection algorithms are mainly based on the assumption that instances are independent and identically distributed (i.i.d.). In many practical scenarios, however, instances are explicitly or implicitly connected with each other, resulting in so-called attributed networks. The unique data characteristics of attributed networks bring several challenges to anomaly detection. The definition of anomaly becomes more complicated and obscure. Apart from anomalous nodes whose nodal attributes are rather different from those of the majority reference nodes from a global perspective, nodes whose nodal attributes deviate remarkably from their communities are also considered anomalies.
To handle the topological structures and the heterogeneity challenge in network analysis, a widely-used and effective approach is to embed all information in the attributed network into unified low-dimensional node representations. However, this achieves suboptimal performance when applied directly to anomaly detection. To achieve the joint embedding, most existing network embedding algorithms and recent joint embedding based anomaly detection methods mainly rely on the homophily hypothesis, which implies that connected nodes tend to have similar nodal attributes (McPherson et al., 2001; Liu et al., 2019). Based on this assumption, topological relations can be introduced by adding regularization terms that minimize the distance between the embedding representations of linked nodes (Yu et al., 2018; Li et al., 2019). Another effective way to encode networks is to employ graph convolutional networks (GCN) (Kipf and Welling, 2017), which can be considered a special form of Laplacian smoothing that learns nodes' representations from their neighbors (Li et al., 2018; Huang et al., 2019). The homophily hypothesis and smoothing operations, however, are not in line with anomaly detection. They might over-smooth the node representations and make anomalous nodes less distinguishable from the majority within the community. For example, malicious users may have completely different nodal attributes than their friends, and customers who write fake reviews might purchase the same products as normal customers. Thus, existing joint learning models in network analysis cannot be directly applied to anomaly detection.
To bridge the gap, we investigate the problem of anomaly detection in attributed networks. Two specific research questions are studied. 1) How can anomalies in attributed networks be formally defined? 2) How can joint learning be performed on topological structures and nodal attributes while remaining in line with anomaly detection, i.e., avoiding the restriction of the homophily hypothesis and the over-smoothing problem? Following these research perspectives, our contributions can be summarized as follows.

We propose SpecAE, a graph convolution and deconvolution based framework for anomaly detection in attributed networks.

We develop a tailored model to project the attributed network into a special space for global anomalies and community anomalies. It leverages Laplacian sharpening to amplify the distances between the embedding representations of anomalies and those of the other nodes within their communities.

We conduct evaluations on real-world datasets, demonstrating the effectiveness of SpecAE.
2. Problem Statement
Let G = {A, X} be an input attributed network with n nodes interconnected by a network. A ∈ ℝ^(n×n) denotes the corresponding adjacency matrix. Each node is associated with a set of m-dimensional nodal attributes. We collect all these nodal attributes as X ∈ ℝ^(n×m).
Definition 1 (Global Anomaly). A global anomaly refers to a node whose nodal attributes are rare and significantly different from the nodal attributes of the majority of nodes, from a global perspective.
Definition 2 (Community Anomaly). A community anomaly is defined as a node whose nodal attributes significantly deviate from those of the node's neighbors, based on the topological network structure.
Given the aforementioned definitions, we formally define the problem of anomaly detection in attributed networks as follows. Given a set of nodes connected by an attributed network G, we aim to identify the global anomalies and community anomalies in G.
3. Proposed Spectral AutoEncoder
We propose a Spectral autoencoder based anomaly detection framework, SpecAE, for attributed networks. The pipeline of SpecAE is illustrated in Fig. 1. Given an attributed network G, we leverage a tailored Spectral autoencoder to jointly embed nodal attributes and relations into a carefully-designed space Z. To detect the global anomalies, we apply an autoencoder to all nodal attributes X to learn embedding representations Z_X as well as the reconstruction errors E_X. In such a way, we can globally compare all nodes' attributes. To detect the community anomalies, we design novel graph convolutional encoder and deconvolutional decoder networks to learn nodes' community representations Z_G, based on each node's neighbors. The corresponding reconstruction errors are denoted as E_G. Later on, based on the tailored joint representations Z = [Z_X, E_X, Z_G, E_G], we estimate the suspicious level of each node by measuring its embedding representation's energy in a Gaussian Mixture Model.
3.1. Tailored Embedding Spaces for Global Anomalies and Community Anomalies
First, based on the definition of global anomaly, we employ an autoencoder to embed the nodal attributes X and learn the first type of tailored representations, i.e., Z_X and E_X. The encoding and decoding processes are achieved via:
(1)  Z_X = f_enc(X),  X̂ = f_dec(Z_X),
where X̂ denotes the reconstructed nodal attributes. Since anomalies are harder to reconstruct than normal nodes, we also include the reconstruction errors E_X = dist(X, X̂), where the operation dist(·, ·) denotes a distance measure such as the Euclidean distance or the cosine distance. In such a way, global anomalies tend to have representations in Z_X that differ from the majority, and large errors E_X.
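To make the role of the reconstruction errors E_X concrete, the following sketch uses a linear (PCA-style) projection as a stand-in for the learned encoder/decoder pair; the data, the one-dimensional "normal" subspace, and the injected anomaly are all synthetic assumptions for illustration, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)
# 99 normal nodes lying near a 1-D subspace of R^3, plus one global anomaly.
t = rng.normal(size=(100, 1))
X = np.hstack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(100, 3))
X[-1] = [4.0, -4.0, 4.0]          # anomaly: far off the normal subspace

# A linear "autoencoder": project onto the top principal component and back.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z_X = Xc @ Vt[:1].T               # encode (f_enc)
X_hat = Z_X @ Vt[:1] + X.mean(0)  # decode (f_dec)

# Reconstruction errors E_X: Euclidean distance per node.
eucl = np.linalg.norm(X - X_hat, axis=1)

# The anomaly reconstructs far worse than the normal nodes.
print(eucl[:-1].mean(), eucl[-1])
```

Since the normal nodes define the dominant subspace, the anomaly's reconstruction error is orders of magnitude larger, which is exactly the signal E_X contributes to the joint representation.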
Second, to identify community anomalies, we need to jointly consider X and A. Based on their topological dependencies, we develop a novel graph convolutional encoder and graph deconvolutional decoder to learn the second type of tailored representations, i.e., Z_G and E_G. Details are introduced as follows.
3.2. Graph Convolution and Deconvolution
Our goal is to learn nodes' community representations Z_G, which describe the expected nodal attributes of nodes according to their neighbors.
A straightforward solution is to apply GCN to embed X. From a formulation perspective (Li et al., 2018), GCN can be treated as a special form of Laplacian smoothing. The Laplacian smoothing on each channel of the input features can be written as:
(2)  ŷ_i = (1 − γ) x_i + γ Σ_j (ã_ij / d̃_i) x_j,
where γ is a parameter which controls the weighting between the features of the current node and the features of its neighbors. In matrix form, Eq. (2) becomes Y = X − γ D̃^(−1) L̃ X, with a normalization trick as:

(3)  Ã = A + I_n,  d̃_i = Σ_j ã_ij,

where D̃ = diag(d̃_1, …, d̃_n) is the degree matrix of Ã and L̃ = D̃ − Ã.
Based on the definition of convolution for graph signals in the spectral domain, generating a new matrix Y from X by applying the graph convolution as a spectral filter yields:
(4)  Y = D̃^(−1/2) Ã D̃^(−1/2) X,
which is a special form of Laplacian smoothing with γ = 1, after replacing the normalized graph Laplacian D̃^(−1) L̃ with the symmetric normalized graph Laplacian D̃^(−1/2) L̃ D̃^(−1/2).
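As a concrete illustration of Eqs. (2)-(4), the following numpy sketch applies both the random-walk Laplacian smoothing and the symmetrically normalized GCN filter to a toy chain graph; the graph and attribute values are assumptions chosen so that one node deviates sharply from its neighbors.

```python
import numpy as np

# Toy chain graph 0-1-2-3; node 2's attributes deviate from its neighbors.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [5.0, 5.0],   # community anomaly
              [1.1, 0.0]])

# Normalization trick: add self-loops, build degree matrix and Laplacian.
A_t = A + np.eye(4)
d_t = A_t.sum(axis=1)
D_inv = np.diag(1.0 / d_t)
L_t = np.diag(d_t) - A_t

# Eq. (2) in matrix form with gamma = 1: Y = X - D^{-1} L X = D^{-1} A_t X,
# i.e., each node becomes the mean of itself and its neighbors.
gamma = 1.0
Y_smooth = X - gamma * D_inv @ L_t @ X

# Eq. (4): GCN filter with symmetric normalization.
D_is = np.diag(1.0 / np.sqrt(d_t))
Y_gcn = D_is @ A_t @ D_is @ X

# Smoothing pulls the deviating node toward its neighborhood mean,
# which is exactly what makes community anomalies harder to spot.
print(X[2], Y_smooth[2], Y_gcn[2])
```

Both filters shrink node 2's attributes toward its neighbors' values, illustrating why repeated smoothing blurs the very deviation an anomaly detector needs to preserve.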
However, GCN does not contain a nodal attribute reconstruction procedure, which is useful for the anomaly detection task for three reasons. i) Training transformation functions based on one-class observations (only normal instances available) without a reconstruction error may easily lead the objective function to converge to a local optimum in which all of the node representations collapse into a small area (one point in extreme cases) in the latent space. ii) Repeatedly applying Laplacian smoothing might cause the nodal attributes to become over-mixed with those of their neighbors and make them indistinguishable, since it is more difficult to identify each individual instance after the smoothing operation has weakened the uniqueness of the original attributes. iii) The reconstruction error usually contains useful information that serves as an indicator for anomaly detection. For example, in (Chen et al., 2017), the reconstruction error is directly used as an anomaly score to rank the anomalous degree of nodes.
Thus, we design the graph decoding (graph deconvolution) from the smoothed features as a complementary inverse process of graph convolution. We take inspiration from digital image processing, where sharpening is an inverse process of blurring/smoothing (Ma et al., 2014). Whereas smoothing is performed in the spatial domain by averaging pixels with their neighbors, sharpening computes the difference from the neighbors via spatial differentiation. A Laplacian operator restores fine details of an image which has been smoothed or blurred. Analogously, after Laplacian sharpening, we can reconstruct the nodal attributes from the fused features produced by the graph convolution process.
If we replace the original GCN function in Eq. (4) with general Laplacian smoothing, we will have:
(5)  Y = (1 − γ) X + γ D̃^(−1) Ã X = X − γ D̃^(−1) L̃ X.
Generalizing the above definition of graph convolution to an attributed network with adjacency matrix A and nodal attributes X of n instances and c input channels, the propagation rule of the convolution layer can be written as:
(6)  H = σ( ((1 − γ) I_n + γ D̃^(−1) Ã) X W ),
where W ∈ ℝ^(c×c′) is the trainable weight matrix in the convolution layer, and σ(·) is the activation function, e.g., ReLU(x) = max(0, x). The parameter γ controls the weighting between the features of the current nodal attributes and the features of its neighbors. Contrary to Laplacian smoothing, we compute the new features of the nodal attributes by sharpening the features against those of their neighbors, in order to reconstruct the features from the smoothed results. To magnify the difference between the current node and its neighbors, we will have:
(7)  ŷ_i = (1 + γ) x_i − γ Σ_j (ã_ij / d̃_i) x_j.
Given the above definition of deconvolution for an attributed network with adjacency matrix A and the smoothed representations H of n instances and c′ channels obtained after the convolution process, the propagation rule of the deconvolution layer is:
(8)  X̃ = σ( ((1 + γ) I_n − γ D̃^(−1) Ã) H W′ ),
where W′ is the trainable weight matrix in the deconvolution layer. After the sharpening process, we can reconstruct the original attributes from the smoothed features.
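To see why the sharpening of Eq. (7) approximately inverts the smoothing of Eq. (5), the sketch below (trainable weights and activation omitted; γ = 0.5 and the toy graph are assumed values) smooths the attributes and then sharpens them, recovering the deviating node's attributes more closely than the smoothed version does:

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [5.0, 5.0],   # node whose attributes deviate from its neighbors
              [1.1, 0.0]])

A_t = A + np.eye(4)
D_inv = np.diag(1.0 / A_t.sum(axis=1))

g = 0.5
S = (1 - g) * np.eye(4) + g * D_inv @ A_t   # smoothing operator, Eq. (5)
T = (1 + g) * np.eye(4) - g * D_inv @ A_t   # sharpening operator, Eq. (7)

X_smooth = S @ X          # convolution blurs node 2 into its community
X_rec = T @ X_smooth      # deconvolution restores part of the lost detail

# With M = D^{-1} L_t we have S = I - gM and T = I + gM, so
# T @ S = I - g^2 M^2: sharpening is a first-order inverse of smoothing.
print(X[2, 0], X_smooth[2, 0], X_rec[2, 0])
```

The round trip does not recover X exactly (the inverse is only first-order), but it pushes the anomalous node's attributes back away from the neighborhood mean, which is the behavior the decoder needs for meaningful reconstruction errors.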
Given nodal attributes X with adjacency matrix A, the compression network computes Z as:

(9)  μ_G, log σ_G = GCN_enc(X, A),

(10)  Z_G = μ_G + σ_G ⊙ ε,  ε ∼ N(0, I),

(11)  X̃ = GDN_dec(Z_G, A),  E_G = dist(X, X̃),

(12)  Z = [Z_X, E_X, Z_G, E_G],

where μ_G and σ_G denote the mean and standard deviation vectors of the variational graph convolutional encoder in Eq. (9). The latent variable Z of each node is shown in Eq. (12), which concatenates four components. Z_G is learned by the graph convolutional encoder, which incorporates the neighbors' attributes through the topological relations. X̃ is the reconstruction of the nodal attributes based on the fusion representation Z_G and the adjacency matrix A.

3.3. Anomaly Detection via Density Estimation
Given the learned embedding representations Z of N samples and their soft mixture-component membership predictions γ̂ (estimated from a softmax layer based on the low-dimensional representation Z), where γ̂_i is a K-dimensional vector and K is the number of mixture components in the Gaussian Mixture Model (GMM), we can estimate the mixture probability φ_k, the mean μ_k, and the covariance Σ_k for each component k in the GMM, respectively (Zong et al., 2018). The sample energy can be inferred by:
(13)  E(z) = −log( Σ_{k=1..K} φ_k · exp(−(1/2)(z − μ_k)ᵀ Σ_k^(−1) (z − μ_k)) / √|2π Σ_k| ),
where we estimate the parameters in GMM as follows:
(14)  φ_k = (1/N) Σ_{i=1..N} γ̂_{ik},  μ_k = Σ_{i=1..N} γ̂_{ik} z_i / Σ_{i=1..N} γ̂_{ik},
(15)  Σ_k = Σ_{i=1..N} γ̂_{ik} (z_i − μ_k)(z_i − μ_k)ᵀ / Σ_{i=1..N} γ̂_{ik}.
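A minimal numpy sketch of Eqs. (13)-(15), with randomly generated embeddings and soft memberships standing in for the learned Z and the softmax outputs γ̂ (all data here are synthetic assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, K = 200, 2, 2
# Embeddings: most samples near the origin, a few far away (anomalies).
Z = np.vstack([rng.normal(0, 0.5, size=(N - 5, d)),
               rng.normal(6, 0.5, size=(5, d))])

# Soft memberships gamma_hat would come from a softmax layer; random
# soft assignments are used here purely to illustrate the estimation step.
logits = rng.normal(size=(N, K))
gamma = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Eqs. (14)-(15): mixture weights, means, covariances.
phi = gamma.sum(axis=0) / N
mu = (gamma.T @ Z) / gamma.sum(axis=0)[:, None]
Sigma = np.zeros((K, d, d))
for k in range(K):
    diff = Z - mu[k]
    outer = diff[:, :, None] * diff[:, None, :]
    Sigma[k] = (gamma[:, k, None, None] * outer).sum(0) / gamma[:, k].sum()

def energy(z):
    # Eq. (13): E(z) = -log sum_k phi_k * N(z | mu_k, Sigma_k)
    s = 0.0
    for k in range(K):
        diff = z - mu[k]
        inv = np.linalg.inv(Sigma[k])
        det = np.linalg.det(2 * np.pi * Sigma[k])
        s += phi[k] * np.exp(-0.5 * diff @ inv @ diff) / np.sqrt(det)
    return -np.log(s + 1e-12)

# Anomalous (far) embeddings receive much higher energy than normal ones.
print(energy(Z[0]), energy(Z[-1]))
```

Nodes whose embeddings fall in low-density regions of the fitted mixture get high energy, which is the suspicious-level score SpecAE thresholds (top-K energies are flagged as anomalies).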
3.4. Objective Function
Given an input attributed network G = {A, X}, the objective function is constructed as follows:
(16)  L = ||X − X̂||₂² + ||X − X̃||₂² + λ₁ Σ_{i=1..N} E(z_i) + λ₂ Σ_{k=1..K} Σ_j (1 / Σ̂_{kjj}) + λ₃ KL( q(Z_G | X, A) ∥ p(Z_G) ).
The objective function includes four components:

||X − X̂||₂² and ||X − X̃||₂² are the loss functions that characterize the reconstruction errors.

E(z_i) denotes the sample energy of the GMM estimation in Eq. (13). By minimizing the sample energy, we maximize the likelihood of non-anomalous samples, and predict the samples with the top-K highest energies as anomalies.

To avoid trivial solutions when the diagonal entries of the covariance matrices degenerate to 0, we penalize small values through the third component, which acts as a regularizer.

We optimize the variational lower bound through the last two terms: the reconstruction error of X̃ together with the KL term, where KL(q ∥ p) is the Kullback-Leibler divergence between q(Z_G | X, A) and p(Z_G), and p(Z_G) is a Gaussian prior N(0, I).
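As a sketch of how the four components combine into a single training loss (all numeric values and λ weights below are illustrative assumptions, not values from the paper), note in particular how the covariance regularizer dominates when a diagonal entry approaches zero:

```python
import numpy as np

# Stand-in scalars for the terms of the objective.
recon_x = 0.8        # ||X - X_hat||^2: autoencoder reconstruction error
recon_g = 0.5        # ||X - X_tilde||^2: graph deconvolution reconstruction error
sample_energy = 1.2  # mean GMM sample energy E(z), Eq. (13)
kl = 0.3             # KL(q(Z_G | X, A) || N(0, I)) from the variational encoder

# Diagonal entries of the estimated covariances; one is nearly degenerate.
Sigma_diag = np.array([0.9, 1e-3, 0.7])
cov_penalty = (1.0 / Sigma_diag).sum()  # blows up as entries approach 0

lam1, lam2, lam3 = 0.1, 0.005, 0.01
loss = recon_x + recon_g + lam1 * sample_energy + lam2 * cov_penalty + lam3 * kl
print(loss)
```

Even with a small λ₂, the penalty on the near-zero covariance entry contributes the largest share of the loss, steering training away from the degenerate GMM solutions described above.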
4. Experiments
In this section, we evaluate the performance of our model with experiments on real-world datasets (Cora (Sen et al., 2008), Pubmed (Sen et al., 2008), and PolBlog (Perozzi et al., 2014)) against five baseline methods, together with an ablation analysis and a case study, to verify the effectiveness of the proposed approach.
In order to simulate ground truth outlierness, for Cora and Pubmed, we adopt the same strategy as in (Skillicorn, 2007; Song et al., 2007) to generate a combined set of anomalies from the nodal attributes perspective and the topological structure perspective (with an anomaly ratio of 5%). We inject an equal ratio of anomalies for both perspectives. First, we randomly select bags of words from the word dictionary that have low correlations, use them as new nodal attributes to replace the original attributes of randomly chosen nodes, and mark these nodes as anomalies. Then we randomly pick another set of nodes and replace the attributes of each such node with those of another node that deviates from it with different context attributes (e.g., in citation networks, the two nodes denote different categories of papers), while keeping the original topological relations (the adjacency matrix A still denotes the citation relations).

4.1. Effectiveness Evaluation
Table 1. Accuracy@K (%) on Cora (Sen et al., 2008) and Pubmed (Sen et al., 2008).

                                     Cora                        Pubmed
K                                5      10     15     20     5      10     15     20
LOF (Breunig et al., 2000)       85.60  80.91  76.62  72.01  85.71  81.27  76.89  72.71
OC-SVM (Schölkopf et al., 2000)  85.45  80.76  76.92  72.45  85.54  81.03  76.47  72.27
Deep SVDD (Ruff et al., 2018)    87.89  85.49  81.50  77.25  87.93  84.63  80.96  77.07
Radar (Li et al., 2017)          87.70  85.27  82.35  78.58  86.02  81.90  78.03  74.06
GCN (Kipf and Welling, 2017)     85.23  83.38  76.70  71.05  86.95  83.57  80.11  76.41
SpecAE                           94.31  90.81  86.67  82.05  94.98  90.50  85.96  81.52
For both metrics, Accuracy@K (Tab. 1) and ROC-AUC (Fig. 2), SpecAE consistently achieves good performance. In addition, we make the following observations. (1) In general, the proposed SpecAE outperforms all of the baseline methods. Compared with the methods which only utilize unimodal information, our proposed approach achieves better performance, which validates the importance of exploiting heterogeneous multi-modal sources. (2) The performance of the deep models is superior to that of the conventional anomaly detection methods with shallow structures. This verifies the effectiveness of deep structures in extracting features to represent the nodal attributes of attributed networks. (3) SpecAE shows the importance of enlarging the model capacity by introducing mutual interactions between nodal attributes and linkage connections, and by enabling topological relations to contribute additional information. As can be observed, the Spectral autoencoder achieves better anomaly detection performance than the pure GCN based model, which justifies the advantage of our proposed model: it overcomes the drawbacks of GCN and adapts it to the anomaly detection scenario.
4.2. Ablation Analysis And A Case Study
Table 2. Ablation results (%) on Cora.

Variant                          Accuracy  Precision  Recall  F1     AUC
SpecAE-S                         54.51     12.54      59.63   20.72  58.45
SpecAE-N                         90.51     52.40      52.59   52.50  73.65
SpecAE-nr                        82.24     11.07      11.11   11.09  50.53
SpecAE (extreme hyperparameter)  90.36     51.66      51.85   51.76  73.96
SpecAE                           91.93     54.98      55.19   55.08  77.15
Ablation and hyperparameter analysis.
We conduct ablation studies on Cora to demonstrate the importance of individual components in SpecAE. We compare SpecAE with four alternatives. (i) SpecAE-S: SpecAE without the representations for community anomaly detection, i.e., Z_G and E_G; (ii) SpecAE-N: SpecAE without the representations for global anomaly detection, i.e., Z_X and E_X; (iii) SpecAE-nr: SpecAE without the reconstruction components E_X and E_G; and (iv) training with an extreme value of the hyperparameter γ (short-circuiting the self features in the graph convolution and deconvolution). Results are shown in Tab. 2, indicating that the comprehensive method is superior to the variants: it outperforms the ablation settings in which one component at a time is removed or replaced from the full system before retraining and retesting.
Table 3. Sampled anomalies from PolBlog and their abnormal keywords.

Anomaly                Abnormal Keywords
thismodernworld        siberians, fenno, scandinavia, ultraviolet, primates, colder
directorblue.blogspot  boiler, inflatable, choppers, streamline, rooftop, jettison, sheathed
balloonjuice           recapturing, wooly, buffy, talons, snarled, tails, dismemberment
fringeblog             mosquitos, nerf, redwoods, mammoths, fungi, snail, hawked
nomayo.mu.nu           pancakes, cheese, stewed, pork, brunch
We also conduct a case study using the PolBlog (Perozzi et al., 2014) dataset. The attributes of each node represent the words appearing in each blog, based on a given word dictionary. As shown in Tab. 3, five sampled anomalies (nodes) are listed for more specific explanation. The abnormal keywords refer to key information which appears in the anomalous blogs, yet has low frequency and lacks political flavor from a global perspective among all blogs. To be more specific, some of the posts actually focus on other fields such as climate (thismodernworld), food (nomayo.mu.nu), construction and tools (directorblue.blogspot), or hunting and creatures (balloonjuice, fringeblog). Compared with the most popular political topics, the selected keywords in the abnormal blogs are less relevant to political events.
5. Conclusion and Future Work
In this paper, we introduce an effective framework, SpecAE, for identifying anomalies in attributed networks. To detect global and community anomalies, we map the attributed network into two types of low-dimensional representations. The first type consists of node representations learned from an autoencoder together with the corresponding reconstruction errors. To learn the second type of representations, we design novel graph deconvolution neural networks as the complementary operation to graph convolution, aiming to reconstruct nodal attributes according to the topological relations. Experimental results demonstrate that SpecAE achieves superior performance over state-of-the-art algorithms on real-world datasets. Ablation analysis shows the effectiveness of each component.
References
Breunig et al. (2000). LOF: identifying density-based local outliers. In ACM SIGMOD Record.
Chen et al. (2017). Outlier detection with autoencoder ensembles. In SDM.
Huang et al. (2019). Graph recurrent networks with attributed random walks.
Kipf and Welling (2017). Semi-supervised classification with graph convolutional networks. In ICLR.
Li et al. (2017). Radar: residual analysis for anomaly detection in attributed networks. In IJCAI.
Li et al. (2018). Deeper insights into graph convolutional networks for semi-supervised learning. In AAAI.
Li et al. (2019). Deep structured cross-modal anomaly detection. In IJCNN.
Liu et al. (2019). Is a single vector enough? Exploring node polysemy for network embedding. arXiv preprint arXiv:1905.10668.
Ma et al. (2014). Optimized Laplacian image sharpening algorithm based on graphic processing unit. Physica A.
McPherson et al. (2001). Birds of a feather: homophily in social networks. Annual Review of Sociology.
Perozzi et al. (2014). Focused clustering and outlier detection in large attributed graphs. In SIGKDD.
Ruff et al. (2018). Deep one-class classification. In ICML.
Schölkopf et al. (2000). Support vector method for novelty detection. In NeurIPS.
Sen et al. (2008). Collective classification in network data. AI Magazine.
Skillicorn (2007). Detecting anomalies in graphs. In IEEE ISI.
Song et al. (2007). Conditional anomaly detection. IEEE TKDE.
Yu et al. (2018). NetWalk: a flexible deep embedding approach for anomaly detection in dynamic networks. In SIGKDD.
Zong et al. (2018). Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. In ICLR.