Alleviating the Inconsistency Problem of Applying Graph Neural Network to Fraud Detection

05/01/2020 ∙ by Zhiwei Liu, et al. ∙ Beihang University ∙ University of Illinois at Chicago

Graph-based models can help detect suspicious fraud online. Owing to the development of Graph Neural Networks (GNNs), prior research has proposed many GNN-based fraud detection frameworks based on either homogeneous or heterogeneous graphs. These works follow the existing GNN framework by aggregating neighboring information to learn the node embedding, which rests on the assumption that neighbors share similar context, features, and relations. However, the inconsistency problem, i.e., context inconsistency, feature inconsistency, and relation inconsistency, has hardly been investigated. In this paper, we introduce these inconsistencies and design a new GNN framework, GraphConsis, to tackle the inconsistency problem: (1) for the context inconsistency, we propose to combine the context embeddings with node features; (2) for the feature inconsistency, we design a consistency score to filter the inconsistent neighbors and generate the corresponding sampling probabilities; and (3) for the relation inconsistency, we learn relation attention weights associated with the sampled nodes. Empirical analysis on four datasets indicates that the inconsistency problem is crucial in a fraud detection task. Extensive experiments demonstrate the effectiveness of GraphConsis. We also released a GNN-based fraud detection toolbox with implementations of SOTA models. The code is available at <>.




1. Introduction

There are various kinds of fraudulent activities on the Internet (Jiang et al., 2016); e.g., fraudsters disguise themselves as regular users to post fake reviews (Kaghazgaran et al., 2019) and commit download fraud (Dou et al., 2019). By modeling the entities as nodes and the corresponding interactions between entities as edges (Peng et al., 2019), we can design graph-based algorithms to detect suspicious patterns and thereby spot the fraudsters. With the development of Graph Neural Networks (GNNs) (Kipf and Welling, 2017; Veličković et al., 2017; Hamilton et al., 2017), which are powerful in learning deep representations of nodes, many GNN-based fraud detection frameworks have recently been proposed (Wang et al., 2019b; Li et al., 2019; Zhong et al., 2020; Liu et al., 2018; Wang et al., 2019a; Zhang et al., 2019).

Among those frameworks, (Wang et al., 2019b; Li et al., 2019) detect opinion fraud in online review systems, (Zhong et al., 2020; Liu et al., 2018; Wang et al., 2019a) aim at financial fraud, and (Zhang et al., 2019) targets cyber-criminals in online forums. They model their problems upon either homogeneous (Wang et al., 2019b; Li et al., 2019) or heterogeneous (Liu et al., 2018; Zhong et al., 2020; Li et al., 2019; Wang et al., 2019a; Zhang et al., 2019) graphs. Regarding the base model, FdGars (Wang et al., 2019b) and GAS (Li et al., 2019) adopt GCN (Kipf and Welling, 2017), while SemiGNN (Wang et al., 2019a) and Player2Vec (Zhang et al., 2019) adopt GAT (Veličković et al., 2017). Some works (Liu et al., 2018; Li et al., 2019; Zhong et al., 2020) devise new aggregators to aggregate the neighborhood information. These GNN-based fraud detectors learn the node representation iteratively and predict the node suspiciousness in an end-to-end, semi-supervised fashion.

Figure 1. Left: A toy example of a graph with two relations constructed on a fraud dataset, in which the surrounding nodes are neighbors of a center node. Context inconsistency: a fraudster can connect to many benign neighbors (as in Relation II) to disguise itself. Feature inconsistency: two neighbors with the same relation to the center node may have very different features. Relation inconsistency: for the center node, Relation I connects more similar neighbors than Relation II. Right: To alleviate the inconsistency problem, we introduce three techniques. First, we propose to combine the context embeddings with feature vectors. Then, we calculate the consistency scores of neighbors to filter nodes and generate sampling probabilities. Finally, we aggregate the sampled neighbors with the attention mechanism over relation embeddings.

However, all existing methods ignore the inconsistency problem when designing GNN models for the fraud detection task. The inconsistency problem is associated with the aggregation process of GNN models. The aggregation mechanism is based on the assumption that neighbors share similar features and labels (Hou et al., 2020). When the assumption holds, we can aggregate the neighborhood information to learn the node embedding. However, as Figure 1 (Left) shows, the inconsistency in the fraud detection problem comes from three perspectives:

(1) Context inconsistency. Smart fraudsters can connect themselves to regular entities as camouflage (Kaghazgaran et al., 2018; Sun et al., 2018). Meanwhile, the number of fraudsters is much smaller than that of regular entities. Directly aggregating neighbors with a GNN only helps fraudulent entities absorb information from regular entities and thus avoid being spotted by fraud detectors. For example, in Figure 1 (Left), the fraudster connects to benign entities under relation II.

(2) Feature inconsistency. Take the opinion fraud (e.g., spam review) detection problem as an example (Li et al., 2019). Suppose two reviews are posted by the same user but for products in distinct categories; those two reviews have an edge since they share the same user. However, their review contents (features) are far from each other, as they are associated with different products. Direct aggregation makes it hard for the GNN to distinguish the unique semantic characteristics of reviews and ultimately affects its ability to detect spam reviews. For example, in Figure 1 (Left), the features of one neighboring node are inconsistent with those of three other nodes.

(3) Relation inconsistency. Since the entities are connected by multiple types of relations, simply treating all the relations equally results in a relation inconsistency problem. For example, two reviews may be connected either by the same user or by the same product, which are the common-user relation and the common-product relation, respectively. Assuming that one review is suspicious, the other one should have a greater suspiciousness if they are connected by the common-user relation, since fraudulent users tend to post many fraudulent reviews. For example, in Figure 1 (Left), we find that under relation II, the fraudster is connected to two other fraudsters. However, under relation I, the fraudster is connected to only one fraudster but three benign entities.

To tackle all the above inconsistencies, we design a novel GNN framework, GraphConsis, to solve the fraud detection problem, as shown in Figure 1 (Right). GraphConsis is built upon a heterogeneous graph and differs from existing GNNs in its aggregation process. Instead of directly aggregating neighboring embeddings, we design three techniques to tackle the three inconsistency problems simultaneously. First, to handle the context inconsistency of neighbors, GraphConsis assigns each node a trainable context embedding, illustrated as the gray block beside each node in Figure 1 (Right). Second, to aggregate consistent neighbor embeddings, we design a new metric to measure the embedding consistency between nodes. By incorporating the embedding consistency score into the aggregation process, we ignore the neighbors with a low consistency score (e.g., one neighbor is dropped in Figure 1 (Right)) and generate the sampling probabilities for the rest. Finally, we learn relation attention weights associated with neighbors in order to alleviate the relation inconsistency problem.

The contributions of this paper are:

  • To the best of our knowledge, this is the first work to address the inconsistency problem in GNN models.

  • We empirically analyze three inconsistency problems that arise when applying GNN models to fraud detection tasks.

  • We propose GraphConsis to tackle the three inconsistency problems; it combines context embedding, neighborhood information measure, and relational attention.

2. Preliminary

We detect the fraud entities in the graph by using the node representations. Hence, we need to introduce node representation learning first. A heterogeneous graph is defined as $\mathcal{G} = \{\mathcal{V}, \mathbf{X}, \{\mathcal{E}_r\}|_{r=1}^{R}\}$, where $\mathcal{V}$ denotes the nodes, $\mathbf{X}$ is the feature matrix of nodes, and $\mathcal{E}_r$ denotes the edges w.r.t. the relation $r$. We have $R$ different types of relations. To represent the nodes as vectors, we need to learn a function $f: \mathcal{V} \rightarrow \mathbb{R}^{d}$ that maps nodes to a $d$-dimensional space, where $d \ll |\mathcal{V}|$. The function $f$ should preserve both the structural information of the graph and the original feature information of the nodes. With the learned node embeddings, we can train a classifier $C: \mathbb{R}^{d} \rightarrow \{0, 1\}$ to detect whether a given node is a fraudster, where $1$ denotes a fraudster and $0$ denotes a benign entity. In this paper, we adopt the GNN framework to learn the node representation through neighbor aggregation. The GNN framework can train the mapping function and the classifier simultaneously. We only need to input the graph and the labels of nodes to a GNN model. The general framework of a GNN model is:

$$h_v^{(l)} = h_v^{(l-1)} \oplus \mathrm{AGG}^{(l)}\left(\left\{h_{v'}^{(l-1)} : v' \in \mathcal{N}_v\right\}\right), \tag{1}$$

where $h_v^{(l)}$ is the hidden embedding of $v$ at the $l$-th layer, $\mathcal{N}_v$ denotes the neighbors of node $v$, and $\mathrm{AGG}^{(l)}$ represents the aggregation function that maps the neighborhood information into a vector. Here, we use $\oplus$ to denote the combination of the neighbor information and the center node information; it can be direct addition, or concatenation followed by a neural network. For the $\mathrm{AGG}$ function, we first assign a sampling probability to the neighboring nodes. Then we sample $Q$ nodes and average them into a vector (other pooling techniques can also be applied). The calculation of the probability is introduced later in Eq. (4). Note that the GNN framework is an $L$-layer structure, where $1 \le l \le L$. At the $l$-th layer, it aggregates the information from the $(l-1)$-th layer.
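The sample-then-average aggregation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary-based graph layout, and the choice of sampling with replacement are assumptions made for illustration, the combination step is plain concatenation, and the neural network that would normally follow is omitted.

```python
import random

def mean_aggregate(neighbor_embeddings):
    """Average a non-empty list of equal-length vectors element-wise."""
    count = len(neighbor_embeddings)
    return [sum(values) / count for values in zip(*neighbor_embeddings)]

def gnn_layer(hidden, neighbors, probs, num_samples, rng):
    """One aggregation layer: combine each node with its sampled neighborhood.

    hidden:    dict node -> embedding (list of floats) from the previous layer
    neighbors: dict node -> list of neighbor nodes
    probs:     dict node -> sampling probabilities aligned with neighbors[node]
    The combination is concatenation (an assumption for this sketch).
    """
    updated = {}
    for v, h_v in hidden.items():
        if neighbors[v]:
            # Sample with replacement according to the given probabilities.
            sampled = rng.choices(neighbors[v], weights=probs[v], k=num_samples)
            agg = mean_aggregate([hidden[u] for u in sampled])
        else:
            agg = [0.0] * len(h_v)  # isolated node: nothing to aggregate
        updated[v] = h_v + agg  # list concatenation
    return updated

# Toy graph with three nodes (hypothetical data).
hidden = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": []}
probs = {"a": [0.5, 0.5], "b": [1.0], "c": []}
out = gnn_layer(hidden, neighbors, probs, num_samples=4, rng=random.Random(0))
```

Stacking this function once per layer, with probabilities recomputed from the current embeddings, gives the multi-layer structure described in the text.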

3. Proposed Model

3.1. Context Embedding

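The aggregator combines the information of neighboring nodes according to Eq. (1). When $l = 1$, the hidden embedding $h_v^{(0)}$ is equivalent to the node feature. To tackle the context inconsistency problem, we introduce a trainable context embedding $c_v$ for node $v$, instead of only using its feature vector $x_v$. The first layer of the aggregator then becomes:

$$h_v^{(1)} = \{x_v \,\|\, c_v\} \oplus \mathrm{AGG}^{(1)}\left(\left\{x_{v'} \,\|\, c_{v'} : v' \in \mathcal{N}_v\right\}\right), \tag{2}$$

where $\|$ denotes the concatenation operation. The context embedding is trained to represent the local structure of the node, which can help to distinguish fraudsters. If we use the addition operation for $\oplus$, then $h_v^{(1)} = \{x_v \,\|\, c_v\} + \mathrm{AGG}^{(1)}\left(\left\{x_{v'} \,\|\, c_{v'} : v' \in \mathcal{N}_v\right\}\right)$.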
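As a small illustration of the concatenation step, the sketch below attaches a context vector to each node's feature vector before the first aggregation layer. The names `init_context_embeddings` and `first_layer_inputs`, the initialization range, and the toy data are all hypothetical; in a real model the context vectors would be trainable parameters updated by gradient descent rather than fixed random values.

```python
import random

def init_context_embeddings(nodes, dim, rng):
    """Create one small random context vector per node.

    In training these would be learnable parameters; here they are only
    initialized, which is enough to show the shapes involved.
    """
    return {v: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for v in nodes}

def first_layer_inputs(features, context):
    """Concatenate each node's feature vector with its context embedding."""
    return {v: features[v] + context[v] for v in features}

# Toy features for two review nodes (hypothetical data).
features = {"r1": [0.1, 0.2], "r2": [0.3, 0.4]}
context = init_context_embeddings(features.keys(), dim=3, rng=random.Random(0))
inputs = first_layer_inputs(features, context)
```

The resulting vectors have length equal to the feature dimension plus the context dimension, so the first-layer weights must be sized accordingly.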

3.2. Neighbor Sampling

Since there exists a feature inconsistency problem, we should sample related neighbors rather than assign equal probabilities to them. Thus, we compute the consistency score between embeddings:

$$s^{(l)}(u, v) = \exp\left(-\left\|h_u^{(l)} - h_v^{(l)}\right\|_2^2\right), \tag{3}$$

where $s^{(l)}(u, v)$ denotes the consistency score for two nodes at the $l$-th layer, and $\|\cdot\|_2$ is the $\ell_2$-norm of a vector (other metrics are also applicable). We first apply a threshold $\epsilon$ to filter out neighbors that are far from consistent. Then, we assign each node $u$ among the filtered neighbors $\tilde{\mathcal{N}}_v$ of node $v$ a sampling probability by normalizing its consistency score:

$$p^{(l)}(u; v) = \frac{s^{(l)}(u, v)}{\sum_{u' \in \tilde{\mathcal{N}}_v} s^{(l)}(u', v)}. \tag{4}$$

Note that the probability is calculated at each layer for the $\mathrm{AGG}$ function.
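The filter-then-normalize procedure can be sketched as follows. The sketch assumes the consistency score is the exponential of the negative squared l2 distance between two embeddings; that form, the function names, and the threshold value are assumptions for illustration rather than a confirmed reproduction of the authors' code.

```python
import math

def consistency_score(h_u, h_v):
    """Score in (0, 1]: 1 for identical embeddings, near 0 for distant ones."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(h_u, h_v))
    return math.exp(-squared_distance)

def sampling_probabilities(center, neighbor_embeddings, threshold):
    """Drop neighbors scoring at or below the threshold, then normalize the rest.

    center:              embedding of the center node
    neighbor_embeddings: dict neighbor -> embedding
    Returns a dict neighbor -> probability (empty if every neighbor is dropped).
    """
    scores = {u: consistency_score(h, center) for u, h in neighbor_embeddings.items()}
    kept = {u: s for u, s in scores.items() if s > threshold}
    total = sum(kept.values())
    return {u: s / total for u, s in kept.items()}

# Toy example: one close neighbor and one distant neighbor (hypothetical data).
center = [0.0, 0.0]
neighbor_embeddings = {"near": [0.1, 0.0], "far": [3.0, 0.0]}
probabilities = sampling_probabilities(center, neighbor_embeddings, threshold=0.01)
```

In this toy case the distant neighbor falls below the threshold and is never sampled, which mirrors the dropped node in Figure 1 (Right).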

3.3. Relation Attention

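We have $R$ different relations in the graph. The relation information should also be included in the aggregation process to tackle the relation inconsistency problem. Hence, for each relation $r$, we train a relation vector $t_r$, where $r \in \{1, 2, \dots, R\}$, to represent the relation information that should be incorporated. Since the relation information should be aggregated along with the neighbors to the center node $v$, we adopt the self-attention mechanism (Veličković et al., 2017) to assign weights to the $Q$ sampled neighbor nodes:

$$\alpha_q^{(l)} = \frac{\exp\left(\sigma\left(\left\{h_q^{(l-1)} \,\|\, t_{r_q}\right\} a^{\top}\right)\right)}{\sum_{q'=1}^{Q} \exp\left(\sigma\left(\left\{h_{q'}^{(l-1)} \,\|\, t_{r_{q'}}\right\} a^{\top}\right)\right)}, \tag{5}$$

where $r_q$ denotes the relation of the $q$-th sample with node $v$, $\sigma$ is the activation function, and $a$ represents the attention weights, which are shared across all attention layers. The final $\mathrm{AGG}$ is:

$$\mathrm{AGG}^{(l)}\left(\left\{h_q^{(l-1)}\right\}\Big|_{q=1}^{Q}\right) = \sum_{q=1}^{Q} \alpha_q^{(l)} h_q^{(l-1)}, \tag{6}$$

where $h_q^{(l-1)}$ is the embedding of the $q$-th node sampled based on Eq. (4).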
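A minimal sketch of relation-aware attention over sampled neighbors is given below. It assumes each sampled embedding is concatenated with the vector of its relation, scored with a shared attention vector, passed through an activation, and normalized with a softmax. The LeakyReLU activation (borrowed from GAT), the function names, and the toy vectors are assumptions, since the text only says "the activation function".

```python
import math

def leaky_relu(x, slope=0.2):
    """LeakyReLU activation; the slope value is an assumption for this sketch."""
    return x if x > 0 else slope * x

def relation_attention_aggregate(sampled, relation_vectors, attention):
    """Weighted sum of sampled neighbor embeddings with relation-aware weights.

    sampled:          list of (embedding, relation_id) pairs
    relation_vectors: dict relation_id -> relation vector
    attention:        shared attention vector, one weight per concatenated entry
    Returns (aggregated_vector, attention_weights).
    """
    logits = []
    for h_q, r_q in sampled:
        concatenated = h_q + relation_vectors[r_q]
        logits.append(leaky_relu(sum(a * x for a, x in zip(attention, concatenated))))
    peak = max(logits)  # subtract the maximum for a numerically stable softmax
    exponentials = [math.exp(z - peak) for z in logits]
    total = sum(exponentials)
    weights = [e / total for e in exponentials]
    dim = len(sampled[0][0])
    aggregated = [sum(w * h[i] for w, (h, _) in zip(weights, sampled)) for i in range(dim)]
    return aggregated, weights

# Two sampled neighbors under two relations (hypothetical data).
sampled = [([1.0, 0.0], "I"), ([0.0, 1.0], "II")]
relation_vectors = {"I": [0.5, 0.5], "II": [-0.5, 0.5]}
uniform_agg, uniform_w = relation_attention_aggregate(sampled, relation_vectors, [0.0] * 4)
biased_agg, biased_w = relation_attention_aggregate(sampled, relation_vectors, [0.0, 0.0, 1.0, 0.0])
```

With an all-zero attention vector the weights are uniform and the result reduces to mean pooling; a non-zero attention vector shifts weight toward neighbors whose relation vector scores higher.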

4. Experiments

4.1. Experimental Setup

4.1.1. Dataset and Graph Construction

We utilize the YelpChi spam review dataset (Rayana and Akoglu, 2015), along with three other benchmark datasets (Kipf and Welling, 2017; Hamilton et al., 2017), to study the graph inconsistency problem in the fraud detection task. The YelpChi spam review dataset includes hotel and restaurant reviews filtered (spam) and recommended (legitimate) by Yelp. In this paper, we conduct a spam review classification task on the YelpChi dataset, which is a binary classification problem. We remove products with more than 800 reviews to restrict the size of the computation graph. The pre-processed dataset has 29,431 users, 182 products, and 45,954 reviews (14.5% spam).

Based on previous studies (Rayana and Akoglu, 2015), which show that spam reviews are connected through user, product, rating, and time, we take reviews as nodes in the graph and design three relations denoted by R-U-R, R-S-R, and R-T-R. R-U-R connects reviews posted by the same user; R-S-R connects reviews under the same product with the same rating; R-T-R connects two reviews under the same product posted in the same month. Following previous work (Li et al., 2019), we take the 100-dimensional Word2Vec embedding of each review as its feature.
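The three relations can be built by grouping reviews on a shared key and connecting every pair inside a group. The sketch below shows one way to do this; the record fields (`user`, `product`, `rating`, `month`), the year-month string format, and the function name are hypothetical and do not reflect the actual YelpChi schema.

```python
from collections import defaultdict
from itertools import combinations

def build_relation_edges(reviews):
    """Return a dict relation name -> set of (i, j) review index pairs with i < j.

    reviews: list of dicts with the hypothetical keys
             'user', 'product', 'rating', and 'month'.
    """
    groups = {
        "R-U-R": defaultdict(list),  # same user
        "R-S-R": defaultdict(list),  # same product and same rating
        "R-T-R": defaultdict(list),  # same product and same month
    }
    for index, review in enumerate(reviews):
        groups["R-U-R"][review["user"]].append(index)
        groups["R-S-R"][(review["product"], review["rating"])].append(index)
        groups["R-T-R"][(review["product"], review["month"])].append(index)

    edges = {}
    for relation, buckets in groups.items():
        # Connect every pair of reviews that share the grouping key.
        edges[relation] = {
            pair for members in buckets.values() for pair in combinations(members, 2)
        }
    return edges

# Four toy reviews (hypothetical data).
reviews = [
    {"user": "u1", "product": "p1", "rating": 5, "month": "2012-01"},
    {"user": "u1", "product": "p2", "rating": 1, "month": "2012-03"},
    {"user": "u2", "product": "p1", "rating": 5, "month": "2012-02"},
    {"user": "u3", "product": "p1", "rating": 2, "month": "2012-01"},
]
edges = build_relation_edges(reviews)
```

Because every pair within a group is connected, a product with many reviews produces a number of edges quadratic in its review count, which is consistent with the text's choice to remove products with more than 800 reviews.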

4.1.2. Baselines

To show the ability of GraphConsis in alleviating inconsistency problems, we compare its performance with a non-GNN classifier, vanilla GNNs, and GNN-based fraud detectors.

  • Logistic Regression. A non-GNN classifier that makes predictions based only on the review features.

  • FdGars (GCN) (Wang et al., 2019b). A spam review detection algorithm using GCN (Kipf and Welling, 2017).

  • GraphSAGE (Hamilton et al., 2017). A popular GNN framework which samples neighboring nodes before aggregation.

  • Player2Vec (Zhang et al., 2019). A state-of-the-art fraud detection model which uses GCN to encode information in each relation, and uses GAT to aggregate neighbors from different relations.

4.1.3. Experimental Settings

We use the Adam optimizer to train our model based on the cross-entropy loss. For the hyper-parameters, we choose a two-layer structure, with the number of sampled neighbors and the hidden embedding dimension set separately for the first and second layers. We use F1-score to measure the overall classification performance and AUC to measure the performance of identifying spam reviews.
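For reference, the two evaluation metrics can be computed as in the sketch below. It uses the binary F1-score of the positive (spam) class and the pairwise definition of AUC; the averaging mode of F1 used in the experiments is not stated in the text, so the binary variant here is an assumption, as are the function names and toy data.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 of the positive class (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Requires at least one positive and one negative.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predictions for four reviews (hypothetical data).
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
```

The pairwise AUC is quadratic in the number of samples, which is fine for a sketch; a rank-based computation would be preferable on a dataset of this paper's size.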

4.2. The Inconsistency Problem

We first take the YelpChi dataset to demonstrate the inconsistency problem in applying GNNs to fraud detection tasks. Table 1 shows the statistics of the graphs built on YelpChi compared to the node classification benchmark datasets used by (Kipf and Welling, 2017; Hamilton et al., 2017). Yelp-ALL is composed of the three single-relation graphs.

Compared to the three widely used benchmark node classification datasets, we find that the multi-relation graph constructed on YelpChi has a much higher density (the average node degree is greater than 100). It demonstrates that real-world fraud graphs usually incorporate complex relations and neighbors, which gives rise to inconsistency problems. Before we compare the graph characteristics and analyze the three inconsistency problems, we design two characteristic scores, similar to (Hou et al., 2020). One is the context characteristic score:

$$\gamma_r^{(c)} = \frac{\sum_{(u, v) \in \mathcal{E}_r} \mathbb{I}(u \sim v)}{|\mathcal{E}_r|}, \tag{7}$$

where $\mathbb{I}(\cdot)$ is an indicator function that equals 1 when node $u$ and node $v$ have the same label and 0 otherwise. We sum the indicator over all the edges and normalize by the total number of edges $|\mathcal{E}_r|$. The context characteristic measures the label similarity between neighboring nodes under a specific relation $r$. The other one is the feature characteristic score:

$$\gamma_r^{(f)} = \frac{\sum_{(u, v) \in \mathcal{E}_r} \exp\left(-\left\|x_u - x_v\right\|_2^2\right)}{|\mathcal{E}_r| \cdot d}, \tag{8}$$

where we employ the RBF kernel function as the similarity measurement between two connected nodes (other kernel functions can also be applied). The overall feature characteristic score is normalized by the product of the total number of edges $|\mathcal{E}_r|$ and the feature dimension $d$. Normalizing the similarity by the feature dimension allows a fair comparison of the feature characteristics of different graphs, which may have different feature dimensions.
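Both characteristic scores can be computed directly from an edge list, as in the sketch below. It follows the verbal description: the context score is the fraction of edges whose endpoints share a label, and the feature score sums an RBF similarity over edges and divides by the number of edges times the feature dimension. The RBF bandwidth is fixed to 1 as an assumption, and the function names and toy data are hypothetical.

```python
import math

def context_characteristic(edges, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

def feature_characteristic(edges, features):
    """Sum of RBF similarities over edges, normalized by (#edges * feature dim).

    The kernel is exp(-squared l2 distance); the unit bandwidth is an assumption.
    """
    dim = len(next(iter(features.values())))
    total = 0.0
    for u, v in edges:
        squared_distance = sum((a - b) ** 2 for a, b in zip(features[u], features[v]))
        total += math.exp(-squared_distance)
    return total / (len(edges) * dim)

# Three nodes, three edges under one relation (hypothetical data).
edges = [(0, 1), (1, 2), (0, 2)]
labels = {0: 1, 1: 1, 2: 0}
features = {0: [0.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.0]}
gamma_c = context_characteristic(edges, labels)
gamma_f = feature_characteristic(edges, features)
```

Running each function once per relation-specific edge set gives one pair of scores per graph, which is how a per-relation comparison such as Table 1 can be assembled.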

Context Inconsistency. We compute the context characteristic based on Eq. (7), which measures the context consistency. For the graphs R-T-R, R-S-R, and Yelp-ALL, the context characteristic is at most 0.07 (Table 1), i.e., only a small fraction of neighboring nodes have the same label. It shows that fraudsters may hide themselves among regular entities under some relations.

Feature Inconsistency. We calculate the feature characteristic using Eq. (8). The graph constructed by the R-U-R relation (reviews posted by the same user) has a higher feature characteristic than those of the other two relations. Thus, we need to sample the neighboring nodes based not only on their relations but also on their feature similarities.

Relation Inconsistency. For graphs constructed by three different relations, the neighboring nodes also have different feature and label consistency scores. Thus, we need to treat different relations with different attention weights during the aggregation.

        Graph     #Nodes   #Edges      Feature char.  Context char.
Others  Cora      2,708    5,278       0.72           0.81
        PPI       14,755   225,270     0.48           0.98
        Reddit    232,965  11,606,919  0.70           0.63
Ours    R-U-R     45,954   98,630      0.83           0.90
        R-T-R     45,954   1,147,232   0.79           0.05
        R-S-R     45,954   6,805,486   0.77           0.05
        Yelp-ALL  45,954   7,693,958   0.77           0.07
Table 1. The statistics of different graphs.

4.3. Performance Evaluation

Table 2 shows the experimental results of the spam review detection task. GraphConsis outperforms the other models on both metrics under 60% and 80% of training data, which suggests that it alleviates the inconsistency problem. Compared with the other GNN-based models, LR performs stably and achieves a better AUC. It indicates that the node feature is useful, but the aggregator in a GNN undermines the classifier in identifying fraudsters. This observation also suggests that the inconsistency problem is critical and should be considered when applying GNNs to fraud detection tasks. Compared to Player2Vec, which also learns relation attention, GraphConsis performs better. It suggests that solely using relation attention cannot alleviate the feature inconsistency. The neighbors should be filtered and then sampled based on our designed methods. FdGars directly aggregates neighbors' information, and GraphSAGE samples neighbors with equal probability. Both of them perform worse than GraphConsis, which shows that our neighbor sampling techniques are useful.

Method       40%             60%             80%
             F1      AUC     F1      AUC     F1      AUC
LR           0.4647  0.6140  0.4640  0.6239  0.4644  0.6746
GraphSAGE    0.4956  0.5081  0.5127  0.5165  0.5158  0.5169
FdGars       0.4603  0.5505  0.4600  0.5468  0.4603  0.5470
Player2Vec   0.4608  0.5426  0.4608  0.5697  0.4608  0.5403
GraphConsis  0.5656  0.5911  0.5888  0.6613  0.5776  0.7428
Table 2. Experiment results under different training %.

5. Conclusion and Future Works

In this paper, we investigate three inconsistency problems in applying GNNs to the fraud detection problem. To address those problems, we design three corresponding modules and propose GraphConsis. Experimental results show the effectiveness of GraphConsis. Future work includes devising an adaptive sampling threshold for each relation to maximize the receptive field of GNNs. Investigating the inconsistency problems under other fraud datasets is another avenue of future research.

This work is supported by the National Key R&D Program of China under grant 2018YFC0830804, and in part by NSF under grants III-1526499, III-1763325, III-1909323, and CNS-1930941. For any correspondence, please refer to Hao Peng.


References

  • Y. Dou, W. Li, Z. Liu, Z. Dong, J. Luo, and P. S. Yu (2019) Uncovering download fraud activities in mobile app markets. In ASONAM.
  • W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In NeurIPS.
  • Y. Hou, J. Zhang, J. Cheng, K. Ma, R. T. B. Ma, H. Chen, and M. Yang (2020) Measuring and improving the use of graph information in graph neural networks. In ICLR.
  • M. Jiang, P. Cui, and C. Faloutsos (2016) Suspicious behavior detection: current trends and future directions. IEEE Intelligent Systems.
  • P. Kaghazgaran, M. Alfifi, and J. Caverlee (2019) Wide-ranging review manipulation attacks: model, empirical study, and countermeasures. In CIKM.
  • P. Kaghazgaran, J. Caverlee, and A. Squicciarini (2018) Combating crowdsourced review manipulators: a neighborhood-based approach. In WSDM.
  • T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In ICLR.
  • A. Li, Z. Qin, R. Liu, Y. Yang, and D. Li (2019) Spam review detection with graph convolutional networks. In CIKM.
  • Z. Liu, C. Chen, X. Yang, J. Zhou, X. Li, and L. Song (2018) Heterogeneous graph neural networks for malicious account detection. In CIKM.
  • H. Peng, J. Li, Q. Gong, Y. Song, Y. Ning, K. Lai, and P. S. Yu (2019) Fine-grained event categorization with heterogeneous graph convolutional networks. In IJCAI.
  • S. Rayana and L. Akoglu (2015) Collective opinion spam detection: bridging review networks and metadata. In KDD.
  • L. Sun, Y. Dou, C. Yang, J. Wang, P. S. Yu, and B. Li (2018) Adversarial attack and defense on graph data: a survey. arXiv preprint arXiv:1812.10528.
  • P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. In ICLR.
  • D. Wang, J. Lin, P. Cui, Q. Jia, Z. Wang, Y. Fang, Q. Yu, J. Zhou, S. Yang, and Y. Qi (2019a) A semi-supervised graph attentive network for fraud detection. In ICDM.
  • J. Wang, R. Wen, C. Wu, Y. Huang, and J. Xion (2019b) FdGars: fraudster detection via graph convolutional networks in online app review system. In WWW Workshops.
  • Y. Zhang, Y. Fan, Y. Ye, L. Zhao, and C. Shi (2019) Key player identification in underground forums over attributed heterogeneous information network embedding framework. In CIKM.
  • Q. Zhong, Y. Liu, X. Ao, B. Hu, J. Feng, J. Tang, and Q. He (2020) Financial defaulter detection on online credit payment via multi-view attributed heterogeneous information network. In WWW.