
Graph Neural Networks in Real-Time Fraud Detection with Lambda Architecture

by   Mingxuan Lu, et al.
Shanghai Jiao Tong University

Transaction checkout fraud detection is an essential risk control component for e-commerce marketplaces. In order to leverage graph networks to decrease the fraud rate efficiently, and to guarantee that the information passed through neighbors comes only from the past of the checkouts, we first present a novel Directed Dynamic Snapshot (DDS) linkage design for graph construction and a Lambda Neural Networks (LNN) architecture for effective inference with Graph Neural Network embeddings. Experiments show that LNN on the DDS graph significantly outperforms baseline models and is computationally efficient for real-time fraud detection.





1 Introduction

Fraudulent transactions are one of the most serious threats to online security nowadays. The issue is exacerbated by the growing sophistication of business transactions using online payment and payment cards [LAUNDERS2013150, Wang_2021]. Fraudsters apply a range of tactics, including paying with stolen credit cards, chargeback fraud involving complicit cardholders, selling fake e-gift cards, and schemes for large-scale layered fraud against multiple merchants. In this work, we aim to detect risky transaction orders in a real-world e-commerce platform.

Unauthenticated transactions are a major buyer-risk type for e-commerce marketplaces. It is observed that the entities linked to those orders, such as shipping addresses and device machine IDs, are key clues for transaction fraud detection. Hundreds of patterns are summarised as features for models or as rules for decision engines. However, feature engineering of linkage patterns beyond one hop on the graph is currently quite inefficient for human experts.

Heterogeneous and Dynamic. Two vital characteristics of fraudulent transaction orders have been raised [rao2020suspicious] in the similar situation of suspicious massive registration detection. First, transaction orders and their related entities naturally form a graph with heterogeneous nodes (e.g., IP address, email, account), since they tend to share common risk features such as the same email or the same IP address. Second, temporal dynamics play an important role in fraud detection, because accounts used by fraudsters and by legitimate users usually generate activity events in separate time periods.

Future information at transaction checkpoints. Most graph datasets do not treat feature information flowing from timestamps later than the vertex as a critical issue. For fraud detection, claim, chargeback and suspension history are usually top model features, and their event timestamps matter. Feature patterns from upcoming events linked by entities may let models "foresee" the risk, but this capability is absent at real-world checkpoints, as future vertices have not yet appeared on the graph.

Graph Neighbor Query. Fraud detection at the transaction checkpoint requires low-latency responses, unlike on-boarding and post-transaction scenarios. Neighbor queries beyond 2 hops in graph databases, over generically designed linkage relationships, may take hundreds of milliseconds, which makes it difficult to meet internal system latency requirements.

In this work, our key contributions are:

  1. We propose a novel Directed Dynamic Snapshot (DDS) graph design and a Lambda Neural Networks (LNN) architecture (a hybrid architecture for learning, batch inference and streaming inference), which leverages snapshot aggregation and prevents the model from foreseeing future information during training.

  2. LNN on DDS significantly outperforms a LightGBM baseline model, which indicates that graph-neighbor and snapshot features are well captured.

  3. LNN together with DDS is suitable for real world low-latency inference as only the last one-hop key-value query is required for graph embedding propagation.

2 Background

In this section, research areas relevant to our work are discussed.

GNN. Graph neural networks (GNN) [Hamilton2017InductiveRL, Kipf2017SemiSupervisedCW, Vaswani2017AttentionIA] have gained increasing popularity for learning from graphs. They have a powerful capacity for grasping graph structure, as well as the complex relations among nodes, by means of message passing and aggregation.

TGN on Fraud Detection. Dynamic graphs can also be represented as a sequence of timed events. Temporal Graph Networks (TGN) [rossi2020temporal] apply memory modules and graph-based operators; the TGN framework is computationally efficient, being based on event updates. The Asynchronous Propagation Attention Network (APAN) [Wang_2021] adopts temporal encoding similar to TGN and decouples graph computation from inference. However, for TGN, only a small number of neighbors are accessible by the graph module due to memory constraints.

GNN on Dynamic Graph. Learning on temporal dynamic graphs is often set in a scenario of homogeneous graphs. One typical work is DySAT, which applies self-attention networks to learn low-dimensional embeddings of nodes in a dynamic homogeneous graph. One notable difference from our setting is that we need to distinguish between two types of entities, while DySAT assumes that all entities can be added or removed in the graph.

Snapshot GNN on fraud detection. DHGReg [rao2020suspicious] solves the suspicious massive registration detection task via a dynamic heterogeneous graph neural network. DHGReg is composed of two subgraphs: a structural subgraph reflecting the linkages between different types of entities, and a temporal subgraph capturing the dynamic perspective of all entities, giving different timestamps to different entities to determine whether an entity appears at time $t$ or not. With such a graph structure, DHGReg manages to grasp the time dimension of the heterogeneous graph and detect suspicious massively registered accounts as early as possible.

In real-world applications, however, issues still remain in the case of DHGReg: (1) the bi-graph structure tends to deplete GPU memory as the graph scale increases; (2) feature information flow from the future to vertices is not constrained; and (3) online neighbor lookup is not efficient in deployment.

In this work, we propose LNN on a DDS graph to detect suspicious fraud transactions. We adopt the merits of DHGReg's graph structure while coupling graph computation and online inference in one pipeline. In order to be more compatible with large graphs, we partition the large graph before learning. Additionally, we add timestamps to all asset nodes to construct a directed graph that points only from effective historical vertices to the target checkout vertices, so that the observed feature distribution better fits the production scenario. All assets in the graph are included during graph computation, while only one-hop neighboring entities' embeddings are pre-computed by periodic inference and later fetched for online inference to decrease latency.

3 Research Question and Methodology

3.1 Research Question

In order to identify transaction risk with graph level information, we would like to answer the questions below.

  1. How can we set up linkages between purchase orders and entities effectively, with information from after the purchase-order creation time excluded?

  2. How can we design a graph neural network architecture that is efficient for online inference?

3.2 Directed Dynamic Snapshot Graph

In our experiments, transaction fraud detection is treated as a binary classification problem in an inductive setting on a heterogeneous graph.

In a static transaction graph $G = (V, E)$, a vertex $v \in V$ has a type $\phi(v)$, where $\phi(v) \in \{\text{order}, \text{entity}\}$. An edge $e \in E$ links an order vertex to an entity vertex.

Notation Description
$G = (V, E)$ The undirected static graph
$V$ The vertices on the static graph
$E$ The edges on the static graph
$v \in V$ An order or entity on the static graph
$e \in E$ Order-entity linkage on the static graph
$o$ Order vertex on the static graph
$a$ Entity vertex on the static graph
$T$ The timestamp set
$G_s = (V_s, E_s)$ The directed dynamic snapshot (DDS) graph
$G_{ao}$ Effective entity-to-order graph
$V_s$ The vertices on the DDS graph
$E_s$ The edges on the DDS graph
$o_t$ Order on snapshot $t$
$\hat{o}_t$ Shadow order on snapshot $t$
$a_t$ Entity on snapshot $t$
Table 1: Notations

The order nodes with unauthenticated chargeback claims from the customer system are labeled $y = 1$ and regarded as fraud transactions. The others are labeled $y = 0$, representing legitimate checkouts. These labels are used for our binary classification problem.

The directed dynamic snapshot (DDS) graph is transformed from the static transaction graph after graph partition, as illustrated in Figure 1.

A time snapshot $t \in T$ represents a period of time, e.g., 1 hour or 1 day. In our experiments, a time snapshot represents a day. A snapshot vertex $v_t$ represents, on snapshot $t$, the static vertex $v$ from which it is transformed. The edge types for the snapshot vertex linkages are listed in Table 2.

Edge Type Description
$(\hat{o}_t, a_t)$ Both are in the same snapshot $t$
$(a_{t'}, a_t),\ t' \le t$ Historical entity linkages
$(a_t, o_t)$ Linkages from effective entities
Table 2: Directed Dynamic Snapshot Graph Edge Types

In order to achieve a directed dynamic snapshot graph for GNN to learn from, the graph construction consists of the steps below, illustrated in Fig. 1.

  1. Static Graph Setup Graph setup based on months of transaction data.

  2. Graph Partition Community detection on transaction graph for learning and inference in parallel.

  3. Directed Dynamic Graph Setup Information flow designed so that features are not extracted from the future.

Figure 1: Graph Transform

3.2.1 Static Graph

In order to collect neighbor features for transaction fraud risk evaluation, multiple entities directly used in checkout sessions are adopted as neighbors. These entities, including the shipping address, e-mail, IP address, device ID, contact phone, payment token and user account, are represented as entity nodes on $G$.

Each order vertex $o$ represents a checkout transaction with a unique order ID, linked with multiple entity vertices $a$ such as the shipping address, e-mail and contact phone that buyers need to confirm on checkout pages. Most entity vertices are linked to multiple order vertices as well.
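To make the linkage design concrete, here is a minimal sketch of building the static order-entity bipartite graph from checkout records. All identifiers (`build_static_graph`, the field names) are our own illustration, not the paper's implementation:

```python
from collections import defaultdict

def build_static_graph(checkouts):
    """checkouts: list of dicts like
    {"order_id": "o1", "email": "a@x.com", "ip": "1.2.3.4", ...}."""
    edges = []                       # (order_vertex, entity_vertex) pairs
    entity_index = defaultdict(set)  # entity vertex -> linked order vertices
    for c in checkouts:
        order = ("order", c["order_id"])
        for etype in ("email", "ip", "device_id", "ship_addr", "phone"):
            if c.get(etype):
                entity = (etype, c[etype])
                edges.append((order, entity))
                entity_index[entity].add(order)
    return edges, entity_index

checkouts = [
    {"order_id": "o1", "email": "a@x.com", "ip": "1.2.3.4"},
    {"order_id": "o2", "email": "a@x.com", "device_id": "d9"},
]
edges, index = build_static_graph(checkouts)
# Orders o1 and o2 share the email entity -- exactly the kind of
# linkage pattern the graph is meant to surface.
```

The entity index is what makes multi-order fraud rings visible: two otherwise unrelated orders become 2-hop neighbors through a shared email or device.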

3.2.2 Graph Partition

The static graph is generated from months of transactions so as to obtain fewer stand-alone vertices and make the information passed through neighbors effective. On the other hand, linkages gathered over a long time period create a huge graph with billions of nodes and dozens of extremely large connected components, which makes parallel deep learning and model inference on the unprocessed graph difficult.

Power Iteration Clustering (PIC) [DBLP] is utilized to partition the graph; it preserves graph connectivity and reduces graph sparsity effectively on extremely large graphs. It is a handy approach, as at COMPANY thousands of ETL jobs run on Apache Spark, where PIC is a built-in algorithm.

In order to keep the community size close to the business understanding of a fraudster gang, which is around 1000, the clusters from PIC are further processed with METIS [karypis1998fast]. This enables subsequent graph learning on the resulting mini communities in the ClusterGCN [Chiang_2019] flavor.
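The paper's pipeline uses PIC on Spark followed by METIS; the toy sketch below is only a stand-in illustrating the size-capping idea, splitting each connected component into bounded partitions via BFS (`capped_partitions` is our own name; a real deployment would call the actual PIC/METIS implementations):

```python
from collections import deque

def capped_partitions(adj, max_size=3):
    """Split each connected component of `adj` (vertex -> neighbor list)
    into partitions of at most `max_size` vertices."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        # BFS over one connected component, cutting a new partition
        # whenever the current one reaches max_size.
        queue, part = deque([start]), []
        seen.add(start)
        while queue:
            v = queue.popleft()
            part.append(v)
            if len(part) == max_size:
                parts.append(part)
                part = []
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        if part:
            parts.append(part)
    return parts

adj = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3], 5: [6], 6: [5]}
parts = capped_partitions(adj, max_size=3)
# Component {1, 2, 3, 4} is split into two partitions; {5, 6} stays whole.
```

Bounded partitions are what allow the subsequent ClusterGCN-style training to load one mini community at a time instead of the billion-node graph.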

3.2.3 Directed Dynamic Snapshot Graph

After graph partition, the static graph is decoupled into thousands of small communities. GNNs are not applied directly to these small static communities, because the inference score on $o_t$ at timestamp $t$ might incorporate embeddings coming from a future snapshot $t' > t$. Embeddings from the future are a critical issue for fraud detection, as they let the detector foresee facts from timestamp $t'$, summarised in the embeddings, that should not be observable at timestamp $t$.

In order to constrain the embeddings seen by $o_t$ at timestamp $t$ to information from the past only, the DDS graph is utilized. The workflow is as follows:

  1. Construct $o_t$ from $o$ as the effective order, along with a label for learning and evaluation.

  2. Clone the shadow $\hat{o}_t$ from $o_t$; the shadow is not linked to any label and engages in interactions with entities in the same snapshot.

  3. Create $a_t$ from $a$, representing the entity in snapshot $t$.

  4. Link $\hat{o}_t$ and $a_t$ to make information flow between them. If $o_{t'}$ and $o_t$ ($t' < t$) are both linked with the same entity, the shadow $\hat{o}_{t'}$ acts as the past order for $a_t$, while $o_t$ is the latest one, i.e., the order to be evaluated. The shadow is introduced because information does not flow between $o_{t'}$ and $o_t$ directly.

  5. Create edges from $a_{t'}$ to $a_t$, where $t' \le t$. These edges represent information flow from the past entity to the current entity, plus the self-loop on the current entity. $a_t$ may be connected with many $a_{t'}$, as long as they are linked to any $\hat{o}_{t'}$.

  6. Create edges from $a_t$ to $o_t$ within the same snapshot $t$. $o_t$ has only one edge to the identical effective entity for each entity type, such as phone and email. These edges are the final 1-hop edges toward the $o_t$ that is to be evaluated or learnt from along with labels. They are the only edges required for production online inference, which simplifies the neighbor lookup in graph databases into an embedding lookup in key-value databases.

With the DDS graph, displayed in Fig. 2, information for $o_t$ is guaranteed to come only from the past, which answers research question 1.

Figure 2: Directed Dynamic Snapshot Graph
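The construction steps above can be sketched as follows. This is a simplified illustration under our own naming (`build_dds_edges`), emitting the three DDS edge types for day-granularity snapshots:

```python
def build_dds_edges(transactions):
    """transactions: list of (order_id, entity_id, snapshot_t).
    Emits the three DDS edge types:
      (shadow_order_t -> entity_t)  same-snapshot interaction
      (entity_t'      -> entity_t)  historical flow, t' <= t (incl. self-loop)
      (entity_t       -> order_t)   final 1-hop edge to the evaluated order
    """
    edges = []
    snaps_per_entity = {}
    for order, entity, t in transactions:
        edges.append((("shadow", order, t), ("entity", entity, t)))
        edges.append((("entity", entity, t), ("order", order, t)))
        snaps_per_entity.setdefault(entity, set()).add(t)
    for entity, snaps in snaps_per_entity.items():
        for t in snaps:
            for t_past in snaps:
                if t_past <= t:  # past-only flow, never from the future
                    edges.append((("entity", entity, t_past),
                                  ("entity", entity, t)))
    return edges

txns = [("o1", "email:a", 1), ("o2", "email:a", 2)]
dds_edges = build_dds_edges(txns)
# No edge points from snapshot 2 back into snapshot 1:
assert (("entity", "email:a", 2), ("entity", "email:a", 1)) not in dds_edges
```

Because labeled orders only receive edges from same-snapshot effective entities, and entity-to-entity edges only point forward in time, no path can carry post-$t$ information into $o_t$.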

3.3 Network Architecture

In order to leverage the last edge type from DDS, a two-stage, or Lambda, neural network (LNN) architecture is proposed, so as to reuse the embeddings of $a_t$ without querying multi-hop neighbors. Similar to the two-tower deep learning model architecture [covington2016deep], in which user-profile and item embeddings are usually pre-computed, the embeddings of entities, such as e-mails and contact phones, are inferred beforehand and fetched from key-value stores for linked purchase orders before checkout approvals. The LNN architecture, together with the DDS graph, answers research question 2: it is the proposed solution for online inference with aggregated graph structure features.

Figure 3: Lambda Network Structure

As illustrated in Figure 3, LNN consists of a block of GNN layers and a Multilayer Perceptron (MLP), similar to DeepGCNs [li2019deepgcns]. The model is split into two stages at the last effective entity vertex $a_t$. The first stage is the head block of GNN layers, all except the last layer, whose final embedding output represents the latest effective $a_t$. The second stage, namely the one-layer GNN followed by the MLP, takes the embeddings of $a_t$ and the raw features of $o_t$.
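A minimal numeric sketch of the two-stage split, with mean aggregation and random weights standing in for the trained GNN layers and MLP (shapes and function names are illustrative assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def stage1_entity_embedding(past_shadow_feats):
    """First stage (offline batch): summarise an entity's historical
    shadow-order neighbourhood into one embedding vector."""
    return past_shadow_feats.mean(axis=0)

def stage2_score(order_feat, entity_embs, w1, w2):
    """Second stage (online, at checkout): one final aggregation hop
    over the order's effective entities, then a tiny MLP head."""
    hop = np.concatenate([order_feat, np.mean(entity_embs, axis=0)])
    hidden = np.maximum(hop @ w1, 0.0)                    # ReLU
    return float(1.0 / (1.0 + np.exp(-(hidden @ w2))))    # sigmoid score

d = 8
emb_email = stage1_entity_embedding(rng.normal(size=(5, d)))
emb_phone = stage1_entity_embedding(rng.normal(size=(3, d)))
w1, w2 = rng.normal(size=(2 * d, 16)), rng.normal(size=16)
score = stage2_score(rng.normal(size=d), [emb_email, emb_phone], w1, w2)
assert 0.0 <= score <= 1.0
```

The point of the split is deployment cost: the expensive multi-hop part runs offline in stage 1, and the online path is a single aggregation plus an MLP.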

3.3.1 Training phase

In the training phase, both stages are used for end-to-end learning. Information flows from the historical shadows $\hat{o}_{t'}$ to the latest effective $a_t$, then from $a_t$ to the labeled $o_t$, without any information later than $t$.

3.3.2 Entity Embedding Inference

In the production environment, entity vertex embeddings are periodically refreshed from DDS graphs with a fixed number of snapshots by the first stage of LNN. The values can be stored in distributed key-value stores for multiple downstream purposes. In this paper, the embeddings focus on transaction fraud risk detection.

3.3.3 Transaction Risk Inference

Before checkout approval, the second stage of LNN evaluates the score from the entity embeddings and the raw purchase-order features. Given the linkages from the DDS graph, this is equivalent to the risk score inferred from scratch over the unprocessed historical connected transaction features.
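The batch/online split might be wired up as below; a plain dict stands in for the distributed key-value store, and `embed_fn`/`head_fn` are placeholders for the two LNN stages (all names are ours):

```python
kv_store = {}  # stands in for a distributed key-value store

def refresh_embeddings(entities, embed_fn):
    """Periodic batch job: stage-1 inference over the DDS graph,
    writing one embedding per entity into the key-value store."""
    for entity_id, history in entities.items():
        kv_store[entity_id] = embed_fn(history)

def score_checkout(order_feats, linked_entities, head_fn):
    """Online path: only 1-hop key-value lookups, never a graph-DB
    traversal, then the stage-2 head."""
    embs = [kv_store[e] for e in linked_entities if e in kv_store]
    return head_fn(order_feats, embs)

refresh_embeddings({"email:a": [1.0, 3.0]},
                   embed_fn=lambda h: sum(h) / len(h))
risk = score_checkout([0.5], ["email:a"],
                      head_fn=lambda o, e: min(1.0, o[0] * sum(e)))
# risk == 1.0: min(1.0, 0.5 * 2.0)
```

The design choice mirrors the paper's latency argument: the only per-checkout graph work left online is a handful of key-value reads, one per effective entity type.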

4 Evaluation

4.1 Dataset

The static graph is constructed from months of transactions and only entities used in checkout sessions are adopted.

4.2 Experimental Setup

The data is split chronologically. The training data comes from the checkouts in the first 80% of time snapshots. The middle 10% are used as a validation set for early stopping. The final 10% of snapshot orders are used for testing.
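The chronological split can be sketched as follows (snapshot naming and the helper `temporal_split` are illustrative):

```python
def temporal_split(snapshots, train=0.8, val=0.1):
    """snapshots: time-ordered list of per-snapshot order batches.
    Returns (train, validation, test) slices with no look-ahead."""
    n = len(snapshots)
    n_train, n_val = int(n * train), int(n * val)
    return (snapshots[:n_train],
            snapshots[n_train:n_train + n_val],
            snapshots[n_train + n_val:])

days = [f"day_{i:02d}" for i in range(20)]
train, val, test = temporal_split(days)
# 16 snapshots for training, 2 for validation/early stopping, 2 for testing.
assert max(train) < min(val) < min(test)  # no look-ahead across splits
```

Splitting by snapshot rather than by random order keeps the evaluation honest for the same reason the DDS graph exists: a fraud model must never be scored on data from before its test-time cutoff.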

Graph partition is performed on the whole static graph, without separate stages for training, validation or testing. The expected number of partitions is derived from the expected partition size: 1 million for PIC [DBLP] and 1024 for METIS [karypis1998fast]. It would be interesting to further explore how the partition size impacts model performance; for the experiments in this paper, the numbers are set based on business advice.

Directed graph edges in $E_s$ and checkout features on $o_t$ are used for the GNN blocks. For entity vertices, the initial features are set to zero, which makes the comparison between LNN and models without entity information fair.

Gradient Boosting Decision Tree models [dorogush2018catboost, ke2017lightgbm] are still first-choice models in the fraud detection domain due to their outstanding performance on tabular datasets, the form in which our raw checkout features are represented. For MLP and LNN training, we use the encoded features from an existing LightGBM (LGB) [ke2017lightgbm] model trained for risky transaction detection.

4.3 Results

We report Average Precision (AP) and Area Under the Receiver Operating Characteristic Curve (ROC AUC) for the predicted scores in Table 3. As shown, LNN, with features aggregated through graph linkages, significantly outperforms MLP in both ROC AUC and AP. Compared with LGB, which remains the state-of-the-art model for tabular feature sets, LNN achieves an AP of 49.22%, more than 8 percentage points higher than the 40.81% obtained by LGB.

Model ROC AUC Average Precision
MLP 0.9217±0.0014 0.3912±0.0029
LGB 0.9317±0.0005 0.4081±0.0096
LNN (GAT) 0.9381±0.0012 0.4755±0.0100
LNN (GCN) 0.9431±0.0008 0.4922±0.0024
Table 3: Experiment Results

5 Conclusions

We present an approach, LNN on the DDS graph, that leverages both graph structure and time snapshot features for fraud transaction detection. LNN outperforms LGB significantly, which promises to relieve risk data scientists of tedious graph feature engineering. The constraint that no information from the future of a transaction is used helps reduce the gap between experimental and production feature distributions. LNN, together with the DDS graph design, requires only a 1-hop embedding key-value query, making low-latency inference feasible.

LNN is generic over its GNN blocks, as long as the input graph follows the DDS graph design principles. With more expressive GNN layers, it is promising to achieve better performance.