InfDetect: a Large Scale Graph-based Fraud Detection System for E-Commerce Insurance

03/05/2020 ∙ by Cen Chen, et al. ∙ Ant Financial

The insurance industry has been creating innovative products around emerging online shopping activities. Such e-commerce insurance is designed to protect buyers from potential risks such as impulse purchases and counterfeits. Fraudulent claims against online insurance typically involve multiple parties, such as buyers, sellers, and express companies, and they can lead to heavy financial losses. In order to uncover the relations behind organized fraudsters and detect fraudulent claims, we developed a large-scale insurance fraud detection system, InfDetect, which provides interfaces for commonly used graphs, standard data processing procedures, and a uniform graph learning platform. InfDetect is able to process big graphs containing up to 100 million nodes and billions of edges. In this paper, we investigate different graphs to facilitate fraudster mining, such as a device-sharing graph, a transaction graph, a friendship graph, and a buyer-seller graph. These graphs are fed to a uniform graph learning platform containing supervised and unsupervised graph learning algorithms. Case studies on widely used e-commerce insurance products are described to demonstrate the usage and capability of our system. InfDetect has successfully detected thousands of fraudulent claims and saves tens of thousands of dollars daily.




I Introduction

When shopping online, buyers face all kinds of risks. They might receive counterfeits when buying luxury bags; a glass bottle of spirits might break during shipment; food might be sold after its expiration date. Even when the product is undamaged and genuine, a buyer might still want to return it for reasons such as shopping regret or finding it unsuitable after use. E-commerce insurance is designed to protect buyers from such risks throughout the complete online shopping process by offering compensation for such unsatisfactory experiences.

Insurance is a contract used to hedge against future risks and potential financial losses. Any risk that can be quantified can potentially be insured in the form of an insurance policy, which states the conditions and scenarios under which the insurer (i.e., the insurance company) will compensate the insured (i.e., the policyholder/user). The creation of e-commerce insurance provides a trustworthy environment for both online buyers and sellers and greatly facilitates the active usage of our online shopping website. The security deposit insurance and the return-freight insurance are the most popular e-commerce insurance products on Taobao (one of the biggest e-commerce platforms in the world: https://en.). The security deposit insurance is purchased by sellers to obtain a ‘trustworthy seller’ badge. If products with quality issues are sold by sellers with this badge, buyers can ask for compensation, which is paid by the insurer in advance and reimbursed by the seller later. Sellers are thus freed from the funding pressure of freezing a large security deposit, and buyers still have a compensation guarantee when they accidentally purchase products with quality issues. The return-freight insurance is purchased by buyers to protect their right to regret. The insurer pays for the cost of returning unused and undamaged items.

The e-commerce insurance has contributed to over one billion dollars in premiums annually. However, insurance fraud has become a prominent concern. It refers to a range of improper activities that attempt to benefit from a fraudulent outcome from the insurance company [derrig2002insurance].

According to the estimates of our insurance professionals, millions of potentially fraudulent claims go undiscovered, and their costs exceed tens of millions of dollars each year.

Such a large volume of potentially fraudulent claims harms both customer satisfaction, through prolonged investigation time and potentially increased premiums, and the company’s profits, as more human resources and considerable time are required for claim investigations. Thus, it is critical for the insurance company to identify potentially fraudulent claims confidently and efficiently. This creates the need for a fraud detection system that is able to process very large data.

I-A Challenges in Insurance Fraud Detection

Traditional methods for insurance fraud detection primarily focus on extracting handcrafted features (such as past claim history); heuristics and rules are then distilled from expert knowledge to decide whether a claim needs further human investigation. With the emergence of big data and distributed computing, insurance companies have started leveraging machine learning techniques to lessen the burden of human investigation and intervention in the claim process [sithic2013survey]. Statistical models used in insurance fraud detection can generally be categorized into three types: supervised approaches, unsupervised techniques, and a hybrid of both [joudaki2015using, li2008survey, viaene2002comparison]. Supervised learning approaches, such as logistic regression [mercer1990fraud, wilson2009analytical], decision trees, support vector machines, Bayesian networks, and neural networks [shapiro2002merging, he1997application], have demonstrated good performance; however, they require data to be labeled by domain experts. In contrast, unsupervised techniques, such as association rules, cluster analysis, and outlier detection [brockett2002fraud, yamanishi2004line, viveros1996applying, nian2016auto], do not have such a labeling requirement and have also attracted much attention over the years. However, there are several aspects that are not well studied in the current literature.

  • Utilizing both labeled and unlabeled data: In the insurance domain, it is natural to have both labeled and unlabeled data. Gathering labels is costly, as a long observation period and heavy manual work are often required for labeling. To deal with this problem and boost model performance, one possibility is to combine supervised and unsupervised learning techniques to extract more information from both labeled and unlabeled data during training. We address this problem by introducing unsupervised graph learning algorithms and feature processing techniques in the methodology section.

  • Fraud patterns from graphs: Most deliberate fraudulent behaviors manifest in the form of criminal gangs. Individual behavior can be easier to disguise, but the collective behavior traces can hardly be completely covered up. For example, in Figure 1, we can clearly observe several fraud patterns, where red nodes represent the fraudsters. If we could find a way to utilize additional graph information, e.g., social or transaction networks, it could possibly speed up the claim process and help reduce the fraud rate.

  • Uncertain labels: E-commerce insurance normally issues millions of policies daily, and labeling claims requires enormous human effort. A few insurance professionals are not enough for the labeling task, so a common practice is to ask them for a set of rules that separate suspicious claims from normal ones. Rules can be applied at the account level, order level, and claim level. A fraud score is computed, and a score higher than a predefined threshold is labeled as ‘high risk’, otherwise ‘no observable risk’. Obtaining labels this way introduces another problem: label uncertainty. We normally adjust the threshold so that we are confident about the ‘high risk’ accounts, but it is unclear whether the ‘no observable risk’ accounts are at risk or not. In other words, the labels we have consist of a small number of true positive labels and a large number of unknown labels. To collect negative labels, we randomly undersample the ‘no observable risk’ samples. This strategy is explained further in Section II-E.

Fig. 1: Transaction network of a set of sampled claimants and their neighbors in security deposit insurance, where an edge is formed when there is a fund exchange between two users. Note that red nodes represent fraudsters while green nodes denote normal users.

In the rest of the paper, we introduce a large-scale fraud detection system for e-commerce insurance that addresses all the aspects mentioned above. The system is designed to uncover fraudsters at the claim stage by classifying accounts or orders as fraudulent or not. We specifically address the problem of fraudster gang detection with the help of several powerful graph learning algorithms, including the unsupervised DeepWalk [perozzi2014deepwalk] and the supervised DistRep and GeniePath [liu2018geniepath]. We discuss the merits, knowledge, and practices we learned from applying graph data, and we show how we apply them to our most popular real-world large-scale e-commerce insurance products.

II Methodology

Insurance fraud detection can be viewed as a binary classification problem. Labels of the claims in the training set are obtained, with a certain degree of confidence, from domain experts and our formerly deployed rule-based system. We aim to automatically detect more fraudulent claims while retaining high precision.

Graphs, such as social, transaction, and communication networks, occur naturally in insurance fraud settings. They provide straightforward information for describing and modeling complex relations. Our system uses several types of graphs as data interfaces and provides a variety of machine learning algorithms to mine suspicious fraudsters and orders.

Formally, given the input features $x_i$ of a claim $i$ and the graphs $\mathcal{G}$ associated with the claims, our goal is to predict the probability of the claim being fraudulent, i.e., $P(y_i = 1 \mid x_i, \mathcal{G})$.


II-A System Overview

Previously, e-commerce insurance fraud detection tasks were conducted by separate insurance data analysts. These professionals came up with features through experience and domain knowledge and applied a set of rules to these raw features for fraud detection. Our system is the first graph-based fraud detection system that combines their feature knowledge with various existing graphs.

Fig. 2: Overview of our insurance fraud detection system.

As shown in Figure 2, our system supports two types of algorithms in the fraud detection engine: graph algorithms that leverage graph information, and classification algorithms for general fraud classification that are widely used in the insurance domain. The insurance fraud detection engine is responsible for interacting with the database, training models, and making predictions. Maxcompute is a general-purpose, fully managed, multi-tenancy data processing platform for large-scale data warehousing. It supports SQL and MapReduce for label extraction and feature processing. All the graphs are stored and manipulated in GeaBase, a specially designed graph database used in our company that maintains n-hop graph neighbor information in a systematic way. It is able to store large graphs with low lookup latency. Meanwhile, the management portal supports a variety of management tasks across the whole pipeline, such as business rule intervention, online serving, monitoring, and job scheduling.

II-B Data Processing

Features are collected and processed to be fed into downstream machine learning algorithms in more suitable representations. The data processing modules provide several common utility functions such as data scaling, categorical feature encoding, discretization, and missing values filling.

Aside from basic features processed from the raw inputs, we can further enrich the representation of fraud patterns by incorporating denoised latent feature embeddings. These are produced by a Denoising Autoencoder (DAE), which reconstructs the basic features from a corrupted version of them for robustness and better generalization (the details of the DAE are omitted, as it is not the focus of this paper). Such unsupervised feature transformation techniques help to distill additional information from unlabeled data.

Besides, population stability index (PSI) [yurdakul2018statistical] is measured to find out whether a feature is significant enough for classification and stable enough along time. It is used to measure how much a variable has shifted in distribution between two samples. Commonly it is used to monitor the distribution changes of a feature between out-of-time validation samples and modeling samples. If the change is significant, this feature is not valid for online production because of stability issues. PSI is also used to decide whether a feature is important in the modeling stage. If the distribution difference is large between positive samples and negative samples, the feature is retained for modeling.
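As an illustration of the PSI check described above, the following is a minimal sketch, not the system's implementation. It assumes the commonly used form $\mathrm{PSI} = \sum_b (a_b - e_b)\,\ln(a_b / e_b)$, where $e_b$ and $a_b$ are the fractions of the reference and comparison samples falling into bin $b$. The function name `psi`, the quantile binning, the default of 10 bins, and the `eps` floor are all assumptions made for this example.

```python
import math
from bisect import bisect_right


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index of one feature between two samples.

    `expected` is the reference sample (e.g. modeling data) and `actual`
    is the comparison sample (e.g. out-of-time validation data).
    """
    ref = sorted(expected)
    # Interior bin edges taken at quantiles of the reference sample;
    # the set removes duplicate edges caused by tied values.
    edges = sorted({ref[(len(ref) * k) // n_bins] for k in range(1, n_bins)})

    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Hypothetical data: a feature whose distribution shifts upward over time.
ref_sample = [i / 100 for i in range(1000)]
shifted_sample = [x + 5 for x in ref_sample]
stable_psi = psi(ref_sample, ref_sample)
shifted_psi = psi(ref_sample, shifted_sample)
```

The same function can serve both uses mentioned above: comparing modeling and validation samples to test stability, or comparing positive and negative samples to gauge how discriminative a feature is.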

In addition, graph-based features are used extensively as an essential part of our system. From the graph theory perspective, features such as the degree of a node, the index of the subgraph a node resides in, and the length of the longest path containing a node are precomputed. Because our graphs are stored as assets, computing such features in advance saves a great amount of time when they are shared across downstream fraud detection tasks. From the representation learning perspective, graph embeddings learned by supervised and unsupervised graph learning algorithms can also be incorporated to uncover potential conspiracy patterns.

Finally, all these features will be concatenated and fed into the classification algorithms.

II-C Classification Algorithms

Unlike in fraud detection systems in other domains, insurance claimants are rather sensitive and alert to the results. For models used in the insurance industry, interpretability is sometimes one of the most important concerns. For example, for some insurance products, when the company rejects a claim, the verifier may have to explain the possible reasons or fraud indicators associated with the claim. As a result, classification algorithms with good explainability, such as logistic regression [wilson2009analytical, mercer1990fraud] and decision trees [bonchi1999classification], are often utilized. In our system, we have implemented a series of general classification algorithms. Parameter server based gradient boosted decision trees, also known as PSMART [zhou2017psmart], is the most widely adopted for its good expressive power, scalability, and explainability. More specifically, PSMART is a distributed implementation of the tree boosting technique LambdaMART [burges2010ranknet] over a parameter server [li2014communication]. It is deeply optimized for communication efficiency over sparse data and can reliably scale to hundreds of billions of samples and thousands of features across clusters.

II-D Graph Learning Algorithms

To help uncover the collective fraudster traces, we leverage the graph representation learning models to bring additional graph-based latent information into the picture. In the following subsections, we will dive into the details of three representative graph learning algorithms, i.e., Deepwalk (unsupervised), Graph Neural Networks (supervised node classification), and DistRep (supervised edge classification). All the algorithms are developed in a distributed fashion over parameter server to handle large scale graphs of up to billions of nodes.

II-D1 DeepWalk

DeepWalk (DW) belongs to the family of unsupervised graph learning models. Such models are able to leverage unlabeled graph data, capture neighborhood similarity, and encode topological relationships into a latent vector space in the form of embeddings [goyal2017graph]. DW uses local topological information obtained from truncated random walks sampled from the graph to learn latent representations, treating walks as the equivalent of sentences. Following [perozzi2014deepwalk], the learning procedure in DeepWalk is formulated as a maximum likelihood optimization problem:

$$\max_{f} \sum_{u \in V} \log \Pr\big(N_S(u) \mid f(u)\big),$$

where $f$ is a matrix of $|V| \times d$ parameters. For each vertex $u \in V$, $N_S(u) \subset V$ denotes the network neighborhood of source vertex $u$ generated through the random walk.

For such unsupervised graph learning techniques, the learned embeddings usually serve as input features for downstream tasks. A common practice for fraud classification with graph embeddings is outlined in Figure 3.

Fig. 3: Fraud detection pipeline with graph embedding.
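The random-walk stage of DeepWalk described above can be sketched as follows. This is a simplified, single-machine illustration under stated assumptions, not the distributed parameter-server implementation used in the system: the function names, the toy graph, and the default walk settings are hypothetical. The (center, context) pairs it produces are the kind of input a skip-gram model would be trained on, in the same way that words and their surrounding words are used for sentences.

```python
import random


def truncated_random_walks(adjacency, walks_per_node=2, walk_length=5, seed=0):
    """Sample fixed-length (truncated) uniform random walks from every node."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: isolated node
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def context_pairs(walks, window=2):
    """Turn walks into (center, context) pairs, as skip-gram does for sentences."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical toy graph: three connected accounts and one isolated account.
example_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": []}
example_walks = truncated_random_walks(example_graph)
example_pairs = context_pairs(example_walks)
```

Nodes that frequently co-occur within a window end up with similar embeddings, which is why accounts in the same tightly connected group tend to land close together in the embedding space.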

II-D2 Graph Neural Networks (GNNs)

GNNs are a family of deep learning algorithms that share the same architecture: each layer aggregates information from a node’s neighbors. A deeper layer reaches more distant neighbors, and the $k$-th layer embedding of node $v$ is

$$h_v^{(k)} = \sigma\Big(\mathrm{AGG}^{(k)}\big(\{h_u^{(k-1)} : u \in N(v) \cup \{v\}\}\big)\Big),$$

where the initial embedding $h_v^{(0)} = x_v$ is the account feature, $\sigma$ is a non-linear function, and $\mathrm{AGG}$ is an aggregation function across layers and neighbors that differs among GNN algorithms.

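Common GNN approaches we use for the fraud detection problem are struct2vec [dai2016discriminative] and GeniePath [liu2018geniepath]. Struct2vec aggregates neighbors by simply summing them up, while GeniePath stacks adaptive path layers for breadth and depth exploration in the graph. For breadth exploration, following [liu2018geniepath], it aggregates neighbors as

$$h_v^{(\mathrm{tmp})} = \tanh\Big(W^{(k)\top} \sum_{u \in N(v) \cup \{v\}} \alpha\big(h_v^{(k)}, h_u^{(k)}\big) \cdot h_u^{(k)}\Big),$$

where $\alpha(\cdot,\cdot)$ is an attention weight normalized with a softmax over the neighbors. This breadth-search function emphasizes the importance of neighbors with similar account features.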

The resulting embeddings are fed to the final softmax or sigmoid layers for downstream fraud account classification tasks. This is an end-to-end classification method, in contrast to DeepWalk, whose embeddings are treated as features for downstream classification algorithms.
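To make the layer-wise aggregation concrete, the sketch below implements one deliberately simplified sum-aggregation layer in the spirit of the struct2vec description above: each node sums its own embedding with those of its neighbors $N(v)$, applies a linear map, and passes the result through tanh. It is an assumption-laden illustration, not the struct2vec or GeniePath implementation; the function name, the toy chain graph, and the identity weight matrix are all hypothetical.

```python
import math


def sum_aggregate_layer(h, adjacency, weight):
    """One GNN layer: h_v <- tanh(W * sum of h_u over u in N(v) and v itself).

    `h` maps node -> embedding (list of floats); `weight` is a list of rows,
    one row of input weights per output dimension.
    """
    out = {}
    for v, h_v in h.items():
        total = list(h_v)
        for u in adjacency[v]:
            total = [t + x for t, x in zip(total, h[u])]
        out[v] = [math.tanh(sum(w * t for w, t in zip(row, total))) for row in weight]
    return out


# Hypothetical chain a - b - c with two-dimensional account features.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
identity = [[1.0, 0.0], [0.0, 1.0]]
h0 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

h1 = sum_aggregate_layer(h0, chain, identity)  # each node sees 1-hop neighbors
h2 = sum_aggregate_layer(h1, chain, identity)  # each node now sees 2-hop neighbors

# Changing the feature of c leaves the 1-layer embedding of a untouched,
# but does change its 2-layer embedding, since c is two hops away from a.
h0_alt = dict(h0, c=[5.0, 5.0])
h1_alt = sum_aggregate_layer(h0_alt, chain, identity)
h2_alt = sum_aggregate_layer(h1_alt, chain, identity)
```

In an end-to-end setting the weights would be learned jointly with the final sigmoid or softmax layer rather than fixed as here.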

II-D3 DistRep

DistRep is a novel algorithm we designed for edge classification. It combines node embeddings and node attributes. The embeddings of the two vertices of an edge are aggregated into a single edge embedding, while the attributes (node features) of the two vertices are concatenated. The aggregated embedding and the concatenated attributes are then concatenated and fed into a multi-layer neural network. The final sigmoid layer outputs the edge classification result.

II-E Modelling Label Uncertainty

Most e-commerce datasets suffer from label uncertainty: the rule-based risk indicator is much more confident about ‘high risk’ accounts being fraudulent than about ‘no observable risk’ accounts being regular. To address this problem, the ‘regular’ class in the training dataset is sampled randomly. Downsampling helps to reduce the effect of classifying a ‘no observable risk’ account as fraudulent. The objective function is modified accordingly: the classification loss is computed only over the ‘high risk’ samples and the randomly sampled ‘no observable risk’ samples, and the classifier can be any classification algorithm of our choice. The goal is to minimize the losses caused by wrong classifications. Note that sampling only takes place when selecting the samples to be trained on. Once the training samples are selected, their neighborhoods (containing 1-hop to 3-hop neighbors in most applications) are not sampled.
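The sampling step can be sketched as follows. This is a minimal illustration assuming each account is a record with an `id` and a rule-based `label`; the record layout, the function name, and the toy data are hypothetical. The 25% rate matches the negative sampling rate reported for the return-freight experiments in Section IV-B3.

```python
import random


def build_training_set(accounts, negative_rate=0.25, seed=0):
    """Keep every 'high risk' account as a positive and a random subset of
    'no observable risk' accounts as negatives."""
    rng = random.Random(seed)
    positives = [a for a in accounts if a["label"] == "high risk"]
    unknown = [a for a in accounts if a["label"] == "no observable risk"]
    # Only the choice of training samples is randomized; graph neighborhoods
    # of the selected samples are left intact.
    negatives = rng.sample(unknown, round(len(unknown) * negative_rate))
    return [(a["id"], 1) for a in positives] + [(a["id"], 0) for a in negatives]


# Hypothetical data: 10 rule-flagged accounts and 100 unflagged accounts.
accounts = [{"id": f"p{i}", "label": "high risk"} for i in range(10)]
accounts += [{"id": f"u{i}", "label": "no observable risk"} for i in range(100)]
training_set = build_training_set(accounts)
```

Because the unflagged accounts are only partially sampled, each unknown-label account contributes to the loss less often, which limits the damage when some of them are in fact fraudulent.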

III Discussion

The key component that distinguishes InfDetect from other machine learning-based fraud detection systems in the insurance domain is its use of graph information. Graphs are helpful from the following perspectives:

  • Fraud Organization Discovery: As shown in Figure 1, fraudulent accounts appear as connected red nodes. Similar patterns are also discovered in other cases (see Figure 4).

  • Fraud Detection with Consistency: Fraud detection suffers from the phenomenon that new types of fraud evolve over time and become more and more unpredictable. Non-stationary features, such as the number of claims made in the past month, are easily affected when fraudsters change their tactics. Graph data, by contrast, provides more stationary information, as the relations between collaborating fraudsters cannot be easily modified, e.g., in device-sharing graphs. Thus the use of graphs helps to establish model consistency.

Fig. 4: Buyer-seller graph of fraudulent users in the order insurance. The red nodes represent fraudulent sellers and the green nodes denote normal sellers. The larger nodes are sellers and the smaller black nodes are buyers. For simplicity, only essential buyers that connect sellers are visualized.

III-A Graph Construction

In this study, we form the transaction graph, device-sharing graph, and friendship graph to reveal patterns for fraud classification (see Figure 5), and build a buyer-seller graph to identify fraudulent orders. The following properties of graphs can help separate fraudulent accounts from regular ones:

  • distance aggregation: closer nodes share similar labels;

  • structural differentiation: structures of organized fraudsters are different from structures of regular accounts.

Fig. 5: Visualization of typical colluders and regular users in the device-sharing graph, transaction graph, and friendship graph. Panels: (a) device-sharing, colluders; (b) device-sharing, regular; (c) transaction, colluders; (d) transaction, regular; (e) friendship, colluders; (f) friendship, regular.

III-A1 Buyer-Seller Graph

The buyer-seller graph is built from Taobao’s order history. Orders from the past week are collected; each edge corresponds to one order, and its two vertices correspond to a seller account and a buyer account, respectively.

III-A2 Transaction Graph

The transaction graph shows fund exchange relations between accounts. A vertex is an account, and an edge indicates transactions between accounts.

III-A3 Device-Sharing Graph

The device-sharing graph reveals the relation of accounts sharing a device. A vertex is either a device (User Machine ID, UMID, the fingerprint built by Alibaba to uniquely identify devices) or an account. Edges exist between an account vertex and a UMID vertex and are extracted from the log-in history.

III-A4 Friendship Graph

The friendship graph is built upon friend books at Alipay, a product of Ant Financial with social networking features.

III-B Graph Processing

We preprocess these graphs to remove isolated accounts. In the transaction graph and friendship graph, nodes with zero degree (the number of edges incident to the node) are removed. In the device-sharing graph, account nodes that share no common UMIDs with other accounts are removed, together with their neighboring UMID nodes.

With the graph processing step, the classification performance is slightly degraded by less than 0.1%, whereas a great amount of computation is saved - the computation time for DeepWalk is shortened from 45 hours to 8 hours after processing the device-sharing graph.
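The two pruning rules can be sketched as below. This is an in-memory illustration under stated assumptions, not the GeaBase-backed implementation: the adjacency-dictionary and account-to-UMID layouts, the function names, and the toy data are hypothetical.

```python
def drop_isolated(adjacency):
    """Remove zero-degree nodes from an undirected adjacency dictionary."""
    return {node: nbrs for node, nbrs in adjacency.items() if nbrs}


def prune_device_sharing(account_to_umids):
    """Keep only accounts that share at least one UMID with another account.

    Removed accounts take their (unshared) UMID nodes with them, since the
    result only lists UMIDs attached to the accounts that are kept.
    """
    users = {}
    for account, umids in account_to_umids.items():
        for umid in umids:
            users.setdefault(umid, set()).add(account)
    return {
        account: set(umids)
        for account, umids in account_to_umids.items()
        if any(len(users[u]) > 1 for u in umids)
    }


# Hypothetical log-in data: acc1 and acc2 share device d2, acc3 is alone.
logins = {"acc1": {"d1", "d2"}, "acc2": {"d2"}, "acc3": {"d3"}}
pruned_logins = prune_device_sharing(logins)
pruned_graph = drop_isolated({"x": ["y"], "y": ["x"], "z": []})
```

Accounts pruned this way have no device-sharing relation to exploit, which is consistent with the small accuracy cost and the large reduction in computation reported above.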

III-C How to Choose Graphs

The graphs are very large (see Table I), so for efficiency we evaluate them in advance rather than using every graph at hand. The evaluation metric is designed around the distance aggregation property: if closer nodes in a graph have similar labels, the graph is more helpful for the classification task. We measure this with a label aggregation measure defined over the set of fraudulent nodes and the set of normal nodes.
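The exact formula of the label aggregation measure is not reproduced here, so the following sketch should be read as one plausible, purely illustrative instantiation of the distance aggregation idea rather than the measure used in the system: among all nodes within a given number of hops of a fraudulent node, it reports the fraction that are themselves fraudulent. The function names and the toy graph are hypothetical.

```python
def neighbors_within(adjacency, start, hops):
    """All nodes reachable from `start` in at most `hops` steps, excluding it."""
    seen, frontier = {start}, {start}
    for _ in range(hops):
        frontier = {n for f in frontier for n in adjacency[f]} - seen
        seen |= frontier
    return seen - {start}


def label_aggregation(adjacency, fraud, hops=1):
    """Fraction of fraud nodes' neighbors (within `hops`) that are also fraud.

    ASSUMPTION: illustrative homophily-style ratio, not the paper's formula.
    """
    same = total = 0
    for node in fraud:
        for nb in neighbors_within(adjacency, node, hops):
            total += 1
            same += nb in fraud
    return same / total if total else 0.0


# Hypothetical graph: a fraud triangle f1-f2-f3 attached to normal users n1-n2.
toy = {
    "f1": ["f2", "f3"],
    "f2": ["f1", "f3"],
    "f3": ["f1", "f2", "n1"],
    "n1": ["f3", "n2"],
    "n2": ["n1"],
}
fraud_nodes = {"f1", "f2", "f3"}
hop1_score = label_aggregation(toy, fraud_nodes, hops=1)
hop2_score = label_aggregation(toy, fraud_nodes, hops=2)
```

Under such a ratio, a graph in which fraudsters cluster tightly scores close to 1, while a graph in which fraudsters are scattered among normal users scores close to the base fraud rate.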

Graph            |V|      |E|     Nodes            Edges
device-sharing   3 M      6 M     account / UMID   device usage
transaction      2 M      2 M     account          fund exchange
friendship       8 M      11 M    account          friendship
buyer-seller     100 M    1 B     account          product purchase
TABLE I: Examples of the graphs provided in InfDetect. |V| and |E| denote the numbers of vertices and edges, respectively.

III-D How to Use and Choose Graph Learning Algorithms

Graph information can be used as features in traditional machine learning algorithms. One example is to compute the in-degree and out-degree of a node. This incorporates graph knowledge partially, in a simple but powerful way, and in some cases it leads to a slight performance improvement. For example, when fraudsters work with a so-called ‘mobile phone factory’ (a large number of inexpensive mobile phones purchased by fraudsters to register fake accounts and conduct fraud), the degree of fraudster account nodes in the device-sharing graph is significantly higher than that of other nodes.
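Degree features of this kind are cheap to precompute. The sketch below assumes a directed edge list of (source, destination) pairs, for example fund transfers; the function name and the toy edges are hypothetical.

```python
from collections import Counter


def degree_features(edges):
    """In-degree and out-degree of every node appearing in a directed edge list."""
    out_deg, in_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = set(out_deg) | set(in_deg)
    return {n: {"in_degree": in_deg[n], "out_degree": out_deg[n]} for n in nodes}


# Hypothetical transfers: three accounts send funds to the same hub account.
transfers = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "d")]
features = degree_features(transfers)
```

The resulting per-node dictionaries can be joined to the other account features before they are fed to the classifier.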

Using graph information only as handcrafted features is less powerful when the goal is to discover relations between particular fraudsters; in that case, graph learning algorithms are preferred. For order-wise fraud detection, DistRep is more appropriate, as it treats an order as an edge between a seller and a buyer. For account-level fraud detection, graph neural networks work end-to-end, and the embeddings extracted from their hidden layers are task-specific and contain label information. DeepWalk, in contrast, distills structural information from the graph and produces a uniform set of node embeddings regardless of the downstream task.

IV Case Studies on E-Commerce Insurance

In this section, we quantitatively and qualitatively evaluate the effectiveness of our graph-based fraud detection system over our mainstream products of e-commerce insurance.

IV-A Security Deposit Insurance

Security deposit insurance is one of the most popular insurance products for sellers on Taobao. To obtain a ‘trustworthy seller’ badge, a seller can choose either to freeze a security deposit fund or to buy the security deposit insurance for a small yearly premium. With this insurance, the insurer pays the emergency compensation in advance.

IV-A1 Data Preparation

Our security deposit insurance dataset is sampled from its claim history. One transaction graph is generated for each day. More specifically, for the users (sellers and buyers) involved in the claims on a given day, we retrieve their corresponding transaction records from our platform to build a transaction graph. On average, each transaction graph contains nodes and edges.

IV-A2 Quantitative Evaluation

We conduct ablation experiments to examine the effectiveness of incorporating graph information, i.e., the embeddings learned by DeepWalk (DW). Our parameter server based GBDT method, PSMART [zhou2017psmart], is used as the base classification model. Grid search is performed to find the best parameter settings. Both the graph embedding size for DW and the denoised feature embedding size for DAE are set to 32. As shown in Table II, incorporating DW boosts model performance, most visibly in recall at the 95% precision threshold. Both DAE and DW are helpful for the task.

Model            AUC      Rec.@90%Pre.   Rec.@95%Pre.
PSMART           0.9650   44.71%         69.30%
PSMART+DAE       0.9655   46.48%         71.04%
PSMART+DW        0.9661   46.75%         74.49%
PSMART+DAE+DW    0.9667   47.12%         77.89%
TABLE II: Performance comparison in terms of AUC and Recall (Rec.) at different Precision (Pre.) thresholds.
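For readers unfamiliar with the recall-at-precision metric used in Table II, the sketch below shows one common way to compute it: sweep the score threshold from high to low and report the largest recall at which precision still meets the target. This is an illustrative assumption about the metric's computation, not the evaluation code used for the table; the function name and the toy scores are hypothetical.

```python
def recall_at_precision(scores, labels, min_precision):
    """Largest recall achievable while keeping precision >= min_precision."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    best, tp = 0.0, 0
    for rank, i in enumerate(order, start=1):
        tp += labels[i]
        # Tied scores cannot be separated by a threshold: evaluate only
        # once the whole group of equal scores has been included.
        if rank < len(order) and scores[order[rank]] == scores[i]:
            continue
        if tp / rank >= min_precision:
            best = max(best, tp / total_pos)
    return best


# Hypothetical model scores and ground-truth labels (1 = fraudulent).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
rec_at_90 = recall_at_precision(toy_scores, toy_labels, 0.90)
rec_at_75 = recall_at_precision(toy_scores, toy_labels, 0.75)
```

In the toy data, flagging the top two claims keeps precision at 100% and recovers two of the three fraudulent claims; relaxing the target to 75% allows the top four to be flagged and recovers all three.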

IV-A3 Online Performance

After a one-month A/B test on our platform, we find that our proposed method reduces the fraud rate by 76% compared to the previous rule-based method. (We cannot report the exact insurance claim amounts due to privacy concerns.)

IV-A4 Qualitative Evaluation

To understand why our model performs better on the insurance fraud detection task, we qualitatively evaluate our method from two perspectives: the claim level and the user level. More specifically, we visualize the graph embeddings learned by DW using t-SNE, a commonly used tool for the visualization of high-dimensional data.


Claim level embeddings: For this particular insurance product, each claim involves two parties, so we obtain the claim representations by concatenating the embeddings of the users involved. We then visualize the claims sampled on one day by their representations in Figure 6. The fraudulent claims (in red) are clearly not close to the normal claims (in green), which shows that the graph representations are useful for identifying fraudulent claims. Furthermore, we observe that the fraudulent claims form several small clusters. This indicates gang behavior in the fraudulent claims, i.e., there are small groups of users colluding on insurance claim fraud. This further shows that the graph representations are meaningful.

Fig. 6: Claim level embedding. Red dots represent the fraudulent claims, while green dots refer to the normal claims.

User embeddings: Moreover, we visualize the user embeddings learned by our method in Figure 7. We use red color to mark a fraudulent user who initiated a fraud claim, and green color to mark normal users. Close examination shows that there are small clusters of fraudulent users and our method is able to project the fraudulent users into similar places in the embedding space.

Interestingly, we find that within a cluster of fraudulent users there are some normal users. To examine this, we choose two typical clusters of fraudulent users and plot their behaviors over the transaction network in Figure 8. In case 1, the fraudulent users (in red) exchange funds through a normal user (in green). This is a typical pattern in which fraudulent users do not contact each other directly; instead, they find a “normal” user with a clean record (the exchange hub in Figure 8a) to cover their fraudulent behaviors and monetary traces. A similar pattern is observed in case 2. In addition, in case 2 we observe some claims between fraudulent users, and fraudulent gangs that are connected through two “normal” users. Overall, user embeddings learned from the transaction graph are insightful and helpful for discovering fraudulent users and claims.

Fig. 7: User level embedding. A fraudulent user is a seller or buyer who is involved in a fraudulent insurance claim.
Fig. 8: Visualization of users related to fraudulent claims over the transaction graph: (a) Case 1; (b) Case 2.

IV-B Return-Freight Insurance

Buyers would like to return genuine and undamaged products for various reasons. In some cases, there could be a significant color difference between the on-screen product and the real-life product. In other cases, customers find a less expensive alternative after receiving their goods. The desire to return such items is reasonable but it will raise lots of disputes between buyers and sellers because of the ambiguity over which party should take responsibilities. Most disputes focus on who should pay for return shipping costs. The return-freight insurance is created to resolve disputes and protect buyers’ right to regret.

IV-B1 Graph Comparison

We analyze the patterns of fraudulent claims in the return-freight insurance scenario, and organized fraud turns out to be the prominent form of fraud. Three graphs (the device-sharing graph, the transaction graph, and the friendship graph) are compared according to the label aggregation measure, and the device-sharing graph fits best, as shown in Table III.

Graph            Hop 1   Hop 2   Overall
Device-sharing   0.80    0.51    0.80
Transaction      0.16    0.06    0.16
Friendship       0.04    0.01    0.04
TABLE III: Label aggregation comparison in terms of graph choice.

IV-B2 Data Preparation

Our return-freight insurance dataset is sampled from its claim history from the past three months. The device-sharing graph is constructed with accounts that have filed a claim within a 30-day period. Device UMIDs used by these accounts in the past 40 days are added as graph nodes. Isolated subgraphs containing only one account node are removed for computation efficiency. For raw features of account nodes, we collect 50 features (e.g., number of claims submitted over a month, duration as a customer, etc.), derived from insurance claim history, shipping history, and shopping history.

IV-B3 Quantitative Evaluation

After choosing the proper graph, we compare the DeepWalk algorithm, the GeniePath algorithm, and PSMART. We set the same hyperparameters for all PSMART modules: 500 trees, max tree depth of 5, data sampling rate of 0.6, feature sampling rate of 0.7, and a learning rate of 0.009. We randomly sample 25% of ‘no observable risk’ accounts as negative samples.

Our results, summarized in Table IV and plotted in Figure 9, show that the GNN-based approach outperforms the others. Detection expansion (DE) indicates the ability to detect more fraudulent accounts. All of our approaches raise the coverage of fraudulent account detection by more than 40% (a DE of at least 1.44), while the GNN-based approach has higher precision and recall most of the time.

Metric   PSMART   Node Embedding   GNNs
F1       0.547    0.535            0.623
DE       1.47     1.44             1.44
TABLE IV: Results based on Rule-based Labels.
Fig. 9: Model comparison with the Precision-Recall curve.
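For readers who want to reproduce this kind of comparison, the sketch below shows how the points of a precision-recall curve and an F1 score can be computed from model scores and binary labels. It uses the standard textbook definitions; the function names and toy data are illustrative and are not taken from the InfDetect code.

```python
def precision_recall_points(labels, scores):
    """Return (threshold, precision, recall) for each distinct score, high to low."""
    total_positives = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    true_positives = 0
    predicted = 0
    for index, (score, label) in enumerate(ranked):
        predicted += 1
        true_positives += label
        # Emit a point only after the last item sharing this score.
        is_last_of_tie = index + 1 == len(ranked) or ranked[index + 1][0] != score
        if is_last_of_tie:
            precision = true_positives / predicted
            recall = true_positives / total_positives if total_positives else 0.0
            points.append((score, precision, recall))
    return points


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy usage with four hypothetical accounts (label 1 = fraudulent).
points = precision_recall_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
best_f1 = max(f1_score(p, r) for _, p, r in points)
```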

IV-B4 Online Performance

Our system collects accounts that have filed a claim over the past months and classifies them daily. The classification result is evaluated by an insurance professional, who randomly samples and examines 300 accounts out of the reported fraudulent accounts. Recent reports show we have a precision of over 80% while covering 44% more suspicious accounts.
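Because precision is estimated from a sample of 300 accounts, it carries sampling uncertainty. The sketch below draws an audit sample and attaches a normal-approximation 95% interval to the estimate. The function names and the example count of confirmed cases are hypothetical, and the interval is a generic statistical tool rather than a step described by the authors.

```python
import math
import random


def sample_for_audit(reported_accounts, sample_size=300, seed=0):
    """Draw a random audit sample from the reported fraudulent accounts."""
    rng = random.Random(seed)
    accounts = list(reported_accounts)
    return rng.sample(accounts, min(sample_size, len(accounts)))


def precision_with_interval(confirmed_fraud, sample_size, z=1.96):
    """Point estimate and normal-approximation interval for precision."""
    precision = confirmed_fraud / sample_size
    margin = z * math.sqrt(precision * (1 - precision) / sample_size)
    return precision, max(0.0, precision - margin), min(1.0, precision + margin)


# Hypothetical audit outcome: 240 of 300 sampled accounts confirmed as fraud.
audit_sample = sample_for_audit(range(5000))
estimate, lower, upper = precision_with_interval(240, len(audit_sample))
```

With a sample of 300 and a precision near 80%, the margin is roughly 4.5 percentage points on either side of the estimate.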

IV-C More Applications

IV-C1 Order Insurance

Order insurance is generally designed for the same purpose as security deposit insurance. An order insurance policy covers only the lifecycle of a single order, whereas a security deposit policy covers all orders for a specific seller. However, the advance compensation offered by the insurer under order insurance is ten times higher. In some categories on Taobao, such as alcohol, purchasing order insurance is required for the ‘trustworthy seller’ badge, because the products are expensive and buyers expect higher compensation.

By examining the fraudulent claims, we find suspicious relations between certain buyers and sellers. With the help of the buyer-seller graph and the edge classification algorithm DistRep, recall reaches 89% in offline experiments. In the online setting, order insurance using the InfDetect system halves its compensation payouts and saves tens of thousands of dollars per day.

IV-C2 Complementary Health Insurance

Complementary health insurance is offered to buyers as a marketing strategy to foster online shopping activities. PSMART is applied with the help of InfDetect, and the top 50 suspicious claims are sent to insurance professionals for further investigation. For this type of insurance, human investigation is easier because investigators can ask the hospitals named in the claims for detailed information. The feedback is not yet available. Meanwhile, a growing number of other insurance types are using our system for general fraud detection and for detecting organized fraudsters.

V Related Work

Traditional methods for insurance fraud detection primarily focus on extracting handcrafted features (such as past claim history); heuristics and rules are then distilled from expert knowledge to decide whether a claim needs further human investigation. With the emergence of big data and distributed computing, insurance companies have started leveraging machine learning techniques to lessen the burden of human investigation and intervention in the claim process [sithic2013survey]. Insurance fraud detection approaches can be generally divided into supervised learning, unsupervised learning, and a mixture of both [joudaki2015using, li2008survey, viaene2002comparison]. Popular supervised algorithms, such as logistic regression [mercer1990fraud, wilson2009analytical], decision trees [bonchi1999classification], support vector machines, Bayesian networks [ormerod2003using], and neural networks [shapiro2002merging, he1997application], have demonstrated good performance; however, they require data to be labeled by domain experts. Meanwhile, unsupervised techniques, such as association rules, cluster analysis, and outlier detection, have also been applied and have attracted much attention over the years [brockett2002fraud, yamanishi2004line, viveros1996applying, nian2016auto]. Hybrids of supervised and unsupervised algorithms have also been studied; for example, unsupervised approaches have been used to segment insurance data into clusters for supervised approaches in [brockett1998using]. Our proposed approaches fall under supervised learning and under hybrids of unsupervised and supervised learning, respectively. They differ from prior work in that we are the first to incorporate graph information into insurance fraud modeling.

Graphs (networks) provide straightforward information for describing and modeling complex relations among colluders (collaborating fraudsters). They are the most natural representation of relational information and allow for complex analysis without simplification of data. Recently, network representation learning has been playing an increasingly important role in network analysis. Many unsupervised models have been introduced over the years, e.g., the widely used LINE [tang2015line], DeepWalk [perozzi2014deepwalk], and node2vec [grover2016node2vec], which have been shown to be superior to traditional graph analysis approaches such as spectral clustering [tang2011leveraging] and modularity analysis [tang2009relational]. Meanwhile, Graph Neural Networks (GNNs) represent a set of supervised graph learning algorithms following the same architecture that aggregates information from nodes’ neighbors [gori2005new, scarselli2008graph]. Commonly used state-of-the-art GNN-based approaches include struct2vec [dai2016discriminative], GAT [velivckovic2017graph], and GeniePath [liu2018geniepath], which have been shown to be effective in various applications [liu2018heterogeneous, hu2019cash].
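To make the neighbor-aggregation idea concrete, the sketch below performs one round of mean aggregation over a toy graph. It is a deliberately simplified stand-in with no learned weights or nonlinearity, not an implementation of struct2vec, GAT, or GeniePath; the function name and toy data are illustrative.

```python
def mean_aggregate(adjacency, features):
    """One aggregation round: average each node's own features with its neighbors'."""
    updated = {}
    for node, own in features.items():
        # The neighborhood includes the node itself (a self-loop).
        neighborhood = [own] + [features[n] for n in adjacency.get(node, ())]
        dim = len(own)
        updated[node] = [
            sum(vec[i] for vec in neighborhood) / len(neighborhood)
            for i in range(dim)
        ]
    return updated


# Toy path graph a - b - c with two-dimensional node features.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hidden = mean_aggregate(adjacency, features)
```

Applying the function repeatedly lets information travel one hop further per round, which is why stacked layers can capture multi-hop structure.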

VI Conclusion

In this work, we present a graph-based fraud detection system for large-scale e-commerce insurance, illustrated with cases from the most popular insurance products: security deposit insurance and return-freight insurance. We also introduce the system's modules and their functionality. The key components, namely the graphs and their learning algorithms, help discover organized fraudsters, and the system has helped save millions of dollars per year.