Are You for Real? Detecting Identity Fraud via Dialogue Interactions

by Weikang Wang, et al.
Mobvoi, Inc.

Identity fraud detection is of great importance in many real-world scenarios such as the financial industry. However, few studies have addressed this problem. In this paper, we focus on identity fraud detection in loan applications and propose to solve this problem with a novel interactive dialogue system which consists of two modules. One is the knowledge graph (KG) constructor, which organizes the personal information for each loan applicant. The other is structured dialogue management, which dynamically generates a series of questions based on the personal KG to ask the applicants and determines their identity states. We also present a heuristic user simulator based on problem analysis to evaluate our method. Experiments show that the trainable dialogue system can effectively detect fraudsters and achieves higher recognition accuracy than rule-based systems. Furthermore, our learned dialogue strategies are interpretable and flexible, which can help promote real-world applications.




1 Introduction

Identity fraud is one person using another person’s personal information, or combining a few pieces of real data with bogus information, to deceive a third party. Identity fraud is becoming increasingly prevalent and has left many financial firms nursing huge losses. In addition, people whose identities have been stolen may receive unexpected bills, and their credit may be affected. Although identity fraud is a very serious problem in modern society, little attention has been paid to it and effective detection methods are still lacking.

Intuitively, a simple way to detect identity fraud in loan applications is to ask applicants directly about their personal information. However, as shown in Fig. 1, this method is prone to errors because fraudsters may know the fake information well. Fortunately, we find that fraudsters are generally unclear about the answers to questions that are related to the fake information. (This finding is based on the premise that loan applicants answer questions without any help, e.g., from automatic QA systems or information retrieval tools. This premise is reasonable in many real scenarios, such as video calls and phone calls in which we can monitor the applicants with a camera or require them to answer questions within a few seconds, e.g., 5 seconds.) We refer to these questions as derived questions, which can be constructed based on triplets where the head entity is the personal information entity. For example, the first derived question about “Nanjing University” is based on (Nanjing University, FoundedDate, 1902). In Fig. 1, the applicant claims to have graduated from “Nanjing University” but cannot answer derived questions about this school. This indicates that the applicant is likely to be a fraudster.

Figure 1: Dialogue examples of two possible fraud detection methods. The first one is directly asking applicants about their personal information. The second one is asking applicants about questions that are related to their personal information.

Based on the above finding, we aim to design a dialogue system to detect identity fraud by asking derived questions. However, there are three major challenges in achieving this goal.

First, designing derived questions requires a high-quality KG. However, owing to the sparseness problem of the KG (ji2016knowledge; trouillon2017knowledge), many entities have no triplets for derived question generation. Second, randomly selecting triplets to generate questions is feasible, but it is not an optimal questioning strategy for detecting fraudsters. Third, because of privacy issues, evaluating anti-fraud systems with real applicants is not practical, and existing user simulation methods (li2016user; georgila2006user; pietquin2006probabilistic) do not apply to our task. Hence, how to evaluate our systems efficiently is a problem.

To address the above problems, we first complete an existing KG with geographic information in an electronic map (Section 2). In the new KG, nearly all personal information entities can find triplets for derived question generation. Then, based on the KG, we present structured dialogue management (Section 3) to explore the optimal dialogue strategy with reinforcement learning. Specifically, our dialogue management consists of (1) the KG-based dialogue state tracker (KG-DST) that treats embeddings of nodes in the KG as dialogue states and (2) the hierarchical dialogue policy (HDP) where high-level and low-level agents unfold the dialogue together. Finally, based on intuitive analysis, we find the applicants’ behavior is related to some factors (Section 5.1). Thus, we introduce hypotheses to formalize the effect of these factors on the applicants’ behavior and propose a heuristic user simulator to evaluate our systems.

Experiments show that the data-driven system significantly outperforms rule-based systems in the fraud detection task. Besides, the ablation study shows that the proposed dialogue management improves the recognition accuracy and learning efficiency because of its ability to model structured information. We also analyze the behavior of our system and find the learned anti-fraud policy is interpretable and flexible.

To summarise, our main contributions are three-fold: (1) As far as we know, this is the first work to detect identity fraud through dialogue interactions. (2) We point out three major challenges of identity fraud detection and propose corresponding solutions. (3) Experiments have shown that our approach can detect identity fraud effectively.

2 Knowledge Graph Constructor

There are four types of personal information in a Chinese loan application form: “School”, “Company”, “Residence” and “BirthPlace”. To generate derived questions, we link all personal information entities to nodes in an existing Chinese KG and crawl triplets that are directly related to them. However, because the KG is largely sparse, nearly half of the entities (most of them “Residence” and “BirthPlace”) cannot be linked. Thus we use the rich geographic information about organizations and locations in electronic maps (e.g., Amap) to complete the KG.

Figure 2: An example of the KG for Nanjing University. The green edge represents the triplet crawled from the existing KG and the blue edges represent the triplets generated based on a navigation electronic map.
Figure 3: Overview of our approach. To build the directed graph for dialogue management, we reverse directions of all edges in the original personal KG to make the head entity read information from its tail entities. Besides, we add a special node “User” and new edges to represent the applicant’s personal information. In this graph, the direction of each edge is the direction of message passing in KG-DST. The blue and green edges indicate that two agents select nodes to unfold the dialogue according to HDP.

Specifically, for each personal information entity, we first crawl its points of interest (POI, the specific locations, e.g., subway stations, that someone may find useful in navigation systems) within one kilometer and the POI types in Amap. If there are multiple POI of the same type, we only keep the nearest one. Then we generate triplets in the form of ($Personal Information Entity$, $POI type$, $POI$) to indicate the fact that the nearest $POI type$ to the $Personal Information Entity$ is $POI$. Besides, for any two entities, if the distance between them is less than 100 meters, we generate two triplets to represent the bi-directional adjacency relation between them. In the end, as shown in Fig. 2, we combine triplets from the two information sources (the Chinese KG and the electronic map) to construct a new KG. In this KG, nearly all personal information entities can be linked. For each relation (after data cleaning, there are 40 relations in all), we design a language template for question generation.
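The map-based triplet generation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `POI` record, the function names and the distances in the usage note are hypothetical, and labeling each adjacency triplet with the POI type of its tail entity is an assumption suggested by the triplets listed in Table 1.

```python
from collections import namedtuple

# Hypothetical POI record: name, POI type, and distance in meters to the entity.
POI = namedtuple("POI", ["name", "poi_type", "distance_m"])


def poi_triplets(entity, pois, radius_m=1000):
    """Keep the nearest POI of each type within the radius and emit
    (entity, POI type, POI) triplets, sorted by POI type."""
    nearest = {}
    for poi in pois:
        if poi.distance_m > radius_m:
            continue  # outside the one-kilometer crawl radius
        best = nearest.get(poi.poi_type)
        if best is None or poi.distance_m < best.distance_m:
            nearest[poi.poi_type] = poi
    return [(entity, t, nearest[t].name) for t in sorted(nearest)]


def adjacency_triplets(a, a_type, b, b_type, distance_m, threshold_m=100):
    """Emit two triplets for the bi-directional adjacency relation when two
    entities are closer than the threshold (assumed labeling: tail's type)."""
    if distance_m < threshold_m:
        return [(a, b_type, b), (b, a_type, a)]
    return []
```

For example, with made-up distances, `poi_triplets("Nanjing University", [POI("Gulou Subway Station", "SubwayStation", 300), POI("Gulou Park", "Park", 350)])` yields one triplet per POI type, and two POI closer than 100 meters to each other additionally yield a pair of adjacency triplets.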

3 Dialogue System Design

The overview of our system is shown in Fig. 3. The core of the system is dialogue management, which is organized as a directed graph. In each turn, our system first infers dialogue states with the KG-based dialogue state tracker by computing embeddings of nodes. In this graph, the embedding of the “User” node is the dialogue state of a high-level agent (manager), and the embeddings of nodes adjacent to “User” (named personal information nodes) are the dialogue states of low-level agents (workers). Then our system unfolds the dialogue according to the hierarchical dialogue policy. Concretely, the manager first selects a personal information node (e.g., “Nanjing University”) as the worker, and then the worker selects a node (e.g., “Gulou Subway Station”) from its predecessors (named answer nodes). After that, the sampled nodes of the two agents form the final system action (a triplet). Next, based on the triplet and a predefined template, the natural language generation module gives a multiple-choice question to the applicant. After the applicant gives a response, the embeddings of all nodes are updated to generate new dialogue states for the next turn.

3.1 KG-based Dialogue State Tracker

There are three types of nodes in the graph: the “User” node, the personal information nodes and the answer nodes. In each turn, KG-DST first gives an initial embedding to each personal information node and answer node. The initial embedding is the concatenation of static features and dialogue features. Then, each personal information node gathers information from its predecessors, the answer nodes. After multiple rounds of message passing, we get its final embedding. Next, the “User” node aggregates information from the personal information nodes to generate its embedding. Finally, the embeddings of the “User” node and of a personal information node are the dialogue states of the manager and the corresponding worker respectively.


Static Features.

Specifically, the static features of a node include its degree and type. Besides, for an answer node, we use the “spread degree on the internet” to distinguish different answer nodes, because we find there is an obvious correlation between this “spread degree” feature and applicants’ behavior in our human experiments (Section 5.1). To get the “spread degree” feature, we first treat the answer node and its adjacent personal information node as the keyword (that is, the head entity and tail entity of a triplet; for example, for the answer node “1902”, the keyword is “Nanjing University 1902”), and then search for it in a search engine. The number of retrieved results is the “spread degree” feature of the answer node; if there are multiple keywords for an answer node (e.g., “Gulou Subway Station”), we take the average. In the end, each static feature is encoded as a one-hot vector, and these vectors are concatenated to form a single vector.

Dialogue Features.

The dialogue features record the dynamic information of a node during the dialogue. Specifically, dialogue features include whether the node has been explored by the manager or workers and whether the node appeared in the system action of the last turn. In addition, for a personal information node, the dialogue features include the interaction turns of the corresponding worker and the number of correctly/incorrectly answered questions about it. For an answer node, the dialogue features include whether the applicant knows that it is the answer to a derived question. Similarly, dialogue features are encoded as a one-hot vector.
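As a rough illustration of how such an initial embedding could be assembled for an answer node, the sketch below one-hot encodes each feature and concatenates the results. The vector sizes, the base-10 logarithmic bucketing of the search hit count, the three-valued `known` flag and all function names are assumptions made for this example; the source only states that features are one-hot encoded and concatenated.

```python
import math


def one_hot(index, size):
    """One-hot vector of the given size; out-of-range indices are clamped."""
    vec = [0] * size
    vec[min(max(index, 0), size - 1)] = 1
    return vec


def spread_degree_bin(hit_count, num_bins=8):
    """Bucket a search-engine hit count by its base-10 logarithm (assumed)."""
    return min(int(math.log10(hit_count + 1)), num_bins - 1)


def answer_node_features(degree, type_index, hit_count, explored,
                         in_last_action, known,
                         max_degree=10, num_types=4, num_bins=8):
    """Initial embedding of an answer node: static features followed by
    dialogue features. `known` is 0 (not asked yet), 1 (applicant knew the
    answer) or 2 (applicant did not know it)."""
    static = (one_hot(degree, max_degree + 1)
              + one_hot(type_index, num_types)
              + one_hot(spread_degree_bin(hit_count, num_bins), num_bins))
    dialogue = (one_hot(int(explored), 2)
                + one_hot(int(in_last_action), 2)
                + one_hot(known, 3))
    return static + dialogue
```

Bucketing the hit count keeps the one-hot vector short even though raw hit counts span many orders of magnitude.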

Message Passing.

In Fig. 3, the applicant does not know that “Gulou Subway Station” is the nearest subway station to “Nanjing University”. In such a case, the personal information about “School” may be fake. Besides, for another question, “What’s the nearest park to Nanjing University?”, the applicant may not know the answer either, because the distance between “Gulou Park” and “Gulou Subway Station” is less than 100 meters. Thus, we want the known information of “Gulou Subway Station” to be sent to its successors.

Specifically, the embedding of each node is computed recursively. At each depth, a node applies the aggregate function, the element-wise max operation, to the previous-depth embeddings of its adjacent nodes and transforms the result with the parameter of that iteration. The final node embedding in a turn is the concatenation of the embeddings at each depth, where the maximum depth is a hyperparameter.

After getting the embeddings of the personal information nodes, we compute the embedding of the “User” node by aggregating information from them with a separate parameter.

In the end, the embedding of a personal information node is the worker’s dialogue state, which contains information of a part of the graph, and the embedding of the “User” node is the manager’s dialogue state, which contains information of the whole graph.
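A minimal sketch of this kind of message passing is given below, in pure Python for readability. It assumes a ReLU nonlinearity, square weight matrices, a zero vector for nodes without predecessors, and the initial embedding as the depth-0 embedding; none of these details are confirmed by the text, which only fixes the element-wise max aggregation, a per-iteration parameter and the concatenation over depths. The function names are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def elementwise_max(vectors, dim):
    """Element-wise max over a list of vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [max(col) for col in zip(*vectors)]


def relu(x):
    return [max(0.0, v) for v in x]


def message_passing(initial, predecessors, weights):
    """initial: node -> depth-0 embedding.
    predecessors: node -> nodes it reads information from.
    weights: one square matrix per iteration depth.
    Returns node -> concatenation of its embeddings at every depth."""
    dim = len(next(iter(initial.values())))
    layers = [dict(initial)]
    for W in weights:
        prev = layers[-1]
        layers.append({
            v: relu(matvec(W, elementwise_max(
                [prev[n] for n in predecessors.get(v, [])], dim)))
            for v in initial
        })
    return {v: [x for layer in layers for x in layer[v]] for v in initial}
```

With an empty `weights` list the result reduces to the initial embeddings, which corresponds to switching message passing off.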

3.2 Hierarchical Dialogue Policy

After obtaining the dialogue states and node embeddings, our system will unfold the dialogue according to a hierarchical policy.

Specifically, the manager first selects a personal information node as a worker to verify the identity state of that piece of personal information according to a high-level policy. Then, the worker chooses some answer nodes from its predecessors to generate questions about the personal information according to a low-level policy. If the worker gives the decision about the identity state of the personal information, the worker’s subtask ends and the manager either selects a new worker or gives the final decision. If the manager gives the final decision about the applicant’s identity state, the dialogue ends. Formally, each policy is a parameterized distribution over candidate actions conditioned on the dialogue state of the manager or worker in the current turn. The manager’s candidate actions are the personal information nodes plus a terminal action, whose encoding has the same dimension as a personal information node embedding; the worker’s candidate actions are its answer nodes plus a terminal action, whose encoding has the same dimension as an answer node embedding.

Besides, to prevent the two agents from making decisions in haste, domain rules are applied to their dialogue policies by “Action Mask” (williams2017hybrid). Specifically, domain rules are defined as follows. First, the worker can make its decision only after all, or at least three, of the answer nodes related to it have been explored. Second, the manager can make the final decision only after all workers have made decisions or at least one worker’s decision is “Fraud”.
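The two domain rules can be expressed as boolean masks over each agent's action list, with the terminal action in the last position. The sketch below is one possible reading: forbidding already explored nodes, the softmax normalization and the function names are assumptions for illustration, while the two conditions on the terminal action follow the rules stated above.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over allowed actions only; masked actions get probability 0."""
    allowed = [s for s, m in zip(scores, mask) if m]
    if not allowed:
        raise ValueError("at least one action must be allowed")
    top = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def worker_action_mask(explored_flags, min_explored=3):
    """Mask over [answer nodes..., terminal action] for one worker."""
    n_explored = sum(explored_flags)
    can_decide = (n_explored == len(explored_flags)
                  or n_explored >= min_explored)
    return [not e for e in explored_flags] + [can_decide]


def manager_action_mask(worker_decisions):
    """Mask over [workers..., terminal action]. Each entry of
    worker_decisions is None (undecided), "Fraud" or "Non-Fraud"."""
    decided = [d is not None for d in worker_decisions]
    can_decide = all(decided) or "Fraud" in worker_decisions
    return [not d for d in decided] + [can_decide]
```

Masking before normalization means a forbidden action can never be sampled, so the rules hold during both pre-training and reinforcement learning.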

4 Training

4.1 Reward Function

We expect the system to give correct decisions about applicants in as few turns as possible. Thus, at the end of each dialogue, the manager receives a positive reward for a correct decision or a negative reward for a wrong decision. If the manager selects a worker to unfold the dialogue in a turn and the worker gives questions to the applicant, the manager receives a negative reward. Besides, we provide an internal reward to optimize the low-level policy. Specifically, if the worker gives a correct decision about the corresponding personal information, it receives a positive reward; otherwise, it receives a negative reward. In each turn, the worker also receives a negative reward to encourage shorter interactions.

4.2 Reinforcement Learning

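The two agents can be trained with the policy gradient approach (williams1992simple) as follows:

$$\nabla_{\theta_m} J_m = \mathbb{E}\left[\nabla_{\theta_m} \log \pi_m(a^m_t \mid s^m_t)\,\big(R^m_t - V_m(s^m_t)\big)\right]$$

$$\nabla_{\theta_w} J_w = \mathbb{E}\left[\nabla_{\theta_w} \log \pi_w(a^w_t \mid s^w_t)\,\big(R^w_t - V_w(s^w_t)\big)\right]$$

where $\pi_m$ and $\pi_w$ are the high-level and low-level policies with parameters $\theta_m$ and $\theta_w$, $s^m_t$ and $s^w_t$ are the dialogue states of the manager and worker, $R^m_t$ and $R^w_t$ are the discounted returns of the two agents, $a^m_t$ and $a^w_t$ are their sampled actions, and $V_m$ and $V_w$ are value networks which are optimized by minimizing mean-square errors to $R^m_t$ and $R^w_t$ respectively.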
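To make the training signal concrete, the sketch below computes discounted returns for one agent and the scalar losses that a policy gradient update with a value baseline would minimize. It is a simplified illustration under stated assumptions: using the value estimate as a baseline, the loss form and the function names are not taken from the source, and the rewards in the usage note are made up.

```python
def discounted_returns(rewards, gamma):
    """Return R_t = r_t + gamma * R_{t+1} for every step of one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def policy_gradient_losses(log_probs, returns, values):
    """Surrogate policy loss -(R_t - V(s_t)) * log pi(a_t | s_t), summed over
    steps, and the mean-square error of the value estimates to the returns."""
    advantages = [R - v for R, v in zip(returns, values)]
    policy_loss = -sum(a * lp for a, lp in zip(advantages, log_probs))
    value_loss = sum((R - v) ** 2
                     for R, v in zip(returns, values)) / len(returns)
    return policy_loss, value_loss
```

For instance, `discounted_returns([-1, -1, 10], 0.5)` gives `[1.0, 4.0, 10.0]`: a per-turn penalty followed by a final reward is propagated back to earlier turns, so shorter dialogues with correct decisions obtain higher returns.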

4.3 Pre-Training

Before reinforcement learning (RL), supervised learning (SL) is applied to mimic dialogues provided by a rule-based system. Rules are defined as follows. First, the manager selects a worker randomly. Then, the worker selects answer nodes randomly to generate questions and counts the correctly and incorrectly answered questions in its decision process. Once these counts satisfy a stopping condition, or all answer nodes related to this worker have been explored, the worker gives its decision. If the counts satisfy the fraud condition, the worker’s decision is “Fraud” and the manager’s decision is “Fraud” too. Otherwise, the worker’s decision is “Non-Fraud” and the manager chooses a new worker to continue the dialogue. In the end, if all workers’ decisions are “Non-Fraud”, the manager’s decision is “Non-Fraud”.

5 Experiments and Results

5.1 User Simulator and Human Experiments

Simulating users’ behavior is an efficient way to evaluate dialogue systems. In our task, the applicants’ behavior is answering derived questions. Thus, the key to the user simulator is to estimate the probability that the applicant knows the triplet fact behind a question, where knowing the fact is modeled as a binary random variable.

Intuitively, this probability depends on three factors. First, it is greater if the applicant’s identity state is “Non-Fraud” than if it is “Fraud”. Second, the wider a triplet fact spreads on the internet, the more likely applicants are to know it. For example, almost all applicants know (Baidu, Founder, Robin Li) because there are a lot of web pages containing this fact on the internet. Third, if applicants know other triplets that are related to a triplet, they may well know that triplet too, because it is easy to deduce from what they know. For example, if applicants know (Nanjing University, Park, Gulou Park) and (Gulou Park, SubwayStation, Gulou Subway Station), they may well know (Nanjing University, SubwayStation, Gulou Subway Station).

To formalize the effect of the three factors on applicants’ behavior, we introduce three hypotheses: (1) For both fraudsters and normal applicants, the probability of knowing a triplet is proportional to the “spread degree” of the triplet. (2) The “spread degree” of a triplet can be approximated by the number of results retrieved by a search engine when the keyword is the head entity and the tail entity of the triplet. (3) For any three triplets, if they form a closed loop (regardless of directions) and applicants know two of them, the applicants must know all of them.

To generate simulated loan applicants, we first estimate the functional relation between the probability of knowing a triplet and its number of retrieved results via human experiments. Specifically, we ask 31 volunteers to answer derived questions (1516 in all) about their own and others’ personal information. Then, we place each question into a discrete bin according to the logarithm of the number of retrieved results for its triplet. In each bin, we use the ratio of correctly answered questions to approximate the probability. The resulting relations are shown in Fig. 4. The statistical distributions of the real behavior patterns of normal applicants and fraudsters are distinguishable, and the results agree with our first two intuitions.
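The binning estimate can be illustrated with a few lines of Python. The base-10 logarithm, the unit bin width, the function name and the sample records are assumptions for this sketch; the source only states that questions are binned by the logarithm of the number of retrieved results.

```python
import math
from collections import defaultdict


def estimate_know_probability(records):
    """records: iterable of (hit_count, answered_correctly) pairs collected
    for one kind of applicant. Returns {log-bin: ratio of correct answers}."""
    totals, correct = defaultdict(int), defaultdict(int)
    for hit_count, answered_correctly in records:
        log_bin = int(math.log10(max(hit_count, 1)))
        totals[log_bin] += 1
        correct[log_bin] += int(answered_correctly)
    return {b: correct[b] / totals[b] for b in sorted(totals)}
```

Running the function separately on the answers about volunteers' own information and on the answers about others' information would give one curve per kind of applicant, analogous to the two curves in Fig. 4.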

Figure 4: The relations between the number of retrieved results and the probability of knowing a triplet for two kinds of applicants. The number of retrieved results is used to approximate the “spread degree” of a triplet.
Figure 5: Performance of different systems. Tested on 10 epochs using the best model during training.

Then, we generate simulated loan applicants in a “sampling and calibration” manner. (Note that we can generate any number of simulated applicants based on one applicant’s personal information.) Specifically, given an applicant’s personal information, we first sample the identity state randomly. If the sampling result is “Fraud”, we sample information item(s) randomly to be the fake information. Generally, forging information about “School” and “Company” may result in a larger loan. Thus, when sampling the fake information, the sampling probability of “School” and “Company” is twice the sampling probability of “Residence” and “BirthPlace”. Then, for each piece of personal information and each of its related triplets, we sample whether the applicant knows the triplet based on (1) whether the personal information is fake, (2) the number of retrieved results for the triplet and (3) the corresponding functional relation in Fig. 4. Because the sampling results are independent of each other, some of them may not satisfy the rule defined in our third hypothesis. If that happens, we calibrate them until all sampling results agree with the hypothesis. Finally, if the applicant knows the triplet behind a question, the applicant gives the correct answer. Otherwise, the applicant’s response is “D. I am not quite clear.”
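The sketch below illustrates the two less obvious steps of this procedure: the weighted sampling of the fake information item and the calibration against the closed-loop hypothesis. Resolving a violation by marking the third triplet of a loop as known is an assumption (the text does not say in which direction the calibration goes), and the function names are hypothetical.

```python
import random
from itertools import combinations

# "School" and "Company" are sampled twice as often as the other two items.
FAKE_WEIGHTS = {"School": 2, "Company": 2, "Residence": 1, "BirthPlace": 1}


def sample_fake_item(rng):
    """Sample which information item is forged for a simulated fraudster."""
    items = list(FAKE_WEIGHTS)
    return rng.choices(items, weights=[FAKE_WEIGHTS[i] for i in items], k=1)[0]


def closed_loops(triplets):
    """Index triples of (head, relation, tail) triplets whose entities form
    a triangle, ignoring edge directions."""
    loops = []
    for combo in combinations(range(len(triplets)), 3):
        pairs = {frozenset((triplets[i][0], triplets[i][2])) for i in combo}
        entities = set().union(*pairs)
        if (len(pairs) == 3 and len(entities) == 3
                and all(len(p) == 2 for p in pairs)):
            loops.append(combo)
    return loops


def calibrate(triplets, known):
    """Repeat until no closed loop has exactly two known triplets.
    `known` is a set of indices into `triplets`."""
    known = set(known)
    loops = closed_loops(triplets)
    changed = True
    while changed:
        changed = False
        for loop in loops:
            if len(known.intersection(loop)) == 2:
                known.update(loop)  # assumed fix: the third one becomes known
                changed = True
    return known
```

The loop runs to a fixed point because completing one closed loop can make another loop violate the hypothesis.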

5.2 Baselines

We compare our model (denoted as Full-S) with two rule-based baselines. In addition, to study the effect of message passing and hierarchical policy on the model training, we compare Full-S with two neural baselines for the ablation study.

  • Flat Rule: The system selects 10 questions randomly to ask applicants. If the number of correctly answered questions is fewer than the number of incorrectly answered questions, the system’s decision will be “Fraud”. Otherwise, the system’s decision will be “Non-Fraud”.

  • Hierarchical Rule: A rule-based system which uses a hand-crafted hierarchical policy to unfold dialogues. As shown in Section 4.3, we use this system to pre-train Full-S.

  • MP-S: A neural dialogue system which uses message passing to infer dialogue states but uses a flat policy to unfold dialogues. That is, the manager selects answer nodes directly to generate derived questions.

  • HP-S: A neural dialogue system which uses the hierarchical policy to unfold dialogues but does not use message passing to infer dialogue states. That is, the iteration depth is 0 in Eq. 2.
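The simplest baseline, Flat Rule, can be written down directly from its description. In this sketch the `answers_correctly` callback stands in for the applicant (or the user simulator) and is a hypothetical interface, as is the function name.

```python
import random


def flat_rule(questions, answers_correctly, num_questions=10, rng=None):
    """Ask randomly selected questions and decide "Fraud" if fewer questions
    are answered correctly than incorrectly, otherwise "Non-Fraud"."""
    rng = rng or random.Random()
    asked = rng.sample(questions, min(num_questions, len(questions)))
    correct = sum(1 for q in asked if answers_correctly(q))
    incorrect = len(asked) - correct
    return "Fraud" if correct < incorrect else "Non-Fraud"
```

Note that a tie between correct and incorrect answers is resolved as “Non-Fraud”, since the rule requires strictly fewer correct answers for a “Fraud” decision.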

5.3 Implementation Details

We collect 906 applicants’ personal information, and randomly select 706 for training, 100 for dev, and 100 for test. In each batch, we sample 32 applicants’ information for simulation. The maximum numbers of interaction turns of the system and the worker are 40 and 10 respectively. The iteration depth is 2 in message passing. In the reward function, , , , , . The discount factors are 0.999 and 0.99 for the manager and worker respectively. All neural dialogue systems are pre-trained with rule-based systems for 20 epochs. We pre-train MP-S with Flat Rule because they both use the flat policy. Besides, we pre-train HP-S and Full-S with Hierarchical Rule because they both use the hierarchical policy. In the RL stage, all neural dialogue systems are trained for 300 epochs. When testing, we repeat 10 epochs and take the average.

5.4 Test Performance

We compare Full-S with baselines in terms of two metrics: recognition accuracy and average turns.

Fig. 5 shows the test performance. We can see that the accuracy of Flat Rule is lower than that of Hierarchical Rule, and the accuracy of the data-driven counterpart of Flat Rule (MP-S) is just slightly higher than random guessing. This means that using the hierarchical policy to unfold dialogues is necessary for our task. Besides, HP-S achieves a higher accuracy than its rule-based counterpart (Hierarchical Rule) within far fewer turns, which shows that the data-driven system is more efficient than the rule-based system. Finally, equipped with message passing and hierarchical policy, Full-S achieves the best accuracy. It is interesting to note that Full-S requires more turns but achieves much higher accuracy than HP-S. One possible reason is that HP-S may easily become trapped in a local optimum without message passing to infer dialogue states.

Figure 6: Accuracy curves of different neural models on the dev set. The first “20 epochs” indicates the pre-training stage. The last “300 epochs” indicates the RL stage.

5.5 Ablation Study

To study the effect of message passing and hierarchical policy, we show the learning curves of three neural dialogue systems in Fig. 6. Each learning curve is averaged on 10 epochs.

We find that, compared with Full-S and HP-S, MP-S is unable to learn any useful dialogue policy during training. There are two reasons for this. First, the action space of the flat policy is too large, so MP-S suffers from the sparse reward and long horizon issues. Second, without explicitly modeling the logic relation between the manager and workers, MP-S is prone to errors. Besides, we can see that Full-S converges faster than HP-S in both the pre-training and the RL stages. This is because message passing can model structured information of the KG, and hence Full-S is more efficient in policy learning.

5.6 Manager’s Policy Analysis

To better understand the high-level dialogue policy, we analyze the manager’s behavior in Full-S.

Figure 7: Manager’s action probability curves. Each curve indicates the probability of selecting a piece of personal information to verify. For each curve, we take the average of all dialogues during testing.

First, we show the manager’s action probability curves in Fig. 7. We can see that selecting “School” and “Company” to verify personal information has a priority over “Residence” and “BirthPlace” in the first decision step. And in the following two decision steps, the probabilities of selecting “Residence” and “BirthPlace” will increase. This is because simulated applicants tend to forge personal information about “School” and “Company” for a larger loan. Consequently, to discover fake information faster, the manager learns to prioritize different information items.

Second, intuitively the manager’s policy should follow two logic rules in our task:

  • If a worker’s decision is “Fraud”, the dialogue should end immediately and the manager’s decision will be “Fraud”.

  • If all workers’ decisions are “Non-Fraud”, the manager’s decision will be “Non-Fraud”.

To test whether the manager follows the two rules, we calculate, on the test data, the probability of each manager decision given the corresponding worker decisions. The results show that the manager adopts workers’ suggestions in most situations.

All triplets that are related to “Shanghai Sports University” (replaced with $School$ for short):
($School$, SuperMarket, Educational Supermarket)
($School$, PetMarket, Seasons Garden)
($School$, LocatedIn, Shanghai)
($School$, FoundedDate, 2002)
($School$, DigitalMall, JinLu Security)
($School$, FruitShop, Xiao Liu Fruit)
($School$, ConvenienceStore, HaoDe)
(Xiao Liu Fruit, ConvenienceStore, HaoDe)
(HaoDe, FruitShop, Xiao Liu Fruit)

HP-S
System: Which is the nearest pet market to $School$?
Applicant: I am not quite clear.
System: Which is the nearest digital mall to $School$?
Applicant: I am not quite clear.
System: Where is $School$ located?
Applicant: Shanghai.
Decision: Fraud (Wrong)

Full-S
System: Which is the nearest pet market to $School$?
Applicant: I am not quite clear.
System: Which is the nearest digital mall to $School$?
Applicant: I am not quite clear.
System: Which is the nearest fruit shop to $School$?
Applicant: Xiao Liu Fruit
System: Which is the nearest supermarket to $School$?
Applicant: Educational Supermarket
System: When was $School$ founded?
Applicant: 2002
Decision: Non-Fraud (Correct)

Table 1: Examples of the low-level policies in two systems. Note that the information about “School” is not fake.

Meanwhile, we study cases where the manager does not follow the two rules and find some interesting phenomena. Specifically, if only one worker’s decision is “Fraud” and the applicant can answer a few questions given by this worker, the manager’s decision may be “Non-Fraud”. Besides, if all workers’ decisions are “Non-Fraud” but the applicant cannot answer most of the questions given by one worker, the manager’s decision may still be “Fraud”. In these two cases, the worker may have made the wrong decision, yet the manager can still give the correct decision. This means the manager is robust to workers’ mistakes.

5.7 Worker’s Policy Analysis

To better understand the low-level dialogue policy and the effect of message passing on it, we compare workers’ behaviors in HP-S and Full-S.

Table 1 shows an example of verifying personal information about “School” in HP-S and Full-S. We can see that the two systems give the same two questions in the first two turns. This is because the triplets behind the two questions are rarely known to fraudsters. It means that the low-level policies learn to give priority to such triplets for better distinguishing fraudsters from normal applicants. In the third turn, HP-S gives a question that is easy for fraudsters to answer and makes the wrong decision. However, Full-S notices that the applicant gives the correct answer to a question that is hard for fraudsters to answer. Thus, Full-S does not make the decision in haste but continues the dialogue. Besides, it is worth noting that Full-S has not chosen ($School$, ConvenienceStore, HaoDe) to generate a derived question. This is because the message passing mechanism models the relation between “HaoDe” and “Xiao Liu Fruit”. Specifically, because the two entities are closely related to each other, if applicants know “Xiao Liu Fruit”, they may well know “HaoDe”. Thus, there is no need to select this triplet anymore.

6 Related work

As far as we know, there is no published work about detecting identity fraud via interactions. We describe the two most related directions as follows:

Deception Detection.

Detecting deception is a longstanding research goal in many artificial intelligence topics. Existing work has mainly focused on extracting useful features from non-verbal behaviors (meservy2005deception; lu2005blob; bhaskaran2011lie), speech cues (levitan2018acoustic; graciarena2006combining) or both (krishnamurthy2018deep; perez2015verbal) to train a classification model. In this line of work, deception is defined as telling a lie. Besides, existing work requires labeled data, which is often hard to get. In contrast, we focus on detecting identity fraud through multi-turn interactions and use reinforcement learning to explore the anti-fraud policy without any labeled data.

Dialogue System.

Our work is also related to task-oriented dialogue systems (young2013pomdp; wen2017network; li2017end; gavsic2011line; wang-etal-2018-teacher; wang-etal-2019-incremental). Existing systems have mainly focused on slot-filling tasks (e.g., booking a hotel). In such tasks, a set of system actions can be pre-defined based on the business logic and slots. In contrast, the system actions in our task are selecting nodes in the KG to generate questions. Thus, the structured information is important in our task. Besides, some works also try to model structured information in dialogue systems. For example, peng2017composite used hierarchical reinforcement learning (vezhnevets2017feudal; kulkarni2016hierarchical; florensa2017stochastic) to design multi-domain dialogue management. chen2018structured used graph neural networks (battaglia2018relational; li2015gated; scarselli2009graph; niepert2016learning) to improve the sample-efficiency of reinforcement learning. he-etal-2017-learning used DynoNet to incorporate structured information in the collaborative dialogue setting. Compared with them, our method combines graph neural networks and hierarchical reinforcement learning, and experiments show that both contribute to this novel dialogue task.

7 Conclusion

This paper proposes to detect identity fraud automatically via dialogue interactions. To achieve this goal, we present structured dialogue management to explore anti-fraud dialogue strategies based on a KG with reinforcement learning and a heuristic user simulator to evaluate our systems. Experiments have shown that end-to-end systems outperform rule-based systems and the proposed dialogue management can learn interpretable and flexible dialogue strategies to detect identity fraud more efficiently. We believe that this work is a basic first step in this promising research direction and will help promote many real-world applications.

8 Acknowledgments

The research work described in this paper has been supported by the National Key Research and Development Program of China under Grant No. 2017YFB1002103. We would like to thank Shaonan Wang, Yang Zhao, Haitao Lin, Cong Ma, Lu Xiang and Junnan Zhu for their suggestions on this paper and Fenglv Lin for his help in the POI dataset construction.