DCDIR: A Deep Cross-Domain Recommendation System for Cold Start Users in Insurance Domain

07/27/2020 ∙ by Ye Bi, et al. ∙ Ping An Bank ∙ NetEase, Inc

Internet insurance products differ markedly from traditional e-commerce goods in their complexity, low purchasing frequency, etc., so the cold start problem is even worse. In the traditional e-commerce field, several cross-domain recommendation (CDR) methods have been studied to infer the preferences of cold start users from their preferences in other domains. However, these CDR methods cannot be applied to the insurance domain directly because of product complexity. In this paper, we propose a Deep Cross-Domain Insurance Recommendation System (DCDIR) for cold start users. Specifically, we first learn more effective user and item latent features in both domains. In the target domain, given the complexity of insurance products, we design a meta-path based method over an insurance product knowledge graph. In the source domain, we employ a GRU to model users’ dynamic interests. Then we learn a feature mapping function with a multi-layer perceptron (MLP). We apply DCDIR to our company’s datasets and show that DCDIR significantly outperforms the state-of-the-art solutions.




1. Introduction

Nowadays, internet finance is booming and rapidly penetrating all kinds of traditional financial fields. Internet insurance fits this trend of the internet age, since it can not only overcome the limitations of in-person sales and geography, but also provide savings for both companies and their consumers. Due to the nature of the insurance industry, the products that insurance companies provide on the internet typically have the following characteristics: 1) the coverage period is no more than 1 year; 2) the prices are lower than those of long-term insurance; 3) they cover a wide range, including property and casualty, etc.; 4) customers are not required to have bought other insurance products beforehand. However, recommending insurance products online is challenging. First, insurance policies are so complex that ordinary users often lack the knowledge to understand them. Second, insurance products are typically bought to be used for a long period (e.g., one year for car insurance), so data sparsity and the cold start problem arise. Researchers have tried to address the problem with recommendation systems (RS) (Rokach et al., 2013; Qazi et al., 2017; Liu et al., 2019); however, these methods directly apply traditional RS models to the insurance domain, neglecting item complexity and data sparsity.

PingAn Jinguanjia (PAJGJ) is one of the most popular comprehensive applications (APPs) in China. In addition to traditional e-commerce products (defined as nonfinancial products in this paper), e.g., household supplies, it also provides financial products such as insurance products and investment services. Here we focus on recommending insurance products. Traditional RS methods such as collaborative filtering (CF) do not perform effectively in the insurance domain because of its particular characteristics. To obtain more accurate recommendations, our company has tried to use side information from PAJGJ (interaction behaviors in the nonfinancial domain), but to little avail.

Cross-domain recommendation (CDR) (Man et al., 2017; Ma et al., 2019; Kang et al., 2019), which employs data from multiple domains, is one of the promising ways to address data sparsity and the cold start problem. Generally, CDR methods fall into two categories. The first aims at improving the overall performance in the target domain by aggregating knowledge between the two domains (Ma et al., 2019). The second aims at inferring the preferences of cold start users from their preferences observed in other domains (Man et al., 2017; Kang et al., 2019). These methods assume that there is overlap in information between users and/or items across different domains, and train a mapping function from the source domain to the target domain. Unfortunately, existing CDR methods cannot be applied directly to the insurance and nonfinancial domains because of the characteristics of insurance products described above.

Based on these observations, we propose a novel framework called the Deep Cross-Domain Insurance Recommendation System (DCDIR) for cold start users. Specifically, we first learn more effective user and item latent features in both the source and target domains. In the target domain, given the complexity of insurance products, we design a meta-path based method over the knowledge graph we constructed. In the source domain, we employ a gated recurrent unit (GRU) to model users’ dynamic interests. After obtaining the latent features of the overlapping users, we learn a feature mapping function between the two domains with a multi-layer perceptron (MLP).

In summary, our contributions in this paper are as follows:

  • To the best of our knowledge, this is the first work to utilize cross-domain mechanism to give personalized recommendations for cold start users in insurance domain.

  • To handle the complexity of insurance products, we design a meta-path based method that learns more effective latent user and item features and reveals the reasons behind recommendations.

  • We conduct experiments on our company’s scenarios; the results demonstrate the efficacy of DCDIR over several baselines.

2. Problem Formulation

We consider the set of users who overlap between the nonfinancial domain (source domain) and the insurance domain (target domain). A user who appears in only one domain is a cold start user in the other domain. Each domain has a user-item interaction matrix, defined according to users’ implicit feedback. For each user, we additionally keep the sequence of items that the user has interacted with in each domain. We also have an insurance knowledge graph (ISKG), which consists of multiple entity types (i.e., Product, Feature, Need) and many entity-relation-entity triples. For example, (travel accident insurance, insurance.product-insurance.type, accident insurance) states that the type of “travel accident insurance” is accident insurance. Given the interaction matrices and ISKG, our goal is to learn the mapping function from the nonfinancial domain to the insurance domain, which helps us deal with cold start users.

3. DCDIR

To provide recommendations to cold start users, we propose DCDIR. As shown in Figure 1, DCDIR contains three main parts: learning user latent features in the target domain, learning user latent features in the source domain, and mapping user latent features between the two domains.

Figure 1. The Framework of DCDIR

3.1. Latent Feature in Target Domain

As mentioned above, insurance products are complex, and understanding them may require considerable cognitive load (Rokach et al., 2013). To help users better understand insurance products, we design a meta-path based method. Figure 2 shows the framework. We first pretrain the KG with TransD (Ji et al., 2015) to obtain entity and relation embeddings. Then, we generate meta-paths connecting the user’s interacted items and the target item. To select high-quality meta-paths, we design a score function. Finally, we use a GRU to model each meta-path and employ max-pooling to aggregate the selected paths.

Figure 2. Meta-Path based ISKG Module

3.1.1. Path Generation

The triples in the KG describe relational properties of items, which constitute several paths between the user’s interacted items and the target item. For a given user , we formally define the path from to target item as a sequence of entities and relations: , where , , is the -th triple in , and denotes the number of triples in the path. We use to denote all generated paths of . From the construction of ISKG, we know that relation and entity have similar semantics, so the embedding of is denoted as . Because long meta-paths are likely to introduce noisy semantics (Sun et al., 2011), we design two meta-paths based on our scenario, in which we fix the entity types and the path length. They are represented as and , where , and denote “Product”, “Need” and “Feature”. Here are two examples.

where is critical illness insurance, is health insurance, and are accident insurances.
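As a rough illustration of path generation, the sketch below enumerates two-hop path instances between a user's interacted products and a target product in a toy triple store. The triples, the relation names `covers_need` and `has_feature`, and the reading of the two meta-paths as Product-Need-Product and Product-Feature-Product are all assumptions made for illustration; none of them is taken from ISKG.

```python
from collections import defaultdict

# Toy triples (head, relation, tail); all names are hypothetical, not from ISKG.
TRIPLES = [
    ("critical_illness_ins", "covers_need", "health_protection"),
    ("health_ins", "covers_need", "health_protection"),
    ("travel_accident_ins", "has_feature", "short_term"),
    ("traffic_accident_ins", "has_feature", "short_term"),
    ("health_ins", "has_feature", "short_term"),
]


def build_index(triples):
    """Map each product to {relation: set of neighbouring entities}."""
    index = defaultdict(lambda: defaultdict(set))
    for head, rel, tail in triples:
        index[head][rel].add(tail)
    return index


def two_hop_paths(index, interacted, target, relation):
    """Return Product -> (Need | Feature) -> Product instances ending at target."""
    paths = []
    for item in interacted:
        if item == target:
            continue
        # Entities reachable from both the interacted item and the target.
        shared = index[item][relation] & index[target][relation]
        for mid in sorted(shared):
            paths.append((item, relation, mid, relation, target))
    return paths


index = build_index(TRIPLES)
history = ["critical_illness_ins", "travel_accident_ins"]
pnp = two_hop_paths(index, history, "health_ins", "covers_need")
pfp = two_hop_paths(index, history, "health_ins", "has_feature")
print(pnp)
print(pfp)
```

Fixing the entity types and the path length, as in this sketch, keeps the number of candidate paths small before any scoring is applied.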

3.1.2. Sampling Top-K High Quality Path Instances

Even with the path structure fixed, there are still many meta-path instances. Some of them bring much more noise than useful signal, so we use a top-K sampling module to select useful paths. Specifically, for a given path, we define a score function:


where is ’s position in . The first part of (1) measures interaction time, since more recent items in a sequence have a larger impact on users’ next actions. The second part measures the similarity between the path and the target item. For each user, we select the top-K paths with the highest scores, where K is a given parameter.
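The selection step can be sketched as follows. The exact form of the score is an assumption here: the sketch simply adds a recency term (the normalized position of the path's starting item in the user's sequence) to the cosine similarity between a path vector and the target item vector. The function names and toy vectors are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def path_score(position, seq_len, path_vec, target_vec):
    """Hypothetical score: recency of the start item plus path/target similarity."""
    recency = position / seq_len  # later (more recent) positions score higher
    return recency + cosine(path_vec, target_vec)


def top_k_paths(candidates, target_vec, seq_len, k):
    """candidates: list of (path_id, position_of_start_item, path_vector)."""
    return heapq.nlargest(
        k,
        candidates,
        key=lambda c: path_score(c[1], seq_len, c[2], target_vec),
    )


target = [1.0, 0.0]
candidates = [
    ("old_similar", 1, [1.0, 0.0]),
    ("new_similar", 3, [1.0, 0.0]),
    ("new_unrelated", 3, [0.0, 1.0]),
]
selected = [c[0] for c in top_k_paths(candidates, target, seq_len=3, k=2)]
print(selected)
```

In this toy case the recent and similar path ranks first, and an older but similar path still beats a recent path that is unrelated to the target item.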

3.1.3. Path Embedding and User Feature Representation

A path instance is a sequence of entities. To embed such a sequence into a low-dimensional vector, we use a GRU (Hidasi et al., 2016). The formulations are:



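where the gate activation is the sigmoid function, the product is element-wise, and the dimensions of the weight matrices are determined by the embedding size and the hidden size. Each selected path is encoded by the GRU, and max-pooling is applied over the resulting path embeddings to obtain the user latent feature.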
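A minimal pure-Python sketch of this step is given below. It assumes one common bias-free GRU convention (update gate, reset gate, tanh candidate state), takes the final hidden state as the path embedding, and applies element-wise max-pooling over paths. The class name `GRUCell`, the random weights and the toy dimensions are illustrative assumptions, not the configuration used in the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU cell without biases (one common convention, assumed here)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # W* act on the input, U* act on the previous hidden state.
        self.Wz, self.Wr, self.Wh = (rand_matrix(hidden_size, input_size, rng) for _ in range(3))
        self.Uz, self.Ur, self.Uh = (rand_matrix(hidden_size, hidden_size, rng) for _ in range(3))

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]  # element-wise product
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def encode(self, sequence):
        """Run the cell over a sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return h


def max_pool(vectors):
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(col) for col in zip(*vectors)]


# Two hypothetical path instances, each a sequence of three 4-d entity embeddings.
rng = random.Random(1)
paths = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)] for _ in range(2)]
gru = GRUCell(input_size=4, hidden_size=5)
path_embeddings = [gru.encode(p) for p in paths]
user_feature = max_pool(path_embeddings)
print(user_feature)
```

Max-pooling keeps, for each dimension, the strongest signal among the selected paths, so the pooled vector does not depend on the order in which paths are processed.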

3.2. Latent Feature in Source Domain

In our APP, each item in the nonfinancial domain is associated with a textual description. To learn more effective latent features, we employ word2vec (Mikolov et al., 2013) to obtain a vector for each word in the item’s description. The final item embedding is then obtained from these word vectors by:

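To model the user latent feature in the source domain, we apply a GRU over the embeddings of the items in the user’s interaction sequence; the equations are the same as eq. (2), with the path entity embeddings replaced by the item embeddings.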
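As a sketch of how an item embedding might be built from its description, the snippet below averages the vectors of the known words. Mean pooling is an assumption made for illustration, and the tiny word-vector table is a hypothetical stand-in for word2vec output.

```python
def mean_pool(word_vectors):
    """Average a non-empty list of equal-length word vectors."""
    n = len(word_vectors)
    return [sum(col) / n for col in zip(*word_vectors)]


# Hypothetical 3-dimensional word vectors standing in for word2vec output.
WORD_VECTORS = {
    "organic": [0.2, 0.0, 0.4],
    "green": [0.0, 0.6, 0.2],
    "tea": [0.4, 0.0, 0.0],
}


def embed_item(description, word_vectors):
    """Embed an item as the mean of the vectors of its known description words."""
    tokens = [t for t in description.lower().split() if t in word_vectors]
    if not tokens:
        raise ValueError("no known words in description")
    return mean_pool([word_vectors[t] for t in tokens])


item_vec = embed_item("Organic green tea", WORD_VECTORS)
print(item_vec)
```

The resulting item vectors would then be fed, in interaction order, to a sequence encoder such as the GRU described for the target domain.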

3.3. Mapping Function Between Two Domains

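We employ an MLP (Man et al., 2017) to learn the mapping function between the two domains, taking the source-domain latent feature of an overlapping user as input and the corresponding target-domain latent feature as output. The loss function is: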
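One way to train such a mapping can be sketched as follows, assuming a squared-error objective between the mapped source feature and the target feature, in the spirit of the embedding-and-mapping approach of Man et al. (2017). The class `MappingMLP`, the single tanh hidden layer, the learning rate and the toy feature pairs are all illustrative assumptions.

```python
import math
import random


class MappingMLP:
    """One-hidden-layer MLP mapping source-domain to target-domain user features."""

    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.W1, self.b1 = init(hidden_dim, in_dim), [0.0] * hidden_dim
        self.W2, self.b2 = init(out_dim, hidden_dim), [0.0] * out_dim

    def forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(self.W2, self.b2)]
        return h, y

    def sgd_step(self, x, target, lr):
        """One gradient step on the squared error ||f(x) - target||^2."""
        h, y = self.forward(x)
        g_y = [2.0 * (yi - ti) for yi, ti in zip(y, target)]
        # Back-propagate through the output layer before updating it.
        g_h = [sum(self.W2[i][j] * g_y[i] for i in range(len(g_y)))
               for j in range(len(h))]
        g_pre = [gh * (1.0 - hj * hj) for gh, hj in zip(g_h, h)]
        for i, gi in enumerate(g_y):
            for j, hj in enumerate(h):
                self.W2[i][j] -= lr * gi * hj
            self.b2[i] -= lr * gi
        for j, gj in enumerate(g_pre):
            for k, xk in enumerate(x):
                self.W1[j][k] -= lr * gj * xk
            self.b1[j] -= lr * gj


def mapping_loss(model, pairs):
    """Sum of squared errors between mapped source and target features."""
    return sum(sum((yi - ti) ** 2 for yi, ti in zip(model.forward(x)[1], t))
               for x, t in pairs)


# Hypothetical (source feature, target feature) pairs for overlapping users.
pairs = [([0.1, 0.9], [0.8, -0.2]), ([0.7, 0.2], [-0.1, 0.5]),
         ([0.4, 0.4], [0.3, 0.1]), ([0.9, 0.8], [0.2, 0.6])]
model = MappingMLP(in_dim=2, hidden_dim=8, out_dim=2)
loss_before = mapping_loss(model, pairs)
for _ in range(300):
    for x, t in pairs:
        model.sgd_step(x, t, lr=0.02)
loss_after = mapping_loss(model, pairs)
print(loss_before, loss_after)
```

Only overlapping users contribute training pairs, since they are the only users with latent features in both domains.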

3.4. Training

In the training process, the loss functions of all parts are added together for joint optimization. The overall loss function combines the mapping loss with the recommendation losses in the target and source domains. Take the target-domain recommendation loss as an example: it is defined using the sigmoid function and a ranking function, which can be a dot-product or a deep neural network.
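For illustration, a pairwise ranking loss built from a sigmoid and a dot-product ranking function can be sketched as below. The BPR-style form is an assumption: the text only states that the loss involves a sigmoid and a ranking function, so this is one plausible instance rather than the loss actually used.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pairwise_ranking_loss(user_vec, pos_item_vec, neg_item_vec):
    """-log(sigmoid(score_pos - score_neg)) with a dot-product ranking function."""
    diff = dot(user_vec, pos_item_vec) - dot(user_vec, neg_item_vec)
    # log(1 + exp(-diff)) equals -log(sigmoid(diff)).
    return math.log1p(math.exp(-diff))


user = [1.0, 0.0]
liked, unseen = [1.0, 0.0], [0.0, 1.0]
good_order = pairwise_ranking_loss(user, liked, unseen)
bad_order = pairwise_ranking_loss(user, unseen, liked)
print(good_order, bad_order)
```

The loss is small when the interacted item scores above the sampled negative and grows when the order is reversed; replacing `dot` with a neural scorer would leave the rest unchanged.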

3.5. Cross-Domain Recommendation

In this paper, we assume that cold start users have interactions in the nonfinancial domain but no interactions in the insurance domain. After learning a cold start user’s latent feature in the nonfinancial domain, we apply the mapping function to obtain the corresponding mapped latent feature in the insurance domain. Based on the mapped feature, we can make recommendations to cold start users.
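The inference path for a cold start user can be sketched as follows: map the source-domain feature, then rank insurance items by a dot-product score. The fixed `toy_mapping` function and the 2-d item vectors are hypothetical placeholders for a trained mapping function and learned item features.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recommend(source_feature, mapping, item_vectors, top_n=3):
    """Map a cold start user's source feature, then rank target items by dot-product."""
    mapped = mapping(source_feature)
    ranked = sorted(item_vectors, key=lambda name: dot(mapped, item_vectors[name]), reverse=True)
    return ranked[:top_n]


def toy_mapping(v):
    """Hypothetical stand-in for the learned mapping function."""
    return [v[1], v[0]]


# Hypothetical 2-d insurance item embeddings.
items = {
    "accident_ins": [1.0, 0.0],
    "health_ins": [0.0, 1.0],
    "travel_ins": [0.7, 0.7],
}
top2 = recommend([0.9, 0.1], toy_mapping, items, top_n=2)
print(top2)
```

No target-domain interaction of the user is needed at inference time; only the source-domain feature and the mapping are used.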

4. Experiment

We conduct extensive experiments to answer the following questions: RQ1: How does DCDIR model perform compared with baselines in terms of NDCG and Recall@3? RQ2: Can DCDIR alleviate the data sparsity problem? RQ3: How does path-based ISKG module affect the performance of DCDIR for cold start users?

Statistic             IS-domain (Target domain)   NF-domain (Source domain)
Items                 42                          3,836
Interactions          300,000                     600,000
KG relations          7                           -
KG entities           77                          -
KG triples            282                         -

Shared by both domains
Overlapped-users      21,016
Training-sequences    12,437
Test-sequences        4,218
Validation-sequences  4,298
Table 1. Statistics of the JGJISNF dataset.

4.1. Experimental Settings

Datasets. There is no publicly available dataset for CDR-ISNF (cross-domain recommendation for insurance and nonfinancial products). To demonstrate the overall effectiveness of the proposed DCDIR model, we build and release a sub-dataset (named JGJISNF) from a comprehensive e-commerce dataset that contains the logs of about 20 million users from June 1st, 2018 to May 31st, 2019. The logs are collected in the IS-domain and NF-domain of PAJGJ, a well-known e-commerce platform. The IS-domain contains interactions with short-term insurance products (with coverage periods of less than 1 year), e.g., illness insurance and accident insurance. The NF-domain contains user logs on nonfinancial products (daily necessities, e.g., clothes, skincare products, fruits, electronics, etc.). In both domains, we gather chronological user behaviors, user profiles and detailed product descriptions. Due to the complexity of insurance products, we construct a knowledge graph of insurance products based on their own information.

Sparsity level  10                                  20                50                100
Method          NDCG              Recall@3          NDCG     Recall@3 NDCG     Recall@3 NDCG     Recall@3
BPR             0.27011           0.06418           0.27105  0.06518  0.27133  0.06451  0.27325  0.07124
GRU4REC         0.23923           0.02143           0.25964  0.07768  0.30725  0.09611  0.30623  0.08602
EMCDR-BPR       0.27343           0.07291           0.27342  0.07291  0.27342  0.07325  0.27347  0.07325
EMCDR-GRU       0.26775           0.11794           0.26801  0.11794  0.29056  0.11996  0.31288  0.12298
DCDIR-V1        0.34781 (-4.66)   0.17321 (-6.28)   0.35196  0.18016  0.35653  0.18078  0.36481  0.18481
DCDIR-V2        0.36278 (-13.05)  0.19159 (-27.60)  0.37021  0.19388  0.40273  0.24504  0.40925  0.26461
DCDIR           0.39394 (-3.95)   0.25185 (-5.31)   0.39741  0.25227  0.40773  0.26268  0.41016  0.26597
DCDIR vs. best  8.59              26.23             7.35     24.96    1.24     7.20     0.22     0.51
Table 2. Performance comparison in Recall@3 and NDCG. The best baseline except DCDIR is bolded. Numbers in “()” represent the percentage change of each of the three variants’ performance at level 10 relative to its best performance at the other sparsity levels.

Comparative Models and Metrics. We compare DCDIR with four baselines and two variants of DCDIR. The compared models can be categorized into a single-domain group (BPR (Rendle et al., 2009) and GRU4REC (Hidasi et al., 2016)) and a cross-domain group (EMCDR-BPR (Man et al., 2017), EMCDR-GRU, DCDIR, DCDIR-V1 and DCDIR-V2). The first group is used to validate the usefulness of CDR models, and the second group to demonstrate the advantage of the path-based method. DCDIR leverages the path-based method to deal with the complex knowledge graph of insurance products, while DCDIR-V1 uses only simple product attributes and DCDIR-V2 uses a KGE method (2-hop entity aggregation over ISKG).

We evaluate all models in terms of Recall@N (N=3) and NDCG. To avoid heavy computation on evaluating all user-item pairs, we adopt a common and widely used strategy (He et al., 2017; Wang et al., 2019b, a): for each user, we randomly sample negative items that do not appear in the training set and rank them together with the single ground-truth item.
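With a single ground-truth item per user, both metrics reduce to simple functions of the rank of that item among the sampled candidates. The sketch below assumes NDCG is computed over the full candidate list without a cutoff; the helper names and toy scores are illustrative.

```python
import math


def rank_of_truth(scores, truth):
    """1-based rank of the ground-truth item among the scored candidates."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(truth) + 1


def recall_at_n(rank, n=3):
    """1 if the single ground-truth item appears in the top n, else 0."""
    return 1.0 if rank <= n else 0.0


def ndcg(rank):
    """NDCG with one relevant item: the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1)


# Hypothetical model scores for the ground-truth item and three sampled negatives.
scores = {"truth": 0.8, "neg1": 0.9, "neg2": 0.3, "neg3": 0.1}
rank = rank_of_truth(scores, "truth")
print(rank, recall_at_n(rank), ndcg(rank))
```

Per-user values are then averaged over all test users to obtain the reported Recall@3 and NDCG.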

Parameter Setting. We randomly select 30% of the overlapped users and remove their information in the target domain, so that they serve as cold start users for evaluation (i.e., test users). To study the performance of DCDIR with respect to the number of overlapped users, we restrict the number of overlapped users in a way similar to the real-world distribution: we build four training sets, each with a certain fraction of the overlapped users who do not belong to the test users. Hyper-parameters are chosen with grid search on the validation set. The item embedding size and the GRU hidden state size are set to 50. We also use dropout. For the parameters in Section 3.1.2 (the path-based method), we try different settings, which are analyzed in Section 4.3. For the Adam optimizer, we set the learning rate to 0.001. To speed up training and convergence, the batch size is set to 32. We evaluate the model on the validation set after every epoch.

4.2. Performance Comparison (RQ1 and RQ2)

To answer RQ1 and RQ2, three variants of DCDIR are compared with four state-of-the-art models under different data densities. Table 2 shows the performance comparison. Overall, benefiting from the proposed KG path-based representations of insurance products and from source domain information, DCDIR beats all comparative methods, with improvements over the best comparative model of 0.22% to 8.59% in NDCG and 0.51% to 26.23% in Recall@3 across all levels of data sparsity. These experiments reveal several findings: (1) All cross-domain methods yield better performance than single-domain methods by using a mixture of target and source domain data, demonstrating the importance of the cross-domain module; (2) Owing to their ability to use knowledge about insurance products, the three variants of DCDIR (DCDIR, DCDIR-V1 and DCDIR-V2) outperform the other comparative methods; (3) DCDIR achieves larger improvements on a sparser dataset than on a denser one, which indicates that, compared with the other approaches, DCDIR can better mitigate the negative impact of data sparsity. We also compare DCDIR with DCDIR-V1 and DCDIR-V2 (defined in Section 4.1). The numbers in “( )” show that the performance of DCDIR-V2, which uses the KGE method, declines sharply in terms of NDCG (-13.05%) and Recall@3 (-27.60%) on the sparsest dataset, while DCDIR-V1 cannot outperform DCDIR at any level of sparsity. This shows that DCDIR achieves more stable and better performance with limited data.

ISKG module parameter  Value     NDCG     Recall@3
pathnum                10        0.36611  0.18207
pathnum                20        0.39394  0.25185
pathnum                30        0.38435  0.18541
pathstrategy           ‘topk’    0.39394  0.25185
pathstrategy           ‘random’  0.34624  0.16065
Table 3. Performance comparison in Recall@3 and NDCG under a sparse setting (level 10) with varying path number and path selection strategy.

4.3. Impact of the Meta-Path Based ISKG Module on Cold Start Users (RQ3)

The cold start problem is one of the major challenges for RS, so it is necessary to study whether our meta-path based ISKG module can deal with cold start users effectively. We therefore compare DCDIR under different values of two parameters, the number of selected paths and the strategy for choosing high-quality paths, on the sparsest dataset (level 10), with the training, test and validation splits described above. Table 3 indicates that, under the cold start setting, the best ISKG module parameters in terms of both Recall@3 and NDCG are a path number of 20 and our top-K strategy for choosing paths. In particular, the path strategy affects the performance of DCDIR significantly: replacing random selection with the top-K strategy raises NDCG from 0.34624 to 0.39394 and Recall@3 from 0.16065 to 0.25185. The top-K strategy optimizes the choice of high-quality KG paths of insurance products, leveraging their rich and complicated information while limiting interfering information. Therefore, DCDIR can better handle cold start users.

5. Conclusion

To deal with insurance product complexity and the cold start problem, we propose DCDIR for cold start users. Specifically, we first learn more effective user and item latent features in the two domains. In the target domain, given the complexity of insurance products, we design a meta-path based method over an insurance product knowledge graph, which can provide interpretable recommendations to users. In the source domain, we employ a GRU to model users’ dynamic interests. Then we learn a feature mapping function with a multi-layer perceptron. We apply DCDIR to our company’s dataset and show that DCDIR significantly outperforms the state-of-the-art solutions.


References

  • X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua (2017) Neural collaborative filtering. In WWW, pp. 173–182.
  • B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk (2016) Session-based recommendations with recurrent neural networks. In ICLR.
  • G. Ji, S. He, L. Xu, K. Liu, and J. Zhao (2015) Knowledge graph embedding via dynamic mapping matrix. In ACL, pp. 687–696.
  • S. K. Kang, J. Hwang, D. Lee, and H. Yu (2019) Semi-supervised learning for cross-domain recommendation to cold-start users. In CIKM, pp. 1563–1572.
  • Z. Liu, C. Zang, K. Kuang, H. Zou, H. Zheng, and P. Cui (2019) Causation-driven visualizations for insurance recommendation. In ICME Workshops, pp. 471–476.
  • M. Ma, P. Ren, Y. Lin, Z. Chen, J. Ma, and M. de Rijke (2019) π-Net: A parallel information-sharing network for shared-account cross-domain sequential recommendations. In SIGIR, pp. 685–694.
  • T. Man, H. Shen, X. Jin, and X. Cheng (2017) Cross-domain recommendation: an embedding and mapping approach. In IJCAI, pp. 2464–2470.
  • T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In NIPS.
  • M. Qazi, G. M. Fung, K. J. Meissner, and E. R. Fontes (2017) An insurance recommendation system using Bayesian networks. In RecSys, pp. 274–278.
  • S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme (2009) BPR: Bayesian personalized ranking from implicit feedback. In UAI, pp. 452–461.
  • L. Rokach, G. Shani, B. Shapira, E. Chapnik, and G. Siboni (2013) Recommending insurance riders. In SAC, pp. 253–260.
  • Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu (2011) PathSim: meta path-based top-k similarity search in heterogeneous information networks. In PVLDB.
  • X. Wang, X. He, Y. Cao, M. Liu, and T. Chua (2019a) KGAT: knowledge graph attention network for recommendation. In SIGKDD, pp. 950–958.
  • X. Wang, D. Wang, C. Xu, X. He, Y. Cao, and T. Chua (2019b) Explainable reasoning over knowledge graphs for recommendation. In AAAI, pp. 5329–5336.