Relation-aware Heterogeneous Graph for User Profiling

by Qilong Yan, et al.

User profiling has long been an important problem that investigates user interests in many real applications. Some recent works regard users and the objects they interact with as entities of a graph and turn the problem into a node classification task. However, they neglect the differences between distinct interaction types, e.g. a user clicking an item vs. a user purchasing an item, and thus cannot incorporate such information well. To solve these issues, we propose to leverage the relation-aware heterogeneous graph method for user profiling, which also allows capturing significant meta relations. We adopt the query, key, and value mechanism in a transformer fashion for heterogeneous message passing so that entities can effectively interact with each other. Via such interactions on different relation types, our model can generate representations with rich information for the user profile prediction. We conduct experiments on two real-world e-commerce datasets and observe a significant performance boost of our approach.




1. Introduction

Nowadays, users of all kinds of applications produce an ocean of data as they browse the internet. Such data potentially contain valuable information, such as users’ interests, traits, and behaviour patterns, for providing them with personalised services. The scenario is typically named user profiling. For example, Rao et al. (2010) first proposed to predict a user’s gender, age, area, and political tendency by modelling their tweets from social networks. Recent works expand the task to a broader scope, including occupation (Zhao et al., 2019), geolocation (Rahimi et al., 2018), ideology (Xiao et al., 2020), and race (Preoţiuc-Pietro and Ungar, 2018).

Figure 1. The heterogeneous graph with multiple types of entities and relations for user profiling. Best viewed in colour.

In user profiling, an intuitive way is to model the user’s interaction behaviour with graphs. Despite the success of traditional deep learning approaches (Rao et al., 2010; Nguyen et al., 2013; Farnadi et al., 2018), graph methods have shown their advantages in modelling non-Euclidean relations in such tasks (Wu et al., 2019a; Yu et al., 2020; Zhang et al., 2020b, a; Li et al., 2020; Zhang et al., 2021b, a). Rahimi et al. (2018), Chen et al. (2019), and Xiao et al. (2020) represent users linked by co-relations (like co-purchase in e-commerce) as entities of a graph and hierarchically propagate the heterogeneous information up from the attributes with graph attention networks. In addition to the interactions, the semantics of entities are also important. For example, items usually carry a description of their category and brand, and advertisements carry that of their sponsor and campaign; these are collectively referred to as side information (Liu et al., 2021). Chen et al. (2019) use the words of the title as entities to represent the side information.

Figure 2. The overall architecture of the relation-aware heterogeneous graph network. Suppose an edge $e = (s, t)$ links a source node $s$ and a target node $t$; we denote its meta relation as $\langle \tau(s), \phi(e), \tau(t) \rangle$, where $\tau$ and $\phi$ map nodes and edges to their types. The representation of the user is obtained by collecting the neighbourhood messages.

However, two problems remain unaddressed in existing works. First, previous studies oversimplify the relations between entities and merely rely on a binary association (with or without interaction). In real-world scenarios, users generally interact with other objects through multiple behaviours, e.g. they can click, like, or purchase items on a typical shopping platform. We argue that different behaviours may reflect different intentions and degrees of favour. A ‘click’ is naturally a weaker association than a ‘like’ or ‘purchase’. Second, the side information has been insufficiently considered so far. Only the titles of items are treated as attributes, whereas other types, such as categories and brands, are ignored. Simply integrating more types of side information into the graph may not be sound, since different types require different semantic spaces in attention operations. Our experiments confirm that doing so does not produce satisfactory results.

The core issue mentioned above is that most works are based on a single type of entity and relation, but more information needs to be uncovered by different types. To that end, we propose Relation-aware Heterogeneous Graph Network for user profiling (RHGN) that can model multiple relations on the heterogeneous graph. In contrast to the single relation graph by previous approaches, we adopt a graph with various relations between different types of entities, as illustrated in Figure 1. We also design a heterogeneous graph propagation network that aggregates information from multiple sources. A transformer-like multi-relation attention (Vaswani et al., 2017; Hu et al., 2020) is employed to learn the importance between nodes and reveal the meta-relation significance on the graph. We validate our model on two real-world user profiling datasets. The experimental result shows that our approach significantly advances the prediction for user profiles. The main contributions of this paper are as follows:

  • We are the first to propose a heterogeneous graph with multiple types of relations and entities for user profiling.

  • We adopt a heterogeneous graph propagation network to acquire heterogeneous information from multiple sources.

  • We use heterogeneous multi-relation attention to automatically reveal the meta-relation significance on the graph.

2. Relation-aware Heterogeneous Graph Networks

In this section, we formulate the user profiling problem and introduce our approach to address the multi-source information extraction on the heterogeneous graph. As illustrated in Figure 2, the model consists of three segments: Neighbourhood Message Passing, Multi-Relation Attention, and Target State Update.

2.1. Problem Statement

Given a collection of users’ behaviours and properties, user profiling aims to predict their labels, such as age and gender. We present the entities and relations as a directed heterogeneous graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{A}, \mathcal{R})$, where nodes $v \in \mathcal{V}$ and edges $e \in \mathcal{E}$ can be mapped into their types by the functions $\tau(v): \mathcal{V} \to \mathcal{A}$ and $\phi(e): \mathcal{E} \to \mathcal{R}$, respectively. In the case of e-commerce, $\mathcal{A}$ typically has types ‘user’, ‘item’, ‘advertisement’, and ‘attribute’, and $\mathcal{R}$ has types ‘click’, ‘purchase’, and ‘has_attribute’, as illustrated in Figure 1. Suppose an edge $e = (s, t)$ links a source node $s$ and a target node $t$; we denote its meta relation as $\langle \tau(s), \phi(e), \tau(t) \rangle$. The meta relation generally reflects different interaction intentions between entities.
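The formulation above can be made concrete with a small sketch. The node ids, the toy edges, and the helper name `meta_relation` are illustrative assumptions, not taken from the datasets or from the authors' code; the edge direction (neighbour as source, receiving node as target) is likewise an assumption chosen to match the message-passing description.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str    # id of the source node
    target: str    # id of the target node
    relation: str  # edge type, e.g. 'click'


# Node-type mapping: node id -> node type (toy data, hypothetical ids).
node_type = {
    "u1": "user",
    "i1": "item",
    "i2": "item",
    "a1": "attribute",
}

# Toy edges: items send messages to the user, attributes to items.
edges = [
    Edge("i1", "u1", "click"),
    Edge("i2", "u1", "purchase"),
    Edge("a1", "i1", "has_attribute"),
    Edge("a1", "i2", "has_attribute"),
]


def meta_relation(edge):
    """Return the triple (source node type, edge type, target node type)."""
    return (node_type[edge.source], edge.relation, node_type[edge.target])


# Count how many edges fall under each meta relation.
meta_counts = Counter(meta_relation(e) for e in edges)
for triple, count in sorted(meta_counts.items()):
    print(triple, count)
```

Keeping the type lookups separate from the edge list mirrors the two mapping functions in the problem statement: the same ‘item’ to ‘user’ pair yields different meta relations for ‘click’ and ‘purchase’.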

Note that in real situations, labels for users are usually limited. Semi-supervised learning that exploits the large amount of unlabelled data is thus required.

2.2. Neighbourhood Message Passing

To obtain the representation of users, it is necessary to collect their neighbourhood messages, that is, the items they have interacted with. These, to a large extent, exhibit one’s interests and concerns. Similarly, items should learn from their neighbourhood users and attributes. Formally, for a triplet of a target node $t$, its neighbour node $s$, and an edge relation $e$, we calculate the message to pass in the $l$-th layer as:

$$\mathrm{Msg}^{(l)}(s, e, t) = \mathop{\Vert}_{i \in [1, h]} \text{M-Linear}^{i}_{\tau(s)}\left(H^{(l-1)}[s]\right) W^{\mathrm{MSG}}_{\phi(e)}$$

where $H^{(l-1)}[s]$ is the representation of node $s$ from the previous layer, $\Vert$ denotes the concatenation operation, $\text{M-Linear}^{i}_{\tau(s)}$ is the $i$-th multi-head linear function, $h$ is the number of heads, and $W^{\mathrm{MSG}}_{\phi(e)}$ is a matrix that projects the message into a relation-dependent space. By keeping a distinct $W^{\mathrm{MSG}}_{\phi(e)}$ for each meta relation $\phi(e)$, our model can better distinguish different intentions between the source and the target with various types of interaction, e.g. click and purchase.
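A minimal NumPy sketch of this message computation follows. It assumes one linear map per source node type and head, and one projection matrix per relation shared across heads; biases are omitted, all weights are random, and every name (`m_linear`, `w_msg`, `message`) is hypothetical rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 8, 2          # hidden size and number of heads (toy values)
d_head = d // h      # size of each head

node_types = ["user", "item"]
relations = ["click", "purchase"]

# One linear map per (source node type, head): d -> d_head.
m_linear = {(t, i): rng.normal(size=(d, d_head))
            for t in node_types for i in range(h)}
# One relation-dependent projection per relation: d_head -> d_head.
w_msg = {r: rng.normal(size=(d_head, d_head)) for r in relations}


def message(h_src, src_type, relation):
    """Concatenate the per-head, relation-projected messages of one source node."""
    heads = [h_src @ m_linear[(src_type, i)] @ w_msg[relation]
             for i in range(h)]
    return np.concatenate(heads)


h_item = rng.normal(size=d)                       # embedding of one item
msg_click = message(h_item, "item", "click")
msg_purchase = message(h_item, "item", "purchase")

# Same source node, different relation: the messages differ.
print(msg_click.shape, np.allclose(msg_click, msg_purchase))
```

The point of the sketch is the last line: the same item embedding produces a different message depending on whether the edge is a ‘click’ or a ‘purchase’.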

                JD-dataset                                Alibaba-dataset
Model           Gender-Acc  Gender-F1  Age-Acc  Age-F1    Gender-Acc  Gender-F1  Age-Acc  Age-F1
GCN             40.60       28.88      51.83    17.80     78.81       45.48      32.14    14.34
GAT             77.52       75.85      51.45    25.52     79.09       47.75      34.89    16.70
RGCN            60.30       56.40      45.10    17.10     78.00       69.96      36.94    22.28
HGCN            80.00       79.33      52.82    25.77     80.97       63.82      41.55    26.78
HGCN (+info)    78.10       77.52      52.30    23.74     80.56       63.27      41.23    26.12
HGAT            80.20       79.40      51.70    19.46     79.89       64.27      36.72    21.76
HGAT (+info)    78.74       78.05      51.40    17.13     79.12       63.70      35.16    20.58
RHGN            80.44       79.18      54.70    33.95     83.00       77.73      43.80    29.60
Table 1. The comparison between various models on the two datasets.

2.3. Multi-Relation Attention

As prior research has demonstrated (Rahimi et al., 2018; Chen et al., 2019), not all neighbourhood messages are equally essential for the target node. For example, the model may need to pay more attention to which item the user bought rather than which advertisement they viewed. We thus adopt an attention weight to rescale the significance of each message. Formally, we project the source node and the target node into a Key vector and a Query vector, respectively, and measure their similarity as:

$$\alpha(s, e, t) = \underset{\forall s \in N(t)}{\mathrm{softmax}}\left( \mathop{\Vert}_{i \in [1, h]} \frac{K^{i}(s)\, W^{\mathrm{ATT}}_{\phi(e)}\, Q^{i}(t)^{\top}}{\sqrt{d}} \right)$$

$$K^{i}(s) = \text{K-Linear}^{i}_{\tau(s)}\left(H^{(l-1)}[s]\right), \qquad Q^{i}(t) = \text{Q-Linear}^{i}_{\tau(t)}\left(H^{(l-1)}[t]\right)$$

where $N(t)$ is the set of neighbours of $t$, $\text{K-Linear}^{i}_{\tau(s)}$ and $\text{Q-Linear}^{i}_{\tau(t)}$ are the $i$-th multi-head linear functions for the Key vector and the Query vector, respectively, $W^{\mathrm{ATT}}_{\phi(e)}$ is a projection matrix, and $d$ is the dimension of the vector. The architecture resembles the Transformer (Vaswani et al., 2017), and $\sqrt{d}$ is likewise used to smooth the dot product of the Key vector and the Query vector. However, the vanilla Transformer calculates the dot product with the same set of parameters for all inputs, which does not consider the effect of multiple associations. The additional weight $W^{\mathrm{ATT}}_{\phi(e)}$ here helps the model reassign attention according to different meta relations.
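The relation-dependent attention can be sketched in NumPy as below. This is an illustration under stated assumptions: node types are fixed (item sources, one user target), the Key and Query maps are random, the score is scaled by the square root of the per-head dimension, and the softmax is taken over the neighbours separately for each head. None of the names come from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h = 8, 2
d_head = d // h

# Hypothetical per-head Key maps (for 'item' sources) and Query maps (for 'user' targets).
k_linear = [rng.normal(size=(d, d_head)) for _ in range(h)]
q_linear = [rng.normal(size=(d, d_head)) for _ in range(h)]
# One relation-dependent matrix placed between Key and Query.
w_att = {r: rng.normal(size=(d_head, d_head)) for r in ("click", "purchase")}


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(h_target, sources):
    """sources: list of (embedding, relation). Returns weights of shape (n, h)."""
    scores = np.empty((len(sources), h))
    for i in range(h):
        query = h_target @ q_linear[i]
        for n, (h_src, rel) in enumerate(sources):
            key = h_src @ k_linear[i]
            # Scaled, relation-aware dot product of Key and Query.
            scores[n, i] = (key @ w_att[rel] @ query) / np.sqrt(d_head)
    return softmax(scores, axis=0)   # normalise over neighbours, per head


h_user = rng.normal(size=d)
neighbours = [
    (rng.normal(size=d), "click"),
    (rng.normal(size=d), "purchase"),
    (rng.normal(size=d), "click"),
]
alpha = attention(h_user, neighbours)
print(alpha.shape, alpha.sum(axis=0))
```

Because the relation matrix sits between the Key and the Query, two neighbours with identical embeddings can still receive different weights when they are linked by different relations.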

2.4. Target State Update

After propagating the messages and their attentions to the target node, we assemble them to update the target node embedding. We define the update formulation as:

$$H^{(l)}[t] = \text{A-Linear}_{\tau(t)}\left( \sigma\left( \sum_{s \in N(t)} \alpha(s, e, t) \cdot \mathrm{Msg}^{(l)}(s, e, t) \right) \right) + H^{(l-1)}[t]$$

where $\text{A-Linear}_{\tau(t)}$ is a linear function that maps the message back to the target feature distribution, and $\sigma$ denotes the activation function. As the attention $\alpha(s, e, t)$ is normalised by the softmax procedure ($\sum_{s \in N(t)} \alpha(s, e, t) = 1$), it can be directly applied to the message $\mathrm{Msg}^{(l)}(s, e, t)$ without affecting its distribution. By stacking $L$ graph layers with residual connections, each node can reach its $L$-hop neighbours. For example, a user can receive messages from other users who may share similar interests and behaviour, even though they are not directly connected.
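A single-head NumPy sketch of the update step is given below. The order of operations (activation on the aggregated message, then the linear map, then the residual) is an assumption, as are the tanh approximation of GELU and the names `update_target` and `a_linear`.

```python
import numpy as np


def gelu(x):
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def update_target(h_prev, messages, alpha, a_linear):
    """Aggregate attention-weighted messages and add the residual connection.

    h_prev:   (d,)   target embedding from the previous layer
    messages: (n, d) one message per neighbour
    alpha:    (n,)   attention weights that sum to 1
    a_linear: (d, d) hypothetical output projection for the target's node type
    """
    aggregated = alpha @ messages                 # weighted sum over neighbours
    return gelu(aggregated) @ a_linear + h_prev   # project, then add residual


rng = np.random.default_rng(2)
d, n = 4, 3
h_prev = rng.normal(size=d)
messages = rng.normal(size=(n, d))
alpha = np.array([0.2, 0.3, 0.5])
a_linear = rng.normal(size=(d, d))

h_new = update_target(h_prev, messages, alpha, a_linear)
# With all-zero messages only the residual path remains.
h_no_msg = update_target(h_prev, np.zeros((n, d)), alpha, a_linear)
print(h_new.shape, np.allclose(h_no_msg, h_prev))
```

The zero-message check illustrates the role of the residual connection: a node with uninformative neighbours keeps its previous representation.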

2.5. Training

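The final step involves classifying the user representation in the last layer into profile labels. Formally, we employ a single linear classifier and optimize the model with cross-entropy as:

$$\mathcal{L} = -\sum_{u \in \mathcal{U}_{L}} \sum_{c=1}^{C} y_{u,c} \log \hat{y}_{u,c}$$

where $\mathcal{U}_{L}$ denotes the labelled user node set, $C$ denotes the total number of profile labels, $y_{u,c}$ denotes the ground truth, and $\hat{y}_{u,c}$ is the predicted probability that user $u$ has label $c$.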
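The semi-supervised objective can be sketched as a cross-entropy evaluated on the labelled users only. The sketch assumes integer class labels with -1 marking unlabelled users and averages over the labelled set (a constant factor away from a plain sum); the function name and the toy data are hypothetical.

```python
import numpy as np


def log_softmax(z):
    """Row-wise, numerically stable log-softmax."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def masked_cross_entropy(user_repr, weight, bias, labels, labelled_idx):
    """Cross-entropy of a single linear classifier, restricted to labelled users."""
    logits = user_repr @ weight + bias                        # (num_users, C)
    log_probs = log_softmax(logits)
    picked = log_probs[labelled_idx, labels[labelled_idx]]    # log-prob of the true class
    return -picked.mean()


rng = np.random.default_rng(3)
num_users, d, num_classes = 6, 4, 3
user_repr = rng.normal(size=(num_users, d))
labels = np.array([0, 2, -1, 1, -1, -1])     # -1 marks unlabelled users
labelled_idx = np.flatnonzero(labels >= 0)

# With zero weights every class is equally likely, so the loss equals log(C).
weight = np.zeros((d, num_classes))
bias = np.zeros(num_classes)
loss = masked_cross_entropy(user_repr, weight, bias, labels, labelled_idx)
print(loss, np.log(num_classes))
```

Unlabelled users contribute no loss term, yet they still influence training through message passing, which is what makes the setting semi-supervised.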

3. Experiments

In this section, we conduct experiments on two real-world datasets to evaluate our proposed method.

3.1. Datasets

To examine the actual performance of our proposed method, we select two public large-scale user profiling datasets from real-world scenarios: JD-dataset and Alibaba-dataset, collected from two of the most popular e-commerce portals in China. For each dataset, the heterogeneous graphs are extracted with multiple relations between users, items (or advertisements), and attributes. Consistent with Chen et al. (2019), we use the user’s gender and age as the labels of their profiles. In the JD-dataset, users and items have ‘click’ and ‘purchase’ relations, and attributes include four category descriptions of items. In the Alibaba-dataset, users and items have four relations: ‘click’, ‘purchase’, ‘favorite’, and ‘shopping cart’; users and advertisements have ‘view’ relations; items and advertisements have ‘promotion’ relations; and attributes comprise three types of basic information about advertisements, namely category ID, campaign ID, and sponsor ID.

3.2. Baselines

We consider both classical and state-of-the-art graph methods as baselines: (1) GCN (Kipf and Welling, 2017) and GAT (Veličković et al., 2017) are two representative and strong baselines on many tasks; they are based on homogeneous graphs and do not take multiple types of relations and entities into account; (2) RGCN (Schlichtkrull et al., 2018) refers to Relational GCN. It splits the graph into several sub-graphs according to different types of relations and uses parallel GCN layers for each sub-graph; (3) HGCN and HGAT (Chen et al., 2019) are two state-of-the-art methods. They treat nodes as heterogeneous but do not distinguish between relation types, and they also ignore some important side information; (4) We further extend HGCN and HGAT with (+info) to investigate whether they benefit from more side information. We add the side information under the same attention distribution as in their models.

Model                 Gender-Acc  Gender-F1  Age-Acc  Age-F1
RHGN                  80.44       79.18      54.70    33.95
w/o U-I relations     78.99       77.56      54.62    31.01
w/o I-A relations     75.95       74.24      52.69    30.23
Table 2. Ablation study on the JD-dataset

3.3. Experimental Setup

We implement our RHGN in the PyTorch framework for efficient GPU computation. In the experiment, we randomly split labelled users into a training set, a validation set, and a test set with the ratio 75 : 12.5 : 12.5 (Qiu et al., 2018; Chen et al., 2019). The embeddings for users and items are randomly initialized, whereas those for attributes are initialized from their content via fastText (Armand Joulin and Mikolov, 2017). We adopt the grid-search strategy to find the optimal parameter combination for our model. The entity-level aggregation network has two layers with the hidden dimension searched in {32, 64, 128, 256}. The number of heads in multi-head attention is searched in {1, 2, 4, 8, 16}. All models are optimized via the AdamW optimizer with the One Cycle Learning Rate Scheduler. The learning rate, weight decay, and mini-batch size are set to 0.001, 0.01, and 512, respectively. We use GELU (Hendrycks and Gimpel, 2016) as our activation function. The implementation of each baseline follows its original paper.
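The data split and the search grid described above can be sketched with the standard library alone. The function name `split_users`, the seed, and the toy population of 1000 users are assumptions for illustration; only the 75 : 12.5 : 12.5 ratio and the two candidate sets come from the text.

```python
import itertools
import random


def split_users(user_ids, seed=0):
    """Randomly split labelled users into train/validation/test at 75 : 12.5 : 12.5."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * 0.75)
    n_val = int(len(ids) * 0.125)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]       # the remainder goes to the test set
    return train, val, test


train_ids, val_ids, test_ids = split_users(range(1000))
print(len(train_ids), len(val_ids), len(test_ids))

# Candidate values searched by grid search, as listed in the setup.
hidden_dims = [32, 64, 128, 256]
num_heads = [1, 2, 4, 8, 16]
grid = list(itertools.product(hidden_dims, num_heads))
print(len(grid), "configurations")
```

Every head count in the grid divides every hidden dimension, so each configuration gives an integer per-head size.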

There are two node classification tasks: the gender prediction (binary classification task) and the age prediction (multi-class classification task). We evaluate the models with Accuracy and Macro-F1 (Wu et al., 2019b; Chen et al., 2019), which are widely used in user profiling problems.
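For reference, the two metrics can be computed as in the sketch below. The toy labels are invented for illustration: a classifier that always predicts the majority class reaches a fair accuracy but a poor Macro-F1, because Macro-F1 averages the per-class F1 scores with equal weight.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Imbalanced toy labels; the prediction is always the majority class 0.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(round(acc, 3), round(f1, 3))
```

Here class 0 has an F1 of 0.8 and class 1 an F1 of 0, so Macro-F1 is 0.4 while accuracy is about 0.667. This gap is why reporting both metrics is informative on imbalanced profile labels.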

3.4. Results

Table 1 displays the experimental results of different methods on the two datasets. We observe that our model significantly boosts the performance on most tasks. In particular, our model achieves a higher average performance gain on the Alibaba dataset than on the JD dataset. This is reasonable because the Alibaba dataset contains more diverse interaction behaviours, which carry richer user intentions. By modelling distinct meta relations, our model can intrinsically extract more information than the baselines.
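The claim about average gains can be checked directly against the numbers in Table 1. The sketch below measures, for each metric, the gap between RHGN and the strongest baseline on that metric, then averages the four gaps per dataset; this particular definition of "gain" is an assumption, since the text does not spell one out.

```python
# Columns: Gender-Acc, Gender-F1, Age-Acc, Age-F1 (values copied from Table 1).
jd_baselines = {
    "GCN": [40.60, 28.88, 51.83, 17.80],
    "GAT": [77.52, 75.85, 51.45, 25.52],
    "RGCN": [60.30, 56.40, 45.10, 17.10],
    "HGCN": [80.00, 79.33, 52.82, 25.77],
    "HGCN (+info)": [78.10, 77.52, 52.30, 23.74],
    "HGAT": [80.20, 79.40, 51.70, 19.46],
    "HGAT (+info)": [78.74, 78.05, 51.40, 17.13],
}
alibaba_baselines = {
    "GCN": [78.81, 45.48, 32.14, 14.34],
    "GAT": [79.09, 47.75, 34.89, 16.70],
    "RGCN": [78.00, 69.96, 36.94, 22.28],
    "HGCN": [80.97, 63.82, 41.55, 26.78],
    "HGCN (+info)": [80.56, 63.27, 41.23, 26.12],
    "HGAT": [79.89, 64.27, 36.72, 21.76],
    "HGAT (+info)": [79.12, 63.70, 35.16, 20.58],
}
jd_rhgn = [80.44, 79.18, 54.70, 33.95]
alibaba_rhgn = [83.00, 77.73, 43.80, 29.60]


def mean_gain(rhgn, baselines):
    """Average over metrics of (RHGN score - best baseline score on that metric)."""
    gains = []
    for k, score in enumerate(rhgn):
        best = max(row[k] for row in baselines.values())
        gains.append(score - best)
    return sum(gains) / len(gains)


jd_gain = mean_gain(jd_rhgn, jd_baselines)
alibaba_gain = mean_gain(alibaba_rhgn, alibaba_baselines)
print(f"JD: {jd_gain:.2f}  Alibaba: {alibaba_gain:.2f}")
```

Under this definition the average gain is roughly 2.5 points on the JD-dataset and 3.7 points on the Alibaba-dataset. On JD, the gain in Gender-F1 is slightly negative, which is consistent with "most tasks" rather than "all tasks".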

The result also shows that HGCN and HGAT outperform vanilla GCN and GAT, implying that the task benefits from the heterogeneous node types. Nevertheless, they do not improve further by incorporating more side information. It is probably because they project different types of side information into the same distribution so that they cannot discriminate the impact of each.

Overall, the experiment indicates that it is necessary to consider multiple types of meta relations in user profiling, and our approach can leverage such information to provide better services.

Figure 3. An example from the JD-dataset. Visualizing the significance of distinct relation types with different items for opposite genders.

3.5. Ablation Study

To investigate the individual effectiveness of the user-item multi-relation and the item-attribute (side information) multi-relation, we carry out ablation experiments on them. Specifically, we modify RHGN by consolidating the user-item relations (w/o U-I relations) and the item-attribute relations (w/o I-A relations), respectively. As shown in Table 2, both variants perform worse than the original model, suggesting that each kind of multi-relation contributes to the task. In addition, the item-attribute relation has a larger influence on performance than the user-item relation. This is plausible since side information contains category semantics that can reflect user intentions.
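The size of each effect can be read off Table 2 with a few subtractions. The sketch below uses only the numbers in the table; averaging the four per-metric drops into one figure is an assumption made here for a compact comparison.

```python
# Columns: Gender-Acc, Gender-F1, Age-Acc, Age-F1 (values copied from Table 2).
full_model = [80.44, 79.18, 54.70, 33.95]
ablations = {
    "w/o U-I relations": [78.99, 77.56, 54.62, 31.01],
    "w/o I-A relations": [75.95, 74.24, 52.69, 30.23],
}

mean_drop = {}
for name, scores in ablations.items():
    # Drop of each metric relative to the full model, in points.
    drops = [full - ablated for full, ablated in zip(full_model, scores)]
    mean_drop[name] = sum(drops) / len(drops)
    print(name, [round(x, 2) for x in drops], round(mean_drop[name], 2))
```

Every metric drops for both variants, and the drop without item-attribute relations is larger on each of the four metrics.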

3.6. Case Study

To understand how the meta relation impacts the prediction of user profiles, we visualize the attention scores between two users (a male and a female) and the items they interacted with, as illustrated in Figure 3. Depending on the user’s gender, the attention scores differ in significance with respect to the item category and the interaction relation. It is worth noting that the ‘click’ relation of some gender-oriented items is more biased than the ‘purchase’ relation of some neutral items.

4. Conclusion

In this paper, we proposed a heterogeneous graph with multiple entities and relations for user profiling. We also adopted a relation-aware heterogeneous graph network to learn the meta relation significance on such a graph. Through experiments on large real-world datasets, we found that incorporating more types of entities and relations is generally beneficial for capturing users’ intentions and predicting their profile labels. Further studies demonstrate the interpretability of RHGN through its different meta relation attention weights.


This work is supported by National Key Research and Development Program (2019QY1601, 2019QY1600), National Natural Science Foundation of China (61772528).


  • Armand Joulin and Mikolov (2017) Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of tricks for efficient text classification. In EACL.
  • Chen et al. (2019) Weijian Chen, Yulong Gu, Zhaochun Ren, Xiangnan He, Hongtao Xie, Tong Guo, Dawei Yin, and Yongdong Zhang. 2019. Semi-supervised User Profiling with Heterogeneous Graph Attention Networks. In IJCAI, Vol. 19. 2116–2122.
  • Farnadi et al. (2018) Golnoosh Farnadi, Jie Tang, Martine De Cock, and Marie-Francine Moens. 2018. User profiling through deep multimodal fusion. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 171–179.
  • Hendrycks and Gimpel (2016) Dan Hendrycks and Kevin Gimpel. 2016. Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016).
  • Hu et al. (2020) Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. 2020. Heterogeneous graph transformer. In Proceedings of The Web Conference 2020. 2704–2710.
  • Kipf and Welling (2017) Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
  • Li et al. (2020) Xiaohan Li, Mengqi Zhang, Shu Wu, Zheng Liu, Liang Wang, and Philip S. Yu. 2020. Dynamic graph collaborative filtering. In 2020 IEEE International Conference on Data Mining (ICDM). IEEE, 322–331.
  • Liu et al. (2021) Chang Liu, Xiaoguang Li, Guohao Cai, Zhenhua Dong, Hong Zhu, and Lifeng Shang. 2021. Non-invasive Self-attention for Side Information Fusion in Sequential Recommendation. In AAAI.
  • Nguyen et al. (2013) Dong Nguyen, Rilana Gravel, Dolf Trieschnigg, and Theo Meder. 2013. “How Old Do You Think I Am?” A Study of Language and Age in Twitter. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 7.
  • Preoţiuc-Pietro and Ungar (2018) Daniel Preoţiuc-Pietro and Lyle Ungar. 2018. User-level race and ethnicity predictors from twitter text. In Proceedings of the 27th International Conference on Computational Linguistics. 1534–1545.
  • Qiu et al. (2018) Jiezhong Qiu, Jian Tang, Hao Ma, Yuxiao Dong, Kuansan Wang, and Jie Tang. 2018. Deepinf: Social influence prediction with deep learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2110–2119.
  • Rahimi et al. (2018) Afshin Rahimi, Trevor Cohn, and Timothy Baldwin. 2018. Semi-supervised user geolocation via graph convolutional networks. In ACL.
  • Rao et al. (2010) Delip Rao, David Yarowsky, Abhishek Shreevats, and Manaswi Gupta. 2010. Classifying latent user attributes in twitter. In Proceedings of the 2nd international workshop on Search and mining user-generated contents. 37–44.
  • Schlichtkrull et al. (2018) Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European semantic web conference. Springer, 593–607.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS.
  • Veličković et al. (2017) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. In ICLR.
  • Wu et al. (2019b) Chuhan Wu, Fangzhao Wu, Junxin Liu, Shaojian He, Yongfeng Huang, and Xing Xie. 2019b. Neural demographic prediction using search query. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 654–662.
  • Wu et al. (2019a) Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019a. Session-based Recommendation with Graph Neural Networks. In AAAI. 346–353.
  • Xiao et al. (2020) Zhiping Xiao, Weiping Song, Haoyan Xu, Zhicheng Ren, and Yizhou Sun. 2020. TIMME: Twitter Ideology-detection via Multi-task Multi-relational Embedding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2258–2268.
  • Yu et al. (2020) Feng Yu, Yanqiao Zhu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2020. TAGNN: Target Attentive Graph Neural Networks for Session-based Recommendation. In SIGIR. 1921–1924.
  • Zhang et al. (2021b) Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, and Liang Wang. 2021b. Mining Latent Structures for Multimedia Recommendation. (2021).
  • Zhang et al. (2020a) Mengqi Zhang, Shu Wu, Meng Gao, Xin Jiang, Ke Xu, and Liang Wang. 2020a. Personalized graph neural networks with attention mechanism for session-aware recommendation. IEEE Transactions on Knowledge and Data Engineering (2020).
  • Zhang et al. (2020b) Yufeng Zhang, Xueli Yu, Zeyu Cui, Shu Wu, Zhongzhen Wen, and Liang Wang. 2020b. Every document owns its structure: Inductive text classification via graph neural networks. In ACL.
  • Zhang et al. (2021a) Yufeng Zhang, Jinghao Zhang, Zeyu Cui, Shu Wu, and Liang Wang. 2021a. A Graph-based Relevance Matching Model for Ad-hoc Retrieval. In AAAI.
  • Zhao et al. (2019) Sha Zhao, Zhiling Luo, Ziwen Jiang, Haiyan Wang, Feng Xu, Shijian Li, Jianwei Yin, and Gang Pan. 2019. Appusage2vec: Modeling smartphone app usage for prediction. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 1322–1333.