Personalized News Recommendation: A Survey

06/16/2021 ∙ by Chuhan Wu, et al. ∙ Microsoft, Tsinghua University

Personalized news recommendation is an important technique for helping users find news that interests them and for alleviating information overload. It has been studied extensively for decades and has achieved notable success in improving users' news reading experience. However, many problems and challenges remain unsolved. To help researchers keep up with the advances in personalized news recommendation over the past years, in this paper we present a comprehensive overview of the field. Instead of following the conventional taxonomy of news recommendation methods, we propose a novel perspective that organizes personalized news recommendation around its core problems and the associated techniques and challenges. We first review the techniques for tackling each core problem in a personalized news recommender system and the challenges they face. Next, we introduce the public datasets and evaluation methods for personalized news recommendation. We then discuss the key points in improving the responsibility of personalized news recommender systems. Finally, we raise several research directions that are worth investigating in the future. This paper provides an up-to-date and comprehensive view to help readers understand the personalized news recommendation field. We hope it can facilitate research on personalized news recommendation as well as related fields in natural language processing and data mining.

1. Introduction

In the era of the Internet, online news distribution platforms such as Microsoft News (https://microsoftnews.msn.com) have attracted hundreds of millions of users (Wu et al., 2020f). Due to the convenience and timeliness of online news services, many users have shifted their news reading habits from conventional newspapers to digital news content (Okura et al., 2017). However, a large number of news articles are created and published every day, and it is impossible for users to browse all available news to find the articles they are interested in (Wu et al., 2019b). Thus, personalized news recommendation techniques, which aim to select news according to users’ personal interest, are critical for news platforms to alleviate users’ information overload and improve their news reading experience (Li and Wang, 2019). Research on personalized news recommendation has also attracted increasing attention from both academia and industry in recent years (Okura et al., 2017; Wu et al., 2019a).

An example workflow of a personalized news recommender system is shown in Fig. 1. When a user visits the news platform, the platform recalls a small set of candidate news from a large-scale news pool, and the personalized news recommender ranks these candidate news articles according to the user interests inferred from user profiles. Then, the top K ranked news are displayed to the user, and the user's behaviors on these news are recorded by the platform to update the maintained user profile for future services. Although many prior works have extensively studied these problems from different aspects, personalized news recommendation remains challenging. For example, news articles on news websites usually have short life cycles. Many new articles emerge every day, and old ones expire after a short period of time. Thus, news recommendation faces a severe cold-start problem. In addition, news articles usually contain rich textual information such as title and body. Thus, it is very important to understand news content from their texts with advanced natural language processing techniques. Moreover, there is usually no explicit user feedback such as reviews and ratings on news platforms. Thus, we need to infer the personal interests of users from their implicit feedback like clicks. However, user interests are usually diverse and dynamic, which poses great challenges to user modeling algorithms. The complexity of personalized news recommendation makes it a fascinating research topic with various challenges to be tackled (Feng et al., 2020).

Figure 1. An example workflow of personalized news recommender systems.
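The recall, rank, display and record loop described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the recency-based recall rule, and the word-overlap score are toy stand-ins, not the components of any real system discussed in this survey.

```python
def recall(news_pool, limit):
    """Toy recall stage: keep the `limit` most recently published articles."""
    return sorted(news_pool, key=lambda n: n["published"], reverse=True)[:limit]


def rank(candidates, interest_words, k):
    """Toy ranking stage: score = number of title words seen in past clicks."""
    def score(news):
        return len(set(news["title"].lower().split()) & interest_words)
    # sorted() is stable, so tied candidates keep their recall order.
    return sorted(candidates, key=score, reverse=True)[:k]


def update_profile(interest_words, clicked_news):
    """Record click behaviors by adding the clicked titles' words to the profile."""
    for news in clicked_news:
        interest_words.update(news["title"].lower().split())


# Hypothetical news pool; "published" is an abstract, increasing time index.
news_pool = [
    {"id": "n1", "title": "Local team wins final", "published": 1},
    {"id": "n2", "title": "Stock markets rally", "published": 2},
    {"id": "n3", "title": "Team signs new coach", "published": 3},
    {"id": "n4", "title": "Rain expected tomorrow", "published": 4},
]
profile = {"team", "coach"}  # words from previously clicked titles

candidates = recall(news_pool, limit=3)
top_k = rank(candidates, profile, k=2)
update_profile(profile, top_k[:1])  # pretend the user clicked the first item
print([n["id"] for n in top_k])
```

In a real system each stage would be replaced by the news modeling, user modeling and ranking techniques reviewed in the following sections.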

A comprehensive overview of existing personalized news recommendation approaches can provide useful guidance for future research in this field. Over the past years, many survey papers have reviewed the techniques of news recommendation (Bogers and Van den Bosch, 2007; Borges and Lorena, 2010; Li et al., 2011a; Sood and Kaur, 2014b; Özgöbek et al., 2014; Durairaj and Kumar, 2014; Doychev et al., 2014; Harandi and Gulla, 2015; Dwivedi and Arya, 2016; Karimi et al., 2018; Li and Wang, 2019; Feng et al., 2020; Qin, 2020). For example, Li and Wang (Li and Wang, 2019) reviewed the personalized news recommendation methods based on handcrafted features to build news and user representations. They covered many traditional methods, including collaborative filtering (CF) based ones that use the IDs of users and news, content-based ones that use features extracted from the content of news and the user behaviors on news, and hybrid ones that rely on content-based collaborative filtering. They also studied the datasets used by these methods and their techniques for user and news representation construction, data processing and user privacy protection. Feng et al. (Feng et al., 2020) reviewed news recommendation approaches in many different scenarios, including personalized and non-personalized ones. For personalized news recommendation methods, they also classified them into three categories, i.e., CF-based, content-based, and hybrid ones. They mainly studied the techniques adopted by different methods, the challenges they tackled, and the datasets and metrics for evaluation. However, many recent works, especially those based on deep learning, are not covered by existing survey papers, which makes it hard for researchers to track recent advances in the personalized news recommendation field. In addition, the conventional taxonomy of news recommendation methods (i.e., CF-based, content-based and hybrid) used by many existing surveys no longer reflects the development of this field, and a more systematic overview of existing news recommendation methods is needed to help understand their characteristics and inspire further research.

In this paper, we present a comprehensive review of the personalized news recommendation field. Instead of reviewing existing personalized news recommendation methods based on the conventional taxonomy, in this survey we propose a novel perspective to review them based on the core problems involved in personalized news recommendation and the associated techniques and challenges. We first introduce the framework of developing a personalized news recommender system in Section 2. Next, we systematically review the core problems, techniques and challenges in personalized news recommendation, including: news modeling, user modeling, news ranking, model training, datasets, benchmarks and evaluation, which are introduced in Sections 3-7, respectively. We then present some discussions on developing responsible news recommender systems in Section 10, and raise several potential future directions in Section 11. Finally, we conclude this paper in Section 12.

2. Framework of Personalized News Recommendation

Personalized news recommendation techniques have been widely used in many online news websites (Okura et al., 2017; Wu et al., 2020f). Different from non-personalized news recommendation methods that suggest news articles solely based on non-personalized factors (Lavrenko et al., 2000) such as news popularity (Corsini and Larson, 2016; Yang, 2016; Ludmann, 2017; Lommatzsch et al., 2018), editors’ demonstration (Wang et al., 2017) and geographic information (Son et al., 2013; Chen et al., 2017), personalized news recommendation considers the personal interest of each individual user to provide personalized news services and better satisfy users’ needs.

Figure 2. A framework of the key components in developing a personalized news recommendation model.

Existing surveys on personalized news recommendation usually classify methods into three categories, i.e., collaborative filtering-based, content-based and hybrid ones (Li and Wang, 2019). However, this classification criterion cannot adapt to the recent advances in news recommendation because many methods with diverse characteristics fall into the same category without distinction. For example, the category of content-based methods includes traditional semantic-based methods, contextual bandit-based methods and recent deep learning-based methods, which makes it difficult for researchers to understand the technical paradigm of personalized news recommendation. Thus, a systematic overview of existing techniques is required to help understand the development of this field.

Instead of following the conventional taxonomy, in this survey we propose a novel perspective to review existing personalized news recommendation techniques based on the core problems involved in the development of a personalized news recommender system. A common framework of personalized news recommendation model development is shown in Fig. 2. We can see that there are several key problems in this framework. First, news modeling is the backbone of news recommendation and a core problem is how to understand the content and characteristics of news. In addition, user modeling is required to understand the personal interest of users on news, and it is critical to accurately infer user interest from user profiles like behaviors. Based on the news and user representations built by the news and user models, the next step is ranking candidate news according to certain policies such as the relevance between news and user interest. Then, it is important to train the recommendation model with proper objectives to make high-quality news recommendations, and evaluating the ranking results given by the recommendation model is also a core problem in the development of personalized recommender systems. Besides, the datasets and benchmarks for news recommendation are also necessities in designing personalized news recommendation models. Moreover, beyond developing accurate models, improving the responsibility of intelligent systems has been a spotlight problem in recent years. How to develop responsible news recommender systems is a less studied but extremely important problem in personalized news recommendation. Next, we briefly discuss the key problems mentioned above in the following sections.

2.1. News Modeling

News modeling aims to understand the characteristics and content of news, which is the backbone of news recommendation. There are mainly two kinds of techniques for news modeling, i.e., feature-based news modeling and deep learning-based news modeling. Feature-based news modeling methods usually rely on handcrafted features to represent news articles. For instance, in many methods based on collaborative filtering (CF), news articles are represented by their IDs (Resnick et al., 1994; Das et al., 2007). However, on most news websites new articles are published continuously and old ones soon vanish. Thus, representing news articles with their IDs suffers from severe cold-start problems, and the performance is usually suboptimal.

Considering the drawbacks of ID-based news modeling methods, most approaches incorporate content features to represent news. Among them, many methods use features extracted from news texts for news modeling. For instance, Capelle et al. (Capelle et al., 2012) proposed to represent news with Synset Frequency-Inverse Document Frequency (SF-IDF), which uses WordNet synonym sets in place of the terms in TF-IDF. Besides the news texts, many methods also incorporate various factors that may influence users’ news browsing decisions into news modeling, such as news popularity and recency (Li et al., 2011b). However, in these methods the features used to represent news are manually designed, which usually requires much effort and domain knowledge. In addition, handcrafted features are usually not optimal in representing the semantic information encoded in news texts.

With the development of natural language processing techniques in recent years, many methods employ neural NLP models to learn deep representations of news. For example, Okura et al. (Okura et al., 2017) proposed to use autoencoders to learn news representations from news content. Wang et al. (Wang et al., 2018) proposed to use a knowledge-aware convolutional neural network (CNN) to learn news representations from news titles and their entities. Wu et al. (Wu et al., 2019d) proposed to learn news representations from news titles via a combination of multi-head self-attention and additive attention networks. Wu et al. (Wu et al., 2021c) studied the use of pre-trained language models to encode news texts. These deep learning-based news modeling methods can automatically learn informative news representations without the need for manual feature engineering, and they can usually understand news content better than traditional feature-based methods.

2.2. User Modeling

User modeling techniques in news recommendation aim to understand users’ personal interest in news. Similar to news modeling, user modeling methods can also be roughly classified into two categories, i.e., feature-based and deep learning-based. Some feature-based methods like CF represent users with their IDs (Resnick et al., 1994; Das et al., 2007). However, they usually suffer from the sparsity of user data and cannot model user interest accurately. Thus, most feature-based methods consider other user information such as click behaviors on news. For example, Garcin et al. (Garcin et al., 2012) proposed to use Latent Dirichlet Allocation (LDA) to extract topics from the concatenation of news title, summary and body. The topic vectors of all clicked news are further aggregated into a user vector by averaging. Several works also incorporate other user features into user modeling, such as demographics (Lee and Park, 2007), location (Fortuna et al., 2010) and access patterns (Li et al., 2011b). However, feature-based user modeling methods also require an enormous amount of domain knowledge to design informative user features in specific scenarios, and they are usually suboptimal in representing user interests.
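The averaging step mentioned above (aggregating the topic vectors of clicked news into a user vector) can be sketched as follows. This is only an illustration under stated assumptions: the topic vectors are taken as already computed by some topic model such as LDA, and the function name and the numbers are hypothetical.

```python
def average_user_vector(clicked_topic_vectors):
    """Average the topic vectors of a user's clicked news into one user vector.

    Each inner list is assumed to be the topic distribution of one clicked
    article, produced beforehand by a topic model such as LDA.
    """
    if not clicked_topic_vectors:
        raise ValueError("the user has no clicked news to aggregate")
    n = len(clicked_topic_vectors)
    dim = len(clicked_topic_vectors[0])
    return [sum(vec[i] for vec in clicked_topic_vectors) / n for i in range(dim)]


# Hypothetical topic distributions of two clicked articles over three topics.
clicked = [
    [0.8, 0.2, 0.0],
    [0.4, 0.4, 0.2],
]
user_vec = average_user_vector(clicked)
print(user_vec)  # approximately [0.6, 0.3, 0.1]
```

Averaging treats every click equally, which is one reason such handcrafted user representations may miss diverse or shifting interests.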

There are several methods that use neural networks to learn user representations from users’ click behaviors. For example, Okura et al. (Okura et al., 2017) proposed to use a GRU network to learn user representations from clicked news. Wu et al. (Wu et al., 2019b) proposed a personalized attention network to learn user representations from clicked news in a personalized manner. These methods can automatically learn deep interest representations of users for personalized news recommendation, which are usually more accurate than handcrafted user interest features.

2.3. News Ranking

On the basis of news and user interest modeling, the next step is to rank candidate news in a personalized way according to user interest. Most methods rank news based on their relevance to user interest, and how to accurately measure the relevance between user interest and candidate news is their core problem. Some methods measure the user-news relevance based on their representations. For example, Goossen et al. (Goossen et al., 2011) proposed to compute the cosine similarity between the Concept Frequency-Inverse Document Frequency (CF-IDF) features extracted from candidate news and clicked news, which was further used for personalized candidate news ranking. Okura et al. (Okura et al., 2017) used the inner product between news and user embeddings to compute the click scores, and ranked candidate news based on these scores. Gershman et al. (Gershman et al., 2011) proposed to use an SVM model for each individual user to classify whether this user will click a candidate news based on news and user interest features. In several recent methods, the relevance between candidate news and user interest is modeled in a fine-grained way by matching candidate news with clicked news. For example, Wang et al. (Wang et al., 2020b) proposed to match candidate news and clicked news with a 3-D convolutional neural network to mine the fine-grained relatedness between their content. However, ranking candidate news merely based on their relevance to user interest tends to recommend news that are similar to those previously clicked by users (Qi et al., 2021c), which may cause the “filter bubble” problem.
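The two representation-based relevance scores mentioned above, cosine similarity and inner product, can be compared with a small sketch. The vectors, news IDs and function names below are hypothetical and only serve to show that the two scores can produce different rankings, since the inner product also rewards large vector norms.

```python
import math


def dot(u, v):
    """Inner product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u, norm_v = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(user_vec, candidates, scorer):
    """Return candidate IDs sorted by descending relevance to the user vector."""
    return sorted(candidates, key=lambda nid: scorer(user_vec, candidates[nid]),
                  reverse=True)


# Hypothetical user and candidate news representations.
user_vec = [1.0, 0.0]
candidates = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [3.0, 3.0]}

by_dot = rank_candidates(user_vec, candidates, dot)
by_cosine = rank_candidates(user_vec, candidates, cosine)
print(by_dot, by_cosine)
```

Here candidate "c" wins under the inner product because of its large norm, while "a" wins under cosine similarity because its direction matches the user vector best.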

A few methods use reinforcement learning for news ranking. Li et al. (Li et al., 2010) first explored modeling the personalized news recommendation task as a contextual bandit problem. They proposed a LinUCB approach that computes the upper confidence bound (UCB) of each arm efficiently in closed form based on a linear payoff model, which can match news with users’ personal interest and meanwhile explore making diverse recommendations. DRN (Zheng et al., 2018) uses a deep reinforcement learning approach to find the interest matching policy that optimizes the long-term reward. In addition, it uses a Dueling Bandit Gradient Descent (DBGD) method for exploration. These methods usually optimize the long-term reward rather than the current click probability, which has the potential to alleviate the filter bubble problem by exploring more diverse user interest.
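To make the closed-form upper confidence bound concrete, the following is a minimal sketch of a LinUCB-style arm with a linear payoff model, in the spirit of the disjoint variant. Several choices are assumptions made here for a dependency-free illustration rather than details taken from this survey: the class and function names, the shared context vector for all arms, and the use of a Sherman-Morrison rank-one update to maintain the matrix inverse.

```python
import math


class LinUCBArm:
    """One arm (candidate news) with a linear payoff model."""

    def __init__(self, dim):
        # Inverse of A = I + sum(x x^T); starts as the identity matrix.
        self.a_inv = [[float(i == j) for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def _a_inv_times(self, v):
        return [sum(a * vj for a, vj in zip(row, v)) for row in self.a_inv]

    def ucb(self, x, alpha):
        """Estimated payoff plus alpha times the confidence width."""
        theta = self._a_inv_times(self.b)  # ridge-regression estimate
        mean = sum(t * xi for t, xi in zip(theta, x))
        a_inv_x = self._a_inv_times(x)
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        """Add x x^T to A (via Sherman-Morrison on its inverse) and reward*x to b."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_arm(arms, x, alpha):
    """Pick the arm name with the highest upper confidence bound."""
    return max(arms, key=lambda name: arms[name].ucb(x, alpha))


arms = {"news_a": LinUCBArm(2), "news_b": LinUCBArm(2)}
x = [1.0, 0.0]                      # hypothetical user/context features
arms["news_a"].update(x, 1.0)       # the user clicked news_a once
chosen = select_arm(arms, x, alpha=1.0)
print(chosen)
```

A larger `alpha` widens the confidence term and therefore favors rarely shown news, which is the exploration behavior that relevance-only ranking lacks.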

2.4. Model Training

Many personalized news recommendation methods employ machine learning models for news modeling, user modeling and interest matching. How to train these models to make accurate recommendations is a critical problem. A few methods train their models by predicting the ratings on news given by users. For example, the Grouplens (Resnick et al., 1994) system is trained by predicting the unknown ratings in the user-news matrix. However, explicit feedback such as ratings is usually sparse on news platforms. Thus, most existing methods use implicit feedback like clicks to construct prediction targets for model training. For example, Wang et al. (Wang et al., 2018) formulated the news click prediction problem as a binary classification task, and used cross-entropy as the loss function for model training. Wu et al. (Wu et al., 2019b) proposed to employ negative sampling techniques that combine each positive sample with several negative samples to construct labeled samples for model training. However, implicit feedback usually contains much noise and may not indicate user interest, which poses great challenges to learning accurate user interest models.
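The two training objectives above can be written down in a few lines. The sketch below is an assumption-laden illustration: the binary cross-entropy follows its standard definition, while the negative-sampling loss is written as a softmax over one clicked news and several sampled non-clicked news, which is one common formulation and is not claimed to be the exact loss of any cited method. Function names and scores are hypothetical.

```python
import math


def binary_cross_entropy(click_prob, label):
    """Cross-entropy for one sample with a predicted click probability."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(click_prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def sampled_softmax_loss(pos_score, neg_scores):
    """Negative log-likelihood of the positive among 1 positive + K negatives."""
    scores = [pos_score] + list(neg_scores)
    top = max(scores)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_norm - pos_score


# One clicked news paired with three sampled non-clicked news (hypothetical scores).
print(sampled_softmax_loss(2.0, [0.5, -1.0, 0.0]))
print(binary_cross_entropy(0.9, 1))
```

With the softmax formulation the model is trained to rank the clicked news above the sampled negatives, rather than to calibrate each click probability in isolation.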

2.5. Evaluation

Properly evaluating the performance of personalized news recommendation algorithms is important for developing high-quality news recommender systems. Most existing methods use click-related metrics to measure the accuracy of recommendation results. Some of them regard the recommendation task as a classification problem (Wang et al., 2018; Lian et al., 2018; Hu et al., 2020a), where the performance is evaluated by classification metrics such as Area Under Curve (AUC) and F1-score. Many other methods use ranking metrics such as Mean Reciprocal Rank (MRR) and normalized Discounted Cumulative Gain (nDCG). However, click-based metrics may not indicate user experience. Thus, a few works use user engagement-based metrics such as dwell time and dislike to evaluate the recommendation performance (Wu et al., 2021d), which can evaluate the performance of recommendation models more comprehensively.
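For reference, the click-based metrics named above can be computed for a single impression as in the following sketch. It assumes binary click labels and uses textbook definitions (pairwise AUC with ties counted as one half, reciprocal rank of the first clicked item, and nDCG with gain 2^rel - 1 and a log2 discount); individual benchmarks may use slightly different variants, and the function names are hypothetical.

```python
import math


def auc(labels, scores):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5.

    Requires at least one clicked and one non-clicked item.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ranked_labels(labels, scores):
    """Labels reordered by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def reciprocal_rank(labels, scores):
    """1 / rank of the first clicked item (MRR is its mean over impressions)."""
    for rank, label in enumerate(ranked_labels(labels, scores), start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def dcg_at_k(rels, k):
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))


def ndcg_at_k(labels, scores, k):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked_labels(labels, scores), k) / ideal if ideal > 0 else 0.0


# One hypothetical impression: four displayed news, two of them clicked.
labels = [1, 0, 0, 1]
scores = [0.9, 0.8, 0.3, 0.5]
print(auc(labels, scores), reciprocal_rank(labels, scores), ndcg_at_k(labels, scores, 4))
```

System-level results are usually obtained by averaging these per-impression values over all impressions in the test set.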

In most works, the performance of recommendation models is evaluated offline. However, the data used for offline evaluation is usually influenced by the recommendation results generated by the predecessor recommendation algorithms. Only a few works report online evaluation results (Wu et al., 2021c), which may better indicate the real performance of recommender systems.

2.6. Dataset

In the news recommendation field, most research is conducted on proprietary datasets, while only a few datasets are publicly available. Several representative datasets are plista (Kille et al., 2013), Adressa (Gulla et al., 2017) and MIND (Wu et al., 2020f). Among them, only MIND is a large-scale English news recommendation dataset with raw textual information of news. In addition, MIND is associated with a public leaderboard and an open competition. Thus, many recent studies are conducted on the MIND dataset (Wu et al., 2021c, f, a).

2.7. Responsible Personalized News Recommendation

Most endeavors on personalized news recommendation focus on improving the accuracy of recommendation results. In recent years, research on the responsibility of intelligent systems has gained much attention, with the goal of making AI techniques better serve humans and avoiding their potential negative societal impact. Thus, a few studies aim to improve the responsibility of personalized news recommender systems in different aspects, such as privacy preservation (Qi et al., 2020), diversity (Wu et al., 2020c), debiasing (Yi et al., 2021) and fairness (Wu et al., 2021g). These methods have the potential to help develop higher-quality news recommendation algorithms that serve users in a responsible way.

On the basis of the overview above, we then present in-depth discussions on each core problem in the following sections.

3. News Modeling

News modeling is a critical step in personalized news recommendation methods to capture the characteristics of news articles and understand their content. The techniques for news modeling can be roughly divided into two categories, i.e., feature-based and deep learning-based, which are introduced as follows.

3.1. Feature-based News Modeling

Feature-based news modeling methods mainly rely on handcrafted features to represent news articles. As summarized in Fig. 3, there are mainly four types of features used in news modeling, which are introduced as follows.

In many CF-based methods, news articles are represented by collaborative filtering signals such as news IDs (Resnick et al., 1994; Das et al., 2007; Shan Liu et al., 2016; Xiao et al., 2015; Ji et al., 2016; Mookiah et al., 2018; Han, 2020). However, on most news websites new articles are published quickly and old ones soon vanish. These methods model news in a content-agnostic manner, and thus suffer from a serious cold-start problem because they have difficulty handling newly generated news. Thus, it is not suitable to simply represent news articles with their IDs (Domann and Lommatzsch, 2017).

Figure 3. An overview of different types of news features.

Due to the drawbacks of ID-based news modeling, many methods incorporate news content into news modeling. For instance, Gershman et al. (Gershman et al., 2011) considered Term Frequency-Inverse Document Frequency (TF-IDF) features extracted from news texts. In news articles, entities/concepts are usually more important than other words in understanding news content. Thus, many methods use the entities/concepts in news texts to represent their content. For example, Goossen et al. (Goossen et al., 2011) proposed to use Concept Frequency-Inverse Document Frequency (CF-IDF) to model news content, which is a variant of TF-IDF that uses the frequency of concepts extracted from WordNet rather than term frequency. Capelle et al. (Capelle et al., 2012) proposed to use Synset Frequency–Inverse Document Frequency (SF-IDF) to model news, which is based on the frequency of synonym sets in WordNet. SF-IDF is extended by Moerland et al. (Moerland et al., 2013) into SF-IDF+ by additionally considering the relationships of concepts. They extend the synonym sets of concepts in news by adding other concepts in WordNet that have relationships with the included concepts. Based on aforementioned approaches, the family of CF-IDF is expanded by a set of later works (Hogenboom et al., 2013, 2014; Capelle et al., 2015; de Koning et al., 2018; Brocken et al., 2019).
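The relationship between TF-IDF and its concept-based variants can be illustrated with a short sketch in which the same weighting function is applied first to terms and then to concepts. The weighting below (relative frequency times log of inverse document frequency) is one common TF-IDF variant and may differ from the exact formulas in the cited papers. The term-to-concept lexicon is a hypothetical stand-in for a resource such as WordNet, and all names are assumptions.

```python
import math
from collections import Counter


def xf_idf(docs):
    """Weight each unit (term or concept) by frequency * log(N / doc frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(unit for doc in docs for unit in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({unit: (count / len(doc)) * math.log(n_docs / doc_freq[unit])
                        for unit, count in counts.items()})
    return weights


def to_concepts(tokens, lexicon):
    """Map tokens to concepts, dropping tokens that the lexicon does not cover."""
    return [lexicon[t] for t in tokens if t in lexicon]


# Hypothetical tokenized news titles and a toy term-to-concept lexicon.
docs = [["stocks", "rally", "stocks"], ["team", "wins"], ["shares", "fall"]]
lexicon = {"stocks": "EQUITY", "shares": "EQUITY", "team": "SPORTS_TEAM"}

tf_idf = xf_idf(docs)                                       # term-based features
cf_idf = xf_idf([to_concepts(d, lexicon) for d in docs])    # concept-based features
print(tf_idf[0], cf_idf[0], cf_idf[2])
```

In this toy example the first and third articles share no terms, so their TF-IDF vectors do not overlap, whereas both receive a weight for the shared concept once terms are replaced by concepts.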

Besides semantic features, some works extract other kinds of content features to enhance news modeling (Lee and Park, 2007; Li et al., 2011b; Parizi and Kazemifard, 2015). For example, Garcin et al. (Garcin et al., 2012) proposed to use Latent Dirichlet Allocation (LDA) to extract topics from the concatenation of news title, summary and main content. Parizi et al. (Parizi and Kazemifard, 2015) proposed to extract emotion features of sentences in news as complementary information to TF-IDF features. In their method, the emotion is represented by the Ekman model that contains 6 emotion categories. A variant of this method that uses the sentiment orientation (i.e., positive, neutral and negative) was also developed by Parizi et al. (Parizi et al., 2016). Beyond news texts, the exploitation of vision-related information such as the videos of news is also studied in (Luo et al., 2008). These features can provide complementary information to better understand news content.

In addition to content features, many other kinds of features are used for news modeling. They can be roughly divided into two categories, i.e., property features and context features. Property features such as categories, locations and publishers usually reflect intrinsic properties of news. The most widely used news property feature is category, since it is an important clue for modeling news content and targeting user interest. For example, Liu et al. (Liu et al., 2010) proposed to represent news using their topic categories. However, since the category labels of news often need to be manually annotated by editors, in some scenarios news may not have off-the-shelf category labels. Thus, several methods cluster news into categories based on their content. For instance, in the SCENE (Li et al., 2011b) recommender system, news articles are clustered in a hierarchical manner based on their topic features extracted by LDA. By incorporating the categories or clusters of news into news modeling, the news recommender can be aware of news topics and provide more targeted recommendation services. Another representative property feature is news location, which is also widely used to provide users with the news related to the locations that they are interested in. For example, Tavakolifard et al. (Tavakolifard et al., 2013) incorporated the geographic information of news to filter news based on their locations. In addition, since news from different publishers may differ in content and topics, the information of news publisher is also considered by several methods to enrich the information for news modeling (Ilievski and Roy, 2013; Liang et al., 2017).

Different from property features that are usually static after news publishing, context features of news are dynamic. Popularity and recency, which reflect the attractiveness and freshness of news, are two representative context features used by existing methods. For instance, MONERS (Lee and Park, 2007) is a news recommender system that represents news articles by news categories, news importance suggested by providers and the recency of news articles. Gershman et al. (Gershman et al., 2011) proposed to use four kinds of features to represent news, i.e., news popularity, news age (recency), TF-IDF features of words and named entities. Jonnalagedda et al. (Jonnalagedda and Gauch, 2013) proposed to use the timeline on Twitter to enhance news modeling. They use the popularity and categories of news on Twitter for news representation. News recency only considers the time interval between the publishing and display of news, while the time stamp of news display can provide finer-grained information, such as seasons, months, days and the time in a day. Thus, several approaches incorporate the time stamp of news impression (Chu and Park, 2009; Fortuna et al., 2010; Ilievski and Roy, 2013; Xiao et al., 2015; Epure et al., 2017). For example, Ilievski et al. (Ilievski and Roy, 2013) proposed to incorporate the weekday and the hour of a news impression in news modeling. In addition to the context features mentioned above, several methods also use weather (Yeung and Yang, 2010), click-through rate (CTR) (Chu and Park, 2009), and fact/opinion bias (Patankar et al., 2019) to enrich the representations of news.
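The difference between recency and the finer-grained time stamp features can be shown with a small sketch that derives both from a publishing time and a display time. The function name, the returned field names and the example dates are assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def context_features(published_at, displayed_at):
    """Derive recency and time stamp features for one news impression."""
    age = displayed_at - published_at
    return {
        "recency_hours": age.total_seconds() / 3600.0,  # publish-to-display gap
        "weekday": displayed_at.weekday(),               # 0 = Monday ... 6 = Sunday
        "hour": displayed_at.hour,                       # hour of the day, 0-23
        "month": displayed_at.month,                     # 1-12
    }


# Hypothetical publishing and impression times, both in UTC.
published = datetime(2021, 6, 14, 8, 0, tzinfo=timezone.utc)
displayed = datetime(2021, 6, 16, 14, 30, tzinfo=timezone.utc)
features = context_features(published, displayed)
print(features)
```

Recency alone would give the same value for any impression shown 54.5 hours after publishing, whereas the weekday and hour fields distinguish, for example, a weekday afternoon from a weekend morning.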

Some hybrid methods consider both news IDs and additional features in news modeling (Lommatzsch, 2014). For example, NewsWeeder (Lang, 1995) represents news articles by their IDs and bag-of-word features. Claypool et al. (Claypool et al., 1999) proposed to use news IDs and keywords to model news. Liu et al. (Liu et al., 2010) proposed to represent news using their IDs and topic categories. Saranya et al. (Saranya and Sadhasivam, 2012) proposed to represent news by their IDs, topics, click frequency and the weights of a news article belonging to different categories. Combining ID-based and content-based news modeling techniques can mitigate the cold-start problem of news to some extent, and has been widely explored by integrating other information like news property features (Wen et al., 2012; Darvishy et al., 2020), news sessions (Sottocornola et al., 2018), ontology (Cantador et al., 2011; Nguyen et al., 2015; Rao et al., 2013; Shapira et al., 2009; Gao et al., 2009) and knowledge graphs (Zhang et al., 2017).

To draw a big picture of feature-based news modeling methods, we summarize the major features they used in Table 1.

Table 1. Features for news modeling and the corresponding references.

BOW/XF-IDF* (Goossen et al., 2011)(Capelle et al., 2012)(Moerland et al., 2013)(Capelle et al., 2013)(Hogenboom et al., 2013)(Hogenboom et al., 2014)(Capelle et al., 2015)(de Koning et al., 2018)(Brocken et al., 2019)(Gershman et al., 2011)(Chu and Park, 2009)(Lang, 1995)(Wen et al., 2012)(Nguyen et al., 2015)(Rao et al., 2013)(Zhang et al., 2017)(Billsus and Pazzani, 2000)(Gabrilovich et al., 2004)(Cantador and Castells, 2009)(Gu et al., 2014)(Kirshenbaum et al., 2012)(Kompan and Bieliková, 2010)(Luostarinen and Kohonen, 2013)(Phelan et al., 2009)(Parizi and Kazemifard, 2015)(Parizi et al., 2016)(Liu et al., 2021)(Wei et al., 2021)(Lu and Liu, 2016)
Entity/Keyword (Goossen et al., 2011)(Capelle et al., 2012)(Moerland et al., 2013)(Capelle et al., 2013)(Hogenboom et al., 2013)(Hogenboom et al., 2014)(Capelle et al., 2015)(de Koning et al., 2018)(Brocken et al., 2019)(Li et al., 2011b)(Gershman et al., 2011)(Tavakolifard et al., 2013)(Ilievski and Roy, 2013)(Lin et al., 2014)(Mookiah et al., 2018)(Li and Li, 2013)(Joseph and Jiang, 2019)(Claypool et al., 1999)(Darvishy et al., 2020)(Rao et al., 2013)(Zhang et al., 2017)(Billsus and Pazzani, 2000)(Gabrilovich et al., 2004)(Zheng et al., 2018)(Caldarelli et al., 2016)(Cantador and Castells, 2009)(Cantador et al., 2011)(Frasincar et al., 2009)(Jugovac et al., 2018)(Kompan and Bieliková, 2010)(Tran et al., 2010)(Khattar et al., 2017)(Wang et al., 2020a)
Cluster/Category (Lee and Park, 2007)(Li et al., 2011b)(Jonnalagedda and Gauch, 2013)(Darvishy et al., 2020)(Epure et al., 2017)(Chu and Park, 2009)(Liu et al., 2010)(Saranya and Sadhasivam, 2012)(Darvishy et al., 2020)(Sottocornola et al., 2018)(Gao et al., 2009)(Li et al., 2014)(Li et al., 2011c)(Viana and Soares, 2017)(Li et al., 2010)(Zheng et al., 2018)(Caldarelli et al., 2016)(Gu et al., 2014)(Jonnalagedda et al., 2016)(Kompan and Bieliková, 2010)(Sood and Kaur, 2014a)(Zeleník and Bieliková, 2011)(Liu et al., 2021)(Gharahighehi et al., 2020)(Yang et al., 2020)
Topic Distribution (Garcin et al., 2012)(Li et al., 2011b)(Noh et al., 2014)(Zihayat et al., 2019)(Li and Li, 2013)(Garcin and Faltings, 2013)(Garcin et al., 2013)(Saranya and Sadhasivam, 2012)(Li et al., 2014)(Li et al., 2011c)(Hsieh et al., 2016)(Li et al., 2010)(Luostarinen and Kohonen, 2013)(Patankar et al., 2019)(Tran et al., 2010)(Hsieh et al., 2016)
Location (Tavakolifard et al., 2013)(Noh et al., 2014)(Yeung and Yang, 2010)(Ilievski and Roy, 2013)(Viana and Soares, 2017)(Kazai et al., 2016)(Wei et al., 2021)
Publisher (Ilievski and Roy, 2013)(Liang et al., 2017)(Zheng et al., 2018)(Yang et al., 2020)
Popularity (Shan Liu et al., 2016)(Li et al., 2011b)(Gershman et al., 2011)(Jonnalagedda and Gauch, 2013)(Tavakolifard et al., 2013)(Darvishy et al., 2020)(Ilievski and Roy, 2013)(Zihayat et al., 2019)(Chu and Park, 2009)(Darvishy et al., 2020)(Li et al., 2011c)(Jonnalagedda et al., 2016)(Kazai et al., 2016)(Kirshenbaum et al., 2012)(Garcin and Faltings, 2013)(Ilievski and Roy, 2013)
CTR (Chu and Park, 2009)
Recency (Lee and Park, 2007)(Li et al., 2011b)(Gershman et al., 2011)(Tavakolifard et al., 2013)(Darvishy et al., 2020)(Ilievski and Roy, 2013)(Zihayat et al., 2019)(Saranya and Sadhasivam, 2012)(Khattar et al., 2017)(Darvishy et al., 2020)(Li et al., 2011c)(Zheng et al., 2018)(Zeleník and Bieliková, 2011)(Wei et al., 2021)
Novelty (Garcin et al., 2013)(Gabrilovich et al., 2004)
Dwell Time (Chen et al., 2009)(Gershman et al., 2011)(Yi et al., 2014)(Ilievski and Roy, 2013)(Zihayat et al., 2019)(Zheng et al., 2018)(Ilievski and Roy, 2013)
Time Stamp (Ilievski and Roy, 2013)(Epure et al., 2017)(Chu and Park, 2009)(Fortuna et al., 2010)(Xiao et al., 2015)(Yang et al., 2020)
Emotion/Sentiment (Parizi and Kazemifard, 2015)(Parizi et al., 2016)
Bias (Patankar et al., 2019)
Knowledge Graph (Joseph and Jiang, 2019)(Zhang et al., 2017)
News/User Graph (Lin et al., 2014)(Mookiah et al., 2018)(Li and Li, 2013)(Garcin et al., 2013)(Trevisiol et al., 2014)(Li et al., 2014)(Li et al., 2010)(Phelan et al., 2011)(Gharahighehi et al., 2020)
Ontology (Goossen et al., 2011)(Capelle et al., 2012)(Moerland et al., 2013)(Capelle et al., 2013)(Hogenboom et al., 2013)(Hogenboom et al., 2014)(Capelle et al., 2015)(de Koning et al., 2018)(Brocken et al., 2019)(Wen et al., 2012)(Nguyen et al., 2015)(Rao et al., 2013)(Shapira et al., 2009)(Gao et al., 2009)(Cantador and Castells, 2009)(Cantador et al., 2011)(Frasincar et al., 2009)
Visual Information (Luo et al., 2008)
Table 1. Main features used for news representation. *XF-IDF means TF-IDF and its variants such as CF-IDF and SF-IDF.

3.2. Deep learning-based News Modeling

With the development of deep learning techniques, many recent methods employ neural networks to automatically learn news representations. Most of them use neural NLP techniques to learn news representations from news texts. For example, Okura et al. (Okura et al., 2017) proposed an embedding-based news recommendation (EBNR) method that uses a variant of denoising autoencoders to learn news representations from news texts. RA-DSSM (Kumar et al., 2017a) is a neural news recommendation approach with an architecture similar to DSSM (Huang et al., 2013). It first builds the representations of news using the doc2vec (Le and Mikolov, 2014) tool, then uses a two-layer neural network to learn hidden news representations. This method is also adopted by (Kumar et al., 2017b). 3-D-CNN (Kumar et al., 2017c) represents news by the word2vec (Mikolov et al., 2013) embeddings of their words. However, it is difficult for these methods to mine the semantic information in news texts with traditional neural NLP models.

Many later approaches use more advanced neural NLP models for text modeling. For example, WE3CN (Khattar et al., 2018) uses 2D CNN models to learn representations of news. NPA (Wu et al., 2019b) uses CNN to generate contextual representations of words in news titles, and uses a personalized attention network to form news representations by selecting important words in a personalized manner. NRMS (Wu et al., 2019d) learns word representations with a multi-head self-attention network, and uses an additive attention network to form news representations. A similar news modeling method is also used by many later works (Wu et al., 2020d, c, 2021g, 2021d, 2021f). NRNF (Wu et al., 2020a) uses self-attention to model the contexts of words in news title and body, and it uses an interactive attention network to model the relatedness between title and body. FedRec (Qi et al., 2020) learns news representations from news titles via a combination of CNN and multi-head self-attention networks. These methods usually learn news representations based on shallow text models and non-contextualized word embeddings such as GloVe (Pennington et al., 2014), which may be insufficient to capture the deep semantic information in news. Pre-trained language models (PLMs) such as BERT (Devlin et al., 2019) have been greatly successful in the NLP field, and a few recent works explore empowering news modeling with PLMs (Wu et al., 2021c; Xiao et al., 2021). For example, PLM-NR (Wu et al., 2021c) uses different PLMs to empower English and multilingual news recommendation, and the online flight results in Microsoft News showed notable performance improvement. Their findings imply the importance of accurate text understanding in news recommendation.
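The attention-based pooling step that encoders such as NRMS apply on top of contextual word representations can be illustrated with a small sketch. This is a minimal pure-Python illustration under stated assumptions, not the implementation of any reviewed method: the function name `additive_attention_pool` and the parameters `W`, `b` and `q` are hypothetical stand-ins for learned weights, and the word vectors are assumed to come from an upstream CNN or self-attention layer.

```python
import math

def additive_attention_pool(word_vecs, W, b, q):
    """Pool contextual word vectors into one news vector.

    score_i = q . tanh(W h_i + b), weights = softmax(scores).
    All parameters are illustrative stand-ins for learned weights.
    """
    scores = []
    for h in word_vecs:
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
            for row, bias in zip(W, b)
        ]
        scores.append(sum(qj * zj for qj, zj in zip(q, hidden)))
    # Numerically stable softmax over the word-level scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # News vector = attention-weighted sum of the word vectors.
    dim = len(word_vecs[0])
    news_vec = [sum(a * h[k] for a, h in zip(weights, word_vecs)) for k in range(dim)]
    return news_vec, weights


# Toy input: three 2-d "word" vectors and hand-picked parameters.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
q = [1.0, 1.0]
news_vec, weights = additive_attention_pool(word_vecs, W, b, q)
```

In this toy setting the third word receives the largest attention weight because it activates both hidden units, which mirrors how attention pooling lets informative words dominate the news representation.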

Instead of merely modeling semantic information in news texts, several methods use entities or keywords in news texts to enhance news modeling by introducing complementary knowledge and commonsense information. For instance, DAN (Zhu et al., 2019) learns news representations from news titles and entities via two parallel CNN networks with max pooling operations. DKN (Wang et al., 2018) learns news representations from the titles of news and the entities within titles via a knowledge-aware CNN. The representations of entities are learned from a knowledge graph using the TransD (Ji et al., 2015) knowledge graph embedding algorithm. Saskr (Chu et al., 2019) builds news representations from news titles and bodies based on the average embeddings of their entities. DNA (Zhang et al., 2019a) learns news representations from the news body, news ID and the elements (entities and keywords). More specifically, the sentences in a news body are transformed into their embeddings via doc2vec (Le and Mikolov, 2014), and then are aggregated into a unified one via a sentence-level candidate-aware attention network. Each news element is represented by averaging the embeddings of its words, and element representations are synthesized together via an element-level candidate-aware attention network. The embeddings of the ID, texts, and elements of each piece of news are concatenated together into a unified news representation. Gao et al. (Gao et al., 2018) proposed a knowledge-aware news recommendation approach with hierarchical attention networks. In their method, a word attention network is used to learn word-based news representations by using the embeddings of keywords as attention queries, and these representations are concatenated with both entity embeddings and the average embeddings of the entities in their contexts. An item attention network is used to aggregate these three kinds of news representations by modeling their informativeness. Liu et al. (Liu et al., 2019) proposed to construct a news-relevant knowledge graph on the basis of the Microsoft Satori knowledge graph by extracting additional knowledge entities and topic entities from news and connecting entities in the same news, entities clicked by the same user and entities appearing in the same browsing session to enrich the relations between entities in the knowledge graph. They combine the entity embeddings learned by TransE (Bordes et al., 2013) with the news text embeddings learned by LDA and DSSM. TEKGR (Lee et al., 2020) also enriches the knowledge graph with topical relations between entities. It predicts the topic of news based on texts and concepts, and uses the predicted topic to enrich the knowledge graph and learn topic-enriched knowledge representations of news with graph neural networks. CAGE (Sheu and Li, 2020; Sheu et al., 2021) constructs a subgraph of the knowledge graph by using one-hop neighbors of entities, and uses the TransE embeddings of entities as complements to text embeddings learned by CNN. KRED (Liu et al., 2020) first learns entity embeddings from the knowledge graph with graph attention networks, then incorporates additional entity features such as frequency, category and position, and finally selects entities according to the text representations of news. HieRec (Qi et al., 2021c) uses text self-attention and entity self-attention to model the contexts in news title and the relations between entities in news texts, respectively. KIM (Qi et al., 2021a) incorporates a knowledge-aware interactive news modeling method that can model the relations between the entities and their neighbors of clicked news and candidate news.
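To make the role of knowledge graph embeddings more concrete, the sketch below shows the translation-based distance that TransE is built on (a plausible triple satisfies h + r ≈ t) together with one simple way of attaching entity embeddings to a text embedding. The helper names `transe_distance` and `knowledge_aware_news_vector` are hypothetical, and the concatenation of the text vector with averaged entity vectors is an assumed, simplified fusion strategy rather than the exact design of any method above.

```python
import math

def transe_distance(head, relation, tail):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def knowledge_aware_news_vector(text_vec, entity_vecs):
    """Concatenate a text embedding with the average of its entity embeddings.

    Assumed fusion strategy for illustration only.
    """
    dim = len(entity_vecs[0])
    avg = [sum(e[k] for e in entity_vecs) / len(entity_vecs) for k in range(dim)]
    return list(text_vec) + avg


# Toy triple that satisfies h + r = t exactly, so its distance is zero.
perfect = transe_distance([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
# Toy news item with a 1-d text embedding and two 2-d entity embeddings.
news_vec = knowledge_aware_news_vector([0.5], [[1.0, 0.0], [3.0, 2.0]])
```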

To better model the characteristics of news articles, several methods explore incorporating other types of news information beyond texts into news modeling. For example, DeepJoNN (Zhang et al., 2018) learns news representations from news IDs, categories, keywords and entities via a character-level CNN. Park et al. (Park et al., 2017) proposed a neural news recommendation method based on LSTM. They use a proprietary corpus to train a doc2vec (Le and Mikolov, 2014) model to encode news articles into their vector representations, and use an LSTM network to generate user representations from the representations of news. In addition, they incorporate the categories of news into news representations, which are predicted by a CNN (Kim, 2014) model. TANR (Wu et al., 2019e) learns news representations from news titles via a combination of CNN and attention network, which is also used in (Wu et al., 2019c; Yi et al., 2021). Moreover, TANR incorporates an auxiliary news topic prediction task to learn topic-aware news representations. NAML (Wu et al., 2019a) is a news recommendation method with attentive multi-view learning, which incorporates different kinds of news information as different views of news. In this method, news titles, bodies, categories and subcategories are processed by different models, and their embeddings are further aggregated together into a unified one via a view-level attention network. A similar method is also used by (Zhang et al., 2021; Wu et al., 2021a) to model candidate news. LSTUR (An et al., 2019) uses a combination of CNN and attention network to process news titles, and incorporates categories and subcategories by applying a non-linear transformation to their embeddings. CHAMELEON (Gabriel De Souza et al., 2019; de Souza Pereira Moreira et al., 2018) learns news representations from news bodies by using CNN with different kernel sizes, and these textual representations are fused with news metadata features such as topics, categories and tags using a fully connected layer. It also predicts the metadata features of news via auxiliary tasks. PP-Rec (Qi et al., 2021b) uses news titles, entities and news popularity information in news modeling. It uses gating mechanisms to synthesize the near-real-time CTR, recency and popularity predicted from news title into a unified news popularity score. SentiRec (Wu et al., 2020c) considers the sentiment orientation of news to learn sentiment-aware news representations. It uses the VADER (Hutto and Gilbert, 2014) algorithm to compute real-valued sentiment scores of news. MM-Rec (Wu et al., 2021e) uses a visiolinguistic model ViLBERT (Lu et al., 2019) to learn multi-modal news representations from both news texts and images. DebiasRec (Yi et al., 2021) uses CNN and attention network to learn news content representations from news titles, and learns news bias representations from the size and positions of news displayed on websites with a bias model. These methods can usually understand news better by incorporating additional news information. However, some news features (e.g., news category labels) may not be available in real-world news recommender systems, which hinders the exploitation of these features.

 

Methods Information Used Model
EBNR (Okura et al., 2017) Body Autoencoder
RA-DSSM (Kumar et al., 2017a) Title+Body Doc2vec+NN
Khattar et al. (Kumar et al., 2017b) Title+Body Doc2vec+NN
3-D-CNN (Kumar et al., 2017c) Title+Body Word2vec
WE3CN (Khattar et al., 2018) Title+Body 2-D CNN
NPA (Wu et al., 2019b) Title CNN+Personalized Attention
NRMS (Wu et al., 2019d) Title Self-Attention+Attention
NRNF (Wu et al., 2020a) Title Transformer+Attention
UniRec (Wu et al., 2021f) Title Self-Attention+Attention
FeedRec (Wu et al., 2021d) Title Transformer+Attention
NRHUB (Wu et al., 2019c) Title CNN+Attention
DAINN (Zhang et al., 2019b) Body CNN+Dynamic Topic Model
FedRec (Qi et al., 2020) Title CNN+Self-Attention+Attention
CPRS (Wu et al., 2020d) Title+Body Self-Attention+Attention
FairRec (Wu et al., 2021g) Title Transformer+Attention
PLM-NR (Wu et al., 2021c) Title PLM+Attention
DAN (Zhu et al., 2019) Title+Entity CNN
DNA (Zhang et al., 2019a) Body+Element+ID Doc2vec+Candidate-Aware Attention+ID Embedding
DKN (Wang et al., 2018) Title+Entity KCNN
Saskr (Chu et al., 2019) Entity Entity Embedding
Gao et al. (Gao et al., 2018) Body+Entity Attention
Liu et al. (Liu et al., 2019) Title+Entity Entity Embedding+Attention
TEKGR (Lee et al., 2020) Title+Entity Entity Embedding+Candidate-aware Attention
CAGE (Sheu and Li, 2020) Title+Entity CNN+Entity Embedding
KRED (Liu et al., 2020) Title+Entity+Entity Context Feature Attention
HieRec (Qi et al., 2021c) Title+Entity Transformer+Attention
KIM (Qi et al., 2021a) Title+Entity CNN+Transformer+Co-Attention+Graph Co-Attention
Park et al. (Park et al., 2017) Title+Body+Query+Category Doc2vec
DeepJoNN (Zhang et al., 2018) Keywords/Entities+Category+ID Char CNN
TANR (Wu et al., 2019e) Title+Category CNN+Attention+Topic Prediction
LSTUR (An et al., 2019) Title+Category+Subcategory CNN+Attention
NAML (Wu et al., 2019a) Title+Body+Category+Subcategory CNN+Attention
EEG (Zhang et al., 2021) Title+Abstract+Body CNN+Attention
CHAMELEON (Gabriel De Souza et al., 2019) Body+Metadata+Context Features CNN+Attribute Prediction
PP-Rec (Qi et al., 2021b) Title+Entity+CTR+Recency Self-Attention+Co-Attention+Gating
SentiRec (Wu et al., 2020c) Title+Sentiment Self-Attention
MM-Rec (Wu et al., 2021e) Title+Image ViLBERT
DebiasRec (Yi et al., 2021) Title+Position+Size CNN+Attention+Bias Embedding
User-as-Graph (Wu et al., 2021a) Title+Category+Subcategory+Entity Transformer+Attention
IGNN (Qian et al., 2019) Title+Entity+User-News Graph KCNN+GNN
INNR (Ren et al., 2019) Heterogeneous Graph Node2vec
GNewsRec (Hu et al., 2020a) Title+Entity+Heterogeneous Graph CNN+GNN
GERL (Ge et al., 2020) Title+Category+User-News Graph Transformer+GAT
MVL (Santosh et al., 2020) Title+Body+Category+User-News Graph CNN+Attention+GAT
GNUD (Hu et al., 2020b) Title+Entity+User-News Graph CNN+Disentangled GCN

 

Table 2. Comparison of different methods on news modeling.

There are a few methods that learn news representations from graphs. For example, IGNN (Qian et al., 2019) uses KCNN (Wang et al., 2018) to learn text-based news representations from news titles, and learns graph-based news representations from the user-news graph. GNewsRec (Hu et al., 2020a) is a hybrid approach which considers graph information of users and news as well as news topic categories. It uses the same architecture as DAN to learn text-based news representations, and uses a two-layer graph neural network (GNN) to learn graph-based news representations from a heterogeneous user-news-topic graph. GERL (Ge et al., 2020) learns news title representations with a combination of multi-head self-attention and additive attention networks, and combines title representations with the embeddings of news categories. MVL (Santosh et al., 2020) uses a content view to incorporate news title, body and category, and uses a graph view to enhance news representations with their neighbors on the user-news graph. In addition, it uses a graph attention network to enhance representations of news by incorporating the information of their first- and second-order neighbors on the user-news graph. GNUD (Hu et al., 2020b) also uses the same news encoder as DAN to learn text-based news representations, and uses a graph convolution network with a preference disentanglement regularization to learn disentangled news representations on user-news graphs. These methods can exploit the high-order information on graphs to enhance news modeling. However, it is difficult for these methods to handle newly generated news with few connections to existing nodes on the old graph used for training.
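The idea of enhancing news representations with neighbors on a user-news graph, and the resulting difficulty with newly published news, can be sketched as follows. This is a deliberately simplified illustration with plain mean aggregation instead of learned GNN or graph attention layers; the function `aggregate_neighbors` and the mixing weight `self_weight` are hypothetical.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def aggregate_neighbors(news_vecs, clicks, self_weight=0.5):
    """One round of mean aggregation on a user-news bipartite graph.

    news_vecs: dict news_id -> content-based vector.
    clicks:    dict user_id -> list of clicked news ids.
    """
    # Users are represented by the mean of the news they clicked.
    user_vecs = {u: mean([news_vecs[n] for n in items])
                 for u, items in clicks.items() if items}
    # Invert the click lists to find the readers of each news item.
    readers = {}
    for u, items in clicks.items():
        for n in items:
            readers.setdefault(n, []).append(u)
    enhanced = {}
    for n, vec in news_vecs.items():
        if n in readers:
            neigh = mean([user_vecs[u] for u in readers[n]])
            enhanced[n] = [self_weight * a + (1 - self_weight) * b
                           for a, b in zip(vec, neigh)]
        else:
            # A fresh article without clicks gets no graph signal at all.
            enhanced[n] = list(vec)
    return user_vecs, enhanced


news_vecs = {"n1": [1.0, 0.0], "n2": [0.0, 1.0], "n3": [1.0, 1.0]}
clicks = {"u1": ["n1", "n2"], "u2": ["n2"]}
user_vecs, enhanced = aggregate_neighbors(news_vecs, clicks)
```

Here `n3` plays the role of a newly published article: it has no edges, so its representation falls back to the content-based vector alone.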

To help better understand the relatedness and differences between the methods reviewed above, we summarize the information and models they used for learning news representations in Table 2. Next, we provide several discussions on the aforementioned methods for news modeling.

3.3. Discussions on News Modeling

3.3.1. Feature-based News Modeling

In feature-based news modeling methods, mining textual information of news is critical for representing news content. Many methods incorporate BOW/TF-IDF features or their variants to represent news texts, which are also popular in the NLP field. In addition, topic models like LDA are employed by various methods to extract topics from texts. This is probably because topic models are capable of mining the topic distributions of news articles and can also provide useful clues for inferring user interest on different topics. Moreover, since users may focus more on the entities or keywords in news, they are considered by many methods to summarize the content and topic of news, and can also be useful links to find similar news or map news on knowledge graphs. In particular, some methods also use ontologies such as Wikipedia to extract entity features and represent entities more accurately.
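As a reference point for the BOW/TF-IDF features discussed above, the following sketch computes one common TF-IDF variant (relative term frequency multiplied by log(N/df)). Other weighting and smoothing schemes exist, and variants such as CF-IDF replace terms with concepts; the function name `tf_idf` and the toy documents are illustrative only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = log(number of documents / documents containing the term)
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = [["election", "vote", "election"], ["football", "vote"]]
vectors = tf_idf(docs)
```

Terms that occur in every document, such as "vote" here, receive zero weight, which is why TF-IDF emphasizes words that distinguish one news article from the others.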

Besides the texts of news, many methods utilize other information of news. For instance, the categories or clusters of news are popular news features to help model news content. In addition, several dynamic features of news, such as popularity and recency, are also widely employed in feature-based news modeling methods. Since many users may pay more attention to popular events and news usually becomes outdated quickly, incorporating news popularity and recency can help build more informative news representations. Besides, environmental factors such as locations and time are also utilized by several methods. This is because considering the locations of news can help provide news related to users’ neighborhoods, and using the timestamps of news may be useful for providing time-aware news services.

A few methods also study incorporating other interesting features. For example, the sentiment information of news is useful for news understanding, because users may have different tastes on the sentiment of news. The bias of news may also need to be taken into consideration, because recommending news with biased opinions and facts may hurt user experience and the reputation of news platforms. Finally, although several non-personalized news recommendation methods have used news images to build news representations (Lommatzsch et al., 2018), few personalized ones consider the visual information of news, which is very useful for news modeling.

Although feature-based news modeling methods have comprehensive coverage of various news information, they usually require a large amount of domain knowledge for feature design. In addition, handcrafted features are usually not optimal in representing the textual content of news due to the absence of the contexts and orders of words.

3.3.2. Deep Learning-based News Modeling

Among all the reviewed methods, only two methods, i.e., DNA (Zhang et al., 2019a) and DeepJoNN (Zhang et al., 2018), directly incorporate the embeddings of news IDs. This is probably because of the short lifecycle of news articles and the quick generation of novel news, which make the coverage of news IDs in the training set very limited. Thus, it is very important to understand news from their content.

News text modeling is critical for news understanding. Most methods use news titles to model news, because news titles usually have a decisive influence on users’ click behaviors. Several methods such as EBNR (Okura et al., 2017), NAML (Wu et al., 2019a) and CPRS (Wu et al., 2020d) use news bodies to enhance news representations, since news bodies contain more detailed information of news. In existing methods, CNN is the most frequently used architecture for text modeling. This is because local contexts in news articles are important for modeling news content, and CNN is effective and efficient in capturing local contexts. In addition, since different news information may have different informativeness in modeling news content and user interest, attention mechanisms are also widely used to build news representations by selecting important features. With the success of Transformer in NLP, many methods also use Transformer-like architectures for news modeling, such as NRMS (Wu et al., 2019d) and CPRS (Wu et al., 2020d). In addition, a few methods use pre-trained language or visiolinguistic models to empower news modeling (Wu et al., 2021c, e). These advanced NLP techniques can greatly improve news content understanding, which is very important for personalized news recommendation. However, these methods mainly aim to capture the semantic information of news and may not be aware of the knowledge and commonsense information encoded in news.

To address this issue, many methods incorporate news entities into news modeling to learn knowledge-aware news representations. Some methods such as DAN (Zhu et al., 2019) directly use entity texts to represent entities, while several other methods like DKN (Wang et al., 2018) use knowledge graph embeddings to represent entities. These entity representations are usually combined with representations learned from news texts to better model news content. However, there are many new entities and concepts emerging in news and it may be difficult to accurately represent them with off-the-shelf knowledge bases.

Several methods incorporate the topic categories of news into news modeling, because news topics are very useful for understanding news content and inferring user interest. Considering scenarios in which some news articles are not labeled with topic categories, some methods such as TANR (Wu et al., 2019e) and CHAMELEON (Gabriel De Souza et al., 2019) also adopt auxiliary tasks that predict news topic categories to encode topic information into news representations. In addition, a few methods study using other kinds of news features such as sentiment, popularity and recency (Wu et al., 2020c; Qi et al., 2021b), which can help better understand the characteristics of news. However, some additional news features (e.g., category) may be unavailable in certain scenarios, which limits the application of these methods.

A few methods also explore enhancing news modeling with graph information (Ge et al., 2020; Hu et al., 2020a). These methods can incorporate the high-order information on user-news bipartite graphs (Qian et al., 2019; Ge et al., 2020; Hu et al., 2020b; Santosh et al., 2020) or more complicated heterogeneous graphs (Ren et al., 2019; Hu et al., 2020a), which can provide useful contexts for understanding the characteristics of news for news recommendation. However, since the graphs used in these methods are static, they may have some difficulties in accurately representing newly published news.

In summary, by reviewing news modeling techniques used in existing news recommendation methods, we can see that news modeling is still a quite challenging problem in news recommendation due to the variety, dynamics, and timeliness of online news information.

4. User Modeling

User modeling is also a critical step in personalized news recommender systems to infer users’ personal interests in news. It is usually important for user modeling algorithms to understand users from their behaviors (Wu et al., 2019e). An example user modeling framework in personalized news recommendation is shown in Fig. 4. We can see that user modeling is based on the modeling of news that users have interactions with, and it introduces additional user features to achieve better personalized user understanding. The techniques for user modeling in existing news recommendation methods can also be classified into feature-based ones and deep learning-based ones, which are introduced in the following sections.

Figure 4. An example framework of user modeling.

4.1. Feature-based User Modeling

Feature-based user modeling methods use handcrafted features to represent users. Similar to news modeling, in CF-based methods users are also represented by their IDs (Resnick et al., 1994; Das et al., 2007). However, ID-based user modeling methods usually suffer from data sparsity. Thus, most methods consider the behaviors of users such as news clicks to model their interest. An intuitive way is to use the features of clicked news to build user features. For example, Goossen et al. (Goossen et al., 2011) used the CF-IDF features of clicked news to represent user interest. Capelle et al. (Capelle et al., 2012) proposed to use the SF-IDF features of clicked news for user modeling. Garcin et al. (Garcin et al., 2012) proposed to model users by averaging the LDA features of all clicked news into a user vector. However, it is difficult for these methods to model users accurately when their news click behaviors are sparse.
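The intuitive strategy of building user features from the features of clicked news can be sketched with a few lines of code. The example averages per-news topic vectors into a user profile, in the spirit of the averaging approach mentioned above, and then ranks candidates by cosine similarity. The ranking step, the function names and the toy vectors are assumptions made for illustration and are not taken from a specific reviewed method.

```python
import math

def average_profile(clicked_vecs):
    """Average the feature vectors of clicked news into one user vector."""
    dim = len(clicked_vecs[0])
    return [sum(v[k] for v in clicked_vecs) / len(clicked_vecs) for k in range(dim)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy 2-topic distributions (politics, sports) of two clicked articles.
clicked = [[0.8, 0.2], [0.6, 0.4]]
profile = average_profile(clicked)
candidates = {"politics": [0.9, 0.1], "sports": [0.1, 0.9]}
ranked = sorted(candidates, key=lambda n: cosine(profile, candidates[n]), reverse=True)
```

With only one or two clicks the averaged profile is dominated by individual articles, which reflects the sparsity problem noted in the text.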

Besides news features, many methods consider other supplementary information of users in user modeling. For instance, in the MONERS (Lee and Park, 2007) recommender system, users are clustered into segments, and the preferences of user segments on news categories and news articles are used to represent users. In addition, the demographics of users, such as age, gender and profession, are also useful information for user modeling because users in different demographic groups usually have different preferences on news. Thus, user demographic features are incorporated by several methods (Lee and Park, 2007; Yeung and Yang, 2010; Ilievski and Roy, 2013). For instance, Yeung et al. (Yeung and Yang, 2010) proposed to use the age, gender, occupation status and social economic grade of users to help identify their different preferences on news in different categories. Chu et al. (Chu and Park, 2009) used the age and gender categories of users to model their characteristics. Besides, the location information of users is also very useful for accurate user modeling, and it has been used by several location-aware news recommendation methods (Fortuna et al., 2010; Noh et al., 2014). However, some kinds of user features such as locations and demographics are privacy-sensitive, and many users may not provide their accurate personal information.

Since news clicks may not necessarily indicate user interests, several methods also consider other kinds of user behaviors or feedback. For example, Gershman et al. (Gershman et al., 2011) proposed to represent users by the news they carefully read (regarded as positive news), rejected, and scrolled (both are regarded as negative news). In addition, users’ dwell time on clicked news is also an important indication of user interest, and Yi et al. (Yi et al., 2014) studied using dwell time as the weights of clicked news for user modeling. Besides these user behaviors, several other kinds of user behavior information, such as access patterns, are utilized by a few methods (Li et al., 2011b; Saranya and Sadhasivam, 2012) to capture users’ habits on news reading.

Several methods also consider graph information (e.g., news-user graphs) in user modeling (Gharahighehi et al., 2020). For example, Li et al. (Li and Li, 2013) proposed a news personalization method by using hypergraph to model various high-order interactions between different news information, where users are represented by subgraphs of the hypergraph. Garcin et al. (Garcin et al., 2013) proposed to use context trees for user modeling. They constructed context trees based on the sequence of articles, the sequence of topics and the distribution of topics. Trevisiol et al. (Trevisiol et al., 2014) proposed to build a browsing graph from the news browsing histories of users on Yahoo News. Joseph et al. (Joseph and Jiang, 2019) proposed to represent users by regarding the clicked news as subgraphs of a knowledge graph, which are constructed via entity linking. These methods can consider the high-order information on graphs to help understand user behaviors, which can improve user modeling.

A few methods combine user IDs with other user features in user modeling (Lommatzsch, 2014). For example, NewsWeeder (Lang, 1995) used user IDs and the bag-of-words features of clicked news to represent users. Claypool et al. (Claypool et al., 1999) used user IDs and keywords of clicked news for user modeling. Liu et al. (Liu et al., 2010) proposed to represent users using their IDs and user interest features predicted by a Bayesian model. These methods can mitigate the drawbacks of ID-based user modeling and meanwhile incorporate useful personal information encoded by user IDs.

Considering the evolutionary characteristics of user interest, some methods model both long-term and short-term user interests (Cantador et al., 2011; Li et al., 2014). NewsDude (Billsus and Pazzani, 2000) may be one of the earliest methods that consider long short-term user interests. In this approach, users are represented by a hybrid model, which models short-term interest of users based on recently browsed news, and models long-term user interest by sorting words of news in each category with respect to their TF-IDF values and selecting the top-ranked words. Li et al. (Li et al., 2011c) proposed LOGO, which is a news recommendation method that models both long-term and short-term user interests. LOGO uses a weighted summation of the topic distributions of news clicked by users to indicate long-term user interest, and it uses the topic distribution of the latest clicked news as the short-term user interest. Viana et al. (Viana and Soares, 2017) proposed another news recommendation method based on long short-term user interest. In their method, the long-term interest of users is represented by the frequency of a specific tag being read by this user, and short-term interest is represented by several recently clicked news. Different from other methods that only consider short-term or long-term user interests, these methods can better model the evolution of user interests by capturing long short-term user interests.
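A long short-term profile in the style described for LOGO can be sketched as a weighted summation of topic distributions plus the topic distribution of the latest click. The text does not specify how the weights are chosen, so the exponential time decay (`decay`) and the blending coefficient (`alpha`) below are assumptions introduced purely for illustration, as is the function name.

```python
def long_short_term_profile(topic_dists, decay=0.8, alpha=0.5):
    """Build long-term, short-term and blended interest vectors.

    topic_dists: topic distributions of clicked news, oldest first.
    decay:       assumed per-step discount for older clicks.
    alpha:       assumed weight of the long-term part in the blend.
    """
    n = len(topic_dists)
    # The most recent click has weight 1; older clicks are discounted.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    dim = len(topic_dists[0])
    long_term = [sum(w * d[k] for w, d in zip(weights, topic_dists)) / total
                 for k in range(dim)]
    short_term = list(topic_dists[-1])
    combined = [alpha * l + (1 - alpha) * s for l, s in zip(long_term, short_term)]
    return long_term, short_term, combined


# A user who mostly read topic 0 but just clicked an article on topic 1.
history = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
long_term, short_term, combined = long_short_term_profile(history)
```

The blended vector shifts toward the newly clicked topic faster than the long-term vector alone, which is the behavior that motivates modeling both kinds of interest.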

To help readers better understand feature-based user modeling methods in personalized news recommender systems, we summarize the additional user features (ID and news features are excluded) used in these methods in Table 3.

Features for User Representation* References

Demographic (Lee and Park, 2007)(Yeung and Yang, 2010)(Ilievski and Roy, 2013)(Chu and Park, 2009)(Li et al., 2010)(Wei et al., 2021)
Cluster/Segment (Lee and Park, 2007)(Yeung and Yang, 2010)(Zheng et al., 2018)(Darvishy et al., 2020)(Manoharan and Senthilkumar, 2020)
Tag/Keyword (Jonnalagedda and Gauch, 2013)(Yeung and Yang, 2010)(Darvishy et al., 2020)(Gao et al., 2009)(Billsus and Pazzani, 2000)(Cantador et al., 2011)(Jonnalagedda et al., 2016)(Sood and Kaur, 2014a)
Location (Fortuna et al., 2010)(Yeung and Yang, 2010)(Tavakolifard et al., 2013)(Noh et al., 2014)(Ilievski and Roy, 2013)(Viana and Soares, 2017)(Li et al., 2010)(Kazai et al., 2016)(Chu and Park, 2009)(Wei et al., 2021)
Access Pattern (Li et al., 2011b)(Saranya and Sadhasivam, 2012)(Wei et al., 2021)
Behaviors on Other Platforms (Li et al., 2010)(Gu et al., 2014)(Bai et al., 2017)(Hsieh et al., 2016)(Kazai et al., 2016)(Phelan et al., 2011)(Phelan et al., 2009)(Lian et al., 2018)
Table 3. Additional features used for user representation. *ID/textual features of clicked news are excluded because they are incorporated by most methods.

4.2. Deep Learning-based User Modeling

In recent years, many personalized news recommendation methods use deep learning techniques for user modeling to remove the need for manual feature engineering. Most of them infer user interests from historical news click behaviors. EBNR (Okura et al., 2017) learns representations of users from the representations of their browsed news via a GRU network. Khattar et al. (Kumar et al., 2017b) used the summation of clicked news representations weighted by an exponential discounting function, where more recent clicks gain higher weights. RA-DSSM (Kumar et al., 2017a) uses a bi-directional long short-term memory (Bi-LSTM) network to process the historical news click sequence, and then uses a news-level attention network to form a user representation. WE3CN (Khattar et al., 2018) learns representations of users from the 3D representation tensors of their clicked news using a 3D CNN model. NAML (Wu et al., 2019a) and KRED (Liu et al., 2020) learn user representations from the representations of clicked news using a news-level attention network. DKN (Wang et al., 2018) learns user representations from the representations of clicked news via a candidate-aware attention network, i.e., computing the attention weight of each clicked news according to its relevance to candidate news. The candidate-aware attention mechanism is also used by TEKGR (Lee et al., 2020) for user modeling. Liu et al. (Liu et al., 2019) use a simple time-decayed averaging of the embeddings of clicked news to build the user embedding. DAN (Zhu et al., 2019) learns user representations from clicked news using a combination of attentive LSTM and candidate-aware attention, which generate the user historical sequential embedding and the user interest embedding, respectively. NRMS (Wu et al., 2019d) learns contextual news representations by using a news-level multi-head self-attention network, and uses an additive attention network to form the user representation. This user modeling architecture is also adopted by other methods such as FairRec (Wu et al., 2021g) and SentiRec (Wu et al., 2020c). MM-Rec (Wu et al., 2021e) uses a crossmodal candidate-aware attention network that selects clicked news based on their crossmodal relatedness with candidate news for user modeling. UniRec (Wu et al., 2021f) learns the user embedding for news ranking with the NRMS (Wu et al., 2019d) model, and then uses this embedding as the attention query to select a set of basis interest embeddings and aggregate them into a user embedding for news recall. CAGE (Sheu and Li, 2020) first uses a GCN model to capture the relations between different behaviors within a news session to refine the behavior representations, and then uses a GRU network to build user representations. 
HieRec (Qi et al., 2021c) uses a hierarchical user interest representation method that first models subtopic-level user interest from the news within the same subtopic, then aggregates subtopic-level interest representations into coarse-grained topic-level user interest representations, and finally synthesizes topic-level interest representations into an overall interest representation. KIM (Qi et al., 2021a) uses a user-news co-encoder that models the interactions between candidate news and clicked news to collaboratively learn a candidate-aware user interest representation and a user-aware candidate news representation. PP-Rec (Qi et al., 2021b) uses a popularity-aware user modeling method that first uses self-attention to model the contexts of user behaviors and then uses a content-popularity joint attention network that selects clicked news according to their content and popularity for user interest modeling. DebiasRec (Yi et al., 2021) uses a bias-aware user modeling module to learn debiased user interest representations by considering the influence of presentation bias information on click behaviors. User-as-Graph (Wu et al., 2021a) is probably the first work in news recommendation that represents each user with a personalized heterogeneous graph constructed from click behaviors, where the user modeling task is formulated as a graph pooling problem. It uses a heterogeneous graph pooling method named HG-Pool to iteratively summarize the personalized heterogeneous graph for learning user interest representations. EEG (Zhang et al., 2021) models each user as an entity graph. It first uses a graph neural network to learn hidden entity representations, and then uses an attention network to aggregate them into an entity-based user representation. A few methods also consider the ID information of users (Zhang et al., 2018, 2019a). 
For example, NPA (Wu et al., 2019b) uses a news-level personalized attention network to select important news according to user characteristics, where the embeddings of user IDs are used to generate the attention queries. LSTUR (An et al., 2019) learns short-term user interest embeddings by a GRU network, and models long-term user interests by the embeddings of user IDs. To fuse the two kinds of user representations, LSTUR explores two methods, i.e., concatenating two vectors together, or using the long-term user interest embedding to initialize the hidden state of the GRU network. All these aforementioned methods mainly rely on the information of users’ click behaviors. However, click behaviors are very noisy and may not necessarily indicate user interest.
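As a minimal sketch of the exponentially discounted summation and time-decayed averaging described earlier in this subsection, the following function builds a user embedding in which more recent clicks gain higher weights. The function name, the `decay` value and the normalization by the weight sum are illustrative assumptions, not the exact formulation of any cited method.

```python
def time_decayed_user_embedding(news_embeddings, decay=0.8):
    """Build a user embedding from clicked-news embeddings (oldest first).

    `decay` in (0, 1] is a hypothetical hyperparameter: the i-th most
    recent click receives weight decay**i, so recent clicks dominate.
    """
    if not news_embeddings:
        raise ValueError("at least one clicked news embedding is required")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    n = len(news_embeddings)
    dim = len(news_embeddings[0])
    # The latest click (index n - 1) gets weight 1, older clicks get less.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    # Normalized weighted sum (the normalization is an assumption).
    return [
        sum(w * emb[k] for w, emb in zip(weights, news_embeddings)) / total
        for k in range(dim)
    ]
```

With `decay=1.0` this reduces to a plain average of the clicked news embeddings.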

A few methods incorporate other kinds of user information to enhance user interest modeling (Zhang et al., 2019b). For example, CHAMELEON (Gabriel De Souza et al., 2019; de Souza Pereira Moreira et al., 2018) uses several user context features like time, device, location and referrer. It uses a UGRNN network to learn representations of users in a session, and the click score is evaluated by the cosine similarity between user and candidate news representations. Moreover, several methods study using different kinds of user behaviors in user modeling. NRHUB (Wu et al., 2019c) considers heterogeneous user behaviors, including news clicks, search queries, and browsed webpages. It incorporates different kinds of user behaviors as different views of users by learning a user embedding from each kind of user behavior separately, where a combination of CNN and attention network is used to learn behavior representations and a behavior attention network is used to learn a user embedding by selecting important user behaviors. The user embeddings from different views are aggregated into a unified one via a view attention network. CPRS (Wu et al., 2020d) considers users’ click and reading behaviors in user modeling. It models the click preference of users from the titles of clicked news, and models their reading satisfaction from the body of clicked news as well as the personalized reading speed metric derived from dwell time and body length. NRNF (Wu et al., 2020a) uses a dwell time threshold to divide clicked news into positive ones and negative ones. It uses separate Transformers and attention networks to learn positive and negative user interest representations. FeedRec (Wu et al., 2021d) uses various kinds of user feedback including click, nonclick, finish, quick close, share and dislike to model user interest. 
It uses a heterogeneous Transformer to model the relatedness between all kinds of feedback and uses different homogeneous Transformers to model the interactions between the same kind of feedback. In addition, it uses a strong-to-weak attention network that uses the representations of strong feedback to distill real positive and negative user interest information from weak feedback. Besides, it incorporates behavior type, dwell time and time interval to enhance behavior understanding for user interest modeling. These methods can usually infer user interests more accurately by mining complementary information encoded in additional user features and behaviors.
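The dwell-time thresholding attributed to NRNF above can be illustrated with a short sketch. The threshold value, the function name and the `(news_id, dwell_seconds)` input format are hypothetical; the source does not state the threshold that NRNF uses.

```python
def split_clicks_by_dwell_time(clicks, threshold_seconds=10.0):
    """Split clicked news into positive and negative clicks.

    clicks: list of (news_id, dwell_seconds) pairs.
    threshold_seconds: hypothetical cut-off; clicks with a shorter dwell
    time are treated as negative feedback despite the click.
    """
    positive, negative = [], []
    for news_id, dwell_seconds in clicks:
        if dwell_seconds >= threshold_seconds:
            positive.append(news_id)
        else:
            negative.append(news_id)
    return positive, negative
```

The two resulting lists could then be encoded separately, in the spirit of the separate positive and negative interest representations described above.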

There are also several methods that learn user representations on graphs. For example, IGNN (Qian et al., 2019) learns content-based user representations using the average embedding of clicked news, and learns graph-based user representations from the user-news graph via a graph neural network. The content-based user representation is concatenated with the graph-based user representation to form a unified one. GNewsRec (Hu et al., 2020a) uses the same architecture as DAN to learn short-term user representations, and uses a two-layer graph neural network (GNN) to learn long-term user representations from a heterogeneous user-news-topic graph. The short-term and long-term user representations are concatenated to build a unified user representation. GERL (Ge et al., 2020) uses multi-head self-attention and additive attention networks to form content-based user representations from the click history. In addition, it uses a graph attention network to learn graph-based representations of users by capturing high-order information on the user-news graph, which are further combined with the content-based user representations. MVL (Santosh et al., 2020) uses attention networks to learn user interest representations in a content view, and uses a graph attention network to model user interest from the user-news graph in a graph view. GNUD (Hu et al., 2020b) uses a disentangled graph convolution network to learn user representations from the user-news graph. These methods can exploit the high-order information on graphs to enhance user modeling. However, it is challenging for them to accurately represent new users who do not appear in the graph used for model training.

Methods Information Used Model
EBNR (Okura et al., 2017) News Click GRU
RA-DSSM (Kumar et al., 2017a) News Click Bi-LSTM+Attention
Khattar et al. (Kumar et al., 2017b) News Click Exponential-decayed Average
3-D-CNN (Kumar et al., 2017c) News Click Word2vec
Park et al. (Park et al., 2017) News Click LSTM
WE3CN (Khattar et al., 2018) News Click 3-D CNN
NAML (Wu et al., 2019a) News Click Attention
TANR (Wu et al., 2019e) News Click Attention
KRED (Liu et al., 2020) News Click Attention
PLM-NR (Wu et al., 2021c) News Click Attention
DKN (Wang et al., 2018) News Click Candidate-Aware Attention
TEKGR (Lee et al., 2020) News Click Candidate-Aware Attention
Liu et al. (Liu et al., 2019) News Click Time-decayed Average
DAN (Zhu et al., 2019) News Click LSTM+Self-Attention+Candidate-Aware Attention
Gao et al. (Gao et al., 2018) News Click Candidate-Aware+Multi-Head Attention
Saskr (Chu et al., 2019) News Click Self-Attention+Candidate-Aware Attention
NRMS (Wu et al., 2019d) News Click Self-Attention+Attention
UniRec (Wu et al., 2021f) News Click Self-Attention+Attention+Basis User Embedding
FedRec (Qi et al., 2020) News Click Self-Attention+Attention+GRU
FairRec (Wu et al., 2021g) News Click Transformer+Attention
SentiRec (Wu et al., 2020c) News Click Transformer+Attention
MM-Rec (Wu et al., 2021e) News Click Crossmodal Candidate-Aware Attention
CAGE (Sheu and Li, 2020) News Click GCN+GRU
HieRec (Qi et al., 2021c) News Click Hierarchical Attention
KIM (Qi et al., 2021a) News Click User-News Co-Encoder
PP-Rec (Qi et al., 2021b) News Click Content-Popularity Joint Attention Network
DebiasRec (Yi et al., 2021) News Click Content Attention+Bias Attention
User-as-Graph (Wu et al., 2021a) News Click Heterogeneous Graph Pooling
EEG (Zhang et al., 2021) News Click GNN+Attention
DeepJoNN (Zhang et al., 2018) News Click+User ID LSTM
NPA (Wu et al., 2019b) News Click+User ID Personalized Attention
LSTUR (An et al., 2019) News Click+User ID GRU+ID Embedding
DNA (Zhang et al., 2019a) News Click+User ID+Time Interval Attention+CNN
CHAMELEON (Gabriel De Souza et al., 2019) News Click+Context Features UGRNN
DAINN (Zhang et al., 2019b) News Click+Time+Location Attention
NRHUB (Wu et al., 2019c) News Click+Query+Webpage Attention
CPRS (Wu et al., 2020d) News Click+News Reading Attention+Content-Satisfaction Attention
NRNF (Wu et al., 2020a) Positive News Click+Negative News Click Transformer+Attention
FeedRec (Wu et al., 2021d) News Click+Nonclick+Finish+Quick Close+Share+Dislike Transformer+Strong-to-Weak Attention
IGNN (Qian et al., 2019) News Click+User-News Graph GNN
INNR (Ren et al., 2019) Heterogeneous Graph Node2vec
GNewsRec (Hu et al., 2020a) News Click+Heterogeneous Graph LSTM+Attention+GNN
GERL (Ge et al., 2020) News Click+User-News Graph Self-Attention+GAT
MVL (Santosh et al., 2020) News Click+User-News Graph Self-Attention+GAT
GNUD (Hu et al., 2020b) User-News Graph Disentangled GCN
Table 4. Comparison of different methods on user modeling.

We summarize the user information and user modeling techniques used in these deep learning-based methods in Table 4. We then discuss the user modeling methods introduced in this section.

4.3. Discussions on User Modeling

4.3.1. Feature-based User Modeling

Most feature-based methods construct user profiles based on the collections of features extracted from the clicked news. Besides the news information, some methods leverage additional user features to facilitate user modeling. For example, the demographics of users (e.g., age, gender and profession) are used in several methods, since users with different demographics usually have different preferences on news. The location of users can be used to identify the news related to the user’s neighborhood, and the access patterns of users can also help understand the news click behaviors of users. In addition, many methods use the tags or keywords of users to indicate user interest, and cluster users based on their characteristics. In this way, the recommender system can more effectively recommend news according to users’ interest in different topics. Moreover, several methods incorporate user behaviors on other platforms, such as social media, search engines and e-commerce platforms. These behaviors can not only facilitate user interest modeling, but also have the potential to mitigate the cold-start problem on the news platform if user data can be successfully aligned. However, feature-based user modeling methods usually require substantial expertise for feature design and validation, and may not be optimal for representing user interests.

4.3.2. Deep Learning-based User Modeling

Deep learning-based user modeling methods usually aim to learn user representations from user behaviors without feature engineering. Many of them infer user interests merely from click behaviors, because click behaviors are implicit indications of users’ interest in news. However, click behaviors are usually noisy and they do not necessarily indicate real user interests. Thus, many methods consider other kinds of information in user modeling. For example, some methods such as NPA and LSTUR incorporate the IDs of users to better capture users’ personal interest. CHAMELEON and DAINN consider the context features of users such as devices and user locations. CPRS and FeedRec incorporate multiple kinds of user feedback on the news platform to consider user engagement information in user interest modeling. GERL and GNewsRec can exploit the high-order information on graphs to encode user representations. However, it is still difficult for these methods to accurately infer user interests when user behaviors on the news platform are sparse. There is only one method, i.e., NRHUB, that considers users’ behaviors on multiple platforms, which can still model users accurately even when user behaviors on the news platform are sparse. However, there exist some difficulties in linking user data on different platforms due to privacy reasons.

According to the summarization in Table 4, we can see that the model architectures used for user representation learning are diverse. Some methods utilize recurrent neural networks to capture the relatedness of news clicked by users, such as EBNR, DAN and CHAMELEON. With the great success of Transformer models, many methods also use self-attention or Transformer networks to model the global contexts of user behaviors. However, these sequential models cannot effectively model the high-order relations between user behaviors, which can provide useful contexts for user interest understanding. Instead of modeling user behaviors as a sequence, in User-as-Graph each user is modeled as a personalized heterogeneous graph, where the relations between behaviors can be fully modeled. In addition, several works such as GERL and GNUD use graph neural networks to capture the high-order interactions between users and news on the global user-news graphs, which can also help better understand user interest. However, the computational cost of these graph-based architectures is usually much higher than that of sequential models.

To select clicked news that is informative for inferring user interest, attention mechanisms are widely used by many methods. In some works such as NAML and KRED, the attention query is a global parameter vector, which is invariant with respect to different users. In the NPA method, the attention query is generated by the embedding of user ID, which can achieve personalized news selection. Both kinds of attention mechanisms are efficient in the online test phase because user representations can be prepared in advance (Wu et al., 2019b). However, the relatedness between candidate news and clicked news cannot be fully modeled, which may not be optimal in modeling user interests in a specific candidate news. Another kind of attention mechanism, i.e., candidate-aware attention, is also widely used by many methods such as DKN, DAN and KIM. In candidate-aware attention networks, the representation of candidate news is used as the attention query, and user representations can be dynamically constructed based on candidate news. However, they need to memorize the representations of all clicked news in the test phase, which may lead to some sacrifice in efficiency.
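The difference between the two kinds of attention discussed above can be sketched as follows. This is a simplified illustration, assuming dot-product attention scores and plain Python lists as vectors; the cited methods use learned projections (e.g., additive attention), and all function names here are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(clicked, query):
    """Weighted sum of clicked-news vectors with weights softmax(query . news)."""
    weights = softmax([dot(query, h) for h in clicked])
    dim = len(clicked[0])
    user = [sum(w * h[k] for w, h in zip(weights, clicked)) for k in range(dim)]
    return user, weights


def click_score_global(clicked, global_query, candidate):
    # The user vector does not depend on the candidate, so it can be
    # computed once in advance and reused for every candidate news.
    user, _ = attention_pool(clicked, global_query)
    return dot(user, candidate)


def click_score_candidate_aware(clicked, candidate):
    # The candidate acts as the attention query, so the user vector must
    # be rebuilt from all clicked-news vectors for each candidate.
    user, _ = attention_pool(clicked, candidate)
    return dot(user, candidate)
```

Replacing `global_query` with a vector derived from a user ID embedding would give a personalized but still candidate-independent query, in the spirit of the NPA description above.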

Some methods study modeling multiple types of user interests. For example, LSTUR, GNewsRec and FedRec consider both long-term and short-term interests of users to better capture their interest dynamics. HieRec models the hierarchical structure of user interests, which can capture user interests at different granularities. These methods can improve user interest understanding by taking different kinds of user interest into consideration. However, user interests are diverse and evolving, and are still difficult for these methods to model comprehensively and accurately.

In summary, by reviewing user modeling techniques used in existing news recommendation methods, we argue that user modeling remains challenging for many reasons, such as the noise and sparsity of user behaviors, the diverse and dynamic characteristics of user interests, and the difficulties in modeling user interests in a specific candidate news effectively and efficiently.

5. News Ranking

On the basis of news and user modeling, news ranking aims to rank candidate news for personalized display according to users’ personal interest. Common news ranking techniques can be divided into two categories, i.e., relevance-based and reinforcement learning-based. We introduce them in the following sections.

5.1. Relevance-based News Ranking

Relevance-based news ranking methods usually rank candidate news based on their personalized relevance to user interests. In these methods, how to accurately measure the relevance between candidate news and user interest is a core problem. Many methods directly evaluate the user-news relevance based on the similarities of their final representations. For instance, Goossen et al. (Goossen et al., 2011) computed the cosine similarities between the CF-IDF feature vectors of users and news to measure their relevance. Garcin et al. (Garcin et al., 2012) used the similarities between the news topic vectors and the user topic vector to evaluate their relevance. Okura et al. (Okura et al., 2017) used the inner product between news and user representations to predict the relevance scores. DFM (Lian et al., 2018) uses an inception module that combines neural networks with different depths to compute the relevance scores from news and user features. However, user interests are usually diverse, and candidate news may only match the user interests indicated by part of the clicked news. Thus, a few methods use fine-grained interest matching methods to better model the relevance between users’ interest and candidate news. For example, FIM (Wang et al., 2020b) first multiplies together the word representations of candidate news and clicked news, and then uses a matching module with 3-D CNN networks to compute relevance scores by capturing the fine-grained relatedness between candidate news and clicked news. KIM (Qi et al., 2021a) first uses a knowledge-aware news co-encoder to model the relatedness between words and entities in candidate news and clicked news, and then uses a user-news co-encoder to model the interactions between clicked news and candidate news for better relevance modeling. HieRec (Qi et al., 2021c) has a hierarchical interest matching mechanism that matches candidate news with the fine-grained subtopic-level user interest, the coarse-grained topic-level user interest and the overall user interest. These methods can more accurately evaluate the relevance between candidate news and user interest by modeling their fine-grained and multi-grained relatedness, which can help generate news ranking results that better target user interest.
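The simplest form of relevance-based ranking described above, i.e., scoring each candidate by the similarity between the final user and news representations, can be sketched as follows. The function names and the dictionary input format are illustrative assumptions; cosine similarity is used here, and an inner product could be substituted.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def rank_by_relevance(user_vec, candidates):
    """Return candidate news IDs sorted by descending relevance.

    candidates: dict mapping news_id -> news representation vector.
    """
    return sorted(
        candidates,
        key=lambda news_id: cosine(user_vec, candidates[news_id]),
        reverse=True,
    )
```

Because the user vector here is a single unified representation, this sketch shares the limitation noted above: a candidate that matches only part of the user's interests may receive a low score.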

In most methods, candidate news with higher relevance to user interest will gain higher ranks. However, these methods may tend to recommend news that is similar to the news previously clicked by users, which is also called the “filter bubble” problem. Thus, some news ranking methods explore recommending news that is somewhat different from previously clicked news. For example, Newsjunkie (Gabrilovich et al., 2004) is a system that ranks news articles based on their novelty in the context of the news that users previously clicked. SCENE (Li et al., 2011b) first ranks news articles based on their relevance to user interests, and then refines the ranking list based on news popularity and recency to form the final recommendation list. Different from the methods that are solely based on the relevance between candidate news and user interests, these methods have the potential to provide more diverse recommendations.

5.2. Reinforcement Learning-based News Ranking

Different from relevance-based ranking methods that mainly aim to optimize the objectives (e.g., clicks) on current candidate news articles, reinforcement learning-based ranking methods usually aim to optimize the total reward in the long term. A representative reinforcement learning-based approach to personalized news recommendation is LinUCB (Li et al., 2010), which models the problem of personalized news recommendation as a contextual bandit problem. In this method, LinUCB computes the payoff by a hybrid linear model, which means that some parameters are shared by all arms, while the others are not. LinUCB can outperform context-free bandit methods such as ε-greedy and Upper Confidence Bound (UCB), and it is computationally efficient because the block parameters in LinUCB have fixed dimensions and can be incrementally updated (Li et al., 2010). It was later evaluated by (Li et al., 2011d) in an unbiased manner by estimating the per-trial payoff directly from log data rather than with a simulator. In the CLEF NewsREEL 2017 challenge, Liang et al. (Liang et al., 2017) also developed a system based on LinUCB. The LinUCB model is used to help choose the appropriate recommender from a pool of recommendation algorithms based on user and news features. Deep reinforcement learning is also explored in news recommendation. For example, DRN (Zheng et al., 2018) uses a Deep Q-Network (DQN) to estimate the policy reward, which is a weighted summation of click labels and the activeness of users that is computed based on their return time after recommendations. In addition, DRN applies the Dueling Bandit Gradient Descent (Yue and Joachims, 2009) algorithm to eliminate the recommendation performance decline brought by classical exploration methods such as ε-greedy and UCB. Different from relevance-based ranking methods, reinforcement learning-based ranking methods have the ability of exploration, which can increase the diversity of recommendation results and further discover potential user interests.
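To make the contextual bandit formulation more concrete, the following is a minimal sketch of the simpler disjoint variant of LinUCB, in which every arm keeps its own parameters. It does not implement the hybrid model with shared parameters described above. The class and function names, the `alpha` value and the use of a Sherman-Morrison update to maintain the inverse matrix incrementally are choices made for this illustration.

```python
import math


class LinUCBArm:
    """Per-arm state for disjoint LinUCB with a d-dimensional context."""

    def __init__(self, dim):
        # a_inv stores A^{-1}, where A starts as the identity matrix.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def ucb(self, x, alpha):
        """Estimated payoff plus exploration bonus for context x."""
        a_inv_x = self._a_inv_times(x)
        theta = self._a_inv_times(self.b)  # ridge-regression estimate
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        """Incremental update after observing `reward` for context x."""
        # Sherman-Morrison rank-one update of A^{-1} for A <- A + x x^T.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        d = len(x)
        for i in range(d):
            for j in range(d):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(d):
            self.b[i] += reward * x[i]


def select_arm(arms, x, alpha=1.0):
    """Pick the arm (e.g., a news article) with the highest upper confidence bound."""
    return max(arms, key=lambda name: arms[name].ucb(x, alpha))
```

A larger `alpha` widens the confidence bonus and therefore favors exploration of rarely shown arms, which relates to the exploration ability discussed above.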

5.3. Discussions on News Ranking

In this section we provide some discussions on the news ranking methods in existing personalized news recommender systems. Relevance-based news ranking methods mainly need to accurately evaluate the relevance between candidate news and user interest for subsequent news ranking. Many methods model their overall relevance by evaluating the relevance between the unified representations of user interest and candidate news. However, candidate news usually can only match part of user interests, and directly matching the overall user interest with candidate news may be suboptimal. A few methods evaluate the relevance between user interest and candidate news in a fine-grained way by modeling the relatedness between candidate news and clicked news, which can improve the accuracy of relevance modeling for news ranking. However, these methods are much more time-consuming because the representations of users are dependent on candidate news and cannot be computed in advance. Moreover, pure relevance-based interest matching methods may tend to recommend news that is similar to previously clicked news, which is not beneficial for users to receive diverse news information. Thus, a few works adjust the news ranking strategy by incorporating other factors such as news novelty, popularity and recency, which have the potential to make more diverse news recommendations and mitigate the filter bubble problem in news recommender systems.

In relevance-based news ranking methods, candidate news is usually greedily matched with users, i.e., choosing the news in each impression that best satisfies the ranking policy on the current candidate news list. However, this may not be optimal for improving long-term user experience. In reinforcement learning-based methods, the ranking algorithm aims to find the optimal ranking policy to maximize the long-term reward. Thus, RL-based news ranking methods may be more suitable for exploring potential user interest and improving long-term user experience and engagement, although they may sacrifice some short-term news CTR.

In summary, news ranking in news recommendation also faces many challenges, including how to accurately and efficiently evaluate the relevance between candidate news and user interest indicated by user behaviors, how to mitigate the “filter bubble” problem in news recommender systems, and how to explore potential user interests without hurting user experience.

6. Model Training

Many personalized news recommendation methods exploit machine learning models for news modeling, user modeling and interest matching. Training these models is a necessary step in building an accurate news recommender system. In this section, we review the techniques used for model training in news recommendation.

6.1. Training Methods

In a few methods based on collaborative filtering, the news recommendation task is formulated as a rating prediction problem, i.e., predicting the ratings that users give to news (Ji et al., 2016). To learn their models, they usually use loss functions such as the mean squared error (MSE) computed between the predicted ratings and the gold ratings, which are further used to optimize the model (Claypool et al., 1999). However, explicit user feedback like rating is usually very sparse, which may be insufficient to train an accurate recommendation model.

Since implicit feedback such as click is abundant, most methods use the click feedback of users as the prediction target. They formulate the news recommendation task as a click prediction task. Some methods simply classify whether a candidate news will be clicked by a target user (Fortuna et al., 2010; Gershman et al., 2011; Wang et al., 2018). However, these methods cannot exploit the relatedness between clicked and nonclicked samples. Thus, a few methods use contrastive training techniques to maximize the margin between the predicted click scores of clicked and nonclicked news. For example, PP-Rec (Qi et al., 2021b) uses the Bayesian Personalized Ranking (BPR) loss for model training by comparing each clicked sample with a nonclicked one. However, the BPR loss can only exploit a small part of nonclicked samples. NPA (Wu et al., 2019b) uses the InfoNCE (Oord et al., 2018) loss for model training. For each clicked sample (regarded as a positive sample), it randomly samples a certain number of nonclicked ones (regarded as negative samples) and jointly predicts their click scores. These click scores are further normalized by the softmax function to compute the posterior click probabilities, and the model aims to minimize the negative log-likelihood of the posterior click probability of positive samples. In this way, the model can exploit the information of more negative samples.
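The two contrastive objectives described above can be written down compactly. The sketch below assumes that click scores are plain floats already produced by some recommendation model; the function names are illustrative, and the per-sample losses would in practice be averaged over a batch.

```python
import math


def bpr_loss(pos_score, neg_score):
    """Pairwise BPR loss: -log(sigmoid(pos_score - neg_score))."""
    diff = pos_score - neg_score
    # Numerically stable evaluation of log(1 + exp(-diff)).
    if diff > 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def sampled_softmax_loss(pos_score, neg_scores):
    """Negative log-likelihood of the positive sample after a softmax
    over one positive and several sampled negative click scores."""
    scores = [pos_score] + list(neg_scores)
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - pos_score
```

The pairwise loss compares the positive sample with a single negative, whereas the softmax-based loss contrasts it with all sampled negatives at once, which is how it exploits more negative samples per positive.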

Besides click feedback, a few methods also consider other kinds of feedback to construct training tasks. For example, CPRS (Wu et al., 2020d) trains the recommendation model collaboratively in the click prediction task and an additional reading satisfaction prediction task, which aims to infer the personalized reading speed based on user interest and news body. FeedRec (Wu et al., 2021d) trains the model in three tasks, including click prediction, dwell time prediction and finish prediction. These methods can encourage the model to optimize not only CTR but also user engagement, which can help learn engagement-aware news recommendation models. There are also several methods that use additional news information to design auxiliary training tasks. For example, EBNR (Okura et al., 2017) uses an autoencoder to learn news representations, and it uses another weak supervision task that encourages the embeddings of news in the same topic to be more similar than the embeddings of news in different topics. TANR (Wu et al., 2019e) uses an auxiliary news topic prediction task to help learn topic-aware news representations. SentiRec (Wu et al., 2020c) uses a news sentiment orientation score prediction task to learn sentiment-bearing news representations. KRED (Liu et al., 2020) trains the model in various tasks including item recommendation, item-to-item recommendation, category classification, popularity prediction and local news detection. These methods can also effectively encode additional information into the recommendation model without taking it as the input. However, it is usually a non-trivial task to balance the main recommendation task and the auxiliary tasks.

6.2. Training Environment

Existing research mainly focuses on model training methods while ignoring the implementation environment of model training, which is in fact important in developing real-world news recommender systems. In many existing methods, the news recommendation models are trained offline on centrally stored data with centralized computing resources (Wu et al., 2020f). This model training paradigm can help develop news recommender systems quickly, but it also has several main drawbacks. First, user behavior data for model training is usually abundant and many recent news recommendation models are large (Wu et al., 2021c), so training accurate models requires a large amount of computing resources. Although some recent works like (Wu et al., 2021c) explore training models in parallel on multiple GPUs, this is still insufficient for training huge models. Thus, distributed model learning may be required in industrial practice. Second, a model learned only on offline data may also have some mismatches with the characteristics of recommendation scenarios (Zheng et al., 2018). Moreover, the distribution of user interest and news topic may also evolve, and it is shown in previous research that the performance of offline trained models may decline with time (Wu et al., 2019b). Thus, instead of re-training models periodically, online model training on streaming data is needed. Third, most existing news recommendation methods are trained on centrally stored user data, which may have some privacy risks because user data usually contains private user information. FedRec (Qi et al., 2020) explores training news recommendation models on decentralized data with federated learning techniques, which can better protect user privacy in model training.

6.3. Discussions on Model Training

Next, we provide some discussions on the model training techniques used in news recommendation methods. In some CF-based methods, news recommendation is modeled as a regression task where the ratings given by users are regarded as prediction targets. However, on news platforms explicit user feedback such as ratings is usually scarce, which poses great challenges to model training. Therefore, most methods adopt implicit feedback to construct training tasks. Click feedback is one of the most widely used signals for model training because it can implicitly indicate user interest in news and help the model optimize the CTR of recommendation results. However, click signals also have some gaps with real user interest (Yi et al., 2014), and increasing CTR alone may lead to recommending clickbait news to users, which is actually harmful to user experience. Thus, a few methods incorporate other user engagement signals such as dwell time and finish into model training, which can help learn user engagement-aware recommendation models to improve user experience. Besides user feedback, some methods also consider using additional news information as auxiliary prediction objectives. By jointly training the model in both the recommendation task and auxiliary tasks, the model can be aware of the additional news information. Since these methods do not take the additional features as the input, they can handle scenarios where the additional features are unavailable. However, in multi-task learning based methods, it is difficult to choose proper coefficients for weighting the loss functions of different tasks, and these coefficients may also be sensitive to the dataset characteristics.
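The loss-weighting difficulty mentioned above can be made concrete with a minimal sketch. The function name `combine_losses`, the task names and the coefficient values below are illustrative assumptions and are not taken from any of the cited methods; the snippet only shows where the manually tuned coefficients enter the objective.

```python
def combine_losses(main_loss, aux_losses, coefficients):
    """Return main_loss + sum_k coefficients[k] * aux_losses[k].

    main_loss    -- scalar loss of the main recommendation (click) task
    aux_losses   -- dict: auxiliary task name -> scalar loss (hypothetical names)
    coefficients -- dict: auxiliary task name -> manually chosen weight
    """
    missing = set(aux_losses) - set(coefficients)
    if missing:
        raise KeyError(f"no coefficient for auxiliary tasks: {sorted(missing)}")
    return main_loss + sum(coefficients[name] * value
                           for name, value in aux_losses.items())


# Illustrative values only: changing a coefficient changes how strongly
# the corresponding auxiliary task influences the total objective.
total = combine_losses(
    main_loss=0.62,
    aux_losses={"dwell_time": 0.40, "topic": 1.10},
    coefficients={"dwell_time": 0.5, "topic": 0.1},
)
```

Because the coefficients are plain hyperparameters here, each new dataset or auxiliary task would require a new search over them, which is the tuning burden the paragraph above describes.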

Another important problem in model training is designing effective strategies for constructing labeled training samples. In most methods the negative samples are randomly drawn from the entire news set or the impression list (Wu et al., 2021f), which are further packed with the positive samples. However, researchers have found that randomly selected negative samples may be too easy for the model to distinguish, which is not beneficial for learning discriminative recommendation models (Li et al., 2019). It is also an interesting problem to study the influence of the number of negative samples on model training (Wu et al., 2021b).
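A minimal sketch of random negative sampling from an impression list, as described above, might look as follows. The function name, the `(news_id, clicked)` data layout and the fallback to sampling with replacement are assumptions made for illustration, not a description of any specific cited method.

```python
import random


def build_training_samples(impression, num_negatives, rng):
    """Pack each clicked news with randomly drawn non-clicked news.

    impression    -- list of (news_id, clicked) pairs from one impression
    num_negatives -- number of negatives packed with each positive
    rng           -- random.Random instance (passed in for reproducibility)
    Returns a list of (positive_id, [negative_ids]) tuples.
    """
    positives = [nid for nid, clicked in impression if clicked]
    negatives = [nid for nid, clicked in impression if not clicked]
    samples = []
    for pos in positives:
        if not negatives:
            continue  # nothing to contrast against in this impression
        if len(negatives) >= num_negatives:
            sampled = rng.sample(negatives, num_negatives)
        else:
            # Assumption: sample with replacement when negatives are scarce.
            sampled = rng.choices(negatives, k=num_negatives)
        samples.append((pos, sampled))
    return samples


impression = [("n1", 1), ("n2", 0), ("n3", 0), ("n4", 0), ("n5", 1)]
samples = build_training_samples(impression, num_negatives=2, rng=random.Random(0))
```

Replacing the uniform draw with a harder-negative strategy would only change the two sampling lines, which is where the sample-quality concerns raised above would be addressed.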

Besides, the environment for news recommendation model training is a less studied but important problem. Most research is conducted offline by learning models on centralized data with centralized computing resources. As discussed in the previous section, this training environment poses several potential challenges, such as the limitation of centralized computing resources, the gaps between offline data and online applications, and the privacy concerns and risks of centralized model training, which need to be extensively studied in the future.

In summary, model training is critical for news recommendation while it still has much room for improvement, such as designing more effective training tasks, choosing more representative training samples, adaptively tuning the loss coefficients for multi-task learning, and building more effective, efficient and privacy-preserving environment for news recommendation model training.

7. Evaluation Metrics

There are many metrics to quantitatively evaluate the performance of news recommender systems. Most metrics aim to measure the recommendation performance in terms of the ranking relevance. For methods that regard the task of news recommendation as a classification problem, the Area Under Curve (AUC) score is a widely used metric, which is formulated as follows:

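$$\mathrm{AUC} = \frac{1}{PN}\sum_{i=1}^{P}\sum_{j=1}^{N} I(p_i > n_j), \tag{1}$$

where $P$ and $N$ are the numbers of positive and negative samples, respectively, and $I(\cdot)$ is the indicator function. $p_i$ is the predicted score of the $i$-th positive sample and $n_j$ is the score of the $j$-th negative sample. Another set of popular metrics are precision, recall and F1 scores, which are computed as: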
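As a sketch, AUC can be computed directly as the fraction of (positive, negative) score pairs that are ranked correctly. The function name `auc_score` is hypothetical, and counting ties as half a correct pair is an assumption (a common convention) rather than something stated in the text.

```python
def auc_score(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Assumption: a tie between a positive and a negative counts as 0.5.
    This brute-force version is O(P * N) and is meant for illustration only.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Two clicked news and three non-clicked news in one impression:
# 5 of the 6 pairs are ordered correctly.
example_auc = auc_score([0.9, 0.4], [0.5, 0.3, 0.1])
```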

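$$\mathrm{Precision} = \frac{TP}{TP+FP}, \tag{2}$$
$$\mathrm{Recall} = \frac{TP}{TP+FN}, \tag{3}$$
$$\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}, \tag{4}$$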

where TP, FP and FN respectively denote true positive, false positive and false negative.
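A short sketch of these three scores computed from binary labels and binary predictions is given below. The function name and the convention of returning 0.0 when a denominator is zero are assumptions for illustration.

```python
def precision_recall_f1(labels, predictions):
    """Compute precision, recall and F1 from binary labels and predictions.

    Assumption: a score with a zero denominator is reported as 0.0.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# TP = 2, FP = 1, FN = 1 for this toy example.
precision, recall, f1 = precision_recall_f1([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
```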

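For methods that model news recommendation as a regression task (e.g., predicting the ratings of news), several common regression metrics such as mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE) and Pearson correlation coefficient (PCC) are used to indicate the recommendation performance, which are respectively formulated as follows:

$$\mathrm{MAE} = \frac{1}{M}\sum_{i=1}^{M} |r_i - \hat{r}_i|, \tag{5}$$
$$\mathrm{MSE} = \frac{1}{M}\sum_{i=1}^{M} (r_i - \hat{r}_i)^2, \tag{6}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{i=1}^{M} (r_i - \hat{r}_i)^2}, \tag{7}$$
$$\mathrm{PCC} = \frac{1}{M}\sum_{i=1}^{M} \frac{(r_i - \bar{r})(\hat{r}_i - \bar{\hat{r}})}{\sigma(r)\,\sigma(\hat{r})}, \tag{8}$$

where $M$ is the number of samples, $r_i$ and $\hat{r}_i$ are the real and predicted ratings of the $i$-th sample, $\bar{r}$ and $\bar{\hat{r}}$ respectively denote the arithmetic mean of the real and predicted ratings, and $\sigma(\cdot)$ is the standard deviation.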
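The four regression metrics can be sketched in a few lines of Python. The function name and the returned dictionary keys are hypothetical, and the use of the population standard deviation (dividing by the number of samples) in PCC is an assumption.

```python
import math


def regression_metrics(real, predicted):
    """Return MAE, MSE, RMSE and PCC for two equally long rating lists.

    Assumption: PCC uses population statistics (division by the sample count).
    """
    if len(real) != len(predicted) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    count = len(real)
    errors = [r - p for r, p in zip(real, predicted)]
    mae = sum(abs(e) for e in errors) / count
    mse = sum(e * e for e in errors) / count
    rmse = math.sqrt(mse)

    mean_r = sum(real) / count
    mean_p = sum(predicted) / count
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in real) / count)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / count)
    if std_r == 0 or std_p == 0:
        raise ValueError("PCC is undefined for constant ratings")
    cov = sum((r - mean_r) * (p - mean_p)
              for r, p in zip(real, predicted)) / count
    return {"mae": mae, "mse": mse, "rmse": rmse, "pcc": cov / (std_r * std_p)}


metrics = regression_metrics([1, 2, 3, 4], [1.5, 2, 2.5, 4.5])
```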

For methods that regard news recommendation as a ranking task, besides the AUC metric there are also several other metrics such as Average Precision (AP), Hit Ratio (HR), Mean Reciprocal Rank (MRR) and normalized Discounted Cumulative Gain (nDCG). Note that these metrics may be applied to the top K recommendation lists, e.g., HR@K and nDCG@K. These metrics are respectively formulated as follows:

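$$\mathrm{AP} = \frac{1}{P}\sum_{i=1}^{P+N} r_i \cdot \frac{\sum_{j=1}^{i} r_j}{i}, \tag{9}$$
$$\mathrm{HR@}K = \frac{1}{P}\sum_{i=1}^{K} r_i, \tag{10}$$
$$\mathrm{MRR} = \frac{1}{P}\sum_{i=1}^{P+N} \frac{r_i}{i}, \tag{11}$$
$$\mathrm{nDCG@}K = \frac{\sum_{i=1}^{K} (2^{r_i}-1)/\log_2(1+i)}{\sum_{i=1}^{\min(K,P)} 1/\log_2(1+i)}, \tag{12}$$

where $r_i$ is a relevance score of news with the $i$-th rank, which is 1 for clicked news and 0 for non-clicked news, and $P$ and $N$ are the numbers of clicked and non-clicked news in the ranking list. There are several other metrics such as Click-Through Rate (CTR) and dwell time, which can only be used to measure the performance of online news recommenders.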
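The following sketch computes AP, MRR and nDCG@K for a single ranked list with binary relevance. It follows standard per-list definitions, which may differ in detail from the exact formulations intended by the paper; the function names are hypothetical, and returning 0.0 for a list with no clicked news is an assumption.

```python
import math


def average_precision(relevance):
    """Mean of precision@i over the ranks i that hold clicked news."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mrr(relevance):
    """Mean of reciprocal ranks of the clicked news in one list."""
    clicked = sum(relevance)
    if clicked == 0:
        return 0.0
    return sum(rel / rank for rank, rel in enumerate(relevance, start=1)) / clicked


def dcg_at_k(relevance, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance, k):
    """DCG@k divided by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# relevance[i] is 1 if the news ranked at position i + 1 was clicked.
ranked = [0, 1, 0, 1, 0]
ap_value, mrr_value, ndcg5 = average_precision(ranked), mrr(ranked), ndcg_at_k(ranked, 5)
```

In practice these per-list values are averaged over all impressions to obtain the reported score.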

Besides the metrics for measuring recommendation accuracy, there are several other objective or subjective metrics to evaluate news recommender systems in other aspects. For example, in (Zheng et al., 2018), an Intra-List Similarity (ILS) function is used to measure the diversity of recommendation results. More specifically, given a ranking list $L$, its ILS score is calculated as follows:

$$\mathrm{ILS}(L) = \frac{\sum_{b_i \in L}\sum_{b_j \in L,\, b_j \neq b_i} S(b_i, b_j)}{\sum_{b_i \in L}\sum_{b_j \in L,\, b_j \neq b_i} 1}, \tag{13}$$

where $S(b_i, b_j)$ represents the cosine similarity between the items $b_i$ and $b_j$. A similar diversity metric ILAD is also used in (Qi et al., 2021b, c). In (Gabrilovich et al., 2004) the recommendation results are evaluated by novelty, which is subjectively judged by a group of human subjects who rate the news sets from most novel to least novel. In FeedRec (Wu et al., 2021d), the recommendation results are further evaluated by a set of user engagement-related metrics, such as the average dwell time, finish ratio, dislike ratio and share ratio of the top ranked news. These metrics can help comprehensively evaluate the performance of news recommender systems and further improve user experience.
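A minimal sketch of an intra-list similarity computation is shown below: it averages the cosine similarity over all ordered pairs of distinct items in a list. Representing each item by a plain list of floats, and treating a zero vector as having similarity 0.0, are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally long vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def intra_list_similarity(item_vectors):
    """Average pairwise cosine similarity over all ordered pairs i != j.

    Lower values indicate a more diverse recommendation list.
    """
    size = len(item_vectors)
    if size < 2:
        raise ValueError("need at least two items")
    total = sum(cosine(item_vectors[i], item_vectors[j])
                for i in range(size) for j in range(size) if i != j)
    return total / (size * (size - 1))


# Two identical items and one orthogonal item: 2 of 6 ordered pairs have
# similarity 1, the rest have similarity 0.
ils = intra_list_similarity([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```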

8. Datasets

Many works in the news recommendation field are based on proprietary datasets, such as those collected from Google News (Das et al., 2007), Yahoo’s news (Okura et al., 2017), Bing news (Lian et al., 2018) and MSN news (Wu et al., 2019b). There are only a few publicly available datasets for the research on personalized news recommendation, which are respectively introduced as follows.

 

| Dataset | Language | # Users | # News | # Clicks | News Text | Has Leaderboard? |
|---|---|---|---|---|---|---|
| Plista | German | Unknown | 70,353 | 1,095,323 | title, body | No |
| Adressa | Norwegian | 3,083,438 | 48,486 | 27,223,576 | title, body, category, entities | No |
| Globo | Portuguese | 314,000 | 46,000 | 3,000,000 | word embeddings of texts | No |
| Yahoo! | English | Unknown | 14,180 | 34,022 | anonymized word IDs | No |
| MIND | English | 1,000,000 | 161,013 | 24,155,470 | title, abstract, body, category, entities | Yes |

 

Table 5. Comparisons of the five public datasets for news recommendation.

The first one is the plista (Kille et al., 2013) dataset. It is constructed by collecting 70,353 news articles from 13 German news portals as well as 1,095,323 news click logs of users. In the CLEF 2017 NewsREEL task, the organizers published a new version of the plista dataset, which records users’ interactions with news from eight publishers in February 2016. This dataset contains 2 million notifications, 58 thousand news updates, and 168 million recommendation requests. The language used in the plista datasets is German since they are mainly based on news websites and users in the German-speaking world. Note that the number of users is not provided. The second one is the Adressa (Gulla et al., 2017) dataset, which was constructed by collecting the news logs of the Adresseavisen website in three months. It has a full version with logs in 10 weeks and a small version with logs in one week. The small version contains 561,733 users, 11,207 articles and 2,286,835 clicks, and the full version contains 3,083,438 users, 48,486 articles and 27,223,576 clicks. The news articles in Adressa are written in Norwegian. The third one is the Globo (de Souza Pereira Moreira et al., 2018) dataset, which is retrieved from the Globo news portal in Brazil. This dataset contains about 314,000 users, 46,000 news articles and 3 million news clicks. This dataset is in Portuguese. It does not contain the original news text and only provides the embeddings of words generated by a neural model that is pre-trained in a news metadata classification task. The fourth one is a Yahoo! dataset (https://webscope.sandbox.yahoo.com/catalog.php?datatype=l) for session-based news recommendation. It contains 14,180 news articles and 34,022 click events. In this dataset, no news text is provided and the number of users is also unknown because there is no information about user ID.
The fifth one is the MIND (Wu et al., 2020f) dataset (https://msnews.github.io/), which is a large-scale English dataset for news recommendation. This dataset was recently released by MSN News and contains the real news logs of 1 million users in 6 weeks (from October 12 to November 22, 2019). It involves 161,013 news articles, 15,777,377 impressions and 24,155,470 news clicks. We present a comparison of the volume, textual information and leaderboard information of these datasets in Table 5. We can see that only the MIND dataset is associated with a public leaderboard. In fact, many studies conducted on other datasets such as Adressa use different dataset preprocessing methods (Zhu et al., 2019; Hu et al., 2020a), making it difficult to make head-to-head comparisons between the results reported in different papers. On the contrary, on the MIND dataset the training, validation and test samples are given, and the evaluation metrics are consistent. Thus, MIND can serve as a standard testbed for news recommendation research.

9. Competition and Benchmark

There are several competitions on personalized news recommendation. A representative one is the NewsREEL challenge (https://www.newsreelchallenge.org/), held from 2013 to 2017 (in 2013 the challenge was named NRS). There are usually two tasks in the NewsREEL challenge. The first one is news recommendation in a living lab, which is conducted on an operating news recommendation service. The goal of recommendation algorithms in this task is to achieve high news CTRs. The second one is offline evaluation of news recommendation methods in a simulated environment. This task is performed based on the plista dataset, and the goal is to predict which news articles a visitor would read in the future. In the 2017 edition of NewsREEL, 87 participants registered (Kille et al., 2017), and two systems achieved CTRs higher than 2% in the online evaluation task.

Another recent competition is the MIND News Recommendation Competition (https://msnews.github.io/competition.html), which is conducted on the MIND dataset. The goal of this challenge is to predict the click scores of candidate news based on user interests and rank candidate news in each impression. This challenge attracted more than 200 registered participants and the top submission achieved 71.33% in terms of AUC. The leaderboard of this challenge was opened after the challenge, and researchers can submit their predictions on the test set to obtain the official evaluation scores. The current top result on this leaderboard is 72.43% in terms of AUC, which is achieved by a recommender named “UniUM” based on the techniques in (Wu et al., 2021c). The MIND dataset, challenge and the public leaderboard can form a good benchmark to facilitate research and engineering on personalized news recommendation.

10. Responsible Personalized News Recommendation

Although personalized news recommendation techniques have achieved notable success in targeting user interest, they still have several issues that may affect user experience and even lead to potential negative social impacts. There are several critical problems in developing more responsible personalized news recommender systems, including privacy protection, debiasing and fairness, diversity, and content quality, which are discussed in the following sections, respectively.

10.1. Privacy Protection

Most existing personalized news recommender systems rely on centralized storage of user behavior data for user modeling and model training. However, user behaviors are usually privacy sensitive, and centrally storing them may raise users’ privacy concerns and further risks of data leakage (Li et al., 2020). There are only a few works that study the privacy preservation problem in news recommendation (Desarkar and Shinde, 2014; Qi et al., 2020). For example, FedRec (Qi et al., 2020) may be the first attempt at learning privacy-preserving news recommendation models. Instead of collecting and storing user behavior data in a central server, in FedRec users’ news click data are locally stored on user devices. FedRec uses a federated learning based framework to collaboratively learn the news recommendation model. Each client keeps a local copy of the model and locally computes the model updates based on local data. The local model updates are uploaded to a central server that coordinates a number of user clients for model training. The server aggregates the local gradients into a global one to update its maintained global model, and distributes the updated global model to user devices for local update. In addition, to further protect user privacy, FedRec applies local differential privacy (LDP) techniques to perturb the local model gradients. Since the protected model gradients usually contain much less private information, user privacy can be better protected. However, FedRec is only a framework for privacy-preserving news recommendation model training, and privacy-preserving online serving is still a challenging problem. In addition, many kinds of user information such as demographics and locations may not be used due to privacy reasons, which also leads to some sacrifice in model performance. Thus, it is challenging to develop accurate and privacy-preserving news recommender systems.
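The training loop described above (local gradient computation, perturbation on the device, aggregation on the server, and a global update) can be sketched as follows. This is an illustrative sketch under stated assumptions, not FedRec's actual implementation: clipping followed by Laplace noise is used here as one common LDP construction, the aggregation is weighted by local sample counts in a FedAvg-like manner, and all function names and parameter values are hypothetical.

```python
import random


def perturb_gradient(gradient, clip_value, noise_scale, rng):
    """Client side: clip each coordinate, then add zero-mean Laplace noise.

    Assumption: the Laplace noise is drawn as the difference of two
    exponential variables with mean `noise_scale`.
    """
    perturbed = []
    for g in gradient:
        clipped = max(-clip_value, min(clip_value, g))
        noise = 0.0
        if noise_scale > 0:
            noise = (rng.expovariate(1.0 / noise_scale)
                     - rng.expovariate(1.0 / noise_scale))
        perturbed.append(clipped + noise)
    return perturbed


def aggregate(local_gradients, num_samples):
    """Server side: average client gradients weighted by local sample counts."""
    total = sum(num_samples)
    dim = len(local_gradients[0])
    return [sum(g[d] * n for g, n in zip(local_gradients, num_samples)) / total
            for d in range(dim)]


def server_step(global_model, local_gradients, num_samples, learning_rate):
    """Apply one gradient-descent update to the global model parameters."""
    averaged = aggregate(local_gradients, num_samples)
    return [w - learning_rate * g for w, g in zip(global_model, averaged)]


rng = random.Random(0)
client_grads = [perturb_gradient(g, clip_value=0.1, noise_scale=0.01, rng=rng)
                for g in ([0.30, -0.02], [0.05, 0.08])]
new_model = server_step([0.0, 0.0], client_grads, num_samples=[10, 30],
                        learning_rate=0.5)
```

Only the perturbed gradients leave the device in this sketch; the raw click data stay local, which is the property the paragraph above attributes to federated training.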

10.2. Debiasing

User behavior data usually encodes various kinds of biases. Some kinds of biases are related to news. For example, click behaviors are influenced by the positions and sizes of news displayed on the webpages (i.e., presentation bias) (Yi et al., 2021). In addition, popular news may have higher chances to be clicked than unpopular news (i.e., popularity bias) (Qi et al., 2021b). These types of bias information may affect the accuracy of user interest modeling and model training. A few works explore eliminating the influence of certain kinds of bias information to improve personalized news recommendation. For instance, DebiasRec (Yi et al., 2021) aims to reduce the influence of position and size biases on news recommendation. It uses a bias-aware user modeling method to learn debiased user interest representations, and uses a bias-aware click prediction method that decomposes the overall click score into a bias score and a bias-independent user preference score. PP-Rec (Qi et al., 2021b) uses a popularity-aware user modeling method to learn calibrated user interest representations, and it separately models the popularity of news and users’ personal preference on news, which can help better model personalized user interest. These methods mainly aim to infer debiased user interest from biased user data. However, without any prior knowledge about the unbiased data distribution, the bias information usually cannot be fully eliminated. In addition, many kinds of bias such as exposure and selection biases are rarely studied in the news recommendation field. Thus, it is important for future research to understand how different biases affect user behaviors and the recommendation model as well as how to eliminate their effect in model training and evaluation.

10.3. Fairness

Making fair recommendations is an important problem in responsible news recommendation. Researchers have studied various kinds of fairness problems in recommendation, such as provider-side fairness and consumer-side fairness (Burke, 2017). In personalized news recommendation, a representative kind of unfairness is brought by the biases related to sensitive user attributes, such as genders and professions. Users with the same sensitive attributes may have similar patterns in news click behaviors, e.g., fashion news is more preferred by female users. The model may capture these biases and produce biased recommendation results, e.g., tend to only recommend fashion news to female users. This will lead to the unfairness problem that some users cannot obtain the news information they are interested in, which is harmful to user experience. To address this problem, FairRec (Wu et al., 2021g) uses a decomposed adversarial learning framework with independent user models to learn a bias-aware user embedding and a bias-free user embedding. The bias-aware user embedding mainly aims to capture bias information related to sensitive user attributes, and the bias-free user embedding aims to model bias-independent user interest. The two embeddings are regularized to be orthogonal so that the bias-free user embedding contains less bias information. The bias-free user embedding is further used for making fair news recommendations. By learning user embeddings that are agnostic to the sensitive user attributes, the unfairness brought by the bias information related to sensitive user attributes can be effectively mitigated. However, adversarial learning based methods are usually brittle and it is difficult to tune their hyperparameters to fully remove the bias information. In addition, many other genres of fairness (e.g., provider-side fairness) are less studied in news recommendation. In summary, there are many types of fairness to be improved in news recommendation and it is non-trivial to make both fair and accurate news recommendations.

10.4. Diversity

Diversity is critical for personalized news recommendation. Users may not prefer to click news with homogeneous information, and improving the information variety is important for improving user experience and engagement (Bernstein et al., 2020). However, most existing news recommendation methods focus on optimizing recommendation accuracy while ignoring recommendation diversity, and it is shown in (Wu et al., 2020c; Qi et al., 2021b, c) that many existing news recommendation methods cannot make sufficiently diverse recommendations. There are only a few methods that consider the diversity of news recommendation. Some methods aim to recommend news that are diverse from previously clicked news (Gabrilovich et al., 2004; Wu et al., 2020c), and several other works explore diversifying the top news recommendation list (Li et al., 2011b; Gabriel De Souza et al., 2019). However, there is still no work on promoting both kinds of diversity in news recommendation. In addition, many diversity-aware news recommendation methods rely on reranking strategies to improve recommendation diversity, which may not be optimal for achieving a good tradeoff between recommendation accuracy and diversity. Thus, further research on learning unified diversity-aware news recommendation models is important for improving the quality of online news services.

10.5. Content Moderation

The moderation of news content in news recommendation is a rarely studied problem. In fact, some news articles published online are clickbait or fake news, or contain misinformation. In addition, some news may contain low quality or even harmful content (e.g., racism and hate speech). Recommending such news will damage user experience and the reputation of news platforms, and may even lead to negative societal impact (Lazer et al., 2018). Although online news platforms can perform manual moderation on news content quality, the huge amount of online news information makes it too difficult or even impossible to filter all news articles with harmful and useless content. Thus, it is important to design news recommendation algorithms that can avoid recommending news with low quality content. Researchers have found that news with high ratios of short reading dwell time (e.g., less than 10 seconds) are probably clickbait (Wu et al., 2020a). In addition, user behaviors such as comments and sharing on social media may also provide rich clues for detecting news that contain misinformation and harmful content (Shu et al., 2017; Banko et al., 2020). Thus, incorporating various kinds of user feedback has the potential to help recommend news with high-quality content, which can improve the responsibility of news recommendation algorithms.

11. Future Directions

By comprehensively reviewing existing news recommendation techniques in different aspects, we can see that personalized news recommendation techniques have achieved substantial progress over the past years. However, there remain many challenges and unresolved problems. Thus, in this section we raise several potential directions that are worth exploring in the future.

11.1. Deep News Understanding

News modeling is at the heart of personalized news recommendation. It can be improved in the following aspects. First, text understanding is a core problem in news modeling, and existing methods may not be capable of understanding the textual content of news deeply. Thus, using more advanced NLP techniques (e.g., knowledge-aware PLMs) may help better understand news texts and improve news modeling. Second, besides textual information, news also contain rich multimodal information such as images, videos and slides. The multimodal news content can provide complementary information on news understanding. Thus, using multimodal content modeling techniques has the potential to improve the comprehensiveness of news understanding. Third, there are many useful factors for news modeling that are not covered by news content, such as publisher, popularity and recency. A unified framework is required to incorporate various kinds of news information (e.g., property features and context features) and meanwhile effectively model the relatedness between different features. Further research on these directions can help understand news more accurately and deeply to empower subsequent user modeling and news ranking.

11.2. Universal User Modeling

User modeling is critical for understanding users’ interest in news. However, it is difficult to model the dynamic and diverse user interest accurately and comprehensively for news recommendation. To tackle this problem, a universal user modeling framework that can model various kinds of user interest is needed. We argue that this framework should satisfy the following requirements. First, the user modeling framework needs to comprehensively infer user interest from multiple kinds of user behaviors and feedback. This is because click behaviors are very noisy and may be sparse for some users, and it is insufficient to model user interest solely from click behaviors. Fortunately, different kinds of user behaviors and feedback (e.g., read and dislike) can provide rich complementary information like user engagement, and incorporating them in a unified framework can better support user modeling. Second, the framework needs to model the diverse and multi-grained user interest. Since a single user embedding may be insufficient to comprehensively model user interest, a promising way is to represent user interest with more sophisticated structures such as embedding sets and graphs. Third, the framework needs to capture the dynamics of user interest. Since user interest usually evolves with time, it is important to understand user interest in different periods and further model their inherent relations. To this end, using more advanced sequence modeling techniques may help improve user interest modeling in personalized news recommendation.

11.3. Effective and Efficient News Ranking

News ranking is an essential step to make personalized news recommendations. There are mainly three research directions to improve news ranking. First, most existing news ranking methods are mainly based on the coarse-grained relevance between candidate news and user interest, which may not be optimal for accurately targeting user interest. Although a few methods can model the fine-grained relatedness between user and news, they are inefficient and may not be suitable for scenarios with limited computation resources and latency tolerance. Thus, developing both effective and efficient news ranking methods is important for improving online news recommendation. Second, ranking news solely based on relevance may lead to the filter bubble problem. It is important to design more sophisticated news ranking strategies to achieve a good tradeoff between accuracy and diversity. Third, most existing news ranking methods are greedy, i.e., they only consider the current ranking list in the ranking policy. However, they may not be optimal for achieving good user engagement in the long term. Thus, designing proper news ranking strategies to optimize long-term rewards may be beneficial for user experience.

11.4. Unified Model Training

Model training techniques are also important for learning effective and robust personalized news recommendation models. There are four potential directions for future works to improve model training. First, most methods only use click signals for model training, which may be inaccurate because click signals are usually noisy and biased. In addition, the supervision signals in specific tasks may also be insufficient (Wu et al., 2020e). Thus, a unified framework that incorporates various kinds of supervised and self-supervised training signals and objectives for collaborative model learning may effectively improve the model quality. Second, although several methods use multi-task learning frameworks to incorporate multiple objectives into model training, they need to manually tune the loss coefficients of different tasks, which usually requires much human effort and may be sensitive to the characteristics of datasets. Thus, a self-adaptive multi-task learning framework that automatically tunes hyperparameters like loss coefficients can reduce the development effort and improve the model generality. Third, many methods use randomly selected negative samples for model training, which may be noisy and less informative. Thus, using more effective negative sampling can help train more robust and accurate news recommendation models. Fourth, offline trained models may have gaps with the online scenarios and may suffer from performance decline with time. Thus, it is important to incorporate both offline and online learning techniques to help the model better adapt to the latest online serving requirements.

11.5. Privacy-preserving News Recommendation

In recent years, the ethical issues of intelligent systems have attracted much attention from both academia and the public. Developing more responsible news recommender systems can help better serve users of online news services with smaller risks. One important direction for improving the responsibility of personalized news recommendation is user privacy protection. FedRec (Qi et al., 2020) explores using federated learning techniques to train news recommendation models in a privacy-preserving way. However, there are still many challenges in developing a privacy-preserving news recommender system. First, given a model learned in a federated way, it is still challenging to use it to serve online users without user behavior data. Second, there may also be potential privacy risks during the training of news recommendation models. Third, the federated learning framework usually brings high computation and communication cost to user devices, which may hinder its application in real-world scenarios. Thus, further research on developing more effective, efficient and privacy-preserving news recommendation methods is needed.

11.6. Diversity-aware News Recommendation

Besides accuracy, diversity in news recommendation also has decisive influence on user experience. There are three main research directions to improve the diversity of news recommendation. The first one is temporal-spatial diversity-aware news recommendation, which aims to recommend news that are diverse from each other and meanwhile diverse from historical clicked news. This can help the recommendation results better satisfy users’ preference on information variety. The second one is personalizing the diversity in news recommendation. Different users may have different preferences on the tradeoff between accuracy and diversity, and it may be better to consider their personalized preference to improve user experience. The third one is fine-grained diversity, which aims to not only diversify the content and topic of news, but also many other factors like publishers, locations, opinions and emotions, which has the potential to make higher-quality diversity-aware news recommendations.

11.7. Debiasing in News Recommendation

Debiasing is another important problem in improving the responsibility of news recommendation. The biases encoded by user behavior data will propagate to the recommendation model and may further be amplified in the loops of recommendation. Thus, designing effective methods to eliminate the influence of the various kinds of biases on recommendation results is important for making high-quality news recommendations. There are several potential research directions in this field. First, it is important to understand the influence of different kinds of biases on user behaviors and the recommendation model, which can help the subsequent debiasing. Second, different users may be influenced by the same bias information in different ways, and considering the personalized preference of users on bias information can help better eliminate the effects of biases. Third, there are various kinds of biases in news recommendation. A unified debiasing framework that can simultaneously reduce the effects of different biases can greatly improve the accuracy and robustness of news recommendation algorithms.

11.8. Fairness-aware News Recommendation

Fairness is an essential but often ignored factor in personalized news recommendation. A fair news recommender system is required to provide fair recommendation services to different groups of users and meanwhile give fair chances to news from different providers to be recommended. Future research on fair news recommendation can be conducted in the following three directions. First, it is important to reduce the consumer-side unfairness related to sensitive user attributes. Although adversarial learning techniques are mature solutions to this problem, they are usually brittle and difficult to tune. Thus, more robust and effective methods are required to remove the biases introduced by sensitive user attributes. Second, different news providers and publishers are diverse in their characteristics, such as topic preference and reputation. Thus, it is non-trivial to properly balance the recommendation chances of news from different providers and publishers to achieve better provider-side fairness. Third, there are different types of fairness in the personalized news recommendation scenario, and it is very challenging to simultaneously achieve multi-side fairness without a heavy sacrifice of recommendation accuracy.

11.9. Content Moderation in News Recommendation

The moderation of news content is important for online news platforms to avoid recommending low-quality or harmful news to users and to mitigate its impact on users and society. However, this issue is rarely studied and cannot be resolved by most existing news recommendation methods. There are three key research directions on this problem. First, it is essential to understand how harmful news is generated and spread, as well as its impact on users, which can help news platforms better defend against toxic content. Second, it may be useful to incorporate content moderation techniques such as fake news detection (Shu et al., 2017) and clickbait detection (Wu et al., 2020b) into news recommendation to adjust the recommendation results according to the quality of news content. Third, without the assistance of additional tasks and resources, we can learn content quality-aware news recommendation models with the guidance of certain kinds of user feedback such as comments and dislikes, which is expected to help recommend high-quality news to users.
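As a rough illustration of the second direction, the sketch below re-ranks candidate news by blending a predicted click score with a content quality score. The function name `rerank_with_quality`, the weight `alpha`, the threshold `min_quality`, and the assumption that quality scores in [0, 1] come from an external detector (for example a clickbait or fake news classifier) are all hypothetical; this is one simple possibility rather than an approach taken from the surveyed literature.

```python
from typing import Dict, List


def rerank_with_quality(
    click_scores: Dict[str, float],
    quality_scores: Dict[str, float],
    alpha: float = 0.5,
    min_quality: float = 0.2,
) -> List[str]:
    """Return news IDs ordered by a quality-adjusted score.

    click_scores: predicted click relevance per news ID (hypothetical ranker output).
    quality_scores: content quality in [0, 1] per news ID (hypothetical detector output).
    alpha: assumed weight of the quality term in the blended score.
    min_quality: news scoring below this quality are filtered out entirely.
    """
    adjusted = {}
    for news_id, click in click_scores.items():
        # News without a quality score is treated as lowest quality.
        quality = quality_scores.get(news_id, 0.0)
        if quality < min_quality:
            continue  # hard filter for clearly low-quality content
        adjusted[news_id] = (1.0 - alpha) * click + alpha * quality
    return sorted(adjusted, key=adjusted.get, reverse=True)
```

The hard filter and the linear blend are two separate design choices: the filter blocks clearly harmful items regardless of predicted clicks, while the blend only shifts the ordering among acceptable items.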

11.10. Societal Impact of News Recommendation

News recommender systems can have a societal impact once they serve a sufficiently large number of users. They may imperceptibly influence the opinions and views of users when displaying personalized news content. Thus, it is valuable for further research to identify and analyze the societal impact of personalized news recommendation algorithms, such as their influence on political events, economic activities and psychological health. In addition, research on how to reduce the potential negative societal impact of personalized news recommendation methods can help avoid risky system behaviors and better serve online users.

12. Conclusion

In this paper, we present a comprehensive overview of the personalized news recommendation field, including the technologies involved in the different core modules of a personalized news recommender, the datasets and metrics for performance evaluation, the key points for developing responsible personalized news recommender systems, and potential directions to be explored in the future. Unlike existing survey papers that follow the conventional taxonomy of news recommendation methods, this paper provides a novel perspective for understanding personalized news recommendation based on its key problems and the associated techniques and challenges. In addition, this is the first survey paper that comprehensively covers both traditional and up-to-date deep learning techniques for personalized news recommendation, which can provide rich insights for extending the frontier of this field. We hope this paper can facilitate future research on personalized news recommendation as well as related fields in NLP and data mining.

References

  • M. An, F. Wu, C. Wu, K. Zhang, Z. Liu, and X. Xie (2019) Neural news recommendation with long- and short-term user representations. In ACL, pp. 336–345. Cited by: §3.2, Table 2, §4.2, Table 4.
  • X. Bai, B. B. Cambazoglu, F. Gullo, A. Mantrach, and F. Silvestri (2017) Exploiting search history of users for news personalization. Information Sciences 385, pp. 125–137. Cited by: Table 3.
  • M. Banko, B. MacKeen, and L. Ray (2020) A unified taxonomy of harmful content. In WOAH@EMNLP, pp. 125–137. Cited by: §10.5.
  • A. Bernstein, C. de Vreese, N. Helberger, W. Schulz, K. Zweig, C. Baden, M. A. Beam, M. P. Hauer, L. Heitz, P. Jürgens, et al. (2020) Diversity in news recommendations. arXiv preprint arXiv:2005.09495. Cited by: §10.4.
  • D. Billsus and M. J. Pazzani (2000) User modeling for adaptive news access. User Modeling and User-adapted Interaction 10 (2-3), pp. 147–180. Cited by: Table 1, §4.1, Table 3.
  • T. Bogers and A. Van den Bosch (2007) Comparing and evaluating information retrieval algorithms for news recommendation. In Recsys, pp. 141–144. Cited by: §1.
  • A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko (2013) Translating embeddings for modeling multi-relational data. In NIPS, pp. 1–9. Cited by: §3.2.
  • H. L. Borges and A. C. Lorena (2010) A survey on recommender systems for news data. In Smart Information and Knowledge Management, pp. 129–151. Cited by: §1.
  • E. Brocken, A. Hartveld, E. de Koning, T. van Noort, F. Hogenboom, F. Frasincar, and T. Robal (2019) Bing-cf-idf+: a semantics-driven news recommender system. In CAiSE, pp. 32–47. Cited by: §3.1, Table 1.
  • R. Burke (2017) Multisided fairness for recommendation. arXiv preprint arXiv:1707.00093. Cited by: §10.3.
  • S. Caldarelli, D. F. Gurini, A. Micarelli, and G. Sansonetti (2016) A signal-based approach to news recommendation. In UMAP (Extended Proceedings), Cited by: Table 1.
  • I. Cantador, P. Castells, and A. Bellogín (2011) An enhanced semantic layer for hybrid recommender systems: application to news recommendation. Int. Journal on Semantic Web and Inf. Syst. 7 (1), pp. 44–78. Cited by: §3.1, Table 1, §4.1, Table 3.
  • I. Cantador and P. Castells (2009) Semantic contextualisation in a news recommender system. Cited by: Table 1.
  • M. Capelle, F. Frasincar, M. Moerland, and F. Hogenboom (2012) Semantics-based news recommendation. In WIMS, pp. 1–9. Cited by: §2.1, §3.1, Table 1, §4.1.
  • M. Capelle, F. Hogenboom, A. Hogenboom, and F. Frasincar (2013) Semantic news recommendation using wordnet and bing similarities. In SAC, pp. 296–302. Cited by: Table 1.
  • M. Capelle, M. Moerland, F. Hogenboom, F. Frasincar, and D. Vandic (2015) Bing-sf-idf+ a hybrid semantics-driven news recommender. In SAC, pp. 732–739. Cited by: §3.1, Table 1.
  • C. Chen, X. Meng, Z. Xu, and T. Lukasiewicz (2017) Location-aware personalized news recommendation with deep semantic analysis. IEEE Access 5, pp. 1624–1638. Cited by: §2.
  • W. Chen, L. Zhang, C. Chen, and J. Bu (2009) A hybrid phonic web news recommender system for pervasive access. In Proc. of the WRI Int. Conf. on Commun. and Mobile Comput., Vol. 3, pp. 122–126. Cited by: Table 1.
  • Q. Chu, G. Liu, H. Sun, and C. Zhou (2019) Next news recommendation via knowledge-aware sequential model. In CCL, pp. 221–232. Cited by: §3.2, Table 2, Table 4.
  • W. Chu and S. Park (2009) Personalized recommendation on dynamic content using predictive bilinear models. In WWW, pp. 691–700. Cited by: §3.1, Table 1, §4.1, Table 3.
  • M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, and M. Sartin (1999) Combining content-based and collaborative filters in an online newspaper. In Recommender Systems@SIGIR, Cited by: §3.1, Table 1, §4.1, §6.1.
  • F. Corsini and M. Larson (2016) CLEF newsreel 2016: image based recommendation. In CLEF, pp. 593–605. Cited by: §2.
  • A. Darvishy, H. Ibrahim, F. Sidi, and A. Mustapha (2020) HYPNER: a hybrid approach for personalized news recommendation. IEEE Access 8, pp. 46877–46894. Cited by: §3.1, Table 1, Table 3.
  • A. S. Das, M. Datar, A. Garg, and S. Rajaram (2007) Google news personalization: scalable online collaborative filtering. In WWW, pp. 271–280. Cited by: §2.1, §2.2, §3.1, §4.1, §8.
  • E. de Koning, F. Hogenboom, and F. Frasincar (2018) News recommendation with cf-idf+. In Int. Conf. on Adv. Information Syst. Eng., pp. 170–184. Cited by: §3.1, Table 1.
  • G. de Souza Pereira Moreira, F. Ferreira, and A. M. da Cunha (2018) News session-based recommendations using deep neural networks. In DLRS@RecSys, pp. 15–23. Cited by: §3.2, §4.2, §8.
  • M. S. Desarkar and N. Shinde (2014) Diversification in news recommendation for privacy concerned users. In DSAA, pp. 135–141. Cited by: §10.1.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, pp. 4171–4186. Cited by: §3.2.
  • J. Domann and A. Lommatzsch (2017) A highly available real-time news recommender based on apache spark. In Int. Conf. of the Cross-Lang. Eval. Forum for Eur. Lang., pp. 161–172. Cited by: §3.1.
  • D. Doychev, A. Lawlor, R. Rafter, and B. Smyth (2014) An analysis of recommender algorithms for online news. In CLEF, pp. 177–184. Cited by: §1.
  • M. Durairaj and K. M. Kumar (2014) News recommendation systems using web mining: a survey. In International Journal of Engineering Trends and Technology, Vol. 126, pp. 293–299. Cited by: §1.
  • S. K. Dwivedi and C. Arya (2016) A survey of news recommendation approaches. In ICTBIG, pp. 1–6. Cited by: §1.
  • E. V. Epure, B. Kille, J. E. Ingvaldsen, R. Deneckere, C. Salinesi, and S. Albayrak (2017) Recommending personalized news in short user sessions. In Recsys, pp. 121–129. Cited by: §3.1, Table 1.
  • C. Feng, M. Khan, A. U. Rahman, and A. Ahmad (2020) News recommendation systems-accomplishments, challenges & future directions. IEEE Access 8, pp. 16702–16725. Cited by: §1, §1.
  • B. Fortuna, C. Fortuna, and D. Mladenić (2010) Real-time news recommender system. In ECML PKDD, pp. 583–586. Cited by: §2.2, §3.1, Table 1, §4.1, Table 3, §6.1.
  • F. Frasincar, J. Borsje, and L. Levering (2009) A semantic web-based approach for building personalized news services. Int. Journal of E-Business Res. 5 (3), pp. 35–53. Cited by: Table 1.
  • P. M. Gabriel De Souza, D. Jannach, and A. M. Da Cunha (2019) Contextual hybrid session-based news recommendation with recurrent neural networks. IEEE Access 7, pp. 169185–169203. Cited by: §10.4, §3.2, §3.3.2, Table 2, §4.2, Table 4.
  • E. Gabrilovich, S. Dumais, and E. Horvitz (2004) Newsjunkie: providing personalized newsfeeds via analysis of information novelty. In WWW, pp. 482–490. Cited by: §10.4, Table 1, §5.1, §7.
  • F. Gao, Y. Li, L. Han, and J. Ma (2009) InfoSlim: an ontology-content based personalized mobile news recommendation system. In WiCOM, pp. 1–4. Cited by: §3.1, Table 1, Table 3.
  • J. Gao, X. Xin, J. Liu, R. Wang, J. Lu, B. Li, X. Fan, and P. Guo (2018) Fine-grained deep knowledge-aware network for news recommendation with self-attention. In WI, pp. 81–88. Cited by: §3.2, Table 2, Table 4.
  • F. Garcin, C. Dimitrakakis, and B. Faltings (2013) Personalized news recommendation with context trees. In Recsys, pp. 105–112. Cited by: Table 1, §4.1.
  • F. Garcin and B. Faltings (2013) Pen recsys: a personalized news recommender systems framework. In Proceedings of the international news recommender systems workshop and challenge, pp. 3–9. Cited by: Table 1.
  • F. Garcin, K. Zhou, B. Faltings, and V. Schickel (2012) Personalized news recommendation based on collaborative filtering. In WI-IAT, Vol. 1, pp. 437–441. Cited by: §2.2, §3.1, Table 1, §4.1, §5.1.
  • S. Ge, C. Wu, F. Wu, T. Qi, and Y. Huang (2020) Graph enhanced representation learning for news recommendation. In WWW, pp. 2863–2869. Cited by: §3.2, §3.3.2, Table 2, §4.2, Table 4.
  • A. Gershman, T. Wolfe, E. Fink, and J. G. Carbonell (2011) News personalization using support vector machines. Cited by: §2.3, §3.1, §3.1, Table 1, §4.1, §6.1.
  • A. Gharahighehi, C. Vens, and K. Pliakos (2020) Multi-stakeholder news recommendation using hypergraph learning. In INRA@ECML PKDD, Cited by: Table 1, §4.1.
  • F. Goossen, W. IJntema, F. Frasincar, F. Hogenboom, and U. Kaymak (2011) News personalization using the cf-idf semantic recommender. In WIMS, pp. 1–12. Cited by: §2.3, §3.1, Table 1, §4.1, §5.1.
  • W. Gu, S. Dong, Z. Zeng, and J. He (2014) An effective news recommendation method for microblog user. The Scientific World Journal 2014. Cited by: Table 1, Table 3.
  • J. A. Gulla, L. Zhang, P. Liu, Ö. Özgöbek, and X. Su (2017) The adressa dataset for news recommendation. In WI, pp. 1042–1048. Cited by: §2.6, §8.
  • K. Han (2020) Personalized news recommendation and simulation based on improved collaborative filtering algorithm. Complexity 2020. Cited by: §3.1.
  • M. Harandi and J. A. Gulla (2015) Survey of user profiling in news recommender systems. In INRA@ECML PKDD, pp. 20–26. Cited by: §1.
  • F. Hogenboom, M. Capelle, M. Moerland, and F. Frasincar (2014) Bing-sf-idf+ semantics-driven news recommendation. In WWW, pp. 291–292. Cited by: §3.1, Table 1.
  • F. Hogenboom, M. Capelle, and M. Moerland (2013) News recommendation using semantics with the bing-sf-idf approach. In ER, pp. 160–169. Cited by: §3.1, Table 1.
  • C. Hsieh, L. Yang, H. Wei, M. Naaman, and D. Estrin (2016) Immersive recommendation: news and event recommendations using personal digital traces. In WWW, pp. 51–62. Cited by: Table 1, Table 3.
  • L. Hu, C. Li, C. Shi, C. Yang, and C. Shao (2020a) Graph neural news recommendation with long-term and short-term interest modeling. Information Processing & Management 57 (2), pp. 102142. Cited by: §2.5, §3.2, §3.3.2, Table 2, §4.2, Table 4, §8.
  • L. Hu, S. Xu, C. Li, C. Yang, C. Shi, N. Duan, X. Xie, and M. Zhou (2020b) Graph neural news recommendation with unsupervised preference disentanglement. In ACL, pp. 4255–4264. Cited by: §3.2, §3.3.2, Table 2, §4.2, Table 4.
  • P. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck (2013) Learning deep structured semantic models for web search using clickthrough data. In CIKM, pp. 2333–2338. Cited by: §3.2.
  • C. Hutto and E. Gilbert (2014) Vader: a parsimonious rule-based model for sentiment analysis of social media text. In ICWSM, Vol. 8. Cited by: §3.2.
  • I. Ilievski and S. Roy (2013) Personalized news recommendation based on implicit feedback. In Proceedings of the international news recommender systems workshop and challenge, pp. 10–15. Cited by: §3.1, §3.1, Table 1, §4.1, Table 3.
  • G. Ji, S. He, L. Xu, K. Liu, and J. Zhao (2015) Knowledge graph embedding via dynamic mapping matrix. In ACL-IJCNLP, pp. 687–696. Cited by: §3.2.
  • Y. Ji, W. Hong, Y. Shangguan, H. Wang, and J. Ma (2016) Regularized singular value decomposition in news recommendation system. In ICCSE, pp. 621–626. Cited by: §3.1, §6.1.
  • N. Jonnalagedda, S. Gauch, K. Labille, and S. Alfarhood (2016) Incorporating popularity in a personalized news recommender system. PeerJ 2. Cited by: Table 1, Table 3.
  • N. Jonnalagedda and S. Gauch (2013) Personalized news recommendation using twitter. In WI-IAT, Vol. 3, pp. 21–25. Cited by: §3.1, Table 1, Table 3.
  • K. Joseph and H. Jiang (2019) Content based news recommendation via shortest entity distance over knowledge graphs. In Companion Proceedings of WWW, pp. 690–699. Cited by: Table 1, §4.1.
  • M. Jugovac, D. Jannach, and M. Karimi (2018) Streamingrec: a framework for benchmarking stream-based news recommenders. In Recsys, pp. 269–273. Cited by: Table 1.
  • M. Karimi, D. Jannach, and M. Jugovac (2018) News recommender systems–survey and roads ahead. Information Processing & Management 54 (6), pp. 1203–1227. Cited by: §1.
  • G. Kazai, I. Yusof, and D. Clarke (2016) Personalised news and blog recommendations based on user location, facebook and twitter user profiling. In SIGIR, pp. 1129–1132. Cited by: Table 1, Table 3.
  • D. Khattar, V. Kumar, V. Varma, and M. Gupta (2018) Weave&rec: a word embedding based 3-d convolutional network for news recommendation. In CIKM, pp. 1855–1858. Cited by: §3.2, Table 2, §4.2, Table 4.
  • D. Khattar, V. Kumar, and V. Varma (2017) Leveraging moderate user data for news recommendation. In SERecSys@ICDM, R. Gottumukkala, X. Ning, G. Dong, V. Raghavan, S. Aluru, G. Karypis, L. Miele, and X. Wu (Eds.), pp. 757–760. Cited by: Table 1.
  • B. Kille, F. Hopfgartner, T. Brodt, and T. Heintz (2013) The plista dataset. In Proceedings of the international news recommender systems workshop and challenge, pp. 16–23. Cited by: §2.6, §8.
  • B. Kille, A. Lommatzsch, F. Hopfgartner, M. Larson, and T. Brodt (2017) CLEF 2017 newsreel overview: offline and online evaluation of stream-based news recommender systems. In CLEF, Cited by: §9.
  • Y. Kim (2014) Convolutional neural networks for sentence classification. In EMNLP, pp. 1746–1751. Cited by: §3.2.
  • E. Kirshenbaum, G. Forman, and M. Dugan (2012) A live comparison of methods for personalized article recommendation at forbes.com. In ECML PKDD, P. A. Flach, T. De Bie, and N. Cristianini (Eds.), pp. 51–66. Cited by: Table 1.
  • M. Kompan and M. Bieliková (2010) Content-based news recommendation. In EC-Web, F. Buccafurri and G. Semeraro (Eds.), pp. 61–72. Cited by: Table 1.
  • V. Kumar, D. Khattar, S. Gupta, M. Gupta, and V. Varma (2017a) Deep neural architecture for news recommendation. In CLEF, Cited by: §3.2, Table 2, §4.2, Table 4.
  • V. Kumar, D. Khattar, S. Gupta, M. Gupta, and V. Varma (2017b) User profiling based deep neural network for temporal news recommendation. In SERecSys@ICDM, R. Gottumukkala, X. Ning, G. Dong, V. Raghavan, S. Aluru, G. Karypis, L. Miele, and X. Wu (Eds.), pp. 765–772. Cited by: §3.2, Table 2, §4.2, Table 4.
  • V. Kumar, D. Khattar, S. Gupta, and V. Varma (2017c) Word semantics based 3-d convolutional neural networks for news recommendation. In SERecSys@ICDM, pp. 761–764. Cited by: §3.2, Table 2, Table 4.
  • K. Lang (1995) Newsweeder: learning to filter netnews. In Machine Learning, pp. 331–339. Cited by: §3.1, Table 1, §4.1.
  • V. Lavrenko, M. Schmill, D. Lawrie, P. Ogilvie, D. Jensen, and J. Allan (2000) Language models for financial news recommendation. In CIKM, pp. 389–396. Cited by: §2.
  • D. M. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, K. M. Greenhill, F. Menczer, M. J. Metzger, B. Nyhan, G. Pennycook, D. Rothschild, et al. (2018) The science of fake news. Science 359 (6380), pp. 1094–1096. Cited by: §10.5.
  • Q. Le and T. Mikolov (2014) Distributed representations of sentences and documents. In ICML, pp. 1188–1196. Cited by: §3.2, §3.2, §3.2.
  • D. Lee, B. Oh, S. Seo, and K. Lee (2020) News recommendation with topic-enriched knowledge graphs. In CIKM, pp. 695–704. Cited by: §3.2, Table 2, §4.2, Table 4.
  • H. J. Lee and S. J. Park (2007) MONERS: a news recommender for the mobile web. Expert Systems With Applications 32 (1), pp. 143–150. Cited by: §2.2, §3.1, §3.1, Table 1, §4.1, Table 3.
  • J. Li, C. Tao, W. Wu, Y. Feng, D. Zhao, and R. Yan (2019) Sampling matters! an empirical study of negative sampling strategies for learning of matching models in retrieval-based dialogue systems. In EMNLP-IJCNLP, pp. 1291–1296. Cited by: §6.3.
  • L. Li and T. Li (2013) News recommendation via hypergraph learning: encapsulation of user behavior and news content. In WSDM, pp. 305–314. Cited by: Table 1, §4.1.
  • L. Li, D. Wang, S. Zhu, and T. Li (2011a) Personalized news recommendation: a review and an experimental investigation. Journal of Computer Science and Technology 26 (5), pp. 754. Cited by: §1.
  • L. Li, D. Wang, T. Li, D. Knox, and B. Padmanabhan (2011b) SCENE: a scalable two-stage personalized news recommendation system. In SIGIR, pp. 125–134. Cited by: §10.4, §2.1, §2.2, §3.1, §3.1, Table 1, §4.1, Table 3, §5.1.
  • L. Li, L. Zheng, and T. Li (2011c) Logo: a long-short user interest integration in personalized news recommendation. In Recsys, pp. 317–320. Cited by: Table 1, §4.1.
  • L. Li, L. Zheng, F. Yang, and T. Li (2014) Modeling and broadening temporal user interest in personalized news recommendation. Expert Systems With Applications 41 (7), pp. 3168–3177. Cited by: Table 1, §4.1.
  • L. Li, W. Chu, J. Langford, and R. E. Schapire (2010) A contextual-bandit approach to personalized news article recommendation. In WWW, pp. 661–670. Cited by: §2.3, Table 1, Table 3, §5.2.
  • L. Li, W. Chu, J. Langford, and X. Wang (2011d) Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In WSDM, pp. 297–306. Cited by: §5.2.
  • M. Li and L. Wang (2019) A survey on personalized news recommendation technology. IEEE Access 7, pp. 145861–145879. Cited by: §1, §1, §2.
  • Q. Li, J. Wang, Y. P. Chen, and Z. Lin (2010) User comments for news recommendation in forum-based social media. Information Sciences 180 (24), pp. 4929–4939. Cited by: Table 1, Table 3.
  • T. Li, A. K. Sahu, A. Talwalkar, and V. Smith (2020) Federated learning: challenges, methods, and future directions. IEEE Signal Processing Magazine 37 (3), pp. 50–60. Cited by: §10.1.
  • J. Lian, F. Zhang, X. Xie, and G. Sun (2018) Towards better representation learning for personalized news recommendation: a multi-channel deep fusion approach. In IJCAI, pp. 3805–3811. Cited by: §2.5, Table 3, §5.1, §8.
  • Y. Liang, B. Loni, and M. Larson (2017) CLEF newsreel 2017: contextual bandit news recommendation. In CLEF, Cited by: §3.1, Table 1, §5.2.
  • C. Lin, R. Xie, X. Guan, L. Li, and T. Li (2014) Personalized news recommendation via implicit social experts. Information Sciences 254, pp. 1–18. Cited by: Table 1.
  • D. Liu, T. Bai, J. Lian, X. Zhao, G. Sun, J. Wen, and X. Xie (2019) News graph: an enhanced knowledge graph for news recommendation. In KaRS@CIKM, pp. 1–7. Cited by: §3.2, Table 2, §4.2, Table 4.
  • D. Liu, J. Lian, S. Wang, Y. Qiao, J. Chen, G. Sun, and X. Xie (2020) KRED: knowledge-aware document representation for news recommendations. In Recsys, pp. 200–209. Cited by: §3.2, Table 2, §4.2, Table 4, §6.1.
  • J. Liu, P. Dolan, and E. R. Pedersen (2010) Personalized news recommendation based on click behavior. In IUI, pp. 31–40. Cited by: §3.1, §3.1, Table 1, §4.1.
  • J. Liu, J. Song, C. Li, X. Zhu, and R. Deng (2021) A hybrid news recommendation algorithm based on k-means clustering and collaborative filtering. In Journal of Physics: Conference Series, Vol. 1881, pp. 032050. Cited by: Table 1.
  • A. Lommatzsch, B. Kille, F. Hopfgartner, and L. Ramming (2018) NewsREEL multimedia at mediaeval 2018: news recommendation with image and text content. In MediaEval, Cited by: §2, §3.3.1.
  • A. Lommatzsch (2014) Real-time news recommendation using context-aware ensembles. In ECIR, pp. 51–62. Cited by: §3.1, §4.1.
  • J. Lu, D. Batra, D. Parikh, and S. Lee (2019) Vilbert: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. pp. 13–23. Cited by: §3.2.
  • M. Lu and J. Liu (2016) Hier-uim: a hierarchy user interest model for personalized news recommender. In CCIS, pp. 249–254. Cited by: Table 1.
  • C. A. Ludmann (2017) Recommending news articles in the clef news recommendation evaluation lab with the data stream management system odysseus. In CLEF, Cited by: §2.
  • H. Luo, J. Fan, and D. A. Keim (2008) Personalized news video recommendation. In ACM MM, pp. 1001–1002. Cited by: §3.1, Table 1.
  • T. Luostarinen and O. Kohonen (2013) Using topic models in content-based news recommender systems. In NODALIDA, pp. 239–251. Cited by: Table 1.
  • S. Manoharan and R. Senthilkumar (2020) An intelligent fuzzy rule-based personalized news recommendation using social media mining. Computational Intelligence and Neuroscience 2020. Cited by: Table 3.
  • T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In NIPS, pp. 3111–3119. Cited by: §3.2.
  • M. Moerland, F. Hogenboom, M. Capelle, and F. Frasincar (2013) Semantics-based news recommendation with sf-idf+. In WIMS, pp. 1–8. Cited by: §3.1, Table 1.
  • L. Mookiah, W. Eberle, and M. Mondal (2018) Personalized news recommendation using graph-based approach. Intelligent Data Analysis 22 (4), pp. 881–909. Cited by: §3.1, Table 1.
  • C. D. H. Nguyen, N. Arch-Int, and S. Arch-Int (2015) A semantically hybrid framework of personalizing news recommendations. International Journal of Innovative Computing, Information and Control 11 (6), pp. 1947–1963. Cited by: §3.1, Table 1.
  • Y. Noh, Y. Oh, and S. Park (2014) A location-based personalized news recommendation. In BigComp, pp. 99–104. Cited by: Table 1, §4.1, Table 3.
  • S. Okura, Y. Tagami, S. Ono, and A. Tajima (2017) Embedding-based news recommendation for millions of users. In KDD, pp. 1933–1942. Cited by: §1, §2.1, §2.2, §2.3, §2, §3.2, §3.3.2, Table 2, §4.2, Table 4, §5.1, §6.1, §8.
  • A. v. d. Oord, Y. Li, and O. Vinyals (2018) Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748. Cited by: §6.1.
  • Ö. Özgöbek, J. A. Gulla, and R. C. Erdur (2014) A survey on challenges and methods in news recommendation. In WEBIST, pp. 278–285. Cited by: §1.
  • A. H. Parizi, M. Kazemifard, and M. Asghari (2016) EmoNews: an emotional news recommender system. Journal of Digital Information Management 14 (6). Cited by: §3.1, Table 1.
  • A. H. Parizi and M. Kazemifard (2015) Emotional news recommender system. In ICCS, pp. 37–41. Cited by: §3.1, Table 1.
  • K. Park, J. Lee, and J. Choi (2017) Deep neural networks for news recommendations. In CIKM, pp. 2255–2258. Cited by: §3.2, Table 2, Table 4.
  • A. Patankar, J. Bose, and H. Khanna (2019) A bias aware news recommendation system. In ICSC, pp. 232–238. Cited by: §3.1, Table 1.
  • J. Pennington, R. Socher, and C. Manning (2014) Glove: global vectors for word representation. In EMNLP, pp. 1532–1543. Cited by: §3.2.
  • O. Phelan, K. McCarthy, M. Bennett, and B. Smyth (2011) On using the real-time web for news recommendation & discovery. In WWW, pp. 103–104. Cited by: Table 1, Table 3.
  • O. Phelan, K. McCarthy, and B. Smyth (2009) Using twitter to recommend real-time topical news. In Recsys, pp. 385–388. Cited by: Table 1, Table 3.
  • T. Qi, F. Wu, C. Wu, Y. Huang, and X. Xie (2020) Privacy-preserving news recommendation model learning. In EMNLP: Findings, pp. 1423–1432. Cited by: §10.1, §11.5, §2.7, §3.2, Table 2, Table 4, §6.2.
  • T. Qi, F. Wu, C. Wu, and Y. Huang (2021a) Personalized news recommendation with knowledge-aware interactive matching. In SIGIR, Cited by: §3.2, Table 2, §4.2, Table 4, §5.1.
  • T. Qi, F. Wu, C. Wu, and Y. Huang (2021b) PP-rec: news recommendation with personalized user interest and time-aware news popularity. In ACL, Cited by: §10.2, §10.4, §3.2, §3.3.2, Table 2, §4.2, Table 4, §6.1, §7.
  • T. Qi, F. Wu, C. Wu, P. Yang, Y. Yu, X. Xie, and Y. Huang (2021c) HieRec: hierarchical user interest modeling for personalized news recommendation. In ACL, Cited by: §10.4, §2.3, §3.2, Table 2, §4.2, Table 4, §5.1, §7.
  • Y. Qian, P. Zhao, Z. Li, J. Fang, L. Zhao, V. S. Sheng, and Z. Cui (2019) Interaction graph neural network for news recommendation. In WISE, pp. 599–614. Cited by: §3.2, §3.3.2, Table 2, §4.2, Table 4.
  • J. Qin (2020) Research progress of news recommendation methods. arXiv preprint arXiv:2012.02360. Cited by: §1.
  • J. Rao, A. Jia, Y. Feng, and D. Zhao (2013) Personalized news recommendation using ontologies harvested from the web. In WAIM, pp. 781–787. Cited by: §3.1, Table 1.
  • J. Ren, J. Long, and Z. Xu (2019) Financial news recommendation based on graph embeddings. Decision Support Systems 125, pp. 113115. Cited by: §3.3.2, Table 2, Table 4.
  • P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl (1994) GroupLens: an open architecture for collaborative filtering of netnews. In CSCW, pp. 175–186. Cited by: §2.1, §2.2, §2.4, §3.1, §4.1.
  • T. Santosh, A. Saha, and N. Ganguly (2020) MVL: multi-view learning for news recommendation. In SIGIR, pp. 1873–1876. Cited by: §3.2, §3.3.2, Table 2, §4.2, Table 4.
  • K. Saranya and G. S. Sadhasivam (2012) A personalized online news recommendation system. International Journal of Computer Applications 57 (18). Cited by: §3.1, Table 1, §4.1, Table 3.
  • S. Liu, Y. Dong, and J. Chai (2016) Research of personalized news recommendation system based on hybrid collaborative filtering algorithm. In ICCC, pp. 865–869. Cited by: §3.1, Table 1.
  • B. Shapira, P. Shoval, N. Tractinsky, and J. Meyer (2009) EPaper: a personalized mobile newspaper. Journal of the American Society for Information Science and Technology 60 (11), pp. 2333–2346. Cited by: §3.1, Table 1.
  • H. Sheu, Z. Chu, D. Qi, and S. Li (2021) Knowledge-guided article embedding refinement for session-based news recommendation. TNNLS. Cited by: §3.2.
  • H. Sheu and S. Li (2020) Context-aware graph embedding for session-based news recommendation. In Recsys, pp. 657–662. Cited by: §3.2, Table 2, §4.2, Table 4.
  • K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu (2017) Fake news detection on social media: a data mining perspective. KDD 19 (1), pp. 22–36. Cited by: §10.5, §11.9.
  • J. Son, A. Kim, and S. Park (2013) A location-based news article recommendation with explicit localized semantic analysis. In SIGIR, pp. 293–302. Cited by: §2.
  • M. Sood and H. Kaur (2014a) Preference based personalized news recommender system. International Journal of Advanced Computer Research 4 (2), pp. 575. Cited by: Table 1, Table 3.
  • M. Sood and H. Kaur (2014b) Survey on news recommendation. International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering 3 (6), pp. 9972–9977. Cited by: §1.
  • G. Sottocornola, P. Symeonidis, and M. Zanker (2018) Session-based news recommendations. In Companion Proceedings of WWW, pp. 1395–1399. Cited by: §3.1, Table 1.
  • M. Tavakolifard, J. A. Gulla, K. C. Almeroth, J. E. Ingvaldsen, G. Nygreen, and E. Berg (2013) Tailored news in the palm of your hand: a multi-perspective transparent approach to news recommendation. In WWW, pp. 305–308. Cited by: §3.1, Table 1, Table 3.
  • M. Tran, X. Tran, and H. Uong (2010) User interest analysis with hidden topic in news recommendation system. In IALP, pp. 211–214. Cited by: Table 1.
  • M. Trevisiol, L. M. Aiello, R. Schifanella, and A. Jaimes (2014) Cold-start news recommendation with domain-dependent browse graph. In Recsys, pp. 81–88. Cited by: Table 1, §4.1.
  • P. Viana and M. Soares (2017) A hybrid approach for personalized news recommendation in a mobility scenario using long-short user interest. International Journal on Artificial Intelligence Tools 26 (02), pp. 1760012. Cited by: Table 1, §4.1, Table 3.
  • C. Wang, L. Kim, G. Bang, H. Singh, R. Kociuba, S. Pomerville, and X. Liu (2020a) Discovery news: a generic framework for financial news recommendation. In AAAI, Vol. 34, pp. 13390–13395. Cited by: Table 1.
  • H. Wang, F. Wu, Z. Liu, and X. Xie (2020b) Fine-grained interest matching for neural news recommendation. In ACL, pp. 836–845. Cited by: §2.3, §5.1.
  • H. Wang, F. Zhang, X. Xie, and M. Guo (2018) DKN: deep knowledge-aware network for news recommendation. In WWW, pp. 1835–1844. Cited by: §2.1, §2.4, §2.5, §3.2, §3.2, §3.3.2, Table 2, §4.2, Table 4, §6.1.
  • X. Wang, L. Yu, K. Ren, G. Tao, W. Zhang, Y. Yu, and J. Wang (2017) Dynamic attention deep model for article recommendation by learning human editors’ demonstration. In KDD, pp. 2051–2059. Cited by: §2.
  • G. Wei, Y. Wei, J. Lei, et al. (2021) News recommendation based on click-through rate prediction model. In LISS, pp. 373. Cited by: Table 1, Table 3.
  • H. Wen, L. Fang, and L. Guan (2012) A hybrid approach for personalized recommendation of news on the web. Expert Systems With Applications 39 (5), pp. 5806–5814. Cited by: §3.1, Table 1.
  • C. Wu, F. Wu, M. An, J. Huang, Y. Huang, and X. Xie (2019a) Neural news recommendation with attentive multi-view learning. In IJCAI, pp. 3863–3869. Cited by: §1, §3.2, §3.3.2, Table 2, §4.2, Table 4.
  • C. Wu, F. Wu, M. An, J. Huang, Y. Huang, and X. Xie (2019b) Npa: neural news recommendation with personalized attention. In KDD, pp. 2576–2584. Cited by: §1, §2.2, §2.4, §3.2, Table 2, §4.2, §4.3.2, Table 4, §6.1, §6.2, §8.
  • C. Wu, F. Wu, M. An, T. Qi, J. Huang, Y. Huang, and X. Xie (2019c) Neural news recommendation with heterogeneous user behavior. In EMNLP-IJCNLP, pp. 4876–4885. Cited by: §3.2, Table 2, §4.2, Table 4.
  • C. Wu, F. Wu, S. Ge, T. Qi, Y. Huang, and X. Xie (2019d) Neural news recommendation with multi-head self-attention. In EMNLP-IJCNLP, pp. 6390–6395. Cited by: §2.1, §3.2, §3.3.2, Table 2, §4.2, Table 4.
  • C. Wu, F. Wu, Y. Huang, and X. Xie (2020a) Neural news recommendation with negative feedback. CCF TPCI, pp. 178–188. Cited by: §10.5, §3.2, Table 2, §4.2, Table 4.
  • C. Wu, F. Wu, Y. Huang, and X. Xie (2021a) User-as-graph: user modeling with heterogeneous graph pooling for news recommendation. In IJCAI, Cited by: §2.6, §3.2, Table 2, §4.2, Table 4.
  • C. Wu, F. Wu, and Y. Huang (2021b) Rethinking InfoNCE: how many negative samples do you need?. arXiv preprint arXiv:2105.13003. Cited by: §6.3.
  • C. Wu, F. Wu, J. Liu, S. He, Y. Huang, and X. Xie (2019e) Neural demographic prediction using search query. In WSDM, pp. 654–662. Cited by: §3.2, §3.3.2, Table 2, Table 4, §4, §6.1.
  • C. Wu, F. Wu, T. Qi, and Y. Huang (2020b) Clickbait detection with style-aware title modeling and co-attention. In CCL, pp. 430–443. Cited by: §11.9.
  • C. Wu, F. Wu, T. Qi, and Y. Huang (2020c) SentiRec: sentiment diversity-aware neural news recommendation. In AACL, pp. 44–53. Cited by: §10.4, §2.7, §3.2, §3.2, §3.3.2, Table 2, §4.2, Table 4, §6.1.
  • C. Wu, F. Wu, T. Qi, and Y. Huang (2020d) User modeling with click preference and reading satisfaction for news recommendation. In IJCAI, pp. 3023–3029. Cited by: §3.2, §3.3.2, Table 2, §4.2, Table 4, §6.1.
  • C. Wu, F. Wu, T. Qi, and Y. Huang (2021c) Empowering news recommendation with pre-trained language models. In SIGIR. Cited by: §2.1, §2.5, §2.6, §3.2, §3.3.2, Table 2, Table 4, §6.2, §9.
  • C. Wu, F. Wu, T. Qi, and Y. Huang (2021d) FeedRec: news feed recommendation with various user feedbacks. arXiv preprint arXiv:2102.04903. Cited by: §2.5, §3.2, Table 2, §4.2, Table 4, §6.1, §7.
  • C. Wu, F. Wu, T. Qi, and Y. Huang (2021e) MM-Rec: multimodal news recommendation. arXiv preprint arXiv:2104.07407. Cited by: §3.2, §3.3.2, Table 2, §4.2, Table 4.
  • C. Wu, F. Wu, T. Qi, and Y. Huang (2021f) Two birds with one stone: unified model learning for both recall and ranking in news recommendation. arXiv preprint arXiv:2104.07404. Cited by: §2.6, §3.2, Table 2, §4.2, Table 4, §6.3.
  • C. Wu, F. Wu, T. Qi, J. Lian, Y. Huang, and X. Xie (2020e) PTUM: pre-training user model from unlabeled user behaviors via self-supervision. In EMNLP: Findings, pp. 1939–1944. Cited by: §11.4.
  • C. Wu, F. Wu, X. Wang, Y. Huang, and X. Xie (2021g) FairRec: fairness-aware news recommendation with decomposed adversarial learning. In AAAI, pp. 4462–4469. Cited by: §10.3, §2.7, §3.2, Table 2, §4.2, Table 4.
  • F. Wu, Y. Qiao, J. Chen, C. Wu, T. Qi, J. Lian, D. Liu, X. Xie, J. Gao, W. Wu, and M. Zhou (2020f) MIND: a large-scale dataset for news recommendation. In ACL. Cited by: §1, §2.6, §2, §6.2, §8.
  • S. Xiao, Z. Liu, Y. Shao, T. Di, and X. Xie (2021) Training Microsoft News recommenders with pretrained language models in the loop. arXiv e-prints, pp. arXiv–2102. Cited by: §3.2.
  • Y. Xiao, P. Ai, C. Hsu, H. Wang, and X. Jiao (2015) Time-ordered collaborative filtering for news recommendation. China Communications 12 (12), pp. 53–62. Cited by: §3.1, §3.1, Table 1.
  • J. Yang (2016) Effects of popularity-based news recommendations (“most-viewed”) on users’ exposure to online news. Media Psychology 19 (2), pp. 243–271. Cited by: §2.
  • Z. Yang, Y. Wu, M. Wu, and Y. Wang (2020) Double cross & deep network for news recommendation. In ICEEIM, pp. 101–106. Cited by: Table 1.
  • K. F. Yeung and Y. Yang (2010) A proactive personalized mobile news recommendation system. In Developments in E-systems Engineering, pp. 207–212. Cited by: §3.1, Table 1, §4.1, Table 3.
  • J. Yi, F. Wu, C. Wu, Q. Li, G. Sun, and X. Xie (2021) DebiasedRec: bias-aware user modeling and click prediction for personalized news recommendation. arXiv preprint arXiv:2104.07360. Cited by: §10.2, §2.7, §3.2, Table 2, §4.2, Table 4.
  • X. Yi, L. Hong, E. Zhong, N. N. Liu, and S. Rajan (2014) Beyond clicks: dwell time for personalization. In RecSys, pp. 113–120. Cited by: Table 1, §4.1, §6.3.
  • Y. Yue and T. Joachims (2009) Interactively optimizing information retrieval systems as a dueling bandits problem. In ICML, pp. 1201–1208. Cited by: §5.2.
  • D. Zeleník and M. Bieliková (2011) News recommending based on text similarity and user behaviour. In WEBIST, pp. 302–307. Cited by: Table 1.
  • H. Zhang, X. Chen, and S. Ma (2019a) Dynamic news recommendation with hierarchical attention network. In ICDM, pp. 1456–1461. Cited by: §3.2, §3.3.2, Table 2, §4.2, Table 4.
  • K. Zhang, X. Xin, P. Luo, and P. Guo (2017) Fine-grained news recommendation by fusing matrix factorization, topic analysis and knowledge graph representation. In SMC, pp. 918–923. Cited by: §3.1, Table 1.
  • L. Zhang, P. Liu, and J. A. Gulla (2018) A deep joint network for session-based news recommendations with contextual augmentation. In HT, pp. 201–209. Cited by: §3.2, §3.3.2, Table 2, §4.2, Table 4.
  • L. Zhang, P. Liu, and J. A. Gulla (2019b) Dynamic attention-integrated neural network for session-based news recommendation. Machine Learning 108 (10), pp. 1851–1875. Cited by: Table 2, §4.2, Table 4.
  • X. Zhang, Q. Yang, and D. Xu (2021) Combining explicit entity graph with implicit text information for news recommendation. In Companion Proceedings of WWW, pp. 412–416. Cited by: §3.2, Table 2, §4.2, Table 4.
  • G. Zheng, F. Zhang, Z. Zheng, Y. Xiang, N. J. Yuan, X. Xie, and Z. Li (2018) DRN: a deep reinforcement learning framework for news recommendation. In WWW, pp. 167–176. Cited by: §2.3, Table 1, Table 3, §5.2, §6.2, §7.
  • Q. Zhu, X. Zhou, Z. Song, J. Tan, and L. Guo (2019) DAN: deep attention neural network for news recommendation. In AAAI, Vol. 33, pp. 5973–5980. Cited by: §3.2, §3.3.2, Table 2, §4.2, Table 4, §8.
  • M. Zihayat, A. Ayanso, X. Zhao, H. Davoudi, and A. An (2019) A utility-based news recommendation system. Decision Support Systems 117, pp. 14–27. Cited by: Table 1.