Voice-based personal assistants can serve as a medium for getting useful and interesting information in a timely and convenient manner. One of their key benefits is their ability to proactively update the user about new political developments, sports events, weather, and a variety of other topics that might be relevant to them. News is one of the most important sources of useful information. However, for users to hear the news relevant to them, it must be recommended in an engaging way.
An instance of such a recommendation can be seen in the example conversation between a user and an open-domain voice-based conversational agent in Table 1. The user talks to the agent about a recent movie, and when the user loses interest in discussing movies, the system recommends a trending technology story to engage the user in hearing the news.
Table 1. Example conversation between a user and an open-domain conversational agent.

| Speaker | Utterance |
|---|---|
| Bot | Hey, did you see the recent movie Joker? |
| User | Yes, I loved it. It's pretty awesome. |
| Bot | That's nice. I'm guessing you like superhero movies? |
| User | Not really, but Joker is just on another level. Loved the character and the aesthetic. |
| Bot | I see. Would you like to talk about other movies like |
| User | I don't want to talk about movies. |
| Bot | There is a trending technology story about Apple's new AirPods Pro. Want to hear it? |
| Bot | Wired on Oct 28th reported: Apple revealed a new version of AirPods on its website today, adding the totally wire-free headphones to its list of products. I have more news on this topic. Would you like to hear? |
Our work is complementary to the task of deciding which item to recommend. Instead, we study the important problem of how to present a news item in a voice-based conversational setting. To answer this question, we experimented with different ways of suggesting news as the next conversation topic, in a large-scale user study with live Alexa users. In particular, we study user engagement and satisfaction with five different methods of presenting news recommendations: (1) offering a generic news briefing; (2) proposing to talk about an entity from the current conversation; (3) proposing to talk about an entity from the user's past conversations; (4) offering news on a trending news topic; and (5) a generic suggestion to talk about news without a specific entity or topic.
Our results show that entity-based news recommendations have 29% higher acceptance than briefing recommendations, and almost 100% higher acceptance than trending-news recommendations. Furthermore, we found that recommending news on an entity that the user has shown interest in leads to a significantly higher likelihood of the user hearing the news. In summary, our contributions include:

- Collating the different options for presenting news recommendations in an open-domain conversational agent.
- Defining evaluation metrics for quantifying the effectiveness of different presentation conditions for conversational news recommendation.
- Reporting on a large-scale empirical evaluation of conversational news recommendation with real users.

Table 2. The five ways of presenting news recommendations in an open-domain conversational agent.

| Presentation | Description | Example |
|---|---|---|
| Generic | Proposing to talk about news without a specific entity/headline. | "There is so much going on in the world. Would you like to talk about the news?" |
| Trending News | Proposing to tell news on a trending topic from Bing News/Google Trends. | "How about a recent sports story titled 'Astros solidify comeback'. Want to hear it?" / "There is a trending technology story about Apple AirPods Pro. Want to hear it?" |
| News Briefing | Proposing to provide a 45-second summary of the news from Reuters. | "I prepared a 45-second overview of the news for you. Want to hear it?" |
| Entity based (Current Conversation) | Proposing to tell news about the user's favorite sports team (if they mentioned it earlier), or any entity that the user asked a question about ("How old is the Dalai Lama?"). | "Do you want to hear the latest sports updates on your favorite team?" |
| Entity based (Past Conversation) | Proposing to tell news about an entity mentioned by the user in past conversations. | "Last time we spoke, you mentioned Imagine Dragons. Would you like to hear the latest news on Imagine Dragons?" |
To the best of our knowledge, this paper is the first large-scale, empirical evaluation of different variations of presenting news recommendation in a conversational setting. As our results show, the presentation of the news recommendation has a significant effect on recommendation acceptance, and some effect on subsequent user satisfaction with the conversational system, suggesting promising directions for future work.
The next section briefly reviews related work to place our contributions in context.
2. Related Work
In this section, we first present an overview of previous literature on general news recommendation. Then, we present approaches related to conversational news recommendation, followed by work on understanding user behaviors to evaluate conversational systems.
2.1. News Recommendation
Recommendation systems have a long research tradition, as they are widely used in popular applications such as e-commerce, video and streaming platforms, and web search. For web-based news recommendation systems like Google News and the Bing web search portal, collaborative filtering (Das et al., 2007; Lu et al., 2015) and hybrid methods (Liu et al., 2010) have been proposed to profile users based on their clicks and offer personalized recommendations.
2.2. News Dialogue Systems
Yoshino and Kawahara (2015) proposed a conversational news navigation system, which offers news headlines one by one and, depending on the user query, reads out the full news article or answers questions about it. However, this kind of presentation is not feasible in an open-domain conversational agent.
From our examination of logs of user conversations with our system, we found that all news-related queries belonged to one of the following main categories:
News from a particular category: ”Tell me the latest sports news.”
News about a specific entity: ”Tell me some news about the New England Patriots.”
A news briefing: ”Give me my news briefing.”
Generic: ”Let’s talk about the news.”
In addition, we observed that returning users commonly mention entities from their past conversations. Based on these observations, we defined five different ways of presenting news recommendations in an open-domain conversational agent, summarized in Table 2.
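The four query types above can be separated with simple surface patterns. The sketch below is a minimal, hypothetical classifier for illustration only; the patterns and the category vocabulary are our assumptions, not the system's actual intent model.

```python
import re

# Illustrative category vocabulary (an assumption, not the system's actual list).
NEWS_CATEGORIES = ("sports", "politics", "technology", "business", "entertainment")

def classify_news_query(utterance: str) -> str:
    """Map a news-related utterance to one of the four observed query types."""
    text = utterance.lower()
    if "briefing" in text:
        return "briefing"                      # "Give me my news briefing."
    if any(cat in text for cat in NEWS_CATEGORIES):
        return "category"                      # "Tell me the latest sports news."
    if re.search(r"news (?:about|on) (.+)", text):
        return "entity"                        # "Tell me some news about <entity>."
    return "generic"                           # "Let's talk about the news."
```

A production system would use a trained intent classifier with entity linking rather than keyword rules, but the four-way distinction is the same.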
2.3. Measuring Satisfaction and Engagement in Conversational Agents
Traditional IR systems have been evaluated by studying user behaviors using well-known metrics such as click-through, dwell times, and touch features (Kiseleva et al., 2016; Fox et al., 2005). For evaluating conversational systems, metrics like task-completion rate and direct user ratings have been used (Hashemi et al., 2018; Kiseleva et al., 2016; Hancock et al., 2019). While these studies primarily focus on predicting offline (session-level) user satisfaction, Choi et al. (2019) proposed a neural model that predicts both online and offline satisfaction labels. We use their model, ConvSAT, to define a metric for evaluating user satisfaction with different types of news recommendations.
3. System Description
In this section, we first give an overview of the open-domain conversational agent that was used for our experiments. We then provide details of the news component of the system which is responsible for answering news-related queries and making news recommendations.
3.1. Conversational Agent Overview
This study was performed as part of a naturalistic assessment of open-domain conversational systems, organized by the Amazon Alexa Prize Conversational AI challenge. Amazon Alexa customers were randomly assigned to each participating system and could converse on a wide range of topics. At the end of the conversation, the customer could optionally leave a rating (1.0-5.0) and comment feedback.
Our goal was to develop a conversational agent that helps the user stay informed about the world around them, while being entertained and engaged. Our agent incorporated real-time search, informed advice, and the latest news into the conversation by attempting to discuss and share information on relevant current topics and opinions in news, sports, entertainment, and general knowledge. A detailed description of the agent architecture, dialogue management, and response ranking and generation can be found in our technical report (Ahmadvand et al., 2018).
The system could handle eight major topics: Movies, Music, News, Pets, Sports, Travel, Games, and Cars. Each topic was handled by a domain-specific component or "mini-skill", which would keep the user engaged in the same domain, e.g., movies, as shown in the example conversation in Table 1. If the user lost interest in the topic, or the component failed to give a satisfactory response, a new topic or component was suggested to the user. The same component was not recommended again until all the other components had been recommended once. For example, if the user is first recommended a news briefing, they will not be recommended news again until they have been recommended movies, music, travel, and all the other topics.
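The rotation policy above can be sketched as a simple round-robin queue. This is an illustrative reconstruction under our own class and method names, not the agent's actual implementation.

```python
from collections import deque

class TopicRotation:
    """Round-robin topic suggestion: once a topic is suggested, it moves to
    the back of the queue and is not suggested again until every other
    topic has had its turn."""

    def __init__(self, topics):
        self._queue = deque(topics)

    def next_suggestion(self) -> str:
        topic = self._queue.popleft()
        self._queue.append(topic)  # move to the back of the rotation
        return topic
```

With the eight topics above, a topic that was just suggested reappears only on the ninth suggestion.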
3.2. News-Oriented Conversational Agent
The following APIs were used to crawl news articles and retrieve online news for the user.
Microsoft Bing News Search API (https://azure.microsoft.com/en-us/services/cognitive-services/bing-news-search-api): Given a topic, the API returns a list of news headlines, a short snippet from the start of each article, the source, and the date of publication. The API also provides trending news organized by category, like sports or politics.
Reuters RSS Feed: We crawl the daily news briefing from Reuters. Each story in the briefing is shortened to one or two sentences, so that the briefing can be read out to the user in about 45 seconds.
If news on a trending topic or entity is recommended and the user accepts the recommendation, we query the Bing News API for news on that topic or entity. We then keep offering more news on the topic or entity, as in the conversation in Table 1, until the user refuses or changes the subject from news to something else. If the user accepts the News Briefing recommendation, we return the daily news briefing crawled from Reuters, and then ask the user if they want to continue talking about the news. If the user accepts the generic news recommendation, we ask them to pick a category from sports, politics, technology, and other, and then provide trending news from that category.
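The follow-up behavior described above can be summarized as a small decision function. This is a simplified sketch under our own labels for recommendation types and actions; the actual dialogue manager is described in (Ahmadvand et al., 2018).

```python
def next_action(rec_type: str, accepted: bool) -> str:
    """Return the system's follow-up action after a news recommendation.
    Recommendation-type and action labels are illustrative, not the system's."""
    if not accepted:
        return "suggest_other_topic"
    if rec_type in ("trending", "entity_current", "entity_past"):
        # Query the Bing News API for the topic/entity, read a story,
        # then keep offering more until the user refuses.
        return "read_story_and_offer_more"
    if rec_type == "briefing":
        # Read the Reuters daily briefing, then ask whether to continue.
        return "read_briefing_and_ask_continue"
    if rec_type == "generic":
        # Ask the user to pick a category, then serve trending news from it.
        return "ask_for_category"
    raise ValueError(f"unknown recommendation type: {rec_type}")
```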
We describe our analysis of these methods in Section 5.
4. Experimental setup
4.1. Dataset Overview
4.2. Online Satisfaction Annotation
We used the pretrained ConvSAT (Choi et al., 2019) model to predict the probability of the user being satisfied at the end of the conversation, given the conversation up to an intermediate turn. The model was trained to predict satisfaction labels for the same conversational agent that we used for our experiments (Ahmadvand et al., 2018). As reported in (Choi et al., 2019), the model achieved 1.072 root mean squared error (RMSE) and 0.772 mean absolute error (MAE) on a human-annotated test dataset.
4.3. Evaluation Metrics
We evaluate the different formulations of news recommendations on the following four metrics.
Acceptance Rate
We define the acceptance rate of a formulation of news recommendation as the fraction of times users explicitly accepted that type of recommendation.
Post-Recommendation Interruption Rate
This metric indicates the fraction of times the user asked to end the conversation at the turn when a recommendation of that formulation was made.
Post-Recommendation User Engagement
We define post-recommendation user engagement as the average number of turns for which the user engaged with the news component after accepting that formulation of recommendation.
Post-Recommendation User Satisfaction
We used the user-satisfaction model described in (Choi et al., 2019) to measure post-recommendation user satisfaction. The model gives the probability of the user being satisfied at the end of the conversation, given the conversation up to an intermediate turn. We used it to measure the probability of satisfaction at the turn when news was recommended and at the turn after it. We define a formulation's post-recommendation user satisfaction as the fraction of times the probability of satisfaction was higher after the recommendation than before.
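Given a log of recommendation events, the four metrics above can be computed as in the sketch below. The event schema and field names are our assumptions for illustration; the satisfaction probabilities would come from the ConvSAT model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RecEvent:
    accepted: bool        # user explicitly accepted the recommendation
    interrupted: bool     # user asked to end the conversation at this turn
    followup_turns: int   # turns with the news component after acceptance
    sat_before: float     # predicted satisfaction before the recommendation turn
    sat_after: float      # predicted satisfaction after the recommendation turn

def acceptance_rate(events: List[RecEvent]) -> float:
    return sum(e.accepted for e in events) / len(events)

def interruption_rate(events: List[RecEvent]) -> float:
    return sum(e.interrupted for e in events) / len(events)

def engagement(events: List[RecEvent]) -> float:
    # Averaged over accepted recommendations only.
    accepted = [e.followup_turns for e in events if e.accepted]
    return sum(accepted) / len(accepted)

def satisfaction(events: List[RecEvent]) -> float:
    # Fraction of accepted recommendations after which satisfaction rose.
    accepted = [e for e in events if e.accepted]
    return sum(e.sat_after > e.sat_before for e in accepted) / len(accepted)
```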
5. Results and Discussion
We first investigate the recommendation acceptance rate (Section 5.1) for different presentation conditions, which indicates how effective the presentation is in getting the user interested in news. Once the recommendation has been accepted, we analyze its post-recommendation user engagement (Section 5.2) and post-recommendation user satisfaction (Section 5.3). We then look at the post-recommendation interruption rate (Section 5.4) to see if the presentation can ever be detrimental to the user experience.
5.1. Recommendation Acceptance Rate
As Table 3 indicates, recommendations based on entities mentioned by the user in the current conversation have a significantly higher acceptance rate than the other types. For returning users, the acceptance rate for news on entities from past conversations is also significantly higher. Trending news recommendation, on the other hand, shows only a small improvement (not statistically significant) over the generic news recommendation. A likely reason is that the user often cannot tell what a trending headline is about, so these recommendations fail to interest users who do not already want to hear trending news.
Table 3. Recommendation acceptance rate (Δ relative to the Generic condition).

| Recommendation Type | New users: Rate (Δ) | N | Returning users: Rate (Δ) | N |
|---|---|---|---|---|
| Entity (curr) | 0.53 (+133%) | 1516 | 0.49 (+97.3%) | 221 |
| Entity (past) | N/A | N/A | 0.53 (+111%) | 34 |
| News Briefing | 0.41 (+80.8%) | 6939 | 0.36 (+44.7%) | 1161 |
| Trending News | 0.24 (+6.9%) | 7202 | 0.23 (-7.2%) | 69 |
5.2. Post-Recommendation User Engagement
Table 4. Post-recommendation user engagement: average follow-on turns with the news component (Δ relative to the Generic condition).

| Recommendation Type | New users: Avg. turns (Δ) | N | Returning users: Avg. turns (Δ) | N |
|---|---|---|---|---|
| Entity (curr) | 1.85 (-43.5%) | 807 | 1.70 (-48.7%) | 109 |
| Entity (past) | N/A | N/A | 1.55 (-53.0%) | 18 |
| News Briefing | 1.74 (-46.7%) | 2866 | 1.95 (-40.9%) | 420 |
| Trending News | 1.98 (-39.37%) | 1759 | 1.0 (-69.8%) | 16 |
As Table 4 shows, generic recommendations have the highest number of follow-on turns. A possible reason for that is that users who accept the generic news recommendation are the ones who are actually interested in the news as a whole and not just specific topics. Also, the generic recommendation is followed by the bot asking the user to pick a category or topic of interest, which leads the user to engage for at least one more turn.
Although trending news topics have low acceptance rates, they have statistically significantly higher post-recommendation engagement than entity-based recommendations. A possible explanation is that for entity-based recommendations, the news component returns whatever recent news it finds about the entity the user mentioned, whereas a trending story is selected precisely because it is likely to be interesting.
5.3. Post-Recommendation User Satisfaction
Table 5. Post-recommendation user satisfaction: fraction of accepted recommendations after which predicted satisfaction increased (Δ relative to the Generic condition).

| Recommendation Type | New users: Satisfaction (Δ) | N | Returning users: Satisfaction (Δ) | N |
|---|---|---|---|---|
| Entity (curr) | 0.90 (-7.4%) | 807 | 0.88 (-11.0%) | 109 |
| Entity (past) | N/A | N/A | 1.0 (0.0%) | 18 |
| News Briefing | 0.90 (-8.0%) | 2866 | 0.85 (-14.0%) | 420 |
| Trending News | 0.94 (-3.54%) | 1759 | 0.66 (-33.3%) | 16 |
As Table 5 shows, the generic news recommendation most often produces an increase in immediate satisfaction, possibly because users who accept this suggestion have a prior interest in news.
5.4. Post-Recommendation Interruption Rate
Table 6. Post-recommendation interruption rate (Δ relative to the Generic condition).

| Recommendation Type | New users: Interruption rate (Δ) | N | Returning users: Interruption rate (Δ) | N |
|---|---|---|---|---|
| Entity (curr) | 0.03 (-51.4%) | 807 | 0.02 (-41.2%) | 109 |
| Entity (past) | N/A | N/A | 0.0 (-100%) | 18 |
| News Briefing | 0.07 (+0.1%) | 2866 | 0.06 (+59%) | 420 |
| Trending News | 0.07 (-5.28%) | 1759 | 0.0 (-100%) | 16 |
As Table 6 shows, entity-based recommendation has the lowest post-recommendation interruption rate. Users very rarely end the conversation when offered news on an entity they expressed interest in during the current conversation. Although the News Briefing recommendation has a relatively high acceptance rate, it also has the highest interruption rate. Users generally find news briefings useful; however, because the experiments were conducted with a "socialbot", many users were not interested in a 45-second news briefing.
6. Conclusions and future work
In order to engage a user with news, a conversational system needs to proactively suggest news to the user first. To the best of our knowledge, this was the first study comparing the effectiveness of different ways of presenting news recommendations in a conversational setting. We proposed four quantitative metrics for evaluating news recommendation, specifically designed for the conversational setting, and evaluated multiple presentation methods on thousands of conversations with real users.

Our results showed that there is a significant difference in how users respond to different presentation styles, with the proposed metrics emphasizing different aspects of system performance. In particular, the high acceptance rate of news recommendations on entities from a user's past conversations indicates that users retain interest in the topics they like across conversations and tend to want updates on them. Trending news is more likely to be relevant to the user than news on an arbitrary entity, yet acceptance is higher for entity-based recommendations, suggesting that the presentation affects acceptance in addition to the quality of the recommendation itself.

In future work, we plan to investigate how to better identify the news items to recommend based on the user's conversational history and long-term profile, and how to engage the user in a conversation about a news topic after the initial item has been accepted. Together, our results, evaluation metrics, and analysis provide valuable insights for improving conversational news recommendation and, more generally, information-oriented conversational agents.
- Ahmadvand et al. (2018). Emory IrisBot: an open-domain conversational bot for personalized information access. Alexa Prize Proceedings.
- Choi et al. (2019). Offline and online satisfaction prediction in open-domain conversational systems. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM).
- Das et al. (2007). Google news personalization: scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web, pp. 271-280.
- Fox et al. (2005). Evaluating implicit measures to improve web search. ACM Transactions on Information Systems (TOIS) 23(2), pp. 147-168.
- Hancock et al. (2019). Learning from dialogue after deployment: feed yourself, chatbot! arXiv preprint arXiv:1901.05415.
- Hashemi et al. (2018). Measuring user satisfaction on smart speaker intelligent assistants using intent sensitive query embeddings. In Proceedings of CIKM, pp. 1183-1192.
- Kiseleva et al. (2016). Predicting user satisfaction with intelligent assistants. In Proceedings of SIGIR, pp. 45-54.
- Liu et al. (2010). Personalized news recommendation based on click behavior. In Proceedings of the 15th International Conference on Intelligent User Interfaces, pp. 31-40.
- Lu et al. (2015). Content-based collaborative filtering for news topic recommendation. In Twenty-Ninth AAAI Conference on Artificial Intelligence.
- Yoshino and Kawahara (2015). News navigation system based on proactive dialogue strategy. In Natural Language Dialog Systems and Intelligent Assistants, pp. 15-25.