Would You Like to Hear the News? Investigating Voice-Based Suggestions for Conversational News Recommendation

06/02/2020 ∙ by Harshita Sahijwani, et al. ∙ Emory University

One of the key benefits of voice-based personal assistants is the potential to proactively recommend relevant and interesting information. One of the most valuable sources of such information is the news. However, for the user to hear news that is useful and relevant to them, it must be recommended in an interesting and informative way. To the best of our knowledge, how to present a news item for a voice-based recommendation remains an open question. In this paper, we empirically compare different ways of recommending news, or specific news items, in a voice-based conversational setting. Specifically, we study user engagement and satisfaction with five different variants of presenting news recommendations: (1) a generic news briefing; (2) news about a specific entity relevant to the current conversation; (3) news about an entity from a past conversation; (4) news on a trending news topic; and (5) the default, a suggestion to talk about news in general. Our results show that entity-based news recommendations exhibit 29% higher acceptance compared to briefing recommendations, and almost 100% higher acceptance compared to recommending generic or trending news. Our investigation into the presentation of news recommendations and the resulting insights could make voice assistants more informative and engaging.


1. Introduction

Voice-based personal assistants can serve as a medium for getting useful and interesting information in a timely and convenient manner. One of their key benefits is their ability to proactively update the user about new political developments, sports events, weather, and a variety of other topics that might be relevant to them. News is one of the most important sources of useful information. However, in order for the user to hear the news relevant to them, it must be recommended in an interesting way.

An instance of such a recommendation can be seen in the example conversation between a user and an open-domain voice-based conversational agent in Table 1. The user talks to the agent about a recent movie, and when the user loses interest in discussing movies, the system recommends a trending technology story to engage the user in hearing the news.

Bot : Hey, did you see the recent movie Joker?
User: Yes, I loved it. It’s pretty awesome.
Bot : That’s nice. I’m guessing you like superhero movies?
User: Not really, but Joker is just on another level. Loved the character and the aesthetic.
Bot : I see. Would you like to talk about other movies like the Joker?
User: I don’t want to talk about movies.
Bot : There is a trending technology story about Apple’s new AirPods Pro. Want to hear it?
User: Yes.
Bot : Wired on Oct 28th reported: Apple revealed a new version of AirPods on its website today, adding the totally wire-free headphones to its list of products. I have more news on this topic. Would you like to hear?
Table 1. Example human-machine conversation. The conversational agent recommends news as the next topic by proposing a trending technology news story. (Note: This is not a conversation from an Alexa user.)

Our work is complementary to the task of choosing which item to recommend: we study the equally important problem of how to present a news item in a voice-based conversational setting. To answer this question, we experimented with different ways of suggesting news as the next topic to initiate a conversation about news, in a large-scale user study with live Alexa users. In particular, we study user engagement and satisfaction with five different methods of presenting news recommendations: (1) offering a generic news briefing; (2) proposing to talk about an entity from the current conversation; (3) proposing to talk about an entity from the user’s past conversations; (4) offering news on a trending news topic; and (5) a generic suggestion to talk about news without a specific entity or topic.

Our results show that entity-based news recommendations have 29% higher acceptance than briefing recommendations, and almost 100% higher acceptance than trending news recommendations. Furthermore, we found that recommending news on an entity that the user has shown interest in leads to a significantly higher likelihood of the user hearing the news. In summary, our contributions include:

  • Collating the different options for presenting news recommendations in an open-domain conversational agent.

  • Defining the evaluation metrics for quantifying the effectiveness of different presentation conditions for conversational news recommendation.

  • Reporting on a large-scale empirical evaluation of conversational news recommendation with real users.

| Type | Description | Examples |
| --- | --- | --- |
| Generic | Proposing to talk about news without a specific entity/headline. | “There is so much going on in the world. Would you like to talk about the news?” |
| Trending News | Proposing to tell news on a trending topic from Bing News/Google Trends. | “How about a recent sports story titled “Astros solidify comeback”. Want to hear it?” “There is a trending technology story about Apple AirPods Pro. Want to hear it?” |
| News Briefing | Proposing to provide a 45-second summary of the news from Reuters. | “I prepared a 45-second overview of the news for you. Want to hear it?” |
| Entity based (Current Conversation) | Proposing to tell news about the user’s favorite sports team (if they mentioned it earlier), or any entity that the user asked a question about (e.g., “How old is the Dalai Lama?”). | “Do you want to hear the latest sports updates on your favorite team?” |
| Entity based (Past Conversation) | Proposing to tell news about an entity mentioned by the user in past conversations. | “Last time we spoke, you mentioned Imagine Dragons. Would you like to hear the latest news on Imagine Dragons?” |

Table 2. Five different news recommendation types defined after analyzing common news-related queries issued to our system.

To the best of our knowledge, this paper is the first large-scale, empirical evaluation of different variations of presenting news recommendation in a conversational setting. As our results show, the presentation of the news recommendation has a significant effect on recommendation acceptance, and some effect on subsequent user satisfaction with the conversational system, suggesting promising directions for future work.

The next section briefly talks about the related work to place our contributions in context.

2. Related work

In this section, we first present an overview of previous literature on general news recommendation. Then, we present approaches related to conversational news recommendation, followed by work on understanding user behaviors to evaluate conversational systems.

2.1. News Recommendation

Recommendation systems have a long tradition in the literature, since they are widely used in popular applications like e-commerce, video and streaming platforms, and web search. For web-based news recommendation systems like Google News and the Bing web search portal, collaborative filtering (Das et al., 2007; Lu et al., 2015) and hybrid methods (Liu et al., 2010) have been proposed to profile users based on their clicks and offer personalized recommendations.

2.2. News Dialogue Systems

Yoshino and Kawahara (2015) proposed a conversational news navigation system, which offers news headlines one by one, and reads out the full news article or answers questions about it depending on the user query. However, this kind of presentation is not feasible in an open-domain conversational agent.

From our examination of logs of user conversations with our system, we found that all news-related queries belonged to one of the following main categories:

  1. News from a particular category: “Tell me the latest sports news.”

  2. News about a specific entity: “Tell me some news about the New England Patriots.”

  3. A news briefing: “Give me my news briefing.”

  4. Generic: “Let’s talk about the news.”

In addition, we observed that returning users commonly mention entities from their past conversations. Based on these observations, we defined five different ways of presenting news recommendations in an open-domain conversational agent, summarized in Table 2.

2.3. Measuring Satisfaction and Engagement in Conversational Agents

Traditional IR systems have been evaluated by studying user behaviors using well-known metrics such as click-through, dwell time and touch features (Kiseleva et al., 2016; Fox et al., 2005). For evaluating conversational systems, metrics like task-completion rate and direct user ratings have been used (Hashemi et al., 2018; Kiseleva et al., 2016; Hancock et al., 2019). While these studies primarily focus on predicting offline (session-level) user satisfaction, Choi et al. (2019) proposed a neural model that predicts both online and offline satisfaction labels. We use their model, ConvSAT, to define a metric for evaluating user satisfaction with different types of news recommendations.

3. System Description

In this section, we first give an overview of the open-domain conversational agent that was used for our experiments. We then provide details of the news component of the system which is responsible for answering news-related queries and making news recommendations.

3.1. Conversational Agent Overview

This study was performed as part of a naturalistic assessment of open-domain conversational systems, organized by the Amazon Alexa Prize Conversational AI challenge. Amazon Alexa customers were randomly assigned to one of the participating systems, and could converse on a wide range of topics. At the end of the conversation, the customer could optionally leave a rating (1.0-5.0) and free-text feedback.

Our goal was to develop a conversational agent that helps the user stay informed about the world around them, while being entertained and engaged. Our agent incorporated real-time search, informed advice, and the latest news into the conversation, by attempting to discuss and share information on relevant, recent topics and opinions in news, sports, entertainment, and general knowledge. A detailed description of the agent architecture, dialogue management, and response ranking and generation can be found in our technical report (Ahmadvand et al., 2018).

The system could handle eight major topics, namely Movies, Music, News, Pets, Sports, Travel, Games and Cars. Each of these topics was handled by a domain-specific component or “mini-skill”, which would keep the user engaged in the same domain, e.g. movies, as shown in the example conversation in Table 1. If the user lost interest in the topic, or the component failed to give a satisfactory response, a new topic or component would be suggested to the user. The same component was not recommended again until all the other components had already been recommended once to the user. For example, if the user was first recommended a news briefing, they would not be recommended news again until they had been recommended movies, music, travel and all the other topics.
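The rotation rule described above can be illustrated with a minimal sketch. The class name `TopicRotation`, the `ranked` argument (a preference-ordered topic list from some hypothetical ranker), and the reset behaviour once every topic has been offered are our assumptions for illustration, not the agent's actual dialogue manager.

```python
from typing import List, Set

# The eight topics named in the text.
TOPICS: List[str] = ["Movies", "Music", "News", "Pets",
                     "Sports", "Travel", "Games", "Cars"]


class TopicRotation:
    """Suggest each component at most once per round (illustrative only)."""

    def __init__(self) -> None:
        self.suggested: Set[str] = set()

    def next_topic(self, ranked: List[str]) -> str:
        """Return the best-ranked topic not yet suggested in this round."""
        remaining = [t for t in ranked if t not in self.suggested]
        if not remaining:
            # Every component has been offered once: start a new round.
            self.suggested.clear()
            remaining = list(ranked)
        choice = remaining[0]
        self.suggested.add(choice)
        return choice
```

Even if a ranker always places News first, this policy offers News again only after the seven other components have each been suggested once.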

3.2. News-Oriented Conversational Agent

Data Sources:

The following API and feed were used for crawling news articles and for online news retrieval to provide news to the user.

  • Microsoft Bing News Search API (https://azure.microsoft.com/en-us/services/cognitive-services/bing-news-search-api): Given a topic, the API returns a list of news headlines, a short snippet from the start of the article, source, and date of publication. The API also provides trending news organized by category, like sports or politics.

  • Reuters RSS Feed: We crawl the daily news briefing from Reuters. Each story in the briefing is shortened to one or two sentences, so that the briefing can be read out to the user in about 45 seconds.
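As a rough illustration of how a briefing could be kept to about 45 seconds, the sketch below truncates each story to its first one or two sentences and stops once an estimated speaking-time budget is used up. The speaking rate of 2.5 words per second, the regex-based sentence splitter, and the function names are all assumptions made here; the paper does not describe how the stories were shortened.

```python
import re
from typing import List

WORDS_PER_SECOND = 2.5  # assumed text-to-speech rate (about 150 words/minute)


def first_sentences(text: str, max_sentences: int = 2) -> str:
    """Keep only the first `max_sentences` sentences (naive splitter)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


def build_briefing(stories: List[str], budget_seconds: float = 45.0) -> List[str]:
    """Collect shortened stories until the estimated spoken length is reached."""
    budget_words = int(budget_seconds * WORDS_PER_SECOND)
    briefing: List[str] = []
    used = 0
    for story in stories:
        summary = first_sentences(story)
        n_words = len(summary.split())
        if used + n_words > budget_words:
            break  # the next story would exceed the time budget
        briefing.append(summary)
        used += n_words
    return briefing
```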

News retrieval:

If news on a trending topic or entity is recommended and the user accepts the recommendation, we query the Bing News API for news on the trending topic or entity. We then keep offering more news on the topic or entity, as in the conversation in Table 1, until the user refuses or changes the subject from news to something else. If the user accepts the News Briefing recommendation, we return the daily news briefing crawled from Reuters, and then ask the user if they want to continue talking about the news. If the user accepts the generic news recommendation, we ask them to pick a category from sports, politics, technology and other, and then provide trending news from that category.
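The branching described above can be summarized as a small dispatch function. This is only a sketch: the `RecType` enum, the action labels, and the fallback taken when the user declines are hypothetical names introduced here, and the actual retrieval calls are deliberately left out.

```python
from enum import Enum
from typing import List, Optional, Tuple, Union


class RecType(Enum):
    GENERIC = "generic"
    TRENDING = "trending"
    BRIEFING = "briefing"
    ENTITY_CURRENT = "entity_current"
    ENTITY_PAST = "entity_past"


Action = Tuple[str, Union[None, str, List[str]]]


def follow_up(rec_type: RecType, accepted: bool,
              topic: Optional[str] = None) -> Action:
    """Map a recommendation type and the user's answer to the next step."""
    if not accepted:
        # Assumption: on refusal the agent moves on to another component.
        return ("suggest_other_component", None)
    if rec_type in (RecType.TRENDING, RecType.ENTITY_CURRENT, RecType.ENTITY_PAST):
        # Search for news on the topic or entity, then keep offering more.
        return ("search_news", topic)
    if rec_type is RecType.BRIEFING:
        # Read the daily briefing, then ask whether to continue with news.
        return ("read_briefing", None)
    # Generic suggestion: ask the user to pick a category first.
    return ("ask_category", ["sports", "politics", "technology", "other"])
```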

We describe our analysis of these methods in Section 5.

4. Experimental setup

4.1. Dataset Overview

Our conversational system was deployed as part of the Amazon Alexa Prize Challenge 2018 as mentioned in Section 3. The results are based on a total of 6994 conversations spread across June 2018 to August 2018, in which news was recommended at least once.

4.2. Online Satisfaction Annotation

We used the pretrained ConvSAT (Choi et al., 2019) model for predicting the probability of the user being satisfied at the end of the conversation, given the conversation up to an intermediate turn. The model proposed in that paper was trained to predict satisfaction labels for the same conversational agent that we used for conducting our experiments (Ahmadvand et al., 2018). As discussed in (Choi et al., 2019), the model achieved 1.072 root mean squared error (RMSE) and 0.772 mean absolute error (MAE) on a human-annotated test dataset.
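For reference, the two error measures quoted above have the following standard definitions. The symbols are ours, not the paper's: $y_i$ denotes the human-annotated label of test item $i$, $\hat{y}_i$ the model's prediction for it, and $n$ the number of test items.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data, which is consistent with the reported values.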

4.3. Evaluation Metrics

We evaluate the different formulations of news recommendations on the following four metrics.

Acceptance Rate

We define the acceptance rate of a formulation of news recommendation as the fraction of times the user explicitly accepted that type of recommendation.
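A minimal sketch of this metric, assuming each logged recommendation is reduced to a `(formulation, accepted)` pair. That log format is hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def acceptance_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of recommendations of each formulation that were accepted."""
    offered: Counter = Counter()
    accepted: Counter = Counter()
    for formulation, was_accepted in events:
        offered[formulation] += 1
        if was_accepted:
            accepted[formulation] += 1
    return {f: accepted[f] / offered[f] for f in offered}
```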

Post-Recommendation Interruption Rate

This metric indicates the fraction of times the user asked to end the conversation at the turn when a recommendation of that formulation was made.
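As a sketch, suppose the user's reaction at each recommendation turn has already been classified as `"accept"`, `"reject"`, or `"stop"` (an assumed labeling, not one defined in the paper). The interruption rate for one formulation is then the share of `"stop"` reactions.

```python
from typing import Sequence


def interruption_rate(reactions: Sequence[str]) -> float:
    """Share of recommendation turns at which the user ended the conversation."""
    if not reactions:
        return 0.0
    return sum(1 for r in reactions if r == "stop") / len(reactions)
```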

Post-Recommendation User Engagement

We define post recommendation user engagement as the average number of turns for which the user engaged with the news component after accepting that formulation of recommendation.
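One way to compute this, sketched under assumptions of our own: each conversation is represented as the list of components that handled its turns, together with the index of the turn at which the recommendation was accepted, and only consecutive news turns after that index are counted. Whether the acceptance turn itself should count is not specified in the text; here it does not.

```python
from typing import List, Sequence, Tuple


def news_turns_after_acceptance(components: Sequence[str], accept_index: int) -> int:
    """Count consecutive turns handled by the news component after acceptance."""
    count = 0
    for component in components[accept_index + 1:]:
        if component != "news":
            break  # the user left the news component
        count += 1
    return count


def post_recommendation_engagement(
        conversations: List[Tuple[Sequence[str], int]]) -> float:
    """Average number of follow-on news turns over accepted recommendations."""
    counts = [news_turns_after_acceptance(c, i) for c, i in conversations]
    return sum(counts) / len(counts) if counts else 0.0
```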

Post-Recommendation User Satisfaction

We used the user-satisfaction model described in (Choi et al., 2019) to measure post-recommendation user satisfaction. The model gives the probability of the user being satisfied at the end of the conversation, given the conversation up to an intermediate turn. We used it to measure the probability of satisfaction at the turn when news was recommended and at the turn after it. We define a formulation’s post-recommendation user satisfaction as the fraction of times the probability of satisfaction was higher after the recommendation than before.
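Given the two predicted probabilities per recommendation, the metric reduces to a simple count. The sketch below assumes the model outputs are already available as `(p_at_recommendation, p_next_turn)` pairs; it does not call ConvSAT itself.

```python
from typing import Sequence, Tuple


def post_recommendation_satisfaction(
        prob_pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of cases where predicted satisfaction rose after the recommendation."""
    if not prob_pairs:
        return 0.0
    improved = sum(1 for before, after in prob_pairs if after > before)
    return improved / len(prob_pairs)
```

Note that a tie (equal probability before and after) does not count as an increase under this reading of the definition.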

5. Results and Discussion

We first investigate the recommendation acceptance rate (Section 5.1) for different presentation conditions, which is indicative of how effective the presentation is in getting the user interested in news. Once the recommendation has been accepted, we analyze its post-recommendation user engagement (Section 5.2) and post-recommendation user satisfaction (Section 5.3). We then look at the post-recommendation interruption rate (Section 5.4) to see if the presentation can ever be detrimental to the user experience.

5.1. Recommendation Acceptance Rate

As Table 3 indicates, recommendations based on entities mentioned by the user in the current conversation have a significantly higher acceptance rate than the other types (improvements of 133% and 97.3% over the generic baseline for new and returning users, respectively). For returning users, the acceptance rate for news on past entities is also high (0.53). Trending news recommendation, on the other hand, shows little or no improvement (not statistically significant) over the generic news recommendation. A likely reason is that the user often does not know what a trending news headline is about, so these recommendations fail to get the user interested in news unless they already want to hear trending news in general.

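| Recommendation type | New: acceptance rate (rel. change) | New: N | Returning: acceptance rate (rel. change) | Returning: N |
| --- | --- | --- | --- | --- |
| Entity (curr) | 0.53 (+133%) | 1516 | 0.49 (+97.3%) | 221 |
| Entity (past) | N/A | N/A | 0.53 (+111%) | 34 |
| News Briefing | 0.41 (+80.8%) | 6939 | 0.36 (+44.7%) | 1161 |
| Trending News | 0.24 (+6.9%) | 7202 | 0.23 (-7.2%) | 69 |
| Generic | 0.23 | 3865 | 0.25 | 52 |

Table 3. Recommendation acceptance rates for all formulations for new and returning users, with relative improvements over the generic news recommendation.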
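The relative changes in Table 3 and the significance of a difference in acceptance rates can be checked with a short calculation. The sketch below uses a pooled two-proportion z-test, which is our choice for illustration; the paper does not state which test was used. It also assumes that the counts 807 and 883 listed for new users in Table 4 are the numbers of accepted entity-based and generic recommendations, out of the 1516 and 3865 recommendations listed in Table 3.

```python
import math
from typing import Tuple


def relative_change(rate: float, baseline: float) -> float:
    """Relative change of `rate` with respect to `baseline` (0.5 means +50%)."""
    return (rate - baseline) / baseline


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Entity (curr) vs. Generic for new users, under the assumption stated above.
z, p = two_proportion_z_test(807, 1516, 883, 3865)
change = relative_change(807 / 1516, 883 / 3865)
```

Under that assumption, the relative change comes out at roughly +133%, in line with the value reported in Table 3, whereas the rounded rates 0.53 and 0.23 alone would give about +130%.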

5.2. Post-Recommendation User Engagement

| Recommendation type | New: turns (rel. change) | New: N | Returning: turns (rel. change) | Returning: N |
| --- | --- | --- | --- | --- |
| Entity (curr) | 1.85 (-43.5%) | 807 | 1.70 (-48.7%) | 109 |
| Entity (past) | N/A | N/A | 1.55 (-53.0%) | 18 |
| News Briefing | 1.74 (-46.7%) | 2866 | 1.95 (-40.9%) | 420 |
| Trending News | 1.98 (-39.37%) | 1759 | 1 (-69.8%) | 16 |
| Generic | 3.27 | 883 | 3.30 | 13 |

Table 4. Post-recommendation user engagement for all formulations for new and returning users, with relative improvements over the generic news recommendation.

As Table 4 shows, generic recommendations have the highest number of follow-on turns. A possible reason for that is that users who accept the generic news recommendation are the ones who are actually interested in the news as a whole and not just specific topics. Also, the generic recommendation is followed by the bot asking the user to pick a category or topic of interest, which leads the user to engage for at least one more turn.

Although trending news topics have low acceptance rates, for new users they have higher post-recommendation engagement than entity-based recommendations (1.98 vs. 1.85 turns; p value = ). A possible explanation is that for an entity-based recommendation the news component returns whatever recent news exists about the entity mentioned by the user, whereas a trending story is more likely to be interesting in itself.

5.3. Post-Recommendation User Satisfaction

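| Recommendation type | New: satisfaction (rel. change) | New: N | Returning: satisfaction (rel. change) | Returning: N |
| --- | --- | --- | --- | --- |
| Entity (curr) | 0.90 (-7.4%) | 807 | 0.88 (-11.0%) | 109 |
| Entity (past) | N/A | N/A | 1.0 (0.0%) | 18 |
| News Briefing | 0.90 (-8.0%) | 2866 | 0.85 (-14.0%) | 420 |
| Trending News | 0.94 (-3.54%) | 1759 | 0.66 (-33.3%) | 16 |
| Generic | 0.98 | 883 | 1.0 | 13 |

Table 5. Post-recommendation user satisfaction for all formulations for new and returning users, with relative improvements over the generic news recommendation.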

The generic news recommendation increases immediate satisfaction most often, possibly because users who accept the suggestion have a prior interest in news.

5.4. Post-Recommendation Interruption Rate

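| Recommendation type | New: interruption rate (rel. change) | New: N | Returning: interruption rate (rel. change) | Returning: N |
| --- | --- | --- | --- | --- |
| Entity (curr) | 0.03 (-51.4%) | 807 | 0.02 (-41.2%) | 109 |
| Entity (past) | N/A | N/A | 0.0 (-100%) | 18 |
| News Briefing | 0.07 (+0.1%) | 2866 | 0.06 (+59%) | 420 |
| Trending News | 0.07 (-5.28%) | 1759 | 0.0 (-100%) | 16 |
| Generic | 0.07 | 883 | 0.04 | 13 |

Table 6. Post-recommendation interruption rate for all formulations for new and returning users, with relative improvements over the generic news recommendation.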

As Table 6 shows, entity-based recommendation has the lowest post-recommendation interruption rate. Users very rarely end the conversation on being offered news on an entity that they expressed interest in during the current conversation. Although the News Briefing recommendation has a relatively high acceptance rate, it also has the highest interruption rate. Users generally find news briefings useful. However, because the experiments were conducted with a “socialbot”, many users were not interested in a 45-second-long news briefing.

6. Conclusions and future work

In order to engage a user about news, a conversational system first needs to proactively suggest news to the user. To the best of our knowledge, this was the first study comparing the effectiveness of different options for presenting news recommendations in a conversational setting. We proposed four quantitative ways of evaluating news recommendation, specifically designed for the conversational setting, and evaluated multiple presentation methods on thousands of conversations with real users. Our results showed that there is a significant difference in how users respond to different presentation styles, with the proposed metrics emphasizing different aspects of system performance. In particular, the high acceptance rate of news recommendations on entities from a user’s past conversation indicates that users retain interest in the topics they like across conversations and tend to want to hear updates on them. Trending news is more likely to be relevant to the user than news on an entity, yet acceptance is higher for entity-based recommendations. This suggests that the presentation, in addition to the quality of the recommendations, affects acceptance. In future work, we plan to investigate how to better identify the news items to recommend based on the user’s conversational history and long-term profile, and how to engage the user in a conversation about a news topic after the initial news item was accepted. Together, our results, evaluation metrics, and analysis provide valuable insights into improving conversational news recommendation, and more generally information-oriented conversational agents.

References

  • A. Ahmadvand, I. J. Choi, H. Sahijwani, J. Schmidt, M. Sun, S. Volokhin, Z. Wang, and E. Agichtein (2018) Emory irisbot: an open-domain conversational bot for personalized information access. Alexa Prize Proceedings. Cited by: §3.1, §4.2.
  • I. J. Choi, A. Ahmadvand, and E. Agichtein (2019) Offline and online satisfaction prediction in open-domain conversational systems. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Cited by: §2.3, §4.2, §4.3.
  • A. S. Das, M. Datar, A. Garg, and S. Rajaram (2007) Google news personalization: scalable online collaborative filtering. In Proceedings of the 16th international conference on World Wide Web, pp. 271–280. Cited by: §2.1.
  • S. Fox, K. Karnawat, M. Mydland, S. Dumais, and T. White (2005) Evaluating implicit measures to improve web search. ACM Transactions on Information Systems (TOIS) 23 (2), pp. 147–168. Cited by: §2.3.
  • B. Hancock, A. Bordes, P. Mazare, and J. Weston (2019) Learning from dialogue after deployment: feed yourself, chatbot!. arXiv preprint arXiv:1901.05415. Cited by: §2.3.
  • S. H. Hashemi, K. Williams, A. El Kholy, I. Zitouni, and P. A. Crook (2018) Measuring user satisfaction on smart speaker intelligent assistants using intent sensitive query embeddings. In Proc. of CIKM, pp. 1183–1192. Cited by: §2.3.
  • J. Kiseleva, K. Williams, A. Hassan Awadallah, A. C. Crook, I. Zitouni, and T. Anastasakos (2016) In Proc. of SIGIR, pp. 45–54. Cited by: §2.3.
  • J. Liu, P. Dolan, and E. R. Pedersen (2010) Personalized news recommendation based on click behavior. In Proceedings of the 15th international conference on Intelligent user interfaces, pp. 31–40. Cited by: §2.1.
  • Z. Lu, Z. Dou, J. Lian, X. Xie, and Q. Yang (2015) Content-based collaborative filtering for news topic recommendation. In Twenty-ninth AAAI conference on artificial intelligence, Cited by: §2.1.
  • K. Yoshino and T. Kawahara (2015) News navigation system based on proactive dialogue strategy. In Natural Language Dialog Systems and Intelligent Assistants, pp. 15–25. Cited by: §2.2.