The Alexa Prize funded 12 international teams to compete to create a conversational agent that can discuss any topic for at least 20 minutes. UCSC's SlugBot was one of these funded teams. (The competition is still ongoing, and its conditions prohibit us from reporting detailed information about the evaluation results or SlugBot's system architecture.) The greatest challenges of the competition arise directly from the potential for ongoing mixed-initiative multi-turn dialogues, which do not follow a particular plan or pursue a particular fixed information need. This paper describes some of the lessons we learned building SlugBot for the 2017 Alexa Prize, focusing particularly on the challenges of integrating content found via search with content from structured data in order to carry on an ongoing, coherent, open-domain, mixed-initiative conversation. SlugBot's conversations over the semi-finals user evaluation averaged 8:17 minutes.
Unlike much previous work on conversational AI, SlugBot could not and did not assume that the user had an "information need" (kiseleva2016predicting; chuklin2015click; Radlinski17). Rather, the design of the Alexa Prize aimed at open conversations that could engage the user through any type of dialogue or chitchat: discussing films and books, gossiping about celebrities, playing verbal games, telling stories or sharing experiences, or any of the many other activities that conversation is often used for.
This open design foregrounds many longstanding challenges that have not been solved even for task-oriented dialogue systems. These include:
Modeling discourse coherence;
Supporting mixed-initiative dialogue;
Generating contextualized and stylistically appropriate natural language responses.
This paper is structured around the "lessons learned" with respect to these challenges from our experience building SlugBot. To be clear, we are not offering a solution to these problems: instead, our intention is simply to highlight the difficulties with developing adequate computational models of these phenomena that particularly arise in the context of open-domain conversations, where users cannot be assumed to be pursuing a particular task or information need. We will attempt to motivate our hypothesis that a comprehensive solution to these challenges for open-domain dialogue requires a much deeper understanding and utilization of the semantic relations that underlie dialogue coherence.
| Turn | Speaker | Utterance | Commentary |
|------|---------|-----------|------------|
| S1 | System | What do you want to talk about? | Opening prompt; expects to map an answer to a known topic. |
| U2 | User | Let's talk about movies. | Response to open prompt; topic "movies". |
| S3 | System | I love movies! Which movies have you seen recently? | Takes the initiative to provide an opinion, then asks a follow-up question. SlugBot could have suggested talking about a particular movie, or asked if the user had seen a particular movie. |
| U4 | User | I watched Jason Bourne recently. | Names a particular film; named entity recognition must map "Jason Bourne" to a movie entity. |
| S5 | System | Oh yes, according to ratings, Jason Bourne is a pretty good movie. | SlugBot takes the initiative and provides ratings information, not necessarily unique or what was asked for; conversational marker "Oh yes". Information drawn from structured IMDB data. |
| U6 | User | I can't remember the actor's name in the movie, who stars in it? | Somewhat indirect question; conversational formulation "can't remember"; requires coreference resolution of "the movie" and "it" to the movie named Jason Bourne. |
| S7 | System | It stars Matt Damon. | SlugBot provides a direct answer and recognizes that to maintain discourse coherence it should use the pronoun "It". |
| U8 | User | Have you heard much about it in terms of the plot? | Coreference resolution of "it"; recognition that "the plot" is an entity related to a movie, and a valid conversational move. Must map "the plot" to a known relation or search query. |
| S9 | System | The CIA's most dangerous former operative is drawn out of hiding to uncover more explosive truths about his past. | System selects the first sentence of the plot synopsis returned by search; needs to know that the first sentence usually stands well on its own. |
| U10 | User | Sounds about right. What other movies has Matt Damon been in? | User requests topic chaining by actor to other movie entities; system must know that an actor can be a key to find other movies starring that actor. |
For example, consider dialogue focused on content related to the movie domain. This should be one of the easiest domains because it is well-structured, and there are existing systems handling conversations where there is a specified user information need or task, such as finding films with particular properties, finding out what is playing and where, or booking a movie ticket (kim2016natural; hakkani2014probabilistic; celikyilmaz2015enriching). Moreover, the Internet Movie Database (IMDB) (IMDb) provides information on plot, rating, and actors that can be leveraged to support conversations. IMDB also makes use of the Schema.org (Schema) structure to connect common entities to their related attribute types (such as Actor → Person → birthDate), allowing the system to retrieve a large set of possible next topics and related facts and entities.
However, remember that SlugBot is based on the assumption that the user might simply enjoy talking about films and related entities and therefore may freely move the conversational focus among different movie entities, along with the vast array of semantically-associated movie attributes: movies have actors, genres, plots, and awards; actors have names, affiliations, other movies they were in, awards, etc. Actors are people, who have spouses, families and friends, and engage in other life activities besides acting, such as political advocacy.
A potential dialogue is shown in Table 1. (An example of a recently developed MovieBot that tries to support free-ranging conversations can be found at https://www.amazon.com/dp/B01MRKGF5W.) The interaction might appear to be simple enough: the user chooses to discuss movies and selects Jason Bourne as the specific movie she is interested in, the system finds the movie in IMDB, and then provides information on its rating, lead actor, and plot. The user then changes the topic to other movies with the same actor, and the conversation continues.
Even with the availability of IMDB, however, the interaction is not totally straightforward. The right-hand side of Table 1 describes some of the required competencies and decisions SlugBot must make. First, SlugBot must be able to perform coreference resolution and recognize that "the movie" and "it" in turns U6 and U8 are coreferential. We estimate the accuracy of noun-phrase coreference resolution to be only about 70% for off-the-shelf tools applied to dialogue, since most of them are targeted at text (Grosz77; Walker96b; BFP87; recasens2010coreference; PrinceWalker93; pradhan2011conll; recasens2013life; Manningetal14).
More challenging is that at each system turn, a large number of conversational moves are possible. Making good decisions about what to say next requires balancing a dialogue policy as to what dialogue acts might be good in this context against real-time information as to what types of content might be usable in this context. SlugBot could offer an opinion as in turn S3, ask a follow-on question as in S3, take the initiative to provide unasked-for information as in S5, or decide, e.g. in the case of the user's request for plot information, to use search to retrieve some relevant content. Search cannot be used effectively here without constructing an appropriate query, or knowing in advance where plot information might be available. In a real-time system, live search may not achieve the required speed and efficiency, so preprocessing or caching of relevant information may be necessary. Finally, most other domains do not have such high-quality structured data available, leaving us to develop, or try to rely on, more general models of discourse coherence.
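One way to make this balance concrete is to score each candidate move by combining a policy prior with a real-time check of whether content is actually available for that move. The move names and weights below are illustrative assumptions, not SlugBot's actual policy:

```python
# Illustrative sketch: choose the next dialogue move by combining a
# policy prior (how good a move type is in this context) with a
# real-time check of whether content for it is available.
# All move names and weights here are hypothetical.

POLICY_PRIOR = {
    "offer_opinion": 0.3,
    "followup_question": 0.3,
    "provide_unasked_info": 0.2,
    "search_answer": 0.2,
}

def choose_move(available_content):
    """available_content maps move name -> candidate utterance or None."""
    scored = [
        (POLICY_PRIOR[move], move, text)
        for move, text in available_content.items()
        if text is not None
    ]
    if not scored:
        return ("reprompt", "What would you like to talk about?")
    _, move, text = max(scored)
    return (move, text)

# The opinion and question slots have content; retrieval came back empty.
move, text = choose_move({
    "offer_opinion": "I love movies!",
    "followup_question": "Which movies have you seen recently?",
    "provide_unasked_info": None,
    "search_answer": None,
})
```

A learned policy would replace the fixed priors with context-dependent scores, but the structure of the decision is the same.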
2. Modeling Discourse Coherence
In open-domain conversation, dialogue coherence between related turns must be maintained. What underlies dialogue coherence goes beyond simple word overlap or similarity, and it is clear that neural models of open-domain conversational dialogue do not yet capture it. Theories of discourse posit that there are a small number of semantic relations that can hold between adjacent turns: at the most general level these are contingency, comparison, expansion, and temporal order (Prasadetal08; Hobbs83; Hobbs90). (Some theories rely on the plan-based coherence of discourse (GS86; Litman85), but we believe that this can only be easily applied to task-oriented dialogue.) We posit that one way to allow SlugBot to take the initiative and produce a turn that maintains discourse coherence is to find content for SlugBot's next turn that instantiates a valid semantic relation between the current user turn and SlugBot's next turn. One of the strongest bases for such semantic relations are the relations captured by ontologies or frames, which give us related entities, e.g. movies have actors and directors (hakkani2014probabilistic; heck2013leveraging). These types of relations can be used to instantiate the expansion relation, which basically captures moving to strongly related subtopics, often by chaining off a particular discourse entity. To find content instantiating the expansion relation for SlugBot's next turn (taking the initiative), we carry out the following pipeline:
1. Perform coreference resolution on the user's turn.
2. Identify relevant entities in the user query, e.g. a movie name or sports team, by parsing and checking entities using the Google Knowledge Graph.
3. Look up information on the entity to assign relevant attributes, using resources like Schema.org, YAGO, or DBpedia (for example, a Sports Team has players).
4. Retrieve relevant information about the entity. In the easiest scenario, we have a structured source of information, like IMDB. In other cases, we need to use search, which often requires further processing, such as parsing unstructured information on Wikipedia or in search results and attempting to extract relevant content.
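The four steps above can be sketched as a pipeline. Every stage here is a naive stub standing in for the real components (a coreference resolver, Google Knowledge Graph entity linking, a Schema.org-style ontology, and a structured source such as IMDB); the toy data is drawn from Table 1:

```python
# Minimal sketch of the expansion-relation pipeline. Each stage is a
# stub: a real system would call a coreference resolver, the Google
# Knowledge Graph API, and a structured source such as IMDB.

def resolve_coreference(turn, context):
    # Step 1: replace referring expressions with the most recent
    # entity in the discourse context (a naive stand-in).
    if context:
        for pronoun in ("it", "the movie"):
            turn = turn.replace(pronoun, context[-1])
    return turn

def identify_entity(turn, known_entities):
    # Step 2: spot a known entity mention (stand-in for entity linking).
    for entity in known_entities:
        if entity.lower() in turn.lower():
            return entity
    return None

def entity_attributes(entity, schema):
    # Step 3: look up attribute types for the entity (Schema.org-style).
    return schema.get(entity, {}).get("attributes", [])

def retrieve_fact(entity, attribute, kb):
    # Step 4: retrieve a fact for an entity-attribute pair from a toy KB.
    return kb.get((entity, attribute))

SCHEMA = {"Jason Bourne": {"attributes": ["actor", "rating", "plot"]}}
KB = {("Jason Bourne", "actor"): "Matt Damon"}

turn = resolve_coreference("who stars in it?", ["Jason Bourne"])
entity = identify_entity(turn, SCHEMA)
attrs = entity_attributes(entity, SCHEMA)
fact = retrieve_fact(entity, attrs[0], KB)
```

The same skeleton applies whether step 4 hits a structured source or falls back to search plus extraction.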
In the case of movies, the structure of IMDB, as discussed above, allows us to link between related entities and attributes using various database keys. However, other conversational domains do not have freely available, richly structured information such as this. It is rare for a single resource to aggregate all the information that might be useful, so SlugBot must be able to leverage and integrate information from multiple sources. But state-of-the-art knowledge bases and ontologies are still limited. Table 2 lists some of the resources we have found most useful for search and structured information.
| # | Resource | Description |
|---|----------|-------------|
| 1 | Wikipedia (Wikipedia) | Multi-lingual, web-based, free-content encyclopedia. |
| 2 | YAGO (Rebele2016) | Semantic knowledge base built from Wikipedia (Wikipedia), WordNet (WordNet), and GeoNames (GeoNames), with over 10 million entities and over 120 million facts about them. |
| 3 | DBpedia (DBPedia) | Crowd-sourced semantic knowledge graph using Wikipedia data (around 4.22 million entities in the ontology). |
| 4 | Google Knowledge Graph (GoogleKnowledgeGraph) | API for finding entity information, including type and details, with a relevance score. |
Like movies, sports is another domain that has rich structure, and in which there is broad user interest. Search results for a query about "Madison Bumgarner" are shown in Figure 1, showcasing a sample of the different information retrievable from each source (Step 2 of the pipeline).
From the Google Knowledge Graph result (Figure 1(a)) we are able to ascertain the entity type, a brief description, and a relevant Wikipedia page (Figure 1(b)), which we can use to find accurate structured information. We may further augment our knowledge by using the information returned by the Google Knowledge Graph as parameters to our YAGO or DBpedia query, which can more easily extract specific relationships between an entity and an attribute. For example, the results returned by YAGO for the "Madison Bumgarner" query contain a connection to the headline Struggling MadBum might not garner next start, which is contextually relevant data not encapsulated anywhere in the previously examined results.
However, there are disconnects between the resources: some entities are available in one resource and not another, and information may be inconsistent across resources. While it would be nice not to have to anticipate the types of integration that are needed, our take-away is that, at present, we must accomplish the steps in our pipeline by integrating knowledge from different resources in advance, even though projects such as YAGO have been working on such integration for at least ten years.
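A minimal sketch of the kind of advance integration this implies: merge per-entity records from several resources, keep track of which source supplied each value, and flag fields where resources disagree rather than silently overwriting them. The field names and values here are illustrative, not actual query results:

```python
# Sketch: integrate entity records from multiple knowledge resources
# in advance, recording which source supplied each value and flagging
# fields where the resources disagree. Field names are illustrative.

def merge_records(records):
    """records: {source_name: {field: value}} -> (merged dict, conflicts)."""
    merged, conflicts = {}, {}
    for source, record in records.items():
        for field, value in record.items():
            if field in merged and merged[field][0] != value:
                # Record the disagreement under both source names.
                conflicts.setdefault(field, {merged[field][1]: merged[field][0]})
                conflicts[field][source] = value
            else:
                merged.setdefault(field, (value, source))
    return {f: v for f, (v, _) in merged.items()}, conflicts

merged, conflicts = merge_records({
    "google_kg": {"type": "Baseball player", "team": "SF Giants"},
    "dbpedia":   {"type": "Athlete", "birth_year": 1989},
})
```

Here the two resources agree on nothing but cover complementary fields, except for `type`, which is flagged as a conflict for downstream resolution.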
Other discourse coherence relations besides expansion are also viable candidates for selecting content for next turns, but finding content that instantiates these relations can be a challenging problem in itself. For example, in casual conversation, it is common to provide opinions and then perhaps further take the initiative and justify them. The justification of an opinion is a type of contingency relation: we describe how we curate content to provide justifications in Section 3.
We have also been able to use the temporal relation in a limited way by drawing on narratively structured sources, such as personal stories in blogs. Since these stories are told in temporal order, we can repurpose the content of these blogs to tell stories, maintaining pre-existing narrative coherence when the system produces a sequence of turns (Bowdenetal16, ). However, we posit that there is much more that could be done to make better use of deep semantic discourse relations for recognizing discourse relations and generating coherent conversational turns.
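As a sketch, repurposing a temporally ordered story into a sequence of system turns can be as simple as segmenting it at sentence boundaries and emitting the segments in order, which preserves the narrative's pre-existing temporal coherence. The story text below is invented for illustration:

```python
import re

# Sketch: convert a monologic, temporally ordered story into a queue
# of system turns, preserving the narrative order across the dialogue.

def story_to_turns(story, sentences_per_turn=1):
    # Split at sentence-final punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", story) if s.strip()]
    # Group consecutive sentences into turns, keeping original order.
    return [" ".join(sentences[i:i + sentences_per_turn])
            for i in range(0, len(sentences), sentences_per_turn)]

turns = story_to_turns(
    "Last summer I drove to the coast. The fog rolled in before noon. "
    "We waited it out in a tiny diner."
)
```

A real system would interleave these turns with user reactions and back-channels, but the temporal relation between turns comes for free from the source narrative.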
3. Mixed Initiative Dialogue
Mixed-initiative dialogue is key to a natural conversational interaction (HallerMcroy98; Chu-Carroll98; WW90; Allen99; Bohus03; Radlinski17), and this is even more important for open-domain dialogue than for task-oriented or information-seeking dialogue. One of our primary hypotheses, as described above, is that good models of discourse coherence will help SlugBot identify content that can be used to take the initiative. However, models of discourse coherence have rarely been applied to conversation (stent2000rhetorical; riccardi2016discourse; tonelli2010annotation), and thus there is considerable work to be done simply in understanding how these relations can be instantiated in dialogue.
In addition, a further challenge arises from the fact that both system and user options for dialogue acts are extremely varied at each turn; e.g. user intents can be to provide opinions, give or solicit information, contrast two possibilities, request the system to perform an action, and more. One reasonable taxonomy for the types of dialogue acts available to SlugBot could be based, for example, on the dialogue act annotations in the Switchboard corpus (SWBD-DAMSL).
| Entity | Domain | Rating | Justification |
|--------|--------|--------|---------------|
| Magneto | comics | 5 | he can control metal |
| Aliens | movie | 4 | well cast, action packed |
| Line | Speaker | Utterance |
|------|---------|-----------|
| 1 | System | What do you want to talk about? |
| 2 | User | Let's talk about comic books. |
| 3 | System | I'm a pretty avid comic book reader! Who is your favorite character? |
| 4 | User | I really like Moon Knight. |
| 5 | System | Moon Knight is a fictional superhero appearing in American comic books published by Marvel Comics. Created by Doug Moench and Don Perlin, the character first appeared in Werewolf by Night issue 32. Does it sound like we are thinking of the same character? Why do you like them? |
| 6 | User | Yep that's the one, I like him because he's edgy. Who is your favorite character? |
| 7 | System | I like Magneto. |
| 8 | User | Okay why? |
| 9 | System | Personally, I think he's cool because he can control metal. |
| 10 | User | I guess that is a cool power, when did he first appear? |
| 11 | System | I think that Magneto first appeared in X-Men issue 1 in 1963. |
Here, we consider a simple case combining discourse relations and dialogue acts that we have implemented in SlugBot in order to take the initiative in a way that we hoped the user would find interesting. Our aim was to utilize the contingency discourse relation to connect a statement of opinion and its justification. We designed a template containing both arguments of the contingency relation, namely I think ⟨entity⟩ is ⟨opinion⟩ because ⟨justification⟩. We construct a table of argument pairs that can instantiate this relation, as shown in Table 3. This table can be populated by crowd-sourcing or by using search as a pre-processing step.
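The opinion-plus-justification initiative can be sketched as a template fill over a curated argument-pair table like Table 3. The column names and the in-memory table below are our reading of that table, not its actual storage format:

```python
# Sketch: instantiate the contingency relation (opinion -> justification)
# from a curated table of argument pairs, as in Table 3. The table
# structure here is an illustrative reading of that table.

ARGUMENT_PAIRS = {
    "Magneto": {"domain": "comics", "opinion": "cool",
                "justification": "he can control metal"},
    "Aliens":  {"domain": "movie", "opinion": "cool",
                "justification": "it is well cast and action packed"},
}

def opinion_with_justification(entity):
    # Fill both arguments of the contingency template.
    row = ARGUMENT_PAIRS[entity]
    return "I think {} is {} because {}.".format(
        entity, row["opinion"], row["justification"])

utterance = opinion_with_justification("Magneto")
```

The template guarantees that the opinion (first argument) and its justification (second argument) stand in a valid contingency relation, because the pairing was curated in advance.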
Table 4 illustrates how this is used in our conversations about comics. At Line 6, when the user asks Who is your favorite character?, it is most appropriate to provide an opinion. It is difficult to imagine retrieving search-based data which contains a contextually relevant opinion, and it is even more difficult to imagine that, had search returned such an opinion, it could be used a second time to retrieve a justification for the provided opinion and answer the user's follow-up question in Line 8, Okay why?. The source text for the search would have to be annotated for the type of content that could provide justifications, and search would have to support these types of semantic relations.
4. Natural Language Generation
The current challenges for natural language generation, in our view, arise from the need to combine information from structured and unstructured sources when producing conversational utterances. SlugBot currently uses a combination of pre-written templates, sentence selection, and techniques for telling stories based on converting monologic stories to dialogic sequences (Bowdenetal16).
Structured data, when available, can do more than structure a search result: it can also be easier to use within a conversation because it provides the structure needed for high-precision natural language generation (Walkeretal07; StentMolina09). More precisely, a small set of generic templates with various slots can be filled with information from structured data sources to ensure high-quality, accurate responses. These generic templates can be hand-crafted, or prepared in advance by learning natural language generation templates automatically from appropriate conversational domain sources, such as different types of user-generated content (Higashinakaetal07; OrabyWalker17), as illustrated by our justification initiatives in Section 3.
For general fact-based questions, on the other hand, search content can be used directly. For example, at line 14 in Table 5, when the user asks What was the first movie to feature a vampire?, search provides us with a good response. However, this introduces the challenge of updating the discourse context with the right representation of the two movies under discussion, so that they are available for follow-on coreference. This is an open problem.
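One simple way to make both movies available for follow-on coreference is to push every entity mentioned in a system response onto a salience-ordered discourse context, and resolve a later pronoun by searching that context most-recent-first for a type-compatible entity. This is a naive sketch, not a solution to the open problem:

```python
# Naive sketch of a discourse context: a salience-ordered list of
# entities mentioned so far, searched most-recent-first when a
# pronoun needs a referent of a compatible type.

class DiscourseContext:
    def __init__(self):
        self.entities = []  # most recent entity last

    def mention(self, name, etype):
        # Re-mentioning an entity moves it to the most-salient position.
        self.entities = [e for e in self.entities if e[0] != name]
        self.entities.append((name, etype))

    def resolve(self, pronoun, wanted_type):
        """Return the most salient entity of the wanted type, or None.
        A real resolver would also use the pronoun's gender/animacy."""
        for name, etype in reversed(self.entities):
            if etype == wanted_type:
                return name
        return None

# After the system's answer about the first vampire film, both movies
# it mentioned are in the context; "it" resolves to the more recent one.
ctx = DiscourseContext()
ctx.mention("Nosferatu", "movie")
ctx.mention("Dracula", "movie")
referent = ctx.resolve("it", "movie")
```

The hard part, glossed over here, is deciding which mentions in a search-derived sentence should enter the context and with what types.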
| Line | Speaker | Utterance |
|------|---------|-----------|
| 1 | System | You know a secret passion of mine is talking about famous monsters from monster movies. Want to talk about that? |
| 3 | System | Great, do you have a specific type of monster movie that you like? |
| 4 | User | I guess, I mean I like movies about aliens. |
| 5 | System | Did you have a specific alien movie in mind? |
| 6 | User | I just saw Aliens the other day. Can you tell me about it? |
| 7 | System | Aliens is a 1986 American science-fiction action horror film written and directed by James Cameron and starring Sigourney Weaver. It is the sequel to the 1979 film Alien and the second installment in the Alien franchise. |
| 8 | User | What did you think of it? |
| 9 | System | I thought Aliens was well cast and action packed. |
| 11 | System | Do you want to hear about my favorite monsters? |
| 13 | System | I like Dracula from the Dracula movies. Have you heard of him? |
| 14 | User | What was the first movie to feature a vampire? |
| 15 | System | Nosferatu is the first film to feature a Vampire's death by sunlight, which formerly only weakened vampires. The next classic treatment of the vampire legend was in Universal's Dracula starring Bela Lugosi as Count Dracula. |
Structured-data response:

| 1 | User | What do you know about the Hitchhiker's Guide to the Galaxy? |
| 2 | System | The Hitchhiker's Guide to the Galaxy is a science fiction book from 1981. |

Search-based response:

| 1 | User | What do you know about the Hitchhiker's Guide to the Galaxy? |
| 2 | System | The Hitchhiker's Guide to the Galaxy is the first of five books in the Hitchhiker's Guide to the Galaxy comedy science fiction "trilogy" by Douglas Adams. |
It is clear that in order to use a semi-structured approach, we need to determine when to utilize each source. Structured data can be easier to formulate into system responses and can often more easily handle on-topic follow-up questions, but is more limited in scope. An obvious approach, also used in the Watson Jeopardy system (ferrucci2010building, ), is to pool responses from both sources and rank them. We have not, to date, collected enough data to build a ranker.
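The pooling approach can be sketched as: generate a candidate from each source, then rank the pool with a scoring function and return the winner. The heuristic score below (a source prior plus an informativeness bonus) is a placeholder for the learned ranker we do not yet have data to train; the candidate texts are the Table 6 responses:

```python
# Sketch of pool-and-rank: collect candidate responses from the
# structured-data generator and from search, then pick the highest
# scoring one. The scoring heuristic is a stand-in for a ranker
# learned from conversation data.

def score(candidate):
    source, text = candidate
    length_bonus = min(len(text.split()), 30) / 30.0  # prefer informative answers
    source_prior = {"structured": 0.6, "search": 0.5}[source]
    return source_prior + length_bonus

def pool_and_rank(candidates):
    candidates = [c for c in candidates if c[1]]  # drop empty responses
    return max(candidates, key=score) if candidates else None

best = pool_and_rank([
    ("structured", "The Hitchhiker's Guide to the Galaxy is a science "
                   "fiction book from 1981."),
    ("search", "The Hitchhiker's Guide to the Galaxy is the first of five "
               "books in the Hitchhiker's Guide to the Galaxy comedy "
               "science fiction trilogy by Douglas Adams."),
])
```

With these toy weights the richer search result outranks the sparse structured one, which is exactly the behavior a learned ranker would need to reproduce for Table 6.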
Our plan is to apply a combination of reinforcement learning and learned ranking functions for utterance variants in particular contexts to SlugBot conversations as we move forward with our own data collection, outside of the Alexa Prize competition (stanfordCoherence16; yu16ticktockRL; Walker00; WKL00; WBK99). The first step, however, is to use the Alexa Prize competition data to learn a Paradise-Open-Domain evaluation function, with additional metrics relevant to open-domain dialogue, e.g. independent variable metrics that predict overall dialogue quality such as response delay, vocabulary diversity, dialogue act sequence n-grams (WalkerPassonneau01), conversational depth, number of reprompts (WLHWG02), and other measures that can be automatically logged. Many of the required measures have been used over the last 20 years in PARADISE to evaluate task-oriented dialogue systems, and they remain highly relevant to overall dialogue quality in open-domain dialogue systems (Walkeretal02b; WPB01; WLKA97). We predict this can potentially improve the overall performance of the system, as demonstrated in Table 6: there, the structured data is sparse, resulting in an uninteresting response, while search returns a very robust answer. Our Paradise-Open-Domain evaluation function would need to learn, through ranking, to prioritize the result returned by search despite having structured data.
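At its core, a PARADISE-style evaluation function is a regression from automatically logged metrics to a predicted quality score. A sketch with invented weights (in PARADISE the weights are fit by regressing user satisfaction ratings on the logged metrics):

```python
# Sketch of a PARADISE-style evaluation function: predicted dialogue
# quality as a weighted sum of automatically logged metrics. The
# weights below are invented for illustration; in practice they are
# learned by regressing user satisfaction on the metrics.

WEIGHTS = {
    "response_delay_sec": -0.5,   # slower responses hurt
    "vocabulary_diversity": 0.8,  # lexical variety helps
    "num_reprompts": -0.7,        # reprompts signal trouble
    "conversational_depth": 0.6,  # sustained topics help
}

def predicted_quality(metrics, bias=3.0):
    return bias + sum(WEIGHTS[name] * value for name, value in metrics.items())

q = predicted_quality({
    "response_delay_sec": 1.2,
    "vocabulary_diversity": 0.55,
    "num_reprompts": 2,
    "conversational_depth": 3,
})
```

The open-domain extension amounts to adding new logged predictors (dialogue act n-grams, conversational depth) to the same regression framework.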
For open-domain NLG, we have also conducted experiments with neural sequence-to-sequence approaches using open-domain corpora such as film dialogue, Big Bang Theory scripts, and open subtitles. To date, these approaches do not produce interesting utterances that maintain discourse coherence. It is possible that further curation and semantic annotation of these resources, e.g. labelling semantic roles and identifying dialogue acts and discourse relations, might be helpful, but this could also introduce data sparsity; for example, in Switchboard the dialogue act distribution is highly skewed. Integrating information across multiple sources could also be further explored (Bowdenetal16). Recent work on hybrid neural generation approaches that use knowledge of sentence and discourse planning structures also seems promising (Nayaketal17; Walker00; Stentetal02).
In this paper, we describe some of the challenges we encountered building SlugBot, an open-domain conversational agent funded by the Amazon Alexa Prize. We have introduced more problems than we have solved, and we have attempted to support our hypothesis that we need richer models of discourse coherence and discourse semantics to allow a conversational agent to take the initiative in open-domain conversations. We illustrated how search and structured information can be combined in order for SlugBot to find content to use to take the initiative and respond to the user's utterances. We propose a hybrid approach for language generation that combines templates with sentence selection from search, and we show examples in different domains to demonstrate real-world use cases of our approach. For future work, we plan to bring together resources that provide structured data from different sources into a single, accessible framework, to supply personal assistants with scalable knowledge bases that will power more natural, mixed-initiative, and engaging conversations. We believe that it will be possible in the next few years to build conversational agents that can carry on a conversation for 20 minutes about many different topics.
-  James F. Allen, Curry I. Guinn, and Eric Horvitz. Mixed-initiative interaction. IEEE Intelligent Systems, 14(5):14–23, September 1999.
-  Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. DBpedia: A nucleus for a web of open data. In Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference, ISWC'07/ASWC'07, pages 722–735, Berlin, Heidelberg, 2007. Springer-Verlag.
-  Dan Bohus and Alexander I. Rudnicky. Ravenclaw: dialog management using hierarchical task decomposition and an expectation agenda. In INTERSPEECH. ISCA, 2003.
-  Kevin K. Bowden, Grace I. Lin, Lena I. Reed, and Marilyn A. Walker. M2d: Monolog to dialog generation for conversational story telling. In International Conference on Interactive Digital Storytelling, ICIDS’16, 2016.
-  Susan E. Brennan, Marilyn Walker Friedman, and Carl J. Pollard. A centering approach to pronouns. In Proc. 25th Annual Meeting of the ACL, Stanford, pages 155–162, 1987.
-  Asli Celikyilmaz, Dilek Hakkani-Tur, Panupong Pasupat, and Ruhi Sarikaya. Enriching word embeddings using knowledge graph for semantic tagging in conversational dialog systems. In AAAI Spring Symposium: Workshop on Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches, 2015.
-  Jennifer Chu-Carroll and Michael K. Brown. An evidential model for tracking initiative in collaborative dialogue interactions. User Modeling and User-Adapted Interaction, 8(3-4):215–254, February 1998.
-  Aleksandr Chuklin, Ilya Markov, and Maarten de Rijke. Click models for web search. Synthesis Lectures on Information Concepts, Retrieval, and Services, 7(3):1–115, 2015.
-  Mihail Eric and Christopher D Manning. Key-value retrieval networks for task-oriented dialogue. In SIGDIAL 2017: Session on Natural Language Generation for Dialog Systems, 2017.
-  David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A Kalyanpur, Adam Lally, J William Murdock, Eric Nyberg, John Prager, et al. Building watson: An overview of the deepqa project. AI magazine, 31(3):59–79, 2010.
-  Google. Google Knowledge Graph. https://www.google.com/intl/bn/insidesearch/features/search/knowledge.html.
-  Barbara J. Grosz. The representation and use of focus in dialogue understanding. Technical Report 151, SRI International, 333 Ravenswood Ave, Menlo Park, Ca. 94025, 1977.
-  Barbara J. Grosz and Candace L. Sidner. Attention, intentions and the structure of discourse. Computational Linguistics, 12:175–204, 1986.
-  Dilek Hakkani-Tür, Asli Celikyilmaz, Larry Heck, Gokhan Tur, and Geoff Zweig. Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding. 2014.
-  Susan Haller and Susan McRoy. Special issue: Computational models of mixed-initiative interaction (part 1). user modeling and user-adapted interaction. 8, 1998.
-  Larry Heck, Dilek Hakkani-Tür, and Gokhan Tur. Leveraging knowledge graphs for web-scale unsupervised semantic parsing. 2013.
-  Ryuichiro Higashinaka, Marilyn A. Walker, and Rashmi Prasad. An unsupervised method for learning generation lexicons for spoken dialogue systems by mining user reviews. ACM Transactions on Speech and Language Processing, 4(4), 2007.
-  Jerry R. Hobbs. Topic drift. In Bruce Dorval, editor, Conversational Coherence and Its Development. Ablex, Norwood, N.J., 1990.
-  J.R. Hobbs. Towards an understanding of coherence in discourse. In M. Brady and R.C. Berwick, editors, Computational Models of Discourse, pages 223–243. The MIT Press, Cambridge, Mass., 1983.
-  IMDb. IMDb. https://www.imdb.com/.
-  D. Jurafsky, E. Shriberg, and D. Biasca. SWBD-DAMSL labeling project coder's manual. Technical report, University of Colorado, 1997. Available as http://stripe.colorado.edu/~jurafsky/manual.august1.html.
-  Young-Bum Kim, Alexandre Rochette, and Ruhi Sarikaya. Natural language model re-usability for scaling to different domains. In EMNLP, pages 2071–2076, 2016.
-  Julia Kiseleva, Kyle Williams, Ahmed Hassan Awadallah, Aidan C Crook, Imed Zitouni, and Tasos Anastasakos. Predicting user satisfaction with intelligent assistants. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 45–54. ACM, 2016.
-  Sebastian Krause, Mikhail Kozhevnikov, Eric Malmi, and Daniele Pighin. Redundancy localization for the conversationalization of unstructured responses. In SIGDIAL 2017: Session on Natural Language Generation for Dialog Systems, 2017.
-  J. Li and D. Jurafsky. Neural Net Models for Open-Domain Discourse Coherence. ArXiv e-prints, June 2016.
-  Diane Litman. Plan recognition and discourse analysis: An integrated approach for understanding dialogues. Technical Report 170, University of Rochester, 1985.
-  Christopher D Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In ACL (System Demonstrations), pages 55–60, 2014.
-  Neha Nayak, Dilek Hakkani-Tur, Marilyn Walker, and Larry Heck. Discourse planning in slot-value informed sequence to sequence models for language generation. In Proc. of Interspeech 2017, 2017.
-  Shereen Oraby and Marilyn Walker. Harvesting creative templates for stylistic variation of restaurant reviews. In EMNLP 2017 Workshop on Stylistic Variation, 2017.
-  Sameer Pradhan, Lance Ramshaw, Mitchell Marcus, Martha Palmer, Ralph Weischedel, and Nianwen Xue. Conll-2011 shared task: Modeling unrestricted coreference in ontonotes. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–27. Association for Computational Linguistics, 2011.
-  R. Prasad, N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo, A. Joshi, and B. Webber. The penn discourse treebank 2.0. In Proc. of the 6th International Conference on Language Resources and Evaluation (LREC 2008), pages 2961–2968, 2008.
-  Ellen F. Prince and Marilyn A. Walker. A bilateral approach to givenness: a hearer-status algorithm and a centering algorithm. In Proc. of the 4th International Pragmatics Conference. Benjamins, 1993.
-  Filip Radlinski and Nick Craswell. A theoretical framework for conversational search. In Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval, CHIIR ’17, pages 117–126, New York, NY, USA, 2017. ACM.
-  Thomas Rebele, Fabian Suchanek, Johannes Hoffart, Joanna Biega, Erdal Kuzey, and Gerhard Weikum. YAGO: A Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames, pages 177–185. Springer International Publishing, Cham, 2016.
-  Marta Recasens. Coreference: Theory, Annotation, Resolution and Evaluation. PhD thesis, Universitat de Barcelona, 2010.
-  Marta Recasens, Marie-Catherine de Marneffe, and Christopher Potts. The life and death of discourse entities: Identifying singleton mentions. In HLT-NAACL, pages 627–633, 2013.
-  Giuseppe Riccardi, Evgeny A Stepanov, and Shammur Absar Chowdhury. Discourse connective detection in spoken conversations. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pages 6095–6099. IEEE, 2016.
-  Jagan Sankaranarayanan, Hanan Samet, Benjamin E. Teitler, Michael D. Lieberman, and Jon Sperling. Twitterstand: News in tweets. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS ’09, pages 42–51, New York, NY, USA, 2009. ACM.
-  Schema.org. Schema.org. https://www.schema.org/.
-  Amanda Stent. Rhetorical structure in dialog. In Proceedings of the first international conference on Natural language generation-Volume 14, pages 247–252. Association for Computational Linguistics, 2000.
-  Amanda Stent and Martin Molina. Evaluating automatic extraction of rules for sentence plan construction. In Proc. of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 290–297, 2009.
-  Amanda Stent, Marilyn Walker, Steve Whittaker, and Preetam Maloor. User-tailored generation for spoken dialogue: An experiment. In ICSLP, 2002.
-  Sara Tonelli, Giuseppe Riccardi, Rashmi Prasad, and Aravind K Joshi. Annotation of discourse relations for conversational spoken dialogs. In LREC, 2010.
-  Princeton University. About Wordnet. http://wordnet.princeton.edu, 2010.
-  M. Walker and R. Passonneau. DATE: A dialogue act tagging scheme for evaluation. In Human Language Technology Conference, 2001.
-  M. Walker, A. Rudnicky, R. Prasad, J. Aberdeen, E. Bratt, J. Garofolo, H. Hastie, A. Le, B. Pellom, A. Potamianos, R. Passonneau, S. Roukos, G. Sanders, S. Seneff, and D. Stallard. DARPA communicator: Cross-system results for the 2001 evaluation. In ICSLP 2002, 2002.
-  M. A. Walker, I. Langkilde-Geary, H. Wright Hastie, J. Wright, and A. Gorin. Automatically training a problematic dialogue predictor for a spoken dialogue system. Journal of Artificial Intelligence Research, 16:293–319, 2002.
-  Marilyn Walker, Julie Boland, and Candace Kamm. The utility of elapsed time as a usability metric for spoken dialogue systems. In Proc. IEEE Workshop on Automatic Speech Recognition and Understanding, ASRUU99, 1999.
-  Marilyn A. Walker. Limited attention and discourse structure. Computational Linguistics, 22-2:255–264, 1996.
-  Marilyn A. Walker. An application of reinforcement learning to dialogue strategy selection in a spoken dialogue system for email. Journal of Artificial Intelligence Research, 12:387–416, 2000.
-  Marilyn A. Walker, Candace A. Kamm, and Diane J. Litman. Towards developing general models of usability with PARADISE. Natural Language Engineering: Special Issue on Best Practice in Spoken Dialogue Systems, 2000.
-  Marilyn A. Walker, Diane Litman, Candace Kamm, and Alicia Abella. Evaluating interactive dialogue systems: Extending component evaluation to integrated system evaluation. In Proc. of the ACL/EACL Workshop on Interactive Spoken Dialogue Systems, 1997.
-  Marilyn A. Walker, Rebecca Passonneau, and Julie E. Boland. Quantitative and qualitative evaluation of DARPA communicator spoken dialogue systems. In Proc. of the Meeting of the Association for Computational Lingustics, ACL 2001, 2001.
-  Marilyn A. Walker, Amanda Stent, François Mairesse, and Rashmi Prasad. Individual and domain adaptation in sentence planning for dialogue. Journal of Artificial Intelligence Research (JAIR), 30:413–456, 2007.
-  Marilyn A. Walker and Steve Whittaker. Mixed initiative in dialogue: An investigation into discourse segmentation. In Proc. 28th Annual Meeting of the ACL, pages 70–79, 1990.
-  Wikipedia. Wikipedia. https://www.wikipedia.org/.
-  Zhou Yu, Ziyu Xu, Alan W. Black, and Alexander I. Rudnicky. Strategy and policy learning for non-task-oriented conversational systems. In 16th Annual Meeting of SIGDIAL, pages 404–412. The Association for Computer Linguistics, 2016.