Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models

11/22/2017
by Bing Liu, et al.

Dialog response selection is an important step towards natural response generation in conversational agents. Existing work on neural conversational models mainly focuses on offline supervised learning using a large set of context-response pairs. In this paper, we focus on online learning of response selection in retrieval-based dialog systems. We propose a contextual multi-armed bandit model with a nonlinear reward function that uses distributed representations of text for online response selection. A bidirectional LSTM produces the distributed representations of the dialog context and the candidate responses, which serve as the input to the contextual bandit. To learn the bandit, we propose a customized Thompson sampling method that operates on a polynomial feature space to approximate the reward. Experimental results on the Ubuntu Dialogue Corpus demonstrate significant performance gains of the proposed method over conventional linear contextual bandits. Moreover, we report encouraging response selection performance of the proposed neural bandit model, measured with the Recall@k metric, when only a small set of online training samples is available.
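The bandit loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' customized method: it substitutes standard Gaussian Thompson sampling (Bayesian linear regression over the weights) applied to degree-2 polynomial features, and it uses random vectors as stand-ins for the bidirectional LSTM encodings. The names `poly_features` and `PolyThompsonBandit`, the polynomial degree, and the values of `noise_var` and `prior_var` are all assumptions made for illustration.

```python
import numpy as np

def poly_features(context_vec, response_vec):
    """Degree-2 polynomial features of a (context, response) pair (assumed degree)."""
    x = np.concatenate([context_vec, response_vec])
    upper = np.triu_indices(x.size)  # index pairs (i, j) with i <= j
    # Bias term, linear terms, then all products x_i * x_j for i <= j.
    return np.concatenate([[1.0], x, np.outer(x, x)[upper]])

class PolyThompsonBandit:
    """Gaussian Thompson sampling on a fixed feature map (a stand-in, not the paper's variant)."""

    def __init__(self, dim, noise_var=0.25, prior_var=1.0, seed=0):
        self.A = np.eye(dim) / prior_var  # posterior precision matrix
        self.b = np.zeros(dim)            # precision-weighted sum of rewards
        self.noise_var = noise_var        # assumed reward noise variance
        self.rng = np.random.default_rng(seed)

    def select(self, features):
        """Sample weights from the posterior and pick the best-scoring candidate."""
        cov = np.linalg.inv(self.A)
        cov = (cov + cov.T) / 2.0         # remove numerical asymmetry
        mean = cov @ self.b
        w = self.rng.multivariate_normal(mean, cov)
        return int(np.argmax(features @ w))

    def update(self, phi, reward):
        """Bayesian linear-regression update for the chosen candidate."""
        self.A += np.outer(phi, phi) / self.noise_var
        self.b += phi * reward / self.noise_var

# One online step with random stand-ins for the encoder outputs.
rng = np.random.default_rng(1)
context = rng.normal(size=3)          # placeholder for the encoded dialog context
candidates = rng.normal(size=(5, 3))  # placeholders for 5 encoded candidate responses
feats = np.stack([poly_features(context, c) for c in candidates])
bandit = PolyThompsonBandit(dim=feats.shape[1])
arm = bandit.select(feats)
# Hypothetical feedback: pretend candidate 0 is the correct response.
bandit.update(feats[arm], reward=1.0 if arm == 0 else 0.0)
```

In this sketch the nonlinearity comes entirely from the feature map, so the posterior update stays in closed form. The cost is that the feature dimension grows quadratically with the encoding size, which is why the example uses very small vectors.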

