Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls

08/05/2022
by Da Ju, et al.

The promise of interaction between intelligent conversational agents and humans is that models can learn from such feedback in order to improve. Unfortunately, such exchanges in the wild will not always involve human utterances that are benign or of high quality, and will include a mixture of engaged (helpers) and unengaged or even malicious users (trolls). In this work we study how to perform robust learning in such an environment. We introduce a benchmark evaluation, SafetyMix, which can evaluate methods that learn safe vs. toxic language in a variety of adversarial settings to test their robustness. We propose and analyze several mitigating learning algorithms that identify trolls either at the example or at the user level. Our main finding is that user-based methods, which take into account that troll users will exhibit adversarial behavior across multiple examples, work best across the settings in our benchmark. We then test these methods in a further real-life setting of conversations collected during deployment, with similar results.
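To make the example-level vs. user-level distinction concrete, here is a minimal sketch of one plausible user-based mitigation: aggregate per-example safety scores by user and drop the training examples of users whose flagged rate is high. This is an illustrative assumption, not the paper's actual algorithm; the function name `flag_trolls` and the thresholds are hypothetical, and the per-example `unsafe_score` is assumed to come from any safety classifier.

```python
# Hypothetical sketch of user-level troll filtering (not the paper's method).
# Assumes each training example carries a per-example unsafe_score in [0, 1]
# produced by some external safety classifier.
from collections import defaultdict

def flag_trolls(examples, example_threshold=0.5, user_threshold=0.5):
    """Flag users whose fraction of unsafe-scored examples is too high.

    `examples` is a list of (user_id, text, unsafe_score) tuples.
    Returns (flagged_user_ids, retained_examples): examples from flagged
    users are removed from the training pool entirely.
    """
    per_user = defaultdict(list)
    for user_id, _text, unsafe_score in examples:
        # Record whether this individual example looks unsafe.
        per_user[user_id].append(unsafe_score >= example_threshold)

    # A user is flagged when most of their examples look unsafe,
    # pooling evidence across all of their contributions.
    trolls = {user for user, flags in per_user.items()
              if sum(flags) / len(flags) > user_threshold}

    kept = [ex for ex in examples if ex[0] not in trolls]
    return trolls, kept
```

The key contrast with a purely example-level filter is the pooling step: a troll whose individual messages each look only mildly suspicious can still be caught once their scores are aggregated, while a helper with one borderline message is not discarded wholesale.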

Related research

- Improving Open Language Models by Learning from Organic Interactions (06/07/2023)
- Evaluating the Robustness of Conversational Recommender Systems by Adversarial Examples (03/09/2023)
- "Is there anything else I can help you with?": Challenges in Deploying an On-Demand Crowd-Powered Conversational Agent (08/10/2017)
- Learning to Influence Human Behavior with Offline Reinforcement Learning (03/03/2023)
- Analysing Mixed Initiatives and Search Strategies during Conversational Search (09/13/2021)
- Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback (08/05/2022)
