CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants

04/27/2023
by   Albert Yu Sun, et al.
0

A wave of new task-based virtual assistants has been fueled by increasingly powerful large language models, such as GPT-4. These conversational agents can be customized to serve customer-specific use cases, but ensuring that agent-generated text conforms to designer-specified rules included in prompt instructions alone is challenging. Therefore, chatbot designers often use another model, called a guardrail model, to verify that the agent output aligns with their rules and constraints. We explore using a distillation approach to guardrail models to monitor the output of the first model using training data from GPT-4. We find two crucial steps to our CONSCENDI process: scenario-augmented generation and contrastive training examples. When generating conversational data, we generate a set of rule-breaking scenarios, which enumerate a diverse set of high-level ways a rule can be violated. This scenario-guided approach produces a diverse training set of rule-violating conversations, and it provides chatbot designers greater control over the classification process. We also prompt GPT-4 to also generate contrastive examples by altering conversations with violations into acceptable conversations. This set of borderline, contrastive examples enables the distilled model to learn finer-grained distinctions between what is acceptable and what is not. We find that CONSCENDI results in guardrail models that improve over baselines.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/28/2022

CoNAL: Anticipating Outliers with Large Language Models

In many task settings, text classification models are likely to encounte...
research
12/20/2022

Contrastive Learning Reduces Hallucination in Conversations

Pre-trained language models (LMs) store knowledge in their parameters an...
research
05/29/2019

Exploiting Persona Information for Diverse Generation of Conversational Responses

In human conversations, due to their personalities in mind, people can e...
research
08/15/2023

Multimodal Dataset Distillation for Image-Text Retrieval

Dataset distillation methods offer the promise of reducing a large-scale...
research
06/15/2021

Generative Conversational Networks

Inspired by recent work in meta-learning and generative teaching network...
research
02/22/2017

Data Distillation for Controlling Specificity in Dialogue Generation

People speak at different levels of specificity in different situations....

Please sign up or login with your details

Forgot password? Click here to reset