DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents

03/30/2023
by Varun Nair, et al.

Large language models (LLMs) have emerged as valuable tools for many natural language understanding tasks. In safety-critical applications such as healthcare, the utility of these models is governed by their ability to generate outputs that are factually accurate and complete. In this work, we present dialog-enabled resolving agents (DERA). DERA is a paradigm made possible by the increased conversational abilities of LLMs, namely GPT-4. It provides a simple, interpretable forum for models to communicate feedback and iteratively improve output. We frame our dialog as a discussion between two agent types: a Researcher, who processes information and identifies crucial problem components, and a Decider, who has the autonomy to integrate the Researcher's information and make judgments on the final output. We test DERA against three clinically focused tasks. For medical conversation summarization and care plan generation, DERA shows significant improvement over the base GPT-4 performance in both human expert preference evaluations and quantitative metrics. In a new finding, we also show that GPT-4's performance (70%) on an open-ended version of the MedQA question-answering dataset (Jin et al. 2021, USMLE) is well above the passing level (60%), with DERA showing similar performance. We release the open-ended MEDQA dataset at https://github.com/curai/curai-research/tree/main/DERA.
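
The Researcher/Decider exchange described above can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the function `dera_dialog`, the `STOP` token, the turn limit, and the two toy agents are all hypothetical names introduced here. In a real system each agent would be a prompted LLM call; here they are plain string functions so the control flow can be run and inspected.

```python
from typing import Callable, List, Tuple

# Hypothetical types: an agent maps (task input, current draft, dialog so far)
# to a message. In practice each agent would wrap a prompted LLM call.
History = List[Tuple[str, str]]
Agent = Callable[[str, str, History], str]

STOP = "DONE"  # assumed convention: the Researcher signals it has no more feedback


def dera_dialog(task_input: str, initial_output: str,
                researcher: Agent, decider: Agent,
                max_turns: int = 3) -> Tuple[str, History]:
    """Alternate Researcher feedback and Decider revisions on a draft output."""
    history: History = []
    output = initial_output
    for _ in range(max_turns):
        # The Researcher inspects the input and the current draft and raises an issue.
        feedback = researcher(task_input, output, history)
        history.append(("Researcher", feedback))
        if feedback == STOP:
            break
        # The Decider has the final say on whether and how to revise the draft.
        output = decider(task_input, output, history)
        history.append(("Decider", output))
    return output, history


def toy_researcher(task_input: str, output: str, history: History) -> str:
    """Toy stand-in: flag the first source fact that the draft omits."""
    for fact in task_input.split("; "):
        if fact not in output:
            return f"MISSING: {fact}"
    return STOP


def toy_decider(task_input: str, output: str, history: History) -> str:
    """Toy stand-in: accept the latest feedback and append the missing fact."""
    _, feedback = history[-1]
    fact = feedback[len("MISSING: "):]
    return f"{output}; {fact}" if output else fact
```

Calling `dera_dialog("cough for 3 days; no fever; takes lisinopril", "cough for 3 days", toy_researcher, toy_decider, max_turns=5)` runs the loop until the toy Researcher finds nothing left to flag. The returned history is the full transcript of the exchange, which is what makes the process inspectable.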
