Goal Driven Discovery of Distributional Differences via Language Descriptions

02/28/2023
by Ruiqi Zhong, et al.

Mining large corpora can generate useful discoveries but is time-consuming for humans. We formulate a new task, D5, that automatically discovers differences between two large corpora in a goal-driven way. The task input is a problem comprising a research goal (e.g., "comparing the side effects of drug A and drug B") and a corpus pair (e.g., two large collections of patients' self-reported reactions after taking each drug). The output is a language description (a discovery) of how these corpora differ (e.g., patients taking drug A "mention feelings of paranoia" more often). We build a D5 system, and to quantitatively measure its performance, we 1) contribute a meta-dataset, OpenD5, aggregating 675 open-ended problems ranging across business, social sciences, humanities, machine learning, and health, and 2) propose a set of unified evaluation metrics: validity, relevance, novelty, and significance. With the dataset and the unified metrics, we confirm that language models can use the goals to propose more relevant, novel, and significant candidate discoveries. Finally, our system produces discoveries previously unknown to the authors on a wide range of applications in OpenD5, including temporal and demographic differences in discussion topics, political stances and stereotypes in speech, insights in commercial reviews, and error patterns in NLP models.
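To make the task format concrete, here is a minimal Python sketch of a D5-style problem (a goal plus a corpus pair) and a toy check of whether a candidate description holds more often in one corpus than the other. This is an illustration only, not the authors' system or their validity metric: the names `D5Problem`, `fraction_satisfying`, and `frequency_gap` are hypothetical, the sample texts are invented, and the keyword predicate is a crude stand-in for judging whether a text satisfies a natural-language description.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class D5Problem:
    """Hypothetical container: a research goal plus a pair of corpora."""
    goal: str
    corpus_a: List[str]
    corpus_b: List[str]


def fraction_satisfying(corpus: List[str], predicate: Callable[[str], bool]) -> float:
    """Share of texts in `corpus` for which `predicate` returns True."""
    if not corpus:
        return 0.0
    return sum(1 for text in corpus if predicate(text)) / len(corpus)


def frequency_gap(problem: D5Problem, predicate: Callable[[str], bool]) -> float:
    """How much more often the description holds in corpus A than in corpus B."""
    return (fraction_satisfying(problem.corpus_a, predicate)
            - fraction_satisfying(problem.corpus_b, predicate))


# Invented toy data, only to show the shape of the input.
problem = D5Problem(
    goal="comparing the side effects of drug A and drug B",
    corpus_a=["I felt paranoid all night", "mild headache",
              "paranoid and anxious", "no side effects"],
    corpus_b=["mild headache", "slept well",
              "some nausea", "felt paranoid once"],
)


def mentions_paranoia(text: str) -> bool:
    """Assumed keyword stand-in for the description 'mentions feelings of paranoia'."""
    return "paranoi" in text.lower()


gap = frequency_gap(problem, mentions_paranoia)
print(f"Description holds {gap:+.2f} more often in corpus A")
```

A positive gap means the description is more often true of corpus A. A real system would also need to judge the description's relevance to the goal, its novelty, and its significance, which this sketch does not attempt.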

