Summarizing Differences between Text Distributions with Natural Language

01/28/2022
by Ruiqi Zhong, et al.

How do two distributions of texts differ? Humans are slow at answering this, since discovering patterns might require tediously reading through hundreds of samples. We propose to automatically summarize the differences by "learning a natural language hypothesis": given two distributions D_0 and D_1, we search for a description that is more often true for D_1, e.g., "is military-related." To tackle this problem, we fine-tune GPT-3 to propose descriptions with the prompt: "[samples of D_0] + [samples of D_1] + the difference between them is _____". We then re-rank the descriptions by checking how often they hold on a larger set of samples with a learned verifier. On a benchmark of 54 real-world binary classification tasks, while GPT-3 Curie (13B) only generates a description similar to human annotation 7% of the time, the performance reaches 61% with fine-tuning and re-ranking, and our best system using GPT-3 Davinci (175B) reaches 76%. We apply our system to describe distribution shifts, debug dataset shortcuts, summarize unknown tasks, and label text clusters, and present analyses based on automatically generated descriptions.
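The propose-then-re-rank procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`build_proposer_prompt`, `rerank`, `toy_verifier`), the "Group A"/"Group B" labels in the prompt, and the scoring rule (fraction of D_1 samples on which a description holds minus the same fraction on D_0) are all assumptions made for illustration. The learned verifier is replaced by a hypothetical keyword lookup, and no language model is called.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: verifier(description, sample) -> does it hold?
Verifier = Callable[[str, str], bool]


def build_proposer_prompt(samples_d0: Sequence[str], samples_d1: Sequence[str]) -> str:
    """Lay out samples of D_0, then samples of D_1, then the blank to complete.

    The "Group A"/"Group B" labels are an assumption; the abstract only
    gives the overall shape of the prompt.
    """
    lines = [f"Group A: {s}" for s in samples_d0]
    lines += [f"Group B: {s}" for s in samples_d1]
    lines.append("the difference between them is")
    return "\n".join(lines)


def rerank(
    descriptions: Sequence[str],
    samples_d0: Sequence[str],
    samples_d1: Sequence[str],
    verifier: Verifier,
) -> List[Tuple[str, float]]:
    """Sort candidate descriptions by how much more often they hold on D_1.

    The score (hold rate on D_1 minus hold rate on D_0) is an assumed
    stand-in for the paper's re-ranking criterion.
    """
    if not samples_d0 or not samples_d1:
        raise ValueError("both sample sets must be non-empty")

    def hold_rate(description: str, samples: Sequence[str]) -> float:
        return sum(verifier(description, s) for s in samples) / len(samples)

    scored = [
        (d, hold_rate(d, samples_d1) - hold_rate(d, samples_d0))
        for d in descriptions
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for the learned verifier: a keyword lookup per description.
TOY_KEYWORDS = {
    "is military-related": ("army", "navy", "troops"),
    "mentions food": ("bakery", "tomatoes", "dinner"),
}


def toy_verifier(description: str, sample: str) -> bool:
    text = sample.lower()
    return any(word in text for word in TOY_KEYWORDS.get(description, ()))
```

With a handful of made-up samples, the description that holds more often on D_1 is ranked first:

```python
d0 = ["The bakery opens at nine.", "She planted tomatoes."]
d1 = ["The army deployed troops.", "The navy launched a ship.", "He cooked dinner."]
ranked = rerank(list(TOY_KEYWORDS), d0, d1, toy_verifier)
best_description = ranked[0][0]
```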
