Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory

11/03/2020
by   Hannah Chen, et al.
8

Most NLP datasets are manually labeled, so suffer from inconsistent labeling or limited size. We propose methods for automatically improving datasets by viewing them as graphs with expected semantic properties. We construct a paraphrase graph from the provided sentence pair labels, and create an augmented dataset by directly inferring labels from the original sentence pairs using a transitivity property. We use structural balance theory to identify likely mislabelings in the graph, and flip their labels. We evaluate our methods on paraphrase models trained using these datasets starting from a pretrained BERT model, and find that the automatically-enhanced training sets result in more accurate models.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/13/2020

Contrastive Self-supervised Learning for Graph Classification

Graph classification is a widely studied problem and has broad applicati...
research
03/06/2023

On computing sets of integers with maximum number of pairs summing to powers of 2

We address the problem of finding sets of integers of a given size with ...
research
01/26/2021

Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks

Contextualized representations from a pre-trained language model are cen...
research
07/26/2022

Training Effective Neural Sentence Encoders from Automatically Mined Paraphrases

Sentence embeddings are commonly used in text clustering and semantic re...
research
09/20/2023

Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation

Datasets that pair Knowledge Graphs (KG) and text together (KG-T) can be...
research
03/29/2023

Larger Probes Tell a Different Story: Extending Psycholinguistic Datasets Via In-Context Learning

Language model probing is often used to test specific capabilities of th...

Please sign up or login with your details

Forgot password? Click here to reset