Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory

11/03/2020
by Hannah Chen, et al.

Most NLP datasets are manually labeled, and so suffer from inconsistent labeling or limited size. We propose methods for automatically improving datasets by viewing them as graphs with expected semantic properties. We construct a paraphrase graph from the provided sentence pair labels, and create an augmented dataset by directly inferring labels from the original sentence pairs using a transitivity property. We use structural balance theory to identify likely mislabelings in the graph, and flip their labels. We evaluate our methods by fine-tuning a pretrained BERT model into paraphrase models on these datasets, and find that the automatically enhanced training sets result in more accurate models.
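
The two graph ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `infer_labels` and `unbalanced_triangles`, the `(sentence_a, sentence_b, label)` input format, and the specific rules used are assumptions made for illustration. The sketch assumes that paraphrase is transitive (if A and B are paraphrases and B and C are paraphrases, so are A and C), that a non-paraphrase label carries over to every paraphrase of either sentence, and that a fully labeled triangle with exactly one non-paraphrase edge is treated as unbalanced and therefore likely to contain a mislabeling. Choosing which edge to flip is left out.

```python
from itertools import combinations


def infer_labels(pairs):
    """Infer new labels from (a, b, label) triples; 1 = paraphrase, 0 = not.

    Hypothetical helper: returns {frozenset({x, y}): label} for pairs that
    were not labeled in the input.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge sentences connected by paraphrase edges into clusters.
    for a, b, label in pairs:
        find(a)
        find(b)
        if label == 1:
            parent[find(a)] = find(b)

    clusters = {}
    for s in list(parent):
        clusters.setdefault(find(s), []).append(s)

    known = {frozenset((a, b)) for a, b, _ in pairs}
    inferred = {}

    # Transitivity: every unlabeled pair inside a cluster is a paraphrase.
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            key = frozenset((a, b))
            if key not in known:
                inferred[key] = 1

    # A non-paraphrase edge between two clusters separates all their members.
    for a, b, label in pairs:
        if label == 0 and find(a) != find(b):
            for x in clusters[find(a)]:
                for y in clusters[find(b)]:
                    key = frozenset((x, y))
                    if key not in known:
                        inferred[key] = 0
    return inferred


def unbalanced_triangles(pairs):
    """Return fully labeled triangles with exactly one non-paraphrase edge."""
    sign = {frozenset((a, b)): label for a, b, label in pairs}
    nodes = sorted({s for a, b, _ in pairs for s in (a, b)})
    flagged = []
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in sign for e in edges) and sum(sign[e] for e in edges) == 2:
            flagged.append((a, b, c))
    return flagged
```

For example, given the labeled pairs `("s1", "s2", 1)`, `("s2", "s3", 1)` and `("s3", "s4", 0)`, `infer_labels` adds a paraphrase label for s1 and s3 and non-paraphrase labels for s1 and s4 and for s2 and s4. A triangle in which a and b are paraphrases, b and c are paraphrases, but a and c are labeled non-paraphrases is reported by `unbalanced_triangles`.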

Code Repositories

automatic-paraphrase-dataset-augmentation

Code and data for automatic paraphrase dataset augmentation.