ConcreteGraph: A Data Augmentation Method Leveraging the Properties of Concept Relatedness Estimation

06/25/2022
by Yueen Ma, et al.

The concept relatedness estimation (CRE) task is to determine whether two given concepts are related. Although existing methods for the semantic textual similarity (STS) task can easily be adapted to CRE, the task has some unique properties that can be leveraged to augment its datasets and address their data scarcity. In this paper, we construct a graph named ConcreteGraph (Concept relatedness estimation Graph) to take advantage of these CRE properties. For the new concept pairs sampled from ConcreteGraph, we add a further step that filters out low-quality pairs using simple yet effective quality thresholding. We apply ConcreteGraph data augmentation to three Transformer-based models to show its efficacy. A detailed ablation study of quality thresholding further shows that even a limited amount of high-quality data is more beneficial than a large quantity of unthresholded data. This paper is the first to work on the WORD dataset, and the proposed ConcreteGraph can boost the accuracy of the Transformers by more than 2%; the augmented models also surpass the current state-of-the-art method, Concept Interaction Graph (CIG), on the CNSE and CNSS datasets.
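The abstract does not say which CRE properties ConcreteGraph exploits or how its quality score is computed. The following is only a minimal illustrative sketch, not the authors' method, of graph-based pair augmentation with quality thresholding. It assumes two properties that are plausible for a relatedness relation: symmetry (swapping a pair keeps its label) and a transitivity-like rule (two concepts related to a common concept are proposed as related). The function name `augment_pairs`, the Jaccard-overlap quality score, and the default `threshold` of 0.3 are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# A labeled example: (concept_a, concept_b, is_related).
Pair = tuple[str, str, bool]


def augment_pairs(pairs: list[Pair], threshold: float = 0.3) -> list[Pair]:
    """Hypothetical graph-based CRE augmentation (illustrative sketch only)."""
    # Undirected graph over concepts; an edge means "labeled as related".
    neighbors: dict[str, set[str]] = defaultdict(set)
    labeled: set[frozenset[str]] = set()
    for a, b, related in pairs:
        labeled.add(frozenset((a, b)))
        if related:
            neighbors[a].add(b)
            neighbors[b].add(a)

    augmented: list[Pair] = []
    # Assumed property 1 (symmetry): the label does not depend on pair order.
    for a, b, related in pairs:
        augmented.append((b, a, related))

    # Assumed property 2 (transitivity-like): two concepts that share a
    # related "hub" concept are proposed as a new positive pair.
    proposed: set[frozenset[str]] = set()
    for hub in sorted(neighbors):
        for a, c in combinations(sorted(neighbors[hub]), 2):
            key = frozenset((a, c))
            if key in labeled or key in proposed:
                continue  # never override or duplicate an existing pair
            proposed.add(key)
            # Hypothetical quality score: Jaccard overlap of the two
            # concepts' neighborhoods (always > 0 because both contain hub).
            shared = neighbors[a] & neighbors[c]
            union = neighbors[a] | neighbors[c]
            score = len(shared) / len(union)
            # Quality thresholding: keep only sufficiently confident pairs.
            if score >= threshold:
                augmented.append((a, c, True))
    return augmented


if __name__ == "__main__":
    data = [("cat", "pet", True), ("dog", "pet", True), ("cat", "car", False)]
    for pair in augment_pairs(data):
        print(pair)
```

With the toy data above, the sketch emits the three swapped pairs plus the new positive pair `("cat", "dog", True)`, proposed through the shared hub `pet`; raising `threshold` above 1.0 suppresses every transitive proposal. This trade-off between quantity and quality is the kind that the abstract's thresholding ablation examines.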

Related research

10/22/2020
An Analysis of Simple Data Augmentation for Named Entity Recognition
Simple yet effective data augmentation techniques have been proposed for...

10/11/2020
PHICON: Improving Generalization of Clinical Text De-identification Models via Data Augmentation
De-identification is the task of identifying protected health informatio...

09/20/2021
Augmenting the User-Item Graph with Textual Similarity Models
This paper introduces a simple and effective form of data augmentation f...

10/04/2020
Reverse Operation based Data Augmentation for Solving Math Word Problems
Automatically solving math word problems is a critical task in the field...

10/29/2020
Conversation Graph: Data Augmentation, Training and Evaluation for Non-Deterministic Dialogue Management
Task-oriented dialogue systems typically rely on large amounts of high-q...

09/14/2021
A Three Step Training Approach with Data Augmentation for Morphological Inflection
We present the BME submission for the SIGMORPHON 2021 Task 0 Part 1, Gen...

11/01/2021
A New Tool for Efficiently Generating Quality Estimation Datasets
Building of data for quality estimation (QE) training is expensive and r...