HoloDetect: Few-Shot Learning for Error Detection

04/04/2019
by Alireza Heidari, et al.

We introduce a few-shot learning framework for error detection. We show that data augmentation (a form of weak supervision) is key to training high-quality, ML-based error detection models that require minimal human involvement. Our framework consists of two parts: (1) an expressive model to learn rich representations that capture the inherent syntactic and semantic heterogeneity of errors; and (2) a data augmentation model that, given a small seed of clean records, uses dataset-specific transformations to automatically generate additional training data. Our key insight is to learn data augmentation policies from the noisy input dataset in a weakly supervised manner. We show that our framework detects errors with an average precision of 94% and an average recall of 93% across datasets with different types and amounts of errors. We compare our approach to a comprehensive collection of error detection methods, ranging from traditional rule-based methods to ensemble-based and active learning approaches. We show that data augmentation yields an average improvement of 20 F1 points while requiring 3x fewer labeled examples than other ML approaches.
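
To make the augmentation idea concrete, here is a minimal Python sketch of generating labeled training examples from a small seed of clean values. It is an illustration under stated assumptions, not the method from the paper: the abstract says the transformations are learned from the noisy dataset, whereas the three string edits below (`drop_char`, `swap_adjacent`, `duplicate_char`) are hypothetical and hard-coded, and the function name `augment` and its parameters are likewise invented for this example.

```python
import random

# Hypothetical, hard-coded transformations. The abstract describes learning
# dataset-specific transformations; these typo-style edits are assumptions.
def drop_char(value, rng):
    """Delete one randomly chosen character."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value))
    return value[:i] + value[i + 1:]

def swap_adjacent(value, rng):
    """Swap one randomly chosen pair of neighboring characters."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]

def duplicate_char(value, rng):
    """Repeat one randomly chosen character."""
    if not value:
        return value
    i = rng.randrange(len(value))
    return value[:i] + value[i] + value[i:]

TRANSFORMATIONS = [drop_char, swap_adjacent, duplicate_char]

def augment(clean_values, n_per_value=2, seed=0):
    """Return (value, label) pairs: label 0 = clean, label 1 = synthetic error."""
    rng = random.Random(seed)
    examples = [(v, 0) for v in clean_values]
    for v in clean_values:
        for _ in range(n_per_value):
            transform = rng.choice(TRANSFORMATIONS)
            noisy = transform(v, rng)
            # Skip no-op results (e.g. swapping two identical characters),
            # which would otherwise label a clean value as an error.
            if noisy != v:
                examples.append((noisy, 1))
    return examples
```

Calling `augment(["Chicago", "60614"], n_per_value=3)` returns the two clean values labeled 0 followed by up to six corrupted variants labeled 1, which could then be fed to any binary classifier. Fixing the seed keeps the generated set reproducible.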


research
04/19/2021

Few-shot learning via tensor hallucination

Few-shot classification addresses the challenge of classifying examples ...
research
05/09/2022

Few-shot Mining of Naturally Occurring Inputs and Outputs

Creating labeled natural language training data is expensive and require...
research
06/12/2022

Data Augmentation for Intent Classification

Training accurate intent classifiers requires labeled data, which can be...
research
02/02/2021

Neural Data Augmentation via Example Extrapolation

In many applications of machine learning, certain categories of examples...
research
07/22/2022

Multi-Level Fine-Tuning, Data Augmentation, and Few-Shot Learning for Specialized Cyber Threat Intelligence

Gathering cyber threat intelligence from open sources is becoming increa...
research
03/14/2022

Self-Promoted Supervision for Few-Shot Transformer

The few-shot learning ability of vision transformers (ViTs) is rarely in...
research
09/16/2021

Sister Help: Data Augmentation for Frame-Semantic Role Labeling

While FrameNet is widely regarded as a rich resource of semantics in nat...
