EZLearn: Exploiting Organic Supervision in Large-Scale Data Annotation

09/25/2017
by Maxim Grechkin, et al.

Many real-world applications require large-scale data annotation, such as identifying tissue origins based on gene expression and classifying images into semantic categories. Annotation classes are often numerous and subject to change over time, and annotating examples has become the major bottleneck for supervised learning methods. In science and other high-value domains, large repositories of data samples are often available, together with two sources of organic supervision: a lexicon for the annotation classes, and text descriptions that accompany some data samples. Distant supervision has emerged as a promising paradigm for exploiting such indirect supervision by automatically annotating examples whose text description contains a class mention from the lexicon. However, due to linguistic variations and ambiguities, such training data is inherently noisy, which limits the accuracy of this approach. In this paper, we introduce an auxiliary natural language processing system for the text modality, and incorporate co-training to reduce noise and augment signal in distant supervision. Without using any manually labeled data, our EZLearn system learned to accurately annotate data samples in functional genomics and scientific figure comprehension, substantially outperforming state-of-the-art supervised methods trained on tens of thousands of annotated examples.
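The distant-supervision step described above, plus one direction of a co-training round, can be sketched as follows. This is a minimal illustration under stated assumptions, not EZLearn's implementation: the two-class lexicon, the toy 2-D "data view" vectors, and the nearest-centroid classifier (a stand-in for the paper's data-view model) are all hypothetical, as are the names LEXICON, distant_labels, train_centroids, and predict. The auxiliary text classifier that the paper adds is not shown.

```python
import re
from collections import defaultdict

# Hypothetical lexicon (class -> surface forms); an illustrative assumption,
# not the lexicon used by EZLearn.
LEXICON = {
    "liver": ["liver", "hepatic"],
    "brain": ["brain", "cerebral"],
}

def distant_labels(description, lexicon):
    """Distant supervision: label a sample with every class whose lexicon
    entry is mentioned in its text description (word-boundary match)."""
    if not description:
        return set()
    text = description.lower()
    return {
        cls
        for cls, forms in lexicon.items()
        if any(re.search(r"\b" + re.escape(f) + r"\b", text) for f in forms)
    }

def train_centroids(vectors, label_sets):
    """Nearest-centroid classifier for the data view (hypothetical stand-in)."""
    sums, counts = {}, defaultdict(int)
    for x, labels in zip(vectors, label_sets):
        for y in labels:
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(x, centroids):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

# Toy data: (data-view vector, text description or None). Entirely made up.
samples = [
    ([9.0, 1.0], "Hepatic tissue from adult donor"),
    ([8.5, 1.5], "liver biopsy"),
    ([1.0, 9.0], "Cerebral cortex sample"),
    ([1.2, 8.8], None),              # no description: distant supervision is silent
    ([8.8, 1.1], "control sample"),  # description with no lexicon mention
]

vectors = [x for x, _ in samples]
noisy = [distant_labels(d, LEXICON) for _, d in samples]

# One direction of a co-training round: the data-view classifier, trained only
# on distantly labeled samples, proposes labels for the samples text missed.
cents = train_centroids(vectors, noisy)
augmented = [labels if labels else {predict(x, cents)} for x, labels in zip(vectors, noisy)]
print(noisy)
print(augmented)
```

In this toy run, the two samples that lexicon matching cannot label (one with no description, one whose description mentions no class) receive labels from the data-view classifier, which roughly corresponds to the "augment signal" part of the idea. In a full co-training loop, a text-view classifier would in turn re-label samples from the data-view predictions.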

03/29/2021

Visual Distant Supervision for Scene Graph Generation

Scene graph generation aims to identify objects and their relations in i...
06/13/2019

KCAT: A Knowledge-Constraint Typing Annotation Tool

Fine-grained Entity Typing is a tough task which suffers from noise samp...
05/04/2019

Learning to Denoise Distantly-Labeled Data for Entity Typing

Distantly-labeled data can be used to scale up training of statistical m...
02/25/2021

ANEA: Distant Supervision for Low-Resource Named Entity Recognition

Distant supervision allows obtaining labeled training corpora for low-re...
01/19/2023

Self Supervision Does Not Help Natural Language Supervision at Scale

Self supervision and natural language supervision have emerged as two ex...
07/21/2017

Learning Aerial Image Segmentation from Online Maps

This study deals with semantic segmentation of high-resolution (aerial) ...
05/25/2020

Incidental Supervision: Moving beyond Supervised Learning

Machine Learning and Inference methods have become ubiquitous in our att...
