Low-Resource Name Tagging Learned with Weakly Labeled Data

08/26/2019
by Yixin Cao, et al.

Name tagging in low-resource languages or domains suffers from inadequate training data. Existing work relies heavily on additional information while leaving unexplored the noisy annotations that are widely available on the web. In this paper, we propose a novel neural model for name tagging based solely on weakly labeled (WL) data, so that it can be applied in any low-resource setting. To take the best advantage of all WL sentences, we split them into high-quality and noisy portions for two modules, respectively: (1) a classification module that focuses on the large portion of noisy data and can efficiently and robustly pretrain the tag classifier by capturing textual context semantics; and (2) a costly sequence labeling module that focuses on high-quality data and uses Partial-CRFs with non-entity sampling to achieve a global optimum. The two modules are combined via shared parameters. Extensive experiments involving five low-resource languages and a fine-grained food domain demonstrate our superior performance as well as efficiency.
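The abstract does not spell out how the Partial-CRF objective is computed. A common formulation, sketched below under stated assumptions, scores a partially labeled sentence by summing over every tag path consistent with the known labels and normalizing by the sum over all paths. This is not the authors' implementation: the function names, the list-based score layout, and the toy tag set are all hypothetical, and the non-entity sampling step is left out.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_score(emissions, transitions, allowed):
    """Log of the summed exp-scores of all tag paths that respect `allowed`.

    emissions[t][k]   : score of tag k at position t
    transitions[j][k] : score of moving from tag j to tag k
    allowed[t]        : set of tag indices permitted at position t
    """
    num_tags = len(transitions)
    neg_inf = float("-inf")
    # Initialise with the first position; disallowed tags get -inf.
    alpha = [
        emissions[0][k] if k in allowed[0] else neg_inf
        for k in range(num_tags)
    ]
    # Standard forward recursion, masked by the allowed tag sets.
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][k]
            + logsumexp([alpha[j] + transitions[j][k] for j in range(num_tags)])
            if k in allowed[t]
            else neg_inf
            for k in range(num_tags)
        ]
    return logsumexp(alpha)


def partial_crf_log_likelihood(emissions, transitions, partial_tags):
    """Log-probability of a partial labeling (None marks an unknown tag)."""
    num_tags = len(transitions)
    every_tag = set(range(num_tags))
    constrained = [every_tag if tag is None else {tag} for tag in partial_tags]
    unconstrained = [every_tag] * len(partial_tags)
    return forward_log_score(emissions, transitions, constrained) - forward_log_score(
        emissions, transitions, unconstrained
    )


# Toy example with a hypothetical tag set: 0 = O, 1 = B-PER, 2 = I-PER.
emissions = [[0.5, 1.0, -0.3], [0.2, -0.1, 0.7], [1.1, 0.0, 0.4]]
transitions = [[0.1, 0.3, -0.5], [-0.2, 0.0, 0.6], [0.4, -0.1, 0.2]]
# Only the first token is labeled (B-PER); the other two are unknown.
log_likelihood = partial_crf_log_likelihood(emissions, transitions, [1, None, None])
```

In training, the negative of this quantity would serve as the loss for a weakly labeled sentence. When every tag is known it reduces to the usual CRF log-likelihood, and when no tag is known it is zero, so unlabeled positions contribute no supervision of their own.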


