Simple Questions Generate Named Entity Recognition Datasets

12/16/2021
by Hyunjae Kim, et al.

Named entity recognition (NER) is the task of extracting named entities of specific types from text. Current NER models often rely on human-annotated datasets, which require extensive professional knowledge of the target domain and entities. This work introduces an ask-to-generate approach, which automatically generates NER datasets by posing simple natural language questions that reflect the needed entity types (e.g., "Which disease?") to an open-domain question answering system. Without using any in-domain resources (i.e., training sentences, labels, or in-domain dictionaries), our models, trained solely on our generated datasets, largely outperform previous weakly supervised models on six NER benchmarks across four different domains. Surprisingly, on NCBI-disease, our model achieves a 75.5 F1 score and outperforms by 4.1 F1 the previous best weakly supervised model, which utilizes a rich in-domain dictionary provided by domain experts. Formulating the needs of NER in natural language also allows us to build NER models for fine-grained entity types such as Award, where our model even outperforms fully supervised models. On three few-shot NER benchmarks, our model achieves new state-of-the-art performance.
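The ask-to-generate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `toy_qa` function is a hypothetical stand-in for an open-domain question answering system (it returns canned sentence/answer pairs), and the whitespace tokenization, BIO tagging scheme, and all function names are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a QA system maps a question to (sentence, answer) pairs.
QASystem = Callable[[str], List[Tuple[str, str]]]


def toy_qa(question: str) -> List[Tuple[str, str]]:
    """Stand-in for an open-domain QA system; returns canned results only."""
    canned = {
        "Which disease?": [
            ("Patients with cystic fibrosis often need lung therapy.",
             "cystic fibrosis"),
        ],
    }
    return canned.get(question, [])


def bio_tags(tokens: List[str], answer_tokens: List[str],
             entity_type: str) -> List[str]:
    """Tag the first occurrence of the answer span with B-/I- labels."""
    tags = ["O"] * len(tokens)
    n = len(answer_tokens)
    if n == 0:
        return tags  # nothing to label
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == answer_tokens:
            tags[i] = f"B-{entity_type}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{entity_type}"
            break
    return tags


def generate_dataset(questions: Dict[str, str],
                     qa: QASystem) -> List[Tuple[List[str], List[str]]]:
    """Ask one question per entity type and turn answers into labeled sentences."""
    dataset = []
    for entity_type, question in questions.items():
        for sentence, answer in qa(question):
            tokens = sentence.split()  # naive whitespace tokenization (assumption)
            tags = bio_tags(tokens, answer.split(), entity_type)
            dataset.append((tokens, tags))
    return dataset


dataset = generate_dataset({"Disease": "Which disease?"}, toy_qa)
for tokens, tags in dataset:
    print(list(zip(tokens, tags)))
```

In this sketch the answer phrase returned for each question becomes the labeled entity span in its evidence sentence, so a set of questions (one per entity type) yields a token-level training set without any in-domain annotations. A real system would replace `toy_qa` with an actual retrieval-based QA model and would need more careful tokenization and span alignment.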


research
10/14/2022

Automatic Creation of Named Entity Recognition Datasets by Querying Phrase Representations

Most weakly supervised named entity recognition (NER) models rely on dom...
research
05/22/2020

Bootstrapping Named Entity Recognition in E-Commerce with Positive Unlabeled Learning

Named Entity Recognition (NER) in domains like e-commerce is an understu...
research
03/23/2022

Few-shot Named Entity Recognition with Self-describing Networks

Few-shot NER needs to effectively capture information from limited insta...
research
05/10/2023

Extracting Complex Named Entities in Legal Documents via Weakly Supervised Object Detection

Accurate Named Entity Recognition (NER) is crucial for various informati...
research
05/27/2022

Sparse Conditional Hidden Markov Model for Weakly Supervised Named Entity Recognition

Weakly supervised named entity recognition methods train label models to...
research
12/31/2020

TexSmart: A Text Understanding System for Fine-Grained NER and Enhanced Semantic Analysis

This technique report introduces TexSmart, a text understanding system t...
research
12/10/2020

Segmenting Natural Language Sentences via Lexical Unit Analysis

In this work, we present Lexical Unit Analysis (LUA), a framework for ge...
