Automatic Creation of Named Entity Recognition Datasets by Querying Phrase Representations

10/14/2022
by Hyunjae Kim, et al.

Most weakly supervised named entity recognition (NER) models rely on domain-specific dictionaries provided by experts. This approach is infeasible in the many domains where such dictionaries do not exist. A recent study used a phrase retrieval model to automatically construct pseudo-dictionaries from entities retrieved from Wikipedia, but these dictionaries often have limited coverage because the retriever tends to return popular entities rather than rare ones. In this study, we present a phrase embedding search that efficiently creates high-coverage dictionaries. Specifically, reformulating natural language queries into phrase representations allows the retriever to search a space densely populated with various entities. In addition, we present a novel framework, HighGEN, that generates NER datasets with the high-coverage dictionaries obtained using the phrase embedding search. To reduce the noise in high-coverage dictionaries, HighGEN generates weak labels based on the distance between the embeddings of a candidate phrase and the target entity type. We compare HighGEN with current weakly supervised NER models on six NER benchmarks and demonstrate the superiority of our models.
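
The two ideas in the abstract, searching with a phrase embedding as the query and assigning weak labels by embedding distance, can be illustrated with a toy sketch. This is not the authors' implementation: the 2-dimensional vectors, the function names (`phrase_embedding_search`, `weak_label`), the use of cosine distance, and the `max_distance` threshold are all assumptions made for illustration, and a real system would take its embeddings from a trained phrase encoder.

```python
import numpy as np


def normalize(v):
    """Scale vectors to unit length along the last axis."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def phrase_embedding_search(query_vec, phrase_vecs, phrases, top_k):
    """Return the top_k phrases whose embeddings are closest to the query.

    The query is itself a phrase embedding rather than an encoded
    natural language question (hypothetical helper, cosine similarity).
    """
    sims = normalize(phrase_vecs) @ normalize(query_vec)
    order = np.argsort(-sims)[:top_k]
    return [phrases[i] for i in order]


def weak_label(candidate_vec, type_vecs, max_distance):
    """Label a candidate with the nearest entity type, or "O" if too far.

    max_distance is an assumed threshold on cosine distance; candidates
    that are not close to any type embedding are treated as noise.
    """
    best_type, best_dist = None, float("inf")
    for entity_type, vec in type_vecs.items():
        dist = 1.0 - float(normalize(candidate_vec) @ normalize(vec))
        if dist < best_dist:
            best_type, best_dist = entity_type, dist
    return best_type if best_dist <= max_distance else "O"


# Toy embeddings (assumed values, not produced by any real encoder).
phrases = ["aspirin", "ibuprofen", "Paris", "Berlin"]
phrase_vecs = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
type_vecs = {"DRUG": [1.0, 0.0], "LOCATION": [0.0, 1.0]}

# Query with the embedding of a known entity to find its neighbours.
neighbours = phrase_embedding_search(phrase_vecs[0], phrase_vecs, phrases, top_k=2)

# Keep a label only when the candidate is close to a type embedding.
label_near = weak_label(phrase_vecs[1], type_vecs, max_distance=0.2)
label_far = weak_label([1.0, 1.0], type_vecs, max_distance=0.2)
```

In this toy setup, querying with the vector for "aspirin" retrieves its nearest neighbour "ibuprofen", while the ambiguous candidate that lies equally far from both type embeddings falls outside the threshold and is left unlabeled.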
