ScienceExamCER: A High-Density Fine-Grained Science-Domain Corpus for Common Entity Recognition

11/24/2019
by Hannah Smith, et al.

Named entity recognition identifies common classes of entities in text, but these entity labels are generally sparse, limiting utility to downstream tasks. In this work we present ScienceExamCER, a densely-labeled semantic classification corpus of 133k mentions in the science exam domain where nearly all (96%) of content words have been annotated with one or more fine-grained semantic class labels, including taxonomic groups, meronym groups, verb/action groups, properties and values, and synonyms. Semantic class labels are drawn from a manually-constructed fine-grained typology of 601 classes generated through a data-driven analysis of 4,239 science exam questions. We show that an off-the-shelf BERT-based named entity recognition model, modified for multi-label classification, achieves an F1 score of 0.85 on this task, suggesting strong utility for downstream tasks in science domain question answering requiring densely-labeled semantic classification.
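To make the multi-label setting concrete, the following is a minimal sketch of how per-token scores might be decoded into sets of labels and scored with micro-averaged F1. It is not the authors' implementation: the label names, the logits, the 0.5 threshold, and the helper functions `decode_multilabel` and `micro_f1` are all assumptions chosen for illustration, and the paper's own evaluation protocol may differ (for example, by scoring mentions rather than tokens).

```python
import math


def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, labels, threshold=0.5):
    """For each token, keep every label whose sigmoid score reaches the threshold.

    Unlike single-label tagging (argmax over classes), a token may receive
    zero, one, or several labels.
    """
    return [
        {label for label, z in zip(labels, row) if sigmoid(z) >= threshold}
        for row in logits
    ]


def micro_f1(gold, pred):
    """Micro-averaged F1 over (token, label) pairs."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical label inventory and scores for the two tokens "frog" and "eats".
LABELS = ["Animal", "LivingThing", "ActionVerb"]
logits = [
    [2.0, 1.5, -3.0],   # "frog": high scores for two labels
    [-2.0, -1.0, 0.7],  # "eats": high score for one label
]

pred = decode_multilabel(logits, LABELS)
gold = [{"Animal"}, {"ActionVerb"}]  # hypothetical gold annotation
score = micro_f1(gold, pred)
```

In this toy example the token "frog" receives two predicted labels while the hypothetical gold annotation has one, so precision drops below 1 while recall stays at 1. Independent sigmoid outputs with a threshold are one common way to adapt a single-label tagger to multi-label output.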
