Chemical Identification and Indexing in PubMed Articles via BERT and Text-to-Text Approaches

11/30/2021
by Virginia Adams, et al.

The BioCreative VII Track-2 challenge consists of named entity recognition (NER), entity linking (also called entity normalization), and topic indexing tasks, with entities and topics limited to chemicals for this challenge. Named entity recognition is a well-established problem, and we achieve our best performance with BERT-based BioMegatron models. We extend our BERT-based approach to the entity linking task. After a second stage of pretraining BioBERT with a metric-learning loss strategy called self-alignment pretraining (SAP), we link entities based on the cosine similarity between their SAP-BioBERT word embeddings. Despite the success of our named entity recognition experiments, we find the chemical indexing task generally more challenging. In addition to conventional NER methods, we attempt both named entity recognition and entity linking with a novel text-to-text, or "prompt"-based, method that uses generative language models such as T5 and GPT. We achieve encouraging results with this new approach.
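
The cosine-similarity linking step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors are toy stand-ins for SAP-BioBERT embeddings, and the concept identifiers `CONCEPT_A` and `CONCEPT_B`, as well as the helper names `cosine_similarity` and `link_entity`, are hypothetical and chosen only for illustration. The sketch assumes that a mention is linked to the candidate concept whose embedding has the highest cosine similarity to it.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def link_entity(mention_vec, concept_vecs):
    """Return (concept_id, score) for the most similar candidate concept."""
    best_id, best_score = None, -2.0  # cosine similarity is always >= -1
    for concept_id, vec in concept_vecs.items():
        score = cosine_similarity(mention_vec, vec)
        if score > best_score:
            best_id, best_score = concept_id, score
    return best_id, best_score


# Toy vectors standing in for embeddings (assumption: real embeddings
# would come from an encoder and have hundreds of dimensions).
concept_vecs = {
    "CONCEPT_A": [0.9, 0.1, 0.0],  # hypothetical concept identifier
    "CONCEPT_B": [0.0, 1.0, 0.2],  # hypothetical concept identifier
}
mention_vec = [0.8, 0.2, 0.1]

linked_id, score = link_entity(mention_vec, concept_vecs)
print(linked_id, round(score, 3))
```

With a real vocabulary of many thousands of concepts, the linear scan in `link_entity` would typically be replaced by a batched matrix product or an approximate nearest-neighbour index; the scan is kept here only to make the scoring rule explicit.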
