Named entity recognition in chemical patents using ensemble of contextual language models

07/24/2020
by Jenny Copara, et al.

Chemical patent documents describe a broad range of applications and hold key information, such as chemical compounds, reactions, and specific properties. However, this information must be extracted before it can be used in downstream tasks. Text mining provides the means to extract relevant information from chemical patents through information extraction techniques. As part of the Information Extraction task of the Cheminformatics Elsevier Melbourne University (ChEMU) challenge, we study the effectiveness of contextualized language models for extracting reaction information from chemical patents. We compare transformer architectures trained on a generic corpus with models specialised in chemistry patents, and propose a new model based on the combination of existing architectures. Our best model, based on the ensemble approach, achieves an exact F1-score of 92.30. An ensemble of contextualized language models thus provides an effective method to extract information from chemical patents. As a next step, we will investigate the effect of transformer language models pre-trained on chemical patents.
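
The abstract does not state how the individual models are combined. As a minimal sketch, assuming a simple per-token majority vote (one common way to build such an ensemble, not necessarily the authors' method), the combination step could look like the following. The function name `majority_vote`, the label names, and the three example taggers are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token label sequences from several models by majority vote.

    predictions: list of label sequences, one per model, all the same length.
    Ties are resolved in favour of the model listed first (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all label sequences must have the same length")

    merged = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        merged.append(next(label for label in labels if counts[label] == best))
    return merged


# Hypothetical BIO-tagged outputs of three taggers for one five-token sentence.
model_a = ["B-REAGENT", "I-REAGENT", "O", "B-SOLVENT", "O"]
model_b = ["B-REAGENT", "O", "O", "B-SOLVENT", "O"]
model_c = ["B-REAGENT", "I-REAGENT", "O", "O", "O"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Voting per token keeps the sketch short, but it can produce inconsistent BIO sequences (for example an `I-` tag with no preceding `B-` tag), so a real system would likely vote over whole entity spans or repair the sequence afterwards.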


research
05/29/2022

SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition using Large Pre-trained Language Models

Large scale pre-training models have been widely used in named entity re...
research
10/24/2022

Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction

Neural language models encode rich knowledge about entities and their re...
research
09/15/2023

Mining Patents with Large Language Models Demonstrates Congruence of Functional Labels and Chemical Structures

Predicting chemical function from structure is a major goal of the chemi...
research
08/16/2023

Large Language Models for Granularized Barrett's Esophagus Diagnosis Classification

Diagnostic codes for Barrett's esophagus (BE), a precursor to esophageal...
research
03/08/2023

Does Synthetic Data Generation of LLMs Help Clinical Text Mining?

Recent advancements in large language models (LLMs) have led to the deve...
research
05/25/2023

Explainability Techniques for Chemical Language Models

Explainability techniques are crucial in gaining insights into the reaso...
