PharmKE: Knowledge Extraction Platform for Pharmaceutical Texts using Transfer Learning

02/25/2021
by   Nasi Jofche, et al.

Recognizing named entities in text has been a very dynamic research field in recent years, driven by advances in neural network architectures, increases in computing power, and the availability of diverse labeled datasets, which together have yielded highly accurate pre-trained models. These models are generally trained to tag common entity types, while domain-specific use cases require tagging custom entities that the pre-trained models do not cover. This can be addressed either by fine-tuning the pre-trained models or by training custom models; the main challenge lies in obtaining reliably labeled training and test datasets, since manual labeling is a highly tedious task. In this paper we present PharmKE, a text analysis platform for the pharmaceutical domain that applies deep learning through several stages for thorough semantic analysis of pharmaceutical articles. It performs text classification using state-of-the-art transfer learning models and integrates the results into a proposed methodology for creating accurately labeled training and test datasets. These datasets are then used to train models for custom entity labeling tasks in the pharmaceutical domain, and the results are compared against fine-tuned BERT and BioBERT models trained on the same data. Additionally, the PharmKE platform uses the recognized entities to resolve entity co-references and analyze the semantic relations in each sentence, establishing a baseline for further text analysis tasks such as question answering and fact extraction. The recognized entities are also used to expand the knowledge graph generated by DBpedia Spotlight for a given pharmaceutical text.
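The abstract does not detail how the labeled training and test datasets are produced. A minimal sketch of one common way to bootstrap such data, dictionary-based distant supervision that projects a domain lexicon onto tokenized text as BIO labels, might look like the following. The lexicon entries, the `PHARM` tag name, and the longest-match heuristic are illustrative assumptions, not the paper's actual methodology.

```python
def bio_label(tokens, lexicon, tag="PHARM", max_len=4):
    """Assign BIO labels to token spans that match a domain lexicon.

    tokens:  list of word tokens from one sentence
    lexicon: set of lowercase entity phrases (hypothetical drug names here)
    Returns a list of labels, one per token ("B-PHARM", "I-PHARM", or "O").
    """
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Prefer the longest lexicon phrase starting at position i.
        for length in range(min(len(tokens) - i, max_len), 0, -1):
            phrase = " ".join(tokens[i:i + length]).lower()
            if phrase in lexicon:
                labels[i] = f"B-{tag}"
                for j in range(i + 1, i + length):
                    labels[j] = f"I-{tag}"
                i += length
                matched = True
                break
        if not matched:
            i += 1
    return labels

lexicon = {"aspirin", "acetylsalicylic acid"}
tokens = "Aspirin is also known as acetylsalicylic acid .".split()
print(bio_label(tokens, lexicon))
# → ['B-PHARM', 'O', 'O', 'O', 'O', 'B-PHARM', 'I-PHARM', 'O']
```

Labels produced this way are noisy (lexicon gaps, ambiguous surface forms), which is why a review or filtering step is usually needed before using them to fine-tune a model such as BERT or BioBERT for custom entity tagging.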


Related research

- 12/07/2022: A Study on Extracting Named Entities from Fine-tuned vs. Differentially Private Fine-tuned BERT Models
  Privacy preserving deep learning is an emerging field in machine learnin...
- 05/25/2021: NukeLM: Pre-Trained and Fine-Tuned Language Models for the Nuclear and Energy Domains
  Natural language processing (NLP) tasks (text classification, named enti...
- 11/07/2022: Reconciliation of Pre-trained Models and Prototypical Neural Networks in Few-shot Named Entity Recognition
  Incorporating large-scale pre-trained models with the prototypical neura...
- 12/01/2021: Wiki to Automotive: Understanding the Distribution Shift and its impact on Named Entity Recognition
  While transfer learning has become a ubiquitous technique used across Na...
- 10/14/2019: Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models
  Training models on low-resource named entity recognition tasks has been ...
- 04/29/2017: Semi-supervised sequence tagging with bidirectional language models
  Pre-trained word embeddings learned from unlabeled text have become a st...
- 03/06/2020: Transfer Learning for Information Extraction with Limited Data
  This paper presents a practical approach to fine-grained information ext...
