MatSciBERT: A Materials Domain Language Model for Text Mining and Information Extraction

09/30/2021
by Tanishq Gupta, et al.

A vast amount of knowledge in the materials domain is generated and stored as text in peer-reviewed scientific literature. Recent developments in natural language processing, such as bidirectional encoder representations from transformers (BERT) models, provide promising tools for extracting information from these texts. However, applying these models directly to the materials domain may yield suboptimal results, because the models may not have been trained on the notation and jargon specific to the domain. Here, we present MatSciBERT, a materials-aware language model trained on a large corpus of scientific literature published in the materials domain. We evaluate MatSciBERT on three downstream tasks (abstract classification, named entity recognition, and relation extraction) using different materials datasets. We show that MatSciBERT outperforms SciBERT, a language model trained on a scientific corpus, on all of these tasks. We also discuss applications of MatSciBERT for extracting information in the materials domain, which can in turn contribute to materials discovery or optimization. Finally, to make the work accessible to the larger materials community, we make the pretrained and fine-tuned weights and models of MatSciBERT freely available.


research
12/10/2022

Structured information extraction from complex scientific text with fine-tuned large language models

Intelligently extracting and linking complex scientific information from...
research
12/31/2018

Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks

Leveraging new data sources is a key step in accelerating the pace of ma...
research
04/05/2023

Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT

The amount of data has growing significance in exploring cutting-edge ma...
research
09/15/2023

BioinspiredLLM: Conversational Large Language Model for the Mechanics of Biological and Bio-inspired Materials

The study of biological materials and bio-inspired materials science is ...
research
08/18/2023

Accelerated materials language processing enabled by GPT

Materials language processing (MLP) is one of the key facilitators of ma...
research
01/05/2021

Looking Through Glass: Knowledge Discovery from Materials Science Literature using Natural Language Processing

Most of the knowledge in materials science literature is in the form of ...
research
11/16/2022

Galactica: A Large Language Model for Science

Information overload is a major obstacle to scientific progress. The exp...
