A general-purpose material property data extraction pipeline from large polymer corpora using Natural Language Processing

09/27/2022
by Pranav Shetty, et al.

The ever-increasing number of materials science articles makes it hard to infer chemistry-structure-property relations from published literature. We used natural language processing (NLP) methods to automatically extract material property data from the abstracts of polymer literature. As a component of our pipeline, we trained MaterialsBERT, a language model, on 2.4 million materials science abstracts; when used as the text encoder, it outperforms the other baseline models on three out of five named entity recognition datasets. Using this pipeline, we obtained approximately 300,000 material property records from approximately 130,000 abstracts in 60 hours. The extracted data was analyzed for a diverse range of applications, such as fuel cells, supercapacitors, and polymer solar cells, to recover non-trivial insights. The data extracted through our pipeline is available through a web platform at https://polymerscholar.org, which can be used to conveniently locate material property data recorded in abstracts. This work demonstrates the feasibility of an automatic pipeline that starts from published literature and ends with a complete set of extracted material property information.
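To make the idea of a "material property record" concrete, the sketch below shows one way a record could be assembled from a single sentence of an abstract. It is not the authors' pipeline: the abstract describes a trained named entity recognition model with MaterialsBERT as the encoder, whereas this toy version swaps in small keyword lists and a regular expression. The vocabularies, the `PropertyRecord` type, and the `extract_records` function are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical toy vocabularies standing in for a trained NER model.
MATERIALS = ["polystyrene", "polyethylene", "PMMA"]
PROPERTIES = ["glass transition temperature", "tensile strength", "band gap"]

# A number followed by a unit, e.g. "25.5 MPa" or "3.2 eV".
VALUE_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s*(°C|K|MPa|GPa|eV)")


@dataclass
class PropertyRecord:
    material: str
    property_name: str
    value: float
    unit: str


def extract_records(abstract: str) -> List[PropertyRecord]:
    """Return one record per sentence that names exactly one material,
    one property, and one numeric value with a unit."""
    records = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        lowered = sentence.lower()
        materials = [m for m in MATERIALS if m.lower() in lowered]
        properties = [p for p in PROPERTIES if p in lowered]
        values = VALUE_RE.findall(sentence)
        # Skip ambiguous sentences rather than guess which value
        # belongs to which material or property.
        if len(materials) == 1 and len(properties) == 1 and len(values) == 1:
            number, unit = values[0]
            records.append(
                PropertyRecord(materials[0], properties[0], float(number), unit)
            )
    return records


if __name__ == "__main__":
    text = (
        "The tensile strength of polyethylene was 25.5 MPa. "
        "PMMA films were also prepared."
    )
    for record in extract_records(text):
        print(record)
```

Only unambiguous sentences produce a record here, which keeps the toy example honest about what keyword matching can do; a learned tagger of the kind the abstract describes would be needed to handle sentences that mention several materials or values.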


