Using natural language processing techniques to extract information on the properties and functionalities of energetic materials from large text corpora

03/01/2019
by Dan Elton, et al.

The number of scientific journal articles and reports published about energetic materials each year is growing exponentially, so extracting relevant information and actionable insights from the latest research is becoming a considerable challenge. In this work we explore how techniques from natural language processing (NLP) and machine learning can be used to automatically extract chemical insights from large collections of documents. We first describe how to download and process documents from a variety of sources: journal articles, conference proceedings (including NTREM), the US Patent & Trademark Office, and the Defense Technical Information Center archive on archive.org. We present a custom NLP pipeline that uses open source NLP tools to identify the names of chemical compounds and relate them to function words ("underwater", "rocket", "pyrotechnic") and property words ("elastomer", "non-toxic"). After explaining how word embeddings work, we compare the utility of two popular word embedding methods, word2vec and GloVe. Chemical-chemical and chemical-application relationships are obtained by doing computations with word vectors. We show that word embeddings capture latent information about energetic materials, so that related materials appear close together in the word embedding space.
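
As a rough illustration of what "doing computations with word vectors" can mean, the sketch below ranks words by cosine similarity to a query word, a common way to compare embeddings. It is not the authors' pipeline: the three-dimensional vectors, the `TOY_VECTORS` table, and the helper names are hypothetical and invented for this example, whereas real word2vec or GloVe vectors are learned from a corpus and usually have a hundred or more dimensions.

```python
import math

# Hypothetical toy vectors, invented for illustration only. They are NOT
# taken from the paper or from any trained word2vec/GloVe model.
TOY_VECTORS = {
    "RDX": [0.9, 0.1, 0.2],
    "HMX": [0.8, 0.2, 0.1],
    "TNT": [0.7, 0.3, 0.3],
    "rocket": [0.1, 0.9, 0.2],
    "pyrotechnic": [0.2, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, top_n=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Print the neighbours of one compound name in the toy embedding space.
for neighbour, score in most_similar("RDX", TOY_VECTORS):
    print(f"{neighbour}: {score:.3f}")
```

With these made-up values, "HMX" ranks closest to "RDX" and the application words rank far lower, which mirrors the idea that related materials sit near each other in the embedding space. With trained embeddings the same ranking step would be applied to the learned vectors instead of a hand-written table.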


