Searching for Carriers of the Diffuse Interstellar Bands Across Disciplines, using Natural Language Processing

The explosion of scientific publications overloads researchers with information. The problem is even more acute for interdisciplinary studies, which require exploring several fields. Natural Language Processing (NLP), which relies on machine-learning (ML) techniques, can help researchers overcome this by automatically synthesizing information from many articles. As a practical example, we have used NLP to conduct an interdisciplinary search for compounds that could be carriers of the Diffuse Interstellar Bands (DIBs), a long-standing open question in astrophysics. We trained an NLP model on a corpus of 1.5 million cross-domain open-access articles and fine-tuned it on a corpus of astrophysical publications about DIBs. Our analysis points us toward several molecules, studied primarily in biology, that have transitions at the wavelengths of several DIBs and are composed of abundant interstellar atoms. Several of these molecules contain chromophores, small molecular groups responsible for a molecule's colour, and could be promising candidate carriers. Identifying viable candidate carriers demonstrates the value of using NLP to tackle open scientific questions in an interdisciplinary manner.
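The two screening criteria named above (composition from abundant interstellar atoms, and transitions at DIB wavelengths) can be illustrated with a minimal Python sketch. This is not the authors' pipeline and does not model the NLP step that proposes candidates. The element set `ABUNDANT_ELEMENTS`, the tolerance `tol_a`, the `Candidate` record, and the molecules `mol_A` to `mol_C` are hypothetical; the DIB list holds only four well-known strong bands at approximate wavelengths.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Assumption: illustrative set of abundant interstellar elements; a real
# screen might also admit other elements.
ABUNDANT_ELEMENTS: Set[str] = {"H", "C", "N", "O"}

# Approximate central wavelengths (angstroms) of four well-known strong DIBs.
DIB_WAVELENGTHS_A: List[float] = [5780.5, 5797.1, 6283.8, 6613.6]


@dataclass
class Candidate:
    name: str     # hypothetical label, e.g. a term proposed by an NLP model
    formula: str  # flat molecular formula such as "C10H8N2"
    transitions_a: List[float] = field(default_factory=list)  # angstroms


def elements(formula: str) -> Set[str]:
    """Element symbols in a flat formula (no brackets or charges)."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def matching_dibs(c: Candidate, tol_a: float) -> List[float]:
    """DIB wavelengths lying within tol_a of any of the candidate's transitions."""
    return [d for d in DIB_WAVELENGTHS_A
            if any(abs(t - d) <= tol_a for t in c.transitions_a)]


def screen(candidates: List[Candidate], tol_a: float = 2.0,
           min_matches: int = 1) -> List[Tuple[str, List[float]]]:
    """Keep candidates made only of abundant elements that match >= min_matches DIBs."""
    kept = []
    for c in candidates:
        if not elements(c.formula) <= ABUNDANT_ELEMENTS:
            continue  # contains an element outside the assumed abundant set
        hits = matching_dibs(c, tol_a)
        if len(hits) >= min_matches:
            kept.append((c.name, hits))
    return kept


# Toy input: made-up molecules and wavelengths, not data from the study.
candidates = [
    Candidate("mol_A", "C10H8N2", [4500.0, 5780.0]),
    Candidate("mol_B", "C6H5Cl", [5797.5]),  # rejected: contains Cl
    Candidate("mol_C", "C8H10O", [7000.0]),  # rejected: no DIB match
]
result = screen(candidates)
print(result)  # [('mol_A', [5780.5])]
```

Raising `min_matches` above one would mirror the abstract's focus on molecules with transitions at several DIB wavelengths.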
