Extracting Structured Seed-Mediated Gold Nanorod Growth Procedures from Literature with GPT-3

by Nicholas Walker, et al.

Although gold nanorods have been the subject of much research, the pathways for controlling their shape, and thereby their optical properties, remain understood largely through heuristics. It is apparent that the simultaneous presence of, and interaction between, various reagents during synthesis controls these properties, yet computational and experimental approaches for exploring the synthesis space can be either intractable or too time-consuming in practice. This motivates an alternative approach: leveraging the wealth of synthesis information already embedded in the scientific literature by developing tools that extract relevant structured data in an automated, high-throughput manner. To that end, we present an approach that uses the powerful GPT-3 language model to extract structured multi-step seed-mediated growth procedures and outcomes for gold nanorods from unstructured scientific text. GPT-3 is fine-tuned so that its prompt completions predict synthesis templates, in the form of JSON documents, from unstructured text input, with an overall accuracy of 86%. This performance is notable because the model performs entity recognition and relation extraction simultaneously. We present a dataset of 11,644 entities extracted from 1,137 papers, of which 268 papers contain at least one complete seed-mediated gold nanorod growth procedure and outcome, for a total of 332 complete procedures.
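
To make the idea of a JSON synthesis template concrete, the following is a minimal Python sketch of how a model completion might be parsed and checked for a complete seed-mediated procedure. The abstract does not give the actual schema, so the key names (`seed_solution`, `growth_solution`, `outcome`), the helper functions, and the example completion are all hypothetical and purely illustrative; they are not the authors' format or code.

```python
import json

# Hypothetical section names: the real template schema is not given in the abstract.
REQUIRED_SECTIONS = ("seed_solution", "growth_solution", "outcome")


def parse_completion(completion: str):
    """Return the completion as a dict, or None if it is not a valid JSON object."""
    try:
        doc = json.loads(completion)
    except json.JSONDecodeError:
        return None
    return doc if isinstance(doc, dict) else None


def is_complete(doc) -> bool:
    """Treat a procedure as complete if every assumed section is present and non-empty."""
    if doc is None:
        return False
    return all(doc.get(key) for key in REQUIRED_SECTIONS)


# Illustrative completion only; the reagents are common in seed-mediated growth,
# but this record is not taken from the paper's dataset.
example_completion = """
{
  "seed_solution": [
    {"reagent": "HAuCl4"}, {"reagent": "CTAB"}, {"reagent": "NaBH4"}
  ],
  "growth_solution": [
    {"reagent": "HAuCl4"}, {"reagent": "CTAB"},
    {"reagent": "AgNO3"}, {"reagent": "ascorbic acid"}
  ],
  "outcome": {"shape": "rod"}
}
"""

procedure = parse_completion(example_completion)
print(is_complete(procedure))
```

Under these assumptions, a check of this kind could separate papers that yield at least one complete procedure and outcome from papers that yield only partial entity sets, which is the distinction the reported dataset counts draw.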


