ChatGPT Chemistry Assistant for Text Mining and Prediction of MOF Synthesis

06/20/2023
by Zhiling Zheng, et al.

We use prompt engineering to guide ChatGPT in automating the text mining of metal-organic framework (MOF) synthesis conditions from scientific literature written in diverse formats and styles. This approach effectively mitigates ChatGPT's tendency to hallucinate information, an issue that previously made the use of Large Language Models (LLMs) in scientific fields challenging. We develop a workflow implementing three different text-mining processes, programmed by ChatGPT itself. All of them enable parsing, searching, filtering, classification, summarization, and data unification, with different tradeoffs between labor, speed, and accuracy. We deploy this system to extract 26,257 distinct synthesis parameters pertaining to approximately 800 MOFs sourced from peer-reviewed research articles. This process incorporates our ChemPrompt Engineering strategy to instruct ChatGPT in text mining, resulting in impressive precision, recall, and F1 scores of 90-99%. Furthermore, with the dataset built by text mining, we constructed a machine-learning model with over 86% accuracy in predicting MOF experimental crystallization outcomes, and preliminarily identified important factors in MOF crystallization. We also developed a reliable, data-grounded MOF chatbot to answer questions on chemical reactions and synthesis procedures. Because this process uses ChatGPT to reliably mine and tabulate diverse MOF synthesis information in a unified format, using only narrative language that requires no coding expertise, we anticipate that our ChatGPT Chemistry Assistant will be very useful across various other chemistry sub-disciplines.
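The abstract reports precision, recall, and F1 scores for the extracted synthesis parameters. As a minimal sketch of how such scores are computed, assuming each extracted parameter is represented as a (field, value) pair and scored by exact match against a hand-annotated gold set (the authors' actual schema and matching criteria may differ), the metrics can be written as below. The function name `precision_recall_f1`, the field names, and the example values are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

# Hypothetical representation: one synthesis parameter = (field, value).
# This is an illustrative assumption, not the paper's data schema.
Param = Tuple[str, str]


def precision_recall_f1(predicted: Set[Param], gold: Set[Param]) -> Tuple[float, float, float]:
    """Score extracted parameters against a hand-annotated gold set by exact match."""
    true_pos = len(predicted & gold)  # parameters that were extracted and are correct
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


# Hypothetical gold annotation for one MOF synthesis paragraph.
gold = {
    ("metal_source", "zinc nitrate hexahydrate"),
    ("linker", "terephthalic acid"),
    ("solvent", "DMF"),
    ("temperature_C", "120"),
}

# Hypothetical model output: three fields correct, one temperature wrong.
predicted = {
    ("metal_source", "zinc nitrate hexahydrate"),
    ("linker", "terephthalic acid"),
    ("solvent", "DMF"),
    ("temperature_C", "100"),
}

print(precision_recall_f1(predicted, gold))  # -> (0.75, 0.75, 0.75)
```

F1 is the harmonic mean of precision and recall, so it stays high only when the model both avoids spurious or hallucinated values (precision) and misses few annotated ones (recall).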

