What indeed can GPT models do in chemistry? A comprehensive benchmark on eight tasks

05/27/2023
by Taicheng Guo, et al.

Large Language Models (LLMs) with strong natural language processing abilities have emerged and have been rapidly applied in areas such as science, finance, and software engineering. However, the capability of LLMs to advance the field of chemistry remains unclear. In this paper, we establish a comprehensive benchmark containing 8 practical chemistry tasks, including 1) name prediction, 2) property prediction, 3) yield prediction, 4) reaction prediction, 5) retrosynthesis (prediction of reactants from products), 6) text-based molecule design, 7) molecule captioning, and 8) reagent selection. Our analysis draws on widely recognized datasets including BBBP, Tox21, PubChem, USPTO, and ChEBI, enabling a broad exploration of the capacities of LLMs within the context of practical chemistry. Three GPT models (GPT-4, GPT-3.5, and Davinci-003) are evaluated on each chemistry task in zero-shot and few-shot in-context learning settings with carefully selected demonstration examples and specially crafted prompts. The key results of our investigation are: 1) GPT-4 outperforms the other two models evaluated; 2) GPT models perform less competitively on tasks demanding a precise understanding of molecular SMILES representations, such as reaction prediction and retrosynthesis; 3) GPT models demonstrate strong capabilities in text-related explanation tasks such as molecule captioning; and 4) GPT models achieve performance comparable to or better than classical machine learning models on chemical problems that can be cast as classification or ranking tasks, such as property prediction and yield prediction.
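The few-shot in-context learning setup described above can be illustrated with a short sketch. The snippet below is not the paper's evaluation code; it only shows how a few-shot prompt for one task (BBBP-style property prediction) might be assembled and sent to a GPT model through the OpenAI chat API. The SMILES strings, labels, and prompt wording are hypothetical placeholders.

```python
# Illustrative sketch of few-shot in-context learning for a chemistry task.
# Not the benchmark's actual pipeline; examples and labels are placeholders.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Hypothetical demonstration examples (BBBP-style question:
# does the molecule penetrate the blood-brain barrier? Yes/No).
demonstrations = [
    ("CC(=O)OC1=CC=CC=C1C(=O)O", "No"),       # aspirin, placeholder label
    ("CN1C=NC2=C1C(=O)N(C)C(=O)N2C", "Yes"),  # caffeine, placeholder label
]

query_smiles = "CCO"  # molecule to classify (ethanol)

# Build the prompt: task instruction, demonstrations, then the query.
prompt_parts = [
    "You are an expert chemist. Given a molecule's SMILES string, predict "
    "whether it can penetrate the blood-brain barrier. Answer Yes or No."
]
for smiles, label in demonstrations:
    prompt_parts.append(f"SMILES: {smiles}\nAnswer: {label}")
prompt_parts.append(f"SMILES: {query_smiles}\nAnswer:")

response = client.chat.completions.create(
    model="gpt-4",
    temperature=0,
    messages=[{"role": "user", "content": "\n\n".join(prompt_parts)}],
)
print(response.choices[0].message.content.strip())
```

A zero-shot variant simply omits the demonstration examples, keeping only the task instruction and the query molecule.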
