From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery

09/11/2023
by Yuhan Chen, et al.

Molecule discovery serves as a cornerstone in numerous scientific domains, fueling the development of new materials and innovative drug designs. Recent developments in in-silico molecule discovery have highlighted the promising results of cross-modal techniques, which bridge molecular structures with their descriptive annotations. However, these cross-modal methods frequently suffer from data scarcity, which limits their performance and application. In this paper, we address this low-resource challenge by using artificially-real data generated by Large Language Models (LLMs). We first introduce a retrieval-based prompting strategy to construct high-quality pseudo data, and then explore how to leverage this pseudo data most effectively. Experiments show that using pseudo data for domain adaptation outperforms all existing methods while requiring a smaller model, less training data and lower training cost, highlighting its efficiency. Furthermore, our method improves steadily as the volume of pseudo data increases, revealing the great potential of pseudo data for advancing low-resource cross-modal molecule discovery.
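The abstract does not spell out how the retrieval-based prompting strategy works. As a rough illustration only, the sketch below assumes that, for each unlabelled molecule, the most similar labelled (SMILES, description) pairs are retrieved and placed into a few-shot prompt for an LLM. Character 3-gram Jaccard similarity on SMILES strings is a crude stand-in for a proper molecular similarity measure, and every name here (`retrieve`, `build_prompt`, the toy `pool`) is hypothetical; this is not the authors' implementation.

```python
from typing import List, Set, Tuple

# Hypothetical labelled pool: (SMILES, description) pairs from a small real dataset.
Pair = Tuple[str, str]


def char_ngrams(smiles: str, n: int = 3) -> Set[str]:
    """Character n-grams of a SMILES string (assumed stand-in for a fingerprint)."""
    if len(smiles) < n:
        return {smiles}
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two n-gram sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, pool: List[Pair], k: int = 2) -> List[Pair]:
    """Return the k pool pairs whose SMILES are most similar to the query."""
    q = char_ngrams(query)
    ranked = sorted(pool, key=lambda p: jaccard(q, char_ngrams(p[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, pool: List[Pair], k: int = 2) -> str:
    """Assemble a few-shot prompt asking an LLM to describe an unlabelled molecule."""
    lines = ["Describe each molecule given its SMILES string.", ""]
    for smiles, desc in retrieve(query, pool, k):
        lines.append(f"SMILES: {smiles}")
        lines.append(f"Description: {desc}")
        lines.append("")
    # The query goes last, leaving the description for the LLM to complete.
    lines.append(f"SMILES: {query}")
    lines.append("Description:")
    return "\n".join(lines)


# Toy labelled pool (illustrative only).
pool: List[Pair] = [
    ("CCO", "Ethanol is a primary alcohol."),
    ("CCCO", "Propan-1-ol is a primary alcohol."),
    ("c1ccccc1", "Benzene is an aromatic hydrocarbon."),
]

print(build_prompt("CCCCO", pool, k=2))
```

For the query `CCCCO`, the two alcohols are retrieved as examples and benzene is left out. Sending such a prompt to an LLM and pairing its completion with the query SMILES would yield one pseudo (molecule, description) pair; in practice a fingerprint-based similarity (for example Tanimoto similarity over Morgan fingerprints) would likely be a better retrieval key than raw string overlap.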
