Difficulty in learning chirality for Transformer fed with SMILES

03/21/2023
by Yasuhiro Yoshikai, et al.

Recent years have seen the development of descriptor-generation methods based on representation learning over extremely diverse molecules, especially methods that apply natural language processing (NLP) models to SMILES, a string representation of molecular structure. However, little research has examined how these models understand chemical structure. To address this, we investigated the relationship between learning progress on SMILES and chemical structure using a representative NLP model, the Transformer. The results suggest that while the Transformer learns partial structures of molecules quickly, it requires extended training to understand overall structures. Consistent with this, the accuracy of molecular property predictions using descriptors generated from models at different training steps was similar from the beginning to the end of training. Furthermore, we found that the Transformer requires particularly long training to learn chirality and sometimes stagnates at low translation accuracy because it confuses enantiomers. These findings are expected to deepen the understanding of NLP models in chemistry.
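To illustrate why enantiomers are easy to confuse at the string level, the following minimal sketch compares the SMILES of L-alanine and D-alanine. The regex tokenizer and the helper names (`tokenize`, `strip_chirality`) are simplified assumptions made for illustration; they are not the tokenization or the code used in the paper.

```python
import re

# Hypothetical, simplified SMILES tokenizer (an assumption, not the paper's):
# bracket atoms first, then two-letter halogens, then any single character.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return TOKEN_RE.findall(smiles)


def strip_chirality(smiles):
    """Drop the @ / @@ tetrahedral chirality marks from a SMILES string."""
    return smiles.replace("@", "")


# The two enantiomers of alanine written with the same atom order.
l_ala = "N[C@@H](C)C(=O)O"  # L-alanine
d_ala = "N[C@H](C)C(=O)O"   # D-alanine

l_tokens = tokenize(l_ala)
d_tokens = tokenize(d_ala)

# Positions at which the two token sequences disagree.
diff = [(i, a, b) for i, (a, b) in enumerate(zip(l_tokens, d_tokens)) if a != b]

print(len(l_tokens), "tokens per molecule")
print("differing tokens:", diff)
print("identical without chirality marks:",
      strip_chirality(l_ala) == strip_chirality(d_ala))
```

Under this tokenization the two molecules differ in a single token, the chiral center, while every other token is shared. A sequence model can therefore reproduce almost the whole string correctly while still choosing the wrong enantiomer.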


