Fragment-based t-SMILES for de novo molecular generation

01/04/2023
by Juan-Ni Wu, et al.

At present, sequence-based and graph-based models are two of the most widely used families of molecular generative models. In this study, we introduce a general-purpose, fragment-based, hierarchical molecular representation named t-SMILES (tree-based SMILES), which describes a molecule using a SMILES-type string obtained by performing a breadth-first search (BFS) on a full binary molecular tree formed from the fragmented molecular graph. The proposed t-SMILES combines the advantage of graph models, which pay more attention to molecular topology, with that of language models, which possess powerful learning ability. Experiments with the feature-tree-rooted JTVAE and the chemical-reaction-based BRICS molecular decomposition algorithms, using sequence-based autoregressive generation models on three popular molecule datasets (Zinc, QM9 and ChEMBL), indicate that t-SMILES-based models significantly outperform previously proposed fragment-based models and are competitive with classical SMILES-based and graph-based approaches. Most importantly, we propose a new perspective for fragment-based molecular design. As a result, powerful state-of-the-art sequence-based solutions can be readily applied to fragment-based molecular tasks.
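The core idea of the abstract, a BFS over a full binary tree of fragments, can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the hand-written fragment tree, the left-child/right-sibling conversion to a binary tree, and the `&` (empty node) and `^` (separator) markers. The sketch also leaves out how bonds between fragments are recorded and how a molecule is reassembled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

EMPTY = "&"  # assumed placeholder for a missing child (illustrative choice)
SEP = "^"    # assumed separator between fragment tokens (illustrative choice)


@dataclass
class Node:
    frag: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_binary(tree, siblings=()):
    """Left-child/right-sibling conversion of a (fragment, children) tree."""
    frag, children = tree
    node = Node(frag)
    if children:
        node.left = to_binary(children[0], children[1:])
    if siblings:
        node.right = to_binary(siblings[0], siblings[1:])
    return node


def encode(root):
    """BFS over the tree; absent children are emitted as EMPTY, so every
    fragment node has exactly two slots (a full binary tree)."""
    tokens, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        if node is None:
            tokens.append(EMPTY)
            continue
        tokens.append(node.frag)
        queue.append(node.left)
        queue.append(node.right)
    return SEP.join(tokens)


def decode(text):
    """Rebuild the binary tree from the BFS string."""
    tokens = iter(text.split(SEP))
    root = Node(next(tokens))
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for side in ("left", "right"):
            token = next(tokens)
            if token != EMPTY:
                child = Node(token)
                setattr(node, side, child)
                queue.append(child)
    return root


# Hypothetical fragment tree: a ring with two substituent fragments,
# one of which carries a further fragment.
example = ("c1ccccc1", [("C(=O)", [("N", [])]), ("O", [])])
encoded = encode(to_binary(example))
print(encoded)
```

Because every fragment node contributes exactly two child slots, the flat string can be decoded back into the same tree without extra bracketing, which is what lets an ordinary sequence model consume and emit it token by token.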


