Root-aligned SMILES for Molecular Retrosynthesis Prediction

03/22/2022
by Zipeng Zhong, et al.

Retrosynthesis prediction is a fundamental problem in organic synthesis, where the task is to discover precursor molecules that can be used to synthesize a target molecule. A popular paradigm among existing computational retrosynthesis methods formulates retrosynthesis prediction as a sequence-to-sequence translation problem, in which standard SMILES representations are adopted for both reactants and products. However, general-purpose SMILES neglects two characteristics of retrosynthesis: 1) the search space of the reactants is very large, and 2) the molecular graph topology is largely unaltered from products to reactants. As a result, SMILES performs suboptimally when applied directly. In this article, we propose the root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and the reactant SMILES, narrowing the discrepancy between their string representations for more efficient retrosynthesis. Because the minimum edit distance between the input and the output is significantly decreased with the proposed R-SMILES, the computational model is largely relieved from learning the complex syntax and can focus on learning the chemical knowledge needed for retrosynthesis. We compare the proposed R-SMILES with various state-of-the-art baselines on different benchmarks and show that it significantly outperforms them all, demonstrating the superiority of the proposed method.
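The edit-distance argument in the abstract can be illustrated with a small sketch. The example below is not the authors' code: the reaction (N-methylacetamide from acetic acid and methylamine), the hand-written SMILES strings, and the helper name `levenshtein` are all assumptions chosen for illustration, and the distance is computed per character rather than per SMILES token. It compares one arbitrary but valid atom ordering against an ordering in which product and reactants start from the same root atom.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level minimum edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical, hand-written SMILES for the same molecules (illustrative only).
# Unaligned: product and reactants are written with unrelated atom orderings.
product_unaligned = "CNC(C)=O"      # N-methylacetamide
reactants_unaligned = "CC(=O)O.CN"  # acetic acid + methylamine

# Root-aligned (assumed): both strings start from the same acetyl carbon,
# and the second reactant starts at the nitrogen where the bond is broken.
product_aligned = "CC(=O)NC"
reactants_aligned = "CC(=O)O.NC"

dist_unaligned = levenshtein(product_unaligned, reactants_unaligned)
dist_aligned = levenshtein(product_aligned, reactants_aligned)

print("unaligned edit distance:", dist_unaligned)
print("aligned edit distance:  ", dist_aligned)  # 2: insert "O" and "."
```

In the aligned pair the reactant string is the product string with only `O.` inserted, so a translation model mostly has to copy its input. In the unaligned pair the same chemistry requires more edits.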


