SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties

12/06/2017
by Garrett B. Goh, et al.

Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software packages. Encoded in each SMILES string is structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2Vec, a deep RNN that automatically learns features from SMILES strings to predict chemical properties, without the need for additional explicit chemical information or the "grammar" of how SMILES encodes structural data. Using Bayesian optimization methods to tune the network architecture, we show that an optimized SMILES2Vec model can serve as a general-purpose neural network for learning a range of distinct chemical properties, including toxicity, activity, solubility and solvation energy, while outperforming contemporary MLP networks that use engineered features. Furthermore, we demonstrate a proof of concept of interpretability by developing an explanation mask that localizes on the most important characters used in making a prediction. When tested on the solubility dataset, this localization identifies specific parts of a chemical that are consistent with established first-principles knowledge of solubility with an accuracy of 88%, demonstrating that neural networks can learn technically accurate chemical concepts. The fact that SMILES2Vec validates established chemical facts, while providing state-of-the-art accuracy, makes it a potential tool for widespread adoption of interpretable deep learning by the chemistry community.
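To make the idea of learning directly from SMILES text more concrete, the following is a minimal sketch of how SMILES strings might be turned into fixed-length integer sequences before being fed to an embedding layer and an RNN. It is an illustration only, not the authors' preprocessing: the helper names (build_vocab, encode), the padding index 0, and the max_len value are all assumptions made for this example. Treating every character as its own token is a simplification (a two-letter atom such as Cl would be split into two tokens).

```python
from typing import Dict, List

PAD = 0  # index reserved for padding (an assumption of this sketch)


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map each distinct character to an integer index, starting at 1."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode one SMILES string as a fixed-length list of character indices."""
    if len(smiles) > max_len:
        raise ValueError("SMILES string longer than max_len")
    ids = [vocab[ch] for ch in smiles]
    # Right-pad with PAD so every sequence has the same length.
    return ids + [PAD] * (max_len - len(ids))


# Ethanol, benzene and acetic acid, written as SMILES strings.
smiles_examples = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(smiles_examples)
encoded = [encode(s, vocab, max_len=10) for s in smiles_examples]
```

Each row of encoded could then serve as one input sequence to a character-level model. No chemical rules are supplied at this stage, which matches the claim that the network is not given the SMILES "grammar".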


