Multimodal Deep Neural Networks using Both Engineered and Learned Representations for Biodegradability Prediction

08/13/2018
by Garrett B. Goh, et al.

Deep learning algorithms excel at extracting patterns from raw data. Through representation learning and automated feature engineering on large datasets, such models have been highly successful in computer vision and natural language applications. However, in many other technical domains, datasets large enough to learn representations from may not be available. In this work, we develop a novel multimodal CNN-MLP neural network architecture that uses both domain-specific feature engineering and representations learned from raw data. We illustrate the effectiveness of this approach in the chemical sciences, for predicting chemical properties, where labeled data is scarce owing to the high cost of acquiring labels through experimental measurements. By training on both raw chemical data and engineered chemical features, while leveraging weak supervised learning and transfer learning methods, we show that the multimodal CNN-MLP network is more accurate than either a standalone CNN that uses only raw data or a standalone MLP that uses only engineered features. Using this multimodal network, we then develop the DeepBioD model for predicting chemical biodegradability, which achieves a classification error rate of 0.125 that is 27% lower than the current state-of-the-art. Thus, our work indicates that combining traditional feature engineering with representation learning on raw data can be an effective approach, particularly where labeled training data is limited. Such a framework could also be applied to other technical fields in which substantial research effort in feature engineering is already established.
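To make the idea of a multimodal CNN-MLP network concrete, the following is a minimal sketch of the data flow only, not the authors' DeepBioD implementation. It assumes a toy 2D grid as the raw input, a short vector of engineered descriptors, late fusion by concatenation, and random untrained weights; all function names, layer sizes, and inputs are hypothetical and chosen purely for illustration.

```python
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def dense(vec, weights, bias):
    """Fully connected layer: weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def conv_branch(image, kernels):
    """CNN branch: 'valid' 2D convolution, ReLU, global average pooling.

    Returns one learned feature per kernel.
    """
    k = len(kernels[0])
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    features = []
    for kernel in kernels:
        total = 0.0
        for i in range(rows):
            for j in range(cols):
                act = sum(kernel[a][b] * image[i + a][j + b]
                          for a in range(k) for b in range(k))
                total += relu(act)
        features.append(total / (rows * cols))
    return features


def mlp_branch(descriptors, weights, bias):
    """MLP branch: one hidden ReLU layer over engineered features."""
    return [relu(h) for h in dense(descriptors, weights, bias)]


def multimodal_forward(image, descriptors, params):
    """Concatenate both branch outputs and map them to a probability."""
    learned = conv_branch(image, params["kernels"])
    engineered = mlp_branch(descriptors, params["mlp_w"], params["mlp_b"])
    fused = learned + engineered  # late fusion by list concatenation
    logit = dense(fused, params["out_w"], params["out_b"])[0]
    return 1.0 / (1.0 + math.exp(-logit))


def init_params(n_kernels, kernel_size, n_descriptors, n_hidden, seed=0):
    """Random, untrained weights (assumption: illustration only)."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.5, 0.5)

    return {
        "kernels": [[[u() for _ in range(kernel_size)]
                     for _ in range(kernel_size)]
                    for _ in range(n_kernels)],
        "mlp_w": [[u() for _ in range(n_descriptors)]
                  for _ in range(n_hidden)],
        "mlp_b": [0.0] * n_hidden,
        "out_w": [[u() for _ in range(n_kernels + n_hidden)]],
        "out_b": [0.0],
    }


# Hypothetical toy inputs: an 8x8 grid and five engineered descriptors.
params = init_params(n_kernels=4, kernel_size=3, n_descriptors=5, n_hidden=3)
toy_image = [[(i * j) % 3 / 2.0 for j in range(8)] for i in range(8)]
toy_descriptors = [0.2, -1.0, 0.5, 0.0, 1.3]
probability = multimodal_forward(toy_image, toy_descriptors, params)
print(f"predicted probability: {probability:.3f}")
```

In this sketch the output layer sees the learned and engineered features side by side, so training could weight either source as the data warrants. A practical model would use a deep learning framework, trained weights, and much larger branches.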

research
09/13/2018

IL-Net: Using Expert Knowledge to Guide the Design of Furcated Neural Networks

Deep neural networks (DNN) excel at extracting patterns. Through represe...
research
12/07/2017

Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction

With access to large datasets, deep neural networks (DNN) have achieved ...
research
12/07/2017

ChemNet: A Transferable and Generalizable Deep Neural Network for Small-Molecule Property Prediction

With access to large datasets, deep neural networks (DNN) have achieved ...
research
12/07/2020

Reprogramming Language Models for Molecular Representation Learning

Recent advancements in transfer learning have made it a promising approa...
research
07/25/2017

Representation Learning on Large and Small Data

Deep learning owes its success to three key factors: scale of data, enha...
research
07/26/2020

Iterative Boosting Deep Neural Networks for Predicting Click-Through Rate

The click-through rate (CTR) reflects the ratio of clicks on a specific ...
research
12/06/2017

SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties

Chemical databases store information in text representations, and the SM...
