Chemical transformer compression for accelerating both training and inference of molecular modeling

05/16/2022
by Yi Yu, et al.

Transformer models have been developed for molecular science and show excellent performance in applications such as quantitative structure-activity relationship (QSAR) modeling and virtual screening (VS). Compared with other model types, however, they are large, so shortening training and inference time demands substantial hardware. In this work, cross-layer parameter sharing (CLPS) and knowledge distillation (KD) are used to reduce the size of transformers in molecular science. Both methods match the QSAR predictive performance of the original BERT model while being more parameter-efficient. Furthermore, by integrating CLPS and KD into a two-state chemical network, we introduce a new deep lite chemical transformer model, DeLiCaTe. DeLiCaTe captures general-domain as well as task-specific knowledge, and a 10-fold reduction in parameters together with a 3-fold reduction in layers makes both training and inference about 4x faster, while maintaining comparable performance in QSAR and VS modeling. Moreover, we anticipate that this model compression strategy offers a pathway toward effective generative transformer models for organic drug and material design.
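
The two compression ideas named in the abstract can be illustrated with a short, hypothetical PyTorch sketch (not the authors' DeLiCaTe code): cross-layer parameter sharing is shown by reusing a single transformer encoder layer at every depth, and knowledge distillation by a temperature-scaled KL-divergence loss between teacher and student logits. The model dimensions, temperature, and the 2-class virtual-screening head are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedLayerEncoder(nn.Module):
    """BERT-style encoder where one transformer layer is reused at every
    depth (cross-layer parameter sharing): weights are stored once, so a
    12-pass model has the parameter count of a 1-layer model."""

    def __init__(self, vocab_size=100, d_model=256, n_heads=4, n_passes=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.n_passes = n_passes
        self.head = nn.Linear(d_model, 2)  # e.g. active/inactive head for VS

    def forward(self, token_ids):
        x = self.embed(token_ids)
        for _ in range(self.n_passes):   # same weights applied repeatedly
            x = self.shared_layer(x)
        return self.head(x[:, 0])        # predict from the first token


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KD loss: KL divergence between temperature-scaled
    teacher and student output distributions."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2


if __name__ == "__main__":
    tokens = torch.randint(0, 100, (8, 32))    # dummy molecular token batch
    student = SharedLayerEncoder(n_passes=3)   # shallow student: fast passes
    teacher_logits = torch.randn(8, 2)         # stand-in for a large teacher
    loss = distillation_loss(student(tokens), teacher_logits)
    print(loss.item())
```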


Related research

Stepping Back to SMILES Transformers for Fast Molecular Representation Inference (12/26/2021)
In the intersection of molecular science and deep learning, tasks like v...

Decoupled Transformer for Scalable Inference in Open-domain Question Answering (08/05/2021)
Large transformer models, such as BERT, achieve state-of-the-art results...

Curvature-based Transformer for Molecular Property Prediction (07/25/2023)
The prediction of molecular properties is one of the most important and ...

A Transformer-based Generative Model for De Novo Molecular Design (10/17/2022)
Deep learning draws a lot of attention as a new way of generating unseen...

Beyond Chemical 1D knowledge using Transformers (10/02/2020)
In the present paper we evaluated efficiency of the recent Transformer-C...

Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency (04/08/2021)
Transformer-based self-supervised models are trained as feature extracto...

Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition (09/17/2022)
While transformers and their variant conformers show promising performan...
