
Learning Dynamic BERT via Trainable Gate Variables and a Bi-modal Regularizer

by Seohyeong Jeong, et al.

The BERT model has shown significant success on various natural language processing tasks. However, due to its heavy model size and high computational cost, the model suffers from high latency, which hinders its deployment on resource-limited devices. To tackle this problem, we propose a dynamic inference method for BERT via trainable gate variables applied to input tokens and a regularizer that has a bi-modal property. Our method reduces computational cost on the GLUE benchmark with a minimal performance drop. Moreover, the model can adjust the trade-off between performance and computational cost via a user-specified hyperparameter.
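The abstract does not spell out the exact formulation, but the core idea can be sketched: each input token gets a trainable gate in [0, 1] that scales (and, when near zero, effectively drops) its representation, while a bi-modal penalty pushes gates toward the extremes so tokens are cleanly kept or skipped. Below is a minimal NumPy sketch under assumed choices of sigmoid gates and a g(1 - g)-style penalty; the names and the specific penalty form are illustrative, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_tokens(hidden_states, gate_logits):
    """Soft-mask per-token representations with trainable gates in [0, 1].

    hidden_states: (seq_len, hidden_size) token representations
    gate_logits:   (seq_len,) trainable parameters, one gate per token
    """
    gates = sigmoid(gate_logits)
    return hidden_states * gates[:, None], gates

def bimodal_regularizer(gates, lam=1.0):
    """Illustrative bi-modal penalty lam * sum g * (1 - g).

    It is zero only when every gate is exactly 0 or 1, so minimizing it
    drives the gate distribution toward two modes: keep or drop a token.
    """
    return lam * np.sum(gates * (1.0 - gates))

rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 4))                      # 6 tokens, hidden size 4
logits = np.array([4.0, -4.0, 0.0, 2.0, -2.0, 3.0])   # trainable gate logits
gated, gates = gate_tokens(hidden, logits)
reg = bimodal_regularizer(gates)
```

In a full model the gate logits would be produced by a small learned module and trained jointly with the task loss, with the regularizer's weight acting as the user-specified knob trading accuracy for the number of tokens processed.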
