
Distilling BERT for low complexity network training

by   Bansidhar Mangalwedhekar, et al.

This paper studies the efficiency of transferring BERT's learned representations to low-complexity models such as BiLSTMs, BiLSTMs with attention, and shallow CNNs, using sentiment analysis on the SST-2 dataset. It also compares the inference complexity of BERT with that of these lighter models, underlining the importance of such techniques for enabling high-performance NLP on edge devices such as mobile phones, tablets, and single-board computers like the Raspberry Pi, and for enabling new applications on such hardware.
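The transfer described above is typically done via knowledge distillation: the student is trained against the teacher's temperature-softened output distribution in addition to the hard labels. The abstract does not give the paper's exact loss, so the sketch below follows the common Hinton-style recipe; the temperature `T` and mixing weight `alpha` are assumed hyperparameters, not values from the paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label,
                      T=2.0, alpha=0.5):
    """Weighted sum of (a) cross-entropy between the teacher's and the
    student's temperature-softened distributions and (b) ordinary
    cross-entropy against the ground-truth label.

    T and alpha are assumed hyperparameters for illustration.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # Soft-target term; T**2 rescales gradients as in standard practice.
    soft_loss = -np.sum(p_teacher * np.log(p_student + 1e-12))
    # Hard-label term at T = 1.
    hard_loss = -np.log(softmax(student_logits)[hard_label] + 1e-12)
    return alpha * (T ** 2) * soft_loss + (1 - alpha) * hard_loss

# Toy SST-2-style binary example: teacher (e.g. fine-tuned BERT) logits
# vs. a hypothetical BiLSTM student's logits for one sentence.
teacher = np.array([2.5, -1.0])   # confident "positive"
student = np.array([0.8, -0.2])   # less confident student
loss = distillation_loss(student, teacher, hard_label=0)
```

In the full training loop this scalar would be minimized over the student's parameters (e.g. with SGD/Adam in a deep-learning framework); only the BERT teacher's logits need to be precomputed, so the student never runs the heavy model at inference time.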



