Demystifying BERT: Implications for Accelerator Design

04/14/2021
by Suchita Pati et al.

Transfer learning in natural language processing (NLP), as realized in models like BERT (Bidirectional Encoder Representations from Transformers), has significantly improved language representation and enabled models that tackle challenging language problems. Consequently, these applications are driving the requirements of future systems. We therefore focus on BERT, one of the most popular NLP transfer learning models, to identify how its algorithmic behavior can guide future accelerator design. To this end, we carefully profile BERT training and identify key algorithmic behaviors that merit attention in accelerator design. We observe that, while computations which manifest as matrix multiplications dominate BERT's overall runtime, as in many convolutional neural networks, memory-intensive computations also feature prominently. We characterize these computations, which have received little attention so far. Further, we identify heterogeneity among BERT's compute-intensive operations and discuss software and possible hardware mechanisms to optimize them further. Finally, we discuss the implications of these behaviors as networks grow larger and move to distributed training environments, and how techniques such as micro-batching and mixed-precision training scale. Overall, our analysis identifies holistic solutions to optimize systems for BERT-like models.
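To make the per-operator breakdown described above concrete, the sketch below shows one way to profile a single BERT training step and separate GEMM-heavy kernels from memory-bound ones. This is an illustrative example, not the paper's profiling infrastructure; it assumes PyTorch with `torch.profiler`, the Hugging Face `transformers` package, and a CUDA GPU, and the batch size and sequence length are arbitrary choices.

```python
# Illustrative profiling sketch (assumes torch + transformers are installed
# and a CUDA device is available); not the paper's own methodology.
import torch
from torch.profiler import profile, ProfilerActivity
from transformers import BertConfig, BertForPreTraining

config = BertConfig()                         # BERT-Base defaults
model = BertForPreTraining(config).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

batch_size, seq_len = 8, 128                  # example sizes, not from the paper
input_ids = torch.randint(0, config.vocab_size,
                          (batch_size, seq_len), device="cuda")
labels = input_ids.clone()                    # dummy MLM targets for illustration
next_sentence_label = torch.zeros(batch_size, dtype=torch.long, device="cuda")

def train_step():
    optimizer.zero_grad()
    out = model(input_ids=input_ids,
                labels=labels,
                next_sentence_label=next_sentence_label)
    out.loss.backward()
    optimizer.step()

train_step()                                  # warm-up before measuring

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    train_step()

# Matrix-multiplication kernels (attention projections, FFN layers) typically
# top this table, but memory-bound ops (softmax, layer norm, dropout, the
# optimizer update) also take a visible share of GPU time -- the behavior the
# paper characterizes.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```

Sorting by total CUDA time makes the compute-bound versus memory-bound split easy to eyeball; grouping the same output by operator name gives a rough analogue of the operator-level characterization discussed in the abstract.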
