NPE: An FPGA-based Overlay Processor for Natural Language Processing

04/13/2021
by Hamza Khan, et al.

In recent years, transformer-based models have shown state-of-the-art results for Natural Language Processing (NLP). In particular, the introduction of the BERT language model brought with it breakthroughs in tasks such as question answering and natural language inference, advancing applications that allow humans to interact naturally with embedded devices. FPGA-based overlay processors have been shown to be effective solutions for edge image and video processing applications, which mostly rely on low-precision linear matrix operations. In contrast, transformer-based NLP techniques employ a variety of higher-precision nonlinear operations at significantly higher frequency. We present NPE, an FPGA-based overlay processor that can efficiently execute a variety of NLP models. NPE offers software-like programmability to the end user and, unlike FPGA designs that implement specialized accelerators for each nonlinear function, can be upgraded for future NLP models without requiring reconfiguration. We demonstrate that NPE can meet real-time conversational AI latency targets for the BERT language model with 4× lower power than CPUs and 6× lower power than GPUs. We also show that NPE uses 3× fewer FPGA resources than comparable BERT network-specific accelerators in the literature. NPE provides a cost-effective and power-efficient FPGA-based solution for Natural Language Processing at the edge.
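As background for the abstract's contrast between linear and nonlinear workloads: the paper does not enumerate the operations here, but a minimal NumPy sketch of the higher-precision nonlinear functions common to BERT-style transformers (softmax in attention, GELU in the feed-forward layers, layer normalization) illustrates what an NLP overlay must execute efficiently alongside matrix multiplies. All function names below are illustrative, not taken from NPE's instruction set.

```python
import numpy as np

# Background sketch (not from the paper): the nonlinear operations that
# BERT-style transformers invoke far more often than CNN-style vision
# models, motivating an overlay that handles them without per-function
# specialized hardware.

def softmax(x, axis=-1):
    # Numerically stable softmax, applied row-wise to attention scores.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def gelu(x):
    # tanh approximation of GELU, the activation in BERT's feed-forward blocks.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (x + 0.044715 * x ** 3)))

def layer_norm(x, eps=1e-12):
    # Layer normalization over the hidden dimension (learned gain/bias omitted).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)
```

Each of these requires exponentials, square roots, or divisions at higher precision than the int8-style matrix arithmetic typical of vision overlays, which is the gap the abstract highlights.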

Related research:

- I-BERT: Integer-only BERT Quantization (01/05/2021): Transformer based models, like BERT and RoBERTa, have achieved state-of-...
- DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation (09/22/2022): Transformer is a deep learning language model widely used for natural la...
- The Topological BERT: Transforming Attention into Topology for Natural Language Processing (06/30/2022): In recent years, the introduction of the Transformer models sparked a re...
- Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing (03/04/2021): BERT is the most recent Transformer-based model that achieves state-of-t...
- No Budget? Don't Flex! Cost Consideration when Planning to Adopt NLP for Your Business (12/16/2020): Recent advances in Natural Language Processing (NLP) have largely pushed...
- A Comparative Study of Transformers on Word Sense Disambiguation (11/30/2021): Recent years of research in Natural Language Processing (NLP) have witne...
- Answer Fast: Accelerating BERT on the Tensor Streaming Processor (06/22/2022): Transformers have become a predominant machine learning workload, they a...
