E-BATCH: Energy-Efficient and High-Throughput RNN Batching

09/22/2020
by Franyell Silfa, et al.

Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput, but RNN batching requires a large amount of padding since the batched input sequences may differ greatly in length. Schemes that dynamically update the batch every few time-steps avoid padding; however, they require executing different RNN layers in a short timespan, which decreases energy efficiency. Hence, we propose E-BATCH, a low-latency and energy-efficient batching scheme tailored to RNN accelerators. It consists of a runtime system and effective hardware support. The runtime concatenates multiple sequences to create large batches, resulting in substantial energy savings. Furthermore, the accelerator notifies the runtime when the evaluation of a sequence is done, so that a new sequence can be added to the batch immediately, largely reducing the amount of padding. E-BATCH dynamically controls the number of time-steps evaluated per batch to achieve the best trade-off between latency and energy efficiency for the given hardware platform. We evaluate E-BATCH on top of E-PUR and TPU. On E-PUR, E-BATCH improves throughput by 1.8x and energy efficiency by 3.6x, whereas on TPU it improves throughput by 2.1x and energy efficiency by 1.6x, over the state-of-the-art.
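To make the padding trade-off in the abstract concrete, here is a minimal Python sketch (not the paper's implementation; the sequence lengths and the slot-steps cost model are illustrative assumptions). It compares static batching, where each batch is padded to its longest sequence, against a dynamic scheme that, like E-BATCH's runtime, refills a batch slot with a new sequence as soon as the previous one finishes.

```python
from collections import deque

def static_batch_steps(lengths, batch_size):
    """Slot-steps executed when every batch is padded to its longest sequence."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        total += max(batch) * len(batch)  # every slot runs for max(batch) steps
    return total

def dynamic_batch_steps(lengths, batch_size):
    """Slot-steps executed when a finished slot is refilled immediately."""
    slots = [0] * batch_size            # remaining time-steps per batch slot
    queue = deque(lengths)
    total = 0
    while queue or any(slots):
        for s in range(batch_size):     # refill empty slots right away
            if slots[s] == 0 and queue:
                slots[s] = queue.popleft()
        total += batch_size             # the whole batch advances one time-step
        slots = [max(0, r - 1) for r in slots]
    return total

lengths = [3, 10, 4, 9, 2, 8]           # hypothetical request lengths (36 useful steps)
print(static_batch_steps(lengths, 2))   # 54 slot-steps: 18 wasted on padding
print(dynamic_batch_steps(lengths, 2))  # 40 slot-steps: only 4 wasted, at the tail
```

In this toy run, immediate refill cuts the padded work from 18 to 4 wasted slot-steps; the remaining waste occurs only once the request queue drains. The paper's scheme additionally tunes how many time-steps each batch evaluates before the runtime intervenes, which this sketch does not model.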


