E-BATCH: Energy-Efficient and High-Throughput RNN Batching

by Franyell Silfa, et al.

Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large amount of padding, since the batched input sequences may differ greatly in length. Schemes that dynamically update the batch every few time-steps avoid padding, but they require executing different RNN layers in a short timespan, decreasing energy efficiency. Hence, we propose E-BATCH, a low-latency and energy-efficient batching scheme tailored to RNN accelerators. It consists of a runtime system and effective hardware support. The runtime concatenates multiple sequences to create large batches, resulting in substantial energy savings. Furthermore, the accelerator notifies the runtime when the evaluation of a sequence is done, so that a new sequence can be immediately added to the batch, thus largely reducing the amount of padding. E-BATCH dynamically controls the number of time-steps evaluated per batch to achieve the best trade-off between latency and energy efficiency for the given hardware platform. We evaluate E-BATCH on top of E-PUR and TPU. In E-PUR, E-BATCH improves throughput by 1.8x and energy efficiency by 3.6x, whereas in TPU it improves throughput by 2.1x and energy efficiency by 1.6x, over the state-of-the-art.
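The padding savings described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's implementation; it simply contrasts static padded batching (every batch runs for its longest member's length) with a hypothetical slot-refill policy in the spirit of E-BATCH, where a batch lane is handed a waiting sequence as soon as its current sequence finishes. The function names and sequence lengths are made up for illustration.

```python
# Hypothetical sketch, not the paper's code: count the padded time-steps
# executed by two RNN batching policies over the same set of requests.

def static_batch_steps(lengths, batch_size):
    """Static batching: pad each batch of `batch_size` sequences
    to the length of its longest member."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        group = lengths[i:i + batch_size]
        total += max(group) * batch_size  # every lane runs max(group) steps
    return total

def slot_refill_steps(lengths, batch_size):
    """Slot refill: when a sequence finishes, its lane is immediately
    given the next pending sequence, so little padding accumulates."""
    pending = list(lengths)
    slots = [pending.pop(0) if pending else 0 for _ in range(batch_size)]
    total = 0
    while any(slots):
        total += batch_size  # one time-step across all lanes (idle lanes are padding)
        slots = [s - 1 for s in slots]
        for j, s in enumerate(slots):
            if s <= 0:  # sequence done: refill the lane if work remains
                slots[j] = pending.pop(0) if pending else 0
    return total

lengths = [10, 3, 7, 2, 9, 4, 6, 1]  # made-up request lengths (in time-steps)
print(static_batch_steps(lengths, 4))  # 76 padded time-steps
print(slot_refill_steps(lengths, 4))   # 52 padded time-steps
```

With only 42 time-steps of real work, the static scheme executes 76 lane-steps while the refill scheme executes 52, which is the kind of padding reduction the runtime/accelerator notification mechanism targets.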


