RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition

02/19/2020
by   Peiyan Dong, et al.
Automatic speech recognition based on recurrent neural networks (RNNs) is now prevalent on mobile devices such as smartphones. However, previous RNN compression techniques either incur hardware performance overhead because the resulting sparsity is irregular, or lose significant accuracy because they preserve regularity for hardware friendliness. In this work, we propose RTMobile, which leverages both a novel block-based pruning approach and compiler optimizations to accelerate RNN inference on mobile devices. RTMobile is the first work to achieve real-time RNN inference on mobile platforms. Experimental results demonstrate that RTMobile significantly outperforms existing RNN hardware acceleration methods in both inference accuracy and inference time. Compared with prior work on FPGA, RTMobile running a GRU on the Adreno 640 embedded GPU improves energy efficiency by about 40× while maintaining the same inference time.
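
The abstract does not spell out how the block-based pruning works, so the following is only a minimal sketch of one generic form of block-wise structured pruning, not the authors' algorithm. It assumes a weight matrix is split into equal groups of rows and that, inside each group, only the columns with the largest L2 norm are kept. The function name `block_column_prune` and the parameters `num_row_blocks` and `keep_ratio` are hypothetical names chosen for illustration.

```python
import math
import random


def block_column_prune(weights, num_row_blocks, keep_ratio):
    """Zero the lowest-norm columns independently inside each row block.

    Illustrative assumption only: the matrix is a list of equal-length rows,
    blocks are contiguous groups of rows, and columns are ranked by L2 norm.
    """
    rows, cols = len(weights), len(weights[0])
    if rows % num_row_blocks != 0:
        raise ValueError("row count must divide evenly into num_row_blocks")
    block_rows = rows // num_row_blocks
    keep = max(1, round(cols * keep_ratio))  # columns kept per block

    pruned = [[0.0] * cols for _ in range(rows)]
    for b in range(num_row_blocks):
        start = b * block_rows
        block = weights[start:start + block_rows]
        # L2 norm of every column, restricted to the rows of this block.
        norms = [math.sqrt(sum(row[c] ** 2 for row in block)) for c in range(cols)]
        # Indices of the `keep` columns with the largest norm in this block.
        kept = sorted(range(cols), key=lambda c: norms[c], reverse=True)[:keep]
        for r in range(block_rows):
            for c in kept:
                pruned[start + r][c] = block[r][c]
    return pruned


# Hypothetical usage: an 8x16 random matrix, 4 row blocks, 25% of columns kept.
random.seed(0)
w = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
w_pruned = block_column_prune(w, num_row_blocks=4, keep_ratio=0.25)
```

Because each block chooses its own surviving columns, the pattern is more flexible than pruning whole columns of the full matrix, while every block remains a small dense sub-matrix. That is the kind of trade-off between accuracy and hardware regularity the abstract describes; a practical implementation would use an array library and a training-time pruning procedure rather than this one-shot magnitude rule.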

research
12/12/2018

E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs

Recurrent Neural Networks (RNNs) are becoming increasingly important for...
research
03/14/2020

CoCoPIE: Making Mobile AI Sweet As PIE –Compression-Compilation Co-Design Goes a Long Way

Assuming hardware is the major constraint for enabling real-time mobile ...
research
06/03/2017

MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU

In this paper, we explore optimizations to run Recurrent Neural Network ...
research
04/05/2019

Measuring scheduling efficiency of RNNs for NLP applications

Recurrent neural networks (RNNs) have shown state of the art results for...
research
06/04/2018

Dynamically Hierarchy Revolution: DirNet for Compressing Recurrent Neural Network on Mobile Devices

Recurrent neural networks (RNNs) achieve cutting-edge performance on a v...
research
10/28/2020

INT8 Winograd Acceleration for Conv1D Equipped ASR Models Deployed on Mobile Devices

The intensive computation of Automatic Speech Recognition (ASR) models o...
research
05/11/2020

CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks

Recurrent neural networks (RNNs) have been widely adopted in temporal se...
