AntMan: Sparse Low-Rank Compression to Accelerate RNN Inference

10/02/2019
by Samyam Rajbhandari, et al.

Wide adoption of complex RNN-based models is hindered by their inference performance, cost, and memory requirements. To address this issue, we develop AntMan, which synergistically combines structured sparsity with low-rank decomposition to reduce the computation, size, and execution time of RNNs while attaining the desired accuracy. AntMan extends knowledge-distillation-based training to learn the compressed models efficiently. Our evaluation shows that AntMan offers up to 100x computation reduction with less than a one-point accuracy drop for language and machine reading comprehension models. It also shows that, for a given accuracy target, AntMan produces models 5x smaller than the state of the art. Lastly, we show that AntMan offers super-linear speed gains compared to the theoretical speedup, demonstrating its practical value on commodity hardware.
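The abstract's core idea, replacing the dense weight matrices inside an RNN cell with a combination of a structured-sparse term and a low-rank term, can be sketched roughly as follows. This is a minimal illustration of that general technique under assumptions of our own (a block-diagonal matrix as the structured-sparse component, and the module name, block count, and rank chosen for the example), not AntMan's actual implementation.

```python
import torch
import torch.nn as nn

class SparseLowRankLinear(nn.Module):
    """Hypothetical sketch: approximate a dense d_in x d_out weight with
    a block-diagonal (structured-sparse) term plus a low-rank term, in
    the spirit of the sparse low-rank compression described above."""

    def __init__(self, d_in, d_out, num_blocks=4, rank=8):
        super().__init__()
        assert d_in % num_blocks == 0 and d_out % num_blocks == 0
        self.num_blocks = num_blocks
        # Structured-sparse term: num_blocks small independent dense blocks,
        # equivalent to one large block-diagonal weight matrix.
        self.blocks = nn.ModuleList(
            [nn.Linear(d_in // num_blocks, d_out // num_blocks, bias=False)
             for _ in range(num_blocks)]
        )
        # Low-rank term: W_lr = V @ U with U: rank x d_in, V: d_out x rank.
        self.U = nn.Linear(d_in, rank, bias=False)
        self.V = nn.Linear(rank, d_out, bias=False)

    def forward(self, x):
        # Block-diagonal path: apply each block to its slice of the input.
        chunks = x.chunk(self.num_blocks, dim=-1)
        sparse_out = torch.cat(
            [blk(c) for blk, c in zip(self.blocks, chunks)], dim=-1)
        # Low-rank path costs O(rank * (d_in + d_out))
        # instead of the dense O(d_in * d_out).
        return sparse_out + self.V(self.U(x))
```

For d_in = d_out = 1024 with 16 blocks and rank 16, the multiply count drops from about 1.05M for a dense layer to 65.5K (blocks) + 32.8K (low rank), roughly a 10x reduction. The abstract's knowledge-distillation extension would then train such a compressed cell against the outputs of the original dense model rather than from scratch.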


Related research:

- Making Neural Machine Reading Comprehension Faster (03/29/2019)
- Neural Speed Reading via Skim-RNN (11/06/2017)
- Attention-Guided Answer Distillation for Machine Reading Comprehension (08/23/2018)
- Pushing the limits of RNN Compression (10/04/2019)
- HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks (01/20/2023)
- Lightweight Convolutional Approaches to Reading Comprehension on SQuAD (10/19/2018)
