A2P-MANN: Adaptive Attention Inference Hops Pruned Memory-Augmented Neural Networks

01/24/2021
by Mohsen Ahmadzadeh, et al.

In this work, to limit the number of attention inference hops required by memory-augmented neural networks (MANNs), we propose an online adaptive approach called A2P-MANN. By exploiting a small neural-network classifier, an adequate number of attention inference hops is determined for each input query. The technique eliminates a large number of unnecessary computations in extracting the correct answer. In addition, to further lower the computation of A2P-MANN, we suggest pruning the weights of the final fully-connected (FC) layers. To this end, two pruning approaches are developed: one with negligible accuracy loss and one with a controllable loss in final accuracy. The efficacy of the technique is assessed using the twenty question-answering (QA) tasks of the bAbI dataset. The analytical assessment reveals, on average, more than 42% fewer computations compared to the baseline MANN at the cost of less than 1% accuracy loss. In addition, when used along with the previously published zero-skipping technique, a computation-count reduction of up to 68% is achieved. Finally, when the proposed approach (without zero-skipping) is implemented on CPU and GPU platforms, a runtime reduction of up to 43% is achieved.
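To make the idea concrete, below is a minimal sketch (not the authors' code) of how a small classifier could prune attention hops in an end-to-end memory network. All module, class, and dimension names here are hypothetical illustrations: a tiny MLP looks at the query embedding, predicts how many hops (1 to max_hops) the query needs, and the attention loop stops after that many hops instead of always running the maximum.

```python
# Hedged sketch of adaptive hop pruning in a MANN; names are illustrative only.
import torch
import torch.nn as nn

class HopCountClassifier(nn.Module):
    """Small MLP that predicts how many attention hops a query needs."""
    def __init__(self, embed_dim, max_hops):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 32),
            nn.ReLU(),
            nn.Linear(32, max_hops),   # class k means "use k + 1 hops"
        )

    def forward(self, query_embedding):
        return self.net(query_embedding)

class AdaptiveHopMemNet(nn.Module):
    """End-to-end memory network whose hop count is chosen per query."""
    def __init__(self, vocab_size, embed_dim, max_hops):
        super().__init__()
        self.max_hops = max_hops
        self.embed_q = nn.Embedding(vocab_size, embed_dim)
        self.embed_m = nn.Embedding(vocab_size, embed_dim)
        self.embed_c = nn.Embedding(vocab_size, embed_dim)
        self.hop_classifier = HopCountClassifier(embed_dim, max_hops)
        self.answer_fc = nn.Linear(embed_dim, vocab_size)  # final FC layer

    def forward(self, story, query):
        # story: (batch, n_sentences, sent_len) int tokens
        # query: (batch, sent_len) int tokens
        u = self.embed_q(query).sum(dim=1)      # query embedding
        m = self.embed_m(story).sum(dim=2)      # memory keys
        c = self.embed_c(story).sum(dim=2)      # memory values

        # Predict how many hops this batch of queries needs (1 .. max_hops).
        n_hops = self.hop_classifier(u).argmax(dim=-1).max().item() + 1

        for _ in range(n_hops):                 # pruned attention-hop loop
            p = torch.softmax(torch.bmm(m, u.unsqueeze(2)).squeeze(2), dim=1)
            o = torch.bmm(p.unsqueeze(1), c).squeeze(1)
            u = u + o                           # hop update

        return self.answer_fc(u)               # answer logits
```

For the second ingredient of the paper, pruning the final FC layers, a standard magnitude-based approach such as `torch.nn.utils.prune.l1_unstructured(model.answer_fc, name="weight", amount=0.5)` would zero out the smallest weights; the specific pruning criteria and sparsity levels used in A2P-MANN are described in the full text, so the call above is only a generic illustration.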

