Deep neural networks (DNNs) require more computing power and storage than most mobile devices can provide, so mobile DNNs are commonly trained and run on remote servers. This limits performance, relies on network availability, and increases maintenance, which motivates the development of on-device inference.
In a dataflow architecture (DFA), data passes directly from one processing element to another, reducing the need for energy-consuming memory accesses. Layer-wise parallelization and recurrent paths can be implemented on DFAs through the use of fine-grained parallelism. DFAs have therefore been used to realize inference on mobile devices [2, 3, 4].
Memory-augmented neural networks (MANNs), which include memory networks, are recurrent neural networks (RNNs) with external memory to increase learning capacity. MANNs require both recursive and memory operations in each layer, making them difficult to parallelize on CPUs or GPUs.
We propose an FPGA-based accelerator for MANNs which uses a DFA to realize energy-efficient inference in natural language processing (NLP), a major application of MANNs. We also introduce a data-based method of maximum inner-product search (MIPS), called inference thresholding, together with an efficient index ordering. This speeds up the output layer, and hence inference, which is particularly important in tasks with many output classes, such as NLP.
Our implementation outperformed a GPU in terms of energy efficiency (FLOPS/kJ) by a factor of 126 on the bAbI dataset, and by a factor of 140 when inference thresholding was also used. The contributions of this paper are as follows:
A streaming-based inference architecture for MANNs, which we believe is the first.
Fast inference on this hardware using inference thresholding.
Implementation and validation of this approach on an FPGA.
II Memory-Augmented Neural Networks
MANNs, which are RNNs with additional external storage, are designed for question answering (QA) and other NLP tasks. A MANN consists of an external memory and a controller, and it learns how to read and write information from and to the memory. The memory operations of a MANN can be divided into three types: addressing, write, and read. Content-based addressing is usually employed in MANNs, and can be expressed as follows:
\[ w_t(i) = \frac{\exp\!\left(k_t^\top M_a(i)\right)}{\sum_{j=1}^{N} \exp\!\left(k_t^\top M_a(j)\right)} \tag{1} \]

where $w_t(i)$ is the read weight of the $i$-th memory element at time $t$, $M_a$ is the address memory, $N$ is the number of memory elements, and $k_t$ is a read key.
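Content-based addressing (Eq. 1) is a softmax over key-memory dot products. A minimal pure-Python sketch (the function and argument names are ours, not the accelerator's) is:

```python
import math

def content_address(key, address_mem):
    """Content-based addressing: softmax over key/memory dot products.

    key: read key k_t as a list of floats.
    address_mem: N memory rows M_a(i), each a list of floats.
    Returns the read weights w_t(i), which sum to 1.
    """
    scores = [sum(k * m for k, m in zip(key, row)) for row in address_mem]
    mx = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the exponentiation and division are applied element-wise, the same computation can be laid out as a sequential pipeline in hardware, as described in Section III.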
Each memory element stores an embedded sentence vector as follows:
\[ M(i) = \sum_{j} W_e\, \mathbf{1}_{x_i(j)} \tag{2} \]

where $W_e$ is a word-embedding weight, $x_i$ is an input sentence consisting of word indices, and $\mathbf{1}_{v}$ denotes the one-hot vector for word index $v$, so each term reads a single column of $W_e$. A memory read begins with the generation of a read key in the memory controller after the previous write. The read key $k_t$ at time $t$ is found as follows:
\[ k_t = \begin{cases} q, & t = 1 \\ o_{t-1}, & t > 1 \end{cases} \tag{3} \]

where $q$ is a question vector, and $o_t$ is an output vector from the controller, which is described as follows:
\[ o_t = f(k_t, r_t;\, W_c) \tag{4} \]

where $r_t$ is a read vector, $W_c$ is the weight of the controller, and $f$ is the recurrent controller function. The read vector for content-based addressing is generated by a content memory $M_c$ as follows:

\[ r_t = \sum_{i=1}^{N} w_t(i)\, M_c(i) \tag{5} \]
The predicted label $\hat{y}$ produced by inference is given by

\[ \hat{y} = \operatorname*{arg\,max}_{i}\; z_i, \qquad z_i = W_o(i)^\top o_t \tag{6} \]

where $W_o$ is the weight of the output layer, and $z_i$ is the logit with index $i$.
III Hardware Architecture
Fig. 1 shows the architecture and data flow of our accelerator, which consists of several modules. These receive inference data and the trained model ($W_e$, $W_c$, and $W_o$) from a host computer in the form of streams through a FIFO queue. The pre-trained model and the appropriate data are passed to each module.
Control signals from the host, embedded in the data, pass to the CONTROL module, which has an inference control component that signals the other modules. For example, in a QA task, context data in the form of sentences $x_i$, together with the question $q$, arrive in the input stream (green line in Fig. 1). When this stream is finished, the READ module generates a read key $k_t$, and the MEM module uses this key to read a vector from the context memory. Reads can be recursive because the READ module is composed of an RNN. After all read operations are complete, the OUTPUT module returns the answer to the question through the FIFO queue to the host.
The INPUT & WRITE modules receive input data from the host and write embedded vectors to the context and address memory in the MEM module. In an NLP task, a discrete and sparse sentence vector (e.g. a bag-of-words) is converted into a dense embedded vector by the embedding layer. If the input to a MANN consists of word indices, then the efficiency of embedding in the INPUT & WRITE module can be improved, as shown in Eq. 2. The embedding module in the INPUT & WRITE module only needs to read the columns of the embedding weight corresponding to the indices of the input words. This reduces the number of memory accesses needed to read the embedding weights, and the number of multiplications needed to calculate the embedding vector, which improves energy efficiency.
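To illustrate the saving, the sketch below (with our own illustrative names) compares the dense bag-of-words product with the equivalent column gather. Both produce the same embedded vector, but the gather touches only the columns of $W_e$ named by the word indices:

```python
def embed_dense(bow, W_e):
    """Dense path: multiply the full embedding matrix by a bag-of-words vector.

    bow: vocabulary-sized vector of word counts.
    W_e: embedding matrix stored as rows of length |vocabulary|.
    """
    dim = len(W_e)  # embedding dimension (number of rows of W_e)
    return [sum(W_e[d][v] * bow[v] for v in range(len(bow))) for d in range(dim)]

def embed_sparse(word_indices, W_e):
    """Index path: read only the columns of W_e named by the word indices."""
    dim = len(W_e)
    out = [0.0] * dim
    for v in word_indices:  # one column read per word, no multiplications
        for d in range(dim):
            out[d] += W_e[d][v]
    return out
```

For a sentence of $n$ words over a vocabulary of size $V$, the gather performs $n$ column reads instead of $V$ multiply-accumulates per output element.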
The MEM module consists of the address memory, which is content-addressable (Eq. 1), and the context memory, which generates a read vector by soft addressing based on the attention $w_t$ obtained from the address memory (Eq. 5). The address and context memory together store the embedded vector from the INPUT & WRITE module. Content-based addressing requires costly operations such as softmax, which involves exponentiation and division and cannot be directly parallelized on an FPGA. The MEM module is therefore implemented with element-wise sequential operations which can exploit fine-grained parallelism.
The READ module is an RNN, and the OUTPUT module is a fully connected neural network. The READ module generates the read key $k_t$, which is used to calculate the attention $w_t$ in the MEM module, and receives a read vector $r_t$ from the MEM module (Eqs. 3 and 5). The blue line in the READ module in Fig. 1 shows how a recurrent read path can be implemented efficiently.
The OUTPUT module predicts the label $\hat{y}$ from the read vector by multiplying the vector by the weight matrix $W_o$ of the output layer, as shown in Eq. 6. Matrix multiplication is implemented as a series of dot products because the hardware resources are insufficient to parallelize it directly. In the OUTPUT module the logit of each index is calculated sequentially to find the maximum logit; this accounts for a large share of the inference time.
IV Fast Inference Method
IV-A Inference Thresholding
A MANN implemented as a DFA can exploit fine-grained parallelism in each layer. However, in an NLP task the output dimension $D_o$ is much larger than the embedding dimension $D_e$, making it difficult to parallelize the operations of the output layer. Thus, when calculating a logit in the output layer, we must sequentially calculate the dot product of the input vector and the row of $W_o$ corresponding to each index in the OUTPUT module, as shown in Fig. 2-(a). Because the operation time of the output layer is $O(D_o D_e)$, the inference time increases with $D_o$.
We implement the output layer sequentially, but limit the computation required by introducing inference thresholding (Algo. 1). We approximate the MIPS by speculating that, given a sufficiently large logit $z_i$, the index $i$ will be the predicted label $\hat{y}$. If we can conjecture with sufficient confidence that the logit of index $i$ is the maximum, then we need not compare the remaining logits.
Inference thresholding was motivated by observing logit distributions in a trained model, in which the logits fit mixture models, as shown in Fig. 2-(b). To predict whether a logit $z_i$ is the maximum of all the logits, we consider two distributions: in one, $z_i$ is the maximum, and in the other it is not.
On this basis we can estimate conditional probability density functions (PDFs) of the logit $z_i$ for the training labels by kernel density estimation (Step 1 in Algo. 1). The PDFs of the inference dataset can be approximated by those obtained from the training dataset. By applying Bayes' theorem to the approximated PDFs, we can obtain the posteriors of the logits for the inference dataset as follows:

\[ P(\hat{y} = i \mid z_i) = \frac{p(z_i \mid \hat{y} = i)\, P(\hat{y} = i)}{p(z_i \mid \hat{y} = i)\, P(\hat{y} = i) + p(z_i \mid \hat{y} \neq i)\, P(\hat{y} \neq i)} \tag{7} \]
where $P(\hat{y} = i)$ is the probability that the index $i$ is a training label.
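A minimal sketch of this offline estimation follows. The helper names are ours, and the Gaussian kernel is one common choice for the kernel density estimate; the paper does not specify the kernel:

```python
import math

def gaussian_kde(samples, bandwidth=0.1):
    """Return a 1-D Gaussian kernel density estimate as a callable PDF."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    def pdf(z):
        return norm * sum(math.exp(-0.5 * ((z - s) / bandwidth) ** 2)
                          for s in samples)
    return pdf

def posterior_max(z, pdf_max, pdf_not, prior_max):
    """Bayes' theorem: P(index i is the maximum | logit value z).

    pdf_max / pdf_not: KDE PDFs of logits where i was / was not the label.
    prior_max: P(y = i) estimated from training-label frequencies.
    """
    num = pdf_max(z) * prior_max
    den = num + pdf_not(z) * (1.0 - prior_max)
    return num / den if den > 0 else 0.0
```

Sweeping `posterior_max` over a grid of logit values then yields the per-index threshold of Eq. 8.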
To apply the estimated probabilities to the inference process in the output layer, we compare each logit $z_i$ with a threshold $\theta_i$, which is the smallest logit value whose estimated posterior probability is larger than a constant:

\[ \theta_i = \min \{\, z : P(\hat{y} = i \mid z) > c \,\} \tag{8} \]

where $c$ is a thresholding constant (Step 2 in Algo. 1). This yields a speculative value for the label.
IV-B Efficient Index Order for Inference Thresholding
Inference thresholding is quicker if we order the logits so that those for which thresholding is most effective come first (Fig. 2). Thresholding a logit $z_i$ can be seen as determining whether it belongs to the class in which $i$ is the maximum. From this perspective, inference thresholding will be more effective for a logit with a long inter-class distance and a short intra-class distance. We therefore sort the indices into descending order of silhouette coefficient (Step 3 in Algo. 1).
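This ordering can be sketched as follows. The helper names are ours, and the one-dimensional silhouette computation is an illustrative choice; Algo. 1 may compute the coefficient differently:

```python
def silhouette_1d(pos, neg):
    """Mean silhouette coefficient of 1-D logit samples in two clusters.

    pos: training logits where index i was the true label.
    neg: training logits where it was not.
    """
    scores = []
    for cluster, other in ((pos, neg), (neg, pos)):
        for x in cluster:
            # a: mean intra-cluster distance; b: mean distance to the other cluster
            a = sum(abs(x - y) for y in cluster) / max(len(cluster) - 1, 1)
            b = sum(abs(x - y) for y in other) / len(other)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def thresholding_order(logits_by_index):
    """Sort indices by descending silhouette coefficient.

    logits_by_index: list of (pos, neg) training-logit samples per index.
    """
    coeffs = [silhouette_1d(pos, neg) for pos, neg in logits_by_index]
    return sorted(range(len(coeffs)), key=lambda i: -coeffs[i])
```

Indices whose two logit distributions are well separated come first, so the early exit in the thresholded search fires as soon as possible.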
The effect of inference thresholding and index ordering is depicted in Fig. 3. As the thresholding constant $c$ decreases, MIPS requires fewer comparisons but accuracy declines. Ordering improves both accuracy and speed.
V Experimental Results
We implemented the accelerator and measured the inference time and power consumption on an Intel Core i9-7900X CPU, an NVIDIA TITAN V GPU, and a Xilinx Virtex UltraScale VCU107 FPGA linked to the same CPU.
Time and power measurements were made for the 20 tasks of the bAbI QA dataset. Timings, which included transmission of the pre-trained model and inference data to the GPU and FPGA, were repeated 100 times; power measurements were made over five minutes. We ran the FPGA at 25, 50, 75, and 100 MHz to evaluate the effect of the host-FPGA interface. We set the thresholding constant to 1.0, which reduced accuracy by less than 0.1%.
Averaged timings and power measurements are listed in Table I. Running on the FPGA, the accelerator took less time at higher frequencies, as we would expect, but the improvement was not linear. Inference thresholding reduced timings by 6-18%, depending on frequency. The accelerator ran between 5.2 and 7.5 times faster than the GPU, and between 5.6 and 8.0 times faster than the CPU. The GPU used the most power, and the FPGA running at 25 MHz the least. The CPU used 1.7 times less energy than the GPU, and the FPGA used 74 times less, or 140 times less with inference thresholding.
Results on individual tasks are shown in Fig. 4, again normalized to the performance of the GPU. The FPGA implementation was the most energy-efficient across all tasks, and inference thresholding increased the margin.
Table I. Average measurement results, speedup, and energy efficiency of inference on the bAbI dataset ([a] normalized to the result on the GPU).

Configuration                & Time (s) & Power (W) & Speedup[a] & FLOPS/kJ[a]
CPU                          & 242.77   & 23.28     & 0.94       & 1.70
GPU                          & 226.90   & 45.36     & 1.00       & 1.00
FPGA, 25 MHz                 & 43.54    & 14.71     & 5.21       & 83.74
FPGA, 50 MHz                 & 34.95    & 17.53     & 6.49       & 109.06
FPGA, 75 MHz                 & 31.96    & 19.02     & 7.10       & 120.24
FPGA, 100 MHz                & 30.28    & 20.10     & 7.49       & 126.72
FPGA + thresholding, 25 MHz  & 35.36    & 17.36     & 6.42       & 107.61
FPGA + thresholding, 50 MHz  & 30.81    & 20.11     & 7.36       & 122.35
FPGA + thresholding, 75 MHz  & 29.18    & 20.18     & 7.78       & 135.87
FPGA + thresholding, 100 MHz & 28.53    & 20.53     & 7.95       & 139.75
Inference thresholding is more beneficial at low operating frequencies. As the frequency increases, inference time is dominated by the interface between the host and the FPGA. If this were not the case, we estimate that our approach would use 162 times less energy than the GPU.
Inference thresholding did not significantly reduce inference times on the CPU or GPU. On the CPU, the output layer represents only a small part of the computation; and the GPU can process the output layer in parallel.
VI Related Work
VI-A DNN Inference Accelerator
Hardware matrix multiplication can reduce inference times for CNN models [9, 2]. Several architectures [2, 4, 3] have been introduced for different types of RNNs, such as LSTMs and GRUs. These accelerators save energy, but are not readily extensible to the memory operations required by MANNs. A method of accelerating inference in MANNs has been studied, but it has not been implemented in hardware.
Vi-B Maximum Inner-Product Search
In applications with large search spaces, including NLP, MIPS takes a long time. Hence, approximations using hashing or clustering have been proposed. Some of these approaches, including sparse access memory and hierarchical memory networks, have also been used to accelerate memory reads and writes in MANNs. However, these techniques may be too slow to be used in the output layer of a DNN in resource-limited environments.
We believe that the DFA-based approach reported in this paper, and its implementation on an FPGA, represent the first attempt at energy-efficient inference specifically for MANNs. We also introduce a method of speculating about the inference result which avoids computations that are difficult to parallelize. This reduces computation time and saves energy at an extremely small cost in accuracy. We believe that this work shows how inference tasks such as QA may be performed on mobile devices. We also expect that our data-based MIPS will find applications in large-class inference.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) [2016M3A7B4911115, 2018R1A2B3001628], the Strategic Initiative for Microbiomes in Agriculture and Food (Ministry of Agriculture, Food and Rural Affairs) [918013-4], and the Brain Korea 21 Plus Project in 2018.
-  M. Horowitz, “1.1 Computing’s energy problem (and what we can do about it),” in ISSCC, 2014.
-  S. Han et al., “EIE: Efficient Inference Engine on Compressed Deep Neural Network,” in ISCA, 2016.
-  V. Rybalkin et al., “Hardware architecture of Bidirectional Long Short-Term Memory Neural Network for Optical Character Recognition,” in DATE, 2017.
-  S. Han et al., “ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA,” in FPGA. ACM, 2017.
-  S. Sukhbaatar et al., “End-To-End Memory Networks,” in NIPS, 2015.
-  J. Weston et al., “Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks,” arXiv preprint arXiv:1502.05698, 2015.
-  S. Li et al., “FPGA Acceleration of Recurrent Neural Network Based Language Model,” in FCCM, 2015.
-  P. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, 1987.
-  Y. Chen et al., “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks,” IEEE Journal of Solid-State Circuits, 2017.
-  S. Park et al., “Quantized Memory-Augmented Neural Networks,” in AAAI, 2018.
-  A. Shrivastava and P. Li, “Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS),” in NIPS, 2014.
-  A. Auvolat et al., “Clustering is Efficient for Approximate Maximum Inner Product Search,” arXiv preprint arXiv:1507.05910, 2015.
-  J. Rae et al., “Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes,” in NIPS, 2016.
-  S. Chandar et al., “Hierarchical Memory Networks,” arXiv preprint arXiv:1605.07427, 2016.