Quantized Memory-Augmented Neural Networks

by Seongsik Park, et al.

Memory-augmented neural networks (MANNs) are a class of neural network models equipped with external memory, such as neural Turing machines and memory networks. These networks outperform conventional recurrent neural networks (RNNs) at learning long-term dependencies, allowing them to solve intriguing AI tasks that would otherwise be hard to address. This paper concerns the problem of quantizing MANNs. Quantization is known to be effective when deploying deep models on embedded systems with limited resources. Furthermore, quantization can substantially reduce the energy consumption of the inference procedure. These benefits justify recent developments of quantized multilayer perceptrons, convolutional networks, and RNNs. However, no prior work has reported the successful quantization of MANNs. The in-depth analysis presented here reveals various challenges that do not appear in the quantization of the other networks. Unless these challenges are addressed properly, quantized MANNs would normally suffer from excessive quantization error, which leads to degraded performance. In this paper, we identify memory addressing (specifically, content-based addressing) as the main reason for the performance degradation and propose a robust quantization method for MANNs to address the challenge. In our experiments, we achieved a computation-energy gain of 22x with 8-bit fixed-point and binary quantization compared to the floating-point implementation. Measured on the bAbI dataset, the resulting model, named the quantized MANN (Q-MANN), improved the error rate by 46% and 30% with 8-bit fixed-point and binary quantization, respectively, compared to the MANN quantized using conventional techniques.
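To make the role of content-based addressing more concrete, the following is a minimal sketch, not the paper's method. It assumes the common formulation in which read weights are a softmax over cosine similarities between a key and each memory row, and it uses a simple round-and-saturate signed 8-bit fixed-point quantizer (1 sign bit, 3 integer bits, 4 fractional bits). The function names, bit split, and example values are all hypothetical choices made for illustration.

```python
import math

def quantize_fixed_point(x, int_bits=3, frac_bits=4):
    """Round x to a signed fixed-point grid and saturate at the range limits.

    Assumed format: 1 sign bit + int_bits + frac_bits (8 bits by default).
    """
    scale = 2 ** frac_bits
    max_code = 2 ** (int_bits + frac_bits) - 1   # largest integer code
    min_code = -(2 ** (int_bits + frac_bits))    # smallest integer code
    code = round(x * scale)
    code = max(min_code, min(max_code, code))    # saturation
    return code / scale

def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity with a small eps to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def content_addressing(key, memory, quantize=None):
    """Read weights over memory rows; optionally quantize every intermediate."""
    q = quantize if quantize is not None else (lambda x: x)
    key_q = [q(k) for k in key]
    sims = [q(cosine_similarity(key_q, [q(m) for m in row])) for row in memory]
    return [q(w) for w in softmax(sims)]

# Hypothetical toy memory with three rows and a key equal to row 0.
memory = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.9, 0.1, 0.4]]
key = [1.0, 0.0, 0.5]

w_float = content_addressing(key, memory)
w_fixed = content_addressing(key, memory, quantize=quantize_fixed_point)
print("float weights:", w_float)
print("fixed weights:", w_fixed)
```

Comparing the two printed weight vectors gives a rough feel for how rounding in the similarity and softmax stages can distort the read weights. In this sketch the quantized weights are not renormalized, so they need not sum to exactly one.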



