FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA

02/09/2020
by Shehzeen Hussain, et al.

Autoregressive convolutional neural networks (CNNs) have been widely exploited for sequence generation tasks such as audio synthesis, language modeling, and neural machine translation. WaveNet is a deep autoregressive CNN composed of several stacked layers of dilated convolution that is used for sequence generation. While WaveNet produces state-of-the-art audio generation results, the naive inference implementation is quite slow; it takes a few minutes to generate just one second of audio on a high-end GPU. In this work, we develop FastWave, the first accelerator platform for autoregressive convolutional neural networks, and address the associated design challenges. We design the Fast-Wavenet inference model in Vivado HLS and perform a wide range of optimizations including fixed-point implementation, array partitioning, and pipelining. Our model uses a fully parameterized parallel architecture for fast matrix-vector multiplication that enables per-layer customized latency fine-tuning for further throughput improvement. Our experiments comparatively assess the trade-off between throughput and resource utilization for various optimizations. Our best WaveNet design on the Xilinx XCVU13P FPGA, which uses only on-chip memory, achieves 66x faster generation speed compared to a CPU implementation and 11x faster generation speed than a GPU implementation.
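The abstract mentions a fixed-point, array-partitioned, pipelined matrix-vector multiplication kernel written in Vivado HLS. The sketch below is not the authors' kernel; it is a minimal HLS C++ illustration of those optimizations, with assumed layer dimensions, fixed-point format, and unroll factor.

```cpp
// Minimal sketch (assumed parameters, not the FastWave source): a fixed-point
// matrix-vector multiply in Vivado HLS style with array partitioning and
// pipelining, the kind of kernel the abstract describes for each WaveNet layer.
#include <ap_fixed.h>

typedef ap_fixed<16, 6> fixed_t;   // assumed 16-bit fixed-point format

#define ROWS 128                   // assumed layer output width
#define COLS 128                   // assumed layer input width
#define UF   16                    // assumed per-layer unroll factor

void matvec(const fixed_t W[ROWS][COLS],
            const fixed_t x[COLS],
            fixed_t y[ROWS]) {
    // Partition weights and input along the column dimension so UF
    // multiply-accumulates can be fed per cycle from on-chip memory.
#pragma HLS ARRAY_PARTITION variable=W cyclic factor=UF dim=2
#pragma HLS ARRAY_PARTITION variable=x cyclic factor=UF dim=1

row_loop:
    for (int r = 0; r < ROWS; ++r) {
        fixed_t acc = 0;
    col_loop:
        for (int c = 0; c < COLS; ++c) {
            // Pipeline the partially unrolled inner loop; UF would be tuned
            // per layer to trade latency against resource utilization.
#pragma HLS PIPELINE
#pragma HLS UNROLL factor=UF
            acc += W[r][c] * x[c];
        }
        y[r] = acc;
    }
}
```

In this sketch, the unroll factor plays the role of the per-layer latency knob described in the abstract: a larger factor lowers latency for that layer at the cost of more DSPs and partitioned memory banks.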
