
FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA

02/09/2020
by Shehzeen Hussain, et al.
University of California, San Diego

Autoregressive convolutional neural networks (CNNs) have been widely exploited for sequence generation tasks such as audio synthesis, language modeling and neural machine translation. WaveNet is a deep autoregressive CNN composed of several stacked layers of dilated convolutions, used for sequence generation. While WaveNet produces state-of-the-art audio generation results, the naive inference implementation is quite slow; it takes a few minutes to generate just one second of audio on a high-end GPU. In this work, we develop FastWave, the first accelerator platform for autoregressive convolutional neural networks, and address the associated design challenges. We design the Fast-Wavenet inference model in Vivado HLS and perform a wide range of optimizations including fixed-point implementation, array partitioning and pipelining. Our model uses a fully parameterized parallel architecture for fast matrix-vector multiplication that enables per-layer customized latency fine-tuning for further throughput improvement. Our experiments comparatively assess the trade-off between throughput and resource utilization for various optimizations. Our best WaveNet design on the Xilinx XCVU13P FPGA, which uses only on-chip memory, achieves 66x faster generation speed than the CPU implementation and 11x faster generation speed than the GPU implementation.
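The abstract names fixed-point arithmetic, array partitioning, loop pipelining, and a parameterized parallelism factor for matrix-vector multiplication as the key HLS optimizations. The sketch below is a minimal, hypothetical illustration of that pattern in Vivado HLS C++; the kernel name, data widths, layer dimensions, and parallelism factor are assumptions for the example, not the values used in FastWave.

#include "ap_fixed.h"

// Assumed 16-bit fixed-point format; the paper's actual bit widths may differ.
typedef ap_fixed<16, 6>  data_t;
typedef ap_fixed<32, 12> acc_t;   // wider accumulator to limit overflow

#define N_ROWS 128   // assumed layer output size
#define N_COLS 128   // assumed layer input size

// Matrix-vector multiply for one layer: y = W * x.
void mvm(const data_t W[N_ROWS][N_COLS],
         const data_t x[N_COLS],
         data_t y[N_ROWS]) {
    // Partition W and x along the column dimension so that 8
    // multiply-accumulates can be scheduled in parallel each cycle.
#pragma HLS ARRAY_PARTITION variable=W cyclic factor=8 dim=2
#pragma HLS ARRAY_PARTITION variable=x cyclic factor=8 dim=1

row_loop:
    for (int i = 0; i < N_ROWS; i++) {
        acc_t acc = 0;
    col_loop:
        for (int j = 0; j < N_COLS; j++) {
            // Pipeline the inner loop and partially unroll it by 8; raising
            // the unroll/partition factor trades FPGA resources for latency,
            // which is the kind of per-layer tuning the abstract describes.
#pragma HLS PIPELINE II=1
#pragma HLS UNROLL factor=8
            acc += W[i][j] * x[j];
        }
        y[i] = (data_t)acc;
    }
}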


Related research

07/09/2021 · WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs
10/21/2019 · Automatic Generation of Multi-precision Multi-arithmetic CNN Accelerators for FPGAs
04/20/2017 · Fast Generation for Convolutional Autoregressive Models
05/07/2017 · A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA
09/17/2019 · A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks
04/27/2020 · A scalable and efficient convolutional neural network accelerator using HLS for a System on Chip design
04/16/2015 · Caffe con Troll: Shallow Ideas to Speed Up Deep Learning