
A Highly Parallel FPGA Implementation of Sparse Neural Network Training

by Sourya Dey, et al.
University of Southern California

We demonstrate an FPGA implementation of a parallel and reconfigurable architecture for sparse neural networks, capable of on-chip training and inference. The network connectivity uses pre-determined, structured sparsity to significantly lower memory and computational requirements. The architecture uses a notion of edge-processing and is highly pipelined and parallelized, decreasing training times. Moreover, the device can be reconfigured to trade off resource utilization with training time to fit networks and datasets of varying sizes. The overall effect is to reduce network complexity by more than 8x while maintaining high fidelity of inference results. This complexity reduction enables significantly greater exploration of network hyperparameters and structure. As proof of concept, we show implementation results on an Artix-7 FPGA.
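The idea of pre-determined, structured sparsity can be illustrated with a small sketch: the connectivity of each layer is fixed before training, with every output neuron receiving the same fixed fan-in, so storage and multiply counts shrink by the ratio of dense to sparse connections. This is a hypothetical illustration in NumPy (the names `structured_mask`, `fan_in`, and the layer sizes are ours, not from the paper), not the authors' hardware implementation:

```python
import numpy as np

def structured_mask(n_in, n_out, fan_in, seed=0):
    """Fixed connectivity chosen before training: every output neuron
    is wired to exactly `fan_in` of the `n_in` inputs."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_out, n_in), dtype=bool)
    for j in range(n_out):
        # pick fan_in distinct input neurons for output neuron j
        mask[j, rng.choice(n_in, size=fan_in, replace=False)] = True
    return mask

n_in, n_out = 64, 32
fan_in = n_in // 8          # keep 1 in 8 connections -> 8x fewer weights
mask = structured_mask(n_in, n_out, fan_in)

W = np.random.randn(n_out, n_in) * mask   # only masked weights are stored/used
x = np.random.randn(n_in)
y = W @ x                                  # forward pass through the sparse layer

dense_mults = n_in * n_out                 # a dense layer's multiply count
sparse_mults = int(mask.sum())             # n_out * fan_in
print(dense_mults / sparse_mults)          # complexity reduction factor: 8.0
```

Because the mask is fixed and regular (identical fan-in per neuron), a hardware realization can allocate a constant number of multiply-accumulate units per edge and pipeline them, which is what makes this form of sparsity friendlier to an FPGA than unstructured pruning.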

