Mixed-TD: Efficient Neural Network Accelerator with Layer-Specific Tensor Decomposition

06/08/2023
by Zhewen Yu, et al.

Neural network designs are highly diverse, from VGG-style to ResNet-style, and from Convolutional Neural Networks to Transformers. Towards efficient acceleration, many works have adopted a dataflow-based, inter-layer pipelined architecture with hardware customised to each layer, achieving ultra-high throughput and low latency. Deploying neural networks on such dataflow accelerators is usually constrained by the available on-chip memory, since it is desirable to preload the network weights on-chip to maximise system performance. To address this, networks are typically compressed before deployment through methods such as pruning, quantization and tensor decomposition. In this paper, a framework for mapping CNNs onto FPGAs based on a novel tensor decomposition method called Mixed-TD is proposed. The proposed method applies layer-specific Singular Value Decomposition (SVD) and Canonical Polyadic Decomposition (CPD) in a mixed manner, achieving 1.73x to 10.29x gains in throughput per DSP compared to state-of-the-art CNNs. Our work is open-sourced: https://github.com/Yu-Zhewen/Mixed-TD
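
To give a concrete feel for the two decompositions the abstract mentions, the sketch below approximates a convolution weight tensor either by truncated SVD of its matricisation or by a rank-R CPD, and picks whichever reconstructs the layer better at a fixed rank. This is a minimal illustration only, not the authors' implementation: the function names, the rank choice, and the error-based selection rule are assumptions for demonstration, and a real framework such as Mixed-TD would also weigh the hardware (DSP/BRAM) cost of each candidate. It assumes NumPy and the tensorly library are installed.

```python
# Illustrative sketch (not the Mixed-TD implementation): compare an SVD-based
# and a CPD-based low-rank approximation of a 4-D convolution weight tensor
# of shape (C_out, C_in, K, K), then keep the better one per layer.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

def svd_compress(weight, rank):
    """Truncated SVD on the (C_out) x (C_in*K*K) matricisation of the weight."""
    c_out = weight.shape[0]
    mat = weight.reshape(c_out, -1)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return approx.reshape(weight.shape)

def cpd_compress(weight, rank):
    """Rank-R Canonical Polyadic Decomposition of the full 4-D weight tensor."""
    cp_tensor = parafac(tl.tensor(weight), rank=rank, init='random', random_state=0)
    return tl.cp_to_tensor(cp_tensor)

# Toy per-layer selection: pick the decomposition with the lower relative
# reconstruction error at a fixed rank (hypothetical criterion for illustration).
weight = np.random.randn(64, 32, 3, 3).astype(np.float32)
rank = 16
candidates = {
    "SVD": svd_compress(weight, rank),
    "CPD": cpd_compress(weight, rank),
}
for name, approx in candidates.items():
    err = np.linalg.norm(weight - approx) / np.linalg.norm(weight)
    print(f"{name}: relative reconstruction error = {err:.3f}")
```

Running the sketch prints one error value per decomposition; a layer-specific scheme in the spirit of Mixed-TD would make this choice independently for every layer rather than applying a single decomposition network-wide.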
