Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs

06/30/2020
by Ang Li, et al.

Despite promising tremendous speedups over conventional deep neural networks, the performance advantage of binarized neural networks (BNNs) has rarely been materialized on general-purpose processors such as CPUs and GPUs. In fact, being unable to exploit bit-level parallelism with their word-based architectures, GPUs have been criticized for extremely low utilization (under 1%) when executing BNNs. Consequently, the latest tensor cores in NVIDIA Turing GPUs have begun to experimentally support bit computation. In this work, we look into this brand-new bit-computation capability and characterize its unique features. We show that the stride of memory access can significantly affect the delivered performance, and that a data-format co-design is highly desirable for the tensor cores to outperform existing software solutions that do not use them. We realize the tensor-core-accelerated BNN design, in particular the major functions of the fully-connected and convolution layers: bit matrix multiplication and bit convolution. Evaluations on two NVIDIA Turing GPUs show that, with ResNet-18, our BTC-BNN design processes ImageNet at a rate of 5.6K images per second, 77% faster than the state of the art. Our approach is released at https://github.com/pnnl/TCBNN.
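For context on how this bit-computation capability is exposed to programmers, the sketch below uses CUDA's experimental warp-level WMMA interface: single-bit precision::b1 fragments of shape 8x8x128 from the nvcuda::wmma::experimental namespace, and bmma_sync, whose default behavior on Turing is XOR followed by population count. This is a minimal illustration of the programming model under stated assumptions (CUDA 10.0+, sm_75, contiguous 128-bit-aligned input rows), not the paper's tuned BTC-BNN kernels; the kernel name is ours.

```cuda
// Minimal sketch: one warp multiplies an 8x128-bit tile of A by a
// 128x8-bit tile of B on a Turing bit tensor core.
#include <mma.h>
#include <cstdint>

using namespace nvcuda;
using namespace nvcuda::wmma::experimental;  // precision::b1, bit ops

__global__ void bit_mma_tile(const uint32_t* A,  // 8x128 bits, row-major
                             const uint32_t* B,  // 128x8 bits, col-major
                             int* C) {           // 8x8 counts, row-major
  wmma::fragment<wmma::matrix_a, 8, 8, 128, precision::b1, wmma::row_major> a;
  wmma::fragment<wmma::matrix_b, 8, 8, 128, precision::b1, wmma::col_major> b;
  wmma::fragment<wmma::accumulator, 8, 8, 128, int> c;

  wmma::fill_fragment(c, 0);
  // For b1 fragments the leading dimension is given in bits and must be
  // a multiple of 128.
  wmma::load_matrix_sync(a, A, 128);
  wmma::load_matrix_sync(b, B, 128);
  // Default bit op is XOR and default accumulate op is POPC, so afterwards
  // c[i][j] = popcount(row_i(A) XOR col_j(B)).
  wmma::bmma_sync(c, a, b, c);
  wmma::store_matrix_sync(C, c, 8, wmma::mem_row_major);
}
```

With the usual encoding of binarized {+1, -1} values as single bits, the true dot product is recovered from the accumulated popcount as dot = K - 2 * c[i][j] (here K = 128): bit positions where the operands agree contribute +1 and disagreements contribute -1. This XOR/popcount identity is what makes the bit tensor cores usable for BNN matrix multiplication and convolution.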
