BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models

06/29/2023
by   Phuoc-Hoan Charles Le, et al.
0

With the increasing popularity and the increasing size of vision transformers (ViTs), there has been an increasing interest in making them more efficient and less computationally costly for deployment on edge devices with limited computing resources. Binarization can be used to help reduce the size of ViT models and their computational cost significantly, using popcount operations when the weights and the activations are in binary. However, ViTs suffer a larger performance drop when directly applying convolutional neural network (CNN) binarization methods or existing binarization methods to binarize ViTs compared to CNNs on datasets with a large number of classes such as ImageNet-1k. With extensive analysis, we find that binary vanilla ViTs such as DeiT miss out on a lot of key architectural properties that CNNs have that allow binary CNNs to have much higher representational capability than binary vanilla ViT. Therefore, we propose BinaryViT, in which inspired by the CNN architecture, we include operations from the CNN architecture into a pure ViT architecture to enrich the representational capability of a binary ViT without introducing convolutions. These include an average pooling layer instead of a token pooling layer, a block that contains multiple average pooling branches, an affine transformation right before the addition of each main residual connection, and a pyramid structure. Experimental results on the ImageNet-1k dataset show the effectiveness of these operations that allow a binary pure ViT model to be competitive with previous state-of-the-art (SOTA) binary CNN models.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/29/2022

BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons

This paper studies the problem of designing compact binary architectures...
research
08/08/2020

Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation

Binary Convolutional Neural Networks (CNNs) can significantly reduce the...
research
02/27/2018

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

We propose a network for Congested Scene Recognition called CSRNet to pr...
research
08/08/2020

NASB: Neural Architecture Search for Binary Convolutional Neural Networks

Binary Convolutional Neural Networks (CNNs) have significantly reduced t...
research
12/22/2020

FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations

Binary neural networks (BNNs) have 1-bit weights and activations. Such n...
research
06/28/2017

Toward Computation and Memory Efficient Neural Network Acoustic Models with Binary Weights and Activations

Neural network acoustic models have significantly advanced state of the ...
research
07/09/2018

Pooling Pyramid Network for Object Detection

We'd like to share a simple tweak of Single Shot Multibox Detector (SSD)...

Please sign up or login with your details

Forgot password? Click here to reset