Stochastic Layers in Vision Transformers

12/30/2021
by Nikola Popovic, et al.

We introduce fully stochastic layers into vision transformers without causing any severe drop in performance. The added stochasticity improves the robustness of visual features and strengthens privacy. Specifically, linear layers with fully stochastic parameters transform the feature activations of each multilayer perceptron (MLP), both during training and inference. Because each stochastic linear map is shared by all tokens passing through the MLP, it preserves the topological structure formed by that set of tokens. This encourages the recognition task to rely on the topological structure of the tokens rather than on their values, which in turn yields the desired robustness and privacy of the visual features. We evaluate our features on three applications, namely adversarial robustness, network calibration, and feature privacy, and obtain strong results on all three. We further present an experimental setup for federated and transfer learning, where vision transformers with stochastic layers are again shown to be well behaved. Our source code will be made publicly available.
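The key mechanism in the abstract is a linear layer whose weights are resampled on every forward pass, with the same random map applied to all tokens so that their relative (topological) structure survives while their absolute values are randomized. The following is a minimal, hypothetical sketch of that idea in plain Python; the function name, dimensions, and initialization scale are assumptions for illustration, not the authors' implementation.

```python
import random

def stochastic_linear(tokens, out_dim, rng):
    """Apply one shared, freshly sampled random linear map W to every
    token (a list of feature vectors). W is resampled on each call,
    at training and inference alike, as the paper proposes.
    Hypothetical sketch; not the authors' code."""
    in_dim = len(tokens[0])
    # Sample W ~ N(0, 1/in_dim) once per forward pass, shared by all tokens.
    W = [[rng.gauss(0.0, 1.0 / in_dim ** 0.5) for _ in range(in_dim)]
         for _ in range(out_dim)]
    # Matrix-vector product W @ token for each token.
    return [[sum(w * x for w, x in zip(row, tok)) for row in W]
            for tok in tokens]

# Because W is shared across tokens within a pass, relations between
# tokens are preserved: identical tokens stay identical, and scaled
# tokens stay scaled, even though W changes between passes.
tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
out = stochastic_linear(tokens, 3, random.Random(0))
```

Downstream layers can therefore learn from the structure among tokens rather than their raw values, which is what the abstract credits for the robustness and privacy gains.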


