RGB no more: Minimally-decoded JPEG Vision Transformers

11/29/2022
by Jeongsoo Park, et al.

Most neural networks for computer vision are designed to run inference on RGB images. However, these images are usually stored on disk as JPEG, so decoding them imposes an unavoidable overhead on RGB networks. Our work instead trains Vision Transformers (ViT) directly on the encoded features of JPEG, which avoids most of the decoding overhead and accelerates data loading. Existing works have studied this setting, but they focus on CNNs. Because of how these encoded features are structured, CNNs require heavy architectural modification to accept such data. Here, we show that this is not the case for ViTs. In addition, we tackle data augmentation directly on these encoded features, which, to our knowledge, has not been explored in depth for training in this setting. With these two improvements (ViT and data augmentation), we show that our ViT-Ti model achieves up to 39.2% faster training and 17.9% faster inference with no accuracy loss compared to the RGB counterpart.
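To make "encoded features" concrete, here is a minimal sketch of the blockwise structure that JPEG uses. It is not the authors' pipeline. It assumes a single channel whose height and width are multiples of 8, and it skips quantization, chroma subsampling, and entropy coding. A real loader would read the stored coefficients from the file rather than recompute them from pixels. The function names and the `blocks_per_patch` parameter are hypothetical. The sketch illustrates why a grid of 8x8 coefficient blocks lines up naturally with ViT patches: each token is a flat group of whole blocks.

```python
import math

BLOCK = 8  # JPEG applies the DCT to 8x8 blocks of each channel


def dct_1d(values):
    """Orthonormal DCT-II of a short sequence."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(
            values[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform rows, then columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # back to [vertical][horizontal]


def blockwise_dct(channel):
    """Turn an HxW channel into a grid of 8x8 coefficient blocks."""
    height, width = len(channel), len(channel[0])
    grid = []
    for top in range(0, height, BLOCK):
        grid_row = []
        for left in range(0, width, BLOCK):
            # JPEG level shift: center 8-bit samples around zero.
            block = [
                [channel[top + y][left + x] - 128 for x in range(BLOCK)]
                for y in range(BLOCK)
            ]
            grid_row.append(dct_2d(block))
        grid.append(grid_row)
    return grid


def to_tokens(grid, blocks_per_patch=2):
    """Flatten each blocks_per_patch x blocks_per_patch group into one token."""
    tokens = []
    for gy in range(0, len(grid), blocks_per_patch):
        for gx in range(0, len(grid[0]), blocks_per_patch):
            token = []
            for dy in range(blocks_per_patch):
                for dx in range(blocks_per_patch):
                    for coeff_row in grid[gy + dy][gx + dx]:
                        token.extend(coeff_row)
            tokens.append(token)
    return tokens


# A flat 32x32 channel: 4x4 blocks, grouped 2x2 into 4 tokens of 256 values.
channel = [[136] * 32 for _ in range(32)]
tokens = to_tokens(blockwise_dct(channel))
print(len(tokens), len(tokens[0]))  # 4 256
```

With `blocks_per_patch=2`, each token covers a 16x16 pixel area, the same footprint as a common ViT patch size. For a flat input, only the first (DC) coefficient of each block is nonzero.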


