ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

01/02/2023
by   Sanghyun Woo, et al.

Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.
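The abstract references the GRN layer without defining it. Based on the paper's description (aggregate a global per-channel response, normalize it across channels to induce feature competition, then recalibrate the input), a minimal PyTorch-style sketch might look as follows; the class name, channels-last layout, zero initialization, and epsilon value are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Sketch of a Global Response Normalization layer for channels-last
    features of shape (N, H, W, C), following the paper's description."""

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        # Learnable per-channel scale and bias, initialized to zero so the
        # layer starts out as an identity mapping (assumed default).
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.eps = eps

    def forward(self, x):
        # Global aggregation: L2 norm of each channel over the spatial dims.
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)      # (N, 1, 1, C)
        # Divisive normalization: each channel's response relative to the
        # mean response, which creates competition between channels.
        nx = gx / (gx.mean(dim=-1, keepdim=True) + self.eps)   # (N, 1, 1, C)
        # Recalibrate the input with the normalized response; keep a
        # residual path so information is not destroyed early in training.
        return self.gamma * (x * nx) + self.beta + x
```

In the ConvNeXt V2 block this kind of layer would sit inside the MLP portion of each block, replacing mechanisms such as LayerScale; the exact placement above is inferred from the paper rather than shown in this abstract.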

Related research

06/22/2021 · Unsupervised Object-Level Representation Learning from Scene Images
Contrastive self-supervised learning has largely narrowed the gap to sup...

01/21/2021 · Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning
We present a plug-in replacement for batch normalization (BN) called exp...

05/05/2023 · A vector quantized masked autoencoder for audiovisual speech emotion recognition
While fully-supervised models have been shown to be effective for audiov...

06/22/2023 · Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies
Automatic singing voice understanding tasks, such as singer identificati...

08/25/2017 · Multi-task Self-Supervised Visual Learning
We investigate methods for combining multiple self-supervised tasks--i.e...

12/07/2020 · Self-Supervision Closes the Gap Between Weak and Strong Supervision in Histology
One of the biggest challenges for applying machine learning to histopath...

03/22/2022 · Self-supervision through Random Segments with Autoregressive Coding (RandSAC)
Inspired by the success of self-supervised autoregressive representation...
