Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

03/27/2023
by Donggyun Kim, et al.

Dense prediction tasks are a fundamental class of problems in computer vision. As supervised methods suffer from high pixel-wise labeling costs, a few-shot learning solution that can learn any dense task from a few labeled images is desirable. Yet, current few-shot learning methods target a restricted set of tasks such as semantic segmentation, presumably due to the challenge of designing a general and unified model that can flexibly and efficiently adapt to arbitrary tasks of unseen semantics. We propose Visual Token Matching (VTM), a universal few-shot learner for arbitrary dense prediction tasks. It employs non-parametric matching on patch-level embedded tokens of images and labels that encapsulates all tasks. VTM also flexibly adapts to any task with a tiny amount of task-specific parameters that modulate the matching algorithm. We implement VTM as a powerful hierarchical encoder-decoder architecture involving ViT backbones, where token matching is performed at multiple feature hierarchies. We evaluate VTM on a challenging variant of the Taskonomy dataset and observe that it robustly few-shot learns various unseen dense prediction tasks. Surprisingly, it is competitive with fully supervised baselines using only 10 labeled examples of novel tasks (0.004% of full supervision) and sometimes outperforms them using 0.1% of full supervision. Code is available at https://github.com/GitGyun/visual_token_matching.
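The core non-parametric matching step described in the abstract can be sketched as scaled dot-product attention: each embedded query-image token aggregates the support *label* tokens, weighted by its similarity to the support *image* tokens. The sketch below is a minimal illustration under these assumptions; the function and variable names are illustrative and not taken from the paper's codebase, which also adds task-specific modulation parameters and repeats this matching at multiple feature hierarchies.

```python
import numpy as np

def token_matching(query_tokens, support_img_tokens, support_lbl_tokens,
                   temperature=None):
    """Non-parametric token matching (illustrative sketch).

    Each query token predicts a label token as a similarity-weighted
    average of the support label tokens.

    Shapes:
        query_tokens:       (Nq, d)     embedded tokens of the query image
        support_img_tokens: (Ns, d)     embedded tokens of support images
        support_lbl_tokens: (Ns, d_lbl) embedded tokens of support labels
    Returns:
        (Nq, d_lbl) predicted label tokens for the query image.
    """
    d = query_tokens.shape[-1]
    if temperature is None:
        temperature = np.sqrt(d)  # standard attention scaling

    # Scaled dot-product similarity between query and support image tokens.
    sim = query_tokens @ support_img_tokens.T / temperature  # (Nq, Ns)

    # Softmax over the support dimension -> matching weights (rows sum to 1).
    w = np.exp(sim - sim.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)

    # Aggregate support label tokens into predicted query label tokens.
    return w @ support_lbl_tokens
```

Because the weights form a convex combination, each predicted label token lies inside the span of the support label tokens; adaptation to a new task comes from the support set itself rather than from retraining the matcher.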


