
Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

03/27/2023
by Donggyun Kim, et al.
KAIST Department of Mathematical Sciences
Microsoft

Dense prediction tasks are a fundamental class of problems in computer vision. As supervised methods suffer from high pixel-wise labeling cost, a few-shot learning solution that can learn any dense task from a few labeled images is desired. Yet, current few-shot learning methods target a restricted set of tasks such as semantic segmentation, presumably due to challenges in designing a general and unified model that is able to flexibly and efficiently adapt to arbitrary tasks of unseen semantics. We propose Visual Token Matching (VTM), a universal few-shot learner for arbitrary dense prediction tasks. It employs non-parametric matching on patch-level embedded tokens of images and labels, a formulation that encapsulates all tasks. VTM also adapts flexibly to any task with a tiny number of task-specific parameters that modulate the matching algorithm. We implement VTM as a powerful hierarchical encoder-decoder architecture involving ViT backbones, where token matching is performed at multiple feature hierarchies. We evaluate VTM on a challenging variant of the Taskonomy dataset and observe that it robustly few-shot learns various unseen dense prediction tasks. Surprisingly, it is competitive with fully supervised baselines using only 10 labeled examples of novel tasks (0.004% of full supervision) and sometimes outperforms them using 0.1% of full supervision. Code is available at https://github.com/GitGyun/visual_token_matching.
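
The non-parametric matching described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that similarity between patch tokens is a scaled dot product followed by a softmax, and the function name `match_tokens` and all array shapes are hypothetical. Each query image token is compared against every support image token, and the resulting weights are used to average the support label tokens into a predicted label token for that query patch.

```python
import numpy as np


def match_tokens(query_img_tokens, support_img_tokens, support_lbl_tokens):
    """Predict query label tokens as a similarity-weighted average of support label tokens.

    Assumption (not from the source): similarity is a scaled dot product
    normalized with a softmax over the support tokens.
    Shapes: query (M, d), support images (N, d), support labels (N, d_label).
    """
    d = query_img_tokens.shape[-1]
    scores = query_img_tokens @ support_img_tokens.T / np.sqrt(d)
    # Subtract the row-wise maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ support_lbl_tokens, weights


# Toy data: 4 query patches, 6 support patches, 8-dimensional tokens.
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 8))
support_img = rng.normal(size=(6, 8))
support_lbl = rng.normal(size=(6, 8))
pred, weights = match_tokens(query, support_img, support_lbl)

# Sanity case: a query token identical to one well-separated support token
# should recover (almost exactly) that support token's label token.
sharp_support = 50.0 * np.eye(3)
sharp_labels = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
sharp_pred, _ = match_tokens(sharp_support[[1]], sharp_support, sharp_labels)
```

Because the matching step itself has no learned parameters in this sketch, the same routine applies to any task whose labels can be embedded as tokens. The task-specific parameters and the multi-level hierarchy mentioned in the abstract are omitted here.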


09/11/2017

One-Shot Learning for Semantic Segmentation

Low-shot learning methods for image classification support learning from...
03/12/2019

Dense Classification and Implanting for Few-Shot Learning

Training deep neural networks from few examples is a highly challenging ...
08/19/2023

UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning

Animal visual perception is an important technique for automatically mon...
11/28/2022

Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization

A hallmark of the deep learning era for computer vision is the successfu...
03/14/2022

Self-Promoted Supervision for Few-Shot Transformer

The few-shot learning ability of vision transformers (ViTs) is rarely in...
12/16/2020

CompositeTasking: Understanding Images by Spatial Composition of Tasks

We define the concept of CompositeTasking as the fusion of multiple, spa...