Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks

09/28/2022
by Zhiyang Chen, et al.

Visual tasks vary widely in their output formats and in the content they are concerned with, so it is hard to process them with one identical structure. One main obstacle lies in the high-dimensional outputs of object-level visual tasks. In this paper, we propose an object-centric vision framework, Obj2Seq. Obj2Seq takes objects as basic units and regards most object-level visual tasks as sequence generation problems over objects. These visual tasks can therefore be decoupled into two steps: first recognize objects of the given categories, and then generate a sequence for each of these objects. The definition of the output sequences varies across tasks, and the model is supervised by matching these sequences with ground-truth targets. Obj2Seq can flexibly determine input categories to satisfy customized requirements and can be easily extended to different visual tasks. In experiments on MS COCO, Obj2Seq achieves 45.7 and 65.0 on the tasks evaluated, which indicates its potential to be generally applied to different visual tasks. Code has been made available at: https://github.com/CASIA-IVA-Lab/Obj2Seq.
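The two-step decomposition described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the `DetectedObject` type, the `(cx, cy, w, h)` box layout, the two task formats, and the brute-force L1 matching are all assumptions chosen for illustration, and the filtering step merely stands in for the model's category-conditioned recognition.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class DetectedObject:
    """Hypothetical object record; field names are illustrative only."""
    category: str
    box: Tuple[float, float, float, float]  # assumed (cx, cy, w, h) layout
    keypoints: List[Tuple[float, float]] = field(default_factory=list)


def box_sequence(obj: DetectedObject) -> List[float]:
    # Assumed detection format: the sequence is just the four box values.
    return list(obj.box)


def pose_sequence(obj: DetectedObject) -> List[float]:
    # Assumed pose format: box values followed by flattened keypoints.
    seq = list(obj.box)
    for x, y in obj.keypoints:
        seq.extend([x, y])
    return seq


# The sequence definition varies per task; each task supplies a formatter.
TASK_FORMATS: Dict[str, Callable[[DetectedObject], List[float]]] = {
    "detection": box_sequence,
    "pose": pose_sequence,
}


def objects_to_sequences(objects, categories: Set[str], task: str):
    fmt = TASK_FORMATS[task]
    # Step 1: keep objects of the requested categories
    # (a stand-in for recognizing objects of given categories).
    selected = [o for o in objects if o.category in categories]
    # Step 2: generate one sequence per selected object.
    return [(o.category, fmt(o)) for o in selected]


def l1(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def match_sequences(preds: List[List[float]], targets: List[List[float]]):
    """Brute-force minimum-L1 assignment (illustrative stand-in for matching).

    Returns a tuple m where m[j] is the prediction index assigned to target j.
    Assumes len(preds) >= len(targets); only practical for tiny inputs.
    """
    return min(
        permutations(range(len(preds)), len(targets)),
        key=lambda p: sum(l1(preds[i], targets[j]) for j, i in enumerate(p)),
    )
```

For example, calling `objects_to_sequences(objs, {"person"}, "pose")` on a list containing a person and a dog would return a single pose sequence for the person only, and `match_sequences` would then pair each ground-truth sequence with its closest predicted sequence before a loss is computed.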


research
09/01/2021

Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

Traditional approaches for learning 3D object categories have been predo...
research
09/28/2022

Target Features Affect Visual Search, A Study of Eye Fixations

Visual Search is referred to the task of finding a target object among a...
research
12/01/2022

GRiT: A Generative Region-to-text Transformer for Object Understanding

This paper presents a Generative RegIon-to-Text transformer, GRiT, for o...
research
10/12/2020

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Discriminatively localizing sounding objects in cocktail-party, i.e., mi...
research
04/09/2023

Curricular Object Manipulation in LiDAR-based Object Detection

This paper explores the potential of curriculum learning in LiDAR-based ...
research
07/29/2019

FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation

Over the past few years, we have witnessed the success of deep learning ...
research
09/29/2016

Redefining Binarization and the Visual Archetype

Although binarization is considered passe, it still remains a highly pop...
