Is an Object-Centric Video Representation Beneficial for Transfer?

07/20/2022
by Chuhan Zhang, et al.

The objective of this work is to learn an object-centric video representation, with the aim of improving transferability to novel tasks, i.e., tasks different from the pre-training task of action classification. To this end, we introduce a new object-centric video recognition model based on a transformer architecture. The model learns a set of object-centric summary vectors for the video, and uses these vectors to fuse the visual and spatio-temporal trajectory 'modalities' of the video clip. We also introduce a novel trajectory contrast loss to further enhance objectness in these summary vectors. With experiments on four datasets (SomethingSomething-V2, SomethingElse, Action Genome and EpicKitchens), we show that the object-centric model outperforms prior video representations (both object-agnostic and object-aware) when: (1) classifying actions on unseen objects and in unseen environments; (2) low-shot learning of novel classes; (3) linear probing on other downstream tasks; and (4) performing standard action classification.


