
Representation learning from videos in-the-wild: An object-centric approach

by   Rob Romijnders, et al.

We propose a method to learn image representations from uncurated videos. It combines a supervised loss from off-the-shelf object detectors with self-supervised losses that arise naturally from the video-shot-frame-object hierarchy present in each video. We report competitive results on the 19 transfer learning tasks of the Visual Task Adaptation Benchmark (VTAB) and on 8 out-of-distribution generalization tasks, and discuss the benefits and shortcomings of the proposed approach. In particular, it improves over the baseline on 18 of the 19 few-shot learning tasks and on all 8 out-of-distribution generalization tasks. Finally, we perform several ablation studies and analyze the impact of the pretrained object detector on performance across this suite of tasks.
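The core idea, a pretraining objective that sums a supervised term from detector-derived pseudo-labels with a self-supervised contrastive term over positives drawn from the same video/shot, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the NT-Xent-style contrastive formulation, the function names, and the `alpha` weighting are assumptions made for the example.

```python
import numpy as np

def nt_xent(emb, pairs, temperature=0.5):
    """Contrastive loss over L2-normalized embeddings (N, D).

    `pairs` lists (i, j) positives, e.g. two frames from the same
    shot or two crops of the same detected object (an assumption
    standing in for the paper's video-shot-frame-object hierarchy).
    """
    sim = emb @ emb.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    # log-softmax over each row's similarities
    log_prob = sim - np.logaddexp.reduce(sim, axis=1, keepdims=True)
    return -np.mean([log_prob[i, j] for i, j in pairs])

def cross_entropy(logits, labels):
    """Softmax cross-entropy on detector-derived pseudo-labels."""
    log_prob = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
    return -np.mean(log_prob[np.arange(len(labels)), labels])

def combined_loss(emb, pairs, logits, labels, alpha=1.0):
    """Supervised detector loss plus weighted self-supervised loss."""
    return cross_entropy(logits, labels) + alpha * nt_xent(emb, pairs)

# Toy batch: 6 crops (3 positive pairs), 3 pseudo-label classes.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
pairs = [(0, 1), (2, 3), (4, 5)]
logits = rng.normal(size=(6, 3))
labels = np.array([0, 1, 2, 0, 1, 2])
loss = combined_loss(emb, pairs, logits, labels)
```

In practice both terms would be computed on features from a shared backbone and minimized jointly; the sketch only shows how the two losses compose into one objective.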




Self-Supervised Learning of Video-Induced Visual Invariances

We propose a general framework for self-supervised learning of transfera...

Self-supervised video pretraining yields strong image representations

Videos contain far more information than still images and hold the poten...

A Study on Representation Transfer for Few-Shot Learning

Few-shot classification aims to learn to classify new object categories ...

Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme

Different from static images, videos contain additional temporal and spa...

A Rationale-Centric Framework for Human-in-the-loop Machine Learning

We present a novel rationale-centric framework with human-in-the-loop – ...

Evolving Losses for Unsupervised Video Representation Learning

We present a new method to learn video representations from large-scale ...

A Self-Supervised Framework for Function Learning and Extrapolation

Understanding how agents learn to generalize – and, in particular, to ex...