Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts

01/11/2021
by Kunpeng Li, et al.

Learning visual knowledge from massive weakly-labeled web videos has attracted growing research interest thanks to the large corpus of easily accessible video data on the Internet. However, for video action recognition, the action of interest might exist only in arbitrary clips of untrimmed web videos, resulting in high label noise in the temporal space. To address this issue, we introduce a new method for pre-training video action recognition models using queried web videos. Instead of trying to filter out the potential noise in these queried videos, we propose to convert it into useful supervision signals by defining the concept of Sub-Pseudo Label (SPL). Specifically, SPL spans a new, meaningful "middle ground" label space constructed by extrapolating the original weak labels used during video querying and the prior knowledge distilled from a teacher model. Consequently, SPL provides enriched supervision for video models to learn better representations. SPL is fairly simple, adds no extra training cost, and is orthogonal to popular teacher-student self-training frameworks. We validate the effectiveness of our method on four video action recognition datasets and, to study its generalization ability, on a weakly-labeled image dataset. Experiments show that SPL outperforms several existing pre-training strategies that use pseudo-labels, and that the learned representations lead to competitive results when fine-tuning on HMDB-51 and UCF-101 compared with recent pre-training methods.
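As a rough illustration of how a weak query label and a teacher prediction could be combined into a single training target, the sketch below maps each (query label, teacher label) pair to one index in a joint label space. The abstract does not give the exact construction, so the joint-index formula, the function names (`sub_pseudo_label`, `label_clips`), and the use of the teacher's top-1 prediction are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def sub_pseudo_label(query_label: int, teacher_label: int, num_classes: int) -> int:
    """Map a (weak query label, teacher prediction) pair to one joint index.

    Assumption: the joint space has num_classes * num_classes entries, one per
    ordered pair. The abstract does not specify this layout.
    """
    if not (0 <= query_label < num_classes and 0 <= teacher_label < num_classes):
        raise ValueError("labels must lie in [0, num_classes)")
    return query_label * num_classes + teacher_label


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def label_clips(
    query_label: int,
    teacher_scores_per_clip: Sequence[Sequence[float]],
    num_classes: int,
) -> List[int]:
    """Assign a joint label to every clip of one queried video.

    All clips share the video's query label; the teacher's top-1 class
    (hypothetical choice) differs from clip to clip.
    """
    return [
        sub_pseudo_label(query_label, argmax(scores), num_classes)
        for scores in teacher_scores_per_clip
    ]


# Toy example: 3 action classes, a video retrieved with query class 1.
clip_scores = [
    [0.1, 0.8, 0.1],  # teacher agrees with the query label
    [0.7, 0.2, 0.1],  # teacher sees class 0 in this clip instead
]
labels = label_clips(1, clip_scores, num_classes=3)
```

In this toy setup, a clip where the teacher agrees with the query keeps a "diagonal" label, while a disagreeing clip is given its own label instead of being discarded, which is the sense in which noisy clips become extra supervision.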

