Self-supervised Pretraining with Classification Labels for Temporal Activity Detection

11/26/2021
by Kumara Kahatapitiya et al.

Temporal Activity Detection aims to predict activity classes per frame, in contrast to the video-level predictions made in Activity Classification (i.e., Activity Recognition). Because detection requires expensive frame-level annotations, detection datasets are limited in scale. Hence, previous work on temporal activity detection commonly resorts to fine-tuning a classification model pretrained on a large-scale classification dataset (e.g., Kinetics-400). However, such pretrained models are not ideal for downstream detection due to the disparity between the pretraining task and the downstream fine-tuning task. This work proposes a novel self-supervised pretraining method for detection that leverages classification labels to mitigate this disparity by introducing frame-level pseudo labels, multi-action frames, and action segments. We show that models pretrained with the proposed self-supervised detection task outperform prior work on multiple challenging activity detection benchmarks, including Charades and MultiTHUMOS. Our extensive ablations further provide insights on when and how to use the proposed models for activity detection. Code and models will be released online.

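The core idea, turning video-level classification labels into frame-level supervision, can be sketched in a few lines. Below is a minimal, hypothetical illustration, not the authors' released code: clips from a classification dataset are concatenated in time into a synthetic untrimmed video, and each frame inherits the label of its source clip, yielding frame-level pseudo labels of the kind the abstract describes. All function and variable names here are assumptions made for illustration.

```python
# Minimal sketch (not the authors' implementation) of frame-level pseudo
# labels built from video-level classification labels: clips from different
# classification videos are concatenated in time to form a synthetic
# multi-action sequence, and each frame inherits its source clip's label.
import torch

def make_pseudo_detection_sample(clips, labels, num_classes):
    """Concatenate class-labeled clips into one sequence with per-frame labels.

    clips:  list of tensors, each (T_i, C, H, W), drawn from a classification
            dataset (e.g., Kinetics-400), each with one video-level label.
    labels: list of int class indices, one per clip.
    Returns the concatenated video (sum(T_i), C, H, W) and one-hot
    frame-level targets (sum(T_i), num_classes).
    """
    video = torch.cat(clips, dim=0)  # stack clips along the time axis
    frame_labels = torch.cat([
        torch.full((clip.shape[0],), lbl, dtype=torch.long)
        for clip, lbl in zip(clips, labels)
    ])
    targets = torch.nn.functional.one_hot(frame_labels, num_classes).float()
    return video, targets

# Example: two 8-frame clips from classes 3 and 7 become one 16-frame
# pseudo-detection sample with a per-frame classification target.
clip_a = torch.randn(8, 3, 112, 112)
clip_b = torch.randn(8, 3, 112, 112)
video, targets = make_pseudo_detection_sample([clip_a, clip_b], [3, 7], num_classes=400)
assert video.shape[0] == targets.shape[0] == 16
```

This sketch covers only the temporal-concatenation aspect (action segments with per-frame pseudo labels); the paper's multi-action frames, where multiple actions co-occur within a single frame, are omitted here.
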
Related research

09/15/2023 · Fine-tune the pretrained ATST model for sound event detection
Sound event detection (SED) often suffers from the data deficiency probl...

03/28/2023 · Large-scale pretraining on pathological images for fine-tuning of small pathological benchmarks
Pretraining a deep learning model on large image datasets is a standard ...

08/26/2020 · Self-Supervised Human Activity Recognition by Augmenting Generative Adversarial Networks
This article proposes a novel approach for augmenting generative adversa...

08/08/2023 · Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction
The emerging field of action prediction plays a vital role in various co...

11/11/2022 · Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks
Temporal Action Localization (TAL) methods typically operate on top of f...

04/11/2023 · A surprisingly simple technique to control the pretraining bias for better transfer: Expand or Narrow your representation
Self-Supervised Learning (SSL) models rely on a pretext task to learn re...

04/05/2023 · Exploring the Utility of Self-Supervised Pretraining Strategies for the Detection of Absent Lung Sliding in M-Mode Lung Ultrasound
Self-supervised pretraining has been observed to improve performance in ...
