SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations

09/04/2023
by Tanvir Mahmud, et al.

Despite significant progress in semi-supervised learning for image object detection, several key issues remain unaddressed for video object detection: (1) The performance of supervised video object detection depends heavily on the availability of annotated frames. (2) Given the large inter-frame correlations in a video, collecting annotations for a large number of frames per video is expensive, time-consuming, and often redundant. (3) Existing semi-supervised techniques designed for static images can hardly exploit the temporal motion dynamics inherently present in videos. In this paper, we introduce SSVOD, an end-to-end semi-supervised video object detection framework that exploits the motion dynamics of videos to utilize large-scale unlabeled frames with sparse annotations. To selectively assemble robust pseudo-labels across groups of frames, we introduce flow-warped predictions from nearby frames for temporal-consistency estimation. In particular, we introduce cross-IoU and cross-divergence based selection methods over a set of estimated predictions to include robust pseudo-labels for bounding boxes and class labels, respectively. To strike a balance between confirmation bias and uncertainty noise in pseudo-labels, we propose a confidence-threshold-based combination of hard and soft pseudo-labels. Our method achieves significant performance improvements over existing methods on the ImageNet-VID, Epic-KITCHENS, and YouTube-VIS datasets. Code and pre-trained models will be released.
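The selection ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the averaging used for cross-IoU and cross-divergence, the use of KL divergence, the threshold values, and all function names (`cross_iou`, `cross_divergence`, `select_pseudo_boxes`, `make_class_label`) are assumptions made for illustration. The flow-warping step is also assumed to have been done already, so the warped boxes and class distributions are taken as given inputs.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cross_iou(box: Box, warped: Sequence[Box]) -> float:
    """Assumed definition: mean IoU between a current-frame box and its
    flow-warped counterparts predicted from nearby frames."""
    return sum(iou(box, w) for w in warped) / len(warped)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-8) -> float:
    """KL(p || q) with a small epsilon for numerical safety."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_divergence(probs: Sequence[float], warped: Sequence[Sequence[float]]) -> float:
    """Assumed definition: mean KL divergence between the current class
    distribution and the distributions predicted from nearby frames."""
    return sum(kl_divergence(probs, w) for w in warped) / len(warped)


def select_pseudo_boxes(boxes: Sequence[Box],
                        warped_per_box: Sequence[Sequence[Box]],
                        iou_thresh: float = 0.5) -> List[Box]:
    """Keep boxes that stay temporally consistent (hypothetical threshold)."""
    return [b for b, ws in zip(boxes, warped_per_box)
            if ws and cross_iou(b, ws) >= iou_thresh]


def make_class_label(probs: Sequence[float], conf_thresh: float = 0.9) -> List[float]:
    """Hard one-hot label when the prediction is confident, otherwise keep
    the soft distribution (hypothetical threshold)."""
    top = max(range(len(probs)), key=lambda i: probs[i])
    if probs[top] >= conf_thresh:
        return [1.0 if i == top else 0.0 for i in range(len(probs))]
    return list(probs)
```

In this sketch, a box is kept as a pseudo-label only when its cross-IoU with the warped predictions is high, and a class distribution could be filtered in the same way by requiring a low cross-divergence. The hard/soft switch in `make_class_label` reflects the trade-off described in the abstract: hard labels on confident predictions, soft labels where the prediction is uncertain.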

research
08/12/2019

Semi-Supervised Video Salient Object Detection Using Pseudo-Labels

Deep learning-based video salient object detection has recently achieved...
research
07/10/2022

Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning Framework for Monocular 3D Object Detection

Monocular 3D object detection is an essential perception task for autono...
research
03/30/2022

Knowledge-Spreader: Learning Facial Action Unit Dynamics with Extremely Limited Labels

Recent studies on the automatic detection of facial action unit (AU) hav...
research
08/15/2021

Semi-supervised 3D Object Detection via Adaptive Pseudo-Labeling

3D object detection is an important task in computer vision. Most existi...
research
10/04/2018

Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images

Deep learning based object detectors require thousands of diversified bo...
research
10/02/2020

Semantics through Time: Semi-supervised Segmentation of Aerial Videos with Iterative Label Propagation

Semantic segmentation is a crucial task for robot navigation and safety....
research
08/07/2020

A Novel Video Salient Object Detection Method via Semi-supervised Motion Quality Perception

Previous video salient object detection (VSOD) approaches have mainly fo...
