MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation

03/14/2023
by Roy Miles, et al.

This paper tackles the problem of semi-supervised video object segmentation on resource-constrained devices, such as mobile phones. We formulate this problem as a distillation task and demonstrate that small space-time-memory networks with finite memory can achieve results competitive with the state of the art at a fraction of the computational cost (32 milliseconds per frame on a Samsung Galaxy S22). Specifically, we provide a theoretically grounded framework that unifies knowledge distillation with supervised contrastive representation learning. These models are able to jointly benefit from both pixel-wise contrastive learning and distillation from a pre-trained teacher. We validate this loss by achieving J&F scores competitive with the state of the art on both the standard DAVIS and YouTube benchmarks, despite running up to 5x faster and with 32x fewer parameters.
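
The abstract does not spell out the loss itself, so the following is only a rough sketch of how a pixel-wise supervised contrastive term and a teacher-student distillation term might be combined. It is not the paper's formulation: the function names, the temperatures, the KL-divergence form of the distillation term, and the `weight` that balances the two terms are all assumptions made for illustration.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v]


def _log_softmax(x, temperature):
    scaled = [a / temperature for a in x]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(a - m) for a in scaled))
    return [a - lse for a in scaled]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over per-pixel embeddings (hypothetical helper).

    Pixels sharing a label are treated as positives for each other; all other
    pixels act as negatives. Anchors without any positive are skipped.
    """
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Cosine similarity to every other pixel, scaled by the temperature.
        logits = {j: _dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean KL(teacher || student) over pixels (assumed form of the distillation term)."""
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        log_ps = _log_softmax(s, temperature)
        log_pt = _log_softmax(t, temperature)
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    return total / len(student_logits)


def combined_loss(embeddings, labels, student_logits, teacher_logits, weight=1.0):
    """Contrastive term plus a weighted distillation term; `weight` is an assumed knob."""
    return (supervised_contrastive_loss(embeddings, labels)
            + weight * distillation_loss(student_logits, teacher_logits))


if __name__ == "__main__":
    # Four toy "pixels": two near the x-axis (object 0), two near the y-axis (object 1).
    pixel_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    pixel_labels = [0, 0, 1, 1]
    student = [[2.0, 0.5], [1.5, 0.2], [0.1, 1.8], [0.3, 2.2]]
    teacher = [[3.0, 0.0], [2.5, 0.1], [0.0, 3.0], [0.2, 2.9]]
    print(combined_loss(pixel_embeddings, pixel_labels, student, teacher))
```

In this sketch the contrastive term pulls together embeddings of pixels that belong to the same object, while the distillation term pushes the student's per-pixel class distribution towards the teacher's. A real training setup would compute both terms on batched tensors in a deep learning framework instead of Python lists.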

research
10/23/2020

Iterative Graph Self-Distillation

How to discriminatively vectorize graphs is a fundamental challenge that...
research
12/10/2021

DisCo: Effective Knowledge Distillation For Contrastive Learning of Sentence Embeddings

Contrastive learning has been proven suitable for learning sentence embe...
research
06/27/2023

TrickVOS: A Bag of Tricks for Video Object Segmentation

Space-time memory (STM) network methods have been dominant in semi-super...
research
07/30/2021

On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals

It is a consensus that small models perform quite poorly under the parad...
research
09/14/2023

CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders

Large-scale self-supervised pre-trained speech encoders outperform conve...
research
06/07/2020

Multi-view Contrastive Learning for Online Knowledge Distillation

Existing Online Knowledge Distillation (OKD) aims to perform collaborati...
research
07/16/2020

Kernelized Memory Network for Video Object Segmentation

Semi-supervised video object segmentation (VOS) is a task that involves ...
