Permutation-Aware Action Segmentation via Unsupervised Frame-to-Segment Alignment

05/31/2023
by Quoc-Huy Tran, et al.

This paper presents a novel transformer-based framework for unsupervised activity segmentation that leverages both frame-level and segment-level cues, in contrast with previous methods, which often rely on frame-level information only. Our approach begins with a frame-level prediction module, which estimates framewise action classes via a transformer encoder and is trained in an unsupervised manner via temporal optimal transport. To exploit segment-level information, we introduce a segment-level prediction module and a frame-to-segment alignment module. The former includes a transformer decoder for estimating video transcripts, while the latter matches frame-level features with segment-level features, yielding permutation-aware segmentation results. Moreover, inspired by temporal optimal transport, we develop simple-yet-effective pseudo labels for unsupervised training of the above modules. Our experiments on four public datasets, i.e., 50 Salads, YouTube Instructions, Breakfast, and Desktop Assembly, show that our approach achieves performance comparable to or better than that of previous methods in unsupervised activity segmentation.
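The abstract names temporal optimal transport as the source of frame-level pseudo labels but gives no formulation. The sketch below is one plausible reading, not the authors' implementation: it runs Sinkhorn iterations on a cost that combines frame-to-prototype similarity with a penalty for straying from the time-ordered diagonal. The function names, the squared-distance form of the temporal penalty, the uniform marginals (which assume roughly balanced actions), and the values of `rho` and `eps` are all assumptions made for illustration.

```python
import numpy as np


def temporal_cost(n_frames, n_actions):
    """Squared distance of each (frame, action) cell from the diagonal.

    Assumed form of the temporal prior: frame i is expected to belong to
    the action whose normalized index is closest to i's normalized time.
    """
    t = (np.arange(n_frames) + 0.5) / n_frames
    a = (np.arange(n_actions) + 0.5) / n_actions
    return (t[:, None] - a[None, :]) ** 2


def sinkhorn_pseudo_labels(sim, rho=1.0, eps=0.1, n_iters=100):
    """Soft frame-to-action assignment via entropy-regularized transport.

    sim: (n_frames, n_actions) similarity between frame features and
         action prototypes (higher means more similar).
    Returns the transport plan Q and hard pseudo labels (row-wise argmax).
    """
    n, k = sim.shape
    cost = -sim + rho * temporal_cost(n, k)
    # Shift by the minimum cost for numerical stability before exponentiating.
    kernel = np.exp(-(cost - cost.min()) / eps)
    r = np.full(n, 1.0 / n)  # uniform mass per frame
    c = np.full(k, 1.0 / k)  # uniform mass per action (balanced assumption)
    u = np.ones(n)
    v = np.ones(k)
    for _ in range(n_iters):
        u = r / (kernel @ v)
        v = c / (kernel.T @ u)
    q = u[:, None] * kernel * v[None, :]
    return q, q.argmax(axis=1)


# Toy example: 30 frames, 3 actions in temporal order, 10 frames each.
frames = np.repeat(np.eye(3), 10, axis=0)   # hypothetical frame features
prototypes = np.eye(3)                      # hypothetical action prototypes
Q, labels = sinkhorn_pseudo_labels(frames @ prototypes.T)
```

In this toy setting the features are already separable, so the temporal term changes little; with noisy features it would bias ambiguous frames toward the action expected at that point in the video.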


research
05/27/2021

Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering

We present a novel approach for unsupervised activity segmentation, whic...
research
11/23/2022

TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision

Video copy localization aims to precisely localize all the copied segmen...
research
05/31/2023

Learning by Aligning 2D Skeleton Sequences in Time

This paper presents a novel self-supervised temporal video alignment fra...
research
03/12/2022

One-stage Video Instance Segmentation: From Frame-in Frame-out to Clip-in Clip-out

Many video instance segmentation (VIS) methods partition a video sequenc...
research
11/09/2021

Exploiting Robust Unsupervised Video Person Re-identification

Unsupervised video person re-identification (reID) methods usually depen...
research
12/19/2016

Learning Features by Watching Objects Move

This paper presents a novel yet intuitive approach to unsupervised featu...
research
02/17/2022

TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery

Surgical instrument segmentation – in general a pixel classification tas...
