Unsupervised Video Understanding by Reconciliation of Posture Similarities

08/03/2017
by   Timo Milbich, et al.
0

Understanding human activity and being able to explain it in detail surpasses mere action classification by far in both complexity and value. The challenge is thus to describe an activity on the basis of its most fundamental constituents, the individual postures and their distinctive transitions. Supervised learning of such a fine-grained representation based on elementary poses is very tedious and does not scale. Therefore, we propose a completely unsupervised deep learning procedure based solely on video sequences, which starts from scratch without requiring pre-trained networks, predefined body models, or keypoints. A combinatorial sequence matching algorithm proposes relations between frames from subsets of the training data, while a CNN is reconciling the transitivity conflicts of the different subsets to learn a single concerted pose embedding despite changes in appearance across sequences. Without any manual annotation, the model learns a structured representation of postures and their temporal development. The model not only enables retrieval of similar postures but also temporal super-resolution. Additionally, based on a recurrent formulation, next frames can be synthesized.

READ FULL TEXT

page 1

page 3

page 5

page 7

page 8

research
05/31/2023

Learning by Aligning 2D Skeleton Sequences in Time

This paper presents a novel self-supervised temporal video alignment fra...
research
10/08/2022

Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling

While recent large-scale video-language pre-training made great progress...
research
03/28/2022

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities

Assembly101 is a new procedural activity dataset featuring 4321 videos o...
research
09/03/2021

Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition

Human pose is a useful feature for fine-grained sports action understand...
research
12/11/2018

Learning Discriminative Motion Features Through Detection

Despite huge success in the image domain, modern detection models such a...
research
03/25/2022

Semi-supervised and Deep learning Frameworks for Video Classification and Key-frame Identification

Automating video-based data and machine learning pipelines poses several...
research
08/31/2016

CliqueCNN: Deep Unsupervised Exemplar Learning

Exemplar learning is a powerful paradigm for discovering visual similari...

Please sign up or login with your details

Forgot password? Click here to reset