Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition

09/03/2021
by James Hong, et al.

Human pose is a useful feature for fine-grained sports action understanding. However, pose estimators are often unreliable when run on sports video due to domain shift and factors such as motion blur and occlusions. This leads to poor accuracy when downstream tasks, such as action recognition, depend on pose. End-to-end learning circumvents pose, but requires more labels to generalize. We introduce Video Pose Distillation (VPD), a weakly supervised technique to learn features for new video domains, such as individual sports that challenge pose estimation. Under VPD, a student network learns to extract robust pose features from RGB frames in the sports video such that, whenever pose is considered reliable, the features match the output of a pretrained teacher pose detector. Our strategy retains the best of both the pose and end-to-end worlds: it exploits the rich visual patterns in raw video frames while learning features that agree with the athletes' pose and motion in the target video domain, which avoids overfitting to patterns unrelated to athletes' motion. VPD features improve performance on few-shot, fine-grained action recognition, retrieval, and detection tasks in four real-world sports video datasets, without requiring additional ground-truth pose annotations.
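
The distillation idea in the abstract can be sketched as a confidence-gated regression loss. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `masked_distillation_loss`, the mean-squared-error distance, the per-frame `confidence` scores, and the `threshold` default are all hypothetical. The abstract states only that student features should match the teacher's output whenever pose is considered reliable.

```python
from typing import Sequence


def masked_distillation_loss(
    student: Sequence[Sequence[float]],
    teacher: Sequence[Sequence[float]],
    confidence: Sequence[float],
    threshold: float = 0.5,
) -> float:
    """Mean squared error between student and teacher features,
    averaged only over frames whose teacher confidence >= threshold.

    Every name here and the 0.5 default are hypothetical, for illustration.
    """
    if not (len(student) == len(teacher) == len(confidence)):
        raise ValueError("inputs must have one entry per frame")

    total = 0.0
    reliable_frames = 0
    for s, t, c in zip(student, teacher, confidence):
        if c < threshold:
            # Teacher pose judged unreliable: this frame gives no supervision.
            continue
        if len(s) != len(t) or not s:
            raise ValueError("feature vectors must be non-empty and equal length")
        # Per-frame mean squared error across feature dimensions.
        total += sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
        reliable_frames += 1

    # With no reliable frames there is nothing to match, so the loss is zero.
    return total / reliable_frames if reliable_frames else 0.0


# Two frames with 2-D features; the second frame has low teacher confidence.
student = [[1.0, 2.0], [0.0, 0.0]]
teacher = [[1.0, 4.0], [9.0, 9.0]]
confidence = [0.9, 0.1]
loss = masked_distillation_loss(student, teacher, confidence)
```

In this toy input only the first frame contributes, so the large disagreement on the second (unreliable) frame does not pull the student toward a possibly wrong teacher output. A real system would compute such a loss on network outputs inside a training loop; that part is omitted here.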

research
12/11/2018

Learning Discriminative Motion Features Through Detection

Despite huge success in the image domain, modern detection models such a...
research
05/14/2023

Is end-to-end learning enough for fitness activity recognition?

End-to-end learning has taken hold of many computer vision tasks, in par...
research
03/17/2023

Video Action Recognition with Attentive Semantic Units

Visual-Language Models (VLMs) have significantly advanced action video r...
research
05/17/2021

VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living

Many attempts have been made towards combining RGB and 3D poses for the ...
research
08/24/2023

POCO: 3D Pose and Shape Estimation with Confidence

The regression of 3D Human Pose and Shape (HPS) from an image is becomin...
research
10/20/2022

VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection

Video understanding is an important problem in computer vision. Currentl...
research
08/03/2017

Unsupervised Video Understanding by Reconciliation of Posture Similarities

Understanding human activity and being able to explain it in detail surp...
