Recognizing Video Events with Varying Rhythms

by Yikang Li, et al.

Recognizing events in long, complex videos with multiple sub-activities has received sustained attention in recent years. This task is more challenging than traditional action recognition on short, relatively homogeneous video clips. In this paper, we investigate the problem of recognizing long and complex events with varying action rhythms, a practical challenge that has not been considered in the literature. Our work is inspired in part by how humans identify events with varying rhythms: by quickly catching the frames that contribute most to a specific event. We propose a two-stage end-to-end framework in which the first stage selects the most significant frames and the second stage recognizes the event from the selected frames. Our model needs only event-level labels during training, and is thus more practical when sub-activity labels are missing or difficult to obtain. Extensive experiments show that our model achieves significant improvement in event recognition from long videos while maintaining high accuracy even when the test videos suffer from severe rhythm changes. This demonstrates the potential of our method for real-world video-based applications, where test and training videos can differ drastically in the rhythms of their sub-activities.
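The two-stage idea can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify how frames are scored or how the event is classified, so the per-frame importance scores, the hard top-k selection, the mean pooling, and the linear classifier below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from typing import List, Sequence


def select_top_k_frames(scores: Sequence[float], k: int) -> List[int]:
    """Stage 1 (assumed): keep the k highest-scoring frames, in temporal order."""
    if k < 1 or not scores:
        raise ValueError("need at least one frame and k >= 1")
    k = min(k, len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def mean_pool(features: Sequence[Sequence[float]], indices: Sequence[int]) -> List[float]:
    """Average the feature vectors of the selected frames."""
    dim = len(features[0])
    return [sum(features[i][d] for i in indices) / len(indices) for d in range(dim)]


def classify(pooled: Sequence[float], class_weights: Sequence[Sequence[float]]) -> int:
    """Stage 2 (assumed): linear classifier, returns the index of the best class."""
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in class_weights]
    return max(range(len(logits)), key=lambda c: logits[c])


def recognize_event(
    features: Sequence[Sequence[float]],
    scores: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    k: int,
) -> int:
    """Select the most significant frames, then recognize the event from them."""
    keep = select_top_k_frames(scores, k)
    return classify(mean_pool(features, keep), class_weights)


# Toy video: five frames with 2-D features and hypothetical importance scores.
features = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0], [4.0, 0.0], [0.0, 5.0]]
scores = [0.1, 0.9, 0.2, 0.8, 0.05]
class_weights = [[0.0, 1.0], [1.0, 0.0]]  # two event classes
predicted = recognize_event(features, scores, class_weights, k=2)
```

Because only the selected frames reach the classifier, inserting or removing low-scoring frames (a change in rhythm) leaves the prediction unchanged in this toy setting. Note that hard top-k selection is not differentiable, so a framework trained end to end would need some relaxation or alternative selection mechanism that this sketch does not model.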



Learning Latent Super-Events to Detect Multiple Activities in Videos

In this paper, we introduce the concept of learning latent super-events ...

TinyVIRAT: Low-resolution Video Action Recognition

The existing research in action recognition is mostly focused on high-qu...

TinyAction Challenge: Recognizing Real-world Low-resolution Activities in Videos

This paper summarizes the TinyAction challenge which was organized in Ac...

EventTransAct: A video transformer-based framework for Event-camera based action recognition

Recognizing and comprehending human actions and gestures is a crucial pe...

Event and Activity Recognition in Video Surveillance for Cyber-Physical Systems

This chapter aims to aid the development of Cyber-Physical Systems (CPS)...

Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos

Video Analytics Software as a Service (VA SaaS) has been rapidly growing...

Joint Max Margin and Semantic Features for Continuous Event Detection in Complex Scenes

In this paper the problem of complex event detection in the continuous d...
