Multi-Label Activity Recognition using Activity-specific Features

by Yanyi Zhang, et al.

We introduce an approach to multi-label activity recognition that extracts an independent feature descriptor for each activity. Our approach first extracts a set of independent feature snippets, which we call "observations", each focused on a different spatio-temporal region of a video. It then generates an independent feature descriptor for each activity, which we call an "activity-specific feature", by combining these observations with attention, and predicts activities from these activity-specific features. This structure can be trained end-to-end and plugged into any existing network structure for video classification. Our method outperformed state-of-the-art approaches on three multi-label activity recognition datasets. To show the generalizability of our approach, we also evaluated it on two single-activity recognition datasets, where it achieved state-of-the-art performance. Furthermore, to better understand the activity-specific features that the system generates, we visualized them on the Charades dataset.
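The pipeline described above (observations, attention-weighted combination into one feature per activity, then a per-activity prediction) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how attention is computed, so the scaled dot-product attention with one query vector per activity, the per-activity linear classifier with a sigmoid output, and all names (`activity_specific_predict`, `queries`, `classifier_w`, `classifier_b`) are assumptions made for illustration. Random arrays stand in for the observations that a video backbone would produce.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def activity_specific_predict(observations, queries, classifier_w, classifier_b):
    """Hypothetical sketch of attention over observations, one descriptor per activity.

    observations: (N, D) feature snippets from different spatio-temporal regions
    queries:      (A, D) one attention query per activity (assumed, not from the paper)
    classifier_w: (A, D) one linear classifier per activity (assumed)
    classifier_b: (A,)   per-activity bias (assumed)
    """
    dim = observations.shape[1]
    # Attention scores of every activity over every observation: (A, N)
    scores = queries @ observations.T / np.sqrt(dim)
    attn = softmax(scores, axis=1)
    # Activity-specific features: attention-weighted sums of observations, (A, D)
    features = attn @ observations
    # Each activity is scored from its own feature only, then squashed
    # independently with a sigmoid, as is usual for multi-label outputs.
    logits = np.sum(features * classifier_w, axis=1) + classifier_b
    probs = 1.0 / (1.0 + np.exp(-logits))
    return attn, features, probs


# Toy sizes: 8 observations, 16-dimensional features, 5 activities.
rng = np.random.default_rng(0)
num_obs, dim, num_act = 8, 16, 5
observations = rng.normal(size=(num_obs, dim))
queries = rng.normal(size=(num_act, dim))
classifier_w = rng.normal(size=(num_act, dim))
classifier_b = np.zeros(num_act)

attn, features, probs = activity_specific_predict(
    observations, queries, classifier_w, classifier_b
)
print(attn.shape, features.shape, probs.shape)
```

Because each activity has its own attention weights, two activities that co-occur in a clip can draw on different observations, which is the intuition behind keeping the descriptors independent. In a trained model the queries and classifiers would be learned jointly with the backbone rather than sampled at random.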



LSTA: Long Short-Term Attention for Egocentric Action Recognition

Egocentric activity recognition is one of the most challenging tasks in ...

Boosted Multiple Kernel Learning for First-Person Activity Recognition

Activity recognition from first-person (ego-centric) videos has recently...

Leaving Some Stones Unturned: Dynamic Feature Prioritization for Activity Detection in Streaming Video

Current approaches for activity recognition often ignore constraints on ...

Deep Adaptive Temporal Pooling for Activity Recognition

Deep neural networks have recently achieved competitive accuracy for hum...

Improving Human Activity Recognition Through Ranking and Re-ranking

We propose two well-motivated ranking-based methods to enhance the perfo...

Human Activity Recognition for Edge Devices

Video activity Recognition has recently gained a lot of momentum with th...

Three Branches: Detecting Actions With Richer Features

We present our three branch solutions for International Challenge on Act...