EventTransAct: A video transformer-based framework for Event-camera based action recognition

08/25/2023
by Tristan de Blegiers, et al.

Recognizing and comprehending human actions and gestures is a crucial perception requirement for robots that interact with humans and carry out tasks in diverse domains, including service robotics, healthcare, and manufacturing. Event cameras, which capture fast-moving objects at high temporal resolution, offer new opportunities compared to standard action recognition in RGB videos. However, previous research on event-camera action recognition has focused primarily on sensor-specific network architectures and image encodings, which may not be suitable for new sensors and which limit the use of recent advances in transformer-based architectures. In this study, we employ a computationally efficient model, the video transformer network (VTN), which first acquires spatial embeddings per event-frame and then applies a temporal self-attention mechanism. To better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss (ℒ_EC) and event-specific augmentations. The proposed ℒ_EC promotes learning fine-grained spatial cues in the spatial backbone of the VTN by contrasting temporally misaligned frames. We evaluate our method on real-world action recognition on the N-EPIC Kitchens dataset and achieve state-of-the-art results on both protocols: testing in a seen kitchen (74.9% accuracy) and testing in unseen kitchens (42.43% and 46.66% accuracy). Our approach also takes less computation time than competitive prior approaches, which demonstrates the potential of our framework, EventTransAct, for real-world applications of event-camera-based action recognition. Project Page: <https://tristandb8.github.io/EventTransAct_webpage/>
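
The abstract does not give the exact form of ℒ_EC, so the following is only a minimal sketch of one plausible reading: an InfoNCE-style loss in which per-frame embeddings from two augmented views of the same clip are pulled together when they share a time index and pushed apart when they are temporally misaligned. The function names, the cosine similarity, and the `temperature` value are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon guards against zero vectors


def event_contrastive_loss(view_a, view_b, temperature=0.1):
    """Hypothetical InfoNCE-style loss over per-frame embeddings.

    view_a, view_b: lists of T frame embeddings (lists of floats) taken from
    two augmented views of the same clip. The frame at index t in view_a is
    the positive for the frame at index t in view_b; every other index in
    view_b is a temporally misaligned negative. This is an assumed form,
    not the loss defined in the paper.
    """
    num_frames = len(view_a)
    total = 0.0
    for t in range(num_frames):
        logits = [cosine(view_a[t], view_b[s]) / temperature
                  for s in range(num_frames)]
        # Numerically stable log-sum-exp over all candidate frames.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the temporally aligned frame.
        total += log_denominator - logits[t]
    return total / num_frames


# Toy usage: three frames with mutually orthogonal embeddings.
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = frames[1:] + frames[:1]  # same frames, shifted by one time step
aligned_loss = event_contrastive_loss(frames, frames)
misaligned_loss = event_contrastive_loss(frames, shifted)
```

In this toy setting the loss is close to zero when the two views agree frame by frame and large when one view is shifted in time. A spatial backbone that produces near-identical embeddings for every frame cannot drive such a loss down, which matches the abstract's stated aim of learning fine-grained spatial cues.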


