Trear: Transformer-based RGB-D Egocentric Action Recognition

01/05/2021
by Xiangyu Li, et al.

In this paper, we propose a Transformer-based RGB-D egocentric action recognition framework, called Trear. It consists of two modules: an inter-frame attention encoder and a mutual-attentional fusion block. Instead of using optical flow or recurrent units, we adopt a self-attention mechanism to model the temporal structure of the data from different modalities. Input frames are cropped randomly to mitigate the effect of data redundancy. Features from each modality interact through the proposed fusion block and are combined through a simple yet effective fusion operation to produce a joint RGB-D representation. Experiments on two large egocentric RGB-D datasets, THU-READ and FPHA, and one small dataset, WCVS, show that the proposed method outperforms state-of-the-art methods by a large margin.
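
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each frame has already been reduced to a feature vector, omits the learned query/key/value projections and multi-head structure a real Transformer would have, and uses an element-wise sum followed by a temporal average as a stand-in for the fusion operation, which the abstract does not specify. The function names `inter_frame_attention` and `mutual_attention_fusion` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of feature vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def inter_frame_attention(frames):
    """Self-attention across time: every frame attends to all frames
    of the same modality (no optical flow, no recurrence)."""
    return attention(frames, frames, frames)


def mutual_attention_fusion(rgb, depth):
    """Cross-modal attention in both directions, then a simple fusion.

    ASSUMPTION: the fusion here is an element-wise sum followed by a
    temporal mean; the abstract only says "simple yet effective".
    """
    rgb_from_depth = attention(rgb, depth, depth)  # RGB queries, depth keys/values
    depth_from_rgb = attention(depth, rgb, rgb)    # depth queries, RGB keys/values
    summed = [[a + b for a, b in zip(r, d)]
              for r, d in zip(rgb_from_depth, depth_from_rgb)]
    num_frames = len(summed)
    return [sum(f[j] for f in summed) / num_frames for j in range(len(summed[0]))]


# Toy input: 3 frames per modality, 2-dimensional features (made-up values).
rgb_feats = inter_frame_attention([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
depth_feats = inter_frame_attention([[3.0, 4.0], [3.0, 4.0], [3.0, 4.0]])
joint = mutual_attention_fusion(rgb_feats, depth_feats)
```

In this sketch `joint` is a single vector per clip that a classifier head could consume; a trained model would add learned projections, positional information, and normalization layers around each attention step.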


research
08/24/2022

Modality Mixer for Multi-modal Action Recognition

In multi-modal action recognition, it is important to consider not only ...
research
09/10/2023

Unified Contrastive Fusion Transformer for Multimodal Human Action Recognition

Various types of sensors have been considered to develop human action re...
research
05/22/2019

What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention

Egocentric action anticipation consists in understanding which objects t...
research
08/20/2021

MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition

This paper presents a pure transformer-based approach, dubbed the Multi-...
research
02/23/2022

Skeleton Sequence and RGB Frame Based Multi-Modality Feature Fusion Network for Action Recognition

Action recognition has been a heated topic in computer vision for its wi...
research
02/28/2017

Scene Flow to Action Map: A New Representation for RGB-D based Action Recognition with Convolutional Neural Networks

Scene flow describes the motion of 3D objects in real world and potentia...
research
04/17/2018

PM-GANs: Discriminative Representation Learning for Action Recognition Using Partial-modalities

Data of different modalities generally convey complementary but heteroge...
