Cross-Enhancement Transformer for Action Segmentation

05/19/2022
by Jiahui Wang, et al.

Temporal convolutions have been the paradigm of choice in action segmentation, enlarging long-term receptive fields by stacking convolutional layers. However, deep stacks lose the local information necessary for frame-level recognition. To address this problem, we propose a novel encoder-decoder structure, called the Cross-Enhancement Transformer, which learns temporal structure representations effectively through an interactive self-attention mechanism. The convolutional feature maps from each encoder layer are concatenated with the self-attention features produced in the decoder, so that local and global information over a series of frame actions are exploited simultaneously. In addition, a new loss function that penalizes over-segmentation errors is proposed to enhance the training process. Experiments show that our framework achieves state-of-the-art performance on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities, and Breakfast.
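The core idea of fusing local (convolutional) and global (self-attention) features per frame can be sketched in a few lines. This is a minimal illustrative reconstruction, not the authors' implementation: the function names (`self_attention`, `temporal_conv`, `cross_enhance`), the depthwise 1-D convolution, and the single-head attention are all simplifying assumptions made for clarity.

```python
import numpy as np

def self_attention(x):
    # x: (T, d) frame features; plain single-head scaled dot-product attention
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)                    # (T, T) pairwise scores
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over time
    return weights @ x                               # globally contextualized features

def temporal_conv(x, kernel):
    # x: (T, d); depthwise 1-D temporal convolution with 'same' padding,
    # capturing local context around each frame
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        out[t] = (xp[t:t + k] * kernel[:, None]).sum(axis=0)
    return out

def cross_enhance(x, kernel):
    # Concatenate local (conv) and global (attention) features channel-wise,
    # mirroring the cross-enhancement idea described in the abstract
    local = temporal_conv(x, kernel)
    global_ctx = self_attention(x)
    return np.concatenate([local, global_ctx], axis=1)  # (T, 2d)

T, d = 8, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d))
fused = cross_enhance(x, np.array([0.25, 0.5, 0.25]))
print(fused.shape)  # (8, 8): each frame now carries both local and global context
```

In the actual model the decoder's attention features come from a separate stream rather than the same tensor, and the fused representation feeds further layers; the sketch only shows why concatenation preserves both receptive-field scales at once.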

