Husformer: A Multi-Modal Transformer for Multi-Modal Human State Recognition

09/30/2022
by Ruiqi Wang, et al.

Human state recognition is a critical topic with pervasive and important applications in human-machine systems. Multi-modal fusion, the combination of metrics from multiple data sources, has been shown to be a sound method for improving recognition performance. However, while recent multi-modal-based models have reported promising results, they generally fail to leverage sophisticated fusion strategies that model sufficient cross-modal interactions when producing the fusion representation; instead, they rely on lengthy and inconsistent data preprocessing and feature crafting. To address this limitation, we propose an end-to-end multi-modal transformer framework for multi-modal human state recognition, called Husformer. Specifically, we propose to use cross-modal transformers, which inspire one modality to reinforce itself by directly attending to latent relevance revealed in other modalities, to fuse different modalities while ensuring sufficient awareness of the cross-modal interactions introduced. Subsequently, we utilize a self-attention transformer to further prioritize contextual information in the fusion representation. Using two such attention mechanisms enables effective and adaptive adjustments to noise and interruptions in multi-modal signals during the fusion process and with respect to the high-level features. Extensive experiments on two human emotion corpora (DEAP and WESAD) and two cognitive workload datasets (MOCAS and CogLoad) demonstrate that in human state recognition, our Husformer outperforms both state-of-the-art multi-modal baselines and the use of a single modality by a large margin, especially when dealing with raw multi-modal signals. We also conducted an ablation study to show the benefits of each component in Husformer.
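
The core fusion idea described in the abstract, letting one modality's sequence attend to another modality's sequence via cross-attention and then refining the fused representation with self-attention, can be illustrated with a minimal sketch. The module names, dimensions, number of modalities, and the classification head below are assumptions made for illustration; this is not the authors' released Husformer implementation.

```python
# Minimal sketch of Husformer-style fusion: cross-modal attention, then
# self-attention over the fused sequence. Names, dimensions, and layer
# counts are illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn


class CrossModalBlock(nn.Module):
    """One modality (query) reinforces itself by attending to another (key/value)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, query_mod, context_mod):
        attended, _ = self.attn(query_mod, context_mod, context_mod)
        x = self.norm1(query_mod + attended)   # residual connection + norm
        return self.norm2(x + self.ff(x))      # position-wise feed-forward


class FusionSketch(nn.Module):
    """Fuse two modalities with cross-modal attention, refine with self-attention."""

    def __init__(self, dim: int = 64, heads: int = 4, num_classes: int = 3):
        super().__init__()
        self.a_from_b = CrossModalBlock(dim, heads)  # modality A attends to B
        self.b_from_a = CrossModalBlock(dim, heads)  # modality B attends to A
        self.self_attn = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, mod_a, mod_b):
        fused = torch.cat(
            [self.a_from_b(mod_a, mod_b), self.b_from_a(mod_b, mod_a)], dim=1
        )                                       # concatenate along the sequence axis
        fused = self.self_attn(fused)           # prioritize contextual information
        return self.head(fused.mean(dim=1))     # temporal pooling + classification


# Usage: two modality streams with different sequence lengths, shared feature dim.
model = FusionSketch(dim=64)
logits = model(torch.randn(8, 20, 64), torch.randn(8, 50, 64))
print(logits.shape)  # torch.Size([8, 3])
```

This sketch only shows the attention pattern for two modalities; the full framework would direct a cross-modal transformer from each modality toward every other modality and stack multiple such layers before the self-attention stage.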

