A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition

10/20/2022
by Vijay John, et al.

The robustness of person recognition can be enhanced by exploiting the sensor characteristics of audio, visible cameras, and thermal cameras. Existing multimodal person recognition frameworks are primarily formulated on the assumption that data from every modality is always available. In this paper, we propose a novel trimodal sensor fusion framework that uses audio, visible-camera, and thermal-camera data and addresses the missing modality problem. Within the framework, a novel deep latent embedding network, termed AVTNet, is proposed to learn multiple latent embeddings. A novel loss function, termed the missing modality loss, is based on the triplet loss and accounts for possible missing modalities while the individual latent embeddings are learnt. Additionally, a joint latent embedding of the trimodal data is learnt using a multi-head attention transformer, which assigns attention weights to the different modalities. The different latent embeddings are subsequently used to train a deep neural network. The proposed framework is validated on the Speaking Faces dataset. A comparative analysis with baseline algorithms shows that the proposed framework significantly increases person recognition accuracy while accounting for missing modalities.
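
The abstract does not give the exact form of the missing modality loss, so the following is only a minimal sketch of one way a triplet loss could skip absent modalities. It assumes each sample is a dictionary mapping a modality name to an embedding vector, with `None` marking a missing modality; the names `masked_triplet_loss`, `MODALITIES`, and the margin value are hypothetical and are not taken from the paper.

```python
from math import sqrt

# Hypothetical modality names, chosen for illustration only.
MODALITIES = ("audio", "visible", "thermal")


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def masked_triplet_loss(anchor, positive, negative, margin=1.0):
    """Average the per-modality triplet loss over the modalities that are
    present in all three samples; a modality set to None is skipped.

    This masking rule is an assumption, not the paper's definition.
    """
    losses = []
    for name in MODALITIES:
        a, p, n = anchor.get(name), positive.get(name), negative.get(name)
        if a is None or p is None or n is None:
            continue  # modality missing in at least one sample
        losses.append(triplet_loss(a, p, n, margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy embeddings: the anchor has no thermal data.
anchor = {"audio": [0.0, 0.0], "visible": [1.0, 0.0], "thermal": None}
positive = {"audio": [0.0, 1.0], "visible": [1.0, 0.0], "thermal": [0.5, 0.5]}
negative = {"audio": [3.0, 4.0], "visible": [1.0, 0.5], "thermal": [0.0, 0.0]}

loss = masked_triplet_loss(anchor, positive, negative)
```

In this toy case the thermal term is dropped because the anchor lacks it, so the loss is the mean of the audio and visible terms only. A real implementation would operate on batched tensors from the embedding network rather than on Python lists.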

research
10/23/2021

A Study of Multimodal Person Verification Using Audio-Visual-Thermal Data

In this paper, we study an approach to multimodal person verification us...
research
10/19/2019

Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos

We present an audio-visual multimodal approach for the task of zeroshot ...
research
09/07/2023

Multi-Modality Guidance Network For Missing Modality Inference

Multimodal models have gained significant success in recent years. Stand...
research
04/28/2022

Tag-assisted Multimodal Sentiment Analysis under Uncertain Missing Modalities

Multimodal sentiment analysis has been studied under the assumption that...
research
08/26/2022

TFusion: Transformer based N-to-One Multimodal Fusion Block

People perceive the world with different senses, such as sight, hearing,...
research
09/17/2016

GeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for Multimodal Information Fusion

Data generated from real world events are usually temporal and contain m...
research
09/09/2022

Learning Audio-Visual embedding for Person Verification in the Wild

It has already been observed that audio-visual embedding is more robust ...
