Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion

02/18/2022
by Masahiro Yasuda, et al.

We tackle a challenging task: multi-view and multi-modal event detection, which detects events across a wide area of a real environment using data from distributed cameras and microphones together with their weak labels. In this task, distributed sensors are used complementarily to capture events that are difficult to capture with a single sensor, such as a series of actions by people moving through an intricately laid-out room, or communication between people located far apart in a room. For sensors to cooperate effectively in such a situation, the system should be able to exchange information among sensors and combine, in a complementary manner, the information that is useful for identifying events. As such a mechanism, we propose Transformer-based multi-sensor fusion (MultiTrans), which combines multi-sensor data on the basis of the relationships between features from different viewpoints and modalities. In experiments on a dataset newly collected for this task, our proposed method using MultiTrans improved event detection performance and outperformed the comparison methods.
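The idea of combining sensor data according to the relationships between features can be illustrated with a minimal sketch of scaled dot-product self-attention over one feature vector per sensor. This is not the MultiTrans architecture, whose details are not given in the abstract. The function names (`fuse_sensors`, `mean_pool`), the identity query/key/value projections, the single attention head, the mean pooling into a clip-level embedding, and the toy feature values are all assumptions chosen for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_sensors(tokens):
    """Single-head scaled dot-product self-attention over sensor tokens.

    tokens: list of equal-length feature vectors, one per sensor.
    Assumption: identity query/key/value projections (a trained model
    would learn these). Returns (fused_tokens, attention_weights).
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    fused, weights = [], []
    for query in tokens:
        # How strongly this sensor attends to every sensor, itself included.
        row = softmax([dot(query, key) / scale for key in tokens])
        weights.append(row)
        # Each fused token is the attention-weighted sum of all sensor tokens.
        fused.append(
            [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        )
    return fused, weights


def mean_pool(tokens):
    """Average the fused tokens into one clip-level embedding."""
    count = len(tokens)
    return [sum(t[d] for t in tokens) / count for d in range(len(tokens[0]))]


# Hypothetical per-sensor features (illustrative values, not from the paper).
tokens = [
    [1.0, 0.0, 0.5, 0.0],  # camera A
    [0.9, 0.1, 0.4, 0.0],  # camera B
    [0.0, 1.0, 0.0, 0.5],  # microphone
]
fused, weights = fuse_sensors(tokens)
clip_embedding = mean_pool(fused)
```

In this toy input, camera A's attention row gives more weight to camera B than to the microphone, because the two camera features are similar. A clip-level embedding like `clip_embedding` could then feed a classifier trained on clip-level (weak) labels, though how the paper actually uses its weak labels is not stated in the abstract.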

