EGOFALLS: A visual-audio dataset and benchmark for fall detection using egocentric cameras

09/08/2023
by Xueyi Wang, et al.

Falls are a significant and often fatal risk for vulnerable populations such as the elderly. Previous work has addressed fall detection using data captured by a single sensor, such as images or accelerometers. In this work, we rely on multimodal descriptors extracted from videos captured by egocentric cameras. Our proposed method includes a late decision fusion layer that builds on top of the extracted descriptors. Furthermore, we collect a new dataset on which we assess our proposed approach. We believe this is the first public dataset of its kind. The dataset comprises 10,948 video samples from 14 subjects. We conducted ablation experiments to assess the performance of individual feature extractors, the fusion of visual information, and the fusion of both visual and audio information. Moreover, we experimented with internal and external cross-validation. Our results demonstrate that fusing audio and visual information through late decision fusion improves detection performance, making it a promising tool for fall prevention and mitigation.
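
To make the idea of late decision fusion concrete, the following is a minimal sketch, not the authors' implementation. It assumes each modality (for example, a visual classifier and an audio classifier) has already produced a fall probability, and it combines them with a weighted average. The function name `late_fusion`, the modality names, the weights, and the 0.5 threshold are all hypothetical choices made for illustration; the abstract does not specify how the fusion layer is built.

```python
from typing import Mapping, Optional, Tuple


def late_fusion(
    probs: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Fuse per-modality fall probabilities with a weighted average.

    probs maps a modality name to that modality's predicted fall
    probability. weights (hypothetical, defaults to equal weighting)
    maps the same names to non-negative weights. Returns the fused
    probability and whether it reaches the decision threshold.
    """
    if not probs:
        raise ValueError("at least one modality is required")
    if weights is None:
        # Assumption: with no prior knowledge, treat modalities equally.
        weights = {name: 1.0 for name in probs}
    total = sum(weights[name] for name in probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = sum(weights[name] * p for name, p in probs.items()) / total
    return fused, fused >= threshold


if __name__ == "__main__":
    # Illustrative scores only; these are not results from the paper.
    fused, is_fall = late_fusion({"visual": 0.4, "audio": 0.8})
    print(f"fused probability: {fused:.2f}, fall detected: {is_fall}")
```

Because the fusion operates on decisions rather than on raw features, each modality's classifier can be trained and evaluated independently, which is what makes per-extractor ablations like those described above straightforward to run.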

research
11/22/2017

Integrating both Visual and Audio Cues for Enhanced Video Caption

Video caption refers to generating a descriptive sentence for a specific...
research
07/20/2023

Perceptual Quality Assessment of Omnidirectional Audio-visual Signals

Omnidirectional videos (ODVs) play an increasingly important role in the...
research
07/18/2022

Visual Representations of Physiological Signals for Fake Video Detection

Realistic fake videos are a potential tool for spreading harmful misinfo...
research
05/12/2022

Fall detection using multimodal data

In recent years, the occurrence of falls has increased and has had detri...
research
02/12/2022

Audio-Visual Fusion Layers for Event Type Aware Video Recognition

Human brain is continuously inundated with the multisensory information ...
research
08/15/2022

Elderly Fall Detection Using CCTV Cameras under Partial Occlusion of the Subjects Body

One of the possible dangers that older people face in their daily lives ...
research
05/31/2019

Multimodal Joint Emotion and Game Context Recognition in League of Legends Livestreams

Video game streaming provides the viewer with a rich set of audio-visual...
