Recognizing indoor human movements, interactions, and activities from multiple sensors provides essential information for building intelligent real-world applications. Detecting a walking person has attracted considerable prior effort using various sensing methods, including acoustic [Algermissen, GeigerJ, Geiger], vision [He_2021_CVPR, Liang, Zheng], radio frequency [Korany, Xu, Zeng], wearables [Gafurov, Mantyjarvi, Nguyen, Rong, Teixeira], and structural vibrations [Ekimov, Pan]. Each approach has limitations and faces challenges related to data scarcity and achieving high sensing accuracy. Acoustic-based methods may pick up irrelevant sounds and are sensitive to ambient audible noise [Algermissen, GeigerJ, Geiger] and sound-emitting objects. Vision-based methods often require a clear visual path, which constrains the sensor installation locations [BenAbdelkader, Zheng]. RF-based methods require dense deployment of instruments to achieve high accuracy [Korany, Xu, Zeng], whereas mobile-based methods [Gafurov, Mantyjarvi, Nguyen, Rong, Teixeira] require the target to carry a specific device. Vibration-based methods, which rely on geophones and accelerometers, are easy to retrofit and can provide high sensing ability [Ekimov, Pan]. However, when applied on their own to the human movement identification problem, they are often sensitive to uncommon indoor interactions with objects (e.g., an object falling, a chair being moved). We focus on two activity-based sensors, microphones and geophones, capturing the audio and vibration signals associated with footstep activity.
With the rise of deep neural networks, a common data-driven approach in activity recognition is to use a unimodal system [Salamon, Hershey, Koutini]. In audio [Droghini, Yichi] and vision [Koch, Vinyals], many studies have demonstrated the use of single-stream unimodal networks, which led to their widespread adoption for other sensory inputs and broadened the domains [Hannun2019, Martinez, Radu, Supratak] that can reap the rewards of efficient deep learning algorithms. However, this success rests on large amounts of well-curated data, which single-stream deep supervised unimodal systems require to achieve the desired task. Moreover, since multimodal data has more dimensions in the input feature space, the space to be explored is much larger (exponentially so) than in the unimodal case, which in turn demands even more data. Compared to audio and visual systems, large amounts of labeled sensory data (such as from geophones or other realms) are much more difficult to acquire owing to privacy issues, complex indoor set-ups for sensor deployment, and the professional knowledge required for data labeling. At the same time, there is strong evidence in the literature [Aytar, Palazzo] that multimodal data improves performance in physical systems by leveraging the extraction of features from mixed signals.
Given the limited availability of well-curated data, one approach is metric-based [Vinyals, Snell, Sung, Oreshkin, Chen2019ACL, Chen_2021_CVPR] contrastive learning, which holds enormous potential to mitigate the labeled-data scarcity of omnipresent sensing systems. This contrastive learning-based approach builds upon a Siamese neural network comprising multi-stream sister networks that share the same weights. Since most existing meta-learners [Finn, Chelsea] rely on a single data modality and a single initialization, other sensor modalities may require significantly different parameters for feature extraction, making it difficult to find a common initialization that solves a complex task distribution. Moreover, if the task distribution involves multiple disjoint tasks, one can anticipate that a set of individual meta-learners will each solve one task more efficiently than a single learner covering the entire distribution. However, assigning each task to one of the meta-learners requires more labeled data, and it is often infeasible to extract the task information when the modes are not clearly disjoint. Consequently, a different approach is needed that leverages the strengths of two sensor modalities while building on existing meta-learning techniques, as proposed in the Siamese architecture.
We propose a multimodal framework using the metric-based contrastive learning approach for human movement identification. To model the audio and geophone features, the network takes a combination of sound and vibration signal pairs from negative and positive samples. The audio and geophone signals are processed by two Convolutional Neural Network (CNN) streams comprising twin sister networks that use a single initialization step and share the same weights. The resulting signal embeddings are fused to identify similar patterns and structural features in the contextual information. Finally, the feature embeddings of the input pairs are tuned using a contrastive loss function to facilitate the human movement detection task. Our results suggest that metric-based contrastive learning can mitigate the impact of data scarcity for omnipresent sensors and increase robustness over unimodal systems. Specifically, training a binary classifier on top of the multimodal meta-learner increases the accuracy and performance of the multimodal system. The results show that the proposed method learns useful audio and geophone representations even with limited data samples. To the best of our knowledge, this is the first work on contrastive representation learning that fuses two different sensor modalities, namely audio and geophone, for human movement detection via footstep recognition.
In summary, our main contributions are as follows:
We propose a new generalized framework for learning two different sensory traits in a shared space via multimodal contrastive learning. We leverage CNN- and LSTM-based models to extract audio and geophone features in a multi-stream network. We then extract apparent similarities using a representation learning-based approach to perform a classification task.
Generally, end-to-end supervised models require a massive amount of well-curated data to generalize to a task of interest. However, our results demonstrate that the metric-based multimodal approach uses a smaller network and significantly improves performance by learning spatio-temporal embeddings of the sound and vibration data. It performs well in the low-data regime, increasing its importance in real-world use cases.
We extensively evaluate our proposed framework on a self-collected dataset of 700 pairs. The dataset contains audio and geophone signals of footstep movement, created by capturing the sound and vibration signals in an indoor lab environment. We carefully labeled the dataset and included complex examples for training the proposed architecture. The model achieves encouraging results in extensive experiments on this dataset.
We briefly explain how to utilize the framework, its limitations, and the impact of training examples on evaluation results.
In the following sections, we present the methods and experiments in detail.
Siamese networks are generic models with a distinctive neural network architecture: identical twin networks that share the same weights and parameters to compare the similarity between input pairs. A loss function, in principle, links the networks by computing a similarity measure between the pairs and evaluating the learned feature embeddings of the twin networks [Palazzo]. This work proposes a novel multimodal contrastive learning framework for omnipresent sensing systems in general by fusing two distinct sensor modalities. It enables the use of the Siamese architecture in multimodal form and provides the ability to discriminate features with fewer samples, making it more useful for present and future applications. The core idea of the proposed approach is to attract positive pairs and repulse negative pairs while simultaneously calculating the similarity score between the output pairs, as shown in Figure 3. Positive pairs consist of sound and vibration signals containing footstep movement from random users walking in an indoor environment. In contrast, negative pairs contain no human-induced footstep activity in either the sound or the vibration signal. For uniformity, the two streams of the first subnetwork receive the positive audio and geophone pair, whereas the second subnetwork receives the negative pair. Both positive and negative pairs pass through a pre-processing step in which time-frequency audio and geophone features are computed to serve as input to the network. Finally, the audio and geophone signals are processed using Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to balance the spatial and temporal characteristics of the data sequences.
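The pair-assembly step described above can be sketched as follows. This is our reading of the setup, not the authors' code: each sample is an (audio, geophone) tuple, and we label a pairing 0 when both members contain footsteps and 1 when the second member is background, matching the indicator convention used later by the contrastive loss.

```python
import random

def make_pairs(footstep, background, seed=0):
    """Assemble (sample_1, sample_2, label) triples for the twin networks.

    `footstep` holds (audio, geophone) windows containing footsteps;
    `background` holds windows with no human-induced activity.
    Label 0 marks a similar pairing, label 1 a distinct one.
    """
    rng = random.Random(seed)
    pairs = []
    for sample in footstep:
        pairs.append((sample, rng.choice(footstep), 0))    # similar pair
        pairs.append((sample, rng.choice(background), 1))  # distinct pair
    rng.shuffle(pairs)
    return pairs
```

Drawing the partners at random keeps the similar/distinct classes balanced without enumerating every possible combination up front.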
2.1 Multimodal meta learning with contrastive loss
We use multiple input modalities to achieve a metric learning objective, since we are interested in learning from data fusion. The framework takes a pair of inputs comprising audio and geophone signals. As the structures of these sensory inputs are dissimilar, finding a precise feature mapping between them is non-trivial. To maximize the similarity between an input pair, the embeddings of each signal representation are therefore transformed into a shared space. For this purpose, a contrastive learning-based Siamese architecture narrows the gap between the modalities and learns joint embeddings of the sound and vibration signals while maximizing the similarity score between the two input pairs.
In formulation, a sample of input pairs for a multimodal dataset is given by (x_a, x_g), where x_a ∈ A and x_g ∈ G, respectively. Moreover, the audio and geophone signals of the i-th and k-th samples are represented as (x_a^i, x_g^i) and (x_a^k, x_g^k), respectively. We assume that A and G represent the individual spaces for the sound and vibration signals, respectively; our goal is to construct two encoder networks that transform both sensor modalities into a shared space S. In formulation, the audio encoder f_a and the geophone encoder f_g can be represented as f_a : A → S and f_g : G → S, respectively. For continuity, f_a and f_g map log-compressed Mel-filterbanks into the latent shared space S. We express the log-compressed Mel-filterbanks as X_a ∈ R^(F×T) and X_g ∈ R^(F×T), where F and T are the number of frequency bins and time frames, respectively, of the sound and vibration signal. A CNN with several convolutional layers of varying size is used to create the shared space. To extract features in this space, a dense layer produces a single-dimensional feature vector. The outputs of the two encoders (f_a and f_g) are concatenated using a compatibility function. Contrastive learning extracts a shared latent space using this compatibility function to measure the similarity of positive and negative pairs separately and to maximize the agreement between the latent embeddings of the input features. Finally, the output of the compatibility function from the two twin networks is sent to a contrastive loss function to generate the model's output, as shown in Fig. 3.
The proposed framework takes two input pairs x_1 and x_2, where x_1 = (x_a^1, x_g^1) and x_2 = (x_a^2, x_g^2), and compares their distance in the shared space S to compute the similarity according to a contrastive loss function [Chopra]. This approach allows us to compute the loss between two kinds of training pairs, identical (positive-positive) and distinct (negative-negative). The aim is to learn embeddings with a small distance D for identical pairings and a margin value m that separates the representations of distinct pairs. In this scenario, an indicator value of zero implies that the pairs are comparable (similar), whereas a value of one shows that they are distinct. In practice, a traditional cross-entropy loss function tries to learn and forecast class probabilities separately for each sample to solve the classification problem. In metric learning, by contrast, a contrastive loss function learns to operate on the network's representations and their distance relative to each other. Unlike the cross-entropy loss, the contrastive loss predicts the relative distances between the input pairs, so that a threshold value can be leveraged as a tradeoff to distinguish the presence of human movement in the input pairs. We define the contrastive loss function in Equation 2,

L(Y, x_1, x_2) = (1 − Y) D² + Y max(m − D, 0)²,    (2)

where m is the margin that separates the dissimilar pairs. We set the margin m to 1 during training and evaluation. We use the binary indicator Y to determine whether the input pairs belong to the same set or not.
where D is the Euclidean distance between the two learned representations from the two subnetworks (Eq. 3), D = ||f(x_1) − f(x_2)||_2.
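The loss and distance above can be written as a minimal numpy sketch (Y = 0 for similar pairs, Y = 1 for distinct pairs, margin m = 1 by default):

```python
import numpy as np

def euclidean_distance(e1, e2):
    # D in Eq. 3: L2 distance between the two fused embeddings.
    return np.linalg.norm(e1 - e2)

def contrastive_loss(e1, e2, y, margin=1.0):
    """Contrastive loss of Eq. 2.

    Similar pairs (y = 0) are pulled together by the D^2 term; distinct
    pairs (y = 1) are pushed apart until their distance exceeds the margin.
    """
    d = euclidean_distance(e1, e2)
    return (1 - y) * d**2 + y * max(margin - d, 0.0)**2
```

Note that a distinct pair already separated by more than the margin contributes zero loss, which is what lets the margin act as the decision tradeoff mentioned above.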
Further, in the following sections, we discuss the individual network architectures, as shown in Figure 4, for human movement identification using two subnetworks to process audio and geophone signals.
2.2 Convolutional neural network-based audio and geophone encoders
Here, we describe the audio and geophone encoders for converting log-compressed Mel-filterbanks into a common representation. To capture the spatial and temporal information of the signal, we propose a convolutional architecture consisting of four blocks. Each block consists of a 2D CNN followed by a rectified linear unit (ReLU) activation, and further applies batch normalization, a max-pooling operation, and dropout to avoid overfitting. The first convolutional layer takes log-Mel spectrograms of size 999 × 50 as input and applies 8 filters of size 5 × 5. The first layer's outputs feed a second convolutional layer with 16 filters of size 3 × 3, which connects to a third layer with 32 filters of size 3 × 3, followed by a fourth convolutional layer with 32 filters of size 3 × 3. Finally, the extracted features from the convolutional layers are flattened to generate a 1D feature vector.
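The feature-map dimensions through the four blocks can be traced with a short sketch. The 'same' convolutions and a 2 × 2 max-pool with stride 2 after each block are our assumptions, since the text does not state the pooling configuration:

```python
def conv_stack_shapes(h=999, w=50, filters=(8, 16, 32, 32)):
    """Trace (height, width, channels) through the four conv blocks.

    Assumes 'same' convolutions (spatial size unchanged) and a 2x2
    max-pool with stride 2 after each block, so each pool halves both
    spatial dimensions (integer floor division).
    """
    shapes = [(h, w, 1)]  # input log-Mel spectrogram, one channel
    for f in filters:
        h, w = h // 2, w // 2
        shapes.append((h, w, f))
    flat = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
    return shapes, flat
```

Under these assumptions the 999 × 50 spectrogram shrinks to 62 × 3 with 32 channels, giving a flattened vector of 5952 values; different pooling choices would change these numbers.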
Next, to generate the output of each twin network, we use a concatenation layer (Eq. 1) to fuse the extracted one-dimensional feature vectors of the audio and geophone encoders. For uniformity, the configuration and weights of the audio and geophone encoders are kept the same for both twin networks in the proposed contrastive learning framework. Finally, benefiting from the extracted features of the two twin networks, we apply the contrastive loss function to compute the distance-based similarity score.
2.3 Recurrent neural network-based audio and geophone encoders
To further strengthen this study, we also built a recurrent neural network-based Siamese architecture to examine the effect of the temporal information in the audio and geophone signals. This section describes the audio and geophone encoders based on long short-term memory (LSTM), which capture longer-range dynamics in the temporal information. We use three LSTM blocks to build the network architecture for the audio and geophone encoders: the first LSTM block consists of 400 neurons, the second of 200 neurons, and the third of 100 neurons. To increase network performance and avoid overfitting, each LSTM block uses dropout and batch normalization. A first linear layer of 50 neurons is followed by a second linear layer of 40 neurons to map the output to the hidden units. As in the convolutional architecture, we pass the output of each twin network to a concatenation layer (Eq. 1) to extract a one-dimensional feature vector from the audio and geophone encoders. Finally, we compute the loss using the distance-based similarity measure on the features extracted from the two subnetworks.
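For intuition about the size of this stack, the following sketch counts parameters for the stated LSTM and linear layers. The 50-dimensional per-frame input (one Mel vector per time step) is our assumption, and we count one combined bias vector per gate set (Keras convention); frameworks with separate input/recurrent biases report slightly larger totals.

```python
def lstm_params(input_dim, hidden):
    # 4 gates; each gate has input->hidden and hidden->hidden weights
    # plus one bias vector of size `hidden`.
    return 4 * (hidden * (input_dim + hidden) + hidden)

def dense_params(in_dim, out_dim):
    # weight matrix plus bias vector.
    return in_dim * out_dim + out_dim

def encoder_params():
    # 50-dim Mel frames -> LSTM(400) -> LSTM(200) -> LSTM(100)
    # -> Linear(50) -> Linear(40), per the architecture described above.
    return (lstm_params(50, 400) + lstm_params(400, 200)
            + lstm_params(200, 100)
            + dense_params(100, 50) + dense_params(50, 40))
```

Under these assumptions each encoder holds roughly 1.33 M parameters, the bulk of them in the first 400-unit LSTM block.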
For consistency, the geophone encoder is kept identical to the audio encoder in all its essence, i.e., it has the same number of convolutional or LSTM layers and parameters as the audio encoder.
3.1.1 Geophone module
This section introduces the in-house sensing hardware that we use to capture the vibration signals produced by footstep-induced structural vibrations, as presented in [Shijia], with a few changes to the original system design. As shown in Figure 5, the sensing unit consists of four main components: the geophone, the amplifier module, the analog-to-digital converter, and the communication module (Raspberry Pi). We place the geophone system on the floor, enclosed in a noise-suppressing aluminum box that helps shield the sensor from high-frequency interference and DC noise.
The geophone transforms the velocity of the structural vibration of the observed surface into voltage. We use an SM-24 geophone for its sensitivity in our frequency range of interest (0-200 Hz). When people walk, the vertical floor vibration induces only a minimal voltage (see the SM-24 datasheet: https://cdn.sparkfun.com/datasheets/Sensors/Accelerometers/SM-24). Therefore, to capture human movement, the system needs to amplify the signal; we amplify the sensing node's output by a factor of approximately 2200 using a custom-developed amplification board. We select this setting to prevent signal clipping while obtaining high signal resolution for footsteps of varying strength. We convert the amplified analog signal into a digital signal with a 24-bit ADC module sampled at 4000 Hz. We collect these amplified and digitized signals of human-induced structural vibrations for model training and evaluation.
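The amplify-and-digitize chain can be sketched as follows. The ±2.5 V full-scale range is an illustrative assumption on our part; the text only specifies the roughly 2200× gain and a 24-bit ADC sampled at 4000 Hz.

```python
import numpy as np

def digitize(signal_v, gain=2200.0, full_scale_v=2.5, bits=24):
    """Amplify a raw geophone voltage trace and quantize it like the ADC.

    gain and full_scale_v are illustrative; signals beyond the ADC's
    full-scale range are clipped before quantization.
    """
    amplified = np.clip(signal_v * gain, -full_scale_v, full_scale_v)
    lsb = 2 * full_scale_v / (2**bits)  # smallest resolvable voltage step
    codes = np.round(amplified / lsb).astype(np.int64)
    return codes, lsb
```

With 24 bits over a 5 V span the quantization step is about 0.3 µV, which is why the large analog gain is applied before, not after, digitization.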
3.1.2 Microphone module
Here we introduce the experimental setup, consisting of a communication module (Raspberry Pi) and an eight-channel microphone array (TAMAGO), used to record audio signals in an indoor environment (see Figure 5). The audio signal is sampled at 16,000 Hz with 24-bit resolution. Positive samples are carefully selected so that they contain the subtle sounds of a person walking in the environment. In contrast, negative samples include human speech, keyboard typing, and a few silent recordings. We collect the audio signals in real time but process them offline. Although TAMAGO can record eight channels, we use only one channel for model training and evaluation for simplicity.
We collected audio and geophone data samples. The dataset consists of sound and vibration signals associated with a person moving in the lab environment. We continuously recorded the data using the sensing devices described earlier and carefully annotated the recordings to create a good-quality dataset for our training algorithms. We annotated 700 positive pairs in which the sound and vibration signal of a moving person are present, and separated 700 negative pairs that contain no human presence. In this study, we use the complete set of 700 + 700 = 1400 pairs. We did not record subject age or gender information during data collection.
3.3 Feature extraction
We propose to train the models using time-frequency features as input. All audio and geophone signals are 10 seconds long. The sampling frequency of the sound signal is 16 kHz, whereas that of the vibration signal is 4 kHz. For uniformity, each signal is zero-padded or clipped on the right side. We computed the short-time Fourier transform (STFT) with a window length of 20 ms, projected the STFT onto 256 bins, and used a Hamming window with 50% signal overlap. The complex spectrum of the signal can be expressed as

X(f, t) = |X(f, t)| e^(jφ(f, t)),    (4)

where |X(f, t)| is the magnitude and φ(f, t) is the phase spectrum for frequency bin f in frame t.
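As a sanity check on these settings, a small sketch (ours, not the authors' code) computes the number of STFT frames implied by a 20 ms window with 50% overlap and splits one frame's spectrum into magnitude and phase:

```python
import numpy as np

def stft_frames(n_samples, sr, win_ms=20, overlap=0.5):
    """Number of STFT frames for a signal with the paper's settings."""
    win = int(sr * win_ms / 1000)   # window length in samples
    hop = int(win * (1 - overlap))  # hop size for 50% overlap
    return 1 + (n_samples - win) // hop

def mag_phase(frame):
    # Decompose the complex spectrum: X(f, t) = |X(f, t)| * exp(j*phi(f, t))
    spectrum = np.fft.rfft(frame)
    return np.abs(spectrum), np.angle(spectrum)
```

With these settings, both the 10 s audio signal at 16 kHz and the 10 s geophone signal at 4 kHz yield 999 frames, consistent with the 999 × 50 spectrogram size reported for both modalities.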
We construct triangular filters linearly spaced on the Mel scale to extract log-Mel spectrograms with 50 Mel bands (Eq. 5, 6). We then convert and normalize the values into log magnitudes (Eq. 7), resulting in spectrograms of size 999 × 50.
For consistency, we follow the same procedure to extract features from the vibration signals: we compute log-Mel spectrograms with a 20 ms window length, an STFT with 256 bins, and a Hamming window with 50% overlap, again resulting in spectrograms of size 999 × 50. Using identical spatial and temporal features for both sensor modalities means the network architecture does not need to change, keeping the model consistent across modalities.
3.4 Experimental protocol
In this section, we perform the experiments using the proposed multimodal framework for the collected dataset. Further, we evaluate the results of the convolutional and recurrent neural network-based multimodal architectures by separately training the individual networks. Finally, we compare the results of multimodal systems for human movement detection.
Table 1: Top-1 accuracy of the proposed multimodal systems.

Modality             | Features         | Top-1 Accuracy (%)
Multimodal (2D CNN)  | Audio + Geophone | 99.98 / 99.98 / 99.89
Multimodal (LSTM)    | Audio + Geophone | 99.99 / 99.99 / 99.86
3.4.1 Train/Val details
We train all models using the in-house annotated dataset, which we separate into training, validation, and test sets. From the set of 700 positive and 700 negative pairs (1400 pairs in total), we train on 500 similar and 500 dissimilar pairs, i.e., 1000 pairs (1,000,000 combinations). Similarly, we use 100 positive and 100 negative pairs, 200 pairs in total (40,000 combinations), for each of the validation and test sets. The pairs are unique across the training, validation, and test sets, i.e., no pair sample appears in more than one set. We optimized the hyper-parameters empirically and selected the best values for all experiments. For model training, we used a batch size of 250 and stopped training at 30 epochs via an early-stopping criterion. We use the Adam optimizer with an initial learning rate of 1e-03. We train the model on an Nvidia A100 GPU; training for 30 epochs took 592 minutes. We report the training and validation accuracies for both proposed multimodal systems in Figure 6, and the training and validation losses in Figure 7.
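One consistent reading of the combination counts above is that each of the n pairs in a split can be matched against every pair in that split, giving n² pair-of-pairs combinations; a one-liner (our interpretation, not stated explicitly in the text) reproduces the quoted numbers:

```python
def pair_combinations(n_pairs):
    # Each of the n pairs in a split can be matched against every pair
    # (including itself): n * n candidate combinations for the twin network.
    return n_pairs ** 2
```

This reproduces the 1,000,000 training combinations for 1000 pairs and the 40,000 combinations for each 200-pair validation/test split.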
3.4.2 Evaluation results
Here, we present the findings on human movement detection using both unimodal methods and the proposed multimodal systems, evaluating the performance of the multimodal systems independently. We combine the audio and geophone pairs for multimodal training and use both CNN- and LSTM-based networks. We calculate the distance between each similar and distinct pair using the Euclidean distance of the contrastive loss function. We optimized the decision threshold on this distance empirically and selected the best value in all experiments for both systems; a threshold of 0.5 maximized the validation accuracy of both proposed multimodal systems. We report the Top-1 accuracies for each model in Table 1. State-of-the-art results are obtained with both the CNN- and LSTM-based networks, and our proposed framework outperforms by a good margin. We report the ablation comparison (in absolute terms) using the validation and test splits with unique samples. The significant improvement of our multimodal architecture shows that there is complementary information in the shared weights that benefits human movement recognition using a combination of audio and geophone data.
3.4.3 Impact of Batch size
We also perform experiments to evaluate the impact of the pre-training batch size (see Figures 6 and 7), as larger batch sizes produce more negative examples per batch and facilitate convergence [chen20j]. We find that, on average, a batch size of 250 provides more stability and faster convergence during training. However, increasing the batch size further did not noticeably affect validation-set accuracy.
3.4.4 Impact of training examples
We next investigate the performance of our proposed framework in a low-data setting. For this purpose, we trained the algorithm using only 200 pairs and performed the end-task classification on the evaluation set. The classifier's accuracy dropped to 80%, and the trained model showed significant signs of overfitting: in a low-data regime of 200 pairs, the model cannot learn the feature embeddings. In contrast, increasing the samples from 200 to 500 pairs raises accuracy significantly and generalizes better (see Figure 8) on the validation set, owing to the larger number of training examples per modality. We conclude that a minimum number of good-quality training examples is as essential as an exemplary architecture for improving the model's generalization. Finally, our proposed architecture consistently outperforms, underlining the importance of multimodal data fusion with a Siamese-based neural architecture.
We propose a two-stream subnetwork by leveraging the Siamese neural network architecture, using two different modalities, audio, and geophone, for human movement detection in an indoor environment. We showcase the importance of our multimodal framework using ablations on both CNNs and LSTMs encoders. We demonstrate the importance of our feature fusion in the embedding space by calculating similarities between positive and negative pairs. On human movement detection, our methodology significantly improves the performance in low training samples. It is observed that two sensor modalities unravel each other in shared space, helping to reduce data scarcity in real-world systems significantly. We hope that this work will pave the path for the efficient deployment of deep neural networks in physical systems.