Learning behavioral context recognition with multi-stream temporal convolutional networks

by   Aaqib Saeed, et al.

Smart devices of everyday use (such as smartphones and wearables) are increasingly integrated with sensors that provide immense amounts of information about a person's daily life such as behavior and context. The automatic and unobtrusive sensing of behavioral context can help develop solutions for assisted living, fitness tracking, sleep monitoring, and several other fields. Towards addressing this issue, we raise the question: can a machine learn to recognize a diverse set of contexts and activities in a real-life through joint learning from raw multi-modal signals (e.g. accelerometer, gyroscope and audio etc.)? In this paper, we propose a multi-stream temporal convolutional network to address the problem of multi-label behavioral context recognition. A four-stream network architecture handles learning from each modality with a contextualization module which incorporates extracted representations to infer a user's context. Our empirical evaluation suggests that a deep convolutional network trained end-to-end achieves an optimal recognition rate. Furthermore, the presented architecture can be extended to include similar sensors for performance improvements and handles missing modalities through multi-task learning without any manual feature engineering on highly imbalanced and sparsely labeled dataset.


Recognizing Detailed Human Context In-the-Wild from Smartphones and Smartwatches

The ability to automatically recognize a person's behavioral context can...

Deep Tracking: Visual Tracking Using Deep Convolutional Networks

In this paper, we study a discriminatively trained deep convolutional ne...

Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition

Gesture recognition is getting more and more popular due to various appl...

Multimodal Continuous Emotion Recognition using Deep Multi-Task Learning with Correlation Loss

In this study, we focus on continuous emotion recognition using body mot...

Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations

In this study, we try to address the problem of leveraging visual signal...

Please sign up or login with your details

Forgot password? Click here to reset