Hierarchical Long Short-Term Concurrent Memory for Human Interaction Recognition

by Xiangbo Shu, et al.
Nanjing University
University of Central Florida
Columbia University

In this paper, we address the problem of human interaction recognition in videos by exploring the long-term inter-related dynamics among multiple persons. Recently, Long Short-Term Memory (LSTM) has become a popular choice for modeling individual dynamics in single-person action recognition, owing to its ability to capture temporal motion information over a range of time. However, existing RNN models capture the dynamics of human interaction only by simply combining the dynamics of all individuals or by modeling them as a whole. Such models neglect the inter-related dynamics of how human interactions change over time. To this end, we propose a novel Hierarchical Long Short-Term Concurrent Memory (H-LSTCM) that models the long-term inter-related dynamics among a group of persons in order to recognize human interactions. Specifically, we first feed each person's static features into a Single-Person LSTM to learn the single-person dynamics. Subsequently, the outputs of all Single-Person LSTM units are fed into a novel Concurrent LSTM (Co-LSTM) unit, which mainly consists of multiple sub-memory units, a new cell gate, and a new co-memory cell. Within a Co-LSTM unit, each sub-memory unit stores the motion information of one individual, while the cell gate selectively integrates the inter-related motion information between multiple interacting persons from the sub-memory units and the co-memory cell stores it. Extensive experiments on four public datasets validate the effectiveness of the proposed H-LSTCM in comparison with baseline and state-of-the-art methods.
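The two-level data flow described above can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation: the abstract gives no equations, so the gate formulas, the weight shapes, the sharing of one Single-Person LSTM across all persons, the fixed number of persons, and every name here (`LSTMCell`, `CoLSTM`, `run_hlstcm`) are assumptions made for illustration. In particular, forming the co-memory cell as a cell-gate-weighted sum of the sub-memory cells is only one plausible reading of "selectively integrates".

```python
import math
import random


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def linear(W, x):
    # Matrix-vector product; W is a list of weight rows.
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell with random weights (biases omitted for brevity)."""

    def __init__(self, n_in, n_hid, rng):
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {k: rand_matrix(n_hid, n_in + n_hid, rng) for k in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = sigmoid(linear(self.W["i"], z))
        f = sigmoid(linear(self.W["f"], z))
        o = sigmoid(linear(self.W["o"], z))
        g = [math.tanh(a) for a in linear(self.W["g"], z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


class CoLSTM:
    """Hypothetical Co-LSTM: per-person sub-memories, cell gates, one co-memory cell."""

    def __init__(self, n_in, n_hid, n_persons, rng):
        self.n_hid = n_hid
        self.subs = [LSTMCell(n_in, n_hid, rng) for _ in range(n_persons)]
        self.W_gate = [rand_matrix(n_hid, n_in + n_hid, rng) for _ in range(n_persons)]
        self.W_out = rand_matrix(n_hid, n_persons * n_in + n_hid, rng)

    def step(self, xs, h, sub_cs):
        new_cs, co_c = [], [0.0] * self.n_hid
        for p, x in enumerate(xs):
            # Sub-memory p updates its own cell from person p's input and the
            # shared hidden state; only the cell state of the reused LSTMCell is kept.
            _, c_p = self.subs[p].step(x, h, sub_cs[p])
            # Assumed cell gate: decides how much of sub-memory p enters the co-memory.
            pi = sigmoid(linear(self.W_gate[p], x + h))
            co_c = [a + g * c for a, g, c in zip(co_c, pi, c_p)]
            new_cs.append(c_p)
        all_x = [a for x in xs for a in x]
        o = sigmoid(linear(self.W_out, all_x + h))
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, co_c)]
        return h_new, new_cs


def run_hlstcm(video, n_feat, n_hid, seed=0):
    """video[t][p] is the static feature vector (a list) of person p at frame t."""
    rng = random.Random(seed)
    n_persons = len(video[0])
    single = LSTMCell(n_feat, n_hid, rng)  # assumed shared across persons
    co = CoLSTM(n_hid, n_hid, n_persons, rng)
    hs = [[0.0] * n_hid for _ in range(n_persons)]
    cs = [[0.0] * n_hid for _ in range(n_persons)]
    sub_cs = [[0.0] * n_hid for _ in range(n_persons)]
    co_h = [0.0] * n_hid
    for frame in video:
        # Level 1: per-person dynamics from static features.
        for p, feat in enumerate(frame):
            hs[p], cs[p] = single.step(feat, hs[p], cs[p])
        # Level 2: inter-related dynamics across persons.
        co_h, sub_cs = co.step(hs, co_h, sub_cs)
    return co_h
```

Calling `run_hlstcm(video, n_feat=4, n_hid=8)` on a clip with two persons returns an 8-dimensional interaction representation; in a trained model, a classifier over interaction labels would sit on top of this vector.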




