A Temporal Sequence Learning for Action Recognition and Prediction

06/17/2019
by Sangwoo Cho, et al.

In this work, we present a method that represents a video as a sequence of words and learns the temporal sequencing of those words as the key information for predicting and recognizing human actions. We leverage core concepts from the Natural Language Processing (NLP) literature on sentence classification to solve the problems of action prediction and action recognition. Each frame is converted into a word, represented as a vector using the Bag of Visual Words (BoW) encoding method. The words are then combined into a sentence that represents the video. The word sequences of different actions are learned with a simple but effective Temporal Convolutional Neural Network (T-CNN) that captures the temporal sequencing of information in a video sentence. We demonstrate that a key characteristic of the proposed method is its low latency, i.e., its ability to predict an action accurately from a partial sequence (sentence). Experiments on two datasets, UCF101 and HMDB51, show that the method on average reaches 95% of its full-video accuracy after observing only the first half of the video frames. Results also demonstrate that our method achieves performance comparable to the state of the art in action recognition (i.e., at the completion of the sentence), in addition to action prediction.

This work was supported in part by the National Science Foundation under grant IIS-1212948.
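The pipeline described above (frame to BoW word, words to video sentence, temporal convolution over the sentence, prediction from a prefix) can be sketched in a few functions. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation. It assumes hard-assignment BoW against a precomputed codebook of local frame descriptors, a single temporal convolution layer with ReLU and max-over-time pooling in the style of sentence-classification CNNs, and a softmax classifier. All function names, filter widths, and the random weights are hypothetical. Action prediction is modeled by feeding only the first fraction of the video sentence to the same network.

```python
import math
import random

# Hypothetical sketch: names, shapes and weights are illustrative assumptions.

def bow_histogram(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest codeword and return an
    L1-normalized histogram, used here as the frame's 'word' vector."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])))
        hist[nearest] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def video_sentence(frames, codebook):
    """One word vector per frame, kept in temporal order."""
    return [bow_histogram(f, codebook) for f in frames]

def temporal_conv_maxpool(sentence, filters):
    """Slide each (weights, bias) filter of width w over time, apply ReLU and
    max-pool over time, giving one feature per filter. A filter wider than the
    sentence has no valid window and yields 0.0."""
    feats = []
    for weights, bias in filters:
        w = len(weights)
        best = 0.0  # ReLU folded in: max(0, s_1, ..., s_T)
        for t in range(len(sentence) - w + 1):
            s = bias
            for i in range(w):
                s += sum(a * b for a, b in zip(weights[i], sentence[t + i]))
            best = max(best, s)
        feats.append(best)
    return feats

def classify(features, class_weights, class_biases):
    """Linear layer followed by a numerically stable softmax over classes."""
    logits = [sum(f * w for f, w in zip(features, ws)) + b
              for ws, b in zip(class_weights, class_biases)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def predict_partial(sentence, ratio, filters, class_weights, class_biases):
    """Action prediction from only the first `ratio` fraction of the sentence."""
    n = max(1, math.ceil(ratio * len(sentence)))
    feats = temporal_conv_maxpool(sentence[:n], filters)
    return classify(feats, class_weights, class_biases)

if __name__ == "__main__":
    random.seed(0)
    dim, vocab, n_classes = 2, 4, 3
    codebook = [[random.random() for _ in range(dim)] for _ in range(vocab)]
    # 8 frames, each with 5 random local descriptors (stand-ins for real features)
    frames = [[[random.random() for _ in range(dim)] for _ in range(5)]
              for _ in range(8)]
    sentence = video_sentence(frames, codebook)
    # Untrained random filters of widths 2 and 3
    filters = [([[random.gauss(0, 1) for _ in range(vocab)] for _ in range(w)], 0.0)
               for w in (2, 3)]
    cw = [[random.gauss(0, 1) for _ in filters] for _ in range(n_classes)]
    cb = [0.0] * n_classes
    for ratio in (0.25, 0.5, 1.0):
        probs = predict_partial(sentence, ratio, filters, cw, cb)
        print(ratio, [round(p, 3) for p in probs])
```

Because max-over-time pooling produces a fixed-length feature vector regardless of sentence length, the same filters can score a prefix of any length. In this sketch, that property is what makes early prediction from a partial sentence a matter of truncating the input.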


