Complex Sequential Understanding through the Awareness of Spatial and Temporal Concepts

05/30/2020
by Bo Pang, et al.

Understanding sequential information is a fundamental task for artificial intelligence. Current neural networks attempt to learn spatial and temporal information as a whole, which limits their ability to represent large-scale spatial information over long-range sequences. Here, we introduce a new modeling strategy called Semi-Coupled Structure (SCS), which consists of deep neural networks that decouple the learning of complex spatial and temporal concepts. A Semi-Coupled Structure can learn to implicitly separate input information into independent parts and process each part separately. Experiments demonstrate that a Semi-Coupled Structure can successfully annotate the outline of an object in images sequentially and perform video action recognition. For sequence-to-sequence problems, a Semi-Coupled Structure can predict future meteorological radar echo images based on observed images. Taken together, our results demonstrate that a Semi-Coupled Structure can improve the performance of LSTM-like models on large-scale sequential tasks.

research
09/23/2019

Learning Coupled Spatial-temporal Attention for Skeleton-based Action Recognition

In this paper, we propose a coupled spatial-temporal attention (CSTA) mo...
research
09/03/2018

YouTube-VOS: Sequence-to-Sequence Video Object Segmentation

Learning long-term spatial-temporal features are critical for many video...
research
03/31/2020

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

Spatial-temporal graphs have been widely used by skeleton-based action r...
research
04/27/2023

Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification

Advanced deep Convolutional Neural Networks (CNNs) have shown great succ...
research
07/17/2022

E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context

Recently, the image-wise implicit neural representation of videos, NeRV,...
research
10/25/2021

Where were my keys? – Aggregating Spatial-Temporal Instances of Objects for Efficient Retrieval over Long Periods of Time

Robots equipped with situational awareness can help humans efficiently f...
research
10/11/2021

EchoVPR: Echo State Networks for Visual Place Recognition

Recognising previously visited locations is an important, but unsolved, ...
