Representing Videos based on Scene Layouts for Recognizing Agent-in-Place Actions

04/04/2018
by Ruichi Yu, et al.

We address the recognition of agent-in-place actions, which are associated with agents who perform them and places where they occur, in the context of outdoor home surveillance. We introduce a representation of the geometry and topology of scene layouts so that a network can generalize from the layouts observed in the training set to unseen layouts in the test set. This Layout-Induced Video Representation (LIVR) abstracts away low-level appearance variance and encodes geometric and topological relationships of places in a specific scene layout. LIVR partitions the semantic features of a video clip into different places to force the network to learn place-based feature descriptions; to predict the confidence of each action, LIVR aggregates features from the place associated with an action and its adjacent places on the scene layout. We introduce the Agent-in-Place Action dataset to show that our method allows neural network models to generalize significantly better to unseen scenes.
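
The abstract describes two steps: partitioning a clip's semantic features by place, then scoring each action from the features of its associated place and that place's neighbours on the scene layout. The sketch below illustrates those two steps in pure Python under assumptions the abstract does not state. The time dimension is assumed to be pooled away already. Per-place features are formed by masked average pooling. Neighbour features are combined by a plain mean, and a hypothetical linear scorer with a sigmoid produces each action's confidence. The function names, place labels, adjacency dictionary and weights are all hypothetical, so this is an illustration of the idea, not the authors' LIVR implementation.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = List[float]


def place_descriptors(features: Sequence[Sequence[Vector]],
                      place_map: Sequence[Sequence[str]]) -> Dict[str, Vector]:
    """Masked average pooling (assumed): one feature vector per place label.

    features[i][j] is the feature vector at grid cell (i, j), already pooled
    over time (assumption); place_map[i][j] is that cell's place label.
    """
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for feat_row, label_row in zip(features, place_map):
        for vec, label in zip(feat_row, label_row):
            if label not in sums:
                sums[label] = [0.0] * len(vec)
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], vec)]
            counts[label] += 1
    return {p: [s / counts[p] for s in sums[p]] for p in sums}


def action_confidence(descriptors: Dict[str, Vector],
                      adjacency: Dict[str, Set[str]],
                      place: str,
                      weights: Vector,
                      bias: float = 0.0) -> float:
    """Hypothetical scorer: mean of the action's place and its layout
    neighbours, then a linear layer and a sigmoid."""
    group = [place] + sorted(adjacency.get(place, set()))
    # Places absent from this clip's layout are skipped.
    vecs = [descriptors[p] for p in group if p in descriptors]
    if vecs:
        pooled = [sum(col) / len(vecs) for col in zip(*vecs)]
    else:
        pooled = [0.0] * len(weights)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 2x2 "frame" with 2-dimensional features and three hypothetical places.
features = [[[1.0, 0.0], [3.0, 0.0]],
            [[0.0, 2.0], [0.0, 4.0]]]
place_map = [["street", "street"],
             ["driveway", "lawn"]]
adjacency = {"driveway": {"street", "lawn"},
             "street": {"driveway"},
             "lawn": {"driveway"}}

descriptors = place_descriptors(features, place_map)
conf = action_confidence(descriptors, adjacency, "driveway", weights=[0.5, 0.5])
print(descriptors)
print(round(conf, 4))
```

Because each descriptor is tied to a place label rather than to pixel positions, the same scorer can be applied to a scene whose layout was never seen in training, provided its places can be labelled. This is the intuition behind the generalization claim, though the actual network is considerably richer than this sketch.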

research
10/09/2017

Geo-referencing Place from Everyday Natural Language Descriptions

Natural language place descriptions in everyday communication provide a ...
research
03/30/2021

Recognizing Actions in Videos from Unseen Viewpoints

Standard methods for video recognition use large CNNs designed to captur...
research
12/27/2021

Visual Place Representation and Recognition from Depth Images

This work proposes a new method for place recognition based on the scene...
research
12/05/2017

Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

The goal of this paper is to take a single 2D image of a scene and recov...
research
07/11/2018

DeepMove: Learning Place Representations through Large Scale Movement Data

Understanding and reasoning about places and their relationships are cri...
research
06/29/2018

Excavate Condition-invariant Space by Intrinsic Encoder

As the human, we can recognize the places across a wide range of changin...
research
07/27/2015

Discovery of Shared Semantic Spaces for Multi-Scene Video Query and Summarization

The growing rate of public space CCTV installations has generated a need...
