Actor and Action Modular Network for Text-based Video Segmentation

11/02/2020
by Jianhua Yang, et al.

Actor and action semantic segmentation is a challenging problem that requires joint actor and action understanding and learns to segment from pre-defined actor and action label pairs. However, existing methods for this task fail to distinguish actors that share the same super-category and to identify actor-action pairs that fall outside the fixed actor and action vocabulary. Recent studies have extended this task to use textual queries instead of word-level actor-action pairs, so that the actor and action can be flexibly specified. In this paper, we focus on the text-based actor and action segmentation problem, which performs fine-grained actor and action understanding in the video. Previous works predicted segmentation masks from the merged heterogeneous features of a given video and textual query, but they ignored the linguistic variation of the textual query and the visual semantic discrepancy of the video, which led to asymmetric matching between the convolved volumes of the video and the global query representation. To alleviate this problem, we propose a novel actor and action modular network that individually localizes the actor and action in two separate modules. We first learn the actor-/action-related content for the video and textual query, and then match them in a symmetrical manner to localize the target region. The target region, which includes the desired actor and action, is then fed into a fully convolutional network to predict the segmentation mask. The whole model enables joint learning for the actor-action matching and segmentation, and achieves state-of-the-art performance on the A2D Sentences and J-HMDB Sentences datasets.
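The two-module matching idea can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the feature extractors, the similarity measure, or how the two modules are fused. Cosine similarity, the plain average of the two score maps, and the names `cosine_map` and `localize` are all assumptions made here for illustration only.

```python
import numpy as np

def cosine_map(visual, query):
    """Cosine similarity between every spatial feature and one query vector.

    visual: (H, W, D) feature map; query: (D,) embedding.
    Returns an (H, W) score map.
    """
    v = visual / (np.linalg.norm(visual, axis=-1, keepdims=True) + 1e-8)
    q = query / (np.linalg.norm(query) + 1e-8)
    return v @ q

def localize(actor_feats, action_feats, actor_query, action_query):
    """Match each module's visual content against its own query content.

    Assumption: the two module scores are fused by a plain average.
    """
    actor_score = cosine_map(actor_feats, actor_query)
    action_score = cosine_map(action_feats, action_query)
    return 0.5 * (actor_score + action_score)

# Toy data: random features, with both queries copied from location (1, 2)
# so that this location should match best in both modules.
rng = np.random.default_rng(0)
H, W, D = 4, 5, 8
actor_feats = rng.normal(size=(H, W, D))
action_feats = rng.normal(size=(H, W, D))
actor_query = actor_feats[1, 2].copy()
action_query = action_feats[1, 2].copy()

score = localize(actor_feats, action_feats, actor_query, action_query)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(score), score.shape))
```

In this toy setup each query is compared only with features from its own module, which is one simple reading of "symmetrical" matching; in the paper the resulting target region is passed on to a fully convolutional network to predict the segmentation mask, a step this sketch leaves out.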

research
03/20/2018

Actor and Action Video Segmentation from a Sentence

This paper strives for pixel-level segmentation of actors and their acti...
research
07/23/2018

Actor-Action Semantic Segmentation with Region Masks

In this paper, we study the actor-action semantic segmentation problem, ...
research
11/13/2013

A Study of Actor and Action Semantic Retention in Video Supervoxel Segmentation

Existing methods in the semantic computer vision community seem unable t...
research
12/30/2015

Actor-Action Semantic Segmentation with Grouping Process Models

Actor-action semantic segmentation made an important step toward advance...
research
12/02/2018

Multi-modal Capsule Routing for Actor and Action Video Segmentation Conditioned on Natural Language Queries

In this paper, we propose an end-to-end capsule network for pixel level ...
research
10/22/2018

Actor-Expert: A Framework for using Action-Value Methods in Continuous Action Spaces

Value-based approaches can be difficult to use in continuous action spac...
research
10/01/2020

RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation

The task of video object segmentation with referring expressions (langua...
