Learning Action Concept Trees and Semantic Alignment Networks from Image-Description Data

09/08/2016
by Jiyang Gao, et al.

Action classification in still images has been a popular research topic in computer vision. Labelling large-scale datasets for action classification requires tremendous manual work, which is hard to scale up. Moreover, the action categories in such datasets are pre-defined and their vocabularies are fixed. However, humans may describe the same action with different phrases, which makes vocabulary expansion difficult for traditional fully-supervised methods. We observe that large amounts of images with sentence descriptions are readily available on the Internet. These sentence descriptions can be regarded as weak labels for the images; they contain rich information and could be used to learn flexible expressions of action categories. We propose a method to learn an Action Concept Tree (ACT) and an Action Semantic Alignment (ASA) model for classification from image-description data via a two-stage learning process. We also build a new dataset for the task of learning actions from descriptions. Experimental results show that our method significantly outperforms several baseline methods.


