Mining YouTube - A dataset for learning fine-grained action concepts from webly supervised video data

06/03/2019
by Hilde Kuehne, et al.

Action recognition has so far focused mainly on classifying hand-selected, pre-clipped actions, and it has reached impressive results on that task. But with performance approaching its ceiling on current datasets, the next steps in the field will likely have to go beyond this fully supervised classification. One way forward is to move towards less restricted scenarios. In this context, we present a large-scale real-world dataset designed to evaluate learning techniques for human action recognition beyond hand-crafted datasets. To this end, we reverse the usual data collection process and start by annotating a test set of 250 cooking videos. The training data is then gathered by searching for the annotated classes within the subtitles of freely available videos. What makes the dataset unique is that the whole process of collecting the training data and training on it does not involve any human intervention. To address the semantic inconsistencies that arise with this kind of training data, we further propose a semantic hierarchical structure for the mined classes.
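The subtitle search described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `SubtitleLine` type, the `mine_clips` function, the `pad` parameter, the whole-word matching rule, and the example class names are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class SubtitleLine:
    """One timed subtitle entry (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def mine_clips(subtitles, class_names, pad=0.0):
    """Return (class_name, start, end) for each subtitle line that mentions a class.

    Matching is a case-insensitive whole-phrase search. This is an assumed
    rule; the abstract does not specify how class names are matched.
    """
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in class_names
    }
    clips = []
    for line in subtitles:
        for name, pattern in patterns.items():
            if pattern.search(line.text):
                # Pad the clip window, since speech and action rarely align exactly.
                clips.append((name, max(0.0, line.start - pad), line.end + pad))
    return clips


# Hypothetical subtitles and class names, for illustration only.
subs = [
    SubtitleLine(0.0, 4.0, "Today we make pancakes"),
    SubtitleLine(4.0, 9.5, "First crack egg into the bowl"),
    SubtitleLine(9.5, 15.0, "Then whisk egg and add milk"),
]
classes = ["crack egg", "whisk egg", "add milk"]
clips = mine_clips(subs, classes, pad=1.0)
for clip in clips:
    print(clip)
```

Labels obtained this way are noisy: a phrase may be spoken before, after, or without the action being shown, which is the kind of semantic inconsistency the abstract refers to.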


research
06/03/2019

A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation

Action recognition has become a rapidly developing research field within...
research
11/27/2019

Literature Review of Action Recognition in the Wild

The literature review presented below on Action Recognition in the wild ...
research
10/13/2022

Real-time Action Recognition for Fine-Grained Actions and The Hand Wash Dataset

In this paper we present a three-stream algorithm for real-time action r...
research
06/27/2017

Recurrent Residual Learning for Action Recognition

Action recognition is a fundamental problem in computer vision with a lo...
research
06/07/2016

Hand Action Detection from Ego-centric Depth Sequences with Error-correcting Hough Transform

Detecting hand actions from ego-centric depth sequences is a practically...
research
11/28/2018

Unrepresentative video data: A review and evaluation

It is well known that the quality and quantity of training data are sign...
research
08/18/2023

Audiovisual Moments in Time: A Large-Scale Annotated Dataset of Audiovisual Actions

We present Audiovisual Moments in Time (AVMIT), a large-scale dataset of...
