MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain

by Francesco Ragusa, et al.

Wearable cameras make it possible to acquire images and videos from the user's perspective. These data can be processed to understand human behavior. Although human behavior analysis has been thoroughly investigated in third-person vision, it remains understudied in egocentric settings, particularly in industrial scenarios. To encourage research in this field, we present MECCANO, a multimodal dataset of egocentric videos for studying human behavior understanding in industrial-like settings. The dataset is multimodal in that it comprises gaze signals, depth maps and RGB videos acquired simultaneously with a custom headset. The dataset has been explicitly labeled for fundamental tasks in human behavior understanding from a first-person view, such as recognizing and anticipating human-object interactions. With the MECCANO dataset, we explore five tasks: 1) Action Recognition, 2) Active Objects Detection and Recognition, 3) Egocentric Human-Objects Interaction Detection, 4) Action Anticipation and 5) Next-Active Objects Detection. We propose a benchmark for studying human behavior in the considered industrial-like scenario, which demonstrates that the investigated tasks and the considered scenario are challenging for state-of-the-art algorithms. To support research in this field, we publicly release the dataset at




