LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

07/31/2020
by Baoxiong Jia, et al.

Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial intelligence. However, several essential components of daily human activities are largely missing from the prior literature, including goal-directed actions, concurrent multi-tasking, and collaboration among multiple agents. We introduce the LEMMA dataset to provide a single home for addressing these missing dimensions, with meticulously designed settings in which the number of tasks and agents varies to highlight different learning objectives. We densely annotate atomic actions with human-object interactions to provide ground truths for the compositionality, scheduling, and assignment of daily activities. We further devise challenging benchmarks for compositional action recognition and action/task anticipation, with baseline models, to measure the capability of compositional action understanding and temporal reasoning. We hope this effort will drive the machine vision community to examine goal-directed human activities and to further study task scheduling and assignment in the real world.
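The compositional annotation scheme described above (atomic actions grounded in human-object interactions, scheduled per agent and per task) lends itself to a nested record structure. Below is a minimal sketch in Python of what one annotated segment might look like; the class and field names (AtomicAction, AgentAnnotation, agent_id, and so on) are hypothetical illustrations for exposition, not the dataset's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a LEMMA-style dense annotation for one video
# segment. Field names are illustrative assumptions, not the real schema.

@dataclass
class AtomicAction:
    verb: str                 # e.g. "pour"
    objects: list[str]        # interacting objects, e.g. ["kettle", "cup"]
    start_frame: int          # segment boundaries in the video
    end_frame: int

@dataclass
class AgentAnnotation:
    agent_id: int             # which agent performs these actions
    task: str                 # the goal-directed task being pursued
    actions: list[AtomicAction] = field(default_factory=list)

# Example: two agents collaborating, each assigned their own task,
# with temporally overlapping atomic actions.
segment = [
    AgentAnnotation(
        agent_id=0,
        task="make tea",
        actions=[AtomicAction("pour", ["kettle", "cup"], 120, 180)],
    ),
    AgentAnnotation(
        agent_id=1,
        task="clean table",
        actions=[AtomicAction("wipe", ["cloth", "table"], 110, 200)],
    ),
]

for ann in segment:
    for act in ann.actions:
        print(ann.agent_id, ann.task, act.verb, act.objects)
```

Grouping atomic actions under an agent-and-task record makes the scheduling and assignment ground truths explicit: task anticipation reads off the task field over time, while compositional action recognition predicts the verb together with its interacting objects.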


Related research

- EgoTaskQA: Understanding Human Tasks in Egocentric Videos (10/08/2022). Understanding human tasks through video observations is an essential cap...
- The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose (07/01/2020). The availability of a large labeled dataset is a key requirement for app...
- THORN: Temporal Human-Object Relation Network for Action Recognition (04/20/2022). Most action recognition models treat human activities as unitary events...
- Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding (11/01/2019). An event happening in the world is often made of different activities an...
- VirtualHome: Simulating Household Activities via Programs (06/19/2018). In this paper, we are interested in modeling complex activities that occ...
- Action2Activity: Recognizing Complex Activities from Sensor Data (11/07/2016). As compared to simple actions, activities are much more complex, but sem...
- ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human Activities (10/11/2022). We introduce ViLPAct, a novel vision-language benchmark for human activi...
