ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human Activities

10/11/2022
by Terry Yue Zhuo, et al.

We introduce ViLPAct, a novel vision-language benchmark for human activity planning. It is designed for a task in which embodied AI agents reason about and forecast humans' future actions, given video clips of their initial activities and their intents expressed in text. The dataset consists of 2.9k videos extended with intents via crowdsourcing, a multi-choice question test set, and four strong baselines. One of the baselines implements a neurosymbolic approach based on a multi-modal knowledge base (MKB), while the others are deep generative models adapted from recent state-of-the-art (SOTA) methods. According to our extensive experiments, the key challenges are compositional generalization and effective use of information from both modalities.
