Visual Goal-Step Inference using wikiHow

04/12/2021
by Yue Yang, et al.

Procedural events can often be thought of as a high-level goal composed of a sequence of steps. Inferring the sub-sequence of steps of a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose a plausible step towards that goal from among four candidate images. Our task is challenging for state-of-the-art multimodal models. We introduce a novel dataset harvested from wikiHow that consists of 772,294 images representing human actions. We show that the knowledge learned from our data can effectively transfer to other datasets like HowTo100M, increasing the multiple-choice accuracy by 15-20%. Our task will facilitate multimodal reasoning about procedural events.
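
To make the four-way multiple-choice setup concrete, here is a minimal sketch of how a VGSI prediction and its accuracy could be computed. It assumes the goal text and each candidate image have already been mapped to vectors in a shared embedding space by some encoder; that encoder, the cosine-similarity scoring, and the names `choose_step` and `multiple_choice_accuracy` are illustrative assumptions, not the authors' models or code.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def choose_step(goal_vec: Vector, candidate_vecs: List[Vector]) -> int:
    """Return the index of the candidate image most similar to the goal.

    Assumption: goal_vec and candidate_vecs come from a hypothetical
    text/image encoder pair that shares one embedding space.
    """
    scores = [cosine(goal_vec, c) for c in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


def multiple_choice_accuracy(
    examples: List[Tuple[Vector, List[Vector], int]]
) -> float:
    """Fraction of examples where the chosen image matches the gold index."""
    correct = sum(choose_step(g, c) == gold for g, c, gold in examples)
    return correct / len(examples)


# Toy embeddings (made up for illustration): one goal, four candidate images.
goal = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0], [0.5, 0.5]]
predicted = choose_step(goal, candidates)
```

With four candidates per goal, a model choosing uniformly at random would score 25% on this metric, which is the baseline against which multiple-choice accuracy is read.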

