Exploring the GLIDE model for Human Action-effect Prediction

08/01/2022
by   Fangjun Li, et al.
We address the following action-effect prediction task: given an image depicting an initial state of the world and an action expressed in text, predict an image depicting the state of the world following the action. The prediction should have the same scene context as the input image. We explore the use of the recently proposed GLIDE model for performing this task. GLIDE is a generative neural network that can synthesize (inpaint) masked areas of an image, conditioned on a short piece of text. Our idea is to mask out the region of the input image where the effect of the action is expected to occur. GLIDE is then used to inpaint the masked region, conditioned on the text of the required action. In this way, the resulting image has the same background context as the input image, updated to show the effect of the action. We give qualitative results from experiments using the EPIC dataset of ego-centric videos labelled with actions.
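The mask-then-inpaint idea can be illustrated with a minimal sketch. This is not the authors' implementation and it does not call GLIDE: `inpaint_fn` is a hypothetical stand-in for any text-conditioned inpainting model, the rectangular `box` region is an assumption (the abstract does not say how the region is chosen), and the mask convention (1 = keep, 0 = inpaint) is chosen here for illustration only.

```python
import numpy as np

def make_box_mask(height, width, box):
    """Build a mask that is 1.0 where pixels are kept and 0.0 inside the box.

    `box` is (top, left, bottom, right) in pixels; a rectangular region is an
    assumption made for this sketch.
    """
    top, left, bottom, right = box
    mask = np.ones((height, width), dtype=np.float32)
    mask[top:bottom, left:right] = 0.0
    return mask

def predict_effect(image, action_text, box, inpaint_fn):
    """Mask the region where the effect is expected, then inpaint it.

    `inpaint_fn(masked_image, mask, text)` is a hypothetical callable standing
    in for a text-conditioned inpainting model; it must return an image with
    the same shape as `masked_image`.
    """
    height, width = image.shape[:2]
    mask = make_box_mask(height, width, box)[..., None]  # broadcast over channels
    masked_image = image * mask
    generated = inpaint_fn(masked_image, mask[..., 0], action_text)
    # Keep the original pixels outside the mask so the scene context is
    # preserved; take the generated pixels only inside the masked region.
    return image * mask + generated * (1.0 - mask)

def dummy_inpaint(masked_image, mask, text):
    """Placeholder model: fills everything with mid-grey, ignoring the text."""
    return np.full_like(masked_image, 0.5)

# Toy usage on a random 64x64 RGB image with values in [0, 1].
image = np.random.default_rng(0).random((64, 64, 3)).astype(np.float32)
box = (16, 16, 48, 48)
result = predict_effect(image, "open the drawer", box, dummy_inpaint)
```

Compositing the output with the original pixels outside the mask is what guarantees an unchanged background in this sketch; a real inpainting model would replace `dummy_inpaint`.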

