Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments

04/17/2021
by Andrea Burns, et al.

In recent years, vision-language research has shifted toward tasks that require more complex reasoning, such as interactive question answering, visual common sense reasoning, and question-answer plausibility prediction. However, the datasets used for these problems fail to capture the complexity of real inputs and multimodal environments, such as ambiguous natural language requests and diverse digital domains. We introduce Mobile app Tasks with Iterative Feedback (MoTIF), a dataset with natural language commands for the greatest number of interactive environments to date. MoTIF is the first such dataset to include natural language requests that cannot be satisfied in their interactive environment, and we obtain follow-up questions on this subset to enable research on task uncertainty resolution. We perform initial feasibility classification experiments and reach an F1 score of only 37.3, which indicates the need for richer vision-language representations and improved architectures to reason about task feasibility.
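For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score for a binary feasibility classifier can be computed. It is not the authors' evaluation code: the function name `f1_score`, the label convention (1 = infeasible task, 0 = feasible task), and the toy labels are all assumptions made for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 for one class from parallel lists of labels.

    Assumption for this sketch: label 1 marks an infeasible task and
    label 0 a feasible one; the abstract does not specify an encoding.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # With no true positives, precision and recall are both zero
    # (or undefined), so F1 is reported as 0.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not data from MoTIF.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which makes it a common choice for imbalanced labels.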


