UGIF: UI Grounded Instruction Following

11/14/2022
by Sagar Gubbi Venkatesh et al.

New smartphone users often have difficulty engaging with their devices and use only a limited set of features, such as calling and messaging. They are hesitant to explore the smartphone on their own and rely on experienced users to teach them how to use it, but experienced users are not always around to guide them. To help new users learn to use the phone independently, we propose a natural-language-based instruction-following agent that operates over the UI and shows the user how to perform various tasks. Common how-to questions, such as "How to block calls from unknown numbers?", are documented on support sites as a sequence of natural language steps describing what the user should do. We parse these steps using Large Language Models (LLMs) and generate macros that can be executed on-device when the user asks a query. To evaluate this agent, we introduce UGIF-DataSet, a multilingual, multi-modal, UI-grounded dataset for step-by-step task completion on the smartphone. It contains 523 natural language instructions paired with sequences of multilingual UI screens and actions that show how to execute each task in eight languages. We compare the performance of different large language models, including PaLM and GPT-3, and find that the end-to-end task completion success rate is 48% for English UIs but drops to 32% for other languages. We analyze the failure modes of existing models on this task and point out areas for improvement.
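
To make the pipeline concrete, here is a minimal sketch in Python of the step-parsing idea described above. It assumes a hypothetical `llm_complete` function standing in for a PaLM/GPT-3 style completion call; the prompt format and the `op|target` action vocabulary are illustrative assumptions, not the paper's actual design.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class UIAction:
    op: str      # e.g. "tap", "toggle", "input_text" (assumed action vocabulary)
    target: str  # the UI element named in the instruction step

# Hypothetical prompt asking the model to emit one action per step.
PROMPT_TEMPLATE = (
    "Convert each how-to step into exactly one UI action, "
    "one per line, in the format op|target.\n"
    "Steps:\n{steps}\n"
    "Actions:\n"
)

def parse_steps_to_macro(steps: List[str],
                         llm_complete: Callable[[str], str]) -> List[UIAction]:
    """Ask an LLM to translate natural-language how-to steps into a macro,
    i.e. a sequence of UI actions that can be replayed on-device."""
    prompt = PROMPT_TEMPLATE.format(steps="\n".join(f"- {s}" for s in steps))
    # e.g. returns "tap|Settings\ntap|Phone\ntoggle|Block unknown numbers"
    raw = llm_complete(prompt)
    macro = []
    for line in raw.strip().splitlines():
        op, _, target = line.partition("|")
        macro.append(UIAction(op=op.strip(), target=target.strip()))
    return macro
```

On-device execution would then match each `target` against the labels of on-screen elements, which is where the multilingual UI screens in UGIF-DataSet come into play: the how-to steps may be in English while the rendered UI is in another language.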

Related research

08/10/2021
Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior
We study continual learning for natural language instruction generation,...

04/13/2023
"What It Wants Me To Say": Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models
Code-generating large language models translate natural language into co...

05/12/2019
Improving Natural Language Interaction with Robots Using Advice
Over the last few years, there has been growing interest in learning mod...

11/12/2022
Collecting Interactive Multi-modal Datasets for Grounded Language Understanding
Human intelligence can remarkably adapt quickly to new tasks and environ...

05/05/2023
Otter: A Multi-Modal Model with In-Context Instruction Tuning
Large language models (LLMs) have demonstrated significant universal cap...

06/30/2016
Towards A Virtual Assistant That Can Be Taught New Tasks In Any Domain By Its End-Users
The challenge stated in the title can be divided into two main problems....

08/30/2019
PUMICE: A Multi-Modal Agent that Learns Concepts and Conditionals from Natural Language and Demonstrations
Natural language programming is a promising approach to enable end users...
