VASTA: A Vision and Language-assisted Smartphone Task Automation System

We present VASTA, a novel vision and language-assisted Programming By Demonstration (PBD) system for smartphone task automation. Development of a robust PBD automation system requires overcoming three key challenges: first, how to make a particular demonstration robust to positional and visual changes in the user interface (UI) elements; second, how to recognize changes in the automation parameters to make the demonstration as generalizable as possible; and third, how to recognize from the user utterance what automation the user wishes to carry out. To address the first challenge, VASTA leverages state-of-the-art computer vision techniques, including object detection and optical character recognition, to accurately label interactions demonstrated by a user, without relying on the underlying UI structures. To address the second and third challenges, VASTA takes advantage of advanced natural language understanding algorithms to analyze the user utterance, both to trigger the VASTA automation scripts and to determine the automation parameters for generalization. We run an initial user study that demonstrates the effectiveness of VASTA at clustering user utterances, understanding changes in the automation parameters, detecting desired UI elements, and, most importantly, automating various tasks. A demo video of the system is available here:




