Multi-Modal End-User Programming of Web-Based Virtual Assistant Skills

by Michael H. Fischer, et al.

While Alexa can perform over 100,000 skills on paper, its capability covers only a fraction of what is possible on the web. To reach the full potential of an assistant, it is desirable that individuals can create skills to automate their personal web browsing routines. Many seemingly simple routines, however, such as monitoring COVID-19 stats for their hometown, detecting changes in their child's grades online, or sending personally addressed messages to a group, cannot be automated without conventional programming concepts such as conditional and iterative evaluation. This paper presents VASH (Voice Assistant Scripting Helper), a new system that empowers users to create useful web-based virtual assistant skills without learning a formal programming language. With VASH, the user demonstrates their task of interest in the browser and issues a few voice commands, such as naming the skill and adding conditions on the action. VASH turns these multi-modal specifications into skills that can be invoked by voice on a virtual assistant. These skills are represented in a formal programming language we designed called WebTalk, which supports parameterization, function invocation, conditionals, and iterative execution. VASH is a fully working prototype that runs in the Chrome browser on real-world websites. Our user study shows that users have many web routines they wish to automate, 81% of which can be expressed using VASH. We found that VASH is easy to learn, and that a majority of the users in our study want to use our system.

