Efficient Multimodal Neural Networks for Trigger-less Voice Assistants

05/20/2023
by Sai Srujana Buddi, et al.

Voice Assistants (VAs) are rapidly adopting multimodal interactions to enhance human-computer interaction. Smartwatches now incorporate trigger-less methods of invoking VAs, such as Raise To Speak (RTS), where the user raises the watch and speaks to the VA without an explicit trigger. Current state-of-the-art RTS systems rely on heuristics and engineered Finite State Machines to fuse gesture and audio data for multimodal decision-making. However, these methods suffer from limited adaptability, limited scalability, and human-induced biases. In this work, we propose a neural-network-based audio-gesture multimodal fusion system that (1) better understands the temporal correlation between audio and gesture data, leading to precise invocations; (2) generalizes to a wide range of environments and scenarios; (3) is lightweight and deployable on low-power devices, such as smartwatches, with quick launch times; and (4) improves productivity in asset development processes.
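As a rough illustration of what learned audio-gesture fusion can look like, the sketch below concatenates time-aligned audio and gesture feature frames, passes each fused frame through a small hidden layer, averages over time, and outputs an invocation probability. It is not the architecture from the paper, which the abstract does not specify: the function names `init_params` and `invocation_score`, the feature dimensions, and the randomly initialized (untrained) weights are all assumptions made for illustration.

```python
import math
import random


def init_params(n_audio, n_gesture, n_hidden, seed=0):
    """Create small random weights for a toy fusion model (hypothetical, untrained)."""
    rng = random.Random(seed)
    n_in = n_audio + n_gesture
    return {
        "w1": [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_in)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.gauss(0.0, 0.1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def invocation_score(audio_frames, gesture_frames, params):
    """Return a probability-like score in (0, 1) for one time window.

    audio_frames and gesture_frames are equal-length lists of per-frame
    feature vectors (assumed to be already time-aligned).
    """
    if not audio_frames or len(audio_frames) != len(gesture_frames):
        raise ValueError("audio and gesture frames must be non-empty and time-aligned")

    n_frames = len(audio_frames)
    n_hidden = len(params["b1"])
    pooled = [0.0] * n_hidden

    for audio, gesture in zip(audio_frames, gesture_frames):
        # Frame-level fusion: concatenate the two modalities.
        fused = list(audio) + list(gesture)
        for j in range(n_hidden):
            z = params["b1"][j] + sum(
                fused[i] * params["w1"][i][j] for i in range(len(fused))
            )
            # Mean-pool the hidden activations over time.
            pooled[j] += math.tanh(z) / n_frames

    logit = params["b2"] + sum(p * w for p, w in zip(pooled, params["w2"]))
    return 1.0 / (1.0 + math.exp(-logit))
```

In a real system the weights would be trained on labeled invocation data and the mean pooling would likely be replaced by a temporal model; the sketch only shows where the two modalities meet, in contrast to hand-written state-machine rules.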

research
10/15/2019

Seeing and Hearing Egocentric Actions: How Much Can We Learn?

Our interaction with the world is an inherently multimodal experience. H...
research
04/24/2023

Deep Audio-Visual Singing Voice Transcription based on Self-Supervised Learning Models

Singing voice transcription converts recorded singing audio to musical n...
research
09/14/2020

Understanding Gesture and Speech Multimodal Interactions for Manipulation Tasks in Augmented Reality Using Unconstrained Elicitation

This research establishes a better understanding of the syntax choices i...
research
07/03/2017

Structure Optimization for Deep Multimodal Fusion Networks using Graph-Induced Kernels

A popular testbed for deep learning has been multimodal recognition of h...
research
12/14/2018

Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition with Multimodal Training

We present an efficient approach for leveraging the knowledge from multi...
research
08/26/2020

Conversations On Multimodal Input Design With Older Adults

Multimodal input systems can help bridge the wide range of physical abil...
research
03/16/2020

A Formal Analysis of Multimodal Referring Strategies Under Common Ground

In this paper, we present an analysis of computationally generated mixed...
