Seeing wake words: Audio-visual Keyword Spotting

09/02/2020
by Liliane Momeni, et al.

The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio. We propose a zero-shot method suitable for in-the-wild videos. Our key contributions are: (1) a novel convolutional architecture, KWS-Net, that uses a similarity map as an intermediate representation to separate the task into (i) sequence matching and (ii) pattern detection, which decides whether the word is present and when; (2) a demonstration that, when audio is available, visual keyword spotting improves performance for both clean and noisy audio signals; and (3) evidence that our method generalises to other languages, specifically French and German, achieving performance comparable to English with less language-specific data by fine-tuning the network pre-trained on English. The method exceeds the performance of the previous state-of-the-art visual keyword spotting architecture when trained and tested on the same benchmark, and also that of a state-of-the-art lip reading method.
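
To make the two-stage idea in the abstract concrete, the following is a minimal illustrative sketch, not the KWS-Net implementation. It assumes the keyword and the video have already been encoded into same-dimensional feature sequences, uses cosine similarity for the map, and replaces the learned convolutional pattern detector with a hand-written diagonal score that assumes one frame per keyword unit. The function names `similarity_map` and `detect`, and the `threshold` parameter, are hypothetical.

```python
import numpy as np


def similarity_map(keyword_emb, frame_feats):
    """Sequence matching: cosine similarity between every keyword unit
    and every video frame.

    keyword_emb: array of shape (P, D), one row per keyword unit.
    frame_feats: array of shape (T, D), one row per video frame.
    Returns an array of shape (P, T).
    """
    k = keyword_emb / np.linalg.norm(keyword_emb, axis=1, keepdims=True)
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    return k @ f.T


def detect(sim, threshold=0.5):
    """Toy pattern detection (a stand-in for a learned detector).

    Scores each candidate start frame by the mean similarity along a
    diagonal of length P, assuming one frame per keyword unit.
    Returns (is_present, start_frame, score).
    """
    P, T = sim.shape
    if T < P:
        return False, -1, 0.0
    rows = np.arange(P)
    scores = np.array(
        [sim[rows, t + rows].mean() for t in range(T - P + 1)]
    )
    best = int(np.argmax(scores))
    return bool(scores[best] >= threshold), best, float(scores[best])
```

A keyword that is present shows up as a roughly diagonal high-similarity streak in the map, which is why the "whether" and "when" decisions can both be read off the same representation. A real system would need to handle variable speaking rates, which this fixed-diagonal score does not.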

research
10/29/2021

Visual Keyword Spotting with Attention

In this paper, we consider the task of spotting spoken keywords in silen...
research
08/17/2020

WSRNet: Joint Spotting and Recognition of Handwritten Words

In this work, we present a unified model that can handle both Keyword Sp...
research
12/09/2021

LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading

The aim of this work is to investigate the impact of crossmodal self-sup...
research
11/08/2021

Cascaded Multilingual Audio-Visual Learning from Videos

In this paper, we explore self-supervised audio-visual models that learn...
research
06/08/2023

Matching Latent Encoding for Audio-Text based Keyword Spotting

Using audio and text embeddings jointly for Keyword Spotting (KWS) has s...
research
08/31/2023

PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords

This study presents a novel zero-shot user-defined keyword spotting mode...
research
02/17/2022

A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning

Audio-only-based wake word spotting (WWS) is challenging under noisy con...
