Visual Keyword Spotting with Attention

10/29/2021
by K R Prajwal, et al.

In this paper, we consider the task of spotting spoken keywords in silent video sequences, also known as visual keyword spotting. To this end, we investigate Transformer-based models that ingest two streams, a visual encoding of the video and a phonetic encoding of the keyword, and output the temporal location of the keyword if present. Our contributions are as follows: (1) We propose a novel architecture, the Transpotter, that uses full cross-modal attention between the visual and phonetic streams; (2) We show through extensive evaluations that our model outperforms the prior state-of-the-art visual keyword spotting and lip reading methods on the challenging LRW, LRS2, and LRS3 datasets by a large margin; (3) We demonstrate the ability of our model to spot words under the extreme conditions of isolated mouthings in sign language videos.
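
The abstract gives no architectural details beyond the two input streams and the use of full cross-modal attention, so the following is only a toy sketch of that one idea and not the Transpotter itself. It assumes that "full" attention can be illustrated by concatenating the video-frame features and the phoneme features into one sequence and letting every token attend to every other token. The single attention head, the identity query/key/value projections, the made-up 2-dimensional features, and the names `joint_self_attention` and `cross_mass` are all illustrative assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(video_feats, phoneme_feats):
    """Scaled dot-product attention over the concatenated streams.

    Hypothetical helper: one head, identity projections (no learned
    weights). Returns the attended tokens and the attention matrix.
    """
    tokens = video_feats + phoneme_feats  # one joint sequence
    d = len(tokens[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        # Each token scores every token of BOTH modalities.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w_j * v[i] for w_j, v in zip(w, tokens)) for i in range(d)]
        )
    return outputs, weights


# Made-up inputs: 3 video frames and 2 keyword phonemes, feature size 2.
video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
phonemes = [[0.0, 1.0], [1.0, 0.0]]

outputs, weights = joint_self_attention(video, phonemes)

# Share of each video frame's attention that lands on the phoneme stream.
cross_mass = [sum(w[len(video):]) for w in weights[: len(video)]]
```

In a trained model the projections would be learned and a prediction head on the frame outputs would score where the keyword occurs; here `cross_mass` only shows that each frame distributes part of its attention over the keyword's phonemes.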


research
10/12/2022

Towards visually prompted keyword localisation for zero-resource spoken languages

Imagine being able to show a system a visual depiction of a keyword and ...
research
09/02/2020

Seeing wake words: Audio-visual Keyword Spotting

The goal of this work is to automatically determine whether and when a w...
research
06/30/2022

Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

In this paper, we propose a novel end-to-end user-defined keyword spotti...
research
07/23/2018

Zero-shot keyword spotting for visual speech recognition in-the-wild

Visual keyword spotting (KWS) is the problem of estimating whether a tex...
research
01/03/2020

A Multi-oriented Chinese Keyword Spotter Guided by Text Line Detection

Chinese keyword spotting is a challenging task as there is no visual bla...
research
12/02/2021

Improving Controllability of Educational Question Generation by Keyword Provision

Question Generation (QG) receives increasing research attention in NLP c...
research
10/18/2021

VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge

Keyword wakeup technology has always been a research hotspot in speech p...
