Zero-shot keyword spotting for visual speech recognition in-the-wild

07/23/2018
by Themos Stafylakis, et al.

Visual keyword spotting (KWS) is the problem of estimating whether a text query occurs in a given recording using only video information. This paper focuses on visual KWS for words unseen during training, a realistic, practical setting that has so far received no attention from the community. To this end, we devise an end-to-end architecture comprising (a) a state-of-the-art visual feature extractor based on spatiotemporal Residual Networks, (b) a grapheme-to-phoneme model based on sequence-to-sequence neural networks, and (c) a stack of recurrent neural networks that learn how to correlate visual features with the keyword representation. Unlike prior work on KWS, which tries to learn word representations merely from sequences of graphemes (i.e., letters), we propose the use of a grapheme-to-phoneme encoder-decoder model that learns how to map words to their pronunciation. We demonstrate that our system obtains very promising visual-only KWS results on the challenging LRS2 database for keywords unseen during training. We also show that our system outperforms a baseline that addresses KWS via automatic speech recognition (ASR), and that it drastically improves on other recently proposed ASR-free KWS methods.
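To make the zero-shot idea concrete, here is a toy, stdlib-only sketch of the data flow, not the authors' model. It rests on several loudly labeled assumptions: the learned grapheme-to-phoneme network is replaced by a hypothetical lookup table `TOY_G2P`; the keyword encoder is a hypothetical bag-of-phonemes vector; the visual front-end is assumed to emit one phoneme-like score vector per frame; and the trained stack of recurrent networks is swapped for a plain per-frame cosine similarity followed by max-pooling over time. The point it illustrates is that a keyword is represented through its pronunciation, so a new word can be scored without retraining the video side.

```python
import math
from typing import Dict, List, Sequence

# HYPOTHETICAL stand-in for the seq2seq grapheme-to-phoneme model:
# a tiny lookup table (ARPAbet-style symbols). A real G2P model would
# generate a pronunciation for any spelling, including unseen words.
TOY_G2P: Dict[str, List[str]] = {
    "about": ["AH", "B", "AW", "T"],
    "water": ["W", "AO", "T", "ER"],
}
PHONES = ["AH", "B", "AW", "T", "W", "AO", "ER"]


def keyword_embedding(word: str) -> List[float]:
    """Hypothetical keyword encoder: normalized bag-of-phonemes vector."""
    phones = TOY_G2P[word]
    return [phones.count(p) / len(phones) for p in PHONES]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(frames: Sequence[Sequence[float]], word: str) -> float:
    """Swapped-in matcher (not the paper's RNN stack): best per-frame
    similarity between assumed visual features and the keyword vector."""
    kw = keyword_embedding(word)
    return max(cosine(f, kw) for f in frames)


def spot(frames: Sequence[Sequence[float]], word: str,
         threshold: float = 0.5) -> bool:
    """Decide whether the keyword occurs; the threshold is an assumption."""
    return keyword_score(frames, word) >= threshold


# Two assumed frames of "visual" features over the PHONES inventory.
frames = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0],
]
print(spot(frames, "about"), spot(frames, "water"))
```

In the paper the correlation between video and keyword is learned end to end; the fixed cosine-and-max rule here only mimics the interface (a keyword representation in, a detection score out) so the zero-shot mechanism is easy to follow.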

research
08/15/2023

End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations

Conventional keyword search systems operate on automatic speech recognit...
research
01/13/2017

End-to-End ASR-free Keyword Search from Speech

End-to-end (E2E) systems have achieved competitive results compared to c...
research
12/06/2019

Semantic Mask for Transformer based End-to-End Speech Recognition

Attention-based encoder-decoder model has achieved impressive results fo...
research
03/28/2022

Filler Word Detection and Classification: A Dataset and Benchmark

Filler words such as `uh' or `um' are sounds or words people use to sign...
research
10/29/2021

Visual Keyword Spotting with Attention

In this paper, we consider the task of spotting spoken keywords in silen...
research
10/30/2017

Deep word embeddings for visual speech recognition

In this paper we present a deep learning architecture for extracting wor...
research
08/31/2023

PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords

This study presents a novel zero-shot user-defined keyword spotting mode...