Visually Grounded Keyword Detection and Localisation for Low-Resource Languages

02/01/2023
by Kayode Kolawole Olaleye, et al.

This study investigates the use of Visually Grounded Speech (VGS) models for keyword localisation in speech. The study focusses on two main research questions: (1) Is keyword localisation possible with VGS models? (2) Can keyword localisation be done cross-lingually in a real low-resource setting? Four methods for localisation are proposed and evaluated on an English dataset, with the best-performing method achieving an accuracy of 57%. A dataset containing spoken captions in the Yoruba language is also collected and released for cross-lingual keyword localisation. The cross-lingual model obtains a precision of 16%, which can be improved by initialising from a model pretrained on English data. The study presents a detailed analysis of the model's success and failure modes and highlights the challenges of using VGS models for keyword localisation in low-resource settings.


research
10/10/2022

YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding

Visually grounded speech (VGS) models are trained on images paired with ...
research
06/13/2018

Visually grounded cross-lingual keyword spotting in speech

Recent work considered how images paired with speech can be used as supe...
research
07/12/2023

Useful but Distracting: Keyword Highlights and Time-Synchronization in Captions for Language Learning

Captions provide language learners with a scaffold for comprehension and...
research
02/08/2019

Models of Visually Grounded Speech Signal Pay Attention To Nouns: a Bilingual Experiment on English and Japanese

We investigate the behaviour of attention in neural models of visually g...
research
05/02/2023

Contrastive Speech Mixup for Low-resource Keyword Spotting

Most of the existing neural-based models for keyword spotting (KWS) in s...
research
07/12/2022

Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices

This work introduces BRILLsson, a novel binary neural network-based repr...
research
02/02/2022

Keyword localisation in untranscribed speech using visually grounded speech models

Keyword localisation is the task of finding where in a speech utterance ...
