Visually grounded few-shot word acquisition with fewer shots

05/25/2023
by Leanne Nortje, et al.

We propose a visually grounded speech model that acquires new words and their visual depictions from just a few word-image example pairs. Given a set of test images and a spoken query, we ask the model which image depicts the query word. Previous work has simplified this problem either by using an artificial setting with digit word-image pairs or by using a large number of examples per class. We propose an approach that works on natural word-image pairs but with fewer examples, i.e. fewer shots. Our approach uses the given word-image example pairs to mine new unsupervised word-image training pairs from large collections of unlabelled speech and images. Additionally, we use a word-to-image attention mechanism to determine word-image similarity. With this new model, we achieve better performance with fewer shots than any existing approach.
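The test-time task described above (score each candidate image against a spoken query, then pick the best match) can be illustrated with a minimal sketch. This is not the authors' model. It assumes the query word and each image region have already been embedded as vectors in a shared space, and it swaps in a simple softmax-weighted pooling of cosine similarities as a stand-in for the word-to-image attention mechanism. The names `cosine`, `attention_score`, `pick_image` and the `temperature` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def attention_score(word_vec, region_vecs, temperature=0.1):
    """Word-to-image similarity (illustrative assumption, not the paper's method).

    Each region is compared with the word embedding; a softmax over those
    similarities gives attention weights, and the score is the weighted
    average of the similarities.
    """
    sims = [cosine(word_vec, r) for r in region_vecs]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in sims]
    total = sum(exps)
    return sum((e / total) * s for e, s in zip(exps, sims))


def pick_image(word_vec, images):
    """Return the index of the image whose regions best match the query word."""
    scores = [attention_score(word_vec, regions) for regions in images]
    return max(range(len(images)), key=scores.__getitem__)


# Toy data: a 2-D query embedding and two images with two regions each.
query = [1.0, 0.0]
images = [
    [[0.0, 1.0], [0.1, 0.9]],  # no region resembles the query
    [[0.0, 1.0], [0.9, 0.1]],  # second region is close to the query
]
best = pick_image(query, images)
```

In this toy setup `best` is the index of the second image, because one of its regions points in nearly the same direction as the query embedding. A low `temperature` makes the pooling behave almost like taking the best-matching region; a high value moves it toward a plain average over regions.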


research
06/20/2023

Visually grounded few-shot word learning in low-resource settings

We propose a visually grounded speech model that learns new words and th...
research
05/31/2020

Learning to Recognise Words using Visually Grounded Speech

We investigated word recognition in a Visually Grounded Speech model. Th...
research
06/16/2021

Attention-Based Keyword Localisation in Speech using Visual Grounding

Visually grounded speech models learn from images paired with spoken cap...
research
09/09/2019

Language learning using Speech to Image retrieval

Humans learn language by interaction with their environment and listenin...
research
12/10/2020

Direct multimodal few-shot learning of speech and images

We propose direct multimodal few-shot models that learn a shared embeddi...
research
03/28/2022

Word Discovery in Visually Grounded, Self-Supervised Speech Models

We present a method for visually-grounded spoken term discovery. After t...
research
09/01/2020

Hearings and mishearings: decrypting the spoken word

We propose a model of the speech perception of individual words in the p...
