Word Discovery in Visually Grounded, Self-Supervised Speech Models

03/28/2022
by   Puyuan Peng, et al.

We present a method for visually grounded spoken term discovery. After training either a HuBERT or wav2vec2.0 model to associate spoken captions with natural images, we show that powerful word segmentation and clustering capability emerges within the model's self-attention heads. Our experiments reveal that this ability is not present to nearly the same extent in the base HuBERT and wav2vec2.0 models, suggesting that the visual grounding task is a crucial component of the word discovery capability we observe. We also evaluate our method on the Buckeye word segmentation and ZeroSpeech spoken term discovery tasks, where we outperform all currently published methods on several metrics.

research
05/19/2023

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model

In this paper, we show that representations capturing syllabic units eme...
research
05/31/2020

Learning to Recognise Words using Visually Grounded Speech

We investigated word recognition in a Visually Grounded Speech model. Th...
research
07/26/2017

SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set

This paper presents an augmentation of MSCOCO dataset where speech is ad...
research
09/18/2019

Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech

In this paper, we study how word-like units are represented and activate...
research
08/07/2019

Self-Organizing Maps with Variable Input Length for Motif Discovery and Word Segmentation

Time Series Motif Discovery (TSMD) is defined as searching for patterns ...
research
05/25/2023

Visually grounded few-shot word acquisition with fewer shots

We propose a visually grounded speech model that acquires new words and ...
research
06/26/2023

Learning with Difference Attention for Visually Grounded Self-supervised Representations

Recent works in self-supervised learning have shown impressive results o...
