Models of Visually Grounded Speech Signal Pay Attention To Nouns: a Bilingual Experiment on English and Japanese

02/08/2019
by William N. Havard, et al.

We investigate the behaviour of attention in neural models of visually grounded speech trained on two languages: English and Japanese. Experimental results show that attention focuses on nouns, and that this behaviour holds for two typologically very different languages. We also draw parallels between artificial neural attention and human attention, and show that neural attention focuses on word endings, as has been theorised for human attention. Finally, we investigate how two visually grounded monolingual models can be used to perform cross-lingual speech-to-speech retrieval. For both languages, we release the bilingual speech-image corpora, enriched with part-of-speech tags and forced alignments, to the community for reproducible research.

research
10/10/2022

YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding

Visually grounded speech (VGS) models are trained on images paired with ...
research
02/01/2023

Visually Grounded Keyword Detection and Localisation for Low-Resource Languages

This study investigates the use of Visually Grounded Speech (VGS) models...
research
08/06/2021

Cross-lingual Capsule Network for Hate Speech Detection in Social Media

Most hate speech detection research focuses on a single language, genera...
research
04/09/2018

Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech

In this paper, we explore the learning of neural network embeddings for ...
research
07/14/2021

ZR-2021VG: Zero-Resource Speech Challenge, Visually-Grounded Language Modelling track, 2021 edition

We present the visually-grounded language modelling track that was intro...
research
05/19/2023

Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model

In this paper, we show that representations capturing syllabic units eme...
research
11/21/2019

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech

In this paper, we present a method for learning discrete linguistic unit...
