Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples

03/30/2023
by Hyeonggon Ryu, et al.

The objective of this work is to explore the learning of visually grounded speech (VGS) models from a multilingual perspective. Bilingual VGS models are generally trained with an equal number of spoken captions from both languages. In practice, however, the number of available spoken captions can be imbalanced across languages. Our key contribution is to leverage a high-resource language in a bilingual VGS model to improve the performance of a low-resource language. We introduce two methods to distill the knowledge of the high-resource language into the low-resource language: (1) incorporating a strong pre-trained high-resource language encoder and (2) using semantically similar spoken captions. Our experiments show that combining these two approaches enables the low-resource language to surpass the performance of its monolingual and bilingual counterparts on cross-modal retrieval tasks.
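The cross-modal retrieval evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the metric or the similarity function, so the code below assumes Recall@K over cosine similarity, a common choice for speech-to-image retrieval. The function names (`cosine`, `recall_at_k`) and the toy two-dimensional embeddings are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_k(speech_embs, image_embs, k):
    """Fraction of spoken captions whose paired image ranks in the top k.

    Assumption: speech_embs[i] and image_embs[i] form a ground-truth pair.
    """
    hits = 0
    for i, speech in enumerate(speech_embs):
        sims = [cosine(speech, image) for image in image_embs]
        # Rank image indices from most to least similar to this caption.
        ranked = sorted(range(len(image_embs)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(speech_embs)


# Toy embeddings (hypothetical); a real model would produce these vectors.
speech = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
images = [[0.9, 0.1], [0.7, 0.7], [0.2, 0.9]]

for k in (1, 2):
    print(f"R@{k} =", recall_at_k(speech, images, k))
```

The same function gives the image-to-speech direction if the two arguments are swapped, so both retrieval directions can be reported from one pair of embedding lists.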


