Learning Cross-lingual Visual Speech Representations

03/14/2023
by   Andreas Zinonos, et al.
0

Cross-lingual self-supervised learning has been a growing research topic in the last few years. However, current works only explored the use of audio signals to create representations. In this work, we study cross-lingual self-supervised visual representation learning. We use the recently-proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled multilingual data, and then fine-tune the visual model on labelled transcriptions. Our experiments show that: (1) multi-lingual models with more data outperform monolingual ones, but, when keeping the amount of data fixed, monolingual models tend to reach better performance; (2) multi-lingual outperforms English-only pre-training; (3) using languages which are more similar yields better results; and (4) fine-tuning on unseen languages is competitive to using the target language in the pre-training set. We hope our study inspires future research on non-English-only speech representation learning.

READ FULL TEXT
research
07/15/2021

CLSRIL-23: Cross Lingual Speech Representations for Indic Languages

We present a CLSRIL-23, a self supervised learning based audio pre-train...
research
05/09/2023

Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models

Self-supervised learning (SSL) has been dramatically successful not only...
research
10/10/2021

Injecting Text and Cross-lingual Supervision in Few-shot Learning from Self-Supervised Models

Self-supervised model pre-training has recently garnered significant int...
research
11/17/2021

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

This paper presents XLS-R, a large-scale model for cross-lingual speech ...
research
05/21/2023

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

Recent models such as XLS-R and Whisper have made multilingual speech te...
research
03/21/2022

XTREME-S: Evaluating Cross-lingual Speech Representations

We introduce XTREME-S, a new benchmark to evaluate universal cross-lingu...
research
10/26/2022

Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning

In this paper, we elaborate upon recipes for building multilingual repre...

Please sign up or login with your details

Forgot password? Click here to reset