XTREME-S: Evaluating Cross-lingual Speech Representations

03/21/2022
by Alexis Conneau, et al.

We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation, and retrieval. Spanning 102 languages from 10+ language families and 3 different domains, XTREME-S aims to simplify multilingual speech representation evaluation and to catalyze research in "universal" speech representation learning. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines on all downstream tasks, using XLS-R and mSLAM respectively. We motivate the design choices and detail how to use the benchmark. Datasets and fine-tuning scripts are available at https://hf.co/datasets/google/xtreme_s.
