FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

05/25/2022
by Alexis Conneau, et al.

We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages, built on top of the FLoRes-101 machine translation benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation, and Retrieval. In this paper, we provide baselines for these tasks based on multilingual pre-trained models such as mSLAM. The goal of FLEURS is to enable speech technology in more languages and to catalyze research in low-resource speech understanding.


