MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

03/01/2023
by Mohamed Anwar, et al.

We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation, providing 1200 hours of audio-visual speech in 9 languages. It is fully transcribed and covers 6 English-to-X as well as 6 X-to-English translation directions. To the best of our knowledge, this is the first open benchmark for audio-visual speech-to-text translation and the largest open benchmark for multilingual audio-visual speech recognition. Our baseline results show that MuAViC is effective for building noise-robust speech recognition and translation models. We make the corpus available at https://github.com/facebookresearch/muavic.
