EURO: ESPnet Unsupervised ASR Open-source Toolkit

11/30/2022
by   Dongji Gao, et al.

This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by Wav2vec-U, originally implemented in FAIRSEQ, which leverages self-supervised speech representations and adversarial training. Beyond wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, which provide flexible frontends drawn from 27 self-supervised models and a variety of graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments on three mainstream self-supervised models demonstrate the toolkit's effectiveness and achieve state-of-the-art UASR performance on the TIMIT and LibriSpeech datasets. EURO will be publicly available at https://github.com/espnet/espnet, with the aim of promoting this exciting and emerging research area through open-source activity.
