JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT

10/05/2022
by Mayumi Ohta, et al.

JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automatic speech recognition and end-to-end speech translation. It inherits the core philosophy of JoeyNMT, a minimalist NMT toolkit built on PyTorch, seeking simplicity and accessibility. JoeyS2T's workflow is self-contained, covering data pre-processing, model training, prediction, and evaluation, and is seamlessly integrated into JoeyNMT's compact and simple code base. On top of JoeyNMT's state-of-the-art Transformer-based encoder-decoder architecture, JoeyS2T provides speech-oriented components such as convolutional layers, SpecAugment, CTC loss, and WER evaluation. Despite its simplicity compared to prior implementations, JoeyS2T performs competitively on English speech recognition and English-to-German speech translation benchmarks. The implementation is accompanied by a walk-through tutorial and is available at https://github.com/may-/joeys2t.
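
To make the WER evaluation mentioned above concrete, the following is a minimal sketch of word error rate: the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. It is a generic illustration, not JoeyS2T's own implementation; the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenizes on whitespace and applies no
    text normalization (casing, punctuation), which real evaluation
    pipelines usually do before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For instance, `word_error_rate("a b", "a x")` counts one substitution over two reference words. Because insertions are counted against the reference length, the value can exceed 1.0 when the hypothesis is much longer than the reference.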
