A Textless Metric for Speech-to-Speech Comparison

10/21/2022
by Laurent Besacier, et al.

This paper proposes a textless speech-to-speech comparison metric that compares a speech hypothesis with a speech reference without falling back on their text transcripts. We leverage recently proposed speech2unit encoders (such as HuBERT) to pseudo-transcribe speech utterances into discrete acoustic units, and we propose a simple neural architecture that learns a speech-based metric that correlates well with its text-based counterpart. Such a textless metric could ultimately be useful for evaluating speech-to-speech translation, in particular for oral languages or languages with no reliable ASR system.
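
To make the idea of comparing pseudo-transcripts concrete, here is a minimal sketch in Python. It is not the learned neural metric the paper proposes. It is a hand-written stand-in that assumes each utterance has already been turned into a list of integer unit IDs by some speech2unit encoder, and it scores two such lists with a normalized edit distance. The function names, the repeat-collapsing step, and the example unit IDs are all assumptions made for illustration.

```python
from typing import List, Sequence


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Drop consecutive duplicate unit IDs, e.g. [5, 5, 7] -> [5, 7]."""
    collapsed: List[int] = []
    for u in units:
        if not collapsed or collapsed[-1] != u:
            collapsed.append(u)
    return collapsed


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two unit sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def unit_similarity(hyp_units: Sequence[int], ref_units: Sequence[int]) -> float:
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance.

    Assumption: this simple score only illustrates textless comparison;
    the paper instead learns the metric with a neural architecture.
    """
    hyp = collapse_repeats(hyp_units)
    ref = collapse_repeats(ref_units)
    longest = max(len(hyp), len(ref))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return 1.0 - edit_distance(hyp, ref) / longest


if __name__ == "__main__":
    # Hypothetical unit IDs; a real encoder would produce these from audio.
    hypothesis = [12, 12, 40, 40, 7]
    reference = [12, 40, 40, 9, 9]
    print(round(unit_similarity(hypothesis, reference), 3))
```

A fixed distance like this treats every unit substitution as equally costly, which is one reason a learned metric trained to track a text-based score may be preferable.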

research
09/14/2023

Direct Text to Speech Translation System using Acoustic Units

This paper proposes a direct text to speech translation system using dis...
research
10/26/2021

Assessing Evaluation Metrics for Speech-to-Speech Translation

Speech-to-speech translation combines machine translation with speech sy...
research
04/27/2023

Understanding Shared Speech-Text Representations

Recently, a number of approaches to train speech models by incorporatin...
research
12/16/2022

BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric

End-to-End speech-to-speech translation (S2ST) is generally evaluated wi...
research
02/01/2021

Generative Spoken Language Modeling from Raw Audio

Generative spoken language modeling involves learning jointly the acoust...
research
03/28/2022

vTTS: visual-text to speech

This paper proposes visual-text to speech (vTTS), a method for synthesiz...
research
09/15/2023

Diversity-based core-set selection for text-to-speech with linguistic and acoustic features

This paper proposes a method for extracting a lightweight subset from a ...
