A Textless Metric for Speech-to-Speech Comparison

10/21/2022
by Laurent Besacier, et al.

This paper proposes a textless speech-to-speech comparison metric that compares a speech hypothesis with a speech reference without falling back to their text transcripts. We leverage recently proposed speech2unit encoders (such as HuBERT) to pseudo-transcribe the speech utterances into discrete acoustic units, and propose a simple neural architecture that learns a speech-based metric that correlates well with its text-based counterpart. Such a textless metric could ultimately be useful for evaluating speech-to-speech translation, in particular for oral languages or languages with no reliable ASR system available.
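
To make the idea of comparing two utterances at the unit level concrete, here is a minimal sketch in Python. It is not the learned neural metric the abstract describes: it is a simple, non-learned stand-in (a normalized edit distance over unit sequences) used only for illustration. It assumes each utterance has already been pseudo-transcribed into a sequence of integer unit IDs by some speech2unit encoder; the function names and the example unit IDs are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive units (e.g. 5 5 9 9 2 -> 5 9 2)."""
    return [unit for unit, _ in groupby(units)]


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two unit sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def unit_similarity(hyp_units: Sequence[int], ref_units: Sequence[int]) -> float:
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance.

    Assumption: this hand-written score only illustrates unit-level
    comparison; the paper instead learns its metric with a neural model.
    """
    hyp = collapse_repeats(hyp_units)
    ref = collapse_repeats(ref_units)
    if not hyp and not ref:
        return 1.0
    return 1.0 - edit_distance(hyp, ref) / max(len(hyp), len(ref))


if __name__ == "__main__":
    # Hypothetical unit IDs; a real encoder would produce these from audio.
    hypothesis = [5, 5, 9, 9, 2]
    reference = [5, 9, 2, 2]
    print(unit_similarity(hypothesis, reference))
```

Collapsing repeated units before comparison is one design choice among several; it makes the score less sensitive to speaking rate, at the cost of discarding duration information.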
