Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only

03/29/2018
by Yi-Chen Chen, et al.

Automatic speech recognition (ASR) has been widely researched with supervised approaches, but many low-resourced languages lack aligned audio-text data, so supervised methods cannot be applied to them. In this work, we propose a framework for unsupervised ASR on a read English speech dataset in which audio and text are unaligned. In the first stage, each word-level audio segment in an utterance is represented by a vector extracted with a sequence-to-sequence autoencoder that disentangles phonetic information from speaker information. In the second stage, semantic embeddings of the audio segments are trained from these vector representations using a skip-gram model. Finally, an unsupervised method transforms the semantic embeddings of the audio segments into the text embedding space, and the transformed embeddings are mapped to words. This framework is a step towards unsupervised ASR trained on unaligned speech and text only.
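As a rough illustration of two pieces of this pipeline, the pure-Python sketch below shows (a) standard skip-gram windowing, which turns an utterance's sequence of word-level audio segments into (center, context) training pairs, and (b) a final decoding step, in which an audio semantic embedding is transformed into the text embedding space and assigned the word whose text embedding is most similar. This is a toy sketch under stated assumptions, not the authors' implementation: the names `skipgram_pairs`, `apply_map`, `cosine`, and `decode_segment` are hypothetical. The transformation is assumed to be a linear matrix `W` that has already been learned by some unsupervised method, which the abstract does not specify. Nearest-neighbour retrieval by cosine similarity is likewise an assumed choice for mapping transformed embeddings to words.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def skipgram_pairs(segments: Sequence[int], window: int = 2) -> List[Tuple[int, int]]:
    """Build (center, context) pairs for skip-gram training over one utterance.

    `segments` holds the indices of the word-level audio segments in order.
    Each segment is paired with every neighbour at most `window` positions away
    (standard skip-gram windowing; assumed, not taken from the paper).
    """
    pairs = []
    for i, center in enumerate(segments):
        lo, hi = max(0, i - window), min(len(segments), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, segments[j]))
    return pairs


def apply_map(W: Sequence[Vector], x: Vector) -> Vector:
    """Map an audio semantic embedding into text-embedding space as y = W x.

    Assumption: the learned transformation is a plain linear map (hypothetical).
    """
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def decode_segment(W: Sequence[Vector], audio_emb: Vector,
                   text_vocab: Dict[str, Vector]) -> str:
    """Return the word whose text embedding is closest (by cosine) to W @ audio_emb.

    Nearest-neighbour retrieval is an assumed decoding rule for this sketch.
    """
    mapped = apply_map(W, audio_emb)
    return max(text_vocab, key=lambda word: cosine(mapped, text_vocab[word]))
```

For example, with `W = [[0.0, -1.0], [1.0, 0.0]]` (a 90-degree rotation), `decode_segment(W, [1.0, 0.0], {"cat": [0.0, 1.0], "dog": [1.0, 0.0]})` returns `"cat"`, because the rotated vector `[0.0, 1.0]` coincides with the embedding of "cat".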



research
05/13/2019

Almost Unsupervised Text to Speech and Automatic Speech Recognition

Text to speech (TTS) and automatic speech recognition (ASR) are two dual...
research
12/10/2021

Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech

Many of the recent advances in speech separation are primarily aimed at ...
research
03/15/2023

Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences

Past work on unsupervised parsing is constrained to written form. In thi...
research
10/28/2019

Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding

In this paper, we investigate the benefit that off-the-shelf word embedd...
research
04/03/2022

Automatic Dialect Density Estimation for African American English

In this paper, we explore automatic prediction of dialect density of the...
research
07/11/2023

Speech Diarization and ASR with GMM

In this research paper, we delve into the topics of Speech Diarization a...
research
11/05/2017

Learning Word Embeddings from Speech

In this paper, we propose a novel deep neural network architecture, Sequ...
