Listening while Speaking: Speech Chain by Deep Learning

07/16/2017
by Andros Tjandra, et al.

Despite the close relationship between speech perception and production, research in automatic speech recognition (ASR) and text-to-speech synthesis (TTS) has progressed more or less independently, without much mutual influence. In human communication, by contrast, a closed-loop speech chain mechanism with auditory feedback from the speaker's mouth to her ear is crucial. In this paper, we take a step further and develop a closed-loop speech chain model based on deep learning. The sequence-to-sequence model in a closed-loop architecture allows us to train our model on the concatenation of both labeled and unlabeled data. While ASR transcribes the unlabeled speech features, TTS attempts to reconstruct the original speech waveform based on the text from ASR. In the opposite direction, ASR also attempts to reconstruct the original text transcription given the speech synthesized by TTS. To the best of our knowledge, this is the first deep learning model that integrates human speech perception and production behaviors. Our experimental results show that the proposed approach significantly outperforms separate systems trained only on labeled data.
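The loss structure described above can be sketched in a few lines. The following is a minimal illustration, not the paper's actual implementation: the sequence-to-sequence ASR and TTS networks are replaced by toy linear maps, and all names (`asr`, `tts`, `W_asr`, `W_tts`) are hypothetical stand-ins. It only shows how the three terms combine: a supervised loss on paired data, a speech-only chain loss (ASR then TTS reconstructs the speech), and a text-only chain loss (TTS then ASR reconstructs the text).

```python
import numpy as np

# Toy stand-ins for the ASR and TTS modules (the paper uses
# sequence-to-sequence networks; these linear maps are placeholders).
rng = np.random.default_rng(0)
W_asr = rng.normal(size=(4, 4))   # speech features -> text embedding
W_tts = rng.normal(size=(4, 4))   # text embedding -> speech features

def asr(speech):
    """Map speech features to a text representation."""
    return speech @ W_asr

def tts(text):
    """Map a text representation to speech features."""
    return text @ W_tts

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Supervised loss on a paired (speech, text) example.
speech_paired = rng.normal(size=(1, 4))
text_paired = rng.normal(size=(1, 4))
loss_sup = mse(asr(speech_paired), text_paired) + mse(tts(text_paired), speech_paired)

# Speech-only chain: ASR transcribes, then TTS reconstructs the speech.
speech_only = rng.normal(size=(1, 4))
loss_speech_chain = mse(tts(asr(speech_only)), speech_only)

# Text-only chain: TTS synthesizes, then ASR reconstructs the text.
text_only = rng.normal(size=(1, 4))
loss_text_chain = mse(asr(tts(text_only)), text_only)

# The model is trained to minimize the combined objective.
total_loss = loss_sup + loss_speech_chain + loss_text_chain
print(total_loss)
```

In the actual framework the chain losses are what let unlabeled speech and unlabeled text contribute gradients: each modality supervises the other through reconstruction.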


Related research

03/28/2018 · Machine Speech Chain with One-shot Speaker Adaptation
In previous work, we developed a closed-loop speech chain model based on...

10/31/2018 · End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator
The speech chain mechanism integrates automatic speech recognition (ASR)...

11/04/2020 · Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework
Previous research has proposed a machine speech chain to enable automati...

11/04/2020 · Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time
Inspired by a human speech chain mechanism, a machine speech chain frame...

06/03/2019 · From Speech Chain to Multimodal Chain: Leveraging Cross-modal Data Augmentation for Semi-supervised Learning
The most common way for humans to communicate is by speech. But perhaps ...

01/08/2023 · SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain
This paper introduces SpeeChain, an open-source Pytorch-based toolkit de...

04/07/2022 · Correcting Misproducted Speech using Spectrogram Inpainting
Learning a new language involves constantly comparing speech productions...
