Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction

10/20/2020
by Daria Soboleva, et al.

We present a novel multi-modal unspoken punctuation prediction system for the English language that combines acoustic and text features. We demonstrate, for the first time, that by relying exclusively on synthetic data generated with a prosody-aware text-to-speech system, we can outperform a model trained on expensive human audio recordings on the unspoken punctuation prediction problem. Our model architecture is well suited for on-device use: it feeds hash-based embeddings of automatic speech recognition text output, together with acoustic features, into a quasi-recurrent neural network, which keeps the model size small and latency low.
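
To illustrate why hash-based embeddings help keep a model small, the following is a minimal sketch of the generic hashing trick, not the system described above. Each recognized token is hashed into a fixed number of buckets, so the embedding table does not grow with the vocabulary, and the looked-up vector is concatenated with that token's acoustic features. The names `bucket_for`, `make_table`, and `featurize`, the table size, the embedding width, the MD5 hash, and the example acoustic values are all assumptions made for illustration.

```python
import hashlib
import random

NUM_BUCKETS = 1024  # hypothetical table size, independent of vocabulary size
EMBED_DIM = 8       # hypothetical embedding width


def bucket_for(token, num_buckets=NUM_BUCKETS):
    """Map a token to a bucket index with a stable hash (MD5 is an assumption)."""
    digest = hashlib.md5(token.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def make_table(num_buckets=NUM_BUCKETS, dim=EMBED_DIM, seed=0):
    """Create a small embedding table; a real model would learn these values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_buckets)]


def featurize(tokens, acoustic, table):
    """Concatenate each token's hashed embedding with its acoustic features."""
    if len(tokens) != len(acoustic):
        raise ValueError("need one acoustic feature vector per token")
    return [table[bucket_for(tok, len(table))] + list(feats)
            for tok, feats in zip(tokens, acoustic)]


if __name__ == "__main__":
    table = make_table()
    tokens = ["how", "are", "you"]
    # Made-up per-token acoustic features, e.g. pause length and pitch change.
    acoustic = [[0.02, -0.1], [0.01, 0.0], [0.35, 0.4]]
    rows = featurize(tokens, acoustic, table)
    print(len(rows), len(rows[0]))
```

In this sketch the resulting per-token vectors are what a sequence model, such as a quasi-recurrent neural network, would consume. Distinct tokens can collide in the same bucket; accepting those collisions is the trade-off for a fixed, small table.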


