Multimodal Punctuation Prediction with Contextual Dropout

02/12/2021
by Andrew Silva, et al.

Automatic speech recognition (ASR) is widely used in consumer electronics. ASR greatly improves the utility and accessibility of technology, but its output is usually a word sequence without punctuation, which can make user intent ambiguous. We first present a transformer-based approach for punctuation prediction that achieves an 8% improvement on the IWSLT 2012 TED Task, beating the previous state of the art [1]. We next describe our multimodal model that learns from both text and audio, which achieves an 8% improvement over the text-only algorithm on an internal dataset for which we have both the audio and transcriptions. Finally, we present an approach to learning a model using contextual dropout that allows us to handle variable amounts of future context at test time.

