CB-Whisper: Contextual Biasing Whisper using TTS-based Keyword Spotting

09/18/2023
by Yuang Li, et al.

End-to-end automatic speech recognition (ASR) systems often struggle to recognize rare named entities, such as personal names, organizations, and technical terms that are not frequently encountered in the training data. This paper presents Contextual Biasing Whisper (CB-Whisper), a novel ASR system based on OpenAI's Whisper model that performs keyword spotting (KWS) before the decoder. The KWS module leverages text-to-speech (TTS) techniques and a convolutional neural network (CNN) classifier to match features between the entities and the utterances. Experiments demonstrate that incorporating the predicted entities into a carefully designed spoken-form prompt significantly improves the mixed error rate (MER) and entity recall of the Whisper model on three internal datasets and two open-source datasets that cover English-only, Chinese-only, and code-switching scenarios.
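The abstract reports MER and entity recall without defining them. As a rough illustration only, the sketch below assumes the common convention for code-switching evaluation: Chinese is scored per character, English per word, and MER is the token-level edit distance divided by the reference length. The function names (`mixed_tokens`, `mixed_error_rate`, `entity_recall`) and the tokenization regex are hypothetical and are not taken from the paper, whose exact scoring rules may differ.

```python
import re


def mixed_tokens(text):
    """Split text into mixed tokens (assumed convention, not from the paper):
    each CJK character is one token, each run of Latin letters/digits is one token."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9']+", text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def mixed_error_rate(reference, hypothesis):
    """Edit distance over mixed tokens, normalized by the reference length."""
    ref = mixed_tokens(reference)
    if not ref:
        raise ValueError("reference contains no scorable tokens")
    return edit_distance(ref, mixed_tokens(hypothesis)) / len(ref)


def entity_recall(entities, hypothesis):
    """Fraction of reference entities whose token sequence appears in the hypothesis."""
    if not entities:
        raise ValueError("entity list is empty")
    hyp = " " + " ".join(mixed_tokens(hypothesis)) + " "
    hits = sum(
        1 for e in entities
        if " " + " ".join(mixed_tokens(e)) + " " in hyp
    )
    return hits / len(entities)
```

For example, scoring the hypothesis "我今天用 whisker 做识别" against the reference "我今天用 Whisper 做识别" gives one substitution over eight reference tokens (seven Chinese characters plus one English word).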

