Generating Adversarial Samples For Training Wake-up Word Detection Systems Against Confusing Words

01/01/2022
by Haoxu Wang, et al.

Wake-up word detection models are widely used in real life, but their performance degrades severely when they encounter adversarial samples. In this paper we discuss the concept of confusing words in adversarial samples. Confusing words, which are words of various kinds that sound similar to the predefined keywords, are commonly encountered in practice. To make the wake-up word detection system more robust against confusing words, we propose several methods for generating adversarial confusing samples. These samples simulate real confusing-word scenarios, for which the training set usually contains no real confusing samples. The generated samples include concatenated audio, synthesized data, and partially masked keywords. Moreover, we use a domain embedding concatenated system to improve performance. Experimental results show that the adversarial samples generated by our approach help improve the system's robustness in both the common scenario and the confusing-words scenario. In addition, we release a confusing-words test database, HI-MIA-CW, for future research.
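The abstract names partially masked keywords and concatenated audio as two of the generated sample types but gives no details of how they are built. The following is only a minimal sketch of what such augmentations might look like, not the authors' method. The function names, the choice of zeroing a single contiguous span, the masking fraction, and the silent gap are all assumptions made for illustration, and a waveform is represented as a plain list of floats.

```python
import random


def mask_keyword_segment(samples, mask_fraction=0.5, rng=None):
    """Return a copy of `samples` with one contiguous span set to zero.

    Assumption: "partially masked" is modeled here as silencing a random
    contiguous span covering `mask_fraction` of the keyword waveform.
    """
    if not 0.0 <= mask_fraction <= 1.0:
        raise ValueError("mask_fraction must be in [0, 1]")
    if rng is None:
        rng = random.Random()
    n = len(samples)
    span = int(n * mask_fraction)
    # Pick a start index so the masked span fits inside the clip.
    start = rng.randint(0, n - span)
    return samples[:start] + [0.0] * span + samples[start + span:]


def concatenate_clips(clips, gap_samples=0):
    """Join clips end to end, inserting `gap_samples` zeros between them.

    Assumption: "concatenated audio" is modeled as simple waveform
    concatenation with an optional silent gap.
    """
    out = []
    for i, clip in enumerate(clips):
        if i > 0:
            out.extend([0.0] * gap_samples)
        out.extend(clip)
    return out
```

For example, `mask_keyword_segment(keyword, 0.5)` would silence half of a keyword clip, and `concatenate_clips([filler, keyword[:len(keyword) // 2]])` would append the first half of the keyword to a hypothetical filler clip. Whether such samples are labeled as negatives during training is a design choice the abstract does not specify.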
