Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search

06/10/2018
by Yougen Yuan, et al.

We propose to learn acoustic word embeddings with temporal context for query-by-example (QbE) speech search. The temporal context consists of the word sequences that precede and follow a word. We assume that the training database contains spoken word pairs. We pad each word pair with its original temporal context to form fixed-length speech segment pairs. We obtain the acoustic word embeddings from a deep convolutional neural network (CNN) trained on these speech segment pairs with a triplet loss. By shifting a fixed-length analysis window through the search content, we obtain a running sequence of embeddings. In this way, searching for the spoken query reduces to matching acoustic word embeddings. The experiments show that the proposed acoustic word embeddings learned with temporal context are effective for QbE speech search. They outperform state-of-the-art frame-level feature representations and reduce run-time computation, since QbE speech search no longer requires dynamic time warping (DTW). We also find that a sufficient number of speech segment pairs is important for training the deep CNN to produce effective acoustic word embeddings.
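The pipeline described above (context padding, a triplet training objective, and sliding-window search without DTW) can be sketched roughly as follows. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the centred padding rule, the cosine-distance hinge form of the triplet loss, the margin of 0.4, and all helper names are hypothetical, and `mean_pool_embed` only stands in for the trained deep CNN.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def pad_with_context(frames: Sequence[Vector], start: int, end: int,
                     target_len: int) -> List[Vector]:
    """Cut a fixed-length segment around the word frames[start:end].

    Assumption (hypothetical): the word is centred and the remaining frames
    come from its leading/trailing context in the same utterance; zero frames
    are used only where the utterance itself runs out. Words longer than
    target_len are truncated.
    """
    dim = len(frames[0])
    left = start - max(0, (target_len - (end - start)) // 2)
    segment = []
    for t in range(left, left + target_len):
        if 0 <= t < len(frames):
            segment.append(list(frames[t]))
        else:
            segment.append([0.0] * dim)
    return segment


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.4) -> float:
    """Hinge triplet loss: the same-word pair (anchor, positive) should be
    closer in cosine distance than (anchor, negative) by at least `margin`."""
    return max(0.0, margin + cosine_distance(anchor, positive)
               - cosine_distance(anchor, negative))


def mean_pool_embed(segment: Sequence[Vector]) -> Vector:
    """Hypothetical stand-in for the trained CNN: average the frames."""
    n = len(segment)
    return [sum(col) / n for col in zip(*segment)]


def running_embeddings(frames: Sequence[Vector], window: int, shift: int,
                       embed: Callable[[Sequence[Vector]], Vector]) -> List[Vector]:
    """Shift a fixed-length window through the search content and embed each
    window, giving a running sequence of embeddings."""
    return [embed(frames[s:s + window])
            for s in range(0, len(frames) - window + 1, shift)]


def qbe_score(query_emb: Vector, running: Sequence[Vector]) -> float:
    """Detection score: smallest distance between the query embedding and any
    window embedding (lower = better match). No DTW alignment is needed."""
    return min((cosine_distance(query_emb, e) for e in running), default=1.0)


# Usage on a toy 2-dimensional "feature" sequence of 10 frames.
utt = [[float(t), 1.0] for t in range(10)]
query = mean_pool_embed(utt[4:8])
running = running_embeddings(utt, window=4, shift=2, embed=mean_pool_embed)
print(len(running), round(qbe_score(query, running), 6))
```

Because each window is summarised by a single vector, scoring an utterance costs one distance computation per window rather than a DTW alignment per candidate region, which is where the run-time saving mentioned in the abstract comes from.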
