Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection

08/07/2018
by Yu-Hsuan Wang, et al.

While Word2Vec represents words in text as vectors carrying semantic information, audio Word2Vec was shown to represent signal segments of spoken words as vectors carrying phonetic structure information. Audio Word2Vec can be trained in an unsupervised way from an unlabeled corpus, but it still requires the word boundaries to be given. In this paper, we extend audio Word2Vec from the word level to the utterance level by proposing a new segmental audio Word2Vec, in which unsupervised spoken word boundary segmentation and audio Word2Vec are jointly learned and mutually enhanced, so that an utterance can be directly represented as a sequence of vectors carrying phonetic structure information. This is achieved by a segmental sequence-to-sequence autoencoder (SSAE), in which a segmentation gate trained with reinforcement learning is inserted into the encoder. Experiments on English, Czech, French and German show very good performance in both unsupervised spoken word segmentation and spoken term detection applications (significantly better than frame-based DTW).
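To make the idea of "an utterance as a sequence of vectors" concrete, here is a minimal sketch of the data flow only. It is not the SSAE itself: the gate decisions are supplied by hand rather than learned with reinforcement learning, and mean pooling stands in for the recurrent encoder. The function names (`segment_by_gate`, `mean_pool`, `embed_utterance`) and the convention that a gate value of 1 marks the last frame of a segment are assumptions made for this illustration.

```python
from typing import List, Sequence

Vector = List[float]


def segment_by_gate(frames: Sequence[Vector], gate: Sequence[int]) -> List[List[Vector]]:
    """Split a frame sequence into segments.

    Assumed convention: gate[t] == 1 means frame t closes the current segment.
    """
    if len(frames) != len(gate):
        raise ValueError("frames and gate must have the same length")
    segments: List[List[Vector]] = []
    current: List[Vector] = []
    for frame, g in zip(frames, gate):
        current.append(list(frame))
        if g == 1:
            segments.append(current)
            current = []
    if current:  # frames after the last boundary form a final segment
        segments.append(current)
    return segments


def mean_pool(segment: List[Vector]) -> Vector:
    """Placeholder for the encoder: average the frames of one segment."""
    n = len(segment)
    dim = len(segment[0])
    return [sum(frame[d] for frame in segment) / n for d in range(dim)]


def embed_utterance(frames: Sequence[Vector], gate: Sequence[int]) -> List[Vector]:
    """Represent an utterance as one vector per segment."""
    return [mean_pool(seg) for seg in segment_by_gate(frames, gate)]


# Toy utterance: five 2-dimensional "acoustic frames" and hand-picked boundaries.
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
gate = [0, 1, 0, 0, 1]
vectors = embed_utterance(frames, gate)
print(vectors)  # [[1.0, 2.0], [6.0, 7.0]]
```

In this toy setup a five-frame utterance becomes two vectors, one per segment. In the approach the abstract describes, the boundaries and the embeddings would be learned jointly rather than fixed in advance.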

research
03/23/2018

Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech

In this paper, we propose a novel deep neural network architecture, Spee...
research
11/07/2018

Improved Audio Embeddings by Adjacency-Based Clustering with Applications in Spoken Term Detection

Embedding audio signal segments into vectors with fixed dimensionality i...
research
07/21/2018

Phonetic-and-Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval

Word embedding or Word2Vec has been successful in offering semantics for...
research
10/30/2022

Real-Time MRI Video synthesis from time aligned phonemes with sequence-to-sequence networks

Real-Time Magnetic resonance imaging (rtMRI) of the midsagittal plane of...
research
11/05/2017

Learning Word Embeddings from Speech

In this paper, we propose a novel deep neural network architecture, Sequ...
research
07/02/2021

Unsupervised Spoken Utterance Classification

An intelligent virtual assistant (IVA) enables effortless conversations ...
research
07/26/2020

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

Unsupervised spoken term discovery consists of two tasks: finding the ac...
