Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech

03/23/2018
by   Yu-An Chung, et al.
0

In this paper, we propose a novel deep neural network architecture, Speech2Vec, for learning fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the underlying spoken words, and are close to other vectors in the embedding space if their corresponding underlying spoken words are semantically similar. The proposed model can be viewed as a speech version of Word2Vec. Its design is based on a RNN Encoder-Decoder framework, and borrows the methodology of skipgrams or continuous bag-of-words for training. Learning word embeddings directly from speech enables Speech2Vec to make use of the semantic information carried by speech that does not exist in plain text. The learned word embeddings are evaluated and analyzed on 13 widely used word similarity benchmarks, and outperform word embeddings learned by Word2Vec from the transcriptions.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/05/2017

Learning Word Embeddings from Speech

In this paper, we propose a novel deep neural network architecture, Sequ...
research
07/06/2020

Contextualized Spoken Word Representations from Convolutional Autoencoders

A lot of work has been done recently to build sound language models for ...
research
08/07/2018

Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection

While Word2Vec represents words (in text) as vectors carrying semantic i...
research
04/03/2020

Analyzing autoencoder-based acoustic word embeddings

Recent studies have introduced methods for learning acoustic word embedd...
research
09/22/2022

Homophone Reveals the Truth: A Reality Check for Speech2Vec

Generating spoken word embeddings that possess semantic information is a...
research
02/17/2022

Word Embeddings for Automatic Equalization in Audio Mixing

In recent years, machine learning has been widely adopted to automate th...
research
08/12/2016

Redefining part-of-speech classes with distributional semantic models

This paper studies how word embeddings trained on the British National C...

Please sign up or login with your details

Forgot password? Click here to reset