Additional Shared Decoder on Siamese Multi-view Encoders for Learning Acoustic Word Embeddings

10/01/2019
by Myunghun Jung, et al.

Acoustic word embeddings, fixed-dimensional vector representations of arbitrary-length words, have attracted increasing interest in query-by-example spoken term detection. Recently, based on the observation that the orthography of text labels partly reflects the phonetic similarity between words' pronunciations, a multi-view approach has been introduced that jointly learns acoustic and text embeddings. It showed that discriminative embeddings can be learned by designing an objective that takes text labels as well as word segments into account. In this paper, we propose a network architecture that extends the multi-view approach by combining Siamese multi-view encoders with a shared decoder network, in order to maximize the effect of the relationship between acoustic and text embeddings in the embedding space. Trained discriminatively with a multi-view triplet loss and a decoding loss, our proposed approach achieves better performance on the acoustic word discrimination task with the WSJ dataset, resulting in 11.1 [...]. We also present experimental results on cross-view word discrimination and word-level speech recognition tasks.
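The abstract names two training signals: a multi-view triplet loss and a decoding loss. The sketch below shows one plausible form of such an objective and is not the authors' implementation. It assumes cosine distance, two cross-view triplet terms (an acoustic anchor compared against a matching and a non-matching text embedding, and a text anchor compared against a matching and a non-matching acoustic embedding), a hypothetical margin of 0.4, and a decoding loss that is simply passed in as a precomputed scalar with a hypothetical weight. All function names are illustrative.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def hinge(margin: float, d_pos: float, d_neg: float) -> float:
    """Triplet hinge: penalize unless d_neg exceeds d_pos by at least margin."""
    return max(0.0, margin + d_pos - d_neg)


def multi_view_triplet_loss(
    x_pos: Vector,  # acoustic embedding of a word segment
    c_pos: Vector,  # text embedding of the same word
    x_neg: Vector,  # acoustic embedding of a different word
    c_neg: Vector,  # text embedding of a different word
    margin: float = 0.4,  # hypothetical value, not taken from the paper
) -> float:
    """Sum of two cross-view triplet terms (assumed form)."""
    d_pos = cosine_distance(x_pos, c_pos)
    # Acoustic anchor: its own text label should be closer than another word's label.
    obj0 = hinge(margin, d_pos, cosine_distance(x_pos, c_neg))
    # Text anchor: its own word segment should be closer than another word's segment.
    obj1 = hinge(margin, d_pos, cosine_distance(c_pos, x_neg))
    return obj0 + obj1


def total_loss(triplet_loss: float, decoding_loss: float, weight: float = 1.0) -> float:
    """Combine both signals; the weighting scheme here is a hypothetical choice."""
    return triplet_loss + weight * decoding_loss


# Matching views aligned, non-matching views orthogonal: no triplet penalty.
print(multi_view_triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]))  # 0.0
```

When the acoustic and text views of the same word coincide and different words point in orthogonal directions, both hinge terms are zero. Swapping which embeddings are "matching" drives each term to the margin plus the full distance gap.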

research
11/14/2016

Multi-view Recurrent Neural Acoustic Word Embeddings

Recent work has begun exploring neural acoustic word embeddings---fixed-...
03/30/2022

Asymmetric Proxy Loss for Multi-View Acoustic Word Embeddings

Acoustic word embeddings (AWEs) are discriminative representations of sp...
11/07/2018

Learning acoustic word embeddings with phonetically associated triplet network

Previous research on acoustic word embeddings used in query-by-example...
12/20/2014

Weakly Supervised Multi-Embeddings Learning of Acoustic Models

We trained a Siamese network with multi-task same/different information ...
07/20/2020

Acoustic Neighbor Embeddings

This paper proposes a novel acoustic word embedding called Acoustic Neig...
08/01/2019

Learning Joint Acoustic-Phonetic Word Embeddings

Most speech recognition tasks pertain to mapping words across two modali...
09/05/2020

A multi-view approach for Mandarin non-native mispronunciation verification

Traditionally, the performance of non-native mispronunciation verificati...