Query-by-example Spoken Term Detection using Attention-based Multi-hop Networks

09/01/2017
by Chia-Wei Ao, et al.

Retrieving spoken content with spoken queries, known as query-by-example spoken term detection (STD), is attractive because it allows signals to be matched directly at the acoustic level, without transcribing them into text. Here, we propose an end-to-end query-by-example STD model based on an attention-based multi-hop network. Its input is a spoken query and an audio segment containing several utterances, and its output indicates whether the audio segment contains the query. The model can be trained either in a supervised scenario using labeled data or in an unsupervised fashion. In the supervised scenario, we find that the attention mechanism and multiple hops improve performance, and that the attention weights indicate the time span of the detected terms. In the unsupervised setting, the model mimics the behavior of the existing query-by-example STD system, yielding comparable performance with lower search time complexity.
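The abstract describes the architecture only at a high level. As a rough illustration of what attention-based multi-hop matching can look like, here is a minimal NumPy sketch, not the authors' implementation. It assumes frame-level acoustic features are already extracted, encodes the query by mean pooling (an assumption), scores each segment frame by a dot product with the query vector, and updates the query at every hop with a memory-network-style rule `q = tanh(W q + context)` (also an assumption). The names `multi_hop_detect`, `W_hops`, `w_out`, and `b_out` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def multi_hop_detect(query_frames, segment_frames, W_hops, w_out, b_out):
    """Hypothetical multi-hop attention detector (illustrative sketch only).

    query_frames:   (Tq, d) features of the spoken query
    segment_frames: (Ts, d) features of the audio segment
    W_hops:         list of (d, d) matrices, one per hop (assumed parameters)
    Returns (probability that the segment contains the query, per-hop attention).
    """
    # Assumed query encoder: mean-pool the query frames into a single vector.
    q = query_frames.mean(axis=0)                  # (d,)
    attentions = []
    for W in W_hops:
        scores = segment_frames @ q                # (Ts,) one score per frame
        alpha = softmax(scores)                    # attention weights over time
        attentions.append(alpha)
        context = alpha @ segment_frames           # (d,) attended summary of the segment
        q = np.tanh(W @ q + context)               # assumed hop update of the query vector
    logit = float(w_out @ q + b_out)
    prob = 1.0 / (1.0 + np.exp(-logit))            # sigmoid: "query present" probability
    return prob, attentions


# Toy usage with random features: plant the query at frames 40-43 of the segment.
rng = np.random.default_rng(0)
d, Tq, Ts = 256, 4, 100
query = rng.standard_normal((Tq, d))
segment = rng.standard_normal((Ts, d))
segment[40:44] = query
W_hops = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]  # untrained
w_out = rng.standard_normal(d) / np.sqrt(d)                            # untrained
prob, atts = multi_hop_detect(query, segment, W_hops, w_out, 0.0)
print(prob, int(np.argmax(atts[0])))
```

Because the weights are untrained, the printed probability carries no meaning here. The first hop's attention, however, should peak on the planted frames, which illustrates how attention weights can point at the time span of a detected term. In a real system the encoder and hop parameters would be learned, for example from labeled query/segment pairs in the supervised scenario.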

Related research

08/28/2016
Hierarchical Attention Model for Improved Machine Comprehension of Spoken Content
Multimedia or spoken content presents more attractive information than p...

09/07/2015
Unsupervised Spoken Term Detection with Spoken Queries by Multi-level Acoustic Patterns with Varying Model Granularity
This paper presents a new approach for unsupervised Spoken Term Detectio...

06/30/2019
Multilingual Bottleneck Features for Query by Example Spoken Term Detection
State of the art solutions to query by example spoken term detection (Qb...

08/23/2016
Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine
Multimedia or spoken content presents more attractive information than p...

11/24/2020
Acoustic span embeddings for multilingual query-by-example search
Query-by-example (QbE) speech search is the task of matching spoken quer...

11/19/2019
Neural Network based End-to-End Query by Example Spoken Term Detection
This paper focuses on the problem of query by example spoken term detect...

04/01/2018
Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings
Unsupervised discovery of acoustic tokens from audio corpora without ann...
