
Query-by-Example Keyword Spotting system using Multi-head Attention and Softtriple Loss

by Jinmiao Huang et al.

This paper proposes a neural network architecture for the query-by-example, user-defined keyword spotting task. A multi-head attention module is added on top of a multi-layered GRU for effective feature extraction, and a normalized multi-head attention module is proposed for feature aggregation. We also adopt the SoftTriple loss, a combination of triplet loss and softmax loss, and showcase its effectiveness. We demonstrate the performance of our model on internal datasets in different languages and on the public Hey-Snips dataset. We compare our model against a baseline system and conduct an ablation study to show the benefit of each component in the architecture. The proposed work achieves solid performance while preserving simplicity.
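The aggregation step described above can be sketched in NumPy. This is a minimal illustration, not the paper's exact layer: it assumes a simple dot-product scoring vector per head (the parameter `weights` is hypothetical), and interprets "normalized" as L2-normalizing each head's pooled output before concatenation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention_pool(frames, weights, normalize=True):
    """Pool a (T, D) sequence of GRU output frames into one utterance vector.

    frames:  (T, D) per-frame features from the recurrent encoder
    weights: (H, D) one scoring vector per attention head (hypothetical
             parameterization; the paper's module may differ)
    Returns a fixed-size (H * D,) embedding regardless of sequence length T.
    """
    pooled = []
    for w in weights:                  # one attention head at a time
        scores = softmax(frames @ w)   # (T,) attention distribution over time
        head = scores @ frames         # (D,) attention-weighted sum of frames
        if normalize:                  # L2-normalize each head's output
            head = head / (np.linalg.norm(head) + 1e-8)
        pooled.append(head)
    return np.concatenate(pooled)      # (H * D,) utterance-level embedding
```

A fixed-size embedding like this is what makes query-by-example matching possible: both the enrolled keyword and the incoming audio are mapped to vectors of the same dimension and compared by distance, which is also the setting in which a triplet-style loss such as SoftTriple applies.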


Orthogonality Constrained Multi-Head Attention For Keyword Spotting

Multi-head attention mechanism is capable of learning various representa...

Multi-query multi-head attention pooling and Inter-topK penalty for speaker verification

This paper describes the multi-query multi-head attention (MQMHA) poolin...

QbyE-MLPMixer: Query-by-Example Open-Vocabulary Keyword Spotting using MLPMixer

Current keyword spotting systems are typically trained with a large amou...

Multi-Head Decoder for End-to-End Speech Recognition

This paper presents a new network architecture called multi-head decoder...

Query-by-example on-device keyword spotting

A keyword spotting (KWS) system determines the existence of, usually pre...

Multi^2OIE: Multilingual Open Information Extraction based on Multi-Head Attention with BERT

In this paper, we propose Multi^2OIE, which performs open information ex...

Compositional Attention: Disentangling Search and Retrieval

Multi-head, key-value attention is the backbone of the widely successful...