Keyword Transformer: A Self-Attention Model for Keyword Spotting

04/01/2021
by Axel Berg, et al.

The Transformer architecture has been successful across many domains, including natural language processing, computer vision, and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent, and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12- and 35-command tasks, respectively.
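To make the "fully self-attentional" idea concrete, the following is a minimal numpy sketch of a KWT-style forward pass, not the authors' implementation: each spectrogram time frame is treated as a token, a class token is prepended, a single self-attention layer mixes the tokens (the paper stacks many such layers), and classification reads only the class token. All dimensions and weights here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Toy input: 98 time frames x 40 MFCC coefficients (Speech Commands-style);
# d and n_classes are illustrative choices, not the paper's exact settings.
T, F, d, n_classes = 98, 40, 64, 12
spectrogram = rng.standard_normal((T, F))

# 1. Treat each time frame as a "patch" and linearly project it to d dims.
W_embed = rng.standard_normal((F, d)) * 0.02
tokens = spectrogram @ W_embed                  # (T, d)

# 2. Prepend a learnable class token and add positional embeddings.
cls = rng.standard_normal((1, d)) * 0.02
pos = rng.standard_normal((T + 1, d)) * 0.02
x = np.vstack([cls, tokens]) + pos              # (T+1, d)

# 3. One single-head self-attention layer with a residual connection
#    (a real KWT stacks multiple multi-head layers with MLPs and norms).
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv
attn = softmax(Q @ K.T / np.sqrt(d))            # each row sums to 1
x = x + attn @ V

# 4. Classify from the class token alone.
W_head = rng.standard_normal((d, n_classes)) * 0.02
logits = x[0] @ W_head                          # shape: (n_classes,)
```

The key design point the abstract highlights is that no convolutional or recurrent encoder precedes the attention layers: the raw spectrogram frames go straight into the token embedding.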
