Streaming keyword spotting on mobile devices

05/14/2020
by Oleg Rybakov, et al.

In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones. Converting a neural network model from non-streaming mode (the model receives the whole input sequence and then returns the classification result) to streaming mode (the model receives a portion of the input sequence and classifies it incrementally) may require manually rewriting the model. We address this by designing a TensorFlow/Keras-based library that automatically converts non-streaming models to streaming ones with minimal effort. With this library we benchmark multiple KWS models in both streaming and non-streaming modes on mobile phones and demonstrate different tradeoffs between latency and accuracy. We also explore novel KWS models with multi-head attention, which reduce the classification error over the state of the art by 10%. The library with all experiments is open-sourced.
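
The distinction between the two modes can be illustrated with a minimal sketch in plain Python. It is not the library's API: the names `conv1d_full` and `StreamingConv1D` are hypothetical, and a single causal 1D convolution stands in for a full KWS model. The non-streaming function sees the whole sequence at once, while the streaming class receives one sample per call and keeps the most recent samples as internal state, so both produce the same outputs.

```python
from collections import deque


def conv1d_full(signal, kernel):
    """Non-streaming: sees the whole sequence and returns all outputs at once."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(signal)  # causal (left) zero padding
    return [
        sum(w * x for w, x in zip(kernel, padded[t:t + k]))
        for t in range(len(signal))
    ]


class StreamingConv1D:
    """Streaming: receives one sample per call, keeps the last k samples as state."""

    def __init__(self, kernel):
        self.kernel = list(kernel)
        # Buffer starts as zeros, mirroring the causal padding used above.
        self.state = deque([0.0] * len(kernel), maxlen=len(kernel))

    def step(self, sample):
        self.state.append(sample)  # the oldest sample drops out automatically
        return sum(w * x for w, x in zip(self.kernel, self.state))


# Illustrative values only (assumptions, not data from the paper).
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, -1.0, 2.0]

full_out = conv1d_full(signal, kernel)
layer = StreamingConv1D(kernel)
stream_out = [layer.step(x) for x in signal]
```

In this sketch the streaming layer carries its own buffer, which is the kind of state handling that would otherwise have to be written by hand when converting a model to streaming mode.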
