Training end-to-end speech-to-text models on mobile phones

12/07/2021
by Zitha S, et al.

Training state-of-the-art speech-to-text (STT) models on mobile devices is challenging because their resources are limited relative to a server environment. In addition, these models are trained on generic datasets that do not exhaustively capture user-specific characteristics. Recently, on-device personalization techniques have made strides in mitigating this problem. Although many existing works have explored the effectiveness of on-device personalization, most of their findings are limited to simulation settings or a specific smartphone. In this paper, we develop a framework for training end-to-end models on mobile phones and explain it in detail. For simplicity, we consider a model based on connectionist temporal classification (CTC) loss. We evaluate the framework on mobile phones from different brands and report the results. We provide evidence that fine-tuning the models and choosing the right hyperparameter values involves a trade-off between the lowest achievable word error rate (WER), on-device training time, and memory consumption. Understanding this trade-off is therefore vital for successfully deploying on-device training in a resource-limited environment such as mobile phones. We use training sets from speakers with different accents and record a 7.6 associated computational cost measurements with respect to time, memory usage, and CPU utilization on mobile phones in real time.

