UserLibri: A Dataset for ASR Personalization Using Only Text

07/02/2022
by Theresa Breiner, et al.

Personalization of speech models on mobile devices (on-device personalization) is an active area of research, but more often than not, mobile devices have more text-only data than paired audio-text data. We explore training a personalized language model on text-only data, used during inference to improve speech recognition performance for that user. We experiment on a user-clustered LibriSpeech corpus, supplemented with personalized text-only data for each user from Project Gutenberg. We release this User-Specific LibriSpeech (UserLibri) dataset to aid future personalization research. LibriSpeech audio-transcript pairs are grouped into 55 users from the test-clean dataset and 52 users from test-other. We are able to lower the average word error rate per user across both sets in streaming and nonstreaming models, including an improvement of 2.5 for the harder set of test-other users when streaming.
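The headline metric above is the average word error rate (WER) per user. A minimal sketch of one way to compute such a figure follows. It assumes whitespace tokenization, WER reported as a percentage, and an unweighted mean over users; the function names `edit_distance` and `average_wer_per_user` are hypothetical and are not taken from the paper or its released code.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def average_wer_per_user(examples):
    """Return (per-user WER in percent, unweighted mean over users).

    `examples` is an iterable of (user_id, reference, hypothesis) strings.
    Assumption: each user's WER pools errors and reference words over all
    of that user's utterances before dividing.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for user_id, ref, hyp in examples:
        ref_tokens = ref.split()
        errors[user_id] += edit_distance(ref_tokens, hyp.split())
        words[user_id] += len(ref_tokens)
    per_user = {u: 100.0 * errors[u] / words[u] for u in words if words[u] > 0}
    if not per_user:
        raise ValueError("no reference words to score")
    return per_user, sum(per_user.values()) / len(per_user)
```

Averaging per user rather than over all utterances keeps users with many utterances from dominating the score, which is one plausible reason to report the metric this way when comparing personalized models.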

research
10/31/2022

Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition

Recently, there has been an increasing interest in two-pass streaming en...
research
09/14/2019

An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models

Speaker-independent speech recognition systems trained with data from ma...
research
08/25/2016

Speech Is 3x Faster than Typing for English and Mandarin Text Entry on Mobile Devices

With laptops and desktops, the dominant method of text entry is the full...
research
10/13/2022

JOIST: A Joint Speech and Text Streaming Model For ASR

We present JOIST, an algorithm to train a streaming, cascaded, encoder e...
research
07/13/2023

Personalization for BERT-based Discriminative Speech Recognition Rescoring

Recognition of personalized content remains a challenge in end-to-end sp...
research
12/14/2019

Personalization of End-to-end Speech Recognition On Mobile Devices For Named Entities

We study the effectiveness of several techniques to personalize end-to-e...
research
02/22/2022

ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users

Recent advances have enabled automatic sound recognition systems for dea...
