10 hours data is all you need

10/24/2022
by Zeping Min, et al.

We propose CAMP (character audio mix up), a novel procedure for generating pseudo Mandarin speech data at the character scale. We also propose META-AUDIO, a method for building a character-scale Mandarin audio database suited to CAMP, which makes full use of the audio data and can greatly increase the data diversity of the database. Experiments show that CAMP is simple and quite effective. For example, training on 10 hours of audio from AISHELL-1 plus pseudo audio generated by CAMP achieves a competitive character error rate (CER) of 11.07. Likewise, training on only 10 hours of audio from the AIDATATANG dataset plus pseudo audio generated by CAMP again achieves a competitive CER of 8.26.
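The abstract does not spell out how character-scale audio is stored or recombined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that character-level alignments (character, start sample, end sample) are already available for each real utterance, and that a pseudo utterance is formed by picking one stored segment per character and concatenating them. The names `build_char_database` and `generate_pseudo_utterance` are hypothetical.

```python
import random
from collections import defaultdict


def build_char_database(aligned_utterances):
    """Collect character-level audio segments from aligned utterances.

    Assumption (not from the source): each utterance is a pair
    (samples, alignment), where alignment is a list of
    (char, start, end) tuples given as sample indices.
    """
    database = defaultdict(list)
    for samples, alignment in aligned_utterances:
        for char, start, end in alignment:
            # Store every occurrence so each character keeps several variants.
            database[char].append(samples[start:end])
    return database


def generate_pseudo_utterance(text, database, rng):
    """Concatenate one randomly chosen stored segment per character of text."""
    audio = []
    for char in text:
        if char not in database:
            raise KeyError(f"no audio segment for character {char!r}")
        audio.extend(rng.choice(database[char]))
    return audio


# Toy data: short lists of floats stand in for real waveforms.
utt1 = ([0.1, 0.2, 0.3, 0.4, 0.5], [("你", 0, 2), ("好", 2, 5)])
utt2 = ([0.6, 0.7, 0.8], [("好", 0, 1), ("你", 1, 3)])

db = build_char_database([utt1, utt2])
pseudo = generate_pseudo_utterance("好你好", db, random.Random(0))
```

Keeping every occurrence of a character, rather than a single canonical clip, is one simple way such a database could provide variety across generated utterances; how META-AUDIO actually achieves its diversity is not described in the abstract.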

research
05/19/2023

Unsupervised ASR via Cross-Lingual Pseudo-Labeling

Recent work has shown that it is possible to train an unsupervised autom...
research
10/06/2020

Digital Voicing of Silent Speech

In this paper, we consider the task of digitally voicing silent speech, ...
research
06/13/2021

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

This paper introduces GigaSpeech, an evolving, multi-domain English spee...
research
08/02/2021

Decoupling recognition and transcription in Mandarin ASR

Much of the recent literature on automatic speech recognition (ASR) is t...
research
10/07/2021

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus co...
research
12/05/2022

Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight

State-of-the-art activity recognizers are effective during the day, but ...
research
03/09/2020

Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading

Lip-reading aims to infer the speech content from the lip movement seque...
