The ASR model deployment environment is ever-changing, and the incoming spee...
To let the state-of-the-art end-to-end ASR model enjoy data efficiency, ...
One of the limitations in end-to-end automatic speech recognition framew...
Code-switching (CS) refers to the phenomenon in which languages switch withi...
Intermediate layer output (ILO) regularization by means of multitask tra...
Internal Language Model Estimation (ILME) based language model (LM) fusi...
An end-to-end (E2E) speech recognition model implicitly learns a biased ...
To realize robust end-to-end Automatic Speech Recognition (E2E ASR) under...
Automatic speech recognition (ASR) for under-represented named-entity (U...
We report our NTU-AISG Text-to-speech (TTS) entry systems for the Blizza...
Humans can perform multi-task recognition from speech. For instance, huma...
In this work, we study leveraging extra text data to improve low-resourc...
In this paper, we present a series of complementary approaches to improv...
Existing approaches for fine-grained visual recognition focus on learnin...
The attention-based end-to-end (E2E) automatic speech recognition (ASR) ...
The lack of code-switch training data is one of the major concerns in th...
Neural language models (NLMs) achieve strong generalization capabilit...
Code-switching (CS) refers to a linguistic phenomenon where a speaker us...
In this paper, we present our overall efforts to improve the performance...
We propose an Encoder-Classifier framework to model the Mandarin tones u...