This work aims to build a multilingual text-to-speech (TTS) synthesis sy...
One of the limitations in end-to-end automatic speech recognition framew...
We present an expanded version of our previously released Kazakh text-to...
We present the development of a dataset for Kazakh named entity recognit...
In this paper, we study an approach to multimodal person verification us...
We study training a single end-to-end (E2E) automatic speech recognition...
We present a freely available speech corpus for the Uzbek language and r...
This paper introduces a high-quality open-source speech synthesis datase...
We present SpeakingFaces as a publicly available large-scale multimodal ...
Automatic speech recognition (ASR) for under-represented named-entity (U...
We present an open-source speech corpus for the Kazakh language. The Kaz...
In this work, we study leveraging extra text data to improve low-resourc...
In this paper, we present a series of complementary approaches to improv...
The attention-based end-to-end (E2E) automatic speech recognition (ASR) ...
The lack of code-switch training data is one of the major concerns in th...
Neural language models (NLMs) achieve strong generalization capabilit...
Code-switching (CS) refers to a linguistic phenomenon where a speaker us...
In automatic speech recognition (ASR) systems, recurrent neural network ...