With the recent rapid growth of large language models (LLMs), discrete speec...
As a subjective metric to evaluate the quality of synthesized speech, Me...
How can speech-to-text translation (ST) perform as well as machine trans...
The advancement of audio-language (AL) multimodal learning tasks has bee...
How to solve the data scarcity problem for end-to-end speech-to-text tra...
Audio captioning is the task of generating captions that describe the co...
Speech is the surface form of a finite set of phonetic units, which can ...
In human speech, the attitude of a speaker cannot be fully expressed onl...
Direct speech-to-speech translation (S2ST) has drawn increasing atten...
This paper introduces GigaST, a large-scale pseudo speech translation (S...
This paper studies a novel pre-training technique with unpaired speech d...
Automated audio captioning aims to use natural language to describe the ...
Automated audio captioning (AAC) is a cross-modal translation task that ...
Machine Speech Chain, which integrates both end-to-end (E2E) automatic s...
The Auto-KWS 2021 challenge calls for automated machine learning (AutoML) so...
The AutoSpeech challenge calls for automated machine learning (AutoML) s...
Model-Agnostic Meta-Learning (MAML) and its variants are popular few-sho...
Keyword spotting with limited training data is a challenging task which ...