Humans encode information into sounds by controlling articulators and de...
To build speech processing methods that can handle speech as naturally a...
Estimation of fundamental frequency (F0) in voiced segments of speech
si...
Generative deep neural networks are widely used for speech synthesis, bu...
Recent self-supervised learning (SSL) models have proven to learn rich
r...
In the articulatory synthesis task, speech is synthesized from input fea...
In order for AI to be safely deployed in real-world scenarios such as
ho...
Speech processing systems currently do not support the vast majority of
...
This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS)...
Learning multimodal representations involves integrating information fro...
Existing approaches to ensuring privacy of user speech data primarily fo...
The natural world is abundant with concepts expressed via visual, acoust...
Existing multilingual speech NLP works focus on a relatively small subse...
The Domain Name System (DNS) is the foundation of a human-usable Interne...
Modern federated networks, such as those comprised of wearable devices,
...
In this project, we extend the state-of-the-art CheXNet (Rajpurkar et al...