Visually grounded speech systems learn from paired images and their spok...
The recently proposed Joint Energy-based Model (JEM) interprets discrimi...
In recent studies, self-supervised pre-trained models tend to outperform...
Considering the abundance of unlabeled speech data and the high labeling...
Speech systems developed for a particular choice of acoustic domain and ...
The high cost of data acquisition makes Automatic Speech Recognition (AS...
Typically, unsupervised segmentation of speech into the phone and word-l...
Speech emotion recognition is the task of recognizing the speaker's emot...
In this paper, we propose a new approach to pathological speech synthesi...
Automatic detection of phoneme or word-like units is one of the core obj...
This paper tackles automatically discovering phone-like acoustic units (...
Research in automatic speaker recognition (SR) has been undertaken for s...
Current standard protocols used in the clinic for diagnosing COVID-19 in...
Data augmentation is a widely used strategy for training robust machine ...
The idea of combining multiple languages' recordings to train a single a...
Only a handful of the world's languages are abundant with the resources ...