Speech is a fundamental means of communication that can be seen to provi...
Explainable AI (XAI) techniques have been widely used to help explain an...
Holistic perception of affective attributes is an important human percep...
In Speech Emotion Recognition (SER), textual data is often used alongsid...
Fusing multiple modalities for affective computing tasks has proven effe...
English is the most widely spoken language in the world, used daily by
m...
We address quality assessment for neural network based ASR by providing
...
While modern Text-to-Speech (TTS) systems can produce speech rated highl...
As a sub-branch of affective computing, impression recognition, e.g.,
pe...
Self-supervised speech models have grown fast during the past few years ...
In this work, we unify several existing decoding strategies for punctuat...
We present a method for cross-lingual training an ASR system using absol...
Alongside acoustic information, linguistic features based on speech
tran...
People convey information extremely effectively through spoken interacti...
Typical ASR systems segment the input audio into utterances using purely...
Although the lower layers of a deep neural network learn features which ...
Self-attention models such as Transformers, which can capture temporal
r...
Recently, Transformers have shown competitive automatic speech recogniti...
Deep speaker embeddings have become the leading method for encoding spea...
In this work, we focus on improving ASR output segmentation in the conte...
We present a structured overview of adaptation algorithms for neural
net...
Recently, self-attention models such as Transformers have given competit...
Many recent works on deep speaker embeddings train their feature extract...
We propose a multi-scale octave convolution layer to learn robust speech...
Previous work has encouraged domain-invariance in deep speaker embedding...
Speaker adaptive training (SAT) of neural network acoustic models learns...
Raw waveform acoustic modelling has recently gained interest due to neur...
In this work, we investigate the use of embeddings for speaker-adaptive
...
Acoustic model adaptation to unseen test recordings aims to reduce the
m...
In the broadcast domain there is an abundance of related text data and
p...
We explore why deep convolutional neural networks (CNNs) with small
two-...
End-to-end approaches have recently become popular as a means of simplif...
The performance of automatic speech recognition systems can be improved ...
We study the problem of evaluating automatic speech recognition (ASR) sy...
We investigate different approaches for dialect identification in Arabic...