
-
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
We propose using self-supervised discrete representations for the task o...
-
Generative Spoken Language Modeling from Raw Audio
Generative spoken language modeling involves learning jointly the acoust...
-
High Fidelity Speech Regeneration with Application to Speech Enhancement
Speech enhancement has seen great improvement in recent years mainly thr...
-
Single channel voice separation for unknown number of speakers under reverberant and noisy settings
We present a unified network for voice separation of an unknown number o...
-
Fairness in the Eyes of the Data: Certifying Machine-Learning Models
We present a framework that allows to certify the fairness degree of a m...
-
SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation
Most existing deep learning based binaural speaker separation systems fo...
-
Unsupervised Cross-Domain Singing Voice Conversion
We present a wav-to-wav generative model for the task of singing voice c...
-
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation
We propose a self-supervised representation learning model for the task ...
-
Real Time Speech Enhancement in the Waveform Domain
We present a causal speech enhancement model working on the raw waveform...
-
Voice Separation with an Unknown Number of Multiple Speakers
We present a new method for separating a mixed audio sequence, in which ...
-
On the generalization of Bayesian deep nets for multi-class classification
Generalization bounds which assess the difference between the true risk ...
-
Phoneme Boundary Detection using Learnable Segmental Features
Phoneme boundary detection plays an essential first step for a variety o...
-
Hide and Speak: Deep Neural Networks for Speech Steganography
Steganography is the science of hiding a secret message within an ordina...
-
To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition
Transcribed datasets typically contain speaker identity for each instanc...
-
Out-of-Distribution Detection using Multiple Semantic Label Representations
Deep Neural Networks are powerful models that attained remarkable result...
-
Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring
Deep Neural Networks have recently gained lots of success after enabling...
-
Fooling End-to-end Speaker Verification by Adversarial Examples
Automatic speaker verification systems are increasingly used as the prim...
-
Houdini: Fooling Deep Structured Prediction Models
Generating adversarial examples is a critical step for evaluating and im...
-
Automatic Measurement of Pre-aspiration
Pre-aspiration is defined as the period of glottal friction occurring in...
-
Learning Similarity Functions for Pronunciation Variations
A significant source of errors in Automatic Speech Recognition (ASR) sys...
-
Automatic measurement of vowel duration via structured prediction
A key barrier to making phonetic studies scalable and replicable is the ...
-
Sequence Segmentation Using Joint RNN and Structured Prediction Models
We describe and analyze a simple and effective algorithm for sequence se...
-
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
There is a lot of research interest in encoding variable length sentence...