Timo Gerkmann

research

∙ 09/18/2023

Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation

Much research effort is being applied to the task of compressing the kno...

0 Danilo de Oliveira, et al. ∙

research

∙ 09/18/2023

Single and Few-step Diffusion for Generative Speech Enhancement

Diffusion models have shown promising results in speech enhancement, usi...

0 Bunlong Lay, et al. ∙

research

∙ 09/14/2023

EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data

Speech emotion conversion is the task of converting the expressed emotio...

0 Navin Raj Prabhu, et al. ∙

research

∙ 09/13/2023

A Flexible Online Framework for Projection-Based STFT Phase Retrieval

Several recent contributions in the field of iterative STFT phase retrie...

0 Tal Peer, et al. ∙

research

∙ 06/22/2023

Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model

In this paper we present a method for single-channel wind noise reductio...

0 Jean-Marie Lemercier, et al. ∙

research

∙ 06/21/2023

Diffusion Posterior Sampling for Informed Single-Channel Dereverberation

We present in this paper an informed single-channel dereverberation meth...

0 Jean-Marie Lemercier, et al. ∙

research

∙ 06/05/2023

On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings

Since its inception, the field of deep speech enhancement has been domin...

0 Danilo de Oliveira, et al. ∙

research

∙ 06/02/2023

In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis

Speech emotion conversion aims to convert the expressed emotion of a spo...

0 Navin Raj Prabhu, et al. ∙

research

∙ 06/02/2023

Audio-Visual Speech Enhancement with Score-Based Generative Models

This paper introduces an audio-visual speech enhancement system that lev...

0 Julius Richter, et al. ∙

research

∙ 05/31/2023

Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model

We propose Audio-Visual Lightweight ITerative model (AVLIT), an effectiv...

0 Héctor Martel, et al. ∙

research

∙ 05/30/2023

Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models

In large part due to their implicit semantic modeling, self-supervised l...

0 Danilo de Oliveira, et al. ∙

research

∙ 05/15/2023

Integrating Uncertainty into Neural Network-based Speech Enhancement

Supervised masking approaches in the time-frequency domain aim to employ...

0 Huajian Fang, et al. ∙

research

∙ 04/24/2023

Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters

In a multi-channel separation task with multiple speakers, we aim to rec...

0 Kristina Tesch, et al. ∙

research

∙ 03/27/2023

Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise

Human-robot interaction relies on a noise-robust audio processing module...

0 Huajian Fang, et al. ∙

research

∙ 03/15/2023

Speech Signal Improvement Using Causal Generative Diffusion Models

In this paper, we present a causal speech signal improvement system that...

0 Julius Richter, et al. ∙

research

∙ 03/01/2023

Extending DNN-based Multiplicative Masking to Deep Subband Filtering for Improved Dereverberation

In this paper, we present a scheme for extending deep neural network-bas...

0 Jean-Marie Lemercier, et al. ∙

research

∙ 02/28/2023

Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement

Recently, score-based generative models have been successfully employed ...

0 Bunlong Lay, et al. ∙

research

∙ 12/22/2022

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Diffusion models have shown a great ability at bridging the performance ...

0 Jean-Marie Lemercier, et al. ∙

research

∙ 12/09/2022

Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models

Single-channel deep speech enhancement approaches often estimate a singl...

0 Huajian Fang, et al. ∙

research

∙ 11/12/2022

DriftRec: Adapting diffusion models to blind image restoration tasks

In this work, we utilize the high-fidelity generation abilities of diffu...

0 Simon Welker, et al. ∙

research

∙ 11/08/2022

DiffPhase: Generative Diffusion-based STFT Phase Retrieval

Diffusion probabilistic models have been recently used in a variety of t...

0 Tal Peer, et al. ∙

research

∙ 11/04/2022

Spatially Selective Deep Non-linear Filters for Speaker Extraction

In a scenario with multiple persons talking simultaneously, the spatial ...

0 Kristina Tesch, et al. ∙

research

∙ 11/04/2022

Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration

Diffusion-based generative models have had a high impact on the computer...

0 Jean-Marie Lemercier, et al. ∙

research

∙ 08/11/2022

Speech Enhancement and Dereverberation with Diffusion-based Generative Models

Recently, diffusion-based generative models have been introduced to the ...

0 Julius Richter, et al. ∙

research

∙ 07/25/2022

Label Uncertainty Modeling and Prediction for Speech Emotion Recognition using t-Distributions

As different people perceive others' emotional expressions differently, ...

0 Navin Raj Prabhu, et al. ∙

research

∙ 06/27/2022

Insights into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement

The key advantage of using multiple microphones for speech enhancement i...

0 Kristina Tesch, et al. ∙

research

∙ 06/23/2022

Efficient Transformer-based Speech Enhancement Using Long Frames and STFT Magnitudes

The SepFormer architecture shows very good results in speech separation....

0 Danilo de Oliveira, et al. ∙

research

∙ 06/22/2022

On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement

Employing deep neural networks (DNNs) to directly learn filters for mult...

0 Kristina Tesch, et al. ∙

research

∙ 05/11/2022

Beyond Griffin-Lim: Improved Iterative Phase Retrieval for Speech

Phase retrieval is a problem encountered not only in speech and audio pr...

0 Tal Peer, et al. ∙

research

∙ 04/06/2022

Neural Network-augmented Kalman Filtering for Robust Online Speech Dereverberation in Noisy Reverberant Environments

In this paper, a neural network-augmented algorithm for noise-robust onl...

0 Jean-Marie Lemercier, et al. ∙

research

∙ 04/06/2022

End-To-End Optimization of Online Neural Network-supported Two-Stage Dereverberation for Hearing Devices

A two-stage online dereverberation algorithm for hearing devices is pres...

0 Jean-Marie Lemercier, et al. ∙

research

∙ 04/06/2022

Customizable End-to-end Optimization of Online Neural Network-supported Dereverberation for Hearing Devices

This work focuses on online dereverberation for hearing devices using th...

0 Jean-Marie Lemercier, et al. ∙

research

∙ 03/31/2022

Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain

Score-based generative models (SGMs) have recently shown impressive resu...

0 Simon Welker, et al. ∙

research

∙ 03/30/2022

Phase-Aware Deep Speech Enhancement: It's All About The Frame Length

While phase-aware speech processing has been receiving increasing attent...

0 Tal Peer, et al. ∙

research

∙ 03/04/2022

Integrating Statistical Uncertainty into Neural Network-Based Speech Enhancement

Speech enhancement in the time-frequency domain is often performed by es...

0 Huajian Fang, et al. ∙

research

∙ 12/04/2021

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Recent advances in the design of neural network architectures, in partic...

0 Xiaolin Hu, et al. ∙

research

∙ 10/07/2021

End-to-end label uncertainty modeling for speech emotion recognition using Bayesian neural networks

Emotions are subjective constructs. Recent end-to-end speech emotion rec...

0 Navin Raj Prabhu, et al. ∙

research

∙ 05/19/2021

Disentanglement Learning for Variational Autoencoders Applied to Audio-Visual Speech Enhancement

Recently, the standard variational autoencoder has been successfully use...

0 Guillaume Carbajal, et al. ∙

research

∙ 04/22/2021

Nonlinear Spatial Filtering in Multichannel Speech Enhancement

The majority of multichannel speech enhancement algorithms are two-step ...

0 Kristina Tesch, et al. ∙

research

∙ 02/17/2021

Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder

Recently, a generative variational autoencoder (VAE) has been proposed f...

0 Huajian Fang, et al. ∙

research

∙ 02/12/2021

Guided Variational Autoencoder for Speech Enhancement With a Supervised Classifier

Recently, variational autoencoders have been successfully used to learn ...

0 Guillaume Carbajal, et al. ∙

research

∙ 11/11/2020

Reinforcement Learning with Time-dependent Goals for Robotic Musicians

Reinforcement learning is a promising method to accomplish robotic contr...

0 Thilo Fryen, et al. ∙

research

∙ 04/07/2020

SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement

This paper analyzes the generalization of speech enhancement algorithms ...

0 Robert Rehr, et al. ∙

research

∙ 02/29/2020

Robust Robotic Pouring using Audition and Haptics

Robust and accurate estimation of liquid height lies as an essential par...

0 Hongzhuo Liang, et al. ∙

research

∙ 10/25/2019

A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet

In this work, we investigate if the learned encoder of the end-to-end co...

0 David Ditter, et al. ∙

research

∙ 03/02/2019

Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring

In this paper, we focus on the challenging perception problem in robotic...

0 Hongzhuo Liang, et al. ∙

research

∙ 09/07/2017

Normalized Features for Improving the Generalization of DNN Based Speech Enhancement

Enhancing noisy speech is an important task to restore its quality and t...

0 Robert Rehr, et al. ∙

research

∙ 09/07/2017

Improving the Generalizability of Deep Neural Network Based Speech Enhancement

Enhancing noisy speech is an important task to restore its quality and t...

0 Robert Rehr, et al. ∙

research

∙ 08/10/2017

DNN and CNN with Weighted and Multi-task Loss Functions for Audio Event Detection

This report presents our audio event detection system submitted for Task...

0 Martin Krawczyk-Becker, et al. ∙
