John R. Hershey

research

∙ 08/21/2023

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition

We present TokenSplit, a speech separation model that acts on discrete t...

0 Hakan Erdogan, et al. ∙

research

∙ 05/18/2023

Unsupervised Multi-channel Separation and Adaptation

A key challenge in machine learning is to generalize from training data ...

0 Cong Han, et al. ∙

research

∙ 07/20/2022

AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation

We introduce AudioScopeV2, a state-of-the-art universal audio-visual on-...

3 Efthymios Tzinis, et al. ∙

research

∙ 07/01/2022

Distance-Based Sound Separation

We propose the novel task of distance-based sound separation, where soun...

0 Katharine Patterson, et al. ∙

research

∙ 03/29/2022

CycleGAN-Based Unpaired Speech Dereverberation

Typically, neural network-based speech dereverberation models are traine...

0 Hannah Muckenhirn, et al. ∙

research

∙ 10/20/2021

Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training

The recently-proposed mixture invariant training (MixIT) is an unsupervi...

0 Aswin Sivaraman, et al. ∙

research

∙ 06/30/2021

DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement

Single-channel speech enhancement (SE) is an important task in speech pr...

0 Yuma Koizumi, et al. ∙

research

∙ 06/17/2021

Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention

We introduce a state-of-the-art audio-visual on-screen sound separation ...

0 Efthymios Tzinis, et al. ∙

research

∙ 06/01/2021

Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation

Supervised neural network training has led to significant progress on si...

0 Scott Wisdom, et al. ∙

research

∙ 05/05/2021

Self-Supervised Learning from Automatically Separated Sound Scenes

Real-world sound scenes consist of time-varying collections of sound sou...

0 Eduardo Fonseca, et al. ∙

research

∙ 05/05/2021

End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

We present an end-to-end deep network model that performs meeting diariz...

0 Soumi Maiti, et al. ∙

research

∙ 12/17/2020

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording

Leveraging additional speaker information to facilitate speech separatio...

0 Cong Han, et al. ∙

research

∙ 11/03/2020

Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

Multi-speaker speech recognition of unsegmented recordings has diverse a...

0 Desh Raj, et al. ∙

research

∙ 11/02/2020

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

Recent progress in deep learning has enabled many advances in sound sepa...

5 Efthymios Tzinis, et al. ∙

research

∙ 06/23/2020

Unsupervised Sound Separation Using Mixtures of Mixtures

In recent years, rapid progress has been made on the problem of single-c...

0 Scott Wisdom, et al. ∙

research

∙ 11/18/2019

Alternating Between Spectral and Spatial Estimation for Speech Separation and Enhancement

This work investigates alternation between spectral separation using mas...

0 Zhong-Qiu Wang, et al. ∙

research

∙ 11/18/2019

Improving Universal Sound Separation Using Sound Classification

Deep learning approaches have recently achieved impressive performance o...

0 Efthymios Tzinis, et al. ∙

research

∙ 05/08/2019

Universal Sound Separation

Recent deep learning approaches have achieved impressive performance on ...

0 Ilya Kavalerov, et al. ∙

research

∙ 11/20/2018

Differentiable Consistency Constraints for Improved Deep Speech Enhancement

In recent years, deep networks have led to dramatic improvements in spee...

0 Scott Wisdom, et al. ∙

research

∙ 11/06/2018

SDR - half-baked or well done?

In speech enhancement and source separation, signal-to-noise ratio is a ...

0 Jonathan Le Roux, et al. ∙

research

∙ 10/02/2018

Phasebook and Friends: Leveraging Discrete Representations for Source Separation

Deep learning based speech enhancement and source separation systems hav...

0 Jonathan Le Roux, et al. ∙

research

∙ 05/15/2018

A Purely End-to-end System for Multi-speaker Speech Recognition

Recently, there has been growing interest in multi-speaker speech recogn...

0 Hiroshi Seki, et al. ∙

research

∙ 04/26/2018

End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction

This paper proposes an end-to-end approach for single-channel speaker-in...

0 Zhong-Qiu Wang, et al. ∙

research

∙ 11/21/2017

Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition

Far-field speech recognition in noisy and reverberant conditions remains...

0 Zhong Meng, et al. ∙

research

∙ 03/14/2017

Multichannel End-to-end Speech Recognition

The field of speech recognition is in the midst of a paradigm shift: end...

0 Tsubasa Ochiai, et al. ∙

research

∙ 01/11/2017

Attention-Based Multimodal Fusion for Video Description

Currently successful methods for video description are based on encoder-...

0 Chiori Hori, et al. ∙

research

∙ 11/18/2016

Deep Clustering and Conventional Networks for Music Separation: Stronger Together

Deep clustering is the first method to handle general audio separation s...

0 Yi Luo, et al. ∙

research

∙ 10/31/2016

Full-Capacity Unitary Recurrent Neural Networks

Recurrent neural networks are powerful models for processing sequential ...

0 Scott Wisdom, et al. ∙

research

∙ 07/07/2016

Single-Channel Multi-Speaker Separation using Deep Clustering

Deep clustering is a recently introduced deep learning architecture that...

0 Yusuf Isik, et al. ∙

research

∙ 03/23/2016

Global-Local Face Upsampling Network

Face hallucination, which is the task of generating a high-resolution fa...

1 Oncel Tuzel, et al. ∙

research

∙ 08/18/2015

Deep clustering: Discriminative embeddings for segmentation and separation

We address the problem of acoustic source separation in a deep learning ...

0 John R. Hershey, et al. ∙

research

∙ 09/09/2014

Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures

Model-based methods and deep neural networks have both been tremendously...

0 John R. Hershey, et al. ∙


