Neil Zeghidour

research

∙ 08/21/2023

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition

We present TokenSplit, a speech separation model that acts on discrete t...

Hakan Erdogan, et al.

research

∙ 06/22/2023

AudioPaLM: A Large Language Model That Can Speak and Listen

We introduce AudioPaLM, a large language model for speech understanding ...

Paul K. Rubenstein, et al.

research

∙ 05/16/2023

SoundStorm: Efficient Parallel Audio Generation

We present SoundStorm, a model for efficient, non-autoregressive audio g...

Zalán Borsos, et al.

research

∙ 03/23/2023

LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models

We introduce LMCodec, a causal neural speech codec that provides high qu...

Teerapat Jenrungrot, et al.

research

∙ 03/13/2023

Speech Intelligibility Classifiers from 550k Disordered Speech Samples

We developed dysarthric speech intelligibility classifiers on 551,176 di...

Subhashini Venugopalan, et al.

research

∙ 02/10/2023

DNArch: Learning Convolutional Neural Architectures by Backpropagation

We present Differentiable Neural Architectures (DNArch), a method that j...

David W. Romero, et al.

research

∙ 02/07/2023

Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision

We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that...

Eugene Kharitonov, et al.

research

∙ 01/30/2023

SingSong: Generating musical accompaniments from singing

We present SingSong, a system that generates instrumental music to accom...

Chris Donahue, et al.

research

∙ 01/26/2023

MusicLM: Generating Music From Text

We introduce MusicLM, a model generating high-fidelity music from text d...

Andrea Agostinelli, et al.

research

∙ 09/07/2022

AudioLM: a Language Modeling Approach to Audio Generation

We introduce AudioLM, a framework for high-quality audio generation with...

Zalán Borsos, et al.

research

∙ 06/11/2022

Multi-instrument Music Synthesis with Spectrogram Diffusion

An ideal music synthesizer should be both interactive and expressive, ge...

Curtis Hawthorne, et al.

research

∙ 03/29/2022

Disentangling speech from surroundings in a neural audio codec

We present a method to separate speech signals from noisy environments i...

Ahmed Omran, et al.

research

∙ 03/29/2022

Learning neural audio features without supervision

Deep audio classification, traditionally cast as training a deep neural ...

Sarthak Yadav, et al.

research

∙ 02/15/2022

General-purpose, long-context autoregressive modeling with Perceiver AR

Real-world data is high-dimensional: a book, image, or musical performan...

Curtis Hawthorne, et al.

research

∙ 02/03/2022

Learning strides in convolutional neural networks

Convolutional neural networks typically contain several downsampling ope...

Rachid Riad, et al.

research

∙ 07/07/2021

SoundStream: An End-to-End Neural Audio Codec

We present SoundStream, a novel neural audio codec that can efficiently ...

Neil Zeghidour, et al.

research

∙ 05/28/2021

DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding

We introduce DIVE, an end-to-end speaker diarization algorithm. Our neur...

Neil Zeghidour, et al.

research

∙ 03/17/2021

Self-Supervised Learning of Audio Representations from Permutations with Differentiable Ranking

Self-supervised pre-training using so-called "pretext" tasks has recentl...

Andrew N. Carr, et al.

research

∙ 01/21/2021

LEAF: A Learnable Frontend for Audio Classification

Mel-filterbanks are fixed, engineered audio features which emulate human...

Neil Zeghidour, et al.

research

∙ 10/21/2020

Contrastive Learning of General-Purpose Audio Representations

We introduce COLA, a self-supervised pre-training approach for learning ...

Aaqib Saeed, et al.

research

∙ 02/20/2020

Wavesplit: End-to-End Speech Separation by Speaker Clustering

We introduce Wavesplit, an end-to-end speech separation system. From a s...

Neil Zeghidour, et al.

research

∙ 05/30/2019

Deep multi-class learning from label proportions

We propose a learning algorithm capable of learning from label proportio...

Gabriel Dulac-Arnold, et al.

research

∙ 12/17/2018

Fully Convolutional Speech Recognition

Current state-of-the-art speech recognition systems build on recurrent n...

Neil Zeghidour, et al.

research

∙ 12/09/2018

To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition

Transcribed datasets typically contain speaker identity for each instanc...

Yossi Adi, et al.

research

∙ 11/27/2018

Learning to detect dysarthria from raw speech

Speech classifiers of paralinguistic traits traditionally learn from div...

Juliette Millet, et al.

research

∙ 10/23/2018

SING: Symbol-to-Instrument Neural Generator

Recent progress in deep learning for audio synthesis opens the way to mo...

Alexandre Défossez, et al.

research

∙ 06/19/2018

End-to-End Speech Recognition From the Raw Waveform

State-of-the-art speech recognition systems rely on fixed, hand-crafted ...

Neil Zeghidour, et al.

research

∙ 04/30/2018

Sampling strategies in Siamese Networks for unsupervised speech representation learning

Recent studies have investigated siamese network architectures for learn...

Rachid Riad, et al.

research

∙ 11/03/2017

Learning Filterbanks from Raw Speech for Phone Recognition

We train a bank of complex filters that operates on the raw waveform and...

Neil Zeghidour, et al.

research

∙ 06/01/2017

Fader Networks: Manipulating Images by Sliding Attributes

This paper introduces a new encoder-decoder architecture that is trained...

Guillaume Lample, et al.

research

∙ 04/23/2017

Learning weakly supervised multimodal phoneme embeddings

Recent works have explored deep architectures for learning multimodal sp...

Rahma Chaabouni, et al.
