Neil Zeghidour
Facebook A.I. Research
We present TokenSplit, a speech separation model that acts on discrete t...
We introduce AudioPaLM, a large language model for speech understanding ...
We present SoundStorm, a model for efficient, non-autoregressive audio
g...
We introduce LMCodec, a causal neural speech codec that provides high qu...
We developed dysarthric speech intelligibility classifiers on 551,176
di...
We present Differentiable Neural Architectures (DNArch), a method that
j...
We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that...
We present SingSong, a system that generates instrumental music to accom...
We introduce MusicLM, a model generating high-fidelity music from text
d...
We introduce AudioLM, a framework for high-quality audio generation with...
An ideal music synthesizer should be both interactive and expressive,
ge...
We present a method to separate speech signals from noisy environments i...
Deep audio classification, traditionally cast as training a deep neural
...
Real-world data is high-dimensional: a book, image, or musical performan...
Convolutional neural networks typically contain several downsampling
ope...
We present SoundStream, a novel neural audio codec that can efficiently
...
We introduce DIVE, an end-to-end speaker diarization algorithm. Our neur...
Self-supervised pre-training using so-called "pretext" tasks has recentl...
Mel-filterbanks are fixed, engineered audio features which emulate human...
We introduce COLA, a self-supervised pre-training approach for learning ...
We introduce Wavesplit, an end-to-end speech separation system. From a s...
We propose a learning algorithm capable of learning from label proportio...
Current state-of-the-art speech recognition systems build on recurrent n...
Transcribed datasets typically contain speaker identity for each instanc...
Speech classifiers of paralinguistic traits traditionally learn from div...
Recent progress in deep learning for audio synthesis opens the way to mo...
State-of-the-art speech recognition systems rely on fixed, hand-crafted
...
Recent studies have investigated siamese network architectures for learn...
We train a bank of complex filters that operates on the raw waveform and...
This paper introduces a new encoder-decoder architecture that is trained...
Recent works have explored deep architectures for learning multimodal sp...