We release Code Llama, a family of large language models for code based ...
Recent work has shown that it is possible to resynthesize high-quality s...
Deep generative models can generate high-fidelity audio conditioned on
v...
Large-scale generative models such as GPT and DALL-E have revolutionized...
We tackle the task of conditional music generation. We introduce MusicGe...
Expanding the language coverage of speech technology has the potential t...
In recent years, image generation has shown a great leap in performance,...
Speech language models (SpeechLMs) process and generate acoustic data on...
Backpropagation, which uses the chain rule, is the de-facto standard
alg...
Expressive speech-to-speech translation (S2ST) aims to transfer prosodic...
This work profoundly analyzes discrete self-supervised speech representa...
Prior works on improving speech quality with visual input typically stud...
Voice Conversion (VC) is the task of making a spoken utterance by one sp...
We present AERO, a audio super-resolution model that processes speech an...
We propose Im2Wav, an image guided open-domain audio generation system. ...
In this work, we study the task of Audio Language Modeling, in which we ...
We introduce a state-of-the-art real-time, high-fidelity, audio codec
le...
Generalization bounds which assess the difference between the true risk ...
Self-supervised representations have been extensively studied for
discri...
Convolutional neural networks contain strong priors for generating natur...
Symbolic music segmentation is the process of dividing symbolic melodies...
Speech enhancement has seen great improvement in recent years using
end-...
Discrete variational auto-encoders (VAEs) are able to represent semantic...
Direct speech-to-speech translation (S2ST) models suffer from data scarc...
We introduce dGSLM, the first "textless" model able to generate audio sa...
Unsupervised models of representations based on Contrastive Predictive C...
We present RemixIT, a simple yet effective self-supervised method for
tr...
Textless spoken language processing research aims to extend the applicab...
Speech emotion conversion is the task of modifying the perceived emotion...
We propose RemixIT, a simple and novel self-supervised training method f...
This paper presents fairseq S^2, a fairseq extension for speech synthesi...
Speech pre-training has primarily demonstrated efficacy on classificatio...
We present a direct speech-to-speech translation (S2ST) model that trans...
Deep neural networks have recently shown great success in the task of bl...
We propose to add independent pseudo quantization noise to model paramet...
We propose using self-supervised discrete representations for the task o...
Generative spoken language modeling involves learning jointly the acoust...
Speech enhancement has seen great improvement in recent years mainly thr...
We present a unified network for voice separation of an unknown number o...
We present a framework that allows to certify the fairness degree of a m...
Most existing deep learning based binaural speaker separation systems fo...
We present a wav-to-wav generative model for the task of singing voice
c...
We propose a self-supervised representation learning model for the task ...
We present a causal speech enhancement model working on the raw waveform...
We present a new method for separating a mixed audio sequence, in which
...
Generalization bounds which assess the difference between the true risk ...
Phoneme boundary detection plays an essential first step for a variety o...
Steganography is the science of hiding a secret message within an ordina...
Transcribed datasets typically contain speaker identity for each instanc...
Deep Neural Networks are powerful models that attained remarkable result...