We introduce X Fuse, a general approach for conditioning on visual inf...
In this work, we study the task of Audio Language Modeling, in which we ...
Recent text-to-image generation methods provide a simple yet exciting co...
Recent work has shown the potential of transformers for computer vision...
Speech emotion conversion is the task of modifying the perceived emotion...
This paper presents fairseq S^2, a fairseq extension for speech synthesi...
Speech pre-training has primarily demonstrated efficacy on classificatio...
We present a direct speech-to-speech translation (S2ST) model that trans...
We propose using self-supervised discrete representations for the task o...
Generative spoken language modeling involves learning jointly the acoust...
Speech enhancement has seen great improvement in recent years mainly thr...
We present a wav-to-wav generative model for the task of singing voice c...
We present a fully convolutional wav-to-wav network for converting betwe...
We present a method for translating music across musical instruments, ge...
Learning-based Text To Speech systems have the potential to generalize f...
We present a new neural text to speech (TTS) method that is able to tran...
We study the problem of transferring a sample in one domain to an analog...