Adam Polyak

research

∙ 03/02/2023

X&Fuse: Fusing Visual Information in Text-to-Image Generation

We introduce X&Fuse, a general approach for conditioning on visual inf...

0 Yuval Kirstain, et al. ∙

research

∙ 11/02/2022

Audio Language Modeling using Perceptually-Guided Discrete Representations

In this work, we study the task of Audio Language Modeling, in which we ...

0 Felix Kreuk, et al. ∙

research

∙ 03/24/2022

Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Recent text-to-image generation methods provide a simple yet exciting co...

8 Oran Gafni, et al. ∙

research

∙ 12/09/2021

Locally Shifted Attention With Early Global Integration

Recent work has shown the potential of transformers for computer vision ...

9 Shelly Sheynin, et al. ∙

research

∙ 11/14/2021

Textless Speech Emotion Conversion using Decomposed and Discrete Representations

Speech emotion conversion is the task of modifying the perceived emotion...

6 Felix Kreuk, et al. ∙

research

∙ 09/14/2021

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

This paper presents fairseq S^2, a fairseq extension for speech synthesi...

0 Changhan Wang, et al. ∙

research

∙ 09/07/2021

Text-Free Prosody-Aware Generative Spoken Language Modeling

Speech pre-training has primarily demonstrated efficacy on classificatio...

14 Eugene Kharitonov, et al. ∙

research

∙ 07/12/2021

Direct speech-to-speech translation with discrete units

We present a direct speech-to-speech translation (S2ST) model that trans...

0 Ann Lee, et al. ∙

research

∙ 04/01/2021

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations

We propose using self-supervised discrete representations for the task o...

10 Adam Polyak, et al. ∙

research

∙ 02/01/2021

Generative Spoken Language Modeling from Raw Audio

Generative spoken language modeling involves learning jointly the acoust...

11 Kushal Lakhotia, et al. ∙

research

∙ 01/31/2021

High Fidelity Speech Regeneration with Application to Speech Enhancement

Speech enhancement has seen great improvement in recent years mainly thr...

0 Adam Polyak, et al. ∙

research

∙ 08/06/2020

Unsupervised Cross-Domain Singing Voice Conversion

We present a wav-to-wav generative model for the task of singing voice c...

0 Adam Polyak, et al. ∙

research

∙ 04/18/2019

TTS Skins: Speaker Conversion via ASR

We present a fully convolutional wav-to-wav network for converting betwe...

0 Adam Polyak, et al. ∙

research

∙ 05/21/2018

A Universal Music Translation Network

We present a method for translating music across musical instruments, ge...

0 Noam Mor, et al. ∙

research

∙ 02/20/2018

Fitting New Speakers Based on a Short Untranscribed Sample

Learning-based Text To Speech systems have the potential to generalize f...

0 Eliya Nachmani, et al. ∙

research

∙ 07/20/2017

VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop

We present a new neural text to speech (TTS) method that is able to tran...

0 Yaniv Taigman, et al. ∙

research

∙ 11/07/2016

Unsupervised Cross-Domain Image Generation

We study the problem of transferring a sample in one domain to an analog...

0 Yaniv Taigman, et al. ∙
