Yossi Adi

research

∙ 08/24/2023

Code Llama: Open Foundation Models for Code

We release Code Llama, a family of large language models for code based ...

0 Baptiste Roziere, et al. ∙

research

∙ 08/10/2023

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

Recent work has shown that it is possible to resynthesize high-quality s...

0 Tu Anh Nguyen, et al. ∙

research

∙ 08/02/2023

From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion

Deep generative models can generate high-fidelity audio conditioned on v...

0 Robin San-Roman, et al. ∙

research

∙ 06/23/2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Large-scale generative models such as GPT and DALL-E have revolutionized...

0 Matthew Le, et al. ∙

research

∙ 06/08/2023

Simple and Controllable Music Generation

We tackle the task of conditional music generation. We introduce MusicGe...

0 Jade Copet, et al. ∙

research

∙ 05/22/2023

Scaling Speech Technology to 1,000+ Languages

Expanding the language coverage of speech technology has the potential t...

0 Vineel Pratap, et al. ∙

research

∙ 05/22/2023

AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

In recent years, image generation has shown a great leap in performance,...

0 Guy Yariv, et al. ∙

research

∙ 05/22/2023

Textually Pretrained Speech Language Models

Speech language models (SpeechLMs) process and generate acoustic data on...

0 Michael Hassid, et al. ∙

research

∙ 05/21/2023

Layer Collaboration in the Forward-Forward Algorithm

Backpropagation, which uses the chain rule, is the de-facto standard alg...

0 Guy Lorberbom, et al. ∙

research

∙ 01/25/2023

A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

Expressive speech-to-speech translation (S2ST) aims to transfer prosodic...

0 Wen-Chin Huang, et al. ∙

research

∙ 01/02/2023

Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling

This work profoundly analyzes discrete self-supervised speech representa...

0 Amitay Sicherman, et al. ∙

research

∙ 12/21/2022

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement

Prior works on improving speech quality with visual input typically stud...

0 Wei-Ning Hsu, et al. ∙

research

∙ 12/19/2022

Speaking Style Conversion With Discrete Self-Supervised Units

Voice Conversion (VC) is the task of making a spoken utterance by one sp...

0 Gallil Maimon, et al. ∙

research

∙ 11/22/2022

AERO: Audio Super Resolution in the Spectral Domain

We present AERO, an audio super-resolution model that processes speech an...

0 Moshe Mandel, et al. ∙

research

∙ 11/06/2022

I Hear Your True Colors: Image Guided Audio Generation

We propose Im2Wav, an image guided open-domain audio generation system. ...

0 Roy Sheffer, et al. ∙

research

∙ 11/02/2022

Audio Language Modeling using Perceptually-Guided Discrete Representations

In this work, we study the task of Audio Language Modeling, in which we ...

0 Felix Kreuk, et al. ∙

research

∙ 10/24/2022

High Fidelity Neural Audio Compression

We introduce a state-of-the-art real-time, high-fidelity, audio codec le...

0 Alexandre Défossez, et al. ∙

research

∙ 10/12/2022

On the Importance of Gradient Norm in PAC-Bayesian Bounds

Generalization bounds which assess the difference between the true risk ...

0 Itai Gat, et al. ∙

research

∙ 09/30/2022

On The Robustness of Self-Supervised Representations for Spoken Language Modeling

Self-supervised representations have been extensively studied for discri...

8 Itai Gat, et al. ∙

research

∙ 07/21/2022

Deep Audio Waveform Prior

Convolutional neural networks contain strong priors for generating natur...

0 Arnon Turetzky, et al. ∙

research

∙ 07/02/2022

Unsupervised Symbolic Music Segmentation using Ensemble Temporal Prediction Errors

Symbolic music segmentation is the process of dividing symbolic melodies...

0 Shahaf Bassan, et al. ∙

research

∙ 06/22/2022

A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement

Speech enhancement has seen great improvement in recent years using end-...

3 Or Tal, et al. ∙

research

∙ 05/03/2022

Learning Discrete Structured Variational Auto-Encoder using Natural Evolution Strategies

Discrete variational auto-encoders (VAEs) are able to represent semantic...

0 Alon Berliner, et al. ∙

research

∙ 04/06/2022

Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation

Direct speech-to-speech translation (S2ST) models suffer from data scarc...

0 Sravya Popuri, et al. ∙

research

∙ 03/30/2022

Generative Spoken Dialogue Language Modeling

We introduce dGSLM, the first "textless" model able to generate audio sa...

5 Tu Anh Nguyen, et al. ∙

research

∙ 03/30/2022

Probing phoneme, language and speaker information in unsupervised speech representations

Unsupervised models of representations based on Contrastive Predictive C...

0 Maureen de Seyssel, et al. ∙

research

∙ 02/17/2022

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

We present RemixIT, a simple yet effective self-supervised method for tr...

8 Efthymios Tzinis, et al. ∙

research

∙ 02/15/2022

textless-lib: a Library for Textless Spoken Language Processing

Textless spoken language processing research aims to extend the applicab...

11 Eugene Kharitonov, et al. ∙

research

∙ 11/14/2021

Textless Speech Emotion Conversion using Decomposed and Discrete Representations

Speech emotion conversion is the task of modifying the perceived emotion...

6 Felix Kreuk, et al. ∙

research

∙ 10/19/2021

Continual self-training with bootstrapped remixing for speech enhancement

We propose RemixIT, a simple and novel self-supervised training method f...

1 Efthymios Tzinis, et al. ∙

research

∙ 09/14/2021

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

This paper presents fairseq S^2, a fairseq extension for speech synthesi...

0 Changhan Wang, et al. ∙

research

∙ 09/07/2021

Text-Free Prosody-Aware Generative Spoken Language Modeling

Speech pre-training has primarily demonstrated efficacy on classificatio...

14 Eugene Kharitonov, et al. ∙

research

∙ 07/12/2021

Direct speech-to-speech translation with discrete units

We present a direct speech-to-speech translation (S2ST) model that trans...

0 Ann Lee, et al. ∙

research

∙ 06/25/2021

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation

Deep neural networks have recently shown great success in the task of bl...

0 Ori Kabeli, et al. ∙

research

∙ 04/20/2021

Differentiable Model Compression via Pseudo Quantization Noise

We propose to add independent pseudo quantization noise to model paramet...

0 Alexandre Défossez, et al. ∙

research

∙ 04/01/2021

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations

We propose using self-supervised discrete representations for the task o...

10 Adam Polyak, et al. ∙

research

∙ 02/01/2021

Generative Spoken Language Modeling from Raw Audio

Generative spoken language modeling involves learning jointly the acoust...

11 Kushal Lakhotia, et al. ∙

research

∙ 01/31/2021

High Fidelity Speech Regeneration with Application to Speech Enhancement

Speech enhancement has seen great improvement in recent years mainly thr...

0 Adam Polyak, et al. ∙

research

∙ 11/04/2020

Single channel voice separation for unknown number of speakers under reverberant and noisy settings

We present a unified network for voice separation of an unknown number o...

0 Shlomo E. Chazan, et al. ∙

research

∙ 09/03/2020

Fairness in the Eyes of the Data: Certifying Machine-Learning Models

We present a framework that allows certifying the fairness degree of a m...

27 Shahar Segal, et al. ∙

research

∙ 09/02/2020

SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

Most existing deep learning based binaural speaker separation systems fo...

0 Ke Tan, et al. ∙

research

∙ 08/06/2020

Unsupervised Cross-Domain Singing Voice Conversion

We present a wav-to-wav generative model for the task of singing voice c...

0 Adam Polyak, et al. ∙

research

∙ 07/27/2020

Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation

We propose a self-supervised representation learning model for the task ...

7 Felix Kreuk, et al. ∙

research

∙ 06/23/2020

Real Time Speech Enhancement in the Waveform Domain

We present a causal speech enhancement model working on the raw waveform...

0 Alexandre Défossez, et al. ∙

research

∙ 02/29/2020

Voice Separation with an Unknown Number of Multiple Speakers

We present a new method for separating a mixed audio sequence, in which ...

0 Eliya Nachmani, et al. ∙

research

∙ 02/23/2020

On the generalization of bayesian deep nets for multi-class classification

Generalization bounds which assess the difference between the true risk ...

0 Yossi Adi, et al. ∙

research

∙ 02/11/2020

Phoneme Boundary Detection using Learnable Segmental Features

Phoneme boundary detection plays an essential first step for a variety o...

0 Felix Kreuk, et al. ∙

research

∙ 02/07/2019

Hide and Speak: Deep Neural Networks for Speech Steganography

Steganography is the science of hiding a secret message within an ordina...

0 Felix Kreuk, et al. ∙

research

∙ 12/09/2018

To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition

Transcribed datasets typically contain speaker identity for each instanc...

14 Yossi Adi, et al. ∙

research

∙ 08/20/2018

Out-of-Distribution Detection using Multiple Semantic Label Representations

Deep Neural Networks are powerful models that attained remarkable result...

0 Gabi Shalev, et al. ∙
