Generating a stereophonic presentation from a monophonic audio signal is...
Recent work has studied text-to-audio synthesis using large amounts of p...
Recent works have shown the capability of deep generative models to tack...
Single channel target speaker separation (TSS) aims at extracting a spea...
Universal sound separation consists of separating mixes with arbitrary s...
Removing background noise from speech audio has been the subject of cons...
We investigate which loss functions provide better separations via bench...
Upsampling artifacts are caused by problematic upsampling layers and due...
Version identification (VI) systems now offer accurate and scalable solu...
In this article, we aim to provide a review of the key ideas and approac...
Communication technologies like voice over IP operate under constrained...
Music is a fundamental human construct, and harmony provides the buildin...
Score-based generative models provide state-of-the-art quality for image...
The setlist identification (SLI) task addresses a music recognition use ...
A number of recent advances in audio synthesis rely on neural upsamplers...
Applications of deep learning to automatic multitrack mixing are largely...
Version identification systems aim to detect different renditions of the...
Automatic speech quality assessment is an important, transversal task wh...
The version identification (VI) task deals with the automatic detection ...
Likelihood-based generative models are a promising resource to detect ou...
End-to-end models for raw audio generation are a challenge, especially if...
The speech enhancement task usually consists of removing additive noise ...
Learning good representations without supervision is still an open issue...
We investigate supervised learning strategies that improve the training ...
Most methods of voice restoration for patients suffering from aphonia ei...
The conversion from text to speech relies on the accurate mapping from l...
This document contains the outcome of the first Human behaviour and mach...
With current technology, a number of entities have access to user mobili...
We study the use of a time series encoder to learn representations that ...
Catastrophic forgetting occurs when a neural network loses the informati...
We investigate to what extent mobile use patterns can predict -- at the...
Speech enhancement deep learning systems usually require large amounts o...
Collective urban mobility embodies the residents' local insights on the ...
Recommendation algorithms that incorporate techniques from deep learning...
Current speech enhancement techniques operate on the spectral domain and...
Finding repeated patterns or motifs in a time series is an important uns...
Efficiently finding similar segments or motifs in time series data is a...
Time series are ubiquitous, and a measure to assess their similarity is ...
The use of community detection algorithms is explored within the framewo...