Machine learning from training data with a skewed distribution of exampl...
Generating high quality music that complements the visual content of a v...
We introduce MusicLM, a model generating high-fidelity music from text
d...
Multimodal learning can benefit from the representation power of pretrai...
Music tagging and content-based retrieval systems have traditionally bee...
We propose a method of separating a desired sound source from a
single-c...
Many speech applications require understanding aspects beyond the words ...
We summarize the results of a host of efforts using giant automatic spee...
Humans perceive the world by concurrently processing and fusing
high-dim...
Supervised neural network training has led to significant progress on
si...
To reveal the importance of temporal precision in ground truth audio eve...
Real-world sound scenes consist of time-varying collections of sound sou...
Recent progress in deep learning has enabled many advances in sound
sepa...
The study of label noise in sound event recognition has recently gained
...
The ultimate goal of transfer learning is to reduce labeled data require...
Deep learning approaches have recently achieved impressive performance o...
Humans do not acquire perceptual abilities in the way we train machines....
Even in the absence of any explicit semantic annotation, vast collection...
Convolutional Neural Networks (CNNs) have proven very effective in image...
Zero-resource speech technology is a growing research area that aims to
...
In settings where only unlabelled speech data is available, speech techn...
Several popular graph embedding techniques for representation learning a...