Learning Signal-Agnostic Manifolds of Neural Fields

11/11/2021
by   Yilun Du, et al.

Deep neural networks have been used widely to learn the latent structure of datasets, across modalities such as images, shapes, and audio signals. However, existing models are generally modality-dependent, requiring custom architectures and objectives to process different classes of signals. We leverage neural fields to capture the underlying structure in image, shape, audio and cross-modal audiovisual domains in a modality-independent manner. We cast our task as one of learning a manifold, where we aim to infer a low-dimensional, locally linear subspace in which our data resides. By enforcing coverage of the manifold, local linearity, and local isometry, our model – dubbed GEM – learns to capture the underlying structure of datasets across modalities. We can then travel along linear regions of our manifold to obtain perceptually consistent interpolations between samples, and can further use GEM to recover points on our manifold and glean not only diverse completions of input images, but cross-modal hallucinations of audio or image signals. Finally, we show that by walking across the underlying manifold of GEM, we may generate new samples in our signal domains. Code and additional results are available at https://yilundu.github.io/gem/.
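The three ideas named in the abstract (linear interpolation along the manifold, local linearity, and local isometry) can be illustrated with a small NumPy sketch. This is not GEM's implementation: the abstract does not give the actual losses, so the function names (`interpolate`, `local_linearity_residual`, `isometry_gap`), the neighbour count `k`, and the regulariser `reg` are assumptions chosen for illustration. Local linearity is approximated here in the style of locally linear embedding, by reconstructing each latent from an affine combination of its nearest neighbours.

```python
import numpy as np


def interpolate(z0, z1, t):
    """Linear interpolation between two latent codes, for t in [0, 1]."""
    return (1.0 - t) * z0 + t * z1


def _knn(points, k):
    """Return indices of the k nearest neighbours and the full distance matrix."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    return np.argsort(d, axis=1)[:, :k], d


def local_linearity_residual(latents, k=2, reg=1e-6):
    """Mean squared error of rebuilding each latent from an affine
    combination (weights sum to 1) of its k nearest neighbours.
    Illustrative stand-in for a local-linearity objective."""
    nbrs, _ = _knn(latents, k)
    total = 0.0
    for i, idx in enumerate(nbrs):
        diffs = latents[idx] - latents[i]      # (k, dim) offsets to neighbours
        gram = diffs @ diffs.T                 # (k, k) local Gram matrix
        w = np.linalg.solve(gram + reg * np.eye(k), np.ones(k))
        w /= w.sum()                           # enforce the sum-to-one constraint
        total += float(w @ gram @ w)           # squared reconstruction error
    return total / len(latents)


def isometry_gap(latents, signals, k=2):
    """Mean squared difference between latent-space and signal-space
    distances over each point's k nearest latent neighbours.
    Illustrative stand-in for a local-isometry objective."""
    nbrs, d_lat = _knn(latents, k)
    d_sig = np.linalg.norm(signals[:, None, :] - signals[None, :, :], axis=-1)
    rows = np.arange(len(latents))[:, None]
    return float(np.mean((d_lat[rows, nbrs] - d_sig[rows, nbrs]) ** 2))
```

In a trained model, quantities like these would be computed on learned latents and decoded signals and driven toward zero by gradient descent; here they only show what each property measures.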


