Renderers are Good Zero-Shot Representation Learners: Exploring Diffusion Latents for Metric Learning

06/19/2023
by Michael Tang, et al.

Can the latent spaces of modern generative neural rendering models serve as representations for 3D-aware discriminative visual understanding tasks? We use retrieval as a proxy for measuring the metric learning properties of Shap-E's latent spaces, including how well they capture view-independence and whether scene representations can be aggregated from the representations of individual image views. We find that Shap-E representations outperform the classical EfficientNet baseline zero-shot, and remain competitive when both methods are trained with a contrastive loss. These findings give a preliminary indication that 3D-based rendering and generative models can yield useful representations for discriminative tasks in our inherently 3D world. Our code is available at <https://github.com/michaelwilliamtang/golden-retriever>.
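To make the evaluation idea concrete, here is a minimal sketch of retrieval as a metric-learning proxy. It is not the authors' code. The abstract does not say how views are aggregated, which retrieval metric is used, or which contrastive loss is applied, so this sketch assumes mean pooling of unit-normalized view embeddings, recall@k under cosine similarity, and an InfoNCE-style loss. All function names are hypothetical, and the embeddings are synthetic stand-ins rather than Shap-E or EfficientNet outputs.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale embeddings to unit length so dot products equal cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def aggregate_views(view_embeddings):
    # Assumed aggregation: mean-pool unit-normalized per-view embeddings
    # (shape [num_views, dim]) into one scene embedding, then renormalize.
    return l2_normalize(l2_normalize(view_embeddings).mean(axis=0))

def recall_at_k(queries, gallery, query_labels, gallery_labels, k=1):
    # Fraction of queries whose k most similar gallery items (by cosine
    # similarity) include at least one item with the same label.
    sims = l2_normalize(queries) @ l2_normalize(gallery).T
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = (gallery_labels[topk] == query_labels[:, None]).any(axis=1)
    return float(hits.mean())

def info_nce_loss(anchors, positives, temperature=0.1):
    # InfoNCE-style loss (an assumed stand-in for "a contrastive loss"):
    # row i of `anchors` should match row i of `positives`; the other rows
    # in the batch act as negatives.
    logits = l2_normalize(anchors) @ l2_normalize(positives).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Synthetic example: each scene has a hidden code, and each "view embedding"
# is that code plus small noise (a placeholder for real per-view latents).
rng = np.random.default_rng(0)
num_scenes, num_views, dim = 5, 4, 64
scene_codes = rng.normal(size=(num_scenes, dim))
views = scene_codes[:, None, :] + 0.05 * rng.normal(size=(num_scenes, num_views, dim))

queries = views[:, 0]                                         # one held-out view per scene
gallery = np.stack([aggregate_views(v[1:]) for v in views])   # aggregate the remaining views
labels = np.arange(num_scenes)

print("recall@1:", recall_at_k(queries, gallery, labels, labels, k=1))
print("InfoNCE (matched views):", info_nce_loss(views[:, 0], views[:, 1]))
```

Querying with a single view against aggregated scene embeddings tests both properties named above at once. A view-independent representation places different views of one scene close together, and a well-behaved aggregation keeps the pooled scene vector close to each of its views.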

research
08/31/2023

Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models

Zero-shot referring image segmentation is a challenging task because it ...
research
07/27/2019

Hybrid-Attention based Decoupled Metric Learning for Zero-Shot Image Retrieval

In zero-shot image retrieval (ZSIR) task, embedding learning becomes mor...
research
01/22/2019

Energy Confused Adversarial Metric Learning for Zero-Shot Image Retrieval and Clustering

Deep metric learning has been widely applied in many computer vision tas...
research
12/15/2021

Modality-Aware Triplet Hard Mining for Zero-shot Sketch-Based Image Retrieval

This paper tackles the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) ...
research
03/01/2023

Bootstrapping Parallel Anchors for Relative Representations

The use of relative representations for latent embeddings has shown pote...
research
08/07/2019

Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings

Learning an effective similarity measure between image representations i...
research
05/23/2023

Parts of Speech-Grounded Subspaces in Vision-Language Models

Latent image representations arising from vision-language models have pr...