Zero-Shot Multi-Modal Artist-Controlled Retrieval and Exploration of 3D Object Sets

09/01/2022
by   Kristofer Schlachter, et al.

When creating 3D content, highly specialized skills are generally needed to design and generate models of objects and other assets by hand. We address this problem through high-quality 3D asset retrieval from multi-modal inputs, including 2D sketches, images and text. We use CLIP as it provides a bridge to higher-level latent features. We use these features to perform a multi-modality fusion to address the lack of artistic control that affects common data-driven approaches. Our approach allows for multi-modal conditional feature-driven retrieval through a 3D asset database, by utilizing a combination of input latent embeddings. We explore the effects of different combinations of feature embeddings across different input types and weighting methods.
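
The weighted combination of input embeddings described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fusion rule, so a weighted sum of unit-normalized vectors followed by cosine-similarity ranking is assumed here. The helper names (`normalize`, `fuse`, `retrieve`), the three-dimensional toy vectors and the asset names are all hypothetical stand-ins; in practice the vectors would come from CLIP's image and text encoders.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(embeddings, weights):
    """Assumed fusion rule: weighted sum of unit-normalized embeddings,
    re-normalized so the result can be used as a cosine-similarity query."""
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    fused = [0.0] * len(embeddings[0])
    for emb, w in zip(embeddings, weights):
        for i, x in enumerate(normalize(emb)):
            fused[i] += w * x
    return normalize(fused)


def retrieve(query, database, top_k=3):
    """Return the names of the top_k assets ranked by cosine similarity."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), name)
        for name, vec in database.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical toy data standing in for CLIP embeddings of 3D assets.
DATABASE = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
    "lamp": [0.0, 0.0, 1.0],
}
SKETCH_EMB = [0.9, 0.1, 0.0]  # stand-in for an encoded 2D sketch
TEXT_EMB = [0.0, 0.2, 0.9]    # stand-in for an encoded text prompt

if __name__ == "__main__":
    for weights in ([0.8, 0.2], [0.2, 0.8]):
        query = fuse([SKETCH_EMB, TEXT_EMB], weights)
        print(weights, retrieve(query, DATABASE))
```

Shifting the weights between the sketch and the text embedding changes which asset ranks first, which is the kind of artist-side control the abstract refers to. Other weighting schemes would slot into `fuse` without changing the retrieval step.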
