Generating Compositional Color Representations from Text

09/22/2021
by Paridhi Maheshwari, et al.

We consider the cross-modal task of producing color representations for text phrases. Motivated by the fact that a significant fraction of user queries on an image search engine follow an (attribute, object) structure, we propose a generative adversarial network (GAN) that generates color profiles for such bigrams. We design our pipeline to learn composition, that is, the ability to combine seen attributes and objects into unseen pairs. We propose a novel dataset curation pipeline that draws on existing public sources. We describe how a set of phrases of interest can be compiled using a graph propagation technique and then mapped to images. While this dataset is specialized for our investigations on color, the method can be extended to other visual dimensions where composition is of interest. We provide detailed ablation studies that test the behavior of our GAN architecture with loss functions from the contrastive learning literature. We show that the generative model achieves a lower Fréchet Inception Distance than discriminative models, and therefore predicts color profiles that better match those from real images. Finally, we demonstrate improved performance in image retrieval and classification, indicating the crucial role that color plays in these downstream tasks.
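For readers unfamiliar with the metric named above, the following is a minimal sketch of the Fréchet distance between two Gaussians fitted to sets of feature vectors, which is the quantity underlying the Fréchet Inception Distance. It is not the authors' evaluation code: the function name `frechet_distance` is hypothetical, and the inputs are assumed to be arbitrary feature arrays (one row per sample), since the abstract does not say which feature extractor is used.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (n_samples, n_features). Which
    features to use is an assumption left to the caller.
    """
    mu_a = feats_a.mean(axis=0)
    mu_b = feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

A lower value means the two feature distributions are closer in mean and covariance, which is the sense in which the abstract compares generated color profiles with those from real images.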


