Weakly Supervised Annotations for Multi-modal Greeting Cards Dataset

12/01/2022
by Sidra Hanif, et al.

In recent years, a growing number of models pre-trained on large corpora of data have yielded good performance on various tasks, such as classifying multimodal datasets. These models perform well on natural images but have not been fully explored for scarce abstract concepts in images. In this work, we introduce an image/text-based dataset called the Greeting Cards Dataset (GCD), which contains abstract visual concepts. We propose to aggregate features from pretrained image and text embeddings to learn abstract visual concepts from GCD. This allows us to learn text-modified image features, which combine complementary and redundant information from the multi-modal data streams into a single, meaningful feature. In addition, the captions for GCD are computed with a pretrained CLIP-based image captioning model. Finally, we demonstrate that the proposed dataset is also useful for generating greeting card images using a pre-trained text-to-image generation model.
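
The abstract does not specify how the image and text features are aggregated, so the following is only an illustrative sketch of one common option: weighted concatenation of unit-normalized embeddings. The function names (`l2_normalize`, `fuse_features`), the `text_weight` parameter, and the toy vectors are hypothetical and are not taken from the paper; real embeddings would come from pretrained image and text encoders.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_features(
    image_emb: Sequence[float],
    text_emb: Sequence[float],
    text_weight: float = 0.5,
) -> List[float]:
    """Hypothetical fusion step (an assumption, not the paper's method).

    Both embeddings are L2-normalized so neither modality dominates by
    scale, then weighted and concatenated into one feature vector.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    weighted_img = [(1.0 - text_weight) * x for x in img]
    weighted_txt = [text_weight * x for x in txt]
    return weighted_img + weighted_txt


# Toy stand-ins for an image embedding and a caption embedding.
fused = fuse_features([3.0, 4.0], [0.0, 2.0], text_weight=0.5)
print(len(fused))
```

Concatenation keeps both modalities' information intact and leaves it to a downstream classifier to weigh them; other aggregation schemes (summation, gating, attention) would fit the same description in the abstract.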


