Sticker820K: Empowering Interactive Retrieval with Stickers

06/12/2023
by   Sijie Zhao, et al.

Stickers have become a ubiquitous part of modern communication, conveying complex emotions through visual imagery. To facilitate the development of more powerful algorithms for analyzing stickers, we propose Sticker820K, a large-scale Chinese sticker dataset consisting of 820k image-text pairs. Each sticker carries rich, high-quality textual annotations, including descriptions, optical characters, emotional labels, and style classifications. Although vision-language tasks on natural images have been well studied, directly applying those models, such as CLIP, to sticker data is suboptimal because of the discrepancy between natural and emotive image data. We therefore propose StickerCLIP as a benchmark model on the Sticker820K dataset. On the text-to-image retrieval task, StickerCLIP substantially outperforms CLIP, achieving an absolute gain of 66.0% in mean recall on the Sticker820K test set. Additionally, we extend a recently popularized LLM by means of prompt tuning, integrating sticker-retrieval ability and allowing users to retrieve stickers through instructions. We validate the feasibility of this method, demonstrating the potential of prompt tuning to expand LLM abilities without degrading the quality of upstream tasks.
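The headline metric above, mean recall for text-to-image retrieval, averages recall@k over several cutoffs (commonly k = 1, 5, 10). As a rough illustration of how such a metric is computed from CLIP-style embeddings, here is a minimal NumPy sketch; the function name and the assumption that text i is paired with image i are illustrative choices of ours, not taken from the paper's code:

```python
import numpy as np

def recall_at_k(text_emb, image_emb, ks=(1, 5, 10)):
    """Recall@k for text-to-image retrieval on paired data, where
    text i is assumed to match image i (CLIP-style evaluation).

    Illustrative sketch only; not the paper's implementation.
    """
    # L2-normalise so that a dot product equals cosine similarity.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sim = t @ v.T  # (num_texts, num_images) similarity matrix

    # Rank images by descending similarity for each query, then find
    # the position of the ground-truth image in that ranking.
    order = np.argsort(-sim, axis=1)
    ranks = np.argmax(order == np.arange(len(t))[:, None], axis=1)

    out = {k: float(np.mean(ranks < k)) for k in ks}
    out["mean_recall"] = float(np.mean([out[k] for k in ks]))
    return out
```

In practice the embeddings would come from the trained text and image encoders; the same routine applies symmetrically to image-to-text retrieval by transposing the similarity matrix.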


Related research:

- Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models (08/09/2021): We extend the task of composed image retrieval, where an input query con...
- ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval (08/19/2023): Recent studies have shown that dense retrieval models, lacking dedicated...
- Telling the What while Pointing the Where: Fine-grained Mouse Trace and Language Supervision for Improved Image Retrieval (02/09/2021): Existing image retrieval systems use text queries to provide a natural a...
- Structured Vision-Language Pretraining for Computational Cooking (12/08/2022): Vision-Language Pretraining (VLP) and Foundation models have been the go...
- Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries (11/10/2019): This paper explores the task of interactive image retrieval using natura...
- Towards a Visual-Language Foundation Model for Computational Pathology (07/24/2023): The accelerated adoption of digital pathology and advances in deep learn...
- PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents (03/13/2023): Foundation models trained on large-scale dataset gain a recent surge in ...
