TagGPT: Large Language Models are Zero-shot Multimodal Taggers

04/06/2023
by Chen Li, et al.

Tags are pivotal in facilitating the effective distribution of multimedia content across applications in the contemporary Internet era, such as search engines and recommendation systems. Recently, large language models (LLMs) have demonstrated impressive capabilities across a wide range of tasks. In this work, we propose TagGPT, a fully automated system capable of tag extraction and multimodal tagging in a completely zero-shot fashion. Our core insight is that, through elaborate prompt engineering, LLMs can extract and reason about proper tags given textual clues of multimodal data, e.g., OCR, ASR, and titles. Specifically, to automatically build a high-quality tag set that reflects user intent and interests for a specific application, TagGPT predicts large-scale candidate tags from a series of raw data by prompting LLMs, then filters them by frequency and semantics. Given a new entity that needs tagging for distribution, TagGPT offers two alternative options for zero-shot tagging: a generative method with late semantic matching against the tag set, and a selective method with early matching in prompts. Notably, TagGPT provides a system-level solution based on a modular framework equipped with a pre-trained LLM (GPT-3.5 used here) and a sentence embedding model (SimCSE used here), each of which can be seamlessly replaced with a more advanced alternative. TagGPT is applicable to various modalities of data in modern social media and shows strong generalization to a wide range of applications. We evaluate TagGPT on publicly available datasets, i.e., Kuaishou and Food.com, and demonstrate its effectiveness compared to existing hashtags and off-the-shelf taggers. Project page: https://github.com/TencentARC/TagGPT.
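To make the generative option concrete, below is a minimal Python sketch of late semantic matching under stated assumptions: the function names (late_semantic_matching, embed, toy_embed) and the 0.5 similarity threshold are illustrative placeholders, not the paper's actual implementation. In TagGPT the embedder would be a sentence embedding model such as SimCSE and the candidate tags would come from prompting GPT-3.5; here a toy embedder keeps the sketch self-contained.

# A minimal sketch of the "generative" tagging option: an LLM proposes
# free-form tags for an item, which are then mapped onto the curated tag set
# by nearest-neighbour search in sentence-embedding space (late semantic
# matching). `embed` is a hypothetical stand-in for a sentence encoder
# such as SimCSE; the 0.5 threshold is an assumed value for illustration.

from typing import Callable, Sequence
import hashlib
import numpy as np


def late_semantic_matching(
    generated_tags: Sequence[str],
    tag_set: Sequence[str],
    embed: Callable[[Sequence[str]], np.ndarray],
    threshold: float = 0.5,
) -> list[str]:
    """Map LLM-generated tags to the closest entries of the curated tag set.

    `embed` returns an (n, d) matrix of sentence embeddings; generated tags
    whose best cosine similarity falls below `threshold` are discarded.
    """
    gen_emb = embed(generated_tags)   # (g, d)
    set_emb = embed(tag_set)          # (t, d)

    # Normalize rows so the dot product below is cosine similarity.
    gen_emb = gen_emb / np.linalg.norm(gen_emb, axis=1, keepdims=True)
    set_emb = set_emb / np.linalg.norm(set_emb, axis=1, keepdims=True)
    sims = gen_emb @ set_emb.T        # (g, t)

    matched = []
    for row in sims:
        best = int(np.argmax(row))
        if row[best] >= threshold and tag_set[best] not in matched:
            matched.append(tag_set[best])
    return matched


if __name__ == "__main__":
    # Toy bag-of-words embedder used only to keep the sketch runnable end to
    # end; in practice this would be SimCSE or another sentence encoder.
    def toy_embed(texts: Sequence[str]) -> np.ndarray:
        def word_vec(word: str) -> np.ndarray:
            seed = int(hashlib.md5(word.encode()).hexdigest()[:8], 16)
            return np.random.default_rng(seed).standard_normal(64)

        return np.stack(
            [sum(word_vec(w) for w in t.lower().split()) for t in texts]
        )

    # Hypothetical candidate tags as an LLM might generate them for a recipe
    # video, matched against a small curated tag set.
    print(late_semantic_matching(
        generated_tags=["home baking", "chocolate dessert"],
        tag_set=["baking", "dessert", "travel", "fitness"],
        embed=toy_embed,
    ))

The selective option differs only in where matching happens: instead of matching the LLM's free-form output afterwards, the curated tag set (or a retrieved subset of it) is placed directly in the prompt and the LLM chooses among those tags.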


Related research

05/23/2023  Are Large Language Models Robust Zero-shot Coreference Resolvers?
Recent progress in domain adaptation for coreference resolution relies o...

09/02/2023  Zero-Shot Recommendations with Pre-Trained Large Language Models for Multimodal Nudging
We present a method for zero-shot recommendation of multimodal non-stati...

10/13/2021  TAG: Toward Accurate Social Media Content Tagging with a Concept Graph
Although conceptualization has been widely studied in semantics and know...

11/15/2022  Prompting Language Models for Linguistic Structure
Although pretrained language models (PLMs) can be prompted to perform a ...

12/16/2017  Train Once, Test Anywhere: Zero-Shot Learning for Text Classification
Zero-shot Learners are models capable of predicting unseen classes. In t...

04/01/2020  Adversarial Learning for Personalized Tag Recommendation
We have recently seen great progress in image classification due to the ...

07/17/2020  Augmented Understanding and Automated Adaptation of Curation Rules
Over the past years, there has been many efforts to curate and increase ...
