TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification

09/13/2023
by M. Jehanzeb Mirza, et al.

Vision and Language Models (VLMs), such as CLIP, have enabled visual recognition of a potentially unlimited set of categories described by text prompts. However, for the best visual recognition performance, these models still require tuning to better fit the data distributions of the downstream tasks, in order to overcome the domain shift from the web-based pre-training data. Recently, it has been shown that it is possible to effectively tune VLMs without any paired data, and in particular to improve VLMs' visual recognition performance using text-only training data generated by Large Language Models (LLMs). In this paper, we dive deeper into this exciting text-only VLM training approach and explore ways it can be significantly further improved by taking the specifics of the downstream task into account when sampling text data from LLMs. In particular, compared to the SOTA text-only VLM training approach, we demonstrate gains of up to 8.4% for domain-specific adaptation, as well as improvements of up to 8.7% and 3.1% over strong baselines on further benchmarks.
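The abstract describes the idea only at a high level. The following is a minimal, hypothetical sketch (not the authors' actual pipeline) of what text-only adaptation of CLIP with task-targeted, LLM-generated training sentences could look like: sentences describing each downstream category are embedded with CLIP's frozen text encoder and used to train a lightweight classifier, which is then applied to image embeddings at test time via CLIP's shared image-text embedding space. The class names, prompt wording, and the generate_descriptions helper are assumptions made for illustration.

    # Hypothetical sketch of text-only adaptation of a frozen CLIP model
    # using task-targeted, LLM-generated training sentences.
    import torch
    import torch.nn as nn
    import clip  # https://github.com/openai/CLIP

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)
    model.eval()  # the VLM stays frozen; only the small classifier is trained

    # Illustrative downstream categories (assumption, not from the paper).
    class_names = ["golden retriever", "tabby cat", "red fox"]

    def generate_descriptions(class_name, n=16):
        """Placeholder for querying an LLM with task-targeted prompts,
        e.g. asking for descriptions of the class as it appears in the
        downstream domain. Here we return fixed templates instead of
        real LLM samples."""
        return [f"a photo of a {class_name}, a type of animal." for _ in range(n)]

    # Build a text-only training set: (sentence embedding, class label) pairs.
    features, labels = [], []
    with torch.no_grad():
        for label, name in enumerate(class_names):
            tokens = clip.tokenize(generate_descriptions(name)).to(device)
            emb = model.encode_text(tokens).float()
            emb = emb / emb.norm(dim=-1, keepdim=True)  # L2-normalize, as CLIP does
            features.append(emb)
            labels.append(torch.full((emb.shape[0],), label, device=device))
    features = torch.cat(features)
    labels = torch.cat(labels)

    # Train a linear classifier on text embeddings only (no images involved).
    classifier = nn.Linear(features.shape[1], len(class_names)).to(device)
    optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
    for _ in range(100):
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(classifier(features), labels)
        loss.backward()
        optimizer.step()

    # At test time the same classifier is applied to image embeddings,
    # exploiting CLIP's shared image-text embedding space.
    def classify(image_batch):
        with torch.no_grad():
            img = model.encode_image(image_batch.to(device)).float()
            img = img / img.norm(dim=-1, keepdim=True)
            return classifier(img).argmax(dim=-1)

The key design point this sketch illustrates is that only a small classifier is trained, and it never sees an image during training; the paper's contribution concerns how the textual training instances are sampled from the LLM in a task-adaptive way.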


