Task Bias in Vision-Language Models

12/08/2022
by Sachit Menon, et al.

Incidental supervision from language has become a popular approach for learning generic visual representations that can be prompted to perform many recognition tasks in computer vision. We conduct an in-depth exploration of the CLIP model and show that its visual representation is often strongly biased towards solving some tasks more than others. Moreover, which task the representation will be biased towards is unpredictable, with little consistency across images. To resolve this task bias, we show how to learn a visual prompt that guides the representation towards features relevant to the task of interest. Our results show that these visual prompts can be independent of the input image and still effectively provide a conditioning mechanism to steer visual representations towards the desired task.
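The idea of an input-independent visual prompt can be illustrated with a toy sketch. This is not the paper's implementation: it assumes the prompt is a single vector added to every input, and it replaces CLIP with hypothetical stand-ins (a frozen random `tanh` layer `W` for the image encoder and a frozen matrix `T` for the text embeddings of the class names). The sizes, the synthetic data, and the step-halving rule are likewise illustrative choices. Only the prompt is trained.

```python
import math
import random

random.seed(0)
D, H, K = 6, 4, 2  # input size, feature size, number of classes (toy values)

# Frozen stand-ins (assumptions): W plays the image encoder,
# T plays the text embeddings of the class names for one task.
W = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(H)]
T = [[random.gauss(0, 1.0) for _ in range(H)] for _ in range(K)]

# Synthetic "images" labelled for one task of interest (sign of the first pixel).
images = [[random.gauss(0, 1.0) for _ in range(D)] for _ in range(32)]
labels = [1 if x[0] > 0 else 0 for x in images]


def forward(x, prompt):
    """Encode the prompted image; return (features, class probabilities)."""
    feats = [
        math.tanh(sum(w * (xi + pi) for w, xi, pi in zip(row, x, prompt)))
        for row in W
    ]
    logits = [sum(t * f for t, f in zip(row, feats)) for row in T]
    top = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return feats, [e / total for e in exps]


def loss_and_grad(prompt):
    """Mean cross-entropy over the dataset and its gradient w.r.t. the prompt."""
    loss, grad = 0.0, [0.0] * D
    for x, y in zip(images, labels):
        feats, probs = forward(x, prompt)
        loss -= math.log(probs[y])
        d_logits = [p - (1.0 if k == y else 0.0) for k, p in enumerate(probs)]
        for h in range(H):
            d_feat = sum(d_logits[k] * T[k][h] for k in range(K))
            d_pre = d_feat * (1.0 - feats[h] ** 2)  # derivative of tanh
            for d in range(D):
                grad[d] += d_pre * W[h][d]
    n = len(images)
    return loss / n, [g / n for g in grad]


def learn_prompt(steps=200, lr=0.5):
    """Gradient descent on the prompt only; W and T are never updated."""
    prompt = [0.0] * D  # one prompt shared by every image
    loss, grad = loss_and_grad(prompt)
    initial_loss = loss
    for _ in range(steps):
        candidate = [p - lr * g for p, g in zip(prompt, grad)]
        new_loss, new_grad = loss_and_grad(candidate)
        if new_loss < loss:
            prompt, loss, grad = candidate, new_loss, new_grad
        else:
            lr *= 0.5  # step too large: shrink it and retry
    return prompt, initial_loss, loss


prompt, initial_loss, final_loss = learn_prompt()
print(f"loss without prompt: {initial_loss:.4f}, with learned prompt: {final_loss:.4f}")
```

Because the same `prompt` vector is added to every image, it acts as a conditioning signal for the task rather than a per-image correction. In this toy setting it can only shift where the frozen features saturate, so the sketch shows the training setup, not the size of the effect reported for CLIP.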

