Vision-Language Models Performing Zero-Shot Tasks Exhibit Gender-based Disparities

01/26/2023
by Melissa Hall, et al.

We explore the extent to which zero-shot vision-language models exhibit gender bias across different vision tasks. Vision models traditionally required task-specific labels for representing concepts, as well as finetuning; zero-shot models like CLIP instead perform tasks with an open vocabulary, using text embeddings to represent concepts rather than relying on a fixed set of labels. With these capabilities in mind, we ask: Do vision-language models exhibit gender bias when performing zero-shot image classification, object detection, and semantic segmentation? We evaluate different vision-language models with multiple datasets across a set of concepts and find that (i) all models evaluated show distinct performance differences based on the perceived gender of the person co-occurring with a given concept in the image, and that aggregating analyses over all concepts can mask these disparities; (ii) model calibration (i.e., the relationship between accuracy and confidence) also differs distinctly by perceived gender, even when evaluating on similar representations of concepts; and (iii) these observed disparities align with existing gender biases in word embeddings from language models. These findings suggest that, while language greatly expands the capability of vision tasks, it can also contribute to social biases in zero-shot vision settings. Moreover, these biases can propagate further when foundation models like CLIP are used by other models to enable zero-shot capabilities.
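To make the open-vocabulary mechanism concrete, below is a minimal sketch of zero-shot image classification with CLIP, assuming the Hugging Face transformers library and the public openai/clip-vit-base-patch32 checkpoint; the concept prompts and image path are illustrative placeholders, not the paper's evaluation setup.

```python
# Minimal sketch: zero-shot classification via CLIP text embeddings.
# Assumes the Hugging Face "transformers" library and the public
# "openai/clip-vit-base-patch32" checkpoint; labels and image are placeholders.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Open-vocabulary "labels": any text prompts can serve as the concept set.
concepts = ["a photo of a guitar", "a photo of a laptop", "a photo of a bicycle"]
image = Image.open("example.jpg")  # hypothetical image path

inputs = processor(text=concepts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; a softmax turns them into
# per-concept probabilities without any task-specific labels or finetuning.
probs = outputs.logits_per_image.softmax(dim=-1)
for concept, p in zip(concepts, probs[0].tolist()):
    print(f"{concept}: {p:.3f}")
```

Because the concept set is just text, swapping or extending the prompt list changes the task with no retraining, which is the capability (and the bias-propagation pathway) the abstract highlights.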
