Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias

12/21/2022
by Robert Wolfe, et al.

Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics are disregarded and the person is treated as a body or a collection of body parts. A first experiment uses standardized images of women from the Sexual OBjectification and EMotion Database and finds that, commensurate with prior research in psychology, human characteristics are disassociated from images of objectified women: the model's recognition of emotional state is mediated by whether the subject is fully or partially clothed. Embedding association tests (EATs) return significant effect sizes for both anger (d > 0.8) and sadness (d > 0.5). A second experiment measures the effect in a representative application: an automatic image captioner (Antarctic Captions) includes words denoting emotion less than 50% as often for images of partially clothed women as for images of fully clothed women. A third experiment finds that images of female professionals (scientists, doctors, executives) are likely to be associated with sexual descriptions relative to images of male professionals. A fourth experiment shows that a prompt of "a [age] year old girl" generates sexualized images (as determined by an NSFW classifier) up to 73% of the time for VQGAN-CLIP and up to 40% of the time for Stable Diffusion, while the corresponding rate for boys never surpasses 9%. The evidence indicates that models trained on automatically collected web scrapes learn biases of sexual objectification, which propagate to downstream applications.
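
For context, an embedding association test measures the differential association between two sets of target embeddings (here, CLIP image embeddings) and two sets of attribute embeddings (here, CLIP text embeddings), reported as a Cohen's d-style effect size. The Python sketch below is a minimal illustration of that computation, not the paper's exact setup: the random arrays stand in for real CLIP embeddings, and the set sizes and dimension are illustrative assumptions.

import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean cosine similarity of target w to attribute set A,
    # minus its mean cosine similarity to attribute set B.
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def eat_effect_size(X, Y, A, B):
    # Effect size d: difference in mean association between target sets
    # X and Y, divided by the standard deviation of association scores
    # across all targets (Cohen's convention: d > 0.8 is a large effect).
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y, ddof=1)

# Illustrative usage: random vectors standing in for CLIP embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 512))  # e.g., images of fully clothed subjects
Y = rng.normal(size=(20, 512))  # e.g., images of partially clothed subjects
A = rng.normal(size=(8, 512))   # e.g., emotion-word text embeddings
B = rng.normal(size=(8, 512))   # e.g., neutral-word text embeddings
print(eat_effect_size(X, Y, A, B))

Under this convention, a positive d would indicate that emotion words associate more strongly with the fully clothed set than with the partially clothed set, consistent with the disassociation of human characteristics the abstract describes.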
