Computer Vision and Conflicting Values: Describing People with Automated Alt Text

by Margot Hanley, et al.

Scholars have recently drawn attention to a range of controversial issues posed by the use of computer vision for automatically generating descriptions of people in images. Despite these concerns, automated image description has become an important tool to ensure equitable access to information for blind and low vision people. In this paper, we investigate the ethical dilemmas faced by companies that have adopted computer vision for producing alt text: textual descriptions of images for blind and low vision people. We use Facebook's automatic alt text tool as our primary case study. First, we analyze the policies that Facebook has adopted with respect to identity categories, such as race, gender, and age, and the company's decisions about whether to present these terms in alt text. We then describe an alternative, manual approach practiced in the museum community, focusing on how museums determine what to include in alt text descriptions of cultural artifacts. We compare these policies, using notable points of contrast to develop an analytic framework that characterizes the particular apprehensions behind these policy choices. We conclude by considering two strategies that seem to sidestep some of these concerns, finding that there are no easy ways to avoid the normative dilemmas posed by the use of computer vision to automate alt text.
