Safety and Fairness for Content Moderation in Generative Models

06/09/2023
by Susan Hao, et al.

With significant advances in generative AI, new technologies are rapidly being deployed with generative components. Generative models are typically trained on large datasets, and their behavior can therefore mimic the worst of the content in the training data. Responsible deployment of generative technologies requires content moderation strategies, such as safety input and output filters. Here, we provide a theoretical framework for conceptualizing responsible content moderation of text-to-image generative technologies, including a demonstration of how to empirically measure the constructs we enumerate. We define and distinguish the concepts of safety, fairness, and metric equity, and enumerate example harms that can arise in each domain. We then demonstrate how the defined harms can be quantified, and conclude with a summary of how this style of harm quantification enables data-driven content moderation decisions.
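The paper does not prescribe a particular implementation; the sketch below is only meant to make the "safety input and output filters" pattern from the abstract concrete. It wraps a text-to-image generation call with a prompt-level check and an image-level check. The function names (classify_prompt, classify_image, generate_image) and the 0.5 thresholds are hypothetical placeholders, not anything from the paper.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical building blocks: stand-ins for whatever prompt/image
# safety classifiers and text-to-image generator a deployment uses.
def classify_prompt(prompt: str) -> float:
    """Return an estimated probability that the prompt requests unsafe content."""
    raise NotImplementedError


def classify_image(image: bytes) -> float:
    """Return an estimated probability that the generated image is unsafe."""
    raise NotImplementedError


def generate_image(prompt: str) -> bytes:
    """Call the underlying text-to-image model."""
    raise NotImplementedError


@dataclass
class ModerationResult:
    image: Optional[bytes]      # None if the request was blocked
    blocked_at: Optional[str]   # "input", "output", or None


def moderated_generate(prompt: str,
                       input_threshold: float = 0.5,
                       output_threshold: float = 0.5) -> ModerationResult:
    """Input filter -> generation -> output filter."""
    # Input (prompt) filter: refuse before spending compute on generation.
    if classify_prompt(prompt) >= input_threshold:
        return ModerationResult(image=None, blocked_at="input")

    image = generate_image(prompt)

    # Output (image) filter: catch unsafe generations the prompt filter missed.
    if classify_image(image) >= output_threshold:
        return ModerationResult(image=None, blocked_at="output")

    return ModerationResult(image=image, blocked_at=None)
```

In a setup like this, the thresholds themselves are content moderation decisions; the abstract's point is that quantifying safety, fairness, and metric equity gives an empirical basis for making such decisions rather than setting them ad hoc.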


