SafeText: A Benchmark for Exploring Physical Safety in Language Models

10/18/2022
by   Sharon Levy, et al.
0

Understanding what constitutes safe text is an important issue in natural language processing and can often prevent the deployment of models deemed harmful and unsafe. One such type of safety that has been scarcely studied is commonsense physical safety, i.e. text that is not explicitly violent and requires additional commonsense knowledge to comprehend that it leads to physical harm. We create the first benchmark dataset, SafeText, comprising real-life scenarios with paired safe and physically unsafe pieces of advice. We utilize SafeText to empirically study commonsense physical safety across various models designed for text generation and commonsense reasoning tasks. We find that state-of-the-art large language models are susceptible to the generation of unsafe text and have difficulty rejecting unsafe advice. As a result, we argue for further studies of safety and the assessment of commonsense physical safety in models before release.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/26/2023

Knowledge Graph-Augmented Korean Generative Commonsense Reasoning

Generative commonsense reasoning refers to the task of generating accept...
research
11/13/2019

Can a Gorilla Ride a Camel? Learning Semantic Plausibility from Text

Modeling semantic plausibility requires commonsense knowledge about the ...
research
12/19/2022

Foveate, Attribute, and Rationalize: Towards Safe and Trustworthy AI

Users' physical safety is an increasing concern as the market for intell...
research
10/17/2022

Mitigating Covertly Unsafe Text within Natural Language Systems

An increasingly prevalent problem for intelligent technologies is text s...
research
06/04/2023

Probing Physical Reasoning with Counter-Commonsense Context

In this study, we create a CConS (Counter-commonsense Contextual Size co...
research
08/08/2019

Do Neural Language Representations Learn Physical Commonsense?

Humans understand language based on the rich background knowledge about ...
research
04/20/2023

Safety Assessment of Chinese Large Language Models

With the rapid popularity of large language models such as ChatGPT and G...

Please sign up or login with your details

Forgot password? Click here to reset