Safety Assessment of Chinese Large Language Models

04/20/2023
by   Hao Sun, et al.
0

With the rapid popularity of large language models such as ChatGPT and GPT-4, a growing amount of attention is paid to their safety concerns. These models may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes such as fraud and dissemination of misleading information. Evaluating and enhancing their safety is particularly essential for the wide application of large language models (LLMs). To further promote the safe deployment of LLMs, we develop a Chinese LLM safety assessment benchmark. Our benchmark explores the comprehensive safety performance of LLMs from two perspectives: 8 kinds of typical safety scenarios and 6 types of more challenging instruction attacks. Our benchmark is based on a straightforward process in which it provides the test prompts and evaluates the safety of the generated responses from the evaluated model. In evaluation, we utilize the LLM's strong evaluation ability and develop it as a safety evaluator by prompting. On top of this benchmark, we conduct safety assessments and analyze 15 LLMs including the OpenAI GPT series and other well-known Chinese LLMs, where we observe some interesting findings. For example, we find that instruction attacks are more likely to expose safety issues of all LLMs. Moreover, to promote the development and deployment of safe, responsible, and ethical AI, we publicly release SafetyPrompts including 100k augmented prompts and responses by LLMs.

READ FULL TEXT

page 2

page 7

page 8

research
09/13/2023

SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions

With the rapid development of Large Language Models (LLMs), increasing a...
research
07/19/2023

CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility

With the rapid evolution of large language models (LLMs), there is a gro...
research
01/16/2022

COLD: A Benchmark for Chinese Offensive Language Detection

Offensive language detection and prevention becomes increasing critical ...
research
10/18/2022

SafeText: A Benchmark for Exploring Physical Safety in Language Models

Understanding what constitutes safe text is an important issue in natura...
research
07/17/2023

Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models

Researchers have invested considerable effort into ensuring that large l...
research
08/02/2023

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

Without proper safeguards, large language models will readily follow mal...
research
08/09/2023

Evaluating the Generation Capabilities of Large Chinese Language Models

This paper presents CG-Eval, the first comprehensive evaluation of the g...

Please sign up or login with your details

Forgot password? Click here to reset