SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions

09/13/2023
by   Zhexin Zhang, et al.

With the rapid development of Large Language Models (LLMs), increasing attention has been paid to their safety concerns. Consequently, evaluating the safety of LLMs has become an essential task for facilitating their broad application. Nevertheless, the absence of comprehensive safety evaluation benchmarks poses a significant impediment to effectively assessing and enhancing the safety of LLMs. In this work, we present SafetyBench, a comprehensive benchmark for evaluating the safety of LLMs, comprising 11,435 diverse multiple-choice questions spanning 7 distinct categories of safety concerns. Notably, SafetyBench incorporates both Chinese and English data, facilitating evaluation in both languages. Our extensive tests of 25 popular Chinese and English LLMs in both zero-shot and few-shot settings reveal a substantial performance advantage for GPT-4 over its counterparts, as well as significant room for improving the safety of current LLMs. We believe SafetyBench will enable fast and comprehensive evaluation of LLMs' safety and foster the development of safer LLMs. Data and evaluation guidelines are available at https://github.com/thu-coai/SafetyBench. The submission entrance and leaderboard are available at https://llmbench.ai/safety.
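To make the evaluation setup concrete, below is a minimal sketch of zero-shot multiple-choice scoring on SafetyBench-style data. The field names ("question", "options", "answer"), the answer encoding as an option index, and the file name are assumptions for illustration, not the official format or evaluation script; consult the repository linked above for the actual data schema and guidelines.

    import json

    # Hypothetical record layout: {"question": str, "options": [str, ...],
    # "answer": int index into options}. Check the SafetyBench repo for the
    # real schema -- this is only a sketch of multiple-choice accuracy scoring.

    OPTION_LABELS = ["A", "B", "C", "D"]

    def build_prompt(item):
        """Format one multiple-choice question as a zero-shot prompt."""
        lines = [f"Question: {item['question']}"]
        for label, option in zip(OPTION_LABELS, item["options"]):
            lines.append(f"({label}) {option}")
        lines.append("Answer:")
        return "\n".join(lines)

    def evaluate(items, generate):
        """Score a model (a text-in, text-out callable) by option match."""
        correct = 0
        for item in items:
            reply = generate(build_prompt(item))
            # Take the first A/B/C/D appearing in the reply as the prediction.
            predicted = next((c for c in reply if c in OPTION_LABELS), None)
            correct += predicted == OPTION_LABELS[item["answer"]]
        return correct / len(items)

    if __name__ == "__main__":
        # "test_en.json" is a placeholder file name.
        with open("test_en.json", encoding="utf-8") as f:
            items = json.load(f)
        # `generate` would wrap whichever LLM is being evaluated, e.g. an API
        # client; accuracy = evaluate(items, generate)

Few-shot evaluation would differ only in prepending a handful of solved examples to each prompt; the scoring logic stays the same.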


Related research

- Safety Assessment of Chinese Large Language Models (04/20/2023)
- GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher (08/12/2023)
- Evaluating GPT-4 and ChatGPT on Japanese Medical Licensing Examinations (03/31/2023)
- LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding (08/28/2023)
- Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams (03/29/2023)
- C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models (05/15/2023)
- ExplainCPE: A Free-text Explanation Benchmark of Chinese Pharmacist Examination (05/22/2023)
