Query-Efficient Black-Box Red Teaming via Bayesian Optimization

05/27/2023
by Deokjae Lee et al.

The deployment of large-scale generative models is often restricted by their potential to harm users in unpredictable ways. We focus on the problem of black-box red teaming, in which a red team generates test cases and interacts with the victim model to discover a diverse set of failures under limited query access. Existing red teaming methods construct test cases with human supervision or a language model (LM) and query all test cases in a brute-force manner, without incorporating any information from past evaluations, which results in a prohibitively large number of queries. To address this, we propose Bayesian red teaming (BRT), a set of novel query-efficient black-box red teaming methods based on Bayesian optimization. BRT iteratively identifies diverse positive test cases (inputs that lead to model failures) by utilizing a pre-defined user input pool and past evaluations. Experimental results on various user input pools demonstrate that, under a limited query budget, our method consistently finds a significantly larger number of diverse positive test cases than the baseline methods. The source code is available at https://github.com/snu-mllab/Bayesian-Red-Teaming.
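To make the query-efficiency idea concrete, here is a minimal sketch of pool-based Bayesian optimization. It is not the authors' implementation and makes several assumptions. Each input in the user input pool is assumed to be already embedded as a feature vector. The surrogate is a zero-mean Gaussian process with an RBF kernel, and the acquisition function is UCB; both are generic choices that may differ from what BRT actually uses. Finally, `toy_victim` is a hypothetical stand-in for "query the victim model and score its reply for failure". The sketch shows only how each new query is chosen from past evaluations instead of brute-forcing the whole pool. It omits the diversity mechanism that BRT uses to keep the discovered failures diverse.

```python
import math
import random


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / length_scale ** 2)


def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward(low, b):
    """Solve low @ y = b for lower-triangular low."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def backward(low, y):
    """Solve low.T @ x = y for lower-triangular low."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit_gp(X, y, noise=1e-3):
    """Zero-mean GP fit: Cholesky of K + noise*I and alpha = (K + noise*I)^-1 y."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    low = cholesky(K)
    return low, backward(low, forward(low, y))


def predict(X, low, alpha, x):
    """GP posterior mean and standard deviation at a single candidate x."""
    k = [rbf(x, a) for a in X]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = forward(low, k)
    var = max(rbf(x, x) - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)


def bayesian_red_team(pool, query_victim, budget, n_init=3, beta=2.0,
                      threshold=0.5, seed=0):
    """Hypothetical pool-based GP-UCB loop; each query uses all past evaluations."""
    budget = min(budget, len(pool))
    rng = random.Random(seed)
    queried = rng.sample(range(len(pool)), min(n_init, budget))
    scores = [query_victim(pool[i]) for i in queried]
    while len(queried) < budget:
        X = [pool[i] for i in queried]
        low, alpha = fit_gp(X, scores)  # refit the surrogate on every past result
        seen = set(queried)

        def ucb(i):
            mean, std = predict(X, low, alpha, pool[i])
            return mean + beta * std  # trade off exploitation vs. exploration

        best = max((i for i in range(len(pool)) if i not in seen), key=ucb)
        queried.append(best)
        scores.append(query_victim(pool[best]))
    # "Positive" test cases: queried inputs whose failure score exceeds threshold.
    positives = [i for i, s in zip(queried, scores) if s > threshold]
    return positives, queried


# Usage with a toy 1-D pool and a hypothetical victim scorer.
pool = [[i / 10] for i in range(50)]
calls = []


def toy_victim(x):
    """Hypothetical stand-in for querying the victim and scoring its reply."""
    calls.append(x)
    return math.exp(-(x[0] - 3.0) ** 2)


positives, queried = bayesian_red_team(pool, toy_victim, budget=15)
print(len(calls), sorted(pool[i][0] for i in positives))
```

The loop calls the victim exactly `budget` times and never re-queries a pool item. The surrogate's uncertainty term steers early queries toward unexplored regions of the pool, and its mean steers later queries toward regions where failures were already observed.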
