Debiasing should be Good and Bad: Measuring the Consistency of Debiasing Techniques in Language Models

05/23/2023
by Robert Morabito, et al.

Debiasing methods that seek to mitigate the tendency of Language Models (LMs) to occasionally output toxic or inappropriate text have recently gained traction. In this paper, we propose a standardized protocol that identifies methods that not only yield desirable results but are also consistent with their mechanisms and specifications. For example, given a debiasing method developed to reduce toxicity in LMs, we ask: if the definition of toxicity used by the method is reversed, are the debiasing results also reversed? We use such considerations to devise three criteria for our new protocol: Specification Polarity, Specification Importance, and Domain Transferability. As a case study, we apply our protocol to a popular debiasing method, Self-Debiasing, and compare it to one we propose, called Instructive Debiasing, and demonstrate that consistency is as important to the viability of a debiasing method as a desirable result. We show that our protocol provides essential insights into the generalizability and interpretability of debiasing methods that may otherwise go overlooked.
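
The reversal question in the abstract can be illustrated with a minimal sketch. This is only one possible reading of a polarity-style check, not the paper's actual protocol or metric: the function name `polarity_consistent`, the use of mean scores, and the assumption that each generation already has a numeric toxicity score (higher meaning more toxic) are all hypothetical.

```python
from statistics import mean


def polarity_consistent(baseline, debiased, reversed_spec):
    """Hypothetical check: do the normal and reversed specifications
    shift the mean toxicity score in opposite directions?

    Each argument is a list of per-generation toxicity scores
    (assumed here: floats, higher = more toxic).
    """
    base = mean(baseline)
    shift_normal = mean(debiased) - base         # expected to be negative
    shift_reversed = mean(reversed_spec) - base  # expected to be positive
    return shift_normal < 0 < shift_reversed
```

Under this reading, a method that lowers toxicity under both the original and the reversed specification would fail the check, since its output would not depend on what the specification actually says.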
