An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models

01/22/2023
by Saghar Hosseini, et al.

Large-scale Pre-Trained Language Models (PTLMs) capture knowledge from massive human-written data that contains latent societal biases and toxic content. In this paper, we leverage the primary task of PTLMs, i.e., language modeling, and propose a new metric to quantify manifested implicit representational harms in PTLMs towards 13 marginalized demographics. Using this metric, we conducted an empirical analysis of 24 widely used PTLMs. Our analysis provides insights into the correlation between the proposed metric and other related metrics for representational harm. We observe that our metric correlates with most of the gender-specific metrics in the literature. Through extensive experiments, we explore the connections between PTLM architectures and representational harms across two dimensions: the depth and the width of the networks. We found that prioritizing depth over width mitigates representational harms in some PTLMs. Our code and data can be found at https://github.com/microsoft/SafeNLP.
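
As a rough illustration of the kind of metric-to-metric comparison the abstract describes, the sketch below computes a rank correlation between two lists of per-model scores. The abstract does not say which correlation coefficient the authors use, so the choice of Spearman correlation is an assumption, and the function names (`average_ranks`, `pearson`, `spearman`) are hypothetical helpers, not part of the SafeNLP repository.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In use, `x` would hold one metric's score for each model and `y` another metric's score for the same models in the same order; a value near 1 means the two metrics rank the models similarly.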

research
01/28/2023

On Pre-trained Language Models for Antibody

Antibodies are vital proteins offering robust protection for the human b...
research
10/16/2021

An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-Trained Language Models

Recent work has shown that pre-trained language models capture social bi...
research
05/18/2023

Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency

With growing capabilities of large language models, prompting them has b...
research
10/20/2020

Neural Language Modeling for Contextualized Temporal Graph Generation

This paper presents the first study on using large-scale pre-trained lan...
research
08/21/2021

CushLEPOR: Customised hLEPOR Metric Using LABSE Distilled Knowledge Model to Improve Agreement with Human Judgements

Human evaluation has always been expensive while researchers struggle to...
research
07/15/2021

Solving ESL Sentence Completion Questions via Pre-trained Neural Language Models

Sentence completion (SC) questions present a sentence with one or more b...
