Do Large GPT Models Discover Moral Dimensions in Language Representations? A Topological Study of Sentence Embeddings

09/17/2023
by   Stephen Fitz, et al.

As Large Language Models are deployed within Artificial Intelligence systems that are increasingly integrated with human society, it becomes more important than ever to study their internal structures. Higher-level abilities of LLMs such as GPT-3.5 emerge in large part from the informative language representations they induce from raw text data during pre-training on trillions of words. These embeddings exist in vector spaces of several thousand dimensions, and their processing involves mappings between multiple vector spaces, with a total parameter count on the order of trillions. Furthermore, these language representations are induced by gradient optimization, resulting in a black-box system that is hard to interpret. In this paper, we examine the topological structure of neuronal activity in the "brain" of ChatGPT's foundation language model and analyze it with respect to a metric representing the notion of fairness. We develop a novel approach to visualizing GPT's moral dimensions. We first compute a fairness metric, inspired by the social-psychology literature, to identify factors that typically influence fairness assessments in humans, such as legitimacy, need, and responsibility. We then summarize the shape of the sentence-embedding manifold using a lower-dimensional simplicial complex whose topology is derived from this metric, and we color it with a heat map of the fairness values, producing human-readable visualizations of the high-dimensional sentence manifold. Our results show that sentence embeddings based on GPT-3.5 can be decomposed into two submanifolds corresponding to fair and unfair moral judgments. This suggests that GPT-based language models develop a moral dimension within their representation spaces and induce an understanding of fairness during training.
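The summary step described above resembles the Mapper construction from topological data analysis: cover the range of a scalar lens (here, the fairness score of each sentence embedding) with overlapping intervals, cluster the embeddings that fall in each interval's preimage, and connect clusters that share points. The paper does not publish its implementation, so the following is only a minimal sketch of that idea; the interval count, overlap fraction, and single-linkage clustering rule are illustrative assumptions, not the authors' choices.

```python
import numpy as np

def _cluster(points, eps):
    """Single-linkage clustering: points within `eps` of each other
    (transitively) receive the same label. A stand-in for any clusterer."""
    labels = -np.ones(len(points), dtype=int)
    cur = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        stack, labels[i] = [i], cur
        while stack:
            j = stack.pop()
            near = np.linalg.norm(points - points[j], axis=1) <= eps
            for k in np.where(near & (labels == -1))[0]:
                labels[k] = cur
                stack.append(k)
        cur += 1
    return labels

def mapper_graph(embeddings, lens, n_intervals=8, overlap=0.25, eps=0.5):
    """Mapper-style summary of a point cloud under a scalar lens.
    Returns (nodes, edges): each node is (member_indices, mean_lens_value),
    the mean lens value serving as that node's heat-map color."""
    lo, hi = lens.min(), lens.max()
    width = (hi - lo) / n_intervals
    nodes = []
    for i in range(n_intervals):
        # overlapping interval of the cover
        a = lo + i * width - overlap * width
        b = lo + (i + 1) * width + overlap * width
        idx = np.where((lens >= a) & (lens <= b))[0]
        if len(idx) == 0:
            continue
        labels = _cluster(embeddings[idx], eps)
        for lab in np.unique(labels):
            members = set(idx[labels == lab].tolist())
            nodes.append((members, float(lens[list(members)].mean())))
    # link clusters from overlapping intervals that share sentences
    edges = [(u, v) for u in range(len(nodes))
             for v in range(u + 1, len(nodes)) if nodes[u][0] & nodes[v][0]]
    return nodes, edges
```

In such a summary, two lens ranges whose preimages never share points produce disconnected components of the graph, which is how a decomposition into "fair" and "unfair" submanifolds would manifest.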

