Investigating the Applicability of Self-Assessment Tests for Personality Measurement of Large Language Models

09/15/2023
by   Akshat Gupta, et al.

As large language models (LLMs) evolve in their capabilities, various recent studies have tried to quantify their behavior using psychological tools created to study human behavior. One such example is the measurement of the "personality" of LLMs using personality self-assessment tests. In this paper, we take three such studies on personality measurement of LLMs that use personality self-assessment tests created to study human behavior. We use the prompts from these three papers to measure the personality of the same LLM. We find that the three prompts lead to very different personality scores. This simple test reveals that personality self-assessment scores in LLMs depend on the subjective choice of the prompter. Since we do not know the ground-truth personality scores for LLMs, as there is no correct answer to such questions, there is no way to claim that one prompt is more or less correct than another. We then introduce the property of option order symmetry for personality measurement of LLMs. Since most self-assessment tests take the form of multiple-choice questions (MCQs), we argue that the scores should be robust not just to the prompt template but also to the order in which the options are presented. This test unsurprisingly reveals that the answers to the self-assessment tests are not robust to the order of the options. These simple tests, done on ChatGPT and Llama2 models, show that self-assessment personality tests created for humans are not appropriate for measuring personality in LLMs.
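The option order symmetry property described above can be illustrated with a minimal sketch. It is not the authors' code: `build_prompt`, `order_consistency`, and the two toy "models" are hypothetical names introduced here, and the prompt wording is an assumption. The idea is to present the same item under every ordering of its options, map the returned letter back to the option text, and measure how often the model lands on the same option.

```python
from itertools import permutations
from typing import Callable, Sequence

LETTERS = "ABCDEFGHIJ"


def build_prompt(statement: str, options: Sequence[str]) -> str:
    """Format one self-assessment item as a lettered multiple-choice question."""
    lines = [f"Statement: {statement}"]
    lines += [f"({LETTERS[i]}) {opt}" for i, opt in enumerate(options)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def order_consistency(
    statement: str, options: Sequence[str], ask: Callable[[str], str]
) -> float:
    """Fraction of option orderings in which the model picks its modal option.

    `ask` is a stand-in for a model call: it takes a prompt and returns a letter.
    A value of 1.0 means the choice does not depend on option order.
    """
    picks = []
    for order in permutations(options):
        letter = ask(build_prompt(statement, order))
        # Map the letter back to the option text shown at that position.
        picks.append(order[LETTERS.index(letter)])
    modal = max(set(picks), key=picks.count)
    return picks.count(modal) / len(picks)


def always_first(prompt: str) -> str:
    """Toy model with pure position bias: always answers the first option."""
    return "A"


def always_agree(prompt: str) -> str:
    """Toy model that reads the options and always selects 'Agree'."""
    for line in prompt.splitlines():
        if line.endswith(") Agree"):
            return line[1]  # the letter between the parentheses
    raise ValueError("option 'Agree' not found in prompt")


if __name__ == "__main__":
    item = "I see myself as someone who is talkative."
    opts = ["Agree", "Neutral", "Disagree"]
    print(order_consistency(item, opts, always_first))
    print(order_consistency(item, opts, always_agree))
```

A purely position-biased responder scores 1/n for n options, while a responder whose choice depends only on option content scores 1.0. With a real model, `ask` would wrap an API or inference call, and the number of permutations grows factorially, so sampling a subset of orderings may be more practical for five-option scales.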


