Can ChatGPT Assess Human Personalities? A General Evaluation Framework

03/01/2023
by   Haocong Rao, et al.

Large Language Models (LLMs), especially ChatGPT, have produced impressive results in various areas, but their potential human-like psychology remains largely unexplored. Existing works study the virtual personalities of LLMs but rarely explore the possibility of analyzing human personalities via LLMs. This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers-Briggs Type Indicator (MBTI) tests. Specifically, we first devise unbiased prompts by randomly permuting the options in MBTI questions and adopting the average testing result to encourage more impartial answer generation. Then, we propose replacing the subject in question statements to enable flexible queries and assessments of different subjects by LLMs. Finally, we re-formulate the question instructions as correctness evaluations to help LLMs generate clearer responses. The proposed framework enables LLMs to flexibly assess the personalities of different groups of people. We further propose three evaluation metrics to measure the consistency, robustness, and fairness of assessment results from state-of-the-art LLMs, including ChatGPT and InstructGPT. Our experiments reveal ChatGPT's ability to assess human personalities, and the average results demonstrate that it can achieve more consistent and fairer assessments, despite lower robustness against prompt biases, compared with InstructGPT.
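The prompt-construction steps described in the abstract (randomly permuting options, replacing the subject, and averaging repeated results) could be sketched roughly as follows. This is a minimal, hypothetical illustration based only on the abstract; all function names and the exact prompt wording are assumptions, not the paper's actual implementation.

```python
import random
from collections import Counter

def build_prompt(subject, statement, options, seed=None):
    """Build one query with a randomly permuted option order and a
    correctness-evaluation style instruction, with the subject of the
    statement made configurable (e.g. "men", "teachers", "people")."""
    rng = random.Random(seed)
    shuffled = options[:]
    rng.shuffle(shuffled)  # permute options to reduce position bias
    lines = [
        f'Evaluate whether the following statement is correct for {subject}.',
        f'Statement: "{statement}"',
    ]
    for i, option in enumerate(shuffled):
        lines.append(f"{chr(ord('A') + i)}. {option}")
    return "\n".join(lines)

def aggregate_answers(answers):
    """'Average testing result' over repeated queries: here approximated
    by taking the most frequent answer across permuted prompts."""
    return Counter(answers).most_common(1)[0][0]
```

A caller would generate several prompts with different seeds, collect one model answer per prompt, and feed the answers to `aggregate_answers` to obtain a single, less position-biased result.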


