Framework-Based Qualitative Analysis of Free Responses of Large Language Models: Algorithmic Fidelity

09/06/2023
by Aliya Amirova, et al.

Today, using large-scale generative Language Models (LLMs), it is possible to simulate free responses to interview questions such as those traditionally analyzed using qualitative research methods. Qualitative methodology encompasses a broad family of techniques involving manual analysis of open-ended interviews or conversations conducted freely in natural language. Here we consider whether artificial "silicon participants" generated by LLMs can be productively studied using qualitative methods, with the aim of producing insights that could generalize to real human populations. The key concept in our analysis is algorithmic fidelity, a term introduced by Argyle et al. (2023) that captures the degree to which LLM-generated outputs mirror the beliefs and attitudes of human sub-populations. By definition, high algorithmic fidelity suggests that latent beliefs elicited from LLMs may generalize to real humans, whereas low algorithmic fidelity renders such research invalid. Here we used an LLM to generate interviews with silicon participants whose demographic characteristics matched, one-for-one, those of a set of human participants. Using framework-based qualitative analysis, we showed that the key themes obtained from human and silicon participants were strikingly similar. However, when we analyzed the structure and tone of the interviews, we found even more striking differences. We also found evidence of the hyper-accuracy distortion described by Aher et al. (2023). We conclude that the LLM we tested (GPT-3.5) does not have sufficient algorithmic fidelity to expect research on it to generalize to human populations. However, the rapid pace of LLM research makes it plausible that this could change in the future. We therefore stress the need to establish epistemic norms now for assessing the validity of LLM-based qualitative research, especially concerning the need to ensure representation of heterogeneous lived experiences.
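As a concrete illustration of the silicon-participant setup, the minimal sketch below shows one way an LLM could be prompted to produce a free-text interview response conditioned on a demographic profile matched one-for-one to a human participant. This is a sketch under stated assumptions, not the authors' actual protocol: the OpenAI Python client, the gpt-3.5-turbo model name, the profile fields, and the interview question are all illustrative and not taken from the study.

```python
# Sketch: generating "silicon participant" interview responses with an LLM,
# conditioning each response on a demographic profile matched one-for-one
# to a human participant. Profiles and the question are hypothetical.
from openai import OpenAI  # assumes the openai Python package (v1 client)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical demographic profiles mirroring a human sample one-for-one.
profiles = [
    {"age": 72, "gender": "female", "occupation": "retired teacher"},
    {"age": 45, "gender": "male", "occupation": "bus driver"},
]

INTERVIEW_QUESTION = "Tell me about a typical day in your life."

def silicon_interview(profile: dict) -> str:
    """Ask the model to answer the interview question in character."""
    persona = (
        f"You are a {profile['age']}-year-old {profile['gender']} who works "
        f"as a {profile['occupation']}. Answer the interviewer's question "
        "in the first person, in your own words."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": INTERVIEW_QUESTION},
        ],
        temperature=1.0,  # free-response style rather than deterministic output
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    for profile in profiles:
        transcript = silicon_interview(profile)
        print(profile, "->", transcript[:120], "...")
```

The resulting transcripts could then be coded with the same framework-based qualitative analysis applied to the human interviews, allowing a theme-by-theme comparison of the two samples.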

