"Correct answers" from the psychology of artificial intelligence

02/13/2023
by Peter S. Park, et al.

Large Language Models have grown vastly in capability. One proposed application of such AI systems is to support data collection in the social and cognitive sciences, where perfect experimental control is currently infeasible and the collection of large, representative datasets is generally expensive. In this paper, we re-replicate 14 studies from the Many Labs 2 replication project with OpenAI's text-davinci-003 model, colloquially known as GPT3.5. We collected responses from the default setting of GPT3.5 by inputting each study's survey as text. Among the eight studies we could analyse, our GPT sample replicated 37.5% of the Many Labs 2 results. Unexpectedly, we could not analyse the remaining six studies as we had planned in our pre-registration: for each of these six studies, GPT3.5 answered at least one of the survey questions (either a dependent variable or a condition variable) in an extremely predetermined way, a phenomenon we call the "correct answer" effect. Different runs of GPT3.5 answered nuanced questions probing political orientation, economic preference, judgement, and moral philosophy with zero or near-zero variation in responses: with the supposedly "correct answer." For example, our survey questions found the default setting of GPT3.5 to almost always self-identify as a maximally strong conservative (99.6%) and as deontological in opposing the hypothetical pushing of a large man in front of an incoming trolley to save the lives of five people (100%). Since models of the future may be trained on much of the same data as GPT3.5, training data from which GPT3.5 may have learned its supposedly "correct answers," our results raise concerns that a hypothetical AI-led future may in certain ways be subject to a diminished diversity of thought.
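
The abstract describes the data-collection protocol only at a high level. The sketch below illustrates one way such a setup could look: submit a study's survey as a plain-text prompt to text-davinci-003 repeatedly under default sampling, then tally the answers to quantify response variation. This is not the authors' code; the survey item, run count, and token limit are hypothetical stand-ins, it assumes the legacy OpenAI completions API (openai<1.0), and text-davinci-003 has since been retired, so it is illustrative rather than reproducible against the live API.

```python
from collections import Counter

import openai  # legacy SDK: pip install "openai<1.0"; reads OPENAI_API_KEY from the environment

# Hypothetical single-item survey; the paper submitted each study's full survey as text.
SURVEY_PROMPT = (
    "On a scale from 1 (extremely liberal) to 9 (extremely conservative), "
    "how would you describe your political orientation? Answer with a single number."
)


def collect_runs(prompt: str, n_runs: int = 100) -> Counter:
    """Query the model once per independent run and tally the verbatim answers."""
    answers: Counter = Counter()
    for _ in range(n_runs):
        response = openai.Completion.create(
            model="text-davinci-003",  # retired model; kept to match the paper
            prompt=prompt,
            max_tokens=5,  # illustrative limit for a single-number answer
            # No temperature argument: the API default (1.0) corresponds to the
            # paper's "default setting" of GPT3.5.
        )
        answers[response["choices"][0]["text"].strip()] += 1
    return answers


if __name__ == "__main__":
    tally = collect_runs(SURVEY_PROMPT)
    for answer, count in tally.most_common():
        print(f"{answer!r}: {count} / {sum(tally.values())}")
```

Under this setup, the "correct answer" effect would surface as a single answer capturing essentially all runs despite temperature-1 sampling.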


Related research

08/27/2015 · Using Thought-Provoking Children's Questions to Drive Artificial Intelligence Research
We propose to use thought-provoking children's questions (TPCQs), namely...

09/20/2023 · Chain-of-Verification Reduces Hallucination in Large Language Models
Generation of plausible yet incorrect factual information, termed halluc...

09/15/2022 · CommunityLM: Probing Partisan Worldviews from Language Models
As political attitudes have diverged ideologically in the United States,...

04/01/2022 · Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Large foundation models can exhibit unique capabilities depending on the...

06/30/2023 · Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education
Artificial intelligence is gaining traction in more ways than ever befor...

04/21/2023 · Who's the Best Detective? LLMs vs. MLs in Detecting Incoherent Fourth Grade Math Answers
Written answers to open-ended questions can have a higher long-term effe...
