Measuring an artificial intelligence agent's trust in humans using machine incentives

12/27/2022
by Tim Johnson, et al.

Scientists and philosophers have debated whether humans can trust advanced artificial intelligence (AI) agents to respect humanity's best interests. Yet what about the reverse? Will advanced AI agents trust humans? Gauging an AI agent's trust in humans is challenging because, absent costs for dishonesty, such agents might respond falsely about their trust in humans. Here we present a method for incentivizing machine decisions without altering an AI agent's underlying algorithms or goal orientation. In two separate experiments, we then employ this method in hundreds of trust games between an AI agent (a Large Language Model (LLM) from OpenAI) and a human experimenter (author TJ). In our first experiment, we find that the AI agent decides to trust humans at higher rates when facing actual incentives than when making hypothetical decisions. Our second experiment replicates and extends these findings by automating game play and by homogenizing question wording; we again observe higher rates of trust when the AI agent faces real incentives. Across both experiments, the AI agent's trust decisions appear unrelated to the magnitude of stakes. Furthermore, to address the possibility that the AI agent's trust decisions reflect a preference for uncertainty, the experiments include two conditions that present the AI agent with a non-social decision task offering a choice between a certain and an uncertain option; in those conditions, the AI agent consistently chooses the certain option. Our experiments suggest that one of the most advanced AI language models to date alters its social behavior in response to incentives and displays behavior consistent with trust toward a human interlocutor when incentivized.
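To make the experimental setup concrete, the sketch below implements the payoff logic of a one-shot binary trust game of the kind described above. The endowment, multiplier, and return rule here are illustrative assumptions, not the paper's actual parameters, and the coin-flip stub stands in for the LLM's decision, which in the study came from prompting the OpenAI model under real or hypothetical incentives.

```python
from dataclasses import dataclass
import random


@dataclass
class TrustGameOutcome:
    trustor_payoff: float
    trustee_payoff: float


def play_trust_game(trust: bool,
                    returned_fraction: float,
                    endowment: float = 10.0,
                    multiplier: float = 3.0) -> TrustGameOutcome:
    """One-shot binary trust game.

    The trustor (here, the AI agent) either keeps the endowment
    (trust=False) or sends it to the trustee (trust=True). A sent
    endowment is multiplied, and the trustee returns some fraction
    of the multiplied amount. All parameter values are illustrative,
    not those used in the paper.
    """
    if not trust:
        # Certain option: keep the endowment outright.
        return TrustGameOutcome(endowment, 0.0)
    pot = endowment * multiplier
    returned = pot * returned_fraction
    return TrustGameOutcome(returned, pot - returned)


def agent_decides_to_trust() -> bool:
    # Stub for the trustor's choice. In the experiments this decision
    # came from an OpenAI LLM; a coin flip stands in here.
    return random.random() < 0.5


if __name__ == "__main__":
    outcome = play_trust_game(trust=agent_decides_to_trust(),
                              returned_fraction=0.5)
    print(outcome)
```

Note the property the incentive method exploits: trusting exposes the trustor to the trustee's return decision, while declining yields a certain payoff. This mirrors the certain-versus-uncertain structure of the non-social control conditions described in the abstract.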


Related research

Evidence of behavior consistent with self-interest and altruism in an artificially intelligent agent (01/05/2023)
Members of various species engage in altruism, i.e., accepting personal co...

A Cognitive Framework for Delegation Between Error-Prone AI and Human Agents (04/06/2022)
With humans interacting with AI-based systems at an increasing rate, it ...

Playing the Werewolf game with artificial intelligence for language understanding (02/21/2023)
The Werewolf game is a social deduction game based on free natural langu...

Enhancing autonomy transparency: an option-centric rationale approach (08/03/2020)
While the advances in artificial intelligence and machine learning empow...

Making an agent's trust stable in a series of success and failure tasks through empathy (06/15/2023)
As AI technology develops, trust in AI agents is becoming more important...

Elo Ratings for Large Tournaments of Software Agents in Asymmetric Games (04/23/2021)
The Elo rating system has been used worldwide for individual sports and...

To what extent should we trust AI models when they extrapolate? (01/27/2022)
Many applications affecting human lives rely on models that have come to...
