Value Alignment Verification

12/02/2020
by Daniel S. Brown, et al.

As humans interact with autonomous agents to perform increasingly complicated, potentially risky tasks, it is important that humans can verify these agents' trustworthiness and efficiently evaluate their performance and correctness. In this paper, we formalize the problem of value alignment verification: how can we efficiently test whether the goals and behavior of another agent are aligned with a human's values? We explore several different settings and provide foundational theory for value alignment verification. We study verification problems both with an idealized human that has an explicit reward function and with a human whose values are implicit. Our theoretical and empirical results in a discrete grid navigation domain and a continuous autonomous driving domain demonstrate that it is possible to synthesize highly efficient and accurate value alignment verification tests for certifying the alignment of autonomous agents.
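To make the explicit-reward setting concrete, here is a minimal sketch in Python, not the paper's algorithm: a verifier computes optimal Q-values under the human's reward function, queries the tested agent's action in a few states, and certifies alignment only if every queried action is optimal under the human's values. All names here (q_values, alignment_test, the toy chain MDP) are illustrative assumptions, not definitions from the paper.

    import numpy as np

    def q_values(P, R, gamma=0.95, iters=500):
        """Optimal Q-function for state rewards R via value iteration.
        P: (A, S, S) transition tensor; R: (S,) reward per state."""
        A, S, _ = P.shape
        Q = np.zeros((S, A))
        for _ in range(iters):
            V = Q.max(axis=1)                       # greedy state values
            Q = R[:, None] + gamma * np.einsum("ast,t->sa", P, V)
        return Q

    def alignment_test(P, human_R, agent_action, test_states, tol=1e-6):
        """Certify alignment iff the agent's action in every queried
        test state is optimal under the human's reward function."""
        Q = q_values(P, human_R)
        for s in test_states:
            optimal = np.flatnonzero(Q[s] >= Q[s].max() - tol)
            if agent_action(s) not in optimal:
                return False                        # value-misaligned behavior
        return True

    # Toy 3-state chain: action 0 steps left, action 1 steps right (clipped).
    S, A = 3, 2
    P = np.zeros((A, S, S))
    for s in range(S):
        P[0, s, max(s - 1, 0)] = 1.0
        P[1, s, min(s + 1, S - 1)] = 1.0
    human_R = np.array([0.0, 0.0, 1.0])             # human values the right end
    print(alignment_test(P, human_R, lambda s: 1, test_states=[0, 1]))  # True
    print(alignment_test(P, human_R, lambda s: 0, test_states=[0, 1]))  # False

The paper goes beyond this naive check: the interesting question it studies is how to synthesize a small set of test queries that efficiently and accurately certifies alignment, including when the human's values are only implicit rather than given as an explicit reward function.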

Related research

10/18/2021 · Value alignment: a formal approach
principles that should govern autonomous AI systems. It essentially stat...

05/12/2023 · Multi-Value Alignment in Normative Multi-Agent System: Evolutionary Optimisation Approach
Value-alignment in normative multi-agent systems is used to promote a ce...

10/08/2021 · Explaining Reward Functions to Humans for Better Human-Robot Collaboration
Explainable AI techniques that describe agent reward functions can enhan...

12/07/2019 · Learning Norms from Stories: A Prior for Value Aligned Agents
Value alignment is a property of an intelligent agent indicating that it...

06/09/2016 · Cooperative Inverse Reinforcement Learning
For an autonomous system to be helpful to humans and to pose no unwarran...

08/23/2023 · From Instructions to Intrinsic Human Values – A Survey of Alignment Goals for Big Models
Big models, exemplified by Large Language Models (LLMs), are models typi...

09/16/2020 · Value Alignment Equilibrium in Multiagent Systems
Value alignment has emerged in recent years as a basic principle to prod...
