Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards

08/13/2023
by Ilgin Dogan, et al.

In practice, incentive providers (i.e., principals) often cannot observe the reward realizations of incentivized agents, in contrast to many previously studied principal-agent models. This information asymmetry challenges the principal to consistently estimate the agent's unknown rewards solely by observing the agent's decisions, which becomes even more challenging when the agent has to learn its own rewards. This complex setting is observed in various real-life scenarios ranging from renewable energy storage contracts to personalized healthcare incentives. Hence, it offers not only interesting theoretical questions but also wide practical relevance. This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal. The agent tackles a multi-armed bandit (MAB) problem to maximize its expected reward plus incentive. On top of the agent's learning, the principal trains a parallel algorithm and faces a trade-off between consistently estimating the agent's unknown rewards and maximizing its own utility by offering adaptive incentives to lead the agent. For a non-parametric model, we introduce an estimator whose only input is the history of the principal's incentives and the agent's choices. We unite this estimator with a proposed data-driven incentive policy within a MAB framework. Without restricting the type of the agent's algorithm, we prove finite-sample consistency of the estimator and a rigorous regret bound for the principal by considering the sequential externality imposed by the agent. Lastly, our theoretical results are reinforced by simulations justifying the applicability of our framework to green energy aggregator contracts.
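
The information structure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's estimator or incentive policy. The function name `simulate`, the Gaussian reward noise, and the agent's rule (try each arm once, then pick the arm with the largest sample-mean reward plus offered incentive) are all assumptions made for illustration only. The point it shows is that the principal's record contains incentives and chosen arms, never the realized rewards.

```python
import random

def simulate(true_means, incentives_per_round, seed=0):
    """Hypothetical repeated game: return only what the principal can observe."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms   # agent-side: number of pulls per arm
    sums = [0.0] * n_arms   # agent-side: total reward per arm
    history = []            # principal-side: (incentives, chosen arm)
    for incentives in incentives_per_round:
        if 0 in counts:
            # Assumed exploration rule: try every arm once first.
            arm = counts.index(0)
        else:
            # Agent maximizes its estimated reward plus the offered incentive.
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a] + incentives[a])
        # The reward realization is seen by the agent only (assumed noise model).
        reward = true_means[arm] + rng.gauss(0.0, 0.1)
        counts[arm] += 1
        sums[arm] += reward
        # The principal records its own incentives and the agent's choice.
        history.append((tuple(incentives), arm))
    return history

history = simulate([0.5, 0.3, 0.1], [[0.0, 0.0, 5.0]] * 10)
```

In this toy run, a large incentive on the third arm steers the agent to it once the initial exploration is over, and `history` is the only data a principal-side estimator of this kind could use.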


research
04/14/2023

Repeated Principal-Agent Games with Unobserved Agent Rewards and Perfect-Knowledge Agents

Motivated by a number of real-world applications from domains like healt...
research
05/21/2019

Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem

We define and analyze a multi-agent multi-armed bandit problem in which ...
research
04/08/2020

A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem

We define and analyze a multi-agent multi-armed bandit problem in which ...
research
11/11/2019

Optimal Common Contract with Heterogeneous Agents

We consider the principal-agent problem with heterogeneous agents. Previ...
research
08/10/2018

Simple versus Optimal Contracts

We consider the classic principal-agent model of contract theory, in whi...
research
09/15/2018

Incorporating Behavioral Constraints in Online AI Systems

AI systems that learn through reward feedback about the actions they tak...
research
03/28/2019

Towards a Theory of Systems Engineering Processes: A Principal-Agent Model of a One-Shot, Shallow Process

Systems engineering processes coordinate the effort of different individ...
