Making Human-Like Trade-offs in Constrained Environments by Learning from Demonstrations

by Arie Glazier, et al.

Many real-life scenarios require humans to make difficult trade-offs: do we always follow all the traffic rules or do we violate the speed limit in an emergency? These scenarios force us to evaluate the trade-off between collective norms and our own personal objectives. To create effective AI-human teams, we must equip AI agents with a model of how humans make trade-offs in complex, constrained environments. These agents will be able to mirror human behavior or to draw human attention to situations where decision making could be improved. To this end, we propose a novel inverse reinforcement learning (IRL) method for learning implicit hard and soft constraints from demonstrations, enabling agents to quickly adapt to new settings. In addition, learning soft constraints over states, actions, and state features allows agents to transfer this knowledge to new domains that share similar aspects. We then use the constraint learning method to implement a novel system architecture that leverages a cognitive model of human decision making, multi-alternative decision field theory (MDFT), to orchestrate competing objectives. We evaluate the resulting agent on trajectory length, number of violated constraints, and total reward, demonstrating that our agent architecture is general and achieves strong performance. Thus we are able to capture and replicate human-like trade-offs from demonstrations in environments where constraints are not explicit.



Learning Behavioral Soft Constraints from Demonstrations

Many real-life scenarios require humans to make difficult trade-offs: do...

Maximum Causal Entropy Inverse Constrained Reinforcement Learning

When deploying artificial agents in real-world environments where they i...

Designing Interfaces to Help Stakeholders Comprehend, Navigate, and Manage Algorithmic Trade-Offs

Artificial intelligence algorithms have been applied to a wide variety o...

Adaptive Trade-Offs in Off-Policy Learning

A great variety of off-policy learning algorithms exist in the literatur...

Reasoning about Counterfactuals to Improve Human Inverse Reinforcement Learning

To collaborate well with robots, we must be able to understand their dec...

AI, how can humans communicate better with you?

Artificial intelligence (AI) systems and humans communicate more and more...

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

Artificial agents have traditionally been trained to maximize reward, wh...
