Accelerated Policy Evaluation: Learning Adversarial Environments with Adaptive Importance Sampling

by Mengdi Xu, et al.

The evaluation of rare but high-stakes events remains one of the main difficulties in obtaining reliable policies from intelligent agents, especially in large or continuous state/action spaces, where limited scalability necessitates a prohibitively large number of testing iterations. At the same time, a biased or inaccurate policy evaluation in a safety-critical system could cause unexpected catastrophic failures during deployment. In this paper, we propose the Accelerated Policy Evaluation (APE) method, which simultaneously uncovers rare events and estimates the rare-event probability in Markov decision processes. APE treats the environment's nature as an adversarial agent and, through adaptive importance sampling, learns toward the zero-variance sampling distribution for policy evaluation. Moreover, APE scales to large discrete or continuous spaces by incorporating function approximators. We investigate the convergence properties of the proposed algorithms under suitable regularity conditions. Our empirical studies show that APE estimates rare-event probabilities with smaller variance while using orders of magnitude fewer samples than baseline methods in both multi-agent and single-agent environments.
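
The core idea of steering samples toward the zero-variance importance-sampling distribution can be illustrated on a toy problem far simpler than an MDP. The sketch below is NOT APE: it swaps in the classical cross-entropy method, a generic adaptive importance-sampling technique, to estimate P(X > 4) for a standard normal X. The proposal is N(mu, 1), and mu is adapted toward the mean of the zero-variance density q*(x) = 1{x > gamma} f(x) / p. All function names (`cross_entropy_is`, `naive_mc`, `likelihood_ratio`, `gaussian_tail`) and parameters (`n`, `rho`, `max_iters`) are hypothetical choices for illustration only.

```python
import math
import random


def gaussian_tail(gamma):
    """Exact P(X > gamma) for X ~ N(0, 1), used only as a reference value."""
    return 0.5 * math.erfc(gamma / math.sqrt(2.0))


def likelihood_ratio(x, mu):
    """f(x) / q(x) for target f = N(0, 1) and proposal q = N(mu, 1)."""
    return math.exp(-mu * x + 0.5 * mu * mu)


def naive_mc(gamma, n, rng):
    """Crude Monte Carlo: fraction of N(0, 1) draws that exceed gamma."""
    return sum(rng.gauss(0.0, 1.0) > gamma for _ in range(n)) / n


def cross_entropy_is(gamma, n=2000, rho=0.1, max_iters=50, seed=0):
    """Hypothetical adaptive IS sketch (cross-entropy method, not APE).

    The zero-variance proposal is q*(x) = 1{x > gamma} f(x) / p. Within the
    family N(mu, 1), minimizing KL(q* || q) gives mu = E_{q*}[X], which is
    estimated from likelihood-weighted elite samples at rising levels.
    """
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(max_iters):
        xs = sorted(rng.gauss(mu, 1.0) for _ in range(n))
        # Intermediate level: the (1 - rho) sample quantile, capped at gamma.
        level = min(gamma, xs[min(n - 1, int((1.0 - rho) * n))])
        elites = [x for x in xs if x >= level]
        weights = [likelihood_ratio(x, mu) for x in elites]
        # Self-normalized weighted elite mean = cross-entropy update of mu.
        mu = sum(w * x for w, x in zip(weights, elites)) / sum(weights)
        if level >= gamma:
            break
    # Final unbiased IS estimate with fresh samples from the tuned proposal.
    xs = [rng.gauss(mu, 1.0) for _ in range(n)]
    estimate = sum(likelihood_ratio(x, mu) for x in xs if x > gamma) / n
    return mu, estimate


if __name__ == "__main__":
    mu, est = cross_entropy_is(4.0)
    print(f"tuned mean {mu:.2f}, IS estimate {est:.3e}, "
          f"exact {gaussian_tail(4.0):.3e}")
    print("naive MC, 2000 samples:", naive_mc(4.0, 2000, random.Random(1)))
```

With these settings the tuned mean should settle slightly above 4 (near E[X | X > 4]), and the importance-sampling estimate should land close to the exact value of about 3.17e-5. A naive estimator with a budget of a few thousand samples expects well under one exceedance, so it usually returns 0; this gap is the kind of sample-efficiency difference the abstract refers to, here in a one-dimensional setting with no policy or environment model.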

Certifiable Deep Importance Sampling for Rare-Event Simulation of Black-Box Systems

Rare-event simulation techniques, such as importance sampling (IS), cons...

Deep Probabilistic Accelerated Evaluation: A Certifiable Rare-Event Simulation Methodology for Black-Box Autonomy

Evaluating the reliability of intelligent physical systems against rare ...

Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures

This paper addresses the problem of evaluating learning systems in safet...

Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation

While recent developments in autonomous vehicle (AV) technology highligh...

Robust temporal difference learning for critical domains

We present a new Q-function operator for temporal difference (TD) learni...

SYMPAIS: SYMbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis

Probabilistic software analysis aims at quantifying the probability of a...

Rare event simulation for electronic circuit design

In this work, we propose an algorithm to simulate rare events for electr...