MDPFuzzer: Finding Crash-Triggering State Sequences in Models Solving the Markov Decision Process

12/06/2021
by Qi Pang, et al.

The Markov decision process (MDP) provides a mathematical framework for modeling sequential decision-making problems, many of which are crucial to security and safety, such as autonomous driving and robot control. The rapid development of artificial intelligence research has created efficient methods for solving MDPs, such as deep neural networks (DNNs), reinforcement learning (RL), and imitation learning (IL). However, these popular models for solving MDPs are neither thoroughly tested nor rigorously reliable. We present MDPFuzzer, the first blackbox fuzz testing framework for models solving MDPs. MDPFuzzer forms testing oracles by checking whether the target model enters abnormal and dangerous states. During fuzzing, MDPFuzzer decides which mutated state to retain by measuring whether it reduces cumulative rewards or forms a new state sequence. We design efficient techniques to quantify the "freshness" of a state sequence using Gaussian mixture models (GMMs) and dynamic expectation-maximization (DynEM). We also prioritize states with a high potential of revealing crashes by estimating the local sensitivity of the target models over states. MDPFuzzer is evaluated on five state-of-the-art models for solving MDPs, covering supervised DNNs, RL, IL, and multi-agent RL. Our evaluation includes scenarios of autonomous driving, aircraft collision avoidance, and two games that are often used to benchmark RL. During a 12-hour run, we find over 80 crash-triggering state sequences on each model. We report the inspiring finding that crash-triggering states, though they look normal, induce distinct neuron activation patterns compared with normal states. We further develop an abnormal behavior detector to harden all the evaluated models, and repair them with the findings of MDPFuzzer, significantly enhancing their robustness without sacrificing accuracy.
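The fuzzing loop described above — retain a mutated seed when it lowers the cumulative reward or yields a "fresh" (low-density) state sequence under a GMM, and report seeds whose episodes crash — can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions, not the paper's implementation: the `run_episode`, `mutate`, and `freshness_threshold` names are hypothetical, the GMM uses fixed diagonal-covariance components rather than the paper's DynEM updates, and sensitivity-based seed prioritization is omitted.

```python
import math
import random

def gmm_log_density(state, components):
    """Log-density of a state under a diagonal-covariance GMM.
    components: list of (weight, means, variances) tuples."""
    log_terms = []
    for weight, means, variances in components:
        log_p = math.log(weight)
        for x, mu, var in zip(state, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        log_terms.append(log_p)
    m = max(log_terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def mdp_fuzz(initial_seeds, run_episode, mutate, freshness_threshold,
             components, n_iters=1000):
    """Reward-guided fuzz loop in the spirit of MDPFuzzer.

    run_episode(seed) -> (cumulative_reward, state_sequence, crashed)
    A mutant is kept in the corpus if it reduces the cumulative reward
    or visits a low-density ("fresh") state; crashing mutants are reported.
    """
    corpus = [(s, run_episode(s)[0]) for s in initial_seeds]  # (seed, reward)
    crashes = []
    for _ in range(n_iters):
        seed, reward = random.choice(corpus)
        mutant = mutate(seed)
        new_reward, states, crashed = run_episode(mutant)
        if crashed:
            crashes.append(mutant)  # testing oracle: abnormal/dangerous state
            continue
        lowest_density = min(gmm_log_density(s, components) for s in states)
        if new_reward < reward or lowest_density < freshness_threshold:
            corpus.append((mutant, new_reward))
    return crashes
```

In a real setting, `run_episode` would execute the target model (e.g. an autonomous-driving policy) in its environment, and the GMM components would be refitted online as new state sequences arrive.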
