
Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning
A novel reinforcement learning scheme to synthesize policies for continu...

Robust Finite-State Controllers for Uncertain POMDPs
Uncertain partially observable Markov decision processes (uPOMDPs) allow...

Robust Combination of Local Controllers
Planning problems are hard; motion planning, for example, is PSPACE-hard....

Thompson Sampling for Learning Parameterized Markov Decision Processes
We consider reinforcement learning in parameterized Markov Decision Proc...

Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints
We study the problem of synthesizing a policy that maximizes the entropy...

Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints
We formalize the problem of maximizing the mean-payoff value with high p...

Fixed Points of the Set-Based Bellman Operator
Motivated by uncertain parameters encountered in Markov decision process...

Scenario-Based Verification of Uncertain MDPs
We consider Markov decision processes (MDPs) in which the transition probabilities and rewards belong to an uncertainty set parametrized by a collection of random variables. The probability distributions of these random parameters are unknown. The problem is to compute the probability of satisfying a temporal logic specification within any MDP that corresponds to a sample from these unknown distributions. In general, this problem is undecidable, and we resort to techniques from so-called scenario optimization. Based on a finite number of samples of the uncertain parameters, each of which induces an MDP, the proposed method estimates the probability of satisfying the specification by solving a finite-dimensional convex optimization problem. The number of samples required to obtain high confidence in this estimate is independent of the number of states and the number of random parameters. Experiments on a large set of benchmarks show that a few thousand samples suffice to obtain high-quality confidence bounds with high probability.
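The sampling idea behind the scenario approach can be illustrated with a toy sketch. Everything here is hypothetical: the two-state parametric chain, its transition numbers, and the uniform sampling distribution are invented for illustration, and the closed-form reachability probability stands in for the finite-dimensional convex program the abstract describes. The confidence formula shown is the standard scenario bound for a single scalar decision variable.

```python
import random

def reach_prob(p):
    # Hypothetical parametric Markov chain with states {s0, s1, goal, sink}:
    #   s0 -> goal with prob. p,   s0 -> s1   with prob. 1 - p
    #   s1 -> goal with prob. 0.5, s1 -> sink with prob. 0.5
    # Probability of eventually reaching `goal` from s0, in closed form.
    return p + (1 - p) * 0.5

def scenario_lower_bound(n_samples, epsilon, seed=0):
    # Sample the uncertain parameter p from its (here assumed uniform)
    # unknown distribution; each sample induces one concrete chain.
    rng = random.Random(seed)
    samples = [rng.uniform(0.1, 0.9) for _ in range(n_samples)]

    # Worst satisfaction probability over all sampled instances.
    worst = min(reach_prob(p) for p in samples)

    # Scenario bound (one scalar decision variable): with confidence
    # 1 - (1 - epsilon)^N, a fresh sample from the same distribution
    # violates the bound `worst` with probability at most epsilon.
    confidence = 1 - (1 - epsilon) ** n_samples
    return worst, confidence
```

With a few thousand samples the confidence term is already very close to 1 even for small epsilon, which mirrors the abstract's observation that the required sample count does not depend on the size of the state space.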