
Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning
A novel reinforcement learning scheme to synthesize policies for continu...

Robust Finite-State Controllers for Uncertain POMDPs
Uncertain partially observable Markov decision processes (uPOMDPs) allow...

Robust Combination of Local Controllers
Planning problems are hard; motion planning, for example, is PSPACE-hard....

Thompson Sampling for Learning Parameterized Markov Decision Processes
We consider reinforcement learning in parameterized Markov Decision Proc...

Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints
We study the problem of synthesizing a policy that maximizes the entropy...

Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints
We formalize the problem of maximizing the mean-payoff value with high p...

Fixed Points of the Set-Based Bellman Operator
Motivated by uncertain parameters encountered in Markov decision process...

Scenario-Based Verification of Uncertain MDPs
We consider Markov decision processes (MDPs) in which the transition probabilities and rewards belong to an uncertainty set parametrized by a collection of random variables. The probability distributions of these random parameters are unknown. The problem is to compute the probability of satisfying a temporal logic specification within any MDP that corresponds to a sample from these unknown distributions. In general, this problem is undecidable, and we resort to techniques from so-called scenario optimization. Based on a finite number of samples of the uncertain parameters, each of which induces an MDP, the proposed method estimates the probability of satisfying the specification by solving a finite-dimensional convex optimization problem. The number of samples required to obtain high confidence in this estimate is independent of the number of states and the number of random parameters. Experiments on a large set of benchmarks show that a few thousand samples suffice to obtain high-quality confidence bounds with high probability.
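The sampling idea behind the scenario approach can be illustrated with a toy sketch. Everything here is hypothetical: the two-state parametric chain, its transition numbers, and the uniform sampling distribution are invented for illustration, and the closed-form reachability probability stands in for the finite-dimensional convex program the abstract describes. The confidence formula shown is the standard scenario bound for a single scalar decision variable.

```python
import random

def reach_prob(p):
    # Hypothetical parametric Markov chain with states {s0, s1, goal, sink}:
    #   s0 -> goal with prob. p,   s0 -> s1   with prob. 1 - p
    #   s1 -> goal with prob. 0.5, s1 -> sink with prob. 0.5
    # Probability of eventually reaching `goal` from s0, in closed form.
    return p + (1 - p) * 0.5

def scenario_lower_bound(n_samples, epsilon, seed=0):
    # Sample the uncertain parameter p from its (here assumed uniform)
    # unknown distribution; each sample induces one concrete chain.
    rng = random.Random(seed)
    samples = [rng.uniform(0.1, 0.9) for _ in range(n_samples)]

    # Worst satisfaction probability over all sampled instances.
    worst = min(reach_prob(p) for p in samples)

    # Scenario bound (one scalar decision variable): with confidence
    # 1 - (1 - epsilon)^N, a fresh sample from the same distribution
    # violates the bound `worst` with probability at most epsilon.
    confidence = 1 - (1 - epsilon) ** n_samples
    return worst, confidence
```

With a few thousand samples the confidence term is already very close to 1 even for small epsilon, which mirrors the abstract's observation that the required sample count does not depend on the size of the state space.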