Minimizing the Outage Probability in a Markov Decision Process

02/28/2023
by Vincent Corlay, et al.

Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm that optimizes an alternative objective: the probability that the gain exceeds a given value. The algorithm can be seen as an extension of the value iteration algorithm. We also show how the proposed algorithm could be generalized to use neural networks, similarly to the deep Q-learning extension of Q-learning.
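The abstract does not give the algorithm's details. As a rough illustration of the objective, one common way to maximize the probability that the accumulated gain reaches a target (i.e., minimize the outage probability) is to run a value-iteration-style recursion on an augmented state (environment state, remaining target). The toy MDP below (`P`, `R`) and the function `threshold_value_iteration` are hypothetical assumptions for the sketch, not the authors' method:

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions.
# P[a][s, s'] = transition probability, R[a][s] = immediate reward.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]

def threshold_value_iteration(P, R, target, horizon, grid):
    """Backward recursion on the augmented state (s, remaining target g):
        V_t(s, g) = max_a sum_{s'} P[a][s, s'] * V_{t+1}(s', g - R[a][s]),
    with terminal condition V(s, g) = 1 if g <= 0 else 0.
    Returns the maximal probability, starting from state 0, that the
    accumulated gain over `horizon` steps reaches `target`.
    The continuous gain axis is discretized on `grid` (increasing)."""
    n_states = P[0].shape[0]
    # Terminal values: target already met wherever the remaining target g <= 0.
    V = (grid[None, :] <= 0).astype(float) * np.ones((n_states, 1))
    for _ in range(horizon):
        V_new = np.zeros_like(V)
        for a in range(len(P)):
            Q = np.zeros_like(V)
            for s in range(n_states):
                # Remaining target after collecting reward R[a][s] in state s.
                g_next = grid - R[a][s]
                # Evaluate the next-step values by interpolating over the gain axis.
                V_interp = np.array([np.interp(g_next, grid, V[s2])
                                     for s2 in range(n_states)])
                Q[s] = P[a][s] @ V_interp
            V_new = np.maximum(V_new, Q)  # greedy over actions
        V = V_new
    # Success probability from state 0 with the full target remaining.
    return V[0, np.searchsorted(grid, target)]
```

Note the contrast with standard value iteration: the backup maximizes a probability of threshold crossing rather than an expected sum of rewards, which is why the remaining target must be carried in the state.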


Related research

05/25/2023: Markov Decision Process with an External Temporal Process
Most reinforcement learning algorithms treat the context under which the...

11/17/2022: AlphaSnake: Policy Iteration on a Nondeterministic NP-hard Markov Decision Process
Reinforcement learning has recently been used to approach well-known NP-...

05/26/2022: Dynamic Network Reconfiguration for Entropy Maximization using Deep Reinforcement Learning
A key problem in network theory is how to reconfigure a graph in order t...

03/21/2023: An MDP approach for radio resource allocation in urban Future Railway Mobile Communication System (FRMCS) scenarios
In the context of railway systems, the application performance can be ve...

09/30/2022: Application of Deep Q Learning with Stimulation Results for Elevator Optimization
This paper presents a methodology for combining programming and mathemat...

07/07/2022: Multi-objective Optimization of Notifications Using Offline Reinforcement Learning
Mobile notification systems play a major role in a variety of applicatio...

09/30/2021: Learning the Markov Decision Process in the Sparse Gaussian Elimination
We propose a learning-based approach for the sparse Gaussian Elimination...
