A Reinforcement Learning Formulation of the Lyapunov Optimization: Application to Edge Computing Systems with Queue Stability

12/14/2020
by Sohee Bae, et al.

This paper considers a deep reinforcement learning (DRL)-based approach to Lyapunov optimization that minimizes the time-average penalty while maintaining queue stability. State and action spaces are constructed so that the Lyapunov optimization problem forms a proper Markov decision process (MDP). A condition on the reinforcement learning (RL) reward function for queue stability is derived. Based on this analysis and on practical RL with reward discounting, a class of reward functions is proposed for the DRL-based approach. The proposed approach does not require complicated optimization at each time step and operates with general non-convex and discontinuous penalty functions. Hence, it provides an alternative to the conventional drift-plus-penalty (DPP) algorithm for Lyapunov optimization. The proposed approach is applied to resource allocation in edge computing systems with queue stability, and numerical results demonstrate its successful operation.
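For orientation, the conventional DPP baseline mentioned in the abstract can be sketched for a single queue as follows. This is a minimal illustration under stated assumptions, not the formulation or the reward class proposed in the paper: the action set, the quadratic penalty, the arrival process, the trade-off weight `v`, and the helper `dpp_reward` are all hypothetical. The sketch uses the standard queue update Q(t+1) = max(Q(t) - b(t), 0) + a(t) and, at each step, picks the action that minimizes V * penalty(x) - Q(t) * b(x) over a small finite set.

```python
import random


def queue_update(q, arrival, service):
    """Standard queue dynamics: Q(t+1) = max(Q(t) - b(t), 0) + a(t)."""
    return max(q - service, 0.0) + arrival


def dpp_action(q, actions, penalty, service_rate, v):
    """Greedy DPP step: minimize V * penalty(x) - Q(t) * b(x) over a finite set."""
    return min(actions, key=lambda x: v * penalty(x) - q * service_rate(x))


def dpp_reward(q, q_next, penalty_value, v):
    """Hypothetical RL reward (an assumption, not the paper's reward class):
    the negative of the one-step quadratic Lyapunov drift plus V * penalty."""
    drift = 0.5 * (q_next ** 2 - q ** 2)
    return -(drift + v * penalty_value)


def simulate(steps=2000, v=5.0, seed=0):
    """Run the greedy DPP rule on a toy single-queue system."""
    rng = random.Random(seed)
    actions = [0.0, 1.0, 2.0]        # hypothetical service levels

    def penalty(x):                  # hypothetical cost, e.g. power
        return x ** 2

    def service_rate(x):             # served amount equals the chosen level
        return x

    q, total_penalty, total_q = 0.0, 0.0, 0.0
    for _ in range(steps):
        x = dpp_action(q, actions, penalty, service_rate, v)
        arrival = rng.choice([0.0, 1.0, 2.0])   # toy arrivals with mean 1
        q = queue_update(q, arrival, service_rate(x))
        total_penalty += penalty(x)
        total_q += q
    return total_penalty / steps, total_q / steps


if __name__ == "__main__":
    avg_penalty, avg_queue = simulate()
    print(f"average penalty: {avg_penalty:.3f}, average queue: {avg_queue:.3f}")
```

In this toy setting the per-step minimization is a trivial search over three actions. With a large or continuous action space and a non-convex or discontinuous penalty, that per-step minimization becomes the costly part, which is the step the abstract says the DRL-based approach avoids.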


