Reinforcement Learning-based Joint User Scheduling and Link Configuration in Millimeter-wave Networks
In this paper, we develop algorithms for joint user scheduling and three types of mmWave link configuration: relay selection, codebook optimization, and beam tracking in millimeter wave (mmWave) networks. Our goal is to design an online controller that dynamically schedules users and configures their links to minimize the system delay. To solve this complex scheduling problem, we model it as a dynamic decision-making process and develop two reinforcement learning-based solutions. The first solution is based on deep reinforcement learning (DRL), which leverages the proximal policy optimization to train a neural network-based solution. Due to the potential high sample complexity of DRL, we also propose an empirical multi-armed bandit (MAB)-based solution, which decomposes the decision-making process into a sequential of sub-actions and exploits classic maxweight scheduling and Thompson sampling to decide those sub-actions. Our evaluation of the proposed solutions confirms their effectiveness in providing acceptable system delay. It also shows that the DRL-based solution has better delay performance while the MAB-based solution has a faster training process.
READ FULL TEXT