Deep Reinforcement Learning for IRS Phase Shift Design in Spatiotemporally Correlated Environments
We study the design of Intelligent Reflecting Surface (IRS) phase shifts for Multiple Input Single Output (MISO) communication systems in spatiotemporally correlated channel environments, where the destination can move within a confined area. The objective is to maximize the expected sum of SNRs at the receiver over an infinite time horizon. The problem formulation gives rise to a Markov Decision Process (MDP). We propose a deep actor-critic algorithm that accounts for channel correlations and destination motion by constructing a state representation that includes the current position of the receiver together with the phase shift values and receiver positions from a window of previous time steps. The channel variability induces high-frequency components in the spectrum of the underlying value function. We therefore propose preprocessing the critic's input with a Fourier kernel, which enables stable value learning. Finally, we investigate the use of the destination SNR as a component of the MDP state, which is common practice in previous work. We provide empirical evidence that, when the channels are spatiotemporally correlated, including the SNR in the state representation interacts with function approximation in ways that inhibit convergence.
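The state construction and critic preprocessing described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the form of the Fourier kernel, so the sketch assumes a random Fourier feature mapping with a Gaussian frequency matrix. The helper names (`build_state`, `fourier_features`), the window length, the number of IRS elements, the 2-D receiver position, and the frequency scale are all hypothetical choices made for illustration.

```python
import numpy as np


def build_state(positions, phases, window):
    """Assemble a state from the current receiver position plus the phase
    shifts and receiver positions of the previous `window` time steps.

    positions: array of shape (window + 1, 2), oldest first, current last.
    phases:    array of shape (window, num_elements), previous steps only.
    """
    current = positions[-1]
    past_positions = positions[-1 - window:-1]
    past_phases = phases[-window:]
    return np.concatenate([current, past_phases.ravel(), past_positions.ravel()])


def fourier_features(x, freq_matrix):
    """Map x to [cos(2*pi*Bx), sin(2*pi*Bx)].

    Assumed form of the Fourier preprocessing; the paper's kernel may differ.
    """
    proj = 2.0 * np.pi * x @ freq_matrix.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)


rng = np.random.default_rng(0)
num_elements, window, num_freqs = 4, 3, 16  # illustrative sizes only

# Synthetic trajectory: receiver positions in a unit square, random phases.
positions = rng.uniform(0.0, 1.0, size=(window + 1, 2))
phases = rng.uniform(0.0, 2.0 * np.pi, size=(window, num_elements))

state = build_state(positions, phases, window)

# Gaussian frequency matrix; its scale is a tunable assumption that controls
# how much high-frequency content the critic can represent.
freq_matrix = rng.normal(scale=2.0, size=(num_freqs, state.size))
critic_input = fourier_features(state, freq_matrix)
```

In a full actor-critic setup the critic would typically receive the action as well as the state; the sketch only covers the state path and leaves the networks themselves out.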