Constrained Thompson Sampling for Wireless Link Optimization
Wireless communication systems operate in complex, time-varying environments. Therefore, selecting the optimal configuration parameters in these systems is a challenging problem. For wireless links, rate selection chooses the data transmission rate that maximizes link throughput subject to an application-defined latency constraint. We model rate selection as a stochastic multi-armed bandit (MAB) problem, where each rate in a finite set of transmission rates is modeled as an independent bandit arm. For this setup, we propose Con-TS, a novel constrained version of the Thompson sampling algorithm, where the latency requirement is modeled by a linear constraint on arm selection probabilities. Since our algorithm learns a Bayesian model of the wireless link, it can be adapted to exploit prior knowledge often available in practical wireless networks. Through numerical results from simulated experiments, we demonstrate that Con-TS significantly outperforms state-of-the-art bandit algorithms proposed in the literature. Further, we compare Con-TS with the outer loop link adaptation (OLLA) scheme, which is the state of the art in practical wireless networks and relies on carefully tuned offline link models. We show that Con-TS outperforms OLLA in simulations and, further, that it can elegantly incorporate information from the offline link models to substantially improve its performance.
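To make the idea of "Thompson sampling with a linear constraint on arm selection probabilities" concrete, the sketch below shows one plausible round structure. It is a minimal illustration under stated assumptions, not the authors' Con-TS implementation: each rate's packet success is assumed Bernoulli with a Beta posterior, and the latency requirement is assumed (for illustration only) to be expressed as a bound `max_failure` on the expected packet-failure probability under the selection distribution. Each round samples success probabilities, solves a small linear program (LP) over selection probabilities on the sampled throughput, and draws a rate from the result. The names `solve_constrained_lp`, `ConstrainedThompsonSampler`, `max_failure`, `prior_alpha`, and `prior_beta` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def solve_constrained_lp(values: Sequence[float], costs: Sequence[float],
                         budget: float) -> List[float]:
    """Maximize sum_k p[k]*values[k] subject to sum_k p[k]*costs[k] <= budget,
    sum_k p[k] == 1 and p[k] >= 0.

    With a single linear inequality on the probability simplex, an optimal
    vertex puts mass on at most two arms, so singletons and pairs suffice.
    """
    k = len(values)
    best_val: Optional[float] = None
    best_p: Optional[List[float]] = None

    def consider(val: float, p: List[float]) -> None:
        nonlocal best_val, best_p
        if best_val is None or val > best_val:
            best_val, best_p = val, p

    for i in range(k):
        if costs[i] <= budget:  # always playing arm i is feasible
            p = [0.0] * k
            p[i] = 1.0
            consider(values[i], p)
    for i in range(k):
        for j in range(k):
            if costs[i] < budget < costs[j]:
                # Mix a cheap arm i with a costly arm j so the constraint is tight.
                w = (budget - costs[i]) / (costs[j] - costs[i])  # mass on arm j
                p = [0.0] * k
                p[i], p[j] = 1.0 - w, w
                consider((1.0 - w) * values[i] + w * values[j], p)
    if best_p is None:
        # Constraint infeasible for this sample: fall back to the cheapest arm.
        best_p = [0.0] * k
        best_p[min(range(k), key=lambda a: costs[a])] = 1.0
    return best_p


class ConstrainedThompsonSampler:
    """Thompson sampling over transmission rates with Bernoulli packet success.

    Hypothetical constraint: the expected packet-failure probability under the
    arm-selection distribution must not exceed `max_failure`.
    """

    def __init__(self, rates: Sequence[float], max_failure: float,
                 prior_alpha: Optional[Sequence[float]] = None,
                 prior_beta: Optional[Sequence[float]] = None,
                 seed: Optional[int] = None) -> None:
        self.rates = list(rates)
        self.max_failure = max_failure
        # Beta(1, 1) is uninformative; an offline link model could supply
        # informative priors instead (an assumption, not the paper's recipe).
        n = len(self.rates)
        self.alpha = list(prior_alpha) if prior_alpha is not None else [1.0] * n
        self.beta = list(prior_beta) if prior_beta is not None else [1.0] * n
        self.rng = random.Random(seed)

    def select(self) -> int:
        # 1) Sample a success probability for every rate from its posterior.
        theta = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        # 2) Solve the LP: sampled throughput as objective, linear failure bound.
        values = [r * t for r, t in zip(self.rates, theta)]
        costs = [1.0 - t for t in theta]
        p = solve_constrained_lp(values, costs, self.max_failure)
        # 3) Draw the rate to use from the resulting selection distribution.
        return self.rng.choices(range(len(p)), weights=p)[0]

    def update(self, arm: int, success: bool) -> None:
        # Conjugate Beta-Bernoulli posterior update for the played rate.
        if success:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0


if __name__ == "__main__":
    # Hypothetical link: higher rates carry more bits but fail more often.
    true_success = [0.99, 0.95, 0.7, 0.3]
    sampler = ConstrainedThompsonSampler([1.0, 2.0, 4.0, 8.0], max_failure=0.1, seed=0)
    env = random.Random(42)
    counts = [0] * len(true_success)
    for _ in range(5000):
        arm = sampler.select()
        counts[arm] += 1
        sampler.update(arm, env.random() < true_success[arm])
    print("selections per rate:", counts)
```

With these hypothetical link parameters, solving the same LP on the true success probabilities mixes the second and third rates with weights 0.8 and 0.2, which gives a target against which the printed selection counts can be compared. Enumerating arm pairs costs O(K^2) per round, which is negligible for a short rate table; with several constraints, a general-purpose LP solver would be the natural replacement.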