Collaborative Multi-Agent Multi-Armed Bandit Learning for Small-Cell Caching

01/12/2020
by Xianzhe Xu, et al.

This paper investigates learning-based caching in small-cell networks (SCNs) when user preference is unknown. The goal is to optimize the cache placement at each small base station (SBS) so as to minimize the long-term system transmission delay. We model this sequential multi-agent decision-making problem from a multi-agent multi-armed bandit (MAMAB) perspective. Rather than first estimating user preference and then optimizing the caching strategy, we propose several MAMAB-based algorithms that learn the caching strategy directly online in both stationary and non-stationary environments. For the stationary environment, we first propose two high-complexity agent-based collaborative MAMAB algorithms with performance guarantees. We then propose a low-complexity distributed MAMAB algorithm that ignores SBS coordination. To strike a better balance between the SBS coordination gain and computational complexity, we develop an edge-based collaborative MAMAB algorithm that uses a coordination-graph edge-based reward assignment method. For the non-stationary environment, we modify the MAMAB-based algorithms proposed for the stationary environment by introducing a practical initialization method and designing new perturbed terms to adapt to the dynamic environment. Simulation results validate the effectiveness of the proposed algorithms, and the effects of different parameters on caching performance are also discussed.
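
The abstract does not spell out the algorithms. As a hedged illustration of the low-complexity distributed (uncoordinated) setting only, the sketch below treats each file as an arm and lets a single SBS cache the `capacity` files with the highest UCB index, in the style of combinatorial UCB (CUCB). Everything specific here is an assumption rather than the authors' method: the Bernoulli per-slot request model, the cache-hit reward standing in for delay reduction, the exploration constant 1.5, the semi-bandit feedback (demand is observed only for cached files), and the hypothetical names `SBSAgent` and `simulate`. It omits SBS coordination, the edge-based reward assignment, and the non-stationary modifications.

```python
import math
import random


class SBSAgent:
    """Hypothetical SBS that picks its cache with a CUCB-style index.

    Each file is an arm; the cache (a "super-arm") is the `capacity`
    files with the largest UCB indices. Other SBSs are ignored, as in
    the uncoordinated distributed setting.
    """

    def __init__(self, num_files, capacity):
        self.num_files = num_files
        self.capacity = capacity
        self.counts = [0] * num_files   # slots in which each file was cached
        self.means = [0.0] * num_files  # empirical mean reward per cached slot
        self.t = 0                      # current time slot

    def _index(self, f):
        if self.counts[f] == 0:
            return float("inf")  # never-cached files are explored first
        bonus = math.sqrt(1.5 * math.log(self.t) / self.counts[f])
        return self.means[f] + bonus

    def select_cache(self):
        self.t += 1
        ranked = sorted(range(self.num_files), key=self._index, reverse=True)
        return ranked[: self.capacity]

    def update(self, cached, requests):
        # Assumed semi-bandit feedback: only cached files' rewards are used.
        for f in cached:
            self.counts[f] += 1
            self.means[f] += (requests[f] - self.means[f]) / self.counts[f]


def simulate(popularity, capacity, horizon, seed=0):
    """Run one SBS for `horizon` slots; return the agent and total cache hits."""
    rng = random.Random(seed)
    agent = SBSAgent(len(popularity), capacity)
    hits = 0
    for _ in range(horizon):
        cached = agent.select_cache()
        # Assumed request model: file f is requested with probability p_f.
        requests = [1 if rng.random() < p else 0 for p in popularity]
        hits += sum(requests[f] for f in cached)
        agent.update(cached, requests)
    return agent, hits


agent, hits = simulate([0.9, 0.8, 0.1, 0.05, 0.02], capacity=2, horizon=5000, seed=1)
print("cache hits:", hits, "slots cached per file:", agent.counts)
```

Over time the two most popular files should dominate the cache, while rarely requested files are sampled only often enough for their confidence bonus to shrink. Sorting all files in every slot costs O(F log F) for F files; `heapq.nlargest(capacity, range(num_files), key=...)` would make the same top-`capacity` selection more cheaply for large catalogs.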
