EXP4-DFDC: A Non-Stochastic Multi-Armed Bandit for Cache Replacement

09/23/2020
by Farzana Beente Yusuf, et al.

In this work we study a variant of the well-known multi-armed bandit (MAB) problem that features delayed feedback and a loss that declines over time. We introduce an algorithm, EXP4-DFDC, for this MAB variant and show that its regret vanishes as time increases. We also show that LeCaR, a previously published machine-learning-based cache replacement algorithm, is an instance of EXP4-DFDC. Our results offer insight into the choice of hyperparameters and can be used to optimize future LeCaR instances.
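To make the setting concrete, the sketch below illustrates the kind of expert-weighted cache replacement the abstract describes: two eviction experts (LRU and LFU) are combined EXP4-style, and a multiplicative weight update penalizes the expert whose past eviction caused a later miss, which is where the delayed feedback enters. This is a minimal toy, assuming a fixed learning rate and a unit per-miss loss; the class and parameter names are illustrative, not the paper's exact EXP4-DFDC or LeCaR formulation.

```python
import random
from collections import OrderedDict, defaultdict

class WeightedPolicyCache:
    """Toy LeCaR-style cache: weight two eviction experts (LRU, LFU)
    and shrink the weight of an expert when its eviction later causes
    a miss (delayed feedback). Illustrative sketch, not the paper's
    exact algorithm."""

    def __init__(self, capacity, learning_rate=0.45, seed=0):
        self.capacity = capacity
        self.lr = learning_rate            # assumed fixed learning rate
        self.rng = random.Random(seed)
        self.cache = OrderedDict()         # key -> None; order tracks recency
        self.freq = defaultdict(int)       # key -> access count
        self.weights = {"LRU": 1.0, "LFU": 1.0}
        self.history = {}                  # evicted key -> expert blamed
        self.misses = 0

    def _evict(self):
        # Sample an expert in proportion to its current weight.
        total = sum(self.weights.values())
        expert = "LRU" if self.rng.random() * total < self.weights["LRU"] else "LFU"
        if expert == "LRU":
            victim = next(iter(self.cache))                       # least recently used
        else:
            victim = min(self.cache, key=lambda k: self.freq[k])  # least frequently used
        del self.cache[victim]
        self.history[victim] = expert      # remember who to blame later

    def access(self, key):
        """Return True on a hit, False on a miss."""
        self.freq[key] += 1
        if key in self.cache:
            self.cache.move_to_end(key)    # refresh recency
            return True
        self.misses += 1
        if key in self.history:
            # Delayed feedback: the miss reveals that evicting this key
            # was costly, so penalize the responsible expert.
            blamed = self.history.pop(key)
            self.weights[blamed] *= 1.0 - self.lr
        if len(self.cache) >= self.capacity:
            self._evict()
        self.cache[key] = None
        return False
```

A "loss that declines over time" could be modeled by scaling the `1.0 - self.lr` penalty down as the gap between eviction and re-access grows; the fixed penalty here keeps the sketch short.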


