Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian?

06/05/2021
by   S. V. Sai Santosh, et al.
0

Multi-armed Bandit (MAB) algorithms identify the best arm among multiple arms via exploration-exploitation trade-off without prior knowledge of arm statistics. Their usefulness in wireless radio, IoT, and robotics demand deployment on edge devices, and hence, a mapping on system-on-chip (SoC) is desired. Theoretically, the Bayesian approach-based Thompson Sampling (TS) algorithm offers better performance than the frequentist approach-based Upper Confidence Bound (UCB) algorithm. However, TS is not synthesizable due to Beta function. We address this problem by approximating it via a pseudo-random number generator-based approach and efficiently realize the TS algorithm on Zynq SoC. In practice, the type of arms distribution (e.g., Bernoulli, Gaussian, etc.) is unknown and hence, a single algorithm may not be optimal. We propose a reconfigurable and intelligent MAB (RI-MAB) framework. Here, intelligence enables the identification of appropriate MAB algorithms for a given environment, and reconfigurability allows on-the-fly switching between algorithms on the SoC. This eliminates the need for parallel implementation of algorithms resulting in huge savings in resources and power consumption. We analyze the functional correctness, area, power, and execution time of the proposed and existing architectures for various arm distributions, word-length, and hardware-software co-design approaches. We demonstrate the superiority of the RI-MAB over TS and UCB only architectures.

READ FULL TEXT

page 1

page 6

page 9

page 10

research
02/18/2020

Intelligent and Reconfigurable Architecture for KL Divergence Based Online Machine Learning Algorithm

Online machine learning (OML) algorithms do not need any training phase ...
research
10/13/2020

Multi-Armed Bandits with Dependent Arms

We study a variant of the classical multi-armed bandit problem (MABP) wh...
research
03/27/2013

Exploiting correlation and budget constraints in Bayesian multi-armed bandit optimization

We address the problem of finding the maximizer of a nonlinear smooth fu...
research
03/13/2023

Differential Good Arm Identification

This paper targets a variant of the stochastic multi-armed bandit proble...
research
05/08/2022

Some performance considerations when using multi-armed bandit algorithms in the presence of missing data

When using multi-armed bandit algorithms, the potential impact of missin...
research
09/06/2022

Hardware Software Co-design of Statistical and Deep Learning Frameworks for Wideband Sensing on Zynq System on Chip

With the introduction of spectrum sharing and heterogeneous services in ...
research
10/31/2020

Resource Allocation in Multi-armed Bandit Exploration: Overcoming Nonlinear Scaling with Adaptive Parallelism

We study exploration in stochastic multi-armed bandits when we have acce...

Please sign up or login with your details

Forgot password? Click here to reset