Intelligent and Reconfigurable Architecture for KL Divergence Based Online Machine Learning Algorithm

02/18/2020
by   S. V. Sai Santosh, et al.

Online machine learning (OML) algorithms do not need a training phase and can be deployed directly in an unknown environment. OML includes multi-armed bandit (MAB) algorithms, which identify the best arm among several by balancing exploration of all arms against exploitation of the optimal arm. The Kullback-Leibler divergence based upper confidence bound (KLUCB) is the state-of-the-art MAB algorithm for optimizing the exploration-exploitation trade-off, but it is complex because of its underlying optimization routine. This limits its usefulness for robotics and radio applications, which demand integration of KLUCB with the physical layer (PHY) on a system on chip (SoC). In this paper, we efficiently map the KLUCB algorithm onto an SoC by realizing the optimization routine via an alternative synthesizable computation, without compromising performance. The proposed architecture is dynamically reconfigurable, so that the number of arms, as well as the type of algorithm, can be changed on-the-fly. Specifically, after initial learning, an on-the-fly switch to the lightweight UCB algorithm offers roughly a 10x improvement in latency and throughput. Since the learning duration depends on the unknown arm statistics, we embed intelligence in the architecture to decide the switching instant. We validate the functional correctness and usefulness of the proposed architecture via a realistic wireless application, and a detailed complexity analysis demonstrates its feasibility for realizing intelligent radios.
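
To make the complexity gap between the two indices concrete, the following is a minimal Python sketch of the textbook KLUCB and UCB index computations for Bernoulli rewards. It is not the architecture or the synthesizable computation proposed in the paper, which the abstract does not specify. The function names, the fixed-iteration bisection, the exploration budget log(t)/pulls, and the UCB constant 2 are all assumptions chosen for illustration.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def klucb_index(mean, pulls, t, iters=32):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t), found by bisection.

    The iteration count is fixed (an assumption), so the loop has a known bound.
    """
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid still satisfies the constraint, so search higher
        else:
            hi = mid  # mid violates the constraint, so search lower
    return lo


def ucb_index(mean, pulls, t):
    """Classic UCB index: a single closed-form expression, no inner search."""
    return mean + math.sqrt(2 * math.log(t) / pulls)


if __name__ == "__main__":
    # Hypothetical arm statistics: empirical mean 0.5 after 10 pulls, at round t = 100.
    print("KLUCB index:", klucb_index(0.5, 10, 100))
    print("UCB index:  ", ucb_index(0.5, 10, 100))
```

In this sketch the KLUCB index needs an iterative search with logarithm evaluations on every update, while the UCB index is one closed-form expression. That difference is the kind of cost the abstract refers to when it describes switching to the lightweight UCB after initial learning.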

research
06/05/2021

Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian?

Multi-armed Bandit (MAB) algorithms identify the best arm among multiple...
research
12/02/2020

Towards Intelligent Reconfigurable Wireless Physical Layer (PHY)

Next-generation wireless networks are getting significant attention beca...
research
12/27/2013

lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

The paper proposes a novel upper confidence bound (UCB) procedure for id...
research
05/20/2022

Actively Tracking the Optimal Arm in Non-Stationary Environments with Mandatory Probing

We study a novel multi-armed bandit (MAB) setting which mandates the age...
research
07/14/2020

Generic Outlier Detection in Multi-Armed Bandit

In this paper, we study the problem of outlier arm detection in multi-ar...
research
01/12/2016

Infomax strategies for an optimal balance between exploration and exploitation

Proper balance between exploitation and exploration is what makes good d...
research
08/17/2020

Harnessing The Multi-Stability Of Kresling Origami For Reconfigurable Articulation In Soft Robotic Arms

This study examines a biology-inspired approach of using reconfigurable ...
