Online Learning with Costly Features in Non-stationary Environments

07/18/2023
by Saeed Ghoorchian, et al.

Maximizing long-term rewards is the primary goal in sequential decision-making problems. The majority of existing methods assume that side information is freely available, enabling the learning agent to observe all features' states before making a decision. In real-world problems, however, collecting beneficial information is often costly. Consequently, besides learning the individual arms' rewards, learning which features' states to observe is essential to improving the decision-making strategy. The problem is aggravated in a non-stationary environment, where reward and cost distributions undergo abrupt changes over time. To address this dual learning problem, we extend the contextual bandit setting and allow the agent to observe subsets of features' states. The objective is to maximize the long-term average gain, defined as the difference between the accumulated rewards and the paid costs, averaged over time. The agent therefore faces a trade-off between minimizing the cost of information acquisition and possibly improving the decision-making process using the obtained information. To this end, we develop an algorithm that guarantees regret that is sublinear in time. Numerical results demonstrate the superiority of our proposed policy in a real-world scenario.
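
As a rough illustration of the objective described above, the average gain over a run can be sketched as the accumulated rewards minus the accumulated observation costs, divided by the number of rounds. The function name `average_gain` and the three-round numbers below are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not implement the proposed algorithm.

```python
from typing import Sequence


def average_gain(rewards: Sequence[float], costs: Sequence[float]) -> float:
    """Average per-round gain: (sum of rewards - sum of paid costs) / rounds.

    Hypothetical helper for illustration; not the paper's notation.
    """
    if len(rewards) != len(costs):
        raise ValueError("rewards and costs must cover the same rounds")
    if not rewards:
        raise ValueError("at least one round is required")
    return (sum(rewards) - sum(costs)) / len(rewards)


# Made-up run: the agent pays 0.25 to observe features in rounds 1 and 3,
# and observes nothing (cost 0) in round 2.
rewards = [1.0, 0.0, 1.0]
costs = [0.25, 0.0, 0.25]
gain = average_gain(rewards, costs)
```

In this toy run, paying for observations is worthwhile only if the extra reward it enables exceeds the cost paid, which is the trade-off the abstract refers to.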

research
07/18/2023

Non-stationary Delayed Combinatorial Semi-Bandit with Causally Related Rewards

Sequential decision-making under uncertainty is often associated with lo...
research
02/11/2016

Data-Driven Online Decision Making with Costly Information Acquisition

In most real-world settings such as recommender systems, finance, and he...
research
12/25/2022

Linear Combinatorial Semi-Bandit with Causally Related Rewards

In a sequential decision-making problem, having a structural dependency ...
research
04/06/2017

Geometry of Policy Improvement

We investigate the geometry of optimal memoryless time independent decis...
research
09/13/2021

Pre-emptive learning-to-defer for sequential medical decision-making under uncertainty

We propose SLTD (`Sequential Learning-to-Defer') a framework for learnin...
research
07/10/2023

Online Ad Procurement in Non-stationary Autobidding Worlds

Today's online advertisers procure digital ad impressions through intera...
research
01/29/2022

Bellman Meets Hawkes: Model-Based Reinforcement Learning via Temporal Point Processes

We consider a sequential decision making problem where the agent faces t...
