Stochastic Network Utility Maximization with Unknown Utilities: Multi-Armed Bandits Approach

06/17/2020
by   Arun Verma, et al.

In this paper, we study a novel Stochastic Network Utility Maximization (NUM) problem in which the utilities of the agents are unknown. The utility of each agent depends on the amount of resource it receives from a network operator/controller. The operator seeks a resource allocation that maximizes the expected total utility of the network. We consider threshold-type utility functions: each agent gets non-zero utility if the amount of resource it receives is higher than a certain threshold, and zero utility otherwise (hard real-time). We pose this NUM setup with unknown utilities as a regret minimization problem. Our goal is to identify a policy that performs as well as an oracle policy that knows the utilities of the agents. We model the problem as a bandit setting in which the feedback obtained in each round depends on the resource allocated to the agents. We propose algorithms for this novel setting using ideas from Multiple-Play Multi-Armed Bandits and Combinatorial Semi-Bandits. We show that the proposed algorithm is optimal when all agents have the same utility. We validate the performance guarantees of the proposed algorithms through numerical experiments.
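To make the threshold-type utility and the oracle benchmark concrete, here is a minimal Python sketch. It is not the paper's algorithm. The names `threshold_utility` and `oracle_allocation`, the per-agent `values`, the total `budget`, and the toy numbers are all assumptions made for illustration. The sketch also assumes that an allocation exactly equal to the threshold is sufficient, and that the oracle knows every threshold and expected utility, so it can search agent subsets by brute force.

```python
from itertools import combinations


def threshold_utility(allocation, threshold, value):
    """Hypothetical threshold-type utility: `value` if the allocation
    meets the threshold (assumed inclusive), otherwise zero."""
    return value if allocation >= threshold else 0.0


def oracle_allocation(thresholds, values, budget):
    """Illustrative oracle: brute-force search for the agent subset with
    the highest total value whose thresholds fit within the budget.
    Exponential in the number of agents, so only suitable for toy inputs."""
    n = len(thresholds)
    best_subset, best_total = (), 0.0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            cost = sum(thresholds[i] for i in subset)
            total = sum(values[i] for i in subset)
            if cost <= budget and total > best_total:
                best_subset, best_total = subset, total
    # Give each selected agent exactly its threshold; others get nothing.
    allocation = [thresholds[i] if i in best_subset else 0.0 for i in range(n)]
    return allocation, best_total


# Toy instance (assumed numbers): three agents, one unit of resource.
thresholds = [0.5, 0.25, 0.75]
values = [0.5, 0.25, 1.0]
allocation, total = oracle_allocation(thresholds, values, budget=1.0)
print(allocation, total)
```

In the learning problem the abstract describes, the utilities are unknown, so a policy cannot run this search directly. Regret measures how much expected total utility it loses relative to such an oracle over the rounds.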

research
04/12/2021

Censored Semi-Bandits for Resource Allocation

We consider the problem of sequentially allocating resources in a censor...
research
12/16/2020

Learning-NUM: Network Utility Maximization with Unknown Utility Functions and Queueing Delay

Network Utility Maximization (NUM) studies the problems of allocating tr...
research
09/04/2019

Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback

In this paper, we study Censored Semi-Bandits, a novel variant of the se...
research
05/10/2021

Combinatorial Multi-armed Bandits for Resource Allocation

We study the sequential resource allocation problem where a decision mak...
research
07/12/2023

On Collaboration in Distributed Parameter Estimation with Resource Constraints

We study sensor/agent data collection and collaboration policies for par...
research
03/13/2020

Learning and Fairness in Energy Harvesting: A Maximin Multi-Armed Bandits Approach

Recent advances in wireless radio frequency (RF) energy harvesting allow...
research
05/11/2021

Resource Allocation for Smooth Streaming: Non-convexity and Bandits

User dissatisfaction due to buffering pauses during streaming is a signi...
