Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints

05/31/2022
by David Mguni, et al.

Many real-world settings involve costs for performing actions; transaction costs in financial systems and fuel costs are common examples. In these settings, acting at every time step quickly accumulates costs, leading to vastly suboptimal outcomes. Repeated acting also produces wear and tear and, ultimately, damage. Determining when to act is therefore crucial for achieving successful outcomes, yet the challenge of efficiently learning to behave optimally when actions incur costs that are bounded from below remains unresolved. In this paper, we introduce the Learnable Impulse Control Reinforcement Algorithm (LICRA), a reinforcement learning (RL) framework for learning to optimally select both when to act and which actions to take when actions incur costs. At the core of LICRA is a nested structure that combines RL with impulse control, a form of policy that learns to maximise objectives when actions incur costs. We prove that LICRA, which seamlessly adopts any RL method, converges to policies that optimally select when to perform actions and their optimal magnitudes. We then augment LICRA to handle problems in which the agent can perform at most k<∞ actions and, more generally, faces a budget constraint. We show that LICRA learns the optimal value function and ensures that budget constraints are satisfied almost surely. We demonstrate empirically LICRA's superior performance against benchmark RL methods in OpenAI Gym's Lunar Lander, in Highway environments, and in a variant of the Merton portfolio problem from finance.
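
To make the "when to act" versus "which action" split concrete, here is a minimal toy sketch in Python. It is not the LICRA algorithm: the two policies are hand-written rules rather than learned ones, and every name (`GatedAgent`, `should_act`, `choose_action`, `rollout`), the drift dynamics, and the cost values are assumptions chosen purely for illustration. It only shows why acting at every step can be worse than acting selectively when each action has a fixed cost, and how a budget caps the number of actions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class GatedAgent:
    """Toy agent with a hypothetical nested structure (not the paper's method)."""
    should_act: Callable[[float], bool]      # "when to act" rule (hand-written here)
    choose_action: Callable[[float], float]  # "which action" rule (hand-written here)
    cost_per_action: float                   # fixed cost paid whenever the agent acts
    budget: float = float("inf")             # total cost the agent may spend

    def step(self, state: float) -> Optional[float]:
        # Act only if the gate fires AND the remaining budget covers the cost.
        affordable = self.budget >= self.cost_per_action
        if affordable and self.should_act(state):
            self.budget -= self.cost_per_action
            return self.choose_action(state)
        return None  # no intervention this step


def rollout(agent: GatedAgent, steps: int = 10, drift: float = 1.0) -> Tuple[float, int]:
    """Assumed toy dynamics: the state drifts away from 0 and is penalised by |state|."""
    state, total_reward, n_actions = 0.0, 0.0, 0
    for _ in range(steps):
        state += drift
        action = agent.step(state)
        if action is not None:
            state += action
            total_reward -= agent.cost_per_action
            n_actions += 1
        total_reward -= abs(state)
    return total_reward, n_actions


def reset_to_zero(state: float) -> float:
    return -state  # action magnitude that returns the state to 0


if __name__ == "__main__":
    always = GatedAgent(lambda s: True, reset_to_zero, cost_per_action=3.0)
    selective = GatedAgent(lambda s: abs(s) >= 3.0, reset_to_zero, cost_per_action=3.0)
    budgeted = GatedAgent(lambda s: abs(s) >= 3.0, reset_to_zero,
                          cost_per_action=3.0, budget=6.0)  # at most 2 actions
    for name, agent in [("always", always), ("selective", selective), ("budgeted", budgeted)]:
        print(name, rollout(agent))
```

In this toy setting the agent that acts at every step pays the action cost ten times, while the thresholded agent tolerates some drift and acts only three times for a better total. The budgeted variant stops acting once its budget is spent, which mirrors the "at most k actions" constraint only in spirit.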


research
06/02/2023

Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations

In real-world reinforcement learning (RL) systems, various forms of impa...
research
11/11/2019

Driving Reinforcement Learning with Models

Over the years, Reinforcement Learning (RL) established itself as a conv...
research
10/06/2021

Nested Policy Reinforcement Learning

Off-policy reinforcement learning (RL) has proven to be a powerful frame...
research
03/16/2022

Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act

Traditionally, Reinforcement Learning (RL) aims at deciding how to act o...
research
07/16/2022

BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion

We utilize an offline reinforcement learning (RL) model for sequential t...
research
11/22/2020

Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems

Typical reinforcement learning (RL) methods show limited applicability f...
research
06/20/2020

From Predictions to Decisions: Using Lookahead Regularization

Machine learning is a powerful tool for predicting human-related outcome...
