Hedging using reinforcement learning: Contextual k-Armed Bandit versus Q-learning

07/03/2020
by   Loris Cannelli, et al.

The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, as in the model of Black, Scholes and Merton, is not only unrealistic but also undesirable because of high transaction costs. Over the last decades, stochastic optimal-control methods have been developed to balance effective replication against losses. More recently, with the rise of artificial intelligence, temporal-difference Reinforcement Learning, in particular variations of Q-learning in conjunction with Deep Neural Networks, has attracted significant interest. From a practical point of view, however, such methods are often relatively sample inefficient, hard to train, and lacking in performance guarantees. This motivates the investigation of a stable benchmark algorithm for hedging. In this article, the hedging problem is viewed as an instance of a risk-averse contextual k-armed bandit problem, for which a large body of theoretical results and well-studied algorithms is available. We find that the k-armed bandit model naturally fits the P&L formulation of hedging, providing a more accurate and sample-efficient approach than Q-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks.
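
To make the bandit view of hedging concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a single short call hedged over one weekly step, a zero interest rate, three moneyness buckets as contexts, eleven candidate hedge ratios as arms, an epsilon-greedy rule, and a reward of the form P&L minus a penalty on squared P&L. All names (`bs_call`, `run_bandit`, `ARMS`, `CONTEXTS`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random

ARMS = [i / 10 for i in range(11)]   # candidate hedge ratios (the k arms), assumed grid
CONTEXTS = [0.9, 1.0, 1.1]           # spot/strike moneyness buckets, assumed contexts


def bs_call(s, k, sigma, tau):
    """Black-Scholes call price and delta, assuming a zero interest rate."""
    if tau <= 0:
        return max(s - k, 0.0), (1.0 if s > k else 0.0)
    d1 = (math.log(s / k) + 0.5 * sigma ** 2 * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s * cdf(d1) - k * cdf(d2), cdf(d1)


def run_bandit(rounds=30000, eps=0.1, cost=0.0, risk=1.0, seed=0):
    """Epsilon-greedy contextual bandit over hedge ratios (illustrative only)."""
    rng = random.Random(seed)
    k, sigma, tau, dt = 100.0, 0.2, 0.25, 1.0 / 52.0
    value = {(c, a): 0.0 for c in CONTEXTS for a in ARMS}  # running mean reward
    count = {(c, a): 0 for c in CONTEXTS for a in ARMS}
    for _ in range(rounds):
        c = rng.choice(CONTEXTS)          # observe the context
        s0 = c * k
        if rng.random() < eps:            # explore
            a = rng.choice(ARMS)
        else:                             # exploit the current estimate
            a = max(ARMS, key=lambda x: value[(c, x)])
        # Simulate one step of geometric Brownian motion (zero drift).
        z = rng.gauss(0.0, 1.0)
        s1 = s0 * math.exp(-0.5 * sigma ** 2 * dt + sigma * math.sqrt(dt) * z)
        c0, _ = bs_call(s0, k, sigma, tau)
        c1, _ = bs_call(s1, k, sigma, tau - dt)
        # P&L of short call plus a shares, minus a proportional cost (assumed form).
        pnl = a * (s1 - s0) - (c1 - c0) - cost * a * s0
        reward = pnl - risk * pnl ** 2    # assumed risk-averse reward
        count[(c, a)] += 1
        value[(c, a)] += (reward - value[(c, a)]) / count[(c, a)]
    # Greedy hedge ratio per context after training.
    return {c: max(ARMS, key=lambda a: value[(c, a)]) for c in CONTEXTS}


if __name__ == "__main__":
    policy = run_bandit()
    for c in CONTEXTS:
        _, delta = bs_call(c * 100.0, 100.0, 0.2, 0.25)
        print(f"moneyness {c}: learned ratio {policy[c]}, BS delta {delta:.3f}")
```

With `cost=0.0` the learned ratios can be compared against the Black-Scholes delta printed alongside them. Raising `cost` is one way to explore how a cost term shifts the preferred arm under these assumptions.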


research
06/02/2022

A Confirmation of a Conjecture on the Feldman's Two-armed Bandit Problem

Myopic strategy is one of the most important strategies when studying ba...
research
10/23/2021

Multi-armed Bandit Algorithm against Strategic Replication

We consider a multi-armed bandit problem in which a set of arms is regis...
research
08/15/2019

Exponential two-armed bandit problem

We consider exponential two-armed bandit problem in which incomes are de...
research
09/10/2017

Variational inference for the multi-armed contextual bandit

In many biomedical, science, and engineering problems, one must sequenti...
research
05/15/2018

Graph Signal Sampling via Reinforcement Learning

We formulate the problem of sampling and recovering clustered graph sign...
research
12/14/2018

Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function

As reinforcement learning algorithms are being applied to increasingly c...
research
01/25/2019

Gaussian One-Armed Bandit and Optimization of Batch Data Processing

We consider the minimax setup for Gaussian one-armed bandit problem, i.e...
