Hedging using reinforcement learning: Contextual k-Armed Bandit versus Q-learning

by   Loris Cannelli, et al.

The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton, is not only unrealistic but it is also undesirable due to high transaction costs. Over the last decades stochastic optimal-control methods have been developed to balance between effective replication and losses. More recently, with the rise of artificial intelligence, temporal-difference Reinforcement Learning, in particular variations of Q-learning in conjunction with Deep Neural Networks, have attracted significant interest. From a practical point of view, however, such methods are often relatively sample inefficient, hard to train and lack performance guarantees. This motivates the investigation of a stable benchmark algorithm for hedging. In this article, the hedging problem is viewed as an instance of a risk-averse contextual k-armed bandit problem, for which a large body of theoretical results and well-studied algorithms are available. We find that the k-armed bandit model naturally fits to the P&L formulation of hedging, providing for a more accurate and sample efficient approach than Q-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks.


page 1

page 2

page 3

page 4


A Confirmation of a Conjecture on the Feldman's Two-armed Bandit Problem

Myopic strategy is one of the most important strategies when studying ba...

Multi-armed Bandit Algorithm against Strategic Replication

We consider a multi-armed bandit problem in which a set of arms is regis...

Exponential two-armed bandit problem

We consider exponential two-armed bandit problem in which incomes are de...

Variational inference for the multi-armed contextual bandit

In many biomedical, science, and engineering problems, one must sequenti...

Graph Signal Sampling via Reinforcement Learning

We formulate the problem of sampling and recovering clustered graph sign...

Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function

As reinforcement learning algorithms are being applied to increasingly c...

Gaussian One-Armed Bandit and Optimization of Batch Data Processing

We consider the minimax setup for Gaussian one-armed bandit problem, i.e...

Please sign up or login with your details

Forgot password? Click here to reset