QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds

by Igor Halperin, et al.

This paper presents a discrete-time option pricing model rooted in Reinforcement Learning (RL), and more specifically in the well-known Q-Learning method of RL. We construct a risk-adjusted Markov Decision Process for a discrete-time version of the classical Black-Scholes-Merton (BSM) model, in which the option price is an optimal Q-function. Pricing is done by learning to dynamically optimize risk-adjusted returns for an option replicating portfolio, as in Markowitz portfolio theory. Once formulated in a parametric setting, the model can go model-free by using Q-Learning and related methods: it learns to price and hedge an option directly from data generated by a dynamic replicating portfolio that is rebalanced at discrete times. If the world follows BSM dynamics, our risk-averse Q-Learner converges, given enough training data, to the true BSM price and hedge ratio of the option in the continuous-time limit. Because Q-Learning is an off-policy algorithm, this holds even if the hedges applied at the data-generation stage are completely random (i.e. it can learn the BSM model itself, too!). If the world differs from a BSM world, the Q-Learner will discover this as well, because Q-Learning is a model-free algorithm. For finite time steps, the Q-Learner can efficiently calculate both the optimal hedge and the optimal price of the option directly from trading data, without an explicit model of the world. This suggests that RL may provide efficient data-driven and model-free methods for optimal pricing and hedging of options once we depart from the academic continuous-time limit. Conversely, option pricing methods developed in Mathematical Finance may be viewed as special cases of model-based Reinforcement Learning. Our model needs only basic linear algebra (plus Monte Carlo simulation, if we work with synthetic data).
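To make the "basic linear algebra plus Monte Carlo simulation" remark concrete, here is a minimal Python sketch of pricing and hedging a European call from simulated paths by backward least-squares regression. It is not the paper's algorithm. It assumes a plain variance-minimising hedge with no risk-aversion term, a cubic polynomial basis, geometric Brownian motion data, and hypothetical function names and parameter values (`simulate_paths`, `hedge_by_regression`, drift 5%, rate 3%, volatility 15%, 24 rebalancing dates). The closed-form BSM price and delta serve only as a benchmark.

```python
import math
import numpy as np


def bsm_call(s0, k, r, sigma, t):
    """Closed-form BSM call price and delta, used only as a benchmark."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2), cdf(d1)


def simulate_paths(s0, mu, sigma, t, n_steps, n_paths, seed=0):
    """Synthetic stock paths from geometric Brownian motion with drift mu."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_increments = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
    log_paths = np.cumsum(log_increments, axis=1)
    # Prepend a zero column so every path starts at s0.
    return s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))


def hedge_by_regression(paths, k, r, t, degree=3):
    """Backward recursion for a replicating portfolio of a European call.

    At each rebalancing date the next-step portfolio value is regressed on
    [basis(S_t), basis(S_t) * dS_t]; the second group of coefficients gives
    a variance-minimising hedge as a polynomial in the stock price.
    Returns the average portfolio value at time 0 and the time-0 hedge.
    """
    n_steps = paths.shape[1] - 1
    dt = t / n_steps
    growth = math.exp(r * dt)
    portfolio = np.maximum(paths[:, -1] - k, 0.0)  # terminal call payoff
    hedge = None
    for step in range(n_steps - 1, -1, -1):
        s_now = paths[:, step]
        ds = paths[:, step + 1] - growth * s_now  # excess stock move
        x = s_now / paths[0, 0] - 1.0  # rescaled state for conditioning
        basis = np.vander(x, degree + 1, increasing=True)
        design = np.hstack([basis, basis * ds[:, None]])
        # lstsq returns a minimum-norm solution, so the rank-deficient
        # design at step 0 (all paths share one price) is handled.
        coef, *_ = np.linalg.lstsq(design, portfolio, rcond=None)
        hedge = basis @ coef[degree + 1:]
        # Self-financing roll-back of the portfolio value by one step.
        portfolio = (portfolio - hedge * ds) / growth
    return portfolio.mean(), hedge.mean()


if __name__ == "__main__":
    paths = simulate_paths(100.0, 0.05, 0.15, 1.0, n_steps=24, n_paths=20000)
    price, hedge0 = hedge_by_regression(paths, k=100.0, r=0.03, t=1.0)
    bsm_price, bsm_delta = bsm_call(100.0, 100.0, 0.03, 0.15, 1.0)
    print(f"regression price {price:.3f} vs BSM {bsm_price:.3f}")
    print(f"regression hedge {hedge0:.3f} vs BSM delta {bsm_delta:.3f}")
```

The regression sees only simulated prices and the payoff, never the BSM formula, yet with these settings the estimated price and initial hedge should land near the BSM values. Increasing `n_steps` moves the setup toward the continuous-time limit discussed in the abstract.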




