The QLBS Q-Learner Goes NuQLear: Fitted Q Iteration, Inverse RL, and Option Portfolios

by Igor Halperin, et al.

The QLBS model is a discrete-time option hedging and pricing model based on Dynamic Programming (DP) and Reinforcement Learning (RL). It combines the well-known Q-Learning method for RL with the Black-Scholes(-Merton) model's idea of reducing option pricing and hedging to the optimal rebalancing of a dynamic replicating portfolio for the option, made up of a stock and cash. Here we expand on several NuQLear (Numerical Q-Learning) topics within the QLBS model. First, we investigate the performance of Fitted Q Iteration as an RL (data-driven) solution to the model, and benchmark it against a DP (model-based) solution, as well as against the Black-Scholes-Merton (BSM) model. Second, we develop an Inverse Reinforcement Learning (IRL) setting for the model, in which we observe only the prices and the actions (re-hedges) taken by a trader, but not the rewards. Third, we outline how the QLBS model can be used to price portfolios of options, rather than a single option in isolation, thus providing its own data-driven and model-independent solution to the (in)famous volatility smile problem of the Black-Scholes model.
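For readers unfamiliar with Fitted Q Iteration, the following is a minimal sketch of the generic batch procedure on a toy problem. It is not the QLBS model and not the authors' implementation. The toy environment (a point on a line that steps left or right and is penalized for its squared distance from zero), the quadratic basis, and all names (`features`, `q_values`, `fitted_q_iteration`, `greedy`, `STEP`, `GAMMA`) are assumptions made for illustration only. The sketch shows the core idea: repeatedly regress Bellman targets, built from a fixed batch of observed transitions, onto basis functions of state and action.

```python
import numpy as np

N_ACTIONS = 2   # 0 = step left, 1 = step right (toy problem, not QLBS)
STEP = 0.1      # hypothetical step size of the toy dynamics
GAMMA = 0.9     # hypothetical discount factor


def features(s, a):
    """Quadratic basis in the state, one block of 3 weights per action."""
    s = np.asarray(s, dtype=float)
    a = np.asarray(a, dtype=int)
    base = np.column_stack([np.ones_like(s), s, s ** 2])
    phi = np.zeros((s.size, 3 * N_ACTIONS))
    for k in range(N_ACTIONS):
        mask = a == k
        phi[mask, 3 * k:3 * k + 3] = base[mask]
    return phi


def q_values(w, s):
    """Q(s, a) for every action; result has shape (len(s), N_ACTIONS)."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return np.column_stack(
        [features(s, np.full(s.size, k)) @ w for k in range(N_ACTIONS)]
    )


def fitted_q_iteration(s, a, r, s_next, n_iter=5):
    """Batch FQI: regress Bellman targets on the features, n_iter times."""
    phi = features(s, a)
    w = np.zeros(phi.shape[1])
    for _ in range(n_iter):
        # Bellman target uses the current Q estimate at the next states.
        target = r + GAMMA * q_values(w, s_next).max(axis=1)
        w, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return w


# A fixed batch of transitions collected with uniformly random actions.
rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=2000)
a = rng.integers(0, N_ACTIONS, size=2000)
s_next = s + np.where(a == 1, STEP, -STEP)
r = -s_next ** 2   # reward: negative squared distance from zero

w = fitted_q_iteration(s, a, r, s_next)


def greedy(state):
    """Greedy action under the fitted Q-function."""
    return int(np.argmax(q_values(w, state)[0]))
```

In this toy setting the learned greedy policy should step toward zero, for example `greedy(0.5)` picks the left step. The learner never queries the dynamics directly; it uses only the recorded tuples of state, action, reward, and next state, which is what makes the approach data-driven.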



QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds

This paper presents a discrete-time option pricing model that is rooted ...

About subordinated generalizations of 3 classical models of option pricing

In this paper, we investigate the relation between Bachelier and Black-S...

Deep Reinforcement Learning for Optimal Stopping with Application in Financial Engineering

Optimal stopping is the problem of deciding the right time at which to t...

RLOP: RL Methods in Option Pricing from a Mathematical Perspective

Abstract In this work, we build two environments, namely the modified QL...

Machine learning for option pricing: an empirical investigation of network architectures

We consider the supervised learning problem of learning the price of an ...

Reinforcement learning for options on target volatility funds

In this work we deal with the funding costs rising from hedging the risk...

Quantile LASSO with changepoints in panel data models applied to option pricing

Panel data are modern statistical tools which are commonly used in all k...
