Off-Policy Optimization of Portfolio Allocation Policies under Constraints

12/21/2020
by Nymisha Bandi, et al.

The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk. We motivate the problem of finding an allocation policy within a sequential decision-making framework and study the effects of (a) using data collected under previously employed policies, which may be sub-optimal and constraint-violating, and (b) imposing desired constraints while computing near-optimal policies from this data. Our framework relies on solving a minimax objective, in which one player evaluates policies via off-policy estimators and the opponent uses an online learning strategy to control constraint violations. We extensively investigate various choices for off-policy estimation and their corresponding optimization sub-routines, and quantify their impact on computing constraint-aware allocation policies. Our study shows promising results for constructing such policies when back-tested on historical equities data under various regimes of operation, dimensionality, and constraints.
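To make the two-player structure described above concrete, here is a minimal toy sketch. It is not the authors' algorithm. It assumes a single-step setting with two hypothetical actions (a "safe" and a "risky" asset), a plain importance-sampling estimator as the off-policy evaluator, a best-responding policy player, and projected online gradient ascent for the opponent's Lagrange multiplier. All names (`is_estimate`, `solve_minimax`, `tau`, `eta`, `lam_max`) and all data values are illustrative assumptions.

```python
# Toy sketch only: names, data, and the two-asset setup are hypothetical
# and are not taken from the paper.

def is_estimate(logs, pi, mu, key):
    """Importance-sampling estimate of the mean of `key` under policy `pi`,
    computed from logs collected under the behaviour policy `mu`."""
    total = sum(pi[a] / mu[a] * rec[key] for a, rec in logs)
    return total / len(logs)


def solve_minimax(logs, mu, candidates, tau, steps=2000, eta=0.1, lam_max=10.0):
    """Policy player best-responds to the current multiplier; the opponent
    runs projected online gradient ascent on the estimated violation."""
    # The off-policy estimates per candidate do not change, so compute once.
    values = [is_estimate(logs, pi, mu, "reward") for pi in candidates]
    costs = [is_estimate(logs, pi, mu, "cost") for pi in candidates]
    lam = 0.0
    counts = [0] * len(candidates)
    for _ in range(steps):
        # Policy player: maximise the estimated Lagrangian for this lam.
        scores = [v - lam * (c - tau) for v, c in zip(values, costs)]
        best = max(range(len(candidates)), key=scores.__getitem__)
        counts[best] += 1
        # Opponent: raise lam on violation, lower it otherwise, clip to [0, lam_max].
        lam = min(lam_max, max(0.0, lam + eta * (costs[best] - tau)))
    # Mixture over the policies played; its estimated cost is the average cost.
    weights = [n / steps for n in counts]
    return weights, lam, values, costs


# Logged data from a behaviour policy that mostly picked the safe asset.
mu = [0.8, 0.2]
logs = ([(0, {"reward": 1.0, "cost": 0.0})] * 80
        + [(1, {"reward": 2.0, "cost": 1.0})] * 20)
candidates = [[1.0, 0.0], [0.0, 1.0]]  # always-safe, always-risky

weights, lam, values, costs = solve_minimax(logs, mu, candidates, tau=0.5)
mix_cost = sum(w * c for w, c in zip(weights, costs))
print("mixture weights:", weights)
print("final multiplier:", lam)
print("estimated cost of mixture:", mix_cost)
```

In this toy setup the multiplier climbs while the risky policy is the best response, then hovers around the point where the two candidates tie, so the estimated cost of the resulting mixture should settle close to the threshold `tau`. A sequential, multi-asset version would need a different estimator and policy class. The sketch only illustrates how an off-policy evaluator and an online-learning opponent can interact.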


