Trading algorithms with learning in latent alpha models

by Philippe Casgrain, et al.

Alpha signals for statistical arbitrage strategies are often driven by latent factors. This paper analyses how to optimally trade with latent factors that cause prices to jump and diffuse. Moreover, we account for the effect of the trader's actions on quoted prices and the prices they receive from trading. Under fairly general assumptions, we demonstrate how the trader can learn the posterior distribution over the latent states, and explicitly solve the latent optimal trading problem. We provide a verification theorem, and a methodology for calibrating the model by deriving a variation of the expectation-maximization algorithm. To illustrate the efficacy of the optimal strategy, we demonstrate its performance through simulations and compare it to strategies which ignore learning in the latent factors. We also provide calibration results for a particular model using Intel Corporation stock as an example.







1 Introduction

The phrase “All models are wrong, but some are useful” (Box, 1978) rings true across all areas in finance, and intraday trading is no exception. If an investor wishes to efficiently trade assets, she must use a strategy that can anticipate the asset’s price trajectory while simultaneously being mindful of the flaws in her model, as well as the costs borne from transaction fees and her own impact on prices. With all of the complexities in intraday markets, it is no surprise that strategies differ substantially based on what assumptions are made about asset price dynamics. Trading with an incorrect model can be very costly to an investor, and therefore being able to mitigate model risk is valuable.

The availability of information at very high frequencies can help a trader partially overcome the problem of model selection. The information provided by realized trajectories of the asset price and the incoming flow of orders from other traders allows her to infer which model best fits the observed data, and in turn she may use it to predict future movements in asset prices. Ideally, the trader should be able to incorporate this information in an on-line manner. In other words, the trader should continuously update her model as she observes new information, keeping in mind that the market may switch between a number of regimes over the course of the trading period. Furthermore, the trader would like to have some means of incorporating a priori knowledge about markets into her trading strategy before beginning to trade.

This paper studies the optimal trading strategy for a single asset when there are latent alpha components to the asset price dynamics, and where the trader uses price information to learn about the latent factor. Prices can diffuse as well as jump. The trader’s goal is to optimally trade subject to this model uncertainty, and end the trading horizon with zero inventory. By treating the trader’s problem as a continuous time control problem where information is partially obscured, we succeed in obtaining a closed form strategy, up to the computation of an expectation that is specific to the trader’s prior assumptions on the model dynamics. The optimal trading strategy we find can be computed with ease for a wide variety of models, and we demonstrate its performance by comparing it, in simulation, with approaches that do not make use of learning.

Early works on partial information include Detemple (1986) and Detemple (1991), who studies optimal technology investment problems (where the states that drive production are obfuscated by Gaussian noise); Gennotte (1986), who studies the optimal portfolio allocation problem when returns are hidden but satisfy an Ornstein-Uhlenbeck process; Dothan and Feldman (1986), who analyze a production and exchange economy with a single unobservable source of nondiversifiable risk; Karatzas and Xue (1991), who study utility maximization under partial observations; Bäuerle and Rieder (2005), Bäuerle and Rieder (2007) and Frey et al. (2012), who study model uncertainty in the context of portfolio optimization and the optimal allocation of assets; and Papanicolaou (2018), who studies an optimal portfolio allocation problem where the drifts of the assets are latent Ito diffusions.

There are a few recent papers on partial information that are related to this study. Ekstrom and Vaicenavicius (2016) investigate the optimal timing problem associated with liquidating a single unit of an asset when the asset price is a geometric Brownian motion with random (unobserved) drift. Colaneri et al. (2016) study the optimal liquidation problem when the asset midprice is driven by a Poisson random measure with unknown mean-measure. Gârleanu and Pedersen (2013) study the optimal trading strategy for maximizing the discounted, and penalized, future expected excess returns in a discrete-time, infinite-time horizon problem. In their model, prices contain an unpredictable martingale component, and an independent stationary (visible) predictable component – the alpha component. Guéant and Pu (2016) study models in which the drift of the asset price process is a latent random variable in an optimal portfolio selection setting, for an investor who seeks to maximize a CARA or CRRA objective function, as well as in the case of optimal liquidation in an Almgren-Chriss like setting.

The approach we take differs in several ways from the extant literature, but the two key generalizations are: (i) we account for quite general latent factors which drive the drift and jump components in the asset’s midprice; and (ii) we include both temporary and permanent impact that the agent’s trading has on the market.

The structure of the remainder of this paper is as follows. Section 2 outlines our modelling assumptions and states the optimization problem with partial information that the trader wishes to solve. Section 3 provides the filter which the trader uses to make proper inference on the underlying model driving the data she is observing. Section 4 shows that the original optimization problem presented in Section 2 can be simplified to an optimization problem with complete information using the filter presented in Section 3. Section 5 shows how to solve the reduced optimization problem from Section 4 and verifies that the resulting strategy indeed solves the original optimization problem. Lastly, Section 6 provides some numerical examples by applying the theory to a few specific models, and compares the resulting strategy, using simulations, to an alternative which does not learn from price dynamics.

2 The Model

We work on the filtered probability space

, where , and finite, is some fixed time horizon. The filtration is the natural one generated by the paths of the un-impacted asset midprice process , the counting processes for the number of buy and sell market orders which cause price changes, denoted and , and a latent process . The exact nature of these processes will be provided in more detail in the remainder of the section.

The trader’s optimization problem is to decide on a dynamic trading strategy to buy/sell an asset over the course of a trading horizon to maximize some performance criteria. We assume the trader executes orders continuously at a (controlled) rate denoted by . The trader’s inventory, given some strategy , is denoted , with the initial condition . may be zero, positive (a long position), or negative (a short position) – and, hence, the inventory at time can be written as


The above can be interpreted as the investor purchasing shares over the period . A positive (negative) value for represents the trader buying (selling) the asset. The rate at which the investor buys or sells the asset affects prices through two mechanisms. Firstly, a temporary price impact, which is effectively a transaction cost that increases with increasing trading rate. Secondly, a permanent impact, which incorporates the fact that when there are excess buy orders, prices move up, and excess sell orders, prices move down.

We further assume that other market participants also have a permanent impact on the asset midprice through their own buy and sell market orders (MOs). To model this, we let be doubly stochastic Poisson processes with respective intensity processes , which count the number of market orders that cause prices to move. In the remainder of the paper, we write .

2.1 Asset Midprice Dynamics

To model the permanent price impact of trades, we define two processes and to represent the asset midprice and the asset midprice without the trader’s impact, respectively. As shown by Cartea and Jaimungal (2016), intraday permanent price impact (over short time scales) is well approximated by a linear model. Hence, we write


where controls the strength of the trader’s impact on the asset midprice. Alternatively, one could write this as a pure jump model

where are controlled doubly stochastic Poisson processes with –intensities and , respectively. The results will be identical to that obtained using the continuous model above.

We assume the investor does not have complete knowledge of the dynamics of the asset midprice, nor the rates of arrival of market orders. This uncertainty is modeled by assuming there is a latent continuous time Markov Chain

(with and ), which modulates the dynamics of state variables, but is not observable by the trader. The latent process is assumed to have a known generator matrix (see footnote 3) and the trader places a prior , , on the initial state of the latent process, all estimated, e.g., by the EM algorithm (see Section 7.1 for details).

Footnote 3: The generator matrix of a -state continuous time Markov chain has non-diagonal entries if and diagonal entries . is defined so that , where is element of the matrix exponential of .
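As a rough illustration of how a generator matrix determines the transition probabilities of a continuous time Markov chain through the matrix exponential, the sketch below computes exp(t C) for a small chain. The function names, the two-state example rates, and the scaling-and-squaring Taylor approximation are assumptions chosen for illustration; none of them is taken from the paper.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(generator, t, squarings=10, terms=12):
    """Approximate exp(t * generator) by scaling and squaring.

    generator: square matrix with non-negative off-diagonal rates and rows
               summing to zero (hypothetical input format).
    t:         elapsed time.
    """
    n = len(generator)
    h = t / 2 ** squarings  # small time step used for the Taylor series
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # term becomes (h * generator)^k / k!
        term = mat_mul(term, [[h * x / k for x in row] for row in generator])
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        # exp(2h * C) = exp(h * C) squared
        result = mat_mul(result, result)
    return result


# Example: a two-state chain that leaves state 0 at rate 1 and state 1 at rate 2.
example_generator = [[-1.0, 1.0], [2.0, -2.0]]
half_hour_transitions = transition_matrix(example_generator, 0.5)
```

Each row of the result is a probability distribution over the state reached after time t, given the starting state.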

Conditional on a path of , the unaffected midprice is assumed to satisfy the SDE

where have –intensities


and is a -Brownian Motion. Moreover, we assume that each of the are –adapted processes, where is the natural filtration generated by the paths of the processes (note that can be inferred from this filtration, and strategies are therefore also adapted to the paths of ). Furthermore, we assume is a –adapted Markov process, where . The Markov assumption will, after modifying the problem to deal with partial information, allow a dynamic programming principle (DPP) and result in a dynamic programming equation (DPE). We assume that either (i) or (ii) and , to prevent cases where the model is driven by a counting process but also has a continuous drift. In case (ii), the asset price may indeed drift, but the drift will be due to imbalance in intensities so that prices remain on a discrete grid. To compress notation, we define the process where as well as the processes where for each . Finally we make the technical assumption that


This class of intensity models contains, among many others, deterministic intensities, shot-noise processes, and cross-exciting Hawkes processes with finite-dimensional Markov representations (see footnote 4), all modulated by the latent factor(s). We provide some explicit examples in Section 6 where we also conduct numerical experiments.

Footnote 4: To achieve this, we may extend to include the state variables necessary for the model to be Markov.

The random variable indexes the possible models for the asset’s drift and the rates at which other market participants’ market orders arrive. Because is (potentially) stochastic, it may change over time, hence, so will the underlying model. Furthermore, because is invisible to the investor, to make intelligent trading decisions, the investor must infer from observations what is the current (and future) underlying model driving asset prices.

2.2 Cash Process

The price at which the trader either buys or sells each unit of the asset will be denoted as . Because there is limited liquidity at the best bid or ask price (the touch), the investor must “walk the book” starting at the bid (ask) and buy (sell) her assets at higher (lower) prices as she increases the size of each of her market orders. For tractability, and as Frei and Westray (2015) (among others) note, a linear model for this ‘temporary price impact’ fits the data well, and adding in concavity, while empirically more accurate, does not improve the beyond . Hence, here we adopt a linear temporary price impact model and write the execution price as


where controls the asset’s liquidity, and hence the impact of trades.

The investor’s cash process, i.e., the accumulated funds from trading for some fixed strategy , is denoted , and is given by


2.3 Objective Criterion

Over the course of the trading window , the trader wishes to find a trading strategy which maximizes the objective criterion


where is the set of admissible trading strategies, here consisting of the collection of all –predictable processes such that .

The objective criterion (2.7) consists of three different parts. The first is , which represents the amount of cash the trader has accumulated from her trading over the period . Next is the amount of cash received from liquidating all remaining exposure at the end of the trading horizon. The value (per share) of liquidating these shares is penalized by an amount , where . The amount represents the liquidity penalty taken by the trader if she chooses to sell or buy an amount of assets all at once. We eventually take the limit to ensure that the trader ends with zero inventory. The last term represents a running penalty that penalizes the trader for having a non-zero inventory throughout the trading horizon, and allows her to control her exposure. This penalty can also be interpreted as the quadratic variation of the book-value of the trader’s position (ignoring jumps in the asset price), or can be seen as stemming from model uncertainty as shown in Cartea et al. (2017).
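To make the three parts of the criterion concrete, the following sketch evaluates a left-point time discretization of one realized path: cash from trading at an execution price with linear temporary impact, the penalized terminal liquidation value, and the running inventory penalty. All parameter names (`a` for temporary impact, `beta` for permanent impact, `alpha` for the terminal penalty, `phi` for the running penalty) are labels assumed here for illustration, and the function evaluates a single path rather than the expectation in (2.7).

```python
def realized_performance(unaffected_prices, speeds, dt, q0,
                         a, beta, alpha, phi):
    """Discretized cash + terminal liquidation value - running penalty.

    unaffected_prices: midprice without the trader's impact at each grid
                       point (one more entry than `speeds`).
    speeds:            trading rate on each interval; positive means buying.
    """
    cash = 0.0
    inventory = q0
    net_bought = 0.0   # cumulative shares bought, drives permanent impact
    penalty = 0.0
    for price, nu in zip(unaffected_prices[:-1], speeds):
        mid = price + beta * net_bought      # linear permanent impact
        execution_price = mid + a * nu       # linear temporary impact
        cash -= execution_price * nu * dt    # buying costs cash, selling earns it
        penalty += phi * inventory ** 2 * dt
        inventory += nu * dt
        net_bought += nu * dt
    terminal_mid = unaffected_prices[-1] + beta * net_bought
    # Remaining shares are liquidated at the midprice less a per-share penalty.
    terminal_value = inventory * (terminal_mid - alpha * inventory)
    return cash + terminal_value - penalty


# Example: sell 10 shares at a constant rate over two half-hour steps.
value = realized_performance([100.0, 100.0, 100.0], [-10.0, -10.0], 0.5,
                             q0=10.0, a=0.1, beta=0.0, alpha=0.0, phi=0.0)
```

Averaging this quantity over many simulated paths gives a Monte Carlo estimate of the criterion for a fixed strategy.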

Note that we take trading strategies to be –predictable. –predictability ensures that the trader does not have access to any information regarding the path of the process , which governs the model driving the asset midprice drift and the intensities of . In addition, –predictability prevents the trader from foreseeing a jump occurring at the same instant in time – in other words her decisions are based on the left limits of , and hence also . Because admissible controls are –predictable, and not –predictable (the full filtration), maximizing (2.7) is a control problem with partial information.

Solving control problems with partial information is very difficult to do directly, because most tools that are used in the case of complete information no longer apply. The partial-information case requires an indirect approach in which, firstly, we find an alternate –adapted representation for the dynamics of the state variable process, and secondly, we extend the state variable process so that it becomes Markov when using Markov controls. The key step in this approach is to find the best guess for conditional on the reduced filtration available at that time.

3 Filtering

Because the investor cannot observe , she wishes to formulate a best guess for its value. The best possible guess for the distribution of will be the distribution of conditional on the information accumulated up until that time. Therefore, she wishes to compute

The filter process is –adapted with initial condition . It represents the posterior latent state distribution (given all information accumulated by the investor up until ).

Theorem 3.1.

Let us assume that the Novikov condition


holds. Then the filter admits a representation with components


where . If , for each , solves the SDE


with initial condition . If and , for each , solves the SDE


with the same initial condition.


Proof. See Appendix A. ∎

The process admits a simple closed form solution when . This corresponds to the case in which the latent regimes are constant over the trading period – in other words, the case of parameter uncertainty, where the model does not switch between regimes throughout the trading horizon. When , solutions to the filter can be approximated reasonably well for most purposes by using methods outlined in George et al. (2004), which will be discussed further in Section 6.

An SDE also exists for the normalized version of the filter ; however, for simplicity, we keep track of the processes , and define the function (with a slight abuse of notation) via


so that . This choice of mapping into guarantees that , even when numerically approximating (3.3).
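The practical point of tracking the unnormalized filter and normalizing afterwards can be sketched as follows. The code assumes the mapping is the usual normalization (each component divided by the sum of all components) and that the unnormalized components are stored as logarithms; both are assumptions made for this illustration, as is the function name.

```python
import math


def normalize_filter(log_weights):
    """Map log unnormalized filter components to posterior probabilities.

    Subtracting the largest log-weight before exponentiating avoids
    overflow; the output is non-negative and sums to one by construction,
    whatever numerical error the inputs carry.
    """
    shift = max(log_weights)
    weights = [math.exp(x - shift) for x in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Example: two latent states whose log-weights are far outside the range
# where exp() could be evaluated directly.
posterior = normalize_filter([1000.0, 1000.0 + math.log(3.0)])
```

Because the normalization is applied after each update, discretization error in the unnormalized components cannot push the posterior outside the probability simplex.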

4 -Dynamics Projection

In this section, we show there exists an –adapted representation for the price dynamics, and the intensity processes. The sequence of arguments resembles those found in (Bäuerle and Rieder, 2007, Section 3), adapted to the case where the observable process contains both jump and diffusive terms.

First, define the –adapted martingales to be the compensated versions of the Poisson processes , i.e.,


The theorem below provides the necessary ingredients to provide the –adapted representations of the state processes.

Theorem 4.1.

If , define the processes , by the following relations


where and are the filtered drift and intensities, defined as and . Then,

  (A) the process is an –adapted –Brownian motion;

  (B) the process is an –adapted –martingale;

  (C) and , –almost surely; and

  (D) are –adapted doubly stochastic Poisson processes with -intensities .

If and , define as in (4.2b). Then, (B) and (D) hold and , –almost surely.


Proof. See Appendix B. ∎

Theorem 4.1 tells us that , in addition to being viewed as a –adapted doubly stochastic Poisson process with –intensity of , can be viewed as an –adapted doubly stochastic Poisson process with -intensity . That is, is a doubly stochastic Poisson process with respect to both the and filtrations, but with differing intensities.

Theorem 4.1 allows us to represent the dynamics of in their –predictable form as


Let us also note that because and , because are –adapted, we may take a conditional expectation with respect to to yield that and . Therefore we may define the functions, and as


so that and .

Hence, the collection of processes are –adapted. The optimal control problem corresponding to maximizing (2.7), within the admissible set, can therefore be regarded as a problem with complete information with respect to the extended state variable process . The joint dynamics of this state process are all –adapted and do not depend on the process . Therefore, the dynamics of the extended state process are completely visible to the investor, which reduces the control problem with partial information, in which we did not know the dynamics of the state variables, into a control problem with full information.

In the next section, we solve this control problem by using the fact that the extended state variable dynamics are –adapted for each . Hence, the dynamic programming principle can be applied to the optimization problem (2.7) and we derive a dynamic programming equation for the new problem.

5 Solving the Dynamic Programming Problem

5.1 The Dynamic Programming Equation

Using the definitions for and in (2.2) and (2.1), we can write as


and, similarly, we can write


which allows to be defined independently of . Hence, the trader’s objective criterion (2.7) becomes


With given by (5.2), the trader’s objective function does not depend on the value of the process . For the remainder of this section, we will use the above definition for the trader’s objective criterion.

To optimize the objective criterion (5.3), we use the fact that , the -dimensional state variable process, is –adapted and, hence, has dynamics visible to the trader. First, let us define the functional


and the value function


where we use to represent the expected value given the initial conditions , where . The definition of implies that , where , is the objective criterion defined in equation (5.3). Furthermore, a control is optimal and solves the optimization problem described in Section 2.3 if it satisfies


Given the –adapted version of the dynamics of the state variables, for any Markov admissible control , there exists some function , such that . For such controls, the function must satisfy the Dynamic Programming Principle and the Dynamic Programming Equation (DPE) (see, e.g., (Pham, 2009, Chapter 3)) applies. The DPE for our specific problem suggests that satisfies the PDE


where is the infinitesimal generator for the state process using the predictable representation for the dynamics of and the intensity of , given a fixed control . Furthermore, the operator acts on functions , once differentiable in , twice differentiable in and all (componentwise) cross-derivatives, and once differentiable in , as follows

where is the infinitesimal generator of the process using its –predictable representation, which is independent of the control . This portion of the generator can be fairly generic because we have not specified the precise nature of the dynamics of the intensity processes – which is the impetus for separating this portion of the generator.

5.2 Dimensional Reduction

The Dynamic Programming Equation (5.7) can be simplified by introducing the ansatz

where for , we write . The PDE (5.7) then simplifies significantly to a PDE for ,


where the functions and are defined in equation (4.4). This PDE implies that the feedback control for this problem should be


In other words, the second line of the PDE (5.8) attains its supremum at defined above.

5.3 Solving the DPE

The ansatz provided above permits us to indeed find a solution to the PDE (5.7) which is presented in the proposition that follows.

Proposition 5.1 (Candidate Solution).

The PDE (5.7), admits the classical solution

where . Let denote expectation conditional on the initial conditions , and define the constants and . We have that
(i) if , then


(ii) if , then


where .


Proof. See Appendix C.1. ∎

For the remainder of the paper, we will concern ourselves with the case where , because in most applications the trader wishes to completely liquidate by the end of the trading horizon, and so , while is comparatively small.

The above proposition and equation (5.9) suggest that the optimal trading speed the investor should employ is


This optimal trading strategy is a combination of two terms: (i) the classical Almgren-Chriss (AC) liquidation strategy represented by ; and (ii) a term which adjusts the strategy based on expected future midprice movements, represented by . From the representation of in (5.10b) (or (5.11b)), this latter term is the weighted average of the expected future drift of the asset’s midprice. Therefore if, based on her current information, the trader believes that the asset midprice drift will remain largely positive for the remainder of the trading period, she will buy more of the asset relative to the AC strategy. This is reasonable, because she knows she will be able to sell the asset at a higher price once asset prices have risen. The exact opposite occurs when she expects the asset price drift to remain mostly negative over the rest of the trading period.

The result in (5.12) illustrates how the investor uses the filter for the posterior probability of what latent state is currently prevailing, to consistently update her strategy based on her predictions of the future path of the asset midprice. Moreover, the solution here closely resembles the result obtained by Cartea and Jaimungal (2016); however, it explicitly incorporates latent information and jumps in the asset price.

Computing the expectation appearing in directly is not easy. There is, however, an alternate representation of this expectation. For any , we have


where denotes expectation conditioning on the initial condition and . The alternative form on the right-hand side above is almost always easier to compute than a direct computation of the left-hand side.

Next, we provide a verification theorem showing that the candidate solution in Proposition 5.1 is exactly equal to the value function defined in equation (5.5).

Theorem 5.2 (Verification Theorem).

Suppose that is the solution to the PDE (5.8), and that . Let , where .
Then is equal to the value function defined in (5.5). Furthermore the control


is optimal and satisfies


Proof. See Appendix C.2. ∎

The theorem above guarantees that the control provided above indeed solves the optimization problem presented in Section 2.3. In retrospect, the optimal control to the trader’s optimization problem with partial information is a Markov control. The key steps were to introduce the predictable representation for the dynamics of the process , and to extend the original state process to include the unnormalized posterior distribution of the latent states .

5.4 Zero Terminal Inventory

A useful limiting case is when the trader is forced to eliminate her market exposure before time . This corresponds to taking the limit and the resulting optimal control simplifies to


A second interesting case is to additionally take the limit of no running inventory penalty, in which case the optimal strategy results in


This strategy corresponds to a time weighted average price (TWAP) strategy plus an adjustment for the weighted expected future drift of the asset’s midprice.
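As a hedged illustration of the baseline part of these limiting strategies, the sketch below implements the standard textbook Almgren-Chriss trading speed for zero terminal inventory and its TWAP limit as the running penalty vanishes. It omits the drift-adjustment term entirely, and the parameter names (`a` for temporary impact, `phi` for the running inventory penalty) are assumed labels; it is not a restoration of the paper's displayed formulas.

```python
import math


def ac_speed(inventory, time_left, a, phi):
    """Baseline trading speed that reaches zero inventory at the horizon.

    Positive values mean buying, so a long inventory gives a negative speed.
    With phi == 0 the rule reduces to TWAP: trade the remaining inventory
    evenly over the remaining time.
    """
    if phi == 0.0:
        return -inventory / time_left
    gamma = math.sqrt(phi / a)  # urgency implied by the running penalty
    return -gamma * inventory / math.tanh(gamma * time_left)


# Example: 10 shares left with 2 hours to go.
twap_rate = ac_speed(10.0, 2.0, a=0.1, phi=0.0)
urgent_rate = ac_speed(10.0, 2.0, a=0.1, phi=1.0)
```

A larger running penalty front-loads the liquidation, while the TWAP case spreads it uniformly; an alpha adjustment of the kind described above would be added to either baseline.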

All of the expressions above for the optimal control can be computed in closed form for a large variety of models. In the next section we provide two explicit, and useful, examples together with numerical experiments to illustrate the strategies’ dynamic behaviour.

6 Numerical Examples

In this section we carry out some numerical experiments to test the performance of the optimal trading algorithms developed in Section 5. The examples show how the optimal trading strategy performs under two model set-ups.

6.1 Mean-Reverting Diffusion

This section investigates the case where the trader wishes to liquidate her inventory before some specified time . The asset price is assumed to be a pure diffusive Ornstein-Uhlenbeck process – alternatively, one can think of this midprice as the value of a long-short position in a pairs trading strategy. The trader knows the volatility and rate of mean reversion, but does not know the level to which prices revert. In this example, the mean-reversion level will remain constant over the course of the trading period . More specifically, we assume that the asset midprice in USD has the dynamics


where is a random variable taking values in the set with probabilities . It remains constant over time but its value is hidden from the trader. This model does not contain any jumps so we can ignore the variables and .

As mentioned in Section 3, there exists an exact closed form for the filter when is constant in time. For the regime switching OU model in (6.1), the exact solution for the un-normalized filter is


Because, in practice, is observed only discretely, the integrals above are approximated using the appropriate Riemann sums. The more frequently the trader observes , the more accurate the filter will be.
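A minimal sketch of such a Riemann-sum approximation is given below. It assumes the standard likelihood-ratio form of the log unnormalized filter for a diffusion with known volatility and an unknown, constant drift parameter, here an OU drift kappa * (theta_j - S); the function and parameter names are hypothetical, and the code is not a transcription of the paper's closed-form expression.

```python
import math


def ou_posterior(prices, dt, levels, prior, kappa, sigma):
    """Posterior over candidate mean-reversion levels from discrete prices.

    The stochastic integral of drift / sigma^2 against price increments and
    the time integral of drift^2 / (2 sigma^2) are both approximated by
    left-point sums over the observation grid.
    """
    log_w = [math.log(p) for p in prior]
    for s_now, s_next in zip(prices[:-1], prices[1:]):
        for j, theta in enumerate(levels):
            drift = kappa * (theta - s_now)
            log_w[j] += (drift * (s_next - s_now) - 0.5 * drift ** 2 * dt) / sigma ** 2
    # Normalize in a numerically stable way.
    shift = max(log_w)
    weights = [math.exp(x - shift) for x in log_w]
    total = sum(weights)
    return [w / total for w in weights]


# Example: a short upward-drifting path favours the higher candidate level.
probabilities = ou_posterior([5.00, 5.01, 5.02, 5.03], dt=0.01,
                             levels=[4.5, 5.5], prior=[0.5, 0.5],
                             kappa=1.0, sigma=0.1)
```

Refining the observation grid shrinks the discretization error in both sums, which matches the remark that more frequent observations give a more accurate filter.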

The solution to the optimal control when can be computed exactly as

For the simulations, here, we assume there are two possible values the asset price mean-reverts to, so that and we set and . Furthermore, we assume the investor has an equal prior on the two possibilities, so that . The remaining parameters used in the simulation are provided in Table 1.

, , , ,
, , .
Table 1: The parameters in the OU model. All of the time-sensitive parameters are defined on an hourly scale.

When simulating sample paths, we generate paths using . The trader will need to detect this value as she observes the price path.

Figure 1: Simulation Results with an Ornstein-Uhlenbeck process

Figure 1 shows the results of the simulation. The top right panel contains a heat map of the posterior probability of the two models. It shows that, as time advances, the trader on average will detect that the true mean-reversion level is . Moreover, by the end of the trading period, she is on average at least confident that model 2 is the true model governing asset prices. The top left graph in Figure 1 shows a heat map of the trading speed for the investor, where the dashed line represents the classical AC strategy. The dotted-and-dashed line represents the median of the trader’s strategy. The heat map shows how the trader adjusts her positions in a manner consistent with her predictions: as the investor discovers that , she expects the asset price to rise over time. Because of this, she slows down her rate of liquidation initially, so that she can sell her asset at a higher price towards the end of the trading period. She then must speed up trading towards the end in order to unwind her position. The bottom left panel shows the histogram of the excess return per share of the optimal control over the AC control, where the excess return is defined as and is the total cash the trader earns using the AC liquidation strategy. As the histogram shows, the filtered strategy outperforms the AC strategy during at least of the simulations. Lastly, the bottom right panel shows the trader’s liquidation value per share over the trading period.

Figure 2: Sample simulation paths with an Ornstein-Uhlenbeck process.

Figure 2 displays sample paths of the asset price and the filter. The top left plot demonstrates how the trader quickly detects the correct model based on the asset trajectory. In this simulation, the asset price initially drops, but then increases consistently. The posterior probability of model 1 adjusts accordingly: it initially rises, but then quickly drops and remains low. In the simulated path in the middle panel, the path of the midprice fluctuates around over the entire time period. The trader’s estimate for the posterior probability of model 1 varies according to the price movements she is observing. The strategy induced by the fluctuating filter is an advantage to the trader, because the fluctuating filter more accurately reflects the actual behaviour of the asset price path – as opposed to being certain that the true model is model 2. Finally, the bottom panel of the figure shows a collection of sample midprice path trajectories.

6.2 Mean-Reverting Pure Jump Process

In this section, we investigate the case where the trader begins with no inventory and aims to gain profits from her alpha model through the use of a round-trip trading strategy. The asset price is assumed to be completely driven by the market order-flow, so that there is no diffusion or drift in the unaffected midprice. We assume the asset price mean-reverts to some unknown level which the trader must detect. More specifically, the asset midprice in USD satisfies the SDE


where and are doubly stochastic Poisson processes with intensities and defined by


where and denote the positive and negative parts of , respectively.

We assume is a Markov chain with generator matrix , specified in Table 2. The filter for cannot be computed explicitly, but it may be approximated via an Euler-Maruyama scheme applied to the SDE for the logarithm of the filter (see the SDE in Theorem 3.1). The resulting approximation for the value of the filter, given that the values of have been observed at times , where and , is obtained via the recursive formula