1 Introduction
The phrase “All models are wrong, but some are useful” (Box, 1978) rings true across all areas in finance, and intraday trading is no exception. If an investor wishes to efficiently trade assets, she must use a strategy that can anticipate the asset’s price trajectory while simultaneously being mindful of the flaws in her model, as well as the costs borne from transaction fees and her own impact on prices. With all of the complexities in intraday markets, it is no surprise that strategies differ substantially based on what assumptions are made about asset price dynamics. Trading with an incorrect model can be very costly to an investor, and therefore being able to mitigate model risk is valuable.
The availability of information at very high frequencies can help a trader partially overcome the problem of model selection. The information provided by realized trajectories of the asset price and by the incoming flow of orders of other traders allows her to infer which model best fits the observed data, and in turn she may use it to predict future movements in asset prices. Ideally, the trader should be able to incorporate this information in an online manner. In other words, the trader should continuously update her model as she observes new information, keeping in mind that the market may switch between a number of regimes over the course of the trading period. Furthermore, the trader would like to have some means of incorporating a priori knowledge about markets into her trading strategy before beginning to trade.
This paper studies the optimal trading strategy for a single asset when there are latent alpha components to the asset price dynamics, and where the trader uses price information to learn about the latent factor. Prices can diffuse as well as jump. The trader’s goal is to optimally trade subject to this model uncertainty, and end the trading horizon with zero inventory. By treating the trader’s problem as a continuous-time control problem where information is partially obscured, we succeed in obtaining a closed-form strategy, up to the computation of an expectation that is specific to the trader’s prior assumptions on the model dynamics. The optimal trading strategy we find can be computed with ease for a wide variety of models, and we demonstrate its performance by comparing it, in simulation, with approaches that do not make use of learning.
Early works on partial information include Detemple (1986) and Detemple (1991), who studies optimal technology investment problems (where the states that drive production are obfuscated by Gaussian noise); Gennotte (1986), who studies the optimal portfolio allocation problem when returns are hidden but satisfy an Ornstein-Uhlenbeck process; Dothan and Feldman (1986), who analyze a production and exchange economy with a single unobservable source of nondiversifiable risk; Karatzas and Xue (1991), who study utility maximization under partial observations; Bäuerle and Rieder (2005), Bäuerle and Rieder (2007) and Frey et al. (2012), who study model uncertainty in the context of portfolio optimization and the optimal allocation of assets; and Papanicolaou (2018), who studies an optimal portfolio allocation problem where the drifts of the assets are latent Itô diffusions.
There are a few recent papers on partial information that are related to this study. Ekstrom and Vaicenavicius (2016) investigate the optimal timing problem associated with liquidating a single unit of an asset when the asset price is a geometric Brownian motion with random (unobserved) drift. Colaneri et al. (2016) study the optimal liquidation problem when the asset midprice is driven by a Poisson random measure with unknown mean measure. Gârleanu and Pedersen (2013) study the optimal trading strategy for maximizing the discounted, and penalized, future expected excess returns in a discrete-time, infinite-horizon problem. In their model, prices contain an unpredictable martingale component, and an independent stationary (visible) predictable component – the alpha component. Guéant and Pu (2016) study models in which the drift of the asset price process is a latent random variable in an optimal portfolio selection setting, for an investor who seeks to maximize a CARA or CRRA objective function, as well as in the case of optimal liquidation in an Almgren-Chriss-like setting.
The approach we take differs in several ways from the extant literature, but the two key generalizations are: (i) we account for quite general latent factors which drive the drift and jump components in the asset’s midprice; and (ii) we include both temporary and permanent impact that the agent’s trading has on the market.
The remainder of this paper is structured as follows. Section 2 outlines our modelling assumptions and states the optimization problem with partial information that the trader wishes to solve. Section 3 provides the filter which the trader uses to make proper inference on the underlying model driving the data she is observing. Section 4 shows that the original optimization problem presented in Section 2 can be simplified to an optimization problem with complete information using the filter presented in Section 3. Section 5 shows how to solve the reduced optimization problem from Section 4 and verifies that the resulting strategy indeed solves the original optimization problem. Lastly, Section 6 provides some numerical examples by applying the theory to a few specific models, and compares the resulting strategy, using simulations, to an alternative which does not learn from price dynamics.
2 The Model
We work on the filtered probability space $(\Omega, \mathcal{G}, \mathbb{G} = \{\mathcal{G}_t\}_{0 \le t \le T}, \mathbb{P})$, where $T > 0$, finite, is some fixed time horizon. The filtration $\mathbb{G}$ is the natural one generated by the paths of the unimpacted asset midprice process $F = (F_t)_{0 \le t \le T}$, the counting processes for the number of buy and sell market orders which cause price changes, denoted $N^+$ and $N^-$, and a latent process $\Theta$. The exact nature of these processes will be provided in more detail in the remainder of the section.

The trader’s optimization problem is to decide on a dynamic trading strategy to buy/sell an asset over the course of a trading horizon to maximize some performance criteria. We assume the trader executes orders continuously at a (controlled) rate denoted by $\nu = (\nu_t)_{0 \le t \le T}$. The trader’s inventory, given some strategy $\nu$, is denoted $Q^\nu = (Q^\nu_t)_{0 \le t \le T}$, with the initial condition $Q^\nu_0 = \mathfrak{N}$. $\mathfrak{N}$ may be zero, positive (a long position), or negative (a short position) – and, hence, the inventory at time $t$ can be written as
(2.1)  $Q^\nu_t = \mathfrak{N} + \int_0^t \nu_u \, du .$
The above can be interpreted as the investor purchasing $\nu_t \, dt$ shares over the period $[t, t + dt)$. A positive (negative) value for $\nu_t$ represents the trader buying (selling) the asset. The rate at which the investor buys or sells the asset affects prices through two mechanisms. Firstly, a temporary price impact, which is effectively a transaction cost that increases with increasing trading rate. Secondly, a permanent impact, which incorporates the fact that when there are excess buy orders, prices move up, and when there are excess sell orders, prices move down.
We further assume that other market participants also have a permanent impact on the asset midprice through their own buy and sell market orders (MOs). To model this, we let be doubly stochastic Poisson processes with respective intensity processes , which count the number of market orders that cause prices to move. In the remainder of the paper, we write .
2.1 Asset Midprice Dynamics
To model the permanent price impact of trades, we define two processes $S^\nu = (S^\nu_t)_{0 \le t \le T}$ and $F = (F_t)_{0 \le t \le T}$ to represent the asset midprice and the asset midprice without the trader’s impact, respectively. As shown by Cartea and Jaimungal (2016), intraday permanent price impact (over short time scales) is well approximated by a linear model. Hence, we write
(2.2)  $S^\nu_t = F_t + \beta \int_0^t \nu_u \, du ,$
where $\nu_u$ is the trader’s trading rate and $\beta > 0$ controls the strength of the trader’s impact on the asset midprice. Alternatively, one could write this as a pure jump model
where are controlled doubly stochastic Poisson processes with –intensities and , respectively. The results will be identical to that obtained using the continuous model above.
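We assume the investor does not have complete knowledge of the dynamics of the asset midprice, nor of the rates of arrival of market orders. This uncertainty is modeled by assuming there is a latent continuous-time Markov chain $\Theta = (\Theta_t)_{0 \le t \le T}$ (with $\Theta_t \in \{\theta_j\}_{j \in \mathfrak{J}}$ and $\mathfrak{J} = \{1, \dots, J\}$), which modulates the dynamics of state variables, but is not observable by the trader. The latent process is assumed to have a known generator matrix $C$ (see footnote 3) and the trader places a prior $\pi^j_0 = \mathbb{P}(\Theta_0 = \theta_j)$, $j \in \mathfrak{J}$, on the initial state of the latent process, all estimated, e.g., by the EM algorithm (see Section 7.1 for details).

Footnote 3: The generator matrix $C$ of a $J$-state continuous-time Markov chain has nondiagonal entries $C_{i,j} \ge 0$ if $i \ne j$ and diagonal entries $C_{i,i} = -\sum_{j \ne i} C_{i,j}$. $C$ is defined so that $\mathbb{P}(\Theta_t = \theta_j \mid \Theta_0 = \theta_i) = [e^{tC}]_{i,j}$, where $[e^{tC}]_{i,j}$ is element $(i,j)$ of the matrix exponential of $tC$.

Conditional on a path of $\Theta$, the unaffected midprice $F$ is assumed to satisfy the SDE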
where have –intensities
(2.3) 
and is a Brownian Motion. Moreover, we assume that each of the are –adapted processes, where is the natural filtration generated by the paths of the processes (note that can be inferred from this filtration, and strategies are therefore also adapted to the paths of ). Furthermore, we assume is a –adapted Markov process, where . The Markov assumption will, after modifying the problem to deal with partial information, allow a dynamic programming principle (DPP) and result in a dynamic programming equation (DPE). We assume that either (i) or (ii) and , to prevent cases where the model is driven by a counting process but also has a continuous drift. In case (ii), the asset price may indeed drift, but the drift will be due to imbalance in intensities so that prices remain on a discrete grid. To compress notation, we define the process where as well as the processes where for each . Finally we make the technical assumption that
(2.4) 
This class of intensity models contains, among many others, deterministic intensities, shot-noise processes, and cross-exciting Hawkes processes with finite-dimensional Markov representations (see footnote 4), all modulated by the latent factor(s). We provide some explicit examples in Section 6 where we also conduct numerical experiments.

Footnote 4: To achieve this, we may extend to include the state variables necessary for the model to be Markov.
The random variable indexes the possible models for the asset’s drift and the rates at which other market participants’ market orders arrive. Because is (potentially) stochastic, it may change over time, hence, so will the underlying model. Furthermore, because is invisible to the investor, to make intelligent trading decisions, the investor must infer from observations what is the current (and future) underlying model driving asset prices.
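To make the idea of a model that switches over the trading period concrete, the following minimal sketch simulates one path of a finite-state continuous-time Markov chain from a generator matrix. It is only an illustration under stated assumptions: the function name `simulate_ctmc`, the two-regime generator `C`, and the integer state labels are hypothetical and not taken from the paper. It uses the standard construction of exponential holding times followed by a jump chosen in proportion to the off-diagonal rates.

```python
import random


def simulate_ctmc(generator, initial_state, horizon, rng):
    """Simulate one path of a continuous-time Markov chain on [0, horizon).

    generator[i][j] (i != j) is the rate of jumping from state i to state j,
    and generator[i][i] is minus the sum of the other entries in row i.
    Returns (times, states): states[k] is occupied from times[k] onward.
    """
    times, states = [0.0], [initial_state]
    t, state = 0.0, initial_state
    while True:
        exit_rate = -generator[state][state]
        if exit_rate <= 0.0:
            break  # absorbing state: the regime never changes again
        t += rng.expovariate(exit_rate)  # exponential holding time
        if t >= horizon:
            break
        # Jump to another state with probability proportional to its rate.
        targets = [j for j, r in enumerate(generator[state])
                   if j != state and r > 0.0]
        weights = [generator[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]
        times.append(t)
        states.append(state)
    return times, states


# Hypothetical two-regime example (rates are illustrative only).
C = [[-2.0, 2.0],
     [1.0, -1.0]]
jump_times, regimes = simulate_ctmc(C, 0, 1.0, random.Random(7))
```

Setting every entry of the generator to zero recovers the case in which the latent state is drawn once and then stays fixed, which corresponds to pure parameter uncertainty.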
2.2 Cash Process
The price at which the trader either buys or sells each unit of the asset will be denoted as . Because there is limited liquidity at the best bid or ask price (the touch), the investor must “walk the book” starting at the bid (ask) and buy (sell) her assets at higher (lower) prices as she increases the size of each of her market orders. For tractability, and as Frei and Westray (2015) (among others) note, a linear model for this ‘temporary price impact’ fits the data well, and adding in concavity, while empirically more accurate, does not improve the beyond . Hence, here we adopt a linear temporary price impact model and write the execution price as
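(2.5)  $\hat{S}^\nu_t = S^\nu_t + a \, \nu_t ,$
where $\hat{S}^\nu_t$ is the execution price, $S^\nu_t$ is the asset midprice, $\nu_t$ is the trading rate, and $a > 0$ controls the asset’s liquidity, and hence the impact of trades.
The investor’s cash process, i.e., the accumulated funds from trading for some fixed strategy $\nu$, is denoted $X^\nu = (X^\nu_t)_{0 \le t \le T}$, and is given by
(2.6)  $X^\nu_t = X_0 - \int_0^t \hat{S}^\nu_u \, \nu_u \, du .$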
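The bookkeeping described so far (inventory driven by the trading rate, a linear permanent impact on the midprice, a linear temporary impact on the execution price, and cash as the running proceeds of trading) can be sketched on a time grid. This is a minimal sketch under assumptions, not the paper's implementation: the names `beta` (permanent impact), `a` (temporary impact), `rates` (trading rate on each interval) and the left-point Euler discretisation are illustrative choices.

```python
def simulate_execution(unaffected_mid, rates, dt, q0, x0, beta, a):
    """Left-point Euler discretisation of inventory, midprice and cash.

    unaffected_mid: unimpacted midprice at grid times t_0..t_n (n + 1 values)
    rates:          trading rate on each interval [t_k, t_{k+1}) (n values);
                    positive means buying, negative means selling
    Returns (inventory, mid, cash), each a list of n + 1 values.
    """
    inventory, cash = [q0], [x0]
    impact = 0.0  # accumulated permanent impact: beta * integral of the rate
    mid = [unaffected_mid[0]]
    for k, nu in enumerate(rates):
        # Execution price = impacted midprice + linear temporary impact.
        exec_price = unaffected_mid[k] + impact + a * nu
        cash.append(cash[-1] - exec_price * nu * dt)  # buying costs cash
        inventory.append(inventory[-1] + nu * dt)
        impact += beta * nu * dt
        mid.append(unaffected_mid[k + 1] + impact)
    return inventory, mid, cash


# Hypothetical example: sell one share at a constant rate over 10 steps
# while the unimpacted midprice stays flat at 100.
flat_mid = [100.0] * 11
inv, mid, cash = simulate_execution(flat_mid, [-1.0] * 10, 0.1,
                                    q0=1.0, x0=0.0, beta=0.1, a=0.5)
```

In this example every sale executes below the prevailing midprice because of the temporary impact, and the midprice itself drifts down as the permanent impact of the trader's own selling accumulates.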
2.3 Objective Criterion
Over the course of the trading window $[0, T]$, the trader wishes to find a trading strategy $\nu \in \mathcal{A}$ which maximizes the objective criterion
(2.7)  $\mathbb{E}\left[ X^\nu_T + Q^\nu_T \left( S^\nu_T - \alpha \, Q^\nu_T \right) - \phi \int_0^T \left( Q^\nu_u \right)^2 du \right] ,$
where $\mathcal{A}$ is the set of admissible trading strategies, here consisting of the collection of all $\mathbb{F}$–predictable processes $\nu$ such that $\mathbb{E}\left[ \int_0^T \nu_u^2 \, du \right] < \infty$, with $\mathbb{F}$ denoting the filtration observable to the trader.
The objective criterion (2.7) consists of three different parts. The first is $X^\nu_T$, which represents the amount of cash the trader has accumulated from her trading over the period $[0, T)$. Next is the amount of cash received from liquidating all remaining exposure $Q^\nu_T$ at the end of the trading horizon. The value (per share) of liquidating these shares is penalized by an amount $\alpha \, Q^\nu_T$, where $\alpha > 0$. The amount $\alpha \, Q^\nu_T$ represents the liquidity penalty taken by the trader if she chooses to sell or buy an amount $Q^\nu_T$ of assets all at once. We eventually take the limit $\alpha \to \infty$ to ensure that the trader ends with zero inventory. The last term represents a running penalty, with $\phi \ge 0$, that penalizes the trader for having a nonzero inventory throughout the trading horizon, and allows her to control her exposure. This penalty can also be interpreted as the quadratic variation of the book value of the trader’s position (ignoring jumps in the asset price), or can be seen as stemming from model uncertainty as shown in Cartea et al. (2017).
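For a single simulated path, the three parts of the criterion described above (terminal cash, penalized liquidation value of the remaining inventory, and the running inventory penalty) can be evaluated as in the sketch below. The argument names `alpha` (terminal liquidation penalty) and `phi` (running penalty weight) and the left-point Riemann sum are assumptions made for illustration; averaging the returned value over many simulated paths would give a Monte Carlo estimate of the expectation.

```python
def path_performance(cash_T, inventory, mid_T, dt, alpha, phi):
    """Pathwise performance: terminal cash, plus the penalized value of the
    remaining inventory, minus the running inventory penalty.

    inventory: inventory at grid times t_0..t_n (n + 1 values)
    """
    q_T = inventory[-1]
    # Left-point Riemann sum for the integral of squared inventory.
    running_penalty = phi * sum(q * q for q in inventory[:-1]) * dt
    terminal_value = q_T * (mid_T - alpha * q_T)
    return cash_T + terminal_value - running_penalty


# Hypothetical example: fully liquidated by the end, so only the running
# penalty reduces the terminal cash.
value = path_performance(cash_T=99.0, inventory=[1.0, 0.5, 0.0],
                         mid_T=100.0, dt=0.5, alpha=10.0, phi=2.0)
```

A large `alpha` makes any leftover inventory very expensive, which is the discrete analogue of forcing the position to be closed by the end of the horizon.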
Note that we take trading strategies to be –predictable. –predictability ensures that the trader does not have access to any information regarding the path of the process , which governs the model driving the asset midprice drift and the intensities of . As well, –predictability prevents the trader from foreseeing a jump occurring at the same instant in time – in other words, her decisions are based on the left limits of , and hence also . Because admissible controls are –predictable, and not –predictable (the full filtration), maximizing (2.7) is a control problem with partial information.
Solving control problems with partial information is very difficult to do directly, because most tools that are used in the case of complete information no longer work. Instead, such problems require an indirect approach in which, firstly, we find an alternate –adapted representation for the dynamics of the state variable process, and secondly, we extend the state variable process so that it becomes Markov when using Markov controls. The key step in this approach is to find the best guess for conditional on the reduced filtration available at that time.
3 Filtering
Because the investor cannot observe , she wishes to formulate a best guess for its value. The best possible guess for the distribution of will be the distribution of conditional on the information accumulated up until that time. Therefore, she wishes to compute
The filter process is –adapted with initial condition . It represents the posterior latent state distribution (given all information accumulated by the investor up until ).
Theorem 3.1.
Let us assume that the Novikov condition
(3.1) 
holds. Then the filter admits a representation with components
(3.2) 
where . If , for each , solves the SDE
(3.3) 
with initial condition . If and , for each , solves the SDE
(3.4) 
with the same initial condition.
Proof.
See Appendix A. ∎
The process admits a simple closed-form solution when . This case corresponds to when the latent regimes are constant over the trading period – in other words, the case of parameter uncertainty, in which the model does not switch between regimes throughout the trading horizon. When , solutions to the filter can be approximated reasonably well for most purposes by using methods outlined in George et al. (2004), which will be discussed further in Section 6.
An SDE also exists for the normalized version of the filter , however, for simplicity, we keep track of the processes , and define the function (with a slight abuse of notation) via
(3.5) 
so that . This choice of mapping into guarantees that , even when numerically approximating (3.3).
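The normalization step described above, which maps the unnormalized filter components to posterior probabilities that sum to one, can be sketched as follows. Working with logarithms of the unnormalized components and subtracting the maximum before exponentiating is an implementation choice assumed here for numerical stability; the function name `normalise_filter` is hypothetical.

```python
import math


def normalise_filter(log_unnormalised):
    """Map log-unnormalized filter components to posterior probabilities.

    Each output equals exp(l_j) / sum_i exp(l_i); subtracting the maximum
    first avoids overflow when the components are very large or very small.
    """
    shift = max(log_unnormalised)
    weights = [math.exp(v - shift) for v in log_unnormalised]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: components whose direct exponentials would overflow
# a float, in the ratio 1 : 3.
posterior = normalise_filter([1000.0, 1000.0 + math.log(3.0)])
```

Because the output is a ratio of nonnegative weights to their sum, it stays on the probability simplex regardless of discretisation error in the unnormalized components.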
4 Dynamics Projection
In this section, we show there exists an –adapted representation for the price dynamics, and the intensity processes. The sequence of arguments resembles that found in (Bäuerle and Rieder, 2007, Section 3), adapted to the case where the observable process contains both jump and diffusive terms.
First, define the –adapted martingales to be the compensated versions of the Poisson processes , i.e.,
(4.1) 
The theorem below provides the necessary ingredients to provide the –adapted representations of the state processes.
Theorem 4.1.
If , define the processes , by the following relations
(4.2a)  
(4.2b) 
where and are the filtered drift and intensities, defined as and . Then,

(A) the process is an –adapted –Brownian motion;
(B) the process is an –adapted –martingale; and
(C) and , –almost surely.
(D) are –adapted doubly stochastic Poisson processes with intensities .
If and , define as in (4.2b). Then, (B) and (D) hold and , –almost surely.
Proof.
See Appendix B. ∎
Theorem 4.1 tells us that , in addition to being viewed as a –adapted doubly stochastic Poisson process with –intensity of , can be viewed as an –adapted doubly stochastic process with intensity . That is, is a doubly stochastic Poisson process with respect to both the and filtrations, but with differing intensities.
Theorem 4.1 allows us to represent the dynamics of in their –predictable form as
(4.3) 
Let us also note that because and , because are –adapted, we may take a conditional expectation with respect to to yield that and . Therefore we may define the functions, and as
(4.4) 
so that and .
Hence, the collection of processes are –adapted. The optimal control problem corresponding to maximizing (2.7), within the admissible set, can therefore be regarded as a problem with complete information with respect to the extended state variable process . The joint dynamics of this state process are all –adapted and do not depend on the process . Therefore, the dynamics of the extended state process are completely visible to the investor, which reduces the control problem with partial information, in which we did not know the dynamics of the state variables, into a control problem with full information.
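When the latent process takes finitely many values, a conditional expectation given the observable information reduces to a posterior-weighted average of the model-specific quantities. The sketch below illustrates this for a filtered drift; the function name `filtered_value` and the mean-reverting drift used in the example are assumptions chosen for illustration rather than the paper's notation.

```python
def filtered_value(posterior, model_values):
    """Posterior-weighted average of a quantity across candidate models.

    posterior[j] is the current probability of latent state j and
    model_values[j] is the quantity (drift or intensity) under state j.
    """
    if len(posterior) != len(model_values):
        raise ValueError("posterior and model_values must have equal length")
    return sum(p * v for p, v in zip(posterior, model_values))


# Hypothetical example: two candidate mean-reversion levels and a drift of
# the form kappa * (level - price) under each candidate model.
kappa, price = 2.0, 5.0
levels = [4.0, 6.0]
drifts = [kappa * (theta - price) for theta in levels]
filtered_drift = filtered_value([0.25, 0.75], drifts)
```

The filtered quantity depends only on observable data and the posterior, which is what allows the dynamics to be written without reference to the latent process.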
In the next section, we solve this control problem by using the fact that the extended state variable dynamics are –adapted for each . Hence, the dynamic programming principle can be applied to the optimization problem (2.7) and we derive a dynamic programming equation for the new problem.
5 Solving the Dynamic Programming Problem
5.1 The Dynamic Programming Equation
Using the definitions for and in (2.2) and (2.1), we can write as
(5.1) 
as well we can write
(5.2) 
which allows to be defined independently of . Hence, the trader’s objective criterion (2.7) becomes
(5.3) 
With given by (5.2), the trader’s objective function does not depend on the value of the process . For the remainder of this section, we will use the above definition for the trader’s objective criterion.
To optimize the objective criterion (5.3), we use the fact that , the dimensional state variable process is –adapted and, hence, has dynamics visible to the trader. First, let us define the functional
(5.4) 
and the value function
(5.5) 
where we use to represent the expected value given the initial conditions , where . The definition of implies that , where , is the objective criterion defined in equation (5.3). Furthermore, a control is optimal and solves the optimization problem described in Section 2.3 if it satisfies
(5.6) 
Given the –adapted version of the dynamics of the state variables, for any Markov admissible control , there exists some function , such that . For such controls, the function must satisfy the Dynamic Programming Principle and the Dynamic Programming Equation (DPE) (see, e.g., (Pham, 2009, Chapter 3)) applies. The DPE for our specific problem suggests that satisfies the PDE
(5.7) 
where is the infinitesimal generator for the state process using the predictable representation for the dynamics of and the intensity of , given a fixed control . Furthermore, the operator acts on functions , once differentiable in , twice differentiable in and all (componentwise) crossderivatives, and once differentiable in , as follows
where is the infinitesimal generator of the process using its –predictable representation, which is independent of the control . This portion of the generator can be fairly generic because we have not specified the precise nature of the dynamics of the intensity processes – which is the impetus for separating this portion of the generator.
5.2 Dimensional Reduction
The Dynamic Programming Equation (5.7) can be simplified by introducing the ansatz
where for , we write . The PDE (5.7) then simplifies significantly to a PDE for ,
(5.8) 
where the functions and are defined in equation (4.4). This PDE implies that the feedback control for this problem should be
(5.9) 
In other words, the second line of the PDE (5.8) attains its supremum at defined above.
5.3 Solving the DPE
The ansatz provided above permits us to indeed find a solution to the PDE (5.7) which is presented in the proposition that follows.
Proposition 5.1 (Candidate Solution).
The PDE (5.7), admits the classical solution
where . Let, denote expectation conditional on the initial conditions , and define the constants
and
. We have that
(i) if , then
(5.10a)  
(5.10b)  
(5.10c) 
(ii) if , then
(5.11a)  
(5.11b)  
(5.11c) 
where .
Proof.
See Appendix C.1. ∎
For the remainder of the paper, we will concern ourselves with the case where , because in most applications the trader wishes to completely liquidate by the end of the trading horizon, and so , while is comparatively small.
The above proposition and equation (5.9) suggest that the optimal trading speed the investor should employ is
(5.12) 
This optimal trading strategy is a combination of two terms: (i) the classical Almgren-Chriss (AC) liquidation strategy represented by ; and (ii) a term which adjusts the strategy based on expected future midprice movements, represented by . From the representation of in (5.10b) (or (5.11b)), this latter term is the weighted average of the expected future drift of the asset’s midprice. Therefore if, based on her current information, the trader believes that the asset midprice drift will remain largely positive for the remainder of the trading period, she will buy more of the asset relative to the AC strategy. This is reasonable, because she knows she will be able to sell the asset at a higher price once asset prices have risen. The exact opposite occurs when she expects the asset price drift to remain mostly negative over the rest of the trading period.
The result in (5.12) illustrates how the investor uses the filter for the posterior probability of which latent state is currently prevailing to consistently update her strategy based on her predictions of the future path of the asset midprice. Moreover, the solution here closely resembles the result obtained by Cartea and Jaimungal (2016); however, it explicitly incorporates latent information and jumps in the asset price.

Computing the expectation appearing in directly is not easy. There is, however, an alternate representation of this expectation. For any , we have
(5.13) 
where denotes expectation conditioning on the initial condition and . The alternative form on the right-hand side above is almost always easier to compute than a direct computation of the left-hand side.
Next, we provide a verification theorem showing that the candidate solution in Proposition 5.1 is exactly equal to the value function defined in equation (5.5).
Theorem 5.2 (Verification Theorem).
Proof.
See Appendix C.2. ∎
The theorem above guarantees that the control provided above indeed solves the optimization problem presented in Section 2.3. In retrospect, the optimal control to the trader’s optimization problem with partial information is a Markov control. The key steps were to introduce the predictable representation for the dynamics of the process , and to extend the original state process to include the unnormalized posterior distribution of the latent states .
5.4 Zero Terminal Inventory
A useful limiting case is when the trader is forced to eliminate her market exposure before time . This corresponds to taking the limit and the resulting optimal control simplifies to
(5.16) 
A second interesting case is to additionally take the limit of no running inventory penalty, in which case the optimal strategy results in
(5.17) 
This strategy corresponds to a time weighted average price (TWAP) strategy plus an adjustment for the weighted expected future drift of the asset’s midprice.
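As a point of reference for the two limiting cases, the sketch below computes only the baseline liquidation speeds that the learning adjustment would be added to; it does not include the drift adjustment itself. It assumes the textbook closed form for Almgren-Chriss liquidation with forced terminal liquidation, a running inventory penalty `phi`, and a linear temporary impact `a`, in which the speed is $-\gamma \coth(\gamma (T - t))$ times the current inventory with $\gamma = \sqrt{\phi / a}$. The function names are hypothetical, and this is not a reproduction of the paper's expressions.

```python
import math


def twap_rate(inventory, time_left):
    """TWAP speed: trade the remaining inventory evenly over the time left."""
    return -inventory / time_left


def almgren_chriss_rate(inventory, time_left, phi, a):
    """Baseline AC liquidation speed with forced terminal liquidation.

    Uses gamma = sqrt(phi / a); reduces to TWAP when phi is zero.
    """
    if phi == 0.0:
        return twap_rate(inventory, time_left)
    gamma = math.sqrt(phi / a)
    return -gamma * inventory / math.tanh(gamma * time_left)


# Hypothetical example: one share left, one unit of time remaining.
ac_speed = almgren_chriss_rate(1.0, 1.0, phi=4.0, a=1.0)
twap_speed = twap_rate(1.0, 1.0)
```

With a positive running penalty the baseline sells faster than TWAP early on, and as the penalty shrinks to zero the two speeds coincide, consistent with the TWAP interpretation of the second limiting case.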
All of the expressions above for the optimal control can be computed in closed form for a large variety of models. In the next section we provide two explicit, and useful, examples together with numerical experiments to illustrate the strategy’s dynamic behaviour.
6 Numerical Examples
In this section we carry out numerical experiments to test the performance of the optimal trading algorithms developed in Section 5. The examples show how the optimal strategy performs under two model setups.
6.1 Mean-Reverting Diffusion
This section investigates the case where the trader wishes to liquidate her inventory before some specified time $T$. The asset price is assumed to be a pure diffusive Ornstein-Uhlenbeck process – alternatively, one can think of this midprice as the value of a long-short position in a pairs trading strategy. The trader knows the volatility and rate of mean reversion, but does not know the level to which prices revert. In this example, the mean-reversion level will remain constant over the course of the trading period $[0, T]$. More specifically, we assume that the asset midprice in USD has the dynamics
(6.1)  $dF_t = \kappa \left( \Theta - F_t \right) dt + \sigma \, dW_t ,$
where $\kappa$ is the rate of mean reversion, $\sigma$ is the volatility, $W$ is a Brownian motion, and $\Theta$ is a random variable taking values in the set $\{\theta_j\}_{j = 1, \dots, J}$ with probabilities $\{\pi^j_0\}_{j = 1, \dots, J}$. It remains constant over time but its value is hidden from the trader. This model does not contain any jumps so we can ignore the variables and .
As mentioned in Section 3, there exists an exact closed form for the filter when is constant in time. For the regime switching OU model in (6.1), the exact solution for the unnormalized filter is
(6.2) 
Because, in practice, is observed only discretely, the integrals above are approximated using the appropriate Riemann sums. The more frequently the trader observes , the more accurate the filter will be.
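The discrete approximation of the filter for this example can be sketched as follows. The sketch assumes the standard likelihood-ratio form for a diffusion with unknown constant drift parameter, in which the log of each unnormalized component accumulates (drift times price increment minus half the squared drift times the time step) divided by the variance; this form and all names (`ou_posterior`, `levels`, `kappa`, `sigma`) are assumptions for illustration, since the paper's exact expression is not reproduced here. Integrals are replaced by left-point Riemann sums over the observation grid.

```python
import math


def ou_posterior(prices, dt, levels, prior, kappa, sigma):
    """Posterior over candidate mean-reversion levels from discrete prices.

    prices: observed midprices at equally spaced times (step dt)
    levels: candidate mean-reversion levels, one per model
    prior:  prior probabilities of the candidate levels
    """
    log_w = [math.log(p) for p in prior]
    for k in range(len(prices) - 1):
        f, d_f = prices[k], prices[k + 1] - prices[k]
        for j, theta in enumerate(levels):
            drift = kappa * (theta - f)  # model-j drift at the left point
            log_w[j] += (drift * d_f - 0.5 * drift * drift * dt) / sigma ** 2
    # Normalize in log space to stay on the probability simplex.
    shift = max(log_w)
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: a noise-free path that follows the drift of the
# second candidate level exactly for ten steps.
path = [5.0]
for _ in range(10):
    path.append(path[-1] + 1.0 * (6.0 - path[-1]) * 0.1)
post = ou_posterior(path, 0.1, [4.0, 6.0], [0.5, 0.5], kappa=1.0, sigma=1.0)
```

A finer observation grid shrinks the error of the Riemann sums, which is the sense in which more frequent observation makes the approximate filter more accurate.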
The solution to the optimal control when can be computed exactly as
For the simulations here, we assume there are two possible values the asset price mean-reverts to, so that and we set and . Furthermore, we assume the investor has an equal prior on the two possibilities, so that . The remaining parameters used in the simulation are provided in Table 1.
Table 1: Simulation parameters for the mean-reverting diffusion example. [Parameter values missing in source.]
When simulating sample paths, we generate paths using . The trader will need to detect this value as she observes the price path.
Figure 1 shows the results of the simulation. The top right panel contains a heat map of the posterior probability of the two models. It shows that, as time advances, the trader on average will detect that the true mean-reversion level is . Moreover, by the end of the trading period, she is on average at least confident that model 2 is the true model governing asset prices. The top left graph in Figure 1 shows a heat map of the trading speed for the investor, where the dashed line represents the classical AC strategy. The dotted-and-dashed line represents the median of the trader’s strategy. The heat map shows how the trader adjusts her positions in a manner consistent with her predictions: as the investor discovers that , she expects the asset price to rise over time. Because of this she slows down her rate of liquidation initially, so that she can sell her asset at a higher price towards the end of the trading period. She then must speed up trading towards the end in order to unwind her position. The bottom left panel shows the histogram of the excess return per share of the optimal control over the AC control, where the excess return is defined as and is the total cash the trader earns using the AC liquidation strategy. As the histogram shows, the filtered strategy outperforms the AC strategy during at least of the simulations. Lastly, the bottom right panel shows the trader’s liquidation value per share over the trading period.
Figure 2 displays sample paths of the asset price and the filter. The top left plot demonstrates how the trader quickly detects the correct model based on the asset trajectory. In this simulation, the asset price initially drops, but then increases consistently. The posterior probability of model 1 adjusts accordingly, and initially rises, but then quickly drops and remains low. In the simulated path in the middle panel, the path of the midprice fluctuates around over the entire time period. The trader’s estimate for the posterior probability of model 1 varies according to the price movements she is observing. The strategy induced by the fluctuating filter is an advantage to the trader, because the fluctuating filter more accurately reflects the actual behaviour of the asset price path – as opposed to being certain that the true model is model 2. Finally, the bottom panel of the figure shows a collection of sample midprice path trajectories.
6.2 Mean-Reverting Pure Jump Process
In this section, we investigate the case where the trader begins with no inventory and aims to gain profits from her alpha model through the use of a round-trip trading strategy. The asset price is assumed to be completely driven by the market order flow, so that there is no diffusion or drift in the unaffected midprice. We assume the asset price mean-reverts to some unknown level which the trader must detect. More specifically, the asset midprice in USD satisfies the SDE
(6.3) 
where and are doubly stochastic Poisson processes with intensities and defined by
(6.4) 
where and denote the positive and negative parts of , respectively.
We assume is a Markov chain with generator matrix , specified in Table 2. The filter for cannot be computed explicitly, but it may be approximated via an Euler-Maruyama scheme for the SDE satisfied by the logarithm of the filter (see the SDE in Theorem 3.1). The resulting approximation for the value of the filter, given that the values of have been observed at times , where and is obtained via the recursive formula
(6.5a)  
and  