Multi-period portfolio choice is a central problem in finance. It concerns an investor who must decide how to sequentially allocate his capital so as to maximise some performance measure over multiple periods. Online portfolio selection algorithms tackle the problem of maximising cumulative wealth by adaptively identifying and exploiting patterns in historical data. The key feature of these algorithms is that they are online: patterns and portfolio decisions are updated upon the arrival of new data, thereby adapting to changing market conditions.
Online portfolio selection algorithms can be classified according to their update scheme. Traditional algorithms forecast asset returns and use these forecasts to update the current portfolio. Li and Hoi (2014) classify traditional algorithms into the following categories:
Follow-The-Winner (FTW) algorithms assume that recent stock performance will persist, and so transfer capital from the worst-performing stocks to the best-performing stocks.
Follow-The-Loser (FTL) algorithms assume that recent stock performance will revert to a long-run mean, and so transfer capital from the best-performing stocks to the worst-performing stocks.
Pattern-Matching-based (PM) algorithms assume that market conditions repeat themselves, and so allocate capital based on what was optimal during similar historical periods.
The Pattern-Matching based approach has the least restrictive assumption about market behaviour. This affords greater flexibility in algorithm design and allows these algorithms to exploit a wider range of market conditions, thereby outperforming the other approaches [2, 3, 4, 5]. In particular, the CORN-K (CORrelation-driven Non-parametric learning) algorithm appears to demonstrate the best results. Recently, the CORN-K algorithm has been extended to incorporate risk in its portfolio selection [4, 5].
However, the CORN-K algorithm (and its extensions) often outputs a cautious portfolio that restricts its returns. This occurs when the algorithm cannot detect a subset of historical data that is similar to the recent data, and therefore allocates wealth equally across assets.
To address this, we propose the Aggressive Multi-Temporal Allocation (AMA-K) algorithm, which combines the Pattern-Matching and Follow-The-Winner principles.
II Online Portfolio Selection
An investor wants to allocate his initial capital $S_0$ across a portfolio of $m$ securities on each of $n$ trading days so as to maximise his terminal wealth $S_n$. The investor's portfolios are represented by $\mathbf{b}_1, \dots, \mathbf{b}_n$, where $b_{t,i}$ is the proportion of capital invested in security $i$ at time $t$. Furthermore, portfolio positions are constrained to be non-negative and all capital is invested at each period, i.e. $\mathbf{b}_t \in \Delta_m = \{\mathbf{b} \in \mathbb{R}^m : b_i \ge 0, \sum_{i=1}^{m} b_i = 1\}$.
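Define the price relative for security $i$ on day $t$ as $x_{t,i} = p_{t,i}/p_{t-1,i} = \exp(\ell_{t,i} - \ell_{t-1,i})$, where $p_{t,i}$ is the closing price and $\ell_{t,i} = \ln p_{t,i}$ denotes the log-price. The price relative vector is then $\mathbf{x}_t = (x_{t,1}, \dots, x_{t,m})$. A sequence of price relative vectors defines a market window $X_{t-w+1}^{t} = (\mathbf{x}_{t-w+1}, \dots, \mathbf{x}_t)$, where $w$ is the given window size.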
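To make these definitions concrete, a minimal pure-Python sketch follows. It assumes prices arrive as a list of daily rows (one closing price per asset) and uses 0-based day indices; the names `price_relatives` and `market_window` are hypothetical helpers, not part of any published implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def price_relatives(prices: Matrix) -> Matrix:
    """Price relative vectors from a (days x assets) table of closing prices.

    x_{t,i} = exp(l_{t,i} - l_{t-1,i}) with l = ln(price), i.e. p_{t,i} / p_{t-1,i}.
    """
    return [
        [math.exp(math.log(cur) - math.log(prev)) for prev, cur in zip(p_prev, p_cur)]
        for p_prev, p_cur in zip(prices, prices[1:])
    ]


def market_window(x: Matrix, t: int, w: int) -> Matrix:
    """The w consecutive price relative vectors ending at day t (0-based, inclusive)."""
    if w < 1 or t - w + 1 < 0:
        raise ValueError("window size must be >= 1 and fit in the available history")
    return x[t - w + 1 : t + 1]


prices = [[10.0, 20.0], [11.0, 20.0], [11.0, 25.0]]
x = price_relatives(prices)  # approximately [[1.1, 1.0], [1.0, 1.25]]
print(market_window(x, t=1, w=2))
```

Working through log-prices and exponentiating gives the same ratio as dividing prices directly; the log form simply mirrors the definition above.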
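An online portfolio selection algorithm is a function $\mathcal{A}$ that takes the historical price data available at time $t$ and outputs a portfolio:
$$\mathbf{b}_t = \mathcal{A}(\mathbf{x}_1, \dots, \mathbf{x}_{t-1}) \qquad (1)$$
The portfolio is constructed at the start of period $t$, using all information up until then. The terminal wealth at the end of period $n$ is given by:
$$S_n = S_0 \prod_{t=1}^{n} \mathbf{b}_t^{\top} \mathbf{x}_t \qquad (2)$$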
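The wealth recursion can be sketched as follows, assuming the portfolios and price relative vectors are aligned day by day. `terminal_wealth` is a hypothetical helper; its simplex check mirrors the no-short-selling and full-investment constraints stated above.

```python
from typing import Sequence


def terminal_wealth(portfolios: Sequence[Sequence[float]],
                    relatives: Sequence[Sequence[float]],
                    s0: float = 1.0) -> float:
    """S_n = S_0 * prod_t (b_t . x_t) for aligned portfolios b_t and price relatives x_t."""
    wealth = s0
    for b, x in zip(portfolios, relatives):
        # Enforce the constraints: no short positions, all capital invested.
        if any(bi < 0 for bi in b) or abs(sum(b) - 1.0) > 1e-9:
            raise ValueError("each portfolio must be non-negative and sum to 1")
        wealth *= sum(bi * xi for bi, xi in zip(b, x))  # daily growth factor b_t . x_t
    return wealth


# Day 1: half in each asset; day 2: everything in the first asset.
print(terminal_wealth([[0.5, 0.5], [1.0, 0.0]], [[1.1, 1.0], [1.0, 1.25]]))
```

In this example the first day grows wealth by a factor of 1.05 and the second day leaves it unchanged, so the terminal wealth is about 1.05 times the initial capital.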
For tractability, we make the following assumptions: each asset is arbitrarily divisible, desired quantities can be traded at the most recent closing price, and market prices are not affected by the investor’s actions. In addition, we ignore trading costs and do not allow for borrowing or short-selling.
III Correlation-driven Non-parametric Learning
CORN-based strategies use experts that construct portfolios from previous market windows.
Each expert $E(w,\rho)$ holds a portfolio $\mathbf{b}_t(w,\rho)$ and has cumulative wealth $S_t(w,\rho)$ at time $t$. $W \times P$ experts are considered, each defined by a window size $w \in \{1, \dots, W\}$ and a Pearson product-moment correlation coefficient threshold $\rho$ taken from a set of $P$ values. In top-$K$ strategies, the $K$ experts with the most cumulative wealth at time $t$ have their portfolios combined, with each expert responsible for $1/K$ of the portfolio's wealth allocation for a given day. This combined portfolio is the agent's portfolio at time $t$.
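The top-$K$ combination step can be sketched as below, assuming the $K$ best experts are weighted equally so that each controls $1/K$ of the allocation. `combine_top_k` is a hypothetical name used only for illustration.

```python
from typing import List


def combine_top_k(portfolios: List[List[float]], wealths: List[float], k: int) -> List[float]:
    """Average the portfolios of the k experts with the highest cumulative wealth."""
    top = sorted(range(len(wealths)), key=lambda e: wealths[e], reverse=True)[:k]
    m = len(portfolios[0])
    # Each selected expert controls an equal 1/k share of the agent's wealth.
    return [sum(portfolios[e][i] for e in top) / len(top) for i in range(m)]


experts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(combine_top_k(experts, wealths=[3.0, 1.0, 2.0], k=2))  # experts 0 and 2 -> [0.75, 0.25]
```

Dividing by `len(top)` rather than `k` keeps the result on the simplex even if fewer than `k` experts exist.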
Each expert compares the most recent market window $X_{t-w+1}^{t}$ with all historical market windows of the same size. An expert searches for its optimal portfolio using only those historical windows whose correlation with the most recent window is equal to or greater than its threshold $\rho$. The days that immediately follow such windows are called correlation similar days and are collected in the set $C_t(w,\rho)$. Experts update their wealth at the end of a day using their portfolio and the day's returns.
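A minimal sketch of this search follows. It assumes windows are compared by flattening them into vectors and computing the Pearson correlation, and it treats a window with zero variance as not similar (an assumption, since the correlation is undefined there). The helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (NaN if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0.0 or vb == 0.0:
        return math.nan
    return cov / math.sqrt(va * vb)


def flatten(window: Matrix) -> List[float]:
    return [value for row in window for value in row]


def correlation_similar_days(x: Matrix, t: int, w: int, rho: float) -> List[int]:
    """Days i <= t whose preceding window x[i-w .. i-1] correlates >= rho with x[t-w+1 .. t]."""
    recent = flatten(x[t - w + 1 : t + 1])
    similar = []
    for i in range(w, t + 1):
        c = pearson(flatten(x[i - w : i]), recent)
        if not math.isnan(c) and c >= rho:
            similar.append(i)  # day i follows a window that resembles the recent one
    return similar


x = [[1.0, 1.1], [1.1, 1.0], [1.0, 1.1], [1.1, 1.0], [1.0, 1.1]]
print(correlation_similar_days(x, t=4, w=2, rho=0.5))  # [3]
```

In the toy series the two assets alternate, so only the window ending on day 2 reproduces the most recent pattern, and day 3 is returned as the single correlation similar day.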
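An expert's portfolio is the $\mathbf{b}$ that maximises Equation 3 at time $t$:
$$\mathbf{b}_{t+1}(w,\rho) = \arg\max_{\mathbf{b} \in \Delta_m} \prod_{i \in C_t(w,\rho)} \mathbf{b}^{\top} \mathbf{x}_i \qquad (3)$$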
At times, the set of correlation similar days may be small or empty; when it is empty, the expert returns the uniform portfolio [3, 4, 5]. A uniform portfolio distributes wealth equally amongst all assets, which generally yields lower returns. DRICORN-K is a variation of CORN-K that classifies the market and adjusts accordingly. Classification is done through the use of $\beta$ (market beta) when searching for the optimal portfolio. Utilising $\beta$ allows for more aggressive or defensive portfolios based on current market conditions. At times, $\beta$ does not impact the portfolio construction, in which case DRICORN-K returns a portfolio similar to that of CORN-K.
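Maximising the product in Equation 3 is equivalent to maximising the sum of $\log(\mathbf{b}^{\top}\mathbf{x}_i)$ over the similar days. The sketch below assumes Cover's (1984) multiplicative fixed-point iteration as the solver, which keeps $\mathbf{b}$ on the simplex and, started from the uniform portfolio, converges to the log-optimal portfolio; CORN-K implementations may use a different optimiser. It also shows the uniform fallback for an empty set. `log_optimal_portfolio` and `iters` are hypothetical.

```python
from typing import List


def log_optimal_portfolio(x: List[List[float]], days: List[int], iters: int = 500) -> List[float]:
    """Approximate argmax over the simplex of prod_{i in days} b . x_i.

    Cover's iteration: b_j <- b_j * mean_i( x_{i,j} / (b . x_i) ).
    Returns the uniform portfolio when there are no similar days.
    """
    m = len(x[0])
    b = [1.0 / m] * m
    if not days:
        return b  # empty similar set: fall back to the uniform portfolio
    for _ in range(iters):
        ratios = [0.0] * m
        for i in days:
            growth = sum(bj * xij for bj, xij in zip(b, x[i]))
            for j in range(m):
                ratios[j] += x[i][j] / growth
        # The updated weights still sum to 1 because sum_j b_j x_ij / (b . x_i) = 1.
        b = [bj * r / len(days) for bj, r in zip(b, ratios)]
    return b


x = [[3.0, 1.0], [0.5, 1.0]]  # a risky asset that triples or halves, next to cash
print(log_optimal_portfolio(x, days=[0, 1]))  # close to [0.75, 0.25]
print(log_optimal_portfolio(x, days=[]))      # [0.5, 0.5]
```

For this two-day example the log-optimal weight on the risky asset solves $2/(1+2b) = 1/(2-b)$, giving $b = 0.75$, which the iteration approaches.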
Clustering has previously been employed in online portfolio selection [6, 7, 8, 9, 10]. Similar approaches are given by Khedmati and Azin (2020), where portfolios are optimised using clustering techniques, market windows, Pattern-Matching and similar-day samples [8, 6]. Nanda et al. (2010) found that K-means clustering provided the best results for portfolio selection in terms of cluster compactness on Bombay Stock Exchange data. Our work builds on these previous successes by directly integrating clustering into the CORN-K framework. Further, we introduce a more effective low-dimensional representation of market windows that improves the clustering results.
A limitation of CORN-K's reliance on correlation similar days is that such days are rare, and are usually found only by experts with smaller window sizes and lower values of $\rho$. Hence, experts that would otherwise have a suitable quantity of data to infer market dynamics are unable to do so. To overcome this limitation, we use online K-means clustering (K-online) with Manhattan distance as an alternative way to discover sets of cluster similar days. Cluster similar days do not require a correlation coefficient threshold; days are considered similar if they are assigned to the same centroid. Manhattan distance is selected for computational efficiency; alternative metrics could be considered.
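A single K-online update can be sketched as follows: a new market vector is assigned to its nearest centroid under Manhattan distance, and that centroid is moved by the sequential (MacQueen-style) running-mean rule. This update rule is an assumption; note that pairing Manhattan assignment with a mean update is a hybrid, since a strict L1 formulation (K-medians) would update with coordinate-wise medians. `online_kmeans_step` is hypothetical.

```python
from typing import List


def manhattan(a: List[float], b: List[float]) -> float:
    return sum(abs(u - v) for u, v in zip(a, b))


def online_kmeans_step(centroids: List[List[float]], counts: List[int], v: List[float]) -> int:
    """Assign v to its nearest centroid (Manhattan distance) and update that centroid in place.

    The update is the running mean c <- c + (v - c) / n_c, where n_c counts assigned vectors.
    """
    k = min(range(len(centroids)), key=lambda j: manhattan(centroids[j], v))
    counts[k] += 1
    step = 1.0 / counts[k]
    centroids[k] = [c + step * (vi - c) for c, vi in zip(centroids[k], v)]
    return k


centroids, counts = [[0.0, 0.0], [10.0, 10.0]], [1, 1]
print(online_kmeans_step(centroids, counts, [1.0, 1.0]), centroids[0])  # 0 [0.5, 0.5]
```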
Our variation of CORN-K lies in how it deals with empty sets of correlation similar days. If, on day $t$, an expert with market window size $w$ has $C_t(w,\rho) = \emptyset$, we use the cluster to which the current day's market vector was assigned, as created in Algorithm 1. We let the similar set be all days assigned to the same cluster and maximise Equation 3 over this set.
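The fallback can be sketched as follows. By analogy with the correlation case, this sketch assumes that a match on the window ending on day $j$ contributes the following day $j+1$ to the similar set; the text above does not pin this indexing down, so treat it as an interpretation. `labels`, `cluster_similar_days` and `similar_days` are hypothetical names.

```python
from typing import List, Optional


def cluster_similar_days(labels: List[Optional[int]], t: int) -> List[int]:
    """Days following windows whose market vector shares a cluster with day t's vector.

    labels[j] is the cluster of the market vector for the window ending on day j
    (None where no window of this size exists yet).
    """
    target = labels[t]
    return [j + 1 for j in range(t) if labels[j] is not None and labels[j] == target]


def similar_days(corr_days: List[int], labels: List[Optional[int]], t: int) -> List[int]:
    """Use correlation similar days when available, otherwise fall back to the cluster."""
    return list(corr_days) if corr_days else cluster_similar_days(labels, t)


labels = [None, 0, 1, 0, 1]
print(similar_days([], labels, t=4))   # window ending on day 2 matches -> [3]
print(similar_days([2], labels, t=4))  # non-empty correlation set is kept -> [2]
```

The returned days can then be passed to whatever solver maximises Equation 3.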
IV-A Aggressive Multi-Temporal Allocation
We retain the top-$K$ method of selecting the best experts in our algorithm. We also use the concept of market windows ($X_{t-w+1}^{t}$ for a market window of size $w$ at day $t$): matrices whose rows are the price relative vectors of $w$ consecutive market days across all shares.
IV-A1 Agent Memory
At the start, all of the agent's experts have a minimum amount of market history. As the algorithm proceeds, new days are added to the agent's memory until it reaches its maximum size. At that point, the agent forgets all but the most recent days, returning to the minimum amount of history. Together, these two lengths ensure that the agent considers only recent price movements while keeping its portfolio allocations “stable”. A small memory results in a myopic agent that exploits volatility. A large memory results in an agent that looks for “blue chips” that represent long-term growth trends.
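The grow-then-forget memory can be sketched as a list that is truncated whenever it reaches its maximum size. The parameter names `d_min` and `d_max` are hypothetical stand-ins for the minimum and maximum memory lengths, and the toy values below are illustrative only.

```python
from typing import List


def update_memory(memory: List[int], new_day: int, d_min: int, d_max: int) -> List[int]:
    """Append a day; once the memory holds d_max days, keep only the latest d_min."""
    if not 0 < d_min < d_max:
        raise ValueError("expected 0 < d_min < d_max")
    memory.append(new_day)
    if len(memory) >= d_max:
        del memory[: len(memory) - d_min]  # forget all but the most recent d_min days
    return memory


memory = [0, 1, 2]
for day in range(3, 7):
    update_memory(memory, day, d_min=3, d_max=5)
print(memory)  # [4, 5, 6]
```

The memory length therefore follows a sawtooth between `d_min` and `d_max` days, which is what lets the agent periodically shed stale history.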
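Each market window is represented by a market vector with four components: the sum of each stock's mean in the market window (Equation 4), the market window's mean, the sum of each stock's variance (Equation 5) and the variance of the market window.
$$\sum_{i=1}^{m} \bar{x}_i, \qquad \bar{x}_i = \frac{1}{w} \sum_{j=t-w+1}^{t} x_{j,i} \qquad (4)$$
where $i$ represents the $i$-th asset.
$$\sum_{i=1}^{m} \frac{1}{w} \sum_{j=t-w+1}^{t} \left( x_{j,i} - \bar{x}_i \right)^2 \qquad (5)$$
where $\bar{x}_i$ is the average price relative for the $i$-th asset in the given market window.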
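The four-component market vector can be computed as below. The sketch assumes population variances (dividing by $w$), which remain defined for single-day windows; a sample-variance convention would divide by $w-1$ instead. `market_vector` is a hypothetical name.

```python
from typing import List


def market_vector(window: List[List[float]]) -> List[float]:
    """[sum of asset means, window mean, sum of asset variances, window variance]."""
    w, m = len(window), len(window[0])
    means = [sum(row[i] for row in window) / w for i in range(m)]
    variances = [sum((row[i] - means[i]) ** 2 for row in window) / w for i in range(m)]
    flat = [v for row in window for v in row]  # every price relative in the window
    overall_mean = sum(flat) / len(flat)
    overall_var = sum((v - overall_mean) ** 2 for v in flat) / len(flat)
    return [sum(means), overall_mean, sum(variances), overall_var]


print(market_vector([[1.0, 2.0], [3.0, 4.0]]))  # [5.0, 2.5, 2.0, 1.25]
```

Compressing each $w \times m$ window into four numbers keeps the clustering cheap regardless of how many assets or days the window spans.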
We initialise the algorithm by computing market vectors for windows of size $w = 1, \dots, W$, where $W$ is the maximum window size; $W$'s value is taken from the original CORN paper. The initial number of cluster centroids is set as a function of the number of days, and this initial amount was determined using the validation data.
Subsequently, we shift the market window forward by one day and assign each new market vector to its nearest cluster for the corresponding expert of the agent. Note that market vectors of window size $w$ are assigned only to experts with the same window size. We periodically redo the clustering to keep the allocation of market vectors uniform and relevant; the additional centroids allow new, unseen market vectors to be represented. Although this readjustment interval was determined using the validation set, the strategy was found to be insensitive to the interval within a range of values.
Since the K-online algorithm is not guaranteed to converge, we terminate the clustering once reassignments affect only a small fraction of market vectors. This fraction is a manually determined parameter, chosen for the general case in which the assignment of market vectors to clusters proceeds normally. If the threshold is not reached within ten re-initialisation attempts, we use the last readjustment. Capping the number of attempts allows the clustering process to proceed, at the risk of yielding undesirable cluster assignments.
We periodically reset the K-online clustering using the most recent days of data and centroids. Resetting at a fixed interval, rather than continuously, balances factoring in new information against the erratic behaviour that results from continuously switching between asset allocations.
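The stopping rule can be sketched as repeated online passes that end once the fraction of reassigned vectors falls to a threshold, with up to ten random re-initialisations. The threshold `tol`, the per-attempt pass limit `max_passes` and the seed are hypothetical (the tuned values are not reproduced here), and the centroid update reuses the running-mean assumption.

```python
import random
from typing import List, Optional, Tuple


def manhattan(a: List[float], b: List[float]) -> float:
    return sum(abs(u - v) for u, v in zip(a, b))


def k_online(vectors: List[List[float]], k: int, tol: float,
             max_passes: int = 50, max_attempts: int = 10,
             seed: int = 0) -> Tuple[List[Optional[int]], List[List[float]]]:
    """Online K-means with a reassignment-based stopping rule and random restarts."""
    rng = random.Random(seed)
    labels: List[Optional[int]] = []
    centroids: List[List[float]] = []
    for _ in range(max_attempts):
        # Re-initialise: k randomly chosen market vectors become the starting centroids.
        centroids = [list(v) for v in rng.sample(vectors, k)]
        counts = [1] * k
        labels = [None] * len(vectors)
        for _ in range(max_passes):
            changed = 0
            for idx, v in enumerate(vectors):
                c = min(range(k), key=lambda j: manhattan(centroids[j], v))
                if labels[idx] != c:
                    changed += 1
                    labels[idx] = c
                counts[c] += 1
                step = 1.0 / counts[c]
                centroids[c] = [ci + step * (vi - ci) for ci, vi in zip(centroids[c], v)]
            # Stop once only a small fraction of vectors switched cluster in this pass.
            if changed / len(vectors) <= tol:
                return labels, centroids
    # Threshold never reached: keep the last readjustment.
    return labels, centroids


vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
labels, _ = k_online(vectors, k=2, tol=0.0)
print(labels)
```

With two well-separated groups the first three and last three vectors end up in different clusters, although which group receives label 0 depends on the random initialisation.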
Given that the individual (10-day, 120-day and 190-day) agents perform well over their respective time horizons, as shown in Table II, we create an algorithm that combines them. This combination creates an agent with three time horizons, which we define as short (10-day), medium (120-day) and long (190-day). Each time horizon has its own set of experts associated with it, and the clustering algorithm is repeated for each sub-agent using its own memory length. Here, a sub-agent is an agent over a specific time horizon. We take the portfolio for each time horizon and normalise it so that it represents the proportion of the sub-agent's wealth allocated to each asset at day $t$. These portfolios are summed, and the result is divided by the number of time horizons under consideration (here, three). The resultant portfolio may hold a diverse range of assets, which should reduce risk whilst maintaining a high expected return. This idea of efficient diversification is grounded in Markowitz's (1952) seminal paper Portfolio Selection.
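The merging step amounts to normalising, summing and averaging, as in the sketch below. The three allocations are toy numbers, not outputs of the sub-agents, and `merge_horizons` is a hypothetical name.

```python
from typing import List


def merge_horizons(sub_portfolios: List[List[float]]) -> List[float]:
    """Normalise each sub-agent's allocation, sum them and divide by the number of horizons."""
    h = len(sub_portfolios)
    m = len(sub_portfolios[0])
    merged = [0.0] * m
    for b in sub_portfolios:
        total = sum(b)
        for i in range(m):
            merged[i] += b[i] / total  # share of this sub-agent's wealth in asset i
    return [v / h for v in merged]


short, medium, long_ = [1.0, 0.0, 0.0], [0.0, 2.0, 2.0], [0.5, 0.5, 0.0]
print(merge_horizons([short, medium, long_]))  # approximately [0.5, 0.333, 0.167]
```

Because each normalised allocation sums to one, dividing the sum by the number of horizons yields a valid portfolio in which every horizon has equal say.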
In comparing our algorithm to similar approaches, we use the following metrics that aim to measure performance in a generalised manner.
Maximum Drawdown (MDD) 
MDD is a risk evaluation metric that represents the maximum decline of the total wealth $S_t$ from a historical peak, up to time $n$:
$$\mathrm{MDD}_n = \max_{t \in [1,n]} \max_{\tau \in [1,t]} \frac{S_\tau - S_t}{S_\tau} \qquad (6)$$
The smaller the MDD value, the less risky the trading strategy.
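MDD can be computed in a single pass over the wealth curve by tracking the running peak, as in this sketch. `max_drawdown` is a hypothetical helper and the wealth values are illustrative.

```python
from typing import Sequence


def max_drawdown(wealth: Sequence[float]) -> float:
    """Largest relative decline of the wealth curve from a running peak."""
    peak = wealth[0]
    mdd = 0.0
    for s in wealth:
        peak = max(peak, s)                 # highest wealth seen so far
        mdd = max(mdd, (peak - s) / peak)   # drawdown relative to that peak
    return mdd


print(max_drawdown([1.0, 1.2, 0.9, 1.1, 0.6, 1.5]))  # approximately 0.5 (1.2 down to 0.6)
```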
Annualised Percentage Yield (APY) 
$$\mathrm{APY}_n = \left(S_n\right)^{1/y} - 1 \qquad (7)$$
Here $S_n$ is the total return after $n$ trading periods, and $y$ is the number of years corresponding to $n$. APY measures the rate of return achieved and takes into account the effect of compounding. A greater APY is typically desired.
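A sketch of the APY computation follows, assuming the total return is expressed as a wealth multiple ($S_n/S_0$) and that $y$ is the number of trading days divided by 252. The function name is hypothetical.

```python
def annualised_percentage_yield(total_return: float, n_days: int,
                                days_per_year: int = 252) -> float:
    """APY = S_n^(1/y) - 1, with y = n_days / days_per_year years."""
    years = n_days / days_per_year
    return total_return ** (1.0 / years) - 1.0


print(annualised_percentage_yield(1.21, n_days=504))  # about 0.10 over two years
```

A 21% total gain over two years therefore corresponds to roughly 10% per year once compounding is taken into account.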
Annualised Sharpe Ratio (ASR)
$$\mathrm{ASR}_n = \frac{\mathrm{APY}_n - R_f}{\sigma_p} \qquad (8)$$
Here $\mathrm{ASR}_n$ represents the annualised Sharpe Ratio after $n$ periods, $\mathrm{APY}_n$ is the Annualised Percentage Yield (Equation 7), $R_f$ is the risk-free rate of return and $\sigma_p$ is the annualised standard deviation of daily returns. We use the same assumptions as DRICORN-K to calculate Equation 8: $R_f$ is set to 4% and the daily standard deviation is annualised by a factor of $\sqrt{252}$, assuming an average of 252 trading days in a year. The Sharpe Ratio captures the “return per unit of risk”; a higher ASR is preferred.
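The ASR can be sketched from a daily wealth series as below, using $R_f = 4\%$ and $\sqrt{252}$ annualisation. Whether the daily standard deviation uses the sample or population convention is not stated above; this sketch assumes the sample version (`statistics.stdev`). The function name is hypothetical.

```python
import math
import statistics
from typing import Sequence


def annualised_sharpe(wealth: Sequence[float], rf: float = 0.04,
                      days_per_year: int = 252) -> float:
    """(APY - R_f) / sigma_p, with sigma_p the annualised std of daily returns."""
    daily = [b / a - 1.0 for a, b in zip(wealth, wealth[1:])]
    years = len(daily) / days_per_year
    apy = (wealth[-1] / wealth[0]) ** (1.0 / years) - 1.0
    sigma_p = statistics.stdev(daily) * math.sqrt(days_per_year)  # sample std, annualised
    return (apy - rf) / sigma_p


# One year of alternating +1% / -1% days: volatile, slightly losing, below the risk-free rate.
path = [1.0]
for d in range(252):
    path.append(path[-1] * (1.01 if d % 2 == 0 else 0.99))
print(annualised_sharpe(path))  # roughly -0.33
```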
IV-C Training and Validation
The data sets used are given in Table I. Asset prices were adjusted for dividends and stock splits. Validation and Testing denote the validation and testing data sets respectively. All set lengths are given in years, taking 252 days as the average number of trading days in a year. The sets consisted of 6000, 5040 and 2520 days for training, validation and testing respectively.
To demonstrate the effect of memory, we trained agents with memory lengths between 10 and 230 days in intervals of ten. Table II presents a subset of the results for the best-performing memory lengths in various periods.
It was observed anecdotally that in markets with high volatility, where the best-performing asset constantly changed, the 10-day agent performed best. In markets with a consistent best stock over a long period, the 190-day agent performed better. The 120-day agent is a “middle-of-the-road” agent that yielded a better overall strategy, as shown by its performance across the presented metrics.
Each testing data set consists of one year of data, with 300 prior days on which the CORN-based strategies can train. This extra data gives the CORN-based strategies a fairer comparison against AMA-K. The CORN-based algorithms were tuned with the optimal hyper-parameters set out in their respective papers [3, 5, 4]. For our implementation, we reserve a number of days equal to the largest sub-agent's memory for each algorithm; here, this is 190 days. We compare our approach with common baselines: UBAH (uniform buy-and-hold), CRP (constant rebalanced portfolio) and Best Stock. We also compare our method with EG (Exponential Gradient) as a showcase of a Follow-The-Winner strategy. In EG, the learning rate $\eta$ was held fixed.
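For reference, one step of the Exponential Gradient update of Helmbold et al. (1998) can be sketched as follows: each weight is scaled by $\exp(\eta\, x_{t,i} / \mathbf{b}_t^{\top}\mathbf{x}_t)$ and the result is renormalised, which shifts capital towards the day's winners. The value of `eta` used below is hypothetical and is not the value used in our experiments.

```python
import math
from typing import List


def eg_update(b: List[float], x: List[float], eta: float) -> List[float]:
    """One Exponential Gradient step: b_i <- b_i * exp(eta * x_i / (b . x)), renormalised."""
    growth = sum(bi * xi for bi, xi in zip(b, x))
    weights = [bi * math.exp(eta * xi / growth) for bi, xi in zip(b, x)]
    total = sum(weights)
    return [wi / total for wi in weights]


b = [0.5, 0.5]
b = eg_update(b, [1.1, 0.9], eta=0.05)  # eta is a hypothetical value
print(b)  # the winning first asset now has weight slightly above 0.5
```

A small $\eta$ keeps the portfolio close to its previous allocation, which is why EG behaves as a cautious Follow-The-Winner strategy.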
V Results and Discussion
In Tables III, IV and V, the number next to a stock exchange indicates the data set it came from. The mean ($\mu$) is the average of the metric across all markets and $\sigma$ is its standard deviation.
V-A Individual Metric Performance
As seen in Table III, the best-performing strategies are EG and AMA-K, with EG outperforming AMA-K by a considerable margin on average. Note that AMA-K achieves a low standard deviation for its MDD. Despite this lower standard deviation, AMA-K had the greatest MDD value; hence, it represents the riskiest strategy. This is expected, given that AMA-K is extremely aggressive in its portfolio allocation. Furthermore, AMA-K performed the worst in the two markets that have periods of flatness, receiving an MDD value two and three times greater than that of the other strategies (in general) for NAS-1 and JSE-2 respectively.
In Table IV, AMA-K had the best performance: even in its worst case (NAS-1), it achieved a 38.87% better annualised percentage yield for the period than the second-best strategy. AMA-K had the highest mean APY, although its standard deviation is fairly large. The second-best strategies are CORN-K and DRICORN-K, which had nearly half the mean APY of AMA-K. AMA-K's advantage over CORN-K and DRICORN-K results from its searching for portfolios on days on which these CORN-based strategies' experts would have returned uniform portfolios. AMA-K turned enough of these days into profitable trading days, as reflected by its mean APY.
Based on Table V, the best-performing algorithm is AMA-K. The benefits of AMA-K are illustrated by JSE-1, where AMA-K performs four times better than the second-best algorithm. We therefore conclude that, per unit of risk and with a risk-free rate of 4%, AMA-K's higher returns compensate for its higher risk, illustrating the risk-reward trade-off at play.
V-B General Performance
Looking at the cumulative returns of the algorithms on various markets, we see that the approaches follow different patterns. For example, for NAS-1 in Figure 4, the general market trend is upwards, and AMA-K performs best. CORN-K and DRICORN-K, which performed identically, are second-best. For the first 100 days, AMA-K's drawdown risk is evident and it performs the worst. AMA-K fluctuates heavily in this market, going from a low at day 68, after losing 7.8% of its total wealth in four days, to being the best-performing algorithm at the end.
In Figure 3, the market stays relatively flat and then decreases slightly over the first 100 days. Comparing our 10-day, 120-day and 190-day sub-agents with AMA-K, we see that AMA-K cannot beat the best-performing sub-agent. AMA-K's other sub-agents also did well over the period, and all sub-agents performed well during the initial 55 days. The 10-day agent is clearly more volatile and loses more of its cumulative wealth than the other agents. Towards the end of the period, the 120-day agent outperforms all other strategies. AMA-K benefits from this sub-agent, but the benefit is dampened by the other sub-agents.
When CORN-K's and DRICORN-K's experts detect enough similarity, their performance is close to ours, as can be seen in Figure 3. Although our approach made considerable gains early on, CORN-K and DRICORN-K identified an asset that our approach could not, and as a result these algorithms far outpaced our method. The asset that CORN-K and DRICORN-K identified was most likely drawn from a period that our agents had already forgotten.
In general, our approach is competitive and builds on CORN-based strategies to produce further gains by searching for optimal portfolios on days on which CORN-based strategies would not. Although the risk presented by our algorithm, as shown in Figure 4, can be significant, it is an example of a risk-reward trade-off.
VI Conclusion and Further Development
Our memory-based approach can generate high returns. The individual sub-agents can be volatile, and merging them typically results in a more stable strategy (as shown in Tables VI, VII and VIII), at the cost of reducing the returns of the best agents. Possible areas for further development include: changing how we search for the optimal portfolio to include regret or to penalise volatility, yielding lower-risk strategies; performing a fine-grained grid search for the optimal amount of memory for the agents in different periods; and testing AMA-K with a variety of combinations of agents with different time horizons. Lastly, further testing with other well-known or niche financial metrics should be conducted to better understand the method's performance relative to other modern approaches.
-  B. Li and S. C. H. Hoi, “Online portfolio selection: A survey,” ACM Computing Surveys, vol. 46, 2014.
-  L. Györfi, G. Lugosi, and F. Udina, “Nonparametric kernel-based sequential investment strategies,” Mathematical Finance, vol. 16, pp. 337–357, 2006.
-  B. Li, S. C. H. Hoi, and V. Gopalkrishnan, “CORN: Correlation-driven nonparametric learning approach for portfolio selection,” ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 21–29, 2011.
-  Y. Wang, D. Wang, and T. Zheng, “RACORN-K: Risk-aversion pattern matching-based portfolio selection,” Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1816–1820, 2018.
-  S. Sooklal, T. van Zyl, and A. Paskaramoorthy, “DRICORN-K: A dynamic risk correlation-driven non-parametric algorithm for online portfolio selection,” Proceedings of the First Southern African Conference for AI Research, pp. 183–196, 2020.
-  S.-H. Liao, H.-H. Ho, and H.-W. Lin, “Mining stock category association and cluster on Taiwan stock market,” Expert Systems with Applications, vol. 35, no. 1, pp. 19–29, 2008.
-  S. K. Kumari, P. Kumar, J. Priya, S. Surya, and A. K. Bhurjee, “Mean-value at risk portfolio selection problem using clustering technique: A case study,” AIP Conference Proceedings, vol. 2112, no. 1, p. 020178, 2019.
-  M. Khedmati and P. Azin, “An online portfolio selection algorithm using clustering approaches and considering transaction costs,” Expert Systems with Applications, vol. 159, p. 113546, 2020.
-  P. Zuccolotto and G. De Luca, “Dynamic tail dependence clustering of financial time series,” Statistical Papers, vol. 58, Sep. 2017.
-  S. Nanda, B. Mahanty, and M. Tiwari, “Clustering Indian stock market data for portfolio management,” Expert Systems with Applications, vol. 37, no. 12, pp. 8793–8798, 2010.
-  H. Markowitz, “Portfolio selection,” The Journal of Finance, vol. 7, pp. 77–91, 1952.
-  M. Magdon-Ismail and A. Atiya, “Maximum drawdown,” Risk Magazine, vol. 10, pp. 99–102, 2004.
-  E. J. Elton, M. J. Gruber, S. J. Brown, and W. N. Goetzmann, Modern Portfolio Theory and Investment Analysis. J. Wiley & Sons, 2003.
-  W. F. Sharpe, “The Sharpe ratio,” The Journal of Portfolio Management, vol. 21, no. 1, pp. 49–58, 1994.
-  D. P. Helmbold, R. E. Schapire, Y. Singer, and M. K. Warmuth, “On-line portfolio selection using multiplicative updates,” Mathematical Finance, vol. 8, pp. 325–347, 1998.