
Exploring Coevolutionary Dynamics of Competitive Arms-Races Between Infinitely Diverse Heterogenous Adaptive Automated Trader-Agents

We report on a series of experiments in which we study the coevolutionary "arms-race" dynamics among groups of agents that engage in adaptive automated trading in an accurate model of contemporary financial markets. At any one time, every trader in the market is trying to make as much profit as possible given the current distribution of different other trading strategies that it finds itself pitched against in the market; but the distribution of trading strategies and their observable behaviors is constantly changing, and changes in any one trader are driven to some extent by the changes in all the others. Prior studies of coevolutionary dynamics in markets have concentrated on systems where traders can choose one of a small number of fixed pure strategies, and can change their choice occasionally, thereby giving a market with a discrete phase-space, made up of a finite set of possible system states. Here we present first results from two independent sets of experiments, where we use minimal-intelligence trading-agents but in which the space of possible strategies is continuous and hence infinite. Our work reveals that by taking only a small step in the direction of increased realism we move immediately into high-dimensional phase-spaces, which then present difficulties in visualising and understanding the coevolutionary dynamics unfolding within the system. We conclude that further research is required to establish better analytic tools for monitoring activity and progress in co-adapting markets. We have released relevant Python code as open-source on GitHub, to enable others to continue this work.



1 Introduction

In the past 20 years most of the world’s major financial markets have seen a sharp rise in the level of automated trading on those markets, with many human traders being replaced by adaptive algorithmic “robot traders” at the point of execution. Although this has been a significant shift, affecting both patterns of employment and the dynamics of the markets concerned, it can plausibly be argued that at a macro-level little has changed: these major markets are still populated by traders working on behalf of major financial institutions such as investment banks or fund-management companies; the difference is just that now those institutions are represented in the markets not by teams of human traders but by teams of robots. To be more precise, it is more often the case that within any one institution entire teams of human traders have been replaced by a single monolithic automated trading system that does the work previously performed by tens of hard-working human traders.

The success or failure of any one automated trading system is determined primarily by how much profit it can generate, but underlying that simple observation is a circularity. In any realistic market scenario, the profitability of a given robot trader $R_1$ will be determined at least in part by the extent to which its actions in the market are well-tuned to the likely reactions of other traders in that market. Thus, in contemporary markets, $R_1$ is likely to be designed to adapt its trading behavior to the current market circumstances, and yet those circumstances are significantly determined by the behavior of other traders in the market, most of which are robots $R_2$, $R_3$, and so on, each of which is itself adapting to the circumstances it experiences in the market, which are to some extent influenced by the actions and reactions of $R_1$.

In the natural world, in the Darwinian survival-of-the-fittest interactions among evolving species of organisms, exactly this kind of circular interaction and dependency is commonplace. Just as the profit-driven adaptive trading behavior of robot $R_1$ can be affected by the profit-driven adaptive trading behavior of $R_2$, and vice versa, so the reproductive fitness of a predator animal (a cheetah, say) is determined to some extent by how well adapted it is to catching its prey, and the reproductive fitness of individuals in its prey species (antelopes, say) is starkly dependent on the extent to which they are adapted to evade being caught by their predators. If a mutation in the predator species gives rise to individuals that can run faster for longer when chasing prey, perhaps that will subsequently be countered by the prey species evolving to turn more sharply or to jump higher or farther than the predator can deal with. Similarly if the prey species happens to evolve sharper eyesight so they can better see the predator coming, perhaps the predator species will then evolve to exhibit stealthier ways of tracking their food. This circular arms-race dynamic, where evolutionary adaptations in one species $A$ are driven by the current distribution of genes in one or more other species $B$, $C$, and so on, and where in turn evolutionary adaptations in one or more of those other species $B$, $C$, etc. are driven by the current distribution of genes in species $A$, is known technically as coevolution. Theoretical biologists have studied coevolution for many years, and have developed various game-theoretical analyses that give insights on the dynamics of the arms-races between competitively coevolving species (see e.g. [33, 28, 49, 25]).

In this paper we report on empirical simulation studies for which the starting-point draws direct inspiration from those theoretical biology studies of coevolutionary dynamics. Our motivation here is to try to better understand, to gain insights into, the practical extent to which the various adaptive trading systems in a market are affecting each other, and specifically to investigate whether the population of adaptive traders is ever likely to converge on a situation where all of the traders are well-adapted to each others’ behavior yet each trader is not as profitable as it could otherwise be. That is: could the competitive interactions and adaptations of traders in the market collectively converge on a stable set of trading behaviors that are sub-optimal? And, if so, can we recognize when that has happened, or when it is about to happen? Similarly, might we be able to identify when the coevolutionary dynamics are about to lead to a flash-crash? We have commenced a sequence of empirical studies, starting with minimal but realistic simulations that are principled approximations of present-day highly automated financial markets.

Specifically, our ultimate aim has been to create agent-based models (ABMs) involving agents where each agent represents one financial legal entity (i.e. either an individual independent trader or an institution such as a bank or fund-management company) operating a single profit-driven automated trading system that trades in competition with the other agents, in an electronic market operating a continuous double auction (CDA) with a limit order book (LOB: see e.g. [24, 37, 1]), which is the present-day situation in many of the world’s major financial markets. Each entity can in principle be adapting its strategy/behavior in real-time (e.g. using a machine learning mechanism) but is not required to do so. That is, an entity’s trading strategy can be non-adaptive if that is the more profitable option. Furthermore at any time the entity can elect to totally change the strategy that it is operating, modelling the case where a financial institution switches a trading algorithm that has previously been in development and testing (commonly referred to as a dev algo) into full use, the dev algo replacing the previously-running production algorithm (commonly known as the prod algo). Thus each trading entity in our model internally maintains a minimum of two strategies, each of which could potentially be adaptive: a prod algo and a dev algo. When the agent’s dev algo replaces the prod, a new dev algo is created and is subsequently tested and refined until there is sufficient evidence that it is an improvement on the agent’s current prod algo, at which point the dev again replaces prod and then another new dev is created. The trade-off between exploiting the prod algo and exploring the dev algo has manifest links to studies of multi-armed bandit problems (see e.g. [36, 31, 42]). We report here on the construction of two simulation models of this kind of system and on results from hundreds of thousands of simulated market sessions.

Simulation modelling of financial markets very often involves populating a market mechanism with some number of trader-agents: autonomous entities that have “agency” in the sense that they are empowered to buy and/or sell items within the particular market mechanism that is being simulated: this approach, known as agent-based computational economics (ACE: see e.g. [26]), has a history stretching back for more than 30 years, and much of the work in ACE studies of trading behaviours in models of financial markets owes a clear intellectual debt to work in experimental economics as pioneered by Vernon Smith (see e.g. [43, 44, 27, 45, 38]).

Over the multi-decade history of ACE, a small number of specific trader-agent algorithms, i.e. precise mathematical and algorithmic specifications of particular trading strategies, have been frequently used for modelling various aspects of financial markets, and the convention that has emerged is to refer to each such strategy via a short sequence of letters, reminiscent of a stock-market ticker-symbol. Notable trading strategies in this literature include (in chronological sequence): SNPR [41], ZIC [23], ZIP [4], GD [22], RE [18], MGD [48], GDX [47], HBL [21], and AA [52]; several of which are explained in more detail later in this paper. Of these, ZIC (invented by the economists Gode & Sunder [23]) is notable for being both highly stochastic and extremely simple, and yet it gives surprisingly human-like market dynamics; GD and ZIP were the first two strategies to be demonstrated as superior to human traders, a fact established in a landmark paper by IBM researchers [12] (see also [14, 15, 16]), which is now commonly pointed to as initiating the rise of algorithmic trading in real financial markets; and until very recently AA was widely considered to be the best-performing strategy in the public domain. ZIC was the first instance of a zero-intelligence trading strategy, a class of strategies that has proven to be surprisingly useful in ACE research: see, e.g., [19, 30]. With the exception of SNPR and ZIC, all later strategies in this sequence are adaptive, using some kind of machine learning (ML) or artificial intelligence (AI) method to modify their responses over time, better-fitting their trading behavior to the specific market circumstances that they find themselves operating in, and details of these algorithms were often published in major AI/ML conferences and journals.

The supposed dominance of AA has recently been questioned in a series of publications [51, 7, 46, 40, 9] which demonstrated AA to be less robust than was previously thought. Most notably, [40, 9] report on trials where AA is tested against two minimally simple algorithms that each involve no AI or ML at all: these two strategies are known as GVWY and SHVR [5, 6], and each shares the pared-back minimalism of Gode & Sunder’s ZIC mechanism. In the studies that have been published thus far, depending on the circumstances, it seems (surprisingly) that GVWY and SHVR can each outperform not only AA but also many of the other AI/ML-based trader-agent strategies in the set listed above. Given this surprising recent result, there is an appetite for further ACE-style market-simulation studies involving GVWY and SHVR. One compelling issue to explore is the coevolutionary dynamics of markets populated by traders that can choose to play one of the three strategies GVWY, SHVR, and ZIC, in a manner similar to that studied by [53], who employed replicator dynamics modelling techniques borrowed from theoretical evolutionary biology to explore the coevolutionary dynamics of markets populated by traders that could choose between SNPR, ZIP, and GD. In that approach, each trader plays its chosen strategy for as long as it seems (to that trader) to be the most profitable strategy, occasionally switching to (or “replicating”) one of the other two strategies in the set if the current strategy appears (to that trader) to be weak. This replicator dynamics approach was also used in [52] to argue that AA was dominant over prior leading strategies, and in [51] to demonstrate that AA could in fact be dominated by other strategies.

Replicator dynamics studies are typically limited to visualising and analysing the coevolutionary dynamics of simple, restricted systems where the restrictions are introduced to constrain the systems in such a way that they can be easily visualised and analysed. For instance, replicator dynamics studies often involve studying a population of agents that can switch between two, three, or at most four distinct pure strategies, and this decision often seems driven by the fact that visualisation of the dynamics, characterising the entire system dynamics, is often best done by reference to the system’s phase space, i.e. to plot some factor of interest for every possible state of the system. Let $S$ be the set of distinct pure strategies that the agents in our system can choose between, let $k = |S|$, and let $s_i$ refer to the $i$th of those strategies. Also let $N$ be the number of agents in the system, each of which makes a choice of some $s_i \in S$. Such a system can be characterised in full, all possible points in its finite phase space enumerated and plotted, by considering each possible combination of allowable strategy choices or assignments made by the population of agents: if all the agents have the same choice, and each can choose any of the $k$ strategies, then the number of possible system states, the number of points in its phase space, is $k^N$, a number that may grow large but will forever be finite.

When $k=2$, the system phase space can be characterised as points on a line, spanning from all agents playing $s_1$, through to a 50:50 mix of $s_1$:$s_2$, to all agents playing $s_2$. When $k=3$, the phase space can be characterised and visualised as points on the 2D unit simplex, an equilateral triangle where a point within or on the perimeter of the triangle represents a particular ratio of $s_1$:$s_2$:$s_3$, plotted in a barycentric coordinate frame. Technically, the one-dimensional (1D) line used for the phase-space of a $k=2$ system is a 2D unit simplex; the 3D unit simplex is a 2D triangle; and then the 4D unit simplex is a 3D object, a tetrahedron, the volume bounded by four planar faces each being an equilateral triangle. Higher-dimensional simplices are mathematically well-formed objects, but they are devils to visualise: try plotting the 40D unit simplex. Although the original authors do not explicitly state their reasons, it seems reasonable to conclude that each of [53, 52] and [51] chose to study replicator dynamics systems in which $k=3$ and not any higher number because of the rapidly escalating difficulty of visualising the phase space for any higher value. Yet real-world markets do not involve all entities each selecting from a choice of two or three pure trading strategies, so there is then a major concern over the extent to which these studies adequately capture the much richer degree of heterogeneity in real-world markets: this brings to mind the old adage about the late-night drunkard looking for his lost house-keys under a streetlamp not because that is where he mislaid them, but because the light is better there.
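The counting argument and the barycentric plotting described above can be made concrete with a minimal sketch. It assumes that agents are distinguishable, so that a system of `num_agents` agents each choosing one of `num_strategies` pure strategies has `num_strategies ** num_agents` states, and it assumes one particular (arbitrary) placement of the triangle's corners. The function names are illustrative and do not come from the cited studies.

```python
import math


def num_system_states(num_strategies: int, num_agents: int) -> int:
    """Count assignments of one pure strategy to each distinguishable agent."""
    return num_strategies ** num_agents


def simplex_to_xy(proportions):
    """Map a three-way strategy mix onto an equilateral triangle.

    Assumed corner placement: strategy 1 at (0, 0), strategy 2 at (1, 0),
    strategy 3 at (0.5, sqrt(3)/2). The input may be counts or fractions.
    """
    a, b, c = proportions
    total = a + b + c
    a, b, c = a / total, b / total, c / total
    # Barycentric combination of the three corner positions; the first
    # corner sits at the origin, so weight a contributes nothing.
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return x, y


print(num_system_states(3, 4))     # 81 states for 4 agents and 3 strategies
print(simplex_to_xy((1, 1, 1)))    # the centroid of the triangle
```

Even this toy version shows why the approach does not scale: the mapping is only defined for three strategies, and each extra strategy adds a dimension to the simplex.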

So, although one way of studying coevolutionary dynamics in markets where the traders can choose to either deploy GVWY, SHVR, or ZIC is to give each trader a discrete choice of one from that set of three strategies, so at any one time any individual trader is either operating according to GVWY or SHVR or ZIC, it is appealing to instead design experiments where the traders can continuously vary their trading strategy, exploring a potentially infinite range of differing trading strategies, where the space of possible strategies includes GVWY, SHVR, and ZIC. This is made possible by the recent introduction of a new minimal-intelligence trading strategy called PRZI [8]. PRZI’s trading behavior is determined by a strategy parameter $s \in [-1,+1] \subset \mathbb{R}$. When $s=0$, the trader behaves identically to ZIC, and when $s=\pm 1$ it behaves the same as GVWY or SHVR. And, crucially, when a PRZI trader’s $s$-value is some other value, either part-way between $-1$ and $0$ or part-way between $0$ and $+1$, its trading behavior is a kind of hybrid, part-way between that of ZIC and SHVR, or part-way between ZIC and GVWY. Because the PRZI strategy-parameter is a real number, and its effect on the trading behavior is smooth and continuous, in principle any one PRZI trader can make microscopically small adjustments and hence the space of possible strategies available to a single PRZI trader is infinite, and the phase-space of a market of $N$ agents is a bounded volume within $\mathbb{R}^N$.

In Section 2 we discuss our experiences in working with populations of coevolving PRZI traders, where we immediately come up against the limits of applicable visualisation techniques for this type of dynamical system. While markets of PRZI traders allow for continuous and infinite heterogeneity in the population of agents, the bounded nature of the PRZI strategy-space is a limitation that reduces the realism of the model. To address this, we have commenced work on an unboundedly infinite system, where each coevolving trader’s strategy can in principle grow to be arbitrarily complex and sophisticated (that is, in principle they can be anything that is expressible as a program in a Turing-complete list-based functional programming language), which we discuss in Section 3. For all our simulation studies reported here, we use the BSE simulator of a CDA market with a LOB (see [5, 6]), a mature open-source platform for ACE studies of electronic markets with automated trading.

2 Coevolution in a Bounded Infinite Space: PRZI

Full details of our initial work with coevolving populations of PRZI traders are given in [2], of which this section is only a very brief summary.

As a first illustration, we set up a minimal coevolutionary system, one in which only two of the traders could change their strategy by altering their PRZI $s$-value. Let’s refer to these two traders as $T_1$ and $T_2$, with strategy values $s_1$ and $s_2$ respectively: the two are independent, so $T_1$ can set its strategy value $s_1$ regardless of the value $s_2$ chosen by $T_2$, and vice versa. We set $T_1$ to be a buyer, and we set $T_2$ to be a seller and hence, because any seller in the market needs to find a buyer as a counter-party and vice versa, the profitability of $T_1$’s choice of $s_1$ will be partially dependent on $T_2$’s choice of $s_2$, and vice versa. We take the natural step of treating profitability as ‘fitness’ in the evolutionary sense, and hence this system is as simple as we can get while still being coevolutionary.

For the adaptation process, each adaptive trader operates a simple Adaptive Climber (AC) algorithm defined in [2], which echoes the dev/prod development cycle discussed in the previous section: the trader maintains two separate strategies, two different PRZI $s$-values, referred to as $s_{\text{prod}}$ and $s_{\text{dev}}$. Initially, $s_{\text{prod}}$ is set to some value, and $s_{\text{dev}}$ is set to a ‘mutated’ version of $s_{\text{prod}}$, by adding a small random value (e.g. a sample from a uniform distribution over a small range). The AC method executes some number $n$ of trades using strategy $s_{\text{prod}}$ and then executes $n$ trades using strategy $s_{\text{dev}}$. After that, if the profitability of $s_{\text{prod}}$ is greater than that of $s_{\text{dev}}$ then the trader generates a new $s_{\text{dev}}$; but if the profitability of $s_{\text{dev}}$ exceeds that of $s_{\text{prod}}$ then $s_{\text{dev}}$ is used to replace $s_{\text{prod}}$, and then a new $s_{\text{dev}}$ is generated as a mutant value of the new $s_{\text{prod}}$. That is, AC is a minimally simple two-point stochastic hill-climber algorithm.
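The two-point hill-climber just described can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation from [2]: the names `s_prod`, `s_dev`, `mutate`, and `evaluate` are hypothetical, the mutation step size of 0.05 is an arbitrary choice, and clipping to $[-1,+1]$ assumes the PRZI strategy range. The `evaluate` callback stands in for running a fixed number of trades in the market and returning the resulting profit.

```python
import random


def mutate(s, step=0.05, rng=random):
    """Return s plus uniform noise, clipped to the assumed range [-1, +1]."""
    return max(-1.0, min(1.0, s + rng.uniform(-step, step)))


def adaptive_climber(evaluate, s_init, n_cycles, step=0.05, rng=random):
    """Two-point stochastic hill-climber over a single strategy value.

    evaluate(s) is a hypothetical callback returning the profit earned
    while trading with strategy value s for a fixed number of trades.
    """
    s_prod = s_init
    s_dev = mutate(s_prod, step, rng)
    for _ in range(n_cycles):
        profit_prod = evaluate(s_prod)
        profit_dev = evaluate(s_dev)
        if profit_dev > profit_prod:
            s_prod = s_dev  # dev is promoted to prod
        # Whether or not dev was promoted, a fresh dev is generated
        # as a mutant of the current prod.
        s_dev = mutate(s_prod, step, rng)
    return s_prod


# Toy usage: a stand-in profit function with a single peak at s = 0.5.
best = adaptive_climber(lambda s: -(s - 0.5) ** 2, s_init=0.0,
                        n_cycles=500, rng=random.Random(0))
print(best)
```

In a real market the profit returned for a given strategy value is noisy and depends on what every other trader is doing at that moment, which is exactly what makes the system coevolutionary; the deterministic toy function here hides that.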

Figure 1 shows a quiver plot of the phase space for an instance of this system, in which initial values of $s$ are set at random from a uniform distribution for all traders. One of the two adaptive traders is designated as a buyer with strategy $s_1$, and the other as a seller with strategy $s_2$. Both the buyer and the seller can adjust their strategy value over time, using the AC method just described. The horizontal axis is the buyer’s $s_1$ value and the vertical is the seller’s $s_2$, and these two continuous values define the system’s phase-space, i.e. $(s_1, s_2) \in [-1,+1]^2$. Uniform-length vectors have been plotted at regular intervals giving a discrete grid that indicates the system’s direction of travel in phase-space. The phase-space has a single point-attractor, the point of convergence marked by a red dot, and an obvious plateau area close to the origin: within the plateau area, the system will exhibit random drift, and will eventually step outside the plateau; once outside the plateau, the system evolves toward the attractor.

The 2D quiver plot in Figure 1 is made possible because we constrained our system to only have two adaptive traders. As soon as we relax that constraint and have all $N$ agents in our system adapting and coevolving against all the others, we need to make an $N$-dimensional plot. Given that we routinely use values of $N$ of 50 or more, and that 50-dimensional quiver plots are not easy to plot or understand, this mode of visualization runs out of steam as soon as $N$ gets to plausibly interesting numbers.

Figure 1: Quiver plot over the phase-space for a minimal coevolutionary market populated wholly by PRZI traders in which only two of the traders are each independently adapting their strategy-values, while all other traders in the market hold their values constant. See text for further explanation and discussion.

A coevolutionary market system with $N$ adaptive traders is an instance of an $N$-dimensional dynamical system, and a popular method of characterising the dynamics of high-dimensional dynamical systems is the recurrence plot (RP: see e.g. [17, 32]). This purely graphical technique can be extended by various quantitative methods known collectively as recurrence quantification analysis (RQA: see [54]). As is discussed at length in [2], we have explored the use of RPs and RQA for visualising and analysing our coevolutionary PRZI markets.

In brief, for our purposes a RP visualization of an $N$-dimensional real-valued dynamical system is a rectangular grid of square binary pixels, i.e. pixels that are in one of two states: often either black or white. Let $\mathbf{x}(t)$ be the state of the system at time $t$. A pixel is shaded black to represent that $\mathbf{x}(t)$ has recurred at time $t$, i.e. has previously been seen at some earlier time $t' < t$, and is shaded white otherwise. Recurrence can be defined in various ways, but the simplest is to take the $N$-dimensional Euclidean distance $d = \lVert \mathbf{x}(t) - \mathbf{x}(t') \rVert$ and to declare recurrence to have occurred if $d$ is less than some threshold value. The co-ordinates for each pixel, each cell, in a RP are set by its values of $t$ and $t'$. Figure 2 shows a RP for one instance of our coevolving market of PRZI traders: there is nontrivial structure in the plot, which is subjected to further detailed RQA analysis in [2].
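As a minimal sketch of the thresholded-distance definition of recurrence, the following builds the binary matrix for a short trajectory and computes the recurrence rate, one of the simplest RQA measures. It uses the standard square, symmetric layout in which both axes index time; the plots in [2] may be laid out differently, and the function names and toy trajectory are illustrative assumptions.

```python
import math


def recurrence_matrix(states, threshold):
    """Return R where R[i][j] is 1 if states i and j are within threshold."""
    n = len(states)
    return [[1 if math.dist(states[i], states[j]) < threshold else 0
             for j in range(n)]
            for i in range(n)]


def recurrence_rate(matrix):
    """Fraction of off-diagonal pixels that are recurrent."""
    n = len(matrix)
    if n < 2:
        return 0.0
    hits = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return hits / (n * (n - 1))


# Toy trajectory in a 2D strategy space that revisits two neighbourhoods.
trajectory = [(0.0, 0.0), (1.0, 0.0), (0.05, 0.0), (1.0, 0.05)]
rp = recurrence_matrix(trajectory, threshold=0.1)
for row in rp:
    print("".join("#" if cell else "." for cell in row))
print(recurrence_rate(rp))
```

The method depends entirely on having a meaningful distance between system states, which is straightforward when each state is a vector of real-valued strategy parameters.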

Figure 2: Recurrence plot for a market of coevolving PRZI traders. See text for further explanation.

In our work with coevolving PRZI traders, merely by allowing each zero-intelligence trader to have adaptive control of its single real-valued strategy parameter, for a market populated by $N$ such traders, we have an $N$-dimensional phase space, a bounded hypercubic volume $[-1,+1]^N \subset \mathbb{R}^N$, and monitoring the system’s temporal evolution within that hypercube becomes immediately problematic. Analysis methods based on RPs and RQAs, an approach currently popular and productive in many fields, get us only so far toward our ultimate aim of being able to understand what the system is doing and where it is going (as documented in [2]) and, unfortunately, they do not get us far enough. While it is tempting to invest time and effort in developing better RP/RQA methods for analysis of the PRZI market-system’s phase-space trajectories in its subspace of $\mathbb{R}^N$, the results we present in the next section cast doubt on whether that would actually be a useful thing to do. There, we discuss the consequences of taking a second small step in the direction of greater realism: one in which the space of possible strategies is still infinite, but is also unbounded. Once we get there, RP/RQA analysis totally runs out of steam.

3 Coevolution in an Unbounded Infinite Space: STGP

While the work discussed in the previous section is illuminating, our PRZI-market model can be criticised for its lack of realism in the sense that each adaptive PRZI trader is constrained to play a zero-intelligence strategy that is either ZIC, GVWY, SHVR, or some intermediate hybrid mix: traders in the coevolving PRZI market are never even going to play a more sophisticated minimal-intelligence strategy like AA, GDX, or ZIP. But our work is motivated by the observation that in real-world coevolving markets, the trading entities are not constrained to select between a fixed number of existing pure strategies, and nor are they constrained to choose a point in some continuous subspace that includes specific pure strategies as special cases. In real markets, any entity at any time is free to invent its own strategy or to alter/extend an existing one. Our work with PRZI has revealed some of the issues of visualising and analysing such systems, but the bounded nature of its subspace means that it can never show the kind of coevolutionary dynamics of the class of system that we seek to ultimately address in our work. Thus, we need a model in which the space of strategies is not only infinite but also unbounded. In this section, we briefly describe early results from ongoing work in which each entity does have the freedom to adapt by innovating, by creating wholly new strategies, and in which the space of possible strategies is unbounded and hence infinite.

Genetic Programming (GP: see e.g. [29, 39]) is a form of evolutionary computing in which a genetic algorithm operates on ‘genomes’ that are encodings of programs in a list-based functional language such as Lisp [50] or Clojure [34]. Starting with an initial population $P_1$ of programs, each of the individuals in $P_1$ is evaluated via a fitness function which assigns a scalar fitness value to that individual. When all individuals in $P_1$ have been evaluated and assigned a fitness value, a new population $P_2$ of individuals is created by a process of breeding, where pairs of individuals in $P_1$ are selected with a probability proportionate to their fitness (so fitter individuals are more likely to be selected for breeding) and one or more children are created that have genomes which inherit from the pair of parents in ways inspired by real-world sexual reproduction with mutation. In this way, the population $P_2$ of new children becomes the next generation of the system; the old population $P_1$ is typically discarded, each individual in $P_2$ then has its fitness evaluated, and the next generation $P_3$ is then bred from $P_2$’s fitter members. If this process is repeated for sufficiently many generations, and if hyperparameters such as the mutation rate are set correctly, then useful novel programs can be created by the ‘Blind Watchmaker’ [13] of Darwinian evolution.

To illustrate this, consider a simple functional language that allows for expressions computable by a four-function pocket calculator, where multiplication has the symbol M, division has D, subtraction S, and addition A. An arithmetic expression written in conventional infix notation can then be rewritten in a list-based style, and can be visualised as a tree structure, as illustrated in Figure 3, which also illustrates the breeding process. Although we have shown only simple mathematical expressions here, when GP is used with Turing-complete languages such as Lisp or Clojure, complete executable programs of arbitrary complexity and sophistication can in principle be evolved.
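A minimal sketch of such a four-function language, using nested Python tuples as the list-based representation and the operator letters A, D, M, and S from Figure 3. The example expression is an assumed illustration, chosen here only because it is small; it is not claimed to be one of the genomes shown in the figure.

```python
import operator

# Operator letters follow Figure 3: A add, S subtract, M multiply, D divide.
OPS = {"A": operator.add, "S": operator.sub,
       "M": operator.mul, "D": operator.truediv}
INFIX = {"A": "+", "S": "-", "M": "*", "D": "/"}


def evaluate(expr):
    """Recursively evaluate a nested-tuple expression tree."""
    if isinstance(expr, (int, float)):
        return expr  # leaf node: a numeric constant
    op, left, right = expr
    return OPS[op](evaluate(left), evaluate(right))


def to_infix(expr):
    """Render the same tree as a fully parenthesised infix string."""
    if isinstance(expr, (int, float)):
        return str(expr)
    op, left, right = expr
    return f"({to_infix(left)} {INFIX[op]} {to_infix(right)})"


genome = ("A", 3, ("M", 4, 2))   # assumed example expression
print(to_infix(genome))          # (3 + (4 * 2))
print(evaluate(genome))          # 11
```

Each tuple is one internal node of the tree, with its first element naming the function and the remaining elements being its child subtrees.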

Figure 3: In Genetic Programming, expressions or programs in a list-based functional language are evolved via sexual reproduction and mutation. Here we see genomes for simple mathematical expressions, evaluable on a four function calculator (A is addition; D is division; M is multiplication; S is subtraction). For each genome we show the list representation of the expression, under that the infix mathematical expression in italic font, and below that the tree diagram for the list. On the left we see two parents selected for breeding; on the right we see their two kids, child genomes formed by crossover (in which a randomly-chosen subtree on one parent is swapped for a randomly-chosen subtree on the other, and vice versa), and mutation (indicated by the lightning-flash icons, where the value at a randomly-chosen node in the tree is switched to some randomly chosen other value that is valid at that node). As is shown here, the child genomes can be either longer or shorter than those of their parents. And, if longer genomes encode more sophisticated algorithms that confer greater fitness, then in principle the sophistication of the programs encoded on the genomes can increase indefinitely.
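The crossover and mutation operators described in the caption can be sketched on the same nested-tuple trees. This is an illustrative sketch rather than the operators used in our experiments: uniform random choice of nodes, the helper names, and the leaf-value range 0 to 9 are all assumptions.

```python
import random


def subtree_paths(tree, path=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))


def get_subtree(tree, path):
    """Follow a path of child indices down from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(parent_a, parent_b, rng=random):
    """Swap one randomly chosen subtree of each parent into the other."""
    path_a = rng.choice(list(subtree_paths(parent_a)))
    path_b = rng.choice(list(subtree_paths(parent_b)))
    sub_a, sub_b = get_subtree(parent_a, path_a), get_subtree(parent_b, path_b)
    return (replace_subtree(parent_a, path_a, sub_b),
            replace_subtree(parent_b, path_b, sub_a))


def mutate(tree, rng=random):
    """Switch the value at one random node to a different valid value."""
    path = rng.choice(list(subtree_paths(tree)))
    node = get_subtree(tree, path)
    if isinstance(node, tuple):
        new_op = rng.choice([op for op in "ADMS" if op != node[0]])
        new_node = (new_op,) + node[1:]
    else:
        # Hypothetical leaf-value range 0..9, excluding the current value.
        new_node = rng.choice([v for v in range(10) if v != node])
    return replace_subtree(tree, path, new_node)


def count_nodes(tree):
    return sum(1 for _ in subtree_paths(tree))


rng = random.Random(0)
kid_a, kid_b = crossover(("A", 3, ("M", 4, 2)), ("S", ("D", 8, 2), 1), rng)
print(kid_a, kid_b, mutate(kid_a, rng))
```

Because the swapped subtrees can differ in size, one child may be longer than either parent and the other shorter, which is the mechanism by which genome length can grow without bound.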

In our work we are using a variant of GP known as strongly-typed genetic programming (STGP), where data-type constraints are enforced between connecting nodes of a program tree [35]. For example, an “and” node that takes two boolean inputs can be guaranteed that it will only connect to two booleans. Now each entity in our model market, rather than using the Adaptive Climber algorithm to optimize a single numeric strategy value, instead uses a STGP process to create new programs that implement trading strategies: we start with a population seeded with minimally simple programs, and then we unleash them, allowing the coevolutionary process to proceed, during which each entity is at liberty to create programs of growing complexity and sophistication, if in doing so they generate greater profits.
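The type constraint can be illustrated with a small checker over nested-tuple trees. The node set and its signatures below are hypothetical, chosen only to show the idea; they are not the primitives used in [20] or [35]. In a full STGP system the same signatures would also restrict crossover and mutation so that only type-compatible subtrees are ever swapped in.

```python
# Hypothetical node set: each function node maps to
# (tuple of argument types, return type).
SIGNATURES = {
    "and": (("bool", "bool"), "bool"),
    "gt": (("float", "float"), "bool"),
    "sub": (("float", "float"), "float"),
    "if": (("bool", "float", "float"), "float"),
}


def type_of(tree):
    """Return the type of a tree, raising TypeError on an ill-typed link."""
    # bool is tested before int/float because bool is a subclass of int.
    if isinstance(tree, bool):
        return "bool"
    if isinstance(tree, (int, float)):
        return "float"
    name, *args = tree
    arg_types, return_type = SIGNATURES[name]
    if len(args) != len(arg_types):
        raise TypeError(f"{name} expects {len(arg_types)} arguments")
    for arg, expected in zip(args, arg_types):
        actual = type_of(arg)
        if actual != expected:
            raise TypeError(f"{name} expects {expected}, got {actual}")
    return return_type


well_typed = ("if", ("and", ("gt", 3.0, 1.0), True), ("sub", 5.0, 1.0), 0.0)
print(type_of(well_typed))  # float
```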

Full details of our STGP work are given in [20], to which the reader is referred for further detail; here we present only the briefest of results, from a single successful experiment, to motivate discussion of the problems of visualization and analysis that arise when working in this unbounded infinite space of possible programmatic trading strategies.

As an initial exploration into the dynamics of the STGP traders coevolving in BSE, a simulation was run over 40 generations for 10000 units of time. 100 ZIC sellers were run against 50 ZIC and 50 STGP buyers; both buyers and sellers were regularly replenished with fresh “customer orders” (i.e., an instruction to buy or to sell, and an associated private limit price for that transaction) to execute. The STGP traders were each initialised with a price-improvement expression built from three elements: the subtraction operator, the best price on the same side of the LOB as the trader, and the limit price for the customer order to be executed by that trader. This expression represents the zero-intelligence SHVR trader, expressed in STGP tree form.

Summary results, a plot of profit-values in each generation, are shown in Figure 4. As can be seen, the profitability data are biphasic: there is an initial brief phase of rapid growth in profitability, followed by a prolonged phase where profitability steadily declines. The initial rise in profitability is as would be expected, and hoped for: the STGP coevolution is discovering ever more profitable trading strategies over successive early generations. The second phase, where profits are steadily eroded, is perhaps less expected and less desired, but can readily be explained by the competitive coevolutionary process progressively eating away at profits: if one SHVR-like trader is profitable by shaving some small amount off the best price on each revision, then it can be beaten to the deal by another SHVR-like trader who instead shaves a larger amount; but that trader could in turn be beaten by a SHVR-style trader who shaves a still larger amount off the best price, and so on. Price-competition among the coevolving traders awards higher fitness to those individuals that get more deals by shaving greater amounts off the current best price on the LOB, but in doing so the most successful cut their margins ever smaller, eventually hitting a zero margin at which point they are playing not SHVR but GVWY.

Figure 4: Results from an illustrative successful STGP experiment: horizontal axis is generation-number; vertical axis is profit. At each generation the maximum profit achieved by a STGP trader is plotted, along with the mean profitability of all 50 STGP traders; error-bars show standard deviation around the mean.

Table 1 shows the genome of the elite (most profitable) trader in a selection of generations from the experiment illustrated in Figure 4. There are two things to note in the genomes shown here. First, STGP (and vanilla GP too) frequently suffers from bloat, creating viable expressions or programs that get the job done, but which are expressed in very verbose form: for example, the elite individual at generation 30 has a genome that translates to , which any competent programmer would immediately rewrite as (i.e., as a shortened genome of ). Second, because the functional languages used in (ST)GP are richly expressive (that is, the same algorithm or expression can be written in many different ways), the use of methods based on recurrence plots (RPs) becomes deeply problematic: the recurrence of any one particular strategy that had occurred earlier in the evolutionary process may be difficult to detect automatically. For instance, if the elite genome is at generation 30, and is at generation 60, and is at generation 90, then we humans can see by inspection that the same strategy is recurring every 30 generations, but an automated analysis technique would need to go beyond the lexical/syntactic dissimilarity in these expressions and instead reason about the underlying semantics of the functional programming language. For the simple mathematical expressions being discussed here, it is reasonable to operationally reduce them each to some agreed canonical form, but for only slightly more sophisticated (and stateful) algorithms such as AA, GDX, or ZIP, a many-to-one mapping that reduces all possible implementations and all possible expressions of that algorithm down to a single canonical form is unlikely ever to be achievable. And so, RP-based methods cease to have any applicability here too.

Once again, we take a small step in the direction of increased realism in our coevolutionary models, and find that the visualisation/analytics toolbox is empty.

Gen Expression Tree
1 (S,(S,,1), )
2 (S,(S,,1),1)
3 (S,(S,,1),1)
4 (S,(S,(S,,1),1),1)
26 (S,(S,(S,(S,(S,(S,(S,(S,(S,,1),7),1),1),7),1),7),7),1)
27 (S,(S,(S,(S,(S,(S,(S,(S,(S,,1),7),1),7),7),1),1),7),1)
28 (S,(S,(S,(S,(S,(S,(S,(S,(S,,1),7),1),1),7),1),7),7),1)
29 (S,(S,(S,(S,(S,(S,(S,(S,(S,(S,,1),7),7),1),1),7),1),1),7),1)
30 (S,(S,(S,(S,(S,(S,(S,(S,(S,(S,,1),7),7),1),1),7),1),1),7),1)
Table 1: Selected STGP genomes for the best individual in the population at various generations in the experiment illustrated in Figure 4: see text for discussion.
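The reduction to canonical form discussed in the text is straightforward for subtraction-only trees like those in Table 1, and can be sketched as follows. This is an illustrative sketch rather than the authors' code: the terminal symbol that is missing from the table as reproduced here is written as the hypothetical placeholder `P`, and the helper names `parse`, `linear_form`, and `canonical` are our own. The sketch parses a tree string into nested tuples and reduces it to a linear form (symbol coefficients plus a constant), so that two lexically different genomes can be compared by what they compute.

```python
def parse(text):
    """Parse a tree string such as '(S,(S,P,1),7)' into nested tuples."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1              # skip '('
            op = text[pos]        # single-character operator, e.g. 'S'
            pos += 2              # skip operator and the comma after it
            left = node()
            pos += 1              # skip ',' between the operands
            right = node()
            pos += 1              # skip ')'
            return (op, left, right)
        start = pos
        while pos < len(text) and text[pos] not in ",)":
            pos += 1
        token = text[start:pos]
        return int(token) if token.isdigit() else token

    return node()


def linear_form(tree):
    """Reduce a subtraction-only tree to (symbol coefficients, constant)."""
    if isinstance(tree, int):
        return {}, tree
    if isinstance(tree, str):
        return {tree: 1}, 0
    op, left, right = tree
    if op != "S":
        raise ValueError(f"unsupported operator: {op}")
    left_coeffs, left_const = linear_form(left)
    right_coeffs, right_const = linear_form(right)
    coeffs = dict(left_coeffs)
    for symbol, c in right_coeffs.items():
        coeffs[symbol] = coeffs.get(symbol, 0) - c
    coeffs = {s: c for s, c in coeffs.items() if c != 0}
    return coeffs, left_const - right_const


def canonical(text):
    """Canonical form of a tree string: equal iff the trees compute the same value."""
    return linear_form(parse(text))


# Generations 26 and 27 of Table 1, with the placeholder P (an assumption)
# inserted where the table has lost its terminal symbol.
gen26 = "(S,(S,(S,(S,(S,(S,(S,(S,(S,P,1),7),1),1),7),1),7),7),1)"
gen27 = "(S,(S,(S,(S,(S,(S,(S,(S,(S,P,1),7),1),7),7),1),1),7),1)"

print(gen26 == gen27)                        # False: the strings differ
print(canonical(gen26) == canonical(gen27))  # True: both reduce to P - 33
```

A lexical comparison treats these two genomes as distinct, whereas the canonical forms agree. This kind of reduction works only because the expressions are stateless and linear, which is why it does not extend to algorithms such as AA, GDX, or ZIP.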

4 Discussion and Conclusion

The experiments and results that we have described here have demonstrated that, when we move our ACE-style market models ever so slightly in the direction of being closer to real-world markets, we find that the toolbox for visualisation and analysis of the resultant system dynamics starts to look very empty. While it is relatively easy to make the changes necessary to extend existing models to make them more realistic, it is relatively hard to work out what the extended systems are actually doing, and hence we need new tools to help us do that. Our current work is concentrated on exploring the use of Ciao Plots [10, 11] in characterising the coevolutionary dynamics of our STGP system, although as [3] discuss, this is a visualisation technique that is not without its complexities.

While many research papers in science and engineering are written to describe the solution to some problem, this is not one of those papers. Instead, this is a paper that describes a problem in need of a solution. Or, more specifically, a problem that we expect to be tackled from multiple perspectives, one that eventually yields to multiple complementary solutions. In future work, we intend to develop novel visualisation and analysis techniques for coevolutionary market systems with unboundedly infinite continuous strategy spaces, which we will report on in due course; but in writing this paper we hope to encourage other researchers to work on this challenging problem too. To facilitate that, we have made our Python source-code freely available as open-source releases on GitHub, which is where in future we will also release our own visualisation and analysis methods as we develop them.²

² The Python code in the main BSE GitHub repository [5] has been extended by the addition of a minimally simple adaptive PRZI trader, a -point stochastic hill climber, referred to as PRZI-SHC- (pronounced prezzy-shuck), for which the case is a close relative of the AC algorithm described in Section 2 and which can readily be used for studies of coevolutionary dynamics. The source-code for our STGP work is available separately at


  • [1] Abergel, F., Anane, M., Chakraborti, A., Jedidi, A., Toke, I.: Limit Order Books. Cambridge University Press (2016)
  • [2] Alexandrov, N.: Competitive arms-races among autonomous trading agents: Exploring the co-adaptive dynamics. Master’s thesis, University of Bristol (2021)
  • [3] Cartlidge, J., Bullock, S.: Unpicking tartan CIAO plots: Understanding irregular coevolutionary cycling. Adaptive Behavior 12(2) (2004) 69–92
  • [4] Cliff, D.: Minimal-intelligence agents for bargaining behaviours in market-based environments. Technical Report HPL-97-91, HP Labs Technical Report (1997)
  • [5] Cliff, D.: Bristol Stock Exchange: open-source financial exchange simulator. (2012)
  • [6] Cliff, D.: BSE : A Minimal Simulation of a Limit-Order-Book Stock Exchange. In Bruzzone, F., ed.: Proc. 30th Euro. Modeling and Simulation Symposium (EMSS2018). (2018) 194–203
  • [7] Cliff, D.: Exhaustive testing of trader-agents in realistically dynamic continuous double auction markets: AA does not dominate. In Rocha, A., Steels, L., van den Herik, J., eds.: Proceedings of the 11th International Conference on Agents and Artificial Intelligence (ICAART 2019). SciTePress (2019) 224–236
  • [8] Cliff, D.: Parameterized-Response Zero-Intelligence Traders. SSRN:3823317 (2021)
  • [9] Cliff, D., Rollins, M.: Methods matter: A trading algorithm with no intelligence routinely outperforms AI-based traders. In: Proceedings of IEEE Symposium on Computational Intelligence in Financial Engineering (CIFEr2020). (2020)
  • [10] Cliff, D., Miller, G.: Tracking the Red Queen: Measurements of adaptive progress in co-evolutionary simulations. In Morán, F., Moreno, A., Guervós, J.J.M., Chacón, P., eds.: Advances in Artificial Life, Third European Conference on Artificial Life, Granada, Spain, June 4-6, 1995, Proceedings. Volume 929 of Lecture Notes in Computer Science, Springer (1995) 200–218
  • [11] Cliff, D., Miller, G.: Visualizing coevolution with CIAO plots. Artificial Life 12(2) (2006) 199–202
  • [12] Das, R., Hanson, J., Kephart, J., Tesauro, G.: Agent-human interactions in the continuous double auction. In: Proc. IJCAI-2001. (2001) 1169–1176
  • [13] Dawkins, R.: The Blind Watchmaker. W. W. Norton (1986)
  • [14] De Luca, M., Cliff, D.: Agent-human interactions in the continuous double auction, redux: Using the OpEx lab-in-a-box to explore ZIP and GDX. In: Proceedings of the 2011 International Conference on Agents and Artificial Intelligence (ICAART2011). (2011)
  • [15] De Luca, M., Cliff, D.: Human-agent auction interactions: Adaptive-Aggressive agents dominate. In: Proceedings IJCAI-2011. (2011) 178–185
  • [16] De Luca, M., Szostek, C., Cartlidge, J., Cliff, D.: Studies of interaction between human traders and algorithmic trading systems. Technical report, UK Government Office for Science, London (September 2011)
  • [17] Eckmann, J.P., Oliffson Kamphorst, S., Ruelle, D.: Recurrence plots of dynamical systems. Europhysics Letters 5 (1987) 973–977
  • [18] Erev, I., Roth, A.: Predicting how people play games: Reinforcement learning in experimental games with unique, mixed-strategy equilibria. The American Economic Review 88(4) (September 1998) 848–881
  • [19] Farmer, J.D., Patelli, P., Zovko, I.: The Predictive Power of Zero Intelligence in Financial Markets. Proceedings of the National Academy of Sciences 102(6) (2005) 2254–2259
  • [20] Figuero, C.: Evolving trader-agents via strongly typed genetic programming. Master’s thesis, University of Bristol Department of Computer Science (2021)
  • [21] Gjerstad, S.: The impact of pace in double auction bargaining. Technical report, Department of Economics, University of Arizona (2003)
  • [22] Gjerstad, S., Dickhaut, J.: Price formation in double auctions. Games and Economic Behavior 22(1) (1998) 1–29
  • [23] Gode, D., Sunder, S.: Allocative Efficiency of Markets with Zero-Intelligence Traders: Market as a Partial Substitute for Individual Rationality. Journal of Political Economy 101(1) (1993) 119–137
  • [24] Gould, M., Porter, M., Williams, S., McDonald, M., Fenn, D., Howison, S.: Limit order books. Quantitative Finance 13(11) (2013) 1709–1742
  • [25] Hebbron, T., Bullock, S., Cliff, D.: NKalpha: Non-uniform epistatic interactions in an extended NK model. In Bullock, S., Noble, J., Watson, R., Bedau, M., eds.: Artificial Life XI: Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems. MIT Press (2008) 234–241
  • [26] Hommes, C., LeBaron, B., eds.: Computational Economics: Heterogeneous Agent Modeling. North-Holland (2018)
  • [27] Kagel, J., Roth, A.: The Handbook of Experimental Economics. Princeton University Press (1997)
  • [28] Kauffman, S.: The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press (1993)
  • [29] Koza, J.: Genetic Programming: On the Programming of Computers by means of Natural Selection. MIT Press (1993)
  • [30] Ladley, D.: Zero Intelligence in Economics and Finance. The Knowledge Engineering Review 27(2) (2012) 273–286
  • [31] Lattimore, T., Szepesvari, C.: Bandit Algorithms. Cambridge University Press (2020)
  • [32] Marwan, N.: How to avoid potential pitfalls in recurrence plot based data analysis. International Journal of Bifurcation and Chaos 21(4) (2011) 1003–1017
  • [33] Maynard Smith, J.: Evolution and the Theory of Games. Cambridge University Press (1982)
  • [34] Miller, A., Halloway, S., Bedra, A.: Programming Clojure. Third edn. Pragmatic Bookshelf (2018)
  • [35] Montana, D.: Strongly typed genetic programming. Evolutionary Computation 3(2) (June 1995) 199–230
  • [36] Myles White, J.: Bandit Algorithms for Website Optimization: Developing, Deploying, and Debugging. O’Reilly (2012)
  • [37] Nolte, I., Salmon, M., Adcock, C., eds.: High Frequency Trading and Limit Order Book Dynamics. Routledge (2014)
  • [38] Plott, C., Smith, V., eds.: Handbook of Experimental Economics Results, Volume 1. North-Holland (2008)
  • [39] Poli, R., Langdon, W., McPhee, N.: A Field Guide to Genetic Programming. Lulu (2008)
  • [40] Rollins, M., Cliff, D.: Which trading agent is best? Using a threaded parallel simulation of a financial market changes the pecking-order. In: Proceedings of the 32nd European Modeling and Simulation Symposium (EMSS2020). (2020)
  • [41] Rust, J., Miller, J., Palmer, R.: Behavior of trading automata in a computerized double auction market. In Friedman, D., Rust, J., eds.: The Double Auction Market: Institutions, Theories, and Evidence. Addison-Wesley (1992) 155–198
  • [42] Slivkins, A.: Introduction to Multi-Armed Bandits. Arxiv:1904.07272v6 (2021)
  • [43] Smith, V.: An Experimental Study of Competitive Market Behaviour. Journal of Political Economy 70(2) (1962) 111–137
  • [44] Smith, V.: Papers in Experimental Economics. Cambridge University Press (1991)
  • [45] Smith, V., ed.: Bargaining and Market Behavior: Essays in Experimental Economics. Cambridge University Press (2000)
  • [46] Snashall, D., Cliff, D.: Adaptive-Aggressive traders don’t dominate. In van den Herik, J., Rocha, A., Steels, L., eds.: Agents and Artificial Intelligence: Selected papers from ICAART2019. Springer (2019)
  • [47] Tesauro, G., Bredin, J.: Sequential strategic bidding in auctions using dynamic programming. In: Proceedings AAMAS 2002. (2002)
  • [48] Tesauro, G., Das, R.: High-performance bidding agents for the continuous double auction. In: Proc. 3rd ACM Conference on Electronic Commerce. (2001) 206–209
  • [49] Thompson, J.: The Coevolutionary Process. University of Chicago Press (1994)
  • [50] Touretzky, D.: Common LISP: A Gentle Introduction to Symbolic Computation. Revised edn. Dover Publications Inc (2013)
  • [51] Vach, D.: Comparison of double auction bidding strategies for automated trading agents. Master’s thesis, Charles University in Prague (2015)
  • [52] Vytelingum, P., Cliff, D., Jennings, N.: Strategic bidding in continuous double auctions. Artificial Intelligence 172(14) (2008) 1700–1729
  • [53] Walsh, W., Das, R., Tesauro, G., Kephart, J.: Analyzing complex strategic interactions in multiagent systems. In: Proc. of the AAAI Workshop on Game-Theoretic and Decision-Theoretic Agents. (2002)
  • [54] Webber, C., Marwan, N., eds.: Recurrence Quantification Analysis: Theory and Best Practice. Springer (2015)