
Biased or Limited: Modeling Sub-Rational Human Investors in Financial Markets

Multi-agent market simulation is an effective tool to investigate the impact of various trading strategies in financial markets. One way of designing a trading agent in simulated markets is through reinforcement learning where the agent is trained to optimize its cumulative rewards (e.g., maximizing profits, minimizing risk, improving equitability). While the agent learns a rational policy that optimizes the reward function, in reality, human investors are sub-rational with their decisions often differing from the optimal. In this work, we model human sub-rationality as resulting from two possible causes: psychological bias and computational limitation. We first examine the relationship between investor profits and their degree of sub-rationality, and create hand-crafted market scenarios to intuitively explain the sub-rational human behaviors. Through experiments, we show that our models successfully capture human sub-rationality as observed in the behavioral finance literature. We also examine the impact of sub-rational human investors on market observables such as traded volumes, spread and volatility. We believe our work will benefit research in behavioral finance and provide a better understanding of human trading behavior.


1. Introduction

Research in finance is well facilitated by versatile market simulations, which provide feasible experiment control and concrete market observations (Friedman, 2018). Multi-agent market simulators have been applied in financial research to reproduce the scaling laws for returns, assess the benefits of co-location, investigate the impact of large orders, and evaluate trading strategies (Lux and Marchesi, 1999; Byrd et al., 2019; Balch et al., 2019). These simulators promote the use of reinforcement learning (RL) algorithms to learn complex trading strategies. For example, (Amrouni et al., 2021) use RL to learn a trading strategy for daily investors, while (Spooner et al., 2018; Dwarakanath et al., 2021) use RL to design market makers that provide liquidity in the market.

These RL agents are trained in market simulations to learn a trading strategy that optimizes the specified reward function (e.g., to make profits or to provide liquidity). In other words, the agent obtains an optimal trading strategy upon sufficient training. This is consistent with the notion of homo economicus, which assumes that humans are ideal decision-makers who are perfectly rational and have unlimited access to information.

However, humans in real life are complex entities that may not always make perfect decisions. Studies show that various psychological biases affect the human decision-making process (Thaler et al., 1997; Benartzi and Thaler, 1999; Chrisman and Patel, 2012). Moreover, humans may attempt to make decisions that are satisfactory rather than optimal due to limited access to information and processing power (Simon, 1990, 1997). We refer to such behavior as sub-rational, as opposed to perfectly rational decision-making. Although several models have been proposed to account for human sub-optimality in the RL setting, existing work mostly focuses on inferring the reward function from real human demonstration data, rather than on explaining the human decision-making process and evaluating the consequences of sub-rational decisions.

In this paper, we model and examine the behavior of sub-rational human investors in financial markets. We introduce two types of sub-rational human investors: psychologically myopic and bounded rational. For each type of human investor, we investigate the relation between the investor’s profits and the degree of sub-rationality. We also demonstrate the corresponding trading strategy in a hand-crafted market scenario to intuitively explain the strategy. In addition, our experimental analysis reveals the impact of sub-rational investors on the market. We believe our models provide an effective framework for capturing and examining human investors while aiding a better understanding of their influence in financial markets.

2. Related Work

Multi-agent simulators have become increasingly prevalent for modeling financial markets. Lux and Marchesi (Lux and Marchesi, 1999) introduced a multi-agent model of financial markets to support the time scaling law from mutual interactions of participants. In recent contributions, Byrd et al. (Byrd et al., 2019) developed a discrete event simulator to investigate the market impact of a large market order. Additionally, Vyetrenko et al. (Vyetrenko et al., 2020) proposed realism metrics to evaluate the fidelity of simulated markets. While these multi-agent market simulators can be populated with rule-based trading agents, they also allow for the use of reinforcement learning to develop agents that seek to optimize certain objectives. Amrouni et al. (Amrouni et al., 2021) wrapped market simulations in an OpenAI Gym framework facilitating the use of off-the-shelf RL algorithms, and trained investors in various market environments. Dwarakanath et al. (Dwarakanath et al., 2021) utilized RL to obtain the policy of market makers and subsequently investigated their impact on market equitability.

Theoretical efforts have been made to model the human decision-making process. A related field is inverse reinforcement learning (IRL), which aims to bypass the need for reward design by inferring the reward from observed human demonstrations. There are numerous hypotheses behind the sub-rationality of humans. Raja and Lesser (Raja and Lesser, 2001) attributed human sub-rationality to constraints on resources and addressed a meta-level control problem for a resource-bounded rational human. Evans et al. (Evans and Goodman, 2015; Evans et al., 2016) modeled structured deviations from optimality with different hierarchical levels of rationality when inferring preferences and beliefs. In a recent work, Chan et al. (Chan et al., 2021) investigated the effects of human irrationality on reward inference. They introduced a framework that describes different aspects of human irrationality through the Bellman equation (Equation 1), by modifying the max operator, the transition function, the reward function, and the sum between reward and future value. While the IRL literature encompasses various interesting models of humans, its goal is to infer the reward function from real human demonstrations. However, in financial markets, it is rarely possible to acquire historical trading data of human investors. The goal of our work instead is to use observations about human trading activity to model an investor agent that trades like a human in financial markets, and to subsequently analyze human traders’ impact on the market in a simulated environment.

Regulatory literature has examined the impact of electronic traders as compared to that of human traders in markets. Boehmer et al. (Boehmer et al., 2021) investigated the impact of algorithmic trading on equity market quality measured in terms of quoted spread, price efficiency and volatility. Based on real data, they observed that more algorithmic trading leads to narrower spreads and better price efficiency but higher volatility, with the effects differing based on asset size. Woodward (Woodward, 2017) investigated the market impact of electronic traders that use high frequency trading (HFT) techniques, and provided insights on controlling it from the perspective of regulators. Note that these studies examine the influence of algorithmic and high frequency trading techniques, which are used by a specialized class of electronic investors. In this work, we compare the impact of sub-rational human investors with that of perfectly rational electronic investors (with no change in trading frequency) in financial markets.

3. Models of Human Sub-Rationality

In this section, we discuss two aspects of human sub-rationality and introduce models that capture them in the reinforcement learning framework. Formally, we consider the Markov decision process (MDP) $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$, where $\mathcal{S}$ and $\mathcal{A}$ are the sets of states and actions, $P(s' \mid s, a)$ is the transition probability from state $s$ to $s'$ by taking the action $a$, and $R(s, a)$ is the immediate reward of the transition. The goal in RL is to maximize the expected sum of discounted rewards. The Bellman equation relates the value of the current state to the one step reward and the value at the next state as

$$V(s) = \max_{a \in \mathcal{A}} \Big[ R(s, a) + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a)} V(s') \Big] \quad (1)$$

where $\gamma \in [0, 1]$ is the discount factor. We define a rational agent as one that picks the action that maximizes the right hand side of Equation 1 when in state $s$. The corresponding optimal (rational) policy is

$$\pi^*(s) = \operatorname*{argmax}_{a \in \mathcal{A}} \Big[ R(s, a) + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a)} V(s') \Big] \quad (2)$$

Inspired by the framework provided by (Chan et al., 2021), we alter the discount factor $\gamma$ in Equation 2 to model sub-rational human behavior caused by psychological bias, and modify the $\operatorname{argmax}$ operator to model that caused by computational limitation.
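As a concrete illustration (not part of the paper's own setup), a tabular value-iteration sketch shows how the rational argmax policy of Equation 2 arises from the Bellman equation of Equation 1, and how shrinking the discount factor flips the policy toward the one-step reward. The toy two-state MDP in the usage below is hypothetical:

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-8):
    """Solve V(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ] (Equation 1).

    P has shape (S, A, S) with P[s, a, s'] the transition probability,
    and R has shape (S, A). Returns the converged value function and the
    rational argmax policy of Equation 2.
    """
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * (P @ V)      # Q[s, a] = R(s,a) + gamma * E[V(s')]
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

With $\gamma$ close to 1 the policy prefers the action leading to long-run reward; with $\gamma = 0$ it reduces to the one-step greedy (myopic) choice described in Section 3.1.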

3.1. Myopic Behavior

Humans in reality may be psychologically biased and only care about the short-term results. They make myopic decisions without considering how the actions may affect them far into the future. A typical example of myopic human behavior in finance is myopic loss aversion (Thaler et al., 1997; Benartzi and Thaler, 1999; Chrisman and Patel, 2012). Investors that focus on short-term return may react too passively to recent losses. As a result, they abandon their existing long-term-oriented investment plan and lose the potential to achieve better benefits in the future. Studies have shown that financial media can often facilitate myopic trading behavior by constantly reporting market news and portraying a sense of urgency to act. Investors who receive such information too frequently tend to avoid investing in riskier assets that may yield better long-term rewards (Larson et al., 2016).

To model the myopic human investors, we decrease the discount factor $\gamma$ in the Bellman equation (Equation 1) and the corresponding policy (Equation 2). When $\gamma = 1$, the agent is fully rational and considers both short-term and long-term rewards. As $\gamma \to 0$, the agent becomes myopic and only acts to maximize the one-step reward $R(s, a)$.

Figure 1. An example of the Boltzmann rationality model from (Laidlaw and Dragan, 2022) for three actions with different rewards (left). The Boltzmann model gives the probability of taking an action using the parameter $\beta$ that adjusts the degree of rationality (right). If $\beta = 0$, each action has the same probability of being selected. As $\beta \to \infty$, the model becomes more rational and only the action with the highest reward is likely to be selected.

3.2. Bounded Rationality

The notion of bounded rationality was first introduced by Simon (Simon, 1990, 1997), who argued that human decision-making departs from perfect economic rationality. A perfectly rational decision requires access to the necessary information about all alternative choices, and the calculation of their potential benefits. Since real humans typically do not have unlimited access to information and processing power, they are inclined to find satisfactory solutions to problems rather than optimal ones.

A well-established model of bounded rational decision making is the Boltzmann rationality model (Baker et al., 2005; Asadi and Littman, 2017). Unlike the rational policy in Equation 2 that gives the optimal action using the argmax, this model considers a probabilistic policy given by the Boltzmann softmax operator as follows:

$$\pi_\beta(a \mid s) = \frac{\exp\big(\beta \, Q(s, a)\big)}{\sum_{a' \in \mathcal{A}} \exp\big(\beta \, Q(s, a')\big)} \quad (3)$$

where $\beta \in [0, \infty)$ (Asadi and Littman, 2017). The Boltzmann softmax operator introduces a "soft" principle of optimality, which allows the agent to take sub-optimal decisions with a preference for high-utility actions. Figure 1 gives an example of modeling human behavior with Boltzmann rationality. The parameter $\beta$ controls how likely the agent is to select the most valuable action. As $\beta \to \infty$, the Boltzmann operator approaches the argmax in Equation 2, and the agent makes fully rational decisions. When $\beta = 0$, the agent makes uniformly random decisions, giving the same probability to each action.
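The Boltzmann policy of Equation 3 can be sketched in a few lines; the function name and the numerical-stability shift are illustrative choices, not from the paper:

```python
import numpy as np

def boltzmann_policy(q_values, beta):
    """Boltzmann rationality model: pi(a|s) proportional to exp(beta * Q(s,a)).

    beta = 0 yields uniformly random actions; as beta grows, the policy
    concentrates on the argmax action of Equation 2.
    """
    z = beta * np.asarray(q_values, dtype=float)
    z -= z.max()                  # shift for numerical stability (illustrative)
    p = np.exp(z)
    return p / p.sum()
```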

4. Multi-Agent Market Simulation

In this paper, we employ ABIDES-Gym, a discrete event multi-agent simulator (DEMAS) that provides a high-fidelity market environment with thousands of trading agents, wrapped in an OpenAI Gym framework  (Byrd et al., 2019; Amrouni et al., 2021). Here, we describe the key features of ABIDES-Gym, and the definitions of RL agents and background agents in our simulated market.

Figure 2. A snapshot of the LOB structure.

Limit Order Book (LOB) structure. Similar to public exchanges like NASDAQ and NYSE, the market operates on a limit order book (LOB), which represents the supply and demand for the asset at a given time (Vyetrenko et al., 2020). There are two types of orders in the market: market orders and limit orders. A market order is placed by a trader that wants to buy or sell immediately at the current market price. A limit order, on the other hand, specifies the minimum price that the trader intends to sell at, or the maximum price that the trader is willing to buy at. Figure 2 shows an example snapshot of the LOB. Orders placed by agents are collected in the LOB and executed by matching buyers to sellers on a first-in-first-out basis. The mid price is the average of the best ask and best bid prices, while the spread is the difference between the best ask and the best bid.
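As a minimal sketch, the mid price and spread can be read off a top-of-book snapshot as follows; the price levels shown are hypothetical (prices in cents, as in the simulation):

```python
# Hypothetical top-of-book snapshot; (price_in_cents, volume) per level,
# sorted best-first. The numbers are illustrative, not from the paper.
bids = [(9998, 120), (9997, 300)]   # best bid first (highest buy price)
asks = [(10002, 80), (10003, 250)]  # best ask first (lowest sell price)

best_bid, best_ask = bids[0][0], asks[0][0]
mid_price = (best_bid + best_ask) / 2   # average of best ask and best bid
spread = best_ask - best_bid            # difference between best ask and best bid
```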

Figure 3. (left) Training rewards of the myopic investor with different $\gamma$. (center) Testing rewards of the myopic investor with different $\gamma$. (right) Testing rewards of the bounded rational investor with different $\beta$.

4.1. RL Investor Agents

The RL agent represents an investor that tries to make profits by trading in the market. We model the rational electronic investors and sub-rational human investors using the same state space, action space, and reward function.

4.1.1. State Space

The states include the agent’s observation of the market along with its internal states:

  • Cash: the amount of cash in the investor’s account at time $t$. The agent starts the day with 1,000,000 cents.

  • Holdings: the number of shares of the asset held by the agent at time $t$. The agent starts the day with no holdings.

  • Momentum: a vector reflecting the market momentum over the past 1, 10, and 30 minutes.

  • Volatility: the market volatility in the past 30 minutes.

  • Quote history: the information of quoted/placed orders in the past five minutes, including quoted prices and volumes.

  • Trade history: the information of the executed trades in the past five minutes, including traded prices and volumes.

4.1.2. Action Space

The RL agents wake up every minute between 9:30am and 4:00pm, and take an action defined by the following parameters:

  • Direction: {BUY, HOLD, SELL}. In the case of BUY or SELL, the agent places a limit order of size 2.

  • Limit order price w.r.t. the mid price: the limit order is placed below the mid price if the agent takes a BUY action, or above the mid price if the agent takes a SELL action.
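A sketch of how such an action might be decoded into a limit order; the offset `DELTA` is a hypothetical placeholder, since the exact price offsets are not stated in this excerpt, while the order size of 2 is from the action definition above:

```python
# Hypothetical decoding of the RL agent's action into a limit order.
# DELTA (price offset from the mid, in cents) is a placeholder value.
DELTA = 1
ORDER_SIZE = 2  # the agent places limit orders of size 2

def decode_action(direction, mid_price):
    """Map {BUY, HOLD, SELL} to (side, limit_price, size), or None for HOLD."""
    if direction == "HOLD":
        return None
    price = mid_price - DELTA if direction == "BUY" else mid_price + DELTA
    return (direction, price, ORDER_SIZE)
```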

4.1.3. Reward

We define the one step reward as

$$r_t = M_t - M_{t-1} \quad (4)$$

where $M_t = c_t + h_t \, p_t$ is the value of the investor’s portfolio marked to the market, with $c_t$ the cash, $h_t$ the holdings, and $p_t$ the executed price of the last transaction before time $t$. The intuition behind defining the reward function as in Equation 4 is as follows. With $\gamma = 1$ and the horizon being a single trading day, the cumulative reward equals the agent’s monetary profits at the end of the trading day.
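The mark-to-market reward of Equation 4 can be sketched as follows; note how the one-step rewards telescope, so their sum over the day is exactly the end-of-day profit:

```python
def mark_to_market(cash, holdings, last_trade_price):
    """Portfolio value M_t: cash plus holdings marked at the last traded price."""
    return cash + holdings * last_trade_price

def step_reward(m_prev, m_curr):
    """One-step reward of Equation 4: change in marked-to-market portfolio value."""
    return m_curr - m_prev
```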

4.2. Background Agents

The simulated market contains the following background agents with different trading strategies and incentives.

  • Value Agents: The value agents are designed to emulate fundamental traders who trade according to their belief of the exogenous value of an asset, i.e., fundamental price. In this paper, we generate the time series of the fundamental price with a discrete-time mean-reverting Ornstein-Uhlenbeck process. Each value agent makes a noisy observation of the fundamental price, and places a sell order if the current mid price is higher than the observed fundamental price, or vice versa.

  • Market Maker Agent: The market maker agent acts as a liquidity provider in the market, by placing limit orders on both sides of the LOB at regular intervals. The agent places equal volumes of liquidity at various price levels that are pre-defined with respect to the mid-price.

  • Momentum Agents: The momentum agents trade based on the momentum of the asset price, by comparing the long-term average of the mid-price with the short-term average. Each agent places a buy order if the short-term average is higher, based on the belief that the price will increase in the future. On the other hand, if the long-term average is higher, the agent places a sell order.

  • Noise Agents: The noise agents mimic retail traders that trade based on their own demand with no consideration of the LOB microstructure. Each noise agent arrives at the market at a random time, and places an order on a random side of the LOB.
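The value agents' fundamental series can be sketched as a discrete-time mean-reverting Ornstein-Uhlenbeck process; the parameter names and values below are illustrative, not the paper's:

```python
import numpy as np

def ou_fundamental(p0, mu, kappa, sigma, n_steps, rng=None):
    """Discrete-time mean-reverting Ornstein-Uhlenbeck series:
    p[t+1] = p[t] + kappa * (mu - p[t]) + sigma * eps_t,  eps_t ~ N(0, 1).

    kappa sets the mean-reversion speed and sigma the noise scale.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    p = np.empty(n_steps)
    p[0] = p0
    for t in range(n_steps - 1):
        p[t + 1] = p[t] + kappa * (mu - p[t]) + sigma * rng.standard_normal()
    return p
```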

5. Experimental Results

We now examine the behavior of a myopic investor and a bounded rational investor in financial markets using high-fidelity market simulations. For each type of investor, we aim to answer the following questions:

  1. How does the degree of sub-rationality affect the agent’s ability to achieve rewards?

  2. What are the differences between sub-rational investor behavior and fully rational investor behavior?

  3. How does sub-rational trading behavior affect the financial market?

We consider a default market environment with 2 value agents, 1 market maker, 2 momentum agents, and 20 noise agents. The fundamental value of the market is generated as a stochastic Ornstein-Uhlenbeck process. To address the first question, we add a human investor with different degrees of sub-rationality to the default market scenario, and evaluate the profits during training and testing (Section 5.1). We then deploy the human investor alongside a fully rational electronic investor in a market with the same background trading agent configuration, but with a hand-crafted fundamental series, to address the second question (Sections 5.2 and 5.3). For the last question, we add human investors and electronic investors in varying proportions to the default market and compare the market observables (Section 5.4).

5.1. Performance of Sub-Rational Investors

We first look at the effects of the degree of sub-rationality on the cumulative rewards of the agent in the simulated market. Figure 3 (left) shows the average training reward of the myopic investor as a function of the training episode for various $\gamma$ (see Equation 2). We observe that during training, an agent with higher $\gamma$ learns a more profitable strategy, whereas an agent with small $\gamma$ learns a policy that results in negative rewards. We observe the same when we apply the trained myopic investor in test simulations (Figure 3 (center)).

Similarly, the performance of a bounded rational investor in terms of cumulative rewards increases with the rationality parameter $\beta$ (see Figure 3 (right)). As defined in Equation 3, bounded rational investors with $\beta = 0$ take uniformly random actions at each step. As a result, their decisions have high variance, with the distribution of rewards spreading across a larger range. Considering the definition of cumulative reward in Equation 4, our results indicate that investors with a higher degree of sub-rationality make smaller profits at the end of the trading day.

                      Buy/Sell Ratio             Trading Ratio
                    Stage 1  Stage 2  Stage 3  Stage 1  Stage 2  Stage 3
Electronic Investor   1.12     4.6     1.44      64%      56%      24%
Human Investor        1.91     0.32    1.48      80%      87%      68%
Table 1. The decisions of the myopic human investor and the rational electronic investor.

5.2. Behavior of Myopic Investors

In order to distinguish between the behavior of myopic investors and that of fully rational investors, we create a hand-crafted market scenario and apply both investors in the simulated market. The human investor is trained in the default market simulations with the Ornstein-Uhlenbeck fundamental using a small discount factor $\gamma$, while the electronic investor is trained with $\gamma = 1$ (see Equation 2). Figure 4 shows the market prices and the actions of the agents in a trading day. The trading day can be decomposed into three stages. In the first and last stages, the price increases in a linear trend, while the market is under a shock in the second stage with the price temporarily dropping by 1%. Since the asset price increases continuously within the first stage, both the electronic investor and the human investor place more buys than sells (Table 1), and they achieve similar rewards. When the market is under a shock, the myopic human investor tries to sell off their shares to mitigate the loss from the temporary price drop. The rational electronic investor, on the other hand, places more buy orders based on the belief that the price trend will return to normal and continue to increase in the future. As a result, the electronic investor obtains a significantly better reward compared to the myopic human investor.

The myopic human investor and the electronic investor also show different strategies in addressing the risk in the market. According to Figure 4 and Table 1, the myopic human investor reacts to the market trend with little consideration of the volatility, placing buy/sell orders 87% of the time during the shock. Meanwhile, the electronic investor tends to hold rather than trade when the volatility is high, and places fewer buy/sell orders during the second and the last stages (56% and 24% of the time).

Figure 4. The behavior of myopic investors in the simulated market. Compared to the fully rational electronic investor, a myopic human investor reacts too passively to the short-term losses and abandons its long-term investment plan.

In summary, the hand-crafted market scenario demonstrates that our model of psychologically myopic behavior successfully reproduces the myopic investing behavior observed in financial markets, i.e., myopic loss aversion (as described in Section 3.1).

5.3. Behavior of Bounded Rational Investors

Next, we compare the behavior of a bounded rational human investor to that of a fully rational electronic investor in another hand-crafted market. In particular, we create a market with the fundamental price following a sine wave, and deploy a bounded rational human investor with a moderate $\beta$ and a rational electronic investor with a large $\beta$ (see Equation 3). Figure 5 shows the market price and the actions of the two agents through a trading day. Since the market is simulated over a single period of the sine wave with the same starting and closing price, the optimal strategy is to sell when the price is higher than the closing price (first half of the day), and buy when the price is lower than the closing price (second half of the day).

Figure 5. The behavior of bounded rational investors in the simulated market. Compared to the fully rational electronic investor, a bounded rational human investor takes sub-optimal actions that are similar but inferior to the best.
Figure 6. Markets with different ratios of sub-rational human investors.

Overall, we observe similar decisions of the human and electronic investors based on the holding position in Figure 5: both of them tend to sell during the first half of the day, and buy during the second half. However, the bounded rational human investor does not sell and buy at maximum capacity, unlike the electronic investor, which builds a deeper holding position. As a result, the human investor cannot achieve the maximum reward even though its decisions are similar to those of the electronic investor to some extent. In addition, we plot the price of limit orders placed by both investors in Figure 5. Compared to the bounded rational human investor, the electronic investor is more confident and often places its orders deeper in the LOB. Limit orders placed far from the mid price yield more rewards, as the agent buys and sells at a better price, but carry a higher risk of not being executed.

In summary, the bounded rational model emulates human traders with limited information availability and processing capacity. We observe that such limitations lead to sub-optimal decision-making that is similar to, albeit inferior to, that of the homo economicus.

5.4. Impact of Sub-Rational Investors on the market

Here, we investigate how sub-rational trading behaviors affect the financial market. In particular, we examine and compare simulated markets populated with varying proportions of human and electronic investors. We perform this exercise for both models of human sub-rationality: psychologically myopic as well as bounded rational. For each proportion, we simulate 100 single day markets and show the distribution of average quoted volume, average traded volume, average spread, and average volatility in Figure 6. We do not observe significant changes in the quoted and traded volumes or the spread when increasing the ratio of myopic human investors. Markets populated by only myopic investors or only electronic investors have higher volatility. One reason is that all human investors share the same policy, as do all the electronic investors. Therefore, within each group of human/electronic investors, the trading strategies correspond to placing orders on the same side and at similar prices. This can easily push the asset price in one direction or the other. This contrasts with markets populated by both human and electronic traders with different incentives.

Our findings in Section 5.3 suggest that compared to fully rational electronic investors, the bounded rational human investors tend to place limit orders closer to the mid price. Although they might not buy and sell at a highly profitable price, their orders are more likely to be executed. According to Figure 6, a market populated with more bounded rational human investors has larger traded volume, smaller spread, and less volatility.

6. Discussion

In this work, we introduced two approaches to model sub-rational human investors in an RL setting: (1) psychologically myopic investors that maximize only short-term rewards regardless of the future, and (2) bounded rational investors that make sub-optimal decisions due to limitations in information access and computational power. We examined the profits of the sub-rational human investor as a function of their degree of sub-rationality in regular market scenarios. We also intuitively explained the trading strategies of both types of sub-rational human investors in hand-crafted market scenarios.

Upon populating the simulated market with varying ratios of human to electronic investors, we evaluate the effect on market observables such as traded volume, spread and volatility. We found that a market populated with more bounded rational investors and fewer electronic investors exhibited larger traded volume, smaller spread and lower volatility. We note that our comparative study is different from previous regulatory work on algorithmic and high-frequency trading (Boehmer et al., 2021; Woodward, 2017) in that we compare investors with different degrees of rationality, while not varying their trading frequencies in simulated markets.

A future direction of this work would be to consider other aspects of human sub-rationality. For example, economic studies show that investors make sub-optimal decisions due to overconfidence, which can be described as resulting from an illusion of knowledge or an illusion of control (Barber and Odean, 2002; Song et al., 2013).


We would like to thank Haibei Zhu for his help on the market simulations. This paper was prepared for informational purposes by the Artificial Intelligence Research group of JPMorgan Chase & Co. and its affiliates (“JP Morgan”), and is not a product of the Research Department of JP Morgan. JP Morgan makes no representation and warranty whatsoever and disclaims all liability, for the completeness, accuracy or reliability of the information contained herein. This document is not intended as investment research or investment advice, or a recommendation, offer or solicitation for the purchase or sale of any security, financial instrument, financial product or service, or to be used in any way for evaluating the merits of participating in any transaction, and shall not constitute a solicitation under any jurisdiction or to any person, if such solicitation under such jurisdiction or to such person would be unlawful.


  • S. Amrouni, A. Moulin, J. Vann, S. Vyetrenko, T. Balch, and M. Veloso (2021) ABIDES-gym: gym environments for multi-agent discrete event simulation and application to financial markets. In Proceedings of the Second ACM International Conference on AI in Finance, pp. 1–9. Cited by: §1, §2, §4.
  • K. Asadi and M. L. Littman (2017) An alternative softmax operator for reinforcement learning. In

    International Conference on Machine Learning

    pp. 243–252. External Links: Link Cited by: §3.2.
  • C. Baker, R. Saxe, and J. Tenenbaum (2005) Bayesian models of human action understanding. Advances in neural information processing systems 18. External Links: Link Cited by: §3.2.
  • T. H. Balch, M. Mahfouz, J. Lockhart, M. Hybinette, and D. Byrd (2019) How to evaluate trading strategies: single agent market replay or multiple agent interactive simulation?. arXiv preprint arXiv:1906.12010. Cited by: §1.
  • B. M. Barber and T. Odean (2002) Online investors: do the slow die first?. The Review of financial studies 15 (2), pp. 455–488. Cited by: §6.
  • S. Benartzi and R. H. Thaler (1999) Risk aversion or myopia? choices in repeated gambles and retirement investments. Management science 45 (3), pp. 364–381. Cited by: §1, §3.1.
  • E. Boehmer, K. Fong, and J. Wu (2021) Algorithmic trading and market quality: international evidence. Journal of Financial and Quantitative Analysis 56 (8), pp. 2659–2688. External Links: Document Cited by: §2, §6.
  • D. Byrd, M. Hybinette, and T. H. Balch (2019) Abides: towards high-fidelity market simulation for ai research. arXiv preprint arXiv:1904.12066. Cited by: §1, §2, §4.
  • L. Chan, A. Critch, and A. Dragan (2021) Human irrationality: both bad and good for reward inference. arXiv preprint arXiv:2111.06956. Cited by: §2, §3.
  • J. J. Chrisman and P. C. Patel (2012) Variations in r&d investments of family and nonfamily firms: behavioral agency and myopic loss aversion perspectives. Academy of management Journal 55 (4), pp. 976–997. Cited by: §1, §3.1.
  • K. Dwarakanath, S. S. Vyetrenko, and T. Balch (2021) Profit equitably: an investigation of market maker’s impact on equitable outcomes. In Proceedings of the Second ACM International Conference on AI in Finance, pp. 1–8. Cited by: §1, §2.
  • O. Evans and N. D. Goodman (2015) Learning the preferences of bounded agents. In NIPS Workshop on Bounded Optimality, Vol. 6, pp. 2–1. Cited by: §2.
  • O. Evans, A. Stuhlmüller, and N. Goodman (2016) Learning the preferences of ignorant, inconsistent agents. In Thirtieth AAAI Conference on Artificial Intelligence, Cited by: §2.
  • D. Friedman (2018) The double auction market institution: a survey. In The Double Auction Market Institutions, Theories, and Evidence, pp. 3–26. Cited by: §1.
  • C. Laidlaw and A. Dragan (2022) The boltzmann policy distribution: accounting for systematic suboptimality in human models. External Links: Link Cited by: Figure 1.
  • F. Larson, J. A. List, and R. D. Metcalfe (2016) Can myopic loss aversion explain the equity premium puzzle? evidence from a natural field experiment with professional traders. Technical report National Bureau of Economic Research. Cited by: §3.1.
  • T. Lux and M. Marchesi (1999) Scaling and criticality in a stochastic multi-agent model of a financial market. Nature 397 (6719), pp. 498–500. Cited by: §1, §2.
  • A. Raja and V. Lesser (2001) Towards bounded-rationality in multi-agent systems: a reinforcement-learning based approach. University of Massachusetts Computer Science Technical Report 34, pp. 2001. Cited by: §2.
  • H. A. Simon (1990) Bounded rationality. In Utility and probability, pp. 15–18. Cited by: §1, §3.2.
  • H. A. Simon (1997) Models of bounded rationality: empirically grounded economic reason. Vol. 3, MIT press. Cited by: §1, §3.2.
  • R. Song, S. Jang, D. Hanssens, and J. Suh (2013) When overconfidence meets reinforcement learning. Cited by: §6.
  • T. Spooner, J. Fearnley, R. Savani, and A. Koukorinis (2018) Market making via reinforcement learning. arXiv preprint arXiv:1804.04216. Cited by: §1.
  • R. H. Thaler, A. Tversky, D. Kahneman, and A. Schwartz (1997) The effect of myopia and loss aversion on risk taking: an experimental test. The quarterly journal of economics 112 (2), pp. 647–661. Cited by: §1, §3.1.
  • S. Vyetrenko, D. Byrd, N. Petosa, M. Mahfouz, D. Dervovic, M. Veloso, and T. Balch (2020) Get real: realism metrics for robust limit order book market simulations. In Proceedings of the First ACM International Conference on AI in Finance, pp. 1–8. Cited by: §2, §4.
  • M. Woodward (2017) The need for speed: regulatory approaches to high frequency trading in the united states and the european union. Vand. J. Transnat’l L. 50, pp. 1359. Cited by: §2, §6.