Mislearning from Censored Data: Gambler's Fallacy in a Search Problem

by Kevin He, et al.

In the context of a sequential search problem, I explore large-generations learning dynamics for agents who suffer from the "gambler's fallacy": the statistical bias of anticipating too much regression to the mean for realizations of independent random events. Searchers are uncertain about the search pool qualities of different periods but infer these fundamentals from the search outcomes of the previous generation. Searchers' stopping decisions impose a censoring effect on the data of their successors, as the values they would have found in later periods had they kept searching remain unobserved. While innocuous for rational agents, this censoring effect interacts with the gambler's fallacy and creates a feedback loop between distorted stopping rules and pessimistic beliefs about the search pool qualities of later periods. In general settings, the stopping rules used by different generations monotonically converge to a steady-state rule that stops searching earlier than optimal. In settings where true pool qualities increase over time (so there is option value in rejecting above-average early draws), learning is monotonically harmful and welfare strictly decreases across generations.




1 Introduction

The gambler’s fallacy is widespread. Many people believe that a fair coin has a higher chance of landing on tails after landing on heads three times in a row, think a son is “due” to a woman who has given birth to consecutive daughters, and, in general, expect too much reversal from sequential realizations of independent random events. Studies have documented the gambler’s fallacy in lottery games where the bias is strictly costly (Terrell, 1994; Suetens, Galbo-Jørgensen, and Tyran, 2016) and in incentivized lab experiments (Benjamin, Moore, and Rabin, 2017). Recent analysis of field data by Chen, Moskowitz, and Shue (2016) shows this bias also affects experienced decision-makers in high-stakes decisions, such as judges in asylum courts. Section 1.3 surveys more of this empirical literature.

This paper highlights novel implications of the gambler’s fallacy in optimal-stopping problems when agents are uncertain about the underlying distributions. As a running example, consider an HR manager (Alice) recruiting for a job opening by sequentially interviewing candidates. In deciding whether to hire a candidate, Alice needs to form a belief about the labor pool, that is, about the distribution of the future applicants she would meet if she kept the position open. She consults with other managers who have recruited for similar positions to learn about the distribution of talent in the labor pool, then decides on a stopping strategy for her own hiring problem. Suppose all managers believe in the gambler’s fallacy: they exaggerate how unlikely it is to get consecutive above-average or consecutive below-average applicants (relative to the labor pool mean). This error stems from the same psychology that leads people to exaggerate how unlikely it is to get consecutive heads or consecutive tails when tossing a fair coin. What are the implications of this bias for the managers’ beliefs and behavior over time?

In this example and other natural optimal-stopping problems, agents tend to stop when early draws are deemed “good enough,” leading to an asymmetric censoring of experience. When a manager discovers a very strong candidate early in the hiring cycle, she stops her recruitment efforts and future managers do not observe what alternative candidates she would have found for the same job opening with a longer search. This endogenous censoring effect interacts with the gambler’s fallacy bias and leads to pessimistic inference about the labor pool. Suppose Alice’s predecessors held the correct beliefs about the labor pool, and the qualities of different candidates are objectively independent. Predecessors with below-average early interviewees continue searching, but they are systematically surprised because their subsequent interviewees also turn out to be below-average half of the time, contrary to their (false) expectations of positive reversals after bad initial outcomes. When these managers communicate their disappointment to Alice, she becomes overly pessimistic about the labor pool. This pessimism informs Alice’s stopping strategy and affects the kind of (censored) experience that she communicates to future managers in turn.

This paper examines the endogenous learning dynamics of a society of agents who believe in the gambler’s fallacy. All agents face a common stage game: an optimal-stopping problem with draws in different periods independently generated from fixed yet unknown distributions.[1] They take turns playing the stage game, with the game’s outcome determining each agent’s payoff. Agents are Bayesians, except for the statistical bias. That is, they start with a prior belief over a class of feasible models about the joint distribution of draws. Feasible models are Gaussian distributions indexed by different unconditional means of the draws (the fundamentals). Reflecting a mistaken belief in reversals, all feasible models specify the same negative correlation between draws. Biased agents dogmatically believe that worse earlier draws lead to better distributions of later draws, conditional on the fundamentals. Before playing her own stage game, each agent observes the stage-game histories of her predecessors, then applies Bayes’ rule to update her beliefs about the fundamentals. This inference procedure amounts to misspecified Bayesian learning in the class of feasible models, a class that excludes the true draw-generating distribution.

[1] Using panel data, Conlon, Pilossoph, Wiswall, and Zafar (2018) find that unemployed workers make inferences about the wage distribution in the labor market using previously received (and rejected) offers, an empirical example of agents using histories to learn about the draw-generating distributions of an optimal-stopping problem. Interestingly, their paper also documents overinference from small samples, suggesting that workers exhibit the “law of small numbers” psychology that underlies the gambler’s fallacy.

I consider two social-learning environments. When agents play the stage game one at a time, the stochastic processes of their beliefs and behavior almost surely converge globally to a unique steady state in which agents are over-pessimistic about the fundamentals and stop too early relative to the objectively optimal strategy. This result formalizes the intuition about how the gambler’s fallacy interacts with the censoring effect to produce pessimistic inference.

When agents arrive in large generations, with everyone in the same generation playing simultaneously, society converges to the same steady state as in the previous environment. This large-generations model features deterministic learning dynamics and illustrates a positive-feedback cycle between distorted beliefs and distorted stopping strategies. More severely censored datasets lead to more pessimistic beliefs, while more pessimistic beliefs lead to earlier stopping and, as a consequence, heavier history censoring. Mapping back to the hiring example, suppose a firm appoints HR managers in cohorts. Upon arrival, each junior manager learns the recruiting experience of all previous managers. If managers in the first cohort start with correct beliefs about labor-market conditions, then the average hiring outcome monotonically deteriorates across all successive cohorts. After Alice and others in her cohort consult with their predecessors and end up with over-pessimistic inferences, their beliefs lead them to be less “choosy” when hiring and to keep searching only if their first interviewees prove to be truly unsatisfactory. On average, managers in Alice’s cohort who reject their early applicants under these newly lowered acceptance standards become disappointed by the quality of their later interviewees, even given these managers’ already pessimistic beliefs about the labor pool. This is because biased agents expect more positive reversal following worse early outcomes, so for a fixed realization of the later interviewee’s quality, managers experience greater disappointment following worse earlier applicants. The pessimism that managers in Alice’s generation started with thus becomes amplified in the next generation, leading to a further lowering of acceptance thresholds and a further decrease in the average quality of the hired candidate.
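The deterministic large-generations dynamics can be sketched numerically in a stripped-down special case. Everything below is an illustrative assumption rather than the paper's general model: Example 1 with no recall ($q = 0$), known variances, infinitely large generations that pool all earlier histories, and maximum-likelihood beliefs under Gaussian feasible models with conditional mean $\mu_2 - \gamma(x_1 - \mu_1)$. Under these assumptions the pooled data give $\hat\mu_2 = \mu_2^\bullet + \gamma(m - \mu_1^\bullet)$, where $m$ is the average first draw among continued histories. An agent continues iff $\hat\mu_2 - \gamma(x_1 - \mu_1^\bullet) > x_1$, so each generation uses the cutoff $c = (\mu_2^\bullet + \gamma m)/(1+\gamma)$. The first generation holds correct beliefs, which corresponds to $m = \mu_1^\bullet$. The function name and parameter values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for phi and Phi


def generation_cutoffs(mu1, mu2, sigma1, gamma, generations):
    """Cutoffs (stop iff x1 >= c) used by successive large generations.

    Hypothetical special case: Example 1 with q = 0, known variances, and
    maximum-likelihood beliefs from all earlier generations' histories.
    """
    cutoffs = []
    cont_mass = 0.0  # share of continued histories, summed over generations
    cont_x1 = 0.0    # E[X1; X1 < c], summed over generations
    m = mu1          # generation 1 has correct beliefs, i.e. acts as if m = mu1
    for _ in range(generations):
        c = (mu2 + gamma * m) / (1 + gamma)
        cutoffs.append(c)
        z = (c - mu1) / sigma1
        # For X1 ~ N(mu1, sigma1^2): E[X1; X1 < c] = mu1 * Phi(z) - sigma1 * phi(z)
        cont_mass += STD_NORMAL.cdf(z)
        cont_x1 += mu1 * STD_NORMAL.cdf(z) - sigma1 * STD_NORMAL.pdf(z)
        m = cont_x1 / cont_mass  # average first draw among continued histories
    return cutoffs


cuts = generation_cutoffs(mu1=0.0, mu2=0.5, sigma1=1.0, gamma=0.5, generations=30)
print([round(c, 3) for c in cuts[:5]], "...", round(cuts[-1], 3))
```

In this toy run the cutoffs should fall monotonically and stay below $\mu_2^\bullet = 0.5$, which is the objectively optimal cutoff when $q = 0$. This matches the positive-feedback story above.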

The endogenous-data setting leads to novel comparative statics predictions about how the payoff parameters of the stage game affect learning outcomes under the gambler’s fallacy. For instance, suppose managers become more impatient, incurring a larger waiting cost when they decide to continue searching. If data is exogenous or if agents are correctly specified, then learning outcomes are independent of the details of the decision problem. When agents believing in the gambler’s fallacy learn from endogenously censored histories, however, lower patience in the stage game leads to more distorted long-run beliefs about the fundamentals. This result is another expression of the positive feedback between actions and beliefs. Impatient managers use lower acceptance thresholds, so they tend to be more disappointed by the lack of reversals in the search process compared to their more patient (and therefore choosier) counterparts. This implies that more impatient agents also become more pessimistic about the fundamentals, thus compounding their initial lowering of the cutoff threshold (due to impatience) with a further change in the same direction (due to beliefs).
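The comparative static on impatience can be sketched in the same toy setting. Assume, hypothetically, that continuing costs a waiting cost $k \ge 0$ and that there is no recall. An agent then continues iff $\hat\mu_2 - \gamma(x_1 - \mu_1^\bullet) - k > x_1$. Also assume that in the steady state the data come from a single cutoff $c$. Using the same maximum-likelihood reasoning as for known-variance Gaussian feasible models, the steady state solves $c = (\mu_2^\bullet + \gamma\,\mathbb{E}[X_1 \mid X_1 < c] - k)/(1+\gamma)$, and the sketch below finds it by bisection. Names and parameter values are illustrative.

```python
import math


def inv_mills(z):
    """phi(z) / Phi(z) for the standard normal; erfc keeps the left tail accurate."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    Phi = 0.5 * math.erfc(-z / math.sqrt(2))
    return phi / Phi


def steady_state(mu1, mu2, sigma1, gamma, k, tol=1e-12):
    """Hypothetical steady state (cutoff c, belief mu2_hat) with waiting cost k.

    Solves c = (mu2 + gamma * E[X1 | X1 < c] - k) / (1 + gamma) by bisection.
    """
    def cond_mean(c):  # E[X1 | X1 < c] for X1 ~ N(mu1, sigma1^2)
        return mu1 - sigma1 * inv_mills((c - mu1) / sigma1)

    def gap(c):  # strictly increasing in c, so the root is unique
        return (1 + gamma) * c - mu2 - gamma * cond_mean(c) + k

    lo, hi = mu1 - 6 * sigma1, mu1 + 6 * sigma1
    if not gap(lo) < 0 < gap(hi):
        raise ValueError("bracket does not contain the steady state")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return c, mu2 + gamma * (cond_mean(c) - mu1)


for k in (0.0, 0.2, 0.4):
    c, mu2_hat = steady_state(mu1=0.0, mu2=0.5, sigma1=1.0, gamma=0.5, k=k)
    print(f"k={k}: cutoff {c:.3f}, long-run belief about mu2 {mu2_hat:.3f}")
```

Raising `k` should lower both the steady-state cutoff and the long-run belief about $\mu_2$, matching the comparative static in the text. Setting `gamma=0.0` (a correctly specified learner) leaves the belief at $\mu_2^\bullet$ for every `k`.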

Finally, I expand the set of feasible models and consider agents who are uncertain about both the means and variances of the draw-generating distributions. I show that in this joint estimation, agents make the same misinference about the means as in the baseline model. However, they exaggerate the variances in a way that depends on the censoring of histories in their dataset. In the hiring example, this exaggeration corresponds to Alice believing that applicants for different vacancies come from different labor pools that vary in average quality, when in reality applicants for all vacancies originate from the same pool with a fixed quality distribution. Alice’s belief in vacancy-specific fixed effects helps her explain the experience of her predecessors who had consecutive below-average interviewees, reasoning that it must have been especially difficult to find good candidates for these particular job openings. The severity of censoring in Alice’s dataset determines her belief about the variance in average candidate quality across different vacancies, a belief that also influences her stopping strategy. I derive two results that illustrate how this belief in fictitious variation interacts with endogenous learning. First, when the stage-game payoff function is convex in draws (such as when previously rejected candidates can be recalled with some probability in the sequential interviewing problem), the positive-feedback cycle of the baseline environment strengthens. This is because a more severely censored dataset not only makes future agents more pessimistic about the fundamentals through the usual censoring effect, but also decreases their belief in fictitious variation. Due to the convexity of the optimal-stopping problem, both forces encourage earlier stopping, leading to even heavier data censoring in the future. Second, a society where agents are uncertain about the variances can end up with a different long-run belief about the means than another society where agents know the correct variances. This is despite the fact that agents in both societies would make the same (mis)inference about the means given the same dataset of histories.

While I focus on (misspecified) Bayesian agents estimating parameters of a Gaussian model, my main results remain robust to a range of alternative specifications. These include a non-Bayesian method-of-moments inference procedure and general distributional assumptions.

1.1 Key Contributions

This work contributes to two strands of literature: the behavioral economics literature on inference mistakes for biased learners, and the theoretical literature on the dynamics of misspecified endogenous learning.

As a contribution to behavioral economics, I highlight a novel channel of misinference for behavioral agents: the interaction between psychological bias and data censoring. In many natural environments, agents learn from censored data. The economics literature has recently focused on the learning implications of selection neglect in these settings, where agents act as if their dataset is not censored.[2] My work points out that other well-documented behavioral biases can also interact with data censoring in interesting ways, producing important and novel implications in these environments. Mislearning in my model stems precisely from this interaction, not from either censored data or the gambler’s fallacy alone. If agents were correctly specified (i.e., they did not suffer from the statistical bias), then they would correctly learn the fundamentals even from a dataset of censored histories. On the other hand, consider an “uncensored” environment where agents observe what their predecessors would have drawn in each period of the optimal-stopping problem, regardless of the actual stopping decisions of these predecessors. In such a counterfactual scenario, even biased agents would learn the fundamentals correctly. The intuition is that the gambler’s fallacy is a “symmetric” bias. The “asymmetric” outcome of over-pessimism only occurs when the bias interacts with an (endogenous) asymmetric censoring mechanism that tends to produce data containing negative streaks but not positive streaks.

[2] See, for example, Enke (2017) and Jehiel (2018).
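The contrast between these three cases can be checked with a small simulation. This is a sketch under assumptions of my own, not the paper's procedure. It uses Gaussian feasible models with conditional mean $\mu_2 - \gamma(x_1 - \mu_1)$ (as in Section 2.1.2) and known variances, together with a fixed cutoff rule. A maximum-likelihood point estimate stands in for the Bayesian posterior, which it should approximate in large samples. Writing $\theta = \mu_2 + \gamma\mu_1$, the feasible likelihood separates, giving $\hat\mu_2 = \overline{(x_2 + \gamma x_1)}_{\text{cont}} - \gamma\,\overline{x_1}_{\text{all}}$. The first average runs over histories that continued and the second over all histories. Function names and parameter values are hypothetical.

```python
import random
from statistics import fmean


def mle_mu2(histories, gamma):
    """ML estimate of mu2 under Gaussian feasible models with known variances.

    histories: (x1, x2) pairs with x2 = None when the agent stopped. With
    theta = mu2 + gamma * mu1 the likelihood separates into a part in mu1
    (all first draws) and a part in theta (continued histories only).
    """
    mu1_hat = fmean(x1 for x1, _ in histories)
    theta_hat = fmean(x2 + gamma * x1 for x1, x2 in histories if x2 is not None)
    return theta_hat - gamma * mu1_hat


def simulate(cutoff, n, mu1=0.0, mu2=0.5, sigma=1.0, seed=1):
    """Objective model: independent draws; the agent stops iff x1 >= cutoff."""
    rng = random.Random(seed)
    histories = []
    for _ in range(n):
        x1, x2 = rng.gauss(mu1, sigma), rng.gauss(mu2, sigma)
        histories.append((x1, None if x1 >= cutoff else x2))
    return histories


censored = simulate(cutoff=0.0, n=200_000)
uncensored = simulate(cutoff=float("inf"), n=200_000)  # every x2 is observed
print("biased, censored:                 ", round(mle_mu2(censored, gamma=0.5), 3))
print("biased, uncensored:               ", round(mle_mu2(uncensored, gamma=0.5), 3))
print("correctly specified, censored:    ", round(mle_mu2(censored, gamma=0.0), 3))
```

With these arbitrary parameters, only the biased estimate from censored data should land well below the true value 0.5, near $0.5 + 0.5\,\mathbb{E}[X_1 \mid X_1 < 0] \approx 0.10$. The other two estimates should be close to 0.5.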

As a theoretical contribution, I prove convergence of beliefs and behavior in a non-self-confirming misspecified setting with a continuum of states of the world. Economists study many kinds of misspecifications with the property that even the “best-fitting” feasible belief does not match data exactly; that is to say, no feasible self-confirming belief exists. The gambler’s fallacy falls into this family, since all feasible models imply a negative correlation absent in the data. I analyze the stochastic processes of belief and behavior under this statistical bias, proving their global almost-sure convergence to a unique steady state. This is a technically challenging problem. For biased agents facing a large dataset of histories generated from a fixed censoring threshold, the “best-fitting” feasible belief[3] depends on the threshold. But in the environment where agents act in a sequence, histories of predecessors are generated one at a time based on their ex-ante random and correlated stopping strategies, which may start arbitrarily far away from the steady-state stopping behavior. In related work, Heidhues, Koszegi, and Strack (2018) study learning dynamics under a different bias: overconfidence about one’s own ability. Despite being biased, agents in their setting always have some feasible belief that exactly rationalizes data, and so their learning steady state is a self-confirming equilibrium. By contrast, the steady state in my paper is not self-confirming. In addition, I prove my convergence result in a setting with multiple dimensions of uncertainty (the distributional parameters for different periods of the stage game), whereas Heidhues, Koszegi, and Strack (2018) consider convergence of misspecified learning with one-dimensional uncertainty. Fudenberg, Romanyuk, and Strack (2017) study a continuous-time model of active learning under misspecification, but their learning problem has an even more restricted state space. The agent’s belief is binary; that is to say, her prior is supported on exactly two feasible models. In my setting, agents’ prior belief about each distributional parameter is supported on a continuum of feasible values.

[3] More precisely, the feasible model whose implied history distribution has the minimum (though still positive) Kullback-Leibler divergence relative to the observed history distribution, given the history censoring threshold.

As another contribution to the theoretical literature on misspecified learning dynamics, my project studies a new mechanism of endogeneity: the censoring effect in a dynamic stage game. A dynamic stage game is essential for studying learning under the gambler’s fallacy, a behavioral bias concerning the serial correlation of data. The censoring effect relies on the dynamic structure of the decision problem and has no analog in the static stage-games of Heidhues, Koszegi, and Strack (2018) and Fudenberg, Romanyuk, and Strack (2017). In my setting, the type of data that an agent generates depends on her beliefs. To understand the distinction from the existing literature, consider the classic paper in this area, Nyarko (1991), who studies a monopolist setting a price on each date and observing the resulting sales. No matter what action the monopolist takes, she observes the same type of data: quantity sold. Similarly, the agent in Fudenberg, Romanyuk, and Strack (2017) always observes payoffs and the agent in Heidhues, Koszegi, and Strack (2018) always observes output levels, after any action. Endogenous learning in these other papers takes the form of agents attributing different meanings to the same type of data, when interpreted through the lenses of different actions. On the other hand, we may think of stage-game histories censored with different thresholds as different types of data about the fundamentals. The distinction is that these different types of data, by themselves, lead to different beliefs about the fundamentals for biased learners. Actions play no role in inference except to generate these different types of data, since the likelihood of a (feasible) history does not depend on the censoring threshold that produced it.

1.2 Other Related Theoretical Work

Rabin (2002) and Rabin and Vayanos (2010) are the first to study the inferential mistakes implied by the gambler’s fallacy. Except for an example in Rabin (2002), discussed below, all such investigations focus on passive inference, whereby learners observe an exogenous information process. By contrast, I examine an endogenous learning setting where the actions of predecessors censor the dataset of future learners. This setting allows me to ask whether the feedback loop between learners’ actions and biased beliefs will attenuate or exaggerate the distortions caused by the fallacy over the course of learning. In addition, relative to this existing literature, the present paper uniquely focuses on the dynamics of mislearning under the gambler’s fallacy. I prove that the stochastic process of beliefs and behavior almost surely converges when biased agents act one at a time, and I trace out the exact trajectory of beliefs and behavior when agents act in generations.

Section 7 of Rabin (2002) discusses an example of endogenous learning under a finite-urn model of the gambler’s fallacy. The nature of Rabin (2002)’s endogenous data, however, is unrelated to the censoring effect central to my paper.[4] Therefore, my central mechanism highlights a novel interaction between data censoring and the gambler’s fallacy bias that is absent in the previous literature. In Appendix E, I modify that example to induce the censoring effect. I find a misinference result in his finite-urn model of the gambler’s fallacy, similar to what I find in the continuous Gaussian model of this paper. This exercise shows the robustness of my results within different modeling frameworks of the same statistical bias.

[4] In Rabin (2002)’s example, biased agents (correctly) believe that the part of the data which is always observable is independent of the part of the data which is sometimes missing. By contrast, what I term the “censoring effect” is about misinference resulting from agents wrongly believing in negative correlation between the early draws that are always observed and the later draws that may be censored, depending on the realizations of the early draws.

My steady state corresponds to Esponda and Pouzo (2016)’s Berk-Nash equilibrium. Rather than focusing only on equilibrium analysis, however, I focus on non-equilibrium learning dynamics and prove global convergence. That is, in the environment with agents acting one at a time, society converges to the steady state for all prior beliefs satisfying regularity conditions. In the environment where agents act in large generations, society converges for all initial conditions of the first generation. The large-generations environment also allows me to study how the positive feedback between beliefs and stopping strategies leads to monotonic convergence.

Although my learning framework involves short-lived agents learning from predecessors’ histories, the social-learning aspect of my framework is not central to the results. In fact, the environment where a sequence of short-lived agents acts one at a time is equivalent to the environment where a single long-lived agent plays the stage game repeatedly, myopically maximizing her expected payoff in each iteration of the stage game. In the growing literature on social learning with misspecified Bayesians (e.g., Eyster and Rabin (2010); Guarino and Jehiel (2013); Bohren (2016); Bohren and Hauser (2018); Frick, Iijima, and Ishii (2018)), agents observe their predecessors’ actions but make errors when inverting these actions to deduce said predecessors’ information about the fundamentals. This kind of action inversion does not take place in my framework: later agents observe all the information that their predecessors have seen, so predecessors’ actions carry no additional information.

The econometrics literature has also studied data-generating processes with censoring, for example the Tobit model and models of competing risks.[5] This literature has primarily focused on the issue of model identification from censored data (Cox, 1962; Tsiatis, 1975; Heckman and Honoré, 1989). In my setting, there is no identifiability problem for correctly specified agents, since censored histories can identify the mean and the covariance matrix of the draws. Instead, I study how agents make wrong inferences from censored data when they have a family of misspecified models. Another contrast is that the econometrics literature has focused on exogenous data-censoring mechanisms, but censoring is endogenous in my setting and depends on the beliefs of previous agents. As discussed before, this endogeneity is central to my results.

[5] References can be found in Amemiya (1985) and Crowder (2001).

1.3 Empirical Evidence on the Gambler’s Fallacy

Bar-Hillel and Wagenaar (1991) review classical psychology studies on the gambler’s fallacy. The earliest lab evidence involves two types of tasks. In “production tasks,” subjects are asked to write down sequences using a given alphabet, with the goal of generating sequences that resemble the realizations of an i.i.d. random process. Subjects tend to produce sequences with too many alternations between symbols, as they attempt to locally balance out symbol frequencies. In “judgment tasks” where subjects are asked to identify which sequence of binary symbols appears most like consecutive tosses of a fair coin, subjects find sequences with an alternation probability of 0.6 more random than those with an alternation probability of 0.5. While most of these studies are unincentivized, Benjamin, Moore, and Rabin (2017) have found the gambler’s fallacy with strict monetary incentives, where a bet on a fair coin continuing its streak pays strictly more than the bet on the streak reversing. Barron and Leider (2010) have shown that experiencing a streak of binary outcomes one at a time exacerbates the gambler’s fallacy, compared with simply being told the past sequence of outcomes all at once.

Other studies have identified the gambler’s fallacy using field data on lotteries and casino games. Unlike in experiments, agents in field settings are typically not explicitly told the underlying probabilities of the randomization devices. In state lotteries, players tend to avoid betting on numbers that have very recently won. This under-betting behavior is strictly costly for the players when lotteries have a pari-mutuel payout structure (as in the studies of Terrell (1994) and Suetens, Galbo-Jørgensen, and Tyran (2016)), since it leads to a larger-than-average payout per winner in the event that the same number is drawn again the following week. Using security video footage, Croson and Sundali (2005) show that roulette gamblers in casinos bet more on a color after a long streak of the opposite color. Narayanan and Manchanda (2012) use individual-level data tracked using casino loyalty cards to find that a larger recent win has a negative effect on the next bet that the gambler places, while a larger recent loss increases the size of the next bet. Finally, using field data from asylum judges, loan officers, and baseball umpires, Chen, Moskowitz, and Shue (2016) show that even very experienced decision-makers show a tendency to alternate between two decisions across a sequence of randomly ordered decision problems. This can be explained by the gambler’s fallacy, as the fallacy leads to the belief that the objectively “correct” decision is negatively auto-correlated across a sequence of decision problems. The authors rule out several other explanations, including contrast effect and quotas.

As Rabin (2002) and Rabin and Vayanos (2010) have argued, someone who dogmatically believes in the gambler’s fallacy must attribute the lack of reversals in the data to the fundamental probabilities of the randomizing device, leading to overinference from small samples. This overinference can be seen in the field data. Cumulative win/loss (as opposed to very recent win/loss) on a casino trip is positively correlated with the size of future bets (Narayanan and Manchanda, 2012). A player who believes in the gambler’s fallacy rationalizes his persistent good luck on a particular day by thinking he must be in a “hot” state, where his fundamental probability of winning in each game is higher than usual. In a similar vein, a number that has been drawn more often in the past six weeks, excluding the most recent past week, gets more bets in the Denmark lottery (Suetens, Galbo-Jørgensen, and Tyran, 2016). This kind of overinference resulting from small samples persists even in a market setting where participants have had several rounds of experience and feedback (Camerer, 1987). In line with these studies, the model I consider involves agents who dogmatically believe in the gambler’s fallacy and misinfer some parameter of the world as a result — though the misinference mechanism in my model is further complicated by the presence of endogenous data censoring.

2 Overview

This section presents the basic elements of the model, previews my main results, and provides intuition for how the censoring effect drives my conclusions. I describe a class of optimal-stopping problems serving as the (single-player) stage game. Agents are uncertain about the distribution of draws in the stage game. They entertain a prior belief over a family of distributions that they find plausible, the feasible models of how draws are generated. All feasible models specify the same negative correlation between the draws, even though draws are objectively independent: an error reflecting the gambler’s fallacy. Sections 3 and 4 embed these model elements into social-learning environments. In each environment, a society of agents takes turns playing the stage game, making inferences over feasible models using others’ stage-game histories. Section 5 contains a number of extensions that verify the robustness of my main results with regard to different specifications.

2.1 Basic Elements of the Model

2.1.1 Optimal-Stopping Problem as a Dynamic Stage Game

The stage game is a two-period optimal-stopping problem. In the first period, the agent draws $x_1 \in \mathbb{R}$ and decides whether to stop. If she stops at $x_1$, her payoff is $u_1(x_1)$ and the stage game ends. Otherwise, she continues to the second period, where she draws $x_2 \in \mathbb{R}$. The stage game then ends with the agent getting payoff $u_2(x_1, x_2)$.

The payoff functions $u_1$ and $u_2$ satisfy some regularity conditions to be introduced in Assumption 1. The following example satisfies Assumption 1 and will be used to illustrate my results throughout this paper.

Example 1 (search with probability of recall).

Many industries have a regular hiring cycle each year. Consider a firm in such an industry and its HR manager, who must fill a job opening during this year’s cycle. In the early phase of the hiring cycle, she finds a candidate who would bring net benefit $x_1$ to the firm if hired. She must decide between hiring this candidate immediately and waiting. Choosing to wait means she will continue searching in the late phase of the hiring cycle, finding another candidate who would bring benefit $x_2$ to the organization. Waiting, however, carries the risk that the early candidate accepts an offer from a different firm in the interim. Suppose there is probability $q \in [0, 1)$ that the early candidate will remain available in the late hiring phase. This situation then has payoff functions $u_1(x_1) = x_1$ and $u_2(x_1, x_2) = q \cdot \max(x_1, x_2) + (1 - q) \cdot x_2$. That is, in the late phase, there is probability $q$ that the manager gets a payoff equal to the higher of the two candidates’ qualities, and probability $1 - q$ that she only has the option to hire the second candidate.
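A minimal Python sketch of the stage game under Example 1 may help fix ideas. For illustration only, it assumes the agent uses a cutoff strategy (stop iff $x_1$ is at least some threshold). The function names `u1`, `u2`, and `play` are hypothetical, and `None` stands in for an unobserved second draw.

```python
def u1(x1):
    """Payoff from hiring the early candidate immediately (Example 1)."""
    return x1


def u2(x1, x2, q):
    """Expected payoff from waiting: with probability q the early candidate
    is still available and the manager hires the better of the two."""
    return q * max(x1, x2) + (1 - q) * x2


def play(cutoff, x1, x2, q):
    """Play the two-period stage game with a cutoff strategy.

    Stops iff x1 >= cutoff. Returns (payoff, history), where the history
    records None in place of the second draw when the agent stops.
    """
    if x1 >= cutoff:
        return u1(x1), (x1, None)
    return u2(x1, x2, q), (x1, x2)


print(play(cutoff=0.5, x1=1.0, x2=2.0, q=0.5))  # prints (1.0, (1.0, None))
print(play(cutoff=0.5, x1=0.0, x2=2.0, q=0.5))  # prints (2.0, (0.0, 2.0))
```

Because the strategy depends only on $x_1$, the same rule determines both the stopping decision and which part of the history is recorded.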

I now present regularity conditions on the payoff functions that define the class of optimal-stopping problems I study.

Assumption 1 (regularity conditions).

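The payoff functions $u_1 : \mathbb{R} \to \mathbb{R}$ and $u_2 : \mathbb{R}^2 \to \mathbb{R}$ satisfy:

  (a) For $x_1' > x_1''$ and $x_2' > x_2''$, $u_1(x_1') > u_1(x_1'')$ and $u_2(x_1, x_2') > u_2(x_1, x_2'')$ for every $x_1 \in \mathbb{R}$.

  (b) For $x_1' > x_1''$ and any $x_2 \in \mathbb{R}$, $u_1(x_1') - u_1(x_1'') > u_2(x_1', x_2) - u_2(x_1'', x_2)$.

  (c) There exist $\bar{x}_1, \underline{x}_2, \underline{x}_1, \bar{x}_2 \in \mathbb{R}$ so that $u_1(\bar{x}_1) > u_2(\bar{x}_1, \underline{x}_2)$ while $u_1(\underline{x}_1) < u_2(\underline{x}_1, \bar{x}_2)$.

  (d) $u_1$ and $u_2$ are continuous. Also, for any $x_1 \in \mathbb{R}$, $x_2 \mapsto u_2(x_1, x_2)$ is absolutely integrable with respect to any Gaussian distribution on $\mathbb{R}$.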

Assumption 1(a) says $u_1$ and $u_2$ are strictly increasing in the draws of their respective periods. Assumption 1(b) says a higher realization of the early draw increases first-period payoff more than it changes second-period payoff. Under Assumption 1(a), Assumption 1(b) is satisfied whenever $u_2(x_1, x_2)$ is not a function of $x_1$, as in optimal-stopping problems where stopping in period $t$ gives a payoff depending only on the $t$-th draw. More generally, Assumption 1(b) is satisfied when $u_2$ is separable across the draws of the two periods with $\partial u_2 / \partial x_1 < u_1'(x_1)$ at all $x_1$. Assumption 1(c) says there exist a good enough realization $\bar{x}_1$ and a bad enough realization $\underline{x}_2$ such that the agent prefers stopping in period 1 after $\bar{x}_1$ to continuing when she knows for sure that her second draw will be $\underline{x}_2$. Conversely, there are $\underline{x}_1$ and $\bar{x}_2$ such that she prefers continuing after $\underline{x}_1$ if she knows she will get $\bar{x}_2$ in the second period for sure. Assumption 1(d) is a technical condition. The absolute integrability requirement ensures that the expected payoff from choosing to continue is always well-defined. These conditions are satisfied by my recurrent example.[6]

[6] Omitted proofs from the main text can be found in Appendix A.

Claim 1.

Example 1 satisfies Assumption 1.
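As a quick numerical sanity check of Claim 1, the sketch below codes Example 1's payoffs as $u_1(x_1) = x_1$ and $u_2(x_1, x_2) = q\max(x_1, x_2) + (1-q)x_2$ and tests, on a finite grid, the properties that the discussion of Assumption 1 describes in words. The function names and the grid are illustrative choices, a grid check is not a proof, and the check assumes $q < 1$ (with $q = 1$, $u_2$ is not strictly increasing in $x_2$).

```python
import itertools

def u1(x1):
    """Example 1, period 1: hire the early candidate."""
    return x1

def u2(x1, x2, q):
    """Example 1, period 2: with probability q the early candidate is still
    available, so the manager gets the better of the two; otherwise only x2."""
    return q * max(x1, x2) + (1.0 - q) * x2

GRID = [i / 2.0 for i in range(-10, 11)]  # -5.0, -4.5, ..., 5.0

def spot_check(q):
    """Grid check of the verbal content of Assumption 1(a)-(c) (assumes q < 1)."""
    for a, b, z in itertools.product(GRID, GRID, GRID):
        if b <= a:
            continue
        # (a) each payoff strictly increases in its own period's draw
        assert u1(b) > u1(a)
        assert u2(z, b, q) > u2(z, a, q)
        # (b) raising x1 from a to b raises u1 by more than it changes u2
        assert u1(b) - u1(a) > u2(b, z, q) - u2(a, z, q)
    # (c) a good early draw beats a sure bad second draw, and conversely
    assert u1(1.0) > u2(1.0, 0.0, q)
    assert u1(0.0) < u2(0.0, 1.0, q)
    return True

for q in (0.0, 0.5, 0.9):
    print(q, spot_check(q))
```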

I now define strategies and histories of the stage game.

Definition 1.

A strategy is a function $S: \mathbb{R} \to \{\text{Stop}, \text{Continue}\}$ that maps the realization of the first-period draw into a stopping decision.

Without loss of generality, I only consider pure strategies, because there always exists a payoff-maximizing pure strategy under any belief about the distribution of draws.

Definition 2.

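The history of the stage game is an element $h \in \mathbb{H} := \mathbb{R} \times (\mathbb{R} \cup \{\varnothing\})$. If an agent decides to stop after $X_1 = x_1$, her history is $(x_1, \varnothing)$. If the agent continues after $X_1 = x_1$ and draws $X_2 = x_2$ in the second period, her history is $(x_1, x_2)$.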

The symbol $\varnothing$ is a censoring indicator, emphasizing that the hypothetical second-period draw is unobserved when an agent does not continue into the second period. In Example 1, if the HR manager hires the first candidate, she stops her recruitment efforts early, and the counterfactual second candidate whom she would have found had she kept the position open remains unknown.
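A minimal sketch of how a cutoff strategy censors histories, assuming the strategy stops exactly when $x_1 \ge c$, standard normal true draws, and Python's `None` standing in for the censoring symbol. The name `play_stage_game` and all parameter values are illustrative, not taken from the paper.

```python
import random

CENSORED = None  # stand-in for the censoring indicator

def play_stage_game(c, mu1, mu2, sigma1, sigma2, rng):
    """One round under the cutoff strategy that stops iff x1 >= c.
    Draws are independent Gaussians, as in the objective model."""
    x1 = rng.gauss(mu1, sigma1)
    x2 = rng.gauss(mu2, sigma2)  # realized either way, observed only if she continues
    return (x1, CENSORED) if x1 >= c else (x1, x2)

rng = random.Random(0)
histories = [play_stage_game(0.5, 0.0, 0.0, 1.0, 1.0, rng) for _ in range(10_000)]
share_censored = sum(h[1] is CENSORED for h in histories) / len(histories)
uncensored_x2 = [x2 for _, x2 in histories if x2 is not CENSORED]
# share_censored should be near P(X1 >= 0.5), about 0.31; uncensored x2 still
# averages near mu2 = 0 because the draws are objectively independent.
print(share_censored, sum(uncensored_x2) / len(uncensored_x2))
```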

2.1.2 Feasible Models and the Objective Model

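Objectively, draws in the stage game are independently distributed with Gaussian distributions $X_1 \sim \mathcal{N}(\mu_1^\bullet, \sigma_1^2)$ and $X_2 \sim \mathcal{N}(\mu_2^\bullet, \sigma_2^2)$ for some $\mu_1^\bullet, \mu_2^\bullet \in \mathbb{R}$ and $\sigma_1^2, \sigma_2^2 > 0$. The parameters $\mu_1^\bullet, \mu_2^\bullet$ are fixed and called true fundamentals. In Example 1, $\mu_1^\bullet$ and $\mu_2^\bullet$ stand for the underlying qualities of the two applicant pools in the early and late phases of the hiring season.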

Agents are uncertain about the distribution of $(X_1, X_2)$. The next definition provides a language to discuss the set of distributions that a gambler’s fallacy agent deems plausible.

Definition 3.

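The set of feasible models is a family of joint distributions of $(X_1, X_2)$, $\{\Psi(\mu_1, \mu_2; \gamma) : (\mu_1, \mu_2) \in \mathcal{M}\}$, indexed by feasible fundamentals $\mathcal{M} \subseteq \mathbb{R}^2$, for some bias parameter $\gamma > 0$. Here $\Psi(\mu_1, \mu_2; \gamma)$ refers to the subjective model

$$X_1 \sim \mathcal{N}(\mu_1, \sigma_1^2), \qquad X_2 \mid (X_1 = x_1) \sim \mathcal{N}\big(\mu_2 - \gamma\,(x_1 - \mu_1),\ \sigma_2^2\big),$$

where $X_2 \mid (X_1 = x_1)$ is the conditional distribution of $X_2$ given $X_1 = x_1$.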

Every feasible model has the property that $\mathbb{E}_{\Psi(\mu_1, \mu_2; \gamma)}[X_2 \mid X_1 = x_1]$ decreases in $x_1$, which reflects the gambler’s fallacy. Conditional on the fundamentals, if the realization of $X_1$ is higher than expected, then the agent believes bad luck is due in the near future and the second draw is likely below average.[Footnote 7: I study the gambler’s fallacy for continuous random variables, where the magnitude of $X_1$ affects the agent’s prediction about $X_2$. Chen, Moskowitz, and Shue (2016)’s analysis of baseball umpire data provides support for the continuous version of the statistical bias. They find that an umpire is more likely to call the current pitch a ball after having called the previous pitch a strike, controlling for the actual location of the pitch. Crucially, the effect size is larger after more obvious strikes, where “obviousness” is based on the distance of the pitch to the center of the regulated strike zone. This distance can be thought of as a continuous measure of the “quality” of each pitch.] Conversely, an exceptionally bad early draw likely portends above-average luck in the next period. This expected luck reversal is more obvious in the following equivalent formulation of $\Psi(\mu_1, \mu_2; \gamma)$:

$$X_1 = \mu_1 + \epsilon_1, \qquad X_2 = \mu_2 + \epsilon_2,$$

where $\epsilon_1 \sim \mathcal{N}(0, \sigma_1^2)$ and $\epsilon_2 \mid \epsilon_1 \sim \mathcal{N}(-\gamma \epsilon_1, \sigma_2^2)$. The zero-mean terms $\epsilon_1, \epsilon_2$ represent the idiosyncratic factors, or “luck,” that determine how the realizations of $X_1$ and $X_2$ deviate from their unconditional means $\mu_1$ and $\mu_2$ in the model $\Psi(\mu_1, \mu_2; \gamma)$. The subjective model stipulates reversal of luck, since $\epsilon_1$ and $\epsilon_2$ are negatively correlated. Larger $\gamma$ implies greater magnitude in these expected reversals and thus more bias.
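The luck formulation above translates directly into a sampler. This sketch draws from the subjective model with illustrative values $\mu_1 = \mu_2 = 0$, $\sigma_1 = \sigma_2 = 1$ and $\gamma = 0.5$; the helper names are hypothetical. Under these assumptions the implied correlation of $X_1$ and $X_2$ is $-\gamma\sigma_1/\sqrt{\gamma^2\sigma_1^2 + \sigma_2^2}$, and a high early draw lowers the agent's forecast of the second draw.

```python
import math
import random

def sample_subjective(mu1, mu2, gamma, sigma1, sigma2, rng):
    """One draw (X1, X2) from Psi(mu1, mu2; gamma) in its luck form:
    X1 = mu1 + eps1, X2 = mu2 + eps2, eps1 ~ N(0, sigma1^2),
    eps2 | eps1 ~ N(-gamma * eps1, sigma2^2)."""
    eps1 = rng.gauss(0.0, sigma1)
    eps2 = rng.gauss(-gamma * eps1, sigma2)
    return mu1 + eps1, mu2 + eps2

def corr(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
gamma, s1, s2 = 0.5, 1.0, 1.0
draws = [sample_subjective(0.0, 0.0, gamma, s1, s2, rng) for _ in range(50_000)]
x1s, x2s = zip(*draws)
theory = -gamma * s1 / math.sqrt(gamma**2 * s1**2 + s2**2)
print(corr(x1s, x2s), theory)

# After a high early draw the agent expects a below-average second draw.
high = [x2 for x1, x2 in draws if x1 > 1.0]
print(sum(high) / len(high))
```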

The set of feasible models is indexed by the set of feasible fundamentals, which correspond to the unconditional means[Footnote 8: Section 5.2 discusses the extension where agents are also uncertain about the variances and jointly estimate means and variances from censored histories.] of $X_1$ and $X_2$. Therefore, the agent’s prior belief over the feasible models is given by a prior belief supported on the feasible fundamentals $\mathcal{M}$.

Remark 1.

I will consider several specifications of $\mathcal{M}$ in this paper. I list them here and provide interpretations below.

  1. $\mathcal{M} = \mathbb{R}^2$. The agent thinks all values are possible.

  2. $\mathcal{M} = \mathcal{P}$, where $\mathcal{P}$ is a bounded parallelogram in $\mathbb{R}^2$ whose left and right edges are parallel to the $\mu_2$-axis and whose top and bottom edges have slope $-\gamma$. The agent is uncertain about both $\mu_1$ and $\mu_2$, but her uncertainty has bounded support.[Footnote 9: Any prior belief over fundamentals supported on a bounded set in $\mathbb{R}^2$ can be arbitrarily well-approximated by a prior belief over a large enough $\mathcal{P}$.]

  3. The agent has a correct, dogmatic belief about $\mu_1$, but has uncertainty about $\mu_2$ supported on a bounded interval.

  4. $\mathcal{M} = \{(\mu_1, \mu_2) \in \mathbb{R}^2 : \mu_1 = \mu_2\}$. The agent is convinced that the first-period and second-period fundamentals are the same, but is uncertain what this common parameter is.

While the agent can freely update her belief about the fundamentals on $\mathcal{M}$, she holds a dogmatic belief about $\gamma$.[Footnote 10: Section 5.3 studies the extension where agents are uncertain about $\gamma$, but the support of their prior belief about lies to the left of 0 and is bounded away from it.] This implies that the set of feasible models excludes the true model, $\Psi(\mu_1^\bullet, \mu_2^\bullet; 0)$, so the support of the agent’s prior belief is misspecified. I maintain this misspecification to match the field evidence of Chen, Moskowitz, and Shue (2016), where even very experienced decision-makers continue to exhibit a non-negligible amount of the gambler’s fallacy in high-stakes settings. Another reason why agents may never question their misspecified prior is that the misspecification is “attentionally stable” in the sense of Gagnon-Bartsch, Rabin, and Schwartzstein (2018). Under the theory that the true model falls within the feasible models, an agent finds it harmless to coarsen her dataset by only paying attention to certain “summary statistics.” In large datasets, the statistics extracted by the limited-attention agent do not lead her to question the validity of her theory. I discuss this further in Appendix F.

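I write $\mathbb{E}_{\Psi}$ and $\mathbb{P}_{\Psi}$ throughout for expectation and probability with respect to the subjective model $\Psi$. When $\mathbb{E}$ and $\mathbb{P}$ are used without subscripts, they refer to expectation and probability under the true model $\Psi(\mu_1^\bullet, \mu_2^\bullet; 0)$.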

Before stating my main results, I first establish a result about the optimal stage-game strategy under any feasible model, which will motivate a slight strengthening of Assumption 1 that I need for some results. For $c \in \mathbb{R}$, write $S_c$ for the cutoff strategy such that $S_c(x_1) = \text{Stop}$ if and only if $x_1 \ge c$. That is, $S_c$ accepts all early draws above a cutoff threshold $c$.

Proposition 1.

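Under Assumption 1 and for $\gamma > 0$,

  • Under each subjective model $\Psi(\mu_1, \mu_2; \gamma)$, there exists a cutoff threshold $c^*(\mu_1, \mu_2; \gamma) \in \mathbb{R}$ such that it is strictly optimal to continue whenever $x_1 < c^*(\mu_1, \mu_2; \gamma)$ and strictly optimal to stop whenever $x_1 > c^*(\mu_1, \mu_2; \gamma)$.

  • For every $\mu_1 \in \mathbb{R}$, $\mu_2 \mapsto c^*(\mu_1, \mu_2; \gamma)$ is strictly increasing.

  • For every $\mu_1 \in \mathbb{R}$, $\mu_2 \mapsto c^*(\mu_1, \mu_2; \gamma)$ is Lipschitz continuous with Lipschitz constant $1/\gamma$.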

The content of this proposition is threefold.

First, it shows that the best strategy for the class of optimal-stopping problems I study takes a cutoff form, regardless of the underlying distributions. This is because a higher $x_1$ both increases the payoff to stopping and, under the gambler’s fallacy, predicts worse draws in the next period. Both forces push in the direction of stopping. The optimality of cutoff strategies leads to an endogenous, asymmetric censoring of histories, formalizing the idea that agents stop after “good enough” draws.

Second, holding $\mu_1$ fixed, the cutoff threshold $c^*(\mu_1, \mu_2; \gamma)$ is higher when $\mu_2$ is higher. In other words, the definition of a “good enough” early draw becomes more demanding as $\mu_2$ increases. This is because agents can afford to be choosier in the first period when facing improved prospects in the second period.

The third statement about Lipschitz continuity, on the other hand, gives a bound on how quickly $c^*$ increases in $\mu_2$. To understand why it holds, suppose that one agent believes draws are generated according to $\Psi(\mu_1, \mu_2; \gamma)$, while another agent believes they are generated according to $\Psi(\mu_1, \mu_2 + \Delta; \gamma)$ for some $\Delta > 0$. Under any feasible model, when $x_1$ increases by $\Delta/\gamma$, the predicted conditional mean of $X_2$ falls by $\Delta$. Therefore, the indifference condition of the first agent at cutoff $c^*(\mu_1, \mu_2; \gamma)$ implies that the second agent prefers stopping after $x_1 = c^*(\mu_1, \mu_2; \gamma) + \Delta/\gamma$, since the expected reversal cancels out the relative optimism of the second agent about the unconditional distribution of $X_2$.
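To make $c^*(\mu_1, \mu_2; \gamma)$ concrete, the sketch below computes it for Example 1 by bisection on "stopping payoff minus expected continuation payoff," using the standard Gaussian formula for $\mathbb{E}[\max(x_1, X_2)]$. It assumes $q < 1$ and $\sigma_2 = 1$; the function names and the grid of beliefs are illustrative. It then checks the second and third bullets of Proposition 1 numerically.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def continuation_value(x1, mu1, mu2, gamma, q, sigma2=1.0):
    """E[u2(x1, X2)] for Example 1 when X2 | x1 ~ N(mu2 - gamma*(x1 - mu1), sigma2^2)."""
    m = mu2 - gamma * (x1 - mu1)
    d = (m - x1) / sigma2
    # E[max(x1, X2)] for Gaussian X2
    e_max = x1 + (m - x1) * norm_cdf(d) + sigma2 * norm_pdf(d)
    return q * e_max + (1.0 - q) * m

def cutoff(mu1, mu2, gamma, q=0.0, sigma2=1.0, lo=-100.0, hi=100.0):
    """Indifference threshold: root of x1 - E[u2(x1, X2)] by bisection.
    Assumes q < 1, so the gap is negative at lo and positive at hi."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - continuation_value(mid, mu1, mu2, gamma, q, sigma2) > 0.0:
            hi = mid  # stopping already preferred at mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

gamma, q, step = 0.5, 0.5, 0.25
mus = [i * step for i in range(-8, 9)]
cs = [cutoff(0.0, mu2, gamma, q) for mu2 in mus]
slopes = [(b - a) / step for a, b in zip(cs, cs[1:])]
print(min(slopes), max(slopes), 1.0 / gamma)  # increasing, and below the 1/gamma bound
```

In this particular example an implicit-differentiation check suggests the slope should come out at $1/(1+\gamma)$ for every $q < 1$, comfortably inside the $1/\gamma$ bound; with $q = 0$ the threshold reduces to $(\mu_2 + \gamma\mu_1)/(1+\gamma)$.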

The Lipschitz constant $1/\gamma$ is guaranteed for every optimal-stopping problem satisfying Assumption 1 and for every $\gamma > 0$. But it may not be the best Lipschitz constant. My results use the slightly stronger condition that $\mu_2 \mapsto c^*(\mu_1, \mu_2; \gamma)$ has a Lipschitz constant strictly smaller than $1/\gamma$. Intuitively, this should be easy to satisfy, but instead of assuming it directly, I consider the following condition on primitives that implies the desired, infinitesimally stronger Lipschitz continuity. It is a joint restriction on $\gamma$ and the stage game.

Assumption 2 (-Lipschitz continuity).

There exists so that for every and

This condition is satisfied for search with probability of recall.

Claim 2.

Example 1 satisfies Assumption 2 with for every probability of recall and every bias .

Assumption 2 strengthens Assumption 1(b), which already implies . For any which makes the inequality harder to satisfy as adding a positive term to the second argument of makes the RHS larger.

2.2 Main Results

I now state my two main results, which concern learning dynamics under the gambler’s fallacy in two different social-learning environments. I defer precise details of these environments to later sections.

In the first environment, short-lived agents arrive one per round, $t = 1, 2, 3, \ldots$ All agents start with the same full-support prior density over the feasible fundamentals, which form a bounded parallelogram in $\mathbb{R}^2$ as in Remark 1(b). Agent $t$ observes the stage-game histories of all predecessors, updates her prior to a posterior density, then chooses a cutoff threshold to maximize her expected payoff based on this posterior belief. In this environment, the sequences of cutoffs and posterior beliefs are stochastic processes whose randomness derives from the randomness of draws. Draws are objectively independent, both between the two periods in the same round of the stage game and across different rounds.

Theorem 1.

Suppose Assumptions 1 and 2 hold and and are continuous on . There exists a unique steady state not dependent on , so that provided , almost surely and . The steady state satisfies and where is the objectively optimal cutoff threshold.

In other words, behavior and beliefs in the society almost surely converge, and this steady state is independent of the prior over fundamentals (provided its support is large enough). In the steady state, agents hold overly pessimistic beliefs about the fundamentals and stop too early relative to the objectively optimal strategy.

In the second environment, short-lived agents arrive in generations, with a continuum of agents per generation. Agents’ prior belief about the fundamentals is given by a full-support density on $\mathbb{R}^2$, as in Remark 1(a). Each agent observes the stage-game histories of all predecessors from all past generations to make inferences about the fundamentals. Because generations are large, the cutoffs and beliefs of each generation are deterministic. The society is initialized at an arbitrary cutoff strategy in the 0th generation, the initial condition.

Theorem 2.

Suppose Assumption 1 holds. Starting from any initial condition and any prior density, cutoffs and beliefs form monotonic sequences across generations. When Assumption 2 also holds, there exists a unique steady state such that cutoffs and beliefs converge to it monotonically, regardless of the initial condition and the prior density. These steady states are the same as those in Theorem 1.

The monotonicity of beliefs and cutoffs across generations reflects a positive-feedback loop between changes in beliefs and changes in behavior. Suppose generation $k$ is more pessimistic than generation $k-1$ about the second-period fundamental. The monotonicity result implies that beliefs move in the same direction again in generation $k+1$; that is, generation $k+1$ is more pessimistic still. The information of generation $k+1$ differs from that of generation $k$ only in that agents in generation $k+1$ observe all stage-game histories of generation $k$. This means generation $k$’s stopping behavior differs from that of generation $k-1$ in such a way as to generate histories that amplify, not dampen, the initial change in beliefs from generation $k-1$ to generation $k$.

2.3 Intuition for the Main Results

In the learning environments I study, each agent censors the data of future agents through her stopping strategy, where the strategy choice depends on her beliefs. To build intuition for how this censoring effect relates to the two main theorems, I first consider a biased agent with feasible fundamentals $\mathcal{M} = \mathbb{R}^2$, facing a large sample of histories all censored according to some cutoff threshold $c$. I characterize her inference about fundamentals as the sample size grows and analyze how her inference depends on the cutoff threshold $c$.

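For a cutoff strategy $S_c$ and a subjective model $\Psi$, $H(\Psi; c)$ refers to the distribution of histories when draws are generated by $\Psi$ and histories are censored according to $S_c$, or more precisely:

Definition 4.

For $c \in \mathbb{R}$ and $\Psi$ a subjective model, $H(\Psi; c)$ is the distribution of histories given by

$$H(\Psi; c)\big[E \times \{\varnothing\}\big] = \mathbb{P}_{\Psi}\big[X_1 \in E,\ X_1 \ge c\big], \qquad H(\Psi; c)\big[E_1 \times E_2\big] = \mathbb{P}_{\Psi}\big[X_1 \in E_1,\ X_1 < c,\ X_2 \in E_2\big]$$

for all $E, E_1, E_2 \in \mathcal{B}(\mathbb{R})$, where $\mathcal{B}(\mathbb{R})$ is the collection of Borel subsets of $\mathbb{R}$.

I abbreviate $H(\Psi(\mu_1^\bullet, \mu_2^\bullet; 0); c)$ as simply $H^\bullet(c)$, the true distribution of histories under the true model of draws and the cutoff threshold $c$. The next definition gives a measure of how much the implied distribution of histories under the feasible model with fundamentals $(\mu_1, \mu_2)$ differs from the true distribution of histories, both generated with the same censoring.

Definition 5.

For $(\mu_1, \mu_2) \in \mathbb{R}^2$, the Kullback-Leibler (KL) divergence from $H(\Psi(\mu_1, \mu_2; \gamma); c)$ to $H^\bullet(c)$, denoted by $D_{KL}\big(H^\bullet(c) \,\|\, H(\Psi(\mu_1, \mu_2; \gamma); c)\big)$, is

$$\mathbb{E}\left[\ln \frac{\phi(X_1; \mu_1^\bullet, \sigma_1^2)}{\phi(X_1; \mu_1, \sigma_1^2)} + \mathbf{1}\{X_1 < c\} \ln \frac{\phi(X_2; \mu_2^\bullet, \sigma_2^2)}{\phi\big(X_2; \mu_2 - \gamma(X_1 - \mu_1), \sigma_2^2\big)}\right],$$

where $\phi(x; \mu, \sigma^2)$ is the Gaussian density with mean $\mu$ and variance $\sigma^2$.

The minimizers of KL divergence with respect to cutoff $c$,

$$(\mu_1^\infty, \mu_2^\infty) \in \arg\min_{(\mu_1, \mu_2) \in \mathbb{R}^2} D_{KL}\big(H^\bullet(c) \,\|\, H(\Psi(\mu_1, \mu_2; \gamma); c)\big),$$

are called the pseudo-true fundamentals with respect to $c$.

To interpret, the likelihood of the history $(x_1, \varnothing)$ with $x_1 \ge c$ is $\phi(x_1; \mu_1^\bullet, \sigma_1^2)$ under the true model and $\phi(x_1; \mu_1, \sigma_1^2)$ under the feasible model $\Psi(\mu_1, \mu_2; \gamma)$. The likelihood of the history $(x_1, x_2)$ with $x_1 < c$ is $\phi(x_1; \mu_1^\bullet, \sigma_1^2)\,\phi(x_2; \mu_2^\bullet, \sigma_2^2)$ under the true model and $\phi(x_1; \mu_1, \sigma_1^2)\,\phi(x_2; \mu_2 - \gamma(x_1 - \mu_1), \sigma_2^2)$ under the feasible model. The likelihoods of all other histories are 0 under both models. So the KL divergence expression in Definition 5 is the expected log-likelihood ratio of the history under the true model versus under the feasible model with fundamentals $(\mu_1, \mu_2)$, where the expectation over histories is taken under the true model. In general, this optimization objective depends on the cutoff threshold $c$ that determines how histories are censored. I will therefore denote the pseudo-true fundamentals as $(\mu_1^\infty(c), \mu_2^\infty(c))$ to emphasize this dependence.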

The pseudo-true fundamentals correspond to the biased agent’s inference about the fundamentals in large samples (hence the name). More precisely, suppose an agent starts with a prior belief on the fundamentals supported on $\mathbb{R}^2$ and observes a large but finite sample of histories drawn from $H^\bullet(c)$. Proposition A.1 in Appendix B shows that as the sample size grows, her posterior belief almost surely converges to the point mass on $(\mu_1^\infty(c), \mu_2^\infty(c))$.
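A sketch of the censored-data log-likelihood described above, assuming unit variances and illustrative values $c = 0.5$, $\gamma = 0.5$ and true fundamentals $(0, 0)$. Because $\mu_1$ enters only through $X_1$ and $\mu_2 + \gamma\mu_1$ enters only through uncensored $X_2$, the maximizer has a closed form; in a large sample it should sit near the pseudo-true fundamentals, with $\mu_2$ clearly below its true value. All names are hypothetical.

```python
import math
import random

def log_phi(x, mean, sd):
    """Log of the Gaussian density."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)

def log_likelihood(histories, mu1, mu2, gamma, s1=1.0, s2=1.0):
    """Log-likelihood of censored histories under Psi(mu1, mu2; gamma).
    A history is (x1, None) if the agent stopped and (x1, x2) if she continued."""
    total = 0.0
    for x1, x2 in histories:
        total += log_phi(x1, mu1, s1)
        if x2 is not None:
            total += log_phi(x2, mu2 - gamma * (x1 - mu1), s2)
    return total

def fit(histories, gamma):
    """Exact maximizer: mu1 from all x1; mu2 + gamma*mu1 from uncensored rows."""
    mu1 = sum(x1 for x1, _ in histories) / len(histories)
    unc = [(x1, x2) for x1, x2 in histories if x2 is not None]
    a = sum(x2 + gamma * x1 for x1, x2 in unc) / len(unc)
    return mu1, a - gamma * mu1

# Objective model: independent N(0, 1) draws, censored by the cutoff c.
rng = random.Random(2)
c, gamma = 0.5, 0.5
hist = []
for _ in range(20_000):
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    hist.append((x1, None) if x1 >= c else (x1, x2))
mu1_hat, mu2_hat = fit(hist, gamma)
print(mu1_hat, mu2_hat)  # mu1_hat near 0, mu2_hat clearly negative
```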

The next proposition explicitly solves for the pseudo-true fundamentals in a simple closed form.[Footnote 11: This result shows the pseudo-true fundamentals have a method-of-moments interpretation. Suppose that instead of minimizing KL divergence, agents find $(\mu_1, \mu_2)$ so that $H(\Psi(\mu_1, \mu_2; \gamma); c)$ matches $H^\bullet(c)$ in terms of two moments: the means of the first- and second-period draws in the distribution of censored histories. We can show that in fact $\mu_1 = \mu_1^\infty(c)$ and $\mu_2 = \mu_2^\infty(c)$ for all $c$. This provides an alternative, non-Bayesian foundation for agents’ inference behavior. In Appendix C, I study the large-generations learning dynamics for agents who apply this kind of method-of-moments inference to a family of general, non-Gaussian feasible models of draws.] This result makes much of the later analysis tractable and contains the key intuition behind the two main theorems.

Proposition 2.

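For $c \in \mathbb{R}$, the pseudo-true fundamentals are $\mu_1^\infty(c) = \mu_1^\bullet$ and

$$\mu_2^\infty(c) = \mu_2^\bullet - \gamma\,\big(\mu_1^\bullet - \mathbb{E}[X_1 \mid X_1 < c]\big).$$

So $\mu_2^\infty(c) < \mu_2^\bullet$ for all $c \in \mathbb{R}$, and $\mu_2^\infty(c)$ strictly increases in $c$.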
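The closed form can be checked against a brute-force version of the method-of-moments reading in footnote 11: simulate independent draws, censor them at $c$, and choose $\mu_2$ so that the model-implied mean of uncensored $X_2$ matches the data. The sketch assumes unit variances, true fundamentals $(0, 0)$ and $\gamma = 0.5$; helper names are illustrative.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def mean_below(c, mu, sigma):
    """E[X | X < c] for X ~ N(mu, sigma^2) (inverse Mills ratio)."""
    z = (c - mu) / sigma
    return mu - sigma * norm_pdf(z) / norm_cdf(z)

def pseudo_true_mu2(c, mu1_true, mu2_true, gamma, sigma1=1.0):
    """Closed form: mu2_true - gamma * (mu1_true - E[X1 | X1 < c])."""
    return mu2_true - gamma * (mu1_true - mean_below(c, mu1_true, sigma1))

def moment_match_mu2(c, mu1_true, mu2_true, gamma, n=100_000, seed=5):
    """Monte Carlo: match the mean of X1 (all rows) and of X2 (uncensored rows)."""
    rng = random.Random(seed)
    x1_all, unc = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(mu1_true, 1.0), rng.gauss(mu2_true, 1.0)
        x1_all.append(x1)
        if x1 < c:
            unc.append((x1, x2))
    mu1 = sum(x1_all) / n
    mean_x1_unc = sum(x1 for x1, _ in unc) / len(unc)
    mean_x2_unc = sum(x2 for _, x2 in unc) / len(unc)
    # Model: E[X2 | uncensored] = mu2 - gamma * (E[X1 | uncensored] - mu1)
    return mean_x2_unc + gamma * (mean_x1_unc - mu1)

for c in (-1.0, 0.0, 0.5, 2.0):
    print(c, pseudo_true_mu2(c, 0.0, 0.0, 0.5))  # always below 0, rising in c
print(moment_match_mu2(0.5, 0.0, 0.0, 0.5), pseudo_true_mu2(0.5, 0.0, 0.0, 0.5))
```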

The directional data censoring, where histories only contain $X_2$ following low values of $X_1$, leads to over-pessimism: $\mu_2^\infty(c) < \mu_2^\bullet$ for all $c \in \mathbb{R}$. In every feasible model of draws $\Psi(\mu_1, \mu_2; \gamma)$, the realization of $X_2$ depends on two factors: the second-period fundamental $\mu_2$ and a reversal effect based on the realization of $X_1$. Under the correct or an over-optimistic belief about $\mu_2$, a biased agent would be systematically disappointed by realizations of $X_2$ in her dataset. This is because $X_2$ is only uncensored when $X_1$ is low enough, a contingency where the agent expects positive reversal on average.[Footnote 12: This intuition presumes that agents understand selection in the dataset. Selection neglect is unlikely in this environment due to the salience of censoring. In large datasets, agents observe both censored histories with length 1 and uncensored histories with length 2, so the presence of censoring is highly explicit in the data. By contrast, both intuition about selection neglect and experiments documenting it (e.g., Enke (2017)) have focused on settings where the dataset does not contain “reminders” about censoring and could be reasonably mistaken for a dataset without selection. Interestingly, Enke (2017) finds that the simple hint “Also think about the players whom you do not communicate with!” reduces the fraction of selection neglecters by 40%. This suggests the salience of censoring in my setting should mitigate selection neglect even further. In Online Appendix OA 3.2, I show that the presence of a fraction of selection neglecters in the population moderates the pessimism of the baseline gambler’s fallacy agents, but does not eliminate it.] Over-pessimism can therefore be thought of as “two wrongs making a right,” as the biased agent’s pessimism about the unconditional mean counteracts her false expectation of positive reversals in the dataset of censored histories.

This mechanism explains the long-run pessimism in Theorem 1 and Theorem 2. In fact, in the large-generations setting of Theorem 2, every generation holds strictly pessimistic beliefs, so over-pessimism is also a short-run phenomenon, provided there are enough predecessors per generation. The idea that asymmetric data censoring combined with the gambler’s fallacy leads to pessimistic inference is highly robust. It continues to hold when the feasible fundamentals reflect agents’ knowledge that the two fundamentals are equal, as in Remark 1(d) (Section 5.4); when agents are uncertain about variances (Section 5.2); under a joint relaxation of Bayesian inference and Gaussian models (Appendix C); when the stage game has more than two periods (Appendix D); under additional behavioral biases in inference (Online Appendices OA 3.2 and OA 3.3); when higher draws bring worse payoffs (Online Appendix OA 3.1); and, with high probability, after observing a finite dataset containing just 100 censored histories (Online Appendix OA 4.1).

Not only are the pseudo-true fundamentals always too pessimistic, but more severe censoring also increases pessimism. To understand the intuition, consider two datasets of histories from the distributions $H^\bullet(c)$ and $H^\bullet(c')$, where $c' < c$. The acceptance threshold is lower in the second dataset, implying that uncensored values of $X_2$ are preceded by worse values of $X_1$ there, as the second draw is only observed when the first draw falls below the threshold. A biased agent expects a greater amount of positive reversal in the dataset with distribution $H^\bullet(c')$ than in the one with distribution $H^\bullet(c)$, but in truth uncensored $X_2$ has the same distribution in both datasets, since $X_1$ and $X_2$ are objectively independent. For a fixed realization $X_2 = x_2$, an agent is more disappointed when she expects more positive reversal, so her inference about $\mu_2$ is more pessimistic in the more heavily censored dataset.

The comparative static that $\mu_2^\infty(c)$ increases in $c$ is central to the positive-feedback loop from Theorem 2. In the large-generations model, Generation 1 observes a large dataset of histories drawn from $H^\bullet(c_0)$ and chooses a cutoff $c_1$. Generation 2 then observes histories from all predecessor generations, that is, histories drawn from both $H^\bullet(c_0)$ and $H^\bullet(c_1)$. If $c_1 < c_0$, then Generation 2’s dataset features (on average) more severe censoring than Generation 1’s dataset. Thus, Generation 2 comes to a more pessimistic inference about the second-period fundamental. By Proposition 1, this leads to a further lowering of the cutoff threshold, and the pattern continues.
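The feedback loop can be traced with a deterministic sketch of the large-generations dynamics. It assumes Example 1 with $q = 0$ (so the cutoff is affine in the belief), unit variances, equal-sized generations whose pooled histories are all observed, and beliefs equal to the pseudo-true fundamentals of the pooled data. The function names and the initial cutoff are illustrative. Starting from a late-stopping generation 0, the cutoffs should fall monotonically toward the steady state.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def pooled_pseudo_true_mu2(cutoffs, mu1, mu2, gamma, s1=1.0):
    """Pseudo-true mu2 from pooled data, one equal-sized batch per past cutoff.
    mu1 is learned correctly; mu2 + gamma*mu1 matches the pooled mean of
    X2 + gamma*X1 over uncensored rows (X1 < c), and E[X2] = mu2 by independence."""
    mass, moment = 0.0, 0.0
    for c in cutoffs:
        z = (c - mu1) / s1
        mass += norm_cdf(z)                             # P(X1 < c)
        moment += mu1 * norm_cdf(z) - s1 * norm_pdf(z)  # E[X1 * 1{X1 < c}]
    return mu2 - gamma * (mu1 - moment / mass)

def cutoff_q0(mu1, mu2, gamma):
    """Example 1 with q = 0: indifference x1 = mu2 - gamma*(x1 - mu1)."""
    return (mu2 + gamma * mu1) / (1.0 + gamma)

mu1_true, mu2_true, gamma = 0.0, 0.0, 0.5
cutoffs = [2.0]  # generation 0: arbitrary initial cutoff that rarely stops
for k in range(1, 501):
    belief = pooled_pseudo_true_mu2(cutoffs, mu1_true, mu2_true, gamma)
    cutoffs.append(cutoff_q0(mu1_true, belief, gamma))

# Steady state: fixed point of the single-cutoff map (compare Definition 7)
c_ss = 0.0
for _ in range(200):
    c_ss = cutoff_q0(mu1_true, pooled_pseudo_true_mu2([c_ss], mu1_true, mu2_true, gamma), gamma)

print([round(c, 3) for c in cutoffs[:4]], round(cutoffs[-1], 3), round(c_ss, 3))
```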

3 Convergence, Over-Pessimism, and Early Stopping

In this section, I study a social-learning environment where biased agents act one at a time, inferring fundamentals from predecessors’ histories. I begin by defining the steady state of the stage game for biased agents. The steady state depends on the optimal-stopping problem, the true fundamentals $(\mu_1^\bullet, \mu_2^\bullet)$, and the bias parameter $\gamma$, but is independent of the details of the agents’ prior density over fundamentals. I prove existence and uniqueness of the steady state and show it features over-pessimism about fundamentals and early stopping. Then, I turn to the stochastic process of beliefs and behavior in the social-learning environment, showing that this process almost surely converges to the steady state I defined.

3.1 Steady State: Existence, Uniqueness, and Other Properties

A steady state is a triplet consisting of fundamentals and a cutoff threshold that endogenously determine each other. The cutoff strategy with this acceptance threshold maximizes expected payoff under the subjective model with these fundamentals, while the fundamentals are the pseudo-true fundamentals under data censoring with this threshold. More precisely,

Definition 6.

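A steady state consists of $(\mu_1, \mu_2, c) \in \mathbb{R}^3$ such that:

  1. $c = c^*(\mu_1, \mu_2; \gamma)$, and

  2. $(\mu_1, \mu_2) = (\mu_1^\infty(c), \mu_2^\infty(c))$.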

Steady states correspond to Esponda and Pouzo (2016)’s Berk-Nash equilibria for an agent whose prior is supported on the feasible models with feasible fundamentals $\mathbb{R}^2$, under the restriction that the equilibrium belief puts full confidence in a single fundamental pair. The set of steady states depends on $\gamma$, since the severity of the bias changes both the optimal cutoff thresholds under different fundamentals and the inference about fundamentals from stage-game histories.

The terminology “steady state” will soon be justified, as I will show the “steady state” defined here almost surely characterizes the long-run learning outcome in the society where biased agents act one by one. This convergence does not follow from Esponda and Pouzo (2016), for their results only imply local convergence from prior beliefs sufficiently close to the equilibrium beliefs, and only in a “perturbed game” environment where learners receive idiosyncratic payoff shocks to different actions. I will show global convergence of the stochastic processes of beliefs and behavior without payoff shocks.

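Like almost all examples of Berk-Nash equilibrium in Esponda and Pouzo (2016), my steady state generates data with positive KL divergence relative to the implied data distribution under the steady-state beliefs. That is, if $(\mu_1, \mu_2, c)$ is a steady state, then $D_{KL}\big(H^\bullet(c) \,\|\, H(\Psi(\mu_1, \mu_2; \gamma); c)\big) > 0$, so the steady state is not a self-confirming equilibrium.[Footnote 13: For example, under the history distribution $H^\bullet(c)$, the conditional mean of $X_2$ given $X_1 = x_1$ equals $\mu_2^\bullet$ for every $x_1 < c$, since draws are objectively independent. However, under the history distribution driven by the steady-state feasible model $\Psi(\mu_1, \mu_2; \gamma)$, this conditional mean equals $\mu_2 - \gamma(x_1 - \mu_1)$ and varies with $x_1$, since $\gamma > 0$.] This is because for every censoring threshold $c$ (and in particular for the steady-state threshold), the KL divergence of the true history distribution to the implied history distributions under different feasible models is bounded away from 0.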

To prove the existence and uniqueness of steady state, I define the following belief iteration map on the second-period fundamental.

Definition 7.

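For $\mu_2 \in \mathbb{R}$, the iteration map $I: \mathbb{R} \to \mathbb{R}$ is given by $I(\mu_2) := \mu_2^\infty\big(c^*(\mu_1^\bullet, \mu_2; \gamma)\big)$.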

Given the explicit expression of the pseudo-true fundamentals in Proposition 2, it is not difficult to see that all steady states must have a correct belief about $\mu_1$ and that steady-state beliefs about $\mu_2$ are in bijection with fixed points of $I$.

Proposition 3.

Under Assumptions 1 and 2, is a contraction mapping with contraction constant . Therefore, a unique steady state exists.

As hinted at in Section 2.1.2, the contraction-mapping property of $I$ comes from the Lipschitz continuity of the indifference threshold implied by Assumption 2.

Lemma 1.

Under Assumptions 1 and 2, is Lipschitz continuous with Lipschitz constant .

Even under Assumption 1 alone, the basic regularity conditions I maintain throughout, it turns out $I$ is “almost” a contraction mapping for any $\gamma > 0$, in the sense that $|I(\mu_2') - I(\mu_2)| < |\mu_2' - \mu_2|$ for every $\mu_2' \neq \mu_2$. But there is no guarantee of a uniform contraction constant strictly less than 1. The slight strengthening in Assumption 2 ensures such a uniform contraction constant exists, providing the crucial step needed for existence and uniqueness of a steady state.
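A numerical sketch of the iteration map for Example 1, assuming $q = 0.5$, unit variances, true fundamentals $(0, 0)$ and $\gamma = 0.5$, and using the Proposition 2 formula for $\mu_2^\infty(c)$. Iterating $I$ from an arbitrary starting belief should settle on a fixed point below $\mu_2^\bullet$, and difference quotients of $I$ should stay below 1. Function names are illustrative.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def cutoff(mu1, mu2, gamma, q, sigma2=1.0, lo=-100.0, hi=100.0):
    """c*(mu1, mu2; gamma) for Example 1, by bisection (assumes q < 1)."""
    def gap(x1):  # stopping payoff minus expected continuation payoff
        m = mu2 - gamma * (x1 - mu1)
        d = (m - x1) / sigma2
        e_max = x1 + (m - x1) * norm_cdf(d) + sigma2 * norm_pdf(d)
        return x1 - (q * e_max + (1.0 - q) * m)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if gap(mid) > 0.0 else (mid, hi)
    return 0.5 * (lo + hi)

def pseudo_true_mu2(c, mu1_true, mu2_true, gamma, sigma1=1.0):
    """mu2_true - gamma * (mu1_true - E[X1 | X1 < c])."""
    z = (c - mu1_true) / sigma1
    return mu2_true - gamma * sigma1 * norm_pdf(z) / norm_cdf(z)

def iteration_map(mu2, mu1_true, mu2_true, gamma, q):
    """I(mu2): optimal cutoff under belief mu2, then the pseudo-true mu2 it induces."""
    return pseudo_true_mu2(cutoff(mu1_true, mu2, gamma, q), mu1_true, mu2_true, gamma)

mu1_true, mu2_true, gamma, q = 0.0, 0.0, 0.5, 0.5
belief = 1.0  # arbitrary starting belief about mu2
for _ in range(60):
    belief = iteration_map(belief, mu1_true, mu2_true, gamma, q)
print(belief, cutoff(mu1_true, belief, gamma, q))  # steady-state belief and cutoff
```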

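Since $\mu_2^\infty(c) < \mu_2^\bullet$ for all $c$ by Proposition 2, this shows that the steady-state belief about $\mu_2$ exhibits over-pessimism. From the same proposition, the steady-state belief about $\mu_1$ is correct, since $\mu_1^\infty(c) = \mu_1^\bullet$ for every $c$.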

Proposition 4.

Every steady state $(\mu_1, \mu_2, c)$ satisfies $\mu_1 = \mu_1^\bullet$ and $\mu_2 < \mu_2^\bullet$.

I now show that the steady-state stopping threshold always features stopping too early. The objectively optimal stopping strategy takes the form of a cutoff $c^{\text{opt}} \in \mathbb{R} \cup \{-\infty, \infty\}$, where $c^{\text{opt}} = -\infty$ means always stopping and $c^{\text{opt}} = \infty$ means never stopping.[Footnote 14: This follows from Lemma A.2 in the Appendix, which shows that even when $\gamma = 0$, the difference between the stopping payoff at $x_1$ and the expected continuation payoff after $x_1$ is strictly increasing and continuous in $x_1$.] I show that $c < c^{\text{opt}}$ for every steady-state cutoff $c$. (This result only requires Assumption 2 and does not require uniqueness of steady states.)

This result does not directly follow from over-pessimism. In fact, short of the steady state, there is an intuition that a biased agent may stop later than a rational agent, not earlier. For a concrete illustration, consider Example 1 with $q = 0$, so there is no probability of recall. Suppose the true fundamentals satisfy $\mu_2^\bullet \ll \mu_1^\bullet$, meaning the late applicant pool is much worse than the early applicant pool. If a biased agent has the correct beliefs about the fundamentals, she perceives a greater continuation value after a below-average early draw than a rational agent with the same correct beliefs does, since the former holds a false expectation of positive reversals after a bad early draw. So for early draws $x_1$ just above $c^{\text{opt}}$, the rational agent chooses to stop while the biased agent chooses to continue, and the biased agent’s indifference threshold is strictly above $c^{\text{opt}}$. By continuity, the biased agent’s cutoff threshold remains strictly above $c^{\text{opt}}$ even under slightly pessimistic beliefs about $\mu_2$.
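In the no-recall case the indifference thresholds have closed forms, which make the late-stopping intuition explicit. The derivation below writes $c^*(\mu_1, \mu_2; \gamma)$ for the biased agent's threshold and $c^{\text{opt}}$ for the objective one; it is a worked illustration under $q = 0$ only.

```latex
% Example 1 with q = 0: u_1(x_1) = x_1 and u_2(x_1, x_2) = x_2.
\begin{aligned}
\text{biased indifference:}\quad & c = \mu_2 - \gamma\,(c - \mu_1)
  \;\Longrightarrow\; c^*(\mu_1, \mu_2; \gamma) = \frac{\mu_2 + \gamma\,\mu_1}{1 + \gamma},\\
\text{objective indifference:}\quad & c^{\text{opt}} = \mu_2^\bullet,\\
\text{gap at correct beliefs:}\quad & c^*(\mu_1^\bullet, \mu_2^\bullet; \gamma) - c^{\text{opt}}
  = \frac{\gamma\,(\mu_1^\bullet - \mu_2^\bullet)}{1 + \gamma} > 0
  \quad \text{whenever } \mu_1^\bullet > \mu_2^\bullet.
\end{aligned}
```

With $\mu_1^\bullet = 1$, $\mu_2^\bullet = -1$ and $\gamma = 0.5$, for instance, the biased threshold at correct beliefs is $-1/3$ while $c^{\text{opt}} = -1$.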

Nevertheless, the next Proposition shows that in the steady state, it is unambiguous that the biased agent stops too early relative to the objectively optimal threshold.

Proposition 5.

Under Assumption 2, every steady-state stopping threshold is strictly lower than the objectively optimal threshold $c^{\text{opt}}$.

The early-stopping result strengthens the over-pessimism result. In the steady state, agents must be sufficiently pessimistic as to overcome the opposite intuition about late stopping that I discussed before.
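A numerical spot check of Proposition 5 in Example 1, assuming unit variances and $\gamma = 0.5$. The sketch computes the steady-state cutoff by iterating the belief map and compares it with $c^{\text{opt}}$, computed as the indifference threshold under correct beliefs with no bias. The parameter grid is illustrative. It includes the case $(\mu_1^\bullet, \mu_2^\bullet) = (1, -1)$, where a biased agent with correct beliefs would stop too late, yet the steady-state cutoff still lands below $c^{\text{opt}}$.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def cutoff(mu1, mu2, gamma, q, sigma2=1.0, lo=-100.0, hi=100.0):
    """Indifference threshold in Example 1, by bisection (assumes q < 1)."""
    def gap(x1):
        m = mu2 - gamma * (x1 - mu1)
        d = (m - x1) / sigma2
        e_max = x1 + (m - x1) * norm_cdf(d) + sigma2 * norm_pdf(d)
        return x1 - (q * e_max + (1.0 - q) * m)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if gap(mid) > 0.0 else (mid, hi)
    return 0.5 * (lo + hi)

def steady_state_cutoff(mu1_true, mu2_true, gamma, q, iters=100):
    """Iterate mu2 -> pseudo-true mu2 at the cutoff it induces (sigma1 = 1)."""
    mu2 = mu2_true
    for _ in range(iters):
        z = cutoff(mu1_true, mu2, gamma, q) - mu1_true
        mu2 = mu2_true - gamma * norm_pdf(z) / norm_cdf(z)
    return cutoff(mu1_true, mu2, gamma, q)

def objective_cutoff(mu1_true, mu2_true, q):
    """c_opt: threshold with correct beliefs and no bias (gamma = 0)."""
    return cutoff(mu1_true, mu2_true, 0.0, q)

for mu1_t, mu2_t in [(0.0, 0.0), (1.0, -1.0), (-1.0, 1.0)]:
    for q in (0.0, 0.5):
        c_ss = steady_state_cutoff(mu1_t, mu2_t, 0.5, q)
        c_opt = objective_cutoff(mu1_t, mu2_t, q)
        print(mu1_t, mu2_t, q, round(c_ss, 3), round(c_opt, 3), c_ss < c_opt)
```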

3.2 Social Learning with Agents Acting One by One

This section shows the “steady state” defined and studied earlier warrants its name: it corresponds to the long-run learning outcome for a society of biased agents acting one at a time. I outline the convergence proof for a simpler variant of Theorem 1, where agents start off knowing $\mu_1$ and only entertain uncertainty over $\mu_2$. That is, the feasible fundamentals are given by Remark 1(c) rather than Remark 1(b). This simplification is without much loss: even when agents are initially uncertain about $\mu_1$, they will almost surely learn it in the long run regardless of the stochastic process of their predecessors’ stopping strategies. Intuitively, this is because $X_1$ can never be censored, so no belief distortion in $\mu_1$ is possible.[Footnote 15: This is similar to the intuition for why $\mu_1^\infty(c) = \mu_1^\bullet$ for every $c$.] Once agents have learned $\mu_1$, the rest of the argument proceeds much like the case where $\mu_1$ is known from the start. In the next section I comment on the key steps in extending the proof to the case of uncertainty over two-dimensional fundamentals $(\mu_1, \mu_2)$, but defer the details to Online Appendix OA 2.

In the learning environment, time is discrete and partitioned into rounds $t = 1, 2, 3, \ldots$[Footnote 16: I use the term “rounds” to refer to different iterations of the stage game, reserving the term “periods” for the dynamic aspect within the stage game.] One short-lived agent arrives per round. Agent $t$ observes the stage-game histories of all predecessors[Footnote 17: Results are unchanged if agent $t$ does not know the order in which her predecessors moved.] and forms a posterior belief about the fundamentals using Bayes’ rule. Next, agent $t$ chooses a cutoff threshold $c_t$ maximizing her expected payoff under this posterior belief, plays the stage game, and exits. Her stage-game history, $h_t$, then becomes part of the dataset for all future agents.

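The sequences of cutoffs and posterior beliefs are stochastic processes whose randomness stems from the randomness of the stage-game draw realizations in different rounds. The convergence theorem concerns the almost sure convergence of these two processes. To define the probability space formally, consider the $\mathbb{R}^2$-valued stochastic process $(X_{1,t}, X_{2,t})_{t \ge 1}$, where $(X_{1,t}, X_{2,t})$ and $(X_{1,s}, X_{2,s})$ are independent for $t \neq s$. Within each $t$, $X_{1,t} \sim \mathcal{N}(\mu_1^\bullet, \sigma_1^2)$ and $X_{2,t} \sim \mathcal{N}(\mu_2^\bullet, \sigma_2^2)$ are also independent. Interpret $(X_{1,t}, X_{2,t})$ as the pair of potential draws in the $t$-th round of the stage game. Clearly, there exists a probability space $(\Omega, \mathcal{A}, \mathbb{P})$, with sample space $\Omega$ interpreted as paths of the process just described, $\mathcal{A}$ the Borel $\sigma$-algebra on $\Omega$, and $\mathbb{P}$ the measure on sample paths such that the process has the desired distribution. The term “almost surely” means “with probability 1 with respect to the realization of the infinite sequence of all (potential) draws”, i.e. $\mathbb{P}$-almost surely. The processes of cutoffs and beliefs are defined on this probability space and adapted to the filtration $(\mathcal{F}_t)_{t \ge 1}$, where $\mathcal{F}_t$ is the sub-$\sigma$-algebra generated by draws up to round $t$, $(X_{1,s}, X_{2,s})_{s \le t}$.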

Under Assumptions 1 and 2, by Proposition 3 there exists a unique steady state. Following the specification of feasible models in Remark 1(c), let the feasible fundamentals be $\{\mu_1^\bullet\} \times [a, b]$ and suppose agents’ prior belief over fundamentals is given by a common prior density on $[a, b]$. Theorem 1 shows that, provided $[a, b]$ contains the steady-state belief about $\mu_2$ and the prior density satisfies the continuity requirement of the theorem, the stochastic processes of cutoffs and beliefs almost surely converge to the steady state. This is a global convergence result, since the bounded interval $[a, b]$ can be arbitrarily large and the prior density can assign arbitrarily small probability to neighborhoods around the steady-state belief.
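A small simulation of agents acting one by one under Remark 1(c), offered as a sketch rather than the paper's construction. It assumes Example 1 with $q = 0$ (so each agent's optimal cutoff depends only on her posterior mean of $\mu_2$), unit variances, a uniform prior on a grid over $[-3, 3]$, $\gamma = 0.5$, and true fundamentals $(0, 0)$. Theorem 1 suggests the cutoffs should settle near the steady-state value, about $-0.34$ for these parameters. The function name and parameters are illustrative.

```python
import math
import random

def simulate(T=3000, mu1=0.0, mu2_true=0.0, gamma=0.5, s1=1.0, s2=1.0,
             lo=-3.0, hi=3.0, seed=3):
    """Agents act one by one; returns all cutoffs and the last posterior mean of mu2."""
    rng = random.Random(seed)
    grid = [lo + (hi - lo) * i / 400 for i in range(401)]  # uniform prior support
    n, total = 0, 0.0  # sufficient statistics from uncensored rounds
    cutoffs = []
    post_mean = 0.5 * (lo + hi)
    for _ in range(T):
        # Posterior over the grid. Each uncensored history contributes
        # log phi(y; mu2, s2^2) with y = x2 + gamma*(x1 - mu1); the x1 terms
        # do not involve mu2 because mu1 is known (Remark 1(c)).
        logs = [(2.0 * m * total - n * m * m) / (2.0 * s2 * s2) for m in grid]
        top = max(logs)
        w = [math.exp(v - top) for v in logs]
        post_mean = sum(m * wi for m, wi in zip(grid, w)) / sum(w)
        # With q = 0 the value of continuing is linear in mu2, so only the
        # posterior mean matters: c = (E[mu2] + gamma*mu1) / (1 + gamma).
        c = (post_mean + gamma * mu1) / (1.0 + gamma)
        cutoffs.append(c)
        x1, x2 = rng.gauss(mu1, s1), rng.gauss(mu2_true, s2)
        if x1 < c:  # agent continues, so x2 enters the public dataset
            n += 1
            total += x2 + gamma * (x1 - mu1)
    return cutoffs, post_mean

cutoffs, post_mean = simulate()
print(sum(cutoffs[-500:]) / 500, post_mean)
```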

Theorem 1.

Suppose Assumptions 1 and 2 hold, where is the unique steady-state belief, and agents have prior density with continuous. Almost surely, and , where is the unique steady-state cutoff threshold.

I will now discuss the obstacles to proving convergence and provide the outline of my argument. In each round $t$, the cutoff choice $c_t$ of the $t$-th agent determines how the history $h_t$ will be censored. We can think of each $c_t$ as generating a different “type” of data. As we saw in Proposition 2, different “types” of data (in large samples) lead to different inferences about the fundamentals for biased agents. Yet this cutoff is an endogenous, ex-ante random object that depends on the belief of the $t$-th agent, which complicates the analysis of learning dynamics.

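To be more precise, the log-likelihood of all data up to the end of round $t$ under fundamental $\mu_2$ is the random variable

$$\ell_t(\mu_2) = \sum_{s=1}^{t} \Big[ \ln \phi(X_{1,s}; \mu_1^\bullet, \sigma_1^2) + \mathbf{1}\{X_{1,s} < c_s\} \ln \phi\big(X_{2,s}; \mu_2 - \gamma(X_{1,s} - \mu_1^\bullet), \sigma_2^2\big) \Big].$$

The $s$-th summand contains the indicator $\mathbf{1}\{X_{1,s} < c_s\}$, reflecting the fact that $X_{2,s}$ would be censored if $X_{1,s}$ exceeds the cutoff $c_s$. The cutoff $c_s$ depends on histories in rounds $1, \ldots, s-1$, hence indirectly on $(X_{1,r}, X_{2,r})_{r < s}$.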

This makes the summands non-exchangeable: they are correlated and non-identically distributed. So the usual law of large numbers does not apply.

A first step toward gaining traction on this problem is to use a statistical tool from Heidhues, Koszegi, and Strack (2018): a version of the law of large numbers for martingales whose quadratic variation grows linearly.

Proposition 10 from Heidhues, Koszegi, and Strack (2018): Let $(y_t)_{t \ge 1}$ be a martingale that satisfies $[y]_t \le v\,t$ a.s. for some constant $v \ge 0$. Then $\lim_{t \to \infty} y_t / t = 0$ a.s.
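A toy illustration of this law of large numbers, not the paper's application of it. The martingale below has fair $\pm 1$ increments scaled by a factor in $\{1, 2\}$ that depends only on the past path, so increments are not identically distributed, yet both the realized and the predictable quadratic variation are at most $4t$. The sketch assumes nothing about the learning model; the function name is illustrative. The ratio $M_T/T$ should shrink toward 0 as $T$ grows.

```python
import random

def martingale_over_t(T, seed=4):
    """M_T / T for M_t = sum_s w_s * e_s, with e_s = +/-1 fair coin flips and
    w_s in {1, 2} chosen from the past path. Increments have conditional mean
    zero and squared size at most 4, so the quadratic variation is at most 4t."""
    rng = random.Random(seed)
    m = 0.0
    for _ in range(T):
        w = 2.0 if m > 0.0 else 1.0       # predictable: depends only on the past
        m += w * rng.choice((-1.0, 1.0))  # conditional mean zero
    return m / T

for T in (1_000, 10_000, 100_000):
    print(T, martingale_over_t(T))
```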

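After simplifying the problem with this result, I can establish a pair of mutual bounds on asymptotic behavior and asymptotic beliefs. If we know cutoff thresholds are asymptotically bounded between $\underline{c}$ and $\bar{c}$, then beliefs about $\mu_2$ must be asymptotically supported on the interval $[\mu_2^\infty(\underline{c}), \mu_2^\infty(\bar{c})]$. Conversely, if beliefs are asymptotically supported on a subinterval $[\underline{\mu}_2, \bar{\mu}_2]$, then cutoff thresholds must be asymptotically bounded between $c^*(\mu_1^\bullet, \underline{\mu}_2; \gamma)$ and $c^*(\mu_1^\bullet, \bar{\mu}_2; \gamma)$.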

Lemma A.11. For , if almost surely , then almost surely

Also, for if almost surely , then almost surely
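The excerpt stops before showing how the two bounds are combined. One natural reading, assumed here rather than taken from the proof, is to alternate them: each pass maps cutoff bounds to belief bounds and back, and the interval shrinks toward the steady state. The sketch uses Example 1 with $q = 0$, unit variances, true fundamentals $(0, 0)$ and $\gamma = 0.5$, where both maps are increasing and the interval width should shrink by at least a factor of 3 per pass. Names and starting bounds are illustrative.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def pseudo_true_mu2(c, mu1=0.0, mu2=0.0, gamma=0.5):
    """mu2 - gamma * (mu1 - E[X1 | X1 < c]) with sigma1 = 1; increasing in c."""
    z = c - mu1
    return mu2 - gamma * norm_pdf(z) / norm_cdf(z)

def cutoff_q0(mu2, mu1=0.0, gamma=0.5):
    """Example 1 with q = 0: c*(mu1, mu2; gamma); increasing in mu2."""
    return (mu2 + gamma * mu1) / (1.0 + gamma)

c_lo, c_hi = -5.0, 5.0  # crude initial bounds on asymptotic cutoffs
widths = []
for _ in range(40):
    # cutoffs in [c_lo, c_hi]  =>  beliefs in [mu2_inf(c_lo), mu2_inf(c_hi)]
    m_lo, m_hi = pseudo_true_mu2(c_lo), pseudo_true_mu2(c_hi)
    # beliefs in [m_lo, m_hi]  =>  cutoffs in [c*(m_lo), c*(m_hi)]
    c_lo, c_hi = cutoff_q0(m_lo), cutoff_q0(m_hi)
    widths.append(c_hi - c_lo)
print(c_lo, c_hi, widths[:3])
```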

Lemma A.12. For