
Flexible social inference facilitates targeted social learning when rewards are not observable

by Robert D. Hawkins, et al.

Relying on others can be as risky as it can be rewarding. Advice seekers must disentangle good advice from bad, and balance the potential benefits of shared wisdom against the risks of being misled. Groups are most effective at sharing information and solving problems together when everyone is sensitive to “who knows what.” Acquiring such knowledge in the first place, however, is not trivial – especially in contexts where background information is limited. What underlying cognitive abilities are needed for social learning to be useful in information-limited environments? Here, we propose that the capacity for flexible social inference plays a key role in human group behavior, allowing latent properties such as success or skill to be inferred from others' outward behavior even when there is no direct access to others' private rewards and "success" manifests differently from context to context. We begin by formalizing our proposal in a cognitive model and comparing this model's predictions against those of simpler heuristics in a series of computational simulations. We then evaluate these predictions in three large-scale behavioral experiments using a multi-agent search paradigm with hidden rewards. In Experiment 1, we found that average performance improves as a function of group size at a rate predicted by our model but not by three simpler alternatives. In Experiment 2, we placed human participants in controlled scenarios with artificial agents to more systematically evaluate the conditions under which people choose to rely on social information. Finally, in Experiment 3, we generalized these findings to a more complex and noisy environment, suggesting regimes where inferences may break down. Taken together, we find that even the most rudimentary social cognition abilities may facilitate the characteristic flexibility of human collective behavior.





1 Introduction

The study of social learning examines how people and other animals make use of information from others around them. A large body of work in both human and non-human animals has proposed a variety of social learning strategies and heuristics that allow social learning to be effective Laland (2004); Hoppitt and Laland (2013); Rendell et al. (2011); Laland (2017). Indiscriminate copying, for example, is not an effective decision strategy. As more individuals rely on imitation, rather than relying on their own independent asocial learning, it becomes increasingly likely that a random target of imitation is using outdated or inaccurate information, degrading the innovation and outcomes of the group Rogers (1988). For the group to benefit from social learning, imitation must be deployed selectively Kameda and Nakanishi (2003); Boyd and Richerson (1995); Kendal et al. (2005), both in choosing the appropriate time to learn from others (when strategies) and choosing the appropriate individuals to learn from (who strategies). For example, a “copy-when-uncertain” heuristic allows an individual to deploy social learning only when independent learning becomes challenging. A “copy-successful-individuals” heuristic allows an individual to filter out low-quality social information and target other individuals most likely to increase their own outcomes.

Attention in the study of social learning has increasingly turned from documenting evidence for individual strategies or heuristics to investigating the abilities underlying the flexible use of different strategies Heyes (2016a); Kendal et al. (2018). Participants in social learning experiments often use hybrid strategies, combining multiple sources of who or when information, or deploy different strategies in different contexts McElreath et al. (2008). Thus, it may be useful to view social learning behavior not as the application of an inventory of simple copying rules, but as flexibly determined behavior structured by deeper cognitive abilities. Especially in the case of humans, and some non-human primates, there has been substantial interest in the extent to which social learning relies on abilities like meta-cognition Heyes (2016b) or theory of mind Shafto et al. (2012) that go beyond pure associative learning Behrens et al. (2008); Heyes (2012a, b). Meta-cognition and theory of mind allow individuals to maintain explicit representations of “who knows” and thus concentrate social learning on particularly knowledgeable targets for imitation. Similar cognitive abilities have been implicated in organization science as predictors of collective intelligence in small groups Woolley et al. (2010); Engel et al. (2014).

In this paper, we ask whether human social inference abilities may shed light on a puzzle raised by who heuristics like “copy-successful-individuals”: how is knowledge of success actually acquired when rewards are not public? Computational simulations Schlag (1998); Lazer and Friedman (2007); Rendell et al. (2010) and human experiments Mason et al. (2008); Mesoudi (2008); Mason and Watts (2012); Derex et al. (2013) typically give individuals the ability to directly observe the private information of others (sometimes at a cost). In such experiments, “success” in the experimental task (i.e., high task performance) is made clearly and unambiguously visible, such as by displaying the “scores” of co-participants, so that determining who is successful in the task does not require any inference. A similar assumption is made in the collective intelligence literature, which has examined “transactive memory systems” Wegner (1987); Peltokorpi and Hood (2019) where individuals receive explicit information about “who knows what.”

However, most real-world situations do not involve such direct and unambiguous social information, posing a challenge for these heuristics. When information about the success or payoff of an individual’s actions is hidden from others, the benefits of selective copying may actually be reversed: the solutions of different individuals cannot be compared Wisdom et al. (2013). Thus, accounts of selective copying that rely on information about who is successful or knowledgeable must also provide an account of how individuals come to know about others’ success or knowledge. While it is possible that associative learning allows individuals to adopt particular external cues as proxies for private information (e.g. visible health or conspicuous wealth), we suggest that social inference abilities present a more flexible alternative. Humans continually move between different contexts where “success” manifests in different observable behaviors: a reliable cue of success in one environment may not be reliable in another. By inverting a generative model of behavior (e.g., Jara-Ettinger et al., 2016; Baker et al., 2017), individuals can make context-sensitive predictions and flexibly infer the hidden success or knowledge of others.

Such inference abilities have been extensively investigated in cognitive science. Psychological studies have shown that even young children are able to rapidly infer which partners are more trustworthy and knowledgeable than others, and prefer to learn from them Wood et al. (2013); Sobel and Kushnir (2013); Poulin-Dubois and Brosseau-Liard (2016); Mills and Landrum (2016), and adults can appropriately discount unreliable social information in their decision-making Hawthorne-Madell and Goodman (2019); Vélez and Gweon (2019); Whalen et al. (2017). However, this cognitive science literature has largely developed independently from work on the consequences of social learning strategies in groups and from the broader literature on collective behavior. In part, this gap may be attributed to the significant methodological challenges associated with running real-time, interactive studies with human groups at the necessary scale (but see Shirado and Christakis, 2017; Almaatouq et al., 2021, 2022). In the present work we bridge these two literatures by examining the behavior of human groups in a multi-agent sensing task¹ where others’ information, specifically linked to reward or success in the search task, is not directly observable. The specific multi-agent sensing task we use builds on the structure of an experiment that was designed to study the collective sensing of fish schools Berdahl et al. (2013), and extends the style of experiments from the economics literature on social learning that involve private information Bikhchandani et al. (1998); Anderson and Holt (1997); Whalen et al. (2017) to a more naturalistic setting.

¹ A multi-agent sensing, or “collective sensing,” task is one in which there is a spatial field with multiple interacting agents (i.e. individuals capable of action in the environment) searching for some hidden information about the environment. This type of task is related to collective foraging tasks Dechaume-Moncharmont et al. (2005); Goldstone et al. (2005), but the ‘resource’ is not consumable, so there are no competitive dynamics.

In our experiments, human participants controlled avatars in a virtual world. Each location corresponded to a hidden score value that fluctuated over time. They could continually observe the movements of other individuals but only had access to the score at their own current location. Across three experiments, we used this virtual environment to investigate how the performance of groups—i.e. how effectively groups of participants tracked the fluctuating score field—changes as a function of group size (Experiment 1); evaluate the individual social learning mechanisms driving collective success (Experiment 2); and study the effect of increasing environment noise/uncertainty on social learning (Experiment 3). Taken together, our work suggests that in novel environments where rewards are not directly accessible, even rudimentary forms of social inference can enable targeted social learning that boosts the group’s overall performance. These findings further emphasize the importance of flexible inference abilities for understanding how people make sense of ambiguous social cues in social learning settings, and further elaborate the centrality of social reasoning to human collective behavior.

2 Results

2.1 Comparing computational models of social learning

How should individuals weigh ambiguous social cues when they do not have direct knowledge of others’ payoffs? In this section, we formalize our key hypothesis: when private information leads to different (task-dependent) behavioral traces, individuals equipped with a rudimentary capacity for social cognition may use observations of these visible traces to make flexible inferences about hidden task performance. As an illustrative example, consider a group of students exploring a large food court. A student in the group may reason that people are more likely to sit down and linger over a meal at a food truck they like, and to move on quickly from food trucks they don’t like, perhaps without even finishing their meal. This kind of reasoning reflects a forward model, or generative model, of behavior: it makes different predictions about observable behavior depending on unobservable latent states like beliefs and preferences. Critically, a forward model can be inverted to yield inferences about those latent states given observations. For example, after seeing someone lingering at a particular vendor, our student may infer that this vendor is more likely to be good and head over to try it out themselves.
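The food-court example can be made concrete with a toy forward model and its Bayesian inversion. The probabilities below are illustrative assumptions for this sketch, not values from our experiments:

```python
# Toy forward model of dining behavior and its Bayesian inversion.
# All probabilities are illustrative assumptions.

def likelihood(behavior, vendor_is_good):
    # Forward model: diners tend to linger at vendors they like.
    if vendor_is_good:
        return {"linger": 0.8, "leave_quickly": 0.2}[behavior]
    return {"linger": 0.1, "leave_quickly": 0.9}[behavior]

def posterior_good(behavior, prior_good=0.5):
    # Inversion via Bayes' rule: P(good | behavior) is proportional
    # to P(behavior | good) * P(good).
    p_good = likelihood(behavior, True) * prior_good
    p_bad = likelihood(behavior, False) * (1 - prior_good)
    return p_good / (p_good + p_bad)

print(round(posterior_good("linger"), 3))         # → 0.889
print(round(posterior_good("leave_quickly"), 3))  # → 0.182
```

Observing lingering raises the inferred probability that the vendor is good well above the prior, while observing a quick departure lowers it, exactly the inversion described above.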

Figure 1: Example states of the multi-agent tracking task used in Exp. 1 and computational simulations. (A) The hidden scoring region is shown in grey, slowly drifting over time. (B) Participants receive a bonus reward upon entering the region. The halo indicating this bonus was only visible to the participant inside the region, and not to the other participants.

This idea is naturally formalized in recent probabilistic models of social cognition (e.g., Baker et al., 2017; Jara-Ettinger et al., 2016). These models suggest that inferential reasoning can be understood as approximately implementing Bayesian statistical inference using rich probabilistic models of the world. We formulate and test a social inference model based in this framework, and compare it to alternatives that suppose human behavior in our experiments can be explained by simpler perceptual heuristic strategies that do not employ explicit social inference. Under our social inference model, agents do not simply operate on shallow perceptual cues, but are able to update their beliefs about the latent states of other agents (e.g. the private reward they are receiving) and use these inferences to guide their own downstream behavior. A normative account of belief updating is given by Bayes’ Rule, which derives the “posterior probability” P(s | o) an agent should assign to different (hidden) states of affairs s after making some observations o:

P(s | o) ∝ P(o | s) P(s)

Intuitively, this equation decomposes the posterior probability into two terms. The likelihood term P(o | s) represents the probability that o would be observed under different states s, and the prior term P(s) represents the probability of state s in the absence of any information.

As a more concrete domain in which to explore the consequences of social inference for collective behavior, we take inspiration from the multi-agent sensing literature Berdahl et al. (2013). In a multi-agent sensing task, a group of agents is placed in a laboratory environment with a shared “score field” that gradually shifts over time (see Fig. 1). This underlying field determines the scalar reward obtained at each spatial location at each time step. Agents are only able to observe the private reward they are earning at their current location; they cannot see the rewards available elsewhere in the environment, or the rewards being obtained by other agents. To succeed at this task, agents must balance exploitation (staying in a known area of high reward) with exploration (searching for unknown regions that may provide even higher reward). Importantly, however, agents have another source of information at their disposal: the movement trajectories of other agents up to the current time point t.

We hypothesize that people are able to use social expectations about how agents will act in order to back out information about the reward at other regions of the environment. In terms of Bayes’ Rule, an agent’s beliefs P(r | b) about the (hidden) reward r being obtained by another agent in the environment are derived by integrating their background prior beliefs P(r) about the overall distribution of rewards in the environment with the likelihood P(b | r) that an agent would behave in way b if they were earning reward r (relative to the way they would behave if they were earning a different reward). To simplify this calculation, we designed the environment to use a binary score field, yielding particularly strong statistical information from behavioral trajectories. When an agent gets a non-zero reward, they are expected to slow down or stop, relative to how they move when earning no reward. For shorthand, we call the resulting (publicly visible) trajectory “exploiting behavior” (b = exploit). Because agents are presumed far more likely to make this movement when receiving a higher reward, i.e.

P(b = exploit | r = high) ≫ P(b = exploit | r = low),

then, via Bayes’ Rule, the posterior beliefs simplify to P(r = high | b = exploit) ≈ 1. This equation corresponds to the inference that if an individual stops at some location, it is highly likely that there is a large reward there (because otherwise they would be unlikely to stop).²

² Importantly, the validity of this inference is contingent on the details of the local task environment. It is a straightforward inference only for an agent equipped with a predictive model of how agents with certain goals and information will act; it is not obvious why ‘copy players that stop or slow down’ would be pre-equipped as a generic heuristic across environments. The visual appearance of exploiting behavior may differ from situation to situation (e.g. if the spotlight moved in a different way, stopping or slowing might be a sub-optimal response and an unreliable cue). Furthermore, neither our simulated agents nor our human participants will have had prior experience with the specific task we use in our experiment.
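The simplification above can be illustrated numerically. With a binary reward and a cue that is far more likely under high reward than low reward, the posterior approaches 1 even under a modest prior (the specific probabilities below are assumptions for illustration, not fitted values):

```python
# Posterior over another agent's hidden binary reward, given the visible
# "exploit" cue (stopping/slowing). Probabilities are illustrative.

def posterior_high(p_exploit_given_high, p_exploit_given_low, prior_high):
    # Bayes' rule for a binary hypothesis:
    # P(high | exploit) = P(exploit | high) P(high) / P(exploit)
    num = p_exploit_given_high * prior_high
    den = num + p_exploit_given_low * (1 - prior_high)
    return num / den

# Even with a prior of only 0.2 on high reward, a strongly diagnostic
# cue pushes the posterior well above 0.8.
print(posterior_high(0.95, 0.05, prior_high=0.2))
```

As the likelihood ratio P(b = exploit | r = high) / P(b = exploit | r = low) grows, the posterior converges to 1, which is the regime exploited by the social inference model.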

Figure 2: (A) Predictions of different models at best-fitting parameter values. (B) Mean performance of human participants in the second half of Experiment 1 as a function of group size. Larger groups saw significant gains in performance. Error bars are 95% bootstrap confidence intervals using the group as the primary bootstrap unit. Only the social inference model is able to account for the magnitude of the group size effect.

We compared the predictions of this model against simpler heuristic alternatives by simulating groups of agents following different strategies. As alternative models, we consider (1) an asocial model, (2) a “move-to-center” heuristic, and (3) a “naive copy” heuristic. In contrast to our social inference model, an agent implementing the baseline asocial strategy does not pay any attention to other agents and randomly chooses destinations to explore. An agent implementing the move-to-center heuristic similarly sets a destination to explore but is biased toward the centroid of the other agents’ positions at the time of their choice, where the degree of bias is a free parameter. Finally, an agent implementing the naive copy heuristic randomly chooses between independently exploring versus indiscriminately choosing a single agent to copy (i.e., choosing uniformly from the set of other agents, without making an inference about their current reward).³ For the score field in our simulations, we used a slowly moving circular “spotlight” that traveled along paths between randomly chosen locations (Fig. 1). The overall field was hidden from agents, who only had access to the score at their current location. We simulated homogeneous groups of different sizes (Fig. 2A) and measured the average score achieved across the entire group.⁴ We also included “groups” of size one in order to establish a baseline level of performance in the absence of any social partners.

³ We implemented all models in the Python programming language, with code and full descriptions of the implementations available at

⁴ We also examined heterogeneous or “hybrid” groups mixing together different combinations of agents using different strategies. However, performance in these groups was strictly dominated by the proportion of agents using the social inference strategy; we did not observe any particular compositions yielding synergies between different strategies, so we have omitted these analyses for simplicity.
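The destination-choice rules compared above can be sketched as follows. The field dimensions, parameter values, and the (x, y, stopped) agent representation are assumptions for illustration; the published implementation contains the full movement and belief dynamics:

```python
import random

# Minimal sketch of the four destination-choice rules. Each function
# takes the other agents as a list of (x, y, is_stopped) tuples and
# returns an (x, y) destination. Constants are illustrative.

WIDTH, HEIGHT = 480, 280

def asocial(others):
    # Ignore everyone; pick a uniformly random destination.
    return (random.uniform(0, WIDTH), random.uniform(0, HEIGHT))

def move_to_center(others, bias=0.5):
    # Random destination, pulled toward the centroid of the other
    # agents' positions; `bias` is the free parameter.
    rx, ry = random.uniform(0, WIDTH), random.uniform(0, HEIGHT)
    cx = sum(x for x, _, _ in others) / len(others)
    cy = sum(y for _, y, _ in others) / len(others)
    return (rx + bias * (cx - rx), ry + bias * (cy - ry))

def naive_copy(others, p_explore=0.5):
    # Coin flip between independent exploration and copying a uniformly
    # chosen agent, with no inference about that agent's hidden reward.
    if random.random() < p_explore:
        return asocial(others)
    x, y, _ = random.choice(others)
    return (x, y)

def social_inference(others):
    # Copy an agent only when its visible behavior (stopped) implies a
    # high hidden reward; otherwise explore independently.
    stopped = [(x, y) for x, y, is_stopped in others if is_stopped]
    if stopped:
        return random.choice(stopped)
    return asocial(others)
```

The key contrast is in the last two functions: naive copy targets agents uniformly at random, while social inference conditions the copying decision on the behavioral cue of success.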

We find that groups containing individuals with the capacity to make social inferences perform significantly better as a function of group size, and these improvements are greater than those observed in alternative models. First, as expected, the asocial model predicts no increase in performance as a function of group size: agents are entirely blind to one another and remain at the individual baseline. Second, the heuristic models predict a small improvement over the asocial baseline, but even at the best-performing parameter values, they quickly asymptote to a performance ceiling for groups up to size six; this ceiling is only exceeded in much larger groups of size 16 to 32.⁵ Meanwhile, the social inference model already yields substantially improved performance for small groups (already at size 3) and continues to improve for larger groups. Thus, in purely qualitative terms, groups with the capacity to infer latent reward information from external behavior see larger benefits of group size above and beyond those predicted by simpler heuristics. However, it is not clear a priori which of these strategies best explains the behavior of real human groups.

⁵ These models only have one free parameter, the “independent explore probability,” which governs the frequency with which agents set an independent destination versus a social heuristic destination. When this probability is 1, these models are equivalent to the asocial model; when it is 0, agents are strongly tethered to one another and end up stuck in a clump. The best-performing values for the “move-to-center” model and “naive copy” model lay between these extremes.

2.2 Experiment 1: Multi-agent sensing across group sizes

To evaluate the predictions of these different models, we designed a multi-agent tracking environment where we could examine how groups of human participants behave under the same conditions as the simulated agents we considered in the previous section. Participants were connected into groups of size 1 through 6 over the web and controlled avatars by clicking and using two keyboard keys. Their avatars automatically moved forward, and clicking within the playing area instantly oriented the avatar to move toward the clicked location (visually marked with a cross). Participants could hold the “a” key to accelerate or hold down the “s” key to stop. We used the same “spotlight” score fields as above (i.e. a reward of 1 inside the circular scoring region and 0 everywhere else), and showed participants binary feedback about their current score. Critically, this feedback was only visible to the participant controlling that avatar; participants did not directly observe whether other participants were in the scoring region. They could only see the spatial location and orientation of other participants, which were updated in real time.

We hypothesized that individuals in larger groups would achieve significantly higher scores on average than individuals in smaller groups, as predicted by the social inference model. To test this hypothesis, we examined the average performance of each group across the 5-minute session, to ensure that scores were directly comparable to the simulations. In cases where one or more participants were disconnected or removed during the session, we used the size of the group at the end of the session. We constructed a linear mixed-effects regression predicting each individual participant’s average score, including fixed effects of period (first vs. second half), the continuous number of participants in their environment (one through six), and their interaction. We also included random intercepts for each group and each of the five underlying score fields. First, we found a main effect of practice: scores were significantly higher in the second half of the session. However, we also found a significant interaction with group size: while performance in the first half was similar across group sizes, each individual’s performance in the second half significantly increased in larger groups, from a second-half score of 0.16 in groups of 1 to a second-half score of 0.24 in groups of 6 (see Fig. 2B). Critically, the magnitude of these performance gains in human groups exceeded the theoretical limits of the two heuristic models even at the best-fitting parameter values; only the social inference model was capable of explaining the empirical group size effect.⁶

⁶ Although we focus on the qualitative differences between models, we note that a small amount of exploration noise must be introduced to the social inference model for an accurate quantitative fit, possibly reflecting limitations on attention and motor control that lead to deviations from perfect copying. This gap is also the motivation for comparing model performance to second-half scores: the simulated “ideal” agents do not face the same learning curve as human participants with respect to aspects of the task such as motor control and instruction comprehension, which we are less interested in explicitly modeling.

2.3 Experiment 2: Evaluating copying strategies

What cognitive abilities allowed humans in Experiment 1 to benefit from social learning even when the payoff information of other individuals was not directly accessible? We hypothesized that human behavior in this environment is driven by two underlying strategies: (1) independent exploration and (2) precise, targeted copying based on social inferences about success. These hypothesized strategies rely on cognitive abilities allowing humans to infer “who knows” about high-scoring locations based on outward behavioral traces (e.g. slowing down or stopping in a region), and also to inhibit social influence and act independently when appropriate. The design of Experiment 1 made it challenging to disentangle these strategies. For example, we were interested in analyzing participant clicks to detect signatures of selective copying, but because there was a unique ‘spotlight’ at each point in time, different copying strategies were confounded: participants who were already obtaining reward and trying to stay inside the spotlight were, by necessity, clicking close to other participants who were obtaining reward, even if they were not intentionally copying them.

For our second set of experiments, then, we designed a sequence of controlled scenarios that are more diagnostic for testing the use of these different strategies. We placed participants into an environment with artificial agents that we designed and controlled, rather than with other humans, and we manipulated the location of the score field to estimate the probability of copying different agents under different conditions. As in our first experiment, participants were given control of an avatar to explore a virtual environment and were rewarded based on their location according to a hidden “score field.” The interface and controls were the same as in Experiment 1, but the procedure differed in several ways. Instead of a single 5-minute session, we designed a sequence of shorter scenarios that were informative for distinguishing between several different potential mechanisms that could be used in the game. These scenarios carefully controlled score field dynamics and bot behaviors.

Figure 3: Design of Experiment 2. (A) The timeline of the test round involves a baseline condition with no score field, and two causal interventions on the score field beginning at approximately 10 seconds and 40 seconds. (B) These interventions manipulate the location of the score field to ensure the participant is receiving high reward or not, respectively, while a subset of other agents are receiving high reward.

After four one-minute practice rounds, where no other agents were present, participants were placed in two one-minute test rounds that were the focus of our analyses. In one of the two test rounds, no other agents were present (non-social condition), and in the other there were four bots in the environment (social condition). Each of these rounds was further divided into three conditions, where we causally intervened on the score fields to better test our hypotheses about exploring, copying, and exploiting behavior. Most of the session was spent in the baseline condition, where there was no non-zero scoring region at all. Around the ten second mark and the forty second mark in each round, we introduced high-scoring regions into the game (see Fig. 3A). In the distant intervention condition, we placed these regions directly on top of two bots. In the local intervention condition, we also placed a high scoring region on the participant, wherever they were at the time, such that they automatically received a high score for roughly the ten second duration that the high scoring regions were present (see Fig. 3B).

For conceptual clarity about our predictions, it is helpful to define three broad ‘states’ an agent may occupy: exploring, exploiting, and copying (cf. Rendell et al., 2010). We define exploiting as selecting an action that maximizes the expected score given the individual’s current knowledge of the environment, i.e. staying close to a known location of the spotlight. We define copying as forward motion, sometimes accelerated, toward the location of another agent. We define exploring as selecting an action that has an unknown outcome, often moving to a region without other agents. In this environment, exploiting, exploring, and copying behavior were associated with distinct and recognizable movements. The social inference model can be operationalized as selective deployment of these three states: exploiting rather than copying or exploring when one is in a high-scoring region, and copying rather than exploring in low-scoring regions only when it can be inferred, from outward behavioral signatures, that another agent is receiving a high score.
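This operationalization amounts to a simple selection rule over the three states. The function below is an illustrative summary, not the model's actual implementation; its inputs (current reward, visibility of stopped agents) are assumptions about how the relevant cues would be encoded:

```python
def choose_state(my_reward, stopped_agent_visible):
    # Selective deployment of the three states defined above.
    # Inputs are illustrative: my_reward is the agent's current private
    # score, stopped_agent_visible flags an outward cue of success.
    if my_reward > 0:
        return "exploit"   # stay near the known scoring region
    if stopped_agent_visible:
        return "copy"      # approach an agent inferred to be scoring
    return "explore"       # search independently
```

The conditional structure captures the two inhibitions the model predicts: social influence is suppressed while one is already scoring, and copying is deployed only when the behavioral cue licenses the inference of success.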

Figure 4: Participants selectively copy other agents who appear to be exploiting, but only when they themselves are not receiving reward (i.e. the distant intervention condition). We operationalized copying in terms of the spatial distance from the click to the (nearest) bot’s location, so lower distance is evidence of copying. Error bars are bootstrapped 95% CIs.

The social inference model predicts that participants would selectively copy stopped agents when they themselves were not receiving reward, but would otherwise inhibit social influence and explore independently. We tested this prediction by examining which agents participants chose to copy. Because clicking near another agent moved the participant closer to that agent’s location, and success is based on spatial location, we operationalized copying via the proximity of each click to other agents. To test whether participants selectively copied agents who appeared to be exploiting, we computed both the distance to the nearest agent who was stopped and the distance to the nearest agent who was not stopped. We then compared copying rates across the two score field conditions: we predicted that participants would inhibit selective copying in the local condition, when they automatically received a score at their current location, relative to the distant condition, where the score field was only placed on top of artificial agents. To test this prediction, we constructed a mixed-effects regression model predicting the proximity of each click to the nearest agent as a function of experimental condition (local vs. distant), visible behavior (exploiting vs. not exploiting), and their interaction. We included the maximal random effects supported by our within-participant design, allowing random intercepts, main effects, and interactions at the participant level.
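The click-proximity measure can be sketched as follows; the field names and data layout are assumptions about how the click and bot logs would be structured, not the analysis code itself:

```python
import math

# Distance from a click to the nearest bot, split by whether that bot
# looks like it is exploiting (stopped). Lower distance to stopped bots
# is the behavioral signature of selective copying.

def nearest_distance(click, bots, stopped):
    # click: (x, y) tuple; bots: list of dicts with "x", "y", "stopped".
    candidates = [b for b in bots if b["stopped"] == stopped]
    if not candidates:
        return None  # no bot in this category on screen
    return min(math.hypot(click[0] - b["x"], click[1] - b["y"])
               for b in candidates)

bots = [{"x": 100, "y": 100, "stopped": True},
        {"x": 300, "y": 50, "stopped": False}]
click = (110, 105)
print(nearest_distance(click, bots, stopped=True))   # distance to stopped bot
print(nearest_distance(click, bots, stopped=False))  # distance to moving bot
```

Computing both distances for every click yields the exploiting vs. not-exploiting predictor entered into the regression described above.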

We found a significant interaction, indicating a selective preference for copying other exploiting agents, but only in the condition when the participant was not themselves receiving a reward (see Fig. 4). To control for the possibility that this result is a product of generic biases in the spatial pattern of clicks, rather than the use of social information, we conducted the same analysis on clicks in the non-social condition, where no artificial agents were visible but the underlying score field dynamics were the same. In other words, this condition allows us to examine the proximity of clicks to where other agents would have been. We found no significant interaction in this condition. Our sample was under-powered for a stronger test of the three-way interaction in a single model testing whether the interaction estimated in the social condition differed significantly from the one in the non-social condition; unsurprisingly, this three-way interaction was not significant. Exploring the baseline variability of clicks in non-social environments is likely to be a fruitful target for future work using a more highly-powered sample. Taken together, these results confirm that human participants selectively copy exploiting bots, but only when they are not themselves receiving high scores.

2.4 Experiment 3: Generalizing to more complex environments

To generalize these findings to more complex environments, we conducted a final experiment using the specific conditions and materials designed by Berdahl et al. (2013) to examine collective sensing in fish; our prior experiments had used simplifications of this environment. These environments are substantially more complex than the binary spotlight and border environments used in Experiments 1 and 2: they require individuals to use continuous gradients to navigate noisy and fluctuating score fields. We manipulated the level of noise across different groups, predicting that the cognitive abilities discussed in the previous sections may be less reliable under noisier conditions. To test whether the social learning strategies identified in the previous experiments also generalize to different external behavioral signatures, we also modified several aspects of the experiment interface, including the movement controls. This change created a different behavioral cue of success (spinning in place rather than stopping or slowing), which individuals equipped with social inference mechanisms should be able to use for selective copying just as effectively as participants in the previous experiments. Our analyses focus on two primary questions: (1) how does the introduction of a noisier environment affect average performance, and (2) how do the selective social learning strategies identified in Experiment 2 play out in such an environment, where inferences about the success of other individuals may be less reliable?

Figure 5: Mean performance as a function of group size under different noise conditions. Error bars are 95% bootstrap confidence intervals using the group as the primary bootstrap unit.

2.4.1 Effects of noise on average performance

We begin by analyzing patterns of average performance across groups of different sizes and across the different noise conditions. As our measure of performance, we computed the average score obtained by participants over each half of the experiment. To test effects on performance, we constructed a linear regression model with main effects of group size (1 through 5), half (‘first’ vs. ‘last’), and noise condition (‘low’ vs. ‘high’), as well as their interactions. All three main effects were significant: all else being equal, scores tended to increase with group size, were higher in the second half than in the first half, and were higher in the low-noise condition than in the high-noise condition. The only significant interaction was between noise condition and group size, indicating a stronger effect of group size in the low-noise condition than in the high-noise condition (see Fig. 5). We also conducted a mixed-effects regression including random intercepts for each group (i.e. controlling for possible correlations between participants in the same group) and for each score field (i.e. controlling for the possibility that some randomly generated score fields were more difficult than others). The main effects of group size, game half, and noise condition remained robust, but the interaction was no longer significant, with the group-level random intercept accounting for the bulk of the additional variance. We suspect this discrepancy reflects a loss of power to detect an interaction after shifting from the participant level to the group level of analysis, especially given imbalances in sample sizes across noise conditions; this effect merits further investigation. Overall, these results indicate an important role of the environment in group success: under low noise, larger groups perform systematically better than smaller groups, similar to the effect found in Experiment 1, yet this advantage appears to be weaker under high noise.
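The fixed-effect structure of this regression can be sketched as a design-matrix row builder. The treatment coding for the categorical factors is an assumption, and the function is purely illustrative of the factorial structure (three main effects, all pairwise interactions, and the three-way interaction).

```python
from itertools import combinations

def design_row(group_size, half, noise):
    """Build one row of the design matrix for the performance regression:
    main effects of group size (numeric), game half, and noise condition,
    plus all pairwise interactions and the three-way interaction.
    Treatment coding ('last' half = 1, 'high' noise = 1) is illustrative."""
    g = float(group_size)
    h = 1.0 if half == 'last' else 0.0
    n = 1.0 if noise == 'high' else 0.0
    main = [g, h, n]
    pairwise = [a * b for a, b in combinations(main, 2)]
    threeway = g * h * n
    return [1.0] + main + pairwise + [threeway]  # leading 1.0 = intercept
```

The mixed-effects variant additionally includes random intercepts for group and score field, which this fixed-effects sketch omits.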

2.4.2 Analysis of social learning strategies

In order to understand the mechanisms that may have contributed to effects of noise on average performance, we more closely analyzed the underlying behavior of the participants in our games. While we relied on click data as a useful measure of copying in Experiment 2, here we used a simple state-based coarse-graining of participant trajectories based on their keyboard actions. We automatically coded the state of each participant at each point in time — exploring, exploiting, or copying — using a simple set of criteria (see Methods for details; an example is shown in Fig. S3, and in the Supplementary Videos). All of these criteria depended only on public information that was observable to the participant in the game (i.e., the state does not depend directly on the hidden scores of other individuals). Thus, while this classification is purely intended for analyzing and interpreting behavioral patterns (distinct from our computational model), we can use these states as proxies for what other participants might in principle be able to infer from external cues. Additionally, because the states do not depend in any way on actual score values, we can meaningfully quantify the relationship between state and performance.

Figure 6: (A) The probability of an individual being in a particular behavioral state as a function of the individual’s score, combined across both conditions. (B) Participants tended to begin exploiting at lower background values in the high-noise condition, leading to less copying and exploration.

We now use these state classifications to analyze the behavioral strategies participants used in our game. First, in line with our finding in Experiment 2 that participants inhibit copying when they find themselves in high-scoring regions, we predicted that the probability of the exploiting state would increase as participants receive higher scores. To test this prediction, we constructed a logistic mixed-effects regression model predicting the probability that each individual is in the ‘exploiting’ state at each time step. We included fixed effects of current background score and noise condition, as well as their interaction, and random intercepts for each group and score field. First, we found a strong main effect of the current score: regardless of noise condition, participants were significantly more likely to exploit in higher-scoring locations than in lower-scoring locations (see Fig. 6A). Selective exploiting is clearly adaptive: participants tend to remain in high-scoring regions but quickly move away from low-scoring regions, either by exploring independently or by copying other individuals. At the same time, strategies differed dramatically across noise conditions. We found a significant main effect of noise, indicating that participants were more likely to exploit in the high-noise condition at all background values, and a significant interaction between condition and background value, indicating that this increased likelihood of exploitation was especially pronounced at lower background values (see Fig. 6B). Given the greatly increased spatial and temporal variability in the high-noise condition, exploitation at lower background levels is likely due to participants expecting a smaller range of achievable scores. Similar regressions predicting the probability of the ‘copying’ and ‘exploring’ states found that at lower point values participants were also less likely to be exploring and more likely to be copying.
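The qualitative pattern described above can be illustrated with a logistic curve. The coefficients below are invented for illustration (a positive noise main effect combined with a negative score-by-noise interaction), not fitted values from the experiment.

```python
import math

def p_exploit(score, high_noise, b0=-3.0, b_score=6.0, b_noise=1.5, b_int=-1.0):
    """Illustrative logistic model for P(exploiting | score, condition).
    `score` is the background score in [0, 1]; `high_noise` is a bool.
    The made-up coefficients mimic the reported pattern: exploitation
    rises with score, the high-noise condition shifts the curve up at
    all scores, and that shift is largest at low scores."""
    z = b0 + b_score * score + (b_noise + b_int * score if high_noise else 0.0)
    return 1.0 / (1.0 + math.exp(-z))
```

Because `b_noise + b_int * score` stays positive over [0, 1], the high-noise curve lies above the low-noise curve everywhere, with the gap shrinking as the score grows, matching Fig. 6B qualitatively.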

A lower threshold for exploitation may also help explain gaps in average performance across noise conditions. First, a willingness to exploit at lower point values may, by definition, lead to lower overall performance. Second, it may make copying less effective, preventing social learning mechanisms from improving performance in larger groups. That is, if participants are willing to exploit at lower background values, then external cues of exploiting (i.e. “spinning” behavior) provide statistically weaker evidence of underlying success. To test this hypothesis, we identified all of the events in our dataset where one participant copied another and measured the current score of the copied target. We found that targets of copying tended to be in lower-scoring regions in the high-noise condition. These results clarify the interaction between human social learning strategies and environmental conditions, and raise interesting questions about the robustness of social-inference-based copying.
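A minimal sketch of this analysis step, assuming copy events have already been extracted from the trajectories (field names are illustrative):

```python
def mean_target_score(copy_events, condition):
    """Mean background score of the copied agent at the moment copying
    begins, for one noise condition. `copy_events` is a list of dicts
    with keys 'condition' and 'target_score'; both names are illustrative
    placeholders, not the experiment's actual data schema."""
    scores = [e['target_score'] for e in copy_events
              if e['condition'] == condition]
    return sum(scores) / len(scores) if scores else float('nan')
```

Comparing this statistic across the two conditions is the basis for the claim that copy targets sit in lower-scoring regions under high noise.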

3 Methods

3.1 Experiment 1: Manipulating group size


We recruited 781 unique participants from Amazon Mechanical Turk to participate in an interactive web experiment built with the framework described by Hawkins (2014). All participants were from the United States. After excluding 52 participants due to inactivity or latency, and 9 others for disconnecting in the first half of the game, we were left with usable data from 720 participants in 312 groups, ranging in size from one to six individuals. We paid participants 75 cents for completing our instructions and comprehension checks, and participants could receive a bonus of up to $1.25 during the five minutes of gameplay; each point in the game corresponded to $0.01 of bonus. Each participant was also paid 15 cents per minute for time spent in the waiting room, minus any time spent moving into a wall. These numbers were chosen so that participants could expect a wage of at least $9 per hour for the total time they were active in the experiment.


The virtual game environment measured 480 pixels in width and 285 pixels in height. Avatars were represented by triangles 10 pixels long and 5 pixels wide, rotated to face the direction the avatar was moving. Avatars automatically moved forward at a constant velocity of 17 pixels per second when no buttons were pressed, but instantaneously increased to a constant velocity of 57 pixels per second for as long as the “a” key was held down, and decreased to 0 pixels per second for as long as the “s” key was held down. Locations were updated every 125 milliseconds. As soon as an avatar entered the “spotlight” reward region, it was surrounded by a salient sparkling halo and the border of the playing area turned green (see Supplementary Fig. S1 for screenshots). To discourage inactivity, participants also received 2/3 of a point for each second they were actively participating in the game. Whenever an avatar was touching a wall, we displayed a large warning message and set the participant’s current score to zero so that they stopped accumulating points.
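The movement dynamics described above can be sketched as a per-tick position update. The constants come from the text; the implementation itself is an illustrative reconstruction, not the experiment code.

```python
import math

# Movement constants from the Methods: base speed 17 px/s, fast 57 px/s
# while "a" is held, 0 px/s while "s" is held; positions update every 125 ms.
TICK = 0.125  # seconds per update
SPEEDS = {'none': 17.0, 'a': 57.0, 's': 0.0}

def step(x, y, heading, key='none'):
    """Advance an avatar one tick. `heading` is in radians; the
    key-to-speed mapping follows the description in the text."""
    v = SPEEDS[key]
    return (x + v * TICK * math.cos(heading),
            y + v * TICK * math.sin(heading))
```

For example, an avatar facing along the x-axis covers 2.125 px per tick at base speed and 7.125 px per tick while “a” is held.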

We generated score fields by first initializing a circular region with a diameter of 50 pixels at a random location in the playing area. Inside this region, the score was set to 1; outside, it was set to 0. We then moved this region along a straight line to a randomly chosen target location within the playing area at a speed of 17 pixels per second. Once it reached this location, we selected another target location, and repeated the process for the duration of the 5-minute session. We pre-generated 5 unique score fields in this way and randomly assigned groups to one of these fields.
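The score field generation procedure can be sketched as follows, using the dimensions and speed given above; the implementation is an illustrative reconstruction.

```python
import math
import random

def spotlight_path(width=480, height=285, speed=17.0, tick=0.125,
                   duration=300.0):
    """Generate the centre of the binary 'spotlight' over time: start at a
    random point, move in a straight line at `speed` px/s toward a random
    target, and pick a new target on arrival. Dimensions, speed, and the
    5-minute duration follow the Methods."""
    pos = [random.uniform(0, width), random.uniform(0, height)]
    target = [random.uniform(0, width), random.uniform(0, height)]
    path = []
    for _ in range(int(duration / tick)):
        dx, dy = target[0] - pos[0], target[1] - pos[1]
        d = math.hypot(dx, dy)
        step = speed * tick
        if d <= step:  # arrived: snap to target and choose a new one
            pos = list(target)
            target = [random.uniform(0, width), random.uniform(0, height)]
        else:
            pos[0] += step * dx / d
            pos[1] += step * dy / d
        path.append(tuple(pos))
    return path
```

The score at any point is then 1 within 25 px of the current centre and 0 elsewhere.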


After agreeing to participate in our experiment, participants were presented with a set of instructions describing the mechanics of the game, using a cover story framing the game as a search for the “magical bonus region”. Participants were informed about the dynamics of the underlying score field and also explicitly told that “There is no competition between players; the magical region is not consumed by players. It simply changes location over time.” Participants were not explicitly instructed to cooperate or coordinate with one another, nor was this suggested. After successfully completing a comprehension test, participants were redirected to a waiting room. Each waiting room was assigned a group size between 1 and 6, and the game began as soon as the target number of participants was reached, or after 5 minutes of waiting, whichever came first. While in the waiting room, participants could familiarize themselves with the controls of the game. No score was shown in the waiting room unless the participant was against a wall, in which case the border of the playing region turned red and a warning appeared on screen. All participants spent at least one minute in the waiting room to help ensure familiarity with the controls before starting the game. Participants then played a single continuous game lasting 5 minutes, and were paid a bonus proportional to the total score they individually accrued. Both in the waiting room and in the actual game, participants were removed for inactivity if we detected that they had switched to another browser tab for more than 30 seconds total, or if their avatar was moving into a wall for 30 consecutive seconds. We also removed participants whose ping response latencies were greater than 125 ms for more than 75 seconds in total throughout the game. To minimize disruption of large groups, we allowed multi-participant games to continue after a participant disconnected or was removed, as long as at least one participant remained.

3.2 Experiment 2: Manipulating the behavior of other agents in micro-scenarios


We recruited 28 unique participants from Amazon Mechanical Turk. All participants were from the United States.

Stimuli & procedure.

To acclimate participants to the task environment, each game began with four one-minute long practice rounds. In the first and third practice rounds, the score field was visible to the participant so they could observe its dynamics. In the second and fourth practice rounds, the score field was invisible to the participants, as in Experiment 1. Additionally, we randomized participants into two different groups, who practiced with different score field dynamics. In a “wall-following” pattern, the high scoring region moved contiguously along the walls of the playing area. In a “random-walk” pattern, the high scoring region slowly drifted, as in Experiment 1, from one random location to another within the playing region. Because we did not observe substantial differences in participant behavior depending on the score field dynamics observed during the practice phase, we collapsed over this factor in our analyses.

Bots followed a simple selective copying algorithm. They were programmed to stop immediately upon entering a high-scoring area. If another bot in the environment was stopped, a bot would copy it; otherwise, it explored non-socially. Wall-following bots only copied other wall-following bots, and bots in the center region similarly only copied each other. Bots were responsive only to each other, not to the participant’s behavior. In the non-social round, we simulated where the same bots would have been, so that the distribution of score field positions was held constant across the two conditions. In the local intervention condition, the score field manipulation was triggered for the bots approximately two seconds after it was triggered for the participant; we offset the trigger time to ensure that participants were already aware of their own score before observing any reward-related bot behavior.
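The bots' decision rule can be summarized in a short sketch; field names and return values are illustrative, not the experiment code.

```python
def bot_action(bot, peers, in_high_score_region):
    """One decision step of the selective-copying bot described above:
    stop when receiving reward, copy a stopped peer of the same type
    ('wall' or 'center'), otherwise explore non-socially. Bots ignore
    the human participant entirely; `peers` contains only other bots."""
    if in_high_score_region:
        return ('stop', None)
    stopped_peers = [p for p in peers
                     if p['stopped'] and p['type'] == bot['type']]
    if stopped_peers:
        return ('copy', stopped_peers[0])
    return ('explore', None)
```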

The within-game interventions were implemented for bots as follows. In the baseline condition, there was no score field and all bots explored randomly: two along the walls (in association with the wall score field) and two in the center region (associated with the random-walk score field). For the interventions, we superimposed the wall-following and random-walk score field patterns to create a bimodal dynamic score field: one high-scoring region was centered on a wall-following bot and the other on a bot in the center region. We randomized both the order of the social and non-social micro-sessions and the order in which the distant and local interventions appeared within each session.

3.3 Experiment 3: Manipulating noise in the environment


We recruited 563 unique participants from Amazon Mechanical Turk to participate in our experiment. All participants were from the United States. After excluding 72 participants due to inactivity or latency, and 6 others for disconnecting in the first half of the game, we were left with usable data from 437 participants in 224 groups. 113 individuals (63 groups) were in the low noise condition and 324 individuals (161 groups) were in the high noise condition. These groups ranged in size from one to six individuals. Since only one group of size six completed the task without disconnections, we ignored this group in our analysis.

Stimuli and Procedure

The primary change from Experiments 1 and 2 was replacing the binary score field with a more complex gradient score landscape. These fields were generated using the method reported by Berdahl et al. (2013). We began with the same randomly moving “spotlight” as before, but then combined the spotlight with a field of spatially correlated, temporally varying noise. By manipulating the proportional weighting of the noise field and the spotlight, we generated two conditions, corresponding to two of the noise levels used by Berdahl et al. In the low-noise condition, the spotlight was weighted strongly relative to the noise field (10% noise), with the noise field providing minor background variation (see Supplemental Fig. S2, left). In the high-noise condition, the weighting of the noise field was increased (25% noise), producing more extreme fluctuation outside the spotlight (see Supplemental Fig. S2, right). To decrease variability and increase statistical power, we generated only four distinct score fields per noise level, so multiple groups experienced the same fields.
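The weighting scheme can be sketched as a convex combination of the two fields; both inputs are assumed normalized to [0, 1], and the exact blending formula used in the experiment may differ from this sketch.

```python
def blended_score(spotlight_value, noise_value, noise_weight):
    """Combine the binary spotlight with the background noise field using
    the proportional weighting described above: noise_weight = 0.10 in the
    low-noise condition and 0.25 in the high-noise condition. This is an
    illustrative sketch of the weighting scheme, not the generation code."""
    return (1.0 - noise_weight) * spotlight_value + noise_weight * noise_value
```

Under this scheme, the contrast between being inside and outside the spotlight shrinks as the noise weight increases, which is what makes the high-noise condition harder.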

In addition to these more complex score fields, we made several adjustments to the interface. First, rather than displaying the current score as a binary signal (a glowing halo around the participant when inside the spotlight), the score was presented as a percentage at the top of the playing area (see Supplemental Fig. S4 for a screenshot). Second, rather than clicking to change direction, participants controlled their avatars with the keyboard: the left and right arrow keys turned the avatar at a fixed rate, and the spacebar was used to accelerate. Unlike before, we did not provide a mechanism to stop completely. Given the closer relation to Berdahl et al. (2013) in this experiment, it is also relevant that the avatar speeds and playing area dimensions throughout all of our experiments were matched to those reported by Berdahl et al.; in this experiment, we additionally used the same total task length of six minutes. The procedure was otherwise identical to Experiment 1.

Automated state classification

Our criteria for classifying agents into one of the three states are as follows.

  • Exploiting behavior was not trivial for participants to produce, since avatars always move at least at a slow constant velocity. Unlike in the previous experiments, where the “s” key could be pressed to stop in place, a participant in Experiment 3 could either meander around a particular location or persistently hold down one of the arrow keys while moving at the slow speed, which creates a relatively tight circular motion around a particular location. We call this second activity “spinning” because of its distinctive appearance. We classify a participant as exploiting if the participant is spinning for 1 second, or if the participant moves at the slow speed for 3 seconds while traveling less than two thirds of the maximum distance possible in that time. The second criterion captures the meandering behavior of individuals who have not discovered how to spin.

  • Copying behavior is more difficult to identify, but is likely characterized by directed movements towards other participants. We thus classify a participant as copying if they move at the fast speed in a straight line towards any particular other participant for a 500 ms window. We consider a participant to be moving towards another participant if the second participant falls within a fixed angular window on either side of the first participant’s straight-line trajectory. It is possible that this coding scheme under-estimates the overall rate of what we might still want to call ‘copying’ behavior: for example, it does not capture more graded biases toward the group centroid. Because such under-estimation should be constant across background values, we do not expect it to affect the comparisons of interest.

  • We classify behavior as exploring if the participant is neither exploiting nor copying. Thus, a participant is classified as exploring if they are moving slowly but not staying in the same general location, moving quickly but not towards any particular person, or moving quickly and turning.
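The three criteria above can be sketched as a single classifier over precomputed trajectory features. Feature extraction from the raw trajectories is omitted; the thresholds follow the text, and everything else (argument names, the feature representation) is illustrative.

```python
def classify_state(spinning_for, slow_for, dist_ratio, fast_toward_peer_for):
    """Classify one time step as 'exploiting', 'copying', or 'exploring'.

    All durations are in seconds. `dist_ratio` is the distance actually
    travelled during the slow-speed window divided by the maximum distance
    possible in that time; `fast_toward_peer_for` is how long the
    participant has moved at the fast speed straight toward some peer.
    """
    # Exploiting: spinning for >= 1 s, or slow for >= 3 s while covering
    # less than two thirds of the possible distance.
    if spinning_for >= 1.0 or (slow_for >= 3.0 and dist_ratio < 2.0 / 3.0):
        return 'exploiting'
    # Copying: moving fast in a straight line toward a peer for >= 500 ms.
    if fast_toward_peer_for >= 0.5:
        return 'copying'
    # Exploring: everything else.
    return 'exploring'
```

Note that the criteria are checked in order, so a spinning participant is never labeled as copying, matching the residual definition of exploring above.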

4 Acknowledgments

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1122374 to PK and Grant No. DGE-114747 to RXDH. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. This material is based upon work supported by the Center for Minds, Brains and Machines (CBMM), funded by NSF STC award CCF-1231216. Special thanks to Colin Torney for providing the code to generate the score field gradients and to Robert Goldstone for helpful feedback on the interpretation of our results.


  • L. R. Anderson and C. A. Holt (1997) Information cascades in the laboratory. The American Economic Review, pp. 847–862. Cited by: §1.
  • T. E. Behrens, L. T. Hunt, M. W. Woolrich, and M. F. Rushworth (2008) Associative learning of social value. Nature 456 (7219), pp. 245–249. Cited by: §1.
  • A. Berdahl, C. J. Torney, C. C. Ioannou, J. J. Faria, and I. D. Couzin (2013) Emergent Sensing of Complex Environments by Mobile Animal Groups. Science 339 (6119). Cited by: §1, §2.1, §3.3.
  • S. Bikhchandani, D. Hirshleifer, and I. Welch (1998) Learning from the behavior of others: conformity, fads, and informational cascades. Journal of Economic Perspectives 12 (3), pp. 151–170. Cited by: §1.
  • R. Boyd and P. J. Richerson (1995) Why does culture increase human adaptability?. Ethology and sociobiology 16, pp. 125–125. Cited by: §1.
  • F. Dechaume-Moncharmont, A. Dornhaus, A. I. Houston, J. M. McNamara, E. J. Collins, and N. R. Franks (2005) The hidden cost of information in collective foraging. Proceedings of the Royal Society B: Biological Sciences 272 (1573), pp. 1689–1695. Cited by: footnote 1.
  • M. Derex, M. Beugin, B. Godelle, and M. Raymond (2013) Experimental evidence for the influence of group size on cultural complexity. Nature 503 (7476), pp. 389–391. Cited by: §1.
  • D. Engel, A. W. Woolley, L. X. Jing, C. F. Chabris, and T. W. Malone (2014) Reading the mind in the eyes or reading between the lines? Theory of mind predicts collective intelligence equally well online and face-to-face. PLOS ONE 9 (12). Cited by: §1.
  • R. L. Goldstone, B. C. Ashpole, and M. E. Roberts (2005) Knowledge of resources and competitors in human foraging. Psychonomic Bulletin & Review 12 (1), pp. 81–87. Cited by: footnote 1.
  • R. D. Hawkins (2014) Conducting real-time multiplayer experiments on the web. Behavior Research Methods 47, pp. 966–976. Cited by: §3.1.
  • D. Hawthorne-Madell and N. D. Goodman (2019) Reasoning about social sources to learn from actions and outcomes.. Decision 6 (1), pp. 17. Cited by: §1.
  • C. Heyes (2012a) Simple minds: a qualified defence of associative learning. Philosophical Transactions of the Royal Society B: Biological Sciences 367 (1603), pp. 2695–2703. Cited by: §1.
  • C. Heyes (2012b) What’s social about social learning?. Journal of Comparative Psychology 126 (2), pp. 193–202. Cited by: §1.
  • C. Heyes (2016a) Blackboxing: social learning strategies and cultural evolution. Philosophical Transactions of the Royal Society B: Biological Sciences 371 (1693), pp. 20150369. Cited by: §1.
  • C. Heyes (2016b) Who knows? metacognitive social learning strategies. Trends in Cognitive Sciences 20 (3), pp. 204–213. Cited by: §1.
  • W. Hoppitt and K. N. Laland (2013) Social learning: an introduction to mechanisms, methods, and models. Princeton University Press, Princeton, NJ. Cited by: §1.
  • T. Kameda and D. Nakanishi (2003) Does social/cultural learning increase human adaptability?: rogers’s question revisited. Evolution and Human Behavior 24 (4), pp. 242–260. Cited by: §1.
  • R. L. Kendal, N. J. Boogert, L. Rendell, K. N. Laland, M. Webster, and P. L. Jones (2018) Social learning strategies: bridge-building between fields. Trends in Cognitive Sciences 22 (7), pp. 651–665. Cited by: §1.
  • R. L. Kendal, I. Coolen, Y. van Bergen, and K. N. Laland (2005) Trade-offs in the adaptive use of social and asocial learning. Advances in the Study of Behavior 35, pp. 333–379. Cited by: §1.
  • K. N. Laland (2004) Social learning strategies. Animal Learning & Behavior 32 (1), pp. 4–14. Cited by: §1.
  • K. N. Laland (2017) Darwin’s unfinished symphony. Princeton University Press, Princeton, NJ. Cited by: §1.
  • D. Lazer and A. Friedman (2007) The network structure of exploration and exploitation. Administrative Science Quarterly 52 (4), pp. 667–694. Cited by: §1.
  • W. Mason, A. Jones, and R. L. Goldstone (2008) Propagation of innovations in networked groups.. Journal of Experimental Psychology: General 137 (3), pp. 422. Cited by: §1.
  • W. Mason and D. J. Watts (2012) Collaborative learning in networks. Proceedings of the National Academy of Sciences 109 (3), pp. 764–769. Cited by: §1.
  • R. McElreath, A. V. Bell, C. Efferson, M. Lubell, P. J. Richerson, and T. Waring (2008) Beyond existence and aiming outside the laboratory: estimating frequency-dependent and pay-off-biased social learning strategies. Philosophical Transactions of the Royal Society B: Biological Sciences 363 (1509), pp. 3515–3528. Cited by: §1.
  • A. Mesoudi (2008) An experimental simulation of the “copy-successful-individuals” cultural learning strategy: adaptive landscapes, producer–scrounger dynamics, and informational access costs. Evolution and Human Behavior 29 (5), pp. 350–363. Cited by: §1.
  • C. M. Mills and A. R. Landrum (2016) Learning who knows what: children adjust their inquiry to gather information from others. Frontiers in Psychology 7, pp. 951. Cited by: §1.
  • V. Peltokorpi and A. C. Hood (2019) Communication in theory and research on transactive memory systems: a literature review. Topics in Cognitive Science 11 (4), pp. 644–667. Cited by: §1.
  • D. Poulin-Dubois and P. Brosseau-Liard (2016) The developmental origins of selective social learning. Current Directions in Psychological Science 25 (1), pp. 60–64. Cited by: §1.
  • L. Rendell, R. Boyd, D. Cownden, M. Enquist, K. Eriksson, M. W. Feldman, L. Fogarty, S. Ghirlanda, T. Lillicrap, and K. N. Laland (2010) Why copy others? insights from the social learning strategies tournament. Science 328 (5975), pp. 208–213. Cited by: §1.
  • L. Rendell, L. Fogarty, W. J.E. Hoppitt, T. J.H. Morgan, M. M. Webster, and K. N. Laland (2011) Cognitive culture: theoretical and empirical insights into social learning strategies. Trends in Cognitive Sciences 15 (2), pp. 68–76. Cited by: §1.
  • A. R. Rogers (1988) Does Biology Constrain Culture? American Anthropologist 90 (4), pp. 819–831. Cited by: §1.
  • K. H. Schlag (1998) Why imitate, and if so, how?: a boundedly rational approach to multi-armed bandits. Journal of Economic Theory 78 (1), pp. 130–156. Cited by: §1.
  • P. Shafto, N. D. Goodman, and M. C. Frank (2012) Learning from others: the consequences of psychological reasoning for human learning. Perspectives on Psychological Science 7 (4), pp. 341–351. Cited by: §1.
  • D. M. Sobel and T. Kushnir (2013) Knowledge matters: how children evaluate the reliability of testimony as a process of rational inference.. Psychological Review 120 (4), pp. 779. Cited by: §1.
  • N. Vélez and H. Gweon (2019) Integrating incomplete information with imperfect advice. Topics in Cognitive Science 11 (2), pp. 299–315. Cited by: §1.
  • D. M. Wegner (1987) Transactive memory: a contemporary analysis of the group mind. In Theories of Group Behavior, pp. 185–208. Cited by: §1.
  • A. Whalen, T. L. Griffiths, and D. Buchsbaum (2017) Sensitivity to shared information in social learning. Cognitive Science 42 (1), pp. 168–187. Cited by: §1.
  • T. N. Wisdom, X. Song, and R. L. Goldstone (2013) Social Learning Strategies in Networked Groups. Cognitive Science 37 (8), pp. 1383–1425. Cited by: §1.
  • L. A. Wood, R. L. Kendal, and E. G. Flynn (2013) Whom do children copy? Model-based biases in social learning. Developmental Review 33 (4), pp. 341–356. Cited by: §1.
  • A. W. Woolley, C. F. Chabris, A. Pentland, N. Hashmi, and T. W. Malone (2010) Evidence for a collective intelligence factor in the performance of human groups. Science 330 (6004), pp. 686–688. Cited by: §1.