Empirica: a virtual lab for high-throughput macro-level experiments

06/19/2020 ∙ by Abdullah Almaatouq, et al.

Virtual labs allow researchers to design high-throughput and macro-level experiments that are not feasible in traditional in-person physical lab settings. Despite the increasing popularity of online research, researchers still face many technical and logistical barriers when designing and deploying virtual lab experiments. While several platforms exist to facilitate the development of virtual lab experiments, they typically present researchers with a stark trade-off between usability and functionality. We introduce Empirica: a modular virtual lab that offers a solution to the usability-functionality trade-off by employing a "flexible defaults" design strategy. This strategy enables us to maintain complete "build anything" flexibility while offering a development platform that is accessible to novice programmers. Empirica's architecture is designed to allow for parameterizable experimental designs, re-usable protocols, and rapid development. These features will increase the accessibility of virtual lab experiments, remove barriers to innovation in experiment design, and enable rapid progress in the understanding of distributed human computation.




1 Related Work

1.1 Virtual Lab Participants

It has long been recognized that the internet presents researchers with new opportunities to recruit remote participants for behavioral, social, and economic experiments. For instance, remote participation allows researchers to recruit more diverse samples of participants than are available on college campuses or local communities. It also facilitates longitudinal and other multi-phase studies by eliminating the need for participants to repeatedly travel to the laboratory. The flexibility around time and space that is afforded by remote participation has enabled researchers to design experiments that would be difficult or even impossible to run in a physical lab.

Arguably the most common current strategy for recruiting online participants involves crowdsourcing services [17, 23]. The main impact of these services has been to dramatically reduce the cost and difficulty of recruiting participants, resulting in an extraordinary number of publications in the past decade. Unfortunately, a limitation of the most popular platforms, such as Amazon Mechanical Turk, is that they were designed for simple labeling tasks that can typically be completed independently and with little effort by individual “workers” who vary widely in quality and persistence on the service [15]. Moreover, Amazon’s terms of use prevent researchers from knowing whether their participants have participated in similar experiments in the past, raising concerns that many Amazon “turkers” are becoming “professional” experiment participants [10]. In response to concerns such as these, services such as Prolific (https://www.prolific.co/) [25] have adapted the crowd work model to accommodate the special needs of behavioral research. For example, Prolific offers researchers more control over participant sampling and quality, and recruits participants who are intrinsically motivated to contribute to scientific studies.

In addition to crowdsourcing services, online experiments have attracted even larger and more diverse populations of participants who participate voluntarily out of intrinsic interest to assist in scientific research. For example, one experiment collected almost forty million moral decisions from over a million unique participants in over 200 countries [4]. Unfortunately, while the appeal of “massive samples for free” is obvious, all such experiments necessarily rely on some combination of gamification, personalized feedback, and other strategies to make participation intrinsically rewarding [16]. As a consequence, the model has proven hard to generalize to arbitrary research questions of interest.

1.2 Existing Virtual Lab Solutions

While early online experiments often required extensive up-front customized software development, a number of virtual lab software packages and frameworks have now been developed that reduce the overhead associated with building and running experiments. As a result, it is now easier to implement designs in which dozens of individuals interact synchronously in groups [3, 2, 37] or via networks [7], potentially comprising a mixture of human and algorithmic agents [19, 32, 31].

Virtual lab solutions can be roughly grouped by their emphasis on usability or functionality. Here we describe free or open-source tools that allow synchronous, real-time interaction between participants, leaving aside tools such as jsPsych [12] and Pushkin [16] that do not explicitly support multi-participant interactions, as well as commercial platforms such as LabVanced [13].

Platforms such as WEXTOR [28], Breadboard [24], and LIONESS [14] provide excellent options for individuals with little-to-no coding experience. These platforms allow researchers to design their experiments either directly with a graphical user interface (GUI) or via a simple, proprietary scripting language. However, while these structures enable researchers to quickly develop experiments within predetermined paradigms, they constrain the range of possible interface designs. These platforms do not allow the researcher to design “anything that can run in a web browser.”

On the other hand, many excellent tools including oTree [11], nodeGame [6], Dallinger (http://docs.dallinger.io), and TurkServer [21] offer high flexibility in experiment design. However, this flexibility comes at the expense of decreased usability, as these tools require significant time and skill to employ. They are flexible precisely because they are very general, which means additional labor is required to achieve any complete design.

2 Empirica

The Empirica platform (https://empirica.ly) is a free, open-source, general-purpose virtual lab platform for developing and conducting synchronous and interactive human-participant experiments. The platform implements an Application Programming Interface (API) that allows an experiment designer to devote their effort to implementing participant-facing views and experiment-specific logic. In the background, Empirica handles the necessary but generic tasks of coordinating browser/server interactions, batching participants, launching games, and storing/retrieving data.

Experiments are deployed from a GUI web interface that allows the researcher to watch the experiment progress in real-time. With no installation required on the participant’s part, experiments can run on any web browser including desktop computers, laptops, smartphones, and tablets.

Empirica is designed using a “flexible default” strategy: the platform provides a default structure and settings that enable novice JavaScript users to design an experiment by modifying pre-populated templates; at the same time, unlimited customization is possible for advanced users. The goal of this design is to develop a platform that is accessible to researchers with modest programming experience (the target user is the typical computational social science researcher) while maintaining a “build anything” level of flexibility.

Empirica has an active and growing community of contributors, including professional developers, method-focused researchers, question-driven social scientists, and outcome-oriented professionals.

2.1 System Design

Figure 2: Empirica provides a scaffolding for researchers to design and administer experiments via three components: (1) Server-side callbacks use JavaScript to define the running of a game through the client-side and server-side API; (2) the client-side interface uses JavaScript to define the player experience; and (3) the GUI admin interface enables configuration and monitoring of experiments. These components are all run and connected by the Empirica core engine.

Empirica’s architecture was designed from the start to enable real-time multi-participant interactions, although single player experiments are easy to create as well. The API is purposefully concise, using a combination of data synchronization primitives and callbacks (i.e., event hooks) triggered in different parts of the experiment. The core functionality is abstracted by the platform: data synchronization, concurrency control, reactivity, network communication, experiment sequencing, persistent storage, timer management, and other low-level functions are provided automatically by Empirica. As a result, researchers can focus on designing the logic of their participants’ experience (see Figure 2 for an overview).

To initiate development, Empirica provides an experiment scaffold generator that initializes an empty (but playable) experiment and a simple project organization that encourages modular thinking. To design an experiment, researchers separately configure the client (front end), which defines everything that participants experience in their web browser, thus defining the experimental treatment or stimulus; and the server (back end), which consists of callbacks defining the logic of an experimental trial. The front end consists of a sequence of five modules: consent, intro (e.g., instructions, quiz), lobby, game, and outro (e.g., survey). The lobby is automatically generated and managed by Empirica according to parameters set in the GUI. The researcher need only modify the intro, outro, and game design via JavaScript. The back end consists of callbacks defining game initialization, start/end behavior for rounds and stages, and event handlers for changes in data states.

Empirica structures the game (experimental trial) as players (humans or artificial participants) interacting in an environment defined by one or more rounds (to allow for “repeated” play); each round consists of one or more stages (discrete time steps); and each stage allows players to interact continuously in real time. Empirica provides a timer function which can automatically advance the game from stage to stage, or researchers can define logic that advances games based on participant behavior or other conditions.
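The nesting described above (a game made of rounds, each made of timed stages) can be pictured with a small data-structure sketch. This is a plain JavaScript illustration, not Empirica's own classes: the names buildGame, stageSequence, and shouldAdvance, and the field names, are hypothetical and chosen only for this example.

```javascript
// Hypothetical model of the game > round > stage hierarchy (not Empirica's classes).
function buildGame(roundCount, stageSpecs) {
  const rounds = [];
  for (let r = 0; r < roundCount; r++) {
    rounds.push({
      index: r,
      // Every round repeats the same list of stages ("repeated" play).
      stages: stageSpecs.map((spec, s) => ({
        index: s,
        name: spec.name,
        durationInSeconds: spec.durationInSeconds,
      })),
    });
  }
  return { rounds };
}

// Flatten the hierarchy into the order in which stages are played.
function stageSequence(game) {
  return game.rounds.flatMap((round) =>
    round.stages.map((stage) => ({ round: round.index, ...stage }))
  );
}

// A stage ends when its timer runs out or when a custom condition holds
// (here, an assumed condition: every player has submitted).
function shouldAdvance(stage, elapsedSeconds, allPlayersSubmitted) {
  return elapsedSeconds >= stage.durationInSeconds || allPlayersSubmitted;
}

const game = buildGame(2, [
  { name: "response", durationInSeconds: 60 },
  { name: "feedback", durationInSeconds: 30 },
]);
const sequence = stageSequence(game);
console.log(sequence.map((s) => `round ${s.round}: ${s.name}`).join("\n"));
```

In this sketch the timer and the behavior-based condition are two inputs to one advance rule, mirroring the two ways of moving between stages mentioned in the text.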

As Empirica requires some level of programming experience for experiment development, the platform accommodates the possibility that different individuals may be responsible for designing, programming, and administering experiments. To support this division of labor, Empirica provides a high-level interface for the selection of experimental conditions and the administration of live trials. From this interface, experiment administrators can assign players to trials, manage participants, and monitor the status of games. Experiment designers can configure games to have different factors and treatments, and export or import machine-readable YAML files that fully specify entire experiment protocols (i.e., the data generation process) and support replication via experiment-as-code. Experiment configuration files can also be generated programmatically by researchers wishing to employ procedural generation and adaptive experimentation methods to effectively and efficiently explore the parameter space.
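As a rough illustration of generating experiment configurations programmatically, the sketch below enumerates a full-factorial set of treatments from a set of factors. The factor names and levels are hypothetical, and the output is plain JSON rather than Empirica's YAML format, whose schema is not described here.

```javascript
// Hypothetical factor definitions; names and levels are illustrative only.
const factors = {
  playerCount: [1, 6, 12],
  feedback: ["none", "self", "full"],
  networkType: ["static", "dynamic"],
};

// Cartesian product of all factor levels: one object per treatment.
function fullFactorial(factorLevels) {
  return Object.entries(factorLevels).reduce(
    (treatments, [name, levels]) =>
      treatments.flatMap((t) => levels.map((level) => ({ ...t, [name]: level }))),
    [{}]
  );
}

const treatments = fullFactorial(factors);
console.log(treatments.length); // 3 * 3 * 2 = 18 treatments
console.log(JSON.stringify(treatments[0]));
```

An adaptive design would replace the exhaustive product with a procedure that proposes the next treatment based on the data collected so far; the enumeration above is only the simplest case.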

Figure 3: This screenshot of the “Guess the Correlation Game” shows the view that participants use to update their social network. The interface uses reactive and performant front-end components.
Figure 4: This screenshot of the “Detective Game” shows the view that participants use to categorize mystery clues as either Promising Leads (which are shared with their social network neighbors) or Dead Ends (which are not). The interface uses reactive and performant front-end components.
Figure 5: This screenshot shows the “Room Assignment” task. The real-time interaction, the ability to assign students to rooms in parallel, and text-based chat employ default features and interaction components provided by Empirica.
Figure 6: This screenshot shows the second stage of the first round of the revised “Politics Challenge” estimation task. The illustrated breadcrumb feature employs customized default UI elements provided by Empirica, and the timer was employed without modification.

2.2 Implementation

Empirica is built using common web development tools. It is based on the Meteor (https://www.meteor.com/) application development framework and employs JavaScript on both the front end (browser) and the back end (server). Meteor implements tooling for data reactivity around the MongoDB database, websockets, and RPC (remote procedure calls). Meteor also has strong authentication, which secures the integrated admin interface. Most experiment designers will not need to be familiar with Meteor to use the Empirica platform.

The front end is built with the UI framework React (https://reactjs.org/), which supports the system’s reactive data model. Automatic data reactivity implemented by Empirica alleviates the need for the experiment designer to be concerned with data synchronization between players. React is also very popular, with many resources from libraries to learning courses and a large talent pool of experienced developers. For Empirica, React is also desirable because it encourages a modular, reusable design philosophy. Empirica extends these front-end libraries by providing experimenter-oriented UI components such as breadcrumbs showing experiment progression, player profile displays, and user input components (e.g., Sliders, text-based Chat, Random Dot Kinematogram). These defaults reduce the burden on experiment designers while maintaining complete customizability.

Empirica’s back end is implemented in Node.js (https://nodejs.org/en/). Callbacks are the foundation of the server-side API: they are hooks where the experiment developer can add custom behavior. These callbacks are triggered by events of an experiment run (onRoundStart, onRoundEnd, onGameEnd, ...). The developer is given access to the data related to each event involving players and games and can thus define logic in JavaScript that will inspect and modify this data as experiments are running.
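The event-hook pattern can be sketched in a few lines of plain JavaScript. The registry below is a hypothetical stand-in written for this example; only the event names onRoundStart and onRoundEnd come from the text, and the argument shapes and scoring rule are assumptions rather than Empirica's actual signatures.

```javascript
// Hypothetical event-hook registry; illustrates the pattern, not Empirica's internals.
function createHooks() {
  const handlers = {};
  return {
    // Register a handler for a named event.
    on(eventName, handler) {
      (handlers[eventName] = handlers[eventName] || []).push(handler);
    },
    // Call every handler registered for the event, in registration order.
    trigger(eventName, ...args) {
      (handlers[eventName] || []).forEach((handler) => handler(...args));
    },
  };
}

const hooks = createHooks();
const gameState = {
  players: [
    { id: "p1", score: 0, guess: 0.4 },
    { id: "p2", score: 0, guess: 0.9 },
  ],
};
const firstRound = { index: 0, correctAnswer: 0.5 };

hooks.on("onRoundStart", (game, round) => {
  game.currentRound = round.index;
});

// Assumed scoring rule for illustration: one point minus the absolute error.
hooks.on("onRoundEnd", (game, round) => {
  for (const player of game.players) {
    player.score += 1 - Math.abs(player.guess - round.correctAnswer);
  }
});

hooks.trigger("onRoundStart", gameState, firstRound);
hooks.trigger("onRoundEnd", gameState, firstRound);
console.log(gameState.players.map((p) => `${p.id}: ${p.score.toFixed(2)}`).join(", "));
```

The platform decides when each event fires; the experiment developer only supplies the handler bodies.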

This design allows Empirica to reduce the technical burden on experiment designers by providing a data interface that is tailored to the needs of behavioral lab experiments. The developer has no need to interact with the database directly. Rather, Empirica provides simple accessors (get, set, append, log) that facilitate data monitoring and updating. These accessor methods are available on both the front end and the back end. All data is scoped to an experiment-relevant construct such as game, player, round, or stage. Data can also be scoped to the intersection of two constructs, e.g., a player and a game object: player.round and player.stage which contain the data for a player at a given round or stage. The accessor methods are reactive, meaning that data is automatically saved and propagated to all players.
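A minimal in-memory sketch of scoped accessors follows. It assumes a change listener stands in for the automatic saving and propagation that Empirica performs; createScope and the scope labels are hypothetical names, and the log accessor is omitted.

```javascript
// Hypothetical in-memory stand-in for a scoped, reactive data object.
function createScope(name, onChange) {
  const data = {};
  return {
    get(key) {
      return data[key];
    },
    // Store a value and notify the listener (standing in for save + broadcast).
    set(key, value) {
      data[key] = value;
      onChange(name, key, value);
    },
    // Add a value to a list stored under the key, creating the list if needed.
    append(key, value) {
      const list = Array.isArray(data[key]) ? data[key] : [];
      data[key] = [...list, value];
      onChange(name, key, data[key]);
    },
  };
}

const changes = [];
const record = (scope, key, value) => changes.push({ scope, key, value });

// Data scoped to a player, and to the intersection of that player and a round.
const player = createScope("player:p1", record);
player.round = createScope("player:p1/round:0", record);

player.set("score", 0);
player.round.set("guess", 0.42);
player.round.append("messages", "hello");
player.round.append("messages", "world");

console.log(player.round.get("messages").join(" "));
console.log(changes.length);
```

Every write passes through a single notification point, which is what lets a real implementation persist and synchronize data without the developer touching the database.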

Another ease-of-use feature is that an Empirica experiment is initialized with a one-line command in the terminal (Windows, macOS, Linux) to populate an empty project scaffold. A simple file structure separates front-end (client) code from back-end (server) code to simplify the development process. Because Empirica is built using the widely adopted Meteor framework, a completed experiment can also be deployed with a single command to either an in-house server or to a software-as-a-service platform such as Meteor Galaxy. Additionally, Empirica provides its own simple open-source tool to facilitate deploying Empirica experiments to the cloud for production (https://github.com/empiricaly/meteor-deploy). This ease of use facilitates iterative development cycles in which researchers can rapidly revise and re-deploy experiment designs.

Empirica is designed to operate with online labor markets such as Amazon Mechanical Turk or other participant recruitment sources (e.g., volunteers, in-person participants, classrooms).

3 Case Studies

Throughout its development, Empirica has been used in the design of cutting-edge experimental research. Below we illustrate Empirica’s power and flexibility in four examples, each of which highlights a different functionality.

3.1 Exploring the parameter space: Dynamic social networks and collective intelligence

The “Guess the Correlation” [1] game was developed to study how individuals’ local decisions to form or break social ties could change the global structure of social networks, and the network’s subsequent effect on group performance in changing environments (Figure 3).

In this experiment, participants were asked to estimate the strength of statistical correlations between two variables (such as height and weight) that were graphed on a scatter plot. In subsequent stages, participants observed the real-time estimates of their neighbors in a social network, updated their guesses, received feedback on their performance and that of their neighbors, and updated their social ties for the next round. Without the participants’ knowledge, the experimenters introduced distracting statistical noise into the graphed data to introduce variation in individual performance. Then, in the middle of the 20-round series, they abruptly shuffled the noise levels (thereby inducing a change in which individuals provided the best information to their social network neighbors). The experimenters measured how the network structure evolved in response to these external quality shocks, and the effect of network changes on performance. This experiment is complicated by the fact that even subtle changes in the environmental conditions (e.g., how a situation is framed, incentive structure, availability and quality of feedback, cost of rewiring, difficulty of the task, or the identities of the participants) can lead to dramatically different macro-scale outcomes.

Empirica’s configuration interface allowed the experimenters to parameterize the entire experiment design, and then to systematically sample the parameter space without the need for changing a single line of code. The final publication tested seven experimental conditions with varied levels of social interaction and quality of observed performance feedback.

This experiment showed that even the best-performing individuals benefited from interacting with a network of peers, and that dynamic networks — in which individuals chose their collaborators — improved performance significantly compared with static networks. The experimenters also found that when given full feedback, networks of participants adapted to changes by shifting influence to people who had better information, thereby substantially reducing individual error and benefiting from collective wisdom.

3.2 Real-time interaction at scale: An 80-player game of high-speed “Clue”

The “Detective Game” [18] was developed to study the social contagion of interacting beliefs. Prior theoretical work predicted that when beliefs influenced one another’s likelihood of adoption, new mechanisms could emerge that shaped social polarization and collective sensemaking. If these predictions could be shown experimentally, traditional models of “independent” diffusion would be insufficient to describe the spread of “interdependent” ideas.

In the detective game, 80 individuals in four artificial social networks exchange a set of “clues” with one another in real-time, competing in teams to solve a fictional “mystery.” The game is fast-paced, with individuals reacting to new information and constantly synthesizing an understanding of which suspect committed the crime, and how. Elements on the screen move dynamically as an individual’s neighbors update their beliefs, and as the player drags clues around in their “detective’s notebook,” shown in Figure 4.

Outcomes of the game are measured at the network level, and so are sensitive to dropout and inattention. Running the experiment thus requires coordinating a large number of individuals to begin the task simultaneously, interact in real time, and maintain high levels of engagement. To support this level of engagement, the experimenter needed to ensure that wait times were short, that interactive elements used modern high-performance display libraries, and that latency between the server and client was imperceptible to the user. At the same time, the codebase needed to be understandable by other researchers, so that the details of the (fairly complex) implementation could be effectively critiqued.

This game would have been impossible to design using a discrete time (i.e., round/turn-based) experiment platform, as the pace of interaction would have been tedious for participants and the resulting data quality would be low. It would also have been challenging in a platform with pre-specified display patterns. Empirica’s API design allowed the experiment designer to try various visualization libraries before finding one with the required performance at the right level of abstraction. Empirica’s back end provided seamless real-time coordination between the server and client using a straightforward syntax, and scaled to handle the volume of interaction the application needed. This case study illustrates the benefits of a flexible default design: the default scaffold enabled rapid prototyping for initial development, while the deep flexibility ensured that the final product could meet all of the experiment design needs.

The study supported theoretical predictions that interactions between beliefs can create new mechanisms of social polarization, which interact with familiar drivers of polarization such as homophily and social network distance.

3.3 Two-phase experiment design: Distributed human computation problems

The “Room Assignment” game [2] was developed to explore the factors that allow a team working together to outperform its individual members. The experiment examined (1) under what task complexity conditions, if any, do groups of problem-solvers outperform individual problem-solvers, and (2) which of several factors about the group composition (e.g., skill level, skill diversity, social perceptiveness level, cognitive style diversity, etc.) predict the quality of the found solution.

In the game, participants were asked to assign “students” to “rooms” where each student had a specified utility for each room. Their objective was to maximize total student utility while also respecting constraints (e.g., “Students A and B may not share a room or an adjacent room”). The task was first completed individually, and then repeated in groups. Groups were allowed to communicate via text-based chat and move different “students” simultaneously to perform parallel processing and computation (Figure 5). This task is a specific instance of a Constraint Satisfaction and Optimization Problem (CSOP), and is characterized by a rugged payoff function with many locally optimal but globally suboptimal solutions. The complexity of the task can be systematically varied by changing the number of students, rooms, or constraints.
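To make the structure of the task concrete, here is a toy instance with three students and two rooms. The utilities, the single “may not share a room” constraint, and the penalty-based score are all assumptions made for this sketch; the scoring rule actually used in the experiment is not specified in this text.

```javascript
// Hypothetical instance: utilities[student][room] plus "may not share" constraints.
const utilities = {
  A: { room1: 8, room2: 3 },
  B: { room1: 6, room2: 5 },
  C: { room1: 2, room2: 9 },
};
const mayNotShare = [["A", "B"]];

// Sum of each student's utility for the room they were assigned.
function totalUtility(assignment) {
  return Object.entries(assignment).reduce(
    (sum, [student, room]) => sum + utilities[student][room],
    0
  );
}

// Number of constraints broken by an assignment.
function violations(assignment) {
  return mayNotShare.filter(([s1, s2]) => assignment[s1] === assignment[s2]).length;
}

// Assumed scoring rule: utility minus a fixed penalty per violated constraint.
function score(assignment, penalty = 100) {
  return totalUtility(assignment) - penalty * violations(assignment);
}

// Exhaustive search, feasible only for tiny instances like this one.
function bestAssignment(students, rooms) {
  let best = null;
  const search = (i, partial) => {
    if (i === students.length) {
      if (best === null || score(partial) > score(best)) best = partial;
      return;
    }
    for (const room of rooms) {
      search(i + 1, { ...partial, [students[i]]: room });
    }
  };
  search(0, {});
  return best;
}

const favorites = { A: "room1", B: "room1", C: "room2" }; // everyone's top choice
const best = bestAssignment(["A", "B", "C"], ["room1", "room2"]);
console.log(score(favorites), score(best), JSON.stringify(best));
```

Giving every student their favorite room breaks the constraint, so the best feasible solution accepts a lower raw utility. The number of candidate assignments grows exponentially with the number of students, which is why larger instances cannot be solved by enumeration.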

To answer their research questions, the experimenters implemented in Empirica a “two-phase” experiment design in which the same group of participants performed the task twice, at one-week intervals. Phase one was used for gathering ex-ante measurements of each participant’s skill level on the room assignment problem, social perceptiveness level, and cognitive style (i.e., solution search strategy). Phase two randomized these same individuals into “individual” or “group” conditions within blocks based upon the attributes measured previously in phase one.
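The phase-two assignment can be sketched as a simple block randomization: participants are sorted by a phase-one measure, cut into blocks of similar participants, and conditions are balanced within each block. The participant data, the single skill attribute, the block size, and the seeded generator below are assumptions for this example, not the procedure used in the study.

```javascript
// Small seeded generator (mulberry32) so that the example is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle on a copy of the input.
function shuffle(items, rng) {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Sort by the blocking attribute, cut into blocks, and alternate conditions
// over a shuffled order inside each block.
function blockRandomize(participants, blockSize, conditions, rng) {
  const sorted = [...participants].sort((a, b) => a.skill - b.skill);
  const assignments = [];
  for (let start = 0; start < sorted.length; start += blockSize) {
    const block = shuffle(sorted.slice(start, start + blockSize), rng);
    block.forEach((p, i) => {
      assignments.push({
        id: p.id,
        block: start / blockSize,
        condition: conditions[i % conditions.length],
      });
    });
  }
  return assignments;
}

// Hypothetical phase-one skill scores.
const skills = [10, 55, 30, 80, 20, 95, 40, 70];
const participants = skills.map((skill, i) => ({ id: `p${i + 1}`, skill }));
const assignments = blockRandomize(participants, 4, ["individual", "group"], mulberry32(42));
console.log(assignments.map((a) => `${a.id}:b${a.block}:${a.condition}`).join(" "));
```

Because conditions are balanced inside each block, differences in the blocking attribute cannot be confounded with the treatment, which is the source of the added statistical power.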

In addition to the challenge of implementing real-time interaction and text-based chat, this design required experimenters to recruit and reliably match the same subject pool across the two phases of this experiment, and to coordinate a large block-randomized design. Empirica’s careful participant data management and flexible randomization architecture made these features possible with a simplicity that could not have been achieved in most other platforms.

With the statistical power enabled by the two-phase block-randomized design, the experimenters were able to show that while groups are more efficient than individuals when the task is complex, this relationship is reversed when the task is simple. They were also able to identify the average skill level of team members as the dominant predictor of group performance, over many other candidate factors.

3.4 Rapid-turnaround replication: Echo chambers and belief accuracy

The “Estimation Challenge” experiment [8] was designed to test how social interaction shapes belief accuracy and belief polarization in echo chambers. This experiment tasked participants with completing a series of numeric estimations related to polarizing political topics before and after being able to see each other’s answers. For example, one question asked, “How has the number of unauthorized immigrants living in the US changed in the past 10 years?” The experimenters found that participants became more accurate and less polarized after social exchange, despite the absence of cross-cutting ties between Democrats and Republicans.

This experiment was initially implemented on a proprietary platform built in partnership with a third-party developer. The experiment tasked groups with answering four unique questions. For each question, subjects were asked to provide their answer three times, and were able to see peer responses before providing the second and third response. Behind the scenes, the platform generated an ad-hoc social network to determine whose responses each subject saw and to mimic the flow of information through online social media sites. The interface design alone took several months to implement.

After the experimenters submitted these results for publication, the reviewers expressed concern that the experiment design did not fully capture the effects of a politicized environment. The experimenters were given 60 days to revise and resubmit their paper.

It would have been infeasible to extend the experiment to respond to reviewers’ questions within the allotted time had the experimenters continued to use the original proprietary software. Using Empirica, they were able to rebuild the original experiment to replicate their initial findings, while also extending the user interface to address the questions posed by reviewers, as seen in Figure 6. This redesign provided a more overtly politicized style, a rebrand as “The Politics Challenge,” and new questions targeting more divisive topics. The new interface was designed, constructed and tested in approximately two weeks.

This experiment required negligible alteration from the prepopulated Empirica scaffolding beyond customizing the visual design, demonstrating the capability of flexible defaults. Once the design was complete, the conditions requested by the reviewers were configurable in Empirica’s post-development GUI admin interface. The social network was generated using the onGameInit() callback. The experimenter’s choice to replicate and extend their findings in a different platform improved the reviewers’ confidence in the results by showing that outcomes replicated even under explicitly politicized conditions.
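Network generation at game initialization might look like the sketch below, which builds a neighbor list for each player and then looks up whose responses a player may see. The ring-lattice topology, the degree, and the function names are assumptions for illustration; the topology used in the study is not described in this text, and the code does not call Empirica's onGameInit() itself.

```javascript
// Hypothetical topology: a ring lattice in which each player is tied to the
// degree/2 nearest players on each side. Requires an even degree smaller than
// the number of players.
function ringLattice(playerIds, degree) {
  const n = playerIds.length;
  const neighbors = Object.fromEntries(playerIds.map((id) => [id, []]));
  for (let i = 0; i < n; i++) {
    for (let offset = 1; offset <= degree / 2; offset++) {
      const left = playerIds[(i - offset + n) % n];
      const right = playerIds[(i + offset) % n];
      neighbors[playerIds[i]].push(left, right);
    }
  }
  return neighbors;
}

// Responses a player is allowed to see: those of their network neighbors only.
function visibleResponses(playerId, responses, neighbors) {
  return neighbors[playerId].map((id) => ({ id, answer: responses[id] }));
}

const playerIds = ["p0", "p1", "p2", "p3", "p4", "p5"];
const network = ringLattice(playerIds, 4);

// Hypothetical first-round numeric answers.
const responses = { p0: 10, p1: 12, p2: 30, p3: 8, p4: 15, p5: 11 };
console.log(network.p0.join(","));
console.log(JSON.stringify(visibleResponses("p0", responses, network)));
```

Computing the network once at initialization and storing it with the game means that every later stage can look up neighbors without recomputing the structure.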

4 Discussion

4.1 Ethical considerations

As with any human subjects research, virtual lab experiments are subject to ethical considerations. These include (but are not limited to) pay rates for participants [36], data privacy protection [9], and the potential psychological impact of stimulus design. While most of these decisions will be made by the researchers implementing an experiment using Empirica, we have adopted a proactive strategy that employs default settings designed to encourage ethical experiment design. As one example, the initial scaffolding generated by Empirica includes a template for providing informed consent, considered a bare minimum for ethical research practice. The scaffolding also includes a sample exit survey which models inclusive language; e.g., the field for gender is included as a free-text option. To encourage privacy protection, Empirica by default omits external identifiers when exporting data, to prevent leaking of personal information such as email addresses or Amazon Mechanical Turk account identifiers.

4.2 Limitations and future developments

As with other leading computational tools, Empirica is not a static entity, but a continually developing project. This paper reflects the first version of the Empirica platform, and lays the groundwork for an ecosystem of tools to be built over time. Due to its design, modules that are part of the current platform can be switched out and improved independently without rearchitecting the system. Indeed it is precisely because Empirica (or for that matter, any experiment platform) cannot be expected to offer optimal functionality indefinitely that this modular design was chosen.

The usability-functionality trade-off faced by existing experiment platforms is endemic to tightly integrated “end-to-end” solutions developed for a particular class of problems. By moving toward an ecosystem approach, Empirica has a chance to resolve this trade-off. As such, future development of Empirica will include the development of a set of open standards that defines what this encapsulation (service/component) is, how to communicate with it, and how to find and use it.

The use of the “ecosystem” as a design principle presents several opportunities for operational efficiency.

  • An ecosystem will allow the reuse of software assets, in turn lowering development costs, decreasing development time, reducing risk, and leveraging existing platform investments and strengths.

  • The individual components of the ecosystem will be loosely coupled to reduce vendor/provider lock-in and create a flexible infrastructure. As a result, the individual components of the ecosystem will be modular in the sense that each can be modified or replaced without needing to modify or replace any other component because the interface to the component remains the same. The resulting functional components will be available for end-users (i.e., researchers) to amalgamate (or mashup) into situational, creative, and novel experiments in ways that the original developers may not originally envision.

  • The functional scope of these components will allow for the possibility to directly define experiment requirements as a collection of these functional components, rather than translating experiment requirements into lower-level software development requirements. As a result, the ecosystem will abstract away many of the logistical concerns of running experiments, analogous to how cloud computing has abstracted away from the management of technical resources for many companies.

To enable an even wider range of experiment designs, we are currently developing a Software Development Kit (SDK) that will expose elements of the Empirica core to experiment developers. The SDK will further facilitate the implementation of custom components by creating APIs to Empirica’s back end, allowing developers to customize various behaviors (such as treatment assignment, lobby configuration, etc.) without substantially increasing the complexity of development.

By distancing ourselves from a monolithic approach, and adopting a truly modular architecture with careful design of the low-level abstractions of experiments, we hope Empirica will decouple flexibility from ease-of-use and open the door to an economy of software built around conducting new kinds of virtual lab experiments.

5 Conclusion

Empirica provides a complete virtual lab for designing and running online lab experiments taking the form of anything that can be viewed in a web browser. The primary philosophy guiding the development of Empirica is the use of “flexible defaults,” which is core to our goal of providing a “do anything” platform that remains accessible to a typical computational social scientist. In its present form, Empirica enables rapid development of virtual lab experiments, and the researcher need only provide a recruitment mechanism to send participants to the page at the appropriate time. Future versions of Empirica will abstract the core functionality into an ecosystem that allows the development and integration of multiple tools including automated recruitment. This future version will also maintain as a “tool” the current Empirica API, continuing to enable the rapid development of experiments. However, it is our expectation that Empirica as a tool will form the core component of a broader ecosystem enabling researchers to go beyond the traditional in-person lab paradigm and take advantage of the expanded potential enabled by virtual lab research.

6 Acknowledgements

The authors are grateful to everyone who has contributed to the development of Empirica over the years, with special thanks to the super contributor Hubertus Putu Widya Primanta Nugraha. We also thank the users of Empirica for suggesting improvements and reporting bugs. We were also supported by a strong team of advisors, including Iyad Rahwan, Matthew Salganik, Alex ‘Sandy’ Pentland, Alejandro Campero, Niccolò Pescetelli, and Joost P Bonsen.

7 Online Resources

Empirica is entirely open source and in active development. The codebase is currently hosted on GitHub (https://github.com/empiricaly). Documentation and tutorial videos are available on the Empirica homepage (https://empirica.ly). We encourage readers who are interested in the software to contribute ideas or code that can make it more useful to the community.


References

  • [1] A. Almaatouq, A. Noriega-Campero, A. Alotaibi, P. M. Krafft, M. Moussaid, and A. Pentland (2020-05) Adaptive social networks promote the wisdom of crowds. Proceedings of the National Academy of Sciences of the United States of America. ISSN 0027-8424.
  • [2] A. Almaatouq, M. Yin, and D. J. Watts (2020) Collective problem-solving of groups across tasks of varying complexity.
  • [3] A. A. Arechar, S. Gächter, and L. Molleman (2018) Conducting interactive experiments online. Experimental Economics 21 (1), pp. 99–131.
  • [4] E. Awad, S. Dsouza, R. Kim, J. Schulz, J. Henrich, A. Shariff, J. Bonnefon, and I. Rahwan (2018) The moral machine experiment. Nature 563 (7729), pp. 59–64.
  • [5] S. Balietti, B. Klein, and C. Riedl (2018) Fast model-selection through adapting design of experiments maximizing information gain. arXiv preprint arXiv:1807.07024.
  • [6] S. Balietti (2017-10) nodeGame: real-time, synchronous, online experiments in the browser. Behavior Research Methods 49 (5), pp. 1696–1715. ISSN 1554-351X.
  • [7] J. Becker, D. Brackbill, and D. Centola (2017) Network dynamics of social influence in the wisdom of crowds. Proceedings of the National Academy of Sciences 114 (26), pp. E5070–E5076.
  • [8] J. Becker, E. Porter, and D. Centola (2019) The wisdom of partisan crowds. Proceedings of the National Academy of Sciences, pp. 201817195.
  • [9] M. H. Birnbaum (2004) Human research and data collection via the internet. Annu. Rev. Psychol. 55, pp. 803–832.
  • [10] J. Chandler, P. Mueller, and G. Paolacci (2014) Nonnaïveté among Amazon Mechanical Turk workers: consequences and solutions for behavioral researchers. Behavior Research Methods 46 (1), pp. 112–130.
  • [11] D. L. Chen, M. Schonger, and C. Wickens (2016-03) oTree—an open-source platform for laboratory, online, and field experiments. Journal of Behavioral and Experimental Finance 9, pp. 88–97. ISSN 2214-6350.
  • [12] J. R. de Leeuw (2015-03) jsPsych: a JavaScript library for creating behavioral experiments in a web browser. Behavior Research Methods 47 (1), pp. 1–12. ISSN 1554-351X.
  • [13] H. Finger, C. Goeke, D. Diekamp, K. Standvoß, and P. König (2017) LabVanced: a unified JavaScript framework for online studies. In International Conference on Computational Social Science (Cologne).
  • [14] M. Giamattei, L. Molleman, K. Seyed Yahosseini, and S. Gächter (2019) LIONESS lab–a free web-based platform for conducting interactive experiments online. Available at SSRN 3329384.
  • [15] J. K. Goodman, C. E. Cryder, and A. Cheema (2013) Data collection in a flat world: the strengths and weaknesses of Mechanical Turk samples. Journal of Behavioral Decision Making 26 (3), pp. 213–224.
  • [16] J. K. Hartshorne, J. R. de Leeuw, N. D. Goodman, M. Jennings, and T. J. O’Donnell (2019-08) A thousand studies for the price of one: accelerating psychological science with Pushkin. Behavior Research Methods 51 (4), pp. 1782–1803. ISSN 1554-351X.
  • [17] J. J. Horton, D. G. Rand, and R. J. Zeckhauser (2011-09) The online laboratory: conducting experiments in a real labor market. Experimental Economics 14 (3), pp. 399–425. ISSN 1386-4157.
  • [18] J. Houghton (2020) Interdependent diffusion: the social contagion of interacting beliefs. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA.
  • [19] F. Ishowo-Oloko, J. Bonnefon, Z. Soroye, J. Crandall, I. Rahwan, and T. Rahwan (2019) Behavioural evidence for a transparency–efficiency tradeoff in human–machine cooperation. Nature Machine Intelligence 1 (11), pp. 517–521.
  • [20] B. Letham, B. Karrer, G. Ottoni, and E. Bakshy (2019-06) Constrained Bayesian optimization with noisy experiments. Bayesian Analysis 14 (2), pp. 495–519. ISSN 1936-0975.
  • [21] A. Mao, Y. Chen, K. Z. Gajos, D. C. Parkes, A. D. Procaccia, and H. Zhang (2012) TurkServer: enabling synchronous and longitudinal online experiments. In Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence.
  • [22] A. Mao, L. Dworkin, S. Suri, and D. J. Watts (2017) Resilient cooperators stabilize long-run cooperation in the finitely repeated prisoner’s dilemma. Nature Communications 8 (1), pp. 1–10.
  • [23] W. Mason and S. Suri (2012) Conducting behavioral research on Amazon’s Mechanical Turk. Behavior Research Methods 44 (1), pp. 1–23. ISSN 1554-351X.
  • [24] M. E. McKnight and N. A. Christakis (2016) Breadboard: software for online social experiments. Vers.
  • [25] S. Palan and C. Schitter (2018) Prolific.ac—a subject pool for online experiments. Journal of Behavioral and Experimental Finance 17, pp. 22–27.
  • [26] G. Paolacci, J. Chandler, and P. G. Ipeirotis (2010) Running experiments on Amazon Mechanical Turk. Judgment and Decision Making 5 (5), pp. 411–419.
  • [27] J. Radford, A. Pilny, A. Reichelmann, B. Keegan, B. F. Welles, J. Hoye, K. Ognyanova, W. Meleis, and D. Lazer (2016) Volunteer Science: an online laboratory for experiments in social psychology. Social Psychology Quarterly 79 (4), pp. 376–396.
  • [28] U. Reips and C. Neuhaus (2002-05) WEXTOR: a web-based tool for generating and visualizing experimental designs and procedures. Behavior Research Methods, Instruments, & Computers: A Journal of the Psychonomic Society, Inc 34 (2), pp. 234–240. ISSN 0743-3808.
  • [29] M. J. Salganik, P. S. Dodds, and D. J. Watts (2006) Experimental study of inequality and unpredictability in an artificial cultural market. Science 311 (5762), pp. 854–856.
  • [30] T. C. Schelling (2006) Micromotives and macrobehavior. WW Norton & Company.
  • [31] H. Shirado and N. A. Christakis (2017) Locally noisy autonomous agents improve global human coordination in network experiments. Nature 545 (7654), pp. 370–374.
  • [32] M. L. Traeger, S. S. Sebo, M. Jung, B. Scassellati, and N. A. Christakis (2020) Vulnerable robots positively shape human conversational dynamics in a human–robot team. Proceedings of the National Academy of Sciences 117 (12), pp. 6370–6375.
  • [33] M. A. Valentine, D. Retelny, A. To, N. Rahmati, T. Doshi, and M. S. Bernstein (2017) Flash organizations: crowdsourcing complex work by structuring crowds as organizations. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 3523–3537.
  • [34] L. von Ahn and L. Dabbish (2008-08) Designing games with a purpose. Communications of the ACM 51 (8), pp. 58–67. ISSN 0001-0782.
  • [35] M. E. Whiting, I. Gao, M. Xing, J. D. N’Godjigui, T. Nguyen, and M. S. Bernstein (2020-05) Parallel worlds: repeated initializations of the same team to improve team viability. Proc. ACM Hum.-Comput. Interact. 4 (CSCW1), pp. 22.
  • [36] M. E. Whiting, G. Hugh, and M. S. Bernstein (2019) Fair work: crowd work minimum wage with one line of code. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 7 (1), pp. 197–206.
  • [37] M. E. Whiting, A. Blaising, C. Barreau, L. Fiuza, N. Marda, M. Valentine, and M. S. Bernstein (2019-11) Did it have to end this way? understanding the consistency of team fracture. Proc. ACM Hum.-Comput. Interact. 3 (CSCW).