When social influence promotes the wisdom of crowds

06/22/2020 ∙ by Abdullah Almaatouq, et al. ∙ MIT

Whether, and under what conditions, groups exhibit “crowd wisdom” has been a major focus of research across the social and computational sciences. Much of this work has focused on the role of social influence in promoting the wisdom of the crowd versus leading the crowd astray, resulting in conflicting conclusions about how the social network structure determines the impact of social influence. Here, we demonstrate that it is not enough to consider the network structure in isolation. Using theoretical analysis, numerical simulation, and reanalysis of four experimental datasets (totaling 4,002 human subjects), we find that the wisdom of crowds critically depends on the interaction between (i) the centralization of the social influence network and (ii) the distribution of the initial, individual estimates, i.e., the estimation context. Specifically, we propose a feature of the estimation context that measures the suitability of the crowd to benefit from influence centralization, and we show empirically that it has significant predictive power. By adopting a framework that integrates both the structure of the social influence and the estimation context, we bring previously conflicting results under one theoretical framework and clarify the effects of social influence on the wisdom of crowds.




Materials and Methods

Collective estimation and influence centralization. We consider a group of agents and assume that each agent is endowed with an independent and identically distributed initial estimate. The distribution of the initial estimates is parametrized by the unknown truth, the systematic bias, and the dispersion. In many common models of social influence, the collective estimate can be expressed as a convex combination of the initial estimates, with positive real weights that sum to one. These weights represent the influence of individual agents in shaping the collective estimate. Without loss of generality, we assume that the agents are ordered by decreasing influence. To investigate the role of network centralization, we consider a class of influence structures indexed by the degree of centralization (see SI section S1.1 for more details).

Using this index, we interpolate between a dictatorial setup with a single influential voice and a fully decentralized setup where everyone has an equal voice.
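
The interpolation between the dictatorial and the fully decentralized setup can be illustrated with a minimal sketch. The exact weight formula is given in SI section S1.1 and is not reproduced here, so the one-parameter family below is an assumption chosen only because it matches the two stated endpoints: the most influential agent receives omega + (1 - omega) / n and every other agent receives (1 - omega) / n. The names `influence_weights`, `collective_estimate`, and `omega` are hypothetical.

```python
def influence_weights(n, omega):
    """Return n weights ordered from most to least influential.

    Assumed family (not taken from the source): the top agent gets
    omega + (1 - omega) / n, every other agent gets (1 - omega) / n.
    omega = 1 is the dictatorial case, omega = 0 gives equal weights.
    """
    if n < 1 or not 0.0 <= omega <= 1.0:
        raise ValueError("need n >= 1 and 0 <= omega <= 1")
    base = (1.0 - omega) / n
    return [omega + base] + [base] * (n - 1)


def collective_estimate(estimates, weights):
    """Convex combination of the initial estimates."""
    if len(estimates) != len(weights):
        raise ValueError("estimates and weights must have the same length")
    return sum(w * a for w, a in zip(weights, estimates))


# Made-up initial estimates; the first entry belongs to the most influential agent.
estimates = [10.0, 20.0, 60.0]
for omega in (0.0, 0.5, 1.0):
    weights = influence_weights(len(estimates), omega)
    print(omega, weights, collective_estimate(estimates, weights))
```

At omega = 0 the collective estimate is the plain average of the initial estimates, and at omega = 1 it is the first agent's estimate alone; intermediate values move continuously between the two.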

Proposed feature for the estimation context. We measure the probability that the collective estimate produced by a centralized influence structure outperforms the decentralized baseline, and we use this probability as the feature of the estimation context. To compute it in Figure 3, we fixed the remaining parameters, so that the feature is entirely determined by the distribution of the initial estimates. Similarly, in Figure 4, we fixed the remaining quantities, one of which is sampled randomly and repeatedly from the study’s dataset, so that the feature is entirely determined by the empirical distribution of the initial estimates. Figure S3 and Tables S2-S3 replicate our simulation and empirical results for a range of values of the fixed parameters. For distributions supported over the positive reals and described by their cumulative distribution function, we propose a lower bound on this probability (proved in SI section S2.1).


In SI section S2.1, we show how to limit the rate of tail decay for different classes of distributions so that the lower bound remains non-trivial (non-zero) in the limit. For heavy-tailed distributions, such as Pareto, log-Laplace, and log-normal (see SI subsections S2.1.1 to S2.1.3), we identify phase transition behaviors, whereby the limiting value of the proposed lower bound changes abruptly once a critical value is crossed.
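
The probability that a centralized structure outperforms the decentralized baseline can be approximated by simulation. The sketch below makes several assumptions that are not taken from the source: "outperforms" is read as having a strictly smaller absolute error, the centralized weights follow the hypothetical family in which the top agent receives omega + (1 - omega) / n, and the initial estimates are log-normal with median equal to the truth. The function name `centralization_benefit` and all parameter values are illustrative only.

```python
import random


def centralization_benefit(sample_estimate, truth, n, omega, trials=5000, seed=0):
    """Estimate P(|centralized - truth| < |decentralized - truth|) by simulation.

    sample_estimate: callable taking a random.Random and returning one
    initial estimate (an i.i.d. draw from the assumed distribution).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        estimates = [sample_estimate(rng) for _ in range(n)]
        decentralized = sum(estimates) / n  # equal weights
        # Estimates are i.i.d., so the first draw can stand in for the most
        # influential agent. Assumed weights: omega + (1 - omega) / n for that
        # agent and (1 - omega) / n for everyone else, which simplifies to:
        centralized = omega * estimates[0] + (1.0 - omega) * decentralized
        if abs(centralized - truth) < abs(decentralized - truth):
            wins += 1
    return wins / trials


if __name__ == "__main__":
    # Hypothetical settings: truth = 1, groups of 50, omega = 1/3.
    for sigma in (0.25, 2.0):
        def draw(rng, s=sigma):
            return rng.lognormvariate(0.0, s)  # median 1, heavier tail as s grows
        print(sigma, centralization_benefit(draw, truth=1.0, n=50, omega=1 / 3))
```

Under these assumptions the estimated probability is larger for the heavier-tailed setting, where the group mean is pulled away from the truth by extreme estimates while a single typical estimate is not.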

Statistical tests. All statistical tests were two-tailed and based on mixed-effects models that included random effects to account for the nested structure of the data. In particular, the regression model for Figure 4.C has the following terms. The outcome is the standardized (z-score) absolute error of the revised collective estimate for each group in each estimation context. The predictors are a fixed intercept; the estimation context feature, with a fixed coefficient; an indicator variable for whether or not social interaction has occurred, whose fixed coefficient captures the effect of social influence centralization; and the interaction between the estimation context feature and influence centralization, with a fixed coefficient (shown in Figure 4.C). The model also includes a random coefficient for each group and a Gaussian error term. The absolute error of the revised collective estimate has been standardized, i.e., z-scored, in order to compare errors across different tasks (the correct answer for different tasks can differ by orders of magnitude). The analysis was conducted on 815 observations: 678 groups with social influence (centralized) and 137 groups without social influence (decentralized).
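
Read term by term, the description above corresponds to a linear mixed-effects model of the following shape. The symbol names are placeholders chosen here for illustration and are not necessarily the authors' notation: y_{ij} is the standardized absolute error for group i in estimation context j, \Omega_j is the estimation context feature, c_{ij} is the social-interaction indicator, u_i is the group-level random coefficient, and \varepsilon_{ij} is the Gaussian error term.

```latex
y_{ij} = \beta_0 + \beta_1 \Omega_j + \beta_2 c_{ij} + \beta_3 \, \Omega_j c_{ij} + u_i + \varepsilon_{ij}
```

In this form, \beta_3 is the interaction coefficient that the text associates with Figure 4.C.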

The logistic regression model for Figure 4.D has the following terms. The outcome is a binary indicator for whether or not a given group in a given estimation context improved the accuracy of its collective estimate after social interaction. The predictors are a fixed intercept and the estimation context feature, with a fixed coefficient. The model also includes a random coefficient for each group and a Gaussian error term. The analysis was conducted on 678 observations (groups with social influence).
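
On the log-odds scale, a model with the terms listed above could be written as follows. As before, the symbols are illustrative placeholders rather than the authors' notation: y_{ij} is the binary improvement indicator for group i in estimation context j, \Omega_j is the estimation context feature, and u_i is the group-level random coefficient. The Gaussian term \varepsilon_{ij} is included only because the description lists it.

```latex
\operatorname{logit} \Pr(y_{ij} = 1) = \beta_0 + \beta_1 \Omega_j + u_i + \varepsilon_{ij}
```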

Further details of the regression analysis are provided in SI section S3.1, Table S1. Robustness checks for the regression results are presented in Tables S2-S3.

Data and code availability. Replication data and code are available at https://github.com/amaatouq/task-dependence.