Credit card record and call data record
We analyze a unique anonymized dataset that includes credit-card transactions of (non-identifiable) individuals over a three-year temporal window (from June 2015 to May 2018). By recording only the first purchase of each individual in each nameless store and after pre-filtering the data (see Supplementary Note S1), the complete dataset includes more than million of time-stamped transactions. In the following, whenever we refer to an individual, this should be interpreted as a non-identifiable individual whose real identity is impossible to identify. Accordingly, whenever we refer to a group of “detected” individuals, this should be interpreted as a group of individuals whose real identities cannot be identified. For the sake of better readability, when referring to the analyzed datasets and the individuals in the following, we shall omit the labels “anonymized” and “non-identifiable”.
The first observation is that the dataset is highly heterogeneous, encompassing categories as diverse as book stores, tech stores, and florists, among many others. To control for this heterogeneity, we restrict our analysis to three categories of stores that have a well-defined interpretation: eating places, clothing stores, and food stores. The stores that belong to each of the three categories are selected based on their Merchant Category Code (MCC) information present in our database – we refer to Supplementary Table S1 for details.
We split the dataset’s time span into three periods: a -month pre-filtering period that is used to determine which stores appear for the first time in the training period that follows; a -month training period that ranges from December 2015 to May 2017, where we aim to detect groups of key individuals (described below); a -month validation period that is used to perform and validate success predictions. The main rationale behind our choice of the relative duration of training and validation period is that the validation period should be long enough to include a substantial number of new stores to validate our predictions, yet short enough to not exceedingly restrict the training period where we detect the key individuals.
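The three-way split can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: only the training boundaries (December 2015 to May 2017) and the overall span (June 2015 to May 2018) are stated in the text, so the pre-filtering and validation boundaries below are inferred under the assumption that the three periods are contiguous and cover the whole span. The names `PERIODS`, `period_of`, and `first_transaction_period` are hypothetical.

```python
from datetime import date

# Period boundaries. The training dates come from the text; the other two
# periods are inferred (assumption) by requiring the three periods to be
# contiguous and to cover June 2015 - May 2018.
PERIODS = [
    ("pre-filtering", date(2015, 6, 1), date(2015, 11, 30)),
    ("training", date(2015, 12, 1), date(2017, 5, 31)),
    ("validation", date(2017, 6, 1), date(2018, 5, 31)),
]


def period_of(day):
    """Return the name of the period containing `day`, or None if outside the span."""
    for name, start, end in PERIODS:
        if start <= day <= end:
            return name
    return None


def first_transaction_period(store_transactions):
    """Map each store to the period in which it received its earliest transaction.

    store_transactions: dict mapping a store id to a list of transaction dates.
    """
    return {
        store: period_of(min(days))
        for store, days in store_transactions.items()
        if days
    }
```

A store whose earliest transaction falls in the pre-filtering period would be treated as already existing before training, which is the purpose the text assigns to that period.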
Table 1 summarizes basic data properties. We denote as and the number of stores visited by individual and the number of first-time customers of store , respectively. The individuals’ number of visited stores is highly heterogeneous, with a small number of outliers with a large number of visited stores (Fig. 1A and Figs. S1-S2). Similarly, the number of first-time customers per store is highly heterogeneous (Fig. 1B) and dependent on the store category (Fig. 1C and Figs. S3-S4). The number of new stores per month (i.e., stores that receive their first transaction) shows a decreasing trend over time (Fig. S5).
Besides the credit card record (CCR), we analyze a call data record (CDR) from the same market over a one-year period that overlaps with the CCR’s time span. For a subset of the population, we know both the social behavior (in terms of mobile phone communication) and the economic behavior (in terms of monetary purchases in stores) – see Supplementary Note S2 for details. From the CDR, we extract individual-level traits related to their centrality in the social network (time-averaged degree Goldenberg et al. (2009), collective influence Morone and Makse (2015), and social diversity Eagle, Macy, and Claxton (2010)), and mobility (radius of gyration Gonzalez, Hidalgo, and Barabasi (2008) and mobility diversity Song et al. (2010)). We refer to Supplementary Note S3 for the details of the measured features.
Framing a success prediction problem
We consider the classification problem where we aim to predict whether a store introduced in the validation period will be among the top- shops by final number of first-time customers, among the stores with the same MCC that received their first transaction in the same month – if this is the case, we say that the store is successful. Our definition of success factors out two potential confounding factors in our measure of success: store age and category (see Fig. S4). To quantify the predictive power of a given group of individuals, , we measure the fraction of successful stores among those that featured an individual in among the earliest first-time customers – we refer to this fraction as ’s success rate. We divide this success rate by the baseline success rate given by the fraction of successful stores among those that received at least first-time customers by the end of the validation period, obtaining the fold increase of ’s success rate with respect to the baseline expectation. For the sake of brevity, we refer to this ratio as ’s success-rate fold increase. For each group of individuals, we also measure alternative prediction evaluation metrics (recall, positive likelihood ratio, and Matthews’ correlation coefficient) – see Supplementary Note S4 for details.
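The success-rate fold increase defined above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the argument names, and the data layout (a dictionary mapping each store to its chronologically ordered list of first-time customers) are hypothetical, and `n_early` stands for the number of earliest customers considered, whose value is a parameter of the analysis. The sketch also assumes that the group's success rate is computed over the same pool of stores as the baseline, i.e. stores with at least `n_early` first-time customers.

```python
def success_rate_fold_increase(customers_by_store, successful, group, n_early):
    """Ratio between a group's success rate and the baseline success rate.

    customers_by_store: dict store -> list of first-time customers, earliest first.
    successful: set of stores labelled as successful.
    group: set of tracked individuals (e.g. hubs or discoverers).
    n_early: number of earliest first-time customers inspected per store.
    Returns None when the ratio is undefined.
    """
    # Baseline pool: stores that received at least n_early first-time customers.
    eligible = {s: c for s, c in customers_by_store.items() if len(c) >= n_early}
    # Stores with at least one group member among the earliest n_early customers.
    flagged = [s for s, c in eligible.items() if group & set(c[:n_early])]
    if not eligible or not flagged:
        return None
    baseline = sum(s in successful for s in eligible) / len(eligible)
    if baseline == 0:
        return None
    rate = sum(s in successful for s in flagged) / len(flagged)
    return rate / baseline


# Toy usage: individual "u" appears early in two stores, both successful.
stores = {
    "a": ["u", "x", "y"],
    "b": ["x", "y", "z"],
    "c": ["u", "z", "w"],
    "d": ["w", "x", "y"],
}
fold = success_rate_fold_increase(stores, {"a", "c"}, {"u"}, n_early=2)
```

In the toy example the group's success rate is 1 and the baseline is 0.5, so the fold increase is 2; a value of 1 would mean that tracking the group carries no information beyond the baseline.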
Importantly, the above-defined problem allows us to quantify and compare the predictive power of the actions by different groups of individuals, offering the opportunity to clarify whether some groups of individuals exhibit a strong and consistent predictive power. The problem is fundamentally different from the influence maximization problem that has focused on the detection of vital nodes for the structure of and dynamics on a given network Morone and Makse (2015); Lü et al. (2016); Aral and Dhillon (2018). Based on that stream of literature, it is natural to investigate whether sets of individuals that perform well in influence maximization problems (e.g., selected by degree Goldenberg et al. (2009) or collective influence Morone and Makse (2015)) make purchases that are predictive of success. The intuition would be the following: if social hubs have a disproportionate impact on diffusion processes, their purchases in recent stores might trigger large-scale word-of-mouth processes and, as a result, be predictive of success for the visited store. However, we find that this is not always the case.
The inconsistent predictive signal from the hubs’ purchases
We find that the social hubs (selected by appropriate time-averages of degree Goldenberg et al. (2009) and collective influence Morone and Makse (2015), see Supplementary Note S3 for details) are not reliable predictors of success for the visited stores. For example, for eating places that received their first transaction within the validation period, the hubs’ success rate is by considering the earliest first-time customers, against the baseline success rate, resulting in a -fold increase in success rate. The success-rate fold increase is even larger for social hubs selected by collective influence (). These results indicate that the social hubs’ purchases are predictive of success for eating places and that, remarkably, the top-individuals by collective influence are better predictors of success than the top-individuals by degree. We observe a qualitatively similar result for food stores, but not for clothing stores: When considering the earliest customers of a clothing store, the stores that received a purchase by a social hub (selected by degree) exhibit a success rate that is only marginally larger than the baseline ( against ). The hubs’ success rate can be even smaller than the baseline for larger numbers of early customers included to make the prediction (see Fig. 2 and the related discussion below), and similar results are obtained with other prediction evaluation metrics (see Fig. S6). We conclude that in general, the early-purchases by the hubs are not reliable predictors of success for the visited store. The observed inconsistent performance of the social hubs aligns with the system-dependent conclusions on the role of social hubs for success prediction in online systems Shulman, Sharma, and Cosley (2016). 
Additional research is needed to uncover the reason why the social hubs are only predictive of success for specific store categories, which might reflect the different importance of word-of-mouth processes for the success of different categories of stores.
Motivated by our predictive problem and the inconsistent predictive signal from the social hubs, we adopt a statistical procedure that seeks to find individuals – referred to as discoverers Medo et al. (2016) – who are persistently able to discover stores with a high potential of becoming popular. To this end, we define a discovery as an early purchase in a store that later becomes successful. The discoverers are selected by a measure of statistical unexpectedness – called surprisal, a term coined in thermodynamics Tribus (1961) – that quantifies how unlikely the observed number of discoveries was under a null model that preserves the individuals’ level of activity (in terms of number of visited stores) and assumes that everyone has the same likelihood to collect a discovery – see Materials and Methods for the details. In general, it is not guaranteed that there exists a sizeable set of individuals whose number of discoveries significantly deviates from the expectations of the null model, once we account for the possibility of achieving an unexpected number of discoveries by statistical fluctuations.
However, we find that while the number of discoveries per individual is positively correlated with the number of visited stores (Fig. 1D), the deviations from the trend are significant and cannot be explained by chance. To rule out the possibility that high values of surprisal are obtained through random fluctuations, we compare the empirical surprisal values of the detected discoverers with the top-surprisal values observed by resampling the individuals’ number of discoveries from the null-model distribution (see Materials and Methods). We find that the largest empirical surprisal values are significantly larger than the largest surprisal values obtained by resampling the individuals’ number of discoveries (Fig. 1E). The surprisal values are only weakly correlated with the individuals’ number of visited stores, indicating that activity alone is not a good proxy for the propensity of an individual to discover successful stores (Fig. 1F). The results in Fig. 1 refer to eating places; results for other store categories are qualitatively similar (Figs. S1-S2). Taken together, these findings indicate that some individuals exhibit a clear propensity to purchase in recently-opened stores that later become successful, and it is highly unlikely that this pattern can be explained solely by the individuals’ level of activity.
Tracking detected individuals to predict store success
The previous analysis reveals that there exist individuals who repeatedly purchase in recent stores that later end up being successful. Yet, the detected individuals would have true predictive value only if they were predictive of success over a time window that follows the training period. Is the discoverers’ tendency to collect discoveries persistent enough over time to allow us to track their actions for reliable out-of-sample success predictions? This is not obvious a priori, but if it were the case, it would suggest that discovering successful stores is a persistent behavioral trait of the discoverers. Besides, are the predictions made by tracking the discoverers’ purchases more accurate than those obtained through other groups of top individuals, selected by network centrality measures (hubs selected by degree Goldenberg et al. (2009), collective influence Morone and Makse (2015), and social diversity Eagle, Macy, and Claxton (2010)), total expenditures (high-expenditure individuals Di Clemente et al. (2018)), number of visited stores (store explorers), and mobility-related features (explorers selected by mobility diversity Song et al. (2010) and radius of gyration Gonzalez, Hidalgo, and Barabasi (2008))? We refer to the Supplementary Notes S2–S3 for all details on the detection of the groups of top-individuals included in the analysis. Driven by these questions, we study the predictive problem framed above, where the predictive power of top-individuals detected within the training period is evaluated for stores that received their first transaction within the validation period.
We find that the detected discoverers exhibit a consistent predictive signal across all three store categories (Fig. 2). In particular, for eating places and food stores, the discoverers exhibit the largest success-rate fold increase for all numbers of included early customers (Figs. 2A–B). For clothing stores, explorers selected by radius of gyration exhibit a success rate similar to that of the discoverers (Fig. 2C). The early purchases by other classes of individuals might still be associated with larger-than-baseline success rates, yet none of the other groups of individuals is competitive across all three store categories. This is true not only for the social hubs (see the discussion above), but also for high-expenditure individuals and explorers – we refer to Fig. 2 for the full results.
Similar conclusions can be drawn from the results obtained with two more prediction evaluation metrics: the Matthews’ correlation coefficient and the positive likelihood ratio Powers (2011) – we refer to Fig. S6 for the detailed results. The recall metric (namely, the fraction of successful stores that are classified as successful) exhibits a different trend compared to the success rate because by construction, it favors groups of individuals that purchased in many different stores. Because of this, store explorers exhibit the largest recall values across all three categories, followed by the discoverers and high-expenditure individuals (see Fig. S6). Our results are reasonably robust with respect to variations in the parameters of the analysis – see Supplementary Note S5 and Figs. S8–S14 for the detailed results. We also notice that it is possible to consistently improve the predictions’ success rates by pairing the discoverers with other groups of individuals, yet these improvements tend to be marginal (see Supplementary Note S5 and Fig. S7).
Beyond the classification problem, one can also investigate whether stores that received a discoverer among the earliest customers tend to receive a larger number of customers in the future. We find that this is the case across all three store categories. For example, among the eating places that received their first transaction in the validation period, those that had a discoverer among the earliest customers tend to gain approximately more first-time customers than the average store with the same age and category. Similar results are obtained for food and clothing stores, where the relative fold increases of the final number of customers associated with the presence of a discoverer among the earliest customers are and , respectively (see Fig. S15 for the detailed results as a function of the number of early customers included in the analysis). Taken together, our results indicate that early purchases by the discoverers are associated with increased odds of success and an increased number of customers with respect to the baseline.
Socioeconomic characterization of discoverers, social hubs, and explorers
Having detected the discoverers and measured their predictive power, it is natural to investigate which traits make them different from ordinary individuals and social hubs. This is relevant for companies in order to detect prospective discoverers in scenarios where transaction data for their customers are not available, and potentially to nudge individuals who exhibit appropriate combinations of traits into behaving as discoverers. We start by observing that there is little overlap between pairs of groups of discoverers of different categories of stores: The largest overlap is common discoverers among the discoverers of eating places and food stores (corresponding to the of the discoverers of eating places and the of the discoverers of food stores); the largest Jaccard similarity is observed between the discoverers of eating places and clothing stores. This indicates that it is unlikely that a discoverer in one category is also a discoverer in another category of stores, suggesting that the discoverers are highly specialized.
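The two overlap measures used in this comparison (the number of common members, expressed as a share of each group, and the Jaccard similarity) can be computed as in the sketch below. The function name and the toy identifiers are hypothetical and carry no relation to the actual data.

```python
def overlap_stats(group_a, group_b):
    """Overlap between two groups of individuals.

    Returns the number of common members, the share of each group that the
    common members represent, and the Jaccard similarity |A & B| / |A | B|.
    """
    a, b = set(group_a), set(group_b)
    common = a & b
    union = a | b
    return {
        "common": len(common),
        "share_of_a": len(common) / len(a) if a else 0.0,
        "share_of_b": len(common) / len(b) if b else 0.0,
        "jaccard": len(common) / len(union) if union else 0.0,
    }


# Toy usage with made-up identifiers.
stats = overlap_stats({1, 2, 3, 4}, {3, 4, 5})
```

Because the Jaccard similarity normalizes by the size of the union, the pair of groups with the largest raw overlap need not be the pair with the largest Jaccard similarity when group sizes differ, which is consistent with the text reporting the two maxima for different pairs.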
Interesting demographic differences emerge across the groups of individuals (Fig. 3). The discoverers’ demographic traits are consistent with the store explorers’ ones for eating places (the members of both groups tend to be males whose age is below the population median) and for clothing stores (the members of both groups tend to be female). A clear difference emerges for food stores: the store explorers tend to be males with below-median age, yet the discoverers tend to be female with above-median age. Hence, not only are store explorers not necessarily discoverers (Fig. 1F), but the two groups of individuals can also exhibit starkly different demographic traits.
As for socioeconomic traits, across all three store categories, the trait where the discoverers’ median deviates the most from the population median is the number of visited stores, followed by total expenditures, collective influence, and number of contacts (for all these differences, according to the Mood’s median test). The discoverers’ large expenditures and number of social contacts suggest that they have a higher socioeconomic status and degree of social connectedness than ordinary individuals. In this sense, they respect the traits of early adopters outlined in Rogers’ seminal work on the diffusion of innovations Rogers (2010). Yet, the discoverers are not outstanding in any of these traits: for example, the social hubs selected by degree exhibit a substantially larger number of contacts, and store explorers purchase in a substantially larger number of stores. We also notice that the discoverers’ mobility diversity is slightly above the population median, with the smallest relative difference observed for the discoverers of food stores (). Overall, these results support a scenario where the discoverers benefit from various socioeconomic traits, yet none of the investigated traits is predictive of the behavior of an individual as a discoverer. Further research is necessary to assess whether alternative socioeconomic traits, or some psychological traits, can be leveraged to accurately distinguish the discoverers from the rest of the population.
Intriguingly, we can compare the socioeconomic and demographic traits of the social hubs with results obtained by previous studies Gao, Zhang, and Zhou (2019). We find that the social hubs (selected by degree and collective influence) exhibit above-median expenditures (). This is in qualitative agreement with previous studies that reported that expenditures are a reliable proxy for monthly income Di Clemente et al. (2018), and social hubs tend to have a higher economic status Luo et al. (2017). The social hubs also exhibit an above-median number of visited stores, but not an above-median radius of gyration nor mobility diversity. In line with previous findings on online platforms Goldenberg et al. (2009), social hubs tend to be males. Compared to the discoverers, the social hubs exhibit a smaller radius of gyration and a smaller number of visited stores. Store explorers tend to be more central in the social network and to travel for a longer distance (as measured by their radius of gyration) than the discoverers.
This article focused on a data-rich scenario where we have access to individual-level purchasing history, communication activity, and mobility information. Our results support the wisdom of the few paradigm: only a small set of individuals act as reliable success predictors, and we can accurately detect them from the transaction history through a parsimonious and highly-scalable statistical procedure, without the need to integrate social and mobility information. This is highly non-trivial because of the several factors that can limit success predictability in complex socioeconomic systems Salganik, Dodds, and Watts (2006); Hofman, Sharma, and Watts (2017), and the recent contrasting conclusions on the possibility to predict success based on key individuals Watts and Dodds (2007); Goldenberg et al. (2009); Cha et al. (2010); Bakshy et al. (2011); Shulman, Sharma, and Cosley (2016). Our general approach to measure the predictive power of individuals has promise for applications in different social systems, and additional research is needed to establish the degree of universality of our results.
Most of the influencer-related literature aims to detect influential individuals from their position in a social network Lü et al. (2016); Aral and Dhillon (2018); Muller and Peres (2019). However, our results reveal that early purchases by social hubs are only predictive of success for specific store categories, and their predictive signal is weaker than that observed for discoverers. By contrast, the discoverers are not among the most central individuals in the social network that we analyzed, yet their monetary transactions are predictive of the stores’ potential, without the need to explicitly consider the social network the discoverers are embedded in. Therefore, companies and organizations that have only access to transaction data would be able to use our approach to detect key individuals for success prediction.
This paper extends the recent stream of literature on success prediction Hofman, Sharma, and Watts (2017); Clauset, Larremore, and Sinatra (2017); Yucesoy et al. (2018); Williams, Lacasa, and Latora (2019) by focusing on brick-and-mortar stores. In today’s world where online purchasing is disrupting traditional offline retailing, organizations, marketers and investors can benefit from understanding the factors that can drive a store to long-term success. This is relevant not only to traditional brick-and-mortar stores, but also to companies that aim to combine online and offline retailing activities to improve sales and customer experience Bell, Gallino, and Moreno (2017, 2018). In line with the growing number of studies on popularity prediction in online systems Hofman, Sharma, and Watts (2017) and impact prediction in science Clauset, Larremore, and Sinatra (2017), our definition of store success builds on a popularity-based metric: the number of first-time customers received by a store Bell, Gallino, and Moreno (2017). This choice allowed us to ask a well-defined predictive question. On the other hand, alternative metrics (e.g., the number of regular or satisfied customers) may capture different and equally relevant dimensions of store success. Understanding how to best quantify and predict store success is a challenge for future research, and it may require a combination of popularity-based and monetary indicators.
To conclude, three limitations of our study need to be addressed in future studies. First, our study implicitly assumed that mobile-phone communication data provide us with sufficiently good estimates of the individuals’ centrality in the actual network of social contacts. On the other hand, with the rise of online social networks and instant messaging platforms, the phone communication network only provides us with a partial representation of the actual communication flows in society. Obtaining a complete representation of the social communication patterns across an entire nation is clearly unattainable. Therefore, while the incompleteness of our social graph is a limitation of our work, it also mimics a real-world managerial scenario where an organization has only access to incomplete social information about its customers. Future studies need to generalize our results to online communities or virtual worlds where data on individuals’ social connections, communication, and behavior are (nearly) complete.
Second, as for the interpretability-accuracy tradeoff in success predictions Hofman, Sharma, and Watts (2017), our simple classifiers based on the early purchases by selected groups of key individuals have the advantage of having a clear interpretation in terms of individuals’ behavioral patterns. At the same time, we did not attempt to maximize the predictive performance. Machine-learning algorithms that incorporate multiple, diverse features might achieve higher precision. In this direction, an exhaustive analysis of the predictability of store success based on many different features, in line with extensive studies carried out for virality prediction in online systems Cheng et al. (2014); Shulman, Sharma, and Cosley (2016), is desirable. Our classifiers based on specific groups of key individuals can be used as baselines for future studies on early-stage success prediction in diverse kinds of socioeconomic systems.
Finally, based on our data, we cannot establish any causal connection between the discoverers’ purchases and the success of the stores. We have addressed the predictive question of whether the purchases by selected key individuals are consistent indicators of future success, but we did not address the question of whether the detected discoverers can accelerate the diffusion processes they participate in, as the long-studied opinion leaders and influencers are assumed to do Lü et al. (2016); Aral and Dhillon (2018); Muller and Peres (2019). Do the discoverers influence their peers through direct communication, or do they choose stores that have a high potential to gain many customers, without actively influencing their future popularity? Are they influential or susceptible members of their social network Aral and Walker (2012)? What would happen if the discoverers were chosen as seeds of a diffusion process Muller and Peres (2019), or prevented from adopting a recent innovation they are potentially interested in Catalini and Tucker (2017)? Clarifying the role played by the discoverers in empirical diffusion processes through both observational data analysis and field experiments is an intriguing direction for future research.
Materials and Methods
We aim to quantify individuals’ tendency to purchase in recent stores that later become successful. To this end, we define a discovery as an event where an individual purchases in a store no later than days after the store received its first transaction, and the store turns out to be successful. A store is considered as successful if it is among the top- stores by final number of first-time customers (at the end of the training period), among the stores with the same Merchant Category Code that received their first transaction in the same month. Stores that received their first transaction within the last two months of the training period are excluded from the analysis; similarly, transactions in that period do not contribute to the individuals’ number of visited stores and discoveries, but only to the stores’ number of customers. In the following, we denote the individuals who made at least one transaction in a store included in the analysis as , where is the total number of active individuals.
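The two definitions above (successful store, discovery) can be sketched as follows. This is an illustrative sketch, not the authors' code: the record layout, the function names, and the parameters `top_fraction` and `max_delay_days` are hypothetical stand-ins for the thresholds described in the text. Rounding the size of the top group upwards and breaking ties by sort order are assumptions of this sketch.

```python
import math
from collections import defaultdict


def label_successful(stores, top_fraction):
    """Return the ids of stores in the top `top_fraction` by number of
    first-time customers, ranked among stores with the same Merchant
    Category Code and the same month of first transaction.

    stores: list of dicts with keys 'id', 'mcc', 'first_month', 'customers'.
    """
    groups = defaultdict(list)
    for store in stores:
        groups[(store["mcc"], store["first_month"])].append(store)
    successful = set()
    for members in groups.values():
        members.sort(key=lambda s: s["customers"], reverse=True)
        # Assumption: round the size of the top group upwards, so that
        # every (MCC, month) cohort contributes at least one store.
        n_top = math.ceil(top_fraction * len(members))
        successful.update(s["id"] for s in members[:n_top])
    return successful


def count_discoveries(purchases, first_day, successful, max_delay_days):
    """Count discoveries per individual.

    purchases: iterable of (individual, store, day), one record per first
    purchase of an individual in a store; `day` is an integer day index.
    first_day: dict store -> day index of the store's first transaction.
    A purchase is a discovery if the store is successful and the purchase
    occurs no later than `max_delay_days` after the store's first transaction.
    """
    counts = defaultdict(int)
    for person, store, day in purchases:
        if store in successful and day - first_day[store] <= max_delay_days:
            counts[person] += 1
    return dict(counts)
```

Ranking within (MCC, month) cohorts is what factors out store category and age, as described in the text.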
We determine individual ’s propensity to discover successful stores as the unexpectedness of ’s observed number of discoveries, , in terms of the statistical surprisal , where represents the probability that the individual collected or more discoveries given its total number of visited stores, . More specifically, the probability is determined analytically as follows. We consider a process where each individual draws (without replacement) marbles out of an urn which contains marbles, of which are labelled as “discoveries”; and match the empirical number of first-time transactions and discoveries, respectively: , and . This choice of the null model aims to factor out individuals’ level of activity, , from the surprisal measure. The probability that achieves discoveries follows the hypergeometric distribution with mean:
Individual ’s surprisal is given by
The discoverers are detected as the top- individuals by . Results in the main text have been obtained with , days, ; results for different values of are reported in Figs. S7–S9.
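Written out explicitly, the urn model described above corresponds to the standard hypergeometric distribution. The notation in the block below is chosen here for illustration and is an assumption: $k_i$ is the number of stores visited by individual $i$, $d_i$ the observed number of discoveries, $T=\sum_i k_i$ the number of marbles in the urn, and $D=\sum_i d_i$ the number of marbles labelled as discoveries. The use of the natural logarithm in the surprisal is also an assumption.

```latex
% Assumed notation: k_i = stores visited by i, d_i = discoveries of i,
% T = \sum_i k_i (marbles in the urn), D = \sum_i d_i (discovery marbles).
P(d \mid k_i) = \frac{\binom{D}{d}\,\binom{T-D}{k_i-d}}{\binom{T}{k_i}},
\qquad
\mathbb{E}[d \mid k_i] = \frac{k_i\, D}{T},
\qquad
s_i = -\ln \sum_{d \ge d_i} P(d \mid k_i).
```

The sum in the last expression is the probability of collecting $d_i$ or more discoveries, so an individual who performs no better than the urn model expects has a tail probability of order one and a surprisal close to zero.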
The surprisal metric naturally rewards individuals whose observed number of discoveries far exceeds their expected number of discoveries under the null model. For example, for eating places, the top-individual by surprisal collected discoveries out of eating places where he purchased. His expected number of discoveries under the null model was , meaning that he collected roughly eight times more discoveries than expected by chance. The probability he achieved discoveries under the null model is , resulting in a large surprisal value ().
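A surprisal value of this kind can be computed with exact integer arithmetic, as in the sketch below. The function and argument names are hypothetical, the natural logarithm is an assumption, and the numbers in the usage lines are toy values unrelated to the dataset.

```python
from math import comb, log


def hypergeom_tail(d_obs, k, n_disc, n_total):
    """P(X >= d_obs) when k marbles are drawn without replacement from an
    urn with n_total marbles, n_disc of which are labelled as discoveries."""
    upper = min(k, n_disc)
    numerator = sum(
        comb(n_disc, d) * comb(n_total - n_disc, k - d)
        for d in range(d_obs, upper + 1)
    )
    return numerator / comb(n_total, k)


def surprisal(d_obs, k, n_disc, n_total):
    """Surprisal of observing d_obs or more discoveries among k visited stores
    (natural logarithm assumed)."""
    return -log(hypergeom_tail(d_obs, k, n_disc, n_total))


# Toy usage: 2 discoveries in 2 visits, urn with 10 marbles of which 2 are
# discoveries. The tail probability is 1/45.
s_toy = surprisal(2, 2, 2, 10)
expected_toy = 2 * 2 / 10  # mean number of discoveries under the null model
```

For tail probabilities small enough to underflow double precision, the logarithm would have to be evaluated in log-space instead; this sketch does not handle that case.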
The bootstrap procedure
Even in a random sampling process, some individuals might still achieve a large value of surprisal due to statistical fluctuations. To ascertain that the largest observed values of surprisal cannot be explained by chance, we perform a bootstrap analysis Medo et al. (2016). In each realization of the bootstrap procedure, for each individual, we extract its number of discoveries from the hypergeometric distribution (1), and compute the surprisal value, , associated with the extracted number of discoveries. For each bootstrap realization, we rank the individuals according to their surprisal, obtaining the Zipf plot for that realization. We then average the Zipf plots obtained over different realizations, and compare the resulting Zipf plot with the Zipf plot corresponding to the empirical surprisal values. The results show that for the highest ranking positions, the empirical surprisal values are significantly larger than the bootstrapped ones (Figs. 1E, S1–S2).
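The bootstrap procedure can be sketched with the Python standard library as follows. This is a minimal illustration, not the authors' implementation: all names are hypothetical, the number of realizations and the seed are arbitrary, the natural logarithm is assumed for the surprisal, and each individual is sampled independently from the same urn.

```python
import random
from math import comb, log


def surprisal(d_obs, k, n_disc, n_total):
    """-ln P(X >= d_obs) under the hypergeometric null model."""
    numerator = sum(
        comb(n_disc, d) * comb(n_total - n_disc, k - d)
        for d in range(d_obs, min(k, n_disc) + 1)
    )
    return -log(numerator / comb(n_total, k))


def sample_discoveries(k, n_disc, n_total, rng):
    """Draw k marbles one by one without replacement and count discoveries."""
    found = 0
    for drawn in range(k):
        if rng.random() < (n_disc - found) / (n_total - drawn):
            found += 1
    return found


def bootstrap_zipf(visited, n_disc, n_total, n_realizations=100, seed=0):
    """Rank-wise average of the surprisal values over bootstrap realizations.

    visited: list with the number of visited stores of each individual.
    Returns a list whose r-th entry is the mean surprisal at rank r
    (rank 0 = largest surprisal in each realization).
    """
    rng = random.Random(seed)
    sums = [0.0] * len(visited)
    for _ in range(n_realizations):
        values = sorted(
            (
                surprisal(sample_discoveries(k, n_disc, n_total, rng), k, n_disc, n_total)
                for k in visited
            ),
            reverse=True,
        )
        for rank, value in enumerate(values):
            sums[rank] += value
    return [total / n_realizations for total in sums]


# Toy usage with made-up activity levels.
averaged = bootstrap_zipf([5, 10, 20], n_disc=30, n_total=300)
```

The empirical surprisal values, sorted in decreasing order, would then be compared rank by rank with the returned averages.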
Groups of top individuals
We consider eight groups of top individuals that can be broadly classified into four categories: discoverers, social hubs, explorers, and high-expenditure individuals. We provide a brief description of the four categories: (1) The discoverers are selected by the surprisal defined in Eq. (2); (2) The social hubs (or briefly, hubs) are selected by appropriate time-averages of three centrality metrics extracted from the CDR: degree, collective influence, and social diversity; (3) The explorers are selected by three metrics: total number of visited stores (extracted from the CCR), and time-averages of mobility diversity and radius of gyration (extracted from the CDR); (4) The high-expenditure individuals are selected by their time-averaged total expenditures. For each store category, we selected the top- individuals for each trait as the top-individuals. This choice allows us to use the same number of top-individuals for all classifiers. All results in the manuscript have been obtained with ; results for more selective thresholds are shown in Fig. S10. We refer to the Supplementary Notes S2–S3 for all details on the detection of these groups of individuals.
Statistical significance is assessed with the binomial test and the Mood’s median test for the female ratio and all the other traits, respectively. The binomial test allows us to test the null hypothesis that males and females are equally likely to occur in the detected group of individuals. The Mood’s median test allows us to test whether the group and the population were drawn from distributions with the same median.
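Both tests can be sketched with the Python standard library. The sketch below is illustrative only: the function names are hypothetical, the binomial test is the exact two-sided test for a success probability of 0.5, and the Mood's median test is implemented as a Pearson chi-square test on the 2x2 table of counts above and not above the pooled median, without continuity correction. Whether the original analysis used a continuity correction or a different tie rule is not stated, so these choices are assumptions.

```python
from math import comb, erfc, sqrt
from statistics import median


def binomial_test_half(k, n):
    """Exact two-sided p-value for k successes in n trials under p = 0.5.

    Sums the probabilities of all outcomes that are no more likely than k.
    """
    reference = comb(n, k)
    favourable = sum(c for c in (comb(n, i) for i in range(n + 1)) if c <= reference)
    return favourable / 2 ** n


def moods_median_test(x, y):
    """Mood's median test for two samples (no continuity correction).

    Returns (chi-square statistic, p-value) with one degree of freedom.
    """
    grand = median(list(x) + list(y))
    table = [[sum(v > grand for v in s), sum(v <= grand for v in s)] for s in (x, y)]
    rows = [len(x), len(y)]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = rows[0] + rows[1]
    if 0 in cols or 0 in rows:
        return 0.0, 1.0  # degenerate table: no evidence against equal medians
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))
```

For instance, `binomial_test_half(k, n)` would be applied to the number of females `k` in a detected group of size `n`, and `moods_median_test` to the values of a trait in the group and in the population.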
Materials and Data Availability
Source code is available on request to the authors. For contractual and privacy reasons, the raw data is not available. Upon request, the authors can provide appropriate documentation for replication, and they might provide samples of the processed data.
The Human Subjects Committee of the Faculty of Economics, Business Administration and Information Technology at the University of Zurich authorized this research on 29 March 2018. In particular, it reviewed the information regarding the procedures and protocols in our research, and confirmed that they comply with all applicable regulations.
This work has been supported by the Science Strength Promotion Program of UESTC and by the URPP Social Networks at the University of Zurich. MSM and CJT acknowledge financial support from the Swiss National Science Foundation (Grant No. 200021-182659). MSM acknowledges financial support from the UESTC professor research start-up (Grant No. ZYGX2018KYQD215).
- Lazer et al. (2009) D. Lazer, A. Pentland, L. Adamic, S. Aral, A.-L. Barabási, D. Brewer, N. Christakis, N. Contractor, J. Fowler, M. Gutmann, et al., “Computational social science,” Science 323, 721–723 (2009).
- Gao, Zhang, and Zhou (2019) J. Gao, Y.-C. Zhang, and T. Zhou, “Computational socioeconomics,” Physics Reports (2019).
- Barabási (2018) A.-L. Barabási, The Formula: The Universal Laws of Success (Little Brown and Company, New York, 2018).
- Clauset, Larremore, and Sinatra (2017) A. Clauset, D. B. Larremore, and R. Sinatra, “Data-driven predictions in the science of science,” Science 355, 477–480 (2017).
- Fortunato et al. (2018) S. Fortunato, C. T. Bergstrom, K. Börner, J. A. Evans, D. Helbing, S. Milojević, A. M. Petersen, F. Radicchi, R. Sinatra, B. Uzzi, et al., “Science of science,” Science 359, eaao0185 (2018).
- Fraiberger et al. (2018) S. P. Fraiberger, R. Sinatra, M. Resch, C. Riedl, and A.-L. Barabási, “Quantifying reputation and success in art,” Science 362, 825–829 (2018).
- Williams, Lacasa, and Latora (2019) O. E. Williams, L. Lacasa, and V. Latora, “Quantifying and predicting success in show business,” Nature Communications 10, 2256 (2019).
- Goldenberg et al. (2009) J. Goldenberg, S. Han, D. R. Lehmann, and J. W. Hong, “The role of hubs in the adoption process,” Journal of Marketing 73, 1–13 (2009).
- Cheng et al. (2014) J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and J. Leskovec, “Can cascades be predicted?” in Proceedings of the 23rd international conference on World wide web (ACM, 2014) pp. 925–936.
- Martin et al. (2016) T. Martin, J. M. Hofman, A. Sharma, A. Anderson, and D. J. Watts, “Exploring limits to prediction in complex social systems,” in Proceedings of the 25th International Conference on World Wide Web (International World Wide Web Conferences Steering Committee, 2016) pp. 683–694.
- Hofman, Sharma, and Watts (2017) J. M. Hofman, A. Sharma, and D. J. Watts, “Prediction and explanation in social systems,” Science 355, 486–488 (2017).
- Wang, Song, and Barabási (2013) D. Wang, C. Song, and A.-L. Barabási, “Quantifying long-term scientific impact,” Science 342, 127–132 (2013).
- Yucesoy et al. (2018) B. Yucesoy, X. Wang, J. Huang, and A.-L. Barabási, “Success in books: a big data approach to bestsellers,” EPJ Data Science 7, 7 (2018).
- Candia et al. (2019) C. Candia, C. Jara-Figueroa, C. Rodriguez-Sickert, A.-L. Barabási, and C. A. Hidalgo, “The universal decay of collective memory and attention,” Nature Human Behaviour 3, 82 (2019).
- Sinatra et al. (2016) R. Sinatra, D. Wang, P. Deville, C. Song, and A.-L. Barabási, “Quantifying the evolution of individual scientific impact,” Science 354, aaf5239 (2016).
- Ma and Uzzi (2018) Y. Ma and B. Uzzi, “Scientific prize network predicts who pushes the boundaries of science,” Proceedings of the National Academy of Sciences 115, 12608–12615 (2018).
- Wu, Wang, and Evans (2019) L. Wu, D. Wang, and J. A. Evans, “Large teams develop and small teams disrupt science and technology,” Nature 566, 378 (2019).
- Watts and Dodds (2007) D. J. Watts and P. S. Dodds, “Influentials, networks, and public opinion formation,” Journal of Consumer Research 34, 441–458 (2007).
- Bakshy et al. (2011) E. Bakshy, J. M. Hofman, W. A. Mason, and D. J. Watts, “Everyone’s an influencer: quantifying influence on twitter,” in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (ACM, 2011) pp. 65–74.
- Libai, Muller, and Peres (2013) B. Libai, E. Muller, and R. Peres, “Decomposing the value of word-of-mouth seeding programs: Acceleration versus expansion,” Journal of Marketing Research 50, 161–176 (2013).
- Shulman, Sharma, and Cosley (2016) B. Shulman, A. Sharma, and D. Cosley, “Predictability of popularity: Gaps between prediction and understanding,” in Tenth International AAAI Conference on Web and Social Media (2016).
- Aral and Dhillon (2018) S. Aral and P. S. Dhillon, “Social influence maximization under empirical influence models,” Nature Human Behaviour 2, 375 (2018).
- Kitsak et al. (2010) M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and H. A. Makse, “Identification of influential spreaders in complex networks,” Nature Physics 6, 888 (2010).
- Banerjee et al. (2013) A. Banerjee, A. G. Chandrasekhar, E. Duflo, and M. O. Jackson, “The diffusion of microfinance,” Science 341, 1236498 (2013).
- Morone and Makse (2015) F. Morone and H. A. Makse, “Influence maximization in complex networks through optimal percolation,” Nature 524, 65–68 (2015).
- Lü et al. (2016) L. Lü, D. Chen, X.-L. Ren, Q.-M. Zhang, Y.-C. Zhang, and T. Zhou, “Vital nodes identification in complex networks,” Physics Reports 650, 1–63 (2016).
- Muller and Peres (2019) E. Muller and R. Peres, “The effect of social networks structure on innovation performance: A review and directions for research,” International Journal of Research in Marketing 36, 3–19 (2019).
- Medo et al. (2016) M. Medo, M. S. Mariani, A. Zeng, and Y.-C. Zhang, “Identification and modeling of discoverers in online social systems,” Scientific Reports 6, 34218 (2016).
- Gonzalez, Hidalgo, and Barabasi (2008) M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabasi, “Understanding individual human mobility patterns,” Nature 453, 779 (2008).
- Song et al. (2010) C. Song, Z. Qu, N. Blumm, and A.-L. Barabási, “Limits of predictability in human mobility,” Science 327, 1018–1021 (2010).
- Eagle, Macy, and Claxton (2010) N. Eagle, M. Macy, and R. Claxton, “Network diversity and economic development,” Science 328, 1029–1031 (2010).
- Tribus (1961) M. Tribus, Thermostatics and thermodynamics: an introduction to energy, information and states of matter, with engineering applications (van Nostrand, 1961).
- Di Clemente et al. (2018) R. Di Clemente, M. Luengo-Oroz, M. Travizano, S. Xu, B. Vaitla, and M. C. González, “Sequences of purchases in credit card data reveal lifestyles in urban populations,” Nature Communications 9 (2018).
- Powers (2011) D. M. Powers, “Evaluation: from precision, recall and f-factor to roc, informedness, markedness & correlation,” Journal of Machine Learning Technologies 2, 37–63 (2011).
- Rogers (2010) E. M. Rogers, Diffusion of innovations (Simon and Schuster, 2010).
- Luo et al. (2017) S. Luo, F. Morone, C. Sarraute, M. Travizano, and H. A. Makse, “Inferring personal economic status from social network location,” Nature Communications 8, 15227 (2017).
- Salganik, Dodds, and Watts (2006) M. J. Salganik, P. S. Dodds, and D. J. Watts, “Experimental study of inequality and unpredictability in an artificial cultural market,” Science 311, 854–856 (2006).
- Cha et al. (2010) M. Cha, H. Haddadi, F. Benevenuto, and K. P. Gummadi, “Measuring user influence in twitter: The million follower fallacy,” in Fourth International AAAI Conference on Weblogs and Social Media (2010).
- Bell, Gallino, and Moreno (2017) D. R. Bell, S. Gallino, and A. Moreno, “Offline showrooms in omnichannel retail: Demand and operational benefits,” Management Science 64, 1629–1651 (2017).
- Bell, Gallino, and Moreno (2018) D. R. Bell, S. Gallino, and A. Moreno, “The store is dead—long live the store,” MIT Sloan Management Review 59, 59–66 (2018).
- Aral and Walker (2012) S. Aral and D. Walker, “Identifying influential and susceptible members of social networks,” Science 337, 337–341 (2012).
- Catalini and Tucker (2017) C. Catalini and C. Tucker, “When early adopters don’t adopt,” Science 357, 135–136 (2017).
- Mood (1950) A. M. Mood, “Introduction to the theory of statistics.” (1950).
- Liao et al. (2017) H. Liao, M. S. Mariani, M. Medo, Y.-C. Zhang, and M.-Y. Zhou, “Ranking in evolving complex networks,” Physics Reports 689, 1–54 (2017).
- Pappalardo et al. (2015) L. Pappalardo, F. Simini, S. Rinzivillo, D. Pedreschi, F. Giannotti, and A.-L. Barabási, “Returners and explorers dichotomy in human mobility,” Nature Communications 6 (2015).
- Vaccario et al. (2017) G. Vaccario, M. Medo, N. Wider, and M. S. Mariani, “Quantifying and suppressing ranking bias in a large citation network,” Journal of Informetrics 11, 766–782 (2017).
- Sarigöl et al. (2014) E. Sarigöl, R. Pfitzner, I. Scholtes, A. Garas, and F. Schweitzer, “Predicting scientific success based on coauthorship networks,” EPJ Data Science 3, 9 (2014).
- James et al. (2013) G. James, D. Witten, T. Hastie, and R. Tibshirani, An introduction to statistical learning, Vol. 112 (Springer, 2013).
- Chicco (2017) D. Chicco, “Ten quick tips for machine learning in computational biology,” BioData Mining 10, 35 (2017).
- McGee (2002) S. McGee, “Simplifying likelihood ratios,” Journal of General Internal Medicine 17, 647–650 (2002).
- Boughorbel, Jarray, and El-Anbari (2017) S. Boughorbel, F. Jarray, and M. El-Anbari, “Optimal classifier for imbalanced data using matthews correlation coefficient metric,” PLOS ONE 12, e0177678 (2017).
- Matthews (1975) B. W. Matthews, “Comparison of the predicted and observed secondary structure of T4 phage lysozyme,” Biochimica et Biophysica Acta (BBA) - Protein Structure 405, 442–451 (1975).
Supplementary Note S1: Data filtering and networks construction
Before describing the data, we point out that all the data analyzed in the article are anonymized. The subjects of the analysis (individuals and stores) are represented by meaningless hashes in the dataset. All individuals are non-identifiable, meaning that there is no way to reconstruct the individuals’ real identities; all stores are nameless, meaning that there is no way to reconstruct the stores’ real names; all transactions are innominate. For the sake of better readability, when referring to the analyzed datasets and the individuals in the following, we shall omit the labels “anonymized” and “non-identifiable”. Nevertheless, whenever we refer to an individual, it should be interpreted as a non-identifiable individual whose real identity is impossible to determine. Whenever we refer to a group of “detected” individuals, it should be interpreted as a group of individuals whose real identities cannot be identified.
|Store category|MCC description|
|Eating places|Restaurants or eating places|
|Eating places|Drinking Places (Alcoholic Beverages), Bars, Taverns, Cocktail lounges, Nightclubs and Discotheques|
|Eating places|Fast Food Restaurants|
|Food stores|Grocery stores, Supermarkets|
|Food stores|Freezer and Locker Meat Provisioners|
|Food stores|Candy, Confectionery, Nut stores|
|Food stores|Dairy Products stores|
|Food stores|Misc. Food stores – Convenience stores and Specialty Markets|
|Clothing stores|Men’s and Boy’s Clothing and Accessories stores|
|Clothing stores|Women’s Ready-to-Wear stores|
|Clothing stores|Women’s Accessory and Specialty stores|
|Clothing stores|Children’s and Infant’s Wear stores|
|Clothing stores|Family Clothing stores|
|Clothing stores|Sports Apparel, Riding Apparel stores|
|Clothing stores|Furriers and Fur stores|
|Clothing stores|Men’s and Women’s Clothing stores|
|Clothing stores|Tailors, Seamstress, Mending, and Alterations|
|Clothing stores|Wig and Toupee stores|
|Clothing stores|Miscellaneous Apparel and Accessory stores|
Credit Card Record (CCR)
We analyzed a Credit Card Record (CCR) from a large bank in an emerging country, collected over a three-year period from June 2015 to May 2018. We filtered out stores with fewer than ten customers throughout the whole data time span. We consider three store categories: eating places, food stores, and clothing stores. The three categories have been selected based on the Merchant Category Codes (MCCs) that are available in the data, according to the classification scheme reported in Table S1. We study separately three temporal bipartite networks where individuals are connected to the stores they purchased in. The time-stamp of each link is determined by the time-stamp of the individual’s first purchase in the store.
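The construction of the temporal bipartite network can be sketched as follows. This is an illustrative sketch only: the tuple layout `(individual, store, timestamp)` and the function name are assumptions, and the threshold of ten customers is exposed as a parameter.

```python
from collections import defaultdict


def first_purchase_links(transactions, min_customers=10):
    """Build the individual-store links of the bipartite network.

    transactions: iterable of (individual, store, timestamp) tuples.
    Returns {(individual, store): timestamp of the first purchase}, keeping
    only stores with at least `min_customers` distinct customers.
    """
    first = {}
    for person, store, ts in transactions:
        key = (person, store)
        # Keep only the earliest purchase of each individual in each store.
        if key not in first or ts < first[key]:
            first[key] = ts

    customers = defaultdict(set)
    for person, store in first:
        customers[store].add(person)

    return {
        (person, store): ts
        for (person, store), ts in first.items()
        if len(customers[store]) >= min_customers
    }
```

Repeated purchases by the same individual in the same store collapse into a single link, so the number of links of a store equals its number of first-time customers.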
We split the three-year CCR into three non-overlapping periods, as explained below. The time periods reported below refer to the ones that were used for the analysis in the main text; in Supplementary Note S5, we tested the robustness of our results with respect to other choices for the data partitioning.
Pre-filtering period (June 2015 – November 2015). We used this period to assess whether a store that appears in the training period is a new store or a previously-existing one. If a store found in the training period is also found in the pre-filtering period, it is a pre-existing one and does not contribute to the customers’ number of discoveries, whereas it still contributes to their number of visited stores.
Training period (December 2015 – May 2017). We analyze separately the three categories of stores reported in Table S1. A potential issue is that the discoverer detection procedure requires us to estimate the success of the stores, which might be unreliable for stores that received their first transaction near the end of the training period. For this reason, we filtered out from the analysis the stores introduced less than two months before the end of the training period. The total number of relevant stores is denoted as . The time of each link is determined by the first visit of individual to store . We denote as and the number of stores visited by individual within the training period (excluding the last two months of this period) and the number of first-time customers of store , respectively. We denote as the number of discoveries collected by individual within the training period (excluding the last two months of this period). The reason for excluding the last two months when measuring and is that the estimation of success might be unreliable for the shops that appeared for the first time near the end of the training period.
Validation period (June 2017 – May 2018). The transactions from June 2017 to May 2018 are used as a validation period to assess the out-of-sample predictive signal for different groups of detected individuals. We focus on the stores that were opened within this period, and assess the relation between the presence/absence of different groups of individuals among the earliest customers and the future success of the store (see Supplementary Note S4 for the details). Again, a potential issue is that the prediction evaluation procedure requires us to estimate the future success of the stores, which might be unreliable for stores opened near the end of the validation period. For this reason, we filtered out from the analysis the stores introduced less than two months before the end of the validation period. We denote by the resulting number of relevant stores.
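The selection of the relevant stores in the training and validation periods can be sketched as below. The exact day boundaries of each period and of the two-month cutoff (taken here as 1 April of the final year of each period) are assumptions for illustration; the text specifies only months.

```python
from datetime import date

# Period boundaries as stated in the text; day-level limits are assumed.
PREFILTER = (date(2015, 6, 1), date(2015, 11, 30))
TRAINING = (date(2015, 12, 1), date(2017, 5, 31))
VALIDATION = (date(2017, 6, 1), date(2018, 5, 31))


def relevant_new_stores(first_seen, period, cutoff):
    """Select the stores that are new in `period` and not too recent.

    first_seen: {store: date of its first transaction in the whole dataset}.
    A store is new in `period` if its first transaction falls inside it, so
    stores already seen in an earlier period are excluded automatically.
    Stores first seen on or after `cutoff` are dropped because their success
    cannot be estimated reliably.
    """
    start, end = period
    return {s for s, d in first_seen.items() if start <= d <= end and d < cutoff}


# Hypothetical stores, one per case.
first_seen = {
    "pre_existing": date(2015, 7, 1),
    "new_in_training": date(2016, 3, 15),
    "too_late_in_training": date(2017, 5, 1),
    "new_in_validation": date(2017, 8, 1),
}
training_stores = relevant_new_stores(first_seen, TRAINING, date(2017, 4, 1))
validation_stores = relevant_new_stores(first_seen, VALIDATION, date(2018, 4, 1))
```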
Call Data Record (CDR) and its relation with the CCR
We analyzed a Call Data Record (CDR) from a large mobile phone operator from the same country where the CCR was recorded. The CDR used in this study covers a one-year period from January 2016 to December 2016. Importantly, this period overlaps with the CCR’s time span, and it is possible to partially match the individuals in the CDR with the individuals in the CCR (see below). For each telco customer, the CDR contains all the calls she made to or received from both other telco customers and non-customers. This implies that for the telco customers, we can observe their complete mobile-phone communication activity, whereas for the non-customers, we can only see their communications with the telco customers. Besides, each individual may be a telco customer only for specific months of the year, rather than throughout the whole year.
In our work, we use the CDR to construct snapshots of the social network. For each month, we only include the telco customers in the network. When computing the time-averaged centrality metrics of each individual in the social network, we only include the months when the target individual was a telco customer. The rationale is that the individuals’ centrality is largely underestimated in the months when they are not telco customers, because their calls from/to non-telco customers are not included. We refer to Supplementary Note S3 for a description of how the time-averaged centrality metrics were computed.
Supplementary Note S2: Individual-level traits extracted from the CCR
We describe here how we extracted the individual-level traits of interest from the available CCR.
The individuals’ surprisal is defined by Eq. (2) in the main text. We refer to the main text for the details of its computation. The discoverers of a given category of stores are the top- individuals by the surprisal obtained by analyzing the respective category of stores.
Number of visited stores (Store explorers).
For each of the three categories of stores considered in our work, we count the number of visited stores per individual, , within the training period. The store explorers of a given category of stores are the top- individuals by number of visited stores within the training period.
Time-averaged total expenditure (High-expenditure individuals).
For each bank customer who made at least one purchase within the training period, we extract his/her total expenditure in each month between January and December 2016, and we average over this 12-month period. The high-expenditure individuals are the top- individuals by time-averaged total expenditure.
Supplementary Note S3: Individual-level traits extracted from the CDR
We describe here how we extracted the individual-level traits of interest from the available CDR. The extracted traits are used both to detect the top-individuals used to make predictions (as detailed below), and to characterize the groups of top-individuals in Fig. 4 of the main text.
Centrality and social hubs
We introduce here appropriate time averages of three different centrality metrics: degree, collective influence, and social diversity.
Time-averaged number of contacts (Social hubs by degree).
The number of contacts (or degree) of the individuals in the social network is probably the simplest metric to quantify individuals’ centrality Liao et al. (2017). Individuals with a large number of contacts – social hubs – were first used for success prediction by Goldenberg et al. (2009). To detect the social hubs, we measure the number of contacts of each individual $i$ within each month $t$, $k_i(t)$. We denote by $\mathcal{C}(t)$ the set of telco customers in month $t$. An individual may be a telco customer only for some months within the CDR timespan. Motivated by the line of reasoning in Supplementary Note S1.B, we define the time-averaged number of contacts for individual $i$ as
$$\bar{k}_i = \frac{1}{|\mathcal{M}_i|} \sum_{t \in \mathcal{M}_i} k_i(t),$$
where $\mathcal{M}_i = \{t : i \in \mathcal{C}(t)\}$ is the set of months in which $i$ is a telco customer, and $\bar{k}_i = 0$ if $\mathcal{M}_i$ is empty.
From the definition, it follows that $\bar{k}_i > 0$ only for those individuals who are telco customers for at least one month. Only the months when an individual is a telco customer are included in the average. The social hubs by degree are the top- individuals by $\bar{k}_i$, among the individuals who are found in the CCR and made at least one purchase in stores of the analyzed category within the training period.
We note that if an individual is found in the CCR but she is not among the telco customers, she obtains $\bar{k}_i = 0$ and cannot be detected as a social hub. This is a consequence of the incompleteness of our data: given a set of individuals who make transactions in the CCR, we do not know the communication activity for all of them and, as a result, we can only detect the top-individuals by centrality among those who are also telco customers. Similar remarks apply to all other individual traits extracted from the CDR.
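A minimal sketch of the time-averaged number of contacts, assuming that each monthly snapshot is given as a list of `(caller, callee)` pairs together with the set of telco customers of that month. The data layout and function names are hypothetical.

```python
from collections import defaultdict


def monthly_degree(calls, customers):
    """Number of distinct contacts of each telco customer in one month.

    Only calls between two telco customers are counted, since the monthly
    network snapshot includes telco customers only.
    """
    contacts = defaultdict(set)
    for a, b in calls:
        if a in customers and b in customers and a != b:
            contacts[a].add(b)
            contacts[b].add(a)
    return {c: len(contacts[c]) for c in customers}


def time_averaged_degree(calls_by_month, customers_by_month, person):
    """Average the monthly degree over the months in which `person` is a
    telco customer; individuals who are never customers obtain 0."""
    values = [
        monthly_degree(calls, customers_by_month[month])[person]
        for month, calls in calls_by_month.items()
        if person in customers_by_month[month]
    ]
    return sum(values) / len(values) if values else 0.0


# Hypothetical three-month example.
calls_by_month = {
    "2016-01": [("a", "b"), ("a", "c")],
    "2016-02": [("a", "b")],
    "2016-03": [("b", "c")],
}
customers_by_month = {
    "2016-01": {"a", "b", "c"},
    "2016-02": {"b", "c"},
    "2016-03": {"a", "b", "c"},
}
```

In this example, individual `a` is not a customer in February, so that month is skipped rather than counted as a month with zero contacts.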
Time-averaged Collective Influence (Social hubs by Collective Influence, CI).
While the degree is the simplest centrality metric in networks Liao et al. (2017), it neglects higher-order network effects that are potentially informative about the position of the nodes. As a more sophisticated metric of network centrality, we rely on the collective influence metric introduced by Morone and Makse (2015). The metric detects the minimal set of nodes that, once removed from the network, disrupt the network’s giant component. The detection of these nodes is typically referred to as the structural influence maximization problem Lü et al. (2016). Morone and Makse (2015) solved the problem analytically through the theory of optimal percolation on graphs, and showed that the collective influence metric provides a reliable approximation to the solution of the problem and, at the same time, can be computed rapidly on large datasets. By considering the network of telco customers in month $t$, the collective influence of a telco customer $i$ in month $t$ is given by Morone and Makse (2015)
$$\mathrm{CI}_\ell(i,t) = \big(k_i(t) - 1\big) \sum_{j \in \partial \mathrm{Ball}(i,\ell)} \big(k_j(t) - 1\big),$$
where $k_i(t)$ is the degree of $i$ in month $t$, $\mathrm{Ball}(i,\ell)$ denotes the ball of radius $\ell$ centered in $i$, and $\partial \mathrm{Ball}(i,\ell)$ denotes its frontier Morone and Makse (2015). Here, we set . As we did for the number of contacts, we define the time-averaged collective influence of individual $i$ as
$$\overline{\mathrm{CI}}_i = \frac{1}{|\mathcal{M}_i|} \sum_{t \in \mathcal{M}_i} \mathrm{CI}_\ell(i,t),$$
where $\mathcal{M}_i$ is the set of months in which $i$ is a telco customer.
The social hubs by collective influence are the top- individuals by $\overline{\mathrm{CI}}_i$, among the individuals who are also found in the CCR and made at least one purchase in stores of the analyzed category within the training period.
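The collective influence of a single node can be sketched with a breadth-first search, following the definition of Morone and Makse (2015): the reduced degree of the node multiplied by the sum of the reduced degrees of the nodes at shortest-path distance exactly equal to the radius. The adjacency-dictionary representation and the function name are assumptions, and the radius is left as a free parameter.

```python
from collections import deque


def collective_influence(adj, node, radius):
    """Collective influence of `node` in an undirected graph.

    adj: {node: set of neighbours}.
    Returns (k_node - 1) * sum of (k_j - 1) over the nodes j located at
    shortest-path distance exactly `radius` from `node`.
    """
    dist = {node: 0}
    queue = deque([node])
    frontier_sum = 0
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            # u lies on the frontier of the ball: add its reduced degree
            # and do not expand the search any further from it.
            frontier_sum += len(adj[u]) - 1
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj[node]) - 1) * frontier_sum


# Hypothetical small network: h is linked to b and c, which have further contacts.
adj = {
    "h": {"b", "c"},
    "b": {"h", "x", "y"},
    "c": {"h", "z"},
    "x": {"b"},
    "y": {"b"},
    "z": {"c"},
}
```

Because each node is visited at most once, the cost per node is bounded by the size of its ball, which is what makes the metric practical on large monthly snapshots.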
Time-averaged social diversity (Social hubs by social diversity)
We consider an alternative metric of social importance that has brought insights into regional socioeconomic development Eagle, Macy, and Claxton (2010). The social diversity metric quantifies whether a given individual tends to communicate repeatedly with a restricted set of contacts, or whether she contacts a diverse set of people. It is defined as the entropy Eagle, Macy, and Claxton (2010)
$$D_i(t) = -\sum_{j} \frac{w_{ij}(t)}{w_i(t)} \log \frac{w_{ij}(t)}{w_i(t)},$$
where $w_i(t)$ represents $i$’s total number of interactions within month $t$, and $w_{ij}(t)$ the total number of interactions between $i$ and $j$ within month $t$. If a person $i$ has only one contact $j$ over one month $t$, then $w_{ij}(t) = w_i(t)$ and, as a consequence, $D_i(t) = 0$. As we did for the previous centralities, we define the time-averaged social diversity of individual $i$ as
$$\bar{D}_i = \frac{1}{|\mathcal{M}_i|} \sum_{t \in \mathcal{M}_i} D_i(t),$$
where $\mathcal{M}_i$ is the set of months in which $i$ is a telco customer.
The social hubs by social diversity are the top- individuals by $\bar{D}_i$, among the individuals who are also found in the CCR and made at least one purchase in stores of the analyzed category within the training period.
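The monthly social diversity can be sketched as the Shannon entropy of an individual's interaction shares. The input layout (one contact identifier per interaction), the use of the natural logarithm, and the absence of any normalization are assumptions made here for illustration.

```python
from collections import Counter
from math import log


def social_diversity(contacts):
    """Shannon entropy of the shares of interactions across contacts.

    contacts: list with one entry (the contact's identifier) per interaction
    recorded for the individual within a given month.
    """
    counts = Counter(contacts)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log(c / total) for c in counts.values())


# An individual who always calls the same person has zero diversity,
# whereas spreading calls evenly over more contacts increases it.
single_contact = social_diversity(["b"] * 5)
two_contacts = social_diversity(["b", "c", "b", "c"])
```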
Mobility-related traits and explorers
Time-averaged mobility diversity
The mobility diversity metric has been introduced to characterize the predictability of human mobility Song et al. (2010). The idea is to understand whether a given individual uses a restricted number of antennas or a diverse set of antennas. The latter means that the individual has been in a diverse set of locations, which is a manifestation of high mobility. The mobility diversity of individual $i$ within month $t$ is given by the following entropy Song et al. (2010):
$$S_i(t) = -\sum_{a} \frac{n_{ia}(t)}{n_i(t)} \log \frac{n_{ia}(t)}{n_i(t)},$$
where $n_{ia}(t)$ denotes the number of calls made/received by $i$ through antenna $a$ within month $t$, and $n_i(t)$ denotes the total number of calls made/received by individual $i$ within month $t$. If $i$ uses only one antenna $a$ within month $t$, then $n_{ia}(t) = n_i(t)$, which results in $S_i(t) = 0$. On the other hand, individuals who made/received calls from many different antennas are characterized by large values of $S_i(t)$. As we did for the previous CDR-extracted traits, we define the time-averaged mobility diversity of individual $i$ as
$$\bar{S}_i = \frac{1}{|\mathcal{M}_i|} \sum_{t \in \mathcal{M}_i} S_i(t),$$
where $\mathcal{M}_i$ is the set of months in which $i$ is a telco customer.
The explorers by mobility diversity are the top- individuals by $\bar{S}_i$, among the individuals who are also found in the CCR and made at least one purchase in stores of the analyzed category within the training period.
Time-averaged radius of gyration
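The radius of gyration can be interpreted as the characteristic distance traveled by a given individual Gonzalez, Hidalgo, and Barabasi (2008). This metric has been used to distinguish explorative individuals from “returner” individuals who tend to only visit a small number of locations Pappalardo et al. (2015). Individual $i$’s total radius of gyration within month $t$ is defined as Gonzalez, Hidalgo, and Barabasi (2008)
$$r_i(t) = \sqrt{\frac{1}{N_i(t)} \sum_{a \in \mathcal{A}_i(t)} n_{ia}(t) \, \big\lVert \mathbf{r}_a - \mathbf{r}^{\mathrm{cm}}_i(t) \big\rVert^2},$$
where $n_{ia}(t)$ denotes the number of times individual $i$ uses antenna $a$ within month $t$, $\mathcal{A}_i(t)$ denotes the set of antennas that individual $i$ visited, $N_i(t) = \sum_{a \in \mathcal{A}_i(t)} n_{ia}(t)$, $\mathbf{r}_a$ is a two-dimensional vector describing the geographic coordinates of antenna $a$’s location, and $\mathbf{r}^{\mathrm{cm}}_i(t) = \frac{1}{N_i(t)} \sum_{a \in \mathcal{A}_i(t)} n_{ia}(t) \, \mathbf{r}_a$ represents individual $i$’s center of mass. As we did for the previous CDR-extracted traits, we define the time-averaged radius of gyration of individual $i$ as
$$\bar{r}_i = \frac{1}{|\mathcal{M}_i|} \sum_{t \in \mathcal{M}_i} r_i(t),$$
where $\mathcal{M}_i$ is the set of months in which $i$ is a telco customer.
The explorers by radius of gyration are the top- individuals by $\bar{r}_i$, among the individuals who are also found in the CCR and made at least one purchase in stores of the analyzed category within the training period.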
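The monthly radius of gyration can be sketched as the usage-weighted root-mean-square distance of the visited antennas from their center of mass. The sketch assumes planar coordinates (for example, projected kilometres) and Euclidean distances; with raw latitude and longitude, a geodesic distance would be needed instead. The input layout and function name are hypothetical.

```python
from math import sqrt


def radius_of_gyration(visits):
    """Radius of gyration from antenna usage counts.

    visits: {(x, y): n}, mapping the planar coordinates of each antenna to
    the number of times the individual used it within the month.
    """
    total = sum(visits.values())
    if total == 0:
        return 0.0
    # Usage-weighted center of mass of the visited antennas.
    cx = sum(n * x for (x, _), n in visits.items()) / total
    cy = sum(n * y for (_, y), n in visits.items()) / total
    # Usage-weighted mean squared distance from the center of mass.
    msd = sum(
        n * ((x - cx) ** 2 + (y - cy) ** 2) for (x, y), n in visits.items()
    ) / total
    return sqrt(msd)


# A "returner" who uses a single antenna has zero radius of gyration.
returner = radius_of_gyration({(3.0, 4.0): 7})
# Two antennas two units apart, used equally often.
commuter = radius_of_gyration({(0.0, 0.0): 1, (2.0, 0.0): 1})
```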
Supplementary Note S4: Success prediction
We provide here all details on the formulation of the predictive problem, the Naive Bayes Classifiers used in the paper, and the prediction evaluation metrics.
Formulation of the predictive problem
In line with the literature on the prediction of popularity in online systems Martin et al. (2016); Shulman, Sharma, and Cosley (2016); Hofman, Sharma, and Watts (2017), we aim to address the following question: Can we use the behavioral and socioeconomic traits of stores’ early customers to predict the eventual popularity of the store? In other words, we peek into the stores’ early activity data, and we aim to use this information to predict the stores’ eventual success. Formulating the related predictive problem requires two design choices: (1) How much peeking into early activity in a store is allowed, and (2) which metric of store success we aim to predict Shulman, Sharma, and Cosley (2016); Hofman, Sharma, and Watts (2017).
As for (1), ideally, we would like to look at a period of early activity that is short enough that the eventual success of the store is not yet evident. Besides, we would like to exclude from the predictive problem stores that received only a few customers and, for this reason, are unlikely to become successful in the future. Motivated by these considerations, we peek into the stores’ earliest first-time customers, where we vary from to ; when we consider the earliest first-time customers, only stores that received at least first-time customers are included in the analysis.
As for (2), a broad range of predictive problems emerges depending on the dimension of store success that we aim to predict. In this work, we focus on the stores’ popularity, , defined by their total number of first-time customers (i.e., the total number of people who purchase in the store at least once). However, we cannot use directly to define the group of successful stores, as is strongly influenced by store age and MCC (see Supplementary Note S2). One faces an analogous issue when measuring the impact of scientific papers through citation-based indicators (which often results in metrics of impact that are biased by paper age and scientific field Vaccario et al. (2017); Liao et al. (2017)), or when attempting to compare the performance of different innovation diffusion processes that started at different points in time Muller and Peres (2019). To factor out these confounding effects, we define the group of popular stores as the group of stores that are ranked among the top- stores by cumulative number of first-time customers, among stores with the same MCC that received their first transaction in the same month. The cumulative number of first-time customers is measured throughout the complete validation period. Assessing the popularity of each store only in relation to stores of the same MCC and age directly removes the bias of without the need to know the distribution of .
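The cohort-based definition of popular stores can be sketched as follows: stores are grouped by MCC and by the month of their first transaction, and the top fraction of each cohort by number of first-time customers is labeled as popular. The input layout, the rounding rule (ceiling), and the tie-breaking rule are assumptions made here for illustration, and the threshold is left as a parameter.

```python
from collections import defaultdict
from math import ceil


def popular_stores(stores, top_fraction):
    """Label as popular the top stores within each (MCC, first month) cohort.

    stores: {store: (mcc, first_month, n_first_time_customers)}.
    Ties are broken by store identifier, which is an arbitrary choice.
    """
    cohorts = defaultdict(list)
    for store, (mcc, first_month, n_customers) in stores.items():
        cohorts[(mcc, first_month)].append((n_customers, store))

    popular = set()
    for members in cohorts.values():
        members.sort(reverse=True)
        k = ceil(top_fraction * len(members))
        popular.update(store for _, store in members[:k])
    return popular


# Hypothetical stores with the same MCC label, opened in two different months.
stores = {
    "a": ("mcc_A", "2017-06", 100),
    "b": ("mcc_A", "2017-06", 10),
    "c": ("mcc_A", "2017-07", 5),
    "d": ("mcc_A", "2017-07", 50),
}
```

In this example, store `d` is labeled as popular even though it has fewer customers than `a`, because it is only compared with stores of the same age and MCC.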
Putting together (1) and (2), we formulate a binary classification problem where we observe the earliest first-time customers of a store within the validation period (excluding the last two months), and we aim to predict whether the store will end up in the group of the popular stores. Importantly, to validate the predictions, we only consider stores that received their first purchase during the validation period (excluding the last two months), whereas the eight different classes of top-individuals defined in the main text and Supplementary Notes S2–S3 are detected within the training period.
Naive Bayes Classifiers (NBCs)
To quantify the predictive performance of different groups of individuals, we consider Naive Bayes Classifiers (NBCs) Sarigöl et al. (2014). For each group of relevant individuals (e.g., the set of discoverers), we consider the simplest possible classification rule: a store is classified as successful if at least one individual from the group was among its earliest customers, and as non-successful otherwise. Such a simple classifier allows us to compare the predictive power of the eight groups of relevant individuals considered here. From a machine-learning standpoint, this classifier requires no model training: it only takes as input the list of individuals detected within the training period. It is useful to introduce a random classifier that provides us, for each prediction evaluation metric, with a baseline performance metric. Such a random classifier is simply a random guess: each store is classified as successful with probability , as non-successful with probability .
Beyond this simple classifier, we also consider “two-dimensional” NBCs that result from the joint presence of individuals from two groups of relevant individuals. For each pair of groups of relevant individuals (e.g., the set of discoverers and the set of social hubs), we implement the following classification rule: a store is classified as successful if and only if at least one individual from the first group and one individual from the second group were among its earliest first-time customers, and as non-successful otherwise. Compared to the one-dimensional NBCs, this rule is stricter, which results in a smaller number of stores classified as successful. As shown in Fig. 3 of the main text, this can result in precision improvements that range from marginal to large, depending on the pairs of considered individuals and store category.
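The two classification rules can be sketched in a few lines. The function names and the representation of groups and early customers as collections of identifiers are assumptions made here for illustration.

```python
def classify_1d(early_customers, group):
    """One-dimensional rule: the store is classified as successful if at
    least one of its early customers belongs to the group."""
    return not set(group).isdisjoint(early_customers)


def classify_2d(early_customers, group_a, group_b):
    """Two-dimensional rule: the store is classified as successful only if
    both groups are represented among its early customers."""
    return classify_1d(early_customers, group_a) and classify_1d(
        early_customers, group_b
    )


# Hypothetical groups detected in the training period.
discoverers = {"u1", "u2"}
hubs = {"u3"}

# Early customers of two hypothetical stores in the validation period.
store_x = ["u1", "u7", "u9"]
store_y = ["u2", "u3", "u8"]
```

Here `store_x` is classified as successful by the discoverer-based rule alone, whereas only `store_y` satisfies the stricter joint rule.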
Prediction evaluation metrics
Therefore, the ratio between the precision of the NBC of a given group and the expected precision of the random classifier (or equivalently, the ratio between the group’s success rate and the baseline success rate) can be interpreted as the fold increase in success rate (see main text). The terminology “fold change” is typically used in biology; in our study, it is particularly convenient because it effectively describes the fact that the stores with individuals from a given group among their early customers may exhibit increased odds of success. Groups of individuals with a success-rate fold increase substantially larger than one can be interpreted as predictors of success.
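Recall. The recall is commonly used in information retrieval and is typically considered a complementary metric to the precision. It is defined as the fraction of successful items that are labeled as positive Powers (2011):
$$r = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}},$$
where $\mathrm{TP}$ and $\mathrm{FN}$ denote the numbers of true positives and false negatives, respectively.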
In our study, the recall metric naturally favors groups of individuals who purchased in many different stores, because they are more likely to label a store as positive and, therefore, to label the successful ones as positive. However, the metric is still informative to have a full understanding of the predictions by the different groups of individuals: While in the main text we assessed whether stores that received an early purchase by a discoverer are more likely to be successful, the recall metric informs us about how many successful stores we can hope to detect by tracking the discoverers.
Positive likelihood ratio. The positive likelihood ratio is commonly used in medicine and diagnostic testing McGee (2002). For our problem, it is defined as the probability that a popular store is classified as positive divided by the probability that a non-popular store is classified as positive. It can be expressed in terms of recall and specificity Powers (2011):
$$\mathrm{LR}^{+} = \frac{r}{1 - s},$$
where $r$ denotes the recall (or true positive rate) and $s$ denotes the specificity (or true negative rate).
Matthews’ correlation coefficient. As the number of popular stores is substantially smaller than the total number of stores, traditional prediction-evaluation metrics like accuracy and the F1-score become ineffective for evaluating classifiers’ predictive performance. It has been shown numerically Boughorbel, Jarray, and El-Anbari (2017); Luque et al. (2019) that the Matthews’ correlation coefficient is substantially less sensitive to data imbalance than the accuracy and the F1-score and, therefore, it is preferable Chicco (2017). The Matthews’ correlation coefficient, $\phi$, is defined by the equation Matthews (1975)
$$\phi = \frac{\mathrm{TP} \cdot \mathrm{TN} - \mathrm{FP} \cdot \mathrm{FN}}{\sqrt{(\mathrm{TP} + \mathrm{FP})(\mathrm{TP} + \mathrm{FN})(\mathrm{TN} + \mathrm{FP})(\mathrm{TN} + \mathrm{FN})}},$$
where $\mathrm{TP}$, $\mathrm{TN}$, $\mathrm{FP}$, and $\mathrm{FN}$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Differently from the accuracy and the F1-score, $\phi$ is robust with respect to variations of relative class size Boughorbel, Jarray, and El-Anbari (2017), which makes it suitable for our classification task, which features imbalanced classes Chicco (2017).
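The evaluation metrics discussed in this note can all be computed from the four entries of the confusion matrix. The sketch below assumes that every denominator is nonzero and uses hypothetical counts; the function and key names are illustrative only.

```python
from math import sqrt


def evaluate(tp, fp, fn, tn):
    """Evaluation metrics from confusion-matrix counts.

    tp: popular stores classified as successful
    fp: non-popular stores classified as successful
    fn: popular stores classified as non-successful
    tn: non-popular stores classified as non-successful
    Assumes that no denominator below is zero.
    """
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)      # success rate of positively classified stores
    baseline = (tp + fn) / n        # expected precision of a random guess
    recall = tp / (tp + fn)         # true positive rate
    specificity = tn / (tn + fp)    # true negative rate
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "fold_increase": precision / baseline,
        "recall": recall,
        "lr_plus": recall / (1 - specificity),
        "mcc": mcc,
    }


# Hypothetical outcome for 100 stores, 20 of which are popular.
metrics = evaluate(tp=8, fp=2, fn=12, tn=78)
```

In this example the classifier labels few stores as positive, so it reaches a high fold increase and positive likelihood ratio while recovering less than half of the popular stores.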
The full results for the parameters adopted in the main text are reported in Fig. S6, and the results for different parameter settings are reported in Figs. S7–S13 and discussed in Supplementary Note S6. Here, we briefly discuss the results in Fig. S5. The results for the success-rate fold increase were already shown in Fig. 2 in the main text and discussed there. The positive likelihood ratio is in almost perfect agreement with the success-rate fold increase. The Matthews’ correlation coefficient is also in good agreement with the success-rate fold increase, although some evident deviations can be observed: for example, store explorers and high-expenditure individuals emerge clearly as the second best-performing group of individuals for eating places and food stores, respectively. The results for recall differ markedly, which reflects the fact that this metric rewards highly active individuals who purchase in many different stores. The store explorers emerge as the best-performing group, followed by the discoverers and the high-expenditure individuals.
In the main text, we focused on the simplest possible predictive classifier based on the early purchases by different groups of individuals. We report here that precision improvements are possible through combinations of these features. The two-dimensional classifiers introduced above indeed tend to outperform the one-dimensional classifier based on the discoverers (Fig. S7). For example, combining the discoverers with one of the three groups of explorers or high-expenditure individuals leads to a success-rate improvement for all three store categories. On the other hand, combining the discoverers with one of the three groups of social hubs leads to a success-rate improvement only for eating places and food stores, but not for clothing stores. This is unsurprising given the poor performance of the three groups of social hubs when considered independently (Fig. 2 in main text), and it suggests that combining pairs of groups of individuals can marginally improve the success rate, but only if both groups have an above-baseline success rate when considered alone.
Supplementary Note S5: Robustness of the predictions
Our predictive analysis has five main parameters. Two parameters are specific to the discoverer detection method (), and three parameters are related to the predictive problem design (). We provide below a description of the parameter variations that we performed, and an overview of the results. We begin with the two parameters of the discoverer detection method:
The duration of the time window over which discoveries can be collected (time window for discoveries), . By definition, a discovery is achieved only when an individual purchases in a store no more than days after the store received its first transaction. In the main text, we set . We subsequently tested (see Fig. S7 for the results). Note that with , a successful store can be discovered throughout the whole training period, regardless of the time at which it receives its first transaction. The results (Fig. S8) indicate that for , the predictive power is only weakly sensitive to . On the other hand, too narrow a time window for discoveries () leads to suboptimal performance.
The percentage of stores that are considered as successful in the discoverer detection procedure, . In the main text, we set . We subsequently tested (see Fig. S8 for the results). Interestingly, the discoverers detected with cannot be interpreted as discoverers of successful stores because with this value of , all stores are candidates for discoveries. Therefore, the detected individuals are better interpreted as persistent early adopters. The results are mixed: for eating places and food stores, a more restrictive definition of success in the training period () can lead to a better out-of-sample performance in terms of precision, Matthews’ correlation, and positive likelihood ratio. On the other hand, for clothing stores, a looser filter for successful stores () and even no filter at all () can achieve better performance than the discoverers used in the main text (). This is in qualitative agreement with the finding that explorative individuals perform well for clothing stores (see Fig. 2C).
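The role of the time window for discoveries can be made concrete with a short sketch. The Python code below counts, for each individual, the purchases made in successful stores within a given number of days after the store’s first transaction. The function name `count_discoveries`, the tuple layout of the transactions, the parameter name `delta_days`, and the toy data are hypothetical. The sketch covers only the counting step; the procedure that turns such counts into a list of discoverers is not reproduced here.

```python
from collections import defaultdict

def count_discoveries(transactions, successful_stores, delta_days):
    """Count discoveries per individual.

    transactions: iterable of (individual, store, day) tuples, where day is
        an integer day index (assumed layout for this sketch).
    successful_stores: set of stores regarded as successful.
    delta_days: length of the time window for discoveries, in days.
    """
    transactions = list(transactions)

    # Day of the first recorded transaction of each store.
    first_day = {}
    for _, store, day in transactions:
        if store not in first_day or day < first_day[store]:
            first_day[store] = day

    discoveries = defaultdict(int)
    counted = set()  # (individual, store) pairs already counted once
    for individual, store, day in transactions:
        if store not in successful_stores or (individual, store) in counted:
            continue
        # A discovery requires a purchase no more than delta_days after
        # the store received its first transaction.
        if day - first_day[store] <= delta_days:
            counted.add((individual, store))
            discoveries[individual] += 1
    return dict(discoveries)

# Toy data: store s3 is not successful, so purchases there never count.
toy = [
    ("a", "s1", 0), ("b", "s1", 10), ("c", "s1", 40),
    ("a", "s2", 5), ("b", "s2", 6), ("a", "s3", 0),
]
print(count_discoveries(toy, {"s1", "s2"}, delta_days=30))
print(count_discoveries(toy, {"s1", "s2"}, delta_days=5))
```

Narrowing the window from 30 to 5 days in this toy example removes one of the discoveries of individual `b`, which mirrors how a narrow window leaves fewer discoveries on which to base the detection.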
We turn our attention to the three parameters related to the predictive problem design:
The percentage of stores that are considered as successful in the prediction evaluation, . In the main text, we set . We subsequently tested (see Figs. S9–S10 for the results). We make a conservative choice and always use the same parameters for the discoverer detection as in the main text (); in principle, better performance might be achieved by tuning the value of to match . There is still a positive predictive signal for the discoverers across the three categories. We note that the discoverers’ signal weakens the most for clothing stores for a more selective success threshold (, Fig. S10); in this setting, the explorers by radius of gyration emerge as better predictors of success.
The percentage of individuals selected as top individuals by each metric, . In the main text, we set . We subsequently tested a more selective threshold, (see Fig. S11 for the results). The results are mostly consistent with those obtained in the main text.
The relative duration of the training and validation period. Denoting by and the durations of the training and validation periods, respectively, we started by performing the full analysis with (from Dec. 2015 to May 2017) and (from June 2017 to May 2018), as described in Supplementary Note S1. Subsequently, we repeated the analysis for a shorter training period (, see Fig. S13) and a longer one (, see Fig. S14). The results indicate that a longer validation period (and, correspondingly, a shorter training period) can lead to a stronger predictive power for the discoverers. In particular, in Fig. S13 and with fewer than early customers included, the discoverers emerge as the top-performing individuals also for clothing stores. The discoverers’ predictive power becomes weaker for a shorter validation period (Fig. S14), yet the discoverers remain the top-performing individuals for eating places and food stores.
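Varying the relative duration of the two periods amounts to moving the boundary between training and validation while keeping the overall time span fixed. The following Python sketch assigns each store to a period according to the date of its first transaction. The month-level ranges follow the text, whereas the exact boundary days, the function names `assign_period` and `new_stores_by_period`, and the toy stores are assumptions made for illustration.

```python
from datetime import date

# Month-level ranges follow the text; the exact first and last days
# are assumptions made for this sketch.
TRAIN_START = date(2015, 12, 1)
VAL_START = date(2017, 6, 1)
VAL_END = date(2018, 5, 31)

def assign_period(first_transaction, train_start=TRAIN_START,
                  val_start=VAL_START, val_end=VAL_END):
    """Return the period in which a store receives its first transaction."""
    if first_transaction < train_start:
        return "pre-filtering"
    if first_transaction < val_start:
        return "training"
    if first_transaction <= val_end:
        return "validation"
    return "out of range"

def new_stores_by_period(first_transaction_by_store, **bounds):
    """Group stores by the period of their first transaction."""
    groups = {}
    for store, day in first_transaction_by_store.items():
        groups.setdefault(assign_period(day, **bounds), []).append(store)
    return groups

# Hypothetical stores and first-transaction dates.
stores = {
    "s1": date(2015, 7, 3),
    "s2": date(2016, 3, 1),
    "s3": date(2017, 6, 1),
    "s4": date(2018, 5, 31),
}
print(new_stores_by_period(stores))
# Moving the boundary earlier shortens training and lengthens validation.
print(new_stores_by_period(stores, val_start=date(2016, 1, 1)))
```

Moving `val_start` earlier enlarges the set of new stores available for validation at the cost of a shorter training period, which is the trade-off explored in this robustness check.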
To summarize, we started by performing the complete analysis with a predefined set of parameters (, results reported in Figs. 2-3 of the main text and Fig. S6), and subsequently assessed the robustness of our predictions against variations of the parameters (Figs. S8–S14). The results are robust with respect to reasonable variations of the parameters, yet some observed variations are highly informative about the role of the various parameters. In particular, compared to the predictive results reported in the main text, factors that can improve the discoverers’ predictive power are: calibrating the definition of success used in the discoverer detection (i.e., ); a longer validation period (and therefore, a larger number of stores used to validate the predictions); a more restrictive threshold () to select the top individuals that are tracked to make the predictions.