From Reddit to Wall Street: The role of committed minorities in financial collective action

07/15/2021 ∙ by Lorenzo Lucchini, et al. ∙ The Alan Turing Institute

In January 2021, retail investors coordinated on Reddit to target short selling activity by hedge funds on GameStop shares, causing a surge in the share price and triggering significant losses for the funds involved. Such an effective collective action was unprecedented in finance, and its dynamics remain unclear. Here, we analyse Reddit and financial data and rationalise the events based on recent findings describing how a small fraction of committed individuals may trigger behavioural cascades. First, we operationalise the concept of individual commitment in financial discussions. Second, we show that the increase of commitment within Reddit predated the initial surge in price. Third, we reveal that initial committed users occupied a central position in the network of Reddit conversations. Finally, we show that the social identity of the broader Reddit community grew as the collective action unfolded. These findings shed light on financial collective action, as several observers anticipate it will grow in importance.




In January 2021, GameStop shares traded on the New York Stock Exchange experienced a classic “short squeeze” Asquith et al. (2005); Barberis and Thaler (2005). As the price jumped sharply, traders who had bet that it would fall (i.e., who had “shorted” the stock) were forced to buy shares in order to prevent even greater losses, thus further fuelling the price rally Barberis and Thaler (2005); Gam (2021); cnb (2021). Victims of the squeeze were professional hedge funds, particularly Melvin Capital Management, which lost 53% of its investments, an estimated total of more than 4 billion USD Chung (2021). The short squeeze was initially and primarily triggered by users of the subreddit r/wallstreetbets (WSB), a popular Internet forum on the social news website Reddit, who managed to translate online discussions into a highly coordinated financial operation.

These events garnered huge attention from the media, professionals, and financial authorities. Notably, the US Treasury Secretary Janet Yellen convened a meeting of financial regulators, including the heads of the Securities and Exchange Commission, the Federal Reserve, the Federal Reserve Bank of New York, and the Commodity Futures Trading Commission, to examine the GameStop squeeze Lawder and Hunnicutt (2021). Cindicator Capital, a fund specialized in digital assets, published a hiring call for a sentiment trader with three years of active trading experience who had been a member of WallStreetBets for more than a year, with karma (a Reddit measure of “how much good the user has done” for the community) above a stated minimum Kochkodin (2021). Finally, the House Committee on Financial Services of the U.S. Congress held a hearing titled ‘Game Stopped? Who Wins and Loses When Short Sellers, Social Media, and Retail Investors Collide’ to discuss the events DFV (2021). The committee called as a witness Reddit user Keith Gill, known as u/DeepFuckingValue on WSB, who had a central role in triggering the collective action. At the hearing, committee members expressed concern about the gamification of investment encouraged by trading platforms such as Robinhood, which are widely adopted by retail investors because of their low commissions. However, how the coordination on WSB took place in the first place remains unclear, despite the importance of clarifying this mechanism in order to assess risks and devise regulations.

In this paper, we analyse discussions on WSB from Nov 27, 2020, to Feb 3, 2021 and investigate how they translated into collective action before and during the squeeze. Motivated by recent theoretical Xie et al. (2011); Niu et al. (2017) and experimental Centola et al. (2018) evidence that minorities of committed individuals may mobilise large fractions of a population Granovetter (1978); Schelling (2006); Xie et al. (2011) even when they are extremely small Iacopini et al. (2021), we investigate whether committed users on WSB had a role in triggering the collective action. To this aim, we operationalise the commitment of a user as an exhibited proof that the user has financial stakes in the asset. We consider specific labels on posts (called flairs) and apply computer vision to classify screenshots posted as proof, in order to identify events of commitment by users. We show that sustained commitment activity systematically predates the increase of GameStop share returns, while simple measures of public attention towards the phenomenon cannot predict the share increase. We also show that the success of the squeeze operation led to a growth of the social identity of WSB participants, despite the continuous flow of new users into the group. Finally, we find that users who committed early occupy a central position in the discussion network, as reconstructed from WSB posts and comments, during the weeks preceding the stock price surge, while more peripheral users show commitment only in the last phases of the saga.

| | Event | Date | Description |
| --- | --- | --- | --- |
| a | GME earnings | 2020-12-08 | GME earnings reports revealed an increase in e-commerce revenues. |
| b | New board | 2021-01-11 | GME announced a renewed Board of Directors, which included experts in e-commerce. |
| c | Citron prediction | 2021-01-19 | Citron Research, a popular stock commentary website, published a piece predicting the value of the GME stock would decrease. Citron Research is managed by Andrew Edward Left, a financial analyst and renowned short seller. |
| d | Elon Musk’s tweet | 2021-01-26 | Business magnate Elon Musk tweeted “Gamestonk!!” along with a link to WSB. |

Table 1: Key events relevant to the GME short squeeze.

The GameStop saga

GameStop (GME) is a U.S. video game retailer which was at the center of the short squeeze in January 2021. The timeline of the events around the squeeze is summarized in Table 1, and it unfolded as follows. In 2019, Reddit user u/DeepFuckingValue entered a long position on GME and started sharing regular updates in WSB. On October 27, 2020, Reddit user u/Stonksflyingup shared a video explaining how a short position held by Melvin Capital, a hedge fund, could be used to trigger a short squeeze. On January 11, 2021, GME announced a renewed Board of Directors, which included experts in e-commerce. This move was widely regarded as positive for the company, and sparked some initial chatter on WSB. On January 19, Citron Research (an investment website focused on shorting stocks) released a prediction that GME’s price would decrease rapidly. On January 22, users of WSB initiated the short squeeze. By January 26, the stock price had increased by more than 600%, and its trading was halted several times due to its high volatility. On the same date, business magnate Elon Musk tweeted “Gamestonk!!” along with a link to WSB. On January 28, GME reached its all-time intra-day high, and more than 1 million of its shares were deemed failed-to-deliver, which sealed the success of the squeeze. A failure to deliver is the inability of a party to deliver a tradable asset, or meet a contractual obligation; a typical example is the failure to deliver shares as part of a short transaction. On January 28, the financial service company Robinhood, whose trading application was popular among WSB users, halted all purchases of GME stock. On February 1 and 2, the stock price declined substantially.

By the end of January 2021, Melvin Capital, which had heavily shorted GameStop, declared that it had covered its short position (i.e., closed it by buying the underlying stock). As a result, it lost 30% of its value since the start of 2021, and suffered a loss of 53% of its investments, i.e., more than 4 billion USD.

The r/wallstreetbets ecosystem

Reddit is a public discussion website structured in an ever-growing set of independent subreddits dedicated to a broad range of topics. Users can submit new posts to any subreddit, and other users can add comments to existing posts or comments, thus creating nested conversation threads. One such subreddit is r/wallstreetbets (WSB), a forum for investors and traders on Reddit, which self-describes with the tagline “Like 4chan with a Bloomberg terminal”. It is dedicated to high-risk trades involving derivative financial products (e.g., options and futures, often leveraged), and is thus not targeted to the beginner investor, but to somewhat experienced retail traders. Created in 2012, it counted more than 10M subscribers as of June 2021 (self-proclaimed ‘degenerates’, but also known as ‘autists’, ‘retards’, and ‘apes’, depending on the type of information shared on the subreddit). As is clear from this description, the WSB community is known for its profane and juvenile humor, and has a well-defined identity that is reinforced by the common use of jargon (e.g., ‘stonks’ for stocks, ‘tendies’ for profits, and ‘diamond hands’ or ‘paper hands’ for people that hold stocks through turbulent times or sell them at the first loss, respectively). The popularity of this forum has increased in recent years (since 2017 especially), possibly also due to the widespread adoption of no-commission brokers and mobile online trading platforms.

The topics of discussion on the forum are varied, but there are some common patterns of behavior which are also described in the FAQ wsb (2021). When submitting a post, a user can apply a category tag called ‘flair’, which serves as an indication of its content. The allowed flairs, together with a short description, are reported in Table 2. The community takes flairs seriously and strictly enforces them (e.g., the FAQ reports that misusing important flairs can lead to getting permanently banned). It is thus very common to find posts containing screenshots of an open position on a risky bet tagged with a YOLO flair, all interspersed with unhinged humorous posts and memes.

The discussion within the subreddit follows a simple post-comment dynamic, where each post separately grows its multi-level comment tree. Each interaction, be it a post or a comment, can additionally receive ‘upvotes’ and ‘downvotes’. While ‘upvoting’ or ‘downvoting’ represents a typical ‘slacktivist’ practice for anonymously expressing one’s position, other users can also choose to ‘award’ prizes to more emphatically recognize a post or comment.

| Flair | Meaning |
| --- | --- |
| YOLO (You Only Live Once) | YOLO flair is for dank trades only. The minimum value at risk must be at least $10,000 in options, or $25,000 in equity. |
| DD (Due Diligence) | The research you have done on a specific company/sector/trade idea. This is a high effort text post. It should include sources and citations. It should be a long post and not just a link to a submission. |
| Discussion | An idea or article that you would like to talk about. Needs to be more involved than “up or down today?” |
| Gain | Use this flair to show off a solid winning trade. Minimum gain is $2,500 for options, $10,000 for shares. You must show or explain your trade. If you have to say something like “position in comments” then it’s a bad screenshot. |
| Loss | Show off a brutal, crushing loss. Minimum loss $2,500 for options, $10,000 for shares. You must show or explain your trade. If you have to say something like “position in comments” then it’s a bad screenshot. |

Table 2: Flairs allowed on r/wallstreetbets and their meaning as per the subreddit guidelines.

Collective attention, Commitment, and Identity on WSB

Figure 1: GME stock returns compared with: (A) number of posts submitted on WSB; (B) number of posts on WSB that showed financial commitment; (C) level of group identity (shaded areas correspond to two standard errors of the daily average).

Figure 1 compares the daily returns of GME (the percent increase of price compared to the previous day) with three quantities calculated over time on a daily basis: i) activity within the community measured as the number of posts submitted on WSB; ii) number of posts on WSB that showed financial commitment towards GME stocks; and iii) level of group identity signaled by language markers in WSB submissions about GME.

The posting activity in the WSB community is characterised by a weekly periodicity that endures stably until the announcement of a renewed Board of Directors (event b). After the first considerable increase in the stock price on January 14, the activity grows noticeably. Posting activity rises exponentially after the second price spike on January 26 and culminates on January 28, two days after the stock valuation reached its maximum. Public attention to the GME phenomenon spreads far beyond the boundaries of Reddit. The number of GME-related tweets follows a similar exponential growth starting on January 27 after Elon Musk’s endorsement (event d) and peaking on January 28 (Figure SI 1(A)). The growing interest on Twitter matches the explosive growth in the number of new subscribers to the WSB subreddit (Figure SI 1(B-C)). Overall, these results support three conclusions. First, collective attention towards GME follows the asset price growth with a delay. Second, despite the collective action being designed and coordinated on Reddit, wide interest was expressed on other social media as well. Last, the discussion that originated on Reddit not only gradually attracted the attention of larger crowds to the topic, but also engaged those crowds to the point of drawing them to the original source of the discussion, the WSB subreddit.

The evolution of commitment over time differs considerably from the growth of collective attention. Figure 1(B) shows the number of daily commitment events measured by counting “Gain”, “Loss”, and “YOLO” posts (i.e., posts with one of these flairs), and the screenshots that WSB users submitted as proof of stake (see Methods for details). Before the new board of directors was announced (event b), WSB users uploaded a few dozen commitment posts per day. The number of commitment posts increases tenfold on the day of the first price spike and keeps growing steadily afterwards. For eleven days, from the first price spike on January 14 until the next spike on January 25, this increase in commitment takes place in the absence of any growth in financial returns. In summary, commitment predates price surges and is sustained even in the absence of gains.

The presence of commitment in the absence of returns raises the question of whether commitment was supported by other processes endogenous to the WSB community. A recent ethnographic study found that active members of WSB use shared linguistic markers and reciprocation of custom awards to express and reinforce the community’s sense of identity Boylston et al. (2021). Identity is a shared sense of belonging to a group Brewer and Gardner (1996) that can influence inter-group behavior Tajfel et al. (1979), not least by fostering cooperation Yamagishi and Kiyonari (2000); Simpson (2006). Identity is often signaled explicitly through symbols Mach (1993) or language cues Ochs (1993). To measure the group identity in GME-related submissions, we used a validated indicator of group identity Tausczik and Pennebaker (2010): for each submission, we calculated the fraction of the first person pronouns that are the plural pronoun “we”, and averaged those fractions across all the submissions of a given day (more details in Methods). Figure 1(C) shows the group identity expressed by GME-related submissions within the WSB community. The signal oscillates heavily until mid-January, due to the relatively low number of submissions. As the number of submissions increases, we detect two peaks. The first peak follows the market analysis from Citron Research (event c) that forecast a drop in GME stock price and antagonized the members of the WSB community by referring to them as “suckers at this poker game”. This finding is in agreement with the theoretical expectation of community identity being created during processes of struggle between social groups Cook-Huffman (2008), in this case between WSB and its detractors. The second peak matches the maximum increase of the stock price, and it is likely caused by the acknowledgment of collective success in performing the short squeeze.
In short, we find that expressions of identity emerged concurrently with the increase of commitment and might have played a role in sustaining it, but identity is unlikely to be the origin of the collective action.
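As an illustration of the group identity indicator described above, the following minimal sketch computes, for each submission, the fraction of first-person pronouns that are “we”, and then averages these fractions per day. The pronoun list, the simple regular-expression tokenizer, and the function names are assumptions made for this example; the word lists and preprocessing actually used may differ.

```python
import re
from collections import defaultdict

# Assumed list of first-person pronouns; the list used in the study may differ.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}


def we_fraction(text):
    """Fraction of first-person pronouns in `text` that are "we".

    Returns None when the text contains no first-person pronoun.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    pronouns = [t for t in tokens if t in FIRST_PERSON]
    if not pronouns:
        return None
    return pronouns.count("we") / len(pronouns)


def daily_identity(submissions):
    """Average the per-submission fraction for each day.

    `submissions` is an iterable of (day, text) pairs. Submissions without
    any first-person pronoun are skipped (an assumption of this sketch).
    """
    per_day = defaultdict(list)
    for day, text in submissions:
        frac = we_fraction(text)
        if frac is not None:
            per_day[day].append(frac)
    return {day: sum(v) / len(v) for day, v in per_day.items()}
```

Skipping submissions that contain no first-person pronoun avoids a division by zero; treating them as zero instead would be another plausible choice and would lower the daily averages.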

Commitment and reach of core vs peripheral authors

The sustained flow of commitment events during the weeks preceding the stock price surge indicates the presence of a minority of committed users. As the interaction between the committed minority and the rest of the community is crucial to the success of a collective action Granovetter (1978); Schelling (2006); Xie et al. (2011), we study the dynamics of social interactions between committed individuals and other WSB users. These interactions occur over a rapidly evolving social network. Figure 2(A) shows a few snapshots of the network of replies over time. In these networks, users are connected if they submitted a comment in reply to the post or comment of another during the time-span considered. As new users join, the number of small disconnected components in the network increases (see Figure SI 4(B-C)), while the connected component tends to cluster around a few popular discussion threads (especially the so-called daily megathreads Boylston et al. (2021)) created with the purpose of summarizing the events of the day and planning future actions. The structural transformation of the network happens abruptly rather than gradually.

Figure 2: The evolution of the GME discussion network. Panel (A) shows three examples of GME discussion networks, reconstructed in different time windows with the same number of nodes, $N = 3000$. Nodes are WSB users, colored according to whether they posted a commitment submission (red) or not (gray). The size of nodes is inversely proportional to their k-shell, i.e. nodes belonging to the core are bigger than peripheral nodes. A link exists between two nodes if one of the two replied to the other at least once. Panels (B) and (C) show two key topological features of networks reconstructed over a rolling time window of 7 days: (B) The heterogeneity of the degree distribution, defined in terms of $\langle k \rangle$ and $\langle k^2 \rangle$, the first and second moments of the degree distribution, where the n-th moment is $\langle k^n \rangle = \sum_k k^n P(k)$ Newman (2010). (C) The average network reciprocity. Shaded areas represent standard deviations of the network metrics aggregated on a daily rolling window basis.

We quantify this structural change by reconstructing networks over a rolling time window of 7 days, and looking at the evolution of two key topological quantities of these networks in time. First, we observe that the heterogeneity of the distribution of the nodes’ degree (i.e., the number of different users each user replies to) Newman (2010) increases three-fold in the span of 20 days after event c, thus reflecting the simultaneous emergence of super-hubs of discussion together with users engaging only in isolated interactions. Second, the direct reciprocity of interaction (i.e., the fraction of replies that are reciprocated within the time window considered) gets roughly halved in the same time span. This signal, combined with the increase in expressions of group identity (Figure 1(C)), is compatible with the emergence of generalized reciprocity Yamagishi and Kiyonari (2000), a norm according to which individual messages are not expected to receive direct responses; comments are not perceived as pieces of a conversation but rather as contributions to a collective discussion from which everyone benefits.
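The two quantities discussed above can be sketched on a list of reply pairs as follows. This is only an illustration under stated assumptions: heterogeneity is computed here as the ratio of the second moment of the degree to the squared first moment, which is one common definition and may not be the exact one used for Figure 2; the degree is the number of distinct users a user replies to; and reciprocity is the fraction of distinct directed reply links that have a link in the opposite direction. The function names and the input format are hypothetical.

```python
def degree_heterogeneity(replies):
    """Second moment of the degree divided by the squared first moment.

    `replies` is an iterable of (author, replied_to) pairs. The degree of a
    user is the number of distinct other users they replied to. The ratio
    <k^2> / <k>^2 is an assumed definition for this sketch.
    """
    targets = {}
    for src, dst in replies:
        targets.setdefault(src, set())
        targets.setdefault(dst, set())  # users who only receive replies are nodes too
        if src != dst:                  # ignore self-replies
            targets[src].add(dst)
    degrees = [len(t) for t in targets.values()]
    if not degrees or sum(degrees) == 0:
        return float("nan")
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    return k2 / (k1 * k1)


def reciprocity(replies):
    """Fraction of distinct directed reply links that are reciprocated."""
    edges = {(s, d) for s, d in replies if s != d}
    if not edges:
        return 0.0
    mutual = sum(1 for s, d in edges if (d, s) in edges)
    return mutual / len(edges)
```

Applied to the replies that fall inside each 7-day window, these two functions would give one value per window, which is the kind of time series that panels (B) and (C) of Figure 2 display.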

The complex and dynamic nature of the social network raises the question of what is the typical position of committed users in the network, and whether this position changes over time. To answer this question, we operationalise the notion of network position with the concept of k-core shell Seidman (1983): the set of nodes in which every node is connected with other members of the set with at least k links. This measure is a good indicator of a node’s centrality because it directly gauges embeddedness (the density of connections around it), and it is a good proxy for reachability (how quickly it can be reached from any other node of the network). For each temporal slice of the network, we perform its k-core decomposition (see Methods), and we measure the level of commitment exhibited in each k-core shell. Borrowing from previous work Barberá et al. (2015), we estimate the potential influence that commitment events have on the community at large by measuring not only the volume of commitment events in a shell, but also the number of people that these events reach, namely the number of WSB members who commented on a post that is submitted by a committed node in that shell.
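A k-core decomposition can be obtained with the standard peeling procedure: repeatedly remove the node with the smallest remaining degree and record the largest degree seen so far at the moment of removal. The sketch below is a plain-Python illustration of that textbook procedure on an undirected graph, followed by a helper that sums commitment events per shell. The graph representation and the function names are assumptions for the example and are not taken from the Methods.

```python
def core_numbers(adjacency):
    """Return {node: k-shell index} for an undirected graph.

    `adjacency` maps each node to the set of its neighbours (symmetric).
    Nodes are peeled in order of smallest remaining degree.
    """
    degree = {u: len(nbrs) for u, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Linear scan for the minimum; quadratic overall, fine for a sketch.
        u = min(remaining, key=degree.get)
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for v in adjacency[u]:
            if v in remaining:
                degree[v] -= 1
    return core


def commitment_by_shell(core, commitment):
    """Sum commitment events over the nodes of each shell.

    `commitment` maps node -> number of commitment events; nodes that are
    missing from it count as zero.
    """
    totals = {}
    for node, k in core.items():
        totals[k] = totals.get(k, 0) + commitment.get(node, 0)
    return totals
```

For example, a triangle with one extra node attached to a single corner yields shell index 2 for the three triangle nodes and 1 for the attached node.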

Figure 3 shows how commitment activity and its reach are distributed between users in the core of the network (high k-core shells) and peripheral users (low k-core shells), as a function of time. First, in Figure 3(A-B) we show the fraction of commitment activity and reach that are generated by nodes belonging to an increasingly large number of k-core shells, taken from the core to the periphery. To disentangle the effect of the network’s evolution from the distribution of commitment and reach on the network, and meaningfully compare commitment distribution over networks reconstructed in different periods, we contrast the observed commitment activity and its reach to a null model benchmark which preserves the commitment but randomizes the network’s topology (see Methods). In Figure 3(A-B), the curve being higher (lower) than the benchmark indicates that commitment volume or reach are generated predominantly by nodes in the core (periphery) of the network. For example, in the network of interactions between December 11 and December 18, central nodes are those who pledge more commitment to the GME cause (Figure 3(A)); on the contrary, when considering interactions between January 20 and January 27, commitment comes mostly from peripheral actors (Figure 3(B)).

To get a comprehensive picture of the coreness of committed users over time, we measure the difference between the area below the observed curve and the area below the benchmark curve at a given temporal slice, for all the slices computed on networks reconstructed by using a rolling time window of 7 days. Results are robust to the slicing strategy chosen for constructing the networks (see Supplementary Information (SI), Sec. A.6, Fig. SI 6 and 7). Figure 3(C) shows the value of this difference as a function of time. Relative to the benchmark model, both commitment and reach are concentrated in the network’s core until event c (January 19). From that moment onward, the commitment activity obtains a larger reach within the periphery. While always remaining more concentrated in the core, the commitment activity spreads more and more towards the periphery following event c. Therefore, the committed minority which may have triggered the first price increase in the GME stock is formed by central users in the discussion network unfolding on WSB. Only in the last phases of the collective action, when the price has already increased considerably, do peripheral users step in and show commitment, which reaches more peripheral peers.
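The core-versus-periphery comparison can be sketched as follows: build the cumulative fraction of commitment as shells are added from the core outwards, compute the area under that curve, and subtract the average area obtained from a benchmark. Note that the benchmark in this sketch is a simplified stand-in, not the null model of the Methods: it keeps each node's shell fixed and shuffles the commitment values across nodes. The function names, the trapezoidal integration, and the number of shuffles are likewise assumptions.

```python
import random


def cumulative_curve(core, commitment):
    """Cumulative fraction of commitment, adding shells from core to periphery.

    `core` maps node -> k-shell index and `commitment` maps node -> number
    of commitment events (missing nodes count as zero). Returns a list of
    (fraction_of_shells, fraction_of_commitment) points starting at (0, 0).
    """
    shells = sorted(set(core.values()), reverse=True)
    per_shell = {k: 0 for k in shells}
    for node, k in core.items():
        per_shell[k] += commitment.get(node, 0)
    total = sum(per_shell.values())
    points, running = [(0.0, 0.0)], 0
    for i, k in enumerate(shells, start=1):
        running += per_shell[k]
        points.append((i / len(shells), running / total if total else 0.0))
    return points


def area(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def area_difference(core, commitment, n_shuffles=200, seed=0):
    """Observed area minus the mean area of a shuffled benchmark.

    The benchmark shuffles commitment values across nodes; this is an
    assumed simplification of a topology-randomizing null model.
    """
    rng = random.Random(seed)
    observed = area(cumulative_curve(core, commitment))
    nodes = list(core)
    values = [commitment.get(u, 0) for u in nodes]
    benchmark = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(values)
        benchmark += area(cumulative_curve(core, dict(zip(nodes, values))))
    return observed - benchmark / n_shuffles
```

With this convention, a positive difference means that commitment is more concentrated in the core than the shuffled benchmark would suggest, and a negative difference means that it sits mostly in the periphery.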

Figure 3: The evolution of commitment and reach. (A-B) Fraction of observed commitment activity (red line) and reach (blue line) produced by nodes belonging to an increasingly large fraction of k-core shells (from core to periphery). Curves constructed using the observed data (solid lines) are compared to those obtained for the benchmark model (dashed lines). Results are shown for two network slices, the first constructed between December 11 and December 18 (A), the second between January 20 and January 27 (B). (C) Average difference between the area below the observed curve and the area below the benchmark curve over time, for commitment (red dashed-dotted line) and reach (blue dashed-dotted line). Shaded areas correspond to the standard deviation computed for each slice. Dashed vertical lines indicate relevant events (see Table 1). For values of difference larger than zero (gray shaded area), activity is concentrated in the core of the network, relative to the benchmark model. Networks are constructed using a sliding window of 7 days.


In this paper we showed that the collective action that originated on Reddit and culminated in the successful short squeeze of GameStop shares was driven by a small number of committed individuals. We operationalised financial commitment on Reddit as providing proof of stakes in a given asset, often in the form of a screenshot. We then showed that events of commitment predated the initial surge in price, which in turn attracted more participants to the GameStop discussion and thus triggered new events of commitment. Finally, we described how initial committed users were part of the core of the network of Reddit conversations, and that the social identity of the broader group of Reddit users grew as the collective action unfolded.

Our study focused on a single, unprecedented event of financial collective action. While this is certainly a limitation, as more events would allow us to corroborate or falsify our findings, a prompt investigation of the GameStop events was in order. The events that unfolded over the course of the few weeks that we analysed in our study caused sustained effects on the market. Seven months later (at the time of writing this manuscript), the value of GameStop stock had risen compared to the beginning of 2021. The price increase inflicted enormous financial losses on multiple hedge funds, one of which was forced to shut down Fletcher (2021).

The influence of retail investors in equity markets is rapidly growing, and they now account for almost as much volume as hedge and mutual funds combined. This rise has been mainly driven by the emergence of commission-free trading platforms that offer the possibility to trade fractions of shares, so that users can start trading even with very small amounts. Moreover, these platforms allow investors to use leverage, by buying and selling options and accessing cheap margin loans from brokerages, in a gamified user experience. This “democratization of trading and investing” is unlikely to disappear any time soon Aramonte and Avalos (2021), so other financial collective actions might be coordinated in the future, possibly through different social media channels.

In this perspective, beyond the role of committed individuals in promoting the coordinated action, our findings have other potential implications to be tested in future research. (i) The fact that initial committed individuals were part of the core of the Reddit discussions implies that the system may be resilient against adversarial attacks where freshly created “committed” bots try to influence the community. (ii) The finding that identity was not the driver of the collective action but, on the contrary, a byproduct of it may imply that successive actions that leverage it might be easier to coordinate. (iii) The change in network structure ensuing from the arrival of new users, who joined the discussion motivated by the initial success of the squeeze, and the corresponding shift of the bulk of commitment and reach from the core to the periphery of the network, highlights the role of the system’s openness and the hierarchies that catalyse a successful collective action.

Taken together, our findings highlight that financial collective action cannot be reduced to the impact of social coordination on financial markets. The effect (and, particularly, the success) of an action has profound consequences on the membership, structure, and dynamics of the original group, whose evolution may in turn have consequences on future actions. Thus, the initial committed individuals trigger a behavioural cascade which is self-sustaining and transforms the group itself. More events and data are needed to clarify this interplay between bottom-up processes of social coordination and financial markets, and this is a direction for future work. Our results represent a first step in this direction, and we anticipate that, as financial collective action is expected to acquire even more importance in the future, they will be of interest to researchers, industry professionals, and regulators.



We used two main sources of data: the activity on the subreddit r/wallstreetbets and the price of GameStop shares, ticker GME.

Reddit is organized in communities, called subreddits, that share a common topic and a specific set of rules. Users subscribe to subreddits, which contribute to the news feed of the user (their home) with new posts. Inside each subreddit, a user can publish posts (also called “submissions”), or comment on other posts and comments, thus creating trees of discussion that grow over time. Users can attach flairs to posts: a set of community-defined tags to define the semantic scope of the post, thus facilitating content search and filtering. Users can assign awards to posts or comments to recognize their value. Awards are sold by Reddit for money, they come in a variety of types, and some of them reward the recipient with money or perks such as access to exclusive subreddits.

We collected all posts and comments submitted to the r/wallstreetbets subreddit from January 2016 up to the beginning of February 2021. We did so by querying the Pushshift API Baumgartner et al. (2020), which stores all Reddit activity over time, using the PMAW wrapper Podolak (2021) (see SI, Sec A.1 for more details). The API returns rich metadata, including the timestamp of submission, the identity of the authors, the text content, and the awards each submission and comment received. We restrict our analysis to posts related to GME by searching for posts containing either in the title or in the text body the word “GME” or “Gamestop” (lowercase occurrences included), together with all the comment trees associated with those submissions. The period on which our study focuses (from November 27, 2020 to February 3, 2021) covers only part of the posts and comments submitted between January 1, 2016 and February 3, 2021.
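The keyword selection described above can be sketched as a case-insensitive match on the title and text body of each post. The field names `title` and `selftext` follow the usual layout of Reddit submission records but should be treated as assumptions here, as should the choice to match whole words only.

```python
import re

# Case-insensitive, whole-word match; whole-word matching is an assumption.
GME_PATTERN = re.compile(r"\b(gme|gamestop)\b", re.IGNORECASE)


def is_gme_related(post):
    """True if the post's title or text body mentions GME or Gamestop.

    `post` is a dict; missing or empty fields are treated as empty strings.
    """
    title = post.get("title") or ""
    body = post.get("selftext") or ""
    return bool(GME_PATTERN.search(f"{title} {body}"))


def select_gme_posts(posts):
    """Keep only the GME-related posts from an iterable of post dicts."""
    return [p for p in posts if is_gme_related(p)]
```

Matching whole words avoids false positives such as “games”, at the cost of missing compound spellings; a plain substring search would make the opposite trade-off.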

We retrieved GameStop daily prices from Yahoo Finance, using the Python library yfinance Aroussi (2021), and computed the daily price return as the daily relative change, $r_t = (p_t - p_{t-1}) / p_{t-1}$, where $p_t$ is the Open price at day $t$.
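As a small illustration, the daily return can be computed from a sequence of daily Open prices as the relative change from one day to the next. The function name and the toy prices in the usage note are assumptions for the example, not actual GME data.

```python
def daily_returns(open_prices):
    """Relative change of the Open price from each day to the next.

    For prices [p_0, p_1, ..., p_n], returns the n values
    (p_t - p_{t-1}) / p_{t-1} for t = 1, ..., n.
    """
    return [(curr - prev) / prev
            for prev, curr in zip(open_prices, open_prices[1:])]
```

For instance, `daily_returns([10.0, 15.0, 12.0])` gives a 50% increase followed by a 20% decrease.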

Quantifying commitment

Event type Count Unique count Authors Unique authors
Table 3: Commitment events per type. The Count column shows the number of posts classified as commitment events because of a “YOLO”, “Gain”, or “Loss” flair, or a screenshot of commitment identified with our machine vision classifier. The Unique count column shows the number of posts uniquely classified by the commitment type.

One of the widely-shared norms in the WSB community is to provide proof of one’s own financial position when initiating a new discussion about investments Boylston et al. (2021). This is commonly achieved by supplementing submissions with screenshots of open positions (typically gains, losses, or orders) taken from online trading applications. We used these screenshots to quantify commitment, as they provide a direct way to identify users who had stakes in financial assets. To gather them, we employ two methods: flairs and screenshots.

We use three flairs to mark posts containing a proof of position: the gain and loss flairs mark gains or losses above a minimum amount (2,500 USD for options and 10,000 USD for shares; see Table 2), and the YOLO flair indicates investment positions with a minimum value at risk (10,000 USD in options or 25,000 USD in equity; see Table 2). Flair-tagged submissions are moderated and are approved only if a relevant screenshot is attached. While it is mandatory for users to attach investment screenshots to have flairs approved, they can also attach screenshots to their submissions without using any flair.

As we are interested in capturing any signal of commitment, regardless of its magnitude, we resort to machine vision to identify commitment screenshots based on their visual content only. We retrieve all the screenshots attached to any of the submissions in our dataset by querying all URLs ending with common image extensions (e.g., .png, .jpg). Out of this set, we randomly sample images and manually inspect them. We mark as positive all the screenshots that display gains, losses, or orders, and as negative all the remaining images, which include a broad variety of content ranging from screenshots of stock prices to memes.

We label positive examples and negative examples. We use this set of labeled images to train a supervised model. Among several classifiers available off-the-shelf that we test, the most accurate is a PyTorch Paszke et al. (2019) implementation of DenseNet Huang et al. (2017), a deep neural network architecture designed for image classification. We initialize DenseNet with weights pre-trained on ImageNet Deng et al. (2009), a widely-used reference dataset of 1.2M labeled images. We then fine-tune the neural network (i.e., update its weights) by training it further on 70% of our labeled images. During fine-tuning, we use the Adam optimizer Kingma and Ba (2014) to minimize cross-entropy loss. We then measure the classifier’s performance on the remaining 30% of the examples by using precision (the fraction of pictures that the classifier labeled as positive that are actually positive), recall (the fraction of positive pictures that the classifier labeled correctly), and F1 score (the harmonic mean of precision and recall). On our validation set, the classifier achieves a precision of , a recall of , and an F1 score of .
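The three evaluation metrics follow directly from the counts of true positives, false positives, and false negatives. The sketch below computes them for binary labels (1 for a commitment screenshot, 0 otherwise). The label vectors are invented for illustration and do not reproduce the classifier's actual predictions.

```python
def precision_recall_f1(true_labels, predicted_labels):
    """Precision, recall, and F1 score for binary labels (1 = positive)."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical held-out labels and classifier outputs.
true_labels = [1, 1, 1, 0, 0, 0, 1, 0]
predicted_labels = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(true_labels, predicted_labels)
print(precision, recall, f1)
```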

We run the classifier on all images from r/wallstreetbets and merge the posts containing images that the classifier marks as positive with the set of flaired posts. In total, following this procedure we identify commitment events. Table 3 shows the number of commitment events divided by event type. Posts can be classified as commitment events by flair type, by pictures of a holding position, or by both at the same time. The “unique count” column shows the contribution to the identification of commitment events coming uniquely from a single commitment type. In Figure SI 2 we show the contributions of each commitment type, revealing that commitments from “YOLO” flairs are the dominant ones except for a three-day period in which the first price surge was followed by a surge in the number of commitment events from “Gain” flairs.
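The merge of flaired posts with classifier-positive posts, and the distinction between the count and the unique count of each event type, can be sketched as below. The post identifiers, the `Screenshot` label, and the input structures are hypothetical and chosen only for illustration.

```python
def count_commitment_events(flair_by_post, screenshot_posts):
    """Merge both sources and count events per type.

    flair_by_post: dict mapping post id -> flair ("YOLO", "Gain", "Loss").
    screenshot_posts: set of post ids flagged by the image classifier.
    Returns (total events, count per type, unique count per type).
    """
    types_by_post = {}
    for post_id, flair in flair_by_post.items():
        types_by_post.setdefault(post_id, set()).add(flair)
    for post_id in screenshot_posts:
        types_by_post.setdefault(post_id, set()).add("Screenshot")

    counts, unique_counts = {}, {}
    for event_types in types_by_post.values():
        for event_type in event_types:
            counts[event_type] = counts.get(event_type, 0) + 1
            # A post contributes to the unique count only when a single
            # type identifies it as a commitment event.
            if len(event_types) == 1:
                unique_counts[event_type] = unique_counts.get(event_type, 0) + 1
    return len(types_by_post), counts, unique_counts


# Hypothetical inputs.
flair_by_post = {"p1": "YOLO", "p2": "Gain", "p3": "YOLO"}
screenshot_posts = {"p1", "p4"}
total, counts, unique_counts = count_commitment_events(flair_by_post, screenshot_posts)
print(total, counts, unique_counts)
```

In this toy input, post `p1` is identified both by its flair and by the classifier, so it counts once towards the total but towards neither unique count.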

Quantifying identity

To capture linguistic expression of identity, we use two methods. First, we resort to a simple word count approach using Linguistic Inquiry and Word Count (LIWC). LIWC is a lexicon of words grouped into categories that reflect social processes, emotions, and basic functions. It is based on the premise that the words people use provide clues to their psychological states. In particular, the abundant use of words in the LIWC category we (i.e., first-person plural subject pronoun) relative to the use of words from the LIWC category I (i.e., first-person singular subject pronoun) is a validated indicator of group identity Tausczik and Pennebaker (2010). Therefore, we measure identity as the number of we pronouns divided by the total number of we and I pronouns occurring in each submission text body. The results obtained with this particular estimator of identity are robust when compared to two alternative methods, which we discuss in SI.
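A minimal sketch of the identity score follows. The LIWC lexicon is not reproduced here, so the two short word lists are hypothetical stand-ins for the we and I categories, and the tokenizer is a simple assumption of this example.

```python
import re

# Hypothetical stand-ins for the LIWC "we" and "I" categories.
WE_WORDS = {"we", "us", "our", "ours", "ourselves"}
I_WORDS = {"i", "me", "my", "mine", "myself"}


def identity_score(text):
    """Share of 'we' words among all 'we' and 'I' words in a text.

    Returns None when the text contains neither kind of pronoun.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    we_count = sum(1 for token in tokens if token in WE_WORDS)
    i_count = sum(1 for token in tokens if token in I_WORDS)
    if we_count + i_count == 0:
        return None
    return we_count / (we_count + i_count)


score = identity_score("We like the stock and we hold. I am not selling my shares.")
print(score)
```

A score close to 1 indicates that a text is dominated by first-person plural pronouns, and a score close to 0 that it is dominated by first-person singular ones.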

Discussion network on WSB

We reconstructed the network of social interactions on r/wallstreetbets; each node represents a user who submits a post or a comment on the subreddit, and each directed link from $i$ to $j$ represents user $i$ commenting on a submission by user $j$. The direction of the link represents an interaction, and it is opposite to the information flow (user $i$ should have read what $j$ wrote to answer, but it is not guaranteed that $j$ will read $i$’s reply). Considering all nodes and interactions on WSB between authors discussing GME, the resulting networked components consist of nodes and directed links, activated over the entire period starting on November 27, 2020 and concluding on February 3, 2021.
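The construction of the directed links can be illustrated with a small sketch. The record fields (`id`, `author`, `parent_id`) and the user names are hypothetical. Linking each comment to the author of the item it replies to, whether a post or another comment, is an assumption of this example.

```python
def build_interaction_edges(items):
    """Return directed edges (commenter, author of the item replied to).

    items: list of dicts with hypothetical keys 'id', 'author', and
    'parent_id' (None for a top-level post).
    """
    author_of = {item["id"]: item["author"] for item in items}
    edges = []
    for item in items:
        parent_author = author_of.get(item["parent_id"])
        if parent_author is None:
            continue  # top-level post, or parent missing from the data
        # The edge points from the user who replies to the user replied to.
        edges.append((item["author"], parent_author))
    return edges


# Hypothetical thread: one post and three comments.
items = [
    {"id": "p1", "author": "alice", "parent_id": None},
    {"id": "c1", "author": "bob", "parent_id": "p1"},
    {"id": "c2", "author": "carol", "parent_id": "c1"},
    {"id": "c3", "author": "alice", "parent_id": "c1"},
]
edges = build_interaction_edges(items)
nodes = sorted({user for edge in edges for user in edge})
print(nodes, edges)
```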

The time at which posts and comments are published can be used to obtain a description of the dynamics of social interaction. We modeled these dynamics through temporal slicing. In particular, we considered a rolling time window of seven days and shifted it by two hours throughout the whole timespan of our dataset, for a total of windows. For each time window, we constructed a network using posts and comments published during that time window. We tested alternative temporal slicing strategies, which we discuss in SI Sec. A.6.
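The temporal slicing can be sketched as follows. The half-open window convention and the rule that only windows fitting entirely within the timespan are kept are assumptions of this example, and the short eight-day timespan is chosen only to keep the output small.

```python
from datetime import datetime, timedelta


def rolling_windows(start, end, width=timedelta(days=7), step=timedelta(hours=2)):
    """List the (window_start, window_end) pairs covering [start, end]."""
    windows = []
    window_start = start
    while window_start + width <= end:
        windows.append((window_start, window_start + width))
        window_start += step
    return windows


def events_in_window(events, window):
    """Select the (timestamp, payload) events falling in a half-open window."""
    window_start, window_end = window
    return [event for event in events if window_start <= event[0] < window_end]


# Hypothetical eight-day timespan: seven-day windows shifted by two hours.
windows = rolling_windows(datetime(2020, 11, 27), datetime(2020, 12, 5))
events = [(datetime(2020, 11, 27, 1), "post"), (datetime(2020, 12, 4, 1), "comment")]
first_slice = events_in_window(events, windows[0])
print(len(windows), first_slice)
```

One network is then built per window from the events selected for it.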

For each slice, we characterized nodes with a number of features, including their age (the time elapsed since their first interaction within the community), in- or out-degree (number of incoming or outgoing edges), their commitment (number of commitment events), and the reach of their commitment (number of users who comment on their commitment events). We also ran $k$-core decomposition Wasserman et al. (1994) on the network of each temporal slice. The algorithm partitions nodes by their core shell (or core number), i.e., the shell $k$, defined as the maximal subgraph in which every vertex has degree at least $k$. The $k$-core decomposition algorithm does not take edge directionality into account, and it considers the degree of a node as the sum of its outgoing and incoming edges.
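A core decomposition that ignores edge direction and uses the sum of in- and out-degree can be sketched with a simple peeling procedure: repeatedly remove the node of smallest remaining degree and record the largest such minimum seen so far. This is an illustrative, unoptimized implementation, not the one used in the study. Collapsing repeated links between the same ordered pair of users and dropping self-loops are assumptions of this example.

```python
from collections import Counter, defaultdict


def core_numbers(directed_edges):
    """Core number of each node, with degree = in-degree + out-degree."""
    adjacency = defaultdict(Counter)
    for source, target in set(directed_edges):  # assumption: one link per ordered pair
        if source == target:
            continue  # assumption: self-loops are ignored
        adjacency[source][target] += 1
        adjacency[target][source] += 1

    degree = {node: sum(links.values()) for node, links in adjacency.items()}
    remaining = set(degree)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbor, multiplicity in adjacency[node].items():
            if neighbor in remaining:
                degree[neighbor] -= multiplicity
    return core


# Hypothetical network: a triangle of users plus one peripheral user.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
core = core_numbers(edges)
print(core)
```

With these conventions, two users who reply to each other are joined by two links, so a reciprocated pair contributes two to each user's degree.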

Null model for random commitment activity. When computing the commitment of nodes as a function of their core number, to assess whether committed users are more central or peripheral in the network, it is important to compare with a null model that takes the network’s topology into account. For this reason, we consider a null model of random commitment in which commitment events are reshuffled randomly over the whole network, while the network’s structure is preserved. The empirical commitment of nodes with core number $k$ is then compared with a uniform distribution of commitment across nodes, which is equivalent to averaging the results over an infinite number of random shuffles.
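Under a uniform reshuffling, every node is expected to hold the same share of commitment events, namely the total number of events divided by the number of nodes. The sketch below compares the empirical mean commitment of each core shell with that expectation. Expressing the comparison as a ratio is an assumption of this example, and the input values are invented.

```python
from collections import defaultdict


def commitment_vs_null(core_number, commitment):
    """Ratio of empirical to expected mean commitment, per core number.

    core_number: dict node -> core number.
    commitment: dict node -> number of commitment events (missing = 0).
    """
    total_commitment = sum(commitment.get(node, 0) for node in core_number)
    if total_commitment == 0:
        raise ValueError("no commitment events to compare")
    # Expected commitment per node under uniform random reshuffling.
    expected_per_node = total_commitment / len(core_number)

    nodes_in_shell = defaultdict(int)
    commitment_in_shell = defaultdict(int)
    for node, k in core_number.items():
        nodes_in_shell[k] += 1
        commitment_in_shell[k] += commitment.get(node, 0)

    return {
        k: (commitment_in_shell[k] / nodes_in_shell[k]) / expected_per_node
        for k in nodes_in_shell
    }


# Hypothetical core numbers and commitment counts.
core_number = {"a": 2, "b": 2, "c": 2, "d": 1, "e": 1, "f": 1}
commitment = {"a": 3, "b": 1, "d": 2}
ratios = commitment_vs_null(core_number, commitment)
print(ratios)
```

A ratio above 1 for a given core number means that nodes in that shell carry more commitment than a random assignment would give them.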


Supplementary Information