Identifying Competition and Mutualism Between Online Groups

07/14/2021
by   Nathan TeBlunthuis, et al.
University of Washington

Platforms often host multiple online groups with highly overlapping topics and members. How can researchers and designers understand how interactions between related groups affect measures of group health? Inspired by population ecology, prior social computing research has studied competition and mutualism among related groups by correlating group size with degrees of overlap in content and membership. The resulting body of evidence is puzzling as overlaps seem sometimes to help and other times to hurt. We suggest that this confusion results from aggregating inter-group relationships into an overall environmental effect instead of focusing on networks of competition and mutualism among groups. We propose a theoretical framework based on community ecology and a method for inferring competitive and mutualistic interactions from time series participation data. We compare population and community ecology analyses of online community growth by analyzing clusters of subreddits with high user overlap but varying degrees of competition and mutualism.


1. Introduction

Although the fact is frequently ignored in social computing scholarship, online groups do not exist in isolation.[1] Indeed, although studying interdependence between online groups is difficult and complex (hill_studying_2019), research in social computing has sought to quantify how online groups share users or topics (datta_identifying_2017; del_tredici_semantic_2018; tan_all_2015; hessel_science_2016), and how such interactions relate to outcomes like the emergence of new groups (tan_tracing_2018), contributions to peer produced knowledge (vincent_examining_2018), and the spread of hate speech (chandrasekharan_you_2017). Although this work has demonstrated that intergroup interactions matter, very little intergroup research has tackled questions of group success, i.e., why some online groups succeed in maintaining active and long-lived participation while most do not. Can intergroup relationships explain whether online groups will grow or decline?

[1] We use the term “online group” instead of “online community” to help avoid confusion with our term “community ecology,” which plays an important conceptual and analytic role in our paper.

Studies in social computing have drawn from organizational ecology to answer this question (wang_impact_2012; zhu_impact_2014; resnick_starting_2012; zhu_selecting_2014). Inspired by the ecological study of biological systems, organizational ecology is an influential body of theory in sociology that studies competition and mutualism among human organizations (hannan_population_1977; baum_ecological_2006). Although ecological studies of firms and social movements have developed a clear and established body of theory with strong empirical support (baum_ecological_2006), similar studies of online groups have yielded inconsistent results that differ both from one context to another and from theoretical predictions. For example, wikis whose memberships overlap with other wikis survived longer (zhu_selecting_2014), but Usenet groups with overlapping memberships failed more quickly (wang_impact_2012).

We argue that these confusing results stem from a conflation of concepts and measures from two distinct strands of theory in organizational ecology: population ecology and community ecology. Both define competition as a form of interdependence that decreases growth and mutualism as one that increases growth. However, population ecology focuses on modeling how overlapping resources among groups affect their subsequent growth, decline, or survival (astley_two_1985; baum_ecological_2006; dobrev_dynamics_2001). It does not attempt to directly study competitive and mutualistic interactions. On the other hand, community ecology recognizes that groups often exist within “ecological communities,” or clusters of highly related entities, and provides an approach for inferring competitive and mutualistic interactions among these. Although the stated goal of ecological research in social computing has been to understand how groups influence each other’s ability to sustain participation, this research has relied exclusively on concepts and measures from population ecology. This paper seeks to explain the puzzling set of findings in ecological social computing research by introducing community ecology.

We do so in a three-part empirical study using a dataset drawn from the 10,000 communities on Reddit with the most contributors to analyze 641 clusters of online groups with overlapping participants. In Study A, we conduct the most important type of population ecology analysis, a test of what is called density dependence theory, and find support for the theory. This analysis suggests that high degrees of user overlap are associated with competition. In Study B, we introduce our method for community ecology analysis that infers networks of competitive and mutualistic interactions by using clustering analysis and vector autoregression (VAR) models of group sizes over time (sims_macroeconomics_1980; canova_var_2007; ives_estimating_2003). We illustrate the method in four case studies and present a large-scale computational analysis showing that mutualistic interactions are far more common than competitive ones. Finally, in Study C, we bring Study A and Study B together to compare population ecology and community ecology by extending the density dependence models from Study A with a variable accounting for competition and mutualism. While we find that adding this variable does not help predict growth, including ecological interactions in our VAR models improves time series forecasting.

We discuss how these findings illuminate differences between population ecology and community ecology and show how the two perspectives are complementary. While Study A suggests that competition is strongest when user overlap is high, Study B finds widespread mutualism among groups with overlapping membership. Although these findings might seem contradictory, they reflect how population ecology studies overlapping resources related to favorable or unfavorable environmental conditions, while community ecology studies competitive and mutualistic interactions playing out in local networks of specific groups. By demonstrating that mutualistic and competitive interactions within clusters of highly related groups are important—and by describing how to measure them—this paper lays the groundwork for future research to investigate and design for interdependence between online groups that supports their growth and success.

2. Related Work

Online groups are important sites for social support (de_choudhury_mental_2014), entertainment (ducheneaut_alone_2006), information sharing (benkler_wealth_2006), and political mobilization of disinformation campaigns and protest movements (choudhury_social_2016; benkler_social_2013; krafft_disinformation_2020). Although an online group’s ability to achieve its goals depends on attracting and retaining contributors, few develop a sizable group of participants (benkler_wealth_2006; dimaggio_social_2001; johnson_emergence_2014; koh_encouraging_2007; kraut_role_2014). Many attempts to explain the success and growth of online groups look to properties of individual groups like characteristics of founders (kraut_role_2014), language use (danescu-niculescu-mizil_no_2013), turnover (dabbish_fresh_2012), and designs for regulating behavior (halfaker_rise_2013; teblunthuis_revisiting_2018).

Recent research suggests that interdependence among online groups is also important to explaining success and failure (cunha_are_2019; kairam_life_2012; tan_all_2015; tan_tracing_2018). For example, banning hate subreddits reduced hate-speech in related subreddits (chandrasekharan_you_2017). In a very different context, there is evidence that Reddit and Stack Overflow receive substantial benefits from activity on Wikipedia (vincent_examining_2018). Our work contributes to this literature by providing a new conceptual lens and statistical method for studying competition and mutualism between online groups.

2.1. Online Groups Depend on Resources

Like prior ecological research in social computing and information systems, we build on resource dependence theory (RDT) (butler_membership_2001; wang_impact_2012). butler_membership_2001 (butler_membership_2001) introduces RDT to argue that growth in online groups is driven by positive feedback as participants contribute resources such as content, information, attention, or social interactions, which motivate further contributions by subsequent participants. That said, online groups do not grow forever and RDT explains that growth is self-limiting because costs of participation increase in larger groups (butler_membership_2001; butler_attraction-selection-attrition_2014).

Ecological approaches recognize that interrelated online groups may share resources with one another in ways that constrain their growth and survival. Rival resources like participants’ time, attention, and efforts raise the possibility of competition because they become unavailable to others when used by one group (benkler_wealth_2006; kubiszewski_production_2010; ostrom_public_1977; romer_endogenous_1990). RDT suggests that declines in online participation can be explained in terms of competition over important rival resources (wang_impact_2012).

On the other hand, online groups also rely on nonrival resources. They can even produce connective and communal public goods like opportunities to communicate and collections of information (fulk_connective_1996) which can be “antirival” when their usefulness increases as a result of others using them (kubiszewski_production_2010; weber_political_2000). For example, the usefulness of a communication network increases as more people join it (fulk_connective_1996; katz_network_1985). Similarly, the usefulness of an information good can increase as more people come to know, refer to, and depend upon it (kubiszewski_production_2010; weber_political_2000). If multiple online groups help build the same connective or communal public goods, they may form mutualistic interactions where contributions to one group may “spill over” and motivate participation in mutualist groups (zhu_impact_2014). Ecological approaches seek to understand how different types of resources will limit or promote growth.

2.2. Population Ecology, Density Dependence and Overlapping Resources

While this paper focuses on the ecological study of online groups, other social computing and HCI scholars have used the term “ecology” (and related concepts like “ecosystem” and “environment”) to denote assemblages of sites, devices, or platforms (nardi_information_1999; wang_coming_2015). We use the term more narrowly in reference to the conceptual and mathematical models of ecological dynamics. In particular, our work builds on a tradition rooted in organizational ecology. First developed in the late 1970s by sociologists studying interactions between firms, organizational ecology was inspired by, and has drawn closely from, ecological studies in biology (hannan_population_1977).

Because online groups bear similarities to traditional organizations, organizational ecology provides a compelling theoretical framework for understanding interdependence among online groups. It has inspired at least three high-quality empirical studies of how resources shared by online groups shape their growth, decline, or survival (wang_impact_2012; zhu_impact_2014; zhu_selecting_2014). These studies draw from the population ecology strand of organizational ecology that studies ecological dynamics within a population of groups. In organizational ecology, populations have been defined as sets of organizations sharing an organizational industry or business model (hannan_organizational_1989). In social computing, populations have been defined as online groups sharing a given social media platform (wang_impact_2012; zhu_impact_2014; zhu_selecting_2014).

While population ecology involves several distinct theoretical propositions, density dependence theory (DDT) is perhaps the most prominent and is the subject of all three prior ecological studies of online groups (wang_impact_2012; zhu_impact_2014; zhu_selecting_2014). DDT models competitive or mutualistic forces in a population of groups as a function of density which, in the earliest and most influential studies of DDT, is simply the size of the population. In this way, DDT assumes that every group in the population is facing the same competitive and mutualistic pressures (aldrich_organizations_2006). However, online groups sharing a platform have diverse topics (kairam_life_2012), norms (chandrasekharan_internets_2018; fiesler_reddit_2018), and user bases (tan_all_2015). Because groups sharing few resources are unlikely to be strongly interdependent, ecological studies of online groups have modeled density dependence based on the concept of overlap density (baum_ecological_2006; dobrev_dynamics_2001; wang_impact_2012; zhu_impact_2014; zhu_selecting_2014). Rather than the number of groups that exist in a population, overlap density measures the extent to which one group’s members or topics overlap with all other groups’. Overlap density thus characterizes a group’s niche or local resource environment defined by its distinctive topic and membership.

DDT proposes a model for the growth of organizational populations that has a similar structure to butler_membership_2001’s (butler_membership_2001) RDT model for the growth of online groups. In DDT, mutualism is the engine of positive feedback driving population growth. Organizational ecologists show how successful organizations in an emerging industry develop nonrival resources like the legitimacy of a business model or industrial know-how that attract new organizations to enter the market (carroll_density_1989; hannan_organizational_1989). Similarly, a population of online groups, such as those sharing a platform, may grow in size as their platform gains in popularity, as established groups spin off new ones, and as useful knowledge develops that can be shared between groups (tan_tracing_2018; zhu_impact_2014).

In RDT, growth of online groups is self-limiting because of the challenges in managing large groups (butler_membership_2001). In DDT, competition among population members over rival resources limits growth (hannan_organizational_1989). DDT thus proposes a trade-off in which low density reflects limited opportunities for mutualistic contributions of nonrival resources like legitimacy, connectivity, and knowledge, but high density reflects competition over rival resources. Therefore, DDT predicts that the relationship between density and positive outcomes like growth or survival is $\cap$-shaped (inverse-U-shaped) (baum_ecological_2006; carroll_density_1989).

Tests of DDT in populations of online groups yield inconsistent results. In wang_impact_2012, user overlap in Usenet newsgroups is associated with decreasing numbers of participants. Similarly, teblunthuis_population_2020 find that topical overlaps between online petitions are negatively associated with participation. By contrast, zhu_impact_2014 find that membership overlap is positively associated with increasing survival of new Wikia wikis. Only zhu_selecting_2014 find support for the $\cap$-shaped relationship predicted by DDT in an enterprise social media platform.

In Study A, we provide a test of DDT using data from Reddit. The classical logic of DDT appears reasonable in the context of Reddit because low overlap density is likely to reflect an impoverished environment lacking in non-rival resources like skills and knowledge of experienced users while a group with high overlap is likely to face competition over its members (zhu_selecting_2014; zhu_impact_2014): (H1) The relationship between overlap density and the growth of online groups is $\cap$-shaped (inverse-U-shaped).

DDT proposes that very high levels of density will decrease growth because of increasing forces of competition within a niche. However, to conclude that groups with the greatest membership overlaps are likely competitors would be to commit a well known statistical fallacy (piantadosi_ecological_1988; robinson_ecological_1950). The density of a group’s environment suggests that it faces competition or mutualism, but it does not tell us which overlapping communities are competitors and which are mutualists. Community ecology overcomes this limitation of DDT.

2.3. Introducing Community Ecology

Perhaps the most natural way to understand the distinction between population ecology and community ecology is in where they believe ecological dynamics like competition and mutualism play out (astley_two_1985). While population ecology locates competition and mutualism within an environmental niche, community ecology locates competition and mutualism in networks of interdependent groups called ecological communities (aldrich_organizations_2006). In organizational ecology, this can mean studying interactions between different organizational populations (e.g. sorensen_recruitment-based_2004; mcpherson_ecology_1983), or networks of interactions between organizations (e.g. powell_network_2005; margolin_normative_2012). While varying conceptions of community ecology are found in the organizational ecology literature (freeman_community_2006), the approach we describe is identical in structure to that taken by aldrich_organizations_2006 and hawley_human_1986.

Community ecology focuses on ecological interactions (aldrich_organizations_2006). Ecological interactions can be mutualistic when one group has a positive influence on the second such that growth in the first group leads to growth in the second. They can also be competitive if one group has a negative effect on the second such that growth in the first group leads to decline in the second. Ecological interactions can be reciprocated if mutualism (or competition) from one group to another group is returned in kind. An ecological interaction can also be mutualistic in one direction and competitive in the other. The competitive or mutualistic interactions in an ecological community are quantified by the community matrix, a central analytical object in community ecology in both biology and organization science (verhoef_community_2010; novak_characterizing_2016; aldrich_organizations_2006).

In Study B, we demonstrate community ecology by inferring networks of ecological interactions in ecological communities on Reddit. Because our understanding of community ecology theory does not suggest hypotheses about what we will find, we conduct an exploratory data analysis to find out whether mutualism or competition among subreddits is more common on Reddit and present case studies illustrating the types of ecological communities we identify.

2.4. Predicting Growth

In Study C we build upon our analyses from Study A and Study B by testing whether community ecology can explain the growth and decline of online groups in ways that population ecology cannot. We do this by analyzing in two different ways whether accounting for ecological interactions helps predict future group sizes. In general, competition for overlapping resources will have no effect on group growth if something besides the overlapping resource limits growth (verhoef_community_2010). For example, two wikis might share a large number of contributors (have high user overlap), but their growth might be limited by a lack of core contributors who perform important administrative tasks like policy making and software administration (zhu_impact_2014). Community ecology relaxes the assumption that competition and mutualism are caused by user overlap density and instead seeks to infer these relationships from data. We test the importance of this conceptual shift for predicting growth by testing two hypotheses. The first uses a model comparison approach to test if adding a measure of ecological interactions to the density dependence models from Study A improves predictions of growth: (H2) A model with ecological interactions and density dependence predicts growth in online groups better than density dependence alone.

Support for H2 may be a relatively low bar for assessing whether ecological interactions are important factors shaping the growth of online groups because of confounding moderator or mediator variables related to the occurrence of ecological interactions. Therefore, we also use a time series forecasting approach to test whether modeling ecological interactions is useful for making time series forecasts of participation in online groups: (H3) The addition of ecological interactions to a baseline time series model improves the forecasting performance. While this does not directly compare population ecology and community ecology, it validates that ecological interactions are important.

3. Materials & Methods

3.1. Data

Our data are drawn from the publicly available Pushshift archive of Reddit submissions and comments, covering December 5th 2005 to April 13th 2020 (baumgartner_pushshift_2020). Within this dataset, we limit our analysis to submissions and comments from the 10,000 subreddits with the highest number of comments. Among subreddits larger than the smallest subreddit included in our dataset, 702 have a majority of submissions marked “NSFW,” which typically indicates pornographic material. As others have done in large-scale studies of Reddit (e.g., datta_identifying_2017), we exclude these subreddits to avoid asking members of our research team to inspect clusters including pornography. The top 10,000 subreddits provide a sufficiently large number of ecological communities for our statistical analysis.

3.2. Study A: Density Dependence Theory

3.2.1. User overlap

User overlap quantifies the degree to which two subreddits ($i$ and $j$) share users. zhu_impact_2014 and wang_impact_2012 both measure user overlap between two groups by counting the number of users contributing to both groups at least once and exclude users who appear in more than 10 groups. In our preliminary analysis, we found that this measure led to similarity measures and clusters with poor face validity. These issues may have stemmed from how Reddit users often peripherally participate in many groups while participating heavily in few (tan_all_2015; hamilton_loyalty_2017; zhang_community_2017). Therefore, our measure of user overlap follows datta_identifying_2017 by using the number of comments each user makes in each pair of groups.

To measure user overlap between subreddits, we first build user frequency vectors by counting the number of times each user comments in each subreddit. We prevent giving undue weight to subreddits with higher overall activity levels by normalizing the comment-counts for each subreddit by the maximum number of comments by a single author in the subreddit:

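$$\tilde{f}_{u,j} = \frac{f_{u,j}}{\max_{v} f_{v,j}} \qquad (1)$$

where $f_{u,j}$, the user frequency, is the number of times that user $u$ authors a comment in subreddit $j$, and the maximum in the denominator is taken over all users $v$ commenting in subreddit $j$.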

This results in a user frequency vector $\mathbf{f}_j$ for each subreddit that is sparse and high-dimensional, having one element for each user account that comments in any subreddit in our dataset. Next, we use latent semantic analysis (LSA) to reduce the dimensionality of the user frequency vectors. LSA is based on the singular value decomposition and is common in natural language processing and information retrieval. LSA preserves subreddit similarities while removing noise and dealing with sparsity (dumais_latent_2004):

$$F = U \Sigma V^{\top} \qquad (2)$$

$F$ is the matrix where columns are author frequency vectors and $U \Sigma V^{\top}$ is its singular value decomposition. Truncating the singular value decomposition to use only the first $k$ left-singular vectors gives $U_k$. Left-multiplying a subreddit’s author frequency vector by $U_k^{\top}$ transforms the high-dimensional author frequencies into $\hat{\mathbf{f}}_j = U_k^{\top} \mathbf{f}_j$, their approximation in the $k$-dimensional space.

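We then obtain our measure of user overlap $o_{i,j}$ by taking the cosine similarities between the resulting vectors for a pair of subreddits:

$$o_{i,j} = \frac{\hat{\mathbf{f}}_i \cdot \hat{\mathbf{f}}_j}{\lVert \hat{\mathbf{f}}_i \rVert \, \lVert \hat{\mathbf{f}}_j \rVert} \qquad (3)$$

where $\lVert \hat{\mathbf{f}}_j \rVert$ is the euclidean norm of the transformed user frequencies for subreddit $j$.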
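The three steps above (normalization, LSA, cosine similarity) can be sketched on a toy matrix. This is a minimal illustration rather than the authors' pipeline: the function name `user_overlap`, the dense NumPy representation, and the toy counts are assumptions, and a real analysis over millions of accounts would need sparse matrices and a truncated SVD solver.

```python
import numpy as np

def user_overlap(counts: np.ndarray, k: int) -> np.ndarray:
    """counts[u, j] is the number of comments by user u in subreddit j.

    Returns a (subreddits x subreddits) matrix of cosine similarities
    computed in a k-dimensional LSA space.
    """
    # Normalize each subreddit (column) by its most active commenter.
    col_max = counts.max(axis=0, keepdims=True)
    F = counts / np.where(col_max == 0, 1, col_max)
    # Singular value decomposition; keep the first k left-singular vectors.
    U, _, _ = np.linalg.svd(F, full_matrices=False)
    reduced = U[:, :k].T @ F  # shape (k, subreddits)
    # Cosine similarity between the reduced subreddit vectors.
    norms = np.linalg.norm(reduced, axis=0, keepdims=True)
    unit = reduced / np.where(norms == 0, 1, norms)
    return unit.T @ unit

# Toy data (hypothetical): subreddits 0 and 1 share their main users,
# subreddit 2 is dominated by different users.
counts = np.array([
    [10, 5, 0],
    [4, 2, 0],
    [0, 0, 3],
    [0, 1, 6],
])
overlap = user_overlap(counts, k=2)
print(np.round(overlap, 2))
```

In this toy case the overlap between subreddits 0 and 1 is much higher than between subreddits 0 and 2, which is the pattern the measure is meant to capture.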

3.2.2. Growth

Growth is the dependent variable in our density dependence models testing H1 and is also used in our test of H2 as part of Study C. Growth is measured as the change in the (log-transformed) size of a subreddit over the final 24 weeks of our data, from November 4th 2019 to April 13th 2020.

3.2.3. Overlap density

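Overlap density is the normalized average user overlap for a given subreddit. It is the independent variable in our density dependence model testing H1:

$$d_j = \frac{1}{|J|} \sum_{i \in J,\; i \neq j} o_{i,j} \qquad (4)$$

where $J$ is the set of groups in our dataset and $o_{i,j}$ is the user overlap between subreddits $i$ and $j$.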
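As a small illustration, overlap density can be computed from a matrix of pairwise user overlaps. The sketch below assumes the normalization is a division by the number of groups; the function name `overlap_density` and the toy matrix are hypothetical.

```python
import numpy as np

def overlap_density(overlap: np.ndarray) -> np.ndarray:
    """Average user overlap of each subreddit with every other subreddit.

    `overlap` is a symmetric matrix of pairwise user overlaps. Dividing by
    the number of groups is an assumption about the normalization.
    """
    n_groups = overlap.shape[0]
    # Exclude each subreddit's overlap with itself (the diagonal).
    off_diagonal_sum = overlap.sum(axis=1) - np.diag(overlap)
    return off_diagonal_sum / n_groups

toy_overlap = np.array([
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])
density = overlap_density(toy_overlap)
print(density)
```

The middle subreddit, which overlaps most with both others, receives the highest density.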

3.2.4. Regression model for H1

To test H1, we fit Model 1 which has first and second-order terms for overlap density to allow for a curvilinear relationship between overlap density and growth.

$$Y_j = B_0 + B_1 d_j + B_2 d_j^2 + \epsilon_j \qquad (5)$$

where $Y_j$ is the growth of subreddit $j$ and $d_j$ is its overlap density.
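A quadratic model of this kind can be fit by ordinary least squares. The sketch below uses synthetic data and hypothetical helper names (`fit_density_dependence`, `is_inverse_u`); it checks only the shape implied by the point estimates and says nothing about statistical significance, which a real test of H1 would require.

```python
import numpy as np

def fit_density_dependence(density, growth):
    """Least-squares fit of growth = b0 + b1 * d + b2 * d**2."""
    d = np.asarray(density, dtype=float)
    y = np.asarray(growth, dtype=float)
    X = np.column_stack([np.ones_like(d), d, d ** 2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # b0, b1, b2

def is_inverse_u(coef, d_min, d_max):
    """True if the fitted curve peaks strictly inside (d_min, d_max)."""
    _, b1, b2 = coef
    if b2 >= 0:
        return False
    peak = -b1 / (2 * b2)
    return bool(d_min < peak < d_max)

# Synthetic, noise-free example with a peak at d = 0.5.
d = np.linspace(0.0, 1.0, 21)
y = 0.1 + 2.0 * d - 2.0 * d ** 2
coef = fit_density_dependence(d, y)
print(coef, is_inverse_u(coef, d.min(), d.max()))
```

Requiring the turning point to fall within the observed range of overlap density is a design choice here: a negative second-order term alone would also be consistent with a monotone, merely concave relationship.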

3.3. Study B: Introducing Community Ecology

3.3.1. Clustering to identify ecological communities

Analyzing networks of ecological interactions is the key difference between community ecology and population ecology. To identify ecological communities of related subreddits, we use a clustering procedure on the user overlap measure described above in §3.2.1. We selected a clustering model using grid-search to obtain a high silhouette coefficient (rousseeuw_silhouettes_1987). The silhouette coefficient captures the degree to which a clustering creates groups of subreddits with high within-cluster similarity.

Our description of our measure for user overlap in §3.2.1 does not explain how we choose the number of LSA dimensions $k$. To do so, we ran the affinity propagation (frey_clustering_2007), HDBSCAN (mcinnes_hdbscan_2017), and $k$-means clustering algorithms and selected the algorithm, hyperparameters, and LSA dimensions $k$ that resulted in the clustering with the best silhouette coefficient having fewer than 5,000 isolated subreddits and at least 50 clusters. We limit the number of isolated subreddits because some choices of hyperparameters for the HDBSCAN algorithm could improve the silhouette coefficient, but at the cost of greatly increasing numbers of isolated subreddits. Choosing a relatively high limit to the number of isolates helps ensure that our clusters contain highly related communities. We chose an HDBSCAN clustering with 731 clusters, 4964 isolated subreddits, LSI dimensions, and a silhouette score of 0.48. We exclude the isolated subreddits from our analysis. More details about our clustering selection process are found in the online supplement.
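The selection rule described above (best silhouette coefficient subject to limits on isolates and cluster count) can be sketched as follows. This is an illustration under stated assumptions, not the authors' grid search: candidate clusterings are supplied as precomputed label arrays, the label `-1` marks an isolated subreddit, the silhouette is computed over non-isolated points only, and the names `silhouette` and `select_clustering` are hypothetical.

```python
import numpy as np

def silhouette(dist: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient from a pairwise distance matrix."""
    clusters = np.unique(labels)
    scores = []
    for i, own in enumerate(labels):
        same = labels == own
        if same.sum() == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = dist[i, same].sum() / (same.sum() - 1)  # mean within-cluster distance
        b = min(dist[i, labels == c].mean() for c in clusters if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))

def select_clustering(dist, candidates, max_isolates, min_clusters):
    """Pick the candidate with the best silhouette among those that have
    fewer than `max_isolates` isolates and at least `min_clusters` clusters."""
    best_name, best_score = None, -np.inf
    for name, labels in candidates.items():
        labels = np.asarray(labels)
        kept = labels != -1
        n_clusters = len(np.unique(labels[kept]))
        if (~kept).sum() >= max_isolates or n_clusters < max(min_clusters, 2):
            continue
        score = silhouette(dist[np.ix_(kept, kept)], labels[kept])
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Toy example: six points on a line forming two tight groups.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
dist = np.abs(points[:, None] - points[None, :])
candidates = {
    "good": [0, 0, 0, 1, 1, 1],
    "bad": [0, 1, 0, 1, 0, 1],
    "sparse": [0, 0, -1, -1, 1, 1],  # tighter clusters, but two isolates
}
best, score = select_clustering(dist, candidates, max_isolates=2, min_clusters=2)
print(best, round(score, 3))
```

The "sparse" candidate would score slightly higher on silhouette alone, which illustrates why a cap on isolates is needed: dropping hard-to-place points makes the remaining clusters look cleaner.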

We evaluate the external validity of the chosen clustering using the purity evaluation criterion (manning_introduction_2018). To do so, an undergraduate research assistant examined a random sample of 100 clusters including 744 subreddits. By visiting the subreddits and using her own judgment, the assistant flagged subreddits that did not seem like a good fit for their assigned cluster. Using these labels and excluding 25 subreddits that have been deleted, made private, or banned, we calculated the purity of our clustering as 0.92. This means that we believe that 92% of subreddits belong to their assigned cluster.
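The purity calculation described here reduces to the share of inspected subreddits that were not flagged. A minimal sketch, with a hypothetical function name and made-up labels (the values below are not the study's data):

```python
def cluster_purity(judgments):
    """Share of inspected subreddits judged to fit their assigned cluster.

    `judgments` maps a subreddit name to True (fits), False (flagged as a
    poor fit), or None (could not be inspected, e.g. deleted or banned).
    """
    inspected = [fits for fits in judgments.values() if fits is not None]
    return sum(inspected) / len(inspected)

# Hypothetical labels for ten sampled subreddits.
toy_judgments = {f"sub_{i}": True for i in range(8)}
toy_judgments["sub_8"] = False   # flagged by the rater
toy_judgments["sub_9"] = None    # unavailable, excluded from the denominator
purity = cluster_purity(toy_judgments)
print(round(purity, 3))
```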

3.3.2. Group size

Group size is the dependent variable of the models we use to infer ecological interactions. Measured as the number of distinct commenting users in a subreddit each week, group size quantifies the number of people who participate in a subreddit over time. Typical of social media participation data, group size is highly skewed. Therefore, we transform it by adding 1 and taking the natural logarithm.
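For concreteness, the transformed group size can be computed from a stream of comments as below. The input format (pairs of week index and author name) and the function name are assumptions made for this sketch.

```python
import math
from collections import defaultdict

def weekly_group_size(comments):
    """Map each week to log(1 + number of distinct commenting users).

    `comments` is an iterable of (week, author) pairs for one subreddit.
    """
    authors_by_week = defaultdict(set)
    for week, author in comments:
        authors_by_week[week].add(author)
    return {
        week: math.log1p(len(authors))
        for week, authors in sorted(authors_by_week.items())
    }

toy_comments = [(0, "alice"), (0, "bob"), (0, "alice"), (1, "carol")]
sizes = weekly_group_size(toy_comments)
print(sizes)
```

Weeks with no comments do not appear in the output of this sketch; a full time series would fill them in with `math.log1p(0)`, which is 0.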

3.3.3. Inferring ecological interactions using Vector Auto Regression

The community matrix $\Phi$ of ecological interactions can be inferred from time series data using vector autoregression models (VAR models). VAR models are a workhorse in biological ecology because VAR(1) models (i.e. VAR models with a single autoregressive term) have a close relationship to the Gompertz models of population growth which are widely used in ecology (ives_estimating_2003). Even in the presence of unmodeled nonlinearities, VAR(1) models can reliably identify competition or mutualism in empirically realistic scenarios (certain_how_2018). VAR models have also been widely adopted in the social sciences, particularly in political science and in macroeconomics (box-steffensmeier_time_2014).

VAR(1) models can be intuitively understood as a generalization of auto-regressive AR(1) models in time series analysis. But while AR(1) models predict the state of a single time series as a function of its previous value, VAR(1) models simultaneously predict multiple time series as a function of the previous values of every variable in the system (canova_var_2007; ives_estimating_2003):

$$Y_t = B_0 + B_1 t + \sum_{i \in M} \phi_i \, y_{i,t-1} + \epsilon_t \qquad (6)$$

where $Y_t$ is a vector containing the sizes of a set of online groups ($M$) at time $t$. $B_0$ is the vector of intercept terms and $B_1$ is the vector of linear time trends ($b_{1,j}$) for each community ($j$). $\phi_{i,j}$ represents the influence of $y_{i,t-1}$, the size of the online group $i$ at time $t-1$, on $y_{j,t}$. $\phi_i$ is a column of $\Phi$, a matrix of coefficients in which the diagonal elements correspond to intrinsic growth rates (marginal to the trend) for each online group and the off-diagonal elements are inter-group influences, and $\epsilon_t$ is the vector of error terms.

Additional time-dependent predictors ($x_{j,t}$) can be included in the vectors $X_t$ with coefficients $K$, which adds a term $K X_t$ to the model. Because subreddits are created at different times, growth trends must begin only after the subreddit is created. We use $X_t$ to introduce a counter-trend during the period prior to the creation of subreddits so that each group’s growth trend begins in the period the group is created. For each group $j$ created at time $t^0_j$ we fill $x_j$ with the sequence $x_{j,t} = t$ for $t < t^0_j$ and $x_{j,t} = 0$ thereafter. In other words, $x_j$ adds a counter-trend only during the period prior to the first comment in subreddit $j$. We fix elements $\kappa_{i,j}$ of $K$ equal to 0 unless $i = j$, so the counter-trend only influences subreddit $j$. This effectively sets $\kappa_{j,j}$ approximately equal to $-b_{1,j}$, the negative of the linear time trend for group $j$.

We fit VAR(1) models using ordinary least squares as implemented in the vars R package to predict the group size each week over the history of each subreddit prior to November 4th 2019 (pfaff_var_2008). We hold out 24 weeks of data for forecast evaluation and fit our models on the remainder. To ensure that sufficient data is available for fitting the models, we exclude 946 subreddits and 89 clusters having fewer than 156 weeks of activity.
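To make the estimation step concrete, the following sketch fits a VAR(1) model with an intercept and a linear trend by ordinary least squares in NumPy. It is a stand-in for illustration, not the vars R package used in the study, and it omits the counter-trend predictors. The function name `fit_var1`, the simulated data, and the matrix convention (`Phi[j, i]` is the effect of group `i` at week `t-1` on group `j` at week `t`) are assumptions.

```python
import numpy as np

def fit_var1(Y: np.ndarray):
    """OLS fit of Y[t] = b0 + b1 * t + Phi @ Y[t-1] + error.

    Y has shape (T, n): log group sizes of n groups over T weeks.
    Returns (b0, b1, Phi) with Phi[j, i] the effect of group i on group j.
    """
    T, n = Y.shape
    trend = np.arange(1, T)
    X = np.column_stack([np.ones(T - 1), trend, Y[:-1]])  # intercept, trend, lags
    coef, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)       # shape (2 + n, n)
    return coef[0], coef[1], coef[2:].T

# Simulate two groups: group 1 helps group 0, group 0 hurts group 1.
rng = np.random.default_rng(0)
true_phi = np.array([[0.5, 0.2],
                     [-0.3, 0.4]])
b0 = np.array([1.0, 2.0])
b1 = np.array([0.001, 0.0])
T = 2000
Y = np.zeros((T, 2))
Y[0] = [2.0, 3.0]
for t in range(1, T):
    Y[t] = b0 + b1 * t + true_phi @ Y[t - 1] + rng.normal(0.0, 0.1, size=2)

b0_hat, b1_hat, phi_hat = fit_var1(Y)
print(np.round(phi_hat, 2))
```

With a long simulated series the estimated off-diagonal signs match the simulated ones: positive (mutualistic) from group 1 to group 0 and negative (competitive) from group 0 to group 1.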

3.3.4. Characterizing ecological communities

In Study B, we interpret the community matrix $\Phi$ as a directed network of ecological interactions, a competition-mutualism network (ives_estimating_2003). Although the elements of $\Phi$ correspond to direct associations between group sizes (novak_characterizing_2016), ecological interactions can also be indirect. Consider one-directional interactions between three groups ($A$, $B$, $C$) such that growth in $A$ predicts decreased growth in $B$ ($\phi_{A,B} < 0$), growth in $B$ predicts decreased growth in $C$ ($\phi_{B,C} < 0$), but $A$ and $C$ do not directly interact ($\phi_{A,C} = 0$).

This does not necessarily mean that groups A and C are independent. Rather, an exogenous increase in A predicts a decrease in B and thereby an eventual increase in C. Such indirect relationships are analyzed by using impulse response functions (IRFs) to interpret a VAR model (box-steffensmeier_time_2014). In large VAR models containing many groups, the great number of parameters can mean that few specific elements of the community matrix will be statistically significant, even as many weak direct relationships can combine into statistically significant IRFs (canova_var_2007).
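A small worked example may make the indirect pathway concrete. The coefficient values, the symbol $\Phi$ for the community matrix, and the convention that row $j$, column $i$ holds the influence of group $i$ on group $j$ are all assumptions chosen for illustration; none of them are estimates from the data.

```latex
% Groups ordered (A, B, C); hypothetical coefficients.
\Phi =
\begin{pmatrix}
 0.5 &  0   & 0   \\
-0.4 &  0.5 & 0   \\
 0   & -0.4 & 0.5
\end{pmatrix},
\qquad
\Phi^{2} =
\begin{pmatrix}
 0.25 &  0    & 0    \\
-0.4  &  0.25 & 0    \\
 0.16 & -0.4  & 0.25
\end{pmatrix}
```

The direct coefficient from A to C is zero, yet the two-step response of C to a unit impulse in A is $(-0.4)(-0.4) = 0.16 > 0$: A suppresses B, which in turn releases C.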

3.3.5. Average ecological interaction

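measures the extent to which an overall ecological community is mutualistic or competitive by taking the mean point estimate of the off-diagonal coefficients of the community matrix $\Phi$:

$$\text{average ecological interaction} = \frac{1}{|M|(|M|-1)} \sum_{i \in M} \sum_{j \in M,\, j \neq i} \phi_{i,j} \qquad (7)$$

where $\phi_{i,j}$ is the coefficient in row $i$ and column $j$ of $\Phi$ and $|M|$ is the number of groups in the ecological community $M$. If the average ecological interaction is greater than 0, then mutualistic interactions within the ecological community are stronger than competitive ones, and if it is less than 0, then competitive interactions are stronger than mutualistic ones.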

3.3.6. Ecological interaction strength

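quantifies the overall strength of ecological interactions in an ecological community as the mean absolute value of the point estimates of the off-diagonal coefficients of the community matrix $\Phi$:

$$\text{ecological interaction strength} = \frac{1}{|M|(|M|-1)} \sum_{i \in M} \sum_{j \in M,\, j \neq i} |\phi_{i,j}| \qquad (8)$$

where $|\phi_{i,j}|$ is the absolute value of the coefficient $\phi_{i,j}$ in row $i$ and column $j$ of $\Phi$, and $|M|$ is the number of groups in the ecological community $M$.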

Ecological communities of subreddits with overlapping users vary in both the overall strength of ecological interactions and in the overall degree of mutualism and competition between member groups. If an ecological community’s average ecological interaction is positive, we say the ecological community is mutualistic. If it is negative, we say the ecological community is competitive. The average ecological interaction can be close to 0 in two ways. First, ecological interaction strength can simply be low. Alternatively, the ecological community can have a mixture of competitive and mutualistic interactions that cancel one another out when averaged.
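The two summary measures, and the two ways the average can end up near 0, can be illustrated with a short sketch. The function names and both example matrices are hypothetical; the code assumes the community matrix is available as a square nested list of point estimates with at least two groups.

```python
def off_diagonal(matrix):
    """Return the off-diagonal entries of a square matrix given as nested lists."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def average_ecological_interaction(matrix):
    """Mean of the off-diagonal coefficients (positive: mutualistic on balance)."""
    values = off_diagonal(matrix)
    return sum(values) / len(values)


def ecological_interaction_strength(matrix):
    """Mean absolute value of the off-diagonal coefficients."""
    values = off_diagonal(matrix)
    return sum(abs(v) for v in values) / len(values)


# Hypothetical community with strong interactions that cancel when averaged.
mixed = [
    [0.9, 0.3, -0.3],
    [-0.3, 0.9, 0.3],
    [0.3, -0.3, 0.9],
]
# Hypothetical community whose interactions are weak throughout.
weak = [
    [0.9, 0.01, -0.01],
    [-0.01, 0.9, 0.01],
    [0.01, -0.01, 0.9],
]

for label, community in (("mixed", mixed), ("weak", weak)):
    print(label,
          average_ecological_interaction(community),
          ecological_interaction_strength(community))
```

Both communities have an average ecological interaction near 0, but only the first has high ecological interaction strength, which is why the two measures are reported together.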

3.3.7. Impulse response functions

(IRFs) of our VAR(1) models correspond to our visualizations of example competition-mutualism networks in §4.2.1. An IRF predicts how much each group’s size would change in response to a sudden increase in the size of each other group (verhoef_community_2010):

$$\Theta_t = \Phi^{t} I \qquad (9)$$

where $\Theta_t$ is the impulse response function at time $t$. $I$ is an $|M|$-by-$|M|$ identity matrix so our impulses represent a log-unit increase of 1 to each group. $\Theta_t$ is a matrix with elements $\theta_{i,j,t}$ corresponding to the response of group $i$ to the impulse of group $j$. We draw an edge from group $j$ to group $i$ in the competition-mutualism network if the 95% CI of $\theta_{i,j,t}$ does not include zero at any time $t$. If $\theta_{i,j,t} > 0$, the edge indicates mutualism and if $\theta_{i,j,t} < 0$ the edge indicates competition. (In higher-order VAR($p$) models that use $p$ past observations as predictors, $\theta_{i,j,t}$ can be less than 0 for some $t$ and greater than 0 for some other $t$. However, this is not possible in the VAR(1) models we use.)

We compute the IRFs with bootstrapped confidence intervals (CI) based on 1,000 samples using the vars R package.
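For a VAR(1) model with unit impulses, the point estimate of the impulse response at step t is the t-th power of the coefficient matrix. The sketch below computes only those point estimates; it is not the vars implementation and it omits the bootstrapped confidence intervals used to decide which edges to draw. The function names, the two-group coefficient matrix, and the convention that entry `[i][j]` is the response of group `i` to an impulse in group `j` are assumptions.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def var1_irf(phi, horizon):
    """Impulse responses of a VAR(1) model for steps 0..horizon.

    With an identity impulse matrix, the response at step t is phi raised to
    the power t. Entry [i][j] of each returned matrix is the response of
    group i to a unit impulse in group j.
    """
    n = len(phi)
    current = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    responses = [current]          # step 0: the impulse itself
    for _ in range(horizon):
        current = matmul(phi, current)
        responses.append(current)
    return responses


# Hypothetical two-group community: growth in group 1 predicts growth in
# group 0, while group 0 has no influence on group 1.
phi = [
    [0.5, 0.2],
    [0.0, 0.5],
]
irf = var1_irf(phi, horizon=3)
for step, theta in enumerate(irf):
    print(step, theta[0][1], theta[1][0])
```

In this example the response of group 0 to group 1 stays positive at every step after the impulse, so a point-estimate reading would label the edge mutualistic, while the reverse response stays at zero.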

3.4. Study C: Predicting growth

3.4.1. Average subreddit mutualism

is the independent variable for our test of H2 and measures the average influence of other subreddits in the ecological community on a given subreddit $j$, which we calculate by taking the mean of off-diagonal elements of row $j$ of the community matrix $\Phi$:

$$\text{average subreddit mutualism}_j = \frac{1}{|M|-1} \sum_{i \in M,\, i \neq j} \phi_{j,i} \qquad (10)$$

where $M$ is the set of subreddits in the ecological community, $|M|$ is the number of subreddits in $M$, and $\phi_{j,i}$ is the element in row $j$ and column $i$ of $\Phi$. We use the mean instead of the sum because different ecological communities have different numbers of subreddits.
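The per-subreddit measure is a row-wise version of the community-level average. A minimal sketch, assuming the community matrix is a square nested list in which row `j` holds the influences on subreddit `j`; the function name and the example matrix are hypothetical.

```python
def average_subreddit_mutualism(matrix, j):
    """Mean of the off-diagonal entries in row j of the community matrix."""
    others = [value for i, value in enumerate(matrix[j]) if i != j]
    return sum(others) / len(others)


# Hypothetical 3-subreddit community matrix.
community = [
    [0.8, 0.2, 0.1],     # subreddit 0 benefits from both neighbours
    [-0.3, 0.7, 0.0],    # subreddit 1 is hurt by subreddit 0
    [0.05, 0.05, 0.9],   # subreddit 2 is weakly helped by both
]
scores = [average_subreddit_mutualism(community, j) for j in range(len(community))]
print(scores)
```

Dividing by the number of other subreddits, rather than summing, keeps scores comparable across communities of different sizes.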

3.4.2. Regression models for H2

We test H2 by using likelihood ratio tests to compare Model 1 and Model 2 which adds average subreddit mutualism () as a predictor. We also fit Model 3 which we compare to Model 2 to test if overlap density explains variation that average subreddit mutualism does not.

(11)
(12)

where is the growth of subreddit , is its overlap density, is its average subreddit mutualism, and , , , and are regression coefficients.
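The mechanics of a likelihood ratio test between nested models can be sketched as below. The function names are hypothetical, and the chi-squared tail probability is implemented only for 1 and 2 degrees of freedom, where it has a simple closed form. The example plugs in the rounded log likelihoods from Table 1, so the resulting statistics are illustrative and will not exactly match those computed from unrounded values. The degrees of freedom follow from the model descriptions: Model 2 adds one term to Model 1 and two terms (first- and second-order overlap density) to Model 3.

```python
import math


def chi2_sf(statistic, df):
    """Upper-tail probability of a chi-squared variable for df = 1 or 2.

    Only these two cases are implemented because they have closed forms;
    other degrees of freedom would need an incomplete gamma function.
    """
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise NotImplementedError("only df = 1 and df = 2 are supported")


def likelihood_ratio_test(loglik_restricted, loglik_full, df):
    """Return the LR statistic and p-value for two nested models."""
    statistic = 2.0 * (loglik_full - loglik_restricted)
    return statistic, chi2_sf(max(statistic, 0.0), df)


# Rounded log likelihoods as printed in Table 1.
stat_12, p_12 = likelihood_ratio_test(-4970, -4970, df=1)   # Model 1 vs Model 2
stat_32, p_32 = likelihood_ratio_test(-4986, -4970, df=2)   # Model 3 vs Model 2
print(stat_12, p_12)
print(stat_32, p_32)
```

With these rounded inputs, adding average subreddit mutualism to Model 1 yields no detectable improvement, while adding the overlap density terms to Model 3 yields a large one.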

3.4.3. Forecasting growth using ecological interactions

To test H3, we evaluate whether modeling ecological interactions improves time series forecasting of future participation in online groups by comparing the model in Equation 6 to a baseline model with the off-diagonal elements of fixed to 0. This baseline model is equivalent to our VAR model, but excludes ecological interactions.

We use two forecasting metrics with differing assumptions: root-mean-squared-error (RMSE) and the continuous ranked probability score (CRPS). RMSE is commonly used, non-parametric, and intuitive, but does not take differing scales of the predicted variable or forecast uncertainty into account. Thus, in our setting it may place excessive weight on forecasts of larger subreddits where errors may have greater magnitude simply because the absolute magnitude of the variance is greater. By rewarding forecasts where the true value has high probability under the predictive distribution, the CRPS accounts for variance in the data and rewards forecasts for both accuracy and precision and is thus a “proper scoring rule” for evaluating probabilistic forecasts (gneiting_strictly_2007). Our CRPS calculations assume that the predictive forecast distribution for each community is normal with standard deviations given by the 68.2% forecast confidence interval. We calculate CRPS using the scoringRules R package (jordan_evaluating_2019).
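The two metrics can be sketched in a few lines. This is an illustration rather than the scoringRules implementation: the function names are hypothetical, the CRPS uses the standard closed form for a normal predictive distribution, and `sd_from_interval` assumes that a central 68.2% interval spans roughly one standard deviation on each side of the mean.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def crps_normal(observed, mean, sd):
    """CRPS of a normal predictive distribution N(mean, sd^2); lower is better."""
    z = (observed - mean) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sd * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def sd_from_interval(lower, upper):
    """Standard deviation implied by a central 68.2% normal interval.

    Assumption: the interval covers about one sd on each side of the mean,
    so the sd is half of the interval width.
    """
    return (upper - lower) / 2.0


# Hypothetical forecast of a log group size: point forecast 5.2 with a
# 68.2% interval of [4.9, 5.5], and an observed value of 5.0.
sd = sd_from_interval(4.9, 5.5)
print(rmse([5.0], [5.2]), crps_normal(5.0, 5.2, sd))
```

Unlike RMSE, the CRPS changes when the interval narrows or widens around the same point forecast, which is how it rewards precision as well as accuracy.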

4. Results

Figure 1. Relationship between density and growth. A 2D histogram of subreddits with overlap density (log-transformed) on the X-axis and the change in the logarithm of the number of distinct commenting users on the Y-axis. The black line shows the marginal effect of overlap density on growth as predicted by Model 2. The gray region shows the 95% confidence interval of the marginal effect.

4.1. Study A: Density Dependence Theory

We test the classical prediction of density dependence theory as formulated in H1 using Model 1, which has first- and second-order terms for the effect of overlap density on growth. As described in §2.2, H1 hypothesizes that overlap density will have a curvilinear ∩-shaped (inverse-U-shaped) relationship with growth, indicated by a positive first-order regression coefficient and a negative second-order coefficient.

                                Model 1         Model 2         Model 3
Overlap density                 1.50 (0.26)     1.50 (0.26)
Overlap density (squared)       -2.08 (0.41)    -2.09 (0.41)
Average subreddit mutualism                     0.12 (0.26)     0.11 (0.26)
Constant                        0.23 (0.03)     0.23 (0.04)     0.04 (0.01)
Log Likelihood                  -4970           -4970           -4986
Observations                    4,090           4,090           4,090
Table 1. Loglinear regression predicting subreddit growth as a function of overlap density. The model supports the prediction of density dependence theory of a ∩-shaped relationship between overlap density and growth.

As predicted, we observe a ∩-shaped relationship between overlap density and growth. Figure 1 plots the marginal effects of overlap density on growth for the median subreddit laid over the data on which the model is fit. Table 1 shows regression coefficients for Models 1-3. For about half of subreddits, increasing overlap density is associated with higher growth rates. The point where increasing density ceases to predict increasing growth and begins to predict decreasing growth is at the 49th percentile. Prototypical subreddits at this overlap density grew slightly (95% CI:[0.001,0.06]). Yet subreddits at the lower and upper extremes of overlap density slightly declined on average. Typical groups at the 20th percentile of overlap density decline by 1.1 members (95% CI:[-1.1,-1.15]) and typical groups at the 80th percentile decline by 1.2 members (95% CI:[-1.1,-1.28]). While we find support for the classical theoretical prediction of a curvilinear (∩-shaped) relationship between overlap density and growth, this does not imply that relationships between highly overlapping communities are more competitive.
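The turning point of a quadratic fit can be located directly from its coefficients. The short derivation below uses generic symbols ($\beta_0$, $\beta_1$, $\beta_2$, and $d$ for the overlap density regressor), which are assumed notation, and plugs in the Model 1 magnitudes from Table 1, taking the second-order coefficient as negative as H1 requires. The result is on the scale of the regressor as it enters the model, and this sketch does not attempt to map it back to percentiles.

```latex
\frac{\partial}{\partial d}\left(\beta_0 + \beta_1 d + \beta_2 d^{2}\right)
  = \beta_1 + 2\beta_2 d = 0
\quad\Longrightarrow\quad
d^{*} = -\frac{\beta_1}{2\beta_2}
      = -\frac{1.50}{2 \times (-2.08)} \approx 0.36
```

Because $\beta_2 < 0$, $d^{*}$ is a maximum: predicted growth rises with overlap density below $d^{*}$ and falls above it.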

4.2. Study B: Introducing Community Ecology

Figure 2 visualizes the distribution of average ecological interaction and ecological interaction strength over the 641 ecological communities we identify. We observe ecological communities characterized by strong forms of both mutualism and competition, others having mixtures of the two, and some with few significant ecological interactions. Mutualism is more common than competition with the mean community having an average ecological interaction of 0.03 (, ). We find that 524 clusters (81.7%) are mutualistic. Not only are most ecological communities mutualistic, but more mutualistic ecological communities have greater ecological interaction strength (Spearman’s , ). Therefore, our community ecology analysis suggests that among groups with similar users, mutualistic ecological interactions are more common than competitive ones.

Figure 2. Two-dimensional histogram showing ecological communities on Reddit in our typology. The X-axis shows the overall degree of mutualism or competition in clusters of subreddits with high user overlap based on the average ecological interaction. The Y-axis shows the ecological interaction strength representing the overall magnitude of competition or mutualism.

4.2.1. Example ecological communities

We present four case studies to illustrate our typology of ecological communities of online groups. Figure 2 shows that we find clusters of subreddits characterized by mutualism, competition, a mixture of mutualism and competition, and few ecological relationships at all. We select one case from each of these four types using our measures of average ecological interaction (§3.3.5) and ecological interaction strength (§3.3.6). To allow for more interesting network structures, we draw our cases from the 367 large clusters having at least five subreddits.

(a) The ecological community of subreddits for supporting mental health and survivors of abuse is dense with largely mutualistic interactions. Some interactions, like that between r/mentalhealth and r/survivorsofabuse are mutualistic in one direction but competitive in the other.
(b) The subreddits about real estate and finance are relatively competitive. We detect reciprocal competitive relationships among the real estate subreddits in the triad including r/realestateinvesting, r/realestate and r/commercialrealestate.
(c) Subreddits about watches are dense with both mutualistic and competitive interactions. There is a reciprocal competitive interaction between r/gshock and r/seiko, a reciprocal mutualistic interaction between r/gshock and r/rolex, as well as several unreciprocated mutualistic and competitive interactions.
(d) An ecological community of subreddits about Call of Duty video games is characterized by relatively sparse ecological interactions. We detect only two mutualistic interactions: from r/blackops3 to r/infinitewarfare and from r/codzombies to r/infinitewarfare.
Figure 3. Network visualizations of competitive and mutualistic relationships in example ecological communities of subreddits with overlapping users. Yellow indicates competition and purple indicates mutualism.

Figure 3 presents visualizations of competition-mutualism networks representing statistically significant impulse response functions as described in §3.3.7. In the course of our analysis, we also examined the terms of the vector autoregression parameter matrix, the impulse response functions, and model fits and forecasts, all of which are available in our online supplement. We also visited each subreddit in the clusters and read their sidebars and top posts in order to validate each subreddit’s membership in the ecological community and to support our brief qualitative descriptions.

4.2.2. Mutualism among mental health subreddits

To find a case characterized by mutualism, we selected the top 37 large clusters with the greatest average ecological interaction. From these, we arbitrarily chose one interesting ecological community, the mental health cluster, which includes 11 subreddits for supporting people in struggles with mental health, addiction, and surviving abuse. Constitutive subreddits include those focused on specific mental health diagnoses like r/bpd (borderline personality disorder) and r/cptsd (complex post-traumatic stress disorder) while others like r/survivorsofabuse and r/adultsurvivors are support groups.

The interactions among these subreddits are dense and primarily mutualistic as shown in Figure 3. There are a handful of competitive interactions like the reciprocal competition detected between r/codedependence and r/bpd. We also observe some interactions that are mutualistic in one direction and competitive in the other. For example, growth in r/addiction predicts increasing growth in r/cptsd even as that growth in r/cptsd predicts decreasing growth in r/addiction. This suggests a pattern in which r/cptsd siphons members from r/addiction. That said, the density of mutualistic interactions shown in Figure 3 suggests that different subreddits have complementary roles in this ecological community as people turn to different types of groups for help with interrelated problems. While attempting to explain why different online groups form mutualistic or competitive interactions is left to future research, the example of mental health subreddits shows how groups with related topics and overlapping participants can have mutualistic interactions where growth in one predicts growth in many of the rest.

4.2.3. Competition among real estate and financial independence subreddits

To find competitive clusters, we first selected the 36 large clusters with the lowest average ecological interaction. From these, we chose an ecological community that we label financial independence. Among the 6 subreddits in this cluster, r/realestateinvesting, r/realestate, and r/commercialrealestate all deal with different aspects of the real estate industry, while r/financialindependence and r/fatfire (the acronym “fire” means “financial independence/retire early”) are focused on building wealth and becoming financially independent, and r/financialplanning is a general purpose subreddit for financial advice.

In contrast to the mental health ecological community, the finance cluster has mostly competitive ties as visualized in Figure 3. The fact that even this cluster, among the most competitive in our data, contains a number of mutualistic ties reflects just how prevalent mutualism is among subreddits with high degrees of user overlap. That said, we detect three reciprocal competitive interactions among the three subreddits that focus on real estate. The edges from r/fatfire to r/commercialrealestate and r/financialindependence are competitive as well. Interestingly, all the interactions between the general finance subreddits (r/financialplanning and r/financialindependence) and r/realestate are mutualistic.

4.2.4. Mixed interactions among timepiece subreddits

Next, we turn to an example of an ecological community with low average ecological interaction but high ecological interaction strength. We first select the 36 large clusters with the average ecological interaction closest to 0. To find an ecological community with a mixture of mutualism and competition, we select from the 15 clusters with the greatest ecological interaction strength from within this group and chose the timepiece cluster containing 7 subreddits about watches.

As shown in Figure 3, the ecological community of timepiece subreddits is dense with ecological interactions (though not as dense as the mental health subreddits). We observe both reciprocated mutualistic interactions, like that between r/rolex and r/gshock, and competitive interactions like that between r/gshock and r/seiko. We also observe numerous unreciprocated competitive and mutualistic relationships like the mutualism between r/watchexchange and r/watchcirclejerk (the suffix "circlejerk" is widely understood on Reddit to signify a jokey, meme, or satirical subreddit) and the competition between r/japanesewatches and r/seiko. Though the average ecological interaction among these subreddits is near 0, our analysis reveals a complex ecological community with a mixture of competition and mutualism.

4.2.5. Sparse interactions among Call of Duty subreddits

To find a case where ecological interactions are weak, we return to the group of the 36 large clusters with the average ecological interaction closest to 0 but select from the 15 clusters within this group with the lowest ecological interaction strength. From these we chose the Call of Duty cluster containing five groups about the popular military first person shooter series of video games.

The Call of Duty ecological community is sparse, having only two significant ecological interactions among its 5 member groups. This ecological community includes subreddits about different editions of the series such as r/blackops3, r/infinitewarfare, and r/wwii, as well as one about a popular spin-off zombie game, r/codzombies, and the more general r/callofduty subreddit. We find that growth in r/blackops3 or r/codzombies predicts growth in r/infinitewarfare, and we find no other ecological interactions.

The timepiece and Call of Duty ecological communities illustrate how subreddits with overlapping users can have relatively strong or weak forms of ecological interdependence. Though both clusters are characterized by high degrees of user overlap and low average ecological interaction, the timepiece cluster has a dense competition-mutualism network while the Call of Duty network is sparse.
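The case selection procedure used across the four examples can be summarized in a short sketch. It only builds the candidate pools; the final choice of one case per pool was made by hand in the text. The function name, the dictionary keys, and the toy clusters are hypothetical, while the default pool sizes (37, 36, and 15) are the ones reported above.

```python
def select_case_pools(clusters, n_mutualistic=37, n_extreme=36, n_strength=15):
    """Build candidate pools for the four community types.

    `clusters` is a list of dicts with keys "name", "avg_interaction",
    and "strength" (hypothetical field names).
    """
    by_avg = sorted(clusters, key=lambda c: c["avg_interaction"])
    competitive = by_avg[:n_extreme]          # lowest average interaction
    mutualistic = by_avg[-n_mutualistic:]     # greatest average interaction
    # Clusters whose average interaction is closest to 0 ...
    near_zero = sorted(clusters, key=lambda c: abs(c["avg_interaction"]))[:n_extreme]
    # ... split by interaction strength into mixed (strong) and sparse (weak).
    by_strength = sorted(near_zero, key=lambda c: c["strength"])
    return {
        "mutualistic": mutualistic,
        "competitive": competitive,
        "mixed": by_strength[-n_strength:],
        "sparse": by_strength[:n_strength],
    }


# Toy clusters (hypothetical values), with small pool sizes for readability.
clusters = [
    {"name": "a", "avg_interaction": 0.30, "strength": 0.40},
    {"name": "b", "avg_interaction": -0.25, "strength": 0.35},
    {"name": "c", "avg_interaction": 0.01, "strength": 0.50},
    {"name": "d", "avg_interaction": -0.01, "strength": 0.02},
    {"name": "e", "avg_interaction": 0.10, "strength": 0.20},
    {"name": "f", "avg_interaction": -0.08, "strength": 0.15},
]
pools = select_case_pools(clusters, n_mutualistic=1, n_extreme=2, n_strength=1)
print({kind: [c["name"] for c in pool] for kind, pool in pools.items()})
```

Clusters "c" and "d" have nearly identical average interactions, yet they land in different pools because their interaction strengths differ, mirroring the contrast between the timepiece and Call of Duty communities.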

4.3. Study C: Predicting Growth

We now compare the environmental approach of population ecology with the relational approach of community ecology. In Study B above, we presented examples of diverse ecological communities among subreddits with overlapping members. However, the presence of this diversity does not mean that ecological interactions are related to the growth of online groups, the key outcome of previous ecological studies. We therefore hypothesized that ecological interactions will improve the predictive performance of a density dependence model in H2.

4.3.1. Ecological interactions do not improve growth prediction

To test H2, we compare Model 1, our density dependence model having first and second order terms for overlap density, with Model 2, which also includes average subreddit mutualism (§3.4.1) as a predictor. We also examine Model 3, in which the only predictor is average subreddit mutualism. Table 1 shows regression coefficients for our models.

We do not observe a statistically significant association between average subreddit mutualism and growth (). Moreover, a likelihood ratio test comparing Model 1 and Model 2 does not support H2 as Model 2 does not predict subreddit growth better than Model 1 (, ). Comparing Model 2 to Model 3 shows that overlap density explains variation that average subreddit mutualism does not (, ). Overlap density helps explain a group’s future growth, but the overall degree of mutualism or competition a group faces in its ecological community does not.

4.3.2. Forecasting accuracy

The likelihood ratio tests in §4.3.1 are limited because improvements in predictive performance (or lack thereof) may be due to unobserved factors predictive of growth that are correlated with average subreddit mutualism. We hypothesized in H3 that the inter-group dependencies in our VAR models can better forecast the size of subreddits compared to baseline time series models that do not account for ecological interactions. As described in §3.4.3, we test H3 by comparing two forecasting metrics: the root-mean-squared-error (RMSE) and the continuous ranked probability score (CRPS).

VAR models including ecological interactions have forecasting performance superior to the baseline model in terms of both RMSE and CRPS. We evaluate forecast performance over 24 weeks for all subreddits that were assigned to clusters. The RMSE under the baseline model (0.84) is greater than the RMSE of the VAR models (0.75) and the CRPS of the baseline model (72,853) is also greater than the CRPS of the VAR models (72,669). This reflects a substantive improvement in forecast accuracy robust to the choice of forecasting metric.

Our baseline model contains a constant term and a trend term for each group and therefore accounts for all time-invariant within-group variation. Because overlap density is a subreddit-level variable that does not vary over time, we know that the improvement in forecasting performance comes from modeling ecological interactions in ways not captured by the measure.

5. Threats to Validity

Our work is subject to several important threats to validity that we cannot fully address. First, we study ecological communities on only one platform hosting online groups and our results may not generalize to other platforms or time periods. Additionally, while our community ecology approach assumes that ecological interactions drive dynamics in the size of groups over time and cause groups to grow or decline, drawing causal inference using our method would depend on several untestable assumptions. For example, our ability to infer causal relationships might be limited if groups we do not consider—including groups on other platforms—play a role in an ecological community. Regression estimates in Models 1-3 may be confounded by omitted variables and cannot support causal interpretations. Therefore, we refrain from claiming that the relationships we infer are causal.

The method we propose for identifying ecological interactions between online groups has limitations common to all time series analysis of observational data. Potential omitted variables might also include additional time lags of group size. Although we chose to use VAR(1) models with only 1 time lag, we hope future work can improve upon our approach and model more complex dynamics with additional lags. Like most other time series analysis, vector autoregression assumes that error terms are stationary. This is difficult to evaluate empirically and may not be realistic (canova_var_2007). Future work might relax these assumptions using more complex models with time-varying parameters, state space models (box-steffensmeier_time_2014), nonlinear time series models (cenci_regularized_2019; kantz_nonlinear_2003), or stationarity-enforcing priors (heaps_enforcing_2020). Such approaches may require additional contextual knowledge and be difficult to scale to an analysis of hundreds of different ecological communities, but may prove fruitful in future work focusing on ecological dynamics within ecological communities of interest. Such models may also be useful in future work investigating how ecological interactions change over time.

Additional threats to validity stem from our use of algorithmic clustering to identify ecological communities. Organizational ecologists have rarely attempted to estimate the full community matrix for an entire population containing a large number of groups because of data and statistical limitations (e.g. ruef_emergence_2000; sorensen_recruitment-based_2004). For instance, 100 million possible ecological interactions exist within a set of 10,000 communities. Attempting to infer them all raises considerable computational and statistical challenges. We chose to use a clustering analysis to explore the typical ecological communities on a platform. However, clustering algorithms are limited and principled definitions of an ecological community based on qualitative contextual knowledge may be more appropriate for focused studies of particular ecological communities.

While we choose clusters based on high degrees of user overlap, and validate our clustering in terms of the silhouette coefficient and purity criteria, we might have obtained different results if we had clustered in a different way. Additionally, our efforts to obtain clusters with a high silhouette coefficient led us to remove a large number of subreddits from our analysis. Thus, our results are not representative of Reddit overall, but only of those subreddits that were included in our analysis. Furthermore, clustering algorithms like the one we use may not have unique solutions and different initial conditions might lead to different results. While these algorithms allow us to scale up our analysis, future investigations should also consider qualitative approaches to constructing ecological communities.

6. Discussion

To introduce community ecology and compare it to population ecology, we presented three studies. In Study A, we found support for H1, showing, as predicted by density dependence theory, that overlap density has an inverse-U-shaped association with subreddit growth. Subreddits with moderate overlap density in our data declined less than subreddits with either very low or very high overlap density. According to population ecology theory, this suggests that high-density environments are competitive and less conducive to growth than medium-density environments.

Surprisingly, this contrasts with our results in Study B, where we studied the diversity of ecological communities using vector autoregression models of group sizes over time to infer networks of ecological interactions. We find ecological communities that are mutualistic or competitive, that mix the two, or that have few significant ecological interactions at all. Overall, however, ecological communities of subreddits are typically mutualistic and mutualistic interactions are stronger on average than competitive ones. Although we find evidence of density dependence, density-dependent competition does not necessarily reflect typical relationships in ecological communities of highly overlapping subreddits.

Our results in Study C show that the sizes of the other members of an ecological community improve time series forecasts of participation in online groups. However, average subreddit mutualism did not help predict growth. This suggests that population ecology and community ecology offer complementary environmental and relational perspectives. Population ecology’s focus on environmental factors such as niche and resource overlaps is useful for predicting growth, but does not provide a way to study networks of mutualism and competition. Community ecology provides a way to unpack density and provide insights about the specific relationships between groups. While modeling these interactions helps forecast participation levels in groups, the existence of these interactions may be independent of future growth. For example, if mutualistic relationships are common in declining ecological communities, that would explain our result for H2.

The complementary nature of the two ecologies is seen in the coincidence of our findings in Study A and Study B. Indeed, these results can help explain the puzzling set of empirical results about the relationship between resource overlaps and outcomes like growth, decline and survival (wang_impact_2012; zhu_impact_2014; zhu_selecting_2014). Studies of density dependence theory in social computing measure the density of an online group’s niche in terms of its overlaps in participants or topics. Our analysis clearly shows that resource overlaps between two groups might have little to do with whether they are mutualists or competitors. Instead, overlaps may simply reflect the hospitality of an environment to groups with overlapping topics or user-bases. As a result, differing environmental conditions of Wikis and Usenet groups might explain why user overlap was associated with the survival of Wikis (zhu_impact_2014) but with the decline of Usenet groups (wang_impact_2012). Wikia was a young and growing platform during zhu_impact_2014’s (zhu_impact_2014) data collection period when the growth of groups may have been limited by knowledge of how to build a wiki and this knowledge was provided by more overlapping experienced users. Usenet was in decline during wang_impact_2012’s (wang_impact_2012) study period and this may have produced competitive environmental conditions.

The widespread mutualism found in Study B resonates with long-held understandings of ecological interactions in evolutionary theory (kropotkin_mutual_2012). Competition is unlikely to persist because it decreases survival. Because mutualism increases survival, it will be favored by natural selection (armstrong_competitive_1980; axelrod_evolution_1981). Similarly, competition can be avoided if groups adopt specialized roles in their ecological community, a dynamic known as resource partitioning in organizational ecology (carroll_concentration_1985; menge_competition_1972; schoener_resource_1974). Resource partitioning theory suggests that competition among real estate subreddits observed in Figure 3 may be due to a lack of specialization. If specialization does not emerge over time, such groups of competing subreddits may have decreased chances of surviving. By contrast, mental health support groups like those observed in Figure 3 appear to have distinctive purposes or roles. Future work to test such mechanisms in ecological communities of online groups may reveal ways that online groups complement or cooperate with each other.

Within large platforms for online groups, the great number of ecological communities that can be studied should make it possible for future work to apply methods from network science to construct and test generalizable theories about the roles of different types of resources, design features of platforms, and governance institutions in these ecological interactions. Future work should also incorporate community ecology analysis in case studies of important topics such as ecological communities engaged in peer production, political mobilization, misinformation, or mental health support.

Although we focused on online groups within a single platform, groups may use multiple platforms with distinctive affordances for different purposes (fiesler_moving_2020; kiene_technological_2019). Since the VAR method relies only on time series data to infer ecological interactions, it can be applied to study ecological communities spanning social media platforms. Community ecology can thus provide a bridge between quantitative studies of participation in online groups and theories of interconnected information ecologies (nardi_information_1999). While we focus on relationships between groups sharing a platform, one can apply our concepts and methods to understand how interdependent systems of technologies and users give rise to higher levels of social organization on social media platforms (astley_two_1985; aldrich_organizations_2006).

6.1. Implications for Design

In the final chapter of their book on Building Successful Online Communities, kraut_building_2012 advise managers of online groups to select an effective niche and beware of competition. But these recommendations are based on little direct evidence from studies of online groups and offer almost no concrete advice on what a designer or group should do differently based on either piece of advice. Although further research into ecological interactions is needed before design principles can be derived, we provide a framework for online group managers to think about ecological constraints on group size. While intuition suggests that online group managers might seek out mutualistic relationships and avoid competitive ones, it is often not obvious whether another group with overlapping users is a competitor or mutualist. Our method provides a way for group managers to know.

Competitors have a negative impact on growth, but ecological theory suggests that specialization is an adaptive strategy in response to competition (aldrich_organizations_2006; carroll_concentration_1985; kraut_building_2012; powell_network_2005). Using our method, group managers might identify competitors limiting the growth of their groups. With knowledge from this analysis in hand, they might be able to escape a competitive dynamic by specializing. While competitive relationships are defined by how they decrease the size of groups, competition can also be important to the health of the broader ecological community. Exit to an alternative group can be an avenue for political change in response to grievances and poor governance (hirschman_exit_1970; frey_emergence_2019). The threat of competition with other groups may make expressions of voice more persuasive to moderators or platforms (hirschman_exit_1970).

Groups looking to increase activity should seek out mutualistic relationships, and we believe that designers of online platforms can help them do so. Features such as meta-groups, group search, recommendation engines, and practices like linking related groups may lower barriers between groups and support mutualism. However, it is not obvious to what extent particular features will support competition, mutualism, or both. Using our method, managers and designers can test features intended to support mutualism.

7. Conclusion

While explanations for the rise or decline of online groups often look to internal mechanisms, understanding the role of interdependence between online groups is increasingly important. While prior research has investigated competition and mutualism among online groups with overlapping users and topics using the population ecology framework (wang_impact_2012; zhu_impact_2014; zhu_selecting_2014), this approach does not provide a way to infer competitive or mutualistic interactions among related groups. We introduce the community ecology framework as a complementary perspective to population ecology. By inferring competition-mutualism networks directly from time-series data, our community ecology approach helps resolve empirical tensions raised by prior ecological work in social computing and reveals that most interactions within clusters of subreddits with highly overlapping users are mutualistic. Our methods provide a foundation for future work investigating related online groups.

References