Computational landscape of user behavior on social media

David Darmon et al., January 25, 2019

With the increasing abundance of 'digital footprints' left by human interactions in online environments, e.g., social media and app use, modeling complex human behavior has become increasingly feasible. Many approaches have been proposed; however, most previous modeling frameworks are fairly restrictive. We introduce a new social modeling approach that enables the creation of models directly from data with minimal a priori restrictions on the model class. In particular, we infer the minimally complex, maximally predictive representation of an individual's behavior when viewed in isolation and as driven by a social input. We then apply this framework to a heterogeneous catalog of human behavior collected from fifteen thousand users on the microblogging platform Twitter. The models allow us to describe how a user processes their past behavior and their social inputs. Despite the diversity of observed user behavior, most of the inferred models fall into a small subclass of all possible finite-state processes. Thus, our work demonstrates that user behavior, while quite complex, belies simple underlying computational structures.


I Introduction

The current decade has been marked by an increasing availability of high-resolution, heterogeneous data sets capturing human behavior in both real-world and digital environments Eubank et al. (2004); Compton et al. (2014); Kramer et al. (2014); Toole et al. (2015). This has made possible, for the first time, large scale investigations into human behavior across diverse groups of individuals. Of such phenomena, human communication patterns are one of the most well-studied. Such studies have included written correspondences Oliveira and Barabási (2005), email correspondences Malmgren et al. (2008, 2009), and call/SMS records Jiang et al. (2013); Wu et al. (2010). The characteristics of these behavioral patterns include heavy tails, seasonality, and burstiness. This is certainly still an active field of research, and many authors have called into question whether the observed patterns are truly universal characteristics of human behavior or epiphenomena of the methods used in data collection and analysis Goh and Barabási (2008); Kivelä and Porter (2015); Ross and Jones (2015).

The standard model for human communication patterns treats the observed behavior as a realization from some sort of point process. Typically, the point process is taken to be a renewal process, where the observed behavior is completely specified by a distribution over the times between activity. To account for the complex properties of human behavior enumerated above, the interevent distribution is specified to have a heavy right tail, which naturally gives rise to burstiness. The authors of Malmgren et al. (2008, 2009) develop a refinement of this model which incorporates seasonality by allowing an individual to pass between passive and active states, where the behavior within the active state is governed by a Poisson process. Further refinements of this model allow the activity during the active periods to follow non-Poissonian dynamics Ross and Jones (2015).

We undertake an analysis of human communication that does not a priori assume a renewal or renewal-like model of the observed behavior. Motivated by the field of computational mechanics Shalizi and Crutchfield (2001), we define our models explicitly in terms of a predictive representation of the observed behavior. Unlike renewal process models, we do not assume the behavior of individuals only depends on the time between actions. We seek to understand the behavior locally in time, where locality is defined around periods of activity. Moreover, we explicitly incorporate the interactive aspect of online social media services, something missing from much of the work on modeling human interevent distributions, with Raghavan et al. (2013) as a notable exception.

In Johnson et al. (2012), the authors set out to elucidate the structural properties of stochastic processes using tools from computational mechanics. To do so, they restricted their investigation to the subset of stochastic processes that are finitary, that is, those stochastic processes that have a representation with a finite number of “causal” states Wiesner and Crutchfield (2008), as defined in Section II.3. In this work, instead of elucidating all possible finitary models, we approach the problem from the opposite direction: we seek to trace the computational landscape of human behavior in digital environments by discovering the finitary models present in user behavior, and then investigate their computational structure.

We consider four models for user behavior on social media. Figure 1 provides a schematic representation of these models. The most general model (a) assumes that a user's future behavior is influenced by both their past behavior and the past behavior of their social network, which we call the self+socially-driven model. Models (b) and (c) are two restrictions of this model, the former where we assume that the user's future behavior is only influenced by their past behavior, and the latter where we assume that the user's future behavior is only influenced by the past behavior of their social network. Finally, model (d) corresponds to the case where the user's behavior is entirely explained by time-of-day and day-of-week (i.e. seasonality).

Figure 1: A schematic representation of the classes of models that we consider in this paper. (a) The most general case, where the user’s observed behavior is influenced by their social inputs and their own past behavior. (b) The self-driven case, where the user’s behavior only depends on their past behavior. (c) The socially-driven case, where the user’s behavior only depends on their social inputs. (d) The seasonally-driven case, where the user’s behavior can largely be attributed to time.

In the rest of the paper, we proceed as follows. In Section II we motivate and develop the four models just presented, and propose methods for inferring them. In Section III.1 we explore the descriptive performance of these models on a real world data set derived from 15K users on the microblogging platform Twitter. In Section III.2, we investigate the structure of the models present amongst the users in our data set, and discuss the implications of these models. Finally, we conclude with the implications of our present work for the study of human communication patterns.

II Methodology

II.1 User Behavior as a Discrete-Time Point Process

Consider the behavior of a user on a social media service. At any given time instant, a user either posts to the social media service or not. Thus, the user's behavior may be modeled as a point process, where events correspond to posts. A naive model of the user's behavior might assume that they are equally likely to use the service during any time instant. Under this model, the time between uses is exponentially distributed, and their activity pattern would correspond to a realization from a Poisson process. However, human communication patterns are known to exhibit non-trivial complexities not accounted for by this model Goh and Barabási (2008); Rybski et al. (2012), and thus more flexible models are required. In the following sections, we present three models that capture the observed complexity of human behavior in very different ways: a seasonally-driven model where the user's behavior is accounted for by time-of-day; a self-driven model where a user's behavior results from self-feedback; and a socially-driven model where a user's behavior results from both social- and self-feedback.

In practice, the information about human behavior on digital services is reported in seconds. Because we are interested in human-scale interactions between a user, their inputs, and the social media service, this time resolution is too fine grained. We begin by discretizing time into intervals of length Δt. We then ask if, during an interval t, the user was active. We denote this value for a user u by $X_t^{(u)}$ and define

$$X_t^{(u)} = \begin{cases} 1, & \text{user } u \text{ is active during interval } t, \\ 0, & \text{otherwise}. \end{cases} \qquad (1)$$

The choice of Δt specifies the time scale of interest. For example, if we take Δt = 1 week, then the process captures the weekly patterns of behavior that the user exhibits. If instead we take Δt = 1 day, then $X_t^{(u)}$ captures the daily patterns of the user. In this paper, we will take Δt = 10 minutes, because we are interested in the short-timescale behavior of user behavior and user-user interaction. However, it is important to note that there is no single 'correct' resolution when considering the behavior of a point process, and a multi-timescale analysis may be appropriate Crutchfield et al. (2015). Moreover, different time resolutions may be more or less appropriate for different users. Figure 2a demonstrates the activity patterns of three users at the 10 minute resolution, represented as a rastergram. Each row of the rastergram corresponds to a single day of activity, and each column of the rastergram corresponds to a ten minute window within a single day. A point occurs in the rastergram when $X_t^{(u)} = 1$ for that day and time.
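To make the discretization concrete, the following is a minimal Python sketch of the binning step, assuming post timestamps are available in seconds; the function name and data layout are illustrative and not part of the paper's pipeline.

import numpy as np

def discretize_activity(event_times, t_start, t_end, dt=600):
    """Convert raw event timestamps (seconds) into a binary activity
    series X_t at resolution dt (600 s = 10 minutes by default)."""
    n_bins = int(np.ceil((t_end - t_start) / dt))
    x = np.zeros(n_bins, dtype=int)
    for t in event_times:
        if t_start <= t < t_end:
            x[int((t - t_start) // dt)] = 1  # active if >= 1 post in the bin
    return x

# Example: three posts within a one-hour window, 10-minute bins.
x = discretize_activity(event_times=[30, 1250, 1300], t_start=0, t_end=3600)
print(x)  # -> [1 0 1 0 0 0]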

In a social media setting, a user has access to information provided by other users on the service. For example, a user might passively examine the messages generated by other users they follow, observe a particular form of communication directed at them, or actively investigate a keyword or topic. Generically, we will denote the inputs to a user as $Y_t^{(u)}$. We will assume that the inputs to the user can be mapped to a finite alphabet $\mathcal{Y}$. As an example, if we consider $Y_t^{(u)}$ to correspond to whether or not the user receives a mention during the time interval t, then we take $\mathcal{Y} = \{0, 1\}$, where $Y_t^{(u)} = 0$ corresponds to no mention during that time interval, and $Y_t^{(u)} = 1$ corresponds to one or more mentions.

Our goal in this paper is to develop several contrasting models of a user's observed behavior $X_t^{(u)}$. We take a predictive view of modeling, where we seek to infer the probability that the user engages with the social media service, given their past history of engagement and the past history of their inputs. Let $\overleftarrow{X}_t^{(u)} = \ldots X_{t-2}^{(u)} X_{t-1}^{(u)}$ be the past behavior of user u, and let $\overrightarrow{X}_t^{(u)} = X_t^{(u)} X_{t+1}^{(u)} \ldots$ be the future behavior of the user. Similarly, let $\overleftarrow{Y}_t^{(u)}$ and $\overrightarrow{Y}_t^{(u)}$ be the past and future values of the user's inputs, considering t as the present. Then we are interested in determining

$$P\left(\overrightarrow{X}_t^{(u)} \,\middle|\, \overleftarrow{X}_t^{(u)}, \overleftarrow{Y}_t^{(u)}\right), \qquad (2)$$

the distribution over user u's behavior starting from time t, given their own past behavior and the past behavior of their inputs. For ease of presentation, in the following sections, we drop the dependence on u in the notation, but emphasize that for each user u, we assume a unique model for that user's behavior.

II.2 Seasonally-Driven Model: Inhomogeneous Bernoulli

Thus far, we have specified our model of human behavior in terms of a discrete-time point process: the observed behavior of the user is either active or quiescent during any given interval of time. One of the simplest models that can capture some of the complexity of human behavior is a renewal process Esteban et al. (2012); Doerr et al. (2013). From this perspective, the activity of the user is taken to occur at random times, with the time between occurrences governed by a distribution over the interarrival times. For example, if we take the interarrival distribution to be geometric with parameter p, then the renewal process is a Bernoulli process, the discrete-time analog of a Poisson process. Typically, the interarrival distribution is taken to have a long tail, to capture the fact that human behavior tends to be bursty, with long periods of quiescence punctuated by periods of high activity. See Figure 2 for examples of users who exhibit such behavior. Popular distributions for the interarrival times include log-normal, power law, and stretched exponential distributions Goh and Barabási (2008).

Due to the inherent seasonality in human behavior, time-homogeneous renewal process-based models are almost certainly misspecified. For example, a typical user on Twitter will be more likely to be active during the daylight hours in their geographic area than during the nighttime hours. This fact may explain the long tails typically observed in studies of the activity patterns of humans Jo et al. (2012). Moreover, we see such daily and weekly seasonality patterns in the aggregate behavior of users on Twitter. Because of this, we consider a time-inhomogeneous point process model for a user's observed activity, where the probability a user is active during any time interval is independent of their previous activity and the activity of their inputs, and varies smoothly with time,

$$P\left(X_t = 1 \mid \overleftarrow{X}_t, \overleftarrow{Y}_t\right) = P(X_t = 1) = p(t). \qquad (3)$$

Moreover, we assume that p(t) is periodic, p(t + T) = p(t), with T chosen such that for a coarsening interval Δt, T Δt = 1 week. We take (3) as our instantiation of the seasonally-driven model from Figure 1d.

We estimate the individual seasonality $p^{(u)}(t)$ for each user via a Generalized Additive Model (GAM) Hastie and Tibshirani (1990),

$$\operatorname{logit} p^{(u)}(t) = f^{(u)}(t \bmod T), \qquad (4)$$

where $f^{(u)}$ is a smooth, periodic function of time-of-week represented with penalized splines and

$$\operatorname{logit} p = \log \frac{p}{1 - p}, \qquad (5)$$

using the mgcv package in R. Figure 2 demonstrates the observed behavior of several users, along with their estimated activity probabilities $\hat{p}^{(u)}(t)$.

Figure 2: (a) Rastergram representation of the activity of three users on Twitter over a 44 week period. (b) The expected activity of the same three users. Each panel corresponds to the expected activity by day-of-week, from Monday to Sunday. (c) The expected activity from (b), laid out in the same format as the rastergram. Note that the color scale for each panel runs from 0 to the maximum expected activity for that user, to make the seasonality in the activity patterns more obvious.
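The paper estimates the smooth seasonal profile with a penalized-spline GAM via mgcv; as a rough, model-free stand-in, one can simply average the binary activity within each time-of-week bin. A minimal Python sketch of that simplified estimate (not the GAM itself; the names and synthetic data are illustrative):

import numpy as np

def empirical_weekly_profile(x, bins_per_week=7 * 24 * 6):
    """Crude estimate of p(t): the fraction of weeks in which the user
    was active in each 10-minute time-of-week bin (1008 bins per week).
    The GAM used in the paper smooths this profile with splines."""
    n_weeks = len(x) // bins_per_week
    x = np.asarray(x[: n_weeks * bins_per_week]).reshape(n_weeks, bins_per_week)
    return x.mean(axis=0)  # p_hat(t mod T), with T = bins_per_week

# Example with synthetic data: a user who is mostly active in one block of each day.
rng = np.random.default_rng(0)
T = 7 * 24 * 6
p_true = np.where(np.arange(T) % 144 < 60, 0.3, 0.02)   # "daytime" bursts
x = (rng.random((28, T)) < p_true).astype(int).ravel()  # 28 weeks of data
p_hat = empirical_weekly_profile(x)
print(p_hat[:5], p_hat.mean().round(3))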

II.3 Self-driven Model: The ε-machine

The previous model assumes that the user’s activity during any time interval is independent of their activity during other time intervals, and accounts for the seasonality and bursting observed in a user by allowing the probability of their activity to vary according to time-of-day and day-of-week. Alternatively, a user might exhibit burstiness due to self-excitation. As an example, the user might be isolated from the devices they use to interact with the social media service, which would lead to a period of quiescence. Then, upon regaining access to their devices, they might use the service, which could lead to a self-excitation to continue using the service.

This sort of behavior motivates an autoregressive model for the user's behavior, where their behavior in the future is determined by their past behavior. That is, the probability that they behave a certain way in the future starting at time t is determined by how they behaved up until time t and does not depend on their inputs,

$$P\left(\overrightarrow{X}_t \mid \overleftarrow{X}_t, \overleftarrow{Y}_t\right) = P\left(\overrightarrow{X}_t \mid \overleftarrow{X}_t\right). \qquad (6)$$

This model assumes that the user's behavior is conditionally stationary Caires and Ferreira (2005). For human behavior, this assumption may not hold in general, and thus care must be taken in applying this model with actual data. We address this in Section II.5, where we specify our procedure for day-casting user behavior. Previous work has found this model to perform well with many users on Twitter Ver Steeg and Galstyan (2012); Raghavan et al. (2013); Darmon et al. (2013).

We take the self-driven model from Figure 1b to correspond to a stochastic process governed by (6), which we develop using computational mechanics Shalizi and Crutchfield (2001). Computational mechanics provides the unique, minimally complex, maximally predictive representation of a discrete-state, discrete-time stochastic process over the alphabet $\mathcal{X}$. The insight of computational mechanics is that when considering the predictive distribution (6), it is typically more useful to consider a statistic of the past rather than the entire past itself. It can be shown that the unique minimal sufficient predictive statistic of the past for the future of a conditionally stationary stochastic process is the equivalence class over predictive distributions. That is, two pasts $\overleftarrow{x}$ and $\overleftarrow{x}'$ are considered equivalent if and only if $P(\overrightarrow{X}_t \mid \overleftarrow{X}_t = \overleftarrow{x}) = P(\overrightarrow{X}_t \mid \overleftarrow{X}_t = \overleftarrow{x}')$. This equivalence relation induces a set of equivalence classes called the causal states of the process. The causal states, the allowed transitions between them, and the probabilities associated with the transitions are called the ε-machine for the process. For a stochastic process with a finite number of predictive equivalence classes, the ε-machine may be represented as a deterministic finite automaton, where the states of the automaton correspond to the causal states, and the transitions between states are determined by the outputs $x \in \mathcal{X}$. A demonstration of a portion of such a representation is given in Figure 3(a).

(a) ε-machine
(b) ε-transducer
Figure 3: Transitions between (a) ε-machine and (b) ε-transducer causal states. Each transition is labeled by the (a) marginal and (b) joint emission symbol, as well as the transition probability.
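The causal-state idea can be illustrated by grouping length-L pasts that induce (approximately) the same empirical next-symbol distribution. The Python sketch below conveys only the equivalence-class construction; it is not the CSSR algorithm used in the paper, and the merging tolerance is an arbitrary illustrative choice.

from collections import defaultdict

def predictive_classes(x, L=3, tol=0.05):
    """Group length-L pasts of a binary sequence x by their empirical
    P(X_t = 1 | past); pasts whose estimates differ by less than `tol`
    are merged. This mimics the causal-state equivalence relation."""
    counts = defaultdict(lambda: [0, 0])           # past -> [n_seen, n_followed_by_1]
    for t in range(L, len(x)):
        past = tuple(x[t - L:t])
        counts[past][0] += 1
        counts[past][1] += x[t]
    classes = []                                   # list of (representative p, [pasts])
    for past, (n, n1) in sorted(counts.items()):
        p = n1 / n
        for rep_p, members in classes:
            if abs(p - rep_p) < tol:
                members.append(past)
                break
        else:
            classes.append((p, [past]))
    return classes

# Example: a period-2 alternating sequence resolves into two predictive classes.
x = [0, 1] * 200
for p, members in predictive_classes(x, L=2):
    print(round(p, 2), members)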

II.4 Socially-driven Models: The ε-transducer

The previous two models assume that either the user is driven seasonally by time-of-day type influences, or that the user is self-driven. On a social media service, we expect a user to interact with other users, and therefore we would expect the user's behavior to be associated with the behavior of those users. For example, a user might become more likely to tweet if they have recently been mentioned by another user. Such social associations are captured by the social inputs $Y_t$ of a user. In particular, we will focus on the mention history of the user,

$$Y_t = \begin{cases} 1, & \text{the user receives one or more mentions during interval } t, \\ 0, & \text{otherwise}. \end{cases} \qquad (7)$$

That is, $Y_t$ corresponds to whether or not the user received any mentions during the time window of length Δt indexed by t.

We take the modeling perspective where the user acts as a transducer, mapping their own past behavior and the past behavior of their social inputs into their future behavior. More explicitly, as with the self-driven example, we seek the minimally complex, maximally predictive model for the user's behavior. Again, computational mechanics provides such a model via the ε-transducer Crutchfield (1994a); Shalizi (2001). The theory for the ε-transducer has been recently developed in Barnett and Crutchfield (2015). The main insight is the same as for the ε-machine: we define an equivalence relation over joint input-output pasts such that two pasts are equivalent if they induce the same predictive distribution over the future output. As with the ε-machine, this equivalence relation induces a partition of the joint input-output pasts into channel causal states. Transitions between channel causal states occur on joint input-output pairs, and thus an ε-transducer with finitely many channel causal states, like an ε-machine, can be represented as a deterministic automaton. The representation is given in Figure 3(b).

For the socially-driven model, we distinguish between the self-memoryful and self-memoryless ε-transducer cases, corresponding to the models from Figure 1a and Figure 1c, respectively. For the self-memoryful ε-transducer, we construct the equivalence classes using the joint input-output pasts, while for the self-memoryless ε-transducer, we construct the equivalence classes using only the input pasts. Thus, the self-memoryless ε-transducer assumes that a user's behavior is purely driven by their social input. We consider both cases under the heading of socially-driven models.
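The practical difference between the two transducer variants is what the predictive distribution is conditioned on. A minimal Python sketch contrasting the two conditioning sets (the estimator here is a naive history-count, purely for illustration, not the transCSSR inference):

from collections import defaultdict

def next_step_estimates(y, x, L=2, memoryful=True):
    """Estimate P(X_t = 1 | past), where 'past' is the joint input-output
    history (y, x) of length L if memoryful, and the input history y
    alone if memoryless. y = social input (mentions), x = user activity."""
    counts = defaultdict(lambda: [0, 0])
    for t in range(L, len(x)):
        past = (tuple(zip(y[t - L:t], x[t - L:t])) if memoryful
                else tuple(y[t - L:t]))
        counts[past][0] += 1
        counts[past][1] += x[t]
    return {p: n1 / n for p, (n, n1) in counts.items()}

# Toy series: the user tends to post one step after being mentioned.
y = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
x = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
print(next_step_estimates(y, x, memoryful=False))
print(next_step_estimates(y, x, memoryful=True))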

II.5 Data Collection and Pre-processing

The activity of the 15K users was collected over a 49 week period, from 6 June 2014 to 15 May 2015. After data cleaning to account for outages in the data collection, 44 weeks of data were generated. We did not include the quiescent users in our analysis. As described above, the self-driven and socially-driven models assume that a user’s behavior can be modeled as a conditionally stationary stochastic process, where the distribution over futures is independent of the time index conditional on the observed past of the user or the observed joint input / output past, respectively. In order to make this assumption approximately true, we ‘daycast’ the time series associated with each user as follows. For a user, we determine their time zone as recorded by Twitter, and window their activity to be between 9 AM and 10 PM during their local time. We take this time window to capture the waking hours of a typical individual.

For this study, we split the 44 weeks of data into 28 weeks of training data and 16 weeks of testing data. The training data is used to select and infer the models, as we describe in the next section. The testing data is used for the comparison of these models in terms of their predictive and descriptive performance. This train/test split is performed to ensure that we obtain unbiased estimates of how the models perform for each user.
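A minimal Python sketch of the daycasting window and train/test split described above; the 9 AM to 10 PM window and the 28-week training period follow the text, while the data layout and function names are illustrative.

import numpy as np

def daycast(x, local_hours, start_hour=9, end_hour=22):
    """Keep only the bins falling between 9 AM and 10 PM local time,
    concatenating the retained windows day by day."""
    keep = (np.asarray(local_hours) >= start_hour) & (np.asarray(local_hours) < end_hour)
    return np.asarray(x)[keep]

def train_test_split_weeks(x, bins_per_week, n_train_weeks=28):
    """Split a daycast activity series into the first 28 weeks (training)
    and the remaining weeks (testing)."""
    cut = n_train_weeks * bins_per_week
    return x[:cut], x[cut:]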

II.6 Model Inference and Selection

For the seasonally-driven model, the only model parameter associated with each user is the smoothing parameter for the splines used to estimate the non-parametric term in (4). This parameter is chosen using generalized cross validation Hastie et al. (2009) on the 28 weeks of training data.

For ε-machine reconstruction, we use the Causal State Splitting Reconstruction (CSSR) algorithm Shalizi and Klinkner (2004) to infer the models from data. For ε-transducer reconstruction, we use the Transducer Causal State Splitting Reconstruction (transCSSR) algorithm, described in Appendix A. Both CSSR and transCSSR require the specification of a tuning parameter α that controls the probability of splitting histories from a state when no such split should occur, and $L_{\max}$, the maximum history length used in determining the candidate causal / transducer states. We fix α at 0.001. The maximum history length directly balances between the flexibility of the model and the precision with which the probabilities may be estimated. As an example, suppose a maximum history length $L_{\max}$ is sufficient to resolve the causal states. In the extreme case that each history of length $L_{\max}$ specifies a unique predictive distribution (an order-$L_{\max}$ Markov model), the model would result in $|\mathcal{X}|^{L_{\max}}$ causal states. However, as we increase $L_{\max}$, we also necessarily decrease the number of examples of each history used to estimate the predictive distribution. This can result in spurious splitting of histories.

We use k-fold cross-validation Hastie et al. (2009) to choose the appropriate $L_{\max}$ for each user. In particular, for each user, we randomly partition the 196 days in the training set into k folds. For a held-out fold i whose time indices are $T_i$, define the empirical total variation (ETV) distance between the user's observed behavior during that fold and the model inferred using the remaining folds as

$$\mathrm{ETV}_i = \frac{1}{|T_i|} \sum_{t \in T_i} \frac{1}{2} \sum_{x \in \mathcal{X}} \left| \delta_{x, x_t} - \hat{p}^{(-i)}(x; t) \right|, \qquad (8)$$

where $\delta_{x, x_t}$ is the Kronecker delta and $\hat{p}^{(-i)}(x; t)$ is the probability of observing outcome x at time t using the model inferred with all of the data except that from the ith fold. In the binary case, (8) reduces to

$$\mathrm{ETV}_i = \frac{1}{|T_i|} \sum_{t \in T_i} \left| x_t - \hat{p}^{(-i)}(1; t) \right|. \qquad (9)$$

Thus, we see that (8) quantifies the model performance by comparing the actual outcome for the user to the estimated probability of that outcome using the model. We can then compute the average of the empirical total variation over the held-out sets,

$$\mathrm{ETV} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{ETV}_i, \qquad (10)$$

and choose $L_{\max}$ to minimize this value. We perform this optimization using $L_{\max}$ from 1 to 6, which for Δt = 10 minutes corresponds to a time span between ten minutes and an hour.
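The cross-validation over $L_{\max}$ might be organized as in the Python sketch below; here `fit_and_predict` is a placeholder for a model-fitting routine (e.g., CSSR followed by filtering the held-out days), and the number of folds is arbitrary since the value of k was not preserved in the text.

import numpy as np

def etv(x_held_out, p_hat):
    """Empirical total variation in the binary case: mean |x_t - p_hat_t|, cf. Eq. (9)."""
    return np.mean(np.abs(np.asarray(x_held_out) - np.asarray(p_hat)))

def select_L_max(days, fit_and_predict, L_values=range(1, 7), k=7):
    """k-fold CV over days: for each candidate L, average the ETV over the
    held-out folds and return the L with the smallest average.
    `days` is a list of per-day binary activity arrays; `fit_and_predict(train, test, L)`
    is a placeholder that fits a model at history length L and returns activity
    probabilities for every bin in `test`."""
    rng = np.random.default_rng(0)
    order = rng.permutation(len(days))
    folds = np.array_split(order, k)
    scores = {}
    for L in L_values:
        fold_etvs = []
        for i in range(k):
            test_idx = set(folds[i])
            train = [days[j] for j in order if j not in test_idx]
            test = [days[j] for j in folds[i]]
            p_hat = fit_and_predict(train, test, L)
            fold_etvs.append(etv(np.concatenate(test), p_hat))
        scores[L] = np.mean(fold_etvs)
    return min(scores, key=scores.get), scores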

III Results

III.1 Descriptive Performance Across the Model Classes

We begin by examining the ability of the four models developed in Sections II.2–II.4 to describe a given user's behavior. To do so, we compute the ETV, as defined by (8), between the held-out test data and the cross-validated models of each type. This provides us with a measure of how the models generalize to unseen behavior, and thus an indication of how well the models describe a user's behavior. Because the ETV for a given user depends on their overall activity level, we standardize the ETV for a model m by the ETV for the seasonality model, giving us a score function

$$S_m = \frac{\mathrm{ETV}_{\mathrm{seasonal}}}{\mathrm{ETV}_m}. \qquad (11)$$

Recalling that a smaller ETV value indicates a smaller distance between the observed behavior and the model predictions, we see that $S_m$ will be greater than 1 when model m outperforms the seasonal model, and smaller than 1 otherwise.

The scores across all users for all models are shown in Figure 4. The diagonal shows the densities of scores across the users for each model type. The self- and socially-driven models generally perform better than the seasonal model, with all of the score densities having a heavy right tail. We see that the self-memoryful ε-transducer performs best, with a score greater than 1 for 82.1% of the users. The self-memoryless ε-transducer is next best, with a score greater than 1 for 79.4% of the users. The ε-machine has a score greater than 1 for 72% of the users.

Figure 4: The descriptive performance of the self-driven and socially-driven models for each user on the test set data relative to the seasonally-driven model using the ETV-based score defined by (11). The diagonal entries show the density of scores for the ε-machine, self-memoryless ε-transducer, and self-memoryful ε-transducer across the users. The off-diagonal entries compare the scores between the non-seasonal models. A score greater than 1 indicates that the referenced model outperformed the seasonally-driven model.

We further summarize the pairwise comparisons between the non-seasonal models in Table 1. As expected, the self-memoryful ε-transducer outperforms both the ε-machine and the self-memoryless ε-transducer on most users, as indicated by the mass of users above the identity line in the first quadrant in the bottom row of Figure 4. However, the users are much more equally split between those where the ε-machine outperforms the self-memoryless ε-transducer (45.2%) and vice versa (54.8%).

         M       T-ML    T-MF
M        --      0.452   0.244
T-ML     0.548   --      0.336
T-MF     0.756   0.618   --
Table 1: Pairwise comparison between the ε-machine (M), self-memoryless ε-transducer (T-ML), and self-memoryful ε-transducer (T-MF) across the users. Each entry gives the proportion of users for which the row model's score exceeds the column model's score.

III.2 ε-machine Causal Architectures

We next explore the typical ε-machine architectures across the users. The number of causal states for an ε-machine gives a rough indication of the complexity of the user's behavior, since each causal state indicates a further refinement of the past necessary for predictive sufficiency. In fact, the logarithm of the number of states is called the topological complexity of the ε-machine Crutchfield (1994b). We find that most users are best described by models with a small number of states, with 95% of users having 13 or fewer causal states, and the largest ε-machine having 58 states. Note that the maximum number of possible causal states with $L_{\max} = 6$ is $2^{6} = 64$.

Renewal Process. We next consider the general types of stochastic processes captured by many of the ε-machines. We find that a large proportion of the users have ε-machines which correspond to a generalization of a discrete-time renewal process. Recall that a discrete-time renewal process is a point process such that the lengths of periods of quiescence (runs of 0s between successive 1s) are independent and distributed according to an inter-arrival distribution F(n) Marzen and Crutchfield (2015). Equivalently, discrete-time renewal processes can be defined in terms of the survival function $w(n) = \Pr(T \ge n)$, where T is an inter-event time. Because discrete-time renewal processes are a special case of the more general processes described by (6), their ε-machine architecture takes on a very particular form Marzen and Crutchfield (2015). The ε-machine for a discrete-time renewal process has a unique start state transitioned to after a period of activity, and transitions after a period of quiescence traverse a chain of states that counts the number of time points since an activity period occurred. We reproduce the generic architecture found amongst the renewal process ε-machines in Figure 5 (left). This is a special finite-state case of the more general architecture for a discrete-time renewal process. In the nomenclature introduced in Marzen and Crutchfield (2015), this is an eventually Δ-Poisson process with characteristic parameters (ñ, Δ), where ñ refers to the number of quiescent time steps necessary for the ε-machine to behave as a Poisson (Bernoulli) process, and Δ refers to the smallest resolution at which the inter-event times may be coarse-grained and remain geometrically distributed. Such a process has an inter-event distribution satisfying

$$F(n + \Delta) = \lambda\, F(n) \quad \text{for all } n \ge \tilde{n}, \qquad (12)$$

where the values $F(0), \ldots, F(\tilde{n} + \Delta - 1)$ specify the initial behavior of the inter-event distribution and $0 < \lambda < 1$. We note that using CSSR with finite $L_{\max}$ necessarily results in the reconstruction of finite-state ε-machines, and thus for an eventually Δ-Poisson process with ñ larger than $L_{\max}$, the inferred ε-machine will be an approximation to the longer-memory process. In fact, this motivates a particular family of parametric models with parameters (ñ, Δ) and the initial values of the inter-event distribution, which specify the initial inter-event behavior. We emphasize that this particular family of parametric models was not assumed, but rather discovered via the use of CSSR.
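For intuition, inter-event times from the Δ = 1 member of this family (an eventually Poisson process, with a geometric tail beyond ñ) can be sampled as in the Python sketch below; the initial probabilities and tail parameter are illustrative placeholders.

import numpy as np

def sample_interevent(F_init, p_tail, size, seed=0):
    """Draw inter-event times T whose distribution has arbitrary initial
    values P(T = n) = F_init[n] for n < n~ and a geometric tail with
    parameter p_tail for n >= n~ (the Delta = 1 case of Eq. (12))."""
    rng = np.random.default_rng(seed)
    n_tilde = len(F_init)
    tail_mass = 1.0 - sum(F_init)          # probability of landing in the tail
    assert tail_mass > 0
    draws = []
    for u in rng.random(size):
        cdf = 0.0
        for n, f in enumerate(F_init):
            cdf += f
            if u <= cdf:
                draws.append(n)
                break
        else:
            # geometric tail: n~ plus a Geometric(p_tail) number of extra steps
            draws.append(n_tilde + rng.geometric(p_tail) - 1)
    return np.array(draws)

print(sample_interevent(F_init=[0.5, 0.2], p_tail=0.3, size=10))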

Reverse Renewal Process. A renewal process is specified by a distribution over run lengths of quiescence. For such a process, the distribution over run lengths of activity follows a geometric distribution. One could also define a process where these roles are reversed: the distribution over run lengths of activity takes an arbitrary form, and the distribution over run lengths of quiescence follows a geometric distribution. We call such a process a reverse renewal process, since the roles of quiescence and activity are reversed. The ε-machine for a reverse renewal process is given in Figure 5 (right). In analogy to the eventually Δ-Poisson process, we call this process a reverse eventually Δ-Poisson process, which has an inter-quiescence distribution G(n) satisfying

$$G(n + \Delta) = \lambda\, G(n) \quad \text{for all } n \ge \tilde{n}, \qquad (13)$$

where the values $G(0), \ldots, G(\tilde{n} + \Delta - 1)$ specify the initial behavior of the inter-quiescence distribution and $0 < \lambda < 1$.

Figure 5: The ε-machine representations of a renewal process of the eventually Δ-Poisson type (left) and of the reverse eventually Δ-Poisson type (right). 67.9% of users have ε-machines of the renewal type, while only 0.59% have ε-machines of the reverse renewal type.

Alternating Renewal Process. More generally, we can define a class of processes such that the distributions over run lengths of activity and quiescence are both allowed to deviate from the geometric distribution. Such processes are known as alternating renewal processes. An alternating renewal process switches between periods of quiescence and activity with probabilities governed by the quiescence-length and activity-length distributions. The ε-machine for an alternating renewal process with eventually geometric distributions for both inter-arrival and inter-quiescence times is given in Figure 6. We call such a process an alternating eventually Δ-Poisson process. Again, this class of processes offers another parametric model for user behavior, with parameters given by the characteristic pairs and initial values of the inter-arrival and inter-quiescence distributions.

Figure 6: The ε-machine representation of an alternating renewal process of the eventually Δ-Poisson type. 8.7% of users have ε-machines of the alternating renewal type.

Because of the stereotyped architecture of the ε-machines for renewal, reverse renewal, and alternating renewal processes, we can easily identify those users whose ε-machines have these architectures. An ε-machine represents an alternating renewal process if and only if, for each symbol $x \in \{0, 1\}$, there is precisely one state transitioned to on an x from a state transitioned to on the opposite symbol. For example, the state transitioned to on a 1 from states transitioned to on a 0 marks the start of a run of 1s (equivalently, the end of a run of 0s). The ε-machines for renewal / reverse renewal processes have this property, in addition to only having a single state transitioned to on a 1 / 0. Thus, renewal and reverse renewal processes are a subset of alternating renewal processes. Using these rules, we can identify which users' models correspond to renewal, reverse renewal, or alternating renewal processes. We find that 1881 (13.1%) of the ε-machines correspond to (homogeneous) Bernoulli processes, 5408 (37.7%) correspond to two-state renewal / reverse renewal processes, 2713 (18.9%) correspond to pure renewal processes with three or more states, 85 (0.59%) correspond to pure reverse renewal processes with three or more states, and 1250 (8.7%) correspond to alternating renewal processes with four or more states.
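These rules are straightforward to operationalize. The Python sketch below encodes a binary ε-machine as a transition map and applies the run-start-state criteria; the encoding and the example machine are illustrative, not the paper's implementation.

def run_start_states(trans):
    """trans maps (state, symbol) -> next_state for a binary machine.
    Returns, for each symbol x, the set of states reached on x from
    states that are themselves reached on the opposite symbol."""
    entered_on = {0: set(), 1: set()}
    for (s, x), s_next in trans.items():
        entered_on[x].add(s_next)
    return {x: {trans[(s, x)] for s in entered_on[1 - x] if (s, x) in trans}
            for x in (0, 1)}

def classify(trans):
    """Alternating renewal: exactly one run-start state per symbol.
    Renewal / reverse renewal additionally have a single state entered
    on a 1 / on a 0, respectively."""
    starts = run_start_states(trans)
    entered_on = {x: {s2 for (s, y), s2 in trans.items() if y == x} for x in (0, 1)}
    if len(starts[0]) > 1 or len(starts[1]) > 1:
        return "not alternating renewal (higher order)"
    if len(entered_on[1]) == 1:
        return "renewal"
    if len(entered_on[0]) == 1:
        return "reverse renewal"
    return "alternating renewal"

# Eventually-Poisson renewal example: state A (just active), B, C (chain on 0s).
trans = {("A", 1): "A", ("A", 0): "B",
         ("B", 1): "A", ("B", 0): "C",
         ("C", 1): "A", ("C", 0): "C"}
print(classify(trans))   # -> renewal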

As we have seen, by definition the non-alternating-renewal users must be such that their ε-machine has more than one state transitioned to on an x from states transitioned to on the opposite symbol, for at least one x. In practical terms, this means that for these users, knowledge of the time since a user switched from a period of activity / quiescence to a period of quiescence / activity is not sufficient to resolve a causal state. However, in many cases it is sufficient to know the behavior of the user immediately prior to a switch from quiescence to activity or vice versa. For example, a user who has just switched from active to quiescent may behave differently depending on whether they had been active for a single time step or for a longer run. These cases correspond to generalizations of the alternating renewal process to higher orders. Table 2 summarizes the number of models that correspond to an alternating renewal model of a certain order. For example, a zeroth-order alternating renewal model corresponds to a Bernoulli process, a first-order alternating renewal model corresponds to the model architecture in Figure 6, etc. We see that many of the models resolve to alternating renewal models of higher orders. In total, 95% of the users have an ε-machine in agreement with an alternating renewal model of order 6 or smaller.

Alternating Renewal Order    # of Ms           # of Ts
0                            1881 (13.1%)      648 (5.1%)
1                            9546 (66.7%)      8249 (65.3%)
2                            611 (4.3%)        400 (3.2%)
3                            493 (3.4%)        309 (2.4%)
4                            530 (3.7%)        243 (1.9%)
5                            518 (3.6%)        220 (1.7%)
6                            134 (1.0%)        147 (1.2%)
Table 2: The number (proportion) of users with ε-machines and ε-transducers of a given alternating renewal order. A 0th order alternating renewal process corresponds to a Bernoulli process, a 1st order alternating renewal process corresponds to a standard renewal process, etc.

III.3 ε-transducer Causal Architectures

Thus far, we have considered the models associated with user behavior when we ignore their inputs. Next we turn to the models that incorporate those inputs, namely the ε-transducers. Recall that the input we consider is whether a user was mentioned during time interval t. As with the ε-machines for the self-driven model, we find that most users are well-described by ε-transducers with a small number of states, with 95% having 6 states or fewer and 90% having 25 or fewer states in the self-memoryless and self-memoryful cases, respectively.

Renewal-like Self-Memoryless Transducer. For the self-memoryless case, 12059 of the 12641 mentioned users (95%) have an ε-transducer with an architecture analogous to Figure 5 (left). That is, the user has a 'just-mentioned' state, and subsequent time steps without the user receiving a mention lead to transitions away from this state, until a terminal state is reached. The causal states therefore map to the time since the user was mentioned, with all times beyond the length of the chain mapped to the same terminal state. Thus, when viewed as purely socially-driven, the relevant quantity to track for almost all of the users is the time since they were last mentioned.

Renewal-like Self-Memoryful Transducer. A similar overarching 'counting' model architecture is also present amongst the memoryful ε-transducers. Recalling that another way to view the states of an alternating renewal process is as counting the length of the current run of one symbol since the last occurrence of the other, we can generalize this to the ε-transducer by considering states that count the lengths of runs of a joint input-output symbol (y, x) since the last occurrence of a different input-output symbol. As in the memoryless ε-transducer case, we call this an alternating renewal-like process, since the causal states act in a similar fashion. We present a schematic representation of the partitioning of the channel causal state space in Figure 7. For these ε-transducers, the causal states can be partitioned based on the input-output pair whose runs they count. Thus, we begin by dividing the set of causal states into four quadrants, one for each input-output pair $(y, x) \in \{0, 1\}^2$; all states in the quadrant labeled by (y, x) are transitioned to on (y, x). Then, the causal states within a quadrant are further partitioned into thirds, where each third corresponds to the input-output symbol seen immediately before the current run began, one third for each of the three other pairs. Thus, each third has a unique start state that is transitioned to on the counted pair from a state transitioned to on the preceding pair. 10216 of 12641 (81%) of the mentioned users have an ε-transducer in this alternating renewal-like class.

Note that the partitioning given in Figure 7 is the most general possible for this type of ε-transducer. The quadrants, or the thirds within a quadrant, may further collapse, as dictated by the structure of the ε-transducer. For example, Figure 8 shows an alternating renewal-like transducer inferred for 27% of the mentioned users. This ε-transducer has three states, which correspond to runs of (unmentioned, inactive), runs of (unmentioned, active), and runs of mentions regardless of activity, and are labeled as such. In this case, the two quadrants corresponding to a mention collapse, since the corresponding state counts runs of mentions regardless of the user behavior x. Moreover, all thirds within a given quadrant also collapse, since the states treat runs of a pair as the same no matter which pair preceded them.

In terms of the actual behavior of the user, the first state corresponds to when the user has been both quiescent and unmentioned in the recent past, the second to when the user has been active, but not mentioned, in the recent past, and the third to when the user has been mentioned in the recent past, regardless of whether or not the user has been active. Each state carries its own probability that the user is active, given by the transition probabilities in Figure 8. Thus, for over a quarter of the users, we see that knowledge of the recent past of both their own behavior and that of their inputs provides sufficient information for predicting their future behavior. In particular, each of the quadrants requires only a single state, whereas the most general model of this type allows for a chain of counting states within each third of each quadrant.
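Read as a generative model, the three-state transducer can be simulated directly: the state records which joint input-output pair is currently in a run, and the probability of posting depends only on that state. In the Python sketch below the transition logic follows the description above, while the activity probabilities are placeholders, since the values shown in Figure 8 were not preserved here.

import numpy as np

def simulate_three_state(mentions, p_act, seed=0):
    """Simulate user activity from the three-state, alternating
    renewal-like transducer: state 'quiet' (unmentioned & inactive),
    'active' (unmentioned & recently active), 'mentioned' (recently
    mentioned, regardless of own activity). p_act maps state -> P(post).
    The probabilities are illustrative placeholders, not the paper's."""
    rng = np.random.default_rng(seed)
    state, x = "quiet", []
    for y in mentions:
        x_t = int(rng.random() < p_act[state])
        x.append(x_t)
        if y == 1:
            state = "mentioned"          # a mention dominates, whatever x_t is
        elif x_t == 1:
            state = "active"
        else:
            state = "quiet"
    return np.array(x)

mentions = np.array([0, 0, 1, 0, 0, 0, 1, 1, 0, 0])
x = simulate_three_state(mentions, p_act={"quiet": 0.05, "active": 0.4, "mentioned": 0.6})
print(x)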

As with the renewal, reverse renewal, and alternating renewal processes inferred from the ε-machines, this alternating renewal-like ε-transducer motivates a particular parametric model, albeit a much more complicated one. In this case, we need to specify the chain lengths within each third of each quadrant. However, the number of states in such a model grows only linearly in the maximum run length tracked, compared to the geometric growth in the number of states of the most general model as a function of history length.

Again, as with the alternating renewal process, the alternating renewal-like ε-transducer generalizes to higher orders by considering the input-output behavior immediately prior to a switch from one input-output pair to another. For example, a second-order alternating renewal-like ε-transducer would distinguish between a user becoming quiescent and unmentioned after being mentioned twice in the past compared to going unmentioned before the previous mention. Many of the users exhibit ε-transducers of higher order, as shown in Table 2. Of the 12641 mentioned users, 78% are alternating renewal-like of order 6 or smaller.

Figure 7: A schematic demonstrating the partitioning of the transducer state space associated with a renewal-like ε-transducer. Each quadrant is determined by the input-output symbol pair being 'counted,' and each third within a quadrant is determined by the input-output pair the count begins from. We only show outgoing transitions for the first third of the first quadrant, which correspond either to continuing the counted run or to starting a run of a different input-output pair. 70.4% of users have ε-transducers of this type.
Figure 8: The most common self-memoryful ε-transducer architecture, associated with 3376 of the 12641 mentioned users (27%).

IV Conclusions

In this paper, we have developed and applied a modeling framework for human behavior in digital environments. The approach begins by viewing a user's behavior as a discrete-time point process at a prespecified temporal resolution, and then considers four possible stochastic models that might give rise to the user's behavior, namely the seasonally-driven, self-driven, socially-driven, and self- and socially-driven processes, which we estimate using an inhomogeneous Bernoulli process, an ε-machine, and the self-memoryless and self-memoryful ε-transducers, respectively.

We have found that simple computational architectures, as specified by their ε-machines and ε-transducers, describe much of the observed behavior of the users in our data set. A renewal process model, or its generalizations to reverse renewal and alternating renewal processes, was found to be appropriate for approximately 80% of the users in our study. This is in agreement with much of the literature on human communication patterns. However, we emphasize that we did not assume such models a priori, but rather discovered their prevalence by using non-parametric modeling in an exploratory fashion. In fact, the appearance of reverse renewal and alternating renewal processes demonstrates that renewal process models alone are not sufficient to describe, for example, the burstiness observed in human communication patterns. Moreover, we discovered a new class of renewal-like models that generalize renewal processes to input-output systems. We found that this class of models describes over 70% of the users in terms of the interaction between their activity and their social inputs. The prevalence of these stereotyped ε-machines and ε-transducers motivates the use of either frequentist (such as the cross-validation approach used in this paper) or Bayesian (as recently developed in Strelioff and Crutchfield (2014)) approaches that take advantage of these structures a priori during the estimation process. In addition to the generalized alternating renewal models, more general models were necessary for over 20% of the users in the self-driven case and nearly 30% of the users in the self- and socially-driven case.

One shortcoming of our work is the discretization of the behavior of all users at the time resolution of 10 minutes in order to analyze their behavior through the lens of discrete-time computational mechanics. The computational mechanics of continuous-time, discrete-event processes was recently developed in Marzen and Crutchfield (2017a, b). While at present machine reconstruction algorithms for inferring ε-machines from such processes do not exist, this work and others motivate their development. A reanalysis of this data set from this viewpoint may reveal additional details hidden by the discretization.

The apparent complexity of observed user behavior on Twitter seems to arise from a simple computational landscape. Our present work lays out an initial sketch of its features. We hope this work motivates further exploration of the computational landscape of user behavior on Twitter and other communication platforms, and refinement of their maps.

Appendix A The transCSSR Algorithm for ε-transducer Reconstruction

A diverse collection of algorithms has been developed to infer ε-machines from data, from the topological methods first presented in Crutchfield and Young (1989) to more recent Bayesian methods Strelioff and Crutchfield (2014). Additional algorithms have been developed based on spectral methods Varn et al. (2013) and integer programming Paulson and Griffin (2014). We focus on the Causal State Splitting Reconstruction (CSSR) algorithm Shalizi and Klinkner (2004).

The theory formalizing ε-transducers has only recently been developed, and the literature on ε-transducer reconstruction from finite data is sparse. Sketches of CSSR-like algorithms for ε-transducer reconstruction are provided in Shalizi and Crutchfield (2001); Haslinger et al. (2010). In this appendix, we develop the ideas originally suggested in these prior works, and present a generalization of CSSR for ε-transducer reconstruction from data resulting from input-output systems. In homage to CSSR, we call our algorithm transCSSR, a portmanteau of transducer and CSSR. The transCSSR algorithm has been implemented in Python, and is available on GitHub Darmon (2015).

We sketch the transCSSR algorithm here, and give the pseudocode in Figure 9. The first phase of the algorithm groups input-output histories into a set of weakly prescient states. It begins by assuming that all input-output histories induce the same one-step-ahead predictive distribution. This is equivalent to assuming the transducer output future is independent of the input-output past, or to grouping together all joint histories into a candidate causal state represented by the joint input-output suffix (λ, λ), where λ is the null symbol. At each successive step, each candidate causal state is tested for weak prescience by growing its histories into the past by one input-output pair. If the state is weakly prescient, then the predictive distribution for the new history should be equivalent to that of its parent causal state. This condition is tested using the null hypothesis

$$H_0 : P\left(X_t \mid (\overleftarrow{Y}_t, \overleftarrow{X}_t) = (\overleftarrow{y}, \overleftarrow{x})\right) = P\left(X_t \mid \hat{S} = \hat{s}\right), \qquad (14)$$

where $(\overleftarrow{y}, \overleftarrow{x})$ is the extended history and $\hat{s}$ is its parent candidate causal state, with a significance test of size α. If the null hypothesis is rejected, the predictive distribution of the history is compared against each of the remaining candidate causal states $\hat{s}'$ using the restricted alternative hypothesis

$$H_1 : P\left(X_t \mid (\overleftarrow{Y}_t, \overleftarrow{X}_t) = (\overleftarrow{y}, \overleftarrow{x})\right) = P\left(X_t \mid \hat{S} = \hat{s}'\right). \qquad (15)$$

Finally, if the history's predictive distribution does not agree with any of the candidate causal states, it is split into a new candidate causal state. Such potential splitting is performed until the input-output histories under consideration are each of length $L_{\max}$. Any hypothesis test for comparing discrete distributions may be used for (14) and (15). We use the test based on the χ²-statistic Harremoës and Tusnády (2012).

At the end of this stage, each candidate causal state consists of histories that are within-state equivalent and between-state distinct in terms of their predictive distributions, and each candidate causal state is weakly prescient. The causal states have these properties, in addition to being deterministic / unifilar on transitions between states on input-output pairs. To ensure determinism / unifilarity, the successor state of each history on each input-output pair is determined. If two or more histories in the same state transition to different states on the same input-output pair, that state is split, and the determinization step is repeated. This procedure repeats until all transitions are deterministic. Since there are finitely many histories, this procedure always terminates, in the extreme case with each history having its own causal state. Because this stage only ever splits histories from states, and the states before this stage were weakly prescient and within-state equivalent and between-state distinct, the resulting states are as well. Therefore, the procedure results in a set of states that are weakly prescient and deterministic, and thus causal.
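To make the homogenization step concrete, the following Python sketch shows the core split-or-keep decision for a single extended history using a χ²-test on next-output counts, cf. (14). It is a simplified illustration, not the transCSSR implementation, which also applies the restricted alternative test (15), handles arbitrary alphabets, and performs the determinization phase.

from collections import Counter
from scipy.stats import chi2_contingency

def split_or_keep(parent_counts, history_counts, alpha=0.001):
    """Test H0: the extended history has the same next-output distribution
    as its parent state, using a chi-squared test on the two count vectors.
    Returns True if the history should be split into its own candidate state."""
    table = [[parent_counts.get(x, 0) for x in (0, 1)],
             [history_counts.get(x, 0) for x in (0, 1)]]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha

# Toy example: next-output counts following the parent state's histories
# versus following one particular extended history.
parent = Counter({0: 480, 1: 120})       # roughly 20% chance of posting
extended = Counter({0: 20, 1: 40})       # roughly 67% chance of posting
print(split_or_keep(parent, extended))   # -> True: split into a new state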

Figure 9: Pseudo-code for the transCSSR algorithm (the symbols in the pseudocode were lost in extraction). The algorithm proceeds in three phases: (i) Initialization, which places the null joint history in a single candidate state; (ii) Homogenization, which iteratively extends each history by one input-output pair, estimates its one-step-ahead predictive distribution, and tests it against its parent state and the remaining candidate states via (14) and (15), splitting off new states as needed, until the maximum history length is reached; and (iii) Determinization, which removes transient states and repeatedly splits any state containing histories that transition to different states on the same input-output pair, until all transitions are unifilar. Arguments: the discrete input and output alphabets; the observed joint input-output sequence; the maximum history length used when estimating candidate causal states; and the size of the hypothesis tests, i.e., the probability of falsely rejecting the null hypotheses (14) and (15).

References

  • Eubank et al. (2004) Stephen Eubank, VS Kumar, Madhav V Marathe, Aravind Srinivasan,  and Nan Wang, “Structural and algorithmic aspects of massive social networks,” in Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Society for Industrial and Applied Mathematics, 2004) pp. 718–727.
  • Compton et al. (2014) Ryan Compton, David Jurgens,  and David Allen, “Geotagging one hundred million twitter accounts with total variation minimization,” in Big Data (Big Data), 2014 IEEE International Conference on (IEEE, 2014) pp. 393–401.
  • Kramer et al. (2014) Adam DI Kramer, Jamie E Guillory,  and Jeffrey T Hancock, “Experimental evidence of massive-scale emotional contagion through social networks,” Proc. Natl. Acad. Sci. U.S.A. 111, 8788–8790 (2014).
  • Toole et al. (2015) Jameson L Toole, Carlos Herrera-Yaqüe, Christian M Schneider,  and Marta C González, “Coupling human mobility and social ties,” J. R. Soc. Interface 12, 20141128 (2015).
  • Oliveira and Barabási (2005) Joao Gama Oliveira and Albert-László Barabási, “Human dynamics: Darwin and Einstein correspondence patterns,” Nature 437, 1251–1251 (2005).
  • Malmgren et al. (2008) R Dean Malmgren, Daniel B Stouffer, Adilson E Motter,  and Luís AN Amaral, “A poissonian explanation for heavy tails in e-mail communication,” Proc. Natl. Acad. Sci. U.S.A. 105, 18153–18158 (2008).
  • Malmgren et al. (2009) R Dean Malmgren, Jake M Hofman, Luis AN Amaral,  and Duncan J Watts, “Characterizing individual communication patterns,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, 2009) pp. 607–616.
  • Jiang et al. (2013) Zhi-Qiang Jiang, Wen-Jie Xie, Ming-Xia Li, Boris Podobnik, Wei-Xing Zhou,  and H Eugene Stanley, “Calling patterns in human communication dynamics,” Proc. Natl. Acad. Sci. U.S.A. 110, 1600–1605 (2013).
  • Wu et al. (2010) Ye Wu, Changsong Zhou, Jinghua Xiao, Jürgen Kurths,  and Hans Joachim Schellnhuber, “Evidence for a bimodal distribution in human communication,” Proc. Natl. Acad. Sci. U.S.A. 107, 18803–18808 (2010).
  • Goh and Barabási (2008) K-I Goh and A-L Barabási, “Burstiness and memory in complex systems,” Europhys. Lett. 81, 48002 (2008).
  • Kivelä and Porter (2015) Mikko Kivelä and Mason A Porter, “Estimating interevent time distributions from finite observation periods in communication networks,” Phys. Rev. E 92, 052813 (2015).
  • Ross and Jones (2015) Gordon J Ross and Tim Jones, “Understanding the heavy tailed dynamics in human behavior,” Phys. Rev. E 91, 062809 (2015).
  • Shalizi and Crutchfield (2001) Cosma Rohilla Shalizi and James P Crutchfield, “Computational mechanics: Pattern and prediction, structure and simplicity,” J. Stat. Phys. 104, 817–879 (2001).
  • Raghavan et al. (2013) Vasanthan Raghavan, Greg Ver Steeg, Aram Galstyan,  and Alexander G Tartakovsky, “Modeling temporal activity patterns in dynamic social networks,” IEEE Trans. Comput. Soc. Syst.  (2013).
  • Johnson et al. (2012) Benjamin D Johnson, James P Crutchfield, Christopher J Ellison,  and Carl S McTague, “Enumerating finitary processes,” Theo. Comp. Sci.  (2012).
  • Wiesner and Crutchfield (2008) Karoline Wiesner and James P Crutchfield, “Computation in finitary stochastic and quantum processes,” Physica D 237, 1173–1195 (2008).
  • Rybski et al. (2012) Diego Rybski, Sergey V Buldyrev, Shlomo Havlin, Fredrik Liljeros,  and Hernán A Makse, “Communication activity in a social network: relation between long-term correlations and inter-event clustering,” Sci. Rep. 2 (2012).
  • Crutchfield et al. (2015) James P Crutchfield, Michael Robert DeWeese, and Sarah E Marzen, “Time resolution dependence of information measures for spiking neurons: Scaling and universality,” Front. Comput. Neurosci. 9, 105 (2015).
  • Esteban et al. (2012) Javier Esteban, Antonio Ortega, Sean McPherson,  and Maheswaran Sathiamoorthy, “Analysis of twitter traffic based on renewal densities,” arXiv preprint arXiv:1204.3921  (2012).
  • Doerr et al. (2013) Christian Doerr, Norbert Blenn,  and Piet Van Mieghem, “Lognormal infection times of online information spread,” PLOS ONE 8, e64349 (2013).
  • Jo et al. (2012) Hang-Hyun Jo, Márton Karsai, János Kertész,  and Kimmo Kaski, “Circadian pattern and burstiness in mobile phone communication,” New J. Phys. 14, 013055 (2012).
  • Hastie and Tibshirani (1990) Trevor J Hastie and Robert J Tibshirani, Generalized Additive Models, Vol. 43 (CRC Press, 1990).
  • Caires and Ferreira (2005) Sofia Caires and Jose A. Ferreira, “On the non-parametric prediction of conditionally stationary sequences,” Statistical Inference for Stochastic Processes 8, 151–184 (2005).
  • Ver Steeg and Galstyan (2012) Greg Ver Steeg and Aram Galstyan, “Information transfer in social media,” in Proceedings of the 21st International Conference on World Wide Web (ACM, 2012) pp. 509–518.
  • Darmon et al. (2013) David Darmon, Jared Sylvester, Michelle Girvan,  and William Rand, “Predictability of user behavior in social media: Bottom-up v. top-down modeling,” in Social Computing (SocialCom), 2013 International Conference on (IEEE, 2013) pp. 102–107.
  • Crutchfield (1994a) James P Crutchfield, Optimal Structural Transformations–the ε-transducer, Tech. Rep. (UC Berkeley Physics Research Report, 1994).
  • Shalizi (2001) Cosma Rohilla Shalizi, Causal architecture, complexity and self-organization in the time series and cellular automata, Ph.D. thesis, University of Wisconsin–Madison (2001).
  • Barnett and Crutchfield (2015) Nix Barnett and James P Crutchfield, “Computational mechanics of input-output processes: Structured transformations and the ε-transducer,” J. Stat. Phys., 1–48 (2015).
  • Hastie et al. (2009) Trevor Hastie, Robert Tibshirani, Jerome Friedman, T Hastie, J Friedman,  and R Tibshirani, The Elements of Statistical Learning, Vol. 2 (Springer, 2009).
  • Shalizi and Klinkner (2004) Cosma Rohilla Shalizi and Kristina Lisa Klinkner, “Blind construction of optimal nonlinear recursive predictors for discrete sequences,” in Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI 2004), edited by Max Chickering and Joseph Y. Halpern (AUAI Press, Arlington, Virginia, 2004) pp. 504–511.
  • Crutchfield (1994b) James P Crutchfield, “The calculi of emergence: computation, dynamics and induction,” Physica D 75, 11–54 (1994b).
  • Marzen and Crutchfield (2015) S Marzen and JP Crutchfield, “Informational and causal architecture of discrete-time renewal processes,” Entropy 17, 4891–4917 (2015).
  • Strelioff and Crutchfield (2014) Christopher C Strelioff and James P Crutchfield, “Bayesian structural inference for hidden processes,” Phys. Rev. E 89, 042119 (2014).
  • Marzen and Crutchfield (2017a) Sarah Marzen and James P Crutchfield, “Informational and causal architecture of continuous-time renewal processes,” J. Stat. Phys. 168, 109–127 (2017a).
  • Marzen and Crutchfield (2017b) Sarah E Marzen and James P Crutchfield, “Structure and randomness of continuous-time, discrete-event processes,” J. Stat. Phys. 169, 303–315 (2017b).
  • Crutchfield and Young (1989) James P Crutchfield and Karl Young, “Inferring statistical complexity,” Phys. Rev. Lett. 63, 105 (1989).
  • Varn et al. (2013) Dowman P Varn, Geoffrey S Canright, and James P Crutchfield, “ε-machine spectral reconstruction theory: a direct method for inferring planar disorder and structure from x-ray diffraction studies,” Acta Crystallogr. A 69, 197–206 (2013).
  • Paulson and Griffin (2014) Elisabeth Paulson and Christopher Griffin, “Computational complexity of the minimum state probabilistic finite state learning problem on finite data sets,” arXiv preprint arXiv:1501.01300  (2014).
  • Haslinger et al. (2010) Robert Haslinger, Kristina Lisa Klinkner,  and Cosma Rohilla Shalizi, “The computational structure of spike trains,” Neural Comput. 22, 121–157 (2010).
  • Darmon (2015) David Darmon, “transCSSR: Transducer Causal State Splitting Reconstruction,” https://github.com/ddarmon/transCSSR (2015).
  • Harremoës and Tusnády (2012) Peter Harremoës and Gábor Tusnády, “Information divergence is more χ²-distributed than the χ²-statistics,” in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on (IEEE, 2012) pp. 533–537.