I Introduction
The current decade has been marked by an increasing availability of high-resolution, heterogeneous data sets capturing human behavior in both real-world and digital environments Eubank et al. (2004); Compton et al. (2014); Kramer et al. (2014); Toole et al. (2015). This has made possible, for the first time, large-scale investigations into human behavior across diverse groups of individuals. Among such phenomena, human communication patterns are among the most well-studied. Such studies have included written correspondence Oliveira and Barabási (2005), email correspondence Malmgren et al. (2008, 2009), and call/SMS records Jiang et al. (2013); Wu et al. (2010). The characteristics of these behavioral patterns include heavy tails, seasonality, and burstiness. This is still an active field of research, and many authors have called into question whether the observed patterns are truly universal characteristics of human behavior or epiphenomena of the methods used in data collection and analysis Goh and Barabási (2008); Kivelä and Porter (2015); Ross and Jones (2015).
The standard model for human communication patterns treats the observed behavior as a realization from some sort of point process. Typically, the point process is taken to be a renewal process, where the observed behavior is completely specified by a distribution over the times between activity. To account for the complex properties of human behavior enumerated above, the inter-event distribution is specified to have a heavy right tail, which naturally gives rise to burstiness. The authors of Malmgren et al. (2008, 2009) develop a refinement of this model which incorporates seasonality by allowing an individual to pass between passive and active states, where the behavior within the active state is governed by a Poisson process. Further refinements of this model allow the activity during the active periods to follow non-Poissonian dynamics Ross and Jones (2015).
We undertake an analysis of human communication that does not a priori assume a renewal or renewal-like model of the observed behavior. Motivated by the field of computational mechanics Shalizi and Crutchfield (2001), we define our models explicitly in terms of a predictive representation of the observed behavior. Unlike renewal process models, we do not assume that the behavior of individuals depends only on the time between actions. We seek to understand the behavior locally in time, where locality is defined around periods of activity. Moreover, we explicitly incorporate the interactive aspect of online social media services, something missing from much of the work on modeling human inter-event distributions, with Raghavan et al. (2013) as a notable exception.
In Johnson et al. (2012), the authors set out to elucidate the structural properties of stochastic processes using tools from computational mechanics. To do so, they restricted their investigation to the subset of stochastic processes that are finitary, that is, those stochastic processes that have a representation with a finite number of “causal” states Wiesner and Crutchfield (2008), as defined in Section II.3. In this work, instead of elucidating all possible finitary models, we approach the problem from the opposite direction: we seek to trace the computational landscape of human behavior in digital environments by discovering the finitary models present in user behavior, and then investigate their computational structure.
We consider four models for user behavior on social media. Figure 1 provides a schematic representation of these models. The most general model (a) assumes that a user's future behavior is influenced by both their past behavior and the past behavior of their social network, which we call the self+socially-driven model. Models (b) and (c) are two restrictions of this model: the former assumes that a user's future behavior is influenced only by their own past behavior, and the latter assumes that the user's future behavior is influenced only by the past behavior of their social network. Finally, model (d) corresponds to the case where the user's behavior is entirely explained by time-of-day and day-of-week effects (i.e., seasonality).
In the rest of the paper, we proceed as follows. In Section II we motivate and develop the four models just presented, and propose methods for inferring them. In Section III.1 we explore the descriptive performance of these models on a real-world data set derived from 15K users on the microblogging platform Twitter. In Section III.2, we investigate the structure of the models present amongst the users in our data set, and discuss the implications of these models. Finally, we conclude by discussing what our present work implies for the study of human communication patterns.
II Methodology
II.1 User Behavior as a Discrete-Time Point Process
Consider the behavior of a user on a social media service. At any given time instant, a user either posts to the social media service or not. Thus, the user's behavior may be modeled as a point process, where events correspond to posts. A naive model of the user's behavior might assume that they are equally likely to use the service during any time instant. Under this model, the time between uses is exponentially distributed, and their activity pattern would correspond to a realization from a Poisson process. However, human communication patterns are known to exhibit nontrivial complexities not accounted for by this model Goh and Barabási (2008); Rybski et al. (2012), and thus more flexible models are required. In the following sections, we present three models that capture the observed complexity of human behavior in very different ways: a seasonally-driven model where the user's behavior is accounted for by time-of-day; a self-driven model where a user's behavior results from self-feedback; and a socially-driven model where a user's behavior results from both social and self-feedback.

In practice, information about human behavior on digital services is reported in seconds. Because we are interested in human-scale interactions between a user, their inputs, and the social media service, this time resolution is too fine-grained. We begin by discretizing time into intervals of length Δt. We then ask whether, during an interval t, the user was active. We denote this value for a user u by X_t^{(u)} and define
X_t^{(u)} = \begin{cases} 1, & \text{user } u \text{ active during interval } t, \\ 0, & \text{otherwise.} \end{cases} \qquad (1)
The choice of Δt specifies the time scale of interest. For example, if we take Δt = 1 week, then the process X_t captures the weekly patterns of behavior that the user exhibits. If instead we take Δt = 1 day, then X_t captures the daily patterns of the user. In this paper, we will take Δt = 10 minutes, because we are interested in the short-timescale structure of user behavior and user-user interaction. However, it is important to note that there is no single 'correct' resolution when considering the behavior of a point process, and a multi-timescale analysis may be appropriate Crutchfield et al. (2015). Moreover, different time resolutions may be more or less appropriate for different users. Figure 2a demonstrates the activity patterns of three users at the 10 minute resolution, represented as a rastergram. Each row of the rastergram corresponds to a single day of activity, and each column of the rastergram corresponds to a ten minute window within a single day. A point occurs in the rastergram when X_t = 1 for that day and time.
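As a concrete illustration, the discretization just described can be sketched in a few lines. The function name and the raw-timestamp representation (seconds from some origin) are illustrative, not part of the paper's pipeline.

```python
# Sketch: discretize raw post timestamps (in seconds) into the binary
# activity series X_t at resolution dt. Illustrative helper, not the
# paper's actual preprocessing code.

def binarize_activity(timestamps, t_start, t_end, dt):
    """Return x where x[t] = 1 iff at least one post falls in bin t."""
    n_bins = (t_end - t_start) // dt
    x = [0] * n_bins
    for ts in timestamps:
        if t_start <= ts < t_end:
            x[(ts - t_start) // dt] = 1
    return x

# Three posts within a one-hour window, dt = 10 minutes (600 s):
# two posts share the first bin, so the series marks two active bins.
x = binarize_activity([30, 45, 1900], t_start=0, t_end=3600, dt=600)
print(x)  # [1, 0, 0, 1, 0, 0]
```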
In a social media setting, a user has access to information provided by other users on the service. For example, a user might passively examine the messages generated by other users they follow, observe a particular form of communication directed at them, or actively investigate a keyword or topic. Generically, we will denote the inputs to a user as Y_t^{(u)}. We will assume that the inputs to the user can be mapped to a finite alphabet 𝒴. As an example, if we consider Y_t to correspond to whether or not the user receives a mention during the time interval t, then we take 𝒴 = {0, 1}, where Y_t = 0 corresponds to no mention during that time interval, and Y_t = 1 corresponds to one or more mentions.
Our goal in this paper is to develop several contrasting models of a user's observed behavior X_t^{(u)}. We take a predictive view of modeling, where we seek to infer the probability that the user engages with the social media service, given their past history of engagement and the past history of their inputs. Let

\overleftarrow{X}_t = (\ldots, X_{t-2}, X_{t-1})

be the past behavior of user u, and let

\overrightarrow{X}_t = (X_t, X_{t+1}, \ldots)

be the future behavior of the user. Similarly, let \overleftarrow{Y}_t and \overrightarrow{Y}_t be the past and future values of the user's inputs, considering t as the present. Then we are interested in determining

P\left(\overrightarrow{X}_t \,\middle|\, \overleftarrow{X}_t = \overleftarrow{x}_t, \overleftarrow{Y}_t = \overleftarrow{y}_t\right), \qquad (2)

the distribution over user u's behavior starting from time t, given their own past behavior and the past behavior of their inputs. For ease of presentation, in the following sections we drop the dependence on u in the notation, but emphasize that for each user u, we assume a unique model for that user's behavior.
II.2 Seasonally-Driven Model: Inhomogeneous Bernoulli
Thus far, we have specified our model of human behavior in terms of a discrete-time point process: the observed behavior of the user is either active or quiescent during any given interval of time. One of the simplest models that can capture some of the complexity of human behavior is a renewal process Esteban et al. (2012); Doerr et al. (2013). From this perspective, the activity of the user is taken to occur at random times, with the time between occurrences governed by a distribution over the interarrival times. For example, if we take the interarrival distribution to be geometric with parameter p, then the renewal process is a Bernoulli process, the discrete-time analog of a Poisson process. Typically, the interarrival distribution is taken to have a long tail, to capture the fact that human behavior tends to be bursty, with long periods of quiescence punctuated by periods of high activity. See Figure 2 for examples of users who exhibit such behavior. Popular distributions for the interarrival times include lognormal, power law, and stretched exponential distributions Goh and Barabási (2008).
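To make the renewal picture concrete, here is a minimal simulation sketch contrasting geometric and heavy-tailed interarrival draws; the helper names and parameter values are ours, chosen only for illustration, and the Pareto draw is a stand-in for the heavy-tailed families cited above.

```python
# Sketch: a discrete-time renewal process is simulated by laying down
# events separated by i.i.d. interarrival draws. Geometric gaps give a
# Bernoulli process; a heavy-tailed draw gives bursty activity with long
# quiescent periods.

import random

def geometric_gap(rng, p=0.2):
    """Number of trials until first success; mean 1/p."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

def simulate_renewal(draw_gap, n_bins, seed=0):
    """Binary series with events separated by draws from draw_gap(rng)."""
    rng = random.Random(seed)
    x, t = [0] * n_bins, 0
    while True:
        t += draw_gap(rng)
        if t >= n_bins:
            return x
        x[t] = 1

geo = simulate_renewal(geometric_gap, 1000)                          # Bernoulli-like
heavy = simulate_renewal(lambda r: int(r.paretovariate(1.2)), 1000)  # bursty
print(sum(geo) > 0, sum(heavy) > 0)  # True True
```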
Due to the inherent seasonality in human behavior, time-homogeneous renewal-process-based models are almost certainly misspecified. For example, a typical user on Twitter will be more likely to be active during the daylight hours in their geographic area than during the nighttime hours. This fact may explain the long tails typically observed in studies of the activity patterns of humans Jo et al. (2012). Moreover, we see such daily and weekly seasonality patterns in the aggregate behavior of users on Twitter. Because of this, we consider a time-inhomogeneous point process model for a user's observed activity, where the probability that a user is active during any time interval is independent of their previous activity and the activity of their inputs, and varies smoothly with time,
P(X_t = 1) = p(t). \qquad (3)
Moreover, we assume that p(t) is periodic, p(t + T) = p(t), with T chosen such that for a coarsening interval Δt, T · Δt = 1 week. We take (3) as our instantiation of the seasonally-driven model from Figure 1d.
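As a rough illustration of the seasonally-driven model, one can estimate a periodic activity probability by averaging the binary series at each phase of its period. The paper fits p(t) with smoothing splines, so the bin-wise mean below is only a crude stand-in used to show the periodicity assumption.

```python
# Sketch: estimate a periodic activity probability p(t) by averaging the
# binary series over phases of its period. Crude stand-in for the spline
# fit described in the text.

def estimate_periodic_rate(x, period):
    """p_hat[s] = fraction of cycles in which phase s was active."""
    sums = [0] * period
    counts = [0] * period
    for t, xt in enumerate(x):
        sums[t % period] += xt
        counts[t % period] += 1
    return [s / c for s, c in zip(sums, counts)]

# Toy series with period 4: always active at phase 0, active half the
# time at phase 2, never active otherwise.
x = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0]
print(estimate_periodic_rate(x, period=4))  # [1.0, 0.0, 0.5, 0.0]
```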
II.3 Self-Driven Model: The ε-machine
The previous model assumes that the user's activity during any time interval is independent of their activity during other time intervals, and accounts for the seasonality and burstiness observed in a user by allowing the probability of their activity to vary according to time-of-day and day-of-week. Alternatively, a user might exhibit burstiness due to self-excitation. As an example, the user might be isolated from the devices they use to interact with the social media service, which would lead to a period of quiescence. Then, upon regaining access to their devices, they might use the service, which could lead to self-excitation to continue using the service.
This sort of behavior motivates an autoregressive model for the user's behavior, where their behavior in the future is determined by their past behavior. That is, the probability that they behave a certain way in the future starting at time t is determined by how they behaved up until time t and does not depend on their inputs,

P\left(\overrightarrow{X}_t \,\middle|\, \overleftarrow{X}_t = \overleftarrow{x}_t\right). \qquad (6)
This model assumes that the user's behavior is conditionally stationary Caires and Ferreira (2005). For human behavior, this assumption may not hold in general, and thus care must be taken in applying this model to actual data. We address this in Section II.5, where we specify our procedure for daycasting user behavior. Previous work has found this model to perform well for many users on Twitter Ver Steeg and Galstyan (2012); Raghavan et al. (2013); Darmon et al. (2013).
We take the self-driven model from Figure 1b to correspond to a stochastic process governed by (6), which we develop using computational mechanics Shalizi and Crutchfield (2001). Computational mechanics provides the unique, minimally complex, maximally predictive representation of a discrete-state, discrete-time stochastic process over the alphabet 𝒳. The insight of computational mechanics is that when considering the predictive distribution (6), it is typically more useful to consider a statistic of the past rather than the entire past itself. It can be shown that the unique minimal sufficient predictive statistic of the past for the future of a conditionally stationary stochastic process is the equivalence class over predictive distributions. That is, two pasts \overleftarrow{x} and \overleftarrow{x}' are considered equivalent if and only if P(\overrightarrow{X} | \overleftarrow{X} = \overleftarrow{x}) = P(\overrightarrow{X} | \overleftarrow{X} = \overleftarrow{x}'). This equivalence relation induces a set of equivalence classes called the causal states of the process. The causal states, the allowed transitions between them, and the probabilities associated with the transitions are called the ε-machine for the process. For a stochastic process with a finite number of predictive equivalence classes, the ε-machine may be represented as a deterministic finite automaton, where the states of the automaton correspond to the causal states, and the transitions between states are determined by the emitted symbols x_t. A demonstration of a portion of such a representation is given in Figure 2(a).
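The equivalence-class construction can be illustrated with a toy sketch that groups fixed-length histories by their empirical next-symbol distribution. CSSR (Section II.6) does this properly, with statistical tests and variable-length histories; here exact rounding stands in for the test, purely to show the equivalence relation over predictive distributions.

```python
# Toy sketch of the causal-state idea: partition fixed-length histories
# by their empirical next-symbol distribution. A caricature of CSSR.

from collections import defaultdict

def causal_state_partition(x, L, digits=1):
    counts = defaultdict(lambda: [0, 0])   # history -> [# next=0, # next=1]
    for t in range(L, len(x)):
        counts[tuple(x[t - L:t])][x[t]] += 1
    states = defaultdict(list)             # P(next = 1 | history) -> histories
    for hist, (n0, n1) in counts.items():
        states[round(n1 / (n0 + n1), digits)].append(hist)
    return dict(states)

# A strictly alternating process: every past predicts the opposite symbol,
# so all histories collapse into exactly two causal states.
states = causal_state_partition([0, 1] * 20, L=2)
print(len(states))  # 2
```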
II.4 Socially-Driven Models: The ε-transducer
The previous two models assume that either the user is driven seasonally by time-of-day type influences, or that the user is self-driven. On a social media service, we expect a user to interact with other users, and therefore we would expect the user's behavior to be associated with the behavior of those users. For example, a user might become more likely to tweet if they have recently been mentioned by another user. Such social associations are captured by the social inputs Y_t^{(u)} of a user. In particular, we will focus on the mention history of the user,
Y_t = \begin{cases} 1, & \text{user mentioned during interval } t, \\ 0, & \text{otherwise.} \end{cases} \qquad (7)
That is, Y_t corresponds to whether or not the user received any mentions during the time window of length Δt indexed by t.
We take the modeling perspective where the user acts as a transducer, mapping their own past behavior and the past behavior of their social inputs into their future behavior. More explicitly, as with the self-driven example, we seek the minimally complex, maximally predictive model for the user's behavior. Again, computational mechanics provides such a model via the ε-transducer Crutchfield (1994a); Shalizi (2001). The theory for the ε-transducer has recently been developed in Barnett and Crutchfield (2015). The main insight is the same as for the ε-machine: we define an equivalence relation over joint input-output pasts such that two pasts are equivalent if they induce the same predictive distribution over the future output. As with the ε-machine, this equivalence relation induces a partition of the joint input-output pasts into channel causal states. Transitions between channel causal states occur on joint input-output pairs, and thus an ε-transducer with finitely many channel causal states, like an ε-machine, can be represented as a deterministic automaton. The representation is given in Figure 2(b).
For the socially-driven model, we distinguish between the self-memoryful and self-memoryless ε-transducer cases, corresponding to the models from Figure 1a and Figure 1c, respectively. For the self-memoryful ε-transducer, we construct the equivalence classes using the joint input-output pasts, while for the self-memoryless ε-transducer, we construct the equivalence classes using only the input pasts. Thus, the self-memoryless ε-transducer assumes that a user's behavior is purely driven by their social input. We consider both cases under the heading of socially-driven models.
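A companion sketch for the ε-transducer applies the same toy partition as in the ε-machine case, but over joint (input, output) pasts, yielding channel causal states. transCSSR (Appendix A) performs the real construction; all names here are illustrative.

```python
# Toy sketch of channel causal states: partition fixed-length joint
# (input, output) pasts by their empirical next-output distribution.

from collections import defaultdict

def channel_state_partition(y, x, L, digits=1):
    counts = defaultdict(lambda: [0, 0])
    for t in range(L, len(x)):
        hist = tuple(zip(y[t - L:t], x[t - L:t]))  # joint input-output past
        counts[hist][x[t]] += 1
    states = defaultdict(list)
    for hist, (n0, n1) in counts.items():
        states[round(n1 / (n0 + n1), digits)].append(hist)
    return dict(states)

# An 'echo' user who posts exactly when mentioned one step earlier: the
# joint pasts collapse into two channel causal states, keyed by the last
# input symbol.
y = [0, 1] * 30
x = [0] + [y[t - 1] for t in range(1, 60)]
print(len(channel_state_partition(y, x, L=1)))  # 2
```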
II.5 Data Collection and Preprocessing
The activity of the 15K users was collected over a 49 week period, from 6 June 2014 to 15 May 2015. After data cleaning to account for outages in the data collection, 44 weeks of data remained. We did not include quiescent users in our analysis. As described above, the self-driven and socially-driven models assume that a user's behavior can be modeled as a conditionally stationary stochastic process, where the distribution over futures is independent of the time index conditional on the observed past of the user or the observed joint input-output past, respectively. In order to make this assumption approximately true, we 'daycast' the time series associated with each user as follows. For each user, we determine their time zone as recorded by Twitter, and window their activity to be between 9 AM and 10 PM in their local time. We take this time window to capture the waking hours of a typical individual.
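The daycasting step amounts to keeping, for each day, only the bins inside the 9 AM to 10 PM window and concatenating what remains. The window is from the text; the bin arithmetic below assumes the 10 minute resolution (144 bins per day) and is otherwise an illustrative sketch.

```python
# Sketch of 'daycasting': retain only waking-hour bins, day by day.
# Assumes dt = 10 min, so 9 AM = bin 54 and 10 PM = bin 132 of 144.

def daycast(x, bins_per_day=144, start_bin=54, end_bin=132):
    """Concatenate, for each day, the bins in [start_bin, end_bin)."""
    kept = []
    for day_start in range(0, len(x), bins_per_day):
        kept.extend(x[day_start + start_bin:day_start + end_bin])
    return kept

# Two quiet days of 144 bins each contribute 78 waking-hour bins apiece.
y = daycast([0] * 288)
print(len(y))  # 156
```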
For this study, we split the 44 weeks of data into 28 weeks of training data and 16 weeks of testing data. The training data is used to select and infer the models, as we describe in the next section. The testing data is used for the comparison of these models in terms of their predictive and descriptive performance. This train/test split is performed to ensure that we obtain unbiased estimates of how the models perform for each user.
II.6 Model Inference and Selection
For the seasonally-driven model, the only model parameter associated with each user is the smoothing parameter for the splines used to estimate the nonparametric term in (4). This parameter is chosen using generalized cross-validation Hastie et al. (2009) on the 28 weeks of training data.
For ε-machine reconstruction, we use the Causal State Splitting Reconstruction (CSSR) algorithm Shalizi and Klinkner (2004) to infer the models from data. For ε-transducer reconstruction, we use the Transducer Causal State Splitting Reconstruction (transCSSR) algorithm, described in Appendix A. Both CSSR and transCSSR require the specification of a tuning parameter α, which controls the probability of splitting histories from a state when no such split should occur, and L_max, the maximum history length used in determining the candidate causal / transducer states. We fix α at 0.001. The maximum history length directly balances the flexibility of the model against the precision with which the probabilities may be estimated. As an example, suppose a maximum history length L is sufficient to resolve the causal states. In the extreme case that each history of length L specifies a unique predictive distribution (an order-L Markov model), the model would have 2^L causal states. However, as we increase L, we also necessarily decrease the number of examples of each history used to estimate the predictive distribution. This can result in spurious splitting of histories.
We use K-fold cross-validation Hastie et al. (2009) to choose the appropriate L_max for each user. In particular, for each user, we randomly partition the 196 days in the training set into K folds. For a held-out fold f whose time indices are T_f, define the empirical total variation (ETV) distance between the user's observed behavior during that fold and the model inferred using the remaining folds as
\text{ETV}^{(f)} = \frac{1}{|T_f|} \sum_{t \in T_f} \frac{1}{2} \sum_{x \in \mathcal{X}} \left| \delta_{x, x_t} - \hat{p}^{(-f)}(x; t) \right| \qquad (8)
where δ_{x, x_t} is the Kronecker delta and p̂^{(−f)}(x; t) is the probability of observing outcome x at time t under the model inferred with all of the data except that from the f-th fold. In the binary case, (8) reduces to
\text{ETV}^{(f)} = \frac{1}{|T_f|} \sum_{t \in T_f} \left| x_t - \hat{p}^{(-f)}(1; t) \right|. \qquad (9)
Thus, we see that (8) quantifies model performance by comparing the actual outcome for the user to the estimated probability of that outcome under the model. We can then compute the average of the empirical total variation over the held-out folds,
\overline{\text{ETV}} = \frac{1}{K} \sum_{f=1}^{K} \text{ETV}^{(f)}, \qquad (10)
and choose L_max to minimize this value. We perform this optimization using L_max from 1 to 6, which for Δt = 10 minutes corresponds to a time span between ten minutes and an hour.
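In the binary case, the fold-wise criterion in (9) reduces to an average absolute difference between outcomes and predicted probabilities, which is then averaged over folds as in (10); a minimal sketch, with function names of our own choosing:

```python
# Sketch of the cross-validation criterion: binary ETV of a held-out
# fold, averaged over folds.

def etv_fold(outcomes, probs):
    """Mean |x_t - p_hat(1; t)| over one held-out fold."""
    return sum(abs(x - p) for x, p in zip(outcomes, probs)) / len(outcomes)

def mean_etv(folds):
    """Average the fold-wise ETV over (outcomes, probs) pairs."""
    return sum(etv_fold(x, p) for x, p in folds) / len(folds)

# A perfect model attains ETV 0; an uninformative coin-flip model on
# balanced data attains ETV 0.5.
perfect = ([1, 0, 1, 0], [1.0, 0.0, 1.0, 0.0])
coin = ([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
print(etv_fold(*perfect), etv_fold(*coin))  # 0.0 0.5
print(mean_etv([perfect, coin]))  # 0.25
```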
III Results
III.1 Descriptive Performance Across the Model Classes
We begin by examining the ability of the four models developed in Sections II.2–II.4 to describe a given user's behavior. To do so, we compute the ETV, as defined by (8), between the held-out test data and the cross-validated models of each type. This provides us with a measure of how the models generalize to unseen behavior, and thus an indication of how well the models describe a user's behavior. Because the ETV for a given user depends on their overall activity level, we standardize the ETV for a model M by the ETV for the seasonality model, giving us a score function

s_M = \frac{\text{ETV}_{\text{seasonal}}}{\text{ETV}_M}. \qquad (11)
Recalling that a smaller ETV value indicates a smaller distance between the observed behavior and the model predictions, we see that s_M will be greater than 1 when model M outperforms the seasonal model, and smaller than 1 otherwise.
The scores across all users for all models are shown in Figure 4. The diagonal shows the densities of scores across the users for each model type. The self- and socially-driven models generally perform better than the seasonal model, with all of the score densities having a heavy right tail. We see that the self-memoryful ε-transducer performs best, with a score greater than 1 for 82.1% of the users. The self-memoryless ε-transducer is next best, with a score greater than 1 for 79.4% of the users. The ε-machine has a score greater than 1 for 72% of the users.
We further summarize the pairwise comparisons between the non-seasonal models in Table 1. As expected, the self-memoryful ε-transducer outperforms both the ε-machine and the self-memoryless ε-transducer on most users, as indicated by the mass of users above the identity line in the first quadrant in the bottom row of Figure 4. However, the users are much more equally split between those where the ε-machine outperforms the self-memoryless ε-transducer (45.2%) and vice versa (54.8%).
Table 1. Fraction of users for which the row model outperforms the column model. M: ε-machine; TML: self-memoryless ε-transducer; TMF: self-memoryful ε-transducer.

        M      TML    TMF
M       —      0.452  0.244
TML     0.548  —      0.336
TMF     0.756  0.618  —
III.2 ε-machine Causal Architectures
We next explore the typical ε-machine architectures across the users. The number of causal states for an ε-machine gives a rough indication of the complexity of the user's behavior, since each causal state indicates a further refinement of the past necessary for predictive sufficiency. In fact, the logarithm of the number of states is called the topological complexity of the ε-machine Crutchfield (1994b). We find that most users are best described by models with a small number of states, with 95% of users having 13 or fewer causal states, and the largest ε-machine having 58 states. Note that the maximum number of possible causal states with a binary alphabet and L_max = 6 is 2^6 = 64.
Renewal Process. We next consider the general types of stochastic processes captured by many of the ε-machines. We find that a large proportion of the users have ε-machines which correspond to a generalization of a discrete-time renewal process. Recall that a discrete-time renewal process is a point process such that the lengths of periods of quiescence (runs of 0s between successive 1s) are independent and distributed according to an interarrival distribution Marzen and Crutchfield (2015). Equivalently, discrete-time renewal processes can be defined in terms of the survival function of the interarrival distribution. Because discrete-time renewal processes are a special case of the more general processes described by (6), their ε-machine architecture takes on a very particular form Marzen and Crutchfield (2015). The ε-machine for a discrete-time renewal process has a unique start state transitioned to after a period of activity, and transitions after a period of quiescence traverse a chain of states that counts the number of time points since an activity period occurred. We reproduce the generic architecture found amongst the renewal process ε-machines in Figure 5 (left). This is a special finite-state case of the more general architecture for a discrete-time renewal process. In the nomenclature introduced in Marzen and Crutchfield (2015), this is an eventually Poisson process with characteristic parameters (n*, Δ), where n* refers to the number of quiescent time steps necessary for the ε-machine to behave as a Poisson (Bernoulli) process, and Δ refers to the smallest resolution at which the interevent times may be coarse-grained and remain geometrically distributed. Such a process has an interevent distribution
F(n) = F_{n^* + ((n - n^*) \bmod \Delta)} \, \lambda^{\lfloor (n - n^*)/\Delta \rfloor}, \quad n \ge n^*, \qquad (12)
where F_0, …, F_{n*+Δ−1} specify the initial values of the interevent distribution and 0 < λ < 1. We note that using CSSR with finite L_max necessarily results in the reconstruction of finite-state machines, and thus for an eventually Poisson process with n* > L_max, the inferred ε-machine will be an approximation to the longer-memory process. In fact, this motivates a particular family of parametric models with parameters n*, Δ, and λ, along with F_0, …, F_{n*+Δ−1}, which specify the initial interevent behavior. We emphasize that this particular family of parametric models was not assumed, but rather discovered via the use of CSSR.

Reverse Renewal Process. A renewal process is specified by a distribution over run lengths of quiescence. For such a process, the distribution over run lengths of activity follows a geometric distribution. One could also define a process where these roles are reversed: the distribution over run lengths of activity takes an arbitrary form, and the distribution over run lengths of quiescence follows a geometric distribution. We call such a process a reverse renewal process, since the roles of quiescence and activity are reversed. The ε-machine for a reverse renewal process is given in Figure 5 (right). In analogy to the eventually Poisson process, we call this process a reverse eventually Poisson process, which has the interquiescence distribution given by
G(n) = G_{\tilde{n} + ((n - \tilde{n}) \bmod \tilde{\Delta})} \, \tilde{\lambda}^{\lfloor (n - \tilde{n})/\tilde{\Delta} \rfloor}, \quad n \ge \tilde{n}, \qquad (13)
where G_0, …, G_{ñ+Δ̃−1} specify the initial values of the interquiescence distribution and 0 < λ̃ < 1.
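A minimal sketch of such an 'eventually geometric' interevent distribution, here with the coarse-graining resolution fixed to one time step and a normalization chosen so the distribution sums to one; the text's exact parameterization may differ.

```python
# Sketch: interevent distribution with arbitrary initial values followed
# by a geometric tail, the discrete analog of 'eventually Poisson'.

def eventually_geometric(initial, lam, n_max):
    """F(n) = initial[n] for n < n*; normalized geometric tail after."""
    n_star = len(initial)
    tail_mass = 1.0 - sum(initial)
    F = list(initial)
    for n in range(n_star, n_max):
        # remaining mass split geometrically with ratio lam
        F.append(tail_mass * (1 - lam) * lam ** (n - n_star))
    return F

F = eventually_geometric(initial=[0.5, 0.2], lam=0.5, n_max=50)
print(round(sum(F), 6))  # 1.0
```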
Alternating Renewal Process. More generally, we can define a class of processes such that the distributions over run lengths of activity and quiescence are both allowed to deviate from the geometric distribution. Such processes are known as alternating renewal processes. An alternating renewal process switches between periods of quiescence and activity with probabilities governed by the quiescence-length and activity-length distributions. The ε-machine for an alternating renewal process with eventually geometric distributions for both interarrival and interquiescence times is given in Figure 6. We call such a process an alternating eventually Poisson process. Again, this class of processes offers another parametric model for user behavior, with one set of parameters for the interarrival distribution and one for the interquiescence distribution.
Because of the stereotyped architecture of the ε-machines for renewal, reverse renewal, and alternating renewal processes, we can easily identify those users whose ε-machines have these architectures. An ε-machine represents an alternating renewal process if and only if, for each x ∈ {0, 1}, there is precisely one state transitioned to on an x from a state transitioned to on a 1 − x. For example, the unique state transitioned to on a 0 from states transitioned to on a 1 represents the start of a run of 0s. The ε-machines for renewal / reverse renewal processes have this property, in addition to only having a single state transitioned to on a 1 / 0. Thus, renewal and reverse renewal processes are a subset of alternating renewal processes. Using these rules, we can identify which users' models correspond to renewal, reverse renewal, or alternating renewal processes. We find that 1881 (13.1%) of the ε-machines correspond to (homogeneous) Bernoulli processes, 5408 (37.7%) correspond to two-state renewal / reverse renewal processes, 2713 (18.9%) correspond to pure renewal processes with three or more states, 85 (0.59%) correspond to pure reverse renewal processes with three or more states, and 1250 (8.7%) correspond to alternating renewal processes with four or more states.
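The structural test just described is easy to automate given an ε-machine's transition table; the `{(state, symbol): next_state}` encoding below is illustrative, not the paper's representation.

```python
# Sketch of the structural test for an alternating renewal architecture:
# for each symbol x there must be exactly one state entered on x from any
# state entered on 1 - x.

def entered_on(trans, symbol):
    """States reached on emitting a given symbol."""
    return {s2 for (s1, sym), s2 in trans.items() if sym == symbol}

def is_alternating_renewal(trans):
    for x in (0, 1):
        sources = entered_on(trans, 1 - x)
        starts = {trans[(s, x)] for s in sources if (s, x) in trans}
        if len(starts) > 1:
            return False
    return True

# Two-state machine: A is entered on 1, B on 0, so each run type has a
# unique start state and the test passes.
trans = {("A", 0): "B", ("A", 1): "A", ("B", 0): "B", ("B", 1): "A"}
print(is_alternating_renewal(trans))  # True
```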
As we have seen, by definition the non-alternating-renewal users must be such that their ε-machine has more than one state transitioned to on an x from a state transitioned to on a 1 − x. In practical terms, this means that for these users, knowledge of the time since the user switched from a period of activity / quiescence to a period of quiescence / activity is not sufficient to resolve a causal state. However, in many cases it is sufficient to know the behavior of the user immediately prior to a switch from quiescence to activity or vice versa. For example, a user may behave differently when they have switched from active to quiescent after having just been active compared to after having just been quiescent. These cases correspond to generalizations of the alternating renewal process to higher orders. Table 2 summarizes the number of models that correspond to an alternating renewal model of a given order. For example, a zeroth-order alternating renewal model corresponds to a Bernoulli process, a first-order alternating renewal process corresponds to the model architecture in Figure 6, etc. We see that many of the models resolve to alternating renewal models of higher orders. In total, 95% of the users have an ε-machine in agreement with an alternating renewal model of order 6 or smaller.
Table 2. Number of inferred models corresponding to an alternating renewal model of a given order (ε-Ms: ε-machines; ε-Ts: ε-transducers).

Alternating Renewal Order   # of ε-Ms       # of ε-Ts
0                           1881 (13.1%)    648 (5.1%)
1                           9546 (66.7%)    8249 (65.3%)
2                           611 (4.3%)      400 (3.2%)
3                           493 (3.4%)      309 (2.4%)
4                           530 (3.7%)      243 (1.9%)
5                           518 (3.6%)      220 (1.7%)
6                           134 (1.0%)      147 (1.2%)
III.3 ε-transducer Causal Architectures
Thus far, we have considered the models associated with user behavior when we ignore their inputs. Next we turn to the models that incorporate those inputs, namely the ε-transducers. Recall that the input we consider is whether a user was mentioned during time interval t. As with the ε-machines for the self-driven model, we find that most users are well-described by ε-transducers with a small number of states, with 95% having 6 states or fewer and 90% having 25 or fewer states in the self-memoryless and self-memoryful cases, respectively.
Renewal-like Self-Memoryless Transducer. For the self-memoryless case, 12059 of the 12641 mentioned users (95%) have an ε-transducer with an architecture analogous to Figure 5a. That is, the user has a 'just-mentioned' state, and subsequent time steps without the user receiving a mention lead to transitions away from this state, until a terminal state is reached. The causal states therefore map to the time since the user was mentioned, with all times beyond the length of the chain mapped to the same terminal state. Thus, when viewed as purely socially-driven, the relevant quantity to track for almost all of the users is the time since they were last mentioned.
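The statistic tracked by this renewal-like architecture is simply the capped time since the last mention; a sketch, with an illustrative helper name:

```python
# Sketch: the sufficient statistic of the renewal-like self-memoryless
# transducer is the time since the last mention, saturating at the chain
# length so all older mentions share one terminal state.

def time_since_last_mention(y, cap):
    """State at each step: min(steps since last y = 1, cap)."""
    states, since = [], cap
    for yt in y:
        since = 0 if yt == 1 else min(since + 1, cap)
        states.append(since)
    return states

# Mentions at steps 1 and 5; the counter resets on each mention and
# saturates at the cap.
print(time_since_last_mention([0, 1, 0, 0, 0, 1, 0], cap=3))
# [3, 0, 1, 2, 3, 0, 1]
```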
Renewal-like Self-Memoryful Transducer. A similar overarching 'counting' model architecture is also present amongst the memoryful ε-transducers. Recalling that another way to view the states of an alternating renewal process is as counting the length of runs of x since the last 1 − x, we can generalize this to the ε-transducer by considering states that count the lengths of runs of an input-output symbol pair (y, x) since the last differing input-output symbol pair. As in the memoryless transducer case, we call this an alternating renewal-like process, since the causal states act in a similar fashion. We present a schematic representation of the partitioning of the channel causal state space in Figure 7. For these transducers, the causal states can be partitioned based on the runs of (y, x) they count since the last differing pair. Thus, we begin by dividing the set of causal states into four quadrants, based on the runs of (y, x) which they count. All states in a quadrant labeled by (y, x) are transitioned to on (y, x). Then, the causal states within a quadrant are further partitioned into thirds, where each third corresponds to the symbol pair seen before the current run of (y, x). Thus, each third has a unique start state that is transitioned to on a (y, x) from a state transitioned to on a different pair. 10216 of 12641 (81%) of the mentioned users have an ε-transducer in this alternating renewal-like class. Note that the partitioning given in Figure 7 is the most general possible for this type of transducer. The quadrants, or the thirds within a quadrant, may further collapse, as dictated by the structure of the transducer. For example, Figure 8 shows an alternating renewal-like transducer inferred for 27% of the mentioned users. This transducer has three states, which correspond to runs of unmentioned quiescence, unmentioned activity, and mentions regardless of activity, and are labeled as such. In this case, the two quadrants with y = 1 collapse, since the corresponding state counts runs of mentions regardless of the user behavior x. Moreover, all thirds within a given quadrant also collapse, since the states treat runs of a pair as the same regardless of the symbol pair that preceded them.
In terms of the actual behavior of the user, the state labeled (0, 0) corresponds to the user having been both quiescent and unmentioned in the recent past; the state labeled (0, 1) to the user having been active, but not mentioned, in the recent past; and the state labeled (1, ·) to the user having been mentioned in the recent past, regardless of whether or not the user has been active. Each state fixes the user's probability of being active at the next time step. Thus, for over a quarter of the users, knowledge of the recent past of both their own behavior and their inputs provides sufficient information for predicting their future behavior. In particular, each of the quadrants requires only a single state, whereas the most general model of this type allows for multiple states per quadrant.
As with the renewal, reverse renewal, and alternating renewal processes inferred from the ε-machines, this alternating renewal-like transducer motivates a particular parametric model, albeit a much more complicated one. In this case, we need to specify the chain lengths within each third of each quadrant. However, this results in a number of states that grows linearly with the history length, compared to the geometric growth of the most general model.
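The linear-versus-geometric comparison can be made concrete with a back-of-the-envelope count, assuming binary input and output alphabets (four joint symbols) and, purely for illustration, one chain of at most L states per third of each quadrant:

```python
# Hypothetical state counts illustrating linear vs. geometric growth in
# the history length L. The 4 * 3 * L figure assumes one chain of at
# most L states per third of each of the four quadrants; the general
# model may need one state per distinct joint history (4**L of them).

def general_states(L):
    return 4 ** L


def renewal_like_states(L):
    return 4 * 3 * L


for L in (2, 4, 8):
    print(L, general_states(L), renewal_like_states(L))
```

Even for modest history lengths the gap is large, which is what motivates the structured parametric model described above.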
Again, as with the alternating renewal process, the alternating renewal-like transducer generalizes to higher orders by considering the input-output behavior immediately prior to a switch from one run type to the next. For example, a second-order alternating renewal-like transducer would distinguish between a user becoming quiescent and unmentioned after being mentioned twice in the past, compared to going unmentioned before the previous mention. Many of the users exhibit transducers of higher order, as shown in Table 2. Of the 12641 mentioned users, 78% are alternating renewal-like of order 6 or smaller.
IV Conclusions
In this paper, we have developed and applied a modeling framework for human behavior in digital environments. The approach begins by viewing a user’s behavior as a discrete-time point process at a pre-specified temporal resolution, and then considers four possible stochastic models that might give rise to the user’s behavior, namely the seasonal, self-driven, socially-driven, and self- and socially-driven processes, which we estimate using an inhomogeneous Bernoulli process, an ε-machine, or self-memoryless/self-memoryful ε-transducers.
We have found that simple computational architectures, as specified by their ε-machines and ε-transducers, describe much of the observed behavior of the users in our data set. A renewal process model, or its generalizations to reverse renewal and alternating renewal processes, was found to be appropriate for approximately 80% of the users in our study. This is in agreement with much of the literature on human communication patterns. However, we emphasize that we did not assume such models a priori, but rather discovered
their prevalence by using nonparametric modeling in an exploratory fashion. In fact, the appearance of reverse renewal and alternating renewal processes demonstrates that renewal process models alone are not sufficient to describe, for example, the burstiness observed in human communication patterns. Moreover, we discovered a new class of renewal-like models that generalize renewal processes to input-output systems. We found that this class of models describes over 70% of the users in terms of the interaction between their activity and their social inputs. The prevalence of these stereotyped
ε-machines/ε-transducers motivates the use of either frequentist (such as the cross-validation approach used in this paper) or Bayesian (as recently developed in Strelioff and Crutchfield (2014)) approaches that take advantage of these structures a priori during the estimation process. In addition to the generalized alternating renewal models, more general models were necessary for over 20% of the users in the self-driven case and nearly 30% of the users in the self- and socially-driven case. One shortcoming of our work is the discretization of the behavior of all users at the time resolution of 10 minutes in order to analyze their behavior through the lens of discrete-time computational mechanics. The computational mechanics of continuous-time, discrete-event processes was recently developed in Marzen and Crutchfield (2017a, b). While at present reconstruction algorithms for inferring ε-machines from such processes do not exist, this work and others motivate their development. A reanalysis of this data set from this viewpoint may reveal additional details hidden by the discretization.
The apparent complexity of observed user behavior on Twitter seems to arise from a simple computational landscape. Our present work lays out an initial sketch of its features. We hope this work motivates further exploration of the computational landscape of user behavior on Twitter and other communication platforms, and refinement of their maps.
Appendix A The transCSSR Algorithm for ε-Transducer Reconstruction
A diverse collection of algorithms has been developed to infer ε-machines from data, from the topological methods first presented in Crutchfield and Young (1989) to more recent Bayesian approaches Strelioff and Crutchfield (2014). Additional algorithms have been developed based on spectral methods Varn et al. (2013) and integer programming Paulson and Griffin (2014). We focus on the Causal State Splitting Reconstruction (CSSR) algorithm Shalizi and Klinkner (2004).
The theory formalizing ε-transducers has only recently been developed, and the literature on ε-transducer reconstruction from finite data is sparse. Sketches of CSSR-like algorithms for transducer reconstruction are provided in Shalizi and Crutchfield (2001); Haslinger et al. (2010). In this appendix, we develop the ideas originally suggested in these prior works, and present a generalization of CSSR for transducer reconstruction from data resulting from input-output systems. In homage to CSSR, we call our algorithm transCSSR, a portmanteau of transducer and CSSR. The transCSSR algorithm has been implemented in Python, and is available on GitHub Darmon (2015).
We sketch the transCSSR algorithm here, and give the pseudocode in Figure 9. The first phase of the algorithm groups input-output histories into a set of weakly prescient states. It begins by assuming that all input-output histories induce the same one-step-ahead predictive distribution. This is equivalent to assuming the transducer's output future is independent of the input-output past, or to grouping together all joint histories into a candidate causal state represented by the joint input-output suffix (λ, λ), where λ is the null symbol. At each successive step, each candidate causal state
is tested for weak prescience by growing its histories into the past by one input-output pair. If the state is weakly prescient, then the predictive distribution for the new history should be equivalent to that of its parent causal state. This condition is tested using the null hypothesis
\[ P\left(Y_t \mid \overleftarrow{(X, Y)} = (a, b)\,w\right) = P\left(Y_t \mid \hat{\mathcal{S}} = \hat{s}\right) \tag{14} \]
with a significance test of size α. If the null hypothesis is rejected, the predictive distribution of the history is compared against all of the remaining candidate causal states using the restricted alternative hypothesis
\[ P\left(Y_t \mid \overleftarrow{(X, Y)} = (a, b)\,w\right) = P\left(Y_t \mid \hat{\mathcal{S}} = \hat{s}'\right) \quad \text{for some candidate state } \hat{s}' \neq \hat{s}. \tag{15} \]
Finally, if the history’s predictive distribution does not agree with that of any of the candidate causal states, it is split off into a new candidate causal state. Such potential splitting is performed until the input-output histories under consideration each reach a pre-specified maximum length. Any hypothesis test for comparing discrete distributions may be used for (14) and (15). We use the test based on the G statistic Harremoës and Tusnády (2012).
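As one concrete sketch of the kind of test involved (not the paper's exact procedure), a G-test with a chi-squared reference distribution can compare a history's empirical next-output counts against a candidate state's predictive distribution. The version below is restricted to a binary output alphabet (one degree of freedom), and the significance level alpha is an assumed parameter:

```python
# Sketch of a G-test (likelihood-ratio statistic) for the splitting
# step, comparing observed next-output counts against a candidate
# state's predictive distribution over a binary output alphabet.

import math


def g_test_binary(counts, state_probs, alpha=0.001):
    """Return (G, p_value, reject) for H0: counts drawn from state_probs."""
    n = sum(counts)
    g = 0.0
    for observed, p in zip(counts, state_probs):
        expected = n * p
        if observed > 0:
            g += 2.0 * observed * math.log(observed / expected)
    # With one degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value, p_value < alpha


# Next-output counts for a candidate history, against a state that
# predicts outputs (0, 1) with probabilities (0.5, 0.5):
print(g_test_binary([52, 48], [0.5, 0.5])[2])  # -> False (keep in state)
print(g_test_binary([90, 10], [0.5, 0.5])[2])  # -> True  (split off)
```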
At the end of this stage, each candidate causal state consists of histories that are within-state equivalent and between-state distinct in terms of their predictive distributions, and each candidate causal state is weakly prescient. The causal states have these properties, in addition to being deterministic / unifilar on transitions between states on input-output pairs. To ensure determinism / unifilarity, the successor state for each history on each input-output pair is determined. If two or more histories in the same state transition to different states on the same input-output pair, that state is split, and the determinization step is repeated. This procedure repeats until all transitions are deterministic. Since there are finitely many histories, this procedure always terminates, in the extreme case with each history having its own causal state. Because this stage only ever splits histories from states, and the states before this stage were weakly prescient, within-state equivalent, and between-state distinct, the resulting states are as well. Therefore, the procedure results in a set of states that are weakly prescient and deterministic, and thus causal.
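The determinization step can be sketched as a split-until-stable loop. This is an illustrative reconstruction, not the transCSSR code itself: `successor` is an assumed helper that returns the state (or state label) reached by extending a history with a given input-output pair, and the toy example at the end is purely for demonstration.

```python
# Sketch of the determinization (unifilarity) step: a state is split
# whenever two of its histories disagree about the successor state on
# the same input-output pair. `states` is a list of sets of histories;
# `successor(history, pair)` is an assumed lookup, not from the text.

from collections import defaultdict


def determinize(states, alphabet, successor):
    """Split states until every transition is deterministic."""
    changed = True
    while changed:
        changed = False
        for state in list(states):
            for pair in alphabet:
                # Group this state's histories by where they lead.
                groups = defaultdict(set)
                for history in state:
                    groups[successor(history, pair)].add(history)
                if len(groups) > 1:
                    # Non-unifilar: split the state and start over.
                    states.remove(state)
                    states.extend(groups.values())
                    changed = True
                    break
            if changed:
                break
    return states


# Toy example: two histories that always lead to different successors
# end up in singleton states after one split.
split = determinize([{"ab", "ba"}], alphabet=["a", "b"],
                    successor=lambda h, pair: h[0])
print(len(split))  # -> 2
```

Because each split strictly increases the number of states and the number of histories is finite, the loop always terminates, mirroring the argument in the text.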
References
 Eubank et al. (2004) Stephen Eubank, VS Kumar, Madhav V Marathe, Aravind Srinivasan, and Nan Wang, “Structural and algorithmic aspects of massive social networks,” in Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Society for Industrial and Applied Mathematics, 2004) pp. 718–727.
 Compton et al. (2014) Ryan Compton, David Jurgens, and David Allen, “Geotagging one hundred million twitter accounts with total variation minimization,” in Big Data (Big Data), 2014 IEEE International Conference on (IEEE, 2014) pp. 393–401.
 Kramer et al. (2014) Adam DI Kramer, Jamie E Guillory, and Jeffrey T Hancock, “Experimental evidence of massive-scale emotional contagion through social networks,” Proc. Natl. Acad. Sci. U.S.A. 111, 8788–8790 (2014).
 Toole et al. (2015) Jameson L Toole, Carlos Herrera-Yaqüe, Christian M Schneider, and Marta C González, “Coupling human mobility and social ties,” J. R. Soc. Interface 12, 20141128 (2015).
 Oliveira and Barabási (2005) Joao Gama Oliveira and Albert-László Barabási, “Human dynamics: Darwin and Einstein correspondence patterns,” Nature 437, 1251 (2005).
 Malmgren et al. (2008) R Dean Malmgren, Daniel B Stouffer, Adilson E Motter, and Luís AN Amaral, “A Poissonian explanation for heavy tails in email communication,” Proc. Natl. Acad. Sci. U.S.A. 105, 18153–18158 (2008).
 Malmgren et al. (2009) R Dean Malmgren, Jake M Hofman, Luis AN Amaral, and Duncan J Watts, “Characterizing individual communication patterns,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, 2009) pp. 607–616.
 Jiang et al. (2013) Zhi-Qiang Jiang, Wen-Jie Xie, Ming-Xia Li, Boris Podobnik, Wei-Xing Zhou, and H Eugene Stanley, “Calling patterns in human communication dynamics,” Proc. Natl. Acad. Sci. U.S.A. 110, 1600–1605 (2013).
 Wu et al. (2010) Ye Wu, Changsong Zhou, Jinghua Xiao, Jürgen Kurths, and Hans Joachim Schellnhuber, “Evidence for a bimodal distribution in human communication,” Proc. Natl. Acad. Sci. U.S.A. 107, 18803–18808 (2010).
 Goh and Barabási (2008) KI Goh and AL Barabási, “Burstiness and memory in complex systems,” Europhys. Lett. 81, 48002 (2008).
 Kivelä and Porter (2015) Mikko Kivelä and Mason A Porter, “Estimating interevent time distributions from finite observation periods in communication networks,” Phys. Rev. E 92, 052813 (2015).
 Ross and Jones (2015) Gordon J Ross and Tim Jones, “Understanding the heavy tailed dynamics in human behavior,” Phys. Rev. E 91, 062809 (2015).
 Shalizi and Crutchfield (2001) Cosma Rohilla Shalizi and James P Crutchfield, “Computational mechanics: Pattern and prediction, structure and simplicity,” J. Stat. Phys. 104, 817–879 (2001).
 Raghavan et al. (2013) Vasanthan Raghavan, Greg Ver Steeg, Aram Galstyan, and Alexander G Tartakovsky, “Modeling temporal activity patterns in dynamic social networks,” IEEE Trans. Comput. Soc. Syst. (2013).
 Johnson et al. (2012) Benjamin D Johnson, James P Crutchfield, Christopher J Ellison, and Carl S McTague, “Enumerating finitary processes,” Theo. Comp. Sci. (2012).
 Wiesner and Crutchfield (2008) Karoline Wiesner and James P Crutchfield, “Computation in finitary stochastic and quantum processes,” Physica D 237, 1173–1195 (2008).
 Rybski et al. (2012) Diego Rybski, Sergey V Buldyrev, Shlomo Havlin, Fredrik Liljeros, and Hernán A Makse, “Communication activity in a social network: relation between longterm correlations and interevent clustering,” Sci. Rep. 2 (2012).

 Crutchfield et al. (2015) James P Crutchfield, Michael Robert DeWeese, and Sarah E Marzen, “Time resolution dependence of information measures for spiking neurons: Scaling and universality,” Front. Comput. Neurosci. 9, 105 (2015).
 Esteban et al. (2012) Javier Esteban, Antonio Ortega, Sean McPherson, and Maheswaran Sathiamoorthy, “Analysis of twitter traffic based on renewal densities,” arXiv preprint arXiv:1204.3921 (2012).
 Doerr et al. (2013) Christian Doerr, Norbert Blenn, and Piet Van Mieghem, “Lognormal infection times of online information spread,” PLOS ONE 8, e64349 (2013).
 Jo et al. (2012) HangHyun Jo, Márton Karsai, János Kertész, and Kimmo Kaski, “Circadian pattern and burstiness in mobile phone communication,” New J. Phys. 14, 013055 (2012).
 Hastie and Tibshirani (1990) Trevor J Hastie and Robert J Tibshirani, Generalized Additive Models, Vol. 43 (CRC Press, 1990).
 Caires and Ferreira (2005) Sofia Caires and Jose A. Ferreira, “On the nonparametric prediction of conditionally stationary sequences,” Statistical Inference for Stochastic Processes 8, 151–184 (2005).
 Ver Steeg and Galstyan (2012) Greg Ver Steeg and Aram Galstyan, “Information transfer in social media,” in Proceedings of the 21st International Conference on World Wide Web (ACM, 2012) pp. 509–518.
 Darmon et al. (2013) David Darmon, Jared Sylvester, Michelle Girvan, and William Rand, “Predictability of user behavior in social media: Bottom-up v. top-down modeling,” in Social Computing (SocialCom), 2013 International Conference on (IEEE, 2013) pp. 102–107.
 Crutchfield (1994a) James P Crutchfield, Optimal Structural Transformations — the ε-Transducer, Tech. Rep. (UC Berkeley Physics Research Report, 1994).
 Shalizi (2001) Cosma Rohilla Shalizi, Causal architecture, complexity and self-organization in time series and cellular automata, Ph.D. thesis, University of Wisconsin–Madison (2001).
 Barnett and Crutchfield (2015) Nix Barnett and James P Crutchfield, “Computational mechanics of input-output processes: Structured transformations and the ε-transducer,” J. Stat. Phys. , 1–48 (2015).
 Hastie et al. (2009) Trevor Hastie, Robert Tibshirani, Jerome Friedman, T Hastie, J Friedman, and R Tibshirani, The Elements of Statistical Learning, Vol. 2 (Springer, 2009).
 Shalizi and Klinkner (2004) Cosma Rohilla Shalizi and Kristina Lisa Klinkner, “Blind construction of optimal nonlinear recursive predictors for discrete sequences,” in Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI 2004), edited by Max Chickering and Joseph Y. Halpern (AUAI Press, Arlington, Virginia, 2004) pp. 504–511.
 Crutchfield (1994b) James P Crutchfield, “The calculi of emergence: computation, dynamics and induction,” Physica D 75, 11–54 (1994b).
 Marzen and Crutchfield (2015) S Marzen and JP Crutchfield, “Informational and causal architecture of discrete-time renewal processes,” Entropy 17, 4891–4917 (2015).
 Strelioff and Crutchfield (2014) Christopher C Strelioff and James P Crutchfield, “Bayesian structural inference for hidden processes,” Phys. Rev. E 89, 042119 (2014).
 Marzen and Crutchfield (2017a) Sarah Marzen and James P Crutchfield, “Informational and causal architecture of continuous-time renewal processes,” J. Stat. Phys. 168, 109–127 (2017a).
 Marzen and Crutchfield (2017b) Sarah E Marzen and James P Crutchfield, “Structure and randomness of continuous-time, discrete-event processes,” J. Stat. Phys. 169, 303–315 (2017b).
 Crutchfield and Young (1989) James P Crutchfield and Karl Young, “Inferring statistical complexity,” Phys. Rev. Lett. 63, 105 (1989).
 Varn et al. (2013) Dowman P Varn, Geoffrey S Canright, and James P Crutchfield, “ε-Machine spectral reconstruction theory: a direct method for inferring planar disorder and structure from X-ray diffraction studies,” Acta Crystallogr. A 69, 197–206 (2013).
 Paulson and Griffin (2014) Elisabeth Paulson and Christopher Griffin, “Computational complexity of the minimum state probabilistic finite state learning problem on finite data sets,” arXiv preprint arXiv:1501.01300 (2014).
 Haslinger et al. (2010) Robert Haslinger, Kristina Lisa Klinkner, and Cosma Rohilla Shalizi, “The computational structure of spike trains,” Neural Comput. 22, 121–157 (2010).
 Darmon (2015) David Darmon, “transCSSR: Transducer Causal State Splitting Reconstruction,” https://github.com/ddarmon/transCSSR (2015).
 Harremoës and Tusnády (2012) Peter Harremoës and Gábor Tusnády, “Information divergence is more χ²-distributed than the χ² statistics,” in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on (IEEE, 2012) pp. 533–537.