The Past as a Stochastic Process

by   David H. Wolpert, et al.

Historical processes manifest remarkable diversity. Nevertheless, scholars have long attempted to identify patterns and categorize historical actors and influences with some success. A stochastic process framework provides a structured approach for the analysis of large historical datasets that allows for detection of sometimes surprising patterns, identification of relevant causal actors both endogenous and exogenous to the process, and comparison between different historical cases. The combination of data, analytical tools and the organizing theoretical framework of stochastic processes complements traditional narrative approaches in history and archaeology.



There are no comments yet.


page 1

page 2

page 3

page 4


New Proofs of the Basel Problem using Stochastic Processes

The number π ^2/6 is involved in the variance of several distributions i...

Deep Learning Anti-patterns from Code Metrics History

Anti-patterns are poor solutions to recurring design problems. Number of...

Towards the automated large-scale reconstruction of past road networks from historical maps

Transportation infrastructure, such as road or railroad networks, repres...

Efficient Answering of Historical What-if Queries

We introduce historical what-if queries, a novel type of what-if analysi...

Changepoint analysis of historical battle deaths

It has been claimed and disputed that World War II has been followed by ...

Understanding the Historic Emergence of Diversity in Painting via Color Contrast

Painting is an art form that has long functioned as major channel for co...

Providing Advanced Access to Historical War Memoirs Through the Identification of Events, Participants and Roles

The progressive digitization of historical archives provides new, often ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Stochastic processes

i.1 First-order Markov processes

To make the ideas above more precise, we now introduce the most basic type of stochastic process, which we shall refer to repeatedly in the sections below. This process is called a “(first-order, time-homogeneous) Markov process”, with the associated equation for the dynamics sometimes referred to as a “master equation.”

Let be the location of a society in some appropriate socio-political-environmental state space, and let be time. Consider the evolution of the society through that space starting at the value at time

. (For the moment we take

to be a real-valued vector, for expository purposes.) A fixed (time-invariant) stochastic process assigns a probability to each trajectory of

emanating from this starting point. In a Markov stochastic process, there is a parameter specifying a “transition matrix,” or kernel,

, and the probability distribution of trajectories through

is governed by the linear differential equation,


Intuitively, the term captures the flow from all states into , while the term captures the flow from into all other states.

As (1) illustrates, in a stochastic process it is not the state of variable that evolves deterministically and continuously. Instead, it is its probabilities that do. These probabilities tell us how the trajectory of values can fluctuate, in general allowing both smooth, drift-like behavior, as well as discontinuous jumps. Importantly, there are two types of such jumps. If the system is currently at , and the kernel places non-infinitesimal weight on an that lies a non-infinitesimal distance from , then there is nonzero probability of a discontinuous jump across the space. On the other hand, it could be that a small change in the current position, , results in a large change in the kernel , and therefore a large change in where the system is likely to move from . This can be viewed as a “jump” in the derivative of , arising from a small change in i.e., a jump the “direction of flow” of the system, rather than in the position of the system itself. The parameter parameterizes the stochastic rule, and so directly specifies where in state-space either type of jump occurs.

No matter what the precise stochastic process model being used to explain historical data, we will refer to the average-case, smooth patterns in the trajectories through the space of random variables as the

trends emphasized by structural approaches. These include patterns in the trajectories through the space of environmental variables as well as through more strictly socio-political ones. Similarly, discontinuous jumps in the trajectories of the random variables often but not always involve human agency. The distinction between trends and jumps is often invoked in the historiography of science. For example, the invention of new ideas or technologies can occur through a gradual process, such as with Moore’s law. New ideas can also occur through a jump, however, such as Einstein’s invention of general relativity.

In addition to such changes in the value of at some time that are governed by the stochastic process, it is possible that the stochastic process itself changes at some time . Such a change in the process would not affect the immediate value , but rather it would affect how is likely to evolve after time .444To be more concrete, suppose that the stochastic process itself changes at time . This would mean that if we start the society at at some time , the likely ensuing trajectories would be different from what they would be if we instead start the society at at a time . We refer to such a change in the stochastic process itself as an exogenous perturbation of the stochastic process.555As a technical comment, throughout this paper we view real-world data sets as being formed by noisy observations of trajectories , rather than incorporating the observational process directly into the stochastic process itself. So we do not consider the possible effects of time-dependence in the map taking to our observational data. In particular, for the purposes of this paper, we do not consider the many ways that archaeological data concerning events further in the past can be “more noisy” than recent archaeological data. Formally, an exogenous perturbation is a change at some time in the parameter in (1), due to an event not captured in the stochastic process variables, .

To illustrate these points, suppose is a vector of variables related to socio-economic characteristics of a particular society. For that space of variables , slow changes in climate due to reasons independent of human activity constitute gradual exogenous perturbations. For the same space of variables, events like volcanic eruptions would also be exogenous perturbations, but sudden ones. Similarly, the birth of a great leader who is in fact historically consequential (not just idolized that way by future generations) would be a sudden exogenous perturbation.

It can be quite challenging to ascertain from a given historical data set when (if ever) such exogenous perturbations happened, or if instead the dynamics of the data through the space of values just reflects evolution under (1) for an unvarying . Yet making this determination is crucial if we wish to infer any general principles of historical dynamics. We touch on this issue several times in the examples below, and also discuss it with some actual historical dynamics from the US Southwest in the SI Section 1.A.

The foregoing summary of Markov processes elides many potentially important details. To present some of those details, in the SI Section 1 we work through perhaps the simplest Markov process, a 1-dimensional random walk with step size 1. We also discuss some important variants to (1) there.

i.2 Higher-order stochastic processes

It is important to emphasize that the first-order Markovian dynamics given in (1) is only one type of stochastic process. If a system is first-order Markovian then the values of our variables in the past provide no useful information beyond what’s in their current value for predicting their future values. However, in many common circumstances this property will be violated, at least to a degree, and so the system will not evolve precisely according to (1).

As an example, suppose we coarse-grain the values of into large bins, e.g., to help in statistical estimation. Then in general, even if the dynamics over is first-order Markovian, the dynamics over those coarse-grained bins will not be. (We illustrate this with the random walk example in the SI Section 1.) Similarly, if we leave some important components of out of our stochastic process model, they become what are called “hidden variables”. In such circumstances, again, the dynamics over the visible components of will not be first-order Markovian (although it might be -th order Markovian for ).

In all of these scenarios, alternative techniques like Hidden Markov models (see Section 2( 

II.4)), or high-order Markov models, may be more appropriate than first-order Markov models obeying (1). Indeed, certain types of time-series analysis that have already been used in historical analysis can be formulated as a type of higher order Markov process. For example, the use of delay coordinates Takens (1981); Sauer et al. (1991) to predict future trajectories can be formulated as a technique for estimating the average future trajectory under such a stochastic process.666Note though that such techniques can also be interpreted as estimating deterministic dynamics embedded on a high dimensional manifold in state-space Takens (1981); Sauer et al. (1991). Similarly, vector autoregressive moving-average (VARMA) models Lütkepohl (2005) can also be formulated as stochastic process models.

While such higher-order models have been applied before in historical analyses, they have several non-trivial shortcomings, as discussed above. In addition, many of the issues mentioned above (e.g., detecting exogenous perturbations) are generic, in that they also arise in those alternative models. For all these reasons, we mostly concentrate on first-order Markov processes in this paper.

Ii Recent research in history based on stochastic process modeling

ii.1 Jumps in sociopolitical complexity of polities in the longue duree

It is widely agreed that over long-enough periods of time there is a strong trend for polities to increase in size, or to disappear (e.g., Tainter (1988)). Less clear is whether such increases are relatively smooth trends in which various measures of scale and/or capability, such as information processing or capability at warfare, increase in lock step; or whether there are clear discontinuities or disjunctions in the growth processes.

The Seshat data set François et al. (2016)

allows analyses that can address such questions. This data set contains values for a large and diverse set of variables concerning societies over the past several thousand years, in both Afro-Eurasia and the Americas. (We provide more background in the SI Section 3.) This data set has been condensed for some recent analyses into a set of nine complexity characteristics (CCs) defined by the researchers for multiple past societies. These are reported at (usually) one-hundred-year intervals for the dominant polity controlling each of thirty world regions called Natural Geographic Areas (NGAs; the controlling polity may have its capital outside the NGA). For some NGAs, these time-stamped data stretch back millennia. Principal components analysis (PCA) on the CCs found that the primary component, PC1, explained roughly three-quarters of the variability of the 9 CCs 

Turchin et al. (2018a). The Seshat team suggests interpreting PC1 as an overall, scalar measure of social complexityTurchin et al. (2018a); Whitehouse et al. (2019).777This is because, by construction, the individual CCs represent separate complexity measures, PC1 “captures three quarters of the variability”. Note as well that PC1 has a roughly equal loading on all 9 CCs – that is, it is a weighted version of the individual complexity measures. Note that PCA is independent of the time-stamps of the individual data points. So mathematically, there is no reason to expect that PC1 bears any relation to time; it could just as readily decrease with time as increase. Nonetheless, examination of the data makes clear that there is a strong tendency for polities (and sequences of polities within an NGA) to increase their PC1 value with time, albeit with long periods of stasis Turchin et al. (2018a). Crucially, different NGAs have different starting dates, and there is no guarantee that sociopolitical change will march in lock-stop across NGAs. On the other hand, as discussed below, PC2 consists of two sets of CC components: one related to polity scale and another related to information processing capabilities Shin et al. (2020). This suggests fitting a very basic “stochastic process model” to the Seshat data, by graphing PC2 against PC1, with larger PC1 values generally corresponding to later times in an NGA’s sequence.

A recent paper (Shin et al., 2020) explored this possibility by analyzing how PC2 and PC1 co-vary. Whereas PC1 has positive loadings on all 9 CCs, PC2 has negative loadings on the scale variables (such as polity size) and positive loading on information variables (such as writing, texts, and money). Interestingly, as illustrated in Fig. 1, PC2 is a “saw-tooth” function of PC1, first decreasing until PC1 is about , then increasing until PC1 is about , then decreasing again. This reveals an unexpected sociopolitical phenomenon: there is a strong average tendency among the polities in this sample to first increase in size until a certain threshold size is reached (PC1 ), at which point the dynamics is taken over by increases in the information-processing ability of the polity (broadly defined) with relatively little increase in size. Eventually this process reaches a new threshold (PC1 ), after which a dynamic process increasing the size of the polity again takes over.


Figure 1: The mean value of PC2 as a function of PC1, where the mean and error bars are calculated using a sliding window with a width of 1.0 in PC1-space. This graphic is reproduced from Shin et al. (2020).

The key point is that the two values of PC1 at approximately and appear to constitute thresholds in complexity, with different sociopolitical processes dominating the dynamics depending on whether the PC1 value is less than , between and , or greater than . This tendency is exhibited by polities that reached these thresholds at vastly different times, in far-separate locations and differing local environments, with extremely different neighbors. Therefore these two thresholds cannot be due to exogenous perturbations, in which would change at a specific time. Rather it appears likely that they are examples of jumps of the second type discussed above, where the transition matrix in (1) changes discontinuously once (in this case, PC1-PC2) reaches the associated threshold.

One serendipitous consequence of the sorts of stochastic process modeling we advocate in this paper is that it fosters cumulative cascades of analyses, in which an insight coming out of analyzing one data set allows us to analyze a second data set in a way that would not have been possible otherwise. We can illustrate this with the current controversy about the relationship between when a polity undergoes a jump in “social complexity”, and when its members widely adopt worship of “moralizing gods.” On side of the debate are researchers who say that the emergence of such gods is a precondition for a jump in social complexity. (Loosely speaking, the reasoning underlying this hypothesis is that worship of such deities fosters wide-spread sacrifice in pursuit of the common good, which in turn results in the jump in social complexity.) Others have argued just as vigorously that the causal influence is the other way around, that such jumps are actually a necessary condition for the emergence of moralizing gods in a society.

We can exploit our analysis described above concerning hinge points to address this issue. To begin, we suppose that the first hinge point is a marker for a jump in social complexity. As described in App. A(2), we can then analyze the relationship between the time that moralizing gods appear in a polity and the time when that polity undergoes a jump in social complexity. The results contrast with the somewhat polarized views in the literature  Purzycki et al. (2016); Whitehouse et al. (2019)

. Given a null hypothesis that polities with moralizing gods arise later than polities without them, there appears to be little evidence of correlation between the time that moralizing gods appear in a polity and the time that that polity experiences a “jump of social complexity”.

888We emphasize, though, that we have not yet done a careful analysis of exactly how much correlation there is. Such gods do not seem to be a precondition for jumps in complexity by a polity, nor does it seem that a polity’s having gone through such a jump is an absolutely necessary precondition for their appearance (as claimed in Whitehouse et al. (2019)). A natural way to extend this preliminary analysis would be to fit a full stochastic process model to historical data in a state-space that consists of both the CCs of a society (or one or more of the corresponding PC-values) and a variable representing the presence / absence of moralizing gods.

ii.2 Are there stable social evolutionary types of societies?

Papers in the social sciences sometimes note that some set of multiple time-series across a space seem to move quite slowly across one or more regions of that space. This has sometimes been interpreted as meaning that those regions are “basins of attraction” or “attractors” or “equilibria” of an underlying stochastic process Peregrine (2018). A long-running interest in archaeology (stretching back at least to the mid-late nineteenth century Tylor (1871)) is whether there really are such stable social evolutionary types of societies, or whether any apparent instances of such societies in the data are illusory.

In this subsection we illustrate how careful use of stochastic process analysis can help address this issue, again using the Seshat dataset Turchin et al. (2015). If one plots a histogram of the PC1 values of all the time-stamped observations across NGAs, one finds that, even though every polity is usually sampled once per century (though this is less true early in time for polities with long time sequences), the PC1 values cluster into two widely separated regions (see Fig. 2). Does this imply that those regions of PC1 values are “equilibria” of the underlying dynamics, i.e., stable social evolutionary types?

While some have thought so Miranda and Freeman (2020); Turchin et al. (2018b), an analysis using the sorts of tools we advocate in this paper suggests that the answer, in point of fact, is no. Those clusters can be explained as arising from a null model that results from the interplay between the starting values of polities in the data set (i.e., the point at which they are first coded), the underlying stochastic process, and the sampling of the stochastic process. To give a simple, counter-intuitive example of that interplay, even if a stochastic process has the same average speed across a space, if the variance

of its speed varies across that space, then we will see clusters; regions with higher variance will have clusters of more data points than regions of lower variance 

Shin et al. (2020). Complicating the picture still further is the fact that the Seshat data set comprises multiple, different time-series, reflecting a combination of three factors:

  1. Random variation across the time series of the polities in the PC1 value of the earliest data point;

  2. Random variation across the polities in the chronological time of that first PC1 value;

  3. A general drift of polities from low to high PC1 values that is well-modeled by a time-homogeneous Markov chain.


Figure 2: Histogram of PC1 values for all polity-time pairs.

Perhaps the most conservative way to test whether such factors lead to features like clusters in the data set is to use a null hypothesis that the data were generated by a first-order, time-homogeneous discrete-time Markov chain, corrected so that the variance of the chain as one moves across the space matches the variance in the data set as one moves across the space.

Indeed, as described in the supplementary materials of (Shin et al., 2020), one can accurately fit a first-order, time-homogeneous discrete-time Markov chain to the Seshat data set PC1 values, mapping one PC1 value to the next with a conditional distribution that does not change in time (as it ought, if polities were “dwelling” in certain portions of its space). If one forms 30 sample time-series by running that Markov chain 30 times, each for a different number of iterations, and superimposes the 30 resultant time-series, one finds that they will typically form two widely-separated clusters, just like the clusters in the original Seshat data along the PC1 dimension (Fig. 2). Yet because the underlying process is a time-homogeneous Markov chain, there are no attractors in the underlying dynamics. 999A time-homogeneous Markov chain has a unique stationary probability distribution. That distribution can have multiple peaks, but the probability is of the system staying in one peak indefinitely. In that sense, the dynamics cannot have multiple basins of attraction.

Intuitively, these clusters reflect several factors. First, the 30 time-series are all transients (none long enough to be samples of the stationary distribution of the Markov chain). Second, they are all of different lengths (due to (a, b) above). Combined with the fact that the generating conditional distribution is non-uniform, the result is that two clusters in the data appear, with no direct significance for interpreting the underlying dynamics. We discussed this example in some detail to highlight the formal and theoretical subtleties in using stochastic processes to model archaeological data sets. But, as this analysis shows, these challenges can be met—supporting our claim that such a program can now be developed successfully.

ii.3 Comparing different historical trajectories: Flow maps

Another big question in history is to what degree different trajectories are comparable. Formal analysis of the sort presented here allows us to move beyond metaphorical uses of developmental stages or historical cycles. In Sec. 2II.2 we presented a cautionary tale concerning computational history, arguing semi-formally that a simple null-hypothesis test shows that the Seshat data set is consistent with the hypothesis that it was generated by a time-homogeneous Markov chain. While such null-hypothesis tests are a useful staring point, they are blunt instruments that sidestep the critical question of what is the best statistical inference from the data.

Fortunately, analytical tools developed very recently in other scientific fields might prove quite useful when applied to historical data sets. In this section we illustrate how one such tool can produce a far more sophisticated estimate of the stochastic process that generated the Seshat data than a simple fit with a time-homogeneous Markov chain.

Fig. 3 presents a plot of trajectories of the polities in the Seshat data set in the joint PC1-PC2 space. Each of those 30 time-series is quite short, which makes it difficult to extend any single one of them, e.g., by using vector autoregressive-moving-average (VARMA) models or delay coordinate techniques Lütkepohl (2005). A different approach is to model the dynamics across PC1-PC2 by fitting the data to a Markov process as in (1), with its transition matrix restricted to the set of stochastic processes called “Langevin equations” (a.k.a., “stochastic differential equations” [SDEs]) where the drift and diffusion terms vary across the space. As an illustration, Fig. 3 shows the drift term of the Langevin equation given by applying the Bayesian estimation technique recently introduced in Yildiz et al. (2018) to the time-series across PC1-PC2 in the Seshat data.


Figure 3:

The two axes are PC1 and PC2 of the (original) Seshat data set, respectively. The red dots are the elements of that data set. Since those elements of the data set are time-stamped, one can find the Langevin SDE that has highest posterior probability conditioned on that data set. The black arrows are the mean velocity vectors of that Langevin equation at the associated PC1-PC2 positions (i.e., they are values of the drift vector field of that SDE evaluated on a grid). Finally, the blue lines are counterfactual, sample trajectories of that SDE.

There are several advantages of this kind of fit of an SDE to a set of short time series’. Most directly, recall that as discussed above, by itself, the fact that PC1 explains most of the variability of the Seshat data set provides no basis for identifying the PC1 position of a polity as a measure of its social complexity, despite the fact that it has often been explicitly used that way Whitehouse et al. (2019). In fact, one could argue that any data set, by itself, cannot provide guidance as to how to quantify “social complexity”. Ultimately, the problem is that social complexity is a vague, intuitive concept, meaning different things to different people.

In contrast, a flow field like that in Fig. 3 focuses attention on the more concrete issue of how real-world societies actually evolve. For example, it has long been noted that some sufficiently coarse-grained data sets seem to suggest that there is a rough tendency of polities to go through historical cycles Khaldun (2015). As future work, one can imagine estimating the SDE that generated those data sets, then employing Monte Carlo sampling of that SDE to quantify this tendency of polities to go through such cycles. For example, this could be done by Monte Carlo estimating the probability that a polity currently at a given will follow a trajectory that goes at least away from but then returns to within of , as a function of and . (In this regard, note the suggestion in Fig. 3 of a circulation pattern around the point .

In addition, it may prove fruitful to re-express the stochastic flow field (or more precisely, the field of drift vectors defining the SDE). In particular, robust algorithms have been developed in computer graphics for inferring a Helmholtz decomposition Arfken and Weber (1999) of such flow fields. Such a decomposition expresses the flow across PC1-PC2 in terms of (the gradient of) a scalar potential field and (the curl of) a vector potential field. The scalar potential field might prove particularly illuminating; level curves of that potential field across PC1-PC2 would be an extremely flexible way of identifying polities all of whom are at the same “stage of development”. In the absence of a vector potential, the dynamics of polities would simply be gradient descent, descending those level curves.

At a high level, this joint use of tools for inferring SDEs and the Helmholtz decomposition is a way to infer directly from the data novel laws governing social systems (or at least strong empirical regularities emerging from them), formulated in terms of the Helmholtz decomposition. Note though that both the SDEs and the resulting potentials might not have a simple, explicit form. That could make it difficult to give them a simple social science interpretation. This difficulty is actually fundamental in all analysis of longitudinal data sets; in particular a similar interpretational challenge holds even for fitting one-dimensional time series data sets using delay coordinate techniques, even linear ones like ARMA models (Takens, 1981; Sauer et al., 1991); addressing this challenge is a topic for future research.

ii.4 A hidden Markov model for demographic data

While archaeologists often debate which types of variables are most important for understanding a given region’s past, a few of these are often considered more crucial than others, including climate (principally temperature and rainfall), demography, political organization, technology, economy, ideology, and local ecology. In this section, we describe in some detail a stochastic process model that can be used for demographic modeling and which can be used to fuse multiple types of data into a single inferential framework. Aside from being potentially useful in its own right, we offer the model as a template for using stochastic process models in archaeological or historical work.

Of the variables related to demography, perhaps the most crucial is a region’s total population size, but it is often necessary to consider also how the population is structured by age, sex, space, and socioeconomic status. High-quality demographic models exist for describing structured populations Rogers (1966); Caswell (2001), so the main challenge is how to use demographic models to infer past demography from the available archaeological data.

One complication is the temporal fidelity at which inference is possible. Seasonal and yearly changes in climate can change demography at yearly and sub-yearly timescales, but it is rarely, if ever, possible to infer both climate variability and demography at these timescales. As described in the introduction, this means that the data is coarse-grained, and so might not be accurately described by a Markov process. Therefore a hidden Markov model (HMM) could prove more suitable. In App.  D we describe in detail an age- and sex-structured population projection model. The fundamental demographic variable is the population vector of females in the population, the elements of which are the number of females in each distinct age class. The time-dependent population matrix projects the population to the next, distinct time-step per


In addition, a corresponding population vector of males exists, which depends on the sex ratio at birth (SRB) of males to females and male age-specific mortality (details in SI), which is rarely equal to female age-specific mortality and may fluctuate more, for reasons like warfare.

Substantial work in historical demography and life history theory places constraints on the plausible values of Wood (1998); Jones (2009); Jones and Tuljapurkar (2015), or (if adopting a Bayesian approach) can be used to specify priors on . To allow for this we assume there exist distinct types of annual, reference population projection matrices. For example, one type could be appropriate during a famine or other stressing event, while another corresponds to a stable population size in which high mortality (especially high juvenile mortality) is balanced by high fertility. Let be a probability vector of length that gives the probability of being in a given reference state. Transitions between these hidden states occur per


where the transition matrix depends on a slowly changing, exogenous climatic variable

, which we propose modeling as a binary or categorical variable that only occasionally changes given the fidelity at which climatic reconstruction is possible (hence, the Markovian assumption applies only so long as

is fixed).

What is appealing about the preceding framework is that it allows a diversity of data to be used to infer and . This has immense value for two reasons. First, it is usually better to use more data (and never worse, if the statistical model is good). Second, while any given type of data is likely biased in distinct and predictable ways (e.g., skeletal collections usually have too low a proportion of the very young, whereas very old individuals are incorrectly aged), these biases are usually different for each data type and uncorrelated across samples, so inference will be far better when multiple types of data are used and sample sizes increase.

This model could be especially valuable if applied to archaeological case studies for which a diversity of factors and variables have been linked. A pertinent example is the so-called Classic Maya Collapse. Between about AD 750-1000, various parts of the Maya cultural region in Mexico and Central America experienced a breakdown of sociopolitical systems. Associated with this were major demographic changes (migrations and, likely, overall population decline), changes in long-distance trade routes, intra-elite competition, warfare, and environmental degradation, all in the context of severe drought Webster (2002); Aimers (2007); Kennett et al. (2012); Turner and Sabloff (2012); Hoggarth et al. (2017). Exactly how these factors are linked remains an open question, yet a stochastic process model would likely help make these linkages explicit were it to be correctly fitted to the data. For example, drought has been proposed as the major, causal factor leading to the breakdown of sociopolitical systems, but others argue that all these factors, and more, played some role; using our proposed modeling framework would help elucidate the causes.

Iii Previous studies recast in terms of stochastic process models

iii.1 Integrating Different Data Types for New Insights

Like demographic processes, environmental processes have long been seen as catalysts of past social change. In particular, environmental processes are central to research on the history of human-centered food webs Crabtree et al. (2017a); Dunne et al. (2016). The benefits of bringing a stochastic process perspective to this research can be illustrated with the investigation in Crabtree et al. Crabtree et al. (2019). That research focused on the question of why small-bodied mammals went extinct in portions of Australia after people were removed from the Western Desert.

In the absence of humans, animal populations are often modeled as evolving via Markovian processes Meyn and Tweedie (2009). However, Crabtree et al. (2019) demonstrated that the extinctions experienced in the Western Desert would not be predicted from the natural baseline extinction rates of such a process. Adopting the perspective of this paper, this result highlights an important modeling issue: did the extinction of small-bodied mammals reflect a change in a hidden variable of an underlying Markovian process, or was it due to a change of a parameter of such a process, via an exogenous perturbation? This distinction is critical because there are statistical techniques for predicting the evolution of Markovian processes with hidden variables, such as time-series analysis, described above. It might be possible with such a model to predict a change in the dynamics of animal populations by looking at time-series data of their dynamics. However, by definition, if the change in the dynamics was due to an exogenous perturbation, then it cannot be predicted from any time-series data, no matter how exhaustive.

In this particular case, it seems unlikely that the change in the dynamics of the animal population level can be accurately modeled as the change in the value of a hidden variable, e.g., by using a time-series model of the dynamics of . In other words, it seems likely that the change in the dynamics of the animal populations reflects an exogenous perturbation of the parameter governing those dynamics. However, this has yet to be formally established. In addition, a more elaborate stochastic process model might expand to include some variables concerning the people who eventually forced aboriginal peoples to leave. In such a model, changes to the dynamics of animal populations might reflect the “second type of jump”, described above in the discussion just below (1). If that were determined to be the case, it would mean that future values of might be predictable, and therefore in particular future values of the animal population level might be predictable.

As another example, Yeakel et al. Yeakel et al. (2014) leveraged Egyptian art by coding the taxa present from the Narmur Palette and other datable art work to model animal species extinctions over time. By combining these data with contemporaneous climate data, they found that aridification pulses in the context of a growing Egyptian population played an important role in destabilizing the ecosystem. In the language of stochastic processes, if we take to be the combination of human and animal population levels, we see a trend as growing human populations exerting increasing pressures on the ecosystem. These pressures were exacerbated by exogenous perturbations in the form of a series of aridification pulses and a worsening climate.

This example suggests that it may be possible to look at the dynamics of human and animal populations in other regions and infer when aridification pulses, or indeed other climatic events, likely occurred, by testing for exogenous perturbations to the trends in the dynamics of the joint human / animal population levels. In the original context of ancient Egypt, where we already have identified some exogenous perturbations in the form of aridification pulses, we could perhaps test for the presence of other exogenous perturbations, e.g., due to invasions, or perhaps even due to causes currently absent from the historical record. More broadly, a stochastic process framework could suggest a new avenue for research for Egyptologists: the impersonal trends of extinction and exogenous perturbations of aridification likely had feedbacks on Egyptian society, from impacts on hunting to impacts on cosmology. Exploring how the societal impacts at a given time varied depending on whether there was only an underlying trend or also an exogenous perturbation at that time could lead to new insights on the dynamics of Egyptian social structures.

iii.2 Stochastic processes over the structure of language and over how language is used

Stochastic processes that are themselves superpositions of Markovian time evolution and branching processes describing the temporal evolution of features in systems that from time to time break up into parts that evolve (largely) independently thereafter. Such processes underlie the history of biology and human languages and are the base of phylogenetics. Data on cognates across many languages, together with a small number of historical anchor points, allows for the reconstruction of language trees and assign approximate dates of divergence. A careful, quantitative analysis of Polynesian languages, for instance, revealed some periods of “stasis” in the history of subfamilies that correspond to distinct phases in the settlement history of the Pacific Islands Gray et al. (2011). Similar arguments have shed light on the expansion of Indo-European language families. Analogous reconstructions of historical population distributions have been conducted using human genetic markers. The idea to integrate genetic, linguistic, and archaeological data is of course not new Scheinfeldt et al. (2010), although so far this has been done at the level of interpreting the data rather then integrating them into a common process model. The latter would be desirable in particular because linguistic changes are strongly impacted by contact phenomena such as borrowing, and even correlate with extra-linguistic factors such as the prevalence of agriculturally produced food Blast et al. (2019)

. At present, quantitative studies in linguistics are almost always based on data sets that are the result of extensive manual curation — although modern methods in natural language processing might be suitable or at least adaptable to tap into much larger resources

Bhattacharya et al. (2018).

Modern large-scale digitized historical archives often provide high-resolution data on the words that individuals—often, though not always, members of a polity’s elite—were writing and sharing with each other. Tools from natural language processing are now both simple enough and robust enough to allow social and political scientists to track the flow of complex patterns of language use that correspond to concepts and habits of thought.

Ref. Barron et al. (2018), for example, used the French Revolution Digital Archive ( to show the different roles that members of the French revolutionary parliament played in introducing, sustaining, or rejecting novel ideas in the speeches of that country’s constitutional debate. These ideas were discovered in an “unsupervised” manner; rather than pre-determining a list of important ideas, and words that corresponded to them, topic modeling automatically extracted word patterns that could then be back-validated on the speeches themselves. Discovered topics include, for example, concepts as fine-grained as “the possibility of enemies of the revolution within the military”.

iii.3 Analyzing the Great Acceleration

We are currently living in the Anthropocene. One of the more striking characteristics of this period has been the exponential growth in a large number of important metrics in earth and social systems since the beginning of the 1950s. These growth dynamics are often collectively referred to as the “Great Acceleration” Steffen et al. (2004). What caused these transformations is less clear however.

All the metrics that have contributed to the Great Acceleration are, however, dependent on population size, which itself has been changing during this period, to no small degree as a consequence of growth in these metrics. This suggests we fit our data concerning those factors with a stochastic process model, to gain quantitative insight into their joint evolution (see appendices for examples of such a model). In particular, because of the specific growth patterns of the variables, it is natural to fit the data to a stochastic process over the logarithms of the variables. When applied to historical data sets, this variant of stochastic process models is known as historical or temporal scaling analysis. Not only can it give us insights into the joint dynamics of population size and the metrics considered by researchers of the great acceleration. In addition, when performed for different temporal intervals, it gives us insight into changes of those joint dynamics. This then invites us to investigate whether those changes in the dynamics are due to exogenous perturbations and /or due to changes in hidden variables (as in the analysis of HMMs in Sec. 2II.4 and / or due to the system reaching a new point of its state space (as in the case of hinge points, considered above in Sec. 2II.1).

To make this more concrete, suppose that we find that the scaling coefficient remains stable across time. That would be evidence for a simple trend of the underlying stochastic process. Suppose, as an alternative, that the scaling coefficient changes over time. This is evidence for one of three phenomena: either a concurrent change in some exogenous factor driving the dynamics (like a change in climate, or a major drought, or the Green revolution of agriculture); a concurrent change in a hidden socio-political variable generating the observed historical dynamic (like a major war, a world-wide depression, or changes in financial regulation) or simply the system reaching a critical threshold — a new part of the socio-political space — in which the dynamics are different. This illustrates how the stochastic process perspective could invite a host of new investigations, if it was found that the scaling coefficients changed over time.

A recent analysis of the dynamics of the Great Acceleration Painter et al. (2020) we determined that socioeconomic, technological, information, biological and earth systems, when scaled to population size, sort into four distinct growth patterns. These patterns suggest fundamental differences in the underlying network of interactions causing the observed patterns of change. Besides the already known sub-linear, linear and super-linear patterns we detected a novel fourth pattern for the Great Acceleration, with scaling coefficients larger than 3, which is unusually large. This pattern applies to parameters that are no longer limited by person to person interactions such as those related to knowledge, technology and finance. These parameters can be interpreted as the main drivers behind the ever accelerating growth dynamics of the Great Acceleration. Decade by decade comparison of these parameters also revealed patterns of change during this time period that correspond well with the possible sources of change that can influence the dynamics of an underlying stochastic process.

Whether an event or a discovery (such as those contributing to the Great Acceleration) is considered a transformative innovation or just a random occurrence affecting a stochastic process can often only be decided in hindsight. What can be done, however, is to asses to what extent such events were surprising or predictable, given the context of the times–here represented by an underlying stochastic process model. Emerging machine learning methods, especially deep learning neural networks, have succeeded at dramatically improving the prediction of quantities across a wide range of contexts. These models have begun to enter the social and historical sciences as complex event and trend prediction

Bainbridge (1995); Davidson (2017); Carley (1996); Zeng (1999); Maltseva et al. (2016); Zhan et al. (2018); MacLeod et al. (2016); Baćak and Kennedy (2019). We can apply this methodology to a number of different events, such as a regime change, adoption of a technology or the emergence of an institution with a particular function (e.g., urban garbage collection). Prediction targets can also be more complex and specific. It has been applied to predict the location, amount, denomination, and material composition of an archaeological cache of money and precious metal hordes, and the distribution of mints from which the coinage derives. This might only be limited by the area, such as surrounding the Mediterranean, and a period of time to which the hordes date (see appendices for a more in depth discussion of these methods).

Iv Future work and “aspirational” case studies

iv.1 Combining time-series’ of hundreds of random variables

As far more data sets become available we will be confronted with combining multiple time series of systems evolving from up to hundreds of different variables, including not only the sorts of variables recorded in Seshat, but also completely different kinds of variables, like lead levels in glaciers, or tree ring widths as climate proxies, to large-scale characteristics of polities that others have discussed before. We will be combining these with wholly different kind of time series, including things like word usage patterns in historical documents, structures in legal codes, time series of fashions, ecosystem characteristics, etc., all concerning different random variables.

As always, great insights will accrue from fine-grained analysis, in which domain experts deeply scrutinize only a few of these time series at once. However, there is also the great allure of integrating all of these myriad time series in a systematic, statistically principled way, to uncover unanticipated connections and insights. Indeed, ultimately, one would want to be able to integrate all of these time series into a single underlying stochastic process, with associated error bars. However, there are many analytical challenges to doing that, due to the sheer number of these times series, and the sheer breadth of the types of associated random variables.

As an important example of such a challenge, how does one infer causalities in a statistically principled way among hundreds of time series, all involving different random variables? For example, what techniques would allow us to uncover statistically significant causal connections relating time series ranging over hundreds of different spaces? Can we scale up techniques like Granger causality from econometrics Granger (1969) or transfer entropy and directed information from information theory, to such large numbers of spaces Schreiber (2000); Amblard and Michel (2013)?

As another example, can we extend breakpoint analysis / change detection to involve multiple time series, in order to find non-stationary breaks in the underlying dynamic process? Such breaks would reflect some exogenous events, perturbing the underlying dynamics of the system. So, for example, such a breakpoint might indicate that a particular leader of a state was indeed consequential, rather than just being a “product of their time", in that they perturbed the underlying dynamics of sociopolitical processes — we might conclude from such a breakpoint that a leader had literally changed the course of history.

iv.2 Leveraging agent-based models for stochastic process modeling

Agent-based models (ABMs) are becoming increasingly instrumental for investigating processes that leave limited physical traces in historical and archaeological data, such as work examining the dispersal of hominins out of Africa Romanowska et al. (2017). Typically such models involve the dynamics of variables that are hidden, in that they do not directly appear in the historical data. For example, Crabtree and colleagues Crabtree et al. (2017b) and Kohler and colleagues Kohler et al. (2018) built a model for sociopolitical evolution in the North American Southwest in which territorial groups undergoing population growth may succeed in subjugating or merging with other groups via warfare or intimidation. Networks of flows of maize as tribute are among the many outputs from these models. These flows constitute hidden variables have not been observed directly in the archaeological record.

Adopting this perspective on ABMs suggests many new ways they can be combined with stochastic process modeling. Most obviously, if our original data set has lots of missing values, by fitting the parameters of an ABM to match the data we do have, we could use the values the ABM assigns to the variables with missing data as estimates of those missing data values. In this sense, ABMs can sometimes be used as a variant of the technique of imputations, used so extensively to deal with missing data in the original Seshat analysis Turchin et al. (2018a). As another example, we could sample the values of the hidden variables in the ABM model at regular time-intervals. By adding those samples to the original data set, we could produce an estimated data set in a much larger space than the original data set. For example, if we fit the parameters of an ABM model so that it reproduced the data in Seshat, then we could sample the variables of the ABM model at single century-intervals, and add those sampled values to Seshat, to produce a dataset in a much larger space. We could then perform any of the stochastic process analyses described above to this augmented data set. For example, we could perform PCA on this augmented data set, and examine how the PC2 of this new PCA varies as the new PC1 increases, perhaps finding a more elaborate version of the hinge points that were found by analyzing the original Seshat data set.


The concept of history unfolding stochastically is not new; in the context of the history of life on Earth, Stephen J. Gould famously asked what would happen if we could “replay the tape”, which implicitly supposes that an underlying stochastic process generated that tape Gould (1989). Similarly, stochastic process modeling of environmental dynamics has been used to infer exogenous perturbations to the dynamics of human social systems Malik (2020). In addition, the phylogenetic tree reconstructions of human language dynamics discussed in Sec. 3III.2 have been used as a “clock” to infer the dynamics of socio-political phenomena Currie et al. (2010); Sheehan et al. (2018). There has also been some work directly applying time-series analysis techniques to socio-political datasets Turchin (2018).

These are isolated instances though, rather than a systematic scientific program. Here we propose something more fundamental: that by grounding our investigations of human social dynamics in stochastic process models, we can not only better investigate the historical record, but also begin to unify the myriad approaches that have been championed for analyzing that record. Such a program would also potentially allow us to detect drivers for the historical processes that generated that historical record — in particular, drivers that had not already been anticipated in social science models. This might allow the data to drive our formulation of social science models, as an adjunct to the more conventional approach under which we analyze datasets only after we first formulate models (e.g., based on intuitive insight and / or on analogizing with models from other scientific fields). Crucially, as we illustrated with the examples above, both the data sets and computational tools necessary for this vision to become a reality are now coming into being.

It is important to emphasize that we do not argue that one specific stochastic process we have identified generates the dynamics of history. (Indeed, we expect that it will be most fruitful to view history as multiple, interwoven stochastic processes, all with different characteristics.) We are not even advocating whether a time-homogeneous process or time-inhomogeneous process be considered. Ultimately, as in all statistical analysis, the choice of model to fit to the data is governed by considerations of number of data, size of the space of variables, types of variables, etc., with cross-validation used to help winnow the options. (See SI Section IA.)

We are also not advocating that one specific state space be used to model the stochastic process(es) of human history. Nor are we arguing which subsequent analyses should be applied to stochastic process(es) models inferred from historical data, e.g., to uncover possible causal relationships among historical variables.

More generally, we are also not arguing that all of historical analysis should be formulated in terms of stochastic processes. Statics, as opposed to dynamics, is also (obviously) an extremely important aspect of historical analysis, as history is not only concerned with time series analysis, but also with revealing the internal structures of societies and the patterns of their interactions at any given single point of time. Indeed, even in those sciences where all phenomena are based on a single dynamic law, like quantum physics (Schrödinger’s equation), much research focuses on statics rather than dynamics.

Finally, we note that a stochastic process formulation is also central to the other historical sciences, ranging from biology to meteorology to geology. So not only does this perspective allow us to unify the analyses of computational history, it also allows us to align how we investigate human history with how it is done in the other historical sciences.

Acknowledgments We thank S. DeDeo for useful conversations in the early stages of formulating this paper. This paper developed out of a Working Group convened by the Santa Fe Institute in January 2020, supported in part by the National Science Foundation under Grant No. SMA-1620462 to D.H.W and T.A.K.

Appendix A Types of dynamics

a.1 Which stochastic process model to use

Although we believe their are many advantages to using stochastic processes, there are also potential limitations and important considerations in choosing the type of stochastic process model to use. In this appendix, we discuss some of these issues, and provide further details on examples discussed in the main text. In many cases, the examples offer a concrete illustration of the limitations and pitfalls that we must otherwise discuss fairly abstractly (and, of course, of the benefits). One major set of issues has to do with the sparsity of archaeological data, which means that a formal stochastic model may not capture every salient aspect of the socio-political-environmental dynamics, and implies that, a priori, we should not necessarily propose detailed models with too many parameters and explicit features. This is one benefit using first order Markov models rather than more complex models with more delays. This problem can be somewhat mitigated by using prior information in a Bayesian modeling framework.

We should regardless be aware of the shortcomings of first order Markov models. We treat potentially deterministic fluctuations whose underlying causes we do not grasp as stochastic. Some models may be “blind” to details of human agency. Since the data and models operate at a rather coarse level, as will be discussed below, it is possible to violate the Markovian assumption. Similarly, even though the some underlying variables, such as humans, polities, dollars, etc., are discrete, we often work with continuous state spaces to simplify the mathematics. Fortunately, this is usually a good and legitimate approximation. In spite of these and other shortcomings, these first order Markov models offer crucial advantages, as they allow us to capture dominant features and qualitative aspects in a robust manner. They can be flexibly adapted to add complexity when new data come in. They are simple enough for understanding the dynamics they generate. They may allow us to identify driving forces and critical turning points in historical processes.

There are many variants of the basic first-order Markov process given in Eq. 1. For example, if the state space is discrete rather than real-valued, then the integral in Eq. 1 gets replaced by a sum. If in addition one models a historical process as evolving in discrete time, e.g., years, then the derivative on the LHS of Eq. 1 gets replaced by a discrete-difference. In fact, that is also the setting of the formal example discussed below, as well as the discrete-time Markov chains discussed in Section 2B and Section 2D of the main text.

It is important to note though that there are subtle assumptions that arise if we use a discrete-time Markov chain. It turns out that a sizeable portion of all discrete-time Markov chains are theoretically impossible, if one presumes that the true underlying Markov process is actually continuous in time. As a striking example, suppose we have a system with only two states, , e.g., due to coarse-graining. The simple discrete-time Markov chain that flips the two possible states, sending , cannot even be approximated as arising from an underlying continuous-time Markov process over those two states Owen et al. (2018); Wolpert et al. (2019). In fact, the set of all discrete-time Markov chains that cannot even be approximated with a continuous-time Markov process has nonzero measure (according to any of the usual measures over the space of stochastic matrices defining discrete-time Markov chains). The same is true if we restrict attention to discrete-time Markov chains that (unlike the bit flip) are highly non-deterministic. Since the physical world is in fact continuous in time, this means that if one wishes to fit a discrete-time Markov model to time series generated by some evolving physical system — including sociopolitical systems — one should exercise great care, to avoid accidentally selecting a Markov chain that is physically impossible.

In addition, even in the context of continuous-time models, the assumption of a first-order Markov process is a very strong one. Formally, it means that knowledge of the current state of suffices to compute the probabilities of its future states. This amounts to assuming that due to the nature of the variables in , knowledge of past values of will not lead to better predictions of future values, beyond knowing the current value. This assumption is satisfied when all relevant details of the state are known. However, if we coarse-grain the state, that is, lump similar values together into some kind of coarse or macro state, then the dynamics of those states need no longer be Markovian. (See Pfante et al. (2014) for a systematic analysis of coarse graining.) Likewise, if there are hidden variables that cannot be directly observed, but influence the dynamics of the observable state , the latter’s dynamics need not be Markovian. Generally, when our information about the current state is incomplete, be it due to coarse graining, hidden variables, or some other factor, we may have to draw upon the memory of the states to make better predictions.

Ideally, we could do this by fitting a higher-order Markov model to the data. Such an approach is closely related to delay-embedding techniques Bradley and Kantz (2015). (Note that delay embeddings capture chaotic dynamics, which is not possible with first-order Markov processes.) However, in practice, the amount of data one needs to fit an order- model grows exponentially with the the size of the space and the value of — using a larger value of will result in a poor statistical fit. Especially in the context of fitting historical data-sets, where data is quite sparse, this can mean that for purely statistical reasons we have to either choose , or adopt a careful Bayesian analysis if we wish to fit the data with an model. (However, see Malik (2020) for a recent example of trying to fit higher-order models with non-Bayesian methods even when data are sparse, in the specific context of historical data.)

In practice, it is probably most common to use cross-validation to determine

, as well the other hyperparameters in one’s model, even if one adopts a Bayesian approach. It is worth noting that there are alternative approaches though, which don’t involve cross-validation. For example, in a hierarchical Bayesian approach, one would average over the hyperparameters according to a hyperprior. As another example, one could set hyperparameters using the semi-Bayesian approach of ML-II. (As a technical comment, the use of Bayesian “Occam factors” should not be used to choose

, since they build in a bias to low-dimension models; see Wolpert et al. (1995).)

a.2 Noise versus chaos versus bifurcations

Often stochastic processes can be viewed as a deterministic evolution of the variable with noise superimposed. In particular, for a broad class of functions , Eq. 1 in the main text, which involves the dynamics of a time-dependent probability distribution , can be reformulated as a noisy equation for the time-dependent state of the system, :


where the is the Wiener noise process, and the functions and are determined by . (This is the “Langevin equation”, discussed in the text.) can be viewed as the deterministic dynamics of the variable , with determining the amount of noise superimposed on that dynamics.

The conceptual distinction between deterministic and stochastic dynamics can get blurred in practice because the deterministic dynamics may be chaotic and therefore appear random. Chaos essentially means that very small fluctuations can get amplified so that from very similar initial conditions very different endpoints can be reached after a long enough time, even if the dynamics are deterministic. Fortunately, people have developed sophisticated methods to distinguish chaotic and stochastic components in time series Kantz and Schreiber (1997). The formal concept of a stochastic process can accommodate both deterministic and stochastic features. In such a process, the probabilities for a state variable change in time according to some rule that is described by some parameter . The state is observed, while the parameter defines the model and can only be statistically inferred, but not directly observed. itself may also change, in which case the process is called non-stationary, but usually on a slower time scale than . In nonlinear dynamics, the qualitative properties of the dynamics may change at particular values of . One speaks of bifurcations. That is, a very small variation of the parameter can send the dynamics into completely different regimes. Thus, while in chaotic dynamics, the future of a trajectory may depend very sensitively on the initial conditions, at bifurcations, the dynamics depends very sensitively on a parameter value. It is clearly important to identify such bifurcation points in historical dynamics.

Appendix B Previous examples considering history as a dynamic process

For some time a few archaeologists and historians have been graphing behaviors of societies through time in small state spaces, usually considering two variables at a time. Such phase plots implement part of the program we discuss here, since they make it possible to describe trajectories through time in these small state spaces, though they do not generally attempt the fundamental step of formulating the stochastic process model that underlies the behaviors such plots reveal. They are descriptive, graphic devices that do serve to identify semi-cyclic tendencies and possible discontinuities through time. Examples include Kohler et al. (2009), who examine the frequency of interpersonal violence against population size through time, and Reese et al. (2019), in which the number of communities and the population size of a study area are plotted against each other, using line colors and symbol sizes to put these into the contexts of estimated maize production levels and average community sizes (Fig. 4). In the case considered in that figure, a population bearing a new sociopolitical system intruded on this area between the AD 1040 and 1080. From the perspective of this study area, that immigration can be considered as an exogenous perturbation that changed the subsequent evolution of the social and settlement system by (among other things) allowing larger communities to be supported (hence, changing ).

Peter Turchin and colleagues have investigated the dynamics of agrarian states with special attention to their demography, social structure, surplus extraction, and sociopolitical instability Turchin and Nefedov (2009). These variables, and their proxies, are expected to exhibit a patterned relationship with each other through time based on theory developed in Goldstone (1991), Turchin (2003). These examples go further than the archaeological cases mentioned above by positing formal models, though there is no attempt to quantify the fit between any of these models and the empirical examples, or to derive models directly from the data.

[width=.9]REESE_ET_AL_figure-6.pdf .

Figure 4:

Relationship through time between number of communities and population on the Mesa Verde cuesta, Colorado shown in z-score space across the periods used by the Village Ecodynamics Project. This area was first densely colonized by farming populations ca. A.D. 600 (at bottom left of figure, unlabeled), and was completely depopulated by ca. AD. 1285. Each point is plotted at the midpoint (AD) of each of the 14 periods recognized; the size of each bubble is proportional to average community population at that time. A 20-year smoothed maize productivity niche is shown on a red (low productivity) to green (high productivity) spectrum. Source:

(Reese et al., 2019, Figure 6)

Appendix C Using PC1/PC2 hinges as reference points for the appearance of moralizing gods

So long as one can evaluate the value of Seshat’s PC1 for the polities recorded in that data set, one can quantify how “complex” those polities were when they underwent events recorded in that data set. In earlier work with Seshat this was done simply by identifying PC1 with complexity. In particular, Whitehouse et al. (2019) used a time-series fit to Seshat data to determine when “jumps” occurred in the complexity of individual polities. The times of those jumps were then compared to the time of appearance of Moralizing Gods (MGs) in the polity.101010Presence of moralizing gods has now been added as variable in Seshat, to allow this analysis by Whitehouse et al. (2019), but was not a component of the PC values discussed in this section. The conclusion was that jumps in complexity are preconditions for the appearance of MGs.

Note that this approach assumes that all intervals in PC1 space correspond to the same amount of “social complexity”, since it identifies large changes in PC1 during small times as “jumps in complexity”. In addition, in order to assign a time of the complexity jump to any particular polity based on its time-series, which is then used to determine whether the jump was before the MG onset for that polity, any time-series was discarded from the data set unless it had a continuous sequence of PC1 values stretching to before the MG onset. This introduces statistical artifacts.111111To see this, as a null hypothesis, suppose that all jumps in social complexity occur during the interval from to in PC1. Suppose as well that MG onset PC1 values occur independently, by sampling a Gaussian centered on PC1 . Finally, suppose that due to archaeological artifacts, half of all time-series have a continuous sequence of PC1 values stretching back only to PC1 , and the other half stretch back to before PC1 . Then it will appear that there is probability of MG onset occurring after the jump in social complexity, even though the real probability is .

The discovery of hinge points provides an alternative way to investigate the relationship between jumps in social complexity and the appearance of MGs, without these two shortcomings. As shown in Figure 5, MGs arise in polities both before or after the polity undergoes a “jump in complexity”. At least based on the complexity thresholds found by analyzing Seshat (Shin et al., 2020), there are instances in which MGs arise before the threshold at PC1 , and instances in which MGs arise after the threshold at PC1 .

From a social science perspective, this suggests that a discontinuity in social complexity neither causes the onset of MGs nor requires them, contradicting several other analyses in the literature (including one that was based on the same Seshat dataset). In terms of building an underlying stochastic process model, these results are broadly consistent with the hypothesis that the onset of MGs is a jump of the first kind, where a large change of occurs with small but non-negligible probability under the transition matrix .121212Intuitively, this hypothesis is similar to supposing that the onset of MGs is a Poisson process, with a rate that is non-negligible across a broad span of PC1 values. We emphasize though that we have not yet done a proper statistical test of this hypothesis.

Note though that precisely because such a Poisson process is independent of the current value of , as well as previous values, it is very challenging to distinguish the hypothesis that the onset of MGs is a Poisson process from the hypothesis that it is be an exogenous perturbation. This is a significant difference from the hinge points themselves. Because the thresholds captured in those hinge points do depend on the current value of , (by definition), and since the precise value of differs widely from one polity to another, those hinge points seem much more certain to be a jump, albeit of the second kind.


Figure 5: TBD

Appendix D A 1-dimensional random walk as a simple stochastic process

We illustrate stochastic processes using perhaps the simplest such process there is. A random variable that at every discrete time step assumes some integer value, and from one time step to the next can change its value by , The probability of going up is , and that of going down hence . Such jumps at different times are independent of each other. Only , but not can be directly observed. can only be estimated from the observed data of . Suppose we observe a finite time series of points generated by sampling such a process, but cannot directly observe . If changes at some time – an exogenous perturbation that we do not observe – then the dynamics after will be different. But based only on the time-series, no matter how different it may look before and after , we cannot definitely conclude that changed, either at or some other time. We can only suspect such a change when the time series starts to look qualitatively different. The best we can do is statistically estimate that such a change occurred. Fortunately, there do exist powerful techniques for such estimates.

Next, suppose that never changes, but we coarse-grain , into bins of width . Then if we know that the system is currently in the bin , in order to predict the probability that at the next time step the system will be in bin we need to estimate the relative probabilities of which precise point in the system is currently in. In general, we will assign a non-zero probability to the event that the system is currently at the precise point , and therefore assign a nonzero probability that at the next time step the system is in . On the other hand, if we also know that at the previous time step the system was in the bin , then in fact we know the system is currently at the point , with probability , and so cannot be in the bin in the next time step. So knowing something about the past state of the system, in addition to knowing its current state, changes the relative probabilities of its future states. This illustrates that coarse-graining the observed time-series will in general change a Markovian dynamics into a non-Markovian one.

Appendix E Seshat: Global History Databank

Begun in 2009, the Seshat project has pursued the ambitious goal of developing a dataset to test theories about sociocultural evolution by cataloging the development of human civilization from the dawn of the Neolithic to the Industrial Revolution Turchin et al. (2015). Two foundational elements of Seshat are the Natural Geographic Area (NGA) and polity. An NGA is a roughly 100 km by 100 km geographic area delimited by natural geographical features such as river basins, coastal plains, valleys, islands, and so forth Turchin et al. (2018c). A polity is an independent political unit that controls territory, and can range in size from small groups organized in local, independent local communities to territorially expansive, multi-ethnic empires. At any given time, exactly one polity controls an NGA, though that controlling polity may have its base or capital outside the NGA. For example, the Konya Plain NGA is located in the Central Anatolia Region of contemporary Turkey and has an area of 28,900 square km. It was controlled by the Hittite Empire (a polity) in 1000 BC and by the Eastern Roman Empire (also a polity) in AD 500. Seshat data used in recent analyses (i.e., the CCs in the World Sample 30; see below) are coded at 100-year intervals. Thus, a useful way to think about the Seshat data is as a data matrix in which each row consists of an NGA-polity-time triplet.

The Seshat database contains an ever-growing set of variables, now well over 1500, and cases, coded through time for each polity in consultation with archaeological and historical experts. For example, for the Hittite Empire in 1000 BC, the following variables related to money have these codes:

Neo-Hittite Empire in 1000 BC (Population 1.3 - 2.0 million)

Tokens Absent
Precious Metals Present
Foreign coins ??
Indigenous coins ??
Paper currency Absent

For comparison, when the Konya Plain was controlled by the Eastern Roman Empire in AD 500, it had a population of about 15 million people.

Seshat is constantly evolving as new data are added and old data re-assessed. This includes the addition of new variables and new NGAs. Recent articles that utilize the Seshat database have used a fixed version with 30 NGAs called the World Sample 30. NGAs for this sample were chosen to maximize geographic extent and diversity in social organization. In particular, 3 NGAs were chosen from each of 10 world regions (Africa, Europe, Central Eurasia, Southwest Asia, South Asia, Southeast Asia, East Asia, North America, South America, and Oceania-Australia) and the 3 NGAs in each region were selected such that sociopolitical complexity arrived relatively early, intermediate, and late. For this World Sample 30, recent analyses have also utilized a summary version of the dataset in which 51 variables were aggregated into nine distinct variables called ?Complexity Characteristics? (CCs):

  1. Population: Population of the entire polity

  2. Territorial Area: Territorial extent (area) of the polity

  3. Capital Population: Population size of the largest urban center (usually the capital)

  4. Hierarchical Levels: Number of types of settlements (e.g., hamlets to cities) and levels of administrative hierarchy

  5. Government: Aspects of government and bureaucracy, such as the presence of a legal code and merit promotion

  6. Infrastructure: Presence of bridges, roads, irrigation, etc.

  7. Information Technology/Writing: Presence and type of writing and recording systems

  8. Texts: Presence of specialized literature, including scientific texts, histories, calendars, fiction, etc.

  9. Money: The monetary system—presence of local/foreign currencies, paper currency, tokens of exchange, etc. (see Neo-Hittite example above)

Each of the CCs is normalized to lie between 0 and 1, and for each polity-time pair in the World Sample 30 there exists an observation for each CC. One challenge of working with the Seshat data is that, especially for earlier polities, there is insufficient evidence to code many variables—and often disagreements among experts.

Rather than assign a value for each CC and create one canonical imputation, the Seshat team assigned a distribution of possible values. These distributions are then sampled 20 times, to produce 20 replicates of the dataset (Turchin et al., 2018c, p. 7)

. The imputation is performed on each replicate, producing 20 different sets of CC values. These replicates are used to produce confidence intervals on the proportion of variance explained by each PC, the component loadings, and the values of the PCs for each polity.

Appendix F A hidden Markov model of age-specific demography

Consider an age structured population of reproductive females (including juveniles) where the age-class contains individual between and years of age, with being the age-spacing; it is straightforward to generalize this to males and post-reproductive females, as well as to account for other types of population structure, such as spatial location and socioeconomic statusRogers (1966); Caswell (2001). The column vector gives the number, or proportion, of females in each age-class at time step . is the age-specific fertility of females in age-class , accounting only for female offspring. The age-specific survival of females in age-class is . Given these definitions, the population project matrix for reproductive females is


where is the last reproductive age class. The population project equation is


These preceding equations succinctly summarize three effects: Females get older, that is, advance from one age class to the next, give birth to young females, and may die.

Stable demography: If demographic rates are stable131313Certain technical constraints on must also be satisfied. In particular, must be irreducible and primitive, which is almost always true. the population vector converges to a constant growth rate and stable age distribution. The growth “rate” (more precisely, the per period growth factor) equals

, the dominant left eigenvalue of

. The stable age distribution (

) equals the corresponding dominant right eigenvector and the age specific reproductive value (

) equals the corresponding left eigenvector Fisher (1958); Caswell (2001).

Time dependence: Rather than assuming a constant population projection matrix, we now add a subscript to allow time-dependence, . The time-dependent population projection equation is


One can easily expand the preceding model to a full demographic model that includes males and post-reproductive females. Parameters such as the sex-ratio at birth and age-specific mortality for the additional segments of the population can be inferred from known demographic statistics and/or observed data. The full demographic specification for time period is , where is the time-dependent vector of age-specific fertilities and is the time-dependent vector of age-specific survivals, and the demographic model for the period to is . An implicit assumption in this definition is that the population vector at time step is set assuming stable demography defined by the demographic specification . This assumption could be relaxed by letting be a free parameter in the demographic model .

f.1 From demographic model to likelihood and a hidden Markov model

To link the preceding model to the stochastic process framework suggested in the main text, we now interpret the population projection matrices as being determined by latent variables in a hidden Markov model. Further, we assume that there exists an external variable, let’s say for the sake of concreteness a categorical climate variable (likely slowly changing) that can be directly or indirectly observed, indexed by , where for each climate state, , different transition probabilities apply:


where is an indicator variable for the categorical climate state and is the transition matrix that applies for climate state . There are many alternatives to this approach, including modeling the projection matrices as depending on a continuous, exogenous climate parameter vector . However, the model we describe is sufficiently simple to be motivating yet sufficiently complex to be realistic. We assume that there exists a set of reference dynamics indexed by , where each reference dynamics, , is for a distinct annual demographic state – for example, one could correspond to famine, another warfare, etc. Next define the vector


to be the population dynamics state, , for the time periods through . Given the preceding formulation, it is straightforward to calculate the probability of any given given the set of transition probabilities (to be inferred) and the set of reference dynamics (pre-specified); we denote this probability by , where we indicate the set of transition matrices with and the set of reference population project matrices with . One can then sum over the set of valid vectors and use Equations 5 and 6 to calculate the probability of each

. Finally, this can be linked to available archaeological data to calculate a likelihood function, which can be used for either maximum likelihood estimation or Bayesian inference. For example, given a set of radiocarbon determinations (e.g., as in

Price et al. (2020)) one can assume that the probability a given sample is from a given year is proportional to the total population size in that year, the sum of the elements of . Similarly, if skeletal age-at-death is known, and a rough date estimate is available from associated artifacts, one can calculate the relative probability of being in the pertinent age-class (and sex-class if the model is suitably generalized) from the population vectors by summing across years as determined by the associated artifacts. Naturally, one can use multiple types of data as part of a single likelihood calculation to improve inference of the underlying model parameters; a major benefit of the approach we have described is that it is straightforward to accommodate additional types of data in the likelihood calculation in order to further improve inference (e.g., isotopic data to inform on migration, health data such as from linear enamel hypoplasias (LEHs) to improve inference on mortality, and both ancient and modern genetic data).


  • McAllister (2002) James W. McAllister, “Historical and Structural Approaches in the Natural and Social Sciences,” in The Future of the Sciences and Humanities, edited by Peter Tindemans, Alexander Verrijn-Stuart,  and Rob Visser (Amsterdam University Press, Amsterdam, 2002) pp. 19–54.
  • Sewell (1992) William H. Sewell, “A Theory of Structure: Duality, Agency, and Transformation,” American Journal of Sociology 98, 1–29 (1992).
  • Weiss et al. (1993) H. Weiss, M. A. Courty, W. Wetterstrom, F. Guichard, L. Senior, R. Meadow,  and A. Curnow, “The Genesis and Collapse of Third Millennium North Mesopotamian Civilization,” Science 261, 995–1004 (1993).
  • Strawhacker et al. (2020) C. Strawhacker, G. Snitker, M. Peeples, A. Kinzig, K. Kintigh, K. Bocinsky,  and K. Spielmann, “A Landscape Perspective on Climate-Driven Risks to Food Security: Exploring the Relationship between Climate and Social Transformation in the Prehispanic U.S. Southwest,” American Antiquity 85, 427–451 (2020).
  • Peregrine (2020) Peter N Peregrine, “Climate and social change at the start of the Late Antique Little Ice Age,”  (2020).
  • (6) Of course, our preconceptions and experience still affect our choice of which variables go into the historical data set in the first place. Our concern here is to able to minimize the role of such preconceptions in the analysis of our data sets, however those data sets were constructed.
  • Turchin et al. (2018a) Peter Turchin, Thomas E. Currie, Harvey Whitehouse, Pieter François, Kevin Feeney, Daniel Mullins, Daniel Hoyer, Christina Collins, Stephanie Grohmann, Patrick Savage, Gavin Mendel-Gleason, Edward Turner, Agathe Dupeyron, Enrico Cioni, Jenny Reddish, Jill Levine, Greine Jordan, Eva Brandl, Alice Williams, Rudolf Cesaretti, Marta Krueger, Alessandro Ceccarelli, Joe Figliulo-Rosswurm, Po-Ju Tuan, Peter Peregrine, Arkadiusz Marciniak, Johannes Preiser-Kapeller, Nikolay Kradin, Andrey Korotayev, Alessio Palmisano, David Baker, Julye Bidmead, Peter Bol, David Christian, Connie Cook, Alan Covey, Gary Feinman, Árni Daníel Júlíusson, Axel Kristinsson, John Miksic, Ruth Mostern, Cameron Petrie, Peter Rudiak-Gould, Barend ter Haar, Vesna Wallace, Victor Mair, Liye Xie, John Baines, Elizabeth Bridges, Joseph Manning, Bruce Lockhart, Amy Bogaard,  and Charles Spencer, “Quantitative historical analysis uncovers a single dimension of complexity that structures global variation in human social organization,” Proc. Natl. Acad. Sci. USA 115, E144–E151 (2018a).
  • Whitehouse et al. (2019) Harvey Whitehouse, Pieter Francois, Patrick E Savage, Thomas E Currie, Kevin C Feeney, Enrico Cioni, Rosalind Purcell, Robert M Ross, Jennifer Larson, John Baines, et al., “Complex societies precede moralizing gods throughout world history,” Nature 568, 226–229 (2019).
  • (9) In contrast, the insights from a high-order time-series analysis of the same data set are often more opaque, being of the form, “if the social group has had a particular sequence of characteristics in a succession of time periods, then its future dynamics is likely to be …”.
  • (10) Note that often we will want to have a large number of parameters in our model simply to allow it to capture how the stochastic dynamics varies across the vector space of characteristics of the group. That is just as true for a first-order model as a high-order one.
  • (11) To be more concrete, suppose that the stochastic process itself changes at time . This would mean that if we start the society at at some time , the likely ensuing trajectories would be different from what they would be if we instead start the society at at a time .
  • (12) As a technical comment, throughout this paper we view real-world data sets as being formed by noisy observations of trajectories , rather than incorporating the observational process directly into the stochastic process itself. So we do not consider the possible effects of time-dependence in the map taking to our observational data. In particular, for the purposes of this paper, we do not consider the many ways that archaeological data concerning events further in the past can be “more noisy” than recent archaeological data.
  • Takens (1981) Floris Takens, “Detecting strange attractors in turbulence,” in Dynamical systems and turbulence, Warwick 1980 (Springer, 1981) pp. 366–381.
  • Sauer et al. (1991) Tim Sauer, James A Yorke,  and Martin Casdagli, “Embedology,” Journal of statistical Physics 65, 579–616 (1991).
  • (15) Note though that such techniques can also be interpreted as estimating deterministic dynamics embedded on a high dimensional manifold in state-space Takens (1981); Sauer et al. (1991).
  • Lütkepohl (2005) Helmut Lütkepohl, New Introduction to Multiple Time Series Analysis (Springer Science & Business Media, 2005).
  • Tainter (1988) Joseph A. Tainter, The Collapse of Complex Societies, Vol. New Studies in Archaeology (Cambridge University Press, 1988).
  • François et al. (2016) Pieter François, JG Manning, Harvey Whitehouse, Rob Brennan, Thomas Currie, Kevin Feeney,  and Peter Turchin, “A macroscope for global history: Seshat global history databank, A methodological overview,”  (2016).
  • (19) This is because, by construction, the individual CCs represent separate complexity measures, PC1 “captures three quarters of the variability”. Note as well that PC1 has a roughly equal loading on all 9 CCs – that is, it is a weighted version of the individual complexity measures.
  • Shin et al. (2020) Jaeweon Shin, Michael Holton Price, David H. Wolpert, Hajime Shimao, Brendan Tracey,  and Timothy A. Kohler, “Scale and information-processing thresholds in Holocene social evolution,” Nat. Comm. 11, 2394 (2020).
  • Purzycki et al. (2016) Benjamin Grant Purzycki, Coren Apicella, Quentin D Atkinson, Emma Cohen, Rita Anne McNamara, Aiyana K Willard, Dimitris Xygalatas, Ara Norenzayan,  and Joseph Henrich, “Moralistic gods, supernatural punishment and the expansion of human sociality,” Nature 530, 327–330 (2016).
  • (22) We emphasize, though, that we have not yet done a careful analysis of exactly how much correlation there is.
  • Peregrine (2018) Peter N. Peregrine, “Toward a Theory of Recurrent Social Formations,” in The Emergence of Pre-Modern States: New Perspectives on the Development of Complex Societies, edited by Jeremy A. Sabloff and Paula L. W. Sabloff (SFI Press, Santa Fe, NM, 2018) pp. 271–295.
  • Tylor (1871) Edward Burnett Tylor, Primitive Culture: Researches into the Development of Mythology, Philosophy, Religion, Art, and Custom (John Murray, London, 1871).
  • Turchin et al. (2015) Peter Turchin, Rob Brennan, Thomas E. Currie, Kevin C. Feeney, Pieter Francois, Daniel Hoyer, J. G. Manning, Arkadiusz Marciniak, Daniel Mullins, Alessio Palmisano, Peter Peregrine, Edward A. L. Turner,  and Harvey Whitehouse, “Seshat: The global history databank,” Cliodynamics 6, 1–107 (2015).
  • Miranda and Freeman (2020) Lux Miranda and Jacob Freeman, “The two types of society: Computationally revealing recurrent social formations and their evolutionary trajectories,” PLOS ONE 15, e0232609 (2020).
  • Turchin et al. (2018b) Peter Turchin, Thomas E. Currie, Harvey Whitehouse, Pieter François, Kevin Feeney, Daniel Mullins, Daniel Hoyer, Christina Collins, Stephanie Grohmann, Patrick Savage, Gavin Mendel-Gleason, Edward Turner, Agathe Dupeyron, Enrico Cioni, Jenny Reddish, Jill Levine, Greine Jordan, Eva Brandl, Alice Williams, Rudolf Cesaretti, Marta Krueger, Alessandro Ceccarelli, Joe Figliulo-Rosswurm, Po-Ju Tuan, Peter Peregrine, Arkadiusz Marciniak, Johannes Preiser-Kapeller, Nikolay Kradin, Andrey Korotayev, Alessio Palmisano, David Baker, Julye Bidmead, Peter Bol, David Christian, Connie Cook, Alan Covey, Gary Feinman, Árni Daníel Júlíusson, Axel Kristinsson, John Miksic, Ruth Mostern, Cameron Petrie, Peter Rudiak-Gould, Barend ter Haar, Vesna Wallace, Victor Mair, Liye Xie, John Baines, Elizabeth Bridges, Joseph Manning, Bruce Lockhart, Amy Bogaard,  and Charles Spencer, “Quantitative historical analysis uncovers a single dimension of complexity that structures global variation in human social organization - supplementary material,” Proc. Natl. Acad. Sci. USA 115, E144–E151 (2018b), supplementary Material.
  • (28) A time-homogeneous Markov chain has a unique stationary probability distribution. That distribution can have multiple peaks, but the probability is of the system staying in one peak indefinitely. In that sense, the dynamics cannot have multiple basins of attraction.
  • Yildiz et al. (2018) Cagatay Yildiz, Markus Heinonen, Jukka Intosalmi, Henrik Mannerstrom,  and Harri Lahdesmaki, “Learning stochastic differential equations with Gaussian processes without gradient matching,” in 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP) (IEEE, 2018) pp. 1–6.
  • Khaldun (2015) Ibn Khaldun, The Muqaddimah: An Introduction to History-Abridged Edition (Princeton University Press, 2015).
  • Arfken and Weber (1999) George B Arfken and Hans J Weber, “Mathematical methods for physicists,”  (1999).
  • Rogers (1966) Andrei Rogers, “The Multiregional Matrix Growth Operator and the Stable Interregional Age Structure,” Demography 3, 537–544 (1966).
  • Caswell (2001) Hal Caswell, Matrix Population Models: Construction, Analysis and Interpretation, 2nd ed. (Sinauer, Sunderland, MA, 2001).
  • Wood (1998) James W Wood, “A theory of preindustrial population dynamics demography, economy, and well-being in malthusian systems,” Current Anthropology 39, 99–135 (1998).
  • Jones (2009) James Holland Jones, “The force of selection on the human life cycle,” Evolution and Human Behavior 30, 305–314 (2009).
  • Jones and Tuljapurkar (2015) James Holland Jones and Shripad Tuljapurkar, “Measuring selective constraint on fertility in human life histories,” Proc. Natl. Acad. Sci. USA 112, 8982–8986 (2015).
  • Webster (2002) David Webster, The Fall of the Ancient Maya: Solving the Mystery of the Maya Collapse (Thames & Hudson, 2002).
  • Aimers (2007) James J Aimers, “What Maya collapse? Terminal classic variation in the Maya lowlands,” Journal of Archaeological Research 15, 329–377 (2007).
  • Kennett et al. (2012) Douglas J Kennett, Sebastian FM Breitenbach, Valorie V Aquino, Yemane Asmerom, Jaime Awe, James UL Baldini, Patrick Bartlein, Brendan J Culleton, Claire Ebert, Christopher Jazwa, et al., “Development and disintegration of Maya political systems in response to climate change,” Science 338, 788–791 (2012).
  • Turner and Sabloff (2012) Billie L Turner and Jeremy A Sabloff, “Classic period collapse of the central maya lowlands: Insights about human–environment relationships for sustainability,” Proceedings of the National Academy of Sciences 109, 13908–13914 (2012).
  • Hoggarth et al. (2017) Julie A. Hoggarth, Matthew Restall, James J. Wood,  and Douglas J. Kennett, “Drought and its demographic effects in the maya lowlands,” Current Anthropology 58, 82–113 (2017).
  • Crabtree et al. (2017a) Stefani A. Crabtree, Lydia J.S. Vaughn,  and Nathan T. Crabtree, “Reconstructing Ancestral Pueblo food webs in the southwestern United States,” Journal of Archaeological Science 81, 116–127 (2017a).
  • Dunne et al. (2016) Jennifer A Dunne, Herbert Maschner, Matthew W Betts, Nancy Huntly, Roly Russell, Richard J Williams,  and Spencer A Wood, “The roles and impacts of human hunter-gatherers in North Pacific marine food webs,” Scientific Reports 6, 21179 (2016).
  • Crabtree et al. (2019) Stefani A Crabtree, Douglas W Bird,  and Rebecca Bliege Bird, “Subsistence Transitions and the Simplification of Ecological Networks in the Western Desert of Australia,” Human Ecology 47, 165–177 (2019).
  • Meyn and Tweedie (2009) Sean Meyn and Richard L. Tweedie, Markov Chains and Stochastic Stability (Cambridge University Press, 2009) pp. 4868–4879.
  • Yeakel et al. (2014) Justin D. Yeakel, Mathias M. Pires, Lars Rudolf, Nathaniel J. Dominy, Paul L. Koch, Paulo R. Guimarães,  and Thilo Gross, “Collapse of an ecological network in Ancient Egypt,” Proceedings of the National Academy of Sciences 111, 14472–14477 (2014) .
  • Gray et al. (2011) Russel D Gray, Quentin D. Atkinson,  and Simon J. Greenhill, “Language evolution and human history: what a difference a date makes,” Phil. Trans. Roy. Soc. B 366, 1090–1100 (2011).
  • Scheinfeldt et al. (2010) Laura B. Scheinfeldt, Sameer Soi,  and Sarah A Tishkoff, “Working toward a synthesis of archaeological, linguistic, and genetic data for inferring African population history,” Proc. Natl. Acad. Sci. USA 107 S2, 8931–8938 (2010).
  • Blast et al. (2019) D. E. Blast, S. Moran, S. R. Moisik, P. Widmer, D. Dediu,  and B. Bickell, “Human sound systems are shaped by post-Neolithic changes in bite configuration,” Science 363, eaav3218 (2019).
  • Bhattacharya et al. (2018) Tanmoy Bhattacharya, Damian Blasi, William Croft, Michael Cysouw, Daniel Hruschka, Ian Maddieson, Lydia Müller, Nancy Retzlaff, Eric Smith, Peter F. Stadler, George Starostin,  and Hyejin Youn, “Studying Language Evolution in the Age of Big Data,” J. Language Evol. 3, 94–129 (2018).
  • Barron et al. (2018) Alexander TJ Barron, Jenny Huang, Rebecca L Spang,  and Simon DeDeo, “Individuals, institutions, and innovation in the debates of the French Revolution,” Proceedings of the National Academy of Sciences 115, 4607–4612 (2018).
  • Steffen et al. (2004) W. Steffen, A. Sanderson, P.D. Tyson, J. Jager, P.A. Matson, B. III Moore, F. Oldfield, K. Richardson, H.J. Schellnhuber, B.L. Turner,  and R.J. Wasson, Global Change and the Earth System: A Planet Under Pressure (Springer-Verlag, Berlin, 2004).
  • Painter et al. (2020) Deryc T Painter, Chris Kempes, Geoffrey West,  and Manfred Laubichler, “The four regimes of the Great Acceleration,” Anthropocence Review .. (2020), in review.
  • Bainbridge (1995) William Sims Bainbridge, “Neural network models of religious belief,” Sociological perspectives 38, 483–495 (1995).
  • Davidson (2017) Thomas Davidson, “Black Box Models and Sociological Explanations: Predicting GPA Using Neural Networks,”  (2017).
  • Carley (1996)

    Kathleen M Carley, “Artificial intelligence within sociology,” Sociological methods & research 

    25, 3–30 (1996).
  • Zeng (1999) Langche Zeng, “Prediction and classification with neural network models,” Sociological methods & research 27, 499–524 (1999).
  • Maltseva et al. (2016) AV Maltseva, NE Shilkina,  and OV Mahnitkina, “Data mining sociology: experience and outlook for research,” Sociological Studies 3, 35–44 (2016).
  • Zhan et al. (2018) Donglin Zhan, Shiyu Yi,  and Denglin Jiang, “Small-Scale Demographic Sequences Projection Based on Time Series Clustering and LSTM-RNN,” in 2018 IEEE International Conference on Data Mining Workshops (ICDMW) (IEEE, 2018) pp. 803–809.
  • MacLeod et al. (2016) Haley MacLeod, Shuo Yang, Kim Oakes, Kay Connelly,  and Sriraam Natarajan, “Identifying rare diseases from behavioural data: a machine learning approach,” in 2016 IEEE First International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE) (IEEE, 2016) pp. 130–139.
  • Baćak and Kennedy (2019) Valerio Baćak and Edward H Kennedy, “Principled machine learning using the super learner: an application to predicting prison violence,” Sociological Methods & Research 48, 698–721 (2019).
  • Granger (1969) Clive W. Granger, “Investigating causal relations by econometric models and cross-spectral methods,” Econometrica 37, 424–438 (1969).
  • Schreiber (2000) Thomas Schreiber, “Measuring information transfer,” Physical Review Letters 85, 461–464 (2000).
  • Amblard and Michel (2013) Pierre-Olivier Amblard and Olivier J. J. Michel, “The relation between Granger causality and directed information theory: A review,” Entropy 15, 113–143 (2013).
  • Romanowska et al. (2017) I. Romanowska, C. Gamble, S. Bullock,  and F. Sturt, “Dispersal and the movius line: Testing the effect of dispersal on population density through simulation,” Quaternary International 431(B), 53–63 (2017).
  • Crabtree et al. (2017b) Stefani A. Crabtree, R. Kyle Bocinsky, Paul L. Hooper, Susan C. Ryan,  and Timothy A. Kohler, “How to Make a Polity (in the central Mesa Verde Region),” American Antiquity 82, 71–95 (2017b).
  • Kohler et al. (2018) T. A. Kohler, S. A. Crabtree, R. K. Bocinsky,  and P. L. Hooper, “Sociopolitical Evolution in Mid-Range Societies: The Prehispanic Pueblo Case,” in The Emergence of Premodern States: New Perspectives on the Development of Complex Societies, edited by J. A. Sabloff and P. L. W. Sabloff (Santa Fe Institute, Santa Fe, NM, 2018) pp. 133–184.
  • Gould (1989) Stephen Jay Gould, Wonderful Life: The Burgess Shale and the Nature of History (W. W. Norton, New York, 1989).
  • Malik (2020) Nishant Malik, “Uncovering transitions in paleoclimate time series and the climate driven demise of an ancient civilization,” Chaos: An Interdisciplinary Journal of Nonlinear Science 30, 083108 (2020).
  • Currie et al. (2010) Thomas E Currie, Simon J Greenhill, Russell D Gray, Toshikazu Hasegawa,  and Ruth Mace, “Rise and fall of political complexity in island south-east asia and the pacific,” Nature 467, 801–804 (2010).
  • Sheehan et al. (2018) Oliver Sheehan, Joseph Watts, Russell D Gray,  and Quentin D Atkinson, “Coevolution of landesque capital intensive agriculture and sociopolitical hierarchy,” Proceedings of the National Academy of Sciences 115, 3628–3633 (2018).
  • Turchin (2018) Peter Turchin, “Fitting dynamic regression models to seshat data,” Cliodynamics 9 (2018).
  • Owen et al. (2018) Jeremy A. Owen, Artemy Kolchinsky,  and David H. Wolpert, “Number of hidden states needed to physically implement a given conditional distribution,” New Journal of Physics 21, 019501 (2018).
  • Wolpert et al. (2019) David H. Wolpert, Artemy Kolchinsky,  and Jeremy A. Owen, “A space-time tradeoff for implementing a function with master equation dynamics,” Nature Communications 10, 1727 (2019).
  • Pfante et al. (2014) Oliver Pfante, Nils Bertschinger, Eckehard Olbrich, Nihat Ay,  and Jürgen Jost, “Comparison between different methods of level identification,” Advances in complex systems 17, 1450007 (2014).
  • Bradley and Kantz (2015) Elizabeth Bradley and Holger Kantz, “Nonlinear time-series analysis revisited,” Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 097610 (2015) .
  • Wolpert et al. (1995) David H Wolpert et al., “On the bayesian occam factors argument for occam’s razor,”  (1995).
  • Kantz and Schreiber (1997) Holger Kantz and Thomas Schreiber, Nonlinear Time Series Analysis (Cambridge Univ. Press, 1997).
  • Kohler et al. (2009) Timothy A. Kohler, Sarah Cole,  and Stanca Ciupe, “Population and Warfare: A Test of the Turchin Model in Pueblo Societies,” in Pattern and Process in Cultural Evolution, edited by Stephen Shennan (University of California Press, Berkeley, 2009) pp. 277–295.
  • Reese et al. (2019) Kelsey M. Reese, Donna M. Glowacki,  and Timothy A. Kohler, “Dynamic Communities on the Mesa Verde Cuesta,” American Antiquity 84, 728–747 (2019).
  • Turchin and Nefedov (2009) Peter Turchin and Sergey A. Nefedov, Secular Cycles (Princeton University Press, Princeton, NJ, 2009).
  • Goldstone (1991) Jack A. Goldstone, Revolution and Rebellion in the Early Modern World (University of California Press, Berkeley and Los Angeles, 1991).
  • Turchin (2003) Peter Turchin, Historical Dynamics: Why States Rise and Fall (Princeton University Press, Princeton, NJ, 2003).
  • (84) Presence of moralizing gods has now been added as variable in Seshat, to allow this analysis by Whitehouse et al. (2019), but was not a component of the PC values discussed in this section.
  • (85) To see this, as a null hypothesis, suppose that all jumps in social complexity occur during the interval from to in PC1. Suppose as well that MG onset PC1 values occur independently, by sampling a Gaussian centered on PC1 . Finally, suppose that due to archaeological artifacts, half of all time-series have a continuous sequence of PC1 values stretching back only to PC1 , and the other half stretch back to before PC1 . Then it will appear that there is probability of MG onset occurring after the jump in social complexity, even though the real probability is .
  • (86) Intuitively, this hypothesis is similar to supposing that the onset of MGs is a Poisson process, with a rate that is non-negligible across a broad span of PC1 values. We emphasize though that we have not yet done a proper statistical test of this hypothesis.
  • Turchin et al. (2018c) Peter Turchin, Thomas E. Currie, Harvey Whitehouse, Pieter François, Kevin Feeney, Daniel Mullins, Daniel Hoyer, Christina Collins, Stephanie Grohmann, Patrick Savage, Gavin Mendel-Gleason, Edward Turner, Agathe Dupeyron, Enrico Cioni, Jenny Reddish, Jill Levine, Greine Jordan, Eva Brandl, Alice Williams, Rudolf Cesaretti, Marta Krueger, Alessandro Ceccarelli, Joe Figliulo-Rosswurm, Po-Ju Tuan, Peter Peregrine, Arkadiusz Marciniak, Johannes Preiser-Kapeller, Nikolay Kradin, Andrey Korotayev, Alessio Palmisano, David Baker, Julye Bidmead, Peter Bol, David Christian, Connie Cook, Alan Covey, Gary Feinman, Árni Daníel Júlíusson, Axel Kristinsson, John Miksic, Ruth Mostern, Cameron Petrie, Peter Rudiak-Gould, Barend ter Haar, Vesna Wallace, Victor Mair, Liye Xie, John Baines, Elizabeth Bridges, Joseph Manning, Bruce Lockhart, Amy Bogaard,  and Charles Spencer, “Quantitative historical analysis uncovers a single dimension of complexity that structures global variation in human social organization,” Proceedings of the National Academy of Sciences 115, E144–E151 (2018c) .
  • (88) Certain technical constraints on must also be satisfied. In particular, must be irreducible and primitive, which is almost always true.
  • Fisher (1958) Ronald A. Fisher, The genetical theory of natural selection (Dover, New York, 1958).
  • Price et al. (2020) Michael Holton Price, José M. Capriles, Julie A. Hoggarth, R. Kyle Bocinsky, Claire E. Ebert,  and James Holland Jones, “End-to-end Bayesian analysis of 14C dates reveals new insights into lowland Maya demography,”  (2020), .