Thermal convection is a classical pattern-forming physical system, which in idealized settings is known to produce hexagonal cell structures . This type of pattern formation exists beyond simple, idealized conditions and appears also in geophysical settings, where it is a manifestation of self-organization in non-linear dynamical systems perturbed away from equilibrium [9, 24].
We study here a convective boundary layer (CBL, representing the lowest portion of the earth atmosphere) which is energetically forced by solar heating of the land surface. This energy input initiates and sustains convection, and if it is sufficiently strong, turbulence arises due to natural flow instabilities. This destroys the simple geometrical patterns found in the ideal Rayleigh-Bénard case and gives rise to a new type of organized motion where large-scale structures emerge, such as convective plumes. These are connected regions of the spacetime domain with positive buoyancy and can thus be seen as a type of coherent structure. Furthermore, interaction with the land surface modulates these processes, affecting such physical characteristics as the time scales of land-atmosphere coupling  and the cloud size distribution . The role of geometrical properties of the forcing pattern in determining the structure of the flow has recently come under study as well .
Descriptions of a CBL usually employ global statistical measurements, such as bulk profiles or spectral transform of model variables . The limitations of this approach are encountered, for example, when using such descriptors to discriminate between the effects of land surface patterns of varying heterogeneity on the structure of CBL flow . Here, we present a methodology that relies on the construction of a geometrical representation of flow structures, and uses its connectivity information as descriptor for the state and structure of the CBL. We show how this type of information corresponds to classical results on scaling properties of turbulent flows, and how it can be effectively used as a set of features in a statistical learning framework to perform inference on the land-atmosphere interaction problem. Furthermore, the spatial relationships encoded by connectivity are also found to describe the existence and structure of the plume-merging layer within the CBL .
Previous applications of topology to the study of non-linear physical systems have focused on establishing the existence of chaotic behavior and characterizing the response of such systems to noise [6, 7]; describing the structural characteristics of transition to fully-developed turbulence in confined plasma flows ; and specifically in the case of convection, the analysis of spiral-defect patterns which arise after the onset of convection in a low Prandtl number regime . All these studies rely on the computation of homological invariants in dimensions and (the Betti numbers). The present study, in contrast, emphasizes the use of low-level connectivity information (equivalent to zero-dimensional homology) to furnish physically meaningful descriptors for the state of a CBL in fully-developed turbulence, and introduces a different connectivity-based topological invariant, the merge tree, to give a quantitative description of the coherent structures formed therein. Computation of these connectivity-based descriptors is less computationally expensive than for their higher-order counterparts. Moreover, the connected components of a domain can be directly related to physical structures in the data. This is not necessarily the case for homological features of higher dimension, for which the problem of finding a representative cycle that corresponds to a feature in the data is subject of ongoing research .
2 Methodology and data
We use data from numerical simulations of a CBL. Specifically, four simulation runs of the Large-Eddy Simulation Atmosphere–Land-Surface Model (LES-ALM) as described in . The four simulations represent a dry boundary layer on Aug 5, 2009, based on a radio sounding at 08:00h, with the setup as described in . Each of the four simulations, referred to as SP1-4 in the sequel, features different boundary conditions at the surface. SP1 has the original land use data of the Selhausen-Merken site in Western Germany; SP2 has the same relative frequency of each land type as SP1, but all small-scale features are filtered out; SP3 is a fully randomized pattern with the relative frequency for each type; SP4 is a homogeneous grassland surface. See Fig. 1 for an illustration of the three heterogeneous land surface patterns. The computational domain covers an area of in the horizontal, with grid spacing of , and a height of , with variable spacing of up to . Model time step is , with data saved for analysis at intervals. Simulation time is 0800–2000 UTC, 5 Aug 2009. Lateral boundary conditions are periodic.
We will assume a computational domain , with
Each number denotes the number of grid points in each of the four dimensions (time, height, North-South, and East-West, in that order). For the present study we will focus on the vertical wind velocity, , as it is a direct representation of the exchange of energy and momentum at the heart of a convective system. The method then consists of two steps: the construction of a geometric representation of the flow, and computation of its connectivity information.
2.3 Geometric representation
We will use two- and three-dimensional cubical complexes as geometrical representations of flow structures (for background on cubical complexes and their homology, see ).
Let and . We denote by the subset of the domain where the time coordinate is equal to (this subset is then a three-dimensional array of points). We define analogously, as the subset of with time equal to and height equal to . Given a threshold value , we define the following sets of points:
We will use these as anchor points in the construction of the following two- and three-dimensional cubical complexes:
These geometrical objects will represent the area and volume of the subdomains which correspond to updrafts, as shown in Fig. 2. When it is clear from context we omit the subindices and refer to these as simply . We will also refer to the first two Betti numbers of the two-dimensional complexes as and , which count the number of connected components and the number of loops, respectively.
2.4 Two-dimensional analysis
Consider a two-dimensional cubical complex , as shown in Fig.2. We note that the sizes of its connected components are not distributed uniformly at random. Instead, they follow a specific pattern characterized by the existence of one large component, which accumulates most of the domain area, and a large number of much smaller components. Denote the connected components of by , the size of a component by , and the fraction of components with size equal to by . Observe that, even if we express the size in , it is still a discrete quantity. Finding the connected components of an object such as is a classical problem in computer science, which has a very efficient algorithmic solution as described by , making use of the Union-Find (UF) data structure. For the present study we implement this solution as described in , which requires storing the size of each component as a criterion to assign precedence when merging two components together. Thus, we get the size information essentially “for free” when splitting into its components. The empirical distribution of is shown in Fig. 3, for the cubical complexes obtained from two-dimensional slices with (left) an (right). In both cases the data for 30 timesteps is shown, that is, the sizes of components from 30 different cubical complexes are aggregated.
The general pattern becomes clear in these figures: the smallest sizes accumulate most of the density for . This density then decreases regularly as size increases, until it is several orders of magnitude smaller for the very largest components. This regular behavior is clearer in the data from the surface layer (left panel), and especially in the sizes for simulation SP3 (randomized land surface pattern). The distribution appears to decay linearly in a log-log scale. There is a marked difference between the density curve for SP4 (homogeneous land surface pattern) and SP1-3, but the three heterogeneous surface patterns exhibit a mostly similar curve. This serves to illustrate an important difference between the homogeneous and heterogeneous cases, namely a change in the dominant physical mechanism. For SP1-3, heterogeneity in the CBL is maintained as a consequence of forcing over a range of spatial scales, whereas in SP4 structures are produced only as a consequence of turbulence internal to the boundary layer. In other words, the absence of forcing at the largest spatial scales in the case of SP4 becomes manifest here as a scale break at . For the data from the mixing layer (right panel) the first part of the distribution still appears to decay linearly on the log-log scale, but there are now much larger fluctuations at the tail.
We will use the scaling patterns just described to analyze the effects of the different land surface patterns on the structure of CBL flow. Specifically, we will fit a parametric heavy-tailed distribution to the data, and perform the analysis on its scaling parameter. A reasonable choice of distribution is the power law, defined in its discrete form by
where is the scaling parameter and a normalization constant111 is here the generalized zeta function.. A second parameter is the support of , represented by a value which gives the lower bound for the power-law scaling. Distributions of this form are important in different areas of science 
. A statistical framework based on maximum-likelihood parameter estimation to analyze empirical data and ascertain whether it conforms to a power-law distribution has been proposed in. For the results presented in this paper, we have used the Python implementation of these methods contained in the package powerlaw .
An important consideration to keep in mind before doing this is the amount of data points available. Following  we take as the minimum size to obtain reasonable fit results. However, we find that only for a small proportion of all the complexes . Denote the range of simulation timesteps by . We begin by defining a time window , and splitting the time range into the time intervals defined as
Given a height value and a time interval , we will have two-dimensional slices, except perhaps for the last time interval, . For each of these slices, , we construct the cubical complex , This complex is conformed by connected components which we denote by , such that
Let denote the size of component . We then define as the set of all sizes of connected components that make up . Define furthermore the aggregate set
which accumulates the sizes of all connected components observed during the time interval , at height level . We then fit a power law distribution to the values for this set. We thus increase the sample size used to fit the power-law distribution, while allowing the time dependence of the power-law scaling behavior to be observed. In what follows, we use a time window of , which results in time periods (). This means we will fit power law distributions per simulation dataset, obtaining from each distribution its scaling parameter .
2.5 Three-dimensional analysis
Let be a three-dimensional cubical complex. Define its height function, by
where is such that . That is, for every point in the complex, this function returns the height of the three-dimensional elementary cube that contains it. Define, further, the following equivalence relation on points of : iff and both and belong to the same connected component of the sublevel set . The merge tree of is the quotient space . This topological invariant tracks the evolution of sublevel sets of , which can be born at level if a new independent component appears at that level, or die if they are merged into an older component when entering , or stop growing altogether. An example of this is shown in Fig. 4, where the graph represents the merge tree obtained from the scalar field . We distinguish three types of nodes: root nodes represent the birth of an independent component, merge nodes the death of a component by merging, and terminal nodes represent the level at which a component stops growing in the vertical direction. It is possible to perform statistical analysis on the set of graphs obtained by this method, but for the results discussed here we will focus on the spatiotemporal distribution of merge events.
3 Results and discussion
3.1 Updraft size scaling
3.1.1 Scaling parameter
The empirical PDF of the scaling parameter for all power law densities fitted for the four LES-ALM datasets are shown in Fig. 5 (left). These values are very similar across the four simulations, with the only significant difference being that the distribution for SP4 has a more clearly defined bimodal structure. The value distribution appears reasonable, insofar as power law densities encountered in practice tend to have a scaling parameter within the range . Further, the maximum of the PDF in all four cases occurs in a narrow band around , which is the value expected from the self-similarity of eddies cascading the spectrum of motion [14, 13, 21, 12].
The time-height sections of the scaling parameter are illustrated in Fig. 5 (right), as a function of height and time. The distribution of in the -plane is clearly non-uniform, and is seen to reflect the division of the CBL into its component regions. The qualitative features seen here are:
The existence of a well-defined surface layer, encompassing the first height levels closest to the land surface.
The evening transition is apparent as a sharp increase in the mean value of close to the surface, starting around . This effect is most pronounced for SP3 and SP4.
The inversion layer is characterized by larger values than those found either in the mixing layer below it, or in the free atmosphere above.
Values of occur predominantly within the mixing layer, which agrees with the fact that this is the region where strong turbulence develops.
It is difficult to accurately discriminate a power-law distribution from empirical data, due to the possibility of error introduced by the appearance of very large values at the tail of the distribution, which is one of the defining characteristics of a power law. We perform the following goodness-of-fit testing on the available data, based on the method described in : let be the data set, and the power law density fitted to it, with corresponding CDF denoted by . The empirical CDF for the data is denoted by . The method then computes the Kolmogorov-Smirnov (KS) statistic for and , and compares it to that obtained for synthetic datasets. The fraction of these for which their KS statistic is larger than that obtained for the data is the p-value for this test. Within this framework, it is small p-values that signal insufficient empirical evidence in favor of the power-law hypothesis. Following  we take “small enough” to be . We use a value of , which would give a p-value accurate up to 2 decimal places . Using these parameters we find that the hypothesis of power-law scaling is strongest within the surface layer, and in isolated mixing layer points. In terms of the four land-surface patterns, the hypothesis is strongest for SP3, the randomized surface pattern, and weakest for the surface layer in SP4. Overall this is in agreement with the data shown in Fig. 3, where the scaling for the data from surface layer in case SP3 is the most regular of all four simulations. In all four cases, this scaling behavior becomes less frequent after the evening transition takes place ().
3.1.3 Likelihood ratio test
We compare the likelihood of the data under the power-law hypothesis, , with the likelihood under an alternative heavy-tailed distribution, . Common alternatives are the exponential and the log-normal. We then compute the log-ratio
and an associated p-value for the likelihood ratio test . In this case, “small” p-values would indicate that the observed value is unlikely to be the result of random fluctuations alone. If the p-value is larger than a certain threshold, we consider the data to be insufficient for this test. Otherwise, the sign of indicates the hypothesis favored by the data.
Comparing the power-law hypothesis to an exponential alternative shows that the latter is not favored by the data. Indeed, the p-values for the corresponding log-likelihood test (not shown here) are smaller than for % of the pairs from SP1, % for SP2, % for SP3, and % for SP4. Moreover, the points where this happens agree with those where the log-likelihood ratio is positive and greatest in magnitude.
The log-likelihood test for a power law against a log-normal distribution paints a more nuanced picture. At a significance level of, the test is actually not able to distinguish between either of the two hypothesis for the vast majority of the domain. More important for our current analysis is that most of the surface layer points, for which the power law hypothesis has the greater likelihood, fall within the statistically significant tests (at the level). We summarize the results of this subsection so far:
The original observation made at the beginning, that the logarithm of frequency at which connected updraft regions appear in two-dimensional slices of the LES-ALM simulations decreases linearly with the logarithm of their size, finds limited evidential support across the totality of simulation data.
There is a strong indication of power-law scaling for the size of connected updraft components within the surface layer, and the degree to which a power-law distribution is a good fit for the data appears to be in direct relation to the heterogeneity of the underlying land surface. In all four simulations, the scaling parameter for the surface layer data is, on average, in the range .
Comparison of the power-law distribution with the alternative of an exponential or a log-normal distribution via a log-likelihood ratio test shows that, for the surface layer, the power law is in general the most adequate to describe the distribution of connected component size.
The evening transition brings an end to the power-law scaling behavior in the surface layer. After this point in time, the power-law distribution appears only in one vertical level adjacent to the land surface for SP3.
3.1.4 Comparison with the updraft Betti number,
We now turn to the question of recovering the land surface pattern using only the information provided by both the scaling parameter and the zeroth Betti number . In light of the discussion above, we will focus on the data emanating from the surface layer. A comparison of these values for the 4 LES-ALM simulation runs is shown in Fig. 6, together with the mean temperature profile. As was discussed before, it is known that the temperature profile can in general discriminate between homogeneous and heterogeneous land surface patterns, but fails at reflecting the effect of different heterogeneous patterns. Both and seem better at expressing this difference.
To make this more precise, we translate the aforementioned question to a classification problem. For each simulation SP () consider the matrix , formed by observations (rows) and features (columns). The entry will be equal to the value of obtained from the data at time window , height level , for simulation SP. Since we restrict the data to the surface layer, . We then construct the feature matrix by concatenation:
. A k-nearest-neighbors (kNN) classifier is trained on each (mean-centered and rescaled to unit variance) feature matrix separately, and then evaluated by computing the bootstrap estimate of the weighted averagescore ( samples). This estimate was computed for values of . For and best performance was achieved with , whereas for it was for . The results for this are shown in Table 1. The difference between the classifiers for and is significant, with an average score for of , compared to for . This shows that the scaling law found to describe the size of connected updraft components is sensitive to the differences in land surface patterns. It is not, however, more sensitive to these differences than the number of connected components, . In other words, when trying to determine which surface pattern produced a given set of surface-layer values, knowing how many connected components are present in the two dimensional slices is more informative than knowing how their sizes scale.
The classification metrics shown in the table also show that both sets of features have greatest discriminatory power when it comes to classifying the SP3 and SP4 data, with the power law model having greatest score for SP4, and the Betti number model for SP3. Both facts are consistent with the differences in power-law scaling across the four simulations, to wit: the power law densities fit to size data can distinguish between the homogeneous land surface pattern SP4 and the three heterogeneous patterns, but do not distinguish between the latter three. They are also consistent with the fact that, throughout most of the day, the value of for the surface layer of simulation SP3 is on average higher than it is for any of the other three simulations (see Fig. 7). An interpretation of these facts would be that land surface heterogeneity introduces an element of scale invariance to the size distribution of connected updraft regions, with this being clearest for the maximally heterogeneous SP3. Conversely, a purely uniform land surface, SP4, breaks away from this behavior, thus making its classification easier when only the scaling parameter is known.
3.2 Merge tree representation
We compute the merge tree at each timestep , for the four simulations, storing the three types of nodes that occur (see Fig. 4), together with their time and height indices . This allows us to visualize the number of merge events in each tree as a time series in Fig. 8. It can be seen that the four time series exhibit a similar qualitative pattern: a brief initial transient, followed by a period of growth until noon, after which the value stabilizes. There is a second growth phase in the late afternoon, which intensifies at the evening transition, with a marked decline afterwards. Also noteworthy is the fact that the average values of the four series are different, and stand in an inverse relationship with land surface heterogeneity length scale. That is, SP3 has on average the largest number of merge nodes, with SP4 having the smallest. Comparing these time series with those for (cf. Fig 7) shows that the number of merge nodes becomes stationary at a later time than does , and the former reaches its peak after the evening transition at an earlier time than the latter. Thus, despite the close relationship between both quantities (in general we can expect more merge events to occur if there are more components to be merged), they appear to respond differently to land-surface heterogeneity.
A closer view of the surface layer, where most of the merge nodes are concentrated, is given in Fig. 9. The clearest difference is again that between SP3 and SP4, with the former having most of its surface layer merge nodes distributed throughout the first 5 vertical levels. In the case of SP4, the merge nodes are concentrated in the first level adjacent to the land surface. Recall the size distribution shown in 3, and the discussion surrounding the KS goodness-of-fit test for these size distributions. It was shown that the surface layer size distributions in SP4 have the worst power law fit out of all four simulations. This fact is expressed in the empirical PDF (Fig. 3, left) for SP4 having a scale break followed by a gap of one order of magnitude, reflecting the presence of the dominant component. What the distribution of merge nodes suggests is that, in SP4, this dominant component appears already at the first level, whereas for simulations with increasing levels of land surface heterogeneity the process of component merging occupies a larger portion of the vertical direction. This shows how the vertical depth of the plume-merging layer can be modulated by land-surface heterogeneity.
This merging process results in the formation of a large connected structure which accumulates most of updraft volume in , close to 99%. This also happens very rapidly, within the first simulation minutes in all four cases, and once the structure forms it persists for the remainder of the simulation.
We have presented an application of topology to the analysis of numerical simulation data from atmospheric science. Our results specifically highlight the importance of connectivity and its associated topological invariants (the zeroth Betti number, , and the merge tree) in producing a quantitative description of the structural properties of a CBL, as well as the response of the CBL system to differences in the underlying land surface pattern.
Determining the size of connected components in subsets of the computational domain allows us to establish parametric densities which describe their scaling. The scaling laws thus established are found to correspond to the expected self-similar scaling in the energy cascade across the spectrum of turbulent motion. Further, the existence of different subregions that make up the CBL can be inferred from variability in scaling across the time and height dimensions.
The number of connected components, , is shown to better discriminate between different heterogeneous land surface patterns when considering the data close to the land surface. Indeed, we demonstrate that both and the scaling parameter are shown to be more informative in this problem than the bulk temperature profile.
Computing a merge tree representation for the flow allows us to quantify the rate of plume coalescence, as well as the spatiotemporal evolution of this process across the CBL diurnal cycle. Using this we can quantify the effect of land surface heterogeneity on the depth of the plume-merging layer. Component merging is also shown to respond differently to land surface heterogeneity than either the scaling parameter or the number of connected components.
This study thus shows these topologically-motivated descriptors as a viable alternative to bulk profiles and spectral transformations for analyzing data from numerical simulations of a CBL.
We gratefully acknowledge support by the DFG transregional research collaborative TR32 on Patterns in Soil–Vegetation–Atmosphere Systems.
-  Alstott, J., Bullmore, E., Plenz, D.: Powerlaw: A python package for analysis of heavy-tailed distributions. PLoS ONE 9(1) (2014). https://doi.org/10.1371/journal.pone.0085777
-  Cerisier, P., Rahal, S., Rivier, N.: Topological correlations in Bénard-Marangoni convective structures. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 54(5), 5086–5094 (1996)
-  Clauset, A., Shalizi, C.R., Newman, M.E.J.: Power-law distributions in empirical data. SIAM Review 51(4), 661–703 (2009). https://doi.org/10.1137/070710111
-  Clauset, A., Young, M., Gleditsch, K.S.: On the Frequency of Severe Terrorist Events. Journal of Conflict Resolution 51(1), 58–87 (2007)
-  Edelsbrunner, H., Harer, J.: Computational Topology: An Introduction. American Mathematical Society (2010)
-  Gameiro, M., Mischaikow, K., Kalies, W.: Topological characterization of spatial-temporal chaos. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 70(3 2), 9–12 (2004). https://doi.org/10.1103/PhysRevE.70.035203
-  Gameiro, M., Mischaikow, K., Wanner, T.: Evolution of pattern complexity in the Cahn-Hilliard theory of phase separation. Acta Materialia 53(3), 693–704 (2005). https://doi.org/10.1016/j.actamat.2004.10.022
-  Garcia, L., Carreras, B.A., Llerena, I., Calvo, I.: Topological characterization of the transition from laminar regime to fully developed turbulence in the resistive pressure-gradient-driven turbulence model. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 80(4), 1–11 (2009). https://doi.org/10.1103/PhysRevE.80.046410
-  Goehring, L.: Pattern formation in the geosciences. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 371(2004) (2013). https://doi.org/10.1098/rsta.2012.0352
-  Hopcroft, J.E., Ullman, J.D.: Set merging algorithms. SIAM Journal on Computing 2(4), 294–303 (1973)
-  Kaczynski, T., Mischaikow, K., Mrozek, M.: Computational Homology. Springer-Verlag New York, 1 edn. (2004)
-  Kaimal, J.C., Finnigan, J.J.: Atmospheric Boundary Layer Flows - Their Structure and Measurement. Oxford University Press, New York, NY, USA, 1st edn. (1994)
-  Kolmogorov, A.N.: Dissipation of Energy in the Locally Isotropic Turbulence. Dokl. Akad. Nauk SSSR 32(1) (1941). https://doi.org/10.1038/161375b0
-  Kolmogorov, A.N.: The Local Structure of Turbulence in Incompressible Viscous Fluid for Very Large Reynolds Numbers. Dokl. Akad. Nauk SSSR 30(1), 9–13 (1941)
-  Kondrashov, A., Sboev, I., Dunaev, P.: Evolution of convective plumes adjacent to localized heat sources of various shapes. International Journal of Heat and Mass Transfer 103, 298–304 (2016). https://doi.org/10.1016/j.ijheatmasstransfer.2016.07.065
-  Krishan, K., Kurtuldu, H., Schatz, M.F., Gameiro, M., Mischaikow, K., Madruga, S.: Homology and symmetry breaking in Rayleigh-Bénard convection: Experiments and simulations. Physics of Fluids 19(11), 1–6 (2007). https://doi.org/10.1063/1.2800365
-  Liu, S., Shao, Y., Kunoth, A., Simmer, C.: Impact of surface-heterogeneity on atmosphere and land-surface interactions. Environmental Modelling and Software 88, 35–47 (2017). https://doi.org/10.1016/j.envsoft.2016.11.006
-  Mellado, J.P., van Heerwaarden, C.C., Garcia, J.R.: Near-Surface Effects of Free Atmosphere Stratification in Free Convection. Boundary-Layer Meteorology 159(1), 69–95 (2016). https://doi.org/10.1007/s10546-015-0105-x
-  Mitzenmacher, M.: A brief history of lognormal and power law distributions. Internet Mathematics 1(2), 226–251 (2004)
-  Obayashi, I.: Volume-Optimal Cycle: Tightest Representative Cycle of a Generator in Persistent Homology. SIAM Journal on Applied Algebra and Geometry 2(4), 508–534 (2018)
-  Obukhov, A.M.: On the distribution of energy in the spectrum of turbulent flow. Dokl. Akad. Nauk SSSR 32(1), 22–24 (1941)
-  Rieck, M., Hohenegger, C., van Heerwaarden, C.C.: The Influence of Land Surface Heterogeneities on Cloud Size Development. Monthly Weather Review 142(10), 3830–3846 (2014). https://doi.org/10.1175/MWR-D-13-00354.1
-  Shao, Y., Liu, S., Schween, J.H., Crewell, S.: Large-Eddy Atmosphere–Land-Surface Modelling over Heterogeneous Surfaces: Model Development and Comparison with Measurements. Boundary-Layer Meteorology pp. 333–356 (2013). https://doi.org/10.1007/s10546-013-9823-0
-  Vereecken, H., Pachepsky, Y., Simmer, C., Rihani, J., Kunoth, A., Korres, W., Graf, A., Franssen, H.H., Thiele-Eich, I., Shao, Y.: On the role of patterns in understanding the functioning of soil-vegetation-atmosphere systems. Journal of Hydrology 542, 63–86 (2016). https://doi.org/https://doi.org/10.1016/j.jhydrol.2016.08.053
-  Vuong, Q.H.: Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses. Econometrica 57(2), 307—-333 (1989). https://doi.org/10.2307/2223855