1 Introduction
Understanding the dynamics and evolution of physical systems is of great importance to the development of all fields. With the recent focus on longterm shifts in temperature and weather patterns, and the potentially devastating impact for the future, the importance of earth and environmental science may never have been greater. There is a need for continued and varied studies of country hydrology behaviours, to ensure features such as streamflow are not exhibiting material changes in their behaviour. This paper explores streamflow patterns among Australian hydrology stations between 01 January 1980 and 01 January 2019.
Our work builds upon a substantial body of multivariate time series analysis in applied mathematics and statistics, which has been used in a variety of domains including epidemiology,james2020covidusa; Chowell2016; jamescovideu; Manchein2020; Blasius2020; James2021_virulence financial markets,Drod2020_entropy; Jamesfincovid; james2021_mobility; JamescryptoEq and other fields.Vazquez2006; Mendes2018; Shang2020
Such techniques that have been used are broad and include parametric models,
Hethcote2000; Perc2020 various aspects of distance and similarity measures, Moeckel1997; Szkely2007; Mendes2019; James2020_nsm networks, Karaivanov2020; Ge2020; Xue2020 clustering frameworks Machado2020 and others. Ngonghala2020; Cavataio2021; Nraigh2020; Glass2020In the hydrology community more specifically, there is a significant collection of work studying spatiotemporal patterns among hydrological processes mauget_multidecadal_2003; sanborn_predicting_2006; amrit_relationship_2018. Although the scope of such studies varies, there is generally a consistent theme of exploring the change in temporal behaviours at different locations in space. One feature of particular interest to hydrology researchers has been streamflow modelling redmond_surface_1991; bales_identification_2001, where a significant amount of work has focused on modelling regional locations across the United States cayan_enso_1999; hamlet_longrange_2000; hidalgo_enso_2003; nowak_colorado_2012; wise_hydroclimatology_2018. Many of these studies have taken a more concentrated approach, providing detailed analysis on relatively few hydrology stations. This paper takes a different approach, studying more than 400 stations, sampled from a broad range of geographic locations across Australia.
A great deal of previous research in the hydrology community has used traditional linear and nonlinear statistical modelling techniques such as linear regression
liu_enso_2001, multivariate linear regression and selforganizing maps (SOM)
barros_toward_2008 and spatial modelling techniques sun_multiple_2012to yield insights related to topics such as El Niño Southern Oscillation (ENSO), drought, hydroclimatic variability and streamflow. In more recent years, many researchers have exploited advances in statistical learning, using various artificial intelligence methods such as support vector machines
fung_improved_2020; maity_potential_2010and deep learning
kaur_deep_2020; khan_prediction_2020to address problems related to prediction and assessment. These methods are often criticised for the difficulty one has in interpreting the drivers behind their predictions. Rather than focusing on the application of machine learning methodologies, this paper takes a more fundamental approach. We use recently established and longstanding techniques in time series analysis to study temporal, spectral and spatial dynamics related to Australian streamflow, and subsequently explore collective similarity among our collection of hydrology stations.
Although the study of nonstationary processes Dahlhaus1997; Adak1998, and the application of spectral analysis have been widely studied in the statistics community Kohn1997; Wood2017; HadjAmar2019; james2021_spectral; Choudhuri2004, there have been relatively few studies applying such techniques to address problems in hydrology. Notable studies of stationarity include the exploration of spatial correlations between daily streamflows betterle_what_2017, monthly streamflow forecasts gibbs_state_2018, annual streamflow and flood peaks milly_stationarity_2008 and other topics in hydrology wagener_future_2010. Spectral analysis has been applied in hydrology to address several specific problems including the study of watershed properties Schuite2019, aquifiers Gelhar1974; Jimenez2013; Pedretti2016, and other topics Kirchler2007; Montanari2007. This paper uses a variety of techniques for the study of nonstationary processes and timevarying power spectra from the statistics literature, and applies them to model periodic phenomena in Australian streamflow. Furthermore, we explore the evolution of spectral patterns  testing for shifts in periodic dynamics over time.
To test for changes in behaviour over time, we explicitly, and nonparametrically, model the evolutionary dynamics of our underlying time series. Such studies of timevarying dynamics, correlation structure and dimensionality reduction have been widely applied among applied mathematicians in domains such as finance
Fenn2011; Laloux1999; Mnnix2012; Laloux1999; Kim2005; Pan2007; Wilcox2007. In hydrological research however, there is less abundance in the applications of such techniques barros_toward_2008. We build upon studies of timevarying correlation structure to generate insights into collective temporal and spectral behaviours in Australian streamflow.This paper is structured as follows. In Section 2 we describe the data used in this paper. In Section 3 we study the collective similarity of our underlying stations in the time domain. In Section 4 we apply an optimization framework and spectral analysis methodology to determine the similarity in various hydrological stations’ periodic behaviours. In Section 5 we introduce an algorithmic procedure to estimate a governing physical process, from the underlying stations. In Section 6 we explore the evolutionary dynamics in the governing physical process and the underlying time series, using a variety of techniques implemented in a timevarying capacity. In Section 7 we conclude.
2 Data
Our dataset consists of Australian hydrology stations, their location, catchment area and streamflow and is sourced from the Australian Bureau of Meteorology (http://www.bom.gov.au/water/hrs). We only study stations with reported daily streamflow (ML/day) between 01011980  01012019. This provides a total of 414 stations to be studied throughout this paper.
3 Temporal dynamics and similarity
Our central object of study throughout this paper is a collection streamflow time series indexed . We study daily data across days. We let be the multivariate time series of daily streamflow of station on day . Throughout this section, we study the collective temporal similarity between all hydrological stations during our window of analysis. First, we visualise all the time series and inspect notably persistent trends across all stations. Figure 1 displays 4 candidate time series from our collection, sampled from the Eastern, Northern and Western spatial extremes of Australia (four different states) to maximize potential spatial variability in their underlying behaviours. Figure 0(a) corresponds to the Paddy’s River at Riverlea station, situated in The Australian Capital Territory with latitude 35.3843 and longitude 148.9656. Figure 0(b) displays streamflow at the Goodradigbee River at Brindabella station, which is situation in New South Wales with latitude 35.42 and longitude 148.73. Figure 0(c) displays the Bluewater Creek station at Bluewater, situated in Queensland with latitude 19.185 and longitude 146.5461. Finally, Figure 0(d) displays daily streamflow for the Gascoyne River at Fishy Pool station, which is situated in Western Australia and located at latitude 24.9497 and longitude 114.6442. Despite their spatial variability, all streamflow time series present several similar features.
First, they all possess a similar annual spike in their streamflow values. Second, all locations display a similarly subdued level of streamflow prior to periodic spikes. Such periodic spikes appear to occur at similar times of the year, although this alignment between time series appears to deviate up to a small degree of perturbation. One key difference between the 4 stations visualised is the amplitude of their respective signals, which exhibit pronounced differences and can be seen by the upper bound of the yaxis. For instance, the peak daily streamflow in Paddy’s River at Riverlea is approximately 12,000 ML/day while the Gascoyne River at Fishy Pool is in excess of 800,000 ML/day.
To further investigate the structure of temporal similarity throughout our collection, we normalize each streamflow time series by its total amplitude and compute distances between the resulting time series. To do so, let be the
norm of the vector
. As all are nonnegative, this aggregates the total amount of streamflow observed between 19802019. We then define . The vectors reflect relative changes in streamflow, and capture periods of relatively more activity having normalised for amplitude. Using our earlier figures for the sake of exposition, a change from 0 to 10,000 in Figure 0(a) would constitute a more significant change than a change from 600,000 to 800,000 in Figure 0(d). We now define a temporal distance matrix that measures distance between normalized trajectories. To better interpret inherent similarity, we associate a affinity matrix by(1) 
Our resulting temporal affinity matrix is normalised such that all matrix elements now lie in
. We apply hierarchical clustering to the resulting affinity matrix, and study the collective temporal similarity.
Hierarchical clustering on the temporal affinity matrix is shown in Figure 2. The mean level of temporal similarity is determined by summing all matrix elements above the diagonal and normalizing appropriately. That is, . We refer to this as an affinity trajectory norm . However close inspection of the associated dendrogram reveals noteworthy distinction between various intercluster and intracluster similarities. Four clusters of temporal streamflow time series are identified, three of which are most prominent. The two largest clusters, displayed at the top (purple) and bottom (orange) of the dendrogram both exhibit high levels of intracluster similarity, and although not quite as high, reasonably high affinity between each other. The third primary cluster which is shown in red, displays markedly lower levels of similarity when compared with constituents of the purple and orange clusters. Having inspected the location of individual stations, cluster membership does not appear to be related to spatial location, but rather, temporal dynamics that are difficult to detect.
4 Spectral dynamics and similarity
Given the pronounced similarity in the stations’ spikes in streamflow, we wish to further explore the periodic nature of this phenomenon. Spectral analysis is the most prolific technique in the physical sciences for understanding a time series’ autocovariance structure. The most important properties to study are the peak amplitude and respective frequencies of a candidate power spectrum. In this section, we propose a Whittle Whittle1954; Whittle1957 Likelihoodbased optimization framework, to learn the most appropriate parameters used in Welch’s method Welch1967 for power spectral density estimation. In doing so, we are able to model the spectral dynamics of our entire collection of streamflow spectra. In describing our spectral analysis model for streamflow time series, we drop the subscript , referencing specific time series, for notational convenience.
4.1 Spectral analysis model
We commence our modelling procedure with our underlying streamflow time series, which we denote .
Definition 4.1.
is deemed to be stationary if:

Each random variable
is integrable with a finite mean for each point in time . We detrend our time series, and henceforth assume that we have a process where . 
The underlying autocovariance is only a function of , which we denote .
In our application of spectral analysis, we want to study the secondorder properties of a stationary time series expressed in the time series’ autocovariance. The power spectral density of such a time series is defined:
(2) 
Using frequency components which are termed Fourier frequencies , we define the
Discrete Fourier transform
of our streamflow time series:(3) 
For the Fourier frequency component, , an unbiased (noisy) estimate of the power spectral density at that point, , is provided by the periodogram, which is defined by . By symmetry, the periodogram contains effective observations, which corresponds to . Previous work has provided a parsimonious signal plus noise representation of the periodogram:
(4) 
which we will use in our subsequent analysis. Many approaches, especially in the Bayesian statistics community, use the Whittle likelihood function to nonparametrically model the log periodogram within a regression framework. The likelihood of the log periodogram conditional on the true power spectrum can be approximated by
(5) 
We use Welch’s method to yield an estimate of the latent power spectral density. Welch’s method aims to reduce noise in its estimating the power spectrum by sacrificing lower frequency resolution. Procedurally, the data is divided into overlapping segments where a modified periodogram is computed within each partition. The modified periodograms are averaged to produce a final estimate of the power spectrum. Welch’s method consists of two model parameters: the length of each segment, and the amount of overlap in data points between adjacent segments. Overlap is often defined in percentage terms, where 0% would mean that two adjacent segments have no data overlap while 50% would mean that any segment would share half the data of a neighbouring segment.
Let our collection of power spectral density estimates yielded by Welch’s method be , for all hydrological stations . Such estimates are highly dependent on model parameters , where is the size of each segment and is the overlap between neighbouring segments. To best estimate our optimal power spectrum for each streamflow time series, we determine parameter values which optimize the Whittle Likelihood over our entire collection of time series. This optimization can be written
(6) 
For this data, the above optimization results in parameter estimates of and . We apply Welch’s method using these optimal parameters to our streamflow time series, and generate our collection of optimal spectra .
Figure 3 shows the four power spectral density estimates corresponding to the underlying temporal processes in Figure 1. Figures 2(a), 2(b), 2(c) and 2(d) display remarkable similarity. All figures exhibit a low frequency component which exhibits significant power, and subsequently lose power as we move toward higher frequency components. The low frequency component corresponding to significant power is approximately located at , which corresponds to an annual periodic component  confirming the inherent seasonality in the data.
Given the clear similarity in these stations’ power spectra, we wish to further explore spectral similarity across the entire collection. We normalize for the amplitude of our power spectra with the same procedure as in Section 3. Let be our collection of optimal spectral trajectories. The vectors reflect relative differences in various stations’ spectra, and capture differences in key frequency components between streamflow spectra.
Analogously to Section 3, we now define a Spectral distance matrix that measures distance between normalized spectral trajectories. To better interpret inherent collective similarity, we transform our distance matrix
(7) 
and apply hierarchical clustering to the resulting spectral affinity matrix. Again, this affinity matrix is normalised so that all elements lie in . A high score (close to 1) in the affinity matrix would indicate that two hydrological stations exhibit similarly high power in key frequency components of their streamflow. A low score (close to 0) would indicate that two stations exhibit dissimilarity in their streamflow’s key frequency components.
Figure 4 displays the hierarchical clustering results when applied to our spectral affinity matrix. The mean of our spectral affinity matrix,
, is computed similarly (averaging the elements above the diagonal) and determined to be 0.79  significantly higher than the average temporal similarity. Furthermore, the cluster structure reveals the overwhelming spectral similarity among the collection. There is one predominant cluster (consisting of a small subcluster), and an outlier cluster which consists of only a small subset of hydrology stations. This analysis confirms that the periodic dynamics of the streamflow time series are highly uniform across the entire collection  where almost all streamflow spectra exhibit a high power low frequency component, and incrementally less power as one examines the spectrum against higher frequency components.
5 Governing physical process
Having observed the pronounced temporal and spectral similarity exhibited by Australian hydrological stations, regardless of their spatial location, we hypothesize that there may be a governing physical process which captures the dynamics of Australian streamflow. In this section we formulate an overall process, with each time series representing a potentially shifted version of this process.
For each of our time series , we seek to learn an offset , where each time series has a unique offset. We bound each such that it lies in . Let denote individual offsets for the time series , and be an overall governing process. We seek the offsets and overall process, , to be
(8) 
To minimize this loss function, we perform a sequence of updating steps. First, we update the governing process
. Here, we compute the mean of the current iteration of our governing process time series. Then, we update . This step involves updating the alignments for the underlying stations’ time series. Then, we update the governing process, , by computing another mean. Next, update . This procedure converges quickly after a few iterations. For initialization, we take to be the overall mean of the original, unshifted, time series.The initial estimate of the governing physical process and the estimate after the application of our algorithm are shown in Figure 5. The figure reveals several interesting takeaways. First, the spikes in aggregate aligned streamflow are highly periodic. First inspection suggests that a prominent spike in streamflow exists roughly once every year. This is certainly the most obvious pattern in the time series, with no clear insights in the periods between the big spikes. The peaks correspond to the significant power observed at low frequencies in our spectral analysis.
The second key observation is the broad similarity between the initial mean process, and the determined governing physical process, which minimizes our loss function. The distribution of offsets, shown in figure 5(a) confirms this. Approximately three quarters of the series are determined to have no offset or a minimal offset, suggesting that in most cases simply taking the mean of the underlying time series would serve as a good approximation to the overall process. An exhaustive list of nonzero offsets, separated by state, are shown in Table 1. The three states with the most significant number of nonzero offsets are Tasmania, Victoria and Western Australia. Interestingly, these three states also share the largest median (nonzero) offset value. However as a percentage of its total number of hydrology stations, Victoria’s 44 stations with nonzero offsets is significantly lower than Tasmania, Western Australia and Tasmania. As a percentage of the number of stations, the states that are most likely to exhibit phase shifting dynamics in the time domain are Tasmania (81%), Western Australia (69%) and South Australia (46.2%). As these states possess hydrology stations located at some of the most extreme points in Australia (by way of their spatial coordinates), this may indicate that there is some relationship between the offset , and a candidate hydrology station’s spatial location.
# nonzero offsets, % nonzero, and median value by state  

State  # nonzero offsets  % nonzero offsets  
ACT  0  0%  n/a 
Northern Territory  5  33.3%  87 
NSW  14  13.2%  80.5 
Queensland  11  13.75%  3 
South Australia  6  46.2%  11.5 
Tasmania  17  81%  181 
Victoria  44  36.6%  172 
Western Australia  36  69%  168 
Figure 5(b) shows the power spectral density for the governing physical process. Unsurprisingly, it displays the same key behaviours as the underlying stations’ streamflow time series. First, there is a low frequency high power component, indicative of the annual periodicity in streamflow, and reduced power in higher frequency components.
6 Evolutionary Dynamics in governing and underlying processes
Section 5
demonstrated a framework to determine a governing streamflow process. In doing so, we highlighted several key points. Critically, the underlying hydrological time series closely resemble the governing physical process’ behaviours. Most notably the temporal dynamics share a periodic spike in streamflow, which corresponds to high power at low frequencies, and decreasing power at higher frequency components in the frequency domain. However, it is possible that temporal or spectral dynamics may change over time if the underlying process is nonstationary. Furthermore, the underlying time series and the governing process may display differences in their behaviour prior to and after our alignment algorithm. In this section, we test the existence of nonstationarity among our collection of underlying streamflow processes, and in the governing physical process before and after the procedure introduced in Section
5.First, we turn to our collection of underlying streamflow time series. We start by generating a sequence of rolling correlation matrices, indexed by time. We select a rolling window of days, and compute the correlation matrix
(9) 
In the equation above, denotes the mean of the hydrology time series over any candidate rolling interval
. At each point in time, we compute an eigendecomposition and study the evolution of our eigenvalues
and eigenvectors
. To explore the explanatory variance exhibited by our first eigenvalue, a measure of the degree of collective behaviour within our streamflow data, we compute a normalized first eigenvalue
.We analogously compute the rolling correlation matrix for our collection of aligned time series, , and let the timevarying vector of eigenvalues and eigenvectors of our aligned time series’ be and respectively. We let the normalized first eigenvalue of our aligned collection be, .
In Figure 8 we show the normalized first eigenvalues, and for the initial and aligned time series collections respectively. There are some subtle differences between the functions. First, which is displayed in figure 6(a), exhibits more pronounced periodicity and a higher amplitude than which is shown in figure 6(b). That is, the first eigenvalue in the initial (unaligned) case accounts for more explanatory variance than in the aligned case. At first, this may seem counterintuitive. However, after aligning the time series as described in Section 5, a significant portion of the variance in our collection has now been accounted for. So one could argue that the captures a different feature of the collection’s variance, that which has not already been removed by alignment, and may provide similar insights to the explanatory variance exhibited by the initial collection’s second eigenvalue.
Next, we explore the evolutionary behaviour of both collections’ first eigenvector. The evolutionary eigenvectors are shown in figure 7, with figure 7(a) and figure 7(b) showing the initial collection and aligned collection’s first eigenvector, respectively. The first notable observation is the general uniformity in magnitude along all coefficients. That is, each hydrological station comprises a similar level of attribution to the first eigenvector’s total variance. Furthermore, this trend is consistent over time in both the initial and aligned collections. The second observation is a moderate increase in the uniformity of magitudes in the aligned collection, shown in figure 7(b). This is consistent with the core aim of our procedure, which translates the underlying time series such that the norm with the current governing process, , is minimized. To further explore this phenomenon, we compute the variance of timevarying eigenvector coefficients among both collections. The variance among the collection of initial time series is 1.03 x , while the aligned collection is 9.43 x . The modest reduction in variance is consistent with the increased uniformity observed in the figure. The reduction in variance of may indicate that the initial mean of the underlying time series, prior to the optimisation procedure, serves as a good estimate of an underlying physical process.
In keeping with our assessment of timevarying dynamics, we now turn to analyzing the power spectrum of the governing hydrological process. We wish to answer two main questions. First, do the spectral dynamics of the overall process resemble the dynamics of the underlying time series? Second, do we see differences over time in the power spectrum? Figure 9 shows the timevarying power spectrum of our governing process before and after our alignment procedure.
The spectral dynamics of the governing process display key similarities with the underlying collection of time series. Most importantly, they exhibit greatest power at the same frequency component, , representing the annual periodic component. Like the underlying time series, lower power is exhibited at higher frequency components.
Visual inspection of Figure 9 suggests that the power spectrum does not display significant variability over time, for either collection (figure 8(a) or figure 8(b)). At the beginning of our time window in 1980, the spectrum displays significant power at low frequencies, with decreasing power at higher frequencies. Although the precise amplitude at various frequencies does oscillate over time, the general pattern is relatively stable. The spectral similarity between the initial and aligned collections once again suggests that simply computing an average of the initial time series could serve as a good approximation to the underlying physical process.
7 Conclusion
In Section 3 we studied the similarity between hydrological stations’ temporal dynamics. Our trajectory analysis and associated hierarchical clustering revealed the existence of three predominant clusters. The three clusters shown in Figure 2 display a high level of intracluster similarity, with surprisingly low intercluster similarity. The temporal affinity norm of 0.26 highlights the low degree of affinity exhibited between trajectories belonging to separate clusters. Subsequent experiments reveal that these varied cluster behaviours were not related to stations’ spatial location, ruling out the assertion that the temporal dynamics of Australian streamflow may be related to geographic location. Despite the clear cluster separation, visual inspection suggests that most streamflow time series possess broadly similar features  small oscillations around a level close to zero, with sharp spikes which appear to occur on an annual periodicity.
Section 4 builds upon this analysis, introducing a new framework to explore similarity in periodic behaviours. Our experiments reveal uniform behaviours in the spectral dynamics of underlying stations’ streamflow. Figure 4 shows the existence of a majority cluster of spectral trajectories, and the spectral affinity norm of 0.79 confirms the profound similarity in streamflow spectral dynamics. This is primarily due virtually all locations displaying most significant power at at the frequency , corresponding to an annual streamflow cycle.
Given the general similarity in the temporal domain and the profound similarity displayed in the frequency domain, in Section 5 we introduce an iterative algorithmic procedure to determine a governing hydrological process. We introduce an based objective function, where each time series is translated to minimize an average loss. Figures 5 and 6 reveal several notable findings. First, the mean of all streamflow time series does a reasonably good job in estimating the general behaviour of the collection, and is similar to our estimated process after the application of our alignment procedure. Figure 5(a) shows that the majority of estimated offsets are in fact 0, confirming the sample mean’s utility as a rough estimate. Figure 5(b) demonstrates that the power spectrum of the governing process resembles that of the underlying time series, and therefore still resembles the overarching theme of periodic streamflow behaviour.
We round out our analysis in Section 6 where we examine the stationarity of such behaviours previously identified in the prior sections. In our timevarying PCA, we study the explanatory variance exhibited by the correlation matrix’s first eigenvalue, which is shown in Figure 7. The figure demonstrates a similar level of explanatory variance exhibited by the first eigenvalue in both the initial and aligned collections of streamflow time series. To further investigate, we accompany this analysis with a timevarying study of the magnitude of the first eigenvectors’ coefficients displayed by both collections. Both collections display relative uniformity in the coefficient magnitudes over time, indicating that there is no element, or collection of elements, along the vector accounting for the majority of the variance. Figure 9 shows the evolutionary power spectral density for both collections. It is clear that in both cases, the spectrum displays pronounced stationarity over time, exhibiting the same key features of significant power at low frequencies and decreasing power along higher frequency components.
8 Appendix
8.1 Mathematical objects
Mathematical objects table  

Object  Description 
# hydrology stations  
Streamflow time series  
Streamflow trajectory time series  
Temporal distance matrix  
Temporal affinity matrix  
Latent power spectral density  
Periodogram estimate of power spectrum  
Welch’s estimate of power spectral density  
Optimal spectra time series  
Optimal size of each segment  
Optimal overlap between neighbouring segments  
Optimal spectral trajectory time series  
Spectral distance matrix  
Spectral affinity matrix  
Offset for streamflow time series  
Governing streamflow process  
Initial collection timevarying correlation matrix  
Initial collection timevarying eigenvalues  
Initial collection timevarying eigenvectors  
Initial collection timevarying normalized first eigenvalue  
Aligned collection timevarying correlation matrix  
Aligned collection timevarying eigenvalues  
Aligned collection timevarying eigenvectors  
Aligned collection timevarying normalized first eigenvalue 
Comments
There are no comments yet.