Many engineering applications call for the generation of synthetic time series of wave conditions, e.g., the simulation of as yet unobserved and possibly unanticipated, high-impact storms (e.g., VanDongeren2017, ; Sebastian2017, ); the evaluation of long-term morphodynamic impacts of coastal interventions (e.g., Vitousek2017, ); and the planning and safe execution of offshore operations, where the prediction of calm periods is important (e.g., leontaris2016probabilistic, ).
In principle, sea storms are segments of multivariate temporal processes of metocean variables that pose a hazard to the environment or operation of interest. Typically, these processes are described by hourly statistics, for example, the significant wave height, which is computed from a spectrum of individual waves. The processes exhibit strong state-to-state autocorrelation on short time scales, seasonal cycles on annual and multi-annual time scales, inter-series dependences and, potentially, long-term trends (e.g., Mendez2006, ; Mendez2007, ; Solari2011, ). These statistical features make it challenging to model time series of metocean variables, including sea storms.
Many simulation methods are based on renewal processes to model alternating sequences of storm and calm durations (michele2007multivariate, ; callaghan2008statistical, ; corbella2013simulating, ; li2014probabilisticestimation, ; wahl2016probabilistic, ; davies2017Improved, ). For the storm periods, high temporal resolution time series of the relevant metocean variables are then derived from an idealized ‘storm shape’. For most applications hourly values are needed. A typical assumption is that each univariate time series segment corresponds to two sides of a symmetric triangle whose height determines the peak value and whose base is defined by the storm duration. Similar schematizations with alternative geometrical shapes have also been suggested (Martin-Hidalgo2014, ; Soldevilla2015, ). Furthermore, the peak values of different processes are modeled as interdependent, for example using copulas.
An advantage of this approach is that the modeling effort can be reduced, because features of the metocean time series that are less relevant for the application are not resolved. An example are serial dependence or dependencies between the variables during calm periods. On the other hand, resulting models are application specific, because they rely on predefined storm shapes and critical threshold values, which are likely to differ per application. For instance, an operating vessel can be sensitive to metocean conditions that a sandy beach is not. An alternative to the methods based on renewal processes is to model complete time series. This increases the modeling burden, but allows for more flexibility in terms of potential applications.
Currently, three lines of research concentrate on simulating multivariate time series of metocean variables with high temporal resolution. Guanche et al. (Guanche2013, , and references therein) developed a simulation method based on statistical downscaling. The authors statistically simulate time series of larger-scale sea level pressure fields with autoregressive moving average (ARMA) models from which they then derive local sea state time series.
Furthermore, ARMA models have been used to directly represent time series of metocean variables at a specific location, most of them at three-hourly scales. Multiple studies exist on univariate time series of significant wave height (Athanassoulis1995, ; GuedesSoares1996a, ; GuedesSoares1996, ; Scotto2000, ; Stefanakos2006, ). Extensions to bivariate processes have been made by including the mean wave periods (GuedesSoares2000, ) and by including surges (Cai2008, ). In addition to significant wave height and peak period, Solari and van Gelder Solari2011
incorporated parameters related to wind speed, wind direction and wave direction, thus simulating five interrelated processes. The bivariate and multivariate approaches used so-called vector ARMA (VARMA) models, which are able to capture linear interdependencies between multiple time series. However, Solari and van Gelder reported that dependencies could not always be adequately represented.
Finally, copulas and vine-copulas have been adopted to model both serial dependence as well as inter-series dependencies of metocean processes. For instance, Leontaris et al. leontaris2016probabilistic simulated wind speeds and significant wave height, while Jäger and Morales Nápoles jager2017joint simulated significant wave height and mean zero-crossing periods. A comparative study of a copula-based serial dependence model to an ARMA model for significant wave height time series has been conducted by Solari and van Gelder Solari2011a . They found that storm frequency and persistence of storms were better represented by the copula-based model, whereas longterm autocorrelation was better represented by the ARMA model.
The above studies used different techniques to account for non-stationarities. The simplest approach has been to focus on the most important season jager2017joint or to piecewise model seasons or months li2014probabilistic ; wahl2016probabilistic ; leontaris2016probabilistic . Other studies have used a superposition of linear or cyclic functions of time Athanassoulis1995 ; GuedesSoares1996 ; Stefanakos2006 ; callaghan2008statistical ; Solari2011 ; corbella2012predicting and climate indices as co-variates Mendez2006 ; Mendez2007 ; serafin2014simulating ; davies2017Improved
to represent trends or seasonal cycles on semiannual to decadal time scales. Climate indices under consideration were the North Atlantic Oscillation (NAO), the Southern Oscillation Index (SOI), the Pacific-North America (PNA) and the El Niño-Southern Oscillation (ENSO) index. Another difference between the techniques is that some studies decompose the metocean processes into seasonal mean process, a seasonal standard deviation process and a stationary process(Athanassoulis1995, ; GuedesSoares1996, ; Stefanakos2006, ; wahl2016probabilistic, )
, while the others apply non-stationary probability distributions (i.e., distributions with time-varying parameters)(Mendez2006, ; Mendez2007, ; callaghan2008statistical, ; Solari2011, ; corbella2012predicting, ; serafin2014simulating, ; li2014probabilistic, ; leontaris2016probabilistic, ; davies2017Improved, ).
In this article, we develop a new probabilistic simulation method for joint time series of non-stationary wave parameters with an hourly resolution. More precisely, the wave parameters are the spectral significant wave height, , the mean-zero crossing period, , and a directional regime . In the remainder of the article we use this notation for the variables. An overview is given in Table 1.
|Significant wave height||m|
|Mean zero-crossing period||s|
|Wave direction regime|
While we reapply many techniques that have been suggested in the literature cited above, the method presented here distinguishes itself from others on five principle points:
We develop a data-driven equation for the limiting wave steepness condition at the study location and use it for an initial variable transformation. In this way, we separate in our modeling the deterministic part of the relationship between and , which is steepness-induced wave breaking, from the stochastic part, which is due to common meteorological and geographical factors.
Modeling of the mean wave direction is simplified by assuming a categorical variable with two possible values, north and southwest, which we refer to as wave direction regime. The assumption is reasonable given the geographical context of the measurement station. The main advantage is that we circumvent challenges related to modeling a circular variable and avoid inaccuracies that could arise from ignoring the circular aspect. Time series of the wave direction regime are modeled as a seasonal alternating renewal process, inspired bySalvadori2006 .
Instead of applying a VARMA model with joint-normally distributed residuals to the bivariate time series ofand , we estimate two univariate ARMA models with a non-normal joint residual distribution constructed via a copula. Recent examples of such an approach for other types of environmental time series can be found in Erhardt2017 ; Bevacqua2017 .
The wave direction regime is used to trigger regime switches in the joint residual distribution to account for possible differences in the statistical characteristics of northern and south-western waves. Differences are expected, because south-western waves are mostly wind-sea due to a limited fetch length, while northern waves can be a mixture of swells and wind-seas.
Similar to existing studies, we use Fourier series to characterize a seasonal mean process and a standard deviation process. However, we assume that the Fourier coefficients are random variables, potentially dependent, instead of constants in order to represent inter-year differences and dependencies between the processes on yearly time scales.
The paper is organized as follows. Section 2 introduces a data set from the measuring station Europlatform in the Dutch Southern North Sea. This data set will be used to develop and illustrate the simulation method. The section also shows how the wave direction affects the bivariate distribution of and and motivates why wave directions are clustered into two regimes. Sections 3 and 4 represent the core of this article. Section 3 develops the methodology for jointly modeling time series of the wave parameters. Section 4 shows simulation results. Finally, Section 5 discusses the main limitations of the methods and Section 6 contains the conclusions.
2 Data and Regime Definition
The data stems from the Europlatform, which is located 38km off the coast of Rotterdam (N, ) at a water depth of approximately 5m (Figure 1). The measuring station is operated by the Dutch Ministry of Infrastructure and the Environment (in Dutch: Rijkswaterstaat). Our data set consists of hourly measurements of three wave statistics, , and the mean wave direction, for a period of 24 years (1 Jan 1991 - 31 Dec 2015).
The existence of leap years complicates time series modeling on an hourly scale, because the number of hours per year varies. Annual seasonal processes would thus have a period that differs by hours if the year is a leap year as compared to when it is not. To avoid this, we introduced a new calendar for modeling purposes, in which all years are equally long. This calendar assumes that all Februaries consist of days and additional hours. Thus, each year has hours, instead of non-leap years having hours and leap years having hours. Of course, days in different years start at different times of the day, but this is not relevant for our application and has no effect on the results. However, the reader should be aware that dates mentioned and displayed do not exactly correspond to actual dates and times.
The data coverage is higher than
; most missing values arise before 2003. The period 1 Jan 2003 - 31 Dec 2014, according to the new calendar, has only five instances in which values of the three wave statistics are jointly missing. We have filled these by linear interpolation. If a component of the simulation model required a complete time series record without missing values for parameter estimation, this shorter-length record was used and this is mentioned in the corresponding section. Otherwise, models were estimated based on the full-length record.
The observed time series for and are shown in Figures 1(a) and 1(b). Most waves either originate from distinctly northern or south-western directions (Figure 3). Because of the geographical characteristics of the location (cf. Figure 1), waves from northern directions can be swells, wind seas or a mixed sea state, while waves from south-western directions are mainly wind seas. For this reason, we expect differences in the statistical properties of northern and south-western waves.
Issues can arise when ignoring the circular nature of a variable. For a variable in polar coordinates and are identical directions. Furthermore, cannot be called larger than and neither is their meaningful average. To analyze and model such a variable, many standard statistical tools or measures for ratio variables, such as and , might not be suitable. For example, it would be difficult to interpret a rank correlation of the mean wave direction with another variable. If we ignore the circular nature of the mean wave direction and compute its rank correlation with , we obtain a value of . The calculation assumes that is the smallest and the largest possible direction. Now suppose that, we would have defined mean wave direction as the direction into which the waves are traveling, contrary to the norm, which is the direction of origin. All our measurements would be shifted by , but the dependence between mean wave direction and would remain the same. However, we compute a rank correlation value of for this case. The example shows the danger of obtaining misleading results, when neglecting the circular nature of mean wave direction.
In the present geographical context it seems natural to represent the wave direction as a categorical variable with two states in order to circumvent issues related to circularity. Hence, the data have been partitioned into two clusters representing the two main directional sectors and a new variable, the wave direction regime, is defined as follows:
3 Simulation Method
The statistical simulation method builds on several modeling steps with different probabilistic models. This section motivates and explains the steps rather than describes the underlying probabilistic models. However, the reader can find the main concepts of ARMA processes and copulas explained in A and B. These appendices also point to introductory literature.
The modeling steps are shown in the flowchart in Figure 4. First, is derived according to equation (1) from the mean wave direction series. Then, the durations for which the directions remain in each regime before switching to the other are represented by a seasonal renewal process (Section 3.2). Independently thereof, a limiting wave steepness condition is estimated for the collected data and used to remove the effects of steepness-induced wave breaking (Section 3.3.1). Next, the data of and are normalized and decomposed into stationary and non-stationary processes (Section 3.3.2). The non-stationary processes are modeled using Fourier series with random coefficients (Section 3.3.3), while the stationary processes are modeled as ARMA using a regime-switching, joint residual distribution (Section 3.3.4). The regime switches are triggered by .
3.2 Wave Angle Regimes
The process is modeled as an alternating binary renewal process, following Salvadori2006 , who described wet and dry periods of precipitation in this way. For this application, the durations for which waves are coming from one of the two directions are random variables, and . For example, the initial direction wave direction regime is and remains that for a time . Then it switches to and remains that for a time . It is then for time , and so on. Figure 5 illustrates the process.
An alternating renewal process supposes that both sequences and , , are independent and identically distributed. However, and may be dependent. We made two modifications to this set up. On one hand, we also allow to depend on , . On the other hand, we assume that both and , , depend on the time of the year as well. Thus, we estimated a bivariate distribution for and for for each season, using a decomposition into univariate distributions and a copula.
We could not find adequate parametric univariate distributions among well-known families and suspect that this is caused by the many one-valued observations (i.e., many durations are hour). However, we did not investigate if and how so-called zero-inflated probability distributions could be adapted to this problem (e.g., Zuur2009, ). For simplicity, and since we do not need to extrapolate beyond the range of the observations, we used the empirical distribution functions. Figure 6 shows box plots of the observations for and for the four seasons. Seasonal differences are more pronounced for .
The last column of Table 2 shows the empirical values of Kendall’s for , , and for , . Both variable pairs are positively associated. Thus, a wave direction regime tends to persist longer when the duration in the preceding regime was long than when it was short. This tendency is stronger and seasonal differences more distinct for than for . Bivariate copulas were selected according to the Akaike information criterion (AIC) using the VineCopula package (Schepsmeier2017, ), which compares twelve different families and, if applicable, their rotated versions. The corresponding parameters were estimated by maximum likelihoods. The selected families, parameter estimates and Kendall’s for , , and for , , are also given in Table 2. Bivariate density contour lines for observed data of the two variable pairs and the selected copula models indicate that the fit is valuable for our simulation purposes (Figures 7 and 8).
3.3 Significant wave heights and mean zero-crossing periods
3.3.1 Modeling a limiting wave steepness condition
While the relationship between and is in large part stochastic, there is a physical limit on the maximum steepness that individual waves can attain. As soon as waves approach this limit, they break111This is different in shallow water, where depth-induced breaking occurs before steepness-induced breaking.. Wave steepness is defined as wave height divided by wave length, but can be formulated as a function of wave height and wave period. In terms of and , it is
The limiting steepness condition is clearly visible in the scatter plot of and (Figure 8(a)): For a given the corresponding cannot exceed a certain upper limit, or equivalently, for a given the corresponding is bounded from below. Nonetheless, we observed a few data points that appear to be unusually distant from the others (gray crosses in Figure 8(a)). We suspect that these are anomalies in the measurements and substituted them by missing values before proceeding with the data analysis.
Recent studies showed that bivariate distributions constructed with - or
-parameter copula families can be suitable to reflect the limiting steepness condition and to represent the joint distribution ofand (vanem2016joint, ; jager2017joint, ; Zhang2018, ). However, we cannot directly apply such approaches to the ARMA modeling in this study. Instead, we separate the deterministic part and the stochastic part of the relationship between and to model them individually. The idea is to remove the deterministic part by subtracting the lower bound from :
where the lower bound can be determined from and as
After this transformation, we proceed with time series modeling of the pair .
A difficulty that arises is the choice of . Constant values for , as put forth by (Michel1999, ; holthuijsen2010waves, ), do not seem to fit our data set (cf. Figure 8(a)). To achieve a better fit, we set out to estimate a maximum steepness condition from the data. The scatter plot indicates that observations of are not constant, but depend on . More precisely, there appears to be a horizontal asymptote roughly below , while the observed is rapidly decreasing for small . To account for this behavior we fit the curve
to the data.
Details on the motivation for this functional form and on the fitting procedure can be found in Appendix C. The resulting estimates are , cm (the upper bound was fixed at ) and cm. The fitted -curve is also shown in Figure 8(b) and the corresponding -curve is depicted in Figure 8(a) (orange lines). It should be noted that while this relation fits the observational data well in statistical sense, we do not propose to use this formulation to describe the physics of wave steepness-related breaking.
Next, we substituted the data points that represented waves which are too steep according to the fitted limiting condition with missing values. (They amount to less than of the data.) Finally, we transformed to (equation 3). Figure 8(c) shows a scatter plot of the pair . In the remainder, we will denote , to simplify notation.
3.3.2 Normalization and Deseasonalization of data
). Trends do not seem apparent. Moreover, the data are notably right skewed and strictly positive. Different methods can be used to analyze and model such time series data(e.g., Box2015, ). The prevailing approach is to deseasonalize the data and to model the seasonal components and the stationary component separately.
We identified two main procedures for deseasonalization in the literature on wave parameter modeling and simulation that appeared successful. Suppose , , is a time series for an arbitrary wave parameter. The first procedure involves two steps (Stefanakos2006, ). One step is to transform the data to reduce skewness:
with being a suitable monotone transformation function. This facilitates finding a adequate residual distribution of the ARMA model for the stationary component, but is also important for simulations, as will be explained in the next paragraph. A second step is to represent the transformed time series data as a realization of the following process:
where and are slowly changing non-stationary components exhibiting seasonal feature and is a high-frequency, stationary component. Each of the components is then modeled and is obtained by combining them according to equation (7). GuedesSoares1996 followed a similar procedure, however they first decomposed the data in seasonal and stationary components and then applied a skewness reducing transformation to the stationary data series. The second procedure relies on using a non-stationary distribution function to transform the data to standard normal (Solari2011, ). With this approach no additional transformation is necessary.
We used a method in line with the first approach to develop the simulation model. As potential transformations we considered the Box-Cox family (Box1964, )
and a transformation to standard normal using the empirical distribution function in the probability integral transform:
is the inverse of the standard normal cumulative distribution function andis the empirical distribution function estimated for the data . All three transformations have proven valuable in similar applications (GuedesSoares1996, ; Cunha1999, ; Stefanakos2006, ).
The choice of transformation strongly influenced the simulation results that would be obtained at later stages. Box-Cox transformations whose was estimated by maximum likelihood or a (not shifted) logarithm resulted in unrealistically high simulated values. For instance, simulated were in the order of m, while the highest observed is below m. In contrast, choosing higher values of or shifting the logarithm by a positive constant would result in reasonable maximum heights, but also in negative ones. These issues arose both when transforming the data before the decomposition as well as when transforming the stationary component.
The transformation to standard normal via the probability integral transform resulted in simulation values that were representative of real values, at least when applied before the decomposition. When transforming after the decomposition, the simulated values of , and would sometimes combine to a negative value. This is not surprising, since we modeled and independent of , as will be explained in the next sections. Hence, the transformation to standard normal via the probability integral transform before a decomposition was chosen. Figures 9(a) and 9(b) show the relationship between transformed and original variables.
After each of the time series , , has been transformed to the series , the seasonal components and of equation 7 are extracted using a smoothing technique. can be interpreted as a local mean and as a local standard deviation. Both were computed with sliding windows and an Epanechnikov kernel as weighting function for smoothing, as follows:
where is the Epanechnikov (parabolic) kernel. The bandwidth was set to hours, which amounts to 30 days and is in line with the common practice to deseasonalize oceanographic variables via monthly statistics (recent examples are wahl2016probabilistic, ; davies2017Improved, ).
3.3.3 Model for the non-stationary components
Figures 11(a)-11(d) show the seasonal patterns and the inter-year variability of the non-stationary processes. At this stage, we neither studied climatological covariates nor cycles longer than year, which is different to many of the studies reviewed in the introduction. Instead, we developed another approach assuming that it is more important to represent the range of inter-year differences than their temporal sequencing.
We start our modeling by following earlier studies Mendez2006 ; Mendez2007 ; serafin2014simulating in using a -factor Fourier series in order to represent both annual and semiannual cycles following earlier studies. However, we additionally introduce randomness in the Fourier coefficients, which will cause them to vary each year and thus produce inter-year variations is simulated time series of and .
For simplicity, we let the coefficients vary randomly according to a multivariate Gaussian distribution. This also accounts for dependencies between the four local mean and standard deviation processes. In the future, it might be worthwhile to further investigate, if the coefficients could be predicted by climatic covariates.
The first step of this approach involves dissecting the -year data series , , and into -year segments. To easily identify them, we re-index the series using a double index, for the year and for the hour within the year2228766 hours correspond to one year when“correcting” for leap years.. Then, we estimate -factor Fourier series for each segment:
where and . To obtain the -year series, the fitted -year segments are concatenated, for example , and discontinuities at the transitions from one year to another are smoothed out using cubic spline interpolation. The continuous fitted series obtained in this way explain more than
of the variance in the corresponding data series (Table3).
In the next step, we assume that the estimated coefficients are i.i.d. observations of random variables , , and estimate a joint distribution for them. Since the sample size is small () compared to the dimension of the random vector (), an extensive analysis of its distribution seems infeasible and for simplicity we assumed it to be multivariate Gaussian.
We modeled the distribution in terms of univariate Gaussian marginals and a multivariate Gaussian copula. Despite the relatively high dimensionality, the correlation matrix is sparse. On one hand, the basis functions of a Fourier series are mutually orthogonal, hence their coefficients uncorrelated leading to many zero-valued entries. On the other hand, we performed the bivariate asymptotic independence test based on Kendall’s for the remaining pairs of coefficients, which is implemented in the VineCopula package (Schepsmeier2017, ). According to the test, most pairs are independent. Only three correlations were found to be significant: , , and , where
denotes the product moment correlations. Thus, these correlation values correspond to the only three non-zero off-diagonal elements of the the correlation matrix parameterizing the Gaussian copula. The parameters of the univariate distributions were estimated by maximum likelihood and can be found in Table4.
3.3.4 Model for the stationary components
The deseasonalised processes, obtained through
are represented as ARMA. To find adequate orders and for the two series, we followed this process: We used well known properties of the behavior of the ACF and PACF to make an initial guess about the orders p and q (for methodological background on ARMA processes see A). For these p and q, we estimated the parameters of the ARMA based on maximum likelihood, as implemented in the arima() function of R’s stats package (CoreTeam2015, )
. With diagnostic plots, we checked, if the obtained residual series resembled white noise and if the model could produce simulated series with ACF and PACF behavior resembling the observed. If this was not the case, we iteratively increased the ordersand , estimated model parameters and reassessed the adequacy of the model.
The initial guess for was and , since its ACF decays slowly and its PACF has a cut of at lag . This choice of parameters was found to be adequate, which we show here with the plots of the ACF and PACF of the obtained residuals and squared residuals in Figures 14(a)-(d). The initial guess for was and , since both its ACF and PACF decay slowly while being dominated by damped sine waves. These orders were not found to be adequate yet, but and were. The plots of the ACF and PACF of the obtained residuals and squared residuals are given in Figures 15(a)-(d). The estimated parameters are listed in Table 5.
The residuals, , of the fitted model for should be (almost) i.i.d. and following an arbitrary distribution with zero mean and constant variance. Because we expect differences across the wave direction regimes, we estimate conditional distributions for the residuals given the regime, , for each
. We cannot expect these distributions to be Gaussian and we need to find a suitable parametric family of distributions. We considered the following families in this study: normal, skew-normal, t and skew-t. We select the family and attest its goodness of fit with visual diagnostic tools. We examine qq-plots and check if applying a probability integral transform (PIT) to the residuals via the selected family results in uniformly distributed values. We refer to these as PIT residuals and visually assess their uniformity with histograms. All four univariate distributions are best approximated by a skew-t family. The estimated parameters can be found in Table6. Diagnostic plots to verify the adequacy the marginal distribution are shown in Figures 14(e) and (f) and 15 (e) and (f).
. Standard errors are given in parenthesis.
: (a) ACF of residuals, (b) PACF of residuals, (c) ACF of squared residuals, (d) PACF of squared residuals, (e) histogram of fitted quantiles for northern directions, (f) histogram of fitted quantiles for southern directions.
Finally, the residual processes, and could depend on each other, because the original time series are interrelated. Therefore, we construct regime-dependent bivariate residual distributions via copulas. We fit them on the empirical ranks of the conditional residuals normalized to . As in section 3.2, we use the AIC criteria for model selection and estimate the parameters by maximum likelihood. Table 7 contains the selected bi-variate copula families, estimated parameters and Kendall’s . Figure 16 shows the bivariate density contours for observed and simulated residuals and attests, by visual diagnostic, a good fit of the selected copula models.
We created synthetic hourly records of , and for years with the developed simulation method. We first describe the results for and then the results for the and .
4.1 Wave Angle Regimes
To evaluate the method’s ability to generate realistic records of , we compare the percentage of time per year in which waves come from the either direction (Southwest or North) in observed and simulated data. We only use years (2003 - 2014) of the observed record, because the remaining years have gaps, making it more difficult to estimate directional persistence.
Figure 17 shows boxplots of percentages of time per season with south-westerly wave directions. The model correctly reflects that the highest percentage of south-western waves occurs in spring, while the lowest percentage of south-western waves occurs in autumn. All except one measured percentage fall within
times the interquartile range from the lower and higher quartile, respectively. Also a two-sided, sample-based Kolmogorov-Smirnov (KS) test does not reject the null hypothesis that observed and simulated percentages are equal in distribution for any of the seasons. P-values range fromfor winter to for autumn.
4.2 Significant wave heights and mean zero-crossing periods
Figures 17(a) and 17(b) show examples of the simulated records for and . The lengths of the simulated series are 24 years, the same as the lengths of the observed series shown in Figures 1(a) and 1(b). By visual comparison of the Figures, the simulated series appear to reflect the main characteristics of the observed ones. In particular, the model produces annual cycles and inter-year differences. Nonetheless, the maximum never exceeds its highest observed value, while the maximum mean zero-crossing period does exceeds its highest observed values.
To evaluate the method’s ability to generate realistic records of and corresponding the direction regime time series in more depth, we compare the univariate and bivariate empirical densities of simulated hourly values and their persistence above predefined thresholds to the ones observed. For the comparisons, we rely on visual diagnostics instead of statistical goodness of fit tests, because any model would be rejected for a sample size this large: there are observed and simulated hourly values.
4.2.1 Univariate and Bivariate Densities of Hourly Data
In this section, we present figures of annual densities of the different variables under consideration. In every figure the left panel shows the densities for waves from both directional regimes, the middle panel shows the densities for waves from the Northern regime and the right panel shows the density for waves from the Southwestern regime.
Figure 19 shows the annual empirical probability densities of hourly for 1000 simulated years together with the annual empirical probability densities the 24 observed years. The simulation model can reproduce the characteristics of the densities of the observed data. The densities of the simulated series form a cloud around the densities of the observed series. The differences between annual densities are notable in all directions in simulated and observed series. Moreover, northern waves tend to be higher than southwestern waves and have a more narrow distribution in both cases.
Similarly, Figure 20 shows the densities for . Again, annual differences are notable in the observed series, but differences between directional regimes are less pronounced. This is captured by the simulated series for all, but two years. In these cases, the densities of observed values do not fall within the cloud of densities of simulated values. This is most pronounced for the mode of the density for waves from the southwestern direction.
Next, we computed the steepness of waves. This is another important property of waves, which depends on both and (equation 2). Figure 21 shows the densities for wave steepness. While the densities of the simulated values form a cloud around almost all densities of the observed values, they appear to have different characteristics by visual comparison. Notable is that some of the densities of the observed values for the northern regime appear to have bimodal densities. This is not the case for densities of simulated values (To see this, we inspected them one by one).
The model appears to approximate the density contours well. In particular, the maximum steepness condition is well-represented. Nonetheless, a couple of features are not accurately represented. On one hand, the joint maxima of and at this density level tend to be underestimated by the model. On the hand, tends to be higher than observed for for waves from the southwestern regime.
The model approximates the density contours less well than the density contours. When we are not distinguishing between the directional regimes, the model still produces realistic contours, though tends to be overestimated for values around . For wave from the northern regime, this is the case as well. In addition, the joint maxima of and at this density level end to be underestimated. For waves from the southwestern regime, tends to be overestimated for values of smaller than . On the other hand, joint maxima of and are capture well.
4.2.2 Storm durations and interarrival times
In this section, we investigate the capability of the model to simulate sequences of storms. We focus on storm durations and interarrival times. In the literature these are often referred to as persistence regimes above and below predefined thresholds. We analyzed persistence for six pairs of thresholds for and : both variables jointly exceeding their respective univariate , , , , , and quantiles. The corresponding values are listed in Table 8
The different quantiles were chosen so that they would represent a wide range of conditions. The choice of the quantile is motivated byli2014probabilistic who simulated sea storms for a location in the Dutch North sea with comparable geographical properties. They use a threshold of m in combination with a surge threshold of m. They chose these thresholds, because more severe conditions are likely to cause morphological change (Quartel2007, ). This value of corresponds to the quantile in our data set, which is why we selected it as well.
Figure 24 shows densities of storm durations and Figure 25 shows densities of storm interarrival times in observed and simulated data. The six panels correspond to the six pairs of thresholds listed in Table 8. Each panel shows a density that has been computed from a 12-year segment of the observed time series (the same one that has been analyzed in Section 4.2) and 83 densities that have been computed from separate -year long segments of the -year long simulated time series.
In general, the simulation model produces realistic storm durations and interarrival times for all pairs of thresholds. In the case of storm durations, the density computed from observed values lies within the cloud of densities computed from simulated values for the lowest and the highest pair of thresholds. For the other pairs, the mode of the observed density is higher than any mode of the simulated densities. Furthermore, some simulated storm durations are approximately two to three times as long as the longest observed storm durations, depending on the pair of thresholds. In the case of the storm interarrival times, the densities computed from observed values lie within the cloud of densities computed from simulated values for all pairs of thresholds. Similarly to the storm durations, some of simulated storm interarrival times are approximately one and a half to three times as long as the longest observed storm interarrival times, depending on the pair of thresholds.
Furthermore, the model simulates a realistic number of storms, when comparing the observed year segment to the simulated year segments. Of course, this is expected given the results on storm duration and interarrival time. For all pairs of thresholds the observed number of storms lies within the - and the - quantile of simulated number of storms. The exact values are reported in Table 9.
|Observed number of storms|
|quantile of simulated number of storms|
|quantile of simulated number of storms|
While the simulation method is suitable to generate time series that exhibit statistical features relevant for coastal and offshore risk analysis, it still has limitations that affect its applicability. We discuss the main limitations in this section.
The modeling of the wave direction has been simplified by assuming a categorical variable with two states which representing two directional sectors: north and southwest. The main advantage of this approach is that difficulties related to the circular nature of the variable can be avoided. Moreover, the results in Section 4.2 show that time series of can be modeled accurately as a seasonal renewal process. However, this simplification can be a limitation for practical applications, because the wave direction affects structural loading as well as sediment transport: Loads are highest for waves that hit the structure under normal angles (e.g., Goda2008 ) whereas erosion can be significantly higher for waves that hit the coast under non-normal than normal angles (e.g., Heijer2013 ). Hence, it would be desirable to further develop the proposed methodology in order to simulate waves at a higher directional resolution. Eventually, additional variables, such as wind speeds and surges should be included as well to broaden the applicability of the method. Wind speeds are often crucial for offshore operations (e.g., leontaris2016probabilistic ), while surges are central to coastal risk assessments and the prediction of longterm morphological changes (e.g. li2014probabilisticestimation ; davies2017Improved ).
The choice of using two regimes for the wave direction was inspired by its clear bimodal distribution (cf. Figure 3) and the univariate and joint densities of hourly values of and being different in the two modes. We tried to capture the difference by using a regime-switching joint distribution for the residuals of the ARMA models corresponding to and . With this approach, we were able to capture part of the difference, but not all. In particular, the approach worked well for the univariate densities of hourly values of . The univariate densities of hourly values of are also captured, but the difference across regimes is less noticeable for this variable. Nonetheless, the bi-modality of the densities of hourly values of in the northern regime could not be represented. In terms of bivariate density, the model produced differences between the regimes, but they were not as pronounced as the ones observed. Hence, it does not appear to be sufficient to rely on regime switches in the joint residual distribution to capture the difference between northern and southwestern wave conditions. To improve this, future research could explore regime-switching ARMA parameters and and the extension to a vector-ARMA. However, investigating these options would require the derivation and implementation of parameter estimation procedures. As far as we know, estimation procedures have only been implemented for AR processes, but not for ARMA processes (e.g., tsdyn2018 ; mswm2018 ).
Another limitation of the methodology is due to the use of the empirical cumulative distribution function in the initial normalization of the data (cf. equation 10): In simulations values of are obtained by applying an inverse PIT to simulated values of using the inverse empirical cumulative distribution function of . Hence, the simulated values of span the same range as observed values of . In other words, we will never simulate more extreme values of than we have observed. The case of is different, because simulated values of , and not of , are obtained from simulated values of via the inverse PIT. Hence, simulated values of span the same range as the corresponding observed values. However, next
(cf. equation 3) and higher values than observed can arise for for certain combinations of and . This behavior in the extremes can be recognized in the simulated time series shown in Figures 17(a) and 17(b). Since we were aware of this limitation, we did not investigate the model behavior for extreme values, such as persistence above quantiles larger than or bivariate contours with density lower than . Nonetheless, finding an alternative variable transformation that overcomes this limitation and exploring the methods skill to simulate extremes is relevant for applications related to the design of infrastructures or reliability analyses.
In this paper, we presented a simulation method for joint time series of , and . The latter is a categorical variable that distinguishes northern and southwestern waves. Time series can be simulated at a high resolution of hour, which is useful for risk analyses in various coastal and offshore applications. The method has been applied to a data set in the Dutch North sea.
The method contains several modeling steps and relies on renewal processes, Fourier series with random coefficients, ARMA processes, copulas, and regime-switching. A particular feature is a data-driven estimate for a wave height-dependent limiting wave-steepness condition, which we use to describe part of the dependence between and and which facilitates the copula-based dependence modeling later on. Similar to many other studies, annual seasonality is represented by Fourier series. The coefficients are modeled as inter-dependent random variables to account for inter-year differences. At this point we did not consider climatic covariates and recommend to examine, if they have predictive skill for inter-year differences.
The stationary components of the two processes are represented as ARMA with a regime-switching joint residual distribution, constructed with copulas. The regime-switches are triggered by switches in wave direction from North to Southwest. While these regime-switches result in differences between bivariate distributions of simulated and when conditioned on the northern and southwestern regime, they are not as pronounced as in observed data. We recommend that future research is directed at improving and extending the simulation method as to better capture these differences. Nonetheless, the unconditioned bivariate distribution of and appears to be represented adequately.
Moreover, storm durations and storm interarrival times are well captured for two different storm definitions, which rest on different critical threshold values for and . As storm sequences are adequately represented, the model has potential value for applications in coastal and offshore engineering, such as the prediction of long-term morphological changes, or the planning and budgeting of offshore operations.
Appendix A Autoregressive moving-average (ARMA) models
ARMA models provide a parsimonious description of weakly stationary time series. For a comprehensive introduction to the topic see, for example, Box2015 , Brockwell2016 or Shumway2017 . A stochastic process is considered to be weakly stationary if all its moments up to the order of two do not vary in time. Thus, the mean and the variance of random variable is equal to a constant and the covariance between any pair , only depends on but not on .
A process is called ARMA, if it can be expressed as the following function of past observations, , and past residuals, :
where is a constant intercept term, and are non-zero constants, and the residuals are independent and identically distributed (i.i.d.) with zero mean. If every is zero, the process is said to be a moving average process of order , , and if every is zero, then it is called an autoregressive process of order , .
For given orders and , the model parameters, and , can be estimated by maximum likelihood or by minimizing the conditional sum of squares of the fitted residuals. An indication for suitable orders can usually be found by inspecting the autocorrelation function (ACF) and the partial ACF (PACF). The ACF at lag is defined as
where denotes the product moment correlation. In contrast, the PACF measures the correlation between and , for , with the linear effects of removed. In order to define the PACF, let denote the estimated mean from a regression of on and denote the estimated mean from a regression of on . The PACF for lag can then be defined as:
ARMA models with different orders have distinctive ACF and PACF behaviors. The ACF of an process decays slowly, while its PACF has a cut off at lag . Conversely, the ACF of an process has a cut off at lag , but its PACF decays more slowly. Finally, both ACF and PACF tail off in processes and are dominated by mixtures of exponentials and damped sine waves after the first lags and lags respectively.
Appendix B Copulas
Copula models are used to represent the joint behavior of several random variables. With a copula approach, the main limitation that is encountered with classical families of bivariate distributions (e.g., Gaussian, student-t, Gamma or generalized extreme value) is avoided. The limitation is that the individual behavior of the of variables must be characterized by the same family of univariate distributions as the joint distribution. For example, if two variables are joint normally distributed, each of them must follow a univariate normal distribution as well. However, in many practical applications the joint distribution of variables which follow different univariate distributions is sought.
A copula describes the dependence between random variables, separately from their respective marginal behaviors. The underlying theory is based on Sklar’s theorem (Sklar1959fonctions ), which states a joint distribution function, , of random variables, , with univariate distribution functions, , can be represented by a copula, , in this way:
The copula itself is a -variate distribution function on with uniform margins. Thus, a valid model for can be constructed from appropriate models for and for .
In this article we focus on the bivariate case. In the literature, many parametric copula families have been proposed, covering a wide range of dependence structures including tail dependences. For instance, Nelsen2006 ; joe2014dependence ; Durante2015 provide a comprehensive theoretical overview, while, for example, genest2007everything ; Salvadori2007 provide a good introduction for engineering purposes. In particular, guidelines for using copulas in maritime engineering are illustrated in salvadori2014practical ; salvadori2015practical .
Appendix C Statistical limiting wave steepness condition
Visual inspection of the data indicated that the limiting wave steepness is not constant, but varies with , as described in the main text in Section 3.3.1. To account for this behavior we fit the curve
to the data.
The scatter plot in Figure 8(b) shows a horizontal asymptote roughly below , while the observed is rapidly decreasing for small . This motivates the functional form of the curve: Suppose is a large value that cannot be attained by measurements of at the Europlatform. Then, can be interpreted as the value defining the horizontal asymptote, since , as . Finally, affects the slope of for smaller values of .
The procedure to fit the -curve was the following: First, we discretized the data into discrete bins. These were not equally spaced, but contained an equal number of data points (). Most bins cover a range in height of or . An exception is the widest bin, which spans from cm to cm. Next, we computed the maximum value of in each bin and associated it with the value of at the bin center. These data points are shown as orange circles in Figure 8(b). Finally, we estimated the coefficients , and using nonlinear least-squares. The resulting estimates are , cm (the upper bound was fixed at ) and cm. The coefficient of determination is and the root mean square error is .
According to this limiting steepness condition,
measurements are classified as anomalies, because they are too steep. These aremore than initially identified by visual inspection and amount to less than of the data. These data were classified as anomalies and substituted with missing values.
This work was supported by the European Community’s 7th Framework Programme through the grant to RISC-KIT (”Resilience-increasing Strategies for Coasts - Toolkit”), contract no. 603458, and by the Technical University of Munich through the Global Challenges for Women in Math Science award. We also gratefully acknowledge the valuable feedback of the anonymous reviewers.
- (1) A. van Dongeren, P. Ciavola, G. Martinez, C. Viavattene, T. Bogaard, R. Higgins, R. McCall, Introduction to RISC-KIT: Resilience-increasing strategies for coasts, Coastal Engineering (this issue).
A. Sebastian, E. Dupuits, O. Morales-Nápoles, Applying a bayesian network based on gaussian copulas to model the hydraulic boundary conditions for hurricane flood risk analysis in a coastal watershed, Coastal Engineering 125 (2017) 42–50.
- (3) S. Vitousek, P. L. Barnard, P. Limber, L. Erikson, B. Cole, A model integrating longshore and cross-shore processes for predicting long-term shoreline response to climate change, Journal of Geophysical Research: Earth Surface 122 (4) (2017) 782–806.
- (4) G. Leontaris, O. Morales-Nápoles, A. R. Wolfert, Probabilistic scheduling of offshore operations using copula based environmental time series–an application for cable installation management for offshore wind farms, Ocean Engineering 125 (2016) 328–341.
- (5) F. J. Méndez, M. Menéndez, A. Luceño, I. J. Losada, Estimation of the long-term variability of extreme significant wave height using a time-dependent peak over threshold (pot) model, Journal of Geophysical Research: Oceans 111 (C7).
- (6) F. J. Méndez, M. Menéndez, A. Luceño, I. J. Losada, Analyzing monthly extreme sea levels with a time-dependent gev model, Journal of Atmospheric and Oceanic Technology 24 (5) (2007) 894–911.
- (7) S. Solari, P. van Gelder, On the use of vector autoregressive (var) and regime switching var models for the simulation of sea and wind state parameters, Marine Technology and Engineering (2011) 217–230.
- (8) C. De Michele, G. Salvadori, G. Passoni, R. Vezzoli, A multivariate model of sea storms using copulas, Coastal Engineering 54 (10) (2007) 734–751.
- (9) D. Callaghan, P. Nielsen, A. Short, R. Ranasinghe, Statistical simulation of wave climate and extreme beach erosion, Coastal Engineering 55 (5) (2008) 375–390.
- (10) S. Corbella, D. D. Stretch, Simulating a multivariate sea storm using archimedean copulas, Coastal Engineering 76 (2013) 68–78.
- (11) F. Li, P. van Gelder, J. Vrijling, D. Callaghan, R. Jongejan, R. Ranasinghe, Probabilistic estimation of coastal dune erosion and recession by statistical simulation of storm events, Applied Ocean Research 47 (2014) 53–62.
- (12) T. Wahl, N. G. Plant, J. W. Long, Probabilistic assessment of erosion and flooding risk in the northern Gulf of Mexico, Journal of Geophysical Research: Oceans 121 (5) (2016) 3029–3043.
- (13) G. Davies, D. P. Callaghan, U. Gravois, W. Jiang, D. Hanslow, S. Nichol, T. Baldock, Improved treatment of non-stationary conditions and uncertainties in probabilistic models of storm wave climate, Coastal Engineering 127 (2017) 1–19.
- (14) M. Martín-Hidalgo, M. J. Martín-Soldevilla, V. Negro, P. Aberturas, J. López-Gutiérrez, Storm evolution characterization for analysing stone armour damage progression, Coastal Engineering 85 (2014) 1–11.
- (15) M. J. M. Soldevilla, M. Martín-Hidalgo, V. Negro, J. López-Gutiérrez, P. Aberturas, Improvement of theoretical storm characterization for different climate conditions, Coastal Engineering 96 (2015) 71–80.
- (16) Y. Guanche, R. Mínguez, F. Méndez, Climate-based monte carlo simulation of trivariate sea states, Coastal Engineering 80 (2013) 107–121.
- (17) G. Athanassoulis, C. N. Stefanakos, A nonstationary stochastic model for long-term time series of significant wave height, Journal of Geophysical Research: Oceans 100 (C8) (1995) 16149–16162.
C. Guedes Soares, A. Ferreira, Representation of non-stationary time series of significant wave height with autoregressive models, Probabilistic Engineering Mechanics 11 (3) (1996) 139–148.
- (19) C. Guedes Soares, A. Ferreira, C. Cunha, Linear models of the time series of significant wave height on the southwest coast of portugal, Coastal Engineering 29 (1-2) (1996) 149–167.
- (20) M. Scotto, C. G. Soares, Modelling the long-term time series of significant wave height with non-linear threshold models, Coastal Engineering 40 (4) (2000) 313–327.
- (21) C. N. Stefanakos, G. A. Athanassoulis, S. F. Barstow, Time series modeling of significant wave height in multiple scales, combining various sources of data, Journal of Geophysical Research: Oceans 111 (C10).
- (22) C. Guedes Soares, C. Cunha, Bivariate autoregressive models for the time series of significant wave height and mean period, Coastal Engineering 40 (4) (2000) 297–311.
- (23) Y. Cai, B. Gouldby, P. Hawkes, P. Dunning, Statistical simulation of flood variables: Incorporating short-term sequencing, Journal of Flood Risk Management 1 (1) (2008) 3–12.
- (24) W. Jäger, O. Morales Nápoles, A vine-copula model for time series of significant wave heights and mean periods in the North Sea, ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering.
- (25) S. Solari, M. Losada, Non-stationary wave height climate modeling and simulation, Journal of Geophysical Research: Oceans 116 (C9).
- (26) F. Li, P. van Gelder, R. Ranasinghe, D. Callaghan, R. Jongejan, Probabilistic modelling of extreme storms along the Dutch coast, Coastal Engineering 86 (2014) 1–13.
- (27) S. Corbella, D. D. Stretch, Predicting coastal erosion trends using non-stationary statistics and process-based models, Coastal Engineering 70 (2012) 40–49.
- (28) K. A. Serafin, P. Ruggiero, Simulating extreme total water levels using a time‐dependent, extreme value approach, Journal of Geophysical Research: Oceans 119 (9) (2014) 6305–6329.
- (29) G. Salvadori, C. De Michele, Statistical characterization of temporal structure of storms, Advances in Water Resources 29 (6) (2006) 827–842.
- (30) T. M. Erhardt, C. Czado, Standardized drought indices: a novel univariate and multivariate approach, Journal of the Royal Statistical Society: Series C (Applied Statistics).
- (31) E. Bevacqua, D. Maraun, I. H. Haff, M. Widmann, M. Vrac, Multivariate statistical modelling of compound events via pair-copula constructions: analysis of floods in ravenna (Italy), Hydrology and Earth System Sciences 21 (6) (2017) 2701.
- (32) A. F. Zuur, E. N. Ieno, N. J. Walker, A. A. Saveliev, G. M. Smith, Zero-truncated and zero-inflated models for count data, in: Mixed effects models and extensions in ecology with R, Springer, 2009, pp. 261–293.
- (33) U. Schepsmeier, J. Stoeber, E. C. Brechmann, B. Graeler, T. Nagler, T. Erhardt, C. Almeida, A. Min, C. Czado, M. Hofmann, et al., Package ‘vinecopula’.
- (34) E. Vanem, Joint statistical models for significant wave height and wave period in a changing climate, Marine Structures 49 (2016) 180–205.
- (35) Y. Zhang, C.-W. Kim, M. Beer, H. Dai, C. G. Soares, Modeling multivariate ocean data using asymmetric copulas, Coastal Engineering 135 (2018) 91–111.
- (36) W. H. Michel, Sea spectra revisited, Mar Technol 36 (4) (1999) 211–227.
- (37) L. H. Holthuijsen, Waves in oceanic and coastal waters, Cambridge University Press, 2010.
- (38) G. E. Box, G. M. Jenkins, G. C. Reinsel, G. M. Ljung, Time series analysis: forecasting and control, John Wiley & Sons, 2015.
- (39) G. E. Box, D. R. Cox, An analysis of transformations, Journal of the Royal Statistical Society. Series B (Methodological) (1964) 211–252.
- (40) C. Cunha, C. G. Soares, On the choice of data transformation for modelling time series of significant wave height, Ocean Engineering 26 (6) (1999) 489–506.
- (41) R. CoreTeam, R: A language and environment for statistical computing. vienna, austria: R foundation for statistical computing; 2015 (2015).
- (42) S. Quartel, B. Ruessink, A. Kroon, Daily to seasonal cross-shore behaviour of quasi-persistent intertidal beach morphology, Earth Surface Processes and Landforms 32 (9) (2007) 1293–1307.
- (43) Y. Goda, Random seas and design of maritime structures, Vol. 33, World Scientific Publishing Company, 2008.
- (44) C. Den Heijer, The role of bathymetry, wave obliquity and coastal curvature in dune erosion prediction, Doctoral dissertation, Delft University of Technology, Delft (2013).
- (45) A. F. Di Narzo, J. L. Aznarte, M. Stigler, Nonlinear Time Series Models with Regime Switching, R package version 0.9-48 (2018).
- (46) J. A. Sanchez-Espigares, A. Lopez-Moreno, Fitting Markov Switching Models Models, R package version 1.4 (2018).
- (47) P. J. Brockwell, R. A. Davis, Introduction to time series and forecasting, springer, 2016.
- (48) R. H. Shumway, D. S. Stoffer, Time series analysis and its applications: with R examples, Springer Science & Business Media, 2017.
- (49) M. Sklar, Fonctions de répartition à n dimensions et leurs marges, Publications de l’Institut Statistique de Université de Paris 8, 1959.
- (50) R. B. Nelsen, An introduction to copulas, Lecture Notes in Statistics. New York: Springer.
- (51) H. Joe, Dependence modeling with copulas, CRC Press, 2014.
- (52) F. Durante, C. Sempi, Principles of copula theory, Chapman and Hall/CRC, 2015.
- (53) C. Genest, A.-C. Favre, Everything you always wanted to know about copula modeling but were afraid to ask, Journal of Hydrologic Engineering 12 (4) (2007) 347–368.
- (54) G. Salvadori, C. De Michele, On the use of copulas in hydrology: theory and practice, Journal of Hydrologic Engineering 12 (4) (2007) 369–380.
G. Salvadori, G. Tomasicchio, F. D’Alessandro, Practical guidelines for multivariate analysis and design in coastal and off-shore engineering, Coastal Engineering 88 (2014) 1–14.
- (56) G. Salvadori, F. Durante, G. Tomasicchio, F. D’Alessandro, Practical guidelines for the multivariate assessment of the structural risk in coastal and off-shore engineering, Coastal Engineering 95 (2015) 77–83.