A Foreign Exchange (FX) option is a contract that gives the buyer the right, but not the obligation, to buy or sell a particular currency at a pre-specified rate (exercise price) at a pre-specified future date. Implied volatility (IV) is the market’s expectation of future realised volatility, backed out of traded option prices. Models such as Black & Scholes (1973) assume that IV is constant across all option contracts on the same underlying currency. Empirically, however, IV varies with both option delta and time-to-expiry, producing non-flat IV surfaces. Further, the shape of these IV surfaces changes from day to day, with Deuskar et al. (2008) and Konstantinidi et al. (2008) highlighting the importance of modelling daily changes in IV surfaces. To model and predict these daily IV surface changes, we propose a dynamic functional time-series approach that treats daily IV surfaces as a time-series of observations.
Theoretical general equilibrium models have been used by David & Veronesi (2000), Guidolin & Timmermann (2003), Garcia et al. (2003) and Bernales & Guidolin (2015), among others, to explain why we observe non-flat IV surfaces. This strand of literature links agent learning and uncertainty in options markets to macroeconomic and dividend fundamentals to explain IV surface shape. Latent factors are then said to drive changes in the shape of the IV surface over time. Bernales & Guidolin (2014, 2015) also conclude that IV predictability is driven by the learning behaviour of agents in options markets. They highlight that as options constitute forward-looking market information, agents’ expectations of future economic scenarios can be determined by interpreting time-to-expiry as investor forecast horizons. Goncalves & Guidolin (2006) find predictable patterns in how the S&P 500 IV surface changes over time; however, they conclude that their results do not constitute first-order evidence against the efficient market hypothesis. Bernales & Guidolin (2014) conclude that if the efficient market hypothesis is imposed, the discovery of such predictable patterns could be associated with microstructural imperfections and time-varying risk premia. Following Bernales & Guidolin (2014), we hypothesise that isolated pockets of option market inefficiency exist, and that these inefficiencies could lead to the presence of predictable patterns for segments of the IV surface. We seek to identify these patterns using functional time-series techniques.
In our analysis, we consider IV smiles of a single maturity as a univariate functional time-series. Nevertheless, IV also varies considerably across time to maturity, as highlighted by Borovkova & Permana (2009), among others. Therefore, when forecasting the IV surface (essentially multiple IV smiles of differing maturities for the same underlying), it could be advantageous to incorporate information from other maturity series simultaneously. Our multivariate and multilevel functional time-series methods follow this principle, allowing us to extend the previously proposed univariate functional time-series method. Both multivariate and multilevel functional time-series methods capture correlation among IV smiles of differing maturities. We illustrate their use in producing forecasts of daily FX IV surfaces comprising a number of maturities. To the best of our knowledge, no study to date evaluates and compares forecast errors associated with univariate, multivariate, and multilevel functional time-series methods. We attempt to fill this gap using both static and dynamic functional principal component approaches.
Functional time-series analysis has become increasingly popular in both implied and realised volatility modelling and forecasting. For instance, Benko et al. (2009) test stochastic similarities and differences between IV smiles of one- and three-month maturity equity options by conducting a functional principal component analysis. Further, Müller et al. (2011) utilise a functional model to characterise high frequency S&P 500 volatility trajectories. They find that when combined with forecasting techniques, their model accurately predicts future volatility levels. Kearney et al. (2015) use the smoothness in the IV smile to obtain a measure of IV steepness in oil options using functional data analysis. In a closely related study to ours, Kearney et al. (2018) model and forecast foreign exchange IV smile separately using static univariate functional time-series techniques, outperforming benchmark methods from Goncalves & Guidolin (2006) and Chalamandaris & Tsekrekos (2010, 2011).
Functional principal component analysis is the functional counterpart of traditional discrete principal component analysis. Functional principal component analysis reduces the dimensionality of the data to a small number of latent components, in order to effectively summarise the main features of the data. While discrete principal component analysis searches for an optimal set of orthogonal eigenseries that describe the highest level of variance in a discrete data set, functional principal component analysis seeks to identify eigenfunctions that best describe a set of continuous/functional observations. Advances in functional principal component analysis date back to the 1940s, when Karhunen (1946) and Loève (1946) independently developed a theory on the optimal series expansion of a continuous stochastic process. Later, Rao (1958) and Tucker (1958) provided applications of the Karhunen-Loève expansion to functional data, by applying multivariate principal component analysis to observed function values. For additional methodological details of functional principal component analysis, we direct the interested reader to survey articles by Hall (2011), Shang (2014) and Wang et al. (2016).
Our paper extends Kearney et al. (2018), firstly by utilising dynamic functional time-series specifications, to see whether their ability to dynamically update predictions can help improve forecast accuracy. This dynamic approach captures the inherent time-series relationship between daily observations of IV surfaces. Under moderate to high temporal dependence, such as between daily observations of IV surfaces, the principal components extracted via (static) functional principal component analysis are not consistent, and this inconsistency leads to inefficient estimators. To overcome this issue, we consider a dynamic approach that extracts principal components based on the long-run covariance between daily observations of IV surfaces, instead of the variance alone. Long-run covariance and spectral density estimation enjoy a vast literature in the case of finite-dimensional time-series, beginning with the seminal work of Bartlett (1950) and Parzen (1957); the most commonly used techniques still resort to smoothing the periodogram at frequency zero by employing a smoothing weight function and a bandwidth parameter. Data-driven bandwidth selection methods have received a great deal of attention (see, e.g., Andrews (1991) for the univariate time-series context and Rice & Shang (2017) for the functional time-series context). As in a univariate time-series framework, the long-run covariance includes the variance function as a component, but it also measures temporal auto-covariance at different positive and negative lags. In practice, the long-run covariance is estimated as a sum of empirical auto-covariance functions, often truncated at some finite lag. As shown in Shang (2019) and Martínez-Hernández et al. (2020), dynamic functional principal component analysis provides a more suitable decomposition than its static counterpart: Shang (2019) demonstrates this for mortality forecasting, while Martínez-Hernández et al. (2020) show it for yield curve forecasting. We highlight its use for forecasting FX IV surfaces.
Our second contribution is the use of the multivariate functional principal component regression (see, e.g., Shang & Yang, 2017) and multilevel functional principal component regression models (see, e.g., Shang et al., 2016) to model and forecast FX IV surfaces. These multivariate and multilevel approaches overcome the issue that two-dimensional univariate approaches have when modelling IV surfaces, where they proceed by essentially slicing the surface and modelling the smiles of different maturities separately. There are, however, intuitive and theoretical links between smiles of different maturities for the same IV surface, so we adopt the multivariate and multilevel extensions to explicitly incorporate information derived from all smiles simultaneously. Both multilevel and multivariate regression models can jointly model and forecast multiple functional time-series that are potentially correlated with each other. By taking into account this correlation between IV smiles of differing maturities, the point and interval forecast accuracies of the IV surfaces are improved.
The remainder of the paper is organised as follows. In Section 2, we introduce the FX options data set. In Section 3, we provide a background to the univariate, multivariate, and multilevel functional time-series methods, and present extensions based on dynamic functional principal component analysis. In Section 4, we outline our forecast evaluation procedure and present several forecast error measures. In Section 5, we present comparisons of in-sample model fitting and out-of-sample forecast accuracy among the functional time-series approaches and traditional IV surface forecasting benchmarks. In our comparisons, we consider the benchmark methods of Chalamandaris & Tsekrekos (2011) and Goncalves & Guidolin (2006), as well as the random walk (RW) model and an autoregressive model of order 1 (AR(1)). Using the model confidence set procedure of Hansen et al. (2011), we determine a set of superior forecasting models based on the out-of-sample accuracy of predicting daily IV surfaces. Further, we present a stylised trading strategy based on the proposed dynamic functional time-series approach. As a robustness check, we also consider an S&P 500 (SPX) equity options data set to further showcase the use of the functional time-series methods. We outline our conclusions in Section 6, along with some thoughts on how the methods presented here can be extended.
2 Data description
We mirror Kearney et al. (2018) in utilising aggregated market participant contributed IV quotes, thus circumventing the need to adopt a specific options pricing model or impose no-arbitrage assumptions or calibration methods to back out IV from option prices. The use of contributed IV quotes is a common convention in FX options markets, as unlike equity or commodity options, the contracts are most actively traded over-the-counter between market participants, as opposed to being listed on an exchange. We treat these retrieved IV quotes as economic variables of interest in their own right and model how they change over time. Our data set comprises at-the-money, risk reversal, and butterfly composition IV quotes for Euro/United States Dollar (EUR-USD), Euro/British Pound (EUR-GBP), and Euro/Japanese Yen (EUR-JPY) currency options obtained from Bloomberg. These four currencies account for almost 77% of total global foreign exchange market turnover according to the Bank for International Settlements’ (2016) report. The same four currencies are also considered by Chalamandaris & Tsekrekos (2011).
IV quotes associated with delta values of 5, 10, 15, 25, 35, 50, 65, 75, 85, 90 and 95 are available from Bloomberg. However, in line with Chalamandaris & Tsekrekos (2011) and Kearney et al. (2018), we restrict our predictions to the segments of the surface with the greatest trading intensity; namely, delta values of 10, 25, 50, 75 and 90. Therefore, our models and forecasts focus on these specific delta values. In addition, for each FX rate, we observe three maturities: one month (1M), six months (6M) and two years (2Y). Our data set runs from January 1, 2008 to December 30, 2016.
Figure 1 presents rainbow plots of the EUR-USD IV smiles for the three maturities. The rainbow plot is a plot of all the IV data in our data set, with a rainbow colour palette used to indicate when the observations occur (see also Hyndman & Shang, 2010). Days in the distant past are shown in red, with the most recent days presented in violet. We observe that as the maturity period increases from 1M to 2Y, the spread of IV decreases.
Figure 2 presents plots of implied volatilities (IVs) averaged across delta values. They show that the 1M-averaged IVs reached a peak on October 27, 2008 and a trough on August 1, 2014. Both the 6M- and 2Y-averaged IVs reached their peaks on December 17, 2008 and troughs on July 3, 2014.
3.1 Univariate functional time-series method
3.1.1 Functional principal component analysis
Let $\{\mathcal{X}_t(u), t \in \mathbb{Z}\}$ be an arbitrary functional time-series defined on a common probability space. It is assumed that the observations are elements of the Hilbert space $\mathcal{H} = L^2(\mathcal{I})$ equipped with the inner product $\langle x, y \rangle = \int_{\mathcal{I}} x(u) y(u)\, \mathrm{d}u$, where $u$ represents a continuum and $\mathcal{I} \subset \mathbb{R}$ denotes a function support range (i.e., range of delta values), with $\mathbb{R}$ the real line. Each function is a square-integrable function satisfying $\|\mathcal{X}_t\|^2 = \int_{\mathcal{I}} \mathcal{X}_t^2(u)\, \mathrm{d}u < \infty$ and the associated norm $\|\cdot\| = \langle \cdot, \cdot \rangle^{1/2}$. The notation $\mathcal{X} \in L_{\mathcal{H}}^p$ is used to indicate $\mathrm{E}(\|\mathcal{X}\|^p) < \infty$ for some $p > 0$. When $p = 1$, $\mathcal{X}(u)$ has the mean curve $\mu(u) = \mathrm{E}[\mathcal{X}(u)]$; when $p = 2$, a non-negative definite covariance function is given by
$$c(u, v) = \mathrm{Cov}[\mathcal{X}(u), \mathcal{X}(v)] = \mathrm{E}\big\{[\mathcal{X}(u) - \mu(u)][\mathcal{X}(v) - \mu(v)]\big\} \tag{1}$$
for all $u, v \in \mathcal{I}$. The covariance function in (1) allows the covariance operator of $\mathcal{X}$, denoted by $\mathcal{K}$, to be defined as
$$\mathcal{K}(\phi)(u) = \int_{\mathcal{I}} c(u, v) \phi(v)\, \mathrm{d}v.$$
Via Mercer’s lemma, there exists an orthonormal sequence $(\phi_k)$ of continuous functions in $L^2(\mathcal{I})$ and a non-increasing sequence $(\lambda_k)$ of positive numbers, such that
$$c(u, v) = \sum_{k=1}^{\infty} \lambda_k \phi_k(u) \phi_k(v).$$
By the separability of Hilbert spaces, the Karhunen-Loève expansion of a stochastic process $\mathcal{X}(u)$ can be expressed as
$$\mathcal{X}(u) = \mu(u) + \sum_{k=1}^{\infty} \beta_k \phi_k(u) = \mu(u) + \sum_{k=1}^{K} \beta_k \phi_k(u) + e(u), \tag{2}$$
where the principal component scores $\beta_k$ are given by the projection of $\mathcal{X}(u) - \mu(u)$ in the direction of the $k$th eigenfunction $\phi_k$, i.e., $\beta_k = \langle \mathcal{X} - \mu, \phi_k \rangle$; $e(u)$ denotes the model truncation error function with a mean of zero and a finite variance; and $K$ is the number of retained components. Equation (2) facilitates dimension reduction as the first $K$ terms often provide a good approximation to the infinite sum, and thus the information inherited in $\mathcal{X}(u)$ can be adequately summarised by the $K$-dimensional vector $(\beta_1, \dots, \beta_K)$. These scores constitute an uncorrelated sequence of random variables with zero mean and variance $\lambda_k$, and they can be interpreted as the weights of the contribution of the functional principal components $\phi_k$ to $\mathcal{X}(u)$.
The estimation accuracy of the functional time-series method crucially relies on the optimal selection of $K$. There are at least five conventional approaches; namely, eigenvalue ratio tests (Ahn & Horenstein, 2013), cross-validation (Ramsay & Silverman, 2005), Akaike’s information criterion (Akaike, 1974), the bootstrap method (Hall & Vial, 2006) and explained variance (Chiou, 2012). Here, the value of $K$ is determined as the minimum that reaches a certain level $\delta$ of the cumulative percentage of variance (CPV) explained by the $K$ leading components, such that
$$K = \underset{K \ge 1}{\arg\min} \left\{ \sum_{k=1}^{K} \widehat{\lambda}_k \Big/ \sum_{k=1}^{\infty} \widehat{\lambda}_k \mathbb{1}\{\widehat{\lambda}_k > 0\} \ge \delta \right\},$$
where $\delta = 0.95$, $\mathbb{1}\{\widehat{\lambda}_k > 0\}$ is to exclude possible zero eigenvalues, and $\mathbb{1}\{\cdot\}$ represents the binary indicator function. As a sensitivity test, we also consider setting the number of retained components equal to the number of discrete delta values (five).
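The CPV rule can be sketched in a few lines of numpy. The simulated smiles (rows are days, columns are delta grid points), the two latent factors, and the 95% threshold below are illustrative stand-ins, not the paper's data or implementation.

```python
import numpy as np

# Simulated smiles driven by two latent factors; names are illustrative.
rng = np.random.default_rng(0)
n_days, n_grid = 200, 5
grid = np.linspace(0.1, 0.9, n_grid)                  # delta grid
scores = rng.normal(size=(n_days, 2)) * [2.0, 1.0]    # two latent factors
basis = np.vstack([np.sin(np.pi * grid), np.cos(np.pi * grid)])
X = 10 + scores @ basis + 0.05 * rng.normal(size=(n_days, n_grid))

# Static functional PCA: eigen-decompose the sample covariance
Xc = X - X.mean(axis=0)
eigval = np.linalg.eigvalsh(Xc.T @ Xc / n_days)[::-1]  # descending order

# CPV: smallest K whose leading eigenvalues explain at least 95% of variance
pos = eigval[eigval > 1e-12]          # exclude possible zero eigenvalues
cpv = np.cumsum(pos) / pos.sum()
K = int(np.argmax(cpv >= 0.95)) + 1
print(K)                              # recovers the two simulated factors
```

With two genuine factors and a small noise floor, the rule retains exactly two components here; lowering the threshold toward 0.85 would favour a more parsimonious fit.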
3.1.2 Long-run covariance and its estimation
While Section 3.1.1 presents a method based on the functional principal component analysis extracted from the variance function, these static functional principal components are not consistent in the presence of moderate to strong temporal dependence. This issue motivates the development of dynamic functional principal component analysis, such as in Hörmann et al. (2015) and Rice & Shang (2017), where functional principal components are extracted from the long-run covariance function.
We implement the stationarity test of Horváth et al. (2014), and from the $p$-values reported in Table 3 in Section 5.2, we conclude that all series are stationary. For a stationary functional time-series $\{\mathcal{X}_t(u)\}$, the long-run covariance function is defined as
$$C(u, v) = \sum_{\ell = -\infty}^{\infty} \gamma_{\ell}(u, v), \qquad \gamma_{\ell}(u, v) = \mathrm{Cov}[\mathcal{X}_0(u), \mathcal{X}_{\ell}(v)],$$
and is a well-defined element of $L^2(\mathcal{I}^2)$ for a compact support interval $\mathcal{I}$, under mild dependence and moment conditions. By assuming $C(u, v)$ is a continuous and square-integrable covariance function, the function induces a kernel operator. Through right integration, $C(u, v)$ defines a Hilbert-Schmidt integral operator on $L^2(\mathcal{I})$ given by
$$\Phi(x)(u) = \int_{\mathcal{I}} C(u, v) x(v)\, \mathrm{d}v,$$
whose eigenvalues and eigenfunctions are related to the dynamic functional principal components defined in Hörmann et al. (2015), who provide asymptotically optimal finite-dimensional representations of a functional time-series.
In practice, we need to estimate $C(u, v)$ from a finite sample $\mathcal{X}_1(u), \dots, \mathcal{X}_n(u)$. Given its definition as a bi-infinite sum, a natural estimator of $C(u, v)$ is
$$\widehat{C}_{h,q}(u, v) = \sum_{\ell = -\infty}^{\infty} W_q\!\left(\frac{\ell}{h}\right) \widehat{\gamma}_{\ell}(u, v), \tag{3}$$
where $h$ is the bandwidth parameter, and
$$\widehat{\gamma}_{\ell}(u, v) = \frac{1}{n} \sum_{t} \big[\mathcal{X}_t(u) - \bar{\mathcal{X}}(u)\big]\big[\mathcal{X}_{t+\ell}(v) - \bar{\mathcal{X}}(v)\big],$$
with the sum taken over $1 \le t, t + \ell \le n$, is an estimator of $\gamma_{\ell}(u, v)$. $W_q$ is a symmetric weight function with bounded support of order $q$, and the following properties:
$W_q(0) = 1$ and $W_q(x) = 0$ if $|x| > m$ for some $m > 0$
$W_q$ is continuous on $[-m, m]$
$\lim_{x \to 0} |x|^{-q}\,[1 - W_q(x)]$ exists and satisfies $0 < \lim_{x \to 0} |x|^{-q}\,[1 - W_q(x)] < \infty$
The kernel sandwich estimator in (3) is discussed in Horváth et al. (2013), Panaretos & Tavakoli (2013), Rice & Shang (2017), and Martínez-Hernández et al. (2020), among others. As with any kernel sandwich estimator, a crucial consideration is the estimation of the bandwidth parameter $h$. One approach is to use a data-driven method, such as the plug-in algorithm proposed in Rice & Shang (2017). Rice & Shang (2017) prove that the estimator in (3) is a consistent estimator of the true and unknown long-run covariance. They also find that the estimated dynamic functional principal components and associated scores extracted from the estimated long-run covariance are also consistent.
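A minimal numpy sketch of the kernel sandwich estimator with a Bartlett (triangular) weight function and a fixed bandwidth; the paper instead uses the data-driven plug-in bandwidth of Rice & Shang (2017), and the simulated AR(1)-type functional data below are purely illustrative.

```python
import numpy as np

def autocov(Xc, lag):
    """Empirical lag-`lag` autocovariance function on a discrete grid."""
    n = Xc.shape[0]
    return Xc[lag:].T @ Xc[: n - lag] / n

def long_run_covariance(X, h):
    """Bartlett-weighted sum of lagged autocovariances, bandwidth h."""
    Xc = X - X.mean(axis=0)
    C = autocov(Xc, 0)
    for lag in range(1, h + 1):
        w = 1.0 - lag / (h + 1)              # Bartlett weight
        G = autocov(Xc, lag)
        C = C + w * (G + G.T)                # add symmetrised lagged terms
    return C

# AR(1)-type functional noise: positive temporal dependence adds mass
# to the long-run covariance relative to the static covariance.
rng = np.random.default_rng(1)
n, p, phi = 500, 5, 0.6
X = np.zeros((n, p))
for t in range(1, n):
    X[t] = phi * X[t - 1] + rng.normal(size=p)

C_static = long_run_covariance(X, h=0)       # h = 0: static covariance only
C_longrun = long_run_covariance(X, h=20)
print(np.trace(C_longrun) > np.trace(C_static))
```

Under positive serial dependence the long-run estimate carries strictly more mass than the lag-zero covariance, which is exactly why the static and dynamic principal components differ.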
3.1.3 Dynamic functional principal component analysis
Using the estimated long-run covariance function $\widehat{C}(u, v)$, we apply functional principal component decomposition to extract the dynamic functional principal components and their associated scores. The sample mean and sample long-run covariance are given by
$$\bar{\mathcal{X}}(u) = \frac{1}{n} \sum_{t=1}^{n} \mathcal{X}_t(u), \qquad \widehat{C}(u, v) = \sum_{k=1}^{n} \widehat{\lambda}_k \widehat{\phi}_k(u) \widehat{\phi}_k(v),$$
where $\widehat{\lambda}_1 \ge \widehat{\lambda}_2 \ge \dots \ge 0$ are the sample eigenvalues of $\widehat{C}(u, v)$, and $[\widehat{\phi}_1(u), \widehat{\phi}_2(u), \dots]$ are the corresponding orthogonal sample eigenfunctions. Via the Karhunen-Loève expansion, the realisations of the stochastic process can be written as
$$\mathcal{X}_t(u) = \bar{\mathcal{X}}(u) + \sum_{k=1}^{K} \widehat{\beta}_{t,k} \widehat{\phi}_k(u) + \widehat{e}_t(u),$$
where $\widehat{\beta}_{t,k} = \langle \mathcal{X}_t - \bar{\mathcal{X}}, \widehat{\phi}_k \rangle$ is the $k$th estimated principal component score for the $t$th observation.
Dynamic functional principal component analysis has three properties:
It minimises the mean integrated squared error of the reconstruction error function over the whole functional data set.
It provides an effective way of extracting a large amount of long-run covariance.
Its dynamic principal component scores are uncorrelated.
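The decomposition step can be sketched end-to-end: eigen-decompose an estimated long-run covariance, project the centred curves onto the leading eigenfunctions to obtain dynamic principal component scores, and reconstruct via the truncated Karhunen-Loève expansion. The Bartlett weights, fixed bandwidth, and simulated data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def long_run_cov(X, h):
    """Simple Bartlett-weighted long-run covariance estimate."""
    Xc = X - X.mean(axis=0)
    n = len(Xc)
    C = Xc.T @ Xc / n
    for lag in range(1, h + 1):
        G = Xc[lag:].T @ Xc[: n - lag] / n
        C = C + (1.0 - lag / (h + 1)) * (G + G.T)   # Bartlett weights
    return C

rng = np.random.default_rng(7)
n, p = 400, 5
X = np.zeros((n, p))
for t in range(1, n):                        # temporally dependent curves
    X[t] = 0.7 * X[t - 1] + rng.normal(size=p)

C = long_run_cov(X, h=10)
eigval, eigvec = np.linalg.eigh(C)
phi = eigvec[:, ::-1][:, :2]                 # leading dynamic eigenfunctions
beta = (X - X.mean(axis=0)) @ phi            # dynamic principal component scores
recon = X.mean(axis=0) + beta @ phi.T        # truncated Karhunen-Loeve expansion
print(beta.shape, recon.shape)
```

Swapping `long_run_cov(X, h=10)` for the lag-zero covariance (`h=0`) would reproduce the static decomposition, making the static/dynamic comparison a one-line change.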
3.1.4 Univariate functional time-series forecasting method
We obtain an IV smile of a specific maturity by interpolating across the moneyness levels (in terms of delta) of the option contracts. We extract the estimated functional principal components using the empirical covariance function. Conditioning on the past functions $\mathcal{X}_1(u), \dots, \mathcal{X}_n(u)$ and the estimated functional principal components $[\widehat{\phi}_1(u), \dots, \widehat{\phi}_K(u)]$, the $h$-step-ahead point forecast of $\mathcal{X}_{n+h}(u)$ can be expressed as
$$\widehat{\mathcal{X}}_{n+h|n}(u) = \bar{\mathcal{X}}(u) + \sum_{k=1}^{K} \widehat{\beta}_{n+h|n,k} \widehat{\phi}_k(u),$$
where $\widehat{\beta}_{n+h|n,k}$ denotes the time-series forecasts of the principal component scores. The forecasts of these scores can be obtained via a univariate time-series forecasting method, such as the autoregressive integrated moving average (ARIMA) model (see Ord & Fildes, 2013, for a review). We use the automatic algorithm of Hyndman & Khandakar (2008) to choose the optimal orders $p$, $d$, and $q$. In the supplement, we also present the results using exponential smoothing (see, e.g., Kearney & Shang, 2020). The forecast accuracy results are qualitatively similar between the two univariate time-series methods.
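The forecasting recipe above can be condensed into a short sketch: project centred curves onto the leading eigenfunctions, forecast each score series, and reconstruct the curve. A least-squares AR(1) stands in for the automatic ARIMA order selection used in the paper, and the simulated data and function names are illustrative.

```python
import numpy as np

def fts_point_forecast(X, K):
    """One-step-ahead functional point forecast via PC-score forecasting."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, eigvec = np.linalg.eigh(Xc.T @ Xc / len(X))
    phi = eigvec[:, ::-1][:, :K]             # leading eigenfunctions
    beta = Xc @ phi                          # principal component scores
    beta_fc = np.empty(K)
    for k in range(K):
        y, ylag = beta[1:, k], beta[:-1, k]
        a = (ylag @ y) / (ylag @ ylag)       # AR(1) coefficient (ARIMA stand-in)
        beta_fc[k] = a * beta[-1, k]         # one-step-ahead score forecast
    return mu + phi @ beta_fc

rng = np.random.default_rng(2)
grid = np.linspace(0.1, 0.9, 5)
n = 300
score = np.zeros(n)
for t in range(1, n):                        # persistent latent level factor
    score[t] = 0.9 * score[t - 1] + rng.normal()
X = 10 + np.outer(score, np.sin(np.pi * grid)) + 0.1 * rng.normal(size=(n, 5))

fc = fts_point_forecast(X, K=1)
print(fc.shape)                              # one forecast value per delta
```

In practice each score series would be handed to a full order-selection routine rather than a fixed AR(1); the reconstruction step is unchanged.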
3.2 Multivariate functional time-series method
Section 3.1 introduces a method for modelling and forecasting one functional time-series or multiple functional time-series individually. However, explicitly modelling the correlation between multiple time-series may improve forecast accuracy. Therefore, we introduce a multivariate functional time-series method to jointly model and forecast multiple functional time-series that could be correlated with each other.
Let $\mathcal{X}^{(1)}(u)$ be the IV smile for 1M-maturity options, $\mathcal{X}^{(2)}(u)$ the IV smile for 6M-maturity options, and $\mathcal{X}^{(3)}(u)$ the IV smile for 2Y-maturity options. As our multiple functional time-series have the same function support, we consider data where each observation consists of $\omega$ functions, i.e., $[\mathcal{X}_t^{(1)}(u), \dots, \mathcal{X}_t^{(\omega)}(u)]$, where $t = 1, \dots, n$ and $\omega = 3$ in our data set.
The multivariate functional time-series are stacked in a vector $\bm{\mathcal{X}}(u) = [\mathcal{X}^{(1)}(u), \dots, \mathcal{X}^{(\omega)}(u)]^{\top}$ with mean $\bm{\mu}(u) = [\mu^{(1)}(u), \dots, \mu^{(\omega)}(u)]^{\top}$. We assume that $\bm{\mathcal{X}} \in L_{\mathcal{H}}^2$. For $j, l \in \{1, \dots, \omega\}$, the theoretical cross-covariance function can be defined with elements
$$c_{jl}(u, v) = \mathrm{Cov}\big[\mathcal{X}^{(j)}(u), \mathcal{X}^{(l)}(v)\big] = \mathrm{E}\big\{[\mathcal{X}^{(j)}(u) - \mu^{(j)}(u)][\mathcal{X}^{(l)}(v) - \mu^{(l)}(v)]\big\},$$
where $\mu^{(j)}(u)$ and $\mu^{(l)}(v)$ denote the mean functions for the $j$th and $l$th series, respectively. By assuming that $c_{jl}(u, v)$ is a continuous and square-integrable covariance function, the function induces the kernel operator $\bm{\mathcal{K}}$ given by
$$\bm{\mathcal{K}}(\bm{\phi})(u) = \int_{\mathcal{I}} \bm{c}(u, v) \bm{\phi}(v)\, \mathrm{d}v.$$
Via Mercer’s lemma, there exists an orthonormal sequence $(\bm{\phi}_k)$ of continuous vector-valued functions and a non-increasing sequence $(\lambda_k)$ of positive numbers, such that
$$\bm{c}(u, v) = \sum_{k=1}^{\infty} \lambda_k \bm{\phi}_k(u) \bm{\phi}_k^{\top}(v).$$
By the separability of Hilbert spaces, the Karhunen-Loève expansion of the stochastic process $\bm{\mathcal{X}}(u)$ can be expressed as
$$\bm{\mathcal{X}}(u) = \bm{\mu}(u) + \sum_{k=1}^{K} \beta_k \bm{\phi}_k(u) + \bm{e}(u),$$
where $K$ denotes the number of retained functional principal components. Its matrix formulation is
$$\bm{\mathcal{X}}(u) = \bm{\mu}(u) + \bm{\Phi}(u) \bm{\beta} + \bm{e}(u),$$
where $\bm{\mathcal{X}}(u)$ and $\bm{\mu}(u)$ denote the stacked historical functions and their mean functions, $\bm{\beta} = (\beta_1, \dots, \beta_K)^{\top}$ is the vector of the basis expansion coefficients, and $\bm{\Phi}(u) = [\bm{\phi}_1(u), \dots, \bm{\phi}_K(u)]$.
Conditioning on the past functions $\bm{\mathcal{X}}_1(u), \dots, \bm{\mathcal{X}}_n(u)$ and the estimated functional principal components $[\widehat{\bm{\phi}}_1(u), \dots, \widehat{\bm{\phi}}_K(u)]$, the $h$-step-ahead point forecast of $\bm{\mathcal{X}}_{n+h}(u)$ can be expressed as
$$\widehat{\bm{\mathcal{X}}}_{n+h|n}(u) = \bar{\bm{\mathcal{X}}}(u) + \sum_{k=1}^{K} \widehat{\beta}_{n+h|n,k} \widehat{\bm{\phi}}_k(u),$$
where $\widehat{\beta}_{n+h|n,k}$ denotes the time-series forecasts of the principal component scores corresponding to the three maturities. In our implementation, each series is standardised to have equal scale before being combined into a long vector (see, e.g., Chiou et al., 2014).
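The stack-and-decompose step can be sketched directly: standardise each maturity's smiles, stack them into one long vector per day, and run a single joint PCA so the extracted components capture cross-maturity correlation. The simulated data, with a factor shared across maturities, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 250, 5
common = 0.1 * np.cumsum(rng.normal(size=n))       # factor shared by maturities
smile_1m = 12 + np.outer(common, np.ones(p)) + rng.normal(size=(n, p))
smile_6m = 10 + np.outer(common, np.ones(p)) + 0.5 * rng.normal(size=(n, p))
smile_2y = 9 + 0.5 * np.outer(common, np.ones(p)) + 0.3 * rng.normal(size=(n, p))

def standardise(X):
    """Rescale each series so no maturity dominates the joint PCA."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

stacked = np.hstack([standardise(s) for s in (smile_1m, smile_6m, smile_2y)])
eigval = np.linalg.eigvalsh(stacked.T @ stacked / n)[::-1]
share = eigval[0] / eigval.sum()      # variance share of first joint component
print(stacked.shape)
print(share > 0.3)                    # the shared factor links all maturities
```

Because the first joint eigenfunction loads on all three maturities, forecasting its score propagates cross-maturity information that three separate univariate decompositions would ignore.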
3.3 Multilevel functional time-series method
The multilevel functional data model bears a strong resemblance to a two-way functional analysis of variance model studied by Morris et al. (2003) and Cuesta-Albertos & Febrero-Bande (2010). It is a special case of the general “functional mixed model” proposed in Morris & Carroll (2006) and applied in Shang (2016) and Shang et al. (2016). As the IV surface comprises distinct IV smiles for multiple maturities of the same currency pair, the basic idea is to decompose the related smiles into an average IV smile across the entire surface $\mu(u)$, a maturity-specific deviation from the averaged IV smile $\eta^{(j)}(u)$, a common trend across maturities $R_t(u)$, and a maturity-specific residual trend $U_t^{(j)}(u)$. The common trend and maturity-specific residual trend are modelled by projecting them onto the eigenvectors of covariance functions of the aggregate and maturity-specific centred stochastic processes, respectively. The curve at observation $t$ can be written as
$$\mathcal{X}_t^{(j)}(u) = \mu(u) + \eta^{(j)}(u) + R_t(u) + U_t^{(j)}(u),$$
where $j = 1, 2, 3$ corresponds to the 1-month, 6-month and 2-year maturities, respectively. To ensure model and parameter identifiability, the observations of the two stochastic processes $R_t(u)$ and $U_t^{(j)}(u)$ are uncorrelated, which follows from our two-stage functional principal component decomposition.
Since the centred stochastic processes $R_t(u)$ and $U_t^{(j)}(u)$ are unknown in practice, the population eigenvalues and eigenfunctions can only be approximated through their realisations. From the covariance of the aggregate series, we can extract a set of functional principal components and the associated scores, along with a set of residual functions. From the covariance function of the residual functions, we can then extract a second set of functional principal components and their associated scores. While the first functional principal component decomposition captures the common trend shared across various maturities of the same currency, the second functional principal component decomposition captures the maturity-specific residual trend.
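The two-stage extraction can be sketched as follows: a first FPCA on the maturity-averaged curves recovers the common trend, and a second FPCA on the maturity-specific residuals recovers the residual trends; the closing variance ratio corresponds to the within-cluster variability measure used later in the paper. The simulated smiles and names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, J = 300, 5, 3                          # days, delta grid points, maturities
common = 0.1 * np.cumsum(rng.normal(size=n))[:, None] * np.ones(p)
smiles = [10 - 2 * j + common + 0.3 * rng.normal(size=(n, p)) for j in range(J)]

def fpca(X, K):
    """Leading eigenvalues and eigenfunctions of the sample covariance."""
    Xc = X - X.mean(axis=0)
    w, v = np.linalg.eigh(Xc.T @ Xc / len(X))
    return w[::-1][:K], v[:, ::-1][:, :K]

# Stage 1: FPCA on the aggregate (maturity-averaged) curves -> common trend
agg = sum(smiles) / J
lam_common, phi_common = fpca(agg, K=2)

# Stage 2: FPCA on the maturity-specific residual curves
resid = np.vstack([s - s.mean(axis=0) - (agg - agg.mean(axis=0)) for s in smiles])
lam_resid, phi_resid = fpca(resid, K=2)

# Share of variance attributable to the common trend
wcv = lam_common.sum() / (lam_common.sum() + lam_resid.sum())
print(wcv > 0.5)                             # common factor dominates by construction
```

Forecasting then amounts to modelling the stage-one and stage-two score series separately, exactly as in the univariate case.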
The sample versions of the aggregate mean function, maturity-specific mean function deviation, common trend, and maturity-specific residual trend for a set of functional data can be estimated by
$$\widehat{\mu}(u) = \frac{1}{n} \sum_{t=1}^{n} \bar{\mathcal{X}}_t(u), \qquad \widehat{\eta}^{(j)}(u) = \widehat{\mu}^{(j)}(u) - \widehat{\mu}(u),$$
$$\widehat{R}_t(u) = \sum_{k=1}^{K} \widehat{\beta}_{t,k} \widehat{\phi}_k(u), \qquad \widehat{U}_t^{(j)}(u) = \sum_{l=1}^{L} \widehat{\gamma}_{t,l}^{(j)} \widehat{\psi}_l^{(j)}(u),$$
where $\bar{\mathcal{X}}_t(u)$ represents a time-series of functions for the currency exchange averaged across different maturities; $\widehat{\mu}(u)$ represents the simple average of the averaged currency exchange, whereas $\widehat{\mu}^{(j)}(u)$ represents the simple average of the currency exchange at the $j$th maturity; $\widehat{\beta}_{t,k}$ represents the sample principal component scores of the common trend; $\widehat{\phi}_k(u)$ denotes the corresponding orthogonal sample eigenfunctions in a square-integrable function space; and $\widehat{\gamma}_{t,l}^{(j)}$ and $\widehat{\psi}_l^{(j)}(u)$ are defined analogously for the maturity-specific residual trend. The fitted model is then
$$\mathcal{X}_t^{(j)}(u) = \widehat{\mu}(u) + \widehat{\eta}^{(j)}(u) + \widehat{R}_t(u) + \widehat{U}_t^{(j)}(u) + e_t^{(j)}(u),$$
where $e_t^{(j)}(u)$ denotes measurement error with a finite variance $\sigma^2$.
To select the optimal number of components, we use a CPV criterion to determine $K$ and $L$. The optimal numbers of $K$ and $L$ are determined by
$$K = \underset{K \ge 1}{\arg\min} \left\{ \sum_{k=1}^{K} \widehat{\lambda}_k \Big/ \sum_{k=1}^{\infty} \widehat{\lambda}_k \mathbb{1}\{\widehat{\lambda}_k > 0\} \ge \delta \right\}, \qquad L = \underset{L \ge 1}{\arg\min} \left\{ \sum_{l=1}^{L} \widehat{\lambda}_l^{(j)} \Big/ \sum_{l=1}^{\infty} \widehat{\lambda}_l^{(j)} \mathbb{1}\{\widehat{\lambda}_l^{(j)} > 0\} \ge \delta \right\},$$
where $\mathbb{1}\{\cdot\}$ denotes a binary indicator function. Following Chiou (2012), we chose $\delta = 0.85$.
From the estimated eigenvalues, we can estimate the proportion of variability explained by aggregated data (also known as the variance explained by the within-cluster variability; see, for example, Di et al., 2009). A possible measure of within-cluster variability is given by
$$\frac{\sum_{k=1}^{\infty} \lambda_k^{R}}{\sum_{k=1}^{\infty} \lambda_k^{R} + \sum_{l=1}^{\infty} \lambda_l^{U}}, \tag{10}$$
where $\lambda_k^{R}$ and $\lambda_l^{U}$ denote the population variances of the common and maturity-specific components, which can be approximated by the estimated and retained eigenvalues. When the common trend can explain the primary mode of total variability, the value of the within-cluster variability is close to 1.
Conditioning on observed data and basis functions $\widehat{\phi}_k(u)$ and $\widehat{\psi}_l^{(j)}(u)$, the $h$-step-ahead forecasts can be obtained by
$$\widehat{\mathcal{X}}_{n+h|n}^{(j)}(u) = \widehat{\mu}(u) + \widehat{\eta}^{(j)}(u) + \sum_{k=1}^{K} \widehat{\beta}_{n+h|n,k} \widehat{\phi}_k(u) + \sum_{l=1}^{L} \widehat{\gamma}_{n+h|n,l}^{(j)} \widehat{\psi}_l^{(j)}(u),$$
where $\widehat{\beta}_{n+h|n,k}$ and $\widehat{\gamma}_{n+h|n,l}^{(j)}$ are the forecast principal component scores, obtained from a univariate time-series forecasting method, such as an ARIMA model.
While Sections 3.2 and 3.3 present static multivariate and multilevel functional time-series methods, dynamic counterparts can be produced using dynamic functional principal components and their associated scores. These quantities are constructed based on an estimated long-run covariance function in (3).
4 Forecast evaluation
4.1 Expanding-window approach
An expanding-window analysis of time-series models is commonly used to assess model and parameter stability over time. It examines the constancy of a model’s parameters by computing parameter estimates and their forecasts over an expanding window through the sample (see Zivot & Wang, 2006, Chapter 9 for details). Using the first 1,827 observations from January 1, 2008 to December 31, 2014 for each currency pair, we produce one-step-ahead point forecasts. Through the expanding-window approach, we re-estimate the parameters in all models using the first 1,828 observations from January 1, 2008 to January 1, 2015, and again produce one-step-ahead forecasts for each model. We iterate this process, increasing the sample size by one trading day, until we reach the end of the data period. This process produces 522 one-step-ahead forecasts from January 1, 2015 to December 30, 2016. We compare these forecasts with the holdout samples to determine out-of-sample point forecast accuracy. This evaluation is in line with Chalamandaris & Tsekrekos (2010), who adopt a recursive one-day-ahead scheme. We also consider 518 five-step-ahead forecasts and 513 ten-step-ahead forecasts to assess forecast accuracy at comparably longer horizons. We choose to expand the training set and re-estimate all models daily to incorporate all available information into our predictions. This approach closely simulates the actions of a market practitioner who aims to predict the following day’s movement.
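The evaluation loop has a simple skeleton: refit on all data up to day t, forecast day t + 1, and compare with the holdout observation. In the sketch below a naive random-walk forecast (today's curve) stands in for the fitted models, and the sample sizes are shrunk for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n_total, n_train, p = 120, 100, 5
X = 10 + 0.1 * np.cumsum(rng.normal(size=(n_total, 1)), axis=0) * np.ones(p)

errors = []
for t in range(n_train, n_total - 1):
    train = X[: t + 1]                 # expanding window: all data up to day t
    forecast = train[-1]               # RW benchmark: last observed curve
    holdout = X[t + 1]                 # next day's realised curve
    errors.append(np.abs(forecast - holdout).mean())

mafe = float(np.mean(errors))
print(len(errors))                     # number of one-step-ahead forecasts
```

Replacing `train[-1]` with a call to any fitted model (refit on `train` at each step) turns this skeleton into the full recursive evaluation; multi-step horizons only change which future row is compared.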
To control for sensitivity to specific out-of-sample periods, various window lengths are tested. In the case of one-step-ahead prediction, we consider 522 days (out-of-sample: January 1, 2015 to December 30, 2016), 261 days (out-of-sample: January 1, 2016 to December 30, 2016), and 131 days (out-of-sample: July 1, 2016 to December 30, 2016). The out-of-sample forecasts between the end of the in-sample period and December 30, 2016 are obtained using an expanding-window scheme.
4.2 Forecast error measures
A number of criteria exist for evaluating point forecast accuracy (see Armstrong & Collopy, 1992; Hyndman & Koehler, 2006). They can be grouped into four categories: (1) scale-dependent metrics, (2) percentage-error metrics, (3) relative-error metrics, which average the ratios of the errors from a designated method to the errors of a naïve method, and (4) scale-free error metrics, which express each error as a ratio to the average error from a baseline method. Here, we use the following three accuracy measures:
Mean absolute forecast error (MAFE): a measure of the absolute difference between the forecast and the corresponding observation. It measures the average error magnitude in the forecasts, regardless of error direction, and serves to aggregate the errors into a single measure of predictive power. The MAFE can be expressed as
$$\text{MAFE} = \frac{1}{J \times S} \sum_{s=1}^{S} \sum_{j=1}^{J} \left| \mathcal{X}_s(u_j) - \widehat{\mathcal{X}}_s(u_j) \right|,$$
where $J$ denotes the number of discretised points in a curve and $S$ denotes the number of observations in the forecasting period; $\mathcal{X}_s(u_j)$ denotes the holdout sample for the $j$th delta and $s$th curve in the forecasting period; and $\widehat{\mathcal{X}}_s(u_j)$ denotes the point forecast of the holdout sample.
Mean squared forecast error (MSFE): a measure of the squared difference between the predicted and realised values, which again serves to aggregate the errors into a single measure of predictive power. The MSFE can be expressed as
$$\text{MSFE} = \frac{1}{J \times S} \sum_{s=1}^{S} \sum_{j=1}^{J} \left[ \mathcal{X}_s(u_j) - \widehat{\mathcal{X}}_s(u_j) \right]^2.$$
Mean mixed error (MME): an asymmetric loss function. MME(U) penalises under-predictions more heavily, while MME(O) penalises over-predictions more heavily. This feature is crucial for investors in options markets, as an under (over)-prediction of IV is more likely to be of greater concern to a seller (buyer) than a buyer (seller). This measure has previously been used in studies that evaluate volatility forecasting techniques, such as Brailsford & Faff (1996), Fuertes et al. (2009), and Kearney et al. (2018). With a slight abuse of notation, the MME(U) and MME(O) are given as
$$\text{MME(U)} = \frac{1}{O + U} \left[ \sum_{o \in \mathbb{O}} \left| \mathcal{X}_o - \widehat{\mathcal{X}}_o \right| + \sum_{u \in \mathbb{U}} \sqrt{\left| \mathcal{X}_u - \widehat{\mathcal{X}}_u \right|} \right],$$
$$\text{MME(O)} = \frac{1}{O + U} \left[ \sum_{u \in \mathbb{U}} \left| \mathcal{X}_u - \widehat{\mathcal{X}}_u \right| + \sum_{o \in \mathbb{O}} \sqrt{\left| \mathcal{X}_o - \widehat{\mathcal{X}}_o \right|} \right],$$
where $U$ denotes the number of under-predictions and $O$ denotes the number of over-predictions; $\mathbb{O}$ represents the indices of the over-predictions, and $\mathbb{U}$ represents the indices of the under-predictions.
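The three measures are straightforward to compute for a panel of forecasts (rows are forecast days, columns are delta grid points). The MME variants below follow the Brailsford & Faff (1996) idea of square-rooting the heavily penalised side (since IV errors are typically below one in magnitude, the square root inflates them); treat the exact functional form as a hedged sketch.

```python
import numpy as np

def mafe(actual, pred):
    """Mean absolute forecast error over the whole panel."""
    return np.abs(actual - pred).mean()

def msfe(actual, pred):
    """Mean squared forecast error over the whole panel."""
    return ((actual - pred) ** 2).mean()

def mme(actual, pred, penalise="under"):
    """Mean mixed error: square-root the heavily penalised side."""
    err = (pred - actual).ravel()
    over = err[err > 0]                      # over-predictions
    under = -err[err < 0]                    # under-predictions (magnitudes)
    if penalise == "under":                  # MME(U)
        mixed = np.concatenate([over, np.sqrt(under)])
    else:                                    # MME(O)
        mixed = np.concatenate([np.sqrt(over), under])
    return mixed.sum() / err.size

actual = np.array([[10.0, 11.0], [12.0, 9.0]])
pred = np.array([[10.5, 10.0], [12.0, 10.0]])
print(mafe(actual, pred))                    # 0.625
```

On this toy panel the MAFE is 0.625; a seller worried about under-predicted IV would compare models on `mme(..., penalise="under")` instead.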
5 Data analysis results
This section presents the results of modelling IV using the three functional time-series models for the entire data sample from January 1, 2008 to December 30, 2016. Using the entire EUR-USD dataset, we present the first two estimated static and dynamic principal components and their associated scores in Figure 3. While the static principal components are extracted based on sample variance, the dynamic principal components are extracted based on sample long-run covariance. When a time-series has moderate to strong temporal dependence, the extracted static and dynamic principal components are different, as shown in Figure 3. Due to the variation in the extracted principal components, the associated scores for both approaches also exhibit visible differences.
In Section 5.1, the models are fitted in-sample to ascertain goodness-of-fit when seeking to capture the empirical dynamics of the IV surfaces for each currency pair during the entire sample period. In Section 5.2, we evaluate and compare out-of-sample forecast accuracy among the functional time-series methods. In Section 5.3, we examine statistical significance based on point forecast accuracy.
5.1 In-sample model fitting
Three functional time-series models are fitted to the underlying IV data, for each day over the full sample of January 1, 2008 to December 30, 2016. There are various ways of assessing the fit of a linear model, and we initially consider a criterion introduced by Ramsay & Silverman (2005, Section 16.3). The approach, borrowed from conventional linear models, is to consider the squared correlation function:
$$R^2(u_j) = 1 - \frac{\sum_{t=1}^{n} \big[\mathcal{X}_t(u_j) - \widehat{\mathcal{X}}_t(u_j)\big]^2}{\sum_{t=1}^{n} \big[\mathcal{X}_t(u_j) - \bar{\mathcal{X}}(u_j)\big]^2}.$$
In Figure 4, we present $R^2(u_j)$ over the five delta values, for the three maturities and three currencies.
The total $R^2$ is useful if one requires a single numerical measure of fit. It can be expressed as
$$R^2 = 1 - \frac{\sum_{t=1}^{n} \sum_{j=1}^{J} \big[\mathcal{X}_t(u_j) - \widehat{\mathcal{X}}_t(u_j)\big]^2}{\sum_{t=1}^{n} \sum_{j=1}^{J} \big[\mathcal{X}_t(u_j) - \bar{\mathcal{X}}(u_j)\big]^2}.$$
The total $R^2$ values for the three maturities of each currency pair are listed in Table 1. Kearney et al. (2018) find that the univariate functional time-series model provides the best goodness-of-fit in comparison to the benchmark methods of Goncalves & Guidolin (2006) and Chalamandaris & Tsekrekos (2010, 2011). Here, we improve on Kearney et al. (2018) by showing that the goodness-of-fit of their univariate functional time-series model is outperformed by our multivariate and multilevel functional time-series models, as measured by total $R^2$. We also observe that the best values are often associated with the dynamic methods.
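The squared-correlation measures can be computed directly from the fitted and observed curves; the sketch below uses simulated data and a model that recovers the signal, so the high fit is by construction rather than a claim about the paper's results.

```python
import numpy as np

def r_squared(actual, fitted):
    """Per-delta squared correlation: 1 - SSE/SSY at each grid point."""
    ss_res = ((actual - fitted) ** 2).sum(axis=0)
    ss_tot = ((actual - actual.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(6)
n, p = 200, 5
signal = np.outer(np.sin(np.linspace(0, 6, n)), np.ones(p))
actual = 10 + signal + 0.1 * rng.normal(size=(n, p))
fitted = 10 + signal                          # a model that recovers the signal

r2 = r_squared(actual, fitted)                # one value per delta (Figure 4 style)
total_r2 = 1.0 - (((actual - fitted) ** 2).sum()
                  / ((actual - actual.mean(axis=0)) ** 2).sum())
print((r2 > 0.9).all())                       # high fit by construction
```

The pooled `total_r2` aggregates the sums of squares over all deltas, giving the single-number summary reported per maturity and currency in Table 1.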
A key measure of goodness-of-fit for the multilevel functional time-series method is the within-cluster variability defined in (10). When the common factor explains the primary mode of total variability, the value of within-cluster variability is close to 1. We calculate this within-cluster variability for each currency pair and maturity period in Table 2.
5.2 Out-of-sample forecast evaluation
It can be seen from the previous section that the multivariate and multilevel functional time-series models provide a better in-sample fit when seeking to capture the shape of the IV surface. We now turn our attention to out-of-sample forecasting.
5.2.1 Stationarity test of functional time-series
Since the computation of the long-run covariance is only meaningful if the functional time-series is stationary, we implement the stationarity test of Horváth et al. (2014) via its corresponding R function T_stationary in the ftsa package (Hyndman & Shang, 2020). Based on the $p$-values reported in Table 3, we conclude that all the functional time-series we consider are stationary.
With the null hypothesis being that of stationarity, this table presents the $p$-values for each currency and maturity obtained from the stationarity test of Horváth et al. (2014).
A summary of the out-of-sample forecast measures, calculated under a recursive parameter estimation scheme and a 522-day out-of-sample window length for EUR-USD, is presented in Table 4. We consider the methods of Chalamandaris & Tsekrekos (2011) and Goncalves & Guidolin (2006), the RW model, and the AR(1) model as benchmark methods. We use a generalised least squares (GLS) procedure to produce the Goncalves & Guidolin (2006) and Chalamandaris & Tsekrekos (2011) benchmarks, in line with Bernales & Guidolin (2014). GLS is used to control for the option measurement errors identified by Hentschel (2003). We direct the interested reader to Appendix B of Goncalves & Guidolin (2006) for the technical detail of applying the GLS method suggested by Hentschel (2003) in the context of deterministic IV surface models.
In Table 5, we present a comparison of out-of-sample accuracy for forecasting EUR-USD IV for the five delta values considered. The errors are aggregated over the three maturities of one month, six months and two years. In the supplement, we also present the results using exponential smoothing (see, e.g., Kearney & Shang, 2020).