Dynamic functional time-series forecasts of foreign exchange implied volatility surfaces

This paper presents static and dynamic versions of univariate, multivariate, and multilevel functional time-series methods to forecast implied volatility surfaces in foreign exchange markets. We find that dynamic functional principal component analysis generally improves out-of-sample forecast accuracy. More specifically, the dynamic univariate functional time-series method shows the greatest improvement. Our models lead to multiple instances of statistically significant improvements in forecast accuracy for daily EUR-USD, EUR-GBP, and EUR-JPY implied volatility surfaces across various maturities, when benchmarked against established methods. A stylised trading strategy is also employed to demonstrate the potential economic benefits of our proposed approach.



There are no comments yet.


page 6


Dynamic principal component regression for forecasting functional time series in a group structure

When generating social policies and pricing annuity at national and subn...

A robust functional time series forecasting method

Univariate time series often take the form of a collection of curves obs...

Implied volatility surface predictability: the case of commodity markets

Recent literature seek to forecast implied volatility derived from equit...

Intraday forecasts of a volatility index: Functional time series methods with dynamic updating

As a forward-looking measure of future equity market volatility, the VIX...

Bayesian Testing Of Granger Causality In Functional Time Series

We develop a multivariate functional autoregressive model (MFAR), which ...

Multivariate Pair Trading by Volatility Model Adaption Trade-off

Pair trading is one of the most discussed topics among financial researc...

Spatial Correlation in Weather Forecast Accuracy: A Functional Time Series Approach

A functional time series approach is proposed for investigating spatial ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

A Foreign Exchange (FX) option is a contract that gives the buyer the right, but not the obligation, to buy or sell a particular currency at a pre-specified rate (exercise price) at a pre-specified future date. Implied volatility (IV) is where the market’s expectation of future realised volatility is backed out of traded option prices. Models including Black & Scholes (1973), assume that IV is constant across all option contracts for the same underlying currency. However, constant IV is not what is observed empirically, and IV varies according to both option delta and time-to-expiry. This concept results in non-flat IV surfaces. Further, the shape of these IV surfaces change from day to day, with Deuskar et al. (2008); Konstantinidi et al. (2008) highlighting how important it is to model daily changes in IV surfaces. In order to model and predict these daily IV surface changes we propose a dynamic functional time-series approach, considering daily IV surfaces as the time-series of observations.

Theoretical general equilibrium models have been used by David & Veronesi (2000), Guidolin & Timmermann (2003), Garcia et al. (2003) and Bernales & Guidolin (2015), among others, to explain why we observe non-flat IV surfaces. This strand of literature links agent learning and uncertainty in options markets to macroeconomic and dividend fundamentals to explain IV surface shape. Latent factors are then said to drive changes in the shape of the IV surface over time. Bernales & Guidolin (2014, 2015) also conclude that IV predictability is driven by the learning behaviour of agents in options markets. They highlight that as options constitute forward-looking market information, agents’ expectations of future economic scenarios can be determined by interpreting time-to-expiry as investor forecast horizons. Goncalves & Guidolin (2006) find predictable patterns in how the S&P 500 IV surface changes over time, however they conclude that their results do not constitute first-order evidence against the efficient market hypothesis. Bernales & Guidolin (2014) conclude that if the efficient market hypothesis is imposed, the discovery of such predictable patterns could be associated with microstructural imperfections and time-varying risk premia. Following Bernales & Guidolin (2014), we hypothesise that isolated pockets of option market inefficiency exist, and that these inefficiencies could lead to the presence of predictable patterns for segments of the IV surface. We seek to identify these patterns using functional time-series techniques.

In our analysis, we consider IV smiles of a single maturity as a univariate functional time-series. Nevertheless, IV also varies considerably across time to maturity, as highlighted by Borovkova & Permana (2009), among others. Therefore, when forecasting the IV surface (essentially multiple IV smiles of differing maturities for the same underlying), it could be advantageous to incorporate information from other maturity series simultaneously. Our multivariate and multilevel functional time-series methods follow this principle, allowing us to extend the previously proposed univariate functional time-series method. Both multivariate and multilevel functional time-series methods capture correlation among IV smiles of differing maturities. We illustrate their use in producing forecasts of daily FX IV surfaces comprised of a number of maturities. To the best of our knowledge, no study to date evaluates and compares forecast errors associated with univariate, multivariate, and multilevel functional time-series methods. We attempt to fill this gap using both static and dynamic functional principal component approaches.

Functional time-series analysis has become increasingly popular in both implied and realised volatility modelling and forecasting. For instance, Benko et al. (2009) test stochastic similarities and differences between IV smiles of one- and three-month maturity equity options by conducting a functional principal component analysis. Further, Müller et al. (2011) utilise a functional model to characterise high frequency S&P 500 volatility trajectories. They find that when combined with forecasting techniques, their model accurately predicts future volatility levels. Kearney et al. (2015) use the smoothness in the IV smile to obtain a measure of IV steepness in oil options using functional data analysis. In a closely related study to ours, Kearney et al. (2018) model and forecast foreign exchange IV smile separately using static univariate functional time-series techniques, outperforming benchmark methods from Goncalves & Guidolin (2006) and Chalamandaris & Tsekrekos (2010, 2011).

Functional principal component analysis is the functional counterpart of traditional discrete principal component analysis. Functional principal component analysis reduces the dimensionality of the data to a small number of latent components, in order to effectively summarise the main features of the data. While discrete principal component analysis searches for an optimal set of orthonogal eigenseries that describe the highest level of variance in a discrete data set, functional principal component analysis seeks to identify eigenfunctions that best describe a set of continuous/functional observations. Advances of functional principal component analysis date back to the early 1940s when

Karhunen (1946) and Loève (1946) independently developed a theory on the optimal series expansion of a continuous stochastic process. Later, Rao (1958) and Tucker (1958) provided applications of the Karhunen-Loève expansion to functional data, by applying multivariate principal component analysis to observed function values. For additional methodological details of functional principal component analysis, we direct the interested reader to survey articles by Hall (2011); Shang (2014) and Wang et al. (2016).

Our paper extends Kearney et al. (2018)

through firstly utilising dynamic functional time-series specifications, to see if their ability to dynamically update their predictions can help improve forecast accuracy. This dynamic approach captures the inherent time-series relationship present between daily observations of IV surfaces. When there is moderate to high temporal dependence, such as between daily observations of IV surfaces, the extracted principal components obtained from the (static) functional principal component analysis are not consistent. This inconsistency leads to inefficient estimators. To overcome this issue, we consider a dynamic approach that extracts principal components based on the long-run covariance between daily observations of IV surfaces, instead of using variance alone. Long-run covariance and spectral density estimation enjoy a vast literature in the case of finite-dimensional time-series, beginning with the seminal work of

Bartlett (1950) and Parzen (1957), and still the most commonly used techniques resort to smoothing the periodogram at frequency zero by employing a smoothing weight function and bandwidth parameter. Data-driven bandwidth selection methods have received a great deal of attention (see, e.g., Andrews (1991) for the univariate time-series context and Rice & Shang (2017) for the functional time-series context). Similar to a univariate time-series framework, the long-run covariance includes the variance function as a component, but it also measures temporal auto-covariance at different positive and negative lags. The estimation of the long-run covariance is the sum of the empirical auto-covariance functions and is often truncated at some finite lag in practice. As shown in Shang (2019) and Martínez-Hernández et al. (2020), a dynamic functional principal component analysis is a more suitable decomposition than a static functional principal component. Shang (2019) demonstrate this for mortality forecasting, while Martínez-Hernández et al. (2020) show this for yield curve forecasting. However, we highlight its use for forecasting FX IV surfaces.

Our second contribution is the use of the multivariate functional principal component regression (see, e.g., Shang & Yang, 2017) and multilevel functional principal component regression models (see, e.g., Shang et al., 2016) to model and forecast FX IV surfaces. These multivariate and multilevel approaches overcome the issue that two-dimensional univariate approaches have when modelling IV surfaces, where they proceed by essentially slicing the surface and modelling the smiles of different maturities separately. There are, however, intuitive and theoretical links between smiles of different maturities for the same IV surface, so we adopt the multivariate and multilevel extensions to explicitly incorporate information derived from all smiles simultaneously. Both multilevel and multivariate regression models can jointly model and forecast multiple functional time-series that are potentially correlated with each other. By taking into account this correlation between IV smiles of differing maturities, the point and interval forecast accuracies of the IV surfaces are improved.

The remainder of the paper is organised as follows. In Section 2, we introduce the FX options data set. In Section 3, we provide a background to the univariate, multivariate, and multilevel functional time-series methods, and present extensions based on the dynamic functional principal component analysis. In Section 4, we outline our forecast evaluation procedure and present several forecast error measures. In Section 5, we present comparisons of in-sample model fitting and out-of-sample forecast accuracy among the functional time-series approaches and traditional IV surface forecasting benchmarks. In our comparisons, we consider the benchmark methods of Chalamandaris & Tsekrekos (2011) and Goncalves & Guidolin (2006)

, as well as the random walk (RW) model and an autoregressive model of order 1 (AR(1)). Using the model confidence set procedure of

Hansen et al. (2011), we determine a set of superior forecasting models based on the out-of-sample accuracy of predicting daily IV surfaces. Further, we present a stylised trading strategy based on the proposed dynamic functional time-series approach. As a robustness check, we also consider an SPX S&P 500 equity options dataset to further showcase the use of the functional time-series methods. We outline our conclusions in Section 6, along with some thoughts on how the methods presented here can be extended.

2 Data description

We mirror Kearney et al. (2018) in utilising aggregated market participant contributed IV quotes, thus circumventing the need to adopt a specific options pricing model or impose no-arbitrage assumptions or calibration methods to back out IV from option prices. The use of contributed IV quotes is a common convention in FX options markets, as unlike equity or commodity options, the contracts are most actively traded over-the-counter between market participants, as opposed to being listed on an exchange. We treat these retrieved IV quotes as economic variables of interest in their own right and model how they change over time. Our data set comprises at-the-money, risk reversal, and butterfly composition IV quotes for Euro/United States Dollar (EUR-USD), Euro/British Pound (EUR-GBP), and Euro/Japanese Yen (EUR-JPY) currency options obtained from Bloomberg. These four currencies account for almost 77% of total global foreign exchange market turnover according to the Bank of International Settlements’s (2016) report. The same four currencies are also considered by Chalamandaris & Tsekrekos (2011).

IV quotes associated with delta values of 5, 10, 15, 25, 35, 50, 65, 75, 85, 90 and 95 are available from Bloomberg. However, in line with Chalamandaris & Tsekrekos (2011) and Kearney et al. (2018), we restrict our forecast prediction to the segments of the surface with the greatest trading intensity; namely, delta values of 10, 25, 50, 75 and 90. Therefore, our models and forecasts focus on these specific delta values. In addition, for each FX rate, we observe three maturities; one month (1M), six months (6M) and two years (2Y). Our data set runs from January 1, 2008 to December 30, 2016.

Figure 1 presents rainbow plots of the EUR-USD IV smiles for the three maturities. The rainbow plot is a plot of all the IV data in our data set, with a rainbow colour palette used to indicate when the observations occur (see also Hyndman & Shang, 2010). Days in the distant past are shown in red, with the most recent days presented in violet. We observe that as the maturity period increases from 1M to 2Y, the spread of IV decreases.

Figure 1: This figure presents rainbow plots of the EUR-USD IV smile for the 1-month, 6-month and 2-year maturities from January 1, 2008 to December 30, 2016. The daily smile observations are ordered chronologically according to the colours of the rainbow. The oldest curves are shown in red, with the most recent curves presented in violet.

Figure 2 presents plots of implied volatilities (IVs) averaged across delta values. They show that the 1M-averaged IVs reached a peak on October 27, 2008 and a trough on August 1, 2014. Both the 6M- and 2Y-averaged IVs reached their peaks on December 17, 2008 and troughs on July 3, 2014.

Figure 2: This figure presents univariate time-series plots of the EUR-USD IV for the 1-month, 6-month and 2-year maturities averaged over the five delta values of 10, 25, 50, 75 and 90, from January 1, 2008 to December 30, 2016.

3 Methodology

3.1 Univariate functional time-series method

3.1.1 Functional principal component analysis


be an arbitrary functional time-series defined on a common probability space

. It is assumed that the observations are elements of the Hilbert space equipped with the inner product , where represents a continuum and denotes a function support range (i.e., range of delta values), and is the real line. Each function is a square-integrable function satisfying and associated norms. The notation is used to indicate for some . When , has the mean curve ; when , a non-negative definite covariance function is given by


for all . The covariance function in (1) allows the covariance operator of , denoted by , to be defined as

Via Mercer’s lemma, there exists an orthonormal sequence of continuous functions in and a non-increasing sequence of positive numbers, such that

By the separability of Hilbert spaces, the Karhunen-Loève expansion of a stochastic process can be expressed as


where principal component scores are given by the projection of in the direction of the eigenfunction , i.e., ; denotes the model truncation error function with a mean of zero and a finite variance; and is the number of retained components. Equation (2) facilitates dimension reduction as the first terms often provide a good approximation to the infinite sums, and thus the information inherited in can be adequately summarised by the

-dimensional vector

. These scores constitute an uncorrelated sequence of random variables with zero mean and variance

, and they can be interpreted as the weights of the contribution of the functional principal components to .

The estimation accuracy of the functional time-series method crucially relies on the optimal selection of

. There are at least five conventional approaches; namely, eigenvalue ratio tests

(Ahn & Horenstein, 2013), cross-validation (Ramsay & Silverman, 2005), Akaike’s information criterion (Akaike, 1974), the bootstrap method (Hall & Vial, 2006) and explained variance (Chiou, 2012). Here, the value of is determined as the minimum that reaches a certain level of the cumulative percentage of variance (CPV) explained by the leading components such that

where , is to exclude possible zero eigenvalues, and represents the binary indicator function. As a sensitivity test, we also consider the number of retained components , since there are 5 discrete delta values.

3.1.2 Long-run covariance and its estimation

While Section 3.1.1 presents a method based on the functional principal component analysis extracted from the variance function, these static functional principal components are not consistent in the presence of moderate to strong temporal dependence. This issue motivates the development of dynamic functional principal component analysis, such as in Hörmann et al. (2015) and Rice & Shang (2017), where functional principal components are extracted from the long-run covariance function.

We implement a stationarity test of Horvàth et al. (2014), and from the -values reported in Table 3 in Section 5.2, we conclude that all series are stationary. For a stationary functional time-series , the long-run covariance function is defined as

and is a well-defined element of for a compact support interval

, under mild dependence and moment conditions. By assuming

is a continuous and square-integrable covariance function, the function induces the kernel operator . Through right integration, defines a Hilbert-Schmidt integral operator on given by

whose eigenvalues and eigenfunctions are related to the dynamic functional principal components defined in Hörmann et al. (2015). Hörmann et al. (2015) provide asymptotically optimal finite-dimensional representations of the sample mean of a functional time-series.

In practice, we need to estimate from a finite sample . Given its definition as a bi-infinite sum, a natural estimator of is


where is the bandwidth parameter, and

is an estimator of . is a symmetric weight function with bounded support of order , and the following properties:

  1. if for some

  2. is continuous on

  3. exists and satisfies

The kernel sandwich estimator in (3) is discussed in Horvàth et al. (2013), Panaretos & Tavakoli (2013), Rice & Shang (2017), and Martínez-Hernández et al. (2020), among others. As with the kernel sandwich estimator, a crucial consideration here is the estimation of the bandwidth parameter . One approach is to use a data-driven approach, such as the plug-in algorithm proposed in Rice & Shang (2017). Rice & Shang (2017) prove that the estimator in (3) is a consistent estimator of the true and unknown long-run covariance. They also find that the estimated dynamic functional principal components and associated scores extracted from the estimated long-run covariance are consistent too.

3.1.3 Dynamic functional principal component analysis

Using the estimated long-run covariance function , we apply functional principal component decomposition to extract the dynamic functional principal components and their associated scores. The sample mean and sample covariance are given by

where are the sample eigenvalues of , and are the corresponding orthogonal sample eigenfunctions. Via the Karhunen-Loève expansion, the realisations of the stochastic process can be written as

where is the th estimated principal component score for the th observation.

Dynamic functional principal component analysis has three properties:

  1. It minimizes the mean integrated squared error of the reconstruction error function over the whole functional data set.

  2. It provides an effective way of extracting a large amount of long-run covariance.

  3. Its dynamic principal component scores are uncorrelated.

3.1.4 Univariate functional time-series forecasting method

We obtain an IV smile of a specific maturity by interpolating the moneyness level (in terms of delta) of the option contract. We extract estimated functional principal component functions

, using the empirical covariance function. Conditioning on the past functions and the estimated functional principal components , the -step-ahead point forecast of can be expressed as

where denotes time-series forecasts of the principal component scores. The forecasts of these scores can be obtained via a univariate time-series forecasting method, such as the autoregressive integrated moving average (ARIMA) model (see Ord & Fildes, 2013, for a review). We use the automatic algorithm of Hyndman & Khandakar (2008) to choose the optimal orders of , , and . In the supplement, we also present the results using exponential smoothing, which is a generalisation of the ARIMA model (see, e.g., Kearney & Shang, 2020). The forecast accuracy results are qualitatively similar between the two univariate time-series methods.

3.2 Multivariate functional time-series method

Section 3.1 introduces a method for modelling and forecasting one functional time-series or multiple functional time-series individually. However, explicitly modelling the correlation between multiple time-series may improve forecast accuracy. Therefore, we introduce a multivariate functional time-series method to jointly model and forecast multiple functional time-series that could be correlated with each other.

Let be the IV smile for 1M-maturity options, the IV smile for 6M-maturity options, and the IV smile for 2Y-maturity options. As our multiple functional time-series have the same function support, we consider data where each observation consists of functions, i.e., , where and in our data set.

The multivariate functional time-series are stacked in a vector with . We assume that . For , the theoretical cross-covariance function can be defined with elements

where and denote the mean functions for the and series, respectively. By assuming that is a continuous and square-integrable covariance function, the function induces the kernel operator given by

Via Mercer’s lemma, there exists an orthonormal sequence of continuous functions in and a non-increasing sequence of positive numbers, such that

By the separability of Hilbert space, the Karhunen-Loève expansion of a stochastic process can be expressed as

where denotes the number of retained functional principal components. Its matrix formulation is

where and denote stacked historical functions. Moreover, is the vector of the basis expansion coefficients, , and

Conditioning on the past functions and the estimated functional principal components , the -step-ahead point forecast of can be expressed as

where denotes the time-series forecasts of the principal component scores corresponding to the three maturities. In our implementation, each series is standardised to have equal scale before being combined into a long vector (see, e.g., Chiou et al., 2014).

3.3 Multilevel functional time-series method

The multilevel functional data model bears a strong resemblance to a two-way functional analysis of variance model studied by Morris et al. (2003) and Cuesta-Albertos & Febrero-Bande (2010). It is a special case of the general “functional mixed model” proposed in Morris & Carroll (2006) and applied in Shang (2016) and Shang et al. (2016). As the IV surface comprises distinct IV smiles for multiple maturities of the same currency pair, the basic idea is to decompose the related smiles into an average IV smile across the entire surface , a maturity-specific deviation from the averaged IV smile , a common trend across maturities , and a maturity-specific residual trend

. The common trend and maturity-specific residual trend are modelled by projecting them onto the eigenvectors of covariance functions of the aggregate and maturity-specific centred stochastic processes, respectively. The curve at observation

can be written as


where corresponds to the 1-month, 6-month and 2-year maturities, respectively. To ensure model and parameter identifiability, the observations of the two stochastic processes and are uncorrelated since we implement two-stage functional principal component decomposition.

Since the centred stochastic processes and are unknown in practice, the population eigenvalues and eigenfunctions can only be approximated at best through a set of realisations and . From the covariance of , we can extract a set of functional principal components and the associated scores, along with a set of residual functions. From the covariance function of the residual functions, we can then extract a second set of functional principal components and their associated scores. While the first functional principal component decomposition captures the common trend shared across various maturities of the same currency, the second functional principal component decomposition captures the maturity-specific residual trend.

The sample versions of the aggregate mean function, maturity-specific mean function deviation, a common trend, and maturity-specific residual trend for a set of functional data can be estimated by


where represents a time-series of functions for the currency exchange averaged across different maturities; represents the simple average of the averaged currency exchange, whereas represents the simple average of the currency exchange at the maturity; represents the sample principal component scores of ; and denotes the corresponding orthogonal sample eigenfunctions in a square-integrable function space.

Substituting equations (6) to (9) into equations (4)–(5), we obtain

where denotes measurement error with a finite variance .

To select the optimal number of components, we use a CPV criterion to determine and . The optimal numbers of and are determined by

where denotes a binary indicator function. Following Chiou (2012), we chose .

From the estimated eigenvalues, we can estimate the proportion of variability explained by aggregated data (also known as the variance explained by the within-cluster variability, see for example, Di et al., 2009). A possible measure of within-cluster variability is given by


where denotes the population variance, which can be approximated by the estimated and retained eigenvalues. When the common trend can explain the primary mode of total variability, the value of the within-cluster variability is close to 1.

Conditioning on observed data and basis functions and , the -step-ahead forecasts can be obtained by

where and are the forecasted principal component scores, obtained from a univariate time-series forecasting method, such as an ARIMA model.

While Sections 3.2 and 3.3 present static multivariate and multilevel functional time-series methods, dynamic counterparts can be produced using dynamic functional principal components and their associated scores. These quantities are constructed based on an estimated long-run covariance function in (3).

4 Forecast evaluation

4.1 Expanding-window approach

An expanding-window analysis of time-series models is commonly used to assess model and parameter stability over time. It determines the constancy of a model’s parameters by computing parameter estimates, and their forecasts over an expanding window of a fixed size through the sample (see Zivot & Wang, 2006, Chapter 9 for details). Using the first 1,827 observations from January 1, 2008 to December 31, 2014 for each currency pair, we produce one-step-ahead point forecasts. Through an expanding window approach, we re-estimate the parameters in all models using the first 1,828 observations from January 1, 2008 to January 1, 2015. One-step-ahead forecasts are then produced for each model. We iterate this process by increasing the sample size by one trading day until we reach the end of the data period. This process produces 522 one-step-ahead forecasts from January 1, 2015 to December 30, 2016. We compare these forecasts with the holdout samples to determine out-of-sample point forecast accuracy. This evaluation is in line with Chalamandaris & Tsekrekos (2010), who adopt a recursive one-day-ahead strategy scheme. Also, we consider 518 five-step-ahead forecasts and 513 ten-step-ahead forecasts to assess forecast accuracy at comparably longer horizons. We choose to expand the training set and re-estimate all the models daily to incorporate all available information into our prediction. This approach more accurately simulates the action likely to be taken by a market practitioner who aims to predict the following day’s movement.

To control for sensitivity to specific out-of-sample periods, various window lengths are tested: In the case of one-step-ahead prediction, we consider 522 days (out-of-sample: January 1, 2015 to December 30, 2016), 261 days (out-of-sample: January 1, 2016 to December 30, 2016), 131 days (out-of-sample: July 1, 2016 to December 30, 2016). The out-of-sample forecasts between the end of the in-sample period and December 30, 2016 are obtained using an expanding-window scheme.

4.2 Forecast error measures

A number of criteria exist for evaluating point forecast accuracy (see Armstrong & Collopy, 1992; Hyndman & Koehler, 2006). They can be grouped into four categories: (1) scale-dependent metrics, (2) percentage-error metrics, (3) relative-error metrics, which average the ratios of the errors from a designated method to the errors of a naïve method, and (4) scale-free error metrics, which express each error as a ratio to the average error from a baseline method. Here, we use the following three accuracy measures:

  1. Mean absolute forecast error (MAFE); a measure of the absolute difference between the forecast, , and the corresponding observation, . It measures the average error magnitude in the forecasts, regardless of error direction and serves to aggregate the errors into a single measure of predictive power. The MAFE can be expressed as

    where denotes the number of discretised points in a curve and denotes the number of observations in the forecasting period; denotes the holdout sample for the delta and curve in the forecasting period; and denotes the point forecast of the holdout sample.

  2. Mean squared forecast error (MSFE); a measure of the squared difference between the predicted and realised values, which again serves to aggregate the errors into a single measure of predictive power. The MSFE can be expressed as

  3. Mean mixed error (MME); an asymmetric loss function. MME(U) penalises under-predictions more heavily, while MME(O) penalises over-predictions more heavily. This feature is crucial for investors in options markets, as an under (over)-prediction of IV is more likely to be of greater concern to a seller (buyer) than a buyer (seller). This measure has previously been used in studies that evaluate volatility forecasting techniques, such as

    Brailsford & Faff (1996), Fuertes et al. (2009), and Kearney et al. (2018). With a slight abuse of notation , the MME(U) and MME(O) are given as


    where denotes the number of under-predictions and denotes the number of over-predictions. represents the indices of the over-predictions, and represents the indices of the under-predictions.

5 Data analysis results

This section presents the results of modelling IV using the three functional time-series models for the entire data sample from January 1, 2008 to December 30, 2016. Using the entire EUR-USD dataset, we present the first two estimated static and dynamic principal components and their associated scores in Figure 3. While the static principal components are extracted based on sample variance, the dynamic principal components are extracted based on sample long-run covariance. When a time-series has moderate to strong temporal dependence, the extracted static and dynamic principal components are different, as shown in Figure 3. Due to the variation in the extracted principal components, the associated scores for both approaches also exhibit visible differences.

Figure 3: The figure presents the estimated mean function and the first two estimated static and dynamic functional principal components, along with their associated scores. For both approaches, the number of components, , is retained as two, based on explaining at least 99% of the total variance or long-run covariance.

In Section 5.1, the models are fitted in-sample to ascertain goodness-of-fit when seeking to capture the empirical dynamics of the IV surfaces for each currency pair during the entire sample period. In Section 5.2, we evaluate and compare out-of-sample forecast accuracy among the functional time-series methods. In Section 5.3, we examine statistical significance based on point forecast accuracy.

5.1 In-sample model fitting

Three functional time-series models are fitted to the underlying IV data, for each day over the full sample of January 1, 2008 to December 30, 2016. There are various ways of assessing the fit of a linear model, and we initially consider a criterion introduced by Ramsay & Silverman (2005, Section 16.3). The approach borrowed from conventional linear models is to consider the squared correlation function:

In Figure 4, we present over the five delta values, for the three maturities and three currencies.

Figure 4: These figures present a comparison of goodness-of-fit among the three functional time-series methods over five delta values of 10, 25, 50, 75 and 90, as measured by . FTS, MFTS and MLFTS denote univariate, multivariate and multilevel functional time-series methods, respectively.

Total is useful if one requires a single numerical measure of fit. It can be expressed as

The total for the three maturities for each currency pair are listed in Table 1. Kearney et al. (2018) find that the univariate functional time-series model provides the best goodness-of-fit in comparison to the benchmark methods of Goncalves & Guidolin (2006) and Chalamandaris & Tsekrekos (2010, 2011). Here, we improve on Kearney et al. (2018) by showing that the goodness-of-fit of the univariate functional time-series model they propose is outperformed by our multivariate and multilevel functional time-series models, as measured by total . We also observe that the optimal values are often associated with the dynamic methods.

Currency Method 1M 6M 2Y
EUR-USD FTS 0.9387 0.9246 0.9174
MFTS 0.9773 0.9842 0.9797
MLFTS 0.9784 0.9715 0.9668
DFTS 0.9976


DMFTS 0.9849 0.9934 0.9914




EUR-GBP FTS 0.9360 0.9207 0.9031
MFTS 0.9811 0.9842 0.9723
MLFTS 0.9752 0.9689 0.9265
DFTS 0.9975 0.9929 0.9764
DMFTS 0.9896 0.9942





EUR-JPY FTS 0.8553 0.8331 0.7869
MFTS 0.9825 0.9721 0.9277
MLFTS 0.9488 0.8863 0.9041
DFTS 0.9032 0.8850 0.8337
DMFTS 0.9872





0.9594 0.9811
Table 1: This table presents a comparison of goodness-of-fit among the univariate, multivariate and multilevel functional time-series methods using static and dynamic functional principal component analyses, as measured by total . The results are aggregated over the five delta values of 10, 25, 50, 75 and 90, over the January 2008 to December 2016 sample period. FTS, MFTS and MLFTS represent the univariate, multivariate and multilevel functional time-series models, respectively, with a leading ‘D’ indicating that dynamic principal component evaluation is used. The largest values are highlighted in bold for each maturity and currency pair.

A key measure of goodness-of-fit for the multilevel functional time-series method is the within-cluster variability defined in (10). When the common factor explains the primary mode of total variability, the value of within-cluster variability is close to 1. We calculate this within-cluster variability for each currency pair and maturity period in Table 2.

Currency Method 1M 6M 2Y
EUR-USD MLFTS 0.7507 0.8021





EUR-GBP MLFTS 0.7423 0.8087 0.7134






0.6918 0.7117
DMLFTS 0.6047



Table 2: This table presents the computed within-cluster variability of the multilevel functional time-series method for each currency pair and each maturity period aggregated over the five delta values of 10, 25, 50, 75 and 90. The sample period is from January 2008 to December 2016. MLFTS and DMLFTS represent the static and dynamic multilevel functional time-series models, respectively. The largest values are highlighted in bold for each maturity and currency pair.

5.2 Out-of-sample forecast evaluation

It can be seen from the previous section that, the multivariate functional time-series and multilevel functional time-series models provide better in-sample fit when seeking to capture shape of the IV surface. We now turn our attention to out-of-sample forecasting.

5.2.1 Stationarity test of functional time-series

Since the computation of long-run covariance is only meaningful if the functional time-series is stationary, we implement the stationarity test of Horvàth et al. (2014) and its corresponding R function T_stationary in the ftsa package (Hyndman & Shang, 2020). Based on the -values reported in Table 3, we conclude that all the functional time-series we consider are stationary.

Currency 1M 6M 2Y
EUR-USD 0.533 0.424 0.415
EUR-GBP 0.771 0.747 0.487
EUR-JPY 0.455 0.646 0.363
Table 3:

With the null hypothesis being that of stationarity, this table presents the probabilities of rejecting the null hypothesis (

-values) for each currency and maturity using the stationarity test of Horvàth et al. (2014).

5.2.2 Eur-Usd

A summary of the out-of-sample forecast measures calculated under a recursive parameter estimation scheme and 522-day out-of-sample window length for EUR-USD are presented in Table 4. We consider the Chalamandaris & Tsekrekos’s (2011) method, the Goncalves & Guidolin’s (2006) method, RW and AR(1) as benchmakr methods. We use a generalised least squares (GLS) procedure to produce these Goncalves & Guidolin (2006) and Chalamandaris & Tsekrekos (2011) benchmarks, in line with Bernales & Guidolin (2014). GLS is used to control for the option measurement errors identified by Hentschel (2003). We direct the interested reader to Appendix B of Goncalves & Guidolin (2006) for the technical detail of applying the GLS method suggested by Hentschel (2003) in the context of deterministic IV surface models.

Horizon Maturity Method MME(U) MME(O) MME(U) MME(O)
1 day 1M FTS 0.5936 0.5709 0.6653 0.6356 0.5610 0.5155 0.6414 0.6062
MFTS 0.6557 0.7748 0.7016 0.6878 0.5237 0.4776 0.6090 0.5749
MLFTS 0.5074 0.4678 0.5910 0.5624 0.4429 0.3701 0.5314 0.5128
DFTS 0.4346 0.3586 0.5187 0.5094 0.3975 0.3177 0.4938 0.4669
DMFTS 0.6426 0.7541 0.6830 0.6831 0.4587 0.3916 0.5503 0.5204
DMLFTS 0.5493 0.5024 0.6100 0.6194 0.4023 0.3252 0.4978 0.4717
CT11 0.6040 0.6486 0.6479 0.6582
GG06 0.6675 0.7272 0.7097 0.7063
RW 0.4071 0.3311 0.4983 0.4793
AR(1) 0.4129 0.3328 0.5120 0.4802
6M FTS 0.5026 0.3950 0.6258 0.5355 0.4580 0.3323 0.5941 0.4919
MFTS 0.3708 0.2821 0.4685 0.4496 0.3791 0.2959 0.4699 0.4616
MLFTS 0.3674 0.2772 0.4834 0.4253 0.2573 0.1585 0.3805 0.3277
DFTS 0.3649 0.2462 0.4472 0.4719 0.2308 0.1557 0.3332 0.3142
DMFTS 0.3748 0.2858 0.4439 0.4808 0.4013 0.3205 0.4610 0.5095
DMLFTS 0.3961 0.2836 0.4638 0.5079 0.2288 0.1490 0.3320 0.3130
CT11 0.7322 0.8314 0.7871 0.7370
GG06 0.7557 0.9552 0.7966 0.7549
RW 0.2358 0.1775 0.3353 0.3175
AR(1) 0.2395 0.1704 0.3361 0.3187
2Y FTS 0.4629 0.3101 0.6122 0.4885 0.4162 0.2401 0.5815 0.4430
MFTS 0.3360 0.1977 0.4880 0.3738 0.3374 0.1879 0.4970 0.3728
MLFTS 0.3070 0.1772 0.4399 0.3631 0.2280 0.0953 0.3713 0.2892
DFTS 0.3259 0.1527 0.4213 0.4423 0.1617 0.0582 0.2643 0.2556
DMFTS 0.2713 0.1384 0.4015 0.3383 0.2600 0.1206 0.4082 0.3149
DMLFTS 0.1960 0.0746 0.2949 0.3001 0.1576 0.0561 0.2604 0.2504
CT11 0.2548 0.1203 0.3585 0.3529
GG06 0.2435 0.1067 0.3521 0.3403
RW 0.1593 0.0629 0.2599 0.2479
AR(1) 0.1626 0.0643 0.2596 0.2466
Average FTS 0.5197 0.4253 0.6344 0.5532 0.4784 0.3626 0.6057 0.5137
MFTS 0.4542 0.4182 0.5527 0.5037 0.4134 0.3205 0.5253 0.4698
MLFTS 0.3939 0.3074 0.5048 0.4503 0.3094 0.2080 0.4277 0.3766
DFTS 0.3751 0.2525 0.4624 0.4745 0.2633 0.1772 0.3638 0.3456
DMFTS 0.4295 0.3928 0.5095 0.5007 0.3733 0.2776 0.4732 0.4483
DMLFTS 0.3805 0.2869 0.4562 0.4758





CT11 0.5303 0.5334 0.5978 0.5827
GG06 0.5556 0.5964 0.6195 0.6005
RW 0.2674 0.1905 0.3645 0.3482
AR(1) 0.2717 0.1892 0.3692 0.3452
5 day 1M FTS 0.9598 1.5348 0.9398 0.9200 0.9481 1.5261 0.9327 0.9075
MFTS 1.0091 1.8200 0.9640 0.9579 0.9365 1.5892 0.9172 0.8980
MLFTS 0.9298 1.5650 0.9284 0.8769 0.9116 1.5657 0.9140 0.8607
DFTS 0.9172 1.5958 0.9095 0.8703 0.9002 1.5563 0.8981 0.8554
DMFTS 1.0162 1.8861 0.9672 0.9597 0.9270 1.6338 0.9075 0.8862
DMLFTS 0.9652 1.6986 0.9411 0.9136 0.9092 1.5883 0.9047 0.8629
CT11 0.9953 1.7950 0.9612 0.9355
GG06 1.0148 1.8012 0.9831 0.9459
RW 0.9100 1.5753 0.9080 0.8625
AR(1) 0.9203 1.5898 0.9206 0.8879
6M FTS 0.6420 0.6596 0.7153 0.6685 0.6124 0.6189 0.6935 0.6397
MFTS 0.5758 0.6414 0.6477 0.6105 0.5753 0.6440 0.6455 0.6109
MLFTS 0.5640 0.6032 0.6400 0.6006 0.5123 0.5325 0.5942 0.5606
DFTS 0.5732 0.6186 0.6290 0.6291 0.5042 0.5289 0.5791 0.5584
DMFTS 0.5853 0.6757 0.6476 0.6248 0.5983 0.7091 0.6570 0.6339
DMLFTS 0.5933 0.6528 0.6362 0.6540 0.5000 0.5248 0.5761 0.5533
CT11 0.8295 1.2002 0.8687 0.7917
GG06 0.8726 1.3253 0.8884 0.8330
RW 0.5127 0.5441 0.5874 0.5657
AR(1) 0.5140 0.5362 0.5943 0.5638
2Y FTS 0.5389 0.4449 0.6454 0.5729 0.5051 0.3882 0.6244 0.5399
MFTS 0.4615 0.3782 0.5626 0.5124 0.4583 0.3668 0.5610 0.5095
MLFTS 0.4409 0.3503 0.5348 0.5045 0.3995 0.2974 0.4982 0.4710
DFTS 0.4656 0.3595 0.5396 0.5515 0.3733 0.2662 0.4662 0.4575
DMFTS 0.4259 0.3418 0.5196 0.4929 0.4194 0.3233 0.5154 0.4873
DMLFTS 0.3904 0.2840 0.4775 0.4760 0.3742 0.2669 0.4658 0.4595
CT11 0.4203 0.3196 0.5062 0.5005
GG06 0.4220 0.3206 0.5088 0.5020
RW 0.3765 0.2713 0.4692 0.4593
AR(1) 0.3754 0.2647 0.4656 0.4543
Average FTS 0.7148 0.8862 0.7668 0.7215 0.6885 0.8444 0.7502 0.6957
MFTS 0.6821 0.9465 0.7248 0.6936 0.6567 0.8667 0.7079 0.6728
MLFTS 0.6449 0.8395 0.7011 0.6607 0.6078 0.7985 0.6688 0.6308
DFTS 0.6520 0.8580 0.6926 0.6837





DMFTS 0.6758 0.9679 0.7115 0.6925 0.6482 0.8887 0.6933 0.6691
DMLFTS 0.6496 0.8785 0.6849 0.6812 0.5945 0.7933 0.6489 0.6252
CT11 0.7484 1.1049 0.7787 0.7426
GG06 0.7698 1.1490 0.7934 0.7603
RW 0.5997 0.7969 0.6549 0.6292
AR(1) 0.6032 0.7969 0.6602 0.6353
10 day 1M FTS 1.2442 2.4842 1.1378 1.1328 1.2452 2.5329 1.1433 1.1267
MFTS 1.2935 2.8260 1.1776 1.1548 1.2534 2.6819 1.1543 1.1210
MLFTS 1.2405 2.6064 1.1631 1.0944 1.2527 2.7404 1.1746 1.0978
DFTS 1.2862 2.8628 1.1875 1.1350 1.2747 2.8248 1.1792 1.1262
DMFTS 1.3352 3.0192 1.2095 1.1798 1.2834 2.8693 1.1732 1.1418
DMLFTS 1.2865 2.8253 1.1816 1.1383 1.2774 2.8527 1.1801 1.1283
CT11 1.3039 2.9100 1.1966 1.1457
GG06 1.2983 2.8572 1.1936 1.1419
RW 1.2786 2.8572 1.1815 1.1264
AR(1) 1.2628 2.7316 1.1809 1.1291
6M FTS 0.7615 0.9441 0.7948 0.7717 0.7454 0.9348 0.7850 0.7538
MFTS 0.7324 0.9967 0.7768 0.7300 0.7347 1.0048 0.7812 0.7280
MLFTS 0.7138 0.9389 0.7529 0.7239 0.6908 0.9378 0.7376 0.6979
DFTS 0.7386 1.0367 0.7624 0.7498 0.6849 0.9482 0.7300 0.6916
DMFTS 0.7525 1.0859 0.7918 0.7407 0.7635 1.1112 0.8037 0.7469
DMLFTS 0.7625 1.0729 0.7856 0.7664 0.6866 0.9424 0.7287 0.6965
CT11 0.9473 1.6022 0.9614 0.8680
GG06 1.0002 1.7614 0.9880 0.9216
RW 0.6843 0.9544 0.7323 0.6942
AR(1) 0.6863 0.9337 0.7347 0.6947
2Y FTS 0.6001 0.5765 0.6752 0.6368 0.5793 0.5363 0.6631 0.6173
MFTS 0.5508 0.5400 0.6177 0.6021 0.5522 0.5417 0.6157 0.6062
MLFTS 0.5340 0.5083 0.5973 0.5971 0.5187 0.4907 0.5906 0.5795
DFTS 0.5820 0.5690 0.6359 0.6443 0.5085 0.4772 0.5824 0.5700
DMFTS 0.5399 0.5360 0.6076 0.5943 0.5374 0.5222 0.6016 0.5976
DMLFTS 0.5166 0.4836 0.5842 0.5821 0.5081 0.4709 0.5792 0.5743
CT11 0.5416 0.5213 0.6071 0.6029
GG06 0.5443 0.5316 0.6038 0.6091
RW 0.5058 0.4717 0.5829 0.5691
AR(1) 0.5120 0.5328 0.5812 0.5704
Average FTS 0.8702 1.3441 0.8707 0.8480 0.8566


0.8638 0.8326
MFTS 0.8589 1.4542 0.8574 0.8290 0.8468 1.4095 0.8504 0.8184
MLFTS 0.8294 1.3512 0.8377 0.8051 0.8207 1.3896 0.8343


DFTS 0.8689 1.4893 0.8619 0.8430 0.8227 1.4167 0.8305 0.7959
DMFTS 0.8759 1.5470 0.8696 0.8383 0.8614 1.5009 0.8595 0.8288
DMLFTS 0.8552 1.4606 0.8505 0.8289 0.8240 1.4220


CT11 0.9309 1.6778 0.9217 0.8722
GG06 0.9476 1.7167 0.9285 0.8909
RW 0.8229 1.4278 0.8322 0.7966


1.3394 0.8323 0.7981
Table 4: To forecast EUR-USD IV from January 1, 2015 to December 30, 2016, we compute one-step-ahead, five-step-ahead and ten-step-ahead out-of-sample forecast performance measures for each maturity. The measures are aggregated over the five delta values of 10, 25, 50, 75 and 90. FTS, MFTS and MLFTS represent the univariate, multivariate and multilevel functional time-series models, respectively, with a leading ‘D’ indicating that dynamic principal component evaluation is used. We use ARIMA with orders selected by the automatic algorithm of Hyndman & Khandakar (2008) to forecast the principal component scores. CPV indicates the number of principal components is determined as the minimum that reaches a cumulative percentage of the variance of 99%, whereas fixes this number at four. The smallest averaged error measures are highlighted in bold for each maturity and each horizon.

In Table 5, we present a comparison of out-of-sample accuracy for forecasting EUR-USD IV for the five delta values considered. The errors are aggregated over three maturities of one month, six months and two years. In the supplement, we also present the results using exponential smoothing, which is a generalisation of the ARIMA model (see, e.g., Kearney & Shang, 2020).

Horizon Delta Delta
Method 10 25 50 75 90 10 25 50 75 90
1 day
CPV FTS 0.7114 0.4847 0.3888 0.4507 0.5628 0.7330 0.3712 0.2442 0.2928 0.4853
DFTS 0.4811 0.3305 0.4047 0.2882 0.3711 0.3981 0.2273 0.2491 0.1484 0.2396
MFTS 0.6692 0.4438 0.2969 0.3430 0.5178 0.8482 0.3783 0.1634 0.2171 0.4841
DMFTS 0.6273 0.4518 0.2869 0.3087 0.4729 0.8019 0.3716 0.1610 0.1997 0.4296
MLFTS 0.5441 0.3520 0.2735 0.3400 0.4599 0.5283 0.2509 0.1544 0.2095 0.3939
DMLFTS 0.4819 0.3568 0.4003 0.2896 0.3738 0.4393 0.2621 0.2857 0.1691 0.2781
CT11 0.8555 0.5321 0.2413 0.4563 0.5664 1.1277 0.5310 0.1411 0.3431 0.5244
GG06 0.8299 0.5783 0.3702 0.4428 0.5565 1.2154 0.5403 0.2565 0.3739 0.5959
RW 0.4388 0.3692 0.3255 0.3133 0.3404 0.4113 0.2817 0.2264 0.2070 0.2762
AR(1) 0.4440 0.3738 0.3296 0.3178 0.3432 0.4117 0.2827 0.2277 0.2085 0.2651
FTS 0.4916 0.5449 0.4583 0.4903 0.4069 0.4099 0.4552 0.3176 0.3459 0.2846
DFTS 0.3392 0.2759 0.2364 0.2252 0.2400 0.2977 0.1835 0.1303 0.1127 0.1618
MFTS 0.6084 0.4289 0.3088 0.3101 0.4107 0.6274 0.3191 0.1709 0.1721 0.3129
DMFTS 0.5603 0.4209 0.3045 0.2640 0.3171 0.5566 0.3144 0.1741 0.1357 0.2069
MLFTS 0.3777 0.3094 0.2601 0.2687 0.3310 0.3273 0.2048 0.1406 0.1404 0.2267
DMLFTS 0.3393 0.2734 0.2328 0.2233 0.2457