Bayesian analysis of predictive Non-Homogeneous hidden Markov models using Polya-Gamma data augmentation

02/08/2018 ∙ by Constandina Koki, et al. ∙ University of Athens Athens University of Economics and Business 0

We consider Non-Homogeneous Hidden Markov Models (NHHMMs) for forecasting univariate time series. We introduce two state NHHMMs where the time series are modeled via different predictive regression models for each state. Also, the time-varying transition probabilities depend on exogenous variables through a logistic function. In a hidden Markov setting, inference for logistic regression coefficients becomes complicated and in some cases impossible due to convergence issues. To address this problem, we use a new latent variable scheme, that utilizes the Pólya-Gamma class of distributions, introduced by Po13. Given an available set of predictors, we allow for model uncertainty regarding the predictors that affect the series both linearly -- in the mean -- and non-linearly -- in the transition matrix. Predictor selection and inference on the model parameters are based on a MCMC scheme with reversible jump steps. Single-step and multiple-steps-ahead predictions are obtained based on the most probable model, median probability model or a Bayesian Model Averaging (BMA) approach. Simulation experiments, as well as an empirical study on real financial data, illustrate the performance of our algorithm in various setups, in terms of mixing properties, model selection and predictive ability.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 30

page 33

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

This paper follows and extends the work of Meligkotsidou and Dellaportas (2011) by employing recent methodological advances on Pólya-Gamma data augmentation for predictive discrete-time, finite state-space, Non-Homogeneous Hidden Markov Models (NHHMMs) that can be used for modeling and predicting univariate time-series. We consider two-state NHHMM (easily extended to m-state NHHMMs) where the time series are modeled via different predictive regression models for each state, whereas the transition probabilities are modeled via logistic regressions. Given an available set of predictors we allow for model uncertainty, regarding the predictors that affect the series both linearly, that is directly in the mean regressions, and non-linearly, that is in the transition matrix.

Discrete-time finite state-space Homogeneous Hidden Markov models (HHMMs) have been extensively studied and widely used to model stochastic processes consisting of an observed process and a latent (hidden) sequence of states which is assumed to affect the observation sequence, see for example Cappé et al. (2005) and Billio et al. (1999). We refer to Scott (2002) and Rydén (2008)

for a review of Bayesian approaches for HHMMs. Bayesian inference, using Markov chain Monte Carlo (MCMC) techniques, has enhanced the applicability of HHMMs and has led to the construction of more complex model specifications including NHHMMs. Initially

Diebold et al. (1994)

studied the two-state Gaussian NHHMMs where the time varying transition probabilities were modeled via logistic functions and their approach was based on the Expectation-Maximization algorithm (EM).

Filardo and Gordon (1998) adopted a Bayesian perspective to overcome technical and calculation issues of classical approaches. Since then, various Bayesian methods have been proposed in the literature. For example, Spezia (2006)

modeled the time-varying transition probabilities via a logistic function depending on exogenous variables and performed model selection based on the Bayes factor. In the same spirit,

Meligkotsidou and Dellaportas (2011) considered a

-stage NHMM and assumed that the elements of the transition matrix are linked through exogenous variables with a multinomial logistic link whereas the observed process conditional on the unobserved process follows an autoregressive model of order

.

This paper considers inference and variable selection for predictive NHHMMs which exploit a set of available predictors by allowing them to affect the transition probabilities and/or the state-specific predictive regressions. We perform Bayesian inference for the proposed models based on MCMC. Our MCMC scheme aims at overcoming difficulties and solving convergence issues arising with existing MCMC algorithms, thus offering an improved inferential procedure for estimation and variable selection in NHHMMs. To this end we exploit the missing data representation of hidden Markov models and construct an MCMC algorithm based on data augmentation, consisting of several steps. First, the latent sequence of states is sampled via the Scaled Forward-Backward algorithm

(Scott, 2002), which is a modification of the Forward-Backward algorithm of Baum et al. (1970) who used it to implement the classical EM algorithm. Then, following Meligkotsidou and Dellaportas (2011) we use a logistic regression representation of the transition probabilities and we simulate the parameters of the mean predictive regression model for each stage, via Gibbs sampling steps. Using the data augmentation scheme of Holmes and Held (2006), as in Meligkotsidou and Dellaportas (2011), may result in serious convergence issues, especially in cases that there exists model uncertainty. Frühwirth-Schnatter and Frühwirth (2010)

give a detailed comparison between various methods for dealing with binary and multinomial logit data and argue about the efficiency of

Holmes and Held (2006) data augmentation scheme. To deal with this serious issue we use the Pólya-Gamma data augmentation scheme of Polson et al. (2013) which has a significantly improved performance. The recent work of Holsclaw et al. (2017) confirms that using Pólya-Gamma data augmentation to parametrize the transition probabilities of NHHMMs results in an algorithm that mixes well and provides adequate estimates of the model parameters. Finally, we incorporate variable selection within our MCMC scheme by using the well-known model selection method of Green (1995), the reversible jump algorithm. Within our algorithm we perform a couple of reversible jump steps to allow for a different set of covariates to affect the mean equation and the transition probabilities.

Regarding stochastic variable selection in the transition matrix of the NHHMM we design a reversible jump step based on the Pólya-Gamma data augmentation representation for the logistic regression coefficients. Recently, Holsclaw et al. (2017) consider a NHHMM similar to ours for modeling multivariate meteorological time series data. In that paper, the transition probabilities are modeled via multinomial logistic regressions affected by a specific set of exogenous variables. The authors use the BIC criterion for choosing the best model among a specified class of models. We extend this work by considering the problems of statistical inference and variable selection jointly, in a purely Bayesian setting, and we describe the proposed methodology in the context of 2-state NHHMMs, noting though that it can be easily extended to the case of m-state models.

Apart from developing a stable algorithm for inferring NHHMMs in the presence of model uncertainty, another main goal of our work is to explore the advantages of applying the proposed models and methods to realized volatility prediction. Forecasting future volatility accurately is extremely important for asset allocation, portfolio and risk managemen. A vast amount of literature has investigated the relationship between volatility and macroeconomic and/or financial variables (see for example, Paye (2012); Christiansen et al. (2012); Meligkotsidou et al. (2018)

among others). The proposed NHHMMs are able to capture the nonlinear relationship between the logarithm of realized volatility and a set of predictors, as well as other special characteristics of the analyzed series, such as heteroscedasticity and autocorrelation.Our application shows that the proposed model outperforms benchmark alternatives, such as the Homogeneous HMM and the predictive regression model with autoregressive terms, in terms of forecasting ability. We argue that the logarithmic scoring rule for evaluating different models and their predictive accuracy is not appropriate, since the data have multimodal distribution. When data are multimodal, scoring rules that are sensible to distance are preferred (

Gneiting and Raftery (2007)). Besides logarithmic scoring rule gives harsh penalty for low probability events (Boero et al. (2011); Gneiting and Raftery (2007)) and prefers the forecast density that is less informative (Machete (2013)). We propose the use of Continuous Ranked Probability Score, which has gained a lot of interest in meteorological community (Grimit et al. (2006)), as a better alternative for assessing the quality of forecasts as well as for validating the model performance.

The paper proceeds as follows. In Section 2 we briefly describe the proposed model and in Section 3 we present analytically the Bayesian inference for this model, specifically for the model with a fixed number of predictors (covariates), as well as for the model with unknown number of predictors. Then, in Section 4, we present the forecasting criteria we used to asses the predictive ability of our method. To check our methodology we performed extensive experiments with simulated data. We provide in Section 5 some indicative case studies. We illustrate our results regarding the variable selection and forecasting evaluation and we make comparisons with benchmark models. Next, we apply our methodology to monthly realized volatility data set (Section 6). Finally, Section 7 concludes the paper.

2 The Non-Homogeneous Hidden Markov Model

In this section, we present the proposed Non-Homogeneous Hidden Markov Model (NHHMM) for modeling univariate time series. Consider an observed random process and a hidden underlying process which is a two-state non-homogeneous discrete-time Markov chain that determines the states of the observed process. Let and be the realizations of the observed random process and of the hidden process , respectively. We assume that at time , depends on the current state and not on the previous states. Consider also a set of available predictors with realization at time . A subset of the predictors of length is used in the regression model for the observed process and a subset of length is used to model/describe the dynamics of the time-varying transition probabilities. Thus, we allow the covariates to affect the observed process in a non-linear fashion.

In the case of a univariate random process , the NHHMM can be written in the form

where and . We use

to denote the normal distribution with mean

and variance

. In a less formal way, if represents the hidden states, the observed series given the unobserved process has the form

The dynamics of the unobserved process can be described by the time-varying transition probabilities, which depend on the predictors and are given by the following relationship

where

is the vector of the logistic regression coefficients to be estimated and

is the set of covariates that affect the transition probabilities. The unknown quantities of the NHHMM are , that is the parameters in the mean predictive regression equation and the parameters in the logistic regression equation for the transition probabilities of the unobserved process , . Our model and the methods developed in this paper can easily be generalized into an m-state NHHMM, where the rows of the transition matrix are modeled by multinomial logistic regressions.

3 Bayesian Inference and Computational Strategy

This section presents the Bayesian approach to inference for the Non-Homogeneous Hidden Markov model. The key steps in our proposed framework are the following. First, for a given Hidden Markov model with time-varying probabilities, we construct a Markov chain which has as a stationary distribution the posterior distribution of the model parameters. Simulation of this Markov chain provides, after some burn-in period and adequately many iterations, samples from the posterior distribution of interest; see, for details, Besag et al. (1995)

. Second, for a given set of competing models each including a different set of predictors in the mean regression and/or in the transition probabilities equation, we base our inference about the models on their posterior probabilities. Thus, we avoid the usual approach which considers the models separately and chooses the best model via significance tests or via model selection criteria.

3.1 Inference for fixed sets of predictors

Below we provide detailed guidelines on how to estimate the parameters of a given NHHMM, i.e. for fixed sets of predictors used in the mean equation and the transition probabilities and , respectively. The proposed approach to Bayesian inference on the parameters of the NHHMM is based on constructing a Markov chain Monte Carlo algorithm which updates, in turn, the mean regression parameters, the logistic regression coefficients, and the latent variables . Let be the history of the observed process, the sequence of states up to time , and let

denote the normal probability density function of

, and the initial distribution of . The joint likelihood function of the observed data, , and the unobserved sequence of states, , is given by

If a prior distribution is specified for the model parameters, then inference on all the unknown quantities in the model is based on their joint posterior distribution

For the parameters in the mean predictive regression equation, we use conjugate prior distributions, i.e.

where

denotes the Inverted-Gamma distribution. To make inference about the logistic regression coefficients we use the auxiliary variables method of

Polson et al. (2013) as described in Subsection 3.1.1. Given an auxiliary variable , a conjugate prior for the logistic regression coefficients , is multivariate normal distribution .

The joint likelihood of the observed data and the hidden states is

We use the notation for the number of times the chain was in state , that is , the with the indicator function.

The MCMC sampling scheme is constructed with recursive updates of (i) the latent variables given the current value of the model parameters by using the scaled Forward-Backward algorithm (Scott (2002)) (ii) the logistic regression coefficients by adopting the auxiliary variables method of Polson et al. (2013) given the sequence of states , and (iii) the mean regression coefficients conditional on by using the Gibbs sampling algorithm.

3.1.1 Simulation of the logistic regression coefficients

In a two-state NHHMM, as Meligkotsidou and Dellaportas (2011) observe, we can model the two diagonal elements of probability transition matrix by linking them to the set of covariates using a logistic link. However, the algorithm adopting the auxiliary representation of Holmes and Held (2006) in the method of Meligkotsidou and Dellaportas (2011) does not converge in some cases, especially if there exists model uncertainty in the transition probabilities. Many data-augmentation or Metropolis-Hastings algorithms are proposed to model the logistic regression model, see for example O’Brien and Dunson (2004); Fussl et al. (2013); Polson et al. (2013). We follow Polson et al. (2013) since as shown in their work, using Pólya-Gamma data augmentation gives superior results, it terms of convergence and mixing, among all the competing methods.

Given the unobserved (latent) data we define . In words is the number of times where the chain was at the same stage for two consecutive time periods. Then,

Polson et al. (2013)

proved that binomial likelihoods (thus Bernoulli likelihoods in our simpler case) parametrized by log odds can be represented as mixtures of Gaussian distributions with respect to Pólya-Gamma distribution. The main result of

Polson et al. (2013) is that letting be the density of a latent variable with for , the following integral identity holds for all :

where . Furthermore, the conditional distribution of is also Pólya-Gamma, . Using the previous result and setting as a set of latent variables the likelihood for each state is

Conditioning on , one can derive the proportion

Assuming as prior distributions and , simulation from the posterior distribution can be done iteratively in two steps:

where denotes the Pólya-Gamma distribution and .

3.1.2 Simulation of the mean equation parameters using the Gibbs algorithm

We used conditionally conjugate prior distributions on the parameters of the mean predictive distribution, i.e.

After some straightforward algebra we derive the conditional and the marginal posterior distribution for the state specific parameters and ,

with

3.2 Inference under model uncertainty

Here, we consider the full model comparison problem where the uncertainty about which predictors should be included in the mean regression model is taken into account together with the uncertainty about the predictors that should be included in the transition probability equation. The proposed model is flexible, since we do not decide a priori which covariates affect the observed or the unobserved process. Instead, we have a common pool of covariates and within the MCMC algorithm we gauge which covariates are included in subset affecting the mean predictive equation of the observed process, and which covariates are included in subset affecting the time-varying transition probabilities. It is worth mentioning that within the framework of the proposed model, a set of covariates implies that there are possible models, hence the model selection problem becomes complicated.

Different approaches have been used in the literature to cope with the model selection problem. The use of information criteria, such as the Akaike’s Information Criterion (AIC, Akaike et al. (1973)), the Bayesian Information Criterion (BIC) of Schwarz (1978) ,the Deviance Information Criterion (DIC, Spiegelhalter et al. (2002)) or the Widely applicable Bayesian Information Criterion (WBIC, Watanabe (2013)), is another approach to variable selection.

We propose a probabilistic approach to inference, which is based on the calculation of the posterior distribution of different NHHMMs or equivalently on computing the posterior probabilities of different hidden Markov models. Posterior probabilities can be used either for selecting the most probable model (i.e. making inference using the model with the highest posterior probability), or for Bayesian model averaging (i.e. producing inferences averaged over different NHHMM). Barbieri and Berger (2004) argue that the optimal predictive model is not necessarily the model with highest posterior probability but the median probability model, which is defined as the model consisting of those covariates which have overall posterior probability of being included in the model (inclusion probability) greater or equal to 0.5. Our method allows us to calculate the posterior probability of the model as well as the probabilities of inclusion. To this end, we develop a Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm (Green (1995),Green and Hastie (2009)) which explores the model space by jumping between different hidden Markov models. A more applied tutorial based on the lines of Green (1995), can be found Waagepetersen and Sorensen (2001). A study for comparing variable selection methods is well presented in O’Hara and Sillanpää (2009) whilst Dellaportas et al. (2002) study the variable selection methods in the context of model choice.

As Green and Hastie (2009) noticed, reversible jump MCMC is in fact a Metropolis-Hastings algorithm, formulated to allow sampling from a distribution on a union of spaces of differing dimension and to permit state-dependent choice of move type. Suppose that a prior is specified over models in a countable set and for each we are given a prior distribution along with a likelihood for data . The joint prior for and is and obviously the joint posterior distribution is

The standard formulation of the Metropolis-Hastings algorithm relies on the construction of a time-reversible Markov chain, that satisfies the detailed balance condition. This condition means that the probability that the state of a chain is in a general set and moves to a general set is the same with the probability that the state is in the set and moves to set . Note that this is a simple way to ensure that the limiting distribution of the chain is the desired target distribution. Define as the state and state space of chain. At the current state we generate random numbers from a known joint density . The new state of chain is constructed by some suitable deterministic function such that . The inverse move is made by generating from a suitable known density and then using the inverse function of to move from to . In practice, the construction of proposal moves between different models is achieved via the concept of ’dimension matching’. Specifically, if the dimensions of and are and respectively, then . If the move from to is accepted with probability then, the reverse move is accepted with probability . Under this formulation the detailed balance condition is

If the transformation from to and its inverse are differentiable then the equality holds if

where is the Jacobian of the transformation. Thus, the acceptance probability , for moving is

Holsclaw et al. (2017) recently introduced NHHMMs using the Pólya-gamma latent data method. As a model selection strategy, they use BIC values to choose the best model among a prespecified set of models. We apply a reversible jump algorithm to account model uncertainty. Due to the fact that in the proposed model there is uncertainty both in the mean predictive equation, as well as in the transition probability equation, we will have to perform two reversible jump steps within our MCMC algorithm. Let us now assume that in a reversible jump step we propose a move type from to . Let denote the probability that move is attempted at state and the probability of the reverse move attempted at state . We accept the proposed move with probability where

We adopt the proposal of Meligkotsidou and Dellaportas (2011) to implement the above algorithm. In each step, we choose to add or remove one covariate with probability and then we randomly choose which covariate we will add/remove. We propose a new value for the mean equation coefficients or for the regression equation coefficients from the full conditional posterior density, conditionally on the other coefficients, thus the Jacobian of transformation will be equal to unity. To be more specific, if we want to update the covariates in the mean equation, the proposal distribution is just the product of the two conditional posterior distributions for derived in Subsection 3.1.2 given and if we want to update the covariates that affect the transition matrix, the proposal distribution is the product of conditional normal distributions derived in Subsection 3.1.1 given for . With some straightforward matrix algebra, the acceptance probability for the mean equation is and the acceptance probability for the transition matrix is where

and

3.3 MCMC Sampling Scheme

In the next lines, we summarize the MCMC algorithm that we have constructed for joint inference on model specification and model parameters.

  1. Start with initial values of .

  2. Calculate the probabilities of time-varying transition matrix for .

  3. Run a Scaled Forward-Backward (Scott (2002)) algorithm to simulate the hidden states given the parameters of the model.

  4. Simulate the parameters of the regression model for the mean via a Gibbs sampler method.

  5. Simulate the coefficients using the Pólya-gamma representation by Polson et al. (2013).

  6. Do a double reversible jump algorithm to define which covariates will affect the transition matrix and which covariates will affect the mean regression model.

  7. Make one-step-ahead predictions conditional on the simulated unknown quantities.

Repeat steps 3-6 until convergence and then repeat steps 3-7.

4 Bayesian Forecasting and Scoring rules

4.1 One-step-ahead predictions

The proposed modeling and inferential approach is used for forecasting. The posterior predictive density can not be found in closed form, but it can instead be evaluated numerically. Given model

, the predictive distribution of is

where

In practice we follow an iterative procedure within the MCMC algorithm to draw a sample from the posterior predictive distribution. At the -th iteration of our algorithm, the algorithm chooses model . Also, the hidden states and the unknown parameters are simulated as described Subsection 3.1. Let be the hidden state is at time . To make an one-step-ahead prediction (i.e. simulate ), we first simulate the hidden state for time , from the discrete distribution based on the transition probabilities and then conditional on the hidden state we draw a value from

Given , the hidden state and the covariates , we may also update the transition matrix , simulate and then simulate the prediction from its respective predictive distribution.

4.2 Forecasting criteria

An issue of major importance in forecasting experiments is to evaluate the quality of the obtained forecasts. Moreover, one way to evaluate a model under consideration is through the accuracy of its forecasts (Geweke and Whiteman (2006)). As noted by Gelman et al. (2014), predictive accuracy is valued not only for its own sake but also for comparing different models. Advances in numerical integration via MCMC algorithms made probabilistic forecasts possible, which are in most cases, preferable. Besides, having the posterior predictive distribution, one can obtain point effective forecasts, using suitable scoring functions (Gneiting (2011)).

Scoring rules provide summary measures for the evaluation of probabilistic forecasts, by assigning a numerical score based on the forecast and on the event or value that it materializes. We refer to Gneiting and Raftery (2007) for a review on the theory and properties of scoring rules. If the forecaster quotes the predictive distribution and the event materializes, then his reward is . If is the forecaster judgment then is the expected value of his reward under . A scoring rule is said to be proper if and if the equality holds if and only if , then the rule is strictly proper. Intuitively, proper scoring rules encourage a forecaster to report the truth about his judgment distribution and they are strictly proper is the issued forecasts are the same with the forecaster’s judgment.

There are various proper scoring rules for continuous variables, such as the probabilistic score, the Linear Score (LinS), the Quadratic Score (QS), the Logarithmic Score (LogS) and the Continuous Ranked Probability Score (CRPS), among others (Machete (2013); Gneiting and Raftery (2007)

). Probabilistic and Linear score are often used, nevertheless they are improper and can have misleading results. A widely used and quite powerful criterion is the Logarithmic Score. It is based on the posterior predictive density evaluated at the observed value and there is a strong connection with the classical Kullback-Leibler divergence, which is one of the reasons for being studied quite intensively (see for example

Gelman et al. (2014), Gschlößl and Czado (2007)). The Logarithmic Score is defined as follows: Let be the total sample size and the number of out-of-sample forecasts. For let be the real observed values of the forecasts, the estimated forecasts and the unknown quantities. Finally, let be the number of draws of the simulation. Using the notation for the posterior predictive density of the new data, for the distribution of the true model and for the likelihood of the data, the predictive fit for the -th observation would be,

Having in mind though that the new data are unknown Gelman et al. (2014) defined the expected out-of-sample log predictive density for the -th observation as

To compute the predictive density for estimated forecasts in practice, since is unknown, we shall calculate

Although the Logarithmic Score is a strictly proper rule, it lacks robustness as it involves harsh penalty for low probability events and thus is sensitive to extreme cases (Boero et al. (2011)). Besides, comparing the entropies of the forecasts, Machete (2013) showed that LS prefers the forecast density that is less informative. As Gneiting and Raftery (2007) notice, measures which are not sensitive to distance give no credit for assigning high probabilities to values near but not identical to the one materializing. Sensitivity to distance seems desirable when predictive distributions tend to be multimodal, which is the case for our model. To deal with this, one could calculate the Continuous Ranked Probability Score which is based on the cumulative predictive distribution. Recently, Boero et al. (2011) made a comparison between LS, CRPS and QS and they argued that when density forecasts are collected in histogram format, then the ranked probability score has advantages over the other scoring rules. Note than when the forecast is deterministic, CRPS is reduced to the Mean Absolute Forecast Error and therefore it is easily interpretable and provides a direct way for comparing various deterministic and probabilistic forecasts.

The posterior predictive cumulative density function is defined as , where is the forecast of interest and y is the history of the predictive quantity. Using the same notation as in Logarithmic score, the CRPS for the real observed value is defined as,

where denotes a step function along the real line that attains the value 1 if and the value 0 otherwise, is the cumulative distribution of the real value and is the known Heaviside function (Hersbach (2000))

As described above, in our setting, at each iteration of the MCMC algorithm, we obtain a sample of length from the predictive distributions of

. Hence we can evaluate both the posterior predictive probability and cumulative distribution function. Instead, there is an easier way of evaluating CRPS using the identity of

Székely and Rizzo (2005).

were

are independent copies of a random variable with distribution function

(see also Gschlößl and Czado (2007)).

Finally, for the sake of completeness we estimate the mean, mode and median of the predictive distribution and compare them using two well known point forecasting criteria, the Mean Square Forecast Error, and the Mean Absolute Forecast Error, . Based on the MCMC sample, the computed point forecasting criteria is just the average score of MSFE or MAFE for all the sample values.

5 Simulation Study

We have conducted a series of simulation experiments to assess the performance of the proposed approach in terms of inference, model selection and predictive ability. We have tested quite extensively our algorithms, accounting for model uncertainty or not, using different sample sizes and assigning various values to the parameters. We compared our model with a Homogeneous Hidden Markov model and a linear model with autoregressive terms (LR). Although considered standard, for the sake of completeness, we give the definitions of HHMM and LR models. In the Homogeneous Hidden Markov model, covariates affect the mean equation but the transition probability matrix is constant. That is,

In the linear regression with autocovariates, besides covariates

, lagged values of are assumed to affect the mean equation. Specifically,

The data were generated either from a Homogeneous Hidden Markov model (HHMM) or from a NHHMM model with covariates simulated from independent normal distributions. We found that the mean equation parameters converge rapidly whereas the logistic regression coefficients needed some burn in period to converge. The hidden chain

was well estimated. For each iteration we kept a replication of the hidden chain (thus the number of the hidden-chain replications is the same with the MCMC sample size) and compared it with the real simulated hidden chain, using a 1-0 loss function. Also the reversible jump algorithm behaved really well in all of our simulations. Even in the case where the data were homogeneous, no covariates were selected in the logistic regression equation, thus the transition probability matrix remained constant through time. Furthermore, in order to test the predictive ability of our model we kept

out-of-sample observations and we calculated the CRPS for every observation. In all our experiments we found that our model outperforms the competing models in forecasting the observed process . A summary of our findings is presented and discussed in the following subsections.

5.1 The fixed model

Below, we present two experiments with simulated data from the fixed NHHMM. For all our experiments we used non-informative priors for the unknown parameters , that is , and finally . Inferences are based on an MCMC sample of 15000 iterations after a burn in period of 10000 iterations. In the first experiment we used a sample of observations. From a common pool of covariates we chose to affect the mean equation and to affect the transition matrix. The predictors, as already mentioned, were simulated from independent normal distributions with means and variance . The mean equation parameters were and whereas the logistic regression coefficients where and for states respectively. Secondly, we run another experiment with sample size . From a common pool of independently normally distributed covariates with means and variance we used 3 covariates affecting the mean equation and the transition matrix. The mean equation parameters were and whereas the logistic regression coefficients where and for states respectively. For both experiments, we kept 10 out-of-sample observations and we computed a sequence of one-step-ahead forecasts of the real observed process. Our algorithm gives accurate estimations and has excellent convergence performance. Also, comparing our forecasts with the forecasts of the two aforementioned simpler models, we found that our methodology outperforms the two models. For completeness we present in Section 8 various plots that confirm the estimation performance and convergence of our methodology. Parameter estimates and summary statistics are presented in Table 7 and a comparison of the three competing models in terms of forecasting ability in Table 8 for the first experiment and in Tables 10 and 9 for the second experiment study.

Figure 1: Experiment 1. The upper panel shows a comparison between the actual (simulated) transition probabilities and the corresponding estimated transition probabilities. True probability values are marked with red dots whereas the posterior mean estimated transition probabilities are marked with blue crosses. The lower panel plots the difference between actual transition probabilities and the estimated ones.
Figure 2: Forecasts of experiment 1. Empirical continuous approximation of the posterior predictive distribution (based on a normal kernel function) using the NHHMM (blue continuous line) and the HHMM (red dotted line). Actual out-of-sample values are marked with yellow asterisks.
Figure 3:

Experiment 1. Quantiles of the predictive distribution. The ten one-step-look-ahead actual values (as taken from the sample) are marked with dashed blue line. The gray line represents the median (the gray dots represent the point forecasts) of the predictive distribution and the gray area shows how the 95

credible interval (CI) of forecasts evolves in time. The CI at each point is defined by the and quantiles of the respective predictive distributions. The red dots represent the median values of the predictive distribution using the HHMM.
Figure 4: Experiment 2. The upper panel shows a comparison between the actual (simulated) transition probabilities and the corresponding estimated transition probabilities. True probability values are marked with red dots whereas the posterior mean estimated transition probabilities are marked with blue crosses. The lower panel plots the difference between actual transition probabilities and the estimated ones.
Figure 5: Forecasts of experiment 2. Empirical continuous approximation of the posterior predictive distribution (based on a normal kernel function) using the NHHMM (blue continuous line) and the HHMM (red dotted line). Actual out-of-sample values are marked with yellow asterisks.
Figure 6: Experiment 2. Quantiles of the predictive distribution. The ten one-step-look-ahead actual values (as taken from the sample) are marked with dashed blue line. The gray line represents the median (the gray dots represent the point forecasts) of the predictive distribution and the gray area shows how the 95 credible interval (CI) of forecasts evolves in time. The CI at each point is defined by the and quantiles of the respective predictive distributions. The red dots represent the median values of the predictive distribution using the HHMM.

5.2 Model selection

In this section we present two simulation experiments (experiments 3 and 4) regarding the problem of variance selection. We compared results from the three models (NHHMM, HHMM, LR) subject to model uncertainty. We performed stochastic variable selection with reversible jump for all the competing models. We used the same non-informative priors and burn-in period, as in Subsection 5.1.

For the first experiment we simulated data of size <