Inference with Hamiltonian Sequential Monte Carlo Simulators

12/19/2018 · by Remi Daviet

The paper proposes a new Monte Carlo simulator combining the advantages of Sequential Monte Carlo simulators and Hamiltonian Monte Carlo simulators. The result is a method that is robust to multimodality and complex shapes, suitable for inference in the presence of difficult likelihoods or target functions. Several examples are provided.


1 Introduction

Econometrics has traditionally relied on methods based on the optimization of a target function to perform inference. Such methods include the widespread Maximum Likelihood Estimators (MLE), the Generalized Method of Moments (GMM) and Least Squares Estimators (LSE). These methods are especially effective when the target function is unimodal and the optimization converges rapidly to a point.

Recent developments in economic modeling have led to the emergence of more complicated target functions where multimodality, complex shapes and slow convergence make traditional inference approaches ineffective. Numerical maximization methods such as Nelder-Mead, Newton-Raphson or Expectation Maximization do not guarantee convergence toward a global maximum [Gourieroux and Monfort, 1995]. In addition, these methods provide a point estimate and do not give information about the shape of the target function.

These more complicated target functions are found in almost every subfield of economics: multimodal likelihoods can result, for instance, from DSGE models in macroeconomics [Herbst and Schorfheide, 2014], GARCH models in finance [Doornik et al., 2000], BLP models in industrial organization [Bajari, 2003], or spatial linear models in urban economics [Mardia and Watkins, 1989]. In fact, with many non-linear models, likelihood functions are non-smooth and multimodal [Koop and Potter, 1999]. A similar observation can be made for models with structural breaks and outliers [Koop and Potter, 2000]. Various attempts have been made to mitigate the problem, generally relying on Bayesian or quasi-Bayesian methods using Markov Chain Monte Carlo (MCMC) or Sequential Monte Carlo (SMC) simulations.

MCMC simulation has the advantage of being able to recover more information about a target function than optimization algorithms [Chernozhukov and Hong, 2003]. The only restriction is that the target function has to be positive; in the case of a non-positive target function, the exponential of the function can be taken and treated as a quasi-likelihood. The draws from the simulation provide a Monte Carlo approximation of the distribution of interest, facilitating inference. Applications of this method can be found in papers across all subfields of economics. For instance, Herbst and Schorfheide have proposed several Monte Carlo methods to solve the estimation problems of macroeconomic DSGE models [Herbst and Schorfheide, 2014]. Similar approaches appear in industrial organization for demand estimation models [Jiang et al., 2009] and in time series analysis [Burda, 2015].

The key problem with traditional MCMC simulators using Metropolis-Hastings or Gibbs sampling is that they do not necessarily behave well under multimodality, concentrated mass or complex shapes. The Markov chain used to simulate the distribution can get trapped in one of the modes if it is not close enough to the other modes and no probability mass lies between them. When facing complex concentrated shapes, these simulators perform poorly at exploring the space of interest because of a high rejection rate when trying to move away from the current point in the chain. Some specific simulators have shown interesting properties with respect to these problems. The families of Population Monte Carlo (PMC) and, more generally, SMC simulators address multimodality by replacing the single Markov chain with a large set of particles exploring the parameter space [Cappe et al., 2004, Del Moral et al., 2006, Durham and Geweke, 2013]. The sequential approach of SMC also partly solves the problem of concentrated mass by allowing the target function to converge progressively toward its final form [Chopin, 2002]. MCMC simulators using Hamiltonian dynamics, sometimes referred to as Hamiltonian Monte Carlo (HMC) simulators, have been shown to be very effective at exploring the parameter space when the target function has a complex elongated shape or isolated concentrated mass.

Our contribution is to provide a method that is robust to most types of multimodality and complex shapes by combining the advantages of SMC and HMC. Moreover, we implement a kernel-based resampling method to improve robustness and efficiency. While most econometrics papers describing MCMC or SMC methods are set in a Bayesian framework, we keep the description general. Using our method, practitioners should be able to perform inference without having to worry about the shape or complexity of their target function.

The advantages of recovering the full shape of the target function are multiple. Multimodality can easily be identified, concentration can be measured, and counterfactual checking or prediction can be done by integration over the parameter space. Moreover, the Monte Carlo approach allows easy variable transformation without the need to derive a complicated Jacobian.

Our method can also be used for maximization of a complicated function by simulated annealing: the target quasi-likelihood is of the form exp{λ f(θ)}, where λ is chosen by the practitioner. The quasi-likelihood becomes concentrated on the maximizers of f as λ → ∞ [Hwang, 1980]. The use of SMC methods for optimization via simulated annealing has already been detailed and proven effective [Zhou and Chen, 2013].

We first present a brief review of the various uses of statistical simulation. We then describe three common simulation methods (Metropolis-Hastings, HMC and SMC) and compare their performance using simple examples. We finally introduce our HSMC method, establish its properties and present several simple examples as applications.

2 Statistical simulation

Statistical simulation generates samples following a distribution with density π in order to approximate various quantities that depend on that distribution. These quantities are usually moments, modes or quantiles.

To compute population moments, we can use the sample moments, which converge under a Law of Large Numbers to the desired quantities under weak regularity assumptions. For a continuous function g and a Euclidean random vector θ with density π, given draws θ_1, ..., θ_N from π we have, almost surely as N → ∞:

    (1/N) Σ_{i=1}^{N} g(θ_i) → E_π[g(θ)] = ∫ g(θ) π(θ) dθ
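As an illustration of how such Monte Carlo approximations are computed in practice, here is a minimal Python sketch (not from the paper); the draws and the function g are placeholders.

    import numpy as np

    def mc_expectation(draws, g):
        # Approximate E[g(theta)] by the sample mean of g over the simulated draws.
        return np.mean([g(theta) for theta in draws], axis=0)

    # Stand-in for simulator output: 10,000 draws of a 2-dimensional parameter.
    rng = np.random.default_rng(0)
    draws = rng.normal(size=(10_000, 2))
    mean_est = mc_expectation(draws, lambda th: th)           # approximates the mean vector
    second_moment = mc_expectation(draws, lambda th: th**2)   # approximates componentwise second moments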

There are several ways to test for the existence of several modes, count them and find their location. A review of standard kernel-based methods can be found in Vieu [1996]. The most popular methods, based on critical kernel bandwidths, are presented in Silverman [1981] and Minnotte [1997]. Another popular approach, the excess mass approach, can be found in several other papers [Müller and Sawitzki, 1991, Butucea et al., 2007]. The goal of this paper is not to treat mode estimation methods in detail, and we leave further reading to the interested reader.

Transformations of the target distribution can be computed directly by transforming the resulting draws, without the need for analytical derivation or approximation of the Jacobian term, which is in many cases very complicated.

3 Common simulation methods

As it is often impossible to draw directly from a distribution, several methods have been developed to draw a sample that approximately follows the target distribution. The researcher only needs to be able to evaluate a kernel of the target density function.

The Metropolis-Hastings (MH) algorithm and the HMC are both Markov Chain Monte Carlo (MCMC) methods. Consider a particle θ_t whose sequence of positions seeks to approximate an underlying target function of interest f. The generic principle of an MH or HMC algorithm is as follows. The particle randomly moves to a proposed position θ* in the space. After each move, the new position is accepted with some probability α and we set θ_{t+1} = θ*; if the move is rejected, θ_{t+1} = θ_t. The passage from θ_t to θ_{t+1} is called an MCMC step. After a series of initial steps called the burn-in period, the Markov chain should reach its equilibrium distribution, which corresponds to the target distribution. We can then use the history of the positions of the particle as a sample approximating the target distribution. The difference between MH and HMC lies in the way the random move is made and in the formula used to compute the acceptance probability at the end of the move.

With SMC methods, particles are moved simultaneously using MCMC steps and resampled at each iteration. These methods have the advantage of being able to recover the shape of multimodal distributions with separate areas of concentrated mass. Such distributions usually cannot be recovered properly using standard MCMC methods, as the Markov chains get trapped in one of the modes and cannot move to another mode if no probability mass connects them.

3.1 Metropolis-Hastings

The Metropolis-Hastings algorithm [Metropolis et al., 1953, Hastings, 1970] is the most common accept-reject MCMC method for approximating a target density proportional to a function f. Starting from an arbitrary point θ_0, at each iteration a candidate θ* is drawn from a proposal distribution with density q(θ* | θ_t); for instance, this distribution could be a Gaussian N(θ_t, σ²I). Then we set θ_{t+1} = θ* with probability α = min{1, [f(θ*) q(θ_t | θ*)] / [f(θ_t) q(θ* | θ_t)]}, and θ_{t+1} = θ_t otherwise.

After an initial burn-in period, the values θ_t should be approximately distributed following the normalized target density.
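To make the mechanics concrete, the following is a minimal random-walk Metropolis-Hastings sketch in Python. It is an illustration rather than the paper's implementation; the Gaussian proposal scale sigma and the example target are assumptions.

    import numpy as np

    def mh_sampler(log_f, theta0, sigma, n_iter, rng=None):
        # Random-walk Metropolis-Hastings targeting a density proportional to exp(log_f).
        rng = np.random.default_rng(rng)
        theta = np.asarray(theta0, dtype=float)
        chain = np.empty((n_iter, theta.size))
        log_f_curr = log_f(theta)
        for t in range(n_iter):
            proposal = theta + sigma * rng.standard_normal(theta.size)
            log_f_prop = log_f(proposal)
            # With a symmetric Gaussian proposal the acceptance ratio reduces to f(proposal)/f(current).
            if np.log(rng.uniform()) < log_f_prop - log_f_curr:
                theta, log_f_curr = proposal, log_f_prop
            chain[t] = theta
        return chain

    # Illustrative target: a standard bivariate normal (log-density up to a constant).
    chain = mh_sampler(lambda th: -0.5 * np.sum(th**2), theta0=[3.0, 3.0], sigma=0.5, n_iter=5000)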

3.2 Hamiltonian Monte Carlo

As with MH, the purpose of the HMC method is to formulate a Markov chain for which, under certain conditions, a multiple of f is the density of the stationary distribution. It relies on Hamiltonian dynamics to move a particle to a new point in the space. This new point, called the proposal, is then accepted or rejected as the new value θ_{t+1} following a rejection step similar to that of Metropolis-Hastings. The movement in the space can be constrained, and we then refer to the method as Constrained Hamiltonian Monte Carlo (CHMC).

To describe the HMC step, we need to define the potential energy function U(θ) = -log f(θ), with gradient denoted ∇U(θ).

We also need to define an auxiliary vector m of the same dimension as θ that represents the momentum of the particle when moving according to Hamiltonian dynamics. Intuitively, the particle moves on a surface where the potential energy, depending on the altitude, is represented by U(θ). When the particle goes up a slope, it slows down or even turns back. The continuous movement is approximated by a series of L leapfrog steps of size ε. At the end of the last step, the momentum is reversed to make the proposal symmetric: if we start at the final position with the final reversed momentum, the particle returns to its original position after L steps. This ensures reversibility and facilitates the computation of the acceptance probability.

The HMC step proceeds as follows:

  1. define the starting position of the proposal: θ* = θ_t

  2. draw an initial momentum vector m_0 from a multivariate Gaussian distribution: m_0 ~ N(0, M), and set m = m_0

  3. update the momentum vector by half a step taking the gradient into account: m = m - (ε/2) ∇U(θ*)

  4. repeat for l = 1, ..., L:

    1. update the position by a full step: θ* = θ* + ε M^{-1} m

    2. update the momentum by a full step, except at the end of the trajectory: if l < L, then m = m - ε ∇U(θ*)

  5. update the momentum vector by half a step: m = m - (ε/2) ∇U(θ*)

  6. negate the momentum vector: m = -m

  7. compute the acceptance probability: α = min{1, exp[U(θ_t) - U(θ*) + (1/2) m_0' M^{-1} m_0 - (1/2) m' M^{-1} m]}

  8. set θ_{t+1} = θ* with probability α, and θ_{t+1} = θ_t otherwise

The HMC mutation step needs to be tuned by choosing the quantities ε, L and the mass matrix M appropriately. To learn more about how to choose these quantities, see the review paper by Neal [2011].
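The following is a minimal Python sketch of the leapfrog-based HMC step above, assuming an identity mass matrix M; log_f and grad_log_f stand for the log target and its gradient and are supplied by the user.

    import numpy as np

    def hmc_step(theta, log_f, grad_log_f, eps, n_leapfrog, rng=None):
        # One HMC step targeting a density proportional to exp(log_f), with U(theta) = -log_f(theta)
        # and an identity mass matrix.
        rng = np.random.default_rng(rng)
        m0 = rng.standard_normal(theta.size)
        th = theta.copy()
        m = m0 + 0.5 * eps * grad_log_f(th)           # half momentum step (grad U = -grad log_f)
        for l in range(n_leapfrog):
            th = th + eps * m                         # full position step
            if l < n_leapfrog - 1:
                m = m + eps * grad_log_f(th)          # full momentum step except at trajectory end
        m = -(m + 0.5 * eps * grad_log_f(th))         # final half step, then negate for symmetry
        # Metropolis correction based on the change in the Hamiltonian H = U + |m|^2 / 2.
        log_accept = (log_f(th) - 0.5 * m @ m) - (log_f(theta) - 0.5 * m0 @ m0)
        if np.log(rng.uniform()) < log_accept:
            return th, True
        return theta, False

    # Example on a standard bivariate normal target:
    # hmc_step(np.array([3.0, 3.0]), lambda th: -0.5 * np.sum(th**2), lambda th: -th, eps=0.1, n_leapfrog=20)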

When we want to put constraints on some of the dimensions of θ, we can modify the HMC step so that the particles bounce off the constraints as if they were walls. Most of the time, the constraints are of the form θ_j ≥ l_j or θ_j ≤ u_j for some dimensions j of θ. The position updating step 4a is then replaced by:

  1. for each dimension j of θ:

    1. update the position in dimension j: θ*_j = θ*_j + ε (M^{-1} m)_j

    2. if dimension j is constrained, repeat the following until θ*_j satisfies all constraints:

      1. if θ*_j < l_j, then set θ*_j = 2 l_j - θ*_j and m_j = -m_j

      2. if θ*_j > u_j, then set θ*_j = 2 u_j - θ*_j and m_j = -m_j

With this modified approach, if the particle passes a constraint "wall" during a position update, the reflection of the particle about the wall is taken, and the momentum in the constrained dimension is reversed. This simulates the particle bouncing off the wall and preserves reversibility. A similar approach can be used for more complex constraints of the form g(θ) ≥ 0.
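The sketch below illustrates the reflective position update for box constraints described above; the lower and upper bound arrays are hypothetical and may contain -inf or +inf for unconstrained dimensions.

    import numpy as np

    def reflect_position_update(theta, momentum, eps, lower, upper):
        # Full leapfrog position step (identity mass matrix) with reflection off the box [lower, upper],
        # mirroring the modified step 4a above.
        theta = theta + eps * momentum
        momentum = momentum.copy()
        for j in range(theta.size):
            # Reflect until coordinate j is back inside its bounds, reversing its momentum at each bounce.
            while theta[j] < lower[j] or theta[j] > upper[j]:
                if theta[j] < lower[j]:
                    theta[j] = 2 * lower[j] - theta[j]
                else:
                    theta[j] = 2 * upper[j] - theta[j]
                momentum[j] = -momentum[j]
        return theta, momentum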

3.3 Sequential Monte Carlo

As mentioned before, SMC methods use multiple particles moving in parallel [Del Moral et al., 2006]. While there are many different types of SMC, we describe one of the most popular approaches, alternating importance resampling and MH steps. The goal is to obtain a sample from a sequence of distributions with densities π_1, ..., π_T. The simulator we propose works best when the target densities in the sequence are smoother at the beginning and progressively converge toward the final, sharper target density. We also require an initial distribution with density π_0 that is easy to sample from and covers well the mass of the first distribution in the sequence. An SMC simulator provides, for each step t, a set of N values that approximately follow the distribution π_t.

A sequence of distributions can be found in many applications relevant to economics. In frequentist econometrics, the sequence can be the likelihood or quasi-likelihood for the data collected up to time t. The quasi-likelihood can be built from any estimator maximizing or minimizing a target function, such as the Generalized Method of Moments or a Least Squares Estimator [Chernozhukov and Hong, 2003]. The counterpart in the Bayesian framework would be the posterior distribution of the parameter given the data up to time t. Compared to a standard MCMC approach, which requires an evaluation of the target function with every observation at each step, the SMC approach is less computationally intensive. Moreover, adding the observations one or a few at a time creates a desirable tempering effect [Chopin, 2002]. This approach is particularly efficient for large datasets where new observations arrive regularly, as updating the estimator can then be done in one step.

A second application is kernel density estimation when observations are added a few at a time and the bandwidth is progressively shrunk. An application of this approach can be found later in this paper.

Another application, proposed by Neal [2001], shows the benefits of moving progressively from a tractable distribution π_0 to a target distribution π by geometric reweighting: π_t(θ) ∝ π_0(θ)^{1-β_t} π(θ)^{β_t}, with β_t increasing from 0 to 1.

Finally, SMC can be used for maximization using a simulated annealing approach. The sequence of distributions then becomes π_t(θ) ∝ exp{λ_t f(θ)}, with λ_t increasing to high values as t increases.
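A small sketch of how such an annealed sequence of quasi-likelihoods could be set up follows; the objective f and the ladder of λ values are illustrative assumptions, not the paper's settings.

    import numpy as np

    def annealed_log_targets(f, lambdas):
        # Log quasi-likelihoods log pi_t(theta) = lambda_t * f(theta) for an increasing ladder of lambdas.
        return [lambda theta, lam=lam: lam * f(theta) for lam in lambdas]

    # Illustrative objective with maximizers at theta = -2 and theta = 2; as lambda grows,
    # the quasi-likelihood concentrates on these maximizers.
    f = lambda theta: -0.1 * (theta**2 - 4.0)**2
    log_targets = annealed_log_targets(f, np.geomspace(0.1, 100.0, num=10))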

The algorithm is as follows:

  1. Initialization: Draw N particles θ_0^(i) from the initial distribution π_0

  2. Repeat for t = 1, ..., T:

    1. Correction: assign to each particle θ_{t-1}^(i) the weight w^(i) ∝ π_t(θ_{t-1}^(i)) / π_{t-1}(θ_{t-1}^(i))

    2. Selection: draw N new particles with replacement from the current sample of particles using the weights w^(i). Give the new particles a weight of 1/N.

    3. Mutation: For each particle, perform an MH step as described in section 3.1 targeting π_t to obtain the new sample of particles θ_t^(i).
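To connect the three phases, here is a compact Python sketch of this SMC scheme with a single random-walk MH mutation per stage; the proposal scale and the example bridging sequence are assumptions rather than the paper's settings.

    import numpy as np

    def smc(log_init, init_sampler, log_targets, n_particles, mh_scale, rng=None):
        # Alternates importance reweighting (correction), multinomial resampling (selection)
        # and one random-walk MH step per particle (mutation).
        rng = np.random.default_rng(rng)
        particles = init_sampler(rng, n_particles)
        log_prev = np.array([log_init(p) for p in particles])
        for log_pi in log_targets:
            # Correction: weights proportional to pi_t / pi_{t-1} at the current particles.
            log_curr = np.array([log_pi(p) for p in particles])
            logw = log_curr - log_prev
            w = np.exp(logw - logw.max())
            w /= w.sum()
            # Selection: resample with replacement; every particle then carries weight 1/N.
            particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
            # Mutation: one random-walk MH step per particle targeting pi_t.
            log_p_curr = np.array([log_pi(p) for p in particles])
            proposals = particles + mh_scale * rng.standard_normal(particles.shape)
            log_p_prop = np.array([log_pi(p) for p in proposals])
            accept = np.log(rng.uniform(size=n_particles)) < log_p_prop - log_p_curr
            particles = np.where(accept[:, None], proposals, particles)
            log_prev = np.where(accept, log_p_prop, log_p_curr)
        return particles

    # Illustrative run: temper from a broad normal toward a bimodal target.
    log_p0 = lambda th: -0.5 * np.sum(th**2) / 25.0
    log_pi = lambda th: np.logaddexp(-0.5 * np.sum((th - 3.0)**2), -0.5 * np.sum((th + 3.0)**2))
    targets = [lambda th, b=b: (1 - b) * log_p0(th) + b * log_pi(th) for b in np.linspace(0.1, 1.0, 10)]
    sample = smc(log_p0, lambda rng, n: 5.0 * rng.standard_normal((n, 2)), targets, 1000, mh_scale=0.5)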

3.4 Examples

We now illustrate the advantages of the various algorithms with simple examples. The first example is Rosenbrock's banana function and illustrates the problems classical MH faces when the target has an elongated shape. The second example is a 6-dimensional normal distribution, designed to show the difference in convergence speeds when dimensionality increases.

3.4.1 Banana function

Rosenbrock's banana function, used here as the target density, is shown in Figure 1.

Figure 1: Contour plot of the Rosenbrock function

Using the MH algorithm, we run two trials with different proposal standard deviations σ. With the smaller σ, the acceptance rate is high but the chain fails to cover the distribution after 1000 iterations. With the larger σ, the coverage improves but the acceptance rate drops substantially. A visual representation of the paths is provided in Figure 2.

Using the HMC approach, with an appropriate step size ε and number of leapfrog steps L, the target density is well covered while the acceptance rate remains high. A visual representation of the path is provided in Figure 3.

Figure 2: Metropolis-Hastings paths for a Rosenbrock function
Figure 3: Hamiltonian Monte-Carlo path for a Rosenbrock function
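For readers who want to reproduce a comparison of this kind, the following gives a commonly used Rosenbrock-style banana log-density and its gradient as stand-ins (the paper's exact parameterization was not recoverable); they can be plugged into the mh_sampler and hmc_step sketches above.

    import numpy as np

    def log_banana(th, a=1.0, b=5.0):
        # Unnormalized log-density with a curved, elongated "banana" ridge along y = x^2.
        x, y = th
        return -((a - x)**2 + b * (y - x**2)**2)

    def grad_log_banana(th, a=1.0, b=5.0):
        x, y = th
        return np.array([2.0 * (a - x) + 4.0 * b * x * (y - x**2),
                         -2.0 * b * (y - x**2)])

    # e.g. compare mh_sampler(log_banana, [0.0, 0.0], sigma=0.3, n_iter=1000)
    # against repeated hmc_step calls using grad_log_banana.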

3.4.2 6-dimensional normal distribution

Using a 6-dimensional normal distribution as the target, we compare the performance of the MH and HMC approaches, starting both chains at a point located far from the mass of the target. The proposal standard deviation σ for the MH method is chosen to guarantee a minimum acceptance rate.

In the case of the MH approach, the chain requires a burn-in of about 800 iterations to reach the mass of the target density. With the HMC approach, only 50 iterations are required for the chain to converge. A visual representation of the paths is provided in Figure 4.

Figure 4: Comparison of Metropolis-Hastings path (Top) and Hamiltonian Monte-Carlo path (Bottom) for a 6-dimensional normal distribution.

4 Hamiltonian Sequential Monte Carlo

In this section, we describe our method and its properties. The HSMC method replaces the MH step of standard SMC with a Hamiltonian step. The resampling is also done using a leave-one-out kernel approximation of the observed distribution of the particles instead of the theoretical distribution, since in some cases the particles do not have time to converge to their stationary distribution in one step. This kernel-based approach shows good convergence rates when applied to importance resampling [Delyon et al., 2016].

4.1 Algorithm

  1. Initialization: Draw N particles θ_0^(i) from the initial distribution π_0

  2. Repeat for t = 1, ..., T:

    1. Correction: assign to each particle θ_{t-1}^(i) the weight w^(i) ∝ π_t(θ_{t-1}^(i)) / q_{-i}(θ_{t-1}^(i)), where q_{-i} is a "leave-one-out" kernel density estimate of the particle distribution computed without particle i.

    2. Selection: draw N new particles with replacement from the current sample of particles using the weights w^(i). Give the new particles a weight of 1/N.

    3. Mutation: For each particle, perform a Hamiltonian step as described in section 3.2 targeting π_t to obtain the new sample of particles θ_t^(i).

In the initialization phase we obtain a sample of N particles distributed according to the initial distribution π_0. For the HSMC method to perform well, we need an initial distribution that covers the whole space well and has mass wherever the other distributions in the sequence have mass.

In the loop, before the correction phase, we have N equally weighted particles that provide a Monte Carlo approximation of the distribution π_{t-1}. In the correction phase, we reweight them to obtain an approximation of the distribution π_t by importance sampling.

In the selection phase, we perform sampling importance resampling to obtain an approximation of the distribution π_t using N particles with equal weights. Note that at the end of the selection phase, we can expect several particles to share the same value.

Finally, in the mutation phase, we explore the space by moving the particles using a Hamiltonian Monte Carlo (HMC) step. Since π_t is the stationary distribution of our HMC kernel, the particles both before and after the HMC step should approximate π_t. However, if the particles do not follow the distribution π_t exactly, performing an HMC step should improve the approximation by the convergence properties of HMC [Neal, 2011].

The main contribution of the paper lies in the use of Hamiltonian dynamics for the mutation phase of an SMC method. In the literature, SMC algorithms generally use a standard Metropolis-Hastings step in their mutation phase. Conversely, HMC methods using multiple particles do not have the resampling phases 2a/2b.
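Below is a minimal Python sketch of the HSMC loop just described. It is not the authors' implementation; the Gaussian-kernel bandwidth, the identity mass matrix and the leapfrog settings are assumptions.

    import numpy as np

    def loo_kde_log_density(particles, bandwidth):
        # "Leave-one-out" Gaussian kernel density estimate of the particle cloud, evaluated at each particle.
        n, d = particles.shape
        diff = particles[:, None, :] - particles[None, :, :]
        logk = (-0.5 * np.sum(diff**2, axis=-1) / bandwidth**2
                - d * np.log(bandwidth) - 0.5 * d * np.log(2.0 * np.pi))
        np.fill_diagonal(logk, -np.inf)                       # exclude the particle itself
        return np.logaddexp.reduce(logk, axis=1) - np.log(n - 1)

    def hmc_step(theta, log_f, grad_log_f, eps, n_leapfrog, rng):
        # One HMC step with identity mass matrix (same scheme as the Section 3.2 sketch).
        m0 = rng.standard_normal(theta.size)
        th, m = theta.copy(), m0 + 0.5 * eps * grad_log_f(theta)
        for l in range(n_leapfrog):
            th = th + eps * m
            if l < n_leapfrog - 1:
                m = m + eps * grad_log_f(th)
        m = -(m + 0.5 * eps * grad_log_f(th))
        log_accept = (log_f(th) - 0.5 * m @ m) - (log_f(theta) - 0.5 * m0 @ m0)
        return th if np.log(rng.uniform()) < log_accept else theta

    def hsmc(log_targets, grad_log_targets, init_sampler, n_particles, bandwidth, eps, n_leapfrog, rng=None):
        rng = np.random.default_rng(rng)
        particles = init_sampler(rng, n_particles)            # draws from the initial density pi_0
        for log_pi, grad_log_pi in zip(log_targets, grad_log_targets):
            # Correction: weights proportional to pi_t / q_{-i} at the current particles.
            logw = np.array([log_pi(p) for p in particles]) - loo_kde_log_density(particles, bandwidth)
            w = np.exp(logw - logw.max())
            w /= w.sum()
            # Selection: resample with replacement; the new particles all carry weight 1/N.
            particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
            # Mutation: one Hamiltonian step per particle, targeting pi_t.
            particles = np.array([hmc_step(p, log_pi, grad_log_pi, eps, n_leapfrog, rng)
                                  for p in particles])
        return particles

In practice the bandwidth would typically be adapted to the spread of the particle cloud at each stage, and several Hamiltonian steps may be taken per mutation phase, as discussed in Section 5.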

4.2 Algorithm’s properties

Our HSMC method fits into the SMC framework described in Chopin [2004], and his central limit theorem can be applied to our simulator. Provided that our Hamiltonian mutation step preserves the distribution π_t, the following convergence holds almost surely as N → ∞ for any measurable function φ such that the expectations below exist:

    (1/N) Σ_{i=1}^{N} φ(θ_t^(i)) → E_{π_t}[φ(θ)] = ∫ φ(θ) π_t(θ) dθ

The proof that our Hamiltonian transition kernel satisfies the conditions to have π_t as a stationary distribution can be found in the review paper on Hamiltonian Monte Carlo by Neal [2011].

5 Method variations

While the Hamiltonian step is designed for continuous distributions, it is easy to extend our method to spaces with discrete dimensions. We can split the space into two blocks, one containing the dimensions in which θ is continuous and the other containing the dimensions in which θ is discrete. Given this separation into blocks, a standard Metropolis-within-Gibbs step [Gilks et al., 1995] can be used, with the Hamiltonian step applied to the continuous block, as sketched below.
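A rough sketch of such a mixed update follows, with the continuous block handled by a user-supplied step (for instance a partial application of the hmc_step sketch above) and the discrete block handled by a simple symmetric Metropolis proposal; all names here are illustrative.

    import numpy as np

    def mixed_mutation_step(theta_cont, theta_disc, log_f, continuous_step, disc_support, rng):
        # Metropolis-within-Gibbs mutation: Hamiltonian-style update on the continuous block,
        # then an MH update on the discrete block, holding the other block fixed each time.
        # continuous_step(theta_cont, log_density) -> new theta_cont, e.g. built from the hmc_step sketch.
        theta_cont = continuous_step(theta_cont, lambda tc: log_f(tc, theta_disc))
        for j, support in enumerate(disc_support):
            proposal = theta_disc.copy()
            proposal[j] = rng.choice(support)                  # symmetric uniform proposal on the support
            log_ratio = log_f(theta_cont, proposal) - log_f(theta_cont, theta_disc)
            if np.log(rng.uniform()) < log_ratio:
                theta_disc = proposal
        return theta_cont, theta_disc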

Another variation of the method consists in running the algorithm in parallel for several groups of particles. At the end, the sample properties of the groups can be compared; if they differ too much from each other, we can suspect a convergence problem. This is the approach taken by Durham and Geweke [2013] in their adaptive SMC simulator.

Finally, since the Hamiltonian step preserves the distribution of interest, multiple steps can be performed at each mutation phase. This increases the performance of the method when particles are not exploring the space fast enough.

6 Examples

We used the HSMC method on kernel density estimates of two functions known to challenge classical optimizers and MCMC simulators. The first function, created for this paper, is called the smiley function; it is a mixture of three Rosenbrock "smile" functions, which are often used as examples to show the limitations of MCMC algorithms. The second function is a dropwave function. Both functions are multimodal and have complex shapes. For ease of visualization, we kept the functions two-dimensional.

We chose to simulate Gaussian kernel density estimates as they mimic the progressive convergence of several target functions when data are added progressively. In our case, data are added in blocks of points and the bandwidth of the kernel density estimate shrinks as observations accumulate, at a rate of the same order as the optimal bandwidth reduction rate for most distributions [Yatchew, 1998].
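A sketch of how such a sequence of Gaussian kernel density targets could be built as data arrive in blocks is given below; the block size and the n^(-1/5)-type bandwidth shrinkage are illustrative assumptions rather than the paper's exact rule.

    import numpy as np

    def kde_log_density(data, bandwidth):
        # Log of a Gaussian kernel density estimate built from the rows of `data` (n x d).
        def log_pdf(theta):
            d = data.shape[1]
            logk = (-0.5 * np.sum((theta - data)**2, axis=1) / bandwidth**2
                    - d * np.log(bandwidth) - 0.5 * d * np.log(2.0 * np.pi))
            return np.logaddexp.reduce(logk) - np.log(len(data))
        return log_pdf

    def kde_target_sequence(data, block_size=100, rate=0.2):
        # One KDE target per block of observations, with bandwidth shrinking like n^(-rate).
        targets = []
        for end in range(block_size, len(data) + block_size, block_size):
            subset = data[:min(end, len(data))]
            targets.append(kde_log_density(subset, bandwidth=len(subset) ** (-rate)))
        return targets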

We also used the algorithm on a non-linear logit model where the log-likelihood shows both multimodality and complex shapes.

6.1 Smiley kernel density estimate

We generated a sample of 2048 data points using a density proportional to the smiley function. The contour plot and 3D plot of this function can be found in Figure 5.

Figure 5: contour plot and 3D plot of the smiley function

We can easily see the multimodality and the elongated shapes in the function plots. The generated data match the shape of the smiley function and are represented in Figure 6.

Figure 6: Generated data

The target function we want to simulate is a kernel density estimate using these data points. We used 4 independent groups of 512 particles to simulate the kernel density, for a total of 2048 particles. The data points are partitioned into 20 blocks of 100 points and 1 block of 48 points, for a total of 21 blocks. Consequently, the sequence includes 21 target functions in addition to the initial density. The initial density used is a bivariate normal distribution, and the HMC tuning uses an identity mass matrix. The results of the HSMC simulation, as well as a smoothed histogram of the simulated points, are represented in Figure 7. The HSMC method provided satisfying results and successfully converged in only 21 iterations, and the mutation acceptance rates remained high across several runs.

Figure 7: HSMC Simulation results

To illustrate the performance of our HSMC method, we also tried to simulate the same kernel density using a parallel HMC approach with 21 iterations. To do so, we performed only the mutation phase 21 times, without the correction and selection phases. The results represented in Figure 8 show that several particles failed to converge. Moreover, there is no guarantee that the particles around each mode are distributed according to the mass around these modes in the target function. As particles tend to converge to the closest mode, it is possible to have a first mode with twice the mass of a second mode but only half of the particles around it.

Figure 8: Parallel HMC Simulation results

6.2 Constrained dropwave kernel density estimate

We generated a sample of 4096 data points using a density proportional to the dropwave function, defined on a bounded rectangular domain. The contour plot and 3D plot of this function can be found in Figure 9.

Figure 9: contour plot and 3D plot of the dropwave function

The generated data match the shape of the dropwave function and are represented in Figure 10.

Figure 10: Generated data

The target function we want to simulate is a constrained kernel density estimate using these data points. The Gaussian kernel is defined on the whole plane, but we want to limit the domain to a bounded rectangle, so we keep the Gaussian kernel as is and use the constrained HMC method during the mutation phase. We used 4 independent groups of 512 particles to simulate the kernel density, for a total of 2048 particles. The data points are partitioned into 40 blocks of 100 points and 1 block of 96 points, for a total of 41 blocks. Consequently, the sequence includes 41 target functions in addition to the initial density. The initial density used is a bivariate normal, and the HMC tuning again uses an identity mass matrix. The results of the HSMC simulation, as well as a smoothed histogram of the simulated points, are presented in Figure 11. The constrained HSMC method also provided satisfying results and successfully converged in 41 iterations, with mutation acceptance rates remaining high across several runs.

Figure 11: CHSMC Simulation results

6.3 Non-linear logit

We use a logit discrete choice model [Train, 2009] in which the deterministic part of the utility function is non-linear. In this hypothetical experiment, an individual who has been given a black t-shirt is presented with another t-shirt of a different color c. The individual can exchange it against his own t-shirt or keep his current t-shirt, choosing whichever option maximizes his utility. The utility of each option is the sum of a deterministic part and of a standard extreme value error. The deterministic part of the utility is represented by a non-linear function v of the color over the color palette, which depends on the parameters to be estimated. This function is represented in Figure 12 for the true parameter values used in the simulation.

Figure 12: Deterministic utility function at the true parameter values

The choice probabilities for keeping and exchanging take the standard logit form; when the two deterministic utilities coincide, both choice probabilities equal one half. We simulated the experiment by drawing the presented colors c randomly and uniformly over the color palette.

Figure 13: Theoretical Likelihood
Figure 14: Theoretical Loglikelihood

A first look at the shape of the likelihood function represented in Figure 13 seems to indicate that the likelihood behaves nicely, with a global maximum around the true value. However, examining the log-likelihood in Figure 14 reveals the multimodality of the function.

The MH and HMC methods both fail to converge after many iterations and do not get close to the global maximum in numerous trials.

For the HSMC method, the sequence of functions is created by adding the observations progressively to the likelihood function, a fixed number of observations at each iteration. We compare the result to a standard SMC algorithm.

Figure 15: Smoothed HSMC simulation
Figure 16: Smoothed SMC simulation

We can see that the HSMC method (Figure 15) converged in only 4 iterations, whereas the SMC method (Figure 16) could not converge completely. First, in the SMC case, four iterations might constitute too short a sequence to obtain convergence; adding more intermediate functions to the sequence might improve convergence. Second, we notice that some of the particles in the SMC case were trapped in a secondary mode, and these particles are unlikely to be moved to the main area of mass concentration with more iterations. With the HSMC algorithm, if some particles are trapped in a different mode and misrepresent the mass of the underlying function, they get relocated during the resampling phase. This is the consequence of using the leave-one-out kernel density estimate q_{-i} as the weight denominator instead of the theoretical π_{t-1} used in standard SMC.

7 Future work and conclusion

We have shown that the HSMC algorithm is able to approximate by simulation many functions with irregularities. The encouraging performance of the algorithm suggests direct potential applications in empirical work where standard MCMC methods have shown limitations. One of these potential applications is the use of optimal instruments for non-linear BLP models in industrial organization [Reynaert and Verboven, 2014].

References

  • Bajari [2003] P. Bajari. Discussion of Allenby, Chen and Yang. Quantitative Marketing and Economics, 1(3):277–283, 2003.
  • Burda [2015] M. Burda. Constrained Hamiltonian Monte Carlo in BEKK GARCH with targeting. Journal of Time Series Econometrics, 7(1):95–113, 2015.
  • Butucea et al. [2007] C. Butucea, M. Mougeot, K. Tribouley, et al. Functional approach for excess mass estimation in the density model. Electronic Journal of Statistics, 1:449–472, 2007.
  • Cappe et al. [2004] O. Cappe, A. Guillin, J.-M. Marin, and C. P. Robert. Population Monte Carlo. Journal of Computational and Graphical Statistics, 13:907–929, 2004.
  • Chernozhukov and Hong [2003] V. Chernozhukov and H. Hong. An MCMC approach to classical estimation. Journal of Econometrics, 115(2):293–346, 2003.
  • Chopin [2002] N. Chopin. A sequential particle filter method for static models. Biometrika, 89(3):539–552, 2002.
  • Chopin [2004] N. Chopin. Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Annals of Statistics, pages 2385–2411, 2004.
  • Del Moral et al. [2006] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3):411–436, 2006.
  • Delyon et al. [2016] B. Delyon, F. Portier, et al. Integral approximation by kernel smoothing. Bernoulli, 22(4):2177–2208, 2016.
  • Doornik et al. [2000] J. A. Doornik, M. Ooms, et al. Multimodality and the GARCH likelihood. In World Congress of the Econometric Society, Seattle, August, 2000.
  • Durham and Geweke [2013] G. Durham and J. Geweke. Adaptive sequential posterior simulators for massively parallel computing environments. Bayesian Model Comparison (Advances in Econometrics); Jeliazkov, I., Poirier, DJ, Eds, pages 1–44, 2013.
  • Gilks et al. [1995] W. R. Gilks, N. Best, and K. Tan. Adaptive rejection Metropolis sampling within Gibbs sampling. Applied Statistics, pages 455–472, 1995.
  • Gourieroux and Monfort [1995] C. Gourieroux and A. Monfort. Statistics and econometric models, volume 1. Cambridge University Press, 1995.
  • Hastings [1970] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.
  • Herbst and Schorfheide [2014] E. Herbst and F. Schorfheide. Sequential Monte Carlo sampling for DSGE models. Journal of Applied Econometrics, 29(7):1073–1098, 2014.
  • Hwang [1980] C.-R. Hwang. Laplace’s method revisited: weak convergence of probability measures. The Annals of Probability, pages 1177–1182, 1980.
  • Jiang et al. [2009] R. Jiang, P. Manchanda, and P. E. Rossi. Bayesian analysis of random coefficient logit models using aggregate data. Journal of Econometrics, 149(2):136–148, 2009.
  • Koop and Potter [1999] G. Koop and S. M. Potter. Bayes factors and nonlinearity: evidence from economic time series. Journal of Econometrics, 88(2):251–281, 1999.
  • Koop and Potter [2000] G. Koop and S. M. Potter. Nonlinearity, structural breaks or outliers in economic time series. Nonlinear Econometric Modeling in Time Series Analysis, pages 61–78, 2000.
  • Mardia and Watkins [1989] K. Mardia and A. Watkins. On multimodality of the likelihood in the spatial linear model. Biometrika, 76(2):289–295, 1989.
  • Metropolis et al. [1953] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087–1092, 1953.
  • Minnotte [1997] M. C. Minnotte. Nonparametric testing of the existence of modes. The Annals of Statistics, pages 1646–1660, 1997.
  • Müller and Sawitzki [1991] D. W. Müller and G. Sawitzki. Excess mass estimates and tests for multimodality. Journal of the American Statistical Association, 86(415):738–746, 1991.
  • Neal [2001] R. M. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125–139, 2001.
  • Neal [2011] R. M. Neal. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2:113–162, 2011.
  • Reynaert and Verboven [2014] M. Reynaert and F. Verboven. Improving the performance of random coefficients demand models: the role of optimal instruments. Journal of Econometrics, 179(1):83–98, 2014.
  • Silverman [1981] B. W. Silverman. Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society. Series B (Methodological), pages 97–99, 1981.
  • Train [2009] K. E. Train. Discrete choice methods with simulation. Cambridge university press, 2009.
  • Vieu [1996] P. Vieu. A note on density mode estimation. Statistics & Probability Letters, 26(4):297–307, 1996.
  • Yatchew [1998] A. Yatchew. Nonparametric regression techniques in economics. Journal of Economic Literature, 36(2):669–721, 1998.
  • Zhou and Chen [2013] E. Zhou and X. Chen. Sequential Monte Carlo simulated annealing. Journal of Global Optimization, 55(1):101–124, 2013.