Statistical and Computational Guarantees for the Baum-Welch Algorithm

by Fanny Yang, et al.

The Hidden Markov Model (HMM) is one of the mainstays of statistical modeling of discrete time series, with applications including speech recognition, computational biology, computer vision, and econometrics. Estimating an HMM from its observation process is often addressed via the Baum-Welch algorithm, which is known to be susceptible to local optima. In this paper, we first give a general characterization of the basin of attraction associated with any global optimum of the population likelihood. By exploiting this characterization, we provide non-asymptotic, finite-sample guarantees on the Baum-Welch updates, which ensure geometric convergence to a small ball of radius on the order of the minimax rate around a global optimum. As a concrete example, we prove a linear rate of convergence for a hidden Markov mixture of two isotropic Gaussians given a suitable mean separation and an initialization within a ball of large radius around (one of) the true parameters. To our knowledge, these are the first rigorous local convergence guarantees to global optima for the Baum-Welch algorithm in a setting where the likelihood function is nonconvex. We complement our theoretical results with thorough numerical simulations studying the convergence of the Baum-Welch algorithm and illustrating the accuracy of our predictions.
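The abstract's concrete example concerns Baum-Welch updates for a hidden Markov mixture of two Gaussians. As illustration only, here is a minimal sketch of the standard Baum-Welch (EM) iteration for a two-state HMM with one-dimensional Gaussian emissions of known unit variance: a scaled forward-backward E-step followed by closed-form M-step updates. All function and variable names are our own; this is a textbook formulation, not the paper's specific analysis.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated elementwise at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def baum_welch(obs, mu, sigma, A, pi, n_iter=30):
    """Baum-Welch EM for a K-state HMM with Gaussian emissions.

    obs: (T,) observations; mu: (K,) initial means; sigma: known std dev;
    A: (K, K) initial transition matrix; pi: (K,) initial distribution.
    Returns updated (mu, A, pi) after n_iter EM iterations.
    """
    K, T = len(mu), len(obs)
    for _ in range(n_iter):
        # Emission likelihoods B[t, k] = p(obs[t] | state k).
        B = np.stack([gaussian_pdf(obs, mu[k], sigma) for k in range(K)], axis=1)

        # Forward pass with per-step normalization (scaling) for stability.
        alpha = np.zeros((T, K))
        c = np.zeros(T)
        alpha[0] = pi * B[0]
        c[0] = alpha[0].sum()
        alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[t]
            c[t] = alpha[t].sum()
            alpha[t] /= c[t]

        # Backward pass using the same scaling constants.
        beta = np.zeros((T, K))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]

        # Smoothed state posteriors gamma and pairwise posteriors xi.
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((K, K))
        for t in range(T - 1):
            xi += alpha[t][:, None] * A * (B[t + 1] * beta[t + 1])[None, :] / c[t + 1]

        # M-step: closed-form updates (variance held fixed, as assumed above).
        pi = gamma[0]
        A = xi / xi.sum(axis=1, keepdims=True)
        mu = (gamma * obs[:, None]).sum(axis=0) / gamma.sum(axis=0)
    return mu, A, pi
```

Under the mean-separation and initialization conditions the paper studies, iterates of this kind contract geometrically toward a neighborhood of a global optimum; without such conditions, the updates may instead converge to a poor local optimum.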


Nonasymptotic control of the MLE for misspecified nonparametric hidden Markov models

We study the problem of estimating an unknown time process distribution ...

Statistical guarantees for the EM algorithm: From population to sample-based analysis

We develop a general framework for proving rigorous guarantees on the pe...

Maximum Entropy Estimator for Hidden Markov Models: Reduction to Dimension 2

In the paper, we introduce the maximum entropy estimator based on 2-dime...

Kullback-Leibler Divergence and Akaike Information Criterion in General Hidden Markov Models

To characterize the Kullback-Leibler divergence and Fisher information i...

Sharp global convergence guarantees for iterative nonconvex optimization: A Gaussian process perspective

We consider a general class of regression models with normally distribut...

The minimax risk in testing the histogram of discrete distributions for uniformity under missing ball alternatives

We consider the problem of testing the fit of a discrete sample of items...

Some Insights About the Small Ball Probability Factorization for Hilbert Random Elements

Asymptotic factorizations for the small-ball probability (SmBP) of a Hil...