Learning Summary Statistic for Approximate Bayesian Computation via Deep Neural Network

10/08/2015 ∙ by Bai Jiang, et al.

Approximate Bayesian Computation (ABC) methods are used to approximate posterior distributions in models with unknown or computationally intractable likelihoods. Both the accuracy and computational efficiency of ABC depend on the choice of summary statistic, but outside of special cases where the optimal summary statistics are known, it is unclear which guiding principles can be used to construct effective summary statistics. In this paper we explore the possibility of automating the process of constructing summary statistics by training deep neural networks to predict the parameters from artificially generated data: the resulting summary statistics are approximately posterior means of the parameters. With minimal model-specific tuning, our method constructs summary statistics for the Ising model and the moving-average model that match or exceed theoretically motivated summary statistics in the accuracy of the resulting posteriors.
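The idea in the abstract can be illustrated with a minimal sketch. The function that minimizes expected squared error when predicting the parameter from the data is the posterior mean, so a network trained by squared-error regression on simulated (data, parameter) pairs gives an approximate posterior mean, which can then serve as the summary statistic in ABC. Everything specific below is an assumption made for illustration and is not taken from the paper: the MA(1) toy model with a Uniform(0, 1) prior, the single tanh hidden layer, the sizes and learning rate, and the rejection step that keeps a fixed number of closest draws. It uses only the Python standard library and is far too small to reproduce the paper's results.

```python
import math
import random

random.seed(0)
N_OBS, HIDDEN = 10, 8  # series length and hidden width (toy values, assumed)

def sample_prior():
    """Assumed prior for the toy MA(1) coefficient: Uniform(0, 1)."""
    return random.uniform(0.0, 1.0)

def simulate_ma1(theta):
    """Simulate x_t = z_t + theta * z_{t-1} with standard normal z."""
    z = [random.gauss(0.0, 1.0) for _ in range(N_OBS + 1)]
    return [z[t + 1] + theta * z[t] for t in range(N_OBS)]

def init_net():
    """One hidden tanh layer with small random weights."""
    return {
        "w1": [[random.gauss(0.0, 0.3) for _ in range(N_OBS)] for _ in range(HIDDEN)],
        "b1": [0.0] * HIDDEN,
        "w2": [random.gauss(0.0, 0.3) for _ in range(HIDDEN)],
        "b2": 0.0,
    }

def forward(net, x):
    """Return the network output S(x) and the hidden activations."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["w1"], net["b1"])]
    return sum(w * hj for w, hj in zip(net["w2"], h)) + net["b2"], h

def sgd_step(net, x, theta, lr=0.01):
    """One stochastic gradient step on the squared error (S(x) - theta)^2 / 2."""
    pred, h = forward(net, x)
    err = pred - theta
    for j in range(HIDDEN):
        grad_pre = err * net["w2"][j] * (1.0 - h[j] ** 2)  # backprop through tanh
        net["w2"][j] -= lr * err * h[j]
        net["b1"][j] -= lr * grad_pre
        for i in range(N_OBS):
            net["w1"][j][i] -= lr * grad_pre * x[i]
    net["b2"] -= lr * err

def train(n_train=1000, epochs=10):
    """Fit the network on (simulated data, parameter) pairs drawn from the prior."""
    data = []
    for _ in range(n_train):
        theta = sample_prior()
        data.append((simulate_ma1(theta), theta))
    net = init_net()
    for _ in range(epochs):
        random.shuffle(data)
        for x, theta in data:
            sgd_step(net, x, theta)
    return net

def abc_rejection(net, x_obs, n_sim=2000, n_keep=100):
    """Keep the n_keep prior draws whose learned statistic is closest to S(x_obs)."""
    s_obs = forward(net, x_obs)[0]
    draws = []
    for _ in range(n_sim):
        theta = sample_prior()
        s_sim = forward(net, simulate_ma1(theta))[0]
        draws.append((abs(s_sim - s_obs), theta))
    draws.sort()
    return [theta for _, theta in draws[:n_keep]]

net = train()
x_obs = simulate_ma1(0.6)  # synthetic "observed" series with a known coefficient
posterior = abc_rejection(net, x_obs)
posterior_mean = sum(posterior) / len(posterior)
```

Here `posterior` holds the accepted prior draws, which approximate the ABC posterior for the synthetic series, and `posterior_mean` is their average. A network this small, trained on series this short, learns only a crude statistic; a realistic experiment would need a deeper network and many more simulations.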




