Publication:
Variationally inferred sampling through a refined bound

Publication Date
2021-01-19
Publisher
MDPI
Abstract
In this work, a framework to boost the efficiency of Bayesian inference in probabilistic models is introduced by embedding a Markov chain sampler within a variational posterior approximation. We call this framework “refined variational approximation”. Its strengths are its ease of implementation and the automatic tuning of sampler parameters, leading to a faster mixing time through automatic differentiation. Several strategies for approximating the evidence lower bound (ELBO) are also introduced. Its efficient performance is showcased experimentally using state-space models for time-series data, a variational autoencoder for density estimation, and a conditional variational autoencoder as a deep Bayes classifier.
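The following is a minimal PyTorch sketch of the idea described in the abstract: a reparameterized Gaussian variational posterior is refined by a few Langevin (SGLD-style) sampler steps, and because gradients flow through the whole chain, the optimizer tunes the sampler step size alongside the variational parameters. The toy target log_joint, the five-step refinement, the learning rates, and the entropy-of-q0 ELBO surrogate are all illustrative assumptions, not the authors' exact algorithm or API.

```python
import torch

def log_joint(z):
    # Toy unnormalized log target log p(x, z); swap in a real model's joint density.
    return -0.5 * (z ** 2).sum(-1)

# Variational parameters of the initial Gaussian approximation q0, plus a
# learnable sampler step size (the "automatic tuning" mentioned in the abstract).
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
log_eps = torch.tensor(-3.0, requires_grad=True)  # step size, log scale

opt = torch.optim.Adam([mu, log_sigma, log_eps], lr=1e-2)

for step in range(1000):
    opt.zero_grad()
    sigma, eps = log_sigma.exp(), log_eps.exp()
    # Reparameterized draw from q0.
    z = mu + sigma * torch.randn(2)
    # Refine with a few Langevin steps; create_graph=True keeps the chain
    # differentiable, so autodiff tunes mu, sigma AND eps for faster mixing.
    for _ in range(5):
        grad = torch.autograd.grad(log_joint(z), z, create_graph=True)[0]
        z = z + eps * grad + (2 * eps).sqrt() * torch.randn(2)
    # Crude ELBO surrogate: the entropy of q0 (log_sigma.sum() up to a constant)
    # stands in for the intractable entropy of the refined distribution; this is
    # one of several possible approximation strategies.
    elbo = log_joint(z) + log_sigma.sum()
    (-elbo).backward()
    opt.step()
```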