Publication:
The unreasonable effectiveness of small neural ensembles in high-dimensional brain

Full text at PDC
Publication Date
2019
Authors
Gorban, Alexander N.
Tyukin, Ivan Y.
Publisher
Elsevier
Abstract
Complexity is an indisputable, well-known, and broadly accepted feature of the brain. Despite this apparently obvious and widespread consensus, sprouts of a single-neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother (or concept) cells and sparse coding of information in the brain. In machine learning, the famous curse of dimensionality long seemed an unsolvable problem; nevertheless, the idea of the blessing of dimensionality has gradually become more popular. Ensembles of non-interacting or weakly interacting simple units prove to be an effective tool for solving essentially multidimensional and apparently incomprehensible problems. This approach is especially useful for one-shot (non-iterative) correction of errors in large legacy artificial intelligence systems, when complete re-training is impossible or too expensive. These simplicity revolutions in the era of complexity have deep fundamental reasons, grounded in the geometry of multidimensional data spaces. To explore and understand these reasons, we revisit the background ideas of statistical physics, which over the course of the 20th century were developed into the theory of concentration of measure. The Gibbs equivalence of ensembles, with further generalizations, shows that data in high-dimensional spaces are concentrated near shells of smaller dimension. New stochastic separation theorems reveal the fine structure of these data clouds. We review and analyse biological, physical, and mathematical problems at the core of the fundamental question: how can a high-dimensional brain organise reliable and fast learning in a high-dimensional world of data with simple tools? To meet this challenge, we outline and set up a framework based on the statistical physics of data.
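The stochastic separation effect sketched in the abstract can be illustrated numerically: in high dimension, a single linear functional of the Fisher type separates a random point from a large random cloud with probability close to one, while in low dimension the same rule almost always fails. The snippet below is an illustrative sketch only, not code from the paper; the Gaussian sampling, the threshold alpha, and the function name are assumptions for demonstration (the paper's theorems are stated for distributions such as the uniform distribution in a ball).

```python
import random

def separable_fraction(d, n_points=300, n_trials=30, alpha=0.8, seed=0):
    """Estimate the probability that a random point x is separated from a
    cloud of n_points i.i.d. random points y by the single hyperplane
    <y, x> = alpha * <x, x> (a Fisher-type linear discriminant).
    Illustrative sketch: points are standard Gaussian, an assumption."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        xx = sum(c * c for c in x)
        separated = True
        for _ in range(n_points):
            y = [rng.gauss(0.0, 1.0) for _ in range(d)]
            # a cloud point on the "wrong" side of the hyperplane breaks separation
            if sum(a * b for a, b in zip(x, y)) >= alpha * xx:
                separated = False
                break
        if separated:
            successes += 1
    return successes / n_trials

# Separation by one linear functional almost always fails in low dimension
# and almost always succeeds in high dimension.
low = separable_fraction(d=3)
high = separable_fraction(d=100)
```

Running this, `low` is near 0 and `high` is near 1: the same trivially simple decision rule that is useless in 3 dimensions becomes almost surely correct in 100, which is the geometric core of the "simple units in high dimension" argument.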
Two critical applications are reviewed to exemplify the approach: one-shot correction of errors in intellectual systems and the emergence of static and associative memories in ensembles of single neurons. Error correctors should be simple; they should not damage the existing skills of the system; and they should allow fast non-iterative learning and correction of new mistakes without destroying previous fixes. All these demands can be satisfied by new tools based on concentration of measure phenomena and stochastic separation theory. We show how a sufficiently simple functional neuronal model is capable of explaining: (i) the extreme selectivity of single neurons to the information content of high-dimensional data, (ii) the simultaneous separation of several uncorrelated informational items from a large set of stimuli, and (iii) the dynamic learning of new items by associating them with already “known” ones. These results constitute a basis for the organisation of complex memories in ensembles of single neurons.
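The requirements on error correctors (simple, non-damaging, non-iterative, stackable fixes) can be made concrete with a minimal hypothetical sketch. Everything here is an illustrative assumption, not the paper's implementation: the class name `OneShotCorrector`, the threshold `alpha`, and the assumption that the legacy system exposes a high-dimensional feature vector per input. Each fix is a single stored linear functional that vetoes the legacy decision only inside a small cap around the erroneous sample, leaving all other behaviour untouched.

```python
class OneShotCorrector:
    """Hypothetical wrapper around a legacy decision rule. A mistake is fixed
    non-iteratively by storing one linear functional that separates the
    erroneous sample's feature vector from the rest of the data; at run time
    the functional overrides the legacy output only on its own side of the
    separating hyperplane, so existing skills and earlier fixes are kept."""

    def __init__(self, legacy_predict, alpha=0.8):
        self.legacy_predict = legacy_predict  # the unmodified legacy system
        self.alpha = alpha                    # separation threshold (assumed)
        self.fixes = []                       # list of (feature_vector, label)

    def add_fix(self, features, corrected_label):
        # One-shot learning: no retraining, just store the discriminant.
        self.fixes.append((list(features), corrected_label))

    def predict(self, features):
        for x, label in self.fixes:
            xx = sum(c * c for c in x)
            # <features, x> > alpha * <x, x>: inside this fix's error cap
            if sum(a * b for a, b in zip(features, x)) > self.alpha * xx:
                return label
        return self.legacy_predict(features)

# Usage sketch: a legacy classifier that always answers "cat" gets one fix.
legacy = lambda features: "cat"
corrector = OneShotCorrector(legacy)
error_sample = [1.0, 0.0, 2.0, -1.0]
corrector.add_fix(error_sample, "dog")
fixed = corrector.predict(error_sample)          # overridden by the fix
untouched = corrector.predict([-1.0, 0.0, -2.0, 1.0])  # legacy skill kept
```

The design choice mirrors the abstract's demands: adding a fix is a single append (fast, non-iterative), each fix acts only in its own cap (previous fixes and legacy skills survive), and by stochastic separation such one-functional caps are reliable precisely when the feature space is high-dimensional.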