Meta-learning in natural and artificial intelligence

11/26/2020 ∙ by Jane X Wang, et al.

Meta-learning, or learning to learn, has gained renewed interest in recent years within the artificial intelligence community. However, meta-learning is incredibly prevalent within nature, has deep roots in cognitive science and psychology, and is currently studied in various forms within neuroscience. The aim of this review is to recast previous lines of research in the study of biological intelligence within the lens of meta-learning, placing these works into a common framework. More recent points of interaction between AI and neuroscience will be discussed, as well as interesting new directions that arise under this perspective.




1 Introduction

Humans are remarkable for continuously learning throughout the entirety of their lives, from acquiring physical reasoning and language skills at a young age [64, 43], to the ability to reason about the detailed complexities inherent in everyday adult life. One key quality of this learning is that it happens at multiple scales, both in terms of time and abstraction, in a process termed meta-learning or learning to learn. The fundamental principle of meta-learning is that learning proceeds faster with more experience, via the acquisition of inductive biases or knowledge that allows for more efficient learning in the future [66, 59, 57].

These favorable properties of meta-learning have recently earned it considerable renewed interest within the deep learning/artificial intelligence community. Despite their tremendous successes in recent years [46, 61], deep learning systems still require many orders of magnitude more data than humans [40, 12]. Although early work demonstrated the feasibility of neural networks discovering their own learning rules [10, 58], it was only recently that the field experienced a resurgence of research in meta-learning using deep neural networks. This work has demonstrated the wide-ranging potential of neural networks to meta-learn all aspects of the learning process. Deep neural networks are typically trained via backpropagation, which adjusts the weights of the neural network so that, given a set of input data, the network outputs match some desired target outputs (e.g., classification labels). Popular meta-learning techniques have therefore spanned everything from meta-learning the initial weights of the network [25], the weight update rule itself [50, 1], or some nonparametric representation of the inputs that is easier to classify [70, 62], to deriving an implicit learning algorithm from a black-box recurrent neural network [72, 23, 56] (see [69] for a comprehensive review).
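As an illustration of what meta-learning the initial weights can mean in practice, the following is a minimal sketch on a toy problem, not the implementation of [25]. It assumes a scalar linear model y = w * x, a family of tasks whose true slopes cluster around 3, and a single inner gradient step; for this simple model the derivative of the adapted weight with respect to the initial weight can be written in closed form, so no automatic differentiation is needed. All function names, the task distribution, and the step sizes are assumptions made for the example.

```python
import random

def loss(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, xs, ys):
    """Gradient of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def sample_task(rng):
    """A task is a slope a; slopes cluster around 3 (assumed task distribution)."""
    a = rng.gauss(3.0, 0.5)
    def data(n):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        return xs, [a * x for x in xs]
    return data

def adapt(w0, xs, ys, inner_lr=0.1):
    """Inner loop: one gradient step away from the shared initialization."""
    return w0 - inner_lr * grad(w0, xs, ys)

def meta_train(n_iters=2000, inner_lr=0.1, outer_lr=0.05, seed=0):
    """Outer loop: adjust w0 so that the loss after adaptation is small."""
    rng = random.Random(seed)
    w0 = 0.0
    for _ in range(n_iters):
        data = sample_task(rng)
        xs_s, ys_s = data(5)   # support set used for adaptation
        xs_q, ys_q = data(5)   # query set used to score the adapted weight
        w_fast = adapt(w0, xs_s, ys_s, inner_lr)
        # Chain rule: d(w_fast)/d(w0) = 1 - inner_lr * 2 * mean(x^2) for this model.
        dw = 1.0 - inner_lr * 2.0 * sum(x * x for x in xs_s) / len(xs_s)
        w0 -= outer_lr * grad(w_fast, xs_q, ys_q) * dw
    return w0

def post_adaptation_loss(w0, n_tasks=200, inner_lr=0.1, seed=1):
    """Average query loss after one adaptation step, over freshly sampled tasks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_tasks):
        data = sample_task(rng)
        xs_s, ys_s = data(5)
        xs_q, ys_q = data(5)
        total += loss(adapt(w0, xs_s, ys_s, inner_lr), xs_q, ys_q)
    return total / n_tasks

if __name__ == "__main__":
    w0 = meta_train()
    print("meta-learned initial weight:", w0)
    print("loss after one step from meta-learned init:", post_adaptation_loss(w0))
    print("loss after one step from zero init:", post_adaptation_loss(0.0))
```

In this toy setting the meta-learned initialization settles near the centre of the task distribution, so a single gradient step on five examples suffices for a new task, whereas the same step from an uninformed initialization leaves a large error.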

On the other hand, the idea of "learning to learn" originated within the psychological sciences many decades prior [30], and focused on one- or few-shot learning of learning sets and educational theory [15]. Given the rapid pace of progress, it’s illustrative to examine how different lines of work in psychology, cognitive science, and neuroscience fit within the meta-learning perspective as understood currently in artificial intelligence (AI). This review aims to demonstrate that meta-learning is prevalent in nature, being naturally multi-scaled, and examines past work centered on the points of interaction between neuroscience and incipient research on meta-learning in the field of artificial intelligence. I then suggest interesting new questions and avenues of research that naturally arise under this framework.

2 The scales of meta-learning: across and within lifetimes

Biological learning, at its fundamental level, is the ability of an organism to represent and adapt to changes and challenges presented to it by the external environment. This adaptation is typically in pursuit of a specific drive or goal, such as survival or reproduction. The challenges that one can face in everyday life are widely varying in scope and duration. Accordingly, there exists a range of learning mechanisms that span these different timescales.

Not only are there different scales of learning, but these scales are also often nested, such that learning occurring at a longer timescale drives more efficient learning at shorter timescales (see Fig. 1). One of the most interesting examples of this is known as the Baldwin effect [4], whereby phenotypic expression of fast adaptation and learning creates positive selection pressure, allowing for indirect selection of the genetic basis for these traits to be passed on to future generations. That faster learning can be selected for by evolution was compellingly demonstrated in simulation by Hinton and Nowlan [33]. (Footnote 1: Interestingly, one of the most popular meta-learning approaches, Model-Agnostic Meta-Learning [25], has been proposed to be closely related to the Baldwin effect [24].) In this way, innate (evolutionarily pre-programmed) or developmentally predetermined behaviors interact with learned behaviors and representations [77]. For example, the propensity to form place cells (neurons that tend to fire only when the animal is in one particular place in an environment) is innate, while the specific content of these spatial representations in any given environment is learned. The ability to form place cells (and closely related grid cells) thus presumably arose from the benefits conferred by flexibly and quickly representing one’s spatial location, which allowed for the evolutionary selection of this innate process. Indeed, the ability to organize and scaffold new knowledge via spatial and relational configurations has been found to be useful for learning even nonspatial conceptual representations in humans [6, 18].

Innate learning does not have to be present from birth, but rather can be expressed in relatively stereotyped and predictable trajectories throughout early development. According to Alison Gopnik’s theory theory [27], human children tend to formulate increasingly complex theories and ways of testing their hypotheses in a relatively predetermined way. Building on this, Elizabeth Spelke posited a core body of knowledge (e.g., object representation and agency) upon which all other understanding is built, and which is present from very early life [65] (see also [40] for a comprehensive review on these topics). That such core knowledge and set developmental trajectories are so conserved indicates their value in building foundational knowledge and skills vital for higher order cognition in humans.

Figure 1: Multiple nested scales of learning in nature. At the highest level, learning is done across generations, via evolution, to learn highly invariant universal structure such as intuitive physics, motor primitives, or other kinds of "core knowledge" [65]. These priors help to make learning faster at the level below, where learning is done within a lifetime and involves learning the general structure of different tasks, such as video game playing, how to navigate around a city, or acquiring specific skills. Learning at the innermost level involves fast adaptation within a specific task, such as playing a new video game or finding a certain restaurant within a new city. Again, such fast adaptation is crucially dependent on having learned useful priors and inductive biases at the level above.

Within a single lifetime, we can see evidence of meta-learning in various animal paradigms of cognition. In one of the first experimental studies of learning to learn, monkeys were challenged to learn an abstract rule for object-role bindings [30]. Two new objects were presented every six trials, only one of which was rewarding, irrespective of object placement. The optimal policy was to choose randomly on the first trial, and then thereafter choose based on the reward outcome of that trial, i.e. perform one-shot learning. Monkeys were able to learn this policy only after an extended period of learning and many sets of new objects.
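The structure of this task can be made concrete with a small simulation. The sketch below is only illustrative: it assumes deterministic reward for the correct object, six trials per object pair as described above, and it compares the one-shot (win-stay, lose-shift) policy with random choice. The function names and episode counts are assumptions introduced for the example, and the simulation says nothing about how long it takes an animal to acquire the policy.

```python
import random

def run_episode(policy, n_trials=6, rng=random):
    """One object pair: exactly one of two objects (0 or 1) is rewarded."""
    rewarded = rng.randrange(2)
    choice, reward, total = None, None, 0
    for t in range(n_trials):
        choice = policy(t, choice, reward, rng)
        reward = int(choice == rewarded)
        total += reward
    return total

def win_stay_lose_shift(t, prev_choice, prev_reward, rng):
    """Guess on the first trial, then repeat a rewarded choice or switch."""
    if t == 0:
        return rng.randrange(2)
    return prev_choice if prev_reward else 1 - prev_choice

def random_policy(t, prev_choice, prev_reward, rng):
    """Baseline that ignores all feedback."""
    return rng.randrange(2)

def mean_reward(policy, n_episodes=2000, seed=0):
    """Average number of rewarded trials per object pair."""
    rng = random.Random(seed)
    return sum(run_episode(policy, rng=rng) for _ in range(n_episodes)) / n_episodes

if __name__ == "__main__":
    print("win-stay/lose-shift:", mean_reward(win_stay_lose_shift))
    print("random choice:", mean_reward(random_policy))
```

Under these assumptions the one-shot policy is at chance only on the first trial of each new pair and correct on the remaining five, so it earns five or six rewards per pair, while random choice averages about three.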

Humans tend to meta-learn to a much greater extent and at greater levels of abstraction and nesting. For instance, we can perform meta-cognition in order to monitor and improve our own learning progress [44], as well as meta-reasoning to perform decision-making given finite computational resources and time [29]. Learning to learn also has roots within educational psychology and theories of classroom learning and how children learn [14, 27]. Within cognitive science, hierarchical Bayesian models of cognition capture how learning can occur at multiple scales and via the acquisition of useful, structured priors [26, 39]. This closely parallels the general formulation of meta-learning, and in fact constitutes an exact equivalence for certain forms of meta-learning in AI [28].

3 Neuroscience of meta-learning

While there has been a robust history of meta-learning within the psychological and cognitive sciences, the ties between meta-learning and neuroscience are relatively newer. In this section, I detail several lines of research with direct relevance to meta-learning, and draw ties to corresponding work in AI.

3.1 Meta-learning as learning of meta-parameters

Perhaps one of the most straightforward implementations of meta-learning is to learn the parameters of the learning algorithm itself (for instance, the learning rate or discount factor; also called "hyper-" or "meta-parameters"). A notable early account of biological meta-learning proposed that various neuromodulators such as dopamine, serotonin, and noradrenaline play critical roles in the regulation of the meta-parameters of reinforcement learning [22, 60]. Relatedly, activity within anterior cingulate cortex (ACC) has been shown to track recent volatility and uncertainty to drive learning rate changes in a Bayesian manner [7], and the ACC has further been proposed to play a central role in dynamically regulating the trade-off between exploration and exploitation during reward-based task learning [36]. The ACC and certain areas of prefrontal cortex (PFC) have also been suggested to function as a meta-controller dynamically arbitrating between model-free and model-based learning systems [41] (see also [19]).

Learning of meta-parameters has already been popularized within machine learning, owing to its practical benefits for performance (e.g. [34, 75]) and because it removes the need for hand-tuning. Indeed, momentum has increasingly shifted toward meta-learning more and more aspects of the learning process in recent years [78].
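A toy example may help to show why a meta-parameter such as the learning rate is worth adapting to the environment. The sketch below assumes a simple delta-rule learner that predicts a binary reward, and an outer loop that picks, from a fixed candidate set, the learning rate with the lowest prediction error. The reward schedule, the candidate values, and all function names are assumptions for illustration; this is a plain grid search, not the Bayesian model of [7] or the gradient-based methods of [34, 75].

```python
import random

def make_rewards(n_trials, switch_every, seed):
    """Binary rewards whose probability flips between 0.8 and 0.2 every switch_every trials."""
    rng = random.Random(seed)
    p, rewards = 0.8, []
    for t in range(n_trials):
        if t > 0 and t % switch_every == 0:
            p = 1.0 - p
        rewards.append(1.0 if rng.random() < p else 0.0)
    return rewards

def prediction_error(rewards, lr):
    """Mean squared prediction error of a delta-rule learner with learning rate lr."""
    value, total = 0.5, 0.0
    for r in rewards:
        total += (r - value) ** 2
        value += lr * (r - value)   # inner-loop learning
    return total / len(rewards)

def best_learning_rate(rewards, candidates):
    """Outer loop: choose the meta-parameter that minimizes prediction error."""
    return min(candidates, key=lambda lr: prediction_error(rewards, lr))

CANDIDATES = [0.02, 0.05, 0.1, 0.2, 0.4, 0.8]

if __name__ == "__main__":
    stable = make_rewards(4000, switch_every=10**9, seed=0)   # contingencies never change
    volatile = make_rewards(4000, switch_every=20, seed=0)    # frequent reversals
    print("best learning rate, stable:", best_learning_rate(stable, CANDIDATES))
    print("best learning rate, volatile:", best_learning_rate(volatile, CANDIDATES))
```

In the stable schedule a low learning rate averages out noise, whereas in the volatile schedule a higher learning rate is needed to keep up with reversals, which is the qualitative pattern that volatility-tracking accounts are meant to explain.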

3.2 Meta-learning over representations

Some lines of neuroscience research broadly related to meta-learning concern learning control over existing representations. In particular, mental schemas [67] are described as structured mental representations that allow for faster learning, by aiding in retrieval of existing knowledge and integrating new knowledge (see also [68] for a good review). Such processes are suggested to be mediated by hippocampal-cortical interactions and a specific time course of memory consolidation [67]. In general, the focus of these works centers on how existing mental schemas affect new learning, rather than the learning process giving rise to the schema in the first place, and thus can be subsumed within the broader scope of meta-learning. Another relevant line of research is that on hierarchical representation and cognitive control (the ability to perform task-relevant processing without external support or in the face of distractors) [38], and a particularly compelling set of works proposing hierarchical organization of prefrontal cortex along the rostro-caudal axis to support increasingly abstract levels of cognitive control [37, 2, 3]. Such a hierarchically structured organization is intriguingly suggestive of the multi-scaled nature of meta-learning systems.

Less explored in this area is how such hierarchical representations emerge in the first place. To examine this, it’s helpful to turn to developmental neuroscience, which shows that infants can learn latent structure to construct hierarchical rules [74] and extract statistical regularity from language [55]. Human adults also learn new structure, and indeed have a bias toward structure learning [26], even when not strictly needed [17], since such structure affords faster learning and generalization in as-yet unseen situations. This work ties together hierarchical structure learning and previously proposed theories of basal ganglia-PFC gating models of working memory [48, 54]. Computational accounts have also been put forth that show hierarchical control can emerge implicitly, as a function of training on task distributions for which the optimal policy assumes this hierarchical form [11], essentially allowing for optimal learning efficiency on new problems drawn from this distribution.

3.3 Meta-learning as latent state and Bayesian inference

Human representation learning and inference are increasingly characterized through a Bayesian lens, and are seen to be key to fast human learning. Lake and colleagues proposed an influential Bayesian framework of learning probabilistic programs, which is able to model how humans acquire new concepts from just a few examples [39]. Such a model meta-learns by developing structured, hierarchical priors, in which previous experience with similar tasks induces the formation of concepts that improve learning of new concepts. Humans have also been shown to perform tasks by subdividing and decomposing them into optimal hierarchies, which map to optimal policies that are able to generalize to the distribution of all related tasks [63].

Parallel to these developments in cognitive science, advancements in AI and deep learning have leveraged the ability to train powerful models on large quantities of structured data to learn these efficient representations and learning mechanisms end-to-end [50, 70, 56, 75, 25, 1, 72, 23, 45]. In the supervised case, it can be shown that a neural network trained on an environment of related tasks is equivalent to performing hierarchical Bayesian inference, i.e. learning an appropriate prior (or hypothesis space) such that the error for learning on new related tasks is minimized [5]. This correspondence has also been demonstrated to hold for a popular meta-learning method, model-agnostic meta-learning [28], in which what is meta-learned are the initial parameters of the neural network, constituting the learned prior. Similarly, memory-based meta-learning approaches [56, 72, 23, 45] meta-learn the weights of a recurrent neural network, such that the time-dependent evolution of the activity dynamics effectively tracks sufficient statistics of the current task and performs Bayesian updating in a way close to Bayes optimal (given sufficient training) [49].
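The correspondence between training on related tasks and hierarchical Bayesian inference described above can be written schematically as follows. The notation is an assumption introduced here for illustration and does not appear in the cited works in this exact form: tasks are indexed by i = 1, ..., N with data D_i, each task has its own parameters \theta_i, and \phi denotes the shared parameters of the prior.

```latex
% Assumed notation: D_i = data of task i, \theta_i = task-specific parameters,
% \phi = shared prior parameters learned across tasks.
\begin{align}
  \theta_i &\sim p(\theta \mid \phi), \qquad D_i \sim p(D \mid \theta_i), \\
  \phi^{*} &= \arg\max_{\phi} \sum_{i=1}^{N} \log \int p(D_i \mid \theta)\, p(\theta \mid \phi)\, d\theta, \\
  p(\theta_{\mathrm{new}} \mid D_{\mathrm{new}}, \phi^{*}) &\propto
    p(D_{\mathrm{new}} \mid \theta_{\mathrm{new}})\, p(\theta_{\mathrm{new}} \mid \phi^{*}).
\end{align}
```

The second line is the slow, outer level of learning, in which the prior is fitted across tasks, and the third line is the fast, inner level, in which a small amount of new data is combined with the learned prior. Read loosely in these terms, the meta-learned initial parameters of model-agnostic meta-learning play the role of \phi, and the few gradient steps on a new task act as an approximate estimate of \theta_{\mathrm{new}}.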

Taking this perspective further, we can see that, at least in the context of reinforcement learning, there is little distinction between the fastest inner scale of learning (which integrates incoming information with a learned, structured prior) and latent state inference for decision-making or cognitive control. An account for how this process could occur in the brain was put forth by Nakahara and Hikosaka [47], who posited that dopamine encodes more than just reward prediction error, and actually mediates learning more complex reward structure via a learned internal state representation. Further, Donoso et al. [21] suggested that PFC performs online Bayesian inference combined with hypothesis testing to quickly reason over potential strategies which have been learned and stored in memory. Such proposals make contact with a particular subclass of meta-learning models in AI based on episodic memory, in which memory banks of past experiences are incorporated as part of the meta-learning process [56, 53, 73].

4 Bridging between AI and neuroscience: New questions and future directions

We’ve seen that the multi-scale character of learning in nature maps well to the framework of meta-learning, as implemented in AI. These points of connection allow us to define new potentially fruitful avenues of research. At the same time, it is important to note the fundamentally different goals of neuroscience compared to AI research. Animal intelligence is already rife with practically useful mental models. Therefore, the driving force of neuroscience is to discover what already exists; that is, the representations already acquired by animals and the mechanisms for control over such representations, strategies, knowledge, and subsequent impacts on new learning.

In contrast, the end goal of AI is to engineer a learning system from scratch. Deep neural networks are typically initialized with random weights and possess very weak inductive biases [12]. Therefore, there has been a recognized need for either directly hand-designing algorithmic/architectural biases or learning inductive biases to improve learning. Although the former has been quite successful in achieving state-of-the-art results in various domains, the vast improvements in available training data in recent years have also led to a renewed interest in inducing models to learn these inductive biases through meta-learning approaches. This has inevitably shifted the engineering problem one layer of abstraction, from how to construct a model that learns to how to construct a model of learning itself. It is in this aspect that cognitive science and neuroscience are well-positioned to offer unique insights to the AI community.

Recent work has already demonstrated how the two can be fruitfully combined, showing that deep reinforcement learning models can capture meta-learning effects similar to how animals learn and in accordance with previous neural findings [71]. In a nice demonstration of the "virtuous circle" between AI and neuroscience put forth by Hassabis et al. [31], these advances have led to resurgent interest in meta-learning within neuroscience, for instance in extending meta-learning to more biologically plausible spiking networks and forms of weight updating [8, 9]. More generally, the last few years have seen great interest in using deep neural networks as models of biological learning [32, 16, 51, 42, 13, 52] and sensory processing [76, 35], or even in fitting them simultaneously to decision-making behavior and neural activity [20].

Structure learning [26] and model-building are likely to be increasingly important in artificial agent construction, and it’s in this area that neuroscience has the potential to offer even more valuable insights. This highlights the need to focus on discovering the processes of structure learning, rather than the structured representations themselves. Furthermore, this perspective has strong implications for task design, emphasizing recording and measuring neural signals during the training process itself rather than current practices of focusing on already trained animals. Additionally, it points to a need for more precise determination of existing priors and biases that animals already possess and how these priors interact with new learning in everyday, complex settings.

5 Highlights

  • Multiple scales of learning (and hence meta-learning) are ubiquitous in nature.

  • Many existing lines of work in neuroscience and cognitive science touch upon different aspects of meta-learning, of which we outline three in particular.

  • The distinct but complementary goals of AI and neuroscience point to new points of possible contact, among which meta-learning is well-positioned.

6 Acknowledgements

The author would like to thank Matthew Botvinick, Kevin Miller, and Kim Stachenfeld for helpful discussions and feedback, and DeepMind for funding.


  • Andrychowicz et al. [2016] Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., & de Freitas, N. (2016). Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, pp. 3981–3989.
  • Badre [2008] Badre, D. (2008). Cognitive control, hierarchy, and the rostro–caudal organization of the frontal lobes. Trends in Cognitive Sciences, 12, 193–200.
  • Badre et al. [2010] Badre, D., Kayser, A. S., & D’Esposito, M. (2010). Frontal cortex and the discovery of abstract action rules. Neuron, 66, 315–326.
  • Baldwin [1896] Baldwin, J. M. (1896). A new factor in evolution. The American Naturalist, 30, 441–451.
  • Baxter [1998] Baxter, J. (1998). Theoretical models of learning to learn. In Learning to learn. (Springer), pp. 71–94.
  • Behrens et al. [2018] Behrens, T. E., Muller, T. H., Whittington, J. C., Mark, S., Baram, A. B., Stachenfeld, K. L., & Kurth-Nelson, Z. (2018). What is a cognitive map? organizing knowledge for flexible behavior. Neuron, 100, 490–509.
  • Behrens et al. [2007] Behrens, T. E., Woolrich, M. W., Walton, M. E., & Rushworth, M. F. (2007). Learning the value of information in an uncertain world. Nature Neuroscience, 10, 1214–1221.
  • Bellec et al. [2018] Bellec, G., Salaj, D., Subramoney, A., Legenstein, R., & Maass, W. (2018). Long short-term memory and learning-to-learn in networks of spiking neurons. In Advances in Neural Information Processing Systems, pp. 787–797.
  • Bellec et al. [2020] Bellec, G., Scherr, F., Subramoney, A., Hajek, E., Salaj, D., Legenstein, R., & Maass, W. (2020). A solution to the learning dilemma for recurrent networks of spiking neurons. Nature Communications, 11.
  • Bengio et al. [1991] Bengio, Y., Bengio, S., & Cloutier, J. (1991). Learning a synaptic learning rule. In Neural Networks, 1991., IJCNN-91-Seattle International Joint Conference on, vol. 2, pp. 969–vol. IEEE.
  • Botvinick & Plaut [2004] Botvinick, M. & Plaut, D. C. (2004). Doing without schema hierarchies: a recurrent connectionist approach to normal and impaired routine sequential action. Psychological Review, 111, 395.
  • Botvinick et al. [2019] Botvinick, M., Ritter, S., Wang, J. X., Kurth-Nelson, Z., Blundell, C., & Hassabis, D. (2019). Reinforcement learning, fast and slow. Trends in Cognitive Sciences.
  • Botvinick et al. [2020] Botvinick, M., Wang, J. X., Dabney, W., Miller, K. J., & Kurth-Nelson, Z. (2020). Deep reinforcement learning and its neuroscientific implications. Neuron.
  • Bransford et al. [2000] Bransford, J. D., Brown, A. L., Cocking, R. R., et al. (2000). How people learn, vol. 11. (Washington, DC: National Academy Press).
  • Brown & Kane [1988] Brown, A. L. & Kane, M. J. (1988). Preschool children can learn to transfer: Learning to learn and learning from example. Cognitive Psychology, 20, 493–523.
  • Cichy & Kaiser [2019] Cichy, R. M. & Kaiser, D. (2019). Deep neural networks as scientific models. Trends in Cognitive Sciences.
  • Collins & Frank [2013] Collins, A. G. & Frank, M. J. (2013). Cognitive control over learning: Creating, clustering, and generalizing task-set structure. Psychological Review, 120, 190.
  • Constantinescu et al. [2016] Constantinescu, A. O., O’Reilly, J. X., & Behrens, T. E. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352, 1464–1468.
  • Daw et al. [2005] Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8, 1704.
  • Dezfouli et al. [2018] Dezfouli, A., Morris, R., Ramos, F. T., Dayan, P., & Balleine, B. (2018). Integrated accounts of behavioral and neuroimaging data using flexible recurrent neural network models. In Advances in Neural Information Processing Systems, pp. 4228–4237.
  • Donoso et al. [2014] Donoso, M., Collins, A. G., & Koechlin, E. (2014). Foundations of human reasoning in the prefrontal cortex. Science, 344, 1481–1486.
  • Doya [2002] Doya, K. (2002). Metalearning and neuromodulation. Neural Networks, 15, 495–506.
  • Duan et al. [2016] Duan, Y., Schulman, J., Chen, X., Bartlett, P. L., Sutskever, I., & Abbeel, P. (2016). RL²: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779.
  • Fernando et al. [2018] Fernando, C., Sygnowski, J., Osindero, S., Wang, J., Schaul, T., Teplyashin, D., Sprechmann, P., Pritzel, A., & Rusu, A. (2018). Meta-learning by the Baldwin effect. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 1313–1320.
  • Finn et al. [2017] Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. International Conference on Machine Learning.
  • Gershman & Niv [2010] Gershman, S. J. & Niv, Y. (2010). Learning latent structure: carving nature at its joints. Current Opinion in Neurobiology, 20, 251–256.
  • Gopnik et al. [1999] Gopnik, A., Meltzoff, A. N., & Kuhl, P. K. (1999). The scientist in the crib: Minds, brains, and how children learn. (William Morrow & Co).
  • Grant et al. [2018] Grant, E., Finn, C., Levine, S., Darrell, T., & Griffiths, T. (2018). Recasting gradient-based meta-learning as hierarchical Bayes. In International Conference on Learning Representations.
  • Griffiths et al. [2019] Griffiths, T. L., Callaway, F., Chang, M. B., Grant, E., Krueger, P. M., & Lieder, F. (2019). Doing more with less: meta-reasoning and meta-learning in humans and machines. Current Opinion in Behavioral Sciences, 29, 24–30.
  • Harlow [1949] Harlow, H. F. (1949). The formation of learning sets. Psychol. Rev., 56, 51.
  • Hassabis et al. [2017] Hassabis, D., Kumaran, D., Summerfield, C., & Botvinick, M. (2017). Neuroscience-inspired artificial intelligence. Neuron, 95, 245–258.
  • Hasson et al. [2020] Hasson, U., Nastase, S. A., & Goldstein, A. (2020). Direct fit to nature: An evolutionary perspective on biological and artificial neural networks. Neuron, 105, 416–434.
  • Hinton & Nowlan [1987] Hinton, G. E. & Nowlan, S. J. (1987). How learning can guide evolution. Complex systems, 1, 495–502.
  • Jaderberg et al. [2017] Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W. M., Donahue, J., Razavi, A., Vinyals, O., Green, T., Dunning, I., Simonyan, K., et al. (2017). Population based training of neural networks. arXiv preprint arXiv:1711.09846.
  • Kell et al. [2018] Kell, A. J., Yamins, D. L., Shook, E. N., Norman-Haignere, S. V., & McDermott, J. H. (2018). A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98, 630–644.
  • Khamassi et al. [2013] Khamassi, M., Enel, P., Dominey, P. F., & Procyk, E. (2013). Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters. In Progress in Brain Research, vol. 202. (Elsevier), pp. 441–464.
  • Koechlin et al. [2003] Koechlin, E., Ody, C., & Kouneiher, F. (2003). The architecture of cognitive control in the human prefrontal cortex. Science, 302, 1181–1185.
  • Koechlin & Summerfield [2007] Koechlin, E. & Summerfield, C. (2007). An information theoretical approach to prefrontal executive function. Trends in Cognitive Sciences, 11, 229–235.
  • Lake et al. [2015] Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350.
  • Lake et al. [2017] Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. Behav. Brain Sci., 40.
  • Lee et al. [2014] Lee, S. W., Shimojo, S., & O’Doherty, J. P. (2014). Neural computations underlying arbitration between model-based and model-free learning. Neuron, 81, 687–699.
  • Marblestone et al. [2016] Marblestone, A. H., Wayne, G., & Kording, K. P. (2016). Toward an integration of deep learning and neuroscience. Frontiers in Computational Neuroscience, 10, 94.
  • Marcus et al. [1999] Marcus, G. F., Vijayan, S., Rao, S. B., & Vishton, P. M. (1999). Rule learning by seven-month-old infants. Science, 283, 77–80.
  • Metcalfe et al. [1994] Metcalfe, J., Shimamura, A. P., et al. (1994). Metacognition: Knowing about knowing. (MIT press).
  • Mishra et al. [2017] Mishra, N., Rohaninejad, M., Chen, X., & Abbeel, P. (2017). A simple neural attentive meta-learner. International Conference on Learning Representations.
  • Mnih et al. [2015] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518, 529.
  • Nakahara & Hikosaka [2012] Nakahara, H. & Hikosaka, O. (2012). Learning to represent reward structure: A key to adapting to complex environments. Neuroscience Research, 74, 177–183.
  • O’Reilly & Frank [2006] O’Reilly, R. C. & Frank, M. J. (2006). Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia. Neural Comput., 18, 283–328.
  • Ortega et al. [2019] Ortega, P. A., Wang, J. X., Rowland, M., Genewein, T., Kurth-Nelson, Z., Pascanu, R., Heess, N., Veness, J., Pritzel, A., Sprechmann, P., et al. (2019). Meta-learning of sequential strategies. arXiv preprint arXiv:1905.03030.
  • Ravi & Larochelle [2016] Ravi, S. & Larochelle, H. (2016). Optimization as a model for few-shot learning. In International Conference on Learning Representations.
  • Richards et al. [2019] Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R., Christensen, A., Clopath, C., Costa, R. P., de Berker, A., Ganguli, S., et al. (2019). A deep learning framework for neuroscience. Nature Neuroscience, 22, 1761–1770.
  • Ritter et al. [2018a] Ritter, S., Wang, J. X., Kurth-Nelson, Z., & Botvinick, M. M. (2018a). Episodic control as meta-reinforcement learning. In Annual Meeting of the Cognitive Science Society.
  • Ritter et al. [2018b] Ritter, S., Wang, J. X., Kurth-Nelson, Z., Jayakumar, S. M., Blundell, C., Pascanu, R., & Botvinick, M. (2018b). Been there, done that: Meta-learning with episodic recall. International Conference on Machine Learning (ICML).
  • Rougier et al. [2005] Rougier, N. P., Noelle, D. C., Braver, T. S., Cohen, J. D., & O’Reilly, R. C. (2005). Prefrontal cortex and flexible cognitive control: Rules without symbols. Proceedings of the National Academy of Sciences USA, 102, 7338–7343.
  • Saffran et al. [1996] Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928.
  • Santoro et al. [2016] Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., & Lillicrap, T. (2016). Meta-learning with memory-augmented neural networks. In International conference on machine learning, pp. 1842–1850.
  • Schmidhuber [1987] Schmidhuber, J. (1987). Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-… hook. Ph.D. thesis, Technische Universität München.
  • Schmidhuber [1993] Schmidhuber, J. (1993). A neural network that embeds its own meta-levels. In IEEE International Conference on Neural Networks, pp. 407–412. IEEE.
  • Schmidhuber et al. [1996] Schmidhuber, J., Zhao, J., & Wiering, M. (1996). Simple principles of metalearning. Tech. rep., SEE.
  • Schweighofer & Doya [2003] Schweighofer, N. & Doya, K. (2003). Meta-learning in reinforcement learning. Neural Networks, 16, 5–9.
  • Silver et al. [2016] Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484.
  • Snell et al. [2017] Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, pp. 4077–4087.
  • Solway et al. [2014] Solway, A., Diuk, C., Córdova, N., Yee, D., Barto, A. G., Niv, Y., & Botvinick, M. M. (2014). Optimal behavioral hierarchy. PLOS Comput Biol, 10, e1003779.
  • Spelke et al. [1992] Spelke, E. S., Breinlinger, K., Macomber, J., & Jacobson, K. (1992). Origins of knowledge. Psychological Review, 99, 605.
  • Spelke & Kinzler [2007] Spelke, E. S. & Kinzler, K. D. (2007). Core knowledge. Developmental Science, 10, 89–96.
  • Thrun & Pratt [1998] Thrun, S. & Pratt, L. (1998). Learning to learn: Introduction and overview. In Learning to learn. (Springer), pp. 3–17.
  • Tse et al. [2007] Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., & Morris, R. G. (2007). Schemas and memory consolidation. Science, 316, 76–82.
  • Van Kesteren et al. [2012] Van Kesteren, M. T., Ruiter, D. J., Fernández, G., & Henson, R. N. (2012). How schema and novelty augment memory formation. Trends in Neurosciences, 35, 211–219.
  • Vanschoren [2018] Vanschoren, J. (2018). Meta-learning: A survey. arXiv preprint arXiv:1810.03548.
  • Vinyals et al. [2016] Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al. (2016). Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pp. 3630–3638.
  • Wang et al. [2018] Wang, J. X., Kurth-Nelson, Z., Kumaran, D., Tirumala, D., Soyer, H., Leibo, J. Z., Hassabis, D., & Botvinick, M. (2018). Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience, 21, 860–868.
  • Wang et al. [2016] Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., Blundell, C., Kumaran, D., & Botvinick, M. (2016). Learning to reinforcement learn. In Annual Meeting of the Cognitive Science Society.
  • Wayne et al. [2018] Wayne, G., Hung, C.-C., Amos, D., Mirza, M., Ahuja, A., Grabska-Barwinska, A., Rae, J., Mirowski, P., Leibo, J. Z., Santoro, A., et al. (2018). Unsupervised predictive memory in a goal-directed agent. arXiv preprint arXiv:1803.10760.
  • Werchan et al. [2015] Werchan, D. M., Collins, A. G., Frank, M. J., & Amso, D. (2015). 8-month-old infants spontaneously learn and generalize hierarchical rules. Psychological Science, 26, 805–815.
  • Xu et al. [2018] Xu, Z., van Hasselt, H. P., & Silver, D. (2018). Meta-gradient reinforcement learning. In Advances in neural information processing systems, pp. 2396–2407.
  • Yamins & DiCarlo [2016] Yamins, D. L. & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19, 356.
  • Zador [2019] Zador, A. M. (2019). A critique of pure learning and what artificial neural networks can learn from animal brains. Nature Communications, 10, 1–7.
  • Zahavy et al. [2020] Zahavy, T., Xu, Z., Veeriah, V., Hessel, M., Oh, J., van Hasselt, H., Silver, D., & Singh, S. (2020). Self-tuning deep reinforcement learning. arXiv preprint arXiv:2002.12928.