A Perspective on Objects and Systematic Generalization in Model-Based RL

06/03/2019 ∙ by Sjoerd van Steenkiste, et al.

In order to meet the diverse challenges in solving many real-world problems, an intelligent agent has to be able to dynamically construct a model of its environment. Objects facilitate the modular reuse of prior knowledge and the combinatorial construction of such models. In this work, we argue that dynamically bound features (objects) do not simply emerge in connectionist models of the world. We identify several requirements that need to be fulfilled in overcoming this limitation and highlight corresponding inductive biases.






1 Introduction

Artificial General Intelligence (AGI) requires an intelligent agent to solve a wide variety of real-world tasks. Learning to solve these tasks efficiently involves sharing knowledge between tasks, and systematic generalization from relatively few samples. In contrast, agents trained with Reinforcement Learning (RL) frequently fall short in this regard: they rely on excessive amounts of data (François-Lavet et al., 2018) and are unable to generalize beyond their initial training regime (Leike et al., 2017).

Model-based RL promises to alleviate this problem by using a general (non task-specific) world model that captures the latent structure of the environment. This more abstract knowledge about the world is expected to be useful for many (even novel) tasks and facilitate simulation and planning. It is unclear what constitutes a good model, and frequently models are either engineered (de Avila Belbute-Peres et al., 2018) or obtained by training a deep (recurrent) neural network to predict future states of the world (Schmidhuber, 1990; Ha & Schmidhuber, 2018). In the latter case, an underlying assumption is that the learned representations of such a network present suitable abstractions for transfer and planning, analogous to the versatility of features learned by a deep convolutional image classifier. However, the limited success of learned models for model-based RL in these domains raises doubts about the validity of this assumption.

In this work, we argue that, rather than learning a single monolithic model that handles all situations in all environments, what is needed is a flexible system which dynamically infers a suitable model on the fly. A human playing the game of Space Invaders uses a mental model that revolves around space ships and aliens, without simultaneously also considering all other aspects of the real world that are relevant for other tasks. Humans are also quick to adapt their model to new information by adding or removing assumptions. For example, reading the manual of a game before playing greatly increases first-episode performance (Tsividis et al., 2017).

Initially, it may appear that in arguing for a dynamic model we have mostly made the task of model-learning harder: we now require learning many different models that fit specific situations. Why then would we expect such a model to perform any better or even work at all?


Objects are the key piece to this puzzle, in that they facilitate the modular reuse of prior knowledge and the combinatorial construction of novel models. It is well-established that objects play a central role in human cognition, both for internal reasoning and as the basis for communicating about the world. Indeed, objects are widely considered to be core knowledge (Spelke & Kinzler, 2007), and infants learn about objects already within their first year of life (Munakata et al., 1997).

RL methods that leverage the combinatorics of objects and relations have shown similar benefits in terms of systematic generalization (Zambaldi et al., 2019), sample efficiency (Diuk et al., 2008), and transferring skills and knowledge across domains (Kansky et al., 2017). Recently, there appears to be an emerging consensus that objects are important in learning intelligent agents (Lake et al., 2017), while it remains unclear how to fully realize this potential. The discrete and compositional nature of objects seems at odds with many of the core tenets of connectionism, and they are unlikely to emerge naturally in neural networks. Reconciling the two is a difficult problem and requires careful thought to ensure a synergistic integration.

The Binding Problem

How then should we think about objects? Why do they not simply emerge in neural networks, what is missing, and how can this be addressed? Many of these questions have been raised and debated in theoretical neuroscience and have become known as the binding problem: How does the brain bind features together into objects while keeping them separate from other objects? Inspired by this literature (cf. Treisman (1999); von der Malsburg (1995)) we will focus on three main challenges in incorporating objects in connectionist models of the world: segregation, representation, and composition, which we discuss in the next sections.

Segregation is about object discovery, i.e. given a set of observations, what are good candidates for representational objects and how can they be extracted. Representation is about storing this representational content in neural networks, and as we will find, plain fully-connected feedforward networks are ill-equipped to solve this task. Finally, composition is about using representational objects efficiently in a way that ensures combinatorial generalization (systematicity; Niklasson & van Gelder (1994)).

2 Representation

What are good object representations? If objects are to serve as the primitives for compositional reasoning, it is important that their representations support that end. Here we argue for three main requirements (keeping space limitations in mind, we do not attempt to list all requirements exhaustively and instead focus on those that we believe to be most important):


Universality

Each object representation should be able to represent any object regardless of position, class or other properties. It should facilitate generalization, even to unseen objects (zero-shot generalization), which in practice means that it should be distributed and disentangled.


Multi-Object

It should be possible to represent multiple objects simultaneously, such that they can be related and composed but also transformed individually. Because the number of possible objects is intractably large, this only needs to cover a small number of objects at the same time (e.g. 7 ± 2; Miller, 1956). Objects should instead be swapped in and out of this working memory on demand.

Common Format

All objects should be represented in the same format, i.e. in terms of the same features. This makes representations comparable, provides a unified interface for compositional reasoning and allows the transfer of knowledge between objects.

It is easy to see how the regular representations of fully-connected neural networks fall short in this regard. When representing multiple objects, they can either reuse the same features for all objects simultaneously, which superimposes the representations and leads to ambiguities ({red, square} + {blue, triangle} = {red, blue, square, triangle}), or allocate a different set of features per object, which violates common format. Without any architectural bias in the form of weight sharing, useful multi-object representations are thus unlikely to emerge naturally in a neural network. In what way can this problem be addressed?
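The superposition ambiguity can be made concrete with a minimal sketch. It assumes a hypothetical binary feature vocabulary (`FEATURES`) and represents a scene by summing the feature vectors of its objects. The names and the encoding are illustrative assumptions, not part of any cited method.

```python
# Hypothetical feature vocabulary, chosen only for illustration.
FEATURES = ["red", "blue", "square", "triangle"]

def encode(obj):
    """Encode one object (a set of feature names) as a binary vector."""
    return [1 if f in obj else 0 for f in FEATURES]

def superimpose(scene):
    """Represent a scene by summing its object vectors over shared features."""
    vectors = [encode(obj) for obj in scene]
    return [sum(column) for column in zip(*vectors)]

scene_a = [{"red", "square"}, {"blue", "triangle"}]
scene_b = [{"red", "triangle"}, {"blue", "square"}]

# Both scenes collapse to the same vector: which colour belongs to
# which shape can no longer be read off the representation.
code_a = superimpose(scene_a)
code_b = superimpose(scene_b)
print(code_a == code_b)
```

Two different scenes receive identical codes, which is exactly the loss of binding information described above.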

Weight sharing, as it is used, for example, in ConvNets and RNNs, is a step in the right direction. We call these approaches “slot-based” because they provide several slots that all share weights and can thus be used to represent objects in a common format. In the case of RNNs there is one slot per time-step (Eslami et al., 2016), while in ConvNets there is one slot per spatial location in the image (Santoro et al., 2017). Note that both are in slight violation of universality because they tie a slot to a specific time step or location, while RNNs additionally do not simultaneously represent multiple objects. We can extend the idea of representational slots and consider a setting in which each object has its own universal slot and all slots share a common format (instance slots, cf. Figure 1). While this constitutes a good object representation, it raises another problem: if all slots are identical and share weights, then how do they not end up all representing the same object? Solving this conundrum requires a dynamic information routing process that goes beyond simple feed-forward processing (see Section 3).
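A small sketch may help to illustrate both the benefit and the conundrum of instance slots. It assumes a hypothetical linear encoder with a fixed weight matrix (`WEIGHTS`) that is applied identically to every slot. The values are arbitrary, and a real system would learn the weights.

```python
def shared_encoder(features, weights):
    """Apply the same linear map to the input features of one slot."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def encode_slots(slot_inputs, weights):
    """Encode every slot with identical weights, giving a common format."""
    return [shared_encoder(features, weights) for features in slot_inputs]

# Hypothetical 2x3 weight matrix and two per-object inputs (arbitrary values).
WEIGHTS = [[1.0, 0.0, 2.0],
           [0.0, 1.0, -1.0]]
inputs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

slots = encode_slots(inputs, WEIGHTS)
swapped = encode_slots(inputs[::-1], WEIGHTS)

# Swapping the inputs only swaps the slot contents: no slot is tied
# to a particular object, position or time step.
print(slots == swapped[::-1])

# If every slot receives the same input, all slots are identical. Some
# routing process must decide which part of the input each slot sees.
same = encode_slots([inputs[0], inputs[0]], WEIGHTS)
print(same[0] == same[1])
```

The second check shows why weight sharing alone is not sufficient: without a mechanism that routes different information to different slots, identical slots produce identical representations.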

There are two, less developed, alternatives to slot-based approaches that have the potential to meet our requirements: Augmentation approaches keep a single set of features but augment each feature to include some extra grouping information. Examples include complex-valued activations (e.g. Reichert & Serre (2013)) or spiking networks that encode grouping via synchronization (e.g. Lane & Henderson (1998)). Embedding approaches carefully embed multi-object representations in a higher-dimensional space (e.g. Tensor Product Representations; Smolensky, 1990).
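As an illustration of the embedding idea, the following sketch binds fillers to roles with outer products, in the spirit of tensor product representations. The role and filler vectors are hypothetical and hand-picked. Exact unbinding as shown here relies on the assumption that the role vectors are orthonormal.

```python
def bind(role, filler):
    """Bind a filler to a role with an outer product (role x filler matrix)."""
    return [[r * f for f in filler] for r in role]

def superimpose(tensors):
    """Sum several bound role-filler matrices into one representation."""
    rows, cols = len(tensors[0]), len(tensors[0][0])
    return [[sum(t[i][j] for t in tensors) for j in range(cols)]
            for i in range(rows)]

def unbind(tensor, role):
    """Recover the filler bound to `role`; exact for orthonormal roles."""
    cols = len(tensor[0])
    return [sum(role[i] * tensor[i][j] for i in range(len(role)))
            for j in range(cols)]

# Hypothetical orthonormal role vectors, one per object in the scene.
ROLE_1, ROLE_2 = [1, 0], [0, 1]
# Hypothetical fillers over the features (red, blue, square, triangle).
RED_SQUARE, BLUE_TRIANGLE = [1, 0, 1, 0], [0, 1, 0, 1]
RED_TRIANGLE, BLUE_SQUARE = [1, 0, 0, 1], [0, 1, 1, 0]

scene_a = superimpose([bind(ROLE_1, RED_SQUARE), bind(ROLE_2, BLUE_TRIANGLE)])
scene_b = superimpose([bind(ROLE_1, RED_TRIANGLE), bind(ROLE_2, BLUE_SQUARE)])

# The two scenes stay distinguishable, and each object can be read back out.
print(scene_a != scene_b)
print(unbind(scene_a, ROLE_1) == RED_SQUARE)
```

The price of this disambiguation is dimensionality: two roles and four features require a 2 x 4 representation instead of a single four-dimensional vector.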

Figure 1: Different types of slot-based representation strategies.

3 Segregation

It can be difficult to provide precise boundaries or definitions even for concrete objects like a tree, a mountain, and a river. Matters become even worse with slightly more abstract objects like a hole, a shadow, or a street corner. Clearly, sensory information does not come pre-structured into objects, yet we so effortlessly and consistently perceive them. How can we aid our agents in developing an equally general understanding of objects? We address this by focusing on the role of objects as computational primitives in a compositional reasoning system, namely as abstract patterns of the input data that are modular, dynamic, and consistent:


Modular

Objects subdivide the input into parts with strong internal coherence while being mostly independent of each other given some task under consideration. This division can be thought of as a form of clustering by mutual predictability and helps minimize the error that results from treating them as independent entities.


Dynamic

Objects are task-dependent, i.e. there is no one fixed definition of objects that applies to all tasks. For example, objects can be part-whole hierarchies whose parts are objects themselves: a stack of chairs can be viewed as a single object (the stack) or as multiple objects (the individual chairs). This necessitates top-down feedback: interaction between the up-stream problem solving and down-stream segregation to obtain a dynamic definition of objects.


Consistent

Representational objects often “refer to” physical objects in the real world (although this does not need to be the case), and their usefulness depends on the reliability of that link. The output of the segregation process must thus be stable and consistent to ensure that the results from internal reasoning can be mapped back onto the environment. Consistency is also important in communication (different agents should agree on objects), and in the absence of information, e.g. as a result of occlusion.

Modularity rules out standard convolutional neural networks as a means to learn object representations given by the representational content at each spatial slot. Each convolutional layer with a kernel size exceeding 1 × 1 creates dependencies between local spatial neighborhoods. Through depth, the representational content of upper layers encodes information from all spatial positions and is no longer modular: a change affecting a single object in the input image affects the representations at all spatial locations in the upper layers.
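The loss of modularity with depth can be illustrated with a minimal one-dimensional sketch. It assumes a hypothetical stack of zero-padded convolutions with an all-positive kernel (so that effects cannot cancel) and counts how many output positions change when a single input position is perturbed. The function names are illustrative only.

```python
def conv1d(signal, kernel):
    """Zero-padded 1D cross-correlation that keeps the input length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]

def affected_positions(length, position, depth, kernel):
    """Count output positions that change when one input position changes."""
    base = [0.0] * length
    perturbed = list(base)
    perturbed[position] = 1.0
    for _ in range(depth):
        base = conv1d(base, kernel)
        perturbed = conv1d(perturbed, kernel)
    return sum(1 for a, b in zip(base, perturbed) if a != b)

KERNEL_3 = [1.0, 1.0, 1.0]  # hypothetical size-3 kernel, positive weights
KERNEL_1 = [1.0]            # size-1 kernel: no spatial mixing

for depth in (1, 2, 3):
    print(depth,
          affected_positions(11, 5, depth, KERNEL_3),
          affected_positions(11, 5, depth, KERNEL_1))
```

With the size-3 kernel the number of affected positions grows by two with every layer, whereas with the size-1 kernel a local change stays local regardless of depth.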

Dynamicity implies that we cannot treat segregation as a pre-processing step that extracts objects from input data. This rules out the use of large quantities of labeled data to pre-train an image segmenter, or the use of domain-specific engineering as is commonly found in generative models that essentially encode a fixed definition of objects. Moreover, human labor is an expensive resource that we cannot spend exhaustively to overcome all possible situations.

We conclude that to a large extent object learning must be unsupervised, through a specialized mechanism that allows for the possibility to incorporate top-down feedback. Two promising approaches from the literature are attention and differentiable clustering.

Attention mechanisms are used to selectively attend to a subset of the image, i.e. parts that correspond to a single object (Schmidhuber & Huber, 1991; Eslami et al., 2016). In this way, attention restricts the information intake and ensures that the resulting representations are modular. Top-down feedback can be incorporated by granting control of the attention window to the agent that learns to solve some task (Mnih et al., 2014). A downside is that objects are processed in an iterative fashion, which may make it more challenging to reason about multiple objects simultaneously (Kosiorek et al., 2018).
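The way in which an attention window restricts information intake can be sketched as follows. The example assumes a hypothetical hard, square glimpse on a small grid and uses a trivial stand-in encoder (a pixel sum) in place of a learned network. The window locations are fixed by hand here, whereas an agent would choose them.

```python
def glimpse(image, top, left, size):
    """Return the size x size window whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]

def encode(window):
    """Stand-in for a learned encoder: simply the sum of the pixels."""
    return sum(sum(row) for row in window)

# Hypothetical 4x4 image containing two 'objects' (non-zero blobs).
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]

# Windows are visited one after another, i.e. objects are processed
# iteratively rather than simultaneously.
windows = [(0, 0), (2, 2)]
codes = [encode(glimpse(image, top, left, 2)) for top, left in windows]

# Changing the second object leaves the code of the first one untouched.
image[3][3] = 9
codes_after = [encode(glimpse(image, top, left, 2)) for top, left in windows]
print(codes, codes_after)
```

Because each code depends only on the pixels inside its window, a change to one object does not leak into the representation of the other.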

An alternative mechanism is differentiable clustering, which seeks to partition the input into a number of segments while learning the similarity function. Individual segments are disjoint and result in modularity, while the iterative nature of these clustering procedures allows top-down feedback to be incorporated (Greff et al., 2017, 2019).
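The iterative structure of such clustering procedures can be sketched with a soft variant of k-means. Note that this is a deliberate simplification: it swaps in a fixed squared-distance similarity for the learned similarity function discussed above, operates on hypothetical one-dimensional points, and is not the method of the cited works.

```python
import math

def soft_assign(points, centers, temperature=1.0):
    """Soft memberships: softmax over negative squared distances."""
    memberships = []
    for x in points:
        logits = [-(x - c) ** 2 / temperature for c in centers]
        peak = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        memberships.append([e / total for e in exps])
    return memberships

def update_centers(points, memberships, n_clusters):
    """Each center becomes the membership-weighted mean of the points."""
    centers = []
    for k in range(n_clusters):
        weight = sum(m[k] for m in memberships)
        weighted_sum = sum(m[k] * x for m, x in zip(memberships, points))
        centers.append(weighted_sum / weight)
    return centers

def soft_kmeans(points, centers, steps=10):
    """Alternate assignment and update steps; all operations are smooth."""
    for _ in range(steps):
        memberships = soft_assign(points, centers)
        centers = update_centers(points, memberships, len(centers))
    return centers, soft_assign(points, centers)

# Hypothetical data: two well-separated groups of 1D points.
POINTS = [0.0, 0.2, 0.1, 5.0, 5.1, 4.9]
centers, memberships = soft_kmeans(POINTS, [1.0, 4.0])
print([round(c, 2) for c in centers])
```

Each pass through the loop is a natural place where additional information, such as feedback from a downstream task, could influence the assignments before the next refinement step.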

4 Composition

Let us now assume that representation and segregation have been addressed, and we have available a set of relevant independent objects represented in a common format. Note that when used correctly, these object representations can already make tasks like performing basic feature comparisons very easy. For example, a function that receives a pair of objects as input and compares their size-related features could easily be learned, and would almost automatically generalize to arbitrary pairs of objects.
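The size comparison mentioned above can be written down directly once objects share a common format. In this sketch the comparison is hand-written rather than learned, and the `ObjectRep` fields are hypothetical placeholders for whatever features a real system would extract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectRep:
    """Hypothetical common-format object: every object has the same fields."""
    x: float
    y: float
    size: float
    color: str

def is_larger(a: ObjectRep, b: ObjectRep) -> bool:
    """Compare only the size feature; position and color are ignored."""
    return a.size > b.size

ball = ObjectRep(x=0.0, y=1.0, size=2.0, color="red")
box = ObjectRep(x=3.0, y=4.0, size=5.0, color="blue")
# An object combination that the function was never written for.
planet = ObjectRep(x=-7.0, y=9.0, size=100.0, color="green")

print(is_larger(box, ball), is_larger(ball, planet))
```

Because the function only touches the shared `size` field, it applies unchanged to any pair of objects, which is the sense in which generalization comes almost for free.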

In contrast, combinatorial generalization is not a given for more complex relational reasoning. While it also involves learning general functions that accept objects as their arguments, one has to take extra care in being able to flexibly assign the right objects to their corresponding function arguments, as well as in learning about different structural forms that imply different ways of generalizing (Kemp & Tenenbaum, 2008). These then imply the following requirements:

General Relations

Relations differ both in their meaning and in the patterns of generalization that they imply. A general reasoning system, therefore, has to be able to instantiate many different types of relations, which necessitates a general representational form.

Dynamic Binding

In order to construct a model for a specific situation, the system needs the flexibility to freely combine objects and relations into an arbitrary structure. Both the structure of relations and the associated objects (variable binding; Browne & Sun 2000) have to be inferred dynamically during run-time.

Role-filler Independence

The content of objects should be independent of their structural roles (Hummel & Holyoak, 2003). That is, any object can take part in any relation, and the interpretation of the whole is determined by both the parts and the structure. This is related to common format and is the key to compositionality that enables the powerful systematic generalization that is characteristic of many symbolic systems.

One approach is to implement complex relational reasoning in a sequential fashion. At each step, an object associated with a particular role is processed and the resulting intermediate computation is stored, to be combined in the next step. While it is clear that a plain RNN can perform this type of computation, the dual role of intermediate representations in representing objects and intermediate computation suggests a very specific function that may be hard to learn (Graves et al., 2014). Alternatively, by combining an RNN with a suitable memory mechanism (e.g. Das et al. (1992); Mozer & Das (1993); Reed & de Freitas (2015); Graves et al. (2016)) or fast weights (e.g. Schmidhuber (1992, 1993); Schlag & Schmidhuber (2018)) it may be easier to learn general functions of this kind.

An alternative approach is to embed objects and intermediate representations as nodes in a (directed) graph and let computation take place along its edges. These computation graphs can implement arbitrary relationships, including recursive computation by re-applying the same function successively. Graph Networks (Battaglia et al., 2018) structure neural network computations according to this underlying graph and perform relational reasoning through repeated message-passing between the nodes in the graph. Compositionality is achieved through weight-sharing, i.e. by learning a general function that operates on (pairs of) nodes following their topological relationship. However, while graph networks have been successfully applied in the domain of physical reasoning (e.g. Battaglia et al. (2016); van Steenkiste et al. (2018)), a remaining challenge is dynamically inferring the right structure (i.e. dynamic binding).
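A single round of message passing can be sketched as follows. The sketch assumes scalar node states and two hand-written functions, `edge_fn` and `node_fn`, which stand in for the learned functions of a graph network. The graph structure is given by hand, so the sketch does not address dynamic binding.

```python
def message_passing_step(nodes, edges, edge_fn, node_fn):
    """Compute one message per directed edge, sum them, then update nodes."""
    incoming = {name: 0.0 for name in nodes}
    for sender, receiver in edges:
        incoming[receiver] += edge_fn(nodes[sender], nodes[receiver])
    return {name: node_fn(state, incoming[name])
            for name, state in nodes.items()}

# Hypothetical shared functions; in a graph network these would be learned.
def edge_fn(sender_state, receiver_state):
    return 0.5 * (sender_state - receiver_state)

def node_fn(state, aggregated):
    return state + aggregated

# The same two functions are reused on graphs of different size and shape.
chain = {"a": 0.0, "b": 4.0, "c": 8.0}
chain_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
pair = {"p": 1.0, "q": 3.0}
pair_edges = [("p", "q"), ("q", "p")]

chain_next = message_passing_step(chain, chain_edges, edge_fn, node_fn)
pair_next = message_passing_step(pair, pair_edges, edge_fn, node_fn)
print(chain_next, pair_next)
```

Since `edge_fn` and `node_fn` are shared across all edges and nodes, nothing in them depends on how many objects there are or how they are connected.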

While graph networks appear most promising in addressing the challenges of composition, one other approach deserves a mention. Embedding approaches, such as Poincaré embeddings (Nickel & Kiela, 2017), generalize Euclidean representations to other spaces that are better suited to modeling certain types of relations, in this case hierarchical relationships. However, the feature representations are essentially adapted to reflect the underlying relation during training, which implies fixed roles and binding during inference.
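To give a sense of why hyperbolic space suits hierarchies, the sketch below computes the standard distance function of the Poincaré ball model, d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))). The example points are hypothetical, and the sketch covers only the distance, not the training of embeddings.

```python
import math

def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    ratio = 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(1.0 + ratio)

root = [0.0, 0.0]    # hypothetical node near the top of a hierarchy
leaf_a = [0.9, 0.0]  # hypothetical leaves close to the boundary
leaf_b = [0.0, 0.9]

# Leaves near the boundary are far apart from each other, while each
# remains comparatively close to the root at the origin.
print(poincare_distance(root, leaf_a))
print(poincare_distance(leaf_a, leaf_b))
```

Distances grow rapidly towards the boundary of the ball, so there is room for many mutually distant leaves around a common ancestor, mirroring the branching of a tree.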

5 Conclusion

We have argued that feature representations alone are inadequate abstractions for planning, reasoning, and for systematically transferring knowledge to novel situations. To meet the diverse challenges on our quest towards AGI, an agent needs to be able to dynamically construct new models about its environment on the fly while reusing as much prior knowledge as possible. We have argued that objects (dynamically bound features) are adequate building blocks to quickly and flexibly compose such task-specific models. Although our examples were centered around vision, we believe that the notion of objects applies equally to other domains like audio, touch, and even abstract thought. By focusing on their role as compositional primitives, we have identified some inductive biases that we believe are necessary for objects to arise within a connectionist system. They can be categorized into three areas: representation, segregation, and composition of objects.

Among these three we find that segregation is most frequently neglected and deserves more attention. Common approaches rely on some combination of pre-processing pipelines, supervision, or highly engineered generative models of objects. Meanwhile, the few approaches that tackle this challenge in a holistic and unsupervised way are brittle and have not yet been scaled to real-world data. Developing better methods for tackling the segregation problem within the framework of connectionism is going to be a central challenge on the way towards AGI. Similarly, we would like to stress the importance of integrating solutions to all three aspects into a single system. The potential of objects as modular building blocks can only be realized in full if they are informed both by learned representations and by feedback from the composite model.

Another important direction is the integration of objects with other critical cognitive mechanisms such as attention and memory. Because objects are optimized to be modular, they naturally aggregate features that need to be processed together, but which can be separated from other information. This makes them ideal primitives for attention and for storage and retrieval from long-term memory. Attention, in turn, can simplify a task by filtering out irrelevant information and can guide the processing required for more complex reasoning chains. Such a reasoning process could then also query objects from memory on demand to be compared to or integrated with the current model.

With this short essay, we hope to draw attention to the intricacies of objects and inspire others to think critically about their integration in connectionist models.


This research was funded by SNF grant 200021_165675/1.


  • Battaglia et al. (2016) Battaglia, P. W., Pascanu, R., Lai, M., and Rezende, D. J. Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, pp. 4502–4510, 2016.
  • Battaglia et al. (2018) Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., Gulcehre, C., Song, H. F., Ballard, A., Gilmer, J., Dahl, G. E., Vaswani, A., Allen, K., Nash, C., Langston, V., Dyer, C., Heess, N., Wierstra, D., Kohli, P., Botvinick, M., Vinyals, O., Li, Y., and Pascanu, R. Relational inductive biases, deep learning, and graph networks. arXiv:1806.01261 [cs, stat], June 2018.
  • Browne & Sun (2000) Browne, A. and Sun, R. Connectionist variable binding. Hybrid Neural Systems. Springer Verlag, Heidelberg, pp.  42, 2000.
  • Das et al. (1992) Das, S., Giles, C., and Sun, G. Learning context-free grammars: Capabilities and limitations of a neural network with an external stack memory. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, Bloomington, 1992.
  • de Avila Belbute-Peres et al. (2018) de Avila Belbute-Peres, F., Smith, K., Allen, K., Tenenbaum, J., and Kolter, J. Z. End-to-end differentiable physics for learning and control. In Advances in Neural Information Processing Systems, pp. 7178–7189, 2018.
  • Diuk et al. (2008) Diuk, C., Cohen, A., and Littman, M. L. An object-oriented representation for efficient reinforcement learning. In Proceedings of the 25th International Conference on Machine Learning, pp. 240–247, 2008.
  • Eslami et al. (2016) Eslami, S. M. A., Heess, N., Weber, T., Tassa, Y., Szepesvari, D., Kavukcuoglu, K., and Hinton, G. E. Attend, infer, repeat: Fast scene understanding with generative models. In Advances in Neural Information Processing Systems, pp. 3225–3233, 2016.
  • François-Lavet et al. (2018) François-Lavet, V., Henderson, P., Islam, R., Bellemare, M. G., Pineau, J., et al. An introduction to deep reinforcement learning. Foundations and Trends® in Machine Learning, 11(3-4):219–354, 2018.
  • Graves et al. (2014) Graves, A., Wayne, G., and Danihelka, I. Neural Turing machines. October 2014.
  • Graves et al. (2016) Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Colmenarejo, S. G., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A. P., Hermann, K. M., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., and Hassabis, D. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471–476, October 2016. ISSN 0028-0836. doi: 10.1038/nature20101.
  • Greff et al. (2017) Greff, K., van Steenkiste, S., and Schmidhuber, J. Neural expectation maximization. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 30, pp. 6694–6704. Curran Associates, Inc., 2017.
  • Greff et al. (2019) Greff, K., Kaufman, R. L., Kabra, R., Watters, N., Burgess, C. P., Zoran, D., Matthey, L., Botvinick, M., and Lerchner, A. Multi-object representation learning with iterative variational inference. In International Conference on Machine Learning, pp. to appear, 2019.
  • Ha & Schmidhuber (2018) Ha, D. and Schmidhuber, J. Recurrent world models facilitate policy evolution. In Advances in Neural Information Processing Systems, pp. 2450–2462, 2018.
  • Hummel & Holyoak (2003) Hummel, J. E. and Holyoak, K. J. A symbolic-connectionist theory of relational inference and generalization. Psychological review, 110(2):220, 2003.
  • Kansky et al. (2017) Kansky, K., Silver, T., Mély, D. A., Eldawy, M., Lázaro-Gredilla, M., Lou, X., Dorfman, N., Sidor, S., Phoenix, D. S., and George, D. Schema networks: Zero-shot transfer with a generative causal model of intuitive physics. In ICML’17, pp. 1809–1818, Sydney, NSW, Australia, 2017. JMLR.org.
  • Kemp & Tenenbaum (2008) Kemp, C. and Tenenbaum, J. B. The discovery of structural form. Proc. Natl. Acad. Sci. U. S. A., 105(31):10687–10692, August 2008. ISSN 0027-8424. doi: 10.1073/pnas.0802631105.
  • Kosiorek et al. (2018) Kosiorek, A., Kim, H., Teh, Y. W., and Posner, I. Sequential attend, infer, repeat: Generative modelling of moving objects. In Advances in Neural Information Processing Systems, pp. 8606–8616, 2018.
  • Lake et al. (2017) Lake, B. M., Ullman, T. D., Tenenbaum, J. B., and Gershman, S. J. Building machines that learn and think like people. Behav. Brain Sci., 40:e253, January 2017. ISSN 0140-525X. doi: 10.1017/S0140525X16001837.
  • Lane & Henderson (1998) Lane, P. C. and Henderson, J. B. Simple synchrony networks: Learning to parse natural language with temporal synchrony variable binding. In International Conference on Artificial Neural Networks, pp. 615–620. Springer, 1998. doi: 10.1007/978-1-4471-1599-1_93.
  • Leike et al. (2017) Leike, J., Martic, M., Krakovna, V., Ortega, P. A., Everitt, T., Lefrancq, A., Orseau, L., and Legg, S. AI safety gridworlds. arXiv preprint arXiv:1711.09883, 2017.
  • Miller (1956) Miller, G. A. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological review, 63(2):81, 1956.
  • Mnih et al. (2014) Mnih, V., Heess, N., Graves, A., and Kavukcuoglu, K. Recurrent models of visual attention. In Advances in Neural Information Processing Systems 27, pp. 2204–2212, 2014.
  • Mozer & Das (1993) Mozer, M. C. and Das, S. A connectionist symbol manipulator that discovers the structure of context-free languages. Advances in Neural Information Processing Systems (NIPS), pp. 863–863, 1993.
  • Munakata et al. (1997) Munakata, Y., McClelland, J. L., Johnson, M. H., and Siegler, R. S. Rethinking infant knowledge: Toward an adaptive process account of successes and failures in object permanence tasks. Psychological review, 104(4):686, 1997.
  • Nickel & Kiela (2017) Nickel, M. and Kiela, D. Poincaré embeddings for learning hierarchical representations. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 30, pp. 6338–6347. Curran Associates, Inc., 2017.
  • Niklasson & van Gelder (1994) Niklasson, L. F. and van Gelder, T. On being systematically connectionist. Mind Lang., 1994. ISSN 0268-1064.
  • Reed & de Freitas (2015) Reed, S. and de Freitas, N. Neural programmer-interpreters. In International Conference on Learning Representations, November 2015.
  • Reichert & Serre (2013) Reichert, D. P. and Serre, T. Neuronal synchrony in complex-valued deep networks. arXiv:1312.6115 [cs, q-bio, stat], December 2013.
  • Santoro et al. (2017) Santoro, A., Raposo, D., Barrett, D. G. T., Malinowski, M., Pascanu, R., Battaglia, P. W., and Lillicrap, T. A simple neural network module for relational reasoning. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 30, pp. 4967–4976. Curran Associates, Inc., 2017.
  • Schlag & Schmidhuber (2018) Schlag, I. and Schmidhuber, J. Learning to reason with third order tensor products. In Advances in Neural Information Processing Systems, pp. 9981–9993, 2018.
  • Schmidhuber (1990) Schmidhuber, J. Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, Institut für Informatik, Technische Universität München, 1990.
  • Schmidhuber (1992) Schmidhuber, J. Learning to control fast-weight memories: An alternative to recurrent nets. Neural Computation, 4(1):131–139, 1992.
  • Schmidhuber (1993) Schmidhuber, J. On decreasing the ratio between learning complexity and number of time-varying variables in fully recurrent nets. In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pp. 460–463. Springer, 1993.
  • Schmidhuber & Huber (1991) Schmidhuber, J. and Huber, R. Learning to generate artificial fovea trajectories for target detection. Int. J. Neural Syst., 2(01n02):125–134, 1991. ISSN 0129-0657.
  • Smolensky (1990) Smolensky, P. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artif. Intell., 46(1):159–216, November 1990. ISSN 0004-3702. doi: 10.1016/0004-3702(90)90007-M.
  • Spelke & Kinzler (2007) Spelke, E. S. and Kinzler, K. D. Core knowledge. Developmental science, 10(1):89–96, 2007.
  • Treisman (1999) Treisman, A. Solutions to the binding problem: Progress through controversy and convergence. Neuron, 24(1):105–10, 111–25, September 1999. ISSN 0896-6273.
  • Tsividis et al. (2017) Tsividis, P. A., Pouncy, T., Xu, J. L., Tenenbaum, J. B., and Gershman, S. J. Human learning in Atari. In 2017 AAAI Spring Symposium Series, March 2017.
  • van Steenkiste et al. (2018) van Steenkiste, S., Chang, M., Greff, K., and Schmidhuber, J. Relational neural expectation maximization: Unsupervised discovery of objects and their interactions. In Proceedings of the International Conference on Learning Representations (ICLR), January 2018.
  • von der Malsburg (1995) von der Malsburg, C. Binding in models of perception and brain function. Curr. Opin. Neurobiol., 5(4):520–526, 1995. ISSN 0959-4388.
  • Zambaldi et al. (2019) Zambaldi, V., Raposo, D., Santoro, A., Bapst, V., Li, Y., Babuschkin, I., Tuyls, K., Reichert, D., Lillicrap, T., Lockhart, E., Shanahan, M., Langston, V., Pascanu, R., Botvinick, M., Vinyals, O., and Battaglia, P. Deep reinforcement learning with relational inductive biases. In International Conference on Learning Representations, 2019.