Exploration is the process of searching for new solutions to a problem or for new information about a system. It is a cornerstone of many machine learning algorithms. For example, a robotic reinforcement learning agent may have to explore in order to discover new objects and the effects it can produce on them. Moreover, exploration is an important part of knowledge discovery in science and engineering. In order to understand or optimize a system, one must explore it to discover its properties and behaviours.
Intrinsically motivated goal exploration processes (IMGEPs) have been shown to be efficient exploration strategies for autonomous agents to discover and map the diversity of effects they can produce in their environment Baranes and Oudeyer (2013); Forestier et al. (2017); Laversanne-Finot et al. (2018). With IMGEPs, agents self-define their own experiments by imagining goals and then try to achieve them by leveraging their past discoveries (Fig. 1). Progressively, they learn which goals are achievable and which are not. The goals are defined in a space of representations called the goal space, which describes the important features of the direct observations. For a robot that interacts with objects, the locations and properties of those objects could be such features Forestier et al. (2017); Péré et al. (2018).
So far, IMGEPs have mainly been used in the context of autonomous learning agents and robots. They enable an efficient exploration of diverse skill repertoires in high-dimensional robots Baranes and Oudeyer (2013); Forestier et al. (2017). Nonetheless, their exploration capabilities are not restricted to this field and can be used in a variety of application scenarios. In this paper we exemplify their application to automating the discovery of complex patterns in high-dimensional complex systems such as those studied in developmental (theoretical) biology, chemistry or physics. Based on our results, IMGEPs show a high potential to be efficient tools for helping scientists discover and analyze novel high-dimensional self-organized structures in these complex systems. In a recent step in that direction, Grizou et al. (2019) showed that IMGEPs are capable of autonomously making discoveries in a chemical system. However, their approach was based on a simple, low-dimensional, hand-engineered goal space. In this paper we show that the full abilities of IMGEPs can be utilized for such environments by also learning, in an unsupervised manner, the representations that define the goal space. We show this ability using the example of discovering morphogenetic patterns in Lenia Chan (2018), a continuous game-of-life cellular automaton.
Moreover, we introduce a new IMGEP algorithm that is able to learn the representations for its goal space online. In previous IMGEP approaches, the goal representation either had to be defined by hand Forestier et al. (2017) or learned before the exploration from prerecorded data Péré et al. (2018). This required either expert knowledge about the system or existing data that might bias the exploration. The online learning approach can be applied directly, without expert knowledge or preexisting data.
In summary, the paper provides the following new contributions:
- The application of IMGEPs in a new domain: the discovery of structures and patterns in high-dimensional complex systems with autonomous learning of goal representations.
- A novel online representation learning algorithm for IMGEPs that does not rely on expert knowledge or precollected data.
2 Related Work
Intrinsically motivated learning Baldassarre and Mirolli (2013); Baranes and Oudeyer (2013) is a family of computational models that autonomously organize an agent’s exploration curriculum in order to efficiently discover a maximally diverse set of outcomes the agent can produce in an unknown environment. The models are inspired by the way children self-develop a hierarchy of skills in order to make sense of the world. Intrinsically Motivated Goal Exploration Processes (IMGEPs) Baranes and Oudeyer (2013); Forestier et al. (2017) build upon those models. They are a family of curiosity-driven algorithms which have been developed to allow the exploration of high-dimensional, complex real-world systems. Population-based versions of these algorithms, which leverage episodic memory and hindsight learning, have been shown to enable artificial agents, such as robots, to acquire diverse repertoires of skills Forestier et al. (2017); Rolf et al. (2010). For example, they are able to bootstrap the exploration capacity for deep reinforcement learning problems with rare or deceptive rewards Colas et al. (2018). Recent work Laversanne-Finot et al. (2018); Péré et al. (2018) studies how to automatically learn the goal representations with the use of deep variational autoencoders. However, the training is done passively and at an early stage on a precollected set of available observations. A family of algorithms in evolutionary computation related to IMGEPs comprises novelty search Lehman and Stanley (2008) and quality-diversity algorithms Pugh et al. (2016), which can be formalized as a special kind of population-based IMGEP with a fixed random goal sampling policy.
Intrinsically motivated learning techniques have also been widely developed to handle exploration in reinforcement learning. Diverse approaches have been studied, including estimating visitation counts Bellemare et al. (2016), measures of empowerment Gregor et al. (2016), goal exploration approaches Florensa et al. (2017) with hindsight learning Andrychowicz et al. (2017) and automated curricula Colas et al. (2019), as well as related concepts such as auxiliary tasks Riedmiller et al. (2018) and general value functions Sutton et al. (2011). Recent approaches Nair et al. (2018); Pong et al. (2019) also introduced the online training of VAEs to learn the important features of a goal space, similar to the methods in this paper. However, these approaches focused on the problem of sequential decisions in MDPs (incurring a cost on sample efficiency). They are orthogonal to the automated discovery framework considered here, in which experiments are independent, allowing the use of memory-based, sample-efficient methods.
Active inquiry-based learning strategies have been used in biology King et al. (2009, 2004), chemistry Duros et al. (2017) and astrophysics Richards et al. (2011) to autonomously query which set of experiments to perform in order to improve the overall model of the system. These data-driven approaches considerably reduce the experimental costs but still require a database of representative experiments. Recently, machine learning algorithms Raccuglia et al. (2016); Reizman et al. (2016); Schweidtmann et al. (2018) have been integrated into the experimental laboratory, often combined with robotics and automation platforms Granda et al. (2018); Houben and Lapkin (2015). These methods open a brand new perspective on the way scientific experiments are conducted, but most of them rely on expert knowledge and optimize specific target properties. Rather than trying to find the optimal physico-chemical model from a database of collected experiments, we are interested in automatically discovering a diversity of unseen patterns without requiring prior knowledge of the system.
We use representation learning methods to autonomously identify goal spaces for IMGEPs. Representation learning aims at finding low-dimensional explanatory factors representing high-dimensional input data Bengio et al. (2013). It is a key problem in many areas for understanding the underlying structure of complex observations. Deep variational autoencoders (VAEs) Kingma and Welling (2013) are one of the most popular approaches. Many state-of-the-art methods Chen et al. (2018); Gulrajani et al. (2016); Higgins et al. (2017); Kim and Mnih (2018); Kumar et al. (2017); Zhao et al. (2017a) build on top of them using varying objectives and network architectures. See Tschannen et al. (2018) for an in-depth review.
This section describes intrinsically motivated goal exploration processes and the online learning approach for representations used as their goal spaces.
3.1 Intrinsically Motivated Goal Exploration Processes
An IMGEP is a sequence of experiments that explore the parameters of a system by targeting self-generated goals (Fig. 1). It aims to maximize the diversity of observations from that system within a budget of experiments.
The systems are defined by three components: a parameter space corresponding to the controllable system parameters; an observation space, where an observation is a vector representing all the signals captured from the system; and an unknown environment dynamic which maps parameters to observations. For this paper, the observations are a time series of images which depict the morphogenesis of activity patterns.
To explore a system, an IMGEP defines a goal space that represents relevant features of its observations. For a robot that has to manipulate objects and observes them with a video camera, those features could be the object positions. From this goal space, a goal is sampled by a goal sampling distribution. In the robot example, this corresponds to sampling positions to which the robot should move the objects. Then, a parameter is chosen that will be explored in order to reach the goal. The parameter is chosen according to a parameter sampling policy. Usually, the parameter sampling policy, and in some cases the goal sampling distribution, take into account previous explorations, which are stored in a history. After a parameter is selected, it is explored on the system and the outcome is observed. Based on the observation, the actually reached goal is computed using an encoding function. The encoder is either hand-defined or, in the case of our online approach, learned via a variational autoencoder. The reached goal is stored in the history together with its corresponding parameter and observation. The exploration process is repeated until a certain number of steps or another constraint is reached. Because the sampling of goals and parameters depends on a history of explored parameters, an initial set of parameters is randomly sampled and explored before the intrinsically motivated goal exploration process starts.
We chose basic approaches for both the goal sampling distribution and the parameter sampling policy. Goals are sampled from a uniform distribution over the goal space. Parameters are chosen by selecting, for a given goal, the parameter from the history whose reached goal has the shortest distance in the goal space to the given goal. This parameter is then mutated by a random process.
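The exploration loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names (`imgep`, `toy_system`, `toy_encoder`, `toy_random_param`, `toy_mutate`), the toy two-dimensional system and the Gaussian mutation are all hypothetical stand-ins.

```python
import math
import random


def imgep(system, encoder, sample_random_param, mutate, goal_bounds,
          n_iterations, n_initial, rng):
    """Run a basic goal exploration loop and return the history.

    The history is a list of (parameter, observation, reached_goal) tuples.
    """
    history = []
    for i in range(n_iterations):
        if i < n_initial:
            # Bootstrap phase: random parameters populate the history.
            param = sample_random_param(rng)
        else:
            # 1) Sample a goal uniformly from the goal space.
            goal = [rng.uniform(lo, hi) for lo, hi in goal_bounds]
            # 2) Select the past parameter whose reached goal is closest.
            source_param, _, _ = min(
                history, key=lambda entry: math.dist(entry[2], goal))
            # 3) Mutate it to obtain the parameter that is explored.
            param = mutate(source_param, rng)
        observation = system(param)          # run the experiment
        reached_goal = encoder(observation)  # map observation to goal space
        history.append((param, observation, reached_goal))
    return history


# Toy usage; every function below is a hypothetical stand-in.
def toy_system(param):
    x, y = param
    return [x * x, x * y, y]


def toy_encoder(observation):
    return [observation[0], observation[2]]


def toy_random_param(rng):
    return [rng.uniform(-1, 1), rng.uniform(-1, 1)]


def toy_mutate(param, rng):
    return [p + rng.gauss(0.0, 0.1) for p in param]


history = imgep(toy_system, toy_encoder, toy_random_param, toy_mutate,
                goal_bounds=[(0.0, 1.0), (-1.0, 1.0)],
                n_iterations=50, n_initial=10, rng=random.Random(0))
```

Because each experiment is independent, the loop only needs the stored history and a nearest-neighbour lookup in the goal space; no model of the system dynamics is learned in this sketch.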
3.2 Learning of Goal Spaces via Online Representation Learning
For IMGEPs, the definition of the goal space and its corresponding encoder is a critical part, because they define which observations the process tries to identify from the target system. A straightforward way to define a goal space is to select features manually, for example by using computer vision algorithms to detect the positions of objects in video images Forestier et al. (2017); Grizou et al. (2019). A problem with this approach is that it requires expert knowledge to select helpful features. Moreover, even experts might not know which features are important or how to formulate them for unknown or high-dimensional systems.
A more elaborate approach is to learn goal space features by unsupervised representation learning. Representation learning is able to learn a mapping from the raw sensor observations to a compact latent vector. This latent mapping can be used as a goal space in which a latent vector is interpreted as a goal.
Previous approaches have already successfully learned goal spaces with variational autoencoders (VAEs) Laversanne-Finot et al. (2018); Péré et al. (2018). However, the goal spaces were learned before the start of the exploration from a prerecorded dataset of observations from the target system. During the exploration, the learned representations were kept fixed. A problem with this pretraining approach is that training samples may be limited and are often biased towards the initial knowledge about the system.
In this paper we attempt to address this problem by continuously adapting the learned representation to the observations that are encountered during the exploration process. We believe it is crucial to learn the representation of features for new and unseen observations to further enable the discovery of a diversity of similar observations. To address this challenge, we propose an online goal space learning IMGEP (IMGEP-OGL), which learns the goal space in an incremental manner during the exploration process (Algorithm 1). We evaluated different variants of VAEs for the representation learning part of the algorithm. The Supplementary Material provides further details about the different VAE variants.
The training procedure of the VAE is integrated into the goal sampling exploration process by first initializing the VAE with random weights. The VAE network is then retrained periodically, after a fixed number of explorations, for a fixed number of epochs on the observations collected in the history. Importance sampling is used to give more weight to recently discovered patterns.
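The periodic training schedule can be sketched as below. The sketch makes several assumptions that are not taken from the paper: the weighting scheme (the newest observations jointly receive a fixed share of the sampling probability), the names `recency_weights` and `maybe_train`, and the placeholder `train_fn`, which stands in for one training pass of a VAE on a batch.

```python
import random


def recency_weights(n_items, n_recent, recent_share):
    """Sampling weights that give the newest items a fixed share of the mass.

    Assumption of this sketch: the most recent `n_recent` items together
    receive `recent_share` of the probability; older items share the rest.
    """
    n_recent = min(n_recent, n_items)
    n_old = n_items - n_recent
    if n_old == 0:
        return [1.0 / n_items] * n_items
    old_weight = (1.0 - recent_share) / n_old
    new_weight = recent_share / n_recent
    return [old_weight] * n_old + [new_weight] * n_recent


def maybe_train(step, observations, train_fn, train_every, n_epochs,
                batch_size, rng):
    """Call `train_fn` on recency-weighted batches every `train_every` steps."""
    if step == 0 or step % train_every != 0:
        return False
    weights = recency_weights(len(observations), train_every, 0.5)
    for _ in range(n_epochs):
        batch = rng.choices(observations, weights=weights, k=batch_size)
        train_fn(batch)
    return True


# Toy usage: integers stand in for observed patterns, and `calls.append`
# stands in for a VAE training pass.
calls = []
observations = []
rng = random.Random(0)
for step in range(1, 31):
    observations.append(step)
    maybe_train(step, observations, calls.append, train_every=10,
                n_epochs=2, batch_size=4, rng=rng)
```

With these toy settings, training is triggered at steps 10, 20 and 30, and from the second trigger onwards the ten newest observations are sampled twice as often as each older one.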
The usefulness of IMGEPs for the discovery of novel patterns in complex systems was evaluated on the Lenia system. The following sections introduce Lenia, the different algorithms that were compared and the experimental procedure. Please refer to the Supplementary Material for further details about the procedure and additional algorithm variants that have been compared.
4.1 Target System: Lenia
Lenia Chan (2018) is a continuous cellular automaton Wolfram (1983) similar to Conway’s Game of Life Gardner (1970). Game-of-life systems have often been used as abstract models for the theoretical understanding of how self-organized structures, including solitons, may form in natural morphogenetic systems. Lenia, in particular, represents a high-dimensional complex dynamical system in which diverse visual structures can self-organize and yet are hard to find by manual exploration. It is therefore well suited to test the performance of exploration algorithms for unknown and complex systems.
Lenia consists of a two-dimensional grid of cells where the state of each cell is a real-valued scalar activity. The state of the cells evolves over discrete time steps (Fig. 2, a). The activity change is computed by integrating the activity of neighbouring cells. Lenia’s behavior is controlled by its initial pattern and several settings that control the dynamics of the activity change. Please see the Supplementary Material for details about Lenia.
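To make the update concrete, the following sketch implements one step of a Lenia-like automaton in the general form described by Chan (2018): the neighbourhood activity is integrated with a normalized kernel, passed through a growth mapping, and added to the current state, which is clipped to the range from 0 to 1. The uniform disk kernel, the Gaussian growth mapping and the names `lenia_step`, `disk_kernel`, `mu`, `sigma` and `dt` are simplifying assumptions of this sketch; the exact kernel and settings used in the experiments are described in the Supplementary Material.

```python
import math


def disk_kernel(radius):
    """Uniform weights on all offsets within `radius`, excluding the centre."""
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if 0 < dy * dy + dx * dx <= radius * radius]
    return {offset: 1.0 / len(offsets) for offset in offsets}


def lenia_step(grid, kernel, mu, sigma, dt):
    """One update of a Lenia-like automaton on a toroidal grid.

    grid:   list of rows with activities in [0, 1]
    kernel: dict mapping (dy, dx) offsets to weights that sum to 1
    """
    h, w = len(grid), len(grid[0])
    new_grid = []
    for y in range(h):
        row = []
        for x in range(w):
            # Integrate the activity of neighbouring cells (wrap-around).
            u = sum(weight * grid[(y + dy) % h][(x + dx) % w]
                    for (dy, dx), weight in kernel.items())
            # Growth mapping in [-1, 1]: positive near mu, negative elsewhere.
            growth = 2.0 * math.exp(-((u - mu) ** 2) / (2.0 * sigma ** 2)) - 1.0
            # Apply the activity change and clip to the valid range.
            row.append(min(1.0, max(0.0, grid[y][x] + dt * growth)))
        new_grid.append(row)
    return new_grid


# Toy usage: a uniform grid whose neighbourhood activity equals mu grows.
flat = [[0.25] * 4 for _ in range(4)]
grown = lenia_step(flat, disk_kernel(1), mu=0.25, sigma=0.05, dt=0.1)
```

The nested loops keep the sketch dependency-free; a practical implementation would compute the neighbourhood integration as a convolution over the whole grid.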
Lenia can be understood as a morphogenetic system where the parameters represent the genes of a developmental process. They control into which final activity pattern the initial pattern morphs. Lenia can produce diverse patterns with different dynamics, such as stable, non-stable or chaotic patterns. Most interestingly, patterns that resemble microscopic animals can be produced (Fig. 2, b, c). We use Lenia to study whether IMGEPs can autonomously discover such patterns.
We implemented different pattern classifiers to analyze the exploration results. We differentiate between dead and alive patterns. A pattern is dead if the activity of every cell equals one of two fixed values. Alive patterns are separated into animals and non-animals. Animals are connected areas of positive activity which are finite, i.e. which do not infinitely cross several borders. All other patterns are non-animals, whose activity usually spreads over the whole state space.
The exploration behaviors of different IMGEP algorithms were evaluated and compared to a random exploration. The IMGEP variants differ in how the goal space is defined or learned. For each algorithm class we tested several variants and selected the best-performing ones. Please see the Supplementary Material for more information about the different variants.
Random exploration: The IMGEP variants were compared to a random exploration that, for each exploration iteration, randomly sampled the parameters, including the initial state.
IMGEP-HGS - Goal exploration with a hand-defined goal space: The first IMGEP uses a hand-defined goal space that is composed of 5 features. Each feature measures a certain property of the final pattern that emerged in Lenia: 1) the sum over the activity of all cells, 2) the number of activated cells, 3) the density of the activity center, 4) an asymmetry measure of the pattern and 5) a distribution measure of the pattern.
IMGEP-PGL - Goal exploration with a pretrained goal space: For this IMGEP variant the goal space was learned with a VAE approach on training data before the exploration process started. The training set consisted of 558 Lenia patterns. Half of the patterns were animals that have been manually identified by Chan (2018). The other half were randomly initialized patterns that were created with the same procedure as described in Section 4.3.
IMGEP-OGL - Goal exploration with online learning of the goal space: The final algorithm is the new online variant for IMGEPs (Algorithm 1).
4.3 Experimental Procedure
For each algorithm, 10 repetitions of the exploration experiment were conducted to measure its average performance. Each experiment consisted of a fixed number of exploration iterations. For the IMGEP variants, the first iterations were random explorations to populate their histories. For all algorithms, an identical initial set of random explorations was used to allow a better comparison between them. For the following iterations, each IMGEP approach sampled a goal via a uniform distribution over its goal space. Then, the parameter from a previous exploration in the history was selected whose reached goal had the minimum Euclidean distance to the current goal within the goal space. This parameter was then mutated by a random process to generate the parameter that was explored.
The parameters consisted of a compositional pattern producing network (CPPN) Stanley (2006), which generates the initial state for Lenia, and the settings defining Lenia’s dynamics. CPPNs are recurrent neural networks that were originally used to generate and evolve gray scale images, but that can similarly be used to generate Lenia patterns. The networks are initialized and mutated by a random process that defines their structure and connection weights, as done in Stanley (2006). The other Lenia settings were randomly initialized from a uniform distribution and mutated by a Gaussian distribution around their original values. The meta parameters used to initialize and mutate the parameters were the same for all algorithms (see the Supplementary Material). They were chosen manually without optimizing them for a specific algorithm. The parameters of the CPPN networks were set to initialize and mutate networks that generate patterns similar to those in Stanley (2006).
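The initialization and mutation of the numeric settings can be illustrated with a short sketch. The setting names and ranges in `SETTING_RANGES`, the relative standard deviation `rel_sigma` and the clipping of mutated values back into their range are assumptions made for illustration; the actual meta parameters are listed in the Supplementary Material. The CPPN part of the parameters is not covered here.

```python
import random

# Hypothetical setting names and ranges, for illustration only.
SETTING_RANGES = {"a": (0.0, 1.0), "b": (10.0, 20.0)}


def init_settings(ranges, rng):
    """Draw each setting uniformly from its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


def mutate_settings(settings, ranges, rel_sigma, rng):
    """Perturb each setting with Gaussian noise centred on its current value.

    Assumptions of this sketch: the standard deviation is `rel_sigma` times
    the width of the range, and mutated values are clipped into the range.
    """
    mutated = {}
    for name, value in settings.items():
        lo, hi = ranges[name]
        noisy = rng.gauss(value, rel_sigma * (hi - lo))
        mutated[name] = min(hi, max(lo, noisy))
    return mutated


rng = random.Random(1)
settings = init_settings(SETTING_RANGES, rng)
child = mutate_settings(settings, SETTING_RANGES, rel_sigma=0.1, rng=rng)
```

Centring the noise on the current value keeps a mutated parameter close to its source, so its outcome tends to stay near the goal that motivated its selection.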
We compared random explorations and IMGEP algorithms on their ability to identify a set of Lenia patterns with a high diversity. Diversity is measured in an analytic behavior space constructed by hand-defined and learned features. Furthermore, we compared the goal spaces of hand-defined and learned IMGEP variants.
5.1 Diversity of Explored Lenia Patterns
Diversity is measured by the spread of the exploration in an analytic behavior space defined by 13 features. We constructed this space because the observation space, i.e. the space of final Lenia patterns, is too high-dimensional. The features of the new space are: 1) the sum over the activity of all cells, 2) the number of activated cells, 3) the density of the activity center, 4) an asymmetry measure of the pattern, 5) a distribution measure of the pattern and 6-13) a latent representation of the pattern by a VAE. The VAE was trained on a dataset of Lenia patterns that were identified during all performed experiments. Please see the Supplementary Material for details.
Figure 3: (a) Diversity in Parameter Space, (b) Diversity in Behavior Space, (c) Behavior Space Diversity for Animals, (d) Behavior Space Diversity for Non-Animals. Results are shown with the standard deviation as shaded area (for some not visible because it is too small).
For each experiment, all explored patterns were projected into the analytic behavior space. To measure the diversity of the found solutions, we used a simple measure of the area that the exploration covered in the analytic behavior space. The measure discretizes the analytic behavior space into bins of equal size by splitting each dimension into 7 sections, resulting in $7^{13}$ bins. The number of bins in which at least one explored entity falls is used as the measure of diversity.
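The bin-counting measure can be sketched in a few lines. The function name `diversity` and the way the per-dimension bounds are supplied are assumptions of this sketch; only the idea of counting occupied bins of a regular grid is taken from the text.

```python
def diversity(points, bounds, n_sections):
    """Count the occupied bins of a regular grid over the feature space.

    points:     iterable of feature vectors (one per explored pattern)
    bounds:     list of (low, high) per dimension; how the bounds are
                chosen is an assumption of this sketch
    n_sections: number of equal-width sections per dimension
    """
    occupied = set()
    for point in points:
        index = []
        for value, (lo, hi) in zip(point, bounds):
            # Map the value to a section index; values on or beyond the
            # bounds are clamped into the first or last section.
            section = int((value - lo) / (hi - lo) * n_sections)
            index.append(min(n_sections - 1, max(0, section)))
        occupied.add(tuple(index))
    return len(occupied)


# Toy usage in two dimensions: the first two points share a bin.
toy_points = [[0.05, 0.05], [0.10, 0.10], [0.95, 0.50]]
toy_diversity = diversity(toy_points, [(0.0, 1.0), (0.0, 1.0)], n_sections=7)
```

Storing only the indices of occupied bins avoids enumerating the full grid, which matters because the number of bins grows exponentially with the number of dimensions.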
We also measured the diversity in the space of parameters by constructing an analytic parameter space. The 15 features of this space consisted of Lenia’s parameters and the latent representation of a VAE. The VAE was trained on a dataset of initial Lenia states that were used during the experiments. For this diversity measure, 7 bins per dimension were also used to discretize the space.
Comparing the diversity between the analytic parameter and behavior space reveals the advantage of IMGEPs over random explorations (Fig. 3, a, b). Although random explorations reached the highest diversity in the space of parameters, they are outperformed by the IMGEP approaches in terms of diversity in the analytic behavior space. Thus, the IMGEP approaches are better at the actual objective of our exploration, finding a diverse set of Lenia patterns.
Both IMGEPs with learned goal spaces (PGL and OGL) reached a higher diversity over all patterns than the one with a hand-defined goal space (HGS) (Fig. 3, b). Nonetheless, this is not the case for certain subgroups of patterns. When comparing the diversity only over explored animals (Fig. 3, c), the new online approach IMGEP-OGL finds the highest diversity of animals. It is closely followed by the pretrained IMGEP-PGL approach. The hand-defined goal space approach IMGEP-HGS reaches a diversity of 50% of that of IMGEP-OGL. Random explorations reach less than 30% of the diversity of IMGEP-OGL. In the case of diversity over non-animal patterns (Fig. 3, d), IMGEP-HGS reached the highest diversity, followed by IMGEP-OGL and IMGEP-PGL. Random explorations reached the lowest diversity. These results show that the goal space has a critical influence on the type of patterns that are identified.
5.2 Differences in Goal Spaces
Figure 4: (a) IMGEP-HGS Goal Space, (b) IMGEP-OGL Goal Space.
We analyzed the goal spaces of the different IMGEP variants to understand their behavior by visualizing their reached goals in a two-dimensional space. t-SNE Maaten and Hinton (2008) was used to reduce the high-dimensional goal spaces. It places points that were nearby in the high-dimensional space close to each other in the two-dimensional visualization as well.
The goal spaces of IMGEP-HGS and IMGEP-OGL differ strongly from each other (Fig. 4), which we believe explains their different abilities to find a high diversity of either non-animals or animals (Fig. 3, c, d). The goal space of IMGEP-HGS shows large areas and several clusters for non-animal patterns (Fig. 4, a). Animals form only a few, nearby clusters. Thus, the hand-defined features seem poorly suited to discriminating and describing animal patterns in Lenia. As a consequence, when goals are uniformly sampled within this goal space during the exploration process, more goals are generated in regions that describe non-animals. This can explain why IMGEP-HGS explored a high diversity of non-animal patterns but only a low diversity of animal patterns.
In contrast, the features learned by IMGEP-OGL better capture factors that differentiate animal patterns. This is indicated by the several clusters of animals that span a wide area in its goal space (Fig. 4, b). We attribute this effect to the difficulty VAEs have in capturing sharp details Zhao et al. (2017b). They therefore mainly represent the general form of the patterns but not their fine-grained structures. Animals often differ in their form, whereas non-animals often occupy the whole cell grid and differ in their fine-grained details. The goal spaces learned by VAEs therefore seem better suited for exploring diverse sets of animal patterns.
In this paper we presented the application of intrinsically motivated exploration via IMGEPs to a new and exciting application area: the discovery of patterns and structures in complex systems. All evaluated IMGEP variants were able to discover a diverse set of patterns for Lenia, a cellular automaton, by directly exploring its high-dimensional parameters and observing its high-dimensional output patterns. We demonstrated that goal spaces for such systems can be successfully learned via deep VAEs, which allow the identification of animal-like patterns similar to those identified by human experts (Fig. 2). Moreover, our new approach of learning goal spaces online from data collected during the exploration process outperformed a pretrained and fixed goal space in terms of identifying a diverse set of animal-like patterns.
We believe that IMGEPs are able to facilitate the study of similar complex and high-dimensional systems in different fields of engineering and science, such as physics and chemistry. IMGEPs allow the efficient exploration and discovery of interesting behaviors and patterns of unknown systems. This knowledge helps to understand the systems better or to find new solutions for problems in them.
We thank Bert Wang-Chak Chan for his helpful discussions about the Lenia system and Jonathan Grizou for his comments on the visualization of our results.
- Andrychowicz et al. (2017) Hindsight experience replay. In Advances in Neural Information Processing Systems, pp. 5048–5058.
- Baldassarre and Mirolli (2013) Intrinsically motivated learning in natural and artificial systems. Springer.
- Baranes and Oudeyer (2013) Active learning of inverse models with intrinsically motivated goal exploration in robots. Robotics and Autonomous Systems 61 (1), pp. 49–73.
- Bellemare et al. (2016) Unifying count-based exploration and intrinsic motivation. In Advances in Neural Information Processing Systems, pp. 1471–1479.
- Bengio et al. (2013) Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8), pp. 1798–1828.
- Chan (2018) Lenia-biology of artificial life. arXiv preprint arXiv:1812.05433.
- Chen et al. (2018) Isolating sources of disentanglement in variational autoencoders. In Advances in Neural Information Processing Systems, pp. 2610–2620.
- Colas et al. (2018) GEP-PG: decoupling exploration and exploitation in deep reinforcement learning algorithms. arXiv preprint arXiv:1802.05054.
- Colas et al. (2019) CURIOUS: intrinsically motivated multi-task, multi-goal reinforcement learning. In International Conference on Machine Learning (ICML).
- Duros et al. (2017) Human versus robots in the discovery and crystallization of gigantic polyoxometalates. Angewandte Chemie 129 (36), pp. 10955–10960.
- Florensa et al. (2017) Reverse curriculum generation for reinforcement learning. arXiv preprint arXiv:1707.05300.
- Forestier et al. (2017) Intrinsically motivated goal exploration processes with automatic curriculum learning. arXiv preprint arXiv:1708.02190.
- Forestier and Oudeyer (2016) Modular active curiosity-driven discovery of tool use. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3965–3972.
- Gardner (1970) Mathematical games: the fantastic combinations of John Conway’s new solitaire game "life". Scientific American 223, pp. 120–123.
- Granda et al. (2018) Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 559 (7714), pp. 377.
- Gregor et al. (2016) Variational intrinsic control. arXiv preprint arXiv:1611.07507.
- Grizou et al. (2019) Exploration of self-propelling droplets using a curiosity driven robotic assistant. arXiv preprint arXiv:1904.12635.
- Gulrajani et al. (2016) PixelVAE: a latent variable model for natural images. arXiv preprint arXiv:1611.05013.
- Higgins et al. (2017) Beta-VAE: learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, Vol. 3.
- Houben and Lapkin (2015) Automatic discovery and optimization of chemical processes. Current Opinion in Chemical Engineering 9, pp. 1–7.
- Kim and Mnih (2018) Disentangling by factorising. arXiv preprint arXiv:1802.05983.
- King et al. (2009) The automation of science. Science 324 (5923), pp. 85–89.
- King et al. (2004) Functional genomic hypothesis generation and experimentation by a robot scientist. Nature 427 (6971), pp. 247.
- Kingma and Welling (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
- Kumar et al. (2017) Variational inference of disentangled latent concepts from unlabeled observations. arXiv preprint arXiv:1711.00848.
- Laversanne-Finot et al. (2018) Curiosity driven exploration of learned disentangled goal spaces. In Proceedings of The 2nd Conference on Robot Learning, A. Billard, A. Dragan, J. Peters, and J. Morimoto (Eds.), Proceedings of Machine Learning Research, Vol. 87, pp. 487–504.
- Lehman and Stanley (2008) Exploiting open-endedness to solve problems through the search for novelty. In ALIFE, pp. 329–336.
- Maaten and Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605.
- Nair et al. (2018) Visual reinforcement learning with imagined goals. In Advances in Neural Information Processing Systems, pp. 9191–9200.
- Péré et al. (2018) Unsupervised learning of goal spaces for intrinsically motivated goal exploration. In ICLR2018 - 6th International Conference on Learning Representations, Vancouver, Canada.
- Pong et al. (2019) Skew-fit: state-covering self-supervised reinforcement learning. arXiv preprint arXiv:1903.03698.
- Pugh et al. (2016) Quality diversity: a new frontier for evolutionary computation. Frontiers in Robotics and AI 3, pp. 40.
- Raccuglia et al. (2016) Machine-learning-assisted materials discovery using failed experiments. Nature 533 (7601), pp. 73.
- Reizman et al. (2016) Suzuki–Miyaura cross-coupling optimization enabled by automated feedback. Reaction Chemistry & Engineering 1 (6), pp. 658–666.
- Richards et al. (2011) Active learning to overcome sample selection bias: application to photometric variable star classification. The Astrophysical Journal 744 (2), pp. 192.
- Riedmiller et al. (2018) Learning by playing-solving sparse reward tasks from scratch. arXiv preprint arXiv:1802.10567.
- Rolf et al. (2010) Goal babbling permits direct learning of inverse kinematics. IEEE Transactions on Autonomous Mental Development 2 (3), pp. 216–229.
- Schweidtmann et al. (2018) Machine learning meets continuous flow chemistry: automated optimization towards the Pareto front of multiple objectives. Chemical Engineering Journal 352, pp. 277–282.
- Stanley (2006) Exploiting regularity without development. In Proceedings of the AAAI Fall Symposium on Developmental Systems, pp. 37.
- Sutton et al. (2011) Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 2, pp. 761–768.
- Tschannen et al. (2018) Recent advances in autoencoder-based representation learning. arXiv preprint arXiv:1812.05069.
- Wolfram (1983) Statistical mechanics of cellular automata. Reviews of Modern Physics 55 (3), pp. 601.
- Zhao et al. (2017a) InfoVAE: information maximizing variational autoencoders. arXiv preprint arXiv:1706.02262.
- Zhao et al. (2017b) Towards deeper understanding of variational autoencoding models. arXiv preprint arXiv:1702.08658.