As stated in theories of development, an agent’s brain can contain modules that function as models of its interaction with the world. These models are used by the brain to evaluate the possible actions in “imagined space” and the agent only performs the most promising ones in physical space. The role of a theory on these models is to describe how precisely a sensorimotor model is learnt from experience and how it interacts with other existing models in a developmental context.
There are different types of such models. Machine learning (ML), for example, solves the problem of fitting a model to data in a problem independent form. The ML approach usually relies on apreprocessing step to transform the raw data into the required form. Using ML methods we can learn sensorimotor models of transitions in sensorimotor space up to a desired accuracy. This level of modelling provides the grounding in sensorimotor space. An important question is how to map the raw sensorimotor data to sensorimotor training data for realizing specific functions needed inside a developmental model.
This paper introduces the concept of tapping for designing and analysing models of developmental learning. The concept is adopted from signal processing where it is used to describe a filter as a weighted sum of delayed copies of a signal as shown in Figure 1
. The simplest sensorimotor tapping then is just the same as a filter tapping, using past values of a single variable to predict a future value of the same variable. In realistic situations the number of past values can be numerous, include different modalities, and the linear filter is a general nonlinear function whose parameters are learned from data. This view allows us to discuss a wide range of issues in temporal learning. For example, concepts from developmental robotics, reinforcement learning, neuroscience, and information theory can be represented and compared by exposing relational properties independent of terminology.
The paper is structured as follows. In section II, related work and existing concepts are discussed. Section III explains the basics of tappings, and in section IV specific examples are discussed. In section V some particular application areas for tappings are named. The paper ends with a discussion and conclusions.
Ii Related work
A central concept in signal processing are linear filters. These were originally implemented as analog circuits using delay lines to store a finite amount of the signal’s past values. In time-discrete implementations a filter’s output is computed as a weighted sum over a finite number of past inputs. This is realized by tapping
into fixed positions within a sliding window. Each tap is multiplied by a corresponding weight which together comprise the filter’s coefficients. This provides the starting point for sensorimotor tappings. A filter can be seen as linear regression and its coefficients can be learned with a least squares fit. This is known as an adaptive filter in signal processing and is the same as a linear adaptive forward model in a developmental robotics context.
The main techniques used for describing developmental models are plain text accounts, equations, and various types of block diagrams. Equations and diagrams are each highlighting different aspects of a model’s function and behaviour. Equations are precise in representing functional dependencies including general temporal relations. Block diagrams emphasize which functions are used and which of those functions are interacting directly. None of them provides an intuitive representation of the global extent and the microstructure of interaction between variables for a given robot. This also means that reoccurring patterns of these properties and their systematic variation across different robots are hard to express.
. Backup diagrams track how the instantaneous information is related to previous states and indicates how it is propagated back in time to update the relevant state in the agent’s controller. These diagrams do not however differentiate sensory modalities very well. Probabilistic graphical models, especially dynamic bayesian networks, provide a natural complement to the current approach. Like recurrent neural networks, these models incorporate the problem of mapping input time and modality into the model state. In contrast, tappings aim at a decoupled representation of the input mapping and the model’s state update.
Information theory can be used to quantify the amount of shared information among sensorimotor variables as shown in  or . This provides the empirical complement of tappings and can be used to obtain a tapping from data prior to training a model or to analyze a model’s use of temporal information after training. A number of recent works have suggested predictive information
, the amount of information shared between the past and the future of a random variable, as a measure for the amount of non-trivial information obtained from embodied interaction. This also highlights the importance of the agent’s momentary temporal sensorimotor embedding.
Internal modelling approaches in developmental robotics that use prediction learning are lacking a way to describe the interaction of the embedded sensorimotor models with the information provided by the enclosing developmental model in a general and systematic manner. This also holds for temporal difference learning in RL and correlational learning processes in neuroscience. Thus we see a definite need for an additional tool from which these fields, and maybe robotics and AI at large, might benefit. Our contribution besides the identification of this gap is a proposal for filling it.
The sequence of steps necessary for going from sensorimotor space to the sensorimotor model input / output space are shown in the illustration in Figure 2 with enlarged views of two example tappings. A single sensory measurement at time is represented by a vector. The vector is composed of subparts that reflect the natural structure of the agent’s modalities imposed by the sensors (e.g. vision or joint angles). The set of all possible vectors defines the agent’s sensorimotor space. Measurement vector and sensorimotor space comprise the left part of the figure. The agent’s internal time creates the temporal ordering of incoming measurements , and storing them in this order forms a matrix. The matrix is shown in the center of the figure. It contains a numerical representation of the sequence of external states as they are reflected in sensorimotor space. An agent living in a partially observable world can benefit from extracting additional information from relations across time and modalities. To do this with memoryless models, the sensorimotor matrix has to be tapped using a context dependent pattern attached to the current time step with the data sliding along underneath. The patterns for a forward and an inverse model are shown close up. The locations of the nodes of the tapping indicate which relative time step and modalities are used to assemble a supervised training set. The node’s colors indicate wether the datum is an input or a target.
Consider the example of a Nao robot bootstrapping the ability to move its hand to a given point in visual space shown in Figure 3. The agent creates an episode of data by exploring five random joint angles. For simplicity a kinematic arm is assumed so there is a delay of one time step between motor command and the corresponding measurement. Each momentary measurement consists of the current image, resulting from the previous command, and a new motor command about to be committed. In order to let the agent learn to predict the image in the next time step from the current command, an adaptive model is trained with commands as input and the image as target taken from different relative time steps as shown in Figure 4. The training set is created from the raw data by shifting the row of commands one time step to the right. The measurements in each column of the new matrix are now ordered by model update steps instead of sensorimotor time. A detailed tapping is shown Figure 5.
Iii-B Tapping degrees of freedom
Tappings are specified relative to the current time , becoming positive in the future and negative into the past. This proposal only considers discrete time and equidistant sampling with a constant . It makes sense to group variables in the matrix according to their modality such as as exteroceptive- (vision, hearing), proprioceptive- (motors, joint angles, forces), or interoceptive sensors. Interoceptive variables represent any intermediate stage of other concurrent computations in the agent’s sensorimotor loop. A group whose elements all contribute to the same argument of the target function, for example all pixels in an image, can be reduced to a single element in the graphical representation.
A common arrangement in a developmental model is to use a supervised learning algorithm because it can be trained effectively. A supervised training set consists of the inputand targets that constrain the functional relation . The approximation task is to find parameters for the model such that is minimized under a given loss. Prediction learning allows the agent to construct infinite supervised training data on the fly. Tappings can describe the necessary transformations independent of the learning algorithm. If is the full supervised training set, the tapping defines a map taking an index set to an index set.
It can be immediately seen from the figures that a tapping is a directed graph on top of ’s row and column indices. The graphical structure encodes the relation prescribed by the sensorimotor model’s function inside the developmental model. In addition to the supervised learning case the graph can immediately be taken as dynamic Bayesian network graph connecting the current approach to a rich existing body of formalism and inference techniques.
A single time step prediction problem requires a tapping from one time step to the next. Doing the same along modalities captures intermodal prediction, that is, predicting sensory consequences in one modality from the state of another modality. By adding joint angle sensors to the Nao agent, it could learn to predict the hand position (vision) from joint angles (proprioception) in the same time step.
The two dimensions of the sensorimotor data matrix result in two corresponding tappings resulting in a temporal predictor, shown in (a), and an intermodal predictor shown in (b). Sensorimotor models encode regularity in sensorimotor state transitions along these axes. Learning transitions along the normal forward flow of time results in a forward model. Forward models are central to the simulation theory of cognition, which states that an agent learning to approximate the forward transition rules to a sufficient degree of fidelity can use them to internally “simulate experience” .
Rearranging the direction of prediction to go backwards in time creates an inverse model. This allows the model to predict (infer) causes from observed effects, which allows the agent to control and change its own state by directly predicting the causes of its desired state. This translates to predicting the actions that lead to a goal . Direct prediction imposes constraints on the learning algorithm. Generally the inverse of a function can be a correspondence, requiring the learning algorithm to be able to represent this type relation.
To summarize this section we highlight the main features of tappings. They provide an information centric view on developmental models. This view is independent of particular learning algorithms, and it provides an upper bound111the joint entropy of all sensorimotor variables on the amount of explanation a model needs to accomplish. That bound is a reference for comparing different models in terms of the fraction of maximum explanation. Tappings facilitate the design of developmental models, algorithms and their implementations by highlighting regularities in the design space and being precise and explicit about time. Analysing two important model types and their tappings shows to what extent different functional roles are determined by the input / output relations, and the learning algorithm respectively. These features all contribute to facilitate systematic exploration of developmental models.
Iv Basic tappings
In this section we explore tappings further by looking at some variations of the simple ones that came out of the previous section: multi step prediction, autoencoding, and autopredictive encoding. If the internal model is a feedforward map without internal memory the simple one time step predictor in(a) cannot make use of additional information about the future that was presented more than one time step ago. The missing memory of the model can be replaced by using a moving window of size that augments the momentary model input by including all previous values of the variables222The moving window technique is alternatively known as moving average model, time delay neural network, delay-embedding or method of delays. Since tappings are moving windows, the multi time step tapping shown in Figure 7 is almost trivial, the window size being equal to the number of input taps spread uniformly into the past. Iterative predictions in extended forward simulations demand better model accuracy. A reasonable shortcut towards more accuracy is to improve the prediction by imposing a long-term consistency constraint by extending the target tapping into the future (using buffering in closed-loop learning).
A special case of a predictor is the autoencoder. Its tapping is shown on the left in Figure 8
. Its target output is the same as its input. In terms of theformulation with
, the autoencoder could only consist of wires. The added value of an autoencoder comes exclusively from constraints on the intermediate representation. Like prediction learning, autoencoding is an unsupervised learning technique built with supervised learning. If we look at the tapping we see that the information of each single variable on the input is distributed to all other variables on the output. By a simple change of the tapping we easily obtain an autopredictive encoder (APE) as the result of pulling the autoencoder’s input and output taps one time step apart. The autopredictive encoder is not an established term but multiple proposals for such architectures have in fact been made[9, 10, 11]. Applying the prediction constraint on the model has been shown to increase the task-independence of latent space representations in . In the tapping we see immediately that the prediction constraint encourages the model to represent the rules of change in the hidden space. The APE tapping is shown in Figure 8.
V Application areas
Internal modelling  is an important concept used in developmental robotics [14, 15, 16]. An underlying driving hypothesis is that predictive models enable anticipatory behaviour  which is more powerful than purely reactive behaviour. From the developmental perspective this implies that some functions of a developmental model must be provided by adaptive models of the sensorimotor dynamics. Two basic functional types of internal models, forward and inverse ones, have already been introduced as examples in Figure 2 and are shown again as a pair of tappings in Figure 9. This highlights the rearrangement of the direction of prediction without a change of variables. Exploitation of adaptive models has also been described above indicating different ways of predicting and evaluating future options with forward models, or directly inferring actions with inverse models.
A popular method in reinforcement learning is temporal difference learning. Temporal difference learning is a family of algorithms to approximate a prediction target with a recurrent estimate. The usual target is avalue function
which maps actions to a value. The estimate is bootstrapped by minimizing the moment-to-moment value prediction error, which is ultimately grounded in a primary reward signal. There exists extensive theory in RL that deals with the problem of integrating task-relevant information that is spread out in time, with two fundamental concepts being involved. The first one is that ofmultistep methods which take care of consequences escaping into the future. The second one are eligibility traces which capture causes vanishing into the past. Taken together they solve the general delayed reward problem. Depending on the parameters a corresponding tapping will be similar to the multi step predictor.
The importance of features and modalities and the information contained in their mutual relations is less developed. The concepts used in reinforcement learning can easily be remapped to internal modelling terms and vice versa, making tappings immediately applicable to temporal difference learning problems. Looking at three basic temporal difference learning algorithms, TD(0), Q-Learning and SARSA, it can be seen that they all approximate a target by updating from a one time step difference. TD(0)’s target is a state value function while for Q-learning and SARSA it is a state-action value function . The update rules all follow the same general form of
and the corresponding tappings are shown in Figure 10. Comparing these with the internal model tappings we see that temporal difference learning corresponds with prediction learning and that the value function is a forward model allowing us to reframe RL problems as developmental prediction learning ones and the other way round. The case is shown here to correspond to a single time step tapping but the proportional increase in tapping length with increasing should be obvious.
Neuroscience provides several models that link computational and neurobiological accounts of associative learning and reinforcement learning. The Rescorla-Wagner rule is one example. It is a model of classical conditioning and describes how an association is learned across two modalities, the unconditioned (US) and the conditioned stimulus (CS), which occur at different times. Another example is the reward prediction error hypothesis of dopamine [19, 20, 21] which provides a physiological mechanism in support of computational descriptions of reinforcement. Low-level models of neural adaptation like spike-time dependent plasticity (STDP) [22, 23] are characterized by a local window of interaction on a microscopic time scale. STDP itself is not a model for learning delays but an even lower level mechanism for reinforcing or weakening the association of pre- and post-synaptic events based on the local window prior. It can of course be used indirectly to extract sensorimotor delay information. Tappings apply without modification to all these different levels of modeling as shown exemplarily for the conditioning case in Figure 11.
During this presentation of tappings, a few additional issues came up that still need to be discussed. Models with memory like recurrent neural networks or dynamic Bayesian networks need special consideration with respect to tappings. Such models naturally retain an internal memory of past input values. Because of this, they do not need explicit memory in their inputs and in theory only need to tap across one time step. They are building up an implicit tapping as part of their learning while tappings aim at an representation of specific memory needs for a given learning task. Measuring the information flow across the model inputs and outputs after training with quantitative  or relational techniques  should result in an effective tapping that could be used for comparison with prior tappings or interpreted as a way of learning them.
The memory issue is an example of a more general aspect about tappings. The current proposal disregards details about the learning algorithm used at the level of sensorimotor models. It is argued that this is in fact an advantage and necessary for wider comparison of models. The same is evident in the case of inverse problems where the learning of correspondences instead of functions needs to be considered. It remains to be shown how these properties could be integrated and represented in a tapping.
No experimental validation is given in this paper. This does not mean though that no experimental backing of the idea exists. In fact, it is an outcome of analyzing the design of a large number of different experiments in developmental learning in the course of our research during which the structural invariant that tappings try to represent became evident. It was decided to defer the experimental validation in this paper in favour of being able to focus on the explanation of the basic idea and to demonstrate its applicability to a wide range of contexts.
This paper introduces tappings, a novel concept for designing and analysing models of developmental learning in the field of developmental robotics and the related fields of reinforcement learning and computational neuroscience. Tappings came out of a need for capturing the detailed embedding of learning machines in the temporal and modal context of raw sensorimotor trajectories. Tappings create a particular view on the interaction between the embodiment and the functional requirements of behaviour that can help to better understand developmental learning processes, and make sensorimotor learning more efficient. They can systematically describe the relationship between supervised learning and developmental models. By ignoring computational details the tapping view highlights the information flow across models and using that we can compare a large range of models that cannot easily be compared otherwise. We showed the structural similarity of prediction learning in the developmental context and temporal difference learning in RL.
We would like to thank Guido Schillaci for providing experimental data for the Nao example and the Adaptive Systems Group members for discussions.
-  R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 1st ed. Cambridge, MA, USA: MIT Press, 1998.
-  D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning, 2009.
-  M. Lungarella, T. Pegors, D. Bulwinkle, and O. Sporns, “Methods for quantifying the informational structure of sensory and motor data,” Neuroinformatics, vol. 3, no. 3, pp. 243–262, 2005.
-  F. Kaplan and V. V. Hafner, “Information-theoretic framework for unsupervised activity classification,” Advanced Robotics, vol. 20, no. 10, pp. 1087–1103, 2006.
-  W. Bialek and N. Tishby, “Predictive Information,” eprint arXiv:cond-mat/9902341, Feb. 1999.
-  A. V. Terekhov and J. K. O’Regan, “Space as an invention of active agents,” Frontiers in Robotics and AI, vol. 3, p. 4, 2016.
-  “The current status of the simulation theory of cognition,” Brain Research, vol. 1428, pp. 71 – 79, 2012, the Cognitive Neuroscience of Thought.
-  M. Rolf and M. Asada, “What are goals? and if so, how many?” in 2015 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Aug 2015, pp. 332–339.
-  V. Michalski, R. Memisevic, and K. Konda, “Modeling deep temporal dependencies with recurrent grammar cells"",” in Advances in Neural Information Processing Systems 27.
-  V. Patraucean, A. Handa, and R. Cipolla, “Spatio-temporal video autoencoder with differentiable memory,” CoRR, vol. abs/1511.06309, 2015. [Online]. Available: http://arxiv.org/abs/1511.06309
-  J. L. Copete, Y. Nagai, and M. Asada, “Motor development facilitates the prediction of others’ actions through sensorimotor predictive learning,” in Proceedings of the 6th IEEE International Conference on Development and Learning and on Epigenetic Robotics, 2016.
-  W. Lotter, G. Kreiman, and D. Cox, “Unsupervised learning of visual structure using predictive generative networks,” CoRR, vol. abs/1511.06380, 2015. [Online]. Available: http://arxiv.org/abs/1511.06380
-  K. J. W. Craik, The Nature of Explanation. Cambridge University Press, 1943.
-  D. M. Wolpert and M. Kawato, “Multiple paired forward and inverse models for motor control,” Neural Networks, vol. 11, no. 7, pp. 1317–1329, 1998.
-  “Hierarchical attentive multiple models for execution and recognition of actions,” Robotics and Autonomous Systems, vol. 54, no. 5, pp. 361 – 369, 2006.
-  G. Schillaci, V. V. Hafner, and B. Lara, “Exploration behaviors, body representations, and simulation processes for the development of cognition in artificial agents,” Frontiers in Robotics and AI, vol. 3, p. 39, 2016.
-  R. Rosen, Anticipatory Systems - Philosophical, Mathematical, and Methodological Foundations, 2nd ed., ser. IFSR International Series on Systems Science and Engineering. Springer-Verlag New York, 2012, vol. 1.
-  R. Rescorla and A. R. Wagner, “A theory of pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement,” in Classical conditioning II: Current research and theory. Appleton-Century-Crofts, New York, 1972.
-  W. Schultz, P. Dayan, and P. R. Montague, “A neural substrate of prediction and reward,” vol. 275, no. 5306, pp. 1593–1599, 1997.
-  P. Dayan, “Matters temporal,” Trends in Cognitive Sciences, vol. 6, no. 3, pp. 105–106, 2002.
-  “Reinforcement learning in the brain,” Journal of Mathematical Psychology, vol. 53, no. 3, pp. 139 – 154, 2009, special Issue: Dynamic Decision Making.
W. Gerstner, R. Kempter, J. L. van Hemmen, and H. Wagner, “A neuronal learning rule for sub-millisecond temporal coding,”Nature, vol. 383, no. 6595, pp. 76–78, 09 1996.
-  H. Markram, J. Lübke, M. Frotscher, and B. Sakmann, “Regulation of synaptic efficacy by coincidence of postsynaptic aps and epsps,” vol. 275, no. 5297, pp. 213–215, 1997.
-  P. L. Williams and R. D. Beer, “Nonnegative decomposition of multivariate information,” ArXiv.