Adaptive robot body learning and estimation through predictive coding

The predictive functions that allow humans to infer their body state through sensorimotor integration are critical for safe interaction in complex environments. These functions are adaptive and robust to non-linear actuators and noisy sensory information. This paper presents a powerful and scalable computational perceptual model based on predictive processing that enables any multisensory robot to learn, infer and update its body configuration when using arbitrary sensors with additive Gaussian noise. The proposed method integrates different sources of information (tactile, visual and proprioceptive) to drive the robot's belief towards its current body configuration. The motivation is to provide robots with the embodied perception needed for self-calibration and safe physical human-robot interaction. We formulate body learning as obtaining the function that encodes the sensory consequences of the body configuration, together with its partial derivative with respect to the body variables, and we solve it by Gaussian process regression. We model body estimation as minimizing the discrepancy between the robot's belief about its body configuration and the observed posterior. We minimize the variational free energy using the sensory prediction errors (sensed vs. expected). To evaluate the model, we test it on a real multisensory robotic arm. We show how the contributions of different sensor modalities, included as additive errors, improve the refinement of the body estimation, and how the system adapts itself to provide the most plausible solution even when strong visuo-tactile perturbations are injected. We further analyse the reliability of the model when different sensor modalities are disabled. This provides grounded evidence about the correctness of the perceptual model and shows how the robot estimates and adjusts its body configuration just by means of sensory information.





I Introduction

Providing the robot with the predictive functions of the body, the environment and other agents is critical for complex interaction. In particular, to generate safe and robust interaction the artificial agent must take into account the uncertainties related to the sensory input as well as the unexpected events that can occur. Unfortunately, a perfect model of the body, the environment and others is almost impossible to design. We present an adaptive robot body learning and estimation algorithm able to deal with noisy sensory inputs and to integrate multiple sources of information (tactile, visual and proprioceptive sensors). This model is framed within the predictive processing theory proposed by Friston [1] and is biologically grounded on the predictive coding evidence observed by Rao and Ballard in the visual cortex [2]. This approach has been extensively studied in computational biology and psychology, but it has not been properly tested in robotics [3].

Fig. 1: Proposed adaptive robot body learning and estimation using prediction errors: expected sensation minus sensory input. Visual, tactile and proprioceptive sensing contribute to obtain the most plausible body configuration, and hence the end-effector location. Body inference is computed by minimizing the free-energy with respect to the latent variables.

The main idea behind this embodied approach to robot body perception is that the only available information is the sensory input [4]. By learning the predictors of the sensor outcome given its current body latent variables and the actions exerted, the robot is able to properly infer its real body configuration. The error between the expected sensory signal and the real input refines the most plausible hypothesis that the robot has about its body, as depicted in Fig. 1. This reduces the complexity of online estimation of the internal body variables and increases the ability of the robot to adapt to uncertain situations.

I-A Motivation and method

To produce safe interaction, the robot should robustly predict its body and other agents at every instant using all the sensory information available. This requires either the systematic design of a body model or providing the robot with an accurate perception of its body [5]. Here, we define body perception from the probabilistic perspective [6] as inferring the body variables $x$ only from the sensory information $s$: $p(x \mid s)$¹ [4]. When the robot does not have access to its body variables, we can infer the body configuration from the sensory input through Bayes' rule:

$$p(x \mid s) = \frac{p(s \mid x)\, p(x)}{p(s)} \tag{1}$$

where $p(s \mid x)$ is the sensory consequence of being in state $x$ and $p(x)$ is the prior belief about the internal variables. We could estimate this posterior using Bayesian recursive filters [6]. However, instead of computing this posterior directly, we approximate an auxiliary distribution $q(x)$ over the unobserved latent variables to the real posterior $p(x \mid s)$. Moreover, according to predictive processing theory [7], the robot belief, and the variables that represent it, differ from the real-world process². Furthermore, the robot has incomplete knowledge about the real generative process of its body. Thus, at every instant we have to approximate not only the state variables but a particular density from a family of functions. In other words, we approximate the distribution $q(x)$ parametrized by $\mu$, the internal state of the robot. For this paper, as the body configuration has been simplified to the joint angles, we overload the notation by using $x$ for the body state and $\mu$ for the internal state of the robot.

¹ We have intentionally left the action out to be coherent with the active inference perceptual theory of Friston. At the end of the paper we remark on the role of the action within the proposed scheme.
² We describe the robot mind as a system whose belief has its own dynamics and internal variables, and the inference process tries to fit the robot's internal representation to the real world.

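To bring both distributions (real and believed) together, we can minimise the Kullback-Leibler divergence

$$D_{KL}\big(q(x) \,\|\, p(x \mid s)\big) = \int q(x) \ln \frac{q(x)}{p(x \mid s)}\, dx$$

through the free-energy bound, expressed as [7, 8]:

$$F = \int q(x) \ln \frac{q(x)}{p(s, x)}\, dx = D_{KL}\big(q(x) \,\|\, p(x \mid s)\big) - \ln p(s) \tag{2}$$

Since the KL divergence is non-negative, $F$ is an upper bound on the sensory surprise $-\ln p(s)$. When $q(x)$ converges to $p(x \mid s)$, the divergence vanishes, $F$ equals the sensory surprise and the posterior is properly approximated. Hence, in theory, the inference of the internal body configuration from the sensory input can be tractably approximated by minimizing the variational free energy. One way to minimize it is through a gradient descent scheme on the internal state: $\dot{\mu} = -\partial F / \partial \mu$.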
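As a sanity check of this gradient-descent scheme, the following minimal sketch uses a toy model assumed for illustration (it is not taken from the paper): a one-dimensional body variable with Gaussian prior $\mathcal{N}(\rho, \sigma_\mu)$ and a single linear sensor $s = a x + w$. In this linear Gaussian case the Laplace-approximated free energy is, up to a constant, the negative log joint, so integrating $\dot{\mu} = -\partial F/\partial\mu$ should recover the exact posterior mean.

```python
import numpy as np

# Toy linear-Gaussian model (assumed for illustration only):
# prior x ~ N(rho, var_mu), sensor s = a * x + w with w ~ N(0, var_s).
rho, var_mu = 0.0, 1.0
a, var_s = 2.0, 0.5
s = 3.0  # observed sensor value


def free_energy(mu):
    """Negative log joint -ln p(s, mu), dropping constants."""
    return 0.5 * (mu - rho) ** 2 / var_mu + 0.5 * (s - a * mu) ** 2 / var_s


def dF_dmu(mu):
    """Analytic derivative of free_energy with respect to mu."""
    return (mu - rho) / var_mu - a * (s - a * mu) / var_s


mu, dt = 0.0, 0.01
for _ in range(5000):
    mu -= dt * dF_dmu(mu)  # Euler step of mu_dot = -dF/dmu

# Exact posterior mean of the linear-Gaussian model, for comparison
post_mean = (rho / var_mu + a * s / var_s) / (1.0 / var_mu + a ** 2 / var_s)
print(mu, post_mean)  # both approximately 1.3333
```

For non-linear sensors the same descent only reaches a local mode, which is why the paper works with the most plausible value $\mu$ rather than the full posterior.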

From the generative process point of view, a function $f$ governs the dynamics of the environment and a function $g$ governs the sensory information. However, the robot only has approximations of them, $\hat{f}$ and $\hat{g}$. Thus, the agent continuously adapts its belief about its body and the world using only the sensory input. This is performed by dynamically updating its internal variables by means of the error between the expected and the real sensory input: the prediction error.

Unlike previous works on predictive coding, which assume that the sensor generative functions are known, here we provide a method to learn them and transparently integrate them into a predictive coding scheme.

I-B Related works

Predictive coding [2] and predictive processing [1] have mainly been studied for human perception and control, and only a few works have applied them to robots. For instance, [3] discusses the viability of predictive processing for robot control on a simulated robotic arm. However, the generative functions and the parameters were known in advance. Besides, implementing predictive coding with deep neural networks has gained popularity for modelling multisensory perception and, in the computer vision community, for video prediction [10]. Finally, this approach shares some conceptual background with sensorimotor contingency approaches [4, 11] and with predictive learning [12], where the robot develops its perception as infants do.

The variational approach presented in this paper is related to expectation-maximization algorithms formalized as a maximization-maximization problem of the free-energy function [13]. In fact, in Friston's terminology, predictive processing is a dynamic expectation-maximization algorithm [7]. Furthermore, it is important to highlight the strong similarities with the ensemble Kalman filter [14]. We have adopted the free-energy mathematical framework for the following reasons: it provides indirect minimization of the Kullback-Leibler divergence [7]; it supports multisensory non-linear integration; it is scalable in its Laplace approximation [8]; it permits unsupervised parameter tuning [15]; and it is biologically plausible [2].

In terms of body model learning, there is a vast literature on regressors such as locally weighted projection regression [16], local Gaussian processes [17] or the infinite experts algorithm [18]. These methods are able to compute the mapping between the sensory input and the configuration of the body, and they are used to learn forward and inverse kinematics and dynamics. Feed-forward and recurrent networks can also be used for learning body schemas and the required predictors, but they rely on supervised information and on the optimization of hundreds of parameters [19, 20]. Unsupervised and self-exploratory learning of the body has also been addressed in works like [21] using temporal contingencies. Moreover, biologically plausible sensorimotor learning has been investigated in works like [22] by means of Hebbian-based methods, where body calibration can be learnt through sensorimotor mapping. Dynamic Hebbian learning has also been proposed for obtaining intermodal forward models in [23]. Model-free visual detection of the body [24] has been approached as an intermodal inference problem, but it is restricted to the camera view of the robot.

I-C Contribution and organization

This work introduces predictive processing [7] for robot body perception: the robot first learns the forward generative models of its sensors or features and is then able to dynamically provide the most plausible body configuration and end-effector location, incorporating several noisy sources of sensory information in a scalable way. We address model-free body learning and estimation, where the sensor generative (forward) model is learnt using Gaussian process regression. The body configuration and the end-effector location are obtained by means of online free-energy minimization using the prediction error.

The computational model is presented in Sec. II, where we propose a way to learn the generative sensor model and its derivative by exploration (sampling), as well as the differential equations that solve body estimation through free-energy minimization. The experimental setup on a real multisensory robot is presented in Sec. III. In Sec. IV we analyse the proposed approach, evaluating the body estimation with different sensor modalities and inducing visuo-tactile perturbations. Finally, in Sec. V and VI we discuss the advantages and drawbacks of the proposed approach and highlight the applicability of the method to improve self-localization and interaction.

II Mathematical model

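$q(x)$, $x$: distribution and value of the body variables
$\mu$: most plausible hypothesis of the body variables
$\rho$: prior belief of the body variables
$s_v$, $s_p$, $s_t$: sensor value (visual, proprioceptive, tactile)
$e_v$, $e_p$, $e_t$: error value (visual, proprioceptive, tactile)
$\partial F / \partial \mu$: derivative of the free energy with respect to the internal state
$\mathcal{N}(m, \sigma)$: normal distribution with mean $m$ and variance $\sigma$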

We first describe the proposed mathematical model for visual and proprioceptive information, and then we extend it with a more complex visuo-tactile input. The model is based on works on predictive processing [7] and free-energy approaches to perception [15, 8]. For this model, and without loss of generality, we assume that the robot cannot perceive the gradient of the sensor signal and that the state transition model (generative function $f$) is believed to be static. In Sec. IV we discuss the drawbacks of these simplifications. For the sake of clarity, we adopt the free-energy derivation presented in [15], although the original one uses the KL divergence as the starting point.

The robot is defined as a set of sensors $s$ and unobserved internal body variables $x$. The proprioceptive sensors $s_p$ output a value that depends on the body configuration and follows a Normal distribution with linear or non-linear mean $g_p(x)$: $s_p \sim \mathcal{N}(g_p(x), \sigma_p)$. The visual sensor $s_v$ provides the location of the end-effector in the visual field, also following a Normal distribution with linear or non-linear mean $g_v(x)$: $s_v \sim \mathcal{N}(g_v(x), \sigma_v)$. Finally, the robot has artificial skin sensors on the end-effector limb, and it is able to detect the other agent's hand in the visual field (see Fig. 1).

II-A Perception model for visual and proprioceptive sensors

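The body configuration can be inferred from visual and proprioceptive sensory information through Bayes' rule. Assuming that the visual and proprioceptive sensing are conditionally independent given the body configuration, the posterior distribution of $x$ is:

$$p(x \mid s_v, s_p) = \frac{p(s_v \mid x)\, p(s_p \mid x)\, p(x)}{\int p(s_v \mid x)\, p(s_p \mid x)\, p(x)\, dx} \tag{3}$$

The integral in the denominator makes exact computation intractable for large distributions.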

As explained previously, we want to approximate the belief distribution to the posterior using Eq. 2. Mimicking predictive processing theory, which states that the brain works with the most plausible model of the world to perform predictions, instead of working with the whole distribution $q(x)$ we use its most plausible value $\mu$. This has an important implication, as the denominator no longer depends on $\mu$ [15], and hence we get³:

$$p(\mu \mid s_v, s_p) \propto p(s_v \mid \mu)\, p(s_p \mid \mu)\, p(\mu) \tag{4}$$

³ In ensemble Bayesian filtering terminology, this is similar to maintaining a sample drawn from the latent-space distribution.
Applying logarithms, we obtain the negative free-energy formulation:

$$-F = \ln p(s_v \mid \mu) + \ln p(s_p \mid \mu) + \ln p(\mu) \tag{5}$$
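Substituting the probability distributions by their functions, $p(\mu) = \mathcal{N}(\mu; \rho, \sigma_\mu)$, $p(s_p \mid \mu) = \mathcal{N}(s_p; g_p(\mu), \sigma_p)$ and $p(s_v \mid \mu) = \mathcal{N}(s_v; g_v(\mu), \sigma_v)$, under the Laplace approximation [7, 8] and assuming normally distributed noise, we can compute the negative free energy as:

$$-F = -\frac{\lVert \mu - \rho \rVert^2}{2\sigma_\mu} - \frac{\lVert s_p - g_p(\mu) \rVert^2}{2\sigma_p} - \frac{\lVert s_v - g_v(\mu) \rVert^2}{2\sigma_v} + C \tag{6}$$

where $C$ collects the terms that do not depend on $\mu$.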
To approximate the posterior distribution, we minimize $F$ (equivalently, maximize $-F$) following a gradient-descent scheme:

$$\dot{\mu} = -\frac{\partial F}{\partial \mu} \tag{7}$$
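Computing the partial derivative of Eq. 6 we obtain:

$$-\frac{\partial F}{\partial \mu} = \frac{\rho - \mu}{\sigma_\mu} + \left(\frac{\partial g_p(\mu)}{\partial \mu}\right)^{\!\top} \frac{s_p - g_p(\mu)}{\sigma_p} + \left(\frac{\partial g_v(\mu)}{\partial \mu}\right)^{\!\top} \frac{s_v - g_v(\mu)}{\sigma_v} \tag{8}$$

Note that the first term is the error between the most plausible value of the body configuration and its prior belief ($\rho - \mu$), the second term is the error between the observed proprioceptive value and the expected one ($s_p - g_p(\mu)$), and the third term is the prediction error between the visually sensed position of the end-effector and its expected location ($s_v - g_v(\mu)$). In order to use Eq. 8 we need to know or learn the sensor forward (generative) functions.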
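The gradient in Eq. 8 can be checked numerically against Eq. 6. The sketch below assumes, purely for illustration, a proprioceptive model $g_p(\mu) = \mu$ and a visual model given by the planar forward kinematics of a hypothetical two-link arm with unit-length links (the paper learns these functions instead). The analytic gradient should agree with central finite differences of the negative free energy.

```python
import numpy as np

# Hypothetical generative functions (illustration only; the paper learns them):
# proprioception returns the joint angles, vision returns the planar
# end-effector position of a two-link arm with unit-length links.
def g_p(mu):
    return mu.copy()


def dg_p(mu):
    return np.eye(len(mu))


def g_v(mu):
    q1, q2 = mu
    return np.array([np.cos(q1) + np.cos(q1 + q2), np.sin(q1) + np.sin(q1 + q2)])


def dg_v(mu):
    q1, q2 = mu
    return np.array([[-np.sin(q1) - np.sin(q1 + q2), -np.sin(q1 + q2)],
                     [np.cos(q1) + np.cos(q1 + q2), np.cos(q1 + q2)]])


def neg_free_energy(mu, rho, s_p, s_v, var_mu, var_p, var_v):
    """Eq. 6 without the constant C."""
    return (-0.5 * np.sum((mu - rho) ** 2) / var_mu
            - 0.5 * np.sum((s_p - g_p(mu)) ** 2) / var_p
            - 0.5 * np.sum((s_v - g_v(mu)) ** 2) / var_v)


def grad_neg_free_energy(mu, rho, s_p, s_v, var_mu, var_p, var_v):
    """Eq. 8: prior error plus Jacobian-weighted sensory prediction errors."""
    return ((rho - mu) / var_mu
            + dg_p(mu).T @ (s_p - g_p(mu)) / var_p
            + dg_v(mu).T @ (s_v - g_v(mu)) / var_v)


# Arbitrary test point: (rho, s_p, s_v, var_mu, var_p, var_v)
args = (np.array([0.1, 0.2]), np.array([0.5, 0.4]), np.array([0.9, 1.1]), 1.0, 0.1, 0.05)
mu0, eps = np.array([0.3, 0.7]), 1e-6
numeric = np.array([
    (neg_free_energy(mu0 + eps * e, *args) - neg_free_energy(mu0 - eps * e, *args)) / (2 * eps)
    for e in np.eye(2)])
analytic = grad_neg_free_energy(mu0, *args)
print(np.max(np.abs(numeric - analytic)))  # close to zero
```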

For the sake of simplification, we encode the internal state directly in the proprioceptive sensing space, thus defining the body just by means of the proprioceptive state. For that purpose, we substitute $g_p(\mu)$ by $\mu$, and its partial derivative is set to 1 (the identity matrix). In other words, if the body configuration is defined by the joint angles, the state represents the output of the joint sensors (encoders). For notational convenience we keep $\mu$ as the body configuration, but it represents $g_p(x)$. Defining the prediction errors as:

$$e_\mu = \frac{\mu - \rho}{\sigma_\mu} \tag{9}$$

$$e_p = \frac{s_p - \mu}{\sigma_p} \tag{10}$$

$$e_v = \frac{s_v - g_v(\mu)}{\sigma_v} \tag{11}$$

we construct the differential equation that infers the body latent variables as⁴:

$$\dot{\mu} = -e_\mu + e_p + \left(\frac{\partial g_v(\mu)}{\partial \mu}\right)^{\!\top} e_v \tag{12}$$

⁴ Note that this computation of $\dot{\mu}$ is a simplification of the predictive processing approach for passive static perception [15], as we are omitting the generative model of the world $f$. Accordingly, to truly reduce the difference between the believed distribution and the observed one, the update should also include the error between the internal belief about the state dynamics and the world generative function, $\mu' - f(\mu)$, where $\mu'$ is the believed rate of change of the state. We leave this extension for further work, and in Sec. V we point out the challenges of obtaining the full construct without knowing $f$.
According to Eq. 12, the update of the internal state is driven by the error between the current estimate and its prior, and by the sensory prediction errors. The gradient (Jacobian) of each sensor with respect to the latent variables maps the contribution of each sensor modality to each body configuration variable, in the same way as in the extended or ensemble Kalman filter.

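Generalizing the free-energy minimization to $N$ sensors, with $e_i = (s_i - g_i(\mu))/\sigma_i$, the body configuration is driven by:

$$\dot{\mu} = -e_\mu + \sum_{i=1}^{N} \left(\frac{\partial g_i(\mu)}{\partial \mu}\right)^{\!\top} e_i \tag{13}$$

Then, the full dynamics of our body estimation model is given by:

$$\dot{\mu} = -e_\mu + \sum_{i=1}^{N} \left(\frac{\partial g_i(\mu)}{\partial \mu}\right)^{\!\top} e_i, \qquad \dot{\rho} = \beta\, e_\mu \tag{14}$$

where $\beta$ is the learning rate parameter that specifies how fast the prior of the body configuration is adjusted to the prediction error.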
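A minimal Euler integration of Eqs. 9 to 14 for a static arm is sketched below. As an assumption for illustration, the visual model is the kinematics of a hypothetical two-link arm (standing in for a learnt $g_v$), proprioception is encoded directly as $\mu$, and all variances, the step $\Delta t$ and the rate $\beta$ are arbitrary values. Starting from a wrong prior, $\mu$ should approach the true joint angles while $\rho$ slowly follows $\mu$.

```python
import numpy as np

rng = np.random.default_rng(0)


# Hypothetical visual forward model: planar two-link arm with unit links.
def g_v(mu):
    q1, q2 = mu
    return np.array([np.cos(q1) + np.cos(q1 + q2), np.sin(q1) + np.sin(q1 + q2)])


def dg_v(mu):
    q1, q2 = mu
    return np.array([[-np.sin(q1) - np.sin(q1 + q2), -np.sin(q1 + q2)],
                     [np.cos(q1) + np.cos(q1 + q2), np.cos(q1 + q2)]])


x_true = np.array([0.4, 0.9])             # ground-truth joint angles (static arm)
var_mu, var_p, var_v = 1.0, 0.1, 0.1      # assumed error variances
beta, dt = 0.1, 0.01                      # assumed learning rate of rho and Euler step

rho = np.array([0.0, 0.0])                # wrong prior belief
mu = rho.copy()
for _ in range(5000):
    s_p = x_true + rng.normal(0.0, np.sqrt(var_p), 2)       # noisy encoders
    s_v = g_v(x_true) + rng.normal(0.0, np.sqrt(var_v), 2)  # noisy end-effector in view
    e_mu = (mu - rho) / var_mu                # Eq. 9
    e_p = (s_p - mu) / var_p                  # Eq. 10, with g_p(mu) = mu
    e_v = (s_v - g_v(mu)) / var_v             # Eq. 11
    mu_dot = -e_mu + e_p + dg_v(mu).T @ e_v   # Eq. 12
    rho_dot = beta * e_mu                     # prior adaptation, Eq. 14
    mu = mu + dt * mu_dot
    rho = rho + dt * rho_dot

print(mu, rho, x_true)
```

Because $\rho$ is pulled towards $\mu$, the prior bias fades over time and the estimate is eventually dominated by the sensory prediction errors.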

II-B Body learning – learning the sensory states caused by the body configuration

We define body learning as obtaining the unknown forward (observation) model $g_i$ and its derivative (Jacobian) $\partial g_i / \partial \mu$ that relate the sensor values to the body state. This is a consequence of describing body estimation by means of Eq. 14. To learn both functions, we use Gaussian process regression with data generated by body exploration. We obtain $m$ sensor samples $y$ from the robot in several body configurations $X$. For instance, for the visual generative process, $X$ is the proprioceptive state and $y$ is the visual information.

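Training is performed by computing the covariance matrix $K$ on the collected data with noise $\sigma_n^2$, where the covariance function is defined as:

$$k(x_a, x_b) = \sigma_f^2 \exp\!\left(-\tfrac{1}{2} (x_a - x_b)^{\top} \Lambda^{-1} (x_a - x_b)\right) \tag{15}$$

The prediction of the sensory outcome for a body configuration $x_*$ is then computed as [25]:

$$g(x_*) = k(x_*, X)\, \alpha, \qquad \alpha = L^{\top} \backslash (L \backslash y) \tag{16}$$

where $L = \operatorname{cholesky}(K + \sigma_n^2 I)$ is used for numerical stability and $A \backslash b$ denotes the solution of the linear system $A z = b$.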
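The training and prediction steps of Eqs. 15 and 16 can be sketched with NumPy as follows, using the standard Cholesky-based procedure (as in Rasmussen and Williams, Algorithm 2.1). The smooth function standing in for a joint-angles-to-pixels map, the 46 random samples and the hyperparameter values are assumptions for illustration only.

```python
import numpy as np


def se_ard_kernel(A, B, sigma_f, lengthscales):
    """Squared-exponential kernel with per-dimension length scales (Eq. 15).

    A: (n, d), B: (m, d) -> (n, m) covariance matrix.
    """
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales
    return sigma_f ** 2 * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))


def gp_train(X, y, sigma_f, lengthscales, sigma_n):
    """Return alpha = (K + sigma_n^2 I)^-1 y, computed through a Cholesky factor."""
    K = se_ard_kernel(X, X, sigma_f, lengthscales)
    L = np.linalg.cholesky(K + sigma_n ** 2 * np.eye(len(X)))
    # np.linalg.solve does not exploit the triangular structure; it is used for brevity.
    return np.linalg.solve(L.T, np.linalg.solve(L, y))


def gp_predict(x_star, X, alpha, sigma_f, lengthscales):
    """Posterior mean g(x*) = k(x*, X) alpha (Eq. 16)."""
    return se_ard_kernel(np.atleast_2d(x_star), X, sigma_f, lengthscales) @ alpha


# Hypothetical exploration data: two joint angles -> one pixel coordinate
rng = np.random.default_rng(1)
X = rng.uniform(-1.5, 1.5, size=(46, 2))
y = 100 * np.sin(X[:, 0]) + 50 * np.cos(X[:, 1])

sigma_f, ls, sigma_n = 100.0, np.array([1.0, 1.0]), 1.0
alpha = gp_train(X, y, sigma_f, ls, sigma_n)

x_test = np.array([0.3, -0.2])
pred = gp_predict(x_test, X, alpha, sigma_f, ls)[0]
truth = 100 * np.sin(0.3) + 50 * np.cos(-0.2)
print(pred, truth)
```

Since $\alpha$ is computed once at training time, each later prediction only costs one kernel evaluation per training sample.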

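Finally, in order to compute the gradient of the posterior mean, we differentiate the kernel [26] and obtain its prediction analogously to Eq. 16:

$$\frac{\partial g(x_*)}{\partial x_*} = \left(\frac{\partial k(x_*, X)}{\partial x_*}\right)^{\!\top} \alpha \tag{17}$$

Using the squared exponential kernel with the Mahalanobis distance as covariance function, the derivative becomes:

$$\frac{\partial g(x_*)}{\partial x_*} = -\Lambda^{-1} \tilde{X}^{\top} \big(k(x_*, X)^{\top} \odot \alpha\big) \tag{18}$$

where $\tilde{X}$ is the $m \times d$ matrix whose $j$-th row is $(x_* - x_j)^{\top}$, $\Lambda = \operatorname{diag}(l_1^2, \dots, l_d^2)$ is a diagonal matrix populated with the squared length scale of each dimension, and $\odot$ is element-wise multiplication.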
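Eq. 18 can be implemented and verified against finite differences of the GP mean, as in the sketch below. The data, the target function and the hyperparameters are assumptions for illustration. For a sensor with several outputs (e.g., two pixel coordinates), one such gradient per output forms the rows of the Jacobian used in Eq. 14.

```python
import numpy as np


def gp_fit(X, y, sigma_f, ls, sigma_n):
    """alpha for the GP posterior mean with an SE kernel and length scales ls."""
    diff = (X[:, None, :] - X[None, :, :]) / ls
    K = sigma_f ** 2 * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))
    L = np.linalg.cholesky(K + sigma_n ** 2 * np.eye(len(X)))
    return np.linalg.solve(L.T, np.linalg.solve(L, y))


def gp_mean_and_grad(x_star, X, alpha, sigma_f, ls):
    """Posterior mean (Eq. 16) and its gradient w.r.t. x_star (Eq. 18)."""
    X_tilde = x_star[None, :] - X                             # rows: x* - x_j
    k = sigma_f ** 2 * np.exp(-0.5 * np.sum((X_tilde / ls) ** 2, axis=1))
    mean = k @ alpha
    grad = -(X_tilde.T @ (k * alpha)) / ls ** 2               # Lambda = diag(ls**2)
    return mean, grad


rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(30, 2))
y = np.sin(2 * X[:, 0]) * np.cos(X[:, 1])                     # hypothetical sensor map
sigma_f, ls, sigma_n = 1.0, np.array([0.6, 0.8]), 0.1
alpha = gp_fit(X, y, sigma_f, ls, sigma_n)

x0, eps = np.array([0.2, -0.3]), 1e-6
_, grad = gp_mean_and_grad(x0, X, alpha, sigma_f, ls)
fd = np.array([(gp_mean_and_grad(x0 + eps * e, X, alpha, sigma_f, ls)[0]
                - gp_mean_and_grad(x0 - eps * e, X, alpha, sigma_f, ls)[0]) / (2 * eps)
               for e in np.eye(2)])
print(grad, fd)
```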

II-C Adding tactile feedback and other’s interaction

We exploit the artificial skin of the robot to refine the body configuration estimation. For that purpose, we model the intermodal relation between visual and tactile sensing [4]. When somebody touches the robot end-effector, the robot should adjust its body configuration so that the end-effector location fits the place in the visual field where the other agent is touching. In other words, if the other agent touches the robot end-effector at a certain location in the visual field, then the robot end-effector must be there, and the body configuration is adjusted accordingly.

First, we assume that the robot is able to discern that its end-effector limb is being touched, and that it knows the relation between the touch signal and the location on the body. We define the likelihood of being touched by the other agent by means of spatial and temporal coherence. This function could be learnt by touching the limb at different end-effector locations. Instead, in this paper we reuse the learnt visual model $g_v$ to compute the expected end-effector location, and define the visuo-tactile sensory likelihood as a function $\phi$ of the visual distance between the expected end-effector location and the other agent's hand, and of the synchrony of the events:

$$g_t(\mu; s_o, \tau) = \phi\big(\lVert g_v(\mu) - s_o \rVert,\ \tau\big) \tag{19}$$

where the parameters that shape $\phi$ have been tuned in accordance with the data acquired in [27] from human participants, $\tau$ is the level of synchrony of the event (e.g., the time difference between the visual and the tactile event), and $s_o$ is the location of the other agent's end-effector in the visual field.

(a) Sensory data acquisition
(b) Adaptation test
Fig. 2: Experimental setup. (a) Gathering proprioceptive (joint angles), visual (pixel coordinates of the robot (green) and other (red) end-effectors) and tactile (proximity values) sensory data from the robot with different participants and positions. Green circles represent the likelihood of being touched. (b) Adaptation test, where we change the visual location of the arm and induce synchronous visuo-tactile inputs.

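We directly introduce this generative function into the free-energy scheme as an additional prediction error⁵:

$$e_t = \frac{s_t - g_t(\mu; s_o, \tau)}{\sigma_t}, \qquad \dot{\mu} = -e_\mu + \sum_{i=1}^{N} \left(\frac{\partial g_i(\mu)}{\partial \mu}\right)^{\!\top} e_i + \left(\frac{\partial g_t(\mu; s_o, \tau)}{\partial \mu}\right)^{\!\top} e_t \tag{20}$$

⁵ Under the predictive processing framework, we might include another internal variable that defines being touched, and a second layer of hierarchy able to infer the similarity (temporal and spatial) between the patterns generated in the visual field by the other agent and the patterns perceived on the skin.

When a synchronous tactile and visual pattern occurs, the body configuration is adjusted depending on the expected end-effector visual location $g_v(\mu)$ and the other's visual location $s_o$, since the gradient of $g_t$ with respect to $\mu$ is obtained through $g_v$.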
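The sketch below illustrates how the tactile term pulls the estimate. It assumes a hypothetical Gaussian shape for $\phi$ in both the visual distance and the asynchrony (the paper's fitted function is not reproduced here), a two-link arm as a stand-in for $g_v$, and arbitrary values for $\sigma_t$ and $\Delta t$. The gradient of $g_t$ follows from the chain rule through $g_v$, so a synchronous touch moves the expected end-effector position towards the other's hand.

```python
import numpy as np


# Hypothetical planar two-link arm (unit links) standing in for the learnt g_v.
def g_v(mu):
    q1, q2 = mu
    return np.array([np.cos(q1) + np.cos(q1 + q2), np.sin(q1) + np.sin(q1 + q2)])


def dg_v(mu):
    q1, q2 = mu
    return np.array([[-np.sin(q1) - np.sin(q1 + q2), -np.sin(q1 + q2)],
                     [np.cos(q1) + np.cos(q1 + q2), np.cos(q1 + q2)]])


def g_t(mu, s_o, tau, c_s=0.5, c_tau=0.2):
    """Assumed visuo-tactile likelihood: Gaussian in the visual distance to the
    other's hand s_o and in the asynchrony tau (shape chosen for illustration)."""
    d2 = np.sum((g_v(mu) - s_o) ** 2)
    return np.exp(-d2 / (2 * c_s ** 2)) * np.exp(-tau ** 2 / (2 * c_tau ** 2))


def dg_t(mu, s_o, tau, c_s=0.5, c_tau=0.2):
    """Gradient of g_t w.r.t. mu, obtained by the chain rule through g_v."""
    return (-g_t(mu, s_o, tau, c_s, c_tau) / c_s ** 2) * (dg_v(mu).T @ (g_v(mu) - s_o))


mu = np.array([0.4, 0.9])                  # current body estimate
s_o = g_v(np.array([0.6, 0.9]))            # other's hand seen at a displaced location
s_t, tau, var_t, dt = 1.0, 0.0, 0.5, 0.01  # synchronous touch; assumed variance and step

e_t = (s_t - g_t(mu, s_o, tau)) / var_t        # tactile prediction error (Eq. 20)
mu_new = mu + dt * dg_t(mu, s_o, tau) * e_t    # tactile term of the mu update
print(np.linalg.norm(g_v(mu) - s_o), np.linalg.norm(g_v(mu_new) - s_o))
```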

II-D Adaptive body estimation and learning through predictive processing

Algorithm 1 summarizes the learning and estimation stages that dynamically compute the internal body configuration from the sensory prediction errors, using $N$ independent sources of sensory information or features and $d$ internal body variables. The learning stage uses the GP regression described in [25] for the contribution of each sensor modality to the body configuration. Precomputing the solution $\alpha$ reduces the complexity of the prediction calculation. The estimation stage computes the prediction error for every sensor and solves the differential equations by variational free-energy minimization. Note that we apply a first-order Euler integration method; more accurate approaches are out of the scope of this paper.

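Algorithm 1 Multisensory body learning and estimation
Forward sensor model learning using GP
  for i = 1:N do                                      ▷ For every sensor/feature modality
      K_i ← k(X_i, X_i)                                ▷ Covariance (Eq. 15)
      L_i ← cholesky(K_i + σ_n² I)
      α_i ← L_iᵀ \ (L_i \ y_i)
  end for
Predictive processing body estimation
  Require: GP training {X_i, α_i}, i = 1..N
  ρ ← ρ_0                                              ▷ Body configuration prior
  μ ← ρ                                                ▷ Initial body estimation
  e_μ ← 0                                              ▷ Initial prediction error
  loop
      s_1, …, s_N ← sensors                            ▷ Input sensor information
      for i = 1:N do                                   ▷ Compute predictions
          g_i(μ) ← k(μ, X_i) α_i                        ▷ Eq. 16
          ∂g_i(μ)/∂μ ← −Λ⁻¹ X̃_iᵀ (k(μ, X_i)ᵀ ⊙ α_i)       ▷ Eq. 18
      end for
      for i = 1:N do
          e_i ← (s_i − g_i(μ)) / σ_i                     ▷ Prediction errors (Eq. 20)
      end for
      e_μ ← (μ − ρ) / σ_μ
      μ ← μ + Δt (−e_μ + Σ_i (∂g_i(μ)/∂μ)ᵀ e_i)          ▷ Free-energy minimization dynamics
      ρ ← ρ + Δt β e_μ
  end loop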
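Putting the pieces together, the following sketch runs both stages of Algorithm 1 for one proprioceptive and one visual sensor. Everything concrete in it is an assumption for illustration, not the authors' implementation: the simulated two-link arm that plays the role of the real visual sensor, the exploration range, the GP hyperparameters, the variances, $\Delta t$ and $\beta$.

```python
import numpy as np

rng = np.random.default_rng(0)


def true_visual(q):
    """Hypothetical 'real' visual sensor (two-link arm), unknown to the estimator."""
    return np.array([np.cos(q[0]) + np.cos(q[0] + q[1]),
                     np.sin(q[0]) + np.sin(q[0] + q[1])])


def kernel(A, B, sf, ls):
    diff = (A[:, None, :] - B[None, :, :]) / ls
    return sf ** 2 * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))


# Learning stage: explore, then fit one GP per visual output (Eqs. 15 and 16).
X = rng.uniform([-0.5, 0.2], [1.5, 1.8], size=(60, 2))
Y = np.array([true_visual(q) for q in X]) + rng.normal(0.0, 0.01, (60, 2))
sf, ls, sn = 1.0, np.array([0.7, 0.7]), 0.05
L = np.linalg.cholesky(kernel(X, X, sf, ls) + sn ** 2 * np.eye(len(X)))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))    # (m, 2): one column per output


def g_v(mu):
    """Learnt visual forward model and its Jacobian (Eqs. 16 and 18)."""
    X_tilde = mu[None, :] - X
    k = sf ** 2 * np.exp(-0.5 * np.sum((X_tilde / ls) ** 2, axis=1))
    mean = k @ alpha                                     # (2,)
    jac = -((k[:, None] * alpha).T @ X_tilde) / ls ** 2  # (2, d)
    return mean, jac


# Estimation stage: Euler integration of Eq. 14 with proprioception + vision.
x_true = np.array([0.5, 1.0])
var_mu, var_p, var_v, beta, dt = 1.0, 0.1, 0.1, 0.1, 0.01  # assumed values
rho = np.array([0.0, 0.5])                                 # biased prior
mu = rho.copy()
for _ in range(3000):
    s_p = x_true + rng.normal(0.0, np.sqrt(var_p), 2)      # g_p(mu) = mu
    s_v = true_visual(x_true) + rng.normal(0.0, np.sqrt(var_v), 2)
    gv, Jv = g_v(mu)
    e_mu = (mu - rho) / var_mu
    e_p = (s_p - mu) / var_p
    e_v = (s_v - gv) / var_v
    mu = mu + dt * (-e_mu + e_p + Jv.T @ e_v)
    rho = rho + dt * beta * e_mu

print(mu, x_true)
```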

III Experimental setup on a robotic arm

We test the model on the multisensory UR-5 arm of the robot TOMM [28], as depicted in Fig. 2. Although the methodology is intended for robots that are difficult to calibrate or that have imprecise sensors, we use this platform as a proof of concept because we can easily compare against ground-truth values. Without loss of generality, the body (latent variables) is defined as the joint angles, perceived through multiple modalities: (1) the proprioceptive input is three joint angles with additive Gaussian noise (two shoulder joints and the elbow, Fig. 3(a)); (2) the visual input is an RGB camera mounted on the head of the robot; and (3) the tactile input is generated by multimodal skin cells distributed on the arm [29].

III-A Learning from visual and proprioceptive data

To learn the sensory forward (observation) model, we programmed random trajectories in the joint space that resemble horizontal displacements of the arm. Figure 3(a) shows the extracted data: noisy joint angles and the visual location of the end-effector, obtained by colour segmentation. To learn the visual forward model $g_v$, each sample is defined by the input joint-angle sensor values and the output pixel coordinates. As an example, Fig. 3(b) shows the visual forward model learnt by GP regression with 46 samples (red dots): the mean horizontal displacement (in pixels) with respect to two joint angles, and its variance.

(a) Data recorded example (joint angles + noise, end-effector visual, end-effector Cartesian) and schematic picture of the 3-DOF arm.
(b) Learnt $g_v$ for the visual horizontal displacement
(c) Skin proximity data
(d) Tactile (left) and visual (right) event trajectories
Fig. 3: Collected data. (a) Joints, visual and ground truth information of the end-effector. (b) Example of the mean and the variance computed by the GP, which describes the visual horizontal displacement depending on two joints. (c) 30 seconds of raw proximity sensor information of 117 forearm skin cells. (d) Touch patterns extracted from tactile and visual sources.

III-B Extracting visuo-tactile data

We use proximity information from the infrared sensors located in every skin cell to discern when the arm is being touched. The infrared sensor outputs a filtered signal, and the likelihood of a cell being touched is given by a function of this signal (Eq. 19) whose parameters have been obtained by fitting it to the distance-sensor output measurements. Figure 3(c) shows the raw skin proximity data during the experiment (each colour represents one of the 117 skin cells). From the visual trajectory of the other agent's hand and the skin proximity activation, we compute the level of synchrony between the two patterns (Fig. 3(d)). Timings for tactile stimuli are obtained by setting a threshold on the proximity value, and timings for the other agent's trajectory events are obtained from its velocity components. The detected initial and final positions of the visual touch are depicted in Fig. 3(d) (right, green circles).
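The timing extraction can be sketched as below. The signals are synthetic and the event definitions are assumptions for illustration: a tactile onset is a rising edge of the proximity signal over a threshold, a visual touch event is the instant at which the vertical velocity of the other's hand changes from negative to positive (its lowest point), and synchrony is measured as the delay $\tau$ between each tactile onset and the closest visual event.

```python
import numpy as np


def rising_edges(signal, t, threshold):
    """Times at which `signal` goes from <= threshold to > threshold."""
    above = signal > threshold
    idx = np.flatnonzero(~above[:-1] & above[1:]) + 1
    return t[idx]


def synchrony(t_tactile, t_visual):
    """Absolute delay from each tactile onset to the closest visual event."""
    return np.array([np.min(np.abs(t_visual - tt)) for tt in t_tactile])


t = np.linspace(0.0, 9.0, 901)                    # 100 Hz, hypothetical
# Synthetic height of the other's hand: lowest points (contacts) at t = 1.5, 4.5, 7.5 s
hand_y = np.cos(2 * np.pi * t / 3)
hand_vy = np.gradient(hand_y, t)
# Visual touch events: vertical velocity changes from negative to positive
t_visual = rising_edges(hand_vy, t, 0.0)

# Synthetic proximity signal of one skin cell: two synchronous bumps and one late bump
prox = sum(np.exp(-(t - c) ** 2 / (2 * 0.1 ** 2)) for c in (1.55, 4.6, 8.5))
t_tactile = rising_edges(prox, t, 0.5)

tau = synchrony(t_tactile, t_visual)
print(t_visual, t_tactile, tau)
```

Small values of $\tau$ mark synchronous visuo-tactile events, which are the ones that should drive the tactile term of the body estimation.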

IV Results

For comparison purposes, all experiments use the same fixed parameter values: the GP learning hyperparameters (signal variance $\sigma_f^2$ and kernel length scale $l$), the integration step $\Delta t$, the error variances $\sigma_\mu$, $\sigma_p$ and $\sigma_v$, and the learning rate $\beta$ of $\rho$.

(a) Joint angles estimation
(b) Estimation error
(c) Error dynamics
Fig. 4: Body estimation (joint angles) and error comparison. Proposed algorithm tested with different sensor contributions (only visual; visual + two joint sensors; and visual + proprioception) and compared with a Kalman filter just using the joint measurements - see text for details.

IV-A Robust multisensory integration

We present three different experiments to study visual and proprioceptive body estimation. The first one, described in Fig. 4, shows the proposed body estimation algorithm while the arm performs a trajectory similar to the one presented in Fig. 3(a). We analyse the error between the estimated body configuration and the ground-truth joint angles for different sensor contributions. The algorithm correctly estimates the joint angles but presents slow dynamics when big changes occur, due to the static nature of the generative model used. It also shows that with only visual input the elbow angle cannot be estimated. This happens because the learning trajectory was designed not to provide information about the elbow. However, combining visual information with two joint sensors reduces the estimation error. This shows the ability of the proposed method to deal with missing information. We have also validated the method against a standard Kalman filter [30] with only the joint angles as input (proprioception), a fixed process noise covariance, the same measurement noise as the proposed approach and a static transition model, for a fair comparison (yellow dotted line). As expected, the error and behaviour are practically equivalent to those of the proprioception version of the proposed approach (red dotted line).
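One way to see why the Kalman filter baseline behaves like the proprioception-only version: without the prior term, an Euler step of Eq. 12 with $g_p(\mu) = \mu$ is a constant-gain filter $\mu \leftarrow \mu + K(s_p - \mu)$ with $K = \Delta t/\sigma_p$, and a steady-state Kalman filter with a static (random-walk) transition is also a constant-gain filter. The sketch below, with an assumed trajectory and noise level, chooses the process noise $Q = K^2 R/(1-K)$ so that the steady-state Kalman gain equals $K$; the two estimates then coincide. It illustrates the mechanism only and does not reproduce the reported experiment.

```python
import numpy as np

rng = np.random.default_rng(3)
dt, var_p = 0.01, 0.05                       # assumed Euler step and encoder variance
t = np.arange(0.0, 10.0, dt)
x_true = 0.8 * np.sin(0.5 * t)               # hypothetical joint trajectory
z = x_true + rng.normal(0.0, np.sqrt(var_p), t.size)  # noisy encoder readings

# Proprioception-only update of Eq. 12 (prior term dropped): mu += dt * e_p
mu = np.empty_like(z)
m = 0.0
for k, s_p in enumerate(z):
    m = m + dt * (s_p - m) / var_p
    mu[k] = m

# Kalman filter with a static (random-walk) transition and R = var_p
K = dt / var_p                               # gain of the Euler update above
R = var_p
Q = K ** 2 * R / (1.0 - K)                   # process noise giving steady-state gain K
x_hat, P = 0.0, K * R                        # start at the steady-state posterior variance
kf = np.empty_like(z)
for k, zk in enumerate(z):
    P_prior = P + Q                          # predict: x_k = x_{k-1} + w
    gain = P_prior / (P_prior + R)           # update
    x_hat = x_hat + gain * (zk - x_hat)
    P = (1.0 - gain) * P_prior
    kf[k] = x_hat

print(np.max(np.abs(mu - kf)))               # tiny: the two filters coincide
```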

In the second experiment, presented in Fig. 5(a), we test the model with non-linear proprioceptive sensors: $s_p = x^2 + w$, with Gaussian noise $w$. The plotted body configuration values are in the sensor space. We initialized the robot body belief with a wrong configuration. During the first 5 seconds, the plot shows how the system converges to the “embodied” configuration, and then the arm starts moving. The estimation reaction time is slightly slower than in the previous experiment. Furthermore, we observe an interesting effect. The joint angles take both positive and negative values, but with the quadratic function the robot cannot distinguish between positive and negative angles. Thus, when one joint changes its sign, the robot believes that it is in the right configuration, but it is not.
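The sign ambiguity can be reproduced with a one-dimensional sketch of Eq. 12 that uses only the quadratic proprioceptive model $g_p(\mu) = \mu^2$, whose derivative is $2\mu$. The prior term is dropped, the reading is noise-free, and $\sigma_p$ and $\Delta t$ are assumed values. Both $+x$ and $-x$ explain the reading equally well, so the estimate converges to whichever one lies on the side of its initial value.

```python
import numpy as np


def estimate(mu0, s_p, var_p=0.1, dt=0.01, steps=5000):
    """Euler integration of mu_dot = (dg_p/dmu) * (s_p - g_p(mu)) / var_p, g_p(mu) = mu**2."""
    mu = mu0
    for _ in range(steps):
        mu += dt * 2 * mu * (s_p - mu ** 2) / var_p
    return mu


x_true = 0.5
s_p = x_true ** 2                                   # noise-free reading for clarity
print(estimate(0.2, s_p), estimate(-0.2, s_p))      # converges to +0.5 and -0.5
```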

In the last experiment, depicted in Fig. 5(b), we study how the model deals with damaged or uncalibrated sensors. After the visual learning stage, we added a drift error to one of the shoulder proprioceptive sensors. The visual prediction error should correct this anomaly. The plot shows how the system nicely reduces the proprioceptive drift in that shoulder joint. However, it induces a wrong bias on the other shoulder joint. Thus, although with the current learning the visual information evidences a coupling between the two shoulder joints, the visual correction does appear.

(a) Body estimation with non-linear proprioception
(b) Damaged proprioception compensated with visual sensing
Fig. 5: Non-linear and damaged proprioception test. (a) The generative model of the proprioceptive sensing is quadratic plus Gaussian noise, $s_p = x^2 + w$. Body configuration estimation vs. real values in the proprioceptive value space. The initial prior joint angles differ from the real ones. (b) Multimodal body inference with a biased sensor.

IV-B Adaptation with visual, proprioceptive and tactile sensors

We further test the adaptation of the proposed model with proprioceptive and visuo-tactile stimulation. Fig. 6(a) describes the refinement of the body estimation depending on the sensor modalities used. Every sensor or feature contributes independently to improve the localization of the robot arm. In essence, the method provides scalable data association; e.g., the robot can learn more than one visual feature and incorporate them into the prediction error formulation as additive terms. Besides, the experiment in Fig. 6(b) shows the potential of the proposed method to adapt its body inference to incoherent new situations, as a human would. We introduced a strong perturbation on the visuo-tactile input, inspired by the rubber-hand illusion experiments in humans [31]. The new visual location induced by synchronous tactile stimulation makes the robot infer the most plausible situation given the sensory information, which in this case is to drift the location of the arm towards the new location. In the first 5 seconds there is no tactile stimulation, and the estimation is refined towards the ground truth (black dotted line). Then we inject visuo-tactile stimulation while the other agent pretends to touch another location. When the stimulation becomes synchronous, a horizontal drift appears and the inferred body configuration is altered.

(a) Body estimation refinement with biased prior
(b) Joint angles estimation with wrong visuo-tactile sensory input
Fig. 6: Adaptation analysis. (a) Joint angles estimation using different sensor modalities. (b) Body inference with visuo-tactile perturbation: body latent variables (blue dotted line) and ground truth (black dotted line).

IV-C A note on scalability

Learning with Gaussian process regression has a computational complexity of $O(m^3)$ for $m$ samples, and the prediction of the sensor forward model depends on the complexity of evaluating the covariance kernel. For $N$ independent sensor contributions, $d$ internal variables and $m$ samples, the prediction of the forward models is $O(Nmd)$. Finally, each step of the free-energy optimization with the Euler integration method is linear in $N$ and $d$.

V Discussion: I sense, therefore I am?

We have stressed that robot body estimation can be computed just by means of sensory information. Every sensing modality or feature, when available, contributes to the final body estimation through the prediction error, and the variance of each error encodes the precision of each sensor with respect to the internal body variables. For instance, when the arm is outside the field of view, the proprioceptive and tactile sensors define the arm configuration; when the arm appears in the visual field, other features are included in the inference. We have also shown that when the robot has a broken proprioceptive sensor it can rely on visual features to compensate for the missing information. Finally, we have underscored embodiment by showing how the sensor function influences body estimation. Hence, we have defined adaptive body learning and estimation as providing the most plausible solution according to the information currently available from the sensors. As a collateral effect, the model has been shown to be prone to visuo-tactile illusions, something that has also been evidenced in humans.

Nevertheless, we have focused only on passive perception and deliberately omitted the generative model of the body dynamics. Moreover, where is the action? We have not considered it in the model, although it is core to interacting with the body. To obtain the full construct, which properly reduces the KL divergence between the robot belief and the posterior probability of the body configuration given the sensors, we need to include the robot dynamics. However, this is a hard task from the learning perspective. The advantage of this approach is that we only need an approximation of the dynamics, because free-energy minimization should resolve the discrepancy. With the full construct, we expect to improve prediction accuracy and to incorporate action into the body estimation framework.
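For reference, the standard identity behind the link between free energy and the KL divergence (as reviewed, e.g., in [8]) is written below for a belief $q(x)$ over the body configuration $x$ and sensory input $s$; the notation is generic rather than the paper's.

```latex
F = \mathbb{E}_{q(x)}\left[\ln q(x) - \ln p(s, x)\right]
  = D_{\mathrm{KL}}\left[q(x) \,\|\, p(x \mid s)\right] - \ln p(s)
```

Because $-\ln p(s)$ does not depend on $q$, minimizing $F$ with respect to the belief minimizes the KL divergence to the true posterior, and $F$ is an upper bound on the sensory surprise $-\ln p(s)$.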

VI Conclusion

We have presented an adaptive robot body learning and estimation algorithm based on predictive processing, able to integrate information from visual, proprioceptive and tactile sensors. The robot independently learns the sensor forward generative functions and then uses them to refine its body estimation through a free-energy minimization scheme.

The model has been tested on a robot with a standard industrial arm to facilitate ground-truth comparison. Results have shown how the model deals with missing and noisy sensory information, reducing the effect of sensor failures. The algorithm has also displayed adaptability to wrong body prior initialization and to unexpected situations. In addition, we have shown how another agent's touch can refine the robot's body estimation, opening interesting questions about improved localization and mapping by means of tactile interaction. Altogether, these results reflect the potential of the proposed approach for complex robots, where estimating the body location is a hard task and a requirement for safe interaction.


  • [1] K. Friston, “A theory of cortical responses,” Philosophical Transactions of the Royal Society of London B: Biological Sciences, vol. 360, no. 1456, pp. 815–836, 2005.
  • [2] R. P. Rao and D. H. Ballard, “Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects,” Nature Neuroscience, vol. 2, no. 1, pp. 79–87, 1999.
  • [3] L. Pio-Lopez, A. Nizard, K. Friston, and G. Pezzulo, “Active inference and robot control: a case study,” Journal of The Royal Society Interface, vol. 13, no. 122, p. 20160616, 2016.
  • [4] P. Lanillos, E. Dean-Leon, and G. Cheng, “Yielding self-perception in robots through sensorimotor contingencies,” IEEE Trans. on Cognitive and Developmental Systems, no. 99, pp. 1–1, 2016.
  • [5] P. Lanillos, E. Dean-Leon, and G. Cheng, “Enactive self: a study of engineering perspectives to obtain the sensorimotor self through enaction,” in Developmental Learning and Epigenetic Robotics, Joint IEEE Int. Conf. on, 2017.
  • [6] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005.
  • [7] K. Friston, “Hierarchical models in the brain,” PLoS Computational Biology, vol. 4, no. 11, p. e1000211, 2008.
  • [8] C. L. Buckley, C. S. Kim, S. McGregor, and A. K. Seth, “The free energy principle for action and perception: A mathematical review,” arXiv preprint arXiv:1705.09156, 2017.
  • [9] A. Ahmadi and J. Tani, “Bridging the gap between probabilistic and deterministic models: a simulation study on a variational Bayes predictive coding recurrent neural network model,” in Int. Conf. on Neural Information Processing. Springer, 2017, pp. 760–769.
  • [10] W. Lotter, G. Kreiman, and D. Cox, “Deep predictive coding networks for video prediction and unsupervised learning,” arXiv preprint arXiv:1605.08104, 2016.
  • [11] C. Angulo and J. M. Acevedo-Valle, “On dynamical systems for sensorimotor contingencies. A first approach from control engineering,” in Recent Advances in Artificial Intelligence Research and Development, Proc. Int. Conf. of the Catalan Association for Artificial Intelligence, vol. 300. IOS Press, 2017, p. 46.
  • [12] Y. Nagai and M. Asada, “Predictive learning of sensorimotor information as a key for cognitive development,” in Proc. of the IROS 2015 Workshop on Sensorimotor Contingencies for Robotics, 2015.
  • [13] R. M. Neal and G. E. Hinton, “A view of the EM algorithm that justifies incremental, sparse, and other variants,” in Learning in Graphical Models. Springer, 1998, pp. 355–368.
  • [14] G. Evensen, “Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics,” Journal of Geophysical Research: Oceans, vol. 99, no. C5, pp. 10143–10162, 1994.
  • [15] R. Bogacz, “A tutorial on the free-energy framework for modelling perception and learning,” Journal of Mathematical Psychology, 2015.
  • [16] S. Vijayakumar, A. D’souza, and S. Schaal, “Incremental online learning in high dimensions,” Neural Computation, vol. 17, no. 12, pp. 2602–2634, 2005.
  • [17] D. Nguyen-Tuong, J. R. Peters, and M. Seeger, “Local Gaussian process regression for real time online model learning,” in Advances in Neural Information Processing Systems, 2009, pp. 1193–1200.
  • [18] B. Damas and J. Santos-Victor, “An online algorithm for simultaneously learning forward and inverse kinematics,” in Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ Int. Conf. on, 2012, pp. 1499–1506.
  • [19] C. Nabeshima, Y. Kuniyoshi, and M. Lungarella, “Adaptive body schema for robotic tool-use,” Advanced Robotics, vol. 20, no. 10, pp. 1105–1126, 2006.
  • [20] E. Wieser and G. Cheng, “Progressive learning of sensory-motor maps through spatiotemporal predictors,” in Developmental Learning and Epigenetic Robotics (ICDL-Epirob), IEEE Int. Conf. on, 2016.
  • [21] A. Stoytchev, “Self-detection in robots: a method based on detecting temporal contingencies,” Robotica, vol. 29, no. 01, pp. 1–21, 2011.
  • [22] H. Mori and Y. Kuniyoshi, “A human fetus development simulation: Self-organization of behaviors through tactile sensation,” in Development and Learning (ICDL), IEEE 9th Int. Conf. on, 2010, pp. 82–87.
  • [23] G. Schillaci, V. V. Hafner, and B. Lara, “Exploration behaviors, body representations, and simulation processes for the development of cognition in artificial agents,” Frontiers in Robotics and AI, vol. 3, p. 39, 2016.
  • [24] P. Lanillos, E. Dean-Leon, and G. Cheng, “Multisensory object discovery via self-detection and artificial attention,” in Developmental Learning and Epigenetic Robotics, Joint IEEE Int. Conf. on, 2016.
  • [25] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005.
  • [26] A. McHutchon, “Differentiating Gaussian processes,” 2013.
  • [27] M. Samad, A. J. Chung, and L. Shams, “Perception of body ownership is driven by Bayesian sensory inference,” PLoS ONE, vol. 10, no. 2, p. e0117178, 2015.
  • [28] E. Dean-Leon, B. Pierce, F. Bergner, P. Mittendorfer, K. Ramirez-Amaro, W. Burger, and G. Cheng, “TOMM: Tactile omnidirectional mobile manipulator,” in Robotics and Automation (ICRA), IEEE Int. Conf. on, 2017, pp. 2441–2447.
  • [29] P. Mittendorfer and G. Cheng, “Humanoid multimodal tactile-sensing modules,” IEEE Trans. on Robotics, vol. 27, no. 3, pp. 401–410, 2011.
  • [30] E. Besada-Portas, J. A. Lopez-Orozco, P. Lanillos, and J. M. de la Cruz, “Localization of non-linearly modeled autonomous mobile robots using out-of-sequence measurements,” Sensors, vol. 12, no. 3, pp. 2487–2518, 2012.
  • [31] N.-A. Hinz, P. Lanillos, H. Mueller, and G. Cheng, “Drifting perceptual patterns suggest prediction errors fusion rather than hypothesis selection: replicating the rubber-hand illusion on a robot,” arXiv preprint arXiv:1806.06809, 2018.