Gait learning for soft microrobots controlled by light fields

by   Alexander von Rohr, et al.
Max Planck Society

Soft microrobots based on photoresponsive materials and controlled by light fields can generate a variety of different gaits. This inherent flexibility can be exploited to maximize their locomotion performance in a given environment and used to adapt them to changing conditions. Albeit, because of the lack of accurate locomotion models, and given the intrinsic variability among microrobots, analytical control design is not possible. Common data-driven approaches, on the other hand, require running prohibitive numbers of experiments and lead to very sample-specific results. Here we propose a probabilistic learning approach for light-controlled soft microrobots based on Bayesian Optimization (BO) and Gaussian Processes (GPs). The proposed approach results in a learning scheme that is data-efficient, enabling gait optimization with a limited experimental budget, and robust against differences among microrobot samples. These features are obtained by designing the learning scheme through the comparison of different GP priors and BO settings on a semi-synthetic data set. The developed learning scheme is validated in microrobot experiments, resulting in a 115 locomotion performance with an experimental budget of only 20 tests. These encouraging results lead the way toward self-adaptive microrobotic systems based on light-controlled soft microrobots and probabilistic learning control.


page 1

page 4

page 5

page 8


Learning Flexible and Reusable Locomotion Primitives for a Microrobot

The design of gaits for robot locomotion can be a daunting process which...

Trajectory Optimization for Robust Humanoid Locomotion with Sample-Efficient Learning

Trajectory optimization (TO) is one of the most powerful tools for gener...

Modeling and Validation of Soft Robotic Snake Locomotion

Snakes are a remarkable evolutionary success story. Many snake-inspired ...

Learning Contact-aware CPG-based Locomotion in a Soft Snake Robot

In this paper, we present a model-free learning-based control scheme for...

SMORS: A soft multirotor UAV for multimodal locomotion and robust interaction

We present SMORS, the first Soft fully actuated MultirOtoR System for mu...

Mechanism Design of a Bio-inspired Armwing Mechanism for Mimicking Bat Flapping Gait

The objective of this work is to design and develop a bio-inspired soft ...

Towards Hexapod Gait Adaptation using Enumerative Encoding of Gaits: Gradient-Free Heuristics

The quest for the efficient adaptation of multilegged robotic systems to...

I Introduction

Soft microrobots are mobile robotic devices at the sub-millimeter scale that are made of soft, stimuli-responsive materials, which can act as sensors and/or actuators [1, 2, 3, 4]. Indeed, as at the micro scale no robotic components are available, basic robotic functionalities need to be implemented in microrobots by the very materials they are made of. Recently, soft microrobots consisting of photo-responsive liquid-crystal elastomers (LCEs) have been presented [5]. The material-based actuation of these soft continuum robots is powered and controlled by structured light fields (see Fig. 1

), resulting in microrobots with many internal degrees of freedom (DOFs) 

[6]. We define a gait, in the context of the microrobotic system, as any pattern of periodic deformation that results in locomotion. The deformation and, thus, the gait of these soft microrobots can be flexibly controlled and tuned by the driving light field, enabling the adaptation of the gait to different environments [7]. Nonetheless, finding good locomotion performance by hand-tuning the light field parameters is extremely inefficient and time-consuming, especially considering that: (i) often there is no accurate model of the microrobot locomotion; (ii) each microrobot is to some extent different because of their manual fabrication process; and (iii) the response of the microrobot changes over time as the material slowly degrades. Therefore, learning effective gaits from data is an attractive approach.

Fig. 1: Soft microrobot powered and controlled by structured light field. The picture shows the robot (black with dotted outline) and linear light waves with wavelength  and duty cycle ; in this example,  pixels (about ) and . The pattern will move to the right in the next time step.

In this paper, we propose an automatic learning procedure to find the optimal gait for light-driven soft microrobots in a given environment. The optimal controller’s parameters, i.e. the parameters of the driving light field, are progressively evaluated by Bayesian Optimization (BO). Learning in this scenario presents a few peculiar challenges. First, the developed learning scheme should be highly data efficient, making use of prior information and leveraging empirical data to approach optimal locomotion performance in a limited number of experiments. At the same time, it should be robust against differences among microrobots, performing well independently of the specific microrobot used. This work represents the first attempt to learn a controller from data for sub-millimeter robots. First, we define a method to efficiently evaluate the locomotion speed from the microrobot’s tracking images. We then report a comparison of a number of BO settings (i.e. acquisition functions, kernels, hyperparameters) on a semi-synthetic dataset, which is based on actual data from microrobot experiments and augmented to model inter-microrobot variability. Finally, we demonstrate successful learning in real-robot experiments, with a locomotion performance improvement of

with respect to an informed initial guess. The experimental validation was performed using a previously untested microrobot. The paper is organized as follows. The microrobotic system is described in Sec. II, and the proposed gait learning approach is introduced in Sec. III. Section IV presents the microrobotic controller and the extraction of prior information from previously published experiments. The benchmarking of a number of BO settings on a semi-synthetic dataset is described in Sec. V, whereas Sec. VI reports successful learning in microrobot experiments.

Ii Microrobotic System

In this work, we learn optimal gaits for the soft microrobots developed in [5]. These consist of monolithic LCEs that actively and continuously deform in response to light. As the deformation occurs locally, desired motion patterns can be obtained by exciting the microrobots with proper light fields. This approach results in microrobots having many internal DOFs [6]. In other words, the light-based control allows these sub-millimeter monolithic structures to behave as if they contained many independent, wirelessly controllable actuators. The driving light fields are generated by modulating the intensity of a laser beam and projected onto the microrobots workspace by a microscope objective [5]. The microrobot is observed through the same objective by a camera that images the workspace at 10 frames per second and at a resolution of 12801024 pixel (). The laser beam (532 nm – Verdi G10, Coherent) is modulated in space and time by means of a computer-controlled digital micromirror device (DMD) module (V-7000, ViaLUX). The DMD consists of an array of 1024768 micromirrors, each representing a pixel in the projected light field. The micromirrors modulate the light intensity in a binary fashion, yet gray-scale levels control can be achieved by high-frequency pulse-width modulation (PWM). Therefore, the microrobotic system has, in principle, almost 10 control parameters that can each assume 2 values. This represents a huge parameter space for gait optimization. Learning controller parameters directly in such a high-dimensional parameter space would require a prohibitive amount of experimental data. Therefore, at this stage, we propose learning gaits by employing an effective controller structure as in [5] and tuning its critical free parameters with BO.

Iii Gait Learning Approach

Fig. 2: Learning controller parameter from data. In each iteration , the controller generates a different control input for the microrobot. The control input in our case is the dynamic light field (symbolized by the green stripes) parameterized by . The microrobot’s state trajectory during the experiment is tracked and its performance evaluated by the cost function. The resulting cost is given back to the optimizer and added to the dataset . Based on the GP posterior , the acquisition function is maximized and a new set of parameters is obtained for the next experiment.

In this section, we formulate the problem of gait learning for light-driven soft microrobots as a cost-minimization problem over a suitably parametrized open-loop controller. The gait learning algorithm is schematically represented in Fig. 2. Following the approach in [5], we use periodic light patterns with only few open parameters. The optimization then takes place in the parameter space of the light pattern. We also provide here a brief overview of BO with Gaussian Processes (GPs) [8], which represents the core of he gait learning method presented herein. BO with GPs has successfully been used, for example, for gait learning with bipedal walkers [9], quadrupedal robots [10], and in cm-scale hexapodal robots [11]. It has also been proposed for automatic feedback controller tuning [12, 13, 14]

. A major strength of GPs is that they allow one to include existing information about the system in the form of a probabilistic prior. This reduces the size of the data set needed to learn a good approximation of the cost function, while at the same time retaining the flexibility of a non-parametric model.

Iii-a Controller Learning

We formulate the problem of gait learning in soft microrobots as a parametric controller tuning problem. This work builds on the automatic controller tuning methods proposed in [12], where an unknown controller cost function was learned in a data-efficient way using BO with GPs. The microrobot is considered an uncertain, non-linear dynamic system. The dynamics of a specific microrobot in a given environment are unknown and are evaluated by carrying out experiments. The experimental observations are affected by uncertainty and noise. The general state space model of the microrobot is



is the system’s state (which, we assume, can be fully observed or estimated),

the input, and is the process noise. The input signal is generated by a controller , which is a function of a set of bounded parameters and of the desired movement reference , namely,


For gait learning, we need to evaluate how each input sequence influences the unknown system dynamics (1). For this, a cost function is used to map the controller parameters to a scalar cost value given the experimental data. Learning an optimal gait for the given objective corresponds to finding an optimal that minimizes :


Due to the process noise  in (1), however, we never directly observe but a noisy version of it, , where

is assumed zero-mean Gaussian noise with variance 


Iii-B Gaussian Process

To account for uncertainty in measurements and state estimation, as well as to formulate prior knowledge about the cost function, if such knowledge is available, the cost function is modeled as a GP. A GP can be viewed as a distribution over a space of functions. It is a non-parametric model, defined by its mean and covariance functions, and , where is the kernel function


Given a set of samples, , the posterior at is


where with , , and is the covariance matrix of the training points plus the noise variance , where is the Kronecker delta. While the prior mean is often assumed to be constant, the choice of the kernel function encodes our assumptions and prior knowledge about the cost function. The kernel function usually depends on a set of hyperparameters. Typical hyperparameters are the length scales  for each dimension of the parameter space  and the signal variance . The kernel’s length scales are related to the rate of variation of the underlying function, where short length scales correspond to quickly-varying functions and long length scales to slow-varying ones. The signal variance describes the width of the distribution and relates to the magnitude of variation around the mean. In principle, kernel hyperparameters may be obtained from problem knowledge, or estimated from data, e.g., by maximizing the marginal likelihood of the observation with respect to the hyperparameters. Here, we follow a hybrid approach by setting a prior belief over the hyperparameters (hyperprior

), and using maximum a posteriori (MAP) estimation for the likelihood, i.e., maximizing marginal likelihood times hyperprior. We can thus optimize hyperparameters of the GP after each experimental evaluation. Encoding knowledge in the form of hyperpriors facilitates hyperparameter learning when the data set is small, as is the case for the problem herein, while still retaining flexibility by adjusting hyperparameters based on observed data. In this work the hyperparameters are estimated from previous experimental results, as reported in Sec. 


Iii-C Bayesian Optimization

As the system dynamics are unknown, the gait performance, captured by the cost function (3), can only be optimized by sampling, i.e. by conducting experiments on the physical system. BO is a derivative-free global optimization algorithm, which is particularly suited for problems with costly and uncertain function evaluations [15]. Given the current belief about the cost function (i.e. the posterior GP (5)), BO suggests the parameters for the next evaluation by maximizing an acquisition function ,


. The acquisition function is formulated to capture the utility of knowing the outcome of the experiment at the suggested point, for example, the probability of improvement or some expected information gain. The concrete acquisition functions used for the gait learning problem are described in Sec. 

V-C. After evaluating , the new data point is added to , the GP posterior (5) updated, and BO enters the next iteration.

Iv Learning Gait for 1d Locomotion

In the previous section, we presented gait learning for microrobots as an optimization problem, where parameters of an open-loop controller are optimized from experimental trials via BO. In this section, we consider a concrete instance of a gait learning problem, namely learning the one-dimensional (1D) movement of cylindrical microrobots submerged in a liquid (see Fig. (a)a). In particular, the microrobots are placed on the bottom of a Petri dish (which is coated with Polydimethylsiloxane (PDMS) to avoid adhesion) and submerged in silicone oil, as in [6]. The locomotion of the microrobots occurs as a result of their periodic deformation, which is powered and controlled by the projected light fields. This section continues by describing the light patterns used to control the microrobot and their defining parameters. We then define the movement objective and the corresponding cost function used to learn the 1D gait. Lastly, we estimate the GP hyperparameters (cf. Sec. III-B).

Iv-a Light Pattern

The gait objective for this learning task is locomotion along one dimension, for which linear traveling waves of deformation are generated in the microrobot’s body. The deformation is controlled by the light field, which consists of a linear stripe pattern (see Fig. 1). The controller generates the light field with intensity , defined as a rectangular wave in the 2D workspace  of the microrobot. This light field represents the controller output from (2). As the microrobot is aligned along the -axis, the light pattern can be expressed as


The pattern is parametrized by the duty cycle , the spatial wavelength , and the time domain frequency . In the reported experiment, we set the frequency  to , whereas the parameters are to be optimized.

Iv-B Linear Movement

Swimming at the microscale is governed by the Stokes equations, where inertial forces play a negligible role with respect to viscous forces. In other words, a microrobot generating a constant propulsive force will swim at a constant speed , and its acceleration to this speed will be instantaneous. Therefore, the acceleration of the microrobot is neglected and its state space consists of two dimensions, namely the microrobot position and its speed . However, because of the control scheme described in Sec. IV-A, the position of the microrobot’s center of mass will have some oscillation at the frequency  of the traveling wave deformation (see Fig. (b)b). We are interested in optimizing the linear component of the propulsion, regardless of the oscillations caused by the exciting light field. Therefore we consider only the microrobots constant speed . As the cost in (3), we use the deviation of this constant speed component from the desired, . The estimation of from the microrobot’s position tracking is explained in the next section.

Iv-C Velocity Estimation

We estimate the velocity based on the experimental setup described in Sec. II. The position 

of the microrobot is calculated by binarizing the acquired images and calculating the center of mass of the area corresponding to the microrobot (see Fig. 

(a)a). One possible approach to obtain the underlying constant speed from the observed state trajectory consists in estimating the instantaneous velocity , by differentiating the position , and then averaging over an integer number of periods . As derivatives are highly susceptible to noise and the sampling frequency is close to the oscillation frequency, we chose to use a different approach. We estimate the average speed directly from the position by fitting the phenomenological model


This model captures the two types of movement of the soft microrobots, namely the linear movement at constant speed  and the oscillations at frequency . The oscillation term is defined by the known frequency , and the unknown amplitude  and phase shift . The linear term includes the offset , which accounts for changes in the position that occur during the transient (cf. Fig. (b)b). In previous experiment we observed the presence of a transient, which is not due to acceleration but to the initial response time of the material with respect to the start of the light projection. As we have no phenomenological model of the transient, but previous experiments have shown that it decayed after about two seconds, we only fit the model starting at . Fig. (b)b shows the measurement for one experiment as well as the fitted movement model. The constant speed is the slope of the linear part represented by the solid line. The speed estimation procedure also allows for quantifying the uncertainty of the velocity estimate, which we did not use, however, for the experiments herein.

(a) Capture of a microrobot
(b) Average speed estimation
Fig. 5: Velocity estimation. (a) The microrobot during operation from above as seen from the camera. The position is determined by the center of mass (yellow star) of the area enclosed by the estimated edges (yellow line). (b) The position of the microrobot during the experiment is shown by the black dots. The dashed blue line shows the fitted model. The solid blue line represents the linear part of the movement. Its slope is the average speed .

Iv-D Hyperparameter Estimation

Estimates of the hyperparameters are obtained from the experimental data reported in [6]. For this, the raw data from these experiments are re-processed to estimate the speed according to the model (8). These data are then used to estimate the boundaries of the parameter space and the prior for the cost function hyperparameters, namely, the constant mean , the signal variance  for the kernel, and the noise variance . The kernel length scales are more difficult to estimate with the given data and are instead set to one fourth of the total range for each parameter. To account for variability in the microrobot size, the estimated speed is normalized on the bodylength, as common in microrobotics, and is expressed in bodylength percent per second (). The fastest speed observed in these experiments, according to the velocity estimation described in Sec. IV-C is , obtained with a duty cycle and a wavelength

. From the measurements, we estimate the signal standard deviation 

to be and the noise standard deviation  to be . Physically reasonable ranges for the parameter space are a duty cycle comprised between and and a wavelength ranging from to  pixels in the camera frame (corresponding to to

). As prior distributions for each hyperparameter, we use Gaussian distributions with the corresponding estimated value as mean and one fourth of the estimate as standard deviation.

V Semi-Synthetic Benchmark

For BO to work well in practice, we need to appropriately choose some settings, in particular, the kernel and the acquisition function. Appropriate choices for these should allow for generalization to different microrobots, while retaining data-efficiency. To determine a good setup in a systematic way, we consider a semi-synthetic cost function and benchmark different settings. In this section, we first describe how the semi-synthetic cost function is generated, then we motivate the considered design choices for the GP and BO, and finally we present the benchmarking results.

V-a Creating a Semi-Synthetic Cost Function

As the systems dynamics (1

) are unknown, we cannot properly benchmark the aforementioned BO settings on a pure simulation. Furthermore, a good performance on a fully synthetic optimization test function will not necessarily translate to real-world performance gains. We thus propose to sample from the real system on a coarse grid, and then interpolate the data at unobserved locations to create a semi-synthetic cost function, which has similar properties to the real one, while being cheaper to evaluate. To collect the necessary data, we defined a grid on the parameter space and used BO with a standard kernel to sample from this grid. Specifically, we collected 56 cost function evaluations by running BO with a squared exponential (SE) kernel and parameters defined in 

IV-D. The observations were conducted with the microrobotic system described in Sec. II and with the experimental conditions reported in Sec. IV. These are the same conditions as used in [6], from which we estimated the prior information. We use a different microrobot which results in a slightly different locomotion behavior. For each BO run, we generate different semi-synthetic cost functions by considering the intrinsic noise of the measurements and re-sampling each data point. This assures that the cost function, including the location of the optimum, is different for each run, while retaining the overall ‘shape’ to some extent. As not all points on the grid were actually evaluated in experiments, the missing data points were generated by linear interpolation. The resulting data was smoothed with a 33 spatial mean filter. To obtain the final continuous cost function, the discrete data set was further interpolated using cubic splines. Figure 6 shows one sample from the possible benchmark functions.

Fig. 6: Contour plot of a random sample of the semi-synthetic cost function. Dots are locations of measurements from a real microrobot.

V-B Gaussian Process Priors

In this section, we describe the choice of the prior mean, signal variance, and kernel function, to model the semi-synthetic cost function. We distinguish between two kinds of priors: optimistic and pessimistic. An optimistic prior assumes good performance (i.e. low cost values) at unobserved locations, while a pessimistic prior assumes the opposite. Since we terminate BO after a small number of experiments, the choice of the prior has an impact on the search: An optimistic prior encourages an extensive exploration, while a pessimistic prior explores slower.

V-B1 Mean

As prior mean function we use a constant non-zero mean. For a cost function that is close to zero at its minimum, a mean of zero would be optimistic. If the real cost function has in fact a higher mean, a zero mean prior would make the optimizer believe that it has initially evaluated bad regions of the parameter space and therefore keep exploring new regions. This leads to an undesirably high amount of exploration at the corners and edges of the feasible parameter space. With a more pessimistic prior mean (e.g. at approximately the standard deviation of the signal, ) the belief is that most controllers perform badly. We found that this yields a much better exploration-exploitation trade-off. All settings presented herein use a pessimistic mean of as prior mean.

V-B2 Signal Variance

With a pessimistic mean , we can readjust the estimated value for the signal variance from the initial estimate used to find such that the lowest possible cost, which is by definition, is either inside or outside the confidence interval of the prior. We call the first case an optimistic setting and the second a pessimistic one. During the benchmark we compared an optimistic signal variance, such that , to a pessimistic one, with , to evaluate the effect of this setting on the BO performance.

V-B3 Kernel

The kernel is the main object in the GP that encodes prior knowledge. We compare four standard kernels and one composite kernel. The SE kernel is one of the most frequently used kernels for GP regression [8]. We use SE with automatic relevance determination (ARD), i.e. we have different length scales for each dimension. Another common option is the Rational Quadratic (RQ) kernel with ARD [8]. This kernel can be seen as a weighted sum over SE kernels of all lengthscales. The SE and RQ kernels both assume very smooth functions [8]. While the smoothness assumption of the SE and RQ kernels is too restrictive for most practical optimization problems (e.g. physical systems), the Matern kernel is known to be a much better choice [8, 16]. We compare two versions of the -Matern (M) kernel with ARD, (where ) and (where ). The property of the Matern kernel determines the smoothness of modeled functions, where a higher yields smoother functions. We also adopt a composite kernel, which we term , having one component that captures changes of the cost function at a short length scale, and one smoother component for variations at a longer scale:


The sum of two kernels is again a valid kernel, [8]. Combining simple kernels, e.g., with different length scales, is a powerful way to build more expressive models.

V-C Acquisition Functions

As mentioned in Sec. III-C, the adopted acquisition function determines the behavior and performance of BO. An acquisition function evaluates the utility of a candidate for sampling. Here a candidate is a parameter for the controller in the feasible parameter space. BO then chooses as sampling location the candidate with the maximum utility. Here we test the following acquisition functions. Probability of Improvement (PI) [17] is the probability that the candidates cost value is below some threshold . Here is the smallest cost as predicted by the posterior mean and a constant factor used to trade off exploration vs. exploitation. We set to an optimistic value of , expecting that the posterior is pessimistic, as its prior is a pessimistic mean. Expected Improvement (EI) [18] is the expected value of improvement over with an optimistic of . In contrast to PI, EI takes into account not only the probability of an improvement in the cost function at a candidate position, but also the expected magnitude of such improvement. Entropy Search (ES) [19] evaluates the expected information gain (change in entropy) about the location of the minimum at the candidate. As such, ES is not directly concerned about sampling at the location of the minimum, but about samples that maximize the information gain about the minimum location.

V-D Results

fixed Hyp. learned Hyp. fixed Hyp. learned Hyp. fixed Hyp. learned Hyp. fixed Hyp.
SE 11.8 (54.3) 10.7 (43.8) 11.4 (53.2) 14.9 (59.7) 9.9 (47.5) 13.8 (49.4) 11.7 (51.1)
RQ 5.5 (35.3) 4.7 (27.6) 7.5 (37.8) 7.3 (46.2) 7.3 (42.8) 10.1 (66.1) 6.0 (46.1)
M32 5.8 (37.7) 3.3 (33.2) 4.1 (33.6) 4.4 (36.2) 5.8 (51.2) 7.4 (66.2) 7.5 (43.9)
M52 3.5 (29.2) 4.2 (33.9) 20.0 (62.4) 4.5 (37.6) 13.5 (71.8) 8.0 (53.3) 7.9 (47.7)
2Mat 3.4 (23.0) 3.5 (26.7) 5.4 (43.8) 7.0 (38.6) 6.8 (41.1) 5.6 (33.4) -
TABLE I: Benchmarking on a semi-synthetic function: Overall best in bold, highlighted benchmarks are detailed in Fig. (a)a;
median (95-percentile) over 200 runs of the normalized regret  (%) after 20 iterations.

The results for BO on the semi-synthetic cost functions serve to evaluate the relative performances of different design choices. While we do not necessarily expect to achieve the exact same absolute performance on a real microrobot, the comparison gives a good indicator on which setting to prefer, when learning microrobot gaits. All experiments start at the same location, i.e. the initial controller found in Sec. IV-D (, ). For EI we tried both an optimistic and a pessimistic signal variance, respectively  and  (see above), whereas PI and ES were tested only with the optimistic signal variance . Each design choice of the BO was run on 200 different semi-synthetic cost function. Each BO run consisted of 20 iterations on one cost function, representing 20 experimental observations on the same system (the experimental budget). For each iteration, we calculated the normalized regret after each iteration defined as


where is the best parameter set as predicted by the GPs posterior mean, and is the parameter set at the true optimum. The results are summarized in Table I, which shows the median over the 200 runs of the normalized regret after 20 iterations. The number in brackets represent the 95-percentile over the 200 runs of the normalized regret after 20 iterations; that is, the value that contains the best of cases (in other words, of the cases performed better than this value and only worse). This metric, which quantifies the robustness of the specific learning scheme, is arguably the most important in practice, as in real experiments, the gait shall usually only be learned once for each microrobot. Numbers in bold represent the best performances achieved for the given design choices with and without hyperparameter optimization. The settings plotted in Fig. 9 are filled grey.

(a) Median regret over BO iteration
(b) Histogram of regret after 20 BO iterations
Fig. 9: (a) Comparison of the median over 200 run of the normalized regret for selected design choices at each iteration. (b) Histogram (normalized frequency) for SE EI fixed in yellow and for 2Mat EI learned in blue.

One can see from the results that the SE kernel is (with one exception) outperformed by all other kernels. The Matern- kernel and the sum of two Matern kernels of different length scales always perform reasonably well. It can also be observed that MAP hyperparameter learning during BO does not lead to consistently better results. Hyperparameter optimization can lead to inferior performance when directly compared to the same settings with fixed hyperparameters. Optimizing the hyperparameter does however lead to better performance in cases where the set of fixed hyperparameters are performing relatively poorly. Comparing design choices where the hyperparameter were seemingly off (M52, EI, ), learning the parameter helps significantly. This is especially relevant for the performance in terms of the 95-percentile. The best performing acquisition function was EI. We did, however, not test ES with hyperparameter optimization. Figure (a)a shows the comparison of a selection of design choices. Examining the regret of the settings with the worst performance after 20 iterations, one can see it decreases cost faster and then flattens out after six to eight iterations. We have observed that, after finding a local optimum, exploration stopped until the experimental budget was exhausted. Hyperparameter optimization keeps the steep decrease in cost during the first iterations, but does not get stuck in a local optimum. The histogram in Fig. (b)b shows the distribution of cost for the blue and the yellow curves from Fig. (a)a. Not only is the median improved, but the spread is also reduced significantly. The above comparison among different BO design choices allows us to evaluate the best setting to optimize the gait of a new, potentially different microrobot. As discussed in Sec. I, the learning scheme of choice has to be robust with respect to differences in microrobots and be highly data efficient, such that a good controller is found after 20 iterations. As it can be observed in Table I, the 2Mat kernel with EI and the optimistic signal variance  is expected to lead to substantial improvement in the locomotion performance, and to do so consistently. As learning hyperparameters generally makes the learning scheme more robust, we selected the 2Mat EI  learned Hyp. configuration for gait learning.

Vi Experimental Results

A new microrobot is used for the experimental validation of the developed learning control framework. As each microrobot differs slightly, the optimal gait parameters are also expected to be different. Without learning the specific cost function of the new microrobot the best known controller parameters are the ones identified in Sec. IV-D, namely and . Starting from this initial controller, we performed BO with an experimental budget of 20 samples. The learned cost function over the parameter space is shown in Fig. 10. The initial performance was then compared to the performance at the parameter sets with lowest posterior mean () and with the lowest observed cost (). The lowest-posterior-mean parameters were and , whereas the lowest-observed-cost ones were and . Fig. 11 shows the obtained values of the cost function . Notably, we were able to reduce the cost by  and  in the lowest-posterior-mean and lowest-observed-cost cases, respectively. Correspondigly, the locomotion performance, i.e. the average speed , was increased by  () and by  () with respect to the initial controller parameters obtained by previous experiments ().

Fig. 10: The posterior mean of the learned cost function in the parameter space. The blue dots show the 20 sample locations.
Fig. 11: The cost for the initial, posterior and the best observed controller parameters and their standard deviation based on three evaluations at each parameters set.

Vii Conclusion

This paper is the first to investigate and demonstrate the potential of probabilistic learning of optimal gaits for sub-millimeter robots. In particular, the proposed learning scheme based on BO with GPs proved to be highly data-efficient and robust against differences among microrobots. These features are essential for light-driven soft microrobots, which can have a relatively high variability and a limited lifetime under continuous excitation. The proposed BO learning approach is general and could be applied to find optimal controller parameters for other microrobotic systems. The method we used to develop the learning scheme for gait optimization is based on the definition of a semi-synthetic benchmark function. This approach allowed us to assess the robustness of different learning schemes against the inter-microrobot variability, and to select the scheme with the highest chance to provide substantial improvement in the locomotion performance of a slightly different microrobot. This was verified in an actual microrobot experiment. By using a different microrobot for the semi-synthetic benchmark and the validation experiment, we show that our method can handle differences between microrobots. Learning control is an important step toward building a complete robotic system based on light-driven soft microrobots. Still, there are many more interesting challenges ahead. Future research directions are the design of problem-specific kernels, as well as the learning of more complex gaits in higher dimensional parameter spaces to take advantage of the microrobot’s many DOFs [6]. This is relevant, for example, for planar locomotion using disk-shaped microrobots [5]. In addition, extending BO to learn time-varying cost functions is also particularly interesting, as it will enable microrobots to adapt to degrading material properties and, more importantly, to changes in their environment.


The authors thank Dr. Hao Zeng, Dr. Camilla Parmeggiani, Dr. Daniele Martella, and Prof. Diederik S. Wiersma for providing the LCE samples for the microrobots.


  • [1] A. Mourran, H. Zhang, R. Vinokur, and M. Möller, “Soft microrobots employing nonequilibrium actuation via plasmonic heating,” Advanced Materials, vol. 29, no. 2, 2017.
  • [2] H.-W. Huang, M. S. Sakar, A. J. Petruska, S. Pane, and B. J. Nelson, “Soft micromachines with programmable motility and morphology,” Nature Communications, vol. 7, 2016.
  • [3] J. C. Breger, C. Yoon, R. Xiao, H. R. Kwag, M. O. Wang, J. P. Fisher, T. D. Nguyen, and D. H. Gracias, “Self-folding thermo-magnetically responsive soft microgrippers,” ACS Applied Materials & Interfaces, vol. 7, no. 5, pp. 3398–3405, 2015.
  • [4] H. Zeng, P. Wasylczyk, C. Parmeggiani, D. Martella, M. Burresi, and D. S. Wiersma, “Light-fueled microscopic walkers,” Advanced Materials, vol. 27, no. 26, pp. 3883–3887, 2015.
  • [5] S. Palagi, A. G. Mark, S. Y. Reigh, K. Melde, T. Qiu, H. Zeng, C. Parmeggiani, D. Martella, A. Sanchez-Castillo, N. Kapernaum, F. Giesselmann, D. S. Wiersma, E. Lauga, and P. Fischer, “Structured light enables biomimetic swimming and versatile locomotion of photoresponsive soft microrobots,” Nature Materials, vol. 15, no. 6, pp. 647–653, 2016.
  • [6] S. Palagi, A. G. Mark, K. Melde, H. Zeng, C. Parmeggiani, D. Martella, D. S. Wiersma, and P. Fischer, “Soft continuous microrobots with multiple intrinsic degrees of freedom,” in Int. Conf. on Manipulation, Automation and Robotics at Small Scales, July 2016, pp. 1–5.
  • [7] S. Palagi, A. G. Mark, K. Melde, T. Qiu, H. Zeng, C. Parmeggiani, D. Martella, D. S. Wiersma, and P. Fischer, “Locomotion of light-driven soft microrobots through a hydrogel via local melting,” in Int. Conf. on Manipulation, Automation and Robotics at Small Scales, July 2017, pp. 1–5.
  • [8] C. Rasmussen and C. Williams,

    Gaussian Processes for Machine Learning

    .   MIT Press, 2006.
  • [9] R. Calandra, A. Seyfarth, J. Peters, and M. P. Deisenroth, “Bayesian optimization for learning gaits under uncertainty,”

    Annals of Mathematics and Artificial Intelligence

    , vol. 76, no. 1, pp. 5–23, Feb 2016.
  • [10] D. J. Lizotte, T. Wang, M. H. Bowling, and D. Schuurmans, “Automatic gait optimization with Gaussian process regression,” in Int. Joint Conf. on Artifical intelligence, Jan. 2007, pp. 944–949.
  • [11] B. Yang, G. Wang, R. Calandra, D. Contreras, S. Levine, and K. S. J. Pister, “Learning flexible and reusable locomotion primitives for a microrobot,” IEEE Robotics and Automation Letters, 2018.
  • [12] A. Marco, P. Hennig, J. Bohg, S. Schaal, and S. Trimpe, “Automatic LQR tuning based on Gaussian process global optimization,” in IEEE Int. Conf. on Robotics and Automation, May 2016, pp. 270–277.
  • [13] F. Berkenkamp, A. P. Schoellig, and A. Krause, “Safe controller optimization for quadrotors with gaussian processes,” in IEEE Int. Conf. on Robotics and Automation, May 2016, pp. 491–496.
  • [14]

    A. Marco, F. Berkenkamp, P. Hennig, A. P. Schoellig, A. Krause, S. Schaal, and S. Trimpe, “Virtual vs. Real: Trading off simulations and physical experiments in reinforcement learning with Bayesian optimization,” in

    IEEE Int. Conf. on Robotics and Automation, May 2017, pp. 1557–1563.
  • [15] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas, “Taking the human out of the loop: A review of Bayesian optimization,” Proc. of the IEEE, vol. 104, no. 1, pp. 148–175, Jan. 2016.
  • [16] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian optimization of machine learning algorithms,” in Advances in Neural Information Processing Systems, 2012, pp. 2951–2959.
  • [17] D. J. Lizotte, “Practical Bayesian optimization,” Ph.D. dissertation, University of Alberta, Edmonton, Canada, 2008.
  • [18] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimization of expensive black-box functions,” Journal of Global Optimization, vol. 13, no. 4, pp. 455–492, 1998.
  • [19] P. Hennig and C. J. Schuler, “Entropy search for information-efficient global optimization,” Journal of Machine Learning Research, vol. 13, pp. 1809–1837, June 2012.