Paper code for `Data-efficient Learning of Morphology and Controller for a Microrobot` (ICRA 2019)
Robot design is often a slow and difficult process requiring the iterative construction and testing of prototypes, with the goal of sequentially optimizing the design. For most robots, this process is further complicated by the need, when validating the capabilities of the hardware to solve the desired task, to already have an appropriate controller, which is in turn designed and tuned for the specific hardware. In this paper, we propose a novel approach, HPC-BBO, to efficiently and automatically design hardware configurations, and evaluate them by also automatically tuning the corresponding controller. HPC-BBO is based on a hierarchical Bayesian optimization process which iteratively optimizes morphology configurations (based on the performance of the previous designs during the controller learning process) and subsequently learns the corresponding controllers (exploiting the knowledge collected from optimizing for previous morphologies). Moreover, HPC-BBO can select a "batch" of multiple morphology designs at once, thus parallelizing hardware validation and reducing the number of time-consuming production cycles. We validate HPC-BBO on the design of the morphology and controller for a simulated 6-legged microrobot. Experimental results show that HPC-BBO outperforms multiple competitive baselines, and yields a 360% reduction in production cycles over standard Bayesian optimization, thus reducing the hypothetical manufacturing time of our microrobot from 21 to 4 months.
Designing intelligent robots to solve complex real-world tasks can be a daunting challenge. The dominant paradigm is based on the flawed assumption that the morphology of a robot (i.e., the hardware) can to a large extent be designed independently of the underlying controllers. In practice, this results in either designing general-purpose morphologies which can in theory solve a wide range of tasks, at the expense of being sub-optimal for any specific one, or in using an iterative process where each morphology design is followed by the design of an appropriate controller, with modifications (based on expert knowledge) to the previous morphology design to improve the chances of achieving a better controller at the next iteration. Both of these approaches usually require a significant amount of expert knowledge which heavily influences the ultimate performance of the system. Moreover, this process often requires a significant amount of design and manufacturing time for each morphology and controller.
One alternative to this paradigm is to autonomously optimize both morphology and controller, based directly on the performance achieved on the specific application of interest. This paradigm is conceptually similar to the evolutionary theory of Lamarckian inheritance [1, 2], where the physical features of a species are directly guided through the generation by the “usefulness” of the features to the specific environment inhabited. Although this evocative idea has already been proposed in past robotic literature [3, 4], the proposed approaches are difficult to apply to real-world applications due to the need for accurate analytical models, or a high number of evaluations to find good morphology/controller configurations.
The motivating application of this paper is the design of a hexapod microrobot and its corresponding controller. Due to the lack of sufficiently accurate models at the micro-scale, previous efforts have proven dependent on expert knowledge. This robot is printed on a silicon wafer, an expensive process with long production times (wafer delivery takes approximately 4-6 weeks after ordering). With the current manufacturing process, each wafer has sufficient space to contain up to 5 printed robots. After printing and assembly, it is still necessary to design a controller for the microrobot, another design process largely guided by human experience. Moreover, it is important to design this controller within a small number of experiments, before the robot wears out.
As a result of these real-world manufacturing and design constraints, we focus in this paper on the need for an efficient optimization process, able to find suitable designs within an extremely limited number of iterations. Hence, we investigate the following question: How can we design an algorithm for morphology/controller adaptation that is sufficiently data-efficient to be applicable to real-world robots?
Our contribution is two-fold: First, we propose a new optimization algorithm for co-optimization of robot morphology and controller in a data-efficient manner; Second, we validate this approach in simulation and show that it outperforms other state-of-the-art learning methods. Our method exploits the hierarchical relationship between morphology and controller to produce optimal robots despite limited wall time. A sketch of this process is shown in Figure 1. To demonstrate our method, we optimize the design and controller of a recently developed hexapod microrobot. We use a simulation of this microrobot to validate our approach, but our method is not dependent on such a simulation existing. Indeed, the data-efficient nature of our approach allows application to the real-world microrobot. Our approach is, without loss of generality, applicable to other robotic applications that benefit from joint morphology/controller optimization. This is an exciting frontier towards enabling real-world robots that can quickly adapt both morphology and controller to perform specific tasks with high performance.
Bayesian optimization has been used to optimize both the design of mechanical systems and control policies [7, 8, 9], including those for microrobots. Our work differs from these previous works because we jointly optimize hardware design and control policies. While control policies are relatively cheap to optimize, new hardware designs incur substantial costs in both money and time. We propose a hierarchical batch contextual Bayesian optimization approach which identifies multiple promising hardware candidates at once (i.e., to be printed on the same silicon wafer) and leverages past evaluations when optimizing the control policy. Our approach yields significant performance improvements compared to standard Bayesian optimization, as demonstrated in Section VI.
Our work is related to efforts to optimize controls for robots with non-fixed or reconfigurable morphologies [11, 12, 13, 14, 15], where the controller must handle changes in the robot’s physical configuration, either to cope with unanticipated damage [13, 14] or to prepare for different tasks. Notably, one such work jointly evolves both morphological and gait parameters on a physical robot able to change its leg lengths dynamically. Application of this method is restricted to robots able to rapidly, repeatedly, and precisely alter their own morphology. This kind of online body modification is impossible for our microrobot because each change requires fabrication and assembly of a new robot.
Other works allow for changes in morphology between iterations, but not dynamic adjustments (i.e., the robot cannot adjust its morphology during simulation or in the real world). One such work includes muscle routing in its optimization of gaits of bipedal creatures, allowing variation of muscle attachment points within bounded regions. Although it produces highly natural-looking gaits, it is unclear how muscle routing translates to a robot without an analogous concept of muscles. Furthermore, such aesthetic virtues are not relevant to our robot. The detailed modeling of muscles and ligaments also entails a massive number of parameters to optimize (up to thirty parameters for muscle physiology alone) and a resultant increase in optimization time. In contrast, we only have 3 hardware parameters to optimize.
Substantial work in evolutionary robotics has focused on jointly evolving morphology and controls of virtual lifeforms [18, 3, 19, 20]. However, evolving body plans from scratch rather than tuning existing designs increases optimization time substantially. Such works also often blur the lines between morphology and control parameters, which is disadvantageous when morphological changes are orders of magnitude more costly to evaluate. Our method produces highly performing robots with far fewer morphological changes when compared to evolutionary techniques.
Two prior works optimize morphology and controls simultaneously but do not exploit the hierarchical relationship between morphology and control. Such optimization often leads to convergence of morphology before convergence of control [23, 24] and fails to adequately explore the space of morphology parameters. Anecdotally, we observed this in experiments using regular Bayesian optimization, which frequently converged to a poorly performing morphology where only the front four legs touched the ground.
One proposed explanation suggests that morphology mediates the role of the controller by functioning like an interface to the real world. Simultaneously changing morphology and control parameters is therefore counterproductive because each controller is specialized for some particular morphology. Early convergence of morphology is a consequence of heavy penalization of controller changes, since updating body plans negatively affects the performance of a controller optimized for a different body plan. In this vein, one prior work conducts policy search for several hardware schemes to learn optimal control-hardware combinations, running reinforcement learning for each hardware design explored. This work is most similar to ours because it jointly optimizes morphology and controls, but does not do so simultaneously and thereby avoids the early convergence problem. However, all the morphologies were bio-inspired and designed in advance by hand, whereas we include a large design space of morphologies in our optimization. Engineering morphologies by hand for our robot is notably harder because of a novel leg design that precludes straightforward transfer of other work [26, 27] in legged locomotion.
We formulate learning the morphology and controller of our microrobot as the optimization
$$\theta^* = \arg\max_{\theta} f(\theta)$$
of the parameters $\theta = [\theta_m, \theta_c]$, where $\theta_m$ denotes the parameters of the morphology and $\theta_c$ the parameters of the controller, w.r.t. the desired objective function $f$.
Although this problem can be solved as a single joint optimization task, changing the morphology at each optimization step is extremely costly for our microrobot as each fabrication takes up to a month. Hence, it is crucial to minimize the number of morphology evaluations. On the other hand, once a morphology is available we can perform hundreds of controller evaluations at little cost. This difference in evaluation cost already suggests that the formulation as a single joint optimization might not be desirable.
An alternative formulation is as a hierarchical optimization task with two independent levels of optimization, morphology optimization on top and controller below, where we alternate between selecting a morphology and optimizing the corresponding controller. This formulation offers a natural way of decoupling the number of evaluations performed on the controller from the evaluations of the morphology. A further improvement on this formulation is to consider the batch nature of the morphology evaluations for our application, selecting multiple morphologies to be manufactured (and later evaluated) at once. One drawback of this hierarchical formulation is that decoupling the two levels of optimization prevents information sharing between them. In practice, this means each controller optimization process cannot make use of information provided by previous controller optimizations, and thus needs to start from scratch.
Our approach, presented in Section V, extends the hierarchical batch formulation and makes full use of data collected from previous controller optimizations, thus further improving data-efficiency.
Central pattern generators (CPGs) are neural circuits commonly found in vertebrates that do not need sensory input to produce periodic outputs. They have been used widely in the design of gaits for robotic locomotion [29, 30, 31]. We chose to use CPGs for our controller; prior work discusses why CPGs are a good choice of controller for microrobots. The dynamics of CPGs are modeled as a network of coupled non-linear oscillators, and we refer readers to the CPG literature for an in-depth explanation of how these oscillators work.
A major benefit of using CPGs for our controller is the low number of parameters to optimize. Usually, the parameters optimized are the desired frequency of the oscillators, the phase difference between each vertical-horizontal oscillator pair, and the amplitudes of the left and right side oscillators, which allow for directional control of the microrobot.
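To make the role of these parameters concrete, the following is a minimal sketch of an open-loop gait generator driven by a frequency, a vertical-horizontal phase difference, and separate left and right amplitudes. It is a simplification, not the coupled non-linear oscillator network used in the paper: the function name `cpg_targets`, the leg indexing, and the fixed tripod phase offsets are all assumptions made for illustration.

```python
import math

def cpg_targets(t, freq, phase, amp_left, amp_right):
    """Return (vertical, horizontal) motor targets for six legs at time t.

    Assumed leg indexing: leg = 2 * row + side, with side 0 = left, 1 = right.
    Assumed tripod gait: the two tripods run half a cycle apart.
    """
    targets = []
    for leg in range(6):
        row, side = divmod(leg, 2)
        amp = amp_left if side == 0 else amp_right
        tripod_offset = math.pi if (row + side) % 2 == 1 else 0.0
        angle = 2.0 * math.pi * freq * t + tripod_offset
        vertical = amp * math.sin(angle)
        horizontal = amp * math.sin(angle + phase)  # phase shifts horizontal vs. vertical
        targets.append((vertical, horizontal))
    return targets

# With a quarter-cycle phase difference each foot traces a circle of radius amp.
targets = cpg_targets(t=0.13, freq=2.0, phase=math.pi / 2, amp_left=1.0, amp_right=0.5)
```

Setting `amp_left` and `amp_right` to different values shortens the stride on one side, which is how unequal amplitudes can steer the robot in this simplified model.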
One method often used to automate the parameter tuning process is Bayesian optimization (BO). BO is a zero-order black-box optimizer often used for global optimization of expensive functions [32, 33]. At every iteration of the optimization, BO learns a model from the dataset of previously evaluated parameters and their returned objective values. The learned model is then used to execute a virtual optimization by using an acquisition function which controls the trade-off between exploitation and exploration. The returned parameters from this optimized model are then evaluated on the real system to obtain an objective value. Finally, the parameters evaluated and the corresponding objective value obtained from the real system are added to the dataset, and a new iteration of the optimization begins. The choice of model is important for BO to learn the underlying objective. One commonly used model, and the one which we use in this paper, is the Gaussian process (GP) model. For a more in-depth background on BO, we direct readers to [33, 32, 35].
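The loop described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: it uses a hand-written zero-mean GP with a squared-exponential kernel, a UCB acquisition maximized over a fixed candidate grid, and a hypothetical one-dimensional `toy_objective` standing in for the real system.

```python
import numpy as np

def rbf(A, B, length=0.2):
    """Squared-exponential kernel between two sets of row vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xq, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at queries Xq."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)
    mean = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def bayes_opt(objective, n_init=3, n_iter=15, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)[:, None]        # candidate parameters
    X = rng.uniform(0.0, 1.0, size=(n_init, 1))       # random initial dataset
    y = np.array([objective(x[0]) for x in X])
    for _ in range(n_iter):
        # Fit the model on all data so far (rewards centered for the zero-mean GP).
        mean, std = gp_posterior(X, y - y.mean(), grid)
        # "Virtual optimization": maximize the UCB acquisition on the model.
        x_next = grid[int(np.argmax(mean + beta * std))]
        # Evaluate on the (toy) system and add the result to the dataset.
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next[0]))
    best = int(np.argmax(y))
    return X[best, 0], y[best]

def toy_objective(x):
    """Hypothetical stand-in for an expensive evaluation; maximum at x = 0.7."""
    return -(x - 0.7) ** 2

x_best, y_best = bayes_opt(toy_objective)
```

The parameter `beta` sets the exploration-exploitation trade-off: larger values favor uncertain regions, smaller values favor regions the model already believes are good.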
An extension of standard BO we use as a subcomponent is contextual Bayesian optimization (cBO). cBO extends the standard BO framework by augmenting the optimization problem with an additional context parameter, and learns a joint policy over contexts and parameters, where the context is fixed during optimization (i.e., it is observable, but not controllable). In our approach, we use cBO to optimize the controller, as described in Section V. By encoding the morphologies as contexts, cBO takes advantage of the similarities between different morphologies and generalizes to good policies for unseen designs faster.
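The generalization across contexts can be illustrated with a small sketch. It assumes a single GP over the concatenated input (context, controller parameter), a scalar context, and a hypothetical `toy_reward` whose best controller value equals the context; none of these choices come from the paper. The acquisition is maximized over the controller parameter only, with the context held fixed.

```python
import numpy as np

def rbf(A, B, length=0.2):
    """Squared-exponential kernel between two sets of row vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xq, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at queries Xq."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)
    mean = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_reward(context, theta):
    """Hypothetical objective: the best controller value equals the context."""
    return -(theta - context) ** 2

def propose(context, X, y, thetas, beta):
    """Maximize UCB over theta only; the context is observed but not controlled."""
    query = np.column_stack([np.full(len(thetas), context), thetas])
    mean, std = gp_posterior(X, y - y.mean(), query)
    return thetas[int(np.argmax(mean + beta * std))]

thetas = np.linspace(0.0, 1.0, 101)

# Stand-in for controller evaluations gathered under two earlier contexts.
past = [(c, t) for c in (0.3, 0.7) for t in np.linspace(0.0, 1.0, 11)]
X = np.array(past)
y = np.array([toy_reward(c, t) for c, t in past])

# Before any evaluation in the unseen context 0.5, the joint model already
# suggests a sensible controller (beta=0 means pure exploitation).
first_guess = propose(0.5, X, y, thetas, beta=0.0)

# Continue optimizing in the new context with UCB, reusing all past data.
for _ in range(5):
    theta = propose(0.5, X, y, thetas, beta=2.0)
    X = np.vstack([X, [0.5, theta]])
    y = np.append(y, toy_reward(0.5, theta))
```

A non-contextual optimizer would start the new context with an empty dataset, whereas here the first proposal already benefits from the evaluations made under neighboring contexts.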
Within our approach, we also use another variant of BO called batch Bayesian optimization (BBO) to optimize the morphology. In contrast to a fully sequential algorithm, which alternates between choosing individual points and evaluating them on the true reward function, BBO queries the acquisition function for multiple points, then evaluates them in parallel before selecting another batch. The first set of parameters of each batch is selected as in a sequential policy and the next set is chosen with an acquisition function. However, rather than immediately evaluating the returned parameters on the real system, BBO defers evaluation of the reward function on this point until the entire batch is selected, temporarily substituting for its reward a prediction made by the GP (also called hallucinated observations or fantasies). The GP model is updated with the data point of returned parameters and respective hallucinated observation and is then used to select the next point. Once the entire batch is selected, all points are evaluated and the hallucinated observations are replaced by real ones. Although hallucinated rewards are less informative, batching saves time since points in the batch can be evaluated in parallel. The particular implementation of BBO we use as a subcomponent in our approach is PC-BBO.
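The hallucination step can be sketched as follows. This is a generic illustration of batch selection with hallucinated observations, not the PC-BBO implementation: the GP, the UCB acquisition, the candidate grid, and the toy data are assumptions, and the mask that prevents picking the same candidate twice is a safeguard added here for illustration.

```python
import numpy as np

def rbf(A, B, length=0.2):
    """Squared-exponential kernel between two sets of row vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xq, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at queries Xq."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)
    mean = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def select_batch(X, y, grid, batch_size, beta=2.0):
    """Pick batch_size candidates without evaluating the real system."""
    X_h, y_h = X.copy(), y - y.mean()          # working copies; real data untouched
    taken = np.zeros(len(grid), dtype=bool)
    chosen = []
    for _ in range(batch_size):
        mean, std = gp_posterior(X_h, y_h, grid)
        ucb = mean + beta * std
        ucb[taken] = -np.inf                   # added guard: no duplicate candidates
        i = int(np.argmax(ucb))
        taken[i] = True
        chosen.append(i)
        # Hallucinate: pretend the GP's own prediction was observed at this point,
        # which shrinks the uncertainty there and pushes the next pick elsewhere.
        X_h = np.vstack([X_h, grid[i]])
        y_h = np.append(y_h, mean[i])
    return grid[chosen]

grid = np.linspace(0.0, 1.0, 101)[:, None]
X = np.array([[0.1], [0.5], [0.9]])            # toy evaluated parameters
y = np.array([-0.36, -0.04, -0.04])            # toy rewards
batch = select_batch(X, y, grid, batch_size=5)
```

After the batch has been evaluated on the real system, the real rewards (not the hallucinated ones) would be appended to `X` and `y` before the next call.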
Our algorithm is called hierarchical process constrained batch Bayesian optimization (HPC-BBO). In this context, “process-constrained” refers not to classical constrained optimization, but rather to a physical limitation that restricts how frequently a particular parameter can be changed. In our case, it would be relatively straightforward to change control parameters (unconstrained) on a physical microrobot, compared to fabricating a new hexapod to test a different morphology (constrained).
We now detail Algorithm 1. Given a set of morphology parameters and a set of controller parameters, we want to jointly optimize them. We use BBO to select the morphology parameters and evaluate them through a nested optimization procedure for the controller parameters. At the end of the controller optimization, we return to the morphology optimizer the best reward obtained for that specific morphology. We initialize the first batch by picking morphology parameters with random search. In the contextual case, for each element of the current batch, we set the morphology parameters as a context and perform contextual Bayesian optimization over the controller parameters, using a single GP model defined jointly over the morphology context and the controller parameters. The noncontextual variant of our algorithm uses standard BO instead of cBO to optimize the controller, which necessitates training a new GP for each morphology in the batch, i.e., one controller GP per batch element. Both the contextual GP and the noncontextual GPs must first be initialized with randomly chosen controller parameters. After the first batch is evaluated, another GP model, defined over the morphology parameters, is updated with the batch and the corresponding best rewards obtained for each batch element by the controller optimization. We query this updated model using an acquisition function (GP-UCB in our case) for another batch of morphology parameters, this time with updated knowledge of which parameters generated the best rewards in the controller optimization, and evaluate this batch as before. Regardless of whether noncontextual or contextual HPC-BBO is employed, the use of BBO to optimize morphology allows all controller optimizations for a batch to be done in parallel. By evaluating multiple morphologies in one batch, we drastically reduce wall time for optimization, since we can evaluate a whole batch of morphologies in one production cycle, whereas regular BO evaluates only one.
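The nested structure described above can be sketched on a toy problem. This is a rough illustration of the contextual variant under strong assumptions, not a reproduction of Algorithm 1: morphology and controller are each reduced to one scalar, `toy_rollout` is a hypothetical reward, the GP is hand-written, and controller optimizations within a batch run sequentially here although they could run in parallel on hardware.

```python
import numpy as np

def rbf(A, B, length=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xq, noise=1e-4):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)
    mean = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_rollout(m, c):
    """Hypothetical reward: controller c should match morphology m; m = 0.6 is best."""
    return -(c - m) ** 2 - 0.5 * (m - 0.6) ** 2

def optimize_controller(m, data, grid, n_iter, beta, rng):
    """Inner loop: contextual BO over c with morphology m as the fixed context."""
    best = -np.inf
    query = np.column_stack([np.full(len(grid), m), grid])
    for _ in range(n_iter):
        if data["y"]:
            X, y = np.array(data["X"]), np.array(data["y"])
            mean, std = gp_posterior(X, y - y.mean(), query)
            c = grid[int(np.argmax(mean + beta * std))]
        else:
            c = rng.choice(grid)               # random initialization of the GP
        reward = toy_rollout(m, c)
        data["X"].append([m, c])               # shared across all morphologies
        data["y"].append(reward)
        best = max(best, reward)
    return best

def select_batch(Xm, ym, grid, batch_size, beta):
    """Outer loop: choose a batch of morphologies via hallucinated observations."""
    X_h, y_h = Xm[:, None], ym - ym.mean()
    taken, batch = np.zeros(len(grid), dtype=bool), []
    for _ in range(batch_size):
        mean, std = gp_posterior(X_h, y_h, grid[:, None])
        ucb = mean + beta * std
        ucb[taken] = -np.inf                   # added guard: distinct morphologies
        i = int(np.argmax(ucb))
        taken[i] = True
        batch.append(grid[i])
        X_h = np.vstack([X_h, [[grid[i]]]])
        y_h = np.append(y_h, mean[i])
    return np.array(batch)

def hpc_bbo(n_cycles=4, batch_size=5, n_ctrl=10, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 51)
    data = {"X": [], "y": []}                  # (morphology, controller) -> reward
    Xm, ym = [], []                            # morphology -> best reward found
    batch = rng.choice(grid, size=batch_size, replace=False)  # first batch: random
    for _ in range(n_cycles):                  # one iteration = one fabrication cycle
        for m in batch:
            Xm.append(m)
            ym.append(optimize_controller(m, data, grid, n_ctrl, beta, rng))
        batch = select_batch(np.array(Xm), np.array(ym), grid, batch_size, beta)
    best = int(np.argmax(ym))
    return Xm[best], ym[best], len(Xm), len(data["y"])

best_m, best_reward, n_morphologies, n_rollouts = hpc_bbo()
```

With these illustrative settings, four fabrication cycles cover 20 morphologies and 200 controller rollouts; a sequential optimizer would need 20 cycles to fabricate the same number of morphologies.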
In essence, contextual HPC-BBO leverages the data efficiency of BBO to optimize the expensive constrained parameters while also taking advantage of the information learned across different contexts with cBO to optimize the unconstrained parameters. Using our approach, we can co-optimize robot control and morphology in a much more data-efficient manner, allowing us to evaluate more microrobots per fabrication cycle.
In our experiments, we used the robotic simulator V-REP to model a hexapod microrobot similar to the one described in prior work. Each of the six legs is driven by 2 motors: one motor actuates the leg vertically, and the other motor moves the leg back and forth, resulting in a roughly circular sweep of the legs. The simulated microrobot is scaled 100 times larger than its physical analogue since V-REP is unable to handle dynamics at the micrometer scale. It is important to note that our algorithm does not require a simulator to work, and that the simulator is used here only to evaluate the performance of the robots, similar to what would happen in the real world.
For the controller, we optimize six CPG parameters, which correspond to the frequency, amplitude, and offsets of the vertical and horizontal motors. In addition, we separately consider parameters that control the amplitude of leg swings on the left and right sets of legs respectively. Although most of the parameters are related to our controller of choice, the CPG, our method is agnostic to the type of controller. The three morphology parameters control the lengths of pairs of legs (front, middle, and rear) and are encoded as ratios relative to a normalized leg length. We use a fixed batch size for morphology selection and run the controller optimizer for 50 iterations for each morphology evaluation. Videos and code for reproducing the experiments are available at https://sites.google.com/view/learning-robot-morphology
We now compare the two variants of our approach (contextual HPC-BBO and non-contextual HPC-BBO) against three baselines: random search, covariance matrix adaptation evolutionary strategy (CMA-ES), and standard BO. Random search samples the parameter space uniformly at random, establishing a baseline for our optimization task. CMA-ES is a gradient-free algorithm for optimizing non-convex functions. Standard BO optimizes all parameters at once and, unlike our hierarchical approach, does not batch hardware evaluations.
Figure 3 shows the learning curves of the various methods w.r.t. different optimization desiderata: number of fabrication cycles, morphology iterations, and controller evaluations. The number of fabrication cycles is the number of times a new silicon wafer has to be fabricated and is the most important statistic for comparison because it directly correlates with wall time. By generating a batch of multiple morphologies, HPC-BBO examines many different sets of hardware parameters at each fabrication cycle, whereas CMA-ES and standard BO only evaluate one new morphology every cycle. The fabrication process of each batch of morphologies takes 4-6 weeks in the real world. As a result, standard BO would need 21 months to reach the same performance that our approach would reach in 4 months (assuming that the convergence rate in simulation translates to the real world). HPC-BBO also outperforms standard BO and CMA-ES w.r.t. the number of morphology iterations. This metric is relevant because it shows that even if the batch size were 1, HPC-BBO would still outperform standard BO (since larger batch sizes decrease performance, as explained in Section IV).
HPC-BBO is significantly outperformed by standard BO when considering the number of controller evaluations, because standard BO sees far more morphologies: it can change both controller and morphology parameters at every evaluation. However, each such evaluation may require a newly fabricated morphology, so even just 200 controller evaluations would take roughly a decade and a half on the real robot. Comparing HPC-BBO to standard BO on this basis is therefore misleading.
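The wall-time argument reduces to simple arithmetic, sketched below. The 4-6 week cycle time and the figure of 200 evaluations come from the text; the count of 20 morphologies, the batch size of 5 (one wafer's capacity), and the function names are illustrative assumptions.

```python
import math

def fabrication_cycles(n_morphologies, batch_size):
    """Number of wafers needed when each wafer carries batch_size morphologies."""
    return math.ceil(n_morphologies / batch_size)

def wall_time_weeks(n_cycles, weeks_per_cycle):
    """Total fabrication time, ignoring controller tuning between cycles."""
    return n_cycles * weeks_per_cycle

# Illustrative: 20 morphologies, one per wafer versus five per wafer.
sequential_cycles = fabrication_cycles(20, batch_size=1)
batched_cycles = fabrication_cycles(20, batch_size=5)

# 200 evaluations that each need a newly fabricated morphology,
# at the optimistic end of the 4-6 week fabrication time.
years_for_200 = wall_time_weeks(200, weeks_per_cycle=4) / 52
print(sequential_cycles, batched_cycles, round(years_for_200, 1))
```

At 4 weeks per cycle, 200 sequential fabrications already amount to a little over 15 years, consistent with the "decade and a half" estimate; at 6 weeks per cycle the figure is higher still.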
The importance of our hierarchical technique is highlighted by the result presented in Table I, which shows that the gaits we learn are specific to different morphologies. We take the controllers and morphologies of the best four performing robots across all experiments and show that recombining them produces suboptimal pairings. Highly performing morphologies can perform up to 90% worse when paired with a controller optimized for another morphology, even when the other morphology-controller pair also performs well. Importantly, this relationship is not symmetric: even though the controller optimized for morphology 1 only decreases the performance of morphology 3 by 8%, the controller for morphology 3 decreases the performance of morphology 1 by 75%.
Even the most transferable controller (the one that least decreases the performance of morphologies it was not optimized for) still decreases the performance of the most robust morphology (the one whose performance changes least across controllers). This means that even if standard BO or CMA-ES were to find a controller that performs well across most morphologies, they are not guaranteed to find the optimal controller/morphology pair.
We evaluate both the contextual and noncontextual variants of HPC-BBO. Contextual HPC-BBO consistently outperforms noncontextual HPC-BBO. As seen in Figure 3, the performance gap in distance traveled is largest at the beginning and towards the end w.r.t. the number of fabrication cycles and morphology iterations. Contextual HPC-BBO outperforms non-contextual HPC-BBO at the beginning of optimization because it is able to efficiently use data accumulated in previous morphology contexts to produce competitive gaits in unseen morphology contexts. This gap becomes less evident when the number of controller evaluations increases, as seen in the rightmost graph of Figure 3. As non-contextual HPC-BBO trains on more morphologies, it picks better ones, which lessens the importance of generalizing across different contexts. In addition, a higher number of controller evaluations devoted to optimizing the software parameters allows the non-contextual optimizer to catch up to the contextual optimizer: with more iterations, the contextual information from previous iterations is washed out, reducing the advantage the contextual optimizer has when iterations are few. The practical implication is that a shorter period of time to optimize software favors the contextual approach, whereas a longer one reduces its advantage.
In this paper, we studied how to automatically optimize the design of robot morphologies and controllers in a data-efficient manner. To achieve this goal, we introduced a novel algorithm, hierarchical process constrained batch Bayesian optimization (HPC-BBO), and validated our approach in simulation. Results on a simulated hexapod microrobot show that HPC-BBO significantly outperforms all baselines and other state-of-the-art learning methods, including standard Bayesian optimization. By exploiting the hierarchical relationship between morphology and controller, we demonstrate that HPC-BBO can produce high-performing morphologies/controllers in a data-efficient manner. Moreover, HPC-BBO can exploit the simultaneous fabrication of multiple robot morphologies. As a result, HPC-BBO achieves the same performance as standard BO in about a fifth of the time (4 months compared to 21 months).
The proposed approach is a first step towards the grand goal of robots that can not only quickly learn suitable controllers from experience, but also adapt their hardware based on the needs dictated by their environment and goals.
An exciting future direction is to “open the black-box” by replacing Bayesian optimization with model-based reinforcement learning, to allow for more complex controllers. Additionally, we aim to apply the proposed approach to the design of real-world microrobots.
International Joint Conference on Artificial Intelligence (IJCAI), 2007, pp. 944–949.
IEEE transactions on neural networks and learning systems, vol. 25, no. 3, pp. 441–456, 2014.
C. E. Rasmussen, “Gaussian processes in machine learning,” in Advanced lectures on machine learning. Springer, 2004, pp. 63–71.