Swarms are impressive examples of collective behavior leading to the emergence of new properties [4, 26, 8]. One of the most visible characteristics of swarms is their display of complex, dynamic structures [14, 1, 7]. Here, we are specifically interested in the creation of structure through cooperative assembly. Such behavior has been investigated both in the natural world [1, 7] as well as in robotic swarms [15, 27, 20, 24].
In this paper, we are interested in the automated programming of a swarm of micro-robots (). The micro-robots are microscopic agarose beads functionalized with bio-molecules (single strand DNA of 12-24 base pairs) . The beads forming the body of the robots are small enough to move through Brownian motion. The DNA functionalization allows them to produce (a) chemical signals that may impact the behavior of nearby beads and (b) an anchoring signal that will attach them to their neighbors. Clusters created from aggregated beads move slower or even stop, based on their size. That property allows us to control where the beads should go, which we use to create self-assembled structures. Thanks to that scale and the low price and availability of the molecular components, we previously designed and validated in vitro a system of a million micro-robots that self-assemble [3, 29].
We previously introduced the BioNeat algorithm , inspired from state-of-the-art Neat algorithm . BioNeat aims at producing chemical reaction networks (CRN) which represent bio-molecules to be attached to beads or left floating in the 2-dimensional substrate. Given a target for self-assembly (e.g., in Fig. 2), the objective is that micro-robots assemble to one another in the target area.
Though initial results are promising, this previous work raised several important issues:
target complexity: optimization was limited to simple targets, with one single rectangular region for self-assembly.
computational cost: simulating chemical reactions in a realistic fashion is extremely expensive. Several days of computation were necessary for running the optimization algorithm.
In this paper, we address both issues. Firstly, we introduce the use of MAP-Elites , an illumination algorithm that favors exploration over pure optimization, thus reducing the risk of premature convergence during the search. We also implement a new mechanism, termed topological initialization that helps MAP-Elites to bootstrap exploration when confronted with search spaces with large neutral regions.
Secondly, we introduce a surrogate model  to approximate the (costly) pseudo-realistic simulation of real chemical reactions. We also introduce transferability  in combination with the proposed surrogate model, i.e., bootstrapping optimization using the surrogate model and then reverting to the costly pseudo-realistic simulator.
In the following, we first describe the Methods, both in terms of experimental setup, objective function and optimization algorithms. Then, we present the results obtained using a target function that combines several regions of interest. Finally, we discuss the remaining challenges of the approach and conclude.
Ii-a Molecular robotic swarm
We consider a swarm comprised of molecular robots made from spherical beads grafted with specific strands of DNA (Fig. 1). Those strands are of two types: (a) templates used to capture signal DNA from the environment, process it and produce signal back; (b) anchoring points, which are used for aggregation when the anchoring signal is present.
Processing and production of signals are based on the PEN DNA toolbox  (Fig. 1, c.): a signal strand can attach to a compatible template and either triggers the production of a different DNA strand (activation) or prevents activation from other strands (inhibition). The activation mechanism works as follows: once a compatible signal strand is double-stranded to a template, it gets extended into a fully complementary strand by a first enzyme called polymerase. The extend strand is designed to contain the recognition site of a second enzyme, called nickase, which splits it in half. That activity recovers the original signal strand (green in Fig. 1, c.) as well as produces a new signal strand (orange in Fig. 1, c.). That structure is unstable by design and both strands eventually fall off, allowing them to interact with other templates in the system. The inhibition mechanism relies on a signal strand attaching to a target template in a way that does not trigger the activity of either enzymes: a 3’-end mismatch prevents polymerase extension, while missing bases on the 5’-end of the strand prevents recognition by the nickase. At the same time, the inhibiting strand physically prevents activation signals to interact with the template, effectively inactivating it. As with activation signal, this structure is designed to be unstable, allowing the template to be eventually freed.
Signal strands are spread in the environment through chemical diffusion and are degraded over time by a third enzyme called exonuclease. Templates are chemically protected against that activity, thus remaining stable.
Additionally to templates, anchor strands are grafted on the beads. A specific signal, called anchoring signal, can link together anchor strands from two separate beads to link them together (Fig. 1, b.). Past experimental results showed that this mechanism is strong enough to produce stable aggregation .
Robots move through Brownian motion: individual robots move much faster than aggregated clusters, which in turn may become completely motionless when they get large enough.
Ii-B Simulation models
We use two distinct models to simulate our molecular systems: a “full” model that explicitly computes beads positions and aggregation, and a “surrogate” model that considers a simplified scenario where beads are uniformly spread in the environment.
In both cases, the production, diffusion, and degradation of signal species are modeled through reaction-diffusion. For a given signal , we have:
where is the concentration of , is the contribution from reactions (e.g., production or degradation, see Figure 1,C. for a graphical representation and  for an explicit formula), is the diffusion coefficient of and
is the Laplacian operator. That approach generates a set of Partial Differential Equations (PDEs) that we solve numerically.
In the full model, beads are modelled as disks moving in the environment through Brownian motion. We implement it by adding Gaussian noise to their position over time, with mean
with the Boltzmann constant, the temperature, the bead diameter and the viscosity of water. To integrate with the reaction-diffusion part of the system, that Gaussian noise is scaled by the length of a time step from the PDE solver.
While the simulation is 2-dimensional, we consider that the beads are moving in a 3-dimensional environment, which allows us to ignore collisions. In presence of the anchoring signal, beads have a probability to aggregate. That probability is computed by implementing a Gillespie-like step: considering the current concentration of signal, we predict when the next aggregation event will occur
. If that event happens before the next time step of the PDE solver, aggregation is considered successful. The reverse reaction, a bead separating from an aggregate, is handled the same way. Since reactions probability distributions are memoryless, that approach does not change the overall behavior of the system. Once aggregated, we consider a cluster ofmoves in a similar fashion to a single bead with a radius times larger.
Note that keeping track of the behavior of each bead is computationally intensive. To minimize costs, we use far fewer beads in simulations than there would be in wetlab experiments. To still cover enough of the environment, a larger radius is used both to compute signal production and aggregation range. However, that radius would make the beads extremely slow; to keep the system dynamic, we consider that the beads move as if they had a smaller radius. Those two radii are indicated as bead size (aggregation) and bead size (Brownian motion) respectively in Table I.
The local concentration of each signal-producing species is directly proportional to the number of beads at a given point of space. That is, we sum the concentrations of DNA molecules grafted on all the beads that are present. Note that, due to the non-linearity of the enzymatic reactions involved, a linear increase in signal-producing species does not necessarily mean a linear increase in the production of signal species.
In the surrogate model, we use the same reaction-diffusion approach, however the concentrations of signal-producing strands are considered spatially uniform and are equal to the respective concentrations of species present on the surface of a single bead. In practice, this model is equivalent to having motionless beads uniformly spread out in the environment or having the signal-producing species grafted on the surface of the environment rather than on beads. Note that such reaction-diffusion system can be experimentally implemented as well .
One of the main difference between these two models is that the full model is stochastic, due to the random motion of beads and probabilistic aggregations, while the surrogate model is deterministic. This characteristic makes the surrogate model much cheaper to compute as multiple repeats are not required. At the same time, not having to simulate the Brownian motion and aggregation of beads gives us an additional cut in computational cost. On the negative side, the surrogate model may fail to capture the lack of robustness of a given system. It is also incapable of taking into account possible designs relying on the increase in local concentrations when beads aggregate. However, we have found that, for the task presented in this paper, performances achieved with the simulated model or with the full model and with the surrogate model are correlated (Fig. 3 and Results section).
Ii-C Target Objective
We extend our work from [3, 6] with a more challenging problem, described in this paragraph. We previously devised a methodology to optimize CRN controllers driving a swarm of micro-beads to self-assemble into simple shapes composed of only one straight line. Here, we extends this approach further by focusing on a more complex type of pattern, composed of two lines. This requires that micro-robots to consider several chemical signals in order to perform a collection of attraction and repulsion behaviors to navigate to the target area.
We define a target for self-assembly corresponding to the shape of the letter T (Fig. 2 right). Our objective is for the micro-beads to self-assemble into the target area, shown in black, starting from their initial random positions. Two fixed gradient sources are arranged in the top-left and top-right positions in the arena (Fig. 2 left). They each emit a respective type of signal strand that diffuse throughout the environment. They provide the micro-beads information about their localization, potentially inducing self-assembly. As the gradients diffuse isotropically, self-assembling into straight patterns can be difficult.
We quantify the performance of a simulation by using the method described in [3, 6]. The experimental arena is discretized into a matrix of cells, with . This allows to compute the following match-nomatch score:
with the set of cells in the target area, a reward parameter, a Bool function of the presence of beads at position , the distance of a cell towards the closest position in and a scaling parameter. Individual are rewarded according to the number of cells within and penalized for cells outside of , with penalization increasing with distance to the target area. Table I lists the parameters used for simulation and fitness computation.
We define two fitness functions catering respectively to the full and surrogate models. The surrogate model is deterministic, so the fitness is equal to the match-nomatch score over one simulation. The full model is stochastic. To alleviate simulation stochasticity, we assess the performance of an individual CRN by reevaluating simulations, computing their respective match-nomatch scores, and defining the resulting fitness value as the median of these scores. The selection of the number of reevaluations corresponds to a trade-off between precision and computational costs. Our choice of reevaluations was motivated by the results presented in Fig. 4 assessing fitness variability depending on the number of reevaluations.
Individuals are deemed valid only if their respective topology matches the requirements of Table II in term of number of signal species and number of activation and inhibition templates.
|Bead size (aggregation)|
|Bead size (Brownian motion)|
|Time discretization||min per step|
|Simulation duration||steps (i.e., min)|
|Target width||(20% of arena width)|
We use a two-step experimental protocol for optimization. As a first step, we perform both structural and parametric optimization to obtain chemical reaction networks (CRNs) through two possible methods: BioNeat and Map-Elites. Structural optimization modifies which templates are added to the beads, thus changing the set of reactions. Parametric optimization focuses on optimizing continuous values, namely DNA sequences stabilities and template concentrations. After convergence or exhaustion of the evaluation budget, we refine the designs obtained by performing a parametric-only optimization using CMA-ES, a state-of-the-art evolutionary optimization method for continuous values.
is an evolutionary algorithm we first introduced in. It takes inspiration from the famous state-of-the-art Neat algorithm 
, which was originally designed to optimize artificial neural networks.BioNeat uses specific variation operators to navigate the search space of chemical reaction networks. It is also capable to protect innovation, that is, to explore several regions of the search space simultaneously, balancing between novelty of a particular design and the quality of solutions. We previously showed that BioNeat provides efficient solutions for targets comprised of a single regions (horizontal or vertical lines, see  for details). The main limit of BioNeat is that while it can protect innovation for some time, there is not guarantee that it is capable of escaping completely the curse of premature convergence due to the pressure for optimizing towards (possibly only temporary) better solutions.
|Number of retrials per individuals|
|Maximal number of activation signals|
|Maximal number of activation templates|
|Maximal number of inhibition templates|
|Target number of BioNeat species|
|Number of templates|
|Number of bins|
|Number of elites per grid bin|
|Number of templates|
|Add template strand|
|Remove template strand|
|Add signal species|
|Add inhibition species|
BioNeat searches through CRN topologies iteratively. This behavior is inherited from Neat that postulated that iterated small changes in topologies would often only result in a mild effect on the fitness values. It may not be the case with CRN , where small changes in topology can have severe effects in fitness values, a particular trait of deceptive and hard-explore problems. This may explain why BioNeat is prone to premature convergence during optimization, as it does not possess a way to explore a totally unexplored niche in the space of topologies. While this aspect is partially mitigated with BioNeat mechanism of speciation, which allows the parallel optimization of several topological niches, it does not enforce the discovery of totally novel niches. Worst, as new species are only created through atomic mutations (i.e., change only a small part of the topology), the surviving species may contains individuals with very similar topologies, possibly in the same topological niche.
In order to favor exploration over pure optimization, we rely on MAP-Elites , a Quality-Diversity algorithm , that decomposes the search space into regions based on feature descriptions. It considers how does a candidate solution looks like in the phenotypic space instead of considering how it is coded in the genotypic space. This method is particularly suited to cope with multi-modal, deceptive, hard exploration problems where traditional optimization algorithms would be prone to premature convergence, as in evolutionary robotics [19, 9, 11].
MAP-Elites iteratively regroups the explored solutions in a grid of elites. This results in a collection of high-performing individuals across a number of features selected by the user, corresponding to the axes of the grid. Here, we only consider a single feature corresponding to the total number of templates in the topology of an individual. CRN with smaller number of templates are easier to test experimentally but lose expressivity. Conversely, large-sized CRN can describe more complex behaviors, which may be necessary for the beads to successfully self-aggregate into the target shape. As such, a trade-off in term of topology complexity has to be considered, possibly after the optimization process. This substantiates methodologies that concurrently search for topologies of differing sizes.
MAP-Elites is equipped with the same set of mutation operators as BioNeat, and also retains its capability to optimize iteratively the topologies. We introduce a novel methodology to bootstrap MAP-Elites exploration by initializing a collection of individuals with random topologies. This approach, described in Algorithm 1, allows MAP-Elites to consider a large number of differing topologies. It makes use of the BioNeat mutation operators to generate individuals of varying topologies across a range of number of templates. In one optimization run, of the individuals are initialized with a random topology (i.e., individuals for an evaluation budget of ). This contrasts with the BioNeat approach, where only small iterative changes in topologies are possible from mutations, and where individuals with totally new topologies are not initialized.
Table II lists the chosen parameters for the MAP-Elites algorithm. We use our own implementation of BioNeat and MAP-Elites. BioNeat is open source and available from https://bitbucket.org/AubertKato/bioneat/. MAP-Elites is coded in Python using the QDpy library  and freely available as open source software at https://gitlab.com/leo.cazenille/qdpy. All additional scripts used in this paper are available at https://bitbucket.org/leo-cazenille/daccad-qd.
The BioNeat and MAP-Elites algorithms (shorten as BN and ME for brevity) are used to optimize CRNs on the T target in three settings. The full and srg settings involve, respectively, the full and the surrogate simulations on a budget of evaluations. The srg+tsf setting tests the transferability of the surrogate model: individuals are first optimized using the surrogate model during evaluations. Then, all individuals of the last iteration (last population for BioNeat, final grid of MAP-Elites) are reevaluated and optimized using the full model over evaluations.
All results are compared to a CRN designed by an expert, corresponding to the Expert setting.
The best-performing individuals of each method are reevaluated using the full model, with scores presented in Table LABEL:tab:scores. For each method, CRN parameters of the best reevaluated individuals are optimized by CMA-ES  over evaluations, as in : the topology of these individuals are fixed, but templates concentration and activation signals stability parameters are optimized.
Full runs are computed in hours per run, srg runs take minutes per run and srg+tsf runs take hours per run.
For each setting, the evolution of fitness values during 16 optimization runs are presented in Fig. LABEL:fig:fitnessPerEvalsT.
Figure LABEL:fig:bestSolutions shows the final state of a full simulation of the best-performing solutions for each setting, before and after optimization by CMA-ES. Only the anchoring signal is represented, corresponding to self-assembled clusters of beads.
To compare the performance of the different settings, we collected the best individuals from each run (16 individuals per setting). For any two given settings, we used a Mann-Whitney U test on the distribution of fitness of their respective best individuals (Table LABEL:tab:mannwhitney). The algorithms can be ordered in terms of mean performance: ME ME+srg+tsf BN+srg+tsf BN ME+srg BN+srg, where indicates a statistically significant difference (). The remaining cases, shown by a have , and respectively.
In order to evaluate the impact of further parametric optimization refinement, we picked the best performing individual for each method, and compare performance before and after an extra parametric optimization step (i.e., the topology is fixed) using the CMA-ES optimization algorithm. All individuals were reevaluated 100 times with the full model and the distribution of their evaluations were compared with a Mann-Whitney U test).
Ordering the best individuals from each method before parametric optimization gives the following: ME ME+srg+tsf BN Expert BN+srg+tsf ME+srg BN+srg.
After parametric optimization, the order of the best individuals becomes: Expert+CMAES ME+CMAES BN+srg+CMAES ME+srg+tsf+CMAES BN+CMAES ME+srg+CMAES BN+srg+tsf+CMAES.
Finally, we also measured the impact of the optimization by CMA-ES by comparing the distribution of evaluations of a given individual and that of its optimized version. We used once again 100 retrials with a Mann-Whitney U test. The -values were added to Table LABEL:tab:scores. All optimized individuals were significantly better than their original () except for the best individual of the ME with Full model setting ().
In accordance with the specifications of Table II, optimized individuals are noticeably smaller than the expert-designed CRN (19 templates, Fig. 5): 13 for BN, 12 for BN+srg and BN+srg+tsf, 11 for ME, 10 for ME+srg and ME+srg+tsf. Fewer templates is actually beneficial with respect to potential in vitro experimentation, as the models used loose accuracy when the number of chemical interactions grows.
Iv Discussion and conclusion
We presented an approach to optimize CRNs driving the behavior of a swarm of bio-micro-robots to self-assemble into a T-shaped target. Our approach improves upon the methodology in , based on BioNeat. We then described a method to optimize CRNs using the MAP-Elites . It is a Quality-Diversity algorithm , allowing us to explore solutions that are both high-performing and diverse. It is particularly suitable to cope with the deceptive nature of our target problem. We also introduced the notion of topological initializations to bootstrap MAP-Elites. We showed that MAP-Elites outperforms BioNeat both in term of performance and convergence speed. While the best-performing individuals optimized by BioNeat still require an additional optimization stage using CMA-ES to provide competitive results, MAP-Elites does not necessitate this procedure.
We presented a way to reduce the computational costs of simulations through the use of a surrogate model of the robot swarm dynamics. Our optimization approach required far less computational power, while still retaining sufficient accuracy. We described how individuals optimized through the surrogate model can be transferred into optimization processes implementing the full model. We showed that such results are competitive with results obtained using only the full model, nearly dividing by two the optimization effort (60%).
We limited the optimization runs to small CRN ( 13 templates) to ease potential experimental validations. Indeed, smaller CRNs are easier to test experimentally, but lose expressivity. We showed that all methods are capable of optimizing small CRN while still being competitive with results obtained by the expert-designed CRN, that involves a far larger number of templates (19). Additionally, MAP-Elites allows for the concurrent optimization of CRN of differing size (number of templates), resulting in several CRNs that could be selected after the optimization process for experimentation, depending on the choices and requirements of the experimenter.
Our approach with MAP-Elites could be improved by testing additional features rather than only the number of templates, including topological features (e.g., number of vertices, properties of the graph) or behavioral features (e.g., occupation of various zones of the arena, speed of reaching a stable state). The interplay between surrogate and full model could be improved further by finding automatically when to perform a transfer. The optimization process could also switch several times between surrogate and full model evaluations, either periodically or under some predetermined conditions. Finally, our approach could be applied to more complex target shapes, possibly requiring the implementation of mechanisms to optimized CRN iteratively, or by automatically decomposing the target shapes into sub-shapes.
This work was supported by JSPS KAKENHI Grant Number JP17K00399 and by Grant-in-Aid for JSPS Fellows JP19F19722. This work was also supported by the Agence Nationale pour la Recherche under Grant ANR-18-CE33-0006 (MSR project).
-  C Anderson, G Theraulaz, and JL Deneubourg. Self-assemblages in insect societies. Insectes sociaux, 49(2):99–110, 2002.
-  N Aubert, C Mosca, T Fujii, M Hagiya, and Y Rondelez. Computer-assisted design for scaling up systems based on dna reaction networks. J. R. Soc. Interface, 11(93):20131167, 2014.
-  N Aubert-Kato, C Fosseprez, G Gines, I Kawamata, H Dinh, L Cazenille, A Estevez-Tores, M Hagiya, Y Rondelez, and N Bredeche. Evolutionary optimization of self-assembly in a swarm of bio-micro-robots. In GECCO, pages 59–66. ACM, 2017.
-  S Camazine. Self-organization in biological systems. PUP, 2003.
-  L Cazenille. Qdpy: A python framework for quality-diversity. https://gitlab.com/leo.cazenille/qdpy, 2018.
-  L Cazenille, N Aubert-Kato, and N Bredeche. Using map-elites to optimize self-assembling behaviors in a swarm of bio-micro-robots. In SWARM, 2019.
-  MF Copeland and DB Weibel. Bacterial swarming: a model system for studying dynamic self-assembly. Soft matter, 5(6):1174–1187, 2009.
-  ID Couzin. Collective cognition in animal groups. Trends in cognitive sciences, 13(1):36–43, 2009.
-  A Cully, J Clune, D Tarapore, and JB Mouret. Robots that can adapt like animals. Nature, 521(7553):503, 2015.
-  HQ Dinh, N Aubert, N Noman, T Fujii, Y Rondelez, and H Iba. An effective method for evolving reaction networks in synthetic biochemical systems. Trans. Evol. Comp., 19(3):374–386, 2014.
-  M Duarte, J Gomes, SM Oliveira, and AL Christensen. Evolution of repertoire-based control for robots with complex locomotor systems. Trans. Evol. Comp., 22(2):314–328, 2018.
-  DT Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of computational physics, 22(4):403–434, 1976.
-  G Gines, AS Zadorin, JC Galas, T Fujii, A Estevez-Torres, and Y Rondelez. Microscopic agents programmed by dna circuits. Nat. Nano., 12(4):351, 2017.
-  AE Goodenough, N Little, WS Carpenter, and AG Hart. Birds of a feather flock together: Insights into starling murmuration behaviour revealed using citizen science. PloS one, 12(6):e0179277, 2017.
-  R Groß, M Bonani, F Mondada, and M Dorigo. Autonomous self-assembly in swarm-bots. Trans. robot., 22(6):1115–1130, 2006.
The CMA evolution strategy: a comparing review.
Towards a new evolutionary computation, pages 75–102. Springer, 2006.
-  S Koos, JB Mouret, and S Doncieux. The transferability approach: Crossing the reality gap in evolutionary robotics. Trans. Evol. Comp, 17(1):122–145, February 2013.
-  K Montagne, R Plasson, Y Sakai, T Fujii, and Y Rondelez. Programming an in vitro dna oscillator using a molecular networking strategy. Molecular systems biology, 7(1):466, 2011.
-  JB Mouret and J Clune. Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909, 2015.
-  R O’Grady, AL Christensen, and M Dorigo. Swarmorph: multirobot morphogenesis using directional self-assembly. Trans. Robot., 25(3):738–743, 2009.
-  A Padirac, T Fujii, A Estevez-Torres, and Y Rondelez. Spatial waves in synthetic biochemical networks. JACS, 135(39):14586–14592, 2013.
-  JK Pugh, LB Soros, and KO Stanley. Quality diversity: A new frontier for evolutionary computation. Frontiers in Robotics and AI, 3:40, 2016.
-  NV Queipo, RT Haftka, W Shyy, T Goel, R Vaidyanathan, and PK Tucker. Surrogate-based analysis and optimization. Progress in aerospace sciences, 41(1):1–28, 2005.
-  M Rubenstein, A Cornejo, and R Nagpal. Programmable self-assembly in a thousand-robot swarm. Science, 345(6198):795–799, 2014.
-  KO Stanley and R Miikkulainen. Evolving neural networks through augmenting topologies. Evolutionary computation, 10(2):99–127, 2002.
-  David JT Sumpter. The principles of collective animal behaviour. Proc. Royal Soc. B, 361(1465):5–22, 2006.
-  E Tuci, R Groß, V Trianni, F Mondada, M Bonani, and M Dorigo. Cooperation through self-assembly in multi-robot systems. TAAS, 1(2):115–150, 2006.
-  W Yahiro, N Aubert-Kato, and M Hagiya. A reservoir computing approach for molecular computing. In ALIFE. MIT Press, 2018.
-  AS Zadorin, Y Rondelez, G Gines, V Dilhas, G Urtel, A Zambrano, JC Galas, and A Estévez-Torres. Synthesis and materialization of a reaction–diffusion french flag pattern. Nat. Chem., 9(10):990, 2017.