Path-planning microswimmers can swim efficiently in turbulent flows

10/03/2019 ∙ by Jaya Kumar Alageshan, et al. ∙ Indian Institute of Science ∙ MINES ParisTech

We develop an adversarial-reinforcement learning scheme for microswimmers in statistically homogeneous and isotropic turbulent fluid flows, in both two (2D) and three dimensions (3D). We show that this scheme allows microswimmers to find non-trivial paths, which enable them to reach a target, on average, in less time than a naïve microswimmer, which tries, at any instant of time and at a given position in space, to swim in the direction of the target. We use pseudospectral direct numerical simulations (DNSs) of the 2D and 3D (incompressible) Navier-Stokes equations to obtain the turbulent flows. We then introduce passive microswimmers that try to swim along a given direction in these flows; the microswimmers do not affect the flow, but they are advected by it. Two non-dimensional control parameters play important roles in our learning scheme: (a) the ratio Ṽ_s of the microswimmer's bare velocity V_s and the root-mean-square (rms) velocity u_rms of the turbulent fluid; and (b) the product B̃ of the microswimmer-response time B and the rms vorticity ω_rms of the fluid. We show that, in a substantial part of the Ṽ_s-B̃ plane, the average time required for the microswimmers to reach the target by using our adversarial-learning scheme eventually falls below the average time taken by microswimmers that follow the naïve strategy.




I Introduction

Machine-learning techniques and advances in computational facilities have led to significant improvements in obtaining solutions to optimization problems, e.g., to problems in path planning and optimal transport, which are referred to in control systems as Zermelo’s navigation problem Zermelo . With vast amounts of data available from experiments and simulations in fluid dynamics, machine-learning techniques are being used to extract information that is useful to control and optimize flows ML_FM . Recent studies have used reinforcement learning in fluid-flow settings, e.g., (a) to optimise the soaring of a glider in thermal currents Soaring and (b) to develop optimal navigation schemes for microswimmers in time-independent two- (2D) and three-dimensional (3D) fluid flows PRL ; 3D ; Biferale2 . Optimal locomotion, in response to stimuli, is also important in biological systems ranging from cells and micro-organisms Dusenbery ; Durham ; Michalec to birds, animals, and fish Fish ; such locomotion is often termed taxis Barrows .

It behooves us, therefore, to explore machine-learning strategies for optimal path planning by microswimmers in turbulent fluid flows. We initiate such a study for microswimmers in 2D and 3D turbulent flows. In particular, we consider a dynamic-path-planning problem that seeks to minimize the average time taken by microswimmers to reach a given target while they move in a turbulent fluid flow that is statistically homogeneous and isotropic. We develop a novel, multi-swimmer, adversarial Q-learning algorithm to optimise the motion of such microswimmers, which try to swim towards a specified target (or targets). Our adversarial Q-learning approach ensures that the microswimmers perform at least as well as those that adopt the following naïve strategy: at any instant of time and at a given position in space, a naïve microswimmer tries to point in the direction of the target. We examine the efficacy of this approach as a function of two dimensionless control parameters: (a) Ṽ_s ≡ V_s/u_rms, where V_s is the microswimmer’s bare velocity and u_rms is the root-mean-square velocity of the turbulent fluid; and (b) B̃ ≡ B ω_rms, where B is the microswimmer-response time and ω_rms is the rms vorticity of the fluid. We show, by extensive direct numerical simulations (DNSs), that, in a substantial part of the Ṽ_s-B̃ plane, the average time ⟨T⟩ required by a microswimmer to reach a target at a fixed distance is lower if it uses our adversarial Q-learning scheme than if it uses the naïve strategy.

II Background flow and microswimmer dynamics

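For the low-Mach-number flows we consider, the fluid-flow velocity u satisfies the incompressible Navier-Stokes (NS) equation. In two dimensions (2D), we write the NS equations in the conventional vorticity-stream-function form, which accounts for incompressibility in 2D RP_Review :

$$\partial_t \omega + (\mathbf{u}\cdot\nabla)\,\omega = \nu\nabla^2\omega - \mu\,\omega + F_\omega, \qquad \nabla^2\psi = \omega, \qquad \mathbf{u} = (-\partial_y\psi,\ \partial_x\psi); \tag{1}$$

here, u is the fluid velocity, ν is the kinematic viscosity, μ is the coefficient of friction (present in 2D, e.g., because of air drag or bottom friction), ψ is the stream function, and the vorticity ω = ∇ × u is normal to the plane of the flow in 2D. The 3D incompressible NS equations are

$$\partial_t \mathbf{u} + (\mathbf{u}\cdot\nabla)\,\mathbf{u} = \nu\nabla^2\mathbf{u} - \nabla p + \mathbf{F}_{\mathbf{u}}, \qquad \nabla\cdot\mathbf{u} = 0; \tag{2}$$

p is the pressure, and the density of the incompressible fluid is taken to be 1; the large-scale forcing F_ω (large-scale random forcing in 2D) or F_u (constant energy injection in 3D) maintains the statistically steady, homogeneous, and isotropic turbulence, for which it is natural to use periodic boundary conditions.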
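The following NumPy sketch illustrates two pseudospectral operations implied by eqs. (1) and (2). It assumes a square (2D) or cubic (3D) periodic box of side L = 2π with N grid points per direction; the function names are hypothetical, and this is not the DNS code used here. In 2D, the velocity is recovered from the vorticity by solving ∇²ψ = ω in Fourier space and differentiating ψ; in 3D, the pressure term acts as a projection that removes the component of the velocity parallel to the wavevector k, so that ∇ · u = 0.

```python
import numpy as np


def wavenumbers(n, L=2 * np.pi):
    """Angular wavenumbers for an n-point periodic grid of length L."""
    return 2 * np.pi * np.fft.fftfreq(n, d=L / n)


def velocity_from_vorticity(omega, L=2 * np.pi):
    """2D: solve laplacian(psi) = omega spectrally and return u = (-d_y psi, d_x psi).
    Axis 0 of `omega` is x and axis 1 is y."""
    k = wavenumbers(omega.shape[0], L)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                       # avoid 0/0 for the mean mode
    psi_hat = -np.fft.fft2(omega) / k2   # -k^2 psi_hat = omega_hat
    psi_hat[0, 0] = 0.0                  # psi is defined up to a constant
    ux = np.real(np.fft.ifft2(-1j * ky * psi_hat))
    uy = np.real(np.fft.ifft2(1j * kx * psi_hat))
    return ux, uy


def project_divergence_free(u, L=2 * np.pi):
    """3D: u_hat <- u_hat - k (k . u_hat) / |k|^2 for u of shape (3, n, n, n);
    this is the effect of the pressure-gradient term in eq. (2)."""
    k1 = wavenumbers(u.shape[1], L)
    k = np.array(np.meshgrid(k1, k1, k1, indexing="ij"))
    k2 = np.sum(k**2, axis=0)
    k2[0, 0, 0] = 1.0
    u_hat = np.fft.fftn(u, axes=(1, 2, 3))
    u_hat -= k * np.sum(k * u_hat, axis=0) / k2
    return np.real(np.fft.ifftn(u_hat, axes=(1, 2, 3)))


def divergence(u, L=2 * np.pi):
    """Spectral divergence of a 3D field of shape (3, n, n, n)."""
    k1 = wavenumbers(u.shape[1], L)
    k = np.array(np.meshgrid(k1, k1, k1, indexing="ij"))
    u_hat = np.fft.fftn(u, axes=(1, 2, 3))
    return np.real(np.fft.ifftn(np.sum(1j * k * u_hat, axis=0)))
```

For example, ω = −2 sin x sin y yields u = (−sin x cos y, cos x sin y). An odd N is assumed for the 3D projection, so that there is no unpaired Nyquist mode and the projected field stays exactly real; a production solver would also dealias the nonlinear terms.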

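We consider a collection of passive, non-interacting microswimmers in the turbulent flow; X^i and p̂^i are the position and swimming direction of the i-th microswimmer. Each microswimmer is assigned a target located at X_T^i. We are interested in minimizing the time T required by a microswimmer, which is released at a fixed distance from its target, to approach within a small distance of this target. The microswimmer’s position and swimming direction evolve as follows Pedley :

$$\frac{d\mathbf{X}^i}{dt} = \mathbf{u}(\mathbf{X}^i, t) + V_s\,\hat{\mathbf{p}}^i, \tag{3}$$

$$\frac{d\hat{\mathbf{p}}^i}{dt} = \frac{1}{2B}\left[\hat{\mathbf{o}}^i - (\hat{\mathbf{o}}^i\cdot\hat{\mathbf{p}}^i)\,\hat{\mathbf{p}}^i\right] + \frac{1}{2}\,\boldsymbol{\omega}\times\hat{\mathbf{p}}^i; \tag{4}$$

here, we use bi-linear (tri-linear) interpolation in 2D (3D) to determine the fluid velocity u(X^i, t) and vorticity ω at the microswimmer’s position from the DNS fields of eqs. (1) and (2); V_s is the swimming velocity, B is the microswimmer-response time, i.e., the time scale over which the microswimmer aligns with the control direction, and ô^i is the control direction. Equation (4) implies that p̂^i tries to align along ô^i, while the local vorticity rotates it. We define the following non-dimensional control parameters: Ṽ_s = V_s/u_rms, where u_rms is the root-mean-square (rms) fluid-flow velocity, and B̃ = B ω_rms, where 1/ω_rms, the inverse of the rms vorticity, is the characteristic time scale of the vortices.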
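To make eqs. (3) and (4) concrete, the following pure-Python sketch advances one microswimmer in 2D by a single explicit Euler step, with the fluid fields interpolated bilinearly on a periodic grid. The grid layout (field[i][j] stored at (i h, j h), with h = L/N), the function names, and the Euler integrator are assumptions made for illustration; the integrator used for the microswimmers is not specified in the text. In 2D, ω × p̂ reduces to ω(−p_y, p_x), and p̂ is renormalized after each step.

```python
import math


def bilinear_periodic(field, x, y, L):
    """Bilinearly interpolate an N x N periodic grid field, with field[i][j]
    stored at (i*h, j*h) and h = L/N, at the point (x, y)."""
    n = len(field)
    h = L / n
    gx, gy = (x % L) / h, (y % L) / h
    fx, fy = gx - math.floor(gx), gy - math.floor(gy)
    i0, j0 = int(math.floor(gx)) % n, int(math.floor(gy)) % n
    i1, j1 = (i0 + 1) % n, (j0 + 1) % n      # periodic neighbours
    return ((1 - fx) * (1 - fy) * field[i0][j0] + fx * (1 - fy) * field[i1][j0]
            + (1 - fx) * fy * field[i0][j1] + fx * fy * field[i1][j1])


def swimmer_step(X, p, o, u, omega, Vs, B, dt):
    """One explicit Euler step of eqs. (3)-(4) in 2D.
    X, p, o: position, swimming direction, control direction (2-tuples);
    u, omega: fluid velocity (2-tuple) and vorticity interpolated at X."""
    o_dot_p = o[0] * p[0] + o[1] * p[1]
    # dp/dt = [o - (o.p) p] / (2B) + (omega/2) z x p, with z x p = (-p_y, p_x)
    dp = ((o[0] - o_dot_p * p[0]) / (2 * B) - 0.5 * omega * p[1],
          (o[1] - o_dot_p * p[1]) / (2 * B) + 0.5 * omega * p[0])
    X_new = (X[0] + dt * (u[0] + Vs * p[0]), X[1] + dt * (u[1] + Vs * p[1]))
    px, py = p[0] + dt * dp[0], p[1] + dt * dp[1]
    norm = math.hypot(px, py)                # keep |p| = 1 after the Euler step
    return X_new, (px / norm, py / norm)


def control_parameters(Vs, B, u_rms, omega_rms):
    """Return (V_s tilde, B tilde) = (Vs / u_rms, B * omega_rms)."""
    return Vs / u_rms, B * omega_rms
```

In practice, u and ω would be obtained from bilinear_periodic applied to the DNS fields at X before each call to swimmer_step.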

III Adversarial Q-learning for smart microswimmers

Designing a strategy consists in choosing the control direction ô appropriately, as a function of the instantaneous state of the microswimmer, in order to minimize the mean arrival time ⟨T⟩. To develop a tractable framework for Q-learning, we use a finite number of states: we discretize the fluid vorticity ω at the microswimmer’s location into 3 ranges of values, labelled by S_ω, and the angle θ between p̂ and the target direction T̂ into 4 ranges, labelled by S_θ, as shown in fig. 1; this gives 3 × 4 = 12 states S = (S_ω, S_θ). The choice of ô is then reduced to a map from S to an action set A, which we also discretize into the following four possible actions: A = {T̂, −T̂, T̂_⊥, −T̂_⊥}, where T̂ = (X_T − X)/|X_T − X| is the unit vector pointing from the swimmer to its target and T̂_⊥ is the unit vector perpendicular to it (T̂ · T̂_⊥ = 0). Therefore, for the naïve strategy, ô = T̂ for all S. This strategy is optimal when the flow has a negligible effect on the microswimmers, which then have almost ballistic dynamics and move swiftly to the target. In a turbulent flow, however, vortices affect the microswimmers substantially, so we have to develop a nontrivial Q-learning strategy, in which ô is a function of S_ω and S_θ.
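A sketch of the 2D state and action definitions follows. Several details are assumptions, since they are not fixed in the text: the three vorticity ranges are taken to be ω < −ω_0, |ω| ≤ ω_0, and ω > ω_0 for a hypothetical cut-off ω_0; the four angle ranges are taken to be the quadrants of the signed angle from T̂ to p̂; and the four actions are assumed to be ±T̂ and ±T̂_⊥, with T̂_⊥ taken to be T̂ rotated by +90°.

```python
import math


def vorticity_state(omega, omega0):
    """S_omega in {0, 1, 2}: omega < -omega0, |omega| <= omega0, omega > omega0.
    omega0 is a hypothetical cut-off."""
    if omega < -omega0:
        return 0
    if omega > omega0:
        return 2
    return 1


def angle_state(p, T):
    """S_theta in {0, 1, 2, 3}: quadrant of the signed angle from T to p."""
    theta = math.atan2(T[0] * p[1] - T[1] * p[0], T[0] * p[0] + T[1] * p[1])
    return min(int((theta + math.pi) // (math.pi / 2)), 3)


def state_index(s_omega, s_theta):
    """Flatten (S_omega, S_theta) into one of 3 x 4 = 12 states."""
    return 4 * s_omega + s_theta


def target_direction(X, XT):
    """Unit vector T = (X_T - X) / |X_T - X|."""
    dx, dy = XT[0] - X[0], XT[1] - X[1]
    r = math.hypot(dx, dy)
    return dx / r, dy / r


def actions(T):
    """Assumed action set [T, -T, T_perp, -T_perp]; index 0 is the naive choice o = T."""
    t_perp = (-T[1], T[0])                   # T rotated by +90 degrees
    return [T, (-T[0], -T[1]), t_perp, (-t_perp[0], -t_perp[1])]
```

With this layout, the naïve strategy is simply "always pick action 0", whatever the state index.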

Figure 1: Left panel: a pseudocolor plot of the vorticity field, with a microswimmer represented by a small white circle; the black arrow on the microswimmer indicates its swimming direction p̂, the red arrow represents the direction towards the target, T̂, and θ is the angle between p̂ and T̂. Top-center panel: the three discretized vorticity states S_ω, coded red, green, and blue. Bottom-center panel: the color code (red, orange, blue, and gray) for the four discretized angle states S_θ. Right panel: all possible discrete states of the microswimmers, shown as colored squares whose lower half stands for the vorticity state S_ω and whose upper half represents the direction state S_θ.

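In our Q-learning scheme, we assign a quality value to each state-action pair of microswimmer i through a matrix Q^i(S, ô), where S is one of the 12 states and ô ∈ A; the control direction is then defined by ô = argmax_{ô′∈A} Q^i(S, ô′). At each iteration, ô is calculated in this way and the microswimmers are evolved by using eqs. (3) and (4). In the canonical Q-learning approach, each Q^i is evolved during the learning process by using the Bellman equation Sutton , whenever there is a state change, i.e., S(t_{n+1}) ≠ S(t_n):

$$Q^i(S_n, \hat{\mathbf{o}}_n) \leftarrow Q^i(S_n, \hat{\mathbf{o}}_n) + \alpha\left[\mathcal{R}_n + \gamma \max_{\hat{\mathbf{o}}\in\mathcal{A}} Q^i(S_{n+1}, \hat{\mathbf{o}}) - Q^i(S_n, \hat{\mathbf{o}}_n)\right], \tag{5}$$

where the learning rate α and the discount factor γ are learning parameters that are set to optimal values after some numerical exploration (see tab. 2), and R_n is the reward. For the path-planning problem, we define R_n as the decrease in the distance between the microswimmer and its target between successive state changes, R_n = |X(t_n) − X_T| − |X(t_{n+1}) − X_T|. According to eq. (5), any strategy for which this reward is positive on average can be a solution, and there exist many such solutions that are sub-optimal compared to the naïve strategy.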
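Equation (5) is the standard tabular Q-learning update. A minimal sketch follows, assuming Q is stored as a list of rows indexed by state and action (a hypothetical layout), with learning rate alpha and discount factor gamma, together with the distance-based reward used before the adversarial modification.

```python
import math


def distance_reward(X_prev, X_now, XT):
    """Non-adversarial reward: decrease of the distance to the target
    between two successive state changes."""
    return math.dist(X_prev, XT) - math.dist(X_now, XT)


def q_update(Q, s, a, reward, s_next, alpha, gamma):
    """Tabular Q-learning (Bellman) update of eq. (5), applied at a state change:
    Q[s][a] += alpha * (reward + gamma * max_a' Q[s_next][a'] - Q[s][a])."""
    td_target = reward + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]
```

With 12 states and 4 actions, Q would be initialized as [[0.0] * 4 for _ in range(12)].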

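To reduce the solution space, we propose an adversarial scheme: each microswimmer, the master, is accompanied by a slave microswimmer, with position X̃^i, that shares the same target at X_T^i and follows the naïve strategy, i.e., ô = T̂. Whenever the master undergoes a state change, the position and direction of the corresponding slave are re-initialized to those of the master, i.e., if S^i(t_{n+1}) ≠ S^i(t_n), then X̃^i = X^i and p̃^i = p̂^i (see fig. 2). The reward for the master microswimmer is then R_n = |X̃^i(t_{n+1}) − X_T^i| − |X^i(t_{n+1}) − X_T^i|, evaluated just before the slave is re-initialized: the master is rewarded only if it gets closer to the target than a naïve microswimmer that started from the same point at the previous state change, so only those changes that improve on the naïve strategy are favored.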
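The master-slave bookkeeping can be sketched as below; the class and method names are hypothetical. The reward compares the distances to the target of the master and of its naïve slave, which both started from the same point at the previous state change, and the slave is then re-initialized to the master's position and direction. Between state changes, the caller is assumed to advance the master with the learned control and the slave with ô = T̂.

```python
import math


class MasterSlavePair:
    """Hypothetical bookkeeping for one smart (master) microswimmer and its
    naive slave. Between state changes the caller advances the master with the
    learned control and the slave with o = T (e.g. via eqs. 3-4)."""

    def __init__(self, X, p):
        self.X, self.p = X, p      # master position and swimming direction
        self.Xs, self.ps = X, p    # slave starts at the master's position

    def on_state_change(self, XT):
        """Adversarial reward: how much closer to the target the master is than
        the naive slave that started from the same point; then reset the slave."""
        reward = math.dist(self.Xs, XT) - math.dist(self.X, XT)
        self.Xs, self.ps = self.X, self.p
        return reward
```

A negative reward signals that, over the last segment, the learned control did worse than simply pointing at the target.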

In the conventional Q-learning approach Watkins ; Survey , the Q matrices of different microswimmers evolve independently; each matrix is updated only after a state change, so a large number of iterations is required for Q to converge. To speed up this learning process, we use the following multi-swimmer, parallel-learning scheme: all the microswimmers share a common Q matrix, i.e., Q^i = Q for all i. At each iteration, we choose one microswimmer at random, from the set of microswimmers that have undergone a state change, to update the corresponding element of the Q matrix (flow chart in Appendix A); this ensures that the Q matrix is updated at almost every iteration, so it converges rapidly.
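A sketch of the shared-Q update follows, under the assumption that each iteration collects a (state, action, reward, next state) tuple for every microswimmer that has just changed state; one tuple, chosen at random, updates the single Q matrix with the Bellman rule of eq. (5). The function name and the tuple layout are hypothetical.

```python
import random


def shared_q_step(Q, transitions, alpha, gamma, rng=random):
    """Multi-swimmer learning with a single shared Q matrix: `transitions` holds
    (s, a, reward, s_next) for every microswimmer whose state changed during
    this iteration; one of them, chosen at random, updates Q via eq. (5).
    Returns the updated (s, a) pair, or None if no state changed."""
    if not transitions:
        return None
    s, a, reward, s_next = rng.choice(transitions)
    Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])
    return s, a
```

Passing a seeded random.Random instance as rng makes a learning run reproducible.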

Figure 2: Top-left panel: a schematic diagram illustrating the trajectories of master (black line) and slave (dashed black line) microswimmers, superimposed on a pseudocolor plot of the two-dimensional (2D) discretized vorticity state S_ω; the master undergoes a state change at the points shown by white filled circles, and white arrows indicate the re-setting of the slave’s trajectory. Top-right panel: color code for the control direction ô; for the states, see fig. 1. Bottom panel: control maps for the master and the slave; for the purpose of illustration, we use a simple control map for the master which, for the states visited here, leads to the circular path shown in our schematic diagram.

IV Numerical simulation

We use a pseudospectral DNS canuto ; pramanareview , with dealiasing, to solve eqs. (1) and (2). For time marching, we use a third-order Runge-Kutta scheme in 2D and the exponential Adams-Bashforth time-integration scheme in 3D; the time step δt is chosen such that the Courant-Friedrichs-Lewy (CFL) condition is satisfied. Table 1 gives the parameters for our DNSs in 2D and 3D, such as the number of collocation points N and the Taylor-microscale Reynolds number Re_λ = u_rms λ/ν, where the Taylor microscale is

$$\lambda = \left[\frac{\sum_k k^2 E(k)}{\sum_k E(k)}\right]^{-1/2},$$

with E(k) the energy spectrum.
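Given a shell-summed energy spectrum E(k), the Taylor microscale λ = [∑_k k² E(k) / ∑_k E(k)]^(−1/2) (a standard spectral definition, assumed here) and Re_λ = u_rms λ/ν follow directly. A minimal sketch, assuming E(k) is known at discrete shells k and u_rms is supplied separately, since its normalization convention is not restated in the text:

```python
def taylor_microscale(k, E):
    """lambda = [sum_k k^2 E(k) / sum_k E(k)]^(-1/2) for a shell-summed spectrum."""
    num = sum(ki ** 2 * Ei for ki, Ei in zip(k, E))
    return (num / sum(E)) ** -0.5


def reynolds_lambda(u_rms, lam, nu):
    """Taylor-microscale Reynolds number Re_lambda = u_rms * lambda / nu."""
    return u_rms * lam / nu
```

For a spectrum concentrated in a single shell k = 2, for instance, λ = 0.5.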

Table 1 (columns: 2D, 3D): Parameters of our DNSs: N, the number of collocation points; ν, the kinematic viscosity; μ, the coefficient of friction; δt, the time step; and Re_λ, the Taylor-microscale Reynolds number.

IV.1 Naïve microswimmers

We denote by ⟨T⟩ the average time taken by the microswimmers to reach their targets (see fig. 3). If T̂ is the unit vector pointing from the microswimmer to the target, then, when the flow has a negligible effect on the microswimmers, we expect the naïve strategy, i.e., ô = T̂, to be the optimal one. In our turbulent flows, however, we observe that the naïve strategy leads to the trapping of microswimmers (fig. 3(b)) and gives rise to exponential tails in the probability distribution function (PDF) P(T) of the arrival time T; in fig. 4 we plot the associated complementary cumulative distribution function (CCDF)

$$P^{>}(T) = \int_T^{\infty} P(T')\,dT'.$$

As a consequence of trapping, ⟨T⟩ is dominated by the exponential tail of the distribution, as can be seen from fig. 4.
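The CCDF and the rescaling by ⟨T⟩ used for the data collapse in fig. 4 can be computed from a list of arrival times as sketched below (function names hypothetical). The tail estimator relies on the memoryless property of the exponential distribution: if P^>(T) ∝ exp(−T/τ) in the tail, the mean excess of T over any threshold in the tail equals τ. This estimator is an illustrative choice, not necessarily the fit used for the figures.

```python
import bisect


def ccdf(samples):
    """Empirical CCDF P^>(T): fraction of arrival times strictly larger than T,
    evaluated at each distinct observed T."""
    ts = sorted(samples)
    n = len(ts)
    grid = sorted(set(ts))
    return grid, [(n - bisect.bisect_right(ts, t)) / n for t in grid]


def normalized_times(samples):
    """T / <T>, the rescaling used for the data collapse."""
    mean = sum(samples) / len(samples)
    return [t / mean for t in samples]


def tail_decay_time(samples, t_min):
    """For an exponential tail P^>(T) ~ exp(-T/tau), the mean excess of T over
    a threshold t_min in the tail estimates tau (memoryless property)."""
    excess = [t - t_min for t in samples if t > t_min]
    return sum(excess) / len(excess)
```

If the tail estimate τ is close to ⟨T⟩, the mean arrival time is indeed controlled by the trapped microswimmers.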

Figure 3: (a) Illustrative (blue) paths for two microswimmers, with their corresponding (yellow) circular target regions (the swimmer-target mapping is shown by red dashed lines), where each microswimmer is eventually absorbed and re-initialized. We consider random positions of targets and initialize each microswimmer at a fixed distance from its corresponding target, with randomized initial conditions; (b) a snapshot of the microswimmer distribution, superimposed on the vorticity field ω, for the naïve strategy. The initial distance of the microswimmers from their respective targets and the target radius are the same for all microswimmers, and the system has periodic boundary conditions in all directions.
Figure 4: Plots showing exponential tails in the CCDF P^>(T) for the naïve strategy, with different values of Ṽ_s and B̃. The inset shows how these data collapse when T is normalized, for each curve, by the corresponding ⟨T⟩, which implies that ⟨T⟩ sets the decay time of the exponential tail.

IV.2 Smart microswimmers

In our approach, the random initial positions of the microswimmers ensure that they explore different states without re-initialization for each epoch. Hence, we present results with 10000 microswimmers for a single epoch. In our single-epoch approach, the control map S ↦ ô reaches a steady state once the learning process is complete (fig. 5(b)).

We use the adversarial Q-learning approach outlined above (parameter values in tab. 2) to arrive at an optimal scheme for path planning in a 2D turbulent flow. To quantify the performance of the smart microswimmers, we introduce equal numbers of smart (master-slave pairs) and naïve microswimmers into the flow. The scheme presented here pits Q-learning against the naïve strategy and enables the adversarial algorithm to find a strategy that can outperform the naïve one. (Without the adversarial approach, the final strategy that is obtained may be sub-optimal.)

Table 2: List of learning-parameter values: γ is the learning discount (discount factor), α is the learning rate, ε is the probability of noise in learning, i.e., with probability ε the actions are scrambled; a cut-off is used for defining the vorticity states S_ω, and ω_rms is the rms value of ω.
Figure 5: Learning statistics: (a) Plot of the moving average ⟨T⟩_t of the arrival time versus t, in 2D. Adversarial Q-learning initially shows a transient behavior, before settling to a lower value of ⟨T⟩_t than that for the naïve strategy. (b) The evolution of the control map S ↦ ô, where the color codes represent the actions that are performed for each of the 12 states. Initially, Q-learning explores different strategies; it then settles down to a control map that consistently shows improved performance relative to the naïve strategy.
Figure 6: The dependence of ⟨T⟩ on the control parameters Ṽ_s and B̃, shown for the naïve strategy (dotted line) and for adversarial Q-learning (solid line), for our 2D turbulent flow. The plot shows that, in the parameter space that we have explored, our adversarial Q-learning method yields a lower value of ⟨T⟩ than the naïve strategy. The inset shows that the CCDF of T has an exponential tail.

V Results

To show the progress in our learning scheme, we calculate the moving average of arrival times, ⟨T⟩_t, i.e., the mean arrival time of the microswimmers absorbed by the targets between the times t and t + Δt, with Δt the bin size. Figures 5(a) and 5(b) show the evolution of ⟨T⟩_t and of the control map S ↦ ô, respectively, for the naïve strategy and our adversarial Q-learning scheme. After the initial learning phase, the Q-learning algorithm explores different control maps before it settles down to a steady state. It is not obvious, a priori, whether there exists a stable, non-trivial, optimal strategy for microswimmers in turbulent flows that could outperform the naïve strategy. Figure 6 shows the improved performance of our adversarial Q-learning scheme over the naïve strategy, for different values of Ṽ_s and B̃; in these plots we compute ⟨T⟩ only from arrivals at late times, so that the initial transient behavior in learning is excluded. The inset in fig. 6 shows that P^>(T) has an exponential tail, just like that for the naïve strategy in fig. 4, which implies that the smart microswimmers also get trapped; but the lower value of ⟨T⟩ implies that they are able to escape from the traps faster than microswimmers that employ the naïve strategy.
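The moving average and the exclusion of the learning transient can be sketched as follows. Each absorbed microswimmer is assumed to contribute a record (absorption time, arrival time T), and the bins are assumed to be the half-open intervals [t, t + Δt); the function names are hypothetical.

```python
import math


def binned_mean_arrival(arrivals, dt_bin):
    """Moving average <T>_t: mean arrival time T of the microswimmers absorbed
    in each time bin [t, t + dt_bin). `arrivals` holds (absorption_time, T)."""
    sums, counts = {}, {}
    for t_abs, T in arrivals:
        b = math.floor(t_abs / dt_bin)          # integer bin index
        sums[b] = sums.get(b, 0.0) + T
        counts[b] = counts.get(b, 0) + 1
    return [(b * dt_bin, sums[b] / counts[b]) for b in sorted(sums)]


def late_time_mean(arrivals, t_start):
    """<T> from arrivals after t_start only, to exclude the learning transient."""
    late = [T for t_abs, T in arrivals if t_abs >= t_start]
    return sum(late) / len(late)
```

Running both functions on the naïve and on the smart populations gives the two curves compared in figs. 5(a) and 6.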

In a 3D turbulent flow, too, our adversarial Q-learning approach yields an improvement over the naïve strategy. The details of the 3D flows, the parameters, and the definitions of states and actions are given in Appendix B. In fig. 7 we show a representative plot of the performance measure ⟨T⟩_t, which demonstrates this improvement in the 3D case (cf. fig. 5 for a 2D turbulent flow).

Figure 7: The performance trend, ⟨T⟩_t versus t, for adversarial Q-learning (blue) and the naïve strategy (red), for microswimmers in a 3D homogeneous and isotropic turbulent flow, for one representative pair of values of Ṽ_s and B̃. The trend shows a slow rise in performance, similar to that observed in 2D. In 3D, Q-learning is performed by using the 13 states and 6 actions defined in Appendix B.

VI Conclusions

We have shown that the generic Q-learning approach can be adapted to solve control problems arising in complex dynamical systems. Global information about the flow has been used, in a Hamilton-Jacobi-Bellman approach, to improve the efficiency of path planning for autonomous underwater vehicles HJB . In contrast, we present a scheme that uses only local flow parameters for path planning.

The flow parameters (tab. 1) and the learning parameters (tab. 2) have a significant impact on the performance of our adversarial Q-learning method. Even the choice of observables that we use to define the states S can be changed and experimented with. Furthermore, the discretization process can be eliminated by using deep-learning approaches, which can handle continuous inputs and outputs Deep_Q . Our formulation of the optimal-path-planning problem for microswimmers in a turbulent flow is a natural starting point for detailed studies of control problems in turbulent flows.

We thank DST and CSIR (India) and the Indo-French Centre for Applied Mathematics (IFCAM) for support.


  • (1) Zermelo E (1931) Über das navigationsproblem bei ruhender oder veränderlicher windverteilung. ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik 11(2):114–124.
  • (2) Brunton S, Noack B, Koumoutsakos P (2019) Machine Learning for Fluid Mechanics. arXiv e-prints p. arXiv:1905.11075.
  • (3) Reddy G, Celani A, Sejnowski TJ, Vergassola M (2016) Learning to soar in turbulent environments. Proceedings of the National Academy of Sciences 113(33):E4877–E4884.
  • (4) Colabrese S, Gustavsson K, Celani A, Biferale L (2017) Flow navigation by smart microswimmers via reinforcement learning. Physical Review Letters 118(15):158004.
  • (5) Gustavsson K, Biferale L, Celani A, Colabrese S (2017) Finding efficient swimming strategies in a three-dimensional chaotic flow by reinforcement learning. The European Physical Journal E 40(12):110.
  • (6) Biferale L, Bonaccorso F, Buzzicotti M, Clark Di Leoni P, Gustavsson K (2019) Zermelo’s problem: Optimal point-to-point navigation in 2D turbulent flows using Reinforcement Learning. arXiv e-prints p. arXiv:1907.08591.
  • (7) Dusenbery D (2009) Living at Micro Scale: The Unexpected Physics of Being Small. (Harvard University Press).
  • (8) Durham WM, et al. (2013) Turbulence drives microscale patches of motile phytoplankton. Nature Communications 4:2148.
  • (9) Michalec FG, Souissi S, Holzner M (2015) Turbulence triggers vigorous swimming but hinders motion strategy in planktonic copepods. Journal of the Royal Society Interface 12(106):20150158.
  • (10) Verma S, Novati G, Koumoutsakos P (2018) Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of Sciences of the United States of America 115(23):5849–5854.
  • (11) Barrows E (2011) Animal Behavior Desk Reference: A Dictionary of Animal Behavior, Ecology, and Evolution, Third Edition. (Taylor & Francis).
  • (12) Pandit R, et al. (2017) An overview of the statistical properties of two-dimensional turbulence in fluids with particles, conducting fluids, fluids with polymer additives, binary-fluid mixtures, and superfluids. Physics of Fluids 29(11):111112.
  • (13) Pedley TJ, Kessler JO (1992) Hydrodynamic phenomena in suspensions of swimming microorganisms. Annual Review of Fluid Mechanics 24(1):313–358.
  • (14) Sutton RS, Barto AG (2011) Reinforcement learning: An introduction. (Cambridge, MA: MIT Press).
  • (15) Watkins CJ, Dayan P (1992) Technical note: Q-learning. Machine Learning 8(3):279–292.
  • (16) Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4:237–285.

  • (17) (2019). See Supplementary material.
  • (18) Canuto C, Hussaini MY, Quarteroni A, Zang TA (2006) Spectral methods. (Springer).
  • (19) Pandit R, Perlekar P, Ray SS (2009) Statistical properties of turbulence: an overview. Pramana 73(1):157.
  • (20) Kularatne D, Bhattacharya S, Hsieh MA (2018) Optimal path planning in time-varying flows using adaptive discretization. IEEE Robotics and Automation Letters 3(1):458–465.
  • (21) Lillicrap TP, et al. (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.

Appendix A Flowchart

Figure 8 shows the sequence of processes involved in our adversarial Q-learning scheme, in terms of the iteration number and the number of sessions. We use a greedy action, i.e., the action corresponding to the maximum value in the Q matrix, for the current state of the microswimmer, is performed; ε-noise ensures that, with probability ε, the actions are scrambled. Furthermore, we find that episodic updating of the values of the Q matrix leads to a deterioration of performance; therefore, we use continuous updating of Q.
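The greedy action selection with ε-noise can be sketched as follows; the function name is hypothetical, and "scrambling" is interpreted here as replacing the greedy action by one drawn uniformly from all actions.

```python
import random


def choose_action(Q, s, eps, rng=random):
    """Greedy action argmax_a Q[s][a]; with probability eps the action is
    scrambled, i.e. replaced by a uniformly random action."""
    n_actions = len(Q[s])
    if rng.random() < eps:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[s][a])
```

With eps = 0 this reduces to the purely greedy policy; ties are resolved in favour of the lowest action index.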

Figure 8: This flow chart shows the sequence of processes involved in our adversarial Q-learning algorithm.

Appendix B State and action definitions for 3D turbulent flow

From our DNS of the 3D Navier-Stokes equations, we obtain a statistically steady, homogeneous, and isotropic turbulent flow in a periodic domain, into which we introduce passive microswimmers. To define the states, we fix a coordinate triad at each microswimmer, as shown in fig. 9; it is constructed from T̂, the unit vector pointing from the microswimmer to the target, and ω, the vorticity pseudo-vector, with the remaining unit vectors fixed by orthogonality conditions. This coordinate system is ill-defined if ω is parallel to T̂. To implement our Q-learning in 3D, we define 13 states (see fig. 10) and 6 actions; consequently, the Q matrix is an array of size 13 × 6.
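One natural way to build such a frame from T̂ and ω, together with the 13 × 6 Q matrix, is sketched below. The Gram-Schmidt construction is an assumption, chosen because it is consistent with the frame being ill-defined when ω is parallel to T̂; the exact construction is the one shown in fig. 9, and the function names are hypothetical.

```python
import math


def triad(T, omega, tol=1e-12):
    """Hypothetical Gram-Schmidt frame: e1 = T (unit target direction), e2 along the
    part of omega perpendicular to T, e3 = e1 x e2. Returns None when omega is
    (nearly) parallel to T, where such a frame is ill-defined."""
    w_par = sum(t * w for t, w in zip(T, omega))
    perp = [w - w_par * t for w, t in zip(omega, T)]
    norm = math.sqrt(sum(c * c for c in perp))
    if norm < tol:
        return None
    e1 = list(T)
    e2 = [c / norm for c in perp]
    e3 = [e1[1] * e2[2] - e1[2] * e2[1],
          e1[2] * e2[0] - e1[0] * e2[2],
          e1[0] * e2[1] - e1[1] * e2[0]]
    return e1, e2, e3


def components(v, frame):
    """Components of v in the triad, e.g. for binning the swimming direction."""
    return [sum(a * b for a, b in zip(v, e)) for e in frame]


N_STATES, N_ACTIONS = 13, 6
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # shared 13 x 6 Q matrix
```

The components of p̂ in this frame are the kind of quantity that a 3D analogue of the 2D angle states would bin.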

Figure 9: We define a Cartesian coordinate system by using the orthonormal triad described above; thus, all vectorial quantities are represented in terms of this observer-independent coordinate system.
Figure 10: Discretization of states in 3D: for each particle, we define a spherical-polar coordinate system whose axes are aligned with the triad of fig. 9. We define the canonical polar and azimuthal angles θ and φ, and we discretize the states into 13, based on the magnitude of ω, compared with two state-definition parameters, and on the direction of p̂ with respect to the triad defined in fig. 9.