I Introduction
Machine-learning techniques and advances in computational facilities have led to significant improvements in obtaining solutions to optimization problems, e.g., to problems in path planning and optimal transport, referred to in control systems as Zermelo’s navigation problem Zermelo . With vast amounts of data available from experiments and simulations in fluid dynamics, machine-learning techniques are being used to extract information that is useful for controlling and optimizing flows ML_FM . Recent studies include the use of reinforcement learning in fluid-flow settings, e.g., (a) to optimize the soaring of a glider in thermal currents Soaring and (b) to develop optimal schemes in two-dimensional (2D) and three-dimensional (3D) fluid flows that are time independent PRL ; 3D ; Biferale2 . Optimal locomotion, in response to stimuli, is also important in biological systems ranging from cells and microorganisms Dusenbery ; Durham ; Michalec to birds, animals, and fish Fish ; such locomotion is often termed taxis Barrows .
It behooves us, therefore, to explore machine-learning strategies for optimal path planning by microswimmers in turbulent fluid flows. We initiate such a study for microswimmers in 2D and 3D turbulent flows. In particular, we consider a dynamic-path-planning problem that seeks to minimize the average time taken by microswimmers to reach a given target, while moving in a turbulent fluid flow that is statistically homogeneous and isotropic. We develop a novel, multi-swimmer, adversarial-learning algorithm to optimize the motion of such microswimmers that try to swim towards a specified target (or targets). Our adversarial-learning approach ensures that the microswimmers perform at least as well as those that adopt the following naïve strategy: at any instant of time and at a given position in space, a naïve microswimmer tries to point in the direction of the target. We examine the efficacy of this approach as a function of the following two dimensionless control parameters: (a) Ṽ_s ≡ V_s/u_rms, where the microswimmer’s bare velocity is V_s and the turbulent fluid has the root-mean-square velocity u_rms; and (b) B̃ ≡ B ω_rms, where B is the microswimmer-response time and ω_rms the rms vorticity of the fluid. We show, by extensive direct numerical simulations (DNSs), that, in a substantial part of the (B̃, Ṽ_s) plane, the average time ⟨T⟩, required by a microswimmer to reach a target at a fixed distance, is lower if it uses our adversarial-learning scheme than if it uses the naïve strategy.
II Background flow and microswimmer dynamics
For the low-Mach-number flows we consider, the fluid-flow velocity u satisfies the incompressible Navier-Stokes (NS) equation. In two dimensions (2D), we write the NS equations in the conventional vorticity-streamfunction form, which accounts for incompressibility in 2D RP_Review :
∂_t ω + (u · ∇) ω = ν ∇²ω − α ω + F_ω ;  (1)
here, u is the fluid velocity, ν is the kinematic viscosity, α is the coefficient of friction (present in 2D, e.g., because of air drag or bottom friction), and the vorticity ω = ∇ × u is normal to u in 2D. The 3D incompressible NS equations are
∂_t u + (u · ∇) u = −∇p + ν ∇²u + f ,  ∇ · u = 0 ;  (2)
p is the pressure and the density of the incompressible fluid is taken to be 1; the large-scale forcing F_ω (large-scale random forcing in 2D) or f (constant energy injection in 3D) maintains the statistically steady, homogeneous, and isotropic turbulence, for which it is natural to use periodic boundary conditions.
We consider a collection of passive, non-interacting microswimmers in the turbulent flow; X_i and p̂_i are the position and swimming direction of the i-th microswimmer. Each microswimmer is assigned a target. We are interested in minimizing the time T required by a microswimmer, which is released at a given distance from its target, to approach within a small distance of this target. The microswimmer’s position and swimming direction evolve as follows Pedley :
dX_i/dt = u(X_i, t) + V_s p̂_i ,  (3)
dp̂_i/dt = [ô_i − (ô_i · p̂_i) p̂_i] / (2B) + (1/2) ω × p̂_i ;  (4)
here, we use bilinear (trilinear) interpolation in 2D (3D) to determine the fluid velocity u at the microswimmer’s position from our DNS of eqs. 1 and 2; V_s is the swimming velocity, B is the timescale over which the microswimmer aligns with the flow, and ô is the control direction. Equation 4 implies that p̂ tries to align along ô. We define the following non-dimensional control parameters: Ṽ_s = V_s/u_rms, where u_rms is the root-mean-square (rms) fluid-flow velocity, and B̃ = B ω_rms, where ω_rms⁻¹, the inverse of the rms vorticity, is the timescale of the flow.

III Adversarial learning for smart microswimmers
Designing a strategy consists in choosing appropriately the control direction ô, as a function of the instantaneous state of the microswimmer, in order to minimize the mean arrival time ⟨T⟩. To develop a tractable framework for learning, we use a finite number of states by discretizing the fluid vorticity at the microswimmer’s location into 3 ranges of values and the angle θ, between the swimming direction and the direction to the target, into 4 ranges, as shown in fig. 1. The choice of ô is then reduced to a map from this set of states to an action set, which we also discretize into the following four possible actions: ô ∈ {T̂, −T̂, T̂⊥, −T̂⊥}, where T̂ is the unit vector pointing from the swimmer to its target and T̂⊥ is perpendicular to T̂. Therefore, for the naïve strategy, ô = T̂. This strategy is optimal if Ṽ_s ≫ 1: microswimmers then have an almost ballistic dynamics and move swiftly to the target. For Ṽ_s ≲ 1, vortices affect the microswimmers substantially, so we have to develop a nontrivial learning strategy, in which ô is a function of ω and θ.

In our learning scheme, we assign a quality value Q(S, A) to each state-action pair of each microswimmer, where S is the state and A the action; the control direction ô is that given by the action argmax_A Q(S, A). At each iteration, ô is calculated as above and the microswimmer evolution is performed by using eqs. 3 and 4. In the canonical learning approach, during the learning process, each of the Q’s is evolved by using the Bellman equation Sutton below, whenever there is a state change, i.e., S_{t+1} ≠ S_t:
Q(S_t, A_t) ← Q(S_t, A_t) + η [ r + Γ max_A Q(S_{t+1}, A) − Q(S_t, A_t) ] ;  (5)
here, η and Γ are learning parameters that are set to optimal values after some numerical exploration (see tab. 2), and r is the reward function, which, for the path-planning problem, we define in terms of the microswimmer’s progress towards its target. According to eq. 5, any strategy for which r is positive can be a solution, and there exist many such solutions that are suboptimal compared to the naïve strategy.
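One tabular update of eq. 5 can be sketched as follows; this is a minimal illustration, not the code used in our DNSs, and the list-of-lists layout of the Q matrix is an assumption.

```python
def q_update(Q, s, a, r, s_next, eta, gamma):
    """One Bellman/Q-learning update (eq. 5): move Q[s][a] towards the
    reward plus the discounted best value attainable from the next state.
    eta is the learning rate; gamma plays the role of Gamma in the text."""
    best_next = max(Q[s_next])
    Q[s][a] += eta * (r + gamma * best_next - Q[s][a])
    return Q
```

The update is applied only on a state change, so Q converges towards the expected discounted reward of each state-action pair.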
To reduce the solution space, we propose an adversarial scheme: each microswimmer, the master, is accompanied by a slave microswimmer that shares the same target and follows the naïve strategy, i.e., ô = T̂. Now, whenever the master undergoes a state change, the corresponding slave’s position and direction are reinitialized to those of the master (see fig. 2). The reward function for the master microswimmer then compares its progress with that of its slave; i.e., only those changes that improve on the naïve strategy are favored.
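The master-slave bookkeeping at a state change can be sketched as below; the distance-based form of the reward and the dictionary layout are illustrative assumptions, consistent with the requirement that only improvements over the naïve slave be rewarded.

```python
import numpy as np

def master_reward_and_reset(master, slave, target):
    """One bookkeeping step of the adversarial scheme at a master state
    change: the reward is the slave's distance to the target minus the
    master's (positive only if the master out-performed the naive slave
    since the last reset); the slave is then re-initialized onto the master.
    The dict layout ('X' position, 'p' direction) is an assumption."""
    r = np.linalg.norm(slave['X'] - target) - np.linalg.norm(master['X'] - target)
    slave['X'] = master['X'].copy()  # reset slave position onto the master
    slave['p'] = master['p'].copy()  # reset slave swimming direction
    return r
```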
In the conventional learning approach Watkins ; Survey , the Q matrices of each microswimmer evolve independently; such a matrix is updated only after a state change, so a large number of iterations is required for the convergence of Q. To speed up this learning process, we use the following multi-swimmer, parallel-learning scheme: all the microswimmers share a common Q matrix. At each iteration, we choose one microswimmer at random, from the set of microswimmers that have undergone a state change, to update the corresponding element of the Q matrix (flow chart in Appendix A); this ensures that the Q matrix is updated at almost every iteration, so it converges rapidly.
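The shared-matrix step can be sketched as follows; the per-swimmer record of its last transition (a dictionary with keys 's', 'a', 'r', 's_next', 'changed') is an illustrative assumption.

```python
import random

def shared_q_step(Q, swimmers, eta, gamma):
    """Parallel-learning step: all swimmers share one Q matrix; among the
    swimmers that changed state this iteration, one is picked at random and
    only its transition updates Q, so Q is updated at almost every iteration."""
    changed = [sw for sw in swimmers if sw['changed']]
    if not changed:
        return Q  # no state change anywhere: Q is left untouched
    sw = random.choice(changed)
    best_next = max(Q[sw['s_next']])
    Q[sw['s']][sw['a']] += eta * (sw['r'] + gamma * best_next - Q[sw['s']][sw['a']])
    return Q
```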
IV Numerical simulation
We use a pseudospectral DNS canuto ; pramanareview , with the 2/3-dealiasing rule, to solve eqs. 1 and 2. For time marching we use a third-order Runge-Kutta scheme in 2D and the exponential Adams-Bashforth time-integration scheme in 3D; the time step is chosen such that the Courant-Friedrichs-Lewy (CFL) condition is satisfied. Table 1 gives the parameters for our DNSs in 2D and 3D, such as the number of collocation points and the Taylor-microscale Reynolds numbers Re_λ, where λ is the Taylor microscale.
Table 1: DNS parameters in 2D and 3D (number of collocation points and Taylor-microscale Reynolds numbers Re_λ).
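The dealiasing step of a pseudospectral scheme, assuming the standard 2/3 rule, can be sketched as follows; this is a generic illustration, not our production solver.

```python
import numpy as np

def dealias_23(f_hat):
    """Apply the 2/3-dealiasing rule to a 2D array of Fourier coefficients:
    zero all modes whose integer wavenumber exceeds, in modulus, one third of
    the grid size (i.e., two thirds of the maximum resolved wavenumber)."""
    n0, n1 = f_hat.shape
    k0 = np.fft.fftfreq(n0) * n0  # integer wavenumbers along axis 0
    k1 = np.fft.fftfreq(n1) * n1  # integer wavenumbers along axis 1
    mask = (np.abs(k0)[:, None] <= n0 // 3) & (np.abs(k1)[None, :] <= n1 // 3)
    return f_hat * mask
```

Applying this mask to the nonlinear term at every step removes the aliasing errors produced by products computed in physical space.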
IV.1 Naïve microswimmers
The average time taken by the microswimmers to reach their targets is ⟨T⟩ (see fig. 3). If T̂ is the unit vector pointing from the microswimmer to the target, then for Ṽ_s ≫ 1 we expect the naïve strategy, i.e., ô = T̂, to be the optimal one. For Ṽ_s ≲ 1, we observe that the naïve strategy leads to the trapping of microswimmers (fig. 3(b)) and gives rise to exponential tails in the arrival-time probability distribution function (PDF); in fig. 4 we plot the associated complementary cumulative distribution function (CCDF). As a consequence of trapping, ⟨T⟩ is dominated by the exponential tail of the distribution, as can be seen from fig. 4.

IV.2 Smart microswimmers
In our approach, the random initial positions of the microswimmers ensure that they explore different states without reinitialization for each epoch. Hence, we present results with 10000 microswimmers for a single epoch. In our single-epoch approach, the control map reaches a steady state once the learning process is complete (fig. 5(b)). We use the adversarial-learning approach outlined above (parameter values in tab. 2) to arrive at the optimal scheme for path planning in a 2D turbulent flow. To quantify the performance of the smart microswimmers, we introduce equal numbers of smart (master-slave pairs) and naïve microswimmers into the flow. The scheme presented here pits learning against the naïve strategy and enables the adversarial algorithm to find a strategy that can outperform the naïve one. (Without the adversarial approach, the final strategy that is obtained may end up being suboptimal.)
V Results
To show the progress in our learning scheme, we calculate the moving average of arrival times, i.e., the average over those microswimmers absorbed by the targets within the latest time bin of fixed size. Figures 5(a) and 5(b) show the evolution of this moving average and of the control map, respectively, for the naïve strategy and our adversarial-learning scheme. After the initial learning phase, the learning algorithm explores different strategies, before it settles down to a steady state. It is not obvious, a priori, if there exists a stable, nontrivial, optimal strategy, for microswimmers in turbulent flows, that could outperform the naïve strategy. The plot in fig. 6 shows the improved performance of our adversarial-learning scheme over the naïve strategy, for different values of Ṽ_s and B̃; in these plots we exclude the initial transient behavior in learning. The inset in fig. 6 shows that the arrival-time distribution has an exponential tail, just like that for the naïve scheme in fig. 4, which implies that the smart microswimmers also get trapped; but a lower value of ⟨T⟩ implies they are able to escape from the traps faster than microswimmers that employ the naïve strategy.
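The windowed average used here can be sketched as follows; the variable names are illustrative assumptions.

```python
import numpy as np

def moving_avg_arrival(absorb_times, arrival_times, t, tau):
    """Moving average of arrival times over the window (t - tau, t]:
    average arrival_times over the swimmers absorbed by their targets within
    the last bin of size tau; returns nan if none were absorbed there."""
    absorb_times = np.asarray(absorb_times)    # when each swimmer was absorbed
    arrival_times = np.asarray(arrival_times)  # time each swimmer took to arrive
    in_bin = (absorb_times > t - tau) & (absorb_times <= t)
    return arrival_times[in_bin].mean() if in_bin.any() else float('nan')
```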
In a 3D turbulent flow we also obtain such an improvement, with our adversarial-learning approach, over the naïve strategy. The details of the 3D flows, the parameters, and the definitions of states and actions are given in Appendix B. In fig. 7 we show a representative plot of the performance measure, which demonstrates this improvement in the 3D case (cf. fig. 5 for a 2D turbulent flow).
VI Conclusions
We have shown that a generic learning approach can be adopted to solve control problems that arise in complex dynamical systems. Global information about the flow has been used in path-planning problems for autonomous-underwater-vehicle navigation, to improve their efficiency, based on the Hamilton-Jacobi-Bellman approach HJB . In contrast, we present a scheme that uses only local flow parameters for the path planning.
The flow parameters (tab. 1) and the learning parameters (tab. 2) have a significant impact on the performance of our adversarial-learning method. Even the choice of observables that we use to define the states can be changed and experimented with. Furthermore, the discretization process can be eliminated by using deep-learning approaches, which can handle continuous inputs and outputs Deep_Q . Our formulation of the optimal-path-planning problem for microswimmers in a turbulent flow is a natural starting point for detailed studies of control problems in turbulent flows.

We thank DST and CSIR (India) and the Indo-French Centre for Applied Mathematics (IFCAM) for support.
References
 (1) Zermelo E (1931) Über das Navigationsproblem bei ruhender oder veränderlicher Windverteilung. ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik 11(2):114–124.
 (2) Brunton S, Noack B, Koumoutsakos P (2019) Machine learning for fluid mechanics. arXiv e-prints arXiv:1905.11075.
 (3) Reddy G, Celani A, Sejnowski TJ, Vergassola M (2016) Learning to soar in turbulent environments. Proceedings of the National Academy of Sciences 113(33):E4877–E4884.
 (4) Colabrese S, Gustavsson K, Celani A, Biferale L (2017) Flow navigation by smart microswimmers via reinforcement learning. Physical review letters 118(15):158004.
 (5) Gustavsson K, Biferale L, Celani A, Colabrese S (2017) Finding efficient swimming strategies in a threedimensional chaotic flow by reinforcement learning. The European Physical Journal E 40(12):110.
 (6) Biferale L, Bonaccorso F, Buzzicotti M, Clark Di Leoni P, Gustavsson K (2019) Zermelo’s problem: Optimal point-to-point navigation in 2D turbulent flows using reinforcement learning. arXiv e-prints arXiv:1907.08591.
 (7) Dusenbery D (2009) Living at Micro Scale: The Unexpected Physics of Being Small. (Harvard University Press).
 (8) Durham WM, et al. (2013) Turbulence drives microscale patches of motile phytoplankton. Nature communications 4:2148.
 (9) Michalec FG, Souissi S, Holzner M (2015) Turbulence triggers vigorous swimming but hinders motion strategy in planktonic copepods. Journal of the Royal Society Interface 12(106):20150158.
 (10) Verma S, Novati G, Koumoutsakos P (2018) Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of Sciences of the United States of America 115(23):5849–5854.
 (11) Barrows E (2011) Animal Behavior Desk Reference: A Dictionary of Animal Behavior, Ecology, and Evolution, Third Edition. (Taylor & Francis).
 (12) Pandit R, et al. (2017) An overview of the statistical properties of two-dimensional turbulence in fluids with particles, conducting fluids, fluids with polymer additives, binary-fluid mixtures, and superfluids. Physics of Fluids 29(11):111112.
 (13) Pedley TJ, Kessler JO (1992) Hydrodynamic phenomena in suspensions of swimming microorganisms. Annual Review of Fluid Mechanics 24(1):313–358.
 (14) Sutton RS, Barto AG (2011) Reinforcement learning: An introduction. (Cambridge, MA: MIT Press).
 (15) Watkins CJ, Dayan P (1992) Technical note: Q-learning. Machine Learning 8(3):279–292.

 (16) Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4:237–285.
 (17) (2019). See Supplementary material.
 (18) Canuto C, Hussaini MY, Quarteroni A, Zang TA (2006) Spectral methods. (Springer).
 (19) Pandit R, Perlekar P, Ray SS (2009) Statistical properties of turbulence: an overview. Pramana 73(1):157.
 (20) Kularatne D, Bhattacharya S, Hsieh MA (2018) Optimal path planning in time-varying flows using adaptive discretization. IEEE Robotics and Automation Letters 3(1):458–465.
 (21) Lillicrap TP, et al. (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
Appendix A Flowchart
Figure 8 shows the sequence of processes involved in our adversarial-learning scheme; in it, counters track the iteration number and the number of sessions. We use a greedy action selection, in which the action corresponding to the maximum value in the Q matrix, for the state of the microswimmer, is performed; exploration noise ensures, with a small probability, that the actions are scrambled. Furthermore, we find that episodic updating of the values in the Q matrix leads to a deterioration of performance; therefore, we use continuous updating of Q.
Appendix B State and action definitions for 3D turbulent flow
From our DNS of the 3D Navier-Stokes equation we obtain a statistically steady, homogeneous, and isotropic turbulent flow in a periodic domain. We introduce passive microswimmers into this flow. To define the states, we fix a coordinate triad (T̂, ê, T̂ × ê), as shown in fig. 9; here, T̂ is the unit vector pointing from the microswimmer to the target, ω is the vorticity pseudovector, and ê is the unit vector defined by the conditions ê · T̂ = 0 and ê lying in the plane spanned by T̂ and ω. This coordinate system is ill-defined if ω is parallel to T̂. To implement our learning in 3D, we define 13 states (see fig. 10) and 6 actions; consequently, the Q matrix is an array of size 13 × 6.
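The triad construction amounts to a Gram-Schmidt step; a minimal sketch follows, assuming the vorticity has a nonzero component perpendicular to T̂ (otherwise the triad is ill-defined, as noted above).

```python
import numpy as np

def state_triad(T_hat, omega):
    """Build the orthonormal triad (T_hat, e_hat, T_hat x e_hat) used to
    define the 3D states: e_hat is the unit vector orthogonal to T_hat in
    the plane spanned by T_hat and the vorticity omega (Gram-Schmidt)."""
    e = omega - np.dot(omega, T_hat) * T_hat  # remove component along T_hat
    e_hat = e / np.linalg.norm(e)             # zero norm if omega || T_hat
    return T_hat, e_hat, np.cross(T_hat, e_hat)
```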