Machine-learning techniques and advances in computational facilities have led to significant improvements in obtaining solutions to optimization problems, e.g., problems in path planning and optimal transport, referred to in control systems as Zermelo's navigation problem [1]. With vast amounts of data available from experiments and simulations in fluid dynamics, machine-learning techniques are being used to extract information that is useful for controlling and optimizing flows [2]. Recent studies include the use of reinforcement learning in fluid-flow settings, e.g., (a) to optimize the soaring of a glider in thermal currents [3] and (b) to develop optimal navigation schemes in time-independent two-dimensional (2D) and three-dimensional (3D) fluid flows [4, 5, 6]. Optimal locomotion, in response to stimuli, is also important in biological systems ranging from cells and micro-organisms [7, 8, 9] to birds, animals, and fish [10]; such locomotion is often termed taxis [11].
It behooves us, therefore, to explore machine-learning strategies for optimal path planning by microswimmers in turbulent fluid flows. We initiate such a study for microswimmers in 2D and 3D turbulent flows. In particular, we consider a dynamic-path-planning problem that seeks to minimize the average time taken by microswimmers to reach a given target, while moving in a turbulent fluid flow that is statistically homogeneous and isotropic. We develop a novel, multi-swimmer, adversarial-$Q$-learning algorithm to optimize the motion of such microswimmers as they try to swim towards a specified target (or targets). Our adversarial-$Q$-learning approach ensures that the microswimmers perform at least as well as those that adopt the following naïve strategy: at any instant of time and at a given position in space, a naïve microswimmer tries to point in the direction of its target. We examine the efficacy of this approach as a function of the following two dimensionless control parameters: (a) $\tilde{V}_s \equiv V_s/u_{rms}$, where $V_s$ is the microswimmer's bare velocity and $u_{rms}$ the root-mean-square (rms) velocity of the turbulent fluid; and (b) $\tilde{B} \equiv B\,\omega_{rms}$, where $B$ is the microswimmer-response time and $\omega_{rms}$ the rms vorticity of the fluid. We show, by extensive direct numerical simulations (DNSs), that, in a substantial part of the $\tilde{V}_s$-$\tilde{B}$ plane, the average time $\langle T \rangle$, required by a microswimmer to reach a target at a fixed distance, is lower if it uses our adversarial-$Q$-learning scheme than if it uses the naïve strategy.
II Background flow and microswimmer dynamics
For the low-Mach-number flows we consider, the fluid-flow velocity $\boldsymbol{u}$ satisfies the incompressible Navier-Stokes (NS) equation. In two dimensions (2D), we write the NS equation in the conventional vorticity-stream-function form, which accounts for incompressibility in 2D [12]:
$$\partial_t \omega + (\boldsymbol{u} \cdot \nabla)\,\omega = \nu \nabla^2 \omega - \mu\,\omega + F_\omega; \quad (1)$$
here, $\boldsymbol{u}$ is the fluid velocity, $\nu$ is the kinematic viscosity, $\mu$ is the coefficient of friction (present in 2D, e.g., because of air drag or bottom friction), and the vorticity $\omega \equiv \nabla \times \boldsymbol{u}$ is normal to the plane of the flow in 2D. The 3D incompressible NS equations are
$$\partial_t \boldsymbol{u} + (\boldsymbol{u} \cdot \nabla)\,\boldsymbol{u} = -\nabla p + \nu \nabla^2 \boldsymbol{u} + \boldsymbol{F}, \qquad \nabla \cdot \boldsymbol{u} = 0; \quad (2)$$
$p$ is the pressure, and the density of the incompressible fluid is taken to be unity; the large-scale forcing $F_\omega$ (large-scale random forcing in 2D) or $\boldsymbol{F}$ (constant-energy-injection forcing in 3D) maintains the statistically steady, homogeneous, and isotropic turbulence, for which it is natural to use periodic boundary conditions.
We consider a collection of passive, non-interacting microswimmers in the turbulent flow; $\boldsymbol{X}_i$ and $\hat{\boldsymbol{p}}_i$ are the position and swimming direction of the $i$-th microswimmer. Each microswimmer is assigned a target located at $\boldsymbol{X}^T_i$. We are interested in minimizing the time $T$ required by a microswimmer, which is released at a fixed distance from its target, to approach within a small distance of this target. The microswimmer's position and swimming direction evolve as follows [13]:
$$\dot{\boldsymbol{X}}_i = \boldsymbol{u}(\boldsymbol{X}_i, t) + V_s\,\hat{\boldsymbol{p}}_i; \quad (3)$$
$$\dot{\hat{\boldsymbol{p}}}_i = \frac{1}{2B}\left[\hat{\boldsymbol{n}}^c_i - (\hat{\boldsymbol{n}}^c_i \cdot \hat{\boldsymbol{p}}_i)\,\hat{\boldsymbol{p}}_i\right] + \frac{1}{2}\,\boldsymbol{\omega} \times \hat{\boldsymbol{p}}_i; \quad (4)$$
here, we use bi-linear (tri-linear) interpolation in 2D (3D) to determine the fluid velocity at the microswimmer's position from eqs. 1 and 2; $V_s$ is the swimming velocity, $B$ is the time scale over which the microswimmer aligns with the flow, and $\hat{\boldsymbol{n}}^c_i$ is the control direction. Equation 4 implies that $\hat{\boldsymbol{p}}_i$ tries to align along $\hat{\boldsymbol{n}}^c_i$. We define the following non-dimensional control parameters: $\tilde{V}_s \equiv V_s/u_{rms}$, where $u_{rms}$ is the root-mean-square (rms) fluid-flow velocity, and $\tilde{B} \equiv B\,\omega_{rms}$, where $1/\omega_{rms}$ is the inverse of the rms vorticity.
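The microswimmer evolution of eqs. 3 and 4 can be sketched, for a single 2D swimmer on a periodic grid, with a simple Euler step; the function and parameter names below are illustrative, not those of our production code.

```python
import numpy as np

def bilinear_interp(field, x, dx):
    """Bilinearly interpolate a periodic 2D scalar field at position x (sketch)."""
    n = field.shape[0]
    g = x / dx                                   # position in grid units
    i0, j0 = int(np.floor(g[0])) % n, int(np.floor(g[1])) % n
    fi, fj = g[0] - np.floor(g[0]), g[1] - np.floor(g[1])
    i1, j1 = (i0 + 1) % n, (j0 + 1) % n
    return ((1 - fi) * (1 - fj) * field[i0, j0] + fi * (1 - fj) * field[i1, j0]
            + (1 - fi) * fj * field[i0, j1] + fi * fj * field[i1, j1])

def euler_step(X, p, u_interp, omega, n_c, V_s, B, dt):
    """One Euler step of eqs. (3) and (4) for a single 2D microswimmer."""
    X_new = X + dt * (u_interp + V_s * p)
    # In 2D the vorticity is normal to the plane, so omega x p = omega * (-p_y, p_x).
    dp = (n_c - np.dot(n_c, p) * p) / (2.0 * B) + 0.5 * omega * np.array([-p[1], p[0]])
    p_new = p + dt * dp
    return X_new, p_new / np.linalg.norm(p_new)  # keep the direction a unit vector
```

Renormalizing $\hat{\boldsymbol{p}}$ after each step keeps the swimming direction on the unit circle despite time-discretization errors.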
III Adversarial $Q$-learning for smart microswimmers
Designing a strategy consists in choosing appropriately the control direction $\hat{\boldsymbol{n}}^c_i$, as a function of the instantaneous state of the microswimmer, in order to minimize the mean arrival time $\langle T \rangle$. To develop a tractable framework for $Q$-learning, we use a finite number of states by discretizing the fluid vorticity at the microswimmer's location into 3 ranges of values and the angle $\theta_i$, between $\hat{\boldsymbol{p}}_i$ and $\hat{\boldsymbol{n}}^t_i$, into 4 ranges, as shown in fig. 1; here, $\hat{\boldsymbol{n}}^t_i$ is the unit vector pointing from the swimmer to its target. The choice of $\hat{\boldsymbol{n}}^c_i$ is then reduced to a map from this set of states to an action set, which we also discretize into four possible actions. Therefore, for the naïve strategy, $\hat{\boldsymbol{n}}^c_i = \hat{\boldsymbol{n}}^t_i$. This strategy is optimal if $\tilde{V}_s \gg 1$: microswimmers have an almost ballistic dynamics and move swiftly to the target. For $\tilde{V}_s \lesssim 1$, vortices affect the microswimmers substantially, so we have to develop a nontrivial $Q$-learning strategy, in which $\hat{\boldsymbol{n}}^c_i$ is a function of the local vorticity range and $\theta_i$.
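The state discretization described above can be sketched as follows; the vorticity threshold `omega0` and the particular binning are illustrative placeholders, not the exact ranges of fig. 1.

```python
import numpy as np

def discretize_state(omega, theta, omega0):
    """Map the local vorticity and the target angle theta (in [-pi, pi))
    to one of 3 x 4 = 12 discrete states; thresholds are illustrative."""
    if omega < -omega0:
        s_omega = 0          # strong negative vorticity
    elif omega <= omega0:
        s_omega = 1          # weak vorticity
    else:
        s_omega = 2          # strong positive vorticity
    # split the angle between p-hat and n-hat^t into 4 quadrant-like ranges
    s_theta = int((theta + np.pi) // (np.pi / 2)) % 4
    return s_omega * 4 + s_theta
```

The combined index `s_omega * 4 + s_theta` gives a flat state label, convenient for indexing rows of a tabular $Q$ matrix.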
In our $Q$-learning scheme, we assign a quality value $Q_i(s,a)$ to each state-action pair of microswimmer $i$, where $s$ is a state and $a$ an action; the control direction is defined by $\hat{\boldsymbol{n}}^c_i = \operatorname{arg\,max}_a Q_i(s_i, a)$. At each iteration, $\hat{\boldsymbol{n}}^c_i$ is calculated as above and the microswimmer evolution is performed by using eqs. 3 and 4. In the canonical $Q$-learning approach, during the learning process, each of the $Q_i$'s is evolved by using the Bellman equation [14] below, whenever there is a state change, i.e., whenever $s_i$ changes from one iteration to the next:
$$Q_i(s,a) \leftarrow Q_i(s,a) + \alpha\left[r_i + \gamma\,\max_{a'} Q_i(s',a') - Q_i(s,a)\right]; \quad (5)$$
where $\alpha$ and $\gamma$ are learning parameters that are set to optimal values after some numerical exploration (see tab. 2), and $r_i$ is the reward function, which, for the path-planning problem, we define in terms of the progress the microswimmer makes towards its target between successive state changes. According to eq. 5, any strategy for which the reward is positive can be a solution, and there exist many such solutions that are sub-optimal compared to the naïve strategy.
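The tabular Bellman update of eq. 5 is a few lines of code; the default values of `alpha` and `gamma` below are placeholders, not the tuned values of tab. 2.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning (Bellman) update, as in eq. (5):
    move Q[s, a] towards r + gamma * max_a' Q[s', a']."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```

The update is applied in place on the $Q$ matrix, one state-action entry per state change.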
To reduce the solution space, we propose an adversarial scheme: each microswimmer, the master, is accompanied by a slave microswimmer that shares the same target and follows the naïve strategy, i.e., $\hat{\boldsymbol{n}}^c_i = \hat{\boldsymbol{n}}^t_i$. Now, whenever the master undergoes a state change, the corresponding slave's position and direction are re-initialized to those of the master (see fig. 2). The reward function for the master microswimmer is then given by the difference between the slave's and the master's distances to the target; i.e., only those changes that improve on the naïve strategy are favored.
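A minimal sketch of the master-slave bookkeeping is given below; the distance-difference form of the reward follows the description above, and the function names are hypothetical.

```python
import numpy as np

def adversarial_reward(x_master, x_slave, x_target):
    """Reward = slave's distance to target minus master's: positive only
    when the master has out-performed the naive (slave) strategy."""
    return np.linalg.norm(x_slave - x_target) - np.linalg.norm(x_master - x_target)

def reset_slave(x_master, p_master):
    """On a state change of the master, re-initialize the slave to the master's
    position and direction, so the next comparison starts from identical conditions."""
    return x_master.copy(), p_master.copy()
```

Resetting the slave at every state change makes each reward a clean head-to-head comparison over the interval between successive state changes.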
In the conventional $Q$-learning approach [15, 16], the $Q$ matrices of the microswimmers evolve independently; each matrix is updated only after a state change, so a large number of iterations is required for the convergence of $Q_i$. To speed up this learning process, we use the following multi-swimmer, parallel-learning scheme: all the microswimmers share a common $Q$ matrix, i.e., $Q_i = Q$ for all $i$. At each iteration, we choose one microswimmer at random, from the set of microswimmers that have undergone a state change, to update the corresponding element of the $Q$ matrix (flow chart in Appendix A); this ensures that the $Q$ matrix is updated at almost every iteration, so it converges rapidly.
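The shared-matrix, one-random-swimmer-per-iteration update can be sketched as follows (the transition-tuple interface is an assumption of this sketch, not the paper's code):

```python
import numpy as np

def parallel_q_step(Q, transitions, rng, alpha=0.1, gamma=0.9):
    """All swimmers share one Q matrix; at each iteration, one swimmer is drawn
    at random from those that changed state, and only its transition updates Q.
    `transitions` is a list of (s, a, r, s_next) tuples for the state-changed swimmers."""
    if not transitions:
        return Q                       # no state change anywhere this iteration
    s, a, r, s_next = transitions[rng.integers(len(transitions))]
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```

With many swimmers exploring different states simultaneously, some swimmer almost always changes state, so the shared $Q$ matrix receives an update at nearly every iteration.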
IV Numerical simulation
We use a pseudospectral DNS [18, 19], with the standard $2/3$ dealiasing rule, to solve eqs. 1 and 2. For time marching we use a third-order Runge-Kutta scheme in 2D and the exponential Adams-Bashforth time-integration scheme in 3D; the time step $\delta t$ is chosen such that the Courant-Friedrichs-Lewy (CFL) condition is satisfied. Table 1 gives the parameters for our DNSs in 2D and 3D, such as the number of collocation points and the Taylor-microscale Reynolds number $Re_\lambda \equiv u_{rms}\lambda/\nu$, where $\lambda$ is the Taylor microscale.
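Choosing $\delta t$ from the CFL condition amounts to bounding $\max|u|\,\delta t/\delta x$ by a safety factor; the factor `c_cfl` below is an illustrative value, not the one used in our DNSs.

```python
import numpy as np

def cfl_dt(u, dx, c_cfl=0.5):
    """Pick dt so that the CFL condition max|u| * dt / dx <= c_cfl holds."""
    umax = np.abs(u).max()
    return c_cfl * dx / umax if umax > 0 else np.inf
```

Recomputing `dt` from the instantaneous velocity field keeps the advection stable as $u_{rms}$ fluctuates in the statistically steady state.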
IV.1 Naïve microswimmers
The average time $\langle T \rangle$ taken by the microswimmers to reach their targets is shown in fig. 3. For $\tilde{V}_s \gg 1$ we expect the naïve strategy, i.e., $\hat{\boldsymbol{n}}^c = \hat{\boldsymbol{n}}^t$, with $\hat{\boldsymbol{n}}^t$ the unit vector pointing from the microswimmer to the target, to be the optimal one. For $\tilde{V}_s \lesssim 1$, we observe that the naïve strategy leads to the trapping of microswimmers (fig. 3(b)) and gives rise to exponential tails in the probability distribution function (PDF) of the arrival time $T$; in fig. 4 we plot the associated complementary cumulative distribution function (CCDF). As a consequence of trapping, $\langle T \rangle$ is dominated by the exponential tail of the distribution, as can be seen from fig. 4.
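The empirical CCDF plotted in fig. 4 can be computed from arrival-time samples as follows (a generic sketch, not tied to our DNS data format):

```python
import numpy as np

def ccdf(samples):
    """Empirical complementary CDF P(T > t): returns the sorted arrival
    times and, for each, the fraction of samples strictly greater than it."""
    t = np.sort(np.asarray(samples, dtype=float))
    p = 1.0 - np.arange(1, len(t) + 1) / len(t)
    return t, p
```

On a semi-logarithmic plot, an exponential tail of the arrival-time PDF appears as a straight line in this CCDF.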
IV.2 Smart microswimmers
In our approach, the random initial positions of the microswimmers ensure that they explore different states without reinitialization for each epoch. Hence, we present results with 10000 microswimmers for a single epoch. In our single-epoch approach, the control map reaches a steady state once the learning process is complete (fig. 5(b)).
We use the adversarial-$Q$-learning approach outlined above (parameter values in tab. 2) to arrive at an optimal scheme for path planning in a 2D turbulent flow. To quantify the performance of the smart microswimmers, we introduce equal numbers of smart (master-slave pairs) and naïve microswimmers into the flow. The scheme presented here pits $Q$-learning against the naïve strategy and enables the adversarial algorithm to find a strategy that can out-perform the naïve one. (Without the adversarial approach, the final strategy that is obtained may end up being sub-optimal.)
To show the progress of our learning scheme, we calculate the moving average $\langle T \rangle_{MA}$ of the arrival times, i.e., the average over those microswimmers that are absorbed by the targets between the times $t - \Delta t$ and $t$, with $\Delta t$ the bin size. Figures 5(a) and 5(b) show the evolution of $\langle T \rangle_{MA}$ and of the control map, respectively, for the naïve strategy and our adversarial-$Q$-learning scheme. After the initial learning phase, the $Q$-learning algorithm explores different strategies, before it settles down to a steady state. It is not obvious, a priori, that there exists a stable, non-trivial, optimal strategy, for microswimmers in turbulent flows, that can out-perform the naïve strategy. The plot in fig. 6 shows the improved performance of our adversarial-$Q$-learning scheme over the naïve strategy, for different values of $\tilde{V}_s$ and $\tilde{B}$; in these plots we use only data from after the initial learning phase, so that the transient behavior in learning is excluded. The inset in fig. 6 shows that the arrival-time distribution has an exponential tail, just like that of the naïve scheme in fig. 4, which implies that the smart microswimmers also get trapped; but a lower value of $\langle T \rangle$ implies that they are able to escape from the traps faster than microswimmers that employ the naïve strategy.
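The binned moving average $\langle T \rangle_{MA}$ used to monitor learning can be sketched as follows (the argument names are hypothetical):

```python
import numpy as np

def moving_avg_arrival(absorption_times, arrival_times, t, dt_bin):
    """Average arrival time over swimmers absorbed by their targets in the
    window (t - dt_bin, t]; returns NaN when the bin is empty."""
    absorption_times = np.asarray(absorption_times)
    arrival_times = np.asarray(arrival_times)
    mask = (absorption_times > t - dt_bin) & (absorption_times <= t)
    return arrival_times[mask].mean() if mask.any() else np.nan
```

Sliding `t` forward in steps of the bin size yields the learning curves of fig. 5(a).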
In a 3D turbulent flow, we also obtain such an improvement, with our adversarial-$Q$-learning approach, over the naïve strategy. The details of the 3D flows, the parameters, and the definitions of states and actions are given in Appendix B. In fig. 7 we show a representative plot of the performance measure, which demonstrates this improvement in the 3D case (cf. fig. 5 for a 2D turbulent flow).
We have shown that the generic $Q$-learning approach can be adapted to solve control problems arising in complex dynamical systems. Global information about the flow has been used, via the Hamilton-Jacobi-Bellman approach, in path-planning problems for autonomous-underwater-vehicle navigation, to improve their efficiency [20]. In contrast, we present a scheme that uses only the local flow parameters for path planning.
The flow parameters (tab. 1) and the learning parameters (tab. 2) have a significant impact on the performance of our adversarial-$Q$-learning method. Even the choice of observables that we use to define the states can be changed and experimented with. Furthermore, the discretization process can be eliminated by using deep-learning approaches, which can handle continuous inputs and outputs [21]. Our formulation of the optimal-path-planning problem for microswimmers in a turbulent flow is a natural starting point for detailed studies of control problems in turbulent flows.
We thank DST and CSIR (India) and the Indo-French Centre for Applied Mathematics (IFCAM) for support.
- (1) Zermelo E (1931) Über das navigationsproblem bei ruhender oder veränderlicher windverteilung. ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik 11(2):114–124.
- (2) Brunton S, Noack B, Koumoutsakos P (2019) Machine Learning for Fluid Mechanics. arXiv e-prints p. arXiv:1905.11075.
- (3) Reddy G, Celani A, Sejnowski TJ, Vergassola M (2016) Learning to soar in turbulent environments. Proceedings of the National Academy of Sciences 113(33):E4877–E4884.
- (4) Colabrese S, Gustavsson K, Celani A, Biferale L (2017) Flow navigation by smart microswimmers via reinforcement learning. Physical review letters 118(15):158004.
- (5) Gustavsson K, Biferale L, Celani A, Colabrese S (2017) Finding efficient swimming strategies in a three-dimensional chaotic flow by reinforcement learning. The European Physical Journal E 40(12):110.
- (6) Biferale L, Bonaccorso F, Buzzicotti M, Clark Di Leoni P, Gustavsson K (2019) Zermelo’s problem: Optimal point-to-point navigation in 2D turbulent flows using Reinforcement Learning. arXiv e-prints p. arXiv:1907.08591.
- (7) Dusenbery D (2009) Living at Micro Scale: The Unexpected Physics of Being Small. (Harvard University Press).
- (8) Durham WM, et al. (2013) Turbulence drives microscale patches of motile phytoplankton. Nature communications 4:2148.
- (9) Michalec FG, Souissi S, Holzner M (2015) Turbulence triggers vigorous swimming but hinders motion strategy in planktonic copepods. Journal of the Royal Society Interface 12(106):20150158.
- (10) Verma S, Novati G, Koumoutsakos P (2018) Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of Sciences of the United States of America 115(23):5849–5854.
- (11) Barrows E (2011) Animal Behavior Desk Reference: A Dictionary of Animal Behavior, Ecology, and Evolution, Third Edition. (Taylor & Francis).
- (12) Pandit R, et al. (2017) An overview of the statistical properties of two-dimensional turbulence in fluids with particles, conducting fluids, fluids with polymer additives, binary-fluid mixtures, and superfluids. Physics of Fluids 29(11):111112.
- (13) Pedley TJ, Kessler JO (1992) Hydrodynamic phenomena in suspensions of swimming microorganisms. Annual Review of Fluid Mechanics 24(1):313–358.
- (14) Sutton RS, Barto AG (2011) Reinforcement learning: An introduction. (Cambridge, MA: MIT Press).
- (15) Watkins CJ, Dayan P (1992) Technical note: Q-learning. Machine Learning 8(3):279–292.
- (16) Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4:237–285.
- (17) (2019). See Supplementary material.
- (18) Canuto C, Hussaini MY, Quarteroni A, Zang TA (2006) Spectral methods. (Springer).
- (19) Pandit R, Perlekar P, Ray SS (2009) Statistical properties of turbulence: an overview. Pramana 73(1):157.
- (20) Kularatne D, Bhattacharya S, Hsieh MA (2018) Optimal path planning in time-varying flows using adaptive discretization. IEEE Robotics and Automation Letters 3(1):458–465.
- (21) Lillicrap TP, et al. (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
Appendix A Flowchart
Figure 8 shows the sequence of processes involved in our adversarial-$Q$-learning scheme; the flow chart is traversed once per iteration, for a prescribed number of sessions. We use a greedy action, in which the action corresponding to the maximum value in the $Q$ matrix, for the current state of the microswimmer, is performed; $\epsilon$-noise ensures, with probability $\epsilon$, that the actions are scrambled. Furthermore, we find that episodic updating of the values in the $Q$ matrix leads to a deterioration of performance; therefore, we use continuous updating of $Q$.
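The greedy action with $\epsilon$-noise described above is the standard $\epsilon$-greedy rule, sketched below:

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon, rng):
    """Greedy action for state s, scrambled with probability epsilon:
    with probability epsilon pick a random action, else the argmax of Q[s]."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))   # random ('scrambled') action
    return int(np.argmax(Q[s]))                # greedy action
```

Setting $\epsilon = 0$ recovers the purely greedy policy used once learning has converged.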
Appendix B State and action definitions for 3D turbulent flow
From our DNS of the 3D Navier-Stokes equation we obtain a statistically steady, homogeneous and isotropic turbulent flow in a periodic domain, into which we introduce passive microswimmers. To define the states, we fix a coordinate triad, as shown in fig. 9; here, $\hat{\boldsymbol{n}}^t$ is the unit vector pointing from the microswimmer to the target, $\boldsymbol{\omega}$ is the vorticity pseudo-vector, and the third axis is defined by the conditions that it be orthogonal to both $\hat{\boldsymbol{n}}^t$ and $\boldsymbol{\omega}$. This coordinate system is ill-defined if $\boldsymbol{\omega}$ is parallel to $\hat{\boldsymbol{n}}^t$. To implement our $Q$-learning in 3D, we define 13 states (see fig. 10) and 6 actions. Consequently, the $Q$ matrix is an array of size $13 \times 6$.
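Constructing such a triad, and detecting the degenerate parallel case, can be sketched as follows (the tolerance and return convention are choices of this sketch):

```python
import numpy as np

def triad(n_t, omega, tol=1e-8):
    """Orthonormal triad built from the target direction n_t and the vorticity;
    returns None when omega is (nearly) parallel to n_t, where the frame is ill-defined."""
    n_t = n_t / np.linalg.norm(n_t)
    e3 = np.cross(n_t, omega)
    if np.linalg.norm(e3) < tol:       # degenerate: omega parallel to n_t
        return None
    e3 = e3 / np.linalg.norm(e3)
    e2 = np.cross(e3, n_t)             # completes the right-handed frame
    return n_t, e2, e3
```

In practice the degenerate case can be handled by assigning the swimmer to a dedicated extra state, which is one natural way to arrive at an odd state count such as 13.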