I Introduction
Extracting information from the observation of a natural system is a fundamental problem in science. Except in the few cases where laboratory or numerical experiments can be performed, inferring physical parameters is never easy brunton2016discovering; bocquet2019data. Measurements are always limited, and our theoretical knowledge is sometimes inadequate to extract from them a correct description of the system. A significant example is given by turbulent flows frisch1995turbulence; Pope00; davidson2011voyage; duraisamy2019turbulence. Here, because of chaos, any error in the observations grows exponentially in time and quickly compromises our ability to understand, predict, and hence model turbulent behaviour. The consequences of such limitations are clear in fields like meteorology, oceanography and climate modelling vallis2017atmospheric; pedlosky1987geophysical; akyildiz2002wireless; kalnay2003atmospheric; buzzicotti_ocean_2021; carrassi2008data. In this work we aim to analyse turbulent flows in a different way. In particular, we use Machine Learning (ML) tools developed in the field of computer vision, such as Deep Convolutional Neural Networks (DCNN) krizhevsky2012imagenet; he2016deep; ronneberger2015u; redmon2016you, to bring new perspectives to the data assimilation and analysis of complex physical systems goodfellow2016deep; brunton2019data; brunton2020machine; biferale2019zermelo; buzzicotti2021reconstruction; brajard2019combining; borra2021using; corbetta2021deep. The setup we consider is 3d turbulence under rotation alexakis2018cascades. There are multiple reasons for this choice, from the fundamental interest given by the challenge of classifying multiscale non-Gaussian fields, to the specific relevance for geophysics and many applied flow realizations. Rotating turbulence is described by the Navier-Stokes equations (NSE) equipped with the Coriolis force,
\[
\partial_t \mathbf{u} + \mathbf{u}\cdot\nabla\mathbf{u} + 2\Omega\,\hat{z}\times\mathbf{u} = -\nabla p + \nu \nabla^2 \mathbf{u} + \mathbf{f}, \qquad (1)
\]
here $\nu$ is the kinematic viscosity, $2\Omega\,\hat{z}\times\mathbf{u}$ is the Coriolis force produced by rotation, and $\Omega$ is the frequency of the angular velocity around the rotation axis $\hat{z}$. The fluid density is constant and absorbed into the definition of the pressure $p$. $\mathbf{f}$ represents an external forcing mechanism. The relative importance of the nonlinear (advection) term with respect to the linear (viscous dissipation) term is given by the Reynolds number, $Re$,
\[
Re = \frac{\varepsilon^{1/3}\,\ell_f^{4/3}}{\nu}, \qquad (2)
\]
where $\varepsilon$ is the rate of energy input at the forcing scale $\ell_f$. The larger $Re$ is, the more developed the turbulence in the dynamics.
Rotation, at the same time, introduces a second nondimensional control parameter in the equations, the Rossby number, $Ro$, which represents the ratio between the characteristic flow frequency at the forcing wavenumber, $(\varepsilon k_f^2)^{1/3}$, and the rotation frequency $\Omega$;
\[
Ro = \frac{\varepsilon^{1/3} k_f^{2/3}}{\Omega}. \qquad (3)
\]
While it is well known that increasing $Re$ drives the transition to turbulence, varying $Ro$ leads to a second and more subtle transition. Indeed, in the limit of large $Ro$ (small $\Omega$), standard 3d homogeneous and isotropic turbulent dynamics is observed; on the other hand, for decreasing $Ro$, i.e. increasing $\Omega$, the flow becomes almost bidimensional deusebio2014dimensional; marino2013inverse; marino2014large; buzzicotti2018energy; di2020phase, becoming nearly uniform along the direction parallel to the rotation axis $\hat{z}$. Simultaneously, at high rotation rates, the flow develops coherent large-scale vortex-like structures that live on the horizontal $(x,y)$ plane, as can be seen in the two renderings of Fig. 1. These structures are fed by an inverse energy cascade that moves energy from the scale where it is injected towards the largest scales available in the domain smith1999transfer. In other words, the transition induced by rotation consists of a change of the energy flux direction, from being directed towards the small scales (direct cascade) to a split cascade regime, with energy going simultaneously towards small and large scales seshasayanan2018condensates; van2020critical.
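The presence of an inverse cascade can be diagnosed from the energy spectrum, i.e. from how much energy sits at wavenumbers below the forcing one. As a minimal illustration (not the code used for the paper's analysis), a shell-averaged spectrum of a triply periodic 3d velocity field can be computed as:

```python
import numpy as np

def energy_spectrum(u):
    """Shell-averaged energy spectrum E(k) of a triply periodic 3d
    velocity field u of shape (3, N, N, N)."""
    N = u.shape[-1]
    uk = np.fft.fftn(u, axes=(1, 2, 3)) / N**3          # Fourier coefficients
    e_density = 0.5 * np.sum(np.abs(uk)**2, axis=0)     # 0.5 |u_k|^2 per mode
    k = np.fft.fftfreq(N, d=1.0 / N)                    # integer wavenumbers
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shells = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    return np.bincount(shells.ravel(), weights=e_density.ravel())
```

Summing $E(k)$ over all shells recovers the total energy $\frac{1}{2}\langle|\mathbf{u}|^2\rangle$ (Parseval), which is a convenient sanity check of the sketch.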
In Fig. 2 we present a visualization of the velocity module on planes extracted from 3d simulations at varying reference rotation rate $\Omega$. As expected, for small values of $\Omega$ the flow is statistically isotropic and homogeneous, while with increasing $\Omega$ large vortical structures appear in the dynamics. The nature of this transition, from direct to inverse cascade, is still not understood, and the transition threshold is also not known. Indeed, there are several reasons that complicate this problem, the main ones being the high number of degrees of freedom and the wide range of scales coupled in the dynamics. An extra complication comes from the subtle way in which rotation affects the dynamics. To appreciate this point, one can notice that rotation produces the energy flux transition without doing any work on the system, since the Coriolis force is always orthogonal to the velocity, $(2\Omega\,\hat{z}\times\mathbf{u})\cdot\mathbf{u} = 0$. The result is that it is difficult to infer the value of $\Omega$ from the analysis of the resulting flow field. For all these reasons this problem is a relevant example where measurements are not easily translated into information and new paradigms of analysis are needed. Some attempts in this direction have been made in recent years, using different tools like Fourier spatiotemporal decomposition di2015spatio or nudging di2018inferring; di2020synchronization; buzzicotti2020synchronizing; however, all these techniques require access to the temporal evolution of the velocity field to deduce the value of $\Omega$. In this work we aim to solve the same inverse problem of inferring $\Omega$ from the analysis of the velocity, using a DCNN designed and trained for this purpose. As discussed, we expect to have only a partial knowledge of the system, and in particular we make the following four assumptions:
Time evolution is not accessible

Observations capture only a coarse-grained version of the velocity field

We have access only to the velocity amplitude (module)

We can observe the system on a 2d plane and not the full 3d domain
Each of these limitations strongly compromises our knowledge of the system; however, as discussed above, the first one is the most stringent. Indeed, to our knowledge, there are no previous attempts based on Machine Learning techniques to attack this problem without any knowledge of the equations of motion. For comparison, in this work we have repeated the same investigation using a standard Bayesian Inference (BI) analysis. Results highlight the superiority of the DCNN in solving regression problems and show how DCNNs can be used to improve current data-analysis capabilities. At the same time, in this work we test the DCNN on a much more challenging task than the standard image classification performed to simulate human vision janai2020computer; maron2020learning; pidhorskyi2018generative; tan2020efficientdet; chouhan2020applications; cheng2021fashion; feng2019computer; pathak2016context. We have also attempted to perform the same classification using unsupervised Gaussian Mixture Models (GMM) rasmussen1999infinite; reynolds2009gaussian; HeGMM2011. However, we did not find any relation between the unsupervised classification and the rotation value, so it has not been included in the comparison with the DCNN and BI. The paper is organized as follows. In Sect. II there is a detailed description of the datasets used in this work. In Sect. III we describe all methodologies, namely the DCNN and the BI. Results are presented in Sect. IV, and in Sect. V we draw our concluding remarks.
II Dataset
II.1 Numerical Simulations
To generate the database of turbulent flows in a rotating reference frame, we have performed a set of Direct Numerical Simulations (DNS) of the NSE for an incompressible fluid in a triply periodic domain of fixed size and resolution. We used a fully dealiased parallel 3d pseudospectral code, with time integration implemented as a second-order Adams-Bashforth scheme and the viscous term integrated implicitly. A linear friction term, acting only on the smallest wavenumbers, is added to reach a stationary state. At the same time, the viscous dissipation, $\nu\nabla^2\mathbf{u}$, is replaced by a hyperviscous term, $\nu_h\nabla^{2p}\mathbf{u}$, to increase the inertial range of scales dominated by the nonlinear energy transfer. The forcing mechanism, $\mathbf{f}$, is a Gaussian process delta-correlated in time, with support in wavenumber space around the forcing wavenumber $k_f$. The total energy time evolution for six simulations at varying $\Omega$ is presented in Fig. 3(a), while in panel (b) of the same figure we show the energy spectra, $E(k)$, averaged over time in the stationary regime for the same set of six simulations.
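The time-stepping just described (explicit second-order Adams-Bashforth for the nonlinear term, exact integration of the linear (hyper)viscous damping through an integrating factor) can be sketched for a single Fourier mode. This is an illustrative one-mode version under the stated assumptions, not the parallel production code:

```python
import numpy as np

def ab2_exp_step(u_k, N_curr, N_prev, damp, dt):
    """Advance one Fourier mode of du/dt = N(u) - damp * u by one step.

    The linear damping term (viscous or hyperviscous, damp = nu_h * k^{2p})
    is integrated exactly with the integrating factor exp(-damp * dt), while
    the nonlinear term N is treated with second-order Adams-Bashforth.
    """
    decay = np.exp(-damp * dt)
    return decay * (u_k + dt * (1.5 * N_curr - 0.5 * decay * N_prev))
```

With the nonlinear term switched off, the mode decays exactly as $e^{-\nu_h k^{2p} t}$ regardless of the step size, which is the usual check of an integrating-factor scheme.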
[Fig. 3 caption: (Panel a) Total energy evolution for the six simulations at varying $\Omega$, from the direct to the inverse cascade regime. (Panel b) Energy spectra as a function of the wavenumber for the same set of simulations. The spectra are averaged over time in the stationary regime; errorbars indicate the standard deviation and are generally smaller than the symbol size.]
From the energy spectra we can observe that, while for the lowest rotation rates (in the direct cascade regime) there is a depletion of energy at wavenumbers smaller than $k_f$, for values of $\Omega$ above the transition (the split cascade regime) the small wavenumbers are also very energetic, thanks to the inverse cascade, suggesting the presence of large-scale structures in the flow.
II.2 Dataset extraction
The ML training and validation datasets, accessible at the webpage http://smartturb.roma2.infn.it, are extracted from the DNS described above, following TurbRot, so as to match all the assumptions discussed in the introduction (Sect. I), namely:

In order to vary the Rossby number, we performed ten numerical simulations, each with a different rotation frequency $\Omega$. Two simulations are well below the transition, in the direct cascade regime; three are in the transition region; and the remaining five are well inside the split cascade regime.

For each of the ten simulations we have stored a set of snapshots of the full 3d velocity field, with a temporal separation large enough to decrease correlations between two successive datapoints (see Fig. 3(a) for the total energy evolution).

Each snapshot at the original resolution is downsized onto a coarser grid by applying a Galerkin truncation in Fourier space, where all modes above a prescribed maximum wavenumber are set to zero.

For each downsized configuration, horizontal $(x,y)$ planes (perpendicular to the rotation axis) are selected at different heights $z$, and for each of them we compute the module of the velocity field, $|\mathbf{u}| = (u_x^2 + u_y^2 + u_z^2)^{1/2}$.

To increase the dataset, we shift each of the planes in different ways, by randomly choosing a new centre of the plane and exploiting the periodic boundary conditions, so as to obtain a larger number of planes at each instant of time for each $\Omega$ value.
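Two of the extraction steps above, the Galerkin truncation onto a coarser grid and the augmentation by random periodic re-centring, can be sketched as follows. This is a minimal numpy illustration; grid sizes and the random-number handling are our own choices, not the paper's:

```python
import numpy as np

def galerkin_downsample(field, n_out):
    """Downsize a periodic 3d scalar field by Galerkin truncation:
    keep only the Fourier modes up to the coarse-grid Nyquist wavenumber
    and transform back on the n_out^3 grid."""
    N = field.shape[0]
    fk = np.fft.fftshift(np.fft.fftn(field))
    c, h = N // 2, n_out // 2
    cut = fk[c - h:c + h, c - h:c + h, c - h:c + h]
    return np.fft.ifftn(np.fft.ifftshift(cut)).real * (n_out / N) ** 3

def random_periodic_shifts(plane, n_shifts, seed=None):
    """Augment a 2d plane by choosing random new centres, using the
    periodic boundary conditions (np.roll) to wrap the field around."""
    rng = np.random.default_rng(seed)
    shifts = rng.integers(0, plane.shape[0], size=(n_shifts, 2))
    return np.stack([np.roll(plane, tuple(s), axis=(0, 1)) for s in shifts])
```

Note that a periodic shift only relabels gridpoints, so every augmented plane contains exactly the same set of values as the original; this is why the shifted copies are statistically equivalent samples.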
The full dataset built in this way is composed of more than 1 million different velocity-module planes, out of which a large fraction is used as training set while the rest is used as validation set. To be sure that the validation data were never seen during training, none of the planes used in the validation set is obtained from a random shift of a plane contained in the training set. Instead, the two sets are built with the same protocol but from different temporal realizations of the flow, splitting its time evolution into two separate parts.
III Methods
III.1 Deep Convolutional Neural Network (DCNN)
Fig. 4 schematizes the ML problem setup. A plane of velocity module, randomly extracted from the dataset of ten simulations, is given as input to a DCNN, with the goal of inferring the value of $\Omega$ of the simulation from which the plane has been extracted. The training is performed in a supervised setting, having pre-labelled all planes with their corresponding value of $\Omega$. As the solution of the regression problem, the DCNN outputs a real number corresponding to the predicted rotation value, $\tilde{\Omega}$. In this way the inference task is not limited to a predefined set of classes and can be generalized to estimate rotation values never observed during training. The network is trained to minimize the following loss function:
\[
L = \left\langle \left(\Omega - \tilde{\Omega}\right)^2 \right\rangle, \qquad (4)
\]
which is the mean-squared error between the target value $\Omega$ and the predicted value $\tilde{\Omega}$; the average $\langle\cdot\rangle$ is taken over a minibatch of 256 different planes. To implement the DCNN we used the TensorFlow abadi2016tensorflow libraries with a Python interface, optimized to run on GPUs. As described in Fig. 5, the overall architecture is composed of three convolutional layers that encode the input plane of velocity module into feature maps of decreasing size. This first part of the network is tasked with identifying the features of the velocity planes relevant to the regression problem. The features are then reshaped into a single vector. The second half of the network is composed of two fully connected layers that transform this vector into the real number representing the network prediction, $\tilde{\Omega}$. The training of these two final fully connected layers is performed using dropout to reduce overfitting issues srivastava2014dropout. All layers are followed by a nonlinear ReLU activation function.
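A minimal Keras sketch of such an architecture is given below. The filter counts, kernel sizes, dense width, dropout rate and input resolution are hypothetical placeholders (they are not fixed in this excerpt); only the overall layout, three convolutional layers, a flattening step, and two fully connected layers ending in a single real output, follows the description above:

```python
import tensorflow as tf

def build_dcnn(input_shape=(64, 64, 1)):
    """Three conv layers -> flatten -> two dense layers -> one real output."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(128, 3, strides=2, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),      # dropout on the dense part only
        tf.keras.layers.Dense(1),          # predicted rotation value
    ])

model = build_dcnn()
model.compile(optimizer="nadam", loss="mse")  # MSE loss as in Eq. (4)
```

Here "nadam" (Adam with Nesterov momentum) mirrors the optimizer used in the training protocol; the learning rate and batch size are left at illustrative defaults.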
In the main panel of Fig. 6 we show the evolution of the loss, Eq. (4), measured on the training and validation data as a function of the number of epochs. From their evolution we see no evidence of overfitting, with the two losses always remaining of the same order. We considered the training converged after the number of epochs at which the loss on the validation set starts to saturate. Using a minibatch of 256 planes, each epoch consists of a number of backpropagation steps fixed by the size of the training dataset. The backpropagation has been performed with an Adam optimizer equipped with Nesterov momentum dozat2016incorporating, with a fixed learning rate. To achieve a good training with the learning protocol just introduced, we have normalized the input data to lie in the range $[0,1]$; namely, we rescaled the module intensity at each gridpoint by the maximum value calculated over the whole training set. The same normalization constant has been used to rescale the validation set. In the same way, the training labels have been rescaled by their maximum value over the training set. In the inset of Fig. 6 we show the Probability Density Function (PDF) of the prediction of the correct rotation value obtained by the DCNN on the validation set at different epochs during training. A single representative $\Omega$ is used here, but similar behaviours are observed for all the other rotation values. At the end of the training the PDF becomes close to a delta-like distribution, suggesting a very good accuracy of the network, with errors occurring only in a few cases out of one thousand, and always remaining close to the correct value.
III.2 Bayesian Inference
The second statistical inference approach considered in this work is based on Bayes' theorem aster2018parameter; efron2013250; mackay1992bayesian. The idea is to infer properties of the flow assuming the knowledge of some underlying probability distributions. More specifically, the theorem tells us that the posterior probability, $P(\Omega\,|\,O)$, of $\Omega$ given a measure of another generic observable, $O$, can be estimated as follows:
\[
P(\Omega\,|\,O) = \frac{P(O\,|\,\Omega)\,P(\Omega)}{P(O)}, \qquad (5)
\]
where $P(\Omega)$ is the prior probability, in our case equal to $1/10$ for all $\Omega$ because the dataset is composed of the same number of planes for each of the ten values of $\Omega$. $P(O\,|\,\Omega)$, instead, is the likelihood of measuring the generic observable $O$ for a fixed $\Omega$. Once the posterior probability in Eq. (5) is known, a first Bayes estimate can be obtained as:
\[
\tilde{\Omega} = \sum_{\Omega} \Omega\, P(\Omega\,|\,O), \qquad (6)
\]
which corresponds to the mean value over the posterior probability. A second possible estimate is the most likely $\Omega$ given $O$, namely:
\[
\tilde{\Omega} = \arg\max_{\Omega} P(\Omega\,|\,O) = \arg\max_{\Omega} P(O\,|\,\Omega)\,P(\Omega), \qquad (7)
\]
where in the last step the denominator of (5) is neglected because it does not affect the maximization over $\Omega$.
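Given likelihoods estimated as normalized histograms on the training set, the two estimators of Eqs. (6) and (7) reduce to a few lines. This is a schematic numpy version with a uniform prior, as in the setup above; the variable names are our own:

```python
import numpy as np

def bayes_estimates(likelihoods, omegas, obs_bin):
    """Posterior-mean (Eq. 6) and maximum-a-posteriori (Eq. 7) estimates.

    likelihoods : array (n_omega, n_bins), rows are histograms P(O | Omega)
    omegas      : array (n_omega,), the candidate rotation values
    obs_bin     : index of the bin where the measured observable O falls
    """
    like = likelihoods[:, obs_bin]          # P(O | Omega) for each Omega
    post = like / like.sum()                # uniform prior P(Omega) cancels
    mean_est = np.dot(omegas, post)         # Eq. (6): posterior mean
    map_est = omegas[np.argmax(post)]       # Eq. (7): most likely Omega
    return mean_est, map_est
```

Because the prior is uniform, the normalization by the evidence $P(O)$ cancels out of both estimators, which is why the code never needs $P(O)$ explicitly.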
For our specific setup we have chosen two different observables that could be relevant for the inference of the correct $\Omega$ and, at the same time, are accessible from the analysis of the dataset. The first is the mean energy and the second is the mean squared gradient of the velocity module, where the angular brackets $\langle\cdot\rangle$ denote the average over the plane surface. For each of these observables we have calculated the likelihood from the thousand planes available for each $\Omega$ of the training dataset; namely, we have measured the likelihood of each of the two observables, as well as their joint likelihood, on the training set. At this point, measuring the same observables on the validation dataset, and knowing the likelihoods for each $\Omega$, we can estimate the posterior probability (Eq. (5)) and then infer $\Omega$ (Eq. (6)) on the validation planes.

Before moving to the results, to highlight the difficulties of this problem, in the left panel of Fig. 7 we show a scatterplot of the fourth- vs the eighth-order moments of the velocity module for each of the thousand planes of the validation set. Different colours indicate different rotation values, as indicated by the arrows and labels, while the intensity of the colours is proportional to the density of points accumulated in that region. From the scatterplot we can differentiate three main clusters: the first is composed of the planes in the direct cascade regime, while the other two are made of planes from simulations around or above the transition. Inside these three main clusters it is not possible to distinguish any specific $\Omega$. Furthermore, the moments' amplitude is not a monotonic function of $\Omega$. The right panel of Fig. 7 shows the joint likelihood of the two observables introduced above, mean energy and mean squared gradients. From this plot we can see that measuring gradients does not provide a better statistical separation of the different $\Omega$, and only the points below/above the transition can always be inferred. In the next section we present and discuss the results of inference via the DCNN and the standard Bayesian approach.
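The two observables entering the likelihoods can be computed directly from each plane of the velocity module. A small sketch, taking the mean energy as $\frac{1}{2}\langle u^2 \rangle$ and using a periodic finite-difference gradient of our own choosing:

```python
import numpy as np

def plane_observables(u_mod, dx=1.0):
    """Mean energy and mean squared gradient of a 2d velocity-module plane."""
    energy = 0.5 * np.mean(u_mod**2)
    # periodic forward differences for the in-plane gradient
    gx = (np.roll(u_mod, -1, axis=0) - u_mod) / dx
    gy = (np.roll(u_mod, -1, axis=1) - u_mod) / dx
    grad2 = np.mean(gx**2 + gy**2)
    return energy, grad2
```

The mean energy is dominated by the large scales, while the mean squared gradient weights the small scales, which is exactly the large-scale/small-scale complementarity exploited by the joint likelihood.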
IV Results
As already pointed out, guessing the correct value of $\Omega$ from the observation of the velocity planes is a challenge far beyond human vision capabilities. To appreciate this, in Fig. 8 we present a random selection of planes extracted from our validation dataset. Above each panel we report the exact $\Omega$ value of the simulation, and the corresponding predictions obtained by the DCNN analysis and by the Bayes inference using the joint likelihood. These first results already suggest that the DCNN prediction is closer to the correct $\Omega$ than the Bayes one. In particular, the network's uncertainties (the neural network outputs a real number and not a class) are generally small enough to allow the correct classification of the corresponding $\Omega$; on the contrary, the Bayes inference leads to a misclassification of the rotation parameter.
A quantitative comparison of the two techniques is presented in Fig. 9, where in the four panels we report scatterplots between the correct rotation value, $\Omega$, and the corresponding prediction, $\tilde{\Omega}$, obtained via the DCNN (top left panel) and via the Bayes inference with the three different likelihoods, namely the one based on the mean energy (top right panel), the one based on the mean squared gradients (bottom left panel), and the joint likelihood combining energy and gradients (bottom right panel). The four panels are all obtained from the same validation dataset of roughly one thousand planes for each of the ten $\Omega$ values. The symbol size and colour intensity are proportional to the density of points in the specific region of the scatterplot, while the open black hexagons represent the mean value measured over all planes for each $\Omega$. The DCNN estimates are the most accurate, both in terms of mean and variance, lying most of the time close to the identity (dotted) line that indicates the perfect correspondence between $\Omega$ and $\tilde{\Omega}$. From Fig. 9 we can conclude that the Bayesian inference based on the mean-energy likelihood only allows one to classify values as below or above the transition (top-right panel), while the same analysis repeated with the mean-squared-gradients likelihood is not even enough to classify that transition. This can be explained by the fact that the mean energy is a larger-scale observable than the mean squared gradients; hence it better highlights the formation of the large-scale vortices. On the other hand, by combining the information at large and small scales, the joint-likelihood Bayes estimate improves its prediction quality: even if it does not allow one to distinguish close $\Omega$ values, it at least always classifies values as below the transition, close to the transition, or far above it.

A more detailed comparison is presented in Fig. 10, where in ten different panels we report the PDFs of the DCNN and Bayes predictions, for all $\Omega$, over all planes of the validation set. Each PDF is normalized to the corresponding rotation value, such that a value of 1 indicates the correct prediction. The Bayes inference reported in this figure is based on the joint likelihood, which, as already shown, gives the higher accuracy. The PDFs for the predictions in the direct cascade regime show that both approaches have difficulties in distinguishing the two cases. This is not surprising: before the transition no large-scale coherent structures are expected to appear in the flow, and the resulting field should be almost equivalent for all rotation rates, similar to the non-rotating homogeneous and isotropic case. At the same time, comparing the two PDFs before the transition, we notice that while the Bayes estimate always peaks at the centre between the two values, the DCNN is correctly peaked around one in both cases, suggesting that rotation affects the flow even before the inverse energy transfer is established in the dynamics.
For all the other $\Omega$ values after the transition, and in particular at intermediate rotation rates, the DCNN accuracy is always better than that of the classical Bayesian approach. The final comparison is presented in Fig. 11, which summarizes what was already shown by the previous PDFs. Here we report the mean difference between the predicted and the true value for all the different $\Omega$ classes considered. The average is again measured over all estimates on all validation planes of each of the ten classes, and the errors are estimated as the root mean square of the same quantity. In this figure we can see, as observed before, that the mean DCNN prediction is, for all $\Omega$, closer to the true value: the black square dots are always closer to zero than the other symbols reporting the Bayes inference. At the same time, the standard deviation around the mean is also always smaller for the DCNN than for the Bayesian results. This final observation allows us to conclude that the classification produced by the neural network is in all respects more accurate than the one achieved with the Bayesian approach.
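The summary statistics of Fig. 11 amount to computing, for each $\Omega$ class, the mean of the prediction error and its root-mean-square fluctuation over the validation planes; schematically (function and variable names are our own):

```python
import numpy as np

def per_class_error_stats(omega_true, omega_pred):
    """Mean prediction error and its root-mean-square for each Omega class,
    computed over all validation planes belonging to that class."""
    stats = {}
    for om in np.unique(omega_true):
        d = omega_pred[omega_true == om] - om   # per-plane prediction errors
        stats[float(om)] = (d.mean(), np.sqrt((d**2).mean()))
    return stats
```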
V Conclusions
We have implemented a Deep Convolutional Neural Network (DCNN), inspired by the computer vision community, to analyze data of turbulent flows under rotation, with the goal of inferring the rotation frequency of the reference frame from the resulting flow velocity field. To put ourselves in a realistic scenario, we assumed to have only a limited knowledge of the flow; in particular, we imagined having access only to a bidimensional cut of the full 3d domain, and only to the velocity at a fixed time during the evolution. As argued, these two assumptions make the problem difficult enough that no other known tool of analysis is able to achieve this goal under the same conditions, and they allowed us to show how Machine Learning can push forward our data-analysis capabilities. Indeed, the results showed that even a simple, properly trained DCNN is able to solve this problem with unprecedented accuracy. For comparison, the only other possible way to repeat the same inference without using temporal information was a standard Bayesian inference based on different observables; for us, the natural choice, given the type of problem and the available data, was to measure moments of the velocity field and of the velocity gradients. As discussed, all results have highlighted the potential of the DCNN, which was able to outperform every other inference attempt. As briefly discussed in the introduction, an unsupervised classification of the same dataset has been attempted using a GMM classification algorithm; however, no relation between the identified classes and the rotation value has been found. Indeed GMM, as more generally k-means algorithms lee1999learning, are mainly sensitive to the number of vortices inside each plane and to their spatial location, hence to observables that are not closely connected to the rotation value and can only be used to distinguish the dynamics above and below the transition threshold from direct to inverse cascade. For this reason a detailed analysis using this tool has been omitted from the paper. From this study we want to convey two main messages. The first, from a physics point of view, is that Machine Learning provides tools that can be highly beneficial for the data analysis of complex systems; the work presented in corbetta2021deep is another example. The second is to show how physical systems can be ideal testing grounds for new ML algorithms, where they can be evaluated objectively and compared quantitatively on much more challenging tasks than general object recognition or classification. Far be it from us to claim that ML always offers the best solution to data analysis; rather, we aim to provide an example of how ML and physics can benefit from each other, hoping that this work will lead to further and deeper investigations in this direction.
Acknowledgement
The authors thank Prof. Roberto Benzi for many useful discussions. This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 882340), and by the project Beyond Borders (CUP): E84I19002270005 funded by the University of Rome Tor Vergata.