1 Introduction
Particle Image Velocimetry (PIV) is one of the most popular measurement techniques in experimental fluid dynamics, and is also used to diagnose flow information from the remote sensing of large-scale environmental flows. The method provides quantitative measurements of velocity fields in fluids that can be used to explore complex flow phenomena. When applying PIV, the fluid under investigation is seeded with sufficiently small tracer particles (or the presence of naturally occurring features is exploited). These particles are assumed to follow the flow dynamics. With illumination (in the laboratory often through the use of lasers to capture image information over a two-dimensional plane), the particles in the fluid become visible. By comparing the resulting flow images between time levels, velocity field information can be inferred [1]. There are two main techniques used for performing classical PIV: cross-correlation and variational optical flow methods.
The development of deep learning techniques has inspired a new direction for tackling PIV-like problems. Several authors have proposed and demonstrated supervised learning based methods for PIV. However, because a broad range of reliable ground truth training data is unavailable, supervised learning methods have limitations, especially when seeking to generalise to real-world problems. Unsupervised learning, on the other hand, is a type of machine learning approach that looks for previously undetected patterns in a dataset with no pre-existing labels and with minimum human supervision [2]. In this paper we propose a new fluid velocity estimation method using an unsupervised learning strategy based upon particle images.
1.1 Cross-correlation and variational optical flow methods
There are two main standard approaches for performing particle image velocimetry: cross-correlation and optical flow methods. The cross-correlation method calculates a displacement by searching for the maximum cross-correlation between two interrogation windows from an image pair [3], for example in the WIDIM (window deformation iterative multigrid) method. The cross-correlation method is efficient and relatively easy to implement. However, it only outputs a spatially sparse (compared to the resolution of the seed particles in the fluid) displacement field and requires post-processing. The variational optical flow method was proposed by Horn and Schunck (HS) [4]. It is a motion estimation approach that has been applied to PIV problems [5]. It treats PIV as an optimisation problem, seeking the minimisation of an objective function. The method can output a dense displacement field, but the optimisation process is time-consuming.
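The core step of the cross-correlation approach can be illustrated with a minimal NumPy sketch. It is not the WIDIM algorithm (there is no window deformation, multigrid iteration or sub-pixel peak fitting); it only locates the integer shift that maximises the circular cross-correlation between two interrogation windows. The function name and the use of an FFT-based correlation are assumptions made for illustration.

```python
import numpy as np


def cross_correlation_displacement(window_a, window_b):
    """Return the integer (dy, dx) shift that best maps window_a onto window_b.

    Both windows are 2-D arrays of equal shape. The correlation is circular,
    so the estimate is only meaningful for shifts below half the window size.
    """
    a = window_a - window_a.mean()
    b = window_b - window_b.mean()
    # Cross-correlation theorem: corr(s) = sum_x a(x) * b(x + s).
    corr = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peak indices beyond the half-window correspond to negative shifts.
    shift = [p if p <= n // 2 else p - n for p, n in zip(peak, corr.shape)]
    return int(shift[0]), int(shift[1])
```

Applying this to every interrogation window of an image pair yields one vector per window, which is why the resulting displacement field is sparse relative to the image resolution.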
1.2 Deep learning methods
Machine learning methods, especially deep learning, have made great progress in applications to many real-world problems in recent years. Deep learning has only recently been introduced in the PIV community. In [6], the authors provided a proof-of-concept in which artificial neural networks were designed to perform end-to-end PIV for the first time.
PIV techniques are closely related to computational photography, a subdomain of computer vision. In this community, there are several important works related to the motion estimation problem using deep learning. The FlowNetS and FlowNetC networks [7] were the first proposed for dense optical flow estimation. FlowNet2 [8], an extension of FlowNet, improves the optical flow estimation to a state-of-the-art level. In addition, a lighter-weight network, LiteFlowNet [9], has also been proposed. It achieves a similar level of accuracy to FlowNet2 while using fewer trainable parameters. Although the networks mentioned above have achieved excellent performance for estimating motion fields from consecutive image pairs, their application is generally limited to rigid or quasi-rigid motion. Therefore, it is of interest to explore the performance of these existing networks on particle image velocimetry problems.
2 Related work
Supervised and unsupervised learning are two different learning approaches. The key difference is that supervised learning requires ground truth data while unsupervised learning does not.
2.1 Supervised learning methods
End-to-end supervised learning using neural networks for PIV was first introduced by Rabault et al. in [6]. A convolutional neural network and a fully-connected neural network were trained to perform PIV on several test cases. That work provided a proof-of-concept for the research community. However, the trained model did not match the accuracy of traditional PIV methods, and the application scenarios considered were limited to relatively simple cases. Lee et al. [10] proposed a cascaded network architecture. The network was verified to produce results comparable to standard PIV methods (one-pass cross-correlation method, three-pass window deformation). However, it had larger computational costs and lower efficiency. Another deep architecture approach based on supervised learning was proposed by Cai et al. in [11]. In that work the authors developed a motion estimator, PIV-FlowNetS-en, based upon FlowNet. The estimator is able to extract features from particle images and output dense displacement fields. The model was evaluated on both synthetic and experimental data, and was shown to achieve good accuracy with high efficiency compared to correlation-based PIV methods such as the WIDIM method. Follow-up work introduced a more complex but lighter-weight network, PIV-LiteFlowNet-en [12], based on LiteFlowNet [9]. The model was shown to have the same level of performance as variational optical flow methods in terms of estimation accuracy, while showing advantages in terms of efficiency.
The supervised learning approach relies heavily on large volumes of training data. However, in most real-world scenarios, especially in fluid dynamics, there is no easily available ground truth data and/or it is extremely difficult to annotate the data accurately through human means. Although the use of synthetic data (e.g. based upon computational fluid dynamics studies) can help construct large annotated datasets, the gap between synthetic and real-world scenarios limits the generalisation abilities of the constructed networks. As a result, supervised learning based approaches may struggle when confronted with data from real-world problems.
2.2 Unsupervised learning methods
Unsupervised learning is a type of machine learning that, in contrast, looks for previously undetected patterns in a dataset with no pre-existing labels and with minimum human supervision.
To the best of our knowledge, there are no previous examples of approaches that tackle the PIV problem based on unsupervised learning. In the computer vision community, there is some previous work related to the use of unsupervised learning for optical flow estimation. Yu et al. [13] suggested an unsupervised method based on FlowNet in order to sidestep the limitations of synthetic datasets. They use a loss function that combines a data term (photometric loss) measuring photometric constancy over time with a spatial term (smoothness loss) modelling the expected variation of flow across the image. In [14], Meister et al. extended the work using a symmetric, occlusion-aware loss based on both forward and backward optical flow estimates. They also made use of iterative refinement by stacking multiple FlowNet networks. The model showed advantages and outperformed supervised learning on challenging realistic benchmarks.
Our work is inspired in part by Meister et al. [14]; we extend the unsupervised learning strategy to PIV problems, building our model based on LiteFlowNet instead of FlowNet. We trained our model on a synthetic PIV dataset generated by Cai et al. in [11]. Unlike the supervised strategy, we only use the particle image pairs in the dataset, and leave the ground truth motion data (which is used to generate the image pairs) for benchmarking purposes.
3 Method
Given a grayscale image pair $(I_1, I_2)$ as input, our goal is to estimate the forward flow field from $I_1$ to $I_2$, $\mathbf{V}^f = (u^f, v^f)$, where $u^f$ and $v^f$ are scalar velocity fields in two orthogonal directions. As we take the bidirectional estimate into consideration, the backward flow field is defined as $\mathbf{V}^b = (u^b, v^b)$. In section 3.1 we will introduce the unsupervised loss and how the loss is integrated for training. The network architecture will be described in section 3.2.
3.1 Unsupervised Loss
In the training process, the input only contains image pairs $(I_1, I_2)$, without the velocity field ground truth. Therefore, we use traditional optical flow measurements to evaluate our results. The total unsupervised loss is a combination of a photometric loss, a smoothness loss on the estimated flow, and a consistency loss between the forward and backward fields.
3.1.1 Photometric loss.
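The photometric loss is defined in terms of the difference between the first image and the warped second image using the forward flow field estimate, and the difference between the second image and the warped first image using the backward estimate. The bidirectional photometric loss is thus defined as the sum of two parts:
$$L_{photo}(I_1, I_2, \mathbf{V}^f, \mathbf{V}^b) = \sum_{\mathbf{x}} \left[ \rho\big(I_1(\mathbf{x}) - I_2^{warp}(\mathbf{x})\big) + \rho\big(I_2(\mathbf{x}) - I_1^{warp}(\mathbf{x})\big) \right] \tag{1}$$
where $I_2^{warp}$ is the second image warped using the forward flow estimate $\mathbf{V}^f$, $I_1^{warp}$ is the first image warped using the backward flow estimate $\mathbf{V}^b$, and $\rho$ is the generalized Charbonnier penalty function, $\rho(x) = (x^2 + \epsilon^2)^{\alpha}$, which is a differentiable, robust convex function [15]. The values of $\epsilon$ and $\alpha$ are held fixed in this work.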
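Image ‘backwarping’ is the key step when computing the photometric loss. In order to make backpropagation of the loss possible during the training process, we use the differentiable bilinear sampling scheme proposed in [16]. The basic idea is first to generate a sampling coordinate $\mathbf{X}_2$ in the target image $I_2$, using the coordinate field $\mathbf{X}_1$ and the flow field estimate $\mathbf{V}^f$. The coordinate can be described as $\mathbf{X}_2 = \mathbf{X}_1 + \mathbf{V}^f$, where $\mathbf{X}_1$ is the coordinate field for image $I_1$. A bilinear sampler is then used to construct the warped image $I_2^{warp}$ in terms of the coordinate $\mathbf{X}_2 = (x_2, y_2)$:
$$I_2^{warp}(\mathbf{x}) = \sum_{i=1}^{H} \sum_{j=1}^{W} I_2(i, j)\, \max\big(0, 1 - |x_2(\mathbf{x}) - j|\big)\, \max\big(0, 1 - |y_2(\mathbf{x}) - i|\big) \tag{2}$$
where $H$ and $W$ are the image height and width, and $I_2(i, j)$ is the intensity of $I_2$ at row $i$ and column $j$.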
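The backwarping step and the bidirectional photometric loss described in this subsection can be sketched in NumPy as follows. This is an illustrative, non-differentiable version under several assumptions: the function names are hypothetical, the flow is stored as an (H, W, 2) array of (u, v) pixel displacements, sampling coordinates are clamped to the image border, and the Charbonnier parameters `eps` and `alpha` are placeholder values rather than the ones used in this work.

```python
import numpy as np


def charbonnier(x, eps=0.01, alpha=0.4):
    """Generalised Charbonnier penalty (x^2 + eps^2)^alpha (placeholder eps, alpha)."""
    return (x ** 2 + eps ** 2) ** alpha


def backwarp(image, flow):
    """Bilinearly sample `image` (H, W) at the coordinates x + flow(x).

    flow[..., 0] is the x (column) displacement u and flow[..., 1] is the
    y (row) displacement v, both in pixels.
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling coordinates, clamped so that border pixels are replicated.
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


def photometric_loss(img1, img2, flow_fwd, flow_bwd):
    """Sum of the forward and backward Charbonnier-penalised warping residuals."""
    fwd = charbonnier(img1 - backwarp(img2, flow_fwd))
    bwd = charbonnier(img2 - backwarp(img1, flow_bwd))
    return fwd.sum() + bwd.sum()
```

In a training setting the same sampling would be written with a differentiable operator from the chosen deep learning framework, so that gradients can flow back through the warp to the flow estimate.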
3.1.2 Smoothness loss.
There are regions in the images that lack the necessary information. For example, there may be insufficient particles near image boundaries, as particles move out of the image area in the second frame or have not yet entered the image in the first frame. Therefore, to tackle the resulting ambiguities, a smoothness loss is included in our total unsupervised loss. To enhance the regularisation effects, we use a second-order smoothness constraint [17]:
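$$L_{smooth}(u, v) = \sum_{\mathbf{x}} \left[ \rho\big((\delta * u)(\mathbf{x})\big) + \rho\big((\delta * v)(\mathbf{x})\big) \right] \tag{3}$$
where $\delta$ represents a four channel filter ($x$, $y$ and two diagonals, see Fig. 1), $*$ denotes convolution, and $\rho$ is the Charbonnier penalty, applied to each filter channel and summed. Therefore, the process here is first to compute the convolution of the two flow components ($u$ in the $x$ and $v$ in the $y$ directions) with the four channel filter respectively, then compute their Charbonnier loss.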
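A minimal NumPy sketch of such a second-order smoothness term is given below. The filter coefficients of Fig. 1 are not reproduced in the text, so the sketch assumes the standard second-difference stencil [1, -2, 1] applied along the x, y and two diagonal directions; the function names and the Charbonnier parameters are likewise placeholders.

```python
import numpy as np


def second_order_differences(field):
    """Second differences of a 2-D field along x, y and both diagonals.

    Returns an array of shape (4, H - 2, W - 2) covering interior pixels only.
    The [1, -2, 1] stencil is an assumption standing in for the four channel
    filter of Fig. 1.
    """
    centre = field[1:-1, 1:-1]
    d_x = field[1:-1, :-2] - 2 * centre + field[1:-1, 2:]
    d_y = field[:-2, 1:-1] - 2 * centre + field[2:, 1:-1]
    d_diag = field[:-2, :-2] - 2 * centre + field[2:, 2:]
    d_anti = field[:-2, 2:] - 2 * centre + field[2:, :-2]
    return np.stack([d_x, d_y, d_diag, d_anti])


def smoothness_loss(flow, eps=0.01, alpha=0.4):
    """Charbonnier-penalised second differences of both flow components.

    flow has shape (H, W, 2); eps and alpha are placeholder values.
    """
    total = 0.0
    for component in (flow[..., 0], flow[..., 1]):
        diffs = second_order_differences(component)
        total += ((diffs ** 2 + eps ** 2) ** alpha).sum()
    return total
```

A second-order penalty of this kind leaves affine flow fields (uniform translation, shear, rotation) unpenalised, which is one reason it is preferred over a first-order term for fluid flows.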
3.1.3 Consistency loss.
The forward and backward flow estimates should be consistent, i.e. the forward flow is expected to be the inverse of the backward flow at the corresponding pixel in the second image. The sum of this pair of flow fields should therefore be zero, and similarly for the backward flow estimate. The consistency loss function can thus be defined as:
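$$L_{cons}(\mathbf{V}^f, \mathbf{V}^b) = \sum_{\mathbf{x}} \left[ \rho\big(\mathbf{V}^f(\mathbf{x}) + \mathbf{V}^b(\mathbf{x} + \mathbf{V}^f(\mathbf{x}))\big) + \rho\big(\mathbf{V}^b(\mathbf{x}) + \mathbf{V}^f(\mathbf{x} + \mathbf{V}^b(\mathbf{x}))\big) \right] \tag{4}$$
where $\mathbf{V}^f$ and $\mathbf{V}^b$ are the forward and backward flow estimates and $\rho$ is the Charbonnier penalty.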
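The forward-backward consistency check can be sketched as follows, again as a hedged NumPy illustration rather than the training code. It assumes flows stored as (H, W, 2) arrays of (u, v) pixel displacements, a Charbonnier penalty with placeholder parameters applied to the residuals, and border-clamped bilinear sampling; all names are hypothetical.

```python
import numpy as np


def sample(field, x, y):
    """Bilinearly sample a (H, W, C) field at float coordinates (x, y)."""
    h, w = field.shape[:2]
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    top = (1 - wx) * field[y0, x0] + wx * field[y0, x1]
    bottom = (1 - wx) * field[y1, x0] + wx * field[y1, x1]
    return (1 - wy) * top + wy * bottom


def consistency_loss(flow_fwd, flow_bwd, eps=0.01, alpha=0.4):
    """Penalise V_f(x) + V_b(x + V_f(x)) and the symmetric backward residual."""
    h, w = flow_fwd.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Look up each flow at the position the other flow maps the pixel to.
    bwd_at_target = sample(flow_bwd, xs + flow_fwd[..., 0], ys + flow_fwd[..., 1])
    fwd_at_target = sample(flow_fwd, xs + flow_bwd[..., 0], ys + flow_bwd[..., 1])
    residual_fwd = flow_fwd + bwd_at_target
    residual_bwd = flow_bwd + fwd_at_target

    def penalty(residual):
        return ((residual ** 2 + eps ** 2) ** alpha).sum()

    return penalty(residual_fwd) + penalty(residual_bwd)
```

For a perfectly consistent pair, such as a uniform translation and its negative, both residuals vanish and the loss reduces to its floor value set by `eps`.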
3.1.4 Final integrated loss.
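The final integrated loss, $L$, combines the above loss terms using weighted (with scalar weights $\lambda_1$, $\lambda_2$, $\lambda_3$) summation:
$$L = \lambda_1 L_{photo} + \lambda_2 L_{smooth} + \lambda_3 L_{cons} \tag{5}$$
where $L_{photo}$, $L_{smooth}$ and $L_{cons}$ are the photometric, smoothness and consistency losses respectively.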
3.2 Network architecture
3.2.1 UnLiteFlowNet-PIV.
Our network, named UnLiteFlowNet-PIV, is based on LiteFlowNet [9]. It extracts the features of the two images using a two-stream convolutional neural network (NetC) with shared weights. NetC has a pyramidal structure and encodes the image from full resolution to a sixth of that of the original. Then a decoder (NetE) performs cascaded flow inference (convolutional upsampling) with flow regularisation. The final flow estimate is upsampled to the original resolution using bilinear interpolation. In our work, we compute both the forward and backward flow in one estimation. The input for the forward flow estimation is $(I_1, I_2)$, and it is $(I_2, I_1)$ for the backward flow.
3.2.2 Training Loss.
The training loss function’s design is similar to that in FlowNet [7] and LiteFlowNet [9], and uses a multiscale resolution loss. It is the weighted sum of the estimation losses from each of the intermediate layers:
$$L_{train} = \sum_{i=1}^{N} w_i L_i \tag{6}$$
where $L_i$ is the loss function (5) evaluated at layer $i$ and $w_i$ is the corresponding weight. At each layer, the image pair is downsampled to compute the current layer’s loss. As the distance between pixels effectively changes after downsampling, the flow estimate is multiplied by the appropriate scaling factor, which is the ratio of the current resolution to the full image resolution. Here, $N = 6$, and $L_1$ indicates the loss at full resolution.
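The multiscale weighting and flow rescaling can be sketched as below. Several details are assumptions made only for illustration: each pyramid level is taken to halve the resolution of the previous one, the intermediate flow estimates are taken to be expressed in full-resolution pixel units, downsampling is done by average pooling, and `loss_fn` is a hypothetical callable standing in for the integrated unsupervised loss at a single level. The default weights are the per-layer values quoted in the training details.

```python
import numpy as np

# Per-layer weights quoted in the training details, full resolution first.
LEVEL_WEIGHTS = [12.7, 5.5, 4.35, 3.9, 3.4, 1.1]


def downsample(array, factor):
    """Average-pool the first two axes by an integer factor.

    The height and width must be divisible by `factor`.
    """
    h, w = array.shape[:2]
    shape = (h // factor, factor, w // factor, factor) + array.shape[2:]
    return array.reshape(shape).mean(axis=(1, 3))


def multiscale_loss(img1, img2, flows, loss_fn, weights=LEVEL_WEIGHTS):
    """Weighted sum of per-level losses.

    flows[i] is the flow estimate at level i, of shape (H / 2**i, W / 2**i, 2)
    and expressed in full-resolution pixels (an assumption of this sketch).
    loss_fn(img1, img2, flow) returns the scalar loss at one level.
    """
    total = 0.0
    for level, (flow, weight) in enumerate(zip(flows, weights)):
        factor = 2 ** level
        img1_level = downsample(img1, factor)
        img2_level = downsample(img2, factor)
        # A displacement of d full-resolution pixels spans d / factor pixels here.
        scaled_flow = flow / factor
        total += weight * loss_fn(img1_level, img2_level, scaled_flow)
    return total
```

The rescaling matters because the warping inside the per-level loss operates in the pixel units of the downsampled images; without it the coarse levels would be warped by displacements that are too large.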
4 Evaluation
4.1 PIV dataset
The dataset considered in this work was generated by Cai et al. [11]. The dataset contains 15,050 particle image pairs with the originating flow field ground truth data obtained from computational fluid dynamics simulations. There are eight different types of flow contained in the dataset, including ‘uniform’ flow, flow past a backward facing step (‘backstep’) and past a ‘cylinder’, both at a variety of Reynolds numbers, ‘DNS-turbulence’, sea surface flow driven by a quasi-geostrophic model (‘SQG’), etc. Detailed information on the dataset is provided in Table 3. In our work we use half of the dataset for training and the other half for testing.
4.2 Training details
We train the model for 40,000 iterations with a batch size of four image pairs using the Adam optimiser. The learning rate is kept at . The smoothness loss weight is , the consistency loss weight , photometric loss weight . The weights for different layers are set to [12.7, 5.5, 4.35, 3.9, 3.4, 1.1] as in [14], from the full resolution to the lowest level. The image pairs are normalised from a value range of 0–255 to 0–1 before being fed into the network.
4.3 Results
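Table 1 compares the accuracy of our model with previous work and different approaches, including classical PIV and state-of-the-art deep learning based methods. The results are evaluated on the PIV dataset, with the Averaged Endpoint Error (AEE) calculated for different flow types. To make the results easier to compare, we set the units of the AEE to pixels per 100 pixels. The AEE can be described as the $L_2$ norm of the difference between the flow estimate $\mathbf{V}^{est}$ and the flow ground truth $\mathbf{V}^{gt}$, averaged over the $N_{\mathbf{x}}$ pixels of the image:
$$AEE = \frac{1}{N_{\mathbf{x}}} \sum_{\mathbf{x}} \left\| \mathbf{V}^{est}(\mathbf{x}) - \mathbf{V}^{gt}(\mathbf{x}) \right\|_2 \tag{7}$$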
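For reference, the AEE can be computed in a few lines of NumPy. The sketch assumes flows stored as (H, W, 2) arrays in pixel units and reads "pixels per 100 pixels" as the plain AEE multiplied by 100; both the function name and that reading of the unit are assumptions.

```python
import numpy as np


def average_endpoint_error(flow_est, flow_gt, per_100_pixels=False):
    """Mean Euclidean distance between estimated and ground truth flow vectors.

    flow_est and flow_gt have shape (H, W, 2). With per_100_pixels=True the
    result is multiplied by 100 (assumed meaning of the unit used in Table 1).
    """
    error = np.linalg.norm(flow_est - flow_gt, axis=-1).mean()
    return 100.0 * error if per_100_pixels else error
```

Under this reading, a tabulated value of 10.1 corresponds to a mean endpoint error of about 0.1 pixels.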
4.3.1 Comparison to classical PIV.
It can be observed that our unsupervised model outperforms the classical correlation-based WIDIM method in almost all flow cases, especially for the challenging cases of DNS-turbulence and SQG. Although the unsupervised model does not outperform the Horn–Schunck (HS) optical flow method [4], the differences are relatively small. In addition, as mentioned above, the HS optical flow method requires a large amount of computational time to carry out the optimisation process, which results in low efficiency, especially when multiple image pairs need to be processed. Without considering the time to load images from disk, the computational time for 500 image pairs (256 × 256) using our UnLiteFlowNet-PIV is 10.17 seconds on an Nvidia Tesla P100 GPU, while the HS optical flow method requires roughly 556.5 seconds and WIDIM (with a window size of 29 × 29) requires 211.5 seconds on an Intel Core i7-7700 CPU [12]. Although the classical PIV methods are tested on a CPU, as shown in [6, 11] the speed improvements from running them on GPUs are limited. Therefore, efficiency is a great advantage for learning based methods compared to the classical approaches.
4.3.2 Comparison to deep learning PIV.
The unsupervised learning approach shows potentially significant advantages compared to state-of-the-art supervised learning methods. Figs. 3, 4 and 5 demonstrate comparisons between our fully unsupervised model UnLiteFlowNet-PIV and PIV-LiteFlowNet. PIV-LiteFlowNet [12] uses a similar network architecture to our UnLiteFlowNet-PIV, but is trained using a supervised learning strategy with ground truth data. Although the unsupervised UnLiteFlowNet-PIV never has access to the ground truth data during the training process, it still outperforms most supervised learning methods (PIV-NetS-noRef, PIV-NetS, PIV-LiteFlowNet), especially on difficult cases. Therefore, the unsupervised learning method with an accurate loss function shows competitive capabilities and often better performance compared to supervised methods.
PIV-LiteFlowNet-en [12] is an enhanced version of PIV-LiteFlowNet. It adds one additional layer at the end of NetE, which improves its inference ability but makes the network more complicated and heavier. We did not try to construct deeper networks in our work for brevity. There are ideas for improving the performance by stacking networks [8], which would also be an interesting avenue to explore in future work.
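Table 1: AEE (in pixels per 100 pixels) of each method on the PIV dataset, for each flow type.

| Methods | Backstep (train) | Backstep (test) | Cylinder (train) | Cylinder (test) | JHTDB channel (train) | JHTDB channel (test) | DNS turbulence (train) | DNS turbulence (test) | SQG (train) | SQG (test) |
|---|---|---|---|---|---|---|---|---|---|---|
| WIDIM [12] | - | 3.4 | - | 8.3 | - | 8.4 | - | 30.4 | - | 45.7 |
| HS optical flow [12] | - | 4.5 | - | 7.0 | - | 6.9 | - | 13.5 | - | 15.6 |
| PIV-NetS-noRef [11] | 13.6 | 13.9 | 19.8 | 19.4 | 24.6 | 24.7 | 50.6 | 52.5 | 51.9 | 52.5 |
| PIV-NetS [11] | 5.8 | 5.9 | 6.9 | 7.2 | 16.3 | 15.5 | 27.1 | 28.2 | 28.9 | 29.4 |
| PIV-LiteFlowNet [12] | 5.5 | 5.6 | 8.7 | 8.3 | 10.9 | 10.4 | 18.8 | 19.6 | 19.8 | 20.2 |
| PIV-LiteFlowNet-en [12] | 3.2 | 3.3 | 5.2 | 4.9 | 7.9 | 7.5 | 11.6 | 12.2 | 12.4 | 12.6 |
| UnLiteFlowNet-PIV | - | 10.1 | - | 7.8 | - | 9.6 | - | 13.5 | - | 19.7 |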
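Table 2: AEE (in pixels per 100 pixels) on the test dataset for models trained with different loss functions.

| Loss function | Backstep | Cylinder | JHTDB channel | DNS turbulence | SQG |
|---|---|---|---|---|---|
| $L_{photo} + L_{smooth} + L_{cons}$ (full loss) | 10.1 | 7.8 | 9.6 | 13.5 | 19.7 |
| $L_{photo} + L_{smooth}$ (no consistency loss) | 11.6 | 10.5 | 15.3 | 21.4 | 22.5 |
| $L_{photo} + L_{cons}$ (no smoothness loss) | 14.1 | 38.4 | 18.1 | 23.6 | 25.5 |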
4.3.3 Ablation study.
There are three components to the loss function as mentioned above. The contribution of each component to model performance is investigated here. Results are summarised in Table 2. The model is trained for 40,000 iterations with three different loss functions: $L_{photo} + L_{smooth} + L_{cons}$ (i.e. the full loss function), $L_{photo} + L_{smooth}$ (no consistency loss), and $L_{photo} + L_{cons}$ (no smoothness loss). The model trained using the full loss performs the best among the three on the test dataset. Removing either the smoothness or the consistency loss leads to worse performance on the test dataset considered here.
5 Conclusion
We present here the first work using an unsupervised learning approach for solving Particle Image Velocimetry (PIV) problems. The proposed unsupervised learning approach shows significant promise and potential advantages for fluid flow estimation. It yields competitive results compared with classical PIV methods as well as existing supervised learning based methods, and even outperforms them on some difficult flow cases. Furthermore, the unsupervised learning method does not rely on any ground truth data in order to train, which makes it extremely promising to generalize to complex realworld flow scenarios where ground truth is effectively unknowable, and thus represents a key advantage over supervised methods.
Table 3: Flow types contained in the PIV dataset.

| Metric Name | Description | Condition | Quantity |
|---|---|---|---|
| Uniform | Uniform flow | | 1000 |
| Backstep | Flow past a backward facing step | Re = 800 | 600 |
| | | Re = 1000 | 600 |
| | | Re = 1200 | 1000 |
| | | Re = 1500 | 1000 |
| Cylinder | Flow past a circular cylinder | Re = 40 | 50 |
| | | Re = 150 | 500 |
| | | Re = 200 | 500 |
| | | Re = 300 | 500 |
| | | Re = 400 | 500 |
| DNS-turbulence | Homogeneous and isotropic turbulent flow | - | 2000 |
| SQG | Sea surface flow driven by SQG model | - | 1500 |
| Channel flow | Channel flow provided by JHTDB | - | 1600 |
| JHTDB-mhd1024 | Forced MHD turbulence provided by JHTDB | - | 800 |
| JHTDB-isotropic1024 | Forced isotropic turbulence provided by JHTDB | - | 2000 |
Acknowledgements
The authors would like to acknowledge funding from the Chinese Scholarship Council and Imperial College London (a pump priming research award from the Energy Futures Lab, Data Science Institute and Grantham Institute – Climate Change and the Environment) that supported this work.
References
 [1] R. Adrian and J. Westerweel.: Particle Image Velocimetry. Cambridge University Press, (2011)
 [2] G. Hinton and T. Sejnowski.: Unsupervised Learning: Foundations of Neural Computation. MIT Press, (1999)
 [3] J. Westerweel.: Fundamentals of digital particle image velocimetry. Experiments in Fluids, vol. 23, no. 12, pp. 1379–1392 (1997)

 [4] B. Horn and B. Schunck.: Determining optical flow. Artificial Intelligence, vol. 17, no. 1–3, pp. 185–203 (1981)
 [5] P. Ruhnau, T. Kohlberger, C. Schnorr, and H. Nobach.: Variational optical flow estimation for particle image velocimetry. Experiments in Fluids, vol. 38, no. 1, pp. 21–32 (2005)
 [6] J. Rabault, J. Kolaas, and A. Jensen.: Performing particle image velocimetry using artificial neural networks: a proof-of-concept. Measurement Science and Technology, vol. 28, no. 12, p. 125301 (2017)
 [7] A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas, V. Golkov, P. Van der Smagt, D. Cremers, and T. Brox.: Flownet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2758–2766 (2015)

 [8] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox.: Flownet 2.0: Evolution of optical flow estimation with deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, (2017)
 [9] T. Hui, X. Tang, and C. Loy.: LiteFlowNet: A lightweight convolutional neural network for optical flow estimation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 8981–8989 (2018)
 [10] Y. Lee, H. Yang, and Z. Yin.: PIV-DCNN: cascaded deep convolutional neural networks for particle image velocimetry. Experiments in Fluids, vol. 58, no. 12, article number 171 (2017)
 [11] S. Cai, S. Zhou, C. Xu, and Q. Gao.: Dense motion estimation of particle images via a convolutional neural network. Experiments in Fluids, vol. 60, no. 4, article number 73 (2019)
 [12] S. Cai, J. Liang, Q. Gao, C. Xu, R. Wei.: Particle image velocimetry based on a deep learning motion estimator. IEEE Transactions on Instrumentation and Measurement (2019b)
 [13] J. Yu, A. Harley, and K. Derpanis.: Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In European Conference on Computer Vision ECCV 2016, pp. 3–10 (2016)
 [14] S. Meister, J. Hur, and S. Roth.: UnFlow: Unsupervised learning of optical flow with a bidirectional census loss. The Thirty-Second AAAI Conference on Artificial Intelligence (2018)
 [15] D. Sun, S. Roth, and M. Black.: A quantitative analysis of current practices in optical flow estimation and the principles behind them. International Journal of Computer Vision, vol. 106, no. 2, pp. 115–137 (2014)

 [16] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu.: Spatial transformer networks. In NIPS’15: Proceedings of the 28th International Conference on Neural Information Processing Systems – Volume 2, pp. 2017–2025 (2015)
 [17] C. Zhang, Z. Li, R. Cai, H. Chao, and Y. Rui.: As-rigid-as-possible stereo under second order smoothness priors. In European Conference on Computer Vision ECCV 2014, pp. 112–126 (2014)
 [18] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. Black, and R. Szeliski.: A database and evaluation methodology for optical flow. International Journal of Computer Vision, vol. 92, no. 1, pp. 1–31 (2011)
 [19] Y. Li, E. Perlman, M. Wan, Y. Yang, C. Meneveau, R. Burns, S. Chen, A. Szalay, and G. Eyink.: A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence. Journal of Turbulence, vol. 9, no. 9, (2008)