1 Introduction
Experimental fluid dynamics requires a measurement system to observe the flow in a fluid. The main approach to reconstructing the dense flow field in a (transparent) volume of fluid is to inject neutrally buoyant tracer particles, tune the illumination to obtain high contrast at particle silhouettes, and record their motion with multiple high-speed cameras. The flow is then reconstructed either by Particle Tracking Velocimetry (PTV) [1], meaning that one follows the trajectories of individual particles and interpolates the motion in the rest of the volume; or by Particle Image Velocimetry (PIV) [2, 3], where one directly estimates the local motion everywhere in the volume from the displacements of nearby particle constellations. Early approaches operated essentially in 2D, focusing on a thin slice of the volume that is selectively illuminated with a laser light sheet. More recently, it has become common practice to process entire volumes and recover actual 3D fluid motion. A popular variant is to directly lift the problem to 3D by discretizing the volumetric domain into voxels. The standard approach is to process the multi-view video in two steps. First, a 3D particle distribution is reconstructed independently for each frame, often by means of tomographic PIV (TomoPIV) [4, 5, 6, 7]
, and stored as an intensity/probability volume representing the particle-in-voxel likelihood. A second step then recovers motion vectors by densely matching the reconstructed 3D particles (respectively, particle likelihoods) between consecutive time frames. The most widely used approach for the latter step is local matching of relatively large interrogation volumes, followed by various post-processing steps, e.g., [4, 8, 9]. In computer vision terminology, this corresponds to the classic Lucas-Kanade (LK) scheme for optical flow computation
[10]. While LK is no longer state-of-the-art for 2D flow, it remains attractive in the 3D setting, due to its simplicity and to the efficiency inherent in a purely local model that lends itself to parallel computation. Optical flow, i.e., dense reconstruction of apparent motion in the 2D image plane, is one of the most deeply studied problems in computer vision. Countless variants have been developed and have found their way into practical applications, such as medical image registration, human motion analysis and driver assistance. The seminal work of Horn and Schunck [11], and its contemporary descendants, e.g., [12, 13, 14], address the problem from a variational perspective, a setting we also follow here. Flow estimation from images is locally under-constrained (aperture problem). Imperfect particle reconstructions and the ambiguity between different, visually indistinguishable particles further complicate the task. The usual remedy for such an ill-posed inverse problem is to introduce prior knowledge that further constrains the result and acts as a regularizer. Indeed, modeling the problem in the volumetric domain opens up the possibility to incorporate physical constraints into the model; e.g., one can impose incompressibility of the fluid. These advantages were first pointed out in [15]. The authors of [16] were among the first to exploit physical properties of fluids, however, only as a post-process to refine an initial solution obtained without taking the physics into account. Following our own previous work [17], we base regularization on the specific properties of liquids and propose an energy whose optimality conditions correspond to the stationary Stokes equations of fluid dynamics. This strictly enforces a divergence-free flow field and additionally penalizes the squared gradient of the flow, making explicit the viscosity of the fluid.
Figure 1 depicts our basic setup: a two-pulse variational flow estimation approach starting from initial 3D reconstructions.
By operating densely in the 3D domain, our approach stands in stark contrast with methods that build sparse particle trajectories over multiple time steps from image-domain data [18]. There, 3D information is used only for matching, in particular to constrain the local search for corresponding particles over time, while regularization is introduced by demanding smooth 2D trajectories. Physical properties like incompressibility are much harder to model without considering 3D spatial proximity. Hence, these methods have to fit a dense, volumetric representation to the discovered sparse particle tracks in a separate step. There, relevant physical constraints [19, 20] can be included, yet they can no longer influence the particles' 3D motion. While multi-frame analysis is certainly well-suited to improve the accuracy and robustness of PIV over long observation times, we argue that the basic case of two (or generally, few) frames is better captured by dense flow estimation in 3D.
The proposed variational energy naturally fits modern proximal optimization algorithms [17], and it incurs only little memory overhead compared to conventional window-based tracking in 3D [4, 8] without a-priori constraints. In particular, we have shown in [17] that the sparsity of the particle data and the relative smoothness of the 3D motion field require the highest resolution only for matching. The 3D motion field can therefore be parameterized at a lower resolution, such that the whole optimization part requires about one order of magnitude less memory than storing the intensity volume.
According to the classical variational formulation [11], matching should directly consider the smallest entities (pixels in 2D, voxels in 3D) in the scene. However, particles are sparsely distributed and highly ambiguous, so one rather compares a constellation of particles in a larger interrogation volume, trading matching accuracy for robustness. Note that, compared to purely local methods [4, 8], our global model also allows us to use a smaller neighborhood size for matching.
A popular alternative to matching raw spatial windows is to compress the local intensity patterns into descriptors, e.g., [21, 22]. A descriptor can be designed to support robust matching of different entities, like keypoints in different views [23] or pedestrians [22], or to provide robustness against illumination changes [24]. Descriptors also improve optical flow computation in several real-world settings [25]. Here, we construct our own descriptor specifically for 3D-PIV, which seeks to balance precision and robustness.
Our descriptor can be constructed on-the-fly and does not require storage of a high-resolution intensity volume; rather, it works directly on the locations and intensities of individual particles. Recall that the resolution of the intensity voxel grid [4, 8, 17] is directly coupled to the maximal precision of the estimated motion – this dependency is overcome by the new descriptor. At typical particle densities, constructing descriptors on-the-fly saves about three orders of magnitude in memory (compared to a conventional voxel volume), while attaining comparable precision of the motion field. Moreover, it allows one to fully exploit our memory-efficient optimization procedure.
Although a particle-based representation can also be extracted from tomographic methods [26], it appears more natural to directly reconstruct the particles from the multi-view image data. Iterative particle reconstruction (IPR) methods, e.g., [27], deliver exactly that, and often outperform TomoPIV methods, especially in particle localization accuracy [27]. Building on these ideas, we pose IPR as an optimization problem. Since such a direct energy formulation is highly non-convex, we add a proposal generation step that instantiates new candidate particles. That step is alternated with energy optimization. The energy is designed to explain intensity peaks that are consistent across all views, while at the same time discarding spurious particles that are not supported by sufficient evidence, with the help of a sparsity prior. Especially at high particle densities, our novel IPR-based technique outperforms a tomographic baseline in terms of both recall and precision of particles (respectively, the number of missing and ghost particles), which in turn leads to significantly improved fluid motion estimates.
We also present an in-depth analysis of our IPR-based particle reconstruction model, in which we analyze the effect of particle quality and density on the motion estimation, and quantitatively study the influence of the main parameters, including different layouts of the sparse descriptor.
2 Method
We follow Horn and Schunck [11] and cast fluid motion estimation directly as a global energy minimization problem over a (cuboid) 3D domain. Emerging from variational principles, the energy of this inverse problem can be divided into a data fidelity term and a regularizer. As input, the basic model requires a set of 3D particles, reconstructed individually for each time step; see Figure 2. It is described in more detail in Sec. 2.1.
As a baseline for particle reconstruction we utilize a well-known tomographic reconstruction technique (MART) [4], which we briefly describe at the start of the technical section. Afterwards, we present the novel, energy-based IPR solution, which combines a simple image-based data term with a sparsity prior. At this point we also detail the optimization procedure, which alternates between particle generation and energy minimization. Then follows an explanation of the proposed sparse descriptor, which exploits the low density of the particles within the fluid, in Sec. 2.2.
The second part, Sec. 2.3, describes the volumetric model for fluid motion estimation. We start with the regularizer which, following [17], is based on the physical properties of the fluid, such as incompressibility and viscosity. Next, we review the standard sum-of-squared-distances (SSD) data cost, which we use as a baseline in our framework, and show how our sparse descriptor can instead be integrated into the energy function as data cost. Importantly, the energy is constructed to be convex, in spite of the rather complex data and regularization terms. Hence, it is amenable to a highly efficient optimization algorithm, which we describe at the end of Sec. 2.3.
2.1 3D Reconstruction
The first objective of our pipeline is to reconstruct a set of particles, consisting of 3D positions and associated intensities, that together define the data term of the energy function. This is in contrast to voxel-based intensity volumes, e.g., [4]. Such dense volumetric representations can be obtained by tomographic reconstruction methods, as traditionally used for 3D-PIV. We compare two techniques to obtain the set of particles: a tomographic method built on MART, and an iterative particle reconstruction (IPR) [27] method that directly obtains sparse 3D particles and intensities from the 2D input images. To extract 3D particle locations and intensities in continuous space from the tomographic MART reconstruction, we perform peak detection and subsequent sub-voxel refinement. Note that the mapping from dense to sparse is invertible: in our experiments we also utilize our IPR variant to generate a high-resolution intensity volume on which we define the voxel-based SSD data term [17], as a baseline for our sparse descriptor.
In the following, we assume that the scene is observed by calibrated cameras and define for each camera a generic projection operator. To keep the notation simple, we refrain from committing to a specific camera model at this point. However, we point out that fluid dynamics requires sophisticated models to deal with refraction, or alternatively exhaustive calibration to obtain an optical transfer function [28, 29]. Note that we reconstruct the particles individually for each time step, and denote the two consecutive time steps as t0 and t1.
MART.
Tomographic reconstruction assigns to each voxel of a regular grid an intensity value. A popular algebraic reconstruction technique, originally proposed for three-dimensional electron microscopy, is MART [30]. MART solves a convex optimization problem:
(1)  $\min_{E \ge 0} \; \sum_{v} E(v) \log E(v) \quad \text{s.t.} \quad \sum_{v \in N(p)} w(p,v)\, E(v) = I(p) \;\; \forall p.$
Here, $I(p)$ denotes the observed intensity at pixel $p$, $E(v)$ the unknown intensity of voxel $v$, and $N(p)$ is the set of all voxels traversed by the viewing ray through pixel $p$. The weight $w(p,v)$ depends on the distance between the voxel center and the line of sight, and essentially amounts to an uncertainty cone around the viewing ray, to account for aliasing of discrete voxels. For each pixel $p$ in every camera image, and for each voxel $v \in N(p)$, the following multiplicative update step is performed, starting from a uniform positive initialization:
(2)  $E(v) \leftarrow E(v) \cdot \left( \dfrac{I(p)}{\sum_{v' \in N(p)} w(p,v')\, E(v')} \right)^{\mu\, w(p,v)}$
After each MART iteration, we apply anisotropic Gaussian smoothing of the reconstructed volume, to account for elongated particle reconstructions along the depth axis due to the camera setup [31]. Furthermore, we found it advantageous to apply nonlinear contrast stretching before the subsequent particle extraction or flow computation.
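As an illustration, the multiplicative update (2) can be sketched on a toy system in pure Python. All variable names are ours; `rays` lists, for each pixel, the traversed voxels and their weights:

```python
def mart_update(E, rays, I, mu=1.0):
    """One MART sweep: for each pixel, multiply every voxel on its viewing
    ray by (observed / re-projected intensity) ** (mu * weight), as in (2)."""
    for i, ray in enumerate(rays):            # ray: list of (voxel_index, weight)
        proj = sum(w * E[j] for j, w in ray)  # current re-projection of pixel i
        if proj <= 0:
            continue
        for j, w in ray:
            E[j] *= (I[i] / proj) ** (mu * w)
    return E

# Toy example: two voxels observed by two "pixels" with unit weights.
E = [1.0, 1.0]                      # uniform positive initialization
rays = [[(0, 1.0), (1, 1.0)],       # pixel 0 sees both voxels
        [(0, 1.0)]]                 # pixel 1 sees only voxel 0
I = [3.0, 1.0]                      # observed pixel intensities
for _ in range(50):
    mart_update(E, rays, I)
```

On this toy system the iteration converges to the unique nonnegative solution consistent with both pixels.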
Given the intensity volume, sub-voxel accurate sparse particle locations are extracted: first, intensity peaks are found through non-maximum suppression with a local kernel, then we fit a quadratic polynomial to the intensity values in a small neighborhood around each peak.
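The peak extraction can be sketched in 1D (the volumetric case applies the same parabola fit along each axis); the function name and test signal are ours:

```python
import math

def subpixel_peaks(f, threshold=0.0):
    """Find local maxima of a 1-D intensity profile and refine each to
    sub-sample precision by fitting a parabola through the peak sample
    and its two neighbours."""
    peaks = []
    for i in range(1, len(f) - 1):
        if f[i] > threshold and f[i] >= f[i - 1] and f[i] > f[i + 1]:
            denom = f[i - 1] - 2 * f[i] + f[i + 1]
            # vertex of the fitted parabola, offset in (-0.5, 0.5)
            offset = 0.5 * (f[i - 1] - f[i + 1]) / denom if denom != 0 else 0.0
            peaks.append((i + offset, f[i]))
    return peaks

# A Gaussian blob centred at 4.3 should be recovered close to 4.3.
f = [math.exp(-((x - 4.3) ** 2) / 2.0) for x in range(10)]
peaks = subpixel_peaks(f, threshold=0.1)
```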
IPR.
As an alternative to memory-intensive tomographic approaches, IPR methods iteratively grow a set of explicit 3D particles. They are reported to achieve high precision and robustness, cf. [27]. In contrast to the effective, but somewhat heuristic, procedure of [27], our variant of IPR minimizes a joint energy function over a set of candidate particles to select and refine the optimal particle set. The energy minimization alternates with a candidate generation step, where putative particles are instantiated by analyzing the current residual images (the discrepancies between the observed images and the back-projected 3D particle set). Thus, in every iteration new candidate particles are proposed in locations where the image evidence suggests the presence of yet undiscovered particles. Energy minimization then continually refines particle locations and intensities to match the images, minimizing a data term $E_{\text{data}}$. At the same time, the minimization also discards spurious ghost particles, through a sparsity prior $E_{\text{sparse}}$. The overall energy is the sum of both terms, balanced by a parameter $\lambda$:

(3)  $E(x, e) = E_{\text{data}}(x, e) + \lambda\, E_{\text{sparse}}(e),$

where $x$ collects the particle locations and $e$ the particle intensities.
We model particles as Gaussian blobs and integrate the quadratic difference between prediction and observed image over the image plane of each camera $c$:

(4)  $E_{\text{data}}(x, e) = \sum_{c} \int_{\Omega_c} \big( [\pi_c(x, e)](y) - I_c(y) \big)^2 \, dy,$

where $[\pi_c(x,e)](y)$ denotes the projection of the particle set into camera $c$, evaluated at image point $y$.
The model in (4) assumes that particles do not possess distinctive material or shape variations. In contrast, we explicitly model each particle's intensity, which depends on the distance to the light source and changes only slowly. Apart from providing some degree of discrimination between particles, modeling the intensity allows one to impose sparsity constraints on the solution, which reduces the ambiguity in the matching process, cf. [32]. In other words, we prefer to explain the image data with as few particles as possible and set
(5)  $E_{\text{sparse}}(e) = \|e\|_0 = \#\{ p : e_p \neq 0 \}.$
The 0-norm in (5) counts the number of particles with non-zero intensity, and has the effect that particles whose intensity falls below a threshold (modulated by $\lambda$) get discarded immediately. In practical settings, the depth range of the observed volume is small compared to the distance to the camera. Consequently, the projection can be assumed to be (almost) orthographic, such that particles remain Gaussian blobs also in the projection. Omitting constant terms, we can simplify the projection in (4) to
(6)  $[\pi_c(x, e)](y) \approx \sum_{p} e_p\, G_\sigma\big( y - \pi_c(x_p) \big),$
such that we are only required to project the particle center $x_p$. In practice, we restrict the area of influence of (6) to a radius of a few standard deviations around the projected center. While still covering nearly all of a particle's total intensity, this avoids having to integrate particle blobs over the whole image in (3).
Starting from an empty set of particles, our particle generator operates on the current residual images

(7)  $R_c(y) = I_c(y) - [\pi_c(x, e)](y)$
and proceeds as follows. First, peaks are detected in all 2D camera images; the detection applies non-maximum suppression with a local kernel and subsequent sub-pixel refinement of peaks that exceed an intensity threshold. For one of the cameras, the reference view, we shoot rays through each peak and compute their entry and exit points of the measurement domain. In the other views, the reprojected entry and exit points yield epipolar line segments, which are scanned for nearby peaks. For each constellation of peaks that can be triangulated with a small reprojection error in all views, we generate a new candidate particle. The initial intensity is set depending on the observed intensity of the peak in the reference view, and on the number of candidate particles to which that same peak contributes: the peak intensity is distributed among all candidates originating from it.
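A minimal 2D sketch of the truncated blob rendering (6) and the residual images (7) that drive the generator; σ, the cutoff and all names are illustrative stand-ins, not the paper's values:

```python
import math

def render(particles, w, h, sigma=1.0, cutoff=3.0):
    """Splat each particle (px, py, intensity) as a truncated Gaussian
    blob into a w x h image; support restricted to cutoff*sigma."""
    img = [[0.0] * w for _ in range(h)]
    r = int(math.ceil(cutoff * sigma))
    for px, py, e in particles:
        cx, cy = int(round(px)), int(round(py))
        for y in range(max(0, cy - r), min(h, cy + r + 1)):
            for x in range(max(0, cx - r), min(w, cx + r + 1)):
                d2 = (x - px) ** 2 + (y - py) ** 2
                if d2 <= (cutoff * sigma) ** 2:
                    img[y][x] += e * math.exp(-d2 / (2 * sigma ** 2))
    return img

def residual(obs, pred):
    """Residual image: observation minus current prediction, cf. (7)."""
    return [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(obs, pred)]

obs = render([(5.0, 5.0, 2.0)], 11, 11)   # synthetic observation
pred = render([(5.0, 5.0, 1.5)], 11, 11)  # current (too weak) estimate
res = residual(obs, pred)                 # leftover evidence around (5, 5)
```

The positive residual around the particle center is exactly the kind of evidence from which new candidates would be proposed.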
We run several iterations, alternating proposal generation and energy optimization. Over time, more and more of the evidence is explained, such that the particle generation becomes less ambiguous and fewer candidates per peak are found. We start with a strict triangulation error threshold and increase it in later iterations.
To minimize the non-convex and non-smooth energy in (3) we apply (inertial) PALM [33, 34]. Being a semi-algebraic function [33, 35], our energy (3) fulfills the Kurdyka-Łojasiewicz property [36], so the sequence generated by PALM is guaranteed to converge to a critical point of (3). To apply PALM we arrange the variables in two blocks, one for the locations of the particles and one for their intensities, giving two vectors $x$ and $e$. The energy is already split into a smooth part (4) and a non-smooth part (5), where the latter is a function of one block only (the intensities $e$). In this way, we can directly apply the block-coordinate descent scheme of PALM. The algorithm alternates between the two blocks and executes the steps of a proximal forward-backward scheme: first, we take an explicit gradient step w.r.t. one block of variables on the smooth part, then a backward (proximal) step on the non-smooth part w.r.t. the same block of variables – where applicable.
That is, at iteration $k$, we perform updates of the form

(8)  $x^{k+1} = x^{k} - \tau_x \nabla_{\!x} E_{\text{data}}(x^{k}, e^{k}), \qquad e^{k+1} = \operatorname{prox}_{\tau_e \lambda \|\cdot\|_0}\big( e^{k} - \tau_e \nabla_{\!e} E_{\text{data}}(x^{k+1}, e^{k}) \big),$

with a suitable step size for each block of variables.
Throughout the iterations, we further have to ensure that the step size in the update of a variable block is chosen in accordance with the Lipschitz constant of the partial gradient of the smooth part at the current solution, e.g., for the block $x$:

(9)  $\big\| \nabla_{\!x} E_{\text{data}}(x_1, e) - \nabla_{\!x} E_{\text{data}}(x_2, e) \big\| \le L_x(e)\, \| x_1 - x_2 \|.$
In other words, we need to verify that the step size $\tau_u$ in (8), used for computing the update of a block $u \in \{x, e\}$, fulfills the descent lemma [37]:

(10)  $E_{\text{data}}(u^{k+1}) \le E_{\text{data}}(u^{k}) + \big\langle \nabla_{\!u} E_{\text{data}}(u^{k}),\, u^{k+1} - u^{k} \big\rangle + \tfrac{1}{2 \tau_u} \big\| u^{k+1} - u^{k} \big\|^2.$
Otherwise, the update step has to be recalculated with a smaller step size. In practice, the Lipschitz continuity of the gradient in (10) only has to be checked locally at the current solution. Hence, we employ the backtracking approach of [38], which is given in pseudocode in Alg. 1.
For simple proximal operators like ours, it is recommended to also increase the step sizes as long as they still fulfill (10), instead of only reducing them when necessary (lines 9, 15 in Alg. 1). Taking larger steps per iteration speeds up convergence. We also take inertial steps [34] in Alg. 1 (lines 5, 10). Inertial methods can provably accelerate the convergence of first-order algorithms for strongly convex energies, improving the worst-case convergence rate from O(1/k) to O(1/k^2) in the number of iterations k, cf. [38]. Although there are no such guarantees in our non-convex case, these steps reduce the number of iterations of our algorithm by about a factor of five, while leaving the computational cost per step practically untouched.
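A condensed sketch of the backtracking idea (accept a step only if the descent-lemma check (10) holds, shrink otherwise, and optimistically enlarge accepted steps); the inertial steps of Alg. 1 are omitted and all names and factors are ours:

```python
def backtracking_step(f, grad_f, u, tau=1.0, shrink=0.5, grow=1.2):
    """One explicit gradient step on the smooth part f, with the step size
    validated against the descent lemma; shrink on failure, and (as
    recommended) try a slightly larger step at the next iteration."""
    g = grad_f(u)
    while True:
        u_new = u - tau * g
        # descent lemma: f(u_new) <= f(u) + <g, u_new-u> + |u_new-u|^2/(2*tau)
        lhs = f(u_new)
        rhs = f(u) + g * (u_new - u) + (u_new - u) ** 2 / (2 * tau)
        if lhs <= rhs + 1e-12:
            return u_new, min(tau * grow, 1e6)   # accept, enlarge for next time
        tau *= shrink                            # reject, retry smaller

# Minimise f(u) = u^4: its gradient is not globally Lipschitz, so a fixed
# step would be unsafe, but backtracking adapts tau to the local constant.
f = lambda u: u ** 4
grad = lambda u: 4 * u ** 3
u, tau = 3.0, 1.0
for _ in range(200):
    u, tau = backtracking_step(f, grad, u, tau)
```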
The proximal step on the intensities can be solved pointwise. For a single intensity it can be written as the 1D problem

(11)  $e^{*} = \arg\min_{e} \; \lambda\, [e \neq 0] + \tfrac{1}{2\tau_e} (e - \hat e)^2,$

which admits a closed-form solution:

(12)  $e^{*} = \begin{cases} \hat e & \text{if } \hat e^{\,2} > 2 \tau_e \lambda, \\ 0 & \text{otherwise.} \end{cases}$
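The closed-form step (12) amounts to hard thresholding; a direct transcription (variable names ours):

```python
def prox_l0(e_hat, lam, tau):
    """Pointwise proximal step for the weighted 0-"norm": keep the
    tentative intensity e_hat if keeping it is cheaper than paying the
    sparsity penalty lam, otherwise discard the particle (set it to 0).
    Solves: argmin_e lam*[e != 0] + (e - e_hat)**2 / (2*tau)."""
    return e_hat if e_hat ** 2 > 2.0 * tau * lam else 0.0
```

Weak particles are thus removed in a single step rather than being slowly shrunk, as an L1 penalty would do.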
2.2 Sparse Descriptor
In PIV, particle constellations are matched between two time steps. To that end, an interrogation window at the location for which one wants to determine the velocity is considered as the local "descriptor". Velocity information is obtained from the spatial displacement between corresponding patches, typically found via (normalized) cross-correlation ((N)CC). For 3D-PIV this requires the storage of full intensity voxel volumes, like those obtained with tomographic reconstruction. Discretizing the particle information at this early stage has the disadvantage that the resolution of the voxel grid directly limits the maximal precision of the motion estimate. Especially when considering multiple time steps, storing intensity voxel volumes at high resolution is demanding on memory, although the actual particle data is sparse. Moreover, large window sizes are needed to obtain robust correspondences, which (over-)smoothes the output, trading precision for robustness. Notably, in practical 3D-PIV the required interrogation volumes are very large; e.g., Elsinga et al. [4] recommend rather large windows. For typical particle densities, such an interrogation window contains only 25 particles on average. Finally, IPR-type methods reportedly outperform MART in terms of reconstruction quality, and directly deliver sparse particle output.
These findings motivate us to investigate a new feature descriptor that works directly on the sparse particle data. In order to replace the conventional correlation window, our novel descriptor must be defined at arbitrary positions and lead to sub-voxel accurate measurements. To use the descriptor in our variational setup, we further demand that it be differentiable. Finally, the descriptor should be independent of the distance metric, such that we can use standard measures like SSD or (negative) NCC.
Similar to an interrogation window, the descriptor considers all particles within a specified distance of its center location. Its structure is defined by a radius, the arrangement of a set of vertices, and an integer count $k$. All particles within a distance smaller than the radius contribute to the descriptor vector. For each such particle, we further define the set of its $k$ nearest vertices, to which the intensity of the particle is distributed. We weight the contribution of a particle to a vertex with a radial distance function and normalize the weights induced by the particle to sum to one. In computer graphics, such a projection by weighted soft-assignment is called splatting. Apart from higher precision of the descriptor, cf. [21, 22], the splatting strategy leads to smoother descriptor changes when the nearest-neighbor set changes after a spatial displacement of the particle. More importantly, it guarantees that the descriptor is at least piecewise differentiable.
Note that, when arranging the vertices in a regular orthogonal grid and setting $k = 1$, the descriptor closely resembles a conventional interrogation volume. Instead, we propose to arrange the vertices on spherical shells with different radii. A two-dimensional illustration is shown in Figure 3; vertices are depicted in red and particles in blue.
Any descriptor is constructed to fulfill some set of desirable properties [39, 40, 41]. Designing one specifically for our application has the advantage that it can be tuned to the scenario at hand. In our case we do not seek rotational invariance, and all directions should be treated equally (i.e., the same vertex density at the poles and the equator). Further, our input is unstructured (in contrast to, for instance, laser-scan points that adhere to 2D surfaces). Similar to [41], we prefer to distribute the vertices uniformly on the surface of multiple spheres. Accordingly, we place them at the corners of an icosahedron (or subdivisions of it) for each sphere. We further seek equidistant spacing between neighboring spheres and choose the radii in accordance with the distances of vertices on the inner shell. Because particles close to the center are more reliable for its precise localization, we place more points towards the descriptor center, for a more fine-grained binning of close particles. In contrast, particles that are further away are more likely to have a different velocity and are splatted over a wider area; hence, these particles mainly serve to disambiguate the descriptor. The precision argument also suggests weighting particles close to the center more strongly than those further away. Similarly, we compensate for the fact that the vertices near the center represent a smaller volume than those further away by weighting their contributions accordingly. To evaluate our descriptor centered at a given location, we must find the particles within the descriptor radius of that location, and also the nearest vertices for each such particle. Recall that in our scheme we precompute a set of particles, consisting of locations and intensities, at both time steps. To accelerate the search for the particles, we construct a KD-tree [42] once for each time step.
In our low-dimensional search domain, with a limited number of particles (orders of magnitude fewer than the voxels of an equivalently accurate intensity volume), the complexity of the search is low. To find the nearest vertices for each contributing particle we use a mixed strategy: we partition the descriptor volume into a voxel grid and precompute the nearest vertices for the corners of the voxels, storing the nearest-neighbor set if it is unique. In this way, the NN search can be done in constant time in the large majority of cases. To handle the rare case of particles falling into voxels without a unique neighbor set, we precompute a KD-tree for the rather small set of vertices and query the neighbors of the affected particles. We use the KD-tree implementation proposed in [43]. Overall, we find that, in spite of the need to frequently query nearest neighbors, our descriptor can be evaluated efficiently for any realistic particle density, while saving a lot of memory.
We have tested different layouts and settled on a structure with six layers. The descriptor vertices are depicted in Figure 3 on the right. On each layer, points are positioned on an icosahedron or subdivisions of it. The point and layer arrangement is described in Table 1. In total, the descriptor is a vector with 331 dimensions (vertices).
We chose the descriptor radius empirically. Note that this leads to a larger influence area than the interrogation volume suggested in [17]. However, in [17] a cubic region is used for matching, whereas our descriptor has a spherical form. Furthermore, as stated above, particles closer to the descriptor center are weighted more strongly than those further away.
Finally, we splat the intensity value of each particle over its $k$ nearest vertices in the descriptor. Vertices are weighted according to their distance with a radial weighting function, and the weights of each particle are subsequently normalized to sum to one. We point out that an interrogation volume of comparable metric size, i.e., 17x17x17 voxels, would consist of 4913 voxels to represent the same particle distribution captured in our 331-dimensional descriptor.
Table 1: Point and layer arrangement of the sparse descriptor.

Layer | #Points | Radius | Distance to neighbor points | Distance to lower layer
------|---------|--------|-----------------------------|------------------------
  0   |    1    | 0      | -                           | -
  1   |   12    | 1      | 1.052                       | 1
  2   |   42    | 2.205  | 1.205                       | 1.205
  3   |   92    | 3.484  | 1.278                       | 1.278
  4   |   92    | 5.503  | 2.019                       | 2.019
  5   |   92    | 8.693  | 3.190                       | 3.190
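The soft-assignment splatting described above can be sketched as follows (pure Python; the two-vertex layout, `k`, `radius` and the Gaussian weighting are illustrative stand-ins for the shell layout of Table 1):

```python
import math

def splat(particles, vertices, k=4, radius=8.7, sigma=4.0):
    """Build a descriptor centred at the origin: every particle within
    `radius` distributes its intensity over its k nearest vertices, with
    radial weights normalised to sum to one per particle."""
    desc = [0.0] * len(vertices)
    for (px, py, pz), e in particles:
        if px * px + py * py + pz * pz > radius ** 2:
            continue                                  # outside descriptor support
        d2 = [(px - vx) ** 2 + (py - vy) ** 2 + (pz - vz) ** 2
              for vx, vy, vz in vertices]
        nn = sorted(range(len(vertices)), key=lambda i: d2[i])[:k]
        w = [math.exp(-d2[i] / (2 * sigma ** 2)) for i in nn]
        s = sum(w)
        for i, wi in zip(nn, w):
            desc[i] += e * wi / s                     # weights sum to 1 per particle
    return desc

# Two vertices, one in-range particle halfway between them: its full
# intensity is split evenly, and the total intensity is preserved.
vertices = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
desc = splat([((1.0, 0.0, 0.0), 3.0)], vertices, k=2)
```

Because the weights vary smoothly with the particle position, small displacements change the descriptor smoothly, which is what makes it usable inside a variational data term.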
2.3 Volumetric Flow
Motion estimation starts from two particle sets for the consecutive time steps t0 and t1, with particles located in the cuboid domain $\Omega$. Our goal is to reconstruct the 3D motion field $u$, which maps points inside the domain at t0 to points at t1 (not necessarily inside $\Omega$). Just like 2D optical flow, the problem is severely ill-posed if the estimation is based solely on the data term $E_{\text{data}}$. As usual in optical flow estimation, we thus incorporate prior information into the model, in the form of a regularizer $E_{\text{reg}}$. The optimal flow field can then be found by solving the energy minimization problem

(13)  $\min_{u} \; E_{\text{data}}(u) + E_{\text{reg}}(u).$
With the help of a convex approximation of the (non-convex) data term, e.g., a first-order or convexified second-order Taylor approximation, we can cast (13) as a saddle-point problem and minimize it with efficient primal-dual algorithms such as [44]. In spite of the high frame rate of PIV cameras, particles travel large distances between successive frames. We use the standard technique to handle such large motions and embed the computation in a coarse-to-fine scheme [12], starting from an approximation at lower spatial resolution and gradually refining the motion field. Downsampling the spatial resolution smoothes out high-resolution detail and increases the radius over which the linear approximation is valid.
We employ an Eulerian representation for the flow field $u$, although our sparse descriptor would equally allow for a Lagrangian representation, for instance employing Smoothed Particle Hydrodynamics [45, 46]. Naturally, we evaluate the data term on the same regular grid structure, and the proposed descriptor (Section 2.2) is used as a drop-in replacement for the interrogation volume of the classical TomoPIV setup.
Data Term.
Recall that we can apply any distance measure on top of our descriptor $D$. Here, we have tested the sum of squared differences (SSD) and the negative normalized cross-correlation (NCC). In previous evaluations for 3D-PIV [17], more robust cost functions that are popular in the optical flow literature [47] performed worse than both of the measures above, presumably due to the controlled contrast and lighting. Our (generic) similarity function $\psi$ takes two descriptor vectors as input and outputs a similarity score, which is integrated over the domain to define the data cost:

(14)  $E_{\text{data}}(u) = \int_{\Omega} \psi\big( D_{t_0}(x),\; D_{t_1}(x + u(x)) \big)\, dx.$
As a baseline we further define a data cost evaluated within an interrogation window in a high-resolution intensity voxel space (in our experiments, generated by MART). In [17] we showed that for high-accuracy motion estimates it is crucial to use a finer discretization for the data term than for the discretized Eulerian flow field. In that case the data cost takes the form

(15)  $E_{\text{data}}(u) = \int_{\Omega} \int B(x - y)\, \big( I_{t_0}(y) - I_{t_1}(y + u(x)) \big)^2 \, dy\, dx,$

where $B$ is a box filter and $I_{t_0}, I_{t_1}$ are intensity volumes, defined over the same cuboid domain but sampled at a different discretization level.
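A 1D sketch of the box-filtered SSD cost used as baseline (the volumetric version sums over three axes; names and signal values are ours):

```python
def ssd_window(I0, I1, x, u, w=1):
    """Box-filtered SSD for a candidate displacement u at grid point x:
    sum of squared intensity differences over a window of half-width w
    (1-D for brevity)."""
    cost = 0.0
    for dy in range(-w, w + 1):
        a = I0[x + dy]
        b = I1[x + dy + u]          # second volume, displaced by u
        cost += (a - b) ** 2
    return cost

# A profile shifted by 2 samples: the true displacement has zero cost.
I0 = [0, 0, 1, 4, 1, 0, 0, 0, 0]
I1 = [0, 0, 0, 0, 1, 4, 1, 0, 0]
best = min(range(0, 4), key=lambda u: ssd_window(I0, I1, 3, u))
```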
Regularization.
In the variational flow literature [11, 12, 14], regularization usually amounts to simply smoothing the flow field. The original smoothness term from [11] is quadratic regularization (QR); later, the more robust Total Variation (TV) [48, 13] was often preferred. An investigation in [17] revealed, however, that TV does not perform as well for 3D fluid motion. Hence, we only provide a formal definition of the QR regularizer:

(16)  $E_{\text{reg}}(u) = \alpha \int_{\Omega} \|\nabla u\|_2^2 \; dx,$

where the weight $\alpha$ balances the regularizer against the data term.
In this work, we are mainly concerned with incompressible flow, which is divergence-free. Prohibiting or penalizing divergence is thus a natural way to regularize the motion field.
Let us recall that the motion field $u$ approximates the displacement of particles over a time interval $\delta t$, the reciprocal of the camera frame rate, i.e., $u = \delta t \, v$, where $v$ denotes the velocity field. We now investigate the stationary Stokes equations:

(17)  $-\mu \Delta v + \nabla p = f, \qquad \operatorname{div} v = 0,$
where the generalized Laplace operator acts on each component of the velocity field individually. Equation (17) describes an external force field $f$ that deforms the fluid. The properties of the fluid, incompressibility and viscosity (viscosity constant $\mu$), prevent it from simply following $f$. To arrive at an equilibrium of forces, the pressure field $p$ has to absorb the differences between the force and velocity fields. Compared to the full Navier-Stokes model of fluid dynamics, (17) lacks the transport terms and, hence, does not consider inertia. Nevertheless, for our basic flow model with only two frames, (17) provides a reasonable approximation, because of the short time interval.
We can identify the stationary Stokes equations (17) as the Euler-Lagrange equations of the following saddle-point energy:

(18)  $\min_{v} \max_{p} \int_{\Omega} \frac{\mu}{2} \|\nabla v\|_2^2 - f \cdot v - p \operatorname{div} v \; dx.$

To come back to our time-discretized problem, and to justify the construction of a suitable regularizer for our motion field $u$, we choose the time unit such that $\delta t = 1$, without loss of generality (so that $u$ and $v$ coincide). In the saddle-point problem above, the role of the Lagrange multiplier for the incompressibility condition is taken by the pressure $p$. In our reconstruction task, we do not know the force field, but we do observe its effects, so the role of the force term $f \cdot v$ is filled by the data term. Hence, we can interpret the remaining terms as a regularizer. In particular, Eq. (18) suggests to utilize a quadratic regularizer on the flow field, and to employ the incompressibility condition as a hard constraint:
(19) 
Here, we let $\delta_C$ denote the indicator function of the convex set $C$ of divergence-free flow fields. This physically motivated regularization scheme is compatible with our optimization framework and leads to the best results on our data. Still, without the hard constraint, the regularizer (19) would be strongly convex, making it possible to accelerate the optimization [44], similar to Alg. 1. Therefore, we also consider a version where we keep the quadratic regularization of the gradients, but replace the incompressibility constraint with a soft penalty:

$R_{\mathrm{soft}}(u) = \frac{\mu}{2}\int_\Omega \|\nabla u\|_2^2 \,\mathrm{d}x + \frac{\lambda_{\mathrm{div}}}{2}\int_\Omega (\nabla\!\cdot u)^2 \,\mathrm{d}x.$ (20)

Larger values of the penalty weight $\lambda_{\mathrm{div}}$ lead to similar regularization effects as (19), but also require more iterations to reach convergence.
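A minimal sketch of the discrete divergence (backward differences, zero at the lower boundary) and the resulting soft penalty; names and boundary handling are illustrative, not the paper's exact discretization:

```python
import numpy as np

def divergence(u):
    """Divergence of a 3D flow field u of shape (X, Y, Z, 3),
    computed with backward differences per component."""
    div = np.zeros(u.shape[:3])
    for ax in range(3):
        comp = u[..., ax]
        d = np.zeros_like(comp)
        sl_hi = [slice(None)] * 3; sl_hi[ax] = slice(1, None)
        sl_lo = [slice(None)] * 3; sl_lo[ax] = slice(None, -1)
        # backward difference along axis `ax`
        d[tuple(sl_hi)] = comp[tuple(sl_hi)] - comp[tuple(sl_lo)]
        div += d
    return div

def soft_divergence_penalty(u, lam_div=1.0):
    """Soft incompressibility penalty: 0.5*lam_div * ||div u||^2."""
    return 0.5 * lam_div * np.sum(divergence(u) ** 2)

# A rigid translation is divergence-free and incurs no penalty.
print(soft_divergence_penalty(np.ones((6, 6, 6, 3))))   # → 0.0
```

A radially expanding field, by contrast, has nonzero divergence everywhere in the interior and is penalized accordingly.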
Estimation Algorithm.
To discretize the energy functional (13), we partition the domain $\Omega$ into a regular grid of $N$ voxels. Our objective is to assign a displacement vector $u_i \in \mathbb{R}^3$ to each voxel $i$, so $u \in \mathbb{R}^{3N}$. We briefly review the primal-dual approach [44] for problems of the form

$\min_u \; G(u) + F(Ku).$ (21)

Here, $K$ is a linear operator whose definition depends on the form of the regularizer. $G$ denotes the discretized version of the data term from (14) or (15), and $F$ one of the investigated regularizers from (16), (19), (20). In the case of QR, the linear mapping $K$ approximates the spatial gradient of the flow in each coordinate direction via finite differences, which leads to $K : \mathbb{R}^{3N} \to \mathbb{R}^{9N}$. If we base our regularizer also on the divergence of the flow field, either as hard (19) or as soft (20) constraint, we expand $K$ with a linear operator for the 3D divergence that is based on finite (backward) differences. We arrive at $K : \mathbb{R}^{3N} \to \mathbb{R}^{10N}$. In that case, the additional dual variables correspond to the incompressibility constraint, respectively penalty; they play the role of the pressure field in (17). For convex $F$ we can convert (21) to the saddle-point form
$\min_u \max_y \; \langle Ku, y\rangle - F^*(y) + G(u),$ (22)

where we use $F^*$ to denote the convex conjugate of $F$. In this form the problem can be solved by iterating the steps of a proximal forward-backward scheme, alternating between the primal variables $u$ and the dual variables $y$:

$y^{n+1} = \mathrm{prox}_{\sigma F^*}\!\big(y^n + \sigma K \bar u^n\big), \qquad u^{n+1} = \mathrm{prox}_{\tau G}\!\big(u^n - \tau K^\top y^{n+1}\big), \qquad \bar u^{n+1} = 2u^{n+1} - u^n.$ (23)

The step sizes $\tau$ and $\sigma$ are set such that $\tau\sigma\|K\|^2 \le 1$. In the saddle-point form, data and smoothness terms are decoupled and the updates of the primal and dual variables can be solved pointwise, cf. [44].
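To make the scheme concrete, here is a toy instance of the primal-dual iterations on a small quadratic problem, where the result can be checked against the closed-form solution. This is a sketch of the generic iterations of [44], not the paper's full solver; the matrix sizes and weights are arbitrary:

```python
import numpy as np

# Toy instance of min_x G(x) + F(Kx) with
#   G(x) = 0.5*||x - b||^2   (stand-in for the data term)
#   F(y) = 0.5*lam*||y||^2   (stand-in for the quadratic regularizer)
rng = np.random.default_rng(0)
K = rng.standard_normal((5, 4))
b = rng.standard_normal(4)
lam = 0.5

L = np.linalg.norm(K, 2)          # operator norm of K
tau = sigma = 0.9 / L             # tau*sigma*||K||^2 <= 1

x = np.zeros(4); x_bar = x.copy(); y = np.zeros(5)
for _ in range(5000):
    # dual step: prox of F*(y) = ||y||^2/(2*lam) shrinks by 1/(1+sigma/lam)
    y = (y + sigma * K @ x_bar) / (1.0 + sigma / lam)
    # primal step: prox of G has the closed form (z + tau*b)/(1+tau)
    x_new = (x - tau * K.T @ y + tau * b) / (1.0 + tau)
    x_bar = 2 * x_new - x         # over-relaxation
    x = x_new

# closed-form minimizer: (I + lam*K^T K) x = b
x_closed = np.linalg.solve(np.eye(4) + lam * K.T @ K, b)
print(np.allclose(x, x_closed, atol=1e-6))   # → True
```

Both proximal maps are pointwise shrinkage operations here, which is exactly what makes the scheme easy to parallelize over voxels.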
The proximal operator of $F^*$ for the quadratic terms is a pointwise operation that can be applied in parallel:

$\mathrm{prox}_{\sigma F^*}(z) = \frac{z}{1 + \sigma/\mu},$ (24)

with $z$ denoting the dual variables of the gradient part, except in the case of the soft constraint (20), where we additionally have $\mathrm{prox}_{\sigma F^*}(z_{\mathrm{div}}) = z_{\mathrm{div}}/(1 + \sigma/\lambda_{\mathrm{div}})$ for the divergence part. If we employ hard incompressibility constraints, the dual elements belonging to the divergence remain unchanged by the proximal map and are solely affected by the explicit gradient steps. For further details refer to [44, 47].
To discretize our data terms for the sparse descriptor, we use a simple first-order approximation, fitting a linear function in the vicinity of the current solution. This leads to the discretized data cost, expanded at the current solution $\hat u_i$ for the motion vector at voxel $i$:

$E_{\mathrm{data}}(u_i) \approx \rho_i(\hat u_i) + \langle g_i, \, u_i - \hat u_i \rangle,$ (25)

where $\rho_i$ denotes the chosen descriptor distance at voxel $i$ and $g_i$ an estimate of its gradient with respect to the motion vector. This generic treatment allows for a simple implementation and evaluation of different distance metrics. For instance, the negative NCC cost function is not convex and does not possess a closed-form solution for the proximal map. Although a (more accurate) second-order Taylor expansion could also be used to build a convex approximation [14, 47], we opt for a numerical solution. Due to the rather low evaluation time of our descriptor, the additional computation time for a numerical solution is quite small. In our implementation, we approximate the linear function via a least-squares fit to the six values $\rho_i(\hat u_i \pm \epsilon e_k)$, $k = 1, 2, 3$, with $e_k$ being the unit vector in direction $k$ and $\epsilon$ a small constant. To find a solution for the proximal step one must solve the following quadratic problem per voxel $i$:

$u_i^{n+1} = \arg\min_{u_i} \; \frac{\|u_i - \tilde u_i\|_2^2}{2\tau} + \langle g_i, u_i \rangle,$ (26)

with $\tilde u_i$ the respective entries of $u^n - \tau K^\top y^{n+1}$; the closed-form solution is $u_i^{n+1} = \tilde u_i - \tau g_i$.
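The numerical linearization and the resulting proximal step can be sketched as follows. The names (`rho`, `eps`, `tau`) are illustrative; for the quadratic toy cost used below, the least-squares linear fit through the two probes per axis reduces to an exact central difference:

```python
import numpy as np

def numeric_gradient(rho, u, eps=0.5):
    """Gradient estimate of a black-box cost rho from six probes
    rho(u +/- eps*e_k), via central differences per axis."""
    g = np.zeros(3)
    for k in range(3):
        e = np.zeros(3); e[k] = eps
        g[k] = (rho(u + e) - rho(u - e)) / (2 * eps)
    return g

def prox_data(u_tilde, g, tau):
    # argmin_u ||u - u_tilde||^2/(2*tau) + <g, u>  =  u_tilde - tau*g
    return u_tilde - tau * g

rho = lambda u: np.sum((u - np.array([1.0, -2.0, 0.5])) ** 2)  # toy cost
g = numeric_gradient(rho, np.zeros(3))
print(g)   # → [-2.  4. -1.]
```

With the linearized cost, the per-voxel proximal step is a single explicit update, which keeps the inner loop cheap.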
We also briefly review the proximal step for the SSD data term (15), operating on the intensity voxel volumes $I^0, I^1$. Discretizing (15) leads to the following cost for a voxel $i$:

$E_{\mathrm{SSD}}(u_i) = \sum_{j \in W_i} \big(I^1(x_j + u_i) - I^0(x_j)\big)^2,$ (27)

with $W_i$ denoting the interrogation window centered at voxel $i$. A direct convex approximation would be to use a first-order Taylor expansion around the current estimate on all the voxels in the interrogation window. This treatment would introduce a significant computational burden, since at each location we would have to compute the gradient for all voxels in the neighborhood. A computationally more efficient procedure is to expand (27) around the current flow estimate $\hat u_j$ of each voxel $j$. Collecting terms, we arrive at our convexified data term:

$\tilde E_{\mathrm{SSD}}(u_i) = \sum_{j \in W_i} \big(I^1(x_j + \hat u_j) + \langle \nabla I^1(x_j + \hat u_j), \, u_i - \hat u_j \rangle - I^0(x_j)\big)^2.$ (28)

With this formulation we only have to evaluate the gradients and volumes once per voxel, namely at the current flow estimate. The proximal map for the SSD at voxel $i$ again amounts to solving a small quadratic problem to find the update for $u_i$:

$u_i^{n+1} = \arg\min_{u_i} \; \frac{\|u_i - \tilde u_i\|_2^2}{2\tau} + \tilde E_{\mathrm{SSD}}(u_i),$ (29)

which reduces to a $3 \times 3$ linear system per voxel.
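The per-voxel quadratic problem for the linearized SSD can be sketched as a small normal-equations solve. Here `A` stacks the linearized gradients and `r` the residuals of the window voxels; these names are illustrative, not the paper's notation:

```python
import numpy as np

def prox_ssd(u_tilde, A, r, tau):
    """argmin_u ||u - u_tilde||^2/(2*tau) + sum_j (r_j + <a_j, u>)^2,
    a 3x3 linear system (normal equations)."""
    H = np.eye(3) / tau + 2 * A.T @ A
    b = u_tilde / tau - 2 * A.T @ r
    return np.linalg.solve(H, b)

rng = np.random.default_rng(1)
A = rng.standard_normal((27, 3))   # one gradient row per window voxel
r = rng.standard_normal(27)
u = prox_ssd(np.zeros(3), A, r, tau=0.25)

# sanity check: the solution satisfies the first-order optimality condition
grad = u / 0.25 + 2 * A.T @ (r + A @ u)
print(np.allclose(grad, 0.0))   # → True
```

Since the system is only 3x3, the solve is negligible compared to evaluating the volumes and gradients.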
As already mentioned, we embed the optimization of (13) into a coarse-to-fine scheme and adjust the data term by moving the particles. Depending on the representation, this step is often called warping [12]. After warping, the convex approximation of the data term is computed and several iterations of (23) are applied to find an update of the flow estimate. The coarse-to-fine scheme can easily be implemented by increasing the radii in the descriptor. To perform warping we need to rebuild the KD-tree for the first time step, moving the particles based on the current flow estimate. Their motion is found by trilinear interpolation of the flow at grid locations, which is in accordance with our discretization of the operators $K$ and $K^\top$. Note that the chosen optimization scheme [44] is rather generic and allows us to test different regularization and data terms. Apart from allowing for acceleration, the smooth (and strongly convex) soft-constraint regularizer (20) removes the need for a second forward-backward step, and with that the need to store any dual variables at all; e.g., one can employ the fast iterative shrinkage-thresholding algorithm (FISTA) [38] to perform the optimization. We point out that the descriptor scheme implies an overall memory footprint about an order of magnitude smaller than that of any algorithm that must explicitly store the dense voxel volume, e.g., [17].
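The warping step can be sketched as follows: advance the first-frame particles by the flow interpolated from the voxel grid, then rebuild the KD-tree used by the sparse descriptor. Shapes and names are illustrative; this is a sketch with scipy, not the paper's implementation:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator
from scipy.spatial import cKDTree

def warp_particles(particles, flow, grid_axes):
    """Move particles by the flow field (trilinearly interpolated from
    the voxel grid) and rebuild the KD-tree over the warped positions."""
    interp = [RegularGridInterpolator(grid_axes, flow[..., c],
                                      bounds_error=False, fill_value=0.0)
              for c in range(3)]            # one interpolant per component
    disp = np.stack([f(particles) for f in interp], axis=-1)
    warped = particles + disp
    return warped, cKDTree(warped)

axes = (np.arange(16.0),) * 3
flow = np.zeros((16, 16, 16, 3)); flow[..., 0] = 1.0   # unit shift in x
pts = np.array([[2.0, 3.0, 4.0], [7.5, 7.5, 7.5]])
warped, tree = warp_particles(pts, flow, axes)
print(warped[0])   # → [3. 3. 4.]
```

Rebuilding the tree per warp is cheap relative to the volume-based alternative, since only the sparse particle set is stored.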
3 Evaluation
To quantitatively evaluate our approach and to compare different parameter settings we have generated synthetic data using the Johns Hopkins turbulence database (JHTDB) [49, 50]. The database provides a direct numerical simulation of isotropic turbulent flow in incompressible fluids. We can sample particles at arbitrary positions in the volume and track them over time. Additionally, we sample ground truth velocity flow vectors on a regular voxel grid.
Our setup consists of four cameras and particles rendered into a voxel volume. The cameras are arranged symmetrically, with equal viewing angles w.r.t. the two coordinate planes of the volume. Camera images have a resolution at which a pixel is approximately the size of a voxel. In this evaluation, the maximum displacement magnitude between two time steps is 4.8 voxels.
3D particles are rendered to the image planes as Gaussian blobs, with intensity varying between 0.3 and 1, where the same intensity is used in all four cameras. We test different particle densities from 0.075 to 0.2 particles per pixel (ppp); a density in ppp directly corresponds to a density of particles per voxel. Exemplary renderings for three particle densities are shown in Figure 4.
We compare our method to [17] and test different particle densities and parameter settings. Our main evaluation criterion is the average endpoint error (AEE) of the estimated flow vectors, measured in voxels w.r.t. the resolution of our test volume. Standard parameter settings for our coarse-to-fine scheme are 9 pyramid levels and a fixed scale factor between adjacent pyramid levels. Note that larger velocity magnitudes would require more pyramid levels. We chose 10 warps and 20 inner iterations per pyramid level. Unless otherwise specified, we use SSD as data cost and select our physically based regularizer for the quantitative evaluation: according to [17], quadratically penalizing the gradient vectors of the flow while enforcing a divergence-free vector field performed better than other tested regularizers. For IPR we chose a fixed intensity threshold, a fixed number of inner iterations, and 24 triangulation iterations, starting from a small triangulation error and gradually increasing it. Note that for lower particle densities fewer iterations would already be sufficient; however, most particles are triangulated in earlier iterations, such that the additional runtime demand for more iterations becomes negligible. Flow vectors are estimated at every fourth voxel location per dimension and subsequently upscaled to the full grid in order to be compared with the ground-truth flow field. As shown in [17], flow estimation on the downsampled grid can be performed with virtually no loss in accuracy, but with the benefit of faster computation and lower memory demands. In Figure 5 we visualize a single xy-slice (z=72) of the flow field for both ground truth and our estimate, obtained with standard settings.
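The AEE criterion used throughout the evaluation is simply the mean Euclidean distance between estimated and ground-truth flow vectors on the voxel grid; a minimal sketch:

```python
import numpy as np

def average_endpoint_error(flow_est, flow_gt):
    """Mean Euclidean distance between estimated and ground-truth
    flow vectors; fields have shape (X, Y, Z, 3)."""
    return np.mean(np.linalg.norm(flow_est - flow_gt, axis=-1))

gt = np.zeros((4, 4, 4, 3))
est = gt.copy(); est[..., 0] = 0.3    # constant 0.3-voxel error in x
print(average_endpoint_error(est, gt))
```

For the constant-error example above the AEE is 0.3 voxels, matching the scale of the errors reported in Tables 2-4.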
In addition to the quantitative evaluation we show qualitative results on test case D of the International PIV Challenge [51].
3.1 3D Reconstruction & Particle Density
We start by comparing different initialization methods for various particle densities. In addition to our implementations of MART and IPR, we also provide results for starting the flow reconstruction from the ground-truth particle distributions. Following the naming convention of [51], we refer to this version as HACKER. In Table 2 we compare the estimated flow w.r.t. the AEE and furthermore show statistics on the particle reconstruction of time step 0 for both reconstruction techniques, MART and IPR. Reconstructed particles with no ground-truth particle within a distance of 2 voxels are considered ghost particles. As expected, assuming perfect particle reconstruction (HACKER), the flow estimation keeps improving with higher particle densities. However, with increasing particle density the reconstruction becomes more ambiguous, as can be seen from the increased number of ghost particles, as well as missing particles, for both MART and IPR. The numbers suggest that the current bottleneck of our pipeline is the particle reconstruction. Furthermore, IPR consistently outperforms MART at all particle densities: MART already starts to fail at lower particle densities, while our IPR implementation yields results similar to HACKER up to a particle density of 0.125 ppp.
Particle Density (ppp)  0.075  0.1  0.125  0.15  0.175  0.2

MART  Avg. endpoint error  0.2768  0.2817  0.2927  0.3119  0.3363  0.3500
      Undetected particles  1412 (3.21%)  6283 (10.7%)  13918 (18.96%)  24141 (27.41%)  37215 (36.22%)  51190 (43.59%)
      Ghost particles  22852 (51.89%)  44752 (76.21%)  79356 (108.11%)  107276 (121.79%)  118476 (115.29%)  129791 (110.52%)
      Avg. position error  1.681  1.672  1.663  1.654  1.647  1.638

IPR  Avg. endpoint error  0.2505  0.2388  0.2250  0.2260  0.2522  0.2950
      Undetected particles  18 (0.04%)  24 (0.04%)  35 (0.05%)  582 (0.66%)  14187 (13.81%)  33936 (28.90%)
      Ghost particles  4 (0.01%)  3 (0.01%)  14 (0.02%)  4344 (4.93%)  107286 (104.40%)  176176 (150.01%)
      Avg. position error  0.001  0.001  0.002  0.029  0.188  0.301

HACKER  Avg. endpoint error  0.2503  0.2382  0.2244  0.2229  0.2140  0.2068
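The ghost and undetected counts of Table 2 follow from a nearest-neighbor test with a 2-voxel threshold, which can be sketched with a KD-tree (names illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

def reconstruction_stats(recon, gt, radius=2.0):
    """Count ghost particles (reconstructions with no ground-truth
    particle within `radius` voxels) and undetected particles
    (ground-truth particles with no reconstruction within `radius`)."""
    gt_tree, rec_tree = cKDTree(gt), cKDTree(recon)
    d_rec, _ = gt_tree.query(recon)   # nearest GT for each reconstruction
    d_gt, _ = rec_tree.query(gt)      # nearest reconstruction for each GT
    return int(np.sum(d_rec > radius)), int(np.sum(d_gt > radius))

gt = np.array([[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]])
recon = np.array([[10.5, 10.0, 10.0],   # matched within 2 voxels
                  [40.0, 40.0, 40.0]])  # far from any GT: a ghost
ghosts, undetected = reconstruction_stats(recon, gt)
print(ghosts, undetected)   # → 1 1
```

Note that ghost percentages above 100% in Table 2 are possible because they are given relative to the number of true particles.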
3.2 Descriptor & Data Cost
We evaluate two different distance metrics for our sparse descriptor: the sum of squared distances (SSD) and the (negative) normalized cross-correlation (NCC). In addition to our spherical descriptor introduced in Section 2.2, we evaluate our approach with a descriptor based on a regular grid structure. A grid point is set at every second voxel position per dimension within a radius of 8, resulting in 257 grid points. Particles within a fixed radius are considered for the descriptor estimation. Here, we splat particles to the regular grid points, as we do for our standard layer-based descriptor (which samples more densely closer to the descriptor center). The impact of these choices on the estimated flow vectors is summarized in Table 3 for various particle densities. Both distance metrics, SSD and NCC, perform similarly well. Our spherical, layer-based descriptor works slightly, but consistently, better than the descriptor based on the regular grid.
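The 257 grid points quoted above can be reproduced by counting lattice points at every second voxel position inside the 8-voxel radius; a quick numpy check:

```python
import numpy as np

# Grid points at every second voxel offset within a radius of 8 voxels.
coords = np.arange(-8, 9, 2)                 # offsets -8, -6, ..., 6, 8
X, Y, Z = np.meshgrid(coords, coords, coords)
inside = X**2 + Y**2 + Z**2 <= 8**2          # keep points inside the sphere
print(np.sum(inside))   # → 257
```

This is simply the number of integer points in a ball of radius 4 after rescaling by the step size of 2.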
Particle Density (ppp)  0.075  0.1  0.125  0.15

NCC  Layered spherical descriptor (331, d=21)  0.2492  0.2359  0.2270  0.2286
     Regular spherical grid (257, d=21)  0.2630  0.2498  0.2405  0.2431
SSD  Layered spherical descriptor (331, d=21)  0.2505  0.2388  0.2250  0.2260
     Regular spherical grid (257, d=21)  0.2643  0.2436  0.2362  0.2374
3.3 Comparison to Dense VolumeFlow
We compare our method to [17], where the data term is based on a high-resolution voxel grid, as obtained from MART. In addition to the MART reconstruction, we also show results for IPR and HACKER, where particles were rendered into the volume by trilinear interpolation to their nearest voxels. For the flow estimation we chose the recommended parameter set of [17] with the SSD data term. Average endpoint errors are shown in Table 4 for different particle densities. Performance is slightly better than for our sparse descriptor variant, with the drawback that a memory-demanding, high-resolution intensity volume is needed for the inherently sparse data. By adding more grid points to our descriptor (i.e., a grid point at every voxel location) we could effectively rebuild a dense matching approach with our sparse data. However, we opt for a more compact representation that allows faster computation and lower memory requirements. Furthermore, our sparse descriptor can be adapted to any kind of similarity measure, where the use of numerical differentiation allows for a simple and quick evaluation. Here, it would also be interesting to test learned similarity functions, which could benefit from our compact representation [52, 53, 54].
Particle Density (ppp)  0.075  0.1  0.125  0.15 

MART  0.2470  0.2717  0.3144  0.3735 
IPR  0.2266  0.2107  0.2001  0.2057 
HACKER  0.2270  0.2112  0.2000  0.1923 

3.4 PIV Challenge Data
Test case D of the International PIV Challenge [51] consists of 50 time steps. Unfortunately, no ground truth is provided; however, we can visually compare our results to the results reported in the challenge. In Figure 6 we show the result of our method for snapshot 23 of the challenge, as well as the result of the top-performing method Shake-The-Box (StB) [18]. The motion fields appear comparable and flow details are recovered. Note that our method uses only two frames to estimate the motion field, while StB relies on multiple time steps.
4 Conclusion
Our method is based on, and extends, a conference publication [17]. In particular, we adopt its physically motivated regularization scheme and follow the same optimization methodology. We extend [17] with a novel, energy-based IPR method that allows for more accurate and robust particle reconstructions. Since motion estimation relies on the quality of the 3D particle reconstruction, these advantages also improve the accuracy of our motion estimation technique, which was previously based on the tomographic reconstruction model of MART. To further exploit the sparsity of the directly reconstructed particles in the volume, we propose a novel sparse descriptor that can be constructed on-the-fly and possesses a significantly lower memory footprint than a dense intensity volume, which must be of high resolution to deliver motion reconstructions of similar quality.
A current drawback of our model is that the initially reconstructed particles are kept fixed and the mutual dependency of fluid motion and particle reconstruction is not considered. Hence, an interesting path to investigate in the future would be to combine particle reconstruction and flow estimation in a joint energy model.
Acknowledgments
This work was supported by ETH grant 29 141. Christoph Vogel and Thomas Pock acknowledge support from the ERC starting grant 640156, ’HOMOVIS’. We thank Markus Holzner for discussions and help with the evaluation.
References
 [1] K. Ohmi and H.Y. Li, “Particletracking velocimetry with new algorithms,” Measurement Science and Technology, vol. 11, no. 6, p. 603, 2000.
 [2] R. Adrian and J. Westerweel, Particle Image Velocimetry. Cambridge University Press, 2011.
 [3] M. Raffel, C. E. Willert, S. Wereley, and J. Kompenhans, Particle image velocimetry: a practical guide. Springer, 2013.
 [4] G. E. Elsinga, F. Scarano, B. Wieneke, and B. W. Oudheusden, “Tomographic particle image velocimetry,” Experiments in Fluids, vol. 41, no. 6, 2006.
 [5] C. Atkinson and J. Soria, “An efficient simultaneous reconstruction technique for tomographic particle image velocimetry,” Experiments in Fluids, vol. 47, no. 4, 2009.
 [6] S. Discetti and T. Astarita, “Fast 3D PIV with direct sparse crosscorrelations,” Experiments in Fluids, vol. 53, no. 5, 2012.
 [7] F. Champagnat, P. Cornic, A. Cheminet, B. Leclaire, G. L. Besnerais, and A. Plyer, “Tomographic piv: particles versus blobs,” Measurement Science and Technology, vol. 25, no. 8, p. 084002, 2014.
 [8] F. Champagnat, A. Plyer, G. Le Besnerais, B. Leclaire, S. Davoust, and Y. Le Sant, “Fast and accurate PIV computation using highly parallel iterative correlation maximization,” Experiments in Fluids, vol. 50, no. 4, p. 1169, 2011.
 [9] R. Yegavian, B. Leclaire, F. Champagnat, C. Illoul, and G. Losfeld, “Lucas-Kanade fluid trajectories for time-resolved PIV,” Measurement Science and Technology, vol. 27, no. 8, p. 084004, 2016.

 [10] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proceedings of the 7th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI’81, (San Francisco, CA, USA), pp. 674–679, Morgan Kaufmann Publishers Inc., 1981.
 [11] B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artificial Intelligence, vol. 17, 1981.
 [12] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, “High accuracy optical flow estimation based on a theory for warping,” in ECCV 2004, pp. 25–36, Springer.
 [13] C. Zach, T. Pock, and H. Bischof, “A duality based approach for realtime TV-L1 optical flow,” DAGM 2007.
 [14] M. Werlberger, T. Pock, and H. Bischof, “Motion estimation with non-local total variation regularization,” in CVPR 2010, pp. 2464–2471.
 [15] D. Heitz, E. Mémin, and C. Schnörr, “Variational fluid flow measurements from image sequences: synopsis and perspectives,” Experiments in Fluids, vol. 48, no. 3, 2010.
 [16] L. Alvarez, C. Castaño, M. García, K. Krissian, L. Mazorra, A. Salgado, and J. Sánchez, “A new energybased method for 3d motion estimation of incompressible PIV flows,” Computer Vision and Image Understanding, vol. 113, no. 7, 2009.
 [17] K. Lasinger, C. Vogel, and K. Schindler, “Volumetric flow estimation for incompressible fluids using the stationary stokes equations,” ICCV 2017.
 [18] D. Schanz, S. Gesemann, and A. Schröder, “ShakeTheBox: Lagrangian particle tracking at high particle image densities,” Experiments in Fluids, vol. 57, no. 5, 2016.
 [19] S. Gesemann, F. Huhn, D. Schanz, and A. Schröder, “From noisy particle tracks to velocity, acceleration and pressure fields using Bsplines and penalties,” Int’l Symp. on Applications of Laser Techniques to Fluid Mechanics, 2016.
 [20] J. F. Schneiders and F. Scarano, “Dense velocity reconstruction from tomographic PTV with material derivatives,” Experiments in Fluids, vol. 57, no. 9, 2016.
 [21] D. G. Lowe, “Distinctive image features from scaleinvariant keypoints,” IJCV, vol. 60, no. 2, pp. 91–110, 2004.
 [22] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in CVPR 2005, vol. 1, pp. 886–893, IEEE.
 [23] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” PAMI, vol. 27, pp. 1615–1630, Oct. 2005.
 [24] R. Zabih and J. Woodfill, “Nonparametric local transforms for computing visual correspondence,” in Proceedings of the Third European Conference on Computer Vision (Vol. II), ECCV 1994, (Secaucus, NJ, USA), pp. 151–158, SpringerVerlag New York, Inc.
 [25] C. Liu, J. Yuen, and A. Torralba, “Sift flow: Dense correspondence across scenes and its applications,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, no. 5, pp. 978–994, 2011.
 [26] A. Schröder, R. Geisler, K. Staack, G. E. Elsinga, F. Scarano, B. Wieneke, A. Henning, C. Poelma, and J. Westerweel, “Eulerian and lagrangian views of a turbulent boundary layer flow using timeresolved tomographic piv,” Experiments in Fluids, vol. 50, pp. 1071–1091, Apr 2011.
 [27] B. Wieneke, “Iterative reconstruction of volumetric particle distribution,” Measurement Science & Technology, vol. 24, no. 2, 2013.
 [28] B. Wieneke, “Volume selfcalibration for 3d particle image velocimetry,” Experiments in fluids, vol. 45, no. 4, pp. 549–556, 2008.
 [29] D. Schanz, S. Gesemann, A. Schröder, B. Wieneke, and M. Novara, “Nonuniform optical transfer functions in particle imaging: calibration and application to tomographic reconstruction,” Measurement Science & Technology, vol. 24, no. 2, 2012.
 [30] R. Gordon, R. Bender, and G. T. Herman, “Algebraic reconstruction techniques (ART) for threedimensional electron microscopy and xray photography,” Journal of Theoretical Biology, vol. 29, pp. 471–481, 1970.
 [31] S. Discetti, A. Natale, and T. Astarita, “Spatial filtering improved tomographic PIV,” Experiments in Fluids, vol. 54, no. 4, 2013.
 [32] S. Petra, A. Schröder, and C. Schnörr, “3D tomography from few projections in experimental fluid dynamics,” in Imaging Measurement Methods for Flow Analysis: Results of the DFG Priority Programme 1147, 2009.
 [33] J. Bolte, S. Sabach, and M. Teboulle, “Proximal alternating linearized minimization for nonconvex and nonsmooth problems,” Mathematical Programming, vol. 146, no. 1, pp. 459–494, 2014.
 [34] T. Pock and S. Sabach, “Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems,” SIAM J Imaging Sci, vol. 9, no. 4, pp. 1756–1787, 2016.
 [35] H. Attouch, J. Bolte, and B. F. Svaiter, “Convergence of descent methods for semialgebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods,” Mathematical Programming, vol. 137, no. 1, pp. 91–129, 2013.
 [36] J. Bolte, A. Daniilidis, and A. Lewis, “The Lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems,” SIAM J Optimiz, vol. 17, no. 4, pp. 1205–1223, 2007.
 [37] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. PrenticeHall, 1989.
 [38] A. Beck and M. Teboulle, “A fast iterative shrinkagethresholding algorithm for linear inverse problems,” SIAM J Imaging Sci, vol. 2, no. 1, pp. 183–202, 2009.
 [39] A. Frome, D. Huber, R. Kolluri, T. Bülow, and J. Malik, “Recognizing objects in range data using regional point descriptors,” Computer visionECCV 2004, pp. 224–237, 2004.
 [40] P. Scovanner, S. Ali, and M. Shah, “A 3dimensional sift descriptor and its application to action recognition,” in Proceedings of the 15th ACM International Conference on Multimedia, MM ’07, (New York, NY, USA), pp. 357–360, ACM, 2007.
 [41] Y. H. Le, U. Kurkure, and I. A. Kakadiaris, “3d dense local point descriptors for mouse brain gene expression images,” Computerized Medical Imaging and Graphics, vol. 38, no. 5, pp. 326 – 336, 2014.
 [42] J. H. Friedman, J. L. Bentley, and R. A. Finkel, “An algorithm for finding best matches in logarithmic expected time,” ACM Trans. Math. Softw., vol. 3, pp. 209–226, Sept. 1977.

 [43] M. Muja and D. G. Lowe, “Scalable nearest neighbor algorithms for high dimensional data,” PAMI, vol. 36, 2014.
 [44] A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” JMIV, vol. 40, no. 1, 2011.
 [45] J. J. Monaghan, “Smoothed particle hydrodynamics,” Reports on Progress in Physics, vol. 68, no. 8, p. 1703, 2005.
 [46] B. Adams, M. Pauly, R. Keiser, and L. J. Guibas, “Adaptively sampled particle fluids,” ACM SIGGRAPH 2007.
 [47] C. Vogel, S. Roth, and K. Schindler, “An evaluation of data costs for optical flow,” GCPR 2013.
 [48] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D, vol. 60, no. 1–4, 1992.
 [49] Y. Li, E. Perlman, M. Wan, Y. Yang, C. Meneveau, R. Burns, S. Chen, A. Szalay, and G. Eyink, “A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence,” Journal of Turbulence, vol. 9, 2008.
 [50] E. Perlman, R. Burns, Y. Li, and C. Meneveau, “Data exploration of turbulence simulations using a database cluster,” in ACM/IEEE Conf. on Supercomputing, 2007.
 [51] C. J. Kähler, T. Astarita, P. P. Vlachos, J. Sakakibara, R. Hain, S. Discetti, R. La Foy, and C. Cierpka, “Main results of the 4th International PIV Challenge,” Experiments in Fluids, vol. 57, no. 6, 2016.
 [52] K. M. Yi, E. Trulls, V. Lepetit, and P. Fua, “LIFT: Learned Invariant Feature Transform,” in Proceedings of the European Conference on Computer Vision, 2016.

 [53] J. Zbontar and Y. LeCun, “Stereo matching by training a convolutional neural network to compare image patches,” Journal of Machine Learning Research, vol. 17, no. 132, p. 2, 2016.
 [54] M. Khoury, Q.-Y. Zhou, and V. Koltun, “Learning compact geometric features,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 153–161, 2017.