1 Introduction
Numerical flow visualization plays a critical role in enabling scientists to understand fluid phenomena and improve computational fluid dynamics models. Although simulations typically produce time-varying vector fields, analysis and visualization are often limited to single time slices due to I/O constraints and memory requirements. Performing accurate time-varying flow visualization using traditional methods requires a high temporal resolution of the vector field data. A potential solution is to consider a Lagrangian representation of the vector field. Lagrangian representations have been demonstrated to offer strong accuracy-storage propositions compared to traditional techniques ([agranovsky2014improved, sane2021investigating]). The approach involves two phases: in situ and post hoc. Lagrangian representations are extracted from computational simulations using in situ processing and explored during post hoc analysis. In this paper, we study the use of deep learning methods to perform post hoc exploration of time-varying vector fields using reduced Lagrangian representations computed in situ as training data.
In recent years, the scientific visualization community has seen an increased adoption of deep learning ([leventhal2019pave, weiss2019volumetric, berger2018generative, hong2019dnn, he2019insitunet, han2019tsr, han2020v2v, engel2020deep]), including multiple research projects that consider vector field data ([han2018flownet, han2019flow, Jakob2020, sahoo2021integration, guo2020ssr, kim2019deep, liu2019cnn]). With respect to exploratory Lagrangian-based particle advection schemes, the use of deep learning has not previously been studied to the best of our knowledge. Prior strategies have relied on constructing search structures over the data to identify sets of precomputed particle trajectories that can be interpolated across intervals of time. Search structures such as k-d trees and Delaunay triangulations can be computationally expensive to compute for each interval and memory intensive for large data sets ([hlawatsch2011hierarchical, chandler2015interpolation, sane2019interpolation]). Our study shows that, by leveraging deep learning, we can limit the memory footprint of the extracted data. Importantly, once the model is trained, it provides quick inference of new particle trajectories during post hoc analysis and exploration.
Overall, we contribute the first deep neural network-based method to encode Lagrangian flow maps and enable exploratory particle tracing in time-varying flow fields. Our study demonstrates the performance of the method across varying hyperparameter settings as well as multiple Lagrangian representation configurations. Our trained model requires a fixed memory footprint of 10.5 MB, potentially offering significant data reduction for high-resolution flow maps and alleviating I/O costs during exploration. Further, the trained model can infer new trajectories accurately and at rates supporting interactive exploration. Lastly, we demonstrate our approach on a widely studied analytical data set, the Double Gyre, as well as a second vector field targeted at machine learning applications.
2 Related Work
This section provides background on Lagrangian analysis, the use of reduced Lagrangian representations, and the use of machine learning for flow visualization tasks.
2.1 Lagrangian Analysis
Lagrangian analysis is a powerful tool, widely adopted by the ocean modeling community ([VANSEBILLE201849]), to explore time-varying vector fields generated by simulations. In response to growing data set sizes, reduced Lagrangian representations have been increasingly researched as a solution to enable time-varying vector field exploration across various application domains. Reduced Lagrangian representations are computed using in situ processing and explored during post hoc analysis. By utilizing in situ processing, Lagrangian representations are computed using the complete spatial and temporal resolution of the simulation data. Studies have demonstrated that reduced Lagrangian representations offer strong accuracy-storage propositions for exploration in temporally sparse settings ([agranovsky2014improved, rapp2019void, sane2021investigating]) as well as directly support feature extraction ([froyland2015rough, schlueter2017coherent, hadjighasem2017critical, froyland2018robust, Jakob2020]). Additionally, previous research has demonstrated that the traditional Eulerian paradigm performs poorly in underresolved temporal settings ([costa2004lagrangian, Qin2014, agranovsky2014improved, sane2018revisiting, rockwood2019practical, sane2021investigating]).
In the Lagrangian specification of a time-varying vector field, information is encoded using particle trajectories. Thus, the Lagrangian representation consists of a collection of particle trajectories spanning the spatial domain and can be defined as a flow map. The flow map describes where a massless particle starting at a given position and time moves to over an interval of time ([garth2007efficient]).
Research related to reduced Lagrangian representations of time-varying vector fields has advanced along multiple axes. These include in situ sampling techniques ([agranovsky2014improved, rapp2019void, sane2019interpolation, sane2021scalable]), post hoc reconstruction strategies ([hlawatsch2011hierarchical, agranovsky2015multi, bujack2015lagrangian, chandler2015interpolation]), theoretical and empirical error analysis ([chandler2016analysis, hummel2016error, sane2018revisiting]), feature extraction ([froyland2015rough, schlueter2017coherent, hadjighasem2017critical, froyland2018robust, Jakob2020]), and application to various domains ([envirvis.20171099, siegfried2019tropical, sane2021investigating]). In this paper, we study the use of deep learning to perform post hoc reconstruction. Specifically, we propose and evaluate the use of multilayer perceptrons (MLPs) to learn the time-varying vector field behavior from previously computed particle trajectories. With deep learning, a model can be trained once and then interactively queried at the time of exploration without the significant memory requirements of prior approaches. Our study focuses on the impact of various hyperparameters and extraction configurations on the efficacy of post hoc reconstruction as well as the overall computational cost.
2.2 Flow Visualization Using Machine Learning
In recent years, machine learning techniques have been increasingly researched by the fluid dynamics community ([brunton2020machine]). Similarly, in scientific visualization, and flow visualization in particular, the use of machine learning to perform several tasks has increased. For example, it has been widely used to detect flow field features such as eddies and vortices ([lguensat2018eddynet, yi2018cnn, strofer2018data, bai2019streampath, duo2019oceanic, liu2019cnn, deng2019cnn, wang2021rapid]). [kim2019robust] utilized convolutional neural networks (CNNs) to extract a robust frame of reference for unsteady two-dimensional (2D) vector fields. [hong2018access] used long short-term memory (LSTM) networks to improve data access patterns for improved computational performance during distributed-memory particle advection. [li2015extracting] employed a support vector machine (SVM) to segment streamlines based on user-identified features. For the widely studied task of selecting a representative set of particle trajectories ([sane2020survey]), recent state-of-the-art techniques by [han2018flownet] and [lee2021deep] have used deep-learning-based clustering approaches. Further, modern techniques to reconstruct steady-state vector fields from a set of streamlines employ machine learning ([han2019flow, sahoo2021integration]). [Jakob2020] upsampled 2D finite-time Lyapunov exponent (FTLE) scalar fields derived from Lagrangian flow maps using the efficient sub-pixel convolutional neural network (ESPCN) by [shi2016real] and SRCNN by [dong2015image]. In our study, we use Lagrangian representations of 2D time-varying vector fields as data to train neural networks built with MLPs. We then infer new particle trajectories from the model to support the exploration use case. Our study shows that the application of deep learning to particle tracing can offer the significant benefits of reduced memory requirements and accurate trajectory inference.
3 Lagrangian Analysis using Deep Learning
We designed our network to learn the flow behavior encoded by the Lagrangian representation of the time-varying vector field. Figure 1(a) shows the workflow of the in situ training data generation process, the network training process, and the post hoc inference process. In the in situ extraction phase, Lagrangian flow maps are computed by advecting particles using the full spatial and temporal resolution of the time-varying vector field. We considered two approaches to extract flow maps, which we refer to as the long and short flow map approaches:
Long flow maps: extract a single flow map consisting of long particle trajectories, with a uniform temporal sampling of each integral curve.
Short flow maps: extract multiple short flow maps, with each flow map consisting of a set of seed locations and a set of end locations for each seed, where each end location in a set corresponds to the displacement from the seed location over nonoverlapping intervals of time.
In our paper, we follow the notation used by [agranovsky2014improved]. We refer to the cycles where the end locations are saved out as file cycles.
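To make the two extraction modes concrete, the following sketch contrasts them on a toy analytic velocity field with simple forward-Euler advection. This is purely illustrative code under stated assumptions (the actual study uses the simulation's full-resolution field and higher-order integration); the long variant records intermediate positions of each long trajectory at every file cycle, while the short variant resets seeds at every file cycle and stores per-interval end locations.

```python
import math

def velocity(x, y, t):
    # Toy analytic velocity field standing in for simulation output.
    return (-math.sin(math.pi * x) * math.cos(math.pi * y),
            math.cos(math.pi * x) * math.sin(math.pi * y))

def advect(p, t, cycles, dt):
    # Forward-Euler advection for `cycles` simulation steps (illustrative).
    x, y = p
    for i in range(cycles):
        u, v = velocity(x, y, t + i * dt)
        x, y = x + dt * u, y + dt * v
    return (x, y)

def extract_long(seeds, interval, n_file_cycles, dt):
    # Long flow map: one long trajectory per seed, with its position
    # recorded at every file cycle.
    flow_map = []
    for s in seeds:
        traj, p, t = [s], s, 0.0
        for _ in range(n_file_cycles):
            p = advect(p, t, interval, dt)
            traj.append(p)
            t += interval * dt
        flow_map.append(traj)
    return flow_map

def extract_short(seeds, interval, n_file_cycles, dt):
    # Short flow maps: seeds are reset to their original locations after
    # every file cycle; only per-interval (start, end) pairs are stored.
    maps, t = [], 0.0
    for _ in range(n_file_cycles):
        ends = [advect(s, t, interval, dt) for s in seeds]
        maps.append((list(seeds), ends))
        t += interval * dt
    return maps
```

Note that over the first interval the two variants store identical information; they diverge afterward, since the long variant keeps following the original particles while the short variant restarts from the seed locations.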
To begin the post hoc analysis phase, the network fetches flow maps from the database, preprocesses them, and loads the data as training samples (Section 3.1). The network architecture is built with MLPs, which are a series of fully connected layers (Section 3.2). The loss function is set to the L1 loss, calculated as the error between the target end location and the predicted end location. During the training process, the model takes two parameters, a particle start location and a queried file cycle, as inputs and outputs the corresponding end location. The weights of the model are updated by backpropagation of the loss to find the optimized weights (Section 3.3). Finally, new trajectories can be inferred from the trained model (Section 3.4).
3.1 Training Data Generation
We stored extracted Lagrangian flow maps in the form of training data for the model. We considered two strategies to sample the time-varying vector field. The first strategy, long flow maps, involves computing long trajectories with uniform sampling along the curve. Reconstruction of new trajectories using long precomputed trajectories is more accurate because the propagation of error is eliminated after every interpolation step ([hummel2016error, sane2019interpolation]). However, the quality of domain coverage may be reduced as the integration time increases and due to divergence in the flow field ([chandler2016analysis]). The second strategy, short flow maps, involves computing sets of short trajectories, with only the start and end locations over nonoverlapping intervals of time stored. Although such an approach offers improved domain coverage ([agranovsky2014improved]), the particle trajectory reconstruction may be less accurate due to error propagation ([bujack2015lagrangian]).
For both approaches, the first step is placing sample seeds in the domain. To understand the impact of the seed placement strategy on the model's inference performance, we studied three strategies: (1) seeding along a uniform grid, (2) seeding using a pseudorandom number sequence, and (3) seeding using a Sobol quasirandom sequence. Specifically, we considered reconstruction accuracy near features of interest and boundaries. Although placing uniform seeds can provide good domain coverage and fast interpolation during post hoc analysis, it does not optimize information per byte stored. Thus, in many practical cases, the Lagrangian representation can be unstructured and would typically incur a higher interpolation cost during post hoc analysis. By considering pseudorandom and Sobol seeding, we were able to demonstrate the fast inference of new trajectories from unstructured Lagrangian flow maps. We compare these three seeding choices in Section 4.2.1.
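The three seeding strategies can be sketched as follows. This is illustrative code: the domain extents are placeholders, and a 2D Halton sequence is used as a stand-in low-discrepancy generator because it is compact to implement; the study itself uses a Sobol sequence (available, e.g., via scipy.stats.qmc.Sobol).

```python
import random

def uniform_seeds(nx, ny, xmax=2.0, ymax=1.0):
    # (1) Uniform grid seeding: nx * ny seeds on a regular lattice.
    return [((i + 0.5) * xmax / nx, (j + 0.5) * ymax / ny)
            for i in range(nx) for j in range(ny)]

def pseudorandom_seeds(n, xmax=2.0, ymax=1.0, seed=0):
    # (2) Pseudorandom seeding from a seeded RNG for reproducibility.
    rng = random.Random(seed)
    return [(rng.uniform(0, xmax), rng.uniform(0, ymax)) for _ in range(n)]

def halton_value(i, base):
    # Radical inverse: the i-th element of the van der Corput sequence.
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def quasirandom_seeds(n, xmax=2.0, ymax=1.0):
    # (3) 2D Halton sequence (bases 2 and 3) as a low-discrepancy
    # stand-in for the Sobol sequence used in the paper.
    return [(halton_value(i, 2) * xmax, halton_value(i, 3) * ymax)
            for i in range(1, n + 1)]
```

Low-discrepancy sequences cover the domain more evenly than pseudorandom points for the same seed count, which is the motivation for comparing them here.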
After seeds are placed, particle trajectories are computed by displacing particles forward one simulation time step at a time. Following the notation in [agranovsky2014improved], we refer to one simulation advancement as a cycle, the cycle on which the simulation saves data as a file cycle, and the number of cycles between file cycles as the interval in the following sections. Given a total temporal duration $T$, the total number of file cycles $n$ can be calculated by
$$n = T / I, \quad (1)$$
where $I$ represents the file cycle interval. Thus, the list of file cycles is $\{I, 2I, \ldots, nI\}$. To generate long flow maps, seeds are placed once at the beginning at time $0$ and traced until time $T$, i.e., the entire temporal duration. Intermediate locations are recorded along each trajectory at every file cycle. To generate short flow maps, particle tracing starts at time $0$ and terminates at the first file cycle. Then, the end locations are saved and the seeds are reset for tracing until the next file cycle. This process is repeated until the last file cycle.
The training data sets are saved in the NPY file format for efficient loading in Python. We created a three-dimensional (3D) array for saving start seed locations and the corresponding end locations at the various file cycles. When loading the data sets, the data are organized into training samples, as shown in Equation 2. One training sample contains a start location, a queried file cycle, and the end location at the queried file cycle. The start location and the queried file cycle are inputs to the network. The end locations are used for calculating the loss (Equation 3). In addition to the training data, we generated validation data (10% of the number of training samples) by following the same process with a separate set of seeds.
$$s = (\mathbf{p}_0,\; c,\; \mathbf{p}_c), \quad (2)$$
where $\mathbf{p}_0$ is the start location, $c$ is the queried file cycle, and $\mathbf{p}_c$ is the end location at file cycle $c$.
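The organization of flow map data into training samples can be sketched as below. The array layout (seed index x file cycle x 2D coordinates, with slot 0 holding the start location) is an assumption for illustration, since the exact dimensions are not specified here.

```python
import numpy as np

# Hypothetical layout: positions[i, k] is the 2D location of seed i at
# file cycle k, where k = 0 stores the seed's start location.
n_seeds, n_file_cycles = 4, 3
rng = np.random.default_rng(0)
positions = rng.random((n_seeds, n_file_cycles + 1, 2))

def to_training_samples(positions):
    # Each sample maps (start_x, start_y, file_cycle) -> (end_x, end_y),
    # mirroring Equation 2.
    inputs, targets = [], []
    for i in range(positions.shape[0]):
        for k in range(1, positions.shape[1]):
            inputs.append([positions[i, 0, 0], positions[i, 0, 1], float(k)])
            targets.append(positions[i, k])
    return np.array(inputs), np.array(targets)

inputs, targets = to_training_samples(positions)
np.save("flow_map.npy", positions)  # NPY format for efficient reloading
```

Each seed contributes one sample per file cycle, so the sample count grows with both the number of seeds and the number of file cycles.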
3.2 Network Architecture
The network architecture, shown in Figure 1(b), consists of a latent encoder and a latent decoder. The latent encoder and decoder are built with MLPs, a series of fully connected layers. The latent encoder takes a particle's start location and a queried file cycle as inputs. These two parameters are fed into two separate sequences of fully connected layers of size (64, 128, 256, 512) and (16, 32, 64, 128, 256, 512), respectively. The two outputs are then concatenated together into a latent vector. Next, the latent decoder, also a series of fully connected layers of size (512, 256, 128, 64), maps the latent vector to the end location at the queried file cycle. We added layer normalization ([ba2016layer]) after each fully connected layer except the output layers to stabilize the training process. Moreover, we used the rectified linear unit (ReLU) ([nair2010rectified]) as the activation function for each output from the fully connected layers.
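A minimal NumPy forward pass mirroring the described architecture is sketched below. The weights are random and purely illustrative (in the actual system they are learned with PyTorch); what the sketch shows is the two encoder branches, the concatenated latent vector, the decoder stack, and the placement of layer normalization and ReLU after every layer except the output.

```python
import numpy as np

rng = np.random.default_rng(42)

def layer_norm(x, eps=1e-5):
    # Normalize each sample's features to zero mean, unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp(x, sizes):
    # Fully connected stack with layer norm + ReLU after each layer.
    for out in sizes:
        w = rng.normal(0.0, 0.1, (x.shape[-1], out))
        x = np.maximum(layer_norm(x @ w), 0.0)  # ReLU activation
    return x

def forward(start_xy, file_cycle):
    # Encoder: one branch per input, concatenated into a latent vector.
    h_pos = mlp(start_xy, (64, 128, 256, 512))
    h_cyc = mlp(file_cycle, (16, 32, 64, 128, 256, 512))
    latent = np.concatenate([h_pos, h_cyc], axis=-1)
    # Decoder maps the latent vector to the predicted end location.
    h = mlp(latent, (512, 256, 128, 64))
    w_out = rng.normal(0.0, 0.1, (64, 2))
    return h @ w_out  # output layer: no normalization or ReLU

pred = forward(np.array([[0.3, 0.4]]), np.array([[5.0]]))
```

In the real implementation the dense layers would carry trained weights and biases; here the sketch only verifies shapes and data flow through the architecture.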
3.3 Training Process
We implemented our neural network using PyTorch ([NEURIPS2019_9015]). The training process, shown in Algorithm 1, aims to find the optimized weights of the network. The weights are initialized by PyTorch. We created a custom PyTorch Dataset class to load and store all training samples. We then loaded the Dataset object into a PyTorch DataLoader for iterating through the training samples. At the beginning of each epoch, the training samples are shuffled and split into batches. Given a batch of training samples, the forward process computes the output following the network architecture and computes the loss as defined by the loss function. The backpropagation is performed automatically by PyTorch by calling loss.backward(), and the weights are updated by the optimizer. For our experiments, we trained the network for 100 epochs using the Adam optimizer ([kingma2014adam]). Further, in our training process, we set an initial learning rate and used a learning rate scheduler ([ReduceLROnPlateau]), provided by PyTorch, to reduce the current learning rate by a factor of 2 if the validation loss had not decreased for five epochs. We used the L1 loss as the loss function in our method. The L1 loss calculates the mean absolute error between the target and predicted end locations (Equation 3):
$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{p}^{(i)} - \hat{\mathbf{p}}^{(i)} \right\|_1, \quad (3)$$
where $\mathbf{p}^{(i)}$ and $\hat{\mathbf{p}}^{(i)}$ are the target and predicted end locations for the $i$-th sample in a batch of size $N$.
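The two training ingredients described above, the L1 loss and the plateau-based learning rate schedule, can be sketched in a few lines of plain Python. This is a stand-alone re-implementation of the logic for illustration, not the actual PyTorch code (torch.nn.L1Loss and torch.optim.lr_scheduler.ReduceLROnPlateau provide it in practice), and the exact patience semantics are an assumption.

```python
def l1_loss(pred, target):
    # Mean absolute error over all coordinates of all samples (Equation 3).
    n = sum(len(p) for p in pred)
    return sum(abs(a - b)
               for p, t in zip(pred, target)
               for a, b in zip(p, t)) / n

class PlateauScheduler:
    # Halve the learning rate when validation loss has not improved
    # for `patience` consecutive epochs (sketch of ReduceLROnPlateau).
    def __init__(self, lr, factor=0.5, patience=5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

Calling step() once per epoch with the validation loss reproduces the schedule described: the rate stays fixed while the loss keeps improving and is halved after five stagnant epochs.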
3.4 Inference Process
The inference process differs depending on which of the two flow map approaches generated the training data. When using long flow maps, interpolations are performed by always considering the new seed's start location at the initial time. The end location inferred by the model results from the provided start location and the queried file cycle. In contrast, when using short flow maps, new particle trajectories are “stitched” together by advancing the new seed across intervals. Here, the inference is performed by considering the location of the seed particle at the previous file cycle and the target file cycle. Since every inference except the first uses previously inferred results, errors might propagate along new trajectories when using short flow maps ([hummel2016error, sane2019interpolation]). We refer to the absolute error introduced by the model for any single inference as local error and to the error accumulated along particle trajectories that are “stitched” together as global error. Similar to other Lagrangian-based advection schemes, our inference process is currently limited to interpolating the locations along a particle trajectory at file cycles, and in the case of long flow maps, it is limited to particles starting at the initial time.
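The two inference modes can be sketched with a mock model standing in for the trained network. The mock model applies a simple constant drift and is purely a placeholder; the point of the sketch is the control flow: direct queries from the original seed versus stitching each interval from the previously inferred location.

```python
def mock_model(start, file_cycle):
    # Stand-in for the trained network: a constant drift of 0.01 in x
    # per file cycle (a real model would infer the learned flow behavior).
    return (start[0] + 0.01 * file_cycle, start[1])

def infer_long(model, seed, file_cycles):
    # Long-trained model: every query uses the original seed location,
    # so model error does not propagate along the trajectory (local only).
    return [model(seed, fc) for fc in file_cycles]

def infer_short(model, seed, n_intervals):
    # Short-trained model: each query starts from the previously inferred
    # location, "stitching" the trajectory; local errors can accumulate.
    traj, p = [], seed
    for _ in range(n_intervals):
        p = model(p, 1)  # advance by one file cycle interval
        traj.append(p)
    return traj
```

With the error-free mock model the two modes agree exactly; with a real model, each stitched step of infer_short adds its own local error on top of the previous one, which is the global error discussed above.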
To measure the accuracy of new particle trajectories inferred by the model, we used a robust and accurate metric, the adaptive edit distance on real sequences (AEDR), proposed by [ren2020uncertainty] to measure pathline uncertainty. The metric divides the L1 norm by a threshold distance to quantify the local error of each interpolated location, accumulates the error along the trajectory, and averages across all the interpolated locations. The use of a threshold distance and a cap on the error at any particular sample results in an AEDR value between 0 and 1. A value close to 0 indicates the particle trajectories are similar, whereas a value close to 1 indicates the particle trajectories are dissimilar.
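A minimal sketch of the metric as described above follows. It implements only what the text states (per-sample L1 distance over a threshold, capped at 1, averaged along the trajectory); the exact formulation in [ren2020uncertainty] may include additional details.

```python
def aedr(traj_a, traj_b, threshold):
    # AEDR sketch: per-location L1 distance divided by a threshold
    # distance, capped at 1, and averaged along the trajectory.
    assert len(traj_a) == len(traj_b) and len(traj_a) > 0
    total = 0.0
    for (ax, ay), (bx, by) in zip(traj_a, traj_b):
        local = (abs(ax - bx) + abs(ay - by)) / threshold
        total += min(local, 1.0)  # cap the error of any single sample
    return total / len(traj_a)
```

The cap is what keeps the value in [0, 1]: identical trajectories score 0, and trajectories that deviate by more than the threshold at every sample score 1.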
4 Results
In this section, we first describe the data set used for our experiments (Section 4.1). Next, we present an evaluation of the seeding strategies used during training data generation and the hyperparameters (learning rate, batch size) used during training (Section 4.2), followed by a report of the performance of our proposed network for training and inference (Section 4.3). Finally, to evaluate the accuracy of the model across Lagrangian flow map extraction parameter settings, we quantitatively and qualitatively evaluate the impact of varying the number of seeds (Section 4.4) and file cycle intervals (Section 4.5).
4.1 Data Set
We conducted our study by considering a standard benchmark data set frequently used to study fluid dynamics, and in particular, flow visualization tools and techniques: the 2D unsteady Double Gyre [Shadden05]. The unsteady Double Gyre flow field is widely studied for the computation of hyperbolic Lagrangian coherent structures (LCS) in flow data. For all the training data generated, we considered a fixed total temporal duration and simulation step size. The Double Gyre flow field is defined by Equation 4 within the spatial domain $[0, 2] \times [0, 1]$:
$$\begin{aligned} u(x, y, t) &= -\pi A \sin(\pi f(x, t)) \cos(\pi y), \\ v(x, y, t) &= \pi A \cos(\pi f(x, t)) \sin(\pi y) \frac{\partial f}{\partial x}, \quad (4) \\ f(x, t) &= \epsilon \sin(\omega t)\, x^2 + (1 - 2\epsilon \sin(\omega t))\, x, \end{aligned}$$
where $A$ controls the velocity magnitude, $\epsilon$ the amplitude of the gyres' lateral oscillation, and $\omega$ its frequency.
Our training data generation process used the analytical solution (Equation 4) for particle advection during Lagrangian flow map computation. We show the velocity field at time 0 (Figure 2(a)) and the FTLE (Figure 2(b)) of the Double Gyre data set. The ridges of the FTLE scalar field are used to approximate Lagrangian coherent structures in the flow. We extended the 2D Double Gyre data sets to 3D by adding the same z-coordinate to every seed. The size of the training data sets increases linearly with a larger number of seeds and shorter intervals. We did not observe significant improvements in accuracy from using more training data for this data set. We generated all the training data sets using a desktop equipped with an Intel Xeon W-3275M CPU and one NVIDIA Titan RTX GPU. We computed the particle trajectories of the Lagrangian flow maps in parallel using the TBB library ([Advanced_HPC_Threading]).
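The analytical advection used for training data generation can be sketched as below. The parameter values (A = 0.1, epsilon = 0.25, omega = 2*pi/10) are the commonly used choices from the Double Gyre literature and are an assumption here, as are the step sizes; the RK4 step mirrors the kind of integrator typically used for flow map computation.

```python
import math

# Commonly used Double Gyre parameters (Shadden et al.); the exact
# values used in this study are assumptions for illustration.
A, EPS, OMEGA = 0.1, 0.25, 2.0 * math.pi / 10.0

def double_gyre(x, y, t):
    # Analytical Double Gyre velocity on the domain [0, 2] x [0, 1].
    s = math.sin(OMEGA * t)
    f = EPS * s * x * x + (1.0 - 2.0 * EPS * s) * x
    dfdx = 2.0 * EPS * s * x + (1.0 - 2.0 * EPS * s)
    u = -math.pi * A * math.sin(math.pi * f) * math.cos(math.pi * y)
    v = math.pi * A * math.cos(math.pi * f) * math.sin(math.pi * y) * dfdx
    return u, v

def rk4_step(x, y, t, dt):
    # Fourth-order Runge-Kutta particle advection step.
    k1 = double_gyre(x, y, t)
    k2 = double_gyre(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], t + 0.5 * dt)
    k3 = double_gyre(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], t + 0.5 * dt)
    k4 = double_gyre(x + dt * k3[0], y + dt * k3[1], t + dt)
    return (x + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            y + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)
```

Because the normal velocity component vanishes on the domain boundary, particles advected this way remain inside [0, 2] x [0, 1], which is what makes the analytic field convenient for generating ground-truth trajectories.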
4.2 Evaluation of Seeding Strategy and Hyperparameters Settings
Our model was implemented using the PyTorch library ([NEURIPS2019_9015]) and trained on dual NVIDIA RTX 3090 GPUs. We considered the two methods of extracting training data sets described in Section 3.1: long and short flow maps. We studied the impact of the seeding strategy as well as the learning rate and batch size for each flow map extraction approach.
4.2.1 Seeding Strategy
To generate training data, we evaluated three seed placement strategies: (1) seeding along a uniform grid, (2) seeding using a pseudorandom number sequence, and (3) seeding using a Sobol quasirandom sequence. For this experiment, we sampled the time-varying Double Gyre vector field domain using 2,000 seeds and a fixed file cycle interval of 30. All models were trained with a batch size of 200 and a fixed learning rate. For the uniform sampling experiment, we arranged the seeds on a regular grid. Further, besides applying these three seed placement strategies to generate training data sets, we also considered the same strategies for placing testing seeds. Figure 3 presents error maps produced by various combinations of seed placement strategies for training and testing data, as well as outcomes for the two flow map extraction strategies. Comparing the error maps for models trained on long flow maps (Figure 3(a)), we found that the Sobol quasirandom sequence was slightly better than the pseudorandom number sequence. Both produced more accurate results for testing seeds that were not on the boundary. Uniform seeding was more accurate only when the testing seeds were also uniform. Moreover, the Sobol quasirandom sequence performed better than the pseudorandom number sequence when sampling the time-varying vector field using short flow maps, and both were better than uniform seeding (Figure 3(b)) except for seeds on the boundary. We chose the Sobol quasirandom sequence as the seeding strategy in all our following experiments. Further work is required to identify sampling strategies that optimize the quality of the training data.
4.2.2 Learning Rate and Batch Size
The learning rate is a critical hyperparameter for a deep neural network. We examined four learning rate settings for both the long and short flow map approaches. For all experiments, the training data sets were generated from the Double Gyre data set with 5,000 seeds and a file cycle interval of 30 using the Sobol seed placement method. The batch size was set to 200. One of the learning rate settings resulted in the model failing to converge; therefore, we did not use it for comparison. We found that two of the remaining learning rates were better for our model when the training data sets were generated using the long flow map extraction strategy (Figure 4(a)). The three remaining learning rates resulted in a similar loss when the model was trained using data sets generated using the short flow map approach (Figure 4(b)).
(Figure 5 caption: The top 1% of errors in each experiment are treated as outliers and removed from the analysis. A batch size of 200, with the respective learning rates, is optimal for training data sets with 5,000 seeds and 10,000 seeds using the long flow map approach. A batch size of 300 is optimal for the short flow map approach.)
To identify the optimal combination of batch size and learning rate, we conducted a set of experiments. Our experiments considered three options for the batch size, two options for the total number of training samples, and both flow map extraction strategies (long and short). Figure 5 presents violin plots of the AEDR error for reconstructed trajectories. Although we found the choice of learning rate and flow map extraction strategy could significantly impact accuracy, varying the batch size did not result in a significant change in accuracy for a fixed learning rate and flow map extraction strategy.
4.3 Network Training and Inference
Table 1 reports the time spent training the model, the memory consumption for saving the trained model, and the inference time to generate new trajectories with the trained model. As expected, the training time increased linearly with the number of training samples for both approaches. The storage cost for saving the trained model, irrespective of the data set or number of training samples, was fixed: based on the network's parameters, the trained models required the same memory size of 10.5 MB. We expect the model can be trained using data from more complex, turbulent, and 3D flow fields. However, verification, as well as understanding the impact of flow field complexity on network training and performance, requires a future in-depth investigation. That said, considering the network's parameters are independent of the complexity of the flow field, we expect our method to scale and be used to reduce the memory footprint of large-scale high-resolution Lagrangian representations of time-varying vector fields. An important consequence of a small memory footprint is the reduced cost of two seconds to load the entire model, thus alleviating the system from expensive I/O for loading data during exploratory visualization. Further, our results show that parallel inference of 2,000 trajectories, with 20 locations interpolated to approximate each curve, costs 0.38 seconds using the same machine as for generating the training data sets.
4.4 Impact of Number of Seeds
We evaluated the impact of the number of seeds on the performance of our model qualitatively and quantitatively. We used a fixed file cycle interval of 30 for all training data discussed in this section. We created training data sets with four options for the number of seeds, 5,000, 10,000, 15,000, and 20,000, for the long and short flow map approaches. To evaluate the accuracy of the reconstruction, 2,000 random particles were seeded in the domain. To avoid extrapolation errors due to our use of the Sobol seeding strategy for training data generation (Section 3.1), we used a small boundary offset to prevent test seeds from being placed exactly on the boundary.
In Figure 6, we report the error maps as well as the FTLE derived from models using various configurations for training data generation. The results highlight the relation between the trained model's performance and flow features in the domain. The error for each trajectory was measured using the AEDR metric proposed by [ren2020uncertainty]. We observed that reconstruction errors were higher in regions with greater separation in the flow field, i.e., regions with higher FTLE values. Moreover, for both the long and short flow map approaches, the error maps confirmed that increasing the number of seeds increases the inference accuracy. In addition, we visualized the distribution of AEDR errors for the model-generated results in comparison to the ground truth (Figure 7). We observed a decreasing median error as the number of seeds used to sample the domain increased. However, the reduction in error was smaller beyond 10,000 seeds. Further, the models trained with short flow map data sets showed greater global error due to local error propagation during the reconstruction of new trajectories. Compared to the ground truth FTLE field in Figure 2(b), although the FTLE ridges are visible in all reconstructions, the long flow map approach can support accurate reconstruction of the entire field, whereas the short flow map reconstructions produce minor artifacts in regions of low separation.
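Deriving FTLE from a flow map, as done for these comparisons, can be sketched as follows: finite-difference the flow map to get its spatial gradient, form the Cauchy-Green deformation tensor, and take the log of the square root of its largest eigenvalue over the integration time. The test flow map below (a linear saddle flow with known stretching rate) is a hypothetical stand-in for a ground-truth or model-inferred flow map.

```python
import math

def ftle(flow_map, x, y, T, h=1e-4):
    # FTLE at (x, y): central-difference flow map gradient, then the
    # largest eigenvalue of the 2x2 Cauchy-Green tensor C = J^T J.
    xp, xm = flow_map(x + h, y), flow_map(x - h, y)
    yp, ym = flow_map(x, y + h), flow_map(x, y - h)
    j11 = (xp[0] - xm[0]) / (2 * h); j12 = (yp[0] - ym[0]) / (2 * h)
    j21 = (xp[1] - xm[1]) / (2 * h); j22 = (yp[1] - ym[1]) / (2 * h)
    c11 = j11 * j11 + j21 * j21
    c12 = j11 * j12 + j21 * j22
    c22 = j12 * j12 + j22 * j22
    tr, det = c11 + c22, c11 * c22 - c12 * c12
    lam_max = tr / 2.0 + math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    return math.log(math.sqrt(lam_max)) / abs(T)

# Hypothetical test flow map: a linear saddle flow stretching x at rate
# 0.5 and contracting y; its FTLE equals the stretching rate everywhere.
T = 2.0
saddle = lambda x, y: (x * math.exp(0.5 * T), y * math.exp(-0.5 * T))
```

Ridges of the resulting scalar field mark regions of strong separation, which is why the reconstruction error maps above correlate with high FTLE values.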
Finally, to assess the inference results qualitatively, Figure 8 shows the model-generated trajectories and the ground truth Double Gyre trajectories for a varying number of training seeds. The reconstructed results were almost identical to the ground truth for all new trajectories when 10,000 or more seeds were used for training. When 5,000 seeds were used for training, the short flow map approach demonstrated lower reconstruction accuracy as interpolation error propagates and accumulates. In contrast, the long flow map approach followed the ground truth closely. Here, each location along the trajectory was interpolated directly from the starting seed location. For the long flow map approach, even training data generated using 5,000 seeds was sufficient to maintain accuracy.
4.5 Impact of File Cycle Interval
To understand the performance of our model with varying file cycle intervals, we evaluated four intervals, 10, 20, 50, and 100, in our experiments. We considered a total of 1,000 cycles of the Double Gyre data set. Further, we used a fixed number of 10,000 seeds to generate the training data sets.
In Figure 9, we report the error maps as well as the FTLE derived from models using various configurations for training data generation. The long flow map approach was not impacted by the file cycle interval, since each interpolation is independent of prior locations stored along the trajectory. Reconstruction of new trajectories using a model trained on short flow map data involves an interpolation process in which each location along the trajectory depends on the previous location. Thus, we observed a higher reconstruction error when the interval was short and more intervals needed to be spanned to construct a trajectory over the entire temporal duration. For example, for training data generated by the short flow map approach using an interval of 10, we saw that the reconstruction error was higher for particles originating near FTLE ridges. These findings are consistent with the error analysis of Lagrangian-based particle tracing systems ([chandler2016analysis]). Similar to prior experiments, compared to Figure 2(b), we observed that the derived FTLE scalar fields are accurate for the long flow map approach, but contain some artifacts for the short flow map approach. Here, as expected, the short flow map approach shows fewer artifacts when using a longer file cycle interval.
Considering the violin plots in Figure 10, we observed varying reconstruction accuracy patterns. The accuracy of the long flow map approach did not change significantly with the file cycle interval. The local error of the short flow map approach was low for short intervals, but increased as the interval length increased due to greater divergence between neighboring trajectories over longer integration times. The global error of the short flow map approach represents the accuracy of particle trajectories that are “stitched” together. We found the global error was highest when the file cycle interval was short, given that a greater number of “stitching” events were involved. As the file cycle interval increased, although the local error of every individual interpolation was higher, the global error decreased due to fewer total advection steps. Again, these findings are consistent with prior work by [chandler2016analysis] and [sane2019interpolation]. Additionally, we present the average error across all particles over time for the long and short flow map approaches in Figure 11. The line curves provide strong evidence of local error propagation and accumulation for tests using short flow map training data.
For a qualitative assessment of the impact of the file cycle interval, we present reconstructed pathlines alongside the ground truth in Figure 12. We used piecewise linear interpolation to connect the interpolated locations along the new trajectories. Although the short flow map approach demonstrated a small deviation from the ground truth when short file cycle intervals were used, the overall accuracy of reconstructed trajectories was high, with interpolated results closely overlapping the ground truth.
4.6 Application to Fluid Dynamics Machine Learning Data Set
We applied our method to an ensemble member (#200) of the two-dimensional fluid dynamics machine learning data set generated using the Gerris flow solver ([Jakob2020]). To generate the training data set, we placed seeds in the domain, set the file cycle interval to 10, and traced flow maps over the first 100 cycles. For particle advection, we used the VTK-m library ([moreland2016vtk]) and a fourth-order Runge-Kutta (RK4) advection kernel. The median error of our method after 100 cycles and 10 interpolation steps is approximately two times the grid cell size. Our method took 0.6 seconds to reconstruct 2,000 particle trajectories using parallel inference with OpenMP ([dagum1998openmp]). Considering the storage requirements, the subset of the original data we use is approximately 209 MB. Since our model has a fixed memory requirement, once trained, the storage cost remains 10.5 MB. To qualitatively evaluate the reconstructed data, we visualize pathlines inferred by the trained model in comparison with the ground truth in Figure 13. In future work, we aim to study how to improve interpolation accuracy as well as determine an appropriate number of samples to be computed using in situ processing.
5 Future Work and Conclusion
Exploratory flow visualization for large-scale time-varying vector field data is challenging. In this paper, we introduced a deep neural network-based approach that uses Lagrangian representations to enable exploratory analysis. Our study demonstrated that our model can be trained using Lagrangian representations extracted from a 2D time-varying vector field. Specifically, we used the widely studied unsteady Double Gyre analytical flow data set and one fluid dynamics machine learning data set to demonstrate our method. We contributed the first assessment of applying deep learning to various forms of Lagrangian representations and evaluated the efficacy of exploratory analysis. A benefit of using our method is the fixed memory required by a model and fast inference of unstructured spatiotemporal data. Our trained model requires only 10.5 MB, and consequently, the time spent on I/O to load the model during post hoc analysis is negligible. Further, we are able to infer the pathlines of thousands of particles at interactive rates. With respect to reconstruction interpolation error, we found inference errors are small and follow predictable patterns consistent with results from prior works. Predictable and consistent error patterns enable effective future navigation of strategies to reduce reconstruction interpolation error when using machine learning. Overall, our study demonstrates the benefits of leveraging deep learning for exploratory flow visualization of time-varying vector field data.
An important direction for future work is investigating model performance for more complex or turbulent flows as well as large-scale three-dimensional flow fields. With the objectives of improving spatial and temporal interpolation accuracy and reducing model training time, various forms of training data or different network architectures could be considered. For example, one could concatenate sets of short trajectories to limit instances of error propagation while simultaneously accounting for reduced interpolation accuracy due to stretching or divergence in the flow. Lastly, an open-source tool for interactive flow visualization exploration, with a trained model serving as a backend, would be valuable to the community. We plan to pursue these projects in the future.