1 Introduction
Despite ongoing modernization efforts, a large fraction of the software used in High Energy Physics (HEP) is legacy. It mostly consists of libraries assembling single-threaded, mono-platform Fortran and C++03 routines. Concomitantly, HEP experiments keep collecting samples with unprecedentedly large statistics, while data analyses become increasingly complex. Even on powerful computers, it is common to spend days performing calculations to reach a result, and the results often need retuning anyway.
On the other hand, computer processors will no longer increase in clock frequency to reach higher performance. Indeed, the current roadmap to improve overall performance is to deploy different levels of concurrency. One effect of this is the proliferation of multithread-friendly and multiplatform environments among HPC data centers. Unfortunately, HEP software is not yet fully prepared to exploit concurrency and to deploy more opportunistic computing strategies.
Hydra proposes a computing model to approach these issues. The Hydra framework provides a collection of parallelized high-level algorithms, addressing some of the typical computing bottlenecks commonly found in HEP, and a set of optimized containers and types, through a modern and functional interface. This enhances HEP software productivity and performance while keeping portability between NVidia GPUs, multicore CPUs and other devices compatible with the CUDA, TBB and OpenMP computing models.
2 Design highlights
Hydra is a C++11 template framework organized using a variety of static polymorphism idioms and patterns. This ensures the predictability of the stack size at compile time, which is critical for stability and performance when running on GPUs, and minimizes the overhead introduced by the user interface when engaging the actual calculations. Furthermore, the combination of static polymorphism and templates exposes the maximum amount of code to the compiler, in the context in which the code will be used, enabling many compile-time optimizations that would not be accessible otherwise. Hydra's interface and implementation details extensively deploy patterns and idioms that enforce thread-safety and efficient memory access and management. The following list summarizes some of the main design choices adopted in Hydra:

Hydra provides a set of optimized STL-like containers that can store multidimensional datasets using a SoA (structure of arrays) layout, which separates the elements of a structure into one parallel array per field. This eases data manipulation with SIMD instructions and, if only a specific field of the structure is needed, only that field needs to be iterated over, allowing more data to fit onto a single cache line.

Data is handled using iterators, and all classes manage allocated resources using the RAII idiom.

The framework is type- and thread-safe.

No limitation on the maximum number of dimensions that containers and algorithms can handle.
The types of devices on which Hydra can be deployed are classified by backend type, according to the device's compatibility with certain computing models. Currently, Hydra supports four backends: CPP, OpenMP, CUDA and TBB. Code can be dispatched and executed on all supported backends concurrently and asynchronously in the same program, using the suitable policies represented by the symbols
hydra::omp::sys, hydra::cuda::sys, hydra::tbb::sys, hydra::cpp::sys, hydra::host::sys and hydra::device::sys. These policies define the memory space where resources should be allocated to run algorithms and store data. For single-backend applications, source files written using Hydra and standard C++ compile for GPU and CPU simply by exchanging the file extension from .cu to .cpp and changing one or two compiler flags. There is no need to refactor code.
3 Basic features
Currently, Hydra provides a collection of parallelized high-level algorithms, addressing some computing-intensive tasks commonly found in data analyses in HEP. The available high-level algorithms are listed below:

Multidimensional p.d.f. sampling.

Parallel function evaluation on multidimensional datasets.

Five fully parallelized numerical integration algorithms: Genz-Malik quadrature, self-adaptive and static Gauss-Kronrod quadratures, plain and self-adaptive importance-sampling Monte Carlo, and phase-space Monte Carlo integration.

Phase-space Monte Carlo generation, integration and modeling.

Interface to the ROOT::Minuit2[1] minimization package, enabling accelerated maximum likelihood fits over large multidimensional datasets.

Parallel implementation of the sPlot[2] technique, for the statistical unfolding of data distributions.
Hydra also provides two multidimensional containers optimized to store large POD datasets whose dimensions are represented by numbers of the same or different types. A significant number of facilities are also available to filter, copy and perform other operations on data. In the following subsections, some of the basic features are described.
3.1 Functors and arithmetic operators
In all situations where custom calculations need to be performed, the Hydra framework instantiates algorithms to call the user's code, which is passed using functors or C++11 lambdas. Hydra adds features and type information to generic functors using the CRTP idiom. To be compliant with Hydra's interface, a functor needs to derive from the hydra::BaseFunctor<Functor,ReturnType,NParameters> class and to implement the __host__ __device__ Evaluate(...) method. Functor values can be cached. Listing 1 shows how to implement a simple functor to calculate a Gaussian function.
3.2 C++11 Lambdas
The user can define a C++11 lambda function and convert it into a Hydra functor using the function hydra::wrap_lambda(). The wrapped lambda has access to all the functionality of Hydra. Listing 2 shows a basic example of this functionality. Hydra also supports the usage of C++11 lambdas as fit models. In this case, the lambda function needs to have a different signature and the list of fit parameters needs to be passed to hydra::wrap_lambda().
3.3 Arithmetic operators
Hydra overloads the basic arithmetic operators for all objects deriving from hydra::BaseFunctor. Composition of functors is supported as well. For example, the lines shown in Listing 3 are completely legal C++11 code.
3.4 Containers
The Hydra framework provides a one-dimensional STL-like vector container for each supported backend, aliasing the underlying Thrust types. Beyond this, Hydra implements two native multidimensional containers:
hydra::multivector and hydra::multiarray. In these containers, the data corresponding to each dimension is stored in a contiguous memory region, and when the container is traversed, each entry is accessed as a hydra::tuple, where each field holds a value corresponding to a dimension. Both classes implement an interface completely compliant with an STL vector and also provide constant and non-constant accessors for the single-dimensional data. The container hydra::multivector is suitable for storing datasets where the dimensions are represented by entries with different POD types. hydra::multiarray is designed to store datasets where all dimensions are represented by the same type. Data is copyable across different backends.

4 Code examples and performance measurements
4.1 Multidimensional numerical integration
The VEGAS algorithm[3] samples the integrand and adapts itself, so that the sampling points are concentrated in the regions that make the largest contribution to the integral. Hydra's implementation follows the structure of the corresponding GSL algorithm, but parallelizes the integrand evaluations and accumulations in each iteration. There is no limit on the number of dimensions the integrator can handle. Figure 1 shows the performance of Hydra's VEGAS implementation.
4.2 Phase-space Monte Carlo
Phase-space Monte Carlo simulates the kinematics of a particle with a given four-momentum decaying to an n-particle final state, without intermediate resonances. Samples of phase-space Monte Carlo events are widely used in HEP studies where the calculation of the phase-space volume is required, as well as a starting point for implementing and describing the properties of models with one or more resonances, or for simulating the response of the detector to the decay products[4]. Hydra provides an implementation of the Raubold-Lynch method[4] and can generate the full kinematics of decays with any number of particles in the final state. Sequential decays, evaluation of models, production of weighted and unweighted samples and many other features are also supported. Listing 4 shows how to use the phase-space Monte Carlo generator to produce a sample with ndecays three-body decays. Figure 3 shows the phase-space generator performance as a function of the sample size.
4.3 Interface to Minuit2
Hydra implements an interface to ROOT::Minuit2 that parallelizes the FCN calculation[1]. This dramatically accelerates the calculation over large datasets. Hydra normalizes the pdfs on the fly, using analytical or numerical integration algorithms provided by the framework, and handles data using iterators.
Listing 5 shows how to build a simple model composed of two pdfs, representing a Gaussian and an exponential distribution, respectively. The model is used to perform an extended likelihood fit. The FCN object provided by Hydra parallelizes the function calls in the backend where the data is stored. The processing of a sample with 20 million events takes 4.8 seconds on a Tesla K40c and 299.8 seconds on an Intel Xeon(R) CPU E5-2680 v3 @ 2.50 GHz (one thread), resulting in a speedup factor of 62x.

5 Summary
The basic design, performance and functionality of the header-only, C++11-compliant framework Hydra have been introduced. Some of the basic interfaces and algorithms are discussed in Section 3. The performance measurements for running some of Hydra's algorithms using CUDA and one CPU thread are discussed in Section 4, and show that Hydra can be up to 250 times faster than conventional software, depending on the graphics card. For the multithreaded CPU backends, TBB and OpenMP, if the problem size is large enough to amortize the cost of thread creation and destruction, the algorithms scale linearly with the number of deployed threads. Since Hydra is header-only, no additional build step is needed beyond the inclusion of the required headers. Hydra has been presented at several conferences, including NVidia's GTC 2017[5]. It is currently being used in a data analysis aiming to measure the charged kaon mass at the LHCb experiment at CERN[6]. The initial development of Python bindings for Hydra was one of the projects of Google Summer of Code 2017[7].
Hydra is open source and released under the GPL version 3 license. The project is hosted on GitHub at https://github.com/MultithreadCorner/Hydra.
6 Acknowledgments
This work was performed with support from NSF Award PHY1414736. NVidia provided K40 GPUs for our use through its University Partnership program.
References
 [1] Minuit2 Package https://root.cern.ch/root/html/MATH_MINUIT2_Index.html
 [2] Pivk M and Le Diberder F R 2005 Nucl. Instrum. Meth. A555 356–369 (Preprint physics/0402083)
 [3] Lepage G P 1978 Journal of Computational Physics 27 192–203 ISSN 0021-9991 URL http://www.sciencedirect.com/science/article/pii/0021999178900049
 [4] James F E 1968 (CERN) URL https://cds.cern.ch/record/275743
 [5] Alves Junior A A 2017 GTC2017 (San José, US) Presentation ID S7340
 [6] Alves Junior A A and Contu A 2017 Rare n Strange 2017: strange physics at LHCb (Santiago de Compostela, Spain)
 [7] Hydra.Python: python bindings for Hydra https://summerofcode.withgoogle.com/dashboard/project/6669304945704960/overview/