Implementation of high-precision computation capabilities into the open-source dynamic simulation framework YADE

This paper deals with the implementation of arbitrary precision calculations into the open-source discrete element framework YADE published under the GPL-2+ free software license. This new capability paves the way for the simulation framework to be used in many new fields such as quantum mechanics. The implementation details and associated gains in the accuracy of the results are discussed. Besides the "standard" double (64 bits) type, support for the following high-precision types is added: long double (80 bits), float128 (128 bits), mpfr_float_backend (arbitrary precision) and cpp_bin_float (arbitrary precision). Benchmarks are performed to quantify the additional computational cost involved with the new supported precisions. Finally, a simple calculation of a chaotic triple pendulum is performed to demonstrate the new capabilities and the effect of different precisions on the simulation result.



page 4

page 9

page 10

page 13

page 16


Performance evaluation of multiple precision matrix multiplications using parallelized Strassen and Winograd algorithms

It is well known that Strassen and Winograd algorithms can reduce the co...

Improving the Performance of the GMRES Method using Mixed-Precision Techniques

The GMRES method is used to solve sparse, non-symmetric systems of linea...

gSLICr: SLIC superpixels at over 250Hz

We introduce a parallel GPU implementation of the Simple Linear Iterativ...

Reconciling interoperability with efficient Verification and Validation within open source simulation environments

A Cyber-Physical System (CPS) comprises physical as well as software sub...

Vectorized Calculation of Short Circuit Currents Considering Distributed Generation - An Open Source Implementation of IEC 60909

An important task in grid planning is to ensure that faults in the grid ...

An open-source simulation package for power electronics education

Extension of the open-source simulation package GSEIM for power electron...

Eilmer: an Open-Source Multi-Physics Hypersonic Flow Solver

This paper introduces Eilmer, a general-purpose open-source compressible...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The advent of a new era of scientific computing has been predicted in the literature, one in which the numerical precision required for computation is as important as the algorithms and data structures in the program Bailey et al. (2012); Kahan (2004, 2006); McCurdy et al. (2002). An appealing example of a simple computation gone wrong was presented in the talk “Are we just getting wrong answers faster?” of Stadtherr in 1998 Stadtherr (1998). An exhaustive list of such computations along with a very detailed analysis can be found in Kahan (2006).

Many examples exist where low-precision calculations resulted in disasters. The military identified an accumulated error in multiplication by a constant factor of , which has no exact binary representation, as the cause for a Patriot missile failure on 25 February 1991, which resulted in several fatalities Blair et al. (1992). If more bits were used to represent a number, the explosion of an Ariane 5 rocket launched by the European Space Agency on 4 June 1996 could have been prevented Lions et al. (1996); Lann (2006); Jézéquel and Meyer (1997); Ben-Ari (2001) as it was a result of an inappropriate conversion from a 64 bit floating point number into a 16 bit signed integer. Indeed, the 64 bit floating point number was too big to be represented as a 16 bit signed integer. On 14 May 1992 the rendezvous between the shuttle Endeavour and the Intelsat 603 spacecraft nearly failed. The problem was traced back to a mismatch in precision Peterson (1996); Huckle and Neckel (2019). More catastrophic failures related to the lack of precision are discussed in Peterson (1996); Huckle and Neckel (2019). In 2012 it was predicted that most future technical computing will be performed by people with only basic training in numerical analysis or none at all Loh and Walster (2002); Goldberg (1991); Bailey et al. (2012); Kahan (2006). High-precision computation is an attractive option for such users, because even if a numerically better algorithm with smaller error or faster convergence is known for a given problem (e.g. Kahan summation Goldberg (1991) for avoiding accumulating errors111for summands and Unit in Last Place (ULP) error, the error in regular summation is , error in Kahan summation is , while error with regular summation in twice higher precision is . See proof of Theorem 8 in Goldberg (1991).), it is often easier and more efficient to increase the precision of an existing algorithm rather than deriving and implementing a new one Bailey et al. (2012); Kahan (2004) — a feat which is made possible by the work presented in this paper. It shall however be noted that increasing precision is not the answer to all types of problems, as recently a new kind of a pathological systematic error of up to has been discovered in a certain type of Bernoulli map calculations which cannot be mitigated by increasing the precision of the calculations Boghosian et al. (2019). In addition, switching to high-precision generally means longer run times Isupov (2020); Fousse et al. (2007).

Nowadays, high-precision calculations find application in various different domains, such as long-term stability analysis of the solar system Laskar and Gastineau (2009); Sussman and Wisdom (1992); Bailey et al. (2012), supernova simulations Hauschildt and Baron (1999), climate modeling He and Ding (2001), Coulomb n-body atomic simulations Bailey and Frolov (2002); Frolov and Bailey (2004), studies of the fine structure constant Yan and Drake (2003); Zhang et al. (1996), identification of constants in quantum field theory Broadhurst (1999); Bailey (2005), numerical integration in experimental mathematics Bailey et al. (2005); Lu et al. (2010), three-dimensional incompressible Euler flows Caflisch (1993), fluid undergoing vortex sheet roll-up Bailey (2005), integer relation detection Bailey (2000), finding sinks in the Henon Map Joldes et al. (2014) and iterating the Lorenz attractor Abad et al. (2011). There are many more yet unsolved high-precision problems Stefański (2013), especially in quantum mechanics and quantum field theory where calculations are done with 32, 230 or even 10000 decimal digits of precision Pachucki and Puchalski (2005); Siłkowski and Pachucki (2020); Broadhurst (1999). Additionally Debian, a Linux distribution with one of the largest archive of packaged free software is now moving numerous numerical computation packages such as Open MPI, PETSc, MUMPS, SuiteSparse, ScaLAPACK, METIS, HYPRE, SuperLU, ARPACK and others into 64-bit builds Debian . In order to stay ahead of these efforts, simulations frameworks need to pave the way into 128-bit builds and higher.

The open-source dynamic simulation framework YADE Kozicki and Donzé (2008); Šmilauer and others (2015) is extensively used by many researchers all over the world with a large, active and growing community of more than 25 contributors. YADE, which stands for “Yet Another Dynamic Engine”, was initially developed as a generic dynamic simulation framework. The computation parts are written in C++ using flexible object models, allowing independent implementation of new algorithms and interfaces. Python (interpreted programming language, which wraps most of C++ YADE code) is used for rapid and concise scene construction, simulation control, postprocessing and debugging. Over the last decade YADE has evolved into a powerful discrete element modelling package. The framework benefits from a great amount of features added by the YADE community, for example particle fluid coupling Gladkyy and Schwarze (2014); Lominé et al. (2011); Maurin et al. (2015), thermo–hydro-mechanical coupling Caulk et al. (2020); Krzaczek et al. (2019, 2020), interaction with deformable membrane-like structures, cylinders and grids Effeindzourou et al. (2016); Bourrier et al. (2013); Thoeni et al. (2013), FEM-coupling Jerier et al. (2011); Frenning (2008); Guo and Zhao (2014), polyhedral particles Boon et al. (2013); Eliáš (2014); Gladkyy and Kuna (2017), deformable particles Haustein et al. (2017), brittle materials Scholtès and Donzé (2013); Donzé et al. (2021), quantum dynamics of diatomic molecules Jasik et al. (2018, 2021) and many others. A more extensive list of publications involving the use of YADE can be found on the framework’s web page The Yade Project (2020). A list of selected available YADE modules and features is presented in Tab. 1. Although its current focus is on discrete element simulations of granular material, its modular design allows it to be easily extended to new applications that rely on high-precision calculations.

The present work deals with the implementation of high-precision support for YADE which will open the way for YADE to be used in many new research areas such as quantum mechanics Jasik et al. (2018, 2021), special relativity, general relativity, cosmology, quantum field theory and conformal quantum geometrodynamics Santamato and De Martini (2015); De Martini and Santamato (2013). The programming techniques necessary for such extension are presented and discussed in Section 2. Relevant tests and speed benchmarks are performed in Sections 3 and 4. A simple chaotic triple pendulum simulation with high precision is presented in Section 5. Finally, conclusions are drawn and it is discussed how this new addition to the framework will enable research in many new directions.

cmake flag description
High-precision support in present YADE version37.

(always on)
Discrete Element Method Šmilauer and others (2015); Kozicki and Donzé (2008).
(always on) Deformable structures Effeindzourou et al. (2016); Bourrier et al. (2013); Thoeni et al. (2013).
ENABLE_CGAL Polyhedral particles, polyhedral particle breakage Eliáš (2014); Gladkyy and Kuna (2017).
ENABLE_LBMFLOW Fluid-solid interaction in granular media with coupled
Lattice Boltzmann/Discrete Element Method Lominé et al. (2011).
ENABLE_POTENTIAL_PARTICLES Arbitrarily shaped convex particle described as a 2nd degree
polynomial potential function Boon et al. (2013).
Selected YADE features with high-precision support.

Exporting data and simulation geometry to ParaView Šmilauer and others (2015)
(always on) Importing geometry from CAD/CAM software (yade.ymportŠmilauer and others (2015).
ENABLE_ASAN AddressSanitizer allows detection of memory errors, memory leaks,
heap corruption errors and out-of-bounds accesses Serebryany et al. (2012).
ENABLE_OPENMP OpenMP threads parallelization, full support for double,
long double, float128 types8.
Modules under development for high-precision support.
ENABLE_MPI MPI environment for massively parallel computation Šmilauer and others (2015).
ENABLE_VPN Thermo-hydro-mechanical coupling using virtual pore network Krzaczek et al. (2019, 2020).
ENABLE_NRQM Quantum dynamics simulations of diatomic molecules including
photoinduced transitions between the coupled states Jasik et al. (2018, 2021).

Table 1: Selected modules and features in YADE.

2 Implementation of arbitrary precision

2.1 General overview

Since the beginning of YADE Kozicki and Donzé (2008), the declaration ‘using Real=double;222originally YADE was written in C++03, hence, before the switch to C++17 it was ‘typedef double Real; was used as the main floating point type with the intention to use it instead of a plain double everywhere in the code. The goal of using Real was to allow replacing its definition with other possible precisions333see for example:

. Hence, the same strategy was followed for other types used in the calculations, such as vectors and matrices. Per definition the last letter in the type name indicates its underlying type, e.g. ’

Vector3r v;’ is a 3D vector , and Vector2i is a 2D vector of integers (where is a subset of rational numbers , which are representable by the currently used precision: ; the name Real is used instead of Rational or FloatingPoint for the sake of brevity).

Figure 1: Simplified dependency tree of the open-source framework YADE. External dependencies are marked in orange. The green boxes indicate parts of the framework that needed to be adapted for high-precision. Selected YADE modules which support high-precision are in the top row, dashed lines indicate modules under development (also see Tab. 1).

In the presented work, the goal to use high precision is achieved by using the C++ operator overloading functionality and the boost::multiprecision library. A simplified dependency diagram of YADE is shown in Fig. 1. The layered structure of YADE remains nearly the same as in the original paper by Kozicki and Donzé Kozicki and Donzé (2008). It is built on top of several well established libraries (marked with orange in Fig. 1) as discussed in Section 2.3. Some changes were necessary in the structure of the framework (marked with green in Fig. 1) as highlighted in Section 2.5. The top row in Fig. 1 indicates selected YADE modules with respective citations listed in Tab. 1. It should be noted that YADE relies on many external libraries to expand its functionality which can result in a demanding server setup.

The Boost library Dawes et al. (2020) provides convenient wrappers for other high-precision types with the perspective of adding more such types in the future444at the time of writing, the quad–double library with 62 decimal places (package libqd-dev) is in preparation, see: The new supported Real types are listed in Tab. 2. A particular Real type can be selected during compilation of the code by providing a cmake argument either REAL_PRECISION_BITS or REAL_DECIMAL_PLACES555see for detailed documentation.

The process of adding high-precision support to YADE was divided into several stages which are described in the subsections below666also see the consolidated merge request: !383.

Total Decimal Exponent Significant
Type bits places bits bits Notes
float 32 6 8 24 only for testing
double 64 15 11 53 hardware accelerated
long double 80 18 15 64 hardware accelerated
boost float128 128 33 15 113 may be hardware accelerated
boost mpfr MPFR library as wrapped by Boost
boost cpp_bin_float uses Boost only, but is slower

 The specifics of long double depend on the particular compiler and hardware; the values in this table correspond to the most common x86 platform and the g++ compiler.

 All types use 1 bit to store the sign and all types except long double have an implicit first bit=1, hence here the sum .

 The complete C++ type names for the Boost high-precision types are as follows: boost::multiprecision::float128, boost::multiprecision::mpfr_float_backend and boost::multiprecision::cpp_bin_float

Table 2: List of high-precision types supported by YADE.

2.2 Preparations

To fully take advantage of the C++ Argument Dependent Lookup (ADL), the entire YADE codebase was moved into namespace yade777see: !284, thus using the C++ standard capabilities to modularize the namespaces for each software package. Similarly, the libraries used by YADE such as Boost Dawes et al. (2020), CGAL The CGAL Project (2020) and EIGEN Guennebaud et al. (2010) reside in their respective boost, CGAL and Eigen namespaces. After this change, all potential naming conflicts between math functions or types in YADE and these libraries were eliminated.

Before introducing high precision into YADE it was assumed that Real is actually a Plain Old Data (POD) double type. It was possible to use the old C-style memset, memcpy, memcmp and memmove functions which used raw-memory access. However, by doing so the modern C++ structure used by other high-precision types was completely ignored. For example, the MPFR type may reserve memory and inside its structure store a pointer to it. Trying to set its value to zero by invoking memset (which sets that pointer to nullptr) leads to a memory leak and a subsequent program failure. In order to make Real work with other types, this assumption had to be removed. Hence, memset calls were replaced with std::fill calls, which when invoked with a POD type reduce to a (possibly faster) version of memset optimized for a particular type in terms of chunk size used for writing to the memory. In addition, C++ template specialization mechanisms allow for invoking with a non-POD type which then utilizes the functionality provided by this specific type, such as calling the specific constructors. All places in the code which used these four raw-memory access functions were improved to work with the non-POD Real type888see: !381, with one exception which is yet to be evaluated and hence mpfr and cpp_bin_float types work with OpenMP, but do not take full advantage of CPU cache size in class OpenMPArrayAccumulator. For similar reasons one should not rely on storing an address of the component of a Vector3r or Vector2r999see: !406.

Next, all remaining occurrences of double were replaced with Real101010these changes were divided into several smaller merge requests: !326, !376, !394; there were also a couple of changes such as MatrixXd MatrixXr and Vector3d Vector3r. and the high-precision compilation and testing was added to the gitlab Continuous Integration (CI) testing pipeline, which guarantees that any future attempts to use double type in the code will fail before merging such changes into the main branch. Next the Real type was moved from global namespace into yade namespace111111see: !364 to eliminate any potential problems with namespace pollution121212usually such errors manifest themselves as very unrelated problems, which are notoriously difficult to debug, e.g. due to the fact that an incorrect type (with the same name) is used; see: #57 and

2.3 Library compatibility

In order to be able to properly interface YADE with all other libraries it was important to make sure that mathematical functions (see Tab. 4) are called for the appropriate type. For example, the EIGEN library would have to call the high-precision sqrt function when invoking a normalize function on a Vector3r in order to properly calculate vector length. Several steps were necessary to achieve this. First, an inline redirection131313The recommended practice in such cases is to use the Argument Dependent Lookup (ADL) which lets the compiler pick the best match from all the available candidate functions Alexandrescu (2001); Meyers (2014); Stroustrup (2014); Vandevoorde et al. (2017). No ambiguity is possible, because such situations would always result in a compiler error. This was done by employing the C++ directives using std::function; and using boost::multiprecision::function; for the respective function and then calling the function unqualified (without namespace qualifier) in the MathFunctions.hpp file. to these functions was implemented in namespace yade::math in the file MathFunctions.hpp. Next, all invocations in YADE to math functions in the std namespace were replaced with calls to these functions in the yade::math namespace141414see: !380, !390, !391, !392, !393, !397. Functions which take only Real arguments may omit math namespace specifier and use ADL instead. Also some fixes were done in EIGEN and CGAL151515see: and, although they did not affect YADE directly since it was possible to workaround them.

The C++ type traits is a template metaprogramming technique which allows one to customize program behavior (also called polymorphism) depending on the type used Alexandrescu (2001); Meyers (2014); Stroustrup (2014); Vandevoorde et al. (2017). This decision is done by the compiler (conditional compilation) due to inspecting the types in the compilation stage (this is called static polymorphism). Advanced C++ libraries provide hooks (numerical traits) to allow library users to inform the library about the used precision type. The numerical traits were implemented in YADE for the libraries EIGEN and CGAL161616see files EigenNumTraits.hpp, CgalNumTraits.hpp and !412 as these were the only libraries supporting such a solution at the time of writing this paper. EIGEN and CGAL are fully compatible and aware of the entire high-precision code infrastructure in YADE. Similar treatment would be possible for the Coinor Saltzman (2002); Lougee-Heimer (2003) library (used by the class PotentialBlock) if it would provide numerical traits. Additionally, an OpenGL compatibility layer has been added by using inline conversion of arguments from Real to double for the OpenGL functions as OpenGL drawing functions use double171717see: !412 and file OpenGLWrapper.hpp. If the need for drawing on screen with precision higher than double arises (e.g. at high zoom levels) it will be rectified in the future.. The VTK compatibility layer was added using a similar approach. Virtual functions were added to convert the Real arguments to double181818see: !400 and VTKCompatibility.hpp. If the VTK display software will start supporting high precision, this solution can be readily improved. in classes derived from the VTK parent class (such as vtkDoubleArray).

The LAPACK compatibility layer was provided as well, this time highlighting the problems of interfacing with languages which do not support static polymorphism. The routines in LAPACK are written in a mix of Fortran and C, and have no capability to use high-precision numerical traits like EIGEN and CGAL. The only way to do this (apart from switching to another library) was to down-convert the arguments to double upon calling LAPACK routines (e.g. a routine to solve a linear system) then up-converting the results to Real. This was the first step to phase out YADE’s dependency on LAPACK. With this approach the legacy code works even when high precision is enabled although the obtained results are low-precision191919see: !379 and LapackCompatibility.cpp.. Additionally, this allows one to test the new high-precision code against the low-precision version when replacing these function calls with appropriate function calls from another library such as EIGEN in the future. Fortunately, only two YADE modules depend on LAPACK: potential particles and the flow engine Caulk and Chareyre (2019). The latter also depends on CHOLMOD, which also supports the double type only, hence it is not shown in Tab. 1. Nevertheless, a similar solution as currently implemented for LAPACK can be used in the future to remove the current dependency on CHOLMOD.

2.4 Double, quadruple and higher precisions

Sometimes a critical section of the computations in C++ would work better if performed in a higher precision1. This would also guarantee that the overall results in the default precision are correct. The RealHP<N> types serve this purpose. In analogy to float and double types used on older systems, the types RealHP<2>, RealHP<4> and RealHP<N> correspond to double, quadruple and higher multipliers of the Real precision selected during compilation, e.g. with REAL_DECIMAL_PLACES5, respectively. A simple example where this can be useful is solving a system of linear equations where some coefficients are almost zero. The old rule of thumb to ,,perform all computation in arithmetic with somewhat more than twice as many significant digits as are deemed significant in the data and are desired in the final results” works well in many cases Kahan (2004). Nevertheless, maintaining a high quality scientific software package without being able to use, when necessary, arithmetic precision twice as wide can badly inflate costs of development and maintenance Kahan (2004). On the one hand, there might be additional costs for the theoretical formulation of such tricky single-precision problems. On the other hand, the cost of extra demand for processor cycles and memory when using RealHP<N> types is picayune when compared with the cost of a numerically adept mathematician’s time Kahan (2006). Hence, the new RealHP<N> makes high and multiple-precision simulations more accessible to the researcher community.

The support for higher precision multipliers was added in YADE202020see: !496 in such a way that RealHP<1> is the Real type from Tab. 2 and every higher number N is a multiplier of the Real precision. All other types follow the same naming pattern: Vector3rHP<1> is the regular Vector3r and Vector3rHP<N> uses the precision multiplier N. A similar concept is used for CGAL types (e.g. CGALtriangleHP<N>). One could then use an EIGEN algorithm for solving a system of linear equations with a higher N using MatrixXrHP<N> to obtain the result with higher precision. Then, after the critical code section, one could potentially continue the calculations in the default Real precision. On the Python side the mathematical functions for the higher precision types are accessible via yade.math.HP2.*. By default only the RealHP<2> is exported to Python. One can export to Python all the higher types for debugging purposes by adjusting #define YADE_MINIEIGEN_HP in the file RealHPConfig.hpp.

On some occasions it is useful to have an intuitive up-conversion between C++ types of different precisions, say for example to add RealHP<1> to RealHP<2>. The file UpconversionOfBasicOperatorsHP.hpp serves this purpose. After including this header, operations using two different precision types are possible and the resultant type of such operation will always be the higher precision of the two types. This header should be used with caution (and only in .cpp files) in order to still be able to take advantage of the C++ static type checking mechanisms. As mentioned in the introduction, this type checking whether a number is being converted to a fewer digits representation can prevent mistakes such as the explosion of the rocket Ariane 5 Lions et al. (1996); Lann (2006); Jézéquel and Meyer (1997); Ben-Ari (2001).

2.5 Backward compatibility with older YADE scripts

In the present work, preserving the backward compatibility with existing older YADE Python scripts was of prime importance. To obtain this the MiniEigen Python library had to be incorporated into YADE’s codebase. The reason for this was the following: python3-minieigen was a binary package, precompiled using double. Thus any attempt of importing MiniEigen into a YADE Python script (i.e. using from minieigen import *) when YADE was using a non-double type resulted in failure. This, combined with the new capability in YADE to use any of the current and future supported types (see Tab. 2) would place a requirement on python3-minieigen that it either becomes a header-only library or is precompiled with all possible high-precision types. It was concluded that integrating its source directly into YADE is the most reasonable solution. Hence, old YADE scripts that use supported modules37 can be immediately converted to high precision by switching to yade.minieigenHP. In order to do so, the following line:

from minieigen import *

has to be replaced with:

from yade.minieigenHP import *

Respectively import minieigen has to be replaced with import yade.minieigenHP as minieigen, the old name as minieigen being used for the sake of backward compatibility with the rest of the script.

Python has native support212121see: ToFromPythonConverter.hpp file. for high-precision types using the mpmath Python package. However, it shall be noted that although the coverage of YADE’s basic testing and checking (i.e. yade --test and yade --check) is fairly large, there may still be some parts of Python code that were not yet migrated to high precision and may not work well with the mpmath module. If such problems occur in the future, the solution is to put the non compliant Python function into the py/high-precision/ file222222see also: !414.

A typical way of ensuring correct treatment of Real in Python scripts is to initialize Python variables using yade.math.Real(arg). If the initial argument is not an integer and not an mpmath type then it has to be passed as a string (e.g. yade.math.Real(’9.81’)) to prevent Python from converting it to double. Without this special initialization step a mistake can appear in the Python script where the default Python floating-point type double is for example multiplied or added to the Real type resulting in a loss of precision232323see: !604 and commit 494548b82d, where a small change in the Python script enabled it to work for high precision..

3 Testing

It should be noted that it is near to impossible to be absolutely certain about the lack of an error in a code Kahan (2006). Therefore, to briefly test the implementation of all mathematical functions available in C++ in all precisions, the following test was implemented in _RealHPDiagnostics.cpp and run using the Python script Each available function was evaluated times with on average evenly spaced pseudo-random argument in the range . These evaluations were divided into two sets. The first

evaluations were performed with uniformly distributed pseudo-random numbers in the range

. The second evaluations were done by randomly displacing a set of equidistant points in the range , where each point was randomly shifted by less than of the distance between the points. This random shift was to lower the chances of duplicate calculations on the same argument after adjusting to the function domain. The arguments were subsequently modified to match the domain argument range of each function using simple operations, such as abs() or fmod(abs(),2)-1. The obtained result for each evaluation was then compared against its respective RealHP<4> type with four times higher precision. Care was taken to exactly use the same argument for a higher precision function call. The arguments were randomized and adjusted to the function domain range in the lower precision RealHP<1>, then the argument was converted to the higher precision by using static_cast, thereby ensuring that all the extra bits in the higher precision are set to zero.

The difference expressed in terms of Units in the Last Place (ULP) Kahan (2006) was calculated. The obtained errors are listed in Tab. 4. During the tests a bug in the implementation of the tgamma function for boost::multiprecision::float128 was discovered but it was immediately fixed by the Boost developers242424see: . Some other bug reports252525see: and instigated a discussion about possible ways to fix the few problems found with the cpp_bin_float type which can be seen in the last column of Tab. 4. A smaller version of this test, with only pseudo-random evaluations262626because this test is time consuming it is not possible to run the test involving evaluations in the GitLab CI pipeline after each git push. was then added to the standard yade --test invocation.

Finally, an AddressSanitizer Serebryany et al. (2012); Bellotti (2021) was employed to additionally check the correctness of the implementation in the code and to quickly locate memory access bugs. Several critical errors were fixed due to the reports of this sanity checker. This tool is now integrated into the Continuous Integration (CI) pipeline for the whole YADE project to prevent introduction of such errors in the future Bellotti (2021) (make_asan_HP job in the GitLab CI pipeline).

4 Benchmark

A benchmark272727see: !388, !491 and the file: examples/test/performance/ yade --stdperformance -j16 (16 OpenMP threads Dagum and Menon (1998); OpenMP Architecture Review Board (2021)) on a PC with two Intel E5-2687W v2 @ 3.40GHz processors (each of the two having 8 cores resulting in a total of 16 cores or 32 threads if hyperthreading is enabled) was performed to assess performance of higher precision types. The benchmark consists of a simple gravity deposition of spherical particles into a box, a typical simulation performed in YADE. A spherical packing with 10,000 spheres is released under gravity within a rectangular box. The spheres are allowed to settle in the box (Fig. 2). The simulation runs for 7,000 iterations and the performance is reported in terms of iterations per wallclock seconds. This standardized test (hence --stdperformance in the name) is constructed in such a way that almost all the computation happens on the C++ side, only the calculation of the wallclock time is done in Python. Obviously doing more calculations in Python will make any script slower. Hence, any calculation in Python should be kept to a minimum.

(a) (b) (c) (d)[J]

Figure 2: Snapshots of the benchmark yade --stdperformance -j16 with mpfr 150 at different time steps: (a)  s, (b)  s (3,000 iterations), (c)  s (4,000 iterations), (d)  s (7,000 iterations). The particles are colored by kinetic energy.

Since the benchmark results strongly depend on other processes running on the system, the test was performed at highest process priority after first making sure that all unrelated processes are stopped (via pkill -SIGSTOP command). The benchmark was repeated at least 50 times for each precision type and compiler settings. The average calculation speed (in iterations per seconds) was determined for each precision type and data points not meeting the criterion (

is the standard deviation) were considered outliers. On average 4% of data points per bar was removed, the largest amount removed was 10% (5 data points) on two occasions

. Hence, each bar in Fig. 3 represents the average of at least 45 runs using 7,000 iterations each.

A summary of all the benchmark results is shown in Fig. 3 along with the relevant standard deviations. The performance is indicated in terms of iterations per seconds. The tests were performed282828also see,30 for the seven different precision types (Fig. 3A–G) listed in Tab. 3 for three different optimization settings: default cmake settings (Fig. 3a), with SSE vectorization enabled (Fig. 3b), and with maximum optimizations offered by the compiler but without vectorization292929the test with SSE and maximum optimizations was also performed but the results were simply additive, thus they were not included here. Also sometimes they produced the following error due to memory alignment problems:, because the operands of an SSE assembly SIMD instruction set must have their addresses to be a multiple of 32 or 64 bytes, and the compiler could not always guarantee this. (Fig. 3c). The lack of significant improvement in the third case (Fig. 3c) shows that the code is already well optimized and the compiler cannot optimize it any further, except for long double and float128 types and the gcc compiler where a 2% speed gain can be observed (Fig. 3Cc and Dc versus Ca and Da). It is interesting to note that the clang compiler systematically produced a code that runs about 4 to 9% faster than the gcc or icpc compilers. Intel compiler users should be careful, because the -fast switch might result in a performance loss of around 2 to 10%, depending on particular settings (Fig. 3c). Code vectorization (using the SSE assembly instruction set, an experimental feature, Fig. 3b) provides about 1 to 3% speed gain, however this effect is often smaller than the error bars. Enabling intel hyperthreading (HT) did not affect the results more than the standard deviation error of the benchmark. The float128 results for the intel compiler stand out with a 5% speed gain (Fig. 3D). However, not all mathematical functions are currently available for this precision in icpc and to get this test to work a crippled branch303030see was prepared for the tests with some of the mathematical functions disabled. The missing mathematical functions313131see changes in MathFunctions.hpp in commit 3b07475e38 in benchmarkIntel branch were not required for these particular calculations to work. The clang compiler does not support323232see float128 type yet. The average speed difference between each precision is listed in Tab. 3. The run time increase with precision in the MPFR library is roughly (where is the number of digits used) but it is application specific and strongly depends on the type of simulation performed Isupov (2020); Fousse et al. (2007).

Type Decimal places Speed relative to double
float 6 faster
double 15
long double 18 slower
boost float128 33 slower
boost mpfr 62 slower
boost mpfr 150 slower
boost cpp_bin_float 62 slower

 except for clang which does not yet32 support float128.

 for future comparison with libqd-dev, see footnote4.

Table 3: The high-precision types used in the benchmark and corresponding speed performance relative to double.

It shall be noted that currently YADE does not fully take advantage of the SSE assembly instructions (cmake -DVECTORIZE=1) because Vector3r is a three component type, while a four component class Eigen::AlignedVector3333333four double components in Eigen::AlignedVector3 use 256 bits which matches SSE operations, the fourth component is unused and set to zero is suggested in the EIGEN library but it is not completely functional yet. In the future, this class can be improved in EIGEN and then used in YADE.

Figure 3: Benchmark results yade --stdperformance -j16 for seven different precision types, with hyperthreading disabled and enabled (HT); for gcc version 9.3.0, clang version 10.0.1-+rc4-1, and intel icpc compiler version 20190815: (a) default cmake settings; (b) with SSE vectorization enabled via cmake -DVECTORIZE=1; (c) with maximum optimizations offered by the compiler (gcc, clang: -Ofast -fno-associative-math -fno-finite-math-only -fsigned-zeros and additionally for native: -march=native -mtune=native; intel icpc -fast) but without vectorization.
                                                                                                                                                  on a PC with two Intel E5-2687W v2 @ 3.40GHz processors with 16 cores and 32 threads.
                                                                                                                                                  via command echo off > /sys/devices/system/cpu/smt/control or by a BIOS setting (HT is also known as intel SMT).
                                                                                                                                                  the extra three flags are used by CGAL; in intel compilers -fast enforces all processor model native optimizations.
type and number of decimal places
float double long double float128 mpfr cpp_bin_float
decimal places
significand bits

 Also see Kahan (2006); please note that to obtain the number of incorrect bits one needs to take a of the value in the table.

 This test was performed with gcc version 9.3.0 and Boost library version 1.71.

 See file for implementation details. This test (with fewer evaluations) can be executed using and is a part of the yade --test suite (file py/tests/, function testRealHPErrors).

Table 4: Maximum error after function evaluations expressed in terms of Units in the Last Place (ULP), calculated by the absolute value of boost::math::float_distance between functions from C++ standard library or boost::multiprecision when compared with its respective RealHP<4> (having four times higher precision).

5 Simulation

5.1 Problem description

A simple simulation of a triple elastic pendulum system was performed to check the effect of high precision in practice. Triple pendulums are considered highly chaotic as they provide an irregular and complex system response Awrejcewicz et al. (2008). Numerical modeling of such systems can show the benefits of using high precision. A single thread was used (i.e. yade -j1) during the simulations to avoid numerical artifacts arising from different ordering of arithmetic operations performed by multiple threads. This allowed focusing on high precision and eliminating the non-deterministic effect of parallel calculations (e.g. arithmetic operations performed in a different order result in a different ULP error in the last bits Goldberg (1991); Revol and Theveny (2014); He and Ding (2001)) and to have a completely reproducible simulation.

Figure 4: Numerical simulation of the triple pendulum with (double), (float128) and (mpfr) decimal places. The snapshots are captured at the following times: (a)  s, (b)  s, (c)  s, (d)  s, (e)  s, (f)  s, (g)  s, (h)  s, and (i)  seconds. The lines are showing the positions of the connected rods. Only three precisions from Tab. 5 are shown for clarity.

The numerical setup of the model represents the chain consisting of three identical elastic pendulums (see attached Listing 1 and on gitlab). The pendulums are represented by a massless elastic rod (a long-range normal interaction) and mass points. The latter are modeled using spheres with radius  m and density  kg/m, noting that the masses of the spheres are lumped into a point. The rods are modeled by a normal interaction using cohesive interaction physics. The length of the chain is  m. Each rod is  m long and the normal stiffness of the interaction is  N/m. The strength (i.e. cohesion) is set to an artificial high value ( N/m) so that the chain cannot break. Hence, the behavior of the rods can be assumed purely elastic. The initial position of the chain is relative to the horizontal plane (see Fig. 4a,  s). Gravity  m/s is acting on the chain elements as they are moving. The process was simulated with time steps equal to  s,  s,  s, and  s. The results obtained using different precision are discussed in the following subsections. First, the evolution of the angles in the pendulum movement are discussed with  s. Second, the effect of damping is shown with  s. Then the effect of using various time steps is discussed. Finally, the total energy conservation is examined for various time steps.

5.2 Pendulum movement

Numerical damping was not used in this simulation series to avoid any energy loss and for the purity of the numerical results. The simulations were carried out with the different precisions listed in Tab. 5 and a time step of  s. Angles between the two rods were constantly monitored and saved with a period of  s for further analysis and comparison. After the simulation was performed and the data was gathered, the Pearson correlation coefficient Stigler (1989) was calculated for all data sets. The simulation with 150 decimal places (type mpfr 150) was used as the reference solution as it has the highest number of decimal places. The product of the two angles between the three rods was used as an input parameter for the calculation of the correlation coefficient. The data for the correlation was placed in chunks with each having 500 elements ( s  s of the simulation). Then the scipy.stats.pearsonr function from SciPy Virtanen et al. (2020) was employed for the calculation of the Pearson correlation coefficient . For further reference, the point in time when the correlation between two simulations falls below is marked as . The latter corresponds to the time from the start of the simulation until the correlation is lost and in the following it is denoted correlation duration.

type Decimal places Correlation duration
float 6 seconds
double 15 seconds
long double 18 seconds
boost float128 33 seconds
boost mpfr 62 seconds
boost cpp_bin_float 62 seconds
Table 5: The high-precision types used in the simulation and corresponding correlation duration for .

Fig. 4 shows snapshots of the evolution of the movement for three different precisions. At the beginning of the simulation the angles between the rods are the same. However, from a certain point in time onward the angles are starting to differ. Indeed, at  s (Fig. 4d), the green line (15 decimal places, type double) has clearly another state compared to the other two with 33 and 150 decimal places (types float128 and mpfr 150 respectively). The snapshot at  s (Fig. 4g) demonstrates the beginning of the deviation of the simulation with 33 decimal places (type float128). Thereafter all the pendulums are moving very differently and no correlation is observed. It can be concluded that using higher precision increases the time when accurate calculation results are obtained which is also reflected by the correlation duration listed in Tab. 5.

Figure 5: Pearson correlation coefficient as a function of time between the results obtained with mpfr 150 and the following different precisions: (a) float; (b) double; (c) long double; (d) float128; (e) mpfr 62; (f) cpp_bin_float 62. Note that the timescales on both figures are different. The black dashed lines mark the threshold which is used to calculate the correlation duration .

Fig. 5 presents the correlation coefficient as a function of time. The graphs provide a more accurate representation of the point in time when the correlation disappears. One can see that there is a positive linear correlation with at the beginning of the simulations. This means initially the rods are moving identically. The lowest precision curve float is starting to jump between correlation values of and at around  s (Fig. 5a), which means that no visible correlation is observed any more. For all the higher precision simulations the drop off happens later and progressively with increasing precision as summarized in Tab. 5.

Type double and long double have a correlation of after  s (Fig. 5b, comparing 15 with 150 decimal places) and  s (Fig. 5c, 18 vs. 150 decimal places) respectively. The same tendency is seen from Boost float128 type (Fig. 5d, 33 vs. 150 decimal places) which deviates at approximately  s. Both simulations with 62 decimal places start to deviate at around  s. This clearly demonstrates that the level of precision, i.e. the number of decimal places, has an influence on the accuracy of the simulation results. Sometimes the decrease of the correlation happens suddenly and sometimes it starts to decrease slowly before it decreases rapidly. It can clearly be seen that the simulations with higher precision are showing better results and are closer to the reference solution calculated with 150 decimal places. Nevertheless, higher precision requires much more time for the simulation and more computing resources.

5.3 Effect of damping

To study the importance of precision in combination with other parameters, the same simulations as in Section 5.2 were carried out with a numerical damping coefficient equal to . Numerical damping is generally applied to dissipate energy. In this particular test case, numerical damping can be interpreted as the slowing down of the pendulum oscillation. The global damping mechanism was used, as described in the original DEM publication of Cundall and Strack Cundall and Strack (1979). Global damping acts on the absolute velocities of the simulation bodies and is implemented in the NewtonIntegrator class in the source code of YADE. Global damping slows down all affected bodies based on their current velocities.

The damping coefficient was chosen so that an effect of different precisions can be seen on the whole system. If the damping coefficient is too high (), the system loses its whole energy very quickly and no visible differences are seen. Too small damping coefficients () lead to very slow energy dissipation.

Figure 6: Development of angle between first and second rods as a function of time for different precisions with a global damping coefficient equal to . Curves are showing smoothed data for better visibility (main plot) and raw data for the inset; (a) float; (b) double; (c) long double; (d) float128; (e) mpfr 62; (f) cpp_bin_float 62. (g) mpfr 150.

Fig. 6 shows the development of the angle between the first and the second rod of the pendulum during the simulation with a global damping coefficient equal to . One can clearly see the differences in simulation results based on different precisions. The float precision simulation (blue curve, Fig. 6a) indicates the largest deviation from all other simulations. Higher precisions gradually provide results that are closer to the simulation with the highest precision (boost mpfr 150 decimal places).

Since the angle between rods is oscillating rapidly, as can be seen on the inset in Fig. 6, the raw data was smoothed using the Savitzky–Golay filter Savitzky and Golay (1964) for better visibility. The filter removes most of the noise, and there are no visible differences between cpp_bin_float, mpfr 62 and mpfr 150. The three curves are basically overlapping each other.

5.4 Effect of time step

Figure 7: Correlation duration as a function of different precisions and a time step . Each curve is compared to the mpfr 150 simulation using the same ; (a)  s; (b)  s; (c)  s; (d)  s.

The time step is one of the most important parameters influencing the simulation results. Hence, simulations with time step values of  s,  s,  s, and  s were performed with different precisions. The values for the correlation duration were recorded and plotted in Fig. 7 for all precisions and time steps considered. It can be clearly seen that the correlation duration increases with increasing precision. This highlights once more that higher precision is required for higher confidence. It can also be seen that generally the correlation duration increases with decreasing time step. This is due the fact that a smaller time step results in a smaller integration error (e.g. the leapfrog integration scheme used in YADE has the error proportional to per iteration). This tendency is also marked on the figure with a black arrow pointing in the direction of the smaller time step . The opposite result is observed for float. This is because 6 decimal places are not enough to work with  s. It shall be noted that the implementation of a more precise time integration scheme is out of the scope of the current paper. In the current symplectic leapfrog integration scheme, as implemented in the present version of NewtonIntegrator, the positions and velocities are leapfrogging over each other. There is no jump-start option which would allow to start the simulation with both position and velocity defined at . This means that the initial velocities are declared at and initial positions at or initial velocities are declared at and initial positions at . Which of these two it is, is only a formal choice343434see also: and checks/, since both interpretations are valid. Therefore a more detailed comparison between the time steps is not carried out as the initial conditions for each simulation with a different differ slightly, i.e. the starting velocity declared in the script is interpreted as being defined at or at , thus resulting in slightly different simulations36.

5.5 Energy conservation

In the following, the total energy in the system is analyzed for different precisions and time step values of  s,  s,  s, and  s. No numerical damping is considered. The total energy in the system is calculated as the sum of the elastic energy in the interactions and the kinetic and potential energy of the mass points (i.e. spheres). As already pointed out in the previous section, the symplectic leapfrog integration scheme is used. This means that velocities and positions are not known at the same time. Hence, the velocities needed for an accurate calculation of the kinetic energy are taken as an average of the velocities from the current and the next iteration. All numerical results are compared to the reference solution which was calculated using 150 decimal places, similar as in the previous sections.

Fig. 8 shows two typical results obtained from the study using a time step of  s. The results obtained with the other time steps have a similar trend and are not included for brevity. The top graphs show the evolution of the energy balance where each energy component is divided by the total reference energy to give an energy ratio. The total reference energy is calculated using 150 decimal places. The total energy ratio should be equal to 1 throughout the simulation. The bottom graphs show the absolute error in total energy calculated as the absolute difference between the total energy given by a specific precision and the constant total energy calculated using 150 decimal places.

The results obtained using float are depicted in Fig. 8a. It can be seen that the absolute error in total energy starts to increase drastically after about 4 s. This is much later than the correlation duration. From the energy plot it can also be seen that energy is continuously added to the system from this time onward. This makes the simulation not only incorrect but also unstable. A different observation can be drawn from Fig. 8b where the results of a typical simulation with float128 are shown. It can be seen that the energy balance is stable and the absolute error is several order of magnitudes smaller. In addition, the absolute error does not have an increasing trend. Instead, it bounces around, i.e. it increases initially and decreases thereafter, and never goes above a certain threshold value. This is also a reflection of the symplectic leapfrog integration scheme. It should be noted that the results for double, long double, mpfr 62 and cpp_bin_float 62 are very similar to Fig. 8b and, hence, not shown for brevity.

Fig. 9 summarizes the results for all precisions and all time steps. As noted previously, a detailed comparison between different time steps does not make sense because of the different initial velocities. Nevertheless, a qualitative comparison is valid. It can be seen that the maximum absolute error in energy balance for float is many orders of magnitudes larger than for the other precisions (note that the vertical axis uses logarithmic scale). The data also indicates that the error is almost constant for all other precisions. This clearly highlights the effectiveness and reliability of the symplectic leapfrog integration scheme implemented in YADE. However the error is many orders of magnitude larger than the ULP error of the higher precision types. For example to achieve maximum absolute error of [J] for float128, further decreasing the time step is not practical. Different approaches, such as higher order symplectic methods, have to be employed Omelyan et al. (2002); Omelyan (2006)36. Like in previous section, a smaller time step results in a smaller absolute error (this tendency is indicated by the black arrow), except for float where 6 decimal places are not enough to work with the smaller time steps.

Figure 8: Evolution of energy balance plotted as energy ratios and corresponding absolute error compared to the mpfr 150 simulation for precisions (a) float and (b) float128.
Figure 9: Maximum absolute error in energy balance during the first 16 s as a function of different precisions and for different time steps . Each curve is compared to the mpfr 150 simulation using the same ; (a)  s; (b)  s; (c)  s; (d)  s. Note that the axis for the maximum absolute error is in log scale.

6 Conclusions and future perspectives

The obtained results show that using high precision has a pronounced influence on the simulation results and the calculation speed. Higher precisions provide more accurate results and reduce numerical errors which in some fields can be beneficial. It can also be concluded that high precision is essential for research of highly chaotic systems (Fig. 5). Nevertheless, increasing the number of decimal places in the code leads to a higher CPU load and raises the calculation times (Fig. 3, Tab. 3).

Updating an existing software with a large codebase to bring the flexibility of arbitrary precision can be a challenging and error-prone process which might require drastic refactoring. A good test coverage of the code (unit and integration tests: the Continuous Integration pipeline in YADE) is highly recommended before beginning of such refactoring to ensure code integrity Bellotti (2021). Also the employment of AddressSanitizer Serebryany et al. (2012) is highly desirable to prevent heavy memory errors such as heap corruptions, memory leaks, and out-of-bounds accesses.

Simulations with the triple pendulum show that the results are starting to be different after a few seconds of the simulation time because of different precisions. The higher the precision the longer the results remain true to the highest precision tested. The same effect, within each precision (Fig. 7 and Fig. 9), occurs when using smaller time steps , because it has a smaller time integration error per iteration. Applying damping can significantly smooth this effect (Fig. 6).

The new high-precision functionality added to YADE does not negatively affect the existing computational performance (i.e. simulations with double precision), because the choice of precision is done at compilation time and is dispatched during compilation via the C++ static polymorphism template mechanisms Stroustrup (2014); Vandevoorde et al. (2017).

Concluding this work, the main modules of YADE now fully support two aspects of arbitrary precision (see Tab. 1 and Fig. 1):

  1. Selecting the base precision of Real from Tab. 2 (which is an alias for RealHP<1>).

  2. Using RealHP<N> in the critical C++ sections of the numerical algorithms (Section 2.4)1.

These new arbitrary precision capabilities can be used in several different ways in the management of numerical error Bailey (2005):

  1. To periodically test YADE computation algorithms to check if some of them are becoming numerically sensitive.

  2. To determine how many digits in the obtained intermediate and final results are reliable.

  3. To debug the code in order to find the lines of code which produce numerical errors, using the method described in details in chapter 14 of Kahan (2006).

  4. To fix numerical errors that were found by changing the critical part of the computation to use a higher precision type like RealHP<2> or RealHP<4> as suggested in Kahan (2004).

The current research focus is to:

  1. Add quantum dynamics calculations to YADE using the time integration algorithm which can have the error smaller than the numerical ULP error of any of the high-precision types: the Kosloff method Tal-Ezer and Kosloff (1984); Kosloff (1997); Schaefer et al. (2017); Tal-Ezer et al. (2012) based on the rapidly converging Chebychev polynomial expansion of an exponential propagator353535Since that algorithm is Taylor–free, there is no meaning to the term “order of the method” Tal-Ezer et al. (2012); Schaefer et al. (2017)..

  2. Add unit systems support, because there are also software errors related to unit systems, for example in 10 November 1999 the NASA’s Mars Climate Orbiter was lost in space because of mixing SI and imperial units Stephenson et al. (1999); Peterson (1996); Huckle and Neckel (2019).

Possible future research avenues, opened by the present work, include:

  1. Add more precise time integration algorithms. The problems mentioned in Section 5.4 are well known. Albeit symplectic integrators are particularly good Quispel and Turner (1996); Sussman and Wisdom (1992), a better time integration method with smaller will be able to fully take advantage of the new high-precision capabilities (Section 5.5 and Fig. 9). There are three possible research directions:

    1. Use the Boost Odeint library Dawes et al. (2020) with higher order methods1,363636see also: Boost Odeint, #171, !555 and !557 where RungeKuttaCashKarp54Integrator allowed to set error tolerance of 147 correct decimal places if mpfr 150 is used..

    2. Use the work on time integrators by Omelyan et al. Omelyan et al. (2002); Omelyan (2006). These approaches suggests that it is potentially possible to decrease the run time more than 50-fold with the same computation effort upon switching to long double, float128 or higher types. The algorithms focus on reducing the truncation errors and eliminating errors introduced by the computation of forces. Such a smaller error allows to use a larger time step which will more than compensate the speed loss due to high-precision calculations. Of course, the standard considerations for the time step Burns and Hanley (2016); Omelyan and Kovalenko (2011) would have to be re-derived.

    3. Investigate whether the exponential propagator approach presented in Schaefer et al. (2017) or in Orimo et al. (2018); Scrinzi (2021) could be used in YADE as a general solution for ODEs, similarly to Omelyan (2006); Hochbruck and Lubich (1999); Caliari and Ostermann (2009); Quispel and Turner (1996); McLachlan et al. (1998); Fahs (2012), regardless if that is a classical or a quantum dynamics system.

  2. Enhance all auxiliary modules of YADE for the use of high precision373737for the full up-to-date list of supported modules see:

  3. Use the interval computation approach to reduce problems with numerical reproducibility in parallel computations by using boost::multiprecision::mpfi_float as the backend for RealHP<N> type Revol and Theveny (2014); Merlet (2007).

  4. Use different rounding modes to run the same computation for more detailed testing of numerical algorithms Kahan (2006).

Overall, based on the presented work, the architecture of YADE now offers an opportunity to adjust its precision according to the needs of its user. A wide operating system support and simple installation procedure enable forming multidisciplinary teams for computational physics simulations in the Unified Science Environment (USE) McCurdy et al. (2002). This will expand the spectrum of tasks that can be solved, improve the results and reduce numerical errors. Of course, this option not only complicates the architecture and the source code, but also imposes a restriction on the choice of a programming language.


The authors would like to acknowledge the partial support of the COST action CA18222 “Attosecond Chemistry” and the Australian Research Council (DP190102407). The support of the Institute for Mineral Processing Machines and Recycling Systems Technology at TU Bergakademie Freiberg for providing access to computing facilities is also acknowledged. In addition, the authors thank the Department of Building Structures and Material Engineering, Faculty of Civil and Environmental Engineering, at Gdańsk University of Technology for hosting the gitlab Continuous Integration (CI) pipeline for YADE. The authors would also like to thank the two anonymous reviewers whose suggestions helped improve this manuscript.


  • A. Abad, R. Barrio, and A. Dena (2011) Computing periodic orbits with arbitrary precision. Physical Review E 84 (1). External Links: Link, Document Cited by: §1.
  • A. Alexandrescu (2001) Modern c++ design: generic programming and design patterns applied. Addison-Wesley Longman Publishing Co., Inc., USA. External Links: ISBN 0201704315 Cited by: §2.3, footnote 13.
  • J. Awrejcewicz, B. Supel, C.H. Lamarque, G. Kudra, G. Wasilewski, and P. Olejnik (2008) Numerical and experimental study of regular and chaotic motion of triple physical pendulum. International Journal of Bifurcation and Chaos 18, pp. 2883–2915. External Links: Document Cited by: §5.1.
  • D. H. Bailey (2005)

    High-precision floating-point arithmetic in scientific computation

    Computing in Science and Engineering 7 (3), pp. 54–61. Cited by: §1.
  • D.H. Bailey, R. Barrio, and J.M. Borwein (2012) High-precision computation: mathematical physics and dynamics. Applied Mathematics and Computation 218 (20), pp. 10106 – 10121. External Links: ISSN 0096-3003, Document, Link Cited by: §1, §1, §1.
  • D.H. Bailey (2000) Integer relation detection. Computing in Science and Engineering 2 (1), pp. 24–28. External Links: Link, Document Cited by: §1.
  • D.H. Bailey (2005) High-Precision Floating-Point Arithmetic in Scientific Computation. Computing in Science and Engineering 7 (3), pp. 54–61. External Links: Link, Document Cited by: §6.
  • D. H. Bailey and A. M. Frolov (2002) Universal variational expansion for high-precision bound-state calculations in three-body systems. applications to weakly bound, adiabatic and two-shell cluster systems. Journal of Physics B: Atomic, Molecular and Optical Physics 35 (20), pp. 4287–4298. External Links: Link, Document Cited by: §1.
  • D. H. Bailey, K. Jeyabalan, and X. S. Li (2005) A comparison of three high-precision quadrature schemes. Experimental Mathematics 14 (3), pp. 317–329. External Links: Link, Document Cited by: §1.
  • M. Bellotti (2021) Kill it with fire. Manage aging computer systems (and future-proof modern ones). William Pollock. External Links: ISBN 978-1-7185-0118-8 Cited by: §3, §6.
  • M. Ben-Ari (2001) The bug that destroyed a rocket. SIGCSE Bull. 33 (2), pp. 58–59. External Links: ISSN 0097-8418, Link, Document Cited by: §1, §2.4.
  • M. Blair, S. Obenski, and P. Bridickas (1992) Patriot missile defense software problem led to system failure at dhahran, saudi arabia. Technical report Vol. GAO/IMTEC-92-26, United States General Accounting Office. Note: Cited by: §1.
  • B. M. Boghosian, P. V. Coveney, and H. Wang (2019) A new pathology in the simulation of chaotic dynamical systems on digital computers. Advanced Theory and Simulations 2 (12), pp. 1900125. External Links: Document, Link Cited by: §1.
  • C.W. Boon, G.T. Houlsby, and S. Utili (2013) A new contact detection algorithm for three-dimensional non-spherical particles. Powder Technology 248, pp. 94–102. External Links: Link, Document Cited by: Table 1, §1.
  • F. Bourrier, F. Kneib, B. Chareyre, and T. Fourcaud (2013) Discrete modeling of granular soils reinforcement by plant roots. Ecological Engineering. External Links: Document, Link Cited by: Table 1, §1.
  • D.J. Broadhurst (1999) Massive 3-loop Feynman diagrams reducible to SC primitives of algebras of the sixth root of unity. The European Physical Journal C 8 (2), pp. 311–333. External Links: Link, Document Cited by: §1.
  • S. J. Burns and K. J. Hanley (2016) Establishing stable time-steps for dem simulations of non-collinear planar collisions with linear contact laws. International Journal for Numerical Methods in Engineering 110 (2), pp. 186–200. External Links: Link, Document Cited by: item 9b.
  • R. E. Caflisch (1993) Singularity formation for complex solutions of the 3d incompressible euler equations. Physica D: Nonlinear Phenomena 67 (1–3), pp. 1–18. External Links: Link, Document Cited by: §1.
  • M. Caliari and A. Ostermann (2009) Implementation of exponential rosenbrock–type integrators. Applied Numerical Mathematics 59 (3–4), pp. 568–581. External Links: Link, Document Cited by: item 9c.
  • R. A. Caulk, E. Catalano, and B. Chareyre (2020) Accelerating yade’s poromechanical coupling with matrix factorization reuse, parallel task management, and gpu computing. Computer Physics Communications 248, pp. 106991. External Links: ISSN 0010-4655, Document, Link Cited by: §1.
  • R. A. Caulk and B. Chareyre (2019) An open framework for the simulation of thermal-hydraulic-mechanical processes in discrete element systems. In Thermal Process Engineering: Proceedings of DEM8, External Links: Link Cited by: §2.3.
  • P.A. Cundall and O.D.L. Strack (1979) A discrete numerical model for granular assemblies. Geotechnique (29), pp. 47–65. External Links: Document Cited by: §5.3.
  • L. Dagum and R. Menon (1998) OpenMP: an industry standard api for shared-memory programming. Computational Science & Engineering, IEEE 5 (1), pp. 46–55. Cited by: §4.
  • B. Dawes, D. Abrahams, and J. M. et al. (2020) Boost C++ Libraries. Note: Cited by: §2.1, §2.2, item 9a.
  • F. De Martini and E. Santamato (2013) Interpretation of quantum nonlocality by conformal quantum geometrodynamics. International Journal of Theoretical Physics 53 (10), pp. 3308–3322. External Links: Link, Document Cited by: §1.
  • [26] Debian Bug tracker. Note: bug numbers: openmpi #961108, PETSc #953116,#961977, mumps #961976, scotch #961184, scalapack #961186, metis #961183, hypre #961645, slepc #972069, superlu #989497, suitesparse #989550, arpack #972068; visited on 29.07.2021 External Links: Link Cited by: §1.
  • F. Donzé, Y. Klinger, V. Bonilla-Sierra, J. Duriez, L. Jiao, and L. Scholtès (2021) Assessing the brittle crust thickness from strike-slip fault segments on earth, mars and icy moons. Tectonophysics 805, pp. 228779. External Links: Link, Document Cited by: §1.
  • A. Effeindzourou, B. Chareyre, K. Thoeni, A. Giacomini, and F. Kneib (2016) Modelling of deformable structures in the general framework of the discrete element method. Geotextiles and Geomembranes 44 (2), pp. 143–156. External Links: Link, Document Cited by: Table 1, §1.
  • J. Eliáš (2014) Simulation of railway ballast using crushable polyhedral particles. Powder Technology 264, pp. 458–465. External Links: Link, Document Cited by: Table 1, §1.
  • H. Fahs (2012) Investigation on polynomial integrators for time-domain electromagnetics using a high-order discontinuous galerkin method. Applied Mathematical Modelling 36 (11), pp. 5466–5481. External Links: Link, Document Cited by: item 9c.
  • L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, and P. Zimmermann (2007) MPFR. ACM Transactions on Mathematical Software 33 (2), pp. 13. External Links: Link, Document Cited by: §1, §4.
  • G. Frenning (2008) An efficient finite/discrete element procedure for simulating compression of 3d particle assemblies. Computer Methods in Applied Mechanics and Engineering 197 (49–50), pp. 4266–4272. External Links: Link, Document Cited by: §1.
  • A. M. Frolov and D. H. Bailey (2004) Highly accurate evaluation of the few-body auxiliary functions and four-body integrals. Journal of Physics B: Atomic, Molecular and Optical Physics 37 (4), pp. 955–955. External Links: Link, Document Cited by: §1.
  • A. Gladkyy and M. Kuna (2017) DEM simulation of polyhedral particle cracking using a combined mohr–coulomb–weibull failure criterion. Granular Matter 19 (3), pp. 41. External Links: ISSN 1434-7636, Document, Link Cited by: Table 1, §1.
  • A. Gladkyy and R. Schwarze (2014) Comparison of different capillary bridge models for application in the discrete element method. Granular Matter, pp. 1–10. External Links: ISSN 1434-5021, Document, Link Cited by: §1.
  • D. Goldberg (1991) What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys 23 (1), pp. 5–48. External Links: Link, Document Cited by: §1, §5.1, footnote 1.
  • G. Guennebaud, B. Jacob, et al. (2010) Eigen v3. Note: Cited by: §2.2.
  • N. Guo and J. Zhao (2014) A coupled fem/dem approach for hierarchical multiscale modelling of granular media. International Journal for Numerical Methods in Engineering 99 (11), pp. 789–818. External Links: Link, Document Cited by: §1.
  • P. H. Hauschildt and E. Baron (1999) Numerical solution of the expanding stellar atmosphere problem. Journal of Computational and Applied Mathematics 109 (1–2), pp. 41–63. External Links: Link, Document Cited by: §1.
  • M. Haustein, A. Gladkyy, and R. Schwarze (2017) Discrete element modeling of deformable particles in yade. SoftwareX 6, pp. 118 – 123. External Links: ISSN 2352-7110, Document, Link Cited by: §1.
  • Y. He and C. H. Q. Ding (2001) Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications. The Journal of Supercomputing 18 (3), pp. 259–277. External Links: Link, Document Cited by: §1, §5.1.
  • M. Hochbruck and C. Lubich (1999) Exponential integrators for quantum–classical molecular dynamics. Bit Numerical Mathematics 39 (4), pp. 620–645. External Links: Link, Document Cited by: item 9c.
  • T. Huckle and T. Neckel (2019) Bits and bugs: a scientific and historical review of software failures in computational science. Society for Industrial and Applied Mathematics, USA. External Links: ISBN 1611975557 Cited by: §1, item 8.
  • K. Isupov (2020) Performance data of multiple-precision scalar and vector blas operations on cpu and gpu. Data in Brief 30, pp. 105506. External Links: Link, Document Cited by: §1, §4.
  • P. Jasik, J. Franz, D. Kȩdziera, T. Kilich, J. Kozicki, and J. E. Sienkiewicz (2021) Spontaneous electron emission vs dissociation in internally hot silver dimer anions. The Journal of Chemical Physics 154 (16), pp. 164301. External Links: Document Cited by: Table 1, §1, §1.
  • P. Jasik, J. Kozicki, T. Kilich, J. E. Sienkiewicz, and N. E. Henriksen (2018) Electronic structure and rovibrational predissociation of the state in KLi. Physical Chemistry Chemical Physics 20 (27), pp. 18663–18670. External Links: Link, Document Cited by: Table 1, §1, §1.
  • J.-F. Jerier, B. Hathong, V. Richefeu, B. Chareyre, D. Imbault, F.-V. Donze, and P. Doremus (2011) Study of cold powder compaction by using the discrete element method. Powder Technology 208 (2), pp. 537–541. External Links: Link, Document Cited by: §1.
  • J.-M. Jézéquel and B. Meyer (1997) Design by contract: the lessons of ariane. Computer 30, pp. 129–130. Note: External Links: Document, ISSN 0018-9162 Cited by: §1, §2.4.
  • M. Joldes, V. Popescu, and W. Tucker (2014) Searching for sinks for the hénon map using a multipleprecision gpu arithmetic library. ACM SIGARCH Computer Architecture News 42 (4), pp. 63–68. External Links: Link, Document Cited by: §1.
  • W. Kahan (2004) On the Cost of Floating-Point Computation Without Extra-Precise Arithmetic. Note: Cited by: §1, §1, §2.4, item 6.
  • W. Kahan (2006) How Futile are Mindless Assessments of Roundoff in Floating-Point Computation ?. Note: Cited by: §1, §1, §2.4, §3, §3, Table 4, item 5, item 12.
  • R. Kosloff (1997) Quantum molecular dynamics on grids. Department of Physical Chemistry and the Fritz Haber Research Center. External Links: Link Cited by: item 7.
  • J. Kozicki and F.V. Donzé (2008) A new open-source software developed for numerical simulations using discrete modeling methods. Computer Methods in Applied Mechanics and Engineering 197 (49–50), pp. 4429–4443. External Links: Link, Document Cited by: Table 1, §1, §2.1, §2.1.
  • M. Krzaczek, M. Nitka, J. Kozicki, and J. Tejchman (2019) Simulations of hydro-fracking in rock mass at meso-scale using fully coupled dem/cfd approach. Acta Geotechnica 15 (2), pp. 297–324. External Links: Link, Document Cited by: Table 1, §1.
  • M. Krzaczek, M. Nitka, and J. Tejchman (2020) Effect of gas content in macropores on hydraulic fracturing in rocks using a fully coupled dem/cfd approach. International Journal for Numerical and Analytical Methods in Geomechanics 45 (2), pp. 234–264. External Links: Link, Document Cited by: Table 1, §1.
  • G. L. Lann (2006) The ariane 5 flight 501 failure - a case study in system engineering for computing systems. Technical report Research Report, RR-3079, INRIA. 1996. inria-00073613. Note: Cited by: §1, §2.4.
  • J. Laskar and M. Gastineau (2009) Existence of collisional trajectories of mercury, mars and venus with the earth. Nature 459 (7248), pp. 817–819. External Links: Link, Document Cited by: §1.
  • J. L. Lions, L.Lübeck, J.-L. Fauquembergue, G. Kahn, W. Kubbat, S. Levedag, L. Mazzini, D. Merle, and C. O’Halloran (1996) Ariane 5 flight 501 failure. Technical report Report by the inquiry board. Note: Cited by: §1, §2.4.
  • E. Loh and G. W. Walster (2002) Rump’s Example Revisited. Reliable Computing 8 (3), pp. 245–248. External Links: Link, Document Cited by: §1.
  • F. Lominé, L. Scholtès, L. Sibille, and P. Poullain (2011) Modeling of fluid-solid interaction in granular media with coupled lattice boltzmann/discrete element methods: application to piping erosion. International Journal for Numerical and Analytical Methods in Geomechanics 37 (6), pp. 577–596. External Links: Link, Document Cited by: Table 1, §1.
  • R. Lougee-Heimer (2003) The common optimization interface for operations research: promoting open-source software in the operations research community. IBM Journal of Research and Development 47 (1), pp. 57–66. External Links: Link, Document Cited by: §2.3.
  • M. Lu, B. He, and Q. Luo (2010) Supporting extended precision on graphics processors. pp. 19–26. External Links: ISBN 9781450301893, Link, Document Cited by: §1.
  • R. Maurin, J. Chauchat, B. Chareyre, and P. Frey (2015) A minimal coupled fluid-discrete element model for bedload transport. Physics of Fluids 27 (11), pp. 113302. External Links: Link, Document Cited by: §1.
  • C. McCurdy, H. D. Simon, W. T.C. Kramer, R. F. Lucas, W. E. Johnston, and D. H. Bailey (2002) Future directions in scientific supercomputing for computational physics. Computer Physics Communications 147 (1–2), pp. 34–39. External Links: Link, Document Cited by: §1, §6.
  • R. I. McLachlan, G. R. W. Quispel, and N. Robidoux (1998) Unified Approach to Hamiltonian Systems, Poisson Systems, Gradient Systems, and Systems with Lyapunov Functions or First Integrals. Physical Review Letters 81 (12), pp. 2399–2403. External Links: Link, Document Cited by: item 9c.
  • J. -P. Merlet (2007) Jacobian, manipulability, condition number and accuracy of parallel robots. In Springer Tracts in Advanced Robotics, pp. 175–184. External Links: Link, Document Cited by: item 11.
  • S. Meyers (2014) Effective modern c++: 42 specific ways to improve your use of c++11 and c++14. 1st edition, O’Reilly Media, Inc.. External Links: ISBN 1491903996 Cited by: §2.3, footnote 13.
  • I. P. Omelyan (2006) Extrapolated gradientlike algorithms for molecular dynamics and celestial mechanics simulations. Physical Review E 74 (3). External Links: Link, Document Cited by: §5.5, item 9b, item 9c.
  • I.P. Omelyan, I.M. Mryglod, and R. Folk (2002) Optimized Forest–Ruth- and Suzuki-like algorithms for integration of motion in many-body systems. Computer Physics Communications 146 (2), pp. 188–202. External Links: Link, Document Cited by: §5.5, item 9b.
  • I. P. Omelyan and A. Kovalenko (2011) Overcoming the barrier on time step size in multiscale molecular dynamics simulation of molecular liquids. Journal of Chemical Theory and Computation 8 (1), pp. 6–16. External Links: Link, Document Cited by: item 9b.
  • OpenMP Architecture Review Board (2021) OpenMP application program interface version 5.1. External Links: Link Cited by: §4.
  • Y. Orimo, T. Sato, A. Scrinzi, and K. L. Ishikawa (2018) Implementation of the infinite-range exterior complex scaling to the time-dependent complete-active-space self-consistent-field method. Physical Review A 97 (2). External Links: Link, Document Cited by: item 9c.
  • K. Pachucki and M. Puchalski (2005) Extended hylleraas three-electron integral. Physical Review A 71 (3). External Links: Link, Document Cited by: §1.
  • I. Peterson (1996) Fatal defect: chasing killer computer bugs. Mckay, David. External Links: ISBN 0679740279 Cited by: §1, item 8.
  • G. R. W. Quispel and G. S. Turner (1996) Discrete gradient methods for solving ODEs numerically while preserving a first integral. Journal of Physics A: Mathematical and General 29 (13), pp. L341–L349. External Links: Link, Document Cited by: item 9c, item 9.
  • N. Revol and P. Theveny (2014) Numerical reproducibility and parallel computations: issues for interval algorithms. IEEE Transactions on Computers 63 (8), pp. 1915–1924. External Links: Link, Document Cited by: §5.1, item 11.
  • M. J. Saltzman (2002) COIN-or: an open-source library for optimization. Springer US. External Links: Link, Document Cited by: §2.3.
  • E. Santamato and F. De Martini (2015) Proof of the spin–statistics theorem. Foundations of Physics 45 (7), pp. 858–873. External Links: Link, Document Cited by: §1.
  • A. Savitzky and M. J. E. Golay (1964) Analytical Chemistry 36, pp. 1627–1639. Cited by: §5.3.
  • I. Schaefer, H. Tal-Ezer, and R. Kosloff (2017) Semi-global approach for propagation of the time-dependent Schrödinger equation for time-dependent and nonlinear problems. Journal of Computational Physics 343, pp. 368–413. External Links: Link, Document Cited by: item 7, item 9c, footnote 35.
  • L. Scholtès and F. Donzé (2013) A DEM model for soft and hard rocks: Role of grain interlocking on strength. Journal of the Mechanics and Physics of Solids 61 (2), pp. 352–369. External Links: Document, Link Cited by: §1.
  • A. Scrinzi (2021) TRecX – an environment for solving time-dependent schrödinger-like problems. Note: (under review in Computer Physics Communications) External Links: 2101.08171 Cited by: item 9c.
  • K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov (2012) AddressSanitizer: a fast address sanity checker. pp. 28–28. Cited by: Table 1, §3, §6.
  • M. Siłkowski and K. Pachucki (2020) Long-range asymptotics of exchange energy in the hydrogen molecule. The Journal of Chemical Physics 152 (17), pp. 174308. External Links: Link, Document Cited by: §1.
  • V. Šmilauer et al. (2015) Yade Documentation 2nd ed. The Yade Project. Note: External Links: Document Cited by: Table 1, §1.
  • M. A. Stadtherr (1998) High performance computing: are we just getting wrong answers faster?. Note: Cited by: §1.
  • T. P. Stefański (2013) Electromagnetic problems requiring high-precision computations. IEEE Antennas and Propagation Magazine 55 (2), pp. 344–353. Cited by: §1.
  • A. G. Stephenson, L. S. LaPiana, D. R. Mulville, P. J. Rutledge, F. H. Bauer, D. Folta, G. A. Dukeman, R. Sackheim, and P. Norvig (1999) Mars climate orbiter, phase i report. Technical report Mishap Investigation Board. Note: Cited by: item 8.
  • S. M. Stigler (1989) Francis galton’s account of the invention of correlation. Statist. Sci. 4 (2), pp. 73–79. External Links: Document, Link Cited by: §5.2.
  • B. Stroustrup (2014) Programming: principles and practice using c++ (2nd edition). 2nd edition, Addison-Wesley Professional. External Links: ISBN 0321992784 Cited by: §2.3, §6, footnote 13.
  • G. J. Sussman and J. Wisdom (1992) Chaotic evolution of the solar system. Science 257 (5066), pp. 56–62. External Links: Link, Document Cited by: §1, item 9.
  • H. Tal-Ezer and R. Kosloff (1984) An accurate and efficient scheme for propagating the time dependent schrödinger equation. jchemphys 81 (9), pp. 3967–3971. Cited by: item 7.
  • H. Tal-Ezer, R. Kosloff, and I. Schaefer (2012) New, highly accurate propagator for the linear and nonlinear schrödinger equation. Journal of Scientific Computing 53 (1), pp. 211–221. External Links: Link, Document Cited by: item 7, footnote 35.
  • The CGAL Project (2020) CGAL user and reference manual. 5.0.2 edition, CGAL Editorial Board. External Links: Link Cited by: §2.2.
  • The Yade Project (2020) Yade Publications. In Yade Publications, Note: Cited by: §1.
  • K. Thoeni, C. Lambert, A. Giacomini, and S.W. Sloan (2013) Discrete modelling of hexagonal wire meshes with a stochastically distorted contact model. Computers and Geotechnics 49, pp. 158–169. External Links: Document Cited by: Table 1, §1.
  • D. Vandevoorde, N. M. Josuttis, and D. Gregor (2017) C++ templates: the complete guide (2nd edition). 2nd edition, Addison-Wesley Professional. External Links: ISBN 0321714121 Cited by: §2.3, §6, footnote 13.
  • P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. Jarrod Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. Carey, İ. Polat, Y. Feng, E. W. Moore, J. Vand erPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, and S. 1. Contributors (2020) SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 17, pp. 261–272. External Links: Document Cited by: §5.2.
  • Z. Yan and G. W. F. Drake (2003) Bethe logarithm and qed shift for lithium. Physical Review Letters 91 (11). External Links: Link, Document Cited by: §1.
  • T. Zhang, Z. Yan, and G. W. F. Drake (1996) QED corrections of to the fine structure splittings of helium and he-like ions. Physical Review Letters 77 (9), pp. 1715–1718. External Links: Link, Document Cited by: §1.