Numerical simulations based on particle methods have been widely used in many fields of science and engineering. For example, in astrophysics, gravitational -body and smoothed particle hydrodynamics (SPH) simulations are commonly used to study dynamics of celestial bodies. In the context of engineering and biology, moving particle semi-implicit (MPS) method, molecular dynamics (MD), power dynamics, and distinct element method (DEM) simulations are utilized for a variety of purposes such as disaster prevention and drug discovery. The reason of frequent use of particle-based methods may be attributed to the fact that various kinds of physical systems can be well modeled by collections of particles and particle-based methods have several advantages (e.g. adaptivity in space and time, Galilean invariance). The numbers of particles used in simulations go on increasing because there has been a rise in the need for refining results drawn from numerical simulations. For instance, cosmological -body simulations can be used to study the evolution of rare objects such as quasars and active galactic nuclei. In order to make a prediction for such objects with a good statistical accuracy, we need to simulate large volume of the universe (Ishiyama et al., 2015). Large-scale simulations are also required in MD simulations of vapor-to-liquid nucleation for a direct comparison of predicted nucleation rates with those measured by laboratory experiments (Diemand et al., 2013). This trend requires researchers to develop simulation codes that run efficiently on modern distributed-memory parallel supercomputers.
However, developing such a code is quite difficult, time-consuming task because in order to achieve high efficiency we need to implement efficient algorithms such as the Barnes-Hut tree algorithm (Barnes & Hut, 1986) as well as some load balancing mechanism into the code. To date, simulation codes have been developed by individual researchers or research groups in each field of science, with spending a huge amount of time and effort, even though numerical algorithms used are very similar to each other. Besides, in many cases, simulation codes are being developed for each specific application of particle methods, making it difficult for researchers to try a new method or another method. These are not efficient.
In order to improve this situation, we have developed a framework FDPS (Framework for Developing Particle Simulators)111https://github.com/FDPS/FDPS (Iwasawa et al., 2015; Iwasawa et al., 2016) that enables researchers to easily develop codes for massively parallel particle simulations for arbitrary particle methods without taking care of parallelization of their codes. More specifically, FDPS provides a set of functions that (1) divide a computational domain into subdomains based on a given distribution of particles and assigns each subdomain to one MPI process, (2) exchange the information of particles among processes so that each process owns particles in its subdomain, (3) collect the information of particles necessary to perform interaction calculations in each process, and (4) perform interaction calculations using the Barnes-Hut tree algorithm for long-range force or a fast tree-based neighbor search for short-range force. All of these functions (hereafter called FDPS API [application programing interface]) are highly optimized and codes developed by using FDPS show good scalings for a large number of processes (up to processes; Iwasawa et al. (2016)). Thus, FDPS allows researchers to concentrate on their studies without spending time to the parallelization and complex optimization of the codes. FDPS has already been used to develop various applications (e.g. Hosono et al. (2016a, b, 2017); Michikoshi & Kokubo (2017); Tanikawa et al. (2017); Iwasawa et al. (2017); Tanikawa (2018a, b)).
One issue of the previous versions of FDPS (version 2.0 or ealier) is that it requires researchers to develop codes in the C++ programing language. This limitation comes from the fact that FDPS is written in C++ to use the template feature in C++ to support arbitrary data types of particles. However, there are many supercomputer users who use mainly Fortran language to develop codes. In order for such people to use the functionalities of FDPS, they must first learn the C++ language with spending time. In addition, programs which they have written in Fortran in the past can no longer be used in newly-developed codes. To cope with this problem, we have redesigned FDPS for it to have a Fortran interface layer (hereinafter, we call it Fortran interface). This layer provides API for Fortran, i.e., a set of subroutines/functions to use the functionalities of FDPS from Fortran. Thus, in FDPS version 3.0 or later, researchers can develop their simulation codes in Fortran. In this paper, we present the basic design and implementation of Fortran interface and compare performance of a simple application written by using Fortran interface with that written in C++ by using FDPS. We will show that the overhead of Fortran interface is sufficiently small and the performances of both codes are almost identical.
This paper is organized as follows: In Section 2, we briefly review the C++ core part of FDPS necessary to explain the design and implementation of Fortran interface, which are described in Section 3. Performance of application developed by using Fortran interface is shown in Section 4. In Section 5, we present one example of practical application. Finally, in section 6, we summarize this study.
2 Overview of FDPS
In this section, we provide a brief overview of the previous versions (version 2.0 or earlier) of FDPS. In § 2.1, we describe what FDPS does and how FDPS looks like from users. In § 2.2, we explain the implementation details of the previous versions of FDPS.
2.1 What FDPS does and its user interface design
FDPS provides C++ library functions that perform tasks required for parallel particle simulations. In distributed-memory parallel supercomputers, particle simulations are generally performed in the following procedure.
The computational domain is divided into subdomains based on a given distribution of particles and each subdomain is assigned to one MPI process. We call this task “domain decomposition”.
Particles is exchanged among processes so that each process owns particles in its subdomain. We call this task “particle exchange”.
Each process collects the information necessary to perform interaction calculation for the particles belonging to this process.
Each process performs interaction calculation.
The information of particles is updated using the result of the interaction calculation in each process.
Among the procedures above, procedures (1)-(3) are specific for parallel computation and the implementation of them into a code is a daunting task. As described in § 1, FDPS provides APIs, a suite of functions, that perform these procedures efficiently. Therefore, by using FDPS, researchers can avoid the implementation of these complex procedures. In addition, FDPS also provide APIs that perform procedure (4), i.e. interaction calculation, with using fast algorithms for both long- and short-range forces. Thus, users of FDPS can perform these procedures just by calling FDPS APIs. The other parts of users’ codes including procedure (5) are implemented by the users. These parts do not involve parallelization and thus users of FDPS do not need to consider the parallelization of the codes explicitly.
Figure 1 schematically illustrates typical structure of parallel particle simulation code developed by using FDPS and how a user uses FDPS. The user program first defines particle data set and interaction by using class and function in the C++ programing language, respectively. Thus, the user program must be written in C++. This is because, as shown in the right portion of Fig. 1, FDPS is written in C++ by the reason explained in § 2.2. FDPS provides APIs to perform domain decomposition, particle exchange, and calculation of interaction. They are defined as function templates in C++ (see § 2.2). These function templates are combined with the definitions of particle and interaction in order to instantiate functions that perform domain decomposition, particle exchange, and calculation of interaction in the user program. The instantiated functions are used in the main loop of the user program as shown in the lower left portion of Fig. 1. FDPS accepts arbitrary types of particle and interaction. Thus, we can use FDPS to develop any kinds of particle simulation codes.
In the next section, we will explain how the user interface design described here is realized by using features in the C++ programing language.
2.2 Implementation details of FDPS
One of development goal of FDPS is to make it possible for FDPS to support arbitrary types of particle and interaction. In order to achieve this without a decrease in performance of simulation codes, FDPS is implemented by using the template feature in the C++ programing language. Roughly speaking, this feature allows functions and classes to be described by the use of general data types (for detail, see descriptions in the ISO C++ standard [ISO/IEC 14882:2014]). Functions and classes described by using the template feature are called function templates and class templates, respectively. All general data types used in a function template or class template are replaced by specific data types given through special arguments called template arguments at the compile-time of a program. Hence, there are no unknown data types at the compile-time, allowing compilers to aggressively optimize function templates. Thus, using the template feature, we can make a high performance library for general particle types. These are the reasons why we adopt the C++ programing language to develop FDPS.
In order to make it possible for FDPS to provide the APIs that performs domain decomposition, particle exchange, interaction calculation for arbitrary types of particle data and interaction, we adopt an internal structure for FDPS as described below. First, all functions in FDPS API are implemented as function templates to manipulate general particle types. As a result, users of FDPS must implement particle data as C++ classes and pass them to FDPS API through template arguments when using the API. Second, we require that particle classes defined by FDPS users must have some of public member functions that have specific names and specific functionalities. This is because FDPS, for example, must know which member variable represents the position of a particle to perform domain decomposition and particle exchange. We solve this by requiring that a particle class has a public member function named getPos that returns the position of a particle. For similar reasons, there are other public member functions that a user-defined particle class should have. For a complete list, one can refer to the specification document of FDPS. Third, we require that particle-particle interactions must be implemented as functions or functors that have a specific interface. With this, we can implement FDPS without knowing the contents of interaction functions. This requirement does not restrict types of interactions; we can still implement arbitrary types of interaction functions as long as they have the specified interface.
The FDPS API is actually provided as the public member functions of two class templates ParticleSystem and TreeForForce, and one (non template) class DomainInfo. Hereinafter, they are called collectively FDPS classes. Each of them is used to make FDPS do a certain task. To be more precise, instances of FDPS classes ParticleSystem, DomainInfo, and TreeForForce are used, respectively, for particle exchange, domain decomposition, and interaction calculation.
To illustrate the usage of FDPS API, we show in Figure 2 an example of a C++ code that uses FDPS. At the first line of the example, the file particle_simulator.hpp is included. This is the file where all FDPS classes are implemented. Hence, all user programs include this file. As explained above, in order to use the functionalities of FDPS, we first create instances of FDPS classes. This is done at lines 9–10 in the example, where “PS” is the name space in which FDPS classes are defined and the words separated by commnas in the angle brackets associated with FDPS classes ParticleSystem and TreeForForce are template arguments. ParticleSystem class takes FullParticle (FP) class as a template argument, where FP class is a class that contains all information about a particle necessary to perform simulations as member variables. Tfp in the example is the user-defined FP class. In the example, TreeForForceLong class is used. It is a TreeForForce class specifically for long-range force and takes the following three particle classes as template arguments — the first one is Force class which contains the quantities of an -particle used to store the results of the calculations of interactions as member variables; the second and third ones are EssentialParticleI (EPI) and EssentialParticleJ (EPJ) classes which contain the minimum sets of the quantities of - and -particles necessary to calculate interactions as member variables, where -particle is a particle that receives a force and -particle is a particle that exerts a force. In the example, Tforce, Tepi, Tepj are the user definitions of these three classes. Hereinafter, we call the four kinds of particle classes described above user-defined types222The user-defined types EPI and EPJ are introduced in order to reduce the requirements for the communication bandwidth and the memory bandwidth when performing interaction calculation.. After creating the instances of FDPS classes, we can use the FDPS API by calling public member functions of these instances as in lines 15–20.
More details about the previous version of FDPS can be found in Iwasawa et al. (2016).
3 Design, usage, and implementation of Fortran interface
In this section, we describe the details of our Fortran interface. In § 3.1, we describe the user interface design of the Fortran interface. In § 3.2, we explain the usage of Fortran interface with using a specific example of Fortran program. In § 3.3, we describe the implementation details of Fortran interface. In the following, we call the part of FDPS corresponding to the previous version of FDPS the core part.
3.1 User interface design of Fortran interface
The Fortran interface to FDPS developed in this study wraps the core part of FDPS and hence has essentially the same user interface design as shown in Fig. 3. The user program first defines particle and interaction by using derived data type and subroutine in Fortran 2003, respectively. Thus, the user program must be written in Fortran 2003. This is because, as will be explained in § 3.3, the Fortran interface uses features in Fortran 2003 in order to make Fortran interoperate with C++. In the definition of particles, the user must describe complementary information about particle by using FDPS directives, which are Fortran comments having special format and are introduced by us (for details, see § 3.2). This procedure corresponds to define specific public member functions in particle class when using FDPS from C++ (see § 2.2). The Fortran interface to FDPS is written in a Fortran module as shown in the right portion of Fig. 3 and all of the FDPS API are defined as public member functions of a Fortran class for manipulating the core part of FDPS. Corresponding with the APIs in the core part of FDPS, there are APIs for domain decomposition, particle exchange, and calculation of interaction in Fortran interface. The user program calls these APIs in the main loop as shown the lower left portion of Fig. 3.
The Fortran interface is also designed to accept arbitrary types of particle and interaction as with the core part of FDPS. In the core part of FDPS, this is realized by using the template feature in C++ as described in § 2.2. Fortran does not have such feature. In the Fortran interface developed in this study, we realize it by an automatic generation of source programs of Fortran interface specifically for particle types given by users based on the information given through FDPS directives (the reason why this kind of mechanism is needed will be described in § 3.3). The generation of source programs of Fortran interface is done by a Python script provided by us. The users of Fortran interface must run this script to use Fortran interface.
3.2 Usage of Fortran interface
In this section, we explain using a sample code how a user can write an application program using Fortran interface. The basic procedures to develop codes using Fortran interface are as follows.
Define a particle using a Fortran derived data type.
Generate Fortran interface programs.
Define an interaction using a Fortran subroutine.
Develop the main part of a code using the FDPS API given through Fortran interface.
Thus, the development procedures are similar to the case in which we develop codes in C++ using FDPS (see § 2). The only difference is the existence of Procedure II. There is no difficulty in this procedure because a user only have to execute the auto-generation script provided by us as stated earlier.
The sample code used is a gravitational -body simulation code and is essentially same as the one shown in § 2.2 in Iwasawa et al. (2016)
except that the code shown here is written in Fortran. The behavior of the code is very simple: it sets the initial condition by reading particle data from a binary file, and then, it follows the time evolution of the system by integrating Newton’s equations of motion of particles with the leap-frog method. The gravitational forces acting on particles are calculated by using FDPS with the tree algorithm in which the gravitational forces from distant particles are approximated as those from a superparticle with multipole moments. For simplicity, we use the center-of-mass approximation. In this case, the gravitational acceleration of particleis evaluated as the sum of contributions from other particles and superparticles:
where , , are the positions of particle , particle , and superparticle , respectively. and are the masses of particle and superparticle , respectively. the gravitational softening of particle , the gravitational constant.
Figure 4 shows the sample code, which runs either in a single-process execution case or an MPI parallel environment. The code consists of three parts: the definition of the particle (line 7–18), the definition of the interaction functions (line 22–88), and the other parts including numerical integration and I/O (line 92–205). These three parts corresponding to Procedures I,III,IV described at the beginning of this section. In the following, we explain each procedure in detail.
3.2.1 Defining particle types
Users of FDPS must first define particles as derived data types in Fortran (Procedure I). In the sample code, the particle data is defined as full_ptcl type in lines 7–18. It has the following member variables: id (particle identification number), mass (), eps (), pos (), vel (; the velocity of particle ), pot (; the gravitational potential at ), acc ().
As we will describe in § 3.3, the particle data type must be interoperable with C because Fortran interface uses the C interoperability feature in Fortran 2003 to exchange data between Fortran programs and the core part of FDPS. In order to be interoperable with C, a derived data type must satisfy the following conditions.
It has the bind(c) attribute.
The data types of all member variables must be interoperable with C.
The first condition is satisfied if a derived data type is declared with bind(c) keyword (line 7). As for the second condition, Fortran 2003 or later offers data types corresponding to primitive data types in the C language such as int, float, double, etc., through the iso_c_binding module. The second condition is fulfilled if we define member variables using these data types. integer(kind=c_long_long) is one such example. type(fdps_f64vec) type used at lines 14, 15, and 17, which is defined in module fdps_vector
and represents a space vector, is also interoperable with C because their member variables are all interoperable with C.
Users of FDPS must write FDPS directives within the definition part of the particles. You can see a variety of Fortran comments which begin with !$fdps
in the sample code. They are FDPS directives. FDPS directives are used to pass complementary information about particle, e.g. which member variable represents physical quantity FDPS must know, to FDPS. FDPS directives can be classified into the following three types.
Directive specifying the type of a particle.
Directive specifying which member variable represents which physical quantity.
Directive specifying the way of data operation performed in FDPS.
In the following, we explain each type of directive.
Directive (a) is used to specify which user-defined type a derived data type corresponds to out of FP, EPI, EPJ, and Force described in § 2.2. A derived data type can serve multiple user-defined types. Directive (a) in this sample code is written at line 7:
With this, full_ptcl type can be used as any of user-defined types in the sample code.
Directive (b) is used to specify which member variable represents physical quantity required by FDPS. In the present case, we need to specify the mass and the position of a particle (the mass is required to calculate the monopole information of superparticles). Therefore, directives (b) are written as follows (see lines 12 and 14):
real(kind=c_double) mass !$fdps charge type(fdps_f64vec) :: pos !$fdps position
With these, FDPS recognizes that the variables mass and pos represent the mass and the position of a particle.
In the sample code, three directives of type (c) are written in lines 8–10 as follows.
!$fdps copyFromForce full_ptcl (pot,pot) (acc,acc) !$fdps copyFromFP full_ptcl (mass,mass) (eps,eps) \ (pos,pos) !$fdps clear id=keep, mass=keep, eps=keep, \ pos=keep, vel=keep
where the symbol \ is used to represent that the current line continues to the next line because of space limitations and it cannot be used in an actual code (FDPS directive must be written within a line). The first two directives containing keywords copyFromForce and copyFromFP specifies how data copy is performed between different user-defined types. The former specifies the way of data copy from Force type to FP type and the latter specifies that from FP type to EPI type or EPJ type. The directive including keyword clear is used to specify how to initialize Force type before starting the calculations of interactions.
The detail of FDPS directives can be found in the specification document of Fortran interface in the distributed FDPS package.
3.2.2 Generating Fortran interface
Next step is the generation of Fortran interface programs (Procedure II), which can be simply done by executing the auto-generation script gen_ftn_if.py provided by us in the command line as follows.
where $(FDPS_LOC) represents the PATH of the top directory of FDPS library. If the auto-generation succeeds, all the files described in § 3.3.2 are created at the current directory.
3.2.3 Defining interaction functions
Users of FDPS must define interactions as subroutines in Fortran (Procedure III). In the sample code, two subroutines are defined — one for the gravitational interaction between particles (lines 22–54) and the other for the interaction between particles and superparticles (lines 56–88).
The interaction functions must also be interoperable with C because their function pointers are passed to the core part of FDPS using the C interoperability feature in Fortran 2003. To be interoperable with C, subroutines satisfies the following conditions.
It has the bind(c) attribute.
The data types of all variables used in subroutines must be interoperable with C.
These conditions are the same as those for derived data types which are interoperable with C (see § 3.2.1). Hence, we can clear these requirements in the same way: first add the bind(c) keyword after the argument list of the subroutine (see lines 22 and 56), and define variables using data types which are interoperable with C.
The interaction functions must have the following arguments. From the beginning, the array of particles (ep_i), the number of particles (n_ip), the array of particles or superparticles (ep_j), the number of particles or superparticles (n_jp), and the array of variables that stores the calculated interaction on particles (f). Due to the specification of the core part of FDPS, the numbers of particles and superparticles must be pass-by-value arguments. Hence, the value keyword is needed in the type declaration statements for these arguments (see lines 21 and 57).
FDPS does not take care about the optimizations of the interaction functions. So, users of FDPS must optimize the interaction functions by themselves in order to achieve high efficiency. For simplicity, the interaction functions in this sample code are implemented without any optimization techniques. Some typical optimization techniques are presented in § 4.
3.2.4 Developing the main part of a user code
Users of FDPS must implement the main routine of a user code within a subroutine named f_main() by the reason explained in § 3.3 (Procedure IV). f_main() of the sample code is implemented in lines 92-122 and it consists of the following steps.
Initialize FDPS (line 103)
Create and initialize instances of FDPS classes (lines 104-110).
Read particle data from a file (line 111).
Calculate the gravitational forces on all the particles at the initial time (line 112).
Integrate the orbits of all the particles with the leap-frog method (lines 113-120)
Finalize FDPS (line 121).
In the following, we first explain the variable declaration part of f_main(). Then, we explain each step in detail.
In the Fortran interface, FDPS API is provided as type-bound procedures of Fortran 2003 class fdps_controller defined in module fdps_module. This module is defined in FDPS_module.F90, one of the source programs of Fortran interface. In order to use FDPS API, we first need to make this module accessible from f_main(), and then we should create an instance of this class. These things are done at lines 93 and 102, respectively. Thus, in this sample code, FDPS API is used by calling type-bound procedures of class instance fdps_ctrl.
In step 1, API ps_initialize is called. This procedure initializes MPI and OpenMP libraries if they are used. If not, it does nothing.
In step 2, we create and initialize three instances of FDPS classes ParticleSystem, DomainInfo, and TreeForForce by calling type-bound procedures whose names contain ‘create’ or ‘init’, which, as the names suggest, create and initialize instances, respectively. Each of these procedures takes an integer argument (psys_num, dinfo_num, and tree_num in the sample code) because all instances of FDPS classes are identified by descriptors in integer variables in the Fortran interface. API create_psys receives the name of a derived data type representing FP as the second argument, and it creates an instance of ParticleSystem<Tfp>, where Tfp is the name of the C++ class corresponding to the derived data type having the specified name. In the sample code, full_ptcl is specified. Note that this API accepts only the names of derived data types qualified by FDPS directive of type (a) explained in § 3.2.1. API create_tree receives the type of an instance of class TreeForForce as the second argument. The type information must be given by a single string of characters that consists of five strings of characters delimited by commas. The first field represents the type of force (long-range or short-range), which is followed by the three names of derived data types corresponding to Force, EPI, and EPJ types, and the final field shows the subtype of tree for long/short-range force. In this example, we calculate the gravitational forces as the long-range force and use the monopole approximation. In addition, full_ptcl type is used as Force, EPI, EPJ types. Therefore, "Long,full_ptcl,full_ptcl,full_ptcl,Monopole" is specified.
In step 3, we read particle data from a binary file, by using subroutine read_IC defined in lines 178–205. In the framework of the Fortran interface, all the particle data is stored in a C++ program, which is one of Fortran interface programs (see § 2.2). In order to setup the particle data from a Fortran program, we should take the following steps.
Create and initialize an instance of ParticleSystem class (this is already done at step 2).
Allocate memory for this instance. In the sample code, this is done by calling API set_nptcl_loc. This API allocates a memory for the instance enough to store an array of the particles of size specified by the second argument (i.e. nptcl_loc in this sample).
Get the pointer to this instance. This is done by calling API get_psys_fptr. Because this API returns the pointer corresponding to the instance specified by its first argument (i.e., psys_num in this sample), we have to prepare the pointer of the same type. In the present case, the pointer for an array of full_type type should be prepared. After the pointer is set, we can use this pointer like an array.
Set the particle data using the pointer. This is done in lines 199–201.
In step 4, the interaction calculation is performed through the subroutine calc_gravity, which is defined in lines 124–140. This subroutine consists of the following three operations.
Perform domain decomposition. This is done by calling API domain_decomposition (line 131). The first argument of this API is an integer variable corresponding to an instance of DomainInfo class. The second one is an integer variable corresponding to an instance of ParticleSystem class, which is needed because the positional information of the particles is required to perform domain decomposition.
Perform particle exchange. This is done by calling API exchange_particle (line 132). This API takes an integer variable corresponding to an instance of DomainInfo class as the first argument. The second argument is an integer variable corresponding to an instance of ParticleSystem class. This is required because the information of subdomains is necessary to perform particle exchange.
Perform the calculation of interactions. This is done by calling API call_force_all_and_write_back (line 135). The number of the arguments of this API depends on the type of force. In the present case, it takes five arguments because of long-range interaction. The definitions of the first, fourth, fifth arguments should be obvious, and hence we do not explain them here. The second and third arguments are the function pointers for the subroutines that calculate particle-particle interaction and particle-superparticle interaction. The function pointers must be obtained by the intrinsic function c_funloc, and be stored in variables of c_funptr type. In the sample code, calc_gravity_pp and calc_gravity_psp are used for the calculation of interactions.
In step 5, the time integration is performed through the subroutines kick and drift, which are defined, respectively, in lines 142–158 and 160–176. Both subroutines have the same structure: first get the pointer to the particle data stored in the C++ side of Fortran interface programs by using API get_psys_fptr, and then, update the particle data using this pointer.
In step 6, API ps_finalize is called. It calls the MPI_Finalize function if MPI is used.
In this section, we have described how an application program can be written by using Fortran interface and have shown that researchers can develop particle-based simulation codes by writing Fortran codes only. Thus, the learning of the C++ programing language is totally unnecessary to use FDPS.
3.3 Implementation details of Fortran interface
In this section, we describe the implementation details of Fortran interface. In § 3.3.1, we describe why we need a generation of Fortran interface. In § 3.3.2, we describe the file structure of Fortran interface programs and the role of each file.
3.3.1 Necessity of generation of Fortran interface
One difficulty in developing a Fortran interface to FDPS is that the template feature in the C++ programing language cannot be used directly from Fortran (Fortran is not designed to interoperate with the C++ programing language). As described earlier, the template feature is essential for FDPS to support arbitrary types of particle and interaction. Hence, we need to workaround this problem in some way in order to make a Fortran interface support arbitrary types of particle and interaction.
In this study, we solve this problem by making use of (i) the fact that Fortran can interoperate with the C programing language through the use of iso_c_binding module introduced in Fortran 2003, (ii) the fact that C++ functions can be called from C programs by using extern "C" modifier, and (iii) auto generation of source codes. The outline of our solution is as follows. Owing to (i) and (ii), we can make Fortran interoperate with C++ via C. Therefore, it is also possible to use a FDPS library (a set of functions to manipulate FDPS) made for a specific particle class from a Fortran code through a C interface to the library. However, what we want to realize is to enable researchers to use a FDPS library for an arbitrary particle class from a Fortran code. The only way to realize this would be to generate a FDPS library for a given particle class and use it from a Fortran code. Another requirement in designing a Fortran interface is that researchers must be able to develop codes by Fortran only so that they will not need to invest much time in learning C++. To make this possible, we must generate a FDPS library based on some Fortran data structures representing particles, instead of based on C++ classes. It is natural to use Fortran’s derived data type, which is similar to structure in C, to define a particle. Thus, we adopted the following solution: first define a particle as a derived data type in Fortran. Then, generate a C++ class corresponding to this derived data type. Finally, generate a FDPS library for this particle class. The generation of required C++ classes and a library is automatically done by a script provided by us. Hence, researchers do not have to care about it.
The one remaining problem is how to generate a C++ class from a given Fortran derived data type. In FDPS, C++ class representing particle must have specific public member functions such as getPos as described in § 2. Fortran 2003 supports object-oriented programing and offers classes for it, which are derived data types having type-bound procedures and are essentially same as classes in C++. Thus, a simple way to generate a C++ class is to define particle as a class in Fortran 2003 that has public member functions required by FDPS and to convert it to a C++ class. However, we cannot take this approach because Fortran 2003 Standard specifies that derived data types must not have member procedures/functions to be interoperable with C. In order to solve this problem, we introduce FDPS directives. They are used for users of FDPS to pass all information necessary to generate public member functions of the corresponding C++ classes to the auto-generation script described above. For example, the auto-generation script needs to know which member variable of a given Fortran derived data type represents the position of a particle in order to generate a public member function getPos in the corresponding C++ class. This information is passed to the script by adding FDPS directive !$fdps position to the corresponding member variable of the Fortran derived data type. In other words, our auto-generation script is designed to generate getPos in the corresponding C++ class using a member variable of a given Fortran derived data type indicated by FDPS directive !$fdps position. In this way, we can pass all of the necessary information to the auto-generation script. In § 3.2, we have given a rough explanation about FDPS directive for simplicity, but actually this kind of things is done.
The generation of a Fortran interface can be summarized schematically as shown in Figure 5: Firstly, researchers define particles as derived data types in Fortran with FDPS directives (Step ➀ in Fig. 5). Next, the auto-generation script generates C++ classes corresponding to the given Fortran derived data types (Step ➁). Using the information given by FDPS directives, the script can generate public member functions required by FDPS. Then, the script generates a FDPS library specifically for these C++ classes (Step ➂). The generated library is not a template library, but a normal library that is callable in C manner. Finally, the script also generates a Fortran module to manipulate the generated library (Step ➃). Users of FDPS can use FDPS through this module (Step ➄).
3.3.2 Structure of Fortran interface programs
In this section, we explain the file structure of the generated programs and their internal structures.
Figure 6 is a brief summary of the file structure of the Fortran interface programs and their roles. Four files enclosed by the dashed line (FDPS_module.F90, FDPS_ftn_if.cpp, FDPS_Manipulators.cpp, main.cpp) are the files to be generated by the script, and the file f_main.F90 corresponds to user programs. In the files enclosed by the dotted line (FDPS_vector.F90, FDPS_matrix.F90, FDPS_super_particle.F90, etc.), several derived data types are defined, which are needed to define user-defined types and interaction functions in Fortran (see also § 3.2). In the following, we explain the role of each interface program.
At first, we explain the roles of FDPS_Manipulators.cpp and main.cpp. Because the core part of FDPS is written in C++, all of C++ instances of FDPS classes described in § 2 (i.e. DomainInfo, ParticleSystem, and TreeForForce classes) must be created and be managed in C++ codes. This task is performed by FDPS_Manipulators.cpp, which corresponds to a library shown in Step ➂ in Fig. 5. By the same reason, we must place the main function of the user program in a C++ file. Thus, main.cpp is generated. It calls a Fortran subroutine named f_main(). Hence, users should prepare a Fortran subroutine f_main() and must implement all parts of the simulation code inside f_main(). In the Fortran interface, all of C++ instances created in FDPS_Manipulators.cpp are assigned to Fortran’s integer variables. Hence, users also need to manage these instances using integer variables.
The file FDPS_ftn_if.cpp provides C interfaces to the functions defined in FDPS_Manipulators.cpp. These C functions can be called from a Fortran program by using the functionalities provided by module iso_c_binding in Fortran 2003 as described in § 3.3.1. The file FDPS_module.F90 provides a class in Fortran 2003, named fdps_controller, for users, which is used to call the C interface functions described above. All of the FDPS API for Fortran are provided as public type-bound procedures of this class. Therefore, in order to use the FDPS API, we first need to create an instance of this class, and then call its type-bound procedures.
As described above, the particle data is stored in FDPS_Manipulators.cpp as explained above. It is actually an array of C structures representing particles (cf. Step ➁ in Fig. 5). Fortran module iso_c_binding enables us to access it if a derived data type in Fortran corresponding to this C structure is interoperable with C. The Fortran interface makes use of this functionality, and therefore, we require that all derived data types that will be used in FDPS must be interoperable with C. We also use the functionalities of module iso_c_binding to pass interaction functions to FDPS_Manipulators.cpp. To take this approach, interaction functions must be implemented as subroutines in Fortran and they must be interoperable with C. This is because subroutines are passed in the form of the C addresses (i.e. function pointers in C), which can be obtained only if the subroutines are interoperable with C. These are the reasons why we require C interoperability in § 3.2.
4 Performance of application developed by Fortran interface
In this section, we present and discuss the performance of an application developed by Fortran interface to FDPS. Here, we mainly focus on the overhead of the FDPS API-call from a Fortran application because all the tasks of the API are processed in its C++ core and it is expected that the times required to process these tasks do not depend on development language. To this end, we compare difference between the performances of Fortran and C++ codes for -body simulations. The performance measurements are performed in different computer systems to check the dependencies of the performance on CPU architectures and compilers. Table 4 lists the computer systems and the compiler information used in the measurements.
System name & K Computer & Oakforest-PACS & Cray XC30
CPU & Fujitsu SPARC 64 VIIIfx & Intel Xeon Phi 7250 & Intel Xeon CPU E5-2650 v3
Compiler & Fujitsu compilers & Intel compilers & Intel compilers
& (ver. 1.2.0 P-id: L30000-15) & (ver. 17.0.4 20170411) & (ver. 16.0.4 20160811)
Compile options (C++) & -Kfast -Ksimd=2 -Krestp=all & -O3 -ipo -xMIC-AVX512 -no-prec-div & -fast -ipo -xCORE-AVX2 -no-prec-div
Compile options (F) & -X 03 -Kfast -Ksimd=2 -Free -fs & -O3 -ipo -xMIC-AVX512 -no-prec-div & -fast -ipo -xCORE-AVX2 -no-prec-div
The Fortran code used in the performance measurement is essentially same as the sample code described in § 3.2 (Fig. 4) except the following two differences. First, we apply standard optimization techniques to interaction functions. Figure 7 shows a Fortran subroutine for the gravitational interaction between particles. In the subroutine, the information of -particles are stored into local arrays (lines 11-16), and all the calculations in the innermost loop are described using arrays (lines 25-39). These modifications are aiming to facilitate compilers to optimize the code easier. We use almost the same subroutine for the interaction between particles and superparticles. For a fair comparison, in the C++ version of the -body simulation code, we use a function with the same structure, which is shown in Fig. 8.
Second, we rewrite all the operations between derived data types by using their member variables explicitly. This is because in the combination of Intel Xeon Phi and intel compilers the executable file which is generated with the highest optimization compile flags does not perform some numerical operations correctly. We believe that this is the problem on the compiler side and this situation will be improved by future upgrade of the compiler. But for now, we recommend users of Fortran interface to follow this prescription.
The both (Fortran and C++) codes solve the cold collapsed problem, which is one of the standard test problems in -body simulation codes. The most of the calculation times are spent on the part of the interaction calculation and the overhead due to Fortran interface may be hidden by this part. In order to clarify the cost of the overhead, we measure the performances for the following two cases: One is the case when the interaction functions are empty (empty cases) and the other is the case when the interaction functions shown in Figs. 7 and 8 are used (normal cases). Both codes are executed with a single core. In this case, the APIs for domain decomposition and particle exchange do nothing and just increase the overhead. The upper panels of Fig. 9 show the wall-clock time per step for different number of particles per processes for each computer system for the former case, while the lower panels show those for the latter case. As shown in the upper panels, the calculation times in both codes are almost the same for all computer systems we used. Relative differences of the wall-clock times between both codes are . Thus, the overhead due to Fortran interface is sufficiently small. The overall performances also show a similar behavior as shown in the lower panels of Fig. 9. Compared to the case of empty interaction function, the relative differences become small in each computer system. Thus, a code written by Fortran interface shows a performance nearly identical to that written by C++.
5 Example of practical application
In order to demonstrate its ability, we perform a large-scale global planetary ring simulation using a code developed by Fortran interface. Ring particles around a planet interact with each other through mutual gravity and inelastic collision. In the code, we calculate the gravity using the Tree method with an opening angle criterion of . The inelastic collision is modeled by the soft sphere model following Salo (1995) and Michikoshi & Kokubo (2017). The initial condition is taken from Michikoshi & Kokubo (2017), who investigated the dynamics of two narrow rings around Centaur Chariklo with changing the radius and mass of ring particles. Among their models, we consider the case of and , where and are the radius and density of ring particles, respectively. is the density of Chariklo. For simplicity, we consider the inner ring only, which is located at a distance of from the center of Chariklo. The radial width and optical depth of the ring are and , respectively. We model the ring by using particles. With these parameters, it is expected that the self-gravitational wakes form in the ring in a dynamical time (Toomre (1964)). The simulation is performed with using MPI processes and OpenMP threads on Oakforest-PACS.
Figure 10 show the distribution of ring particles at , where is the orbital period at the distance of . The particle distribution closely resembles that of Michikoshi & Kokubo (2017) as expected (see their Fig. 1).
Based on our experience, a great deal of time was not needed to complete this application. Thus, by using Fortran interface, researchers can concentrate on their study without spending a lot of time to develop simulation codes.
In this paper, we have presented the basic design and the implementation of the Fortran interface layer of FDPS, which is newly introduced in version 3.0. This layer provides API for Fortran and supports arbitrary types of particles and interactions as with the C++ core part of FDPS. This is realized by means of both a feature introduced in Fortran 2003 and auto-generation of interface programs. The Fortran interface also support almost all the standard features of FDPS. By using the Fortran interface, researchers can develop massively-parallel particle simulation codes in Fortran.
We also have presented the performance of an application developed by Fortran interface. By comparing with the performance of a C++ version of the application, we have shown that the cost of the overhead due to Fortran interface is sufficiently small and the overall performances of both codes are very similar to each other.
We are grateful to M. Tsubouchi, Y. Wakamatsu, and Y. Yamaguchi for their help in managing the FDPS development team. This research used computational resources of the K computer provided by the RIKEN Advanced Institute for Computational Science through the HPCI System Research project (Project ID:ra000008). Part of the research covered in this paper research was funded by MEXT’s program for the Development and Improvement for the Next Generation Ultra High-Speed Computer System, under its Subsidies for Operating the Specific Advanced Large Research Facilities. Numerical computations were in part carried out on Cray XC30 at Center for Computational Astrophysics, National Astronomical Observatory of Japan, and on Oakforest-PACS at Supercomputing Division, Information Technology Center, the University of Tokyo. L.W. in this work is supported by the funding from Alexander von Humboldt Foundation.
- Barnes & Hut (1986) Barnes, J. and Hut, P. 1986, Nature, 324, 446
- Diemand et al. (2013) Diemand, J., Angélil, R., Tanaka, K.K., & Tanaka, H. 2013, The Journal of Chemical Physics, 139, 074309
- Hosono et al. (2016a) Hosono, N., Saitoh, T.R., & Makino, J. 2016, , 224, 32
- Hosono et al. (2016b) Hosono, N., Saitoh, T.R., Makino, J., Genda, H., & Ida, Sh. 2016, Icarus, 271, 131
- Hosono et al. (2017) Hosono, N., Iwasawa, M., Tanikawa, A., Nitadori, K., Muranushi, T.R., & Makino, J. 2017, , 69, 26
- Iwasawa et al. (2015) Iwasawa, M., Tanikawa, A., Hosono, N., Nitadori, K., Muranushi, T. & Makino, J. 2015, WOLFHPC ’15, 1, 1
- Iwasawa et al. (2016) Iwasawa, M., Tanikawa, A., Hosono, N., Nitadori, K., Muranushi, T. & Makino, J., 2016, , 68, 54
- Iwasawa et al. (2017) Iwasawa, M., Oshino, Sh., Fujii, M.S., & Hori, Y. 2017, , 69, 5
- Ishiyama et al. (2015) Ishiyama, T., Enoki, M., Kobayashi, M.A.R., Makiya, R., Nagashima, M., & Oogi,T. 2015, , 67, 6116
- Michikoshi & Kokubo (2017) Michikoshi, Sh. and Kokubo, E. 2017, , 837, L13
- Salo (1995) Salo, H. 1995, Icarus, 117, 287
- Tanikawa et al. (2017) Tanikawa, A., Sato, Y., Nomoto, K., Maeda, K., Nakasato, N., & Hachisu, I. 2017, , 839, 81
- Tanikawa (2018a) Tanikawa, A. 2018, , 475, 67
- Tanikawa (2018b) Tanikawa, A. 2018, arXiv:1711.05451 (accepted)
- Toomre (1964) Toomre, A. 1964, , 139, 1217