Digital Blood in Massively Parallel CPU/GPU Systems for the Study of Platelet Transport

by   Christos Kotsalos, et al.

We propose a highly versatile computational framework for the simulation of cellular blood flow focusing on extreme performance without compromising accuracy or complexity. The tool couples the lattice Boltzmann solver Palabos for the simulation of the blood plasma, a novel finite element method (FEM) solver for the resolution of the deformable blood cells, and an immersed boundary method for the coupling of the two phases. The design of the tool supports hybrid CPU-GPU executions (fluid, fluid-solid interaction on CPUs, the FEM solver on GPUs), and is non-intrusive, as each of the three components can be replaced in a modular way. The FEM-based kernel for solid dynamics outperforms other FEM solvers and its performance is comparable to the state-of-the-art mass-spring systems. We perform an exhaustive performance analysis on Piz Daint at the Swiss National Supercomputing Centre and provide case studies focused on platelet transport. The tests show that this versatile framework combines unprecedented accuracy with massive performance, rendering it suitable for the upcoming exascale architectures.




1 Background

Blood flow plays an important role in most of the fundamental functions of living organisms. Blood transports oxygen, nutrients, waste products, infectious parasites and tumour cells, among others, to tissues and organs. Despite remarkable advances in experimental techniques Tomaiuolo (2014), the type and detail of information they provide remains limited. In the last two decades, computational tools for the direct numerical simulation of cellular blood flow have been developed. They complement experiments and have become an essential tool for in-depth investigations. These tools Freund (2014) have been used to study various poorly understood phenomena such as the non-Newtonian viscosity of blood Závodszky et al. (2017), thrombus formation Fogelson and Neeves (2015), the Fåhræus effect, the characteristics of the cell-free layer Vahidkhah et al. (2014), and the red blood cell (RBC) enhanced shear-induced diffusion of platelets Mehrabadi et al. (2015), Zhao and Shaqfeh (2011). Apart from physiological conditions, numerical tools have significantly advanced the understanding of pathological conditions Fedosov et al. (2011), Li et al. (2017), Chang et al. (2018), as they offer a controlled environment for testing a large number of parameters and classifying their effect on blood rheology. As the tools mature, there is an increased focus on developing and analysing lab-on-chip systems Rossinelli et al. (2015), Krüger, Timm et al. (2014) and drug delivery systems Sen Gupta (2016), Vahidkhah and Bagchi (2015). Despite such advances, we believe that there is tremendous room for improvement in terms of fidelity, high performance and clinically relevant scales.

Blood is a complex suspension of RBCs, white blood cells and platelets, submerged in a Newtonian fluid, the plasma. At physiological hematocrit, RBCs occupy a substantial volume fraction of the blood and therefore have an important impact on blood rheology. The accurate modelling of the collective transport of cells in the plasma is of paramount importance, since it can help decipher various in vivo and in vitro phenomena. A volume of blood the size of a single drop contains a few million RBCs, a few hundred thousand platelets and a few thousand white blood cells. Thus, it is extremely challenging to develop a tool capable of simulating blood at the cellular level for clinically relevant applications, using high-fidelity models while utilising a reasonably limited amount of computational resources and time.

The absence of such a universal cellular blood flow computational tool constitutes the motivation behind the suggested framework. A universal framework should fulfil a number of criteria, such as generality, robustness, accuracy, performance and modularity. The criteria of generality, robustness and accuracy are addressed in our first description of the tool, proposed in Kotsalos et al. (2019). In this work we complete the framework by introducing an integrated environment that obeys all the criteria towards a universal computational tool for digital blood. Our framework is tailored for the fastest supercomputers (namely hybrid CPU/GPU clusters, which are well represented at the high end of the TOP500 list) and is ready to be hosted on the upcoming exascale machines. Moreover, the suggested tool, even if it uses state-of-the-art numerical techniques, is not monolithically built upon them: the structural solver for the blood constituents can easily be replaced by another one, such as one based on discrete element models. Similarly, the lattice Boltzmann flow solver could be replaced by another option, which, however, needs to be similarly parallelisable through domain decomposition and allow interaction with solid particles through an immersed boundary method. Over the last decade, many groups have been working on HPC-capable cellular blood flow tools Rahimian et al. (2010), Peters et al. (2010), Bernaschi et al. (2011), Xu et al. (2013), Xu et al. (2017), Rossinelli et al. (2015), dealing with problems of increasing complexity, which, however, do not reach the goal of a computational framework that fulfils all the above-mentioned criteria simultaneously.

Our team (the Palabos development group Latt et al. (2019)) has developed a numerical method and a high-performance computing (HPC) software tool for ab initio modelling of blood. The framework models blood cells such as red blood cells and platelets individually, including their detailed non-linear elastic properties and the complex interactions between them. The project is particularly challenging because the large number of blood constituents (up to billions) stands in contrast with the high computational cost of each individual element. While classical approaches address this challenge through simplified structural modelling of deformable RBCs (e.g. through mass-spring systems) Dupin et al. (2007), Fedosov et al. (2010), Reasor et al. (2011), Rossinelli et al. (2015), Závodszky et al. (2017), the present framework guarantees accurate physics and desirable numerical properties through a fully-featured FEM model Kotsalos et al. (2019). The required numerical performance is achieved through a hybrid implementation, using CPUs (central processing units) for the blood plasma and GPUs (graphics processing units) for the blood cells. The developed numerical framework is intended to grow into a general-purpose tool for first-principle investigation of blood properties and to provide an integrative and scalable HPC framework for the simulation of blood flows across scales.

The present work is organised as follows: In section 2, we present the structure of our computational tool and the basic underlying methods. In section 3, we provide a performance analysis on the Piz Daint supercomputer and various case studies of platelet transport.

2 Methods

2.1 Computational Framework

The understanding and deciphering of a complex transport phenomenon like the movement of platelets requires the deployment of high fidelity direct numerical simulation tools that resolve the cellular nature of blood. Platelets are submerged in the blood plasma and collide continuously with the RBCs, also known as erythrocytes, that are present in much larger quantities. To capture accurately their trajectories and understand the driving mechanisms behind their motion, we propose a modular and generic high-performance computational framework capable of resolving fully 3D blood flow simulations. The computational tool is built upon three modules, namely the fluid solver, the solid solver and the fluid-solid interaction (FSI).

The fluid solver is based on the lattice Boltzmann method (LBM) and solves indirectly the weakly compressible Navier-Stokes equations. The 3D computational domain is discretised into a regular grid with spacing Δx in all directions. For this study, we use the D3Q19 stencil with the BGK collision operator and non-dimensional relaxation time τ (a higher τ gives a higher viscosity). The time step follows from the standard BGK relation ν = c_s²(τ − 1/2)Δx²/Δt, i.e. Δt = (τ − 1/2)Δx²/(3ν), where c_s = Δx/(√3 Δt) is the lattice speed of sound and ν the kinematic viscosity. Furthermore, external forcing terms (like the FSI force) can be incorporated in the LBM through a modification of the collision operator using the Shan-Chen forcing scheme Shan and Chen (1993). More information on LBM can be found in Feng et al. (2007), Krüger et al. (2017), Krüger (2012).
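As a minimal sketch of the relation above, the following helper recovers the LBM time step from the lattice spacing, the relaxation time and the kinematic viscosity (the function name and argument names are illustrative, not part of Palabos):

```python
# Sketch: LBM/BGK time step from nu = c_s^2 * (tau - 1/2) with c_s^2 = 1/3
# in lattice units, which gives dt = (tau - 0.5) * dx**2 / (3 * nu).
def lbm_time_step(dx, tau, nu):
    """dx: lattice spacing [m], tau: relaxation time (> 0.5), nu: kinematic viscosity [m^2/s]."""
    assert tau > 0.5, "tau must exceed 1/2 for a positive viscosity"
    return (tau - 0.5) * dx ** 2 / (3.0 * nu)
```

For instance, refining the grid (smaller dx) at fixed τ and ν shrinks the time step quadratically, which is the usual diffusive scaling of the method.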

The solid solver is based on the recently introduced nodal projective finite elements method (npFEM) by Kotsalos et al. (2019), which offers an alternative way of describing elasticity. The npFEM framework is a mass-lumped linear FE solver that resolves both the trajectories and the deformations of the blood cells with high accuracy. The solver is capable of capturing the rich, non-linear viscoelastic behaviour of red blood cells, as shown and validated in Kotsalos et al. (2019). The platelets are simulated as nearly-rigid bodies by modifying the stiffness of the material. The implicit nature of the npFEM solver renders it capable of resolving extreme deformations with unconditional stability for arbitrary time steps. Even though the solver is based on FEM and an implicit integration scheme, its performance is very close to that of the widely used mass-spring systems Fedosov et al. (2010), Závodszky et al. (2017), while outperforming them in robustness and accuracy Kotsalos et al. (2019). Regarding the viscoelastic behaviour of the blood cells, the solver uses a two-way approach to handle the response of the cell to the imposed loads over time (Rayleigh and position-based dynamics damping Kotsalos et al. (2019)). It should be pointed out that the interior fluid of the cell is implicitly taken into account, as its incompressibility contributes to the potential energy of the body and its viscosity augments the viscosity of the membrane. Furthermore, the force on the bodies is derived from the hydrodynamic stress tensor by considering the lattice points at the exterior of the bodies. A complete presentation of npFEM can be found in Kotsalos et al. (2019).

The fluid-solid interaction is realised by the immersed boundary method (IBM), and more specifically by the multi-direct forcing scheme proposed by Ota et al. (2012) (with minor modifications, see supplementary material). The IBM imposes a no-slip boundary condition, so that each point of the surface and the ambient fluid move with the same velocity. The advantage of the IBM is that the fluid solver does not have to be modified except for the addition of a forcing term. Moreover, the deformable body and its discrete representation do not need to conform to the discrete fluid mesh, which leads to a very efficient fluid-solid coupling. The exchange of information between the fluid mesh and the Lagrangian points of the discrete bodies is realised through interpolation kernels with finite support. The kernel of Mountrakis et al. (2017) is used throughout the simulations of the current study. The IBM is an iterative algorithm in which the force estimate on a Lagrangian point is computed from the difference between the vertex velocity and the velocity interpolated from the surrounding fluid nodes. This force is then spread onto the fluid nodes surrounding the Lagrangian point, and the correction affects the collision operator of the LBM. This interpolation between the membranes and the fluid is repeated for a finite number of steps. For the simulations shown in this article, a single iteration suffices for the required accuracy.
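The interpolate-force-spread cycle of the multi-direct forcing scheme can be illustrated with a one-dimensional toy (a sketch only; the hat kernel and all names are illustrative stand-ins, not the Palabos/IBM implementation):

```python
import numpy as np

# Toy 1D sketch of one multi-direct-forcing iteration: interpolate the fluid
# velocity at a Lagrangian point with a finite-support kernel, form a force
# estimate from the velocity mismatch, and spread it back with the same kernel.
def hat_kernel(r):
    r = abs(r)
    return 1.0 - r if r < 1.0 else 0.0

def ibm_iteration(u_fluid, x_lag, u_solid, dx=1.0):
    """Return corrected fluid velocities and the Lagrangian force estimate."""
    nodes = np.arange(len(u_fluid)) * dx
    w = np.array([hat_kernel((x - x_lag) / dx) for x in nodes])
    u_interp = np.dot(w, u_fluid)   # fluid velocity at the Lagrangian point
    f_lag = u_solid - u_interp      # force estimate (dt and density absorbed)
    u_new = u_fluid + w * f_lag     # spread the correction back to the grid
    return u_new, f_lag
```

Repeating the iteration shrinks the velocity mismatch at the Lagrangian point, which is why a small, fixed number of iterations (here, one) suffices in practice.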

A brief but more instructive overview of the methods presented above can be found in the supplementary material.

2.2 Towards stable and robust FSI

There exist two main ways to realise fluid-solid interaction: monolithic and modular. The former describes the fluid and solid phases through one system of equations, solved by a single solver with a common discretisation. Examples include tools that use dissipative particle dynamics to resolve both the fluid and the solid. An advantage of the monolithic approach is the consistency of the coupling scheme, which leads to more numerically stable solutions. The main disadvantage is that a single solver potentially falls short of satisfactorily addressing all the physics present in a complex phenomenon. In the modular approach, there is the freedom to choose well-optimised solvers for the different phases. However, the consistent coupling of the solvers becomes a major challenge, especially when the discretisation (space & time) is non-conforming. Our computational framework uses different spatial and temporal resolutions for the fluid and solid phases. For example, the solid solver is executed every two fluid iterations (at least), which could potentially introduce coupling instabilities. The instabilities arise mainly from under-resolution and from integration schemes that do not conserve energy and momenta (linear/angular) exactly, thus leading to spiralling energy oscillations between the solvers. The remedies suggested below are tailored to the specific framework, but could provide useful hints for other implementations.
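The non-conforming time coupling described above (the solid solver running every k-th fluid iteration with a proportionally larger time step) can be sketched as a simple subcycling loop; the solver callables and the advance function are illustrative stand-ins, not the actual framework API:

```python
# Sketch of modular time coupling: the fluid advances every iteration, while
# the solid solver runs only every k-th iteration, integrating k fluid steps
# at once (the paper uses k = 2 at least).
def advance(fluid_step, solid_step, n_iters, k=2):
    log = []
    for it in range(n_iters):
        fluid_step(it)
        log.append(("fluid", it))
        if it % k == k - 1:        # solid phase catches up every k-th step
            solid_step(it)
            log.append(("solid", it))
    return log
```

The choice of k trades accuracy of the coupling against the cost of the solid solver, which is why the stabilisation measures below matter.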

The IBM requires a match between the lattice spacing and the average edge length of the discretised membranes (triangulated surfaces). The exact value of this mesh ratio appears to play a minor role as long as it lies within the range given in Krüger (2012). An RBC discretised with 258 surface vertices exhibits a suitable ratio at the lattice spacing used in this study. For low shear rates, this requirement can be further relaxed.

An accurate evaluation of the external forces acting on the immersed boundaries plays a critical role in achieving a consistent coupling. For higher accuracy we use the hydrodynamic stress tensor projected onto the surface normals instead of the native force term produced by the IBM. Furthermore, consistent with the aim of disregarding the interior fluid of the blood cells, we found that the most stable force evaluation scheme comes from measuring the stress at the exterior fluid site most distant from the Lagrangian point among those contained within the interpolation kernel.
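The projection of the hydrodynamic stress onto the surface normal is the standard traction computation t = σ·n; a minimal sketch (variable names illustrative):

```python
import numpy as np

# Sketch: hydrodynamic traction on a membrane vertex, i.e. the stress tensor
# sigma evaluated at a nearby exterior fluid site, projected onto the
# outward surface normal: t = sigma . n
def traction(sigma, normal):
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)                  # ensure a unit normal
    return np.asarray(sigma, dtype=float) @ n  # force per unit area
```

For a purely hydrostatic stress σ = -p·I, the traction reduces to -p·n, i.e. pressure pushing along the inward normal, as expected.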

A meticulous handling of the near-contact regions is highly critical to suppress instabilities. The first step of our procedure consists of searching for Lagrangian points belonging to bodies other than the investigated one that are contained within the interpolation kernel of the current point. If there are no “foreign” particles in the kernel, then no modification is needed: it is assumed that the interaction physics is appropriately resolved by the fluid field in between the bodies. Otherwise, the collision framework takes over, since the interpolation is “contaminated” by the interior fluid of a neighbouring body. In that case, the forces on the Lagrangian point from the fluid are disregarded and a collision force, derived from a quadratic potential energy Kotsalos et al. (2019), is used instead. We name this technique particle in kernel (PIK); it resolves colliding bodies very accurately (more in the supplementary material). We would like to highlight that the actual IBM algorithm is not affected by the PIK technique.
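The collision force derived from a quadratic potential can be sketched as a short-range repulsion; this is an illustrative sketch under assumed conventions (the cutoff d0, stiffness k and function name are hypothetical, not the paper's exact parameters):

```python
import numpy as np

# Sketch of the PIK fallback: when a "foreign" Lagrangian point x_foreign lies
# inside the interpolation kernel of point x, replace the contaminated fluid
# force by a repulsion from a quadratic potential E = 0.5 * k * (d0 - d)^2
# for separations d < d0 (zero beyond the cutoff).
def pik_force(x, x_foreign, d0, k=1.0):
    r = np.asarray(x, float) - np.asarray(x_foreign, float)
    d = np.linalg.norm(r)
    if d >= d0 or d == 0.0:
        return np.zeros_like(r)
    return k * (d0 - d) * r / d   # -dE/dx: pushes the two points apart
```

The force grows linearly as the gap closes and vanishes smoothly at the cutoff, so switching between the fluid-mediated and collision regimes introduces no force jump at d = d0.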

The selected IBM version Ota et al. (2012) starts from an estimate of the force at the Lagrangian points required to enforce a no-slip condition. This force is spread into the neighbourhood of the points to communicate the constraints of the solid boundary to the fluid. The force estimate is proportional to the difference between the vertex velocity (as computed by the npFEM solver) and the velocity interpolated from the surrounding fluid nodes. The component that can be controlled in the above procedure is the npFEM vertex velocity, which is truncated whenever it exceeds a threshold proportional to the lattice speed of sound. The proportionality constant lies in the range given in Krüger et al. (2017) and reflects the fact that the simulated Mach number should remain small, since LBM errors increase dramatically at high Mach numbers. This velocity capping proves to be very stabilising when necessary. If the classic IBM Peskin (1972) is used, then a force capping has the same stabilising effect.
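The velocity capping amounts to clamping the magnitude of the vertex velocity while preserving its direction; a minimal sketch (u_max stands for the threshold discussed above; names illustrative):

```python
import numpy as np

# Sketch of the npFEM vertex-velocity capping: if the speed exceeds u_max
# (a threshold proportional to the lattice speed of sound), rescale the
# vector so its magnitude equals u_max while keeping its direction.
def cap_velocity(u, u_max):
    u = np.asarray(u, dtype=float)
    speed = np.linalg.norm(u)
    if speed <= u_max or speed == 0.0:
        return u
    return u * (u_max / speed)   # keep direction, clamp magnitude
```

Because only the magnitude is touched, the capped velocity stays kinematically consistent with the membrane motion while bounding the effective Mach number seen by the LBM.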

Once the force is computed, a median filtering over the one-ring neighbourhood of every Lagrangian point attenuates force spikes that could otherwise result in energy oscillations.
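The one-ring median filter replaces each point's force by the component-wise median over the point and its mesh neighbours; a sketch, with an assumed adjacency-list representation of the one-ring (not the framework's actual data structure):

```python
import numpy as np

# Sketch of the force median filtering: each Lagrangian point's force is
# replaced by the component-wise median over its one-ring neighbourhood
# (the point itself plus its direct mesh neighbours).
def median_filter_forces(forces, one_ring):
    forces = np.asarray(forces, dtype=float)
    out = np.empty_like(forces)
    for i, ring in enumerate(one_ring):
        neighbourhood = forces[[i] + list(ring)]
        out[i] = np.median(neighbourhood, axis=0)  # suppresses isolated spikes
    return out
```

Unlike an averaging filter, the median leaves smooth force fields essentially untouched while completely removing single-vertex outliers, which is exactly the spike-attenuation behaviour sought here.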

2.3 High Performance Computing (HPC) Design

Direct numerical simulations of cellular blood flow push the computational limits of any modern supercomputer, given the complexity of the underlying phenomena. The number of unknowns resolved per second varies from millions to trillions Rossinelli et al. (2015), and the proposed computational framework is built with genericity, modularity and performance in mind, able to tackle problems across this whole range. The computational tool is orchestrated by Palabos Latt et al. (2019), which is responsible for the data decomposition and the communication layer. Palabos (for Parallel Lattice Boltzmann Solver) is an open-source library for general-purpose computational fluid dynamics, with a “kernel” based on the lattice Boltzmann method. Palabos is written in C++ with extensive use of the Message Passing Interface (MPI) and with proven HPC capability, particularly in the domain of computational biomedicine Mountrakis et al. (2015), Zavodszky et al. (2017), Tan et al. (2018). On top of the Palabos core library, we have developed the npFEM solver, which is written in C++ and CUDA (a general-purpose parallel computing platform and programming model for NVIDIA GPUs) and is derived from the open-source library ShapeOp. There are two active branches of the npFEM library: a CPU-only implementation and a full GPU implementation leveraging NVIDIA GPUs. The GPU parallelisation strategy is based on the idea of using one CUDA-thread per Lagrangian point and one CUDA-block per blood cell. This is feasible since the most refined blood cell model has fewer points (on the discretised membrane) than the maximum allowed number of threads per block (a hardware constraint). Keeping all points of a cell within a single CUDA-block allows us to compute the entire solver time step in one CUDA-kernel call, thus making good use of caches and shared memory Bény et al. (2019).
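The one-block-per-cell mapping can be emulated on the host to illustrate the indexing; this is a sketch only (the launch helper and kernel signature are illustrative, not the npFEM CUDA code; 1024 is the typical per-block thread limit on recent NVIDIA hardware):

```python
# Sketch of the GPU mapping described above: one CUDA-block per blood cell,
# one CUDA-thread per membrane vertex, emulated sequentially on the host.
# A launch of <<<n_cells, n_vertices>>> covers every (cell, vertex) pair once.
def launch(n_cells, n_vertices, kernel, max_threads_per_block=1024):
    # a refined RBC mesh (e.g. 258 vertices) fits comfortably in one block
    assert n_vertices <= max_threads_per_block
    for block_idx in range(n_cells):          # blockIdx.x -> blood cell
        for thread_idx in range(n_vertices):  # threadIdx.x -> Lagrangian point
            kernel(block_idx, thread_idx)
```

Because every vertex of a cell lives in the same block, per-cell reductions (e.g. energies, centroids) can use shared memory without inter-block synchronisation, which is the design point exploited in Bény et al. (2019).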

Load balancing plays a critical role and impacts the efficiency and scalability of HPC applications. For our hybrid CPU/GPU system, three components require special attention. The first is the representation of the fluid domain through a homogeneous grid. The lattice sites are partitioned by Palabos and distributed to the available MPI tasks (LBM on CPUs). The second component is the set of plain Lagrangian points that describe the immersed blood cells for the IBM (see Figure 1, right-hand side image). These points are attached to their immediate fluid cells, and thus their dispatching to the available MPI tasks follows the fluid decomposition (IBM on CPUs). The immersed bodies have a dual nature: they are seen both as a set of plain Lagrangian points for the IBM and as a set of augmented Lagrangian particles (connectivity and material properties on top of position and velocity) on the solid solver side; see Figure 1, where both the plain points and the surfaces are represented. This means that the Lagrangian points are duplicated between the IBM and npFEM modules. Finally, the MPI tasks that are linked with a GPU are responsible for the solid phase. The blood cells are distributed evenly and statically to the available GPUs in a manner that is disconnected from the attribution of fluid cells and Lagrangian points to the CPUs (npFEM on GPUs). For example, assume that MPI task j (see Figure 1, left-hand side image) handles a number of blood cells, and that the blood cell with tag 1 is spatially located in the domain managed by MPI task k. The communication of the external forces, the colliding neighbours and the new state of the body at t+1 is then realised through MPI point-to-point communication for all surface vertices of the cell. The same parallel strategy is adopted in the CPU-only version.
This strategy may seem counter-intuitive, leading to a substantial communication load, especially compared to an approach in which the structural solver is attributed to nodes dynamically and retains a processor locality with the coupled flow portions. However, the theoretical scalability of our approach is compatible with the massively parallel vision of the project, as the total amount of communicated data scales with the number of blood cells, and it is independent of the number of computational nodes (except in cases with very few nodes). Indeed every surface vertex is involved in exactly two point-to-point exchanges (a fluid-to-solid and a solid-to-fluid exchange). This fact avoids an over-saturation of the network in a situation of weak scaling, provided that the capacity of the network scales with the number of used compute nodes. It further guarantees that our framework can be connected to any structural solver, as the data provided to the structural solver is always fully contained on a compute node. Our approach further ensures a targeted communication strategy, as the data actually needed by the solver can be handpicked. By providing fully decoupled solvers, we favour a generic and modular approach over ad hoc and monolithic solutions. Figure 2 presents the decoupled structure of our framework, the communication layer and the advancement rules.

Figure 1: Load balancing of the fluid and solid phases on a modern CPU/GPU computing system. Cubic domain at physiological hematocrit with RBCs and platelets. The immersed bodies exist both as plain Lagrangian points (positions and velocities for the IBM) and augmented Lagrangian points (positions, velocities and other properties relevant only to the solid solver). The LBM and IBM are executed on CPUs, while the npFEM runs on GPUs (hybrid version of the framework). In most cases, the number of GPUs is far smaller than the number of available MPI tasks, e.g. Piz Daint with a ratio of 1:12.
Figure 2: Modular structure of our computational framework. We present the two independent solver streams and the required MPI point-to-point communication for advancing the physical system in time. The decoupling of the solvers leads to a framework that is agnostic to the underlying numerical methods. This diagram remains the same for both the CPU-only version of the framework and the CPU/ GPU implementation (hybrid).

3 Results & Discussion

The goal of this section is to demonstrate the capability of our computational framework to efficiently handle problems of varying size. This is done through an exhaustive presentation of performance metrics obtained on Piz Daint, the flagship system of the Swiss National Supercomputing Centre (CSCS), highly ranked both worldwide and in Europe according to the TOP500 list (June 2019). This supercomputer has 5704 GPU nodes equipped with 12 cores and one NVIDIA GPU each, and 1813 CPU nodes equipped with 36 cores each. A complete presentation of the supercomputer can be found in the supplementary material. Our main focus is the hybrid version of the framework, where the deformable blood cells are resolved on the GPUs, while the blood plasma and the FSI are resolved on the CPUs.

For every case study performed in this work, the flow field has a constant, low shear rate, realised with the help of a moving top wall and a fixed bottom wall. This low shear rate is chosen in order to reproduce the experiments in Chopard et al. (2017), and is not due to a computational limitation. Table 1 summarises the different domains, represented through their dimensions. The flow is aligned with one axis, the channel height with another, and periodic boundary conditions apply in the two directions parallel to the walls. Furthermore, the hematocrit of the systems varies between 35% and 45%, covering the whole physiological range. The domain is initialised by randomly positioning the blood cells (without avoiding interpenetration) and then executing the computational framework for a few thousand steps while the fluid and FSI solvers are deactivated. This novel cell packing approach relies on the very efficient collision detection offered by Palabos and on the unconditional stability of the npFEM solver, which can resolve extreme deformations and interpenetrations. The platelets are simulated as nearly-rigid oblate ellipsoids whose diameter, thickness and volume match average values for non-activated platelets. The platelet-to-RBC ratio is substantially larger than the physiological one (Mehrabadi et al. (2015)). This is a deliberate choice intended to provide more samples for the statistical analysis of platelet transport. As for the shape of the RBCs, the normal biconcave shape is used. A complete list of all parameters used in this study can be found in the supplementary material. The performance metrics are followed by an analysis of platelet transport. A series of experiments with varying RBC viscoelasticity and channel height reveals the idiosyncratic behaviour of the platelets and their sensitivity to various flow factors.

Computational Domain            50x50x50   50x100x50   50x500x50   50x1000x50   100x1000x100
Hematocrit 35% (RBCs / PLTs)    476 / 95   953 / 190   4765 / 953  9531 / 1906  38126 / 7625
Hematocrit 45% (RBCs / PLTs)    612 / 122  1225 / 245  6127 / 1225 12255 / 2451 49020 / 9804
Table 1: Numbers of blood cells, RBCs and platelets (PLTs), for the different case studies.

3.1 Performance Analysis

Simulations at the spatial scale of a millimetre commonly ignore the detailed particulate nature of blood because of the tremendous computational cost, and instead model the effect of the particles through continuum modelling. On the other hand, in publications of state-of-the-art fully resolved whole blood simulations Vahidkhah and Bagchi (2015), Blumers et al. (2017), Závodszky et al. (2017), Tan et al. (2018), the overall spatial scale of the simulation remains very small, of the order of a few tens of micrometres. The suggested HPC framework is built towards simulating macroscopic volumes of whole blood while representing the details of the microscopic physics, thus offering users the possibility to address a large range of problems with clinical relevance. In the context of the current scientific goals, the performance metrics of this HPC framework must be considered in the light of weak scaling. Indeed, the purpose of seeking more powerful computational resources is not to improve the resolution or increase the time span of the simulation, but to extend the physical volume of blood considered in the model.

In weak scaling, the computational load per processor (either CPU or GPU) remains constant; the problem size thus increases proportionally with the number of computational hardware units. The reference case study is the 50x50x50 domain, solved on 5 GPU nodes, with reference time denoted t_ref. The weak scaling efficiency is given by η(N) = t_ref / t_N, where t_N is the time spent on N GPU nodes for a domain N/5 times larger than the reference one. Figure 3 presents the efficiency of the proposed computational framework for a problem growth of up to 80 times compared to the reference domain (at 400 GPU nodes). Even if the largest tested domain is still distinctly smaller than a macroscopic volume, the direction of interest (the wall-bounded direction) approaches scales of macroscopic extent, and the remaining directions are adequately resolved through periodicity. Our long-term vision for macroscopic flows includes assigning more blood cells per GPU. This in turn requires strong CPU performance to cope with the annexed preparatory operations (the “Other” section in Figures 4, 5), which might be matched by novel, upcoming supercomputing systems. The code presents good efficiency, and its performance does not degrade at higher hematocrit. Other frameworks that are based on a modular approach for the coupling of fluid and solid solvers (Rahimian et al. (2010), Clausen et al. (2010), Mountrakis et al. (2015), Tan et al. (2018)) demonstrate efficiencies in a comparable range. On the contrary, frameworks that follow the monolithic paradigm Rossinelli et al. (2015) deliver a more impressive efficiency. Nevertheless, this is a small penalty to pay for genericity and modularity over ad hoc solutions.
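The weak-scaling efficiency computation above can be sketched in a few lines (timings and the function name are illustrative, not the measured Piz Daint data):

```python
# Sketch of the weak-scaling efficiency used here: eta(N) = t_ref / t_N,
# where t_ref is the time per iteration of the 5-node reference run and t_N
# the time on N GPU nodes for an N/5-times-larger domain.
def weak_scaling_efficiency(t_ref, timings):
    """timings: {n_nodes: seconds_per_iteration}; returns {n_nodes: efficiency}."""
    return {n: t_ref / t for n, t in timings.items()}
```

Perfect weak scaling gives η = 1 at every node count; values below 1 quantify the overhead (mostly communication) incurred as the machine grows with the problem.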

Figure 3: Weak scaling efficiency for various domains at different hematocrit. To better understand the problem sizes consult table 1. The efficiency corresponds to the hybrid version of the code. The GPU nodes in Piz Daint are equipped with 12 cores and one NVIDIA GPU each.

Figure 4 presents the average execution time per iteration for different hematocrit values. The bottom layer of the bars, labelled “Other”, contains operations (executed on CPUs only) such as the computation of external forces, collision detection, particle advancement and various book-keeping. The “error” bars delimit the deviation from the average, where the minimum corresponds to the reference case study and the maximum to the largest case study (see Table 1) in the context of weak scaling. A striking observation is that the GPU-ported npFEM solver accounts for only a small fraction of the total execution time, especially when compared with other state-of-the-art implementations Mountrakis et al. (2015), Zavodszky et al. (2017), in which the solid solver dominates the overall time. On the other hand, the fluid solver and the FSI account for a substantial share of the execution time, with consistent scaling over larger domains and higher hematocrit. The communication varies around a modest fraction of the execution time and does not constitute a bottleneck, since it is realised through optimised non-blocking point-to-point communication. The “Other” group of operations occupies a large portion of the total time, and this hot-spot reflects the choice of genericity and modularity. A possible conclusion from these observations could be to port the remaining parts of the solver to GPUs, given the great performance of the solid solver. It is, however, debatable whether this choice would be optimal, given that modern supercomputers tend to become “fatter” in both CPUs and GPUs, as shown by the example of Summit (2 CPUs with 21 cores each and 6 NVIDIA Volta GPUs per node, ranked 1st in the TOP500 list, June 2019). Thus, the best strategy is to fully exploit the available hardware rather than develop one-sided applications. Another counter-argument is that some numerical methods, such as the IBM, have a large and complex memory footprint that renders them less GPU-friendly. An earlier attempt to port the whole framework to GPUs Bény et al. (2019) could not serve as a justification to move in this direction.

Figure 4: Execution time per iteration for different hematocrits (hybrid version). The “Other” category contains CPU-only operations such as the computation of external forces, collision detection, particle advancement and various book-keeping.

Figure 5 presents the execution time per iteration and compares the hybrid (CPU/GPU) version with the CPU-only version, including both the measured time averages and the deviations from the average. These results provide further insight into the weak scaling results, up to a domain of 50x500x50 at 35% hematocrit. The main assumption is a one-to-one correspondence between the GPU and CPU nodes of Piz Daint, e.g. solving the computational domain 50x500x50 on 50 GPU nodes (one GPU and 12 cores each) for the hybrid version and on 50 CPU nodes (36 cores each) for the CPU-only version. The npFEM solver on its own exhibits a speedup of about 4, favouring the GPU implementation. Moreover, in the CPU-only version the solid solver clearly constitutes an overwhelming part of the overall execution time, while in the hybrid version the GPU porting removes this bottleneck in a very efficient manner. More performance analysis results can be found in the supplementary material.

Figure 5: Comparison of execution time per iteration for the Hybrid (CPU/GPU) and CPU-only versions of the framework at 35 % hematocrit. The GPU nodes have 12 cores and one GPU each, while the CPU nodes have 36 cores each.

The main challenge for computational tools that simulate the particulate nature of blood is to solve systems with a sufficient number of blood cells, for a physically relevant duration (of the order of 1 s), in a reasonable time (less than a week), with the smallest possible amount of computational resources. The proposed computational framework achieves the aforementioned goals and compares well with other state-of-the-art solvers Zavodszky et al. (2017), Tan et al. (2018). The main novelty is that we achieve this using high-fidelity models for the blood cells, which have a richer physical content than simple mass-spring systems. To the best of our knowledge, there is no other computational framework using a fully validated FEM-based solid solver that can achieve these target values.

3.2 Platelet Transport with varying Viscoelasticity of RBCs

RBC viscoelastic behaviour, a collective term for the contribution of both the membrane and the cytoplasm, is a widely accepted factor with critical impact on health and disease (pathophysiological conditions). Pathological alterations in RBC deformability have been associated with various diseases Tomaiuolo (2014) such as malaria, sickle cell anaemia, diabetes, hereditary disorders, chronic obstructive pulmonary disease, etc. Despite its crucial role, RBC viscoelasticity is overlooked in the majority of computational tools for cellular flows. Here, we study the effect of different viscoelastic parameters of the RBCs on platelet transport and discriminate each case with the use of the mean square displacement (MSD) in the wall-bounded direction. The altered parameter is the viscoelastic coefficient presented in Kotsalos et al. (2019). The MSD is defined as MSD(t) = <[y_i(t) - y_i(0)]^2>, with y_i(t) the position of platelet i in the wall-bounded y-direction and the angle brackets denoting an average over platelets. The averaging spans either all the available platelets, i.e. the RBC-rich layer (RBC-RL) and the cell-free layer (CFL), or only the platelets of the RBC-RL. The flow setup includes a constant shear rate, with the domain size and hematocrit as given in table 1.
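The MSD defined above can be evaluated from recorded platelet trajectories. A minimal sketch, using synthetic random-walk trajectories in place of the simulated platelet positions (array shapes and values are hypothetical, for illustration only):

```python
import numpy as np

def msd_y(y):
    """MSD(t) = <[y_i(t) - y_i(0)]^2>_i for trajectories y of shape
    (n_platelets, n_timesteps), averaged over the platelets i."""
    disp = y - y[:, :1]              # displacement from the initial position
    return np.mean(disp**2, axis=0)  # average over the platelet index

# Hypothetical random-walk trajectories standing in for platelet positions.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(0.0, 0.1, size=(50, 1000)), axis=1)
msd = msd_y(y)
```

Restricting `y` to the platelets of the RBC-RL (or keeping all of them) reproduces the two averaging choices described in the text.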

Figure 6 presents the MSD over all the platelets of the domain for three different values of the viscoelastic parameter. There is a clear distinction between the less viscous and the more viscous RBCs in their projected effect on platelet transport: the higher the viscoelasticity, the slower the response to the imposed external loads.

Figure 6: MSD with averaging over all the available platelets. The curves correspond to different values of the viscoelastic parameter.

The computation of the platelet diffusion coefficient D demands the MSD to be averaged over the RBC-RL; its slope then corresponds to 2D. Figure 7 presents this MSD together with a linear fit of the curves. The diffusion coefficient is in all cases in agreement with previously reported values Zydney and Colton (1988), Affeld et al. (2013), Vahidkhah et al. (2014). It is two to three orders of magnitude higher than the Brownian diffusivity, suggesting RBC-augmented diffusion. An interesting observation, in conjunction with the studies of Kumar and Graham Kumar and Graham (2011), Kumar and Graham (2012) and assuming that the more viscous RBCs are more “rigid” (slower response to external forcing), is that the less viscous RBCs lead to higher platelet diffusivity and thus to a faster concentration towards the walls. The varying diffusivity can be explained by the fact that in heterogeneous collisions the net displacement of the stiff particle (platelet) is substantially larger than that of the floppy particle (RBC), and the displacement grows with the rigidity ratio Kumar and Graham (2011). Thus for less viscous RBCs we expect larger platelet displacements and hence a larger diffusion coefficient, as shown in Figure 7. In all experiments conducted, the platelets that reach the walls tend to stay in the CFL, behaving as if trapped in this layer.
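The extraction of D from the fitted slope can be sketched as follows; the MSD data here are synthetic and the value of `D_true` is hypothetical, chosen only to illustrate the slope-to-coefficient relation MSD(t) = 2*D*t:

```python
import numpy as np

# For 1D diffusion, MSD(t) ~ 2*D*t at steady state; D follows from the
# slope of a linear fit over the late-time window.
D_true = 1.5e-7                      # hypothetical diffusion coefficient
t = np.linspace(0.0, 0.3, 301)       # time window [s]
msd = 2.0 * D_true * t               # idealised linear MSD
slope, intercept = np.polyfit(t, msd, 1)
D_fit = slope / 2.0
```

With real MSD curves, the fit would be restricted to the steady-state portion, as done for Figure 7.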

Figure 7: MSD with averaging over the platelets in the RBC-RL. The slopes at the bottom right are obtained from a linear fit of each curve. The time origin corresponds to 300 ms after the beginning of the simulations. Diffusion coefficients are measured at steady state.

3.3 Platelet Transport for Larger Geometries

Most studies are bounded to domains of a few micrometers and low hematocrit due to the high computational cost. Nevertheless, interesting phenomena can be amplified as the sizes increase Mountrakis et al. (2013). Given our HPC-capable framework, we are interested in quantifying the diffusivity of the platelets as the channel height varies. Here, a flow field with constant shear rate and hematocrit is considered. The wall-bounded direction takes three different sizes, while the other, periodic directions remain fixed (see table 1). The dimensionless numbers that describe the dynamics of the problem are the capsule Reynolds number Re_c = rho*gamma_dot*r^2/mu, with gamma_dot the shear rate and r the characteristic length of the capsule, and the capillary number Ca = mu*gamma_dot*r/G_s, with mu the dynamic viscosity of blood plasma and G_s the membrane shear modulus (see supplementary material and Kotsalos et al. (2019)). Figure 8 shows the mean square displacement in the RBC-RL for the largest case study along with the linear fit. The diffusion coefficients for all the different experiments are about two to three orders of magnitude higher than the Brownian diffusivity Vahidkhah et al. (2014), Zydney and Colton (1988). Figure 9 summarises the simulations conducted for the varying-channel case study.
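The two dimensionless numbers can be evaluated directly from the physical parameters; the values below are illustrative order-of-magnitude choices, not the paper's exact settings:

```python
# Standard definitions for a capsule in shear flow.
rho = 1025.0        # plasma density [kg/m^3]       (illustrative)
mu = 1.2e-3         # plasma dynamic viscosity [Pa s] (illustrative)
gamma_dot = 100.0   # shear rate [1/s]              (illustrative)
r = 4.0e-6          # characteristic capsule (RBC) length [m]
G_s = 5.0e-6        # membrane shear modulus [N/m]  (illustrative)

Re_c = rho * gamma_dot * r**2 / mu  # capsule Reynolds number
Ca = mu * gamma_dot * r / G_s       # capillary number
```

For such parameters Re_c is well below unity (inertia is negligible at the cell scale), while Ca of order 0.1 indicates moderate membrane deformation.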

Figure 8: Mean square displacement measured in the RBC-rich layer. The time origin corresponds to 300 ms after the beginning of the simulations. A linear fit of the MSD gives a slope equal to 2D, where D is the diffusion coefficient of the platelets.
Figure 9: Shear flow generated by our computational framework for fully resolved blood flow simulations. The left image shows two different viewpoints of the domain at 35% hematocrit. The right image depicts a domain 50x100x50 at 35% hematocrit.

4 Conclusions

In this study, we presented a computational framework for digital blood. The full resolution of the particulate nature of blood is a challenging venture, especially when it is compiled into a framework built on generality, modularity and performance without compromising robustness and accuracy. The individual numerical techniques used for the simulation of the blood constituents (LBM for the fluid and FEM for the solid phase) are characterised by their high fidelity in capturing physical phenomena, and their coupling has been shown to sufficiently resolve the complex interaction between the blood cells.

This kind of computational tool complements the toolset for a digital lab. More precisely, the present project complements another research activity based on a coarse-grained approximation of blood using stochastic methods and random walks. The fully resolved models, apart from providing in-depth investigations on various case studies, are used to fine-tune the coarse-grained models, e.g. by providing diffusion coefficients of various particles, thus constituting a critical component in this integrative approach towards digital blood. Our future scientific endeavours will move in this multi-scale direction, as recently depicted in Herschlag et al. (2019).


We acknowledge support from the PASC project 2017-2020, Virtual Physiological Blood: an HPC framework for blood flow simulations in vasculature and in medical devices.


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 823712 (CompBioMed2 project).


  • K. Affeld, L. Goubergrits, N. Watanabe, and U. Kertzscher (2013) Numerical and experimental evaluation of platelet deposition to collagen coated surface at low shear rates. Journal of Biomechanics 46 (2), pp. 430 – 436. Note: Special Issue: Biofluid Mechanics External Links: Document Cited by: §3.2.
  • J. Bény, C. Kotsalos, and J. Latt (2019) Toward full gpu implementation of fluid-structure interaction. In 2019 18th International Symposium on Parallel and Distributed Computing (ISPDC), pp. 16–22. External Links: Document Cited by: §2.3, §3.1.
  • M. Bernaschi, M. Bisson, T. Endo, S. Matsuoka, M. Fatica, and S. Melchionna (2011) Petaflop biofluidics simulations on a two million-core system. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’11, New York, NY, USA, pp. 4:1–4:12. External Links: Document Cited by: §1.
  • A. L. Blumers, Y. Tang, Z. Li, X. Li, and G. E. Karniadakis (2017) GPU-accelerated red blood cells simulations with transport dissipative particle dynamics. Computer Physics Communications 217, pp. 171 – 179. External Links: Document Cited by: §3.1.
  • H. Chang, A. Yazdani, X. Li, K. A.A. Douglas, C. S. Mantzoros, and G. E. Karniadakis (2018) Quantifying platelet margination in diabetic blood flow. Biophysical Journal 115 (7), pp. 1371 – 1382. External Links: Document Cited by: §1.
  • B. Chopard, D. R. de Sousa, J. Lätt, L. Mountrakis, F. Dubois, C. Yourassowsky, P. Van Antwerpen, O. Eker, L. Vanhamme, D. Perez-Morga, G. Courbebaisse, E. Lorenz, A. G. Hoekstra, and K. Z. Boudjeltia (2017) A physical description of the adhesion and aggregation of platelets. Royal Society Open Science 4 (4), pp. 170219. External Links: Document Cited by: §3.
  • J. R. Clausen, D. A. Reasor, and C. K. Aidun (2010) Parallel performance of a lattice-boltzmann/finite element cellular blood flow solver on the ibm blue gene/p architecture. Computer Physics Communications 181 (6), pp. 1013 – 1020. External Links: Document Cited by: §3.1.
  • M. M. Dupin, I. Halliday, C. M. Care, L. Alboul, and L. L. Munn (2007) Modeling the flow of dense suspensions of deformable particles in three dimensions. Physical review E 75 (6 Pt 2). External Links: Document Cited by: §1.
  • D. A. Fedosov, B. Caswell, S. Suresh, and G. E. Karniadakis (2011) Quantifying the biophysical characteristics of plasmodium-falciparum-parasitized red blood cells in microcirculation. Proceedings of the National Academy of Sciences 108 (1), pp. 35–39. External Links: Document Cited by: §1.
  • D. A. Fedosov, B. Caswell, and G. E. Karniadakis (2010) A multiscale red blood cell model with accurate mechanics, rheology, and dynamics. Biophysical Journal 98 (10), pp. 2215–2225. External Links: Document Cited by: §1, §2.1.
  • Y. T. Feng, K. Han, and D. R. J. Owen (2007) Coupled lattice boltzmann method and discrete element modelling of particle transport in turbulent fluid flows: computational issues. International Journal for Numerical Methods in Engineering 72 (9), pp. 1111–1134. External Links: Document Cited by: §2.1.
  • A. L. Fogelson and K. B. Neeves (2015) Fluid mechanics of blood clot formation. Annual Review of Fluid Mechanics 47 (1), pp. 377–403. External Links: Document Cited by: §1.
  • J. B. Freund (2014) Numerical simulation of flowing blood cells. Annual Review of Fluid Mechanics 46 (1), pp. 67–95. External Links: Document Cited by: §1.
  • G. Herschlag, J. Gounley, S. Roychowdhury, E. Draeger, and A. Randles (2019) Multi-physics simulations of particle tracking in arterial geometries with a scalable moving window algorithm. In IEEE Cluster 2019, Albuquerque, New Mexico USA. Cited by: §4.
  • C. Kotsalos, J. Latt, and B. Chopard (2019) Bridging the computational gap between mesoscopic and continuum modeling of red blood cells for fully resolved blood flow. Journal of Computational Physics 398. External Links: ISSN 0021-9991, Document Cited by: §1, §1, §2.1, §2.2, §3.2, §3.3.
  • T. Krüger (2012) Computer simulation study of collective phenomena in dense suspensions of red blood cells under shear. Vieweg+Teubner Verlag. External Links: Document Cited by: §2.1, §2.2.
  • T. Krüger, H. Kusumaatmaja, A. Kuzmin, O. Shardt, G. Silva, and E. M. Viggen (2017) The lattice boltzmann method. External Links: Document Cited by: §2.1, §2.2.
  • T. Krüger, D. Holmes, and P. V. Coveney (2014) Deformability-based red blood cell separation in deterministic lateral displacement devices—a simulation study. Biomicrofluidics 8 (5), pp. 054114. External Links: Document Cited by: §1.
  • A. Kumar and M. D. Graham (2011) Segregation by membrane rigidity in flowing binary suspensions of elastic capsules. Phys. Rev. E 84, pp. 066316. External Links: Document Cited by: §3.2.
  • A. Kumar and M. D. Graham (2012) Margination and segregation in confined flows of blood and other multicomponent suspensions. Soft Matter 8, pp. 10536–10548. External Links: Document Cited by: §3.2.
  • J. Latt, O. Malaspinas, D. Kontaxakis, A. Parmigiani, D. Lagrava, F. Brogi, M. B. Belgacem, Y. Thorimbert, S. Leclaire, S. Li, C. Kotsalos, F. Marson, C. Coreixas, R. Petkantchin, F. Raynaud, R. Conradin, J. Beny, and B. Chopard (2019) Palabos: parallel lattice boltzmann solver. External Links: Document Cited by: §1, §2.3.
  • X. Li, H. Li, H. Chang, G. Lykotrafitis, and G. Em Karniadakis (2017) Computational Biomechanics of Human Red Blood Cells in Hematological Disorders. Journal of Biomechanical Engineering 139 (2). External Links: Document Cited by: §1.
  • M. Mehrabadi, D. N. Ku, and C. K. Aidun (2015) A continuum model for platelet transport in flowing blood based on direct numerical simulations of cellular blood flow. Annals of Biomedical Engineering 43 (6), pp. 1410–1421. External Links: Document Cited by: §1, §3.
  • L. Mountrakis, E. Lorenz, and A. G. Hoekstra (2013) Where do the platelets go? a simulation study of fully resolved blood flow through aneurysmal vessels. Interface Focus 3 (2), pp. 20120089. External Links: Document Cited by: §3.3.
  • L. Mountrakis, E. Lorenz, and A. G. Hoekstra (2017) Revisiting the use of the immersed-boundary lattice-boltzmann method for simulations of suspended particles. Phys. Rev. E 96, pp. 013302. External Links: Document Cited by: §2.1.
  • L. Mountrakis, E. Lorenz, O. Malaspinas, S. Alowayyed, B. Chopard, and A. G. Hoekstra (2015) Parallel performance of an ib-lbm suspension simulation framework. Journal of Computational Science 9, pp. 45–50. External Links: Document Cited by: §2.3, §3.1, §3.1.
  • K. Ota, K. Suzuki, and T. Inamuro (2012) Lift generation by a two-dimensional symmetric flapping wing: immersed boundary-lattice boltzmann simulations. Fluid Dynamics Research 44 (4). External Links: Document Cited by: §2.1, §2.2.
  • [28] (2019) Palabos. Cited by: §1, §2.3.
  • C. S. Peskin (1972) Flow patterns around heart valves: a numerical method. Journal of Computational Physics 10 (2), pp. 252–271. External Links: Document Cited by: §2.2.
  • A. Peters, S. Melchionna, E. Kaxiras, J. Lätt, J. Sircar, M. Bernaschi, M. Bisson, and S. Succi (2010) Multiscale simulation of cardiovascular flows on the ibm bluegene/p: full heart-circulation system at red-blood cell resolution. In SC ’10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–10. External Links: Document Cited by: §1.
  • A. Rahimian, I. Lashuk, S. Veerapaneni, A. Chandramowlishwaran, D. Malhotra, L. Moon, R. Sampath, A. Shringarpure, J. Vetter, R. Vuduc, D. Zorin, and G. Biros (2010) Petascale direct numerical simulation of blood flow on 200k cores and heterogeneous architectures. In SC ’10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–11. External Links: Document Cited by: §1, §3.1.
  • D. A. Reasor, J. R. Clausen, and C. K. Aidun (2011) Coupling the lattice-boltzmann and spectrin-link methods for the direct numerical simulation of cellular blood flow. International Journal for Numerical Methods in Fluids 68 (6), pp. 767–781. External Links: Document Cited by: §1.
  • D. Rossinelli, Y. Tang, K. Lykov, D. Alexeev, M. Bernaschi, P. Hadjidoukas, M. Bisson, W. Joubert, C. Conti, G. Karniadakis, M. Fatica, I. Pivkin, and P. Koumoutsakos (2015) The in-silico lab-on-a-chip: petascale and high-throughput simulations of microfluidics at cell resolution. In SC ’15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12. External Links: Document Cited by: §1, §1, §1, §2.3, §3.1.
  • A. Sen Gupta (2016) Role of particle size, shape, and stiffness in design of intravascular drug delivery systems: insights from computations, experiments, and nature. Wiley Interdisciplinary Reviews: Nanomedicine and Nanobiotechnology 8 (2), pp. 255–270. External Links: Document Cited by: §1.
  • X. Shan and H. Chen (1993) Lattice boltzmann model for simulating flows with multiple phases and components. Physical Review E 47 (3), pp. 1815–1819. External Links: Document Cited by: §2.1.
  • [36] (2014) ShapeOp. Cited by: §2.3.
  • J. Tan, T. R. Sinno, and S. L. Diamond (2018) A parallel fluid–solid coupling model using lammps and palabos based on the immersed boundary method. Journal of Computational Science 25, pp. 89 – 100. External Links: Document Cited by: §2.3, §3.1, §3.1, §3.1.
  • G. Tomaiuolo (2014) Biomechanical properties of red blood cells in health and disease towards microfluidics. Biomicrofluidics 8 (5), pp. 51501. External Links: Document Cited by: §1, §3.2.
  • K. Vahidkhah and P. Bagchi (2015) Microparticle shape effects on margination, near-wall dynamics and adhesion in a three-dimensional simulation of red blood cell suspension. Soft Matter 11, pp. 2097–2109. External Links: Document Cited by: §1, §3.1.
  • K. Vahidkhah, S. L. Diamond, and P. Bagchi (2014) Platelet dynamics in three-dimensional simulation of whole blood. Biophysical Journal 106 (11), pp. 2529 – 2540. External Links: Document Cited by: §1, §3.2, §3.3.
  • D. Xu, C. Ji, E. Avital, E. Kaliviotis, A. Munjiza, and J. Williams (2017) An Investigation on the Aggregation and Rheodynamics of Human Red Blood Cells Using High Performance Computations. Scientifica. External Links: Document Cited by: §1.
  • D. Xu, E. Kaliviotis, A. Munjiza, E. Avital, C. Ji, and J. Williams (2013) Large scale simulation of red blood cell aggregation in shear flows. Journal of Biomechanics 46 (11), pp. 1810 – 1817. External Links: Document Cited by: §1.
  • G. Zavodszky, B. van Rooij, V. Azizi, S. Alowayyed, and A. Hoekstra (2017) Hemocell: a high-performance microscopic cellular library. Procedia Computer Science 108, pp. 159–165. External Links: Document Cited by: §2.3, §3.1, §3.1.
  • G. Závodszky, B. van Rooij, V. Azizi, and A. Hoekstra (2017) Cellular level in-silico modeling of blood rheology with an improved material model for red blood cells. Frontiers in Physiology 8, pp. 1–14. External Links: Document Cited by: §1, §1, §2.1, §3.1.
  • H. Zhao and E. S. G. Shaqfeh (2011) Shear-induced platelet margination in a microchannel. Phys. Rev. E 83, pp. 061924. External Links: Document Cited by: §1.
  • A. Zydney and C. K. Colton (1988) Augmented solute transport in the shear flow of a concentrated suspension. International Journal of Multiphase Flow 10 (1), pp. 77–96. Cited by: §3.2, §3.3.