A new object-oriented framework for solving multiphysics problems via combination of different numerical methods

04/28/2019
by   Juan Michael Sargado, et al.
NORCE
0

Many interesting phenomena are characterized by the complex interaction of different physical processes, each often best modeled numerically via a specific approach. In this paper, we present the design and implementation of an object-oriented framework for performing multiphysics simulations that allows for the monolithic coupling of different numerical schemes. In contrast, most of the currently available simulation tools are tailored towards a specific numerical model, so that one must resort to coupling different codes externally based on operator splitting. The current framework has been developed following the C++11 standard, and its main aim is to provide an environment that affords enough flexibility for developers to implement complex models while at the same time giving end users a maximum amount of control over finer details of the simulation without having to write additional code. The main challenges towards realizing these objectives are discussed in the paper, together with the manner in which they are addressed. Along with core objects representing the framework skeleton, we present the various polymorphic classes that may be utilized by developers to implement new formulations, material models and solution algorithms. The code architecture is designed to allow achievement of the aforementioned functionalities with a minimum level of inheritance in order to improve the learning curve for programmers who are not acquainted with the software. Key capabilities of the framework are demonstrated via the solution of numerical examples dealing on composite torsion, Biot poroelasticity (featuring a combined finite element-finite volume formulation), and brittle crack propagation using a phase-field approach.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

12/19/2019

Assembly of multiscale linear PDE operators

In numerous applications the mathematical model consists of different pr...
09/24/2019

An adaptive edge-based smoothed finite element method (ES-FEM) for phase-field modeling of fractures at large deformations

This work presents the Griffith-type phase-field formation at large defo...
03/20/2021

FEniCS-preCICE: Coupling FEniCS to other Simulation Software

The new software FEniCS-preCICE is a middle software layer, sitting in b...
03/03/2021

Second-order Decoupled Energy-stable Schemes for Cahn-Hilliard-Navier-Stokes equations

The Cahn-Hilliard-Navier-Stokes (CHNS) equations represent the fundament...
09/11/2019

DuMu^x 3 – an open-source simulator for solving flow and transport problems in porous media with a focus on model coupling

We present version 3 of the open-source simulator for flow and transport...
09/12/2018

A Conceptual Approach to Complex Model Management with Generalized Modelling Patterns and Evolutionary Identification

Complex systems' modeling and simulation are powerful ways to investigat...
02/16/2017

High Accuracy Mantle Convection Simulation through Modern Numerical Methods. II: Realistic Models and Problems

Computations have helped elucidate the dynamics of Earth's mantle for se...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Computer simulations involving multiple interacting physical processes are becoming more and more common, in large part due to the availability of machines with fast processors having multiple cores and high memory capabilities. While the most demanding simulations still need to be run on high-performance computing clusters, nowadays a majority of simulations related to both research and industry are in fact carried out on much more modest systems such as desktop and even laptops computers. Likewise, the field of computational science has seen rapid expansion. In 1928 when Courant, Friedrich and Lewy first described in the now famous CFL stability condition, finite differences had yet to find wide application for solving partial differential equations. In contrast, there is now a plethora of available techniques dealing on numerical solution of PDEs. For instance, finite elements have become the method of choice for solid mechanics applications, while in computational fluid dynamics finite difference and finite volume methods find popular usage, and to some extent also boundary elements. More recently, a substantial amount of research has gone towards the development of particle and meshfree schemes, and the invention of new approaches such as peridynamics, isogeometric analysis and virtual elements.

The main ingredient in the performance of scientific computing is of course, software. Initial impetus for the finite element method originated in the aerospace industry with Boeing, and likewise advancement of FE software was driven by industrial needs beginning with the work of Wilson at Aerojet and the subsequent development of the NASTRAN® code in the late 1960s.[12, 38] The commercial FE codes Abaqus® and ANSYS® had their initial releases in the 1970s and remain among the most popular simulation tools in industry. On the other hand, the present landscape of computational science is markedly different. For instance it can be argued that there is now a clearer divide between academia and industry, with most of the programming work related to implementation of new approaches and models being done by academic researchers utilizing interpreted languages such as MATLAB and Python. The popularity of these platforms stems from the fact that they allow for rapid implementation, prototyping and visualization of results as well easier debugging due to access to intermediate states of variables during execution time. On the other hand, codes used to generate results in publications are often hand-tailored to the specific problems being solved and are impossible to apply without substantial modification to other cases. The unfortunate result is that a lot of different numerical methods are accompanied by implementations that are not robust enough for general testing, which in turn hinders their investigation and acceptance by the community at large. Additionally, execution speed becomes a critical factor for simulations dealing with large problem sizes, in which case it is advisable to make use of optimized software written in a compiled language such as C/C++ or Fortran.

In recent years, the trend has been towards open-source software packaged as libraries in order to provide the greatest amount of flexibility to researchers. One such project that has found quite a bit of success in the community is deal.II[6], which is written in C++ and designed for performing numerical simulations based on adaptive quadrilateral/hexahedral meshes with automatic handling of hanging nodes. Another is FEniCS[22], a software platform written in C++/Python for implementing finite element solutions to general partial differential equations that gives emphasis to the proper setup of function spaces and variational forms. On the other hand, the DUNE project[7] was initially designed as a collection of modules providing modern interfaces to various legacy codes and on which libraries implementing numerical methods can be built, for instance FE and discontinuous Galerkin[14]. In general, a developer implementing some particular physical model makes calls to said libraries via a main file. The latter is usually written in the same language as used by the library, or in some cases via a special interpreted language such as UFL[3]. Furthermore the standard procedure is for problems involving the same physical model but different geometries and boundary conditions to be handled by separate main files. In effect, developers (e.g. researchers) who perform the coding and compilation for their respective models are also the intended end users of the resulting executable binaries.

An alternative approach is to write code that is implemented as components within existing software. This is known as the framework approach and is the option offered by most commercial simulation software (e.g. Abaqus and ANSYS) along with some open-source codes such as OOFEM [27], OpenFOAM [37] and FEAP [34]. An important characteristic of the framework approach is inversion of control: program flow is defined and controlled by the existing framework, with developers writing code that is designed to be called by the framework itself at specific instances during program execution. An immediate consequence of the framework approach is the clear delineation between the role of coder and end user. In particular, an implicit goal is for end users to be able to solve different problems (i.e. in terms of geometry and boundary/initial conditions) without having to recompile the software. While a framework’s more rigid structure allows for less flexibility compared to what can be achieved through a library approach, it nevertheless accomplishes two things: first, it forces a developer to consider an existing flow of control when conceptualizing and implementing a particular model and discourages the use of procedures that are too complicated to implement efficiently. At the least, it provides a common ground on which to judge the performance penalty incurred by such algorithms in relation to other methods. Secondly, developers are forced to follow the general procedure built into the framework with regards to the interaction between software and end user (for instance concerning specification of model parameters), which makes the deployment and testing of new models and methods straightforward for those already familiar with the style of input utilized by the software.

At present, most available simulation codes (both proprietary and open-source) are designed to accommodate a specific method such as FEM, or are otherwise tailored towards certain applications. Creating a general software package that can straightforwardly implement different numerical schemes is challenging, as these methods generally utilize varying formulations of the governing equations, and likewise may require different ways of imposing boundary conditions depending on the approximation properties of the assumed function space.

In this paper, we present the design of BROOMStyx, a new open-source software framework that seeks to address the challenge of attaining seamless coupling between fundamentally dissimilar formulations coming from different contributors. In particular, a main goal of the software to allow for monolithic solution strategies for combined formulations, or in the case of operator splitting schemes, to avoid the writing of intermediate output files that is necessary when coupling together different codes. The remainder of this paper is structured as follows: Section 2 explains the motivation behind development of the framework as well as some of its key features. Section 3 deals with the code architecture and aspects arising from its object-oriented design; different groups of classes that make up the framework are identified, along with the role of each class type and how it interacts with other components of the software. Section 4

expounds on the implementation of custom container classes for real-valued vector and matrices as well as operator overloading in a way that allows for user-friendly syntax in the coding of vector/matrix operations without sacrificing efficiency of computation. Meanwhile, aspects related to shared memory parallelization are briefly discussed in Section

5. To show that the current framework is applicable to a wide class of problems, we solve several numerical examples dealing on various topics in the mechanics of solids and porous media, and which have been specifically chosen to demonstrate novel features of the software as pointed out in previous sections. These can be found in Section 6, and include a discussion of the relevant theory for each problem and important implementation details in addition to the actual numerical results. Finally, concluding remarks are given in Section 7.

2 Design considerations

The name BROOMStyx is short for “Broad Range, Object-Oriented Multiphysics Software”, and as stated previously the project aims to provide a sufficiently flexible and extensible general framework for developers to implement new models and combine different formulations while at the same time giving end users a powerful tool for performing sophisticated numerical simulations. In this section, we discuss the main points that guided development of the BROOMStyx code, and some notable design features that distinguish it from currently available software.

2.1 Equal focus on developers and end users

Apart from core programmers of the software who are responsible for maintenance of the general code structure and the incorporation of fundamental modifications/additions to existing functionality, the persons interacting with numerical simulation software can be seen as belonging to at least one of two groups. In the first group are contributing developers, i.e. people involved with model creation and the development of solution algorithms who see the code as a platform upon which to build and test new models and methods. The other group consists of end users, essentially non-programmers who are interested in using the existing functionality in the software to analyze problems that occur in real life. Equal focus on both groups means that the overall usefulness of the project depends on the ease with which one can

  1. implement new formulations, material models and solution algorithms without having to achieve mastery of the entire software code, and

  2. make use of such models in real-life applications (e.g. with complex geometries, multiple components and large numbers of unknowns) without having to write additional lines of code.

In BROOMStyx, this is addressed first of all through a framework-type design that gives rise to a fixed workflow for end users with regard to the setup of input (both involving problem specifics and the domain discretization) and also the handling of simulation results for subsequent output writing and visualization. In particular we do not attempt to re-implement preliminary mesh construction or visualization of results. Instead it is preferred to implement functionality that converts data stored using the internal structure of the code to something that usable by third party software such as Gmsh[16] and Paraview[5]. Likewise, the object-oriented paradigm enables setting of a definite structure with regard to publicly accessible methods in base classes so that subsequent contributions implementing new models and methods in the form of derived classes are guaranteed to be compatible with other components of the framework.

Most currently available modeling software favor a target audience that consists either of researchers who are more concerned with basic method development or very specialized scientific modeling, or of industry practitioners who do not necessarily have a very deep knowledge of theory but nonetheless should be able to perform analysis of real world problems without having to get into the finer details of the formulation beyond providing the problem geometry, material properties and boundary/initial conditions. Good software design that bridges the gap between these two very different groups can potentially have tremendous impact as it gives practitioners access to the latest models and techniques. Furthermore it provides a means for these new developments to be studied rigorously in terms of their applicability to real world scenarios. Needless to say, the open-source nature of such software is vital. While some proprietary codes possess the flexibility of accommodating custom implementations (and indeed software such as COMSOL® are built for exactly this purpose), oftentimes it is not justifiable from a cost perspective to purchase new software simply to test some novel formulation or algorithm that is yet to be proven suitable for widespread usage.

2.2 Generality of equations and formulations

One of the core objectives of BROOMStyx is to allow native coupling of different numerical methods/formulations within a single framework. Consequently, the code structure is not specialized for any one specific numerical method. Nevertheless a universal aspect shared by these methods is the transformation of some given set of partial differential equations (PDEs) into a system of algebraic equations involving unknown degrees of freedom (DOFs). Let

be a vector of scalar field variables and their spatial derivative components up to some order, i.e.

(1)

in which signifies a derivative along the -th dimension. Using this representation, a system of PDEs can be written in the general form

(2)

The code operates by evaluating the global discrete approximations for , , and along with suitable approximations for temporal derivatives, and enforcing the equality of left and right hand sides over some given time interval. When dealing with solution algorithms that require assembly of a global coefficient matrix from local contributions, no assumptions are made regarding the structure of local matrices. In saddle point problems for instance where local matrices have the form

(3)

it should be possible exclude partitions consisting of zeros when constructing the sparsity profile of the global matrix and during subsequent assembly in order to avoid unnecessarily increasing the global matrix density.

Furthermore, BROOMStyx aims to place no restriction on the kinds of equations that can be modeled, other than that numerical quantities be real-valued. In particular, the code should be able to handle models that make use of an arbitrary number and combination of different DOF types without having to alter the core structure of the software. This creates a challenge with regard to the assignment of DOFs, since it is not possible to anticipate beforehand all the different DOF types that a particular model might require. The solution adopted in BROOMStyx is to split the task of managing DOF types between developers and end users. That is, specification of the required number of DOF types is provided by the model implementation, while the actual assignment is done by the end user at runtime, who specifies both the number of DOF types to be initialized per geometric entity (i.e., node, cell or facet) and how these are to be utilized by the model, as demonstrated in Section 6.

2.3 Model coupling across domains

To illustrate some of the complexities that must be addressed when designing user-friendly and efficient multiphysics code, consider the case of a problem involving two coupled PDEs. There are several possible domain configurations that might arise in connection with such a problem, some of which are shown in 1. For illustrative purposes, we further assume that the aforementioned equations are approximated using different numerical methods.

(a)
(b)
(c)
Figure 1: Different domain configurations for a multiphysics problem consisting of two PDEs.

Figure 0(a) represents the basic scenario most often encountered in multiphysics modeling where component PDEs share the same domain and boundary . On the other hand, Figure 0(b) illustrates a case where the two equations live on non-overlapping domains that share a common interface, . A classic example of such a scenario occurs in fluid-structure interaction problems, where acts as a boundary for both and . An additional rule must be specified relating the boundary conditions of the different PDEs, and depending on modeling choices this can result in either one-way or two-way coupling of the equations. On the other hand Figure 0(c) can be understood as an extension of Figure 0(a) in that one of the PDEs is defined over and the other over . Consequently, the latter must have its boundary conditions defined on . In contrast, this interface is non-existent with respect to the other equation whose boundary conditions are defined on . In order for the software to be effective, the end user must have the possibility to set up these different configurations solely via the input file.

2.4 Information access via manager classes

In traditional object-oriented programming, a member object generally does not have access to the methods of its parent or other objects higher up in the compositional hierarchy. Rigid adherence to this principle means that all information needed by a class implementing some particular numerical model must be passed as arguments when calling a particular class method, which is disadvantageous for two main reasons. First of all, such an approach makes method calls extremely long. The same phenomenon is obtained when a user-implemented function is not given access to other methods/functions within the framework. This is evident for instance in the definition of user-programmed element (UEL) subroutines in Abaqus, shown below[13]:

1      SUBROUTINE UEL(RHS,AMATRX,SVARS,ENERGY,NDOFEL,NRHS,NSVARS,
2     1 PROPS,NPROPS,COORDS,MCRD,NNODE,U,DU,V,A,JTYPE,TIME,DTIME,
3     2 KSTEP,KINC,JELEM,PARAMS,NDLOAD,JDLTYP,ADLMAG,PREDEF,NPREDF,
4     3 LFLAGS,MLVARX,DDLMAG,MDLOAD,PNEWDT,JPROPS,NJPROP,PERIOD)
5C
6      INCLUDE 'ABA_PARAM.INC'
7C
8      DIMENSION RHS(MLVARX,*),AMATRX(NDOFEL,NDOFEL),PROPS(*),
9     1 SVARS(*),ENERGY(8),COORDS(MCRD,NNODE),U(NDOFEL),
10     2 DU(MLVARX,*),V(NDOFEL),A(NDOFEL),TIME(2),PARAMS(*),
11     3 JDLTYP(MDLOAD,*),ADLMAG(MDLOAD,*),DDLMAG(MDLOAD,*),
12     4 PREDEF(2,NPREDF,NNODE),LFLAGS(*),JPROPS(*)
13     ...
14     ...
15      RETURN
16      END

More importantly, the amount of information accessible to model/algorithm developers becomes hard-coded in the declaration of base classes and cannot be subsequently modified without altering each and every derived class already coded. To circumvent this problem, BROOMStyx makes use of manager classes, whose public methods are accessible to all objects. While this may seem to break encapsulation, it endows model-implementing classes with almost unlimited flexibility, and at the same time allows them to access only the information that is actually required by the model at a given instant. In this sense, the software can be viewed as a library/framework hybrid that aims to retain the benefits of both approaches.

3 Framework architecture

The general structure of the BROOMStyx code is comprised of different class groups as shown in Figure 2.

Figure 2: Class diagram showing the general structure of the BROOMStyx code.

The first group consists of non-polymorphic classes within the broomstyx namespace whose modification/maintenance is left to the core developers of the framework. These are referred to as core classes, and may further be divided into two subgroups depending on the number of actual objects that can be instantiated from them at runtime. The rest are polymorphic classes which deal with different aspects of model and algorithm development, the reading of different mesh formats and the writing of output.

3.1 Single-instance core classes

Single-instance core classes consist of AnalysisModel, ObjectFactory, and the various manager classes which are colored blue in Figure 2. AnalysisModel provides two public methods. The first is initializeYourself() which instantiates the manager classes, reads the input file and creates all the necessary objects for the simulation through calls to the various methods provided by said manager classes. The second is solveYourself() which carries out the actual simulation. To ensure that only a single instance is generated for the entire duration of the program, AnalysisModel is implemented as a Meyers singleton pattern. In addition, the various manager classes have private constructors and destructors that are accessible to AnalysisModel by virtue of being declared a friend class. Finally, AnalysisModel is responsible for instantiating the proper MeshReader object to read the mesh file supplied by the user.

Among the manager classes, DomainManager is responsible for the management of objects corresponding to the nodes and cells of the mesh, setting up information regarding connectivity, and providing methods for accessing the degrees of freedom associated with each geometric entity. On the other hand, the creation and deletion of the actual degrees of freedom are handled by the DofManager class, which also provides methods to set subsystem, group and equation numbers at each DOF, and to update and finalize values for the unknown variables of the simulation. In addition, DofManager provides a method for reading multi-freedom constraints from the input file which can then be used to set up constraints involving master/slave DOFs.

The MaterialManager class is responsible for reading the number and types of materials that are declared in the input file, and for instantiating and initializing the corresponding Material objects. In general, destruction of managed objects is performed when the destructor for their corresponding manager object is invoked; this is true for the MaterialManager class as well as the subsequent manager classes that will be discussed. Furthermore in the input file each material must be assigned a unique integer label, and the manager class provides a method to access instantiated materials based on their assigned labels. Class NumericsManager is similar to MaterialManager in its function, i.e. it reads the different numerics types specified in the input file together with their assigned integer label, and instantiates the corresponding Numerics objects. Likewise, it provides a method for accessing the instantiated numerics based on their assigned labels.

The main role of the SolutionManager class is to read the number of load steps from the input file and instantiate the corresponding LoadStep objects. It also reads all specified initial conditions from the same file, and imposes them at the beginning of the solution process. In addition, it is responsible for instantiating and managing UserFunction objects that implement user-programmed functions. Finally, the OutputManager class is in charge of managing all things related to simulation output. It instantiates a particular OutputWriter object according to the type specified by the user in the input file. It also reads the different output quantities that need to be calculated and creates the appropriate OutputQuantity objects. The manager class provides methods for initializing these objects as well as for calculating and writing output at the end of each converged time step.

Finally, BROOMStyx contains an ObjectFactory class, which as its name implies is in charge of the actual creation of objects based on derived classes. To accomplish this, the framework provides the macro

    registerBroomstyxObject(<baseClass>, <derivedClass>)

within the broomstyx namespace, which must be invoked in the source code of each derived class. Like AnalysisModel, the ObjectFactory class is implemented as a Meyers singleton and provides two sets of methods. The first set involves registration of derived classes that is accessed indirectly via the aforementioned macro, while the second set creates objects of the different derived class types. The various manager classes instantiate derived objects via a call to ObjectFactory which then returns a pointer to the instantiated object, at which point responsibility for the said object is transferred to the calling manager class. At program startup, a map is automatically created within the ObjectFactory instance whose key entries are the names of all available derived classes, which are in turn paired with pointers to the corresponding class constructors.

As mentioned in Section 2, public methods of the manager classes can be accessed anywhere within the broomstyx namespace and are initiated via a call to the AnalysisModel object which has global scope. For instance, one can retrieve the nodes associated with some particular cell object via the following call:

1std::vector<Node*> cellNodes = analysisModel().domainManager().giveNodesOf(targetCell);

3.2 Multiple-instance core classes

The subgroup of multiple-instance core classes consist of class types pertaining to geometrical entities and also concepts related to the solution process. Class Dof is an abstraction for a single degree of freedom, and it stores data pertaining to which DOF group, solution stage and subsystem it belongs, its equation number, current and converged values of its associated unknown and the corresponding residual value. It also keeps track of its status, i.e., whether it is constrained (and if so, the value of the constraint) and in the case that it is of slave type, the pointer to its associated master DOF. The class itself does not provide any methods for accessing its members, which are all declared private. Instead their retrieval and update is done through DofManager that is declared a friend class of Dof. Meanwhile, the Node class is an abstraction for a geometric entity, specifically a point in 3-dimensional space. It stores the point’s coordinates, a unique ID number, a vector of pointers to Dof objects that live on the node, and a vector of real numbers whose length is set by the end user at runtime and which serves to store values that will subsequently be written into output files. Furthermore it contains two sets of pointers to Cell objects that are attached to the node, one for domain cells and the other for boundary cells.

The Cell class on the other hand has a more nuanced function. Rather than simply being an abstraction for a particular geometric entity, its primary purpose is to act as an anchor to which Numerics objects can attach their associated data and perform the necessary calculations related to the governing equations. Every Cell object contains a vector of real numbers, the length of which is set at runtime in order to store values needed for writing output similar to the case of the Node class. Each Cell object also stores its own type (e.g., 3-node triangle, 10-node tetrahedron, general -point polygon) and dimension, its ID number, physical entity, and a vector of pointers to Dof objects associated with the cell. In addition it stores data related to connectivity, such as its neighbors and faces. Every Cell instance also contains a vector of pointers to Node objects, whose exact relation to the cell may vary depending on the particular formulation being used. For instance within the context of a standard finite difference scheme each cell may be associated to some FD stencil so that the cell nodes are meant to coincide with the stencil points, whereas in a finite element scheme the Cell object can be seen as representing a single element, and the nodes as entities to which shape functions are associated in order to approximate a piecewise solution. Cell objects are also used to represent geometrical entities such as surfaces on which boundary and interface conditions are applied. Lastly, when cell faces are required to be constructed by a particular formulation, these are also represented as Cell objects. For this reason the DomainManager class distinguishes between three types of cells: a) domain cells that are associated with some given Numerics instance111This group may also include lower-dimensional interface cells, for example when the latter are associated with a Numerics object implementing some inter-domain or multiphysics coupling., b) boundary cells that are associated some particular boundary condition, and c) face cells.

The EvalPoint class is an abstraction for a single evaluation point. It stores the coordinates of its location in addition to its assigned weight (in the case of Gauss integration), and contains a pointer to a NumericsStatus object that in turn stores data such as history variables required by Numerics and Material objects for performing their calculations. As shown in Figure 2, EvalPoint objects typically occur as components of some NumericsStatus instance. Nonetheless their existence is not mandatory; cells which have only one evaluation point for example can have the necessary numerics and material variables stored directly as members of the immediate NumericsStatus object connected to the cell instance, thus avoiding compositions of the type Cell->NumericsStatus->EvalPoint->NumericsStatus.

The BoundaryCondition class is used to store information regarding a single boundary condition, such as the physical entity label corresponding to the boundary on which the condition is to be applied, the particular BC type, the target Dof type and the implementing numerics. This last piece of information is necessary since a BoundaryCondition object is not in charge of actually imposing its given BC. Rather, said object calculates the value (possibly time-dependent) of the relevant quantity at the required location on the boundary and leaves the imposition of this value to the implementing Numerics object.222The main reason for adopting such a strategy is that numerical methods generally differ in the way they impose boundary conditions. For instance some schemes such as the Element-Free Galerkin method[8] make use of shape functions that do not possess Kronecker-delta properties, hence Dirichlet conditions must be imposed differently compared to finite element schemes even if both are based on a variational formulation of the governing equations. Passing on the actual implementation of boundary conditions and the like to the specific numerics allows us to avoid having to overload BoundaryCondition for each new Numerics derived class. The FieldCondition and InitialCondition classes are have similar function — the former is used to describe domain-wide quantities, for example specific source terms and possibly also parameters of governing equations that are made to vary in space and time, whereas the latter returns the value of a specific initial condition at some given location within the domain. As with BoundaryCondition, the actual implementations of these field and initial conditions are left to the respective associated Numerics objects.

Class LoadStep is an abstraction for a solution step taken over a range of time, which in turn can be further divided into substeps. Essentially the workhorse class of the entire code, LoadStep objects are responsible for marching the solution forward in time according parameters specified by the end user. Each LoadStep object reads its required data from the input file; these consist of the starting time and ending time of the solution step, the initial time increment for each substep, the maximum allowed number of substeps, all the boundary and field conditions to be imposed for the current solution step, the particular type of solution method to be used, and all special pre-/post-processing procedures available in numerics implementations that are required to be performed according to end user specifications. The LoadStep class provides the method solveYourself(), which carries out the sequence procedures listed in Table 1.

1. Initialize solvers to be used by solution methods.
2. Carry out pre-processing routines.
3. Find constrained and active degrees of freedom.
4. Determine target time for current substep. If this exceeds the specified end time, shorten it so that the target time exactly coincides with the end time.
5. For each solution stage, have the corresponding SolutionMethod object calculate the solution given the specified boundary and field conditions.
6. If solution in step 5 converged, finalize the relevant data. Otherwise, implement contingency procedures (e.g., change the time-increment and go back to step 4, or alternatively terminate the simulation due to unconverged results.)
7. Update current time and possibly also the time increment (adaptive time stepping). If end time has been reached, then proceed to step 8. Otherwise, go back to step 4.
8. Carry out post-processing routines.
Table 1: General solution procedure executed on invocation of LoadStep->solveYourself().

Note that steps 2 and 8, pre- and post-processing do not refer to meshing/visualization procedures as commonly understood in FEM terminology, but instead denote access points built into the framework which can be exploited by model developers to perform calculations that are not meant for execution during assembly and solution of system equations, for instance the initialization of some history variables based on the latest converged solution.

3.3 Polymorphic classes

The BROOMStyx framework contains several polymorphic classes that can be used to extend functionality of the software. As illustrated in Figure 2, these fall into four groups. The first group is composed of the MeshReader class and deals with the interpretation of mesh files generated from third party software and the creation of the corresponding entities in the internal memory of the software. Meanwhile, the second group is concerned with model creation, and consists of the Numerics, Material, ShapeFunction and IntegrationRule classes. The third group allows for the implementation of different solution schemes and back-end solvers, and includes the SolutionMethod, LinearSolver and SparseMatrix classes. The last group allows the production of different output formats and the calculation of specialized quantities through the classes OutputWriter and OutputQuantity.

3.3.1 MeshReader

The MeshReader class provides a way for the BROOMStyx framework to understand output files from external CAD and mesh generation tools which contain the discretization (i.e. nodes and elements) to be used in the numerical simulation. At present, the framework allows for the instantiation of one MeshReader-derived object that reads a single mesh file containing all the necessary components of the discretization, whose name is specified in the input file. At the same time, said MeshReader object interfaces with the DomainManager object to create the resulting geometric entities in memory.

3.3.2 Numerics

The Numerics class is an abstraction for a PDE (or system of PDEs) together with its numerical discretization, and is the fundamental building block around which the rest of the code was designed. It contains methods for calculating local contributions corresponding to different components of the governing equations as described in Section 2. Its derived classes are responsible for declaring the required geometric cell type to be used with said class and the number of DOFs that will be utilized at each node, cell or face. The base class declaration is given in Listing 1, where for brevity only the virtual methods are listed.

1class Numerics
2{
3  friend class NumericsManager;
4public:
5  Numerics();
6  virtual ~Numerics();
7  ...
8  virtual void initializeMaterialsAt( Cell* targetCell );
9  virtual bool performAdditionalConvergenceCheckAt( Cell* targetCell, int stage );
10  virtual void performPreIterationOperationsAt( int stage, int iterNum );
11  virtual void printPostIterationMessage( int stage );
12  virtual void readAdditionalDataFrom( FILE* fp );
13  virtual void removeConstraintsOn( Cell* targetCell );
14
15  virtual void finalizeDataAt( Cell* targetCell ) = 0;
16  virtual void deleteNumericsAt( Cell* targetCell ) = 0;
17  virtual void initializeNumericsAt( Cell* targetCell ) = 0;
18
19  virtual std::tuple< std::vector<Dof*>, std::vector<Dof*>, RealVector >
20    giveStaticCoefficientMatrixAt( Cell* targetCell, int stage, int subsys, const TimeData& time );
21  virtual std::tuple< std::vector<Dof*>, RealVector >
22    giveStaticLeftHandSideAt( Cell* targetCell, int stage, int subsys, const TimeData& time );
23  virtual std::tuple< std::vector<Dof*>, std::vector<Dof*>, RealVector >
24    giveTransientCoefficientMatrixAt( Cell* targetCell, int stage, int subsys, const TimeData& time );
25  virtual std::tuple< std::vector<Dof*>, RealVector >
26    giveTransientLeftHandSideAt( Cell* targetCell, int stage, int subsys, const TimeData& time,
27        ValueType valType );
28  virtual void imposeConstraintAt( Cell* targetCell, int stage, const BoundaryCondition& bndCond,
29        const TimeData& time );
30
31  // Error-generating virtual functions (must be implemented in derived class when called)
32  virtual double giveCellFieldValueAt( Cell* targetCell, int fieldNum );
33  virtual RealVector giveCellNodeFieldValuesAt( Cell* targetCell, int fieldNum );
34  virtual std::vector<RealVector> giveEvaluationPointsFor( Cell* targetCell );
35  virtual std::tuple< RealVector,RealVector > giveFieldOutputAt( Cell* targetCell, const std::string& fieldTag );
36  virtual RealVector giveNumericsParameter( const std::string& paramTag );
37  virtual std::tuple< std::vector<Dof*>, RealVector >
38    giveStaticRightHandSideAt( Cell* targetCell, int stage, int subsys, const BoundaryCondition& bndCond,
39        const TimeData& time );
40  virtual std::tuple< std::vector<Dof*>, RealVector >
41    giveStaticRightHandSideAt( Cell* targetCell, int stage, int subsys, const FieldCondition& fldCond,
42        const TimeData& time );
43
44  virtual void imposeInitialConditionAt( Cell* targetCell, const InitialCondition& initCond );
45  virtual void performPostprocessingAt( Cell* targetCell, std::string tag );
46  virtual void performPrefinalizationCalculationsAt( Cell* targetCell );
47  virtual void performPreprocessingAt( Cell* targetCell, std::string tag );
48  virtual void setDofStagesAt( Cell* targetCell );
49  ...
50};
Listing 1: Partial declaration of the Numerics base class, showing the methods that may be overloaded in derived classes.

Each instantiated Numerics object reads its required parameters from the input file, and manages its own data at each computational cell. Because different physical processes vary in the amount and types of data that they require to be stored, an auxiliary class NumericsStatus is provided whose role is to be a base container that can be specialized to suit the various Numerics-derived classes. Each computational cell includes a pointer to one NumericsStatus object in its attributes, which is instantiated by the Numerics object assigned to the cell when data storage beyond standard DOF solutions is required. As can be seen from Listing 1, the class contains methods for calculating output quantities, and for performing pre-/postprocessing of data as well as carrying out inter-iteration operations. This is to allow a maximum amount of flexibility to developers, since in BROOMStyx flow of control does not rest with any instantiated Numerics object but rather with the framework itself that in turn passes it to the specific solution method chosen by the end user at runtime. During assembly of the discrete equations, the latter cycles over all cells in the computational domain and interfaces directly with the Numerics object assigned to each cell, meaning that entities such as integration points are hidden from solution algorithms. This is a deliberate design choice, since the notion of integration points does not always make sense, for example when dealing with finite differences or control volume schemes. Hence for numerical methods based on weak formulations, the Numerics object itself is in charge of internally performing the numerical integration when calculating local matrices and vectors.

The convention adopted in BROOMStyx for stating method declarations is to clearly separate input arguments from output quantities, except when a particular variable acts as both input and output. Multiple return values are passed as a tuple, for instance in methods that return local matrix and vector contributions. As can be observed in Listing 1, matrix output is returned using something akin to coordinate (COO) format, i.e. as a collection of three vectors, the first two containing pointers to Dof objects which store their equation numbers (corresponding respectively to row and column indices in the global coefficient matrix) while the last vector contains the actual entries of the matrix. A key advantage of such an approach is that no assumption is made on the shape of such matrix, but rather that the method only returns local components that are actually nonzero. This is important in terms of efficiency, since the sparsity profiles for global coefficient matrices are determined by an initial assembly process that disregards local matrix values.

The following two details are worth mentioning with regard to interface design:

  1. The virtual methods defined in the Numerics base class for calculating matrix and vector quantities corresponding to different components of governing equations are not pure virtual methods but rather are implemented in the base class as empty functions. For example, transient () terms may be calculated with the following two methods:

    1virtual std::tuple< std::vector<Dof*>, std::vector<Dof*>, RealVector >
    2  giveTransientCoefficientMatrixAt(Cell* targetCell, int stage, int subsys, const TimeData& time);
    3
    4virtual std::tuple< std::vector<Dof*>, RealVector >
    5  giveTransientLeftHandSideAt(Cell* targetCell, int stage, int subsys, const TimeData& time, ValueType valType);

    The existence of an empty implementation in the base class means that developers only need to override methods that are actually relevant to the problem at hand. This reduces the amount of boilerplate code the must be written and also ensures that static equations are correctly solved when using solution methods that account for transient terms. Such functionality is necessary due to the fact that one can have a fully coupled system that features both static and time-varying PDEs.

  2. The return values for the above methods are in the form of tuples, which must be constructed utilizing both std::make_tuple(...) and std::move in order to avoid unwanted copying, e.g.

    1std::vector<Dof*> rowDof, colDof;
    2RealVector coefVal;
    3...
    4return std::make_tuple(std::move(rowDof), std::move(colDof), std::move(coefVal));

3.3.3 Material

Class Material represents the base class for constitutive models, whose role is to evaluate quantities that determine the coefficients/parameters of the governing equations. It provides methods to calculate scalar potentials, constitutive forces and tangent moduli. The class interface design is shown in Listing 2, and is based on the concept of a scalar potential function and its derivatives. That is, we assume that the material model can be described by some scalar potential

(4)

whose value can be computed give the current constitutive state and set of history variables .

1class Material
2{
3public:
4  Material();
5  virtual ~Material();
6
7  virtual MaterialStatus* createMaterialStatus();
8  virtual void destroy( MaterialStatus*& matStatus );
9  virtual void initialize();
10  virtual void readParamatersFrom( FILE* fp );
11  virtual void updateStatusFrom( const RealVector& conState, MaterialStatus* matStatus );
12  virtual void updateStatusFrom( const RealVector& conState, MaterialStatus* matStatus,
13      const std::string& label );
14
15  // Error-generating virtual methods (must be implemented in derived class when called)
16  virtual double givePotentialFrom( const RealVector& conState, const MaterialStatus* matStatus );
17  virtual RealVector giveForceFrom( const RealVector& conState, const MaterialStatus* matStatus );
18  virtual RealVector giveForceFrom( const RealVector& conState, const MaterialStatus* matStatus,
19      const std::string& label );
20  virtual RealMatrix giveModulusFrom( const RealVector& conState, const MaterialStatus* matStatus );
21  virtual RealMatrix giveModulusFrom( const RealVector& conState, const MaterialStatus* matStatus,
22      const std::string& label );
23  virtual double giveMaterialVariable( const std::string& label, const MaterialStatus* matStatus );
24  virtual double giveParameter( const std::string& label );
25
26protected:
27  std::string _name;
28  void error_unimplemented( const std::string& method );
29};
Listing 2: Declarations for MaterialStatus and Material classes.

Likewise, it is assumed that its gradients with respect to the constitutive state variables may also be obtained as

(5)
(6)

Subsequently, lines 16–22 in Listing 2 are abstractions for the calculations needed to obtain the expressions given above, and are declared virtual as they are to be overridden in derived class definitions. In general however, material models do not necessarily have the well defined structure implied by (4)–(6), and the framework does not require that all the aforementioned methods be defined. In order to reduce boilerplate code in derived classes, these methods are predefined in the base class implementation with the default behavior of throwing an exception when called. Such feature is intended to prevent simulations from accidentally using incorrect values for material quantities due to an unintentional failure in the part of a developer to implement the proper overriding method.333If the returned quantity is a vector or matrix that is immediately used in a following calculation, then such an oversight might result in a dimension mismatch between operands. However it can also be that the nature of subsequent operations is such that no further error occurs, and the program continues execution leading to incorrect results. An obvious limitation of the class design is that only one type of vector and matrix quantity may be returned by a particular Material instance. The reason for this is to keep interfaces simple; for more complex models one can extend functionality of the derived class by having its instance contain other Material objects in its attributes.

3.3.4 SolutionMethod

The SolutionMethod class is an abstraction for the solution scheme used to solve the discretized governing equations. Different algorithms can be implemented through derived classes, for instance linear static or transient schemes, Newton-Raphson, Runge-Kutta methods and so on. The base class declaration is given in Listing 3, and includes a method for imposing constraints in order to find and mark constrained DOFs which are then excluded from the unknowns to be solved in subsequent steps.

1class SolutionMethod
2{
3public:
4  SolutionMethod();
5  virtual ~SolutionMethod();
6
7  void getCurrentLoadStep();
8  void imposeConstraintsAt( int stage, const std::vector<BoundaryCondition>& bndCond, const TimeData& time );
9
10  virtual int  computeSolutionFor( int stage
11                                 , const std::vector<BoundaryCondition>& bndCond
12                                 , const std::vector<FieldCondition>& fldCond
13                                 , const TimeData& time ) = 0;
14
15  virtual void formSparsityProfileForStage( int stage ) = 0;
16  virtual void initializeSolvers() = 0;
17  virtual void readDataFromFile( FILE* fp ) = 0;
18
19protected:
20  LoadStep* _loadStep;
21  std::string _name;
22
23  bool checkConvergenceOfNumericsAt( int stage );
24};
Listing 3: Definition of the SolutionMethod class.

The main method of the class is computeSolutionFor(...), which essentially performs the global equation assembly and subsequent solution via calls to the specified linear solvers in the input file. In order to perform said assembly, the sparsity profile of global systems must first be determined. This is implemented in method formSparsityProfileForStage(int stage), which also assigns equation numbers to all unknowns. A notable feature of the BROOMStyx framework is that it allows for multi-stage in combination with multi-subsystem setups. The former is essentially an implementation of a hard-coded sequential solve, in which all quantities belonging to a previous stage are updated and finalized before proceeding to the next solution stage. On the other hand multiple subsystems simply implies a partitioning of the global system of equations associated with a given stage. For example in some solution algorithms, the global system may be partitioned into several subsystems that are solved in sequence within an iterative scheme. In this case the method makes use of the DOF group assignment to determine which degrees of freedom belong to their respective subsystems. It then assigns equation numbers for the unknowns in each subsystem and determines the sparsity profiles of the associated global matrices. This must be implemented separately for each new class deriving directly from SolutionMethod, as some algorithms may need to include additional unknowns such as Lagrange multipliers, for instance in the case of arc-length methods[29]. All derived classes must read their required data from the input file. This typically includes the linear solvers to be used, convergence criteria for each of the active DOF groups in the case of nonlinear solution schemes, and data related to special procedures such as line search options when these are implemented in the derived class.

3.3.5 LinearSolver and SparseMatrix

BROOMStyx relies on external software packages for the solution of linear systems. Class LinearSolver is a base class for wrappers to different solver libraries in order to provide a common interface that can be utilized by other classes within BROOMStyx. The base class declaration is given in Listing 4, and contains a varied number of methods that are relevant for either direct or iterative solvers.

1class SparseMatrix
2{
3public:
4  SparseMatrix();
5  virtual ~SparseMatrix();
6
7  std::tuple< int,int > giveMatrixDimensions();
8  int     giveNumberOfNonzeros();
9  bool    isSymmetric();
10  void    setSymmetryTo( bool true_or_false );
11
12  virtual void addToComponent( int rowNum, int colNum, double val ) = 0;
13  virtual void atomicAddToComponent( int rowNum, int colNum, double val ) = 0;
14  virtual void finalizeProfile() = 0;
15  virtual std::tuple< int*,int* > giveProfileArrays() = 0;
16  virtual double* giveValArray() = 0;
17  virtual void insertNonzeroComponentAt( int rowIdx, int colIdx ) = 0;
18  virtual void initializeProfile( int dim1, int dim2 ) = 0;
19  virtual void initializeValues() = 0;
20  virtual RealVector lumpRows() = 0;
21  virtual void printTo( FILE* fp, int n ) = 0;
22  virtual RealVector times( const RealVector& x ) = 0;
23
24protected:
25  int _dim1;
26  int _dim2;
27  int _nnz;
28  bool _symFlag;
29};
30
31class LinearSolver
32{
33public:
34  LinearSolver();
35  virtual ~LinearSolver();
36
37  virtual void allocateInternalMemoryFor( SparseMatrix* coefMat );
38  virtual RealVector backSubstitute( SparseMatrix* coefMat, RealVector& rhs );
39  virtual void factorize( SparseMatrix* coefMat );
40  virtual bool giveSymmetryOption();
41  virtual void initialize();
42  virtual void clearInternalMemory();
43  virtual void setInitialGuessTo( RealVector& initGuess );
44  virtual bool takesInitialGuess();
45
46  virtual std::string giveRequiredMatrixFormat() = 0;
47  virtual void        readDataFrom( FILE* fp ) = 0;
48  virtual RealVector  solve( SparseMatrix* coefMat, RealVector& rhs ) = 0;
49};
Listing 4: Definitions for the SparseMatrix and LinearSolver classes.

Derived classes must be able to specify the format of global coefficient matrices to be used during assembly of equations, read all data required by the back-end solver such as tolerances, and also perform the solution of a given linear system by internally making the necessary calls to the actual solver. At present, BROOMStyx can interface with different versions of the direct solver PARDISO[35, 1] as well as iterative solvers from the ViennaCL library[31] compiled to run on graphics processors via CUDA. Meanwhile, the SparseMatrix class is used to accommodate different matrix formats that may be utilized by the various linear solver packages, such as compressed sparse row (CSR) and coordinate (COO) formats. In order to allocate memory for a given spare matrix, its sparsity profile must first be determined. The class method initializeProfile(int m, int n) is used to specify the matrix dimensions, and also resets its nonzero components to the empty set. On the other hand, the method insertNonzeroComponentAt(int i, int j) declares the -th component of the matrix to be nonzero. The sparsity profile of the global matrix can then be constructed by constructing the local contributions from each cell and then calling the aforementioned method for each nonzero matrix component in the local contribution, with its corresponding global indices as arguments. Finally, finalizeProfile() constructs the actual arrays for storing location indices and values according to the specified matrix format.

3.3.6 OutputWriter and OutputQuantity

The primary function of the OutputWriter class is to create output files containing user-chosen simulation results at certain points in time as specified in the input file. These normally consist of the values for primary unknowns at nodes or elements, as well as physical variables such as stresses, strains and so on. In general, OutputWriter objects only have access to values specified in connection with the NodalField and CellFieldOutput keywords in the input file. Derived classes are in charge of writing said output files in a format that can be read by third party visualization packages. The specific details on what to include in these files is read by the OutputWriter instance from the input file, while the frequency of creating output files relative to the start and end times of the simulation is read by LoadStep objects as part of their required data.

On the other hand, the OutputQuantity class enables retrieval of specialized data during the course of simulations that are not normally found in the output files described above. Examples are force reactions pertaining to a particular portion of the external boundary, and results of special procedures such as -integral[28] calculations. In addition, derived classes may be used to implement integration of particular quantities over the problem domain (e.g. in order to calculate global error norms), or to set up “observation points” to track the evolution of specific quantities at a user-selected location.

4 Vector and matrix operations

For numerical schemes such as finite and boundary elements, a substantial portion of the total running time for a given simulation is spent on dense matrix and vector operations. It is therefore important that such calculations are performed efficiently, and in BROOMStyx this is be done by making use of specialized BLAS implementations such as MKL[1] or OpenBLAS[39]. Unfortunately, the interfaces provided by BLAS can be rather intimidating for those not well-versed in their usage. Consider for example the BLAS function dgemv that is used for matrix-vector multiplication with double floating point entries. It performs the operation for or and has the following declaration:

1void cblas_dgemv(const  CBLAS_LAYOUT Layout, const  CBLAS_TRANSPOSE TransA, const MKL_INT M, const MKL_INT N, const double alpha, const double *A, const MKL_INT lda, const double *X, const MKL_INT incX, const double beta, double *Y, const MKL_INT incY);

In order to provide a more user-friendly environment for developers, BROOMStyx provides RealVector and RealMatrix classes as abstractions for dense vectors and matrices of real (type double) numbers, the latter storing their components in column-major format. Accordingly, the +,-,* and / operators are overloaded to accept matrix and vector arguments. They function as high-level wrappers that call appropriate BLAS routines to perform the actual matrix operations. In addition, they distinguish between lvalue and rvalue operands in order to minimize the occurrence of deep copies associated with temporary objects. The main purpose is to allow the coding of sequences of matrix and vector operations in a way that mimics standard mathematical notation but at the same time results in efficient evaluation. For instance given scalars and , matrix of size and vectors and of length and respectively, the expression

(7)

can be coded as shown in Listing 5, with the back-end evaluation making use of the BLAS functions dscal, dgemv and daxpy.

1int m, n;
2...
3double alpha, beta;
4RealMatrix A(n,m);
5RealVector b(n), c(m), d;
6...
7d = alpha*trp(A)*b + beta*c;
Listing 5: User-friendly syntax for coding matrix operations.

As operator overloading involves static polymorphism, no run-time overhead is incurred due to virtual function calls. Nevertheless, a naive implementation of the operator overloading results in an inefficient calculation as the binary nature of the aforementioned operators prevents us from taking full advantage of the low level BLAS functions’ capabilities. Such implementation would yield for instance the sequence of operations shown in Table 2.

Operation Performed task / low-level BLAS call
heap allocation + manual transposition
dscal
heap allocation + dgemv
heap allocation + daxpy
daxpy
move assignment
Table 2: Sequence of operations for using naive operator overloading.

These operations must preserve the original input arguments , , , and . Accounting for the move assignment in the final step, we see that two intermediate objects are created which are discarded at the end of the sequence, namely and . Furthermore, manual transposition is performed and four calls to the low-level BLAS routines are made. In contrast, Table 3 shows an optimized sequence consisting of only two operations that take full advantage of BLAS capabilities. In particular, no transposition of the matrix components needs to be done before hand since this can be handled internally by dgemv. Also, no intermediate objects are created.

Operation Performed task / low-level BLAS call
heap allocation + dgemv
daxpy
Table 3: Optimized sequence of operations for .

In order to approach the efficiency of using direct BLAS calls but at the same time keep the user-friendly syntax shown in Listing 5, we employ a lazy evaluation strategy when it comes to matrix transposition and scalar multiplication operations. A similar strategy has been adopted previously in the C++ library FLENS[20], which uses expression templates to implement an approach based on the concept of closures from functional programming. In the current work, this lazy evaluation strategy is achieved by designing the RealVector and RealMatrix classes to store a scaling factor as well as the method isScaled() that returns a value of false if the scaling factor is 1.0 and true otherwise. In addition, the latter class also stores a transposition flag and an associated isTransposed() for accessing its value. Finally, both classes provide the method ownsItsPointer() that returns a boolean value indicating whether a particular instance owns the memory address space referenced by its pointer, or if said memory belongs to another vector or matrix object. Consequently, the operator overload implementation for the scalar-matrix multiplication operation needs only to return a matrix object whose scaling factor is multiplied by and whose internal data pointer references the memory address in which the components of are stored. The same trick is exploited in implementing the matrix transpose, so that for example the operation B = trp(trp(A)) does not actually carry out any transposition. The state in which an object is transposed, has non-unity scaling or non-ownership of its pointer is only allowed to occur for r-value objects, and to enforce this the copy and move assignment operators call a method simplify() that performs the following tasks: a) for matrices, perform manual transposition if the current object is in a transposed state, b) perform a deep copy of the matrix/vector components if the current object does not own its pointer, and c) scale the components accordingly if the scaling factor is not equal to 1, via the BLAS function dscal. In the case of matrices, the transposition is performed first as its implementation implies a deep copy (i.e., the actual transposition is not done “in place”). With the aforementioned strategy, the execution of (7) occurs according to the sequence shown in Table 4.

Operation Performed task / low-level BLAS call
set transpose flag and pointer
set scaling factor
heap allocation + dgemv
set pointer and scaling factor
daxpy
move assignment
Table 4: Sequence of operations for as implemented in BROOMStyx.

Similar to the earlier naive procedure, we can see that the same number of intermediate objects are created, i.e., and . However in the current sequence there is no memory allocation done for each of these objects; instead the pointer members of and reference memory belonging to and respectively. Furthermore, the BLAS calls are now identical to those appearing in Table 3. Since the intermediate objects are created by functions implementing the operator overloads (as opposed to within a class instance located on the heap), the objects themselves reside in stack memory and thus very little overhead is incurred with their creation. For the example discussed above, return value optimization (RVO) by the compiler reduces the overhead to one call each of the RealMatrix and RealVector move constructors for the procedure of Table 4. Thus in order for the operations to be lumped together as desired, the move constructor should not perform simplification. This introduces an unwanted side effect, as illustrated in the following section of code:

1RealMatrix A, B, C;
2...
3RealMatrix D = 0.5*trp(A + B*C); // D is move-constructed
4D(0,0) = 4.5;                    // will throw exception
5
6RealMatrix E;
7E = 0.5*trp(A + B*C); // Move assignment calls .simplify()
8E(0,0) = 4.5;         // will not throw exception

In line 3, the right hand side is the result of an operation and thus matrix D would normally be instantiated via move construction. However as the latter does not invoke the simplify() method, at line 4 the matrix D is in a transposed state and has a scaling factor of 0.5 that has not yet been multiplied to its stored components. Furthermore, the C++11 standard allows the compiler to perform a form of copy elision known as named return value optimization (NRVO), which omits the call to copy/move constructors. This makes the access of D(i,j) problematic: while one can easily deal with a transposed state by returning a reference based on row-major access, the latter cannot be made to carry additional information regarding the scaling factor.444In the specific example given, the line D(0,0) = 4.5 would require a value of 9 (and not 4.5) to be written into the appropriate place in memory, since the stored value should later simplify to 4.5 after the scaling is applied. Moreover if the object does not own the memory referenced by its pointer, then performing a write operation has a the unintended outcome of also modifying the original referenced object. In order to prevent such occurrences, access of matrix and vectors components via the () operator is locked while the objects are in an unsimplified state, and the operator throws an exception when called unless both the transpose and scaling flag are set to false, and the object in question owns the memory referenced by its pointer. The aforementioned scenario can be avoided by simply rewriting the sequence of instructions, as demonstrated in lines 6–8 of the above code snippet.

5 Software Parallelization

The current implementation of BROOMStyx supports shared-memory parallelism, which is achieved through the use of OpenMP directives. As the RealMatrix and RealVector

classes store their components as contiguous arrays with unit stride, performing operations on these objects in parallel is likely to result in

false sharing whereby different processors attempt to modify data residing on the same cache line, leading to forced local cache updates. The execution model adopted in BROOMStyx is such that all local matrix and vector contributions associated with a computational cell are handled by a single thread. An added advantage of this is that developers implementing new numerics and materials in the framework essentially write serial code without having to worry about particular details regarding parallelization.

It is common for OpenMP-accelerated software to have code regions that run serially interspersed between blocks that are executed in parallel. A common example of this is when the software has to perform I/O operations, although this can also be made to run such that a single thread executes write operations to disk while the remaining threads are already running calculations pertaining to the subsequent time step. While such procedure has not yet been implemented in BROOMStyx, we consider this a minor drawback since for heavily nonlinear problems that require a large number of iterations, the time spent doing computation far outweighs the amount of time it takes to perform serial writing of output. Nevertheless, it is beneficial to reduce the amount of time in which the software is running serial code in order to fully benefit from the multi-core architecture of present day CPUs. To this end, assembly of global matrices and vectors is done in parallel, and makes use of atomic operations to avoid data race conditions. On the other hand, solution of the resulting linear system is done via calls to third-party solvers. Consider for example the direct solver PARDISO, which implements a solution of the linear system based on a sparse LU decomposition of the global coefficient matrix. An advantage of direct solvers is that one does not need to provide preconditioners that are often necessary to accelerate iterative schemes, and which may be difficult to construct for general nonlinear systems. Their drawback is that the cost of the LU decomposition becomes prohibitive for very large problem sizes. The PARDISO interface allows for splitting of the solution procedure into four phases:

  1. Reordering, symbolic factorization and allocation of memory to be used by the solver

  2. numerical factorization

  3. forward and back-substitution

  4. Release of allocated memory

The first step is the most costly, taking up to 75% of the total solve time and is also not easily executed in parallel. Many problems of interest however can be solved without performing adaptive refinement of the initial spatial discretization, which means that the sparsity profile of the global coefficient matrix does not change over time. Nonlinear solution algorithms implemented in BROOMStyx exploit this property by performing symbolic factorization only at the beginning iteration of a substep, and then skipping this step in subsequent iterations along with the release of memory unless the solver returns an error, in which case the stalled solution is restarted going through all the phases listed above. We have found that when low order methods are employed for the discrete problem, use of the above strategy can reduce the total time for each iteration (i.e., assembly + solve) by as much as 50%.

6 Example applications

In this section, we present several numerical examples dealing on different topics and that are specifically chosen to highlights important capabilities of the BROOMStyx framework. We note that while all presented examples are in 2D, the code structure itself readily admits formulations in both lower and higher dimensions. The first example deals with torsion and shows the code’s capability to deal with higher order mappings and formulations as well as non-standard degrees of freedom that are not associated with specific geometric entities. The next two examples feature a monolithic coupling of variational and control volume formulations applied to the problem of poroelasticity. The final example deals with brittle fracture propagation in concrete using the phase-field approach, which gives rise to a coupled nonlinear system of PDEs that must be solved alternately with a Newton approach applied to the inner iterations. For all problems, discretization of the problem domain was performed using the Gmsh software[16], while 2D color plots were generated either Paraview and Gmsh, the latter used for higher order visualization.

6.1 Elastic torsion of composite shafts

The Saint-Venant torsion problem is concerned with determining the shear stresses induced in a non-circular prismatic shaft that is rigidly clamped at one end and subjected to an applied torque at the other. This problem has been extensively studied in the literature, and a discussion of solution approaches can be found in classic texts such as Fung[15]. Numerical solutions are generally designed based on Saint-Venant’s original approach of using a warping function, or alternatively on the Prandtl stress function. Both lead to Laplace/Poisson type governing equations but with notable differences in the resulting boundary conditions and constraints. In this example we make use of the former strategy, its chief advantage being that no special treatment is needed when dealing with multiply connected sections.

We begin by considering a prismatic shaft in Cartesian space, whose longitudinal axis is aligned with the -direction and whose cross section normal to said axis is denoted by . If is non-circular then it will experience warping when the shaft is twisted, however it can be assumed that its projection onto the -plane rotates as a rigid body. For a general section, the displacement at any point may be expressed as follows[36]:

(8)

wherein is the angle of twist per unit length, is the so-called warping function, and is a rigid body motion consisting of a pure translation component plus pure rotation that are both a priori unknown. The rigid clamping at corresponds to the conditions , and imposition of the first two yields

(9)

so that the displacement simplifies to

(10a)
(10b)
(10c)

On the other hand, the constraint cannot be directly imposed; instead it is weakly enforced via the condition

(11)

One can see from (10) that and vanish when and , hence the location is termed the center of twist for the cross section. For simplicity, we assume that is prescribed so that the relevant unknowns are the function and the quantities , and . Furthermore, we assume that the resulting strains are infinitesimal, i.e.,

(12)

Plugging in the displacement expressions into the above formula, we find that only two strain components are nonzero, these being and . If the component materials of the shaft are both linear elastic and isotropic, the corresponding stresses are given by

(13)

with . Consequently, the stress equilibrium equation in the absence of body forces reduces to the Laplace equation,

(14)

For inhomogeneous shafts, the surface traction must be continuous across the material interfaces. Equation (13) can be written in a more compact form as where . Consequently, the previous condition can be expressed as

(15)

where is an internal boundary separating two materials, and and denote quantities on either side of . On the other hand for surfaces that are traction-free, the relevant condition is

(16)

Finally for a given rate of twist, the resulting internal torque can be calculated as

(17)

The complete boundary value problem is given by equations (14), (15), (16) together with the constraint (11). The weak form corresponding to (14)–(16) is

(18)

where are the material interfaces, and denotes all internal and external boundaries that are traction-free. A further three equations are obtained by carrying out the minimization in (11) with respect to , and . This yields

(19)
(20)
(21)

To obtain a finite element discretization of the above problem, let and and their derivatives be approximated by

(22)

where and are values of and at node , are standard Lagrange-type shape functions, , and is the number of nodes. Plugging the above approximations into (18)–(21) which must then hold for arbitrary values of , we obtain the following system of equations in matrix form:

(23)

where the individual entries are given by

(24)

We note that (23) does not give a symmetric system. Furthermore, the subsystem consisting of the equation cannot be decoupled from the global system as (18) constitutes a pure Neumann problem and therefore does not have a unique solution in the absence of additional constraints.

The discrete problem described above is implemented in the class StVenantTorsion_Fe_Tri6 which utilizes isoparametric 6-node () triangles. The coefficient matrix is assembled by looping over all the domain elements, while the right hand side terms are assembled by looping over 3-node 1D elements that have been marked as part of the boundary. Domain integrals are evaluated using a 3-point Gauss Legendre quadrature since the integrands are at most quadratic. 555This is not always true; in particular when an element is curved then the shape functions become rational due to the mapping between reference and physical spaces. However such a case only occurs for elements immediately adjacent to a curved boundary, and with sufficient refinement the error arising from under-integration can be minimized. Meanwhile, the boundary terms are calculated using a 3-point quadrature rule in 1D as explained in Appendix B. From an implementation perspective, the system shown in (23) is interesting as the Dof objects corresponding to , and are not connected to any particular element. Instead they are associated with the Numerics instance itself, which implies that only one such object should be instantiated for the entire problem; inhomogeneity is handled through the use of appropriate Material objects which provide the necessary shear modulus for each subdomain.

To verify the implementation, we examine the case of a shaft with an elliptical cross section centered at with major axis oriented along the -direction. In this particular case the analytical solution for the warping function is given by[15]

(25)

where and are the respective lengths of the semi-major and semi-minor axes. For the numerical simulation, we take and , and the shear modulus and rate of twist are also set to unity, i.e., . The finite element solution for is plotted in Figure 2(a) for the coarsest mesh used in the study.

(a)
(b)
Figure 3: Saint-Venant torsion of a bar with elliptical cross section: (a) FE solution of the warping function using isoparametric 6-node triangles, and (b) convergence of to the analytical solution with hierarchical refinement of the mesh.

Furthermore, we can see from Figure 2(b) that the convergence rate with respect to mesh refinement is slightly better than cubic. This is to be expected for this particular case, i.e., even though the analytical solution is contained in the approximation space used, it is nonetheless impossible to reproduce it exactly as representation of the geometry with 6-node triangles results in the outer layer of elements having curved edges. Also, the warping function itself is physically meaningful only when the center of twist coincides with the origin. For cross sections having their center of twist elsewhere, the cross sectional warping is given by which must be calculated according to (10c). This is illustrated in Figure 4.

(a)
(b)
Figure 4: Numerical results showing (a) the warping function , and (b) the displacement component for an elliptical cross-section centered at .

Next, we use the same Numerics class as above to analyze a composite shaft consisting of an L-shaped lower block, an angled middle section and a hollow insert as shown in Figure 4(a).

(a)
(b)
(c)
Figure 5: Torsion of composite shaft, showing (a) geometry and discretization, (b) distribution of , (c) distribution of . It can be verified from the latter two plots that continuity of traction components normal to material interfaces is properly enforced across said surfaces.

Both the lower block and the hollow insert are made up of the same material, whose shear modulus is set to a value of , whereas the middle section has . The angle of twist is prescribed as 0.003. Once the proper mesh has been generated, setup of the problem is relatively straightforward as evident from Listing 6.

1*MESH_READER GmshReader
2*MESH_FILE CompositeSection.msh
3
4*SOLUTION_STAGES 1
5*FIELDS_PER_NODE 2
6*FIELDS_PER_CELL 4
7
8*DOF_PER_NODE 1
9    1 DofGroup 1 NodalField 1 2
10*DOF_PER_CELL 0
11
12*NUMERICS 1
13    1 StVenantTorsion_Fe_Tri6
14        NodalDof 1
15        Stage 1
16        Subsystem 1
17        CellFieldOutput 4
18            1 s_zx
19            2 s_zy
20            3 s_mag
21            4 u_z
22        TwistPerLength 0.003
23        DofGroup 1
24
25*MATERIALS 2
26    1 LinearIsotropicElasticity Torsion 100
27    2 LinearIsotropicElasticity Torsion 10
28
29*DOMAIN_ASSIGNMENTS 3
30    "LowerBlock" Numerics 1 MaterialSet 1
31    "InnerAngle" Numerics 1 MaterialSet 2
32    "HollowInsert" Numerics 1 MaterialSet 1
33
34*OUTPUT_FORMAT Gmsh
35    FILENAME  Torsion
36    OUTPUT_STYLE ascii
37    NODE_DATA 1
38        SCALAR WarpingFunction 1
39    ELEMENT_DATA 0
40    ELEMENT_NODE_DATA 5
41        SCALAR Stress_zx 1
42        SCALAR Stress_zy 2
43        SCALAR Stress_magnitude 3
44        VECTOR StressVector 1 2 0
45        SCALAR Z_Displacement 4
46
47*CSV_OUTPUT 0
48
49*LOADSTEPS 1
50    1 PREPROCESSING 0
51       START_TIME 0.0
52       END_TIME 1.0
53       INITIAL_TIME_INCREMENT 1.0
54       MAX_SUBSTEPS 100
55       BOUNDARY_CONDITIONS  1
56           "LateralSurface" 1 TorsionBoundaryTerm 1 None
57       FIELD_CONDITIONS 0
58       SOLUTION_METHODS
59           Stage 1 LinearStatic
60               LinearSolver MKL_Pardiso nThreads 12 Unsymmetric
61       WRITE_INTERVAL 0
62       POSTPROCESSING 0
63*END
Listing 6: Input file for torsion of composite shaft.

In particular, the implementation defines only one type of boundary condition, TorsionBoundaryTerm. This automatically distinguishes between material interfaces and free surface boundaries depending on whether the lower-dimensional boundary element (3-node curve in this case) has one or two adjacent domain (6-node) elements. The distribution of shear stress components and are plotted in Figures 4(b) and 4(c). One can observe that varies continuously across vertical material interfaces while is continuous across horizontal interfaces, in accordance with (15). Meanwhile, the effect of relative rigidity between subdomains on the distribution and flow of shear stresses is shown on Figure 6.

(a) ,
(b) ,
(c) ,
(d) ,
(e) ,
(f) ,
(g) ,
(h) ,
(i) ,
Figure 6: Dependence of shear flow on ratio of shear modulus of the lower block and hollow insert to that of the middle section, . Colors and arrows describe the magnitude and direction of the traction vector .

We can see that as the relative stiffness of one subdomain becomes negligible compared to its counterpart across a material interface, the behavior of said internal boundary shifts towards that of a free surface which in turn affects the flow of shear.

6.2 Monolithic FE-FV formulation for Biot poroelasticity

The linear poroelasticity theory of Biot[9] is a popular choice for describing the coupled response between mechanical deformation and fluid flow in a porous medium. It is based on the following assumptions:

  1. the medium is saturated with a single-phase fluid,

  2. Darcy’s Law is valid,

  3. mechanical deformation is quasi-static and results in infinitesimal strains, and

  4. the porous medium skeleton exhibits linear elastic behavior.

The unknown fields are the displacement of the porous medium skeleton, and the fluid pressure . Meanwhile the governing equations consist of two PDEs: a linear momentum conservation equation for the porous medium and mass conservation equation for the fluid. For some given domain having boundary , the complete initial-boundary value problem can be expressed as

in (26)
on (27)
on (28)
in (29)
on (30)
on (31)
in (32)

wherein the quantity is termed the effective stress, and the total/poroelastic stress. The adopted sign convention assumes that positive values for normal stresses denote tension, whereas fluid pressure is taken positive in compression. For pure solid mechanics problems, the most common approach is to express the equilibrium equation in a weak/variational form and to employ a method such as finite elements, as the resulting secondary variables have a well defined physical meaning. On the other hand, it is often preferred to solve the primal formulation of the fluid mass balance equation with a method such as finite volumes that enforces local mass conservation across cell boundaries. In contrast, continuous Galerkin methods enforce node-wise conservation of element nodal fluxes[19]. In the following two examples, we combine a nodal finite element scheme for stress equilibrium with a cell-centered finite volume method for fluid flow. The former utilizes piecewise contiuous shape functions constructed over a simplex mesh, while the latter makes use of the two-point flux approximation scheme (TPFA) to construct the fluxes across simplex element boundaries. The resulting FE-FV stencil is depicted in Figure 7 together with the local coefficient matrix profile corresponding to a fully coupled approach.

(a)
(b)
Figure 7: Local stencil for combined FE-FV discretization of the poroelasticity problem and corresponding local coefficient matrix for a fully coupled solution.

Notably, the latter is not square and furthermore has around 1/4 of its total entries equal to zero. One can readily see that for these type of problems, passing local contributions to the global matrix assembler in triplet (COO) format is preferable to a standard rectangular matrix since it allows for exclusion of local components that are known to be identically zero. Otherwise, the resulting nonzero profile of the global coefficient matrix becomes denser than what is necessary, which would in turn adversely affect the performance of linear solvers.

6.2.1 Mandel’s problem

Mandel’s problem [23] is a popular benchmark problem for testing the performance of both discretization schemes and solution algorithms owing to the fact that it is a non-trivial time-dependent problem in 2D with a known analytical solution. The set-up is shown in Figure 8 and involves the squeezing of a saturated poroelastic slab between two rigid, frictionless and impervious plates.


(a)
(b)
Figure 8: Setup for Mandel’s problem, showing (a) full geometry, and (b) computational domain.

The slab and plates are assumed to extend infinitely along the out-of-plane () direction so that deformation follows plane strain conditions, i.e., , and and are independent of . Due to the problem symmetry, it suffices to consider only the quarter domain , for which the relevant boundary conditions are as follows:

(33a)
(33b)
(33c)
(33d)
(33e)
(33f)

In addition, initial conditions are set to zero for both and . Rigidity of the plates along with problem symmetry implies a uniform vertical displacement at that is a priori unknown but dependent only on time. To achieve this condition, one must be able to constrain all degrees of freedom pertaining to on the top boundary to have the same value. BROOMStyx allows the end user to specify master/slave types of constraints in the input file, by selecting one master DOF (e.g., the one residing on the node located at ) and declaring all other DOFs on as slaves. The constraint is enforced in a strong sense during the simulation: unknowns associated with the slave DOFs are eliminated from the system of equations prior to determining the global sparse matrix structure. This also lumps the right hand side terms (i.e. resultant forces) onto the master DOF, which corresponds to the integral-type condition given in (33d).

Analytical expressions for the displacements and stresses have been derived by Abousleiman et al.[2], supplementing the original solution of Mandel[23] for the pressure. Due to the instantaneous application of external loading, the solutions are time-discontinuous at . Material properties for the current simulation are taken from the paper of Mikelić et al.[24] and are listed in Table 5.

Symbol Description Value
Horizontal dimension 100 m
Vertical dimension 10 m
Drained Young’s modulus Pa
Drained Poisson ratio 0.2
Intrinsic permeability m
Dynamic fluid viscosity 1.0 cP
Biot-Willis coefficient 1.0
Fluid compressibility Pa
Table 5: Dimensions and material parameters for Mandel’s problem.

A comparison of analytical and numerical solutions is shown in Figure 9.

Figure 9: Analytical vs. numerical results for the Mandel problem.

We can observe that the combined FE-FV scheme yields numerical results of the pressure that are in very good agreement with the analytical solution.

6.2.2 Terzaghi 1-D consolidation

The consolidation of a layered porous medium under a uniform external distributed load was initially proposed in Haga et al. [18], and later utilized by Rodrigo et al. [30] to demonstrate the occurrence of numerical oscillations in finite element solutions for the pressure at early times even when inf-sup stable elements are used. The problem domain is a square having each side of unit measure and consisting of a middle region of low permeability sandwiched between two highly permeable layers as illustrated in Figure 9(a).

(a)
(b)
Figure 10: Layered porous medium undergoing consolidation. Geometry, material properties and boundary conditions shown in (a). The domain is discretized using a structured mesh of alternating triangles as shown in (b); with the plot line located at .

The medium is assumed to be saturated an incompressible fluid (). For all layers, the Lamé parameters as well the Biot-Willis coefficient are set to unity, i.e. . On the other hand, the mobility coefficient is set to 1 in the outer layers and in the middle layer. Zero initial conditions are assumed for both and , and the time step is chosen as . For the current example, the domain is discretized using a structured triangular mesh as shown in Figure 9(b), with numerical results for the pressure at along a vertical line situated 0.65625 units from the left side of the domain. For comparison, these are plotted in Figure 11 together with the numerical solution obtained using unstabilized Taylor-Hood (-) elements.

Figure 11: Vertical variation of pressure at along the plot line (See Figure 10) in the layered porous medium, obtained via the hybrid FE-FV scheme. Results reported by Rodrigo et al. [30] for unstabilized - elements are shown for comparison.

As the pressure is piecewise constant across elements in the FV scheme used to model the flow equation, the initial jump in pressure at the interface between the upper and middle layers can be properly reproduced without the occurrence of overshoots, in contrast with formulations that model pressure as a continuous variable in space. Nonetheless slight oscillations are observed at later times. In particular for the current set of material properties, the oscillations achieve maximum magnitude at around (in a classical checkerboard pattern) and gradually disappear thereafter.

6.3 Brittle fracture of a heterogeneous specimen

Phase-field models for fracture are a relatively recent development in the direction of diffuse approaches to fracture modeling that have garnered a lot of interest in the last decade. In the phase-field method, it is assumed that the total potential energy associated with a fracturing solid medium is composed of bulk and surface terms along with external work terms, and can be regularized using a scalar damage/phase-field variable to yield the functional[10]

(34)

where and

denote respectively the displacement field and infinitesimal strain tensor,

is the body force, and is the external traction acting on the Neumann boundary . On the other hand, is the critical energy release rate from Griffith’s theory of brittle fracture and is a length scale controlling the amount of regularization/diffusion in the resulting crack representation. The functional in (34) leads to the classical 2nd-order phase-field model that is utilized in the current example. The regularized bulk energy density