## 1 Introduction

Several hardware and software companies have recently announced or released augmented reality devices, such as Microsoft’s HoloLens, Sony’s SmartEyeglass, or Epson’s Moverio Pro. Such devices allow the user to see the real world augmented with additional information, such as the state of the object the user is looking at, or even holograms displaying 3D scenes over the real world.

One interesting application of such devices is to display the results of numerical simulations. Having simulation results ubiquitously available supports engineers and decision makers in the field Dibak2014; Dibak2015; Dibak2017b. As an example of such pervasive applications, consider an engineer who has to find a solution for placing a hot exhaust tube while deploying a machine in a factory. To this end, the engineer uses her augmented reality device, which directly shows the heat distribution on the surface of the tube as if the machine were operational. She can adjust the airflow surrounding the tube by changing parameters. The application enables her to see the heat even in complex regions, e.g., in bends, and to place the tube according to the surrounding materials. Other applications of mobile simulations include the visualization of simulation results based on readings from nearby sensors in Internet of Things applications, or simulations on drones that exploit the wind field to find energy-efficient flight routes Ware2016.

The main challenge for computing simulation results on mobile devices is computational complexity. While it is feasible to visualize even complex 3D data using dedicated GPUs on the mobile device, calculating this data is still challenging due to the slow processors and limited energy resources of mobile devices. Therefore, we present an approach to support interactive simulations on resource-constrained mobile devices by distributing computations between the mobile device and a server infrastructure. Moreover, the programmer should be able to re-use existing simulation code. Such code is typically optimized for the server architecture, which might include special hardware, such as additional GPUs or vector processing instruction sets, that is not available on the mobile device. Besides computation, communication overhead is critical for an efficient execution distributed across a mobile device and a remote server. Therefore, we also need to minimize the amount of data communicated between mobile device and server.

To support numerical simulations on mobile devices, in this paper we present a middleware that allows for the efficient distribution of simulations between a mobile device and a remote server infrastructure. This middleware is based on the so-called Reduced Basis Method (RBM), which reduces the complexity of simulations by pre-calculating an optimized basis that reduces the dimensionality of the simulation problem. The basic idea of applying the RBM to distributed mobile simulations is to calculate the reduced basis remotely on a server and to transfer the basis to the mobile device. Using the reduced basis, the mobile device can perform simulations locally with reduced computational overhead, calculating approximate solutions whose loss of quality is well-defined and bounded.

Using the RBM for efficient distributed simulations is not completely novel and has already been proposed in Huynh2010. However, approaches so far are rather static. They calculate and deploy a basis once, which is then used for all subsequent simulations. In contrast, our approach allows for the dynamic adaptation of the basis during runtime.

In detail, the contributions of this article are: (1) identification of a new class of serious augmented reality applications, namely interactive mobile simulations; (2) presentation of a mobile simulation middleware that supports the programmer in implementing such applications; (3) two approaches to enable mobile interactive simulations with and without a-priori known parameter restrictions; (4) two approaches optimizing the bottleneck on mobile devices, which is reading data from internal storage; (5) one approach to improve the pre-calculation step of the RBM for our mobile approach; (6) real-world evaluations, including a comparison of the run-time and energy consumption of the methods, showing that our approaches are up to 131 times faster and consume up to 73 times less energy than offloading everything to a server.

This work extends our prior work Dibak2017a, in which we presented two approaches to enable mobile interactive simulations and one approach to reduce the amount of data to be read from internal storage. Here, we add a second approach for reducing read operations, which builds on the previously introduced one. Additionally, we present a novel approach for optimized pre-calculation on the server that further reduces the energy cost on the mobile device at runtime.

The rest of the paper is structured as follows. Section 2 provides background on the RBM, followed by the system model in Section 3. Section 4 introduces a basic approach, which is extended to the adaptive approach in Section 5. To reduce the number of snapshots needed during query processing, we present the subspace approach and the reorder approach in Section 6 and Section 7, respectively. The subsequent section introduces a new approach for pre-processing on the server that improves the reorder approach. We then evaluate our approaches, discuss related work, and conclude the paper.

## 2 The Reduced Basis Method

Our approach to enable interactive mobile simulations uses the Reduced Basis Method (RBM) to solve complex numerical problems by calculating parts of the simulation on a remote server infrastructure. To make the problem and our solution easier to follow, this section gives a brief introduction to the numerical problems to be solved and to the RBM, before we explain our approach in the following sections. A more in-depth explanation of the RBM can be found in Patera2007; Haasdonk2011b; Haasdonk2014.

### 2.1 The Full Numerical Problem

Simulations predict the behavior of a system based on a model. Commonly, such models are described using differential equations. Such equations need to be discretized, which leads to large algebraic systems of equations of the form $A u = b$, where $A \in \mathbb{R}^{H \times H}$ is a given matrix, $b \in \mathbb{R}^{H}$ is a given vector called the right-hand side, $u \in \mathbb{R}^{H}$ is the unknown solution, and $H$ is the number of discretization points. We call this equation the *full problem*.

Simulation models contain parameters describing different properties of the system that can be changed. Such parameters can be used to interact with the system, e.g., to insert sensor readings or user input. To express the dependency on parameters, we formulate the full problem as

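$$A(\mu)\, u(\mu) = b(\mu), \tag{1}$$

where $\mu \in \mathcal{P}$ is a vector including all parameters of the simulation and $\mathcal{P}$ denotes the parameter space.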

### 2.2 Parameter Separable Matrices

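The essential part of the RBM is the parameter separability of the matrix $A(\mu)$ and the right-hand side $b(\mu)$. A parameter-dependent vector or matrix $M(\mu)$ is parameter separable if we know $Q$ scalar functions $\theta_q \colon \mathcal{P} \to \mathbb{R}$ that map from the parameter space to real numbers and $Q$ parameter-independent matrices $M_q$ that have the same shape as $M(\mu)$ such that

$$M(\mu) = \sum_{q=1}^{Q} \theta_q(\mu)\, M_q. \tag{2}$$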

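Such a representation can be derived from the model equation or by using Empirical Interpolation Methods Barrault2004. In the following, we write the separable forms of the problem matrix and the right-hand side as $A(\mu) = \sum_{q=1}^{Q_A} \theta_q^A(\mu)\, A_q$ and $b(\mu) = \sum_{q=1}^{Q_b} \theta_q^b(\mu)\, b_q$.

### 2.3 The Reduced Problem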

The RBM represents approximate solutions of the full numerical problem as linear combinations of *snapshots*. Snapshots are pre-computed, linearly independent solutions $u(\mu_1), \dots, u(\mu_N)$ for typical parameters $\mu_1, \dots, \mu_N$. The snapshots form the snapshot matrix $V = [u(\mu_1), \dots, u(\mu_N)] \in \mathbb{R}^{H \times N}$. The approximation to the real solution is then $\tilde{u}(\mu) = V u_N(\mu)$, where the vector $u_N(\mu) \in \mathbb{R}^{N}$ is called the reduced solution. The size of the reduced solution is the number of snapshots $N$.

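Using $u(\mu) \approx V u_N(\mu)$, we can rewrite Equation 1 as $A(\mu) V u_N(\mu) = b(\mu)$. This is an overdetermined system with $H$ equations but only $N$ unknowns. Therefore, we multiply the full problem from the left with $V^\top$, which yields our reduced problem $A_N(\mu)\, u_N(\mu) = b_N(\mu)$ with $A_N(\mu) = V^\top A(\mu) V$ and $b_N(\mu) = V^\top b(\mu)$. We call $A_N(\mu)$ the reduced matrix with snapshot matrix $V$. Notice that $A_N(\mu)$ is again a separable matrix. The matrices $A_{N,q} = V^\top A_q V$ ($q = 1, \dots, Q_A$) in this separation can be pre-computed, and the matrix $A_N(\mu)$ can be rapidly assembled:

$$A_N(\mu) = V^\top \Big( \sum_{q=1}^{Q_A} \theta_q^A(\mu)\, A_q \Big) V = \sum_{q=1}^{Q_A} \theta_q^A(\mu)\, A_{N,q}.$$

Similarly, $b_N(\mu)$ can be partially pre-computed as $b_{N,q} = V^\top b_q$ ($q = 1, \dots, Q_b$) and rapidly assembled as $b_N(\mu) = \sum_{q=1}^{Q_b} \theta_q^b(\mu)\, b_{N,q}$.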

The process of computing a reduced solution is now to solve $A_N(\mu)\, u_N(\mu) = b_N(\mu)$ and then to reconstruct the solution $\tilde{u}(\mu) = V u_N(\mu)$ in the full problem space. Solving the low-dimensional problem is much faster than solving the full problem, as it only has the size of the number of snapshots $N$, which is typically much smaller than the full problem size $H$.
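The offline/online split above can be illustrated with a minimal NumPy sketch. It assumes dense matrices, a scalar parameter, and a hypothetical toy problem $A(\mu) = K + \mu I$ with a 1D Laplacian $K$ and $b(\mu) = \mathbf{1}$; the function names `offline_reduce` and `online_solve` are illustrative and not part of the middleware.

```python
import numpy as np

def offline_reduce(A_qs, b_qs, V):
    # Pre-compute A_{N,q} = V^T A_q V and b_{N,q} = V^T b_q once (costly, H-dimensional).
    A_N_qs = [V.T @ A_q @ V for A_q in A_qs]
    b_N_qs = [V.T @ b_q for b_q in b_qs]
    return A_N_qs, b_N_qs

def online_solve(theta_A, theta_b, A_N_qs, b_N_qs, V, mu):
    # Assemble A_N(mu) and b_N(mu) from the pre-computed terms (cheap, N-dimensional).
    A_N = sum(th(mu) * A_Nq for th, A_Nq in zip(theta_A, A_N_qs))
    b_N = sum(th(mu) * b_Nq for th, b_Nq in zip(theta_b, b_N_qs))
    u_N = np.linalg.solve(A_N, b_N)   # reduced solution u_N(mu)
    return u_N, V @ u_N               # reconstruction u~(mu) = V u_N(mu)

# Hypothetical toy problem: A(mu) = K + mu * I (1D Laplacian K), b(mu) = 1.
H = 200
K = 2 * np.eye(H) - np.eye(H, k=1) - np.eye(H, k=-1)
A_qs, b_qs = [K, np.eye(H)], [np.ones(H)]
theta_A = [lambda mu: 1.0, lambda mu: mu]
theta_b = [lambda mu: 1.0]

def full_solve(mu):
    return np.linalg.solve(K + mu * np.eye(H), np.ones(H))

# Three snapshots, orthonormalized (QR) for numerical robustness.
V, _ = np.linalg.qr(np.column_stack([full_solve(mu) for mu in (0.01, 0.1, 1.0)]))
A_N_qs, b_N_qs = offline_reduce(A_qs, b_qs, V)
u_N, u_approx = online_solve(theta_A, theta_b, A_N_qs, b_N_qs, V, 0.5)
```

Because $0.1$ is one of the snapshot parameters, the reduced solution for $\mu = 0.1$ reproduces the full solution up to round-off; for other parameters, $\tilde{u}(\mu)$ is only an approximation whose quality has to be checked.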

Using a reduced basis instead of solving the full problem typically degrades the quality of the solution. To express the quality of the solution, we use the norm of the residual of the approximation as an error indicator. The residual $r(\mu) = A(\mu)\tilde{u}(\mu) - b(\mu)$ is the difference between $A(\mu)\tilde{u}(\mu)$ and the right-hand side $b(\mu)$. If the residual is $0$, the approximation $\tilde{u}(\mu)$ is the exact solution of the algebraic problem $A(\mu)\, u(\mu) = b(\mu)$. The residual is a well-known measure for approximations in numerics and can be computed very efficiently using pre-computation (cf. Appendix). This allows for fast quality checks for specific parameters at run time.
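The Appendix is not reproduced in this section. As a hedged illustration, one standard way to evaluate the residual norm without $H$-dimensional work expands its square using the separable forms of $A(\mu)$ and $b(\mu)$:

$$\|r(\mu)\|^2 = \sum_{q,q'=1}^{Q_A} \theta_q^A \theta_{q'}^A\, u_N^\top (A_q V)^\top (A_{q'} V)\, u_N - 2 \sum_{q=1}^{Q_A} \sum_{q'=1}^{Q_b} \theta_q^A \theta_{q'}^b\, u_N^\top (A_q V)^\top b_{q'} + \sum_{q,q'=1}^{Q_b} \theta_q^b \theta_{q'}^b\, b_q^\top b_{q'}.$$

The sketch below pre-computes the parameter-independent terms once; the function names and the random test data are hypothetical, and the paper's Appendix may organize the terms differently.

```python
import numpy as np

def offline_residual_terms(A_qs, b_qs, V):
    # Parameter-independent terms, computed once with H-dimensional work.
    AV = [A_q @ V for A_q in A_qs]                          # A_q V, each H x N
    M_AA = [[X.T @ Y for Y in AV] for X in AV]              # (A_q V)^T (A_q' V), N x N
    M_Ab = [[X.T @ b_q for b_q in b_qs] for X in AV]        # (A_q V)^T b_q', length N
    M_bb = [[b1 @ b2 for b2 in b_qs] for b1 in b_qs]        # b_q^T b_q', scalars
    return M_AA, M_Ab, M_bb

def online_residual_norm(theta_A, theta_b, terms, u_N, mu):
    # ||A(mu) V u_N - b(mu)|| from the pre-computed terms, without H-dimensional work.
    M_AA, M_Ab, M_bb = terms
    tA = [th(mu) for th in theta_A]
    tb = [th(mu) for th in theta_b]
    sq = sum(tA[i] * tA[j] * (u_N @ M_AA[i][j] @ u_N)
             for i in range(len(tA)) for j in range(len(tA)))
    sq -= 2 * sum(tA[i] * tb[j] * (u_N @ M_Ab[i][j])
                  for i in range(len(tA)) for j in range(len(tb)))
    sq += sum(tb[i] * tb[j] * M_bb[i][j]
              for i in range(len(tb)) for j in range(len(tb)))
    return np.sqrt(max(sq, 0.0))   # clip tiny negative values caused by round-off

# Hypothetical random test data with Q_A = 2 and Q_b = 2.
rng = np.random.default_rng(0)
H, N, mu = 100, 4, 0.7
A_qs = [3 * np.eye(H) + 0.1 * rng.standard_normal((H, H)), 0.1 * rng.standard_normal((H, H))]
b_qs = [rng.standard_normal(H), rng.standard_normal(H)]
theta_A = [lambda m: 1.0, lambda m: m]
theta_b = [lambda m: 1.0, lambda m: m ** 2]
V, _ = np.linalg.qr(rng.standard_normal((H, N)))
u_N = rng.standard_normal(N)
fast = online_residual_norm(theta_A, theta_b, offline_residual_terms(A_qs, b_qs, V), u_N, mu)
```

Because the expansion subtracts terms of similar size, it can lose accuracy for residuals close to machine precision; the `max(sq, 0.0)` guard only prevents a negative argument to the square root.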

### 2.4 The Basis Generation Process

Using the residual as error indicator, the snapshots can be computed
from a training set of parameters using a *greedy*
approach Veroy2003.

Figure 1 depicts the pseudocode of the greedy basis generation method. The user provides a set of training parameters ($\mathcal{P}_{\text{train}}$) and a maximum threshold for the residual norm ($\varepsilon$). The initial basis can be either an existing basis or a basis consisting of the full solution for a random training parameter. The algorithm terminates when the generated basis yields a residual norm below the threshold for all parameters in the training set. In every iteration of the loop, one solution of the full problem is computed and added to the basis. For this computation, the training parameter that yields the maximum residual norm with the current basis is chosen.
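Figure 1 is not reproduced here; the loop described above could be sketched as follows. `full_solve` and `residual_norm` are hypothetical callbacks standing in for the simulation's solver and the residual evaluation, the toy problem is an assumption, and `max_size` is an added safety cap that is not part of the described method.

```python
import numpy as np

def greedy_basis(train_set, eps, full_solve, residual_norm, V=None, rng=None, max_size=50):
    # full_solve(mu) -> u(mu) in R^H; residual_norm(V, mu) -> ||A(mu) V u_N(mu) - b(mu)||.
    if V is None:
        # No existing basis: start from the full solution for a random training parameter.
        rng = rng if rng is not None else np.random.default_rng()
        u0 = full_solve(train_set[rng.integers(len(train_set))])
        V = (u0 / np.linalg.norm(u0))[:, None]
    while V.shape[1] < max_size:          # safety cap, not part of the method itself
        errors = [residual_norm(V, mu) for mu in train_set]
        worst = int(np.argmax(errors))
        if errors[worst] < eps:
            break                         # every training parameter meets the threshold
        u = full_solve(train_set[worst])
        for _ in range(2):                # Gram-Schmidt, repeated once for robustness
            u = u - V @ (V.T @ u)
        V = np.column_stack([V, u / np.linalg.norm(u)])
    return V

# Hypothetical toy problem: A(mu) = K + mu * I, b = 1.
H = 100
K = 2 * np.eye(H) - np.eye(H, k=1) - np.eye(H, k=-1)
b = np.ones(H)

def full_solve(mu):
    return np.linalg.solve(K + mu * np.eye(H), b)

def residual_norm(V, mu):
    # Evaluated in R^H for brevity; a real implementation would use pre-computed terms.
    A = K + mu * np.eye(H)
    u_N = np.linalg.solve(V.T @ A @ V, V.T @ b)
    return np.linalg.norm(A @ V @ u_N - b)

train_set = list(np.logspace(-2, 1, 20))
V = greedy_basis(train_set, 1e-6, full_solve, residual_norm, rng=np.random.default_rng(1))
```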

### 2.5 Limitations of the Reduced Basis Method

The RBM converts the full-dimensional problem into a lower-dimensional problem by computing on snapshots. The dimension of the reduced problem equals the number of snapshots, which depends on the characteristics of the problem and the quality required by the user. The method introduced here works only for stationary problems without any changes to the geometry. However, RBM approaches for time-dependent problems or problems with changing geometry have recently been proposed Haasdonk2014; Rozza2008. While such approaches could be combined with the approaches presented in this article, we focus on stationary problems with fixed geometry.

## 3 System Model

Next, we describe the assumptions on hardware and software components, as well as the interfaces between the components in our mobile simulation middleware. Figure 2 depicts an overview of the system components and interfaces.

### 3.1 System Components

The system consists of two compute nodes, namely the mobile device and the server. Both nodes are connected via a wireless communication channel. Furthermore, the system comprises three software components: the numerical simulation and the user application, both provided by application programmers, and the middleware, which defines the distribution of computations.

The mobile device is the augmented reality headset carried by the user. Energy consumption on the mobile device is critical as it is battery-powered. There are two distinct energy consumers on the mobile device, the processor and the communication module.

In contrast to mobile devices, the server provides fast execution. It can be scaled up by using specialized hardware, such as GPUs for efficient computation of numerical codes, or scaled out by adding more servers in a cloud infrastructure.

For the wireless communication channel between mobile device and server, we assume data rates of multiple Mbit/s, as provided by state of the art wireless communication technologies like IEEE 802.11 (WiFi) or 4G cellular networks.

The numerical simulation is implemented by the simulation expert. The simulation problem is implemented as a separable matrix $A(\mu)$ and a separable vector $b(\mu)$, as described in Section 2. Parameters of the simulation model are represented by a vector $\mu$. Additionally, the simulation expert has to define the quality requirements of the application by two parameters. The first parameter, say $\delta$, is the discretization of the full problem. The second, say $\varepsilon$, is the maximum residual norm, which is an indicator for the error introduced by the RBM.

The user application is implemented by the application programmer. It sends queries to the middleware. Queries contain a parameter vector , which encodes sensor data or user input. When the query is answered, the user application visualizes the simulation results on the augmented reality headset.

The mobile simulation middleware connects the components. It executes code on the server and on the mobile device. Intuitively, the reduced basis method will be used to answer queries with low latency on the mobile device, and the compute-intensive pre-computation of the reduced basis will be performed on the server.

### 3.2 Interfaces

The numerical simulation and the user application provide interfaces for the mobile simulation middleware. Figure 3 shows an overview of all interfaces. There are three methods of the numerical simulation to be called by the middleware and one method called by the user application.

The numerical simulation has to implement three interfaces providing the problem matrix, the right-hand side, and solutions to the simulation problem. The problem matrix and the right-hand side have to be provided in parameter separable form. These calls depend only on the quality parameter $\delta$. The interface to provide solutions of the simulation problem, called *snapshot*, provides $u(\mu)$ as the solution of the problem $A(\mu)\, u(\mu) = b(\mu)$ for a given parameter $\mu$. Notice that implementing the interface to provide snapshots is optional. The mobile simulation middleware could also use a generic algorithm to solve this problem. However, the simulation expert usually knows which solver should be used to solve the simulation problem efficiently.
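A possible shape of these interfaces, sketched in Python under assumptions: the method names `problem_matrix` and `right_hand_side`, the `Separable` container, and the `Rod1D` example are hypothetical (only *snapshot* is named in the text), and `generic_snapshot` stands in for the generic fallback solver mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Sequence

import numpy as np

@dataclass
class Separable:
    # Parameter separable form M(mu) = sum_q thetas[q](mu) * terms[q].
    thetas: Sequence[Callable[[np.ndarray], float]]
    terms: Sequence[np.ndarray]

    def evaluate(self, mu: np.ndarray) -> np.ndarray:
        return sum(th(mu) * T for th, T in zip(self.thetas, self.terms))

class NumericalSimulation(Protocol):
    # Hypothetical method names; only "snapshot" is named in the text.
    def problem_matrix(self, delta: float) -> Separable: ...
    def right_hand_side(self, delta: float) -> Separable: ...
    def snapshot(self, mu: np.ndarray, delta: float) -> np.ndarray: ...

def generic_snapshot(sim: NumericalSimulation, mu: np.ndarray, delta: float) -> np.ndarray:
    # Generic fallback the middleware could use: a dense direct solve of A(mu) u = b(mu).
    return np.linalg.solve(sim.problem_matrix(delta).evaluate(mu),
                           sim.right_hand_side(delta).evaluate(mu))

class Rod1D:
    # Hypothetical example: -u'' + mu[0] * u = mu[1] on (0, 1) with grid spacing delta.
    def problem_matrix(self, delta: float) -> Separable:
        H = int(round(1.0 / delta)) - 1
        K = (2 * np.eye(H) - np.eye(H, k=1) - np.eye(H, k=-1)) / delta**2
        return Separable([lambda mu: 1.0, lambda mu: mu[0]], [K, np.eye(H)])

    def right_hand_side(self, delta: float) -> Separable:
        H = int(round(1.0 / delta)) - 1
        return Separable([lambda mu: mu[1]], [np.ones(H)])

    def snapshot(self, mu: np.ndarray, delta: float) -> np.ndarray:
        return generic_snapshot(self, mu, delta)  # the expert could plug in a faster solver

sim = Rod1D()
u = sim.snapshot(np.array([1.0, 2.0]), 0.01)
```

Keeping the separable terms and the coefficient functions together in one object lets the middleware pre-compute reduced terms per summand without knowing anything about the underlying model.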

The user application sends queries to the mobile simulation middleware. Queries contain the parameter $\mu$. The middleware returns an approximate solution $\tilde{u}(\mu)$ that fulfills the quality requirements given by the simulation expert.

## 4 Basic Approach

In the following, we present our approaches for the efficient execution of mobile simulations using the Reduced Basis Method. We first present a basic approach in this section, which is then further extended to improve adaptability and energy efficiency in the following sections.

The basic approach for processing queries with different parameters on the mobile device consists of four steps: (1) generation of the reduced basis on the server; (2) communication of the reduced basis from the server to the mobile device where the reduced basis is stored on the internal storage; (3) loading the reduced basis from the internal storage of the mobile device; (4) processing queries on the mobile device using the reduced basis.

The generation of the basis is executed on the server. To this end, the mobile device sends a request to the server that contains all information needed for the basis generation process. This includes the training set $\mathcal{P}_{\text{train}}$ and the minimum quality, expressed as the maximum residual threshold $\varepsilon$, which depends on the application. The training set can be given by the domain expert or, in applications where sensor values are read, the mobile device can first collect some sensor data, statistically estimate the distribution of the parameter $\mu$, and then use this distribution to create the training set for the reduced basis.

Once the basis has been generated on the server, it is sent to the mobile device. The mobile device stores the basis on internal storage. Notice that the pre-computation of the reduced basis can take multiple minutes, depending on the numerical simulation code, the training set, and the number of snapshots needed to achieve the quality as specified. However, this step is only needed once for initialization and should not be performed when latency-sensitive queries need to be processed.

| Data | Size (floating-point values) |
|---|---|
| Snapshots | $H \cdot N$ |
| Reduced Problem Matrix | $Q_A \cdot N^2$ |
| Reduced Right-Hand Side | $Q_b \cdot N$ |
| Residual Computation Matrices | |

Figure 4 lists the size of the data communicated to and stored on the mobile device. The size of the data depends on the number of snapshots $N$, the number of discretization points of the full problem $H$, and the numbers of summands $Q_A$ and $Q_b$ in the separation of the problem matrix and the right-hand side. The number of discretization points $H$, which depends on $\delta$, is by far the largest factor, typically several thousand (in our evaluation up to with ). The number of snapshots $N$ depends on the residual threshold $\varepsilon$ and is typically below in our experiments. The numbers $Q_A$ and $Q_b$ are constant for a given problem. In our evaluation these values were and .
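To see why the snapshots dominate, the entries of Figure 4 can be evaluated for a concrete configuration. The sketch assumes 8-byte double-precision values and uses hypothetical sizes, not the values from our evaluation; the residual computation matrices are left out because their exact layout is described in the Appendix.

```python
def basis_payload(H, N, Q_A, Q_b, bytes_per_float=8):
    # Floating-point values per item of Figure 4 (residual matrices left out).
    counts = {
        "snapshots": H * N,
        "reduced_problem_matrix": Q_A * N * N,
        "reduced_rhs": Q_b * N,
    }
    # Total size in bytes, assuming double precision by default.
    return counts, sum(counts.values()) * bytes_per_float

# Hypothetical sizes, not the values used in the evaluation.
counts, total_bytes = basis_payload(H=10_000, N=30, Q_A=3, Q_b=1)
```

With these hypothetical numbers, the snapshots account for about 99% of the transmitted values.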

After the basis is stored in a file on the mobile device, this file needs to be read by the middleware on the mobile device. As the file size for the reduced basis can grow rapidly, reading the data from the file can take up to seconds. However, this step is needed only once and can be performed when the user starts the user application, long before the first query will be received by the middleware. The basis can then be stored in memory for processing of multiple queries.

Processing a query is then straightforward, as described in Section 2.3. First, we assemble the reduced system and compute the solution of the reduced problem. Then, we multiply the solution with the snapshots to obtain an approximate solution of the full problem. In addition to the approximate solution, we also calculate the residual of this approximation and provide this information to the application. Notice that this approach cannot guarantee that the quality constraints are fulfilled for queries with parameters outside the range of the training set. However, it is known that the quality of the result depends on the region of the parameters rather than on the density or specific choice of parameters in the training set Haasdonk2014. Therefore, for queries with parameters inside the range of the training set, the resulting approximation should have high quality. Furthermore, for many practical problems, the parameter region is known a priori from physical constraints. For example, if one parameter is the heat conductivity of some material, the application can request a reduced basis covering the range of all materials to be used for the specific purpose, e.g., all exhaust tubes ever used by the company.

The basic approach has several drawbacks. First, the parameter range needs to be known before the basis generation process. If the parameter range changes, e.g., because the range of sensor values changes, the approach has to start from scratch. We therefore present an adaptive approach in the next section. Another problem is the latency and energy overhead introduced by reading the reduced basis file from the internal storage of the mobile device. This overhead is significantly reduced by the subspace approach, which will be presented in Section 6.

## 5 Adaptive Approach

If the parameter range and distribution are not known a priori, the basic approach might not be able to fulfill the constraint on quality for all queries. We therefore introduce an adaptive approach next that refines the basis during runtime. This approach is more flexible and also suitable for harder simulation problems, i.e., problems that need more snapshots to fulfill the user requirements.

### 5.1 Overview

The adaptive approach builds upon the basic approach. As in the basic approach, some initial reduced basis is made available on the mobile device as described in the previous section. However, in contrast to the basic approach, when a new query arrives, the adaptive approach first computes the residual of the approximate solution provided by the RBM. If the residual fulfills the quality requirements of the application, the query is answered with the approximate solution. Otherwise, the mobile device requests an update of the reduced basis from the server. Once the mobile device receives the update, it computes the approximate solution again. Since the updated basis contains the solution for the query parameter, this approximation is, as a property of the RBM, the exact solution of the full problem. Figure 5 depicts the pseudocode of the adaptive approach.
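Figure 5 is not reproduced here. The following minimal sketch of the device-side logic assumes a hypothetical toy problem $A(\mu) = K + \mu I$, $b = \mathbf{1}$, and simulates the server with an in-process dictionary (`SERVER_BASES`) keyed by the tuple of snapshot parameters; all names are illustrative, and a real deployment would send the request over the network.

```python
import numpy as np

# Hypothetical toy problem shared by device and simulated server: A(mu) = K + mu*I, b = 1.
H = 100
K = 2 * np.eye(H) - np.eye(H, k=1) - np.eye(H, k=-1)
b = np.ones(H)

class ReducedBasis:
    def __init__(self, V, params):
        self.V, self.params = V, tuple(params)     # snapshot parameters serve as identifier
        self.K_N, self.b_N = V.T @ K @ V, V.T @ b  # pre-computed reduced terms (V^T V = I)

    def solve(self, mu):
        u_N = np.linalg.solve(self.K_N + mu * np.eye(len(self.params)), self.b_N)
        u = self.V @ u_N
        # Residual evaluated in R^H for brevity; the device would use pre-computed terms.
        return u, np.linalg.norm((K + mu * np.eye(H)) @ u - b)

SERVER_BASES = {}  # simulated server state: basis identifier -> snapshot matrix

def server_update(mu, basis_id):
    V = SERVER_BASES[basis_id]                      # load properties of the current basis
    u = np.linalg.solve(K + mu * np.eye(H), b)      # full solution: dominant server cost
    for _ in range(2):
        u = u - V @ (V.T @ u)                       # orthogonalize against the basis
    new = ReducedBasis(np.column_stack([V, u / np.linalg.norm(u)]), basis_id + (mu,))
    SERVER_BASES[new.params] = new.V
    return new

def adaptive_query(basis, mu, eps):
    u, res = basis.solve(mu)
    if res > eps:                                   # quality not met: request an update
        basis = server_update(mu, basis.params)
        u, res = basis.solve(mu)                    # basis now contains u(mu)
    return basis, u, res

u0 = np.linalg.solve(K + 1.0 * np.eye(H), b)
basis = ReducedBasis((u0 / np.linalg.norm(u0))[:, None], (1.0,))
SERVER_BASES[basis.params] = basis.V
basis, u, res = adaptive_query(basis, 0.01, eps=1e-6)
```

The second `solve` call returns the full solution up to round-off because the updated basis now contains $u(\mu)$.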

In the following, we describe the parts of the approach: the computation of the error indicator, the content of the server request, and the processing of the update on the server.

### 5.2 Error Indicator and Server Requests

To handle a query with parameter $\mu$, the mobile device has to compute an error indicator for the approximate solution provided by the RBM. This error indicator represents the quality of the approximate solution. One very generic error indicator is the residual norm. The computation of the residual can be implemented very efficiently by exploiting parameter separability (cf. Appendix).

Once the mobile device has computed the error indicator, it checks whether the quality bounds of the user are met. If the result is insufficient, the mobile device requests a basis update from the server. This request contains the parameter $\mu$ of the query and an identifier of the reduced basis currently in use. As an identifier, the parameters of the snapshots and the discretization of the underlying numerical simulation can be used.

### 5.3 Computation on the Server

When the server receives an update request with parameter $\mu$ and an identifier of the reduced basis, it first loads the properties of the reduced basis. It then computes a solution of the full problem with parameter $\mu$ and the discretization settings of the reduced basis. This full solution is orthogonalized against the existing basis vectors and normalized to obtain numerically more robust systems. The server then computes the updated separable reduced problem matrix and the separable reduced right-hand side (cf. Section 4). Last, the updated residual matrices are computed. All of these steps involve costly high-dimensional operations. However, the most time-consuming operation is the computation of the full solution. Therefore, there is only little overhead compared to a pure offloading approach, where only the full problem solution is computed on the server.
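Because the existing basis vectors are unchanged, the server only has to compute a new last row, a new last column, and a new diagonal entry for each $A_{N,q}$, plus one new entry for each $b_{N,q}$. A minimal NumPy sketch of this incremental update, assuming dense matrices and hypothetical function and variable names; the update of the residual matrices is omitted.

```python
import numpy as np

def extend_reduced_terms(A_qs, b_qs, V, A_N_qs, b_N_qs, u_new):
    # Orthogonalize the new full solution against the basis (twice for robustness).
    v = np.array(u_new, dtype=float)
    for _ in range(2):
        v -= V @ (V.T @ v)
    v /= np.linalg.norm(v)                          # normalize
    V_new = np.column_stack([V, v])
    A_N_new = []
    for A_q, A_Nq in zip(A_qs, A_N_qs):
        col = V.T @ (A_q @ v)                       # new last column (N entries)
        row = (v @ A_q) @ V                         # new last row (N entries)
        corner = v @ A_q @ v                        # new diagonal entry
        A_N_new.append(np.block([[A_Nq, col[:, None]],
                                 [row[None, :], np.array([[corner]])]]))
    b_N_new = [np.append(b_Nq, v @ b_q) for b_q, b_Nq in zip(b_qs, b_N_qs)]
    return V_new, A_N_new, b_N_new

# Hypothetical random data with Q_A = 2, Q_b = 1.
rng = np.random.default_rng(0)
H, N = 80, 4
A_qs = [rng.standard_normal((H, H)) for _ in range(2)]
b_qs = [rng.standard_normal(H)]
V, _ = np.linalg.qr(rng.standard_normal((H, N)))
A_N_qs = [V.T @ A_q @ V for A_q in A_qs]
b_N_qs = [V.T @ b_q for b_q in b_qs]
V_new, A_N_new, b_N_new = extend_reduced_terms(A_qs, b_qs, V, A_N_qs, b_N_qs,
                                               rng.standard_normal(H))
```

Per separable matrix term, the new entries amount to $2N + 1$ values, so this part of the update grows linearly with $N$, while the orthonormalized snapshot itself contributes $H$ values.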

### 5.4 Basis Updates

Once the server has computed the update of the reduced basis available on the mobile device, it sends the update back. The update includes a snapshot and updates for the separable matrices. Most entries of the matrices can be re-used, so the update only contains one new column and one new row vector per matrix. Nevertheless, the size of the update grows linearly with the number of snapshots included in the reduced basis. However, for a small number of snapshots, the dominant part is still the snapshot of the full problem. Therefore, the overhead compared to communicating only the full problem result is very small (for instance only for a 2D problem with points, , and snapshots).

## 6 Subspace Approach

In our analysis of the basic approach, we found that reading the
snapshots from internal storage is the major energy-consuming part. We
therefore present in this and the following section approaches for
reducing the number of snapshots needed for query processing on the
mobile device. In this section, we present the *subspace
approach*, which limits the computation of the problem to a subspace of
the vector space spanned by all snapshots.

The reduced basis is generated such that it fulfills the quality requirements for all parameters in the training set. However, for one specific parameter $\mu$, it might be sufficient to compute on fewer snapshots. In the subspace approach, we therefore limit the computation to the first $k$ snapshots in the order given by the reduced basis. That is, if $N$ snapshots $v_1, \dots, v_N$ are given, we want to find $k \le N$ such that the quality constraint is still fulfilled, and compute an approximation only on $v_1, \dots, v_k$ (cf. Fig. 6). This saves us from reading $N - k$ snapshots while still fulfilling the quality requirements of the user.

The subspace approach involves two problems. First, we explain how we can reuse the data structure of the reduced matrices for computation on a subspace. Second, we explain how we can find the number of snapshots $k$ given the quality constraint of the user. Finally, we briefly discuss how this approach can be combined with the adaptive approach.

### 6.1 Reusing the Data Structure

To compute a solution on the reduced basis spanned only by the snapshots $v_1, \dots, v_k$, we can reuse the existing data structure. We compute on sub-matrices that are created by trimming columns from the right and rows from the bottom.

For the reduced problem matrix, we just need the first $k$ rows and the first $k$ columns. Similarly, we only need the first $k$ entries of the reduced right-hand side. The residual matrices can be trimmed similarly. Notice that the reduced right-hand side and the reduced problem matrix are separable. Trimming them therefore means trimming every matrix and vector of their separable forms.
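Trimming works because the leading $k \times k$ block of $V^\top A_q V$ equals $V_k^\top A_q V_k$, where $V_k$ holds the first $k$ columns of $V$. A minimal NumPy sketch with hypothetical names and random data, including a hypothetical layout of the residual terms as nested lists indexed by $(q, q')$:

```python
import numpy as np

def trim_reduced_terms(A_N_qs, b_N_qs, k):
    # Leading k x k blocks and first k entries = reduced terms for the first k snapshots.
    return [A[:k, :k] for A in A_N_qs], [b[:k] for b in b_N_qs]

def trim_residual_terms(M_AA, M_Ab, k):
    # Same idea for residual terms (A_q V)^T (A_q' V) and (A_q V)^T b_q';
    # terms of the form b_q^T b_q' do not depend on the basis and stay unchanged.
    return ([[M[:k, :k] for M in row] for row in M_AA],
            [[m[:k] for m in row] for row in M_Ab])

rng = np.random.default_rng(0)
H, N, k = 60, 5, 3
A_qs = [rng.standard_normal((H, H)) for _ in range(2)]
b_qs = [rng.standard_normal(H)]
V, _ = np.linalg.qr(rng.standard_normal((H, N)))
A_N_qs = [V.T @ A @ V for A in A_qs]
b_N_qs = [V.T @ b for b in b_qs]
AV = [A @ V for A in A_qs]
M_AA = [[X.T @ Y for Y in AV] for X in AV]
M_Ab = [[X.T @ b for b in b_qs] for X in AV]

A_k, b_k = trim_reduced_terms(A_N_qs, b_N_qs, k)
M_AA_k, M_Ab_k = trim_residual_terms(M_AA, M_Ab, k)
```

No snapshot and no high-dimensional matrix is touched after the pre-computation; the sub-problem is obtained purely by slicing.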

Notice that reusing the data structure is essential at this point. Otherwise, re-computing the reduced problem matrix would involve the high-dimensional problem matrix and the snapshots. Using the sub-matrices, neither the problem matrix nor the snapshots are needed to compute the residual for the subspaces.

### 6.2 Subspace Selection

Now that we know how to compute a solution of the reduced problem on the sub-matrices, we want to find $k$ such that computing on $k$ snapshots gives us a solution that fulfills the quality requirements of the user. We denote the subspace spanned by the first $k$ snapshots by $V_k$.

To find $k$ for a query parameter $\mu$, we use a linear search. When a query with parameter $\mu$ arrives, we first load the reduced problem matrix and the residual matrices into memory. Starting with $k = N$, we then loop downwards: for each $k$, we compute the sub-matrices for $V_k$ and the residual for parameter $\mu$ on $V_k$, until we find the lowest $k$ such that $V_k$ fulfills the quality requirements. Once $k$ is known, we load the first $k$ snapshots from the file into memory and reconstruct the reduced solution in the full problem space.
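The top-down linear search could be sketched as follows. `residual_on_first(k)` is a hypothetical callback that returns the residual norm for the query parameter on the first $k$ snapshots; in the toy example (an assumption, like all names here) it is evaluated in $\mathbb{R}^H$ for brevity instead of from trimmed pre-computed terms. Since the residual is not guaranteed to decrease monotonically with $k$, the search stops at the first subspace that fails the check.

```python
import numpy as np

def find_subspace_size(N, eps, residual_on_first):
    # residual_on_first(k): residual norm for the query parameter on the first k snapshots.
    if residual_on_first(N) > eps:
        return None             # even all N snapshots are insufficient
    k = N
    while k > 1 and residual_on_first(k - 1) <= eps:
        k -= 1                  # top-down: stop at the first subspace that fails
    return k

# Hypothetical toy problem A(mu) = K + mu * I, b = 1, snapshots in a fixed order.
H = 100
K = 2 * np.eye(H) - np.eye(H, k=1) - np.eye(H, k=-1)
b = np.ones(H)
snapshot_params = [10.0, 1.0, 0.1, 0.01]
V, _ = np.linalg.qr(np.column_stack(
    [np.linalg.solve(K + mu * np.eye(H), b) for mu in snapshot_params]))

def make_residual(mu):
    A = K + mu * np.eye(H)
    def residual_on_first(k):
        # Evaluated in R^H for brevity; the device would use trimmed reduced terms.
        V_k = V[:, :k]
        u_N = np.linalg.solve(V_k.T @ A @ V_k, V_k.T @ b)
        return np.linalg.norm(A @ V_k @ u_N - b)
    return residual_on_first

k_first = find_subspace_size(4, 1e-6, make_residual(10.0))   # parameter of snapshot 1
k_last = find_subspace_size(4, 1e-6, make_residual(0.01))    # parameter of snapshot 4
```

For a query that matches the first snapshot parameter, the search descends to $k = 1$; for one that matches only the last snapshot, it stops at $k = N = 4$. A return value of `None` corresponds to the case where the adaptive approach would request a basis update.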

Alternatively, the linear search could run bottom-up, starting with one snapshot, or be replaced by bisection. However, these alternatives would result in longer search times when the number of snapshots needed is high.

The subspace approach can also be combined with the adaptive approach. If the quality check for $k = N$ fails, the mobile device can request a basis update from the server. This results in a three-level storage model, where snapshots are stored either in memory, on internal storage, or on the server.

## 7 Reorder Approach

For the subspace approach, the order of the snapshots is fixed. This might lead to suboptimal solutions, e.g., when the query has the same parameter as the last snapshot. In this example, the subspace approach needs to choose all snapshots. If we reordered the snapshots according to their importance, the snapshot with the same parameter would come first, and the subspace with only the first snapshot would already be sufficient. This motivates our reorder approach, which we introduce in this section as a step preceding the subspace approach. The reorder approach operates on pre-computed data in order to allow the subspace approach to reduce the number of snapshots needed for the computation.