## 1. Introduction

Many problems in insurance and finance lead to terminal or boundary value problems involving parabolic partial integro-differential equations (PIDEs). Examples include option pricing in models with jumps, the valuation of insurance contracts, ruin probabilities in non-life insurance, optimal reinsurance problems and many applications in credit risk. These PIDEs can be linear (such as PIDEs arising in risk-neutral pricing) or semilinear (such as the dynamic programming equation in many stochastic control problems). Practical applications often involve several underlying assets or economic factors, so that one has to deal with PIDEs in a high-dimensional space. These PIDEs do typically not admit an analytic solution, making the design of suitable numerical methods an ongoing challenge. Existing numerical methods include deterministic schemes such as finite difference and finite element methods and random schemes based on Monte-Carlo methods. However, finite difference and finite element methods (see e.g.

cont2005finite, andersen2000jump, matache2004fast, kwon2011second, briani2007implicit) cannot be used in the case of high-dimensional PIDEs as they suffer from the curse of dimensionality. Monte-Carlo methods (see e.g.

giles2008multilevel, casella2011exact, metwally2002using) on the other hand are suitable for problems in higher dimensions. However, these methods only provide a solution for a single fixed time-space point . This is problematic in risk management applications, where one needs to find the solution of a pricing problem for a large set of future scenarios. The naive solution via nested Monte Carlo is in most cases computationally infeasible. Regression based Monte Carlo methods (see bib:glasserman-03) can sometimes help, but the choice of proper basis functions remains a delicate issue. Moreover, it is not straightforward to apply Monte Carlo techniques to semilinear parabolic equations.For these reasons many recent contributions study machine learning techniques for the numerical solution of PDEs. A large strand of this literature is based on the representation of semilinear parabolic PDEs via backward stochastic differential equations (BSDEs). In the seminal papers han2017overcoming and bib:e-han-jentzen-17, a BSDE is discretized using a time grid , and the solution at the initial date and its gradient at every time step

are approximated by a combined deep neural network. The parameters are trained by minimizing the difference between the network approximation and the known terminal value of the solution. An error estimation of this method is given by

han2020convergence. kremsner2020deep consider an extension to elliptic semilinear PDEs and applications to insurance mathematics. Other contributions use a backward induction over the time steps . To begin with, hure2020deepestimate the solution and its gradient simultaneously by backward induction through sequential minimizations of suitable loss functions; moreover, they provide convergence results for their method. The paper of

beck2019deep uses a different discretization method, called*deep splitting*, that computes the unknown gradient of the solution by automatic differentiation, which reduces the size of the networks. For linear PDEs one may use instead the simpler global regression approach of beck2018solving. This paper uses a Feynman-Kac representation and the -minimality of conditional expectations to characterize the solution by means of an infinite-dimensional stochastic minimization problem which is solved with machine learning. pham2021neural combine the ideas of [hure2020deep] and [beck2019deep] to introduce a neural network scheme for fully nonlinear PDEs. Finally, germain2020deep extend the method in [hure2020deep] and they provide a convergence analysis that can be adapted to show convergence of the deep splitting method of [beck2019deep].

Applications of deep learning methods to partial *integro* differential equations on the other hand are scarce. castro2021deep presents an extension of [hure2020deep] to PIDEs and he generalizes the convergence results of [hure2020deep] to his algorithm. Numerical case studies are however not provided. In fact, from a numerical viewpoint the method of castro2021deep is quite involved, since one needs to approximate the solution, the gradient and the non-local term in the PIDE via three separate networks. The work of al2019applications is based on the *deep Galerkin method* of sirignano2018dgm. This is an alternative machine learning approach for PDEs, where the network is trained by directly minimizing the deviations of the fitted functions from the desired differential operator and boundary conditions.

In this paper we consider DNN algorithms for linear and semilinear parabolic PIDEs that generalize the regression approach of beck2018solving and the deep splitting method of beck2019deep, respectively. In the semilinear case we first linearize the equation locally in time using a time grid , . Then we perform a backward induction over the grid points, using in each step the DNN algorithm for the linear case. The advantage of this approach, as opposed to the method of castro2021deep, is the fact that we approximate only the solution by a DNN so that it suffices to train a single network per time step. In the semilinear case we propose a alternative linearization procedure to beck2019deep, and we show in numerical experiments that this procedure performs substantially better than the original deep splitting method of beck2019deep. Moreover, we apply our DNN algorithms also to boundary value problems, while beck2018solving and beck2019deep consider only pure Cauchy problems.

The focus of our paper is on applications to insurance and finance. Moreover, existing results on the convergence of DNN algorithms for PDEs give little guidance on how to construct an optimal network that achieves a given level of accuracy. For these reasons an extension of the convergence results from [hure2020deep] to PIDEs is left for future research. To assess accuracy and performance of our methodology we instead carry out extensive tests for several multi-dimensional PIDEs arising in actuarial mathematics. As a first test for the linear case we consider the pricing of a stop-loss type reinsurance contract in a model where claims arrive with stochastic intensity (see, e.g. grandell2012aspects, ceci2020value). In a second example we compute the ruin probability for an insurance company with three different business lines, which leads to a boundary value problem. In both cases we assess the performance of the deep learning method by comparing the solution to the results of an extensive Monte Carlo simulation. We go on and study semilinear PIDEs arising in stochastic control problems for jump diffusions. The first test case is the multi-dimensional linear quadratic regulator problem, see for instance oksendal2007applied. The second case study is an optimization problem in insurance. We consider an insurer who dynamically optimizes his holdings of some insurance portfolio in the presence of transaction costs and risk capital constraints. In the absence of capital constraints the problem admits an analytic solution and is therefore a useful test case. With capital constraints the model leads to a semilinear boundary value problem for which there is no explicit solution, and we study this case numerically. Our numerical experiments show that in all test cases the performance of the proposed approximation algorithms is quite satisfying in terms of accuracy and speed.

## 2. Modeling Framework

We fix a probability space , a time horizon and a right continuous filtration . Consider measurable functions , and , where is a separable measurable space. We assume that supports a -dimensional Brownian motion and a Poisson random measure on . The compensator of is given by for a sigma-finite measure on . We consider a -dimensional process that is the unique strong solution to the SDE

(2.1) |

Define the set and note that jumps at whenever . We assume that

(2.2) |

This condition ensures that has a.s. only finitely many jumps on , so that every integral with respect to is well-defined. The restriction to finite activity jump processes simplifies the exposition and it is sufficient for most applications in insurance. Conditions on the coefficients , and ensuring that the SDE (2.1) has a unique strong solution are given for instance in bib:gihman-skohorod-80 or in bib:kliemann-koch-marchetti-90.

By we denote the functions that are times continuously differentiable on the set , and by we denote functions that are once continuously differentiable in and -times continuously differentiable in on the set . For every function we write for the first derivatives of with respect to for respectively, for second derivatives, for , and finally denotes the first derivative with respect to time.

Define the matrix , by

and consider for the integro-differential operator given by

(2.3) | ||||

(2.4) |

The operator is the generator of , and is a solution of the martingale problem for , see bib:ethier-kurtz-86 for details.

Consider functions , and , and let be an open subset of . In Section 3 we are interested in the following boundary value problem

(2.5) | |||

(2.6) |

The special case corresponds to a pure Cauchy problem without boundary conditions; in that case we use the simpler notation to denote the terminal condition.

It follows from the Feynman-Kac formula that under some integrability conditions a classical solution of (2.5) has the probabilistic representation

(2.7) |

where . Conversely, bib:pham-98 and bib:colaneri-frey-21 provide conditions ensuring that the function defined in (2.7) is a classical solution of the problem (2.5) (for the case of a pure Cauchy problem). Further existence results for linear PIDEs include bib:gihman-skohorod-80, bensoussan1982optimal, and davis2013jump.

In Section 3 we propose a deep neural network (DNN) algorithm to approximate the function defined in (2.7). In Section 5 we are interested in semilinear problems of the form

(2.8) | |||

(2.9) |

where is a nonlinear function such as the Hamiltonian in a typical Hamilton Jacobi Bellman equation. To handle this case we partition the interval into time points , consider a linearized version of (2.8) for each subinterval , and apply the DNN algorithm for the linear case recursively.

## 3. Deep neural network approximation for linear PIDEs

In this section we consider linear PIDEs. We extend the regression-based algorithm of beck2018solving to PIDEs and we include boundary conditions into the analysis.

### 3.1. Representation as solution of a minimization problem

Fix some time point and a closed and bounded set . Define the function by the Feynman-Kac representation (2.7). We want to compute an approximation to the function on the set . The key idea is to write this function as solution of a minimization problem on an infinite dimensional space. This representation is used to construct a loss function for our deep neural network method.

Consider some random variable

whose distribution is absolutely continuous with respect to the Lebesgue measure such that the corresponding density has support (in applications the distribution ofis often the uniform distribution on

) and denote by the solution of the SDE (2.1) with initial value . Define the random variableAssume that and that the function belongs to . Since is a Markov process it holds that , where is the sigma-field generated by . Since is square integrable we thus get from the -minimality of conditional expectations that

(3.1) |

Since and since the density of is strictly positive on we conclude that is the unique solution of the minimization problem

(3.2) |

The problem (3.2) can be solved with deep learning methods, as we explain next.

### 3.2. The algorithm

The first step in solving (3.2) with machine learning techniques is to simulate trajectories of up to the stopping time . The simplest method is the Euler-Maruyama scheme. Here we choose a time discretization , ,^{1}^{1}1We use to index the time steps in the Euler-Maruyama scheme and to index the grid points used in the linearization step of the deep splitting method in Section 5.
generate simulations of the random variable and simulate paths of up to the stopping time by the following recursive algorithm. We let , and for ,

(3.3) | ||||

(3.4) |

Note that the integrand in the integral with respect to is evaluated at so that this integral corresponds to the increment of a standard compound Poisson process. Using these simulations we compute for each path

where the integrals on the right can be approximated by Riemann sums.

In the next step we approximate by a deep neural network . We determine the network parameters (training of the network) by minimizing the loss function

(3.5) |

For this we rely on stochastic gradient-descent methods; algorithmic details are given in the next section. This approach can be considered as a

*regression-based*scheme since one attempts to minimize the squared error between the DNN approximation and the given terminal and boundary values of the PIDE.

## 4. Examples for the linear case:

In this section we test the performance of the proposed DNN algorithm for linear PIDEs in two case studies. First we price a reinsurance contract in the model of ceci2020value, where the claims process follows a doubly stochastic risk process. In the second example we compute the ruin probability of a non-life insurance company with several business lines, which leads to a boundary value problem.

### 4.1. Valuation of an insurance contract with doubly stochastic Poisson arrivals

We consider an insurance company and a reinsurer who enter into a reinsurance contract with a given maturity . To model the losses in the insurance portfolio underlying this contract we consider a sequence of claim arrival times with nonnegative intensity process and a sequence of claim sizes that are iid strictly positive random variables independent of the counting process defined by . The loss process is given by We assume that the are Gamma(,) distributed with density

. This is a common choice in insurance. Moreover, the Gamma distribution is closed under convolution so that the sum of independent Gamma distributed random variables can be generated with a single simulation, which speeds up the sampling of trajectories from

. The claim-arrival intensity process satisfies the SDE(4.1) |

where is a standard Brownian motion. In this example it is convenient to write the process in the form . We assume that the reinsurance contract is a stop-loss contract, i.e. the indemnity payment is of the form with

(4.2) |

The market value at time of the reinsurance contract is defined by

(4.3) |

ceci2020value show that is the unique solution of the PIDE with terminal condition and generator

for , . There is no explicit solution for this PIDE, and we approximate on the set with a deep neural network . Table 1 contains the parameters we use. Paths of the processes and are simulated with the Euler-Maruyama scheme and .

Figure 1 shows the approximate solution obtained by the DNN algorithm; details on the training procedure are given in Remark 4.1.

As a reference we compute for fixed approximate values with Monte-Carlo using simulated paths for each point (paths are simulated with the Euler-Maruyama scheme).
The *relative -error* between the DNN approximation and the MC-solution is defined as

(4.4) |

Using 1000 simulations of we obtained a relative error of . On a Lenovo Thinkpad notebook with an Intel Core i5 processor (1.7 GHz) and 16 GB of memory the computation of via the training of a DNN took around 322 seconds, whereas the computation of with Monte-Carlo for a fixed point took around 4.3 seconds. This shows that the DNN approach is faster than the MC approach if one wants to compute for a grid with more than 100 grid points. Note also that the training of the networks can be accelerated further by using GPU computing.

###### Remark 4.1 (Details regarding the numerical implementation).

Our choice of the network architecture largely follows [beck2018solving]. Throughout we use a DNN with 2 hidden layers. In the reinsurance example we worked with 50 nodes for the hidden layers. For a payoff of the form (4.2) it is advantageous to use softplus

as activation function for the hidden layers and the

identity for the output layer. The networks are initialized with random numbers using the Xavier initialization. We use mini-batch optimization with mini-batch size, batch normalization and 10000 epochs for training. We minimize the loss function with the Adam optimizer and decrease learning rate from 0.1 to 0.01, 0.001 and 0.0001 after a 2000, 4000 and 6000 epochs.

### 4.2. Ruin probability of an insurer with different business lines

In the second case study we consider a non-life insurer who provides financial protection against losses from earthquakes, storms and floods over a given time horizon . The occurrence of these natural disasters is modeled using independent Poisson counting processes with constant intensities . The magnitudes of hazards caused by earthquakes, storms or floods are modeled by iid sequences , , of positive random variables.

Furthermore we consider catastrophic events that cause simultaneous losses in the windstorm and flood business lines. These events follow a Poisson process with constant intensity and iid claim sizes sequences , . Define . Then the risk process is given by

(4.5) | ||||

(4.6) | ||||

(4.7) |

where is a -dimensional standard Brownian motion, and . The continuous terms are an approximation of the difference of cumulative premium payments and small losses up to time for each business line. We assume that all claim sizes are Gamma(,) distributed for . The generator of is given by

where

We define the ruin time as the first time the minimum of all three risk processes falls below zero, i.e. . Define the set and note that is the first exit time of the risk process from . Put

The ruin probability at time given is

(4.8) |

By standard arguments is the unique solution of the linear PIDE

(4.9) |

with boundary condition .

We approximate on with a deep neural network assuming that . Moreover we compare the result to Monte-Carlo approximations for fixed points . Table 2 contains the parameters we use.

Figure 2 shows the approximate solution obtained by the DNN algorithm on the sections , and . The network architecture is as in Remark 4.1, but since we are working in dimensions we choose mini-batch size and 100 nodes for the hidden layers. To verify our result we computed with Monte-Carlo for fixed using path simulations for each point. The relative error (4.4) was computed to (using 1000 simulations of ), which is again very small. Training of the network took around 740 seconds; the computation of the reference solution via Monte Carlo took around 20 seconds per point.

This example clearly shows the advantages of the DNN method over standard Monte Carlo for computing the solution on the entire set . Suppose that we want to compute the solution on (as in our case study). Even the very coarse grid has already 125 gridpoints, and computing the solution for each gridpoint takes approximately seconds, which is about three times the time for training the network.

## 5. Deep learning approximation for semilinear PIDEs

Next we consider semilinear PIDEs of the form

(5.1) | ||||

(5.2) |

Here is a nonlinear function and the integro-differential operator is given by (2.3). Our goal is to extend the deep splitting approximation method developed by beck2019deep for semilinear PDEs without boundary conditions to the more general equation (5.1). Moreover, we propose an approach for improving the performance of the method. In the derivation of the deep splitting method we implicitly assume that a classical solution of this PIDE exists; see for instance davis2013jump for results on this issue.

### 5.1. Basic method

We divide the interval into subintervals using a time grid and we let . Suppose that is the unique solution of the PIDE (5.1). Define for a fixed the function

Then also solves the linear PIDE

(5.3) |

Applying the Feynman-Kac formula to the linear PIDE (5.3) we get the following representation for

(5.4) |

Here solves the SDE (2.1) with initial condition , that is

and the stopping time is given by .
In the deep splitting method proposed in [beck2019deep] the integral term in (5.4) is approximated by the *endpoint rule*

(5.5) |

An approximate solution , is then found by backward induction over the grid points. The solution at the maturity is defined using the terminal condition, that is Given we then compute , as follows. First we put

(5.6) | ||||

(5.7) |

We then compute by applying the regression based DNN algorithm method from Section 3. In this step it is important to work with DNNs with a smooth activation function so that is well-defined.^{2}^{2}2If is not one has to define as DNN approximation to the terminal condition; see for instance germain2020deep.

###### Remark 5.1.

An additional complication may arise if the domain is unbounded. In that case we compute the approximate solution on some bounded set and we have to make sure that the set is big enough so that the probability of generating paths with but is sufficiently small so that these paths can be ignored without affecting the training procedure. There are various ways to achieve this, see beck2019deep for details.

### 5.2. Alternative method

Next we discuss an alternative linearization procedure based on the *midpoint rule*

(5.8) |

Here is the midpoint of the interval . It is well known that (5.8) usually provides a better approximation to an integral than the endpoint rule (5.5). We therefore propose the following approximation scheme based on the midpoint rule. In a first step we apply the approximation (5.6) over the smaller interval ; this yields an approximate solution at . Next we define

(5.9) | ||||

(5.10) |

as before, a numerical approximation to is computed via the deep learning algorithm from Section 3. Our numerical experiments in Section 6 show that the approximation based on the midpoint rule performs significantly better than the original deep splitting method from beck2019deep.

###### Remark 5.2 (Convergence results).

germain2020deep recently obtained a convergence result for the deep splitting method for PDEs. They consider only Cauchy problems (no boundary conditions) and they make strong Lipschitz assumptions on the coefficients of the equation. Under these conditions they come up with an estimate for the approximation error in terms of the mesh of the partition and the size of the DNN used in the individual time steps. We are confident that similar results can be obtained for PIDEs. However, theoretical convergence results are of limited practical use for designing an effective DNN approximation to PIDEs, and we leave this technical issue for future work.

## 6. Examples for the semilinear case

In this section we test the algorithm in two case studies. First we consider the well-known stochastic regulator problem. This problem leads to a semilinear PIDE with an explicit solution that can be used as a validity check for our methods. The second case study is an actuarial example dealing with the optimization of an insurance portfolio under transaction costs and risk capital constraints.

### 6.1. Stochastic regulator problem

The first example is the *stochastic linear regulator*. Denote by an adapted control strategy and consider the controlled -dimensional process with dynamics

(6.1) |

Here is a -dimensional standard Brownian motion, , , are positive constants, and is the compensated jump measure of independent compound Poisson processes with Gamma()-distributed jumps. Denote by the set of all adapted -dimensional processes with and consider the control problem

The interpretation of this problem is that the controller wants to drive the process to zero using the control ; the instantaneous control cost (for instance the energy consumed) is measured by . At maturity the controller incurs the terminal cost .

The Hamilton-Jacobi-Bellman (HJB) equation associated to this control problem is

with terminal condition . The minimum in the HJB equation is attained at , so that the value function solves the semilinear PIDE

Comments

There are no comments yet.