1 Introduction
Suppose we are given an independent and identically distributed statistical sample from some unknown continuous density .^{1}^{1}1The subscript in does not indicate a partial derivative; it merely indicates that that
is associated with the random variable
. Estimating the density is one of the most common methods for discovering patterns in statistical data [44, 5, 43].When the support of is the whole real line, a simple and popular nonparametric method for estimating is the kernel density estimator
with a Gaussian kernel . Here is the socalled bandwidth parameter that controls the smoothness of the estimator (see, for example, [43, 44] and references therein).
It is wellknown that is not an appropriate kernel estimator when has compact support [18], which (without loss of generality) we may assume to be the unit interval . The main reason for this is that exhibits large boundary bias at the endpoints of the interval. For example, no matter how small the bandwidth parameter,
will have nonzero probability mass outside the interval
. Various solutions have been offered to cope with this boundary bias issue, which we may classify into three main types:
Sometimes, we not only know that has support on , but also we have extra information about the values of at the boundaries. One example of this situation is what we will refer to as a linked boundary condition, where we know apriori that
for some known given parameter .
This situation arises in the field of Systems Biology [25], where inferences from many snapshots of cell data from unknown time points are made via pseudotime algorithms and where data are known to be cyclical because of the cell cycle, for example. One then wishes to estimate the distribution of cells on the pseudotime scale. The value at the left boundary of this distribution must as a consequence of cell division be double the value at the right boundary. In other words, we have linked boundary conditions with the constant , but otherwise we do not know the value of the density at the boundaries of the domain. See [25], for example, for further motivation and discussion.
Unfortunately, to the best of our knowledge, all of the currently existing kernel density estimation methods, biascorrecting or not, cannot satisfactorily handle the linked boundary condition.
Figure 4 in the numerical section 5.2 shows a typical example of what can go wrong when currently available density estimators are applied to real biological data. The result is a smooth density with two unacceptable features:

The domain is not respected, and instead the solution has positive density for negative values of , and also for , which are physically unreasonable.

The density does not respect the important biological constraint of the linked boundary condition (that the left value should be double the right, in this particular application), and instead the density decays to zero as becomes large.
The purpose of this paper is to describe a new kernel density estimator that can handle the more general problem of linked boundary conditions with an arbitrary value of ; the situation of interest in the biological application where is then solved as an important special case. Figure 5 (C) shows a successful application of our proposed method.
Our proposed estimator is of type (a), that is, we construct a special kernel with support on , and such that the linked boundary condition is incorporated into the resulting estimator. Our kernel is inspired by the solution of a diffusiontype PDE [1, 3, 31, 38, 50]. In particular, we modify the diffusion model in [3] so that it satisfies the linked boundary conditions.
We also consider the discrete counterpart of the continuous model for two reasons. First, it is a numerical approximation to the continuous model and a useful way to compute the solution of the PDE. Second, the discrete model is relevant when we deal with data which is already prebinned in a large number of bins.
The rest of the article is organized as follows. In the next section we describe the continuous model for the application at hand. Our results provide the necessary assurances for this to be a useful model in statistics for the purposes of density estimation. Next, we study a discrete model of the same situation, and establish convergence to the continuous model. We then briefly discuss the issue of choosing a stopping time, including pointwise bias, asymptotic properties and the AMISE. These results are applied to an example with biological data in §5.2 and some simulated examples. We then finish the main part of the paper with a short conclusion. After that, we present a derivation of the solution in §7, and explanations of its properties in §8
. We formally solve the non selfadjoint initial boundary value problem as a contour integral using the unified transform, a novel transform for analysing boundary value problems for linear (and integrable nonlinear) partial differential equations. This formal solution is then rigorously verified and studied via a nonsymmetric heat kernel which generates a series expansion in nonseparable generalised eigenfunctions of the spatial derivative operator.
2 The Continuous Linked–Boundaries Model
First, recall the diffusion PDE approach to density estimation [3, 24], where a diffusion partial differential equation in one space dimension is employed for kernel density estimation purposes [43, 44]. We now present the continuous diffusion model that satisfies the linked boundary condition, and derive its analytical solution. Our proposed diffusion model for a linkedboundary KDE is the continuous solution of the PDE:
(1) 
Here the initial condition (IC),
(2) 
is simply the empirical measure of the given data . In other words, is a sum of Dirac delta distributions. Our results hold for the common cases of practical importance, such as for initial data of the form in (2), and also for functions that are probability densities (or even general distributions) which we denote by .
The first boundary condition (BC) involves a constant . Notice the ratio of the left boundary value to the right boundary value is
The constant ratio links the boundaries^{2}^{2}2Results that we present later, such as Theorem 1, can in fact be generalised to any complex (the PDE is degenerate irregular and the problem illposed when ), but due to the application we have in mind we stay with the case throughout this article. and is assumed to be nonnegative: .
This PDE (1) is a natural candidate for density estimation with such a linked boundary condition. However, it is not immediately obvious what properties the PDE has. For example, one question is whether or not the solution is a probability density. More fundamental questions concern existence and uniqueness, which are well known in the special selfadjoint case of periodic boundary conditions when .
2.1 Wellposedness of the continuous model
We begin by establishing existence and uniqueness in Theorem 1. It turns out that results analogous to the well known case hold. First we require the following definitions.
Definition 1.
Denote the class of finite Borel measures on by and equip this space with the vague topology (i.e. weak topology). Then we let denote the space of all functions
such that is continuous in the vague topology.^{3}^{3}3This is the condition that the integral is continuous as a function of for any given function continuous on the interval .
Notably, this is weaker than , where has the usual Banach space structure with total variation norm. In this case the would need to be continuous in this norm. For instance, .
The usual story in something like the finite element method, for example, is to first multiply the original differential equation (1) by a smooth test function, then integrate, and then apply integration by parts. This procedure leads to the weak formulation of the problem, and similar steps apply here too as follows, where we use to denote our smooth test functions. In terms of notation, denotes the derivative with respect to , whereas does not denote a derivative, and instead the subscript denotes the dependence on time. The notation denotes integration of the smooth test function against the measure.
The following adjoint boundary conditions are exactly those that arise from integration by parts (adjoint with respect to the standard inner product). That is, defined below in (3) is a subspace of , where is defined in (7), and where is the adjoint operator, that is defined in (47).
Definition 2.
Let denote all satisfying the adjoint linked boundary conditions
(3) 
Definition 3 (Weak Solution).
Let our initial condition . We say that is a weak solution to (1) for if and for all , is differentiable for with
(4) 
Theorem 1 (Well Posedness).
Assume our initial condition lies in and satisfies the linked boundary conditions in the very weak sense that . Then there exists a unique weak solution to (1) for for any , which we denote by .
For this weak solution is smooth in and real analytic in . Furthermore, in the more general setting of not necessarily being of the form (2), the solution has the following properties which generalise the classical case of :

If , then for any , converges to as . If then converges to as uniformly over the closed interval .

If and , then is the unique weak solution in and converges to as in .
2.2 Solution of the continuous model
If we ignore the constant in the boundary conditions of (1) (and replace it by the special case ), then we would have the simple diffusion equation with periodic boundary conditions. One can successfully apply Fourier methods, separationofvariables or Sturm–Liouville theory to solve the periodic version of this PDE [13, 19]. However, when , making the ansatz that a solution is of the ‘rank one’, separable form leads to no interesting solutions and separation of variables fails. It is possible to construct a somewhat nonobvious series solution. Surprisingly, this is not possible in general for well posed constant coefficient evolution equations on a finite interval [34]. The differential operator associated with the evolution equation in (1) is regular in the sense of Birkhoff [2] but not selfadjoint when due to the boundary conditions. It is possible to generalise the notion of eigenfunctions of the differential operator [6] and these generalised eigenfunctions form a complete system in [33, 26] (and in fact form a Riesz basis). However, the expansions become complicated due the fact that the characteristic values (which determine the spectrum) are not simple and the Green’s function does not have simple poles. In particular, this complicates the time dependence of the following series solution.
Theorem 2 (Representation).
The unique solution described in Theorem (1) has the following representations.
Fourier integral representation:
(5) 
Here the contours are shown in Figure 6 and are deformations of the boundaries of . The determinant function is given by and
Series representation:
(6) 
where and
In the more general case where , in addition to the usual separable solutions , the series expansion also includes the nonseparable solutions . We can understand these as being generalised eigenfunctions in the following sense (see the early papers [27, 48]). Define the operator, where denotes a derivative,
(7) 
Then it is easily checked^{4}^{4}4We use to denote the null space, which is sometimes often termed the kernel, of an operator, i.e.
is the space of all vectors
with . that . In particular, both and satisfy the required boundary conditions. Evaluating the series at gives a realisation of Theorems 3.1–3.5 of Chapter 5 of [26] which state that these generalised eigenfunctions form a complete system in . In particular, these functions block diagonalisethe operator in an analogous form to the Jordan normal form for finite matrices. If we consider any generalised eigenspace
corresponding to with and choose the basis , the operator acts on this subspace as the matrixwhich cannot be diagonalised for .
For our context of kernel density estimation, we define an integral kernel so that we can write the solution as
Owing to (39) and the residue calculus, this is given by the somewhat complicated expression
(8) 
which can be expressed in terms of the more common kernel and its derivative (see (44)). For the initial data (2) this gives the density estimate
Remark 1.
Theorem (1) also shows that the pointwise bias vanishes if is continuous with . Namely,
(9) 
uniformly in . This is in contrast to the standard Gaussian kernel density estimator [49] or its periodic version (when ) [3], where this is not the case at the boundary. In fact, a more careful analysis of §8.1 shows that if and then our estimator satisfies
with independent of .
2.3 Properties of the continuous model
The Maximum Principle [13, 19] for parabolic PDEs states that the diffusion equation attains a maximum on the ‘boundary’ of the two dimensional region in space and time . If our initial distribution is given by a continuous function then the maximum principle gives the following.
Proposition 1 (Bounds on Solution).
Suppose that the initial distribution is continuous with and nonnegative with for all . Then for any and we have
(10) 
In particular, remains bounded away from if and .
Similar bounds can be found for negative and even negative or complex , but we exclude those cases in this article.
The second boundary condition in words, says ‘left slope equals right slope’. A consequence of this BC is formal conservation of mass:
If we start with an initial condition that integrates to one, then the solution for all later times also integrates to one. In the context of density estimation, this essential property corresponds to conservation of probability. Seeing both that the solution is nonnegative (by the Maximum Principle above), and that the solution always integrates to one if the initial density integrates to one, confirms that the solution is indeed a probability density. This is made precise in the following Theorem (a more careful proof could come via integral kernels, but we omit that here) which does not require continuous initial data.
Theorem 3 (Conservation of Probability).
We also have the following, which immediately follows from (6), and describes the behaviour of the solution for large time.
Proposition 2 (Large Time Behaviour).
Suppose that the assumptions in Theorem (1) hold and denotes the unique weak solution, then as , converges uniformly on to the linear function
(11) 
This linear function is the unique stationary distribution that obeys the boundary conditions and has the same integral over as .
Two simple examples
Choose so that the linked boundary condition becomes, in words, the ‘left boundary value is double the right boundary value.’ A simple and explicit solution, which is a probability density, is
(12) 
The initial condition must be a probability density, and must be compatible with the above form in (12). In this example, the long time behaviour of (11) is . This solution in (12) is separable, and it has constant boundary values ( on the left, and on the right) that do not change in time. However, the solution in (12) is too simplistic. To be useful in applications we need solutions of (1), for any choice of , to have boundary values that automatically evolve in time, from the given initial data, and where we do not make assumptions about the form of that initial data. For instance, Figure 1 shows a second example, with , that, unlike (12), illustrates boundary values that evolve with time. This is one reason why we need results such as Theorem 2 which allows us to handle nonseparable solutions.
3 The Discrete Linked–Boundaries Model
The exact, continuous PDE (1) is replaced by a standard, finitedifference approximation that remains continuous in time but is discrete in space. This is sometimes known as the ‘method of lines’ [20], and it yields the first order ODE system:
(13) 
The structure of the matrix is described below. The solution of (13) is a vector that approximates the exact solution . That is, . Here is the th grid point on the grid of equally spaced points in the domain , for . The spacing between two consecutive grid points is The two boundary conditions give two equations involving values at the two boundary nodes, i.e. at node and at node . That is,
(14)  
(15) 
This motivates us to make the following definitions for the boundary nodes:
(16) 
(We have eliminated and from our equations in favour of and . We also always have that ) We are left with a set of equations involving unknown values , at the interior nodes (and only an corresponding matrix).
3.1 The Four Corners Matrix
All this leads to an matrix , with the following structure:
(17) 
The first row and the last row come from the boundary conditions. All interior rows come from the standard secondorder finite difference approximation of the (spatial) second derivative.
Various linked boundary conditions can be modelled by choosing different values for the four numbers and . We will only consider choices of and that maintain these key properties of the matrix (17), namely that it has zero column sum, and offdiagonals are negative and the main diagonal entries are positive. Then an attraction of this discrete model (13) is that it can be interpreted as a continuoustime Markov process. This ensures that the solution vector is a nonnegative probability vector. For example, our linked boundary conditions in (1), with , corresponds to the choice
which does allow the interpretation as a Markov process.
The solution of the discrete model problem (13), with a constant coefficient matrix comes as a matrix exponential
One way to compute that exponential is via diagonalization
(18) 
where is a diagonal matrix, with th diagonal entry given by the scalar exponential . The th column of the matrix
is the eigenvector corresponding to eigenvalue
.The ‘Four Corners Matrix’ (17), is a nonsymmetric example of a ‘tridiagonal Toeplitz matrix with four perturbed corners’ [51, 47]. Exact and explicit formulas for the eigenvalues and eigenvectors are available. Thus we have exact expressions for the solutions to this discrete model, via the above diagonalisation, if desired.
There is a unique zero eigenvalue, corresponding to the stationary density as . The stationary density is an affine function in the continuous PDE setting. In the discrete setting, with , the components of the stationary eigenvector are equallyspaced, in the sense that . Then it is easy to verify that the corresponding rank one matrix is ToeplitzplusHankel, for example by quickly checking the tests described in [47]. The same matrix (17), in the special symmetric case that , appears in the Four Corners Theorem of [47]. Here we observe that the Four Corners Theorem of [47] can be generalized to the nonsymmetric case. That is, the Four Corners Theorem still holds for the Four Corners Matrix (17) in our nonsymmetric case when : all functions of our nonsymmetric Four Corners Matrix (17), with in (19) below, are the sum of a Toeplitz matrix and a Hankel matrix. To see this is still true, we could check that the projections onto all eigenspaces are ToeplitzplusHankel. To do that, we could use the explicit expressions for the eigenvectors ( for or for given below in (20)). Then the entry of the rank one matrix , or of , is a sum of terms of the form . The elementary identity now allows us to write this as the sum of a Toeplitz part (coming from the part that is only a function of and not of or separately, which in our case is ) and a Hankel part (coming from functions of , i.e. ). Since the projection onto all eigenspaces is ToeplitzplusHankel, then the solution is likewise the sum of two parts: (i) a Toeplitz part, which can be thought of as the solution without boundary conditions; and (ii) a Hankel part, which is precisely the correction due to the boundary conditions.
All other eigenvalues of are negative. In the notation of [51] we have:
(19) 
and
so we are in the setting of [51, Theorem 3.2 (i)]. In the case that , we can group the spectral data into two classes with eigenvalues:
where
The zero eigenvalue, when , has already been discussed. Other eigenvalues correspond to eigenvectors with components (listed via subscripts)
(20) 
Some properties of the discrete model and its links to the continuous model are:

All eigenvalues of the Four Corners Matrix are purely real. Also, the eigenvalues of the operator in the continuous model are likewise purely real.^{5}^{5}5Recall that in this article we insist on the PDE in (1) where and real, and also where the equal slopes boundary condition applies. Generalisations to other values of and other boundary constraints are certainly possible, but we do not consider those generalisations, nor their spectra, in this article. This is perhaps surprising since the matrix is not symmetric, and the operator is not selfadjoint.

The Four Corners Matrix is diagonalizable, and in that sense of the right hand side of (18), the solutions to the discrete model are ‘separable.’ In contrast, the operator for the continuous PDE is not diagonalizable, and instead the analogy of the Jordan Normal Form from linear algebra applies to the operator. Despite this, it is straightforward to show that:

The eigenvalues of the discrete model converge^{6}^{6}6This can be made precise in say the AttouchWets topology  the convergence is locally uniform. to that of the continuous model (including algebraic multiplicities).

The eigenvectors converge to the generalised eigenfunctions of the continuous operator. Letting we have ()


Finally, the discrete model is stable in the maximum norm. This together with consistency and the fact that the continuous model is well posed implies that the discrete model converges to the solutions of the continuous model from the Lax Equivalence Theorem [36]. In particular, the bounds proven in Proposition 1 for the continuous model carry over to the discrete model with^{7}^{7}7We use a subscript to denote the matrix norm.
(21) From the interpretation of the discrete model as a Markov process, it follows that
and hence we also gain stability in any
–norm via interpolation.
In summary, we offer two different practical computational methods to perform density estimation with linked boundary conditions: (i) the discrete model, via the Four Corners Matrix; and (ii) the continuous model, via the solutions that we find here as a series or as a contour integral.^{8}^{8}8We provide MATLAB codes for both methods at https://github.com/MColbrook/KernelDensityEstimationwithLinkedBoundaryConditions. The two methods have relative advantages and disadvantages. The Four Corners Matrix is a finite difference method of modest accuracy but it is simple and easy to use. This might be attractive, especially if the initial data is already discretely binned, as is often the case. It also essentially guarantees maintaining positivity because of the attractive interpretation as a Markov process. In contrast, the series solution we provide for the continuous model is typically highly accurate for .
4 Choosing the stopping time
An important issue in density estimation is how to choose the bandwidth parameter (for kernel density estimation) or equivalently, the final time at which we compute the solution of the PDE. This issue has already received extensive attention in the literature. We now give a brief summary of that issue, and we also make two suggestions for known methods already available. After that, we address the issue specifically in the context of our linked boundaries model.
At one extreme, if we choose then we recover the initial condition, which is precisely the raw data, with no smoothing. At the other extreme, if we let , then we obtain a stationary density that is a linear, straight line, which contains no information whatsoever about the raw data. In between, , we have some smoothing effect while also retaining some information from the original data, and often this is interpreted as an optimal balance between the data and the prior.
One would like such a choice of an intermediate value of
to satisfy certain statistical properties. For example, as more and more data are included, we would like to converge to the ‘true’ density. Various proposals and their properties are available. Perhaps the most common choice is ‘Silverman’s rule of thumb’ which works very well when the data is close to being normally distributed. We expect that this choice is fine for the simpler datasets and examples that we consider in this article. However, the methods that we present in the article are more general, and for other applications it might well be necessary to adopt more sophisticated approaches.
In that regard, another possible approach is to use the output from the freely available software of one of the authors.^{9}^{9}9https://au.mathworks.com/matlabcentral/fileexchange/14034kerneldensityestimator In particular, the first output from that software is bandwidth=sqrt(t), when the software is called via
[bandwidth,density,xmesh,cdf]=kde(data,n,MIN,MAX).
This is expected to be a better choice than Silverman’s rule in situations where there are many widely separated peaks in the data.
4.1 Asymptotic properties
We now give a more precise treatment of the choice of stopping time for the linked boundaries model, as well as discussing the Mean Integrated Squared Error (MISE) defined by
(22)  
(23) 
Often one is interested in the asymptotic approximation to the MISE, denoted AMISE, under the requirements that and . The asymptotically optimal bandwidth is then the minimizer of the AMISE. For our continuous model of kernel density estimation we have the following result which gives the same rate of convergence as the Gaussian kernel density estimator on the whole real line.
Theorem 4.
Let be such that and and suppose that with . Then the following hold:

If then
(25) 
If then
(26)
Corollary 1.
Combining the leading order bias and variance terms gives the asymptotic approximation to the MISE:

If then
(27) Hence, the square of the asymptotically optimal bandwidth is
with the minimum value

If then
(28) Hence, the square of the asymptotically optimal bandwidth is
with the minimum value
A few remarks are in order. First, it is interesting to note that in the case of , the optimum choice and the minimum AMISE do not depend on , and are the same as the more familiar ‘whole line’ situation — in other words, we can confidently use existing methods in the literature (such as recommended above) to choose a stopping time. Second, it seems plausible that we could estimate (or the value of ) adaptively and change the boundary conditions in the model accordingly. A full discussion of solving the heat equation with linked boundary conditions for the first spatial derivative is beyond the scope of this paper, but can be done using the same methods we present here. Future work will aim to incorporate an adaptive estimate of the true boundary conditions (both for the density function and its first derivative) and resulting adaptive boundary conditions. We mention a result in this direction which will appear when we compare our model to that of [3], whose code is based around the discrete cosine transform, the continuous version of which solves the heat equation subject to the boundary conditions
(29) 
We have used the subscript to avoid confusion with our solution to (1). The analogous result to Theorem 4 is the following theorem which can be proven using the same techniques and hence we have omitted the proof. Similarly, one can then derive the optimum choice of and the minimum AMISE.
Theorem 5.
Let be such that and and suppose that . Then the following hold:

The integrated variance has the asymptotic behaviour
(30) 
If then
(31) 
If or then
(32)
5 Numerical Examples
5.1 Numerical Examples with Synthetic Data
Here we shall test the method for examples where the true density is known. We begin with the trimodal distribution shown in Figure 2. We have also shown a typical approximation of the distribution function using our proposed method and the density estimation proposed in [3] for a sample size of . The proposed method is more accurate near the boundaries of the domain (see zoomed in section of plot) and behaves similarly in the middle of the domain. For stopping times we have used the bandwidth bandwidth=sqrt(t) discussed in §4. We have also measured the error in two ways. We took the mean over simulations for each of both
and
where is the given estimator. Figure 2 shows these measures of error as we increase , as well as the minimum AMISE as a function of for each method. We can clearly see agreement with the analysis in §4.1. The difference between the two methods is more striking when measured using the uniform norm. We found this to be the case for a range of other tested distributions. Even in the case when and both methods obtain minimum AMISE scaling as , there was a significant difference in the uniform norm.
Next we consider the case when
is logconcave and not necessarily smooth. Denoting the PDF of the beta distribution with parameters
by , we letThe parameter controls the smoothness of near . We have compared our method to a method that computes logconcave maximum likelihood estimators [11, 10]. This seeks to compute the logconcave projection of the empirical distribution through an active set approach.^{10}^{10}10Code for this can be found at https://CRAN.Rproject.org/package=logcondens Details on this method and other similar methods can be found in [37], with a study of the more involved case of censored data in [12]. Tables 1 and 2 show the squared mean and errors respectively over simulations for , as we vary for the linked boundary diffusion estimator and the logconcave projection method (abbreviated to LC), as well as its smoothed version (LCS). In each case we have shaded the most accurate estimator. The linked boundary diffusion estimator performs much better when measured in the uniform norm but is slightly worse in the sense when the distribution function becomes less smooth. This is demonstrated in Figure 3 for a typical estimation using . The experiments in Tables 1 and 2 measured the errors over equally spaced points and were performed on a four year old laptop. The linked boundary diffusion estimator took about 0.1s on average per simulation (the cosine method based on the discrete cosine transform took about 7s ^{11}^{11}11The reason the linked estimator is quicker than the discrete cosine method is due to the exponential decay of the series for  only a small number of terms are needed to get very accurate results.), the logconcave projection took about 5s but its smoothed version was much slower, taking about 73s.
Linked  2.98E3  1.33E3  6.82E4  3.22E4  2.38E4 
LC  1.05E3  1.14E3  1.26E3  1.38E3  1.52E3 
LCS  1.23E3  1.03E3  9.42E4  1.04E3  1.19E3 
Linked  1.58E4  1.13E4  8.01E5  5.96E5  5.05E5 
LC  1.65E3  1.80E3  1.94E3  2.09E3  2.27E3 
LCS  1.30 E3  1.40 E3  1.66 E3  1.74E3  2.16E3 
Linked  7.32E2  4.19E2  2.52E2  1.31E2  7.97E3 
LC  5.34E1  6.40E1  7.51E1  8.71E1  1.00E0 
LCS  1.84E1  1.26E1  1.42E1  1.72E1  1.95E1 
Linked  4.42E3  2.85E3  1.18E3  4.78E4  2.39E4 
LC  1.14E0  1.28E0  1.44E0  1.60E0  1.78E0 
LCS  2.27E1  2.47E1  2.82E1  3.22E1  3.70E1 
5.2 Numerical Example with Cell Data
This short section demonstrates the application of the methods that we propose to a problem in biology with real data, where the data are taken from [25]. As mentioned in the introduction, Figure 4 shows an example of what goes wrong when current methods are applied. Figure 5 C demonstrate our proposed method, which performs satisfactorily.
Our example originates from the study of biological processes, in particular the cell cycle, with single cell data. The cell cycle can be split in different stages (Fig. 5 A): It starts with a growth phase (G1) followed by the Sphase in which DNA is replicated. The subsequent shorter growth phase (G2) then leads into Mphase (mitosis). A cell then divides into two daughter cells at the end of Mphase. The example dataset is obtained from an engineered cell line which has a fluorescent reporter protein named “geminin” which starts to accumulate once a cell transitions from G1 to Sphase and which is degraded during cell division (Fig. 5 A). This dataset compromises levels of DNA and geminin in single cells measured by flow cytometry (Fig. 5 B). The red curve in Fig. 5 B indicates the path that the average cell takes when it goes through the cell cycle. Pseudotime algorithms perform a dimensionality reduction by assigning a pseudotime value to each cell which can be interpreted as its position on the average curve. A recently developed theory based on ergodic principles makes it possible to derive a transformation to a realtime scale from cell ordering in pseudotime reflecting cell cycle progression ([21, 25]). The method relies on the distribution of cells on the pseudotime scale. As mentioned in the introduction, the distribution at the beginning and the end of the cell cycle are linked due to cell division by
(33) 
The example dataset (Figure 5) compromises levels of DNA and geminin in single cells in an exponential growing population of NCIH460/geminin cells measured by flow cytometry ([25]). A pseudotime algorithm provides a dataset of pseudotime values which represents a quantitative value for the progression of every single cell through the cell cycle. The kernel density estimator with linked boundary condition () thus produces an distribution that satisfies the conditions (33) on the density due to cell division (Figure 5 C).
6 Conclusion
Our study was motivated by a question from biology. This biological application required a method of density estimation that can handle the situation of linked boundaries. Boundary conditions are known to be a difficult problem in the context of kernel density estimation. To our knowledge, the linked boundary conditions that we handle here have not been previously addressed in the literature. We have provided two approaches: 1) a continuous PDE model with theoretical guarantees that its solution is well behaved and suitable for the proposed statistical applications; 2) a discrete model, which we prove converges to the continuous model. As a numerical example, we have applied the discrete model on data that arises in biology. The models and the analysis we provide naturally lead us to provide algorithms and software suitable for this application. We believe this can be useful to a wider community interested in this class of problems with linked boundary conditions. We found the method to compete well with other existing methods, both in terms of speed and accuracy. In particular, the method is more accurate close to the boundary.
There remain some open questions regarding the provided models. First, it seems likely that a good strategy may be to adaptively approximate suitable boundary conditions satisfied by the probability distribution (such as
or more generally linking the derivatives) and use this estimate to choose the appropriate model. Second, it is possible to adapt the methods in this paper to multivariate distributions. We leave both of these topics for a further study.Arguments to explain the key results
7 Derivation of the Solution
We begin with a description of how to obtain the solution in Theorem 2. It is straightforward to verify the solution works but this does not indicate how it is constructed. We will solve the problem via a powerful transform technique called the unified transform [14, 15, 17, 16, 7]. The unified transform applied to linear initialboundary value problems can be considered as constructing a transform tailormade to the given PDE and boundary conditions. It generalises many of the classical methods and computes solutions even in some cases when classical methods apparently fail [46, 22]. A review of this method can be found in [9]. So far the only case of our problem considered in the literature on this method has been (trivially periodic). For the heat equation with oblique Robin boundary conditions/nonlocal boundary conditions we refer the reader to [29] and for interface problems we refer the reader to [41, 42, 40].
We then construct the series solution by deforming the contours in the integral representation and applying Cauchy’s residue theorem. We remark that the unified method is similar in spirit to the contour method of Birkhoff in that it makes considerable use of the tools of complex analysis (see [32] for more details and history on this method). Birkhoff’s method tackles the problem of generalised eigenfunction expansions of the spatial derivative operator by using asymptotic estimates of the Green’s function associated with the problem for large eigenvalue parameter. Further discussion of the link between the unified transform and the spectral theory of twopoint linear differential operators is beyond the scope of this paper but we refer the reader to [35, 45]. We found that the unified transform considerably simplifies the proof of Theorem 1 as opposed to the use of series solutions. To the authors’ knowledge, our problem has not been considered (other than the obvious periodic case) in the literature concerning either classical methods or the unified transform.
7.1 Obtaining the solution
The first step is to write the PDE in divergence form:
We will employ Green’s theorem,
(34) 
over the domain . Here one must assume a priori estimates on the smoothness of the solution which will be verified later using the candidate solution. Define the transforms:
where again we assume these are well defined. Green’s theorem and the boundary conditions imply (after some small amount of algebra) the so called ‘global relation’, coupling the the transforms of the solution and initial data: