Estimating the distribution and thinning parameters of a homogeneous multimode Poisson process

08/28/2018 ∙ by Dragi Anevski, et al. ∙ 0

In this paper we propose estimators of the distribution of events of different kinds in a multimode Poisson process. We give the explicit solution for the maximum likelihood estimator, and derive its strong consistency and asymptotic normality. We also provide an order restricted estimator and derive its consistency and asymptotic distribution. We discuss the application of the estimator to the detection of neutrons in a novel detector being developed at the European Spallation Source in Lund, Sweden. The inference problem gives rise to Sylvester-Ramanujan system of equations.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The motivation for the research in this paper comes from neutron detection, and is of importance for the European Spallation Source (ESS), which is a large scale research facility currently being built in Lund, Sweden. The main research problem from the physicists perspective in this connection is the estimation of the energy or, equivalently wavelength, distribution of a neutron beam. The data in the neutron scattering experiment for the neutron detector type that we are considering consists of counts of the number of neutrons that have been absorbed along the layers in the detector. Given the data, the goal is to estimate the unknown wavelength distribution in the neutron beam that one has observed. We have previously studied this problem in the simpler setting of there being exactly one wavelength in the neutron beam, which was then considered to be unknown, cf. [3]. The goal in [3] was to derive an estimator of the unknown wavelength, which was a maximum likelihood estimator (mle), and to derive properties of the estimator, in particular [3] showed the consistency and asymptotic normality of the mle. Relations between the properties of the mle and the detectors construction are of importance for the physicists, and then in particular for the number of layers used in the detector.

This paper can be seen as a generalisation of the study in [3], in the sense that we investigate the same detector type, but are now interested in a set of wavelengths with finite cardinality, say , and that both the wavelengths sizes/values as well as the distribution of the wavelengths in the neutron beam are unknwon. The goal in this paper is to construct an estimate of these parameters, and if possible to derive properties of the constructed estimator.

In [3], the neutron beam was assumed to be well described by a time homogeneous Poisson process, and we take a similar approach here. We assume that the neutron beam is a sum of individual wavelength neutron beams, each being described by a Poisson process; the proportions

of the individual wavelength neutrons in the total sum is however unknown, and is in fact a parameter that we want to estimate; the total sum is of course still a Poisson process. The data obtained from the neutron detector then consists of counts of neutrons that are absorbed and detected in the detector, and we may use the key observation that the probability of absorption of a specific neutron is, in principle, a known function of the wavelength. Thus each neutron in the beam will be absorbed with a probability which depends on the wavelength of that neutron and one may assume that the absorptions of different neutrons, even of the same wavelength, are independent events. This points to the direction of modeling with the use of thinned Poisson processes.

In fact, we treat in this paper an inference problem that can be stated as the estimation of the wavelength distribution, as well as the thinning probabilities , i.e. the wavelength sizes, of a multimode homogeneous Poisson process.

Having stated the problem and formulated a maximum likelihood estimator, we see that the problem becomes difficult to treat directly, if one goes trough the standard machinery of finding zeros to the score equations. In fact, the problem may be simplified by rephrasing it into estimating algebraic functions of some of the parameters, and then having obtained estimates of the algebraic functions, to try to solve he upcoming algebraic equations for the variables in those equations. This later problem can be seen as a problem that was studied by Ramanujan in [9], namely solving a system of algebraic equations, which in our setting can be written as, solving for the system of equations

for , where are given numbers, cf. (9) below. The solution was given by Ramanujan [9] and with later refinements given e.g. in [8]. In our setting denotes the distribution of the wavelengths, while denotes the thinning probabilities for respective wavelength. As shown by Ramanujan, the necessary number of equations to obtain a solution is , and thus above should be .

Using standard results on almost sure consistency and asymptotic normality for the mle, coupled with continuity and differentiability of the function that defines the solution of the Ramanujan equations, via the continuous mapping theorem and the delta method, we obtain almost sure consistency and asymptotic normality of the desired mle of . Taking into account the fact that the set of frequencies in a beam often is a basic frequency and its overtones, or equivalently that the set of wavelengths consist of a dominant wavelength and its fractions, it makes sense to model the wavelength distribution as a decreasing sequence. This is a motivation for finding an order restricted estimator of , and we therefore propose the projection of the unrestricted mle of on the set of decreasing probability mass functions. We are then able to use results on consistency and limit distributions for such isotonic regression estimators, see [10], [7] for results for i.i.d data and [4] for general results.

The remainder of the paper is organised as follows. In Section 2 we give a detailed description of the detector model that is being used, of the Poisson process model for the beam and of the Poisson data generated from the detector, cf. Lemma 1. In Section 3 we study the likelihood approach to estimating the parameters , and the system of algebraic equations that facilitates the estimation. In Theorem 1 we show that if the number of equations is then there is a unique mle of , obtained by the solution of the algebraic equations. In Subsection 3.2 and Theorem 2 we derive the consistency and asymptotic normality of the mle of . In Subsection 4.1 we define an order restricted estimator of and state its consistency and asymptotic distribution in Theorem 3. Finally in Section 5 we discuss the obtained results and some remaining and interesting future problems.

2 Motivation and description of the data generating mechanism

The inference problem is motivated by the following problem that arises in neutron detection. Assume that a neutron beam is pointed at a detector. We model the number of neutrons that arrive at the face of the detector in the time interval by a counting process . Assume that the neutron beam, i.e. the process , has constant intensity . Assume furthermore that there are different kinds of neutrons in the beam, with different wavelengths , such that

(1)

The values of the wavelengths are assumed to be unknown. We assume that we do however know the order in , and can thus distinguish which label to put on a neutron and its wavelengths placement in , cf. Section 5 for a discussion on possible extensions.

We model the neutron beam, or counting process , as the sum of the counting processes that count the number of neutrons that arrive at the face of the detector in , for the individual type neutrons. Thus we let the number of neutrons with wavelength , which we may label -neutrons, be denoted by , where is a counting process such that and with intensity , for . We write for the total number of neutrons that arrive at the face of the detector; then is a counting process with .

For a given total number of incoming neutrons in the time interval

, the vector

is assumed to follow a multinomial distribution with parameters , i.e.

(2)

with

The vector of proportions of numbers of different neutrons is the spectrum, or distribution, of an incoming neutron beam . We note that and assume that does not depend on .

Now assume that the incident beam is a Poisson process with intensity . In this case the components of the beam are independent Poisson processes with intensities , for , since the vector is the thinning of the original Poisson process, cf. e.g. [6].

We next introduce the so called multilayer detector that is used in this setting. We assume that the detector consists of a fixed number of layers, say layers, cf. Fig. 1. The value of will be elaborated on below, and will be shown to be determined by the number of different types of neutrons that are present in the neutron beam.

The detection of neutrons in the multilayer detector can be described as follows. When an incident beam of neutrons hits a layer of the detector each neutron in the beam can possibly be absorbed, and then detected, or otherwise not be absorbed. If the neutron is not absorbed it will go through the present layer and will subsequently arrive at the next layer. We assume that at each layer, absorption or passing through are the only possibilities for the neutrons interactions with the layer. We also assume that at each layer, different neutron particles interact with the layer independently of each other, i.e. at each layer the absorptions of different neutrons are independent events.

Figure 1: The scheme of the detector. Here is the number of transmitted -neutrons and is the total number of the neutrons absorbed at the layer .

Let be the vector of probabilities of transmition (the thinning parameters), so that is the probability of absorption for -neutron, for . It is a physical property of the neutron beam that the probability of transmission decreases with the neutron wavelength, cf. [2] and references therein, and therefore the thinning parameters can be modelled as a decreasing sequence

(3)

Let us consider a beam of -neutrons and denote the number of -neutrons that are absorbed at the first layer by , so that is the number of -neutrons that are transmitted. Then and are non-decreasing counting processes, obtained by thinning of the original Poisson process , so that and are independent Poisson processes with intensities and , respectively, cf. [6].

Now assume that the transmitted beam hits the second layer, at which, again, each -neutron can be either absorbed or transmitted. Let be the number of absorbed neutrons and the number of transmitted neutrons, at the second layer. Then, again, and are obtained by thinning of the Poisson process and therefore they are independent Poisson processes, with intensities and , respectively [6]. By iterating the argument, cf. also [3] for a similar and more detailed reasoning, we obtain the following result.

Lemma 1

, for , are jointly independent Poisson processes with intensities , respectively.

The goal of this paper is the estimation of the wavelength distribution of the incident beam as well as of the actual values of the wavelengths , based on observations of the (total) Poisson process, and with the use of the multilayer neutron detector, described above. Estimators of the wavelength values can be indirectly obtained via estimates of the thinning parameters , using a functional relation between wavelength and thinning probability, as explained in [3].

3 The maximum likelihood estimator of

In this section we state the inference problem, define the mle of the parameters , state conditions for its existence, and derive consistency and asymptotic normality for the mle of .

We start by the following note on the experimental setup, and the data: In order to derive the limit properties for the estimator, we need to define what we mean by “letting go to infinity”. This may be done in, at least, two ways. We can either let the time go to infinity, and view the data as stemming from on Poisson process which is run for a (very) long time, or we can keep the time fixed and gather data from several independent Poisson process runs, cf. [3] for a more detailed discussion about advantages and disadvantages with respective approach.

We will choose the second approach and view the estimation problem as a repeated sample problem. Thus we assume that during a fixed time interval and for fixed intensity , i.e. fixed intensities , of an incident beam there are repeated measurements. Let be the observed number of neutrons at layer , for , at the experiment round , . Then at each experiment round the vector is distributed according to Lemma 1, and furthermore the vectors are assumed to be independent.

The inference problem is, given data as above, to estimate the pair under the restriction that they lie in the parameter space , which is given by

(4)

Note that is a probability mass function while is merely a vector of probabilities. We would like to emphasize here that the main object of study is the wavelength distribution . The thinning probabilities however are also of interest, since they determine the values of the wavelengths, which we assume are unknown; if we know the actual wavelength values there is no need to estimate the thinning probabilities. Note that we do however know the order of the wavelength values, cf. (1). See Section 5 for further comments on this.

We will use the likelihood approach for making inference about the unknown parameters . We define the mle of by

(5)

where

(6)

is the log likelihood, and

(7)

is the total expected number of absorbed neutrons at layer divided by the intensity and the time .

3.1 Existence and uniqueness of the mle

In this subsection we prove the existence of the mle , introduced in (5), and obtain an explicit expression for it.

First, we note that working directly with the parameters , we obtain the first derivatives of and trying to solve the score equations proves to be quite cumbersome. Moreover, one can show that the log-likelihood seen as a function on the parameter space is not a concave function, which makes it difficult to find a solution even numerically.

We will therefore reparametrise the problem as an inference problem for the vector of expected total numbers of observed neutrons, divided by , and having found a solution to this simpler inference problem, solve an upcoming system of equations for obtaining the solution to (5). Introduce the notation , where

We then rewrite (6) as

and note that we have dropped the last two terms in (6), in the last equality. The function reaches its unique global maximum at , for . Therefore, if is a solution of the following system of equations

(9)

where are defined in (7), and if it satisfies the constraints in (4), then , i.e. the solution is the mle.

In order to reformulate the system of equations (9) on matrix form, we introduce the vector , where

(10)

for , and for we define the matrices and as

(11)

and

(12)

We next obtain a preliminary result saying that, for large enough

, the random matrix

is non-singular, almost surely.

Lemma 2

There exists such that for any

Proof.

First, note that from the strong law of large numbers one has

where denotes the a.s. limit of the sequence and, therefore, the matrix is given by

(13)

with the true values of the parameters and . Next, note that can be diagonalized as

where

and is a diagonal matrix with diagonal elements given by the vector . Since is a square Vandermonde matrix and , it is full-rank i.e. . Therefore . The continuous mapping theorem then implies

almost surely, which implies the statement of the lemma.

Next, we study the system of equations in (9). We will follow closely Ramanujans derivation of the solution in [9]. Define the function

with the vectors given by

and where denotes the restriction of the vector in to the index set . The next result says that if is nonsingular and if we have a certain relation between the number of layers and the support of the wavelength distribution, then the mle exists, and is unique, up to permutations of the indices.

Theorem 1

Assume that and . Then the solution to (9) is unique, up to permutations of the indices, and is given by

where are the coefficients in the following representation of ,

(14)

Proof. Recall that the system in (9) is given by

(15)

Note, that (15) can be simplified as

(16)

The system of equations (16) for was studied and solved by Ramanujan in his third paper, published in the Journal of the Indian Mathematical Society cf. [9]. From the results in [9] it follows that if , then the solution of (16) exists, it is unique, up to permutations of the indices , and given by , the coefficients in the parametrisation (14).

Since the solution is invariant under permutation of the indices, we may choose any permutation, and since we know the order for the wavelength values, cf. (1), the choice is simple: we choose the permutation that gives the known and correct order.

3.2 Asymptotic properties of the mle

Before we obtain the asymptotic distribution of the estimator we prove an auxiliary lemma. Assume that . We may rewrite the system of equations (16) as

(17)

with , where the functions are given by

for . We see that the system of equations in (17) gives an implicit definition of a function . The Jacobian matrix for the system (17) is then given by

(18)

The next lemma shows that the function , implicitly defined by the equations (17), is differentiable.

Lemma 3

Assume that is such that . Then the function , implicitly defined by (17), is differentiable at the point .

Proof. The statement of the lemma will follow from the implicit function theorem, for which we now check the conditions.

First we note that (17) is a rewriting of (16) which is a simplification of (15) which is identical to (9). Theorem 1 says that if , at some , then there are unique which satisfy (9), which implies that are unique solutions to (17), if we have chosen the correct permutation of the indices, cf. the comment after the proof of Theorem 1.

Second, the functions , for , are differentiable and continuous.

It remains to prove that the Jacobian in (18) is a non-singular matrix, i.e. to show that . In fact, we note that can be factored out of the determinant, i.e.

where

(19)

We rewrite on column matrix form as

(20)

where , and denotes the vector of componentwise first derivatives of the column vector .

Consider , which is a polynomial of order in . Let us show that the multiplicity of the component of the root is equal to . The third derivative of the polynomial is equal to

It follows that for , each term in the right hand side of the above expression contains two equal columns. Therefore, we have proved that at , which implies that the multiplicity of is at least . Now since is symmetric (with no sign change, since flipping two of the arguments means flipping four columns in the matrix at once), then any , for is also a root of , and the same argument as above shows that they all have multiplicity at least . Since has roots and is of order , the multiplicity is exactly 4, for each root. Therefore, we have shown that

where is a leading coefficient. Using the symmetry of the determinant, we may replace any of the ’s with and study the upcoming polynomial, to obtain

where is a constant.

Thus, we have shown that , provided for all . The fact that the unique (up to permutations of indices) solution to satisfies for all follows from a refinement of Ramanujans theorem, given in [8].

Theorem 2

Let . Then the mle in (5) is strongly consistent

and asymptotically normal

as .

Proof. From Lemma 3 it follows that for , such that , the system of equations in (17) gives an implicit definition of a differentiable function . Let denote the a.s. limit of the sequence , and recall that, because of the definition of the matrix in (13) and of the function ,

(21)

Combining Theorem 1 and Lemma 2, it follows that there exists an , such that for all , is the solution to (9), so that, furthermore, one has

(22)

Then, from (21), (22) and since , using the continuous mapping theorem we obtain the consistency result

for . Recall that is equal to the mle only when the restrictions in , cf. (4), are satisfied for , which we will prove below.

Now, let us consider the vector , defined in (3.1). Note that can be written as

where is a lower triangular

matrix of ones. Using a central limit theorem one can show that

(23)

as , where

with . Recall that the first element of is deterministic and equals , cf. (3.1), and thus we do not include it in the limit result (23).

Let be the matrix of partial derivatives of , i.e.

(24)

The values of can be found using the implicit function theorem. In fact, the -th column of is the solution of the following system of linear equations

where is defined by and for , cf. (17) and (18). The solution exists and it is unique, when , which is true for for all , and for . Thus the matrices are (uniquelly) given for all , and so is the matrix .

Since the derivatives are continuous, using (23) and the delta method, cf. [11], we derive the limit distribution for ,

as , with