Likelihood-free inference of experimental Neutrino Oscillations using Neural Spline Flows

Sebastian Pina-Otey, et al., 02/21/2020

We discuss the application of Neural Spline Flows, a neural density estimation algorithm, to the likelihood-free inference problem of the measurement of neutrino oscillation parameters in Long Baseline neutrino experiments. A method adapted to physics parameter inference is developed and applied to the case of the muon neutrino disappearance analysis of the T2K experiment.


I Introduction

First indications of neutrino oscillations using atmospheric neutrinos were presented by the Super-Kamiokande experiment in Japan in 1998 Fukuda and others (1998). Confirmation with solar neutrinos followed in 2002 from the SNO experiment Ahmad and others (2001), and the oscillations were cross-checked with human-made beams at K2K Ahn and others (2006) for atmospheric oscillations in 2006 and at KamLAND Araki and others (2005) for solar neutrino oscillations in 2004. After these remarkable years, the oscillation phenomenon was experimentally well established. Neutrino oscillations are the first, and so far the only, evidence of neutrinos having mass. Following the initial experiments, a successful set of measurements confirmed the oscillation picture and improved the understanding of the Pontecorvo-Maki-Nakagawa-Sakata (PMNS) neutrino flavour mixing matrix parameters Maki et al. (1962). By 2010, the measurement of a non-zero $\theta_{13}$ angle by T2K Abe and others (2011) and Daya Bay An and others (2012) opened the possibility of leptonic Charge-Parity (CP) symmetry violation (i.e., particles behaving differently from their antiparticles). As of today, almost all mixing angles have been measured Tanabashi and others (2018), including the two mass differences between the three neutrino mass eigenstates. The remaining mixing matrix parameters to be measured are the imaginary phase responsible for CP violation and the sign of one of the two neutrino mass splittings, the so-called mass hierarchy. Both measurements are within reach of current and near-future oscillation experiments.

Statistical methods used in oscillation analyses so far comprise both frequentist and Bayesian approaches Abe and others (2017); Acero et al. (2019). There are, however, some limitations to these methods. Both approaches rely on a binned likelihood algorithm, which might limit the sensitivity of the experiment, as they impose a Gaussian dependency on some of the marginalized parameters, affecting the correctness of the results. Additionally, they require very intensive CPU processing time, which is a limiting factor that reduces the flexibility of the statistical analysis and checks, and introduces strong constraints on the delivery of the results. These limitations derive from the intrinsic difficulties of the statistical data analysis, which are depicted in Sec. II using the T2K experiment as a reference example.

In this paper we propose an alternative statistical method to overcome some of the limitations of the methods currently in use. The proposed procedure is based on un-binned likelihood inference using neural density estimators. This method has the potential of being accurate and fast, and of reducing the possible bias due to the intrinsic binning of other approaches. Neural Spline Flows, the implementation of neural density estimators we chose, have the additional advantage that the Gaussian generator intrinsic to the method facilitates the introduction of experimental errors in the distributions. We discuss in this paper the basic concepts of the method and show its potential with a simplified example.

II Problem definition and physical simulator

Neutrino oscillation experiments search for a modification of the flavour content of a neutrino beam travelling in vacuum or matter over a certain distance. Beams are normally characterized at a near site, where the flux and neutrino flavour composition are not yet altered by oscillations. The same beam is sampled after a certain flight distance $L$. The change in the flavour composition can be determined in two different ways:

  1. Neutrino flavour disappearance ($\nu_\mu \to \nu_\mu$) experiments search for the disappearance of a certain neutrino flavour as a function of the neutrino energy. The disappearance produces both a reduction in the flux of neutrinos of a given flavour and a distortion of the neutrino flux spectrum, which is observed in the distribution of the measured quantities, normally the muon momentum ($p_\mu$) and its angle with respect to the neutrino direction ($\theta_\mu$).

  2. Neutrino flavour appearance ($\nu_\mu \to \nu_e$) experiments search for the appearance of a neutrino flavour that is normally suppressed in the original neutrino flux. In T2K, this new flavour is electron neutrinos. The dependency of the oscillation on the neutrino energy is inferred from the momentum and the angle with respect to the neutrino direction of the electron ejected in the interaction of the neutrinos with matter.

The neutrino flavour is determined by the flavour of the charged lepton (muon, electron or tau) produced in charged-current interactions of neutrinos with the nuclei. For the current analysis, we concentrate on the disappearance phenomenon (1).

In a synthetic way, the distribution of experimentally observed neutrinos with observed properties $\vec{x} = (p_\mu, \theta_\mu)$ can be described by

$$N_{\mathrm{near}}(\vec{x}) \propto \int \mathrm{d}E_\nu\, \Phi(E_\nu)\, \sigma(E_\nu)\, p(\vec{x} \mid E_\nu) + B_{\mathrm{near}}(\vec{x}) \qquad (1\text{--}2)$$

for the near detector and

$$N_{\mathrm{far}}(\vec{x}) \propto \int \mathrm{d}E_\nu\, \Phi(E_\nu)\, \sigma(E_\nu)\, P_{\mathrm{osc}}(E_\nu)\, p(\vec{x} \mid E_\nu) + B_{\mathrm{far}}(\vec{x}) \qquad (3\text{--}5)$$

for the far detector. The number of observed neutrinos depends on the cross-section $\sigma(E_\nu)$, the neutrino flux $\Phi(E_\nu)$, the probability $p(\vec{x} \mid E_\nu)$ of observing the experimentally accessible quantities given a neutrino energy $E_\nu$, the oscillation probability $P_{\mathrm{osc}}(E_\nu)$ and the backgrounds observed in the detectors, $B(\vec{x})$.

The experimental challenge comes from inferring the neutrino energy $E_\nu$ given the experimental observables $\vec{x}$. The term $p(\vec{x} \mid E_\nu)$ encapsulates not only the detector resolution, but also the neutrino-nucleus cross-section model predictions and uncertainties. Other difficulties arise from the limited knowledge (9% in the latest T2K results Abe and others (2019)) of the neutrino flux $\Phi(E_\nu)$ and of the neutrino cross-sections as a function of the energy, $\sigma(E_\nu)$. The background terms $B(\vec{x})$ are normally relevant (20% in T2K Abe and others (2016)), and they in turn depend on the neutrino-nucleus cross-sections in a non-trivial manner.

Both the frequentist and the Bayesian statistical approaches Abe and others (2017); Acero et al. (2019) utilize the near detector data to predict the probability density function at the far detector in the absence of oscillations:

$$p_{\mathrm{no\text{-}osc}}(\vec{x}, E_\nu) \propto \Phi(E_\nu)\, \sigma(E_\nu)\, p(\vec{x} \mid E_\nu). \qquad (6)$$

Once determined, this conditional probability density function can be used to determine the oscillation parameters $(\theta, \Delta m^2)$ by comparing to the far detector events. Most of the experimental effort is actually devoted to the determination of this conditional probability, which also depends on a large number of hidden and correlated parameters describing uncertainties in the detector performance, the cross-section model and the neutrino flux. Hidden parameters are marginalized in the analysis, providing the experimental result for the oscillation parameters as probability maps (the posterior in the case of Bayesian approaches).

In the particular case of the T2K experiment, the experimentally accessible observables $\vec{x}$ are the momentum and direction of the lepton, $(p_\mu, \theta_\mu)$. The lepton is produced in the interaction of the neutrino with the target nucleus. In this case the probability density function $p(\vec{x} \mid E_\nu)$ can be simplified to $p(p_\mu, \theta_\mu \mid E_\nu)$. The near detector of the experiment measures the neutrino flux and constrains the neutrino-nucleus interaction, providing the estimation of this probability density function together with the expected number of interactions in the far detector in the absence of oscillations. The near detector also provides a dependency on the free parameters of the model and a full error covariance matrix relating all of them. To simplify the exercise, we ignore the error covariance matrix in this study and assume that the probability is implicitly known to the experiment through simulations.

The neutrino oscillation disappearance probability can be approached by the simplified two-flavour oscillation Tanabashi and others (2018). The disappearance probability as a function of the initial energy of the neutrino is

$$P_{\mathrm{dis}}(E_\nu; \theta, \Delta m^2) = \sin^2(2\theta)\, \sin^2\!\left( \frac{1.267\, \Delta m^2\, L}{E_\nu} \right), \qquad (7)$$

where $L$ is the distance in kilometers between the near and far sites of the T2K experiment and $1.267$ is a scaling parameter to adjust the oscillation phase to distances in kilometers. $E_\nu$ is the neutrino energy in GeV, $\theta$ the mixing angle of the two flavours and $\Delta m^2$ the difference of the squared masses of these two flavours in eV². $\theta$ and $\Delta m^2$ are the parameters governing the oscillations. Hence, determining their values is the goal of the neutrino disappearance oscillation experiments, after measuring the momentum and direction of the lepton at the far detector.
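As a concrete illustration, the two-flavour disappearance probability of Eq. (7) is straightforward to evaluate numerically. The following is a minimal sketch in Python; the baseline value of 295 km is an assumption based on the approximate T2K near-to-far distance, not a number quoted in this work:

```python
import numpy as np

L_KM = 295.0  # approximate T2K near-to-far baseline in km (assumed value)

def p_disappearance(e_nu, theta, dm2, baseline=L_KM):
    """Two-flavour muon-neutrino disappearance probability, Eq. (7).

    e_nu     : neutrino energy in GeV (scalar or numpy array)
    theta    : mixing angle in rad
    dm2      : squared-mass difference in eV^2
    baseline : near-to-far distance in km
    """
    phase = 1.267 * dm2 * baseline / e_nu
    return np.sin(2.0 * theta) ** 2 * np.sin(phase) ** 2

# Example: survival probability 1 - P_dis across the T2K energy range
e = np.linspace(0.2, 2.0, 5)  # GeV
print(1.0 - p_disappearance(e, theta=0.7594, dm2=2.463e-3))
```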

II.1 Physical Simulator

We have simplified the problem to demonstrate the viability of the proposed method to determine the oscillation parameters using Neural Spline Flows. Event samples are generated using the NEUT Hayato (2009) Monte Carlo event generator, which models the interactions of neutrinos with nuclei. We also use a realistic neutrino flux energy spectrum provided by the T2K collaboration Abe et al. (2013). With both inputs, we generate M charged-current quasi-elastic (CCQE) events. CCQE is the most probable reaction at T2K energies, in which the neutrino transforms into a muon while a neutron is converted into a proton, and it is the channel dominating the statistical sensitivity of the experiment. To simplify, we ignore other reaction channels and potential backgrounds, as well as detector effects. The generator provides n-tuples of events weighted according to their appearance probability as a function of the neutrino energy and of the angle and momentum of the muon. As explained above, the neutrino energy is not directly accessible experimentally, but it is required to generate samples of oscillated events as explained in Sec. IV.1, so it is important that the relation between $E_\nu$ and $(p_\mu, \theta_\mu)$ is kept in the generated sample.

III Methodology

Evaluating the density of high-dimensional data has become an important task for unsupervised machine learning in recent years. Neural density estimation proposes a solution using neural networks, by learning an estimation $q(\vec{x})$ of the exact target density $p(\vec{x})$ from samples $\{\vec{x}_i\}$. In this work, this task is performed through Neural Spline Flows, a specific implementation of the more general normalizing flows methods, presented in Sec. III.1. We explain in Sec. III.2 how the density estimation at the near detector is combined with the analytical formula for neutrino oscillation at the far detector to obtain the likelihood used to perform Bayesian inference for the T2K experiment. Additionally, an alternative way of computing the posterior is explained in Sec. III.3, based on the standard histogram approach but tweaked in order to attain the limit of an un-binned likelihood, closely related to what is obtained by Neural Spline Flows.

III.1 Neural density estimation using Neural Spline Flows

Consider a simulator which can produce samples $\{\vec{x}_i\}_{i=1}^N$ over $\mathbb{R}^D$ from the target density $p(\vec{x})$. Consider also a flexible family of density functions $q_\phi(\vec{x})$ parametrized by $\phi$, i.e., $q_\phi(\vec{x}) \geq 0$ for all $\vec{x}$, and $\int q_\phi(\vec{x})\, \mathrm{d}\vec{x} = 1$ for all $\phi$. Then, if we want to approximate $p$ through $q_\phi$, we optimize the parameters $\phi$ by maximizing the log-likelihood of the approximated density under simulated data from the target distribution,

$$\phi^* = \arg\max_\phi \frac{1}{N} \sum_{i=1}^{N} \log q_\phi(\vec{x}_i). \qquad (8)$$

This objective function for the optimization procedure is equivalent to minimizing the Kullback-Leibler divergence between the target and the approximated density,

$$D_{\mathrm{KL}}\left(p \,\|\, q_\phi\right) = \int p(\vec{x}) \log \frac{p(\vec{x})}{q_\phi(\vec{x})}\, \mathrm{d}\vec{x}, \qquad (9)$$

which is always non-negative and equal to 0 only if both densities are equivalent. Let us quickly prove this statement:

$$\begin{aligned}
\arg\min_\phi D_{\mathrm{KL}}\left(p \,\|\, q_\phi\right) &= \arg\min_\phi \int p(\vec{x}) \left[ \log p(\vec{x}) - \log q_\phi(\vec{x}) \right] \mathrm{d}\vec{x} \qquad &(10) \\
&= \arg\min_\phi \left( \mathbb{E}_{\vec{x} \sim p}\left[\log p(\vec{x})\right] - \mathbb{E}_{\vec{x} \sim p}\left[\log q_\phi(\vec{x})\right] \right) \qquad &(11) \\
&= \arg\min_\phi \left( -\mathbb{E}_{\vec{x} \sim p}\left[\log q_\phi(\vec{x})\right] \right) \qquad &(12) \\
&= \arg\max_\phi\, \mathbb{E}_{\vec{x} \sim p}\left[\log q_\phi(\vec{x})\right] \qquad &(13) \\
&\approx \arg\max_\phi \frac{1}{N} \sum_{i=1}^{N} \log q_\phi(\vec{x}_i), \qquad &(14)
\end{aligned}$$

where in the last step we have approximated the expected value of $\log q_\phi(\vec{x})$ with respect to $p(\vec{x})$ by the finite-sample average. Hence, maximizing the objective function Eq. (8) is equivalent to minimizing the KL-divergence Eq. (9) between the distributions, i.e., approximating $p$ by $q_\phi$.
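In code, this maximum-likelihood objective reduces to minimizing the negative mean log-density over training batches. Below is a minimal PyTorch sketch, assuming a generic density model that exposes a log_prob method (as the flow defined later in this Section does); the hyperparameter defaults are placeholders:

```python
import torch

def fit_density(model, data, n_steps=1000, batch_size=512, lr=5e-4):
    """Maximize Eq. (8): the average log-likelihood of the model on samples."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(n_steps):
        # Draw a random mini-batch from the simulated target samples.
        idx = torch.randint(len(data), (batch_size,))
        batch = data[idx]
        # Negative log-likelihood; minimizing it minimizes the KL of Eq. (9).
        loss = -model.log_prob(batch).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```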

Normalizing flows are a mechanism for constructing such flexible probability density families for continuous random variables. A comprehensive review on the topic can be found in Papamakarios et al. (2019); this Section gives a brief summary of how they are defined and how the parameters of the transformation are obtained, together with a specific implementation, the Neural Spline Flows (NSF) Durkan et al. (2019b).

Consider a random variable $\vec{z}$ defined over $\mathbb{R}^D$, with known probability density $p_z(\vec{z})$. A normalizing flow is characterized by a transformation $T$ from the density $p_x(\vec{x})$ of a random variable $\vec{x}$, defined also over $\mathbb{R}^D$ (the target density), to this known density, in the same space $\mathbb{R}^D$, via

$$\vec{z} = T(\vec{x}). \qquad (15)$$

The density $p_z(\vec{z})$ is known as the base density, and has to be easy to evaluate (e.g., a multivariate $D$-dimensional normal, as will be chosen throughout this work, or a uniform distribution in dimension $D$). The transformation $T$ has to be invertible, and both $T$ and $T^{-1}$ have to be differentiable, i.e., $T$ defines a diffeomorphism over $\mathbb{R}^D$.

This allows us to evaluate the target density by evaluating the base density, using the change of variables for density functions,

$$p_x(\vec{x}) = p_z\left(T(\vec{x})\right) \left| \det J_T(\vec{x}) \right|, \qquad (16)$$

where the Jacobian $J_T(\vec{x})$ is the $D \times D$ matrix of the partial derivatives of the transformation $T$:

$$J_T(\vec{x}) = \begin{pmatrix} \partial T_1 / \partial x_1 & \cdots & \partial T_1 / \partial x_D \\ \vdots & \ddots & \vdots \\ \partial T_D / \partial x_1 & \cdots & \partial T_D / \partial x_D \end{pmatrix}. \qquad (17)$$
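The change of variables of Eq. (16) can be checked numerically on a one-dimensional toy example. Below is a small sketch using $T(x) = \log x$, so that $\vec{x}$ is log-normal when $\vec{z}$ is standard normal, comparing the flow-style evaluation against scipy's closed-form log-normal density:

```python
import numpy as np
from scipy.stats import norm, lognorm

def lognormal_density_via_flow(x):
    """Evaluate p_x(x) via Eq. (16): base density at T(x) times |det J_T(x)|.

    Here T(x) = log(x) maps a log-normal target onto a standard normal base,
    and the Jacobian of T is simply 1/x.
    """
    z = np.log(x)            # T(x)
    jacobian = 1.0 / x       # |dT/dx|
    return norm.pdf(z) * jacobian

x = np.array([0.5, 1.0, 2.0, 3.0])
print(lognormal_density_via_flow(x))   # flow-style evaluation
print(lognorm.pdf(x, s=1.0))           # closed form, identical values
```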

The transformation in a normalizing flow is defined partially through a neural network with parameters $\phi$, as will be described below, defining a density

$$q_\phi(\vec{x}) = p_z\left(T_\phi(\vec{x})\right) \left| \det J_{T_\phi}(\vec{x}) \right|. \qquad (18)$$

The goal is to find the parameters $\phi$ that maximize Eq. (8) with $q = q_\phi$. The subindex $\phi$ of $T_\phi$ will be omitted in the following, simply denoting the transformation of the neural network by $T$.

If the transformation is flexible enough, the flow could be used to evaluate any continuous density in $\mathbb{R}^D$. In practice, however, one uses the property that the composition of diffeomorphisms is a diffeomorphism, allowing a complex transformation to be constructed via composition of simpler transformations. Consider the transformation $T$ as a composition of $K$ simpler transformations:

$$T = T_K \circ \cdots \circ T_1. \qquad (19)$$

Assuming $\vec{z}_0 = \vec{x}$ and $\vec{z}_K = \vec{z}$, the forward evaluation and Jacobian determinant are

$$\vec{z}_k = T_k(\vec{z}_{k-1}), \quad k = 1, \ldots, K, \qquad (20)$$

$$\left| \det J_T(\vec{x}) \right| = \prod_{k=1}^{K} \left| \det J_{T_k}(\vec{z}_{k-1}) \right|. \qquad (21)$$

These two computations (plus their inverses) are the building blocks of a normalizing flow Rezende and Mohamed (2015). Hence, to make a transformation efficient, both operations have to be efficient. From now on, we will focus on a single simple transformation $T_k$, since constructing a flow from it is simply a matter of applying compositions.

To define a transformation satisfying both efficiency properties, the transformation is broken down into autoregressive one-dimensional ones for each dimension of $\vec{x}$:

$$z_i = \tau(x_i;\, \vec{h}_i), \quad \vec{h}_i = c_i(\vec{x}_{<i};\, \theta_i), \qquad (22)$$

where $x_i$ is the $i$-th component of $\vec{x}$ and $z_i$ the $i$-th of $\vec{z}$. $\tau$ is the transformer, a one-dimensional diffeomorphism with respect to $x_i$ with parameters $\vec{h}_i$. $c_i$ is the $i$-th conditioner, a neural network which takes as input $\vec{x}_{<i}$, i.e., the components of $\vec{x}$ preceding $x_i$, and $\theta_i$ are the parameters of the neural network. The conditioner provides the parameters of the $i$-th transformer depending on the previous components, implicitly defining a conditional density over $x_i$ with respect to $\vec{x}_{<i}$. The transformer is chosen to be a differentiable monotonic function, since it then satisfies the requirements of a diffeomorphism. This choice also makes the transformation easy to compute in parallel, and decomposing the transformation into one-dimensional autoregressive transformers makes the computation of the Jacobian determinant trivial, because of its triangular shape. To compute the parameters of each transformer, one would naively need to run a neural network with input $\vec{x}_{<i}$ for each component, a total of $D$ times.

Masked autoregressive neural networks Germain et al. (2015) make it possible to compute all the conditioning functions simultaneously in a single forward pass of the neural network. This is done by masking out, with a binary matrix, the connections of the $i$-th output with respect to all the components with index greater than or equal to $i$, making it a function of only $\vec{x}_{<i}$.

The transformer can be defined by any monotonic function, such as affine transformations Papamakarios et al. (2017), monotonic neural networks Huang et al. (2018); Cao et al. (2019); Wehenkel and Louppe (2019), sum-of-squares polynomials Jaini et al. (2019) or monotonic splines Müller et al. (2018); Durkan et al. (2019a, b). In this work we will focus on a specific implementation of monotonic splines, the Neural Spline Flows.

In their work on Neural Spline Flows Durkan et al. (2019b), Durkan et al. advocate utilizing monotonic rational-quadratic splines as transformers $\tau$: these are easily differentiable, more flexible than previous attempts using polynomials (since their Taylor-series expansion is infinite), and analytically invertible.

The monotonic rational-quadratic transformer is defined as a quotient of two quadratic polynomials. In particular, the splines map the interval $[-B, B]$ onto itself, and outside of it the identity function is used. The splines are parametrized following Gregory and Delbourgo Gregory and Delbourgo (1982), where $K$ different rational-quadratic functions are used, with boundaries set by the pairs of coordinates $\{(x^{(k)}, y^{(k)})\}_{k=0}^{K}$, known as the knots of the spline, the points it passes through. Note that $(x^{(0)}, y^{(0)}) = (-B, -B)$ and $(x^{(K)}, y^{(K)}) = (B, B)$. Additionally, we need $K-1$ intermediate positive derivative values, since the derivatives at the boundary points are set to 1 to match the identity function.

With this in mind, the conditioner given by the neural network outputs a vector $\vec{h}_i = \left[ \vec{h}_i^{\,w}, \vec{h}_i^{\,h}, \vec{h}_i^{\,d} \right]$ of dimension $3K - 1$ for the transformer $\tau(x_i; \vec{h}_i)$. $\vec{h}_i^{\,w}$ and $\vec{h}_i^{\,h}$ give the widths and heights of the $K$ bins, while $\vec{h}_i^{\,d}$ holds the positive derivatives at the intermediate knots.

Stacking up many of these transformations, a highly flexible neural density estimator can be built; this is the estimator utilized throughout this work.
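As an illustration, such a stack can be assembled with the nflows library released by the NSF authors. The following minimal sketch composes masked autoregressive rational-quadratic spline transforms over a standard normal base; the default hyperparameter values here are placeholders, not necessarily those used in this work:

```python
import torch
from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal
from nflows.transforms.base import CompositeTransform
from nflows.transforms.autoregressive import (
    MaskedPiecewiseRationalQuadraticAutoregressiveTransform,
)
from nflows.transforms.permutations import RandomPermutation

def build_nsf(features=3, num_layers=5, hidden_features=128, num_bins=8):
    """Stack autoregressive rational-quadratic spline transforms (Sec. III.1)."""
    transforms = []
    for _ in range(num_layers):
        # Permute dimensions so every component gets conditioned on the others.
        transforms.append(RandomPermutation(features=features))
        transforms.append(
            MaskedPiecewiseRationalQuadraticAutoregressiveTransform(
                features=features,
                hidden_features=hidden_features,
                num_bins=num_bins,
                tails="linear",  # identity behaviour outside [-B, B]
            )
        )
    # Base density p_z: a D-dimensional standard normal, as in Eq. (18).
    return Flow(CompositeTransform(transforms), StandardNormal([features]))

flow = build_nsf()
x = torch.randn(16, 3)    # dummy batch shaped like (E_nu, p_mu, theta_mu) triplets
log_q = flow.log_prob(x)  # evaluates Eq. (18) for each sample
```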

III.2 Neural Spline Flows applied to the T2K oscillation problem

We start by estimating the expected energy spectrum of the neutrinos, together with the momentum and angle of the measured muon without oscillations, obtaining the density $q(E_\nu, p_\mu, \theta_\mu)$ as measured by the near detector. This is done by learning the density with an NSF from the Monte Carlo data, generated as presented in Sec. II.1. The base density used for Eq. (18) is a three-dimensional standard normal distribution.

Having estimated the joint probability of the initial distribution at the near detector, we need to construct the conditional density of the observed magnitudes given the oscillation parameters at the far detector in order to perform Bayesian inference. For this, we simply integrate the probability of not oscillating, $1 - P_{\mathrm{dis}}(E_\nu; \theta, \Delta m^2)$ from Eq. (7), over the energy spectrum of the joint distribution:

$$p(p_\mu, \theta_\mu \mid \theta, \Delta m^2) = \frac{1}{C(\theta, \Delta m^2)} \int \left[ 1 - P_{\mathrm{dis}}(E_\nu; \theta, \Delta m^2) \right] q(E_\nu, p_\mu, \theta_\mu)\, \mathrm{d}E_\nu, \qquad (23\text{--}24)$$

where $C(\theta, \Delta m^2)$ is a normalization constant computed after performing the integral. With this we have the probability of observing a single muon with momentum $p_\mu$ and angle $\theta_\mu$ after oscillating, given the parameters $\theta$ and $\Delta m^2$.
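Numerically, the integral of Eq. (23)-(24) can be approximated on an energy grid by evaluating the learned flow density and the survival probability at each node. A minimal sketch, reusing the p_disappearance helper and the flow sketched earlier (the grid range and resolution are illustrative choices):

```python
import numpy as np
import torch

def conditional_density(flow, p_mu, theta_mu, theta, dm2,
                        e_grid=np.linspace(0.05, 5.0, 200)):
    """Approximate Eq. (23)-(24) by a trapezoidal sum over neutrino energy."""
    # Evaluate q(E, p_mu, theta_mu) along the energy grid at fixed (p_mu, theta_mu).
    pts = torch.tensor([[e, p_mu, theta_mu] for e in e_grid], dtype=torch.float32)
    with torch.no_grad():
        q = flow.log_prob(pts).exp().numpy()
    survival = 1.0 - p_disappearance(e_grid, theta, dm2)
    # Un-normalized value; the constant C is divided out by the caller,
    # e.g. via a second numerical integration over (p_mu, theta_mu).
    return np.trapz(survival * q, e_grid)
```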

In order to take into account the number of observed samples, the extended likelihood Tanabashi et al. (2018); Barlow (1990) is used, modifying the likelihood with a Poisson count term that accounts for the expected number of events for a given set of parameters and the actually observed number:

$$\mathcal{L}(\theta, \Delta m^2) = \frac{e^{-\lambda}\, \lambda^{N_{\mathrm{obs}}}}{N_{\mathrm{obs}}!} \prod_{i=1}^{N_{\mathrm{obs}}} p\left(p_\mu^{(i)}, \theta_\mu^{(i)} \,\middle|\, \theta, \Delta m^2\right). \qquad (25)$$

In our case, the Poisson parameter $\lambda$ is obtained by integrating the possible oscillations over the whole energy spectrum, scaled to the initial number of particles $N_0$:

$$\lambda(\theta, \Delta m^2) = N_0 \int \left[ 1 - P_{\mathrm{dis}}(E_\nu; \theta, \Delta m^2) \right] q(E_\nu, p_\mu, \theta_\mu)\, \mathrm{d}E_\nu\, \mathrm{d}p_\mu\, \mathrm{d}\theta_\mu. \qquad (26\text{--}27)$$

Hence, the extended log-likelihood we apply in the analysis is

$$\log \mathcal{L}(\theta, \Delta m^2) = -\lambda(\theta, \Delta m^2) + N_{\mathrm{obs}} \log \lambda(\theta, \Delta m^2) + \sum_{i=1}^{N_{\mathrm{obs}}} \log p\left(p_\mu^{(i)}, \theta_\mu^{(i)} \,\middle|\, \theta, \Delta m^2\right) + \mathrm{const}, \qquad (28\text{--}29)$$

and the posterior for the parameters takes the form

$$p(\theta, \Delta m^2 \mid \mathcal{X}) \propto \mathcal{L}(\theta, \Delta m^2)\, \pi(\theta, \Delta m^2), \qquad (30\text{--}31)$$

with $\pi(\theta, \Delta m^2)$ the prior information before observing the events and $\mathcal{X} = \{(p_\mu^{(i)}, \theta_\mu^{(i)})\}$ the set of observed events.
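With the extended log-likelihood of Eq. (28)-(29) in hand, the posterior of Eq. (30)-(31) can be mapped over a grid of oscillation parameters. A minimal sketch with a flat prior, where the grid ranges and the helper functions passed in are illustrative assumptions:

```python
import numpy as np

def log_posterior_grid(observed, log_density, expected_counts,
                       theta_grid, dm2_grid):
    """Map the extended log-posterior, Eqs. (28)-(31), over a parameter grid.

    observed        : array of shape (N_obs, 2) with (p_mu, theta_mu) pairs
    log_density     : function (observed, theta, dm2) -> per-event log p
    expected_counts : function (theta, dm2) -> Poisson parameter lambda
    """
    logpost = np.empty((len(theta_grid), len(dm2_grid)))
    n_obs = len(observed)
    for i, theta in enumerate(theta_grid):
        for j, dm2 in enumerate(dm2_grid):
            lam = expected_counts(theta, dm2)
            logpost[i, j] = (-lam + n_obs * np.log(lam)
                             + log_density(observed, theta, dm2).sum())
    return logpost  # with a flat prior, the posterior is exp(logpost), normalized
```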

III.3 Reference analysis using an approximate un-binned likelihood

The results of the experiments are validated using an approximate un-binned likelihood. The likelihood is computed following Eq. (29). To do so, event histograms $H(E_j, p_k, \theta_l)$, binned in muon momentum and angle for each neutrino energy bin, are generated from the simulated data of Sec. II.1. Then, the oscillated probability map is computed by reweighting the histogram content, using Eq. (7), as

$$H_{\mathrm{osc}}(p_k, \theta_l; \theta, \Delta m^2) = \sum_j \left[ 1 - P_{\mathrm{dis}}(E_j; \theta, \Delta m^2) \right] H(E_j, p_k, \theta_l), \qquad (32\text{--}33)$$

and is interpolated linearly in $(p_\mu, \theta_\mu)$ to reduce the effects due to binning:

$$H_{\mathrm{osc}}(p_\mu, \theta_\mu; \theta, \Delta m^2) = \mathrm{interp}\left[ H_{\mathrm{osc}}(p_k, \theta_l; \theta, \Delta m^2) \right]. \qquad (34)$$

The number of expected events $\lambda$ is computed by summing the binned probability:

$$\lambda(\theta, \Delta m^2) = \sum_{k,l} H_{\mathrm{osc}}(p_k, \theta_l; \theta, \Delta m^2). \qquad (35)$$

The probability obtained by normalizing the oscillated maps takes the form

$$p(p_\mu, \theta_\mu \mid \theta, \Delta m^2) = \frac{H_{\mathrm{osc}}(p_\mu, \theta_\mu; \theta, \Delta m^2)}{\lambda(\theta, \Delta m^2)}. \qquad (36)$$

The likelihood is finally computed for discrete values of $\theta$ and $\Delta m^2$. Those values are distributed on a grid identical to the one used for the NSF method, for a proper comparison between both methods. For Sec. IV.3, we tested the results with different numbers of initial bins in energy, momentum and angle to find a good compromise between stability of the result and speed. We call this the Hist method in what follows, and it will be used as a reference for the NSF calculations.
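A compact sketch of this Hist reference method, assuming the simulated triplets are available as numpy arrays and reusing the p_disappearance helper from above (bin counts are illustrative):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def hist_method(e, p, t, theta, dm2, bins=(50, 40, 40)):
    """Histogram-based approximation of Eqs. (32)-(36)."""
    H, edges = np.histogramdd(np.stack([e, p, t], axis=1), bins=bins)
    e_centers = 0.5 * (edges[0][1:] + edges[0][:-1])
    # Reweight each energy slice by the survival probability, Eq. (7).
    weights = 1.0 - p_disappearance(e_centers, theta, dm2)
    H_osc = np.tensordot(weights, H, axes=(0, 0))  # sum over energy bins
    lam = H_osc.sum()  # expected number of events, up to the overall scale N_0
    # Linear interpolation over (p_mu, theta_mu) to soften the binning, Eq. (34).
    p_centers = 0.5 * (edges[1][1:] + edges[1][:-1])
    t_centers = 0.5 * (edges[2][1:] + edges[2][:-1])
    density = RegularGridInterpolator((p_centers, t_centers), H_osc / lam)
    return density, lam
```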

IV Experiments

The methodology is tested on simulated neutrino oscillation experiments modelled after the T2K experiment. For this, ten different sets of observed events are constructed following Sec. IV.1, with additional data used both to fit the NSF and to construct the un-binned likelihood. The training of the NSF is shown and verified in Sec. IV.2, where different statistics of the learned density are computed to verify its integrity. Finally, in Sec. IV.3, the inference on the ten experiments is performed utilizing both the NSF and the un-binned likelihood, discussing the findings of both methodologies.

IV.1 Data samples and experiment simulation

As described in Sec. II.1, M CCQE events are generated, using the neutrino flux energy spectrum from the T2K collaboration Abe et al. (2013) to produce the energy of the incoming neutrino $E_\nu$, and the NEUT event generator Hayato (2009) to compute the momentum and angle of the resulting muon, forming a triplet $(E_\nu, p_\mu, \theta_\mu)$ that describes an implicit density over these three magnitudes.

These events were split into three subsets:

  • Training set: M events, used to optimize the NSF during training, computing the gradient of the loss function Eq. (8) and updating its parameters. This set is also used to construct the histograms of the un-binned likelihood of Sec. III.3.

  • Validation set: M events, used to validate the NSF during training and to assess the model that best generalizes outside of the training samples.

  • Test set: M events, never seen by the NSF during the training and model selection, used to generate the observations of the different experiments and to perform the additional validation of the trained NSF in Sec. IV.2.

As just mentioned, the experimental observations are generated from the test set. Two kinds of experiments were performed for five different sets of parameters $(\theta, \Delta m^2)$: one of low statistics (around 500 observations) and one of high statistics (around 50k observations). In order to generate the observations for a fixed set of parameters, the procedure was the following (a code sketch follows the list):

  1. Choose an approximate number of observed events (500 or 50k), $N_{\mathrm{approx}}$.

  2. Sample events from the test set, oscillate each event according to its energy, accepting it with the probability given by Eq. (7), until a total of $N_{\mathrm{approx}}$ events is accepted. This took a number $N_{\mathrm{tries}}$ of tries.

  3. Sample $N_{\mathrm{tries}}$ different events from the test set, and accept them with the probability given by Eq. (7). The accepted samples form the observation set, with a number of samples $N_{\mathrm{obs}}$, which is similar to $N_{\mathrm{approx}}$.

Note that $N_{\mathrm{tries}}$ would correspond to the initial number of neutrinos produced at the source before oscillating, and $N_{\mathrm{obs}}$ to the number of neutrinos that have not oscillated. These numbers are hard to determine a priori and produce a stochastic number of observed events, hence this heuristic procedure to produce $N_{\mathrm{tries}}$ and the variable number $N_{\mathrm{obs}}$.
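The sketch below implements this accept/reject procedure, assuming the test-set triplets are stored as numpy arrays. The acceptance probability is taken to be the survival probability $1 - P_{\mathrm{dis}}$ derived from Eq. (7), reusing the p_disappearance helper from above:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_observation_set(test_e, test_p, test_t, theta, dm2, n_approx=500):
    """Generate one experiment's observations by rejection sampling."""
    # Step 2: count how many tries are needed to accept n_approx survivors.
    n_tries, accepted = 0, 0
    while accepted < n_approx:
        i = rng.integers(len(test_e))
        n_tries += 1
        if rng.random() < 1.0 - p_disappearance(test_e[i], theta, dm2):
            accepted += 1
    # Step 3: repeat with n_tries fresh draws; the accepted count is stochastic.
    idx = rng.integers(len(test_e), size=n_tries)
    keep = rng.random(n_tries) < 1.0 - p_disappearance(test_e[idx], theta, dm2)
    return test_p[idx][keep], test_t[idx][keep]  # observed (p_mu, theta_mu)
```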

The parameters of the initial experiments 1 and 2 in Sec. IV.3 are chosen following the best-fit values of $\sin^2\theta_{23}$ and $\Delta m^2_{32}$ from Abe et al. (2018). The parameter $\theta$ is computed as

$$\theta = \frac{\pi}{2} - \arcsin\left( \sqrt{\sin^2\theta_{23}} \right), \qquad (37)$$

where we have used the fact that Eq. (7) is symmetric about $\theta = \pi/4$ and fixed $\theta$ in the interval $[0, \pi/4]$. For the best-fit value, maximal mixing is almost achieved ($\theta \approx \pi/4$). Experiments 3 and 4 (5 and 6) follow the same procedure, but choosing values within $1\sigma$ ($2\sigma$) of the best fit. In experiments 7-10 the mixing angle has the value of experiments 3 and 4, but $\Delta m^2$ is changed in order to explore the behaviour in the parameter space.

IV.2 Training and validation of the NSF

In order to apply the methodology described in Sec. III.2, we start by estimating the density $q(E_\nu, p_\mu, \theta_\mu)$ using a Neural Spline Flow on the M training samples, with M further samples for validation.

Because of the similarities regarding the dimension of the data and the number of samples, but with a simpler structure, we chose the hyperparameters almost as for the Power data-set in Appendix B of Durkan et al. (2019b), i.e., the Adam optimizer with learning rate 0.0005, batch size 512, 400k training steps, 5 flow steps, 5 transform blocks, 128 hidden features and 8 bins. Additionally, the learning rate was decreased during the training using a cosine scheduler to ensure stabilization at the end of the training procedure. As shown in Fig. 1, the validation log probability stabilizes at the end of the training and appears to converge for the selected architecture of the network.
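These settings map directly onto the earlier training-loop sketch. A minimal version with the quoted hyperparameters, using the standard PyTorch cosine scheduler as a stand-in for the one actually used, and with sample_training_batch as a hypothetical placeholder for drawing mini-batches of the simulated triplets:

```python
import torch

N_STEPS = 400_000

flow = build_nsf(features=3, num_layers=5, hidden_features=128, num_bins=8)
optimizer = torch.optim.Adam(flow.parameters(), lr=5e-4)
# Cosine decay of the learning rate over the full training run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=N_STEPS)

for step in range(N_STEPS):
    batch = sample_training_batch(512)   # placeholder: (E, p, theta) triplets
    loss = -flow.log_prob(batch).mean()  # Eq. (8), as in fit_density above
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```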

Figure 1: Neural Spline Flow training log probability for estimating $q(E_\nu, p_\mu, \theta_\mu)$, for the training (solid) and validation (dashed) sets, as given by Eq. (8). For the validation set, the log probability stabilizes during the training, converging to a value which depends on the architecture of the network.

To ensure that the transformation of Eq. (18) was found properly by the NSF, consider the transformed data $\vec{z} = T(\vec{x})$ for the test set, i.e., for the events that were not used during the training of the algorithm. If $q$ correctly approximates the target density, $\vec{z}$ should follow a three-dimensional standard normal distribution, as shown qualitatively in Fig. 2. To verify this, two tests were performed.

Figure 2: Two-dimensional histograms of samples from the initial distribution of $(E_\nu, p_\mu, \theta_\mu)$ (top) and of the transformed data $\vec{z} = T(\vec{x})$ (bottom), according to Eq. (18). If the transformation is approximated properly, $\vec{z}$ should follow a three-dimensional standard normal distribution, as depicted qualitatively in this Figure.

The first test was to take 10 disjoint subsamples of the transformed data $\vec{z}$, each of size k, and compute their covariance matrices. The covariance matrix was then also computed 10k times for samples of the same size drawn from a true three-dimensional standard normal distribution, in order to find the statistical distribution one would expect for the covariance matrix entries. In Fig. 3, the blue histograms show this distribution of the true covariance matrix values, together with the values for the disjoint subsamples of the transformed data as dashed black vertical lines, which agree with the expected distribution.

Figure 3: Covariance matrix value distributions for each of the components. Blue histograms represent the distribution of these values for a three-dimensional standard normal distribution with k samples. The black dashed vertical lines represent the covariance matrix values of the transformed data in disjoint subsamples of k samples. They follow the true distribution correctly, in agreement with the transformation of Eq. (18) having been found properly by the NSF.

Secondly, the Kolmogorov-Smirnov test was performed on each of the components of $\vec{z}$, with respect to a one-dimensional standard normal distribution. As shown in Table 1, the p-values give no grounds to reject the null hypothesis that the data was effectively drawn from a standard normal distribution.

Component   KS-statistic   p-value
$z_1$       0.00056        0.9175
$z_2$       0.00092        0.3708
$z_3$       0.00099        0.2883

Table 1: Kolmogorov-Smirnov test statistic and p-value for each of the components of the transformed data $\vec{z}$, according to Eq. (18), on a sample of size M, with respect to a standard normal distribution. If $q$ were well approximated, each of the components should follow this distribution. According to the p-values, none of the components rejects the null hypothesis that the data is drawn from such a distribution.

With these two tests satisfied, each component being compatible with a standard normal distribution and the covariance matrix of the transformed data agreeing with the distribution of covariance matrix values of the base distribution (a three-dimensional standard normal), we can assume that the transformed data corresponds to samples from that base distribution. This justifies that the transformation was properly found by the NSF, hence allowing $q(E_\nu, p_\mu, \theta_\mu)$ to be evaluated as a good enough approximation.
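Both validation tests are easy to reproduce with numpy and scipy. A minimal sketch, assuming the transformed test-set data is available as an array z of shape (N, 3):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)

def covariance_test(z, n_subsamples=10):
    """Covariance matrices of disjoint subsamples of the transformed data z."""
    size = len(z) // n_subsamples
    return [np.cov(z[i * size:(i + 1) * size].T) for i in range(n_subsamples)]

def ks_test_components(z):
    """KS test of each component of z against a standard normal."""
    return [kstest(z[:, d], "norm") for d in range(z.shape[1])]

# Reference distribution of covariance entries, from a true standard normal
# (the subsample size of 100k is an illustrative assumption).
reference_covs = covariance_test(rng.standard_normal((1_000_000, 3)))
```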

IV.3 Inference results

To test the performance of obtaining the posterior according to Sec. III.2, ten different observation sets were constructed following Sec. IV.1, with five different combinations of the mixing angle $\theta$ and the squared-mass difference $\Delta m^2$.

For each of the five combinations of parameters, low and high statistics experiments were performed. The low statistics are of the order of the actually observed samples at T2K, and are used to assess the performance when dealing with a small number of events. The high statistics help to ensure that, in the limit case where the traditional binning methodology can be made very refined, it agrees with the un-binned NSF posterior.

Exp. #   $N_{\mathrm{obs}}$   True $\theta$ [rad]   True $\Delta m^2$ [$10^{-3}$ eV²]   NSF 95% C.L.   Hist 95% C.L.
1        506      0.7594   2.463
2        49672    0.7594   2.463
3        532      0.7353   2.463
4        49646    0.7353   2.463
5        493      0.6847   2.463
6        49665    0.6847   2.463
7        506      0.7353   2.363
8        50145    0.7353   2.363
9        481      0.7353   2.663
10       49710    0.7353   2.663

Table 2: Posterior inference for the ten different experiments, alternating between low and high numbers of observed events. For the parameters inferred using the NSF and the un-binned histogram approximation, Hist, the 95% confidence level was computed using the 1-dimensional marginalized densities of each parameter. $\theta$ is given in rad and $\Delta m^2$ in $10^{-3}$ eV².

Table 2 shows the ten experiments, with the number of observed samples, the true parameters and the results using the NSF (Sec. III.2). Additionally, the result of the approximate un-binned likelihood, denoted by Hist (Sec. III.3), is also displayed, which corresponds to the limit case in which the binning can be refined enough to behave like an un-binned estimation. Since Bayesian inference is used, the inference on the parameters produces a density function according to Eq. (31). The central value shown for each parameter is the one that maximizes the joint posterior density. The uncertainty is then computed by marginalizing the 2-dimensional density over one of the parameters to obtain the 1-dimensional density of the other, and finding the interval such that, for a confidence level (CL), $(1 - \mathrm{CL})/2$ of the density is found on each side. This is done for both the NSF posterior and the Hist posterior.

In general, the results of both methodologies agree, with slight fluctuations in the confidence levels. The differences could come from the NSF not learning the density of the points perfectly, or from the interpolation done by the Hist method introducing inaccurate approximations.

Additionally, in order to visualize the agreement in 2 dimensions, Fig. 4 shows the Highest Posterior Density (HPD) curves Chen and Shao (1999) of 68% and 95%, together with the best fit (highest posterior value), for experiments 1 (top left), 2 (top right), 7 (bottom left) and 8 (bottom right). In both HPD regions a clear overlap for low statistics, and slight fluctuations for larger statistics, can be observed. Comparing the difference in area size, the relative difference of the Hist method with respect to the NSF across the ten experiments shows that the areas agree within $2\sigma$ on average. In Fig. 4, one observes that the areas, even being similar in size, are slightly shifted with respect to one another. Empirical checks computing the posterior using actual discrete binning show that, by refining the binning, its posterior shifts towards the NSF result. The displacement of the Hist method could thus still be a remnant of the initial binning used in the method before interpolating.

Figure 4: Highest Posterior Density (HPD) curves for both the NSF and Hist posteriors of experiments 1 (top left), 2 (top right), 7 (bottom left) and 8 (bottom right), together with the best fit of each posterior. Red (blue) lines indicate the 68% (dashed) and 95% (continuous) HPD curves for the NSF (Hist) method. Orange x-crosses indicate the true parameters used to generate the observed events. Red stars (blue +-crosses) indicate the best fit for the NSF (Hist) method. A clear overlap can be found in the experiments of low statistics (left) and a slight fluctuation at large statistics (right).

Both the quantitative (Table 2) and the qualitative (Fig. 4) results show that NSFs indeed provide a tool to perform likelihood-free inference on physical simulators such as that of the T2K experiment, on par with the un-binned likelihood approach we compared against.

V Summary

In this work we have presented the viability of a likelihood-free inference methodology based on Neural Spline Flows for a simplified neutrino oscillation problem at the T2K experiment. For this, the importance of estimating the initial flux of neutrinos in the near detector, together with the properties of the interacting muons, was presented, which allows for an analysis of the oscillation properties in the far detector. A brief introduction to normalizing flows as density estimators from data samples was given, with emphasis on the particular implementation of Neural Spline Flows. We developed a framework to use this estimation of the density from data taken at the near detector in order to perform inference of the oscillation parameters at the far detector for a simplified 2-flavour neutrino problem, allowing exact inference if the density is properly estimated. This gives an edge over traditional binned histogram methods, especially when the statistics are low, as is the case in the T2K experiment, since the binning cannot then be as refined as one would like. An un-binned alternative, formulated by interpolating the histogram method, was constructed to check the results. Additionally, the integrity of the learned density was thoroughly verified through different statistical and empirical tests. The results obtained using the Neural Spline Flow methodology and the un-binned likelihood methodology are in agreement with each other and open new possibilities to use similar likelihood-free neural network inference for more complex analyses.

Acknowledgements.
The authors have received funding from the Spanish Ministerio de Economía y Competitividad (SEIDI-MINECO) under Grants No. FPA2016-77347-C2-2-P and SEV-2016-0588, from the Swiss National Foundation Grant No. 200021_85012, and acknowledge the support of the Secretariat of Universities and Research of the Department of Business and Knowledge of the Generalitat of Catalonia. The authors are also indebted to Thorsten Lux and the Servei d'Estadística Aplicada of the Universitat Autònoma de Barcelona for helpful discussions and continuous support in the preparation of this document.

References

  • K. Abe, N. Abgrall, H. Aihara, T. Akiri, J. B. Albert, C. Andreopoulos, S. Aoki, A. Ariga, T. Ariga, S. Assylbekov, D. Autiero, M. Barbi, G. J. Barker, G. Barr, et al. (2013) T2K neutrino flux prediction. Phys. Rev. D 87, pp. 012001. External Links: Document, Link Cited by: §II.1, §IV.1.
  • K. Abe, R. Akutsu, A. Ali, J. Amey, C. Andreopoulos, L. Anthony, M. Antonova, S. Aoki, A. Ariga, Y. Ashida, Y. Azuma, S. Ban, M. Barbi, G. J. Barker, et al. (2018) Search for CP violation in neutrino and antineutrino oscillations by the T2K experiment with protons on target. Phys. Rev. Lett. 121, pp. 171802. External Links: Document, Link Cited by: §IV.1.
  • K. Abe et al. (2011) Indication of Electron Neutrino Appearance from an Accelerator-produced Off-axis Muon Neutrino Beam. Phys. Rev. Lett. 107, pp. 041801. External Links: Document, 1106.2822 Cited by: §I.
  • K. Abe et al. (2017) Measurement of neutrino and antineutrino oscillations by the T2K experiment including a new additional sample of interactions at the far detector. Phys. Rev. D96 (9), pp. 092006. Note: [Erratum: Phys. Rev.D98,no.1,019902(2018)] External Links: Document, 1707.01048 Cited by: §I, §II.
  • K. Abe et al. (2019) Constraint on the Matter-Antimatter Symmetry-Violating Phase in Neutrino Oscillations. External Links: 1910.03887 Cited by: §II.
  • K. Abe et al. (2016) Measurement of double-differential muon neutrino charged-current interactions on CH without pions in the final state using the T2K off-axis beam. Phys. Rev. D93 (11), pp. 112012. External Links: Document, 1602.03652 Cited by: §II.
  • M. A. Acero, P. Adamson, L. Aliaga, T. Alion, V. Allakhverdian, S. Altakarli, N. Anfimov, A. Antoshkin, A. Aurisano, A. Back, C. Backhouse, M. Baird, N. Balashov, P. Baldi, B. A. Bambah, et al. (2019) First measurement of neutrino oscillation parameters using neutrinos and antineutrinos by NOvA. Phys. Rev. Lett. 123, pp. 151803. External Links: Document, Link Cited by: §I, §II.
  • Q. R. Ahmad et al. (2001) Measurement of the rate of interactions produced by solar neutrinos at the Sudbury Neutrino Observatory. Phys. Rev. Lett. 87, pp. 071301. External Links: Document, nucl-ex/0106015 Cited by: §I.
  • M. H. Ahn et al. (2006) Measurement of Neutrino Oscillation by the K2K Experiment. Phys. Rev. D74, pp. 072003. External Links: Document, hep-ex/0606032 Cited by: §I.
  • F. P. An et al. (2012) Observation of electron-antineutrino disappearance at Daya Bay. Phys. Rev. Lett. 108, pp. 171803. External Links: Document, 1203.1669 Cited by: §I.
  • T. Araki et al. (2005) Measurement of neutrino oscillation with KamLAND: Evidence of spectral distortion. Phys. Rev. Lett. 94, pp. 081801. External Links: Document, hep-ex/0406035 Cited by: §I.
  • R. Barlow (1990) Extended maximum likelihood. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 297 (3), pp. 496–506. Cited by: §III.2.
  • N. D. Cao, I. Titov, and W. Aziz (2019) Block neural autoregressive flow. In UAI, Cited by: §III.1.
  • M. Chen and Q. Shao (1999) Monte Carlo estimation of Bayesian credible and HPD intervals. Journal of Computational and Graphical Statistics 8 (1), pp. 69–92. External Links: ISSN 10618600, Link Cited by: §IV.3.
  • C. Durkan, A. Bekasov, I. Murray, and G. Papamakarios (2019a) Cubic-spline flows. ArXiv abs/1906.02145. Cited by: §III.1.
  • C. Durkan, A. Bekasov, I. Murray, and G. Papamakarios (2019b) Neural spline flows. In NeurIPS, Cited by: §III.1, §III.1, §III.1, §IV.2.
  • Y. Fukuda et al. (1998) Evidence for oscillation of atmospheric neutrinos. Phys. Rev. Lett. 81, pp. 1562–1567. External Links: Document, hep-ex/9807003 Cited by: §I.
  • M. Germain, K. Gregor, I. Murray, and H. Larochelle (2015) MADE: masked autoencoder for distribution estimation. In ICML, Cited by: §III.1.
  • J. A. Gregory and R. Delbourgo (1982) Piecewise rational quadratic interpolation to monotonic data. Cited by: §III.1.
  • Y. Hayato (2009) A neutrino interaction simulation program library NEUT. Acta Phys. Polon. B40, pp. 2477–2489. Cited by: §II.1, §IV.1.
  • C. Huang, D. Krueger, A. Lacoste, and A. C. Courville (2018) Neural autoregressive flows. ArXiv abs/1804.00779. Cited by: §III.1.
  • P. Jaini, K. A. Selby, and Y. Yu (2019) Sum-of-squares polynomial flow. In ICML, Cited by: §III.1.
  • Z. Maki, M. Nakagawa, and S. Sakata (1962) Remarks on the unified model of elementary particles. Prog. Theor. Phys. 28, pp. 870–880. Note: [,34(1962)] External Links: Document Cited by: §I.
  • T. Müller, B. McWilliams, F. Rousselle, M. Gross, and J. Novák (2018) Neural importance sampling. ACM Trans. Graph. 38, pp. 145:1–145:19. Cited by: §III.1.
  • G. Papamakarios, I. Murray, and T. Pavlakou (2017) Masked autoregressive flow for density estimation. In NIPS, Cited by: §III.1.
  • G. Papamakarios, E. T. Nalisnick, D. J. Rezende, S. Mohamed, and B. Lakshminarayanan (2019) Normalizing flows for probabilistic modeling and inference. ArXiv abs/1912.02762. Cited by: §III.1.
  • D. J. Rezende and S. Mohamed (2015) Variational inference with normalizing flows. ArXiv abs/1505.05770. Cited by: §III.1.
  • M. Tanabashi, K. Hagiwara, K. Hikasa, K. Nakamura, et al. (2018) Review of particle physics. Phys. Rev. D 98, pp. 030001. Cited by: §III.2.
  • M. Tanabashi et al. (2018) Review of Particle Physics. Phys. Rev. D98 (3), pp. 030001. External Links: Document Cited by: §I, §II.
  • A. Wehenkel and G. Louppe (2019) Unconstrained monotonic neural networks. In BNAIC/BENELEARN, Cited by: §III.1.