## I Introduction

X-ray absorption near edge structure (XANES) appears in the energy region of 50 eV around the X-ray absorption edge and comes from the X-ray absorption due to electronic transitions from a core level to unoccupied and band states for the element selected by the edge energy of X-ray absorption.[1] XANES spectra are measured in condensed matter research because they offer crucial physicochemical information. They drastically change by the atom valence, the crystal field dependent on microscopic symmetry, and chemical bonding states. [2] The experimental results of XANES are compared with the density of states (DoS)[3, 4] obtained by first-principles calculations, and the electronic states are discussed.

XANES spectra contain multiple peaks and step structures. It is necessary to decompose the spectrum into these components to extract information about electronic transitions. However, estimating the number of peaks and each peak’s width, position, and intensity is generally challenging since the spectrum is often complex and multimodal. A naive approach is least-squares fitting using the steepest descent method, but it cannot avoid the local minima.[5] Local minima show results that depend on the initial values, making the analysis arbitrary.

An effective approach for the analysis of XANES spectra is Bayesian spectral deconvolution[6], where the following spectral parameters can be estimated: position, width, and intensity of the basis function. [7] In previous studies,[7, 6] the exchange Monte Carlo method (EMC)[8]

was used to get out of the local solution and approach the optimal global solution. EMC makes it possible to sample from the posterior distribution of spectral parameters. The number of peaks and noise variance can also be estimated by maximizing a function called the marginal likelihood.

[9]Previous studies on Bayesian spectral deconvolution used the regression function treating all peaks equally.[7, 9, 6] However, the properties and physical background of XANES spectra differ in the high- and low-energy domain. Conventional methods have not fully utilized these differences.

In this work, we propose the energy-domain-aware regression model for Bayesian spectral deconvolution that incorporates the prior knowledge that XANES spectra have sharp peaks in the low-energy domain and broad peaks in the high-energy domain. Our proposed model enables us to discuss the number of peaks in each energy domain separately and shows its superiority in convergence, sampling efficiency, estimation accuracy, and model evidence compared with the conventional model. This was confirmed via synthetic data analysis.

This paper is organized as follows. Section 2 introduces the framework of the Bayesian spectral deconvolution. We describe our model and show that our model contains the conventional model depending on the design of prior distributions. Section 3 compares the conventional and proposed models and verifies our framework via synthetic data analysis. In Sect. 4, we present our discussion and conclude this paper.

## Ii Framework

### ii.1 Model

In XANES spectra, peaks in the low-energy domain reflect discrete electronic transitions from the atomic core levels to unoccupied states or valence bands.[1] In contrast, peaks in the high-energy domain reflect transitions to continuum states, also observed in extended X-ray absorption fine structure (EXAFS). [10]

Thus, peaks in the low-energy domain are focused for mapping to the electronic transition analysis by first-principles simulation. The shape of the peak reflects the transition probability and density of states. Large sharp peaks tend to be seen in the low-energy domain, and small, broad peaks tend to be seen in the high-energy domain.

[1]Figure 1 shows the synthetic data and model. Figure 1 shows an example of the synthetic spectrum generated assuming XANES data measured at O-K edge () of , which have a complex structure in the low-energy domain. is expected to be used as a solid electrolyte material.[11] Figure 1 shows the spectrum components; the XANES spectrum consists of peaks and a step structure called absorption edge. At least one of the peaks appears near the absorption edge and is called the white line.

We propose a model that discriminates the peaks below and above the absorption edge. We assume that there are peaks below the absorption edge and peaks above the absorption edge. The spectral function for the incident energy value of can be written as follows:

(1) |

where the first term represents the absorption edge and white line. The second term represents the peaks, where is the parameter of the number of peaks. is the spectral parameter. We use the Gaussian and arctangent function for the model of peaks and the step structure, respectively. The following discussion does not lose its generality when Gaussian is replaced by the Lorentzian or pseudo-Voigt function. The function is defined as

(2) | |||||

where is the parameter set. The aforementioned equation indicates that the position of the absorption edge is and that of the white line is . The function of peaks is defined as

(3) | |||||

(4) | |||||

(5) |

where , , and are parameters. The function represents the peaks below the absorption edge and the function represents the peaks above the absorption edge. The peak positions must satisfy the conditions , .

We assume that an observed spectra intensity is obtained as the sum of model and noise as

(6) |

We assume that noise

is generated from a Gaussian distribution with zero mean and variance

. The conditional probability of observing given model is then given by(7) |

where is called precision. Assuming that we have a data set consisting of observed data, the conditional probability is given by

(8) | |||||

(9) |

where is the error function of the fitting function :

(10) |

### ii.2 Bayesian formulation

Bayesian spectral deconvolution treats the data set , spectral parameter , the parameter of the number of peaks , and the inverse noise variance

as random variables. We assume that

and are generated subject to the probability and respectively, is generated subject to the probability , and data set is generated subject to the probability . The joint probability density is then given by(11) |

The posterior distribution of data set given and

is represented by Bayes’ theorem as

(12) | |||||

(13) | |||||

(14) | |||||

(15) |

where is marginal likelihood, which is also called model evidence. The negative logarithm of is called as the Bayesian free energy:

(16) |

### ii.3 Prior distributions

The probability density of the spectral parameter given the parameter is called the prior distribution. We show the discrimination of the peaks below and above the absorption edge using a prior distribution. We also show that, depending on the prior distribution, our model includes a conventional model that treats all peaks equally.[7, 9, 6]

In the following, we define the probability density functions of prior distributions of each parameter. In this paper,

denotes the uniform distribution defined in

,denotes the normal distribution with the mean

and the covariance , anddenotes the gamma distribution with the shape parameter

and as follows:(17) | |||||

(18) | |||||

(19) |

where we refer , and

as hyperparameters.

#### ii.3.1 Proposed model

We define the relation between the position of the absorption edge and those of peaks as follows :

(20) | |||||

(21) |

where we can rewrite the condition and into and , respectively. The parameter and are not independent variables because and depend on via Eqs. (20) and (21). To treat parameters independently, we reparametrize and as , , and . Then, the prior distribution can be rewritten as

(24) | |||||

where and are independent variables.

spectral parameter | prior | hyperparameter |
---|---|---|

, | ||

spectral parameter | prior | hyperparameter |
---|---|---|

First, we define the prior distributions on the absorption edge and the white line. The prior distribution of can be written as the product of the prior distribution of each parameter as

(25) |

We show definitions of the prior distributions , and in Table 1. Since the position of the absorption edge appears near the ionization energy of the element of interest, we set up a Gaussian distribution centered on as . Since the white line position is near the absorption edge, we use a Gaussian distribution centered on as . We use the gamma distribution for the intensity parameter of white line to prevent the intensity becoming too small. We set the uniform distribution for , and since we have no knowledge except for a rough range.

Next, we define the prior distributions of peaks. We can write the prior distribution of as

(26) | |||||

(27) | |||||

(28) |

We show definitions of the prior distributions , and in Table 2. We restrict the positions of peaks by the uniform distribution. Assuming that the range of the peak positions is , we set hyperparameters , and as

(29) |

where we consider that is the normal distribution with width . These prior distributions satisfy the condition , . We use the gamma distribution for the prior distribution of the intensity and width. Using the gamma distribution, we can restrict the parameters to non-zero and control the shape of the distribution by adjusting the hyperparameters. We can incorporate the prior distributions on the knowledge that peaks below are sharp and peaks above are broad. The examples of concrete hyperparameters are shown in Sec. 3.

#### ii.3.2 Conventional model

We show that our model can represent the conventional model by unifying the prior distributions below and above the absorption edge as

(30) | |||||

(31) | |||||

(32) |

Since the conventional model does not discriminate the peaks below and above the step, is independent of and . Thus, and are independent and the right side of Eq. (24) can be written as

(33) |

Here, all peaks have the same conditions. Thus, we do not need to set and , but only determine , which corresponds to their sum. We define the parameter of the number of peaks as for the conventional model.

spectral parameter | prior | hyperparameter |
---|---|---|

Figure 2 shows the relation between proposed and conventional models. As seen in the figure, there are peaks in and peaks in in the proposed model and peaks in in the conventional model. We redefine the for the conventional model as follows:

(34) |

The prior distributions of the peaks in the conventional model can be written as . We show definitions of the prior distributions and in Table 3. As shown in Table 3, we use uniform distributions for all parameters. We set hyperparameters for the prior distributions of position of peaks as and .

Since XANES contains peaks with a wide variety of intensities and widths, the conventional model has no choice but to use a uniform distribution. The proposed model distinguishes the peaks below and above the absorption edge by devising a prior distribution for the energy position. This enabled us to reflect the physical properties in accordance with the energy domain in the prior distribution design.

### ii.4 Exchange Monte Carlo method

The calculation of the posterior probability Eq. (

14) is generally hard since it requires high dimensional integration as Eq. (15). However, we can obtain an empirical posterior distribution from samples by EMC without calculating .We prepare the different inverse variances and posterior distributions . We can obtain samples from the following joint density by EMC:

(35) |

where the inverse variancse satisfy the condition and are called the inverse temperatures. In addition to Metropolis-Hasting sweeps at each Markov chain called replicas, EMC uses exchange moves between adjacent pairs of replicas. To satisfy a detailed balance, the swap is accepted with probability

(36) | |||||

(37) |

where the is the current state of each replicas.

At low temperatures, the posterior density is concentrated around the parameter that minimizes the error function and has a complex local minima in the parameter space. As the temperature increases, the posterior distribution asymptotes to the prior distributions, enabling the Markov chain to explore the parameter space without becoming trapped in the local regions. If the replicas at low temperatures are trapped in the local minima, the exchange move enables it to reach the global minima through exploration at higher temperatures.

Moreover, we can calculate the model evidence or the free energy from samples obtained by EMC. Here, we define the following auxiliary function: [12, 7, 9]

(38) |

Eq.(14) and (15) can be rewritten

(39) | |||||

(40) |

We can calculate as

(41) | |||||

(42) | |||||

(43) |

where

denotes the expectation over the probability distribution

. This is the extended method of importance sampling. [13] We can then obatain Bayes free energy as follows:(44) |

### ii.5 Parameter estimation

Here, we describe the method to estimate the number of peaks, noise variance, and spectral parameters. The estimation of is called model selection, where the most probable model is chosen.[13] The optimal value of the and can be obtained by the empirical Bayes approach as

(45) | |||||

(46) |

where we select the noise value from the discrete inverse temperature

. We will only mention that we can interpolate the free energy using the multihistogram method considering the continuity of noise variance.

[9]Let be EMC samples for the -th replica, where is the number of EMC samples. The optimal value of spectrum parameter can be obtained by maximization of the posterior probability as

(47) | |||||

(48) |

This technique is called maximum a posteriori (MAP) estimation. From the forementioned steps, we can estimate the number of peaks , inverse noise variance , and the spectrum parameter .

spectral parameter | hyperparameter |
---|---|

, | |

spectral parameter | hyperparameter |
---|---|

spectral parameter | hyperparameter |
---|---|

### ii.6 Probability of the number of peaks

As previously described, we estimate the number of peaks and noise variance by the empirical Bayes approach. In addition, we can calculate the probability density of the number of peaks and noise variance by the hierarchical Bayes approach as follows:

(49) | |||||

(50) | |||||

(51) |

where we assume that and are uniform distribution. Note that it is also possible to restrict or on the basis of the results of first-principles calculations or the conditions of the experiment. As seen in Eq. (50), the maximum point of is the equivalent to the minimum point of the free energy . We can obtain and via Eq. (51) since the parameter corresponds to in the proposed model and in the conventional model, respectively. Moreover, we can calculate the and for the proposed model by the principle of marginalization:

(52) | |||||

(53) |

Thus, we can calculate the probability densities of the number of peaks below and above the absorption edge separately. As previously mentioned, the peaks are more important than peaks from the viewpoint of physical properties. Focusing on the crucial domain and discussing the number of peaks is impossible using the conventional model, but the proposed model makes it possible.

## Iii Numerical Experiments

In this section, we compare the conventional model and proposed models by numerical experiments on the synthetic data in Fig. 1. The synthetic data was generated in accordance with Eq. (6) with the number of peaks , inverse variance of noise , and . In the synthetic data, peaks are located below the absorption edge, while peaks are located above.

For the EMC method, the number of replicas was set using a geometric progression as follows: [14]

(54) |

where we set , which is the true value of the noise variance. The hyperparameters of the prior distributions are shown in Tables 4, 5, and 6. Here, we define iterations of the metropolis update as one Monte Carlo step (MCS). We performed 60,000 MCSs and discarded the first 30,000 of them as the burn-in period.

of the conventional and proposed models, respectively. The shade represents the standard deviation for 10 independent runs. (b) The red and blue lines show the autocorrelation function of the proposed and conventional models, respectively. The shade represents the standard error for 10 independent runs.

First, we compare the proposed and conventional models in terms of convergence speed and sampling efficiency. We set the number of peaks as for the conventional model and for the proposed model. The number of replicas and inverse temperatures was set as and for both models. Figure 3 shows the error function against MCSs. The initial value of each state for the EMC method was randomly chosen from the prior density . As seen from these results, the simulation using the proposed model can converge faster compared with that using the conventional model. Figure 3 shows the autocorrelation function[15] of the . We can see that the simulation of the proposed model has a lower autocorrelation than that of the conventional model. From these results, we can say that the simulation using the proposed model can convergence faster and obtain low-correlated samples in a small iteration compared with the conventional model.

In the following, we show the results of estimation of noise variance, number of peaks, and spectral parameters. For inverse temperatures, we set and for the proposed model and and for the conventional model. Figure 4 shows the free energy and calculated as described in Sect. 2.4. As seen from these results, the minimum point of the free energy as a functions of is . According to these results, the noise estimation via Eq. (2.10) seems to obtain the correct noise variance.

Figure 5 shows the results of the model selection. Figures 5 and 5 show the results of and , respectively. From these results, the estimation by the conventional method selects an incorrect number of peaks, . Figure 5 and 5 show the results of and respectively. From these results, the estimation by the proposed method selects an appropriate number of peaks, . As seen from Fig. 5, both the free energy