Modeling Stochastic Variability in Multi-Band Time Series Data

In preparation for the era of the time-domain astronomy with upcoming large-scale surveys, we propose a state-space representation of a multivariate damped random walk process as a tool to analyze irregularly-spaced multi-filter light curves with heteroscedastic measurement errors. We adopt a computationally efficient and scalable Kalman-filtering approach to evaluate the likelihood function, leading to maximum O(k^3n) complexity, where k is the number of available bands and n is the number of unique observation times across the k bands. This is a significant computational advantage over a commonly used univariate Gaussian process that can stack up all multi-band light curves in one vector with maximum O(k^3n^3) complexity. Using such efficient likelihood computation, we provide both maximum likelihood estimates and Bayesian posterior samples of the model parameters. Three numerical illustrations are presented; (i) analyzing simulated five-band light curves for a comparison with independent single-band fits; (ii) analyzing five-band light curves of a quasar obtained from the Sloan Digital Sky Survey (SDSS) Stripe 82 to estimate the short-term variability and timescale; (iii) analyzing gravitationally lensed g- and r-band light curves of Q0957+561 to infer the time delay. Two R packages, Rdrw and timedelay, are publicly available to fit the proposed models.

Authors

• 3 publications
• 3 publications
04/25/2021

Novel bivariate autoregressive model for predicting and forecasting irregularly observed time series

In several disciplines it is common to find time series measured at irre...
06/28/2021

Systematic evaluation of variability detection methods for eROSITA

The reliability of detecting source variability in sparsely and irregula...
09/17/2012

A Bayesian method for the analysis of deterministic and stochastic time series

I introduce a general, Bayesian method for modelling univariate time ser...
07/10/2018

Near-infrared Mira Period-Luminosity Relations in M33

We analyze sparsely-sampled near-infrared (JHKs) light curves of a sampl...
02/28/2019

Deep Learning and Gaussian Process based Band Assignment in Dual Band Systems

We consider the band assignment (BA) problem in dual-band systems, where...
03/03/2020

Procedural band patterns

We seek to cover a parametric domain with a set of evenly spaced bands w...
08/17/2021

AGNet: Weighing Black Holes with Deep Learning

Supermassive black holes (SMBHs) are ubiquitously found at the centers o...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

A Gaussian process (GP) is one of the most important data analytic tools in astronomy due to its well-known computational and mathematical conveniences. GPs are especially useful for analyzing astronomical time series data in a sense that they are continuous-time processes accounting for irregular observation cadences. Moreover, GP’s state-space representation enables modeling heteroscedastic measurement errors as well. Such analytic advantages have made GPs so popular that it is nearly impossible to list all sub-fields of astronomy where GPs are useful; 19,483 ApJ articles appear on the webpage of IOPscience and 18,038 MNRAS articles show up on the webpage of MNRAS with the keyword “Gaussian process” (on Feb 19, 2020). However, it is the case that a multivariate GP (marquardt2007mCARMA; marquardt2007carfima; schlemm2012multivariate), which is specifically designed for modeling multi-band time series data, has not been well documented in the astronomical literature.

Thus, we propose a state-space representation of a multivariate damped random walk process as a specific class of a multivariate GP. This process is also called a multivariate Ornstein-Uhlenbeck process (gardiner2009; singh2018fast) and a vectorized continuous-time auto-regressive model with order one, i.e., a vectorized CAR(1) or CARMA(1, 0) (marquardt2007mCARMA). In particular, the proposed is a multivariate generalization of the work of kelly2009variations. They adopt a univariate GP with the Matérn(1/2) covariance function (i.e., damped random walk process) to fit single-filter quasar light curves. Using the single-band model, they investigate associations between model parameters and physical properties of quasars. Following their work, macleod2010modeling and kozlowski2010quantifying show more empirical evidence for such astrophysical interpretations on the model parameters. The proposed multivariate generalization of their analytic tools can incorporate more data from all available bands into one comprehensive model. This enables more accurate inference on such physically meaningful model parameters.

Many researchers have also pointed out some limitations of the damped random walk process; mushotzky2011kepler and zu2013is report empirical evidence that the damped random walk process fails to describe the optical variability of a quasar on a very short timescale; graham2014a and kasliwal2015are echo their arguments that not all types of AGN variabilities can be described by the damped random walk process. Looking into the limiting behavior of the process, tak2017bayesian also demonstrate that it fails to fit AGN light curves if the timescale of AGN variability is much smaller than the typical observation cadence in the data. Thus, they incorporate this limitation into their model by using an inverse-Gamma prior to set up a soft-lower bound on the timescale. We adopt this approach for the proposed multivariate generalization.

Although the damped random walk is not perfect, the multivariate aspect of the proposed is essential for studying stochastic AGN variability in the era of Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST, ivezi2019lsst). LSST lightcurves are supposed to be sparse when only one band is considered. In general, it is challenging to extract information about short-term variability and timescale from sparsely observed single-band light curves. This problem becomes worse if the actual timescale of an AGN variability is much smaller than the typical observation cadence in each band. The proposed multi-band model, however, can alleviate this issue of sparse sampling because it can take advantage of more data points observed at non-overlapping times across bands. This, in turn, will make it possible to estimate short timescales that single-band light curves may not exhibit. If the range of timescales that can be extracted from the LSST data becomes wider (toward small timescales), it will be helpful for investigating AGN variability and light curve classification. It will also lead to more accurate inference on short-term variability because timescale and short-term variability are negatively correlated (kelly2009variations).

From a methodological point of view, the proposed method is flexible enough to model various aspects of astronomical multi-filter light curves. Above all, the proposed process is a continuous-time process in a state-space representation, suitable for modeling irregularly-spaced multi-band time series data with heteroscedastic measurement errors. Also, the process does not require the data at each observation time to be a vector of the same length; the number of observations at each observation time can range from one to , the total number of bands. This feature is desirable because there can be ties in observation times possibly due to rounding. In addition, the lengths of multi-filter light curves do not need to be the same; a light curve from one band can be longer than other light curves from different bands. Such a flexibility makes the proposed process ideal for modeling SDSS and LSST multi-band time series data with heteroscedastic measurement errors.

Moreover, the proposed process has a couple of computational advantages. We adopt a Kalman-filtering approach for evaluating the resulting likelihood function with maximum complexity, where is the total number of unique observation times across the entire bands. This is more scalable and efficient than an existing strategy of applying a univariate GP with maximum complexity to multi-filter light curves that are stacked up in a single vector (zu2016app; czekala2017). Such an efficient likelihood computation makes the following likelihood-based inference efficient; we provide both maximum likelihood estimates and Bayesian posterior samples of model parameters.

Our numerical studies include a simulation study and two realistic data analyses to show that the proposed process leads to more comprehensive inference compared to independent single-band analyses. The simulation study generates multi-band light curves using the proposed model with some fixed parameter values. Then it checks whether the resulting inference successfully recovers these generative parameter values. The next numerical illustration analyzes realistic five-band light curves of a quasar obtained from SDSS Stripe 82 to infer short-term variabilities and timescales. Finally, we apply the proposed process to inferring the time delay between doubly lensed images of Q0957+561 whose light curves are observed in - and -band.

2 Model specification

A univariate damped random walk process (kelly2009variations) is defined as

 dX(t)=−1τ(X(t)−μ)dt+σdB(t),

where denotes the magnitude of an astronomical object at time , is the timescale of the process in days, is the long-term average magnitude of the process, is the short-term variability of the process on the magnitude scale, and is the standard Brownian motion.

A multivariate version of the damped random walk process (gardiner2009) is defined as

 dX(t)=−D−1τ(X(t)−μ)dt+DσdB(t), (1)

where is a vector of length that denotes magnitudes of the bands at time , is a diagonal matrix whose diagonal elements are timescales with each representing the timescale of the -th band in days, is a vector for long-term average magnitudes of bands, is a diagonal matrix whose diagonal elements are short-term variabilities (in magnitudes) of bands, and finally is a vector for standard Brownian motions whose pairwise correlations are modeled by correlation parameters such that . These correlations are essentially cross-correlations because they govern the correlations among different continuous-time processes. (The subscripts and will be numeric hereafter, i.e., the subscripts and denote the -th band and -th band, respectively, not -band and -band.) We use to denote a vector of these correlation parameters.

These correlation parameters are the key to the multi-band modeling. If these correlations are set to zeros, then the proposed multi-band model is essentially equivalent to a single-band model with an independent assumption across bands. In this case, data from one band do not contribute to the parameter estimation of other bands; see the left panel of Figure 1

. This is because a zero correlation between two multivariate Gaussian random variables means their independence. The proposed multi-band model accounts for the dependence between bands by introducing their correlation parameters, which enables sharing information across multi-band data to infer parameters of all bands; see the right panel of Figure

1.

The solution of the stochastic differential equation in (1) is Gaussian, Markovian, and stationary (105p, gardiner2009), i.e., given and for ,

 X(t)∣X(s),μ,σ,τ,ρ∼MVNk(μ+e−(t−s)D−1τ(X(s)−μ), Q(t−s)), (2)

where MVN) represents a

-dimensional multivariate Gaussian distribution with mean vector

and covariance matrix . The entry of the covariance matrix is defined as

 qjl=σjσlρjlτjτlτj+τl(1−e−(t−s)τj+τlτjτl). (3)

We evaluate this continuous-time process at discrete observation times

. Then the joint probability density function of

is

 f1(X(t)∣μ,σ,τ,ρ)=n∏i=1f2(X(ti)∣X(ti−1),μ,σ,τ,ρ), (4)

where denotes the density function of the multivariate Gaussian distribution defined in (2), , and the subscript will be used to distinguish observation times ().

The observed data are multi-filter light curves measured at irregularly spaced observation times

with known measurement error standard deviations,

. Since one or more bands can be used at each observation time , the length of a vector can be different, depending on how many bands are used at the -th observation time. For example, if - and -bands are used at time , then is a vector containing two magnitudes from the - and -bands, and is a vector of two corresponding measurement error standard deviations. We assume that these observed data are realizations of the latent multi-filter light curves

with known Gaussian measurement error variances

. That is, for ,

 xi∣X(ti)∼MVNki(X∗(ti), D2δi), (5)

where is the number of bands used at observation time , and denotes a sub-vector of corresponding to the bands that are used to observe . For example, if - and -bands are used for measuring at , then is a vector of length two () composed of the two elements of corresponding to the - and -bands. The notation denotes a diagonal matrix whose diagonal elements are . These observed data are assumed to be conditionally independent given the latent data , and thus the resulting joint probability density function of the observed data given the latent data is expressed by

 h1(x∣X(t))=n∏i=1h2(xi∣X(ti)), (6)

where is the multivariate Gaussian density function defined in (5).

We summarize the proposed state-space representation in the following diagram.

The arrows represent dependent and conditionally independent relationships. For example, both and depend only on , and they are conditionally independent given because there is no direct arrow between and . The conditional distributions of the latent magnitudes in the State level are defined in (2), and those of the observed data given the latent magnitudes in the space level are given in (5). The advantage of this state-space approach is that we can model the noisy observations with known measurement error variances , as is done in (5), which is the unique feature of astronomical time series data. See also kelly2009variations and kelly2014flexible for the state-space representations of univariate CARMA(1, 0) and CARMA(), respectively, and Sec. 3 of durbin2012time for details of state-space representation of GPs.

Consequently, the likelihood function of the model parameters with the latent process integrated out is

 L(μ,σ,τ,ρ)=∫h1(x∣X(t))f1(X(t)∣μ,σ,τ,ρ) dX(t). (7)

Here and are defined in (4) and (6), respectively.

3 Computation of the likelihood function via Kalman-filtering

Kalman-filtering (kalman1960) is a well-known technique to evaluate the likelihood function of a state-space model when both state and space models are Gaussian. Since the proposed has a Gaussian state-space representation as shown in (2) and (5), it is natural to adopt Kalman-filtering to compute the likelihood function in (7) via a product of -dimensional multivariate Gaussian densities (). This leads to complexity. The minimum complexity is when only one band is used at each observation time () and the maximum complexity is when all of the bands are used at every observation time ().

Let denote the natural filtration at time , i.e., all of the information about the observed data available until time . Using this notation, we define the following predictive mean vector and covariance matrix at : With ,

 μi|i−1=E(X(ti)∣F(ti−1),μ,σ,τ,ρ)=μ+e−ΔtiD−1τ(μi−1|i−1−μ),Σi|i−1=Cov(X(ti)∣F(ti−1),μ,σ,τ,ρ)=e−ΔtiD−1τ(Σi−1|i−1)e−ΔtiD−1τ+Q(Δti). (8)

We assume that

 μ1|0=μ  and  Σ1|0={qjl}={σjσlρjlτjτlτj+τl}.

Here, each element of the covariance matrix in (8) is defined in (3), and the updated mean vector and covariance matrix (after observing data at ) are

 μi|i=E(X(ti)∣F(ti),μ,σ,τ,ρ)=μi|i−1+Σ.,∗i|i−1(Σ∗,∗i|i−1+D2δi)−1(xi−μi|i−1),Σi|i=Cov(X(ti)∣F(ti),μ,σ,τ,ρ)=Σi|i−1−Σ.,∗i|i−1(Σ∗,∗i|i−1+D2δi)−1Σ∗,.i|i−1.

The notation denotes a sub-matrix of restricted to the bands used for observing , is a sub-matrix of whose rows correspond to the bands used and columns correspond to the entire bands, and is a sub-matrix of whose rows correspond to the entire bands and columns correspond to the bands used. For example, suppose there are five bands, , and we use - and -bands to observe . Then, is a 22 covariance matrix constructed by selecting rows and columns corresponding to - and -bands from , is a 25 matrix made by choosing two rows corresponding to - and -bands and all columns from , and is a 52 matrix built by choosing all rows and two columns corresponding to - and -bands from .

Consequently, the likelihood function in (7) can be computed as follows:

 L(μ,σ,τ,ρ)=n∏i=1p(xi∣F(ti−1),μ,σ,τ,ρ), (9)

where is another multivariate Gaussian density of

 xi∣F(ti−1),μ,σ,τ,ρ∼MVNki(μ∗i|i−1, Σ∗,∗i|i−1+D2δi). (10)

The notation denotes a sub-vector of restricted only to the bands used to observe .

By definition, the maximum likelihood estimates of the model parameters are the values that jointly maximize . We use a gradient-free optimization algorithm (nelder1965) to obtain the maximum likelihood estimates.

4 Bayesian inference

For Bayesian hierarchical modeling, we adopt scientifically motivated, weakly informative, and independent prior distributions on the model parameters (tak2017bayesian; tak2018how): For and ,

 μj∼% Unif(−30, 30),τj∼inv-Gamma(1,1),ρjl∼Unif(−1, 1),σ2j∼inv-Gamma(1,c), (11)

where inv-Gamma

denotes the inverse-Gamma distribution with shape parameter

and scale parameter . We assume that each long-term average magnitude is in a reasonably wide range between and 30. The correlation parameters are between and 1 by definition. Setting up an inverse-Gamma prior on an unknown parameter is considered as setting up a soft lower bound, , of the unknown parameter (Sec. 4.2, tak2018how). We set up soft lower bounds of ’s and ’s to prevent undesirable limiting behaviors of a damped random walk process (Sec. 2.5, tak2017bayesian)

. Specifically, the observed time series will look like a white-noise process if the timescale of the process is much smaller than the typical observation cadence (i.e., in the limit of timescale going to zero). Thus, we set up a half-day soft lower bound for each

. This undesirable behavior is also expected if the short-term variability (variance) of the process is much smaller than the typical measurement error variance. To reflect this, the constant in the inverse-Gamma prior of in (11) is set to an arbitrarily small constant, , so that the soft lower bound of each is 0.00022.

Let be a joint prior density function of , , , and whose distributions are specified in (11). Then, the resulting full posterior density function is

 π(μ,σ,τ,ρ∣x)∝L(μ,σ,τ,ρ)×q(μ,σ,τ,ρ). (12)

We adopt a Metropolis-Hastings within Gibbs sampler (tierney1994markov) to draw (dependent) posterior samples from the full posterior distribution . Initial values of the model parameters are set to their maximum likelihood estimates. Then, it sequentially updates each parameter given the observed data and all the other parameters at each iteration. For example, suppose we have three parameters , , and to be updated at iteration  given previously updated values. We sequentially update each parameter as follows:

 Given  (θ(s−1)1,θ(s−1)2,θ(s−1)3), sample π1(θ1∣θ(s−1)2,θ(s−1)3,x), setting it to θ(s)1, sample π2(θ2∣θ(s)1,θ(s−1)3,x), setting it to θ(s)2, sample π2(θ3∣θ(s)1,θ(s)2,x), setting it to θ(s)3.

The parenthesized superscript indicates at which iteration the value of the parameter is updated. We use a truncated Gaussian distribution between and as a proposal distribution for each , a truncated Gaussian proposal distribution between and for each of ’s, and a log-Normal proposal distribution for each of ’s and of

’s. We also use an adaptive Markov chain Monte Carlo

(Sec. 3.2, tak2017bayesian) so that proposal scales (or so-called jumping scales) are automatically tuned.

A tuning-free R package, Rdrw, that can analyze and simulate both single- and multi-band light curves according to the proposed model is publicly available at CRAN.

5 Numerical illustrations

5.1 A simulation study on five-band light curves

We simulate a set of five-band light curves from the proposed multi-band model via a two-step procedure. The first step simulates the entire data and the second step deletes some of them to be more realistic, e.g., making seasonal gaps. The simulation setting of the first step is as follows; (i) the total number of unique observation times across five bands is set to 300; (ii) the observation cadences are randomly drawn from the Gamma() distribution whose mean and standard deviation are equal to 3 (days); (iii) the measurement error standard deviations of the -th band are randomly drawn from the distribution () so that the first band accompanies the smallest measurement error standard deviations and the last band involves the largest; (iv) the generative parameter values are , , for . Also, for so that the first and last bands are least correlated; (v) given these parameter values, the latent magnitudes are generated via (2); (vi) finally, conditioning on these latent magnitudes, the observed magnitudes are generated via (5). The R package Rdrw has a functionality to return as a matrix.

Given these simulated data in the first step, we remove some observations, assuming that only three consecutive observations occur for each band, and there are three seasonal gaps; (i) for the -th band, we only keep the following observation numbers: . For example, the observation numbers to be kept for the second band are 4, 5, 6, 19, 20, 21, , 289, 290, 291, and the other observations in the second band are removed. Similarly, the observation numbers for the third band () are 7, 8, 9, 22, 23, 24, , 292, 293, 294. By this rule, we can keep the 300 observation times in total and each observation time accompanies a measurement from one band ( for all , leading to complexity; see (10)). (ii) Next we create three seasonal gaps by removing 120 observation times out of 300. The observation numbers falling into the three seasonal gaps range from 41 to 80, from 141 to 180, and from 241 to 280. The total number of observations in the final data set is 180. The simulated five-band light curves before and after removing some of the data are displayed on the top and bottom panels of Figure 2, respectively. The heteroscedastic measurement error standard deviations are too small to be displayed. Also, the details of the simulated data are summarized in Table 1.

Using this simulated data set (i.e., after the removal), we fit a univariate damped random walk model (kelly2009variations) on each single-band light curve. And we fit the proposed multivariate damped random walk model on the entire five-band data set. For each fit, we run a Markov chain for 60,000 iterations, which is initiated at the maximum likelihood estimates of the model parameters. We discard the first 10,000 iterations as burn-in, and then we thin the remaining chain by a factor of five, i.e., from length 50,000 to 10,000. We use this thinned Markov chain Monte Carlo sample of size 10,000 to summarize the results. We check the convergence of the Marko chain by computing effective sample sizes as a numerical indication of auto-correlation function; the higher the effective sample size is, the more quickly the auto-correlation function decreases. The average effective sample size for the five short-term variabilities is 2,896 and that for the five timescales is 1,866.

The marginal posterior distributions of the short-term variabilities obtained by the proposed multi-band model are exhibited in the first column of Figure 3, and those of timescales are displayed in the second column of Figure 3. The black solid curves superimposed on the histograms denote the posterior distributions obtained by the single-band model. The vertical dashed lines represent the generative true values. It turns out that the posterior distributions obtained by the proposed multi-band model (histograms) tend to have higher peaks and narrower spread around the generative true values than those obtained by the single-band models (solid curves). These results show that the proposed model results in more accurate inferences on the model parameters. This makes intuitive sense because the single-band model accesses only 36 observations in each band, while the multi-band model enables sharing information across bands via their correlations. That means, the multi-band model allows all of the 180 observations to contribute to the inference on every model parameter.

When it comes to cross-correlation parameters, a comparison between single-band and multi-band models is not possible because correlations are available only in the multi-band model. We display the marginal posterior distributions of the ten cross-correlation parameters in Figure 4. It turns out that the true parameter values are closely located near the modes of these posterior distributions.

5.2 Five-band light curves of an SDSS S82 quasar

The five-band () light curves of a quasar used in this illustration are obtained from a catalog of 9,258 SDSS Stripe 82 quasars that are spectroscopically confirmed (macleod2012). The name of the quasar (dbID) is 3078106. We display these five-band light curves in Figure 5. Most of the measurement error standard deviations are too small to be displayed in this figure. There is no tie in observation times, i.e., a single-band magnitude is measured at each observation time ( for all in (10)), leading to complexity for the likelihood computation. The length of each single-band light curve is different; 132 observations in -band, 137 in -band, 141 in -band, 138 in -band, and 139 in -band. In total, there are 687 observation times across the bands (); see Table 2 for more details of the data.

We run a Markov chain of length 60,000, discarding the first 10,000 iterations as burn-in. We thin the remaining chain by a factor of five from length 50,000 into 10,000. We use this thinned Markov chain to make an inference on each parameter because the effective sample sizes are satisfactory across all model parameters. The sampling result is summarized in Table 3.

We compare the marginal posterior distributions of the short-term variabilities and timescales obtained by the proposed multi-band model with those obtained by the single-band models. Figure 6 displays these marginal posterior distributions of the short-term variabilities in the first column and those of the timescales in the second column. The histograms indicate the posterior distributions obtained by the multi-band model, while the superimposed solid black curves represent those obtained by the single-band model.

Overall, the marginal posterior distributions obtained by the multi-band model tend to have higher peak and narrower spread than those obtained by the single-band models. In particular, the modes of all five marginal posterior distributions of timescales in the second column are closely located near 2.3, while single-band models do not exhibit such consistent modal locations. Thus, the result of multi-band modeling indicates that the true timescales of quasar 3078106 may be almost the same across the five bands unlike single-band analyses. Such a consistent inference may be ascribed to the multi-band aspect of the proposed model that allows sharing information across bands. Also, the posterior samples of timescales obtained by the multi-band model tend to be smaller than those obtained by the single-band models. This may be because the multi-band model accounts for more data across bands, which enables detecting smaller timescales that individual single-band light curves may not exhibit.

5.3 Time delay estimation between doubly-lensed multi-band light curves of Q0957+561

Time delay estimation is one of the key factors for the Hubble constant estimation via time delay cosmography (kai2015; treu2016time; suyu2017holicow). However, it has been the case that the time delays are estimated only from single-band light curves. The proposed process can be used to estimate time delays among gravitationally lensed multi-band light curves of an AGN, as is the case for a single-band model (dobler2015; kai2015; tak2017bayesian). For this purpose, we introduce a few more parameters for the time delays and microlensing adjustment.

We use the proposed multi-filter process to model each of multiply lensed images observed in several bands, e.g., , , , and corresponding to quadruply lensed images, , and . (The notation here is consistent to the one in Section 2 except the subscripts that distinguish lensed images.) Taking lensed image as an example, the notation represents and each component is a vector of length (the number of available bands), i.e., . The observed data of lensed image are and corresponding measurement error standard deviations are .

We account for microlensing by subtracting a polynomial long-term trend of order from each latent magnitude (hojjati2013robust; tewes2013a; tak2017bayesian). That is, for ,

 X′A,j(ti)=XA,j(ti)−pA,j(ti), (13)

where is the polynomial long-term trend at defined as

 pA,j(ti)=βA,j,0+βA,j,1ti+βA,j,2t2i+⋯+βA,j,mtmi. (14)

The notation denotes the polynomial regression coefficients of band  for image . We use to collectively denote the regression coefficients for image . The polynomial order can be set differently for each band (e.g., ). We note that the long-term average magnitude of band  for image , i.e., , is now absorbed into the intercept term of .

We assume that each multi-band process, whose polynomial long-term trends are removed across bands, is a shifted version of the other in the horizontal axis by the time delays. That means,

Consequently, given the time delays (, , ) and polynomial regression coefficients for microlensing (, , , ), we can use only one multi-band process to model all of the gravitationally lensed multi-filter light curves, i.e., , , , and .

Moreover, given the time delays and polynomial regression coefficients for microlensing, we can unify the notation, combining all multi-band light curves of the lensed images. Let be the sorted observation times among , , , and . The unified notation for the observed data at is defined as follows: For ,

 yi=⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩xA,i  if  ~ti∈t,xB,i  if  ~ti∈t−ΔAB,xC,i  if  ~ti∈t−ΔAC,xD,i  if  ~ti∈t−ΔAD

with measurement error standard deviation

 ηi=⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩δA,i  if  ~ti∈t,δB,i  if  ~ti∈t−ΔAB,δC,i  if  ~ti∈t−ΔAC,δD,i  if  ~ti∈t−ΔAD.

The unifying notation for the latent data is

 Y(~ti)=⎧⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪⎩X′A(~ti)+pA(~ti)  if  ~ti∈tX′A(~ti)+pB(~ti)  if  ~ti∈t−ΔABX′A(~ti)+pC(~ti)  if  ~ti∈t−ΔACX′A(~ti)+pD(~ti)  if  ~ti∈t−ΔAD,

where is a vector of length composed of , and we similarly define , , and .

Using the unified notation, we specify the distributions for the proposed time delay model. Let us define and . Then the joint density function of the latent data is

 f1(Y(~t)∣σ,τ,ρ,Δ,β)=4n∏i=1f2(Y(~ti)∣Y(~ti−1),σ,τ,ρ,Δ,β), (16)

where the conditional distributions for are defined as

 Y(~ti)∣Y(~ti−1),σ,τ,ρ,Δ,β∼MVNk(e−(~ti−~ti−1)D−1τY(~ti−1), Q(~ti−~ti−1)) (17)

for . The entry of the covariance matrix is defined in (3). The joint density function of the observed data is

 h1(y∣Y(~t),Δ,β)=4n∏i=1h2(yi∣Y(~ti),Δ,β), (18)

where the conditional distributions for are

 yi∣Y(~ti),Δ,β∼MVNki(Y∗(~ti), D2ηi) (19)

for . The notation denotes a sub-vector of corresponding to the bands used for observing at . We also note that the conditions in (16)–(19) include and because without and we cannot define ’s and ’s. The resulting likelihood function of the model parameters with the latent process marginalized out is

 L(σ,τ,ρ,Δ,β)=∫f1(Y(~t)∣σ,τ,ρ,Δ,β)×h1(y∣Y(~t),Δ,β) dY(~t), (20)

where and are defined in (16) and (18), respectively. The same Kalman-filtering procedure in Section 3 is used to calculate this likelihood function. The only difference is that there are two times more observations (from to ) for doubly-lensed system and four times more (from to ) for quadruply-lensed system. The number of unknown parameters when bands are used for lensed images with the -th order polynomial regression is . That is, ’s, ’s, ’s, time delays, and polynomial regression coefficients. For example, the number of unknown parameters is 103 when five bands are used () for a quadruply lensed quasar () with a cubic polynomial regression ().

For a Bayesian inference, we adopt the same priors for

, , and as specified in (11), and weakly-informative independent prior distributions for the additional parameters, and (Sec. 2.4, tak2017bayesian). Specifically, we assume an independent multivariate Gaussian distribution, MVN, for each of ’s, ’s, ’s, ’s, where denotes a vector of zeros and

is a diagonal matrix whose diagonal elements are set to relatively large constants. We also adopt a Uniform distribution over the range between

and for each of , , . We use to denote the joint prior distribution of these unknown model parameters. The resulting full posterior distribution of the unknown model parameters is

 π(σ,τ,ρ,Δ,β)∝L(σ,τ,ρ,Δ,β)×q(σ,τ,ρ,Δ,β). (21)

We adopt the same adaptive Metropolis-Hastings within Gibbs sampler to draw posterior samples from this full posterior distribution. This time, it is natural to think of a two-step sampling scheme. First, given the parameters related to the time delay model, i.e., and , we can completely determine and . Thus, given and , we can update the parameters relevant to the multivariate damped random walk process, i.e., , , and , as done in Sections 5.1 and 5.2. Second, given the updated parameters, , , and , we update the time-delay-related parameters, and . Here we use a truncated Gaussian proposal distribution for each of , and we do not need a proposal distribution for because the conditional posterior distribution of is a multivariate Gaussian. The sampler iterates these two steps at each iteration. An R package, timedelay, features this two-step update scheme to fit the proposed time delay model on a doubly lensed system observed in two bands. (The package will be updated later with more general cases.)

Using the proposed time delay model and fitting procedure, we estimate the time delay between doubly lensed images of Q0957+561. The data are observed in - and -bands and are publicly available (refId0). The details of the data are summarized in Table 4. The - and -band light curves of lensed image are displayed on the top panel of Figure 7, and those of lensed image are shown on the bottom panel of Figure 7. The resulting model involves 22 unknown model parameters.

To fit the proposed time delay model with a cubic polynomial regression for microlensing (), we implement a Markov chain of length 60,000, discarding the first 10,000 iterations as burn-in. We thin the remaining chain by a factor of five (from 50,000 to 10,000). We summarize our inferential result using this thinned Markov chain. The effective sample size of the time delay is 675, leading to its auto-correlation quickly decreasing to zero. As for single-band fits, we adopt a quadratic polynomial regression () for the microlensing adjustment because a Markov chain with a cubic order () does not converge.

The top panel of Figure 8 shows the marginal posterior distributions of obtained by fitting two single-band models independently. The green curve indicates the marginal posterior distribution of based only on the -band data, and the red dashed curve represents the distribution obtained only with the -band data. Clearly, the fitting results are not consistent. The -band posterior distribution has a mode near 415 days, while the -band posterior distribution shows two modes, one near 415 days and the other near 420 days.

The bottom panel of Figure 8 exhibits the marginal posterior distribution of obtained by fitting the proposed multi-band model on the entire data. It is now clear which mode the data support more; the relative height of the mode near 420 days is not even comparable to that of the mode near 415 days and the peak near 415 days is higher than before. This makes intuitive sense because the -band light curves have more fluctuations, as shown in Figure 7, and thus it is easier to find the time delay by matching those fluctuations. Consequently, the joint model puts more weight on the information from the -band, enabling us to narrow the two possibilities down to the highest mode near 415 days. The resulting posterior mean and standard deviation of are 414.324 and 2.307, respectively.

This feature of sharing information across different bands will be useful for finding time delays from the multi-band LSST light curves. This is because single-band LSST light curves are sparsely observed, and thus single-band observations may not exhibit enough fluctuation patterns needed to estimate time delays.

6 Discussion

The proposed state-space representation of a multivariate damped random walk process can be applied to other sub-fields of astronomy as well. Photometric reverberation mapping is one of such fields. zu2011an model continuum variability by a univariate GP with damped random walk covariance function to estimate AGN reverberation time lags. Later, zu2016app modify this model to infer a two-band reverberation time lag. They stack up two-band light curves in one vector and apply the univariage GP framework with complexity. We note that their two-band time-lag model is closely related to the single-band time delay model between doubly-lensed light curves. This is because the former reduces to the later if an indicator function , which is one when and zero otherwise, is used as a transfer function . This implies that the proposed multivariate process can be useful for improving their reverberating time-lag estimate, as is the case in multi-band time delay estimation.

We also note that the proposed parametric approach can be an alternative to a widely used non-parameteric method in multi-band data analyses. For example, though not handled in this work, the cross-correlation parameters themselves may be of interest when a non-parametric cross-correlation method is used as a data analytic tool, e.g., edelson2015space. This is because the proposed multivariate process can be considered as a parametric cross-correlation method in a sense that it produces posterior distributions of all pair-wise cross-correlation parameters. Its applicability is not limited to correlations among multi-band time series of one object. It can be used to find cross-correlations of multiple time series data of different sources if they are of interest.

There are a couple of limitations of this work. One obvious disadvantage is the computational cost. The proposed model incorporates more data from all bands and involves more parameters to model their dependence. Thus, the resulting computational cost is inevitably expensive. For example, the CPU time taken for running the five-band simulation in Section 5.1 is about 26 hours, that for analyzing the SDSS five-band data in Section 5.2 is about 100 hours, and that for fitting the time delay model on the Q0957+561 data in Section 5.3 is about 9 hours on a personal laptop. Speeding up the code is necessary to make it computationally less burdensome for fitting bigger data of large-scale surveys. Another limitation is that the code for fitting the proposed model is for now only available in R (r2018), although astronomers are more familiar with Python. Thus, we plan to collaborate with bilingual astronomers or statisticians either to translate the R code to a Python code or to develop a wrapper.

Several opportunities to build upon the current work exist. (i) Since multimodality is common in the time delay estimation, it is promising to incorporate a multimodal Markov chain Monte Carlo sampler. For example, a repelling-attracting Metropolis algorithm has been successfully applied to a single-band time delay model (tak2018ram). (ii) Next, the proposed model is stationary, meaning that it does not account for outlying observations. tak2019robust

demonstrate that the parameter estimation of a univariate damped random walk process can be severely biased in the presence of a few outliers. As an easy-to-implement solution, they introduce a computational trick that turns the Gaussian measurement errors to Student’s

-errors, leading to more robust and accurate inferences. It will be a great improvement if this trick is incorporated into the proposed model fitting procedure. (iii) Also, improving the convergence rate of a Markov chain is important for enhancing computational efficiency. For a heteroscedastic model, it is theoretically shown that a specific data augmentation scheme can expedite the convergence rate significantly (tak2020dta). Although this scheme has not been applied to heteroscedastic time series data, it is worth to investigate. (iv) Finally, it is necessary to generalize the proposed multi-band model further. This is because the current model can describe only a specific type of variability defined by a damped random walk process, i.e., CARMA(1, 0). In a univarate case, however, a general-order CARMA() model has been widely used due to the limitation of the damped random walk process (kelly2014flexible; moreno2019stochastic). Thus, more flexible modeling on AGN variability will be possible by using the general-order multi-band CARMA() model (marquardt2007mCARMA; schlemm2012multivariate). We note that it is crucial to carefully design the state-space representation of the multi-band CARMA() to account for the heteroscedasticity of astronomical time series data. We leave these as future directions to explore.

7 Concluding remarks

In the era of astronomical big data with large-scale surveys, such as SDSS and LSST, it is important to have various data analytic tools to handle the resulting multi-band data. In this sense, it is worth to mention that there are possibly many existing tools that can be more powerful than the one presented in this work if a user is fully aware of their strong and weak points. For example, if fitting smooth curves on multi-band time series data is of interest, various non-parametric curve-fitting methods such as kernel-smoothing, spline, wavelet, local polynomial regression (loader1999local; schafer2013)

are available. However, these non-parameteric tools may not necessarily have interpretability that a parametric model can provide. A multivariate GP regression is a flexible way to model various types of variability, e.g., incorporating a quasi-periodic aspect of the Doppler shift in a covariance function

(jones2017improving). However, it is well known that a GP regression often involves a prohibitive computational cost because it requires taking an inverse of a covariance matrix. The tool we present in this work will be useful for modeling stochastic variability in irregularly-spaced astronomical time series data with heteroscedastic measurement errors, though it is not a panacea as described in Section 6. We hope this work to initiate more active methodological research and discussion on multi-band data analyses in preparation for the upcoming era of LSST-driven time-domain astronomy.

Acknowledgements

We appreciate the members of the International CHASC Astrostatistics Collaboration, especially Kaisey Mandel, David van Dyk, Vinay Kashyap, Xiao-Li Meng, and Aneta Siemiginowska, for productive discussion on the first presentation of this work. Hyungsuk Tak appreciates Jogesh G. Babu, Eric D. Feigelson, and Eric Ford for their insightful comments and thoughtful suggestions that have significantly improved the manuscript. Also, Hyungsuk Tak acknowledges computational supports from the Institute for Computational and Data Sciences at Pennsylvania State University.

Rdrw (Hu and Tak, 2020, version 1.0.0), timedelay (Tak, Mandel, van Dyk, Kashyap, Meng, Siemiginowska, 2015, version 1.0.10)