Streamlined Empirical Bayes Fitting of Linear Mixed Models in Mobile Health

03/28/2020
by   Marianne Menictas, et al.

To effect behavior change, a successful algorithm must make high-quality decisions in real time. For example, a mobile health (mHealth) application designed to increase physical activity must make contextually relevant suggestions to motivate users. While machine learning offers solutions for certain stylized settings, such as when batch data can be processed offline, there is a dearth of approaches that can deliver high-quality solutions under the specific constraints of mHealth. We propose an algorithm which provides users with contextualized and personalized physical activity suggestions. This algorithm overcomes a challenge critical to mHealth: that complex models be trained efficiently. We propose a tractable, streamlined empirical Bayes procedure which fits linear mixed effects models in large-data settings. Our procedure takes advantage of the sparsity introduced by hierarchical random effects to efficiently learn the posterior distribution of a linear mixed effects model. A key contribution of this work is that we provide explicit updates for learning the fixed effects, random effects, and hyper-parameter values. We demonstrate the success of this approach in a mobile health reinforcement learning application, a domain in which fast computations are crucial for real-time interventions. Not only is our approach computationally efficient, it is also easily implemented with closed-form matrix algebraic updates, and we show improvements over state-of-the-art approaches in both speed and accuracy of up to 99%.


1 Introduction

This work is motivated by a mobile health study in which an online Thompson Sampling contextual bandit algorithm is used to personalize the delivery of physical activity suggestions [24]. These suggestions are intended to increase near-time physical activity. The personalization occurs via two routes: first, the user’s current context is used to decide whether to deliver a suggestion; second, random effects, as described in Section 2.1, are used to learn user- and time-specific parameters that encode the influence of each of the contextual variables on this decision. The user- and time-specific parameters are modeled in the reward function (the mean of the reward conditional on context and action). To learn these parameters, information is pooled across both users and time in a dynamic manner, combining Thompson sampling with a Bayesian random effects model for the reward function. In contrast to fully Bayesian methods, empirical Bayes estimates the values of hyper-parameters as a function of observed data.

The contributions of this paper are as follows:

  • We develop a Thompson sampling algorithm coupled with explicit streamlined empirical Bayes updates for fitting linear mixed effects models. To estimate hyper-parameters and compute the estimated posterior distribution, our algorithm uses closed-form updates that fall within the class of two-level sparse least squares problems introduced by [22].

  • This work provides an efficient empirical Bayes algorithm in which the storage and computation required at each iteration grow with the larger of the two grouping dimensions considered; for example, one grouping may index users and the other time points, or vice versa.

  • Our approach reduces the running time relative to other state-of-the-art methods and, critically, does not require advanced hardware.

These contributions make our approach practical for online mHealth settings, in which incremental learning algorithm updates are required (e.g., at nightly increments) and swift computations are necessary for subsequent online policy adaptation. This facilitates incremental, accurate tuning of the variance hyper-parameters in a Thompson-Sampling contextual bandit algorithm.

In Section 2.1 we describe the problem setting and review Thompson Sampling with the use of a Bayesian mixed effects model for the reward [24]. Section 2.2 describes a natural parametric empirical Bayes approach to hyper-parameter tuning, and Section 2.3 presents our streamlined alternative. A performance assessment and comparison is shown in Section 3.

2 Methods

2.1 Problem Setting

At each time, on each user, a vector of context variables is observed and an action is then selected from a finite set of possible actions. Subsequently a real-valued reward is observed. This continues over all time points and all users in the study. We assume that the reward is generated with a person- and time-specific mean, in which the regressors are known features of the context and action, and the regression parameters are unknown; in particular, one parameter vector is specific to each user and another is specific to each time point. Time corresponds to “time-since-under-treatment” for a user. The user-specific parameters capture unobserved user variables that influence the reward at all times; in mobile health, unobserved user variables may include level of social support for activity, or pre-existing problems or preferences that make activity difficult. The time-specific parameters capture unobserved “time-since-under-treatment” variables that influence the reward for all users; in mobile health these might include treatment fatigue, decreasing motivation, and so on.

The Thompson Sampling algorithm in [24] uses the following Bayesian mixed effects model for the reward:

(1)

The algorithm is designed with independent Gaussian priors on the unknown parameters:

(2)
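Because the display equations in (1) and (2) did not survive extraction, the following LaTeX sketch shows one plausible form of the model, using assumed symbols (x, z, w for context/action features; \beta, u_i, v_t for the fixed, user-level and time-level effects) consistent with the surrounding description; the exact notation in the original may differ:

R_{it} = x_{it}(A_{it})^{\top}\beta + z_{it}(A_{it})^{\top}u_i + w_{it}(A_{it})^{\top}v_t + \epsilon_{it},
\qquad \epsilon_{it} \overset{\mathrm{ind.}}{\sim} N(0,\sigma^{2}_{\epsilon}),

\beta \sim N(\mu_{\beta},\Sigma_{\beta}), \qquad
u_i \overset{\mathrm{ind.}}{\sim} N(0,\Sigma_{u}), \qquad
v_t \overset{\mathrm{ind.}}{\sim} N(0,\Sigma_{v}).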

The user-specific and time-specific parameters are called random effects in the statistical literature, and the model in (1) and (2) is often referred to as a linear mixed effects model [15] or a linear mixed model with crossed random effects (e.g., [2, 13]). At each time, Thompson Sampling is used to select the action based on the current context. That is, we compute the posterior distribution of the treatment effect and, for the observed context, select treatment with posterior probability

(3)

where the posterior mean and variance-covariance matrix are given in the sub-blocks of (7).
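As an illustration of the selection step in (3), the following minimal Python sketch computes the posterior probability that the advantage of treatment is positive under a Gaussian posterior and uses it as the randomization probability. All names and values are illustrative assumptions, not quantities from the original.

import numpy as np
from scipy.stats import norm

def randomization_probability(x, post_mean, post_cov):
    """Pr( x^T theta > 0 ) when theta ~ N(post_mean, post_cov)."""
    m = float(x @ post_mean)                      # posterior mean of the advantage
    s = float(np.sqrt(x @ post_cov @ x))          # posterior standard deviation
    return float(norm.cdf(m / s))

# Illustrative usage with placeholder values.
x = np.array([1.0, 0.3])                          # context features
post_mean = np.array([0.2, -0.1])                 # posterior mean of the treatment-effect parameters
post_cov = np.array([[0.05, 0.0], [0.0, 0.05]])   # posterior covariance
pi = randomization_probability(x, post_mean, post_cov)
action = np.random.binomial(1, pi)                # Thompson-sampled treatment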

2.1.1 Bayesian Mixed Effects Model Components

We define the data matrices and parameter vectors used in the remainder of this section, where, as before, the regressors represent the known features of the context and action. The dimensions of these matrices are

(4)

2.1.2 Posterior Updates

The posterior distribution of the immediate treatment effect for each user at each time is updated and then used to assign treatment at the subsequent time point. Here, we show the form of the full posterior. Define

The estimated posterior distribution for the fixed and random reward effects vector is

(5)

where

(6)
(7)

The focus of this work is to enable fast incremental estimation of the variance components. We describe a natural but computationally challenging approach for estimating these variances in Section 2.2, and our streamlined alternative in Section 2.3.
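For concreteness, the generic conjugate-Gaussian posterior update underlying (5)-(7) can be sketched as follows. This is a naïve dense computation under assumed notation (a combined design matrix C, a stacked coefficient vector theta with block-diagonal prior covariance), not the streamlined procedure developed later; the block-diagonal prior is what produces the sparsity pattern of Figure 1.

import numpy as np

def gaussian_posterior(C, y, prior_mean, prior_cov, sigma2_eps):
    """Posterior mean and covariance of theta in y = C theta + eps,
    with theta ~ N(prior_mean, prior_cov) and eps ~ N(0, sigma2_eps I).
    Dense computation: cost is cubic in the dimension of theta."""
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = C.T @ C / sigma2_eps + prior_prec
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (C.T @ y / sigma2_eps + prior_prec @ prior_mean)
    return post_mean, post_cov

In the mixed model, theta stacks the fixed effects and the user- and time-level random effects, so only a few sub-blocks of post_cov are needed for the updates in Section 2.2.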

2.2 Parametric Empirical Bayes

At each time, the empirical Bayes [20, 5] procedure maximizes, with respect to the variance components, the marginal likelihood based on all user data up to and including the current time; the marginal likelihood of Y is obtained by integrating the fixed and random effects out of the model in (1) and (2).
The maximization is commonly done via the Expectation Maximisation (EM) algorithm [7].

2.2.1 EM Method

The expected complete data log likelihood is given by

where the expectation is over the distribution of the fixed and random effects given in (2). The M-step yields the following closed-form iterative estimates for the variance components:

(8)

where the posterior mean reward components for the fixed and random effects

(9)

and the posterior variance-covariance reward components for the fixed and random effects

(10)

are computed in the E-step using equation (7). Note that the quantities in (9) are sub-vectors of the posterior mean and those in (10) are sub-block entries of the posterior variance-covariance matrix. The naïve EM algorithm is given in Algorithm 1.

Initialize the variance components.
repeat
   E-step: Compute, via equation (7), the posterior mean and variance-covariance components needed for the M-step.
   M-step: Update the variance components via equation (8).
until the log-likelihood converges
Algorithm 1 Naïve EM Algorithm for empirical Bayes estimates of the variance components in the Bayesian mixed effects model as given in (1) and (2).

The challenge in Algorithm 1 is computation of the posterior mean vector and posterior variance-covariance matrix at each iteration. We discuss the details of these challenges in Section 2.2.2.
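To make the EM iteration concrete, the following Python sketch implements a simplified special case with a single set of zero-mean random effects and scalar variance components. The model reduction, names, and the diffuse prior on the fixed effects are assumptions for illustration; the sketch follows the generic Gaussian EM pattern rather than the exact expressions in (8)-(10).

import numpy as np

def em_variance_components(C, y, idx_u, max_iter=100, tol=1e-6):
    """Naive EM for (sigma_u^2, sigma_eps^2) in y = C theta + eps, where
    theta = (beta, u_1, ..., u_m), u_i ~ N(0, sigma_u^2), eps ~ N(0, sigma_eps^2 I).
    idx_u gives the positions of the random effects within theta."""
    d = C.shape[1]
    sigma2_u, sigma2_eps = 1.0, 1.0
    for _ in range(max_iter):
        prior_var = np.full(d, 100.0)          # diffuse prior variance on beta (illustrative)
        prior_var[idx_u] = sigma2_u
        prior_prec = np.diag(1.0 / prior_var)
        # E-step: Gaussian posterior for theta (dense, hence costly).
        post_cov = np.linalg.inv(C.T @ C / sigma2_eps + prior_prec)
        post_mean = post_cov @ (C.T @ y / sigma2_eps)
        # M-step: closed-form updates of the variance components.
        new_sigma2_u = np.mean(post_mean[idx_u] ** 2 + np.diag(post_cov)[idx_u])
        resid = y - C @ post_mean
        new_sigma2_eps = (resid @ resid + np.trace(C @ post_cov @ C.T)) / len(y)
        converged = abs(new_sigma2_u - sigma2_u) + abs(new_sigma2_eps - sigma2_eps) < tol
        sigma2_u, sigma2_eps = new_sigma2_u, new_sigma2_eps
        if converged:
            break
    return sigma2_u, sigma2_eps

# Illustrative usage: 20 users with 10 observations each and random intercepts.
m, k = 20, 10
users = np.repeat(np.arange(m), k)
C = np.column_stack([np.ones(m * k), (users[:, None] == np.arange(m)).astype(float)])
rng = np.random.default_rng(1)
y = C @ np.concatenate([[2.0], rng.normal(0.0, 0.7, m)]) + rng.normal(0.0, 1.0, m * k)
sigma2_u_hat, sigma2_eps_hat = em_variance_components(C, y, np.arange(1, m + 1))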

2.2.2 Computational Challenges

At each iteration in Algorithm 1, computation of the posterior quantities requires solving the sparse matrix linear system

(11)

where the LHS of (11) has sparse structure imposed by the random effects as exemplified in Figure 1.

Figure 1: Sparsity present in the left-hand-side matrix of (11) under the Bayesian mixed effects model as represented in (1) and (2). Non-zero entries are represented by a blue square and zero entries by a light-yellow square.

This matrix has dimension growing with the number of users and the number of time points. It is often the case that the number of fixed effects parameters, the number of random effects parameters per user, and the number of random effects parameters per time period are of moderate size. Consequently, naïve computation of the posterior variance-covariance matrix has cubic dependence on the random effects group sizes, that is, on the numbers of users and time points.

To address this computational problem, we exploit the fact that in this setting the matrix requiring inversion has the sparse block structure shown in Figure 1. In addition, the closed-form updates of the variance components in (8) require computation of only the sub-blocks of the posterior variance-covariance matrix that correspond to the non-zero sub-blocks illustrated in Figure 1, and not the entire matrix.

2.3 Streamlined Empirical Bayes

Streamlined updating of the posterior mean and of each of the required sub-blocks of the posterior variance-covariance matrix for the E-step in Algorithm 1 can be embedded within the class of two-level sparse matrix problems as defined in [21] and is encapsulated in Result 1. Result 1 is analogous and mathematically identical to Result 2 in [18]; the difference is that the authors of [18] do not apply their methodology to the mobile health setting and use full variational Bayesian inference for fitting, as opposed to our use of empirical Bayes.

Result 1 (analogous and mathematically identical to Result 2 in [18]). The posterior updates under the Bayesian mixed effects model as given in (1) and (2), for the posterior mean and for each of the required sub-blocks of the posterior variance-covariance matrix, are expressible as a two-level sparse matrix least squares problem whose non-zero sub-blocks of B, according to the notation in the appendix, are built from the group-specific data matrices. The solutions, and the corresponding sub-blocks of the posterior variance-covariance matrix, are given by the sub-vector and sub-block quantities whose notation is defined in the appendix.

The streamlined equivalent of Algorithm 1 is given in Algorithm 2. Algorithm 2 makes use of the SolveTwoLevelSparseLeastSquares algorithm, which was first presented in [21] and is also provided in the appendix of this article.

Initialize the variance components.
repeat
   E-step: Compute the required components of the posterior mean and sub-blocks of the posterior variance-covariance matrix using the SolveTwoLevelSparseLeastSquares algorithm.
   M-step: Update the variance components via equation (8).
until the log-likelihood converges
Algorithm 2 Streamlined EM algorithm for empirical Bayes estimates of the variance components in the Bayesian mixed effects model as given in (1) and (2).

The computing time and storage for the streamlined updating of the posterior mean and each of the required sub-blocks of the posterior variance-covariance matrix in the E-step of Algorithm 2 is greatly reduced relative to the naïve approach, growing with the sizes of the two groupings rather than cubically. If both groupings are large, one may resort to coupling the streamlining presented in this article with an approximation of the posterior so as to further reduce computation time and storage. However, care needs to be taken with the choice of approximation so as to incur as little degradation in accuracy as possible. As explained in Section 3.1 of [19], mean field variational Bayes approximations tend to be very accurate for Gaussian response models. However, such high accuracy does not manifest in general: ignoring important posterior dependencies via mean field restrictions often leads to credible intervals that are too small (e.g., [25]).

3 Performance Assessment and Comparison

In order to evaluate the speed achieved by our streamlined empirical Bayes algorithm, we compare the timing and accuracy of our method against the state-of-the-art software GPyTorch [11], a highly efficient implementation of Gaussian Process regression modeling with GPU acceleration. Note that the Gaussian linear mixed effects model as given in (1) and (2) is equivalent to a Gaussian Process regression model with a structured kernel matrix induced by the use of random effects. For ease of notation, in the following we use sEB to refer to the streamlined empirical Bayes algorithm, GPyT-CPU to refer to empirical Bayes fitting using GPyTorch on CPU, and GPyT-GPU to refer to empirical Bayes fitting using GPyTorch on GPU. The sEB and GPyT-CPU computations were conducted using an Intel Xeon CPU E5-2683. The GPyT-GPU computations were conducted using an Nvidia Tesla V100-PCIE-32GB.
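For reference, hyper-parameter fitting by marginal likelihood maximization in GPyTorch typically follows the pattern below. This sketch uses a generic RBF kernel purely for illustration; it does not reproduce the structured random-effects kernel used in the comparison, and the training data are synthetic placeholders.

import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))

train_x = torch.rand(100, 1)                                   # placeholder inputs
train_y = torch.sin(train_x * 6).squeeze() + 0.1 * torch.randn(100)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)
model.train(); likelihood.train()

optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for _ in range(50):                                            # maximize the marginal likelihood
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()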

3.1 Batch Speed Assessment

We obtained timing results for simulated batch data generated according to versions of the Bayesian mixed effects model as given in (1) and (2), in which both the fixed effects and random effects had dimension two, corresponding to random intercepts and slopes for a single continuous predictor generated from the Uniform distribution on the unit interval. The true parameter values were held fixed across the studies, and the number of data points specific to each user and time period was set to 5. Four separate studies were run with differing numbers of users; the total number of data points is the product of the number of users, the number of time periods, and the number of data points per user-time pair. We then simulated 50 replications of the data for each study and recorded the computational times for variance estimation from GPyT-CPU, GPyT-GPU and sEB. Algorithm 2 was implemented in Fortran 77. The EM iterations were stopped once the absolute difference between successive expected complete data log-likelihood values fell below a fixed tolerance. The stopping criterion was the same for GPyTorch, and additionally the maximum number of iterations was set to 15.
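A minimal sketch of this data-generating setup is given below; the numbers of users and time periods and the true parameter values are illustrative placeholders, since the original values did not survive extraction.

import numpy as np

rng = np.random.default_rng(0)
m, n, obs_per_cell = 100, 3, 5           # users, time periods, datapoints per (user, time): illustrative
beta = np.array([1.0, -0.5])             # fixed intercept and slope (illustrative)
sigma2_u = np.array([0.5, 0.25])         # user-level intercept/slope variances (illustrative)
sigma2_v = np.array([0.5, 0.25])         # time-level intercept/slope variances (illustrative)
sigma2_eps = 1.0

u = rng.normal(0.0, np.sqrt(sigma2_u), size=(m, 2))   # user-level random effects
v = rng.normal(0.0, np.sqrt(sigma2_v), size=(n, 2))   # time-level random effects

rows = []
for i in range(m):
    for t in range(n):
        x = rng.uniform(size=obs_per_cell)             # single continuous predictor on (0, 1)
        mean = (beta[0] + u[i, 0] + v[t, 0]) + (beta[1] + u[i, 1] + v[t, 1]) * x
        y = mean + rng.normal(0.0, np.sqrt(sigma2_eps), size=obs_per_cell)
        rows.append((i, t, x, y))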

Table 1 shows the mean and standard deviation of elapsed computing times in seconds for estimation of the variance components using sEB, GPyT-CPU and GPyT-GPU. Figure 2 shows the absolute error values for each variance component estimated using sEB, GPyT-CPU and GPyT-GPU, summarized as boxplots.

Datapoints sEB GPyT-CPU GPyT-GPU
1,500 0.7 (0.10) 5.8 (0.14) 1.5 (0.16)
7,500 1.7 (0.15) 163.8 (1.81) 1.3 (0.04)
15,000 2.8 (0.21) 736.2 (38.36) 5.2 (0.03)
1,500,000 322.1 (24.82) NA (NA) NA (NA)
Table 1: Mean (standard deviation) of elapsed computing times in seconds for estimation of the variance components in the Bayesian mixed effects model as represented in (1) and (2) using sEB via Algorithm 2, GPyT-CPU and GPyT-GPU for comparison.
Figure 2: Summary of the simulation study in Section 3.1 under the Bayesian mixed effects model as represented in (1) and (2), where the absolute error values for each variance component estimated using each of the three empirical Bayes methods are summarized as boxplots.

3.2 Online Thompson Sampling Contextual Bandit mHealth Simulation Study

Next, we evaluate our approach in a simulated mHealth study designed to capture many of the real-world difficulties of mHealth clinical trials. Users in this simulated study are sent interventions multiple times each day according to Algorithm 3. Each intervention represents a message promoting healthy behavior.

Initialize the prior and hyper-parameters.
for each time point:
   for each user currently in the study:
      Receive the context features for the user and time.
      Obtain the posterior using Result 1.
      Calculate the randomization probability in (3).
      Sample the treatment.
      Observe the reward.
   if at an update time: update the hyper-parameters with Algorithm 2 and update the posterior.
Algorithm 3 Thompson-Sampling algorithm with linear mixed effects model as given in (1) and (2) for the reward.
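Algorithm 3 can be summarized by the schematic Python loop below. All callables are placeholders for the components described in Section 2 (they are assumptions for illustration, not an implementation from the paper), and the update time is taken to be an incremental, e.g. nightly, batch as described in Section 1.

import numpy as np

def run_trial(users, time_points, get_context, get_reward,
              update_hyperparameters, update_posterior,
              randomization_probability, update_times):
    """Schematic Thompson-Sampling loop corresponding to Algorithm 3."""
    history = []
    posterior = None
    for t in time_points:
        for i in users:
            x = get_context(i, t)                          # context features for user i at time t
            pi = randomization_probability(x, posterior)   # probability in (3) from the current posterior
            a = np.random.binomial(1, pi)                  # sample treatment
            r = get_reward(i, t, a)                        # observe reward
            history.append((i, t, x, a, r))
        if t in update_times:                              # e.g., nightly increments
            hyper = update_hyperparameters(history)        # Algorithm 2 (streamlined EM)
            posterior = update_posterior(history, hyper)   # Result 1 (two-level sparse solve)
    return history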

In this setting there are 32 users and each user is in the study for 10 weeks. Users join the study in a staggered fashion, such that each week new users might be joining or leaving the study. Each day in the study users can receive up to 5 mHealth interventions.

The Bayesian mixed effects model as represented in (1) and (2) offers several advantages in this setting. In mHealth, users can differ not only in the context that they experience, but also in their response to treatment under the same context. The user-level random effects allow learning of personalized policies for each user, overcoming the flaws of methods that treat all individuals as the same. Additionally, there can be non-stationarity in how users respond to treatment; for example, they might be more responsive at the beginning of a study than at the end. By modeling time-level random effects, each person’s policy can be sensitive to a dynamic environment and is informed by how other users’ responsivity has been influenced by time.

We evaluate our approach in a setting that demands personalization within a dynamic environment. Users are modeled to be heterogeneous in their response to treatment, and their responsivity declines with time.

In Figure 3 we show the ability of our streamlined algorithm to minimize regret when real data are used to inform the simulation, and we compare our approach to GPyT-CPU and GPyT-GPU. For all users we show the average performance for each week of their participation in the study. For example, we first show the average regret across all users in their first week of the study; this will not be the same calendar week for all users, as they join in a staggered manner. The average total time in seconds (standard deviation) for estimation of the variance components was 757.1 (76.48) using GPyT-CPU, 7.5 (0.27) using GPyT-GPU, and 7.3 (0.16) using sEB.

Figure 3: Regret averaged across all users for each week in the simulated mHealth trial associated with approaches sEB, GPyT-CPU and GPyT-GPU.

4 Related Work

The fundamental streamlined empirical Bayes Algorithm 2 makes use of linear system solutions and sub-block matrix inverses for two-level sparse matrix problems ([22]). Our result for streamlined posterior computation in Section 2.3 is analogous and mathematically identical to Result 2 in [18], whose authors instead focus on streamlined mean field variational Bayes approximate inference for linear mixed models with crossed random effects. In the present article, we make use of a similar result for empirical Bayes posterior inference in the mobile health setting. Our empirical Bayes algorithm allows streamlined estimation of the variance components within an online Thompson sampling contextual bandit algorithm.

Other approaches include using mixed model software packages for high-performance statistical computation. For example: (i) BLME provides maximum a posteriori estimation for linear and generalized linear mixed effects models in a Bayesian setting [10]; and (ii) Stan [4] provides full Bayesian statistical inference with MCMC sampling. Even though BLME offers streamlined algorithms for obtaining predictions of the fixed and random effects in linear mixed models, the sub-blocks of the posterior covariance matrices required for construction of the EM method in the streamlined empirical Bayes algorithm are not provided by such software. On the other hand, Stan does offer support for computation of these sub-blocks, but is well known to suffer computationally in large data settings.

As noted in Section 3, the Gaussian linear mixed effects model used in the Thompson-Sampling algorithm is equivalent to a Gaussian Process regression model with a structured kernel matrix induced by the use of random effects. Gaussian process models have been used for multi-armed bandits ([6, 12, 1, 23, 8, 26, 9, 3]) and for contextual bandits ([16, 14]). To address the challenges posed by mHealth, [24] illustrate the benefits of using mixed effects Gaussian Process models in the context of reinforcement learning. Computational challenges in the Gaussian Process regression setting are a known and common problem, which has led to contributions from the computer science and machine learning communities. For instance, to address the cubic complexity in the number of data points suffered by Gaussian Process regression, a variety of scalable GPs have been presented, including the approach we compare to earlier: GPyTorch [11]. A review of state-of-the-art scalable GPs, covering two main categories (global approximations, which distill the entire data set, and local approximations, which divide the data for subspace learning), can be found in [17]. The sparsity imposed by the use of random effects, however, affords us accurate inference in the cases considered in this article, and thus we do not suffer from the potential loss of accuracy that could result from approximate methods such as those discussed in [17].

5 Discussion

We compare three empirical Bayes approaches, sEB, GPyT-CPU and GPyT-GPU for use within a batch simulation setting and an online contextual bandit mHealth simulation study.

Within the batch simulation setting of Section 3.1, inspection of the computational running times in Table 1 shows that sEB achieves the lowest average running time across all simulations and data set sizes, compared to GPyT-CPU and GPyT-GPU. In the starkest case this amounts to a 98.96% reduction in running time. Even in the most modest comparison, sEB timing is similar to that of GPyT-GPU but does not require the sophisticated hardware that GPyT-GPU does. The improvement between GPyT-CPU and GPyT-GPU is impressive, and GPyT-GPU is clearly designed to excel in a resource-rich environment. However, in a clinical trial it is unclear whether such an environment will be available. In contrast, sEB does not require advanced hardware to achieve state-of-the-art performance.

Our method computes only the necessary sub-blocks of the posterior variance-covariance matrix at each time step, as opposed to the entire matrix. We were unable to template an equivalent streamlined computation within GPyT-CPU and GPyT-GPU. Consequently, we were unable to run GPyT-CPU and GPyT-GPU on the largest dataset, as this involves constructing a very large dense matrix even before the optimization procedure is called. Templating the variance-covariance matrix so that it did not require this matrix as input might have allowed us to run GPyT-GPU on the largest dataset. An advantage of our method is that it efficiently exploits the structure inherent in the variance-covariance matrix to manage large datasets.

The median reduction in error from GPyT-CPU to sEB is 28.3% and from GPyT-GPU to sEB is 22.2%. From Figure 2, we see that sEB achieves lower absolute error on average than either GPyT-CPU or GPyT-GPU. The difference is more pronounced between sEB and GPyT-GPU. sEB achieves the lowest average absolute error, at the fastest rate, on the simplest machine.

References

  • [1] S. Ambikasaran, D. Foreman-Mackey, L. Greengard, D. W. Hogg, and M. O’Neil (2015) Fast direct methods for gaussian processes. IEEE transactions on pattern analysis and machine intelligence 38 (2), pp. 252–265. Cited by: §4.
  • [2] R. H. Baayen, D. J. Davidson, and D. M. Bates (2008) Mixed-effects modeling with crossed random effects for subjects and items. Journal of memory and language 59 (4), pp. 390–412. Cited by: §2.1.
  • [3] I. Bogunovic, J. Scarlett, and V. Cevher (2016) Time-varying gaussian process bandit optimization. In Artificial Intelligence and Statistics, pp. 314–323. Cited by: §4.
  • [4] B. Carpenter, A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell (2017) Stan: a probabilistic programming language. Journal of statistical software 76 (1). Cited by: §4.
  • [5] G. Casella (1985) An introduction to empirical bayes data analysis. The American Statistician 39 (2), pp. 83–87. Cited by: §2.2.
  • [6] S. R. Chowdhury and A. Gopalan (2017) On kernelized multi-armed bandits. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 844–853. Cited by: §4.
  • [7] A. P. Dempster, N. M. Laird, and D. B. Rubin (1977) Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society: Series B (Methodological) 39 (1), pp. 1–22. Cited by: §2.2.
  • [8] T. Desautels, A. Krause, and J. W. Burdick (2014) Parallelizing exploration-exploitation tradeoffs in gaussian process bandit optimization. The Journal of Machine Learning Research 15 (1), pp. 3873–3923. Cited by: §4.
  • [9] J. Djolonga, A. Krause, and V. Cevher (2013) High-dimensional gaussian process bandits. In Advances in Neural Information Processing Systems, pp. 1025–1033. Cited by: §4.
  • [10] V. Dorie (2015) Blme: bayesian linear mixed-effects models. R package, URL: https://CRAN.R-project.org/package=blme. Cited by: §4.
  • [11] J. Gardner, G. Pleiss, K. Q. Weinberger, D. Bindel, and A. G. Wilson (2018) Gpytorch: blackbox matrix-matrix gaussian process inference with gpu acceleration. In Advances in Neural Information Processing Systems, Cited by: §3, §4.
  • [12] M. D. Hoffman, E. Brochu, and N. de Freitas (2011) Portfolio allocation for bayesian optimization.. In UAI, pp. 327–336. Cited by: §4.
  • [13] M. Jeon, F. Rijmen, and S. Rabe-Hesketh (2017) A variational maximization–maximization algorithm for generalized linear mixed models with crossed random effects. psychometrika 82 (3), pp. 693–716. Cited by: §2.1.
  • [14] A. Krause and C. S. Ong (2011) Contextual gaussian process bandit optimization. In Advances in neural information processing systems, pp. 2447–2455. Cited by: §4.
  • [15] N. M. Laird, J. H. Ware, et al. (1982) Random-effects models for longitudinal data. Biometrics 38 (4), pp. 963–974. Cited by: §2.1.
  • [16] L. Li, W. Chu, J. Langford, and R. E. Schapire (2010) A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web, pp. 661–670. Cited by: §4.
  • [17] H. Liu, Y. Ong, X. Shen, and J. Cai (2018) When gaussian process meets big data: a review of scalable gps. arXiv preprint arXiv:1807.01065. Cited by: §4.
  • [18] M. Menictas, G. Di Credico, and M. P. Wand (2019) Streamlined variational inference for linear mixed models with crossed random effects. arXiv preprint arXiv:1910.01799. Cited by: §2.3, §2.3, §4.
  • [19] M. Menictas and M. P. Wand (2015) Variational inference for heteroscedastic semiparametric regression. Australian & New Zealand Journal of Statistics 57 (1), pp. 119–138. Cited by: §2.3.
  • [20] C. N. Morris (1983) Parametric empirical bayes inference: theory and applications. Journal of the American statistical Association 78 (381), pp. 47–55. Cited by: §2.2.
  • [21] T. H. Nolan, M. Menictas, and M. P. Wand (2019) Streamlined computing for variational inference with higher level random effects. arXiv preprint arXiv:1903.06616. Cited by: §2.3, §2.3.
  • [22] T. H. Nolan and M. P. Wand (2019) Solutions to sparse multilevel matrix problems. arXiv preprint arXiv:1903.03089. Cited by: The SolveTwoLevelSparseLeastSquares Algorithm, 1st item, §4.
  • [23] N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger (2009) Gaussian process optimization in the bandit setting: no regret and experimental design. arXiv preprint arXiv:0912.3995. Cited by: §4.
  • [24] S. Tomkins, P. Liao, S. Yeung, P. Klasnja, and S. Murphy (2019) Intelligent pooling in thompson sampling for rapid personalization in mobile health. Cited by: §1, §1, §2.1, §4.
  • [25] B. Wang and D. Titterington (2005) Inadequacy of interval estimates corresponding to variational bayesian approximations.. In AISTATS, Cited by: §2.3.
  • [26] Z. Wang, B. Zhou, and S. Jegelka (2016) Optimization as estimation with gaussian processes in bandit settings. In Artificial Intelligence and Statistics, pp. 1022–1031. Cited by: §4.

Appendix A: The SolveTwoLevelSparseLeastSquares Algorithm

The SolveTwoLevelSparseLeastSquares algorithm is listed in [22] and is based on Theorem 2 of [22]. Given its centrality to Algorithm 2, we list it again here. The algorithm solves a sparse version of the least squares problem:

whose solution is determined by the corresponding normal equations, where B and the stacked response vector have the following structure:

(12)

The sub-vectors of x and the sub-matrices of A corresponding to its non-zero blocks are labelled as follows:

(13)

and

(14)

where sub-blocks that are not of interest are left unlabelled. The SolveTwoLevelSparseLeastSquares algorithm is given in Algorithm 4.

Inputs: the sub-matrices and sub-vectors of B and b defined in (12).
For each group: decompose the group-specific blocks so that the resulting factor is upper-triangular, and accumulate the corresponding intermediate quantities.
Decompose the accumulated matrix so that R is upper-triangular; solve for the first component of the solution and its corresponding sub-block of A.
For each group: recover the group-specific component of the solution and the required sub-blocks of A.
Output: the solution components and the sub-blocks of A corresponding to the non-zero sub-blocks of B.
Algorithm 4 SolveTwoLevelSparseLeastSquares for solving the two-level sparse matrix least squares problem: minimise the least squares criterion in x and compute the sub-blocks of A corresponding to the non-zero sub-blocks of B. The sub-block notation is given by (12) and (14).
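As a rough illustration of why the two-level structure in (12) permits group-wise computation, the following Python sketch solves the same least squares problem via a Schur complement on the normal equations, looping over groups. It is not the QR-based routine of [22], but it shares the linear-in-groups cost; all names are illustrative.

import numpy as np

def two_level_sparse_ls(B_blocks, Bdot_blocks, b_blocks):
    """Solve min ||b - B x||^2 where B has a dense first column block (the B_i)
    and a block-diagonal second column block (the Bdot_i), as in (12).
    Returns x1 (shared component) and the per-group components x2[i]."""
    p = B_blocks[0].shape[1]
    S = np.zeros((p, p))                              # Schur complement accumulator
    s = np.zeros(p)
    cached = []
    for Bi, Bdi, bi in zip(B_blocks, Bdot_blocks, b_blocks):
        Gi_inv = np.linalg.inv(Bdi.T @ Bdi)
        Pi = Bdi @ Gi_inv @ Bdi.T                     # projection onto the group's column space
        S += Bi.T @ (Bi - Pi @ Bi)
        s += Bi.T @ (bi - Pi @ bi)
        cached.append((Gi_inv, Bdi, Bi, bi))
    x1 = np.linalg.solve(S, s)                        # shared (fixed-effects-like) component
    x2 = [Gi_inv @ Bdi.T @ (bi - Bi @ x1)             # group-specific components
          for Gi_inv, Bdi, Bi, bi in cached]
    return x1, x2

# Illustrative usage with three randomly generated groups.
rng = np.random.default_rng(0)
B_blocks = [rng.normal(size=(6, 2)) for _ in range(3)]
Bdot_blocks = [rng.normal(size=(6, 2)) for _ in range(3)]
b_blocks = [rng.normal(size=6) for _ in range(3)]
x1, x2 = two_level_sparse_ls(B_blocks, Bdot_blocks, b_blocks)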