Bayesian Matrix Completion Approach to Causal Inference with Panel Data

11/04/2019
by Masahiro Tanaka

This study proposes a new Bayesian approach to infer the average treatment effect. The approach treats counterfactual untreated outcomes as missing observations and infers them by completing a matrix composed of realized and potential untreated outcomes using a data augmentation technique. We also develop a tailored prior that helps in the identification of parameters and induces the matrix of the untreated outcomes to be approximately low rank. While the proposed approach is similar to synthetic control methods and other relevant methods, it has several notable advantages. Unlike synthetic control methods, the proposed approach does not require stringent assumptions. Whereas synthetic control methods do not have a statistically grounded method to quantify uncertainty about inference, the proposed approach can estimate credible sets in a straightforward and consistent manner. As we show through a series of simulation studies, the proposed approach also has better finite-sample performance than the existing Bayesian and non-Bayesian approaches.


1 Introduction

Program/policy evaluation and comparative case studies using observational data are pervasive in the social and natural sciences as well as in government and business practice. In particular, causal inference is an integral part of the social sciences, where randomized experiments are usually infeasible. For instance, Abadie et al. (2015) analyzed the economic cost of the 1990 German reunification, a political event that cannot be repeated many times in a controlled fashion.

The primary interest of this study is inference of the average treatment effect (ATE). Suppose we have panel data with $N$ units and $T$ time periods. An outcome of unit $i$ at period $t$ is denoted by $y_{i,t}(d_{i,t})$, where $d_{i,t} = 1$ when the unit is exposed to treatment and $d_{i,t} = 0$ otherwise. Let $\mathcal{D}_1$ and $\mathcal{D}_0$ be sets of indices $(i,t)$ for treated and untreated observations, respectively. Then, the ATE is represented as

$$\tau = \frac{1}{|\mathcal{D}_1|} \sum_{(i,t) \in \mathcal{D}_1} \left\{ y_{i,t}(1) - y_{i,t}(0) \right\},$$

where $|\mathcal{A}|$ denotes the cardinality of a set $\mathcal{A}$. Estimation of the ATE amounts to estimation of the counterfactual untreated outcomes $y_{i,t}(0)$, $(i,t) \in \mathcal{D}_1$, or "potential outcomes" in terms of the Neyman–Rubin causal model (Holland, 1986). Inference of the ATE poses a serious challenge to statisticians, and numerous approaches have been proposed: the difference-in-differences estimator, regression discontinuity design, matching-based methods, etc. (see, e.g., Imbens and Rubin, 2015; Athey and Imbens, 2017; Abadie and Cattaneo, 2018).
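To make the estimand concrete, the following minimal sketch computes the ATE from a completed outcome matrix; the array names (Y_obs, Y0_hat, D) and the numbers are hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical two-unit, three-period panel: unit 2 is treated in period 3.
Y_obs = np.array([[2.0, 2.1, 2.3],     # realized outcomes y_{i,t}
                  [1.9, 2.0, 3.0]])
Y0_hat = np.array([[2.0, 2.1, 2.3],    # completed matrix of y_{i,t}(0);
                   [1.9, 2.0, 2.2]])   # the (2,3) entry is imputed
D = np.array([[False, False, False],   # True marks treated cells, i.e. D_1
              [False, False, True]])

# ATE = |D_1|^{-1} * sum over treated cells of y(1) - y(0)
ate = (Y_obs[D] - Y0_hat[D]).mean()
print(ate)  # 0.8
```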

In this study, we propose a new Bayesian approach for inferring the ATE with panel data. We treat counterfactual untreated outcomes as missing observations and infer them via data augmentation (Tanner and Wong, 1987). In other words, we transform statistical inference of the ATE into a matrix completion problem, an issue studied extensively in the machine learning community (e.g., Srebro et al., 2005; Keshavan et al., 2010). To this end, we develop a tailored prior that ensures identification of the parameters and induces the matrix of untreated outcomes to be approximately low rank, adapting a cumulative shrinkage process prior (Legramanti et al., 2019).

Our Bayesian approach has two notable advantages. First, it can provide credible intervals in a consistent manner, whereas the existing non-Bayesian approaches have no statistically grounded method to quantify uncertainty about the inference. As hypothesis testing is an essential part of scientific research, this advantage alone is a strong reason to resort to Bayesian methods. Second, it has better finite-sample performance than the existing approaches. Through a series of simulation studies, we show that our proposal consistently outperforms the existing approaches in terms of the precision of the prediction of potential outcomes.

Two strands of literature are particularly relevant to this study. First, the proposed approach is related to the class of synthetic control methods (SCMs) (e.g., Abadie and Gardeazabal, 2003; Abadie et al., 2010; Xu, 2017; Doudchenko and Imbens, 2017); see Abadie (forthcoming) for a recent overview of the literature on SCMs. This class of methods aims to obtain "synthetic" observations of untreated outcomes as weighted sums of the outcomes of the control units. Despite its increasing popularity, the original SCM (Abadie et al., 2010) has two notable shortcomings. First, it imposes the strong assumptions that the synthetic control has no intercept and that the weights are non-negative and sum to one. These assumptions imply that the treated unit falls in the convex hull of the control units and that synthetic observations are positively correlated with the control units, which is not plausible in many real situations. Second, it does not have an effective method for assessing uncertainty about the obtained estimates. Abadie et al. (2010) propose to conduct a series of placebo studies, but this is merely a form of robustness check. Doudchenko and Imbens (2017) propose a variant of the SCM that is free from the three assumptions above, while Kim et al. (2019) develop a Bayesian version of their approach that can deliver credible sets.

Second, the approach developed by Athey et al. (2018) is particularly related to our proposal. They also treat counterfactual untreated outcomes as missing data and estimate them via matrix completion with the nuclear norm penalty (Mazumder et al., 2010). Athey et al.'s (2018) non-Bayesian approach, however, does not have an estimator of confidence intervals.

The remainder of the study is structured as follows. In Section 2, we introduce a new Bayesian approach to causal inference with panel data and compare it with the existing alternatives. In Section 3, we illustrate the proposed approach by applying it to simulated and real data. We conduct a simulation study and show that our proposal is competitive with the existing approaches in terms of the precision of the predictions of counterfactual untreated outcomes. Then, the proposal is applied to the evaluation of the tobacco control program implemented in California in 1988. The last section concludes the study.

2 Proposed Approach

2.1 Framework

An individual outcome is modeled as follows: for $i = 1, \dots, N$; $t = 1, \dots, T$,

$$y_{i,t} = \alpha_{i,t} + x_{i,t}^{\top} \beta + \varepsilon_{i,t},$$

where $\alpha_{i,t}$ is a unit- and time-specific intercept, $x_{i,t}$ is a $K$-dimensional vector of covariates which may contain unit- and/or time-specific effects, $\beta$ is the corresponding coefficient vector, and $\varepsilon_{i,t}$ is an error term. We focus on cases having a single type of treatment. Let $\mathcal{D}_1$ and $\mathcal{D}_0$ be sets of indices for treated and untreated observations, respectively. It is assumed that there is no interference between units. The covariates are completely observed for all the units and periods. Let $Y$ be an $N$-by-$T$ matrix composed of (actually) untreated outcomes, $y_{i,t}(0)$, $(i,t) \in \mathcal{D}_0$, and counterfactual untreated outcomes that are actually treated, $y_{i,t}(0)$, $(i,t) \in \mathcal{D}_1$. We treat the latter elements as missing observations and infer them via matrix completion. Let $A = (\alpha_{i,t})$ denote the $N$-by-$T$ matrix of intercepts, with $E = (\varepsilon_{i,t})$ defined analogously. Then the model can be posed in a matrix representation as

$$Y = A + B + E,$$

where $B = (x_{i,t}^{\top} \beta)$.

A matrix of untreated outcomes can be structured flexibly. For instance, when only the $N$th unit is affected by the treatment for the last $T - T_0$ periods as in the standard SCM, $Y$ is specified as

$$Y = \begin{pmatrix} y_{1,1} & \cdots & y_{1,T_0} & y_{1,T_0+1} & \cdots & y_{1,T} \\ \vdots & & \vdots & \vdots & & \vdots \\ y_{N-1,1} & \cdots & y_{N-1,T_0} & y_{N-1,T_0+1} & \cdots & y_{N-1,T} \\ y_{N,1} & \cdots & y_{N,T_0} & ? & \cdots & ? \end{pmatrix},$$

where $?$ denotes a missing entry. It is possible to allow more than one treated unit, e.g.,

$$Y = \begin{pmatrix} y_{1,1} & \cdots & y_{1,T_0} & y_{1,T_0+1} & \cdots & y_{1,T} \\ \vdots & & \vdots & \vdots & & \vdots \\ y_{N-1,1} & \cdots & y_{N-1,T_0} & ? & \cdots & ? \\ y_{N,1} & \cdots & y_{N,T_0} & ? & \cdots & ? \end{pmatrix}.$$

Furthermore, it is possible to handle a more complex structure in which treated cells are scattered arbitrarily, e.g.,

$$Y = \begin{pmatrix} y_{1,1} & ? & y_{1,3} & \cdots & ? \\ ? & y_{2,2} & ? & \cdots & y_{2,T} \\ \vdots & \vdots & \vdots & & \vdots \\ y_{N,1} & y_{N,2} & ? & \cdots & y_{N,T} \end{pmatrix}.$$
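As a sketch of how these patterns translate into practice, the Boolean masks below (NumPy; the sizes and treated cells are illustrative, not from the paper) mark the missing entries for each of the three structures:

```python
import numpy as np

N, T, T0 = 5, 8, 6  # illustrative dimensions

# Standard SCM pattern: only the last unit is treated after period T0.
mask_scm = np.zeros((N, T), dtype=bool)
mask_scm[-1, T0:] = True

# More than one treated unit with a common adoption date.
mask_multi = np.zeros((N, T), dtype=bool)
mask_multi[-2:, T0:] = True

# A more complex structure: treated cells may be scattered arbitrarily.
mask_complex = np.zeros((N, T), dtype=bool)
mask_complex[1, 4:] = True
mask_complex[3, 6:] = True
mask_complex[4, [2, 5]] = True
```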

To allow serial correlations in the error terms, their distribution is specified as

$$\varepsilon_i = (\varepsilon_{i,1}, \dots, \varepsilon_{i,T})^{\top} \sim \mathcal{N}\left(0_T, \sigma^2 \Omega(\rho)\right), \quad i = 1, \dots, N,$$

where $\sigma^2$ is a variance parameter and $\Omega(\rho)$ is a correlation matrix whose generic element is specified as a function of an autocorrelation parameter $\rho$.

We define the sets of observed and unobserved untreated outcomes as $\mathcal{Y}_{\mathrm{obs}} = \{ y_{i,t}(0) : (i,t) \in \mathcal{D}_0 \}$ and $\mathcal{Y}_{\mathrm{mis}} = \{ y_{i,t}(0) : (i,t) \in \mathcal{D}_1 \}$, respectively. The elements in the latter set are treated as unknown parameters. Then, the likelihood is represented as

$$p\left(\mathcal{Y}_{\mathrm{obs}}, \mathcal{Y}_{\mathrm{mis}} \mid A, \beta, \sigma^2, \rho\right) = \prod_{i=1}^{N} \mathcal{N}\left(y_i \mid a_i + X_i \beta, \; \sigma^2 \Omega(\rho)\right),$$

where $y_i$ and $a_i$ denote the $i$th rows of $Y$ and $A$ (as column vectors) and $X_i$ stacks the covariate vectors $x_{i,t}^{\top}$ for unit $i$.
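A minimal sketch of this row-wise Gaussian likelihood is given below. The AR(1) form $\omega_{t,s} = \rho^{|t-s|}$ used for $\Omega(\rho)$ is one common choice and is our assumption here, as the functional form is left unspecified in the text above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ar1_corr(T, rho):
    """Correlation matrix with (t, s) element rho**|t-s| (assumed AR(1) form)."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def row_loglik(y_i, mu_i, sigma2, rho):
    """Log-density of one unit's outcome row given its mean mu_i = a_i + X_i @ beta."""
    T = y_i.size
    return multivariate_normal.logpdf(y_i, mean=mu_i, cov=sigma2 * ar1_corr(T, rho))
```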

2.2 Bayesian inference

We use a prior that induces $A$ to be low rank. We decompose $A$ into two parts as

$$A = \Phi U^{\top},$$

where $\Phi$ is an $N$-by-$L$ matrix and $U$ is a $T$-by-$L$ matrix. Setting $L < \min(N, T)$ induces $A$ to be rank deficient a priori, although one can use an arbitrarily large $L$. As the decomposition is not unique, some restriction is necessary for identification. In this study, we restrict $U$ to be unitary, i.e., $U^{\top} U = I_L$, and assign a uniform Haar prior to $U$, $p(U) \propto \mathbb{1}\{U \in \mathcal{V}_L(\mathbb{R}^T)\}$, where $\mathcal{V}_L(\mathbb{R}^T)$ denotes a Stiefel manifold with dimensions of $L$ and $T$, and $\mathbb{1}\{\cdot\}$ denotes the indicator function. Let the singular value decomposition of $A$ be represented as $A = Q S V^{\top}$. Then, $U$ is interpreted as the right orthonormal matrix $V$, while $\Phi$ is the product of the left orthonormal matrix $Q$ and the diagonal matrix $S$ having the singular values on its principal diagonal.

For $\Phi = (\phi_1, \dots, \phi_L)$, we adapt a cumulative shrinkage process prior (Legramanti et al., 2019) to our context. A prior of $\Phi$ is specified by the following hierarchy:

$$\phi_l \mid \theta_l \sim \mathcal{N}(0_N, \theta_l I_N), \quad \theta_l \mid \pi_l \sim (1 - \pi_l)\, \mathcal{IG}(a_\theta, b_\theta) + \pi_l\, \delta_{\theta_\infty},$$
$$\pi_l = \sum_{m=1}^{l} w_m, \quad w_m = v_m \prod_{k=1}^{m-1} (1 - v_k), \quad v_m \sim \mathcal{B}(1, \alpha),$$

where $\mathcal{IG}(a, b)$ is an inverse gamma distribution with shape $a$ and rate $b$, $\delta_{\theta_\infty}$ is a point mass at $\theta_\infty$, and $\mathcal{B}(a, b)$ is a beta distribution (of the first kind) with shape parameters $a$ and $b$. The prior of $\phi_l$ is a scale mixture of normal distributions. The prior distribution of the variances $\theta_l$ belongs to a class of spike-and-slab priors (e.g., Ishwaran et al., 2005), in that the prior consists of spike and slab parts. Although $\theta_\infty$ can be zero, we set it to a small non-zero value for the ease of posterior simulation (Ishwaran et al., 2005; Legramanti et al., 2019). The prior distribution of the weights $w_m$ exploits the stick-breaking construction of the Dirichlet process (Ishwaran and James, 2001). As $l$ grows, the distribution of $\theta_l$ concentrates around the spike at $\theta_\infty$ since $\pi_l \to 1$ almost surely.
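The following sketch draws the prior variances $(\theta_1, \dots, \theta_L)$ from the hierarchy above and illustrates how $\pi_l$ increases toward one with $l$; the hyperparameter values are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_csp_variances(L, a=2.0, b=2.0, alpha=5.0, theta_inf=0.05):
    """One prior draw from the cumulative shrinkage process: stick-breaking
    weights give pi_l = sum_{m<=l} w_m; theta_l is slab IG(a, b) with
    probability 1 - pi_l and spike theta_inf with probability pi_l."""
    v = rng.beta(1.0, alpha, size=L)
    sticks = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    w = v * sticks                               # w_m = v_m * prod_{k<m}(1 - v_k)
    pi = np.cumsum(w)                            # increases toward 1 as l grows
    spike = rng.random(L) < pi
    slab = 1.0 / rng.gamma(a, 1.0 / b, size=L)   # inverse gamma (shape a, rate b)
    return np.where(spike, theta_inf, slab), pi

theta, pi = draw_csp_variances(L=10)
print(pi)  # later columns are increasingly likely to be shrunk to theta_inf
```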

Although we do not consider it in this study, Bhattacharya and Dunson (2011) propose a prior similar to the cumulative shrinkage process prior, called the multiplicative gamma process prior. This prior cannot control the rate of shrinkage and the prior for the active elements simultaneously; thus, it readily overshrinks the model. See also Durante (2017) and Legramanti et al. (2019) for further discussion.

There are many fully Bayesian methods for estimating low-rank matrices (e.g., Salakhutdinov and Mnih, 2008; Paisley and Carin, 2009; Ding et al., 2011; Babacan et al., 2012; Valera and Ghahramani, 2014; Fazayeli et al., 2014; Tang et al., 2019). Compared with them, the prior construction described above has two distinct features. First, our prior imposes the unitary restriction on $U$, while the existing Bayesian approaches do not. The only exception is Tang et al. (2019), who decompose a matrix into three parts as in a standard singular value decomposition, that is, a diagonal matrix and two unitary matrices. Second, while the other Bayesian approaches to low-rank matrix estimation do not address the ordering of the singular values, our prior specification progressively shrinks the prior variances of $\phi_l$ as $l$ increases. Although it does not impose an exactly monotone restriction, it can aid in parameter identification.

In turn, for the remaining parameters, we employ standard priors. For $\beta$, we use independent normal priors with mean zero and fixed precision, $\beta_k \sim \mathcal{N}(0, \tau_\beta^{-1})$, $k = 1, \dots, K$. $\rho$ is assumed to be distributed according to a uniform distribution, $\rho \sim \mathcal{U}(\underline{\rho}, \overline{\rho})$. A prior distribution of $\sigma^2$ is specified by an inverse gamma distribution, $\sigma^2 \sim \mathcal{IG}(a_\sigma, b_\sigma)$.

For posterior simulation, we develop a Markov chain Monte Carlo sampler that is a hybrid of three algorithms. For $\rho$, we use a random walk Metropolis–Hastings algorithm, while for $U$ we employ the geodesic Monte Carlo on embedded manifolds (Byrne and Girolami, 2013). As the conditionals of the remaining parameters are standard, they are updated via Gibbs steps. See the Appendix for the computational details.

2.3 Comparison to existing approaches

We compare our proposal to existing approaches. For simplicity, in this section, we assume that there is no covariate and that only the $N$th unit is exposed to treatment during the last $T - T_0$ periods. Let $T_0$ denote the length of the pre-treatment period (thus, the treated periods are $t = T_0 + 1, \dots, T$). Then, the set of indices of the treated observations is

$$\mathcal{D}_1 = \left\{ (N, t) : t = T_0 + 1, \dots, T \right\}.$$

Of the statistical methods for estimating the ATE, the class of SCMs (Abadie and Gardeazabal, 2003; Abadie et al., 2010) is particularly related to the proposed approach. In SCMs, "synthetic" untreated outcomes are estimated as weighted sums of the outcomes of the untreated units. The original SCM (Abadie et al., 2010) specifies the estimation problem as

$$\hat{w} = \underset{w}{\arg\min} \sum_{t=1}^{T_0} \left( y_{N,t} - \sum_{i=1}^{N-1} w_i y_{i,t} \right)^2 \quad (1)$$
$$\text{subject to} \quad w_i \geq 0, \; i = 1, \dots, N-1; \quad \sum_{i=1}^{N-1} w_i = 1. \quad (2)$$

This approach imposes three strong assumptions: no intercept, non-negativity of the weights, and a sum-to-one constraint on the weights. However, none of these assumptions is plausible in many real cases. The proposed approach is free from such restrictions.
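A minimal sketch of problem (1)–(2) as a constrained least squares, solved here with SciPy (the solver choice is ours for illustration, not Abadie et al.'s implementation):

```python
import numpy as np
from scipy.optimize import minimize

def scm_weights(Y0_pre, y1_pre):
    """Original SCM weights: least squares over the T0 pre-treatment periods
    subject to w_i >= 0 and sum(w) = 1, with no intercept.
    Y0_pre: (T0, N-1) control outcomes; y1_pre: (T0,) treated unit outcomes."""
    n = Y0_pre.shape[1]
    obj = lambda w: np.sum((y1_pre - Y0_pre @ w) ** 2)
    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    res = minimize(obj, np.full(n, 1.0 / n), method="SLSQP",
                   bounds=[(0.0, None)] * n, constraints=cons)
    return res.x
```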

Doudchenko and Imbens (2017) propose an approach that does not impose any of these restrictions on the weights to be estimated. Instead, they propose estimating synthetic observations by solving the following optimization problem, which corresponds to the elastic net estimator (Zou and Hastie, 2005):

$$(\hat{\mu}, \hat{w}) = \underset{\mu, w}{\arg\min} \sum_{t=1}^{T_0} \left( y_{N,t} - \mu - \sum_{i=1}^{N-1} w_i y_{i,t} \right)^2 + \lambda \left( \frac{1 - \alpha}{2} \| w \|_2^2 + \alpha \| w \|_1 \right), \quad (3)$$

where $\lambda$ and $\alpha$ are tuning parameters. All the existing non-Bayesian approaches, including Abadie et al. (2010) and Doudchenko and Imbens (2017), share the same caveat: there is no statistically grounded method to estimate confidence intervals. Abadie et al. (2010) conduct a series of placebo studies: the original SCM is iteratively applied to every other state in place of California. Indeed, such an exercise can provide some notion of uncertainty about the estimation, but it is a form of robustness check, not a rigorous statistical method for hypothesis testing. This caveat is critical because hypothesis testing is an essential part of statistical inference in scientific research.
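For comparison, a sketch of the estimator (3) using scikit-learn's elastic net; the penalty parameterization differs from (3) only by constant factors, and alpha / l1_ratio are placeholder values that would be chosen by cross-validation in practice:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def scm_en(Y0_pre, y1_pre, alpha=0.1, l1_ratio=0.5):
    """Doudchenko-Imbens-style weights: unconstrained regression of the
    treated unit's pre-treatment outcomes on the controls, with an
    intercept and an elastic net penalty on the weights."""
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=True)
    model.fit(Y0_pre, y1_pre)   # rows are pre-treatment periods
    return model.intercept_, model.coef_
```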

Kim et al. (2019) develop a Bayesian version of Doudchenko and Imbens's (2017) approach. Instead of the elastic net penalty, they propose to use shrinkage priors. For instance, when the horseshoe prior (Carvalho et al., 2010) is employed, the probabilistic representation of the model considered in Kim et al. (2019) is specified as follows:

$$y_{N,t} \mid \mu, w, \sigma^2 \sim \mathcal{N}\left( \mu + \sum_{i=1}^{N-1} w_i y_{i,t}, \; \sigma^2 \right), \quad t = 1, \dots, T_0, \quad (4)$$
$$w_i \mid \lambda_i, \tau, \sigma \sim \mathcal{N}\left(0, \lambda_i^2 \tau^2 \sigma^2\right), \quad i = 1, \dots, N-1, \quad (5)$$
$$\lambda_i \sim \mathcal{C}^{+}(0, 1), \quad i = 1, \dots, N-1, \quad (6)$$
$$\tau \sim \mathcal{C}^{+}(0, 1), \quad (7)$$
$$\mu \sim p(\mu), \quad \sigma^2 \sim p(\sigma^2), \quad (8)$$

where $p(\mu)$ and $p(\sigma^2)$ are prior distributions of $\mu$ and $\sigma^2$, respectively, and $\mathcal{C}^{+}(0, 1)$ denotes a standard half-Cauchy distribution with probability density function (PDF) $p(x) = 2 / \{\pi (1 + x^2)\}$, $x > 0$.

As with any Bayesian inference, their approach can obtain credible intervals in a consistent manner. Our fully Bayesian approach enjoys the same advantage.
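A sketch of the hierarchy (4)–(8) written in NumPyro is given below; the priors placed on $\mu$ and $\sigma$ are illustrative stand-ins for $p(\mu)$ and $p(\sigma^2)$, and NUTS is used here instead of the Gibbs scheme of Makalic and Schmidt (2016) referenced later in the paper.

```python
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist

def bscm_hs(Y0_pre, y1_pre):
    """BSCM-HS sketch: horseshoe prior on the synthetic-control weights."""
    n_ctrl = Y0_pre.shape[1]
    tau = numpyro.sample("tau", dist.HalfCauchy(1.0))                  # (7)
    lam = numpyro.sample("lam", dist.HalfCauchy(jnp.ones(n_ctrl)))     # (6)
    sigma = numpyro.sample("sigma", dist.HalfCauchy(1.0))              # stand-in prior
    mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))                  # stand-in prior
    w = numpyro.sample("w", dist.Normal(0.0, lam * tau * sigma))       # (5)
    numpyro.sample("y", dist.Normal(mu + Y0_pre @ w, sigma), obs=y1_pre)  # (4)

# from numpyro.infer import MCMC, NUTS; import jax
# MCMC(NUTS(bscm_hs), num_warmup=1000, num_samples=2000).run(
#     jax.random.PRNGKey(0), Y0_pre, y1_pre)
```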

Athey et al. (2018) develop a novel approach to causal inference with panel data. In a sense, our proposal can be seen as a Bayesian counterpart of their approach. As in Section 2.2, let $Y$ denote a matrix composed of untreated outcomes, where observations exposed to treatment are handled as missing. Then, the estimation problem can be posed as a kind of matrix completion problem (see, e.g., Candes and Plan, 2010, and Shi et al., 2017, for surveys):

$$\hat{A} = \underset{A}{\arg\min} \; \frac{1}{|\mathcal{D}_0|} \left\| P_{\mathcal{O}}(Y - A) \right\|_F^2 + \lambda \| A \|_{*}, \quad (9)$$

where $\lambda$ is a tuning parameter and $\| \cdot \|_F$ denotes the Frobenius norm. $\| \cdot \|_{*}$ denotes the nuclear norm of a matrix, defined as

$$\| A \|_{*} = \sum_{l} s_l(A),$$

where $s_l(A)$ is the $l$th singular value of $A$. When $\mathcal{O}$ denotes the set of indices corresponding to observed entries, the operator $P_{\mathcal{O}}$ is defined for a matrix $A$ as

$$\left[ P_{\mathcal{O}}(A) \right]_{i,t} = \begin{cases} a_{i,t}, & (i,t) \in \mathcal{O}, \\ 0, & (i,t) \notin \mathcal{O}. \end{cases}$$

Athey et al. (2018) call this estimator the matrix completion with nuclear norm minimization (MC-NNM) estimator. The penalty term induces $\hat{A}$ to be lower rank, as the nuclear norm is a convex relaxation of the rank constraint (Fazel et al., 2001). The prior of $A$ used in the proposed approach plays a similar role to the nuclear norm penalty. This family of approaches involving matrix completion has two notable advantages over the family of SCMs. First, treatment is allowed to occur arbitrarily, not only consecutively. Second, while SCMs use only the pre-treatment observations for estimation, this family exploits all the observations, including those in the treated periods (except the treated outcomes themselves). Therefore, this class is likely to be statistically more efficient than SCMs, as shown in the simulation study below. In common with other non-Bayesian approaches, Athey et al.'s (2018) approach only provides point estimates, while our proposal readily estimates credible intervals.
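A minimal soft-impute-style sketch of (9), iterating singular-value soft-thresholding (Mazumder et al., 2010); Athey et al. (2018) additionally handle row and column fixed effects, which is omitted here:

```python
import numpy as np

def mc_nnm(Y, mask_obs, lam, n_iter=200):
    """Approximate the MC-NNM solution by alternating imputation and
    SVD soft-thresholding. mask_obs marks observed (untreated) entries."""
    A = np.where(mask_obs, Y, 0.0)
    for _ in range(n_iter):
        Z = np.where(mask_obs, Y, A)            # keep data, fill missing cells
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - lam, 0.0)            # soft-threshold singular values
        A = (U * s) @ Vt                        # low-rank update
    return A
```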

Lastly, Brodersen et al. (2015) also develop a Bayesian approach to estimating the ATE that uses structural time series models, more specifically, state-space models. Their approach relies on the fit of a state-space model to the single time series of a treated unit. Their approach and ours are similar in that both obtain counterfactual outcomes using Bayesian methods. On the other hand, they target different types of data: Brodersen et al.'s (2015) approach is better suited to relatively long time series, while the proposed approach is designed for typical panel data with rather short time periods.

3 Application

3.1 Simulated data

We conduct a simulation study to demonstrate the proposed approach. Only the $N$th unit is treated, and it is exposed to the treatment during the last $T_1$ periods of the total $T$ time periods. Let $T_0$ denote the number of untreated periods; thus, $T = T_0 + T_1$. The realized treated outcomes are specified by the sums of hypothetical untreated outcomes and the average treatment effect $\tau$:

$$y_{N,t}(1) = y_{N,t}(0) + \tau, \quad t = T_0 + 1, \dots, T.$$

Each untreated outcome $y_{i,t}(0)$ is generated from a factor model: for $i = 1, \dots, N$; $t = 1, \dots, T$,

$$y_{i,t}(0) = \gamma_i^{\top} f_t + \varepsilon_{i,t}.$$

We do not include any covariate and assume no autocorrelation in the error term $\varepsilon_{i,t}$, fixing $\rho$ to zero (not estimated). We consider two types of data-generating processes (DGPs), named DGP-independent and DGP-dependent. They differ only in how the latent factors $f_t$ are generated. In the DGP-independent case, the "latent factors" $f_t$ are independently distributed according to a uniform distribution. In the DGP-dependent case, the law of motion of $f_t$ is specified by a vector autoregressive process. Entries in the "factor loading" $\gamma_i$ are generated independently from a standard normal distribution.
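A sketch of the DGP-independent design is given below; the uniform bounds, factor count, and noise scale are illustrative choices, as the exact values are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_dgp_independent(N, T, T0, tau, n_factors=3):
    """Generate y_it(0) = gamma_i' f_t + eps_it with independent uniform
    factors and standard-normal loadings; the last unit receives the
    treatment effect tau during the last T - T0 periods."""
    F = rng.uniform(-1.0, 1.0, size=(T, n_factors))   # latent factors f_t
    G = rng.standard_normal((N, n_factors))           # factor loadings gamma_i
    Y0 = G @ F.T + rng.standard_normal((N, T))        # untreated outcomes
    Y = Y0.copy()
    Y[-1, T0:] += tau                                 # realized treated outcomes
    return Y, Y0
```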

We compare the proposed approach with four alternatives. The first is the original synthetic control method described in Abadie et al. (2010) (SCM-Plain), where the estimation problem is specified as (1)–(2). The second is the method proposed by Doudchenko and Imbens (2017) (SCM-EN) in (3), and the third is the matrix completion with nuclear norm minimization estimator (Athey et al., 2018) (MC-NNM) in (9). In SCM-EN and MC-NNM, the tuning parameters are chosen using five-fold cross-validation. The fourth is the Bayesian SCM developed by Kim et al. (2019). According to their simulation study, specifications with the horseshoe (Carvalho et al., 2010) and spike-and-slab (Ishwaran et al., 2005) priors outperform other alternatives. While the performances of these two priors are fairly comparable, posterior simulation using the horseshoe prior is faster. For these reasons, we consider the horseshoe prior for Kim et al.'s (2019) approach and refer to this specific approach as BSCM-HS in what follows. The probabilistic representation of BSCM-HS is given by (4)–(8). While Kim et al. (2019) use non-informative priors for $\mu$ and $\sigma^2$, we employ vague, proper priors: for $\mu$, a normal prior with a large variance, and for $\sigma^2$, the same inverse gamma prior as in the proposed approach, with hyperparameters chosen to be fairly non-informative. We sample $w$ using the elliptical slice sampler of Hahn et al. (2019) and the remaining parameters using Gibbs steps, as in Makalic and Schmidt (2016). We simulate 10,000 warmup draws and estimate the posterior densities using the subsequent 40,000 draws.

For the proposed approach, the prefixed hyperparameters for the cumulative shrinkage process prior are chosen following Legramanti et al. (2019), except that we set the spike value $\theta_\infty$ to a smaller value than theirs. For a fair comparison with BSCM-HS, we use the same hyperparameters for the prior of $\sigma^2$, $\sigma^2 \sim \mathcal{IG}(a_\sigma, b_\sigma)$. The maximum rank $L$ of $A$ is set in proportion to $\min(N, T)$, rounded up by the ceiling function $\lceil \cdot \rceil$. We obtain 40,000 draws after discarding the initial 10,000 draws.

We consider four types of sample size, namely, combinations of two values each of $N$ and $T_0$, while the length of the treated periods $T_1$ is fixed. A total of 500 experiments are conducted for each case. As noted earlier, estimation of the ATE amounts to estimation of the potential outcomes. Therefore, we evaluate the alternatives based on the precision of the estimates of $y_{N,t}(0)$, $t = T_0 + 1, \dots, T$, measured by the mean squared error (MSE) and the mean absolute error (MAE). For the Bayesian approaches, we compute posterior means of the predicted potential outcomes. We also report computation times measured in seconds (Time). (All the programs were written in Matlab R2019a (64 bit) and executed on Ubuntu Desktop 18.04 LTS (64 bit), running on Intel Xeon E5-2607 v3 processors (2.6 GHz).) For each experiment, the MSE and MAE are normalized by the corresponding values for SCM-Plain (i.e., values smaller than one indicate better performance than SCM-Plain, and vice versa). Medians of the performance measures over the experiments are reported.

Table 1 summarizes the results of the simulation study for DGP-independent. In terms of MSE and MAE, irrespective of the combination of $(N, T_0)$, the proposed approach (labeled BMC-CSP) consistently outperforms the alternatives as well as SCM-Plain. The recently proposed alternatives are comparable with SCM-Plain and are not always better than the original. In terms of computation time, as expected, the Bayesian approaches are slower than the non-Bayesian options. Indeed, BMC-CSP is computationally heavy, but its cost is not prohibitive. In our simulation study, SCM-EN is rather slow to converge, possibly due to its non-smooth objective function.

The simulation results for DGP-dependent are reported in Table 2. In terms of MSE and MAE, SCM-EN, MC-NNM, and BMC-CSP perform better than SCM-Plain, while BSCM-HS tends to be inferior to the others. For this DGP, the approaches using matrix completion are likely to be more precise than the family of SCMs. BMC-CSP is slightly better than MC-NNM, making it the overall winner.

3.2 Real data

As an illustration, we apply the proposed approach to the evaluation of California's tobacco control program implemented in 1988. We replicate Abadie et al.'s (2010) study using the same data, an annual state-level panel spanning 1970 to 2000. (The data and the Matlab program were downloaded from Jens Hainmueller's personal website: https://web.stanford.edu/~jhain/synthpage.html.) The first 19 years are pre-treatment periods. Only California is treated, while the other 38 states are used as control units. We include seven time-invariant covariates: the log of gross domestic product per capita, the percentage share of 15–24-year-olds in the population, the retail price of cigarettes, beer consumption per capita, and cigarette sales per capita in 1988, 1980, and 1975. See Abadie et al. (2010) for further details. We choose the hyperparameters as in the simulation study. We draw 50,000 posterior samples and use the last 40,000 samples for posterior analysis.

Figure 1 compares the realized per-capita cigarette sales in California (solid black line), the potential per-capita cigarette sales in "synthetic California" obtained using the original SCM (Abadie et al., 2010) (dashed black line), and the posterior mean estimates of the corresponding counterfactual outcomes obtained by the proposed method (solid red line). The estimates from the proposed method are in line with those from the original SCM. Posterior estimates of the 90% and 70% credible sets are also reported (shaded areas). As the credible sets do not include the realized sales in California, we conclude that the program had statistically significant effects on tobacco consumption in California, confirming the conclusion of the original paper.

4 Concluding Remarks

This study develops a novel Bayesian approach to causal analysis using panel data. We treat the problem of inferring the average treatment effect as a matrix completion problem: counterfactual untreated outcomes are inferred using a data augmentation technique. We also propose a prior structured to ensure identification and to obtain a low-rank approximation of the panel data. While existing non-Bayesian methods cannot deliver confidence intervals, the proposed Bayesian approach estimates credible intervals straightforwardly. Through a series of simulation studies, we show that the proposed approach outperforms existing ones in terms of the prediction of hypothetical untreated outcomes, that is, the accuracy of the estimation of the ATE.

Appendix: Computational Details

This appendix describes the computational details of the posterior simulation of the proposed approach. Each sampling block is specified in what follows.

Sampling $\Phi$

For $l = 1, \dots, L$, the full conditional of $\phi_l$ is multivariate normal, as the prior $\phi_l \mid \theta_l \sim \mathcal{N}(0_N, \theta_l I_N)$ is conditionally conjugate to the Gaussian likelihood; we sample from it directly.

Sampling the shrinkage parameters

The shrinkage parameters are updated as in Legramanti et al. (2019). In the relevant full conditionals, $\phi(x; \mu, \Sigma)$ denotes the PDF of a multivariate normal distribution with mean $\mu$ and covariance $\Sigma$ evaluated at $x$, $t_{\nu}(x; \mu, \Sigma)$ denotes the PDF of a multivariate t distribution with location $\mu$, scale $\Sigma$, and $\nu$ degrees of freedom, and $\mathbb{1}\{\cdot\}$ denotes the indicator function.

Sampling $U$

To sample $U$, we employ the geodesic Monte Carlo on embedded manifolds developed by Byrne and Girolami (2013). The algorithm for sampling $U$ is summarized in Algorithm 1; it requires the gradient of the log conditional posterior density of $U$ with respect to $U$. The step size $\epsilon$ is adaptively tuned so as to keep the average acceptance rate around a target value. At the $j$th iteration, $\epsilon$ is updated as a function of the average acceptance rate at the $j$th iteration and a tuning parameter that governs the speed of adaptation.

Sampling $\beta$

The full conditional of $\beta$ is multivariate normal under the conditionally conjugate normal prior. In its expression, $\mathrm{vec}(\cdot)$ denotes the column-wise vectorization operator and $\otimes$ denotes the Kronecker product.

Sampling $\sigma^2$

Under the inverse gamma prior, the full conditional of $\sigma^2$ is also an inverse gamma distribution, from which we sample directly.

Sampling $\rho$

As the conditional posterior of $\rho$ does not belong to a class of canonical distributions, we update $\rho$ using a random walk Metropolis–Hastings algorithm with adaptation (Haario et al., 2001). At the $j$th iteration, given a current state $\rho^{(j-1)}$, a proposal $\rho^{*}$ is generated from a normal distribution centered at $\rho^{(j-1)}$ whose variance is adaptively tuned as in Haario et al. (2001). The target acceptance rate is set to 0.25.
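A generic sketch of one such adaptive update is given below; the Robbins–Monro-style variance adaptation is illustrative rather than the exact rule of Haario et al. (2001), and log_post is assumed to return minus infinity outside the support of $\rho$.

```python
import numpy as np

rng = np.random.default_rng(2)

def rw_mh_step(rho, log_post, step_var, accept_rate, j, target=0.25, kappa=0.6):
    """One adaptive random-walk MH update for rho.
    accept_rate is the running average acceptance rate up to iteration j."""
    prop = rho + rng.normal(0.0, np.sqrt(step_var))
    if np.log(rng.random()) < log_post(prop) - log_post(rho):
        rho = prop
    # Nudge the proposal variance toward the target acceptance rate.
    step_var *= np.exp(j ** (-kappa) * (accept_rate - target))
    return rho, step_var
```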

Sampling $\mathcal{Y}_{\mathrm{mis}}$

Without loss of generality, we focus on the case where the $N$th unit is treated during the last $T - T_0$ periods, $\mathcal{D}_1 = \{(N, t) : t = T_0 + 1, \dots, T\}$. Sampling procedures for other cases are analogously constructed. Given all the parameters, the distribution of the $N$th row of $Y$ is represented as

$$y_N \sim \mathcal{N}\left(\mu_N, \sigma^2 \Omega(\rho)\right), \quad \mu_N = a_N + X_N \beta,$$

where $y_N$ and $a_N$ are the $N$th rows of $Y$ and $A$ (as column vectors), respectively. We define the partitionings

$$y_N = \begin{pmatrix} y_{N,\mathrm{obs}} \\ y_{N,\mathrm{mis}} \end{pmatrix}, \quad \mu_N = \begin{pmatrix} \mu_{\mathrm{obs}} \\ \mu_{\mathrm{mis}} \end{pmatrix}, \quad \Omega = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix},$$

where $y_{N,\mathrm{obs}} = (y_{N,1}, \dots, y_{N,T_0})^{\top}$ and $y_{N,\mathrm{mis}} = (y_{N,T_0+1}, \dots, y_{N,T})^{\top}$. Then, the conditional posterior distribution of the missing observations is

$$y_{N,\mathrm{mis}} \mid y_{N,\mathrm{obs}}, \cdot \sim \mathcal{N}\left( \mu_{\mathrm{mis}} + \Omega_{21} \Omega_{11}^{-1} \left(y_{N,\mathrm{obs}} - \mu_{\mathrm{obs}}\right), \; \sigma^2 \left( \Omega_{22} - \Omega_{21} \Omega_{11}^{-1} \Omega_{12} \right) \right).$$
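This is the standard partitioned-Gaussian conditional; a minimal sketch of the imputation draw:

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_missing(y_obs, mu, Sigma, obs_idx, mis_idx):
    """Draw y_mis | y_obs for a jointly Gaussian row y ~ N(mu, Sigma)."""
    S11 = Sigma[np.ix_(obs_idx, obs_idx)]
    S12 = Sigma[np.ix_(obs_idx, mis_idx)]
    S21 = Sigma[np.ix_(mis_idx, obs_idx)]
    S22 = Sigma[np.ix_(mis_idx, mis_idx)]
    K = S21 @ np.linalg.inv(S11)
    cond_mean = mu[mis_idx] + K @ (y_obs - mu[obs_idx])
    cond_cov = S22 - K @ S12
    return rng.multivariate_normal(cond_mean, cond_cov)
```

Here Sigma corresponds to $\sigma^2 \Omega(\rho)$ and mu to the mean vector $\mu_N$ defined above.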

References

  • Abadie and Cattaneo (2018) Abadie, A. and M. D. Cattaneo (2018) “Econometric Methods for Program Evaluation,” Annual Review of Economics, Vol. 10, 465–503.
  • Abadie et al. (2010) Abadie, A., A. Diamond, and J. Hainmueller (2010) “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program,” Journal of the American Statistical Association, Vol. 105, No. 490, 493–505.
  • Abadie et al. (2015) Abadie, A., A. Diamond, and J. Hainmueller (2015) “Comparative Politics and the Synthetic Control Method,” American Journal of Political Science, Vol. 59, No. 2, 495–510.
  • Abadie (forthcoming) Abadie, A. (forthcoming) “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects,” Journal of Economic Literature.
  • Abadie and Gardeazabal (2003) Abadie, A. and J. Gardeazabal (2003) “The Economic Costs of Conflict: A Case Study of the Basque Country,” American Economic Review, Vol. 93, No. 1, 113–132.
  • Athey et al. (2018) Athey, S., M. Bayati, N. Doudchenko, G. Imbens, and K. Khosravi (2018) “Matrix Completion Methods for Causal Panel Data Models,” Technical report, arXiv:1710.10251.
  • Athey and Imbens (2017) Athey, S. and G. W. Imbens (2017) “The State of Applied Econometrics: Causality and Policy Evaluation,” Journal of Economic Perspectives, Vol. 31, No. 2, 3–32.
  • Babacan et al. (2012) Babacan, S. D., M. Luessi, R. Molina, and A. K. Katsaggelos (2012) “Sparse Bayesian Methods for Low-rank Matrix Estimation,” IEEE Transactions on Signal Processing, Vol. 60, No. 8, 3964–3977.
  • Bhattacharya and Dunson (2011) Bhattacharya, A. and D. B. Dunson (2011) “Sparse Bayesian Infinite Factor Models,” Biometrika, 291–306.
  • Brodersen et al. (2015) Brodersen, K. H., F. Gallusser, J. Koehler, N. Remy, and S. L. Scott (2015) “Inferring Causal Impact Using Bayesian Structural Time-series Models,” Annals of Applied Statistics, Vol. 9, No. 1, 247–274.
  • Byrne and Girolami (2013) Byrne, S. and M. Girolami (2013) “Geodesic Monte Carlo on Embedded Manifolds,” Scandinavian Journal of Statistics, Vol. 40, No. 4, 825–845.
  • Candes and Plan (2010) Candes, E. J. and Y. Plan (2010) “Matrix Completion with Noise,” Proceedings of the IEEE, Vol. 98, No. 6, 925–936.
  • Carvalho et al. (2010) Carvalho, C. M., N. G. Polson, and J. G. Scott (2010) “The Horseshoe Estimator for Sparse Signals,” Biometrika, Vol. 97, No. 2, 465–480.
  • Ding et al. (2011) Ding, X., L. He, and L. Carin (2011) “Bayesian Robust Principal Component Analysis,” IEEE Transactions on Image Processing, Vol. 20, No. 12, 3419–3430.
  • Doudchenko and Imbens (2017) Doudchenko, N. and G. W. Imbens (2017) “Balancing, Regression, Difference-In-Differences and Synthetic Control Methods: A Synthesis,” Technical report, arXiv:1610.07748.
  • Durante (2017) Durante, D. (2017) “A Note on the Multiplicative Gamma Process,” Statistics and Probability Letters, Vol. 122, 198–204.
  • Fazayeli et al. (2014) Fazayeli, F., A. Banerjee, J. Kattge, F. Schrodt, and P. B. Reich (2014) “Uncertainty Quantified Matrix Completion Using Bayesian Hierarchical Matrix Factorization,” in 2014 13th International Conference on Machine Learning and Applications, 312–317, IEEE.
  • Fazel et al. (2001) Fazel, M., H. Hindi, and S. P. Boyd (2001) “A Rank Minimization Heuristic with Application to Minimum Order System Approximation,” in Proceedings of the American Control Conference, Vol. 6, 4734–4739.
  • Haario et al. (2001) Haario, H., E. Saksman, and J. Tamminen (2001) “An Adaptive Metropolis Algorithm,” Bernoulli, Vol. 7, No. 2, 223–242.
  • Hahn et al. (2019) Hahn, P. R., J. He, and H. F. Lopes (2019) “Efficient Sampling for Gaussian Linear Regression with Arbitrary Priors,” Journal of Computational and Graphical Statistics, Vol. 28, No. 1, 142–154.
  • Holland (1986) Holland, P. W. (1986) “Statistics and Causal Inference,” Journal of the American Statistical Association, Vol. 81, No. 396, 945–960.
  • Imbens and Rubin (2015) Imbens, G. W. and D. B. Rubin (2015) Causal Inference in Statistics, Social, and Biomedical Sciences: Cambridge University Press.
  • Ishwaran and James (2001) Ishwaran, H. and L. F. James (2001) “Gibbs Sampling Methods for Stick-breaking Priors,” Journal of the American Statistical Association, Vol. 96, No. 453, 161–173.
  • Ishwaran et al. (2005) Ishwaran, H. and J. S. Rao (2005) “Spike and Slab Variable Selection: Frequentist and Bayesian Strategies,” Annals of Statistics, Vol. 33, No. 2, 730–773.
  • Keshavan et al. (2010) Keshavan, R. H., A. Montanari, and S. Oh (2010) “Matrix Completion from Noisy Entries,” Journal of Machine Learning Research, Vol. 11, 2057–2078.
  • Kim et al. (2019) Kim, S., C. Lee, and S. Gupta (2019) “Bayesian Synthetic Control Methods,” Technical report, Cornell University.
  • Legramanti et al. (2019) Legramanti, S., D. Durante, and D. B. Dunson (2019) “Bayesian Cumulative Shrinkage for Infinite Factorizations,” Technical report, arXiv:1902.04349.
  • Makalic and Schmidt (2016) Makalic, E. and D. F. Schmidt (2016) “A Simple Sampler for the Horseshoe Estimator,” IEEE Signal Processing Letters, Vol. 23, No. 1, 179–182.
  • Mazumder et al. (2010) Mazumder, R., T. Hastie, and R. Tibshirani (2010) “Spectral Regularization Algorithms for Learning Large Incomplete Matrices,” Journal of Machine Learning Research, Vol. 11, 2287–2322.
  • Paisley and Carin (2009) Paisley, J. and L. Carin (2009) “Nonparametric Factor Analysis with Beta Process Priors,” in Proceedings of the 26th Annual International Conference on Machine Learning, 777–784, ACM.
  • Salakhutdinov and Mnih (2008) Salakhutdinov, R. and A. Mnih (2008) “Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte Carlo,” in Proceedings of the 25th international Conference on Machine learning, 880–887, ACM.
  • Shi et al. (2017) Shi, J., X. Zheng, and W. Yang (2017) “Survey on Probabilistic Models of Low-rank Matrix Factorizations,” Entropy, Vol. 19, No. 8, p. 424.
  • Srebro et al. (2005) Srebro, N., J. D. M. Rennie, and T. S. Jaakkola (2005) “Maximum-margin Matrix Factorization,” in Advances in Neural Information Processing Systems, 1329–1336.
  • Tang et al. (2019) Tang, K., Z. Su, J. Zhang, L. Cui, W. Jiang, X. Luo, and X. Sun (2019) “Bayesian Rank Penalization,” Neural Networks, Vol. 116, 246–256.
  • Tanner and Wong (1987) Tanner, M. A. and W. H. Wong (1987) “The Calculation of Posterior Distributions by Data Augmentation,” Journal of the American Statistical Association, Vol. 82, No. 398, 528–540.
  • Valera and Ghahramani (2014) Valera, I. and Z. Ghahramani (2014) “General Table Completion Using a Bayesian Nonparametric Model,” in Advances in Neural Information Processing Systems, 981–989.
  • Xu (2017) Xu, Y. (2017) “Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models,” Political Analysis, Vol. 25, No. 1, 57–76.
  • Zou and Hastie (2005) Zou, H. and T. Hastie (2005) “Regularization and Variable Selection Via the Elastic Net,” Journal of the Royal Statistical Society: Series B, Vol. 67, No. 2, 301–320.