1 Introduction
This paper proposes a new approach for estimating causal effects with multivariate longitudinal data that uses tensor completion. Our motivation is to evaluate government mandates adopted during the pandemic, such as travel restrictions, mask-wearing directives, and vaccine requirements. Our evaluation follows Rubin, Holland, and Neyman [1, 2] in that we regard estimating the effect of a mandate as tantamount to predicting the number of COVID-19 fatalities that would have occurred had a government not adopted it. In other words, we observe the number of fatalities that occur in a jurisdiction while a mandate is in place, and this paper is concerned with predicting the number of fatalities that would have occurred had the mandate not been in place. The main difference between our approach and other COVID-19 analyses is that we use multiple outcomes to make these predictions—not just the number of COVID-19 fatalities from other jurisdictions and time periods. For example, we also use the number of influenza and traffic fatalities. These additional outcomes are not of direct interest themselves, but they provide valuable information about COVID-19 fatalities.
Our main insight is that tensor completion provides a simple and effective framework for combining multiple outcomes across multiple jurisdictions. The proposed algorithm generalizes matrix completion, which has proven to be a versatile tool for making predictions with matrix-oriented data, such as longitudinal data [3]. An analysis with the proposed algorithm works like most longitudinal analyses. It leverages the staggered adoption of a mandate to borrow from the experience of "control" jurisdictions, whose governments did not contemporaneously adopt the mandate. However, unlike most longitudinal analyses, it also borrows from the experience of "control" outcomes that would not otherwise be of interest.
Our approach works particularly well when there are few time periods (so-called shortitudinal data), when one jurisdiction adopts a mandate for many time periods (i.e. many observations are missing from a row), or when many jurisdictions adopt a mandate in the same time period (i.e. many observations are missing from a column). These cases are not uncommon—all three occur when evaluating COVID-19 mandates—and when they occur, traditional methodologies can fail because a single outcome provides insufficient overlap between jurisdictions and time periods. In order to borrow correctly from the experience of control jurisdictions, univariate analyses rely on strong parametric assumptions that cannot be validated from the data.
By including related outcomes, we increase the overlap between jurisdictions and time periods, reducing the reliance on untestable parametric assumptions. We stress the additional outcomes are not of direct interest themselves. Rather, they are informative of the factors that influence the outcome of interest. For example, many states adopted travel restrictions early in the pandemic to reduce tourism, a factor contributing to the spread of COVID-19. A univariate longitudinal analysis would borrow from the experience of control states that did not reduce tourism through the use of travel restrictions. But tourism in the states that elected to restrict travel is not the same as tourism in states that did not, and the pandemic has not been observed over enough seasons to establish the difference. In contrast, a multivariate longitudinal analysis could borrow from the experience of other diseases spread by tourism, like influenza, which, unlike coronavirus, have been observed over many seasons, and thus can help account for the differential impact of tourism across states.
Combining multiple outcomes in this way dates back to the first statistical analysis—the first according to Willcox [4]. In 1634, John Graunt used the number of liver-grown and spleen fatalities to study the number of rickets fatalities [5]. Graunt’s motivation and approach are surprisingly similar to our own, perhaps suggesting it is obvious that predictions can benefit from more data, such as additional outcomes, provided that the data are included judiciously. But multivariate longitudinal data have not become widely utilized, we suspect, because it is often not obvious how to include additional outcomes judiciously. Policymakers cannot manipulate outcomes like the conditions of an experiment, and (to use statistical terminology) outcomes are rarely ancillary like covariates. Rather, outcomes are measured together with error or influenced by similar unobserved factors, and thus they should be treated as jointly random. This is problematic when the joint distribution of the outcomes is unknown or intractable, and a reasonable multivariate model is not easily specified. Such is the case when studying the government mandates adopted during the COVID-19 pandemic.
We propose researchers use tensor completion to draw causal inferences with multivariate longitudinal data. We argue the approach does not make strong parametric assumptions about the relationship between jurisdictions and time periods, more accurately measures the effect of the government mandates adopted during the COVID-19 pandemic, and therefore should be considered as an additional tool for studying the pandemic. We present our argument in five sections. Section 2 reviews the literature. In particular, we present an empirical paradox in which a meta-analysis of observational studies estimates the benefit of mask-wearing directives to be five times higher than a randomized controlled trial (cf. [6] and [7]). We provide evidence suggesting the discrepancy is due to confounding. Section 3 presents our first contribution, the reevaluation of three mandates adopted by U.S. states during the COVID-19 pandemic. We demonstrate the proposed method better fits the data than other, popular longitudinal analyses. In addition, the estimated effect is closer to the randomized controlled trial, providing a possible resolution to the empirical paradox introduced in Section 2. Section 4 presents our second contribution, a theoretical justification for using tensor completion to estimate causal effects with multivariate longitudinal data. It also describes our computational approach in detail. Section 5 concludes with a discussion. Our main takeaway is that while the proposed approach can be applied whenever multivariate longitudinal data are available, we believe it is particularly timely as governments increasingly rely on longitudinal data to choose between mandates. Governments should use all available data to choose mandates that best balance public health and individual liberties as new waves of COVID-19 variants are expected to follow delta and omicron [8].
2 Background
2.1 Longitudinal data provide a substantial amount of the evidence supporting COVID-19 mandates.
Due to their timeliness and ubiquity, longitudinal data provide a substantial amount of the scientific evidence supporting COVID-19 mandates. Publications analyzing longitudinal COVID-19 data have appeared in high-profile journals across a wide variety of disciplines, from medicine and epidemiology to statistics and economics; see [9, 10, 11, 12, 13, 14, 15, 16, 17]. Perhaps the most widely communicated analysis of COVID-19 longitudinal data is the simple comparison of deaths between Eastern and Western countries. This comparison alone informs numerous news stories and policy debates, for example as reported by the Washington Post, the New York Times, and the Wall Street Journal. In this subsection, we discuss why confounding factors limit the relevance of these simple comparisons. In the next subsection, we argue that the more sophisticated longitudinal analyses conducted in the scientific literature share a similar limitation.
Nearly every country promptly publishes the daily number of COVID-19 fatalities, and it is predominately from this data that government mandates have become associated with low fatality rates; countries early to adopt mandates after the virus was discovered during the winter of 2019 had significantly lower death rates than countries that delayed or failed to implement them. For example, according to data from the Oxford COVID-19 Government Response Tracker (OxCGRT), Hong Kong (SAR China), South Korea (Republic of Korea), and Taiwan required schools to close at all levels within the first two months of the pandemic. These countries recorded 5 deaths per hundred thousand residents between December 21, 2019 and December 21, 2021. Meanwhile Germany, the United States, and the United Kingdom did not require schools to close within the first two months and recorded 204 deaths per hundred thousand residents—forty-one times higher [18].
Many cite this and similarly simple associations as evidence that timely government mandates prevent considerable fatalities [19, 20, 21]. But East Asian countries differ from Western countries in culture, geography, and sociodemographics. For example, residents of Hong Kong, South Korea, and Taiwan have a life expectancy nearly three years longer than residents of Germany, the United States, and the United Kingdom, suggesting a more health-conscious population overall. Perhaps virus transmission is lower in more health-conscious populations, regardless of any mandate. Other differences include the prevalence of comorbidities like obesity, the structure of physical infrastructure like mass transit and housing, and the possibility of extra protection against coronavirus from previous exposure to similar viruses. Thus, cross-country comparisons alone cannot establish that a large number of deaths would have occurred in the former countries had mandates not been adopted quickly. More careful comparison is necessary to account for these baseline differences.^{1}

^{1} The problem with the six-country comparison—and the many like it in the literature—is confounding, not insufficient sample size. The association is statistically significant at the five percent level, meeting the traditional minimum statistical standard for publication: there are $\binom{6}{3} = 20$ ways six countries can be divided into two equal groups, only one of which classifies Hong Kong, South Korea, and Taiwan as having the significantly lower fatality rate—a p-value of 1/20 = .05.
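The footnote's permutation count is easy to verify; a minimal sketch in Python (the grouping logic is just a binomial coefficient):

```python
from math import comb

# Number of ways to divide six countries into two labeled groups of three:
# choose which three countries form the low-fatality group.
n_splits = comb(6, 3)

# Exactly one of these splits classifies Hong Kong, South Korea, and Taiwan
# as the group with the significantly lower fatality rate.
p_value = 1 / n_splits

print(n_splits, p_value)  # 20 0.05
```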
Additional evidence supporting COVID-19 mandates comes from the comparison of jurisdictions within a country, such as the fifty states that comprise the United States. Within-country comparisons are possible when local jurisdictions and not the national government decide whether to adopt COVID-19 mandates, and such comparisons are often thought fairer than between-country comparisons since residents within a country are more similar in terms of culture, geography, and sociodemographics. The ease and popularity of state comparisons in the United States are best evidenced by the large number of databases cataloging COVID-19 policy, such as NCSL, Littler, Oxford (OxCGRT), Boston University (CUSP), MultiState, and the Kaiser Family Foundation. In fact, the Urban Institute identifies nearly 200 such databases.^{2} According to OxCGRT, states that required some or all schools to close for more than a year had 112 deaths per hundred thousand residents in 2021, while states that did not had 145—thirty percent higher. This increase is smaller than the previous between-country comparison, perhaps reflecting both the different policy and the fairer comparison. Yet, its size is still considerable in its own right, suggesting school closings are an effective tool for reducing fatalities.

^{2} Google Scholar also provides evidence supporting the popularity of COVID-19 longitudinal data analysis at the country, state, and local levels. In January 2022, a search of articles posted online between 2020 and 2021 using the term "covid" yields 713,000 hits. Adding the term "longitudinal" yields 117,000 articles, 16 percent of all "covid" articles. Searching "natural experiment covid" over the same time period yields 8,480 articles, "country comparison covid" yields 3,720 articles, and "state comparison covid" yields 409 articles. Note that the research used by governments to evaluate policy is typically unpublished or published on government websites and may not be indexed by Google.
When the decisions made by jurisdictions over whether to adopt a COVID-19 mandate are sufficiently haphazard, scientists conclude the data form a natural experiment—an "opportunity" [22, page 113] in which the data are consistent with the results of a randomized experiment. The jurisdictions with and without the policy can then be compared with little or no adjustment, as we did above. Scientists might conclude state mandates form a natural experiment because the federal government largely refrained from instituting countrywide mandates. State and local governments were free to independently select among mandates as they deemed necessary [23]. (Indeed, recent rulings of the Supreme Court emphasize state authority to choose among mandates.) But just like the six-country comparison, key differences between states could explain the apparent effectiveness of mandates. This paper questions whether the adoption of COVID-19 policies in the United States is sufficiently haphazard to compare states and local governments using the large number of databases described above. In the following subsection, we argue it cannot be taken for granted that even sophisticated statistical tools, designed to fully exploit the longitudinal structure of the data, correctly account for such differences, motivating our approach aimed at making the fairest comparison possible.
2.2 A meta-analysis by Talic et al. (BMJ 2021) could not find a longitudinal analysis of COVID-19 mandates with a low risk of confounding.
State and local governments differ in numerous ways. These differences could theoretically explain the observed disparity in COVID-19 fatalities. For example, transmission of the disease is influenced by meteorological factors like temperature, rainfall, and humidity. It is influenced by sociodemographic factors like the health of residents and their cultural norms, which determine how residents interact with each other. It is influenced by infrastructural factors like the public transit system, which govern population dynamics like the density and flow of residents and tourists—in particular, the differential travel patterns that arise during major holidays like the Fourth of July, Thanksgiving, and Christmas. It could be that a COVID-19 mandate prevents few fatalities, but states with factors less conducive to transmission happened to be the ones that adopted the mandate. Since the remaining states would have higher transmission and fatality rates, interstate comparison would make the mandate appear more effective than it actually is. The misattribution of the cause of the observed difference is referred to as "confounding," and the responsible factors are called "confounding factors."
In theory, scientists can list potential confounding factors, as we have done, and adjust their analyses accordingly. In practice, however, scientists generally implement the more sophisticated methodologies available for longitudinal data. (A wide variety of longitudinal methodologies exist for this purpose; see [24].) But the typical analysis does not appear to have sufficiently accounted for confounding, according to a recent meta-analysis that systematically reviewed a large number of COVID-19 studies. Using the ROBINS-I tool [25], Talic et al. found the overall risk of bias was low in only nine percent of observational studies. (The risk of bias considers multiple sources, including the risk of confounding, the risk of measurement error, and the selective reporting of results.) The authors found no study in which the risk of confounding was low [6, see Figure 2]. Other researchers have also raised concerns of confounding; for example, see [26].
We consider whether confounding explains an empirical paradox that arises when observational studies are compared with other evidence of the effectiveness of COVID-19 mandates. For example, when Talic et al. combine the studies in a meta-analysis, they find mask-wearing directives reduce the incidence of COVID-19 by nearly 50 percent [6]. In comparison, a randomized controlled trial of mask use in Bangladesh estimates a reduction of only nine percent—less than one-fifth the size [7].^{3} The randomized study is compelling because randomization is performed for the express purpose of eliminating the influence of confounding factors: when treatment is randomized, whether a government adopts a mandate cannot be influenced by whether the jurisdiction is subject to factors more or less conducive to transmission. Those factors are balanced on average between jurisdictions whose governments did and did not adopt the mandate, and it is unlikely those factors could align in such a way as to make an ineffective mandate look effective.

^{3} The difference between these two papers is stark: the observational studies suggest the majority of fatalities were prevented by jurisdictions that adopted mask-wearing directives, while the randomized study suggests only a small minority were prevented. To emphasize this point, suppose 100 fatalities were observed in a jurisdiction that adopted the mandate. The observational studies predict 200 fatalities would have occurred had the mandate not been adopted, while the randomized study predicts only 110.
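As a quick arithmetic check of the footnote's comparison, consider a hypothetical jurisdiction with 100 observed fatalities under the mandate:

```python
# Hypothetical jurisdiction with 100 observed fatalities under the mandate.
observed = 100

# A reduction of r implies the counterfactual without the mandate is
# observed / (1 - r).
counterfactual_meta = observed / (1 - 0.50)  # meta-analysis: ~50% reduction
counterfactual_rct = observed / (1 - 0.09)   # randomized trial: 9% reduction

print(round(counterfactual_meta), round(counterfactual_rct))  # 200 110
```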
A variety of circumstantial evidence suggests large effects are unlikely. For example, limited access to tests, masks, and other equipment has made it difficult or expensive to comply with government mandates [27]. So has COVID confusion, which arises when government communications do not clearly state the details of mandates and recommendations [28]. Nevertheless, alternative explanations for the difference between the randomized and observational studies are possible. It could be that the population or policy studied in the randomized study differs from those in the observational studies, or that the studies examined different time periods or measured the effect in different ways. For example, the randomized study only followed up with residents that had symptoms, while some of the observational studies included cases identified by random testing.^{4} In short, the observational data must be reevaluated to determine whether the paradox is due to confounding or whether mandates do indeed prevent the majority of fatalities.

^{4} Another possibility is publication bias: researchers only publish the times and places during which a mandate is effective.
We propose a new approach that helps fill the gap between the methodologies used by the observational studies—nearly all longitudinal analyses—and the randomized study. Our approach uses additional outcomes, not otherwise of interest, to stand in for confounding factors. In Section 3, we present our main finding. We find univariate longitudinal analyses—like those identified in the meta-analysis—suggest COVID-19 mandates prevented a large percentage of fatalities. However, those percentages are reduced substantially when additional outcomes are included to account for confounding factors. After justifying the proposed approach in Section 4, we conclude that the randomized controlled trial better represents the effect of mask-wearing directives than the observational studies. But before presenting the results in detail, we review the literature on which our methodology is based in the following two subsections.
2.3 We reevaluate COVID-19 mandates using additional outcomes to stand in for confounding factors.
We propose scientists use additional outcomes to reduce the influence of confounding factors. We stress the additional outcomes are not of direct interest themselves. Rather, they are informative of the confounding factors that drive the outcome of interest. In this subsection, we show the use of multiple outcomes in this way is not new. We trace the idea back to the first statistical analysis—whose similarity to our own motivation is uncanny—and follow the practice through the causal inference literature, as developed by the insights of Ronald Fisher, Austin Bradford Hill, and others, until today. We close the review with two high-profile examples from the causal inference literature that compare multiple outcomes. Although the idea is not new, the number of studies that use multiple outcomes to stand in for confounding factors is relatively small compared to the number of studies that use multiple units (i.e. jurisdictions, individuals, etc.). We speculate this is because current tools are not sufficiently simple or flexible, requiring subjective or ad hoc decisions that are difficult to justify. In the next subsection, we discuss the related but historically separate literature on multivariate longitudinal analysis, emphasizing tensor completion, a promising candidate for filling this gap. Our approach can be seen as a bridge between the two traditions, borrowing equally from each literature.
The practice of combining multiple outcomes dates as far back as 1662, when John Graunt published the first statistical analysis, "Observations on the Bills of Mortality" [4]. In essence, Graunt calculates "excess mortality" by comparing multiple outcomes. For example, he notes that between 1629 and 1660, fatalities from rickets increased from zero to more than five hundred. Meanwhile, liver-grown fatalities remained relatively stable at around one hundred. He concludes that rickets is an "increasing disease" [5, Chapter 3, page 26]. In other words, Graunt assumes that were the number of rickets fatalities subject only to prevailing sociodemographic or meteorological factors, they would behave like liver-grown fatalities. The increase in rickets fatalities beyond this fluctuation suggests an abnormal factor that should concern public health officials. (Recent research suggests the excess deaths were due to increasing air pollution [29], a factor mentioned by Graunt.)
Graunt also calculates excess mortality by comparing fatalities in the city of London with the country [5, Chapter 7, page 44]. This second comparison is similar to the typical longitudinal analysis of COVID-19 mandates today. Although Graunt kept these comparisons separate—he did not combine multiple outcomes across multiple jurisdictions as we propose in this paper—it was clear to him that they provide equivalent types of evidence. Rosenbaum, writing nearly three hundred and fifty years later, articulates Graunt’s intuition. Rosenbaum classifies both as "devices," which he defines as "information collected in an effort to disambiguate an association that might otherwise be thought to reflect either an effect or a bias." Other Rosenbaum devices include gradients (e.g. giving different doses of a policy or treatment) and instruments (e.g. encouraging or nudging individuals to take a policy treatment) [22, page 113].
The use of devices as part of a causal inference workflow or "planning" of observational studies was suggested at least as early as William Cochran [30], who recounts Ronald Fisher’s now famous dictum for drawing causal conclusions from observational data: "make your theories elaborate"—meaning collect lots of different data (such as additional outcomes) to test causal hypotheses. Austin Bradford Hill includes the use of controls in a checklist of properties an association should satisfy before being considered causal. His "specificity" criterion states that an association should be limited to specific circumstances and not others. Hill explains that "if … the association is limited to specific workers and to particular sites and types of diseases and there is no association between the work and other modes of dying, then clearly that is a strong argument in favor of causation" [31]. A variety of famous studies have established causality in the face of seemingly intractable confounding by comparing control outcomes and units in this way. Perhaps the most famous estimates the number of lung cancer deaths caused by smoking; Doll and Hill calculate the "excess death rate" by comparing the number of lung cancer fatalities to unrelated deaths among smokers [32, page 1466, Table 37].
In their comprehensive volume, Statistical Methods in Cancer Research, Breslow and Day review this and similar studies (see for example [33]). They refer to the use of control units as case-control studies and the use of control outcomes as proportional-mortality studies [34, e.g. Volume II, page 45 and 115]. They provide a unified "standardization" framework for estimating causal effects, based largely on log-linear models, noting that case-control and proportional-mortality studies are "formally equivalent" (Volume II, page 46) in the sense that the same methods can be applied to both. In principle, log-linear models allow the combination of multiple control outcomes and units; however, like Graunt, the authors make no attempt that we are aware of to combine the two analyses to produce a single estimate as we propose in this paper. (Note that Breslow and Day consider a slightly different setup than ours; their units are individuals that can only die once, while our units are jurisdictions containing a large number of individuals.)
The theoretical justification of multiple outcomes (and other devices) has been influenced by the development of potential outcome notation, pioneered by Rubin, Holland, and Neyman [1, 2]. While many analyses of COVID-19 mandates still rely on log-linear or related models for case-control data presented in [34], potential outcome notation allows scientists to articulate the assumptions necessary to "disambiguate" causes from confounding factors (assumptions like "strong ignorability"; see for example [35]). A modern reference on the most common devices available for assessing confounding, including control outcomes (which the authors refer to as pseudo-outcomes), is provided by [36, page 482]. Recent work studying control outcomes with this notation includes [37], with an extension to "control exposures" considered by [38].
Despite these and other high-profile examples, we believe the current analyses of COVID-19 mandates have not taken full advantage of the multiple outcomes available. One could consider a meta-analysis, such as the one conducted by Talic et al. [6], as an attempt to combine multiple outcomes. For example, if each study measures the outcome differently or studies the outcome under a different setting, one might conclude the meta-analysis has established the effectiveness of mandates across a range of biases. But if every study fails to account for the same confounding factor, the results could systematically misrepresent the effect of the mandate. For this reason, we speculate that if more devices were applied to COVID-19 data, such as multiple outcomes, the observational studies in a meta-analysis and the randomized study conducted by Abaluck et al. would agree more closely. This speculation motivates the methodology presented in this paper. Our main insight is that tensor completion provides a simple and effective framework for combining multiple outcomes across multiple jurisdictions, as we discuss in the next subsection.
2.4 We use tensor completion to estimate causal effects with multiple outcomes.
In this paper, we demonstrate tensor completion provides a simple and effective framework for estimating causal effects with multivariate longitudinal data. We are motivated by an empirical paradox in which observational studies estimate the benefit of mask-wearing directives to be five times higher than a randomized controlled trial (cf. [6] and [7]). We suspect the discrepancy can be explained by confounding: The typical observational study relies on longitudinal data, which arranges the outcome as a matrix with each row denoting a jurisdiction and each column denoting a time period. The typical analysis then leverages the staggered adoption of a mandate to borrow from the experience of control jurisdictions. We speculate this approach will fail to account for confounding when there is insufficient overlap between jurisdictions and time periods to borrow correctly. If this speculation is correct, the problem could be overcome by expanding the data from a matrix (univariate longitudinal data) to a tensor with three dimensions (multivariate longitudinal data), where the third dimension denotes a set of related outcomes. Analysis is then strengthened by the experience of both control jurisdictions and control outcomes.
Several methods exist for analyzing multivariate longitudinal data. Two popular examples are shared-parameter and random-effects models. Historically, these models arise when univariate longitudinal data are missing values, and a bivariate longitudinal model is specified in which one outcome is a binary variable indicating whether an entry is missing and the other is the outcome of interest. These and related models can be easily extended to multiple outcomes, but according to the "Longitudinal Data Analysis" handbook, they require "extremely strong, often unrealistic, assumptions about the association structure between the various outcomes" [39, Section 13.2.5]. We suspect these assumptions are the main reason multiple outcomes have not become widely utilized to study COVID-19 mandates. Scientists simply lack a simple and effective framework for analyzing multivariate longitudinal data.

Our main insight is that tensor completion provides one such framework. Tensor completion is the generalization of matrix completion from two-dimensional to multidimensional arrays. In recent years, matrix completion problems have received considerable attention in machine learning, optimization, and high-dimensional statistics [40, 41, 42, 43, 44, 45, 46, 47]. Much of the attention comes from applications in collaborative filtering and recommendation systems, following the now famous solution to the "Netflix problem". But matrix completion has proven to be a versatile tool for solving missing data problems in a wide variety of applications, e.g. [48, 49, 50, 51, 52, 53]. Since our evaluation of COVID-19 mandates follows Rubin, Holland, and Neyman [1, 2] in that we view causal inference as tantamount to solving a missing data problem, matrix completion provides a natural starting point.

Strictly speaking, matrix completion could refer to any number of methods that fill in the entries of a partially observed matrix. In this paper, matrix completion refers to "low-rank" matrix completion, which assumes the underlying "complete" matrix has approximately low rank. In such cases, the missing entries are often recoverable from the observed entries with error proportional to the noise level. The two most popular approaches to low-rank matrix completion are low-rank factorization (e.g. [41, 47]) and regularized estimation using methods that enforce low-rank structure via a suitable (convex) penalty like the nuclear norm (e.g. [42, 43, 44, 45, 46]). Of the two, low-rank factorization is computationally simpler; indeed, it can be accomplished with efficient block coordinate descent algorithms, such as alternating updates. But it suffers from two major shortcomings: the objective function is not convex, and the rank of the underlying matrix is assumed to be known. In contrast, regularized estimation, in particular using the nuclear norm penalty, avoids these drawbacks, and its estimates can be backed by rigorous statistical guarantees. The price for these advantages is that regularized estimation is computationally more complicated. We stress that neither approach makes strong parametric assumptions to complete missing entries, unlike shared-parameter models, random-effects models, and other models typically estimated with the expectation-maximization algorithm [54, abbr. EM], another well-studied method for general missing data problems. See [55, 7.2.17] for an example of the shared-parameter model fit using EM, whose motivation is quite similar to our own.

Tensor completion relies on the same principles as matrix completion, but it has a greater chance of recovering missing entries in the sense that the required fraction of observed entries can be significantly smaller than in the matrix case (e.g. [56, 57, 58, 59]). For intuition, consider a tensor of order three, as shown in the top right panel of Figure 1. The tensor can be thought of as a data cube storing data along three dimensions (e.g. unit, time, and outcome), or alternatively, as a sequence of matrices (e.g. unit by time) recorded along a single dimension (outcome). Pink entries are observed and blue entries are missing. Each matrix can be completed separately from its observed, pink entries only if there is enough information within each matrix to do so. But if the entries are correlated across matrices—that is, if the entries of one matrix contain information about the entries of another—matrices can borrow the information they need when it does not exist within. Indeed, we find tensor completion works particularly well when there are few time periods (so-called shortitudinal data), when one unit adopts a mandate for many time periods (i.e. many observations are missing from a row), or when many jurisdictions adopt a mandate in the same time period (i.e. many observations are missing from a column). We make this intuition more rigorous in Section 4.
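To make the nuclear-norm approach concrete, here is a minimal, soft-impute-style sketch in Python. It is an illustrative stand-in, not the exact algorithm used in this paper; the penalty level `lam`, the iteration count, and the toy data are all assumptions made for the example.

```python
import numpy as np

def soft_impute(M, mask, lam, n_iter=200):
    # Iteratively fill missing entries, then soft-threshold singular values
    # to shrink the estimate toward low rank (nuclear-norm regularization).
    Z = np.zeros_like(M)
    for _ in range(n_iter):
        filled = np.where(mask, M, Z)  # observed values + current guesses
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z = (U * np.maximum(s - lam, 0.0)) @ Vt
    return Z

# Toy example: a noiseless rank-one matrix with some entries hidden.
rng = np.random.default_rng(0)
M = rng.normal(size=(8, 1)) @ rng.normal(size=(1, 6))
mask = rng.random(M.shape) > 0.3  # True = observed (~70 percent)
Z = soft_impute(M, mask, lam=0.1)

rel_err = np.linalg.norm(Z - M) / np.linalg.norm(M)
print(rel_err)  # small for this easy low-rank example
```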
Real world applications of this type are not uncommon—all three occur when evaluating COVID19 mandates. In this paper, however, we focus on one version of this problem. In our setting, we assume one sparselysampled matrix is of primary interest (representing an outcome of interest), together with a collection of ‘auxiliary’ matrices of the same dimension that are (almost) completely observed (representing additional control outcomes). The motivation remains the same: by grouping outcomes together in a tensor, we borrow as much information across outcomes as possible. In this way, the sparselysampled matrix containing the outcome of interest is completed correctly by using the auxiliary matrices, containing the additional outcomes, to stand in for confounding factors.
Our work extends Athey et al. [3], which uses matrix completion to estimate causal effects with univariate longitudinal data (which they refer to as panel data). Athey et al. in turn extend the synthetic control approach of Abadie et al. [60]. Indeed, Athey et al. explicitly connect matrix completion to factor models, which underlie the synthetic control method. Since factor models can be viewed as specific linear lowrank models for univariate longitudinal data, the use of matrix completion—which hinges on the same lowrank model—is a natural approach. The extension to tensor completion proposed in the present paper may capture unique information about the effectiveness of COVID19 mandates not available to matrix completion, which we demonstrate in the following section.
3 Reevaluating COVID19 Mandates using Tensor Completion
In this section, we use tensor completion and five other comparison methods to estimate the number of fatalities prevented by three COVID19 mandates: travel restrictions, maskwearing directives, and vaccination requirements. We divide the first two years of the pandemic—from December 21, 2019 to December 21, 2021—into eight seasons. For each mandate, U.S. state, and season, we determine whether the state adopted the mandate at any time during the season. We then estimate the number of COVID19 fatalities that would have occurred in the states and seasons that adopted the mandate had the mandate not in fact been adopted. Finally, we estimate the effect of the mandate by calculating the average percent change in the number of COVID19 fatalities. Specifically, we record two numbers for each state $i$ and season $t$ during which a given mandate is in effect—$Y_{it}(1)$, the observed number of COVID19 fatalities, and $\hat{Y}_{it}(0)$, the estimated number of COVID19 fatalities that would have occurred had the mandate not been in effect. We calculate the percent change by subtracting the former from the latter and dividing by the former—i.e. $(\hat{Y}_{it}(0) - Y_{it}(1))/Y_{it}(1)$. Averaging the percent change across all states and seasons during which the mandate was in effect yields the estimated effect of the given mandate. We denote the estimate $\hat{\tau}$.
In the first subsection, we fix notation and describe the corresponding estimand, $\tau$. We then review the outcome of interest, the number of COVID19 fatalities in each state reported by OxCGRT, the Oxford COVID19 Government Response Tracker. Finally, we justify the design of our study. We conclude that while other designs are possible, the application of popular longitudinal methodologies to our data yields estimates similar to the studies reported by Talic et al. [6], suggesting we have sufficiently reconstructed the typical analysis and thus the empirical paradox presented in Section 2. (In other words, we believe a more elaborate design would complicate the comparison of methods without providing additional insight.) In the next subsection, we compare the proposed method to several popular methods using the data described in this subsection. In the last subsection, we compare these methods using simulated data. The theoretical and computational details of the proposed approach are postponed until Section 4.
3.1 We estimate the effect of mandates using two years of COVID19 fatalities across fifty U.S. states.
We begin by fixing notation. Let $i$ denote the $i$th U.S. state and $t$ the $t$th season. We define four matrices with the $i$th row denoting the $i$th state and the $t$th column denoting the $t$th season. The first two matrices contain the potential outcomes: Let matrix $\mathbf{Y}(1)$ have entries $Y_{it}(1)$, denoting the number of COVID19 fatalities that would occur were a given mandate adopted in state $i$ during season $t$, and let matrix $\mathbf{Y}(0)$ have entries $Y_{it}(0)$, denoting the number of COVID19 fatalities that would occur had the mandate not been adopted. The second two matrices contain the observed data: Let matrix $\mathbf{W}$ have binary entries $W_{it}$, taking the value $1$ if the mandate is adopted by state $i$ during season $t$ and $0$ if not, and let $\mathbf{Y}$ have entries $Y_{it} = W_{it} Y_{it}(1) + (1 - W_{it}) Y_{it}(0)$. We make the stable unit treatment value assumption (SUTVA, 1.6.1 in [36])—we assume the number of fatalities in any state and time period is unaffected by whether a mandate is adopted in a different state or time period. We also assume there is only one version of a given mandate. We do not study whether and how long a mandate is enforced during a season. We believe the vast majority of studies reported by Talic et al. [6] make these or similar assumptions, and we discuss both further at the end of this subsection and in the Discussion.
Consider the following estimand,
$$\tau = \frac{1}{\sum_{i,t} W_{it}} \sum_{(i,t)\,:\,W_{it} = 1} \frac{Y_{it}(0) - Y_{it}(1)}{Y_{it}(1)}.$$
$\tau$ measures the average relative effect of treatment on the treated—dividing by $Y_{it}(1)$ accounts for the fact that states vary considerably in the observed number of fatalities. (If we did not divide by $Y_{it}(1)$ and computed the absolute effect, the experience of two states would dominate the analysis.) We could calculate $\tau$ directly from the data were we to observe the potential outcomes $\mathbf{Y}(1)$ and $\mathbf{Y}(0)$. However, we only see the observed data $(\mathbf{W}, \mathbf{Y})$, from which we can neither completely reconstruct $\mathbf{Y}(1)$ and $\mathbf{Y}(0)$ nor calculate $\tau$—an impediment called the "fundamental problem of causal inference" [1, 2]. This impediment occurs because either a government adopts the mandate in question during the season or does not adopt the mandate. It cannot do both. Thus, for any state $i$ and season $t$, we only observe $Y_{it}(1)$ when $W_{it} = 1$ and $Y_{it}(0)$ when $W_{it} = 0$. It follows that for the treated stateseasons we observe $Y_{it}(1)$, the denominator of each term of $\tau$; the numerator involves $Y_{it}(0)$, which must be imputed in order to estimate $\tau$.
To summarize, we cannot calculate $\tau$ directly from the observed data matrix $\mathbf{Y}$ because $\tau$ depends on the values of $Y_{it}(0)$ for which $W_{it} = 1$, but $\mathbf{Y}$ only contains the values of $Y_{it}(0)$ for which $W_{it} = 0$. The values of $Y_{it}(0)$ for which $W_{it} = 1$ are thus considered missing. This paper is concerned with methods that impute the missing entries of $\mathbf{Y}(0)$ from the nonmissing entries so that $\tau$ can be estimated. Figure 1 illustrates this approach, motivating our proposal to augment the outcome of interest with additional outcomes. The summary proceeds counterclockwise. The top left panel depicts the observed data matrix $\mathbf{Y}$. The pink entries represent the observed elements of $\mathbf{Y}(0)$—the states and seasons for which $W_{it} = 0$—while the blue entries represent the observed elements of $\mathbf{Y}(1)$—the states and seasons for which $W_{it} = 1$. The bottom left panel shows that the observed data matrix only reveals the pink part of $\mathbf{Y}(0)$ corresponding to $W_{it} = 0$; the blue part corresponding to $W_{it} = 1$ is considered missing and denoted by "?". The bottom right panel shows the express purpose of the methods considered in the paper: to complete the matrix $\mathbf{Y}(0)$ by imputing the missing entries so that $\tau$ can be estimated. The top right panel shows that additional outcomes are available to impute the missing entries. (Some of these additional outcomes may also be missing, and which entries are missing may differ by outcome.) By grouping multiple outcomes together as in a tensor, more accurate imputations may be possible. We stress that these additional outcomes are only included to improve the accuracy of the imputations; they are not otherwise of interest.
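Once the missing counterfactuals have been imputed, the plug-in estimate is a short computation. The sketch below assumes one common reading of the estimand—the average, over treated stateseasons, of the relative difference between the imputed counterfactual and the observed count—so the function and the toy numbers are illustrative rather than the paper's exact definition.

```python
import numpy as np

def tau_hat(Y_obs, Y0_imputed, W):
    """Plug-in estimate: average, over treated state-season cells, of the
    relative difference between the imputed counterfactual and the
    observed count (one plausible reading of the estimand in the text)."""
    treated = W == 1
    rel = (Y0_imputed[treated] - Y_obs[treated]) / Y_obs[treated]
    return rel.mean()

# Toy example: 3 states x 2 seasons, two treated cells.
Y_obs = np.array([[100.0, 120.0], [80.0, 90.0], [50.0, 60.0]])
Y0_hat = np.array([[110.0, 120.0], [80.0, 99.0], [50.0, 60.0]])
W = np.array([[1, 0], [0, 1], [0, 0]])
print(tau_hat(Y_obs, Y0_hat, W))  # → 0.1 (each treated cell contributes 0.10)
```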
We conclude this subsection by reviewing the main outcome of interest and justifying the design of our study. The main outcome of interest is the number of COVID19 fatalities reported by each state as recorded by OxCGRT, the Oxford COVID19 Government Response Tracker. To aid our review, we display four histograms of the dates of the reported fatalities in Figure 2. The top two histograms correspond to the two largest states (by population) and the bottom two histograms correspond to the two smallest. The dates are binned by week, and the ticks on the horizontal axis denote the seasons. The colors match Figure 1, representing whether a mandate was in effect: pink denotes that no maskwearing directive is in effect and blue denotes that a maskwearing directive is in effect. We find the data are similar but not identical to the data reported by the CDC Case Task Force.
We conduct our analysis at the season level. That is, we aggregate the number of fatalities by state and season. We make this decision for several reasons. The main reason is that we believe a more complicated study design is not necessary: We believe this design is consistent with the vast majority of the analyses of COVID19 mandates. Furthermore, when the data are aggregated in this way, the application of popular longitudinal methodologies to our data yields estimates similar to the studies reported by Talic et al. [6]. This suggests we have indeed sufficiently reconstructed the typical analysis and thus the empirical paradox presented in Section 2. While a more nuanced design might yield more nuanced results, our intention is to compare methods using the most typical design—not the most nuanced one. Finally, a nuanced design would complicate the summary and comparison of methods, the primary goal of this paper.
The second reason we aggregate the number of fatalities by state and season is to reduce the impact of reporting errors. The existence of reporting errors is clear from Figure 2; one would expect the number of COVID19 fatalities to change relatively slowly from week to week—the histogram of fatalities should be smooth. However, the data in Figure 2 are somewhat jagged. On closer inspection, it appears as though many fatalities at the end of the week, month, or season are not reported until the beginning of the next week, month, or season—certainly a result of the fatality reporting process and not COVID19 itself. By aggregating by state and season, we minimize the influence of these reporting errors. The number of fatalities within each season is much larger than the amount delayed by reporting. See [61] for a more general discussion of how to interpret COVID19 data.
A third reason to aggregate fatalities by state and season is to reduce violations of the stable unit treatment value assumption (SUTVA). In particular, we assume the mandate in any one state and period does not affect the number of fatalities in a different state and period. According to the Centers for Disease Control and Prevention (cf. Frequently Asked Questions), COVID19 symptoms (and thus reporting) may appear two weeks after exposure—before which the infected may not be contagious—and infections can last a month or longer—until the infected is no longer contagious. This suggests SUTVA would be grossly violated at the day, week, or even month level. A mandate in one month could reduce fatalities in a subsequent month without the mandate. This intuition is further investigated in Figure 3, which shows the autocorrelation of COVID19 fatalities when aggregated by day (left), week, month, and season (right). We find that autocorrelation is small and statistically insignificant when the data are binned by season, but not when they are binned by month, week, or day. Similar results occur when autocorrelation is examined by state (not shown).
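The aggregate-then-check-autocorrelation logic described above can be sketched as follows; the weekly series is simulated (the OxCGRT counts are not reproduced here), and the constants are invented for illustration.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float((x[:-1] * x[1:]).sum() / (x * x).sum())

rng = np.random.default_rng(1)
# Simulated weekly fatality counts over two years: a slowly varying level
# plus Poisson noise, mimicking the smooth-but-jagged pattern described
# for Figure 2 (the constants are invented).
weeks = 104
level = 200 + 150 * np.sin(np.linspace(0, 4 * np.pi, weeks))
weekly = rng.poisson(level)

# Aggregate 13-week blocks into eight "seasons" over the two years.
seasonal = weekly.reshape(8, 13).sum(axis=1)

print(lag1_autocorr(weekly))    # strongly positive at the weekly level
print(lag1_autocorr(seasonal))  # much closer to zero at the seasonal level
```

Aggregation preserves the totals while damping the serial dependence that threatens SUTVA at finer time scales.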
Instead of aggregating the data by season, one might model the spread of COVID19 within and between states. For example, one might use an autoregressive model or a compartmental model. However, under the causal inference framework outlined at the beginning of this subsection, these models would have to accurately extrapolate the counterfactual fatality rate over long periods with no data. For example, Figure 2 shows that California's maskwearing directive lasted a year and a half. We did not observe such models applied in this fashion; we found these models were typically used for shortterm prediction. We do not consider these models in this analysis.

Our final assumption is that if a mandate is in place at any point during a season, we consider the mandate in effect for the entire season. We do this after careful reflection on the data and on how the typical state government uses mandates. Our conclusion is that a short but strict mandate may be as effective as a long but unenforceable mandate. Furthermore, the enforcement of mandates may change over time or be applied inconsistently within a state for unexpected or undocumented reasons. (For example, the mayor of New York stopped enforcing a socialdistancing mandate after advocates found that nearly everyone arrested for breaking the mandate was Black or Hispanic.) We do not believe it is practical to account for such changes, and we do not want to subjectively and halfheartedly classify mandates based on duration or strictness. Therefore, we consider the scope, duration, and enforcement of a policy during a season as intermediate. (One might use the following "intent to treat" interpretation of the estimand in this paper: states that pass mandates intend to enforce travel restrictions, mask wearing, or vaccination throughout the season with "perfect compliance." States that do not pass these mandates do not wish to enforce travel restrictions, mask wearing, or vaccination.)
We believe this design is consistent with the studies discussed in Section 2. Equally important, we argue the design of the study described in this subsection permits a fair and realistic comparison of the proposed method and popular longitudinal methods currently being applied to study the effect of COVID19 mandates. We conclude that while more elaborate designs have merit, for our purposes they complicate the comparison of methods without improving insight. Having established the estimand and design of our study, we now show in the next subsection that the data offer a compelling case for the methodology proposed in this paper.
3.2 When compared to traditional methods, the proposed method fits the data better and suggests COVID19 mandates are less effective.
In this subsection, we apply the proposed method to the data described in the previous section. We also review and apply five comparison methods, which we find do not fit the data as well as the proposed method—even when additional outcomes are included. We then revisit the empirical paradox introduced in Section 2. We find the five traditional methods estimate maskwearing directives to be as effective as the observational studies described in the previous section—that is, they all produce estimates $\hat{\tau}$ well above 1, suggesting the average mandate prevents the majority of fatalities that would have occurred had it not been adopted. In contrast, the proposed method estimates the effect to be similar to the randomized controlled trial—that is, both yield estimates well below 1, suggesting mandates only prevent a small minority of fatalities. In the next subsection, we further investigate this discrepancy with a simulation study. We conclude that of the six methods under consideration, the proposed method best adjusts for confounding and thus most accurately reflects the effect of COVID19 mandates. We postpone the technical details of the proposed method until Section 4.
We apply six models to the data described in the previous subsection. The first three are loglinear models, the second two are matrix completion methods, and the last model is the proposed tensor completion method. These models also differ according to whether additional outcomes are (1) excluded, (2) included as covariates, or (3) included as additional outcomes. All models also include three "baseline" covariates: the lognumber of vaccinated residents at the start of the season, the lognumber of vaccinated residents at the end of the season, and the logpopulation (as enumerated by the 2020 census) as an offset. (The only exception is that when we estimate the effect of vaccination requirements, we only include the number of vaccinated residents at the start of the season. The number of vaccinated residents at the end of the season is considered intermediate.) We do not consider these models exhaustive. Rather, we believe the estimates produced by these models reflect the range of estimates being published by researchers.
Before discussing each model, we briefly present the four additional outcomes considered in this analysis. For each state and season, we consider the number of flu fatalities and traffic fatalities two years earlier (i.e. the seasons Winter 2017 to Fall 2019) as reported by the Centers for Disease Control and Prevention, and the average temperature and rainfall from all stations in the Global Historical Climatology Network (i.e. the seasons Winter 2019 to Fall 2021). We consider these variables to be outcomes (and not covariates) because we believe they are influenced by similar unobserved factors (in the case of flu and traffic fatalities) or otherwise represent unobserved factors but with error (in the case of average temperature and rainfall). We denote the matrix of the $k$th additional outcome $\mathbf{Z}^{(k)}$, and the $k$th additional outcome of the $i$th state and $t$th season $Z^{(k)}_{it}$. To distinguish the additional outcomes from the baseline covariates, we denote the baseline covariates $\mathbf{X}^{(1)}$ and $\mathbf{X}^{(2)}$, and the $p$th covariate of the $i$th state and $t$th season $X^{(p)}_{it}$. The offset for state $i$ is denoted $O_i$. We assume the offset is the same each season.
We choose these four additional outcomes because their values presumably do not depend on whether a COVID19 mandate is adopted or not. That is, we assume the values of the additional outcomes are invariant with respect to the mandate; each would take the same value whether or not the mandate is adopted. As a result, we consider none of their entries to be missing. We point out that we are not limited to control outcomes whose values do not depend on whether a COVID19 mandate is adopted. We could also include additional outcomes that are affected by the mandate, and these outcomes would be missing for the states and time periods that adopted the mandate. For example, one could include the number of flu and traffic deaths during the pandemic, which likely would be affected by maskwearing directives in the states and periods that adopted them. However, we believe the number of flu and traffic fatalities from two years earlier are sufficient for the purposes of this study, reflecting the confounding factors we believe are missing from currently applied methods. To support this claim, we calculate the outofsample mean squared error (estimated using leaveoneout cross validation), which suggests that when these additional outcomes are combined using the proposed method, the counterfactuals are imputed much more accurately than with competing methods.
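Leaveoneout cross validation over the observed entries, used here to approximate the outofsample error, can be sketched as follows; the completion routine is a generic soft-impute-style placeholder, not the paper's solver, and the toy data are simulated.

```python
import numpy as np

def soft_impute(Y, mask, lam=0.1, n_iter=100):
    """Soft-impute-style placeholder for the completion solver."""
    X = np.where(mask, Y, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(np.where(mask, Y, X), full_matrices=False)
        X = (U * np.maximum(s - lam, 0.0)) @ Vt
    return X

def loo_mse(Y, mask, fit_fn):
    """Leave-one-out cross validation over the observed entries: hold out
    one observed cell at a time, refit, and score the held-out prediction."""
    errs = []
    for i, j in zip(*np.where(mask)):
        m = mask.copy()
        m[i, j] = False                      # hide this observed entry
        errs.append((fit_fn(Y, m)[i, j] - Y[i, j]) ** 2)
    return float(np.mean(errs))

# Toy example: near-rank-1 data with a few entries already missing.
rng = np.random.default_rng(7)
a, b = rng.normal(size=(6, 1)), rng.normal(size=(1, 5))
Y = a @ b + 0.05 * rng.normal(size=(6, 5))
mask = rng.random(Y.shape) > 0.1
mse = loo_mse(Y, mask, soft_impute)          # out-of-sample error estimate
```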
We now present the three loglinear models used in our comparison. All three models assume the observed counts $Y_{it}$ follow a negative binomial distribution with mean $\mu_{it}$ and size $\theta$. In loglinear model (1), $\log \mu_{it}$ is decomposed into the offset, $O_i$, a state effect, $\alpha_i$, a period effect, $\beta_t$, and a linear combination of covariates, $\gamma^\top X_{it}$:

loglinear model (1): $\log \mu_{it} = O_i + \alpha_i + \beta_t + \gamma^\top X_{it}$.
Loglinear model (2) augments the first loglinear model by adding the additional outcomes as covariates on the logscale:

loglinear model (2): $\log \mu_{it} = O_i + \alpha_i + \beta_t + \gamma^\top X_{it} + \sum_{k} \delta_k \log Z^{(k)}_{it}$.
Loglinear model (3) augments the first model, not by adding the additional outcomes as covariates, but by adding them as identically distributed outcomes, offset by a factor $\lambda_{kt}$, which varies by season but not state:

loglinear model (3): $\log \mu^{(k)}_{it} = O_i + \lambda_{kt} + \alpha_i + \beta_t + \gamma^\top X_{it}$,

where $k$ indexes the outcomes, $k = 0$ denotes the outcome of interest, and $\lambda_{0t} = 0$.
For all three models, the parameters are estimated by maximum likelihood (using the glm.nb command in the R package MASS, which maximizes the likelihood using Fisher's scoring algorithm) from the observed entries of $\mathbf{Y}$. The missing entries of $\mathbf{Y}(0)$ are then imputed using the conditional expectation $\hat{\mu}_{it}$, and an estimate of $\tau$, $\hat{\tau}$, is calculated using the definition of $\tau$ in the previous subsection but plugging in $\hat{\mu}_{it}$ for $Y_{it}(0)$. A confidence interval is calculated by deriving the multivariate normal approximation to the parameters of each model and applying the delta method. See [62] for computational details.
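A minimal numeric sketch of the fit-then-impute step: a Poisson loglinear model fit by iteratively reweighted least squares stands in for the negative binomial fit of glm.nb (which additionally estimates the size parameter), and all data and parameter values are simulated for illustration.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=25):
    """Poisson loglinear fit by iteratively reweighted least squares.
    (A Poisson sketch; glm.nb fits a negative binomial likelihood and
    additionally estimates the size parameter.)"""
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]  # rough start
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                        # working response
        beta = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * z))
    return beta

# Simulated data: 4 states x 3 seasons with multiplicative effects.
rng = np.random.default_rng(2)
alpha = np.array([4.0, 4.5, 3.5, 5.0])                 # state effects (log scale)
gamma = np.array([0.0, 0.3, -0.2])                     # season effects
mu_true = np.exp(alpha[:, None] + gamma[None, :])
Y = rng.poisson(mu_true).astype(float)
W = np.zeros((4, 3), dtype=int)
W[0, 2] = 1                                            # one treated cell

# Design matrix: intercept + state dummies + season dummies.
i, t = np.indices(Y.shape)
i, t = i.ravel(), t.ravel()
X = np.column_stack([np.ones(i.size),
                     (i[:, None] == np.arange(1, 4)).astype(float),
                     (t[:, None] == np.arange(1, 3)).astype(float)])

# Fit on the untreated cells only, then impute the conditional mean everywhere.
obs = W.ravel() == 0
beta = fit_poisson_glm(X[obs], Y.ravel()[obs])
Y0_hat = np.exp(X @ beta).reshape(Y.shape)             # imputed E[Y(0)]
```

The imputed mean for the treated cell lands close to the true counterfactual mean because the additive state and season structure is correctly specified here.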
The bottom three approaches in Table 1 use the matrix/tensor completion approach described in Section 4. Without getting into the technical details, these three approaches correspond loosely with the three loglinear models presented above. Instead of maximizing the negative binomial likelihood, they choose the best approximation of the observed entries of $\mathbf{Y}$ (as measured by mean squared error), subject to a restriction on the complexity of the approximation (as measured by the nuclear norm). The first model, matrix completion (1), also includes baseline covariates. The second model, matrix completion (2), includes both baseline covariates and the additional outcomes as covariates. The third model, tensor completion, adds baseline covariates but combines the outcomes as a tensor, generalizing matrix completion based on the nuclear norm.
As discussed in the previous section, the last three methods essentially assume a linear factor model. Thus, we take the log of the outcome (after adding one) to achieve a loglinearlike relationship (note that the notation does not reflect this transformation). We exploit the assumed linearity to calculate uncertainty intervals using a residual bootstraptype algorithm. After solving the matrix or tensor completion problem, the residuals associated with the outcome of interest are permuted within column and added to the predicted (fitted) values to obtain a new matrix $\widetilde{\mathbf{Y}}$. The control outcomes are not permuted. We then replace $\mathbf{Y}$ in the original tensor with $\widetilde{\mathbf{Y}}$ and recompute the solution of the matrix or tensor completion problem. This process is repeated one hundred times.
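The within-column residual permutation scheme can be sketched as follows; `complete` is a soft-impute-style placeholder for the matrix or tensor completion solver, and the toy data and replication count are chosen for illustration.

```python
import numpy as np

def complete(Y, mask, lam=0.1, n_iter=200):
    """Soft-impute-style placeholder for the completion solver."""
    X = np.where(mask, Y, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(np.where(mask, Y, X), full_matrices=False)
        X = (U * np.maximum(s - lam, 0.0)) @ Vt
    return X

def residual_bootstrap(Y, mask, n_boot=100, seed=0):
    """Permute residuals within each column (season), add them back to the
    fitted values, re-solve, and collect one imputation per replicate."""
    rng = np.random.default_rng(seed)
    fit = complete(Y, mask)
    resid = np.where(mask, Y - fit, 0.0)
    draws = []
    for _ in range(n_boot):
        perm = np.empty_like(resid)
        for j in range(Y.shape[1]):                 # permute within column
            perm[:, j] = rng.permutation(resid[:, j])
        draws.append(complete(fit + perm, mask))
    return np.stack(draws)

# Toy example: near-rank-1 data with one missing entry.
rng = np.random.default_rng(3)
a, b = rng.normal(size=(6, 1)), rng.normal(size=(1, 4))
Y = a @ b + 0.05 * rng.normal(size=(6, 4))
mask = np.ones((6, 4), dtype=bool)
mask[0, 0] = False
draws = residual_bootstrap(Y, mask, n_boot=20)
lo, hi = np.percentile(draws[:, 0, 0], [2.5, 97.5])   # interval for the imputation
```

The spread of the replicates at the missing cell gives the uncertainty interval reported in the tables.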
We apply all six methods to the data, yielding an estimate $\hat{\tau}$ for each of the three mandates: maskwearing directives, travel restrictions, and vaccine requirements. A summary of the results is displayed in Tables 1, 2, and 3. Each table has five columns. The first column enumerates the six methods considered in this analysis. The second column contains the insample mean squared error (MSE), calculated among the observed entries of $\mathbf{Y}$ (on the log scale). The third column contains the outofsample mean squared error (MSE), approximated from the observed entries of $\mathbf{Y}$ using leaveoneout cross validation. (Note that MSE is only calculated using the observed entries of $\mathbf{Y}$, again on the log scale.) The fourth column contains the estimate of $\tau$, $\hat{\tau}$. The last column depicts a 95percent uncertainty interval for $\hat{\tau}$ on the logscale. (The red dot represents $\log(\hat{\tau})$.) The dotted line represents 1. For $\hat{\tau}$ above the line, the average mandate prevented the majority of the fatalities that would have occurred without the mandate (in the states where the mandate was adopted). For $\hat{\tau}$ below the line, the average mandate prevented the minority of fatalities.
We find the proposed method (labeled "tensor completion" in the tables) has the best outofsample MSE for every mandate and the best insample MSE for two of the three mandates. Outofsample, its accuracy is by far the best, while insample its accuracy is somewhat similar to the matrix completion methods. This suggests that the additional outcomes prevent the model from overfitting. For example, in Table 1, matrix completion (1) and matrix completion (2) have roughly half and onefourth the insample error of tensor completion. But when the outofsample MSE is approximated by leaveoneout cross validation, they have roughly three and four times the error. More importantly, the estimated effects of the mandate are unrealistically high—matrix completion (1) and matrix completion (2) suggest the number of fatalities would be five and nine times higher without the mandate, respectively. That is, if these estimates are to be believed, an 80 or 90 percent reduction in fatalities might be expected (on average) by implementing a maskwearing directive—significantly higher than both the randomized trial and the observational studies presented in Section 2.
Overall, tensor completion produces the lowest estimate of mandate effectiveness, sometimes by an order of magnitude. We note that loglinear model (3) also models the additional outcomes as random, but, due perhaps to the inflexibility of the parametric model, it does not obtain a competitive MSE either insample or outofsample. Indeed, in Table 2, the error is much greater than for loglinear model (1) or loglinear model (2). (If unconvinced by the relatively high MSE, we also point out that the estimated effect is also unreasonable.) This partially supports our speculation that scientists may not consider including multiple outcomes because they worry the data may be incorporated incorrectly and increase error instead of reducing it. We believe these results support our intuition that tensor completion offers a compelling solution. We further investigate this point in the next subsection using a simulation study.
One might wonder whether it is sufficient to include multiple outcomes as covariates. We argued in Section 2 that this solution is unappealing on theoretical grounds. Outcomes are measured together with error or influenced by similar unobserved factors, and thus we argue should be treated as jointly random. The results appear to support this argument. That is, including multiple outcomes as covariates does not appear to be a viable substitute for including them as outcomes. All three tables indicate that adding multiple outcomes as covariates, either using loglinear models (loglinear model (2)) or matrix completion (matrix completion (2)) reduces MSE insample, but does not appear to improve MSE outofsample. We conclude that additional outcomes should be included in the model as outcomes.
3.3 Simulations show multiple outcomes can better adjust for confounding when combined using the proposed method.
The main insight of this paper is that tensor completion provides a simple and flexible approach for combining multiple outcomes that does not rely on strong parametric assumptions. In comparison, traditional methods that rely on a univariate outcome or combine multiple outcomes using strong parametric assumptions may not be sufficiently flexible to study the effect of COVID19 mandates. To test this insight, we conduct a series of simulation studies. In Simulation 1, we sample from a simple loglinear model and demonstrate that matrix and tensor completion are competitive with maximumlikelihoodbased imputations, where the likelihood contains the true model. Simulations 2 and 3 complicate the loglinear model, first by adding interactions and then by mixing with Poisson "noise". In Simulation 2, matrix completion is no longer competitive. In Simulation 3, tensor completion is the only reasonably accurate method.
Throughout the simulations we assume for every state $i$ and season $t$ that $Y_{it}(0) = 1.11 \cdot Y_{it}(1)$. That is, COVID19 fatalities would be 11 percent higher if the mandate were not implemented for any state in any period. (It follows that the average, $\tau = .11$.) Eleven percent was chosen to agree with the randomized controlled trial conducted by Abaluck et al. Thus, the goal of this section is to determine whether the much larger effects estimated by the observational studies in Talic et al. and the preceding section can be explained by model misspecification—which we interpret as unobserved confounding.
In our simulations, the number of fatalities without the mandate, $Y_{it}(0)$, is randomly distributed according to one of three models, labeled Simulation 1, Simulation 2, and Simulation 3. We only consider one additional outcome, and we do not consider covariates and offsets. (More complicated simulations produced similar results, but in our opinion did not provide additional insight.) Let $\eta_{it}$ denote the logmean of the negative binomial distribution and $\theta$ the size. We simulate from the following three distributions (with parameters defined below).
We assume one hundred observations are missing at random. (The mandate is adopted for one hundred stateseason pairs.) Similar results are obtained when simulated states adopt mandates completely at random, completely at random within season, or (inversely) proportional to the number of fatalities. (The last case could arise if the "healthier" states are more likely to choose the policy during "healthy" seasons. For example, a state waits until after summer vacation to adopt a mandate for fear of losing tourism revenue.) Specifically, in the last case, we assume stateseasons adopt mandates (without replacement) with probability determined by the number of fatalities, centered and scaled across states each season. The remaining parameters are set so that $\lambda_t$ takes one value during the Spring and Winter and another during the Summer and Fall, reflecting a possible cyclical fluctuation of the pandemic that is constant on average across states but not reflected in the additional outcome.
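A data-generating sketch in the spirit of these simulations (the specific parameter values below are invented, since the paper's constants are not reproduced in this copy): a two-way Poisson model with a cyclical season factor absent from the control outcome, a constant relative effect of .11, and one hundred treated stateseasons assigned at random.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 50, 8                   # states x seasons
tau = 0.11                     # constant relative effect, as in the text

# Illustrative parameter values (the paper's specific constants are not
# reproduced here).
alpha = rng.normal(4.0, 0.5, size=N)                 # state effects (log scale)
beta = rng.normal(0.0, 0.3, size=T)                  # season effects
lam = np.where(np.arange(T) % 2 == 0, 0.5, -0.5)     # hypothetical cyclical factor

# Fatalities without the mandate; the cyclical factor enters the outcome
# of interest but not the control outcome below.
Y0 = rng.poisson(np.exp(alpha[:, None] + beta[None, :] + lam[None, :])).astype(float)
Y1 = Y0 / (1.0 + tau)          # mandate lowers fatalities: Y(0) = 1.11 * Y(1)

# Control outcome shares the state effects but not the cyclical factor.
beta_z = rng.normal(0.0, 0.3, size=T)
Z = rng.poisson(np.exp(alpha[:, None] + beta_z[None, :])).astype(float)

# One hundred state-season pairs adopt the mandate at random.
W = np.zeros((N, T), dtype=int)
W.ravel()[rng.choice(N * T, size=100, replace=False)] = 1
Y = np.where(W == 1, Y1, Y0)   # observed outcome
```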
We compare tensor completion to the models described in the preceding section using this simulated data. The results for Simulations 1, 2, and 3 are displayed in Tables 4, 5, and 6, respectively. Each table has five columns. The first column enumerates the four methods compared in the simulation study. (To keep the simulations simple, we do not consider covariates or offsets.) The second column contains the mean squared prediction error (MSE) between the number of fatalities predicted by each method as described in the previous subsection and the true number of fatalities. The third column contains the estimand, $\tau$, set to .11 in all simulations. The fourth column contains the estimate $\hat{\tau}$ from applying each method, as described in the previous subsection. The last column depicts the mean absolute percentage prediction error (MAPE) for each state and period during which the mandate was adopted (the line shows the inner 95 percent of MAPEs on the log scale; the red dots are the estimates and tend to be in the center of the intervals on the level scale). The dotted line represents the value 1. Therefore, if the MAPE for a state and period is above 1, the prediction is more than 100 percent from the true value. If a MAPE is below 1, the prediction is less than 100 percent from the true value. (Note the intervals in these tables are different from the intervals in Tables 1 through 3. The goal here is to show that outliers drive the imputation error.)
The simulations reveal that all methods can recover the true effect when the data follow a simple loglinear model (Simulation 1). However, when interactions and Poisson "noise" are added (Simulation 3), all estimates except that from the proposed method become five to twentyfive times greater than the truth—consistent with the empirical paradox presented in the previous section. In particular, column 2 of Table 4 shows all methods produce low MSE in Simulation 1, and column 5 shows the MAPE is well below 1 for all states and periods that adopted the mandate. On the other hand, column 2 of Table 6 shows only the proposed method produces low MSE in Simulation 3, and column 5 shows that only for the proposed method is the MAPE below 1 for all states and periods that adopted the mandate. Simulation 2 (Table 5) shows the loglinear model is accurate if the Poisson "noise" is removed and thus the model is correctly specified. However, matrix completion is not accurate—producing an estimate four times greater than the truth.
We conclude that univariate longitudinal methods (loglinear model (1) and matrix completion (1)) are highly sensitive to stateseason interactions, and the loglinear models are highly sensitive to model violations. In other words, the simulations demonstrate how small changes to the data generating process might lead to a significant overestimation of the effect of the mandate. In particular, the fifth column reveals that this is driven by outliers; in all simulations the MAPE for tensor completion stays below 1.
We believe these simulations reveal how tensor completion provides a simple and flexible approach for combining multiple outcomes that does not rely on strong parametric assumptions. In comparison, traditional methods that rely solely on a univariate outcome or combine multiple outcomes using strong parametric assumptions may not be flexible enough to impute the missing entries of longitudinal data. In the next section, we provide a theoretical justification of the proposed approach. We also discuss the computation in greater detail.
4 Tensor Completion: Generalizing Matrix Completion to Include Additional Outcomes
In this section, we lay out the proposed approach in greater detail. We begin in Subsection 4.1 with a motivating model, which provides the underlying rationale for using additional outcomes from the tensor completion perspective. We generalize this model in Subsection 4.2 and outline our computational approach in Subsection 4.3, concluding with our statistical analysis and extensions in Subsections 4.4 and 4.5, respectively.
4.1 Motivating model
The motivation for the proposed tensor completion method is simple: by grouping all of the data together in a tensor, we borrow as much information across outcomes as possible. This extra information can be especially important when there are few time periods.
To illustrate this central idea, consider a simplified variant of the log-linear model described in Simulation 1 of the preceding section:
An equivalent representation in matrix form is
(1) 
where here and in the sequel, we simply use as a placeholder for or optionally . Accordingly, the above can also be expressed via the factor model
(2) 
which is a particularly simple instance of a factor model with the second of the two factors being fixed to
and the first factor loading vector being fixed to .

Control outcomes observed along with outcomes of interest can naturally be accommodated within a tensor model built on top of a factor model such as (2). Specifically, in the setting of Simulation 1 of the previous section, we have
(3) 
By stacking on top of we obtain a three-way array (tensor) of dimensions , , and , where here and below the variable will be used for the total number of observed outcomes (the outcome of interest plus the control outcomes). Denote this tensor by with and , , . A Tucker-like decomposition (cf. Appendix B for a summary of selected notions from tensor linear algebra) of is given by
where the symbol refers to the mode product (cf. Appendix B.3), and
Accordingly, the so-called first mode (cf. Appendix B.1) of the tensor , i.e. the matricization of along its first dimension
(4) 
is a matrix of dimension that is easily seen to have rank in view of the rightmost representation in (4) and the observation that .
The gist of the above tensor algebra is that while the addition of the control variable has led to twice the number of entries in the tensor in comparison to the matrix associated with the outcome of interest, the degrees of freedom in the resulting tensor become
, where refers to the underlying rank (here given by ), which is an increase compared to the rank- matrix alone by the factor , i.e. the degrees of freedom are essentially the same as long as the first dimension of the tensor is dominant. This means that imputing missing entries in the tensor becomes a considerably simpler problem when the control outcomes are (almost) fully observed, since the ratio of observed data to degrees of freedom becomes higher than when using the observed entries in alone to perform imputation. While this insight can easily be obtained without any reference to tensors, by noting that the model for the outcome of interest (1) and the control outcome (3) share all but one of the parameters, the approach via tensors generalizes to a much larger variety of factor models. The next subsection is dedicated to a detailed discussion of those generalizations and their implications.
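As a minimal numerical check of this degrees-of-freedom argument (a sketch with hypothetical dimensions and variable names of our own; the paper's symbols were lost in extraction), we can build an outcome matrix and a control matrix that share their row effects, stack them into a tensor, and verify that the mode-1 matricization retains a low rank even though the number of entries has doubled:

```python
import numpy as np

# Hypothetical illustration: an N x T outcome of interest and a control
# outcome that share subject (row) effects, as in the motivating model.
rng = np.random.default_rng(0)
N, T = 50, 8
rows = rng.normal(size=N)   # shared subject effects
cols = rng.normal(size=T)   # time effects
Y_interest = rows[:, None] + cols[None, :]       # outcome of interest
Y_control = rows[:, None] + 0.5 * cols[None, :]  # control, shares row effects

# Stack into an N x T x 2 tensor and matricize along the first mode.
X = np.stack([Y_interest, Y_control], axis=2)  # shape (N, T, 2)
M1 = X.reshape(N, 2 * T)                       # mode-1 matricization

# Every column lies in span{rows, ones}, so the rank stays at 2 even
# though stacking the control doubled the number of entries.
print(np.linalg.matrix_rank(M1))  # 2
```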
4.2 General rationale
As already noted in (2) and the subsequent discussion, the model considered in the previous subsection is a particularly simple factor model based on main effects for subjects and time, or equivalently, row and column effects in terms of the matrix . In the sequel, we drop the log-link in (1) and instead assume for simplicity that refers to the log-transformed outcomes.
A general factor model of the form
where denotes the number of factors, offers considerably more flexibility. In particular, the modeler is freed from the task of selecting specific model terms to include (such as the above row/column effects, certain interaction terms, etc.); instead, the promise of the above factor model is that relevant terms, or synonymously “features”, will be inferred from the data.
As will be elaborated below, the availability of control outcomes can greatly benefit estimation of the above factor model, particularly if the outcome of interest is only partially observed while the control outcomes are (almost) fully observed, as is the case for the data sets studied herein. Loosely speaking, benefits are possible whenever the control outcomes exhibit a reasonably strong (linear) correlation with the outcome of interest. In an ideal scenario, the outcome of interest and the control outcomes are linearly dependent. This is the case under the model
(5) 
for by matrices . Eq. (5) can be seen as a higher-dimensional analog of the model in 4.1 involving only a single factor, i.e. . Combining the above models for the outcome of interest and the control variables leads to the following model for the tensor with and , , with dimensions , , and , obtained by stacking all outcomes:
where is the tensor obtained by stacking the matrices and , and
denotes an identity matrix whose dimension is specified in the subscript. The above tensor
involves degrees of freedom, whereas the total number of entries in the tensor is given by . The ratio of the latter two quantities is given by (6)
in the regime , where indicates an upper bound up to an absolute constant. By contrast, if we instead considered the factor model for the outcome of interest without the inclusion of the control outcomes, the corresponding ratio would scale as . As shown in Theorem 1 below, the mean squared error (MSE) in estimating (including the case of missing entries in ) essentially scales like the ratio (6), whereas in the corresponding problem concerning estimation of the MSE again scales as . The latter observation indicates that reliable estimation puts a stringent limit on if is small, i.e. if only a small to moderate number of time periods is observed.
Clearly, the above considerations rest on the idealized assumption of perfect linear dependence between the outcome of interest and the control outcomes, yet the gist of the argument still goes through if the rank increases only slowly with the number of control outcomes. In general, improvement via augmentation with control outcomes is not guaranteed: in the worst case, the control outcomes are all orthogonal to the outcome of interest. That situation, however, runs counter to the idea of borrowing strength and to the very concept of control outcomes.
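To make the counting concrete, consider a small hypothetical example (the symbols N, T, K, r below are our own stand-ins for the tensor dimensions and rank, which did not survive extraction). Under a sharing scheme in which the K stacked outcomes have common subject factors but their own time loadings, the number of observed entries per free parameter grows once the controls are included:

```python
# Hypothetical dimensions: N subjects, T time periods, K outcomes, rank r.
N, T, K, r = 50, 8, 5, 2

# Single outcome: N*T entries vs roughly r*(N + T) free parameters.
ratio_single = (N * T) / (r * (N + T))

# Stacked tensor: N*T*K entries; shared N x r subject factors plus
# per-outcome T x r time loadings give roughly r*(N + K*T) parameters.
ratio_tensor = (N * T * K) / (r * (N + K * T))

print(ratio_single, ratio_tensor)  # the stacked ratio is markedly larger
```

The improvement mirrors the ratio argument above: when the first dimension dominates, stacking controls multiplies the data available per parameter roughly by the number of outcomes.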
4.3 Computational Approach
Let denote the tensor obtained by stacking and the control outcomes , and denote by the indices of the entries that are observed. Furthermore, consider the associated Frobenius (semi)norm with respect to on the space of three-way tensors of dimensions :
If the rank of the tensor is known, one possible approach to tensor completion is to (approximately) compute a CP or Tucker factorization via alternating least squares (ALS, [63, 64]) with respect to . Apparent downsides are that is typically not known and that ALS is not equipped with any optimality guarantees in general, given the fundamental computational hardness of the problem arising from nonconvexity [63, 65, 58, 57, 56, 66, 67].
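For concreteness, ALS for a CP factorization of a fully observed third-order tensor can be sketched in a few lines (our own Python illustration of the general idea cited above, not the paper's implementation; the missing-entry weighting that completion requires is omitted):

```python
import numpy as np

def cp_als(X, rank, n_iter=200, seed=0):
    """Minimal CP decomposition of a 3-way tensor via alternating least
    squares. Illustrative only: assumes X is fully observed and, as noted
    in the text, carries no general optimality guarantee."""
    rng = np.random.default_rng(seed)
    dims = X.shape
    factors = [rng.normal(size=(d, rank)) for d in dims]
    for _ in range(n_iter):
        for mode in range(3):
            # Khatri-Rao product of the two factors being held fixed.
            A, B = (factors[m] for m in range(3) if m != mode)
            kr = np.einsum('ir,jr->ijr', A, B).reshape(-1, rank)
            # Mode-n matricization of X, matching the Khatri-Rao ordering.
            Xn = np.moveaxis(X, mode, 0).reshape(dims[mode], -1)
            # Least-squares update of the free factor.
            factors[mode] = Xn @ kr @ np.linalg.pinv(kr.T @ kr)
    approx = np.einsum('ir,jr,kr->ijk', *factors)
    return factors, approx
```

On an exactly low-rank input this typically converges to a near-exact fit, but the nonconvexity discussed above means convergence to a global optimum is not guaranteed.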
An alternative is to consider an optimization problem of the form
where is a regularizer that encourages low-rank solutions. Choosing
as the tensor rank would lead to another computationally hard problem. Unfortunately, convex surrogates such as the tensor nuclear norm (the analog of the nuclear norm for matrices) are computationally hard to compute as well
[58, 67, 68]. An established approach [65, 66] that sidesteps these computational difficulties is based on matricization (aka matrix unfolding, cf. B.1), in which case the regularizer is taken as a convex combination of matrix nuclear norms associated with each mode of the tensor. This yields the convex optimization problem (7)
where denotes the matrix nuclear norm, i.e.
the sum of singular values, and the
are nonnegative weights summing to one. In the specific setting under consideration here, the three modes of are of dimension , , and . In the regime as discussed above, it is appropriate to consider only the first mode, since the rank of the two other modes is already upper bounded by and , respectively. Accordingly, we use the weights , . Since the first term in (7) only involves the vectorization of the tensor entries (i.e. the specific tensor structure does not enter), the resulting optimization problem boils down to the following matrix completion problem with nuclear norm regularization (8)
where, with some slight overload of notation, and here refer to the observed entries in .
Even though matricization in general may lead to suboptimal statistical performance of tensor completion, this is not the case in the setting here, assuming the aforementioned scaling of relative to and . In fact, tensor completion based on (8) is shown to yield optimal rates up to logarithmic factors in the subsequent section.
4.4 Analysis
In the following, we present a non-asymptotic bound on the mean squared error of the estimator (8). The result below is based on established analysis techniques from the literature on high-dimensional statistics, as summarized in the monograph [69]. We start by listing the underlying assumptions.
(A1). The three-way tensor of dimensions , , and is generated according to the model
where the tensor
contains independent random variables with zero mean satisfying the Bernstein moment condition
for all , for absolute constants .

(A2). Let denote the set of observed entries in the matrix for the outcome of interest , and let denote the set of observed entries for the control outcomes . Accordingly, let , . It is assumed that the are sampled independently and uniformly at random from . Moreover, it is assumed that such that
and absolute constants , .
(A3). The dimensions of the tensor are sufficiently large in the sense that

Assumption (A1) corresponds to an additive error model with i.i.d. errors having subexponential tails, which facilitates the use of the Matrix Bernstein inequality. Assumption (A3) requires a mild lower bound on the dimensions of the tensor. Assumption (A2) specifies the sampling model for the observed entries in the tensor . While the assumption of uniform-at-random sampling is hardly realistic in most applications, this or similar assumptions are commonly employed in the literature on matrix/tensor completion [44, 45, 42, 43, 59]. Regarding the number of observed entries , it is assumed that all control outcomes are observed up to a fraction of , while for the outcome of interest the number of entries is required to be proportional to the degrees of freedom of the underlying matrix (modulo a logarithmic factor). The required number of observed entries also depends on the “spikiness” constants and ; different variations of also occur across the aforementioned literature. In brief, low spikiness constants rule out situations in which the tensor to be recovered concentrates on a few entries of large magnitude, in which case matrix completion becomes an ill-posed problem (cf. 10.6 in [69] for a more detailed discussion).
The assumptions in (A2) can be substantially relaxed at the expense of weaker recovery results or increased technical sophistication in the proofs; for example, the analysis in [3] covers sampling models such as the staggered adoption setting in Fig. 1. Such relaxations are not pursued here since the theoretical analysis is not considered the primary contribution of the present paper. The purpose of the following theorem is to confirm the gist of the argument in Eq. (6) regarding the potential benefits from the use of control outcomes.
Theorem 1
Let denote any minimizer of (8) with corresponding back-folded tensor . Under assumptions (A1) through (A3), for any with , it holds that
(9) 
with probability at least
where is a positive constant.
In order to better grasp the implications of the above theorem, it is easiest to replace in (9) by the “optimal” choice , which yields the root mean square error bound
which is proportional to in the regime , modulo a logarithmic factor. Note that the rate highlights the improvement that can be achieved by the addition of control outcomes, and is in exact correspondence with (6). This correspondence implies the rate optimality of (9) up to a logarithmic factor.
4.5 Extensions
(i) Inclusion of covariates. Covariates that are relevant to the outcome of interest (i.e. the top matrix layer of the tensor) can be accommodated in a straightforward manner as described in 8.1 in [3]. Specifically, we may consider the following model for the outcome of interest
where is the top layer of an underlying low-rank tensor , is a matrix of subject-specific covariates, is a matrix of time-specific covariates, and the coefficient matrix constitutes an unknown parameter that needs to be estimated. Note that in the absence of time-specific covariates, we may simply use (and analogously in the absence of subject-specific covariates). The parameter can be estimated by a simple modification of the data-fitting term in the objective (8); the resulting modified objective in and remains (jointly) convex and can be solved easily with established methods. Under the assumption that is small, the matrix is of fixed (low) rank, which aligns with the low-rank/factor model assumption for overall. Consequently, there is in general no need to add a separate regularization term for the estimation of .
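One way to carry out the estimation just described is to alternate a least-squares update of the coefficient matrix with a soft-thresholded SVD update of the low-rank layer. The Python sketch below is our own illustrative approximation (the names Y, U, V, B, L are hypothetical; the modified objective described above is jointly convex and would typically be solved jointly rather than by this alternation):

```python
import numpy as np

def fit_with_covariates(Y, mask, U, V, lam, n_iter=50):
    """Sketch: model Y ~ L + U @ B @ V.T with L low rank, alternating
    (1) a least-squares update of B and (2) a soft-thresholded SVD
    update of L. Illustrative approximation, not the paper's solver."""
    L = np.zeros(Y.shape)
    for _ in range(n_iter):
        # (1) Project the masked residual onto the covariate bases.
        R = np.where(mask, Y - L, 0.0)
        B = np.linalg.pinv(U) @ R @ np.linalg.pinv(V).T
        # (2) Soft-impute step for the low-rank layer on Y - U B V^T.
        R2 = Y - U @ B @ V.T
        filled = np.where(mask, R2, L)
        Us, s, Vt = np.linalg.svd(filled, full_matrices=False)
        L = (Us * np.maximum(s - lam, 0.0)) @ Vt
    return L, B
```

With full observation and covariates of full column rank, the coefficient update recovers the covariate effect exactly; under missingness the masked projection is only an approximation.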
While the above representation is sufficient for basic covariate models, it is not general enough, e.g. to allow covariates that vary across both subject and time. In the context of this paper, a relevant example is the number of vaccinated individuals per state. Such covariates can be included similarly with a suitable construction of (or ) and coefficient matrix , but the resulting product is no longer of low rank (relative to ). In our data analyses, we did not observe any issues in the estimation of such models, despite the departure from the low-rank assumption.
(ii) Higher-order tensors. While this paper focuses on tensors of order three, the proposed approach extends to higher-order tensors. For example, an issue that has attracted much interest is the potential for disparate impacts of COVID-19 on different demographic groups (e.g. strata defined by combinations of race and gender). Data stratified in this way would yield a tensor of order four, assuming control outcomes that vary across the strata. The only necessary modification concerns computation: for a tensor of order three, there are three canonical matricizations given by its three modes, whereas for a tensor of order four with dimensions , one may consider matricizations of dimensions as well as , for any permutation of . The matricization should be chosen so that the resulting matrix is as close to square as possible, since in that case one obtains the best statistical recovery guarantees [57].
5 Discussion
We propose a new approach for estimating causal effects with multivariate longitudinal data that uses tensor completion. Our motivation is to evaluate government mandates adopted during the pandemic, such as travel restrictions, mask-wearing directives, and vaccine requirements. Specifically, we present an empirical paradox in which observational studies estimate the benefit of mask-wearing directives to be five times higher than a randomized controlled trial (cf. Talic et al., BMJ, 2021; Abaluck et al., Science, 2021). We recreate this paradox when we apply current, popular methods to the observational data. But when we apply the proposed method, it agrees with the randomized controlled trial in the following sense: the observational studies suggest the average mask-wearing directive prevented the worst of the pandemic (the majority of cases or fatalities that would have occurred had the mandate not been adopted were prevented), whereas the randomized study and the proposed method suggest the worst was not prevented. Our interpretation is that the proposed method better adjusts for confounding factors than current, popular methods. We provide several justifications supporting this interpretation: we conduct a simulation study, and we show that under standard regularity conditions, combining multiple outcomes improves the accuracy of counterfactual imputations.
We believe the proposed method is sufficiently sophisticated to address the motivating problem, both in terms of evaluating the effect of COVID-19 mandates generally and investigating the empirical paradox specifically. However, other applications might benefit from extending the approach. In the remainder of this section, we discuss several such extensions. To motivate each extension, we state a hypothetical research question about COVID-19 mandates that was not addressed by our analysis but could be of interest to policymakers. We then describe an extension that addresses that question. We do not discuss theoretical or computational extensions of tensor completion itself; that discussion appears at the end of the previous section.
The main outcome of interest in this paper is the number of COVID-19 fatalities. Additional outcomes, such as the number of flu fatalities, were included only to better understand the main outcome. Nevertheless, the main outcome could be augmented to include other outcomes of interest, such as the number of COVID-19 positive tests, the number of COVID-19 cases, and the number of COVID-19 hospitalizations. These outcomes would be considered missing in the states and periods during which the mandate was adopted. But by including these additional outcomes, more accurate imputations may be possible, not only for the number of COVID-19 fatalities but for the number of COVID-19 positive tests, cases, and hospitalizations as well, when compared to an analysis that treats each outcome separately.
The proposed method depends on a regularization or tuning parameter, , which is determined by cross-validation. Other decisions could also be made with the aid of cross-validation. For example, we choose four additional outcomes that we believe best represent unobserved confounding factors (such as travel patterns from commuting and tourism) that explain both whether a mandate is adopted and the number of COVID-19 fatalities. We justify our choice of additional outcomes by estimating the out-of-sample mean squared error (using leave-one-out cross-validation) and obtaining a smaller value than competing methods. But a different mandate or a desire for more accurate counterfactual imputations might require searching over a larger number of additional outcomes or considering transformations of those outcomes. These decisions could be made using cross-validation, perhaps with a greedy stepwise algorithm.
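The entry-holdout cross-validation used to make such decisions can be sketched generically (our own Python illustration; the solver interface and names are hypothetical):

```python
import numpy as np

def cv_score(M, mask, lam, solver, rng, n_folds=5):
    """Hold out random subsets of the observed entries, impute them with
    the supplied completion routine, and return the mean squared
    held-out error. `solver(M, train_mask, lam)` is any routine that
    returns a fully imputed matrix."""
    obs = np.argwhere(mask)
    rng.shuffle(obs)
    errs = []
    for fold in np.array_split(obs, n_folds):
        train_mask = mask.copy()
        train_mask[fold[:, 0], fold[:, 1]] = False   # hide this fold
        Z = solver(M, train_mask, lam)
        diff = Z[fold[:, 0], fold[:, 1]] - M[fold[:, 0], fold[:, 1]]
        errs.append(np.mean(diff ** 2))
    return float(np.mean(errs))
```

A greedy stepwise search over candidate control outcomes would call such a score once per candidate set and keep the set with the lowest held-out error.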
We estimated the effect of the mandate for the states and seasons that adopted the mandate (i.e. the so-called effect of treatment on the treated). This effect is useful for discussing the number of fatalities that were prevented by a mandate. However, governments may also be interested in the number of fatalities that could have been prevented by a mandate. That is, we could have estimated the effect of the mandate for the states and seasons during which the mandate was not adopted (i.e. the so-called effect of treatment on the untreated). To estimate this effect, the proposed method could still be applied, albeit with a different set of observations considered "missing." (This assumes that the states and periods that did not adopt the mandate would hypothetically adopt mandates similar to those that did.) The two approaches could be combined to estimate the effect of the mandate had it been implemented by all states. Indeed, a great many effects could be calculated with the proposed method using this or similar strategies.
We could also estimate a different effect by designing a different observational study. That is, we could arrange the data differently before applying the proposed method. For example, we studied mandates as implemented (and not implemented) by states; we assume that if a state did not adopt a mandate, policies and practices similar to states that did not adopt the mandate would prevail. This point is worth stressing: a "beneficial" mask-wearing directive that reduces transmission could in fact increase fatalities if, absent the mask-wearing directive, the state would have implemented a "more beneficial" travel ban that better reduces transmission. A different study design could be constructed to estimate the number of fatalities prevented by a mandate "holding all else equal." Another study could be designed to estimate the number of fatalities prevented by mandates were residents to perfectly comply. We argue these effects have limited policy use, but observational studies could be designed to estimate them nonetheless.
We assume that if a mandate is adopted at any point during a season, the mandate is in effect for the entire season. We do this after careful reflection on how the typical state uses a mandate and how the typical analysis studies those mandates. Our conclusion is that a short but strict mandate can be more effective than a long but lenient one, and we did not want to subjectively classify policies based on duration or strictness. We are satisfied with this decision and believe it is consistent with the typical observational study. Nevertheless, a reasonable alternative may be to weight the outcome according to how long the mandate was in effect (much like person-year exposure calculations in case-cohort studies [34, Volume II page 45]). However, such weights are not as simple to construct as determining when, say, an employee was working at the factory and thus exposed to a carcinogen (as is typically done in such case-cohort studies). Mandates may be unevenly enforced across a state. In this sense, one might interpret our estimates as conservative, since they include some mandates that are only "on the books." In that case, however, the large estimates produced by traditional methods should be even more alarming.
We conclude that the proposed method should be used to supplement traditional methods. We demonstrate that the method does not make such strong parametric assumptions about the relationship between jurisdictions and time periods, can more accurately measure the effect of the government mandates adopted during the COVID-19 pandemic, and therefore should be considered a viable tool for studying the pandemic. We believe the proposed method is particularly timely as governments increasingly rely on longitudinal data to choose between mandates. Governments should seek methods that borrow experience from all available sources, both control jurisdictions and control outcomes, to choose mandates that best balance public health and individual liberties as new waves of COVID-19 variants are expected to follow delta and omicron [8].
Acknowledgments
This research was supported in part by a grant from the COVID-19 Database
References
 [1] Paul W Holland. Statistics and causal inference. Journal of the American Statistical Association, 81(396):945–960, 1986.
 [2] Donald B Rubin. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331, 2005.
 [3] Susan Athey, Mohsen Bayati, Nikolay Doudchenko, Guido Imbens, and Khashayar Khosravi. Matrix completion methods for causal panel data models. Journal of the American Statistical Association, pages 1–15, 2021.
 [4] Walter F Willcox. The founder of statistics. Revue de l’Institut International de Statistique, pages 321–328, 1938.
 [5] John Graunt. Natural and political observations made upon the bills of mortality. Number 2. Johns Hopkins Press, 1939.
 [6] Stella Talic, Shivangi Shah, Holly Wild, Danijela Gasevic, Ashika Maharaj, Zanfina Ademi, Xue Li, Wei Xu, Ines MesaEguiagaray, Jasmin Rostron, et al. Effectiveness of public health measures in reducing the incidence of covid19, sarscov2 transmission, and covid19 mortality: systematic review and metaanalysis. bmj, 375, 2021.
 [7] Jason Abaluck, Laura H Kwong, Ashley Styczynski, Ashraful Haque, Md Alamgir Kabir, Ellen BatesJefferys, Emily Crawford, Jade BenjaminChung, Shabib Raihan, Shadman Rahman, et al. Impact of community masking on covid19: A clusterrandomized trial in bangladesh. Science, page eabi9069, 2021.
 [8] Ewen Callaway. Beyond omicron: what’s next for covid’s viral evolution. Nature, 600(7888):204–207, 2021.
 [9] Ting Tian, Jianbin Tan, Wenxiang Luo, Yukang Jiang, Minqiong Chen, Songpan Yang, Canhong Wen, Wenliang Pan, and Xueqin Wang. The effects of stringent and mild interventions for coronavirus pandemic. Journal of the American Statistical Association, pages 1–11, 2021.
 [10] Victor Chernozhukov, Hiroyuki Kasahara, and Paul Schrimpf. Causal impact of masks, policies, behavior on early covid19 pandemic in the us. Journal of econometrics, 220(1):23–62, 2021.
 [11] Gery P Guy Jr, Florence C Lee, Gregory Sunshine, Russell McCord, Mara HowardWilliams, Lyudmyla Kompaniyets, Christopher Dunphy, Maxim Gakh, Regen Weber, Erin SauberSchatz, et al. Association of stateissued mask mandates and allowing onpremises restaurant dining with countylevel covid19 case and death growth rates—united states, march 1–december 31, 2020. Morbidity and Mortality Weekly Report, 70(10):350, 2021.
 [12] Jenna Gettings, Michaila Czarnik, Elana Morris, Elizabeth Haller, Angela M ThompsonPaul, Catherine Rasberry, Tatiana M Lanzieri, Jennifer SmithGrant, Tiffiany Michelle Aholou, Ebony Thomas, et al. Mask use and ventilation improvements to reduce covid19 incidence in elementary schools—georgia, november 16–december 11, 2020. Morbidity and Mortality Weekly Report, 70(21):779, 2021.
 [13] Miriam E Van Dyke, Tia M Rogers, Eric Pevzner, Catherine L Satterwhite, Hina B Shah, Wyatt J Beckman, Farah Ahmed, D Charles Hunt, and John Rule. Trends in countylevel covid19 incidence in counties with and without a mask mandate—kansas, june 1–august 23, 2020. Morbidity and Mortality Weekly Report, 69(47):1777, 2020.
 [14] Sohee Kwon, Amit D Joshi, ChunHan Lo, David A Drew, Long H Nguyen, ChuanGuo Guo, Wenjie Ma, Raaj S Mehta, Fatma Mohamed Shebl, Erica T Warner, et al. Association of social distancing and face mask use with risk of covid19. Nature Communications, 12(1):1–10, 2021.
 [15] Katherine A Auger, Samir S Shah, Troy Richardson, David Hartley, Matthew Hall, Amanda Warniment, Kristen Timmons, Dianna Bosse, Sarah A Ferris, Patrick W Brady, et al. Association between statewide school closure and covid19 incidence and mortality in the us. Jama, 324(9):859–870, 2020.
 [16] Song Gao, Jinmeng Rao, Yuhao Kang, Yunlei Liang, Jake Kruse, Dorte Dopfer, Ajay K Sethi, Juan Francisco Mandujano Reyes, Brian S Yandell, and Jonathan A Patz. Association of mobile phone location data indications of travel and stayathome mandates with covid19 infection rates in the us. JAMA network open, 3(9):e2020485–e2020485, 2020.
 [17] Hamada S Badr, Hongru Du, Maximilian Marshall, Ensheng Dong, Marietta M Squire, and Lauren M Gardner. Association between mobility patterns and covid19 transmission in the usa: a mathematical modelling study. The Lancet Infectious Diseases, 20(11):1247–1254, 2020.
 [18] Thomas Hale, Noam Angrist, Rafael Goldszmidt, Beatriz Kira, Anna Petherick, Toby Phillips, Samuel Webster, Emily CameronBlake, Laura Hallas, Saptarshi Majumdar, et al. A global panel database of pandemic policies (oxford covid19 government response tracker). Nature Human Behaviour, 5(4):529–538, 2021.
 [19] Mostafa Shokoohi, Mehdi Osooli, and Saverio Stranges. Covid19 pandemic: What can the west learn from the east? International Journal of Health Policy and Management, 9(10):436, 2020.
 [20] Rajib Shaw, Yongkyun Kim, and Jinling Hua. Governance, technology and citizen behavior in pandemic: Lessons from covid19 in east asia. Progress in disaster science, 6:100090, 2020.
 [21] Mingming Ma, Shun Wang, and Fengyu Wu. Covid19 prevalence and wellbeing: Lessons from east asia. World Happiness Report 2021, page 57, 2021.
 [22] Paul R Rosenbaum. Design of observational studies, volume 10. Springer, 2010.
 [23] Sarah H Gordon, Nicole Huberfeld, and David K Jones. What federalism means for the us response to coronavirus disease 2019. In JAMA Health Forum, volume 1, pages e200510–e200510. American Medical Association, 2020.
 [24] Garrett Fitzmaurice, Marie Davidian, Geert Verbeke, and Geert Molenberghs. Longitudinal data analysis. CRC press, 2008.
 [25] Jonathan AC Sterne, Miguel A Hernán, Barnaby C Reeves, Jelena Savović, Nancy D Berkman, Meera Viswanathan, David Henry, Douglas G Altman, Mohammed T Ansari, Isabelle Boutron, et al. Robinsi: a tool for assessing risk of bias in nonrandomised studies of interventions. bmj, 355, 2016.
 [26] David Zweig. The cdc’s flawed case for wearing masks in school. The Atlantic, 2021.
 [27] David Michaels and Gregory R Wagner. Occupational safety and health administration (osha) and worker safety during the covid19 pandemic. Jama, 324(14):1389–1390, 2020.
 [28] Communications Perspectives: COVID19 Resources. Lessons from covid19 on executing communications and engagement at the community level during a health crisis. 2021.
 [29] Gill Newton. Diagnosing rickets in early modern england: Statistical evidence and social response. Social History of Medicine, 2021.
 [30] William G Cochran. The planning of observational studies of human populations. Journal of the Royal Statistical Society. Series A (General), 128(2):234–266, 1965.
 [31] Austin Bradford Hill. The environment and disease: association or causation?, 1965.
 [32] Richard Doll and Austin Bradford Hill. Mortality in relation to smoking: ten years’ observations of british doctors. British medical journal, 1(5395):1399, 1964.
 [33] Anna M Lee and Joseph F Fraumeni Jr. Arsenic and respiratory cancer in man: an occupational study. Journal of the National Cancer Institute, 42(6):1045–1052, 1969.
 [34] Norman E Breslow, Nicholas E Day, and Elisabeth Heseltine. Statistical methods in cancer research, volume 1. International Agency for Research on Cancer Lyon, 1980.
 [35] Paul R Rosenbaum. From association to causation in observational studies: The role of tests of strongly ignorable treatment assignment. Journal of the American Statistical Association, 79(385):41–48, 1984.
 [36] Guido W Imbens and Donald B Rubin. Causal inference in statistics, social, and biomedical sciences. Cambridge University Press, 2015.
 [37] Marc Lipsitch, Eric Tchetgen Tchetgen, and Ted Cohen. Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology, 21(3):383, 2010.
 [38] Wang Miao, Xu Shi, and Eric Tchetgen Tchetgen. A confounding bridge approach for double negative control inference on causal effects. arXiv preprint arXiv:1808.04945, 2018.
 [39] Geert Verbeke, Steffen Fieuws, Geert Molenberghs, and Marie Davidian. The analysis of multivariate longitudinal data: a review. Statistical methods in medical research, 23(1), 2014.
 [40] Jasson DM Rennie and Nathan Srebro. Fast maximum margin matrix factorization for collaborative prediction. In Proceedings of the 22nd international conference on Machine learning, pages 713–719, 2005.
 [41] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
 [42] Emmanuel J Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
 [43] Emmanuel J Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.
 [44] Vladimir Koltchinskii, Karim Lounici, and Alexandre B Tsybakov. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. The Annals of Statistics, 39(5):2302–2329, 2011.
 [45] Sahand Negahban and Martin J Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, 13(1):1665–1697, 2012.
 [46] Rahul Mazumder, Trevor Hastie, and Robert Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11:2287–2322, 2010.
 [47] R. Ge, J. Lee, and T. Ma. Matrix completion has no spurious local minimum. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 29, 2016.
 [48] Chao Gao, Yu Lu, Zongming Ma, and Harrison H Zhou. Optimal estimation and completion of matrices with biclustering structures. The Journal of Machine Learning Research, 17(1):5602–5630, 2016.
 [49] Tianxi Li, Elizaveta Levina, and Ji Zhu. Network cross-validation by edge sampling. Biometrika, 107(2):257–276, 2020.
 [50] Shuhang Gu, Lei Zhang, Wangmeng Zuo, and Xiangchu Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2862–2869, 2014.
 [51] Nagarajan Natarajan and Inderjit S Dhillon. Inductive matrix completion for predicting gene–disease associations. Bioinformatics, 30(12):i60–i68, 2014.
 [52] Marie-Hélène Descary and Victor M Panaretos. Functional data analysis by matrix completion. The Annals of Statistics, 47(1):1–38, 2019.
 [53] Hossein Alidaee, Eric Auerbach, and Michael P Leung. Recovering network structure from aggregated relational data using penalized regression. arXiv preprint arXiv:2001.06052, 2020.
 [54] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1–22, 1977.
 [55] George Casella and Roger L Berger. Statistical inference. Cengage Learning, 2021.
 [56] B. Barak and A. Moitra. Noisy tensor completion via the sum-of-squares hierarchy. In Conference on Learning Theory (COLT), pages 417–445, 2016.
 [57] A. Montanari and N. Sun. Spectral algorithms for tensor completion. Communications on Pure and Applied Mathematics, 71:2381–2425, 2018.
 [58] N. Ghadermarzy, Y. Plan, and O. Yilmaz. Near-optimal sample complexity for convex tensor completion. Information and Inference: A Journal of the IMA, 8:577–619, 2019.
 [59] D. Xia, M. Yuan, and C.-H. Zhang. Statistically optimal and computationally efficient low rank tensor completion from noisy entries. The Annals of Statistics, 49:76–99, 2021.
 [60] Alberto Abadie, Alexis Diamond, and Jens Hainmueller. Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American Statistical Association, 105(490):493–505, 2010.
 [61] Erin Kissane and Jessica Malaty Rivera. How not to interpret COVID-19 data. The COVID Tracking Project, 2021.
 [62] Alan Agresti. Categorical data analysis, volume 482. John Wiley & Sons, 2003.
 [63] Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM review, 51(3):455–500, 2009.
 [64] Qingquan Song, Hancheng Ge, James Caverlee, and Xia Hu. Tensor completion algorithms in big data analytics. ACM Transactions on Knowledge Discovery from Data, 13(1):1–48, 2019.
 [65] Silvia Gandy, Benjamin Recht, and Isao Yamada. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Problems, 27(2):025010, 2011.
 [66] Ryota Tomioka, Kohei Hayashi, and Hisashi Kashima. Estimation of low-rank tensors via convex optimization. arXiv preprint arXiv:1010.0789, 2010.
 [67] C. Hillar and L.-H. Lim. Most tensor problems are NP-hard. Journal of the ACM, 60:1–39, 2013.
 [68] M. Yuan and C.-H. Zhang. On tensor completion via nuclear norm minimization. Foundations of Computational Mathematics, 16:1031–1068, 2016.
 [69] M. Wainwright. High-dimensional statistics: A non-asymptotic viewpoint. Cambridge University Press, 2019.
 [70] J.A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, 2012.
Appendix
A Proof of Theorem 1
Our proof can be decomposed into three main parts: I) derivation of the main inequality, II) empirical process control, and III) verification of a technical condition known as restricted strong convexity (RSC). These three parts follow the standard scheme for analyzing matrix completion and related problems in high-dimensional estimation [69], while the details of the individual steps have been tailored to the setup under consideration.
Part I: Derivation of the main inequality
We start by expanding the model into its matricized counterpart: denote by $\Theta^* \in \mathbb{R}^{d_1 \times d_2}$ the matricization of the underlying tensor. The observed (i.e., non-missing) entries are of the form
\[
y_i = \Theta^*_{a_i b_i} + \xi_i, \qquad i = 1, \ldots, n,
\]
where $n$ denotes the total number of observed entries including both the outcome of interest and the control outcomes, $(a_i, b_i)$, $1 \le i \le n$, denote the row and column indices of the observed entries in the matricized tensor $\Theta^*$, and the $\xi_i$ denote the corresponding random error terms contained in the vector $\xi = (\xi_1, \ldots, \xi_n)^\top$.
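To fix ideas, matricization can be sketched in a few lines of Python. The dimensions and the index convention below are our own illustration only (unfolding conventions vary; e.g., Kolda and Bader [63] use a column-major layout, whereas this sketch uses C order):

```python
import numpy as np

# Mode-1 matricization (unfolding) of a 3-way tensor; shapes are hypothetical.
rng = np.random.default_rng(1)
d1, d2, d3 = 4, 3, 5                      # e.g. jurisdictions x periods x outcomes
T = rng.standard_normal((d1, d2, d3))

# Mode-1 unfolding in C order: rows index the first mode,
# columns run over the remaining two modes.
Theta = T.reshape(d1, d2 * d3)

# An observed tensor entry T[j1, j2, j3] appears at row a = j1,
# column b = j2*d3 + j3 of the matricization.
j1, j2, j3 = 2, 1, 4
a, b = j1, j2 * d3 + j3
assert Theta[a, b] == T[j1, j2, j3]
```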
Accordingly, the objective function in (8) can be expressed as
\[
\frac{1}{2n} \sum_{i=1}^{n} \big(y_i - \Theta_{a_i b_i}\big)^2 + \lambda \|\Theta\|_{*}.
\]
Since $\widehat{\Theta}$ is a minimizer of (8), we obtain the basic inequality
\[
\frac{1}{2n} \sum_{i=1}^{n} \big(y_i - \widehat{\Theta}_{a_i b_i}\big)^2 + \lambda \|\widehat{\Theta}\|_{*}
\;\le\;
\frac{1}{2n} \sum_{i=1}^{n} \big(y_i - \Theta^*_{a_i b_i}\big)^2 + \lambda \|\Theta^*\|_{*}.
\]
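The basic inequality can also be checked numerically. The following sketch, which is our illustration and not part of the formal argument, assumes the objective in (8) takes a nuclear-norm-penalized least-squares form and minimizes it by proximal gradient descent with singular-value soft-thresholding; all dimensions and tuning values are hypothetical:

```python
import numpy as np

# Hypothetical objective: (1/(2n)) * sum_i (y_i - Theta[a_i, b_i])**2
#                         + lam * ||Theta||_* .
rng = np.random.default_rng(2)
d1, d2, r = 8, 6, 2
Theta_star = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))
mask = rng.random((d1, d2)) < 0.6         # observed entries (unique indices)
rows, cols = np.nonzero(mask)
n = rows.size
y = Theta_star[rows, cols] + 0.1 * rng.standard_normal(n)
lam = 0.05

def objective(Theta):
    resid = y - Theta[rows, cols]
    return resid @ resid / (2 * n) + lam * np.linalg.norm(Theta, "nuc")

def svt(M, tau):
    """Singular-value soft-thresholding: prox of tau * ||.||_* ."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

Theta = np.zeros((d1, d2))
step = float(n)                  # gradient of the loss is (1/n)-Lipschitz here
for _ in range(2000):
    grad = np.zeros((d1, d2))
    grad[rows, cols] = (Theta[rows, cols] - y) / n
    Theta = svt(Theta - step * grad, step * lam)

# Basic inequality: the minimizer's objective is no larger than at Theta_star.
assert objective(Theta) <= objective(Theta_star) + 1e-8
```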
Expanding the square on the left-hand side and rearranging terms yields
\[
\frac{1}{2n} \big\| \mathfrak{X}(\widehat{\Theta} - \Theta^*) \big\|_2^2
\;\le\;
\frac{1}{n} \big\langle \mathfrak{X}^*(\xi), \, \widehat{\Theta} - \Theta^* \big\rangle
+ \lambda \big( \|\Theta^*\|_{*} - \|\widehat{\Theta}\|_{*} \big),
\tag{10}
\]
where $\mathfrak{X}: \mathbb{R}^{d_1 \times d_2} \to \mathbb{R}^{n}$ denotes the linear operator defined by $[\mathfrak{X}(\Theta)]_i = \Theta_{a_i b_i}$ and $\mathfrak{X}^*: \mathbb{R}^{n} \to \mathbb{R}^{d_1 \times d_2}$ denotes the adjoint operator given by $\mathfrak{X}^*(v) = \sum_{i=1}^{n} v_i \, e_{a_i} e_{b_i}^\top$, with $e_{a_i}$ and $e_{b_i}$ denoting the canonical basis vectors of $\mathbb{R}^{d_1}$ and $\mathbb{R}^{d_2}$, respectively.
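The sampling operator and its adjoint are straightforward to implement. The following Python sketch (our illustration, with hypothetical dimensions; the names `frak_X` and `frak_X_adj` are ours) confirms the adjoint identity $\langle \mathfrak{X}(\Theta), v \rangle = \langle \Theta, \mathfrak{X}^*(v) \rangle$:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, n = 6, 5, 12
rows = rng.integers(0, d1, size=n)   # indices a_i
cols = rng.integers(0, d2, size=n)   # indices b_i

def frak_X(Theta):
    """Linear operator: extract the observed entries as a vector."""
    return Theta[rows, cols]

def frak_X_adj(v):
    """Adjoint: scatter v back, sum_i v_i e_{a_i} e_{b_i}^T."""
    out = np.zeros((d1, d2))
    np.add.at(out, (rows, cols), v)  # accumulates over repeated index pairs
    return out

# Adjointness check: <X(Theta), v> == <Theta, X*(v)>.
Theta = rng.standard_normal((d1, d2))
v = rng.standard_normal(n)
lhs = frak_X(Theta) @ v
rhs = np.sum(Theta * frak_X_adj(v))
assert np.isclose(lhs, rhs)
```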
Let the singular value decomposition of $\Theta^*$ be given by $\Theta^* = U \Sigma V^\top$ with $U \in \mathbb{R}^{d_1 \times r}$, $\Sigma \in \mathbb{R}^{r \times r}$, and $V \in \mathbb{R}^{d_2 \times r}$, and consider the associated matrix subspace
\[
\mathbb{M} := \big\{ \Theta \in \mathbb{R}^{d_1 \times d_2} : \; U_\perp^\top \Theta V_\perp = 0 \big\},
\]
where the columns of $U_\perp$ and $V_\perp$ span the orthogonal complements of the column spaces of $U$ and $V$, respectively. Denote by $\mathbb{M}^\perp$ the orthogonal complement of $\mathbb{M}$. Then the following properties hold true (cf. [69],