1 Introduction
Raking (Deming and Stephan, 1940) generally assists categorical data modeling, such as loglinear models, with known margins and plays a key role to adjust for selection bias ensuring the sample representativeness of the targeting population. In practice, it often acts as the final step in survey weighting adjustment to match the unbalanced marginal distributions of variables that affect inclusion (selection and response) from the sample to the benchmark control information (e.g., census records). Recently raking is widely used to incorporate auxiliary variables that are available to calibrate statistical inference, for example, to analyze missing data (Little and Rubin, 2002) and evaluate causal relationships (Schouten, 2018). Borrowing benchmark or auxiliary data from external sources improves the efficiency and reliability of survey inference, in particular when the raking factors are highly correlated with the survey outcome. Nonprobability based sampling surveys emerge because of operational feasibility and cost efficiency and demand rigorous adjustment for selection bias under the potentially informative sampling scheme. However, the generality or the quality of nonprobability survey inference relies on poststratification or calibration with benchmark information, where raking is generally applied since often only marginal distributions are available.
While utilizing the iterative proportional fitting (IPF) algorithm for fast computation, raking is faced with methodological and computational challenges due to the wide use and contemporary survey practice. Motivated by survey operation, we propose new strategies to improve raking in terms of variable selection, convergence guarantee and inferential efficiency.
Practical raking often adjusts sociodemographics, the marginal distributions of which are easily accessible. Nevertheless, to satisfy the likelihood principle, all variables and their potential highorder interactions that affect the sampling inclusion should be adjusted (Rubin, 1983; Little, 1983; Gelman et al., 2014)
. Raking can compensate the aforeimplemented weighting steps, such as inverse inclusion propensity score weighting and nonresponse adjustment, which usually cause high variability with extreme weight values. Weight trimming reduces variance but increases bias, again creating an imbalance. The discrepancy relies on raking to adjust a large number of raking variables that affect selection, coverage, and response propensities. IPF will suffer from convergence problems with numerous variables. There is no instruction checking whether IPF has converged to the correct estimation and how it affects the inference of complex survey data. It is crucial to inform which, how many and how detailed variables should be adjusted via raking, which is often arbitrarily determined in practice.
Brick et al. (2003) and Battaglia et al. (2013)
have reviewed the problems of raking in survey practice. They conclude that inconsistencies in the control totals, correlated raking dimensions, sparse tables, and measurement bias can cause severe issues in raking such as convergence difficulty and high variability. When the contingency table constructed by the crosstabulation of raking variables is sparse or presents empty cells, IPF will suffer from a computational burden and fail to converge. From the theoretical perspective, the inference after raking conditional on empty cells is not consistent with the design or response mechanism. When the sample cells have zero counts, raking treats the corresponding population cells as empty and thus results in estimation bias, in particular for domain inference, Moreover, variance estimation is challenging after raking. Resampling techniques accommodating complex sample design features are used to obtain standard errors but often ignore the nonresponse and poststratification adjustment, and thus fail to account for all sources of uncertainty. The validity of raking calls for rigorous evaluation.
We would like to develop a Bayesian framework for raking that settles IPF convergence issue, stabilizes domain inference, and integrates all sources of uncertainty to ensure inferential validity and efficiency while accounting for complex design and response features and incorporating marginal constraints.
The methodology development is motivated by our survey weighting practice for the New York City (NYC) Longitudinal Study of Wellbeing (LSW) (Wimer et al., 2014), which is a biennial survey (2013–2014) aiming to provide assessments of income poverty, material hardship, and child and family wellbeing of NYC adult residents. The sample includes a phone sample based on random digit dialing and an inperson respondentdriven sample of beneficiaries from Robin Hood philanthropic services and their acquaintances. We focus on the phone survey here as an illustration. The LSW phone survey interviews 2002 NYC adult residents based on 500 cell phone calls and 1502 landline telephone calls, where half of the landline samples are from poor NYC areas defined by zip code information. These collected baseline samples are followed up for every three months. The sample is not representative of the target population due to the oversampling design and differential inclusion propensities. To balance the discrepancies, we match the samples to the 2011 American Community Survey (ACS) for NYC adult records.
The baseline weighting process (Si and Gelman, 2014)
adjusts for the unequal probability of selection, nonresponse and coverage bias. To generate the classical weights, the poststratification weighting variables include sex, age, education, race/ethnicity, poverty gap and the number of children in the family. We also include highorder interaction terms in the weighting process: the twoway interaction between age and poverty gap, and the twoway interaction between the number of children in the family and poverty gap. The selection of variables for weighting is subjective depending on the conventional weighting practice routine and the organizers’ analysis of interest, for example, how many NYC children living in poverty. Raking is implemented by matching the marginal distributions of the selected factors. IPF fails to converge under the default setting but converges after increasing the number of iterations to be sufficiently large. However, it is unclear whether the computation has converged to the correct estimation, and how to perform diagnostics to assess the weighting process. An objective procedure with evaluation of statistical properties is necessary.
Gelman (2007) listed the struggles for survey weighting and advocated connecting regression models and sampling design. Modelbased approaches for survey data analysis are essentially predictions of nonsampled units based on the sample and the outcome model and well equipped to handle complex sampling design (Little, 2004). The general inference framework jointly models the survey outcome and the sampling inclusion indicator (Smith, 1983), where the sample size is a random draw from the inclusion model based on the selection and response mechanism.
Our goal is to combine weighting and prediction under multilevel regression and poststratification (MRP, Park et al. (2005); Ghitza and Gelman (2013); Si et al. (2015, 2018)) as a unified framework for survey inference. MRP starts from the cell estimates of the contingency table constructed by the discrete variables that affect sample inclusion and poststratifies with control information to yield the domain or overall population inference with weights proportional to population cell sizes. Inside each cell, individuals are included with equal probability. This poststratification cell structure is robust against model misspecification (Little, 1991, 1993; Gelman and Little, 1997; Gelman and Carlin, 2001).
MRP has two key elements: 1) poststratification accounting for the design information, and 2) the multilevel models used for cell estimation stabilize small area estimation (Rao and Molina, 2015). Suppose is the estimand of the survey outcome , such as the population mean or domain mean, and it can be expressed as a weighted sum over related poststrata, where is the corresponding estimand, is the population cell size in cell , and is the domain of interest. Hierarchical models are applied to stabilize the cell estimates of ’s with sparse structure and small sample size.
Currently, MRP assumes the population poststratification cell sizes—the joint distribution of calibration variables—are known, which is often not the case. Rather, marginal distributions of the poststratification variables are usually given in practice. Raking provides a supplemental tool for the population poststratification cell size estimation. We develop a Bayesian procedure to incorporate known margins into modeling, that is, Bayesraking, and embed Bayesraking with MRP to build up a modelassisted and designbased framework for survey inference. The proposed MRP estimator will be of the general form,
(1) 
where is the corresponding estimate and is the population cell size estimate in cell .
Greenland (2007); Kunihama and Dunson (2013); Schifeling and Reiter (2016) incorporate marginal prior information by appending synthetic observations or pseudorecords to the original data, and the degree of prior uncertainty is controlled by the size of the augmented sample. However, such data augmentation is computationally demanding and not operationally feasible with a large number of raking factors.
As an innovation, we transfer the marginal constraints as prior distributions, jointly model the survey outcome and the inclusion indicator as the likelihood, and generate the posterior samples of ’s, ’s and then . We evaluate the Bayesian procedure with frequentist randomness properties as calibrated Bayes (Little, 2011). Being able to capture all sources of uncertainty and make inferences, Bayesraking performs at least equivalently to classical raking with large sample size and outperforms when IPF suffers from computation problems.
Gelman et al. (1995) proposes Bayesian IPF, a Bayesian analog of IPF, and Schafer (1997); Little and Rubin (2002) present detailed discussions about the properties and examples. Focusing on the estimation of loglinear models with a crosstabulated contingency table, Bayesian IPF induces a Dirichlet prior distribution to the cell probabilities and generates draws of loglinear model parameters from the fully conditional Dirichlet distributions. Schafer (1997)
points out that the Dirichlet prior is naive because the cell probabilities are treated as unordered and the crossspecified structure of the contingency table is ignored. However, the crossclassified structure is fundamental to the quality of loglinear models. Alternative Dirichlet prior distributions have been proposed to reasonably incorporate prior distributions utilizing the structure of loglinear model
(Good, 1967; Bishop et al., 1974). Normal prior distributions are discussed by Laird (1978); Knuiman and Speed (1988), although conceptually attractive, but not widely used due to computational difficulty. Lazar et al. (2008) develop a noninformative Bayesian approach to finite population sampling using auxiliary variables, and the proposed prior distribution accounts for the crossclassified structure. However, the large number of free parameters in the approach of Lazar et al. (2008) often causes computational problems. Our new methods improve the computation and develop a general subspace format that covers the estimator in Lazar et al. (2008) as a special case.Current methods omit cells with zero sample sizes and can be stuck by boundary estimates with a sparse table. It is nontrivial to adjust the degree of freedom due to empty cells and correct for the estimation bias
(Bishop et al., 1974). Moreover, the existing approaches cannot handle the issues with raking such as inconsistencies in the control totals or correlated raking dimensions. We propose to treat the known margins as realizations drawn from the true models and account for the correlation between raking factors based on the hierarchical structure. The recent software development of Stan solves the hurtle of posterior computation with nonconjugacy (Stan Development Team, 2017, 2018) and modifies the Hamiltonian Monte Carlo sampler with adoptively setting path lengths (Hoffman and Gelman, 2014). We will examine normal priors and use Stan for fully Bayesian inference. As open source and userfriendly software, Stan improves computational performance and advocates the modelbased survey inference.
The paper is organized as below. Section 2 starts from a twovariable illustration and describes the details of the proposed Bayesraking estimation with three main components: the margin model, the selection model, and the outcome model. We compare Bayesraking with raking via simulation studies in Section 3 and apply to the LSW survey in Section 4. Section 5 concludes the contributions and further extensions.
2 Bayesraking
Deming and Stephan (1940) introduce raking and IPF to obtain the population cell count estimation in a contingency table subject to certain marginal constraints. Early work on loglinear modeling relies almost exclusively on IPF (e.g., Fienberg (1970); Bishop et al. (1974)
). As an illustration, we consider a twoway contingency table constructed by two categorical variables
and in the survey data of size , with the marginal distributions of and available from a census of size . Let be the population cell count with , and be the known margins, and be the corresponding sample cell count, for and . Here and denote the total number of levels for variables and . The sample cell counts are adjusted to estimate population cell count to the known margins and , such that(2) 
Stephan (1942) shows that the raking estimates of the population cell counts ’s have the form,
(3) 
with suitable choices of , and . The raking estimates minimize the KullbackLeibler (KL) divergence subject to the marginal constraints in (2) (Ireland and Kullback, 1968).
Moving from the discrimination information, Little and Wu (1995) consider four different distance functions of and develop the equivalent maximum likelihood estimation (MLE) under the corresponding models with the distance measures relating the population and the sample. Their comparison finds that the raking estimator is more efficient than the competitors with a smaller mean squared error for large samples. Meng and Rubin (1993) show that IPF is an expectation/conditional maximization, so the raking estimator shares asymptotic properties similar to MLE and is largesample Bayes.
When all the sample cell counts are nonzero, Ireland and Kullback (1968) prove that IPF can converge with the discussed rate. However, when empty cells occur, convergence can only be achieved under certain conditions and sampling schemes that are not always satisfied (Goodman, 1968; Bishop and Fienberg, 1969; Fienberg, 1970). In practice, a modest number of raking variables can create a huge contingency table with numerous sparse and empty cells. The raking computation calls for new development beyond IPF.
Based on (3) raking adjusts a table to given marginal totals and preserves the interaction structure—defined by the crossproduct ratios—of the observed sample table. The retained associations are based on the sample and then subject to sampling variability. With small sample sizes, it is possible that the sample associations deviate from the population associations, resulting in high variability and inconsistent inference.
Moreover, correlated raking variables can exacerbate the inconsistency phenomenon. When the main effects are determined by the dependent marginal totals, the observed sample correlation may not be appropriate to reflect the true correlation, and IPF will never converge. Brick et al. (2003) give an example of age and grade in a twodimension contingency table, and the correlation causes substantial variation in the raking adjustment factors for the cells even though none of the margins differ much between the sample and the control. With small samples, the raking adjustment factors are possibly below 1 for both age and grade according to the marginal totals. However, the observed sample cell count may be larger than that for the population cell, leading to the failure of IPF. Meanwhile, raking often adjusts highorder interaction terms that are treated as independent ignoring the hierarchy of the highorder interaction terms, which will cause estimation bias and inconsistency.
Bayesraking solves the IPF computational problems, propagates variation and achieves inferential stability with sparse contingency tables under the hierarchical structure and informative prior distributions. We integrate (2) with variants of (3) into a Bayesian paradigm to incorporate the marginal constraints into the estimation for the population cell counts and then for the finite population and domain inference. For large samples, Bayesraking yields MLE that is equivalent to the raking estimator; for small samples, Bayesraking is superior to raking with stability and efficiency.
Continuing the twovariable example, let be the
length vector of known margins for two variables
and , be the length vector of cell size estimators, and the loading matrix satisfy the constraint of known margins in (2), that is, .To transfer the marginal constraints into Bayesian modeling, we consider two strategies: constructing the solution subspaces and modeling with soft constraints. First, the solution space approach examines the null space of . The dimension of the null space of is 1, and the basis of Null is . We introduce one free variable and set an initial estimate . The basis solution subspace to the linear constraints (2) is given by
If we use the projection matrix of the bases C: and assume
is uniformly distributed over the space
, this provides an alternative estimate with the projection solution subspace The second method assumes that the margins are drawn from models, such as , where the constraints are treated as soft and incorporated via modeling mean structure but allow for variation.We propose an integrated procedure under MRP that incorporates the marginal constraints as a unified framework that stabilizes the overall population and domain inferences and accounts for design and response mechanisms. The modelbased inference with Bayesraking has three main components: 1) the margin model incorporating the known constraints; 2) the model for inclusion probability; and 3) the model to predict survey outcome. We describe the general notation for Bayesraking in the following.
2.1 The margin model
In the basic setting, the variables that affect sample inclusion propensities are used for raking to balance the sample and the population, and their crosstabulation constructs poststratification cells. Here we assume ’s are discrete (after discretization of continuous variables) with possible values , where is the total number of levels. All the possible combination categories of are labeled as cell , with value , , where ’s are indicators, the population cell size and the sample cell size , for , where is the total number of poststratification cells. The total population size is , and the sample size is . Denote , and . The joint distribution of is unknown, but the marginal distributions are available. That is, the population cell counts ’s are unknown, and the estimators ’s are subject to marginal constraints:
(4) 
where ’s denote the known margins for the th level of , . Let be the vector of known margins for Variable , and be the vector of known margins for all variables. Denote as all cell size estimators and introduce a loading vector that maps the marginal constraint: . Denote the loading matrix , and the loading matrix . Then we have . The constraints in (4) have the equivalent expression as
(5) 
The null space of the matrix is defined as with dimension . Suppose a basis of is , introduce a vector of free variables , and we have .
The basis solution subspace to the linear constraints (5) is defined as
(6) 
where is an initial estimator for and can be the raking estimate.
As an alternative solution, the projection solution subspace is defined as
(7) 
where is the projection matrix of the basis , and is uniformly distributed over the space .
We find that the estimator (7) is the same as that proposed by Lazar et al. (2008). Because the number of free variables of in the projection solution subspace is larger than that in the basis solution subspace, that is, , even though the linear combination is used in the estimate, the computation of the projection solution subspace becomes more difficult.
When , the number of basis functions of , is small, the subspace approaches yield stable inferences under noninformative prior distributions, as demonstrated in our simulation studies. The subspace approach can suffer from convergence problem due to the randomness of massive free variables , however, with a large number of sparse or empty cells. It is nontrivial to adapt the generation of . While prior distributions could be introduced to , the convergence is hard to achieve and calls for structural regularization. In our experiment, we assign the horseshoe distribution (Carvalho et al., 2010) as the prior to but fail to achieve convergence even with a modest number of raking variables.
This motivates us to propose the second strategy where the marginal constraints in (5) can be softened as model realizations
(8) 
where the marginal constraints are treated as random variables following a Poisson distribution as a natural choice for the counts. Alternative choices include different variance specifications under different distributions. Comparing Poisson distribution with normal distributions with different variance assumptions, our simulation and empirical outputs are not sensitive about the distribution or variance specification in terms of estimation of the population cell counts and the finite population and domain inference.
For empty cells, we introduce a weakly informative prior distribution on ,
, a half Cauchy distribution with mean 10 and standard deviation 3 restricted to positive values. The modestly small mean value
is chosen because small cells in the population tend to generate empty cells in the sample, and the modestly large scaleallows for flat and heavy tails in the Cauchy distribution. The outputs are invariant to the hyperparameter choices in the weakly informative prior distribution setting.
It is straightforward to incorporate additional information through an additional layer of prior distributions in model (8). For example, we can introduce a loglinear distribution or Dirichlet distribution as the prior belief on the population cell size proportions.
2.2 The sample inclusion model
The poststratification cell construction implicitly assumes that the units in each cell are included with equal probabilities. We have , where denotes the selection probability for cell . With a large contingency table, this can be approximated by
(9) 
where the Poisson distribution can facilitate model fitting via Stan. Assume
(10) 
Here is the coefficient for the indicator . Without loss of generality, the first level of each raking variable is set as the reference level. The specified model only includes the main effects of the raking variables, the same as the raking model (3), but differs by using the logit link, rather than the log link. Our analysis results are similar under the two link choices. The logit link is an intuitive choice to model the binary sample indicator and facilitates the Stan computation. Because of known margins, the selection model can only be identified from the sample data when main effects are included. When the number of raking variables or the levels of raking variable is large, informative prior distributions can be elicited to regularize the estimation, and multilevel models can be fit to smooth the estimates.
The informative prior distribution in Bayesraking increases the compatibility of including highorder interaction terms among the raking variables that may affect the selection probabilities. Under raking, the correlations are determined by the sample structure while the marginal constraints do not provide extra information. Hence, the classical raking model does not allow the inclusion of highorder interactions. If substantive knowledge shows the true selection probability model depends on highorder interaction terms, we can add such interactions in Bayesraking and induce proper prior information for model identification. Meanwhile, if the included highorder interaction terms are nonpredictive, Bayesraking under informative prior distributions will shrink the corresponding coefficients toward 0.
2.3 The outcome model
Inside each poststratification cell, the units are identically and independently distributed. For simplicity, when the outcome is continuous, assume ; for binary , consider , where denotes cell that unit belongs to and with a multilevel model specification. Besides the
variables that affect inclusion, we can add other variables that are predictive for the outcome. Ordinary linear or logistic regression models have
. We recommend flexible modeling strategies to predict survey outcome that are robust against misspecification and can stabilize the inference. Si et al. (2015) introduce Gaussian process as the prior distributions for the functional mean structure under MRP. Alternatives include penalized splines models (Zheng and Little, 2003, 2005; Chen et al., 2010) that perform robustly with efficiency gains comparing with classical weighted inference.Under MRP, the cell estimate is a weighted average between the sample cell mean and the overall mean. Hence, Bayesraking estimate uses smoothed predictions, while classical raking uses collected sample cell means that are subject to sampling variability.
2.4 The unified framework
The integrated Bayesraking procedure systematically induces the marginal constraints as prior distributions, jointly models the survey outcome and the inclusion indicator as the likelihood, and generates the posterior samples of ’s, ’s and then the MRP estimator (1). For an example with a continuous outcome, the model specification is summarized as
Generally, Bayesraking distinguishes from classical raking in five main aspects:

Bayesraking assumes a Bayesian model for the known margins and introduces uncertainty, while raking uses IPF to incorporate the constraint.

Bayesraking treats the sample cell sizes as random variables that are fixed by raking.

Bayesraking uses logit link while raking uses log link, resulting in different distance measures.

Bayesraking uses predicted outcome values after smoothing in the MRP estimator, while raking uses the observed sample values.

Bayesraking predicts population cell counts and estimates for empty cells that are omitted by raking.
This openended framework can accommodate additional hierarchical structure. The three components of the unified inference framework are fitted in Stan as an integrated procedure, and full Bayesian inference is implemented that generates posterior samples of model parameters and population quantities of interest. The computation via Stan is efficient with a userfriendly interface. The codes are deposited in GitHub (Si, 2019) and ready for public use.
3 Simulation studies
We use simulation studies to evaluate the Bayesraking estimation and compare with IPF. We treat the 2011 ACS survey of NYC adult residents as the “population” and perform repeated sampling to check the statistical validity. We randomly draw 150 repetitions out of the ACS according to a prespecified selection model. We simulate the outcome variable in the ACS as the truth. We implement the three Bayesraking approaches depending on the margin model choices: the projection solution subspace using the projection matrix that is similar to Lazar et al. (2008); the basis solution subspace using the basis functions; and the Bayesmodel with soft modeling constraints.
Focusing on the overall population and domain mean estimates, we calibrate the Bayesraking estimation by examining the bias, root mean squared error (RMSE), standard error (SE) approximated by the average of standard deviations and nominal coverage rate (CR) of the 95% confidence intervals. When no empty cells occur, our experiments show that the Bayesian approaches perform as well as raking with similar outputs. Hence, we present the cases with empty cells when the theoretical properties of IPF are not satisfied: the scenarios with a sparse contingency table and dependent raking variables.
3.1 Sparse contingency table
Survey practice often selects a modest number of raking variables, resulting in a sparse contingency table. The sparsity will be severe with an increasing number of raking variables. As an illustration, we collect five raking variables in the 2011 ACS data: age (age: 18–34, 35–44, 45–54, 55–64, 65+), race/ethnicity (race: white & nonHispanic, black & nonHispanic, Asian, Hispanic, other), education (edu: less than high school, high school, some college, bachelor degree or above), sex, and poverty gap (pov: under 50%, 50100%, 100200%, 200300%, 300%+; Higher pov value means higher income). The crosstabulation of the five ACS raking variables constructs 1000 () poststratification cells, 14 cells have no units, and the nonempty cells sizes vary between 1 and 1283.
Assume the outcome is binary and generated via a logistic regression: , where represents the main effects for the five ACS raking variables of unit . The nonzero components of the coefficient vector are . We consider the sample selection model without nonresponse: , where is the selection probability of unit . The coefficient vector has nonzero components: . The values of and are chosen to mimic the exploratory analysis structure of the LSW study.
Assume the marginal distributions of the five raking variables are known, but not the joint distribution. We are interested in the overall population mean estimation, the subgroup means of marginal levels for each raking variable, and the subgroup means of the twoway interaction between age and poverty gap. We implement the three Bayesraking estimation approaches and IPF. The two subspace approaches cannot converge with 986 cells because of computational difficulty with the high dimensional free variables. Therefore, we only report the outputs of Bayesraking with the soft Poisson modeling constraint and IPF.
To emphasize the comparison between Bayesraking and raking on their capability of accounting for design effect and calibration, we set the fitted outcome model the same as the data generation model. The true model parameters can be recovered via Stan. In practice, we recommend flexible models with diagnostics and evaluation as Section 4. We repeatedly draw samples from the ACS data with sample sizes around 2000, where the number of empty cells is between 368 and 424 across repetitions.
For the overall population mean, both methods give similar results, with the same mean estimate (0.59), SE (0.01) and RMSE (0.01) and similar coverage rates, 0.94 under Bayesraking and 0.96 under raking. When looking at the subgroup mean estimates shown in Figure 1, Bayesraking outperforms raking. Bayesraking and raking generate similar estimates that are generally unbiased though raking has negligibly smaller biases. Bayesraking yields smaller RMSE and SE with more reasonable nominal coverage rates than raking, particularly for the subgroups defined by the interaction. Raking has low coverage rates for the domain mean estimates of the groups who are older than 35 with low income (poverty gap below 50%), where the coverage rates are below 0.90 (0.82, 0.86, 0.89, 0.87). These groups have relatively small population cell sizes, between 330 and 586. Raking fails to yield valid inferences with small cell sizes. With empty cells in the sample, the theoretical guarantee for IPF convergence is not satisfied. Bayesraking treats the sample cell sizes as random variables generated from the Poisson models with the selection probability model and thus performs stably with small or even empty cells.
3.2 Dependent raking variables
When the raking variables are correlated, IPF can suffer from convergence problems and generate inferences with high variability. To examine the performances of Bayesraking and raking with dependent raking variables, we collect three ACS variables: age, poverty gap and the number of children in the family () and use the twoway interaction terms in the raking adjustment: and . This scenario is motivated by the substantive analysis questions of interest, such as the percentage of elders or children living in poverty.
Both twoway interaction terms have and are correlated, with a high value . However, IPF treats the interactions as independent in a twodimension contingency table (). The proposed Bayesraking estimation accounts for the crossclassified structure and thus the correlation between the two interaction terms.
The three variables construct 100 poststratification cells. We treat the ACS data as the population with population cell sizes varying between 12 and 5462 with a median 248. The assumed selection model includes main effects and twoorder interactions: . The nonzero components of the coefficients are . Across 150 repeatedly selected samples, the sample sizes are centered around 2000 with the number of empty cells close to 5, which tend to be middleaged (4554) or elder (65+) individuals with low income (pov50%) and many children (3+). The sample cell sizes are between 1 and 127 with a median 10.
We consider the outcome model: , , the implicit model for the HorvitzThompson estimator (Horvitz and Thompson, 1952), to introduce high correlation between the outcome and the selection probability. When the selection probability is small, the outcome has a large mean and high variance. The estimation model is the same as the data generation model. We compare the three Bayesraking estimation approaches and IPF.
Table 1 displays the comparison for the overall mean estimation, where the four methods perform similarly well. IPF generates negligibly smaller bias but larger SE and RMSE than the Bayesian approaches. Because of the wide confidence intervals, IPF has more conservative coverage than the Bayesian approaches, the coverage rates of all which are reasonable.
Mean  Bias  RMSE  SE  CI length  Coverage  

Projection  28.71  0.23  0.57  0.52  2.04  0.94 
Basis  28.71  0.23  0.58  0.52  2.02  0.93 
Bayesmodel  28.75  0.26  0.57  0.55  2.17  0.94 
Raking  28.48  0.00  0.70  0.76  2.98  0.97 
The efficiency gains of Bayesian approaches are further demonstrated in the domain inference in Figure 2, where the IPF algorithm yields substantially larger SE and RMSE than the Bayesian approaches, times, even with competitive bias and coverage rates. The correlation between the two raking factors makes the algorithm hard to converge and yields highly variable inference. The Bayesian approaches reconcile the hierarchy structure with the highorder interactions and are capable of incorporating the known margin constraints while propagating the uncertainty.
Among the three Bayesian approaches, the performances are generally similar in terms of SE and RMSE. The Bayesraking estimation with modeling yields more reasonable coverage rates than the subspace approaches that could yield low coverage rates. The subspace approaches have large bias and low coverage rate when estimating the domain mean for those whose age is between 35 and 44. This could be related with the computational performance of the subspace approaches, where the convergence is sensitive about the solution path and hard to be guaranteed during the repeated sampling process. We generally recommend the Bayesraking estimation with soft modeling constraints that perform stably under a large number of raking variables with dependency.
4 LSW study application
We apply Bayesraking to the LSW study. The poststratification weighting factors include sex, age, education, race, poverty gap, the number of children in the family, the twoway interaction between age and poverty gap, and the twoway interaction between the number of children in the family and poverty gap. We collect the marginal distributions of the poststratification factors from ACS^{5}^{5}5ACS has its own weights, and the marginal distributions have accounted such weights to be representative of the population. and implement the two raking methods. Similar to the cases in Section 3, some raking variables are correlated. IPF treats these factors independently, while Bayesraking procedure accounts for the hierarchy structure of highorder interactions.
Bayesraking  Raking  

Overall  0.607 (0.012)  0.604 (0.012) 
Age: 1834  0.613 (0.022)  0.622 (0.025) 
Age: 3544  0.645 (0.024)  0.651 (0.027) 
Age: 4554  0.675 (0.024)  0.667 (0.027) 
Age: 5564  0.584 (0.023)  0.569 (0.028) 
Age: 65+  0.48 (0.025)  0.473 (0.027) 
Less than high sch  0.742 (0.022)  0.799 (0.028) 
High sch  0.711 (0.021)  0.711 (0.026) 
Some coll  0.674 (0.021)  0.669 (0.028) 
Bachelor or above  0.394 (0.017)  0.361 (0.021) 
Male  0.555 (0.019)  0.555 (0.021) 
Female  0.648 (0.014)  0.647 (0.015) 
Pov: 50%  0.785 (0.025)  0.859 (0.031) 
Pov: 50100%  0.763 (0.024)  0.782 (0.03) 
Pov: 100200%  0.756 (0.021)  0.747 (0.029) 
Pov: 200300%  0.684 (0.03)  0.705 (0.033) 
Pov: 300%  0.453 (0.016)  0.434 (0.019) 
White&nonHisp  0.424 (0.019)  0.417 (0.022) 
Black&nonHisp  0.739 (0.018)  0.737 (0.021) 
Asian  0.444 (0.035)  0.419 (0.055) 
Hisp  0.727 (0.041)  0.729 (0.051) 
Other  0.803 (0.018)  0.829 (0.017) 
None cld  0.553 (0.014)  0.542 (0.016) 
#Cld: 1  0.679 (0.026)  0.688 (0.031) 
#Cld: 2  0.706 (0.028)  0.732 (0.029) 
#Cld: 3+  0.736 (0.037)  0.766 (0.041) 
The outcome of interest is the material hardship indicator: indicates individual suffers from material hardship; otherwise. We perform model evaluation and comparison using the leaveoneout cross validation (Vehtari et al., 2017), and the outcome model with only main effects outperforms those with interactions by generating competitive expected log pointwise predictive density with the smallest number of parameters. Hence the selected outcome model includes the main effects of sex, age, education, race, poverty gap and the number of children in the family.
We compare the performance of Bayesraking and raking on the proportion estimation of those who have material hardship among all NYC adult residents and subgroups of interest. The two subspace approaches in the Bayesian paradigm cannot converge, so we focus on the comparison between Bayesian modeling with soft constraints and IPF.
Table 2 and Figure 3 present the proportion estimation for NYC residents suffering from material hardship using the Bayesraking and raking procedures, respectively. Table 2 displays the estimation based on all NYC residents and subgroups defined by the marginal levels of the six raking variables. In Table 2, while the estimates are similar among the two approaches with overlapping confidence intervals, Bayesraking generally yields smaller standard error than raking. The similarity is due to the large sample sizes as expected. Bayesraking performs more efficiently by borrowing information across subgroups, with substantial efficiency gain for small sample sizes. As illustrated in Figure 3, the domain inference is more efficient with shorter confidence intervals under Bayesian raking and is significantly different for some groups between the two approaches.
Comparing with classical raking, Bayesraking generates significantly lower proportions of NYC residents with material hardship for families without any child and poverty gap 50100% or above 300%, families with one child and poverty gap above 300%, youth (1834) with poverty gap above 300%, and early middleaged (3544) with poverty gap 100200%, and has significantly higher estimates on material hardship proportion for those with two children and poverty gap 100200%, families with three children and poverty gap below 300%, and elder (65+) with poverty gap below 200%. Briefly, Bayesraking yields lower proportional estimates for material hardship for young NYC residents with a fewer number of children and higher income, but higher estimates for those with many children, elder, and lower income. The outputs based on Bayesraking are more reasonable for wellbeing studies. The LSW study oversamples poor individuals, and the estimates on material hardship without effective adjustment for sampling or nonresponse bias will be overestimated. Bayesraking substantively improves the adjustment by correcting the overestimation and presenting inferential efficiency.
Furthermore, only relying on the collected sample statistics, raking can yield unreasonable estimates and high variability. For youth (1834) with poverty gap 50100%, raking has a proportion with material hardship at 1 without any variability. However, Raking yields large standard errors for the families with three more children and poverty gap below 50% and youth (1834) with poverty gap above 300%. Bayesraking borrows information across subgroups to smooth the estimates and hence performs more robustly and efficiently than IPF adjustment.
5 Conclusion
We develop a Bayesian paradigm for raking to incorporate marginal constraints from two directions: utilizing basis and projection solution subspaces and modeling soft constraints. Bayesraking accounts for the crossclassified structure in a principled way and smoothes cell estimates especially with sparse contingency tables. Bayesraking is equivalent to raking as largesample MLE if the theoretical guarantee of IPF is satisfied, and outperforms raking when IPF suffers from convergence issues, such as cases with a sparse contingency table or dependent raking variables. Bayesraking yields statistical validity and efficiency gains, with the capability of uncertainty propagation.
We use Bayesraking to supplement MRP as a modelassisted, designbased approach to a unified framework for survey inference. Simulation experiments and application studies demonstrate the improvement of Bayesraking on frequentist properties and substantive findings. Furthermore, Bayesraking increases the weighting compatibility and is ready to be induced with additional prior distributions, which facilitates computation and information fusion. Bayesraking has great potential to calibrate nonprobability samples, especially the studies with informative sampling schemes and nonignorable nonresponse, and assist data integration with a systematic framework.
The integrated Bayesraking procedure opens several interesting directions worth further investigation. First, Bayesraking makes inference of the population cell counts, while raking minimizes the KL divergence between the estimates and the sample sizes. Little and Wu (1995) study various distance functions, such as the least square estimate, maximum likelihood estimation and estimate minimizing chisquared error. Bayesraking uses a logistic regression for the inclusion probability and a Poisson model for the cell size, which could result in a different distance measure. It is unclear how the various distance measures are related to design and asymptotic efficiency. Future work is needed to examine the theoretical properties and choose a proper distance measure that can directly access the estimation of population cell sizes as an intermediate inference goal and benefit the ultimate goal of stable finite population and domain inference. This is also related with weight smoothing on how to select proper loss or distance functions as an object to determine the optimal weights Särndal et al. (1992); Si et al. (2018).
Second, the modelbased approach uses survey outcome and then has potential efficiency gain. For probability sampling, ideally, the poststratification cells should be constructed using variables that predict both the outcome and selection, where withincell variability is minimized and betweencell variability is maximized. As a byproduct, the outcome can inform variable selection for raking. The variables that are not predictive of the outcome could be omitted from raking, in particular when raking suffers from convergence issues with sparse cell structure. Bayesraking can be extended by imposing connections between the coefficients in the outcome model and those in the selection model. This shall facilitate computation and inference efficiency, but demand a tradeoff between efficiency and robustness.
Furthermore, we recommend flexible models for the outcome, such as nonparametric Bayesian models, to achieve inferential stability against model misspecification. Extra information about the selection process can be leveraged via prior distributions. In the paper, we focus on selection assuming full participation. While nonresponse is unavoidable, the variables that affect response propensity can be included in the raking process if available. However, informative nonresponse or inclusion mechanism needs further investment. Bayesraking builds up the foundation of incorporating marginal constraints into modeling and is ready to integrate complex design and response features with flexible outcome modeling as a unified framework for survey inference.
References
 Battaglia et al. (2013) Battaglia, M. P., D. C. Hoaglin, and M. R. Frankel (2013). Practical considerations in raking survey data. Survey Practice 2(5), 1–10.
 Bishop and Fienberg (1969) Bishop, Y. M. and S. E. Fienberg (1969). Incomplete twodimensional contingency tables. Biometrics 25, 119–128.

Bishop
et al. (1974)
Bishop, Y. M., S. E. Fienberg, and P. W. Holland (1974).
Discrete Multivariate Analysis: Theory and Practice
. Springer.  Brick et al. (2003) Brick, J. M., J. Montaquila, and S. Roth (2003). Identifying problems with raking estimators. 2003 Joint Statistical Meetings, Section on Survey Research Methods.
 Carvalho et al. (2010) Carvalho, C. M., N. G. Polson, and J. G. Scott (2010). The horseshoe estimator for sparse signals. Biometrika 97, 465–480.
 Chen et al. (2010) Chen, Q., M. R. Elliott, and R. J. Little (2010). Bayesian penalized spline modelbased inference for finite population proportion in unequal probability sampling. Survey Methodology 36(1), 23–34.
 Deming and Stephan (1940) Deming, W. E. and F. F. Stephan (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. The Annals of Mathematical Statistics 11(4), 427–444.
 Fienberg (1970) Fienberg, S. E. (1970). An iterative procedure for estimation in contingency tables. The Annals of Mathematical Statistics 41(3), 907–917.
 Gelman (2007) Gelman, A. (2007). Struggles with survey weighting and regression modeling. Statistical Science 22(2), 153–164.
 Gelman and Carlin (2001) Gelman, A. and J. B. Carlin (2001). Poststratification and weighting adjustments. In R. Groves, D. Dillman, J. Eltinge, and R. Little (Eds.), Survey Nonresponse.
 Gelman et al. (2014) Gelman, A., J. B. Carlin, H. S. Stern, D. Dunson, A. Vehtari, and D. B. Rubin (2014). Bayesian Data Analysis (3nd ed.). CRC Press, London.
 Gelman et al. (1995) Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (1995). Bayesian Data Analysis (1st ed.). CRC Press, London.
 Gelman and Little (1997) Gelman, A. and T. C. Little (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodology 23, 127–135.
 Ghitza and Gelman (2013) Ghitza, Y. and A. Gelman (2013). Deep interactions with MRP: Election turnout and voting patterns among small electoral subgroups. American Journal of Political Science 57(3), 762–776.
 Good (1967) Good, I. J. (1967). A Bayesian significance test for multinomial distributions. Journal of the Royal Statistical Society Series B 29(3), 399–431.
 Goodman (1968) Goodman, L. A. (1968). The analysis of crossclassified data: Independence, quasiindependence, and interactions in contingency tables with or without missing entries: Ra fisher memorial lecture. Journal of the American Statistical Association 63(324), 1091–1131.
 Greenland (2007) Greenland, S. (2007). Prior data for nonnormal priors. Statistics in Medicine 26, 3578–3590.

Hoffman and
Gelman (2014)
Hoffman, M. D. and A. Gelman (2014).
The NoUTurn sampler: Adaptively setting path lengths in
Hamiltonian Monte Carlo.
Journal of Machine Learning Research
15, 1351–1381.  Horvitz and Thompson (1952) Horvitz, D. G. and D. J. Thompson (1952). A generalization of sampling without replacement from a finite university. Journal of the American Statistical Association 47(260), 663–685.
 Ireland and Kullback (1968) Ireland, C. T. and S. Kullback (1968). Contingency tables with given marginals. Biometrika 55(1), 179–188.
 Knuiman and Speed (1988) Knuiman, M. W. and T. P. Speed (1988). Incorporating prior information into the analysis of contingency tables. Biometrics 44(4), 1061–1071.
 Kunihama and Dunson (2013) Kunihama, T. and D. B. Dunson (2013). Bayesian modeling of temporal depen dence in large sparse contingency tables. Journal of the American Statistical Association 108(504), 1324–1338.
 Laird (1978) Laird, N. M. (1978). Empirical Bayes methods for twoway contingency tables. Biometrika 65, 581–590.
 Lazar et al. (2008) Lazar, R., G. Meeden, and D. Nelson (2008). A noninformative Bayesian approach to finite population sampling using auxiliary variables. Survey Methodology 34(1), 51–64.
 Little (1983) Little, R. (1983). Comment on “An evaluation of modeldependent and probabilitysampling inferences in sample surveys”, by M. H. Hansen, W. G. Madow and B. J. Tepping. Journal of the American Statistical Association 78, 797–799.
 Little (1991) Little, R. (1991). Inference with survey weights. Journal of Official Statistics 7, 405–424.
 Little (1993) Little, R. (1993). Poststratification: A modeler’s perspective. Journal of the American Statistical Association 88, 1001–1012.
 Little (2004) Little, R. (2004). To model or not to model? Competing modes of inference for finite population sampling inference for finite population sampling. Journal of the American Statistical Association 99, 546–556.
 Little (2011) Little, R. (2011). Calibrated Bayes, for statistics in general, and missing data in particular. Statistical Science 26, 162–174.
 Little and Rubin (2002) Little, R. and D. B. Rubin (2002). Statistical Analysis with Missing Data. Wiley.
 Little and Wu (1995) Little, R. and M.M. Wu (1995). Models for contingency tables with known margins when target and samples populations differ. Journal of the American Statistical Association 86(413), 87–95.
 Meng and Rubin (1993) Meng, X.L. and D. B. Rubin (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80(2), 267–278.
 Park et al. (2005) Park, D. K., A. Gelman, and J. Bafumi (2005). Statelevel opinions from national surveys: Poststratification using multilevel logistic regression. In J. E. Cohen (Ed.), Public Opinion in State Politics. Standord University Press.
 Rao and Molina (2015) Rao, J. and I. Molina (2015). Small Area Estimation. John Wiley & Sons, Inc.
 Rubin (1983) Rubin, D. B. (1983). Comment on “An evaluation of modeldependent and probabilitysampling inferences in sample surveys,” by M. H. Hansen, W. G. Madow and B. J. Tepping. Journal of the American Statistical Association 78, 803–805.
 Särndal et al. (1992) Särndal, C.E., B. Swensson, and J. H. Wretman (1992). Model Assisted Survey Sampling. Springer, New York.
 Schafer (1997) Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. London: Chapman & Hall.
 Schifeling and Reiter (2016) Schifeling, T. A. and J. P. Reiter (2016). Incorporating marginal prior information in latent class models. Bayesian Analysis 11(2), 499–518.
 Schouten (2018) Schouten, B. (2018). Statistical inference based on randomly generated auxiliary variables. Journal of the Royal Statistical Society, Statistical Methodology, Series B 80(1), 33–56.
 Si (2019) Si, Y. (2019). Codes for manuscript ”bayesraking: Bayesian finite population inference with known margins”. https://github.com/yajuansisophie/raking.
 Si and Gelman (2014) Si, Y. and A. Gelman (2014). Survey weighting for New York Longitudinal Survey on Poverty Measure. Technical report, Columbia University.
 Si et al. (2015) Si, Y., N. S. Pillai, and A. Gelman (2015). Bayesian nonparametric weighted sampling inference. Bayesian Analysis 10(3), 605–625.
 Si et al. (2018) Si, Y., R. Trangucci, J. S. Gabry, and A. Gelman (2018). Bayesian hierarchical weighting adjustment and survey inference. Survey Methodology (revision invited); https://arxiv.org/abs/1707.08220.
 Smith (1983) Smith, T. M. F. (1983). On the validity of inferences from nonrandom sample. Journal of the Royal Statistical Society. Series A (General) 146(4), 394.
 Stan Development Team (2017) Stan Development Team (2017). Stan modeling language user’s guide and reference manual. http://mcstan.org.
 Stan Development Team (2018) Stan Development Team (2018). Stan: A C++ library for probability and sampling. http://mcstan.org.
 Stephan (1942) Stephan, F. F. (1942). An iterative method of adjusting sample frequency tables when expected marginal totals are known. The Annals of Mathematical Statistics 13(2), 166–178.
 Vehtari et al. (2017) Vehtari, A., A. Gelman, and J. S. Gabry (2017). Practical bayesian model evaluation using leaveoneout crossvalidation and WAIC. Statistics and Computing 27, 1413–1432.
 Wimer et al. (2014) Wimer, C., I. Garfinkel, M. Gelblum, N. Lasala, S. Phillips, Y. Si, J. Teitler, and J. Waldfogel (2014). Poverty tracker—monitoring poverty and wellbeing in NYC. Columbia Population Research Center and Robin Hood Foundation.
 Zheng and Little (2005) Zheng, H. and R. Little (2005). Inference for the population total from probabilityproportionaltosize samples based on predictions from a penalized spline nonparametric model. Journal of Official Statistics 21, 1–20.
 Zheng and Little (2003) Zheng, H. and R. J. Little (2003). Penalized spline modelbased estimation of the finite populations total from probabilityproportionaltosize samples. Journal of Official Statistics 19(2), 99–107.
Comments
There are no comments yet.