Perturbed factor analysis: Improving generalizability across studies

10/07/2019 ∙ by Arkaprava Roy, et al. ∙ Duke University 0

Factor analysis is routinely used for dimensionality reduction. However, a major issue is `brittleness' in which one can obtain substantially different factors in analyzing similar datasets. Factor models have been developed for multi-study data by using additive expansions incorporating common and study-specific factors. However, allowing study-specific factors runs counter to the goal of producing a single set of factors that hold across studies. As an alternative, we propose a class of Perturbed Factor Analysis (PFA) models that assume a common factor structure across studies after perturbing the data via multiplication by a study-specific matrix. Bayesian inference algorithms can be easily modified in this case by using a matrix normal hierarchical model for the perturbation matrices. The resulting model is just as flexible as current approaches in allowing arbitrarily large differences across studies, but has substantial advantages that we illustrate in simulation studies and an application to NHANES data. We additionally show advantages of PFA in single study data analyses in which we assign each individual their own perturbation matrix, including reduced generalization error and improved identifiability.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 5

page 12

page 14

page 15

page 18

page 19

page 20

page 22

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Factor models are routinely used to characterize high dimensional, correlated data. The most commonly used factor model for -variate data is

(1.1)

where it is assumed that data are centered prior to analysis, are latent factors, is a factor loadings matrix, and

is a diagonal matrix of residual variances. Then the overall covariance matrix of the data is

.

We are particularly motivated to analyze phthalate concentrations in blood or urine across demographic groups, collected in the National Health and Nutrition Examination Survey (NHANES). Exposures to phthalates are ubiquitous. They are present in soft plastics, including vinyl floors, toys and food packaging. Medical supplies such as blood bags and tubes contain phthalates. They are also found in fragrant products such as soap, shampoo, lotion, perfume and scented cosmetics. Because metabolites of a single parent compound or chemicals in the same family are often moderately to highly correlated, it is common to attempt to identify a small set of underlying factors (for example weissenburger2017principal

). Epidemologists are interested in interpreting these factors and in using them in analyses relating exposures to health outcomes. However, our collaborators have noticed a tendency to estimate very different factors in applying factor analysis to similar datasets or groups of individuals. For reliable interpretability and generalizability, it is important to reduce this brittleness and may be desirable to have a method that can produce a single set of factors that hold across groups.

There has been work on developing factor analysis methods that are robust in the presence of outliers

(pison2003robust). However, our motivation is different. Previous work extending factor analysis to multiple groups has focused on estimation of shared covariance structure. Simple methods include extensions of PCA to multiple groups, while more recently many customized methods have been developed. A key contribution was JIVE (Joint and Individual Variation Explained), which identifies low-rank covariance structure that is shared among multiple data types collected for the same subject (lock2013joint; feng2015non; feng2018angle). In a Bayesian framework, roy2019bayesian proposed TACIFA (Time Aligned Common and Individual Factor Analysis) for a related problem. Other methods include MSFA and BMSFA (Bayesian Multi-study Factor Analysis), which include inference over the number of shared latent factors. BMSFA takes a Bayesian approach to sampling , focusing on situations with , and uses a new post-hoc method for recovering the loadings matrix (assmann2016bayesian; de2018bayesian). In these approaches, the total covariance is decomposed into three parts, a common/shared covariance, a group-specific covariance and an error or residual variance. One drawback to these methods is the computational complexity of determining the number of shared versus group-specific factors.

We aim to identify a single set of factors under the assumption that the data in each group can be aligned to a common latent space via multiplication by a perturbation matrix. We represent the perturbed covariance in group as , where is the common covariance, and is the group-specific perturbation matrix. As in the common factor model in  (1.1), the overall covariance can be decomposed into a component due to common factors and a residual variance. The utility of the perturbation model also extends beyond multi-group settings. In the common factor model, the error terms only account for additive measurement error. We can obtain robust estimates of factor loadings in a single dataset by allowing for observation-specific perturbations. This accounts for both multiplicative and additive measurement error. In this case, we define separate

’s for each data vector

. Here ’s are multiplicative random effects with mean . Thus, and the covariance structure on would determine the variability of .

Another challenge in factor analysis is non-identifiability. In particular, due to non-identifiability of the loading matrices and factors of  (1.1), posterior samples of the loading matrices are not interpretable without identifiability restrictions (seber2009multivariate; lopes2004bayesian; rovckova2016fast; fruehwirth2018sparse). Instead of imposing arbitrary constraints, with corresponding computational complications, a common technique is post-processing an unconstrained MCMC chain (assmann2016bayesian; roy2019bayesian). assmann2016bayesian calculate a post-processing estimate by solving an Orthogonal Procrustes problem, but without uncertainty quantification (UQ). roy2019bayesian

propose post-processing the entire MCMC chain iteratively to be able to draw inference with UQ. The latent factors in our model are assumed to be heteroscedastic, unlike in the usual factor model in (

1.1). This removes rotational ambiguity in the loading matrix, except for permutations, and helps to estimate loading matrices much more accurately even without any post-processing. We find this heteroscedastic model can also improve performance in estimating the covariance.

The next section describes the data and the model in detail. In Section 3, prior specifications are discussed. Our computational scheme is outlined in Section 4. We study the performance of our method in different simulation setups in Section 5. Section LABEL:real considers an application to NHANES data. We end with some concluding remarks in Section 6.

2 Data description and modeling

In the National Health and Nutrition Examination Survey (NHANES), chemical levels in urine are recorded over the span 2009 to 2013 for 2749 individuals. We consider in total eight phthalate metabolite chemicals, Mono-n-butyl (MnBP), Mono-isobutyl (MiBP), Mono-ethyl (MEP), Mono-benzyl (MBeP), Mono-2-ethyl-5-carboxypentyl (MECPP), Mono-(2-ethyl-5-hydroxyhexyl) (MEHHP), Mono-(2-ethyl-5-oxohexyl) (MEOHP) and Mono-(2-ethyl)-hexyl (MEHP), and 5 racial groups, Group 1: Mexican American (Mex), Group 2: Other Hispanic (OH), Group 3: Non-Hispanic White (N-H White), Group 4: Non-Hispanic Black (N-H Black) and Group 5: Other Race (Other/Multi) - Including Multi-Racial. We consider race and ethnicity because previous work has shown differences in patterns of use of products that contain phthalates (taylor2018associations) and in measured phthalate concentrations (james2017racial) across racial or ethnic groups. Excess levels of phthalates in blood/urine have been linked to a variety of health outcomes, including obesity (zhang2014age; kim2014phthalate; benjamin2017phthalates).

Mex OH N-H White N-H Black Other/Multi
Number of participants 566 293 1206 516 168
MnBP 3.85 4.31 3.56 4.21 3.76
(1.96) (4.96) (2.17) (1.81) (2.11)
MiBP 2.88 3.21 2.52 3.47 2.79
(1.36) (1.70) (1.20) (1.67) (1.21)
MEP 8.32 9.51 6.91 11.16 7.33
(6.64) (6.99) (5.43) (9.38) (7.57)
MBeP 2.79 2.78 2.70 3.06 2.56
(1.58) (1.61) (1.67) (1.77) (1.68)
MECPP 4.80 4.55 4.16 4.36 4.34
(3.68) (2.68) (2.40) (2.38) (2.64)
MEHHP 3.83 3.73 3.45 3.79 3.57
(3.01) (2.59) (2.25) (2.26) (2.45)
MEOHP 3.11 3.00 2.79 3.05 2.86
(2.41) (1.92) (1.69) (1.71) (1.86)
MEHP 1.58 1.59 1.33 1.61 1.58
(1.23) (1.29) (0.89) (0.99) (1.31)
Table 1:

Comparison across different groups in terms of number of participants and average chemical levels along with standard deviations in brackets.

Figure 1: Plots of the covariance matrices of the phthalate chemicals across different ethnic groups for NHANES data.
Figure 2: Estimated loading matrices for different racial groups.
Mex OH N-H White N-H Black Other/Multi
Mex 0.00 31.84 108.59 191.93 28.04
OH 31.84 0.00 156.45 63.48 28.69
N-H White 108.59 156.45 0.00 409.64 55.51
N-H Black 191.93 63.48 409.64 0.00 101.90
Other/Multi 28.04 28.69 55.51 101.90 0.00
Table 2: Hotelling statistic between phthalate levels for each pair of groups.

We provide a summary of the average chemical level across different groups in Table 1. In Table 2, we compute Hoteling statistic between each pair of groups. The three groups Mex, OH and Other/Multi seem to be closer to each other than the other two groups. For each phthalate level, we also fit a one-way ANOVA model to determine differences across the groups taking the group Mex as baseline. The results are included in Tables 6 to 13 in Section More exploratory analysis on NHANES data. For almost all the cases, effect of the group N-H White is significantly different from the others. We plot group specific covariances in Figure 1. It is evident that there are shared structures with minor differences across the groups. However, the estimated loading matrices are different when we fit the common factor model in (1.1) separately for each group. In the next subsection, we propose a novel methodology for multi-group factor analysis to identify the common factors.

2.1 Multi-group model

Assume that we observe multiple groups, with the “groups” corresponding to different studies, batches of data, or levels of a covariate within a single study. Each -dimensional response , belonging to group , for and , is modeled as:

(2.1)

The perturbation matrices are of dimension

, and follow a matrix normal distribution, with diagonal row and column covariances

. The latent factors are heteroscedastic, so that is diagonal with non-identical entries. We discuss the advantages of choosing heteroscedastic latent factors in more detail in Section 2.3. After integrating out the latent factors, the observations are marginally distributed as . If we write , then is the shared loadings matrix, and is a group-specific loadings matrix. We can quantify the magnitude of perturbation as , where stands for the Frobenius norm. For identifiability of ’s, we consider and . We call our model Perturbed Factor Analysis (PFA).

2.1.1 Model properties

Let and be two positive definite (p.d.) matrices. Then there exist non-singular matrices and , where and . By choosing , we have . If the two matrices and are close, then will be close to the identity. However, is not required to be symmetric. In our multi-group model in (2.1), the s allow for small perturbations around the shared covariance matrix . We define the following class for the ’s,

The index controls the amount of deviation across groups. In (2.1) we define , for some small , which we call the perturbation parameter. By choosing to be diagonal with identical entries, we impose uniform perturbation across all the rows and columns around . However, the perturbation matrices themselves are not required to be symmetric.

Lemma 1.

.

The proof follows from Chebychev’s inequality. This result allows us to summarize the level of induced perturbation for any given . Using this lemma, we can show the following corollary.

Lemma 2.

Therefore the Kullback-Leibler divergence between the marginal distributions of any two groups

and can be bounded by up to some constant. We define a divergence statistic between groups and as . This generates a divergence matrix , where larger imply a greater difference between the two groups.

In our multi-group model, there is no computational complication regarding specification of the ranks of the shared and individual spaces unlike other methods. We can directly rely on current methods for accommodating unknown number of factors in the single group setting (bhattacharya2011sparse)

. Also, computationally our method is faster than other multi-group factor models such as BMSFA due to smaller number of parameters, and better identifiability, leading to better mixing of Markov Chain Monte Carlo (MCMC).

2.1.2 Tuning the parameter for multi-group data

The hyper-parameter controls the level of perturbation across groups. In Section 5, we show that properly tuning is necessary to obtain accurate estimates of the loading matrix . We propose a cross validation technique to choose the optimal . We randomly divide each group into training and test sets. Then for a range of values, we fit the model on the training data, and calculate the predictive log-likelihood of the test set. After integrating out the latent factors, the predictive distribution is . If there are multiple values of with similar predictive log-likelihoods, then the smallest is chosen as optimal.

Alternatively, we can take a fully Bayesian approach and put a prior on . We call this method Fully Bayesian Perturbed Factor Analysis (FBPFA). We see that FBPFA performs similarly to PFA in practice. FBPFA involves a slightly more complex MCMC implementation. PFA avoids sensitivity to the prior for but requires the computational overhead of cross validation, and can potentially be less efficient in requiring a hold out sample, while also not characterizing uncertainty in estimating .

2.1.3 Posterior consistency

Let be the parameter spaces of and , respectively, be the set of positive semidefinite matrices corresponding to the parameter space of , and be the parameter space of . Let be the priors for and ’s. We restate some of the results from bhattacharya2011sparse for our modified factor model. With minor modification, the proofs will remain the same.

Let be a continuous map such that .

Lemma 3.

For any , we have .

The proof is similar to Lemma 1 of bhattacharya2011sparse. In our Bayesian approach, we choose independent priors for , and and that induces a prior on through the map . We also have following proposition.

Proposition 4.

If , then .

The proof is similar to Proposition 1 of bhattacharya2011sparse with minor modifications. Now, we proceed to establish that the posterior of our multigroup model is weakly consistent under a fixed and increasing regime. Let us assume that the complete parameter space is and let be the truth for .

Assumptions:

  1. For some , the true perturbation matrices are , with defined in Section 2.1.1.

  2. There exists some , and such that and for all .

Theorem 5.

Under Assumptions 1-2, the posterior for is weakly consistent at .

We first show that our proposed prior has large support in the sense that the truth belongs to the Kullback-Leibler support of the prior. Thus the posterior probability of any neighbourhood around the truth converges to one in

-probability as goes to as a consequence of schwartz1965bayes. Here is the distribution of a sample of observations with parameter . Hence, the posterior is weakly consistent. The proof is in Section Appendix.

2.2 Measurement-error model

We can modify the multi-group model to obtain improved factor estimates on a single dataset by considering observation-level perturbations. Here we observe ’s for , which are many proxies of true observation with multiplicative measurement errors ’s such that . The modified factor model is

(2.2)

In this model, the ’s apply a multiplicative perturbation to each data vector. We have where is a matrix. Thus, here the measurement errors are and . This model is different from the multiplicative measurement error model of sarkar2018bayesian. In sarkar2018bayesian, observations ’s are modeled as , where denotes the element wise dot product and ’s are independent of with . Thus, the measurement error in the - component (i.e. ) is dependent on primarily through . However in our construction, it depends on the entire vector . Thus, the measurement errors are a linear function of the entire true observation .

This is a much more general setup than sarkar2018bayesian. The identifiability of distributions of and is difficult. For simplicity, we again assume . In this case we have,

Thus, only the diagonal elements of are not identifiable and the perturbation parameter does not influence the dependence structure among the variables. Hence, with our heteroscedastic latent factors we can still recover the loading structure. To tune , we can use the marginal distributions MVN to develop a cross validation technique when for all as in Section 2.1.2. However, when for all , this is not possible. Thus, we propose to estimate

as in FBPFA by using a weakly informative prior concentrated on small values, while assessing sensitivity to hyperparameter choice.

2.3 The special case for all

For a single study dataset without any measurement error, we can modify our PFA method by taking for all . Then the model reduces to a traditional factor model with heteroscedastic latent factors:

(2.3)

where is assumed to be diagonal with non-identical entries. Integrating out the latent factors, the marginal likelihood is . Except for the diagonal matrix , the marginal likelihood is almost similar to the marginal likelihood for a traditional factor model. As has non-identical diagonal entries, the likelihood is no longer invariant under arbitrary rotations. For the factor model in 1.1, and have equivalent likelihood for any orthonormal matrix . This is not the case in our model unless is a permutation matrix. Thus, this simple modification over the traditional factor model helps to efficiently recover the true loading structure. This is demonstrated in Case 1 of Section 5. We can also show that the posterior is weakly consistent based on the results established in Section 2.1.3.

3 Prior

As in bhattacharya2011sparse, we put the following prior on to allow for automatic selection of rank and easy posterior computation:

The parameters control local shrinkage of the elements in , whereas controls global shrinkage of the - column. For , we have stochastically increasing with , which imposes greater shrinkage for columns with higher column index.

For the heteroscedastic latent factors, each diagonal element of has an independent prior:

for some constant . In our simulations, we see that does not influence much the predictive performance of our method. However, as increases, it puts more shrinkage on the latent factors. We choose for most of our simulations. For the residual error variance , we place a weakly information prior on the diagonal elements:

In our simulations, weakly informative IG(0.1,0.1) prior on works well to get good predictive performance and to estimate the loading structure, even for a single study dataset.

4 Computation

Posterior inference is straightforward with a Gibbs sampler, because all of the parameters have conjugate full conditional distributions. For the model in (2.1), the full conditional of the perturbation matrix is:

where , and . The full conditionals for all other parameters are described in bhattacharya2011sparse, replacing by . For the model in (2.2), the full conditional of is:

where , and . Other parameters can again be updated using the results in bhattacharya2011sparse replacing by . If we put a prior on , the posterior distribution of is given by IG for the model in (2.1) and for the model in (2.2), the posterior distribution of is IG.

5 Simulation Study

In this section, we study the performance of our method in various simulation settings. As ground truth, we consider the two loading matrices given in Figure 3. These are similar to the ones considered in rovckova2016fast. The two loading matrices have equal number of columns which is 5. The first loading matrix has 21 rows and the second one has 128 rows. We compare the estimated loading matrices for different choices of hyperparameters. We investigate the perturbation parameter and the shape parameter , which controls the level of shrinkage on the diagonal entries of .

Figure 3: The ground truth loading matrices of dimension (Loading 1) and (Loading 2) respectively.

For the single dataset case, we compare with the approach of bhattacharya2011sparse (B&D), which corresponds to the spacial case of our approach that fixes the perturbation matrices and latent factor covariances equal to the identity. A point estimate of the loading matrix for B&D is calculated by post-processing the posterior samples. We use the algorithm of assmann2016bayesian to rotationally align the samples of , as used in de2018bayesian. In contrast, our method does not require any post-processing. We simply take the mean of the posterior samples to estimate the loading matrix. For the multi-group case, we compare our estimates with Bayesian Multi-study Factor Analysis (BMSFA), which also requires post-processing to remove rotational ambiguity. We use the MSFA package from https://github.com/rdevito/MSFA.

We compare the different methods using predictive likelihood. Simulated data are randomly divided into training and test sets. After fitting each model on the training data, we calculate the predictive likelihood of the test set. All methods are run for iterations of the Gibbs sampler, with burn-in samples. Each simulation is replicated times.

5.1 Case 1: Single dataset,

We first consider factor analysis of a single dataset. Starting with the two loading matrices in Figure 3, we simulate latent factors from N, and generate datasets of observations, with residual variance . We compare B&D with our method when , so the only adjustment is to use heteroscedastic latent factors. Note that the simulated latent factors actually have identical variances.

In Table 3 and 4, we compare methods by MSE of the estimated versus true covariance matrix. For loading matrix , Bayesian FA had an MSE of , which is dominated by our method across a range of values of .

d MSE
0.1 1.04
10 1.32
100 1.38
Table 3: MSE of estimated covariance matrix for loading matrix , across different values of

For loading matrix , Bayesian FA had a MSE of . Again, our method beats this across a range of values of .

d MSE
0.1 5.21
10 9.45
100 10.32
Table 4: MSE of estimated covariance matrix for loading matrix , across different values of

We compare the estimated and true loading matrices in Figure 4. Our method performs overwhelmingly better at estimating the true loading structure compared with B&D FA. The first five columns of the estimated loadings based on our method are very close to the true loading structure under some permutation. Having heteroscedastic latent factors give us such improved performance.

Figure 4: Comparison of true and estimated loading matrices and covariance matrices for Case . Comparison of B&D FA with our method for different choices of . Results for loading matrix 1 are in (a) and 2 in (b). For all the cases, we fix .
Mex OH N-H White N-H Black Other/Multi
Mex 0.00 0.75 0.70 0.59 0.78
OH 0.75 0.00 0.63 0.75 0.73
N-H White 0.70 0.63 0.00 0.68 0.84
N-H Black 0.59 0.75 0.68 0.00 0.91
Other/Multi 0.78 0.73 0.84 0.91 0.00
Table 5: Square norm difference of the group-specific loadings, estimated using BMSFA.

6 Discussion

Our over-arching goal was to develop a less brittle method for estimating factors that hold across similar datasets and groups within a dataset. In many applications, it is important to define a more generalizable set of factors, improving on the current status in which one often obtains dramatically different factors when fitting a factor model separately to different groups or using current additive ANOVA-type factor models. A primary innovation of the article is to incorporate a multiplicative perturbation of the data. As we have shown, this can be included in many different contexts - ranging from meta analysis, to multiple group data, to measurement error modeling, and even to obtain improved performance when there is no group or replicated structure in the data. We have demonstrated exciting improvements in performance in all of these settings.

It is our hope that the proposed method can be used directly by epidemiologists, and scientists in other fields, to obtain more generalizable factors in their applications. With this goal in mind, we have provided code for implementing the proposed approach at https://github.com/royarkaprava/Perturbed-factor-model. To make the method more broadly useful, there are a number of interesting extensions, some trivial and some less so. The first is to adjust for covariates - this can be very easily done without modifying the fundamental approach with a simple tweak to the proposed Gibbs sampler. Another important modification is to allow data that are not simply all continuous variables but may be categorical or mixed categorical and continuous. This can rely on an underlying variable model, as is often used in factor analysis (see, for example, carvalho2008high; zhou2015bayesian). In this case, we would incorporate the perturbation in the underlying variables instead of in the observed data directly. Again, this leads to a minor modification of the proposed Gibbs sampling algorithm.

We conjecture that both the perturbation ideas and the idea of including heterogeneous factor variances to improve identifiability will have impact beyond the specific setting we have considered in this paper. For example, one natural extension is to non-linear factor models, such as Gaussian process latent variable models (lawrence2004gaussian; lawrence2006local) and variational auto-encoders (kingma2013auto; pu2016variational).

Appendix

Proof of Theorem 5

Proof.

For the space of probability measure , let the Kullback-Leibler divergences be given by

Let denote the Kullback-Leibler divergence

For our model, we have

To prove Theorem 5, we rely on the following Lemma.

Lemma 6.

For any , there exists , and for such that , and , then we have

Due to continuity of the functions such as determinant, trace and , the above result is immediate following the proof of Theorem 2 in bhattacharya2011sparse

. For our proposed priors, the prior probability of the R.H.S. of Lemma 

6 is positive. Thus the prior probability of any Kullback-Leibler neighborhood around the truth is positive. This proves Theorem 5. ∎

More exploratory analysis on NHANES data

For each chemical level, we fit an one-way ANOVA model to analyse group-specific effects on each phthalate level separately.

Estimate Std. Error t value Pr()
(Intercept) 3.85 0.11 36.27 0.00
N-H Black 0.36 0.15 2.33 0.02
N-H White -0.29 0.13 -2.26 0.02
OH 0.46 0.18 2.55 0.01
Other/Multi -0.09 0.22 -0.41 0.68
Table 6: For phthalate MnBP
Estimate Std. Error t value Pr()
(Intercept) 2.88 0.06 49.23 0.00
N-H Black 0.59 0.08 6.97 0.00
N-H White -0.35 0.07 -4.98 0.00
OH 0.33 0.10 3.28 0.00
Other/Multi -0.08 0.12 -0.68 0.50
Table 7: For phthalate MiBP
Estimate Std. Error t value Pr()
(Intercept) 8.32 0.29 28.80 0.00
N-H Black 2.84 0.42 6.79 0.00
N-H White -1.41 0.35 -4.02 0.00
OH 1.18 0.49 2.40 0.02
Other/Multi -0.99 0.60 -1.64 0.10
Table 8: For phthalate MEP
Estimate Std. Error t value Pr()
(Intercept) 2.79 0.07 39.90 0.00
N-H Black 0.27 0.10 2.66 0.01
N-H White -0.09 0.08 -1.05 0.29
OH -0.01 0.12 -0.06 0.95
Other/Multi -0.23 0.15 -1.57 0.12
Table 9: For phthalate MBeP
Estimate Std. Error t value Pr()
(Intercept) 4.80 0.12 41.52 0.00
N-H Black -0.44 0.17 -2.63 0.01
N-H White -0.64 0.14 -4.56 0.00
OH -0.25 0.20 -1.28 0.20
Other/Multi -0.46 0.24 -1.89 0.06
Table 10: For phthalate MECPP
Estimate Std. Error t value Pr()
(Intercept) 3.83 0.10 36.85 0.00
N-H Black -0.04 0.15 -0.26 0.79
N-H White -0.39 0.13 -3.06 0.00
OH -0.10 0.18 -0.56 0.58
Other/Multi -0.26 0.22 -1.19 0.23
Table 11: For phthalate MEHHP
Estimate Std. Error t value Pr()
(Intercept) 3.11 0.08 39.04 0.00
N-H Black -0.06 0.12 -0.54 0.59
N-H White -0.33 0.10 -3.39 0.00
OH -0.11 0.14 -0.83 0.41
Other/Multi -0.26 0.17 -1.54 0.12
Table 12: For phthalate MEOHP
Estimate Std. Error t value Pr()
(Intercept) 1.58 0.04 35.44 0.00
N-H Black 0.03 0.06 0.44 0.66
N-H White -0.25 0.05 -4.57 0.00
OH 0.01 0.08 0.10 0.92
Other/Multi 0.00 0.09 0.02 0.99
Table 13: For phthalate MEHP

7 Acknowledgments

This research was partially supported by grant R01-ES027498-01A1 from the National Institute of Environmental Health Sciences (NIEHS) of the National Institutes of Health (NIH).

References