1 Introduction
Marginal loglinear models, Bergsma and Rudas (2002), were conceived to construct discrete multivariate distributions subject to restrictions imposed, simultaneously, on different marginals. Consider the simple context where denotes a treatment, one or more variables which might be affected by and may influence the response which, for simplicity, we assume to be binary. In this context, we might be interested in the marginal distributions and
in addition to the joint distribution.
1.1 Notations and preliminary results
A list of variables, say , shortened as will be used to denote both a marginal distribution and the interaction among the variables in the list; let , denote two such lists with ; will denote the loglinear interactions defined within the marginal , coded either as contrasts between adjacent categories (Ac) or with respect to a reference category (Rc), depending on the context; in both cases, variables in will be set to the initial or reference category, coded as 0. When and are quantitative, the linear logistic model including the interaction has the form
(1) 
where the interaction is equal to under Ac and to under Rc.
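To make the difference between the two codings concrete, the following Python sketch assumes, purely for illustration, a linear logistic model with a product term in two quantitative variables scored x = 0, 1, 2, 3 and z = 0, 1, 2, that is logits of the form b0 + bx x + bz z + bxz x z; the function names and numerical values are hypothetical. It computes the interaction contrasts of the logits under Ac and Rc: under Ac the contrast is the constant bxz, whereas under Rc it equals bxz x z.

```python
from itertools import product

def logit_grid(b0, bx, bz, bxz, nx, nz):
    """Logits on the grid x = 0..nx-1, z = 0..nz-1 for a hypothetical
    linear logistic model with a product term."""
    return [[b0 + bx * x + bz * z + bxz * x * z for z in range(nz)]
            for x in range(nx)]

def interaction_ac(eta, x, z):
    """Adjacent-category (Ac) contrast over cells (x-1, x) by (z-1, z)."""
    return eta[x][z] - eta[x - 1][z] - eta[x][z - 1] + eta[x - 1][z - 1]

def interaction_rc(eta, x, z):
    """Reference-category (Rc) contrast with respect to cell (0, 0)."""
    return eta[x][z] - eta[0][z] - eta[x][0] + eta[0][0]

eta = logit_grid(b0=-1.0, bx=0.4, bz=0.7, bxz=0.25, nx=4, nz=3)
for x, z in product(range(1, 4), range(1, 3)):
    # Ac stays at bxz = 0.25, while Rc equals bxz * x * z
    print(x, z, round(interaction_ac(eta, x, z), 6), round(interaction_rc(eta, x, z), 6))
```

Note that each Rc contrast is the sum of the Ac contrasts over the rectangle between (0, 0) and (x, z), which is why the two codings agree only at x = z = 1.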
To introduce the mixed parametrization, recall that in a general multiway table with cells, the saturated model may be parameterized as
(2) 
where is made of linearly independent columns which do not span the vector of ones and is a vector of loglinear (canonical) parameters. Let be the left inverse of such that = ; then (2) may be inverted as = . Note the one-to-one correspondence between the rows of , the columns of and the loglinear parameters. Define the vector of mean parameters = ; clearly there is a one-to-one correspondence between the elements of and . Let be the collection of columns of that correspond to the set of interactions in ; then the vector = has the same size as . Given a partition of the collection of all possible interactions for the joint distribution into two disjoint sets , the mixed parametrization (Barndorff-Nielsen, 1978, pp. 121-122) is made of and has the following properties:
Lemma 1.
(i) there is a one-to-one mapping between and , (ii) the two components of the mixed parametrization are variation independent and (iii) the expected information matrix is block diagonal.
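For a 2 x 2 table the mixed parametrization reduces to a familiar case. With corner-point coding and two binary variables, here called A and B only for illustration, the mean parameters of the two main effects are P(A = 1) and P(B = 1), and the canonical parameter of the interaction is the log odds ratio. The Python sketch below (hypothetical function names) maps a table to these three quantities and back, the inverse being the admissible root of a quadratic equation. Any pair of marginal probabilities in (0, 1) combined with any real log odds ratio yields a valid table, which illustrates parts (i) and (ii) of Lemma 1; part (iii) is not illustrated.

```python
import math

def to_mixed(p):
    """p = (p00, p01, p10, p11), cells indexed by (a, b).
    Returns (P(A=1), P(B=1), log odds ratio)."""
    p00, p01, p10, p11 = p
    eta_a = p10 + p11                             # mean parameter of A
    eta_b = p01 + p11                             # mean parameter of B
    theta_ab = math.log(p11 * p00 / (p10 * p01))  # canonical AB parameter
    return eta_a, eta_b, theta_ab

def from_mixed(eta_a, eta_b, theta_ab):
    """Inverse map: solve psi = p11 p00 / (p10 p01) for p11 given the margins."""
    psi = math.exp(theta_ab)
    s = 1.0 + (psi - 1.0) * (eta_a + eta_b)
    disc = s * s - 4.0 * psi * (psi - 1.0) * eta_a * eta_b
    # rationalized form of the admissible root; also valid when psi == 1
    p11 = 2.0 * psi * eta_a * eta_b / (s + math.sqrt(disc))
    p10 = eta_a - p11
    p01 = eta_b - p11
    p00 = 1.0 - eta_a - eta_b + p11
    return p00, p01, p10, p11

# any margins in (0, 1) and any log odds ratio give a valid table
p = from_mixed(0.3, 0.8, -1.5)
print([round(v, 4) for v in p], [round(v, 4) for v in to_mixed(p)])
```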
The following results on the differential properties of the mixed parametrization will be used later: let and denote the distribution within the marginal ; let = ; then we have (see Forcina, 2012, Lemmas 3 and 4):
Lemma 2.
in addition, is symmetric and positive definite if the elements of are strictly positive.
2 Main results
It is well known that the parameters in the marginal logistic models for , and do not determine those in (1); the mixed parametrization allows us to sharpen this result as follows:
Proposition 1.
(i) The parameters of the three logistic regression models defined on the marginals , , are variation independent of . (ii) If , then the parameters of the three marginals uniquely determine the joint distribution.
Proof: the loglinear parameters within the marginals are uniquely determined by the set of mean parameters which are variation independent of = . The above list of mean parameters together with constitutes a mixed parametrization of the joint distribution; thus (ii) follows from Lemma 1.
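For three binary variables, the mixed parametrization used in the proof pairs the three two-way marginal distributions with the three-way loglinear interaction, and this can be checked numerically. The sketch below uses iterative proportional fitting (IPF), a standard algorithm not discussed in the text, started from a table carrying a chosen three-way interaction. Each IPF step rescales the cells by a factor that depends on a single two-way margin, so the three-way interaction is left unchanged while the two-way margins converge to their targets. With the true three-way interaction the original joint distribution is recovered; with a different value we obtain a different joint distribution with the same two-way margins. The target table and the function names are hypothetical.

```python
import math
from itertools import product

CELLS = list(product((0, 1), repeat=3))   # cells (v0, v1, v2) of a 2x2x2 table
PAIRS = ((0, 1), (0, 2), (1, 2))          # the three two-way margins

def margin(p, keep):
    """Two-way marginal table of p (dict cell -> probability) on the axes in keep."""
    m = {}
    for c, v in p.items():
        key = tuple(c[i] for i in keep)
        m[key] = m.get(key, 0.0) + v
    return m

def three_way(p):
    """Three-way loglinear interaction (corner-point coding) of a 2x2x2 table."""
    return sum((-1) ** (3 - sum(c)) * math.log(p[c]) for c in CELLS)

def ipf(targets, theta, n_cycles=500):
    """Fit the two-way margins in targets, starting from a table whose
    three-way interaction equals theta; IPF steps never change it."""
    p = {c: math.exp(theta * c[0] * c[1] * c[2]) for c in CELLS}
    total = sum(p.values())
    p = {c: v / total for c, v in p.items()}
    for _ in range(n_cycles):
        for keep in PAIRS:
            cur = margin(p, keep)
            p = {c: v * targets[keep][tuple(c[i] for i in keep)]
                 / cur[tuple(c[i] for i in keep)] for c, v in p.items()}
    return p

# hypothetical joint distribution
w = dict(zip(CELLS, (10, 5, 8, 12, 6, 9, 7, 13)))
target = {c: v / sum(w.values()) for c, v in w.items()}
targets = {keep: margin(target, keep) for keep in PAIRS}

same = ipf(targets, three_way(target))          # recovers the target table
other = ipf(targets, three_way(target) + 1.0)   # same margins, different joint
print(max(abs(same[c] - target[c]) for c in CELLS))
print(max(abs(other[c] - target[c]) for c in CELLS))
```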
Remark 1.
For the model in (1), Stanghellini and Doretti (2019) derived an expression for = , where is the regression coefficient of in the linear logistic model defined within the marginal distribution. For the case of a multivariate discrete distribution on a set of binary random variables, an expression for the difference between the same interaction parameters defined within two different marginals, say , was derived by Evans (2015, Theorem 3.1). In the Appendix we rewrite the latter result for the case where and are discrete and show that, by setting and , it is essentially equivalent to the result of Stanghellini and Doretti (2019). The following provides some additional insights into the relation between interaction parameters defined within different marginals:
Proposition 2.
Suppose that has size , then
(3) 
Proof: Follows from Lemma 2.
In the special case when , Proposition 2 simply says that and are variation independent, which is implicit in the derivation in Stanghellini and Doretti (2019). Additional features of the result are clarified in the example below.
Example 1.
Consider an distribution where are binary and has categories; suppose we have two probability distributions , with all loglinear parameters equal except for . Then the difference between corresponding pairs of marginal interactions is equal to .
It is well known that we cannot impose loglinear restrictions on the interactions both in the marginal and in the joint distribution; for a formal argument see Bergsma and Rudas (2002). However, Colombi and Forcina (2014) proved a result that, within the Rc coding and assuming that has categories, may be stated as follows:
Proposition 3.
Within , the marginal loglinear parametrization with elements
where is obtained from by deleting all elements with is a smooth parametrization of the saturated model.
In words, if we want to define (and possibly constrain) the interactions both in the marginal and in the joint, we need to remove a subset of the interactions corresponding to a fixed value of . This may be seen as added flexibility in the modelling process: if we are interested in imposing constraints on the interaction both in the marginal and in the joint, the price to pay is that we cannot model a subset of the interactions. This feature is illustrated in the next section.
3 Application
3.1 The data
The data come from the National Child Development Study (NCDS), a UK cohort study that included everybody born in the UK from March 3rd to March 9th 1958. Several variables concerning the parents and the child are recorded; a full description of the data set is available at http://cls.ucl.ac.uk/clsstudies/1958nationalchilddevelopmentstudy. In this simplified analysis, we consider the number of years of schooling of each parent, the parents’ concern about the education of the child shown at different stages (as recorded by the teachers), the weekly income of the parents and the academic qualification reached by the child, an ordered categorical variable with four categories. The issue of interest is the effect of parents’ education on that of the child. Intuitively, parents’ education might affect income, which in turn may offer better chances to the child. In addition, more educated parents might show more concern, being more aware of the importance of education. Direct effects may work through the atmosphere inside the family, such as having books and meeting more educated friends.
For simplicity, the analysis below is restricted to the sample of 2161 daughters; the response indicates whether the child obtained at least a high school degree; income and concern are dichotomized at the median. The exposure is a categorical variable with four levels obtained by splitting at quantiles the following measure of parents’ education
where denote the number of years of schooling of mother and father and is a penalty for unequally educated parents. We also assume there are two mediators: , the father's weekly income (that of the mother was ignored because of its large number of missing values), and , an average measure of the concern shown by the parents at different stages, as recorded by teachers. Finally, define = .
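The exact quantile rule used to split the education measure is not stated above. The sketch below assumes that the score has already been computed for each family and uses the three quartile cut points returned by Python's `statistics.quantiles` (default 'exclusive' method), assigning a value equal to a cut point to the upper category; the function name and the placeholder scores are hypothetical.

```python
import statistics
from bisect import bisect_right

def quartile_categories(scores):
    """Split scores into four ordered categories 0..3 at the sample quartiles."""
    cuts = statistics.quantiles(scores, n=4)   # three cut points
    return [bisect_right(cuts, s) for s in scores], cuts

# placeholder scores; the education measure itself is not reproduced here
scores = [float(s) for s in range(1, 101)]
cats, cuts = quartile_categories(scores)
print(cuts, [cats.count(k) for k in range(4)])
```

With many tied scores the four groups can be quite unequal, so the chosen tie rule matters in practice.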
3.2 Two alternative models
We compare two alternative models, both parameterized with the adjacent coding; because all variables except X are binary, assuming that, say, the adjacent interactions are constant in is equivalent to assuming that the logits of are linear functions of . However, because the evidence against linearity in was rather strong, the dependence on was left unconstrained.
M1: Define the overall effect of on in the corresponding marginal distribution; in addition, model the effect of on the mediators in the marginal . Define all other interactions within the joint , including the interactions; the parameters already in the model determine the interactions which cannot be modeled. Then we constrain to 0 the interactions in and the and interactions within ; this model fits well, with a deviance of 7.82 on 7 dof. Parameter estimates and standard errors for interaction parameters involving the term are given in Table 1.
Estimates under M1
Est.    s.e.    Est.    s.e.    Est.    s.e.
0.0066  0.1016  0.1948  0.2311  0.6587  0.1672
0.5990  0.3194  0.7854  0.1230  1.5142  0.2817
1.2045  0.1339  0.8004  0.1827  0.4604  0.1215
Estimates under M2
Est.    s.e.    Est.    s.e.    Est.    s.e.
0.0441  0.1035  0.3048  0.1729  0.3282  0.2287
0.6216  0.2649  0.3186  0.2581  0.6978  0.1236
0.7761  0.1346  0.0240  0.1182  0.0964  0.1807
Table 1: Estimates of interactions containing the in the M1 and M2 models.
M2: Define the effects of on within the marginal as above and all other effects within the joint ; next, constrain to 0 the interactions in the marginal as above and the and interactions in the joint. This model, which is the closest analog to the one considered above, has a deviance of 13.04 with the same number of dof. Estimates and standard errors for the dependence of on are displayed in Table 1.
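Assuming the usual chi-squared reference distribution for the deviance, the fit of M1 and M2 can be summarized by p-values. The standard-library sketch below evaluates the upper tail of a chi-squared distribution through the series expansion of the regularized lower incomplete gamma function; the function name `chi2_sf` is ours, and the inputs are the deviances and degrees of freedom reported above.

```python
import math

def chi2_sf(x, df):
    """P(chi2_df > x) = 1 - P(df/2, x/2), with the regularized lower
    incomplete gamma P(a, h) computed from its power series."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum_n h^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= h / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-h + a * math.log(h) - math.lgamma(a))
    return 1.0 - lower

for name, deviance in (("M1", 7.82), ("M2", 13.04)):
    print(name, round(chi2_sf(deviance, 7), 3))
```

The series converges for all x > 0; for very large deviances a continued-fraction evaluation of the upper tail would be more accurate than 1 minus the lower tail.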
Est.    s.e.    Est.    s.e.
0.4470  0.3175  0.0718  0.1591
0.2616  0.1298  0.4766  0.1478
0.9327  0.1762  1.2654  0.0928
Table 2: Model M1: dependence of income, , and concern, , on parents’ education.
The effect of on is strongest in going from 2 to 3; the same holds for the marginal effect of on . Within M2 the effects of conditional on and are roughly similar to the corresponding ones under M1.
If we assume that there are no unobserved confounders, the estimated joint distribution under M1 allows us to compute estimates of the natural direct and indirect effects of parents’ education on the academic qualification of the daughter (Pearl, 2014), obtained by changing the exposure from one category to the next (see VanderWeele et al., 2013, equations (1) and (2)). Results are in Table 3, with standard errors estimated by bootstrap; the direct effect is always the largest component of the total, though going from 0 to 1 does not seem to matter.
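        0 to 1          1 to 2          2 to 3
        Est     s.e.    Est     s.e.    Est     s.e.
Dir.    0.0107  0.0196  0.0609  0.0255  0.1549  0.0298
Ind.    0.0083  0.0069  0.0298  0.0100  0.1092  0.0179
Total   0.0024  0.0210  0.0907  0.0257  0.2641  0.0299
Table 3: Natural direct (Dir.), indirect (Ind.) and total effects of parents’ education on the daughter’s academic qualification under M1, for changes from one category to the next; standard errors estimated by bootstrap.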
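The decomposition into natural direct and indirect effects can be sketched with Pearl's mediation formula on the risk-difference scale, treating the two binary mediators jointly; this may differ in detail from equations (1) and (2) of VanderWeele et al. (2013) and assumes no unobserved confounding. The joint distribution below is generated from hypothetical logistic models and does not reproduce the NCDS estimates; all names and coefficients are illustrative. By construction, the direct and indirect effects add up to the total effect.

```python
import math
from itertools import product

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

MEDIATORS = list(product((0, 1), repeat=2))   # (income, concern), both binary

# hypothetical joint p(x, m, y): X uniform on 0..3, independent binary mediators
joint = {}
for x in range(4):
    pw, pv = expit(-0.5 + 0.4 * x), expit(-0.3 + 0.3 * x)
    for w, v in MEDIATORS:
        pm = (pw if w else 1 - pw) * (pv if v else 1 - pv)
        py1 = expit(-1.0 + 0.3 * x + 0.5 * w + 0.4 * v)
        for y in (0, 1):
            joint[(x, (w, v), y)] = 0.25 * pm * (py1 if y else 1 - py1)

def conditionals(joint):
    """P(m | x) and P(Y = 1 | x, m) from the joint distribution."""
    px, pxm = {}, {}
    for (x, m, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        pxm[(x, m)] = pxm.get((x, m), 0.0) + p
    pm = {k: p / px[k[0]] for k, p in pxm.items()}
    py = {k: joint[(k[0], k[1], 1)] / p for k, p in pxm.items()}
    return pm, py

def natural_effects(joint, x0, x1):
    """Natural direct, indirect and total effects (risk differences), x0 -> x1."""
    pm, py = conditionals(joint)
    nde = sum((py[(x1, m)] - py[(x0, m)]) * pm[(x0, m)] for m in MEDIATORS)
    nie = sum(py[(x1, m)] * (pm[(x1, m)] - pm[(x0, m)]) for m in MEDIATORS)
    return nde, nie, nde + nie

for x0 in range(3):
    print(x0, x0 + 1, [round(e, 4) for e in natural_effects(joint, x0, x0 + 1)])
```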
Appendix
Rephrasing Robin Evans' result
Let be two nested marginals and = ; assume that we define interactions as contrasts relative to the reference category coded as 0; we also use the convention that, when the values of the conditioning variables are not given, they are fixed to the reference value; the derivation below is, essentially, a rewriting of Evans (2015). Let denote the loglinear interaction among the variables in computed within the marginal fixed at the value .
Lemma 3.
(4) 
where the conditional probabilities on the right-hand side are of the event when the conditioning set is split into a component taking the original values and the remaining ones fixed to 0.
Proof: Start from the expansion of , add and subtract and write the difference between the two in terms of conditional probabilities
We now apply Lemma 3 to the special case where = , = , is binary and are discrete; to simplify notation, let = ; in addition, because is the joint distribution, replace with .
Corollary 1.
(5) 
this may also be expressed in terms of loglinear parameters defined within as
Proof. The first part follows from Lemma 3 by noting that, because has just two elements, the expansion contains four terms, which can be arranged into the form of a log odds ratio. For the second part, first write the conditional distribution of as a multinomial and then apply (3) in Colombi and Forcina (2014) to expand interactions conditional on into a sum of higher-order interactions.
Loglinear versus logistic parameterizations
For what follows, it might be useful to recall how, under corner-point coding, loglinear parameters may be mapped into the corresponding logistic parameters. When the dependent variable, like , is binary, we have
with the convention that the loglinear parameter is 0 whenever at least one of its arguments is 0. Having assumed that is multinomial with, possibly, more than two categories, its logits may be written as
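The mapping from loglinear to logistic parameters can be verified numerically. The sketch below assumes a hypothetical table for three variables indexed 0, 1, 2, the first with three categories and the other two binary, axis 2 playing the role of the binary response. It computes the corner-point loglinear parameters by Möbius inversion of the log-probabilities (parameters with a zero argument vanish automatically) and checks that the logit of the response at each cell equals the sum of the parameters whose index set contains the response.

```python
import math
from itertools import combinations, product

LEVELS = (3, 2, 2)   # hypothetical: axis 2 is the binary response

def corner_point_params(logp, cell):
    """lambda_S(cell_S) for every subset S of axes under corner-point coding:
    Mobius inversion of log p over cells with the axes outside T set to 0."""
    k = len(cell)
    params = {}
    for r in range(k + 1):
        for S in combinations(range(k), r):
            total = 0.0
            for rt in range(len(S) + 1):
                for T in combinations(S, rt):
                    c = tuple(cell[i] if i in T else 0 for i in range(k))
                    total += (-1) ** (len(S) - len(T)) * logp[c]
            params[S] = total
    return params

# hypothetical strictly positive table
cells = list(product(*(range(n) for n in LEVELS)))
w = {c: 1.0 + c[0] + 2.0 * c[1] + 3.0 * c[2] + c[0] * c[1] * c[2] for c in cells}
tot = sum(w.values())
logp = {c: math.log(v / tot) for c, v in w.items()}

for x, z in product(range(LEVELS[0]), range(LEVELS[1])):
    lam = corner_point_params(logp, (x, z, 1))
    logit = logp[(x, z, 1)] - logp[(x, z, 0)]
    # sum of the parameters whose index set contains the response axis 2
    rhs = sum(v for S, v in lam.items() if 2 in S)
    print(x, z, round(logit, 6), round(rhs, 6))
```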
The results of Stanghellini and Doretti
As above, let be binary and be discrete; equation (A2) in Stanghellini and Doretti (2019) may be written as
which follows by expanding the left-hand side as
and noting that logits may be computed equivalently either on the joint or on the conditional distribution.
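The distinction between logits computed within the joint distribution and within a marginal can also be illustrated numerically. In the sketch below, a hypothetical joint distribution is built from a logistic model for a binary response given a categorical exposure and a binary covariate; the names and coefficients are illustrative. The conditional log odds ratio is the same whether computed from the joint or from the conditional distribution, whereas the log odds ratio in the marginal obtained by summing out the covariate generally differs from it, even when the covariate is independent of the exposure; the two coincide, for instance, when the covariate coefficient is zero.

```python
import math
from itertools import product

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def build_joint(px, pz_given_x, b0, bx, bz):
    """Hypothetical joint p(x, z, y): categorical x, binary z and y, with
    logit P(Y = 1 | x, z) = b0 + bx[x] + bz * z and bx[0] = 0."""
    joint = {}
    for x, z, y in product(range(len(px)), (0, 1), (0, 1)):
        pz = pz_given_x[x] if z else 1.0 - pz_given_x[x]
        py = expit(b0 + bx[x] + bz * z)
        joint[(x, z, y)] = px[x] * pz * (py if y else 1.0 - py)
    return joint

def conditional_logit(joint, x, z):
    """Logit of Y given (x, z); the factor p(x, z) cancels, so computing it on
    the joint or on the conditional distribution gives the same value."""
    return math.log(joint[(x, z, 1)] / joint[(x, z, 0)])

def marginal_logit(joint, x):
    """Logit of Y within the marginal of (X, Y), obtained by summing out Z."""
    num = sum(joint[(x, z, 1)] for z in (0, 1))
    den = sum(joint[(x, z, 0)] for z in (0, 1))
    return math.log(num / den)

for bz in (0.0, 2.0):
    j = build_joint([0.5, 0.5], [0.5, 0.5], b0=-1.0, bx=[0.0, 1.0], bz=bz)
    cond = conditional_logit(j, 1, 0) - conditional_logit(j, 0, 0)
    marg = marginal_logit(j, 1) - marginal_logit(j, 0)
    print(bz, round(cond, 4), round(marg, 4))
```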
To derive an extension of their (A3) to non-binary , first swap conditioning:
next expand the first term on the right-hand side by adding and subtracting and ,
Thus the analog of the loglinear expansion in their (A3) is
which is equivalent to (1) in the special case when are both binary variables.
3.3 Acknowledgments
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The author would like to thank Elena Stanghellini for suggesting the problem and for several helpful comments.
References
Barndorff-Nielsen, O.E., 1978. Information and exponential families. Wiley, New York.
Bergsma, W.P., Rudas, T., 2002. Marginal models for categorical data. Annals of Statistics 30, 140–159.
Colombi, R., Forcina, A., 2014. A class of smooth models satisfying marginal and context specific conditional independencies. J. Multivariate Analysis 126, 75–85.
Evans, R.J., 2015. Smoothness of marginal log-linear parameterizations. Electronic Journal of Statistics 9, 475–491.
Forcina, A., 2012. Smoothness of conditional independence models for discrete data. J. Multivariate Analysis 106, 49–56.
Pearl, J., 2014. Interpretation and identification of causal mediation. Psychological Methods 19, 459.
Stanghellini, E., Doretti, M., 2019. On marginal and conditional parameters in logistic regression models. Biometrika 106, 732–739.
VanderWeele, T., Vansteelandt, S., Robins, J., 2013. Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology 25, 300–306.