1. Introduction
For a random variable $Y$ and random vectors $X$ and $\beta$ of dimension $p$, the linear random coefficients model is
(1) $Y = \alpha + \beta^{\top}X$,
(2) $(\alpha, \beta^{\top})^{\top}$ independent of $X$.
The researcher has at her disposal observations of $(Y, X^{\top})$ but does not observe the realizations of $(\alpha, \beta^{\top})$. $\alpha$ subsumes the intercept and the error term, and the vector of slope coefficients $\beta$ is heterogeneous (i.e., varies across individuals). For example, a researcher interested in the effect of class size on pupils' achievements might want to allow some pupils to be more sensitive than others to a decrease in the size and to estimate the density of the effect. $(\alpha, \beta^{\top})$ corresponds to multidimensional unobserved heterogeneity and $X$ to observed heterogeneity. Restricting unobserved heterogeneity to a scalar, as when only $\alpha$ is random, can have undesirable implications such as monotonicity in the literature on policy evaluation (see [24]). Parametric assumptions are often made for convenience and can drive the results (see [29]). For this reason, this paper considers a nonparametric setup.
Model (1) is also a type of linear model with homogeneous slopes and heteroscedasticity, hence the averages of the coefficients are easy to obtain. However, the law of the coefficients, their quantiles, prediction intervals for $Y$ as in [3], welfare measures, treatment and counterfactual effects, which depend on the distribution of the coefficients, can be of great interest.
Estimation of the density of the random coefficients when the support of $X$ is $\mathbb{R}^p$ and $X$ has heavy enough tails has been studied in [4, 31]. These papers notice that the inverse problem is related to a tomography problem (see, e.g., [11, 12]) involving the Radon transform. Assuming the support of $X$ is $\mathbb{R}^p$ amounts to assuming that the law of the angles has full support; moreover, a lower bound on the density of $X$ is assumed so that the law of the angles is nondegenerate. This is implied, for instance, by densities of $X$ which follow a Cauchy distribution. The corresponding tomography problem has a nonuniform and estimable density of angles, and the dimension can be larger than in tomography due to more than one regressor. More general specifications of random coefficients models are important in econometrics (see, e.g., [25, 30] and references therein) and there has been recent interest in nonparametric tests (see [10, 19]).
This paper considers the case where the support of $X$ is a proper (i.e., strict) subset of $\mathbb{R}^p$. This is a much more useful and realistic framework for the random coefficients model. When $p = 1$, this is related to limited angle tomography (see, e.g., [20, 32]). There, one has measurements over a subset of angles and the unknown density has support in the unit disk. This is too restrictive for a density of random coefficients and implies that $\alpha$ has compact support, ruling out usual parametric assumptions on error terms. Due to (2), the conditional characteristic function of $Y$ given $X = x$ at $t$ is the Fourier transform of $f_{\alpha,\beta}$ at $(t, tx)$. Hence, the family of conditional characteristic functions indexed by $x$ in the support of $X$ gives access to the Fourier transform of $f_{\alpha,\beta}$ on a double cone with apex 0. When $\alpha = 0$, $\beta$ has compact support, and the support of $X$ is an arbitrary compact set of nonempty interior, this is the problem of out-of-band extrapolation or super-resolution (see, e.g., [5], Sections 11.4 and 11.5). Because we allow $\alpha$ to be nonzero, we generalize this approach. Estimation of $f_{\alpha,\beta}$ is a statistical inverse problem for which the deterministic problem is the inversion of a truncated Fourier transform (see, e.g., [2] and the references therein). The companion paper [23] presents conditions on the law of $(\alpha, \beta^{\top})$ and the support of $X$ that imply nonparametric identification. It considers weak conditions on $\alpha$, which could have infinite absolute moments, and the marginals of $\beta$ could have heavy tails. In this paper, we obtain rates of convergence when the marginals do not have heavy tails but can have noncompact support.
A related approach is extrapolation. It is used in [41] to perform deconvolution of compactly supported densities while allowing the Fourier transform of the error density to vanish on a set of positive measure. In this paper, the relevant operator is viewed as a composition of two operators based on partial Fourier transforms. One involves a truncated Fourier transform, and we make use of properties of the singular value decomposition rather than extrapolation.
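Under (2), the conditional characteristic function of $Y$ given $X = x$ at $t$ coincides with the Fourier transform of $f_{\alpha,\beta}$ at $(t, tx)$, and this identity can be checked numerically. The sketch below uses illustrative distributional choices (independent Gaussian coefficients, a two-point regressor), not those of the paper, and compares the empirical conditional characteristic function with the closed-form Fourier transform:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
alpha = rng.normal(0.0, 1.0, n)               # intercept/error term
beta = rng.normal(2.0, 0.5, n)                # random slope
X = rng.choice([0.5, 1.0], size=n)            # regressor, independent of (alpha, beta)
Y = alpha + beta * X                          # model (1)

def phi_ab(t, s):
    # Fourier transform of f_{(alpha, beta)} at (t, s) for the Gaussians above
    # (alpha has mean 0, so its location term vanishes)
    return np.exp(1j * 2.0 * s - 0.5 * (t**2 + 0.25 * s**2))

t, x = 0.7, 0.5
emp = np.mean(np.exp(1j * t * Y[X == x]))     # empirical E[e^{itY} | X = x]
thy = phi_ab(t, t * x)                        # Fourier transform at (t, tx)
print(abs(emp - thy))                         # small Monte Carlo error
```

Varying $x$ over the support of $X$ traces out the double cone of frequencies $(t, tx)$ on which the Fourier transform of $f_{\alpha,\beta}$ is observed.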
We study optimality in the minimax sense. We obtain lower bounds under weak to strong integrability in the first argument for this and a white noise model. We present an estimator involving: series-based estimation of the partial Fourier transform of the density with respect to the first variable, interpolation around zero, and inversion of the partial Fourier transform. We give rates of convergence and use a Goldenshluger-Lepski type method to obtain data-driven estimators. We consider estimation of $f_\beta$ in Appendix B.5. We present a numerical method to compute the estimator which is implemented in the R package RandomCoefficients.

2. Notations
and stand for the positive and nonnegative integers, for , (resp. ) for the minimum (resp. maximum) between and , and for the indicator function. Bold letters are used for vectors. For all , is the vector, whose dimension will be clear from the text, where each entry is . The iterated logarithms are and, for and large enough, . for stands for the norm of a vector. For all , functions with values in , and , denote by , , and . For a differentiable function of real variables, denotes and its support. is the space of infinitely differentiable functions. The inverse of a mapping , when it exists, is denoted by . We denote the interior of by and its closure by . When is measurable and a function from to , is the space of complex-valued square integrable functions equipped with . This is denoted by when . When , we have and . Denote by the set of densities, by such that , and by the product of functions (e.g., ) or measures. The Fourier transform of is and is also the Fourier transform in . For all , denote the Paley-Wiener space by , by the projector from to (), and, for all , by
(3) 
Abusing notation, we sometimes use for the function in . assigns the value 0 outside and is the partial Fourier transform of with respect to the first variable. For a random vector , is its law, its density, the truncated density of given , its support, and the conditional density. For a sequence of random variables , means that, for all , there exists such that for all such that holds. In the absence of constraint, we drop the notation . With a single index, the notation requires a bound holding for all values of the index (the usual notation if the random variables are bounded in probability).
3. Preliminaries
Assumption 1.

and exist;

, where and is even, nondecreasing on , such that and , with ;

There exist and , and we have at our disposal i.i.d. and an estimator based on , independent of ;

is a set of densities on such that, for , for all , and , and, for which tends to 0, we have
We maintain this assumption for all results presenting upper bounds. When , , for , might not exist. Due to Theorem 3.14 in [18], if there exist , , and equal to 0 for large enough, such that
for all , then , which implies (H1.2). Marginal distributions can have an infinite moment generating function, hence be heavy-tailed, and their Fourier transforms can belong to a quasi-analytic class but not be analytic. From now on, we use
or for . This rules out heavy tails and non-analytic Fourier transforms. When , integrability in amounts to , but other choices allow for noncompact . Though with a different scalar product, we have and (see Theorem IX.13 in [45]); for , is the set of square-integrable functions whose Fourier transforms have an analytic continuation on . In particular, the Laplace transform is finite near 0. Equivalently, if is a density, it does not have heavy tails. The condition in (H1.4) is not restrictive because we can write (1) as , take and such that , and there is a one-to-one mapping between and . We assume (H1.4) because the estimator involves estimators of in denominators. Alternative solutions exist when (see, e.g., [36]) only. Assuming the availability of an estimator of using the preliminary sample is common in the deconvolution literature (see, e.g., [15]). By using estimators of for a well-chosen rather than of , the assumption that and in (H1.4) becomes very mild. This is feasible because of (2).

3.1. Inverse problem in Hilbert spaces
Estimation of is a statistical ill-posed inverse problem. The operator depends on and . From now on, the functions and are those of (H1.2). We have, for all and , , where
(4) 
Proposition 1.
is continuously embedded into . Moreover, is injective and continuous, and not compact if .
The case corresponds to mild integrability assumptions in the first variable, when the SVD of does not exist. This makes it difficult to prove rates of convergence, even for estimators which do not rely explicitly on the SVD, such as the Tikhonov and Landweber methods (the Gerchberg algorithm in out-of-band extrapolation, see, e.g., [5]). Rather than work with directly, we use that is the composition of operators which are easier to analyze:
(5) 
For all , either or , and , belongs to and, for ,
admits an SVD, where both orthonormal systems are complete. This is a tensor product of the SVD when
that we denote by , where is in decreasing order repeated according to multiplicity, and are orthonormal systems of, respectively, and . This holds for the following reason. Because , , , and is even, we obtain and . The operator is a compact positive definite self-adjoint operator (see [44] and [49] for the two choices of). Its eigenvalues in decreasing order repeated according to multiplicity are denoted by
and a basis of eigenfunctions by
. The other elements of the SVD are and .

Proposition 2.
For all , is a basis of .
The singular vectors are the Prolate Spheroidal Wave Functions (hereafter PSWF, see, e.g., [44]). They can be extended as entire functions in and form a complete orthogonal system of , for which we use the same notation. They are useful to carry out interpolation and extrapolation (see, e.g., [40]) with Hilbertian techniques. In this paper, for all , plays the role of the Fourier transform in the definition of . The weight allows for larger classes than and noncompact . This is useful, even if is compact, when the researcher does not know a superset containing . The useful results on the corresponding SVD and a numerical algorithm to compute it are given in [22].
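The decay of the singular values can be visualized on a discretization of the truncated Fourier operator $(F_c f)(t) = \int_{-1}^{1} e^{ictx} f(x)\,dx$; the bandwidth and grid size below are illustrative. The leading singular vectors of this matrix approximate the PSWF:

```python
import numpy as np

c, m = 5.0, 200                                  # bandwidth and grid size (illustrative)
x = np.linspace(-1.0, 1.0, m)                    # quadrature grid on [-1, 1]
w = np.full(m, x[1] - x[0])
w[[0, -1]] /= 2.0                                # trapezoidal weights
# Symmetrized discretization of (F_c f)(t) = \int_{-1}^1 e^{ictx} f(x) dx,
# whose singular values approximate those of the L^2([-1,1]) operator
A = np.sqrt(w)[:, None] * np.exp(1j * c * np.outer(x, x)) * np.sqrt(w)[None, :]
s = np.linalg.svd(A, compute_uv=False)           # decreasing singular values
# A plateau near sqrt(2*pi/c) is followed by a super-exponential drop,
# the source of the severe ill-posedness of the inversion
print(s[:8])
```

The number of singular values on the plateau grows with $c$, so truncating the SVD at the edge of the plateau is the natural regularization.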
3.2. Sets of smooth and integrable functions
Define, for all and increasing, , , , , , and ,
and when we replace by , where
(6) 
The first inequality in the definition of defines the notion of smoothness for functions in analyzed in this paper. It involves a maximum of two terms, thus two inequalities: the first corresponds to smoothness in the first variable and the second to smoothness in the other variables. The additional inequality imposes integrability in the first variable. The asymmetry in the treatment of the first and remaining variables is due to the fact that, in the statistical problem, only the random slopes are multiplied by regressors which have limited variation and we make integrability assumptions in the first variable which are as mild as possible. The use of the Fourier transform to express smoothness in the first variable is classical. For the remaining variables, we choose a framework that allows for both functions with compact and noncompact support and work with the bases for . For functions with compact support, it is possible to use Fourier series and we make a comparison in Section B.4. The use of different bases for different values of is motivated by (5). Though the spaces are chosen for mathematical convenience, we analyze all types of smoothness. The smoothness being unknown anyway, we provide an adaptive estimator. We analyze two values of and show that the choice of the norm matters for the rates of convergence for supersmooth functions.
Remark 1.
The next model is related to (1) under Assumption 1 when is known:
(7) 
where plays the role of , is known, and is a complex two-sided cylindrical Gaussian process on . This means, for Hilbert-Schmidt from to a separable Hilbert space , is a Gaussian process in of covariance (see [17]). Taking , where , and are independent two-sided Brownian motions, the system of independent equations
(8) 
where, and , is equivalent to (7). Because is small when is large or is small (see Lemma B.4), the estimator of Section 4.1 truncates large values of and does not rely on small values of but uses interpolation.
3.3. Interpolation
Define, for all , the operator
(9) 
on with domain . For all , is a distribution.
Proposition 3.
For all , we have and, for all , in and, for and all ,
(10) 
If , only relies on and on , so (9) provides an analytic formula to carry out interpolation on of functions in . Else, (10) provides an upper bound on the error made by approximating by on when approximates outside
. We use interpolation when the variance of an initial estimator
of is large due to its values near 0 but is small, and work within , in which case (10) yields
(11) 
When is compact, is taken such that . Else, goes to infinity so the second term in (11) goes to 0. is taken such that is constant because, due to (3.87) in [44], and (10) and (11) become useless. Then is constant and we set . When , we get and .
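A stylized version of the interpolation idea, not the operator (9) itself: a band-limited function is entire, so its values on a small interval around zero are determined by its values outside. The sketch below (bandwidth, degree, and grids all hypothetical) recovers such a function on $[-\epsilon, \epsilon]$ from a least-squares polynomial fit that uses only samples outside that interval:

```python
import numpy as np

c, eps = 3.0, 0.2                                   # bandwidth and excluded interval (illustrative)
f = lambda t: np.sinc(c * t / np.pi)                # sin(ct)/(ct): Fourier transform supported in [-c, c]
t_obs = np.concatenate([np.linspace(-1.0, -eps, 40),
                        np.linspace(eps, 1.0, 40)]) # samples outside (-eps, eps) only
coef = np.polynomial.polynomial.polyfit(t_obs, f(t_obs), 10)
t_in = np.linspace(-eps, eps, 21)                   # region interpolated "around zero"
fit = np.polynomial.polynomial.polyval(t_in, coef)
err = float(np.max(np.abs(fit - f(t_in))))          # interpolation error near 0
print(err)
```

The error deteriorates as $\epsilon$ grows relative to the analyticity scale, in line with the trade-off behind (10) and (11).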
3.4. Risk
The risk of an estimator is the mean integrated squared error (MISE)
When and , it is , else,
(12) 
We consider a risk conditional on for simplicity of the treatment of the random regressors with unknown law. We adopt the minimax approach and consider the supremum risk. The lower bounds involve a function (for rate) and take the form
(13) 
When we replace by , by , and consider model (8), we refer to (13’); when we also replace by , we refer to (13”), where is the set of functions in such that is not arbitrarily concentrated close to 0: for all , .
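A Monte Carlo approximation of the MISE for a generic density estimator can be sketched as follows; the kernel estimator, bandwidth, and grid are placeholders, not the estimator of this paper:

```python
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(-4.0, 4.0, 201)
dx = grid[1] - grid[0]
truth = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)   # true density: standard normal

def kde(sample, h):
    # Gaussian kernel density estimator on the grid (placeholder estimator)
    u = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

ise = []
for _ in range(50):                                     # Monte Carlo replications
    fhat = kde(rng.normal(size=500), h=0.3)
    ise.append(((fhat - truth) ** 2).sum() * dx)        # integrated squared error
mise = float(np.mean(ise))                              # approximates E || fhat - f ||^2
print(mise)
```

Taking the supremum of such risks over a class of densities gives the quantity bounded from below in (13).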
4. Estimation
The sets of densities in the supremum risk and of estimators in this section depend on . The rates of convergence depend on via .
4.1. Estimator considered
For all , and such that for and for , a regularized inverse is obtained by:

for all , obtain a preliminary approximation of

for all , ,

.
To deal with the statistical problem, we carry out (S.1)-(S.3), replacing by the estimator
(14) 
where and is a trimming factor converging to zero with . This yields the estimators , , and . We use as a final estimator of , which always has a smaller risk than (see [25, 48]). We use for the sample size required for an ideal estimator, where is known, to achieve the rate of the plug-in estimator. The upper bounds below take the form
(15) 
When we use instead the restriction , we refer to (15’).
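The role of the trimming factor in (14) can be sketched as follows: a kernel-based estimate of the conditional characteristic function of $Y$ given $X = x$ whose denominator, an estimate of the density of $X$, is bounded below by a trimming level. All tuning choices (kernel, bandwidth, trimming level) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
X = rng.uniform(-1.0, 1.0, n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)        # toy model Y = alpha + beta X

def cond_cf_hat(t, x, h=0.1, delta=0.05):
    # Nadaraya-Watson-type estimate of E[e^{itY} | X = x]; the kernel
    # density estimate of f_X in the denominator is trimmed below at delta,
    # mirroring the trimming factor in (14) (h and delta are hypothetical)
    k = np.exp(-0.5 * ((X - x) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    fx = max(k.mean(), delta)                 # trimmed denominator
    return (np.exp(1j * t * Y) * k).mean() / fx

# Given X = 0, Y ~ N(1, 1), so E[e^{itY} | X = 0] = exp(it - t^2 / 2)
t = 0.5
est = cond_cf_hat(t, 0.0)
exact = np.exp(1j * t - 0.5 * t**2)
print(abs(est - exact))
```

The trimming prevents the ratio from exploding where the density estimate is close to zero, at the cost of a bias that vanishes as the trimming level tends to zero with the sample size.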
4.2. Logarithmic rates when is a power
The first result below involves, for all and , the inverse of which is such that, for all , is increasing.
Theorem 1.
Let , , , , , , for , and . (15) holds with in the following cases:

, , , , and ,

, , , and