Individual Heterogeneity Learning in Distributional Data Response Additive Models
In many complex applications, data heterogeneity and homogeneity exist simultaneously, and ignoring either one leads to incorrect statistical inference. In addition, complex non-Euclidean data are becoming increasingly common. To address these issues, we consider a distributional data response additive model in which the response is a density function, the individual effect curves are homogeneous within a group but heterogeneous across groups, and the covariates capturing the variation share common additive bivariate functions. A transformation approach is first used to map the density functions into a linear space. We then apply B-spline series approximation to estimate the unknown subject-specific and additive bivariate functions, and identify the latent group structure with a hierarchical agglomerative clustering (HAC) algorithm. We show that the method identifies the true latent group structure with probability approaching one. To improve efficiency, we further construct backfitted local linear estimators for the grouped structures and the additive bivariate functions in the post-grouping model. We establish the asymptotic properties of the resulting estimators, including convergence rates, asymptotic distributions, and post-grouping oracle efficiency. The performance of the proposed method is illustrated through simulation studies and an empirical analysis.
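
The two computational steps named in the abstract, transforming densities into a linear space and grouping subjects by HAC, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the abstract does not say which transformation is used, so the log-quantile-density map psi(t) = -log f(Q(t)) stands in as one common choice; the function names `lqd_transform` and `hac_groups` are hypothetical; simulated coefficient vectors stand in for the B-spline estimates; and the number of groups is treated as known.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def lqd_transform(x, f, t):
    """Log-quantile-density map psi(t) = -log f(Q(t)) (assumed transformation)."""
    # Normalise the density with the trapezoid rule so it integrates to one.
    steps = 0.5 * (f[1:] + f[:-1]) * np.diff(x)
    f = f / steps.sum()
    # Cumulative distribution function on the grid x.
    cdf = np.concatenate([[0.0], np.cumsum(steps)]) / steps.sum()
    # Quantile function Q(t) by inverting the CDF with linear interpolation.
    q = np.interp(t, cdf, x)
    # Density evaluated at the quantiles, then the negative log.
    return -np.log(np.interp(q, x, f))


def hac_groups(coefs, n_groups):
    """Average-linkage HAC on per-subject coefficient vectors (rows of coefs)."""
    tree = linkage(coefs, method="average", metric="euclidean")
    # Cut the dendrogram so that at most n_groups clusters remain.
    return fcluster(tree, t=n_groups, criterion="maxclust")


# A uniform density on [0, 1] maps to the zero function under this transform.
x = np.linspace(0.0, 1.0, 101)
t = np.linspace(0.05, 0.95, 19)
psi_uniform = lqd_transform(x, np.ones_like(x), t)

# Simulated stand-ins for estimated subject-specific spline coefficients:
# five subjects near 0 and five subjects near 3, four coefficients each.
rng = np.random.default_rng(0)
coefs = np.vstack([
    rng.normal(0.0, 0.1, size=(5, 4)),
    rng.normal(3.0, 0.1, size=(5, 4)),
])
labels = hac_groups(coefs, n_groups=2)
```

In this toy setting the two simulated groups are far apart relative to their noise, so cutting the dendrogram at two clusters separates them. In practice the number of groups would have to be chosen from the data, which this sketch does not attempt.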