Information-based Optimal Subdata Selection for Clusterwise Linear Regression

09/01/2023
by Yanxi Liu, et al.
Mixture-of-Experts (MoE) models are commonly used when the data contain distinct clusters with different relationships between the independent and dependent variables. Fitting such models to large datasets, however, is computationally almost infeasible. An attractive alternative is to fit the model on subdata selected by "maximizing" the Fisher information matrix. A major challenge is that no closed-form expression for the Fisher information matrix is available for such models. Focusing on clusterwise linear regression models, a subclass of MoE models, we develop a framework that overcomes this challenge. We prove that the proposed subdata selection approach is asymptotically optimal, i.e., no other method is statistically more efficient than the proposed one when the full data size is large.
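
To give a feel for what "maximizing" the information matrix means, the following is a minimal sketch for a much simpler setting than the one studied here: a single ordinary linear regression with one covariate, where the information matrix is proportional to X^T X and does have a closed form. It is not the method proposed in the paper. The helper names (`information_det`, `select_extremes`), the simulated data, and the rule of keeping the most extreme covariate values are all assumptions made for illustration.

```python
import numpy as np

def information_det(x):
    """det(X^T X) for a linear regression with an intercept and one covariate."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.det(X.T @ X)

def select_extremes(x, k):
    """Indices of the k/2 smallest and k/2 largest covariate values (illustrative rule)."""
    order = np.argsort(x)
    half = k // 2
    return np.concatenate([order[:half], order[-half:]])

rng = np.random.default_rng(0)
n, k = 100_000, 200          # full data size and subdata size (arbitrary choices)
x = rng.normal(size=n)       # simulated covariate, not real data

idx_info = select_extremes(x, k)                  # information-driven subdata
idx_rand = rng.choice(n, size=k, replace=False)   # uniform random subdata

det_info = information_det(x[idx_info])
det_rand = information_det(x[idx_rand])
print(det_info > det_rand)
```

In this toy case the determinant of the information matrix equals k^2 times the sample variance of the selected covariate values, so keeping the extremes yields a larger determinant than uniform sampling. For clusterwise linear regression no such closed form is available, which is the difficulty the abstract refers to.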
