Logistic regression and Ising networks: prediction and estimation when violating lasso assumptions
The Ising model was originally developed in statistical physics to model the magnetisation of solids. As a network of binary variables in which the probability of a node becoming 'active' depends only on its direct neighbours, the Ising model appears appropriate for many other processes. For instance, it was recently applied in psychology to model co-occurrences of mental disorders. It has been shown that the connections between the variables (nodes) in the Ising network can be estimated with a series of logistic regressions. This naturally raises two questions: how well does such a model predict new observations, and how well can the parameters of the Ising model be estimated using logistic regressions? Here we focus on the high-dimensional setting, with more parameters than observations, and consider violations of the assumptions of the lasso. In particular, we determine the consequences for both prediction and estimation when the sparsity and restricted eigenvalue assumptions are not satisfied. Using the idea of connected copies (extreme multicollinearity), we explain why prediction improves when either the sparsity or the multicollinearity assumption is violated. We illustrate these results with simulations.
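The link between the Ising network and logistic regression can be seen from the conditional distribution of a single node. The following is a sketch only: the notation is an assumption and does not come from the abstract. It takes nodes coded as $x_i \in \{0,1\}$, thresholds $\tau_i$, symmetric interaction parameters $\omega_{ij}$, $n$ observations $x^{(k)}$, and a lasso penalty $\lambda$.

```latex
% Joint distribution of the Ising model over p binary nodes (assumed 0/1 coding)
\[
  P(x) = \frac{1}{Z} \exp\Big( \sum_{i} \tau_i x_i + \sum_{i<j} \omega_{ij} x_i x_j \Big),
  \qquad x \in \{0,1\}^p .
\]

% Conditioning on all other nodes cancels Z and every term without x_i,
% which leaves a logistic regression of node i on the remaining nodes
\[
  P(x_i = 1 \mid x_{-i})
  = \frac{\exp\big( \tau_i + \sum_{j \neq i} \omega_{ij} x_j \big)}
         {1 + \exp\big( \tau_i + \sum_{j \neq i} \omega_{ij} x_j \big)} .
\]

% Nodewise lasso estimate: one l1-penalised logistic regression per node i
\[
  (\hat{\tau}_i, \hat{\omega}_{i\cdot})
  = \arg\min_{\tau_i,\, \omega_{i\cdot}}
    \Big\{ -\frac{1}{n} \sum_{k=1}^{n} \log P\big(x_i^{(k)} \mid x_{-i}^{(k)}\big)
           + \lambda \sum_{j \neq i} |\omega_{ij}| \Big\} .
\]
```

Under this assumed parametrisation, each node contributes one regression, and the estimated $\hat{\omega}_{ij}$ and $\hat{\omega}_{ji}$ are two estimates of the same connection.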