Efficient Feature Selection With Large and High-dimensional Data
Driven by advances in technology, large and high-dimensional data have become the rule rather than the exception. Approaches that allow for feature selection with such data are thus highly sought after, particularly because standard methods, such as cross-validated Lasso, can be computationally intractable and, in any case, lack theoretical guarantees. In this paper, we propose a novel approach to feature selection in regression. Because it consists of simple optimization steps and tests, it is computationally more efficient than existing methods and therefore suited even to very large data sets. Moreover, in contrast to standard methods, it is equipped with sharp statistical guarantees. We thus expect that our algorithm can help to leverage the increasing volume of data in biology, public health, astronomy, economics, and other fields.