Winning Models for GPA, Grit, and Layoff in the Fragile Families Challenge

05/29/2018
by Daniel E Rigobon, et al.

In this paper, we discuss and analyze our approach to the Fragile Families Challenge. The challenge involved predicting six outcomes for 4,242 children from disadvantaged families from around the United States. The data consisted of over 12,000 features (covariates) about the children and their parents, schools, and overall environments from birth to age 9. Our approach relied primarily on existing data science techniques, including: (1) data preprocessing: elimination of low variance features, imputation of missing data, and construction of composite features; (2) feature selection through univariate Mutual Information and extraction of non-zero LASSO coefficients; (3) three machine learning models: Random Forest, Elastic Net, and Gradient-Boosted Trees; and finally (4) prediction aggregation according to performance. The top-performing submissions produced winning out-of-sample predictions for three outcomes: GPA, grit, and layoff. However, predictions were at most 20% better than a baseline that predicted the mean value of the training data of each outcome.
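Step (4), prediction aggregation according to performance, can be illustrated with a minimal sketch. The abstract does not say how the aggregation was weighted, so the inverse-MSE weighting below is an assumption, as are all function names and the toy numbers. The last helper scores predictions against a baseline that always predicts the training mean, which is one common way to express relative improvement.

```python
from statistics import mean

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

def inverse_mse_weights(y_val, val_preds, eps=1e-12):
    """Weight each model by 1 / (validation MSE), normalized to sum to 1.

    ASSUMPTION: the source only says aggregation was 'according to
    performance'; inverse-MSE weighting is one plausible reading.
    eps guards against division by zero for a perfect model.
    """
    inv = {name: 1.0 / (mse(y_val, preds) + eps)
           for name, preds in val_preds.items()}
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}

def aggregate(preds, weights):
    """Weighted average of per-model predictions, row by row."""
    n_rows = len(next(iter(preds.values())))
    return [sum(weights[name] * preds[name][i] for name in preds)
            for i in range(n_rows)]

def improvement_over_mean_baseline(y_train, y_test, y_pred):
    """Fractional MSE reduction relative to predicting the training mean."""
    baseline = mean(y_train)
    baseline_mse = mse(y_test, [baseline] * len(y_test))
    return 1.0 - mse(y_test, y_pred) / baseline_mse

# Toy validation data (made up for illustration, not from the challenge).
y_val = [3.0, 2.0, 4.0]
val_preds = {
    "random_forest": [2.5, 2.5, 3.5],
    "elastic_net": [3.0, 3.0, 3.0],
    "boosted_trees": [3.0, 2.0, 3.0],
}
weights = inverse_mse_weights(y_val, val_preds)
blended = aggregate(val_preds, weights)
```

In this sketch a model with lower validation error receives a larger weight, so the blended prediction leans toward whichever of the three models generalized best on held-out data.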
