Winning Models for GPA, Grit, and Layoff in the Fragile Families Challenge

05/29/2018
by Daniel E Rigobon, et al.
In this paper, we discuss and analyze our approach to the Fragile Families Challenge. The challenge involved predicting six outcomes for 4,242 children from disadvantaged families across the United States. The data consisted of over 12,000 features (covariates) describing the children and their parents, schools, and overall environments from birth to age 9. Our approach relied primarily on existing data science techniques: (1) data preprocessing: elimination of low-variance features, imputation of missing data, and construction of composite features; (2) feature selection through univariate Mutual Information and extraction of non-zero LASSO coefficients; (3) three machine learning models: Random Forest, Elastic Net, and Gradient-Boosted Trees; and finally (4) prediction aggregation according to performance. The top-performing submissions produced winning out-of-sample predictions for three outcomes: GPA, grit, and layoff. However, predictions were at most 20% better than a baseline that predicted the mean value of the training data of each outcome.
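Step (4), aggregating predictions according to performance, can be illustrated with a minimal sketch. The abstract does not say how the aggregation was weighted, so the inverse-MSE weighting below is an assumption chosen for illustration, not the authors' method. The function names (`mse`, `performance_weights`, `aggregate`) and all numbers are hypothetical toy values.

```python
# Illustrative sketch only: blend model predictions with weights derived from
# held-out performance. Inverse-MSE weighting is an ASSUMPTION; the abstract
# does not state the weighting scheme actually used.

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def performance_weights(y_val, val_preds):
    """Map each model name to a weight proportional to 1 / validation MSE."""
    inverse = {
        name: 1.0 / max(mse(y_val, preds), 1e-12)  # guard against division by zero
        for name, preds in val_preds.items()
    }
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

def aggregate(test_preds, weights):
    """Weighted average of each model's predictions, one value per sample."""
    names = list(test_preds)
    n_samples = len(test_preds[names[0]])
    return [
        sum(weights[name] * test_preds[name][i] for name in names)
        for i in range(n_samples)
    ]

# Hypothetical held-out outcome values (e.g. GPA) and per-model predictions.
y_val = [3.0, 2.5, 3.5, 2.0]
val_preds = {
    "random_forest": [2.9, 2.6, 3.3, 2.2],
    "elastic_net": [2.5, 2.5, 3.0, 2.5],
    "gradient_boosting": [3.1, 2.4, 3.6, 2.1],
}
weights = performance_weights(y_val, val_preds)

# Hypothetical predictions on unseen samples, blended with those weights.
test_preds = {
    "random_forest": [3.2, 2.1],
    "elastic_net": [2.8, 2.6],
    "gradient_boosting": [3.4, 2.0],
}
blended = aggregate(test_preds, weights)
```

Under this scheme, a model with lower held-out error contributes more to the blended prediction, and because the weights are non-negative and sum to one, each blended value stays within the range of the individual model predictions.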
