Imputing Missing Values in the Occupational Requirements Survey

01/24/2022
by Terry Leitch, et al.

The U.S. Bureau of Labor Statistics provides public access to much of the data collected through its Occupational Requirements Survey (ORS). These data can be used to draw inferences about the requirements of various jobs and job classes within the United States workforce. However, the dataset contains many missing observations and estimates, which limits its utility. Here, we propose a method for imputing these missing values that leverages features inherent in the survey data, such as known population limits and correlations between occupations and tasks. To arrive at a distribution of predicted values for each missing estimate, we use an iterative regression fit, implemented with a recent version of XGBoost and executed across a set of simulated values drawn from the distribution described by the known values and their standard deviations as reported in the survey. This allows us to calculate a mean prediction and bound each estimate with a 95% confidence interval. We discuss the use of our method and how the resulting imputations can inform future areas of study stemming from the data collected in the ORS. Finally, we conclude with an outline of WIGEM, a generalized version of our weighted, iterative imputation algorithm that could be applied to other contexts.
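As a rough illustration of the simulate-then-fit idea described in the abstract, the following minimal sketch draws simulated values from each known estimate and its reported standard deviation, refits a regression on every draw, and summarizes the predictions for one missing estimate by their mean and a 95% interval. It is not the authors' implementation: a plain least-squares line stands in for the XGBoost regressor, the iterative and weighted parts of the method are omitted, and the function names, the single numeric feature, the toy data, and the 0 to 100 bounds are all hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x.

    Assumption: this simple fit is only a stand-in for the XGBoost
    regression named in the abstract.
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def impute_with_interval(known, x_missing, n_draws=1000,
                         bounds=(0.0, 100.0), seed=0):
    """Return (mean, lower, upper) for one missing estimate.

    known: list of (feature, estimate, std_dev) tuples (hypothetical layout).
    bounds: assumed known limits, here a percentage between 0 and 100.
    """
    rng = random.Random(seed)
    xs = [x for x, _, _ in known]
    preds = []
    for _ in range(n_draws):
        # Simulate one plausible dataset from the reported estimates
        # and their standard deviations.
        ys = [rng.gauss(est, sd) for _, est, sd in known]
        intercept, slope = fit_line(xs, ys)
        # Predict the missing value and clip it to the assumed limits.
        pred = intercept + slope * x_missing
        preds.append(min(max(pred, bounds[0]), bounds[1]))
    preds.sort()
    # Empirical 2.5th and 97.5th percentiles of the predictions.
    lower = preds[int(0.025 * (n_draws - 1))]
    upper = preds[int(0.975 * (n_draws - 1))]
    return statistics.fmean(preds), lower, upper


# Hypothetical toy data: (feature, published estimate, published std. dev.).
known = [(1.0, 10.0, 1.0), (2.0, 20.0, 1.5), (3.0, 30.0, 1.0), (5.0, 50.0, 2.0)]
mean, lower, upper = impute_with_interval(known, x_missing=4.0)
print(f"mean={mean:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

Because every draw perturbs the known values within their reported uncertainty, the spread of the resulting predictions reflects how much the imputed value depends on noise in the published estimates.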


