Ultrahigh dimensional instrument detection using graph learning: an application to high dimensional GIS-census data for house pricing

07/30/2020
by   Ning Xu, et al.

Endogeneity bias and instrument validation have always been critical topics in statistics, machine learning and biostatistics. In the era of big data, these issues typically come with dimensionality problems and hence require more attention than ever. In this paper we combine two well-known tools from machine learning and biostatistics (stable variable selection and random graphs) and apply them to estimating the house pricing mechanics and the follow-up socio-economic effects on the 2010 Sydney house data. The estimation is conducted on an ultrahigh dimensional database of over 200 gigabytes, consisting of local education data, GIS information, census data, house transactions and other socio-economic records. The combined technique improves variable-selection sparsity, stability and robustness to high dimensionality, complicated causal structures and the consequent multicollinearity, which ultimately helps the data-driven recovery of a sparse and intuitive causal structure. The new ensemble also proves efficient and effective for endogeneity detection, instrument validation, weak-instrument pruning and the selection of proper instruments. From the perspective of machine learning, the estimation result aligns with and confirms the facts of the Sydney housing market, classical economic theories and previous findings from simultaneous equations modeling. Moreover, the estimation result is consistent with and supported by classical econometric tools such as two-stage least squares regression and various instrument tests (the code can be found at https://github.com/isaac2math/solar_graph_learning).
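As a rough illustration of the two-stage least squares check mentioned in the abstract, the sketch below fits a single endogenous regressor with a single instrument on simulated toy data. It is not the authors' code and does not use the Sydney data or their variable-selection method; the function names (`simple_ols`, `two_stage_least_squares`, `simulate`) and all coefficients in the simulation are assumptions made purely for illustration.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) of a one-regressor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z."""
    # Stage 1: regress the endogenous regressor on the instrument.
    a1, b1 = simple_ols(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress the outcome on the stage-1 fitted values.
    return simple_ols(x_hat, y)


def simulate(n=20000, seed=0):
    """Toy data (hypothetical): u is an unobserved confounder of x and y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    # The true causal slope of x on y is 2.0 in this simulation.
    y = [1.0 + 2.0 * xi + 3.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return y, x, z


if __name__ == "__main__":
    y, x, z = simulate()
    print("OLS slope :", simple_ols(x, y)[1])
    print("2SLS slope:", two_stage_least_squares(y, x, z)[1])
```

In this toy setup the plain OLS slope is pushed away from the true value by the confounder, while the 2SLS slope should land close to it, provided the instrument is relevant (correlated with `x`) and valid (uncorrelated with `u`).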

research
07/30/2020

Accuracy and stability of solar variable selection comparison under complicated dependence structures

In this paper we focus on the variable-selection performance of solar on ...
research
06/15/2020

Comparative Analysis of Economic Instruments in Intersection Operation: A User-Based Perspective

Focusing on different economic instruments implemented in intersection o...
research
01/10/2013

Semi-Instrumental Variables: A Test for Instrument Admissibility

In a causal graphical model, an instrument for a variable X and its effe...
research
04/02/2023

TSCI: two stage curvature identification for causal inference with invalid instruments

TSCI implements treatment effect estimation from observational data unde...
research
09/29/2020

The Illusion of the Illusion of Sparsity: An exercise in prior sensitivity

The emergence of Big Data raises the question of how to model economic r...
research
07/20/2020

Variable Selection in Macroeconomic Forecasting with Many Predictors

In the data-rich environment, using many economic predictors to forecast...
research
11/12/2020

Mostly Harmless Machine Learning: Learning Optimal Instruments in Linear IV Models

We provide some simple theoretical results that justify incorporating ma...
