Unsupervised Learning for Submarket Modeling: A Proxy for Neighborhood Change
This study focused on submarket modeling with unsupervised learning and geographic information system fundamentals to better understand urbanism at the neighborhood scale. A Spatially Constrained Weighted-Multivariate Hierarchical Clustering algorithm was trained to identify the optimal number of Multifamily Residential Commercial Real Estate submarkets in Manhattan, New York. The methodology explored Non-Negative Matrix Factorization for predicting the annual normalized values of every Multifamily Residential Commercial Real Estate property in Manhattan that transacted between 2004 and 2018, inclusive. Several extensive data transformations were applied prior to model fitting. A novel conditional random sampling technique was introduced to split sparse matrices into training and test sets for validating the prediction results. The study utilized a series of optimization techniques, including Leave One Out Cross Validation for estimating the optimal low rank matrix. The results from Non-Negative Matrix Factorization were compared with other imputation methods for sparse matrices, including Simon Funk’s Singular Value Decomposition. Both the observed and imputed values were then clustered on a weighted basis with the Spatially Constrained Weighted-Multivariate Hierarchical Clustering algorithm, using five different Agglomerative Hierarchical Clustering linkage methods: Average Linkage, Median Linkage, Centroid Linkage, Complete Linkage, and Ward’s Method. Ward’s Method was found to be the superior linkage method for determining the optimal number of submarkets, when measured by the maximum absolute difference between the mean intra-cluster similarity and the maximum inter-cluster dissimilarity. The clustering results indicated that the optimal number of Multifamily Residential Commercial Real Estate submarkets in Manhattan from 2004 to 2018 was 43.
The final results were mapped by spatially joining the submarket identifications to the intersecting land lot polygons. This study found that in several cases, multiple submarkets were clearly present within a single discrete neighborhood boundary. A speculative discussion was introduced regarding what that might mean for better understanding neighborhood change, possible policy applications, and the future of urbanism.
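To make the imputation step more concrete, the following is a minimal sketch of Non-Negative Matrix Factorization fitted only on the observed entries of a sparse property-by-year matrix, using the standard masked multiplicative updates. It is not the study's implementation: the function name `masked_nmf`, the synthetic data, the rank of 2, and the plain random hold-out (which stands in for, and is simpler than, the conditional random sampling technique described above) are all assumptions made for illustration.

```python
import numpy as np

def masked_nmf(X, mask, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate X by W @ H (both non-negative), fitting observed entries only.

    X    : (n, m) array of non-negative values; unobserved cells may hold anything.
    mask : (n, m) boolean array, True where X is observed.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    M = mask.astype(float)
    Xm = np.where(mask, X, 0.0)  # zero out unobserved cells so they carry no weight
    for _ in range(n_iter):
        # Multiplicative updates for the masked squared-error objective.
        H *= (W.T @ Xm) / (W.T @ (M * (W @ H)) + eps)
        W *= (Xm @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H

# Synthetic stand-in data (assumption): 30 properties x 15 years, true rank 2.
rng = np.random.default_rng(42)
X = rng.random((30, 2)) @ rng.random((2, 15))
observed = rng.random(X.shape) < 0.6  # roughly 60% of cells treated as observed

W, H = masked_nmf(X, observed, rank=2)
X_hat = W @ H  # dense estimate, including the held-out cells

# Error on cells the model never saw, used here as a simple validation score.
rmse_missing = np.sqrt(np.mean((X_hat[~observed] - X[~observed]) ** 2))
print(f"held-out RMSE: {rmse_missing:.4f}")
```

In a setting like the one described, the rank would be chosen by cross-validation rather than fixed in advance, and the resulting dense matrix (observed plus imputed values) would be the input to the clustering step.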