Unsupervised Learning for Submarket Modeling: A Proxy for Neighborhood Change

by Nick Kunz, et al.

This study focused on submarket modeling with unsupervised learning and geographic information system fundamentals to better understand urbanism at the neighborhood scale. A Spatially Constrained Weighted-Multivariate Hierarchical Clustering algorithm was trained to identify the optimal number of Multifamily Residential Commercial Real Estate submarkets in Manhattan, New York. The methodology explored Non-Negative Matrix Factorization for predicting the annual normalized values of every Multifamily Residential Commercial Real Estate property in Manhattan that was transacted from 2004 through 2018. Several extensive data transformations were applied prior to model fitting. A novel conditional random sampling technique was introduced to split sparse matrices into training and test sets for validating the prediction results. The study utilized a series of optimization techniques, including Leave One Out Cross Validation for estimating the optimal low rank matrix. The results from Non-Negative Matrix Factorization were compared with other imputation methods for sparse matrices, including Simon Funk’s Singular Value Decomposition. Both the observed and imputed values were then clustered on a weighted basis with the Spatially Constrained Weighted-Multivariate Hierarchical Clustering algorithm, using five different Agglomerative Hierarchical Clustering linkage methods: Average Linkage, Median Linkage, Centroid Linkage, Complete Linkage, and Ward’s Method. Ward’s Method was found to be the superior linkage method for determining the optimal number of submarkets, when measured by the maximum absolute value difference between the mean intra-cluster similarity and the maximum inter-cluster dissimilarity. The clustering results indicated that the optimal number of Multifamily Residential Commercial Real Estate submarkets in Manhattan from 2004 to 2018 was 43.
The final results were mapped by spatial joining to the intersecting land lot polygons with their respective submarket identifications. This study found that in several cases, there was a strong and obvious presence of multiple submarkets contained within a discrete neighborhood boundary. A speculative discussion was introduced regarding what that might mean for better understanding neighborhood change, possible policy applications, and the future of urbanism.
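
To make the imputation step concrete, the following is a minimal sketch of Non-Negative Matrix Factorization fitted only on the observed entries of a sparse property-by-year matrix. It is not the study's implementation: the multiplicative update rule, the function names `masked_nmf` and `rmse`, the rank, and the synthetic toy data are all assumptions chosen for illustration, and the abstract does not state which solver or settings were used.

```python
import numpy as np


def rmse(X, X_hat, mask):
    """Root mean squared error over the entries selected by mask."""
    diff = (X - X_hat)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def masked_nmf(X, mask, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor X ~ W @ H with W, H >= 0, fitting only entries where mask is True.

    Uses masked multiplicative updates (an assumption of this sketch,
    not necessarily the solver used in the study).
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    M = mask.astype(float)
    Xm = np.where(mask, X, 0.0)  # unobserved entries contribute nothing
    for _ in range(n_iter):
        H *= (W.T @ Xm) / (W.T @ (M * (W @ H)) + eps)
        W *= (Xm @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H


# Synthetic toy data: 8 hypothetical properties by 6 years, true rank 2.
rng = np.random.default_rng(42)
X = rng.random((8, 2)) @ rng.random((2, 6))
observed = rng.random(X.shape) < 0.7  # roughly 70% of entries are "transacted"

# Error on observed entries after one update versus after many updates.
W1, H1 = masked_nmf(X, observed, rank=2, n_iter=1)
rmse_start = rmse(X, W1 @ H1, observed)

W, H = masked_nmf(X, observed, rank=2, n_iter=2000)
X_hat = W @ H  # dense matrix: unobserved entries are now imputed
rmse_final = rmse(X, X_hat, observed)

print("observed-entry RMSE after 1 iteration:", rmse_start)
print("observed-entry RMSE after 2000 iterations:", rmse_final)
print("held-out RMSE:", rmse(X, X_hat, ~observed))
```

In this sketch the held-out entries play the role of a test set; a rank could be chosen by repeating the fit for several candidate ranks and comparing held-out error, which is the general idea behind the cross-validated rank selection mentioned above.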





Optimal Bayesian clustering using non-negative matrix factorization

Bayesian model-based clustering is a widely applied procedure for discov...

Clustering US States by Time Series of COVID-19 New Case Counts with Non-negative Matrix Factorization

The spreading pattern of COVID-19 differ a lot across the US states unde...

Techniques for clustering interaction data as a collection of graphs

A natural approach to analyze interaction data of form "what-connects-to...

A deep matrix factorization method for learning attribute representations

Semi-Non-negative Matrix Factorization is a technique that learns a low-...

Automatic Dimension Selection for a Non-negative Factorization Approach to Clustering Multiple Random Graphs

We consider a problem of grouping multiple graphs into several clusters ...

Low-rank Convex/Sparse Thermal Matrix Approximation for Infrared-based Diagnostic System

Active and passive thermography are two efficient techniques extensively...

Hierarchical Subtask Discovery With Non-Negative Matrix Factorization

Hierarchical reinforcement learning methods offer a powerful means of pl...