Supervising Unsupervised Learning
We introduce a framework for transferring knowledge acquired from a repository of (heterogeneous) supervised datasets to new unsupervised datasets. Our perspective avoids the subjectivity inherent in unsupervised learning by reducing it to supervised learning, and it provides a principled way to evaluate unsupervised algorithms. We demonstrate the versatility of our framework via simple agnostic bounds on unsupervised problems. In the context of clustering, our approach can help choose the number of clusters and the clustering algorithm, and it provably circumvents Kleinberg's impossibility result. Experimental results across hundreds of problems demonstrate improved performance on unsupervised data with simple algorithms, despite the fact that the problems come from different domains. Additionally, a deep learning algorithm learns common features from many small datasets across multiple domains.
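The abstract does not spell out a procedure, so the following is only a minimal sketch of the general idea of reducing an unsupervised choice to a supervised one. It assumes the simplest possible instance: pick the number of clusters k that, on average, best recovers the known labels across a repository of labeled datasets, then reuse that k on a new unlabeled dataset. The function names (`kmeans_1d`, `rand_index`, `choose_k`), the use of plain 1-D k-means, and the (unadjusted) Rand index as the agreement score are all illustrative assumptions, not the paper's method.

```python
from itertools import combinations
from statistics import mean


def kmeans_1d(xs, k, iters=20):
    """Plain Lloyd's k-means on 1-D data (illustrative assumption, not the paper's algorithm)."""
    s = sorted(xs)
    # Deterministic init: centers at evenly spaced order statistics, so runs are reproducible.
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center (ties go to the lower index).
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same cluster vs. different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def choose_k(labeled_datasets, candidate_ks):
    """Hypothetical meta-selection: the k whose clusterings best match known labels on average."""
    def score(k):
        return mean(rand_index(kmeans_1d(xs, k), ys) for xs, ys in labeled_datasets)
    return max(candidate_ks, key=score)


# A toy "repository" of labeled datasets, each with three well-separated groups.
repository = [
    ([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2], [0, 0, 0, 1, 1, 1, 2, 2, 2]),
    ([1.0, 1.2, 1.1, 20.0, 20.3, 19.8, 40.0, 40.1, 39.9], [0, 0, 0, 1, 1, 1, 2, 2, 2]),
]
best_k = choose_k(repository, candidate_ks=[2, 3, 4])  # → 3

# Apply the learned choice to a new, unlabeled dataset.
new_xs = [3.0, 3.1, 50.0, 50.2, 90.0, 90.1]
print(best_k, kmeans_1d(new_xs, best_k))  # prints: 3 [0, 0, 1, 1, 2, 2]
```

The design choice here is deliberately naive: the "knowledge" transferred from the supervised repository is a single hyperparameter, and the evaluation criterion is agreement with ground-truth labels. The same loop could rank candidate clustering algorithms instead of values of k.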