Mixed Membership Graph Clustering via Systematic Edge Query
This work considers clustering nodes of a largely incomplete graph. Under the problem setting, only a small amount of queries about the edges can be made, but the entire graph is not observable. This problem finds applications in large-scale data clustering using limited annotations, community detection under restricted survey resources, and graph topology inference under hidden/removed node interactions. Prior works treated this problem as a convex optimization-based matrix completion task. However, this line of work is designed for learning single cluster membership of nodes belonging to disjoint clusters, yet mixed (multiple) cluster membership nodes and overlapping clusters often arise in practice. Existing works also rely on the uniformly random edge query pattern and nuclear norm-based optimization, which give rise to a number of implementation and scalability challenges. This work aims at learning mixed membership of the nodes of overlapping clusters using edge queries. Our method offers membership learning guarantees under systematic query patterns (as opposed to random ones). The query patterns can be controlled and adjusted by the system designers to accommodate implementation challenges—e.g., to avoid querying edges that are physically hard to acquire. Our framework also features a lightweight and scalable algorithm. Real-data experiments on crowdclustering and community detection are used to showcase the effectiveness of our method.
READ FULL TEXT