Predicting Positive and Negative Links with Noisy Queries: Theory & Practice
Social networks and interactions in social media involve both positive and negative relationships. Signed graphs capture both types of relationships: positive edges correspond to pairs of "friends", and negative edges to pairs of "foes". The edge sign prediction problem, which aims to predict whether an interaction between a pair of nodes will be positive or negative, is an important graph mining task for which many heuristics have recently been proposed [Leskovec 2010]. We model the edge sign prediction problem as follows: we are allowed to query any pair of nodes to learn whether they belong to the same cluster, but the answer to each query is corrupted with some probability 0<q<1/2. Let δ=1-2q be the bias. We provide an algorithm that, for any constant gap δ, recovers all signs correctly with high probability in the presence of noise using O(n log n/δ^4) queries. Our algorithm uses breadth-first search as its main algorithmic primitive. A byproduct of our proposed learning algorithm is the use of s-t paths as an informative feature for predicting the sign of the edge (s,t). As a heuristic, we use short edge-disjoint s-t paths as a feature for predicting edge signs in real-world signed networks. Our findings suggest that the use of paths improves classification accuracy, especially for pairs of nodes with no common neighbors.
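To make the query model and the path-based feature concrete, the following is a minimal sketch and not the BFS-based algorithm analyzed in the paper. It assumes two clusters, simulates a noisy oracle with flip probability q, and predicts the sign of (s,t) by a majority vote over the edge-disjoint length-2 paths s-w-t. The names noisy_query and predict_sign, the two-cluster setup, and the restriction to length-2 paths are illustrative assumptions. Under independent flips, the product of the two noisy signs on a single path is correct with probability (1-q)^2 + q^2 = (1+δ^2)/2, so each path carries a bias of δ^2 and the vote over many paths amplifies it.

```python
import random


def noisy_query(cluster, u, v, q, rng):
    """Simulated oracle (hypothetical helper): +1 if u and v share a
    cluster, -1 otherwise, with the answer flipped with probability q."""
    truth = 1 if cluster[u] == cluster[v] else -1
    return -truth if rng.random() < q else truth


def predict_sign(cluster, s, t, q, rng):
    """Predict the sign of (s, t) by majority vote over length-2 paths.

    Each intermediate node w gives the path s-w-t; the paths are
    edge-disjoint because they share only the endpoints s and t.
    The sign of a path is the product of its two noisy edge signs.
    """
    votes = 0
    for w in range(len(cluster)):
        if w == s or w == t:
            continue
        votes += noisy_query(cluster, s, w, q, rng) * noisy_query(cluster, w, t, q, rng)
    # Ties are broken in favor of a positive edge (an arbitrary choice).
    return 1 if votes >= 0 else -1


if __name__ == "__main__":
    rng = random.Random(0)
    cluster = [i % 2 for i in range(200)]  # even nodes vs. odd nodes
    q = 0.1                                # flip probability, so delta = 0.8
    print("nodes 0 and 2 (same cluster):", predict_sign(cluster, 0, 2, q, rng))
    print("nodes 0 and 1 (different clusters):", predict_sign(cluster, 0, 1, q, rng))
```

This sketch spends about 2n queries on a single pair, which is far more than the query budget stated above allows across all pairs; it is meant only to show why path sign products are an informative feature under noise.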