1 Introduction
Interactions among entities in many real-world complex systems are often represented by networks, where the entities are nodes and the interactions among them are links between nodes. For example, the information contained in online social networks has proved valuable in advertising applications such as finding influential users for targeted marketing. Data acquisition is done using Application Programming Interfaces (APIs) offered by the respective social networking services. Using these APIs is often time-consuming, and the number of nodes (e.g., profiles) that can be queried within a given time is restricted. A poorly constructed, incomplete network will lead to inaccurate findings. This highlights the importance of acquiring as much information as possible using a limited number of queries.
Here, we provide an overview of the Adaptive Graph Exploration problem; we formally define it in section 3. Suppose we are given a partially observed network, for instance, a sample of a social network collected by a researcher. Since we do not know how this sample was obtained, the only way to enhance it is by acquiring data belonging to the unseen portion of the network. We use the term probing to refer to querying a node to retrieve information about it and its neighborhood. As an example, probing a node of a social network corresponds to obtaining information about a profile and its friends (or followers) using an API or a web service. Several rounds of probing update the sample with new nodes and links found in the neighborhood of the queried nodes. The number of times the network can be probed is restricted by a probing budget. Thus, the goal is to enhance the observed graph as much as possible within the probing budget.
Two approaches have been proposed to reduce the incompleteness of partially observed networks. The first approach infers properties of the unseen part of the network using knowledge of the sample. Such methods infer the missing information by fitting a model of network structure to the observed part [kim2011network]. However, this is not practical for real-world networks, as such methods require more structural information about the complete network. The second approach is acquiring more information by probing, as we propose in this paper. Existing heuristic algorithms such as maximum observed degree (MOD) probing and MaxReach [soundarajan2016maxreach] require the sample to be obtained in a certain way (e.g., uniform edge sampling). In section 4 we show that existing probing algorithms cannot be generalized to incomplete networks obtained by different sampling techniques. Furthermore, many real-world networks consist of communities, i.e., densely connected regions of nodes. Heuristic probing algorithms get stuck inside communities, making them worse than probing a node at random.
Our Work.
A high-level overview of the proposed adaptive probing algorithm is illustrated in Figure 1. The probing pipeline consists of two major steps: obtaining a feature representation of the observed network, and learning a model which predicts the reward a node will reveal (e.g., the true degree of that node) based on its feature vector. The key assumption behind using a learning model is that nodes with similar features in the observed network will yield similar rewards. Our choice of graph features is motivated by work on inferring the structural role [henderson2012rolx] and social status [zhao2013inferring] of nodes in social networks.
One property which makes the estimation of rewards different from a standard prediction problem is that our training data is accumulated over the process of probing. Always probing nodes with similar features may lead to suboptimal results. This situation is known in the reinforcement learning literature as the exploration-exploitation trade-off. Multi-armed bandits [robbins1952some] are a generic way to approach real-world exploration-exploitation problems. In this context, exploitation corresponds to selecting the node which has the largest expected reward, and exploration corresponds to selecting some other node for probing.
Our contributions are threefold:

A generic approach for enhancing partially observed networks which does not require any prior knowledge about the network.

A novel nonparametric UCB algorithm (KNN-UCB) to solve the multi-armed bandit (MAB) problem when the arms are represented in a vector space (source code available at https://bitbucket.org/kau_mad/bandits/src/pkdd2018/).

Using the KNN-UCB algorithm on synthetic networks and real-world networks from different domains, we demonstrate that our proposed method performs significantly better than existing methods (source code available at https://bitbucket.org/kau_mad/net_complete/src/pkdd2018).
The rest of the paper is structured as follows. In section 2, we provide an extensive review of related work. Section 3 starts with the problem definition and describes our approach in detail. Section 4 explains the experimental setup and the data sets used. Then, in section 5, we present empirical evaluations of our bandit algorithm using real-world as well as synthetic networks. Finally, section 6 concludes with a brief discussion of the bandit approach and a few promising directions for future work.
2 Related Work
2.1 Network Crawling and Sampling
Although this problem looks similar to network crawling and sampling, the objective of most sampling algorithms is to select a representative subset of the nodes (or edges) when the entire network is accessible [ahmed2014network]. In contrast, we improve a given incomplete network with no knowledge of how the sample was obtained. Snowball sampling [lee2006statistical], in particular, can be used when the complete network is not accessible, but it suffers from the same drawback as heuristic algorithms: it does not adapt as the observed information is updated. As another related problem, link prediction [liben2007link] can predict missing links in a network, but not missing regions of nodes. The only way to enhance the observed sample is by iteratively querying observed nodes and adding their neighboring nodes to the sample.
2.2 Active Search
Active search on graphs [wang2013active, bilgic2010active] is another related problem, whose objective is to find as many target nodes possessing a given property as possible. Most previous work on this problem assumes that the complete graph is observable and that any node can be queried to find its label [ma2015active]. If only an incomplete view is available, relying only on the observed information may not yield the best possible reward. In addition to exploiting the best option according to the available information, other options must be explored to achieve better rewards. A common approach to balancing exploitation and exploration is to formulate the task as a multi-armed bandit (MAB) problem [mahajan2008multi]. SNUCB1 [bnaya2013social] and NETEXP [singla2015information] are such MAB-based active search algorithms proposed for partially observed networks. NETEXP assumes that probing a node reveals its 2-hop neighborhood, which does not hold for real-world social networks. SNUCB1 does not provide a significant improvement over existing heuristic methods. Soundarajan et al. [soundarajan2017varepsilon] recently proposed $\epsilon$-WGX, a multi-armed bandit approach to the Active Edge Probing (AEP) problem in incomplete networks. Though AEP looks similar, it is fundamentally different from our problem, as a node can be probed multiple times and only one neighboring edge is revealed in each probe.
3 Proposed Bandit Based Probing Method
We start this section with the formal definition of the problem. Then we describe the main components of this work and the multiarmed bandit algorithm in detail.
3.1 Problem Definition
Suppose there is a large unweighted, undirected graph $G$ which cannot be observed fully; only a partially observed network is available. We denote the initial incomplete network as $G_0$. Our goal is to grow this network by probing one of the observed nodes at each time step. Using this notation, we denote the observed network at time $t$ as $G_t$. Table 1 lists the notation that we will be using in this section.
Symbol   Definition

$G$      original network
$G_t$    observed network at time $t$
$C_t$    set of candidate nodes at time $t$
$b$      probing budget
Definition 1.
Probing a node reveals all links incident to it and the identity of its neighboring nodes.
The number of times we are allowed to probe the network is constrained by the probing budget ($b$).
Definition 2.
At time $t$, a node in the original network $G$ can belong to any of the following three sets.

unobserved: the existence of these nodes is not visible to the algorithm.

observed: these nodes exist in both the original network $G$ and the observed network $G_t$, but have not been probed.

probed: the algorithm knows about these nodes and their neighboring nodes.
Figure 2 illustrates an example incomplete network. We use bold lines to denote observed links and dashed lines to denote links that are unobserved at the given moment. When a node is probed, its neighbors become observed, but a link between two of those neighbors remains unobserved as long as neither of them has been probed.
An observed node can either be probed or not probed at the moment. Any observed node which has not yet been probed is a candidate for probing; hence, we refer to such nodes as candidate nodes. At the beginning, all the nodes in the given sample are candidate nodes. Probing a candidate node reveals a reward (e.g., the true degree of the node). Our goal is to iteratively select $b$ candidate nodes that maximize the cumulative reward (i.e., the number of observed nodes).
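The probing operation and the notion of candidate nodes described above can be illustrated with a minimal sketch. The adjacency-set representation, the function names `probe` and `candidates`, and the toy graph are assumptions made for illustration only; the reward used here is the number of newly observed nodes.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def probe(hidden: Graph, observed: Graph, probed: Set[int], node: int) -> int:
    """Probe `node`: reveal all links incident to it and its neighbors.

    Returns the number of newly observed nodes (one possible reward).
    """
    if node not in observed:
        raise ValueError("only observed nodes can be probed")
    if node in probed:
        raise ValueError("a node cannot be probed twice")
    before = len(observed)
    for nbr in hidden[node]:
        # The neighbor becomes observed, together with the link to `node`.
        observed.setdefault(nbr, set()).add(node)
        observed[node].add(nbr)
    probed.add(node)
    return len(observed) - before


def candidates(observed: Graph, probed: Set[int]) -> Set[int]:
    """Candidate nodes are observed nodes that have not been probed yet."""
    return set(observed) - probed


# Toy hidden graph (hypothetical data): edges 0-1, 1-2, 1-3, 2-3.
hidden = {0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2}}
observed = {0: {1}, 1: {0}}  # initial sample: a single observed link
probed: Set[int] = set()

gain = probe(hidden, observed, probed, 1)
```

After probing node 1, nodes 2 and 3 are observed and become candidates, while the link between them stays hidden because neither has been probed.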
3.2 Calculation of expected reward of candidate nodes
Instead of using a heuristic metric to choose a candidate node for probing at each time step, we treat this as a learning problem. Similar to an active exploration algorithm [pfeiffer2014active], our proposed solution consists of three high-level steps: probing, learning, and prediction. Probing a node yields additional information about the observed network. Information about the currently observed network is leveraged to learn a model which predicts the expected reward of a given candidate node. Our approach assumes that candidate nodes with similar structural neighborhoods will yield similar rewards.
Suppose that the feature vector of a candidate node $a$ at time $t$ is $x_{a,t}$. The learner probes node $a_t$ at time $t$ and observes the following reward
$$r_t = f(x_{a_t,t}) + \eta_t$$
where $f$ gives the expected reward of a given node and $\eta_t$ is sub-Gaussian white noise with mean 0 and variance $\sigma^2$.
Assumption 1.
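(Lipschitz condition): There exists a constant $L > 0$ such that for all feature vectors $x$ and $x'$, the expected reward function $f$ satisfies $|f(x) - f(x')| \le L\, d(x, x')$, where $d(x, x')$ is a metric which defines the “distance” between the two vectors $x$ and $x'$.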
Assumption 1 expresses that nodes which are similar in terms of their feature vectors will have similar rewards. In the next section, we describe in detail how we formulate this problem as a multiarmed bandit problem.
3.3 Bandit Algorithm
3.3.1 Problem Setting
In the classical contextual multi-armed bandit problem, an agent selects one of $K$ arms (or actions) at each time step and observes a reward depending on the chosen action. In this setting, the arms are assumed to be independent, and the rewards are drawn randomly from a probability distribution that is specific to each arm. The goal of the agent is to play a sequence of actions which maximizes the cumulative reward it receives within a given number of time steps.
Selecting a node from the set of candidate nodes $C_t$ at time step $t$ for probing is similar to pulling an arm in a multi-armed bandit problem. However, the classical $K$-armed bandit problem assumes that the set of arms does not change over time and requires each arm to be played several times. In contrast, the set of candidate nodes changes as probing proceeds. More importantly, a node cannot be probed a second time.
As the independence assumption does not hold in our problem setting, it is more suitable to express it as a structured bandit problem, in which the reward distributions of the arms are not independent but interrelated. In a structured bandit problem, the agent deduces the relationship between arms from a fixed-dimensional feature vector assigned to each arm.
3.3.2 KNN-UCB algorithm for structured bandits
Linear bandits [rusmevichientong2010linearly, Dani2008], the simplest among such models, assume that the reward depends linearly on the feature vector and compute the expected reward of an arm as the inner product of its feature vector and a parameter vector. But real data often exhibits more complicated relationships than a linear one. Hence, we choose $k$-nearest neighbor (kNN) regression to estimate the expected reward of arms. We adapt the $K$-armed KNN-UCB algorithm of [guan2018nonparametric] to the structured setting. Upper confidence bound (UCB) algorithms [auer2002using] incorporate an exploration term by calculating a confidence bound for each arm and choose the action corresponding to the largest upper confidence bound.
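We define the $k$-nearest neighbor upper confidence bound (KNN-UCB) rule as
$$a_t = \arg\max_{a \in C_t} \; \hat{f}(x_a) + \alpha\, \sigma(x_a) \qquad (1)$$
where $\hat{f}(x_a)$ is the estimated expected reward of arm $a$, $\sigma(x_a)$ is its exploration term, and $\alpha$ is a constant determining the amount of exploration.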
Definition 3.
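Let the kNN radius of $x$ be the smallest radius around $x$ that contains at least $k$ of the previously observed feature vectors, and let the kNN set of $x$ be the observed feature vectors within that radius. The expected reward of arm $a$ is estimated with weighted kNN regression as
(2) 
where $r_i$ is the observed reward for $x_i$ and $d(x_a, x_i)$ is the Euclidean distance between the feature vectors $x_a$ and $x_i$.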
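We define $\sigma(x_a)$ as the average distance from $x_a$ to the points in its $k$-neighborhood $N_k(x_a)$,
$$\sigma(x_a) = \frac{1}{k} \sum_{x_i \in N_k(x_a)} d(x_a, x_i) \qquad (3)$$
The term $\sigma(x_a)$ is analogous to the term accounting for the number of times action $a$ has been chosen by time $t$ in classical UCB algorithms. The way the network is probed using KNN-UCB is shown in Algorithm 1.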
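The selection rule described in this section can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the inverse-distance weighting, the small constant `eps`, the infinite score used when no rewards have been observed yet, and the names `knn_ucb_score` and `select_node` are all choices made for this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
History = List[Tuple[Vector, float]]  # (feature vector, observed reward)


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_ucb_score(x: Vector, history: History, k: int, alpha: float) -> float:
    """Weighted kNN reward estimate plus alpha times the mean kNN distance."""
    if not history:
        return math.inf  # assumption: untried regions are maximally attractive
    nearest = sorted((euclidean(x, xi), ri) for xi, ri in history)[:k]
    eps = 1e-9  # assumption: avoids division by zero for identical vectors
    weights = [1.0 / (dist + eps) for dist, _ in nearest]
    estimate = sum(w * r for w, (_, r) in zip(weights, nearest)) / sum(weights)
    avg_dist = sum(dist for dist, _ in nearest) / len(nearest)
    return estimate + alpha * avg_dist


def select_node(features: Dict[int, Vector], history: History,
                k: int = 3, alpha: float = 1.0) -> int:
    """Return the candidate node with the largest KNN-UCB score."""
    return max(sorted(features),
               key=lambda a: knn_ucb_score(features[a], history, k, alpha))


# Toy data (hypothetical): two probed nodes and three candidates in 1-D.
history = [((0.0,), 1.0), ((1.0,), 5.0)]
features = {10: (0.1,), 11: (0.9,), 12: (5.0,)}

greedy_choice = select_node(features, history, k=2, alpha=0.0)
ucb_choice = select_node(features, history, k=2, alpha=1.0)
```

With `alpha = 0.0` the rule is purely greedy and picks node 11, whose features are close to the high-reward sample. With `alpha = 1.0` the distance term dominates and node 12, which lies far from everything observed so far, is explored instead.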
3.3.3 Regret
The objective of a bandit algorithm is to select arms so as to maximize the cumulative reward over time. Minimizing the total regret is an equivalent way of expressing the maximization of cumulative reward. The regret at iteration $t$ equals the difference between the reward of the “optimal” arm and the reward of the arm actually played. In simple terms, regret is the loss incurred by the policy for not playing the optimal arm all the time. In $T$ iterations, we pull arms $a_1, \dots, a_T$ and observe rewards $r_1, \dots, r_T$. We use the following notion of regret
Theorem 3.1.
Let be an arbitrary constant. Then the regret is sublinear with, .
Proof.
The regret for bandits in a continuous feature space is
(4) 
Let be
Using Lipschitz assumption
(5) 
(6) 
With
(9)  
(10) 
Hence, the regret is sublinear. ∎
Remark 1.
If we select , we can write eq. 5 as
(11) 
4 Experiments
We construct the feature vector of a candidate node as a vector of the following features. For each feature, the local neighborhood of the node in the observed graph is considered.

degree centrality

average degree centrality of its neighbors

median degree centrality of its neighbors

the average percentage of probed neighbors found in the neighborhood
These features are chosen because their effectiveness has been shown in previous work on finding structurally similar nodes [henderson2012rolx].
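The four features listed above can be computed from the observed graph as in the following sketch. It assumes degree centrality is the observed degree divided by the number of other observed nodes, and it interprets the last feature as the fraction of a node's observed neighbors that have already been probed; both the interpretation and the name `node_features` are assumptions of this sketch.

```python
from statistics import mean, median
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # adjacency sets of the observed graph


def node_features(observed: Graph, probed: Set[int], node: int) -> List[float]:
    """Feature vector of `node` based on its local observed neighborhood."""
    n = len(observed)

    def centrality(v: int) -> float:
        # Degree centrality: observed degree over the number of other nodes.
        return len(observed[v]) / (n - 1) if n > 1 else 0.0

    nbrs = observed[node]
    nbr_centrality = [centrality(v) for v in nbrs]
    return [
        centrality(node),
        mean(nbr_centrality) if nbr_centrality else 0.0,
        median(nbr_centrality) if nbr_centrality else 0.0,
        sum(v in probed for v in nbrs) / len(nbrs) if nbrs else 0.0,
    ]


# Toy observed graph (hypothetical): a star centered on the probed node 1.
observed = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}
features_of_2 = node_features(observed, {1}, 2)
```

For node 2 in this toy graph, the only neighbor is the probed hub, so the neighbor-based features all equal 1.0 while its own degree centrality is 1/3.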
4.1 Data
We use simulated network data as well as publicly available (http://snap.stanford.edu/data/index.html) real-world data sets of social and information networks.
4.1.1 Synthetic data.
The aim of using synthetic networks is to investigate the behavior of the proposed method on networks with different configurations. We use two random network models, the Barabási-Albert (BA) model [barabasi1999emergence] and the Lancichinetti-Fortunato-Radicchi (LFR) benchmark [lancichinetti2008benchmark], to create networks with different characteristics. All these networks have the same number of nodes (34,546, the number of nodes in the HepPh citation network). The BA model generates networks with power-law degree distributions. But real-world communication networks possess other properties, such as homophily [mcpherson2001birds], which cannot be represented by a BA model. We use the LFR model to generate networks with community structure. The mixing parameter $\mu$ of the LFR model decides the probability of a node linking to nodes belonging to different communities. Low values of $\mu$ result in dense communities, as the chance of having intra-community links ($1 - \mu$) is higher than the chance of inter-community links ($\mu$). We created LFR benchmark networks, varying the value of $\mu$ in the range [0.1, 0.5], to investigate the impact of the underlying community structure of a network on our method.
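To make the preferential attachment process behind the BA model concrete, the following is a small standard-library sketch of a BA-style generator. It is illustrative only: the function name, the seeding, and the choice to start from `m` unconnected nodes are assumptions, and the experiments may well have used an existing library implementation instead.

```python
import random
from typing import Dict, List, Set


def barabasi_albert(n: int, m: int, seed: int = 0) -> Dict[int, Set[int]]:
    """Grow a graph in which each new node attaches to `m` existing nodes,
    chosen with probability proportional to their current degree."""
    if not 1 <= m < n:
        raise ValueError("require 1 <= m < n")
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    targets: List[int] = list(range(m))  # the first new node links to all seeds
    repeated: List[int] = []             # each node appears once per incident edge
    for v in range(m, n):
        for u in targets:
            adj[v].add(u)
            adj[u].add(v)
        repeated.extend(targets)
        repeated.extend([v] * m)
        # Sampling from `repeated` is sampling proportionally to degree.
        chosen: Set[int] = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = sorted(chosen)
    return adj


ba_graph = barabasi_albert(n=50, m=2, seed=1)
```

Each of the `n - m` added nodes contributes exactly `m` new links, so the toy graph above has 96 edges, and early nodes tend to accumulate the highest degrees.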
4.1.2 Real-world data.
Table 2 gives a summary of the eight real-world network data sets we use. In citation networks, if a paper $i$ cites another paper $j$, the network contains an undirected edge connecting papers $i$ and $j$. Similarly, co-authorship networks represent authors as nodes, and two authors are connected if they have published at least one paper together. Nodes of the Enron-email network are email addresses of Enron employees. If user $i$ has sent at least one email to user $j$, nodes $i$ and $j$ are connected by an undirected edge. The Twitter data set is made of 1,000 ego-networks consisting of 4,869 Twitter lists [leskovec2012learning]. Epinions and Slashdot can be considered web-of-trust networks. Even though the Epinions and Slashdot networks are often labeled as online social networks, they differ from the usual notion of social networks, as they represent who-trusts-whom data of users instead of relationships or interactions among users. In these networks, a user tags another user as trustworthy or not. They are sparse compared to online social networks.
Network         HepPh     HepTh     Epinions  Twitter    Stanford   AstroPh  DBLP       Slashdot

Type            citation  citation  web       social     web        CA       CA         web
Nodes           34,546    27,770    75,789    81,306     281,903    18,772   317,080    82,168
Edges           421,578   352,807   508,837   1,768,149  2,312,497  198,110  1,049,866  549,202
Avg Clustering  0.2848    0.3120    0.1378    0.5653     0.5976     0.6306   0.6324     0.0603
4.2 Impact of Initial Sampling Method
To investigate how the sampling method used to acquire the initial sample influences the probing methods, we generate graph samples using the following two sampling methods:

Random node sampling (RN): At each step we choose one neighbor of a node already in the sample.

Breadthfirst search (BFS): Nodes are added to the sample in the order they are observed.
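The two sampling procedures, as described in the list above, can be sketched as follows. This is an assumption-laden illustration: for RN it picks the next node uniformly from all unsampled neighbors of the current sample, which is only one reading of "one neighbor of a node already in the sample", and the function names, seeds and toy graph are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def bfs_sample(graph: Graph, start: int, size: int) -> List[int]:
    """Add nodes to the sample in the order a breadth-first search sees them."""
    order: List[int] = []
    seen = {start}
    queue = deque([start])
    while queue and len(order) < size:
        v = queue.popleft()
        order.append(v)
        for u in sorted(graph[v]):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return order


def rn_sample(graph: Graph, start: int, size: int, seed: int = 0) -> List[int]:
    """At each step, add one random neighbor of a node already in the sample."""
    rng = random.Random(seed)
    sample = [start]
    in_sample = {start}
    while len(sample) < size:
        frontier = sorted({u for v in sample for u in graph[v]} - in_sample)
        if not frontier:
            break  # the connected component is exhausted
        nxt = rng.choice(frontier)
        sample.append(nxt)
        in_sample.add(nxt)
    return sample


toy = {0: {1, 2}, 1: {0, 3}, 2: {0}, 3: {1}}
bfs_nodes = bfs_sample(toy, start=0, size=3)
rn_nodes = rn_sample(toy, start=0, size=3)
```

Both procedures return a connected set of nodes; the initial incomplete network would then be the subgraph induced by (or the links seen while collecting) those nodes.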
4.3 Methods
We compare the performance of our algorithm against the following algorithms.
4.3.1 Algorithms that do not use node features

Random walk (RW). In this trivial baseline, we select one of the candidate nodes randomly for probing. This is equivalent to running our Bandit Explorer algorithm with only one cluster and using the random strategy for node selection.

Maximum observed degree (MOD). This greedy method, proposed in [avrachenkov2014pay], is the current state-of-the-art algorithm for finding the network cover in an online manner.
4.3.2 Algorithms that use node features

LinUCB. This applies the UCB algorithm of [Dani2008], assuming that the reward of an arm depends linearly on its feature vector.

KNN-greedy. This algorithm chooses the arm with the largest expected reward as estimated by the kNN model.

KNN-$\epsilon$-greedy. This algorithm chooses a random arm with probability $\epsilon$ and, the rest of the time, selects the arm with the largest expected reward estimated by kNN regression.

KNN-UCB. This is our proposed algorithm, Algorithm 1.
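Two of the simpler selection strategies compared above can be written down compactly. The sketch below shows MOD (pick the candidate with the largest observed degree) and an $\epsilon$-greedy choice over already-computed reward estimates; the function names, the tie-breaking by smallest node id, and the toy inputs are assumptions of this sketch, and the reward estimates are taken as given rather than computed by kNN regression.

```python
import random
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def mod_select(observed: Graph, probed: Set[int]) -> int:
    """MOD baseline: the unprobed observed node with the largest observed degree."""
    cands = sorted(set(observed) - probed)
    return max(cands, key=lambda v: len(observed[v]))


def epsilon_greedy_select(estimates: Dict[int, float], epsilon: float,
                          rng: random.Random) -> int:
    """With probability epsilon pick a random arm, otherwise the best estimate."""
    arms = sorted(estimates)
    if rng.random() < epsilon:
        return rng.choice(arms)
    return max(arms, key=lambda a: estimates[a])


observed = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}
mod_choice = mod_select(observed, probed={0})

estimates = {5: 0.2, 6: 0.9, 7: 0.4}  # hypothetical predicted rewards
greedy_choice = epsilon_greedy_select(estimates, 0.0, random.Random(0))
random_choice = epsilon_greedy_select(estimates, 1.0, random.Random(0))
```

Setting `epsilon` to 0 recovers the purely greedy variant, while `epsilon = 1` ignores the estimates entirely and behaves like random candidate selection.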
5 Results
5.1 Analysis on Synthetic Networks
We probe incomplete BA and LFR networks obtained by RN and BFS sampling for 1,000 iterations ($b = 1{,}000$). The number of nodes observed in the BA network is shown in Figure 3. For all networks generated by the Barabási-Albert (BA) model, MOD observes more nodes than the bandit algorithms. This confirms the claim of [avrachenkov2014pay] that MOD probing can achieve the best connected network cover for networks generated by preferential attachment processes.
To understand how the existence of community structure impacts the probing, we evaluate the performance of all algorithms on synthetic networks generated by different configurations of the LFR benchmark model [lancichinetti2008benchmark]. We vary the mixing parameter $\mu$ from 0.1 to 0.5, keeping all other parameters of the model constant (including average degree = 25). KNN-UCB significantly outperforms the baselines for networks with smaller $\mu$. When the initial sample is obtained by BFS sampling, KNN-UCB outperforms all baselines by a significant margin. The gap between KNN-UCB and the baselines is larger when the mixing parameter is small, i.e., when the network has significant community structure. The experimental results on synthetic networks suggest that the KNN-UCB algorithm can adapt to incomplete networks obtained by different sampling techniques and to networks with structural properties such as community structure.
5.2 Results on Real World Networks
We use the 8 real-world networks listed in Table 2 and generate RN and BFS samples containing 5% of the nodes of the original network. Then 1,000 probing steps are performed. We perform each experiment five times, initialized with different random seeds, and report the average number of additional nodes observed in Figure 5 and Figure 6.
The KNN-UCB and LinUCB bandit algorithms outperform all baseline methods on samples generated by both RN and BFS sampling. Even though LinUCB observes as many nodes as KNN-UCB for RN samples, its performance is worse for BFS samples. This shows that the linear model in LinUCB is not capable of learning the relationship between the observed node features and the true degree of a node if the sample is constructed by BFS.
6 Conclusions
In this paper, we introduced a bandit-based exploration algorithm for partially observed, incomplete networks. We proposed a novel nonparametric multi-armed bandit algorithm, KNN-UCB, with sublinear regret. Compared to existing solutions for the Adaptive Graph Exploration problem, the proposed method does not depend on a specific heuristic. Additionally, the KNN-UCB bandit algorithm outperforms the baseline methods irrespective of how the initial incomplete network is obtained. We provided experimental evidence for our approach using synthetic networks and a variety of real-world networks. Using different configurations of LFR benchmark networks, we observed that our algorithm outperforms all other baselines significantly when the network exhibits prominent community structure. Since the reward function is independent of the probing procedure, it is easy to define a new reward function to solve a different graph exploration problem (e.g., finding a particular type of node).
In this problem, we assumed that probing a node would reveal all its neighboring nodes. However, in some real-world scenarios, only a certain number of neighbors is revealed (e.g., the follower limit in the Twitter API, https://dev.twitter.com/rest/reference/get/followers/ids). As future work, we will explore how the current approach can be adapted to such different settings of the same problem.