Domain-adversarial Network Alignment

08/15/2019 · Huiting Hong et al. · Beijing Institute of Technology, University of Technology Sydney

Network alignment is a critical task in a wide variety of fields. Many existing works leverage representation learning to accomplish this task without eliminating the domain representation bias induced by domain-dependent features, which yields inferior alignment performance. This paper proposes a unified deep architecture (DANA) to obtain a domain-invariant representation for network alignment via an adversarial domain classifier. Specifically, we employ graph convolutional networks to perform network embedding under the domain-adversarial principle, given a small set of observed anchors. The semi-supervised learning framework is then optimized by simultaneously maximizing a posterior probability distribution of the observed anchors and the loss of a domain classifier. We also develop variants of our model, such as a direction-aware structure for directed networks and weight-sharing for the simplification of the parameter space. Experiments on three real-world social network datasets demonstrate that our proposed approaches achieve state-of-the-art alignment results.


1. Introduction

Network alignment seeks to find the correspondence of nodes (a.k.a. anchor links) across two or more networks. It is important in a wide variety of fields. For instance, network alignment can be applied to connecting identical users across different social media networks (referred to as different domains in the sequel). The established user correspondence can alleviate the sparsity issue in analyzing individual social networks through information fusion, benefiting applications such as link prediction and cross-domain recommendation. Similarly, network alignment can help construct a more compact knowledge graph from existing vertical or cross-lingual knowledge bases, thereby enabling better knowledge inference. In bioinformatics, aligning protein-protein interaction networks from different species has been widely studied in order to determine common functional structures.

The network alignment task rests on a basic assumption: counterpart nodes should have consistent connectivity structures across the different networks. Approaches exploring this topological consistency offer a universal solution to the alignment task, since informative node attributes are usually unavailable in reality. Recently, representation learning on networks, a.k.a. network embedding, has provided a means to obtain low-dimensional representations of nodes by exploiting the structural information of the network. Network alignment can then be performed by exploring a common low-dimensional subspace of the networks, or a subspace transformation between networks.

(a) SNNA
(b) IONE
Figure 1. SVM domain classification of the 2D representations of vertices obtained by existing alignment approaches on the Douban-Weibo dataset.

However, existing embedding-based alignment methods in the literature, e.g., SNNA (Li et al., 2019) and IONE (Liu et al., 2016), fail to explicitly capture domain-invariant features, and therefore suffer from domain representation bias w.r.t. the network alignment task. (In this paper, domain representation bias refers to domain-dependent features which are irrelevant to the specific task but are able to identify the domain. For example, RGB values could be the key feature for distinguishing colorful digits from grayscale digits, but should not be the key feature for distinguishing one digit from another.) Most network-embedding approaches tend to capture local structures and high-order structures simultaneously in the embedded space. For example, IONE leverages LINE (Tang et al., 2015) to preserve the second-order proximity explicitly and retain high-order structures implicitly via linkage propagation. The learned embedding therefore includes domain-dependent signals, which may be suitable for distinguishing between the domains/networks, but is inherently defective for the alignment task due to inadequate learning of domain-invariant features.

Fig. 1(a) and 1(b) show the 2D representations of nodes of two networks (Douban and Weibo), obtained from two state-of-the-art network alignment approaches, SNNA (Li et al., 2019) and IONE (Liu et al., 2016), respectively. For clarity, we only plot 2000 vertices randomly sampled from the test set. The experimental setup is consistent with that described in Sec. 4. The decision boundaries of an SVM are shown in the background color. The SVM domain classifiers are trained on the learned representations, and the testing accuracies are 0.99 and 0.95, respectively. We believe the representations have, to some extent, encoded domain-dependent features, for example, the signal of the average node degree (the average node degree of Douban is twice that of Weibo, see Table 1). We argue that such domain-dependent features learned by existing network alignment approaches are not informative for aligning the networks, as the domain of each network is already known to the alignment task; sometimes the domain-dependent features may even lead to inferior alignment performance. Suppressing the learning of domain-dependent features (i.e., the domain representation bias), so that node representations become more task-specific and alignment performance improves, is the basic motivation of this paper.

In the literature, some existing works introduce domain-dependent features and domain-independent features in pursuit of better performance on cross-domain tasks, e.g., cross-domain sentiment analysis and image segmentation (Weiss et al., 2016). These features are usually learned through manual selection and/or feature augmentation, which is applicable in natural language processing and image processing, where explicit semantics and rich attributes are accessible (Pan et al., 2010). However, this cannot be applied to network embedding, where only structural information is available.

Inspired by recent advances in domain adaptation learning (Ganin et al., 2016; Xie et al., 2017), which seeks features that are invariant to the change of domains, we propose to incorporate adversarial learning of a domain classifier into the network embedding process within an alignment framework, suppressing the generation of domain-dependent features for better alignment performance. The framework, Domain-Adversarial Network Alignment (DANA), mainly consists of two components: a task-driven network embedding module and an adversarial domain classifier.

In this paper, the task-driven embedding of networks is accomplished via graph convolutional networks (GCNs) (Kipf and Welling, 2016; Defferrard et al., 2016), known to be powerful on graph-structured data. Instead of enforcing the anchors' representations to be identical, as in most existing works, e.g., IONE, we maximize a posterior probability distribution of anchors over the parameter space to supervise the GCNs, in pursuit of a more flexible network representation. The embedding process is also supervised by the adversarial domain classifier: adversarial learning against the domain classifier is meant to yield domain-invariant features w.r.t. the alignment task. That is to say, the framework is optimized to minimize the loss of the alignment and maximize the loss of the domain classifier simultaneously.

To better handle alignment tasks involving directed networks, e.g., Twitter, where follower-followee relations are maintained on purpose to constitute a directed network/graph (in Twitter, someone following you does not mean you necessarily follow them back; in contrast, friendship on Facebook is always bidirectional, so the contact graph is undirected), we further adapt the framework by developing a direction-aware structure to characterize the directed edges in networks. Moreover, weight-sharing within the network embedding module is introduced to obtain similar subspaces for each domain/network, which generally benefits the alignment determination while reducing the number of parameters to speed up the training process.

The main contributions of this paper can be summarized as follows:

  • We propose a representation learning-based adversarial framework to perform network alignment tasks. Unlike most existing approaches, which formulate the alignment task as a mapping problem between networks, the adversarial learning adopted here steers the feature extraction towards the alignment task by suppressing domain-dependent features, which are task-unrelated for network alignment. To the best of our knowledge, we are the first to argue that eliminating/suppressing domain-dependent features helps improve the performance of network alignment.

  • The mathematical models, derivations, and experiments in this paper are specifically tailored to conventional alignment tasks and to tasks involving directed networks. In particular, the objective function leverages a probabilistic design from a multi-view perspective, as network alignment can be viewed as a bi-directional matching problem, whereas most existing approaches adopt distance-based supervision with the observed anchors.

  • We evaluate the proposed models with detailed experiments on real-world social network datasets. Results demonstrate significant and robust improvements in comparison with other state-of-the-art approaches.

The rest of the paper is organized as follows. Section 2 summarizes the related work. Section 3 presents the design and algorithms of vanilla DANA and its variants. Section 4 reports the experimental design and discusses the results; a case study illustrating how the framework suppresses domain-dependent features to boost the alignment task is also included. Section 5 concludes the paper.

Figure 2. The Vanilla Architecture of DANA

2. Related Work

Our work is most related to embedding-based network alignment and adversarial learning.

2.1. Embedding-based Network Alignment

Among the various representation learning-based network alignment approaches, the main differences lie in (1) which network embedding approach is leveraged, and (2) whether the multiple networks are projected onto the same low-dimensional subspace.

(Tan et al., 2014) proposed a shallow model, MAH, to align network manifolds by modeling social graphs with hypergraphs. The manifolds of social networks are projected onto a common embedded space, and the user mapping is then inferred by comparing the distances between users in the embedding space. To scale up, IONE (Liu et al., 2016) proposed an embedding approach that considers only the "second-order proximity" of local structures to obtain a common low-dimensional subspace of the networks, semi-supervised by the observed anchors.

ULink (Mu et al., 2016) explores the concept of a "Latent User Space"; its objective is to find projections of each network while minimizing the distance between a node and its correspondence in their respective vector spaces. Similarly, PALE (Man et al., 2016) first embeds the networks individually by leveraging a network embedding approach, e.g., LINE (Tang et al., 2015) or DeepWalk (Perozzi et al., 2014), and then seeks an explicit feature-space transformation that maps one into the other. However, the standalone embedding process in a two-phase approach like PALE is designed independently of the alignment task, and thus may not capture features which directly benefit the alignment. Moreover, all the aforementioned approaches neglect the importance of learning domain-invariant features.

2.2. Adversarial Training of Neural Networks

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), which play an adversarial minimax game between a generator and a discriminator, free users from the painful practice of defining a tricky objective function. GANs have shown impressive potential in various fields and tasks, e.g., natural language processing (Zhang et al., 2017; Wang et al., 2017) and network embedding (Dai et al., 2018; Wang et al., 2018).

Recently, an adversarial training framework, DANN (Ganin et al., 2016), was proposed for domain adaptation. In particular, DANN introduces a representation learning module in which adversarial training maximizes the loss of the domain classifier, thereby encouraging domain-invariant features to dominate the process of minimizing the loss of the label classifier. (Xie et al., 2017) further extended this idea to obtain controllable invariance through adversarial feature learning. Both approaches build on the theory that a good representation for domain adaptation is one for which an algorithm cannot identify the domain of its input. This is also the building block of our work.

SNNA (Li et al., 2019) was recently proposed to perform social network alignment via supervised adversarial learning. SNNA is a two-phase approach that first learns a low-dimensional representation for each network via conventional network embedding, and then learns the projection function within a GAN framework. Supervised by the observed anchors, the generator learns a transformation from one embedding space to another that minimizes the Wasserstein distance between the projected source distribution and the target distribution, while the discriminator estimates the distance between the two embedding spaces. In other words, the adversarial learning in SNNA is used to obtain an optimal projection function between the two subspaces.

In contrast to the two-phase SNNA, our proposed approach performs network representation learning and alignment learning in a unified architecture. The adversarial learning here is applied to the domain classifier to filter away domain-dependent features by maximizing the loss of the classifier. Meanwhile, the representation learning is also task-driven, maximizing the posterior probability of the observed anchors so as to produce feature representations useful for network alignment.

3. Domain-Adversarial Network Alignment

In this section, we first formulate our problem and then present a vanilla framework for domain-adversarial network alignment. Its adaptations, weight-sharing for model simplification and a direction-aware structure for directed networks, are further introduced.

For the same user in two different social networks, say $u$ in network $G^x$ and $v$ in network $G^y$, we denote $(u, v)$ as a pair of anchors. The network alignment task can be formulated as predicting the anchor pairs $(u, v)$ given two networks $G^x = (V^x, E^x)$ and $G^y = (V^y, E^y)$, where $V^x$ ($V^y$) and $E^x$ ($E^y$) are the sets of vertices and edges in the respective network. Each vertex is labeled as either $x$ or $y$, indicating the network to which it belongs. Note that we argue that domain-dependent features, which are capable of revealing the domain identity, are futile and sometimes detrimental to the alignment task. To achieve better alignment performance, we adopt the domain-adversarial training paradigm to train a domain classifier, which helps extract domain-invariant representations of the networks.

3.1. Vanilla Architecture of DANA

The vanilla architecture of DANA consists of two components, namely, a task-driven network embedding module and an adversarial domain classifier.

Input: network $G^x$ with vertex set $V^x$ and edge set $E^x$, network $G^y$ with $V^y$ and $E^y$, and the set of anchor seeds $\mathcal{T}$.
Hyperparameters: the batch size of vertices $b$; the batch size of anchor seeds $b_a$; the weighting factor $\lambda$; the regularization factor $\gamma$.

Parameters: the feature extractors (GCNs) parameterized by $\Theta^x$ and $\Theta^y$; the domain classifier, an MLP parameterized by $\Theta^d$.

Output: representations of the vertices in $V^x$; representations of the vertices in $V^y$.

1:  Randomly initialize $\Theta^x$, $\Theta^y$, $\Theta^d$
2:  repeat
3:     Sample a batch of $b$ vertices from $V^x$
4:     Sample a batch of $b$ vertices from $V^y$
5:     Sample a batch of $b_a$ anchors from $\mathcal{T}$
6:     Update $\Theta^d$ with the Adam optimizer to minimize: $-\mathcal{J}_D$ (Eq. (6))
7:     Update $\Theta^x$, $\Theta^y$ with the Adam optimizer to minimize: $-\mathcal{J}_A + \lambda \mathcal{J}_D$ (Eq. (7))
8:  until convergence
Algorithm 1 Training procedure of DANA

3.1.1. Task-driven Network Embedding

To explore the structural information of networks, we employ GCNs as our task-driven feature extractors, adopting one GCN for each network (see Fig. 2). In the following, we omit the superscript denoting the identity of the network for simplicity. Given the adjacency matrix $A$ of one network, the GCN outputs the corresponding hidden representations $H^{(l+1)}$ in the $(l{+}1)$-th layer following the layer-wise propagation rule:

$$H^{(l+1)} = \sigma\big(\tilde{D}^{-\frac{1}{2}}\, \hat{A}\, \tilde{D}^{-\frac{1}{2}}\, H^{(l)}\, W^{(l)}\big) \qquad (1)$$

where $\hat{A} = A + I_N$ is the convolution kernel, which acts as a spatial filter on the network, and $I_N$ is the self-connection identity matrix of the network. $\tilde{D}$ denotes the diagonal node degree matrix of the network, i.e., $\tilde{D}_{ii} = \sum_j \hat{A}_{ij}$, and $W^{(l)}$ denotes the trainable weight matrix of the $l$-th layer. $H^{(0)}$ can be either previously encoded vectors carrying privileged information of the network or randomly initialized. The activation function $\sigma(\cdot)$ is implemented by ReLU in our framework, following (Kipf and Welling, 2016). Thereby, each GCN module outputs a low-dimensional vector for every vertex of its network.

To integrate the representation learning into the alignment task, we optimize the network alignment problem by maximizing the following posterior:

$$\max_{\Theta^x, \Theta^y}\; p(\Theta^x, \Theta^y \mid \mathcal{T}) \propto p(\mathcal{T} \mid \Theta^x, \Theta^y)\, p(\Theta^x)\, p(\Theta^y) \qquad (2)$$

where $\mathcal{T}$ denotes the collection of observed anchor pairs, and $\Theta^x$ denotes all the parameters of the module for network $G^x$, i.e., $\{W^{x,(l)}\}_l$; the notation applies analogously to $\Theta^y$. Note that both probability expansions for an anchor pair $(u, v)$, i.e., $p(u, v) = p(v \mid u)\, p(u)$ and $p(u, v) = p(u \mid v)\, p(v)$, are significant to our problem. We abbreviate $p(v \mid u; \Theta^x, \Theta^y)$ to $p(v \mid u)$, and likewise $p(u \mid v)$ abbreviates $p(u \mid v; \Theta^x, \Theta^y)$. We therefore define $p(u, v \mid \Theta^x, \Theta^y) \propto p(v \mid u)\, p(u \mid v)$, which is a popular practice for multi-view problems where all views matter. Further, a Gaussian prior is introduced for the model parameters, i.e., $\Theta^x \sim \mathcal{N}(0, \sigma^2 I)$ and $\Theta^y \sim \mathcal{N}(0, \sigma^2 I)$. The resultant optimization criterion can be derived as follows:

$$\max_{\Theta^x, \Theta^y}\; \mathcal{J}_A = \sum_{(u,v) \in \mathcal{T}} \big[\log p(v \mid u) + \log p(u \mid v)\big] \;-\; \gamma\big(\|\Theta^x\|_2^2 + \|\Theta^y\|_2^2\big) \;+\; C \qquad (3)$$

where $\gamma$ and $C$ are constants derived from the Gaussian prior. The softmax function is used to approximate the likelihood of observing an anchor pair, namely:

$$p(v \mid u) = \frac{\exp(\vec{u}^{\top}\vec{v})}{\sum_{v' \in V^y} \exp(\vec{u}^{\top}\vec{v}\,')} \qquad (4a)$$
$$p(u \mid v) = \frac{\exp(\vec{v}^{\top}\vec{u})}{\sum_{u' \in V^x} \exp(\vec{v}^{\top}\vec{u}\,')} \qquad (4b)$$

where $\vec{u}$ corresponds to the learned representation of vertex $u$; the same holds for $\vec{v}$. Due to the summation over the entire set of nodes in Eq. (4a) and Eq. (4b), the computation is time-consuming for large-scale networks. To reduce the computational complexity, we adopt a sampled softmax function (Jean et al., 2014), which performs the summation over a set of sampled candidates, namely:

$$p(v \mid u) \approx \frac{\exp(\vec{u}^{\top}\vec{v})}{\sum_{v' \in \{v\} \cup \mathcal{C}} \exp(\vec{u}^{\top}\vec{v}\,')} \qquad (5)$$

The candidate set $\mathcal{C}$ is sampled via a log-uniform distribution. The same operation also applies to Eq. (4b).
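For illustration, the following sketch computes the sampled-softmax log-likelihood $\log p(v \mid u)$ of Eq. (5) under the inner-product scoring of Eq. (4a). The function and argument names are ours, and the log-uniform candidate sampling is assumed to be done by the caller.

```python
import torch

def anchor_log_likelihood(u, V_y, v_idx, cand_idx):
    """Sampled-softmax approximation of log p(v | u), cf. Eqs. (4a)/(5).

    u        : (d,) embedding of vertex u from network G^x
    V_y      : (n, d) embeddings of all vertices in network G^y
    v_idx    : index of u's true counterpart v in V_y
    cand_idx : (m,) long tensor of negative candidates drawn log-uniformly
    """
    idx = torch.cat([torch.tensor([v_idx]), cand_idx])  # true v first, then candidates
    logits = V_y[idx] @ u                               # inner-product scores
    return logits[0] - torch.logsumexp(logits, dim=0)   # log-softmax at the true v
```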

3.1.2. Adversarial Domain Classifier

However, the optimization criterion of Eq. (3) does not purge task-irrelevant domain features, which may weaken the specialization of the representations for network alignment. Inspired by the adversarial learning paradigm, we therefore augment the alignment task-driven network embedding with adversarial learning against a domain classifier, which is meant to filter away the domain-dependent features while concentrating on extracting alignment-targeted features.

Note that the domain classifier, acting as the discriminator, tries to distinguish which domain a given vertex comes from, while the feature extractors, i.e., the GCNs in our framework, act as the generator, aiming to learn domain-invariant features from the input data to fool the domain classifier. Technically, the domain classifier and the feature extractors are trained by playing the following minimax game:

$$\min_{\Theta^x, \Theta^y}\; \max_{\Theta^d}\; \mathcal{J}_D = \sum_{v \in V^x \cup V^y}\; \sum_{d \in \{x, y\}} \mathbb{1}[l_v = d]\, \log p(d \mid v) \qquad (6)$$

where $l_v$ denotes the domain label of vertex $v$, and $\Theta^d$ is the parameter set of the domain classifier. Note that $\mathbb{1}[l_v = d]$ is the indicator function, which equals 1 if $v$ comes from domain $d$ and 0 otherwise. We employ an MLP classifier whose last hidden layer is connected to a softmax layer to induce the conditional distribution $p(d \mid v)$.

Referring back to Eq. (3) for the network alignment task, we train $\Theta^x$ and $\Theta^y$ to extract domain-invariant feature representations while maximizing the posterior probability for network alignment, in the following form:

$$\max_{\Theta^x, \Theta^y}\; \Big[\, \mathcal{J}_A \;-\; \lambda \max_{\Theta^d} \mathcal{J}_D \,\Big] \qquad (7)$$

where the hyperparameter $\lambda$ is a weighting factor modulating the contribution of the domain term $\mathcal{J}_D$. To optimize $\Theta^x$, $\Theta^y$, and $\Theta^d$, we incorporate a Gradient Reversal Layer (GRL) (Ganin et al., 2016) between the feature extractors and the domain classifier. The GRL can be viewed as an activation-function layer with no parameters: it passes the input through unchanged during the forward pass, but reverses gradients (multiplying them by $-\lambda$) during back-propagation. The adoption of the GRL enables a synchronous optimization of Eq. (7), so DANA can be trained more easily and faster. The overall architecture and algorithm of our proposed model are depicted in Fig. 2 and Algorithm 1, respectively.
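A GRL is compact to implement. Below is a minimal PyTorch sketch mirroring the behavior described above (identity forward, gradients scaled by $-\lambda$ backward); it is our own illustration rather than the paper's code.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; scales gradients by -lambda backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # No gradient w.r.t. lam (it is a hyperparameter), hence the None.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage: feeding the GCN outputs through grad_reverse before the MLP domain
# classifier lets a single backward pass train the classifier to reduce the
# domain loss while pushing the feature extractors to increase it.
```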

Figure 3. Unfolded structure for directed networks

3.2. DANA for Directed Networks

Many networks are deliberately defined as directed graphs. For example, Twitter creates a directed graph of followers because interactions on Twitter are generally one-way. Stemming from spectral graph theory, the conventional GCN requires a symmetric adjacency matrix to obtain low-dimensional representations, which limits our model to undirected graphs. To address directed networks, existing research simply relaxes the strict constraint on the symmetric adjacency matrix in GCNs and explains the convolutional kernel from a spatial perspective (Schlichtkrull et al., 2018). However, this suffers from an inadequate characterization of the directed edges, which is important for obtaining accurate representations of the associated vertices. In pursuit of better representations, we characterize each vertex from two perspectives, performing the convolution according to its in-degree and out-degree distributions, respectively.

Given the adjacency matrix $A$ of a directed network, and randomly initialized $H_{\rightarrow}^{(0)}$ and $H_{\leftarrow}^{(0)}$, the hidden representations $H_{\rightarrow}^{(l+1)}$ and $H_{\leftarrow}^{(l+1)}$ of the $(l{+}1)$-th layer can be obtained as follows:

$$H_{\rightarrow}^{(l+1)} = \sigma\big(\tilde{D}_{\rightarrow}^{-1}\, \hat{A}\, H_{\rightarrow}^{(l)}\, W_{\rightarrow}^{(l)}\big) \qquad (8a)$$
$$H_{\leftarrow}^{(l+1)} = \sigma\big(\tilde{D}_{\leftarrow}^{-1}\, \hat{A}^{\top}\, H_{\leftarrow}^{(l)}\, W_{\leftarrow}^{(l)}\big) \qquad (8b)$$

where $\hat{A} = A + I_N$, $(\tilde{D}_{\rightarrow})_{ii} = \sum_j \hat{A}_{ij}$, and $(\tilde{D}_{\leftarrow})_{ii} = \sum_j \hat{A}_{ji}$. Eq. (8a) performs the convolution over each vertex's out-going neighbours, while Eq. (8b) performs it over the in-going neighbours. At length, each GCN outputs two low-dimensional representations for every vertex, i.e., $\vec{h}_{\rightarrow}$ and $\vec{h}_{\leftarrow}$. The computation and dataflow through the unfolded structure are depicted in Fig. 3. Then $\vec{h}_{\rightarrow}$ and $\vec{h}_{\leftarrow}$ of each vertex are concatenated to perform the alignment.
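As an illustration, the sketch below shows one plausible realization of the direction-aware layer in Eqs. (8a)/(8b): separate convolutions over out-going neighbours (via $\hat{A}$) and in-going neighbours (via $\hat{A}^{\top}$). The exact normalization and the module names are our assumptions.

```python
import torch
import torch.nn as nn

class DirectedGCNLayer(nn.Module):
    """Direction-aware convolution sketching Eqs. (8a)/(8b) for a dense A."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W_out = nn.Linear(in_dim, out_dim, bias=False)  # W_-> in Eq. (8a)
        self.W_in = nn.Linear(in_dim, out_dim, bias=False)   # W_<- in Eq. (8b)

    def forward(self, A, H_out, H_in):
        A_hat = A + torch.eye(A.size(0))                   # self-connections
        D_out_inv = torch.diag(A_hat.sum(1).reciprocal())  # out-degree normalization
        D_in_inv = torch.diag(A_hat.sum(0).reciprocal())   # in-degree normalization
        H_out_new = torch.relu(self.W_out(D_out_inv @ A_hat @ H_out))  # Eq. (8a)
        H_in_new = torch.relu(self.W_in(D_in_inv @ A_hat.t() @ H_in))  # Eq. (8b)
        return H_out_new, H_in_new

# The final representation of a vertex is torch.cat([h_out, h_in], dim=-1),
# i.e. the concatenation used for alignment.
```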

(a) DBLP
(b) Foursquare-Twitter
(c) Douban-Weibo
Figure 4. Detailed performance comparison on real-world datasets.

3.3. Weight-sharing Between GCNs

An ideal representation learning for the alignment task obtains a low-dimensional subspace in which the two vertices of an anchor pair are close to each other, so that the candidates for a vertex can be retrieved based on a distance between the two vectors. Drawing the subspaces close to each other is usually achieved by forcing the vertices of an anchor pair to share the same representation.

In this paper, we further reinforce the closeness between the subspaces by sharing weights across the two GCNs, i.e., enforcing $W^{x,(l)} = W^{y,(l)}$ for every layer $l$. Additionally, such weight-sharing reduces the number of parameters and simplifies our model, making it more favorable for training.
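In code, weight-sharing simply amounts to instantiating a single layer stack and feeding both networks through it. A usage sketch, reusing the hypothetical GCNLayer from Sec. 3.1.1 (dimensions and variable names are illustrative):

```python
# One shared 2-layer GCN embeds both networks, enforcing W^{x,(l)} = W^{y,(l)}.
gcn1, gcn2 = GCNLayer(100, 100), GCNLayer(100, 100)

def embed(A, H0):
    return gcn2(A, gcn1(A, H0))

H_x = embed(A_x, H0_x)  # network G^x
H_y = embed(A_y, H0_y)  # network G^y: same weights, half the parameters to train
```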

4. Experiments

In this section, we present the experimental evaluations of our proposed models and the competing baselines over three real-world datasets.

4.1. Metrics, Datasets and Comparative Models

4.1.1. Metrics

We evaluate the performance of our proposed models and the competing baselines using the metric Hits@k:

$$\text{Hits@}k = \frac{\#\text{Hits}^{x \rightarrow y}@k + \#\text{Hits}^{y \rightarrow x}@k}{2\,|\mathcal{T}_{\text{test}}|}$$

where $\#\text{Hits}^{x \rightarrow y}@k$ is the number of hits in the test set when the top-k candidates in network $G^y$ are retrieved for each vertex from network $G^x$. In our models, cosine similarity is adopted as the scoring criterion to obtain the top-k candidate list. For the baselines, the candidate lists are obtained following the scoring criteria suggested in their papers. In addition to Hits@k, we also adopt the Mean Reciprocal Rank (MRR) (Radev et al., 2002) to evaluate the models. As with Hits@k, MRR in this paper is an average over bi-directional counts.
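For concreteness, a sketch of the bi-directional evaluation under cosine scoring (our own helper with hypothetical names; both alignment directions contribute to the averages, as defined above):

```python
import numpy as np

def hits_and_mrr(U, V, pairs, k=30):
    """Bi-directional Hits@k and MRR under cosine scoring.

    U, V  : (n, d) and (m, d) test-set embeddings of the two networks
    pairs : list of (i, j) ground-truth anchor indices into U and V
    """
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    S = U @ V.T                                          # cosine similarity matrix
    hits, rr = 0, 0.0
    for i, j in pairs:
        for scores, target in ((S[i], j), (S[:, j], i)): # x->y and y->x
            rank = int((scores > scores[target]).sum()) + 1
            hits += rank <= k
            rr += 1.0 / rank
    n = 2 * len(pairs)
    return hits / n, rr / n
```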

4.1.2. Datasets

We employ three real-world cross-network datasets, whose statistics are tabulated in Table 1. For the DBLP (Tang et al., 2008) dataset, authors are split into two different co-author networks (Data Mining and Machine Learning) by filtering the publication venues of their papers. The ground-truth anchors of this dataset are the authors who published papers in both areas. Note that the co-author relationships in DBLP are non-directional. In contrast, the other two datasets (Zhang and Philip, 2015; Cao and Yu, 2016) are constructed from directed social networks. The ground truth of the anchor users is obtained from the fact that some users provide their unified accounts across social networks.

Dataset Network(#Nodes, #Edges) #Anchors
DBLP Data Mining (11526, 28565) 1295
Machine Learning (12311, 26162)
Fq.-Tw. Foursquare (5313, 76972) 1611
Twitter (5120, 164920)
Db.-Wb. Douban (10103, 527980) 4752
Weibo (9576, 270780)
Table 1. Statistics of the datasets used for evaluation

4.1.3. Comparative Models

Our proposed model DANA, its variants, and the state-of-the-art baseline methods for comparison are listed as follows:

  • MAH (Tan et al., 2014): A hypergraph-based manifold matching approach for network alignment, where the hyperedges model the high-order relations in social networks.

  • ULink (Mu et al., 2016): An approach for multi-platform user identity linkage prediction in which the Latent User Space was proposed and utilized. The constrained concave-convex procedure is also adopted for the model inference.

  • IONE (Liu et al., 2016): The state-of-the-art approach for network alignment which incorporates the learning of the second-order proximity preserving embeddings and the network alignment in a unified framework.

  • PALE-LINE (Man et al., 2016): An embedding-based approach where the embeddings of individual networks are learned using LINE (Tang et al., 2015), and an MLP is used for learning the projection function between the low-dimensional subspaces of the networks.

  • PALE-Deepwalk (Man et al., 2016): A variant of PALE-LINE, in which DeepWalk (Perozzi et al., 2014) is adopted for learning individual network embeddings. The projection function learning is the same as that of PALE-LINE.

  • SNNA (Li et al., 2019): An adversarial approach to network alignment where the low-dimensional subspaces of the networks are obtained using existing network embedding approaches. The generator is then designed to learn a projection function from one subspace to another, and the discriminator estimates the Wasserstein distance between the projected source distribution and the target distribution.

  • DANA: The vanilla version of our proposed framework in this paper.

  • DANA-S: A variant of DANA, where the suffix "-S" indicates the incorporation of weight-sharing into the model.

  • DANA-SD: A variant of DANA, where "D" further indicates the incorporation of the direction-aware structure on top of DANA-S.

  • DNA: A variant of DANA in which the domain-adversarial component (gradient reversal layer and domain classifier) is removed.

Dataset Metric MAH PALE-LINE PALE-DW IONE Ulink SNNA DNA DANA DANA-S DANA-SD
DBLP Hits@1 0.0695 0.0277 0.0772 0.0560 0.0116 0.0096 0.2104 0.2182 0.2201 0.2297
Imp(%) 230.50 729.24 197.54 310.18 1880.17 2292.71 9.17 5.27 4.36
MRR 0.1108 0.0422 0.1710 0.1414 0.0503 0.0312 0.2739 0.2830 0.2838 0.2895
Imp(%) 161.28 586.02 69.30 104.74 475.55 827.88 5.70 2.30 2.01
Fq.-Tw. Hits@1 0.0062 0.0093 0.0464 0.1409 0.0495 0.0372 0.1207 0.1486 0.1548 0.1842
Imp(%) 2870.97 1880.65 296.98 30.73 272.12 395.16 52.61 23.96 18.99
MRR 0.0176 0.0164 0.0928 0.2132 0.1479 0.0550 0.2017 0.2258 0.2391 0.2579
Imp(%) 1365.34 1472.56 177.91 20.97 74.37 368.91 27.86 14.22 7.86
Db.-Wb. Hits@1 0.0032 0.0126 0.0358 0.0794 0.0074 0.0042 0.0847 0.1420 0.1772 0.1930
Imp(%) 5931.25 1431.75 439.11 143.07 2508.11 4495.24 127.86 35.92 8.92
MRR 0.0081 0.0317 0.0822 0.1224 0.0301 0.0300 0.1598 0.2144 0.2228 0.2608
Imp(%) 3119.75 722.71 217.27 113.07 766.45 769.33 63.20 21.64 17.06
Table 2. Hits@1 and MRR comparison on real-world datasets.

In our experiments, for DANA and its variants, we use 2-layer GCNs as the feature extractors and a 2-layer MLP as the domain classifier. The batch size of vertices for domain-adversarial training is set to 512, and the batch size of anchor seeds is set to the size of the training set. The parameters are optimized using the Adam optimizer with a learning rate of 0.001, together with the weighting factor $\lambda$ and the regularization factor $\gamma$. The state-of-the-art approaches, including MAH (Tan et al., 2014), ULink (Mu et al., 2016), IONE (Liu et al., 2016), PALE-LINE, PALE-Deepwalk (Man et al., 2016), and SNNA (Li et al., 2019), are evaluated as the competing baselines. They are trained based on the settings recommended in the published papers or the distributed open-source code, until convergence.

4.2. Experimental Results

4.2.1. Overall Alignment Performance.

Figure 5. Hits@50 vs. Dimension on Foursquare-Twitter.
Figure 6. Hits@50 vs. Training ratio on Foursquare-Twitter.

In this section, we compare the performance of DANA, its variants, and the other baselines on three real-world datasets. We use 80% of the anchors as the training set and the rest as the test set. The dimension of the embedding is uniformly set to 100 for all models. Note that the per-direction dimension is set to 50 in DANA-SD, as its embedding is the concatenation of the two vertex representations $\vec{h}_{\rightarrow}$ and $\vec{h}_{\leftarrow}$. We tabulate Hits@1, MRR, and DANA-SD's improvement over all comparative approaches in Table 2, and present the experimental results for Hits@k in Fig. 4.

From Fig.4 and Table 2, we can observe that:

  1. DANA and its variants significantly outperform most baselines under different @k settings for all datasets, demonstrating the efficacy of the proposed DANA framework. In particular, DANAs improve Hits@1 by 190+%, 30+%, and 140+% over the most competitive baseline on DBLP, Foursquare-Twitter, and Douban-Weibo, respectively. When k becomes larger, DANAs still achieve more than 15% performance improvement. In general, the improvement becomes more significant as k gets smaller.

  2. The unified frameworks, e.g., IONE, achieve much higher accuracy than the two-phase methods, e.g., PALE-LINE and PALE-Deepwalk, because the embedding process (first phase) in a two-phase framework is independent of the objective of the alignment task, which can result in representations unsuitable for the transformation process in the second phase. Besides, two-phase alignment methods are also sensitive to the adopted embedding approach (e.g., DeepWalk performs better than LINE within the PALE framework).

  3. Both ULink and SNNA do not perform well with only structural information, as they rely heavily on the initialization of the embedding. In particular, better performance of ULink and SNNA usually comes with initialization using privileged information, e.g., attributes. In contrast, benefiting from the adopted GCNs, DANA and its variants are robust to initialization.

  4. The matrix factorization-based approach MAH performs worst, because matrix factorization is a linear method that is usually inferior to the non-linear embedding methods used in our framework. Further, MAH is hard to scale up to large problems due to the matrix inversion involved. For the Foursquare-Twitter dataset, MAH requires representations of over 800 dimensions to reach convergence (Liu et al., 2016), which further validates the efficiency of the embedding-based approaches.

Compared with DANA and its variants, DNA (DANA without the adversarial learning module) achieves lower accuracy, demonstrating the effectiveness of domain-adversarial learning w.r.t. the network alignment task. Benefiting from the introduced weight-sharing structure, DANA-S performs better than the vanilla DANA. DANA-SD outperforms all the baselines, which validates the importance of incorporating the direction-aware structure. Note that DANA-SD also achieves a performance enhancement on the undirected DBLP network; we believe this is due to its larger parameter set (the adoption of separate out-going and in-going weights). The superiority of DANA-SD becomes more obvious on larger directed networks, i.e., the Douban-Weibo dataset. We also investigated the importance of directional edges to the entire network by analyzing network structures. It turns out that the numbers of connected components and of strongly connected components in Foursquare-Twitter differ significantly, compared with the Douban-Weibo dataset, indicating that direction information plays a rather important role in the Foursquare-Twitter dataset. Thus, the Foursquare-Twitter dataset may favor the LINE-based model IONE, which joins three sets of vectors from different views for directed network alignment (Liu et al., 2016). In comparison, DANA-SD employs only two sets of vectors to capture the directions, yet still improves Hits@k by 10%+ over IONE.

Fig. 5 and Fig. 6 show the outperformance of DANA-SD on the Foursquare-Twitter dataset under different dimension settings as well as different training-to-test ratios. Fig. 6 also indicates that, even in a weakly-supervised setting, our proposed models still achieve robust and obvious outperformance.

To sum up, we have DANA-SD > DANA-S > DANA > DNA in terms of alignment accuracy, which is consistent with our motivation in this paper.

Regarding efficiency, DANA and its variants take a few minutes (within 500 epochs) to reach convergence, much faster than the other baselines. That is because (1) GCNs are efficient feature extractors, and (2) the gradient reversal layer enables synchronous learning of Eq. (7).

(a) DBLP
(b) Foursquare-Twitter
Figure 7. Sensitivity analysis of the number of GCN layers $L$

4.2.2. Parameter Sensitivity Analysis

To analyze the effects of the hyperparameters in DANAs, namely the number of layers $L$ in the GCNs and the weighting factor $\lambda$, we conduct experiments with different $L$-layer GCNs and different values of $\lambda$.

In Fig. 7, we vary the number of layers (from 1 to 7) in the GCNs while fixing all other parameters. We observe that DANAs achieve the best performance with 2-layer GCNs; beyond that, the deeper the GCNs, the worse the performance. This observation is consistent with the general acknowledgement that two layers are usually the best setting for conventional GCNs (Li et al., 2018): the graph convolution of the GCN model can be viewed as a special form of Laplacian smoothing over the features of a vertex and its nearby neighbors, but this operation results in over-smoothing when many convolutional layers are involved, making the output features of vertices less distinguishable and the alignment performance inferior.

(a) Foursquare-Twitter
(b) Douban-Weibo
Figure 8. Sensitivity analysis of the weighting factor $\lambda$

Fig. 8 presents the effect of the weighting factor $\lambda$ as its value is varied while all other parameters are fixed. The alignment performance on both the Foursquare-Twitter and Douban-Weibo datasets shows an obvious increasing trend as $\lambda$ increases, demonstrating that the domain-adversarial learning module in DANAs plays a positive role in the alignment task.

Dataset Metric(%) GCN GCN-D Improve
Foursquare mAP 10.947 12.267 12.06%
R@3 8.928 10.287 15.22%
R@5 13.956 15.862 13.66%
R@10 20.367 23.400 14.89%
Twitter mAP 8.651 9.079 4.95%
R@3 5.175 5.769 11.48%
R@5 8.223 9.314 13.27%
R@10 13.556 14.979 10.50%
Table 3. Link Prediction Performance on Foursquare-Twitter

4.2.3. Probabilistic Design Effect

To verify the effectiveness of our unconventional design of the objective function for the alignment task, we compare MAP-based models and MSE-based models on the three datasets. MAP denotes Maximum Posterior Probability, with the objective function designed as Eq. (3) in this paper. MSE denotes minimizing the mean squared error, which is adopted in most existing distance-based approaches. In our experiments, the objective function of the MSE-based alignment models is given as:

$$\min_{\Theta^x, \Theta^y}\; \sum_{(u,v) \in \mathcal{T}} \Big[\, \|\vec{u} - \vec{v}\|_2^2 \;-\; \sum_{v^- \in \mathcal{N}^y} \|\vec{u} - \vec{v}^-\|_2^2 \;-\; \sum_{u^- \in \mathcal{N}^x} \|\vec{u}^- - \vec{v}\|_2^2 \,\Big] \qquad (9)$$

where $u^-$ and $v^-$ are negative samples. For each anchor pair, we randomly sample negative samples from network $G^x$ and network $G^y$, respectively. We further adapt DNA and the distance-based model SNNA by swapping their objective functions with Eq. (9) and Eq. (3), respectively, obtaining four models for comparison: (MAP-based) DNA, MSE-based DNA, (MSE-based) SNNA, and MAP-based SNNA.
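For concreteness, a sketch of one plausible per-anchor form of the MSE objective in Eq. (9); the exact weighting of the negative terms is our assumption:

```python
import torch

def mse_alignment_loss(u, v, u_negs, v_negs):
    """Distance-based loss in the spirit of Eq. (9): pull anchor pairs
    together, push randomly sampled negatives apart.

    u, v           : (d,) embeddings of an anchor pair
    u_negs, v_negs : (m, d) negative samples drawn from G^x and G^y
    """
    pos = ((u - v) ** 2).sum()
    neg = ((u_negs - v) ** 2).sum(dim=1).mean() + ((v_negs - u) ** 2).sum(dim=1).mean()
    return pos - neg
```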

Fig. 9(a) and Fig. 9(b) show the performance of MAP-based DNA and MSE-based DNA on the three datasets. We see that DNA loses 4.77-9.94% MRR accuracy for the alignment when its objective function is replaced by Eq. (9). Fig. 9(c) and Fig. 9(d) show a similar observation: MAP-based SNNA performs better than MSE-based SNNA on all three datasets, which illustrates the strength of our MAP-based design in viewing the alignment as a bi-directional matching problem. Note that the alignment performance of MAP-based SNNA is still much lower than that of our proposed DANAs. One reason is that the features SNNA learns from the network embedding may include domain-dependent signals, which cannot be eliminated in its adversarial procedure of learning the projection function between two networks. Thus, SNNA cannot avoid the domain representation bias, which yields inferior alignment performance.

(a) Hits@1 of DNA
(b) MRR of DNA
(c) Hits@1 of SNNA
(d) MRR of SNNA
Figure 9. Objective analysis of alignment task.

4.2.4. Directed Convolution Effect

Recall that we propose to modify the graph convolutional network to adapt our alignment model to directed networks (see Sec. 3.2). To verify the effect of the directed convolution structure, we compare GCN and GCN-D ("-D" indicates the incorporation of the direction-aware convolution structure) on a link prediction task within a single network, where the objective function is formulated to preserve the structural proximity (Tang et al., 2015):

$$\mathcal{L}_{lp} = -\sum_{(i,j) \in E} \Big[\, \log \sigma(\vec{u}_j^{\top}\vec{u}_i) \;+\; \sum_{n=1}^{K} \mathbb{E}_{v_n \sim P_n(v)} \log \sigma(-\vec{u}_{v_n}^{\top}\vec{u}_i) \,\Big]$$

where $v_n$ denotes the endpoint of a negative edge randomly drawn from the noise distribution $P_n(v)$, and $K$ is the number of negative edges drawn for each observed edge $(i, j)$.
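A sketch of this negative-sampling objective follows; for simplicity it draws negatives uniformly rather than from the degree-based noise distribution:

```python
import torch
import torch.nn.functional as F

def link_prediction_loss(emb, edges, num_neg=5):
    """Structural-proximity loss following the objective above.

    emb   : (n, d) vertex embeddings
    edges : (e, 2) long tensor of observed directed edges (i, j)
    """
    src, dst = edges[:, 0], edges[:, 1]
    pos = F.logsigmoid((emb[src] * emb[dst]).sum(-1)).mean()
    neg_idx = torch.randint(0, emb.size(0), (edges.size(0), num_neg))
    neg = F.logsigmoid(-(emb[src].unsqueeze(1) * emb[neg_idx]).sum(-1)).mean()
    return -(pos + neg)
```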

We use 90% of the edges of each network for training. Table 3 reports the test performance of link prediction on the Foursquare and Twitter networks with respect to Mean Average Precision (mAP) and Recall@k (R@k) (Wikipedia contributors, 2019). As expected, GCN-D significantly improves over the conventional GCN on all metrics. This implies that intentionally capturing edge directions in GCNs is beneficial to the representation learning of directed networks, and in turn to the alignment of directed networks.

4.3. Case Study: Domain-invariant Embedding

(a) DNA-S
(b) DANA-S
Figure 10. Hidden-neuron visualization on the toy twin networks

To better illustrate the characteristics of our proposed model, we introduce a case study in Fig. 10 to visualize the behavior of domain-adversarial training. A pair of twin networks ($G^x$ and $G^y$) is constructed as follows: (1) we adopt the well-known Zachary's Karate network (Zachary, 1977) as $G^x$, where the 2D embeddings (coordinates) of the vertices (shown as circles) are obtained via large graph layout following (Adai et al., 2004); (2) the nodes in $G^y$ (shown as triangles) are generated as the mirror image of each node in $G^x$ along the y-axis; (3) the edges of $G^y$ are generated exactly as those of $G^x$; (4) each node in $G^x$ along with its corresponding node in $G^y$ is considered an anchor in the twin networks.

Taking 50% of the anchors as the training set and initializing $H^{(0)}$ of each network with the vertex coordinates, we run DANA-S and DNA-S on the alignment task, with the network embedding module instantiated as 1-layer GCNs. Points correctly classified by the domain classifier and misclassified points are marked separately in the figure. Note that DANA-S, integrated with domain-adversarial learning, pursues domain-invariant features, which are necessarily poor input for the domain classifier (see Fig. 10(b), where all nodes are classified into a single domain). In contrast, the features learned by DNA-S remain domain-dependent, leading to inferior performance on the alignment task.

We visualize the weights $W$ of the hidden neurons of the 1-layer GCNs in Fig. 10, following (Ganin et al., 2016). The neuron visualization consists of ten lines, with each line corresponding to one neuron of the hidden layer. We can observe that:

  1. Most neurons of DNA-S gather around and parallel to the y-axis, tending to capture the discriminative feature for domain classification, since the twin networks are y-axis symmetric.

  2. DANA-S gives a richer representation: the ten lines of the neuron visualization are widely dispersed.

  3. The dominant pattern in the neuron visualization of DNA-S, i.e., the lines parallel to the y-axis, vanishes in that of DANA-S, bringing better performance on the alignment task.

5. Conclusion

With the conjecture that domain-dependent features hinder network alignment performance, we propose a representation learning-based domain-adversarial framework (DANA) that performs network alignment by obtaining domain-invariant representations, and develop its adaptations for specific settings, i.e., directed social network alignment. Comprehensive empirical studies on three popular real-world datasets show that DANA significantly improves performance on social network alignment tasks in comparison with existing solutions. Unlike most existing approaches, which formulate the alignment task as a mapping problem between networks, our paper triggers a discussion on the importance of feature extraction oriented toward alignment tasks. The proposed network alignment framework also opens a new door to other tasks, e.g., cross-lingual knowledge graph alignment.

References

  • A. T. Adai, S. V. Date, S. Wieland, and E. M. Marcotte (2004) LGL: creating a map of protein function with an algorithm for visualizing very large biological networks. Journal of molecular biology 340 (1), pp. 179–190. Cited by: §4.3.
  • X. Cao and Y. Yu (2016) Asnets: a benchmark dataset of aligned social networks for cross-platform user modeling. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1881–1884. Cited by: §4.1.2.
  • Q. Dai, Q. Li, J. Tang, and D. Wang (2018) Adversarial network embedding. In Thirty-Second AAAI Conference on Artificial Intelligence. Cited by: §2.2.
  • M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pp. 3844–3852. Cited by: §1.
  • Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky (2016) Domain-adversarial training of neural networks. The Journal of Machine Learning Research 17 (1), pp. 2096–2030. Cited by: §1, §2.2, §3.1.2, §4.3.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §2.2.
  • S. Jean, K. Cho, R. Memisevic, and Y. Bengio (2014) On using very large target vocabulary for neural machine translation. arXiv preprint arXiv:1412.2007. Cited by: §3.1.1.
  • T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §1, §3.1.1.
  • C. Li, Y. Wang, S. Wang, Y. Liu, P. Yu, Z. Li, and Y. Liang (2019) Adversarial learning for weakly-supervised social network alignment. In Thirty-Third AAAI Conference on Artificial Intelligence, Cited by: §1, §1, §2.2, 6th item, §4.1.3.
  • Q. Li, Z. Han, and X. Wu (2018) Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pp. 3538–3545. External Links: Link Cited by: §4.2.2.
  • L. Liu, W. K. Cheung, X. Li, and L. Liao (2016) Aligning users across social networks using network embedding. In International Joint Conference on Artificial Intelligence, pp. 1774–1780. Cited by: §1, §1, §2.1, 3rd item, item 4, §4.1.3, §4.2.1.
  • T. Man, H. Shen, S. Liu, X. Jin, and X. Cheng (2016) Predict anchor links across social networks via an embedding approach. In International Joint Conference on Artificial Intelligence, Vol. 16, pp. 1823–1829. Cited by: §2.1, 4th item, 5th item, §4.1.3.
  • X. Mu, F. Zhu, E. Lim, J. Xiao, J. Wang, and Z. Zhou (2016) User identity linkage by latent user space modelling. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1775–1784. Cited by: §2.1, 2nd item, §4.1.3.
  • S. J. Pan, X. Ni, J. Sun, Q. Yang, and Z. Chen (2010) Cross-domain sentiment classification via spectral feature alignment. In The 19th International World Wide Web Conference, pp. 751–760. Cited by: §1.
  • B. Perozzi, R. Al-Rfou, and S. Skiena (2014) DeepWalk: online learning of social representations. In Acm Sigkdd International Conference on Knowledge Discovery & Data Mining, Cited by: §2.1, 5th item.
  • D. R. Radev, H. Qi, H. Wu, and W. Fan (2002) Evaluating web-based question answering systems. In Proceedings of the Third International Conference on Language Resources and Evaluation, LREC 2002, May 29-31, 2002, Las Palmas, Canary Islands, Spain. Cited by: §4.1.1.
  • M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling (2018) Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pp. 593–607. Cited by: §3.2.
  • S. Tan, Z. Guan, D. Cai, X. Qin, J. Bu, and C. Chen (2014) Mapping users across networks by manifold alignment on hypergraph. In Twenty-Eighth AAAI Conference on Artificial Intelligence, Vol. 14, pp. 159–165. Cited by: §2.1, 1st item, §4.1.3.
  • J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei (2015) Line: large-scale information network embedding. In Proceedings of the 24th international conference on world wide web, pp. 1067–1077. Cited by: §1, §2.1, 4th item, §4.2.4.
  • J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su (2008) Arnetminer: extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 990–998. Cited by: §4.1.2.
  • H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xie, and M. Guo (2018) Graphgan: graph representation learning with generative adversarial nets. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §2.2.
  • J. Wang, L. Yu, W. Zhang, Y. Gong, Y. Xu, B. Wang, P. Zhang, and D. Zhang (2017) Irgan: a minimax game for unifying generative and discriminative information retrieval models. In Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval, pp. 515–524. Cited by: §2.2.
  • K. Weiss, T. M. Khoshgoftaar, and D. Wang (2016) A survey of transfer learning. Journal of Big Data 3 (1), pp. 9. Cited by: §1.
  • Wikipedia contributors (2019) Evaluation measures (information retrieval) — Wikipedia, the free encyclopedia. Note: https://en.wikipedia.org/w/index.php?title=Evaluation_measures_(information_retrieval)&oldid=889157178[Online; accessed 21-May-2019] Cited by: §4.2.4.
  • Q. Xie, Z. Dai, Y. Du, E. Hovy, and G. Neubig (2017) Controllable invariance through adversarial feature learning. In Advances in Neural Information Processing Systems, pp. 585–596. Cited by: §1, §2.2.
  • W. W. Zachary (1977) An information flow model for conflict and fission in small groups. Journal of anthropological research 33 (4), pp. 452–473. Cited by: §4.3.
  • J. Zhang and S. Y. Philip (2015) Integrated anchor and social link predictions across social networks. In International Joint Conference on Artificial Intelligence, pp. 2125–2132. Cited by: §4.1.2.
  • Y. Zhang, Z. Gan, K. Fan, Z. Chen, R. Henao, D. Shen, and L. Carin (2017) Adversarial feature matching for text generation. arXiv preprint arXiv:1706.03850. Cited by: §2.2.