1. Introduction
In real-world applications, objects can be associated with different types of relations. Such objects and their relationships can be naturally represented by multi-view networks, which are also known as multiplex networks or multi-view graphs (Kumar and Daumé, 2011; Kumar et al., 2011; Liu et al., 2013; Zhou and Burges, 2007; Sindhwani and Niyogi, 2005; Zhang et al., 2008; Hu et al., 2005; Pei et al., 2005; Zeng et al., 2006; Frank and Nowicki, 1993; Pattison and Wasserman, 1999). As shown in Figure 1(a), a multi-view network consists of multiple network views, where each view corresponds to a type of edge, and all views share the same set of nodes. In ecology, a multi-view network can be used to represent the relations among species: each node stands for a species, and six views represent predation, competition, symbiosis, parasitism, protocooperation, and commensalism, respectively. On social networking services, a four-view network over users can describe widely seen social relationships and interactions, including friendship, following, message exchange, and post viewing. Given the wide availability of multi-view networks, one is often interested in extracting knowledge or business value from such data. A useful first step toward this goal is to transform multi-view networks into representations that are more machine-actionable.
Network embedding has emerged as a scalable representation learning method that generates distributed node representations for networked data (Grover and Leskovec, 2016; Perozzi et al., 2014; Tang et al., 2015; Wang et al., 2016). Specifically, network embedding projects networks into embedding spaces, where each node is represented by an embedding vector. With the semantic information of each node encoded, these vectors can be directly used as node features in various downstream applications (Grover and Leskovec, 2016; Perozzi et al., 2014; Tang et al., 2015). Motivated by the success of network embedding in representing homogeneous networks (Grover and Leskovec, 2016; Perozzi et al., 2014; Tang et al., 2015; Wang et al., 2016; Perozzi et al., 2017; Ou et al., 2016), where nodes and edges are untyped, we believe it is important to study the problem of embedding multi-view networks. In designing embedding algorithms for multi-view networks, the major challenge lies in how to make use of the type information on edges from different views. We are therefore interested in investigating the following two questions:

With the availability of multiple edge types, what characteristics are specific and important to multi-view network embedding?

Can we achieve better embedding quality by modeling these characteristics jointly?
To answer the first question, we identify two characteristics, preservation and collaboration, from our practice of embedding real-world multi-view networks. We describe them as follows. Collaboration – In some datasets, edges between the same pair of nodes may be observed in multiple views due to shared latent reasons. For instance, in a social network, if we observe an edge between a pair of users in either the message exchange view or the post viewing view, these two users are likely happy to be associated with each other. In such scenarios, the views may complement each other, and embedding them jointly may yield better results than embedding them independently. We refer to this synergistic effect of jointly embedding multiple views as collaboration. The prospect of enjoying this synergistic effect is also the main intuition behind most existing multi-view network algorithms (Kumar and Daumé, 2011; Kumar et al., 2011; Liu et al., 2013; Zhou and Burges, 2007; Sindhwani and Niyogi, 2005; Zhang et al., 2008; Hu et al., 2005; Pei et al., 2005; Zeng et al., 2006). Preservation – On the other hand, different network views may have different semantic meanings; it is also possible that a portion of nodes have completely disagreeing edges in different views, since edges in different views are formed for distinct latent reasons. For example, professional relationships may not always align well with friendship. If we embed the profession view and the friendship view in Figure 1(b) into the same embedding space, the embedding of Gary will be close to both Tilde and Elton. As a result, the embedding of Tilde will not be too distant from Elton either, due to transitivity. This is not a desirable outcome, because Tilde and Elton are not closely related in terms of either profession or friendship in the original multi-view network. In other words, embedding in this way fails to preserve the unique information carried by each network view. We refer to the need for preserving such unique information as preservation. A detailed discussion of the presence and importance of preservation and collaboration is presented in Section 4.
Furthermore, preservation and collaboration may coexist in the same multi-view network. Two scenarios can lead to this situation: (i) one pair of views is generated from very similar latent reasons, while another pair carries completely different semantic meanings; and, more subtly, (ii) for the same pair of views, one portion of nodes have consistent edges across views, while another portion have totally disagreeing edges. One example of the latter scenario is that professional relationships do not align well with friendship in some cultures, whereas co-workers often become friends in certain other cultures (Alston, 1989). We are therefore also interested in exploring the feasibility of achieving better embedding quality by modeling preservation and collaboration simultaneously, and we address this problem in Section 5 and beyond.
We summarize our contributions as follows. (i) We propose to study the characteristics that are specific and important to multi-view network embedding, and identify preservation and collaboration as two such characteristics from the practice of embedding real-world multi-view networks. (ii) We explore the feasibility of attaining better embedding by simultaneously modeling preservation and collaboration, and propose two multi-view network embedding methods – mvn2vec-con and mvn2vec-reg. (iii) We conduct experiments with various downstream applications on a series of synthetic datasets and three real-world multi-view networks, including an internal dataset sampled from the Snapchat social network. These experiments corroborate the presence and importance of preservation and collaboration, and demonstrate the effectiveness of the proposed methods.
2. Related Work
Network embedding has recently emerged as an efficient and effective approach for learning distributed node representations. Instead of leveraging spectral properties of networks, as is common in traditional unsupervised feature learning approaches (Belkin and Niyogi, 2001; Roweis and Saul, 2000; Tenenbaum et al., 2000; Yan et al., 2007), most network embedding methods are designed atop local properties of networks involving links and proximity among nodes (Grover and Leskovec, 2016; Perozzi et al., 2014; Tang et al., 2015; Wang et al., 2016; Perozzi et al., 2017; Ou et al., 2016). This focus on local properties has been shown to be more scalable. The designs of many recent network embedding algorithms trace back to the skip-gram model (Mikolov et al., 2013), which learns distributed representations for words in natural language processing under the assumption that words with similar contexts should have similar embeddings. To fit the skip-gram model, various strategies have been proposed for defining the context of a node in the network scenario (Grover and Leskovec, 2016; Perozzi et al., 2014; Tang et al., 2015; Perozzi et al., 2017). Beyond the skip-gram model, embedding methods that preserve certain other network properties can also be found in the literature (Wang et al., 2016; Ou et al., 2016).

Meanwhile, multi-view networks have been extensively studied as a special type of network, motivated by their ubiquitous presence in real-world applications. However, most existing methods for multi-view networks aim to boost performance in traditional tasks, such as clustering (Kumar and Daumé, 2011; Kumar et al., 2011; Liu et al., 2013; Zhou and Burges, 2007), classification (Sindhwani and Niyogi, 2005; Zhang et al., 2008), and dense subgraph mining (Hu et al., 2005; Pei et al., 2005; Zeng et al., 2006). These methods improve the performance of specific applications but do not directly study distributed representation learning for multi-view networks. Another line of research on multi-view networks focuses on analyzing interrelations among different views, e.g., revealing such interrelations via correlations between link existence and network statistics (Frank and Nowicki, 1993; Pattison and Wasserman, 1999). These works do not directly address how such interrelations impact embedding learning for multi-view networks.
Qu et al. (2017) recently proposed embedding multi-view networks for a given task by linearly combining the embeddings learned from different network views. This work studies a problem different from ours, since their framework requires supervision, while we focus on the unsupervised scenario. Also, their work attends to weighing different views according to their informativeness for a specific task, while we aim at identifying and leveraging the principles for extending a network embedding method from the homogeneous scenario to the multi-view scenario. Moreover, their method does not model preservation, one of the characteristics we deem important for multi-view network embedding, because their final embedding, derived via linear combination, is a trade-off among the representations of all views. Another group of related studies focuses on jointly modeling multiple network views using latent space models (Gollini and Murphy, 2014; Greene and Cunningham, 2013; Salter-Townshend and McCormick, 2013). These works again do not model preservation.
3. Preliminaries
Definition 3.1 (Multi-View Network).
A multi-view network G = (U, V, E) consists of a set of nodes U and a set of views V, where E^(v) consists of all edges in view v ∈ V and E is the union of all E^(v). If a multi-view network is weighted, then there also exists a weight mapping w such that w(e) is the weight of the edge e = (u1, u2, v), which joins nodes u1 and u2 in view v.
Additionally, when context is clear, we use the network view v of a multi-view network to denote the untyped network G^(v) = (U, E^(v)).
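For concreteness, a weighted multi-view network can be held in a minimal data structure like the following (an illustrative sketch; the class and method names are ours, not from any particular library):

```python
class MultiViewNetwork:
    """A set of nodes shared across views; each view holds its own weighted edges."""

    def __init__(self, views):
        self.nodes = set()
        # edges[view][(u, v)] = weight; one edge dict per view
        self.edges = {view: {} for view in views}

    def add_edge(self, u, v, view, weight=1.0):
        self.nodes.update((u, v))
        self.edges[view][(u, v)] = weight

    def view_as_untyped(self, view):
        """Return the untyped network (node set, edge set) of a single view."""
        return self.nodes, set(self.edges[view])


g = MultiViewNetwork(views=["friendship", "profession"])
g.add_edge("Gary", "Tilde", "profession")
g.add_edge("Gary", "Elton", "friendship", weight=2.0)
```

Note that all views read from the same node set, which mirrors the definition above: views differ only in their edge sets.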
Definition 3.2 (Network Embedding).
Network embedding aims at learning a (center) embedding f_u ∈ R^D for each node u in a network, where D is the dimension of the embedding space.
Besides the center embedding f_u, a family of popular algorithms (Mikolov et al., 2013; Tang et al., 2015) also deploys a context embedding f̃_u for each node u. Moreover, when the learned embedding is used as the feature vector for downstream applications, we take the center embedding of each node as its features, following the common practice of algorithms involving context embeddings.
Symbol      Definition
V           The set of all network views
U           The set of all nodes
E^(v)       The set of all edges in view v
P^(v)       The list of random walk pairs from view v
f_u         The final embedding of node u
f_u^(v)     The center embedding of node u w.r.t. view v
f̃_u^(v)     The context embedding of node u w.r.t. view v
θ           The hyperparameter on parameter sharing in mvn2vec-con
γ           The hyperparameter on regularization in mvn2vec-reg
D           The dimension of the embedding space
4. Preservation and Collaboration in Multi-View Network Embedding
In this section, we elaborate on the intuition and presence of preservation and collaboration – the two characteristics introduced in Section 1 that we deem important for multi-view network embedding. In particular, we first describe and investigate the motivating phenomena observed in the practice of embedding real-world multi-view networks. We then discuss how these phenomena can be explained by the two proposed characteristics.
Two straightforward approaches for embedding multi-view networks. Most existing network embedding methods (Grover and Leskovec, 2016; Perozzi et al., 2014; Tang et al., 2015; Wang et al., 2016; Perozzi et al., 2017; Ou et al., 2016) are designed for homogeneous networks, where nodes and edges are untyped, while we are interested in embedding multi-view networks. To extend an untyped network embedding algorithm to multi-view networks, two straightforward yet practical approaches exist. We refer to them as the independent model and the one-space model.
Using any untyped network embedding method, we denote by f_u^(v) the (center) embedding of node u obtained by embedding only the view v of the multi-view network, where D_v is the dimension of the embedding space for network view v. With this notation, the independent model and the one-space model are given as follows.

Independent. Embed each view independently, and then concatenate the view-specific embeddings to derive the final embedding f_u. That is,

    f_u = ⊕_{v ∈ V} f_u^(v),    (1)

where f_u^(v) ∈ R^{D_v}, and ⊕ represents concatenation. In other words, the embedding of each node in the independent model resides in the direct sum of multiple embedding spaces. This approach preserves the information embodied in each view, but does not allow collaboration across different views in the embedding learning process.

One-space. Let the embeddings for different views share parameters when learning the final embedding f_u. That is,

    f_u = f_u^(v) for all v ∈ V,    (2)

where f_u^(v) ∈ R^D for all v ∈ V. In other words, each dimension of the final embedding space correlates with all views of the concerned multi-view network. This approach enables different views to collaborate in learning a unified embedding, but does not preserve information specifically carried by each view. This property of the one-space model is corroborated by the experiments presented in Section 6.5.
In either of the above two approaches, the same treatment applied to the center embeddings is applied to the context embeddings when applicable. It is also worth noting that the embedding learned by the one-space model generally cannot be obtained by linearly combining the view-specific embeddings of the independent model, because most network embedding models are non-linear.
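The contrast between the two models can be illustrated with a small sketch (placeholder vectors in our own code; a real implementation would learn these vectors from the network rather than sample them randomly):

```python
import numpy as np

# Suppose an untyped embedding method has produced, for each view,
# a view-specific embedding per node (random placeholders here).
rng = np.random.default_rng(0)
nodes = ["u1", "u2"]
views = ["reply", "mention"]
per_view = {v: {n: rng.normal(size=4) for n in nodes} for v in views}

# independent: the final embedding lives in the direct sum of the
# per-view spaces, i.e., the concatenation of view-specific vectors.
independent = {n: np.concatenate([per_view[v][n] for v in views]) for n in nodes}

# one-space: all views share parameters, so there is a single
# D-dimensional vector per node (allocated here; in practice it is
# learned jointly from all views at once).
one_space = {n: rng.normal(size=4) for n in nodes}
```

The shapes make the structural difference explicit: with two 4-dimensional views, an independent embedding is 8-dimensional, while a one-space embedding stays 4-dimensional regardless of the number of views.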
Dataset   Metric    independent   one-space
YouTube   ROC-AUC   0.931         0.914
          AUPRC     0.745         0.702
Twitter   ROC-AUC   0.724         0.737
          AUPRC     0.447         0.466
Embedding real-world multi-view networks with the straightforward approaches. In this paper, independent and one-space are implemented on top of a random walk plus skip-gram approach, as widely seen in the literature (Grover and Leskovec, 2016; Perozzi et al., 2014, 2017). The experiment setup and results are introduced concisely at this point; detailed descriptions of the algorithm, the datasets, and more comprehensive experiment results are deferred to Sections 5 and 6. Two networks, YouTube and Twitter, are used in these exploratory experiments, with users being the nodes of each network. YouTube has three views representing common videos (cmnvid), common subscribers (cmnsub), and common friends (cmnfnd) shared by each pair of users, while Twitter has two views corresponding to replying (reply) and mentioning (mention) among users. The downstream evaluation task is to infer whether two users are friends, and the results are presented in Table 2.
It can be seen that the independent model consistently outperformed the one-space model on YouTube, while the one-space model outperformed the independent model on Twitter. These exploratory experiments make it clear that neither of the two straightforward approaches is categorically superior to the other. Furthermore, we interpret the varied performance of the two approaches by the varied extent of the need for modeling preservation and for modeling collaboration when embedding different networks. Specifically, recall that the independent model captures only preservation, while one-space captures only collaboration. We therefore speculate that if a dataset calls for more preservation than collaboration, the independent model will outperform the one-space model, and vice versa.
In order to corroborate our interpretation of the results, we further examine the involved datasets and look into the agreement between the information carried by different network views. We achieve this with a Jaccard coefficient–based measurement, where the Jaccard coefficient is a similarity measure with range [0, 1], defined as J(A, B) = |A ∩ B| / |A ∪ B| for sets A and B. Given a pair of views in a multi-view network, a node can be connected to a different set of neighbors in each of the two network views, and the Jaccard coefficient between these two neighbor sets can then be calculated. In Figure 2, we apply this measurement to the YouTube dataset and the Twitter dataset, respectively, and illustrate the proportion of nodes with a Jaccard coefficient greater than a given threshold for each pair of views.
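The measurement can be computed per node as follows (an illustrative sketch; the variable names and the toy threshold are ours):

```python
def jaccard(a, b):
    """Jaccard coefficient |a ∩ b| / |a ∪ b| between two sets (0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cross_view_agreement(neighbors_v1, neighbors_v2, threshold):
    """Fraction of nodes whose neighbor sets in two views have Jaccard > threshold."""
    nodes = set(neighbors_v1) | set(neighbors_v2)
    hits = sum(
        jaccard(neighbors_v1.get(n, set()), neighbors_v2.get(n, set())) > threshold
        for n in nodes
    )
    return hits / len(nodes)


# Node -> neighbor-set maps for two views of a toy network.
v1 = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
v2 = {"a": {"b", "c"}, "b": {"c"}, "c": {"b"}}
agreement = cross_view_agreement(v1, v2, threshold=0.5)
```

In this toy example only node "a" keeps the same neighbors in both views, so the agreement is 1/3; a high agreement across many node pairs is what would suggest the views can collaborate.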
As presented in Figure 2, little agreement exists between each pair of views in YouTube. It is therefore not surprising that collaboration among different views is not as needed as preservation in the embedding learning process. On the other hand, a substantial portion of nodes have Jaccard coefficients above the threshold across different views in the Twitter dataset, so it is also not surprising that modeling collaboration brings more benefit than modeling preservation in this case.
5. The mvn2vec Models
In the previous section, preservation and collaboration were identified as important characteristics for multi-view network embedding. In the extreme cases, where only preservation is needed – each view carries a distinct semantic meaning – or only collaboration is needed – all views carry the same semantic meaning – it is advisable to choose between independent and one-space to embed a multi-view network. However, it is of interest to study the likely scenario where preservation and collaboration coexist in a given multi-view network. We are therefore motivated to explore the feasibility of achieving better embedding by simultaneously modeling both characteristics. To this end, we propose and experiment with two approaches that capture both characteristics without overcomplicating the model or requiring additional supervision. These two approaches are named mvn2vec-con and mvn2vec-reg, where mvn2vec is short for multi-view network to vector, and con and reg stand for constrained and regularized, respectively.
As with the notation convention in Section 4, we denote by f_u^(v) and f̃_u^(v) the center and context embeddings, respectively, of node u for view v. Further, given the network view v ∈ V, i.e., the untyped network G^(v) = (U, E^(v)), we use an intra-view loss function to measure how well the current embeddings represent the original network view:

    l_v := l(E^(v); {f_u^(v)}_{u ∈ U}, {f̃_u^(v)}_{u ∈ U}).    (3)

We defer the detailed definition of this loss function (Eq. (3)) to a later point of this section. Moreover, we let D_v = D / |V| for all v ∈ V for convenience of model design. To further incorporate multiple views with the intention of modeling both preservation and collaboration, we propose the following two approaches.
mvn2vec-con. The mvn2vec-con model enforces no further design on the center embeddings, in the hope of preserving the semantics of each individual view. To reflect collaboration, mvn2vec-con instead constrains the context embeddings so that parameters are shared across different views:

    f̂_u^(v) = (1 − θ) · f̃_u^(v) + (θ / |V|) · Σ_{v' ∈ V} f̃_u^(v'),    (4)

where θ ∈ [0, 1] is a hyperparameter controlling the extent to which model parameters are shared. The greater the value of θ, the more the model enforces parameter sharing, thereby encouraging more collaboration across different views. This design allows different views to collaborate by passing information via the shared parameters during embedding learning. That is, the mvn2vec-con model solves the following optimization problem:

    min Σ_{v ∈ V} l(E^(v); {f_u^(v)}_{u ∈ U}, {f̂_u^(v)}_{u ∈ U}),    (5)

where f̂_u^(v) is defined in Eq. (4). After model learning, the final embedding for node u is given by f_u = ⊕_{v ∈ V} f_u^(v). We note that in the extreme case where θ is set to 0, the model is identical to the independent model discussed in Section 4.
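Such parameter sharing across views can be illustrated with a small sketch (our own illustrative code, not the paper's implementation; here the shared context embedding is formed as a convex combination of each view's context vector and the cross-view average, with theta controlling the mix):

```python
import numpy as np

def constrained_context(context, theta):
    """Mix each view's context embedding with the cross-view average.

    context: dict view -> np.ndarray, the view-specific context embeddings
    of a single node. theta=0 keeps the views fully independent; theta=1
    forces full sharing (every view collapses to the average).
    """
    avg = np.mean(list(context.values()), axis=0)
    return {v: (1 - theta) * emb + theta * avg for v, emb in context.items()}


ctx = {"v1": np.array([1.0, 0.0]), "v2": np.array([0.0, 1.0])}
shared = constrained_context(ctx, theta=1.0)     # both views collapse to the average
untouched = constrained_context(ctx, theta=0.0)  # behaves like the independent model
```

With theta=1 both views end up with the same context vector, so gradient updates from one view flow to all views; with theta=0 no information crosses views at all.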
mvn2vec-reg. Instead of placing hard constraints on how parameters are shared across views, the mvn2vec-reg model regularizes the embeddings across different views and solves the following optimization problem:

    min Σ_{v ∈ V} l_v + γ Σ_{u ∈ U} Σ_{v ∈ V} ( ‖f_u^(v) − f̄_u‖₂² + ‖f̃_u^(v) − f̄̃_u‖₂² ),    (6)

where ‖·‖₂ is the l2 norm, f̄_u = (1/|V|) Σ_{v' ∈ V} f_u^(v'), f̄̃_u = (1/|V|) Σ_{v' ∈ V} f̃_u^(v'), and γ ≥ 0 is a hyperparameter. This model again captures preservation by letting f_u^(v) and f̃_u^(v) reside in an embedding subspace specific to view v, while each of these subspaces is distorted via cross-view regularization to model collaboration. As with mvn2vec-con, the greater the value of γ, the more collaboration is encouraged, and the model is identical to the independent model when γ = 0.
Intra-view loss function. There are many possible ways to formulate the intra-view loss function in Eq. (3). In our framework, we adopt the random walk plus skip-gram approach, one of the most common methods in the literature (Grover and Leskovec, 2016; Perozzi et al., 2014, 2017). Specifically, for each view v ∈ V, multiple rounds of random walks are sampled starting from each node in G^(v). Along any random walk, a node u and a neighboring node c constitute one random walk pair, and a list P^(v) of random walk pairs can thereby be derived. We defer the detailed description of the generation of P^(v) to a later point in this section. The intra-view loss function is then given by

    l_v = − Σ_{(u, c) ∈ P^(v)} log p(c | u; v),    (7)

where

    p(c | u; v) = exp(f̃_c^(v) · f_u^(v)) / Σ_{u' ∈ U} exp(f̃_{u'}^(v) · f_u^(v)).    (8)
Model inference. To optimize the objectives in Eq. (5) and (6), we adopt asynchronous stochastic gradient descent (ASGD) (Recht et al., 2011), following existing skip-gram–based algorithms (Grover and Leskovec, 2016; Perozzi et al., 2014, 2017; Tang et al., 2015; Mikolov et al., 2013). In this regard, the random walk pairs P^(v) from all views are joined and shuffled to form a single list P of random walk pairs over all views. Each step of ASGD then draws one random walk pair from P and updates the corresponding model parameters with one step of gradient descent.

Moreover, due to the partition function in Eq. (8), computing the gradients of Eq. (5) and (6), which contain Eq. (7), is unaffordable. Negative sampling is hence adopted, as in other skip-gram–based methods (Grover and Leskovec, 2016; Perozzi et al., 2014, 2017; Tang et al., 2015; Mikolov et al., 2013), which approximates log p(c | u; v) in Eq. (7) by

    log σ(f̃_c^(v) · f_u^(v)) + Σ_{k=1}^{K} E_{u_k ∼ P_n^(v)} [ log σ(−f̃_{u_k}^(v) · f_u^(v)) ],

where σ(x) = 1 / (1 + e^{−x}) is the sigmoid function, K is the negative sampling rate, P_n^(v)(u) ∝ d_u^{3/4} is the noise distribution, and d_u is the number of occurrences of node u in P^(v) (Mikolov et al., 2013).

With negative sampling, the objective function (to be maximized) involving one walk pair (u, c) drawn from view v in mvn2vec-con is

    log σ(f̂_c^(v) · f_u^(v)) + Σ_{k=1}^{K} E_{u_k ∼ P_n^(v)} [ log σ(−f̂_{u_k}^(v) · f_u^(v)) ],

with f̂ given by Eq. (4). On the other hand, the objective function involving (u, c) from view v in mvn2vec-reg is

    log σ(f̃_c^(v) · f_u^(v)) + Σ_{k=1}^{K} E_{u_k ∼ P_n^(v)} [ log σ(−f̃_{u_k}^(v) · f_u^(v)) ] − γ ( ‖f_u^(v) − f̄_u‖₂² + ‖f̃_c^(v) − f̄̃_c‖₂² ),

where f̄_u = (1/|V|) Σ_{v' ∈ V} f_u^(v') and f̄̃_c = (1/|V|) Σ_{v' ∈ V} f̃_c^(v'). The gradients of these two objective functions used for ASGD are provided in the appendix.
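The negative-sampled term for a single walk pair can be sketched as follows (an illustrative computation in our own code, with random placeholder embeddings and pre-drawn negative samples standing in for the noise distribution):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid σ(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))


def pair_objective(f_u, f_ctx_pos, f_ctx_negs):
    """Negative-sampled skip-gram objective for one (center, context) pair.

    f_u: center embedding of the walk pair's first node
    f_ctx_pos: context embedding of the observed neighbor
    f_ctx_negs: context embeddings of K nodes drawn from the noise distribution
    """
    pos = np.log(sigmoid(f_u @ f_ctx_pos))
    neg = sum(np.log(sigmoid(-f_u @ f_neg)) for f_neg in f_ctx_negs)
    return pos + neg


rng = np.random.default_rng(0)
d, K = 8, 5  # embedding dimension and negative sampling rate (placeholders)
obj = pair_objective(
    rng.normal(size=d),
    rng.normal(size=d),
    [rng.normal(size=d) for _ in range(K)],
)
```

Because every term is a log of a sigmoid, the objective is always negative; maximizing it pushes the observed pair together and the sampled negatives apart.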
Random walk pair generation. Without additional supervision, we assume equal importance of the different network views in learning embeddings, and sample the same number N of random walks from each view. To determine this number, we denote by n_v the number of nodes in view v that are not isolated from the rest of the network, and let N = β · max_{v ∈ V} n_v, where β is a hyperparameter to be specified.
Given a network view v ∈ V, we generate random walk pairs as in existing work (Perozzi et al., 2014; Grover and Leskovec, 2016; Perozzi et al., 2017). Specifically, each random walk is of length l, and ⌊N / n_v⌋ or ⌈N / n_v⌉ random walks are sampled from each non-isolated node in view v, yielding a total of N random walks. For each node along any random walk, this node and any other node within a window of size w constitute a random walk pair that is then added to P^(v).
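The walk-and-window procedure can be sketched as follows (a minimal illustration in our own code; the adjacency lists, walk length, and window size are placeholders):

```python
import random

def random_walk(adj, start, length, rng):
    """Sample a simple uniform random walk of the given length from start."""
    walk = [start]
    while len(walk) < length and adj.get(walk[-1]):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def walk_pairs(walk, window):
    """All (node, neighbor-within-window) pairs along one walk."""
    pairs = []
    for i, u in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((u, walk[j]))
    return pairs


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
rng = random.Random(0)
pairs = walk_pairs(random_walk(adj, "a", length=4, rng=rng), window=2)
```

Each pair produced this way is one training example for the skip-gram objective; repeating the walk from every non-isolated node of a view fills that view's pair list.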
Finally, we summarize both the mvn2vec-con algorithm and the mvn2vec-reg algorithm in Algorithm 1.
6. Experiments
In this section, we further corroborate the intuition behind preservation and collaboration, and demonstrate the feasibility of simultaneously modeling these two characteristics. We first perform a case study on a series of synthetic multi-view networks with varied extents of preservation and collaboration. Next, we introduce the real-world datasets, baselines, and experiment settings for more comprehensive quantitative evaluations. Lastly, we analyze the evaluation results and provide further discussion.
6.1. Case Study – Varied Preservation and Collaboration on Synthetic Data
In order to directly study the relative performance of different models on networks with varied extents of preservation and collaboration, we design a series of synthetic multi-view networks and experiment on a multi-class classification task.
We denote each of these synthetic networks by G(p), where p is referred to as the intrusion probability; varying p corresponds to varying the extent of preservation and collaboration. Each G(p) has the same set of nodes and two views, v1 and v2. Furthermore, each node is associated with one of four class labels – A, B, C, or D – and each class has exactly the same number of nodes. We first describe the process for generating G(0) before introducing the more general G(p):

Generate one random network over all nodes labeled A or B, and another over all nodes labeled C or D. Put all edges of these two random networks into view v1.

Generate one random network over all nodes labeled A or C, and another over all nodes labeled B or D. Put all edges of these two random networks into view v2.
To generate each of the four aforementioned random networks, we adopt the preferential attachment process, a widely used method for generating networks with power-law degree distributions, in which each newly added node attaches a fixed number of edges to existing nodes.
With this design for G(0), view v1 carries the information that nodes labeled A or B should be classified differently from nodes labeled C or D, while view v2 reflects that nodes labeled A or C differ from nodes labeled B or D. More generally, G(p) is generated with the following tweak of G(0): when putting an edge into one of the two views, with probability p the edge is put into the other view instead of the view specified by the generation process.

It is worth noting that a larger p favors more collaboration, while a smaller p favors more preservation. In the extreme case where p = 0.5, only collaboration is needed in the embedding process. This is because every edge has equal probability of falling into view v1 or view v2 of G(0.5), and there is hence no information carried specifically by either view that should be preserved.
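The generation process can be sketched as follows (our own toy implementation; the node count, attachment parameter, and numeric view labels are placeholders):

```python
import random

def preferential_attachment(nodes, m, rng):
    """Toy Barabási–Albert-style process: each new node attaches m edges,
    preferring endpoints that already have high degree (repeated endpoints
    in the list encode degree)."""
    edges, endpoints = [], list(nodes[:m])
    for u in nodes[m:]:
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for v in targets:
            edges.append((u, v))
            endpoints.extend((u, v))
    return edges


def assign_views(edges, intended_view, p, rng):
    """With intrusion probability p, an edge lands in the other of the two views."""
    other = {1: 2, 2: 1}[intended_view]
    return [(u, v, other if rng.random() < p else intended_view) for u, v in edges]


rng = random.Random(0)
edges = preferential_attachment(list(range(50)), m=2, rng=rng)
typed = assign_views(edges, intended_view=1, p=0.2, rng=rng)
```

At p = 0, every edge stays in its intended view; as p approaches 0.5, the two views become statistically indistinguishable, which is exactly the regime where only collaboration matters.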
On each G(p), independent, one-space, mvn2vec-con, and mvn2vec-reg are tested. On top of the embedding learned by each model, we apply logistic regression with cross-entropy loss to carry out the multi-class classification task. All model parameters are tuned to the best for each model on a validation dataset sampled from the labeled nodes. Classification accuracy and cross-entropy on a separate test dataset are reported in Figure 3.

From Figure 3, we make three observations. (i) independent performs better than one-space when p is small – i.e., when preservation is the dominating characteristic of the network – and one-space performs better than independent when p is large – i.e., when collaboration dominates. (ii) The two proposed mvn2vec models outperform both independent and one-space except when p is close to 0.5, which implies that it is indeed feasible for mvn2vec to achieve better performance by simultaneously modeling preservation and collaboration. (iii) When p is close to 0.5, one-space performs best. This is expected, because no preservation is needed in G(0.5), and any attempt to additionally model preservation shall not boost, if not impair, the performance.
6.2. Data Description and Evaluation Tasks
We perform quantitative evaluations on three real-world multi-view networks: Snapchat, YouTube, and Twitter. The key statistics are summarized in Table 3, and we describe these datasets as follows.
Snapchat. Snapchat is a multimedia social networking service. In the Snapchat multi-view social network, each node is a user, and the three views correspond to friendship, chatting, and story viewing (https://support.snapchat.com/enUS/a/viewstories). We perform experiments on the subnetwork consisting of all users from Los Angeles. The data used to construct the network were collected from two consecutive weeks in the spring of 2017. Additional data for the downstream evaluation tasks were collected from the following week, henceforth referred to as week 3. We perform a multi-label classification task and a link prediction task on top of the user embedding learned from each network. For classification, we classify whether or not a user views each of the most popular discover channels (https://support.snapchat.com/enUS/a/discoverhowto) according to the user's viewing history in week 3. For each channel, the users who view the channel are labeled positive, and a fixed multiple of that number of users who do not view the channel are randomly selected as negative examples. These records are then randomly split into training, validation, and test sets. This multi-label classification problem aims at inferring users' preferences for different discover channels and can therefore guide product design in content serving. For link prediction, we predict whether two users will view the stories posted by each other in week 3. Negative examples are pairs of users who are friends but do not have story viewing in the same week. It is worth noting that this definition yields more positive examples than negative examples, which is the cause of the relatively high AUPRC scores observed in the experiments. These records are then randomly split into training, validation, and test sets, with the constraint that a user appears as the viewer of a record in at most one of the three sets. This task aims to estimate the likelihood of story viewing between friends, so that the application can rank stories accordingly.
We also provide the Jaccard coefficient–based measurement on Snapchat in Figure 4. It can be seen that the cross-view agreement between each pair of views in the Snapchat network falls in between those of YouTube and Twitter presented in Section 4.
YouTube. YouTube is a video-sharing website. We use a dataset made publicly available by the Social Computing Data Repository (Zafarani and Liu, 2009) (http://socialcomputing.asu.edu/datasets/YouTube). From this dataset, a network with three views is constructed, where each node is a core user and the edges in the three views represent the number of common friends, the number of common subscribers, and the number of common favorite videos, respectively. Note that the core users are those from whom the author of the dataset crawled the data, and their friends may fall outside the set of core users. With no user labels available for classification, we perform only the link prediction task on top of the user embedding. This task aims at inferring whether two core users are friends, and it has also been used for evaluation in existing research (Qu et al., 2017). Each core user forms positive pairs with his or her core friends, and we randomly select a fixed multiple of that number of non-friend core users to form negative examples. Records are split into training, validation, and test sets as in the link prediction task on Snapchat.
Twitter. Twitter is an online news and social networking service. We use a dataset made publicly available by Leskovec and Krevl (2014) (https://snap.stanford.edu/data/higgstwitter.html). From this dataset, a network with two views is constructed, where each node is a user and the edges in the two views represent the number of replies and the number of mentions, respectively. Again, we evaluate with a link prediction task that infers whether two users are friends, as in existing research (Qu et al., 2017). The same negative-example generation method and training–validation–test split method are used as in the YouTube dataset.
For each evaluation task on all three networks, training, validation, and test sets are derived in a shuffle-split manner with a –– ratio. The shuffle split is conducted times, so that the mean and its standard error under each metric can be calculated. Furthermore, a node is excluded from evaluation if it is isolated from other nodes in at least one of the multiple views.
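The repeated shuffle-split protocol can be sketched with scikit-learn's `ShuffleSplit`. The split ratios and the number of repetitions below are illustrative placeholders, since the exact values are elided above:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

def repeated_shuffle_split(records, n_repeats=10, test_size=0.1, val_size=0.1, seed=0):
    """Split records into train/validation/test sets n_repeats times.

    Returns a list of (train_idx, val_idx, test_idx) index tuples.
    The 80/10/10 ratio implied by the defaults is a placeholder, not
    the paper's (elided) ratio.
    """
    records = np.asarray(records)
    splits = []
    outer = ShuffleSplit(n_splits=n_repeats, test_size=test_size + val_size,
                         random_state=seed)
    for train_idx, holdout_idx in outer.split(records):
        # Carve the held-out portion into validation and test halves.
        half = len(holdout_idx) // 2
        splits.append((train_idx, holdout_idx[:half], holdout_idx[half:]))
    return splits

def mean_and_stderr(scores):
    """Mean and standard error of the mean across repeated splits."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))
```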
Table 4. Link prediction results on the three real-world multi-view networks (standard errors in parentheses).

| Dataset  | Metric  | single-view (worst view) | single-view (best view) | independent | one-space | view-merging | mvn2vec-con | mvn2vec-reg |
|----------|---------|--------------------------|-------------------------|-------------|-----------|--------------|-------------|-------------|
| Snapchat | ROC-AUC | 0.587 (0.001) | 0.592 (0.001) | 0.617 (0.001) | 0.603 (0.001) | 0.611 (0.001) | 0.626 (0.001) | 0.638 (0.001) |
| Snapchat | AUPRC   | 0.675 (0.001) | 0.677 (0.002) | 0.700 (0.001) | 0.688 (0.002) | 0.693 (0.002) | 0.709 (0.001) | 0.712 (0.002) |
| YouTube  | ROC-AUC | 0.831 (0.002) | 0.904 (0.002) | 0.931 (0.001) | 0.914 (0.001) | 0.912 (0.001) | 0.932 (0.001) | 0.934 (0.001) |
| YouTube  | AUPRC   | 0.515 (0.004) | 0.678 (0.004) | 0.745 (0.003) | 0.702 (0.004) | 0.699 (0.004) | 0.746 (0.003) | 0.754 (0.003) |
| Twitter  | ROC-AUC | 0.597 (0.001) | 0.715 (0.001) | 0.724 (0.001) | 0.737 (0.001) | 0.741 (0.001) | 0.727 (0.000) | 0.754 (0.001) |
| Twitter  | AUPRC   | 0.296 (0.001) | 0.428 (0.001) | 0.447 (0.001) | 0.466 (0.001) | 0.469 (0.001) | 0.453 (0.001) | 0.478 (0.001) |
Table 5. Multi-label classification results on the Snapchat network (standard errors in parentheses).

| Dataset  | Metric  | single-view (worst view) | single-view (best view) | independent | one-space | view-merging | mvn2vec-con | mvn2vec-reg |
|----------|---------|--------------------------|-------------------------|-------------|-----------|--------------|-------------|-------------|
| Snapchat | ROC-AUC | 0.634 (0.001) | 0.667 (0.002) | 0.687 (0.001) | 0.675 (0.001) | 0.672 (0.001) | 0.693 (0.001) | 0.690 (0.001) |
| Snapchat | AUPRC   | 0.252 (0.001) | 0.274 (0.002) | 0.293 (0.002) | 0.278 (0.001) | 0.279 (0.001) | 0.298 (0.001) | 0.296 (0.002) |
6.3. Baselines and Experimental Setup
In this section, we describe the baselines used to validate the utility of modeling preservation and collaboration, and the experimental setup for both embedding learning and downstream evaluation tasks.
Baselines. Quantitative evaluation results are obtained by applying a downstream learner to the embeddings learned by a given embedding method. Therefore, for fair comparison, we use the same downstream learner within each evaluation task. Moreover, since our study aims at understanding the characteristics of multi-view network embedding, we build all compared embedding methods on the same random walk plus skip-gram approach, with the same model inference method, as discussed in Section 5. Specifically, we describe the baseline embedding methods as follows:

Independent. As briefly discussed in Section 4, the independent model first embeds each network view independently, and then concatenates the per-view results to form the final embedding . This method is equivalent to mvn2vec-con when , and to mvn2vec-reg when . It preserves the information embodied in each view, but does not allow collaboration across different views in the embedding process.

One-space. Also discussed in Section 4, the one-space model assumes the embeddings of the same node to share model parameters across different views . It uses the same strategy as the proposed mvn2vec methods to combine random walks generated from different views. one-space enables different views to collaborate in learning a unified embedding, but does not preserve information specifically carried by each view.

View-merging. The view-merging model first merges all network views into one unified view, and then learns the embedding of this single unified view. To comply with the assumed equal importance of different network views, we scale the edge weights proportionally in each view, so that the total edge weight contributed by each view is the same in the merged network. This method serves as an alternative to one-space in modeling collaboration. The difference between view-merging and one-space essentially lies in whether random walks can cross different views. We note that, just like one-space, view-merging does not model preservation.

Single-view. For each network view, the single-view model learns the embedding from only this view, and neglects all other views. This baseline is used to verify whether introducing more than one view brings informative signals into each evaluation task.
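Two of the baselines above admit compact sketches. The snippet below illustrates the independent model's concatenation step and the view-merging model's proportional weight rescaling, under simplified data structures of our own choosing; it is a sketch of the ideas, not the authors' implementation:

```python
import numpy as np
from collections import defaultdict

def concat_view_embeddings(view_embeddings):
    """Independent baseline: concatenate per-view embeddings per node.

    view_embeddings: list of {node: vector} dicts, one per view, all
    covering the same node set. Each view is embedded independently,
    so no information flows across views.
    """
    return {
        node: np.concatenate([emb[node] for emb in view_embeddings])
        for node in view_embeddings[0]
    }

def merge_views(views):
    """View-merging baseline: merge views into one weighted edge set.

    views: list of {(u, v): weight} dicts. Each view is rescaled so its
    total edge weight equals 1 before merging, enforcing the assumed
    equal importance of different views.
    """
    merged = defaultdict(float)
    for view in views:
        total = sum(view.values())
        for edge, w in view.items():
            merged[edge] += w / total
    return dict(merged)
```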
Downstream learners. For fair comparison, we apply the same downstream learner to the features derived from each embedding method. Specifically, we use the scikit-learn (http://scikit-learn.org/stable/) implementation of logistic regression with ℓ2 regularization and the SAG solver for both classification and link prediction tasks. For each task and each embedding method, we tune the regularization coefficient of the logistic regression to the best value on the validation set. Following existing research (Tang et al., 2015), each embedding vector is normalized onto the unit ℓ2 sphere before being fed into the downstream learners. In multi-label classification tasks, the features fed into the downstream learner are simply the embedding of each node, and we train an independent logistic regression model for each label. In link prediction tasks, features of node pairs are needed, and we derive such features as the Hadamard product of the two involved node embedding vectors, as suggested by previous work (Grover and Leskovec, 2016).
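The feature pipeline described above (unit ℓ2 normalization, then Hadamard-product pair features, then logistic regression with the SAG solver) can be sketched as follows; the regularization strength `C` is a placeholder that would be tuned on the validation set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import normalize

def pair_features(embeddings, pairs):
    """Hadamard-product features for node pairs.

    embeddings: array of shape (n_nodes, dim); each row is first
    normalized onto the unit l2 sphere.
    """
    X = normalize(embeddings, norm="l2")
    return np.array([X[u] * X[v] for u, v in pairs])

def fit_link_predictor(embeddings, pairs, labels, C=1.0):
    """Logistic regression with l2 regularization and the SAG solver.

    C is a placeholder; the paper tunes the regularization coefficient
    on the validation set.
    """
    clf = LogisticRegression(penalty="l2", solver="sag", C=C, max_iter=1000)
    clf.fit(pair_features(embeddings, pairs), labels)
    return clf
```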
Hyperparameters. For independent, mvn2vec-con, and mvn2vec-reg, we set the embedding space dimension . For one-space and view-merging, we experiment with both and , and always report the better result of the two settings. For single-view, we set . To generate random walk pairs, we always set and . For the Snapchat-LA network, we set due to its large scale, and set for all other datasets. The negative sampling rate is set to for all models, and each model is trained for epoch. In Figure 3, Table 4, Table 5, and Figure 6, and in the mvn2vec models are also tuned to the best value on the validation set. The impact of and on model performance is further presented and discussed in Section 6.5.
Metrics. For link prediction tasks, we use two widely adopted metrics: the area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision–recall curve (AUPRC). The receiver operating characteristic (ROC) curve is derived by plotting the true positive rate against the false positive rate as the decision threshold varies, and the precision–recall curve (PRC) is created by plotting precision against recall as the threshold varies. Higher values are better for both metrics. For multi-label classification tasks, we also compute the ROC-AUC and the AUPRC for each label, and report the mean value averaged across all labels.
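Both metrics are available in scikit-learn, where the AUPRC corresponds to `average_precision_score`, a standard summary of the precision–recall curve:

```python
from sklearn.metrics import roc_auc_score, average_precision_score

def evaluate_scores(y_true, y_score):
    """ROC-AUC and AUPRC for a set of predicted scores.

    y_true: binary ground-truth labels; y_score: predicted scores,
    e.g. probabilities from the downstream logistic regression.
    """
    return {
        "roc_auc": roc_auc_score(y_true, y_score),
        "auprc": average_precision_score(y_true, y_score),
    }
```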
6.4. Quantitative Evaluation Results on Real-World Datasets
The link prediction results on the three networks are presented in Table 4. For each dataset, all methods leveraging multiple views outperformed those using only one view, which justifies the use of multi-view networks. Moreover, one-space and view-merging had comparable performance on each dataset. This is an expected outcome, because both model only collaboration and differ merely in whether random walks are performed across network views.
On YouTube, the proposed mvn2vec models perform as well as, but do not significantly exceed, the baseline independent model. Recall that the need for preservation in the YouTube network is overwhelmingly dominant, as discussed in Section 4. As a result, it is not surprising that additionally modeling collaboration does not bring a significant performance boost in such an extreme case. On Twitter, collaboration plays a more important role than preservation, as confirmed by the better performance of one-space over independent. Furthermore, mvn2vec-reg achieved better performance than all baselines, while mvn2vec-con outperformed independent by further modeling collaboration but failed to exceed one-space. This phenomenon can be explained by the fact that in mvn2vec-con are set to be independent regardless of its hyperparameter , so mvn2vec-con's capability of modeling collaboration is bounded by this design.
The Snapchat network used in our experiments lies between YouTube and Twitter in terms of the need for preservation and collaboration. The two proposed mvn2vec models both outperformed all baselines under all metrics. In other words, this result shows the feasibility of gaining a performance boost by simultaneously modeling preservation and collaboration, without overcomplicating the model or adding supervision.
The multi-label classification results on Snapchat are presented in Table 5. As with the link prediction results, the two mvn2vec models both outperformed all baselines under all metrics, with the difference that mvn2vec-con performed better in this classification task, while mvn2vec-reg performed better in the previous link prediction task. Overall, while mvn2vec-con and mvn2vec-reg may have different advantages in different tasks, they both outperformed all baselines by simultaneously modeling preservation and collaboration on the Snapchat network, where preservation and collaboration coexist.
6.5. Hyperparameter Study
Impact of for mvn2vec-con and for mvn2vec-reg. With the results presented in Figure 5, we first focus on the Snapchat network. Starting from , where only preservation was modeled, mvn2vec-reg performed progressively better as more collaboration kicked in with increasing . The peak performance was reached between and . On the other hand, the performance of mvn2vec-con improved as grew. Recall that even in the case , mvn2vec-con still has independent in each view. This prevented mvn2vec-con from promoting further collaboration.
On YouTube, the mvn2vec models did not significantly outperform independent no matter how and varied, due to the dominant need for preservation discussed in Sections 4 and 6.4.
On Twitter, mvn2vec-reg outperformed one-space when was large, while mvn2vec-con could not beat one-space for the reason discussed in Section 6.4. This also echoes mvn2vec-con's performance on Snapchat discussed in the first paragraph of this section.
Impact of embedding dimension. To rule out the possibility that one-space could preserve the view-specific information as long as the embedding dimension were set large enough, we further carry out the multi-class classification task on under varied embedding dimensions. Note that is used in this experiment because it has the need for modeling preservation, as discussed in Section 6.1. As presented in Figure 6, one-space achieves its best performance at , which is worse than independent at , let alone the best performance of independent at . Therefore, one cannot expect one-space to preserve the information carried by different views simply by employing an embedding space with a large enough dimension.
Besides, all four models achieve their best performance with in the vicinity of 256–512. Notably, one-space requires the smallest embedding dimension to reach peak performance. This is expected because, unlike the other models, one-space does not segment its embedding space to suit multiple views, and hence has more freedom in exploiting an embedding space of a given dimension.
7. Conclusion and Future Work
We studied the characteristics that are specific and important to multi-view network embedding. Preservation and collaboration were identified as two such characteristics in our practice of embedding real-world multi-view networks. We then explored the feasibility of achieving better embedding results by simultaneously modeling preservation and collaboration, and proposed two multi-view network embedding methods to this end. Experiments with various downstream evaluation tasks were conducted on a series of synthetic networks and three real-world multi-view networks from distinct sources, including two public datasets and an internal Snapchat dataset. The experimental results corroborated the presence and importance of preservation and collaboration, and demonstrated the effectiveness of the proposed methods.
Given the existence of the identified characteristics, future work includes modeling different extents of preservation and collaboration for different pairs of views in multi-view embedding. It would also be worthwhile to explore supervised methods for task-specific multi-view network embedding that model preservation and collaboration jointly.
Appendix
We provide the gradients used for ASGD in the proposed algorithms.
mvn2vec-con:
(9) 
(10) 
(11) 
mvn2vec-reg:
(12) 
(13) 
(14) 
Note that in the implementation, should be the number of views in which is associated with at least one edge.
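As an illustration of that count, assuming each view is stored as an adjacency dict (a representation we choose for this sketch; variable names are illustrative):

```python
def views_with_edges(views, node):
    """Number of views in which `node` has at least one incident edge.

    views: list of {node: set_of_neighbors} adjacency dicts, one per view.
    A view counts only if the node's neighbor set is non-empty there.
    """
    return sum(1 for adj in views if adj.get(node))
```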
References
 Alston (1989) Jon P Alston. 1989. Wa, guanxi, and inhwa: Managerial principles in Japan, China, and Korea. Business Horizons 32, 2 (1989), 26–31.
 Belkin and Niyogi (2001) Mikhail Belkin and Partha Niyogi. 2001. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, Vol. 14. 585–591.
 Frank and Nowicki (1993) Ove Frank and Krzysztof Nowicki. 1993. Exploratory statistical analysis of networks. Annals of Discrete Mathematics 55 (1993), 349–365.
 Gollini and Murphy (2014) Isabella Gollini and Thomas Brendan Murphy. 2014. Joint Modelling of Multiple Network Views. Journal of Computational and Graphical Statistics (2014).
 Greene and Cunningham (2013) Derek Greene and Pádraig Cunningham. 2013. Producing a unified graph representation from multiple social network views. In Web Science Conference. ACM, 118–121.
 Grover and Leskovec (2016) Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 855–864.
 Hu et al. (2005) Haiyan Hu, Xifeng Yan, Yu Huang, Jiawei Han, and Xianghong Jasmine Zhou. 2005. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics 21, suppl 1 (2005), i213–i221.

 Kumar and Daumé (2011) Abhishek Kumar and Hal Daumé. 2011. A co-training approach for multi-view spectral clustering. In ICML. 393–400.
 Kumar et al. (2011) Abhishek Kumar, Piyush Rai, and Hal Daumé. 2011. Co-regularized multi-view spectral clustering. In NIPS. 1413–1421.
 Leskovec and Krevl (2014) Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data. (June 2014).
 Liu et al. (2013) Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. 2013. Multi-view clustering via joint nonnegative matrix factorization. In SDM, Vol. 13. SIAM, 252–260.
 Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111–3119.
 Ou et al. (2016) Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric Transitivity Preserving Graph Embedding. In KDD. 1105–1114.
 Pattison and Wasserman (1999) Philippa Pattison and Stanley Wasserman. 1999. Logit models and logistic regressions for social networks: II. Multivariate relations. Brit. J. Math. Statist. Psych. 52, 2 (1999), 169–194.
 Pei et al. (2005) Jian Pei, Daxin Jiang, and Aidong Zhang. 2005. On mining cross-graph quasi-cliques. In KDD. ACM, 228–238.
 Perozzi et al. (2014) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 701–710.
 Perozzi et al. (2017) Bryan Perozzi, Vivek Kulkarni, Haochen Chen, and Steven Skiena. 2017. Don't Walk, Skip! Online Learning of Multi-scale Network Embeddings. In Advances in Social Networks Analysis and Mining (ASONAM), 2017 IEEE/ACM International Conference on.
 Qu et al. (2017) Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, and Jiawei Han. 2017. An Attention-based Collaboration Framework for Multi-View Network Representation Learning. In CIKM. ACM.
 Recht et al. (2011) Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. 2011. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems. 693–701.
 Roweis and Saul (2000) Sam T Roweis and Lawrence K Saul. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 5500 (2000), 2323–2326.
 Salter-Townshend and McCormick (2013) Michael Salter-Townshend and Tyler H McCormick. 2013. Latent Space Models for Multiview Network Data. Technical Report 622. Department of Statistics, University of Washington.
 Sindhwani and Niyogi (2005) Vikas Sindhwani and Partha Niyogi. 2005. A co-regularized approach to semi-supervised learning with multiple views. In ICML Workshop on Learning with Multiple Views.
 Tang et al. (2015) Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1067–1077.
 Tenenbaum et al. (2000) Joshua B Tenenbaum, Vin De Silva, and John C Langford. 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290, 5500 (2000), 2319–2323.
 Wang et al. (2016) Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1225–1234.
 Yan et al. (2007) Shuicheng Yan, Dong Xu, Benyu Zhang, HongJiang Zhang, Qiang Yang, and Stephen Lin. 2007. Graph embedding and extensions: A general framework for dimensionality reduction. IEEE transactions on pattern analysis and machine intelligence 29, 1 (2007).
 Zafarani and Liu (2009) R. Zafarani and H. Liu. 2009. Social Computing Data Repository at ASU. (2009). http://socialcomputing.asu.edu
 Zeng et al. (2006) Zhiping Zeng, Jianyong Wang, Lizhu Zhou, and George Karypis. 2006. Coherent closed quasi-clique discovery from large dense graph databases. In KDD. ACM, 797–802.
 Zhang et al. (2008) Dan Zhang, Fei Wang, Changshui Zhang, and Tao Li. 2008. Multi-View Local Learning. In AAAI. 752–757.
 Zhou and Burges (2007) Dengyong Zhou and Christopher JC Burges. 2007. Spectral clustering and transductive learning with multiple views. In ICML. ACM, 1159–1166.