1 Introduction
Over the past several years, large-scale graph machine learning has gained increasing relevance in the domain of international poverty research [Blumenstock 2016]. Driven largely by the expansion of mobile phone networks throughout developing countries (roughly 95% of the world population now has mobile phone coverage [GSMA 2016]), vast quantities of network data are constantly being generated by people living in even extremely poor and marginalized communities. Recent work has shown how such data can be used to inform critical policy decisions, including the measurement of living conditions [Blumenstock, Cadamuro, and On 2015], the spread of infectious diseases [Wesolowski et al. 2015], and the management of humanitarian crises [Lu, Bengtsson, and Holme 2012]. Private companies are also taking advantage of this new source of data, for instance by using data from mobile phones to generate credit scores that can expand credit to millions of people historically shut out of the formal banking ecosystem [Francis, Blumenstock, and Robinson 2017].
However, a critical constraint on the use of these data in settings related to economic development is the lack of scalable algorithms for performing prediction tasks on sparse multi-view networks. Multi-view networks (also referred to as multiplex and multimodal networks) are networks in which nodes can be related in multiple ways. They are the natural abstraction for mobile phone networks, where different individuals have different types of relationships and can interact using different modalities (such as phone calls, text messages, money transfers, and app-based activity). Yet the vast majority of applied research using mobile phone data, in developing and developed countries alike, ignores the multi-view nature of phone networks.
This paper develops a novel approach for learning on multi-view networks, which bridges two different strands in the research literature. The first strand involves methods for efficient analysis of multi-view networks; the second explores algorithms for semi-supervised graph learning (see Related Work, below). The method we develop provides an efficient approach for applying convolutional neural networks to multi-view graph-structured data. We benchmark this new method, which we call MultiGCN (short for Multi-View Graph Convolutional Networks), on three different mobile network datasets and three different prediction tasks relevant to the international development community: (1) predicting the adoption of a new “financial inclusion” technology in a West African country; (2) predicting whether an individual is living below the poverty line in an East African country; and (3) predicting the gender of mobile phone subscribers in a South Asian country. In all cases, we find that MultiGCN outperforms state-of-the-art benchmarks, including standard Graph Convolutional Networks [Kipf and Welling 2017], Node2vec [Grover and Leskovec 2016], DeepWalk [Perozzi, Al-Rfou, and Skiena 2014], and LINE [Tang et al. 2015].
While designed specifically with the developing-country context in mind (where the sparsity and multi-view properties of networks are very salient), we show that MultiGCN can be more generally applied to a wide range of problems involving multi-view networks. Indeed, most real-world networks are multi-view, including the network data most frequently used by AI researchers (e.g., data from Twitter, Amazon, and Netflix). Our second set of results shows that MultiGCN can improve upon state-of-the-art algorithms not just in poverty-related contexts, but also in traditional classification problems. In particular, we show that MultiGCN outperforms competing algorithms on citation labeling tasks (using benchmark datasets from Citeseer and Cora) that have been studied extensively in prior work.
2 Related Work
2.1 Technical Related Work
Our goal is to develop an efficient method for node-level transductive semi-supervised learning over multi-view graphs. Here, we begin with a general overview of semi-supervised learning, then focus on various approaches to graph-based semi-supervised learning, and finally discuss related work on multi-view networks.
Graph-Based Semi-Supervised Learning
One of the biggest issues with applying supervised learning algorithms in a developing country is that it is often costly to collect labels for training. For instance, when using mobile phone data to predict the wealth of subscribers, Blumenstock, Cadamuro, and On (2015) manually conducted a survey of roughly 1,000 subscribers. Semi-supervised learning tries to solve this problem by using unlabeled data along with the labeled data to train better classifiers (see [Zhu 2005] for a survey). Our focus is on transductive semi-supervised learning, which assumes that all the unlabeled data is available at training time and does not attempt to generalize to data unseen during training.
Graph-based semi-supervised learning (GSSL) is a popular approach for semi-supervised learning that treats labeled and unlabeled instances as graph vertices, and relationships between instances as edges [Liu, Wang, and Chang 2012]. GSSL algorithms try to learn a classifier that is consistent with the labeled data while making sure that the predictions for similar nodes are also similar. This is achieved by minimizing a loss function with two terms: (a) a supervised loss over the labeled instances, and (b) a graph-based regularization term. Different GSSL algorithms use different functions for graph regularization. Label propagation-based approaches, for instance, use a constrained label lookup function (e.g., Zhou et al. 2004a). Relatedly, kernel-based approaches parameterize the regularization term in a Reproducing Kernel Hilbert Space (RKHS).
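As an illustration of this two-term structure, one common way to write a GSSL objective is sketched below. This is a generic form, not necessarily the exact objective of any of the cited methods. The symbols $f$ (the classifier), $\ell$ (a supervised loss), $\mathcal{V}_L$ (the set of labeled nodes), $A_{ij}$ (edge weights), and $\lambda$ (the regularization weight) are notation assumed here for illustration.

```latex
\mathcal{L}(f) \;=\;
\underbrace{\sum_{i \in \mathcal{V}_L} \ell\big(y_i, f(x_i)\big)}_{\text{supervised loss}}
\;+\; \lambda\,
\underbrace{\sum_{i,j} A_{ij}\, \big\lVert f(x_i) - f(x_j) \big\rVert^{2}}_{\text{graph regularization}}
```

The second term is small when connected nodes receive similar predictions, which is the smoothness requirement described above.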
Learning Over Graphs
The success of word embedding algorithms like Word2Vec [Mikolov et al. 2013] has inspired similar algorithms for graphs. For instance, DeepWalk [Perozzi, Al-Rfou, and Skiena 2014] learns embeddings by predicting the neighborhood of nodes based on random walks over the graphs, while LINE [Tang et al. 2015] and Node2vec [Grover and Leskovec 2016] allow for advanced sampling schemes. More recently, neural network-based approaches have been proposed to perform learning over graphs. These have been extended to the task of semi-supervised learning [Bruna et al. 2013; Defferrard, Bresson, and Vandergheynst 2016], including recent work by Kipf and Welling (2017) that proposes a Graph Convolutional Network (GCN), which we take as a starting point for our approach.
Learning Over Multi-View Graphs
The key distinction between our approach and prior work is our desire to handle graphs with multiple views, i.e., graphs where vertices can be connected in more than one way. In recent years, many different algorithms have been proposed for learning on multi-view graphs. These algorithms can be broadly divided into three main categories: (1) co-training algorithms, (2) learning with multiple kernels, and (3) subspace learning (see Xu, Tao, and Xu (2013) for a survey). Recent work by Dong et al. (2014) shows that subspace approaches, which find a latent subspace shared by multiple views, perform well relative to co-training and kernelized approaches on a range of tasks. We therefore focus our attention on integrating subspace learning approaches with recent innovations in graph convolutional networks.
Comparison with existing work
Our main contribution is to propose an efficient method for adapting GSSL to multi-view contexts. Existing approaches to GSSL cannot be readily implemented on such data; those algorithms that do handle multiple views generally treat views and vertices equally. We show that current “state of the art” methods like Graph Convolutional Networks [Kipf and Welling 2017] can be enhanced by augmenting the input graph using subspace analysis over Grassmann manifolds. Farseev et al. (2017) have demonstrated that a subspace merging approach can be quite accurate for the problem of cross-domain recommendation, which differs from our experimental settings and context as described in Section 4.
2.2 Empirical Related Work
Our experimental results focus on three prediction tasks of relevance to the international development community:
Predicting poverty.
A large number of humanitarian applications, from poverty targeting to program monitoring, require accurate estimates of the welfare of beneficiary populations. Recently, several papers have shown how digital trace data can be used to estimate the socioeconomic status of individuals, households, and villages. For instance, Jean et al. (2016) show that daytime satellite imagery can be used to estimate village wealth; Quercia et al. (2012) find that Twitter data can be used to estimate levels of deprivation; and Blumenstock (2014) and Blumenstock, Cadamuro, and On (2015) show that mobile phone metadata can be used to estimate the welfare of individuals and regions.
Product adoption.
We focus on the adoption of “mobile money”, a suite of phone-based financial services that are designed to promote financial inclusion among those traditionally shut out of the formal banking ecosystem [Suri 2017]. Within this literature, our work relates most closely to Khan and Blumenstock (2016), who analyze the predictors of mobile money adoption in three different developing countries.
Gender prediction.
Gender equality and women’s empowerment are among the Sustainable Development Goals, and recent work explores how digital trace data can be used to assess progress toward this goal [Fatehkia, Kashyap, and Weber 2018]. Mislove et al. (2011) and Frias-Martinez, Frias-Martinez, and Oliver (2010) show that gender can be predicted from social media and mobile phone data.
Broadly, these prior studies demonstrate a proof of concept: that digital trace data can be used to predict the characteristics and outcomes of individuals. However, such analyses rely on off-the-shelf algorithms that rarely, if ever, account for the multi-view nature of real-world social networks. This paper shows that a simple approach to multi-view learning can yield substantial improvements on these real-world prediction tasks.
3 MultiGCN: Multi-View Graph Convolutional Networks
Our approach to semi-supervised learning on multi-view graphs integrates three steps, depicted in Figure 1. First, we use methods from subspace analysis to efficiently merge multiple views of the same graph. Second, we use a manifold ranking procedure to identify the most informative subcomponents of the graph and to prune the graph upon which learning is performed. Finally, we apply a convolutional neural network, adapted to graph-structured data, to allow for semi-supervised node classification.
3.1 Merging Subspace Representations
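Given an undirected multilayer graph $G = \{G_i\}_{i=1}^{M}$ with $M$ layers, such that each layer $G_i = (V, E_i)$ has the same vertex set $V$ (with $n = |V|$ vertices) but a possibly different edge set $E_i$, we first calculate the graph Laplacian for each of the individual layers. If $D_i$ and $A_i$ represent the degree matrix and the adjacency matrix for the $i$-th view of the graph, then the normalized graph Laplacian $L_i$ is defined as
$$L_i = D_i^{-1/2} (D_i - A_i) D_i^{-1/2} \tag{1}$$
Given the graph Laplacian $L_i$ for each layer of the graph, we calculate the spectral embedding matrix $U_i$ through trace minimization:
$$\min_{U_i \in \mathbb{R}^{n \times k}} \mathrm{tr}\left(U_i^{\top} L_i U_i\right), \quad \text{s.t. } U_i^{\top} U_i = I \tag{2}$$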
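The per-layer computation described above (normalized Laplacian, then the eigenvectors of its smallest eigenvalues) can be sketched in NumPy as follows. This is a minimal illustration on a hypothetical 4-node path graph, not the authors' implementation; the function names and the dense eigensolver are assumptions made for readability, and a sparse solver would be preferable on large graphs.

```python
import numpy as np

def normalized_laplacian(A):
    """Return D^{-1/2} (D - A) D^{-1/2} for a symmetric adjacency matrix A."""
    deg = A.sum(axis=1)
    # Isolated nodes have degree 0; keep their scaling at 0 to avoid dividing by zero.
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    return inv_sqrt[:, None] * (np.diag(deg) - A) * inv_sqrt[None, :]

def spectral_embedding(L, k):
    """Columns are the eigenvectors of L for its k smallest eigenvalues."""
    _, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, :k]

# Hypothetical toy view: a path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = normalized_laplacian(A)
U = spectral_embedding(L, k=2)
```

Because `eigh` returns orthonormal eigenvectors, the constraint that the embedding has orthonormal columns is satisfied automatically.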
This trace minimization problem can be solved by the Rayleigh-Ritz theorem. The solution $U_i$ contains the first $k$ eigenvectors, corresponding to the $k$ smallest eigenvalues of $L_i$. The spectral embedding maps the nodes of the original graph to a low-dimensional spectral domain (see Von Luxburg (2007) for details).
A Grassmann manifold $\mathcal{G}(k, n)$ can be considered as the set of $k$-dimensional linear subspaces in $\mathbb{R}^{n}$, where each unique subspace is mapped to a unique point on the manifold. Each point on the manifold can be represented by an orthonormal matrix $Y \in \mathbb{R}^{n \times k}$ whose columns span the corresponding $k$-dimensional subspace in $\mathbb{R}^{n}$, and the distance between subspaces can be calculated from the set of principal angles $\{\theta_j\}_{j=1}^{k}$ between them. Dong et al. (2014) show that the projection distance between two subspaces $Y_1$ and $Y_2$ can be written in terms of a matrix trace:
$$d_{proj}^{2}(Y_1, Y_2) = \sum_{j=1}^{k} \sin^{2}\theta_j = k - \mathrm{tr}\left(Y_1 Y_1^{\top} Y_2 Y_2^{\top}\right) \tag{3}$$
Based on Eq. 3, the projection distance between the target representative subspace $U$ and the individual subspaces $\{U_i\}_{i=1}^{M}$ can be calculated as:
$$d_{proj}^{2}\left(U, \{U_i\}_{i=1}^{M}\right) = \sum_{i=1}^{M} d_{proj}^{2}(U, U_i) = kM - \sum_{i=1}^{M} \mathrm{tr}\left(U U^{\top} U_i U_i^{\top}\right) \tag{4}$$
Minimization of Eq. 4 ensures that the individual subspaces are close to the final representative subspace $U$.
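The trace form of the projection distance can be checked numerically against its definition through principal angles. The sketch below assumes the standard result that the cosines of the principal angles are the singular values of the product of the two orthonormal bases; the random subspaces and function names are hypothetical, chosen only for illustration.

```python
import numpy as np

def projection_distance_sq(Y1, Y2):
    """Squared projection distance k - tr(Y1 Y1' Y2 Y2') between two k-dim subspaces."""
    k = Y1.shape[1]
    return k - np.trace(Y1 @ Y1.T @ Y2 @ Y2.T)

def principal_angles(Y1, Y2):
    """Principal angles: arccos of the singular values of Y1' Y2."""
    s = np.linalg.svd(Y1.T @ Y2, compute_uv=False)
    return np.arccos(np.clip(s, 0.0, 1.0))

# Two random 2-dimensional subspaces of R^6, represented by orthonormal bases.
rng = np.random.default_rng(0)
Y1, _ = np.linalg.qr(rng.standard_normal((6, 2)))
Y2, _ = np.linalg.qr(rng.standard_normal((6, 2)))

d_trace = projection_distance_sq(Y1, Y2)
d_angles = np.sum(np.sin(principal_angles(Y1, Y2)) ** 2)
```

The two quantities agree up to floating-point error, and the distance from a subspace to itself is zero.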
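Finally, to ensure that the original vertex connectivity in each graph layer is preserved, we include a separate term that minimizes the quadratic-form Laplacian (evaluated on the columns of $U$):
$$\min_{U \in \mathbb{R}^{n \times k}} \sum_{i=1}^{M} \mathrm{tr}\left(U^{\top} L_i U\right) + \alpha \left[ kM - \sum_{i=1}^{M} \mathrm{tr}\left(U U^{\top} U_i U_i^{\top}\right) \right], \quad \text{s.t. } U^{\top} U = I \tag{5}$$
In Eq. 5, $L_i$ and $U_i$ are the Laplacian and spectral embedding of the $i$-th layer, and $\alpha$ is the regularization parameter that balances the trade-off between the two terms in the objective function. Rearranging Eq. 5 and ignoring the constant terms yields
$$\min_{U \in \mathbb{R}^{n \times k}} \mathrm{tr}\left[ U^{\top} \left( \sum_{i=1}^{M} L_i - \alpha \sum_{i=1}^{M} U_i U_i^{\top} \right) U \right], \quad \text{s.t. } U^{\top} U = I \tag{6}$$
As before, the Rayleigh-Ritz theorem can be used to solve Eq. 5 in its rearranged form (Eq. 6). The solution is given by the first $k$ eigenvectors of the modified Laplacian:
$$L_{mod} = \sum_{i=1}^{M} L_i - \alpha \sum_{i=1}^{M} U_i U_i^{\top} \tag{7}$$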
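The merging step can be sketched as follows: sum the per-layer Laplacians, subtract the weighted projectors onto each layer's spectral embedding, and take the eigenvectors for the smallest eigenvalues of the result. This is a minimal sketch under stated assumptions, not the authors' code: the two toy views, the value `alpha=0.5`, and the helper names are hypothetical, and the Laplacian helper assumes no isolated nodes.

```python
import numpy as np

def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2}; equals D^{-1/2}(D - A)D^{-1/2} when no node is isolated."""
    s = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(A.shape[0]) - s[:, None] * A * s[None, :]

def smallest_eigvecs(S, k):
    """Eigenvectors of the symmetric matrix S for its k smallest eigenvalues."""
    _, vecs = np.linalg.eigh(S)
    return vecs[:, :k]

def merge_views(adjacencies, k, alpha):
    """Return the modified Laplacian and the merged k-dimensional embedding."""
    n = adjacencies[0].shape[0]
    L_mod = np.zeros((n, n))
    for A in adjacencies:
        L_i = normalized_laplacian(A)
        U_i = smallest_eigvecs(L_i, k)
        L_mod += L_i - alpha * (U_i @ U_i.T)
    return L_mod, smallest_eigvecs(L_mod, k)

# Two hypothetical views on the same 5 nodes: a path and a ring.
path = np.diag(np.ones(4), 1)
path = path + path.T
ring = path.copy()
ring[0, 4] = ring[4, 0] = 1.0

L_mod, U = merge_views([path, ring], k=2, alpha=0.5)
```

By the Rayleigh-Ritz theorem, the trace objective evaluated at `U` equals the sum of the `k` smallest eigenvalues of `L_mod`, which gives a simple sanity check on the result.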
3.2 Graph-Based Manifold Ranking
Though the modified Laplacian calculated above can be fed directly to the downstream graph convolutional networks, model performance can be increased by ranking the nodes in the manifold based on their saliency with respect to some critical nodes [Zhou et al. 2004b]. To rank points on the manifold, we use the closed-form function
$$f^{*} = (I - \beta L_{mod})^{-1} y \tag{8}$$
Here, $I$ represents the identity matrix, $L_{mod}$ is the modified Laplacian as calculated in Eq. 7, and $\beta$ is the regularization parameter. Given a vector $y$ indicating the query nodes, Eq. 8 calculates the saliency of the other nodes with respect to the query nodes; the saliency of these nodes can then be used to add or prune edges from the induced underlying graph. The use of manifold-based ranking suits our approach because the modified Laplacian representing the merged subspaces can be used directly for saliency detection. The query nodes can be selected as the centroids determined by any clustering algorithm over the manifold.
The algorithm for subspace merging and subsequent manifold ranking is shown in Algorithm 1. The time complexity of Algorithm 1 for a graph with $M$ layers and $n$ users per layer depends on the number of eigenvectors $k$ to be calculated and the number of centroids $c$. It is the sum of four costs: computing the Laplacians and eigenvector matrices for all $M$ layers; computing the modified Laplacian; computing $c$ clusters using k-means clustering; and manifold ranking using the iterative version described by [Zhou et al. 2004b].
Table 1: Summary statistics for each dataset.
Dataset  Data Type  Nodes  Edges (view 1)  Edges (view 2)  Classes  Features  Label Rate
Product Adoption  Phone logs (West Africa)  17,000  23,032  18,371  2  132  0.002 
Poverty Prediction  Phone logs (East Africa)  422  544  1,799  2  1,709  0.094 
Gender Prediction  Phone logs (South Asia)  958  992  978  2  821  0.042 
Citeseer  Citation network  3,327  4,732  3,492  6  3,703  0.036 
Cora  Citation network  2,708  5,429  2,846  7  1,433  0.052 
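The manifold ranking step of Section 3.2 can be sketched as a single linear solve. The sketch assumes the closed form of Zhou et al. (2004b), in which the scores are obtained by solving a linear system of the form (I - beta * S) f = y, with the modified Laplacian used in the role of S as the text describes. The 3-node matrix, the query set, and `beta=0.5` are arbitrary toy values chosen so that the system is well conditioned; they are not taken from the paper.

```python
import numpy as np

def manifold_rank(L_mod, query_idx, beta):
    """Solve (I - beta * L_mod) f = y, where y marks the query nodes with ones."""
    n = L_mod.shape[0]
    y = np.zeros(n)
    y[query_idx] = 1.0
    # Solving the linear system avoids forming the explicit matrix inverse.
    return np.linalg.solve(np.eye(n) - beta * L_mod, y)

# Hypothetical 3-node symmetric matrix standing in for the modified Laplacian.
L_mod = np.array([[ 1.0, -0.5,  0.0],
                  [-0.5,  1.0, -0.5],
                  [ 0.0, -0.5,  1.0]])
scores = manifold_rank(L_mod, query_idx=[0], beta=0.5)
order = np.argsort(-scores)  # node indices from highest to lowest score
```

How the resulting scores are thresholded to add or prune edges is not specified here, so that step is left out of the sketch.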
3.3 Graph Convolution Networks
The application of convolutional neural networks to irregular or non-Euclidean grids, such as graphs, is based on the fact that convolutions are multiplications in the Fourier domain, which implies that graph convolutions can be expressed as the multiplication of a signal $x$ with a filter $g_{\theta}$ (see Bruna et al. (2013)):
$$g_{\theta} \star x = U g_{\theta} U^{\top} x \tag{9}$$
Here, $U$ is the matrix of eigenvectors from the eigendecomposition of the normalized graph Laplacian $L = I - D^{-1/2} A D^{-1/2}$, and $I$, $D$, and $A$ represent the identity, degree, and adjacency matrices, respectively. Graph convolutions can be further expressed in terms of Chebyshev polynomials as
$$g_{\theta'} \star x \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{L}) x \tag{10}$$
where $\tilde{L} = \frac{2}{\lambda_{max}} L - I$ is the rescaled Laplacian, $T_k$ represents the Chebyshev polynomials, and $\theta'$ represents the vector of Chebyshev coefficients. Following Kipf and Welling (2017), by approximating the largest eigenvalue ($\lambda_{max} \approx 2$) and constraining the number of free parameters, the convolution operation can be represented as
$$Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta \tag{11}$$
where $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ are the renormalized versions of $A$ and $D$, $X$ is the matrix of node features, and $\Theta$ is the matrix of filter parameters. This renormalization avoids numerical instabilities resulting from exploding/vanishing gradients [Defferrard, Bresson, and Vandergheynst 2016].
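The Chebyshev expansion of a graph filter can be evaluated without any eigendecomposition by using the three-term recurrence T_k(z) = 2 z T_{k-1}(z) - T_{k-2}(z), with T_0 = 1 and T_1 = z. The sketch below applies such a filter to a signal on a hypothetical 3-node path graph; the function name, the coefficient values, and the default `lam_max=2.0` (the approximation mentioned above) are illustrative assumptions.

```python
import numpy as np

def chebyshev_filter(L, x, theta, lam_max=2.0):
    """Compute sum_k theta[k] * T_k(L_tilde) @ x using the Chebyshev recurrence."""
    n = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(n)  # rescale the spectrum into [-1, 1]
    t_prev, t_curr = x, L_tilde @ x            # T_0 x and T_1 x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * (L_tilde @ t_curr) - t_prev
        out = out + theta[k] * t_curr
    return out

# Normalized Laplacian of a hypothetical 3-node path graph.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
s = 1.0 / np.sqrt(A.sum(axis=1))
L = np.eye(3) - s[:, None] * A * s[None, :]

x = np.array([1.0, 0.0, -1.0])
filtered = chebyshev_filter(L, x, theta=[0.5, 0.3, 0.2])
```

Each additional polynomial order costs one more multiplication by the (sparse, in practice) rescaled Laplacian, which is what makes the expansion attractive on large graphs.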
The modified graph ($A_{mod}$ in Algorithm 1) resulting from the merger of Laplacians using subspace analysis and manifold ranking can be fed directly into the graph convolutional network defined above. The forward propagation model for a two-layer network can then be represented as
$$Z = \mathrm{softmax}\left( \hat{A}\, \mathrm{ReLU}\left( \hat{A} X W^{(0)} \right) W^{(1)} \right) \tag{12}$$
Here, $\hat{A} = \tilde{D}^{-1/2} \tilde{A}_{mod} \tilde{D}^{-1/2}$, with $\tilde{A}_{mod} = A_{mod} + I$ and $\tilde{D}$ its degree matrix, is calculated as a preprocessing step before giving the input to the neural network. $W^{(0)}$ and $W^{(1)}$ represent the input-to-hidden-layer and hidden-layer-to-output weight matrices of the two-layer neural network, and can be trained using gradient descent. ReLU and softmax are the activation functions in the hidden and output layers, respectively.
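The two-layer forward pass can be sketched in a few lines of NumPy. The sketch assumes the renormalization of Kipf and Welling (2017), in which self-loops are added before symmetric degree normalization; the toy adjacency matrix, feature matrix, layer sizes, and random weights are hypothetical, and no training loop is shown.

```python
import numpy as np

def renormalize(A):
    """Add self-loops, then apply symmetric degree normalization."""
    A_tilde = A + np.eye(A.shape[0])
    s = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # degrees are >= 1 because of self-loops
    return s[:, None] * A_tilde * s[None, :]

def gcn_forward(A_hat, X, W0, W1):
    """softmax(A_hat @ ReLU(A_hat @ X @ W0) @ W1), with a row-wise softmax."""
    H = np.maximum(A_hat @ X @ W0, 0.0)                    # hidden layer with ReLU
    logits = A_hat @ H @ W1
    logits = logits - logits.max(axis=1, keepdims=True)    # stabilize the softmax
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Hypothetical modified graph on 4 nodes (a star centered on node 1).
A_mod = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 1],
                  [0, 1, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
X = rng.standard_normal((4, 3))          # 3 features per node
W0 = 0.1 * rng.standard_normal((3, 8))   # input -> hidden (8 units)
W1 = 0.1 * rng.standard_normal((8, 2))   # hidden -> output (2 classes)

Z = gcn_forward(renormalize(A_mod), X, W0, W1)
```

Each row of `Z` is a probability distribution over the classes for one node; in the semi-supervised setting, the training loss would be computed only on the rows of labeled nodes.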
4 Experiments and Data
4.1 Datasets
Our first set of experiments tests MultiGCN on three prediction tasks relevant to international development. Each one uses a different dataset of mobile phone Call Detail Records (CDR), obtained from three different developing countries with GDP per capita less than USD 1,600. These datasets contain detailed metadata on all communication events (calls, messages) that occur on the mobile phone network. Each CDR dataset contains multiple possible relationships between nodes (views); we extract one view corresponding to phone calls between users, and another corresponding to text messages. We separately construct a large set of features for each user (such as total call volume and degree centrality), using the combinatoric approach described in Khan and Blumenstock (2016).
Table 1 presents summary statistics for each of these datasets. The connections and sparsity of each network are shown in Figure 2. These spy plots help visualize the structure of the adjacency matrices for each graph view, where a dot indicates that an edge exists between those two individuals on the corresponding view.
Table 2: Experimental results for the three developing-country datasets. Numbers indicate mean classification accuracy (percentage) ± standard error over 10 randomly selected dataset splits of equal size.
Method  Product Adoption  Poverty Prediction  Gender Prediction
DeepWalk (first view)  56.43 ± 0.187  51.91 ± 0.62  53.18 ± 0.55
DeepWalk (second view)  51.97 ± 0.112  50.34 ± 0.36  50.84 ± 0.64
DeepWalk (view union)  56.81 ± 0.114  50.87 ± 0.95  52.34 ± 0.50
Node2vec (first view)  53.87 ± 0.20  52.26 ± 0.58  50.12 ± 0.40
Node2vec (second view)  50.50 ± 0.11  49.70 ± 0.23  51.68 ± 0.40
Node2vec (view union)  54.50 ± 0.11  50.52 ± 0.63  51.64 ± 0.53
LINE (first view)  51.11 ± 0.01  50.15 ± 0.02  51.56 ± 0.001
LINE (second view)  50.83 ± 0.01  52.29 ± 0.001  50.00 ± 0.001
LINE (view union)  56.26 ± 0.003  50.18 ± 0.001  51.33 ± 0.002
GCN (first view)  70.74 ± 2.2  55.19 ± 2.33  63.97 ± 1.29
GCN (second view)  71.40 ± 1.81  50.06 ± 0.81  63.01 ± 0.013
GCN (view union)  71.90 ± 0.9  50.22 ± 0.56  63.90 ± 1.32
MultiGCN (this paper)  73.47 ± 0.91  59.23 ± 0.20  66.34 ± 1.03
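For reference, the summary statistic reported in Table 2 (mean accuracy and standard error over repeated splits) can be computed as sketched below. The accuracy values are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which estimator was used.

```python
import numpy as np

def mean_and_stderr(accuracies):
    """Mean and standard error of the mean (sample std / sqrt(number of splits))."""
    a = np.asarray(accuracies, dtype=float)
    return a.mean(), a.std(ddof=1) / np.sqrt(a.size)

# Hypothetical accuracies (percent) from four random splits.
accs = [70.0, 72.0, 74.0, 72.0]
mean_acc, stderr = mean_and_stderr(accs)
```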
Product adoption dataset
The first dataset that we use is a sample of a dataset of mobile phone activity from a West African country. Here, the classification of interest is whether or not the user eventually adopts a new financial inclusion product. There are two possible classifications: (1) did not adopt; (2) adopted and used the product. Following the experimental setup described in Kipf and Welling (2017), we randomly selected 20 users from each category (40 total) for the training dataset; the validation and testing datasets consist of 500 and 1,000 randomly selected users, respectively.
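The split protocol described above (a fixed number of labeled nodes per class for training, then disjoint validation and test sets drawn from the remaining nodes) can be sketched as follows. The labels are randomly generated stand-ins, and the function name and seeds are hypothetical; only the sizes (20 per class, 500, and 1,000) come from the text.

```python
import numpy as np

def make_split(labels, per_class, n_val, n_test, seed=0):
    """Sample per_class training nodes per class, then disjoint validation and test sets."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=per_class, replace=False)
        for c in np.unique(labels)
    ])
    # Shuffle the nodes not used for training and carve out validation and test sets.
    remaining = rng.permutation(np.setdiff1d(np.arange(labels.size), train))
    return train, remaining[:n_val], remaining[n_val:n_val + n_test]

# Hypothetical binary labels for 17,000 nodes.
labels = np.random.default_rng(1).integers(0, 2, size=17000)
train_idx, val_idx, test_idx = make_split(labels, per_class=20, n_val=500, n_test=1000)
```

Repeating the call with different seeds yields the kind of randomly drawn splits over which accuracies are averaged.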
Poverty prediction dataset
The poverty prediction dataset consists of several thousand transactions of different mobile phone users from an East African country. We attempt to classify users as poor or non-poor, where labels were obtained by Blumenstock, Cadamuro, and On (2015) through a small set of phone surveys that were conducted with mobile phone subscribers. Again, we randomly selected 20 users from each category as the training dataset, while the sizes of the validation and testing datasets are 100 and 200, respectively.
Gender prediction dataset
The gender prediction dataset originates from a developing country in South Asia. Here, the classification task is to predict the gender of the mobile phone users, where gender labels are provided by the operator for a small number of labeled instances. We randomly select 20 users from each category for training; the sizes of the validation and testing datasets are 100 and 800, respectively.
Citation classification datasets
A final set of experiments replicates the experimental design of Kipf and Welling (2017) to test MultiGCN on more standard node labelling tasks. In these datasets, nodes are documents and the first view corresponds to the citation links between the research papers. We construct the second view from the textual similarity of the papers. Specifically, if the normalized cosine similarity between two documents is greater than 0.8, then we create an edge between them in the second view of the citation network.
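The construction of the second view can be sketched as thresholding a cosine similarity matrix. The sketch assumes each document is represented by a row of a feature matrix (for example, bag-of-words indicators); the tiny matrix below and the function name are hypothetical, while the 0.8 threshold comes from the text.

```python
import numpy as np

def similarity_view(X, threshold=0.8):
    """Adjacency matrix with an edge wherever the cosine similarity exceeds the threshold."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0            # leave all-zero rows unscaled
    Xn = X / norms
    S = Xn @ Xn.T                      # pairwise cosine similarities
    A = (S > threshold).astype(float)
    np.fill_diagonal(A, 0.0)           # no self-loops
    return A

# Hypothetical bag-of-words rows for four documents.
X = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 0],
              [1, 1, 1]], dtype=float)
A_text = similarity_view(X)
```

In this toy example, documents 0 and 1 are identical, and both are close enough to document 3 to be linked, whereas document 2 remains unconnected.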
4.2 Experimental setup
In general, our goal is to correctly classify nodes in a network, where only a very small fraction of nodes are labeled. In the experiments, we start from a small sample of labeled nodes and test the ability of MultiGCN, as well as several state-of-the-art algorithms, to correctly classify unlabeled nodes in the validation and testing sets. We use three popular node embedding algorithms (Node2vec, DeepWalk, and LINE) as a first set of baselines. In addition, we provide three baselines based on graph convolutional networks [Kipf and Welling 2017]. The first two, GCN (first view) and GCN (second view), apply GCN over the two respective adjacency matrices from phone and text message activity. The third, GCN (view union), operates on the union of the adjacency matrices of the first view and the second view. In each GCN baseline, the node features are constructed from the adjacency matrix of the first view.
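The view-union baseline can be sketched as an element-wise logical OR of the two adjacency matrices. Treating the union as unweighted (an edge is present if it appears in either view) is an assumption, since the text does not say whether edge weights are combined; the two small matrices are hypothetical.

```python
import numpy as np

def view_union(A1, A2):
    """Unweighted union: an edge exists if it is present in either view."""
    return np.logical_or(A1 > 0, A2 > 0).astype(float)

# Hypothetical call and text-message views on three users.
A_calls = np.array([[0, 1, 0],
                    [1, 0, 0],
                    [0, 0, 0]], dtype=float)
A_texts = np.array([[0, 1, 1],
                    [1, 0, 0],
                    [1, 0, 0]], dtype=float)
A_union = view_union(A_calls, A_texts)
```

Unlike the subspace merging of Section 3.1, this baseline discards the information about which view an edge came from.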
After merging the different views, we rank the interactions between nodes using Eq. 8 based on their salience with respect to the query points. The value of the regularization parameter in the modified Laplacian (see Eq. 7) is selected through 10-fold cross-validation. We similarly tune the ranking hyperparameter (see Eq. 8) to 0.99 and set the number of query points to ten times the number of classes.
After adding salient edges and eliminating non-salient edges through the ranking process, both the adjacency matrix of the modified graph and the node features are passed as input to a two-layer graph convolutional network as described in Section 3. All of the GCN-based models, including MultiGCN, are trained for a maximum of 200 iterations using Adam (adaptive moment estimation, an extension of stochastic gradient descent; see Kingma and Ba (2014)) and a learning rate of 0.01. Other GCN hyperparameters are set using the same values reported in Kipf and Welling (2017).
Table 3: Classification accuracy (percentage) of MultiGCN and recent benchmarks on the citation datasets.
Method  Citeseer  Cora
Predefined train-test splits
ManiReg (first view) [Yang, Cohen, and Salakhutdinov 2016]  60.1  59.5
DeepWalk (first view) [Perozzi, Al-Rfou, and Skiena 2014]  43.2  67.2
Planetoid (first view) [Yang, Cohen, and Salakhutdinov 2016]  64.7  75.7
GCN (first view)  70.3  81.5
GCN (second view)  50.7  53.6
GCN (view union)  70.7  80.4
MultiGCN (this paper)  71.3  82.5
Randomized train-test splits
GCN (first view)  67.9 ± 0.5  80.1 ± 0.5
GCN (second view)  53.6 ± 0.1  56.9 ± 0.3
GCN (view union)  67.9 ± 0.3  78.5 ± 0.1
MultiGCN (this paper)  70.5 ± 0.2  81.1 ± 0.2
5 Results
Experimental results for the three developing-country datasets are shown in Table 2. Each row in this table indicates the average and standard error of the classification accuracy over 10 randomly drawn train-test splits of the same size for each dataset, constructed as described in Section 4. The last row in Table 2 shows the performance of MultiGCN. In all three datasets, MultiGCN outperforms existing state-of-the-art benchmarks, with the margin of improvement over the best baseline greatest in the poverty prediction task (59.23 vs. 55.19) and smallest in the product adoption task (73.47 vs. 71.90).
The second set of experimental results, comparing MultiGCN to recent benchmarks on a more standard node classification task, is shown in Table 3. In addition to performing a comparison over randomly drawn train-test splits, we also compare the performance of MultiGCN on the predefined train-test splits used in the original tests by Kipf and Welling (2017), with an additional validation set of 500 instances used for hyperparameter tuning. In all cases, we observe improvements in predictive accuracy of MultiGCN relative to existing approaches.
6 Discussion
This paper proposes a new approach to semi-supervised learning on multi-view graphs. Through a series of experiments, we show that this approach improves upon state-of-the-art embedding and convolution-based algorithms on a variety of prediction tasks related to both poverty research and to node labelling in general.
Relative to single-view learning algorithms, the main value of the MultiGCN approach is that it incorporates non-redundant information from multiple views into the learning process. Thus, the gains from MultiGCN depend on the prediction task and on the importance of multi-view graph structure to that task. Intuitively, this depends on the mutual information between the views. This intuition is also supported by a closer look at the results in Table 2. Here, we observe that while MultiGCN provides the biggest gains relative to DeepWalk, Node2vec, and LINE in the case of product adoption, the gains relative to single-view GCN are more modest. By contrast, the performance gain on the poverty and gender prediction tasks is substantially higher for MultiGCN, even relative to the single-view GCN benchmarks. The spy plots in Figures 2(a)–2(c) help explain this pattern. In particular, we can see that different views in the product adoption setting appear somewhat redundant, whereas for poverty and gender prediction the views appear more independent.
We believe future work should explore several limitations of the current analysis. In particular, there is much to be learned from a more systematic exploration of the value of additional views, and of different methods for merging views (beyond the subspace learning approach developed in Section 3.1). We are also exploring how graphs with varying degrees of sparsity and different fractions of labeled nodes affect the performance of MultiGCN relative to alternative approaches.
7 Conclusion
Graph convolutional networks have recently achieved considerable success in a variety of learning tasks on irregular, graph-structured data. Leveraging insights from spectral graph theory, GCNs are beginning to replicate the success that CNNs have seen on more regular image and text data. For a wide variety of learning tasks relevant to graph-structured data, in contexts ranging from advertising in online networks to intervening in the spread of a contagious disease, this is a promising development.
In this paper, we have shown that state-of-the-art GCNs can achieve even greater performance on a variety of classification tasks when the multi-view nature of the underlying network is incorporated into the learning process. While motivated by three applications in global poverty research, the performance gains appear to generalize to other graph-based classification problems. We therefore view MultiGCN as an important first step in adapting neural network-based approaches to multi-view networks and hope that it provides a foundation for future work in this space.
8 Acknowledgements
This research was supported by the National Science Foundation under award #CCF-1637360 (Algorithms in the Field) and by the Office of Naval Research (Minerva Initiative) under award N00014-17-1-2313.
References
[Blumenstock, Cadamuro, and On 2015] Blumenstock, J.; Cadamuro, G.; and On, R. 2015. Predicting poverty and wealth from mobile phone metadata. Science 350(6264):1073–1076.
[Blumenstock 2014] Blumenstock, J. E. 2014. Calling for Better Measurement: Estimating an Individual’s Wealth and Well-Being from Mobile Phone Transaction Records. In The 20th ACM Conference on Knowledge Discovery and Mining (KDD ’14), Workshop on Data Science for Social Good.
[Blumenstock 2016] Blumenstock, J. E. 2016. Fighting poverty with data. Science 353(6301):753–754.
[Bruna et al. 2013] Bruna, J.; Zaremba, W.; Szlam, A.; and LeCun, Y. 2013. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
[Defferrard, Bresson, and Vandergheynst 2016] Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, 3844–3852.
[Dong et al. 2014] Dong, X.; Frossard, P.; Vandergheynst, P.; and Nefedov, N. 2014. Clustering on multi-layer graphs via subspace analysis on Grassmann manifolds. IEEE Transactions on Signal Processing 62(4):905–918.
[Farseev et al. 2017] Farseev, A.; Samborskii, I.; Filchenkov, A.; and Chua, T.-S. 2017. Cross-domain recommendation via clustering on multi-layer graphs. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 195–204. ACM.
[Fatehkia, Kashyap, and Weber 2018] Fatehkia, M.; Kashyap, R.; and Weber, I. 2018. Using Facebook ad data to track the global digital gender gap. World Development 107:189–209.
[Francis, Blumenstock, and Robinson 2017] Francis, E.; Blumenstock, J.; and Robinson, J. 2017. Digital credit: A snapshot of the current landscape and open research questions. CEGA White Paper.
[Frias-Martinez, Frias-Martinez, and Oliver 2010] Frias-Martinez, V.; Frias-Martinez, E.; and Oliver, N. 2010. A gender-centric analysis of calling behavior in a developing economy using call detail records. In AAAI Spring Symposium: Artificial Intelligence for Development.
[Grover and Leskovec 2016] Grover, A., and Leskovec, J. 2016. Node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16. ACM.
[GSMA 2016] GSMA. 2016. Unlocking rural coverage: Enablers for commercially sustainable mobile network expansion. Technical report.
[Jean et al. 2016] Jean, N.; Burke, M.; Xie, M.; Davis, W. M.; Lobell, D. B.; and Ermon, S. 2016. Combining satellite imagery and machine learning to predict poverty. Science 353(6301).
[Khan and Blumenstock 2016] Khan, M. R., and Blumenstock, J. E. 2016. Predictors without borders: Behavioral modeling of product adoption in three developing countries. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16. ACM.
[Kingma and Ba 2014] Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kipf and Welling 2017] Kipf, T. N., and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).
[Liu, Wang, and Chang 2012] Liu, W.; Wang, J.; and Chang, S.-F. 2012. Robust and scalable graph-based semisupervised learning. Proceedings of the IEEE 100(9):2624–2638.
[Lu, Bengtsson, and Holme 2012] Lu, X.; Bengtsson, L.; and Holme, P. 2012. Predictability of population displacement after the 2010 Haiti earthquake. Proceedings of the National Academy of Sciences 109(29):11576–11581.
[Mikolov et al. 2013] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 3111–3119.
[Mislove et al. 2011] Mislove, A.; Lehmann, S.; Ahn, Y.-Y.; Onnela, J.-P.; and Rosenquist, J. N. 2011. Understanding the demographics of Twitter users. ICWSM 11(5th):25.
[Perozzi, Al-Rfou, and Skiena 2014] Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14. ACM.
[Quercia et al. 2012] Quercia, D.; Ellis, J.; Capra, L.; and Crowcroft, J. 2012. Tracking gross community happiness from tweets. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, 965–968. ACM.
[Suri 2017] Suri, T. 2017. Mobile money. Annual Review of Economics 9(1):497–520.
[Tang et al. 2015] Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; and Mei, Q. 2015. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, 1067–1077. International World Wide Web Conferences Steering Committee.
[Von Luxburg 2007] Von Luxburg, U. 2007. A tutorial on spectral clustering. Statistics and Computing 17(4):395–416.
[Wesolowski et al. 2015] Wesolowski, A.; Qureshi, T.; Boni, M. F.; Sundsøy, P. R.; Johansson, M. A.; Rasheed, S. B.; Engø-Monsen, K.; and Buckee, C. O. 2015. Impact of human mobility on the emergence of dengue epidemics in Pakistan. Proceedings of the National Academy of Sciences 112(38):11887–11892.
[Xu, Tao, and Xu 2013] Xu, C.; Tao, D.; and Xu, C. 2013. A survey on multi-view learning. arXiv preprint arXiv:1304.5634.
[Yang, Cohen, and Salakhutdinov 2016] Yang, Z.; Cohen, W. W.; and Salakhutdinov, R. 2016. Revisiting semi-supervised learning with graph embeddings. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, 40–48. JMLR.org.
[Zhou et al. 2004a] Zhou, D.; Bousquet, O.; Lal, T. N.; Weston, J.; and Schölkopf, B. 2004a. Learning with local and global consistency. In Advances in Neural Information Processing Systems, 321–328.
[Zhou et al. 2004b] Zhou, D.; Weston, J.; Gretton, A.; Bousquet, O.; and Schölkopf, B. 2004b. Ranking on data manifolds. In Advances in Neural Information Processing Systems, 169–176.
[Zhu 2005] Zhu, X. 2005. Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison.