1 Introduction
Convolutional Neural Networks (CNNs) have proven supremely successful at solving a wide variety of machine learning problems hinton2012deep . The stationarity of data and the metric of the grid unlock the possibility of designing a local convolutional kernel that linearly combines local features. With the power of deep architectures, the network can output high-level representations of both local features and universal structures of the signal. Even though CNNs have been successful in tasks where data have an underlying grid structure, e.g. text, images and videos, in many problems the data lie on an irregular grid or, more generally, in non-Euclidean domains, e.g. molecular data, social networks and knowledge instances. Such data are better structured as graphs, which are capable of handling varying node neighborhood connectivity and non-Euclidean metrics. Under such circumstances, the stationarity, locality and compositionality that allow kernel-based convolution and pooling in CNNs are no longer satisfied, and classical CNNs cannot directly work on graph-structured data.
However, a generalization of classical CNNs from regular grids to irregular graphs is not straightforward. For simplicity of kernel construction, many previous works assume the data still lie on a low-dimensional graph and that the training data share a unified graph Laplacian across the signal domain bruna2013spectral ; henaff2015deep . As a result, the graphs have to be of identical dimensions, which makes it impossible to construct an end-to-end deep learning pipeline that accepts arbitrary graph inputs. Moreover, current graph convolution layers do not deeply exploit the information given by vertex connectivity, due to the difficulty of designing a kernel flexible to varying neighborhoods atwood2016diffusion ; chen2016compressing . While some kinds of data on non-Euclidean domains, such as molecular data, have an underlying graph structure or come with prior knowledge of how to construct one, e.g. social networks, many others do not. So, it is necessary to estimate the similarity matrix before performing graph convolutions. The state-of-the-art graph construction methods are classified into unsupervised and supervised ones henaff2015deep . However, both graph constructions are accomplished before feeding data into the network; the generated graph structure therefore remains unchanged and is not updated during the training procedure bruna2013spectral ; henaff2015deep .
Although supervised graph construction with fully connected networks has been exploited in a DNN henaff2015deep , the dense training weights restrict it to small graphs. Moreover, the graph structure learned from a fully connected architecture is not guaranteed to best serve the convolutional neural network. To tackle these challenges, we introduce a new graph convolution layer embedded with metric learning, so that each convolution layer is able to dynamically construct and learn graph structures for each individual data sample in the batch based on the given supervised information. Directly learning the similarity matrix has $O(N^2)$ complexity for a graph of $N$ nodes. By harnessing supervised metric learning with the Mahalanobis distance, we can reduce the number of parameters to at most $O(d^2)$, or even $O(d)$, where $d$ is the feature dimension. As a consequence, the learning complexity becomes independent of the graph size $N$. In classical CNNs, back-propagation generally updates the kernel weights to adjust the relationship between neighboring nodes at each feature dimension individually, and then sums up the signals from all filters to construct the hidden-layer activations. To grant graph CNNs a similar capability, we propose a reparameterization on the feature dimension of graph data with additional weights and biases.
Even for data with an inherent graph structure, it is still interesting to ask whether the given graphs optimally serve the specific learning task based on the supervised information. For example, the chemical bonds connecting pairs of atoms directly lead to an underlying graph for each chemical compound. It is not hard to find that those chemical connections are not always the optimal information source for predicting the desired outputs of specific tasks. Consequently, there is an emerging need for approaches that automatically discover the hidden, task-related graph structures that boost the performance of graph CNNs on specific tasks. Motivated by deep residual learning he2016deep , we propose a residual graph Laplacian learning method, which is able to learn an optimal graph structure for each data sample and the prediction neural network simultaneously.
In this paper, we explore our approach primarily on chemical molecular datasets, although the network can be straightforwardly trained on other graph-structured data, such as point clouds, social networks and so on. Our contributions can be summarized as follows:

A novel spectral graph convolution layer boosted by Laplacian learning (SGC-LL) has been proposed to dynamically update the residual graph Laplacians via metric learning for deep graph learning.

Reparameterization on the feature domain has been introduced in $K$-hop spectral graph convolution to enable our proposed deep graph learning and to grant graph CNNs a capability of feature extraction on graph data similar to that of classical CNNs on grid data.

An evolving graph convolution network (EGCN) has been designed to be fed by a batch of arbitrarily shaped graph-structured data. The network is able to construct and learn, for each data sample, the graph structure that best serves the prediction part of the network. Extensive experimental results indicate the benefits of the evolving graph structure of data.
The rest of the paper is organized as follows. Section 2 reviews previous related works. Section 3 introduces the proposed spectral graph convolution boosted by residual Laplacian learning. Section 4 demonstrates both visual and numerical results. Section 5 concludes this paper.
2 Related Work
There have been many works exploring local receptive fields on grids krizhevsky2012imagenet ; coates2011selecting with deep learning. However, there are not so many works on generalizing deep convolutional networks to graph-structured data. The first trial of formulating a CNN analogy on irregular domains modeled as graphs was accomplished by bruna2013spectral , who investigated performing convolution on both the spatial and spectral domains of graph representations. Their work obtained a spatially localized filter by designing a smooth spectral kernel constructed by B-spline interpolation, but it only worked on low-dimensional graphs.
henaff2015deep further extended the spectral construction to a larger scale of high-dimensional graphs and proposed two graph construction methods, in both unsupervised and supervised fashion. Inspired by these works and based on graph signal processing (GSP) shuman2013emerging , defferrard2016convolutional introduced a new spectral graph theoretical formulation and used Chebyshev polynomials and their approximate evaluation scheme to reduce the computational cost and achieve localized filtering. kipf2016semi showed a first-order approximation to the Chebyshev polynomials as the graph filter spectrum, which requires fewer training parameters.
Besides the above papers on constructing convolution layers on graphs, many others studied the problem from a different angle. niepert2016learning first investigated learning a network from a set of heterogeneous graphs to predict node-level features as well as to do graph completion, although it is based on node sequence selection. atwood2016diffusion introduced a graph diffusion process, which delivers an effect equivalent to convolution, and atwood2016diffusion 's DCNN has no dependency on the indexing of nodes. Its constraints are the locality highly restricted by the diffusion process and the expensive dense matrix multiplication.
Recently, simonovsky2017dynamic investigated a problem similar to ours by learning an edge-conditioned feature weight matrix from edge features using a separate filter-generating network de2016dynamic , although simonovsky2017dynamic 's application is point cloud classification. There are other studies on learning from graph data, such as dai2016discriminative , which proposed a kernel embedding method on the feature space of graph-structured data. Another similar work is grover2016node2vec , but their models do not fall into the family of feed-forward CNN analogues on graphs.
For chemical compounds, naturally modeled as graphs, duvenaud2015convolutional ; wallach2015atomnet ; wu2017moleculenet made several successful trials of applying neural networks to learn representations for predictive tasks, which were usually tackled by hand-crafted features mayr2016deeptox or hashing weiss2009spectral . However, due to the constraints of spatial convolution, their models failed to make full use of the atom connectivities, which carry more than the bond features given by RDKit landrum2013rdkit . More recent explorations on progressive networks, multi-task learning and low-shot or one-shot learning have also been accomplished altae2016low ; gomes2017atomic . Lastly, Deepchem (https://github.com/deepchem/deepchem) is an outstanding open-source cheminformatics/machine learning benchmark; our code and demos were built and tested upon it.
3 Method
3.1 Spatial vs. Spectral Convolution
For constructing convolution operators on graph-structured data, there exist two major approaches: spatial construction and spectral construction. As their names imply, they manipulate the spatial and the spectral domain of graph signals, respectively. In particular, spatial convolution purely uses neighborhood information in terms of the graph adjacency matrix $A$ or a similarity matrix $S$. More formally, if the input at the $k$-th layer is $X_k$ of size $N \times d_k$, its output is formulated as bruna2013invariant ; wu2017moleculenet :

(1)   $X_{k+1} = \sigma\left( A X_k W_k \right)$

where $W_k$ is a $d_k \times d_{k+1}$ matrix that linearly maps each input feature dimension to the output features, possibly with $d_{k+1} \neq d_k$, and $\sigma$ is a nonlinearity. The non-zero entries of $A$ are where two nodes are connected. Apparently, this model makes it hard to induce weights shared across the spatial domain. The convolution of this type reduces to an analogue of a fully connected layer with a sparse regularization given by $A$ on the weight matrix. See Fig. 1 for explicit demonstrations of the spatial graph convolution and graph max pooling layers.
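As a minimal illustration of this spatial form, the sketch below implements one propagation step $X_{k+1} = \sigma(A X_k W_k)$ in numpy; the toy path graph, the one-hot features, and the ReLU nonlinearity are illustrative assumptions, not the exact configuration of any cited model:

```python
import numpy as np

def spatial_graph_conv(A, X, W):
    """One spatial graph convolution step (Eq.(1)-style):
    aggregate neighbor features through the adjacency matrix A,
    then linearly map feature dimensions with W and apply ReLU."""
    return np.maximum(A @ X @ W, 0.0)

# Toy 4-node path graph with self-loops so each node keeps its own signal.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
X = np.eye(4)                              # one-hot node features, d_k = 4
W = np.random.RandomState(0).randn(4, 2)   # maps d_k = 4 to d_{k+1} = 2
Y = spatial_graph_conv(A, X, W)            # Y has shape (4, 2)
```

Because `A` masks which nodes exchange information, the dense part of the layer is only the feature map `W`, which is exactly the "fully connected layer with sparse regularization" reading above.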
Compared to the spatial construction, spectral graph theory empowers us to build the convolution kernel on the spectral domain, which is more compact, and the spatial locality of the kernel is supported by the smoothness of the spectral multipliers. The baseline approach is built upon [Eq(3), defferrard2016convolutional ], which extended the one-hop spatial kernels bruna2013spectral to kernels that allow $K$-hop connectivities. According to the graph Fourier transform defferrard2016convolutional , if $U$ is the graph Fourier basis of the Laplacian $L$:

(2)   $y = U\, g_\theta(\Lambda)\, U^T x, \quad g_\theta(\Lambda) = \sum_{k=0}^{K-1} \theta_k \Lambda^k,$

where $\Lambda = \mathrm{diag}(\lambda_0, \ldots, \lambda_{N-1})$ holds the frequencies (eigenvalues) of the Laplacian $L$. Eq.(2) gives an elastic kernel that allows any pair of nodes with shortest path distance at most $K$ to squeeze in. Of course, far-away connectivity means less similarity and will be assigned less importance by $\theta_k$.
Recursive fast filtering. Evaluating Eq.(2) is expensive due to the dense matrix multiplications with $U$. Instead, $g_\theta(\Lambda)$ is approximated by Chebyshev coefficients and polynomials, $g_\theta(\Lambda) \approx \sum_{k=0}^{K-1} \theta_k T_k(\tilde{\Lambda})$ with $\tilde{\Lambda} = 2\Lambda/\lambda_{max} - I$, and the computation of $T_k$ is replaced by the recursion $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$ with $T_0 = 1$ and $T_1 = x$. Then the $K$-hop kernel becomes $g_\theta(\tilde{L}) = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L})$, still parameterized by a vector $\theta$ of size $K$. Consequently, the entire cost is reduced to $O(K|E|)$ from $O(N^2)$ because of the natural sparsity of $L$ defferrard2016convolutional .

Reparameterization on the feature domain. One major idea for graph CNNs is to exactly reconstruct the classical CNN on graphs. This way is tough, because a regularly shaped kernel is impossible on graphs. bruna2013invariant ; duvenaud2015convolutional simply bypass building kernels on the spatial domain, and instead give a feature transformation conditioned on edge distance bruna2013invariant or even node degree duvenaud2015convolutional . The spectral kernel of Eq.(2) is a promising attempt, but it distributes weights in the spatial domain similarly to a concentric zone model, which is still not as flexible as a convolution kernel on a grid. Besides, in a convolution layer of classical CNNs, the output activations combine filtered signals from all feature maps, in which separate kernels work independently. In other words, they do not only sum up features from their spatial neighbors, but also mine relationships with other feature dimensions. To mimic classical CNNs, we reparameterize the output of Eq.(2) by a feature-domain transformation matrix $W$ and a bias $b$. Intuitively, we divide the operations of classical CNNs on the spatial and feature domains into two consecutive stages: 1) compute the spatial kernel with $g_\theta$; 2) linearly map the input features to the output features. The layer after reparameterization is:

(3)   $Y = \sigma\left( \left( U\, g_\theta(\Lambda)\, U^T X \right) W + b \right)$
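A compact numpy sketch may make the recursive Chebyshev filtering followed by the feature-domain reparameterization of Eq.(3) concrete; the toy graph, the $K = 3$ filter, the $\lambda_{max} = 2$ rescaling, and the ReLU nonlinearity are illustrative assumptions:

```python
import numpy as np

def cheb_conv(L, X, theta, W, b, lmax=2.0):
    """K-hop spectral convolution via the Chebyshev recursion
    T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x), T_0 = 1, T_1 = x,
    followed by the reparameterization Y = relu(g_theta(L~) X @ W + b)."""
    K = len(theta)
    L_tilde = (2.0 / lmax) * L - np.eye(len(L))  # rescale spectrum into [-1, 1]
    Tx = [X, L_tilde @ X]                        # T_0(L~) X and T_1(L~) X
    out = theta[0] * Tx[0] + (theta[1] * Tx[1] if K > 1 else 0.0)
    for k in range(2, K):
        Tx.append(2.0 * L_tilde @ Tx[-1] - Tx[-2])
        out = out + theta[k] * Tx[-1]
    return np.maximum(out @ W + b, 0.0)          # feature map W, bias b, ReLU

# Toy 3-node path graph and its normalized Laplacian.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
d_inv_sqrt = np.diag(A.sum(1) ** -0.5)
L = np.eye(3) - d_inv_sqrt @ A @ d_inv_sqrt
X = np.eye(3)                                    # one-hot node features
theta = np.array([0.5, 0.3, 0.2])                # K = 3 hops
W = np.random.RandomState(0).randn(3, 2)         # 3 -> 2 feature mapping
Y = cheb_conv(L, X, theta, W, b=np.zeros(2))
```

Note how the recursion never diagonalizes $L$: only sparse-friendly products with $\tilde{L}$ appear, which is the source of the $O(K|E|)$ cost.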
3.2 Graph CNN with Laplacian Learning
The state-of-the-art methods on graph convolutional neural networks all utilize the graph Laplacian matrix in some way; the normalized graph Laplacian is used most often. Given the adjacency matrix $A$ and the degree matrix $D$ of a graph $G$, the normalized graph Laplacian matrix $L$ is:

(4)   $L = I - D^{-1/2} A D^{-1/2}$
As we know, $L$ defines both the node-wise connectivity and the degrees of the vertices. Some types of data have an inherent graph structure, such as chemical molecular data: each molecule is a graph with atoms as vertices and bonds as edges. Those chemical bonds can be verified by experiments and are even visible in some cases. But most data do not come with a given graph structure, so we have to construct graphs before feeding them to our deep networks. Besides the above two cases, the most likely case is that the inherent graph cannot sufficiently express all of the meaningful node-wise connectivities. For example, mayr2016deeptox proposed to predict the toxicity of drugs by learning representations of toxic substructures from labeled molecular samples. The graph directly given by a SMILES weininger1988smiles sequence does not tell anything about the toxicity; the model has to learn the atom connectivity that forms the substructures most related to toxicity. The discovered toxic substructure may happen to consist of existing bonds, e.g. a benzene ring, or not at all. Given this, the next question becomes: what defines a particularly good distance metric that best describes those hidden connectivities driven by the learning task?
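Eq.(4) is easy to check in a few lines of numpy; the guard for isolated (zero-degree) nodes is our own defensive assumption:

```python
import numpy as np

def normalized_laplacian(A):
    """Eq.(4): L = I - D^{-1/2} A D^{-1/2} for an adjacency matrix A."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0                       # guard isolated nodes against 1/0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    return np.eye(len(A)) - (d_inv_sqrt[:, None] * A) * d_inv_sqrt[None, :]

# Toy star graph: node 0 connected to nodes 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
L = normalized_laplacian(A)
# For an undirected graph, L is symmetric with eigenvalues in [0, 2].
```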
Supervised Metric Learning.
In the metric learning literature, algorithms are divided into supervised and unsupervised metric learning wang2015survey . Unsupervised metric selection picks the metric that works best for clustering the data samples: the optimal metric should minimize the intra-cluster distances and maximize the inter-cluster distances. For datasets that come with labels, the quality of a metric is instead determined by the learning loss; parameterized as part of the learning model, the metric converges to the optimum when the learning curve stabilizes. The generalized Mahalanobis distance between samples $x_i$ and $x_j$ is:

(5)   $\mathbb{D}(x_i, x_j) = \sqrt{(x_i - x_j)^T M (x_i - x_j)}$

If $M = I$, Eq.(5) reduces to the Euclidean distance. In the proposed EGCN, the symmetric positive semi-definite matrix $M = W_d W_d^T$, where $W_d$ is the trainable weight of the SGC-LL layer. The $W_d$ works as a transform basis to some domain in which we measure the Euclidean distance between $x_i$ and $x_j$. Then, we use that distance to calculate the Gaussian kernel $\mathbb{G}_{x_i, x_j} = \exp\left( -\mathbb{D}(x_i, x_j) / (2\sigma^2) \right)$. In our case, the optimal transform $W_d$ is the one able to generate the graphs that best fit the learning task. Although the distance formulation of Eq.(5) seems trivial, it is cheap to compute its gradient w.r.t. $W_d$ in back-propagation, which is the main source of computation in a DNN.
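The distance-to-similarity step can be sketched as follows; factoring $M = W_d W_d^T$ keeps $M$ positive semi-definite by construction, and the toy sizes and $\sigma$ are illustrative assumptions:

```python
import numpy as np

def gaussian_similarity(X, Wd, sigma=1.0):
    """Eq.(5)-style similarity: project features with Wd (so M = Wd Wd^T is
    PSD), take pairwise Euclidean distances in the projected space (i.e. the
    generalized Mahalanobis distance), then apply a Gaussian kernel."""
    Z = X @ Wd                                          # transform basis
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    D = np.sqrt(sq)                                     # Mahalanobis distance
    return np.exp(-D / (2.0 * sigma ** 2))              # Gaussian kernel G

rng = np.random.RandomState(0)
X = rng.randn(5, 8)      # 5 nodes with 8 features (toy sizes)
Wd = rng.randn(8, 4)     # trainable in the SGC-LL layer; random here
S = gaussian_similarity(X, Wd)
# S is symmetric with ones on the diagonal; with Wd = I it reduces to the
# plain Euclidean (M = I) case noted above.
```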
Learning the Residual Graph Laplacian. As discussed above, to discover the hidden correlations between nodes in a graph, we introduce the parameterized distance of Eq.(5) to update the Gaussian similarity matrix $S$ (i.e., the adjacency matrix after thresholding), and then use the updated $S$ to compute the normalized graph Laplacian (Eq.(4)). Because the distance parameters are randomly initialized, it may take long for the model to converge. To accelerate convergence and increase the stability of our model, we make the reasonable assumption that the optimal graph Laplacian $\hat{L}$ is a small shift from the original graph Laplacian $L$; in other words, the original graph Laplacian already discloses a large amount of helpful graph structural information. Consequently, instead of directly learning $\hat{L}$, we learn the residual graph Laplacian $L_{res} = L(M, X) - L$, so we have:

(6)   $\hat{L} = L + \alpha L_{res}$
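Given an original Laplacian and a learned similarity matrix, the residual update of Eq.(6) amounts to a simple blend; the helper below re-normalizes the learned similarity into a Laplacian, and the value $\alpha = 0.2$ is an illustrative assumption:

```python
import numpy as np

def residual_laplacian(L_orig, S, alpha=0.2):
    """Eq.(6) sketch: turn the learned similarity S into a normalized
    Laplacian, form the residual L_res = L(S) - L_orig, and blend as
    L_hat = L_orig + alpha * L_res."""
    deg = S.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    L_learned = np.eye(len(S)) - (d_inv_sqrt[:, None] * S) * d_inv_sqrt[None, :]
    L_res = L_learned - L_orig          # residual w.r.t. the given graph
    return L_orig + alpha * L_res

# Toy example: original Laplacian of a 3-node path, learned similarity S.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
dis = np.diag(A.sum(1) ** -0.5)
L0 = np.eye(3) - dis @ A @ dis
S = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.6],
              [0.3, 0.6, 1.0]])
L_hat = residual_laplacian(L0, S)
```

With $\alpha = 0$ the layer falls back to the given graph, so the blend behaves like a regularizer anchored at $L$.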
The proposed Laplacian-learning boosted spectral graph convolution layer is fed by a mini-batch of arbitrarily shaped graphs, and it performs convolution on the spectral domain with a $K$-hop elastic kernel of $O(K)$ training parameters; $O(d^2)$ additional parameters are the weights $W_d$ of the distance metric. In Fig. 2, the network consists of two SGC-LL layers, in which the two sets of graph Laplacians are updated independently and will probably diverge, because the inputs differ and the two layers work on different feature maps. In Section 4, our experimental results on multiple datasets indicate that for data with inherent graphs, e.g. drug data given as SMILES sequences, the original Laplacian is quite close to the optimal one. However, the small updates on graph connectivity within 20 epochs significantly raise the performance of the model. $L_{res}$ plays a role similar to a regularization on $\hat{L}$, and its weight is balanced by $\alpha$. For datasets without given graphs, we can use clustering algorithms, e.g. $k$-nearest neighbors or spectral clustering ng2001spectral , to construct graphs in an unsupervised fashion; using them as the initialization of the network is better than a purely random weight initializer. See Fig. 2 for details of the SGC-LL layer and the residual graph Laplacian learning procedure in this layer.

4 Experiments
Network Configuration of EGCN. The proposed network is named the evolving graph convolution network (EGCN), because it allows the graph structure to evolve according to the context of the learning task. Besides SGC-LL, it has a graph max pooling layer and a gathering layer gomes2017atomic . Max pooling on the graph is performed feature by feature: for each node $v$, the operator replaces the $j$-th feature of $v$ with the maximum among the original values from its neighbors and itself. The graph gather layer simply sums up the feature vectors of all nodes and outputs the sum as the representation of the graph, so we can use it for graph-level regression or classification. The motivation for embedding a bilateral filter in EGCN is to counter overfitting gadde2016superpixel : the evolving graph Laplacian definitely adapts the model to better fit the training data, but at the risk of overfitting. To prevent this, we introduce a revised bilateral filtering layer that regularizes the activation of the SGC-LL layer by augmenting the spatial locality of $L$. We also introduce batch normalization layers to accelerate training ioffe2015batch .

Batch Training of Non-uniformly Shaped Samples. One of the greatest challenges for graph CNNs is the varying shape of training graphs: 1) it raises the difficulty of designing kernels, because the invariance of the kernel on graphs is not satisfied and the node indexing sometimes matters; 2) resizing (clustering) bruna2013invariant is not reasonable for some types of graphs, such as molecular data: graph coarsening or pooling would lose significant atoms along with their features; 3) most deep learning APIs do not support training inputs of varying shapes in batch mode (Google's Tensorflow recently released a workaround looks2017deep , but due to time constraints we did not move our code to that framework). In this work, we bypassed the tensor shape constraint of Tensorflow. Samples have different numbers of nodes, so their graph Laplacians definitely differ, but they share all the model parameters. In the experiments, we almost reused the same parameter setup for all datasets: the batch size is 256; the optimizer is Adam with an exponentially decayed learning rate (0.9 every 50 iterations) beginning at 0.005; the maximum number of epochs is 50. We extracted 75 node features and 6 edge features.

4.1 Performance Boosted by the SGC-LL Layer
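For concreteness, the graph max pooling and gather operations used in the EGCN configuration can be sketched in numpy; the adjacency-mask formulation and the toy graph are illustrative assumptions:

```python
import numpy as np

def graph_max_pool(A, X):
    """Feature-wise graph max pooling: the j-th feature of node v becomes
    the max of x(j) over v and its neighbors, per the EGCN description."""
    mask = (A > 0) | np.eye(len(A), dtype=bool)   # each node plus neighbors
    return np.array([X[mask[v]].max(axis=0) for v in range(len(A))])

def graph_gather(X):
    """Gather layer: sum all node features into one graph-level vector."""
    return X.sum(axis=0)

# Toy 3-node path graph 0 - 1 - 2 with 2 features per node.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 1.0]])
P = graph_max_pool(A, X)   # node 1 sees all three nodes, so its row is [3, 2]
g = graph_gather(X)        # graph-level representation [4, 3]
```

Because both operators are defined per node and per feature, they apply unchanged to graphs of any size, which is what makes them compatible with the non-uniform batches described above.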
This experiment demonstrates the close correlation between the evolving graph Laplacian and model fitting. Fig. 3 shows four heat maps of the graph similarity matrix $S$, used to compute the evolving graph Laplacian, at the second SGC-LL layer. As shown in Fig. 4, the weighted loss dropped quickly during epochs 5-20, and so did the mean RMSE score. In the meanwhile, the graph Laplacians kept evolving according to the gradients back-propagated from the next layer. The white circles mark one of the major regions of intensity on $S$ that changed significantly during epochs 5-20: the connections between some pairs of nodes were reinforced (become lighter), while others were weakened (become darker). Besides, the two plots in Fig. 4 show that the EGCN network equipped with the proposed SGC-LL layer (red line) has overwhelmingly better performance in both convergence speed and predictive accuracy. We attribute this improvement to the supervised residual graph Laplacian learning scheme during training: the evolving graph Laplacians used in the spectral graph convolution fit the data better than a fixed graph Laplacian defferrard2016convolutional ; kipf2016semi .
Table 1: Task-averaged mean RMSE and standard deviations after 5-fold cross-validation.

Datasets                        | Delaney solubility | Az-logD      | NCI
GCNN bruna2013spectral          | 0.42225 8.38       | 0.75160 8.42 | 0.86958 3.55
NFP duvenaud2015convolutional   | 0.49546 2.30       | 0.95971 5.70 | 0.87482 7.50
GCN defferrard2016convolutional | 0.46647 7.07       | 1.04595 3.92 | 0.87175 4.14
SGC-LL                          | 0.30607 5.34       | 0.73624 3.54 | 0.86474 4.67
Table 2: Task-averaged classification scores on the pharmacological datasets.

Datasets                        | Tox21 Valid | Tox21 Test | ClinTox Valid | ClinTox Test | Sider Valid | Sider Test | Toxcast Valid | Toxcast Test
GCNN bruna2013spectral          | 0.7105      | 0.7023     | 0.7896        | 0.7069       | 0.5806      | 0.5642     | 0.6497        | 0.6496
NFP duvenaud2015convolutional   | 0.7502      | 0.7341     | 0.7356        | 0.7469       | 0.6049      | 0.5525     | 0.6561        | 0.6384
GCN defferrard2016convolutional | 0.7540      | 0.7481     | 0.8303        | 0.7573       | 0.6085      | 0.5914     | 0.6914        | 0.6739
SGC-LL                          | 0.7947      | 0.8016     | 0.9267        | 0.8678       | 0.6112      | 0.5921     | 0.7227        | 0.7033
4.2 Prediction on Chemical Molecular Datasets
Delaney Dataset (http://pubs.acs.org/doi/abs/10.1021/ci034243x) contains aqueous solubility data for 1,144 low-molecular-weight compounds. The most complex compound in the dataset has 492 atoms, while the smallest one consists of only 3 atoms. For organic compounds, we set the maximum node degree to 10. The NCI chemical compound database (https://cactus.nci.nih.gov/download/nci/) has around 20,000 training compound samples and 60 prediction tasks, from drug reaction experiments to clinical pharmacology studies. Lastly, the Az-logD dataset from ADME vugmeyster2012absorption is a set of compounds with logD measurements correlated to permeability. The presented task-averaged RMSE scores and standard deviations were obtained after 5-fold cross-validation.

To demonstrate our advantage, we compared our network with three state-of-the-art graph CNN benchmarks: the pioneering graph CNN (GCNN) bruna2013spectral , its spectral-domain extension to $K$-hop (GCN) defferrard2016convolutional , and neural fingerprints (NFP) duvenaud2015convolutional . In Table 1, our network reduced the mean RMSE by 31-40% on the Delaney dataset, by 15% on average on Az-logD, and by 2-4% on the testing set of NCI. The improvements come from the more meaningful representations extracted by the SGC-LL layer. First, the $K$-hop kernel via Eq.(2) offers connectivity that used to be impossible with spatial kernels bruna2013spectral ; duvenaud2015convolutional ; second, the reparameterization offers feature-domain filter mappings that were absent in defferrard2016convolutional . Besides, our residual Laplacian learning and updating scheme does learn a better graph structure that optimally fits the learning task during training, which makes more sense than graphs constructed by unsupervised clustering gadde2016superpixel or by separately trained networks henaff2015deep ; simonovsky2017dynamic .
4.3 Multi-task Classification on Pharmacological Datasets

The Tox21 Dataset mayr2016deeptox we used contains 7,950 chemical compounds. It has 12 classification tasks for different toxicity assays; however, not every sample carries all 12 labels. For samples with missing labels, we excluded them when computing the losses, but still kept them in the training set. ClinTox is a public dataset of 1,451 chemical compounds for clinical toxicological study, with labels for 2 tasks. The Sider database (http://sideeffects.embl.de/) records 1,392 drugs and their 27 different side effects or adverse reactions. Toxcast is another toxicological research database with 8,571 SMILES samples and labels for 617 predictive tasks. For multi-task prediction, the network graph model becomes an analogue of a K-ary tree with one leaf branch per task, each of which is a fully connected layer followed by logistic regression that generates the score for its task mayr2016deeptox . The scores displayed in Table 2 were averaged over all tasks. Obviously, our method greatly raises the classification accuracy on both small and large datasets, by even 5% on average over the 617 tasks of the Toxcast dataset.

5 Conclusions
We proposed a new spectral graph convolution layer that learns residual graph Laplacians via learning optimal metric weights. The proposed EGCN can be fed by a batch of arbitrarily shaped graph samples. For each sample, the network individually learns the graph structure that optimally expresses the hidden node-wise connectivity, and the training is driven by the supervised context of the learning task. Extensive experiments show that our evolving graph CNN outperforms the state-of-the-art methods on multiple datasets. In the future, we first plan to design a truly elastic spatial kernel on graphs. Second, the implementation of SGC-LL needs to be restructured and hopefully accelerated. Another interesting direction is to extend graph CNNs to applications such as natural language understanding and user-behavior prediction on social networks.
References
 (1) G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
 (2) J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and locally connected networks on graphs,” arXiv preprint arXiv:1312.6203, 2013.
 (3) M. Henaff, J. Bruna, and Y. LeCun, “Deep convolutional networks on graph-structured data,” arXiv preprint arXiv:1506.05163, 2015.
 (4) J. Atwood and D. Towsley, “Diffusion-convolutional neural networks,” in Advances in Neural Information Processing Systems, 2016, pp. 1993–2001.
 (5) W. Chen, J. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen, “Compressing convolutional neural networks in the frequency domain,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '16, 2016, pp. 1475–1484.
 (6) K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
 (7) A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
 (8) A. Coates and A. Y. Ng, “Selecting receptive fields in deep networks,” in Advances in Neural Information Processing Systems, 2011, pp. 2528–2536.
 (9) D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
 (10) M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in Neural Information Processing Systems, 2016, pp. 3837–3845.
 (11) T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
 (12) M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional neural networks for graphs,” in Proceedings of the 33rd annual international conference on machine learning. ACM, 2016.
 (13) M. Simonovsky and N. Komodakis, “Dynamic edge-conditioned filters in convolutional neural networks on graphs,” arXiv preprint arXiv:1704.02901, 2017.
 (14) B. De Brabandere, X. Jia, T. Tuytelaars, and L. Van Gool, “Dynamic filter networks,” in Neural Information Processing Systems (NIPS), 2016.
 (15) H. Dai, B. Dai, and L. Song, “Discriminative embeddings of latent variable models for structured data,” arXiv preprint arXiv:1603.05629, 2016.
 (16) A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 855–864.
 (17) D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, “Convolutional networks on graphs for learning molecular fingerprints,” in Advances in Neural Information Processing Systems, 2015, pp. 2224–2232.
 (18) I. Wallach, M. Dzamba, and A. Heifets, “Atomnet: a deep convolutional neural network for bioactivity prediction in structure-based drug discovery,” arXiv preprint arXiv:1510.02855, 2015.
 (19) Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande, “Moleculenet: A benchmark for molecular machine learning,” arXiv preprint arXiv:1703.00564, 2017.
 (20) A. Mayr, G. Klambauer, T. Unterthiner, and S. Hochreiter, “Deeptox: toxicity prediction using deep learning,” Frontiers in Environmental Science, vol. 3, p. 80, 2016.
 (21) Y. Weiss, A. Torralba, and R. Fergus, “Spectral hashing,” in Advances in neural information processing systems, 2009, pp. 1753–1760.
 (22) G. Landrum, “Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling,” 2013.
 (23) H. Altae-Tran, B. Ramsundar, A. S. Pappu, and V. Pande, “Low data drug discovery with one-shot learning,” arXiv preprint arXiv:1611.03199, 2016.
 (24) J. Gomes, B. Ramsundar, E. N. Feinberg, and V. S. Pande, “Atomic convolutional networks for predicting protein-ligand binding affinity,” arXiv preprint arXiv:1703.10603, 2017.
 (25) J. Bruna and S. Mallat, “Invariant scattering convolution networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 8, pp. 1872–1886, 2013.
 (26) D. Weininger, “Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules,” Journal of chemical information and computer sciences, vol. 28, no. 1, pp. 31–36, 1988.
 (27) F. Wang and J. Sun, “Survey on distance metric learning and dimensionality reduction in data mining,” Data Mining and Knowledge Discovery, vol. 29, no. 2, pp. 534–564, 2015.
 (28) A. Y. Ng, M. I. Jordan, Y. Weiss et al., “On spectral clustering: Analysis and an algorithm,” in NIPS, vol. 14, no. 2, 2001, pp. 849–856.
 (29) S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
 (30) M. Looks, M. Herreshoff, D. Hutchins, and P. Norvig, “Deep learning with dynamic computation graphs,” arXiv preprint arXiv:1702.02181, 2017.
 (31) Y. Vugmeyster, J. Harrold, and X. Xu, “Absorption, distribution, metabolism, and excretion (adme) studies of biotherapeutics for autoimmune and inflammatory conditions,” The AAPS journal, vol. 14, no. 4, pp. 714–727, 2012.
 (32) R. Gadde, V. Jampani, M. Kiefel, D. Kappler, and P. V. Gehler, “Superpixel convolutional networks using bilateral inceptions,” in European Conference on Computer Vision. Springer, 2016, pp. 597–613.