1 Introduction
Artificial intelligence, including machine learning, has irreversibly changed many fields, including science and engineering jordan2015machine ; kotsiantis2007supervised . In fact, the combination of artificial intelligence (AI) and big data has been referred to as the “fourth industrial revolution” schwab2017fourth . Nevertheless, machine learning tasks face several challenges.
First, while the big data challenge is well known, little attention is paid to the diverse data challenge. The success behind machine learning is that the behavior in unknown domains can be accurately estimated by quantitatively learning the pattern from sufficient training samples. However, while data sets in computer vision and image analysis often contain millions or billions of points, it is typically difficult to obtain large data sets in science jiang2020boosting . We often deal with diverse data originating from a relatively small data set lying in a huge space. For example, due to the complexity, ethics, and high cost of scientific experiments shaikhina2017handling ; shaikhina2015machine ; saha2016multiple ; hudson2000neural , researchers can typically collect only a relatively small set of drug candidates, on the order of tens, for a therapeutic target, while the underlying chemical space of potentially pharmacologically active molecules is astronomically large bohacek1996art . Therefore, researchers try to cover as many components as possible with a small number of sampling points. The diversity is created by deliberately sampling a wide distribution in the huge space to understand the landscape of potential drugs. This practice is very common in scientific explorations. Similar diverse data sets exist in materials design feng2019using ; zhang2018strategy . Overall, diverse data originating from a relatively small data set lying in a huge chemical space gives rise to a severe challenge for machine learning. Mathematically, diverse data involves disconnected submanifolds and/or nested submanifolds corresponding to the multiphysics and multiscale natures of the diversity, respectively chen2020evolutionary ; nguyen2020review . The multiphysics and multiscale representations of data have been addressed by the authors’ earlier work on element-specific persistent homology ZXCang:2017a ; ZXCang:2017b ; ZXCang:2017c ; cang2018representability . However, multiscale graph learning models have hardly been developed. The proposed algorithms of this paper fill the gap, addressing the multiphysics nature of data diversity through a multiphysics data representation, such as the element-specific feature extraction developed in ZXCang:2017a ; ZXCang:2017b ; ZXCang:2017c ; cang2018representability ; nguyen2019agl .

Second, the success of many existing approaches for machine learning tasks, such as data classification, is dependent on a sufficient amount of labeled samples. However, obtaining enough labeled data is difficult as it is time-consuming and expensive, especially in domains where only experts can determine experimental labels; thus, labeled data is scarce. As a result, the majority of the data embedded into a graph is unlabeled data, which is often much easier to obtain than labeled data but more challenging to predict. Overall, one of the key limitations of most existing approaches is their reliance on large labeled sets; in particular, deep learning approaches often require massive labeled sets to learn the patterns behind the data. These challenges call for innovative strategies to revolutionize the current state of the art.
Recently, algorithms involving the graph-based framework, such as those described in Section 2.1, have become some of the most competitive approaches for applications ranging from image processing to the social sciences. Such methods have been successful in part due to the many advantages offered by using a graph-based approach. For example, a graph-based framework provides valuable information about the extent of similarity between elements of both labeled and unlabeled data via a weighted similarity graph and also yields information about the overall structure of the data. Moreover, in addition to handling nonlinear structure, a graph setting embeds the dimension of the features in a graph during weight computations, thus reducing the high-dimensionality of the problem. The graph framework is also able to incorporate diverse types of data, such as 3D point clouds, hyperspectral data, text, etc.
Inspired by these recent successes, we address the aforementioned challenges by integrating similarity graph-based frameworks, multiscale structure, modified and adapted optimization techniques and semi-supervised procedures, with both labeled and unlabeled data embedded into a graph. Overall, this paper formulates two multiscale Laplacian learning (MLL) approaches for machine learning tasks, such as data classification, and for dealing with diverse data, data with limited samples and smaller data sets. The first approach, the multikernel manifold learning (MML) method, introduces multiscale kernels to manifold regularization. This approach integrates new multiscale graph Laplacians into loss-function-based minimization problems involving warped kernel regularizers. The second approach, the multiscale Merriman-Bence-Osher (MMBO) method, adapts and generalizes the classical Merriman-Bence-Osher (MBO) scheme merriman to a multiscale graph Laplacian setting for learning tasks. The MMBO approach also makes use of fast solvers, such as nystrom1 ; nystrom2 ; nystrom3 and anderson , for finding approximations of the extremal eigenvectors of the graph Laplacian. We validate the proposed MLL approaches using a variety of data sets.
There are several strengths of the proposed methods:

The methods address the multiscale nature of data through a multiphysics data representation, allowing them to perform well in the case of diverse data, which often occurs in, e.g., scientific applications.

The methods require less labeled training data to accurately classify a data set compared to most existing machine learning techniques, especially supervised approaches, and often in considerably smaller quantities. This is in part due to the usage of a similarity graph-based framework and the fact that the majority of the data embedded into the graph is unlabeled data. In fact, in most cases, a good accuracy can be obtained with at most 1%-5% of the data elements serving as labeled data. This is an important advantage due to the scarcity of labeled data for most applications. 
Although equally applicable and successful in the case of larger data, the new methods also perform well on smaller data sets, on which existing machine learning techniques often perform unsatisfactorily due to an insufficient number of labeled samples and a decreased ability of machine learning-based models to learn from the observed data.
The proposed MMBO method offers specific advantages:

Although it can perform just as successfully on smaller data, the MMBO algorithm is equipped with a structure which allows it to be easily adapted for use with large data. In particular, in the case of large data, one can use a slight modification of the fast Nyström extension procedure nystrom1 ; nystrom2 ; nystrom3
to compute an approximation to the extremal eigenvectors of the multiscale graph Laplacian using a dense graph without the need to compute all the graph weights; in fact, only a small portion of the weights need to be calculated. Overall, the method uses a lowdimensional subspace spanned by only a small number of eigenfunctions.

Once the eigenvectors of the graph Laplacian are computed, the complexity of this algorithm is linear. Moreover, the Nyström extension procedure allows the eigenvectors of the graph Laplacian to be computed using only $\mathcal{O}(m^2 N)$ operations, where $m \ll N$ and $N$ is the number of data elements.
The paper is organized as follows. In Section 2, we present background, previous work and an overview of graph learning methods. In Section 3, we derive the proposed MML and MMBO methods and provide details on the computation of eigenvectors of the graph Laplacian for the latter method. The results from experiments are described in Section 4, and we present a conclusion in Section 5.
2 Background
2.1 Previous work
In this section, we review recent graph-based methods for data classification and semi-supervised learning, including approaches related to convolutional neural networks, support vector machines, neural networks, label propagation, embedding methods, and multiview and multimodal methods.
Convolutional neural networks have recently been extended to a graph-based framework for the purpose of semi-supervised learning. In particular,
conv1 presents a scalable approach using graph convolutional networks by integrating a convolutional architecture motivated by a localized first-order approximation of spectral graph convolutions. Deeper insights into the graph convolutional neural network model are discussed in conv3 . Moreover, a dual graph-based convolutional network approach is described in conv2 , while a Bayesian graph convolutional network procedure is derived in conv4 . In conv5 , a multiscale graph convolution model is presented. In conv6 , generalizations of convolutional neural networks to signals defined on more general domains using two constructions are described; one of the generalizations is based on the spectrum of the graph Laplacian.

Neural networks have also been extended to a graph-based framework for the task of semi-supervised learning. For example, attention-based graph neural networks the , graph partition neural networks gpnn , and graph Markov neural networks qu have been proposed.
Moreover, support vector machines have also been applied to semi-supervised learning using a graph-based framework. In svm1 , graph-based support vector machines are derived to emphasize low density regions. Also, Laplacian support vector machines (LapSVM) svm2 ; belkin2006manifold and Laplacian twin support vector machines (LapTSVM) qi have been formulated.

Label and measure propagation methods are discussed in, e.g., iscen , where the authors derive a transductive label propagation method that is based on the manifold assumption. Label propagation techniques and the use of unlabeled data in classification are investigated in zhu . Dynamic label propagation is studied in dynamic , while semi-supervised learning with measure propagation is described in sub .
Embedding methods are also used for semi-supervised learning. Nonlinear embedding algorithms for use with shallow semi-supervised learning techniques, such as kernel methods, are applied to deep multilayer architectures in weston . Other graph embedding methods are presented in yang .
Multiview and multimodal methods include nie , which proposes a reformulation of a standard spectral learning model that can be used for multiview clustering and semi-supervised tasks. The work nie2 proposes a novel multiview learning approach, while gong describes multimodal curriculum learning.
Other techniques for graph-based semi-supervised learning include fast anchor graph regularization wang , a Bayesian framework for learning hyperparameters kapoor , and random subspace dimensionality reduction. In goldberg , a classification method is proposed to learn from dissimilarity and similarity information on labeled and unlabeled data using a novel graph-based encoding of dissimilarity. Random graph walks are used in lin , and sampling theory for graph signals is utilized in gadde . In greedy , a bivariate formulation for graph-based semi-supervised learning is shown to be equivalent to a linearly constrained max-cut problem. Lastly, reproducing kernel Hilbert spaces are used in sind .

Various approaches involving graph-based regularization terms include regularization frameworks zhou:bousquet:lal ; zhou:scholkopf , regularization developments chapelle:scholkopf:zien , anchor graph regularization wang , manifold regularization belkin2006manifold , measure propagation sub , approximate energy minimization boykov1 , nonlocal discrete regularization elmoataz:lezoray:bougleux , power watershed couprie:grady:najman , spectral matting levin:acha:lischinski , Laplacian regularized least squares sindhwani2005beyond , locality and similarity preserving embedding fang2014 , and clustering nie2014 . Examples of graph Laplacian regularization include label propagation zhu and deep semi-supervised embedding weston .
Merkurjev and coauthors have studied graph-based spectral approaches merkurjev ; garcia ; merkurjev_aml ; merkurjev2 ; gloria ; merkurjev_pagerank ; gerhart ; merkurjev_cut using Ginzburg-Landau techniques and modifications of the MBO scheme merriman , which is an efficient method for evolving an interface by mean curvature in a continuous setting and which can be linked to optimization problems involving the Ginzburg-Landau functional. Specifically, the MBO scheme can be derived from a Ginzburg-Landau functional minimization procedure, and can be modified and transferred to a graph setting using more general operators on graphs, as shown in Merkurjev’s work on data classification merkurjev ; garcia ; merkurjev2 ; gloria ; merkurjev_aml .
Overall, Merkurjev and coauthors have shown that multiclass data classification can be achieved using techniques from topological spaces and the Gibbs simplex garcia ; merkurjev_aml . In particular, MBO-like methods were developed for image processing applications merkurjev , hyperspectral imaging merkurjev2 ; gerhart , Cheeger and ratio cut applications merkurjev_cut , heat kernel pagerank applications merkurjev_pagerank , and unsupervised learning gloria . The subject of this paper is to integrate elements of this prior work, prior work on manifold learning and novel graph-based formulations into a multiscale framework to develop new multiscale graph-based methods for machine learning tasks, such as data classification. Our methods will be able to deal with a variety of scales present in many data sets.
2.2 Graph-based framework
The methods presented in this paper use a similarity graph framework consisting of a graph $G = (V, E)$, where $V$ is a set of vertices associated with the elements of the data set, and $E$ is a set of edges connecting some pairs of vertices. The edges are weighted by a weight function $w$, where $w(x_i, x_j)$ measures the degree of similarity between $x_i$ and $x_j$. Larger values indicate similar elements and smaller values indicate dissimilar elements. Naturally, the embedding of data into a graph depends greatly on the edge weights. This section provides more details about graph construction, but the exact manner of weight construction for particular data sets is described in Section 4.
The use of the graph-based framework offers many advantages. First, it provides valuable information about the extent of similarity between pairs of elements of both labeled and unlabeled data via a weighted similarity graph and also yields information about the overall structure of the data. This reduces the amount of labeled data needed for good accuracy. Moreover, a graph-based setting embeds the dimension of the features in the graph during weight computations, thus reducing the high-dimensionality of the problem. It also provides a way to handle nonlinearly separable classes and affords the flexibility to incorporate diverse types of data. In addition, in image processing, the graph setting allows one to capture texture more accurately.
The exact technique of computing the similarity value between two elements of data depends on the data set, but first involves feature (attribute) vector construction and a distance metric chosen specifically for the data and task at hand. For example, for hyperspectral data, one may choose the feature vector to be the vector of intensity values in its many bands and the distance measure to be the cosine distance. For 3D sensory data, one can take the feature vector to contain both geometric and color information; the weights can be calculated using a Gaussian function incorporating normal vectors, e.g., bae_merkurjev . For text classification, popular feature extraction methods include term frequency-inverse document frequency and bag-of-words, both described in bag . For biological data tasks, such as protein classification, persistent homology cang2018representability can be used for feature construction.
Once the features are constructed, the weights are computed. Popular weight functions include the Zelnik-Manor and Perona function zelnik and the Gaussian function luxberg :

$$w(x_i, x_j) = \exp\left(-\frac{d(x_i, x_j)^2}{\sigma(x_i)\,\sigma(x_j)}\right) \quad \text{and} \quad w(x_i, x_j) = \exp\left(-\frac{d(x_i, x_j)^2}{\sigma^2}\right), \qquad (1)$$

where $d(x_i, x_j)$ represents a distance between the feature vectors of data elements $x_i$ and $x_j$, $\sigma > 0$ is a global scale parameter, and $\sigma(x_i)$ is a local scale parameter, e.g., the distance from $x_i$ to its $M$-th nearest neighbor. Using the weight function $w$, one can construct a weight matrix $W$ defined as $W_{ij} = w(x_i, x_j)$, and define the degree of a vertex $x_i$ as $d_i = \sum_j w(x_i, x_j)$. If $D$ is the diagonal matrix with elements $d_i$, then the graph Laplacian is defined as

$$L = D - W. \qquad (2)$$
It is sometimes beneficial to use normalized versions of the graph Laplacian, such as a symmetric graph Laplacian luxberg .
For some data, it is more desirable to compute the weights directly by calculating pairwise distances. In this case, the efficiency can be increased by using parallel computing or by reducing the dimension of the data. The graph is then often made sparse using, e.g., thresholding or a nearest neighbor technique, resulting in a graph where most of the edge weights are zero; thus, the number of needed computations is reduced. Overall, a nearest neighbor graph can be computed efficiently using the tree code of the VLFeat library vlfeat . In particular, for the $k$ nearest neighbor technique, vertices $x_i$ and $x_j$ are connected only if $x_i$ is among the $k$ nearest neighbors of $x_j$ or if $x_j$ is among the $k$ nearest neighbors of $x_i$. Otherwise, $w(x_i, x_j)$ is set to $0$.
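As an illustration, the construction above (Gaussian weights, a symmetrized nearest neighbor sparsification, and the unnormalized Laplacian $L = D - W$) can be sketched in a few lines; the function name and defaults below are ours, and a production implementation would use a tree-based neighbor search such as the one in the VLFeat library rather than dense distance computations:

```python
import numpy as np

def gaussian_knn_laplacian(X, k=10, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W for data X (n x d).

    Weights use the Gaussian function w_ij = exp(-d_ij^2 / sigma^2);
    the graph is sparsified by keeping an edge only if one endpoint is
    among the k nearest neighbors of the other (symmetrized kNN graph).
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances (dense; fine for small n)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / sigma**2)
    np.fill_diagonal(W, 0.0)
    # keep the k nearest neighbors of each vertex (index 0 is the vertex itself)
    idx = np.argsort(sq, axis=1)[:, 1:k + 1]
    mask = np.zeros_like(W, dtype=bool)
    rows = np.repeat(np.arange(n), k)
    mask[rows, idx.ravel()] = True
    mask = mask | mask.T          # edge kept if either endpoint selects it
    W = np.where(mask, W, 0.0)
    D = np.diag(W.sum(axis=1))
    return D - W
```

The returned matrix is symmetric and positive semi-definite with zero row sums, as required of a graph Laplacian.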
For very large data sets, one can efficiently construct an approximation to the full graph using, e.g., sampling-based approaches, such as the fast Nyström extension method nystrom1 .
2.3 Semi-supervised setting
Despite the tremendous accomplishments of machine learning, its success depends on a sufficient amount of labeled samples. However, obtaining enough labeled data is difficult as it is time-consuming and expensive. Therefore, labeled data is scarce for most applications.
However, unlabeled data is usually easier and less costly to obtain than labeled data. Therefore, it is advantageous to use a semi-supervised setting, which uses both labeled and unlabeled data to construct the graph in order to reduce the amount of labeled data needed for good accuracy. In fact, the use of unlabeled data for graph construction allows one to obtain structural information about the data. Overall, for most graph-based semi-supervised methods, the majority of data embedded into a graph is unlabeled data. This paper derives methods which use a semi-supervised setting of this kind.
3 Methods
3.1 Background and related graph Laplacian methods
3.1.1 Manifold learning
For the derivation of the MML method, let $K$ denote the number of classes, $\{(x_i, y_i)\}_{i=1}^{l}$ the set of labeled vertices, and $\{x_j\}_{j=l+1}^{l+u}$ the set of unlabeled vertices. We assume that the labeled data is drawn from the joint distribution $P$ on $X \times Y$, while the unlabeled data is drawn from the marginal distribution $P_X$ of $P$. We also assume that the conditional distribution $P(y|x)$ varies smoothly in the intrinsic geometry generated by $P_X$, where $x \in X$ and $y \in Y$.
In graph-based methods, information about the labeled data and the geometric structure of the marginal distribution of the unlabeled samples is incorporated into the problem:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \gamma_I \|f\|_I^2, \qquad (3)$$

where the Mercer kernel $K: X \times X \to \mathbb{R}$ uniquely defines a reproducing kernel Hilbert space (RKHS) $\mathcal{H}_K$ with the corresponding norm $\|\cdot\|_K$, $V$ is a loss function which gives rise to different types of regularization problems, $\gamma_A \geq 0$, $\gamma_I \geq 0$, and $\|f\|_I$ is an additional regularizer that reflects the intrinsic geometry of $P_X$. The solution to (3) can be described using the classical representer theorem representer :

$$f^*(x) = \sum_{i=1}^{l} \alpha_i K(x_i, x) + \int_{\mathcal{M}} \alpha(y) K(x, y)\, dP_X(y), \qquad (4)$$

where $\mathcal{M}$ is the support of the marginal distribution $P_X$ belkin2006manifold .

In practice, the marginal distribution $P_X$ is unknown. In spite of that, one can empirically estimate $\|f\|_I$ by making use of the weighted graph discussed in Section 2.2. With the predefined graph Laplacian matrix $L$, the manifold regularizer can be empirically estimated belkin2006manifold as

$$\|f\|_I^2 \approx \frac{1}{(l+u)^2}\, \mathbf{f}^T L\, \mathbf{f}, \qquad (5)$$

where $\mathbf{f} = (f(x_1), \ldots, f(x_{l+u}))^T$.
The ambient norm $\|f\|_K$ and the intrinsic norm $\|f\|_I$ in (3) can be integrated in one term under the warped kernel $\tilde{K}$ sindhwani2005beyond . This kernel defines an alternative reproducing kernel Hilbert space $\tilde{\mathcal{H}}$ by considering a modified inner product:

$$\langle f, g \rangle_{\tilde{\mathcal{H}}} = \langle f, g \rangle_{\mathcal{H}_K} + \mathbf{f}^T M \mathbf{g}, \qquad (6)$$

where $M$ is a positive semi-definite matrix defined on the labeled and unlabeled data, $\mathbf{f} = (f(x_1), \ldots, f(x_{l+u}))^T$, and $\mathbf{g} = (g(x_1), \ldots, g(x_{l+u}))^T$. With this choice of inner product, the warped kernel $\tilde{K}$ is shown in sindhwani2005beyond to have the following representation:

$$\tilde{K}(x, z) = K(x, z) - \mathbf{k}_x^T (I + MK)^{-1} M\, \mathbf{k}_z, \qquad (7)$$

where $K$ is the Gram matrix with $K_{ij} = K(x_i, x_j)$, $\mathbf{k}_x$ denotes the vector $(K(x_1, x), \ldots, K(x_{l+u}, x))^T$, and $\mathbf{k}_z$ denotes the vector $(K(x_1, z), \ldots, K(x_{l+u}, z))^T$.
The regularization problem for the warped kernel is:

$$f^* = \arg\min_{f \in \tilde{\mathcal{H}}} \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma \|f\|_{\tilde{\mathcal{H}}}^2. \qquad (8)$$

Problem (8) exploits the intrinsic geometry of $P_X$ via the data-dependent kernel $\tilde{K}$ but still makes use of the classical regularization solvers. In fact, the classical representer theorem representer allows $f^*$ in (8) to be expressed as:

$$f^*(x) = \sum_{i=1}^{l} \alpha_i \tilde{K}(x_i, x). \qquad (9)$$

In practice, the coefficients $\alpha_i$ are numerically determined by an appropriate optimization solver, e.g., a support vector machine cortes1995 .
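To make the warped-kernel computation concrete, the following sketch (our naming) evaluates the Gram matrix of $\tilde K$ on the point cloud from a base Gram matrix and a deformation matrix $M$, e.g. a multiple of the graph Laplacian; with $M = 0$ the base kernel is recovered:

```python
import numpy as np

def warped_gram(K, M):
    """Gram matrix of the warped (deformed) kernel on the data points.

    K : (n, n) Gram matrix of the base Mercer kernel, K[i, j] = K(x_i, x_j).
    M : (n, n) positive semi-definite deformation matrix, e.g. gamma * L
        for a graph Laplacian L.
    Implements the point-cloud restriction of the warped-kernel formula
        K_tilde = K - K (I + M K)^{-1} M K.
    """
    n = K.shape[0]
    MK = M @ K
    return K - K @ np.linalg.solve(np.eye(n) + MK, MK)
```

Passing the resulting Gram matrix to any standard kernel solver (e.g. an SVM with a precomputed kernel) then realizes the manifold-regularized problem with classical machinery.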
3.1.2 MBO reduction
For the derivation of the MMBO method, we first note that a typical learning algorithm involves finding an optimal label matrix $U = (u_1, \ldots, u_N)^T$ associated with the $N$ data elements, where the row $u_i$ represents the probability distribution over the $K$ classes for data element $x_i$; the $i$-th row of $U$ is set to $u_i$. The vector $u_i$ is an element of the Gibbs simplex:

$$\Sigma^K := \Big\{ (z_1, \ldots, z_K) \in [0, 1]^K : \sum_{k=1}^{K} z_k = 1 \Big\}, \qquad (10)$$

where $K$ is the number of classes. Moreover, the $k$-th vertex of the simplex is given by the unit vector $e_k$.
A general form of a graph-based problem for data classification is the minimization of $E(U) = R(U) + F(U)$, where $U$ is the data classification matrix, $R(U)$ is a graph-based regularization term incorporating the graph weights, and $F(U)$ is a term incorporating labeled points.
Not surprisingly, the choice of the regularization term has nontrivial consequences for the final accuracy. In garcia , Garcia et al. successfully take for the regularization term a multiclass graph-based Ginzburg-Landau (GL) functional:

$$E(U) = \frac{\epsilon}{2} \langle U, L_s U \rangle + \frac{1}{2\epsilon} \sum_{i} \Big( \prod_{k=1}^{K} \frac{1}{4} \|u_i - e_k\|_{L_1}^2 \Big) + \sum_{i} \frac{\mu_i}{2} \|u_i - \hat{u}_i\|_{L_2}^2. \qquad (11)$$

Here, $\langle U, L_s U \rangle = \mathrm{trace}(U^T L_s U)$, $L_s$ is a normalized graph Laplacian, $K$ is the number of classes, $\epsilon > 0$, $u_i$ is the $i$-th row of $U$, $\hat{u}_i$ is a vector indicating prior class knowledge of $x_i$, $e_k$ is an indicator vector of size $K$ with one in the $k$-th component and zero elsewhere, and $\mu_i$ is a parameter that takes the value of $\mu > 0$ if $x_i$ is a labeled data element and zero otherwise. The variable $\hat{u}_i = e_k$ if $x_i$ is a labeled element of class $k$. The first (smoothing) term in (11) measures variations in the vector field, while the second (potential) term in (11) drives the system closer to the vertices of the simplex. The third (fidelity) term enables the incorporation of labeled data.
While it is possible to develop a convex splitting scheme to minimize the graph-based multiclass GL energy garcia , a more efficient technique involves an MBO reduction. Specifically, if one considers the minimization of the GL functional plus a fidelity term (consisting of a fit to elements of known class) in the continuous case, one can apply gradient descent, resulting in a modified Allen-Cahn equation. If a time-splitting scheme is then applied, one obtains a procedure where one alternates between propagation using the heat equation with a forcing term and thresholding. In this form, the resulting procedure has similar elements to the MBO scheme mbo , which evolves an interface by mean curvature in a continuous, rather than graph-based, setting. The procedure can then be transferred to a graph-based setting using the techniques of merkurjev ; garcia ; merkurjev_aml . Moreover, in order for the scheme to be applicable to the multiclass case, one can convert the thresholding operation to the displacement of the vector field variable towards the closest vertex in (10) merkurjev ; garcia ; merkurjev_aml .
3.2 The derivation of the multiscale setting and the proposed methods
3.2.1 Multiscale graph Laplacian operator
The advantage of multiscale information over single-scale information has been demonstrated in various biophysics-related works, such as those involving thermal fluctuation predictions opron ; xia and binding affinity predictions nguyen . Therefore, it is promising to explore how the multiscale approach can improve the accuracy of graph-based data classification. We examine a novel multiscale graph Laplacian of the form
$$L_M = \sum_{n=0}^{N-1} \beta_n L_n, \qquad (12)$$

where $\beta_n \geq 0$ are scale weights, and $L_n$ is an extended Laplacian matrix defined by $L_n = D_n - A_n$, where $D_n$ is a degree matrix, and $A_n$ is an extended adjacency matrix:

$$(A_n)_{ij} = H_n\!\left(\frac{d(x_i, x_j)}{\sigma_n}\right) \exp\!\left(-\frac{d(x_i, x_j)^2}{\sigma_n^2}\right), \qquad (13)$$

where $\sigma_n > 0$ and $H_n$ is the $n$-th order Hermite polynomial. Usually, only two or three multiscale Laplacian terms in (12), i.e., $N = 2$ or $N = 3$, are needed to obtain a significant improvement in accuracy; by setting $N = 1$ and $H_0 = 1$, one can restore the regular graph Laplacian discussed in (2). In this formulation, $\sigma_n$ is an automated scale filtration parameter that controls the shape of a submanifold for a data set, while $\beta_n$ weighs the contributions from different scales. The parameters $\sigma_n$ and $\beta_n$ may vary for different Hermite polynomials.
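As a sketch, one plausible implementation of such a multiscale operator, assuming Hermite-polynomial-modulated Gaussian weights as described above (our reading; the authors' exact modulation and parameter choices may differ), is:

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def multiscale_laplacian(X, betas, sigmas):
    """Multiscale graph Laplacian  L_M = sum_n beta_n (D_n - A_n)  with
    Hermite-modulated Gaussian adjacency matrices (a hypothetical reading
    of Eqs. (12)-(13)):

        (A_n)_ij = H_n(d_ij / sigma_n) * exp(-d_ij^2 / sigma_n^2),

    where H_n is the n-th (physicists') Hermite polynomial.  With a single
    term (n = 0, H_0 = 1) this reduces to the ordinary Laplacian L = D - W.
    """
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    L = np.zeros((X.shape[0], X.shape[0]))
    for n, (beta, sigma) in enumerate(zip(betas, sigmas)):
        coeffs = np.zeros(n + 1)
        coeffs[n] = 1.0                       # selects H_n in hermval
        A = hermval(d / sigma, coeffs) * np.exp(-(d / sigma) ** 2)
        np.fill_diagonal(A, 0.0)
        L += beta * (np.diag(A.sum(axis=1)) - A)
    return L
```

Each term has zero row sums by construction, so the multiscale operator inherits the basic algebraic structure of a graph Laplacian.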
In the case of large data for which computing all the graph weights can be computationally expensive, one can use the Nyström extension method nystrom1 ; nystrom2 ; nystrom3 to compute approximations to the few smallest eigenvalues and corresponding eigenvectors of the multiscale graph Laplacian while calculating only a small fraction of the graph weights. We will modify the Nyström procedure to incorporate the new multiscale graph Laplacian $L_M$. In this case, the weights in the procedure are computed using

$$w(x_i, x_j) = \sum_{n=0}^{N-1} \beta_n H_n\!\left(\frac{d(x_i, x_j)}{\sigma_n}\right) \exp\!\left(-\frac{d(x_i, x_j)^2}{\sigma_n^2}\right), \qquad (14)$$

where, in most cases, $N = 2$ or $N = 3$ is enough to obtain a significant accuracy improvement.
3.2.2 Multikernel manifold learning (MML) scheme
In multikernel manifold learning (MML), the multiscale Laplacian matrices proposed in (12) are employed to form nearest neighbor subgraphs. By using the multiscale Laplacian in the matrix $M$ in (7), we attain an MML scheme enabling the reconstruction of the regularization problem presented in (3). Even with the integration of the multiscale Laplacian operator into the data kernel, the manifold learning algorithm still retains its classical representation as presented in (8). One could therefore utilize traditional solvers to derive the multiscale manifold learning optimizer sindhwani2005beyond . The MML procedure is summarized as Algorithm 1.
3.2.3 Multiscale MBO (MMBO) scheme
Our proposed MMBO scheme uses a semi-implicit approach where the multiscale Laplacian term is computed implicitly due to the stiffness of the operator, which is caused by a wide range of its eigenvalues. An implicit term is needed since an explicit scheme requires all scales of eigenvalues to be resolved numerically.
To derive the MMBO scheme, let $U$ represent a matrix where each row is a probability distribution of a data element over the classes, and let $u_i$ represent the $i$-th row of $U$. In addition, let $N$ be the number of data set elements, $K$ be the number of classes, $dt > 0$ be the time step, and $\boldsymbol{\mu}$ be a vector which takes a value $\mu > 0$ in the $i$-th place if $x_i$ is a labeled element and $0$ otherwise. Moreover, let $\hat{U}$ be the following matrix: for rows corresponding to labeled points, the entry corresponding to the class of the labeled point is set to 1, and all other entries of the matrix are set to 0. Lastly, let $\odot$ indicate row-wise multiplication by a scalar.

As described in Section 3.1.2, if one considers the minimization of a GL functional plus a fit to elements of known class in the continuous case, gradient descent results in a modified Allen-Cahn equation. If a time-splitting scheme is then applied, one obtains a procedure where one alternates between propagation using the heat equation with a forcing term and thresholding. The scheme can then be transferred to a graph-based setting, and the Laplace operator can be replaced by a graph-based multiscale Laplacian. The thresholding can be changed to the displacement of the variable towards the closest vertex in (10). A projection to the simplex is then necessary before the displacement step.
Our proposed MMBO procedure thus consists of the following steps. Starting with an initial guess for $U$, obtain the next iterate $U^{n+1}$ from $U^n$ via the following three steps:

Multiscale heat equation with a forcing term: $U^{n+\frac{1}{2}} = (I + dt\, L_M)^{-1} \big( U^n - dt\, \boldsymbol{\mu} \odot (U^n - \hat{U}) \big)$, where $\boldsymbol{\mu}$ is a vector which takes the value $\mu > 0$ in the $i$-th place if $x_i$ is a labeled element and $0$ otherwise, and $\odot$ indicates row-wise multiplication by a scalar. 
Projection to simplex: each row of $U^{n+\frac{1}{2}}$ is projected onto the simplex using chen .

Displacement: $u_i^{n+1} = e_k$, where $u_i^{n+1}$ is the $i$-th row of $U^{n+1}$, and $e_k$ is the indicator vector (with a value of 1 in the $k$-th place and 0 elsewhere) associated with the vertex in the simplex (10) closest to the $i$-th row of the projected $U^{n+\frac{1}{2}}$ from step 2.
This semi-implicit scheme allows the evolution of $U$ to be numerically stable regardless of the time step $dt$, in spite of the “stiffness” of the differential equations, which could otherwise force $dt$ to be impractically small.
One can compute $U^{n+\frac{1}{2}}$ very efficiently using spectral techniques and projections onto a low-dimensional subspace spanned by a small number of eigenfunctions in the following manner, where $I$ is the identity:

$$U^{n+\frac{1}{2}} = X (I + dt\, \Lambda)^{-1} X^T \big( U^n - dt\, \boldsymbol{\mu} \odot (U^n - \hat{U}) \big), \qquad (15)$$

where $X$ is an $N \times m$ truncated matrix retaining only the $m$ eigenvectors corresponding to the $m$ smallest eigenvalues of the multiscale graph Laplacian $L_M$, and $\Lambda$ is an $m \times m$ diagonal matrix retaining the $m$ smallest eigenvalues of $L_M$ along the diagonal.
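One iteration of this scheme in the truncated eigenbasis can be sketched as follows; the names are ours, and we substitute a simple clip-and-renormalize step for the exact simplex projection of chen :

```python
import numpy as np

def mmbo_step(U, U_hat, mu, X_eig, lam, dt):
    """One iteration of the (multiscale) MBO scheme in the spectral basis.

    U     : (N, K) current label matrix (rows on the Gibbs simplex).
    U_hat : (N, K) one-hot rows for labeled points, zero rows otherwise.
    mu    : (N,)   fidelity weights (mu_i > 0 on labeled points).
    X_eig : (N, m) eigenvectors for the m smallest eigenvalues of L_M.
    lam   : (m,)   the corresponding eigenvalues.
    dt    : time step.
    """
    # 1. semi-implicit heat equation with forcing, in the eigenbasis:
    #    (I + dt*L_M)^(-1) is approximated by X (I + dt*Lambda)^(-1) X^T
    rhs = U - dt * mu[:, None] * (U - U_hat)
    V = X_eig @ ((X_eig.T @ rhs) / (1.0 + dt * lam)[:, None])
    # 2. project each row onto the probability simplex
    #    (crude clip-and-renormalize; chen's exact projection can be substituted)
    V = np.maximum(V, 0.0)
    V /= np.maximum(V.sum(axis=1, keepdims=True), 1e-12)
    # 3. displacement: threshold each row to the nearest simplex vertex
    out = np.zeros_like(V)
    out[np.arange(V.shape[0]), V.argmax(axis=1)] = 1.0
    return out
```

Iterating `mmbo_step` until the stopping criterion below is met yields the final class assignments.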
The proposed MMBO procedure is detailed as Algorithm 2. It is important to note that in the MMBO method, the graph weights are only used to compute the few eigenvectors and eigenvalues of the multiscale graph Laplacian, and the steps of the MMBO procedure themselves do not involve the graph weights. This crucial property allows one to use the Nyström extension procedure nystrom1 ; nystrom2 ; nystrom3 to approximate the extremal eigenvectors of the Laplacian by only computing a small portion of the graph weights; this enables one to apply the multiscale models very efficiently on large data.
For initialization, the rows of $U$ corresponding to labeled points are set to the vertices of the simplex corresponding to the known labels, while the rows of $U$ corresponding to the rest of the points initially represent random probability distributions over the classes.
The energy minimization proceeds until a steady state condition is reached. The final classes are obtained by assigning the class $k$ to node $x_i$ if $u_i$ is closest to the vertex $e_k$ of the Gibbs simplex. Consequently, the calculation is stopped when, for a positive constant $\eta$,

$$\frac{\max_i \|u_i^{n+1} - u_i^{n}\|_2^2}{\max_i \|u_i^{n+1}\|_2^2} < \eta. \qquad (16)$$
In regards to computational complexity, in practice, once the eigenvectors of the graph Laplacian are computed, the complexity of the MMBO scheme is linear in the number of data elements $N$. In particular, let $K$ be the number of classes and $N_s$ be the number of terms in the multiscale Laplacian (12). Usually, $N_s = 2$ or $N_s = 3$ is enough to obtain a good accuracy. Then, one needs $\mathcal{O}(mNK)$ operations for the multiscale heat equation with a forcing term, $\mathcal{O}(NK \log K)$ operations for the projection to the simplex and $\mathcal{O}(NK)$ operations for the displacement step. Moreover, nystrom1 ; nystrom2 ; nystrom3 allows one to compute the $m$ eigenvectors of the multiscale graph Laplacian using $\mathcal{O}(N_s m^2 N)$ operations. Since $m \ll N$ and $K \ll N$, in practice, the complexity of this method is linear.
3.3 Computation of eigenvalues and eigenvectors of the multiscale graph Laplacian
The MMBO method requires one to compute a few of the smallest eigenvalues and the corresponding eigenvectors of the multiscale graph Laplacian to form the matrices $X$ and $\Lambda$ in (15). We examine and use three techniques for this task. Nyström extension nystrom1 ; nystrom2 ; nystrom3 is the preferred method for very large data.
3.3.1 Nyström extension for fully connected graphs
Nyström extension nystrom1 ; nystrom2 ; nystrom3 is a matrix completion method, and it performs faster than many other techniques because it computes approximations to eigenvectors and eigenvalues using much smaller submatrices of the original matrix.
Note that if $\lambda$ is an eigenvalue of $D^{-\frac{1}{2}} W D^{-\frac{1}{2}}$, then $1 - \lambda$ is an eigenvalue of the symmetric Laplacian $L_s = I - D^{-\frac{1}{2}} W D^{-\frac{1}{2}}$, and the two matrices have the same eigenvectors. Thus, one can use Nyström extension to calculate approximations to the eigenvectors of $D^{-\frac{1}{2}} W D^{-\frac{1}{2}}$ and thus of $L_s$.
Now, consider the problem of approximating the extremal eigenvalues and eigenvectors of a full graph and let . Nyström extension nystrom1 ; nystrom2 ; nystrom3 approximates the eigenvalue equation using a quadrature rule and
randomly chosen interpolation points from
, which represents data elements. Denote the set of randomly chosen points by and its complement by . Partitioning into and letting be the the eigenvector of and be its associated eigenvalue, we obtain:(17) 
This system cannot be solved directly since the eigenvectors are unknown; thus, the eigenvectors of the full weight matrix are approximated using its much smaller submatrices.
The efficiency of Nyström extension lies in the following fact: when computing the eigenvalues and eigenvectors of a large N × N matrix, the algorithm approximates them using much smaller matrices, the largest of which has dimension L × N with L ≪ N. In particular, when the method is applied to the weight matrix or the symmetric Laplacian, only a small portion of the corresponding matrix needs to be computed. In our experience, a small number of interpolation points was sufficient.
When the number of scales is fixed and small, the complexity of the Nyström procedure remains linear in the number of data elements.
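A minimal sketch of the idea, under our own naming and for the raw Gaussian weight matrix rather than the normalized multiscale Laplacian used here (which requires additional normalization steps): only an L × N slice of W is ever formed, and the small L × L eigenproblem is extended to the remaining points.

```python
import numpy as np

def nystrom_eigs(X, L, sigma=1.0, rng=None):
    """Hedged sketch of Nystrom extension: approximate eigenpairs of a
    fully connected Gaussian weight matrix W using only the rows of W
    touching L randomly chosen interpolation points (L << N)."""
    rng = np.random.default_rng(rng)
    N = X.shape[0]
    idx = rng.permutation(N)
    A, B = idx[:L], idx[L:]
    # only an L x N slice of W is ever formed
    d2 = ((X[A][:, None, :] - X[idx][None, :, :]) ** 2).sum(-1)
    W_A = np.exp(-d2 / sigma**2)          # rows of W for the sample set
    W_AA, W_AB = W_A[:, :L], W_A[:, L:]
    lam, U = np.linalg.eigh(W_AA)         # small L x L eigenproblem
    keep = lam > 1e-10
    lam, U = lam[keep], U[:, keep]
    # extend the eigenvectors to the complement set B
    U_B = W_AB.T @ U / lam[None, :]
    V = np.zeros((N, lam.size))
    V[A], V[B] = U, U_B
    return lam, V
```

When L equals N the approximation is exact, which gives a convenient sanity check; in practice L is a small fraction of N.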
3.3.2 Rayleigh-Chebyshev method
The Rayleigh-Chebyshev method anderson is a fast algorithm for finding a small subset of the eigenvalues and eigenvectors of a sparse symmetric matrix, such as a symmetric graph Laplacian made sparse using techniques such as nearest neighbors. The method is a modification of an inverse subspace iteration procedure and uses adaptively determined Chebyshev polynomials.
3.3.3 A shifted block Lanczos algorithm
The shifted block Lanczos algorithm lanczos , as well as other variants of the Lanczos method old , which is an adaptation of the power method, are efficient techniques for solving sparse symmetric eigenproblems and for finding a few of the extremal eigenvalues. They can be used to find a subset of the eigenvalues and eigenvectors of the symmetric graph Laplacian, which can be made sparse using nearest neighbors.
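As a hedged stand-in for these solvers (we do not reproduce the Rayleigh-Chebyshev or shifted block Lanczos codes here), SciPy's `eigsh`, itself a Lanczos-type ARPACK routine, computes a few of the smallest eigenpairs of a sparse symmetric Laplacian; the function name and the normalization details below are our own choices.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def smallest_laplacian_eigs(W, k=10):
    """Compute the k smallest eigenpairs of the symmetric normalized
    Laplacian L_s = I - D^{-1/2} W D^{-1/2} of a sparse weight matrix W,
    using SciPy's Lanczos-based eigsh as a stand-in solver."""
    W = sp.csr_matrix(W)
    d = np.asarray(W.sum(axis=1)).ravel()
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_s = sp.identity(W.shape[0]) - sp.diags(d_inv_sqrt) @ W @ sp.diags(d_inv_sqrt)
    # 'SA' asks for the smallest algebraic eigenvalues
    vals, vecs = eigsh(L_s, k=k, which='SA')
    order = np.argsort(vals)
    return vals[order], vecs[:, order]
```

For a connected graph the smallest eigenvalue of L_s is zero, which provides a quick correctness check.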
4 Results and discussion
4.1 Data sets
In this work, we validate the proposed MML and MMBO methods against five common data sets:

G50C is an artificial data set inspired by grandvalet2005semi and generated from two unit-covariance Gaussian distributions. This data set has 550 data points in a 50-dimensional space and two labels.
The USPST data set includes images of handwritten digits taken from the USPS test data set. It has 2007 images to be classified into ten labels corresponding to the digits 0 to 9.

The MacWin data set categorizes documents, taken from the 20 Newsgroups data, into two classes: mac or windows szummer2002partially . This set has 1946 elements, and each element is represented by a 7511-dimensional vector.

The WebKB data set is taken from the web documents of the CS departments of four universities and has been used extensively. It has 1051 data samples and two labels: course and non-course. There are two ways to describe each web document: the textual content of the webpage (the page representation), and the anchor text on hyperlinks pointing from other webpages to the current one (the link representation). The data points in the page representation are 3000-dimensional, while those in the link representation are 1840-dimensional; combining the two representations yields 4840-dimensional data points.

The protein data set consists of three different protein domain classes, namely alpha proteins, beta proteins, and mixed alpha and beta proteins, classified based on protein secondary structure ZXCang:2015 . This data set has 900 biomolecules, with 300 instances in each class.
The details of the data sets are outlined in Table 1.
Data set           No. of classes  Sample dim.  No. of data elements  No. of labeled data
G50C               2               50           550                   50
USPST              10              256          2007                  50
MacWin             2               7511         1946                  50
WebKB (page)       2               3000         1051                  12
WebKB (link)       2               1840         1051                  12
WebKB (page+link)  2               4840         1051                  12
protein            3               50           900                   720
4.2 Hyperparameter selection
In the MMBO setting, we do not compute the complete graph; instead, for computational efficiency, we construct a nearest-neighbor graph for each data point. The number of nearest neighbors is one of the hyperparameters and is selected on a case-by-case basis. Moreover, as discussed in Section 2.2, the weight function used is the Gaussian kernel w(x_i, x_j) = exp(−d(x_i, x_j)^2/σ^2). Here, the scale σ is optimized to best fit the labeled set information. In the multiscale approach, each kernel is assigned a different σ value depending on the outcome of hyperparameter selection. Finally, due to the random initialization of the non-labeled points, we use the same random seed for all the experiments in this work for reproducibility.
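The graph construction described above can be sketched as follows. The function name and the dense distance computation are our own simplifications; a production version would use an approximate nearest-neighbor search for large N, and the σ tuning described in the text is left out.

```python
import numpy as np

def knn_gaussian_graph(X, k=10, sigma=1.0):
    """Sketch (our naming): symmetric k-nearest-neighbor graph with
    Gaussian weights w_ij = exp(-||x_i - x_j||^2 / sigma^2); only each
    point's k nearest neighbors receive a nonzero weight."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    N = X.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(d2[i])[1:k + 1]       # skip the point itself
        W[i, nn] = np.exp(-d2[i, nn] / sigma**2)
    return np.maximum(W, W.T)                  # symmetrize
```

In the multiscale approach, several such graphs with different σ values would be combined; the sketch builds one scale.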
The Nyström extension method nystrom1 ; nystrom2 ; nystrom3 allows for fast computations even for larger data sets, since it approximates the eigenvalues and eigenvectors of the original matrix using much smaller matrices built from randomly selected entries, so that only a small portion of the graph weights needs to be computed. However, for smaller data sets, it is often more advantageous to use methods such as anderson , which compute the eigenvalues and eigenvectors directly. Therefore, to obtain optimal results, we employ the Rayleigh-Chebyshev procedure anderson (see Section 3.3.2) for our experiments; this method efficiently calculates the smallest eigenvectors of a sparse symmetric matrix. The hyperparameters of the MMBO models are the number of leading eigenvalues, the time step for solving the heat equation, the constraint constant on the fidelity term, and the number of iterations.
The hyperparameter selection for the MML model is carried out in a similar fashion to that of the MMBO algorithm. The tuning parameters are: the number of nearest neighbors, the scale factor, the penalty coefficient, the manifold regularization constraint, and the Laplacian degree. The optimization problem is solved using the primal SVM solver svm2 . The optimal hyperparameters of the proposed methods are documented in the Supporting Information.
4.3 Performance and discussion
4.3.1 Nonbiological data sets
The non-biological data sets we used for our experiments are the G50C, USPST, MacWin, and WebKB data sets. In the experiments involving these data sets, we utilize the original representations without carrying out any feature generation procedures. In addition, following the previous work svm1 ; sind , we consider accuracy as the main evaluation metric for the G50C, USPST, and MacWin data sets, and compute the Precision/Recall Breakeven Point (PRBEP) for the WebKB data set due to its imbalanced labeling.
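For reference, the PRBEP of a binary classifier's real-valued scores can be computed by predicting positive exactly the top-P scored points, where P is the number of true positives; at that cutoff precision and recall coincide. A small sketch (our own helper, not taken from the paper's code):

```python
import numpy as np

def prbep(scores, y_true):
    """Precision/Recall Breakeven Point: precision (== recall) obtained by
    predicting positive exactly the top-P scored points, where P is the
    number of true positives in y_true; a standard metric for imbalanced
    binary tasks such as WebKB."""
    y_true = np.asarray(y_true)
    P = int(y_true.sum())
    top = np.argsort(-np.asarray(scores))[:P]
    return y_true[top].mean()   # fraction of true positives among the top P
```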
In all cases, the results of the proposed MML and MMBO methods show promising improvements over non-multiscale frameworks. Specifically, the best performances of the algorithms are achieved with three kernels. In particular, there is a significant accuracy improvement from single-kernel to two-kernel architectures on the USPST data (from 86.11% to 90.57% for the MML model, and from 86.55% to 88.65% for the MMBO model) and the MacWin data (from 89.98% to 90.01% for the MML model, and from 92.06% to 93.49% for the MMBO model). The improvement from single-kernel to multikernel learning is smaller for the G50C and WebKB data, but that is to be expected: G50C is a small, artificial data set of 550 samples drawn from two unit-covariance normal distributions, so a single kernel is enough to capture the crucial structure of the data, while the WebKB data poses a challenge for multiscale learning due to its imbalanced labeling.
In almost all experiments, the proposed models obtain the best results. In most cases, the proposed MMBO model performs best, always with three-kernel learning, while the proposed MML model obtains the best result for USPST. For G50C, the MMBO method achieves the best accuracy (95.06%), and the MML method is comparable at 94.56%. The superior performance of our proposed algorithms over state-of-the-art models is also displayed on the more complex USPST data, a set of handwritten digit images: while the proposed MML algorithm obtains the best accuracy at 90.57%, the MMBO method with three-kernel information still obtains a good accuracy of 88.73%, and other published approaches, such as LapRLS, obtain lower accuracies. For MacWin, our multiscale models perform slightly below TSVM (94.3%) svm1 and LDS (94.9%) svm1 ; the fact that there are only 1946 samples while the dimension of each sample is very high (7511) might indicate noisy information, which can reduce the performance of graph-based kernel models. For WebKB, the last data set in this category, which has three different feature representations (link, page, and page+link), our proposed methods perform extremely well. Using only one kernel already produces great results, with a small further improvement from using multiple kernels. The best model is the MMBO method with three kernels, which obtains a PRBEP of 96.22%, 97.93%, and 98.87% for the link, page, and page+link experiments, respectively. The MML method obtains the next best result, with a PRBEP of 95.75%, 95.81%, and 95.84% for the link, page, and page+link experiments, respectively. After the proposed MMBO and MML methods, the next best results are obtained by LapSVM: 94.3%, 93.4%, and 94.9%.
The results for the non-biological data sets are shown in Figure 1. In most experiments with non-biological data, the proposed MMBO method is clearly dominant, and the other proposed model, the MML method, is the second best with promising performance.
4.3.2 Alpha and beta protein classification
We also tested the proposed multiscale learning models on biological data involving protein classification. In this data, based on secondary structure, proteins are typically grouped into three classes, namely alpha helices, beta sheets, and mixed alpha and beta domains. Figure 2 plots the secondary-structure representations of the three types of protein structures. The data, which consists of 900 structures equally distributed into three classes, was collected by Cang et al. ZXCang:2015 and taken from SCOPe (Structural Classification of Proteins-extended), an online database Fox:2014 .
Five-fold cross-validation is conducted to examine the performance of the proposed models. To avoid bias, in each fold, the test set consists of 180 instances, with 60 samples from each group. The protein data sets originally provide the coordinates and atom types for each structure; feature generation is needed to translate such information into a vector format suitable for machine learning algorithms. Moreover, for this data, the feature generation has to retain crucial physical and chemical interactions such as covalent and non-covalent bonds, electrostatics, hydrogen bonds, etc. In the past few years, we have developed numerous mathematics-based feature engineering models, including geometric and algebraic graphs DDNguyen:2017d ; nguyen2019agl , differential geometry nguyen2019dg , persistent homology cang2018representability , and persistent graphs wang2020persistent , for representing 3D molecular information in low-dimensional representations.
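The class-balanced split described above can be sketched as follows (a hypothetical helper; the authors' exact splitting code may differ): with 300 samples per class, each of the five folds receives 60 test samples from each of the three classes.

```python
import numpy as np

def stratified_5fold(y, rng=0):
    """Sketch of the class-balanced five-fold split: shuffle the indices
    of each class separately and deal them out evenly across five folds,
    so every fold holds the same number of samples per class."""
    rng = np.random.default_rng(rng)
    folds = [[] for _ in range(5)]
    for c in np.unique(y):
        idx = rng.permutation(np.where(np.asarray(y) == c)[0])
        for f, chunk in enumerate(np.array_split(idx, 5)):
            folds[f].extend(chunk.tolist())
    return folds
```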
We employ our geometric graph representation in DDNguyen:2017d . To represent the physical and chemical properties of a biomolecule, we consider four atom types, namely C, Cα, N, and O. In particular, the protein structures are described by vectors of 50 components. The details of the parameters for the feature generation approach are provided in the Supporting Information.
Both the MMBO and MML models perform well. Moreover, similarly to the previous experiments, multiscale information strengthens the accuracy of both the MML and MMBO approaches. In fact, there is an encouraging improvement from the one-kernel model to the two-kernel model, i.e., from 84% to 85% accuracy for the MMBO model, and a further improvement for the MMBO method using three kernels, i.e., 85.11%. For the MML method, there is a slight improvement from using multiple kernels. For this data, the MMBO method outperforms its counterpart, which indicates the versatility of the MMBO algorithm when dealing with a variety of data. All results are presented in Figure 3.
4.4 Comparison Algorithms
We compare our algorithms to many recent methods, most of which are from 2015 and later.
For the WebKB data, we compare classification accuracy against recent methods such as semi-supervised multiview deep discriminant representation learning (SMDDRL) webkb__2 , vertical ensemble co-training (VE-CoT) webkb__3 , auto-weighted multiple graph learning (AMGL) webkb__4 , multiview learning with adaptive neighbors (MLAN) webkb__5 , deep canonically correlated autoencoder (DCCAE) webkb__7 , multiview discriminative neural network (MDNN) webkb__8 , semi-supervised learning for multiple graphs by gradient flow (MGSC) webkb__9 , multi-domain classification with domain selection (MCS) webkb__10 , multiview semi-supervised learning (FMSSL, FMSSLK) webkb__11 , and a semi-supervised multimodal deep learning framework (SMDLF) webkb__6 . Our results are obtained using 105 labels and the classification accuracy metric. The results for SMDDRL, VE-CoT, AMGL, MLAN, SMDLF, DCCAE, and MDNN are from webkb__2 , the results for MGSC and MCS are from webkb__9 , and the results for FMSSL and FMSSLK are from webkb__11 . All methods use 105 labels.
For USPST, we compare against recent methods such as transductive minimax probability machines (TMPM) uspst__1 , semi-supervised extreme learning machines (SS-ELM) uspst__4 , graph embedding-based dimension reduction with extreme learning machines (GDR-ELM) uspst__6 , the extreme learning machine autoencoder (ELM-AE) uspst__7 , the extreme learning machine autoencoder with invertible functions (ELM-AE-IF) uspst__8 , and extreme learning machines for dimensionality reduction (SR-ELM) uspst__9 . Our results are obtained using only 50 labels. The results for TMPM (with 50 labels) are from uspst__1 , the results for GDR-ELM, ELM-AE, ELM-AE-IF, and SR-ELM (with 150 labels) are from uspst__6 , and the result for SS-ELM (with 100 labels) is from uspst__4 .
For G50C, we compare against recent methods such as classtering (CLSST) g50c__1 , the semi-supervised broad learning system (SS-BLS) g50c__2 , classification from positive and unlabeled data (PNU) g50c__3 , classification from unlabeled positive and negative data (PUNU) g50c__3 , semi-supervised extreme learning machines (SS-ELM) uspst__4 , the semi-supervised hierarchical extreme learning machine (SS-HELM) uspst__2 , safe semi-supervised support vector machines (S4VM) g50c__4 , and robust and fast transductive support vector machines (RTSVM, RTSVM-LDS) g50c__5 . Our results are obtained using 50 labels. The result for CLSST is from g50c__1 , the results for SS-BLS, SS-ELM, and SS-HELM are from g50c__2 , the results for PNU, PUNU, and S4VM are from g50c__3 , and the results for RTSVM and RTSVM-LDS are from g50c__5 . All comparison methods use 50 labels.
For MacWin, we compare against recent methods such as support vector machines with manifold regularization and partially labeling privacy protection (SVM-MR&PLPP) macwin__2 and its scalable version (SSVM-MR&PLPP) macwin__2 . These results are obtained from macwin__2 . All comparison methods use 50 labels, the same number of labeled samples as the proposed methods.
We also compare results for all data sets with slightly older methods, such as transductive graph methods (GraphTrans), closely related to all__2 ; zhou:bousquet:lal ; all__3 , transductive support vector machines (TSVM) all__1 , support vector machines on a graph-distance derived kernel (Graph-density) svm1 , TSVM by gradient descent svm1 , low density separation (LDS) svm1 , Laplacian support vector machines (LapSVM) sind , and Laplacian regularized least squares (LapRLS) sind . For WebKB, we use the PRBEP metric when comparing against these methods. The results for all older methods, except LapSVM and LapRLS, are obtained from svm1 ; the results for LapSVM and LapRLS are from sind . All comparisons with older methods use the same number of labeled samples as the proposed methods: 12 labels for WebKB (with the PRBEP metric) and 50 labels for the rest of the data.
Data set           Size of data set  Sample dimension  Timing (construction of graph and eigenvectors)  Timing (MMBO procedure)
G50C               550               50                0.02 seconds                                     0.31 seconds
USPST              1440              1024              1.41 seconds                                     1.52 seconds
MacWin             1946              7511              9.8 seconds                                      1.17 seconds
WebKB (page)       1051              3000              1.04 seconds                                     0.60 seconds
WebKB (link)       1051              1840              0.67 seconds                                     0.60 seconds
WebKB (page+link)  1051              4840              1.58 seconds                                     0.60 seconds
protein            900               50                0.18 seconds                                     1.96 seconds
Data set           Size of data set  Sample dimension  Timing (Deformed Kernel)  Timing (Optimization)
G50C               550               50                0.039 seconds             0.001 seconds
USPST              1440              1024              0.24 seconds              0.003 seconds
MacWin             1946              7511              4.51 seconds              0.002 seconds
WebKB (page)       1051              3000              0.21 seconds              0.02 seconds
WebKB (link)       1051              1840              0.15 seconds              0.01 seconds
WebKB (page+link)  1051              4840              0.28 seconds              0.02 seconds
protein            900               50                0.05 seconds              0.02 seconds
4.5 Efficiency
The proposed MML and MMBO procedures are very efficient. The timing results are listed for all data sets in Table 2 (for MMBO) and Table 3 (for MML).
The timing of the proposed MMBO method is divided into two parts: (1) the timing for the construction of the graph weights and the calculation of the extremal eigenvectors of the multiscale graph Laplacian, and (2) the timing of the MMBO procedure. From Table 2, one can see that the proposed MMBO procedure takes under 2 seconds for all data sets, and the graph construction and computation of the eigenvectors takes little time as well.
The timing of the proposed MML method consists of two categories: (1) the timing for the construction of the warped kernels, and (2) the timing of the optimizer. One can see from Table 3 that generating the multiscale graph and the warped kernel is the most time-consuming step of the MML algorithm, but it still takes under 5 seconds on the MacWin data set, which has 1946 samples with a feature dimension of 7511. For the other data sets, the MML method takes under 0.3 seconds to formulate the multiscale graph and the warped kernel. Due to the simplified version of the MML optimizer, one can directly use a standard SVM solver for the MML algorithm; this procedure is extremely fast and needs no more than 0.03 seconds for all experiments. The computations were performed on a personal laptop with a 2.4 GHz 8-core Intel Core i9 processor.
5 Conclusion
This work presents methods for dealing with several challenges of machine learning, such as data with limited samples, smaller data sets, and diverse data, which is usually associated with small data sets or with areas of study where the size of the data sets is constrained by the complexity and/or high cost of experiments. In particular, we integrate graph-based techniques, multiscale structure, adapted and modified optimization procedures, and semi-supervised frameworks to derive two multiscale Laplacian learning (MLL) approaches for machine learning tasks such as data classification.
The first approach introduces a multiscale kernel representation to a manifold learning technique and is called the multikernel manifold learning (MML) algorithm.
The second approach combines multiscale analysis with an adaptation and modification of the classical Merriman-Bence-Osher (MBO) scheme, originally intended to approximate motion by mean curvature, and is called the multiscale MBO (MMBO) algorithm.
The performance of the proposed MLL approaches compares favorably with existing recent and related approaches in experiments on a variety of data sets. The two new MLL methods form powerful techniques for dealing with some of the most important challenges and tasks in machine learning and data science.
Supporting Information
We present the optimal hyperparameters of the proposed MMBO and MML methods for all experiments conducted in this work in Online Resource: Supporting Information.
Availability
The source code for the proposed MMBO and MML methods is available on GitHub: https://github.com/ddnguyenmath/MultiscaleLaplacianLearning.
Conflict of interest
The authors declare that they have no conflict of interest.
References
 (1) A Generalized Representer Theorem. https://alex.smola.org/papers/2001/SchHerSmo01.pdf
 (2) Quick Introduction to BagofWords (BoW) and TFIDF for Creating Features from Text. https://www.analyticsvidhya.com/blog/2020/02/quickintroductionbagofwordsbowtfidf/
 (3) VLFeat Library. https://www.vlfeat.org
 (4) AbuElHaija, S., Kapoor, A., Perozzi, B., Lee, J.: NGCN: Multiscale graph convolution for semisupervised node classification. arXiv preprint arXiv:1802.08888 (2018)
 (5) Anam, K., AlJumaily, A.: A novel extreme learning machine for dimensionality reduction on finger movement classification using sEMG. In: International IEEE/EMBS Conference on Neural Engineering, pp. 824–827. IEEE (2015)
 (6) Anderson, C.: A Rayleigh-Chebyshev procedure for finding the smallest eigenvalues and associated eigenvectors of large sparse Hermitian matrices. Journal of Computational Physics 229, 7477–7487 (2010)

 (7) Bae, E., Merkurjev, E.: Convex variational methods on graphs for multiclass segmentation of high-dimensional data and point clouds. Journal of Mathematical Imaging and Vision 58(3), 468–493 (2017)
 (8) Belkin, M., Matveeva, I., Niyogi, P.: Regularization and semi-supervised learning on large graphs. In: International Conference on Computational Learning Theory, pp. 624–638. Springer (2004)
 (9) Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research 7(Nov), 2399–2434 (2006)
 (10) Belongie, S., Fowlkes, C., Chung, F., Malik, J.: Spectral partitioning with indefinite kernels using the Nyström extension. European Conference on Computer Vision pp. 531–542 (2002)
 (11) Bohacek, R.S., McMartin, C., Guida, W.C.: The art and practice of structurebased drug design: a molecular modeling perspective. Medicinal Research Reviews 16(1), 3–50 (1996)
 (12) Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. In: ICCV (1), pp. 377–384 (1999). URL citeseer.ist.psu.edu/boykov99fast.html
 (13) Bruna, J., Zaremba, W., Szlam, A., LeCun, Y.: Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013)
 (14) Cang, Z., Mu, L., Wei, G.W.: Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. PLoS Computational Biology 14(1), e1005929 (2018)
 (15) Cang, Z.X., Mu, L., Wu, K., Opron, K., Xia, K., Wei, G.W.: A topological approach to protein classification. Molecular Based Mathematical Biology 3, 140–162 (2015)
 (16) Cang, Z.X., Wei, G.W.: Analysis and prediction of protein folding energy changes upon mutation by element specific persistent homology. Bioinformatics 33, 3549–3557 (2017)
 (17) Cang, Z.X., Wei, G.W.: TopologyNet: Topology based deep convolutional and multitask neural networks for biomolecular property predictions. PLOS Computational Biology 13(7), e1005690, https://doi.org/10.1371/journal.pcbi.1005690 (2017)
 (18) Cang, Z.X., Wei, G.W.: Integration of element specific persistent homology and machine learning for proteinligand binding affinity prediction . International Journal for Numerical Methods in Biomedical Engineering 34(2), DOI: 10.1002/cnm.2914 (2018)
 (19) Cevikalp, H., Franc, V.: Largescale robust transductive support vector machines. Neurocomputing 235, 199–209 (2017)
 (20) Chapelle, O., Schölkopf, B., Zien, A.: SemiSupervised Learning. MIT Press, Cambridge, MA (2006)

 (21) Chapelle, O., Zien, A.: Semi-supervised classification by low density separation. In: International Conference on Artificial Intelligence and Statistics, vol. 2005, pp. 57–64. Citeseer (2005)
 (22) Chen, C., Xin, J., Wang, Y., Chen, L., Ng, M.K.: A semisupervised classification approach for multidomain networks with domain selection. IEEE Transactions on Neural Networks and Learning Systems 30(1), 269–283 (2018)
 (23) Chen, J., Zhao, R., Tong, Y., Wei, G.W.: Evolutionary de Rham-Hodge method. Discrete & Continuous Dynamical Systems - B (in press, 2020)
 (24) Chen, Y., Ye, X.: Projection onto a simplex. arXiv preprint arXiv:1101.6081 (2011)
 (25) Cheng, Y., Zhao, X., Cai, R., Li, Z., Huang, K., Rui, Y.: Semisupervised multimodal deep learning for RGBD object recognition. In: International Joint Conferences on Artificial Intelligence, pp. 3345–3351 (2016)
 (26) Cortes, C., Vapnik, V.: Supportvector networks. Machine Learning 20(3), 273–297 (1995)
 (27) Couprie, C., Grady, L., Najman, L., Talbot, H.: Power watershed: A unifying graphbased optimization framework. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(7), 1384–1399 (2011)
 (28) Elmoataz, A., Lezoray, O., Bougleux, S.: Nonlocal discrete regularization on weighted graphs: a framework for image and manifold processing. IEEE Transactions on Image Processing 17(7), 1047–1060 (2008)

 (29) Fang, X., Xu, Y., Li, X., Fan, Z., Liu, H., Chen, Y.: Locality and similarity preserving embedding for feature selection. Neurocomputing 128, 304–315 (2014)
 (30) Feng, S., Zhou, H., Dong, H.: Using deep neural network with small dataset to predict material defects. Materials & Design 162, 300–310 (2019)
 (31) Fowlkes, C., Belongie, S., Chung, F., Malik, J.: Spectral grouping using the Nyström method. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(2), 214–225 (2004)

 (32) Fowlkes, C., Belongie, S., Malik, J.: Efficient spatiotemporal grouping using the Nyström method. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), vol. 1, pp. I–I (2001)
 (33) Fox, N.K., Brenner, S.E., Chandonia, J.M.: SCOPe: Structural Classification of Proteins extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Research 42(D1), D304–D309 (2014)
 (34) Gadde, A., Anis, A., Ortega, A.: Active semisupervised learning using sampling theory for graph signals. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 492–501 (2014)
 (35) GarciaCardona, C., Merkurjev, E., Bertozzi, A.L., Flenner, A., Percus, A.: Fast multiclass segmentation using diffuse interface methods on graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence (2014)
 (36) Gerhart, T., Sunu, J., Lieu, L., Merkurjev, E., Chang, J.M., Gilles, J., Bertozzi, A.L.: Detection and tracking of gas plumes in LWIR hyperspectral video sequence data. In: SPIE Conference on Defense, Security, and Sensing, pp. 87430J–87430J (2013)
 (37) Goldberg, A.B., Zhu, X., Wright, S.: Dissimilarity in graphbased semisupervised classification. In: Artificial Intelligence and Statistics, pp. 155–162 (2007)
 (38) Gong, C., Tao, D., Maybank, S., Liu, W., Kang, G., Yang, J.: Multimodal curriculum learning for semisupervised image classification. IEEE Transactions on Image Processing 25(7), 3249–3260 (2016)
 (39) Grandvalet, Y., Bengio, Y.: Semisupervised learning by entropy minimization. In: Advances in Neural Information Processing Systems, pp. 529–536 (2005)
 (40) Grimes, R.G., Lewis, J.G., Simon, H.D.: A shifted block Lanczos algorithm for solving sparse symmetric generalized eigenproblems. SIAM Journal on Matrix Analysis and Applications 15(1), 228–272 (1994)
 (41) Huang, G., Song, S., Gupta, J., Wu, C.: Semisupervised and unsupervised extreme learning machines. IEEE Transactions on Cybernetics 44(12), 2405–2417 (2014)
 (42) Huang, G., Song, S., Xu, Z.E., Weinberger, K.: Transductive minimax probability machine. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 579–594. Springer (2014)
 (43) Hudson, D.L., Cohen, M.E.: Neural networks and artificial intelligence for biomedical engineering. Institute of Electrical and Electronics Engineers (2000)
 (44) Iscen, A., Tolias, G., Avrithis, Y., Chum, O.: Label propagation for deep semisupervised learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5070–5079 (2019)
 (45) Jia, X., Jing, X.Y., Zhu, X., Chen, S., Du, B., Cai, Z., He, Z., Yue, D.: Semisupervised multiview deep discriminant representation learning. IEEE Transactions on Pattern Analysis and Machine Intelligence (2020)
 (46) Jiang, J., Wang, R., Wang, M., Gao, K., Nguyen, D.D., Wei, G.W.: Boosting treeassisted multitask deep learning for small scientific datasets. Journal of Chemical Information and Modeling 60(3), 1235–1244 (2020)
 (47) Joachims, T., et al.: Transductive inference for text classification using support vector machines. In: International Conference on Machine Learning, vol. 99, pp. 200–209 (1999)
 (48) Jordan, M.I., Mitchell, T.M.: Machine learning: Trends, perspectives, and prospects. Science 349(6245), 255–260 (2015)
 (49) Kapoor, A., Ahn, H., Qi, Y., Picard, R.W.: Hyperparameter and kernel learning for graph based semisupervised classification. In: Advances in Neural Information Processing Systems, pp. 627–634 (2006)
 (50) Kasun, L., Yang, Y., Huang, G.B., Zhang, Z.: Dimension reduction with extreme learning machine. IEEE Transactions on Image Processing 25(8), 3906–3918 (2016)
 (51) Katz, G., Caragea, C., Shabtai, A.: Vertical ensemble co-training for text classification. ACM Transactions on Intelligent Systems and Technology 9(2), 1–23 (2017)
 (52) Kipf, T.N., Welling, M.: Semisupervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
 (53) Kotsiantis, S.B., Zaharakis, I., Pintelas, P.: Supervised machine learning: A review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering 160(1), 3–24 (2007)
 (54) Levin, A., RavAcha, A., Lischinski, D.: Spectral matting. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(10), 1699 –1712 (2008)
 (55) Li, Q., Han, Z., Wu, X.M.: Deeper insights into graph convolutional networks for semisupervised learning. In: ThirtySecond AAAI Conference on Artificial Intelligence (2018)
 (56) Li, Y.F., Zhou, Z.H.: Towards making unlabeled data never hurt. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(1), 175–188 (2014)
 (57) Liao, R., Brockschmidt, M., Tarlow, D., Gaunt, A., Urtasun, R., Zemel, R.S.: Graph partition neural networks for semisupervised classification (2018). URL https://openreview.net/forum?id=rk4Fz2e0b
 (58) Lin, F., Cohen, W.W.: Semisupervised classification of network data using very few labels. In: 2010 International Conference on Advances in Social Networks Analysis and Mining, pp. 192–199. IEEE (2010)
 (59) Liu, Y., Ng, M.K., Zhu, H.: Multiple graph semisupervised clustering with automatic calculation of graph associations. Neurocomputing 429, 33–46 (2021)
 (60) Melacci, S., Belkin, M.: Laplacian support vector machines trained in the primal. Journal of Machine Learning Research 12(3) (2011)
 (61) Meng, G., Merkurjev, E., Koniges, A., Bertozzi, A.L.: Hyperspectral video analysis using graph clustering methods. Image Processing On Line 7, 218–245 (2017)
 (62) Merkurjev, E., Bertozzi, A.L., Chung, F.: A semisupervised heat kernel pagerank mbo algorithm for data classification. Communications in Mathematical Sciences 16(5), 1241–1265 (2018)
 (63) Merkurjev, E., Bertozzi, A.L., Lerman, K., Yan, X.: Modified Cheeger and ratio cut methods using the GinzburgLandau functional for classification of highdimensional data. Inverse Problems 33(7), 074003 (2017)
 (64) Merkurjev, E., GarciaCardona, C., Bertozzi, A.L., Flenner, A., Percus, A.: Diffuse interface methods for multiclass segmentation of highdimensional data. Applied Mathematics Letters 33, 29–34 (2014)
 (65) Merkurjev, E., Kostic, T., Bertozzi, A.L.: An MBO scheme on graphs for segmentation and image processing. SIAM Journal of Imaging Sciences 6(4), 1903–1930 (2013)
 (66) Merkurjev, E., Sunu, J., Bertozzi, A.L.: Graph MBO method for multiclass segmentation of hyperspectral standoff detection video. Proceedings of IEEE International Conference on Image Processing (2014)
 (67) Merriman, B., Bence, J.K., Osher, S.: Diffusion generated motion by mean curvature. AMS Selected Lectures in Mathematics Series: Computational Crystal Growers Workshop 8966, 73–83 (1992)
 (68) Merriman, B., Bence, J.K., Osher, S.J.: Motion of multiple functions: a level set approach. Journal of Computational Physics 112(2), 334–363 (1994). DOI 10.1006/jcph.1994.1105. URL http://dx.doi.org/10.1006/jcph.1994.1105
 (69) Nguyen, D., Wei, G.W.: AGL-Score: Algebraic graph learning score for protein-ligand binding scoring, ranking, docking, and screening. Journal of Chemical Information and Modeling (2019)
 (70) Nguyen, D.D., Cang, Z., Wei, G.W.: A review of mathematical representations of biomolecular data. Physical Chemistry Chemical Physics 22(8), 4343–4367 (2020)
 (71) Nguyen, D.D., Wei, G.W.: DG-GL: Differential geometry-based geometric learning of molecular datasets. International Journal for Numerical Methods in Biomedical Engineering 35(3), e3179 (2019)
 (72) Nguyen, D.D., Xia, K., Wei, G.W.: Generalized flexibility-rigidity index. The Journal of Chemical Physics 144(23), 234106 (2016)
 (73) Nguyen, D.D., Xiao, T., Wang, M.L., Wei, G.W.: Rigidity strengthening: A mechanism for protein-ligand binding. Journal of Chemical Information and Modeling 57, 1715–1721 (2017)
 (74) Ni, T., Chung, F.L., Wang, S.: Support vector machine with manifold regularization and partially labeling privacy protection. Information Sciences 294, 390–407 (2015)
 (75) Nie, F., Cai, G., Li, J., Li, X.: Auto-weighted multi-view learning for image clustering and semi-supervised classification. IEEE Transactions on Image Processing 27(3), 1501–1511 (2017)
 (76) Nie, F., Cai, G., Li, X.: Multi-view clustering and semi-supervised classification with adaptive neighbours. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)
 (77) Nie, F., Li, J., Li, X.: Parameter-free auto-weighted multiple graph learning: A framework for multi-view clustering and semi-supervised classification. In: IJCAI, pp. 1881–1887 (2016)
 (79) Nie, F., Wang, X., Huang, H.: Clustering and projected clustering with adaptive neighbors. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 977–986 (2014)
 (80) Noroozi, V., Bahaadini, S., Zheng, L., Xie, S., Shao, W., Philip, S.Y.: Semi-supervised deep representation learning for multi-view problems. In: IEEE International Conference on Big Data, pp. 56–64. IEEE (2018)
 (81) Opron, K., Xia, K., Wei, G.W.: Communication: Capturing protein multiscale thermal fluctuations (2015)
 (82) Paige, C.C.: Computational variants of the Lanczos method for the eigenproblem. IMA Journal of Applied Mathematics 10(3), 373–381 (1972)
 (83) Perona, P., Zelnik-Manor, L.: Self-tuning spectral clustering. Advances in Neural Information Processing Systems (2004)
 (84) Qi, Z., Tian, Y., Shi, Y.: Laplacian twin support vector machine for semi-supervised classification. Neural Networks 35, 46–53 (2012)
 (85) Qu, M., Bengio, Y., Tang, J.: GMNN: Graph Markov neural networks. arXiv preprint arXiv:1905.06214 (2019)
 (86) Saha, B., Gupta, S., Phung, D., Venkatesh, S.: Multiple task transfer learning with small sample sizes. Knowledge and Information Systems 46(2), 315–342 (2016)
 (87) Sakai, T., Plessis, M.C., Niu, G., Sugiyama, M.: Semi-supervised classification based on classification from positive and unlabeled data. In: International Conference on Machine Learning, pp. 2998–3006. PMLR (2017)
 (88) Sansone, E., Passerini, A., De Natale, F.: Classtering: Joint classification and clustering with mixture of factor analysers. In: Proceedings of the Twenty-second European Conference on Artificial Intelligence, pp. 1089–1095 (2016)
 (89) Schwab, K.: The Fourth Industrial Revolution. Currency (2017)
 (90) Shaikhina, T., Khovanova, N.A.: Handling limited datasets with neural networks in medical applications: A small-data approach. Artificial Intelligence in Medicine 75, 51–63 (2017)
 (91) Shaikhina, T., Lowe, D., Daga, S., Briggs, D., Higgins, R., Khovanova, N.: Machine learning for predictive modelling based on small data in biomedical engineering. IFAC-PapersOnLine 48(20), 469–474 (2015)
 (92) She, Q., Hu, B., Luo, Z., Nguyen, T., Zhang, Y.: A hierarchical semi-supervised extreme learning machine method for EEG recognition. Medical & Biological Engineering & Computing 57(1), 147–157 (2019)
 (93) Sindhwani, V., Niyogi, P., Belkin, M.: Beyond the point cloud: from transductive to semi-supervised learning. In: International Conference on Machine Learning, pp. 824–831. ACM (2005)
 (95) Subramanya, A., Bilmes, J.: Semi-supervised learning with measure propagation. Journal of Machine Learning Research 12, 3311–3370 (2011)
 (96) Szummer, M., Jaakkola, T.: Partially labeled classification with Markov random walks. In: Advances in Neural Information Processing Systems, pp. 945–952 (2002)
 (97) Thekumparampil, K.K., Wang, C., Oh, S., Li, L.J.: Attention-based graph neural network for semi-supervised learning. arXiv preprint arXiv:1803.03735 (2018)
 (98) Von Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing 17(4), 395–416 (2007)
 (99) Wang, B., Tu, Z., Tsotsos, J.K.: Dynamic label propagation for semi-supervised multi-class multi-label classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 425–432 (2013)
 (100) Wang, J., Jebara, T., Chang, S.F.: Semi-supervised learning using greedy max-cut. Journal of Machine Learning Research 14(Mar), 771–800 (2013)
 (101) Wang, M., Fu, W., Hao, S., Tao, D., Wu, X.: Scalable semi-supervised learning by efficient anchor graph regularization. IEEE Transactions on Knowledge and Data Engineering 28(7), 1864–1877 (2016)
 (102) Wang, R., Nguyen, D.D., Wei, G.W.: Persistent spectral graph. International Journal for Numerical Methods in Biomedical Engineering p. e3376 (2020)
 (103) Wang, W., Arora, R., Livescu, K., Bilmes, J.: On deep multi-view representation learning. In: International Conference on Machine Learning, pp. 1083–1092. PMLR (2015)
 (104) Weston, J., Ratle, F., Mobahi, H., Collobert, R.: Deep learning via semi-supervised embedding. In: Neural Networks: Tricks of the Trade, pp. 639–655. Springer (2012)
 (105) Xia, K., Opron, K., Wei, G.W.: Multiscale Gaussian network model (mGNM) and multiscale anisotropic network model (mANM). The Journal of Chemical Physics 143(20), 11B616_1 (2015)
 (106) Yang, L., Song, S., Li, S., Chen, Y., Huang, G.: Graph embedding-based dimension reduction with extreme learning machine. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2019)
 (107) Yang, Y., Wu, Q.J., Wang, Y.: Autoencoder with invertible functions for dimension reduction and image reconstruction. IEEE Transactions on Systems, Man, and Cybernetics: Systems 48(7), 1065–1079 (2016)
 (108) Yang, Z., Cohen, W., Salakhutdinov, R.: Revisiting semi-supervised learning with graph embeddings. In: International Conference on Machine Learning, pp. 40–48 (2016)
 (109) Zhang, B., Qiang, Q., Wang, F., Nie, F.: Fast multi-view semi-supervised learning with learned graph. IEEE Transactions on Knowledge and Data Engineering (2020)
 (110) Zhang, Y., Ling, C.: A strategy to apply machine learning to small datasets in materials science. npj Computational Materials 4(1), 1–8 (2018)
 (111) Zhang, Y., Pal, S., Coates, M., Ustebay, D.: Bayesian graph convolutional neural networks for semi-supervised classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 5829–5836 (2019)
 (112) Zhao, H., Zheng, J., Deng, W., Song, Y.: Semi-supervised broad learning system based on manifold regularization and broad network. IEEE Transactions on Circuits and Systems I: Regular Papers 67(3), 983–994 (2020)
 (113) Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. In: S. Thrun, L.K. Saul, B. Schölkopf (eds.) Advances in Neural Information Processing Systems 16, pp. 321–328. MIT Press, Cambridge, MA (2004)
 (114) Zhou, D., Schölkopf, B.: A regularization framework for learning from graph data. In: Workshop on Statistical Relational Learning. International Conference on Machine Learning, Banff, Canada (2004)
 (115) Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. CMU CALD Tech Report CMU-CALD-02-107 (2002)
 (116) Zhu, X., Ghahramani, Z., Lafferty, J.: Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the 20th International Conference on Machine Learning, pp. 912–919 (2003)
 (117) Zhuang, C., Ma, Q.: Dual graph convolutional networks for graph-based semi-supervised classification. In: Proceedings of the 2018 World Wide Web Conference, pp. 499–508 (2018)