
On Random Subsampling of Gaussian Process Regression: A Graphon-Based Analysis
In this paper, we study random subsampling of Gaussian process regression, one of the simplest approximation baselines, from a theoretical perspective. Although subsampling discards a large part of the training data, we show provable guarantees on the accuracy of the predictive mean/variance and its generalization ability. For the analysis, we embed kernel matrices into graphons, which absorb differences in sample size and enable us to evaluate the approximation and generalization errors in a unified manner. The experimental results show that the subsampling approximation achieves a better trade-off between accuracy and runtime than the Nyström and random Fourier expansion methods.
01/28/2019 ∙ by Kohei Hayashi, et al.
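A minimal numpy sketch of the baseline the paper analyzes, assuming an RBF kernel and illustrative hyperparameters (this is not the authors' code): fit an exact GP on m uniformly subsampled training points and compare its predictive mean with the full model's.

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, lengthscale=1.0, noise=1e-2):
    """Exact GP regression with an RBF kernel; returns predictive mean and variance."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * lengthscale ** 2))

    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_test, X_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, var

def subsampled_gp_predict(X_train, y_train, X_test, m, rng):
    """Random-subsampling baseline: run the exact GP on m uniformly chosen points."""
    idx = rng.choice(len(X_train), size=m, replace=False)
    return gp_predict(X_train[idx], y_train[idx], X_test)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(500)
X_test = np.linspace(-3, 3, 50)[:, None]
mean_full, _ = gp_predict(X, y, X_test)
mean_sub, _ = subsampled_gp_predict(X, y, X_test, m=100, rng=rng)
err = float(np.max(np.abs(mean_full - mean_sub)))
```

On this toy sine task, the 100-point subsample should track the 500-point predictive mean closely while solving a far smaller linear system; that accuracy/runtime trade-off is what the graphon analysis quantifies.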

Spectral Norm Regularization for Improving the Generalizability of Deep Learning
We investigate the generalizability of deep learning based on the sensitivity to input perturbation. We hypothesize that high sensitivity to perturbations of the data degrades performance on that data. To reduce this sensitivity, we propose a simple and effective regularization method, referred to as spectral norm regularization, which penalizes the spectral norms of the weight matrices in neural networks. We provide supportive evidence for the above-mentioned hypothesis by experimentally confirming that models trained with spectral norm regularization exhibit better generalizability than baseline methods.
05/31/2017 ∙ by Yuichi Yoshida, et al.
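The penalty itself is simple to state: the sum of squared spectral norms of the layer weights, each estimable by power iteration. A minimal sketch (function names are illustrative, not the paper's code):

```python
import numpy as np

def spectral_norm_sq(W, n_iters=50, rng=None):
    """Estimate sigma_max(W)^2 by power iteration on W^T W."""
    rng = rng or np.random.default_rng(0)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        v = W.T @ (W @ v)          # one step of power iteration on W^T W
        v /= np.linalg.norm(v)
    u = W @ v
    return float(u @ u)            # ||W v||^2 -> sigma_max^2 at convergence

def spectral_norm_penalty(weights, lam):
    """The regularizer: lam/2 times the sum of squared spectral norms of all layers."""
    return 0.5 * lam * sum(spectral_norm_sq(W) for W in weights)
```

In training, this penalty would be added to the task loss; penalizing the largest singular value bounds how much each layer can amplify an input perturbation.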

Guaranteed Sufficient Decrease for Variance Reduced Stochastic Gradient Descent
In this paper, we propose a novel sufficient decrease technique for variance reduced stochastic gradient descent methods such as SAG, SVRG, and SAGA. To achieve sufficient decrease in stochastic optimization, we design a new sufficient decrease criterion, which yields sufficient decrease versions of variance reduction algorithms such as SVRG-SD and SAGA-SD as a byproduct. We introduce a coefficient that scales the current iterate so as to satisfy the sufficient decrease property; it decides whether to shrink, expand, or move in the opposite direction, and we give two specific update rules for the coefficient for Lasso and ridge regression. Moreover, we analyze the convergence properties of our algorithms for strongly convex problems, showing that both algorithms attain linear convergence rates. We also provide convergence guarantees for non-strongly convex problems. Our experimental results further verify that our algorithms achieve significantly better performance than their counterparts.
03/20/2017 ∙ by Fanhua Shang, et al.
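For context, the underlying SVRG baseline that the sufficient-decrease coefficient modifies can be sketched as follows for ridge regression; the sketch deliberately omits the paper's coefficient update and is only a generic SVRG loop (names and step sizes are illustrative):

```python
import numpy as np

def svrg_ridge(A, b, lam, step=0.02, epochs=60, rng=None):
    """Minimal SVRG sketch for ridge regression:
        min_x (1/2n) ||Ax - b||^2 + (lam/2) ||x||^2.
    SVRG-SD / SAGA-SD additionally rescale each iterate by a coefficient chosen
    to enforce sufficient decrease; this baseline omits that step."""
    rng = rng or np.random.default_rng(0)
    n, d = A.shape
    x = np.zeros(d)
    grad_i = lambda z, i: A[i] * (A[i] @ z - b[i]) + lam * z   # per-sample gradient
    for _ in range(epochs):
        snap = x.copy()                                  # snapshot iterate
        full = A.T @ (A @ snap - b) / n + lam * snap     # full gradient at snapshot
        for _ in range(n):
            i = rng.integers(n)
            # variance-reduced stochastic gradient
            x -= step * (grad_i(x, i) - grad_i(snap, i) + full)
    return x
```

The full gradient at the snapshot anchors each stochastic step, which is what gives variance-reduced methods their linear convergence rate on strongly convex problems.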

Minimizing Quadratic Functions in Constant Time
A sampling-based optimization method for quadratic functions is proposed. Our method approximately solves the following n-dimensional quadratic minimization problem in constant time, which is independent of n: z^* = min_{v ∈ R^n} ⟨v, Av⟩ + n⟨v, diag(d)v⟩ + n⟨b, v⟩, where A ∈ R^{n × n} is a matrix and d, b ∈ R^n are vectors. Our theoretical analysis specifies the number of samples k(δ, ϵ) such that the approximated solution z satisfies |z − z^*| = O(ϵ n^2) with probability 1 − δ. The empirical performance (accuracy and runtime) is positively confirmed by numerical experiments.
08/25/2016 ∙ by Kohei Hayashi, et al.
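The sampling idea can be sketched concretely: solve the same quadratic problem restricted to k random indices and rescale the optimal value by (n/k)^2, since the objective scales quadratically in the dimension. A minimal sketch assuming a positive definite Hessian (illustrative, not the paper's implementation):

```python
import numpy as np

def quad_min_value(A, d, b):
    """Exact minimum of <v, Av> + n<v, diag(d)v> + n<b, v> (assuming a PD Hessian)."""
    n = len(b)
    H = A + A.T + 2.0 * n * np.diag(d)      # Hessian of the objective
    v = np.linalg.solve(H, -n * b)          # stationary point
    return float(v @ A @ v + n * v @ (d * v) + n * b @ v)

def sampled_quad_min(A, d, b, k, rng):
    """Constant-time estimate: solve a k x k subproblem on random indices
    and rescale the optimal value by (n/k)^2."""
    n = len(b)
    S = rng.choice(n, size=k, replace=False)
    return (n / k) ** 2 * quad_min_value(A[np.ix_(S, S)], d[S], b[S])
```

On a highly structured instance (all entries equal), the rescaled subproblem value matches the exact minimum; the paper's analysis bounds the error for general inputs via graphon arguments.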

Using k-way Co-occurrences for Learning Word Embeddings
Co-occurrences between two words provide useful insights into the semantics of those words. Consequently, numerous prior works on word-embedding learning have used co-occurrences between two words as the training signal. However, in natural language texts it is common for multiple words to be related and to co-occur in the same context. We extend the notion of co-occurrences to cover k(≥2)-way co-occurrences among a set of k words. Specifically, we prove a theoretical relationship between the joint probability of k(≥2) words and the sum of the ℓ_2 norms of their embeddings. Next, we propose a learning objective, motivated by our theoretical result, that utilises k-way co-occurrences for learning word embeddings. Our experimental results show that the derived theoretical relationship does indeed hold empirically and that, despite data sparsity, for some smaller k values, k-way embeddings perform comparably to or better than 2-way embeddings in a range of tasks.
09/05/2017 ∙ by Danushka Bollegala, et al.
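A k-way co-occurrence statistic generalizes the familiar word-pair count: instead of counting pairs inside a context window, one counts unordered k-sets. A minimal counting sketch (the windowing convention is an assumption, not the paper's exact preprocessing):

```python
from collections import Counter
from itertools import combinations

def kway_cooccurrences(corpus, k, window=5):
    """Count k-way co-occurrences: unordered sets of k distinct words that
    appear together inside a sliding window over each sentence."""
    counts = Counter()
    for tokens in corpus:
        for i in range(len(tokens)):
            context = sorted(set(tokens[i:i + window]))
            for combo in combinations(context, k):
                counts[combo] += 1
    return counts
```

With k=2 this recovers ordinary pairwise co-occurrence counts; larger k yields the sparser higher-order statistics whose joint probabilities the paper relates to embedding norms.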

Learning Word Representations from Relational Graphs
Attributes of words and relations between two words are central to numerous tasks in Artificial Intelligence such as knowledge representation, similarity measurement, and analogy detection. Often when two words share one or more attributes in common, they are connected by some semantic relations. On the other hand, if there are numerous semantic relations between two words, we can expect some of the attributes of one word to be inherited by the other. Motivated by this close connection between attributes and relations, given a relational graph in which words are interconnected via numerous semantic relations, we propose a method to learn a latent representation for the individual words. The proposed method considers not only the co-occurrences of words, as done by existing approaches for word representation learning, but also the semantic relations in which two words co-occur. To evaluate the accuracy of the word representations learnt using the proposed method, we use them to solve semantic word analogy problems. Our experimental results show that it is possible to learn better word representations by using the semantic relations between words.
12/07/2014 ∙ by Danushka Bollegala, et al.

Spectral Normalization for Generative Adversarial Networks
One of the challenges in the study of generative adversarial networks is the instability of their training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on the CIFAR-10, STL-10, and ILSVRC2012 datasets, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) are capable of generating images of better or equal quality relative to previous training stabilization techniques.
02/16/2018 ∙ by Takeru Miyato, et al.
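Unlike the regularization approach above, spectral normalization divides each weight matrix by its largest singular value, constraining the layer's Lipschitz constant to 1. A minimal numpy sketch of the idea using power iteration (illustrative; the paper's implementation reuses the power-iteration vectors across training steps):

```python
import numpy as np

def spectral_normalize(W, n_iters=30, rng=None):
    """Divide W by a power-iteration estimate of its largest singular value."""
    rng = rng or np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iters):
        v = W.T @ u                 # alternate left/right singular-vector updates
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = float(u @ W @ v)        # estimate of sigma_max(W)
    return W / sigma
```

Applying this to every discriminator layer keeps the product of layer-wise spectral norms bounded, which is the mechanism behind the observed training stability.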

Guaranteed Sufficient Decrease for Stochastic Variance Reduced Gradient Optimization
In this paper, we propose a novel sufficient decrease technique for stochastic variance reduced gradient descent methods such as SVRG and SAGA. To achieve sufficient decrease in stochastic optimization, we design a new sufficient decrease criterion, which yields sufficient decrease versions of stochastic variance reduction algorithms such as SVRG-SD and SAGA-SD as a byproduct. We introduce a coefficient that scales the current iterate so as to satisfy the sufficient decrease property; it decides whether to shrink, expand, or even move in the opposite direction, and we give two specific update rules for the coefficient for Lasso and ridge regression. Moreover, we analyze the convergence properties of our algorithms for strongly convex problems, showing that our algorithms attain linear convergence rates. We also provide convergence guarantees for non-strongly convex problems. Our experimental results further verify that our algorithms achieve significantly better performance than their counterparts.
02/26/2018 ∙ by Fanhua Shang, et al.

Polynomial-Time Algorithms for Submodular Laplacian Systems
Let G=(V,E) be an undirected graph, L_G ∈ R^{V × V} the associated Laplacian matrix, and b ∈ R^V a vector. Solving the Laplacian system L_G x = b has numerous applications in theoretical computer science, machine learning, and network analysis. Recently, the notion of the Laplacian operator L_F: R^V → 2^{R^V} for a submodular transformation F: 2^V → R_+^E was introduced, which can handle undirected graphs, directed graphs, hypergraphs, and joint distributions in a unified manner. In this study, we show that the submodular Laplacian system L_F(x) ∋ b can be solved in polynomial time. Furthermore, we prove that even when the submodular Laplacian system has no solution, its regression form can be solved in polynomial time. Finally, we discuss potential applications of submodular Laplacian systems in machine learning and network analysis.
03/29/2018 ∙ by Kaito Fujii, et al.
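For the classical graph case that the submodular setting generalizes, a Laplacian system and its regression form can be sketched directly; this is the ordinary L = D − A system, not the submodular operator (which is set-valued and needs the paper's machinery):

```python
import numpy as np

def graph_laplacian(n, edges):
    """Laplacian L = D - A of an undirected graph on n vertices."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L

def solve_laplacian(L, b):
    """Least-squares solution of Lx = b. L is singular (constant vectors span
    its kernel), so lstsq returns the minimum-norm / regression solution."""
    x, *_ = np.linalg.lstsq(L, b, rcond=None)
    return x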

Sublinear-Time Quadratic Minimization via Spectral Decomposition of Matrices
We design a sublinear-time approximation algorithm for quadratic function minimization problems with a better error bound than the previous algorithm by Hayashi and Yoshida (NIPS'16). Our approximation algorithm can be modified to handle the case where the minimization is done over a sphere. The analysis of our algorithms combines results from graph limit theory with a novel spectral decomposition of matrices. Specifically, we prove that a matrix A can be decomposed into a structured part and a pseudorandom part, where the structured part is a block matrix with a polylogarithmic number of blocks, such that within each block all entries are the same, and the pseudorandom part has a small spectral norm, achieving a better error bound than the existing decomposition theorem of Frieze and Kannan (FOCS'96). As an additional application of the decomposition theorem, we give a sublinear-time approximation algorithm for computing the top singular values of a matrix.
06/27/2018 ∙ by Amit Levi, et al.

Canonical and Compact Point Cloud Representation for Shape Classification
We present a novel compact point cloud representation that is inherently invariant to scale, coordinate change, and point permutation. The key idea is to parametrize the distance field around an individual shape into a unique, canonical, and compact vector in an unsupervised manner. We first project the distance field to a 4D canonical space using singular value decomposition. We then train a neural network for each instance to nonlinearly embed its distance field into the network parameters. We employ a bias-free Extreme Learning Machine (ELM) with ReLU activation units, which has a scale-factor commutative property between layers. We demonstrate the descriptiveness of the instance-wise, shape-embedded network parameters by using them to classify shapes in 3D datasets. Our learning-based representation requires minimal augmentation and simple neural networks, whereas previous approaches demand numerous representations to handle coordinate change and point permutation.
09/13/2018 ∙ by Kent Fujiwara, et al.
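The SVD-based canonicalization step has a simple analogue on raw coordinates, sketched below: centering removes translation, Frobenius normalization removes scale, and aligning principal axes via SVD removes rotation (up to per-axis sign flips). This is an illustrative analogue only; the paper applies the idea to a distance field rather than to the points themselves.

```python
import numpy as np

def canonicalize(points):
    """Map a point cloud to a canonical pose: remove translation (centering),
    global scale (Frobenius normalization), and rotation (SVD alignment of
    principal axes). The result is unique up to per-axis sign flips."""
    P = points - points.mean(axis=0)
    P = P / np.linalg.norm(P)
    _, _, Vt = np.linalg.svd(P, full_matrices=False)
    return P @ Vt.T
```

Two clouds related by a similarity transform map to the same canonical pose up to axis signs, which is the invariance the representation is built on.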