Yuichi Yoshida


  • On Random Subsampling of Gaussian Process Regression: A Graphon-Based Analysis

    In this paper, we study random subsampling of Gaussian process regression, one of the simplest approximation baselines, from a theoretical perspective. Although subsampling discards a large part of the training data, we show provable guarantees on the accuracy of the predictive mean/variance and its generalization ability. For the analysis, we consider embedding kernel matrices into graphons, which encapsulate differences in sample size and enable us to evaluate the approximation and generalization errors in a unified manner. The experimental results show that the subsampling approximation achieves a better trade-off between accuracy and runtime than the Nyström and random Fourier expansion methods.

    01/28/2019 ∙ by Kohei Hayashi, et al.

  • Spectral Norm Regularization for Improving the Generalizability of Deep Learning

    We investigate the generalizability of deep learning based on its sensitivity to input perturbation. We hypothesize that high sensitivity to data perturbation degrades performance on unseen data. To reduce this sensitivity, we propose a simple and effective regularization method, referred to as spectral norm regularization, which penalizes the spectral norms of the weight matrices in neural networks. We provide supportive evidence for this hypothesis by experimentally confirming that models trained using spectral norm regularization generalize better than those trained with baseline regularization methods.

    05/31/2017 ∙ by Yuichi Yoshida, et al.
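
    A minimal sketch of the idea in plain NumPy: the spectral norm of a weight matrix can be estimated by power iteration, and a penalty proportional to its square can be added to the training loss. The regularization weight `lam` and matrix shape are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_norm(W, n_iter=100):
    # power iteration: estimate the largest singular value of W
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return u @ W @ v

W = rng.standard_normal((64, 32))        # a weight matrix of one layer
sigma = spectral_norm(W)
exact = np.linalg.svd(W, compute_uv=False)[0]
print(sigma, exact)                      # both estimate sigma_max(W)

# the penalty added to the training loss for this layer
lam = 0.01
penalty = 0.5 * lam * sigma ** 2
```

    Power iteration makes the penalty cheap to compute per training step, since a full SVD of each weight matrix would be far more expensive.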

  • Guaranteed Sufficient Decrease for Variance Reduced Stochastic Gradient Descent

    In this paper, we propose a novel sufficient decrease technique for variance reduced stochastic gradient descent methods such as SAG, SVRG, and SAGA. To guarantee sufficient decrease in stochastic optimization, we design a new sufficient decrease criterion, which yields sufficient decrease versions of variance reduction algorithms, such as SVRG-SD and SAGA-SD, as a byproduct. We introduce a coefficient that scales the current iterate to satisfy the sufficient decrease property, deciding whether to shrink, expand, or move in the opposite direction, and then give two specific update rules for the coefficient for Lasso and ridge regression. Moreover, we analyze the convergence properties of our algorithms for strongly convex problems, showing that both algorithms attain linear convergence rates. We also provide convergence guarantees for non-strongly convex problems. Our experimental results further verify that our algorithms achieve significantly better performance than their counterparts.

    03/20/2017 ∙ by Fanhua Shang, et al.

  • Minimizing Quadratic Functions in Constant Time

    A sampling-based optimization method for quadratic functions is proposed. Our method approximately solves the following n-dimensional quadratic minimization problem in constant time, which is independent of n: z^* = min_{v ∈ R^n} 〈v, A v〉 + n〈v, diag(d)v〉 + n〈b, v〉, where A ∈ R^{n × n} is a matrix and d, b ∈ R^n are vectors. Our theoretical analysis specifies the number of samples k(δ, ϵ) such that the approximate solution z satisfies |z - z^*| = O(ϵ n^2) with probability 1-δ. The empirical performance (accuracy and runtime) is positively confirmed by numerical experiments.

    08/25/2016 ∙ by Kohei Hayashi, et al.
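
    The sampling idea can be sketched as follows. The (n/k)^2 rescaling of the subsampled objective is our reading of the O(ϵ n^2) error guarantee, and the strongly convex instance is an illustrative assumption so the minimum exists in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# a strongly convex instance so the minimum is finite
G = rng.standard_normal((n, n)) / np.sqrt(n)
A = G.T @ G                       # positive semidefinite
d = rng.uniform(1.0, 2.0, n)
b = rng.standard_normal(n)

def quad_min(A, d, b):
    # minimize <v, A v> + n<v, diag(d) v> + n<b, v> in closed form:
    # the gradient 2(A + n diag(d)) v + n b vanishes at the minimizer
    n = len(b)
    v = np.linalg.solve(2 * (A + n * np.diag(d)), -n * b)
    return v @ A @ v + n * (v * d) @ v + n * (b @ v)

z_full = quad_min(A, d, b)

# estimate from the subproblem induced on k randomly sampled indices
k = 200
S = rng.choice(n, size=k, replace=False)
z_est = (n / k) ** 2 * quad_min(A[np.ix_(S, S)], d[S], b[S])

print(z_full, z_est)   # agreement up to an additive O(eps * n^2) error
```

    Only the k × k submatrix of A is touched, which is why the runtime is independent of n.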

  • Using k-way Co-occurrences for Learning Word Embeddings

    Co-occurrences between two words provide useful insights into the semantics of those words. Consequently, much prior work on word embedding learning has used co-occurrences between two words as the training signal. However, in natural language texts it is common for multiple words to be related and co-occurring in the same context. We extend the notion of co-occurrences to cover k(≥2)-way co-occurrences among a set of k words. Specifically, we prove a theoretical relationship between the joint probability of k(≥2) words and the sum of ℓ_2 norms of their embeddings. Next, we propose a learning objective motivated by this theoretical result that utilises k-way co-occurrences for learning word embeddings. Our experimental results show that the derived theoretical relationship does indeed hold empirically, and that, despite data sparsity, for some smaller values of k, k-way embeddings perform comparably to or better than 2-way embeddings in a range of tasks.

    09/05/2017 ∙ by Danushka Bollegala, et al.
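
    A minimal sketch of counting k-way co-occurrences with a sliding context window; the window size and the counting convention (unordered tuples of distinct words per window) are illustrative assumptions, not the paper's exact protocol:

```python
from collections import Counter
from itertools import combinations

def kway_cooccurrences(tokens, k=3, window=4):
    # count unordered k-tuples of distinct words sharing a context window
    counts = Counter()
    for i in range(len(tokens) - window + 1):
        ctx = sorted(set(tokens[i:i + window]))
        for combo in combinations(ctx, k):
            counts[combo] += 1
    return counts

corpus = "the cat sat on the mat the dog sat on the mat".split()
c3 = kway_cooccurrences(corpus, k=3, window=4)
print(c3.most_common(3))
```

    The number of possible k-tuples grows combinatorially with k, which is the data-sparsity issue the abstract mentions for larger k.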

  • Learning Word Representations from Relational Graphs

    Attributes of words and relations between two words are central to numerous tasks in Artificial Intelligence such as knowledge representation, similarity measurement, and analogy detection. Often, when two words share one or more attributes, they are connected by some semantic relation. Conversely, if there are numerous semantic relations between two words, we can expect some of the attributes of one word to be inherited by the other. Motivated by this close connection between attributes and relations, given a relational graph in which words are interconnected via numerous semantic relations, we propose a method to learn a latent representation for the individual words. The proposed method considers not only the co-occurrences of words, as done by existing approaches for word representation learning, but also the semantic relations in which two words co-occur. To evaluate the accuracy of the word representations learnt using the proposed method, we use them to solve semantic word analogy problems. Our experimental results show that it is possible to learn better word representations by using semantic relations between words.

    12/07/2014 ∙ by Danushka Bollegala, et al.

  • Spectral Normalization for Generative Adversarial Networks

    One of the challenges in the study of generative adversarial networks is the instability of their training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on the CIFAR10, STL-10, and ILSVRC2012 datasets, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) are capable of generating images of better or equal quality relative to previous training stabilization techniques.

    02/16/2018 ∙ by Takeru Miyato, et al.
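
    The normalization itself can be sketched in a few lines of NumPy: divide W by an estimate of its largest singular value maintained by a persistent power-iteration vector, so the normalized matrix has spectral norm close to one. During training one step per update suffices because W changes slowly; here we iterate to convergence for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)

class SpectralNorm:
    """Divide a weight matrix by its largest singular value, estimated
    with a persistent power-iteration vector (re-used across calls)."""

    def __init__(self, W):
        self.W = W
        self.u = rng.standard_normal(W.shape[0])

    def __call__(self):
        # one power-iteration step refines the estimate of sigma_max(W)
        v = self.W.T @ self.u
        v /= np.linalg.norm(v)
        self.u = self.W @ v
        self.u /= np.linalg.norm(self.u)
        sigma = self.u @ self.W @ v
        return self.W / sigma

W = rng.standard_normal((128, 64))
sn = SpectralNorm(W)
for _ in range(200):
    W_sn = sn()

# the normalized matrix has spectral norm approximately 1
print(np.linalg.svd(W_sn, compute_uv=False)[0])
```

    Constraining the spectral norm of each discriminator layer bounds its Lipschitz constant, which is what stabilizes training.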

  • Guaranteed Sufficient Decrease for Stochastic Variance Reduced Gradient Optimization

    In this paper, we propose a novel sufficient decrease technique for stochastic variance reduced gradient descent methods such as SVRG and SAGA. To guarantee sufficient decrease in stochastic optimization, we design a new sufficient decrease criterion, which yields sufficient decrease versions of stochastic variance reduction algorithms, such as SVRG-SD and SAGA-SD, as a byproduct. We introduce a coefficient that scales the current iterate to satisfy the sufficient decrease property, deciding whether to shrink, expand, or even move in the opposite direction, and then give two specific update rules for the coefficient for Lasso and ridge regression. Moreover, we analyze the convergence properties of our algorithms for strongly convex problems, showing that they attain linear convergence rates. We also provide convergence guarantees for non-strongly convex problems. Our experimental results further verify that our algorithms achieve significantly better performance than their counterparts.

    02/26/2018 ∙ by Fanhua Shang, et al.
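
    For context, a plain SVRG baseline (without the sufficient-decrease coefficient the paper adds) on ridge regression can be sketched as follows; the data, step size, and epoch count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# ridge regression: f(w) = (1/n) sum_i (x_i.w - y_i)^2 + (mu/2)||w||^2
n, p, mu = 200, 10, 0.1
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.01 * rng.standard_normal(n)

def grad_i(w, i):
    # stochastic gradient at a single data point
    return 2 * (X[i] @ w - y[i]) * X[i] + mu * w

def full_grad(w):
    return 2 * X.T @ (X @ w - y) / n + mu * w

w, eta = np.zeros(p), 0.005
for epoch in range(50):
    w_snap = w.copy()             # snapshot point
    g_snap = full_grad(w_snap)    # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        # variance-reduced stochastic step
        w -= eta * (grad_i(w, i) - grad_i(w_snap, i) + g_snap)

# closed-form ridge solution for comparison
w_star = np.linalg.solve(2 * X.T @ X / n + mu * np.eye(p), 2 * X.T @ y / n)
print(np.linalg.norm(w - w_star))
```

    The correction term grad_i(w_snap, i) - g_snap is what shrinks the gradient variance and enables the linear convergence rate the abstract refers to; the paper's SVRG-SD additionally scales the iterate to enforce sufficient decrease.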

  • Polynomial-Time Algorithms for Submodular Laplacian Systems

    Let G = (V, E) be an undirected graph, L_G ∈ R^{V × V} be the associated Laplacian matrix, and b ∈ R^V be a vector. Solving the Laplacian system L_G x = b has numerous applications in theoretical computer science, machine learning, and network analysis. Recently, the notion of the Laplacian operator L_F: R^V → 2^{R^V} for a submodular transformation F: 2^V → R_+^E was introduced, which can handle undirected graphs, directed graphs, hypergraphs, and joint distributions in a unified manner. In this study, we show that the submodular Laplacian system L_F(x) ∋ b can be solved in polynomial time. Furthermore, we prove that even when the submodular Laplacian system has no solution, its regression form can be solved in polynomial time. Finally, we discuss potential applications of submodular Laplacian systems in machine learning and network analysis.

    03/29/2018 ∙ by Kaito Fujii, et al.
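
    For the classical special case that the submodular operator generalizes, solving L_G x = b for an undirected graph reduces to ordinary linear algebra; a sketch (the graph and right-hand side are illustrative, and b must sum to zero on each connected component for a solution to exist):

```python
import numpy as np

# build the Laplacian of a small undirected graph: L = D - A
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
L = np.zeros((n, n))
for u, v in edges:
    L[u, u] += 1; L[v, v] += 1     # degree terms
    L[u, v] -= 1; L[v, u] -= 1     # adjacency terms

b = np.array([1.0, -1.0, 2.0, -2.0])       # sums to zero, so solvable
x = np.linalg.lstsq(L, b, rcond=None)[0]   # pseudoinverse solve (L is singular)
print(np.allclose(L @ x, b))
```

    The submodular version replaces the fixed matrix L_G with a set-valued operator L_F, which is what lets the same framework cover directed graphs and hypergraphs.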

  • Sublinear-Time Quadratic Minimization via Spectral Decomposition of Matrices

    We design a sublinear-time approximation algorithm for quadratic function minimization problems with a better error bound than the previous algorithm by Hayashi and Yoshida (NIPS'16). Our approximation algorithm can be modified to handle the case where the minimization is done over a sphere. The analysis of our algorithms combines results from graph limit theory with a novel spectral decomposition of matrices. Specifically, we prove that a matrix A can be decomposed into a structured part and a pseudorandom part, where the structured part is a block matrix with a polylogarithmic number of blocks, such that within each block all the entries are the same, and the pseudorandom part has a small spectral norm, achieving a better error bound than the existing decomposition theorem of Frieze and Kannan (FOCS'96). As an additional application of the decomposition theorem, we give a sublinear-time approximation algorithm for computing the top singular values of a matrix.

    06/27/2018 ∙ by Amit Levi, et al.

  • Canonical and Compact Point Cloud Representation for Shape Classification

    We present a novel compact point cloud representation that is inherently invariant to scale, coordinate change, and point permutation. The key idea is to parametrize the distance field around an individual shape into a unique, canonical, and compact vector in an unsupervised manner. We first project the distance field to a 4D canonical space using singular value decomposition. We then train a neural network for each instance to non-linearly embed its distance field into the network parameters. We employ a bias-free Extreme Learning Machine (ELM) with ReLU activation units, which has a scale-factor commutative property between layers. We demonstrate the descriptiveness of the instance-wise, shape-embedded network parameters by using them to classify shapes in 3D datasets. Our learning-based representation requires minimal augmentation and simple neural networks, whereas previous approaches demand numerous representations to handle coordinate change and point permutation.

    09/13/2018 ∙ by Kent Fujiwara, et al.
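
    The SVD-based canonicalization step can be sketched directly on raw points rather than a distance field (a simplifying assumption): centering, scale normalization, and principal-axis alignment make the output invariant to translation, uniform scale, rotation, and point permutation, up to per-axis sign flips:

```python
import numpy as np

rng = np.random.default_rng(0)

def canonicalize(P):
    # center, normalize scale, and align principal axes via SVD
    Q = P - P.mean(axis=0)
    Q = Q / np.linalg.norm(Q)            # scale invariance (Frobenius norm)
    _, _, Vt = np.linalg.svd(Q, full_matrices=False)
    Q = Q @ Vt.T                         # rotate into the principal-axis frame
    signs = np.sign(np.sum(Q ** 3, axis=0))   # resolve per-axis sign ambiguity
    signs[signs == 0] = 1.0
    return Q * signs

P = rng.standard_normal((200, 3))

# a rotated, scaled, translated, and shuffled copy of the same cloud
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
P2 = (2.5 * P @ R.T + np.array([1.0, -2.0, 0.5]))[rng.permutation(200)]

# compare as point sets by sorting each coordinate column
C1 = np.sort(canonicalize(P), axis=0)
C2 = np.sort(canonicalize(P2), axis=0)
print(np.allclose(C1, C2, atol=1e-5))
```

    Because both transformed copies map to the same canonical frame, any downstream classifier sees a representation that is independent of the original coordinate system.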