Masaaki Imaizumi



  • On Random Subsampling of Gaussian Process Regression: A Graphon-Based Analysis

    In this paper, we study random subsampling of Gaussian process regression, one of the simplest approximation baselines, from a theoretical perspective. Although subsampling discards a large part of the training data, we show provable guarantees on the accuracy of the predictive mean/variance and on its generalization ability. For the analysis, we consider embedding kernel matrices into graphons, which encapsulate differences in sample size and enable us to evaluate the approximation and generalization errors in a unified manner. The experimental results show that the subsampling approximation achieves a better trade-off between accuracy and runtime than the Nyström and random Fourier expansion methods.

    01/28/2019 ∙ by Kohei Hayashi, et al.

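    As a rough illustration of the subsampling baseline described in this abstract, the sketch below (plain Python, no external libraries) fits an exact GP on a uniformly random subset of m training points and compares its predictive mean and variance at one test input with those of the full-data GP. The RBF kernel, noise variance, lengthscale, synthetic sine data, and the function names (`gp_predict`, `subsampled_gp_predict`) are hypothetical choices for illustration, not the authors' experimental setup, and the graphon-based analysis is not reproduced.

```python
import math
import random


def rbf(x, y, lengthscale=1.0):
    # Squared-exponential (RBF) kernel on scalar inputs (hypothetical choice).
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def cholesky(A):
    # Lower-triangular L with A = L L^T (A must be symmetric positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    # Forward substitution: solve L y = b.
    y = []
    for i in range(len(L)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    # Back substitution: solve L^T x = y using the lower factor L.
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / L[i][i]
    return x


def gp_predict(X, Y, x_star, noise_var=0.1, lengthscale=1.0):
    # Exact GP posterior mean and variance of the latent function at x_star.
    n = len(X)
    K = [[rbf(X[i], X[j], lengthscale) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, Y))  # alpha = K^{-1} Y
    k_star = [rbf(x, x_star, lengthscale) for x in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)  # v^T v = k_star^T K^{-1} k_star
    var = rbf(x_star, x_star, lengthscale) - sum(vi * vi for vi in v)
    return mean, var


def subsampled_gp_predict(X, Y, x_star, m, seed=0, **kwargs):
    # Keep a uniformly random subset of m training points, then run exact GP.
    idx = random.Random(seed).sample(range(len(X)), m)
    return gp_predict([X[i] for i in idx], [Y[i] for i in idx], x_star, **kwargs)


# Usage on synthetic data: y = sin(x) + Gaussian noise with std 0.3.
rng = random.Random(42)
X = [4.0 * i / 120 for i in range(120)]
Y = [math.sin(x) + rng.gauss(0.0, 0.3) for x in X]
mean_full, var_full = gp_predict(X, Y, 1.0)
mean_sub, var_sub = subsampled_gp_predict(X, Y, 1.0, m=30)
print(f"full: mean={mean_full:.3f} var={var_full:.4f}")
print(f"sub : mean={mean_sub:.3f} var={var_sub:.4f}")
```

    Because the subsample is a subset of the full data under the same kernel and noise model, the subsampled predictive variance can never be smaller than the full-data one, while the cost of the exact solve drops from O(n^3) to O(m^3).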

  • On Tensor Train Rank Minimization: Statistical Efficiency and Scalable Algorithm

    Tensor train (TT) decomposition provides a space-efficient representation for higher-order tensors. Despite this advantage, we face two crucial limitations when applying TT decomposition to machine learning problems: the lack of statistical theory and the lack of scalable algorithms. In this paper, we address these limitations. First, we introduce a convex relaxation of the TT decomposition problem and derive its error bound for the tensor completion task. Next, we develop an alternating optimization method with a randomization technique whose time complexity is as efficient as its space complexity. In experiments, we numerically confirm the derived bounds and empirically demonstrate the performance of our method on a real higher-order tensor.

    08/01/2017 ∙ by Masaaki Imaizumi, et al.

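    To make the space-efficiency claim concrete, here is a minimal sketch assuming the standard TT format, in which the k-th core has shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1, so that each tensor entry is a product of matrix slices of the cores. It only evaluates entries and counts parameters of randomly generated cores. It does not implement the paper's convex relaxation or randomized alternating optimization, and the function names are hypothetical.

```python
import math
import random


def random_tt_cores(dims, ranks, seed=0):
    # Random TT cores; core k has shape (r_{k-1}, n_k, r_k), with r_0 = r_d = 1.
    rng = random.Random(seed)
    r = [1] + list(ranks) + [1]
    cores = []
    for k, n in enumerate(dims):
        core = [[[rng.gauss(0.0, 1.0) for _ in range(r[k + 1])]
                 for _ in range(n)]
                for _ in range(r[k])]
        cores.append(core)
    return cores


def tt_entry(cores, index):
    # T[i_1, ..., i_d] = G_1[:, i_1, :] G_2[:, i_2, :] ... G_d[:, i_d, :]
    row = [1.0]  # running 1 x r_k row vector
    for core, i in zip(cores, index):
        r_next = len(core[0][0])
        row = [sum(row[a] * core[a][i][b] for a in range(len(core)))
               for b in range(r_next)]
    return row[0]


def tt_num_params(dims, ranks):
    # Total number of stored entries across all cores.
    r = [1] + list(ranks) + [1]
    return sum(r[k] * n * r[k + 1] for k, n in enumerate(dims))


dims, ranks = (10,) * 6, (3,) * 5
print("TT parameters:", tt_num_params(dims, ranks))  # 420
print("full tensor entries:", math.prod(dims))     # 1000000
cores = random_tt_cores(dims, ranks)
print("T[0,1,2,3,4,5] =", tt_entry(cores, (0, 1, 2, 3, 4, 5)))
```

    With six modes of size 10 and all TT ranks equal to 3, the cores store 420 numbers instead of the 10^6 entries of the full tensor; in general, TT storage grows linearly in the order d rather than exponentially.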

  • Consistent Nonparametric Different-Feature Selection via the Sparsest k-Subgraph Problem

    Two-sample feature selection is the problem of finding features that describe a difference between two probability distributions, which is a ubiquitous problem in both scientific and engineering studies. However, existing methods have limited applicability because of their restrictive assumptions on data distributions or their computational difficulty. In this paper, we resolve these difficulties by formulating the problem as a sparsest k-subgraph problem. The proposed method is nonparametric and does not assume any specific parametric model for the data distributions. We show that the proposed method is computationally efficient and does not require any extra computation for model selection. Moreover, we prove that the proposed method provides a consistent estimator of features under mild conditions. Our experimental results show that the proposed method outperforms the current method with regard to both accuracy and computation time.

    07/31/2017 ∙ by Satoshi Hara, et al.

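    For readers unfamiliar with the combinatorial problem named in this abstract, the sketch below states the generic sparsest k-subgraph problem by exhaustive search: among all k-vertex subsets of an edge-weighted graph, pick one whose internal edge weight is smallest. How the paper builds the graph from the two samples, and how it solves the problem efficiently, are not shown. The weight matrix and function name are hypothetical, and brute force is feasible only for very small graphs.

```python
from itertools import combinations


def sparsest_k_subgraph(weights, k):
    # Exhaustive search: choose k vertices minimizing the total weight of
    # edges among them. Exponential in n, so only suitable for tiny graphs.
    n = len(weights)
    best_set, best_cost = None, float("inf")
    for subset in combinations(range(n), k):
        cost = sum(weights[i][j] for i, j in combinations(subset, 2))
        if cost < best_cost:
            best_set, best_cost = subset, cost
    return best_set, best_cost


# Hypothetical symmetric weight matrix: vertices {0, 1} and {2, 3} form
# two heavily connected pairs, with light edges between the pairs.
W = [
    [0, 5, 1, 1],
    [5, 0, 1, 1],
    [1, 1, 0, 5],
    [1, 1, 5, 0],
]
print(sparsest_k_subgraph(W, 2))  # ((0, 2), 1)
```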

  • Doubly Decomposing Nonparametric Tensor Regression

    A nonparametric extension of tensor regression is proposed. Nonlinearity in a high-dimensional tensor space is broken into simple local functions by incorporating low-rank tensor decomposition. Compared to naive nonparametric approaches, our formulation considerably improves the convergence rate of estimation while maintaining consistency with the same function class under specific conditions. To estimate the local functions, we develop a Bayesian estimator with a Gaussian process prior. Experimental results confirm its theoretical properties and show high performance in predicting a summary statistic of a real complex network.

    06/19/2015 ∙ by Masaaki Imaizumi, et al.


  • Deep Neural Networks Learn Non-Smooth Functions Effectively

    We theoretically discuss why deep neural networks (DNNs) perform better than other models in some cases by investigating statistical properties of DNNs for non-smooth functions. While DNNs have empirically shown higher performance than other standard methods, understanding their mechanism is still a challenging problem. From the perspective of statistical theory, it is known that many standard methods attain optimal convergence rates, and thus it has been difficult to find theoretical advantages of DNNs. This paper fills this gap by considering the learning of a certain class of non-smooth functions, which was not covered by previous theory. We derive convergence rates of estimators by DNNs with a ReLU activation, and show that the estimators by DNNs are almost optimal for estimating the non-smooth functions, while some popular models do not attain the optimal rate. In addition, our theoretical result provides guidelines for selecting an appropriate number of layers and edges of DNNs. We provide numerical experiments to support the theoretical results.

    02/13/2018 ∙ by Masaaki Imaizumi, et al.


  • Adaptive Approximation and Estimation of Deep Neural Network to Intrinsic Dimensionality

    We theoretically prove that the generalization performance of deep neural networks (DNNs) is mainly determined by an intrinsic low-dimensional structure of data. DNNs have recently shown outstanding empirical performance in various machine learning applications. Motivated by this success, numerous studies are actively investigating theoretical properties of DNNs (e.g., the generalization error) to understand their mechanism. In particular, how DNNs behave with high-dimensional data is one of the most important concerns. However, the problem has not been sufficiently investigated from the perspective of data characteristics, even though high-dimensional data are frequently observed to have an intrinsic low-dimensionality in practice. In this paper, to clarify the connection between DNNs and such data, we derive bounds for approximation and generalization errors by DNNs with intrinsic low-dimensional data. To this end, we introduce a general notion of an intrinsic dimension and develop a novel proof technique to evaluate the errors. Consequently, we show that the convergence rates of the errors by DNNs do not depend on the nominal high-dimensionality of data, but on the lower intrinsic dimension. We also show that the rate is optimal in the minimax sense. Furthermore, we find that DNNs with more layers can handle a broader class of intrinsic low-dimensional data. We conduct a numerical simulation to validate that (i) the intrinsic dimension of data affects the generalization error of DNNs, and (ii) DNNs outperform other nonparametric estimators that are also adaptive to the intrinsic dimension.

    07/04/2019 ∙ by Ryumei Nakada, et al.
