
Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space
Deep learning has achieved notable success in various fields, including ...
Neural Architecture Search Using Stable Rank of Convolutional Layers
In Neural Architecture Search (NAS), Differentiable ARchiTecture Search ...
Generalization bound of globally optimal nonconvex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics
We introduce a new theoretical framework to analyze deep learning optimi...
Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime
We analyze the convergence of the averaged stochastic gradient descent f...
Gradient Descent in RKHS with Importance Labeling
Labeling cost is often expensive and is a fundamental limitation of supe...
When Does Preconditioning Help or Hurt Generalization?
While second order optimizers such as natural gradient descent (NGD) oft...
Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multiscale Graph Neural Networks
It is known that the current graph neural networks (GNNs) are difficult ...
Selective Inference for Latent Block Models
Model selection in latent block models has been a challenging but import...
Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error
The ability to learn new concepts with small amounts of data is a crucia...
Dimension-free convergence rates for gradient Langevin dynamics in RKHS
Gradient Langevin dynamics (GLD) and stochastic GLD (SGLD) have attracte...
Understanding Generalization in Deep Learning via Tensor Methods
Deep neural networks generalize well on unseen data though the number of...
Domain Adaptation Regularization for Spectral Pruning
Deep Neural Networks (DNNs) have recently been achieving state-of-the-ar...
Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features
Although kernel methods are widely used in many learning problems, they ...
Scalable Deep Neural Networks via Low-Rank Matrix Factorization
Compressing deep neural networks (DNNs) is important for real-world appl...
Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space
Deep learning has exhibited superior performance for various tasks, espe...
Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network
One of the biggest issues in deep learning theory is its generalization abil...
Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum
ImageNet pre-training has been regarded as essential for training accura...
Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD
Large-batch stochastic gradient descent (SGD) is widely used for trainin...
Goodness-of-fit Test for Latent Block Models
Latent Block Models are used for probabilistic biclustering, which is sh...
Accelerated Sparsified SGD with Error Feedback
We study a stochastic gradient method for synchronous distributed optimi...
On Asymptotic Behaviors of Graph CNNs from Dynamical Systems Perspective
Graph Convolutional Neural Networks (graph CNNs) are a promising deep le...
Refined Generalization Analysis of Gradient Descent for Over-parameterized Two-layer Neural Networks with Smooth Activations on Classification Problems
Recently, several studies have proven the global convergence and general...
On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces
Deep learning has been applied to various tasks in the field of machine ...
Approximation and Nonparametric Estimation of ResNet-type Convolutional Neural Networks
Convolutional neural networks (CNNs) have been shown to achieve optimal ...
Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks
In recent years, deep neural networks (DNNs) have been applied to variou...
Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality
Deep learning has shown high performances in various types of tasks from...
Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation
We develop new stochastic gradient methods for efficiently solving spars...
Spectral-Pruning: Compressing deep neural network via spectral analysis
The model size of deep neural network is getting larger and larger to re...
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors
We consider stochastic gradient descent for binary classification proble...
Cross-domain Recommendation via Deep Domain Adaptation
The behavior of users in certain services could be a clue that can be us...
Functional Gradient Boosting based on Residual Network Perception
Residual Networks (ResNets) have become state-of-the-art models in deep ...
Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models
We propose a new technique that boosts the convergence of training gener...
Stochastic Particle Gradient Descent for Infinite Ensembles
The superior performance of ensemble methods with infinite models are we...
Independently Interpretable Lasso: A New Regularizer for Sparse Regression with Uncorrelated Variables
Sparse regularization such as ℓ_1 regularization is a quite powerful and...
Fast learning rate of deep learning via a kernel perspective
We develop a new theoretical framework to analyze the generalization err...
Trimmed Density Ratio Estimation
Density ratio estimation is a vital tool in both machine learning and st...
Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization
In this paper, we develop a new accelerated stochastic gradient method f...
Learning Sparse Structural Changes in High-dimensional Markov Networks: A Review on Methodologies and Theories
Recent years have seen an increasing popularity of learning the sparse c...
Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems
We consider a composite convex minimization problem associated with regu...
Structure Learning of Partitioned Markov Networks
We learn the structure of a Markov Network between two groups of random ...
Spectral norm of random tensors
We show that the spectral norm of a random n_1× n_2×...× n_K tensor (or ...
Support Consistency of Direct Sparse-Change Learning in Markov Networks
We study the problem of learning sparse structure changes between two Ma...
Direct Learning of Sparse Changes in Markov Networks by Density Ratio Estimation
We propose a new method for detecting changes in Markov network structur...
Convex Tensor Decomposition via Structured Schatten Norm Regularization
We discuss structured Schatten norms for tensor decomposition that inclu...
Density-Difference Estimation
We address the problem of estimating the difference between two probabil...
A Conjugate Property between Loss Functions and Uncertainty Sets in Classification Problems
In binary classification problems, mainly two approaches have been propo...
Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness
We investigate the learning rate of multiple kernel learning (MKL) with ...
Fast Learning Rate of Non-Sparse Multiple Kernel Learning and Optimal Regularization Strategies
In this paper, we give a new generalization error bound of Multiple Kern...
Relative Density-Ratio Estimation for Robust Distribution Comparison
Divergence estimators based on direct approximation of density-ratios wi...
Fast Learning Rate of lp-MKL and its Minimax Optimality
In this paper, we give a new sharp generalization bound of lp-MKL which ...
Taiji Suzuki
Associate Professor in the Department of Mathematical Informatics and the Graduate School of Information Science and Technology at the University of Tokyo, and at the Center for Advanced Integrated Intelligence Research, RIKEN, Tokyo