
On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting
Deep learning empirically achieves high performance in many applications...

Deep Two-Way Matrix Reordering for Relational Data Analysis
Matrix reordering is a task to permute the rows and columns of a given o...

Goodness-of-fit Test on the Number of Biclusters in Relational Data Matrix
Biclustering is a method for detecting homogeneous submatrices in a give...

Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning
Federated learning is one of the important learning scenarios in distrib...

Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis
We propose the particle dual averaging (PDA) method, which generalizes t...

Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods
Establishing a theoretical analysis that explains why deep learning can ...

Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space
Deep learning has achieved notable success in various fields, including ...

Neural Architecture Search Using Stable Rank of Convolutional Layers
In Neural Architecture Search (NAS), Differentiable ARchiTecture Search ...

Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics
We introduce a new theoretical framework to analyze deep learning optimi...

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime
We analyze the convergence of the averaged stochastic gradient descent f...

Gradient Descent in RKHS with Importance Labeling
Labeling cost is often expensive and is a fundamental limitation of supe...

When Does Preconditioning Help or Hurt Generalization?
While second order optimizers such as natural gradient descent (NGD) oft...

Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks
It is known that the current graph neural networks (GNNs) are difficult ...

Selective Inference for Latent Block Models
Model selection in latent block models has been a challenging but import...

Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error
The ability to learn new concepts with small amounts of data is a crucia...

Dimension-free convergence rates for gradient Langevin dynamics in RKHS
Gradient Langevin dynamics (GLD) and stochastic GLD (SGLD) have attracte...

Understanding Generalization in Deep Learning via Tensor Methods
Deep neural networks generalize well on unseen data though the number of...

Domain Adaptation Regularization for Spectral Pruning
Deep Neural Networks (DNNs) have recently been achieving state-of-the-ar...

Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features
Although kernel methods are widely used in many learning problems, they ...

Scalable Deep Neural Networks via Low-Rank Matrix Factorization
Compressing deep neural networks (DNNs) is important for real-world appl...

Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space
Deep learning has exhibited superior performance for various tasks, espe...

Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network
One of the biggest issues in deep learning theory is its generalization abil...

Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum
ImageNet pre-training has been regarded as essential for training accura...

Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD
Large-batch stochastic gradient descent (SGD) is widely used for trainin...

Goodness-of-fit Test for Latent Block Models
Latent Block Models are used for probabilistic biclustering, which is sh...

Accelerated Sparsified SGD with Error Feedback
We study a stochastic gradient method for synchronous distributed optimi...

On Asymptotic Behaviors of Graph CNNs from Dynamical Systems Perspective
Graph Convolutional Neural Networks (graph CNNs) are a promising deep le...

Refined Generalization Analysis of Gradient Descent for Overparameterized Two-layer Neural Networks with Smooth Activations on Classification Problems
Recently, several studies have proven the global convergence and general...

On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces
Deep learning has been applied to various tasks in the field of machine ...

Approximation and Nonparametric Estimation of ResNet-type Convolutional Neural Networks
Convolutional neural networks (CNNs) have been shown to achieve optimal ...

Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks
In recent years, deep neural networks (DNNs) have been applied to variou...

Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality
Deep learning has shown high performances in various types of tasks from...

Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation
We develop new stochastic gradient methods for efficiently solving spars...

Spectral-Pruning: Compressing deep neural network via spectral analysis
The model size of deep neural network is getting larger and larger to re...

Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors
We consider stochastic gradient descent for binary classification proble...

Cross-domain Recommendation via Deep Domain Adaptation
The behavior of users in certain services could be a clue that can be us...

Functional Gradient Boosting based on Residual Network Perception
Residual Networks (ResNets) have become state-of-the-art models in deep ...

Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models
We propose a new technique that boosts the convergence of training gener...

Stochastic Particle Gradient Descent for Infinite Ensembles
The superior performance of ensemble methods with infinite models is we...

Independently Interpretable Lasso: A New Regularizer for Sparse Regression with Uncorrelated Variables
Sparse regularization such as ℓ_1 regularization is a quite powerful and...

Fast learning rate of deep learning via a kernel perspective
We develop a new theoretical framework to analyze the generalization err...

Trimmed Density Ratio Estimation
Density ratio estimation is a vital tool in both machine learning and st...

Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization
In this paper, we develop a new accelerated stochastic gradient method f...

Learning Sparse Structural Changes in High-dimensional Markov Networks: A Review on Methodologies and Theories
Recent years have seen an increasing popularity of learning the sparse c...

Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems
We consider a composite convex minimization problem associated with regu...

Structure Learning of Partitioned Markov Networks
We learn the structure of a Markov Network between two groups of random ...

Spectral norm of random tensors
We show that the spectral norm of a random n_1 × n_2 × ... × n_K tensor (or ...

Support Consistency of Direct Sparse-Change Learning in Markov Networks
We study the problem of learning sparse structure changes between two Ma...

Direct Learning of Sparse Changes in Markov Networks by Density Ratio Estimation
We propose a new method for detecting changes in Markov network structur...

Convex Tensor Decomposition via Structured Schatten Norm Regularization
We discuss structured Schatten norms for tensor decomposition that inclu...
Taiji Suzuki
Associate Professor in the Department of Mathematical Informatics, Graduate School of Information Science and Technology, University of Tokyo; Center for Advanced Integrated Intelligence Research, RIKEN, Tokyo