
Stable Weight Decay Regularization
Weight decay is a popular regularization technique for training of deep ...

Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting
Deep learning is often criticized by two serious issues which rarely exi...

Classification from Ambiguity Comparisons
Labeling data is an unavoidable preprocessing procedure for most machin...

Diagnostic Uncertainty Calibration: Towards Reliable Machine Predictions in Medical Domain
Label disagreement between human experts is a common issue in the medica...

Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia
Adaptive Momentum Estimation (Adam), which combines Adaptive Learning Ra...

LFD-ProtoNet: Prototypical Network Based on Local Fisher Discriminant Analysis for Few-shot Learning
The prototypical network (ProtoNet) is a few-shot learning framework tha...

γ-ABC: Outlier-Robust Approximate Bayesian Computation based on Robust Divergence Estimator
Making a reliable inference in complex models is an essential issue in s...

Similarity-based Classification: Connecting Similarity Learning to Binary Classification
In real-world classification problems, pairwise supervision (i.e., a pai...

Sequential Gallery for Interactive Visual Design Optimization
Visual design tasks often involve tuning many design parameters. For exa...

Time-varying Gaussian Process Bandit Optimization with Non-constant Evaluation Time
The Gaussian process bandit is a problem in which we want to find a maxi...

Few-shot Domain Adaptation by Causal Mechanism Transfer
We study few-shot supervised domain adaptation (DA) for regression probl...

A Diffusion Theory for Deep Learning Dynamics: Stochastic Gradient Descent Escapes From Sharp Minima Exponentially Fast
Stochastic optimization algorithms, such as Stochastic Gradient Descent ...

Bayesian interpretation of SGD as Ito process
The current interpretation of stochastic gradient descent (SGD) as a sto...

Classification from Triplet Comparison Data
Learning from triplet comparison data has been extensively studied in th...

Interactive Subspace Exploration on Generative Image Modelling
Generative image modeling techniques such as GAN demonstrate highly conv...

Solving NP-Hard Problems on Graphs by Reinforcement Learning without Domain Knowledge
We propose an algorithm based on reinforcement learning for solving NP-h...

Directing DNNs Attention for Facial Attribution Classification using Gradient-weighted Class Activation Mapping
Deep neural networks (DNNs) have a high accuracy on image classification...

Classification from Pairwise Similarities/Dissimilarities and Unlabeled Data via Empirical Risk Minimization
Pairwise similarities and dissimilarities between data points might be e...

Use of Ghost Cytometry to Differentiate Cells with Similar Gross Morphologic Characteristics
Imaging flow cytometry shows significant potential for increasing our un...

On Learning from Ghost Imaging without Imaging
Computational ghost imaging is an imaging technique with which an object...

On Transformations in Stochastic Gradient MCMC
Stochastic gradient Langevin dynamics (SGLD) is a widely used sampler fo...

PAC-Bayes Analysis of Sentence Representation
Learning sentence vectors from an unlabeled corpus has attracted attenti...

Online Multiclass Classification Based on Prediction Margin for Partial Feedback
We consider the problem of online multiclass classification with partial...

Multilevel Monte Carlo Variational Inference
In many statistics and machine learning frameworks, stochastic optimizat...

Semi-Supervised Ordinal Regression Based on Empirical Risk Minimization
We consider the semi-supervised ordinal regression problem, where unlabe...

Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis
The notion of flat minima has played a key role in the generalization pr...

Clipped Matrix Completion: a Remedy for Ceiling Effects
We consider the recovery of a low-rank matrix from its clipped observati...

On the Structural Sensitivity of Deep Convolutional Networks to the Directions of Fourier Basis Functions
Data-agnostic quasi-imperceptible perturbations on inputs can severely d...

Unsupervised Domain Adaptation Based on Source-guided Discrepancy
Unsupervised domain adaptation is the problem setting where data generat...

Frank-Wolfe Stein Sampling
In Bayesian inference, the posterior distributions are difficult to obta...

Variational Inference for Gaussian Process with Panel Count Data
We present the first framework for Gaussian-process-modulated Poisson pr...

Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model
While crowdsourcing has become an important means to label data, crowdwo...

Lipschitz-Margin Training: Scalable Certification of Perturbation Invariance for Deep Neural Networks
High sensitivity of neural networks against malicious perturbations on i...

Gaussian Process Classification with Privileged Information by Soft-to-Hard Labeling Transfer
Learning using privileged information is an attractive problem setting t...

Variational Inference based on Robust Divergences
Robustness to outliers is a central issue in real-world machine learning...

On the Model Shrinkage Effect of Gamma Process Edge Partition Models
The edge partition model (EPM) is a fundamental Bayesian nonparametric m...

Expectation Propagation for t-Exponential Family Using Q-Algebra
Exponential family distributions are highly useful in machine learning s...

Bayesian Nonparametric Poisson-Process Allocation for Time-Sequence Modeling
Analyzing the underlying structure of multiple time-sequences provides i...

Stochastic Divergence Minimization for Biterm Topic Model
As the emergence and the thriving development of social networks, a huge...

Revisiting Distributionally Robust Supervised Learning in Classification
Distributionally Robust Supervised Learning (DRSL) is necessary for buil...

Reparameterization trick for discrete variables
Low-variance gradient estimation is crucial for learning directed graphi...

Generative Adversarial Nets from a Density Ratio Estimation Perspective
Generative adversarial networks (GANs) are successful deep generative mo...

Quantum Annealing for Variational Bayes Inference
This paper presents studies on a deterministic annealing algorithm based...

Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering
We developed a new quantum annealing (QA) algorithm for Dirichlet proces...

Rethinking Collapsed Variational Bayes Inference for LDA
We propose a novel interpretation of the collapsed variational Bayes inf...

Restricted Collapsed Draw: Accurate Sampling for Hierarchical Chinese Restaurant Process Hidden Markov Models
We propose a restricted collapsed draw (RCD) sampler, a general Markov c...
Issei Sato
Lecturer, Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, since 2015.
Affiliated Lecturer, Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, since 2015.
Affiliated Lecturer, Department of Information Science, Faculty of Science, The University of Tokyo, since 2015.
Assistant Professor, Academic Information Science Research Division, Information Technology Center, The University of Tokyo, 2011-2015.
Ph.D. studies under advisor Hiroshi Nakagawa, Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 2008-2011.
Japan Society for the Promotion of Science Research Fellow (DC1), 2008-2011.