
Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel
Transformer is a powerful architecture that achieves superior performanc...

GraphLIME: Local Interpretable Model Explanations for Graph Neural Networks
Graph structured data has wide applicability in various domains such as ...

Kernel Stein Tests for Multiple Model Comparison
We address the problem of nonparametric multiple model comparison: give...

Constant Time Graph Neural Networks
Recent advancements in graph neural networks (GNN) have led to state-of...

Tree-Sliced Approximation of Wasserstein Distances
Optimal transport (OT) theory provides a useful set of tools to compare pr...

More Powerful Selective Kernel Tests for Feature Selection
Refining one's hypotheses in the light of data is a commonplace scientif...

Topological Bayesian Optimization with Persistence Diagrams
Finding an optimal parameter of a black-box function is important for se...

Learning to Find Hard Instances of Graph Problems
Finding hard instances, which need a long time to solve, of graph proble...

On Scalable Variant of Wasserstein Barycenter
We study a variant of Wasserstein barycenter problem, which we refer to ...

LSMI-Sinkhorn: Semi-supervised Squared-Loss Mutual Information Estimation with Optimal Transport
Estimating mutual information is an important machine learning and stati...

Approximation Ratios of Graph Neural Networks for Combinatorial Problems
In this paper, from a theoretical perspective, we study how powerful gra...

Computationally Efficient Tree Variants of Gromov-Wasserstein
We propose two novel variants of Gromov-Wasserstein (GW) between probabi...

Deep Matching Autoencoders
Increasingly many real world tasks involve data in multiple modalities o...

Convex Coupled Matrix and Tensor Completion
We propose a set of convex low rank inducing norms for a coupled matrice...

Interpreting Outliers: Localized Logistic Regression for Density Ratio Estimation
We propose an inlier-based outlier detection method capable of both iden...

Post Selection Inference with Kernels
We propose a novel kernel based post selection inference (PSI) algorithm...

Ultra High-Dimensional Nonlinear Feature Selection for Big Biological Data
Machine learning methods are used to discover complex nonlinear relation...

Localized Lasso for High-Dimensional Regression
We introduce the localized Lasso, which is suited for learning models th...

Convex Factorization Machine for Regression
We propose the convex factorization machine (CFM), which is a convex var...

Consistent Collective Matrix Completion under Joint Low Rank Structure
We address the collective matrix completion problem of jointly recoverin...

Multi-view Anomaly Detection via Probabilistic Latent Variable Models
We propose a nonparametric Bayesian probabilistic latent variable model ...

N^3LARS: Minimum Redundancy Maximum Relevance Feature Selection for Large and High-dimensional Data
We propose a feature selection method that finds non-redundant features ...

Dependence Maximizing Temporal Alignment via Squared-Loss Mutual Information
The goal of temporal alignment is to establish time correspondence betwe...

Change-Point Detection in Time-Series Data by Relative Density-Ratio Estimation
The objective of change-point detection is to discover abrupt property c...

High-Dimensional Feature Selection by Feature-Wise Non-Linear Lasso
The goal of supervised feature selection is to find a subset of input fe...

Information-Maximization Clustering based on Squared-Loss Mutual Information
Information-maximization clustering learns a probabilistic classifier in...

Relative Density-Ratio Estimation for Robust Distribution Comparison
Divergence estimators based on direct approximation of density-ratios wi...

SERAPH: Semi-supervised Metric Learning Paradigm with Hyper Sparsity
We propose a general information-theoretic approach called Seraph (SEmi...

Least-Squares Independence Regression for Non-Linear Causal Inference under Non-Gaussian Noise
The discovery of nonlinear causal relationship under additive non-Gauss...

Sufficient Component Analysis for Supervised Dimension Reduction
The purpose of sufficient dimension reduction (SDR) is to find the low-d...

Cross-Domain Object Matching with Model Selection
The goal of cross-domain object matching (CDOM) is to find correspondenc...

Riemannian Manifold Kernel for Persistence Diagrams
Algebraic topology methods have recently played an important role for st...

Selecting the Best in GANs Family: a Post Selection Inference Framework
"Which Generative Adversarial Networks (GANs) generates the most plausib...

Post Selection Inference with Incomplete Maximum Mean Discrepancy Estimator
Measuring divergence between two distributions is essential in machine l...

"Dependency Bottleneck" in Autoencoding Architectures: an Empirical Study
Recent works investigated the generalization properties in deep neural n...
Makoto Yamada
Associate Professor at Kyoto University; Unit Leader (PI), High-dimensional Statistical Modeling Unit, RIKEN AIP; Visiting Associate Professor, Research Center for Statistical Machine Learning, Institute of Statistical Mathematics