
Minmax Optimization: Stable Limit Points of Gradient Descent Ascent are Locally Optimal
Minmax optimization, especially in its general nonconvex-nonconcave form...

The Power of Batching in Multiple Hypothesis Testing
One important partition of algorithms for controlling the false discover...

Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing
We study the problem of sampling from the power posterior distribution i...

LS-Tree: Model Interpretation When the Data Are Linguistic
We study the problem of interpreting trained classification models in th...

Stochastic Gradient Descent Escapes Saddle Points Efficiently
This paper considers the perturbed stochastic gradient descent algorithm...

Bayesian Robustness: A Nonasymptotic Viewpoint
We study the problem of robustly estimating the posterior distribution f...

A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm
In this note, we derive concentration inequalities for random vectors wi...

On the Complexity of Approximating Multimarginal Optimal Transport
We study the complexity of approximating the multimarginal optimal trans...

Global Error Bounds and Linear Convergence for Gradient-Based Algorithms for Trend Filtering and ℓ_1-Convex Clustering
We propose a class of first-order gradient-type optimization algorithms ...

Towards Understanding the Transferability of Deep Representations
Deep neural networks trained on a wide range of datasets demonstrate imp...

High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm
We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-or...

Bridging Theory and Algorithm for Domain Adaptation
This paper addresses the problem of unsupervised domain adaption from th...

Cost-Effective Incentive Allocation via Structured Counterfactual Inference
We address a practical problem ubiquitous in modern industry, in which a...

Boundary Attack++: Query-Efficient Decision-Based Adversarial Attack
Decision-based adversarial attack studies the generation of adversarial ...

A joint model of unpaired data from scRNA-seq and spatial transcriptomics for imputing missing gene expression measurements
Spatial studies of transcriptome provide biologists with gene expression...

Convergence Rates for Gaussian Mixtures of Experts
We provide a theoretical treatment of overspecified Gaussian mixtures o...

Fundamental limits of detection in the spiked Wigner model
We study the fundamental limits of detecting the presence of an additive...

Theoretically Principled Trade-off between Robustness and Accuracy
We identify a trade-off between robustness and accuracy that serves as a...

Is There an Analog of Nesterov Acceleration for MCMC?
We formulate gradient-based Markov chain Monte Carlo (MCMC) sampling as ...

A Dynamical Systems Perspective on Nesterov Acceleration
We present a dynamical system framework for understanding Nesterov's acc...

ML-LOO: Detecting Adversarial Examples with Feature Attribution
Deep neural networks obtain state-of-the-art performance on a series of ...

Approximate Sherali-Adams Relaxations for MAP Inference via Entropy Regularization
Maximum a posteriori (MAP) inference is a fundamental computational para...

Quantitative W_1 Convergence of Langevin-Like Stochastic Processes with Non-Convex Potential State-Dependent Noise
We prove quantitative convergence rates at which discrete Langevin-like ...

Provably Efficient Reinforcement Learning with Linear Function Approximation
Modern Reinforcement Learning (RL) is commonly applied to practical prob...

Learning Stages: Phenomenon, Root Cause, Mechanism Hypothesis, and Implications
Under StepDecay learning rate strategy (decaying the learning rate after...

A Higher-Order Swiss Army Infinitesimal Jackknife
Cross validation (CV) and the bootstrap are ubiquitous model-agnostic to...

Competing Bandits in Matching Markets
Stable matching, a classical model for two-sided markets, has long been ...

Policy-Gradient Algorithms Have No Guarantees of Convergence in Continuous Action and State Multi-Agent Settings
We show by counterexample that policy-gradient algorithms have no guaran...

Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
Nesterov's accelerated gradient descent (AGD), an instance of the genera...

Stochastic Cubic Regularization for Fast Nonconvex Optimization
This paper proposes a stochastic variant of a classic algorithm, the cu...

First-order Methods Almost Always Avoid Saddle Points
We establish that first-order methods avoid saddle points for almost all...

Online control of the false discovery rate with decaying memory
In the online multiple testing problem, p-values corresponding to differ...

DAGGER: A sequential algorithm for FDR control on DAGs
We propose a top-down algorithm for multiple testing on directed acyclic...

Kernel Feature Selection via Conditional Covariance Minimization
We propose a framework for feature selection that employs kernel-based m...

Fast Black-box Variational Inference through Stochastic Trust-Region Optimization
We introduce TrustVI, a fast second-order algorithm for black-box variat...

Gradient Descent Can Take Exponential Time to Escape Saddle Points
Although gradient descent (GD) almost always escapes saddle points asymp...

A unified treatment of multiple testing with prior knowledge using the p-filter
A significant literature studies ways of employing prior knowledge to im...

How to Escape Saddle Points Efficiently
This paper shows that a perturbed form of gradient descent converges to ...

Less than a Single Pass: Stochastically Controlled Stochastic Gradient Method
We develop and analyze a procedure for gradient-based optimization that ...

CYCLADES: Conflict-free Asynchronous Machine Learning
We present CYCLADES, a general framework for parallelizing stochastic op...

Communication-Efficient Distributed Statistical Inference
We present a Communication-efficient Surrogate Likelihood (CSL) framewor...

Deep Transfer Learning with Joint Adaptation Networks
Deep networks have been successfully applied to learn transferable featu...

On kernel methods for covariates that are rankings
Permutation-valued features arise in a variety of applications, either i...

A Variational Perspective on Accelerated Methods in Optimization
Accelerated gradient methods play a central role in optimization, achiev...

Asymptotic behavior of ℓ_p-based Laplacian regularization in semi-supervised learning
Given a weighted graph with N vertices, consider a real-valued regressio...

Gradient Descent Converges to Minimizers
We show that gradient descent converges to a local minimizer, almost sur...

A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation
We derive a new discrepancy statistic for measuring differences between ...

SparkNet: Training Deep Networks in Spark
Training deep networks is a time-consuming process, with networks for ob...

Optimistic Concurrency Control for Distributed Unsupervised Learning
Research on distributed machine learning algorithms has focused primaril...

A Linearly-Convergent Stochastic L-BFGS Algorithm
We propose a new stochastic L-BFGS algorithm and prove a linear converge...
Michael I. Jordan
Michael Irwin Jordan is an American scientist and a professor of machine learning, statistics, and artificial intelligence at the University of California, Berkeley. He is one of the leading figures in machine learning, and in 2016 Science reported him as the world's most influential computer scientist.
Jordan received his BS magna cum laude in Psychology from Louisiana State University in 1978, his MS in Mathematics from Arizona State University in 1980, and his PhD in Cognitive Science from the University of California, San Diego in 1985. He was a student of David Rumelhart and a member of the PDP Group at the University of California, San Diego in the 1980s.
Jordan is currently a full professor in the Department of Statistics and the Department of EECS at the University of California, Berkeley. From 1988 to 1998 he was a professor in the Department of Brain and Cognitive Sciences at MIT.
Jordan began to develop recurrent neural networks as a cognitive model in the 1980s. In recent years, his work has been less driven by a cognitive point of view and more by traditional statistics.
In the machine learning community, Jordan popularized Bayesian networks and is known for pointing out links between machine learning and statistics. He was also prominent in formalizing variational methods for approximate inference and in popularizing the expectation-maximization algorithm in machine learning.
In 2001, Jordan and others resigned from the editorial board of the journal Machine Learning. In a public letter, they advocated less restrictive access and pledged support for a new open-access journal, the Journal of Machine Learning Research, created by Leslie Kaelbling to support the development of the field of machine learning.
Jordan has earned numerous awards, including the ACM/AAAI Allen Newell Award, the IEEE Neural Networks Pioneer Award, the NSF Young Investigator Award, and a best paper award at the International Conference on Machine Learning. In 2010 he was named a Fellow of the Association for Computing Machinery for “contributions to the theory and application of machine learning.” Jordan is a member of the National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences.
He was named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He received the ACM/AAAI Allen Newell Award in 2009 and the David E. Rumelhart Prize in 2015.
In 2016, an analysis of the published literature by the Semantic Scholar project identified Jordan as the “most influential computer scientist.”