
Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives
We analyze the convergence rate of various momentum-based optimization a...

Lower bounds in multiple testing: A framework based on derandomized proxies
The large bulk of work in multiple testing has focused on specifying pro...

Minmax Optimization: Stable Limit Points of Gradient Descent Ascent are Locally Optimal
Minmax optimization, especially in its general nonconvex-nonconcave form...

The Power of Batching in Multiple Hypothesis Testing
One important partition of algorithms for controlling the false discover...
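For orientation, the canonical batch procedure for false discovery rate control is the Benjamini-Hochberg step-up rule. A minimal sketch in Python (a reference point, not the procedure proposed in the paper):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up rule: reject the k smallest p-values,
    where k is the largest index with p_(k) <= (k / m) * alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.27, 0.9]))
# -> [ True  True False False False False]
```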

Provable Meta-Learning of Linear Representations
Meta-learning, or learning-to-learn, seeks to design algorithms that can...

Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing
We study the problem of sampling from the power posterior distribution i...

LS-Tree: Model Interpretation When the Data Are Linguistic
We study the problem of interpreting trained classification models in th...

Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
We address the problem of policy evaluation in discounted Markov decisio...

Stochastic Gradient Descent Escapes Saddle Points Efficiently
This paper considers the perturbed stochastic gradient descent algorithm...
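The core idea behind perturbed gradient methods of this kind is simple to sketch: when the gradient is small, the iterate may be near a saddle point, so a small random kick pushes it onto an escape direction. A simplified illustration (the step size, tolerance, and kick radius below are illustrative placeholders, not the paper's constants):

```python
import numpy as np

def perturbed_gd(grad, x0, eta=0.1, grad_tol=1e-3, radius=0.1, steps=2000, seed=0):
    """Gradient descent that adds a small random kick near first-order
    stationary points, so the iterate can escape strict saddles."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) < grad_tol:
            # Candidate saddle: perturb uniformly from a small sphere.
            d = rng.normal(size=x.shape)
            x = x + radius * d / np.linalg.norm(d)
        else:
            x = x - eta * g
    return x

# f(x, y) = x^4/4 - x^2/2 + y^2/2 has a strict saddle at the origin
# and minima at (+-1, 0).
grad_f = lambda v: np.array([v[0]**3 - v[0], v[1]])
print(perturbed_gd(grad_f, x0=np.zeros(2)))  # lands near (+-1, 0)
```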

Bayesian Robustness: A Nonasymptotic Viewpoint
We study the problem of robustly estimating the posterior distribution f...

Near-Optimal Algorithms for Minimax Optimization
This paper resolves a long-standing open question pertaining to the desig...

On Dissipative Symplectic Integration with Applications to Gradient-Based Optimization
Continuous-time dynamical systems have proved useful in providing concep...

A Short Note on Concentration Inequalities for Random Vectors with Sub-Gaussian Norm
In this note, we derive concentration inequalities for random vectors wi...

On the Complexity of Approximating Multimarginal Optimal Transport
We study the complexity of approximating the multimarginal optimal trans...

Variance Reduction with Sparse Gradients
Variance reduction methods such as SVRG and SpiderBoost use a mixture of...
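As background, SVRG's control-variate trick can be stated in a few lines: each stochastic step uses v = ∇f_i(w) − ∇f_i(w̃) + ∇F(w̃), where w̃ is a periodic snapshot, so the step is unbiased but its variance shrinks near the optimum. A minimal sketch of plain SVRG, not the sparse-gradient variant studied in the paper:

```python
import numpy as np

def svrg(grads, w0, eta=0.05, epochs=50, seed=0):
    """Minimal SVRG loop: each epoch recomputes a full gradient at a
    snapshot and uses it as a control variate for the stochastic steps."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    w = np.asarray(w0, dtype=float)
    for _ in range(epochs):
        w_snap = w.copy()
        mu = sum(g(w_snap) for g in grads) / n       # full gradient at snapshot
        for _ in range(n):
            i = rng.integers(n)
            v = grads[i](w) - grads[i](w_snap) + mu  # variance-reduced direction
            w -= eta * v
    return w

# Toy least-squares problem: f_i(w) = 0.5 * (a_i . w - b_i)^2.
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
grads = [lambda w, a=a, y=y: (a @ w - y) * a for a, y in zip(A, b)]
print(svrg(grads, w0=np.zeros(2)))  # approaches the least-squares solution
```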

On Learning Rates and Schrödinger Operators
The learning rate is perhaps the single most important parameter in the ...

Instability, Computational Efficiency and Statistical Accuracy
Many statistical estimators are defined as the fixed point of a data-dep...

Global Error Bounds and Linear Convergence for Gradient-Based Algorithms for Trend Filtering and ℓ_1-Convex Clustering
We propose a class of first-order gradient-type optimization algorithms ...

Towards Understanding the Transferability of Deep Representations
Deep neural networks trained on a wide range of datasets demonstrate imp...

High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm
We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-or...
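The paper's third-order integrator is not reproduced here, but the first-order baseline it improves on, the unadjusted Langevin algorithm (ULA), is a one-line Euler-Maruyama discretization of the Langevin diffusion:

```python
import numpy as np

def ula(grad_U, x0, eta=1e-2, steps=20_000, seed=0):
    """Unadjusted Langevin algorithm: discretization of
    dX = -grad U(X) dt + sqrt(2) dW, targeting pi(x) ∝ exp(-U(x))."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    out = np.empty((steps, x.size))
    for t in range(steps):
        x = x - eta * grad_U(x) + np.sqrt(2 * eta) * rng.normal(size=x.shape)
        out[t] = x
    return out

# Target: standard Gaussian, U(x) = 0.5 * ||x||^2, so grad U(x) = x.
samples = ula(lambda x: x, x0=np.zeros(2))
print(samples.mean(axis=0), samples.var(axis=0))  # roughly 0 mean, unit variance
```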

Bridging Theory and Algorithm for Domain Adaptation
This paper addresses the problem of unsupervised domain adaptation from th...

On Thompson Sampling with Langevin Algorithms
Thompson sampling is a methodology for multi-armed bandit problems that ...

Decision-Making with Auto-Encoding Variational Bayes
To make decisions based on a model fit by Auto-Encoding Variational Baye...

Cost-Effective Incentive Allocation via Structured Counterfactual Inference
We address a practical problem ubiquitous in modern industry, in which a...

Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games
We consider multi-agent learning via online gradient descent (OGD) in a ...

Boundary Attack++: Query-Efficient Decision-Based Adversarial Attack
Decision-based adversarial attack studies the generation of adversarial ...

A joint model of unpaired data from scRNA-seq and spatial transcriptomics for imputing missing gene expression measurements
Spatial studies of the transcriptome provide biologists with gene expression...

Convergence Rates for Gaussian Mixtures of Experts
We provide a theoretical treatment of over-specified Gaussian mixtures o...

Robust Optimization for Fairness with Noisy Protected Groups
Many existing fairness criteria for machine learning involve equalizing ...

Theoretically Principled Trade-off between Robustness and Accuracy
We identify a trade-off between robustness and accuracy that serves as a...

Robustness Guarantees for Mode Estimation with an Application to Bandits
Mode estimation is a classical problem in statistics with a wide range o...

Fundamental limits of detection in the spiked Wigner model
We study the fundamental limits of detecting the presence of an additive...

Is There an Analog of Nesterov Acceleration for MCMC?
We formulate gradient-based Markov chain Monte Carlo (MCMC) sampling as ...

A Dynamical Systems Perspective on Nesterov Acceleration
We present a dynamical system framework for understanding Nesterov's acc...
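For readers new to this viewpoint, the standard limiting ODE associated with Nesterov's accelerated method (derived by Su, Boyd, and Candès, and the usual starting point for such frameworks) is:

```latex
% Limiting ODE of Nesterov's accelerated gradient method
% (Su, Boyd, and Candes, 2014):
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\big(X(t)\big) = 0,
% whose solutions satisfy f(X(t)) - f(x^*) = O(1/t^2) for convex f.
```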

Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
Adaptivity is an important yet under-studied property in modern optimiza...

Revisiting Fixed Support Wasserstein Barycenter: Computational Hardness and Efficient Algorithms
We study the fixed-support Wasserstein barycenter problem (FSWBP), whic...

Mechanism Design with Bandit Feedback
We study a multi-round welfare-maximising mechanism design problem, wher...

ML-LOO: Detecting Adversarial Examples with Feature Attribution
Deep neural networks obtain state-of-the-art performance on a series of ...

Approximate Sherali-Adams Relaxations for MAP Inference via Entropy Regularization
Maximum a posteriori (MAP) inference is a fundamental computational para...

Quantitative W_1 Convergence of Langevin-Like Stochastic Processes with Non-Convex Potential State-Dependent Noise
We prove quantitative convergence rates at which discrete Langevin-like ...

Provably Efficient Reinforcement Learning with Linear Function Approximation
Modern Reinforcement Learning (RL) is commonly applied to practical prob...

Learning Stages: Phenomenon, Root Cause, Mechanism Hypothesis, and Implications
Under the Step Decay learning rate strategy (decaying the learning rate after...

A Higher-Order Swiss Army Infinitesimal Jackknife
Cross-validation (CV) and the bootstrap are ubiquitous model-agnostic to...

Competing Bandits in Matching Markets
Stable matching, a classical model for two-sided markets, has long been ...

On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration
We undertake a precise study of the asymptotic and non-asymptotic proper...

Policy-Gradient Algorithms Have No Guarantees of Convergence in Continuous Action and State Multi-Agent Settings
We show by counterexample that policy-gradient algorithms have no guaran...

Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
Nesterov's accelerated gradient descent (AGD), an instance of the genera...
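As a reference point, Nesterov's AGD for convex problems takes the gradient step at an extrapolated (momentum) point rather than at the current iterate. A minimal sketch with the standard (k − 1)/(k + 2) schedule:

```python
import numpy as np

def nesterov_agd(grad, x0, eta, steps=500):
    """Nesterov's accelerated gradient descent with the standard
    (k - 1)/(k + 2) momentum schedule for convex problems."""
    x_prev = x = np.asarray(x0, dtype=float)
    for k in range(1, steps + 1):
        y = x + (k - 1) / (k + 2) * (x - x_prev)  # extrapolation (momentum)
        x_prev, x = x, y - eta * grad(y)          # gradient step at look-ahead
    return x

# Ill-conditioned quadratic f(x) = 0.5 * x^T A x, minimum at the origin.
A = np.diag([1.0, 100.0])
print(nesterov_agd(lambda v: A @ v, x0=np.ones(2), eta=1 / 100))
# -> approximately [0, 0]
```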

Stochastic Cubic Regularization for Fast Nonconvex Optimization
This paper proposes a stochastic variant of a classic algorithm, the cu...

First-order Methods Almost Always Avoid Saddle Points
We establish that first-order methods avoid saddle points for almost all...

Online control of the false discovery rate with decaying memory
In the online multiple testing problem, p-values corresponding to differ...
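To fix ideas, the simplest online baseline is alpha-spending: allot each arriving hypothesis a share γ_t of the total error budget and reject when its p-value falls below that share. A toy sketch, deliberately naive and far more conservative than the decaying-memory procedures studied in the paper:

```python
import numpy as np

def alpha_spending(pvals, alpha=0.05):
    """Naive online testing baseline: spend the error budget over an
    infinite stream with weights gamma_t proportional to 1/t^2, deciding
    each hypothesis as its p-value arrives."""
    c = 6 / np.pi**2  # normalizer so that sum_t c / t^2 = 1
    return [p <= alpha * c / t**2 for t, p in enumerate(pvals, start=1)]

print(alpha_spending([0.001, 0.02, 0.004, 0.3]))
# -> [True, False, False, False]
```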
Michael I. Jordan
Michael Irwin Jordan is an American scientist and professor of machine learning, statistics, and artificial intelligence at the University of California, Berkeley. He is one of the leading figures in machine learning, and in 2016 Science reported him to be the most influential computer scientist in the world.
Jordan received his BS magna cum laude in Psychology from Louisiana State University in 1978, his MS in Mathematics from Arizona State University in 1980, and his PhD in Cognitive Science from the University of California, San Diego in 1985. At UC San Diego, Jordan was a student of David Rumelhart and a member of the PDP Group in the 1980s.
Jordan is currently a full professor in the Department of Statistics and the Department of EECS at the University of California, Berkeley. From 1988 to 1998 he was a professor in the Department of Brain and Cognitive Sciences at MIT.
In the 1980s, Jordan began developing recurrent neural networks as a cognitive model. In recent years his work has been driven less by a cognitive perspective and more by the traditions of statistics.
In the machine learning community, Jordan popularized Bayesian networks and is known for pointing out links between machine learning and statistics. He was also prominent in formalizing variational methods for approximate inference and in popularizing the expectation-maximization (EM) algorithm in machine learning.
In 2001, Jordan and others resigned from the editorial board of the journal Machine Learning. In a public letter, they argued for less restrictive access and pledged their support to a new open-access journal, the Journal of Machine Learning Research, created by Leslie Kaelbling to support the development of machine learning.
Jordan has received numerous awards, including a best paper award at the International Conference on Machine Learning, the IEEE Neural Networks Pioneer Award, and an NSF Young Investigator Award. In 2010 he was named a Fellow of the Association for Computing Machinery "for contributions to the theory and application of machine learning." Jordan is a member of the National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences.
He was named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He received the David E. Rumelhart Prize in 2015 and the ACM/AAAI Allen Newell Award in 2009.
In 2016, an analysis of the published literature by the Semantic Scholar project identified Jordan as the "most influential computer scientist."