
On the Convergence of Adam and Beyond
Several recently proposed stochastic optimization methods that have been...

Are Transformers universal approximators of sequence-to-sequence functions?
Despite the widespread adoption of Transformer models for NLP tasks, the...

Learning to Learn by Zeroth-Order Oracle
In the learning to learn (L2L) framework, we cast the design of optimiza...

New Loss Functions for Fast Maximum Inner Product Search
Quantization based methods are popular for solving large scale maximum i...

Efficient Inner Product Approximation in Hybrid Spaces
Many emerging use cases of data mining and machine learning operate on l...

Online Hierarchical Clustering Approximations
Hierarchical clustering is a widely used approach for clustering dataset...

Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise
Neural Ordinary Differential Equation (Neural ODE) has been proposed as ...

Local Orthogonal Decomposition for Maximum Inner Product Search
Inverted file and asymmetric distance computation (IVFADC) have been suc...

The Sparse Recovery Autoencoder
Linear encoding of sparse vectors is widely popular, but is most commonl...

Sampled Softmax with Random Fourier Features
The computational cost of training with softmax cross entropy loss grows...

Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks
Neural language models have been widely used in various NLP tasks, inclu...

AdaCliP: Adaptive Clipping for Private SGD
Privacy preserving machine learning algorithms are crucial for learning ...

Learning to Hash for Indexing Big Data – A Survey
The explosive growth in big data has attracted much attention in designi...

Compact Hyperplane Hashing with Bilinear Functions
Hyperplane hashing aims at rapidly searching nearest points to a hyperpl...

Binary embeddings with structured hashed projections
We consider the hashing mechanism for constructing binary embeddings, th...

Orthogonal Random Features
We present an intriguing discovery related to Random Fourier Features: i...

Fast Online Clustering with Randomized Skeleton Sets
We present a new fast online clustering algorithm that reliably recovers...

Stochastic Generative Hashing
Learning-based binary hashing has become a powerful paradigm for fast se...

Quantization based Fast Inner Product Search
We propose a quantization based approach for fast approximate Maximum In...

Compact Nonlinear Maps and Circulant Extensions
Kernel approximation via nonlinear random feature maps is widely used in...

Circulant Binary Embedding
Binary embedding of high-dimensional data requires long codes to preserv...

On Learning from Label Proportions
Learning from Label Proportions (LLP) is a learning setting, where the t...

Structured Transforms for Small-Footprint Deep Learning
We consider the task of building compact deep learning pipelines suitabl...

∝SVM for learning with label proportions
We study the problem of learning with label proportions in which the tra...

An exploration of parameter redundancy in deep networks with circulant projections
We explore the redundancy of parameters in deep neural networks by repla...

On the Difficulty of Nearest Neighbor Search
Fast approximate nearest neighbor (NN) search in large databases is beco...

Efficient Natural Language Response Suggestion for Smart Reply
This paper presents a computationally efficient machine-learned method f...

Nonlinear Online Learning with Adaptive Nyström Approximation
Use of nonlinear feature maps via kernel approximation has led to succes...

Now Playing: Continuous low-power music recognition
Existing music recognition applications require a connection to a server...

cpSGD: Communication-efficient and differentially-private distributed SGD
Distributed stochastic gradient descent is an important subroutine in di...

Stochastic Negative Mining for Learning with Large Output Spaces
We consider the problem of retrieving the most relevant labels for a giv...

Optimal Noise-Adding Mechanism in Additive Differential Privacy
We derive the optimal (0, δ)-differentially private query-output indepen...

Truncated Laplacian Mechanism for Approximate Differential Privacy
We derive a class of noise probability distributions to preserve (ϵ, δ)...

Escaping Saddle Points with Adaptive Gradient Methods
Adaptive methods such as Adam and RMSProp are widely used in deep learni...

Why ADAM Beats SGD for Attention Models
While stochastic gradient descent (SGD) is still the de facto algorithm ...
Sanjiv Kumar
Research Scientist at Google Research, NY; Principal Scientist at Google Research, NY; PhD (2005; Robotics, SCS, CMU)