
Physics-Aware Downsampling with Deep Learning for Scalable Flood Modeling
Background: Floods are the most common natural disaster in the world, af...
Statistical Testing for Efficient Out of Distribution Detection in Deep Neural Networks
Commonly, Deep Neural Networks (DNNs) generalize well on samples drawn f...
On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
Recent work has highlighted the role of initialization scale in determin...
Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
Recently, researchers proposed pruning deep neural network weights (DNNs...
Task Agnostic Continual Learning Using Online Variational Bayes with Fixed-Point Updates
Background: Catastrophic forgetting is the notorious vulnerability of ne...
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
We provide a detailed asymptotic study of gradient flow trajectories and...
Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?
Deep neural networks are typically initialized with random weights, with...
Neural gradients are lognormally distributed: understanding sparse and quantized training
Neural gradient compression remains a main bottleneck in improving train...
Kernel and Rich Regimes in Overparametrized Models
A recent line of work studies overparametrized neural networks in the "k...
MTJ-Based Hardware Synapse Design for Quantized Deep Neural Networks
Quantized neural networks (QNNs) are being actively researched as a solu...
Is Feature Diversity Necessary in Neural Network Initialization?
Standard practice in training neural networks involves initializing the ...
The Knowledge Within: Methods for Data-Free Model Compression
Background: Recently, an extensive amount of research has been focused o...
A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case
A key element of understanding the efficacy of overparameterized neural ...
At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
Background: Recent developments have made it possible to accelerate neur...
Kernel and Deep Regimes in Overparametrized Models
A recent line of work studies overparametrized neural networks in the "k...
A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off
Reducing the precision of weights and activation functions in neural net...
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
With an eye toward understanding complexity control in deep learning, we...
How do infinite width bounded norm networks look in function space?
We consider the question of what functions can be captured by ReLU netwo...
Augment your batch: better training with larger batches
Large-batch SGD is important for scaling training of deep neural network...
ACIQ: Analytical Clipping for Integer Quantization of neural networks
Unlike traditional approaches that focus on the quantization at the netw...
Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Stochastic Gradient Descent (SGD) is a central tool in machine learning....
Implicit Bias of Gradient Descent on Linear Convolutional Networks
We show that gradient descent on full-width linear convolutional network...
Scalable Methods for 8-bit Training of Neural Networks
Quantized Neural Networks (QNNs) are often used to improve network effic...
The Global Optimization Geometry of Shallow Linear Neural Networks
We examine the squared error loss landscape of shallow linear neural net...
Bayesian Gradient Descent: Online Variational Bayes Learning with Increased Robustness to Catastrophic Forgetting and Weight Pruning
We suggest a novel approach for the estimation of the posterior distribu...
Convergence of Gradient Descent on Separable Data
The implicit bias of gradient descent is not fully understood even in si...
Norm matters: efficient and accurate normalization schemes in deep networks
Over the past few years batch-normalization has been commonly used in de...
Characterizing Implicit Bias in Terms of Optimization Geometry
We study the bias of generic optimization methods, including Mirror Desc...
On the Blindspots of Convolutional Networks
Deep convolutional network has been the state-of-the-art approach for a ...
Fix your classifier: the marginal value of training the last weight layer
Neural networks are commonly used as models for classification for a wid...
The Implicit Bias of Gradient Descent on Separable Data
We show that gradient descent on an unregularized logistic regression pr...
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Background: Deep learning models are typically trained using stochastic ...
Exponentially vanishing suboptimal local minima in multilayer neural networks
Background: Statistical mechanics results (Dauphin et al. (2014); Chorom...
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
We introduce a method to train Quantized Neural Networks (QNNs) - neur...
No bad local minima: Data independent training error guarantees for multilayer neural networks
We use smoothed analysis techniques to provide guarantees on the trainin...
Binarized Neural Networks
We introduce a method to train Binarized Neural Networks (BNNs) - neural...
Training Binary Multilayer Neural Networks for Image Classification using Expectation Backpropagation
Compared to Multilayer Neural Networks with real weights, Binary Multila...
Mean Field Bayes Backpropagation: scalable training of multilayer neural networks with binary weights
Significant success has been reported recently using deep neural network...
Daniel Soudry
Assistant Professor at Technion - Israel Institute of Technology