
Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale
Tremendous success of machine learning (ML) and the unabated growth in M...
Efficient Soft-Error Detection for Low-precision Deep Learning Recommendation Models
Soft error, namely silent corruption of signal or datum in a computer sy...
Mixed-Precision Embedding Using a Cache
In recommendation systems, practitioners observed that increase in the n...
Fast Distributed Training of Deep Neural Networks: Dynamic Communication Thresholding for Model and Data Parallelism
Data Parallelism (DP) and Model Parallelism (MP) are two common paradigm...
Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations
In recent years fused-multiply-add (FMA) units with lower-precision mult...
Dictionary Learning by Dynamical Neural Networks
A dynamical neural network consists of a set of interconnected neurons t...
A Progressive Batching L-BFGS Method for Machine Learning
The standard L-BFGS method relies on gradient approximations that are no...
Sparse Coding by Spiking Neural Networks: Convergence Theory and Computational Results
In a spiking neural network (SNN), individual neurons operate autonomous...
Enabling Sparse Winograd Convolution by Native Pruning
Sparse methods and the use of Winograd convolutions are two orthogonal a...
Faster CNNs with Direct Sparse Convolutions and Guided Pruning
Phenomenally successful in practical inference problems, convolutional n...
Ping Tak Peter Tang
