
Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification
In recent years graph neural network (GNN)-based approaches have become ...
S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration
Exploiting sparsity is a key technique in accelerating quantized convolu...
On the Effects of Quantisation on Model Uncertainty in Bayesian Neural Networks
Bayesian neural networks (BNNs) are making significant progress in many ...
Doping: A technique for efficient compression of LSTM models using sparse structured additive matrices
Structured matrices, such as those derived from Kronecker products (KP),...
Information contraction in noisy binary neural networks and its implications
Neural networks have gained importance as the machine learning models th...
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers
Executing machine learning workloads locally on resource constrained mic...
Rank and run-time aware compression of NLP Applications
Sequence model based NLP applications can be large. Yet, many applicatio...
Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration
Convolutional neural network (CNN) inference on mobile devices demands e...
High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands
Matrix multiplications between asymmetric bit-width operands, especially...
Efficient Residue Number System Based Winograd Convolution
Prior research has shown that Winograd algorithm can reduce the computat...
TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids
Modern speech enhancement algorithms achieve remarkable noise suppressio...
Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference
Convolutional neural network (CNN) inference on mobile devices demands e...
Searching for Winograd-aware Quantized Networks
Lightweight architectural designs of Convolutional Neural Networks (CNNs...
Compressing Language Models using Doped Kronecker Products
Kronecker Products (KP) have been used to compress IoT RNN Applications ...
Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation
The success of deep learning has brought forth a wave of interest in com...
ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems
Convolutional neural networks (CNNs) are now predominant components in a...
Ternary MobileNets via Per-Layer Hybrid Filter Banks
MobileNets family of computer vision neural networks have fueled tremend...
Pushing the limits of RNN Compression
Recurrent Neural Networks (RNN) can be difficult to deploy on resource c...
Run-Time Efficient RNN Compression for Inference on Edge Devices
Recurrent neural networks can be large and compute-intensive, yet many a...
Compressing RNNs for IoT devices by 15-38x using Kronecker Products
Recurrent Neural Networks (RNN) can be large and compute-intensive, maki...
SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers
The vast majority of processors in the world are actually microcontrolle...
Measuring scheduling efficiency of RNNs for NLP applications
Recurrent neural networks (RNNs) have shown state of the art results for...
Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications
Machine learning-based applications are increasingly prevalent in IoT de...
Efficient Winograd or Cook-Toom Convolution Kernel Implementation on Widely Used Mobile CPUs
The Winograd or Cook-Toom class of algorithms help to reduce the overall...
Learning low-precision neural networks without Straight-Through Estimator (STE)
The Straight-Through Estimator (STE) is widely used for backpropagating...
FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning
The computational demands of computer vision tasks based on state-of-the...
Efficient and Robust Machine Learning for Real-World Systems
While machine learning is traditionally a resource intensive task, embed...
Energy Efficient Hardware for On-Device CNN Inference via Transfer Learning
On-device CNN inference for real-time computer vision applications can r...
SCALE-Sim: Systolic CNN Accelerator Simulator
Systolic Arrays are one of the most popular compute substrates within De...
SCALE-Sim: Systolic CNN Accelerator
Systolic Arrays are one of the most popular compute substrates within De...
Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision
Continuous computer vision (CV) tasks increasingly rely on convolutional...
Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective
Machine learning is playing an increasingly significant role in emerging...
