
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
The attention mechanism is becoming increasingly popular in Natural Lang...

IOS: Inter-Operator Scheduler for CNN Acceleration
To accelerate CNN inference, existing deep learning frameworks focus on ...

Hardware-Centric AutoML for Mixed-Precision Quantization
Model quantization is a widely used technique to compress and accelerate...

Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
Self-driving cars need to understand 3D scenes efficiently and accuratel...

Tiny Transfer Learning: Towards Memory-Efficient On-Device Learning
We present Tiny-Transfer-Learning (TinyTL), an efficient on-device learn...

MCUNet: Tiny Deep Learning on IoT Devices
Machine learning on tiny IoT devices based on microcontroller units (MCU...

Differentiable Augmentation for Data-Efficient GAN Training
The performance of generative adversarial networks (GANs) heavily deteri...

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
We present APQ for efficient deep learning inference on resource-constra...

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Transformers are ubiquitous in Natural Language Processing (NLP) tasks, ...

MicroNet for Efficient Language Modeling
It is important to design compact language models for efficient deployme...

GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning
Automatic transistor sizing is a challenging problem in circuit design d...

Lite Transformer with Long-Short Range Attention
Transformer has become ubiquitous in natural language processing (e.g., ...

A Fast Algorithm for Source-Wise Round-Trip Spanners
In this paper, we study the problem of efficiently constructing source-w...

GAN Compression: Efficient Architectures for Interactive Conditional GANs
Conditional Generative Adversarial Networks (cGANs) have enabled control...

SpArch: Efficient Architecture for Sparse Matrix Multiplication
Generalized Sparse Matrix-Matrix Multiplication (SpGEMM) is a ubiquitous...

ChainSplitter: Towards Blockchain-based Industrial IoT Architecture for Supporting Hierarchical Storage
The fast developing Industrial Internet of Things (IIoT) technologies pr...

Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos
Deep video recognition is more computationally expensive than image reco...

Once for All: Train One Network and Specialize it for Efficient Deployment
Efficient deployment of deep learning models requires specialized neural...

Point-Voxel CNN for Efficient 3D Deep Learning
We present Point-Voxel CNN (PVCNN) for efficient, fast 3D deep learning....

Deep Leakage from Gradients
Exchanging gradients is a widely used method in modern multi-node machin...

Design Automation for Efficient Deep Learning Computing
Efficient deep learning computing requires algorithm and hardware co-des...

Defensive Quantization: When Efficiency Meets Robustness
Neural network quantization is becoming an industry standard to efficien...

SysML: The New Frontier of Machine Learning Systems
Machine learning (ML) techniques are enjoying rapidly increasing adoptio...

Fully Distributed Packet Scheduling Framework for Handling Disturbances in Lossy Real-Time Wireless Networks
Along with the rapid growth of Industrial Internet-of-Things (IIoT) appl...

Learning to Design Circuits
Analog IC design relies on human experts to search for parameters that s...

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Neural architecture search (NAS) has a great impact by automatically des...

HAQ: Hardware-Aware Automated Quantization
Model quantization is a widely used technique to compress and accelerate...

Temporal Shift Module for Efficient Video Understanding
The explosive growth in online video streaming gives rise to challenges ...

Communication-Optimal Distributed Dynamic Graph Clustering
We consider the problem of clustering graph nodes over large-scale dynam...

Path-Level Network Transformation for Efficient Architecture Search
We introduce a new function-preserving transformation for efficient neur...

Fast inference of deep neural networks in FPGAs for particle physics
Recent results at the Large Hadron Collider (LHC) have pointed to enhanc...

RT-DAP: A Real-Time Data Analytics Platform for Large-scale Industrial Process Monitoring and Control
In most process control systems nowadays, process measurements are perio...

Efficient Sparse-Winograd Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are computationally intensive, whic...

ADC: Automated Deep Compression and Acceleration with Reinforcement Learning
Model compression is an effective technique facilitating the deployment ...

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Large-scale distributed training requires significant communication band...

Deep Generative Adversarial Networks for Compressed Sensing Automates MRI
Magnetic resonance image (MRI) reconstruction is a severely ill-posed li...

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks
Sparsity helps reduce the computational complexity of deep neural networ...

Classification of Neurological Gait Disorders Using Multi-task Feature Learning
As our population ages, neurological impairments and degeneration of the...

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
Long Short-Term Memory (LSTM) is widely used in speech recognition. In o...

DSD: Dense-Sparse-Dense Training for Deep Neural Networks
Modern deep neural networks have a large number of parameters, making th...

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
Recent research on deep neural networks has focused primarily on improvi...

Generate Image Descriptions based on Deep RNN and Memory Cells for Images Features
Generating natural language descriptions for images is a challenging tas...

EIE: Efficient Inference Engine on Compressed Deep Neural Network
State-of-the-art deep neural networks (DNNs) have hundreds of millions o...

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Neural networks are both computationally intensive and memory intensive,...

Learning both Weights and Connections for Efficient Neural Networks
Neural networks are both computationally intensive and memory intensive,...

Robust Face Recognition using Local Illumination Normalization and Discriminant Feature Point Selection
Face recognition systems must be robust to the variation of various fact...
Song Han
Assistant professor in the Electrical Engineering and Computer Science Department of the Massachusetts Institute of Technology (MIT).