
-
A review of on-device fully neural end-to-end automatic speech recognition algorithms
In this paper, we review various end-to-end automatic speech recognition...
-
Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation
Transformer is being widely used in Neural Machine Translation (NMT). De...
-
FleXOR: Trainable Fractional Quantization
Quantization based on the binary codes is gaining attention because each...
-
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
The number of parameters in deep neural networks (DNNs) is rapidly incre...
-
Learning Low-Rank Approximation for CNNs
Low-rank approximation is an effective model compression technique to no...
-
Structured Compression by Unstructured Pruning for Sparse Quantized Neural Networks
Model compression techniques, such as pruning and quantization, are beco...
-
Network Pruning for Low-Rank Binary Indexing
Pruning is an efficient model compression technique to remove redundancy...
-
DeepTwist: Learning Model Compression via Occasional Weight Distortion
Model compression has been introduced to reduce the required hardware re...
-
Retraining-Based Iterative Weight Quantization for Deep Neural Networks
Model compression has gained a lot of attention due to its ability to re...