
Statistical Model Compression for Small-Footprint Natural Language Understanding

by Grant P. Strimel et al.

In this paper we investigate statistical model compression applied to natural language understanding (NLU) models. Small-footprint NLU models are important for enabling offline systems on hardware-restricted devices and for decreasing on-demand model loading latency in cloud-based systems. To compress NLU models, we present two main techniques: parameter quantization and perfect feature hashing. These techniques are complementary to existing model pruning strategies such as L1 regularization. We performed experiments on a large-scale NLU system. The results show that our approach achieves a 14-fold reduction in memory usage compared to the original models, with minimal impact on predictive performance.
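To make the parameter quantization idea concrete, here is a minimal sketch of uniform linear quantization of a float weight vector to 8-bit integer codes. This is an illustration of the general technique, not the paper's exact recipe; the function names and the choice of 8 bits are assumptions for the example.

```python
import numpy as np

def quantize_linear(weights, num_bits=8):
    """Linearly map float weights onto 2**num_bits integer levels.
    Returns the integer codes plus the (scale, offset) needed to
    approximately reconstruct the original float values."""
    lo, hi = float(weights.min()), float(weights.max())
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((weights - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Reconstruct approximate float weights from integer codes."""
    return codes.astype(np.float32) * scale + lo

# Example: compress a float32 weight vector to 1 byte per parameter.
w = np.random.randn(1000).astype(np.float32)
codes, scale, lo = quantize_linear(w, num_bits=8)
w_hat = dequantize(codes, scale, lo)

# Rounding to the nearest level bounds the error by half a step.
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

Storing `uint8` codes instead of `float32` weights alone yields a 4x memory reduction; the paper's larger overall savings come from combining quantization with perfect feature hashing and pruning.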
