
Embedding Compression for Text Classification Using Dictionary Screening

by Jing Zhou, et al.

In this paper, we propose a dictionary screening method for embedding compression in text classification tasks. The key purpose of this method is to evaluate the importance of each keyword in the dictionary. To this end, we first train a pre-specified recurrent neural network-based model using the full dictionary. This yields a benchmark model, which we then use to obtain the predicted class probabilities for each sample in a dataset. Next, we develop a novel method for assessing the importance of each keyword in the dictionary, based on how much that keyword affects the predicted class probabilities. Each keyword can then be screened, and only the most important keywords are retained. With these screened keywords, a new dictionary of considerably reduced size can be constructed, and the original text sequences can be substantially compressed accordingly. The proposed method leads to significant reductions in the number of parameters, the average text sequence length, and the dictionary size, while the prediction power remains very competitive with that of the benchmark model. Extensive numerical studies are presented to demonstrate the empirical performance of the proposed method.
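The screening pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not specify how keyword importance is computed, so the sketch assumes a simple deletion-based score (the mean absolute change in predicted class probabilities when a keyword is removed). The benchmark model is replaced by a hypothetical hand-weighted scorer, `toy_benchmark_model`, rather than a trained recurrent network, and all function names and the toy data are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence

Probs = List[float]
Model = Callable[[Sequence[str]], Probs]


def toy_benchmark_model(tokens: Sequence[str]) -> Probs:
    """Hypothetical stand-in for the trained benchmark model (two classes)."""
    # Hand-set per-word logit contributions; other words contribute nothing.
    weights = {"great": 2.0, "awful": -2.0, "good": 1.0, "bad": -1.0}
    logit = sum(weights.get(t, 0.0) for t in tokens)
    p_pos = 1.0 / (1.0 + math.exp(-logit))
    return [1.0 - p_pos, p_pos]


def keyword_importance(
    model: Model, samples: List[List[str]], dictionary: List[str]
) -> Dict[str, float]:
    """Assumed score: mean L1 change in class probabilities when a word is deleted."""
    baseline = [model(s) for s in samples]
    scores: Dict[str, float] = {}
    for word in dictionary:
        total = 0.0
        for tokens, base in zip(samples, baseline):
            if word not in tokens:
                continue  # deleting an absent word changes nothing
            reduced = [t for t in tokens if t != word]
            total += sum(abs(a - b) for a, b in zip(model(reduced), base))
        scores[word] = total / len(samples)
    return scores


def screen_dictionary(scores: Dict[str, float], keep: int) -> List[str]:
    """Retain the `keep` highest-scoring keywords."""
    return sorted(scores, key=scores.get, reverse=True)[:keep]


def compress(tokens: Sequence[str], kept: Sequence[str]) -> List[str]:
    """Drop every token that is not in the screened dictionary."""
    kept_set = set(kept)
    return [t for t in tokens if t in kept_set]


samples = [
    ["the", "movie", "was", "great"],
    ["the", "plot", "was", "awful"],
    ["a", "good", "movie"],
    ["the", "acting", "was", "bad"],
]
dictionary = sorted({t for s in samples for t in s})
scores = keyword_importance(toy_benchmark_model, samples, dictionary)
small_dictionary = screen_dictionary(scores, keep=4)
compressed = [compress(s, small_dictionary) for s in samples]
```

In this toy setting, words that never move the predicted probabilities receive a score of zero and are screened out, so both the dictionary and the token sequences shrink. In the setting the abstract describes, the smaller dictionary would also reduce the size of the embedding layer.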



