References

Online embedding compression for text classification using low rank matrix factorization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 6196–6203. Cited by: p11, p32.
A modified learning algorithm for the multilayer neural network with multivalued neurons based on the complex QR decomposition. Soft Computing 16 (4), pp. 563–575. Cited by: p10.
DBpedia: a nucleus for a web of open data. In The Semantic Web, pp. 722–735. Cited by: p4.
 Neural machine translation by jointly learning to align and translate. Computer Science. Cited by: p4.
Learning phrase representations using RNN encoder-decoder for statistical machine translation. Computer Science. Cited by: p4, p5.
 Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. Cited by: p10.

Approximations by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2, pp. 183–192. Cited by: p16.
Model compression and hardware acceleration for neural networks: a comprehensive survey. Proceedings of the IEEE 108 (4), pp. 485–532. Cited by: p10.
BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: p4.

Revisiting the Nyström method for improved large-scale machine learning. The Journal of Machine Learning Research 17 (1), pp. 3977–4041. Cited by: p10.
Deep sparse rectifier neural networks. In Journal of Machine Learning Research, pp. 315–323. Cited by: p24.
Self-adaptive hierarchical sentence model. In Proceedings of International Joint Conferences on Artificial Intelligence, Cited by: p4.
Long short-term memory. Neural Computation 9 (8), pp. 1735–1780. Cited by: p4.
 Multilayer feedforward networks are universal approximators. Neural Networks 2, pp. 359–366. Cited by: p16.
 SentiLSTM: a deep learning approach for sentiment analysis of restaurant reviews. In Proceedings of 20th International Conference on Hybrid Intelligent Systems, Cited by: p4.
Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. Cited by: p10.
 A novel channel pruning method for deep neural network compression. Cited by: p10.

Interpolated estimation of Markov source parameters from sparse data. In Proceedings, Workshop on Pattern Recognition in Practice, Cited by: p4.
FastText.zip: compressing text classification models. arXiv preprint arXiv:1612.03651. Cited by: p11.
Convolutional neural networks for sentence classification. arXiv e-print. Cited by: p23.
Convolutional neural networks for sentence classification. In Proceedings of Empirical Methods in Natural Language Processing, Cited by: p4.
Improved backing-off for m-gram language modeling. In 1995 International Conference on Acoustics, Speech, and Signal Processing, Vol. 1, pp. 181–184. Cited by: p4.

Neural network for text classification based on singular value decomposition. In 7th IEEE International Conference on Computer and Information Technology (CIT 2007), pp. 47–52. Cited by: p10.
Pruning filters for efficient ConvNets. Cited by: p10.
A BiLSTM-RNN model for relation classification using low-cost sequence features. Cited by: p23, p25.
ℓ1-norm low-rank matrix decomposition by neural networks and mollifiers. IEEE Transactions on Neural Networks and Learning Systems 27 (2), pp. 273–283. Cited by: p10.
An entropy-based pruning method for CNN compression. arXiv e-prints. Cited by: p10.
 Efficient estimation of word representations in vector space. Cited by: p23.
Compressing word embeddings via deep compositional code learning. arXiv preprint arXiv:1711.01068. Cited by: p11.
Compressing recurrent neural network models through principal component analysis. Statistics and Its Interface. Cited by: p10.
Effective dimensionality reduction for word embeddings. arXiv preprint arXiv:1708.03629. Cited by: p11.
Revisiting LSTM networks for semi-supervised text classification via mixed objective function. Proceedings of the AAAI Conference on Artificial Intelligence. Cited by: p4.
Low-rank matrix factorization for deep neural network training with high-dimensional output targets. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6655–6659. Cited by: p10.
 A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28 (1), pp. 11–21. Cited by: p29.
Deterministic CUR for improved large-scale data analysis: an empirical study. In Proceedings of the 2012 SIAM International Conference on Data Mining, pp. 684–695. Cited by: p10.
 The unreasonable effectiveness of the forget gate. arXiv preprint arXiv:1804.04849. Cited by: p10.
 HitNet: hybrid ternary recurrent neural network. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 602–612. Cited by: p10.
 Investigating gated recurrent networks for speech synthesis. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5140–5144. Cited by: p10.
Efficient character-level document classification by combining convolution and recurrent layers. Cited by: p11, p21, p32.
Alternating multi-bit quantization for recurrent neural networks. arXiv preprint arXiv:1802.00150. Cited by: p10.
Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6359–6363. Cited by: p10.
GradiVeQ: vector quantization for bandwidth-efficient gradient aggregation in distributed CNN training. arXiv preprint arXiv:1811.03617. Cited by: p10.
Character-level convolutional networks for text classification. MIT Press. Cited by: p5, p21, p32.
 Progressive principle component analysis for compressing deep convolutional neural networks. Neurocomputing 440, pp. 197–206. Cited by: p10.