Neural or Statistical: An Empirical Study on Language Models for Chinese Input Recommendation on Mobile

07/09/2019
by   Hainan Zhang, et al.

Chinese input recommendation plays an important role in reducing the human effort of typing Chinese words, especially in mobile applications. The fundamental problem is to predict the conditional probability of the next word given the sequence of previous words. Statistical language models, i.e., n-gram based models, have therefore been widely used for this task in real applications. However, users' extremely diverse typing behaviors usually lead to a serious sparsity problem, under which even n-gram models with smoothing fail. A reasonable approach to this problem is to use the recently proposed neural models, such as the probabilistic neural language model, recurrent neural networks, and word2vec, which can leverage semantically similar words when estimating the probability. However, there is no conclusion on which of the two approaches works better in real applications. In this paper, we conduct an extensive empirical study of the differences between statistical and neural language models. The experimental results show that the two approaches have their individual advantages, and that a hybrid approach brings a significant improvement.
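To make the statistical side of the comparison concrete, the following is a minimal sketch (not the paper's implementation) of a bigram language model with add-one (Laplace) smoothing: it estimates P(next word | previous word) from counts and ranks candidate next words, which is the core of the input-recommendation step. The toy corpus, tokenisation, and smoothing choice are illustrative assumptions only.

```python
# Sketch: bigram LM with add-one smoothing for next-word recommendation.
# Toy corpus and add-one smoothing are assumptions for illustration.
from collections import Counter, defaultdict

corpus = [["我", "想", "吃", "饭"], ["我", "想", "回", "家"], ["我", "要", "吃", "饭"]]

context_count = Counter()            # count of each word used as context
bigram_count = defaultdict(Counter)  # count of (context -> next word) pairs
for sent in corpus:
    tokens = ["<s>"] + sent + ["</s>"]
    for prev, cur in zip(tokens, tokens[1:]):
        context_count[prev] += 1
        bigram_count[prev][cur] += 1

vocab = {w for sent in corpus for w in sent} | {"<s>", "</s>"}
V = len(vocab)

def prob(prev, cur):
    """Add-one smoothed estimate of P(cur | prev)."""
    return (bigram_count[prev][cur] + 1) / (context_count[prev] + V)

# Rank candidate next words after "想" -- the recommendation step.
candidates = sorted(vocab, key=lambda w: prob("想", w), reverse=True)
print([(w, round(prob("想", w), 3)) for w in candidates[:3]])
```

Even with smoothing, unseen contexts fall back to near-uniform estimates, which is the sparsity problem the abstract describes; neural models instead share statistics across semantically similar words through learned embeddings.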
