Multi-D Kneser-Ney Smoothing Preserving the Original Marginal Distributions

07/10/2018
by András Dobó, et al.

Smoothing is an essential tool in many NLP tasks, and numerous techniques have been developed for it. Among the most widely used are Kneser-Ney smoothing (KNS) and its variants, including Modified Kneser-Ney smoothing (MKNS), which are widely considered to be among the best smoothing methods available. Although the authors of the original KNS intended it to preserve the marginal distributions of the original model, this property was not maintained when MKNS was developed. In this article I propose a refined version of MKNS that preserves these marginal distributions while keeping the advantages of both previous versions. Besides these advantageous properties, the new smoothing method is shown to achieve about the same results as MKNS in a standard language modelling task.
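
To make the marginal-preservation property concrete, the following is a minimal sketch of the textbook interpolated Kneser-Ney estimate for bigrams with a single discount. It is neither MKNS nor the refined method proposed in the article, and the function names, the toy corpus and the discount value of 0.75 are all assumptions chosen for illustration. The sketch checks that weighting the smoothed conditional probabilities by the empirical context frequencies recovers the empirical frequency of each word.

```python
from collections import Counter


def kneser_ney_bigram(bigrams, discount=0.75):
    """Build an interpolated Kneser-Ney estimate P(w | v) from (v, w) pairs.

    Illustrative sketch only: one discount, bigrams only, seen contexts only.
    """
    pair_counts = Counter(bigrams)
    context_counts = Counter()  # c(v, .): how often v occurs as a context
    followers = Counter()       # N1+(v, .): distinct words seen after v
    preceders = Counter()       # N1+(., w): distinct contexts seen before w
    for (v, w), c in pair_counts.items():
        context_counts[v] += c
        followers[v] += 1
        preceders[w] += 1
    distinct_pairs = len(pair_counts)  # N1+(., .)

    def prob(w, v):
        # Discounted bigram estimate plus redistributed mass times the
        # continuation probability of w.
        discounted = max(pair_counts[(v, w)] - discount, 0.0) / context_counts[v]
        backoff_weight = discount * followers[v] / context_counts[v]
        continuation = preceders[w] / distinct_pairs
        return discounted + backoff_weight * continuation

    return prob, context_counts


def marginal_gap(bigrams, discount=0.75):
    """Largest absolute gap between the smoothed and empirical marginals."""
    prob, context_counts = kneser_ney_bigram(bigrams, discount)
    total = len(bigrams)
    word_counts = Counter(w for _, w in bigrams)
    gaps = []
    for w, c in word_counts.items():
        smoothed = sum(context_counts[v] / total * prob(w, v)
                       for v in context_counts)
        gaps.append(abs(smoothed - c / total))
    return max(gaps)


# Hypothetical toy corpus, used only to exercise the check.
tokens = "the cat sat on the mat and the cat ate the fish".split()
bigrams = list(zip(tokens, tokens[1:]))
print(marginal_gap(bigrams))
```

With a single discount no larger than 1, the printed gap should be zero up to floating-point rounding. This is the marginal constraint that the abstract says was lost when MKNS was developed.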


