Unified Likelihood Ratio Estimation for High- to Zero-frequency N-grams

10/03/2021
by Masato Kikuchi, et al.

Likelihood ratios (LRs), which are commonly used for probabilistic data processing, are often estimated from the frequency counts of individual elements obtained from samples. In natural language processing, an element can be a contiguous sequence of N items, called an N-gram, in which each item is a word, letter, etc. In this paper, we attempt to estimate LRs based on N-gram frequency information. A naive estimation approach that uses only N-gram frequencies is sensitive to low-frequency (rare) N-grams and is not applicable to zero-frequency (unobserved) N-grams; these are known as the low- and zero-frequency problems, respectively. To address these problems, we propose a method that decomposes N-grams into item units and uses the unit frequencies along with the original N-gram frequencies. Our method can estimate LRs for unobserved N-grams by using the unit frequencies. Although using only unit frequencies ignores dependencies between items, our method takes advantage of the fact that certain items often co-occur in practice and therefore maintains their dependencies by using the relevant N-gram frequencies. We also introduce regularization to achieve robust estimation for rare N-grams. Our experimental results demonstrate that our method is effective at solving both problems and can effectively control dependencies.
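The zero-frequency problem and the unit-decomposition idea can be illustrated with a toy sketch. This is not the estimator proposed in the paper: it assumes the LR is a ratio of relative frequencies between two samples, and it replaces the paper's combined, regularized estimator with a plain product of item-level ratios. The function names (`ngram_counts`, `naive_lr`, `unit_lr`) and the toy token sequences are hypothetical.

```python
from collections import Counter
from math import prod


def ngram_counts(tokens, n):
    """Count every contiguous n-item sequence in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def naive_lr(element, num_counts, den_counts):
    """Ratio of relative frequencies; None when the denominator count is zero."""
    f_num, f_den = num_counts[element], den_counts[element]
    if f_den == 0:
        return None  # zero-frequency problem: the ratio is undefined
    return (f_num * sum(den_counts.values())) / (f_den * sum(num_counts.values()))


def unit_lr(ngram, num_units, den_units):
    """Product of item-level naive LRs (assumption: items are independent)."""
    ratios = [naive_lr((item,), num_units, den_units) for item in ngram]
    if any(r is None for r in ratios):
        return None
    return prod(ratios)


# Hypothetical toy samples for the numerator and denominator distributions.
num_tokens = "a b a b c".split()
den_tokens = "a c b c a".split()

num_bi, den_bi = ngram_counts(num_tokens, 2), ngram_counts(den_tokens, 2)
num_uni, den_uni = ngram_counts(num_tokens, 1), ngram_counts(den_tokens, 1)

seen = naive_lr(("b", "c"), num_bi, den_bi)       # observed in both samples
unseen = naive_lr(("a", "b"), num_bi, den_bi)     # never observed in den_tokens
fallback = unit_lr(("a", "b"), num_uni, den_uni)  # estimate from item units

print(seen, unseen, fallback)
```

Here the bigram `("a", "b")` never occurs in the denominator sample, so the naive estimate is undefined, while the item-level product still yields a value. That product ignores any dependency between `a` and `b`, which is the limitation the abstract says the proposed method addresses by also using the relevant N-gram frequencies and regularization; neither is modeled in this sketch.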

research
10/28/2022

Conservative Likelihood Ratio Estimator for Infrequent Data Slightly above a Frequency Threshold

A naive likelihood ratio (LR) estimation using the observed frequencies ...
research
11/05/2021

Feature Selective Likelihood Ratio Estimator for Low- and Zero-frequency N-grams

In natural language processing (NLP), the likelihood ratios (LRs) of N-g...
research
12/16/2019

A new Frequency Estimation Sketch for Data Streams

In data stream applications, one of the critical issues is to estimate t...
research
12/05/2018

Calibrate: Frequency Estimation and Heavy Hitter Identification with Local Differential Privacy via Incorporating Prior Knowledge

Estimating frequencies of certain items among a population is a basic st...
research
04/25/2021

Frequency Superposition – A Multi-Frequency Stimulation Method in SSVEP-based BCIs

The steady-state visual evoked potential (SSVEP) is one of the most wide...
research
05/08/2023

Neural Steerer: Novel Steering Vector Synthesis with a Causal Neural Field over Frequency and Source Positions

Neural fields have successfully been used in many research fields for th...
research
02/01/2023

Low-Frequency Stabilization of Dielectric Simulation Problems with Conductors and Insulators

When simulating resistive-capacitive circuits or electroquasistatic prob...
