Efficient Calculation of Bigram Frequencies in a Corpus of Short Texts

04/18/2016
by   Melvyn Drag, et al.

We show that an efficient and popular method for calculating bigram frequencies is unsuitable for bodies of short texts and offer a simple alternative. Our method has the same computational complexity as the old method and offers an exact count instead of an approximation.
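The abstract does not describe either method in detail, so the following is only a minimal sketch of what an exact count could look like, not the authors' implementation. It assumes whitespace tokenization and assumes that bigrams should never span two texts; the function names `bigram_counts` and `bigram_frequencies` are hypothetical. Each text is tokenized on its own, so no pair is formed across a text boundary, and the work stays linear in the total number of tokens.

```python
from collections import Counter


def bigram_counts(texts):
    """Count bigrams exactly, one text at a time.

    Assumption: tokens are whitespace-separated and a bigram
    must lie entirely inside a single text.
    """
    counts = Counter()
    for text in texts:
        tokens = text.split()
        # Pair each token with its successor within this text only.
        counts.update(zip(tokens, tokens[1:]))
    return counts


def bigram_frequencies(texts):
    """Relative frequency of each bigram over the exact bigram total."""
    counts = bigram_counts(texts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: n / total for bigram, n in counts.items()}
```

For the two short texts `["a b", "c d"]`, this counts `("a", "b")` and `("c", "d")` once each and never produces `("b", "c")`, which a count over the concatenated texts would include.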


