Sentence Length

05/22/2019
by Gábor Borbély et al.

The distribution of sentence length in ordinary language is not well captured by existing models. Here we survey previous models of sentence length and present our random walk model, which offers both a better fit to the data and a better understanding of the distribution. We develop a generalization of KL divergence, discuss how to measure the noise inherent in a corpus, and present a hyperparameter-free Bayesian model comparison method with strong conceptual ties to Minimum Description Length (MDL) modeling. The models we obtain require only a few dozen bits, orders of magnitude less than naive nonparametric MDL models would.
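The abstract does not spell out the random walk model itself, but the kind of comparison it describes can be illustrated with a minimal sketch: fit a simple memoryless (geometric) baseline to empirical sentence lengths and score the fit with KL divergence. The corpus, function names, and the choice of a geometric baseline here are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter

def length_distribution(sentences):
    """Empirical distribution over sentence lengths (in tokens)."""
    counts = Counter(len(s.split()) for s in sentences)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def geometric_pmf(k, p):
    """P(length = k) if each token independently ends the sentence
    with probability p (support k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p

def kl_divergence_bits(empirical, p):
    """KL(empirical || geometric(p)) in bits; larger values mean the
    memoryless baseline fits the observed lengths worse."""
    return sum(q * math.log2(q / geometric_pmf(k, p))
               for k, q in empirical.items())

# Toy corpus; a real experiment would use a large tokenized corpus.
corpus = [
    "the cat sat",
    "dogs bark",
    "a long sentence with several words here",
    "short one",
]
emp = length_distribution(corpus)
# Moment-match p: a geometric length has mean 1/p.
p_hat = 1.0 / sum(k * q for k, q in emp.items())
kl = kl_divergence_bits(emp, p_hat)
```

A better model of sentence length would drive this divergence down; the paper's point is that simple baselines of this sort leave a measurable gap.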


Related research

12/19/2016 · Random Walk Models of Network Formation and Sequential Monte Carlo Methods for Graphs
We introduce a class of network models that insert edges by connecting t...

09/28/2021 · CIDEr-R: Robust Consensus-based Image Description Evaluation
This paper shows that CIDEr-D, a traditional evaluation metric for image...

09/17/2019 · Controllable Length Control Neural Encoder-Decoder via Reinforcement Learning
Controlling output length in neural language generation is valuable in m...

04/18/2017 · A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
This paper introduces the Multi-Genre Natural Language Inference (MultiN...

05/15/2023 · Sentence Level Curriculum Learning for Improved Neural Conversational Models
Designing machine intelligence to converse with a human user necessarily...

09/06/2021 · Exposing Length Divergence Bias of Textual Matching Models
Despite the remarkable success deep models have achieved in Textual Matc...
