emLam -- a Hungarian Language Modeling baseline

01/26/2017
by Dávid Márk Nemeskey, et al.

This paper aims to make up for the lack of documented baselines for Hungarian language modeling. Various approaches are evaluated on three publicly available Hungarian corpora. Perplexity values comparable to models of similar-sized English corpora are reported. A new, freely downloadable Hungarian benchmark corpus is introduced.
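Perplexity, the evaluation metric reported in the paper, is the exponentiated average negative log-likelihood a model assigns to held-out text. A minimal illustrative sketch (not code from the paper; the function name and inputs are hypothetical):

```python
import math

def perplexity(token_log_probs):
    # token_log_probs: natural-log probabilities the model assigns
    # to each token of a held-out corpus.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Sanity check: a uniform model over a 10-word vocabulary
# has perplexity 10.
uniform = [math.log(1 / 10)] * 100
print(perplexity(uniform))
```

Lower perplexity means the model finds the test text less surprising, which is why comparable values across Hungarian and English corpora of similar size are a meaningful baseline claim.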


