WikiText Dataset

09/26/2016

DOWNLOAD WikiText-2

wget https://data.deepai.org/wikitext2.zip
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from Wikipedia articles that have been verified as "Good" or "Featured".
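
Once the archive from the wget command above is on disk, a quick sanity check is to count the tokens in each file it contains. The following is a minimal sketch, not an official loader: the helper name count_tokens_in_zip is hypothetical, it assumes the archive holds UTF-8 text files, and it treats every whitespace-separated string as one token. It makes no assumption about the file names inside the archive and simply reports whatever it finds.

```python
import io
import zipfile


def count_tokens_in_zip(zip_bytes):
    """Return {file name: whitespace-token count} for every file in a zip archive.

    Assumption: the archive members are UTF-8 text and tokens are
    separated by whitespace. Undecodable bytes are replaced, not fatal.
    """
    counts = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, they carry no text
            text = archive.read(info).decode("utf-8", errors="replace")
            counts[info.filename] = len(text.split())
    return counts
```

To use it on the downloaded file, read wikitext2.zip in binary mode and pass the bytes to count_tokens_in_zip. The whole archive is loaded into memory, which is reasonable for WikiText-2 but may be worth replacing with a streaming read for larger archives.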

WikiText-103

Compared to the preprocessed version of the Penn Treebank (PTB) dataset, WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger.