NYTWIT: A Dataset of Novel Words in the New York Times

03/06/2020
by Yuval Pinter, et al.

We present the New York Times Word Innovation Types dataset, or NYTWIT, a collection of over 2,500 novel English words published in the New York Times between November 2017 and March 2019, manually annotated for their class of novelty (such as lexical derivation, dialectal variation, blending, or compounding). We present baseline results for both uncontextual and contextual prediction of novelty class, showing that there is room for improvement even for state-of-the-art NLP systems. We hope this resource will prove useful for linguists and NLP practitioners by providing a real-world environment of novel word appearance.


