Log In Sign Up

Ranking Creative Language Characteristics in Small Data Scenarios

by   Julia Siekiera, et al.

The ability to rank creative natural language provides an important general tool for downstream language understanding and generation. However, current deep ranking models require substantial amounts of labeled data that are difficult and expensive to obtain for different domains, languages and creative characteristics. A recent neural approach, the DirectRanker, promises to reduce the amount of training data needed but its application to text isn't fully explored. We therefore adapt the DirectRanker to provide a new deep model for ranking creative language with small data. We compare DirectRanker with a Bayesian approach, Gaussian process preference learning (GPPL), which has previously been shown to work well with sparse data. Our experiments with sparse training data show that while the performance of standard neural ranking approaches collapses with small training datasets, DirectRanker remains effective. We find that combining DirectRanker with GPPL increases performance across different settings by leveraging the complementary benefits of both models. Our combined approach outperforms the previous state-of-the-art on humor and metaphor novelty tasks, increasing Spearman's ρ by 14 on average.


page 1

page 2

page 3

page 4


Domain Adaptation of Recurrent Neural Networks for Natural Language Understanding

The goal of this paper is to use multi-task learning to efficiently scal...

Transliteration in Any Language with Surrogate Languages

We introduce a method for transliteration generation that can produce tr...

CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

Most recent progress in natural language understanding (NLU) has been dr...

Finding Convincing Arguments Using Scalable Bayesian Preference Learning

We introduce a scalable Bayesian preference learning method for identify...

Jointly Improving Language Understanding and Generation with Quality-Weighted Weak Supervision of Automatic Labeling

Neural natural language generation (NLG) and understanding (NLU) models ...

Domain Adaptation for Sparse-Data Settings: What Do We Gain by Not Using Bert?

The practical success of much of NLP depends on the availability of trai...

Content-Based Features to Rank Influential Hidden Services of the Tor Darknet

The unevenness importance of criminal activities in the onion domains of...