Blow the Dog Whistle: A Chinese Dataset for Cant Understanding with Common Sense and World Knowledge

04/06/2021
by Canwen Xu, et al.

Cant is important for understanding advertising, comedies, and dog-whistle politics. However, computational research on cant is hindered by a lack of available datasets. In this paper, we propose a large and diverse Chinese dataset for creating and understanding cant from a computational linguistics perspective. We formulate a task for cant understanding and provide both quantitative and qualitative analyses of the word embedding similarity methods and pretrained language models we test on it. Experiments suggest that this task requires deep language understanding, common sense, and world knowledge, and can therefore serve as a good testbed for pretrained language models and help models perform better on other tasks. The code is available at https://github.com/JetRunner/dogwhistle. The data and leaderboard are available at https://competitions.codalab.org/competitions/30451.
