ConFiguRe: Exploring Discourse-level Chinese Figures of Speech

09/16/2022
by Dawei Zhu, et al.

Figures of speech, such as metaphor and irony, are ubiquitous in literary works and colloquial conversations. This poses a great challenge for natural language understanding, since figures of speech usually deviate from their ostensible meanings to express deeper semantic implications. Previous research emphasizes the literary aspect of figures and seldom provides a comprehensive exploration from the perspective of computational linguistics. In this paper, we first propose the concept of the figurative unit, which is the carrier of a figure. We then select 12 types of figures commonly used in Chinese and build a Chinese corpus for Contextualized Figure Recognition (ConFiguRe). Unlike previous token-level or sentence-level counterparts, ConFiguRe aims at extracting a figurative unit from discourse-level context and classifying it into the right figure type. On ConFiguRe, we design three tasks, i.e., figure extraction, figure type classification and figure recognition, and use state-of-the-art techniques to implement the benchmarks. We conduct thorough experiments and show that all three tasks are challenging for existing models and thus require further research. Our dataset and code are publicly available at https://github.com/pku-tangent/ConFiguRe.
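To make the relationship between the three tasks concrete, the following is a minimal sketch and not the authors' data format or code. It assumes a figurative unit can be stored as a character span plus a type label, and that figure recognition means producing both the span and the type. The class `FigurativeUnit`, the three `*_view` helpers, and the label strings are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class FigurativeUnit:
    """A span of a discourse that carries a figure (hypothetical layout)."""
    start: int        # assumed: character offset where the unit begins (inclusive)
    end: int          # assumed: character offset where the unit ends (exclusive)
    figure_type: str  # placeholder label; the corpus defines its own 12 types


def extraction_view(units: List[FigurativeUnit]) -> Set[Tuple[int, int]]:
    """Figure extraction: only the unit boundaries are compared."""
    return {(u.start, u.end) for u in units}


def classification_view(units: List[FigurativeUnit]) -> List[str]:
    """Figure type classification: boundaries are given, only types are compared."""
    return [u.figure_type for u in units]


def recognition_view(units: List[FigurativeUnit]) -> Set[Tuple[int, int, str]]:
    """Figure recognition (as assumed here): boundaries and type together."""
    return {(u.start, u.end, u.figure_type) for u in units}


# A prediction with the right span but the wrong type:
gold = [FigurativeUnit(0, 12, "metaphor")]
pred = [FigurativeUnit(0, 12, "irony")]

extraction_ok = extraction_view(gold) == extraction_view(pred)
recognition_ok = recognition_view(gold) == recognition_view(pred)
```

Under these assumptions, the prediction above counts as a correct extraction but an incorrect recognition, which is one way to see why the joint task is at least as hard as either subtask.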

research
11/19/2017

A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text

Named Entity Recognition and Relation Extraction for Chinese literature ...
research
07/16/2023

Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling

Modeling discourse – the linguistic phenomena that go beyond individual ...
research
06/30/2021

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information

Recent pretraining models in Chinese neglect two important aspects speci...
research
05/07/2022

Unified Chinese License Plate Detection and Recognition with High Efficiency

Recently, deep learning-based methods have reached an excellent performa...
research
02/26/2022

QuoteR: A Benchmark of Quote Recommendation for Writing

It is very common to use quotations (quotes) to make our writings more e...
research
09/18/2023

Proposition from the Perspective of Chinese Language: A Chinese Proposition Classification Evaluation Benchmark

Existing propositions often rely on logical constants for classification...
research
10/14/2021

Building Chinese Biomedical Language Models via Multi-Level Text Discrimination

Pre-trained language models (PLMs), such as BERT and GPT, have revolutio...
