Ancient Chinese Word Segmentation and Part-of-Speech Tagging Using Distant Supervision

03/03/2023
by   Shuo Feng, et al.

Ancient Chinese word segmentation (WSG) and part-of-speech (POS) tagging are important for studying ancient Chinese, but annotated data for ancient Chinese WSG and POS tagging is still scarce. In this paper, we propose a novel method for augmenting ancient Chinese WSG and POS tagging data using distant supervision over a parallel corpus. However, distant supervision inevitably leaves some ancient Chinese words mislabeled or unlabeled. To address this problem, we take advantage of the memorization effects of deep neural networks and a small amount of annotated data to obtain a model that has absorbed much knowledge and little noise, and we then use this model to relabel the ancient Chinese sentences in the parallel corpus. Experiments show that the model trained on the relabeled data outperforms the model trained on the data generated from distant supervision and the annotated data. Our code is available at https://github.com/farlit/ACDS.
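To make the idea of distant supervision over a parallel corpus concrete, the sketch below projects word boundaries and POS tags from an already tagged modern Chinese sentence onto its aligned ancient Chinese sentence. It is only an illustration under stated assumptions, not the procedure used in the paper or in the ACDS repository: the function name `project_labels`, the exact-string matching rule, the B/M/E/S boundary scheme, and the toy sentence pair are all hypothetical. Characters that find no match stay unlabeled (`None`), which corresponds to the unlabeled words the abstract mentions.

```python
from typing import List, Optional, Tuple

Label = Optional[Tuple[str, str]]  # (boundary tag, POS tag), or None if unlabeled


def project_labels(ancient: str, modern_tagged: List[Tuple[str, str]]) -> List[Label]:
    """Project (word, POS) pairs from the modern side onto the ancient sentence.

    Assumption: a modern word that also occurs verbatim in the ancient
    sentence keeps its segmentation and POS tag there. This is a simplification
    chosen for illustration only.
    """
    labels: List[Label] = [None] * len(ancient)
    # Match longer words first so that short words do not split them.
    for word, pos in sorted(modern_tagged, key=lambda wp: -len(wp[0])):
        if not word:
            continue
        start = ancient.find(word)
        while start != -1:
            span = range(start, start + len(word))
            # Only label spans that no earlier (longer) word has claimed.
            if all(labels[i] is None for i in span):
                for offset, i in enumerate(span):
                    if len(word) == 1:
                        tag = "S"  # single-character word
                    elif offset == 0:
                        tag = "B"  # beginning of a word
                    elif offset == len(word) - 1:
                        tag = "E"  # end of a word
                    else:
                        tag = "M"  # middle of a word
                    labels[i] = (tag, pos)
            start = ancient.find(word, start + 1)
    return labels


# Toy, hand-made pair (not taken from the paper's corpus).
example = project_labels("天下大乱", [("天下", "n"), ("大", "a")])
```

In this toy pair the last character finds no match and remains `None`. Gaps and wrong projections of this kind are the noise that the relabeling step described in the abstract is meant to reduce.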


research
11/08/2015

A Chinese POS Decision Method Using Korean Translation Information

In this paper we propose a method that imitates a translation expert usi...
research
11/30/2016

Towards Accurate Word Segmentation for Chinese Patents

A patent is a property right for an invention granted by the government ...
research
12/17/2021

Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-stage Span Labeling

Chinese word segmentation and part-of-speech tagging are necessary tasks...
research
09/19/2023

A comparative study of Grid and Natural sentences effects on Normal-to-Lombard conversion

Grid sentence is commonly used for studying the Lombard effect and Norma...
research
09/21/2017

Inducing Distant Supervision in Suggestion Mining through Part-of-Speech Embeddings

Mining suggestion expressing sentences from a given text is a less inves...
research
02/22/2017

Improving Chinese SRL with Heterogeneous Annotations

Previous studies on Chinese semantic role labeling (SRL) have concentrat...
research
05/21/2019

A Seq-to-Seq Transformer Premised Temporal Convolutional Network for Chinese Word Segmentation

The prevalent approaches of Chinese word segmentation task almost rely o...
