
Detecting Syntactic Features of Translated Chinese

04/23/2018
by Hai Hu, et al.
Indiana University Bloomington

We present a machine learning approach to distinguishing texts translated into Chinese (by humans) from texts originally written in Chinese, with a focus on a wide range of syntactic features. Using Support Vector Machines (SVMs) as the classifier on a genre-balanced corpus from Chinese translation studies, we find that constituent parse trees and dependency triples, used as features without lexical information, perform very well on the task: they reach an F-measure above 90, close to the results of lexical n-gram features, without the risk of learning topic information rather than translation features. We therefore claim that syntactic features alone can accurately distinguish translated from original Chinese. Translated Chinese exhibits increased use of determiners, subject-position pronouns, NP + 'de' as NP modifiers, and multiple NPs or VPs conjoined by a Chinese-specific punctuation mark, among other structures. We also interpret the syntactic features with reference to previous translation studies in Chinese, particularly regarding the usage of pronouns.
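To make the idea of dependency triples "without lexical information" concrete, here is a minimal sketch of one way such features could be counted. The abstract does not say how the triples are encoded, so the (head POS, relation, dependent POS) format, the function names, and the toy parse below are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

# Hypothetical toy parse of one sentence. Each token is
# (index, word, POS tag, head index, dependency relation).
# Head index 0 stands for the artificial root. The tags and
# relations are illustrative, not taken from the paper's corpus.
toy_sentence = [
    (1, "他", "PN", 2, "nsubj"),
    (2, "喜欢", "VV", 0, "root"),
    (3, "这些", "DT", 4, "det"),
    (4, "书", "NN", 2, "dobj"),
]


def delexicalized_triples(sentence):
    """Return (head POS, relation, dependent POS) triples, discarding words."""
    pos_by_index = {idx: pos for idx, _word, pos, _head, _rel in sentence}
    pos_by_index[0] = "ROOT"
    return [
        (pos_by_index[head], rel, pos)
        for _idx, _word, pos, head, rel in sentence
    ]


def triple_counts(sentences):
    """Count delexicalized triples over a list of parsed sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update("_".join(t) for t in delexicalized_triples(sentence))
    return counts


features = triple_counts([toy_sentence])
for name, count in sorted(features.items()):
    print(name, count)
```

Because the words themselves never enter the feature names, a classifier trained on counts like these cannot pick up topic vocabulary, which is the motivation the abstract gives for purely syntactic features. Feeding such counts into an SVM would be a separate step not shown here.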

