Explaining Translationese: Why are Neural Classifiers Better and What do They Learn?

Recent work has shown that neural feature- and representation-learning approaches, e.g. BERT, achieve superior performance in translationese classification over traditional approaches based on manual feature engineering, e.g. SVMs with handcrafted features. Previous research has not shown (i) whether the difference is due to the features, the classifiers, or both, and (ii) what the neural classifiers actually learn. To address (i), we carefully design experiments that swap features between BERT- and SVM-based classifiers. We show that an SVM fed with BERT representations performs at the level of the best BERT classifiers, while BERT learning and using handcrafted features performs at the level of an SVM using handcrafted features; this shows that the performance differences are due to the features. To address (ii), we use integrated gradients and find that (a) there is indication that the information captured by handcrafted features is only a subset of what BERT learns, and (b) part of BERT's top performance is due to BERT learning topic differences and spurious correlations with translationese.
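
To make the feature-swap setting concrete, the following is a minimal sketch (not the authors' exact pipeline) of "an SVM fed with BERT representations": sentence vectors are extracted from a pretrained BERT model and passed to a linear SVM. The checkpoint name, the [CLS]-pooling choice, and the toy data are illustrative assumptions.

```python
import torch
from sklearn.svm import SVC
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; the paper's exact model/checkpoint may differ.
name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model.eval()

def bert_features(sentences):
    """Encode each sentence as its last-layer [CLS] vector."""
    with torch.no_grad():
        batch = tokenizer(sentences, padding=True, truncation=True,
                          return_tensors="pt")
        hidden = model(**batch).last_hidden_state   # (batch, seq_len, 768)
        return hidden[:, 0, :].numpy()              # [CLS] vector per sentence

# Toy data standing in for a real translationese corpus.
texts = ["This sentence was produced by a human translator.",
         "This sentence was written in the original language."]
labels = [1, 0]  # 1 = translated, 0 = original

svm = SVC(kernel="linear")
svm.fit(bert_features(texts), labels)
```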
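For (ii), this sketch shows one common way to compute token-level integrated-gradients attributions for a BERT classifier, here via Captum's LayerIntegratedGradients over the embedding layer. The checkpoint, the freshly initialized two-label head, and the all-[PAD] baseline are assumptions for illustration; in the paper's setting one would attribute a classifier fine-tuned on translationese data.

```python
import torch
from captum.attr import LayerIntegratedGradients
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "bert-base-multilingual-cased"       # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
# num_labels=2: original vs. translated; in practice this head would be
# fine-tuned before attribution.
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
model.eval()

def forward(input_ids, attention_mask):
    # Logit of the assumed "translated" class (index 1), one score per example.
    return model(input_ids, attention_mask=attention_mask).logits[:, 1]

# Attribute the class-1 logit to the embedding layer, token by token.
lig = LayerIntegratedGradients(forward, model.bert.embeddings)

enc = tokenizer("An example sentence to attribute.", return_tensors="pt")
baseline = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)

attr = lig.attribute(enc["input_ids"], baselines=baseline,
                     additional_forward_args=(enc["attention_mask"],))
scores = attr.sum(dim=-1).squeeze(0)        # one attribution score per token
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
print(list(zip(tokens, scores.detach().tolist())))
```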
