Machine Translation for Accessible Multi-Language Text Analysis

01/20/2023
by Edward W. Chew, et al.

English is the international standard of social research, but scholars are increasingly conscious of their responsibility to meet the need for scholarly insight into communication processes globally. This tension is as true in computational methods as in any other area, with revolutionary advances in the tools for English-language texts leaving most other languages far behind. In this paper, we aim to leverage those very advances to demonstrate that multi-language analysis is currently accessible to all computational scholars. We show that English-trained measures computed after translation to English have adequate-to-excellent accuracy compared to source-language measures computed on original texts. We show this for three major analytics (sentiment analysis, topic analysis, and word embeddings) across 16 languages, including Spanish, Chinese, Hindi, and Arabic. We validate this claim by comparing predictions on original-language tweets and their backtranslations: double translations from the source language to English and back to the source language. Overall, our results suggest that Google Translate, a simple and widely accessible tool, is effective in preserving semantic content across languages and methods. Modern machine translation can thus help computational scholars make more inclusive and general claims about human communication.
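The backtranslation check described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. The `translate` callable is a hypothetical stand-in for any machine-translation service (the paper uses Google Translate, but no real client API is called here). The toy lexicon scorer, the two stub translators, and the example texts are all invented for illustration. Pearson correlation and polarity agreement are simply two plausible ways to score how well a sentiment measure survives the round trip; topic or embedding measures would need their own comparison statistic.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: translate(text, source_lang, target_lang) -> str.
# Plug in any MT client here; this sketch never calls a real service.
Translator = Callable[[str, str, str], str]
Scorer = Callable[[str], float]


def backtranslate(text: str, src: str, translate: Translator) -> str:
    """Round-trip a text: source language -> English -> source language."""
    english = translate(text, src, "en")
    return translate(english, "en", src)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def sign_agreement(x: Sequence[float], y: Sequence[float]) -> float:
    """Fraction of items whose scores share a polarity (-1, 0, or +1)."""
    if not x or len(x) != len(y):
        raise ValueError("need two non-empty, equal-length sequences")

    def sign(v: float) -> int:
        return (v > 0) - (v < 0)

    return sum(sign(a) == sign(b) for a, b in zip(x, y)) / len(x)


def backtranslation_agreement(
    texts: List[str], src: str, translate: Translator, score: Scorer
) -> Dict[str, float]:
    """Score originals and their backtranslations with the same
    source-language measure, then report how closely the scores agree."""
    original = [score(t) for t in texts]
    round_trip = [score(backtranslate(t, src, translate)) for t in texts]
    return {
        "pearson": pearson(original, round_trip),
        "sign_agreement": sign_agreement(original, round_trip),
    }


# --- Hypothetical toy stand-ins (not real models, translators, or data) ---
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
TWEETS = ["great day", "bad service", "good food", "terrible traffic", "ok"]


def toy_score(text: str) -> float:
    """Sum of lexicon weights over whitespace tokens."""
    return sum(LEXICON.get(w, 0.0) for w in text.lower().split())


def identity_translate(text: str, src: str, tgt: str) -> str:
    """A perfectly lossless 'translator'."""
    return text


def flattening_translate(text: str, src: str, tgt: str) -> str:
    """Simulates mild semantic loss: intense words collapse to plain polarity."""
    return text.replace("great", "good").replace("terrible", "bad")


if __name__ == "__main__":
    print(backtranslation_agreement(TWEETS, "es", identity_translate, toy_score))
    print(backtranslation_agreement(TWEETS, "es", flattening_translate, toy_score))
```

With the lossless stub, the original and round-trip scores are identical and the correlation is 1. The flattening stub keeps every item's polarity but loses intensity, so polarity agreement stays at 1 while the correlation drops to about 0.95.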

research
09/14/2017

Towards an Arabic-English Machine-Translation Based on Semantic Web

Communication tools make the world like a small village and as a consequ...
research
02/06/2019

Extending a model for ontology-based Arabic-English machine translation

The acceleration in telecommunication needs leads to many groups of rese...
research
02/28/2023

An evaluation of Google Translate for Sanskrit to English translation via sentiment and semantic analysis

Google Translate has been prominent for language translation; however, l...
research
12/15/2016

Building a robust sentiment lexicon with (almost) no resource

Creating sentiment polarity lexicons is labor intensive. Automatically t...
research
10/09/2017

Deep Learning Paradigm with Transformed Monolingual Word Embeddings for Multilingual Sentiment Analysis

The surge of social media use brings huge demand of multilingual sentime...
research
10/26/2020

Is it Great or Terrible? Preserving Sentiment in Neural Machine Translation of Arabic Reviews

Since the advent of Neural Machine Translation (NMT) approaches there ha...
research
06/12/2023

Measuring Sentiment Bias in Machine Translation

Biases induced to text by generative models have become an increasingly ...