Bilingual Terminology Extraction from Non-Parallel E-Commerce Corpora

04/15/2021
by Hao Jia, et al.

Bilingual terminologies are important resources for natural language processing (NLP) applications. Bilingual terminology pairs are acquired either through human translation or by automatic extraction from parallel data. We observe that comparable corpora can also be a good resource for extracting bilingual terminology pairs, especially in the e-commerce domain: parallel corpora are particularly scarce in e-commerce settings, whereas non-parallel corpora in different languages from the same domain are easily available. In this paper, we propose a novel framework for extracting bilingual terminologies from non-parallel comparable corpora in e-commerce. Benefiting from cross-lingual pre-training in the e-commerce domain, our framework extracts the corresponding target terminology by fully exploiting the deep semantic relationship between the source-side terminology and the target-side sentence. Experimental results on various language pairs show that our approach performs significantly better than several strong baselines.


