Revisiting Distributional Correspondence Indexing: A Python Reimplementation and New Experiments

10/19/2018
by Alejandro Moreo, et al.

This paper introduces PyDCI, a new Python implementation of Distributional Correspondence Indexing (DCI). DCI is a transfer learning method for cross-domain and cross-lingual text classification, for which we had previously provided an implementation (here called JaDCI) built on top of JaTeCS, a Java framework for text classification. PyDCI is a stand-alone version of DCI that builds on scikit-learn and the SciPy stack. We report on new experiments carried out to test PyDCI, using as baselines new high-performing methods that have appeared since DCI was originally proposed. These experiments show that, thanks to a few subtle improvements to DCI, PyDCI outperforms both JaDCI and these baselines, and delivers the best known results on the two popular benchmarks on which we had tested DCI: MultiDomainSentiment (a.k.a. MDS, for cross-domain adaptation) and Webis-CLS-10 (for cross-lingual adaptation). PyDCI, together with the code needed to replicate our experiments, is available at https://github.com/AlexMoreo/pydci.

research
06/08/2023

T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification

Cross-lingual text classification leverages text classifiers trained in ...
research
10/23/2020

KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text Classification for Kinyarwanda and Kirundi

Recent progress in text classification has been focused on high-resource...
research
01/31/2019

Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Cross-Lingual Text Classification

Cross-lingual Text Classification (CLC) consists of automatically classi...
research
05/05/2017

Cross-lingual Distillation for Text Classification

Cross-lingual text classification (CLTC) is the task of classifying docum...
research
05/23/2023

Detecting and Mitigating Hallucinations in Multilingual Summarisation

Hallucinations pose a significant challenge to the reliability of neural...
research
06/20/2022

Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data

Domain adaptive text classification is a challenging problem for the lar...
