Searching for Optimal Subword Tokenization in Cross-domain NER

06/07/2022
by Ruotian Ma et al.

Input distribution shift is one of the key problems in unsupervised domain adaptation (UDA). The most popular UDA approaches focus on domain-invariant representation learning (DIRL), which tries to align the features from different domains into similar feature distributions. However, these approaches ignore the direct alignment of input word distributions between domains, which is a vital factor in word-level classification tasks such as cross-domain NER. In this work, we shed new light on cross-domain NER by introducing X-Piece, a subword-level solution to input word-level distribution shift in NER. Specifically, we re-tokenize the input words of the source domain so that their subword distribution approaches the target-domain subword distribution; this re-tokenization is formulated and solved as an optimal transport problem. Because the approach operates at the input level, it can also be combined with existing DIRL methods for further improvement. Experimental results with a BERT-tagger on four benchmark NER datasets show the effectiveness of the proposed method. The proposed method is also shown to benefit DIRL methods such as DANN.
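
The abstract does not spell out the optimal transport formulation, so what follows is only a minimal sketch under our own assumptions, not X-Piece itself. It assumes that each source word comes with a small, hypothetical list of candidate subword segmentations and that a target-domain subword distribution has already been estimated. An entropic-regularized (Sinkhorn) transport plan moves source-word probability mass onto target subwords, with an assumed cost of 0 when a subword occurs in one of the word's candidates and 1 otherwise; each word then keeps the candidate whose subwords receive the most mass. The function names `sinkhorn` and `retokenize`, the 0/1 cost, the selection rule, and the toy data are all illustrative choices.

```python
import math

# Hypothetical sketch: an OT-guided re-tokenization heuristic, NOT the
# X-Piece formulation from the paper. Cost and selection rule are assumptions.


def sinkhorn(a, b, cost, reg=0.1, n_iters=500):
    """Entropic-regularized OT plan between histograms a (rows) and b (columns)."""
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so the plan's row sums match a
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the plan's column sums match b
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def retokenize(word_probs, candidates, target_probs, reg=0.1):
    """Choose one candidate segmentation per source word (hypothetical rule)."""
    words = list(word_probs)
    subwords = list(target_probs)
    col = {s: j for j, s in enumerate(subwords)}
    a = [word_probs[w] for w in words]
    b = [target_probs[s] for s in subwords]
    # Assumed cost: 0 if the subword appears in any candidate of the word, else 1
    cost = [
        [0.0 if any(s in seg for seg in candidates[w]) else 1.0 for s in subwords]
        for w in words
    ]
    plan = sinkhorn(a, b, cost, reg=reg)
    chosen = {}
    for i, w in enumerate(words):
        # Score each segmentation by the transport mass its subwords receive from w
        chosen[w] = max(
            candidates[w],
            key=lambda seg: sum(plan[i][col[s]] for s in seg if s in col),
        )
    return chosen


# Toy, made-up distributions for illustration only
word_probs = {"playing": 0.5, "played": 0.5}
candidates = {
    "playing": [["play", "##ing"], ["pla", "##ying"]],
    "played": [["play", "##ed"], ["pla", "##yed"]],
}
target_probs = {"play": 0.05, "##ing": 0.0, "##ed": 0.0,
                "pla": 0.35, "##ying": 0.3, "##yed": 0.3}
print(retokenize(word_probs, candidates, target_probs))
# {'playing': ['pla', '##ying'], 'played': ['pla', '##yed']}
```

Because the toy target distribution favors `pla`, `##ying`, and `##yed`, the plan routes most source mass onto those subwords and the sketch picks the matching segmentations. A realistic system would also need a principled way to generate candidate segmentations and to define the transport cost, which this sketch does not attempt.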

07/02/2021

Data Centric Domain Adaptation for Historical Text with OCR Errors

We propose new methods for in-domain and cross-domain Named Entity Recog...
01/25/2023

One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER

Cross-domain NER is a challenging task to address the low-resource probl...
12/08/2020

CrossNER: Evaluating Cross-Domain Named Entity Recognition

Cross-domain named entity recognition (NER) models are able to cope with...
06/23/2020

Discriminative Feature Alignment: Improving Transferability of Unsupervised Domain Adaptation by Gaussian-guided Latent Alignment

In this study, we focus on the unsupervised domain adaptation problem wh...
10/13/2021

Reducing the Covariate Shift by Mirror Samples in Cross Domain Alignment

Eliminating the covariate shift cross domains is one of the common metho...
05/13/2021

Cross-Domain Contract Element Extraction with a Bi-directional Feedback Clause-Element Relation Network

Contract element extraction (CEE) is the novel task of automatically ide...
