General-to-Specific Transfer Labeling for Domain Adaptable Keyphrase Generation

08/20/2022
by   Rui Meng, et al.

Training keyphrase generation (KPG) models requires a large amount of annotated data, which can be prohibitively expensive to obtain and is often limited to specific domains. In this study, we first demonstrate that large distribution shifts among different domains severely hinder the transferability of KPG models. We then propose a three-stage pipeline that gradually guides a KPG model's learning focus from general syntactic features to domain-related semantics in a data-efficient manner. In the first stage, Domain-general Phrase pre-training, we pre-train Sequence-to-Sequence models with generic phrase annotations that are widely available on the web, which enables the models to generate phrases in a wide range of domains. In the second stage, Transfer Labeling, the resulting model is applied to produce domain-specific pseudo keyphrases, which help adapt models to a new domain. Finally, we fine-tune the model on a limited amount of data with true labels to fully adapt it to the target domain. Our experimental results show that the proposed process can produce good-quality keyphrases in new domains and achieves consistent improvements after adaptation with limited in-domain annotated data.
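The data flow of the three stages can be sketched roughly as follows. This is only an illustrative toy, not the authors' implementation: `ToyPhraseModel` is a hypothetical stand-in that memorizes target phrases and matches them by substring, where the paper uses a Sequence-to-Sequence model, and the function and variable names are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# One training example: (document text, target phrases).
Example = Tuple[str, List[str]]


class ToyPhraseModel:
    """Hypothetical stand-in for a Seq2Seq KPG model (assumption, not the paper's model).

    It counts how often each phrase was seen as a training target and
    "generates" the known phrases that occur in the input text.
    """

    def __init__(self) -> None:
        self.phrase_count: Dict[str, int] = {}

    def train(self, examples: List[Example]) -> None:
        for _, phrases in examples:
            for phrase in phrases:
                key = phrase.lower()
                self.phrase_count[key] = self.phrase_count.get(key, 0) + 1

    def generate(self, text: str, top_k: int = 3) -> List[str]:
        lowered = text.lower()
        found = [(n, p) for p, n in self.phrase_count.items() if p in lowered]
        # Most frequent first; ties broken alphabetically for determinism.
        found.sort(key=lambda item: (-item[0], item[1]))
        return [p for _, p in found[:top_k]]


def general_to_specific(
    general_data: List[Example],
    unlabeled_target_docs: List[str],
    labeled_target_data: List[Example],
) -> Tuple[ToyPhraseModel, List[Example]]:
    model = ToyPhraseModel()

    # Stage 1: domain-general phrase pre-training on generic phrase annotations.
    model.train(general_data)

    # Stage 2: transfer labeling. The pre-trained model labels unlabeled
    # target-domain documents, and the pseudo keyphrases become training data.
    pseudo_labeled = [(doc, model.generate(doc)) for doc in unlabeled_target_docs]
    model.train(pseudo_labeled)

    # Stage 3: fine-tuning on the small set of true in-domain labels.
    model.train(labeled_target_data)

    return model, pseudo_labeled
```

In this sketch the only in-domain annotation consumed is the small `labeled_target_data` set in stage 3; stages 1 and 2 rely on generic phrase data and the model's own predictions.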

