ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain

05/20/2023
by Mike Zhang, et al.

The increasing number of benchmarks for Natural Language Processing (NLP) tasks in the computational job market domain highlights the demand for methods that can handle job-related tasks such as skill extraction, skill classification, job title classification, and de-identification. While some approaches have been developed that are specific to the job market domain, there is a lack of generalized, multilingual models and benchmarks for these tasks. In this study, we introduce a language model called ESCOXLM-R, based on XLM-R, which uses domain-adaptive pre-training on the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy, covering 27 languages. The pre-training objectives for ESCOXLM-R include dynamic masked language modeling and a novel additional objective for inducing multilingual taxonomical ESCO relations. We comprehensively evaluate the performance of ESCOXLM-R on 6 sequence labeling and 3 classification tasks in 4 languages and find that it achieves state-of-the-art results on 6 out of 9 datasets. Our analysis reveals that ESCOXLM-R performs better on short spans and outperforms XLM-R on entity-level and surface-level span-F1, likely due to ESCO containing short skill and occupation titles, and encoding information on the entity-level.
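To make the "dynamic masked language modeling" objective mentioned above more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It only illustrates the general idea of dynamic masking: the masked positions are re-sampled every time a sequence is seen, rather than fixed once during preprocessing. The function name `dynamic_mask`, the `<mask>` token string, the 15% masking rate, and the whitespace tokenization are all assumptions made for illustration. The sketch also always substitutes the mask token, which is a simplification of the replacement schemes used in practice.

```python
import random

MASK = "<mask>"  # assumed mask token string, for illustration only


def dynamic_mask(tokens, mask_prob=0.15, rng=None):
    """Mask each token independently with probability mask_prob.

    Returns (masked_tokens, labels). labels holds the original token at
    masked positions and None elsewhere, so a loss would only be computed
    on the masked positions.
    """
    rng = rng or random
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the token from the model
            labels.append(tok)    # the model should predict it
        else:
            masked.append(tok)
            labels.append(None)   # position ignored by the loss
    return masked, labels


if __name__ == "__main__":
    # Hypothetical job-posting fragment, whitespace-tokenized for simplicity.
    tokens = "experience with python and data analysis".split()
    rng = random.Random(0)
    # The mask is re-sampled on every pass over the same sequence.
    for epoch in range(3):
        masked, _ = dynamic_mask(tokens, mask_prob=0.3, rng=rng)
        print(epoch, " ".join(masked))
```

Because the mask is drawn anew on each call, the model is exposed to different masked positions for the same text across epochs. The additional ESCO relation objective described in the abstract is not sketched here, since the abstract does not specify its form.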

research
02/01/2021

SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification

In this paper we present our submission for the EACL 2021-Shared Task on...
research
09/16/2022

Skill Extraction from Job Postings using Weak Supervision

Aggregated data obtained from job postings provide powerful insights int...
research
06/03/2021

nmT5 – Is parallel data still relevant for pre-training massively multilingual language models?

Recently, mT5 - a massively multilingual version of T5 - leveraged a uni...
research
04/27/2022

SkillSpan: Hard and Soft Skill Extraction from English Job Postings

Skill Extraction (SE) is an important and widely-studied task useful to ...
research
07/20/2023

Extreme Multi-Label Skill Extraction Training using Large Language Models

Online job ads serve as a valuable source of information for skill requi...
research
10/26/2022

A practical method for occupational skills detection in Vietnamese job listings

Vietnamese labor market has been under an imbalanced development. The nu...
