Extreme Multi-Label Skill Extraction Training using Large Language Models

07/20/2023
by Jens-Joris Decorte, et al.

Online job ads serve as a valuable source of information for skill requirements, playing a crucial role in labor market analysis and e-recruitment processes. Since such ads are typically formatted in free text, natural language processing (NLP) technologies are required to automatically process them. We specifically focus on the task of detecting skills (mentioned literally, or implicitly described) and linking them to a large skill ontology, making it a challenging case of extreme multi-label classification (XMLC). Given that no sizable labeled (training) dataset is available for this specific XMLC task, we propose techniques to leverage general Large Language Models (LLMs). We describe a cost-effective approach to generate an accurate, fully synthetic labeled dataset for skill extraction, and present a contrastive learning strategy that proves effective in the task. Our results across three skill extraction benchmarks show a consistent increase of between 15 and 25 percentage points in R-Precision@5 compared to previously published results that relied solely on distant supervision through literal matches.
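For readers unfamiliar with the reported metric, the following is a minimal sketch of R-Precision@K as it is commonly defined in the XMLC literature: the number of gold labels found in the top K predictions, divided by min(K, number of gold labels). The abstract does not spell out the formula, so this definition, the function names, and the toy skill labels are all assumptions for illustration and not the authors' evaluation code.

```python
from typing import Sequence, Set


def r_precision_at_k(ranked: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """R-Precision@K for one sample (assumed common XMLC definition).

    ranked: predicted labels, best first, assumed to contain no duplicates.
    gold:   set of correct labels for the sample (must be non-empty).
    """
    if not gold:
        raise ValueError("gold label set must not be empty")
    # Count gold labels that appear among the top-k predictions.
    hits = sum(1 for label in ranked[:k] if label in gold)
    # Normalise by min(k, |gold|) so a perfect ranking scores 1.0
    # even when the sample has fewer than k gold labels.
    return hits / min(k, len(gold))


def mean_r_precision_at_k(
    rankings: Sequence[Sequence[str]], golds: Sequence[Set[str]], k: int = 5
) -> float:
    """Average R-Precision@K over a collection of samples."""
    if len(rankings) != len(golds):
        raise ValueError("rankings and golds must have the same length")
    scores = [r_precision_at_k(r, g, k) for r, g in zip(rankings, golds)]
    return sum(scores) / len(scores)


# Hypothetical ranked skill predictions for a single job-ad sentence.
ranked = ["python", "sql", "teamwork", "java", "excel", "docker"]
score_a = r_precision_at_k(ranked, {"python", "java"}, k=5)
score_b = r_precision_at_k(ranked, {"python", "docker", "git"}, k=5)
mean_score = mean_r_precision_at_k(
    [ranked, ranked], [{"python", "java"}, {"python", "docker", "git"}], k=5
)
```

In the first case both gold skills fall within the top five, so the score is 1.0; in the second only one of three gold skills does, giving 1/3. Under this definition, an improvement of 15 to 25 percentage points corresponds to the mean score rising by 0.15 to 0.25.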

research
09/13/2022

Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction

Skills play a central role in the job market and many human resources (H...
research
07/07/2023

Large Language Models as Batteries-Included Zero-Shot ESCO Skills Matchers

Understanding labour market dynamics requires accurately identifying the...
research
04/11/2022

"FIJO": a French Insurance Soft Skill Detection Dataset

Understanding the evolution of job requirements is becoming more importa...
research
05/20/2023

ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain

The increasing number of benchmarks for Natural Language Processing (NLP...
research
04/17/2023

SkillGPT: a RESTful API service for skill extraction and standardization using a Large Language Model

We present SkillGPT, a tool for skill extraction and standardization (SE...
research
09/18/2023

LLM4Jobs: Unsupervised occupation extraction and standardization leveraging Large Language Models

Automated occupation extraction and standardization from free-text job p...
research
04/28/2022

Towards Understanding the Skill Gap in Cybersecurity

Given the ongoing "arms race" in cybersecurity, the shortage of skilled ...
