Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction

09/13/2022
by Jens-Joris Decorte, et al.

Skills play a central role in the job market and many human resources (HR) processes. In the wake of other digital experiences, today's online job market has candidates expecting to see the right opportunities based on their skill set. Similarly, enterprises increasingly need to use data to guarantee that the skills within their workforce remain future-proof. However, structured information about skills is often missing, and processes building on self- or manager-assessment have been shown to struggle with issues around adoption, completeness, and freshness of the resulting data. Extracting skills is a highly challenging task, given the many thousands of possible skill labels, which may be mentioned explicitly or merely described implicitly, and the lack of finely annotated training corpora. Previous work on skill extraction either oversimplifies the task to explicit entity detection or builds on manually annotated training data, an approach that would be infeasible for a complete vocabulary of skills. We propose an end-to-end system for skill extraction, based on distant supervision through literal matching. We propose and evaluate several negative sampling strategies, tuned on a small validation dataset, to improve the generalization of skill extraction towards implicitly mentioned skills, despite the lack of such implicit skills in the distantly supervised data. We observe that using the ESCO taxonomy to select negative examples from related skills yields the biggest improvements, and that combining three different strategies in one model further increases performance, by up to 8 percentage points in RP@5. We introduce a manually annotated evaluation benchmark for skill extraction based on the ESCO taxonomy, on which we validate our models. We release the benchmark dataset to stimulate further research on the task.
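The three ingredients named in the abstract (distant supervision through literal matching, negatives drawn from related skills in a taxonomy, and the RP@5 metric) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy taxonomy, the function names, and the choice to treat "related skills" as skills sharing a parent group are assumptions made for illustration only. The RP@k formula follows the commonly used definition (hits in the top k divided by the minimum of k and the number of gold labels), which may differ in detail from the paper's.

```python
import random
from typing import Dict, List, Sequence, Set

# Hypothetical toy taxonomy: skill label -> parent group (NOT real ESCO data).
TOY_TAXONOMY: Dict[str, str] = {
    "python programming": "software development",
    "java programming": "software development",
    "project management": "management",
    "team leadership": "management",
}


def literal_match_labels(sentence: str, skills: Sequence[str]) -> Set[str]:
    """Distant supervision: a skill counts as a positive label
    when its name occurs literally (case-insensitively) in the sentence."""
    text = sentence.lower()
    return {skill for skill in skills if skill in text}


def sibling_negatives(
    positive: str, taxonomy: Dict[str, str], n: int, rng: random.Random
) -> List[str]:
    """Sample up to n negatives from skills that share the positive's parent.

    Assumption: "related skills" is approximated here by taxonomy siblings.
    """
    parent = taxonomy[positive]
    siblings = sorted(
        skill for skill, group in taxonomy.items()
        if group == parent and skill != positive
    )
    return rng.sample(siblings, min(n, len(siblings)))


def rp_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """R-Precision@k for one example: hits in the top k over min(k, |gold|)."""
    if not gold:
        raise ValueError("gold label set must be non-empty")
    hits = sum(1 for skill in ranked[:k] if skill in gold)
    return hits / min(k, len(gold))


if __name__ == "__main__":
    sentence = "Experience with Python programming and project management required."
    positives = literal_match_labels(sentence, list(TOY_TAXONOMY))
    rng = random.Random(0)
    for skill in sorted(positives):
        print(skill, "->", sibling_negatives(skill, TOY_TAXONOMY, 1, rng))
    print(rp_at_k(["a", "b", "c", "d", "e"], {"a", "c", "z"}, k=5))
```

Literal matching only ever produces explicit mentions as positives, which is why the choice of negatives matters: sampling them from closely related skills forces a model to separate near neighbours instead of relying on surface overlap. Normalizing by min(k, |gold|) keeps the score reachable for sentences with fewer than k gold skills.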
