SkillSpan: Hard and Soft Skill Extraction from English Job Postings

04/27/2022
by Mike Zhang, et al.

Skill Extraction (SE) is an important and widely studied task that is useful for gaining insights into labor market dynamics. However, datasets and annotation guidelines are lacking: the few available datasets contain crowd-sourced labels at the span level or labels from a predefined skill inventory. To address this gap, we introduce SKILLSPAN, a novel SE dataset consisting of 14.5K sentences and over 12.5K annotated spans. We release its guidelines, created over three different sources annotated for hard and soft skills by domain experts. We introduce a BERT baseline (Devlin et al., 2019). To improve upon this baseline, we experiment with language models that are optimized for long spans (Joshi et al., 2020; Beltagy et al., 2020), continuous pre-training on the job posting domain (Han and Eisenstein, 2019; Gururangan et al., 2020), and multi-task learning (Caruana, 1997). Our results show that the domain-adapted models significantly outperform their non-adapted counterparts, and that single-task learning outperforms multi-task learning.
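
To make the notion of an "annotated span" concrete, the following is a minimal sketch of how token-level tags can be decoded into labeled spans. It assumes a BIO tagging scheme, which is a common way to frame span extraction as sequence labeling; the abstract does not state the scheme used. The function name `bio_to_spans`, the label names `HARD` and `SOFT`, and the example sentence are all hypothetical and are not taken from the dataset.

```python
from typing import List, Tuple

# (label, start index, end index exclusive, surface text)
Span = Tuple[str, int, int, str]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Decode BIO tags into labeled spans (assumed scheme, hypothetical labels)."""
    spans: List[Span] = []
    start = None  # start index of the currently open span, if any
    label = None  # label of the currently open span, if any
    # A trailing "O" sentinel closes a span that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = (
            tag.startswith("I-") and start is not None and tag[2:] == label
        )
        if continues:
            continue
        if start is not None:
            spans.append((label, start, i, " ".join(tokens[start:i])))
            start, label = None, None
        # Lenient choice: a stray "I-" with no open span starts a new span.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical example sentence and tags, for illustration only.
tokens = ["Experience", "with", "Python", "and", "strong", "communication", "skills"]
tags = ["O", "O", "B-HARD", "O", "B-SOFT", "I-SOFT", "I-SOFT"]
for span in bio_to_spans(tokens, tags):
    print(span)
```

Treating a stray `I-` tag as the start of a new span is one design choice among several; a stricter decoder could discard such tags instead.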


research
05/03/2022

Kompetencer: Fine-grained Skill Classification in Danish Job Postings via Distant Supervision and Transfer Learning

Skill Classification (SC) is the task of classifying job competences fro...
research
05/06/2022

Explaining the Effectiveness of Multi-Task Learning for Efficient Knowledge Extraction from Spine MRI Reports

Pretrained Transformer based models finetuned on domain specific corpora...
research
05/20/2023

ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain

The increasing number of benchmarks for Natural Language Processing (NLP...
research
06/03/2022

Task-Adaptive Pre-Training for Boosting Learning With Noisy Labels: A Study on Text Classification for African Languages

For high-resource languages like English, text classification is a well-...
research
09/13/2022

Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction

Skills play a central role in the job market and many human resources (H...
research
03/02/2021

Hindi-Urdu Adposition and Case Supersenses v1.0

These are the guidelines for the application of SNACS (Semantic Network ...
research
07/20/2018

Learning Representations for Soft Skill Matching

Employers actively look for talents having not only specific hard skills...
