Salience and Market-aware Skill Extraction for Job Targeting

05/27/2020 ∙ by Baoxu Shi, et al. ∙ LinkedIn

At LinkedIn, we want to create economic opportunity for everyone in the global workforce. To make this happen, LinkedIn offers a reactive Job Search system and a proactive Jobs You May Be Interested In (JYMBII) system to match the best candidates with their dream jobs. One of the most challenging tasks in developing these systems is to properly extract important skill entities from job postings and then target members with matched attributes. In this work, we show that the commonly used text-based, salience- and market-agnostic skill extraction approach is sub-optimal because it only considers skill mentions and ignores the salience level of a skill and its market dynamics, i.e., the influence of market supply and demand on the importance of skills. To address these drawbacks, we present Job2Skills, our deployed salience and market-aware skill extraction system. The proposed Job2Skills shows promising results in improving the online performance of job recommendation (JYMBII) (+1.92% job apply) and skill suggestions for job posters (-37% suggestion rejection rate). Lastly, we present case studies showing interesting insights that contrast the traditional skill recognition method with the proposed Job2Skills at the occupation, industry, country, and individual skill levels. Based on these promising results, we deployed Job2Skills online to extract job targeting skills for all 20M job postings served at LinkedIn.


1. Introduction

LinkedIn is the world’s largest professional network, whose vision is to “create economic opportunity for every member of the global workforce”. To achieve this vision, it is crucial for LinkedIn to match job postings to quality applicants who are both qualified and willing to apply for the job. To serve this goal, LinkedIn offers two job targeting systems, namely Jobs You May Be Interested In (JYMBII) (Gu et al., 2016), which proactively targets jobs to quality applicants, and Job Search (Li et al., 2016), which reactively recommends jobs for which the job seeker qualifies.

Matching jobs to quality applicants is a challenging task. Due to high cardinality (for LinkedIn, M members and more than M open jobs), it is computationally intractable to define a job’s target member set by specifying individual members. For this reason, many models match candidates by profile attributes (Volkovs et al., 2017; Paparrizos et al., 2011; Abel, 2015). At LinkedIn, we mostly use titles and skills to target candidates (members). In other words, when a member comes to the job recommendation page, we recommend job postings whose targeting skills (or titles) match the member’s skills (or titles).


Figure 1. Snippet of a machine learning engineer job posted on LinkedIn. Boxed text marks detected skill mentions.

In this paper, we study how to target job postings by identifying relevant skills. Given a job posting, we extract skills from the job posting so that it can be shown to the members who have such skills. This task of mapping the job to a set of skills is very important because it determines the quality of applications for the job posting and affects hiring efficiency.

To improve the quality of applicants, what should the objectives of extracting skills for job targeting be? We argue that the extracted skills need to meet two criteria. First, the extracted skills should not only be mentioned in the job posting, but also be relevant to the core job function. In other words, the skills should be salient to the job posting. Second, the extracted skills should be able to reach a sufficient number of members. In other words, there should be enough supply of the skills in the job market. In summary, we aim to build a machine learning model that extracts skills from job postings in a salience and market-aware way.

However, developing such a salience and market-aware skill extraction model is very challenging, not only because modeling salience and market dynamics is hard, but also because of the lack of ground-truth data. Given a job posting, it is tricky to get gold-standard labels for the skills that are both salient and in strong supply. One may propose using crowdsourcing to annotate job postings, but there are two problems. First, crowdsourcing may not be the most cost-efficient way to collect a large number of training examples. Second, and more importantly, labeling salient and high-supply skills requires solid domain knowledge about both the job market and the job posting itself. As we will show later, the labels collected by crowdsourcing do not give us the most salient skills with the best job market supply.

Figure 1 illustrates the above challenge. Given a job posting, it is relatively straightforward for humans to label the skills mentioned in the posting (rectangles), and an annotator on a crowdsourcing platform would be able to do it. However, to label which skills are more salient and have better job market supply, the annotator needs to understand both the job market and the job description well. For example, for the sake of market supply, the annotator should choose “deep learning” instead of specific tools such as “Pytorch” or “Keras”, but this is impossible if the annotator does not understand deep learning related skills and the deep learning job market. On the other hand, skills like “Communication Skill” have large supply, but they are less salient in the context of deep learning engineer jobs.

Previous works usually treat this skill extraction task as a named entity recognition (NER) (Nadeau and Sekine, 2007) task and use standard named entity recognizers to identify skill mentions (Vasudevan et al., 2018; Li et al., 2016). In the example of Fig. 1, existing methods would identify all rectangles and use them for job targeting with equal importance. This would lead to showing this deep learning job to members who have “Communication Skills”. Since these methods consider neither job market supply nor entity salience, they lead to sub-optimal job targeting performance, as we will show later.

Present work: Data collection. In this paper, we tackle the problem of building a salience and market-aware skill extraction model for job targeting. To collect ground-truth labels, we note that the ground-truth skills need to meet dual criteria: salience and market-awareness. Since it is hard to come up with a single data collection method that meets both criteria, our strategy is to develop one method per criterion and combine the data collected by the two methods. The first option focuses on salience. We ask hiring experts to pick which skills they would want to use for targeting their job postings. Since this method comes from the salience perspective, we also conducted an analysis to validate that the collected labels match our notion of market-aware skill extraction. This option allows us to collect explicit feedback from hiring experts, but it is only available for the small portion of job postings created through LinkedIn’s job creation flow.

To cover market-awareness and increase the training data size, we apply “distant supervision” to get weak labels from other job postings. In particular, for a given job posting, we track which members applied for the job and received positive feedback. We then identify the common skills that these successful applicants have, use them as the ground-truth skills, and perform an analysis to validate that these skills are also salient.

Present work: Modeling job market and entity salience. After collecting the ground-truth, we develop Job2Skills, a novel machine learning model to extract skills in a market-aware, salience-aware way. We note that retraining existing NER models leads to sub-optimal results as they rely purely on the text information, ignoring important signals such as how much supply the skill has in the job market, how salient the skill is overall, and so on. Therefore, we add signals representing the job market supplies and the salience of skills in the model and significantly improve the performance in the offline evaluation and online A/B test.

Present work: Product improvements. We deploy Job2Skills in production and improve product metrics across multiple applications. First, we ask hiring experts in the job creation systems for feedback and outperform the existing NER-based model by more than . Second, we employ this new skill extraction model in the job recommendation systems, one of the most important recommender systems at LinkedIn. We observe improvements in member-job interaction and coverage in the proactive job recommender (JYMBII) system.

Present work: Insights for the Job Market. We note that Job2Skills’s outputs capture hiring trends in the job market for millions of companies that post jobs on LinkedIn. Since Job2Skills is trained on feedback from hiring experts and job markets, it learns what kinds of talent each employer is trying to recruit. We argue that Job2Skills reveals employers’ intentions better than traditional information extraction methods that do not consider salience and market factors. We present extensive studies to showcase the insights that can be gained with Job2Skills. For example, we show that Job2Skills’s results could forecast that Macy’s would expand a tech office in a new location two months before the official news article came out. We also show that Job2Skills can vividly capture the rise and fall in popularity of a certain skill. Lastly, we demonstrate that Job2Skills can be used to compare the required skill sets across different regional markets.

The contributions of this paper are summarized as follows:

  • We propose a skill extraction framework that targets job postings by skill salience and market-awareness, unlike traditional entity recognition-based methods.

  • We devise a data collection strategy that combines supervision from experts and distant supervision based on massive job market interaction history.

  • We develop Job2Skills by combining NER methods with additional market and deep learning powered salience signals.

  • We deploy Job2Skills to LinkedIn to improve overall hiring efficiency across multiple products.

  • We provide a case study to show how the proposed market-aware skill extraction model yields better skill-related insights about the workforce and beyond.

2. Problem Definition

Here we will provide the formal definition of the job targeting task and show how we utilize the job targeting process to formulate our salience and market-aware skill extraction task.

Definition 2.1.

Job targeting is an optimization task where, given a job posting $j$ and member set $\mathcal{M}$, we find a $k$-sized target candidate set $\mathcal{M}_j \subseteq \mathcal{M}$ that maximizes the probability that a member $m \in \mathcal{M}_j$ belongs to job posting $j$’s quality applicant set $\mathcal{Q}_j$, a set of members who are likely to apply for the job and get positive feedback from the job posters.

We can write the objective of Def. 2.1 as

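$$\mathcal{M}_j^* = \arg\max_{\mathcal{M}_j \subseteq \mathcal{M},\ |\mathcal{M}_j| = k} \sum_{m \in \mathcal{M}_j} P(m \in \mathcal{Q}_j) \tag{1}$$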

This optimization is intractable because it involves combinatorial optimization for million LinkedIn members. For this reason, existing models tend to build job targeting models with attributes instead of directly targeting individuals. Moreover, modeling via attributes also makes the model human-interpretable. LinkedIn, for example, uses attribute entities such as job title, company, screening question (Shi et al., 2020), and skill to target members and provide interpretable insights (Li et al., 2020). Among these entities, we discovered that skill is the most critical entity type for the job targeting task.

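Therefore, we formulate the problem of skill-based job targeting. Given a job posting $j$, we choose $k$-sized job targeting skills $\mathcal{S}_j$, and we show the job posting to member $m$ if $m$ has at least one matching skill in their skill set $\mathcal{S}_m$. Since members declare their skill sets on their profiles, we only need to identify $\mathcal{S}_j$. Our objective becomes:

$$\mathcal{S}_j^* = \arg\max_{\mathcal{S}_j \subseteq \mathcal{S},\ |\mathcal{S}_j| = k} \sum_{m \in \mathcal{M}} \mathbb{1}\big(\mathcal{S}_j \cap \mathcal{S}_m \neq \emptyset\big)\, P(m \in \mathcal{Q}_j) \tag{2}$$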

where $\mathbb{1}(\cdot)$ is an indicator function and $\mathcal{S}$ is the full skill entity set used by LinkedIn. Although Eq. 2 reduces the dimensionality of the search space from million members to thousand skills, it is still a combinatorial optimization problem and thus very hard to optimize directly. To make it more tractable, we make the following assumptions. We assume that each skill $s$ has a utility $\delta_{m,j,s}$, the chance of member $m$ being qualified for $j$ when $m$ knows skill $s$, and that the probability of $m$ being qualified is the sum of the utilities of the skills $s \in \mathcal{S}_m$. In other words, we assume that each skill $s$ increases the chance of $m$ being qualified by $\delta_{m,j,s}$:

$$P(m \in \mathcal{Q}_j) = \sum_{s \in \mathcal{S}_m} \delta_{m,j,s} \tag{3}$$

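With this notation, and crediting each targeted skill $s \in \mathcal{S}_j$ with the gain $\delta_{m,j,s}$ of every member $m$ who has it, Eq. 2 becomes

$$\mathcal{S}_j^* = \arg\max_{\mathcal{S}_j \subseteq \mathcal{S},\ |\mathcal{S}_j| = k} \sum_{s \in \mathcal{S}_j} \sum_{m \in \mathcal{M}_s} \delta_{m,j,s} \tag{4}$$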

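where $\mathcal{M}_s = \{m \in \mathcal{M} \mid s \in \mathcal{S}_m\}$ is the set of members who have $s$ in their skill set. We can further simplify the notation by introducing $u_{j,s}$:

$$\mathcal{S}_j^* = \arg\max_{\mathcal{S}_j \subseteq \mathcal{S},\ |\mathcal{S}_j| = k} \sum_{s \in \mathcal{S}_j} u_{j,s}, \qquad u_{j,s} = \sum_{m \in \mathcal{M}_s} \delta_{m,j,s} \tag{5}$$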

where $u_{j,s}$ is the sum of $\delta_{m,j,s}$ over all members who have $s$. Given that $\delta_{m,j,s}$ is the increase in the probability of $m$ being qualified for $j$ from knowing $s$, $u_{j,s}$ quantifies the overall increase in qualified applicants from targeting members who have $s$. We call $u_{j,s}$ skill $s$’s utility for job posting $j$.

The formulation in Eq. 5 makes optimization much more tractable and simpler than the original form in Eq. 1. In Eq. 5, we optimize the sum of the utilities $u_{j,s}$ of the chosen skills $s \in \mathcal{S}_j$. Therefore, choosing the optimal $\mathcal{S}_j$ reduces to picking the $k$ skills with the highest $u_{j,s}$ for a given job posting $j$. We call this problem salience and market-aware skill extraction because we find that a skill needs to satisfy the following two criteria to have a high utility $u_{j,s}$.

1. Skills should have sufficient market supply. For $u_{j,s}$ to be high, $|\mathcal{M}_s|$ needs to be large enough. In other words, there must be a sufficient number of members who have skill $s$.

2. Skills should be salient to the job posting. Another factor that determines $u_{j,s}$ is the value of $\delta_{m,j,s}$. Recall that $\delta_{m,j,s}$ quantifies the chance that member $m$ is qualified for job posting $j$ if $m$ has skill $s$. If $s$ is a core, “salient” skill for job posting $j$, it will have a high $\delta_{m,j,s}$.
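Because the Eq. 5 objective is a plain sum of per-skill utilities, the search reduces to a sort. The Python sketch below is illustrative only: it assumes a hypothetical `delta` dictionary holding $\delta_{m,j,s}$ for one job posting $j$ and toy member skill sets, aggregates $u_{j,s}$ over $\mathcal{M}_s$, keeps the $k$ best skills, and cross-checks the result against an exhaustive search over all $k$-sized subsets.

```python
from itertools import combinations


def skill_utilities(delta, member_skills):
    """Compute u_{j,s} = sum over m in M_s of delta_{m,j,s} (Eq. 5) for one job j.

    delta: dict mapping (member, skill) -> assumed probability gain delta_{m,j,s}.
    member_skills: dict mapping member -> set of declared skills S_m.
    """
    utilities = {}
    for member, skills in member_skills.items():
        for skill in skills:  # the member belongs to M_s for every skill s it declares
            utilities[skill] = utilities.get(skill, 0.0) + delta.get((member, skill), 0.0)
    return utilities


def top_k_skills(utilities, k):
    """The Eq. 5 objective is additive, so the optimal S_j is the k highest-utility skills."""
    # Sort by descending utility; break ties alphabetically for determinism.
    return sorted(utilities, key=lambda s: (-utilities[s], s))[:k]


def exhaustive_best(utilities, k):
    """Brute-force check: score every k-subset with the additive Eq. 5 objective."""
    return max(combinations(sorted(utilities), k),
               key=lambda combo: sum(utilities[s] for s in combo))


members = {"a": {"python", "ml"}, "b": {"ml", "communication"}, "c": {"communication"}}
delta = {("a", "python"): 0.30, ("a", "ml"): 0.40, ("b", "ml"): 0.35,
         ("b", "communication"): 0.02, ("c", "communication"): 0.02}
u = skill_utilities(delta, members)
print(top_k_skills(u, 2))
print(sorted(exhaustive_best(u, 2)))
```

In this toy input, `communication` is held by two members but barely raises their chance of qualifying, so its utility stays low, while `python` is held by only one member yet is salient enough to make the cut. The exhaustive check grows combinatorially with the number of skills and is only there to confirm that top-$k$ selection is optimal for an additive objective.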

Combining the above two criteria together, we define the salience and market-aware skill extraction task as:

Definition 2.2.

Salience and market-aware skill extraction is an optimization task where, given a job posting $j$ and skill set $\mathcal{S}$, we estimate for each skill $s \in \mathcal{S}$ the utility $u_{j,s}$, that is, the increase in qualified applicants from targeting members with $s$.

As a comparison, we discuss the salience- and market-agnostic approach that chooses $\mathcal{S}_j$ by identifying the skills mentioned in the job posting content. We can think of it as solving Eq. 5 with a modified utility that depends purely on whether the skill is mentioned in the job posting. We formally define it as follows:

Definition 2.3.

Salience- and market-agnostic skill extraction is an optimization task where, given a job posting $j$ and skill set $\mathcal{S}$, we extract a $k$-sized skill set $\mathcal{S}_j$ for job posting $j$ that maximizes the utility $u'_{j,s} = P(s \text{ is mentioned in } \mathcal{T}_j)$, i.e., the likelihood that skill $s$ is mentioned in $j$’s content $\mathcal{T}_j$.

As shown in Def. 2.3, this method simplistically scores a skill for a job posting by the probability that it is mentioned in the posting, as estimated by some named skill-entity recognizer. Because Def. 2.3 considers neither skill salience nor market supply, the extracted skills are not optimized for targeting quality applicants. Therefore, Def. 2.3 does not solve the job targeting task defined in Def. 2.1.

Next, we will discuss the methods we use to gather the ground truth for training our salience and market-aware skill extraction system, followed by how we learn the utility function described in Def. 2.2 for the proposed Job2Skills model.

Figure 2. Non-expert and expert labeled skills’ popularity.

3. Data Collection

As stated before, one of the major challenges of developing a salience and market-aware job targeting system is the lack of ground-truth data. Here we describe two data collection approaches that together address both salience and market-awareness.

Collect from Job Posting Experts: Although crowdsourcing is more scalable than labeling by a group of experts, its quality is sub-optimal because non-experts cannot distinguish levels of skill salience and often ignore market supply. Instead of designing a sophisticated skill labeling task for non-experts to collect high-quality data, we directly collect data from experts, who implicitly account for salience and the job market when selecting job targeting skills.

To do so, we ask job posters to provide job targeting skills. Given a job posting $j$, we collect the job targeting skills saved by the job poster to form the positive skill set $\mathcal{S}_j^+$, and use the recommended job targeting skills that the poster rejected to create the negative set $\mathcal{S}_j^-$. Lastly, we construct the positive and negative job-skill training pairs $\mathcal{D}^+ = \{(j, s) \mid s \in \mathcal{S}_j^+\}$ and $\mathcal{D}^- = \{(j, s) \mid s \in \mathcal{S}_j^-\}$, respectively. We will refer to this dataset as the Job Targeting skill (JT) dataset.
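The JT pair construction can be sketched as follows, assuming poster feedback arrives as hypothetical `(job_id, saved_skills, rejected_suggestions)` tuples; this record layout is an illustration, not LinkedIn's actual schema.

```python
def build_jt_pairs(postings):
    """Build JT training pairs from job poster feedback.

    postings: iterable of (job_id, saved_skills, rejected_suggestions) tuples, where
    the saved skills form S_j^+ and the rejected recommended skills form S_j^-.
    Returns (positive_pairs, negative_pairs) as lists of (job_id, skill).
    """
    positives, negatives = [], []
    for job_id, saved, rejected in postings:
        positives.extend((job_id, s) for s in sorted(saved))
        # Assumption: a skill that is both saved and rejected is kept as positive only.
        negatives.extend((job_id, s) for s in sorted(set(rejected) - set(saved)))
    return positives, negatives


pos, neg = build_jt_pairs([("job-1", {"deep learning", "python"}, {"communication", "keras"})])
print(pos)
print(neg)
```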

Fig. 2 shows the distribution of crowd-labeled (non-expert) and expert-labeled skills in terms of the number of LinkedIn members having the skill. We observe that expert-labeled skills align better with market supply in terms of member-side skill popularity.

Collect from Market Signals: Although hiring experts provide high-quality data that addresses both entity salience and market-awareness, this approach is only available for the small portion of job postings created through LinkedIn’s job creation flow. To cover more jobs and generate a large amount of labeled data for model training, we derive weak labels from job market signals instead of human annotations.

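To be specific, given job posting $j$, we first collect $\hat{\mathcal{Q}}_j$ by assuming that quality applicants are the members who apply for job $j$ and receive positive interactions from the recruiters. Note that $\hat{\mathcal{Q}}_j$ is an approximation of the true quality applicant set $\mathcal{Q}_j$ because we only consider one stage of the recruiting process on LinkedIn. With $\hat{\mathcal{Q}}_j$, we define $\mathcal{S}_j^+ = \mathcal{S}_{\hat{\mathcal{Q}}_j} \cap \mathcal{S}_{\mathcal{T}_j}$, where $\mathcal{S}_{\hat{\mathcal{Q}}_j}$ is the set of common skills shared by $\hat{\mathcal{Q}}_j$ and $\mathcal{S}_{\mathcal{T}_j}$ is the set of skills mentioned in $j$’s content $\mathcal{T}_j$. Similarly, $\mathcal{S}_j^- = \mathcal{S}_{\mathcal{T}_j} \setminus \mathcal{S}_{\hat{\mathcal{Q}}_j}$, i.e., the negative skills are the ones mentioned in the job posting but not shared by the quality applicants $\hat{\mathcal{Q}}_j$. We will refer to this dataset as the Quality Applicant skill (QA) dataset.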
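A minimal sketch of the QA labeling rule follows. The text does not say how "common" skills are determined, so the `min_share` threshold (the fraction of quality applicants who must list a skill) is a hypothetical parameter introduced here for illustration.

```python
from collections import Counter


def build_qa_skill_sets(mentioned, applicant_skills, min_share=0.5):
    """Derive S_j^+ and S_j^- for one job from its approximate quality applicants.

    mentioned: skills the tagger found in the job text (S_{T_j}).
    applicant_skills: one skill set per member of the approximate quality applicant set.
    min_share: hypothetical threshold; a skill counts as 'common' if at least this
               fraction of quality applicants list it.
    """
    counts = Counter(s for skills in applicant_skills for s in skills)
    n = len(applicant_skills)
    common = {s for s, c in counts.items() if c / n >= min_share}
    positives = mentioned & common  # mentioned in the job and shared by quality applicants
    negatives = mentioned - common  # mentioned in the job but not shared
    return positives, negatives


qa_pos, qa_neg = build_qa_skill_sets(
    {"python", "sql", "communication"},
    [{"python", "sql"}, {"python", "sql", "java"}, {"communication", "excel"}],
)
print(sorted(qa_pos), sorted(qa_neg))
```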

To examine the salience of these market-signal-derived skills, we sample jobs with labeled skills in both the quality applicant and job targeting datasets, and then compare the top skills shared by quality applicants to the skills labeled by the job posters. We find that the top-ranked quality applicant skills largely overlap with the job poster selections. To be specific, for the top- quality applicant shared skills, more than of them are labeled as positive skills and of them are labeled as negative skills by job posters.

4. The proposed Job2Skills model

Having described the procedure we use to collect the ground truth for salience and market-aware job targeting skill extraction, we now discuss how we build the proposed Job2Skills model using multi-resolution skill salience features and market-aware signals. Compared to simple skill tagging, which merely identifies mentioned skills, multi-resolution skill salience identifies the important skills among all mentioned skills.

4.1. Multi-resolution Skill Salience Feature

In this work, we hypothesize that good job targeting skills should be salient to the job posting. Unlike many other types of text, job postings are usually long documents with several well-structured segments, e.g., requirements, company summary, and benefits. To accurately estimate the level of skill salience and fully utilize the rich job posting information, one should consider not only the text mentioning the skill, but also the other segments of the job posting and the posting as a whole.

Next, we will first briefly describe how we tag skills from job postings, and then provide details on how we explicitly model the skill salience at three resolution levels: sentence, segment, and job level. Fig. 3 gives an overview of the multi-resolution skill salience features we used in this work.


Figure 3. Multi-resolution skill salience estimation.

To identify skills from job postings, we first utilize an existing, in-house skill tagger to find out all possible skill mentions. By leveraging a comprehensive skill taxonomy with an extensive set of skill surface forms, the skill tagger can identify the majority of the mentioned skills. We then pass the skill mentions into a feature-based regression model to link them to the corresponding skill entities. After we find all the skills in the job posting, we then model the skill salience from the following three levels:

Sentence-level Salience: To estimate skill salience at the sentence level, we build a neural network model that learns skill salience from the job posting sentence containing the skill mention and the skill’s surface form. The model is defined as

$$p_{\text{sent}}(s, j) = f\big(E(t_s, x_s)\big) \tag{6}$$

where $p_{\text{sent}}(s, j) \in [0, 1]$ is the sentence-level salience probability, $f$ is a neural network classifier, $s$ is some skill, $t_s$ is the skill’s surface form text, $x_s$ is the sentence containing the skill mention, and $E$ is some text-to-embedding encoder. We tested multiple encoders, including FastText (Bojanowski et al., 2017), Universal Sentence Encoder (USE) (Cer et al., 2018), and BERT (Devlin et al., 2018). For FastText and USE, we encode the skill and the sentence separately and use the concatenated embedding as the encoder output. For BERT, we feed both the skill and the sentence into the model and use the embedding of the [CLS] token as the encoder output.
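To illustrate the FastText/USE variant of Eq. 6 (encode skill and sentence separately, then concatenate), here is a toy sketch. `toy_encode` is a hypothetical hashed bag-of-words stand-in for a real encoder, and a single logistic layer with hand-set `weights` stands in for the trained neural network $f$; neither reflects the production model.

```python
import math


def toy_encode(text, dim=8):
    """Hypothetical stand-in for FastText/USE: an L2-normalized hashed bag of words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(map(ord, token)) % dim] += 1.0  # deterministic bucket per token
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sentence_salience(surface_form, sentence, weights, bias=0.0):
    """Encode skill and sentence separately, concatenate, apply a logistic layer."""
    features = toy_encode(surface_form) + toy_encode(sentence)  # concatenation, length 2 * dim
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))  # probability in (0, 1)


p = sentence_salience("deep learning", "Experience with deep learning frameworks", [0.1] * 16)
print(round(p, 3))
```

For BERT, the concatenation step would be replaced by feeding the skill and sentence as one input pair and reading the [CLS] embedding, as described above.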

Segment-level Salience: Unlike sentence-level skill salience, where the input text is limited to a single sentence, job segments (one or more consecutive paragraphs describing the same topic, e.g., the company summary segment or the requirements segment of a job description) are usually longer and much noisier. Therefore, it is not easy to model them directly using neural network models designed for shorter text. Instead of modeling the job segment text directly, we represent a job segment by the embeddings of the skill entities mentioned in it.

To get the entity (title and skill) embeddings (Ramanath et al., 2018; Shi et al., 2019), we learn LinkedIn’s unsupervised entity embeddings from LinkedIn member profiles. The skip-gram loss is defined as:

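$$\mathcal{L} = -\sum_{m \in \mathcal{M}} \sum_{e \in \mathcal{E}_m} \Big[ \sum_{i=1}^{N_p} \log \sigma\big(\mathbf{v}_e^\top \mathbf{v}_{e_i^+}\big) + \sum_{i=1}^{N_n} \log \sigma\big(-\mathbf{v}_e^\top \mathbf{v}_{e_i^-}\big) \Big], \quad e_i^+ \sim \mathrm{Unif}(\mathcal{E}_m \setminus \{e\}),\ e_i^- \sim \mathrm{Unif}(\mathcal{E} \setminus \mathcal{E}_m) \tag{7}$$

where $N_p$ and $N_n$ are the numbers of positive and negative samples, $\sigma$ is the sigmoid function, $\mathbf{v}_e$ is the embedding of entity $e$, $\mathcal{E}$ is the set of all entities, and Unif is a uniform sampling function. Simply put, for each LinkedIn member $m$’s profile entity set $\mathcal{E}_m$, we optimize the entity embeddings so that skills and titles appearing in the same member’s profile entity set $\mathcal{E}_m$ are more similar to each other than to a random entity that is not in $\mathcal{E}_m$.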
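The skip-gram objective of Eq. 7 can be sketched for a single member as below. Embeddings are assumed to be plain Python lists in a dict, and `n_pos`, `n_neg`, and `seed` are hypothetical parameters; a real trainer would also compute gradients, which this loss-only sketch omits.

```python
import math
import random


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def member_skipgram_loss(profile, vocab, emb, n_pos=2, n_neg=2, seed=0):
    """Negative-sampling skip-gram loss for one member's profile entity set E_m.

    profile: entities (skills/titles) on the member's profile.
    vocab: all entities E; emb: entity -> embedding vector (list of floats).
    """
    rng = random.Random(seed)
    entities = sorted(profile)  # fixed order so sampling is reproducible
    outside = sorted(e for e in vocab if e not in profile)
    loss = 0.0
    for e in entities:
        context = [o for o in entities if o != e]
        for _ in range(n_pos if context else 0):
            pos = rng.choice(context)  # positive sample: Unif(E_m \ {e})
            loss -= log_sigmoid(dot(emb[e], emb[pos]))
        for _ in range(n_neg if outside else 0):
            neg = rng.choice(outside)  # negative sample: Unif(E \ E_m)
            loss -= log_sigmoid(-dot(emb[e], emb[neg]))
    return loss


vocab = ["python", "ml", "java", "sales"]
profile = {"python", "ml"}
aligned = {"python": [1.0, 0.0], "ml": [1.0, 0.0], "java": [-1.0, 0.0], "sales": [-1.0, 0.0]}
misaligned = {"python": [1.0, 0.0], "ml": [-1.0, 0.0], "java": [1.0, 0.0], "sales": [1.0, 0.0]}
good = member_skipgram_loss(profile, vocab, aligned)
bad = member_skipgram_loss(profile, vocab, misaligned)
print(good, bad)
```

Embeddings in which co-occurring profile entities point the same way score a lower loss than embeddings that separate them, which is the pressure Eq. 7 applies during training.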

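After we obtain the entity embedding matrix $\mathbf{V}$, whose rows are the entity embeddings $\mathbf{v}_e$, we define the segment-level skill salience as

$$p_{\text{seg}}(s, j, l) = \sigma\Big(\mathbf{v}_s^\top \frac{1}{|\mathcal{S}_{j,l} \setminus \{s\}|} \sum_{s' \in \mathcal{S}_{j,l} \setminus \{s\}} \mathbf{v}_{s'}\Big) \tag{8}$$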

where $\mathcal{S}_{j,l}$ is the set of skills mentioned in job $j$’s content with segment label $l$. This measures the similarity between the given skill $s$ and the centroid of the other skills mentioned in segment $l$.
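A sketch of the segment-level score follows. It assumes the similarity is the sigmoid of a dot product, matching the skip-gram objective used to train the embeddings, and it returns a neutral 0.5 when a segment mentions no other skill; both choices are assumptions made for this illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def segment_salience(skill, segment_skills, emb):
    """sigmoid(v_s . centroid of the other skills tagged in the same segment)."""
    others = [o for o in segment_skills if o != skill]
    if not others:
        return 0.5  # assumption: neutral score when the segment has no other skill
    dim = len(emb[skill])
    centroid = [sum(emb[o][i] for o in others) / len(others) for i in range(dim)]
    return sigmoid(sum(a * b for a, b in zip(emb[skill], centroid)))


emb = {"pytorch": [1.0, 0.0], "keras": [1.0, 0.0], "tensorflow": [1.0, 0.0],
       "communication": [-1.0, 0.0]}
on_topic = segment_salience("pytorch", ["pytorch", "keras", "tensorflow"], emb)
off_topic = segment_salience("communication", ["communication", "keras", "tensorflow"], emb)
print(on_topic, off_topic)
```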

Job-level Salience: Similar to segment-level salience, job-level skill salience is modeled by the average embedding similarity between the given skill and all entities mentioned in the job posting, including skill and title entities. The salience score is defined as:

$$p_{\text{job}}(s, j) = \frac{1}{|\mathcal{E}_j|} \sum_{e \in \mathcal{E}_j} \sigma\big(\mathbf{v}_s^\top \mathbf{v}_e\big) \tag{9}$$

where $\mathcal{E}_j$ is the set of title and skill entities mentioned in job $j$. Note that here we average entity-wise similarities instead of computing the similarity to a single mean-pooled centroid because 1) $|\mathcal{E}_j|$ is often significantly larger than $|\mathcal{S}_{j,l}|$, so $\mathcal{E}_j$ may contain more noisy data, and 2) $\mathcal{E}_j$ contains different types of entities from multiple aspects of the job posting. Based on these observations, we believe it is sub-optimal to compute job-level salience from a single mean-pooled centroid of all mentioned entities.
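The design choice above (averaging entity-wise similarities rather than comparing against one mean-pooled centroid) can be seen in a small sketch. As before, similarity is assumed to be the sigmoid of a dot product, and skipping the skill itself when averaging is also an assumption; `centroid_salience` is included only for comparison.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def job_salience(skill, job_entities, emb):
    """Average of entity-wise sigmoid similarities (no mean-pooled centroid)."""
    others = [e for e in job_entities if e != skill]  # assumption: skip the skill itself
    if not others:
        return 0.5
    return sum(sigmoid(dot(emb[skill], emb[e])) for e in others) / len(others)


def centroid_salience(skill, job_entities, emb):
    """Mean-pooled alternative that the text argues against, for comparison only."""
    others = [e for e in job_entities if e != skill]
    if not others:
        return 0.5
    centroid = [sum(col) / len(others) for col in zip(*(emb[e] for e in others))]
    return sigmoid(dot(emb[skill], centroid))


emb = {"pytorch": [1.0, 0.0], "keras": [1.0, 0.0], "ml engineer": [1.0, 0.0],
       "communication": [-1.0, 0.0]}
entities = ["pytorch", "keras", "ml engineer", "communication"]
per_entity = job_salience("pytorch", entities, emb)
pooled = centroid_salience("pytorch", entities, emb)
print(per_entity, pooled)
```

With a heterogeneous entity set, the two scores differ: the entity-wise average keeps each entity's contribution separate, whereas the centroid first blends all entity types into one vector.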

In sum, for each skill candidate of a job posting, we use the above methods to compute the sentence-, segment-, and job-level salience probability scores as salience features, combine them with the market-aware signals described in the next section, and then build the final salience and market-aware skill extraction model.

4.2. Market-aware Signals

Besides entity salience, we also hypothesize that good job targeting skills should have sufficient market supply. To model skill supply, it is necessary to factor market-related signals into the proposed Job2Skills model. In general, market-related signals can be derived from LinkedIn’s member base $\mathcal{M}$ and job postings $\mathcal{J}$. Next, we describe the signals from these two groups.

Member Features: The goal of the member feature group is to capture how widely a skill can reach the audience by measuring member-side skill supply, and therefore to improve job exposure. Here we consider both the general skill supply $P(s)$, which measures the overall popularity of skill $s$ among all members $\mathcal{M}$, and the cohort affinities, which indicate skill supply at a finer granularity. To be specific, we partition the member set $\mathcal{M}$ into non-empty member subsets (cohorts) $\mathcal{C} = \{c_1, c_2, \dots\}$ using different strategies, and then compute the pointwise mutual information (PMI) and the entropy (H) of skill $s$ given a cohort $c$ as follows:

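$$\mathrm{PMI}(s; c) = \log \frac{P(s \mid c)}{P(s)}, \qquad H(s \mid c) = -P(s \mid c) \log P(s \mid c) - \big(1 - P(s \mid c)\big) \log\big(1 - P(s \mid c)\big) \tag{10}$$

in which $P(s \mid c)$ is the fraction of members in cohort $c$ who have skill $s$. The partition $\mathcal{C}$ can be created using one or multiple member attributes. By combining multiple attributes, the model can detect subtle differences in skill supply. For example, by grouping members by both industry and job title, we can discover that although the skill KDB+, a financial database, is not popular among software developers in general or in the financial industry in general, it is a preferred skill in the cohort of software developers in the financial industry.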
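The cohort statistics can be computed as below. We read $H(s \mid c)$ as the binary entropy of skill possession within cohort $c$, which is our interpretation, and the toy data is invented to mimic the KDB+ example: a skill that is moderately common among software developers becomes far more concentrated in the software developer & finance cohort.

```python
import math
from collections import defaultdict


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def cohort_features(member_skills, member_cohort, skill):
    """Return the general supply P(s) and, per cohort c, (PMI(s; c), H(s | c)).

    member_skills: member -> set of declared skills.
    member_cohort: member -> cohort key, e.g. a (title, industry) tuple.
    """
    size, hits = defaultdict(int), defaultdict(int)
    for member, skills in member_skills.items():
        cohort = member_cohort[member]
        size[cohort] += 1
        hits[cohort] += int(skill in skills)
    p_s = sum(hits.values()) / len(member_skills)  # general skill supply P(s)
    features = {}
    for cohort, n in size.items():
        p_s_c = hits[cohort] / n  # P(s | c)
        pmi = math.log(p_s_c / p_s) if p_s_c > 0 else float("-inf")
        features[cohort] = (pmi, binary_entropy(p_s_c))
    return p_s, features


member_skills = {"m1": {"kdb+", "python"}, "m2": {"kdb+"}, "m3": {"python"},
                 "m4": {"java"}, "m5": {"excel"}, "m6": {"excel"}}
by_title_industry = {"m1": ("software developer", "finance"), "m2": ("software developer", "finance"),
                     "m3": ("software developer", "tech"), "m4": ("software developer", "tech"),
                     "m5": ("analyst", "finance"), "m6": ("analyst", "finance")}
by_title = {m: c[0] for m, c in by_title_industry.items()}
_, fine = cohort_features(member_skills, by_title_industry, "kdb+")
_, coarse = cohort_features(member_skills, by_title, "kdb+")
print(fine[("software developer", "finance")], coarse["software developer"])
```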

Job Features: Because demand implicitly influences supply, we also measure skill demand in terms of the job-side skill popularity $P_{\mathcal{J}}(s)$. Similar to the member features, we use the pointwise mutual information $\mathrm{PMI}(s; l)$ to model job-side skill popularity, where $l$ is the job posting segment label, e.g., summary and requirement.
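Job-side PMI by segment label can be sketched the same way. Here probabilities are estimated over tagged skill mentions rather than over postings, an assumption made for brevity.

```python
import math
from collections import Counter


def segment_pmi(mentions, skill):
    """Job-side PMI(s; l) = log(P(s | l) / P_J(s)) for each segment label l.

    mentions: (segment_label, skill) pairs tagged across job postings.
    """
    label_counts = Counter(label for label, _ in mentions)
    joint_counts = Counter(label for label, s in mentions if s == skill)
    p_s = sum(joint_counts.values()) / len(mentions)  # job-side popularity P_J(s)
    return {label: math.log((joint_counts[label] / n) / p_s)
            for label, n in label_counts.items() if joint_counts[label]}


mentions = [("requirements", "python"), ("requirements", "python"), ("requirements", "sql"),
            ("summary", "communication"), ("summary", "python"), ("summary", "teamwork")]
pmi_by_segment = segment_pmi(mentions, "python")
print(pmi_by_segment)
```

A positive value for `requirements` and a negative one for `summary` indicate that the skill is demanded mainly in the requirements segment.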

4.3. Job2Skills

After describing both the salience and market-aware features, we now discuss how we train the proposed Job2Skills model using them. Recall that we need to learn the utility function $u_{j,s}$ defined in Def. 2.2 to infer salience and market-aware skills for job targeting. With the positive and negative job-skill pairs $\mathcal{D}^+$ and $\mathcal{D}^-$ collected in Sec. 3 and the features described in Secs. 4.1 and 4.2, we can learn the utility function by viewing this task as a binary classification problem: for a given skill $s$ and job posting $j$, predict whether $s$ is a salience and market-aware job targeting skill for $j$.

Among all the possible machine learning models, ranging from generalized linear models to neural networks (Cheng et al., 2016; Lou and Obukhov, 2017; He et al., 2017; Zhang et al., 2016), we chose XGBoost (Chen and Guestrin, 2016) because it is fast, interpretable, and scalable with a small memory footprint. By leveraging an in-house implementation of XGBoost, we were able to serve the model online without a noticeable latency increase over the existing linear production model. The XGBoost-based Job2Skills model is trained with a logistic loss to optimize the binary classification task, and we use the resulting tree-based Job2Skills model as the utility function $u_{j,s}$ to extract market-aware job targeting skills for job postings. We define the loss function of Job2Skills as:

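$$\mathcal{L} = -\sum_{(j,s) \in \mathcal{D}^+} \log \hat{y}_{j,s} - \sum_{(j,s) \in \mathcal{D}^-} \log\big(1 - \hat{y}_{j,s}\big) + \sum_{t} \Omega(f_t), \qquad \hat{y}_{j,s} = \sigma\Big(\sum_{t} f_t(\mathbf{x}_{j,s})\Big), \qquad \Omega(f_t) = \gamma N_t + \tfrac{1}{2} \lambda \lVert \mathbf{w}_t \rVert^2 \tag{11}$$

in which $\mathcal{D}^+$ and $\mathcal{D}^-$ are the positive and negative job-skill pairs, $\mathbf{x}_{j,s}$ denotes the combined market and salience feature vector of a given $(j, s)$ pair, $f_t$ represents the $t$-th tree in the model, and $\Omega(f_t)$ is the regularization term that penalizes the complexity of tree $f_t$, in which $N_t$ denotes the number of leaves in tree $f_t$, $\mathbf{w}_t$ is its vector of leaf weights, and $\gamma$ and $\lambda$ are the regularization parameters.

Model | Job Targeting Skills (JT) | Quality Applicant Skills (QA) | Overall (JT + QA)
Salience & Market-agnostic baseline
Job2Skills (trained w/ JT)
Job2Skills (trained w/ QA)
Job2Skills (trained w/ JT+QA)
Table 1. Relative skill extraction AUROC improvement on the JT and QA datasets.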
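For reference, here is a plain-Python evaluation of the regularized logistic objective above, given raw ensemble margins and each tree's leaf weights. It is a sketch for checking the arithmetic, not a training routine; in the open-source XGBoost library the corresponding settings are the `binary:logistic` objective and the `gamma` and `lambda` regularization parameters, whereas the paper uses an in-house implementation.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def regularized_logistic_objective(margins, labels, trees, gamma=1.0, lam=1.0):
    """Logistic loss plus tree-complexity penalty for a boosted tree ensemble.

    margins: raw ensemble outputs sum_t f_t(x_{j,s}), one per job-skill pair.
    labels: 1 for pairs in D+, 0 for pairs in D-.
    trees: one list of leaf weights w_t per tree (its length is the leaf count N_t).
    """
    loss = 0.0
    for z, y in zip(margins, labels):
        # -log(sigmoid(z)) = softplus(-z); -log(1 - sigmoid(z)) = softplus(z)
        loss += softplus(-z) if y == 1 else softplus(z)
    for leaf_weights in trees:
        loss += gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)
    return loss


print(regularized_logistic_objective([2.0, -1.0], [1, 0], [[0.5, -0.5], [0.3]]))
```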

5. Experiments

In this section, we conduct an extensive set of experiments, including offline evaluation and online A/B tests, to demonstrate the effectiveness of the proposed Job2Skills model compared to our market-agnostic production model. We also present a case study to demonstrate the actual skills returned by Job2Skills and how we can get better market insights from it.

The Job2Skills model evaluated in this section contains all of the aforementioned market-aware and multi-resolution skill salience features. Note that the sentence-level salience sub-model used in the production Job2Skills model is FastText-based rather than the BERT model we tried offline, because we observed a significant latency reduction with only a small drop in salience accuracy.

The market-agnostic production model (baseline for short) that we compare against is a logistic regression model trained with skill appearance features, e.g., job-level features such as whether the skill is mentioned in the text and where it is mentioned, and global-level features such as mention frequency.

The offline training and evaluation data were collected using the following procedure. We used months of LinkedIn’s English Premium jobs posted on LinkedIn as input and generated around million job-skill pairs for training and evaluation. To be specific, we used the methods described in Sec. 3 to collect 1) the job targeting (JT) dataset, using the job targeting skills provided by job posters, and 2) the quality applicant (QA) dataset, using the common skills shared by job applicants who received positive feedback from recruiters. We used of them for training, for validation, and the rest for testing. Note that unlike Job2Skills, the production baseline is trained on the JT data only. During inference, both methods use the same skill tagger to get the same set of skill candidates from jobs.

5.1. Offline Evaluation

We present the Job2Skills offline evaluation results on the hold-out sets of the two training datasets (JT and QA), and report the relative AUROC improvement over the production baseline model. As shown in Tab. 1, Job2Skills significantly outperforms the baseline by on the job targeting (JT) set and on the quality applicant (QA) set. Moreover, by training with both the human-labeled JT and the derived QA datasets, we are able to generalize the model and achieve a better AUROC on both tasks, increasing the overall AUROC by .
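The metric behind Tabs. 1 and 2 can be sketched as follows. AUROC is computed here through its pairwise (Mann-Whitney) interpretation, and relative improvement is assumed to mean $(\text{AUROC}_{\text{new}} - \text{AUROC}_{\text{base}}) / \text{AUROC}_{\text{base}}$; that formula is our assumption.

```python
def auroc(scores, labels):
    """AUROC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_improvement(new_auc, base_auc):
    """Relative (not absolute) lift of a model's AUROC over the baseline's."""
    return (new_auc - base_auc) / base_auc


model_auc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(model_auc, relative_improvement(model_auc, 0.6))
```

The pairwise loop is quadratic in the number of examples; for the million-scale evaluation sets described above, a sort-based implementation would be the practical choice.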

Model | AUROC Improvement
Salience & Market-agnostic baseline
Job2Skills w/ Market and Salience features
   – Salience features only
   – Market features (member+job) only
      – Member features only
      – Job features only

Table 2. Feature ablation test on job targeting skill inference.


Model | Onsite Apply | Job Save | Member Coverage
Job2Skills
Table 3. Online A/B test result on the LinkedIn Job Recommendation (JYMBII (Kenthapadi et al., 2017)) page.

Next, we present an ablation study of the importance of each feature group and report the relative AUROC improvement in Tab. 2. Note that the evaluation dataset used here is a slightly different JT dataset, collected with the same procedure as in Tab. 1 but over a different time span. We can see that both the salience and market feature groups contribute positively to model performance. We also observe that when only one feature group is used, the model trained with deep learning-based salience features is better than the market-features-only model. By combining both groups of features, we further improve the AUROC by compared to the salience-only model. These improvements indicate that the market dynamics modeled by the market-aware features provide additional information on skill importance for job targeting that cannot be captured by modeling job posting-based skill salience alone.

In addition to the ablation study, we also looked at the feature importance of the Job2Skills model trained using both market and salience features. We found that all three salience features are ranked within the top- most important features, and segment-level salience feature is the most important one followed by the sentence-level and job-level salience features. This means both deep learning powered salience features and market features are crucial to the model and cannot be replaced by each other.

5.2. Online Job Recommendations

In this section, we deploy Job2Skills to production, apply it to all LinkedIn jobs to extract skill entities for job targeting, and retrain our job recommender system, Jobs You May Be Interested In (JYMBII (Kenthapadi et al., 2017)), on the extracted salience and market-aware job targeting skills. We performed an online A/B test with of LinkedIn's traffic for days and observed significant lifts in multiple metrics. As shown in Tab. 3, the Job2Skills-based JYMBII model not only recommends better jobs (reflected by increased job apply and save rates) but also increases the percentage of members who receive job recommendations.
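The paper does not state which significance test was used for the A/B comparison. As one common choice, a two-proportion z-test on a per-member rate such as job apply can be sketched as follows; the function name and the counts are hypothetical.

```python
import math


def two_proportion_ztest(success_a, total_a, success_b, total_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (relative_lift_percent, z, p_value) for variant B versus control A.
    """
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    lift = (rate_b - rate_a) / rate_a * 100.0
    return lift, z, p_value


# Hypothetical counts: members who applied / members exposed in each arm.
lift, z, p = two_proportion_ztest(1000, 50_000, 1100, 50_000)
print(f"lift={lift:.1f}%  z={z:.2f}  p={p:.4f}")
```

`math.erfc(|z| / sqrt(2))` equals twice the upper tail of the standard normal distribution, which is the two-sided p-value under the pooled-variance null hypothesis.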


Model Skill Add Rate Skill Reject Rate
Job2Skills (market-aware)
Job2Skills (market- and salience-aware)
Table 4. A/B test result of job targeting skill suggestions.

Job Title | Shared by Both Models | Skills returned by Job2Skills only | Skills returned by baseline only
Software Engineer | - | Design, Java, Communication, C, C++, Management, Javascript, SQL, Cloud Computing, Architecture | OpenGL, DropWizard, ActiveRecord, LLVM, Sinatra, C++0x, Guice, GRPC, cmake, Boost C++
Data Scientist | Data Science, Machine Learning | Data Mining, Python, Analytics, Pattern Recognition, Statistics, R, AI, Communication | Apache Spark, Predictive Modeling, Statistical Modeling, scikit-learn, Deep Learning, Text Mining, Pandas, Keras
Audit Tax Manager | Auditing | Communication, Management, Research, Tax Preparation, Supervisory Skills, Presentations, Tax Compliance, Engagements, Budgeting | Tax, Due Diligence, US GAAP, Financial Accounting, Financial Audits, External Audit, GAAP, IFRS, Internal Controls

Table 5. Sample of top- job targeting skills extracted by the salience and market-agnostic baseline and Job2Skills.


Industry | United States: Baseline | United States: Job2Skills | India: Baseline | India: Job2Skills
Government | teaching, TSO, management, fire control, law enforcement | analytical skills, communication, army, defense, hazardous materials | start-ups, management, procurement, finance, human resources | communication, think tank, research, social media, disability rights
Technology | management, sales, cloud computing, consulting, salesforce.com | analytical skills, project management, communication, sales, problem solving | SAP products, management, Jakarta EE, cloud computing, salesforce.com | customer experience, analytical skills, solution architecture, business process, helping clients
Table 6. Sample of top- job targeting skills per industry and country extracted by the salience and market-agnostic baseline and Job2Skills. Red squared skills are sub-optimal.

5.3. Online Job Targeting Skill Suggestions

We also apply Job2Skills to provide job targeting skill suggestions in LinkedIn's job posting flow. When a recruiter posts a job on LinkedIn, we use Job2Skills to recommend skills, and the poster can save at most job targeting skills, either by selecting from the recommendations or by providing their own. We ramped our Job2Skills model to of LinkedIn's traffic and report the -week A/B test result. The metrics are defined as follows:


  • skill add rate: % of skills that are not recommended and are manually added,

  • skill reject rate: % of recommended skills that are rejected.
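The two metrics can be computed from the recommended and saved skills logged for each posting, as sketched below. The paper does not spell out the denominators, so this sketch assumes that a skill counts as "rejected" when it was recommended but not saved, and that the add rate is taken over all saved skills; the function name and data are hypothetical.

```python
def suggestion_metrics(postings):
    """Compute skill add rate and skill reject rate (%) over job postings.

    Each posting is a (recommended, saved) pair of skill sets. Assumptions:
    an added skill is saved but not recommended (rate over all saved skills),
    and a rejected skill is recommended but not saved (rate over all
    recommended skills).
    """
    added = saved_total = rejected = recommended_total = 0
    for recommended, saved in postings:
        added += len(saved - recommended)
        saved_total += len(saved)
        rejected += len(recommended - saved)
        recommended_total += len(recommended)
    add_rate = 100.0 * added / saved_total if saved_total else 0.0
    reject_rate = 100.0 * rejected / recommended_total if recommended_total else 0.0
    return add_rate, reject_rate


# Hypothetical postings: (skills recommended by the model, skills the poster saved).
postings = [
    ({"Java", "SQL", "Management"}, {"Java", "Kubernetes"}),
    ({"Auditing", "Tax"}, {"Auditing", "Tax"}),
]
add_rate, reject_rate = suggestion_metrics(postings)
print(f"skill add rate: {add_rate:.1f}%, skill reject rate: {reject_rate:.1f}%")
```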

As shown in Tab. 4, the skills recommended by Job2Skills are notably better than those of the existing production model: recruiters are now less likely to manually add a job targeting skill and less likely to reject a recommendation. In general, Job2Skills increases overall job targeting skill coverage and quality by adjusting skill importance as a function of the skills' salience and market signals.

6. Qualitative Analysis

We have shown that Job2Skills improves job targeting at LinkedIn because it captures skill salience and market supply. Here we further demonstrate the strength of salience and market-aware skills in market analysis. We claim that Job2Skills captures hiring trends well, e.g., which skill sets are required in different sectors and what kinds of talent employers are recruiting. As explained earlier, Job2Skills is trained on hiring experts' feedback and market supply signals. In other words, it is trained on signals representing what kinds of people (skill sets) companies want most and what kinds of people actually get hired. Next, we show that Job2Skills's results reveal employers' intent better than the baseline model, which is based on named entity recognition.

6.1. Skill Trend Insight

Figure 4. % of Macy’s jobs posted per month from Jun. to Dec. 2019 that require software development related skills.
Figure 5. Azure’s skill popularity in 2018.

The first set of cases concerns skill trends. Which skills are becoming popular? Which companies are growing a certain job segment? We show that Job2Skills's results can be used to forecast such trends even before news articles about them come out.

In 2019, we identified Macy's tech expansion using the results of Job2Skills. In Fig. 4, we present the percentage of Macy's monthly posted jobs that require skills such as Java, Javascript, and SQL. We found that Macy's demand for tech skills almost tripled from June to December 2019. We suspected that such a drastic change indicated that Macy's was planning to invest in its technology division. Two months after we detected this trend, in February 2020, Macy's officially announced its tech operation expansion in Atlanta and New York (https://www.bloomberg.com/news/articles/2020-02-04/macy-s-to-move-san-francisco-tech-offices-to-new-york-atlanta).
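The quantity plotted in Fig. 4 can be sketched as the per-month share of a company's postings whose extracted job targeting skills intersect a set of software development skills. The skill set, the posting records, and the function name below are hypothetical illustrations, not LinkedIn data.

```python
from collections import defaultdict

# Hypothetical set of software development skills; the text names Java,
# Javascript, and SQL as examples.
TECH_SKILLS = {"Java", "Javascript", "SQL"}


def monthly_tech_share(postings, tech_skills=TECH_SKILLS):
    """Return {month: fraction of postings whose skills include a tech skill}."""
    total = defaultdict(int)
    tech = defaultdict(int)
    for month, skills in postings:
        total[month] += 1
        if tech_skills & set(skills):
            tech[month] += 1
    return {month: tech[month] / total[month] for month in sorted(total)}


# Hypothetical (month, extracted targeting skills) pairs for one company.
postings = [
    ("2019-06", ["Retail Sales", "Customer Service"]),
    ("2019-06", ["Java", "Merchandising"]),
    ("2019-06", ["Visual Merchandising"]),
    ("2019-06", ["Customer Service"]),
    ("2019-12", ["Java", "SQL"]),
    ("2019-12", ["Javascript"]),
    ("2019-12", ["SQL", "Analytics"]),
    ("2019-12", ["Retail Sales"]),
]

share = monthly_tech_share(postings)
print(share)
print(f"growth Jun -> Dec: {share['2019-12'] / share['2019-06']:.1f}x")
```

Normalizing by the number of postings per month separates a shift in the company's hiring mix from a change in its overall posting volume.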

Besides predicting company expansions, Job2Skills also better captures skill supply and demand under market dynamics such as recruiting cycles. Fig. 5 shows the popularity trend of Azure, the fifth trending skill in the SF Bay Area. We found that job targeting skill popularity is highly correlated with major hiring and vacation seasons. Interestingly, changes in Job2Skills skill popularity also correlate with other market signals such as company performance. The first gray box in Fig. 5 highlights a significant four-month increase in Azure's popularity from Dec. 2017 to Mar. 2018, which aligns with MSFT's 2018Q3 earnings report showing revenue growth for Azure cloud. We suspect the correlation is caused by the adoption of Azure services in the market: as more companies use Azure, the skill becomes more popular, and the same market share increase is also reflected in the revenue growth.
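The paper does not say how the correlation with hiring seasons was measured. A common, minimal choice is a Pearson coefficient over aligned monthly series, sketched below with made-up numbers; the series names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical monthly series: share of postings targeting a skill,
# and overall hiring volume in the same months.
skill_popularity = [0.010, 0.012, 0.015, 0.014, 0.011, 0.009]
hiring_volume = [9500, 11000, 13200, 12800, 10100, 8700]
print(f"r = {pearson(skill_popularity, hiring_volume):.3f}")
```

Because Pearson's r is invariant to the scale of each series, popularity shares and raw hiring counts can be compared directly without normalization.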

6.2. Skill Insights

The second set of cases reveals diverse skill demands across industry sectors and regions. We show that the job targeting skills generated by Job2Skills better capture such market diversity by modeling salience and market signals jointly.

In Tab. 5 we present the top- job targeting skills of three occupations: Software Engineer, Data Scientist, and Audit Tax Manager. Unlike the salience and market-agnostic model, which mainly focuses on very specific skills with limited supply such as C++0x and DropWizard, Job2Skills returns a diversified set of skills at the right granularity, ranging from popular programming languages to soft skills such as communication and management.
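A per-occupation comparison like Tab. 5 can be produced by splitting each model's top-k list into skills shared by both models and skills unique to each. In the sketch below, k is a parameter, the rankings are hypothetical (skill names are borrowed from the Data Scientist row of Tab. 5), and shared skills are listed in the first model's rank order.

```python
def compare_top_k(model_a, model_b, k=10):
    """Split two ranked skill lists into shared / A-only / B-only (rank order kept)."""
    top_a, top_b = model_a[:k], model_b[:k]
    set_a, set_b = set(top_a), set(top_b)
    shared = [s for s in top_a if s in set_b]
    only_a = [s for s in top_a if s not in set_b]
    only_b = [s for s in top_b if s not in set_a]
    return shared, only_a, only_b


# Hypothetical rankings for one occupation.
job2skills = ["Data Science", "Machine Learning", "Python", "Statistics"]
baseline = ["Machine Learning", "Keras", "Data Science", "Pandas"]
print(compare_top_k(job2skills, baseline, k=4))
```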

Besides better representing skills for different occupations, the skills generated by Job2Skills can also capture differences in skill supply and demand across industries and regions. Here we compare the top- job targeting skills of the government and technology industries in the United States and India. As shown in Tab. 6, the skills returned by Job2Skills differ significantly from those of the market and salience-agnostic baseline model. We believe this is because Job2Skills considers skill salience and market dynamics. For example, the baseline model wrongly picks TSO (Time Sharing Option) as a top skill in the US government industry because it is mentioned in many jobs posted by the Transportation Security Administration and the Department of Homeland Security. However, TSO in those jobs actually refers to Transportation Security Officer. By evaluating skill salience, Job2Skills was able to identify that TSO is not a valid skill in this context. In addition, the market and salience-agnostic baseline ranked many specific tools such as SAP and Jakarta EE as top skills in the Indian technology industry. Instead of targeting very specific skills, Job2Skills measures supply in the Indian market and selects many job-related, high-supply skills such as customer experience and helping clients.
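The TSO example contrasts two ranking signals. The sketch below is a deliberately simplified, hypothetical illustration: the baseline is approximated by raw mention counts within an industry, and the "aware" ranking multiplies summed per-posting salience by the log of member supply. Job2Skills itself learns how to combine salience and market features, so neither this hand-set scoring rule nor the numbers come from the paper.

```python
import math
from collections import defaultdict


def rank_by_mentions(jobs):
    """Baseline-style ranking: how many postings mention each skill."""
    counts = defaultdict(int)
    for skills in jobs:
        for skill in skills:
            counts[skill] += 1
    return sorted(counts, key=lambda s: (-counts[s], s))


def rank_by_salience_and_supply(jobs, supply):
    """Toy market-aware ranking: summed salience scaled by log member supply.

    This hand-set rule is illustrative only; it is not the learned Job2Skills model.
    """
    scores = defaultdict(float)
    for skills in jobs:
        for skill, salience in skills.items():
            scores[skill] += salience
    for skill in scores:
        scores[skill] *= math.log1p(supply.get(skill, 0))
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical postings in one industry: skill -> salience in that posting (0..1).
jobs = [
    {"TSO": 0.10, "communication": 0.80},
    {"TSO": 0.10, "defense": 0.90},
    {"TSO": 0.05, "communication": 0.70},
]
# Hypothetical number of members in the market who list each skill.
supply = {"TSO": 50, "communication": 5000, "defense": 800}

print(rank_by_mentions(jobs))
print(rank_by_salience_and_supply(jobs, supply))
```

A frequently mentioned but non-salient, low-supply token such as TSO tops the mention-count list yet drops to the bottom once salience and supply are taken into account.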

The skills extracted by Job2Skills can reveal market insights due to their market-awareness. Interestingly, the US government-sector market differs significantly from India's. While India focuses on a variety of skills ranging from research to social media to disability rights, the US mostly focuses on military-related skills. We suspect this is because US government positions are mostly defense- or environment-related and prefer veterans. The technology industry also differs between the US and India: as shown in Tab. 6, the Indian market focuses on IT support, whereas the US market is more about management and sales.

Industry | Director: Baseline | Director: Job2Skills | Entry: Baseline | Entry: Job2Skills
Government | DES, management, teaching, communication, leadership | leadership, analytical skills, project management, management, interpersonal skills | TSO, DES, fire control, teaching, management | communication, defense, hazardous materials, analytical skills, army
Technology | sales, management, cloud computing, leadership, DES | analytical skills, consulting, project management, sales, communication | sales, cloud computing, devops, DES, management | analytical skills, communication, SQL, software development, Java

Table 7. Top- job targeting skills of US government and technology industries generated by the salience and market-agnostic baseline and Job2Skills. Red squared skills are sub-optimal.

Moreover, we found that Job2Skills captures skill differences between seniority levels. In Tab. 7, we present the top- job targeting skills of the US government and technology industries generated by the baseline and by Job2Skills. Compared to the baseline, skills generated by Job2Skills are more representative and capture the skill shift across seniority levels. For example, entry-level positions require domain-specific skills such as SQL and Java, whereas higher-level roles, regardless of industry, focus more on managerial skills such as management and leadership. Due to the lack of salience and market modeling, the baseline model also selects many less relevant skills such as DES (Data Encryption Standard) and management for entry-level technology roles.

7. Related Work

Job Recommendation. Previous work usually treats the job targeting problem as job recommendation (Volkovs et al., 2017; Guo et al., 2017) and optimizes the model using direct user interaction signals such as clicks, dismissals, and bookmarks (Agarwal et al., 2009; Abel et al., 2017; Huang et al., 2019; Xia et al., 2020). Borisyuk et al. proposed LiJAR (Borisyuk et al., 2017) to redistribute job targeting audiences and improve marketplace efficiency. Dave et al. designed a representation-learning method to perform job and skill recommendations (Dave et al., 2018). Li et al. used career history to predict next positions (Li et al., 2017). None of these works addresses the most pressing job targeting issue, which is how to properly represent jobs with relevant, important attribute entities to increase the number of quality applicants a job can reach.

Skill Analysis. Traditionally, skill analysis is conducted manually by experts, either to gain insights (Prabhakar et al., 2005) or to curate a structured taxonomy (Council and others, 2010). Recently, SPTM (Xu et al., 2018) used topic modeling to measure the popularity of IT skills from jobs. TATF (Wu et al., 2019) is a trend-aware tensor factorization method that models time-aware skill popularity. DuerQuiz (Qin et al., 2019) creates in-depth skill assessment questions for applicant evaluation. These methods were applied only to small-scale IT jobs and are not designed to extract skills for job targeting. More recently, Yan et al. proposed a social signal-based method for validating members' skills (Yan et al., 2019). However, it is not applicable to jobs due to the lack of such signals.

Job Market Analysis. Modeling job targeting and recommendation using skills is largely inspired by economic research that analyzes the labor market using skills as the most direct and vital signal (Autor et al., 2003; Saar and Räis, 2017). However, these works are either conducted at a very small scale or use only a handful of hand-crafted, general skill categories. Woon et al. (Woon et al., 2015) performed a case study of occupational skill changes, but the skills are limited to those provided by O*NET (Peterson et al., 2001). Radermacher et al. (Radermacher et al., 2014) studied the skill gap between fresh graduates and industry expectations based on feedback from managers and hiring personnel, using hand-picked skills. Recently, Vasudevan et al. and Johnston et al. (Vasudevan et al., 2018; Johnston et al., 2017) analyzed labor demand and skill fungibility using a skill taxonomy with skills in the IT industry. APJFNN (Qin et al., 2018) and other resume-based methods (Zhu et al., 2018) predict person-job fit by comparing the job description and the resume. HIPO (Ye et al., 2019) identifies high-potential talent through neural network-based social profiling. OSCN (Sun et al., 2019) and HCPNN (Meng et al., 2019) use recurrent neural networks and attention mechanisms to predict organization- and individual-level job mobility. However, none of these works addresses the market-aware job targeting task, and they all use a limited skill taxonomy that contains at most a thousand skills.

8. Conclusion and Future Work

In this work, we proposed the salience- and market-aware skill extraction task, discussed two data collection strategies, and presented Job2Skills, which models skill salience using deep learning methods and market supply signals using engineered features. We conducted extensive experiments and showed that Job2Skills significantly improves the quality of multiple LinkedIn products, including job targeting skill suggestions and job recommendations. We also performed large-scale case studies to explore insights obtained by analyzing Job2Skills results. In future work, we plan to add temporal information to the model and explore advanced methods for learning skill embeddings.

References

  • F. Abel, Y. Deldjoo, M. Elahi, and D. Kohlsdorf (2017) Recsys challenge 2017: offline and online evaluation. In RecSys, Cited by: §7.
  • F. Abel (2015) We know where you should work next summer: job recommendations. In RecSys, Cited by: §1.
  • D. Agarwal, B. Chen, and P. Elango (2009) Spatio-temporal models for estimating click-through rate. In WWW, Cited by: §7.
  • D. H. Autor, F. Levy, and R. J. Murnane (2003) The skill content of recent technological change: an empirical exploration. The Quarterly journal of economics 118 (4). Cited by: §7.
  • P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov (2017) Enriching word vectors with subword information. TACL 5. Cited by: §4.1.
  • F. Borisyuk, L. Zhang, and K. Kenthapadi (2017) LiJAR: a system for job application redistribution towards efficient career marketplace. In KDD, Cited by: §7.
  • D. Cer, Y. Yang, S. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, et al. (2018) Universal sentence encoder. arXiv preprint arXiv:1803.11175. Cited by: §4.1.
  • T. Chen and C. Guestrin (2016) Xgboost: a scalable tree boosting system. In KDD, Cited by: §4.3.
  • H. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, et al. (2016) Wide & deep learning for recommender systems. In DLRS, Cited by: §4.3.
  • N. R. Council et al. (2010) A database for a changing economy: review of the occupational information network (o* net). Cited by: §7.
  • V. S. Dave, B. Zhang, M. Al Hasan, K. AlJadda, and M. Korayem (2018) A combined representation learning approach for better job and skill recommendation. In CIKM, Cited by: §7.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §4.1.
  • Y. Gu, B. Zhao, D. Hardtke, and Y. Sun (2016) Learning global term weights for content-based recommender systems. In WWW, Cited by: §1.
  • C. Guo, H. Lu, S. Shi, B. Hao, B. Liu, M. Zhang, Y. Liu, and S. Ma (2017) How integration helps on cold-start recommendations. In RecSys Challenge 2017, Cited by: §7.
  • X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua (2017) Neural collaborative filtering. In WWW, Cited by: §4.3.
  • C. Huang, X. Wu, X. Zhang, C. Zhang, J. Zhao, D. Yin, and N. V. Chawla (2019) Online purchase prediction via multi-scale modeling of behavior dynamics. In KDD, Cited by: §7.
  • B. Johnston, B. Zweig, M. Peran, C. Wang, and R. Rosenfeld (2017) Estimating skill fungibility and forecasting services labor demand. In Big Data, Cited by: §7.
  • K. Kenthapadi, B. Le, and G. Venkataraman (2017) Personalized job recommendation system at linkedin: practical challenges and lessons learned. In RecSys, Cited by: §5.2, Table 3.
  • J. Li, D. Arya, V. Ha-Thuc, and S. Sinha (2016) How to get them a dream job?: entity-aware features for personalized job search ranking. In KDD, Cited by: §1, §1.
  • L. Li, H. Jing, H. Tong, J. Yang, Q. He, and B. Chen (2017) Nemo: next career move prediction with contextual embedding. In WWW, Cited by: §7.
  • S. Li, B. Shi, J. Yang, J. Yan, S. Wang, F. Chen, and Q. He (2020) Deep job understanding at linkedin. In SIGIR, Cited by: §2.
  • Y. Lou and M. Obukhov (2017) BDT: gradient boosted decision tables for high accuracy and scoring efficiency. In KDD, Cited by: §4.3.
  • Q. Meng, H. Zhu, K. Xiao, L. Zhang, and H. Xiong (2019) A Hierarchical Career-Path-Aware Neural Network for Job Mobility Prediction. In KDD, Cited by: §7.
  • D. Nadeau and S. Sekine (2007) A survey of named entity recognition and classification. Lingvisticae Investigationes 30 (1). Cited by: §1.
  • I. Paparrizos, B. B. Cambazoglu, and A. Gionis (2011) Machine learned job recommendation. In RecSys, Cited by: §1.
  • N. G. Peterson, M. D. Mumford, W. C. Borman, P. R. Jeanneret, E. A. Fleishman, K. Y. Levin, M. A. Campion, M. S. Mayfield, F. P. Morgeson, K. Pearlman, et al. (2001) Understanding work using the occupational information network (o* net): implications for practice and research. Personnel Psychology 54 (2). Cited by: §7.
  • B. Prabhakar, C. R. Litecky, and K. Arnett (2005) IT skills in a tough job market. Communications of the ACM 48 (10). Cited by: §7.
  • C. Qin, H. Zhu, T. Xu, C. Zhu, L. Jiang, E. Chen, and H. Xiong (2018) Enhancing person-job fit for talent recruitment: an ability-aware neural network approach. In SIGIR, Cited by: §7.
  • C. Qin, H. Zhu, C. Zhu, T. Xu, F. Zhuang, C. Ma, J. Zhang, and H. Xiong (2019) DuerQuiz: A Personalized Question Recommender System for Intelligent Job Interview. In KDD, Cited by: §7.
  • A. Radermacher, G. Walia, and D. Knudson (2014) Investigating the skill gap between graduating students and industry expectations. In ICSE, Cited by: §7.
  • R. Ramanath, H. Inan, G. Polatkan, B. Hu, Q. Guo, C. Ozcaglar, X. Wu, K. Kenthapadi, and S. C. Geyik (2018) Towards deep and representation learning for talent search at linkedin. In CIKM, pp. 2253–2261. Cited by: §4.1.
  • E. Saar and M. L. Räis (2017) Participation in job-related training in european countries: the impact of skill supply and demand characteristics. Journal of Education and Work 30 (5). Cited by: §7.
  • B. Shi, S. Li, J. Yang, M. E. Kazdagli, and Q. He (2020) Learning to ask screening questions for job postings. In SIGIR, Cited by: §2.
  • B. Shi, J. Yang, T. Weninger, J. How, and Q. He (2019) Representation learning in heterogeneous professional social networks with ambiguous social connections. In IEEE BigData, Cited by: §4.1.
  • Y. Sun, F. Zhuang, H. Zhu, X. Song, Q. He, and H. Xiong (2019) The Impact of Person-Organization Fit on Talent Management: A Structure-Aware Convolutional Neural Network Approach. In KDD, Cited by: §7.
  • S. Vasudevan, M. Singh, J. Mondal, M. Peran, B. Zweig, B. Johnston, and R. Rosenfeld (2018) Estimating fungibility between skills by combining skill similarities obtained from multiple data sources. Data Science and Engineering 3 (3). Cited by: §1, §7.
  • M. Volkovs, G. W. Yu, and T. Poutanen (2017) Content-based neighbor models for cold start in recommender systems. In RecSys, Cited by: §1, §7.
  • W. L. Woon, Z. Aung, W. AlKhader, D. Svetinovic, and M. A. Omar (2015) Changes in occupational skills-a case study using non-negative matrix factorization. In ICONIP, Cited by: §7.
  • X. Wu, T. Xu, H. Zhu, L. Zhang, E. Chen, and H. Xiong (2019) Trend-aware tensor factorization for job skill demand analysis. In IJCAI, Cited by: §7.
  • L. Xia, C. Huang, Y. Xu, P. Dai, B. Zhang, and L. Bo (2020) Multiplex behavioral relation learning for recommendation via memory augmented transformer network. In SIGIR, Cited by: §7.
  • T. Xu, H. Zhu, C. Zhu, P. Li, and H. Xiong (2018) Measuring the popularity of job skills in recruitment market: a multi-criteria approach. In AAAI, Cited by: §7.
  • X. Yan, J. Yang, M. Obukhov, L. Zhu, J. Bai, S. Wu, and Q. He (2019) Social skill validation at linkedin. In KDD, Cited by: §7.
  • Y. Ye, H. Zhu, T. Xu, F. Zhuang, R. Yu, and H. Xiong (2019) Identifying High Potential Talent: A Neural Network Based Dynamic Social Profiling Approach. In ICDM, Cited by: §7.
  • X. Zhang, Y. Zhou, Y. Ma, B. Chen, L. Zhang, and D. Agarwal (2016) Glmix: generalized linear mixed models for large-scale response prediction. In KDD, Cited by: §4.3.
  • C. Zhu, H. Zhu, H. Xiong, C. Ma, F. Xie, P. Ding, and P. Li (2018) Person-job fit: adapting the right talent for the right job with joint representation learning. TMIS 9 (3). Cited by: §7.