When it comes to jobs and careers, technical abilities and professional qualifications are important factors from the perspective of both employers and new employees. However, as pointed out by recent studies, more and more attention is focused on soft skills, i.e. qualities that do not depend on acquired knowledge and that are harder to quantify because they relate to one’s emotional intelligence and personality traits. At the same time, they are extremely important because they facilitate human connections. The Oxford dictionary defines soft skills as “personal attributes that enable someone to interact effectively and harmoniously with other people” (footnote 1: https://en.oxforddictionaries.com/definition/soft_skills). Between 1980 and 2012, jobs with high social skills requirements grew by around 10 percent as a share of the US labour force. This rising importance of soft skills in labour markets stems from the growth of the service sector, where interpersonal services are sold, as well as from the introduction of lean manufacturing, where an integrated skill set comprising both hard and soft skills has gained importance [4, 5]. Interviews with employers reveal that soft skills such as motivation, teamwork and the ability to interact with others are increasingly important for employees’ success at the workplace. Observational studies have also shown that social features that may be related to soft skills (e.g. the variety of friendship connections and position diversity within a community) are positively correlated with economic outputs.
The growing importance of soft skills also carries implications for gender inequality in labour markets. Research has shown that certain societal groups are perceived as lacking important soft skills, e.g. evidence was found that black men are characterized as being less motivated than their white counterparts. Additionally, not all types of soft skills are valued equally: based on gender stereotypes and beliefs about women’s inferior status in the workplace, soft skills that are perceived as “female” are found to be associated with wage penalties [8, 9, 10].
Despite the growing importance of soft skills and their implications for inequalities in labour markets, to date, we know surprisingly little about the role of soft skills in the job market. This gap is partially due to the unavailability of large-scale data on soft skills. We introduce a semi-automatic approach for extracting soft skills from job advertisements, and thereby conduct what is, to our knowledge, the first large-scale analysis of the association between soft skills and wages. Additionally, we present evidence on the impact of soft skills on sex segregation in labour markets (footnote 2: Due to the unavailability of large-scale data on soft skills, contemporary research usually uses a single item to measure the impact of soft skills on wages and segregation, see, e.g. [11, 12, 13]). Although the existing literature on the supply-side mechanisms of occupational sorting, i.e. women making career choices based on potentially biased self-assessed beliefs about interests and capacities, is growing [14, 15], the demand-side process, meaning the allocation of men and women into sex-typed occupations by employers, remains underappreciated in the literature. Utilizing our newly extracted dataset based on job advertisements, we are able to fill this gap.
The article at hand is structured as follows: In Section 2, we present our methodology for extracting soft skill mentions from a large corpus of job advertisements. In Section 3, we scrutinize wage premiums and penalties associated with soft skills frequently mentioned in job ads based on a matching study. Next, the role of soft skills in reproducing gender segregation in the labour market is examined in Section 4, where we show that soft skills representing “female” skills are associated with wage penalties. Finally, we present related work in Section 5 and conclude in Section 6, which summarizes our findings and their implications for labour market inequality, discusses the limitations of this study, and provides suggestions for future work.
2 Methods and Data
In this section, we describe the datasets used in this work and our semi-automatic soft skill mining approach which first creates clusters of soft skills, grouping similar soft skills together, and then detects soft skills in job ads by matching the soft skill strings.
The analysis of soft skills is based on a dataset of 245,000 job advertisements (ads) from the United Kingdom (UK) (footnote 3: The dataset from the UK is available at https://www.kaggle.com/c/job-salary-prediction). This data is provided by the Adzuna job search engine, which collects job ads from hundreds of different websites. Each job ad entry contains the title, full description, job category, and salary of the job, among eight other fields.
The ads have been classified into 29 job categories. The category information is inferred by Adzuna’s own model, relying on the source of the job ad and its description. In Table 1, we show an analysis of the most distinctive soft skills for a selection of five job categories. The desired soft skills differ considerably depending on the industry. For example, the three most distinctive skills for the Teaching category are enthusiastic, creative, and positive, whereas for the category Accounting & Finance they are accurate, responsible, and communication skills. The soft skill detection algorithm is described later in Section 2.2.4.
|Social work||%||%||Accounting & Finance||%||%||IT||%||%||Teaching||%||%||Creative & Design||%||%|
|team player||+7.3||22.7||accurate||+7.5||14.1||problem solving||+4.6||8.9||enthusiastic||+12.1||20.2||creative||+24.8||30.3|
|ability to work with children||+6.6||7.0||responsible||+6.0||34.7||communication skills||+3.5||27.8||creative||+5.9||11.5||innovative||+5.3||11.2|
|positive||+4.3||9.8||communication skills||+4.6||28.9||innovative||+3.1||8.9||positive||+5.9||11.4||attention to detail||+4.9||9.8|
|flexible||+1.5||13.4||analytical skills||+3.2||5.9||team player||+2.4||17.8||leadership||+5.0||11.4||management skills||+4.4||14.2|
|leadership||+1.5||7.9||attention to detail||+2.9||7.8||analytical skills||+2.1||4.8||confident||+4.2||11.1||responsible||+3.6||32.4|
|patience||+0.8||0.9||ability to work within deadlines||+2.1||4.7||management skills||+1.8||11.6||hard working||+3.4||6.3||confident||+3.0||9.8|
|people skills||+0.8||2.3||interpersonal skills||+1.5||6.0||creative||+1.5||7.0||innovative||+3.1||8.9||presentation skills||+1.6||3.4|
All the experiments in this paper are conducted using the dataset from the UK, except for a crowdsourcing experiment for collecting an initial list of soft skills, described in Section 2.2.1. For this crowdsourcing experiment, we use a dataset of 19,000 online job postings from 2004–2015 posted through the Armenian human resource portal CareerCenter. (Footnote 4: The Armenian dataset is available at:)
We decided to use the Armenian dataset for this crowd-sourcing experiment, since it lists job requirements in a separate field. Thus the workers do not need to read through the full ad, allowing us to annotate more ads and to collect a longer list of soft skills. The risk of using a different dataset is that some skills might only appear in the dataset from UK. However, this most likely only applies to very infrequent soft skills that would have little effect on the downstream analyses.
2.2 Soft Skill Mining
Our semi-automatic soft skill mining approach consists of the following steps: first, crowdworkers generate an initial set of potential soft skills, second, skills that seldom refer to candidates are removed, third, soft skills with a similar meaning are clustered into groups of skills, and fourth, soft skills are detected in new ads. These steps are summarized in Figure 1 and explained in more detail in the following sections.
The resulting soft skills and their clusters are available at: https://drive.google.com/drive/folders/1N1XkmgJ8awB9SgQjcdsYqMK7oQJAqtKo?usp=sharing.
2.2.1 Crowdsourcing a List of Soft Skills
The collection of soft skills was done through Figure Eight (formerly known as CrowdFlower) (footnote 5: https://www.figure-eight.com/), a crowdsourcing platform that allowed us to speed up our data collection process by submitting annotation tasks to online crowdworkers.
First, each worker was given the following definition of soft skills:
In a nutshell soft skills can be identified as qualities that do not depend on acquired knowledge; they complement hard skills (also known as technical skills). According to Wikipedia soft skills “are a combination of interpersonal people skills, social skills, communication skills, character traits, attitudes, social intelligence and emotional intelligence quotients”.
This was followed by a list of soft skill examples and instructions for completing the tasks. In particular, the workers were instructed to read the presented text, consisting of the “job description” and “required qualifications” fields, select whether the text contained any soft skills, and, if that was the case, they were instructed to copy and paste the smallest relevant part of text denoting each skill to an answer field. Additionally, the workers were instructed to remove unnecessary adjectives and complements, but not to alter the text in any other way. For instance, Excellent communication skills with customers and partners had to be reported as communication skills.
Before the actual annotation phase, the workers were supposed to pass a training phase and answer a set of test questions, for which we had provided the correct answers: they had to obtain an accuracy level of at least 60% to proceed further. These test questions also showed up randomly during the actual annotation phase to ensure that the minimum accuracy level of 60% was maintained.
In total, 1,650 job ads were annotated, each by at least 3 different workers. The annotation effort was conducted in two batches. In between the batches and, again, after the second batch, we computed the number of distinct soft skills as a function of the number of annotated ads, which is shown in Figure 2. The results show that the rate at which new soft skills are discovered slows down, although new skills were still found at the end of the data collection. However, when examining the skills found last, most of them turned out to be typos and other unwanted skills, which is why we decided to stop the annotation task after the second batch.
Accounting for typos as well as recurrent superfluous adjectives, such as excellent, highly, and very good, results were automatically cleaned using a script that removed the superfluous adjectives, extra whitespace and punctuation, and that corrected simple typos and misspellings by comparing the detected skill tokens to a whitelist of valid skill tokens. Thereafter, we manually reviewed the skills to remove all non-soft skills and to prune out tokens not relevant to the skill.
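The automatic cleaning step can be sketched as follows. This is a minimal illustration and not the script used in the study: the adjective list contains only the three examples named above, the whitelist is a hypothetical toy set, and fuzzy matching with `difflib` is one assumed way of comparing tokens to a whitelist.

```python
import difflib
import re

# Hypothetical inputs: only the three example modifiers named in the text,
# and a toy whitelist (the whitelist used in the study is not reproduced here).
SUPERFLUOUS = ("very good", "excellent", "highly")
WHITELIST = sorted({"communication", "skills", "team", "player", "motivated"})


def clean_skill(raw):
    """Normalize one crowdsourced skill phrase."""
    text = raw.lower()
    # Replace punctuation with spaces, keeping hyphens inside words.
    text = re.sub(r"[^\w\s-]", " ", text)
    # Remove superfluous modifiers, longest phrase first.
    for phrase in sorted(SUPERFLUOUS, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(phrase)}\b", " ", text)
    tokens = []
    for token in text.split():  # split() also collapses extra whitespace
        if token not in WHITELIST:
            # Assumed typo rule: swap in the closest whitelist token, if any.
            close = difflib.get_close_matches(token, WHITELIST, n=1, cutoff=0.8)
            if close:
                token = close[0]
        tokens.append(token)
    return " ".join(tokens)
```

For instance, `clean_skill("Excellent  comunication skills!")` yields `communication skills`. The similarity cutoff of 0.8 is an arbitrary choice for this sketch; a lower value corrects more typos but also risks rewriting valid tokens.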
The final manually curated collection included 948 unique soft skills.
2.2.2 Removing Ambiguous Soft Skills
The focus of this work is to analyze soft skill requirements for job applicants. However, often soft skill phrases in job ads do not refer to the required applicant characteristics, but they may also describe the working environment or something else. For instance, independent could be used to describe an “independent business” or a home care assistant might be required to “help people to remain independent in their own homes.” Therefore, it is crucial to be able to detect soft skills that refer to the candidate rather than something else.
To tackle this problem, we created another crowdsourcing task, instructing crowdworkers to annotate soft skill phrases in the context they appear, i.e. the job ads. We noticed that skills consisting of multiple tokens usually unambiguously refer to the candidate and therefore we only annotated the skills consisting of at most three words, that is, 582 out of the 948 skills found in the previous steps.
More specifically, for each one of these skills, we extracted 10 randomly sampled text snippets where the skill occurs, including 25 words before and after the skill. Then we asked crowdworkers to classify each snippet to one of the following three categories: Candidate, Company/company environment, or Other. At least three answers were recorded for each text snippet.
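The snippet sampling described above could look roughly like the following sketch. The function name, the whitespace tokenization, and the fixed random seed are assumptions made for illustration; punctuation attached to words is not handled.

```python
import random


def sample_snippets(ads, skill, window=25, k=10, seed=0):
    """Return up to k randomly chosen snippets with `window` words of context."""
    skill_tokens = skill.lower().split()
    n = len(skill_tokens)
    snippets = []
    for ad in ads:
        words = ad.split()
        lowered = [w.lower() for w in words]
        for i in range(len(words) - n + 1):
            if lowered[i:i + n] == skill_tokens:
                start = max(0, i - window)
                # Keep the skill plus `window` words on each side.
                snippets.append(" ".join(words[start:i + n + window]))
    rng = random.Random(seed)
    return rng.sample(snippets, min(k, len(snippets)))
```

Each returned snippet would then be shown to at least three crowdworkers for classification.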
Based on the annotations, we computed the following confidence score (footnote 6: http://success.crowdflower.com/hc/en-us/articles/201855939-How-to-Calculate-a-Confidence-Score) for each soft skill $s$:
$$\mathrm{conf}(s) = \frac{\sum_{w \in W_c(s)} t_w}{\sum_{w \in W(s)} t_w},$$
where $W_c(s)$ denotes the workers who classified an occurrence of skill $s$ as referring to a candidate, $W(s)$ denotes the workers who assessed an occurrence of skill $s$, and $t_w$ is the trust of worker $w$. The trust is calculated by the crowdsourcing platform as the contributor’s accuracy level in the current job, determined by their accuracy during the training phase, as explained in Section 2.2.1. Thus, the confidence score measures the proportion of votes for the Candidate category, weighted by the trust of the workers who gave the votes.
We included the skills with a confidence value of at least 0.7 in the final list of soft skills. This value allowed us to retain 81.3% of the annotated skills (8.3% of trigram, 10.3% of bigram and 40.1% of single-word skills were discarded) while still having a relatively high confidence that the retained soft skill phrases actually refer to the candidate.
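As a small illustration of the trust-weighted vote and the 0.7 cut-off, consider the sketch below. The vote data and trust values are invented for the example, and the data layout (a list of label and trust pairs per skill) is an assumption.

```python
THRESHOLD = 0.7  # minimum confidence stated in the text


def confidence(votes):
    """Trust-weighted share of 'Candidate' votes for one skill.

    votes: list of (label, trust) pairs over all annotated occurrences.
    """
    total = sum(trust for _, trust in votes)
    if total == 0:
        return 0.0
    candidate = sum(trust for label, trust in votes if label == "Candidate")
    return candidate / total


# Invented example votes (label, worker trust).
votes_by_skill = {
    "team player": [("Candidate", 0.9), ("Candidate", 0.8), ("Other", 0.6)],
    "independent": [("Candidate", 0.7), ("Company", 0.9), ("Company", 0.8)],
}
retained = [s for s, v in votes_by_skill.items() if confidence(v) >= THRESHOLD]
```

Here `team player` scores 1.7 / 2.3 ≈ 0.74 and is retained, while `independent` scores 0.7 / 2.4 ≈ 0.29 and is discarded as ambiguous.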
2.2.3 Soft Skill Clustering
Many of the soft skills collected by the crowdworkers are synonyms or near-synonyms. The different versions of a skill result, e.g., from diverse ways of expressing the concept (team-worker, ability to work in a team), or from slightly different spellings (able to work in team). To unify the different variants, the collected soft skills were clustered by first employing an algorithmic approach and then refining the clusters manually. After experimenting with a small subset of soft skills, different algorithms and parameter settings, we decided upon the following procedure.
Each soft skill was first represented in the vector space by averaging the word2vec embeddings of its tokens, excluding stopwords. We used 300-dimensional embeddings pre-trained on the GoogleNews dataset (footnote 7: Official archive available at https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM). Then, we employed an agglomerative clustering algorithm with average linkage and the cosine distance measure to cluster the embedding vectors. Finally, the clusters were reviewed and manually improved by split and merge operations and by reassigning some of the skills to more appropriate clusters, yielding a final list of 190 clusters.
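The algorithmic part of the clustering can be illustrated with a self-contained sketch. It makes several assumptions: the 3-dimensional vectors are toy stand-ins for the 300-dimensional word2vec embeddings, the stop word set is a tiny illustrative subset, and the stopping rule (a maximum linkage distance) is hypothetical, since the text does not state how the number of clusters was chosen before manual refinement.

```python
import math

# Toy stand-ins for pre-trained word2vec vectors (assumption).
EMBEDDINGS = {
    "team": (0.9, 0.1, 0.0), "player": (0.8, 0.2, 0.1),
    "worker": (0.85, 0.15, 0.05), "creative": (0.0, 0.9, 0.2),
    "innovative": (0.1, 0.95, 0.1),
}
STOPWORDS = {"to", "in", "a", "the", "of"}


def embed(skill):
    """Average the embeddings of a skill's known, non-stopword tokens."""
    vecs = [EMBEDDINGS[t] for t in skill.split()
            if t not in STOPWORDS and t in EMBEDDINGS]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def average_linkage(skills, max_distance):
    """Merge the closest pair of clusters until none is within max_distance."""
    vectors = {s: embed(s) for s in skills}
    clusters = [[s] for s in skills]

    def linkage(a, b):
        # Average pairwise cosine distance between members of two clusters.
        return sum(cosine_distance(vectors[x], vectors[y])
                   for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        dist, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if dist > max_distance:
            break
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


clusters = average_linkage(
    ["team player", "team worker", "creative", "innovative"], max_distance=0.1)
```

With these toy vectors the two team-related phrases end up in one cluster and `creative` and `innovative` in another. On real data, a library implementation of hierarchical clustering would be the more practical choice; the quadratic pair search here is only meant to make the linkage rule explicit.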
2.2.4 Soft Skill Detection
In the final phase, our goal was to detect skill clusters in each job ad.
First, we preprocessed the job descriptions and the list of soft skills by lowercasing and removing stop words (footnote 8: We used the list of English stop words from the NLTK package, http://www.nltk.org). We also removed the competence terms (able, skills, etc.) from most soft skills, if they were perceived as not being fundamental for skill identification, to avoid false negatives (e.g. capable of handling multiple tasks should match with abilities in handling multiple tasks). Still, for some skills, we kept the competence terms if the skills would otherwise have become too ambiguous, resulting in false positive detections (e.g. communication skills without the word skills would match with communication technologies).
Thereafter, we searched for each soft skill in each job description. If a skill consisted of multiple tokens, we allowed at most two extra words to occur before each token, in addition to stop words, which could be removed from certain skills without making them ambiguous. We also experimented with a more liberal way of matching skills, ignoring the word order of the skill tokens, but this was found to decrease the precision of the detected skills significantly.
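The ordered matching with a bounded number of extra words can be sketched as below. This is one possible reading of the rule, not the authors' code: the stop word set is a small illustrative subset rather than the NLTK list, and the tokenization is deliberately simple.

```python
# Illustrative subset of stop words (the study used NLTK's English list).
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "with"}


def preprocess(text):
    """Lowercase, strip surrounding punctuation, and drop stop words."""
    tokens = (t.strip(".,;:!?()'\"") for t in text.lower().split())
    return [t for t in tokens if t and t not in STOPWORDS]


def contains_skill(ad_tokens, skill_tokens, max_gap=2):
    """True if the skill tokens occur in order, with at most `max_gap`
    extra words before each token after the first."""
    def match_from(pos, k):
        if k == len(skill_tokens):
            return True
        # The next skill token may sit at pos, pos + 1, ..., pos + max_gap.
        for nxt in range(pos, min(pos + max_gap + 1, len(ad_tokens))):
            if ad_tokens[nxt] == skill_tokens[k] and match_from(nxt + 1, k + 1):
                return True
        return False

    return any(ad_tokens[i] == skill_tokens[0] and match_from(i + 1, 1)
               for i in range(len(ad_tokens)))
```

For example, with the skill tokens `["handling", "multiple", "tasks"]`, the ad text "Capable of handling several multiple tasks" matches, whereas a text with the tokens in reverse order does not, which mirrors the decision to respect word order.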
Soft skills were detected in 78% of the ads, with 45.5% mentioning at least 3 soft skills, attesting to the importance of soft skills in the labour market.
3 Salary and Soft Skills
One of our main research questions is how the presence of certain soft skills may affect wages.
By analyzing the annual salaries of job ads, we see that low-paid jobs contain, on average, more soft skills (3.52) than average-paid (3.38) and high-paid ones (2.94) (footnote 9: Based on the annual salary median, job ads were grouped as follows: low pay if below £20,000; high pay if above £50,000; average pay otherwise). All pairwise differences between salary groups are statistically significant according to Tukey’s honest significance test.
While the higher prevalence of soft skills in low paid jobs is interesting by itself, it does not reveal which soft skills tend to be associated with wage premiums and which ones with wage penalties. To address this question we conduct a matching study.
3.1 Matching Study
In order to study the link between soft skill requirements in a job ad and the respective salary (footnote 10: The job ads do not mention the exact annual salary but only a range, so we use the median of the range as the job salary), we conduct a matching study. The benefit of matching is that, in pairing a treated job ad (i.e. an ad with a given job title and job category that contains a specific skill) with its counterfactual (i.e. an ad with the same title and category but without the specific skill), we can control for a range of unobserved job category characteristics. Therefore, our results can be considered as coming close to being causal.
The specific matching strategy applied in this article is as follows: first, we group ads having the same job category $c$ and job title $t$, ignoring stop words and the word order of the title. We picked all titles occurring at least twice, resulting in 34,071 distinct titles and 158,658 ads. Given a soft skill $s$, a normalized salary reward is defined as
$$r_{c,t}(s) = \frac{\bar{w}^{+}_{c,t}(s) - \bar{w}^{-}_{c,t}(s)}{\bar{w}^{-}_{c,t}(s)},$$
where $\bar{w}^{+}_{c,t}(s)$ and $\bar{w}^{-}_{c,t}(s)$ are the average salaries of job ads belonging to job category $c$, having job title $t$, and containing or not containing skill $s$, respectively.
For example, in our dataset there are 291 “Software Engineer” job ads in the IT Jobs category, out of which 6 contain the soft skill leadership. The average salary of these 6 positions is £46,217 per year, whereas the average salary for the other 285 positions is £40,010 per year. This means that the salary reward for leadership in the Software Engineer / IT Jobs category is
$$r_{c,t}(s) = \frac{46{,}217 - 40{,}010}{40{,}010} \approx 0.155,$$
suggesting that software engineering positions that require leadership skills usually pay more than other software engineering positions.
Given the individual salary rewards, the overall salary reward of soft skill $s$ is obtained by averaging the rewards over all possible job titles and categories:
$$R(s) = \frac{\sum_{c,t} \min\left(n^{+}_{c,t}(s),\, n^{-}_{c,t}(s)\right)\, r_{c,t}(s)}{\sum_{c,t} \min\left(n^{+}_{c,t}(s),\, n^{-}_{c,t}(s)\right)},$$
where $n^{+}_{c,t}(s)$ and $n^{-}_{c,t}(s)$ are the number of job ads belonging to job category $c$, having job title $t$, and containing or not containing skill $s$, respectively. Individual rewards are weighted by the number of ads to avoid letting infrequent job titles have a disproportionately large effect on the overall reward. In most cases, $n^{+}_{c,t}(s) < n^{-}_{c,t}(s)$, since typically less than half of the ads from any category contain a given soft skill. Thus, the individual rewards are typically weighted by the number of ads containing the skill.
A positive reward $R(s)$ indicates that job ads that mention skill $s$ have on average a higher salary than other job ads from the same job category and the same job title that do not mention $s$.
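The reward computation can be made concrete with a short sketch. Two points are assumptions rather than statements from the text: the individual reward is taken as the salary difference relative to the average salary of ads without the skill, and each title group is weighted by the smaller of its two ad counts. The second group in the example is invented.

```python
def individual_reward(mean_with, mean_without):
    """Relative salary difference for one (category, title) group.

    Assumption: normalized by the mean salary of ads without the skill.
    """
    return (mean_with - mean_without) / mean_without


def overall_reward(groups):
    """Weighted average of individual rewards.

    groups: iterable of (mean_with, n_with, mean_without, n_without).
    Assumption: each group is weighted by min(n_with, n_without).
    """
    numerator = denominator = 0.0
    for mean_with, n_with, mean_without, n_without in groups:
        weight = min(n_with, n_without)
        if weight == 0:
            continue  # no treated or no counterfactual ads in this group
        numerator += weight * individual_reward(mean_with, mean_without)
        denominator += weight
    return numerator / denominator if denominator else float("nan")


# Software Engineer / IT Jobs example from the text, plus one invented group.
leadership_groups = [
    (46217, 6, 40010, 285),
    (30000, 2, 30000, 10),
]
example_reward = individual_reward(46217, 40010)
example_overall = overall_reward(leadership_groups)
```

Under these assumptions the Software Engineer example gives a reward of about 0.155, and adding the invented zero-reward group with weight 2 lowers the overall value to about 0.116.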
The top skills that are associated with wage premiums or penalties are shown in Table 2. Most of the soft skills associated with wage premiums can also be considered to be a requirement for higher occupational positions. Soft skills such as delegation skills, team building skills and leadership imply that a certain kind of supervision and authority toward others is required. In contrast, listening skills, willingness to learn, as well as being punctual, describe skills that entail a certain degree of subordination.
Our empirical observation that soft skills associated with wage premiums are also closely tied to leadership positions is in accordance with sociological occupational class theories. Previous research on occupational classes has identified the magnitude of a job’s authority as one of the key determinants in assessing the job’s position in the occupational class system [20, 21, 22]. Jobs that entail a high degree of authority also occupy a strategic position in the labour market: by monitoring their subordinates, employees in leadership positions ensure that a firm produces surplus. Given this powerful position, high degrees of authority entail a significant degree of bargaining power and thereby the possibility to demand higher than average wages.
Additional supporting evidence for this particular reading of the results comes from psychology. We find that character traits associated with wage premiums in our data, for instance delegation skills, team building skills, and strategic planning, are closely connected to skills that psychological research has identified as leadership characteristics, i.e. management of personnel, visioning, as well as general strategic skills.
What is striking is that many of the aforementioned skills in Table 2 also correspond to gender stereotypes. Gender stereotypes are generalizations about commonly shared perceptions of female and male attributes. Previous research has shown that while women are described as embodying “communal behavior”, such as kindness, loyalty, and warmth, men are characterized by “agentic traits”, such as competitiveness and aggressiveness, and as possessing leadership abilities. The typical “agentic” traits, competitive and aggressive, have been filtered out as ambiguous (see Section 2.2.2), since they typically do not describe the desired characteristics of the job applicant. However, we find several leadership traits to be associated with higher wages based on Table 2. Moreover, “communal behavior” seems to be associated with wage penalties in Table 2 across the board (for instance: polite, dedication, friendly personality, and being calm).
Thus, Table 2 provides evidence that male gender stereotypes are connected to wage premiums, whereas female gender stereotypes are connected to wage penalties in the labor market. To scrutinize this issue further, we examine the association between gender stereotypes and wages in more detail in what follows.
|Soft skill||Reward||Count|
|team building skills||9.8||50|
|ability to work in a fastpaced environment||8.0||51|
|ability to improve skills||6.0||108|
|ability to identify problems||-3.1||132|
|willingness to learn||-2.2||1652|
4 Gender and Soft Skills
Our second main goal was to explore differences in the soft skills that are mentioned in female- and male-dominated professions and to investigate to what extent gender stereotypes manifest in job ads.
4.1 Industry Gender Composition Prediction
In what follows, we test whether soft skills can predict the gender composition of a job category. The proportion of women for each job category was approximated by mapping the job categories in our data to the nearest categories from UK Labour Market statistics, as shown in Table 3 (footnote 11: https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/datasets/employeesandselfemployedbyindustryemp14). The average number of soft skills mentioned per ad differs between male-dominated and female-dominated categories, and the difference between the means is statistically significant (one-tailed $t$-test with equal variances). The goal of our analysis was to predict the proportion of women in the category of a job ad based on the soft skills mentioned in the ad. The predictions were obtained using ordinary least squares (OLS) regression over job ads containing at least 3 different soft skills. The model obtained an $R^2$ score of 0.11.
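To make the regression setup concrete, the sketch below fits an OLS model on binary skill indicators and computes $R^2$. It is a toy illustration under stated assumptions: each ad is represented by 0/1 indicators for two hypothetical skill clusters, the target values are invented, and the normal equations are solved directly, whereas a statistics package would normally be used in practice.

```python
def ols_fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + [float(v) for v in x] for x in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    # (assumes X'X is non-singular).
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y, beta):
    """Coefficient of determination of the fitted model."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - q) ** 2 for t, q in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Invented toy data: columns are indicators for two hypothetical skills,
# y is the proportion of women in each ad's job category.
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
y = [0.7, 0.7, 0.2, 0.2, 0.5, 0.4]
beta = ols_fit(X, y)
```

In this toy data the first skill has a positive coefficient (pointing towards female-dominated categories) and the second a negative one. The toy targets are exactly linear in the indicators, so the fit is perfect here, unlike the $R^2$ of 0.11 reported for the real data.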
|Job Category||ONS Category||% of women|
|Social work Jobs||Human health & social work activities||80.62|
|Healthcare & Nursing Jobs||Human health & social work activities||80.62|
|Charity & Voluntary Jobs||Human health & social work activities||80.62|
|Property Jobs||Real estate activities||57.6|
|Legal Jobs||Public admin & defence; social security||56.02|
|Creative & Design Jobs||Other||53.29|
|Domestic help & Cleaning Jobs||Accommodation and food services||53.18|
|Hospitality & Catering Jobs||Accommodation and food services||53.18|
|Maintenance Jobs||Wholesale, retail & repair of motor vehicles||48.23|
|Sales Jobs||Wholesale, retail & repair of motor vehicles||48.23|
|Retail Jobs||Wholesale, retail & repair of motor vehicles||48.23|
|Accounting & Finance Jobs||Financial & insurance activities||46.45|
|IT Jobs||Professional, scientific & technical activities||45.72|
|Engineering Jobs||Professional, scientific & technical activities||45.72|
|Scientific & QA Jobs||Professional, scientific & technical activities||45.72|
|HR & Recruitment Jobs||Administrative & support services||44.4|
|Customer Services Jobs||Administrative & support services||44.4|
|Admin Jobs||Administrative & support services||44.4|
|PR, Advertising & Marketing Jobs||Information & communication||29.77|
|Consultancy Jobs||Information & communication||29.77|
|Logistics & Warehouse Jobs||Transport & storage||23.93|
|Energy, Oil & Gas Jobs||Mining, energy and water supply||21.76|
|Trade & Construction Jobs||Construction||19.21|
|Part time Jobs||-||N/A|
Table 4 shows the regression coefficients for soft skill clusters that occur more than 50 times and whose coefficients are statistically significant. Positive coefficients are associated with female-dominated jobs and negative coefficients with male-dominated jobs.
|Coefficient||Count||Reward||Soft skill cluster|
|0.192||370||0.3||ability to work with children|
|0.059||290||-0.7||ability to maintain confidentiality|
|0.046||635||4.2||ability to adapt|
|0.040||1864||-0.3||flexible with hours|
|-0.040||269||2.2||ability to win new business|
|-0.037||200||3.7||ability to lead project teams|
|-0.034||149||2.7||ability to present ideas|
|-0.028||7191||-1.1||attention to detail|
A high proportion of women in a job category is associated with soft skills such as empathy, respectful, sensitivity and dedication. On the other hand, skills such as marketing skills, ability to win new business, ability to lead project teams and analytical skills are negatively associated with the proportion of women within a job category, meaning that they are predictive of male-dominated job categories. These results illustrate that, with a few exceptions (e.g. delegation skills and managerial skills), the soft skills that are closely associated with gender stereotypes are predictive of the job’s gender composition. Thus, not only do skills associated with women get lower rewards in labor markets (as shown in Table 2), but we also find that soft skills stereotyped as being female are distinctive of the gender composition within a job. Put differently, not only does one get paid less if one is carrying out tasks connoted as being female, but occupations carried out mainly by women also require those skills that are associated with wage penalties.
Moreover, our finding that there are two exceptions, i.e. delegation skills and managerial skills which are soft skills that are associated with leadership (male) stereotypes but still predict a high proportion of women in an occupation, is in line with previous research. Evidence was found that if stereotypical roles are swapped, women adopt male stereotypes, while men do not take on stereotypical female roles [26, 16].
4.2 Occupational segregation and gender-stereotypical soft skills
To further back up the claim that the gender composition of an occupation is shaped by gender stereotypes, we mapped our soft skill clusters to lists of twenty personality characteristics desired in men and another twenty characteristics desired in women, the so-called Bem Sex Role Inventory. Out of these, we were able to map five feminine and seven masculine characteristics to similar soft skill clusters, as shown in Table 5 (footnote 12: Additionally, we found the following four matches: Act as a leader → leadership, Self-reliant → confident, Cheerful → cheerful personality, and Sympathetic → sympathy. These were, however, left out of our analysis since the former two soft skills had already been assigned to other similar stereotypes and the latter two have insufficient sample sizes of Count=3 and Count=4, respectively). Based on the mappings, we set out to study the prevalence of the gender-stereotypical soft skills in the job ads from female- and male-dominated industries. The percentage of ads containing a skill within the ads from female (male) dominated industries is denoted by $P_f$ ($P_m$). In the last column of Table 5 we show the percentage difference between these two percentages. A positive value means that the skill is used more in female-dominated industries and a negative value that it is used more in male-dominated industries.
|Gender stereotype (Bem, 1974)||Mapped Skill Cluster||Reward||$P_f$ (%)||$P_m$ (%)||Difference (%)|
|Does not use harsh language||polite||-5.9||0.25||0.22||13.1|
|Loves children||ability to work with children||0.3||2.13||0.07||96.8|
|Sensitive to the needs of others||sensitivity||3.0||0.22||0.10||52.5|
|Has leadership abilities||leadership||7.4||9.85||5.94||39.7|
|Independent||capability to work independently||1.9||1.17||1.11||5.4|
|Makes decisions easily||make decisions||3.0||1.25||1.08||13.1|
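The last column of Table 5 can be approximately reproduced from the two prevalence columns. The normalization used below, dividing by the larger of the two percentages, is an assumption inferred from the rows shown and is not stated in the text; small deviations from the table are expected because the prevalence values are rounded.

```python
def percentage_difference(pct_female, pct_male):
    """Signed difference between two prevalences, in percent.

    Assumption: the difference is taken relative to the larger value, so the
    result lies between -100 and 100 and is positive when the skill is more
    prevalent in female-dominated industries.
    """
    larger = max(pct_female, pct_male)
    if larger == 0:
        return 0.0
    return 100.0 * (pct_female - pct_male) / larger


# Leadership row of Table 5: 9.85% vs. 5.94% of ads.
leadership_diff = percentage_difference(9.85, 5.94)
```

For the leadership row this gives roughly 39.7, in line with the table.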
All feminine skills are more prevalent in female-dominated industries, whereas for masculine skills the picture is not as clear. For instance, analytical skills are used more than five times more often in male-dominated industries, while leadership is used almost two times more often in female-dominated industries, although both of these skills are stereotypically masculine according to Bem. Again, this finding is in line with previous research, where evidence was found that although women will adopt male stereotypes, this is not true the other way around [26, 16] (see also Section 4.1).
Our findings have implications for occupational sex segregation, that is, the unequal distribution of men and women across occupations in the labour market. Advertising female- or male-dominated jobs in accordance with the associated gender stereotypes reproduces cultural beliefs about these stereotypes and upholds the gender-typicality of occupations. Previous research has shown that cultural beliefs about gender stereotypes influence the self-assessment of men and women [27, 15]. These biased self-assessments have been shown to be a crucial factor in career choices. Our evidence supports this notion of culturally influenced self-assessments by illustrating that job advertisements that include female stereotypes belong to occupations dominated by women, and vice versa for male stereotypes.
The results thereby suggest the importance of gender stereotypes in the reproduction of occupational segregation, i.e. the demand-side, and the corresponding selection of men and women in different occupations. Women seem to be overrepresented in occupations, in which positions are advertised using female stereotypes and vice versa for men. However, it is important to note that while our results establish a correlation between the usage of stereotypical soft skills and occupational segregation, studying the causal mechanisms between the two is beyond the scope of this paper. Nevertheless, this work supplements the much richer account of research examining the supply side of the unequal distribution of men and women across occupations, namely the influence of gendered individual preferences and respective assessments of one’s own skills and capacities [14, 15] by showing a connection between the demand-side, i.e. job ads, and occupational segregation.
4.3 Gendered Soft Skills and Salary
Next, we examine if there are different salary rewards for soft skills that are associated with either a high percentage of women or men in an occupation. Table 4 illustrates that out of the 12 soft skills associated with a high share of women, 6 come about with wage penalties, while this is only the case for 4 out of 12 soft skills in male-dominated job categories.
As mentioned above, many of the detected soft skills correspond to common gender stereotypes. In Table 4 we find these stereotypes to be an important factor in determining the gender ratio of a specific job. Gender stereotypes, however, have also been found to influence wages. More specifically, tasks that are linked with typically “female” responsibilities are associated with wage penalties . One explanation for the devaluation of “female” tasks lies in the prescribed lower status of women, i.e. gender status beliefs. Gender status beliefs are diffuse cultural beliefs according to which men are rated as more competent than women. These beliefs are transferred to the labor market and thereby facilitate a devaluation of typically “female” characteristics and tasks in the workplace .
The rewards in Table 4 illustrate that soft skills that reflect gender stereotypes about women, such as being respectful, empathy and dedication, are predominantly associated with wage penalties (with the exception of sensitivity). These results thereby affirm the preliminary findings from Table 2, namely that it is typically female skills that become devalued in labor markets. This is in line with prior research that found evidence that, net of individual labour-market-relevant characteristics such as work experience, tasks identified as being “female” are associated with wage penalties [28, 11, 13].
When it comes to soft skills found in male-dominated jobs, the table shows that skills identified in the previous section as being associated with leadership, which is also stereotypically ascribed to men, do come with wage premiums (i.e., ability to win new business, ability to lead project teams, and ability to present ideas). However, we also find that leadership skills associated with female-dominated occupations, such as delegation skills and managerial skills, are also associated with wage premiums. This means that soft skills that are associated with a high share of women in an occupation carry wage penalties more often than soft skills that are associated with a high percentage of male incumbents. However, if soft skills in female-dominated occupations represent leadership skills, they can also carry wage premiums.
To collect more data on the association between sex-typed gender stereotypes and wage penalties or premiums, we calculated the salary rewards of the soft skill clusters that we found congruent with the personality traits from the Sex Role Inventory by Bem . The rewards are provided in Table 5. We find that all masculine skills are associated with a positive reward, whereas 3 out of 5 feminine skills are associated with a penalty. The average rewards for masculine and feminine skills are 2.6 and -1.7, respectively. This difference is statistically significant (one-tailed t-test with equal variances). This suggests that stereotypically masculine character traits are valued more in the workplace than feminine character traits.
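The comparison of average rewards described above can be illustrated with a minimal sketch of a one-tailed, equal-variance (pooled) t-test. The reward values below are hypothetical and are not the entries of Table 5; they were chosen only so that the group means equal the reported 2.6 and -1.7, and the number of masculine skills is likewise an assumption. The function names are illustrative, and the p-value is obtained by simple numerical integration rather than by a statistics library.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return Student's t statistic (equal variances assumed) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p_value(t, df, steps=10_000):
    """Upper-tail probability P(T >= t), using Simpson's rule on the interval [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3


# Hypothetical rewards, NOT the values from Table 5.
masculine_rewards = [1.0, 2.0, 3.0, 4.0, 3.0]
feminine_rewards = [-4.0, -3.0, -2.5, 0.5, 0.5]

t_stat, dof = pooled_t_statistic(masculine_rewards, feminine_rewards)
p_value = one_tailed_p_value(t_stat, dof)
print(f"t = {t_stat:.3f}, df = {dof}, one-tailed p = {p_value:.4f}")
```

The one-tailed variant matches the directional hypothesis that masculine skills receive higher rewards than feminine skills; a two-tailed test would double the reported p-value.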
Based on the evidence provided we find that the devaluation of women is mainly realized via gender stereotypes, while skills associated with male stereotypes, i.e. leadership skills, do receive wage premiums.
5 Related Work
Next, we describe existing work in which the job market is analyzed from a machine learning perspective. Related work from sociology, economics and psychology has been discussed earlier in Sections 1, 3, and 4.
The curation of hard skills has been addressed earlier, and extensive lists of hard skills are widely provided and employed, e.g., by LinkedIn . Kivimäki et al.  suggested a system for the automatic detection of new skills in free written text using a spreading-activation algorithm. In addition to mining skills, they studied the tagging of skills with relevant industry labels to reduce possible ambiguity. Recently, Haranko et al.  suggested a novel approach for collecting data on skills and gender imbalances through LinkedIn’s advertising platform.
Job salary prediction has been addressed using at least two public datasets: the UCI Census Income dataset (https://archive.ics.uci.edu/ml/datasets/census+income) and the UK dataset from Kaggle used in this paper. In the UCI Census Income dataset, the salary label, indicating whether a person's income is above $50,000 per year, is predicted based on demographic information. Chakraborti compared different machine-learning algorithms for predicting the salary labels, obtaining an F-measure of 0.858 using a decision tree classifier.
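For readers less familiar with the metric, the F-measure mentioned above is the harmonic mean of precision and recall. The following is a minimal sketch with hypothetical confusion counts; it does not reproduce the cited evaluation, and how the reported 0.858 was averaged over classes is not stated here.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall for the positive class (e.g. 'salary above $50,000')."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the usual harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts: 6 true positives, 2 false positives, 4 false negatives.
precision, recall = precision_recall(true_pos=6, false_pos=2, false_neg=4)
f1 = f_measure(precision, recall)
print(f"precision = {precision:.3f}, recall = {recall:.3f}, F1 = {f1:.3f}")
```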
The more recent UK dataset can be used for job salary prediction using features from the job ad, such as job description, category, contract type, etc. The best solution in this Kaggle competition was achieved using a deep neural network, yielding a mean absolute error of £3,465 per year.
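The mean absolute error used to score such predictions is the average absolute difference between predicted and advertised salaries. A minimal sketch with hypothetical salaries (in pounds per year; the values are illustrative and not taken from the Kaggle data):

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical advertised and predicted salaries in pounds per year.
advertised = [30_000, 45_000, 25_000]
predicted = [32_000, 41_000, 26_000]

mae = mean_absolute_error(advertised, predicted)
print(f"MAE = {mae:.2f} pounds per year")
```

Because the error is expressed in the same unit as the salary, a value such as £3,465 can be read directly as the typical size of a prediction miss.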
Xiao  analyzed the Chinese job market to predict salary increases over 6 years, based on demographic characteristics, education, previous work experience and company characteristics. This study showed that formal education has a positive impact on the employer's hiring decision and on the initial salary, rather than on subsequent salary increases. Another finding of this study is that improved technical proficiency through changing production technology contributed positively to salary growth. At the same time, the study mentions substantial unexplained variation in salary increases that might be caused by personal characteristics not taken into account in that study.
6 Discussion and Conclusions
This study examined soft skills in the labour market and showed that soft skills are a crucial component of job ads, especially of low-paid jobs and jobs in female-dominated professions, and may therefore perpetuate labour market inequalities. To explore how soft skills shape labor market outcomes, with special emphasis on salary rewards (or penalties) and gendered labour market composition, we developed a semi-automatic approach for mining soft skills from job advertisements.
We would like to highlight three key findings of our study:
We found that not all soft skills are valued equally in the labour market: some are associated with wage premiums, while others are linked to wage penalties.
Some soft skills are significant predictors of a job’s gender composition. Utilizing solely soft skills, we can explain 11% of the variation in the gender composition of job categories. Soft skills that are associated with gender stereotypes, such as empathy and sensitivity for women, are significant predictors for a high percentage of women in the respective jobs, and vice versa is found for characteristics perceived as being “male”.
However, the selection of men and women into different occupations would in itself not be crucial for labour market inequality, as long as this segregation only implies that men and women work in different occupations and no other repercussions are attached. Previous research, however, has pointed out that wages paid in female-dominated occupations are lower than in male-dominated occupations [35, 36, 37]. Sex segregation in the labour market is thus perceived as being a crucial factor in perpetuating wage differentials between men and women. Therefore, our results suggest that gender stereotypical job ads may contribute to a selection of women into lower paying occupations, thus upholding gender wage inequalities in labour markets. (However, assessing the causal mechanism of wording in job ads on occupational sex segregation is beyond the scope of this paper. Providing conclusive evidence for this research question would call for a more nuanced, i.e. longitudinal, analysis.)
Typically “female” soft skills, i.e. prescribed stereotypes about women, are mostly associated with wage penalties, while soft skills associated with leadership, and as such stereotypes that are associated with men, come with wage premiums—even after controlling for the job title and job category.
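The share of explained variation cited among the key findings above (11%) is a coefficient of determination, R². As a minimal sketch of how such a figure arises, the example below regresses a hypothetical share of women per job category on a single 0/1 indicator for whether a soft skill appears in the ads; the data and function names are illustrative assumptions and do not reproduce the model used in this paper, which draws on many soft skills.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    grand_mean = mean(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - grand_mean) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


def fit_binary_indicator(has_skill, outcome):
    """Least-squares fit on one 0/1 indicator: each prediction is its group's mean."""
    group_mean = {
        flag: mean(y for f, y in zip(has_skill, outcome) if f == flag)
        for flag in set(has_skill)
    }
    return [group_mean[f] for f in has_skill]


# Hypothetical job categories: does the skill appear, and the share of women.
has_skill = [1, 1, 1, 0, 0, 0]
share_women = [0.6, 0.7, 0.5, 0.4, 0.5, 0.3]

fitted = fit_binary_indicator(has_skill, share_women)
r2 = r_squared(share_women, fitted)
print(f"R^2 = {r2:.2f}")
```

An R² of 0 corresponds to predicting the overall mean for every job category, so any positive value measures how much better the soft-skill indicator does than that baseline.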
Although, by drawing on empirical research from psychology, we could explain which tasks are associated with being “male” or “female”, we believe that certain soft skills, such as being respectful and being curious, are probably important in any kind of job. Given this assumption, it is all the more striking to find that while the former is associated with a high percentage of women in an occupation and with wage penalties, the latter is associated with wage premiums and is found in job ads for male-dominated occupations. This hints, as discussed, at a general devaluation of tasks carried out by women in labour markets.
This study is not without limitations. Next, we discuss these limitations and briefly consider how they can be addressed in future research.
First, distinguishing between soft skills that are a necessity for a certain job and soft skills that are merely an asset was beyond the scope of the paper. The accuracy of the soft skill detection, as well as the distinction between a soft skill being an asset or a necessity, could be improved by considering part-of-speech features.
Second, although we were able to account for a considerable degree of unobserved occupational heterogeneity by using matching techniques, in order to rigorously test the impact of soft skills on wages, one would need to analyze whether wage premiums or penalties associated with certain soft skills hold net of individual labor-market-relevant attributes. Given previous evidence that tasks associated with being “female”, such as “nurturing skills”, do impose a penalty on wages, net of individual characteristics, it is plausible that our results would be stable as well. In future research this could be tested by linking the soft skills to individual survey data, such as the British Household Panel Survey (BHPS).
Regardless of these limitations, this study contributes to the understanding of the impact of soft skills in the labour market. Combining computational methods with theoretical and empirical insights from economics, sociology and psychology enabled us to shed more light on how soft skills operate in the labour market. We showed that soft skills are a crucial component of job ads, especially of low-paid jobs and jobs in female-dominated professions, and we found evidence that soft skills uphold gender segregation across occupations and that they preserve wage inequalities between men and women by rewarding typically “male” characteristics while penalizing “female” traits.
Grugulis and Vincent [4, p.599] put it this way: “When it is an individual character that is being judged, evaluations based on gender and race are far more likely”. Put differently, personal traits and characteristics, namely soft skills, are hard to evaluate and are thus likely to be judged via proxies such as gender or race and the associated stereotypes, which in turn leads to discrimination. Our results support this observation, as they suggest that the increasing importance of soft skills is polarizing labour market outcomes in terms of wages and occupational segregation. This polarization hits women, as an already vulnerable group in labour markets, the hardest.
We are grateful to Olaf Groh-Samberg, Karin Gottschall, and Matti Nelimarkka for their invaluable feedback on previous versions of the article.
-  Lucas, S.: Retail and Food Services Most at Risk From Soft Skills Deficit. https://developmenteconomics.co.uk/retail-and-food-services-most-at-risk-from-soft-skills-deficit/. Accessed: 2017-10-30 (2015)
-  Bortz, D.: Soft skills to help your career hit the big time. https://www.monster.com/career-advice/article/soft-skills-you-need (2014)
-  Bakhshi, H., Downing, J.M., Osborne, M.A., Schneider, P.: The future of skills employment in 2030. Technical report, Pearson PLC (2017)
-  Grugulis, I., Vincent, S.: Whose skill is it anyway? ’soft’ skills and polarization. Work, employment and society 23(4), 597–615 (2009)
-  Shibata, H.: Productivity and skill at a Japanese transplant and its parent company. Work and Occupations 28(2), 234–260 (2001)
-  Moss, P., Tilly, C.: “soft” skills and race: An investigation of black men’s employment problems. Work and Occupations 23(3), 252–276 (1996)
-  Xie, W.-J., Yang, Y.-H., Li, M.-X., Jiang, Z.-Q., Zhou, W.-X.: Individual position diversity in dependence socioeconomic networks increases economic output. EPJ Data Science 6(1), 10 (2017)
-  Ridgeway, C.L.: Interaction and the conservation of gender inequality: Considering employment. American Sociological Review, 218–235 (1997)
-  Ceci, S.J., Williams, W.M.: Understanding current causes of women’s underrepresentation in science. Proceedings of the National Academy of Sciences 108(8), 3157–3162 (2011)
-  Lester, J.: Women in male-dominated career and technical education programs at community colleges: Barriers to participation and success. Journal of Women and Minorities in Science and Engineering 16(1) (2010)
-  Kilbourne, B.S., England, P., Farkas, G., Beron, K., Weir, D.: Returns to skill, compensating differentials, and gender bias: Effects of occupational characteristics on the wages of white women and men. American Journal of Sociology 100(3), 689–719 (1994)
-  Charles, M., Grusky, D.B.: Occupational ghettos: The worldwide segregation of women and men. Stanford University Press (2004)
-  England, P., Herbert, M.S., Kilbourne, B.S., Reid, L.L., Megdal, L.M.: The gendered valuation of occupations and skills: Earnings in 1980 census occupations. Social Forces 73(1), 65–100 (1994)
-  Busch-Heizmann, A.: Supply-side explanations for occupational gender segregation: adolescents’ work values and gender-(a) typical occupational aspirations. European Sociological Review 31(1), 48–64 (2014)
-  Correll, S.J.: Gender and the career choice process: The role of biased self-assessments. American journal of Sociology 106(6), 1691–1730 (2001)
-  Levanon, A., Grusky, D.B.: The persistence of extreme gender segregation in the twenty-first century. American Journal of Sociology 122(2), 573–619 (2016)
-  Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
-  Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55 (1983)
-  Angrist, J.D., Pischke, J.S.: Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press (2009)
-  Hertel, F.R.: Social Mobility in the 20th Century: Class Mobility and Occupational Change in the United States and Germany. Springer (2016)
-  Oesch, D.: Redrawing the Class Map: Stratification and Institutions in Britain, Germany, Sweden and Switzerland. Palgrave Macmillan UK (2006)
-  Wright, E.O.: Class counts: Comparative studies in class analysis. Cambridge University Press (1997)
-  Mumford, T.V., Campion, M.A., Morgeson, F.P.: The leadership skills strataplex: Leadership skill requirements across organizational levels. The Leadership Quarterly 18(2), 154–166 (2007)
-  Rudman, L.A., Glick, P.: Prescriptive gender stereotypes and backlash toward agentic women. Journal of social issues 57(4), 743–762 (2001)
-  Bem, S.L.: The measurement of psychological androgyny. Journal of consulting and clinical psychology 42(2), 155–162 (1974)
-  England, P.: The gender revolution: Uneven and stalled. Gender & society 24(2), 149–166 (2010)
-  Correll, S.J.: Constraints into preferences: Gender, status, and emerging career aspirations. American sociological review 69(1), 93–113 (2004)
-  England, P.: Comparable worth: Theories and evidence. Transaction Publishers (1992)
-  Bastian, M., Hayes, M., Vaughan, W., Shah, S., Skomoroch, P., Kim, H., Uryasev, S., Lloyd, C.: LinkedIn skills: large-scale topic extraction and inference. In: Proceedings of the 8th ACM Conference on Recommender Systems, pp. 1–8 (2014). ACM
-  Kivimäki, I., Panchenko, A., Dessy, A., Verdegem, D., Francq, P., Bersini, H., Saerens, M.: A graph-based approach to skill extraction from text. Proceedings of TextGraphs-8 Graph-based Methods for NLP, 79–87 (2013)
-  Haranko, K., Zagheni, E., Garimella, K., Weber, I.: Professional gender gaps across US cities. arXiv:1801.09429 (2018)
-  Chakraborti, S.: A comparative study of performances of various classification algorithms for predicting salary classes of employees. In: International Journal of Computer Science and Information Technologies, vol. 5, pp. 1964–1972 (2014)
-  Mnih, V.: Q&A With Job Salary Prediction First Prize Winner Vlad Mnih. http://blog.kaggle.com/2013/05/06/qa-with-job-salary-prediction-first-prize-winner-vlad-mnih/. Accessed: 2017-10-31 (2013)
-  Xiao, J.: Determinants of salary growth in Shenzhen, China: an analysis of formal education, on-the-job training, and adult education with a three-level model. Economics of Education Review 21(6), 557–577 (2002)
-  Levanon, A., England, P., Allison, P.: Occupational feminization and pay: Assessing causal dynamics using 1950–2000 US census data. Social Forces 88(2), 865–891 (2009)
-  Mandel, H.: Up the down staircase: Women’s upward mobility and the wage penalty for occupational feminization, 1970-2007. Social Forces 91(4), 1183–1207 (2013)
-  Murphy, E., Oesch, D.: The feminization of occupations and change in wages: A panel analysis of Britain, Germany, and Switzerland. Social Forces 94(3), 1221–1255 (2015)