The Internet has become the primary channel for disseminating information in many areas of society. This is the case for job advertisements (ads), where approximately 60% of Australian job ads are posted online. At aggregate levels, online job ads can provide valuable indicators of relative labour demands. Rather than relying solely on lagging indicators from labour market surveys, online job ads data can reveal shifting labour demands as they occur. This can provide policy-makers, researchers, and businesses with additional data points to assess the health and dynamics of labour markets.
Real-time labour demand data is essential for Data Science and Analytics (DSA) occupations because of how rapidly DSA skills are evolving and diffusing into other occupational classes. In this research, DSA skills refer to the use of scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data, which can be used to make data-driven decisions and actions . DSA skills are multi-disciplinary, adopting methods from fields such as statistics, mathematics, and computer science. A distinction can also be made between skills, knowledge, abilities, and occupations. ‘Skills’ are the proficiencies developed through training and/or experience ; ‘knowledge’ is the theoretical and/or practical understanding of an area; ‘ability’ is the competency to achieve a task ; and ‘occupations’ are the amalgamation of skills, knowledge, and abilities that are used by an individual to perform a set of tasks that are required by their vocation. For simplicity, throughout this paper the term ‘skill’ will include ‘knowledge’ and ‘ability’.
There are several challenges when analysing the labour demands of occupations and assessing the extent of skills shortages. The first challenge concerns accurately identifying occupations based on their evolving skill demands. Occupations are organised into standardised hierarchical classifications, which vary across national jurisdictions. Most often, these are static, rarely updated classifications, which fail to capture the changing skill demands, or to detect the creation of new occupations. For instance, ‘Data Scientists’, ‘Data Engineers’ and ‘Data Analysts’ do not exist in the Australian and New Zealand Standard Classification of Occupations (ANZSCO); rather, they are all grouped as ‘ICT Business Analysts’. Furthermore, even when occupations are analysed based on their skill frequencies , biases emerge from the difference in their relative frequency. For example, ‘Communication Skills’ occur in around one-quarter of all job ads used in this work. However, just because some skills are common does not mean that they are more or less important than other skills that are also required in an individual job. This leads to two related questions: (1) how to adaptively identify relevant skills from labour market data while minimising biases that emerge from ad hoc aggregations? And (2) how to identify relevant occupations based on this generated set of skills?
The second challenge is detecting evidence of skills shortages from (near) real-time data. Skill shortages are mostly measured via labour market surveys. This involves surveying employers about their ability to access workers who possess the skills their firms demand. A major shortcoming of this approach is that surveys are difficult to scale and are rarely conducted on statistically valid samples. Another significant issue is that labour market surveys are lagging indicators, i.e. results can be published many months after the data was collected. Lastly, due to scaling limitations, prominent labour market surveys on skills shortages (or mismatches) fail to measure all standardised occupations. This raises two questions: can we detect evidence of skill shortages from real-time labour market data? If so, what are the key variables for assessing skills shortages from such data?
This paper addresses the above challenges using a large dataset of over 6.7 million Australian online job ads posted between 2012-01-01 and 2019-02-28, which has been generously provided by Burning Glass Technologies (BGT), a leading vendor of online job ads data (https://www.burning-glass.com/). The data has been collected via web scraping and systematically processed into structured formats. The dataset consists of detailed information on individual job ads, such as location, salary, employer, educational requirements, experience demands, and more. The skill requirements have also been extracted (totalling unique skills) and each job ad is classified into its relevant occupational and industry classes.
To address the first challenge, we first adapt an established similarity measure originating from Trade Economics  to measure the pairwise similarity between unique skills in job ads. Next, we develop a novel data-driven method to generate sets of skills highly similar to a set of seed skills. Finally, we uncover the relevant occupations for which at least of all skills required in their associated ads are from the target set of skills. We apply this method to uncover the set of DSA skills and DSA occupations, starting from a seed set of common DSA skills.
We address the second challenge by identifying five key variables from online job ads data which are critical for detecting skill shortages in real-time: (1) job ad posting frequency growth; (2) median salary levels; (3) educational requirements; (4) experience demands; and (5) job posting predictability. We then analyse the DSA occupations according to each of these five variables and find compelling evidence for how these features are predictive of skill shortages.
The main contributions of this work include:
- We develop a data-driven methodology to construct skills sets for specific occupational areas, and to select occupations based on granular skills-level data;
- We identify five key variables for detecting skill shortages from online job ads data;
- We apply the aforementioned methods to a unique dataset of online job ads to analyse the changing labour demands of DSA skills and occupations in the advanced economy of Australia. We also construct and share the list of top DSA skills generated from this dataset.
II Related Work & Limitations
Job ads data as a proxy for labour demand. During 2001-2003, Lee gathered job ads data from the websites of Fortune 500 companies in order to analyse the skill requirements of Systems Analysts. Lee was able to determine that these positions required candidates to have ‘all-round’ capabilities, beyond just technical skills. More recently, Gardiner et al. procured 1,216 job ads with ‘Big Data’ in the job title from the indeed.com API. The authors then conducted content analyses to investigate how ‘Big Data’ skills have manifested in labour demand. Their research reiterated that employers are demanding technical skills in conjunction with ‘softer’ skills, such as communication and team-work.
DSA skill shortages. While the capacity to collect, store, and process information may have sharply risen, it is argued that these advances have far outstripped present capacities to analyse and make productive use of such information. Claims of DSA skill shortages are being made in labour markets around the world [7, 22, 24], including in Australia. Most similar to this research, however, are two studies conducted using BGT data to assess DSA labour demands. The first was an industry research collaboration between BGT, IBM, and the Business-Higher Education Forum in the US. The research found that in 2017 DSA jobs earned a wage premium of more than US$8,700 and DSA job postings were projected to grow 15% by 2020, which is significantly higher than average. In another study commissioned by The Royal Society UK, BGT data were analysed for DSA jobs in the UK. The results also showed high levels of demand for DSA skills, particularly technically rigorous DSA skills.
Limitations of using online job ads data. It is argued that job ads data are an incomplete representation of labour demand. Some employers continue to use traditional forms of advertising for vacancies, such as newspaper classifieds, their own hiring platforms, or recruitment agency procurement. Job ads data also over-represent occupations with higher-skill requirements and higher wages, colloquially referred to as ‘white collar’ jobs .
III Skill similarity and sets of related skills
Skills provide the means for workers to perform labour tasks in order to fulfill their occupational demands. Therefore, the assortment of skills required for a job, and their pairwise interconnections uniquely identify occupations. In this section, we propose a methodology to capture the ‘similarity’ between skill-pairs that co-occur in job ads. Intuitively, two skills are similar when the two are related and complementary, i.e. the skills-pair supports each other. For example, ‘Python’ and ‘TensorFlow’ have a high similarity score because together they enable higher productivity for the worker, and because the difficulty to acquire either skill when one is already possessed by a worker is relatively low.
The Revealed Comparative Advantage of a skill. We develop a data-driven methodology to measure the pairwise similarity between pairs of skills that co-occur in job ads. One difficulty we encounter is that some skills are ubiquitous, occurring across many job ads and occupations. We address this issue by adapting the methodology proposed by Alabdulkareem et al.  to maximise the amount of skill-level information obtained from each job ad, while minimising the biases introduced by over-expressed skills in job ads. We use the Revealed Comparative Advantage (RCA) to measure the relevance of a skill for a particular job ad , computed as:
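$$\mathrm{RCA}(j,s) = \frac{x(j,s) \,/\, \sum_{s' \in S} x(j,s')}{\sum_{j' \in J} x(j',s) \,/\, \sum_{j' \in J,\, s' \in S} x(j',s')}$$

where $x(j,s) = 1$ when the skill $s$ is required for job $j$, and $x(j,s) = 0$ otherwise; $S$ is the set of all distinct skills, and $J$ is the set of all job ads in our dataset. $\mathrm{RCA}(j,s) \in \left[0, \sum_{j' \in J,\, s' \in S} x(j',s')\right]$, and the higher $\mathrm{RCA}(j,s)$, the higher is the comparative advantage that $s$ is considered to have for $j$. Visibly, $\mathrm{RCA}(j,s)$ decreases when the skill $s$ is more ubiquitous (i.e. when $\sum_{j' \in J} x(j',s)$ increases), or when many other skills are required for the job $j$ (i.e. when $\sum_{s' \in S} x(j,s')$ increases).
RCA provides a method to measure the importance of a skill in a job ad, relative to the total share of demand for that skill in all job ads. It has been applied across a range of disciplines, such as trade economics, identifying key industries in nations, and detecting the labour polarisation of workplace skills.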
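The RCA computation can be sketched in a few lines of Python, taking RCA as a skill's share of the requirements within one ad divided by that skill's share of all requirements in the dataset. This is a minimal illustration rather than the authors' implementation: the function name `rca_scores`, the dictionary input format, and the three toy ads are assumptions made for the example.

```python
from collections import Counter


def rca_scores(job_skills):
    """Return {(job, skill): RCA} for every skill listed in every job ad.

    job_skills: dict mapping a job-ad id to the set of skills it requires,
    i.e. x(j, s) = 1 exactly when s is in job_skills[j].
    """
    # Number of ads requiring each skill: sum over j' of x(j', s).
    skill_counts = Counter(s for skills in job_skills.values() for s in skills)
    # Total number of (job, skill) requirements: sum over j', s' of x(j', s').
    total = sum(skill_counts.values())

    scores = {}
    for job, skills in job_skills.items():
        for skill in skills:
            share_in_job = 1.0 / len(skills)             # x(j,s) / sum_s' x(j,s')
            share_overall = skill_counts[skill] / total  # sum_j' x(j',s) / total
            scores[(job, skill)] = share_in_job / share_overall
    return scores


# Hypothetical toy data: three ads, one of them listing only a ubiquitous skill.
ads = {
    "ad1": {"Python", "TensorFlow", "Communication Skills"},
    "ad2": {"Python", "Communication Skills"},
    "ad3": {"Communication Skills"},
}
rca = rca_scores(ads)
```

In this toy data, ‘Communication Skills’ appears in every ad, so within `ad1` it scores lower than the rarer ‘TensorFlow’, which is the de-biasing effect described above.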
Measure skill similarity. The next step is measuring the complementarity of skill-pairs that co-occur in job ads. First we introduce the ‘effective use of skills’ $e(j,s)$, defined as $e(j,s) = 1$ when $\mathrm{RCA}(j,s) \geq 1$ and $e(j,s) = 0$ otherwise. Finally, we introduce the skill complementarity (denoted $\theta(s,s')$) as the minimum of the conditional probabilities of a skills-pair being effectively used within the same job ad. Skills $s$ and $s'$ are considered as highly complementary if they tend to commonly co-occur within individual job ads, for whatever reason. Formally:

$$\theta(s,s') = \frac{\sum_{j \in J} e(j,s)\, e(j,s')}{\max\left(\sum_{j \in J} e(j,s),\; \sum_{j \in J} e(j,s')\right)}$$

Note that $\theta(s,s') \in [0,1]$, a larger value indicates that $s$ and $s'$ are more similar, and it reaches the maximum value when $s$ and $s'$ always co-occur (i.e. they never appear separately).
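The similarity measure can be illustrated as follows. The minimum of the two conditional probabilities equals the number of ads that effectively use both skills divided by the larger of the two individual counts, which is what the sketch computes. The function name `skill_similarity`, the input format (one set of effectively used skills per ad, however that set was derived), and the toy ads are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def skill_similarity(effective_skills):
    """Return {(s, s2): similarity} for skill pairs that co-occur at least once.

    effective_skills: dict mapping a job-ad id to the set of skills that the ad
    'effectively uses'. Pair keys are ordered alphabetically.
    """
    used_in = Counter()        # number of ads effectively using each skill
    used_together = Counter()  # number of ads effectively using both skills
    for skills in effective_skills.values():
        used_in.update(skills)
        for a, b in combinations(sorted(skills), 2):
            used_together[(a, b)] += 1

    # min(P(a | b), P(b | a)) = co-occurrences / max(count(a), count(b))
    return {
        (a, b): n / max(used_in[a], used_in[b])
        for (a, b), n in used_together.items()
    }


# Hypothetical toy data.
effective = {
    "ad1": {"Python", "TensorFlow"},
    "ad2": {"Python", "TensorFlow"},
    "ad3": {"Python", "SQL"},
    "ad4": {"SQL"},
}
theta = skill_similarity(effective)
```

Pairs that never co-occur are simply absent from the result, which corresponds to a similarity of zero.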
Top DSA skills.
We use the skill similarity function to create a list of Data Science and Analytics skills.
First, we qualitatively select 5 common DSA skills as seed inputs: ‘Artificial Intelligence’, ‘Big Data’, ‘Data Mining’, ‘Data Science’, and . Next, for each of these 5 DSA skills, we calculate the top 300 skills with the highest similarity scores. Finally, we merge the five lists, calculate the average similarity score for each unique skill, and rank the skills in descending order. This results in a ranked list of 589 skills, which we qualitatively assessed before deciding to keep the top 150 skills. While some skills outside of the top 150 could be considered DSA skills, it was at this point that the relevance to DSA skills began to deteriorate and merge into other domains. For example, skills such as ‘Design Thinking’, ‘Front-end Development’, and ‘Atlassian JIRA’ – which are technical, but not DSA specific – were just outside of the top 150 skills.
The purpose of this top DSA skills list is to capture DSA labour trends rather than represent a complete taxonomy of DSA skills. The list of top 150 DSA skills can be viewed in the online appendix.
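The seed-expansion procedure described in this section (top skills per seed, merge, average, rank, truncate) might look like the sketch below. It is an illustration under stated assumptions: the function name `expand_seed_skills` and the input format are hypothetical, and a skill's average is taken over the seed lists in which it appears, which the text does not specify. The final cut at 150 skills was a qualitative decision in the paper, so `keep` is only a parameter here.

```python
from collections import defaultdict


def expand_seed_skills(similarity, seeds, per_seed=300, keep=150):
    """Rank candidate skills by their mean similarity to the seed skills.

    similarity: dict mapping each seed skill to {other_skill: score}.
    Assumption: the mean is taken over the seed lists a skill appears in.
    """
    collected = defaultdict(list)
    for seed in seeds:
        ranked = sorted(similarity[seed].items(), key=lambda kv: kv[1], reverse=True)
        for skill, score in ranked[:per_seed]:
            collected[skill].append(score)

    averaged = {skill: sum(v) / len(v) for skill, v in collected.items()}
    ranking = sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)
    return ranking[:keep]


# Hypothetical toy scores for two seeds, keeping the top 2 skills per seed.
toy_similarity = {
    "Data Science": {"Python": 0.6, "R": 0.45, "Excel": 0.1},
    "Big Data": {"Python": 0.4, "Hadoop": 0.7, "Excel": 0.2},
}
top_skills = expand_seed_skills(
    toy_similarity, ["Data Science", "Big Data"], per_seed=2, keep=3
)
```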
IV DSA occupations and categories
Compute the skill intensity. In this section, we present an adaptive technique to uncover Data Science and Analytics occupations from job ads data. First, we compute the ‘DSA skill intensity’ for each standardised BGT occupation, defined as the percentage of DSA skills relative to the total skill count for the job ads related to an occupation $o$. Formally:

$$\gamma(o) = \frac{\sum_{j \in J_o} \sum_{s \in D} x(j,s)}{\sum_{j \in J_o} \sum_{s \in S} x(j,s)}$$

where $D$ is the set of DSA skills, $J_o$ is the set of job ads associated with the occupation $o$, and $x(j,s)$ is the indicator from Section III that equals 1 when job ad $j$ requires skill $s$.
Select the top DSA occupations. We qualitatively assessed the occupational list ordered by DSA skill intensity, and decided to establish a cutoff at . The rationale for this threshold level was that occupations just below this cutoff are only questionably DSA occupations – take, for example, ‘Web Developer’ and ‘UI / UX Designer / Developer’. Occupations just above this threshold appeared more consistent with the definition of DSA skills given in Section I. Moreover, the occupations with a DSA skill intensity level just above the threshold represented occupations where the authors considered DSA skills likely to become more prevalent. For example, the demands for DSA skills are expected to increase for Economists due to the growing amounts of economic data that are being made available. Therefore, this list represents occupations where DSA skills are already important, or have reached a minimum threshold of DSA skill intensity and where DSA skills are likely to become more important for the occupation.
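As an illustration of the two steps above, the sketch below computes the share of skill mentions that belong to a DSA skill set for each occupation and then filters on a cutoff. The function names, the input format, and the toy data are hypothetical, and the cutoff of 0.4 is an arbitrary value chosen for the example, not the threshold used in this research.

```python
def dsa_skill_intensity(ads_by_occupation, dsa_skills):
    """Return {occupation: share of skill mentions that are DSA skills}.

    ads_by_occupation: dict mapping an occupation to a list of skill sets,
    one set per job ad classified under that occupation.
    """
    intensity = {}
    for occupation, ads in ads_by_occupation.items():
        total = sum(len(skills) for skills in ads)
        dsa = sum(len(skills & dsa_skills) for skills in ads)
        intensity[occupation] = dsa / total if total else 0.0
    return intensity


def select_occupations(intensity, cutoff):
    """Return occupations at or above the cutoff, most DSA-intensive first."""
    kept = [(occ, value) for occ, value in intensity.items() if value >= cutoff]
    return [occ for occ, _ in sorted(kept, key=lambda kv: kv[1], reverse=True)]


# Hypothetical toy data.
toy_ads = {
    "Data Analyst": [
        {"SQL", "Data Mining"},
        {"Big Data", "Communication Skills"},
    ],
    "Web Developer": [{"JavaScript", "CSS", "SQL"}],
}
toy_dsa_skills = {"Data Mining", "Big Data"}

toy_intensity = dsa_skill_intensity(toy_ads, toy_dsa_skills)
selected = select_occupations(toy_intensity, cutoff=0.4)  # illustrative cutoff only
```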
| DSA Category | DSA Occupation | #Ads |
|---|---|---|
| Data Scientists and Advanced Analysts | Biostatistician | 270 |
| | Financial Quantitative Analyst | 947 |
| Data Analyst | Business Intelligence Architect / Developer | 3,166 |
| | Data / Data Mining Analyst | 34,520 |
| Data Systems Developers | Computer Programmer | 16,311 |
| | Computer Systems Engineer / Architect | 73,437 |
| | Data Warehousing Specialist | 964 |
| | Mobile Applications Developer | 4,357 |
| | Software Developer / Engineer | 113,247 |
| Functional Analysts | Business Intelligence Analyst | 23,547 |
| | Fraud Examiner / Analyst | 653 |
| | Security / Defense Intelligence Analyst | 482 |
| TOTALS | 23 DSA Occupations | 306,577 |
Table I shows the 23 occupational classes that satisfy these DSA threshold requirements. Occupations are categorised to compare labour dynamics within the DSA occupational set. The occupational categories are adapted from previous BGT research completed in the US  and UK . Fig. 1 gives a brief definition of the functional role of each category and places them on a comparative scale of analytical rigour.
V Detecting Skill Shortages from Job Ads
In this section, we propose five labour demand variables for detecting skill shortages from job ads data. These include: (1) job ad posting frequency growth; (2) median salary levels; (3) educational requirements; (4) experience demands; and (5) job posting predictability. We argue that these variables taken together provide explanatory insight for identifying skill shortages of occupations.
V-A Variables for detecting skill shortages
This research has found evidence of DSA skill shortages for the ‘Data Scientists and Advanced Analysts’ (‘Data Scientists’, henceforth) and ‘Data Analysts’ categories. A combination of factors has led to these conclusions.
Job ads posting frequency. Both categories have experienced high relative growth in terms of posting frequencies (shown in Fig. (a)). High posting frequency growth can be indicative of increasing employer demands for workers that possess specific occupational skills. Both ‘Data Scientists’ and ‘Data Analysts’ have averaged higher year-on-year growth rates ( and , respectively) than the other DSA categories and the market average () (see Fig. (b)).
Salaries. ‘Data Scientists’ and ‘Data Analysts’ command high, and growing, wage premiums (Fig. (c)). High and growing wages indicate that employers are willing to pay a premium to attract workers with specific skills. That is, when labour supply is constrained and labour demand increases, wages should increase, as is the case for ‘Data Scientists’ and ‘Data Analysts’.
Education levels. High relative educational requirements can constrain the supply of skilled labour by creating barriers to entry. In Fig. (d), this is especially evident for ‘Data Scientists’, where the years of education required by employers are significantly higher than the market average and the other categories.
Experience demands. The minimum years of experience demanded by employers can vary according to the accessibility of skilled labour. If employers have difficulty hiring the labour they demand, then they may reduce their experience-level requirements as part of their recruitment efforts. As Fig. (e) shows, this is again the case for ‘Data Scientists’ and to a lesser extent ‘Data Analysts’, where experience levels have remained relatively low. For ‘Data Scientists’, the minimum experience requirements have decreased by almost one year since 2012 and sit just above the market average. For ‘Data Analysts’, the average years of minimum experience have been below the market average since 2017.
Job ad posting predictability. Lastly, we assert that the predictability of job ad posting frequency should be considered as an explanatory variable for detecting skill shortages. We have observed the difficulties of predicting occupations (and skills) that have high growth in terms of job ad postings. As seen in Fig. (f), the forecast predictions for ‘Data Scientists’ job ads perform relatively poorly compared to the lower growth categories. We contend that this is due to the rapidly changing labour dynamics of ‘Data Scientists’ and that this lack of predictability tends to highlight the patterns of high-growth occupations, reflecting another measure of rising labour demands. In the next section (Section V-B) we detail how we quantify the predictability variable.
Taken collectively, these factors form a strong case that the Australian labour market has been experiencing a shortage of ‘Data Scientists’ and ‘Data Analysts’. These variables form a framework of features to detect skill shortages from job ads.
V-B Predict job ad posting
Forecast ads posting. In this section, we propose a ‘predictability’ feature by building a time series model to predict job ad posting frequencies for each of the categories. We use the Prophet time series forecasting tool developed by Facebook Research. Prophet is an additive regression tool that fits non-linear time series trends with the effects from daily, weekly, and yearly seasonality, and also holidays. The three main model components are represented in the following equation:

$$y(t) = g(t) + s(t) + h(t) + \epsilon_t \tag{1}$$

where $g(t)$ refers to the trend function that models non-periodic changes over time; $s(t)$ represents periodic changes, such as seasonality; $h(t)$ denotes holiday effects; and $\epsilon_t$ is the error term and represents all other idiosyncratic changes. For more details on Prophet and its hyper-parameter choices, please refer to the online appendix.
Prediction error measure. Using Eq. 1, one can run time forward and obtain forecasts of future job ad postings. We measure the accuracy of the forecast using the Symmetric Mean Absolute Percentage Error (SMAPE) [29, 23]. SMAPE is formally defined as:

$$\mathrm{SMAPE} = \frac{200}{T} \sum_{t=1}^{T} \frac{|F_t - A_t|}{|A_t| + |F_t|}$$

where $A_t$ denotes the actual number of job ads posted on day $t$, $F_t$ is the predicted number of job ads on day $t$, and $T$ is the number of days in the test period. SMAPE ranges from 0 to 200, with 0 indicating a perfect prediction and 200 the largest possible error. When actual and predicted values are both 0, we define SMAPE to be 0. We selected SMAPE as an alternative to MAPE because it (1) is scale-independent and (2) can handle actual or predicted zero values. For a discussion on alternate error metrics, please consult the online appendix.
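A direct transcription of this error measure into Python is short. The sketch below assumes the common 0 to 200 formulation of SMAPE (twice the absolute error divided by the sum of absolute actual and predicted values, averaged over the test days, times 100) and applies the stated convention that a day with zero actual and zero predicted postings contributes zero error. The function name `smape` and the example values are illustrative.

```python
def smape(actual, predicted):
    """Symmetric MAPE on a 0-200 scale.

    A day where actual and predicted are both 0 contributes 0 to the sum.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")

    total = 0.0
    for a, f in zip(actual, predicted):
        denom = abs(a) + abs(f)
        if denom:  # skip the 0/0 case, which is defined as zero error
            total += abs(f - a) / denom
    return 200.0 * total / len(actual)


perfect = smape([10, 20], [10, 20])  # identical series
worst = smape([0], [5])              # actual is zero, prediction is not
halved = smape([100], [50])          # prediction is half the actual value
```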
Evaluation protocol. The forecasts made using Prophet are deterministic (i.e. given the same input, we obtain the same output). We evaluate the uncertainty of predicted future job ad volumes using a ‘sliding window’ approach. As shown in Fig. 9, we use a constant number of training days () to train the model, and we test the forecasting performance on the next days. We then shift both the training and the testing periods right by one day and repeat the process. We iterate this process 365 times, denoted in Fig. 9 using Train start for the starting point of the training period, Test start for the starting point of the test period, and Window start for the starting point of the unused period. Consequently, we train and test the model 365 times, and we obtain 365 SMAPE performance values, which are presented aggregated as a boxplot in Fig. (f). The advantage of this approach is that it provides a distribution of SMAPE scores across a range of testing periods, which allows for a more robust evaluation of the modelling performance.
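The sliding-window protocol can be sketched independently of the forecasting model. In the illustration below the model is passed in as a function, and a trivial last-value forecaster stands in for Prophet so that the example runs with the standard library alone. The names `sliding_window_scores` and `last_value_forecaster`, and the window sizes used in the example, are assumptions and not the settings of this study.

```python
def smape(actual, predicted):
    """Symmetric MAPE on a 0-200 scale; a 0/0 day contributes 0."""
    total = 0.0
    for a, f in zip(actual, predicted):
        denom = abs(a) + abs(f)
        if denom:
            total += abs(f - a) / denom
    return 200.0 * total / len(actual)


def sliding_window_scores(series, train_days, test_days, n_windows, forecaster):
    """Return one SMAPE per window, shifting train and test right by one day.

    forecaster(train, horizon) must return a list of `horizon` predictions.
    """
    needed = train_days + test_days + n_windows - 1
    if len(series) < needed:
        raise ValueError(f"series needs at least {needed} daily observations")

    scores = []
    for shift in range(n_windows):
        train = series[shift : shift + train_days]
        test = series[shift + train_days : shift + train_days + test_days]
        predicted = forecaster(train, test_days)
        scores.append(smape(test, predicted))
    return scores


def last_value_forecaster(train, horizon):
    """Stand-in model (NOT Prophet): repeat the last observed value."""
    return [train[-1]] * horizon


# Hypothetical daily counts 1, 2, ..., 20 and small illustrative window sizes.
daily_counts = list(range(1, 21))
scores = sliding_window_scores(
    daily_counts, train_days=5, test_days=2, n_windows=3,
    forecaster=last_value_forecaster,
)
```

The resulting list of scores is the kind of distribution that can then be summarised as a boxplot per category.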
The job ad posting frequency has grown for all DSA categories since 2012. However, the more technically rigorous categories of ‘Data Scientists’ and ‘Data Analysts’ have experienced the highest growth trends. There are three distinct change point periods observed. The first runs from January 2012 to April 2014, when the frequency of all job ads was growing. Over this period, only ‘Data Scientists’ grew at a faster rate than the total market for ‘All Australian Job Ads’ (using the simple growth formula). This period can perhaps be explained by (1) the higher levels of job openings being posted online earlier in the dataset and (2) the early stages of DSA skills demanded by occupations, particularly for the more technically rigorous occupations.
The second period, from approximately May 2014 to November 2017, was generally one of slowing growth for online job ads. A possible explanation for this period is Australia’s increasing underemployment rate . Underemployment rose relatively steeply from just above 7% in 2014, diverging from a lowering unemployment rate, before reaching a peak just below 9% around the beginning of 2017. Underemployment then began to slightly decrease until the end of 2018. The sharp rise in underemployment could be indicative of employers being less willing or able to hire due to softening labour market conditions, which would presumably affect the frequency of job ad postings. While the more analytically rigorous categories of ‘Data Scientists’ and ‘Data Analysts’ also experienced slowing growth, they both grew at higher rates relative to other categories. The fact that these categories maintained strong upward trends, despite dampening labour market forces, highlights the high levels of labour demand for these occupational categories.
The final period, from October 2017 until February 2019 (the end of this dataset), was generally one of stagnation or slight growth. Again, ‘Data Scientists’ and ‘Data Analysts’ continued upward trajectories, albeit at slower growth rates than previous periods. All DSA categories had higher trend growth rates than ‘All Australian Job Postings’ during this period. This final change point period highlights some possible conclusions. Firstly, the frequency of online job ads has potentially reached a saturation point. This means that the proportion of job postings captured via online aggregators might have reached its upper limit. If this is the case, then any posting frequency growth for specific occupational classes above the total market rate could indicate high (or relatively high) labour demand. From this perspective, all DSA jobs continue to experience higher labour demands relative to all Australian job ads postings in the dataset since 2014.
The strong relative growth of ‘Data Scientists’ and ‘Data Analysts’ also provides insight. One interpretation is that Australian firms and employers have started to increasingly adopt AI technologies. A recent report by McKinsey & Co suggests that this is the case . The accelerating rate of AI adoption requires highly skilled labour to make productive use of these technologies. These are the same analytically rigorous skills that are demanded from ‘Data Scientists’ and ‘Data Analysts’. As a result, some portion of this growing labour demand for DSA skills, particularly the highly technical DSA skills, could be explained by accelerating AI adoption by Australian firms. Another related perspective is that Australian firms have increasing access to data with potentially meaningful insights. Therefore, workers with DSA skills that are able to productively use and draw insights from such data would logically be in high demand.
VII Conclusions and Future Research
In this research, we firstly developed a data-driven methodology to construct an adaptive set of skills highly similar to a set of seed skills. We then applied this method to identify the DSA skills set and DSA occupations, organising these occupations into common DSA categories. Secondly, we proposed five variables from online job ads data which are critical for the real-time detection of skill shortages. We then analysed the DSA categories according to each of these five variables. Here, we find strong evidence for how these features are collectively predictive of skill shortages. From this analysis, we find evidence that Australia is experiencing skills shortages for ‘Data Scientists’ and ‘Data Analysts’ occupations. A combination of indicators points to these conclusions. Firstly, both categories have experienced high relative growth in terms of posting frequencies. Secondly, both categories command high, and growing, wage premiums. Thirdly, both categories demand higher than average education requirements, which constrains the supply of skilled labour pursuing these vocations. This is especially the case for ‘Data Scientists’. Fourthly, the average minimum years of experience required by employers for these categories are low. For ‘Data Scientists’, the minimum experience requirements have decreased by almost one year since 2012 and sit just above the market average. For ‘Data Analysts’, the average years of minimum experience have been below the market average since 2017. Lastly, these occupational categories are relatively difficult to predict, especially for occupations in the ‘Data Scientists’ category. Taken collectively, these factors form a strong case that the Australian labour market has been experiencing a shortage of ‘Data Scientists’ and ‘Data Analysts’.
Limitations and future work. A limitation of this work is that it only consists of labour demand data and does not account for labour supply. Future work might corroborate these findings according to official labour shortage lists published by governments (i.e. a labour supply ‘ground truth’). This could be achieved by developing a multivariate logistic classifier where the five proposed variables are used as features to predict whether an occupation is experiencing shortage. Conducting equivalent analyses on other markets and occupational groups could also provide insights into the predictive performance of these explanatory variables.
Marian-Andrei Rizoiu was partially funded by the Science and Industry Endowment Fund, under project no. D61 Challenge: E06. We would like to thank Burning Glass Technologies for generously providing the data for this research.
[1] (2018-07) Unpacking the polarization of workplace skills. Sci Adv 4 (7), pp. eaao6030.
[2] (2019) Appendix: . Note: XXXXXXXX
[3] (2018-10) Underemployment in Australia: 6202.0 - Labour Force, Australia, September 2018. Australian Bureau of Statistics. https://www.abs.gov.au/ausstats/abs@.nsf/Lookup/6202.0main+features10September%202018 (accessed 2019-8-13).
[4] (2018) 6302.0 - Average Weekly Earnings, Australia, Nov 2018. Commonwealth of Australia. https://www.abs.gov.au/ausstats/abs@.nsf/mf/6302.0 (accessed 2019-8-1).
[5] (2019-06) 6291.0.55.003 - Labour Force, Australia, Detailed, Quarterly, May 2019. Australian Bureau of Statistics.
[6] (2018) UCube: higher education statistics. Note: Title of the publication associated with this dataset: Completions.
[7] (2019-05) Dynamics of data science skills. Technical report, The Royal Society.
[8] (2002) Introduction to time series and forecasting. Vol. 2, Springer.
[9] (2015-03) Skill gaps, skill shortages, and skill mismatches: evidence and arguments for the United States. ILR Review 68 (2), pp. 251–290.
[10] (2014) Understanding online job ads data. Technical report, Georgetown University.
[11] (2006) 25 years of time series forecasting. International Journal of Forecasting 22 (3), pp. 443–473.
[12] (2018) ACS Australia’s digital pulse 2018: driving Australia’s international ICT competitiveness and digital growth. Technical report, Australian Computer Society.
[13] Skill shortages. https://www.employment.gov.au/skillshortages (accessed 2019-11-1).
[14] Sixty per cent of job vacancies in Australia are advertised online. https://www.employment.gov.au/newsroom/sixty-cent-job-vacancies-australia-are-advertised-online (accessed 2019-7-7).
[15] (2013-12) Data science and prediction. Commun. ACM 56 (12), pp. 64–73.
[16] (2014-11) Economics in the age of big data. Science 346 (6210), pp. 1243089.
[17] (2018-10) Skill requirements in big data: a content analysis of job advertisements. Journal of Computer Information Systems 58 (4), pp. 374–384.
[18] (1999-12) Adjusting to a new technology: experience and training. J. Econ. Growth 4 (4), pp. 359–383.
[19] T. Hey, S. Tansley, and K. Tolle (Eds.) (2009-10) The fourth paradigm: Data-Intensive scientific discovery. 1st edition, Microsoft Research.
[20] (2007-07) The product space conditions the development of nations. Science 317 (5837), pp. 482–487.
[21] (2005) Analysis of skill requirements for systems analysts in Fortune 500 organizations. Journal of Computer Information Systems 45 (4), pp. 84–92.
[22] (2018) LinkedIn workforce report - United States. Technical report, LinkedIn.
[23] (1993) Accuracy measures: theoretical and practical concerns. International Journal of Forecasting 9 (4), pp. 527–529.
[24] (2011) Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute.
[25] (2017) The quant crunch: how the demand for data science skills is disrupting the job market. Technical report, Burning Glass Technologies.
[26] (2015) Skill mismatch and the costs of job displacement. In Annual Meeting of the American Economic Association.
[27] (2017-02) Toward labor market policy 2.0: the potential for using online job-portal big data to inform labor market policies in India. Policy Research Working Papers, The World Bank.
[28] (2019-05) OECD skills strategy 2019 - skills to shape a better future. Technical report, OECD.
[29] (1985-07) Long-Range forecasting: from crystal ball to computer. 2nd edition, Wiley-Interscience.
[30] (2016-12) Constrained pathways to a creative urban economy. Urban Stud. 53 (16), pp. 3439–3454.
[31] (2019) Australia’s automation opportunity: reigniting productivity and inclusive income growth. Technical report, McKinsey & Company.
[32] (2018) Forecasting at scale. The American Statistician 72 (1), pp. 37–45.
[33] (1991-06) A theoretical evaluation of alternative trade intensity measures of revealed comparative advantage. Weltwirtsch. Arch. 127 (2), pp. 265–280.
Appendix A Time series analysis
Time series analysis provides a set of techniques for drawing inferences from a sequence of observations stored in time order. Accurate time series models can offer insights into the principal components that have shaped historical growth trajectories. They also provide a means of making predictions about the future.
This paper applies Prophet, a relatively new and high-performing time series forecasting tool developed by Facebook. The tool is applied to Australian online job ads data to uncover the growth trends of DSA jobs.
A-A Prophet forecasting tool
In 2017, Facebook Research released Prophet as an open-source forecasting procedure implemented in the Python and R programming languages. When benchmarked against ARIMA, ETS (error, trend, seasonality) forecasting, seasonal naive forecasting, and the TBATS model, Prophet forecasts had significantly lower Mean Absolute Percentage Errors (MAPE).
The default hyperparameters of Prophet were applied for this analysis. These include an uncertainty interval of 80%, the automatic detection of trend change points, and the estimation of seasonality using a partial Fourier sum. For seasonality, Prophet uses a Fourier order of 3 for weekly seasonality and 10 for yearly seasonality. Experiments were conducted by specifying a custom holidays dataframe, adjusting smoothing parameters, and fitting the model with a multiplicative seasonality setting. However, all of these specifications led to a slight deterioration of the performance metrics. Therefore, the default hyperparameters were restored, which the authors state “are appropriate for most forecasting problems”.
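To make the partial Fourier sum concrete, the following is a minimal sketch of how seasonal features of a given order can be constructed. It is an illustration only, not Prophet's internal code: the function names, the yearly period of 365.25 days, and the coefficient values are assumptions made for this example. In a fitted model the coefficients would be estimated from the data.

```python
import math


def fourier_features(t_days, period, order):
    """Return sin/cos pairs for harmonics 1..order at time t_days.

    A partial Fourier sum of order N contributes 2 * N features.
    """
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t_days / period
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


def seasonal_value(t_days, period, coefficients):
    """Evaluate the seasonal component as a weighted sum of the features."""
    order = len(coefficients) // 2
    features = fourier_features(t_days, period, order)
    return sum(c * x for c, x in zip(coefficients, features))


# Orders quoted in the text: 3 for weekly and 10 for yearly seasonality.
weekly = fourier_features(t_days=3, period=7.0, order=3)
yearly = fourier_features(t_days=3, period=365.25, order=10)

# Hypothetical coefficients, purely for illustration (not fitted values).
example_coefficients = [0.5, -0.2, 0.1, 0.0, 0.05, -0.05]
weekly_effect = seasonal_value(3, 7.0, example_coefficients)
```

A low order yields a smooth seasonal curve, while a higher order lets the curve follow faster changes at the cost of a greater risk of overfitting.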
A-B Evaluating performance
The Prophet library includes a method for calculating a range of evaluation metrics (the method is called cross_validation; for more information, see: https://facebook.github.io/prophet/docs/diagnostics.html). However, these metrics are not ideal for measuring prediction performance of online job ads for two reasons.
Firstly, the analyses in this paper compare DSA categories with different scales of job posting frequencies. Therefore, most metrics calculated by Prophet’s diagnostics method, such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE), are not suitable for comparisons because such measurements are scale-dependent.
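As a small illustration of scale dependence, the sketch below computes MAE and RMSE for two hypothetical categories whose forecasts are equally accurate in relative terms but whose posting volumes differ by a factor of 100. The posting counts are invented for this example and are not drawn from the job ads dataset.

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Square Error."""
    squared = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))


# Hypothetical daily posting counts for a small and a large category.
small_actual, small_forecast = [2, 4, 6], [3, 3, 6]
large_actual, large_forecast = [200, 400, 600], [300, 300, 600]

mae_small = mae(small_actual, small_forecast)
mae_large = mae(large_actual, large_forecast)
rmse_small = rmse(small_actual, small_forecast)
rmse_large = rmse(large_actual, large_forecast)
```

Both error measures grow in proportion to the volume of postings, so the larger category appears far less accurately forecast even though the relative errors are identical.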
Secondly, an appropriate performance metric for this dataset must not be distorted by zero values. This is important for job posts, where some DSA categories recorded zero daily postings, particularly earlier in the dataset. Consequently, this rules out the last meaningful performance metric calculated by Prophet’s diagnostics, namely MAPE. As the dataset contains zero values for posting frequencies, MAPE can be undefined or infinite because it involves division by zero.
Therefore, to satisfy these two criteria, the selected prediction performance metric is the Symmetric Mean Absolute Percentage Error (SMAPE). SMAPE is an alternative to MAPE that (1) is scale-independent and (2) can handle actual or predicted zero values. SMAPE was first proposed by Armstrong and then by Makridakis.
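Several variants of SMAPE appear in the literature. The sketch below uses one common form, in which each absolute error is divided by the mean of the absolute actual and forecast values, giving a result between 0% and 200%. This choice of variant, the function name, and the convention that a term counts as zero error when both values are zero are assumptions made for illustration; they are not necessarily the exact formulation used in this paper.

```python
def smape(actual, forecast):
    """Symmetric Mean Absolute Percentage Error, in percent (0 to 200)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denominator = (abs(a) + abs(f)) / 2.0
        # Assumption: when actual and forecast are both zero, the forecast
        # is treated as exact and the term contributes zero error.
        if denominator != 0:
            total += abs(f - a) / denominator
    return 100.0 * total / len(actual)


# A zero actual value no longer causes a division by zero.
with_zero = smape([0, 10], [5, 10])

# Rescaling both series by the same factor leaves the metric unchanged.
small_scale = smape([1, 2], [2, 2])
large_scale = smape([100, 200], [200, 200])
```

Under this variant, a term in which exactly one of the two values is zero takes the maximum value of 200%, which should be kept in mind when interpreting results for categories with many zero-posting days.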
A-C DSA Skills List
|8||Big Data Analytics||0.11683186|
|18||Natural Language Processing||0.051589073|
|28||Internet of Things (IoT)||0.038865379|
|30||Extraction Transformation and Loading (ETL)||0.037375468|
|33||Microsoft Power BI||0.03691897|
|69||SQL Server Analysis Services (SSAS)||0.018858212|
|78||Microsoft SQL Server Integration Services (SSIS)||0.016224833|
|82||SQL Server Reporting Services (SSRS)||0.01460998|
|84||Data Lakes / Reservoirs||0.014444455|
|93||Supervised Learning (Machine Learning)||0.013255296|
|96||Visual Basic for Applications (VBA)||0.012941596|
|97||PERL Scripting Language||0.012885431|
|100||Oracle Business Intelligence Enterprise Edition (OBIEE)||0.012256767|
|107||Relational DataBase Management System (RDBMS)||0.011907611|
|116||Amazon Web Services (AWS)||0.01118572|
|122||Continuous Integration (CI)||0.010688564|
|123||Business Intelligence Reporting||0.010349562|
|126||AWS Elastic Compute Cloud (EC2)||0.010217691|
|135||Boosting (Machine Learning)||0.009409621|
|136||Platform as a Service (PaaS)||0.009390802|
|139||Support Vector Machines (SVM)||0.009167358|
|140||Data Warehouse Processing||0.00903522|
|145||AWS Simple Storage Service (S3)||0.008939552|
|146||Dimensional and Relational Modelling||0.008727614|