Predicting Personalized Academic and Career Roads: First Steps Toward a Multi-Uses Recommender System

01/03/2020 ∙ by Alexandre Nadjem, et al. ∙ Université d'Avignon et des Pays de Vaucluse

Nobody knows what they will do in the future, and everyone has a different answer to the question: where do you see yourself five years after your current job or diploma? In this paper we introduce concepts, large categories of fields of study or job domains, to represent the user's vision of their future trajectory. We then show how these concepts can influence the prediction when proposing a set of next steps to take.




1. Introduction

Identifying the next possible step in a career involves many different factors. Some are hidden, like personal reasons, or specific to a time period and often to a more general context, like the reputation of a company. Others are explicit, like the skills and the past jobs listed in a résumé (Kessler et al., 2012). Different strategies have been studied to predict the next job or company. For instance, it is possible to uncover hidden mechanisms in career evolution by investigating a specific field or job. James et al. (2018) focused on the career evolution of researchers and improved the prediction of a researcher's next workplace by excluding the laboratories with which the researcher had never worked or had any contact. Li et al. (2017) studied a set of real LinkedIn data to build a next career move prediction system. By combining information about both the user's past and the company, they improve the precision of the prediction for the next job as well as for the next location.

We cannot compare our results with most works on predicting the next step in a career, since they assume that it will be in the same field as the last one. At the beginning of their studies, few people have a clear objective and actually pursue it. Some follow a standard path, others hear about an opportunity and go for something they would never have expected. By increasing the number of proposed career paths, one can find new recommendations that would motivate this person. This is why we think it is important to display as many choices as possible and to avoid representing the evolution of a career with a linear or standard model. We firmly believe that users must not feel driven to a specific place, but rather offered a set of opportunities to explore.

2. Problem definition

A lot of information is hidden when looking only at the résumé, but the descriptions of past steps may contain clues about what influenced a change. Maybe someone was learning music in their free time and, after many years, decided to go back to this activity. First, we should define what a reorientation is. Reorientation may be understood in many different ways (Negroni, 2005). So far, we do not want to choose one of them. Nevertheless, once we agree to take this phenomenon into account, many research tracks open up at 3 levels: data analysis, models for prediction, and multi-uses of recommender systems (RS). From an analysis point of view, can we find on each trajectory a clue supporting the hypothesis that a reorientation occurred? Is it possible to find classes of reorientation, with or without a preset of categories? (Some reorientations could be slow and prepared, like someone starting a new diploma or a vocational education, or could happen suddenly without giving any warning, like changing from trader to baker.) From a prediction point of view, there are mainly two problems. The first one is how to differentiate trajectories with reorientations from ones without. When there is no reorientation, recommending continuity seems obvious. On the contrary, assuming that a reorientation is certain, the system will have to choose between many possible new activities. As done in Information Retrieval with the Relevance Feedback principle (Spärck-Jones, 1999; Cabrera-Diego et al., 2019), how can we filter all these choices and put the user in the loop? From an RS point of view, are we able to explain to the user why the system gave these results by displaying the different hints used to make one proposition or another? In the experiments reported in Section 5, we focus on the prediction level, with the purpose of reusing it soon in some RS functionalities.

3. Data representation

3.1. Data’s Fragility

An on-line résumé is composed of declarative sentences. Without a more objective data source, the RS remains dependent on the subjectivity of autobiographical writing and on the goal of creating a self-introduction (Lejeune, 1989; J.A. et al., 2011).

Any biographical work produces an illusion (Bourdieu, 1986), but we accept the risk of categorizing these massive declarative data.

3.2. Recoding for normalization

Two graduates who studied in the same school and obtained the same diploma will not present it the same way in their résumés. One might use the complete name, the other an abbreviation (Bachelor of Business Administration/B.B.A). We need to address this diversity and group similar steps under the same entities or categories by normalizing incoming data. We have used the nomenclature from the ONISEP as a model for standardizing the steps composing a trajectory and for categorizing professions (Desrosières and Thévenot, 1988; Thomas, 2013). The International Standard Classification of Occupations is a tool for organizing jobs into a clearly defined set of groups according to the tasks and duties undertaken in the job. It is intended both for statistical uses and for client-oriented uses.
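As an illustration of this recoding step, here is a minimal Python sketch. The alias table and the function name `normalize_title` are hypothetical; the actual nomenclature is much larger and is not described in detail here. The idea is simply to reduce each raw title to a canonical key before looking it up.

```python
import re

# Hypothetical alias table, for illustration only: a real nomenclature
# would contain many more entries.
ALIASES = {
    "bba": "Bachelor of Business Administration",
    "bachelor of business administration": "Bachelor of Business Administration",
}

def normalize_title(raw: str) -> str:
    """Map a free-text title to its canonical entry, or return it trimmed if unknown."""
    key = re.sub(r"[^a-z0-9 ]", "", raw.lower())   # drop dots, slashes, accents are not handled
    key = re.sub(r"\s+", " ", key).strip()         # collapse repeated spaces
    return ALIASES.get(key, raw.strip())
```

With this table, both "B.B.A" and the full name are grouped under the same entity, while unknown titles pass through unchanged.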

3.3. Dataset

In this paper, we use data coming from Viadeo, a professional social network. We also take advantage of previous work done by HumanRoads covering the extraction of a list of French diplomas, job titles and their translations.

After analyzing a résumé, whether it has been written on paper or on a computer, it is possible to extract different categories of information.

  • User: A user is unique and can be represented by a name, an email or an id. For the prediction, they can be anonymous, because we focus on what the users have done and not on their real identity.

  • Steps: these data contain the highest amount of information. Each step is composed of title T, start and end date, location, additional information like detailed description of tasks and knowledge acquired.

  • Skills: a description of what has been learned over the years. A user can highlight some skills and fields. Users can rate themselves in a particular field or be rated by somebody else.

The dataset is composed of 9383 users and a total of 65403 steps (i.e. an average of about 7 steps per user).
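The three categories of information listed above can be sketched as simple Python data classes. The class and attribute names (`Step`, `Profile`, `kind`, and so on) are assumptions made for illustration and do not reflect the actual storage format of the dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Step:
    title: str                        # title T of the step
    start: Optional[str] = None       # start date, kept as text in this sketch
    end: Optional[str] = None         # end date
    location: Optional[str] = None
    description: str = ""             # tasks, knowledge acquired
    kind: str = "job"                 # assumed label: "diploma" or "job"

@dataclass
class Profile:
    user_id: str                      # anonymous identifier is enough for prediction
    steps: List[Step] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)

def average_steps(profiles: List[Profile]) -> float:
    """Average number of steps per user."""
    return sum(len(p.steps) for p in profiles) / len(profiles)
```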

3.4. Fields and Concepts

Fields are information/tags such as “internet” or “wind-power” that categorize the current step. Concepts (C) are even larger categories regrouping a maximum of fields while being distinct enough from each other, like “computer science” (CS) or “environment & energy”. The concepts are large enough to simulate a fuzzy vision of the future. When someone asks for help, he often has a vague idea of what he wants to do next. We use the concepts as hints to simulate someone explaining “I want to work/study somewhere related to environment or energy”. We have 17 Concepts for diplomas and 47 for jobs.

3.5. Approaches for profile modeling

A profile can be modeled as a succession of steps. As shown in the example given in Table 1, the first 3 steps represent diplomas. Step 4 represents the first job done after the studies. Each step is composed of keywords (like CS), which help to classify the steps under the corresponding concepts.

     Step 1             Step 2          Step 3         Step 4
T    HighSchool degree  Bachelor in CS  Master in CS   Software Consultant
     Math & Science     CS & Internet   CS & Internet  CS & Internet & consulting
     Past               Present         Future         User intention
Table 1. Steps of a user

Baseline: for everybody, the RS will always propose, at each step, the same list ordered by decreasing frequencies. In other words, the context is completely ignored here. Looking for a better prediction, we use one of three key elements at a time to improve the recommendations: the first job after the current step (this first job could be the next step but also a later one), the highest diploma obtained to date, and the concept of the previous step.

To simulate user interactions with the system, we need to have a “future” step. We extract fields and concepts out of the next step as a feedback from the user for the prediction. Since step 4 is the last one, we will not try to predict it. We also need more information about the past of the user, thus we will also not predict step 1. We removed all the profiles with fewer than 3 steps from the dataset. If a step is not classified, we also remove it. Back to Table 1, note that, to predict step 2, we only need information highlighted in italics and bold.
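The filtering rules above can be sketched as follows in Python. A step is reduced here to a pair (kind, concept), and the order of the two filters (unclassified steps removed first, then short profiles) is an assumption, since the text does not specify it. The function names are illustrative.

```python
from typing import List, Optional, Tuple

# A step is reduced to (kind, concept); concept is None when unclassified.
StepT = Tuple[str, Optional[str]]

def prepare(profile: List[StepT], min_steps: int = 3) -> List[StepT]:
    """Drop unclassified steps, then discard profiles that are too short.

    Assumption: unclassified steps are removed before the length check.
    """
    kept = [s for s in profile if s[1] is not None]
    return kept if len(kept) >= min_steps else []

def prediction_targets(profile: List[StepT]) -> List[int]:
    """Indices of steps with both a past and a future step (first and last excluded)."""
    return list(range(1, len(profile) - 1))
```

For the profile of Table 1 (4 steps), only steps 2 and 3 are prediction targets.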

4. Algorithm: Next field prediction

The HumanRoads tool developed for visualization gives us a good basis for modeling interaction with the user. The following examples showcase a user whose current step is bachelor in CS. He can access the path shown in Figure 1.

Figure 1. 1st option given to the user after a bachelor in CS.

If a user chooses “Further studies”, we can ask him 2 questions leading to different options: the first one would be, do you already have a goal? If the user already knows what he wants, he is asking for additional information. Some options include the shortest way to achieve his goal, the most commonly chosen studies or intermediate jobs. But if his answer is “No”, we need more precision to propose an orientation. Maybe he has a vague idea of what he wants to study. We propose the first six most commonly chosen concepts (see Figure 2) and a list of the remaining concepts in “more résumé” (see Table 2). Since it is not possible to massively involve users during the experimentation, we have simulated these interactions by looking at concepts of the next steps.

Figure 2. 2nd option given to the user.

Concepts                     Frequency
Army & Security              42
Business, Sales & Marketing  41
Agriculture, fishing         10
Table 2. Other concepts in more résumé.

4.1. A simple model

At this stage, the model is purposely kept very simple in order to favor the explanatory dimension. Given a set of observations (namely the concepts on which we can rely), the hypotheses are ranked for the prediction through a probabilistic decision-making approach. The hypotheses are sorted by decreasing frequency, which guarantees optimality without any approximation thanks to Bayes' rule (see eq. 1).
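A minimal Python sketch of such a frequency-based ranking is given below. It is an illustration under assumptions, not the exact model of eq. 1: training data is taken to be a list of (observed concept, target concept) pairs, and concepts never seen with the observation are appended in baseline order. Sorting by the joint count of (observation, target) gives the same order as sorting by P(target | observation), since the count of the observation is a constant denominator.

```python
from collections import Counter, defaultdict

def train(pairs):
    """pairs: iterable of (observed_concept, target_concept)."""
    cond = defaultdict(Counter)   # counts of target concepts per observed concept
    prior = Counter()             # global counts, used by the baseline
    for obs, target in pairs:
        cond[obs][target] += 1
        prior[target] += 1
    return cond, prior

def rank(cond, prior, obs=None):
    """Concepts by decreasing frequency given obs; baseline order when obs is None."""
    counts = cond.get(obs, Counter()) if obs is not None else prior
    seen = [c for c, _ in counts.most_common()]
    # Assumption: concepts unseen with this observation follow in baseline order.
    rest = [c for c, _ in prior.most_common() if c not in counts]
    return seen + rest
```

Calling `rank(cond, prior)` without an observation reproduces the Baseline, which ignores the context.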


4.2. Evaluation criterion

For evaluation purposes, we have adapted the well-known Mean Reciprocal Rank measure (MRR) (Voorhees and Harman, 2000). If the user goes for the first choice, we score 1, for the second 1/2, and so on. Since, within each bucket of 6 propositions, the choices are displayed more or less on the same level, we apply a “fudge factor”, empirically set to 0.2, that softens the difference of penalties between two consecutive ranks in the same pack. If none of the results are correct, the answer is hidden in “more résumé”. Since it requires a new action to develop a new list, we divide the following score by 2. Every 6 propositions, we divide the scores again by 2, because it requires an additional effort for the user to find a fitting proposal.
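The bucketed scoring described above can be sketched in Python as follows. The halving per bucket of 6 follows the text; the within-bucket softening, written here as 1 / (1 + alpha * position) with alpha = 0.2, is only an assumed form, since the exact formula is not available. With alpha = 1 this assumed term reduces to the plain reciprocal rank inside a bucket.

```python
def step_score(rank: int, bucket_size: int = 6, alpha: float = 0.2) -> float:
    """Score of one prediction whose correct answer sits at the given 1-based rank."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    bucket, pos = divmod(rank - 1, bucket_size)
    # ASSUMPTION: softened reciprocal rank inside a bucket (hypothetical form).
    within = 1.0 / (1.0 + alpha * pos)
    # From the text: the score is divided by 2 for every further bucket of 6.
    return within / (2 ** bucket)

def mean_score(ranks) -> float:
    """Average of the per-step scores over all evaluated steps."""
    ranks = list(ranks)
    return sum(step_score(r) for r in ranks) / len(ranks)
```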

5. Experiments

After removing the steps mentioned in Section 3.5, the remaining users, diploma steps and job steps form the evaluation set. In order to respect the principle of a non-biased evaluation, we opted for a cross-validation process. Before predicting a step, this step is temporarily removed from the dataset and the remaining steps are used as a training set.
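This leave-one-out procedure can be sketched in Python as below, assuming (as an illustration) that each evaluated step is reduced to a pair (observed concept, target concept). Instead of retraining for every step, the sketch subtracts the held-out occurrence from the counts, which is equivalent for a frequency-based ranking.

```python
from collections import Counter, defaultdict

def leave_one_out_ranks(pairs):
    """Rank of each target when its own occurrence is removed from the counts.

    Returns None for a target that was never seen elsewhere with the same observation.
    """
    pairs = list(pairs)
    cond = defaultdict(Counter)
    for obs, target in pairs:
        cond[obs][target] += 1
    ranks = []
    for obs, target in pairs:
        counts = cond[obs].copy()
        counts[target] -= 1                       # hold the current step out
        ordered = [c for c, n in counts.most_common() if n > 0]
        ranks.append(ordered.index(target) + 1 if target in ordered else None)
    return ranks
```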

5.1. Results

Figure 3 shows the Mean Rank (MR) for Concept prediction of the current job relying on 3 possible Concepts: Previous job, Last Diploma or Next job. The histogram shows the number of steps in each interval [r, r+1]. Clearly, relying on the previous job gives the best density in the lowest ranks.

Figure 3. MR for Concept prediction of current Job.

In Table 3, the MRR criterion (3rd column) allows us to compare 4 methods applied to predict concepts for the current diploma (from 0.70 to 0.75). The confidence interval (CI) of the MRR is given in the 4th column. Finally, Table 4 shows the MRR for the job steps. Using this criterion, we compare 4 methods to predict concepts for the current job (from 0.73 to almost 0.8). For both diplomas and jobs, relying on any source of information outperforms the Baseline.

Method                MR   MRR    CI
Baseline              4.7  0.699  [0.692, 0.706]
First Job             4.0  0.750  [0.743, 0.756]
Previous Diploma      4.0  0.733  [0.727, 0.740]
Higher Level Diploma  4.0  0.740  [0.734, 0.747]
Table 3. MR and MRR criterion comparison of four methods to predict concepts for the current diploma.

Method        MR   MRR    CI
Baseline      5.1  0.730  [0.724, 0.737]
Last Diploma  4.3  0.763  [0.756, 0.769]
Previous Job  3.8  0.798  [0.791, 0.804]
Next Job      3.7  0.8    [0.797, 0.811]
Table 4. MR and MRR criteria comparison of four methods to predict concepts for the current job.

5.2. Discussion

Using the concept of the first job after the current diploma gives the best prediction (MRR of 0.750). This suggests that the intention after a diploma has a strong impact on the choice of a career. This also emphasizes the need to interact with users and include them in the decision-making.

If the current step is a job, the results obtained with the concept of the previous job and with that of the next job are close. The lower bound using the previous job (0.791) is higher than the upper bound using the previous diploma (0.740), showing a higher continuity between 2 consecutive steps in a professional career than in an academic one. Once again, using the user’s intention gives the best results. The MR decreases to 3.7 in this case.

With an interval ranging from 0.797 to 0.811, the MRR has been raised to 0.8. A part of the remaining 20% may be due to the simplicity of our methods; the rest must come from reorientations and their unpredictability. In Section 2, we chose not to define explicitly what a reorientation is. Now, we can consider it as the set of steps the system has not correctly predicted (those included in the least probable results). This claim is supported by many samples we found when analyzing logs of the decision process. For instance, someone working for years in hotels (tourism domain) suddenly and singularly moves to health care, which is the 13th most popular hypothesis out of 47. How could this proposal be predicted at a better rank, close to the MR of 3.7?

6. Conclusions and short term perspectives

In this paper, we have studied how to predict the next step of academic or career roads without taking into account a possible reorientation. We have used concepts induced from the future in order to simulate the fuzzy vision of the user intentions.

We are aware that using large categories, even if they are distinct enough, has some impact on the results. It will be straightforward to use a finer level of granularity such as the fields. For now, the system recommends for the current step an ordered list of concepts at a rough level. In order to cope with the non-linear distribution, we could model such a long tail by grouping the weak frequencies. Beyond this, we are currently working on the opposite angle: an RS designed to rank and find the best profiles matching a job concept. This will be a new opportunity to investigate how to include the principle of mobility in the model.

The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. The work is supported by ANRT France, Grant #3.


  • Bourdieu (1986) Pierre Bourdieu. 1986. L’illusion biographique. In Actes de la Recherche en Sciences Sociales. Persée, Lyon, France.
  • Cabrera-Diego et al. (2019) Luis Adrián Cabrera-Diego, Marc El-Bèze, Juan-Manuel Torres-Moreno, and Barthélémy Durette. 2019. Ranking résumés automatically using only résumés: A method free of job offers. Expert Systems with Applications 123 (2019), 91–107.
  • Desrosières and Thévenot (1988) Desrosières and Thévenot. 1988. Les catégories socioprofessionnelles.
  • J.A. et al. (2011) Baggerman J.A., Dekker R.M., and Mascuch M.J. 2011. Controlling Time and Shaping the Self: Developments in Autobiographical Writing since the 16th Century. Egodocuments and History Series, Vol. 3. Brill.
  • James et al. (2018) Charlotte James, Luca Pappalardo, Alina Sirbu, and Filippo Simini. 2018. Prediction of next career moves from scientific profiles. ArXiv [stat.AP] 1802, 04830 (2018), 36–44.
  • Kessler et al. (2012) Rémy Kessler, Nicolas Béchet, Mathieu Roche, Juan-Manuel Torres-Moreno, and Marc El-Bèze. 2012. A hybrid approach to managing job offers and candidates. Information Processing & Management 48, 6 (2012), 1124 – 1135.
  • Lejeune (1989) Philippe Lejeune. 1989. On Autobiography. Egodocuments and History Series, Vol. 52. Paul John Eakin.
  • Li et al. (2017) Liangyue Li, How Jing, Hanghang Tong, Jaewon Yang, Qi He, and Bee-Chung Chen. 2017. NEMO: Next Career Move Prediction with Contextual Embedding. In IW3C2. WWW 2017 Companion, Perth.
  • Negroni (2005) Catherine Negroni. 2005. La reconversion professionnelle volontaire : d’une bifurcation professionnelle à une bifurcation biographique. In Cahiers internationaux de sociologie. PUF, Paris, 311–331.
  • Spärck-Jones (1999) Karen Spärck-Jones. 1999. Information retrieval and artificial intelligence. Artificial Intelligence 114, 1-2 (1999), 257–281.
  • Thomas (2013) Amossé Thomas. 2013. Revisiting the History of Socio-professional Classification in France. Annales. Histoire - Sciences Sociales 4, 2 (2013).
  • Voorhees and Harman (2000) Ellen M. Voorhees and Donna Harman. 2000. Overview of the Eighth Text REtrieval Conference (TREC-8). 1–24.