Scientific Paper Recommendation: A Survey

08/10/2020
by Xiaomei Bai, et al.
Microsoft

Globally, recommendation services have become important because they support e-commerce applications and different research communities. Recommender systems have many applications in fields including economics, education, and scientific research. Different empirical studies have shown that recommender systems are more effective and reliable than keyword-based search engines for extracting useful knowledge from massive amounts of data. The problem of recommending similar scientific articles within the scientific community is called scientific paper recommendation. Scientific paper recommendation aims to recommend new or classical articles that match researchers’ interests. It has become an attractive area of study because the number of scholarly papers is increasing exponentially. In this survey, we first introduce the importance and advantages of paper recommender systems. Second, we review recommendation algorithms and methods, such as Content-Based methods, Collaborative Filtering methods, Graph-Based methods and Hybrid methods. Then, we introduce the evaluation methods of different recommender systems. Finally, we summarize open issues in paper recommender systems, including cold start, sparsity, scalability, privacy, serendipity and unified scholarly data standards. The purpose of this survey is to provide a comprehensive review of scholarly paper recommendation.

I Introduction

Recommendation has become increasingly important and has changed the way users and websites interact. Recommender systems have many applications in fields such as economics, education, and scientific research [1, 2, 3, 4]. The rapid development of information technology makes the volume of digital information grow quickly [5, 6]. Researchers search for and filter information such as movies, music, or articles through search engines like Google and Bing by using big data analysis techniques [7, 8, 9]. Some researchers share their research findings and publications on digital platforms, with free or fee-based access, for knowledge exchange [10]. This excess of information causes information overload and makes it difficult for researchers to judge the relevance of retrieved items and make the right decisions [5, 11]. Recommender systems have been introduced in scientific communities to retrieve information effectively [3, 12, 13, 14, 15, 16, 17]. In academic research, recommender systems can suggest papers to researchers and help them quickly find the papers they need. For instance, for junior researchers with limited publishing experience, recommender systems may recommend new and classical articles from related areas to broaden their horizons and research interests. In contrast, for senior researchers with stronger publication records, recommender systems mainly recommend papers that align with their research interests [18].

Recommending similar scientific articles to researchers is called scientific paper recommendation. Paper recommender systems aim to help researchers mitigate information overload and find relevant papers by scoring and ranking publication records, and then recommend the top papers associated with a researcher’s research interests or research focus [19]. Nowadays, paper recommender systems have become an indispensable tool in academia. Their recommendation algorithms are continuously updated, and recommendation accuracy has improved over time. Compared with the traditional keyword-based search technique, recommender systems are more personalized and effective for massive amounts of data [10, 12, 20, 21, 22, 23]. The results of keyword-based searching are not always suitable, and the number of returned items is relatively large [24], so researchers have to filter the results to get the items they need. Moreover, different researchers who input the same query obtain the same results, because keyword-based search does not consider users’ different interests and purposes. In addition, some researchers do not know how to summarize their requirements and end up entering inappropriate keywords. In comparison, paper recommender systems usually consider researchers’ interests, co-author relationships and citation relationships when designing recommendation algorithms and producing recommendation lists. As a result, recommendation results usually differ from researcher to researcher according to their interests, and the result lists can be kept short and controllable so that the recommender system is personalized and effective.

Since recommender systems were introduced, many recommendation algorithms have emerged [25, 26]. Recommendation techniques can be divided into four main categories: Content-Based Filtering (CBF), Collaborative Filtering (CF), Graph-Based methods (GB) and Hybrid methods. Each method has its own rationale for recommending interesting articles to researchers [25, 27]. CBF mainly uses a user’s historical preferences and personal library to extract and build an interest model, called the user profile [10]. CBF then extracts keywords from the candidate papers and calculates the similarity between the keywords of the user profile and those of each candidate paper. After ranking by similarity, the papers with the highest similarity are recommended to the user. CF mainly relies on the actions or ratings of other users whose profiles are similar to the target user’s, called “neighbour users” [28]. The assumption is that users who shared interests in the past are likely to agree in the future as well. There are also many studies on graph-based methods [29]. These studies construct a graph in which authors and papers are nodes, and the relationships between papers, between users, and between users and papers are edges. Random walks or other graph algorithms are then used to compute the relevance between users and papers. Hybrid recommender systems usually combine content-based filtering and collaborative filtering, because each of the two methods has its own advantages and disadvantages. Since the two methods complement each other, a recommender system that combines them is usually more accurate than one that runs a single recommendation algorithm. Apart from these methods, there are other paper recommendation techniques, such as the latent factor model [30] and the topic regression matrix factorization model [31].

Fig. 1: Main contents of scientific paper recommendation.

The main contributions in this survey include:

  1. Classification of commonly used scientific paper recommendation methods.

  2. In-depth analysis of the evaluation metrics for paper recommender systems.

  3. Summary of problems and challenges in paper recommender systems.

Fig. 1 shows the main structure of this paper, including recommendation methods, evaluation metrics, and open issues. In Section II, we discuss the existing recommendation methods and their research status, namely Content-Based Filtering, Collaborative Filtering, Graph-Based methods, and Hybrid methods. The evaluation metrics of recommender systems are introduced in detail in Section III. Section IV summarizes the problems and challenges in existing paper recommender systems, including cold start, sparsity, scalability, privacy, serendipity and unified scholarly data standards. In Section V, we present a summary of this paper.

II Paper Recommendation Methods

In this section, we overview and discuss the underlying rationale, advantages, disadvantages, and applications of paper recommendation methods.

II-A Content-Based Filtering (CBF)

As a traditional recommendation method, CBF has a simple rationale: the items it recommends are similar to the items the user is interested in [32]. Matching information between items and users is the key step. In paper recommender systems, items are the papers in a digital library and users are researchers. In the CBF method, a researcher’s papers are first collected. Papers citing the researcher’s work, or other information, can also be used to build the researcher’s profile. According to our statistics, there are many ways to build a researcher profile. For example, a researcher’s preferences and interests can be represented by keywords extracted from the researcher’s research field. Similarly, paper recommender systems can extract keywords from the title, abstract and content of candidate papers, retrieved from the digital library, to represent those papers. The system then computes the keyword similarity between the researcher profile and each candidate paper and ranks the candidates; those with the highest similarity are recommended to the researcher.

From this underlying rationale, we can identify some advantages of CBF. A CBF system extracts and compares paper information, so any paper related to the researcher’s interests can be discovered. Furthermore, unlike keyword-based search engines, CBF considers the current researcher’s interests and does not depend on other researchers. If the researcher’s interests change, the recommended lists will change accordingly. Fig. 2 shows the general structure of content-based recommender systems. As Fig. 2 shows, the CBF recommendation process includes three main steps: Item Representation, Profile Learning and Recommendation Generation.

Fig. 2: Content-based system for paper recommendation.

Item Representation. In practice, items need attributes that distinguish them from each other. These attributes fall into two main categories: structured and unstructured. A structured attribute takes a limited set of specific values. An unstructured attribute has a less clear meaning: its values are unlimited and cannot be analyzed directly. For example, on a dating site, an item is a person, who has structured attributes such as height, education and origin, and unstructured attributes such as a personal statement or blog content. Structured data can be used directly, making them easier to manage and use. Unstructured data (such as article content), on the other hand, usually need to be converted into structured data before use. In paper recommendation, papers share a similar overall structure, but their contents are unlimited and each author has his/her own writing style. To represent all papers and compute the similarity between them, we need to translate paper contents into structured representations. Since paper recommender systems were proposed, many item representation methods have been used, such as the TF-IDF model [33], the keyphrase extraction model [34], language models and so on.

The TF-IDF (term frequency-inverse document frequency) model is frequently used for information retrieval and text mining [33]. The TF-IDF value is a statistical measure of how important a word is to a document in a collection or corpus. The basic idea has two aspects. On one hand, the more often keyword $k$ appears in paper $p$, the more important $k$ is to $p$. On the other hand, the more papers $k$ appears in, the less useful $k$ is for distinguishing between papers. The equation is defined as follows [18]:

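$$\text{TF-IDF}(k, p) = f_{k,p} \times \log \frac{N}{n_k} \tag{1}$$

where $f_{k,p}$ is the frequency of keyword $k$ in paper $p$, $N$ is the number of papers in the candidate set, and $n_k$ is the number of candidate papers in which $k$ occurs.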
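A minimal Python sketch of Eq. (1), assuming papers are already tokenized into keyword lists and using the natural logarithm (the base only rescales the weights). The function name `tf_idf` and the toy corpus are illustrative, not taken from [18].

```python
import math
from collections import Counter


def tf_idf(papers):
    """Weight each keyword k of each paper p as f_{k,p} * log(N / n_k), following Eq. (1).

    papers: list of keyword lists, one per candidate paper (tokenization assumed done).
    Returns one dict per paper mapping keyword -> TF-IDF weight.
    """
    n_papers = len(papers)          # N: number of papers in the candidate set
    doc_freq = Counter()            # n_k: number of papers containing keyword k
    for tokens in papers:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in papers:
        counts = Counter(tokens)    # f_{k,p}: raw count of keyword k in paper p
        weights.append({k: f * math.log(n_papers / doc_freq[k]) for k, f in counts.items()})
    return weights


papers = [
    ["graph", "recommendation", "graph"],
    ["citation", "recommendation"],
    ["citation", "network"],
]
weights = tf_idf(papers)
print(weights[0])
```

In this toy corpus, “graph” occurs twice in the first paper and nowhere else, so it outweighs “recommendation”, which also appears in the second paper; a keyword present in every paper would get weight 0.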

CBF uses the TF-IDF model to calculate the feature vector $F^{p}$ of each candidate paper $p$ [18, 27]. These vectors can determine how relevant a research paper is to a researcher’s query [35]. $F^{p}$ is defined as:

$$F^{p} = (w_{t_1}, w_{t_2}, \dots, w_{t_m}) \tag{2}$$

where $m$ is the number of distinct terms in the paper, $t_i$ denotes each term, and $w_{t_i}$ is the TF-IDF weight of $t_i$ in $p$ given by Eq. (1). Two such vectors per paper can be used as different input queries. This model is popular in CBF recommender systems, and many researchers have adopted modified versions of it. Some researchers observe that when we read a paper, we may be curious about either the problem it addresses or its solution. They therefore use the TF-IDF model, a topic model and a concept-based topic model to compute similarity and find the most problem-related and solution-related papers, satisfying the researcher’s specific reading purpose in each case [36].

Apart from the TF-IDF model, a keyphrase extraction model (a keyphrase typically consists of one to three words) is used to produce a rich description of paper content [37]. The keyphrase list is a short list of phrases that reflects a paper’s content, capturing the main topics discussed and providing a brief summary. In this model, the title, abstract and keywords of a paper are represented by different vectors, $V_T$, $V_A$ and $V_K$, respectively [38]. $V_K$ is extracted from the paper’s “Keywords” section. If a paper has no “Keywords” section, the analysis system takes the most representative words as its keywords [39].
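The keyword fallback just described can be sketched as follows. This is a deliberately simple, hypothetical heuristic, not the extraction model of [37] or [39]: it returns the author keywords when present, and otherwise the most frequent one- to three-word phrases from the title and abstract that neither start nor end with a stopword. The stopword list, `keyword_vector` and `top_k` are assumptions for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the cited work).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to",
             "is", "are", "with", "we", "this", "by"}


def candidate_phrases(text, max_len=3):
    """Yield 1- to max_len-word n-grams that neither start nor end with a stopword."""
    words = re.findall(r"[a-z][a-z-]*", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def keyword_vector(paper, top_k=5):
    """Return the paper's own keywords, or the top_k most frequent phrases as a fallback."""
    if paper.get("keywords"):
        return [k.lower() for k in paper["keywords"]]
    counts = Counter(candidate_phrases(paper["title"] + " " + paper["abstract"]))
    # Break frequency ties in favour of longer phrases, then alphabetically.
    ranked = sorted(counts, key=lambda p: (-counts[p], -len(p.split()), p))
    return ranked[:top_k]


paper = {
    "title": "Matrix factorization for paper recommendation",
    "abstract": "We apply matrix factorization to paper recommendation.",
    "keywords": [],
}
print(keyword_vector(paper))
```

Preferring longer phrases on ties keeps “matrix factorization” ahead of its component words, which is closer to what a keyphrase list is meant to capture.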

Profile Learning. CBF recommender systems assume that researchers have rated some items as “Like” or “Dislike” and have published papers according to their individual interests. The objective of this step is to generate a profile model from researchers’ historical actions. Since the researcher profile usually captures the researcher’s research direction, the system can use this model to determine whether researcher $u$ will like a new item [40].

Naturally, a researcher profile should rely on information generated by the researcher. Various methods exist for building user profiles. Some previous work builds the user profile from a mixture of topics extracted from the researcher’s past publications by the LDA algorithm. Other work extracts the title, abstract and keyword vectors from the papers in the researcher’s history to build the profile. The user profile can be updated when the researcher publishes or rates new papers.

A tag-based information system uses a component named User Preference Crawler to crawl user preference data. Each user’s profile is constructed from the papers the user posted and the set of tags the user assigned [33, 41]. Similarly, tags and the set of documents tagged by researchers can be exploited by a keyphrase extraction module to build the user’s profile [42].

To personalize recommendations further, junior researchers who have published few papers can be distinguished from senior researchers with many publications [18, 27]. For a paper $p$, the feature vector $F^{p}$ is first defined using the TF part of the TF-IDF model. Its form is the same as Eq. (2):

$$F^{p} = (w_{t_1}, w_{t_2}, \dots, w_{t_m}) \tag{3}$$

where $m$ is the number of distinct terms in the paper, and each weight $w_{t_i}$, $i = 1, \dots, m$, is defined as follows:

$$w_{t_i} = tf(t_i, p) = \frac{f_{t_i,p}}{\sum_{j=1}^{m} f_{t_j,p}} \tag{4}$$

where $f_{t_i,p}$ is the frequency of term $t_i$ in paper $p$. After the feature vectors of papers are obtained, user profiles are constructed differently for two categories of researchers: junior and senior. For a junior researcher with only one paper $p_1$, the profile adds the contribution of the papers cited by $p_1$. For a senior researcher with several past publications, the profile adds the contribution of the papers citing each of the researcher’s papers and of the papers in each paper’s reference list. This method makes the profiles of both senior and junior researchers more specific.
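A sketch of the TF vectors of Eq. (4) and of the junior/senior profile construction, assuming each paper is a token list and that related papers (references, plus citing papers for seniors) contribute by plain summation with one hypothetical uniform `weight`; [18] may weight these contributions differently. Treating anyone with more than one paper as senior is also an assumption.

```python
from collections import Counter


def tf_vector(tokens):
    """Eq. (4): w_{t_i} = f_{t_i,p} / sum_j f_{t_j,p} over the distinct terms of paper p."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: f / total for term, f in counts.items()}


def add_into(profile, vector, weight=1.0):
    """Accumulate weight * vector into a sparse profile stored as a dict."""
    for term, value in vector.items():
        profile[term] = profile.get(term, 0.0) + weight * value


def build_profile(own_papers, references, citing, weight=1.0):
    """Sum the TF vectors of a researcher's papers and of related papers.

    own_papers: token lists of the researcher's own papers.
    references: dict paper index -> token lists of the papers it cites.
    citing: dict paper index -> token lists of papers citing it (used only for seniors).
    weight: hypothetical uniform weight for the contribution of related papers.
    """
    senior = len(own_papers) > 1    # assumption: more than one paper means "senior"
    profile = {}
    for idx, tokens in enumerate(own_papers):
        add_into(profile, tf_vector(tokens))
        for ref in references.get(idx, []):
            add_into(profile, tf_vector(ref), weight)
        if senior:
            for paper in citing.get(idx, []):
                add_into(profile, tf_vector(paper), weight)
    return profile


junior = build_profile([["a", "b"]], references={0: [["b", "c"]]}, citing={0: [["z"]]})
print(junior)  # the citing paper is ignored for a junior researcher
```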

All the profile learning methods introduced so far rely on researchers’ historical records or actions. Some recommender systems instead take papers provided by the researcher as input to build the user profile [43, 44]. The system extracts the information it needs from the provided paper’s title, introduction, related work, conclusion and references to determine the user’s profile. In addition, to satisfy a user’s specific reading purpose, the abstract is sometimes divided into two parts, a problem description and a solution description, so that the system can recommend papers from each aspect separately [36].

There are also other ways to represent the user profile. Docear is a recommender system whose distinctive feature is the use of mind maps for information management [45]. Docear users organize their data in tree-like structures, and the system builds a user model from the user’s mind-map collection to match against its digital library. The Docear recommender system has a component named UserInterface, which interacts with users and collects the title, author names, domain and topic of papers. The system then stores the collected data in XML format as the user profile, containing domains, topics and keywords [39, 46, 47].

Recommendation Generation. The representations of candidate papers and the profiles of researchers are used to select the items most relevant to each user. The relevance between a researcher’s attributes and a paper’s attributes can be obtained through a similarity measure such as cosine similarity. Given two attribute vectors $A$ and $B$, the cosine similarity is computed as follows [33]:

$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2}\,\sqrt{\sum_{i} B_i^2}} \tag{5}$$

To recommend papers, the system uses Eq. (5) to compute the cosine similarity between the user profile vector $P^{u}$ and the feature vector $F^{p}$ of each candidate paper defined above [18].
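Eq. (5) and the final ranking step can be sketched on sparse vectors stored as Python dicts. The profile and candidate vectors below are toy values, and `recommend` is a hypothetical helper name.

```python
import math


def cosine(a, b):
    """Eq. (5) on sparse vectors stored as dicts mapping term -> weight."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def recommend(profile, candidates, top_n=2):
    """Rank candidate papers (id -> feature vector) by cosine similarity to the profile."""
    scored = [(cosine(profile, vec), pid) for pid, vec in candidates.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [pid for _, pid in scored[:top_n]]


profile = {"graph": 1.0, "citation": 1.0}
candidates = {
    "p1": {"graph": 1.0, "citation": 1.0},
    "p2": {"graph": 1.0},
    "p3": {"music": 1.0},
}
print(recommend(profile, candidates))
```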

Some previous studies provide researchers not only with the most relevant papers but also with serendipitous recommendations of papers from distant fields [27]. Serendipitous recommendations help researchers discover new ideas, approaches or ways of thinking. In this line of work, a basic user profile is constructed for each researcher to recommend relevant papers, a second user profile is also constructed, and the cosine similarities of both profiles with the candidate papers are computed to generate recommendations. The result consists of two lists: related papers and unrelated papers.

After the similarity between the user profile and each candidate paper is computed, a result list is generated. The last step of the recommender system is to rank the candidates in a certain order, and the top papers in the final list are recommended to the researcher. When ranking candidate papers, the number of papers citing them is sometimes also considered [48].

Researchers can then use such a recommender system to find the papers they are interested in. However, CBF recommender systems still have some problems. On one hand, CBF does not take quality factors such as authoritativeness or style into account, because its analysis is based only on words. On the other hand, there is the new-user problem: if a junior researcher with little research experience uses the system, it may run ineffectively, because it cannot extract enough information from the user’s work, and the recommended list may not be reliable [49].

II-B Collaborative Filtering (CF)

Like CBF, CF needs to know users’ interests, and it is especially effective for recommending related papers even without content-based features [50]. The basic idea of CF is that if users $u$ and $v$ have rated some common items in a similar way, their interests are considered similar. If some items appear in user $v$’s record but not in user $u$’s, these items can be recommended to user $u$. In other words, CF recommends items using the opinions of other users [51]. The ratings or opinions can be obtained from social reference management websites such as CiteULike, or by asking users to fill in a questionnaire [52].

A collaborative filtering system locates peer users by examining a user’s rating history and finding similar users, and then uses this neighbourhood to generate recommendations. CF recommender systems usually need a user-item matrix to represent users’ ratings of, or comments on, items; the ratings represent users’ interests. After constructing the matrix, the system calculates the similarity between users to find similar users, called “neighbour users”, whose items are recommended. A user-item matrix is shown in Table I, where the elements are users’ ratings. In this matrix the ratings are 0 and 1, but more values can be used to express different degrees of like or dislike. The general structure of collaborative filtering systems is shown in Fig. 3.

Item1 Item2 Item3 Item4 ItemX
User1 0 1 0 1 1
User2 1 1 0 0 0
User3 1 0 1 1 0
UserY 1 0 0 1 1
TABLE I: User-Item Matrix

Fig. 3: Collaborative filtering system for paper recommendation.

Compared with content-based filtering, CF has some distinct advantages. The content of the recommended papers does not need to be analyzed, because recommendations depend on the ratings made by users rather than on what kinds of items are rated. Furthermore, the recommended items are not restricted to the user’s current research topics, because similarity is measured between users rather than between items.

CF methods fall into two main categories [53]:

  1. User-based approach: Users are at the center of the user-based approach. The recommender system uses the profiles of other, similar users to make recommendations [54]. User-based CF finds the user’s nearest neighbours and predicts the user’s interests from the neighbours’ interests [54]. Usually, users are divided into several groups whose members share the same or similar interests in some items, and the system recommends items to a user based on the ratings made by the other users in the same group.

  2. Item-based approach: The item-based method focuses on relationships between papers rather than between users [55, 56]. It assumes that a user’s interests are stable and change little in the future. If a user has given positive ratings to some items, the recommender system collects candidate items by analyzing the user’s rating history, and then recommends items by clustering them with similar items.

Depending on users’ needs, the techniques above collect the necessary data and recommend papers. Metadata from CiteULike, which contains many users and the tags they assign to papers, can be used to run a CF algorithm [57]. The algorithm is classical and simple. In user-based filtering, the target user is matched against the collected data to find neighbours with similar records; once the neighbours are found, all papers in the neighbours’ historical preferences become candidates for recommendation to the target user. In item-based filtering, the system recommends papers by matching papers against the target user’s historical records.

In user-based CF, the similarity between two users is calculated from their ratings of common items [58]. The equation is as follows:

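$$sim(u, v) = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^2}} \tag{6}$$

where $r$ denotes ratings, $u$ is the target user and $v$ is a neighbour user, $r_{u,i}$ is the rating given by user $u$ to item $i$, $\bar{r}_u$ is the average rating of user $u$ over all of his or her rated items, and $I_{uv}$ is the set of items rated by both user $u$ and user $v$. Social relations are often added to help find suitable neighbours. After the nearest neighbours are found, the next step is to predict the target user $u$’s rating of item $i$ [51], and the neighbour users’ articles are recommended to $u$ in order of predicted rating. The prediction formula is as follows:

$$\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in N(u)} sim(u, v)\,(r_{v,i} - \bar{r}_v)}{\sum_{v \in N(u)} \lvert sim(u, v) \rvert} \tag{7}$$

where $N(u)$ is the set of nearest neighbours of $u$ who have rated item $i$.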
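A small sketch of Eqs. (6) and (7) on a hypothetical 1 to 5 rating matrix, where `None` marks an unrated paper. Neighbours are restricted to users who rated the target paper, and the neighbourhood size `k` is an illustrative choice; the helper names are hypothetical.

```python
import math

# Hypothetical 1-5 ratings; None marks a paper the user has not rated.
ratings = {
    "u": {"p1": 5, "p2": 3, "p3": None, "p4": 1},
    "v": {"p1": 4, "p2": 2, "p3": 5, "p4": 1},
    "w": {"p1": 1, "p2": 4, "p3": 2, "p4": 5},
}


def rated(user):
    return {i: r for i, r in ratings[user].items() if r is not None}


def mean(user):
    r = rated(user)
    return sum(r.values()) / len(r)


def pearson(u, v):
    """Eq. (6): correlation over I_uv, the items both users rated."""
    ru, rv = rated(u), rated(v)
    common = ru.keys() & rv.keys()
    mu, mv = mean(u), mean(v)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = math.sqrt(sum((ru[i] - mu) ** 2 for i in common)) * math.sqrt(
        sum((rv[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0


def predict_rating(u, item, k=2):
    """Eq. (7): mean-centred weighted average over the k most similar users who rated item."""
    peers = [v for v in ratings if v != u and ratings[v].get(item) is not None]
    neighbours = sorted(peers, key=lambda v: pearson(u, v), reverse=True)[:k]
    num = sum(pearson(u, v) * (ratings[v][item] - mean(v)) for v in neighbours)
    den = sum(abs(pearson(u, v)) for v in neighbours)
    return mean(u) + num / den if den else mean(u)


print(round(predict_rating("u", "p3"), 2))
```

Here `v` agrees with `u` and rated p3 highly, while `w` disagrees with `u` and rated p3 low; the negative weight of `w` turns its low rating into evidence for a higher score, so the prediction lands above `u`’s mean rating of 3.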

Given a user-item matrix, matrix factorization models play an important role in collaborative filtering recommender systems [31]; they are used to predict the ratings of candidate papers.
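The text does not fix a particular factorization model, so the following is a generic sketch of plain matrix factorization trained by stochastic gradient descent on observed entries only, with L2 regularization. It is not the topic regression model of [31]; the learning rate, regularization strength, number of factors and toy ratings are illustrative assumptions.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) approximates r_ui.

    ratings: (user_index, item_index, rating) triples for observed entries only;
    unobserved cells of the user-item matrix are never used as training targets.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error plus the L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def mf_predict(P, Q, u, i):
    """Predicted rating of item i for user u: the dot product of their factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


# Hypothetical observed ratings (user, paper, rating) on a 3 x 4 user-item matrix.
train = [(0, 0, 5), (0, 1, 4), (1, 0, 5), (1, 1, 4), (2, 2, 5), (2, 3, 4)]
P, Q = factorize(train, n_users=3, n_items=4)
print(round(mf_predict(P, Q, 1, 2), 2))  # score for a (user, paper) pair never rated
```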

User-based CF algorithms have been used to recommend papers in social tagging systems [58]. This work summarizes the user-based collaborative filtering process in two steps: first, find the neighbours of the target user; second, use the neighbours to rank items and recommend the top items to the user [58]. To improve the quality of the recommended results, both steps have been refined [59]. In the neighbour-finding step, a similarity measure from [60] is used to obtain the neighbours of the target user. In the item-ranking step, a ranking model is used to calculate the predicted rating; it improves on the original method by also considering the number of raters of each item. The new predicted rating is computed by:

(8)

Moreover, scholarly papers can be recommended using social relationships such as friendship and research familiarity [61]. The user’s profile, group profile and the social relationships between users are also often considered when recommending scholarly papers. For example, a folksonomy-based method combines them to make recommendations, addressing the problem that researchers cannot find relevant scholarly papers in conferences and journals [62].

Like the user-based approach, item-based collaborative filtering has two steps: similarity computation and prediction generation [63]. In the first step, similarity measures such as cosine similarity or thematic similarity between the target item and the items rated by the target user are used to find the most similar items. In the second step, the prediction is computed as a weighted average of the target user’s ratings on these similar items.
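The two item-based steps can be sketched as below. Item similarity here is plain cosine over the users who rated both items (adjusted cosine, which first subtracts each user’s mean rating, is a common alternative), and the prediction is the similarity-weighted average of the target user’s own ratings on the `k` most similar items. The rating matrix and helper names are hypothetical.

```python
import math

# Hypothetical ratings: rows are users, columns are papers; None = not rated.
R = [
    [5, 4, None, 1],
    [4, 5, 2, 1],
    [1, 2, 5, 4],
    [5, 5, 1, None],
]


def item_cosine(i, j):
    """Step 1: cosine similarity of item columns i and j over users who rated both."""
    pairs = [(row[i], row[j]) for row in R if row[i] is not None and row[j] is not None]
    dot = sum(a * b for a, b in pairs)
    norm_i = math.sqrt(sum(a * a for a, _ in pairs))
    norm_j = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def predict_item_based(user, target, k=2):
    """Step 2: weighted average of the user's ratings on the k items most similar to target."""
    rated = [j for j, r in enumerate(R[user]) if r is not None and j != target]
    nearest = sorted(rated, key=lambda j: item_cosine(target, j), reverse=True)[:k]
    num = sum(item_cosine(target, j) * R[user][j] for j in nearest)
    den = sum(abs(item_cosine(target, j)) for j in nearest)
    return num / den if den else 0.0


print(round(predict_item_based(0, 2), 2))
```

For user 0 and paper 2, the most similar paper is paper 3, which user 0 rated 1, so the prediction is pulled low (roughly 2.2).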

To keep the results relevant, an improved item-based collaborative filtering system recommends papers rated by the connections of target user $u$. The recommended papers are not only similar to a target publication of interest to $u$ but also popular among $u$’s connections [28]. In this system, the target user’s connections, who exchange and share bibliographic references with $u$, are found first. Word correlation factors are then used to select candidate papers from the connections’ libraries that are similar to the target paper. Finally, the system recommends the papers with the highest ranking scores to $u$.

This overview of CF paper recommendation techniques shows that CF is a popular recommendation method. However, CF has some inherent disadvantages, the most obvious being the cold start problem. A new item without ratings cannot be recommended until someone rates it. For a new user with few ratings, the rating history is empty, and the system cannot find a similar neighbourhood until the user has made enough ratings. To overcome these problems, researchers have proposed other recommendation techniques, such as graph-based and hybrid methods.

II-C Graph-Based method (GB)

As the name suggests, the graph-based method focuses on constructing a graph. The graph can be built from citation networks, social networks and so on. Researchers and papers are the nodes, and the relationships between researchers, between researchers and papers, and between papers are the edges. The recommender system can then run an algorithm such as a random walk on the graph to find relevant papers for researchers. The advantage of GB is that it can draw on information from different sources, whereas CBF and CF use only one or two kinds of information. GB can add social relations and trust relationships between researchers to the recommender system to improve the recommendation results.

In the graph-based model, we first collect data about researchers and papers. The system then represents them with a heterogeneous graph $G = (V, E)$, where $V = U \cup P$, $U$ is the set of researchers in the system and $P$ is the set of papers published or referenced by them. For each researcher-paper tuple $(u, p)$, there is an edge $(u, p) \in E$ with $u \in U$ and $p \in P$. A simple graph-based model is shown in Fig. 4. Some graph-based recommender systems also include researcher-researcher edges $(u_i, u_j)$ and paper-paper edges $(p_i, p_j)$, capturing the relationships between researchers and between papers. In the graph-based model, paper recommendation is translated into a graph search task [64].

Fig. 4: A simple graph-based model.

In Fig. 4, the $u$ nodes stand for different researchers in the system and the $p$ nodes represent the papers they have published. The left part is the researcher behaviour data collected from a digital library: for example, one researcher published three papers and another published two. We use these behaviour data to build the network in the right part. After obtaining this bipartite graph of researchers and papers, the recommendation task becomes calculating the relevance between user vertices and the paper vertices not connected to them. Many algorithms have been proposed for recommending relevant papers to researchers in this setting [25, 65].
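The relevance between a researcher vertex and unconnected paper vertices can be estimated, for instance, with a random walk with restart. The sketch below is one generic way to do this, not the specific algorithm of [25] or [65]; the restart probability `alpha`, the iteration count and the toy edge list (loosely modelled on Fig. 4 plus a third researcher) are assumptions.

```python
def random_walk_with_restart(edges, start, alpha=0.15, iters=100):
    """Power iteration for a random walk with restart on an undirected graph.

    At each step the walker returns to `start` with probability alpha and otherwise
    moves to a uniformly chosen neighbour; scores are the long-run visit probabilities.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    score = {node: 0.0 for node in neighbours}
    score[start] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in neighbours}
        for node, s in score.items():
            share = (1 - alpha) * s / len(neighbours[node])
            for nb in neighbours[node]:
                nxt[nb] += share
        nxt[start] += alpha
        score = nxt
    return score


def recommend_papers(edges, researcher, top_n=2):
    """Rank papers not yet linked to the researcher by their random-walk score."""
    scores = random_walk_with_restart(edges, researcher)
    papers = {p for _, p in edges}                     # edges are (researcher, paper) pairs
    known = {p for r, p in edges if r == researcher}
    candidates = sorted(papers - known, key=lambda p: -scores[p])
    return candidates[:top_n]


# Hypothetical researcher-paper links.
edges = [("u1", "p1"), ("u1", "p2"), ("u1", "p3"),
         ("u2", "p2"), ("u2", "p4"),
         ("u3", "p4"), ("u3", "p5")]
print(recommend_papers(edges, "u1"))
```

Paper p4 is reached from u1 through the shared paper p2 and co-reader u2, so it outranks p5, which lies two further hops away.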

The recommendation process of graph-based recommender systems can be summarized in two steps: Graph Construction and Recommendation Generation.

Graph Construction. Nowadays, many digital materials are available for people to read and share. For academic research, researchers read and search for relevant papers in digital libraries such as IEEE Xplore and CiteULike, and data about users and papers can be collected from these websites to build the graph.

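For example, a relationship between a researcher and a paper means that the researcher is interested in that paper. A matrix $R$ indicates whether a researcher is interested in an article, as shown in Eq. (9), where $U = \{u_1, \dots, u_m\}$ is the set of researchers and $P = \{p_1, \dots, p_n\}$ is the set of articles. Common author relationships can also be added to the basic graph [12, 66, 25]. For these, another matrix $C$ indicates whether two articles $p_j$ and $p_k$ have common author(s), as shown in Eq. (10).

$$R_{ij} = \begin{cases} 1, & \text{if researcher } u_i \text{ is interested in article } p_j \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

$$C_{jk} = \begin{cases} 1, & \text{if articles } p_j \text{ and } p_k \text{ have common author(s)} \\ 0, & \text{otherwise} \end{cases} \tag{10}$$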

After the two matrices are obtained, they are transformed into a graph for further processing. Let $G = (V, E)$, where $V = U \cup P$ and $E = E_R \cup E_C$. $U$ and $P$ are the vertex sets of researchers and papers, $E_R$ is the set of interest relationships between researchers and papers, and $E_C$ is the set of common author relationships. If $R_{ij} = 1$, there is an edge between researcher $u_i$ and paper $p_j$; similarly, if $C_{jk} = 1$, there is an edge between papers $p_j$ and $p_k$. The resulting hybrid graph with co-author relationships is used to generate recommendations.
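A sketch of Eqs. (9) and (10) and of the edge set $E = E_R \cup E_C$, assuming each paper’s author list is available. Leaving the diagonal of $C$ at 0 (no self-loops) and the `u`/`p` node-naming scheme are illustrative choices; the resulting edge list can be handed to any graph-ranking procedure, such as a random walk.

```python
def common_author_matrix(authors):
    """Eq. (10): C[j][k] = 1 if papers j and k share an author, else 0 (diagonal left at 0)."""
    n = len(authors)
    return [[1 if j != k and set(authors[j]) & set(authors[k]) else 0 for k in range(n)]
            for j in range(n)]


def hybrid_edges(R, C):
    """Edge list E = E_R ∪ E_C: researcher-paper edges from R and paper-paper edges from C."""
    e_r = [(f"u{i}", f"p{j}") for i, row in enumerate(R) for j, v in enumerate(row) if v == 1]
    # C is symmetric, so each undirected paper-paper edge is kept once (j < k).
    e_c = [(f"p{j}", f"p{k}") for j, row in enumerate(C) for k, v in enumerate(row)
           if v == 1 and j < k]
    return e_r + e_c


authors = [["A"], ["A", "B"], ["C"]]      # hypothetical author lists of papers p0, p1, p2
R = [[1, 0, 0],                           # Eq. (9): researcher u0 is interested in p0
     [0, 0, 1]]                           # researcher u1 is interested in p2
C = common_author_matrix(authors)
print(hybrid_edges(R, C))
```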

Another heterogeneous graph, the Bi-Relational Graph (BG), can also be used to recommend papers [67]. Like the graph above, BG contains researchers and papers; in addition, it consists of a paper similarity subgraph, a researcher similarity subgraph, and a bipartite graph connecting researchers and papers.

The heterogeneous graphs above contain two kinds of vertices: researchers and papers. Another kind of graph is the citation graph (network), which contains only papers and the citation relationships between them: nodes represent papers and edges represent citations. The basic idea is that two papers are considered similar if they share common references or are both cited by the same paper [68]. Recommendations can therefore be derived by analyzing the structure of the citation network.
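The two notions of citation similarity above are usually called bibliographic coupling (shared references) and co-citation (being cited together). A minimal Python sketch, using hypothetical citation data and a function name of our own, counts both for a pair of papers:

```python
def coupling_and_cocitation(refs, a, b):
    """Two citation-based similarity counts for papers a and b.

    refs: dict paper -> set of papers it cites
    """
    # Bibliographic coupling: references that a and b have in common
    coupling = len(refs.get(a, set()) & refs.get(b, set()))
    # Co-citation: papers that cite both a and b
    cocitation = sum(1 for cited in refs.values() if a in cited and b in cited)
    return coupling, cocitation


# Hypothetical citation data
refs = {"A": {"X", "Y"}, "B": {"X", "Y", "Z"}, "C": {"A", "B"}, "D": {"A"}}
print(coupling_and_cocitation(refs, "A", "B"))  # (2, 1)
```

Raw counts are used here for clarity; real systems often normalize them, for example by the number of references of each paper.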

Based on the citation network, recommender systems can recommend papers to a user [65, 69]. Let the set of all papers $P$ form a citation graph, and for a paper $p$ let $\mathrm{Ref}(p) \subseteq P$ denote the papers cited by $p$; papers in $\mathrm{Ref}(p)$ are related to $p$. If a paper $q \in P$ is related to one or more papers in $\mathrm{Ref}(p)$, then $q$ is recommended to the user. Following a similar idea, a method combining the citation network with a content-based algorithm has been proposed [70]. In its weighted heterogeneous graph, the author part is replaced by a key-term graph containing the key terms extracted from each paper with the TF-IDF model. The weight of the citation relationship between two papers is the cosine similarity of their TF-IDF vectors, the weight of an edge between a key term and a paper is the term's TF-IDF score in that paper, and the weight of an edge between two terms is their similarity.
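Weighting citation edges by TF-IDF cosine similarity can be sketched as follows. This is a minimal sketch under our own assumptions, not the system of [70]: tokens are whitespace-split words, TF is normalized by document length, IDF is $\log(N/\mathrm{df})$, and the documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict paper -> non-empty list of tokens; returns paper -> {term: TF-IDF weight}."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for pid, tokens in docs.items():
        tf = Counter(tokens)
        # term frequency normalised by document length, times log inverse document frequency
        vectors[pid] = {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def weight_citation_edges(docs, citations):
    """Weight each citation edge (citing, cited) by the cosine similarity of TF-IDF vectors."""
    vecs = tfidf_vectors(docs)
    return {(a, b): cosine(vecs[a], vecs[b]) for a, b in citations}


docs = {
    "p1": "graph random walk ranking".split(),
    "p2": "random walk restart graph".split(),
    "p3": "deep learning text".split(),
}
weights = weight_citation_edges(docs, [("p1", "p2"), ("p1", "p3")])
```

Here the edge from p1 to p2 receives a positive weight because the papers share terms, while the edge from p1 to p3 receives weight 0.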

Moreover, co-author relationships between authors can be added to the citation network, giving a citation-collaborative network with three kinds of links: citation relationships, collaboration relationships and author-paper relationships [71].

The above are the main forms of graph construction. Other kinds of graphs, such as concept maps and hub-authority graphs, have also been used to recommend relevant papers from a set of candidates, either to researchers or for a given paper [29, 72, 73].

Recommendation Generation. The algorithms in graph-based paper recommender systems usually do not consider the content of papers or the researchers' profiles, because such features are not naturally represented as graph nodes in scholarly recommendation. In the graph, researchers and papers are the two kinds of nodes, and the paper recommender system exploits the graph's structure to find relevant papers.

Random walk with restart can be used to rank articles [12, 66, 25, 67]. In a traditional random walk, a walker traverses the graph starting from one vertex or a set of vertices; at each step it moves to a neighbour of the current vertex with probability $\alpha$ and jumps to a random vertex of the graph with probability $1-\alpha$. Each step yields a probability distribution giving the probability that each vertex of the graph is visited [74]. This distribution is the input to the next step, and the process is repeated iteratively; when certain conditions are satisfied, the distribution converges. Random walk with restart (RWR) modifies this algorithm: starting from a source node, the walker moves to a neighbour of the current vertex with probability $\alpha$ and returns to the source vertex with probability $1-\alpha$. In the bipartite researcher-paper network, RWR is used to compute the papers' rankings [25].
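The random walk with restart described above can be sketched by power iteration on a toy bipartite graph. This is a minimal sketch, not the implementation of [25]: the edges are hypothetical, the walker picks neighbours uniformly, and the parameter `restart` plays the role of $1-\alpha$.

```python
from collections import defaultdict


def bipartite_adjacency(edges):
    """Undirected adjacency lists from (researcher, paper) edges."""
    adj = defaultdict(list)
    for u, p in edges:
        adj[u].append(p)
        adj[p].append(u)
    return dict(adj)


def random_walk_with_restart(adj, source, restart=0.15, tol=1e-10, max_iter=1000):
    """Visiting probabilities of a walker that returns to `source` with probability `restart`.

    Every vertex in `adj` is assumed to have at least one neighbour.
    """
    nodes = list(adj)
    r = {v: 1.0 if v == source else 0.0 for v in nodes}
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for v in nodes:
            # with probability 1 - restart, move to a uniformly chosen neighbour
            share = (1 - restart) * r[v] / len(adj[v])
            for w in adj[v]:
                nxt[w] += share
        # with probability restart, jump back to the source vertex
        nxt[source] += restart
        if sum(abs(nxt[v] - r[v]) for v in nodes) < tol:
            return nxt
        r = nxt
    return r


# Hypothetical interest edges; u3-p4 forms a separate component
adj = bipartite_adjacency([("u1", "p1"), ("u1", "p2"), ("u2", "p2"), ("u2", "p3"), ("u3", "p4")])
scores = random_walk_with_restart(adj, "u1")
# Rank the papers that u1 is not yet connected to
candidates = sorted((v for v in adj if v.startswith("p") and v not in adj["u1"]),
                    key=scores.get, reverse=True)
print(candidates)  # ['p3', 'p4']
```

Paper p3 is reachable from u1 through the shared paper p2 and therefore receives a positive score, whereas p4 lies in another component and keeps a score of 0.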

Random walk models are also used in cross-domain recommender systems. For instance, one cross-domain recommender system uses a random walk to find users similar to the target user [75]. It first builds a network of users from their social relationships, assuming that target users tend to accept recommendations from friends with similar interests. A random walk then identifies the most similar users, whose ratings are used to predict the target user's ratings and generate the recommendation list. Cross-domain recommender systems link a source domain with a target domain, which can alleviate the cold start and sparsity problems [76] and improve recommendation quality.

PaperRank is widely used in recommender systems to calculate the relevance between papers in a citation network [69]. It extends the PageRank model to evaluate scientific papers while accounting for indirect relationships between papers [77]. Earlier citation analyses were simpler: the ISI Journal Impact Factor only averages the citation counts of a journal's published articles and returns a ranking of journals [78], and papers have been ranked by their number of direct citations [79]. PaperRank instead applies PageRank with papers in place of web pages [80]. The PaperRank value of each paper is computed by the following equation:

$$PR(p_i) = \frac{1-d}{N} + d \sum_{j=1}^{N} c_{ji}\,\frac{PR(p_j)}{L(p_j)} \tag{11}$$

where $p_1, \dots, p_N$ are the papers in the citation network, $PR(p_i)$ is the PageRank value (i.e., ranking score) of paper $p_i$, $L(p_j)$ is the number of references of paper $p_j$, $d$ is the damping coefficient, and $c_{ji}$ indicates whether paper $p_j$ cites paper $p_i$: $c_{ji} = 1$ if $p_i$ is cited by $p_j$, and $c_{ji} = 0$ otherwise. The resulting values express the importance of individual papers.
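Equation (11) can be iterated directly: each paper's score is $(1-d)/N$ plus $d$ times the sum, over the papers that cite it, of the citing paper's score divided by its number of references. The Python sketch below follows that form on a hypothetical citation network. As in the equation as written, a paper without references does not pass its score on, so the values need not sum to 1; full PageRank implementations usually redistribute that mass.

```python
def paper_rank(cites, d=0.85, tol=1e-12, max_iter=1000):
    """Iterate PR(p_i) = (1 - d)/N + d * sum over citing p_j of PR(p_j)/L(p_j).

    cites: dict paper -> set of papers it cites; every cited paper must also be a key.
    """
    papers = list(cites)
    n = len(papers)
    pr = {p: 1.0 / n for p in papers}
    for _ in range(max_iter):
        new = {}
        for pi in papers:
            # c_ji = 1 exactly when p_j cites p_i, and then L(p_j) = len(cites[pj]) > 0
            incoming = sum(pr[pj] / len(cites[pj]) for pj in papers if pi in cites[pj])
            new[pi] = (1 - d) / n + d * incoming
        if sum(abs(new[p] - pr[p]) for p in papers) < tol:
            return new
        pr = new
    return pr


# Hypothetical citation network: B cites A, C cites A and B, D cites A and C
scores = paper_rank({"A": set(), "B": {"A"}, "C": {"A", "B"}, "D": {"A", "C"}})
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C', 'D']
```

Paper A, cited by all other papers, ranks first, and D, which nobody cites, keeps only the base score $(1-d)/N$.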

Graph-based methods (GB) thus recommend papers by exploiting the structure of the graph, that is, the relationships between its nodes.

II-D Hybrid Method (HM)

To improve the accuracy of recommendation results, some scientific paper recommender systems combine two or more recommendation techniques to recommend personalized papers to researchers [81]. The main advantage of HM is that it can draw on several recommendation techniques and on information from many sources. In this section, we introduce several hybrid recommendation techniques. Fig. 5 shows a hybrid paper recommender system combining content-based and collaborative filtering methods.

Fig. 5: A hybrid paper recommendation system.

Content-based+Collaborative Filtering. Content-based and collaborative filtering methods each have their own advantages and disadvantages. Several studies have combined the two in different ways to improve paper recommendation and overcome shortcomings such as the first-rater and sparsity problems [10, 82, 83].

One hybrid recommender system combines content-based and collaborative filtering techniques: the content-based part builds a researcher's profile from the research interests embodied in his/her past publications, and the collaborative filtering part discovers potential citation papers [83, 84]. The recommendation process has three steps. First, the system builds the researcher's profile from his/her published papers using a term frequency (TF) scheme, computes a feature vector for each candidate paper using TF-IDF, and selects the N papers with the highest cosine similarity to the profile. Second, for these papers, a CF algorithm operates on the paper-citation matrix to find further potential papers, based on the idea that similar papers have similar citations; the similarity between a paper's citation vector and that of the target paper is measured by the Pearson correlation coefficient, and the papers most similar to the target paper are kept as candidates. Finally, the content cosine similarity of the candidates is computed to rank them [10, 85, 86, 87]. By combining the two methods, this system is reported to outperform classic recommender systems.
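The second (CF) step can be sketched on its own: compare citation vectors with the Pearson correlation coefficient and keep the papers most correlated with the target paper. This is a minimal sketch, not the system of [83, 84]; the paper-citation matrix and the cutoff `k` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def similar_by_citations(matrix, target, k=2):
    """Return the k papers whose citation vectors correlate most with the target's."""
    scores = {p: pearson(matrix[target], vec) for p, vec in matrix.items() if p != target}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical paper-citation matrix: each row marks which of five cited papers a paper cites
matrix = {
    "t": [1, 1, 0, 0, 1],
    "a": [1, 1, 0, 0, 0],
    "b": [0, 0, 1, 1, 0],
    "c": [1, 1, 0, 1, 1],
}
print(similar_by_citations(matrix, "t"))  # ['a', 'c']
```

Paper b cites exactly the papers that t does not, so its correlation with t is -1 and it is excluded.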

Building on traditional recommendation techniques, several modified algorithms have emerged, such as CBF-Separated, CF-CBF Separated and CBF-CF Parallel [88]. CBF-Separated builds on pure CBF: it recommends related papers not only for the target paper itself but also for each of its references, and merges these lists into a single list for the researcher. In CF-CBF Separated, CF first generates a list of candidate papers, and CBF then produces further recommendations based on that list. CBF-CF Parallel runs CF and CBF in parallel and combines their result lists through an ordering function that determines the final order. All these hybrid algorithms were reported to perform better than single recommendation techniques [88].
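The text does not specify the ordering function of CBF-CF Parallel, so the Python sketch below assumes a simple Borda-style rule: a paper at position `pos` of a list of length `n` earns `n - pos` points, points from both lists are added, and ties are broken alphabetically. The function name and input lists are hypothetical.

```python
def merge_parallel(cbf_list, cf_list, top_n=5):
    """Merge two ranked lists with an assumed Borda-style ordering function."""
    scores = {}
    for ranked in (cbf_list, cf_list):
        for pos, paper in enumerate(ranked):
            # higher positions earn more points; papers found by both methods accumulate points
            scores[paper] = scores.get(paper, 0) + (len(ranked) - pos)
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


print(merge_parallel(["p1", "p2", "p3"], ["p2", "p4", "p1"]))  # ['p2', 'p1', 'p4', 'p3']
```

A rank-based rule avoids having to calibrate the raw scores of CF and CBF against each other; a score-based alternative is the weighted sum used later for content-based and graph-based results.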

In addition, there are some special hybrid methods, such as collaborative filtering with a latent factor model and a probabilistic topic model [19], the spreading activation model [89], the EIHI algorithm [90] and the FP-growth algorithm [91]. These hybrid methods were reported to outperform their baseline methods.

Latent factor models are used in collaborative filtering to recommend papers according to the historical records or interests of other users whose interests are similar to the target user's; this model is used to recommend known papers [19]. The spreading activation model is used in content-based and user-based collaborative filtering methods to find users with interests similar to the target user's [89]. The EIHI algorithm is designed for dynamic datasets, such as a constantly growing digital library of published papers [90]; embedding EIHI into a content-based paper recommender system keeps the recommendations up to date and personalized. To guarantee both the content relevance and the quality of the recommended papers, CBF can first retrieve all possibly relevant papers in the library, after which multi-criteria collaborative filtering selects high-quality papers from the CBF result [92].

Content-based+Graph-based. Combining content-based and graph-based methods can outperform the classic recommendation methods, because the content-based method builds the user profile from the content of papers the user is interested in, while the graph-based method can find more potential candidate papers from the structure of a citation network or a bipartite graph.

Content-based techniques combined with a citation network can recommend the most relevant papers from a digital library [93]. Their bipartite graph has two layers: the paper layer connects papers through citation relationships, and the researcher layer connects researchers through social relationships. To make recommendations more accurate, a hybrid article recommendation method integrating social information was proposed [94]. It uses three types of trust relationships between researchers $u_a$ and $u_b$: (1) basic trust, which arises when the libraries of $u_a$ and $u_b$ overlap; (2) the trust value of $u_b$ increases if $u_b$ is an author of some papers in $u_a$'s library; (3) topic trust, meaning that $u_a$ trusts $u_b$'s knowledge of a specific topic. Candidate papers (CP) are obtained from the structure of the bipartite graph: the system selects CP from the libraries of the current researchers. When building researchers' profiles, junior and senior researchers are distinguished, and the interests of both groups are represented by feature vectors obtained by analyzing paper content with the TF-IDF model. The ranking of CP considers the similarity between the CP's feature vectors and the researcher's profile, the trust between the CP's owner and the current researcher, the citation count of the CP, and the reputation of its authors.

The two methods can also be run separately and their results combined afterwards. The content-based method uses the TF-IDF model to obtain feature vectors of the candidate papers and computes the cosine similarity between the candidate papers and the papers in the target user's record. The graph-based method runs Belief Propagation (BP) and other algorithms on the classic citation network to infer the user's preferences and rank papers. The hybrid approach then assigns different weights to the result lists of the two methods. Let $S_{CB}(p)$ be the score of paper $p$ from the content-based method and $S_{GB}(p)$ its score from the graph-based method; the hybrid score is computed as follows:

$$S(p) = \alpha\, S_{CB}(p) + \beta\, S_{GB}(p) \tag{12}$$

where $\alpha$ and $\beta$ are the weights of the two methods. This combination can address the over-specialization problem and the new item problem of the classic methods.
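The weighted combination of equation (12), hybrid score = α × content-based score + β × graph-based score, can be sketched in a few lines of Python. The weights and score dictionaries are hypothetical, and a paper missing from one method's results is assumed to score 0 there.

```python
def weighted_hybrid(s_cb, s_gb, alpha=0.6, beta=0.4, top_n=3):
    """Return the top_n papers by alpha * S_CB(p) + beta * S_GB(p), plus all hybrid scores."""
    papers = set(s_cb) | set(s_gb)
    # papers returned by only one method get score 0 from the other method
    hybrid = {p: alpha * s_cb.get(p, 0.0) + beta * s_gb.get(p, 0.0) for p in papers}
    ranking = sorted(hybrid, key=lambda p: (-hybrid[p], p))[:top_n]
    return ranking, hybrid


s_cb = {"p1": 0.9, "p2": 0.2, "p3": 0.4}  # hypothetical content-based scores
s_gb = {"p2": 0.8, "p3": 0.6, "p4": 0.7}  # hypothetical graph-based scores
top, hybrid = weighted_hybrid(s_cb, s_gb)
print(top)  # ['p1', 'p3', 'p2']
```

Because the two methods may score on different scales, normalizing each score set (for example, to [0, 1]) before weighting is a common precaution.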

HM thus covers many combinations of techniques. Its aim is to improve the quality of recommendation results by exploiting the strengths of different techniques while overcoming their weaknesses, and its central problem is how to combine techniques effectively.

II-E Others

Apart from the methods above, researchers have proposed other paper recommendation techniques, such as a modified latent factor model [30], hash maps [95] and bibliographic coupling [96]. This section introduces some of these techniques.

As mentioned for hybrid techniques, a latent factor model can also represent the content of papers. The model takes as input the user-item matrix, the papers' content (title, abstract), their attributes (author, publication year) and the social network. It then applies a modified topic model over the content and attributes to represent users and papers. Matrix factorization predicts the rating $\hat{r}_{ij}$ of user $i$ for paper $j$ from the user vector $u_i$, the paper vector $v_j$ (which incorporates the topic-modelling results) and the user-item matrix [97]. The recommendation list consists of the papers with the highest predicted ratings.
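The prediction step can be illustrated with plain matrix factorization trained by stochastic gradient descent. This sketch omits the topic-modelling and attribute parts of [97] entirely and assumes the standard inner-product predictor $\hat{r}_{ij} = u_i^\top v_j$; the ratings, learning rate and regularization values are hypothetical.

```python
import random


def predict(U, V, i, j):
    """Predicted rating: inner product of user vector U[i] and paper vector V[j]."""
    return sum(a * b for a, b in zip(U[i], V[j]))


def factorize(ratings, n_users, n_papers, k=2, lr=0.05, reg=0.01, epochs=2000, seed=0):
    """Plain matrix factorization trained by stochastic gradient descent.

    ratings: list of (user index, paper index, observed rating) triples.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_papers)]
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - predict(U, V, i, j)
            for f in range(k):
                u_f, v_f = U[i][f], V[j][f]
                # gradient step on the squared error with L2 regularisation
                U[i][f] += lr * (err * v_f - reg * u_f)
                V[j][f] += lr * (err * u_f - reg * v_f)
    return U, V


# Hypothetical implicit-interest scores for 3 users and 4 papers
ratings = [(0, 0, 1.0), (0, 1, 0.8), (1, 1, 0.9), (1, 2, 0.2), (2, 0, 0.9), (2, 3, 0.1)]
U, V = factorize(ratings, n_users=3, n_papers=4)
# Score an unseen (user, paper) pair, e.g. user 1 and paper 0
print(round(predict(U, V, 1, 0), 2))
```

Unseen pairs are then ranked by their predicted values to form the recommendation list.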

In research paper recommendation, the number of researchers is much smaller than the number of papers, so the citation matrix or user-item matrix contains many empty elements. To avoid this problem, non-sparse matrices are used to represent the citation graph, and locality-sensitive hashing (LSH) is used to construct a representation of the citations in a paper [95]. Table II and Table III show an example of the traditional and the non-sparse matrix representations of a citation network.

P1 P2 P3 P4
C1 0 1 0 1
C2 1 1 0 0
C3 1 0 1 1
C4 1 0 0 0
C5 0 1 1 0
TABLE II: A matrix representing papers and their citations.
P1 P2 P3 P4
C2 C1 C3 C1
C3 C2 C5 C3
C4 C5 – –
TABLE III: A non-sparse matrix of Table II.

In Table II, the columns represent the citing papers and the rows represent the cited papers. The matrix is sparse because it must contain a row for every cited paper, while each citing paper cites only some of them. Table III shows the non-sparse matrix; Tables II and III represent the same citation relationships, for example, P1 cites C2, C3 and C4, and P2 cites C1, C2 and C5. Each row of the non-sparse matrix is associated with a hash function, and the similarity between papers is computed from these functions.
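The conversion from Table II to Table III, and one common way of hashing such citation lists, can be sketched in Python. The hashing part is an assumption: it uses MinHash, the signature step often used in LSH, whose agreement rate estimates the Jaccard similarity of two citation sets; [95] may use a different scheme. The function names, the number of hash functions and the seed are hypothetical.

```python
import random

# Table II as a dense 0/1 matrix: rows are cited papers C1..C5, columns are citing papers P1..P4
CITED = ["C1", "C2", "C3", "C4", "C5"]
CITING = ["P1", "P2", "P3", "P4"]
DENSE = [
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
]


def to_non_sparse(dense):
    """Keep only the cited papers of each citing paper (the layout of Table III)."""
    return {p: [CITED[r] for r in range(len(CITED)) if dense[r][c]]
            for c, p in enumerate(CITING)}


def minhash_signature(cited, n_hashes=64, seed=1):
    """MinHash over a non-empty citation list; the same seed gives the same hash functions."""
    rng = random.Random(seed)
    prime = 2_147_483_647
    params = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(n_hashes)]
    ids = [CITED.index(c) for c in cited]
    # for each hash function h(x) = (a*x + b) mod prime, keep the minimum over the cited papers
    return [min((a * x + b) % prime for x in ids) for a, b in params]


def estimated_similarity(sig1, sig2):
    """Fraction of agreeing signature rows, an estimate of the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


table3 = to_non_sparse(DENSE)
sigs = {p: minhash_signature(c) for p, c in table3.items()}
print(table3["P1"])  # ['C2', 'C3', 'C4']
```

The signatures have a fixed length regardless of how many papers exist, which is what makes this representation compact compared with the sparse matrix.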

Moreover, other techniques have been applied in scientific paper recommender systems. To improve the performance of CBF, one approach uses CBF as a pre-processing step [98], after which a Long Short-Term Memory (LSTM) network learns a semantic representation of the candidate papers [99]; finally, the top N papers with the highest content and semantic similarity to the input paper are recommended. To help junior researchers read more classic papers online, two principles (download persistence and citation approaching) have been proposed to determine whether a paper is a classic paper to be recommended to junior researchers [100]. A Citation Authority Diffusion (CAD) methodology has been proposed to identify key papers [101]. Techniques such as Multi-Criteria Decision Aiding [102, 103], Bibliographic Coupling [96], Belief Propagation (BP) [92, 104], deep learning [24], Canonical Correlation Analysis (CCA) [105] and Singular Value Decomposition (SVD) [106] have also been used to recommend papers.

II-F Comparisons of common techniques

Having introduced the recommendation techniques found in the papers we collected, we compare the three most common ones, Content-Based Filtering, Collaborative Filtering and the Graph-Based method, in Table IV.

Technique | Advantage | Disadvantage
Content-Based Filtering | Every paper can be included in the similarity computation; results match users' personal preferences | Considers only word relevance, so result quality is uncertain; new user problem
Collaborative Filtering | Recommendation results may be serendipitous; the quality of results can be guaranteed | Cold start problem; sparsity problem
Graph-Based method | Draws on different sources to recommend | Does not consider papers' content or users' interests
TABLE IV: Comparisons of common techniques

Table IV summarizes the advantages and disadvantages of CBF, CF and GB. The techniques partly complement each other: for example, CF can address the uncertain result quality of CBF, but it suffers from the cold start and sparsity problems. Hybrid methods aim to combine the advantages of these techniques while avoiding their disadvantages: CBF has been combined with CF to make recommender systems more effective, and CBF has also been combined with GB to recommend papers.

III Evaluation Methods

As described in Section II, many techniques are used in scientific paper recommender systems. All of them can provide researchers with papers related to an input query or to the researchers' profiles. The more recommendation techniques are proposed, the more important their evaluation becomes [107, 108]. The choice of evaluation metrics depends on the type of recommendation technique [109], and the evaluation results determine whether a technique applied in a recommender system is effective. In this section, we review the evaluation methods used for recommender systems. The most frequently used metrics are shown in Table V.

Metric Precision Recall NDCG MRR MAP F1
Number of reviewed papers using it 20 20 14 9 5 4
TABLE V: Classification of evaluation methods

From Table V, we can see that Precision and Recall are the most frequently used evaluation metrics in the papers we reviewed. Many paper recommender systems use more than one metric to evaluate their recommendation techniques. Apart from the metrics in Table V, the reviewed papers also use some less common metrics, such as RMSE, MAE and UCOV, which are introduced at the end of this section.

Precision: it measures how accurately a recommender system recommends relevant papers to researchers. The equation is:

$$\mathrm{Precision} = \frac{|\{\text{relevant papers}\} \cap \{\text{recommended papers}\}|}{|\{\text{recommended papers}\}|}$$

A larger value indicates more accurate recommendations. To avoid assessing every paper in the recommendation result, a modified version, P@m, computes precision over only the top m papers of the list [106].

Recall: it quantifies the fraction of all relevant papers that appear in the recommendation list. Its equation is as follows:

$$\mathrm{Recall} = \frac{|\{\text{relevant papers}\} \cap \{\text{recommended papers}\}|}{|\{\text{relevant papers}\}|}$$

The denominator is fixed because the number of relevant papers in the library is fixed. Similar to P@m, the modified version R@m counts only the relevant papers in the top m of the ranking list, so its value depends on the ranking algorithm of the recommender system: a larger value means that the system is better at placing the most relevant papers at the top of the list.

F1: it accounts for the fact that Precision and Recall can conflict with each other [25]. From their equations, when the recommended list becomes longer, Recall may grow while Precision may drop. F1 considers both and gives their harmonic mean:

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

Since Precision and Recall lie in [0, 1], so does F1, and a higher value means that the paper recommender system is more effective.
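A minimal Python sketch of these three metrics for one ranked list follows; the optional cutoff `m` gives P@m and R@m. The function name and the example lists are hypothetical.

```python
def precision_recall_f1(recommended, relevant, m=None):
    """Precision, Recall and F1 of a ranked list; with m set, only the top m papers count."""
    top = recommended[:m] if m is not None else recommended
    hits = len(set(top) & set(relevant))
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


recommended = ["p1", "p2", "p3", "p4"]  # hypothetical ranked list
relevant = {"p1", "p3", "p5"}           # hypothetical ground truth
p, r, f = precision_recall_f1(recommended, relevant)            # 0.5, 2/3, 4/7
p2, r2, f2 = precision_recall_f1(recommended, relevant, m=2)    # 0.5, 1/3, 0.4
```

Shortening the list from four to two papers leaves Precision unchanged here but halves Recall, which illustrates the trade-off that F1 balances.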

NDCG: it evaluates the quality of a ranked recommendation list [89]. To compute NDCG, the discounted cumulative gain (DCG), averaged over users, is computed first:

$$\mathrm{DCG} = \frac{1}{|U|}\sum_{u \in U}\sum_{i=1}^{K}\frac{g_{u,i}}{\max(1, \log_b i)}$$

where $U$ is the set of users of the paper recommender system, $|U|$ is the number of users in $U$, $K$ is the number of papers recommended to each user, $i$ is the position of a recommended paper in the list, $b$ is a constant (the base of the logarithmic discount), and $g_{u,i}$ is the "gain" that user $u$ obtains from the paper at position $i$. Based on DCG, NDCG is defined as:

$$\mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}$$

where IDCG is the DCG of the ideal ranking, in which the papers are sorted by decreasing gain. The gain depends on the quality of the recommended paper: if the user considers the paper highly relevant to his/her research, the gain is high; otherwise the gain is 0. NDCG is therefore higher when the most relevant papers appear at the top of the recommended list.
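The discounted gain can be computed in a few lines of Python. This sketch assumes the discount $\max(1, \log_b i)$ with $b = 2$, and it normalizes each user's list separately before averaging, a common variant of the user-averaged definition above; the gains are hypothetical.

```python
import math


def dcg(gains, b=2):
    """Discounted cumulative gain; positions i < b are not discounted, later ones by log_b(i)."""
    return sum(g / max(1.0, math.log(i, b)) for i, g in enumerate(gains, start=1))


def ndcg(gains, b=2):
    """gains: graded relevance of each recommended paper, in recommended order."""
    ideal = dcg(sorted(gains, reverse=True), b)  # DCG of the best possible ordering
    return dcg(gains, b) / ideal if ideal > 0 else 0.0


def mean_ndcg(gains_per_user, b=2):
    """Average the per-user NDCG values over all users."""
    return sum(ndcg(g, b) for g in gains_per_user) / len(gains_per_user)


# Hypothetical gains for two users (0 = not relevant)
print(mean_ndcg([[3, 0, 2], [0, 1, 0]]))
```

For the first user, moving the gain-2 paper from position 3 to position 2 would raise NDCG to 1, which shows how the metric rewards placing relevant papers early.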

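AP: average precision was introduced to overcome the single-value limitation of the three metrics above (Precision, Recall and F1), which summarize a list at a single cutoff and ignore where the relevant papers are ranked. It is computed from each user's recommended list, and averaging it over all users gives MAP [110]. For a user $u$, AP is defined as:

$$AP_u = \frac{1}{R_u}\sum_{k=1}^{N} P_u(k)\,\mathrm{rel}_u(k)$$

where $R_u$ is the number of papers relevant to $u$, $N$ is the number of papers in the recommended list, $P_u(k)$ is the precision of the results from the top of the list down to the $k$-th paper, and $\mathrm{rel}_u(k)$ equals 1 if the $k$-th paper is relevant to $u$ and 0 otherwise [10].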

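MAP: it averages the average precision values of all users:

$$\mathrm{MAP} = \frac{1}{|U|}\sum_{u \in U} AP_u$$

where $AP_u$ is the average precision of user $u$ and $|U|$ is the number of users involved in the recommender system.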
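A minimal Python sketch of AP and MAP follows. Its inputs are assumptions: each user contributes one recommended list and one set of relevant papers, and the example data are hypothetical.

```python
def average_precision(recommended, relevant):
    """AP = (1/R) * sum over positions k of P(k) * rel(k) for one user's ranked list."""
    hits, total = 0, 0.0
    for k, paper in enumerate(recommended, start=1):
        if paper in relevant:
            hits += 1
            total += hits / k  # precision of the top k results, counted at a relevant paper
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (recommended list, relevant set) pairs, one per user."""
    return sum(average_precision(rec, rel) for rec, rel in runs) / len(runs)


runs = [(["p1", "p2", "p3"], {"p1", "p3"}),  # AP = (1/1 + 2/3) / 2
        (["p4", "p5"], {"p5"})]              # AP = (1/2) / 1
print(mean_average_precision(runs))
```

Unlike Precision@m, AP changes if a relevant paper moves up or down the list, even when the set of recommended papers stays the same.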

MRR: like NDCG, this metric evaluates the quality of ranked recommendation lists. It considers only the rank of the relevant (target) paper in each list and averages the reciprocal ranks:

$$\mathrm{MRR} = \frac{1}{|Q|}\sum_{i=1}^{|Q|}\frac{1}{\mathrm{rank}_i}$$

where $|Q|$ is the number of target papers and $\mathrm{rank}_i$ is the rank of the $i$-th target paper.
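A Python sketch of MRR, assuming one ranked list per target paper and treating a target that does not appear in its list as contributing 0 (a common convention, not stated in the text); the data are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of each target paper in its list; a missing target contributes 0."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)  # ranks are 1-based
    return total / len(targets)


print(mean_reciprocal_rank([["a", "b", "c"], ["x", "y"]], ["b", "x"]))  # 0.75
```

Here the first target is ranked second (reciprocal rank 0.5) and the second is ranked first (1.0), giving an average of 0.75.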

These metrics evaluate paper recommendation algorithms from different aspects and are popular among recommender-system researchers; a good recommender system should score well on them. In addition, some evaluation metrics are applied only rarely.

RMSE: the root mean square error measures the difference between the true rating values and the values predicted by the recommender system [55]. It is computed over the test set as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(r_i - \hat{r}_i\right)^2}$$

where $r_i$ is the true rating value, $\hat{r}_i$ is the predicted rating value and $n$ is the number of ratings in the test set. The lower the RMSE, the stronger the predictive power of the recommender system.

MAE: similar to RMSE, the mean absolute error evaluates the accuracy of the predictions made by recommendation algorithms [92]. It is calculated by the following equation:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|p_i - r_i\right|$$

where $N$ is the number of predictions, $p_i$ is the predicted rating of paper $i$ and $r_i$ is its true value. The lower the MAE, the more accurately the recommender system predicts ratings.
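Both error metrics can be sketched directly from their definitions; the ratings below are hypothetical.

```python
import math


def rmse(true, pred):
    """Root mean square error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def mae(true, pred):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


true_ratings = [4, 3, 5]  # hypothetical test ratings
predicted = [3, 3, 3]     # hypothetical predictions; errors are 1, 0 and 2
print(mae(true_ratings, predicted))  # 1.0
```

With errors of 1, 0 and 2, MAE is 1.0 while RMSE is $\sqrt{5/3} \approx 1.29$: squaring makes RMSE penalize the single large error more heavily.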

UCOV: because of the nature of recommendation algorithms, some users may not get useful information, that is, relevant papers, from the recommender system. User coverage measures the fraction of users who do, and its equation is simple:

$$\mathrm{UCOV} = \frac{N_r}{N}$$

where $N_r$ is the number of users who receive relevant recommendations and $N$ is the number of all users in the system [110]. A good recommender system should therefore be useful to most users, not only to a particular group of users.
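A Python sketch of user coverage follows. It assumes, hypothetically, that a user counts as covered when at least one recommended paper is relevant to him/her; the text does not fix this criterion, and the data are illustrative.

```python
def user_coverage(recommendations, relevant):
    """recommendations, relevant: dict user -> collection of papers.

    A user counts as covered if at least one recommended paper is relevant (assumed criterion).
    """
    covered = sum(1 for u, recs in recommendations.items()
                  if set(recs) & set(relevant.get(u, ())))
    return covered / len(recommendations)


recs = {"u1": ["p1"], "u2": ["p2"], "u3": []}
truth = {"u1": {"p1"}, "u2": {"p9"}}
print(user_coverage(recs, truth))  # only u1 is covered
```

A system can score well on average accuracy while covering few users, which is why UCOV complements the ranking metrics above.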

IV Open Issues and Challenges

In the previous sections, we discussed the recommendation methods and evaluation methods of scientific paper recommender systems. Although these systems can provide researchers with useful papers through their own recommendation algorithms, they still have problems that need to be solved. In this section, we discuss open issues and challenges of existing paper recommender systems, including Cold Start, Sparsity, Scalability, Privacy, Serendipity and Unified data standards.

IV-A Cold Start

The cold start problem affects new papers and new users in recommender systems [111]. On the one hand, recommender systems based purely on collaborative filtering face both the new paper and the new user problems [112]. For a new user who has no research record or rarely rates the papers he/she reads in the digital library, user-based CF cannot accurately find similar users (neighbours). A newly published paper has been read and rated by few researchers, so it is hard to distinguish among the many papers in the library and to recommend to the right researchers. On the other hand, content-based recommender systems represent all papers through content analysis and compute the similarity between papers and user profiles, which overcomes the new paper problem. However, CBF must analyze a researcher's historical record of papers he/she is interested in; if it cannot extract enough information to build the user profile, the recommendations are unreliable.

IV-B Sparsity

Many recommender systems assume that the number of users is larger than, or comparable to, the number of papers in the digital library, so that the recommendation algorithms can run effectively. In practice, however, there are fewer users than papers, and even the most popular papers may have only a few ratings. When the user-item rating matrix is built for collaborative filtering, it turns out to be very sparse, with too few ratings and too few correlations between users [113]. If most papers have few ratings and each user rates only a few papers, it is hard to find similar neighbours for users. This is one of the most obvious disadvantages of collaborative filtering based recommender systems.

IV-C Scalability

Scalability in recommender systems refers to whether a system can keep working effectively when the numbers of users and items become very large. Today's digital library datasets are very large, and the state of the papers in them changes over time [111]; many papers and users are added every day. Handling such large and dynamic datasets is challenging for recommender systems. Traditional recommendation methods such as CBF and CF usually deal with static datasets, whereas newer learning algorithms such as EIHI can handle dynamic datasets [90]. Ideally, every recommender system should overcome the scalability problem.

IV-D Privacy

Paper recommender systems provide personalized recommendations by exploiting users' personal information. As recommender systems have become widely used in academia to address information overload [114], most personalized recommender systems collect as much user information as possible. Because this information usually includes sensitive data that users wish to keep private, users may form a negative impression if the system knows too much about them [111]. An important question is therefore how to improve recommendation algorithms while making full but careful use of limited data. To address this problem, several secure recommender systems have been proposed to protect users' private information [40, 114].

IV-E Serendipity

Traditional paper recommender systems usually provide users with papers relevant to their interests or research [83]. However, papers outside these interests can also benefit users. For example, junior researchers need to read various kinds of papers to broaden their research scope and find the topics that interest them most, and senior researchers need new knowledge from other areas to enrich their own studies [27]. Serendipitous recommendations can therefore be useful, but if a recommender system returns only serendipitous papers and no related ones, users may consider it unreliable. Collaborative filtering based systems can provide serendipitous results because their algorithms ignore the content of papers and recommend items using only the “neighbours”.

IV-F Unified Scholarly Data Standards

Part of big scholarly data comes from academic platforms such as Google Scholar, Web of Science and the Digital Bibliography & Library Project (DBLP); the rest comes from online datasets such as the Microsoft Academic Graph and the American Physical Society (APS) dataset. These sources have their own characteristics: for example, the DBLP dataset does not contain citation relationships, whereas the APS dataset provides a list of citation relationships between papers. Such heterogeneous data types make building paper recommender systems considerably harder, and unifying big scholarly data standards remains a challenging task.

V Conclusion

Recommender systems play an important role in information retrieval and filtering. This paper surveys scientific paper recommender systems in the academic domain. First, we classify them into four groups according to their recommendation techniques: content-based filtering, collaborative filtering, graph-based methods and hybrid methods. Our analysis shows that content-based and hybrid methods are the techniques most often used in paper recommender systems. For each technique, we investigate the underlying rationale, advantages, disadvantages and applications. Second, we introduce the metrics used to evaluate the performance of paper recommender systems: Precision, Recall, F-measure, NDCG, MAP, MRR, MAE and UCOV. Finally, we discuss open issues and challenges that remain to be solved, including cold start, sparsity, scalability, privacy, serendipity and unified scholarly data standards.

References

  • [1] X. Kong, M. Mao, W. Wang, J. Liu, and B. Xu, “Voprec: Vector representation learning of papers with text information and structural identity for recommendation,” IEEE Transactions on Emerging Topics in Computing, vol. PP, no. 99, pp. 1–12, 2018.
  • [2] S. Yu, J. Liu, Z. Yang, Z. Chen, H. Jiang, A. Tolba, and F. Xia, “Pave: Personalized academic venue recommendation exploiting co-publication networks,” Journal of Network & Computer Applications, vol. 104, pp. 38–47, 2018.
  • [3] A. J. C. Trappey, C. V. Trappey, C. Y. Wu, C. Y. Fan, and Y. L. Lin, “Intelligent patent recommendation system for innovative design collaboration,” Journal of Network & Computer Applications, vol. 36, no. 6, pp. 1441–1450, 2013.
  • [4] J. Son and S. B. Kim, “Academic paper recommender system using multilevel simultaneous citation networks,” Decision Support Systems, vol. 105, pp. 24–33, 2018.
  • [5] F. Xia, W. Wang, T. M. Bekele, and H. Liu, “Big scholarly data: A survey,” IEEE Transactions on Big Data, vol. 3, no. 1, pp. 18–35, 2017.
  • [6] Q. Liu, M. Zhou, and X. Zhao, “Understanding news 2.0: A framework for explaining the number of comments from readers on online news,” Information & Management, vol. 52, no. 7, pp. 764–776, 2015.
  • [7] A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Y. Zomaya, S. Foufou, and A. Bouras, “A survey of clustering algorithms for big data: Taxonomy and empirical analysis,” IEEE Transactions on Emerging Topics in Computing, vol. 2, no. 3, pp. 267–279, 2014.
  • [8] F. Xia, J. Wang, X. Kong, Z. Wang, J. Li, and C. Liu, “Exploring human mobility patterns in urban scenarios: A trajectory data perspective,” IEEE Communications Magazine, vol. 56, no. 3, pp. 142–149, 2018.
  • [9] J. Liu, X. Kong, F. Xia, X. Bai, L. Wang, Q. Qing, and I. Lee, “Artificial intelligence in the 21st century,” IEEE Access, vol. 6, no. 99, pp. 34 403–34 421, 2018.
  • [10] J. Sun, J. Ma, Z. Liu, and Y. Miao, “Leveraging content and connections for scientific article recommendation in social computing contexts,” The Computer Journal, vol. 57, no. 9, pp. 1331–1342, 2014.
  • [11] S. J. Miah, H. Q. Vu, J. Gammack, and M. Mcgrath, “A big data analytics method for tourist behaviour analysis,” Information & Management, vol. 54, no. 6, pp. 771–785, 2016.
  • [12] H. Liu, Z. Yang, I. Lee, Z. Xu, S. Yu, and F. Xia, “CAR: Incorporating filtered citation relations for scientific article recommendation,” in Proceedings of 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity).   IEEE, 2015, pp. 513–518.
  • [13] Y. Wang, G. Yin, Z. Cai, Y. Dong, and H. Dong, “A trust-based probabilistic recommendation model for social networks,” Journal of Network & Computer Applications, vol. 55, pp. 59–67, 2015.
  • [14] F. Aznoli and N. J. Navimipour, “Cloud services recommendation: Reviewing the recent advances and suggesting the future research directions,” Journal of Network & Computer Applications, vol. 77, pp. 73–86, 2016.
  • [15] L. Zhu, C. Xu, J. Guan, and H. Zhang, “SEM-PPA: A semantical pattern and preference-aware service mining method for personalized point of interest recommendation,” Journal of Network & Computer Applications, vol. 82, pp. 35–46, 2017.
  • [16] J. Bollen, M. L. Nelson, G. Geisler, and R. Araujo, “Usage derived recommendations for a video digital library,” Journal of Network & Computer Applications, vol. 30, no. 3, pp. 1059–1083, 2007.
  • [17] D. Cao, X. He, L. Nie, X. Wei, X. Hu, S. Wu, and T.-S. Chua, “Cross-platform app recommendation by jointly modeling ratings and texts,” ACM Transactions on Information Systems, vol. 35, no. 4, p. 37, 2017.
  • [18] K. Sugiyama and M.-Y. Kan, “Scholarly paper recommendation via user’s recent research interests,” in Proceedings of the 10th annual joint conference on Digital libraries.   ACM, 2010, pp. 29–38.
  • [19] C. Wang and D. M. Blei, “Collaborative topic modeling for recommending scientific articles,” in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011, pp. 448–456.
  • [20] H. Feng, J. Tian, H. J. Wang, and M. Li, “Personalized recommendations based on time-weighted overlapping community detection,” Information & Management, vol. 52, no. 7, pp. 789–800, 2015.
  • [21] J. He, H. Liu, and H. Xiong, “SocoTraveler: Travel-package recommendations leveraging social influence of different relationship types,” Information & Management, vol. 53, no. 8, pp. 934–950, 2016.
  • [22] T. Dai, T. Gao, L. Zhu, X. Cai, and S. Pan, “Low-rank and sparse matrix factorization for scientific paper recommendation in heterogeneous network,” IEEE Access, vol. 6, pp. 59 015–59 030, 2018.
  • [23] R. Sharma, D. Gopalani, and Y. Meena, “Concept-based approach for research paper recommendation,” in International Conference on Pattern Recognition and Machine Intelligence.   Springer, 2017, pp. 687–692.
  • [24] H. A. M. Hassan, “Personalized research paper recommendation using deep learning,” in Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization.   ACM, 2017, pp. 327–330.
  • [25] F. Xia, H. Liu, I. Lee, and L. Cao, “Scientific article recommendation: Exploiting common author relations and historical preferences,” IEEE Transactions on Big Data, vol. 2, no. 2, pp. 101–112, 2016.
  • [26] T. Song, C. Yi, and J. Huang, “Whose recommendations do you follow? an investigation of tie strength, shopping stage, and deal scarcity,” Information & Management, vol. 54, no. 8, pp. 1072–1083, 2017.
  • [27] K. Sugiyama and M.-Y. Kan, “Serendipitous recommendation for scholarly papers considering relations among researchers,” in Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries.   ACM, 2011, pp. 307–310.
  • [28] M. S. Pera and Y.-K. Ng, “Exploiting the wisdom of social connections to make personalized recommendations on scholarly articles,” Journal of Intelligent Information Systems, vol. 42, no. 3, pp. 371–391, 2014.
  • [29] W. Zhao, R. Wu, and H. Liu, “Paper recommendation based on the knowledge gap between a researcher’s background knowledge and research target,” Information Processing & Management, vol. 52, no. 5, pp. 976–988, 2016.
  • [30] W. Zhang, J. Wang, and W. Feng, “Combining latent factor model with location features for event-based group recommendation,” in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013, pp. 910–918.
  • [31] Y. Li, M. Yang, and Z. M. Zhang, “Scientific articles recommendation,” in Proceedings of the 22nd ACM International Conference on Information & Knowledge Management.   ACM, 2013, pp. 1147–1156.
  • [32] M. J. Pazzani and D. Billsus, “Content-based recommendation systems,” in The Adaptive Web.   Springer, 2007, pp. 325–341.
  • [33] P. Jomsri, S. Sanguansintukul, and W. Choochaiwattana, “A framework for tag-based research paper recommender system: An IR approach,” in Proceedings of 2010 IEEE 24th International Conference on Advanced Information Networking and Applications Workshops (WAINA).   IEEE, 2010, pp. 103–108.
  • [34] C. Caragea, F. A. Bulgarov, A. Godea, and S. D. Gollapalli, “Citation-enhanced keyphrase extraction from research papers: A supervised approach,” in Conference on Empirical Methods in Natural Language Processing, 2014, pp. 1435–1446.
  • [35] S. Philip, P. Shola, and A. Ovye, “Application of content-based approach in research paper recommendation system for a digital library,” International Journal of Advanced Computer Science and Applications, vol. 5, no. 10, pp. 37–40, 2014.
  • [36] Y. Jiang, A. Jia, Y. Feng, and D. Zhao, “Recommending academic papers via users’ reading purposes,” in Proceedings of the sixth ACM Conference on Recommender Systems.   ACM, 2012, pp. 241–244.
  • [37] J. Beel, S. Langer, B. Gipp, and A. Nürnberger, “The architecture and datasets of Docear’s research paper recommender system,” D-Lib Magazine, vol. 20, no. 1, pp. 11–12, 2014.
  • [38] C. Basu, H. Hirsh, W. W. Cohen, and C. Nevill-Manning, “Technical paper recommendation: A study in combining multiple information sources,” Journal of Artificial Intelligence Research, vol. 14, no. 1, pp. 231–252, 2001.
  • [39] K. Hong, H. Jeon, and C. Jeon, “Userprofile-based personalized research paper recommendation system,” in Proceedings of the 8th International Conference on Computing and Networking Technology (ICCNT).   IEEE, 2012, pp. 134–138.
  • [40] T. Chen, W.-L. Han, H.-D. Wang, Y.-X. Zhou, B. Xu, and B.-Y. Zang, “Content recommendation system based on private dynamic user profile,” in Proceedings of 2007 International Conference on Machine Learning and Cybernetics, vol. 4.   IEEE, 2007, pp. 2112–2118.
  • [41] J. Gautam and E. Kumar, “An improved framework for tag-based academic information sharing and recommendation system,” in Proceedings of the World Congress on Engineering, vol. 2.   IEEE, 2012, pp. 1–6.
  • [42] F. Ferrara, N. Pudota, and C. Tasso, “A keyphrase-based paper recommender system,” in Italian Research Conference on Digital Libraries.   Springer, 2011, pp. 14–25.
  • [43] C. Nascimento, A. H. Laender, A. S. da Silva, and M. A. Gonçalves, “A source independent framework for research paper recommendation,” in Proceedings of the 11th annual International ACM/IEEE Joint Conference on Digital Libraries.   ACM, 2011, pp. 297–306.
  • [44] D. Hanyurwimfura, L. Bo, V. Havyarimana, D. Njagi, and F. Kagorora, “An effective academic research papers recommendation for non-profiled users,” International Journal of Hybrid Information Technology, vol. 8, no. 3, pp. 255–272, 2015.
  • [45] J. Beel, S. Langer, M. Genzmehr, and A. Nürnberger, “Introducing Docear’s research paper recommender system,” in Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries.   ACM, 2013, pp. 459–460.
  • [46] K. Hong, H. Jeon, and C. Jeon, “Personalized research paper recommendation system using keyword extraction based on userprofile,” Journal of Convergence Information Technology, vol. 8, no. 16, pp. 106–116, 2013.
  • [47] S. Patil and P. Ansari, “User profile based personalized research paper recommendation system using top-K query,” International Journal of Emerging Technology and Advanced Engineering, vol. 5, pp. 209–213, 2015.
  • [48] M. S. Pera and Y.-K. Ng, “A personalized recommendation system on scholarly publications,” in Proceedings of the 20th ACM international Conference on Information and Knowledge Management.   ACM, 2011, pp. 2133–2136.
  • [49] M. Balabanović and Y. Shoham, “Fab: content-based, collaborative recommendation,” Communications of the ACM, vol. 40, no. 3, pp. 66–72, 1997.
  • [50] A. Vellino, “Recommending research articles using citation data,” Library Hi Tech, vol. 33, no. 4, pp. 597–609, 2015.
  • [51] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen, “Collaborative filtering recommender systems,” in The adaptive web.   Springer, 2007, pp. 291–324.
  • [52] T. Y. Tang and G. Mccalla, “A multidimensional paper recommender: Experiments and evaluations,” IEEE Internet Computing, vol. 13, no. 4, pp. 34–41, 2009.
  • [53] M. K. C. Rana, “Survey paper on recommendation system,” International Journal of Computer Science and Information Technologies, vol. 3, pp. 3460–3462, 2012.
  • [54] L. Xu, C. Jiang, Y. Chen, Y. Ren, and K. J. R. Liu, “User participation in collaborative filtering-based recommendation systems: A game theoretic approach,” IEEE Transactions on Cybernetics, vol. PP, no. 99, pp. 1–14, 2018.
  • [55] J. Beel, B. Gipp, S. Langer, and C. Breitinger, “Research-paper recommender systems: A literature survey,” International Journal on Digital Libraries, vol. 17, no. 4, pp. 305–338, 2016.
  • [56] D. Valcarce, J. Parapar, and Á. Barreiro, “Item-based relevance modelling of recommendations for getting rid of long tail products,” Knowledge-Based Systems, vol. 103, no. 1, pp. 41–51, 2016.
  • [57] T. Bogers and A. Van den Bosch, “Recommending scientific articles using citeulike,” in Proceedings of the 2008 ACM Conference on Recommender Systems.   ACM, 2008, pp. 287–290.
  • [58] D. Parra-Santander and P. Brusilovsky, “Improving collaborative filtering in social tagging systems for the recommendation of scientific articles,” in IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2010, pp. 136–142.
  • [59] G. Mishra, “Optimised research paper recommender system using social tagging,” International Journal of Engineering Research & Applications, pp. 1503–1507, 2014.
  • [60] R. R. Larson, “Introduction to information retrieval,” Journal of the American Society for Information Science and Technology, vol. 61, no. 4, pp. 852–853, 2010.
  • [61] N. Y. Asabere, F. Xia, Q. Meng, F. Li, and H. Liu, “Scholarly paper recommendation based on social awareness and folksonomy,” International Journal of Parallel, Emergent and Distributed Systems, vol. 30, no. 3, pp. 211–232, 2015.
  • [62] F. Xia, N. Y. Asabere, H. Liu, N. Deonauth, and F. Li, “Folksonomy based socially-aware recommendation of scholarly papers for conference participants,” in Proceedings of the 23rd International Conference on World Wide Web.   ACM, 2014, pp. 781–786.
  • [63] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative filtering recommendation algorithms,” in Proceedings of the 10th international Conference on World Wide Web.   ACM, 2001, pp. 285–295.
  • [64] Z. Huang, W. Chung, T.-H. Ong, and H. Chen, “A graph-based recommender system for digital library,” in Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries.   ACM, 2002, pp. 65–73.
  • [65] Q. Zhou, X. Chen, and C. Chen, “Authoritative scholarly paper recommendation based on paper communities,” in Proceedings of 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE).   IEEE, 2014, pp. 1536–1540.
  • [66] R. Prabhu, D. K., J. D., and M. Priya, “A survey on enhancing scientific article recommendation using citation relationships and online graph regularized graph learning,” International Journal for Scientific Research & Development, vol. 4, no. 12, pp. 621–625, 2017.
  • [67] G. Tian and L. Jing, “Recommending scientific articles using bi-relational graph-based iterative RWR,” in Proceedings of the 7th ACM Conference on Recommender Systems.   ACM, 2013, pp. 399–402.
  • [68] H. Liu, X. Kong, X. Bai, W. Wang, T. M. Bekele, and F. Xia, “Context-based collaborative filtering for citation recommendation,” IEEE Access, vol. 3, pp. 1695–1703, 2015.
  • [69] M. Gori and A. Pucci, “Research paper recommender systems: A random-walk based approach,” in IEEE/WIC/ACM International Conference on Web Intelligence.   IEEE, 2006, pp. 778–781.
  • [70] L. Steinert, I.-A. Chounta, and H. U. Hoppe, “Where to begin? Using network analytics for the recommendation of scientific papers,” in CYTED-RITOS International Workshop on Groupware.   Springer, 2015, pp. 124–139.
  • [71] Q. Wang, W. Li, X. Zhang, and S. Lu, “Academic paper recommendation based on community detection in citation-collaboration networks,” in Asia-Pacific Web Conference.   Springer, 2016, pp. 124–136.
  • [72] I. C. Paraschiv, M. Dascalu, P. Dessus, S. Trausan-Matu, and D. S. McNamara, “A paper recommendation system with ReaderBench: The graphical visualization of semantically related papers and concepts,” in State-of-the-Art and Future Directions of Smart Learning.   Springer, 2016, pp. 445–451.
  • [73] M. Ohta, A. Takasu, and T. Hachiki, “Related paper recommendation to support online-browsing of research papers,” in Proceedings of the Fourth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT).   IEEE, 2011, pp. 130–136.
  • [74] F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens, “Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 355–369, 2007.
  • [75] Z. Xu, H. Jiang, X. Kong, J. Kang, W. Wang, and F. Xia, “Cross-domain item recommendation based on user similarity,” Computer Science and Information Systems, vol. 13, no. 2, pp. 359–373, 2016.
  • [76] J. Niu, L. Wang, X. Liu, and S. Yu, “FUIR: Fusing user and item information to deal with data sparsity by using side information in recommendation systems,” Journal of Network & Computer Applications, vol. 70, pp. 41–50, 2016.
  • [77] M. Du, F. Bai, and Y. Liu, “PaperRank: A ranking model for scientific publications,” in Proceedings of 2009 WRI World Congress on Computer Science and Information Engineering, vol. 4.   IEEE, 2009, pp. 277–281.
  • [78] E. Garfield, “Citation analysis as a tool in journal evaluation,” Science, vol. 178, no. 4060, pp. 471–479, 1972.
  • [79] ——, “New international professional society signals the maturing of scientometrics and informetrics,” Scientist, vol. 9, no. 16, pp. 11–11, 1995.
  • [80] T. H. Haveliwala, “Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 784–796, 2003.
  • [81] A. Tsolakidis, E. Triperina, C. Sgouropoulou, and N. Christidis, “Research publication recommendation system based on a hybrid approach,” in Proceedings of the 20th Pan-Hellenic Conference on Informatics.   ACM, 2016, pp. 78–83.
  • [82] P. Winoto, T. Y. Tang, and G. I. McCalla, “Contexts in a paper recommendation system with collaborative filtering,” The International Review of Research in Open and Distributed Learning, vol. 13, no. 5, pp. 56–75, 2012.
  • [83] K. Sugiyama and M.-Y. Kan, “Exploiting potential citation papers in scholarly paper recommendation,” in Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries.   ACM, 2013, pp. 153–162.
  • [84] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl, “An algorithmic framework for performing collaborative filtering,” in Proceedings of the 22nd annual international ACM SIGIR conference on Research and Development in Information Retrieval.   ACM, 1999, pp. 230–237.
  • [85] M. Zhang, W. Wang, and X. Li, “A paper recommender for scientific literatures based on semantic concept similarity,” in International Conference on Asian Digital Libraries.   Springer, 2008, pp. 359–362.
  • [86] B. Gipp, J. Beel, and C. Hentschel, “Scienstein: A research paper recommender system,” in Proceedings of the International Conference on Emerging Trends in Computing (ICETIC), 2009, pp. 309–315.
  • [87] M. Amami, R. Faiz, F. Stella, and G. Pasi, “A graph based approach to scientific paper recommendation,” in Proceedings of the International Conference on Web Intelligence.   ACM, 2017, pp. 777–782.
  • [88] B. A. Hammou, A. A. Lahcen, and S. Mouline, “APRA: An approximate parallel recommendation algorithm for big data,” Knowledge-Based Systems, vol. 157, no. 1, pp. 10–19, 2018.
  • [89] Z. Zhang and L. Li, “A research paper recommender system based on spreading activation model,” in Proceedings of 2010 2nd International Conference on Information Science and Engineering (ICISE).   IEEE, 2010, pp. 928–931.
  • [90] M. Dhanda and V. Verma, “Recommender system for academic literature with incremental dataset,” Procedia Computer Science, vol. 89, pp. 483–491, 2016.
  • [91] T. Igbe and B. Ojokoh, “Incorporating user preferences into scholarly publications recommendation,” Intelligent Information Management, vol. 8, no. 02, pp. 27–40, 2016.
  • [92] A. Naak, H. Hage, and E. Aimeur, “A multi-criteria collaborative filtering approach for research paper recommendation in papyres,” in International Conference on E-Technologies.   Springer, 2009, pp. 25–39.
  • [93] J. Beel, A. Aizawa, C. Breitinger, and B. Gipp, “Mr. DLib: Recommendations-as-a-Service (RaaS) for academia,” in Proceedings of 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL).   IEEE, 2017, pp. 1–2.
  • [94] G. Wang, X. R. He, and C. I. Ishuga, “HAR-SI: A novel hybrid article recommendation approach integrating with social information in scientific social network,” Knowledge-Based Systems, vol. 148, no. 15, pp. 85–99, 2018.
  • [95] A. R. Honarvar and S. Keshavarz, “A parallel paper recommender system in big data scholarly,” in International Conference on Electrical Engineering and Computer, 2015, pp. 1–8.
  • [96] R. Habib and M. T. Afzal, “Paper recommendation using citation proximity in bibliographic coupling,” Turkish Journal of Electrical Engineering & Computer Sciences, vol. 25, no. 4, pp. 2708–2718, 2017.
  • [97] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, 2009.
  • [98] K. M. Ravi, J. Mori, and I. Sakata, “Cross-domain academic paper recommendation by semantic linkage approach using text analysis and recurrent neural networks,” in Proceedings of 2017 Portland International Conference on Management of Engineering and Technology (PICMET).   IEEE, 2017, pp. 1–10.
  • [99] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • [100] Y. Wang, E. Zhai, J. Hu, and Z. Chen, “CLAPER: Recommend classical papers to beginners,” in Proceedings of 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), vol. 6.   IEEE, 2010, pp. 2777–2781.
  • [101] C.-H. Chen, S. D. Mayanglambam, F.-Y. Hsu, C.-Y. Lu, H.-M. Lee, and J.-M. Ho, “Novelty paper recommendation using citation authority diffusion,” in Proceedings of 2011 International Conference on Technologies and Applications of Artificial Intelligence (TAAI).   IEEE, 2011, pp. 126–131.
  • [102] N. F. Matsatsinis, K. Lakiotaki, and P. Delia, “A system based on multiple criteria analysis for scientific paper recommendation,” in Proceedings of the 11th Panhellenic Conference on Informatics.   Springer, 2007, pp. 135–149.
  • [103] N. Manouselis and C. Costopoulou, “Analysis and classification of multi-criteria recommender systems,” World Wide Web, vol. 10, no. 4, pp. 415–441, 2007.
  • [104] J. Ha, S.-H. Kwon, S.-W. Kim, and D. Lee, “Recommendation of newly published research papers using belief propagation,” in Proceedings of the 2014 Conference on Research in Adaptive and Convergent Systems.   ACM, 2014, pp. 77–81.
  • [105] S. Gupta and V. Varma, “Scientific article recommendation by using distributed representations of text and graph,” in Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 1267–1268.
  • [106] J. Ha, S.-H. Kwon, and S.-W. Kim, “On recommending newly published academic papers,” in Proceedings of the 26th ACM Conference on Hypertext & Social Media.   ACM, 2015, pp. 329–330.
  • [107] J. Beel, M. Genzmehr, S. Langer, A. Nürnberger, and B. Gipp, “A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation,” in Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation.   ACM, 2013, pp. 7–14.
  • [108] J. Beel and S. Langer, A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems.   Springer International Publishing, 2015.
  • [109] F. Isinkaye, Y. Folajimi, and B. Ojokoh, “Recommendation systems: Principles, methods and evaluation,” Egyptian Informatics Journal, vol. 16, no. 3, pp. 261–273, 2015.
  • [110] D. Parra-Santander and P. Brusilovsky, “Improving collaborative filtering in social tagging systems for the recommendation of scientific articles,” in Proceedings of 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), vol. 1.   IEEE, 2010, pp. 136–142.
  • [111] B. Kumar and N. Sharma, “Approaches, issues and challenges in recommender systems: A systematic review,” Indian Journal of Science and Technology, vol. 9, no. 47, pp. 1–12, 2016.
  • [112] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, “Methods and metrics for cold-start recommendations,” in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.   ACM, 2002, pp. 253–260.
  • [113] X. Luo, M. Zhou, S. Li, Y. Xia, Z. You, Q. Zhu, and H. Leung, “An efficient second-order approach to factorize sparse matrices in recommender systems,” IEEE Transactions on Industrial Informatics, vol. 11, no. 4, pp. 946–956, 2017.
  • [114] S. Lam, D. Frankowski, and J. Riedl, “Do you trust your recommendations? An exploration of security and privacy issues in recommender systems,” Emerging Trends in Information and Communication Security, pp. 14–29, 2006.