Scientific Article Recommendation: Exploiting Common Author Relations and Historical Preferences

by Feng Xia, et al.

Scientific article recommender systems are playing an increasingly important role in helping researchers retrieve scientific articles of interest in the coming era of big scholarly data. Most existing studies have designed unified methods for all target researchers, and hence the same algorithms are run to generate recommendations for all researchers no matter which situations they are in. However, different researchers may have their own features, and methods tailored to those features might produce better recommendations. In this paper, we propose a novel recommendation method which incorporates information on common author relations between articles (i.e., two articles with the same author(s)). The rationale underlying our method is that researchers often search for articles published by the same author(s). Since not all researchers have such author-based search patterns, we present two features, which are defined based on information about pairwise articles with common author relations and frequently appearing authors, to determine target researchers for recommendation. Extensive experiments on a real-world dataset demonstrate that the defined features are effective in determining relevant target researchers and that the proposed method generates more accurate recommendations for relevant researchers when compared to a Baseline method.




1 Introduction

With the rapid emergence of big scholarly data, tremendous growth of knowledge is now largely captured in digital form and archived all over the world. Archival materials are also being digitized and provided online for free or for a fee. This situation allows people to access more knowledge easily, but it also creates the well-known information overload problem, especially in academia. For example, a researcher needs to find articles of interest to read in order to generate a research idea or to cite work related to the article he is writing; an author needs to submit his manuscript to a journal whose topic is relevant to the manuscript; an editor needs to assign a manuscript to a reviewer who is an expert in the domain to which the manuscript belongs; and a researcher in one domain needs to collaborate with a researcher in another domain. These academic activities involve an overwhelming number of articles, journals, reviewers, and researchers. Therefore, it is quite difficult for researchers to locate relevant articles, journals, reviewers, and researchers for the aforementioned purposes.

Academic recommender systems aim to solve the information overload problem in big scholarly data, such as finding relevant research papers, relevant publication venues, etc. Fig. 1 shows the corresponding recommendation tasks in the above-mentioned scenarios, including (i) article recommendation [6, 36, 32, 17] for suggesting relevant articles to a researcher or an article for the purposes of reading or citation, (ii) reviewer recommendation [15, 38] for assigning a manuscript to the most appropriate reviewers (e.g., an expert in the same domain), (iii) venue recommendation [45, 20] for suggesting a topic-relevant conference or journal to publish a new article, and (iv) collaboration recommendation [35, 44] for suggesting new partners to execute joint research (e.g., exploring cross-domain solutions). There exist some interesting studies on these recommendation tasks. Gori and Pucci [6] built a citation relation graph and employed a random walk algorithm to compute ranking scores of each possible citation. Tayal et al. [38] assigned relevant weights to various factors which affect the expertise of the reviewer to create a fuzzy set and then computed the expertise. Yang and Davison [45] extracted features related to writing-style information for computing similarity between articles and then applied traditional collaborative filtering to recommend a venue for submission. Xia et al. [44] considered three academic factors (i.e., co-author order, collaboration time, and number of collaborations) to define link importance, and then employed a random walk algorithm to compute rankings of potential collaborators.

In this paper, we focus on article-researcher recommendation, i.e., studying how to find articles of interest for target researchers in the context of big scholarly data. In the print age, researchers found articles of interest with the help of library catalogs. In recent years, web search tools employed by scientific digital libraries like IEEE Xplore, and literature search engines like Google Scholar, can retrieve a list of relevant articles in diverse technological fields using keyword-based queries. However, these search tools have several drawbacks: (i) a few keywords are not enough to describe a searcher’s needs; (ii) the obtained results are the same for all searchers as long as the keywords are the same; (iii) it is not feasible to search for articles when a searcher has no idea of what he is looking for. Article-researcher recommender systems aim to automatically suggest personalized articles of potential interest for different targets, thereby overcoming the problems stated above.

Existing studies [30, 32] generally compute the content similarity between articles to find articles which are similar to the target’s articles of interest, or compute the similarity between the target’s profile and a new article’s content to find matches. However, content extraction is not so simple because an article includes too many words. In this paper, we extract only author information to build relations between articles, i.e., common author relations. Then, these relations and researchers’ historical preferences are used together to build a heterogeneous graph for article ranking. The rationale for incorporating common author relations is that the continuous development of internet technology enables researchers to easily build personal websites and share publications with others, which makes it more convenient for researchers with an author-based search preference to search for articles published by the same author(s) (we call this an author-based search pattern). In addition, most studies ignore the fact that different recommendation methods are suitable for different targets. Therefore, we define features to find relevant target researchers who have author-based search patterns by analyzing information on common author relations existing in a researcher’s historical preferences. In summary, we propose a novel Common Author relation-based REcommendation method (CARE) for specific target researchers with author-based search patterns.

Our main contributions in this work can be outlined as follows:

We present two features including the ratio of pairwise articles with common author relations and the ratio of the most frequently appeared author, to help determine relevant researchers with author-based search patterns.

We propose a novel recommendation method, which incorporates common author relations between articles to help generate better recommendations for relevant target researchers.

We conduct experiments using the real-world CiteULike dataset to evaluate the impact of the defined features and the performance of the proposed method. In addition, two other features were defined and shown to be ineffective for determining suitable targets.

The rest of the paper is organized as follows. Section 2 reviews related work on article recommendation. Section 3 presents our problem definition. Section 4 introduces the details of our proposed method. Section 5 describes our experimental setup and discusses our results in detail. Section 6 finally concludes the paper.

Fig. 1: Four academic recommendation tasks regarding the entities in academia: researcher, article, venue, reviewer.

2 Related Work

Recommender systems aim to automatically suggest items of potential interest to users. As well-known effective tools for solving information overload problems, recommender systems have been successfully applied in multiple domains including traffic [26], movies [34, 28], music [16, 14], news [5], e-commerce [41, 42], e-learning [3, 10], and so on. As aforementioned, with the rapid development of information technology and ever-growing amounts of scholarly data, it is becoming increasingly popular and challenging to apply recommendation techniques in academia. In this section, we focus on reviewing related work on article recommendation.

2.1 Article-Article Recommendation

Article-article recommendation, i.e., citation recommendation, includes global citation recommendation [29, 6, 1, 22, 21, 27, 18] and local citation recommendation [36, 19, 11, 37, 12]. Global citation recommendation aims to recommend a list of citations for a given query article. Strohman et al. [29] linearly combined text features and citation graph features to measure the relevance between articles. They conducted experiments with their proposed citation recommender system and concluded that similarity between bibliographies of articles and Katz distance are the most important features. Gori and Pucci [6] used citation relations between articles to build a citation graph and applied a random walk algorithm in the graph to compute ranking scores of each article as a reference of a target article. Bethard and Jurafsky [1] incorporated a wide variety of features (including author impact, author citation habits, citation count, and publication ages) to build a retrieval model for literature search. After a training process, the model took the abstract of an article as input to produce relevant reference lists. Nallapati et al. [22] jointly modeled the text and citation relationship under a topic model framework. They introduced a model, Pair-Link-LDA, which models the presence or absence of a link between pairwise articles but does not scale to large digital libraries. They also introduced another model called Link-PLSA-LDA, which models citations as a sample from a probability distribution associated with a topic. Meng et al. [21] incorporated various types of information including content, authorship, and collaboration network to build a unified graph-based model for personal global citation recommendation. Ren et al. [27] proposed to cluster article citations into interest groups to determine the significance of different structural relevance features for each group while deriving an article’s relative authority within each group. Liu et al. [18] employed the pseudo relevance feedback (PRF) algorithm to determine important nodes like authors and venues on a heterogeneous bibliographic graph. Then, a random walk algorithm was run to compute the ranking scores of an article.

On the other hand, local citation recommendation aims to recommend citations for a given specific context such as a sentence in a paper. Tang and Zhang [36] formally defined the problem of topic-based citation recommendation and proposed to model article contents and citation relationships using a two-layer restricted Boltzmann machine. For a given context, they calculated the probability of each article being the reference based on the model. Lu et al. [19] proposed to recommend citations using a translation model of the kind originally used for translating text from one language to another. They assumed that the languages used in citation contexts and in articles’ contents are different and translated one word in the context to one word in the citation. Based on the probability of translating one word to another, relevant articles were recommended for a citation context. Huang et al. [11] regarded an article as a new ’word’ in another language and employed a translation model for estimating the probability of citing an article given a citation context. Tang et al. [37] proposed a cross-language context-aware citation recommendation method for recommending English citations at a given place in a Chinese article where a citation should be made. Huang et al. [12] proposed a novel neural probabilistic model which jointly learns the semantic representations of citation contexts and cited articles and then estimates the probability of citing an article with a neural network.

2.2 Article-Researcher Recommendation

Article-researcher recommendation is our focus in this paper. Most existing studies compute similarities among researchers and articles based on articles’ contents [30, 31, 40, 23, 13, 39, 32, 24, 25] or tags in social tagging systems [32, 24, 25, 43] and then apply traditional collaborative filtering to generate recommendations. Sugiyama and Kan [30] examined the effect of modeling a researcher’s past works when recommending scientific articles to the researcher. A researcher’s profile was derived from his past works and other works which are the references or citations of those works. Apart from previous explicit citations, Sugiyama and Kan [31] additionally took into account implicit citations. Potential citation articles were discovered using collaborative filtering and then combined with previous information to enhance the profiles of candidate articles and researchers. Finally, the two types of profiles were compared using cosine similarity. Wang and Blei [40] proposed a collaborative topic regression model for article recommendation, where each user was represented by a distribution over interests and each article was described by a content-based topic distribution. Nascimento et al. [23] proposed a framework to generate structured search queries and obtained candidate articles using existing web information sources. Then, they computed the content-based similarity for ranking candidate articles. Jiang et al. [13] employed a concept-based topic model to compute the problem-based similarity and solution-based similarity between a known article of interest and an unknown article. Tian and Jing [39] employed the LDA (Latent Dirichlet Allocation) model to obtain each article’s representation based on content and computed the similarity between articles for determining their associations. Sun et al. [32] exploited semantic content and heterogeneous connections (i.e., social connection, behavioral connection, and semantic connection) to build two kinds of profiles of researchers and then computed researcher-article similarities and researcher-researcher similarities. Using a Social-Union method [33], the score of a target researcher on an article is defined by that of his nearest neighbors. Finally, it is fused with the results obtained from the researcher-article similarity computation to compute the final ranking scores. Pera and Ng [24, 25] proposed a personalized recommender for scientific articles, called PReSA. Given a target publication in which a user has expressed interest, PReSA computes three similarities between the target publication and each candidate publication drawn from the personal libraries of the user’s connections: a) tag similarity, b) title similarity, and c) abstract similarity. Then, these similarities and the popularity of publications were fused to calculate the ranking score of each candidate publication using a weighted linear combination strategy. Finally, the top-N publications were recommended to the user. Xia et al. [43] built an active participant’s profile and each group’s profile based on tags annotated by participants. Then, their similarity was computed to recommend the active participant’s to other participants in groups with higher similarities.

In this paper, we utilize only information on articles’ authors to build common author relations between articles. Compared to analysis of content and tags, our work is simpler and less time-consuming, because the number of terms in content and tags is enormous and many of them are irrelevant. In addition, the above studies do not take into account which specific target researchers are suitable for their recommendation methods. We assume that, since our proposed method (CARE) incorporates common author relations, only some researchers can be selected as targets for high-quality recommendation. Accordingly, we define features to determine such researchers.

Fig. 2: An example of an article library.
Fig. 3: An example of a recommendation scenario including three entities (researcher, article, and author) and two relations (reading and writing).

3 Problem Statement

In academic social tagging websites such as CiteULike, each of the registered users is generally a researcher. When a researcher is interested in an article, he will post it into his article library, read it extensively and then tag the article with one or more special keywords. As shown in Fig. 2, five articles have been included in KittyWang’s library and each of them is given many different tags by the researcher. The researcher’s historical preference is represented by the set of articles that interest him in his library. In this paper, our scientific article recommendation method aims to study how to automatically find the most possibly-preferred articles which will be posted into a target researcher’s library.

Fig. 3 shows our recommendation scenario. The scenario includes three objects: researchers, articles, and authors. Although researchers and authors may overlap, this situation is not taken into consideration because the CiteULike website does not provide enough information, such as email addresses, to determine each registered researcher’s identity (i.e., whether he is an author of a certain article in his article library) and because the recommendation targets are the registered researchers rather than the authors. Additionally, there exist two kinds of links among the three objects. The first kind of link represents that a researcher, who is a registered user of the CiteULike website, has read one or more articles he is interested in. The second kind of link represents that an article is written and published by one or more authors. Traditional collaborative filtering methods utilize the first kind of link to generate recommendations. The rationale underlying these methods is that two researchers who are interested in the same articles are similar, so the tastes of similar researchers can be used to predict those of target researchers. Generally, the second kind of link is ignored in these collaborative filtering methods. However, this additional information may influence recommendation quality because researchers often focus on the same authors’ publications when they find these authors’ work relevant to theirs. As a result, it is necessary to incorporate the second kind of link into a novel recommendation method. This is the first problem we address in this paper.

Researchers search articles published by the same authors to find articles they are interested in. We call it author-based search pattern. Actually, not all researchers employ this pattern. Some of them are likely to search citations or only focus on articles’ titles. Therefore, the second problem we are aiming to solve is how to find target researchers with author-based search patterns. Then, the information on authors can be incorporated to help generate better recommendations for these targets.

Fig. 4: The architecture of CARE.

4 Design of CARE

4.1 Overview

Our CARE method is inspired by two important facts: (i) researchers generally search for articles written by the same authors; (ii) not all researchers have such an author-based search pattern. Fig. 4 shows the architecture of CARE, which mainly includes two components: (i) a researcher selection module and (ii) a graph-based article ranking module. The first component is responsible for extracting relevant features from researchers’ historical preferences and then selecting researchers with author-based search patterns as recommendation targets. The second component is responsible for incorporating common author relations to build a graph and generating an article ranking list through a graph-based random walk algorithm. In the domain of recommender systems, random walk-based ranking is a classical technique, and many researchers [7] have successfully applied it to various recommendation scenarios. Next, we will introduce the two components in detail.

4.2 Target researcher selection

Fig. 5: An example scenario for a researcher.

For researchers who find articles of interest by searching for articles written by the same authors, their online article libraries are likely to contain many articles written mainly by one or several authors. Therefore, we define two features which are relevant to common authors between any two articles to help determine target researchers:

The first feature is the ratio of the number of pairwise articles with common author relations to the total number of all pairwise articles for a researcher.

The second feature is the ratio of the number of occurrences of the most frequently appearing author to the total number of articles for a researcher.

For a researcher, when the first or the second feature is larger than a given threshold, this researcher is considered to have an author-based search pattern and is regarded as a target suitable for the ranking component of our CARE method.

We use Fig. 5 as an example scenario for illustrating the computation process of the above two features. Fig. 5(a) shows the writing relations between articles and authors for a target researcher, where each edge links an article to one of its authors and the researcher has expressed interest in all articles in the figure. For example, an article with three authors is linked to them through three different edges. We consider two articles to be related if they are linked to the same author(s) in Fig. 5(a). In this way, we convert the writing relation graph into a common author relation graph, as shown in Fig. 5(b), where two articles are linked to each other if they have common author(s). From Fig. 5(b), we can easily obtain the number of pairwise articles with common author relations, which is equal to 4. The first feature is this count divided by the number of all possible relations between articles, i.e., $n(n-1)/2$ for a library of $n$ articles. In addition, the most frequently occurring author appears in 3 articles, so the second feature is 3 divided by the number of articles $n$. If the thresholds of the two features are set to 0.2 and 0.3 respectively, both features exceed their thresholds in this example. Therefore, the researcher is a relevant target suitable for the CARE method. There may be other features to determine relevant target researchers, but in this paper we consider the above two features and conduct experiments to verify their effectiveness in Section 5.
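The two features can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the input format (a dictionary from article id to a set of author names), the function names, and the example library are all hypothetical, and the example library is not the one drawn in Fig. 5. The default thresholds 0.2 and 0.3 are the ones used in the worked example.

```python
from collections import Counter
from itertools import combinations


def author_features(library):
    """Return (pair_ratio, top_author_ratio) for one researcher.

    library: dict mapping article id -> set of author names (assumed format).
    """
    articles = list(library)
    n = len(articles)
    if n == 0:
        return 0.0, 0.0
    # Feature 1: share of article pairs that have at least one common author.
    pairs = list(combinations(articles, 2))
    related = sum(1 for a, b in pairs if library[a] & library[b])
    pair_ratio = related / len(pairs) if pairs else 0.0
    # Feature 2: occurrences of the most frequent author over the number of articles.
    counts = Counter(author for authors in library.values() for author in authors)
    top_author_ratio = max(counts.values()) / n if counts else 0.0
    return pair_ratio, top_author_ratio


def is_target(library, pair_threshold=0.2, author_threshold=0.3):
    """A researcher is a target when either feature exceeds its threshold."""
    pair_ratio, top_author_ratio = author_features(library)
    return pair_ratio > pair_threshold or top_author_ratio > author_threshold


# Hypothetical library with five articles; author "x" appears in three of them.
example_library = {
    "p1": {"x", "y", "z"},
    "p2": {"x"},
    "p3": {"x", "w"},
    "p4": {"v"},
    "p5": {"u"},
}
print(author_features(example_library))
print(is_target(example_library))
```

In this hypothetical library, 3 of the 10 article pairs share an author and the most frequent author appears in 3 of the 5 articles, so both features exceed the example thresholds.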

4.3 Graph-based Article Ranking

4.3.1 Graph Construction

As aforementioned, in the field of academic recommendation, there are many entities such as researchers, articles, conferences, journals, and so on. In this paper, we consider some of them and then design a method for recommending scientific articles. A scientific article recommender system includes a set of researchers $R = \{r_1, \ldots, r_m\}$ and a set of articles $A = \{a_1, \ldots, a_n\}$. Based on researchers’ historical preferences, we can give the pairwise reading relations between researchers and articles, denoted as $RA$ with entries $ra_{ij}$ and $AR$ with entries $ar_{ji}$, indicating whether researcher $r_i$ has read and expressed interest in article $a_j$, as shown in Equation (1). As we consider undirected relations in our method, $ra_{ij}$ is equal to $ar_{ji}$. As stated in the previous section, we convert the writing relations between articles and authors into common author relations between articles. Then, we also give the pairwise relations between articles, denoted as $AA$ with entries $aa_{ij}$ indicating whether there are common author(s) between the two articles $a_i$ and $a_j$, as shown in Equation (2). Likewise, we employ undirected common author relations. Therefore, $aa_{ij}$ is equal to $aa_{ji}$. In addition, as relations between researchers are not considered, we denote $RR = 0$.

$$ra_{ij} = ar_{ji} = \begin{cases} 1, & \text{if researcher } r_i \text{ has read and expressed interest in article } a_j \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

$$aa_{ij} = aa_{ji} = \begin{cases} 1, & \text{if articles } a_i \text{ and } a_j \text{ have common author(s)} \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

Based on the above two relation matrices, we construct a graph for applying a random walk-based article ranking algorithm, as shown in Fig. 6. Let $G = (V, E)$, where $V = R \cup A$ and $E = E_{RA} \cup E_{AA}$. $R$ and $A$ indicate the set of researcher vertices and the set of article vertices, respectively. $E_{RA}$ and $E_{AA}$ describe the set of reading relations between researchers and articles and the set of common author relations between articles. An edge linking researcher $r_i$ to article $a_j$ exists in the graph if $ra_{ij}$ or $ar_{ji}$ is equal to 1. Similarly, an edge linking article $a_i$ to another article $a_j$ exists in the graph if $aa_{ij}$ or $aa_{ji}$ is equal to 1.
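The construction of the two relation matrices can be sketched as follows. This is only an illustration under assumed inputs: `libraries` (researcher to set of article ids), `authors` (article to set of author names), the function name, and the `min_shared` parameter are hypothetical. The parameter is included because the experiments in Section 5 require at least two shared authors before two articles are considered related, whereas the definition above only requires common author(s).

```python
def build_relations(libraries, authors, min_shared=1):
    """Build the reading matrix and the common author matrix as nested lists.

    libraries: dict researcher -> set of article ids (assumed format).
    authors:   dict article id -> set of author names (assumed format).
    """
    researchers = sorted(libraries)
    articles = sorted(authors)
    col = {a: j for j, a in enumerate(articles)}

    # Reading relations: entry (i, j) is 1 if researcher i has article j in the library.
    reading = [[0] * len(articles) for _ in researchers]
    for i, r in enumerate(researchers):
        for a in libraries[r]:
            reading[i][col[a]] = 1

    # Common author relations: symmetric, zero diagonal.
    common = [[0] * len(articles) for _ in articles]
    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            shared = authors[articles[i]] & authors[articles[j]]
            if len(shared) >= min_shared:
                common[i][j] = common[j][i] = 1
    return researchers, articles, reading, common


# Hypothetical toy data: two researchers, three articles.
libraries = {"r1": {"a1", "a2"}, "r2": {"a2", "a3"}}
authors = {"a1": {"x", "y"}, "a2": {"x", "y", "z"}, "a3": {"z"}}
_, _, reading, common = build_relations(libraries, authors)
print(reading)
print(common)
```

Because relations between researchers are not considered, no researcher-researcher matrix is built; it would be all zeros.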

4.3.2 Transition Probability Computation

Fig. 6: An example graph for article ranking.

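A random walk in the graph is a sequence of transitions from one vertex to another. Therefore, we use the above relation matrices to build a transition matrix, of which each element represents the transition probability between two corresponding vertices (researcher to researcher, researcher to article, article to article, and article to researcher). Let $rr_{ik}$, $ra_{ik}$, $ar_{ik}$, and $aa_{ik}$ denote the entries of the researcher-researcher, researcher-article, article-researcher, and article-article relation matrices. The computation process is as follows. When a random walk starts with a researcher vertex $r_i$, the transition probability of moving to another researcher vertex $r_j$ is

$$p(r_j \mid r_i) = \frac{rr_{ij}}{\sum_{k} rr_{ik} + \sum_{k} ra_{ik}},$$

which is zero because relations between researchers are not considered, and the transition probability of moving to an article vertex $a_j$ is

$$p(a_j \mid r_i) = \frac{ra_{ij}}{\sum_{k} rr_{ik} + \sum_{k} ra_{ik}}.$$

Additionally, when a random walk starts with an article vertex $a_i$, the transition probability of moving to another article vertex $a_j$ is

$$p(a_j \mid a_i) = \frac{aa_{ij}}{\sum_{k} aa_{ik} + \sum_{k} ar_{ik}}$$

and the transition probability of moving to a researcher vertex $r_j$ is

$$p(r_j \mid a_i) = \frac{ar_{ij}}{\sum_{k} aa_{ik} + \sum_{k} ar_{ik}}.$$

The transition probability matrix is

$$T = \begin{pmatrix} T_{RR} & T_{RA} \\ T_{AR} & T_{AA} \end{pmatrix},$$

where each block collects the corresponding transition probabilities, e.g., $T_{RA}(i, j) = p(a_j \mid r_i)$.

Note that, in the above computation of the transition probability matrix, each vertex assigns equal values to all its neighbor vertices no matter what kind of vertex (researcher or article) the neighbor is. Specifically, the walk moves from a vertex to any one of its neighbor vertices with the same probability even though these neighbors may be of different types.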
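The equal-probability rule described in the note can be sketched as a row normalization of the adjacency matrix over all vertices. The function name, the vertex ordering (researchers first, then articles), and the toy matrices are assumptions made for this illustration only.

```python
def transition_matrix(reading, common):
    """Row-normalise the undirected graph so each neighbour is equally likely.

    reading: m x n 0/1 matrix of researcher-article reading relations.
    common:  n x n 0/1 matrix of article-article common author relations.
    Vertices are ordered as the m researchers followed by the n articles.
    """
    m, n = len(reading), len(common)
    size = m + n
    adjacency = [[0] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            if reading[i][j]:
                # Undirected reading edge between researcher i and article j.
                adjacency[i][m + j] = adjacency[m + j][i] = 1
    for i in range(n):
        for j in range(n):
            if common[i][j]:
                adjacency[m + i][m + j] = 1
    matrix = []
    for row in adjacency:
        degree = sum(row)
        # Every neighbour gets probability 1/degree, whatever its type.
        matrix.append([x / degree if degree else 0.0 for x in row])
    return matrix


# Hypothetical toy graph: two researchers, three articles.
reading = [[1, 1, 0], [0, 1, 1]]
common = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
T = transition_matrix(reading, common)
for row in T:
    print(row)
```

In the toy graph, the second article has four neighbours (two researchers and two articles), so each receives probability 0.25 from it.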

4.3.3 Random Walk with Restart

After obtaining the transition probability matrix, a random walk with restart method is employed to compute articles’ rankings. Generally, the algorithm finds articles of interest based on the meta path researcher-article-researcher-article. This means that a researcher is likely to be interested in an article in which another researcher with similar historical preferences has expressed interest. We incorporate common author relations between articles and thereby add another meta path: researcher-article-article. This means that a researcher is likely to be interested in an article that is related, through common authors, to an article in which the researcher has already expressed interest. Our algorithm considers both meta paths. Starting from a source vertex $s$ (the target researcher), we perform a random walk with restart in the graph built in the previous section. After walking to any vertex $v$, we continue the walk with probability $\alpha$ and move to another vertex $v'$ linked to $v$ with transition probability $T(v, v')$. With probability $1 - \alpha$, we return to the source vertex $s$. Algorithm 1 shows the process of graph-based article ranking. In this algorithm, a list of article ranking scores for the target researcher is computed, and the top-N articles in which the researcher has not expressed interest before are put in the recommendation list for the target researcher.

Input: Graph, G = (V, E); random walk probability, α; target researcher vertex, s; maximum step length of iteration, K; transition probability matrix, T.
Output: Ranking scores of all article vertices, rank_A. // article vertices
Define ranking scores of all vertices, rank; // vertices
for each v ∈ V do
   rank[v] = 0; // initial ranking scores are 0
end for
rank[s] = 1;
for k = 1; k ≤ K; k++ do
   for each v ∈ V do
      temp[v] = 0; // initial values are 0
   end for
   for each v ∈ V do
      for each v' ∈ neighbors(v) do
         temp[v'] = temp[v'] + α · rank[v] · T[v][v'];
      end for
      if v = s then
         temp[v] = temp[v] + (1 − α);
      end if
   end for
   rank = temp;
end for
rank_A = {rank[a] : a ∈ A}; // select ranking scores of article vertices
return rank_A;
Algorithm 1 Graph-based article ranking.
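The ranking procedure can be sketched in Python as an iterative random walk with restart. This is an illustrative reading of the steps above rather than the authors' code: the function names, the default walking probability 0.8, the default of 50 iterations, and the hard-coded toy transition matrix (two researchers followed by three articles) are all assumptions.

```python
def random_walk_with_restart(T, source, alpha=0.8, max_steps=50):
    """Iteratively propagate ranking scores from the source vertex.

    T: row-stochastic transition matrix as nested lists (assumed input).
    alpha: probability of continuing the walk; 1 - alpha is the restart probability.
    """
    size = len(T)
    rank = [0.0] * size
    rank[source] = 1.0
    for _ in range(max_steps):
        temp = [0.0] * size
        for v in range(size):
            for w in range(size):
                if T[v][w] > 0:
                    # Continue the walk from v to its neighbour w.
                    temp[w] += alpha * rank[v] * T[v][w]
            if v == source:
                # Restart mass always returns to the source vertex.
                temp[v] += 1 - alpha
        rank = temp
    return rank


def recommend(T, source, num_researchers, known_articles, top_n=10):
    """Return indices of the top-N articles the target has not read yet."""
    rank = random_walk_with_restart(T, source)
    num_articles = len(T) - num_researchers
    scores = {
        j: rank[num_researchers + j]
        for j in range(num_articles)
        if j not in known_articles
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical toy graph: vertices 0-1 are researchers, vertices 2-4 are articles.
T = [
    [0, 0, 0.5, 0.5, 0],
    [0, 0, 0, 0.5, 0.5],
    [0.5, 0, 0, 0.5, 0],
    [0.25, 0.25, 0.25, 0, 0.25],
    [0, 0.5, 0, 0.5, 0],
]
scores = random_walk_with_restart(T, source=0)
print(scores)
print(recommend(T, source=0, num_researchers=2, known_articles={0, 1}))
```

Because every row of the toy matrix sums to 1, the scores keep summing to 1 after each iteration, which is a convenient sanity check when experimenting with different walking probabilities.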

5 Experiments

5.1 Dataset

Statistic                                              Value
Number of researchers                                  5550
Number of articles                                     15439
Number of researcher-article reading relations         200251
Sparsity of researcher-article reading relations       0.9977
Number of article-article common author relations      18646
Sparsity of article-article common author relations    0.9998
TABLE I: Data statistics.

CiteULike is a free web-based tool to help scientists, researchers, and academics store, organize, share, and discover links to academic research papers. With more than 3.5 million papers currently bookmarked and over 900000 visitors per month, CiteULike has grown to be one of the biggest and most popular social reference management websites by helping users streamline their process of storing and managing academic references. Emamy and Cameron [4] have provided a detailed description of CiteULike. We used the version of the CiteULike dataset collected by Wang et al. [40] in our experiments. This dataset includes all registered users’ (researchers’) historical preferences, i.e., the articles in each user’s library, and the articles’ contents. Note that there is no author information in the original dataset. We designed a web crawler to collect each article’s author information from the CiteULike website. Then, we compared the authors of pairwise articles to determine their common author relations. To reduce false matches caused by different authors sharing the same name, two articles are considered to be related only if they have at least two authors in common. Although a researcher’s library may contain his own article(s), this situation can be ignored due to the following facts: (i) CiteULike does not provide enough information on the registered researchers’ identities, such as email addresses, so there is no way to determine whether a researcher is an author of an article in his library; (ii) registered researchers generally put other researchers’ articles of interest into their libraries. The original dataset includes 5551 researchers and 16980 articles. We removed articles in which fewer than 5 researchers expressed interest. The statistics of the preprocessed dataset are shown in Table I. Similar to most datasets for evaluating recommendation methods, this one has the characteristic that the researcher-article and article-article relations are very sparse, i.e., data sparsity. The sparsity is the ratio of the difference between the number of all possible relations and the number of existing relations to the number of all possible relations. Therefore, if a novel recommendation method can improve recommendation quality on such sparse data, the challenge of data sparsity will, to some extent, be addressed.
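The sparsity values in Table I can be checked with a short calculation. The sketch below assumes that possible reading relations are all researcher-article pairs and that possible common author relations are all unordered pairs of distinct articles; the second assumption is not stated in the text, but it reproduces the reported value.

```python
def sparsity(existing, possible):
    """Share of possible relations that do not exist."""
    return (possible - existing) / possible


num_researchers, num_articles = 5550, 15439

# Every researcher could in principle have read every article.
reading_sparsity = sparsity(200251, num_researchers * num_articles)

# Assumption: article-article relations are counted once per unordered pair.
article_pairs = num_articles * (num_articles - 1) // 2
author_sparsity = sparsity(18646, article_pairs)

print(round(reading_sparsity, 4), round(author_sparsity, 4))
```

Both rounded values agree with Table I (0.9977 and 0.9998).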

Fig. 7: Precision (a), recall (b), and F1 (c) of CARE for different values of the walking probability.
Fig. 8: Comparison of precision (a), recall (b), and F1 (c) of Baseline and CARE for all researchers and relevant researchers.

5.2 Experimental Setup

To test our method’s performance, the dataset is randomly divided into a training set (80%) and a test set (20%) using the following procedure. For each researcher who has fewer than 5 articles in his or her library, we randomly select one article and place it in the test set. For every other researcher, we randomly select 20% of the articles in the library for the test set. The training set is treated as known information used by our method to generate recommendations, while the test set is regarded as unknown information used to evaluate the recommendation results. To evaluate the recommendation quality of our proposed method, we employed three metrics, namely Precision, Recall, and F1, which have been widely used in the literature [9, 2, 8] on recommender systems and information retrieval. Their definitions are given next.
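
A minimal Python sketch of this per-researcher split is given below. It assumes that libraries are stored as a dictionary from researcher id to a list of article ids and that the 20% share is rounded to the nearest integer; the function names, the `seed` parameter, and the rounding rule are assumptions made for illustration, not details stated in the text.

```python
import random


def split_library(library, rng, test_ratio=0.2, small_library=5):
    """Split one researcher's articles into (train, test) lists."""
    articles = list(library)
    rng.shuffle(articles)
    if len(articles) < small_library:
        n_test = 1  # small libraries contribute exactly one test article
    else:
        n_test = round(test_ratio * len(articles))  # rounding is an assumption
    return articles[n_test:], articles[:n_test]


def split_dataset(libraries, seed=0):
    """Apply the split to every researcher with one seeded generator."""
    rng = random.Random(seed)
    train, test = {}, {}
    for researcher, library in libraries.items():
        train[researcher], test[researcher] = split_library(library, rng)
    return train, test


# Hypothetical libraries: one small (3 articles) and one larger (10 articles).
libraries = {
    "r1": ["a1", "a2", "a3"],
    "r2": ["b%d" % k for k in range(10)],
}
train, test = split_dataset(libraries)
print(len(test["r1"]), len(test["r2"]))
```

Note that a researcher with a single article would end up with an empty training library under this sketch; the text does not say how such cases are handled.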

Precision. Precision is the fraction of the articles in a researcher’s recommendation list that appear in that researcher’s test set:

$$P_i = \frac{d_i}{N},$$

where $P_i$ represents researcher $i$’s precision, $d_i$ denotes the number of recommended articles that appear in researcher $i$’s test set, and $N$ represents the length of the recommendation list. By averaging over all researchers’ precisions, we obtain the precision of the whole recommender system as

$$P = \frac{1}{m} \sum_{i=1}^{m} P_i,$$

where $m$ represents the number of researchers. Obviously, a higher precision means a higher recommendation accuracy.

Recall. Recall is the fraction of the articles in a researcher’s test set that appear in the recommendation list:

$$R_i = \frac{d_i}{D_i},$$

where $R_i$ represents researcher $i$’s recall and $D_i$ is the number of articles collected by researcher $i$ in the test set. Averaging over all researchers’ recalls, we obtain the recall of the whole recommender system as

$$R = \frac{1}{m} \sum_{i=1}^{m} R_i.$$

F1. Generally speaking, for each researcher, recall is sensitive to $N$, and a larger value of $N$ generally gives a higher recall but a lower precision. F1, which assigns equal weight to precision and recall, is defined as

$$F1_i = \frac{2 P_i R_i}{P_i + R_i}.$$

By averaging over all researchers’ F1, we also obtain the F1 of the whole system as

$$F1 = \frac{1}{m} \sum_{i=1}^{m} F1_i.$$
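
The three metrics can be computed per researcher and then averaged, as in the following Python sketch. The function names and the toy recommendation lists are hypothetical; setting F1 to zero when both precision and recall are zero is a common convention assumed here, not something the text specifies.

```python
def precision_recall_f1(recommended, test_items):
    """Per-researcher precision, recall, and F1 for one recommendation list."""
    hits = len(set(recommended) & set(test_items))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(test_items) if test_items else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0  # assumed convention
    return precision, recall, f1


def system_metrics(recommendations, test_sets):
    """Average the per-researcher metrics over all researchers."""
    rows = [precision_recall_f1(recommendations[r], test_sets[r]) for r in test_sets]
    return tuple(sum(column) / len(rows) for column in zip(*rows))


# Hypothetical example: lists of length 4 for two researchers.
recommendations = {"r1": ["a", "b", "c", "d"], "r2": ["e", "f", "g", "h"]}
test_sets = {"r1": ["a", "x"], "r2": ["y"]}
print(system_metrics(recommendations, test_sets))
```

In this example researcher `r1` has one hit (precision 0.25, recall 0.5) and `r2` has none, so the system-level precision and recall are 0.125 and 0.25.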


In order to demonstrate the effectiveness of our recommendation method, we compare CARE with the following method.

Baseline: This is a random walk model with restart, which does not take into account common author relations between articles and does not differentiate between relevant researchers and irrelevant researchers.
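
For readers unfamiliar with random walk with restart, the following Python sketch shows a generic version on a researcher-article bipartite graph. It is not the exact Baseline or CARE algorithm, whose graph construction and update rule are defined in Section 4; the graph layout, the parameter name `alpha` for the walking probability, the fixed iteration count, and all identifiers are assumptions made for illustration.

```python
def build_graph(libraries):
    """Undirected bipartite graph linking each researcher to library articles."""
    graph = {}
    for researcher, articles in libraries.items():
        graph.setdefault(researcher, [])
        for article in articles:
            graph[researcher].append(article)
            graph.setdefault(article, []).append(researcher)
    return graph


def random_walk_with_restart(graph, source, alpha=0.8, iterations=50):
    """Score every vertex by a walk that restarts at `source`."""
    scores = {v: 0.0 for v in graph}
    scores[source] = 1.0
    for _ in range(iterations):
        nxt = {v: 0.0 for v in graph}
        for v, neighbours in graph.items():
            for u in neighbours:
                nxt[u] += alpha * scores[v] / len(neighbours)
        # The remaining mass returns to the source; this equals (1 - alpha)
        # when every vertex has at least one neighbour.
        nxt[source] += 1.0 - sum(nxt.values())
        scores = nxt
    return scores


def recommend(libraries, researcher, n=2, alpha=0.8):
    """Top-n articles outside the researcher's library, ranked by score."""
    scores = random_walk_with_restart(build_graph(libraries), researcher, alpha)
    known = set(libraries[researcher])
    articles = {a for library in libraries.values() for a in library}
    ranked = sorted(articles - known, key=lambda a: (-scores[a], a))
    return ranked[:n]


# Hypothetical training libraries; ids of researchers and articles must differ.
libraries = {"r1": ["a1", "a2"], "r2": ["a2", "a3"], "r3": ["a4"]}
print(recommend(libraries, "r1", n=1))
```

Here `a3` is reachable from `r1` through the shared article `a2`, while `a4` lies in a disconnected part of the graph and keeps a score of zero.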

Fig. 9: Comparison of precision, recall, and F1 of CARE for different thresholds of FE1. Panels: (a) Precision, (b) Recall, (c) F1.
Fig. 10: Comparison of precision, recall, and F1 of CARE for different thresholds of FE2. Panels: (a) Precision, (b) Recall, (c) F1.
Fig. 11: Impact of FE1 on increase rate.
Fig. 12: Impact of FE2 on increase rate.

5.3 Impact of Walking Probability

As stated in Section 4.3, for a certain vertex, the walking probability is the probability of moving from the vertex to its neighbor vertices, and its complement (one minus the walking probability) is the probability of moving from the vertex back to the source vertex (the target researcher). Different values of the walking probability may affect recommendation quality differently, so we ran our proposed CARE method with several values. Fig. 7 compares precision, recall, and F1 when the walking probability is equal to 0.2, 0.4, 0.6, and 0.8. As shown in the sub-figures, a larger walking probability yields larger values of precision, recall, and F1. For example, when N is equal to 6, CARE with a walking probability of 0.2 achieves the worst results (6% precision, 10% recall, and 7.5% F1), and CARE with a walking probability of 0.8 achieves the best results (8.5% precision, 14% recall, and 11% F1). This indicates that the walking probability has a clear impact on the CARE method. However, because the walking probability is a parameter shared by the CARE and Baseline methods, it suffices to compare the two methods under the same value. Therefore, we set the walking probability to an empirical value of 0.8 in the following experiments.

5.4 Comparison against Baseline Method

Fig. 13: Comparison of precision, recall, and F1 of CARE for different thresholds of FE3. Panels: (a) Precision, (b) Recall, (c) F1.
Fig. 14: Comparison of precision, recall, and F1 of CARE for different thresholds of FE4. Panels: (a) Precision, (b) Recall, (c) F1.

In this section, we conducted two groups of experiments: (i) a comparison of CARE and Baseline when all researchers are targets, where the two methods are called CARE-1 and Baseline-1; (ii) a comparison of CARE and Baseline when only a subset of researchers is selected as relevant targets through the previously defined features (FE1 and FE2), where the two methods are called CARE-2 and Baseline-2. Fig. 8 shows the results of the two groups of experiments. For CARE-1 and Baseline-1, precision, recall, and F1 are almost the same; in fact, the results of CARE-1 are slightly worse than those of Baseline-1. However, the precision, recall, and F1 of CARE-2 are considerably larger than those of Baseline-2. That is, CARE performs better than the Baseline method for relevant researchers filtered using the two features FE1 and FE2. This indicates that incorporating common author relations helps generate accurate recommendations for relevant researchers rather than for all researchers.

5.5 Impact of Researcher Features

In this section, we discuss the impact of the researcher features defined in Section 4.2 on the recommendation quality of the CARE method. Fig. 9 compares the results of CARE for different thresholds of FE1 (i.e., 0, 0.05, 0.1, and 0.15). For a larger threshold of FE1, CARE achieves larger precision, recall, and F1. For example, when the length of the recommendation list is equal to 2, a threshold of 0 gives 12% precision, 6% recall, and 7% F1 (the smallest values), whereas a threshold of 0.15 gives 23% precision, 22% recall, and 23% F1 (the largest values). This indicates that a larger threshold of FE1 helps identify researchers who are more relevant, i.e., who have author-based search patterns, and our proposed CARE method then generates better recommendations for these targets. In addition, Fig. 10 compares the results of CARE for different thresholds of FE2 (i.e., 0, 0.1, 0.2, and 0.3). CARE with a larger threshold of FE2 again performs better than CARE with a smaller threshold. Similarly, this demonstrates that FE2 is an effective feature for determining relevant target researchers who have author-based search patterns.

In addition, we defined the increase rate to represent CARE’s improvement ratio over Baseline for different thresholds of FE1 and FE2 using Equation (14). Note that the increase rate is the same for precision, recall, and F1. Fig. 11 shows the increase rate when the thresholds of FE1 are 0, 0.05, 0.1, and 0.15, respectively. The increase rate is positive in all four cases. This demonstrates that, for the same researchers filtered using the threshold of FE1, CARE performs better than the Baseline method. Fig. 12 shows the increase rate when the thresholds of FE2 are 0.1, 0.2, 0.3, and 0.4. The increase rate is again positive in all cases. In particular, when the threshold of FE2 is larger (e.g., 0.3 or 0.4), CARE performs much better than the Baseline method. These experiments further illustrate that the two features help find researchers with author-based search patterns and that our CARE method is effective in generating better recommendations for those targets.
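
Equation (14) is not reproduced in this section. One common reading of an improvement ratio is the relative gain of CARE over Baseline on a given metric, which the Python sketch below assumes; the function name and the input values are hypothetical and are not results from the experiments.

```python
def increase_rate(care_value, baseline_value):
    """Relative improvement of CARE over Baseline (assumed form of the ratio)."""
    if baseline_value == 0:
        raise ValueError("baseline value must be non-zero")
    return (care_value - baseline_value) / baseline_value


# Made-up metric values, used only to show the calculation.
print(increase_rate(0.23, 0.20))
```

Under this reading, a positive value means that CARE outperforms Baseline on the chosen metric, and a value of zero means the two methods tie.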


Furthermore, we defined the following two additional features.

FE3 is the total number of pairwise articles with common author relations for a researcher.

FE4 is the ratio of the number of common authors who exist in all articles to the total number of articles for a researcher.

We again use Fig. 5 as an example scenario to illustrate the computation of FE3 and FE4. From Fig. 5(b), we can directly count the pairwise articles with common author relations, so FE3 is equal to 4. In the same scenario there are four common authors, and FE4 is the ratio of this number to the total number of articles in the researcher’s library. Using FE3 and FE4, we conducted the experiments shown in Figs. 13 and 14. For a larger value of FE3 or FE4, the F1 of CARE also becomes larger, but recall becomes smaller in Fig. 13(b) or stays almost the same in Fig. 14(b). Two possible reasons are: (i) FE3 takes into account the number of existing common author relations between articles but ignores the total number of articles read by a researcher; (ii) FE4 does not take into account the number of occurrences of each common author. Therefore, the two features are considered ineffective for determining relevant researchers.
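
The two additional features can be computed from a single researcher’s library as in the Python sketch below. Two assumptions are made here and are not confirmed by the text: a pair of articles counts as related when it shares at least two authors (the rule from Section 5.1), and a common author is an author who appears in at least two articles of the library. The function names and the toy library are hypothetical and unrelated to the Fig. 5 scenario.

```python
from collections import Counter
from itertools import combinations


def related_pair_count(library_authors, min_shared=2):
    """Number of article pairs in one library that share enough authors."""
    return sum(
        1
        for a, b in combinations(library_authors, 2)
        if len(set(library_authors[a]) & set(library_authors[b])) >= min_shared
    )


def common_author_ratio(library_authors, min_articles=2):
    """Number of recurring authors divided by the number of library articles."""
    counts = Counter(
        author for authors in library_authors.values() for author in set(authors)
    )
    common = [author for author, c in counts.items() if c >= min_articles]
    return len(common) / len(library_authors)


# Hypothetical library: article id -> made-up author labels.
library = {
    "a1": ["u", "v"],
    "a2": ["u", "v", "w"],
    "a3": ["w", "x"],
    "a4": ["y"],
}
print(related_pair_count(library), common_author_ratio(library))
```

In this toy library only `a1` and `a2` share two authors, and three authors (`u`, `v`, `w`) recur across four articles, which illustrates reason (ii) above: the ratio does not distinguish an author who appears twice from one who appears many times.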

6 Conclusion

In this paper, we have proposed a novel method that exploits information on common author relations and historical preferences to recommend articles of interest to specific researchers with author-based search patterns. To determine these targets, we defined two features (i.e., FE1 and FE2) that are based on common author relations between articles. The information on common author relations was then incorporated into a graph-based article ranking algorithm that generates a recommendation list. The experimental results demonstrated that, for relevant targets determined by the two features, our proposed method performs better than the Baseline method, and that the two features affect recommendation quality. In addition, we defined two other features (FE3 and FE4 in Section 5.5), which our experiments showed to be ineffective for selecting suitable targets.

In future work, we plan to define new features to explore which researchers have author-based search patterns. In addition, it may be possible to incorporate additional social relations, such as citation relationships, to design a citation-based recommendation method, in which relevant targets are determined by analyzing the information on citation relations between articles. Finally, different recommendation methods that suit different researchers can be combined into a hybrid framework so that all researchers can obtain satisfactory recommendations.

References


  • [1] S. Bethard and D. Jurafsky (2010) Who should I cite: learning literature search models from citation behavior. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 609–618. Cited by: §2.1.
  • [2] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez (2013) Recommender systems survey. Knowledge-Based Systems 46, pp. 109–132. Cited by: §5.2.
  • [3] J. Bobadilla, F. Serradilla, and A. Hernando (2009) Collaborative filtering adapted to recommender systems of e-learning. Knowledge-Based Systems 22 (4), pp. 261–265. Cited by: §2.
  • [4] K. Emamy and R. Cameron (2007) Citeulike: a researcher’s social bookmarking service. Ariadne (51), pp. 5. Cited by: §5.1.
  • [5] F. Garcin, C. Dimitrakakis, and B. Faltings (2013) Personalized news recommendation with context trees. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 105–112. Cited by: §2.
  • [6] M. Gori and A. Pucci (2006) Research paper recommender systems: a random-walk based approach. In 2006 IEEE/WIC/ACM International Conference on Web Intelligence, pp. 778–781. Cited by: §1, §2.1.
  • [7] M. Gori and A. Pucci (2007) ItemRank: a random-walk based scoring algorithm for recommender engines. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2766–2771. Cited by: §4.1.
  • [8] D. A. Grossman and O. Frieder (2012) Information retrieval: algorithms and heuristics. Vol. 15. Cited by: §5.2.
  • [9] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl (2004) Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22 (1), pp. 5–53. Cited by: §5.2.
  • [10] M.-H. Hsu (2008) Proposing an esl recommender teaching and learning system. Expert Systems with Applications 34 (3), pp. 2102–2110. Cited by: §2.
  • [11] W. Huang, S. Kataria, C. Caragea, P. Mitra, C. L. Giles, and L. Rokach (2012) Recommending citations: translating papers into references. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1910–1914. Cited by: §2.1, §2.1.
  • [12] W. Huang, Z. Wu, C. Liang, P. Mitra, and C. L. Giles (2015) A neural probabilistic model for context based citation recommendation. In Proceedings of the 29th Conference on Artificial Intelligence. Cited by: §2.1, §2.1.
  • [13] Y. Jiang, A. Jia, Y. Feng, and D. Zhao (2012) Recommending academic papers via users’ reading purposes. In Proceedings of the 6th ACM Conference on Recommender Systems, pp. 241–244. Cited by: §2.2.
  • [14] M. Kaminskas, F. Ricci, and M. Schedl (2013) Location-aware music recommendation using auto-tagging and hybrid matching. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 17–24. Cited by: §2.
  • [15] T. Kolasa and D. Krol (2011) A survey of algorithms for paper-reviewer assignment problem. IETE Technical Review 28 (2), pp. 123–134. Cited by: §1.
  • [16] Q. Li, S. H. Myaeng, and B. M. Kim (2007) A probabilistic music recommender considering user opinions and audio features. Information Processing and Management 43 (2), pp. 473–487. Cited by: §2.
  • [17] H. Liu, Z. Yang, I. Lee, Z. Xu, S. Yu, and F. Xia (2015-12) CAR: incorporating filtered citation relations for scientific article recommendation. In The 8th IEEE International Conference on Social Computing and Networking (SocialCom), Chengdu, China. Cited by: §1.
  • [18] X. Liu, Y. Yu, C. Guo, and Y. Sun (2014) Meta-path-based ranking with pseudo relevance feedback on heterogeneous graph for citation recommendation. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 121–130. Cited by: §2.1.
  • [19] Y. Lu, J. He, D. Shan, and H. Yan (2011) Recommending citations with translation model. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 2017–2020. Cited by: §2.1, §2.1.
  • [20] E. Medvet, A. Bartoli, and G. Piccinin (2014) Publication venue recommendation based on paper abstract. In Proceedings of the 26th IEEE International Conference on Tools with Artificial Intelligence, pp. 1004–1010. Cited by: §1.
  • [21] F. Meng, D. Gao, W. Li, X. Sun, and Y. Hou (2013) A unified graph model for personalized query-oriented reference paper recommendation. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, pp. 1509–1512. Cited by: §2.1.
  • [22] R. M. Nallapati, A. Ahmed, E. P. Xing, and W. W. Cohen (2008) Joint latent topic models for text and citations. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 542–550. Cited by: §2.1.
  • [23] C. Nascimento, A. H. F. Laender, A. S. da Silva, and M. G. Gonçalves (2011) A source independent framework for research paper recommendation. In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, pp. 297–306. Cited by: §2.2.
  • [24] M. S. Pera and Y.-K. Ng (2011) A personalized recommendation system on scholarly publications. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 2133–2136. Cited by: §2.2.
  • [25] M. S. Pera and Y.-K. Ng (2014) Exploiting the wisdom of social connections to make personalized recommendations on scholarly articles. Journal of Intelligent Information Systems 42 (3), pp. 371–391. Cited by: §2.2.
  • [26] M. Qu, H. Zhu, J. Liu, G. Liu, and H. Xiong (2014) A cost-effective recommender system for taxi drivers. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 45–54. Cited by: §2.
  • [27] X. Ren, J. Liu, X. Yu, U. Khandelwal, Q. Gu, L. Wang, and J. Han (2014) ClusCite: effective citation recommendation by information network-based clustering. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 821–830. Cited by: §2.1.
  • [28] Y. Shi, M. Larson, and A. Hanjalic (2013) Mining contextual movie similarity with matrix factorization for context-aware recommendation. ACM Transactions on Intelligent Systems and Technology 4 (1), pp. 16. Cited by: §2.
  • [29] T. Strohman, W. B. Croft, and D. Jensen (2007) Recommending citations for academic papers. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 705–706. Cited by: §2.1.
  • [30] K. Sugiyama and M.-Y. Kan (2010) Scholarly paper recommendation via user’s recent research interests. In Proceedings of the 10th Annual Joint Conference on Digital Libraries, pp. 29–38. Cited by: §1, §2.2.
  • [31] K. Sugiyama and M.-Y. Kan (2014) A comprehensive evaluation of scholarly paper recommendation using potential citation papers. International Journal on Digital Libraries, pp. 1–19. Cited by: §2.2.
  • [32] J. Sun, J. Ma, Z. Liu, and Y. Miao (2014) Leveraging content and connections for scientific article recommendation in social computing contexts. The Computer Journal 57 (9), pp. 1331–1342. Cited by: §1, §1, §2.2.
  • [33] P. Symeonidis, E. Tiakas, and Y. Manolopoulos (2011) Product recommendation and rating prediction based on multi-modal social networks. In Proceedings of the 5th ACM Conference on Recommender Systems, pp. 61–68. Cited by: §2.2.
  • [34] J. Tang, G.-J. Qi, L. Zhang, and C. Xu (2013) Cross-space affinity learning with its application to movie recommendation. IEEE Transactions on Knowledge and Data Engineering 25 (7), pp. 1510–1519. Cited by: §2.
  • [35] J. Tang, S. Wu, J. Sun, and H. Su (2012) Cross-domain collaboration recommendation. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1285–1293. Cited by: §1.
  • [36] J. Tang and J. Zhang (2009) A discriminative approach to topic-based citation recommendation. In Advances in Knowledge Discovery and Data Mining, pp. 572–579. Cited by: §1, §2.1, §2.1.
  • [37] X. Tang, X. Wan, and X. Zhang (2014) Cross-language context-aware citation recommendation in scientific articles. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 817–826. Cited by: §2.1, §2.1.
  • [38] D. K. Tayal, P. C. Saxena, A. Sharma, G. Khanna, and S. Gupta (2014) New method for solving reviewer assignment problem using type-2 fuzzy sets and fuzzy functions. Applied Intelligence 40 (1), pp. 54–73. Cited by: §1.
  • [39] G. Tian and L. Jing (2013) Recommending scientific articles using bi-relational graph-based iterative rwr. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 399–402. Cited by: §2.2.
  • [40] C. Wang and D. M. Blei (2011) Collaborative topic modeling for recommending scientific articles. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 448–456. Cited by: §2.2, §5.1.
  • [41] H.-F. Wang and C.-T. Wu (2012) A strategy-oriented operation module for recommender systems in e-commerce. Computers & Operations Research 39 (8), pp. 1837–1849. Cited by: §2.
  • [42] J. Wang and Y. Zhang (2013) Opportunity model for e-commerce recommendation: right product; right time. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 303–312. Cited by: §2.
  • [43] F. Xia, N. Y. Asabere, H. Liu, N. Deonauth, and F. Li (2014) Folksonomy based socially-aware recommendation of scholarly papers for conference participants. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, pp. 781–786. Cited by: §2.2.
  • [44] F. Xia, Z. Chen, W. Wang, J. Li, and L. T. Yang (2014) MVCWalker: random walk based most valuable collaborators recommendation exploiting academic factors. IEEE Transactions on Emerging Topics in Computing 2 (3), pp. 364–375. Cited by: §1.
  • [45] Z. Yang and B. D. Davison (2012) Venue recommendation: submitting your paper with style. In 2012 11th International Conference on Machine Learning and Applications, Vol. 1, pp. 681–686. Cited by: §1.