Bursting Scientific Filter Bubbles: Boosting Innovation via Novel Author Discovery

Isolated silos of scientific research and the growing challenge of information overload limit awareness across the literature and hinder innovation. Algorithmic curation and recommendation, which often prioritize relevance, can further reinforce these informational "filter bubbles." In response, we describe Bridger, a system for facilitating discovery of scholars and their work, to explore design tradeoffs between relevant and novel recommendations. We construct a faceted representation of authors with information gleaned from their papers and inferred author personas, and use it to develop an approach that locates commonalities ("bridges") and contrasts between scientists – retrieving partially similar authors rather than aiming for strict similarity. In studies with computer science researchers, this approach helps users discover authors considered useful for generating novel research directions, outperforming a state-of-the-art neural model. In addition to recommending new content, we also demonstrate an approach for displaying it in a manner that boosts researchers' ability to understand the work of authors with whom they are unfamiliar. Finally, our analysis reveals that Bridger connects authors who have different citation profiles, publish in different venues, and are more distant in social co-authorship networks, raising the prospect of bridging diverse communities and facilitating discovery.


1. Introduction

“Opinion and behavior are more homogeneous within than between groups… Brokerage across structural holes provides a vision of options otherwise unseen.” (Burt, 2004)

The volume of papers in computer science continues to skyrocket, with the DBLP computer science bibliography listing hundreds of thousands of publications in the year 2020 alone (https://dblp.org/statistics/publicationsperyear.html). In particular, the field of AI has seen meteoric growth in recent years, with new authors entering the field every hour (Tang et al., 2020). Researchers rely largely on search and recommendation services like Google Scholar and Semantic Scholar to keep pace with the growing literature and the authors who contribute to it. These literature retrieval services algorithmically decide what information to serve to scientists (Beel and Gipp, 2009; Cohan et al., 2020), using information such as citations and textual content as well as behavioral traces such as clickthrough data to inform machine learning models that output lists of ranked papers or authors.

By relying on user behavior and queries, these services adapt and reflect human input and, in turn, influence subsequent search behavior. This cycle of input, updating, engagement, and response can lead to an amplification of biases around searchers’ prior awareness and knowledge (Kim et al., 2017). Such biases include selective exposure (Frey, 1986), homophily (McPherson et al., 2001), and the aversion to information from novel domains that require more cognitive effort to consider (Hope et al., 2017; Kittur et al., 2019). By reinforcing these tendencies, systems that filter and rank information run the risk of engendering so-called filter bubbles (Pariser, 2011) that fail to show users novel content outside their narrower field of interest.

Figure 1. Bursting scientific bubbles with Bridger. The overarching goal is to (1) find commonalities among authors working in different areas who are unaware of one another, and (2) suggest novel and valuable authors and their work, which would be unlikely to be discovered otherwise due to these disparities.

This is a teaser figure representing the Bridger system. The top is labeled “(1) Identify shared facets (bridges).” On the left is a bubble representing the “Text summarization community,” with circles representing authors, and arrows between the circles to show that they are communicating amongst themselves, within-community. On the right is a similar bubble for the “Video summarization community.” Between the two bubbles is a bridge, labeled with two concepts that could bridge the two communities: “Summary evaluations” and “Extracting summaries.” The bottom of the figure is labeled “Suggest new authors and ideas.” Another circle is present to represent a new author, and two new suggested ideas are presented: “Automatic evaluation methodologies” and “Weakly supervised large-scale data collection for summarization.”

These bubbles and silos of information can be costly to individual researchers and for the evolution of science as a whole. They may lead scientists to concentrate on narrower niches (Klinger et al., 2020), reinforcing citation inequality and bias (Nielsen and Andersen, 2021) and limiting cross-fertilization among different areas that could catalyze innovation (Hope et al., 2017; Kittur et al., 2019; Hope et al., 2021). Addressing filter bubbles in general, in domains such as social media and e-commerce recommendations, is a hard and unsolved problem (Ge et al., 2020; Chen et al., 2020; Zhu et al., 2020). The problem is especially difficult in the scientific domain. The scientific literature consists of complex models and theories, specialized language, and an endless diversity of continuously emerging concepts. Connecting blindly across these cultural boundaries requires significant cognitive effort (Vilhena et al., 2014), translating to time and resources most researchers are unlikely to have at their disposal to enter unfamiliar research territory. (The challenge of limited time to explore novel directions also came up in our interviews with researchers; see §6.)

Our vision in this paper is to develop an approach that boosts scientific innovation and builds bridges across scientific communities, by helping scientists discover authors that spark new ideas for research. Working toward this goal, we developed Bridger, illustrated in Figure 1. Our main contributions include:


  • A multidimensional author representation for matching authors along specific facets. Our novel representation includes information extracted automatically from papers, including tasks, methods, and resources, and automatically inferred personas that reflect the different focus areas on which each scientist works. Each of these aspects is embedded in a vector space based on its content, allowing the system to identify authors with commonalities along specific dimensions and not others, such as authors working on similar tasks but not using similar methods.

  • Boosting discovery of useful authors and ideas from novel areas. We explore the utility of our author representation in experiments with computer science researchers interacting with Bridger. We find that this representation helps users discover authors considered novel and relevant, assisting users in finding potentially useful research directions. Bridger outperforms a strong neural model currently employed by a public scholarly search engine for search and recommendation (https://twitter.com/SemanticScholar/status/1267867735318355968) — despite Bridger’s focus on surfacing novel content and the built-in biases associated with this novelty. We conduct in-depth interviews with researchers, studying the tradeoffs between novelty and relevance in scientific content recommendations and discussing challenges and directions for author discovery systems.

  • Exploring how to effectively depict recommended authors. In addition to assessing what authors to recommend to spark new research ideas, we also consider how to display authors in a way that enables users to rapidly understand what new authors work on. We employ Bridger as an experimental platform to explore which facets are displayed to users, investigating various design choices and tradeoffs. We obtain substantially better results in terms of user understanding of profiles of unknown authors, when displaying information taken from our author representation.

  • Evidence of bridging across research communities. Finally, we conduct in-depth analyses revealing that Bridger surfaces novel and valuable authors and their work that are unlikely to be discovered in the absence of Bridger due to publishing in different venues, citing and being cited by non-overlapping communities, and having greater distances in the social co-authorship network.

Taken together, the ability to uncover novel and useful authors and ideas, and to serve this information to users in an effective and intuitive manner, suggests a future where automated systems are put to work to build bridges across communities, rather than blindly reinforcing existing filter bubbles.

2. Related Work

Inspirational Stimuli

Our work is related to literature focused on computational tools for boosting creativity (Hope et al., 2017; Chan et al., 2018; Kittur et al., 2019; Goucher-Lambert et al., 2020; Hope et al., 2021). Experiments in this area typically involve giving participants a specific concrete problem, and examining methods for helping them come up with creative solutions (Hope et al., 2017, 2021). In this paper, we do not assume we are given a single concrete problem. Rather, we are given authors and their papers, and automatically identify personalized inspirations in the form of other authors and their contributions. These computationally complex objects — authors can have many papers, each with many facets and authored by multiple co-authors — are very different from the short, single text snippets typically used in this line of work (Hope et al., 2017, 2021), or even from paper abstracts (Chan et al., 2018). A recurring theme in this area is the notion of a “sweet spot” for inspiration: not too similar to a given problem that a user aims to solve, and not too far afield (Fu et al., 2013). Finding such a sweet spot remains an important challenge. We study a related notion of balancing commonalities and contrasts between researchers for discovering authors that spark new research directions.

Filter Bubbles and Recommendations

How to mitigate the filter bubble effect is a challenging open question for algorithmic recommendation systems (Nguyen et al., 2014), explored recently for movies (Zhu et al., 2020) and in e-commerce (Ge et al., 2020) by surfacing content that aims to be both novel and relevant. One approach that has been explored for mitigating these biases is judging recommendations not only by accuracy, but also by other metrics such as diversity (difference between recommendations) (Wilhelm et al., 2018; Chen et al., 2020), novelty (items assumed unknown to the user) (Zhao and Lee, 2016), and serendipity (a measure of relevance and surprise associated with a positive emotional response) (Wang et al., 2020). The notion of serendipity is notoriously hard to quantitatively define and measure (Kaminskas and Bridge, 2016; Chen et al., 2019; Wang et al., 2020); recently, user studies have explored human perceptions of serendipity (Chen et al., 2019; Wang et al., 2020), yet this problem remains very much open. A distinct, novel feature of our work is its focus on the scientific domain, and that, unlike the standard recommendation system setting, we measure our system’s utility in terms of boosting users’ ability to discover authors that spur new ideas for research. In experiments with computer science researchers, we explore interventions that could potentially help provide bridges to authors working in diverse areas, with an approach based on finding faceted commonalities and contrasts between researchers.

Scientific Recommendations

Work in this area typically focuses on recommending papers, using proxies such as citations or co-authorship links in place of ground truth (Tang et al., 2012; Beel et al., 2016; Portenoy and West, 2020), or a combination of text and citation data (Cohan et al., 2020). In addition to being noisy proxies in terms of relevance, these signals reinforce existing patterns of citation or collaboration, and are not indicative of papers or authors that would help users generate novel research directions — the focus of Bridger. Furthermore, we perform controlled experiments with researchers to be able to better evaluate our approach without the biases involved in learning from observational data on citations or co-authorship. One related recent direction considers the problem of diversifying social connections made between academic conference attendees (Tsai and Brusilovsky, 2018; Tsai et al., 2020; Wang et al., 2019b), by definition a relatively narrow group working in closely-related areas, using attendee metadata or publication similarity.

3. Bridger: Approach Overview

Figure 2. Overview of Bridger’s author representation, retrieval, and depiction. Users are represented in terms of a matrix with rows corresponding to papers, and columns corresponding to facets. Bridger finds suggested authors who match along certain “slices” of the user’s data – certain facets, subsets of papers, or both.

The left of the figure shows the matrix representation of a user. Two example “slices” are shown. “Author slice 1” is a slice of two vertically adjacent cells, containing only the tasks taken from a single persona-based subset of the user’s papers. “Author slice 2” is a slice of five vertically adjacent cells, representing only the resources from the user’s total set of papers. To the right of the matrix is a representation of the “slice embedding,” a vector representation of one of these slices taken from the matrix. To the right of this, two new authors are shown, along with their own matrices. These are the new authors that the system has retrieved by finding similarities only along particular slices. Finally, on the far right, one of these new authors is shown represented as a card in the Bridger system, with that author’s top tasks and methods listed.

In this section we present our novel faceted representation of authors, and methods for using this representation for author discovery by matching researchers along specific dimensions (Figure 2). We also present methods for depicting the recommended authors when showing them to users. Bridger is designed to enable the study of different design choices for scientific author and idea discovery. We present the general framework, and the specific instantiations that we explore. We start by describing our representation for papers, and how Bridger represents authors by aggregating paper-level information and decomposing authors into personas. (The source code for data processing, author representation and ranking, and the user-facing application displaying this data can be found in the supplementary materials.)

3.1. Paper representations

Paper Information

Each paper contains rich, potentially useful information. This includes raw text such as a paper’s abstract, incoming and outgoing citations, publication date, venues, and more. One key representation we derive from each paper $p$ is a dense vector representation $\mathbf{v}_p$, obtained using a state-of-the-art scientific paper embedding model. This neural model captures overall coarse-grained topical information on papers and has been shown to be powerful for clustering and retrieving papers (Cohan et al., 2020).

Another key representation is based on fine-grained facets obtained from papers. Let $T_p$ be the set of terms appearing in paper $p$. Each term is associated with a specific facet (category). We consider several categories of terms in this paper: coarse-grained paper topics inferred from the text (Wang et al., 2019a), and fine-grained spans of text referring to methods, tasks, and resources — core aspects of computer science papers (Cattan et al., 2021) — automatically extracted from paper $p$ with a scientific named entity recognition model (Wadden et al., 2019). Each term is located in a “cell” of the matrix illustrated in Figure 2, with facets corresponding to the columns and papers to the rows. Each term $t$ is also embedded in a vector space using a neural language model (see §3.5), yielding a vector $\mathbf{v}_t$ for each term.

3.2. Author representations

We represent an author $a$ as a set of personas, where each persona is encoded with facet-wise aggregations of term embeddings across a set of papers. Figure 2 illustrates this with outlines of “slices” in bold — subsets of rows and columns in the illustrated matrix, corresponding to personas (subsets of rows) and facets (columns).

Author personas

Each author can work in multiple areas. In our setting, capturing this can be important for understanding the different interests of authors, enabling more control over author suggestions. We experiment with a clustering-based approach for constructing personas $P_a$, based on inferring for each author a segmentation of their papers into subsets reflecting a common theme — illustrated as subsets of rows in the matrix in Figure 2. We also experiment with a clustering based on the network of co-authorship collaborations in which $a$ takes part. See §3.5 for details on clustering. As discussed later (§4), we find that the former approach, in which authors are represented with clusters of papers, elicits considerably better feedback from scholars participating in our experiments.

Co-authorship information

Each paper is in practice authored by multiple people, i.e., it can belong to multiple authors. Each author $a$ assumes a position (e.g., first, last) on a given paper $p$, potentially reflecting the strength of their affinity to the paper. As discussed below (§3.5), we make use of this affinity in determining what weight to assign terms for a given paper and a given author.

Author-level facets

Finally, using the above information on authors and their papers, we construct multiple author-level facets that capture different aggregate aspects of $a$. More formally, in this paper we focus our experiments on author facets $F_a = (\mathbf{m}_a, \mathbf{t}_a, \mathbf{r}_a)$, where $\mathbf{m}_a$ is an aggregate embedding of $a$'s methods, $\mathbf{t}_a$ is an embedding capturing $a$'s tasks, and $\mathbf{r}_a$ represents $a$'s resources. In addition, we construct these facets separately for each of the author's personas — corresponding to “slice embeddings” over subsets of rows and columns in the matrix illustrated in Figure 2. In analyses of our experimental results (§5), we also study other types of information such as citations and venues; we omit them from the formal notation to simplify presentation.

3.3. Approaches for recommending authors

For a given author using Bridger, we are interested in automatically suggesting new authors working on areas that are relevant to them but also likely to be interesting and to spark new ideas. We are given a user $u$, their set of personas $P_u$, and for each persona its faceted representation. We are also given a large pool of authors across computer science, $\mathcal{A}$, from which we aim to retrieve author suggestions to show $u$.

Baseline model

We employ Specter (Cohan et al., 2020), a strong neural model trained to capture overall topical similarity between papers based on text and citation signals (see Cohan et al. (2020) for details), which is used for serving recommendations as part of a large public academic search system. For each of author $u$'s papers $p$, we use this neural model to obtain an embedding $\mathbf{v}_p$. We then derive an aggregate author-level representation (e.g., by weighted averaging that takes author-term affinity into account; see §3.5). Similar authors are computed using a simple distance measure over the dense embedding space. As discussed in the introduction and §2, this approach focuses on retrieving authors whose papers are, overall, most similar to $u$'s. Intuitively, the baseline can be thought of as “summing over” both the rows and columns of the author matrix in Figure 2. By aggregating across all of $u$'s papers, information on finer-grained sub-interests may be lost. In addition, by being trained on citation signals, it may be further biased and prone to favor highly-cited papers or authors.

To address these issues, we explore a formulation of the author discovery problem in terms of matching authors along specific dimensions, which allows more fine-grained control – such as using only a subset of the facets in $F_u$, or only a subset of $u$'s papers, or both — as in the row and column slices seen in Figure 2. This decomposition of authors also enables us to explore contrasts along specific author dimensions, e.g., finding authors who work on similar tasks to $u$ but use very different methods or resources.


  • Single-facet matches. For each author $a$ in the pool of authors $\mathcal{A}$, we obtain their respective aggregate representations $F_a$. We then retrieve authors with embeddings similar to $u$'s along one dimension (a matrix column in Figure 2; e.g., $\mathbf{r}_a$ for resources), ignoring the others. Unlike the baseline model, which aggregates all information appearing in $u$'s papers – tasks, methods, resources, general topics, and any other textual information – this approach is able to disentangle specific aspects of an author, potentially enabling discovery of more novel, remote connections that can expose users to more diverse ideas and cross-fertilization opportunities.

  • Contrasts. Finding matches along one dimension does not guarantee retrieving authors who are distant along the others. For example, authors retrieved because they work on tasks related to scientific knowledge discovery and information extraction from text may still use a diverse range of resources, such as scientific papers, clinical notes, etc. While the immense diversity of the scientific literature makes it likely that focusing on similarity along only one dimension will still surface diverse results along the others (see results in §5), we seek to further ensure this.

    To do so, we apply a simple approach inspired by recent work on retrieving inspirations (Hope et al., 2021): we first retrieve the top $K$ authors that are most similar to $u$ along one dimension (e.g., tasks $\mathbf{t}$), for some relatively large $K$. We then rank this narrower list inversely by another dimension (e.g., methods $\mathbf{m}$), and show user $u$ authors from the top of this re-ranked list. Intuitively, this approach helps balance relevance and novelty by finding authors who are similar enough along one dimension and, within that subset, relatively distant along another (a minimal sketch of this procedure appears after this list).

  • Persona-based matching. Finally, to account for the different focus areas authors may have, instead of aggregating over all of an author's papers, we perform the same single-facet and contrast-based retrieval using the author's personas — or, in other words, row-and-column slices of the matrix in Figure 2.
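To make the contrast-based retrieval concrete, the following is a minimal sketch (not the system's released code) of the sT → sTdM procedure: rank the author pool by task-facet similarity, keep a large candidate set, then re-rank those candidates by method-facet dissimilarity. The candidate-pool size, function names, and data layout are illustrative assumptions.

```python
# Sketch of faceted contrast retrieval: similar tasks, distant methods (sTdM).
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def contrast_retrieve(user_tasks, user_methods, pool_tasks, pool_methods,
                      k_candidates=1000, n_show=10):
    """pool_tasks / pool_methods: dicts mapping author_id -> facet embedding.
    k_candidates is an assumed pool size, not a value reported in the paper."""
    # Step 1: top-k candidate authors by task-facet similarity (similar tasks).
    by_task = sorted(pool_tasks,
                     key=lambda a: cosine(user_tasks, pool_tasks[a]),
                     reverse=True)[:k_candidates]
    # Step 2: within the candidates, rank inversely by method-facet similarity
    # (i.e., most distant methods first) and show the top of that list.
    by_contrast = sorted(by_task,
                         key=lambda a: cosine(user_methods, pool_methods[a]))
    return by_contrast[:n_show]
```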

3.4. Depicting Recommended Authors

Our representation allows us to explore multiple design choices not only for which authors we show users, but also how we show them. In our experiments (§4, §5), we evaluate authors’ facets and personas in terms of their utility for helping researchers learn about new authors, and for controlling how authors are filtered.

Term ranking algorithms to explain what authors work on

Researchers, flooded with constant streams of papers, typically have a very limited attention span to consider whether some new author or piece of information is relevant to them. It is thus important that the information we display for each author (such as their main methods, tasks, resources, and also papers) is ranked, such that the most important or relevant terms appear first. We explore different approaches to rank the displayed terms, balancing the relevance (or centrality) of each term for a given author against coverage of the various topics the author works on. We compare several approaches, including a customized relevance metric we designed, in a user study with researchers (§4), and describe the ranking approaches in more detail in §3.5.

Retrieval explanations

In addition to term ranking approaches aimed at explaining to users of Bridger what a new suggested author works on, we also provide users with two rankings that are geared for explaining how the retrieved authors relate to them. First, we allow users to rank author terms by how similar they are to their own list of terms (for each facet, separately). Second, users can also rank each author’s papers by how similar they are to their own — showing the most similar papers first. These explanations can be regarded as a kind of “anchor” for increasing trust, which could be especially important when suggesting novel, unfamiliar content.

3.5. Implementation details

3.5.1. Data

We use data from the Microsoft Academic Graph (MAG) (Sinha et al., 2015), using a snapshot of this dataset from March 1, 2021. We also link the papers in the dataset to those in a large public academic search engine (redacted for anonymity). We limit the papers and associated entities to those designated as Computer Science papers. We focus on authors' recent work, limiting the papers to those published between 2015 and 2021, resulting in 4,650,474 papers from 6,433,064 authors. Despite using disambiguated MAG author data, we observe that the challenge of author ambiguity still persists (Subramanian et al., 2021). In our experiments, we thus exclude participants with very few papers (see §5), since disambiguation errors in their papers stand out prominently.

3.5.2. Term Extraction

We extract terms (spans of text) referring to tasks, methods, and resources mentioned in paper abstracts and titles, using the state-of-the-art DyGIE++ IE model (Wadden et al., 2019) trained on SciERC (Luan et al., 2018). We extracted 10,445,233 tasks, 20,705,854 methods, and 4,978,748 resources from 3,594,975 papers. We also use MAG topics, higher-level coarse-grained topics available for each paper in MAG. We expand abbreviations in the extracted terms using the algorithm in (Schwartz and Hearst, 2002) implemented in ScispaCy (Neumann et al., 2019).
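For illustration, here is a minimal sketch of abbreviation expansion with ScispaCy's AbbreviationDetector, which implements the Schwartz–Hearst algorithm; the specific model name and example sentence are assumptions, not details taken from the paper.

```python
# Sketch: expand abbreviations in extracted terms with ScispaCy.
import spacy
from scispacy.abbreviation import AbbreviationDetector  # registers the pipe factory

nlp = spacy.load("en_core_sci_sm")  # assumed ScispaCy model
nlp.add_pipe("abbreviation_detector")

doc = nlp("Spike-timing-dependent plasticity (STDP) adjusts synaptic strength.")
for abrv in doc._.abbreviations:
    # Map each short form to its detected long form, e.g. STDP -> spike-timing-dependent plasticity
    print(abrv.text, "->", abrv._.long_form.text)
```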

3.5.3. Scoring papers by relevance to an author

The papers published by an author have varying levels of importance with regard to that author's overall body of publications. To capture this, we use a simple heuristic that takes into account two factors: the author's position in a paper as a measure of affinity (see §3.2), and the paper's overall impact in terms of citations. More formally, for each author $a$, we assign a weight $w_{a,p} = \text{pos}_{a,p} \cdot \text{rank}_p$ to each paper $p$ in $a$'s set of papers $D_a$, where $\text{pos}_{a,p}$ is 1.0 if $a$ is first or last author on $p$ and 0.75 otherwise, and $\text{rank}_p$ is MAG's assigned paper Rank (a citation-based measure of importance; see (Wang et al., 2019a) for details), normalized by min-max scaling to a value between 0.5 and 1.
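As a concrete illustration, here is a minimal sketch of this relevance heuristic, assuming the weight is the product of the position factor and the min-max-scaled Rank; the exact combination and the direction of the Rank scaling are assumptions for illustration.

```python
# Sketch of the paper-relevance weight for an (author, paper) pair.
def paper_weight(is_first_or_last_author: bool, mag_rank: float,
                 rank_min: float, rank_max: float) -> float:
    # Position factor: 1.0 for first/last author, 0.75 otherwise.
    position_factor = 1.0 if is_first_or_last_author else 0.75
    # Min-max scale MAG's paper Rank into [0.5, 1.0]; whether larger raw Rank
    # means a more important paper depends on MAG's convention (assumed here).
    denom = max(rank_max - rank_min, 1e-9)
    scaled_rank = 0.5 + 0.5 * (mag_rank - rank_min) / denom
    return position_factor * scaled_rank
```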

3.5.4. Author similarity

We explore several approaches for author similarity and retrieval, all based on paper-level aggregation as discussed in §3.3. For the document-level Specter baseline model discussed in §3.3, we obtain 768-dimensional embeddings for all of the papers. To determine similarity between authors, we take the average embedding of each author's papers, weighted by the paper relevance score described above, and compute the cosine similarity between this author's average embedding and that of every other author. For our faceted approach, we compute similarities along each author's facets, using embeddings we create for each term in each facet. The model used to create these embeddings was CS-RoBERTa (Gururangan et al., 2020), which we fine-tuned for the task of semantic similarity using the Sentence-BERT framework (Reimers and Gurevych, 2019). For each author or persona, we calculate an aggregate representation along each facet by taking the average embedding of the terms in all of the papers, weighted by the relevance score of each associated paper.
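A minimal sketch of this facet aggregation follows, using a generic Sentence-BERT model as a stand-in for the fine-tuned CS-RoBERTa model; the model name, function names, and data layout are placeholders rather than the system's actual implementation.

```python
# Sketch: per-facet author embedding as a relevance-weighted average of term embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder for the fine-tuned model

def author_facet_embedding(terms_per_paper, paper_weights):
    """terms_per_paper: list of lists of term strings (one list per paper);
    paper_weights: relevance weight of each paper (see §3.5.3)."""
    vecs, weights = [], []
    for terms, w in zip(terms_per_paper, paper_weights):
        for term in terms:
            vecs.append(model.encode(term))  # embed each extracted term
            weights.append(w)                # weight it by its paper's relevance
    vecs, weights = np.asarray(vecs), np.asarray(weights)
    return (weights[:, None] * vecs).sum(axis=0) / weights.sum()

def author_similarity(facet_a: np.ndarray, facet_b: np.ndarray) -> float:
    # Cosine similarity between two authors along the same facet.
    return float(facet_a @ facet_b /
                 (np.linalg.norm(facet_a) * np.linalg.norm(facet_b) + 1e-9))
```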

3.5.5. Identification of personas

We infer author personas using two different approaches. In the first approach, we cluster the co-authorship network using the ego-splitting framework of (Epasto et al., 2017). In the second approach, we cluster each author's papers by their Specter embeddings using agglomerative clustering with Ward linkage (Murtagh and Legendre, 2014) on the Euclidean distances between embedding vectors (implemented in the scikit-learn Python library (Pedregosa et al., 2011), with a distance threshold of 85). In our user studies, we show participants their personas and the details of each one (papers, facets, etc.); some authors do not have detected personas, which we observe to often be the case for early-career researchers. To make this manageable, we sort the clusters (personas) based on each cluster's most highly ranked paper according to MAG's assigned Rank, and show participants only their top two personas.
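A minimal sketch of the paper-based persona clustering, assuming precomputed Specter embeddings; the scikit-learn call mirrors the setup described above, while the function name and array shapes are illustrative assumptions.

```python
# Sketch: paper-based personas via agglomerative clustering of Specter embeddings.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def detect_personas(paper_embeddings: np.ndarray, distance_threshold: float = 85.0):
    """paper_embeddings: (n_papers, 768) array of Specter vectors for one author.
    Returns one cluster label (persona id) per paper."""
    clustering = AgglomerativeClustering(
        n_clusters=None,                      # let the threshold decide the count
        distance_threshold=distance_threshold,
        linkage="ward",                       # Ward linkage on Euclidean distances
    )
    return clustering.fit_predict(paper_embeddings)
```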

3.5.6. Term ranking for Author Depiction

We evaluate several different strategies to rank terms (methods, tasks, resources) shown to users in Experiment I (§4):

  • TextRank: For each author, we create a graph $G$ whose vertices are the terms in the author's set of papers and whose weighted edges $(t_i, t_j)$ have weight equal to the Euclidean distance between the embedding vectors $\mathbf{v}_{t_i}$ and $\mathbf{v}_{t_j}$. We score each term according to its PageRank value in $G$ (Mihalcea and Tarau, 2004). (A minimal sketch of this strategy appears after this list.)

  • TF-IDF: For each term, we compute TF-IDF across all authors, considering each author as a “document” (bag of terms) for the IDF (inverse document frequency) term, counting each term once per paper. We calculate the TF-IDF score of each term for each author, and use this as the term's score.

  • Author relevance score: For each term, we calculate the sum of the relevance scores (§3.5.3) of its associated papers. If a term appears in multiple papers, each associated paper's score is used as a separate summand.

  • Random: Each term is assigned a random rank.
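For illustration, here is a minimal sketch of the TextRank-style strategy above, using networkx as an assumed convenience (the paper does not specify a graph library); term embeddings are taken as given.

```python
# Sketch: TextRank-style term scoring over an author's extracted terms.
import itertools
import numpy as np
import networkx as nx

def textrank_terms(term_vecs: dict) -> dict:
    """term_vecs: mapping from term string to its embedding vector (np.ndarray).
    Returns a mapping from term to its PageRank score."""
    g = nx.Graph()
    g.add_nodes_from(term_vecs)
    # Edge weight = Euclidean distance between term embeddings, as described above.
    for t1, t2 in itertools.combinations(term_vecs, 2):
        dist = float(np.linalg.norm(term_vecs[t1] - term_vecs[t2]))
        g.add_edge(t1, t2, weight=dist)
    return nx.pagerank(g, weight="weight")
```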

4. Experiment I: Author Depiction

In systems that help people find authors, such as Microsoft Academic Graph, Google Scholar, and AMiner (Wan et al., 2019), authors are often described in terms of a few high-level topics. Before exploring how we might leverage facets to engage researchers with a diverse set of authors, we performed a user study to gain a better understanding of what information might prove useful when depicting authors. We started from a base of Microsoft Academic Graph (MAG) topics, and then added our extracted facets (tasks, methods, resources). We investigated the following research questions:

  • RQ1: Do tasks, methods, and/or resources complement MAG topics in depicting an author’s research?

  • RQ2: Which term ranking best reflects an author’s interests?

  • RQ3: Do tasks, methods, and/or resources complement MAG topics in helping users gain a better picture of the research interests of unknown authors?

  • RQ4: Do personas reflect authors' different focus areas well?

4.1. Experiment Design

Thirteen computer-science researchers were recruited for the experiment through Slack channels and mailing lists. Participants were compensated $20 over PayPal for their time. Study sessions were one-hour, semi-structured interviews recorded over Zoom. The participants engaged in think-aloud throughout the study. They evaluated a depiction of a known author (e.g., research mentor) for accuracy in depicting their research, as well as depictions of five unknown authors for usefulness in learning about new authors.

Throughout all parts of the experiment, the interviewer asked follow-up questions regarding the participant's think-aloud and reactions (the script for Experiment I can be found in our supplementary materials). To address RQ1 and RQ2, the participants first evaluated the accuracy of a known author's depiction.

Step I. To begin, we presented the participant with only the top 10 MAG topics for the known author. We asked them to mark any topic that was unclear, too generic, or did not reflect the author's research well. Next, we provided five more potential lists of terms. One of these lists consisted of the next 10 top topics. The other four each presented 10 tasks, selected as the top-10 ranked terms using one of the ranking strategies described in §3.5. We asked participants to rank the five lists (as a whole) in terms of how well they complemented the first list (with an option to select none).

Step II. The process then repeated for five more potential lists to complement the original topics and the highest-ranked second list selected in Step I — this time, with methods instead of tasks. If the participant ranked a methods list highest, we then presented the participant with a resources list that used the same ranking strategy preferred by the participant for methods, and asked whether or not this list complemented those shown so far.

Step III. To address RQ3, participants next evaluated the utility of author depictions for five unknown authors. To describe each unknown author, we provided topics, tasks, methods, and resources lists with 10 terms each. The non-topics lists were ranked using TF-IDF as a default. The participant noted whether or not each additional non-topics list complemented the preceding lists in helping them understand what kind of research the unknown author does.

Step IV. Finally, for RQ4, we asked participants to evaluate the known author’s distinct personas presented in terms of tasks, which were ranked using TF-IDF. On a Likert-type scale of 1-5, participants rated their agreement with the statement, “The personas reflect the author’s different research interests (since the year 2015) well.”

4.2. Results

4.2.1. Results for RQ1

The majority of participants found that tasks, methods, and resources complemented topics to describe a known author’s research. For both tasks and methods, 11 of 13 participants felt that seeing information about that facet, more so than additional top MAG topics or no additional information, complemented the original top ten MAG topics. The prevailing grievance with the additional MAG topics was that they were too general. Furthermore, 7 of 9 participants who evaluated a resources list thought that it complemented the preceding lists.

4.2.2. Results for RQ2

Participants overall preferred the relevance score ranking strategy for tasks and methods. We compared the four ranking strategies and the MAG topics baseline strategy for both tasks and methods. For each participant, we awarded points to each strategy based on its position in the participant's ranking of the five strategies. We awarded the least favorite strategy one point and the most favorite strategy five points. Since there were 13 participants, a strategy could accumulate up to 65 points. Separately, we counted how many times each strategy was a participant's favorite strategy (Figure 3c, d). With regard to tasks, TextRank and TF-IDF accrued the most points from participants, with the relevance score trailing close behind (Figure 3a). Meanwhile, the MAG topics baseline accrued the fewest points, even fewer than the random task ranking strategy. In addition, relevance score and TextRank were chosen most often as the favorite task ranking strategy (Figure 3c). With regard to methods, the relevance score ranking strategy performed best in terms of both total points (Figure 3b) and favorite strategy (Figure 3d).

Figure 3. Points awarded to each ranking strategy for tasks (a) and methods (b), and percentage of participants who favored each strategy most for tasks (c) and methods (d).

(a): The total points awarded to each ranking strategy for tasks show that TF-IDF and TextRank perform best with 45 points, and the relevance score trails close behind with 42 points. Random is next with 37 points, and the Topics baseline has just 26 points. (b): The total points awarded to each ranking strategy for methods show that the relevance score performs best with 46 points. TF-IDF follows with 40 points, and Random with 39 points. TextRank and Topics have 28 and 27 points respectively. (c): Participants selected the relevance score and TextRank most often as their favorite ranking strategy for tasks. Four participants favored each most. Next was TF-IDF with 3 votes, and then Topics with 2 votes. No one favored Random most. (d): Participants selected the relevance score most often as their favorite ranking strategy for methods. It received 5 votes. Next was Random with 4 votes, followed by TF-IDF with 2 votes. Topics received just one vote, and TextRank none.

4.2.3. Results for RQ3

Participants generally found tasks, methods, and resources helpful to better understand what kind of research an unknown author does. To calculate how many participants were in favor of including tasks, methods, and resources to help them better understand an author, we determined the average of each participant’s binary response per facet. Adding up the 13 responses for each facet, we saw that the majority of participants thought each additional facet helped them understand the unknown author better. All 13 participants found the tasks helpful, eight found the methods helpful, and 12 found the resources helpful. As an example, P12 connected an unknown author’s topics, tasks, and methods to better understand them: “I wouldn’t have known they were an information retrieval person from the [topics] at all…. The previous things [in topics and tasks] that mentioned translation and information retrieval and kind of separately… This [methods section] connects the dots for me, which is nice.” Interestingly, methods were not viewed to be as useful as tasks or resources. The majority of participants cited unfamiliar terms as a key issue.

4.2.4. Results for RQ4

Participants indicated a preference for personas selected based on papers rather than co-authorship. After the experiment, six participants were informally asked to compare the experiment's personas selected based on co-authorship with the personas based on paper clustering (see §3.5). Four of them preferred the updated, paper-based version. Furthermore, one of the users who preferred the old version still thought the updated version had better personas and merely did not like their ordering. In addition, all six participants liked seeing the personas in terms of papers. In our experiment in §5, we observed much higher satisfaction with the updated personas in comparison to the original personas of this experiment.

5. Experiment II: Author Discovery

We now turn to our main experiment, exploring whether facets can be employed in Bridger to spur users to discover valuable and novel authors and their work. We use our two author-ranking strategies (§3.3), one based on similar tasks alone (sT) and the other on similar tasks with contrasting (distant) methods (sTdM). We compare these strategies to the Specter (ss) baseline. More specifically, we investigated the following research questions:

  • RQ5: Do sT and sTdM, in comparison to Specter, surface suggestions of authors that are considered novel and valuable, coming from research communities more distant to the user?

  • RQ6: Does sorting based on personas help users find more novel and valuable author suggestions?

5.1. Experiment Design

Figure 4. Illustration of information shown to users in Experiment II, §5. When the user clicks on an author card, an expanded view is displayed with 5 sections: papers, topics, and our extracted facets — tasks, methods, and resources.

An author card is shown broken down into its five sections for the five facets: papers, topics, tasks, methods, and resources. An example of the papers section and tasks section are displayed with a few papers and tasks checked off. In the papers section, there is a sort drop-down menu at the top that currently says ”sort by similarity to focal/author persona.” There are five papers listed with their titles, author position, and year of publication. In the tasks section, there are five tasks listed: paraphrase generation, clinical diagnostic inferencing, clinical diagnosis, patient care, and reinforcement learning problems. At the bottom of the section, there are arrows to go back and forth between pages one and two of the task phrases.

Twenty computer-science researchers participated in the experiment after recruitment through Slack channels and mailing lists. Participants were compensated $50 over PayPal for their time.

All participants were shown results based on their overall papers (without personas) consisting of 12 author cards they evaluated one by one. Four cards were included for each of sT, sTdM, and ss. We only show cards for authors who are at least 2 hops away in the co-authorship graph from the user, filtering authors with whom they had previously worked.

For participants who had at least two associated personas, we also presented them with authors suggested based on each separate persona: four author cards for each of their top two personas (two under sT and two under sTdM). Whether the participants saw the personas before or after the non-persona part was randomized.

Each author card provides a detailed depiction of that author (see Figure 2). The author's name and affiliation are hidden in this experiment to mitigate bias. As shown in Figure 4, cards showcase five sections of the author's research: their papers, MAG topics, and our extracted facet terms. We also let users view the tasks and methods ranked by similarity to them, which could be helpful to explain why an author was selected and to better understand commonalities.

The cards showed up to five items for each section, with some sections having a second page, depending upon data availability. Papers could be sorted based on recency or similarity to a participant / persona. To avoid biasing participants, the only information provided for each paper was its title, date, and the suggested author’s position on each paper (e.g., first, last).

Each of these items (papers and terms) had a checkbox, which the user was instructed to check if it fulfilled two criteria: 1) potentially interesting and valuable for them to learn about or consider in terms of utility, and 2) not too similar to things they had worked on or used previously. Following a short tutorial (slides available in our supplementary materials), participants evaluated each author shown by checking the aforementioned checkboxes (see Figure 4, right). While evaluating the first and last author (randomized), the participant engaged in a protocol analysis methodology (sharing their thinking as they worked). Participants with personas were also asked, based on each persona's top five associated papers, whether each persona reflected a coherent focus area, and whether it seemed useful for filtering author suggestions. (See supplementary materials for the source code used to generate the data for Experiment II, as well as the code for the interactive application used in the evaluation and the script used to direct the participants.)

5.2. Quantitative Results

Figure 5. More users prefer Bridger for suggesting novel, interesting authors. Percent of the participants who preferred author suggestions surfaced by faceted conditions (sT and sTdM, blue bars) compared to a baseline non-faceted paper embedding (ss, orange bars). On average, users prefer the former suggestions, leading to more discovery of novel and valuable authors and their work (a). When broken down further, we find users substantially preferred the facet items shown for authors in our condition (b), and preferred the paper embedding baseline when evaluating papers (c). See §5 for discussion.

(a) For checked papers and facets: for the sT condition, 60 percent of participants preferred bridger author suggestions compared to 40 percent who preferred the specter author suggestions. For the sTdM condition, 78 percent preferred bridger. (b) When considering checkboxes for only facets: for the sT condition, 80 percent of participants preferred bridger over specter. For the sTdM condition, 96 percent preferred bridger. (c) When considering checkboxes for only papers: for the sT condition, 41 percent of participants preferred bridger over specter. For the sTdM condition, 38 percent preferred bridger.

For each author card evaluated by a user, we calculate the ratio of checked boxes to total boxes in that card. Then, for each user, we calculate the average of these ratios per condition (sT, sTdM, ss), and calculate a user-level preference specifying which of the three conditions received the highest average ratio. Using this score, we find the proportion of users who preferred each of the sT and sTdM conditions in comparison to ss. This metric indicates the user’s preference between Bridger- and Specter-recommended authors in terms of novelty and value (RQ5).
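A minimal sketch of this user-level preference computation follows; the data layout, function name, and variable names are illustrative assumptions.

```python
# Sketch: per-card checked ratio -> per-condition average -> per-user winning condition.
from collections import defaultdict
from statistics import mean

def user_preference(cards):
    """cards: list of (condition, n_checked, n_total) tuples for one user,
    where condition is one of 'sT', 'sTdM', 'ss'."""
    ratios = defaultdict(list)
    for condition, n_checked, n_total in cards:
        ratios[condition].append(n_checked / n_total)  # checked-to-total ratio per card
    per_condition = {c: mean(r) for c, r in ratios.items()}
    # The condition with the highest average ratio is this user's preference.
    return max(per_condition, key=per_condition.get)
```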

Figure 5(a) shows results by this metric. The facet-based approaches lead to a boost over the non-faceted ss approach, with users overall preferring suggestions coming from the facet-based conditions. This is despite comparing against an advanced baseline geared toward relevance, to which users are naturally primed.

We break down the results further by slightly modifying the metric to account for the different item types users could check off. In particular, we distinguish between the task/method/resource/topic checkboxes and the paper checkboxes. For each of these two groups, we compute the preference metric in the same way, ignoring all checkboxes that are not of that type (e.g., counting only papers). This breakdown reveals a more nuanced picture. For the task, method, resource, and topic facets, the gap in favor of sT grows considerably (Figure 5b). In terms of papers only, ss, which was trained on aggregate paper-level information, achieves a marginally better outcome compared to sT, with a slightly larger gap in comparison to sTdM (Figure 5c). Aside from being trained on paper-level information, Specter also benefits from the fact that biases toward filter bubbles can be particularly strong with regard to papers. Unlike with facets, users must tease apart aspects of papers that are new and interesting to them versus aspects that are relevant but familiar. See §6.1.3 for more discussion and concrete examples.

Importantly, despite obtaining better results overall with the faceted approach, we stress that our goal in this paper is not to “outperform” Specter, but mostly to use it as a reference point — a non-faceted approach used in a real-world academic search and recommendation setting.

Personas

We also compare the results from the sT and sTdM conditions based on personas for user $u$, versus the user's non-persona-based results presented above (RQ6). We compare the set of authors found using personas with authors retrieved without splitting into personas (equivalent to the union of all personas). Table 1 shows the percentage of users for whom the average proportion of checked items was higher for the persona-matched authors than for the overall-matched authors (for at least one of the personas). Most users signalled a preference for persona-matched authors when looking at one or both of their personas. Interestingly, for papers we see a substantial boost in preference for both conditions, indicating that by focusing on more refined slices of the user's papers, we are able to gain better quality along this dimension too.

Item type | sT  | sTdM
All       | 58% | 75%
Paper     | 83% | 67%
Topic     | 58% | 75%
Task      | 42% | 50%
Method    | 67% | 58%
Resource  | 50% | 67%
Table 1. Percentage of users with personas (N=12) for whom the average proportion of checked items was higher for the persona-matched authors than for the overall-matched authors. Users saw suggested authors based on two of their personas; the suggestions came from either the sT or sTdM condition. Reported here are percentages of users who showed a preference for one or both personas.

5.3. Evidence of Bursting Bubbles

The matched authors displayed to users were identified based either on sT or sTdM, or on the baseline Specter-based approach (ss). These two groups differed from each other substantially according to several empirical measures of similarity. We explore the following measures, based on author dimensions in our data that we do not use as part of the experiment: (1) Citation distance: the Jaccard distance (1 minus intersection-over-union) between the citations of the user and those of the matched author, calculated both for incoming and outgoing citations. (2) Venue distance: the Jaccard distance between the user's and the matched author's publication venues. (3) Coauthor shortest path: the shortest path length between the user and the matched author in the co-authorship graph. The findings of this analysis, shown in Figure 6, suggest that Bridger surfaces novel authors from more diverse, distant fields and research communities than Specter (RQ5).
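For concreteness, here is a minimal sketch of these distance measures; the networkx-based shortest-path helper and the function names are illustrative assumptions.

```python
# Sketch: Jaccard distance over citation/venue sets, and co-authorship path length.
import networkx as nx

def jaccard_distance(a: set, b: set) -> float:
    # 1 minus intersection-over-union; identical sets give 0, disjoint sets give 1.
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def coauthor_distance(coauthor_graph: nx.Graph, user: str, suggested: str) -> float:
    # Shortest path length in the co-authorship graph; infinite if disconnected.
    try:
        return nx.shortest_path_length(coauthor_graph, user, suggested)
    except nx.NetworkXNoPath:
        return float("inf")
```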

Figure 6. Bridger suggests authors that are more likely to bridge gaps between communities. In comparison to the baseline, facet-based (Bridger) author suggestions link users to broader areas. Clockwise: (a, b) Jaccard distance between suggested authors’ papers and the user’s papers for incoming citations (a) and outgoing citations (b); greater distance means that suggested authors are less likely to be cited by or cite the same work. (c) Jaccard distance for publication venues. (d) Shortest path length in the coauthorship graph between author and user (higher is more distant). Bridger conditions (sT and, especially, sTdM) show higher measures of distance.

(a) Incoming-citation Jaccard distances between suggested authors' papers and the user's papers: sT: 0.95, sTdM: 0.99, ss: 0.92. (b) Outgoing-citation Jaccard distances: sT: 0.97, sTdM: 1.0, ss: 0.94. (c) Publication-venue Jaccard distances: sT: 0.85, sTdM: 0.95, ss: 0.81. (d) Average shortest path length in the co-authorship network between suggested authors and the user: sT: 3.1, sTdM: 4.7, ss: 2.9. In all four panels, error bars representing 90 percent confidence intervals do not overlap.

In the following section, we conclude by diving deeper into user interviews we conducted, revealing more evidence and insights into user preferences and surfacing potential issues and challenges for building author discovery systems.

6. User Interviews: Analysis & Discussion of Author Discovery

6.1.1. Bridges Across Scientific Filter Bubbles

Bridger authors encourage more diverse connections. Under the Bridger conditions, participants noted diverse, potentially useful research directions that connected their work to other authors not only within their own subareas, but also in other areas. This was especially true under the sTdM condition. For instance, P9, who works on gradient descent for convex problems, saw a sTdM author’s paper discussing gradient descent but for deep linear neural networks, which imply non-convex problems. They remarked, “This is a new setup. It’s very different, and it’s super important …definitely something I would like to read …” Considering a paper under a sTdM author, P6 observed an interesting contrast with their work: “I think my work has been bottom-up, so top-down would be an interesting approach to look at.” As another example, P2 drew a connection between the mathematical area of graph theory and their area of human-AI decision-making under the sTdM condition: “This could be interesting mostly because …they’re using graph theory for decision making …something I have not considered in the past.” P19 remarked of a sTdM author’s paper, “This one actually seems quite interesting because it seems like explicitly about trying to bridge the gap between computational neuroscience models, understanding the neocortex, and computing. So that seems like it’s… going to actually chart the path for me between my work and the stuff I think about like artificial neural networks and machines.”

In reacting to sTdM authors, many participants went further than simply stating their interest in a connection, and also described how they would utilize the connection. Looking at a sTdM author, P6 explained how the author’s neuroscience work could motivate work in their area of natural language processing: “I might learn from that [paper] how people compose words, and that might be inspiring for work on learning compositional representation …” P18 checked off a paper titled “Multidisciplinary Collaboration to Facilitate Hypothesis Generation in Huntington’s Disease” under a sTdM author “because new ways to think about generating hypotheses could be interesting.” Seeing the topic ‘spike-timing-dependent plasticity’ under a sTdM author, P19 mused, “I would like to understand how spike-timing-dependent plasticity works and whether that could lead to a better learning rule for other types of neural nets, like the ones I work with on language, so that seems fun.” P12 described a sTdM author’s paper about knowledge-driven search applications as useful to them because “One of my primary research areas is knowledge base completion. However, that’s not an end application. An end application would be a search application which kind of uses my method to complete the knowledge base, and gives the user the end result. …” Though the sTdM condition presented more of a risk in terms of surfacing authors with whom the user could draw connections, it also surfaced the more far-reaching connections.

The sT condition also helped participants ponder new connections, though perhaps not as distant. P8 said of a sT author’s work, “I’ve worked a bit on summarization, so I want to know whether the approaches that I’ve worked on are applicable to real-time event summarization, which is a task I don’t know about.” Also reflecting on a sT author, P1 explained, “I’ve done a lot of work with micro tasks and these seem more maybe larger scale, like physical tasks or like planning travel. …There are so many problems …that I could apply my techniques to.” Other times, participants would connect one facet of their work to a different facet of the suggested author’s work. In discussing a question-answering paper from a sT author, P8 explained, “I don’t have experience with [the method] adversarial neural networks [used in this paper], but question answering is a task that I’ve worked on, so I would want to check this.” Conversely, if participants found new connections with Specter, they tended to be more immediate connections to authors in their area. As an example, when checking off the paper “Efficient Symmetric Norm Regression via Linear Sketching” from a Specter-suggested author, P9 observed, “I have used sketching techniques and I have [also] used norm regression, but [on] this specific problem I have not.” P9 also identified some of the papers from the suggested author as co-authored by their advisor.

6.1.2. Facets Help Elicit New Research Directions But Require More Context

Describing an author’s work with short, digestible items in the form of tasks, methods, and resources helped participants find interesting new research directions. For instance, P14 expressed that a sTdM author’s paper associated with medical image diagnosis would not be useful for them to consider because “breaking into that space for me would require a lot of work.” However, when they later saw ‘medical image diagnosis’ as a task, they commented, “As a task, I could see some usefulness there. There could be other approaches that might more quickly catch my interest.” Committing to interest in the task required much less effort. Moreover, participants were able to peruse more of an author’s interesting tasks and methods that they did not necessarily find in their top papers. Reacting to one sT author, P3 did not see any papers related to ‘biomedical question answering,’ but they did see ‘biomedical question answering system’ as a method. They then noted, “I’m going to click ‘biomedical question answering’ because that’s not what I have worked on before, but I’m interested in learning about it.”

Tasks, methods, and resource facets support discovery better than topics. While participants occasionally thought certain tasks, methods, or resources were too generic, they were much more likely to complain that topics were too high-level to spark ideas for new, profitable research directions. P3 summarized, “I think many of them are quite generic, so I can say I already worked on it,” and P7 noted, “‘Artificial intelligence’ is too broad. I think everything comes under that.”

Terms with unknown meaning often garner interest, but all facets and papers require more context. Participants commonly identified tasks, methods, and resources as interesting, even when they did not fully understand their meaning. When P4 saw the method ‘least-general generalization of editing examples’ from a sT author, they stated, “Don’t know what this means exactly, but it sounds interesting.” P13 marked their interest in the task ‘folksonomy-based recommender systems’ under a sTdM author after having commented, “I’m curious [about folksonomy] simply because I’m ignorant.”

sTdM also surfaced distant resources that sparked interest. In seeing the resource ‘synaptic resources’ under a sTdM author, for example, P19 simply said, “I’d like to know what that is.” Nonetheless, many participants also struggled with indiscernible terms. For example, P20 said of the resource ‘NAIST text corpus’ under a sT author, “I’m not sure what this is, and I can’t guess from the name. And it wasn’t mentioned in the title of the papers.” P2 explained that a paper did not “seem that interesting, but mostly because I don’t understand all of these words.” Thus, providing term definitions may be helpful. For additional context, multiple participants expressed interest in having abstracts available, and P15 suggested including automated summaries (Cachola et al., 2020).
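One lightweight way to provide the context participants asked for is to surface, next to an unfamiliar facet term, a sentence from the suggested author’s own abstracts that appears to explain it. The sketch below is a minimal, assumed heuristic rather than part of Bridger: it scores candidate sentences by simple defining cues, and a deployed system would more plausibly rely on the extractive or generative summarization approaches cited here and in the conclusion. The function name and cue list are illustrative.

```python
import re

# Cues that often accompany a defining mention of a term in an abstract.
_CUES = (" is a ", " is an ", " refers to ", " denotes ", "(")

def define_term(term, abstracts):
    """Return the sentence that most plausibly explains `term`, if any."""
    best, best_score = None, -1
    for abstract in abstracts:
        for sent in re.split(r"(?<=[.!?])\s+", abstract):
            lowered = sent.lower()
            if term.lower() not in lowered:
                continue
            score = sum(cue in lowered for cue in _CUES)
            if score > best_score:
                best, best_score = sent.strip(), score
    return best  # None if the term never appears in the abstracts
```

For a term like ‘NAIST text corpus,’ this would at least surface the sentence in which the author introduces the resource, which is the kind of just-in-time context participants said they were missing.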

6.1.3. Biases Toward Scientific Filter Bubbles

Time constraints in the fast-moving world of research inhibit exploration beyond the filter bubble. Despite clear interest in an author’s distant research, a couple of participants were hesitant to make connections. In reacting to a sT author, P11 recognized, “There’s just a bunch of really interesting kind of theory application papers in this list that I’m not familiar with. …I would maybe scan a little bit of these, but it’s so far off that it’s harder to make room to read someone that far away, but still cool.”

A lack of background knowledge can make it intimidating to consider new areas. Engaging with distant authors’ work imposes a heavy cognitive load that can make uncovering connections difficult. P18 provided the following example: “Maybe there’s some theoretical computer science algorithm that if I knew to apply it to my problem would speed things up or something like that, but I wouldn’t know enough to recognize it as interesting.” Echoing the findings in §6.1.2, this comment suggests that unfamiliar terms can especially hinder making interesting connections, and that highlighting the most useful aspects of a distant author’s research may help build far-reaching connections.

Preconceived notions of an area hinder consideration of connections to that area. Because Bridger’s authors are selected to be more different from the user than Specter’s authors, they often met with hard-line resistance, without full consideration of potential links. Looking at a sTdM-suggested author, the natural language processing (NLP) researcher P20 said, “This is not really an NLP paper, so I would pass.” Similarly, P17 rejected sTdM suggestions, saying, “I don’t know anything about neuroscience, and I’m not going to start now probably.”

Difficulty teasing apart novel aspects from paper titles helps Specter. Although participants were asked to only check off interesting papers that suggested something new for them to explore, biases towards filter bubbles can be particularly strong with regard to papers because users must tease apart papers’ new and interesting aspects from their relevant but familiar aspects. Even if a paper is directly connected to a user’s research, they may be tempted to check off a paper because they have not seen that exact paper or because it has minute differences from their work. In contrast, when judging a particular facet item, participants need only contemplate the novelty of the term itself, without distraction or fixation on other terms (Hope et al., 2017; Kittur et al., 2019; Hope et al., 2021). As an example, P17 swiftly separated a task’s general relevance from its lack of novelty to know not to check it. They explained, “‘Scientific article summarization’- It is relevant, [but] I’m already familiar with it.” This bias helps explain the overall preference for Specter when considering only papers (Figure 5(c)).

6.1.4. Personas

All participants with personas stated at least one would be helpful. Upon first viewing their personas, seven of the 12 participants who had them described their two personas as distinct, coherent identities that would be useful for filtering author suggestions. As an example, P2 characterized their personas as related to “human-AI collaboration or decision-making” and “error analysis and machine learning debugging,” respectively. The other five participants described one persona as coherent and seemingly useful for filtering authors; their concerns about the remaining persona involved coherence, granularity, overlap with the first persona, or a preference for the non-persona results they had already looked through alongside their first persona. Though the persona author suggestions performed relatively well in generating novel connections (Table 1), a few participants commented that they did not see the connection between suggested authors and their persona. For example, under a persona associated with lexical semantics, P6 commented on a sTdM paper, “‘Causality’ is not a topic I would work on in lexical semantics.” Diverse author suggestions may be more confusing under personas because users look for connections specific to that persona; indicating when author suggestions are intended for exploration may help.

7. Conclusion

We presented Bridger, a framework for facilitating discovery of novel and valuable scholars and their work. Bridger consists of a faceted author representation, allowing users to see authors who match them along certain dimensions (e.g., tasks) but not others. Bridger also provides “slices” of a user’s papers, enabling them to find authors who match the user only on a subset of their papers, and only on certain facets within those papers. Our experiments with computer science researchers show that the facet-based approach was able to help users discover authors with work that is considered more interesting and novel, substantially more than a relevance-focused baseline representing state-of-art retrieval of scientific papers. Importantly, we show that authors surfaced by Bridger are indeed from more distant communities in terms of publication venues, citation links and co-authorship social ties. These results suggest a new and potentially promising avenue for mitigating the problem of isolated silos in science.
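To make the faceted matching concrete, the sketch below shows one way a ranking that pairs similar tasks with distant methods (in the spirit of the sTdM condition) could be realized: each author is reduced to an averaged embedding of their task terms and of their method terms, and candidates are scored by high task similarity combined with low method similarity. The embedding function, the linear weighting, and the data layout are illustrative assumptions, not a description of Bridger’s actual implementation.

```python
import numpy as np

def embed_terms(terms, embed):
    """Average the embeddings of an author's facet terms (tasks or methods)."""
    vecs = np.stack([embed(t) for t in terms])
    return vecs.mean(axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def rank_similar_tasks_distant_methods(user, candidates, embed, alpha=1.0, beta=1.0):
    """Rank candidate authors by similar tasks but distant methods.

    `user` and each candidate are dicts with "name", "tasks", and "methods";
    `embed` maps a term string to a dense vector (e.g., a sentence encoder).
    """
    u_task = embed_terms(user["tasks"], embed)
    u_meth = embed_terms(user["methods"], embed)
    scored = []
    for cand in candidates:
        task_sim = cosine(u_task, embed_terms(cand["tasks"], embed))
        meth_sim = cosine(u_meth, embed_terms(cand["methods"], embed))
        # High task similarity and low method similarity marks a "bridge" candidate.
        scored.append((alpha * task_sim - beta * meth_sim, cand["name"]))
    return sorted(scored, reverse=True)
```

Flipping the sign on the method term recovers a strict-similarity ranking, which is essentially the relevance-focused end of the design tradeoff studied here.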

Interviews with our user-study participants point to many ways to improve the system. For example, we would like to improve our persona-clustering algorithm; the ability to assign informative names to personas would greatly improve usability. We also hope to reduce the cognitive load of considering new areas by providing just-in-time definitions of terms, using extractive summarization (Narayan et al., 2018) or generative approaches (Liu et al., 2019). A broader challenge is generating explanations not only for why a suggested author is similar to the user, but also for how their work may be useful. Finally, we want to study whether these techniques generalize beyond computer science, potentially connecting people with ideas from even more disparate fields as we take steps toward bridging gaps across all of science.
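As a rough illustration of the persona-clustering direction, the sketch below groups a user’s papers into candidate personas with Ward agglomerative clustering from scikit-learn and names each persona by its most frequent facet terms. The embeddings, the fixed number of personas, and the labeling heuristic are assumptions for illustration, not Bridger’s pipeline.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_personas(paper_vecs, paper_terms, n_personas=2):
    """Group a user's papers into personas via Ward agglomerative clustering.

    paper_vecs:  (n_papers, dim) array of paper embeddings.
    paper_terms: one list of facet terms (e.g., tasks) per paper, used only
                 to give each persona a crude human-readable label.
    """
    labels = AgglomerativeClustering(
        n_clusters=n_personas, linkage="ward"
    ).fit_predict(np.asarray(paper_vecs))

    personas = []
    for k in range(n_personas):
        members = [i for i, lab in enumerate(labels) if lab == k]
        term_counts = Counter(t for i in members for t in paper_terms[i])
        personas.append({
            "papers": members,
            "label": ", ".join(t for t, _ in term_counts.most_common(3)),
        })
    return personas
```

Labeling personas by their dominant facet terms is one simple step toward the informative persona names participants asked for.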

References

  • J. Beel, B. Gipp, S. Langer, and C. Breitinger (2016) Paper recommender systems: a literature survey. International Journal on Digital Libraries 17 (4), pp. 305–338. Cited by: §2.
  • J. Beel and B. Gipp (2009) Google scholar’s ranking algorithm: an introductory overview. In Proceedings of the 12th international conference on scientometrics and informetrics (ISSI’09), Vol. 1, pp. 230–241. Cited by: §1.
  • I. Cachola, K. Lo, A. Cohan, and D. S. Weld (2020) TLDR: extreme summarization of scientific documents. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pp. 4766–4777. Cited by: §6.1.2.
  • A. Cattan, S. Johnson, D. Weld, I. Dagan, I. Beltagy, D. Downey, and T. Hope (2021) SciCo: hierarchical cross-document coreference for scientific concepts. arXiv preprint arXiv:2104.08809. Cited by: §3.1.
  • J. Chan, J. C. Chang, T. Hope, D. Shahaf, and A. Kittur (2018) Solvent: a mixed initiative system for finding analogies between research papers. Proceedings of the ACM on Human-Computer Interaction 2 (CSCW), pp. 1–21. Cited by: §2.
  • L. Chen, Y. Yang, N. Wang, K. Yang, and Q. Yuan (2019) How serendipity improves user satisfaction with recommendations? a large-scale user evaluation. In The World Wide Web Conference, pp. 240–250. Cited by: §2.
  • W. Chen, P. Ren, F. Cai, F. Sun, and M. de Rijke (2020) Improving end-to-end sequential recommendations with intent-aware diversification. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 175–184. Cited by: §1, §2.
  • A. Cohan, S. Feldman, I. Beltagy, D. Downey, and D. Weld (2020) SPECTER: Document-level Representation Learning using Citation-informed Transformers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, pp. 2270–2282. External Links: Link, Document Cited by: §1, §2, §3.1, §3.3.
  • A. Epasto, S. Lattanzi, and R. Paes Leme (2017) Ego-Splitting Framework: from Non-Overlapping to Overlapping Clusters. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’17, Halifax, NS, Canada, pp. 145–154 (en). External Links: ISBN 978-1-4503-4887-4, Link, Document Cited by: §3.5.5.
  • D. Frey (1986) Recent research on selective exposure to information. Advances in experimental social psychology 19, pp. 41–80. Cited by: §1.
  • K. Fu, J. Chan, J. Cagan, K. Kotovsky, C. Schunn, and K. Wood (2013) The Meaning of Near and Far: The Impact of Structuring Design Databases and the Effect of Distance of Analogy on Design Output. JMD. Cited by: §2.
  • Y. Ge, S. Zhao, H. Zhou, C. Pei, F. Sun, W. Ou, and Y. Zhang (2020) Understanding echo chambers in e-commerce recommender systems. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2261–2270. Cited by: §1, §2.
  • K. Goucher-Lambert, J. T. Gyory, K. Kotovsky, and J. Cagan (2020) Adaptive inspirational design stimuli: using design output to computationally search for stimuli that impact concept generation. Journal of Mechanical Design 142 (9). Cited by: §2.
  • S. Gururangan, A. Marasović, S. Swayamdipta, K. Lo, I. Beltagy, D. Downey, and N. A. Smith (2020) Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks. In ACL, External Links: Document Cited by: §3.5.4.
  • T. Hope, J. Chan, A. Kittur, and D. Shahaf (2017) Accelerating innovation through analogy mining. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, pp. 235–243. External Links: ISBN 978-1-4503-4887-4, Link, Document Cited by: §1, §1, §2, §6.1.3.
  • T. Hope, R. Tamari, H. Kang, D. Hershcovich, J. Chan, A. Kittur, and D. Shahaf (2021) Scaling creative inspiration with fine-grained functional facets of product ideas. arXiv e-prints, pp. arXiv–2102. Cited by: §1, §2, 2nd item, §6.1.3.
  • M. Kaminskas and D. Bridge (2016) Diversity, serendipity, novelty, and coverage: a survey and empirical analysis of beyond-accuracy objectives in recommender systems. ACM Transactions on Interactive Intelligent Systems (TiiS) 7 (1), pp. 1–42. Cited by: §2.
  • L. Kim, J. D. West, and K. Stovel (2017) Echo chambers in science?. In American Sociological Association, Cited by: §1.
  • A. Kittur, L. Yu, T. Hope, J. Chan, H. Lifshitz-Assaf, K. Gilon, F. Ng, R. E. Kraut, and D. Shahaf (2019) Scaling up analogical innovation with crowds and AI. Proceedings of the National Academy of Sciences 116 (6), pp. 1870–1877. Note: Publisher: National Academy of Sciences Section: Social Sciences External Links: ISSN 0027-8424, 1091-6490, Link, Document Cited by: §1, §1, §2, §6.1.3.
  • J. Klinger, J. Mateos-Garcia, and K. Stathoulopoulos (2020) A narrowing of ai research?. arXiv preprint arXiv:2009.10385. Cited by: §1.
  • Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov (2019) RoBERTa: a robustly optimized bert pretraining approach. ArXiv abs/1907.11692. Cited by: §7.
  • Y. Luan, L. He, M. Ostendorf, and H. Hajishirzi (2018) Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium (en). External Links: Link, Document Cited by: §3.5.2.
  • M. McPherson, L. Smith-Lovin, and J. M. Cook (2001) Birds of a feather: homophily in social networks. Annual review of sociology 27 (1), pp. 415–444. Cited by: §1.
  • R. Mihalcea and P. Tarau (2004) TextRank: Bringing Order into Text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, pp. 404–411. External Links: Link Cited by: 1st item.
  • F. Murtagh and P. Legendre (2014) Ward’s hierarchical agglomerative clustering method: which algorithms implement ward’s criterion?. Journal of classification 31 (3), pp. 274–295. Cited by: §3.5.5.
  • S. Narayan, S. B. Cohen, and M. Lapata (2018) Ranking sentences for extractive summarization with reinforcement learning. In NAACL-HLT, Cited by: §7.
  • M. Neumann, D. King, I. Beltagy, and W. Ammar (2019) ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing. In Proceedings of the 18th BioNLP Workshop and Shared Task (en). External Links: Link Cited by: §3.5.2.
  • T. T. Nguyen, P. Hui, F. M. Harper, L. Terveen, and J. A. Konstan (2014) Exploring the filter bubble: the effect of using recommender systems on content diversity. In Proceedings of the 23rd international conference on World wide web, pp. 677–686. Cited by: §2.
  • M. W. Nielsen and J. P. Andersen (2021) Global citation inequality is on the rise. Proceedings of the National Academy of Sciences 118 (7). Cited by: §1.
  • E. Pariser (2011) The filter bubble: what the internet is hiding from you. Penguin UK. Cited by: §1.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. (2011) Scikit-learn: machine learning in python. the Journal of machine Learning research 12, pp. 2825–2830. Cited by: footnote 6.
  • J. Portenoy and J. D. West (2020) Constructing and evaluating automated literature review systems. Scientometrics 125, pp. 3233–3251. Cited by: §2.
  • J. D. West, I. Wesley-Smith, and C. T. Bergstrom (2016) A recommendation system based on hierarchical clustering of an article-level citation network. IEEE Transactions on Big Data 2 (2), pp. 113–123. External Links: Document Cited by: §2.
  • N. Reimers and I. Gurevych (2019) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In EMNLP/IJCNLP, External Links: Document Cited by: §3.5.4.
  • A. S. Schwartz and M. A. Hearst (2002) A simple algorithm for identifying abbreviation definitions in biomedical text. In Biocomputing 2003, pp. 451–462. External Links: ISBN 978-981-238-217-7, Link, Document Cited by: §3.5.2.
  • A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B. (P.) Hsu, and K. Wang (2015) An overview of microsoft academic service (MAS) and applications. In Proceedings of the 24th International Conference on World Wide Web (Companion Volume), pp. 243–246. External Links: ISBN 978-1-4503-3473-0, Link, Document Cited by: §3.5.1.
  • S. Subramanian, D. King, D. Downey, and S. Feldman (2021) S2AND: a benchmark and evaluation system for author name disambiguation. arXiv preprint arXiv:2103.07534. Cited by: §3.5.1.
  • J. Tang, S. Wu, J. Sun, and H. Su (2012) Cross-domain collaboration recommendation. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1285–1293. Cited by: §2.
  • X. Tang, X. Li, Y. Ding, M. Song, and Y. Bu (2020) The pace of artificial intelligence innovations: speed, talent, and trial-and-error. Journal of Informetrics 14 (4), pp. 101094. Cited by: §1.
  • C. Tsai and P. Brusilovsky (2018) Beyond the ranked list: user-driven exploration and diversification of social recommendation. In 23rd international conference on intelligent user interfaces, pp. 239–250. Cited by: §2.
  • C. Tsai, J. Huhtamäki, T. Olsson, and P. Brusilovsky (2020) Diversity exposure in social recommender systems: a social capital theory perspective. work 5 (11), pp. 22. Cited by: §2.
  • D. Vilhena, J. Foster, M. Rosvall, J. West, J. Evans, and C. Bergstrom (2014) Finding cultural holes: how structure and culture diverge in networks of scholarly communication. Sociological Science 1, pp. 221–238. External Links: ISSN 23306696, Link, Document Cited by: §1.
  • D. Wadden, U. Wennberg, Y. Luan, and H. Hajishirzi (2019) Entity, Relation, and Event Extraction with Contextualized Span Representations. In EMNLP/IJCNLP, External Links: Document Cited by: §3.1, §3.5.2.
  • H. Wan, Y. Zhang, J. Zhang, and J. Tang (2019) Aminer: search and mining of academic social networks. Data Intelligence 1 (1), pp. 58–76. Cited by: §4.
  • K. Wang, Z. Shen, C. Huang, C. Wu, D. Eide, Y. Dong, J. Qian, A. Kanakia, A. Chen, and R. Rogahn (2019a) A Review of Microsoft Academic Services for Science of Science Studies. Frontiers in Big Data 2 (English). Note: Publisher: Frontiers External Links: ISSN 2624-909X, Link, Document Cited by: §3.1, §3.5.3.
  • N. Wang, L. Chen, and Y. Yang (2020) The impacts of item features and user characteristics on users’ perceived serendipity of recommendations. In Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization, pp. 266–274. Cited by: §2.
  • W. Wang, J. Liu, Z. Yang, X. Kong, and F. Xia (2019b) Sustainable collaborator recommendation based on conference closure. IEEE Transactions on Computational Social Systems 6 (2), pp. 311–322. Cited by: §2.
  • M. Wilhelm, A. Ramanathan, A. Bonomo, S. Jain, E. H. Chi, and J. Gillenwater (2018) Practical diversified recommendations on youtube with determinantal point processes. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 2165–2173. Cited by: §2.
  • P. Zhao and D. L. Lee (2016) How much novelty is relevant? it depends on your curiosity. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pp. 315–324. Cited by: §2.
  • Z. Zhu, J. Wang, and J. Caverlee (2020) Measuring and mitigating item under-recommendation bias in personalized ranking systems. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 449–458. Cited by: §1, §2.