ULTRA: A Data-driven Approach for Recommending Team Formation in Response to Proposal Calls

by   Biplav Srivastava, et al.

We introduce an emerging AI-based approach and prototype system for assisting team formation when researchers respond to calls for proposals from funding agencies. This is an instance of the general problem of building teams when demand opportunities arise periodically and the pool of potential members may vary over time. The novelties of our approach are that we: (a) extract the technical skills required by calls and possessed by researchers from multiple data sources and normalize them using Natural Language Processing (NLP) techniques, (b) build a prototype solution that matches researchers to calls and forms teams subject to constraints, (c) describe initial feedback on the system from researchers at a University, with a view to deployment, and (d) create and publish a dataset that others can use.




Building teams in response to an opportunity is a common business activity. Examples are: responding to calls for proposals in product and services supply chains, expert teams for a medical procedure at a hospital, players for a match for team-based sports and crew for an airline flight. In this paper, we will focus on teaming for researchers applying to funding agencies in response to their call for proposals, denoted TeamingForFunding.

A large proportion of funding for research in public universities comes from funding agencies. Hence, it is very important for researchers to be able to identify funding opportunities and make successful proposals. Moreover, many of the funding opportunities are multi-disciplinary, requiring teams of people from a wide variety of backgrounds, who can work together, to be assembled quickly. The advantage of this setting is that all data is publicly available: proposal calls and their successful decisions (awards) from agencies like the National Science Foundation (NSF) and the National Institutes of Health (NIH), and profiles of researchers from data sources like Google Scholar. The users interested in the problem are researchers who would like to collaborate as well as administrators at researchers’ organizations (e.g., Universities) who want to promote more collaborations, proposals and diversity at their institutions. The objectives of teaming can be short-term, long-term or a combination of the two.

Figure 1: Team participant view of ULTRA showing team recommendations for select proposals (user view).
Figure 2: Team-building setup with users who can be administrators or potential team participants. A trusted team is one where all stakeholders are convinced that the right members from those available have been selected based on the needs of the call, given that future success is unknown.

In this paper, we present a novel approach and prototype system called ULTRA (University Lead Team builder from Rfps and Analysis). The system extracts technical skills from proposal calls found at multiple open data sources and those present in profiles of researchers online, and normalizes them using Natural Language Processing (NLP) techniques. It then performs matches between calls and potential researchers, and from these matches, creates teams respecting teaming constraints. The recommendations can be informed by features from recent awards if the calls are repeating, and skill normalization can be extended using technical classification trees/ontologies. Initial feedback on ULTRA from researchers at a University has been promising and, in future, we plan a large-scale evaluation of the system for a large administrative unit (College). We have created a dataset of calls, profiles and recommendations that others can use as a baseline and extend.

In the rest of the paper, we first give a demonstration of the current prototype and then describe background of the problem and related work. Then we formally introduce the problem and describe our solution and implementation. We then present initial evaluation and conclude with a discussion of future work.

A Demonstration

In Figure 1, we show how the system works for an individual user who can become a team participant. Once logged in, the user sees their details in the top panel, such as Name, Area of Expertise and Designation. The Areas of Expertise shown (P4 programmable switches, Science DMZs, IoT Security) have been generated from the research interests extracted from the researcher’s personal website and Google Scholar account.

Below the researcher’s details, the system shows a list of recommendations on the left, 3 at a time (a configurable parameter), and details of a selected recommendation on the right. For each recommendation, the system shows the proposal call, the team members recommended to be in the team, an estimated budget, and details from the proposal call like deadline and description. In Figure 1, the first recommendation is for the Network Technology and System NeTS Proposal. The system displays some important information like the Agency name, link (URL) to the call and the deadline. It also shows the supporting team members (Ali Mohammod, Huang Chin, Nelakuditi Srihari, Ziehl Paul), whose expertise complements that of the user as lead, and a proposed budget.

A few technical challenges are apparent in this example: the public data about researchers’ backgrounds may be obsolete or may not reflect their future research interests; the technical terms used in the proposal call and in researchers’ backgrounds may not match; the budget size allowed in the call has to be respected while recommending team participants; and the recommended participants may already know each other but may not be free to work on the proposal. The recommendation from ULTRA can be seen as a data-based trigger to kickstart collaboration transparently at an institution.

Teaming for Funding Domain and Problem

In this section, we provide preliminaries about the domain and introduce the problem. Then in the next section, we describe the related work.

Funding agencies issue Request For Proposals (RFPs) on themes where they are looking for ideas which they can fund. We will also refer to RFPs with the term calls. Researchers respond to RFP calls with proposals where they explain their ideas, list the activities that will be conducted and budget to complete the work. They look to team with other colleagues to respond to such calls.

The team building setting we consider is shown in Figure 2. Here, on the left, at different times, the environment offers candidate participants (indicated by blue ovals) the chance to match a specific opportunity (indicated by a square). The candidate set may change over time, with significant overlap across time periods, and so may the candidates’ skills or interests. On the right, the interactions within the proposed system at each time period are shown. The users of the system can be administrators representing institution(s) as well as candidate team participants. They interact with the system during an opportunity at a given time. The system creates teaming choices which the users accept or reject.

At a given time, the inputs for a TeamingForFunding problem are: (a) a request for proposals (called call or RFP for short), (b) the profiles of researchers at an institution who are eligible to participate in teams, and (c) business objectives of the administrator and team participants at the institution. The solution to the teaming problem is a list of candidate teams where each team has two or more members with one designated leader. Optionally, each team will have an estimate of the team’s budget and of its chance of success.

The objectives of teaming can be short-term, long-term or a combination of the two. For example, in TeamingForFunding, the short-term aims can be: meet the skill needs of the funding opportunity, increase the chance of success, and satisfy business constraints of researchers’ institutes. Examples of business constraints are: not over-burdening experienced researchers whose historical performance data is available, giving opportunities to new researchers with little historical data, and availing of special provisions in the funding opportunity. In the long-term, the objectives can be: maximize award size over a time period, have a robust (diversified) pipeline of experienced talent, and satisfy diversity goals of researchers’ institutions.

Further, teaming solutions consist of two phases: (a) matching, to determine which researchers may be of interest to the calls based on skills needed, and (b) grouping, to determine which subset of researchers should be recommended to be in a team. We address both these phases.

Related Work

We now discuss previous work related to the TeamingForFunding problem along two lines: team formation and concept matching.

Figure 3: System Architecture of ULTRA.

Team Formation: There is a rich history of AI in team formation. The most well studied type of teams are those formed in the Hedonic Games framework [4] where a team member only cares about other team members in their group. In our setting, a type of non-participants, administrators at institutions, are considered. However, our setting does not have another type of non-participants - user of a team’s output like a patient in medical teaming. In [11], the authors consider the problem of creating general teams of equal sizes. We recognize teaming as an ongoing area of research and propose to advance the state-of-the-art in fair teaming with a wider set of stakeholders inspired by practical applications.

Matching Concepts: Matching of concepts is a well studied problem in Computer Science and AI [3, 8]. There are two variants - matching the concepts as expressed representations of strings [3] or as real world entities [8]. We presently consider concepts as strings during matching.

There are many techniques to match strings, starting with exact matches. Another method is Fuzzy String Matching, also known as approximate string matching, which matches a pattern approximately within a distance threshold rather than exactly. This allows us to find matches even when users misspell words or provide partial words for the (skill) search. Another type of technique is embedding-based matching. Here, a (deep) learning architecture learns a word or document level representation (embedding) from a large document corpus, and numeric operations, like cosine similarity, are used to find similar terms [10]. Another type of technique, popular in the recommendation literature, relies on multi-armed bandit techniques when matching happens repeatedly over time and with different users [9].
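As a minimal sketch of approximate string matching, the snippet below scores two strings on a 0-100 scale and keeps the skills that clear a threshold. It uses Python's standard-library difflib rather than any library named in this paper; the function names and the threshold of 80 are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> int:
    """Similarity of two strings on a 0-100 scale (100 = identical)."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return round(100 * ratio)


def fuzzy_matches(query: str, skills: list[str], threshold: int = 80) -> list[str]:
    """Return the skills whose similarity to the query meets the threshold."""
    return [s for s in skills if fuzzy_score(query, s) >= threshold]


if __name__ == "__main__":
    skills = ["Machine Learning", "Bioinformatics", "Mobile Security"]
    # The misspelled query still finds the intended skill.
    print(fuzzy_matches("machine lerning", skills))
```

Lowering the threshold trades precision for recall: more misspellings and partial words are caught, at the cost of more spurious matches.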

Commercial Software for Advertising Research Funding:

For research funding, there are a few commercial offerings. One system is Pivot (https://exlibrisgroup.com/products/pivot-funding-opportunities-and-profiles/), which sends keyword-based alerts to faculty members whose interests match the areas listed in an RFP. Another system is Scry (https://www.amaforge.com/), which matches proposals to faculty but does not identify teaming opportunities. In focus sessions with researchers, we learnt that they wanted decision support tools that can help them learn more about calls, fellow researchers and teaming opportunities.


We show the main components of our solution in Figure 3. They consist of a Content Extractor (CE) to consume public data about researchers, calls (RFPs) and previous awards. We have a Matching system (M) to match calls to researchers and a Team Recommendation (TR) system to produce the system output. The system also has components to analyze users’ feedback on the output (FA) and a team recommendation manager (TRM) to send notifications to candidate team participants for their confirmation and to help the team prepare the final submission. We now explain the working of the main components.

Content Extractor (CE)

The CE module extracts content through webpage or Application Programming Interface (API) requests. It uses the BeautifulSoup library (https://pypi.org/project/beautifulsoup4/) to parse content that may sometimes span hundreds of pages. It also uses the spaCy library (https://spacy.io/) to extract key details such as the exact budget of an RFP.

Type                  Number    % Extracted
RFP                   1,797     100
Title                 1,782     99.1
Deadline              1,626     90.4
Budget                1,729     96.2
Synopsis/Keywords     1,797     100
Users                 240       100
Users/Research        205       85.4
Table 1: Performance of content extraction for calls and users. Users/Research refers to the number of users we were able to confirm as researchers with a match with Google Scholar.

RFPs are extracted from the NSF website’s search page (https://www.nsf.gov/publications/index.jsp). We use data about both historical calls whose deadline has expired and open calls whose deadline is in the future. From the search page, we acquire links to the individual RFP pages, then extract the HTML and parse it for important information such as the synopsis, title, and application deadlines. Table 1 gives the statistics of extraction for each information type. The extractors are generally accurate, but they face challenges with calls that have multiple sub-tracks and corresponding deadlines.
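To illustrate the kind of detail extraction involved, the sketch below pulls dollar amounts out of call text with a regular expression. This is a simplified stand-in and not the spaCy-based extractor used in the prototype; the pattern, the unit table and the function name are assumptions made for illustration.

```python
import re

# Multipliers for the unit suffixes this sketch recognizes (an assumption).
_MULTIPLIERS = {
    "k": 1_000,
    "m": 1_000_000,
    "million": 1_000_000,
    "b": 1_000_000_000,
    "billion": 1_000_000_000,
}

# A dollar sign, a number with optional commas and decimals, an optional unit.
_AMOUNT = re.compile(
    r"\$\s?(\d[\d,]*(?:\.\d+)?)\s*(million|billion|k|m|b)?\b",
    re.IGNORECASE,
)


def extract_budgets(text: str) -> list[float]:
    """Return every dollar amount mentioned in the text, in dollars."""
    amounts = []
    for number, unit in _AMOUNT.findall(text):
        value = float(number.replace(",", ""))
        if unit:
            value *= _MULTIPLIERS[unit.lower()]
        amounts.append(value)
    return amounts


if __name__ == "__main__":
    sample = "Awards of up to $500,000 per year; total program funding is $6.5 million."
    print(extract_budgets(sample))
```

A call with several sub-tracks would return several amounts, which mirrors the difficulty noted above: deciding which amount belongs to which track needs more context than a pattern match provides.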

We also looked at XML archives hosted by NSF, and the search page of Grants.gov (https://www.grants.gov/; called Grants henceforth), a website with information about US government-funded programs and projects that contains a list of research proposals from several different agencies. We found that the XML archives had nothing linking them back to the original grant page, and did not include active grants, so we did not use them. The Grants search page had a wider variety of organizations to choose from, and included active grants updated daily. However, the information was both inconsistent and costly to parse, as getting from the search to the Grants page took an inordinate amount of processing time to exhaustively search the list.

To get potential team members, a list of researchers is made by pulling information from a University College’s faculty list (https://sc.edu/study/colleges_schools/engineering_and_computing/faculty-staff/index.php) hosted on their web site. This list is then filtered down to a subset in research roles using heuristics that are applied to job titles and designations. To accurately filter and extract our data for these researchers, we perform two additional checks. First, we check the individual’s university webpage for keywords, then we search Google Scholar, using the scholarly library (https://pypi.org/project/scholarly/), for the individual’s scholar page. From these sources, we are able to retrieve past published works, and a list of keywords of the individual’s technical skills (research interests). We then consolidate and normalize skills obtained from different sources using NLP techniques - stop word removal, stemming and lemmatization. When no skill information is available for a person, they are omitted from the system. In numbers, for the current prototype, 318 total experts were extracted from the University’s employees’ websites, out of which 78 were removed on the basis of designation. Of the remaining 240, 205 had either a research background from the University website or interests as retrieved from Google Scholar. 83 actually had Google Scholar profiles and were confirmed researchers at the University eligible to apply for calls. In Table 1, Users/Research refers to the extraction of researcher information.
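The consolidation and normalization step can be sketched as follows. This is a minimal illustration under stated assumptions: the stop-word list is a small hand-picked set, and naive_stem is a crude plural-stripping placeholder for a real stemmer or lemmatizer, not the NLP pipeline used in the prototype.

```python
import re

# A tiny illustrative stop-word list (an assumption, not the prototype's list).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def naive_stem(token: str) -> str:
    """Crude stand-in for a stemmer: strip a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize_skill(skill: str) -> str:
    """Lowercase, tokenize, drop stop words, and stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", skill.lower())
    return " ".join(naive_stem(t) for t in tokens if t not in STOP_WORDS)


def consolidate(*sources: list[str]) -> list[str]:
    """Merge skill lists from several sources, deduplicating on the normalized form."""
    seen: dict[str, str] = {}
    for source in sources:
        for skill in source:
            key = normalize_skill(skill)
            if key and key not in seen:
                seen[key] = skill
    return sorted(seen)


if __name__ == "__main__":
    from_webpage = ["Neural Networks", "Bioinformatics"]
    from_scholar = ["neural network", "Deep Learning"]
    print(consolidate(from_webpage, from_scholar))
```

Deduplicating on the normalized key is what lets "Neural Networks" from one source and "neural network" from another collapse into a single skill.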

We also retrieve previous awards by funding agencies to analyze features and trends in successful proposals. We focused on NSF and NIH, who are front-runners for research funding in the US, with NSF having budgeted $6.5B in FY 2019 (https://www.nsf.gov/pubs/2020/nsf20002/pdf/nsf20002.pdf) and the NIH $31.8B (https://report.nih.gov/nihdatabook/report/283). Previous award data was obtained by accessing the agencies’ websites (https://www.nsf.gov/awardsearch/download.jsp) and downloading ZIP files containing XML files corresponding to all projects (and subprojects) awarded across the years of interest. We do not use previous award data in the current prototype but could use it in future.

Matching Proposal Calls To Researchers Skills

We now describe how a proposal call (RFP) is matched to researchers’ skills representing teaming demand with potential supply. We experimented with multiple matching methods all of which use the call synopsis obtained by CE and researchers’ skills.

As the first method, we consider approximate string matching as implemented in the TheFuzz library (https://github.com/seatgeek/thefuzz). We will refer to it as fuzzy. The second method we use is embedding based. For this, we rely on SPECTER (Scientific Paper Embedding using Citation Informed Transformers; https://github.com/allenai/specter) [5]. SPECTER is a representation learning method based on BERT [7] but uses a powerful signal of document-level relatedness: the citation graph. SPECTER is particularly helpful for recommendation tasks, with a reported nDCG (Normalized Discounted Cumulative Gain; https://www.geeksforgeeks.org/normalized-discounted-cumulative-gain-multilabel-ranking-metrics-ml/) of 53.9. We first generate document-level embeddings on the collected corpus of call synopses using SPECTER. Then, we implement a match function that takes a call synopsis and a skills vector and provides a score based on the computed embeddings.

As an example, consider the call summary "The Artificial Intelligence and Cognitive Science AICS program focuses on advancing the state of the art in Artificial Intelligence and Cognitive Science. The program supports research and related education activities fundamental to the development of computer systems capable of performing a broad variety of intelligent tasks and to the development of computational models of intelligent behavior across the spectrum of human intelligence" and the skill profile [’Artificial Intelligence’, ’Services’, ’Smarter Cities (Water’, ’Health’, ’Traffic)’]. The fuzzy method’s matching score is 51 and the embedding method’s matching score is 79, both on a 0-100 scale where 100 represents a full match. The call is highly relevant to the researcher, and the embedding method reflects this more accurately than the fuzzy method. We note that the two scores come from two different systems and, in theory, may not be comparable. So, to verify whether they correspond to what researchers perceive in practice, we created a large subset of matches with both methods and gave them to researchers for their evaluation. As reported in the evaluation section, we found empirically that researchers preferred matches which scored higher under the embedding-based method. Hence, we use it in our implementation.
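A match function of this kind could be sketched with cosine similarity over two embedding vectors. The vectors below are toy values rather than SPECTER outputs, and the mapping of cosine similarity to a 0-100 score (clipping negatives to zero and scaling) is an assumption; the paper does not state how its score is derived from the embeddings.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_score(call_vec: list[float], skill_vec: list[float]) -> int:
    """Map cosine similarity to a 0-100 score (assumed scaling, negatives clipped)."""
    return round(100 * max(0.0, cosine_similarity(call_vec, skill_vec)))


if __name__ == "__main__":
    call_embedding = [1.0, 1.0]    # toy stand-in for a call synopsis embedding
    skill_embedding = [1.0, 0.0]   # toy stand-in for a skill profile embedding
    print(match_score(call_embedding, skill_embedding))
```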

For the researcher shown in Figure 1, the system found 10 calls that have the highest match. These are used to explore possible teams with the researcher as team leader.

Team Suggestions

To get team recommendations from matches, we use a greedy strategy for team formation. First, we run our matching algorithm in a user-item recommendation format, i.e., for each user the system recommends at most 10 relevant synopses (of RFPs) in descending order of match score. Then, in the second step, we form teams that associate users with a recommended RFP based on short- and long-term constraints. The constraints we use were obtained from discussions in focus groups with University administrators and researchers. They are: (a) limit the team size to 5 members unless the budget allows more participants, (b) have at least $50K per participant, and (c) each member should have at least one skill that does not overlap with another member’s. The approach can easily incorporate new constraints and can explain a requested change to team composition by showing which constraints would be respected or violated. We anticipate additional user-level constraints (set by researchers), such as a maximum number of recommendations in a time period. If the number of users recommended for a call exceeds the team size limit, we sort the matched users and select them in descending order of match score. At the end of this step, we get user-specific recommendations of calls and possible teams, and call-specific recommendations of potential teams.
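The greedy step can be sketched as below, under several simplifying assumptions that are not taken from the prototype: the skill constraint is read as "a candidate joins only if they bring at least one skill the team does not already cover", the team-size exception for large budgets is not modelled (the cap is simply a parameter), and the highest-scoring member is treated as the lead. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float            # match score against the call (0-100)
    skills: frozenset[str]  # normalized skills


def form_team(
    candidates: list[Candidate],
    budget: float,
    max_size: int = 5,
    min_share: float = 50_000,
) -> list[Candidate]:
    """Greedily pick candidates by descending score, subject to the constraints."""
    # Constraint (b): the budget must leave at least min_share per member.
    affordable = int(budget // min_share)
    limit = min(max_size, affordable)  # constraint (a), without the budget exception

    team: list[Candidate] = []
    covered: set[str] = set()
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if len(team) >= limit:
            break
        # Constraint (c), as interpreted here: skip candidates who add no new skill.
        if team and cand.skills <= covered:
            continue
        team.append(cand)
        covered |= cand.skills

    # A team needs two or more members; otherwise no recommendation is made.
    return team if len(team) >= 2 else []


if __name__ == "__main__":
    pool = [
        Candidate("A", 90, frozenset({"ai", "ml"})),
        Candidate("B", 85, frozenset({"ml"})),
        Candidate("C", 80, frozenset({"security"})),
    ]
    print([c.name for c in form_team(pool, budget=100_000)])
```

Because each check is a separate condition in the loop, a rejected candidate can be explained by naming the condition that failed, which is one way to surface which constraints a requested change would violate.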

We present an example of a team formed by the above method in Table 2, consisting of 5 researchers on a historical call. In the ULTRA screen, the user can follow the link associated with this RFP; the synopsis is obtained from the Summary of Program Requirements section of the call. The call is related to computational neuroscience and computer science research. The team has 4 researchers directly related to bioinformatics or neuromorphic computing and a fifth related to deep learning. Therefore, we see that the system is able to match effectively and its results are promising.

User Research Interests Score
Agostinelli Forest [’Artificial Intelligence, Deep Learning, Reinforcement Learning, Search, Bioinformatics’]
Hu Jianjun [’Deep learning, machine learning, materials informatics, bioinformatics, evolutionary computation’]
Valafar Homayoun [’Computational Biology, Bioinformatics, Artificial Intelligence, Machine Learning, Neural Information Processing’] 83
Zand Ramtin [’Hardware Design for Machine Learning Systems, Neuromorphic Computing, Emerging Nanoscale Electronics’] 80
Luo Lannan [’Software and Systems Security, Mobile Security, Software Engineering, Programming Language, Deep Learning’] 76
Table 2: Team Formation Result generated by ULTRA for historic NSF call - Innovative Approaches to Science and Engineering Research on Brain Function. URL: https://www.nsf.gov/pubs/2014/nsf14504/nsf14504.htm


We have conducted a preliminary evaluation of ULTRA’s output. The main output of the system is a list of experts who compose a team and whose skill set, experience and scientific excellence are compatible with the RFP topic. The effectiveness of the system was evaluated retrospectively by analyzing historic data on both previous RFPs and experts’ engagement with them.

During the initial evaluation, seven users from a College in the University were selected. The aim of the evaluation was to determine whether the RFPs the system identified for each user fit well within their competence set. For each user, the system generated the ten best matches from the pool of historic RFPs. Each RFP match was assessed on a Likert scale of 1 to 10, by the users themselves or by the expert panel if the user could not be reached. The human assessment revealed the following: for two users, all ten system-matched RFPs fit very well within their competence (a rating of at least 7 on the Likert scale). For the remaining five users, nine out of ten RFPs were matched very well, whereas one RFP was assessed as 6 or lower on the Likert scale. Altogether, of 70 matches, 65 RFPs were assessed by respondents as 7 or higher on the Likert scale, demonstrating the high effectiveness of the system in matching users’ skills to historic RFPs.

In particular, the user in Figure 1 (Jorge Crichigno, PhD, Associate Professor in Integrated Information Technology) commented that a new search term, cybertraining, helped him to locate an RFP which would normally have been unknown to him. He also noted that "I looked at the NSF grants and they seem very relevant", indicating satisfaction with ULTRA’s output.

Next, we obtained the records of awarded RFPs made available by the University’s division responsible for sponsored research. We manually annotated the synopsis for each awarded project. The included awards were limited to NSF data, as information about these awards is publicly accessible. We ran our user-item recommendation system on the corpus of synopses and obtained, for each Principal Investigator, a recommendation in the form of the 10 best awards. The results were compared to actual award data, giving an accuracy of 54.5 per cent, i.e., the share of cases where the system was able to match users with the actual awards they had received. This level can be considered satisfactory system output, as success in getting awards also depends on many factors besides competence matching. In future, we will incorporate matching and teaming enhancements to improve over the matching baselines.
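One way such a comparison could be computed is a hit rate over Principal Investigators: the fraction whose top-k recommendations contain at least one award they actually received. This definition is an assumption, since the exact accuracy measure is not spelled out above, and the identifiers in the example are hypothetical.

```python
def hit_rate_at_k(
    recommended: dict[str, list[str]],
    actual: dict[str, set[str]],
    k: int = 10,
) -> float:
    """Fraction of users with at least one actual award among their top-k recommendations."""
    users = [u for u in actual if actual[u]]  # only users with known awards
    if not users:
        return 0.0
    hits = sum(
        1 for u in users if set(recommended.get(u, [])[:k]) & actual[u]
    )
    return hits / len(users)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    recommended = {"pi1": ["award_a", "award_b"], "pi2": ["award_c"]}
    actual = {"pi1": {"award_b"}, "pi2": {"award_z"}}
    print(hit_rate_at_k(recommended, actual))
```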

Discussion and Future Work

We now discuss the impact of our approach, initial user feedback and opportunities for future improvement.

Initial Feedback: The feedback we have received from administrative and research users has been encouraging. First, users would like ULTRA’s data coverage expanded beyond engineering. Here, Web of Science has wider coverage of researchers than Google Scholar. Forming teams is challenging, especially cross-disciplinary teams spanning areas like social sciences, arts and humanities. Second, users want the system to assign a star rating to team leaders who have good experience and track record. Third, some proposals expect junior faculty or members from disadvantaged groups to be in the lead, and this could be prioritized by the tool for teaming. Fourth, sometimes a user claims competence in certain fields that is not reflected in their track record, such as previous publications and projects. The system can then use reliable online information to properly characterize users’ competence. At the same time, the system should also allow people to update their research profiles. Overall, stakeholders agree that this is a complex problem and that our system is trying to get ahead of the curve in solving it.

Future Directions: The current prototype can be extended along many development directions. First, we can increase the scale of experiments and data about researchers to outside of engineering. Second, we can normalize the technical areas that are used from the proposal call and researcher profiles beyond NLP techniques. In [2], the authors present an unsupervised approach to visualize text documents where they normalize skills of researchers and those mentioned in technical calls. In our tool, we can use their approach and focus on the area of Computing to use the ACM Computing Classification System (1998) [1, 6] - a set of 1532 Terms and their Codes. Beyond ACM, one can use American Economic Association Classification for the field of economics and the healthcare classification ICD-10 containing codes for diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or diseases. Third, one can improve matching using Bandit and Hedonic games methods. The data about previous awards could be useful in this regard. Fourth, one can also improve team formation approach with metrics for short and long-term optimization of team proposal results and performance on funded projects.


In this paper, we tackled the emerging problem of how AI can assist researchers in identifying the right funding opportunities, assessing their suitability for the program, and suggesting teaming opportunities for successful proposals. This is an example of the general problem of team formation when opportunities may repeat over time. We proposed an approach based on content extraction from open sites about calls and researchers, matching users to calls, and then forming teams based on business constraints. We described a prototype, ULTRA, and presented preliminary empirical evidence that the approach is promising. This work lays the basis for future work on AI-assisted teaming spanning multiple disciplines at University-scale.



  • [1] ACM (2012) ACM classification scheme. https://www.acm.org/publications/computing-classification-system/how-to-use.
  • [2] K. Aggarwal and B. Srivastava (2021) An unsupervised system for exploring text documents. UoSC Research Report.
  • [3] A. Alqahtani, H. Alhakami, T. Alsubait, and A. Baz (Feb. 2021) A survey of text matching techniques. Engineering, Technology and Applied Science Research 11 (1), pp. 6656–6661.
  • [4] H. Aziz, F. Brandl, F. Brandt, P. Harrenstein, M. Olsen, and D. Peters (2017) Fractional hedonic games.
  • [5] A. Cohan, S. Feldman, I. Beltagy, D. Downey, and D. S. Weld (2020) SPECTER: Document-level Representation Learning using Citation-informed Transformers. In ACL.
  • [6] N. Coulter, J. French, E. Glinert, T. Horton, N. Mead, R. Rada, A. Ralston, B. Rous, A. Tucker, P. Wegner, E. Weiss, and C. Wierzbicki (Feb. 1998) Computing classification system 1998: current status and future maintenance report of the ccs update committee. 39.
  • [7] J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805.
  • [8] Y. Li, J. Li, Y. Suhara, J. Wang, W. Hirota, and W. Tan (Jan. 2021) Deep entity matching: challenges and opportunities. J. Data and Information Quality 13 (1). ISSN 1936-1955.
  • [9] A. Slivkins (2021) Introduction to multi-armed bandits. arXiv, https://arxiv.org/abs/1904.07272.
  • [10] N. A. Smith (2020) Contextual word representations: putting words into computers. Communications of the ACM 63 (6), pp. 66–74.
  • [11] H. A. Yekta, D. Bergman, and R. Day (2018) On finding stable and efficient solutions for the team formation problem. arXiv, https://arxiv.org/abs/1804.00309.