Mapping the co-evolution of artificial intelligence, robotics, and the internet of things over 20 years (1998-2017)

06/03/2020
by Katy Börner, et al.
Indiana University Bloomington

Abstract

Understanding the emergence, co-evolution, and convergence of science and technology (S&T) areas offers competitive intelligence for researchers, managers, policy makers, and others. The resulting data-driven decision support helps set proper research and development (R&D) priorities; develop future S&T investment strategies; monitor key authors, organizations, or countries; perform effective research program assessment; and implement cutting-edge education/training efforts. This paper presents new funding, publication, and scholarly network metrics and visualizations that were validated via expert surveys. The metrics and visualizations exemplify the emergence and convergence of three areas of strategic interest: artificial intelligence (AI), robotics, and internet of things (IoT) over the last 20 years (1998-2017). For 32,716 publications and 4,497 NSF awards, we identify their conceptual space (using the UCSD map of science), geospatial network, and co-evolution landscape. The findings demonstrate how the transition of knowledge (through cross-discipline publications and citations) and the emergence of new concepts (through term bursting) create a tangible potential for interdisciplinary research and new disciplines.

Introduction and prior work

Advances in computational power, combined with the unprecedented volume and variety of data on science and technology (S&T) developments, create ideal conditions for the development and application of data mining and visualization approaches that reveal the structure and dynamics of research progress. A critical challenge for decision makers is determining how to spend limited resources most productively. To do so, one must have a basic understanding of where the most productive research is being done, who the key experts are, and how others are investing in research. The identification of topics that have recently emerged as increasingly important or that are converging to create new synergies in research can be particularly fertile areas for research and development (R&D).

Different ‘emergence’ and ‘convergence’ definitions, indicators, and metrics have been proposed in previous work in this area. We use the four-attribute model of what constitutes technological emergence [Rotolo2015] and assume that emergent topics should evidence term novelty, persistence, and accelerating growth, and typically show the formation of a research community. Much prior work exists on how to measure emergence. Guo et al. [Guo2011] proposed a mixed model that combines different indicators to describe and predict key structural and dynamic features of emerging research areas. Three indicators are combined: 1) sudden increases in the frequency of specific words; 2) the number and speed with which new authors are attracted to an emerging research area; and 3) changes in the interdisciplinarity of references cited. Applying this mixed model to four emerging areas for validation revealed interesting temporal correlations: first, new authors enter the research area; then paper references become interdisciplinary; and finally word bursts occur.

Recent work—including that funded by the US Intelligence Advanced Research Projects Activity (IARPA) Program on Foresight and Understanding from Scientific Exposition (FUSE) [IARPA2010]—focuses on advanced linguistic techniques for identifying emerging research topics. Contributions from [Carley2018] include methods to extract terms from paper titles and abstracts and filter them based on 1) novelty, 2) persistence, 3) a research community, and 4) rapid growth in research activity. Porter et al. [Porter2019] developed emergence indicators that help 1) identify “hot topic” terms, 2) generate secondary indicators that reflect especially active frontiers in a target R&D domain, 3) flag papers or patents rich in emergent technology content, and 4) score research fields on relative degree of emergence. The authors suggest future work should exploit author networks, breakout citation patterns, and/or funding trends.

Convergence research was identified by the National Science Foundation (NSF) as one of the 10 Big Ideas for Future NSF Investments [NSF2016] that will help advance US prosperity, security, health, and well-being. In this paper, we present a repeatable procedure to characterize emerging R&D topics based on publication and funding data. Using this procedure, we visualize and analyze the convergence of three emerging R&D areas. Our efforts here both build upon and expand work by [Janssen2006], who used the convergence to study scholarly networks for domains relevant for understanding the human dimensions of global environmental change.

A literature review and stakeholder-needs analysis were used to identify three domains of strategic interest: artificial intelligence (AI), robotics, and the internet of things (IoT). These three areas are of paramount importance not only for global prosperity, but also for defense and security [Tally2016]. AI, IoT, and robotics were named top technologies in 2018 with strong arguments and examples of how these technologies will drive digital innovation and completely transform business models [Matthieu2017, Pascu2018]. Since AI and robotics will have a major impact on the future of the economy, businesses need advance preparation to meet these transformational challenges. The 2019 AI annual report pointed to the complexity of the fast-growing AI labor market: “unconditional convergence” and “unconditional divergence” in job demands at the same time [Perrault2019]. In line with these developments, the Trump administration prioritized funding for fundamental AI research and computing infrastructure, machine learning, and autonomous systems. It also argued for the need to work with international allies to recognize the potential benefits of AI and to promote AI R&D [TheWhiteHouse2018].

The paper is organized as follows. Section 2 details the stakeholder needs analysis, which guided the selection of strategic research areas and provided information about stakeholder insight needs. Section 3 discusses both datasets used and data preprocessing performed in this study. Section 4 presents methods used, while Section 5 details results achieved. Section 6 discusses validation design and insights. Section 7 concludes the paper with a discussion of the results and outlook for future research in this area.

Stakeholder needs analysis

A stakeholder needs analysis (SNA) was employed in order to identify insight needs and use cases for using indicators and visualizations of emergence (and indirectly convergence) in daily decision-making environments. Specifically, the SNA was designed to identify stakeholder demographics, task types, insight needs, work contexts, and priorities to better understand how decision makers might utilize static and dynamic information visualizations, topics of concern, and metrics currently used when making decisions. Eleven decision makers from the Naval Surface Warfare Center, Crane Division, (NSWC Crane) in Martin County, Indiana completed the one-hour survey. Participants included personnel from human resources, corporate operations, engagement, R&D, and various technical specializations. Surveys were conducted both on the Indiana University–Bloomington campus and at WestGate Academy, a technology park located adjacent to the naval base.

Areas of strategic interest. In order to understand which topical areas were of interest, survey respondents were asked to identify which topics, from a list of eight, were most relevant for their work. Additional topics were also solicited from survey participants. The top eight topics are represented in Table 1. Each of the eight topics was queried via the NSF award and Web of Science portals to identify the total number of funding awards and publications between 1998 and 2017.

| Topic | #Experts that expressed interest | #NSF funding awards active in 1998-2017 | #WOS publications published in 1998-2017 |
|---|---|---|---|
| Advanced electronics | 10 | 122 | 206 |
| Artificial intelligence | 10 | 1,075 | 7,414 |
| Sensors and sensor fusion | 9 | 145 | 14 |
| Internet of things | 4 | 348 | 11,371 |
| Human systems integration | 4 | 0 | 102 |
| Robotics | 3 | 3,074 | 13,931 |
| System of systems test and evaluation | 3 | 4 | 0 |
| Power and energy management | 2 | 0 | 48 |

Table 1: Topical interest and relevant funding and publication data.

Since we used funding and publication data to characterize emerging areas, the number of NSF awards and Web of Science (WOS) publications for each of the top-ranked areas was a factor in final topic selection (see Table 1). Note that for several areas, few funding awards have been made, allowing a human analyst to read through the related abstracts within a week. Artificial intelligence, the internet of things, and robotics are three domains identified as being of national strategic importance that have sufficient numbers of funding awards and publications for rigorous analysis (see Section 1).

Current metrics informing resource allocation decisions at NSWC Crane focus on internal R&D needs, recent calls for funding, and how other federal agencies are focusing their funding. Most respondents identified similar processes for allocating funding, deciding when to bring in outside expertise, and selecting contractors. This unified decision-making approach suggests that visualizations providing greater detail on how others are focusing their funding in strategic areas of interest have an important role to play in the process.

In the second part of the SNA, participants were asked to view three sample visualizations, identify any insights gained from them, and elaborate on how they might use these insights. At the end, participants were asked to identify which of the three figures they found most useful and why. Seven respondents found the network visualization most useful, two found the tree map that identified top funders most useful, and one preferred a visualization showing the number of citations related to one topic over time. One participant responded that all were equally useful in different ways but were each not as useful on their own.

Participants were also presented with three interactive visualizations that they were able to manipulate on laptop computers or tablets. Again, participants were asked to identify insights gained from each visualization, and to elaborate on how they might use these insights. They were also asked to identify which of the three figures they found most useful and why. Six respondents found the co-authorship geospatial visualization most useful, three found the co-authorship network most useful, and two identified the science map as most useful. In sum, co-authorship networks in combination with geographic representations proved most interesting to the stakeholders surveyed. Additionally, stakeholder interest in interactive capabilities suggests that this would be a valuable direction for future efforts.

Data and processing

A majority of publications related to each of the three target domains is captured in the Web of Science (WOS). In the US, much of the funding for these three focus areas is awarded by the National Science Foundation (NSF). The authors acknowledge that some subset of the current research in these areas is conducted through defense organizations and therefore can be difficult to capture in publicly available datasets. Only WOS publications and NSF awards from the last twenty years (1998-2017) were included in the analysis.

Publications

Publication data was retrieved from the Clarivate Analytics Web of Science (WOS Core Collection) web portal and WOS XML raw data (Web of Knowledge version 5) acquired from Clarivate Analytics by the IUNI Science of Science Hub and shared through a Data Custodian user agreement with the Cyberinfrastructure for Network Science Center (CNS) on July 7, 2018. The WOS data comprise 69 million publications and more than one billion citation links from 1900 through the early months of 2018. Most publications have title, abstract, and keyword information that can be used for text mining. Publications also have a publication year and author data. Using the AuthorKeywords field, Clarivate Analytics extracted publication identifiers (accession numbers - UT) on October 24, 2019 containing the (compound) query terms “artificial intelligence,” “internet of things,” “IoT,” and “robotics.” We also manually evaluated the 371 keywords where the “IoT” term was semantically and structurally ambiguous (e.g., INTEROCULAR TRANSFER (IOT), ANTIBIOTIC ACTIVITY), resulting in 64 false positives that were removed from the records. Clarivate UT identifiers were then split into eight segments and queried in the WOS web portal on January 24, 2019 to extract the ISI raw data, which was then filtered by the three keywords. Furthermore, two records whose publication year was changed from 2017 to 2018 were removed: WOS:000425355100004 (artificial intelligence) and WOS:000425355100017 (IoT). The final number of records can be seen in Table 1. Two of the 7,414 papers (artificial intelligence) had no authors (WOS:000384456000001, WOS:000173337900025) and were excluded from the co-author and geospatial analyses. The number of publications per year and the number of citations for 1998-2017 are shown in Fig 1A.
Note that the number of all publications (dashed line) and the number of citations (dotted line) increase steadily during the years 1998-2017, whereas the nascent field of IoT (blue solid line) shows a sharp increase, exceeding AI (yellow solid line) by 2012 and robotics (red solid line) by 2014.

Fig 1: WOS Publications and NSF Awards. (A) Number of WOS papers extracted and the number of citations. (B) Number of NSF grants and amount of funding awarded by NSF each year.

There are many instances where publication records contain keywords from more than one focus area. For the terms AI and robotics, for instance, there are 209 overlapping publication records. Between AI and IoT there are 46 overlapping publications, and between IoT and robotics there are 38. There were two publications containing keywords from all three areas (“A Novel Method for Implementing Artificial Intelligence, Cloud and Internet of Things in Robots” and “IT as a Driver for New Business”). Over time, there was a statistically significant increase in publications, totaling 32,716 during the period of 1998-2017 (see Fig 1A). Table 2 displays the number of unique keywords and authors for the three areas together with the totals.

| Topic | #NSF unique investigators | #NSF unique keywords | #WOS unique authors | #WOS unique keywords |
|---|---|---|---|---|
| Artificial Intelligence | 1,297 | 3,081 | 17,316 | 17,534 |
| Internet of Things | 575 | 2,435 | 23,691 | 21,204 |
| Robotics | 3,275 | 6,144 | 30,784 | 23,561 |

Table 2: Topical interest and relevant funding and publication data.

Funding

The NSF funds research and education in science and engineering through grants, contracts, and cooperative agreements to more than 2,000 colleges, universities, and institutions across the United States. It provides about 20 percent of the federal support academic institutions receive for basic research. More than 500,000 awards—including active, expired, and historical awards from 1952 to today—are available via the NSF award search portal.

NSF funding data for awards containing the (compound) terms “artificial intelligence,” “internet of things,” “IoT,” and “robotics” was downloaded on July 24, 2018. Subsequently, the resulting sets were narrowed to NSF awards that were active in the last 20 years (Jan 1, 1998 to Dec 31, 2017): 1,075 in AI, 3,074 in robotics, and 348 in IoT (see Fig 1B and also Table 1). Note that 325 active awards have their start date earlier than 1998. Table 1 exhibits the number of NSF funding awards active in 1998-2017, and Table 2 illustrates the number of unique investigators and unique keywords for the three areas together with the totals.

There are some instances where awards overlapped: AI and robotics, for instance, received 146 funding awards, robotics and IoT received 17, while AI and IoT received only 2. There was no award for a project in all three focus areas. The annual distribution demonstrates a statistically significant increase in awards, which seems to follow the same pattern as publications, lagging only by 10-fold on the log scale and totaling 4,497 awards over time (see Fig 1B). Note that the total award amount is $494,310,951 for the 1,075 AI funding awards, $1,375,299,908 for the 3,074 robotics awards, and $149,498,845 for the 348 IoT awards. The number of awards and the amount of funding show a slow increase over time, with the exception of IoT, which shows a spike between the years 2013 and 2015.

Keyword Extraction via MaxMatch

Based on the linguistic algorithm comparison detailed in [Borner2018d], the MaxMatch algorithm was used to identify terms in NSF funding awards that match the unique WOS Author Keywords specific to the three topic areas. The MaxMatch algorithm [Wong1996] performs word segmentation to improve precision. The algorithm first computes the maximum number of words in the lexical resource (here NSF award titles and abstracts); then it matches long terms before matching shorter terms. Thus, given the terms ‘artificial intelligence’ and ‘intelligence’ in a set of relevant terms, and ‘artificial intelligence’ in the title and/or abstract of an award, the algorithm returns only ‘artificial intelligence.’ This reduces oversampling of popular, short terms.
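The longest-match-first behavior described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the function name `max_match`, the tokenizer, and the choice to compute the maximum term length from the keyword list are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def max_match(text, lexicon):
    """Greedy longest-match-first extraction of lexicon terms from text."""
    terms = {tuple(tokenize(term)) for term in lexicon}
    terms.discard(())
    if not terms:
        return []
    max_len = max(len(term) for term in terms)
    tokens = tokenize(text)
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in terms:
                found.append(" ".join(candidate))
                i += n  # skip past the match so shorter sub-terms are not counted
                break
        else:
            i += 1  # no term starts at this token
    return found

keywords = ["artificial intelligence", "intelligence", "robotics"]
matches = max_match("Advances in artificial intelligence for robotics", keywords)
```

Because the two-word term is consumed as a unit, ‘intelligence’ is not counted a second time in this example.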

All keywords were pre-processed and normalized via the Key Collision Fingerprint and n-gram methods using OpenRefine [Stephens2018]. These methods find “alternative representations of the same things,” thus allowing for the normalization of keywords (e.g., “internet of things (iot),” “iot – internet of things,” “internet of thing”). We identified 1,739 clusters for AI, 2,333 clusters for IoT, and 3,201 clusters for robotics, which we then normalized by merging similar terms.
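As a rough illustration of key-collision clustering, the sketch below approximates a fingerprint key in plain Python: lowercase, strip accents and punctuation, then sort and de-duplicate tokens. It follows the general idea of the fingerprint method but is not OpenRefine's code, and the function names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Approximate fingerprint key: normalized, sorted, de-duplicated tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop non-ASCII marks
    text = re.sub(r"[^\w\s]", " ", text)                   # punctuation to spaces
    return " ".join(sorted(set(text.split())))

def cluster(keywords):
    """Group keywords whose fingerprint keys collide; keep groups of two or more."""
    groups = defaultdict(list)
    for keyword in keywords:
        groups[fingerprint(keyword)].append(keyword)
    return [members for members in groups.values() if len(members) > 1]

variants = ["internet of things (iot)", "iot – internet of things", "internet of thing"]
clusters = cluster(variants)
```

In this sketch the first two variants collide on the same key, while the singular form “internet of thing” does not; catching that kind of near-duplicate is what the n-gram method is for.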

As a result, for WOS publications, we identified 55,946 unique ‘Author keywords’: 17,534 unique keywords for AI, 21,204 for IoT, and 23,561 for robotics. There are 2,935 terms that are shared between AI and robotics, 2,204 between AI and IoT, and 2,331 between IoT and robotics; 1,117 are shared across all three sets (2% of all identified WOS keywords).

For NSF awards, we first excluded 325 active records where the start date was earlier than 1998. We then identified 9,185 unique ‘Author keywords’ terms: 3,081 unique terms for AI, 2,435 for IoT, and 6,144 for robotics. Note that the keywords intersect: there are 1,376 terms that are shared between AI and robotics, 717 between AI and IoT, and 914 between IoT and robotics; 532 are shared across all three areas (6% of all identified NSF keywords).
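The reported totals can be cross-checked with the inclusion-exclusion principle. The sketch below assumes that each pairwise count includes the terms shared by all three areas; under that assumption the per-area counts reproduce the stated totals of 55,946 (WOS) and 9,185 (NSF). The function name is illustrative.

```python
def union_size(ai, iot, rob, ai_iot, ai_rob, iot_rob, all_three):
    """Size of the union of three sets via inclusion-exclusion."""
    return ai + iot + rob - ai_iot - ai_rob - iot_rob + all_three

# Counts as reported in the text for WOS author keywords and NSF award terms.
wos_total = union_size(17_534, 21_204, 23_561,
                       ai_iot=2_204, ai_rob=2_935, iot_rob=2_331, all_three=1_117)
nsf_total = union_size(3_081, 2_435, 6_144,
                       ai_iot=717, ai_rob=1_376, iot_rob=914, all_three=532)

# Share of terms common to all three areas, in percent.
wos_share = 100 * 1_117 / wos_total
nsf_share = 100 * 532 / nsf_total
```

The shares come out at roughly 2% and 6%, matching the rounded percentages in the text.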

Methods

Burst detection and visualization

A burst detection algorithm helps identify sudden increases in how often certain keywords are used in temporal data streams [Kleinberg2002]. Kleinberg’s algorithm, available in the Sci2 Tool [Sci2Team2009], reads a stream of events (e.g., time-stamped text from titles and abstracts) as input. It outputs a table with burst beginning and end dates and a burst weight, indicating the level of “burstiness.” Burst weights can be used to set thresholds (for example, to keep only the top-10 words with the highest burst weights). For this study, burst detection was run with gamma at 1.0, density scaling at 2.0, and bursting states and burst length of 1. The weights of terms that burst multiple times were summed before the top-n terms were picked for visual graphing. Note, however, that the original burst values are used to code the bars by area.
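The post-processing step (summing the weights of terms that burst more than once, then keeping the top-n) can be sketched as follows. This covers only the aggregation, not Kleinberg's algorithm itself; the tuple layout and the sample rows are hypothetical and do not come from the study's data.

```python
from collections import defaultdict

def top_bursting_terms(bursts, n=10):
    """Sum burst weights per term and return the n terms with the largest totals.

    `bursts` is an iterable of (term, weight, start_year, end_year) tuples,
    an assumed layout loosely modeled on a burst-detection output table.
    """
    totals = defaultdict(float)
    for term, weight, _start, _end in bursts:
        totals[term] += weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

# Hypothetical rows for illustration only.
sample = [
    ("term a", 2.0, 2001, 2003),
    ("term b", 3.5, 2005, 2006),
    ("term a", 2.0, 2010, 2011),  # second burst of the same term
]
ranking = top_bursting_terms(sample, n=2)
```

The individual rows stay untouched, which matters because the original per-burst weights are still needed to draw each bar.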

A novel burst visualization (see Fig 2) was implemented to show the temporal sequence of bursts in funding and publications as well as co-bursts of terms in both types of data. Using the temporal bar graph visualization as a starting point, each bursting term is represented as a horizontal bar with a specific start and end date. The area of each bar encodes a numerical attribute value; here burst weight is equally distributed over all years in which a certain burst occurs. Note that some burst terms are consecutive—they end in year x and start in year x+1—and might have different burst weights. Bars are color-coded by type.
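Since bar area encodes burst weight and the weight is spread equally over the burst years, the bar height for a burst follows from dividing the weight by the number of years. The sketch below assumes that start and end years are both inclusive, which the text does not state explicitly; the function name is illustrative.

```python
def bar_height(weight, start_year, end_year):
    """Per-year bar height so that height x duration equals the burst weight.

    Assumes an inclusive year range, e.g. 2015-2017 spans three years.
    """
    years = end_year - start_year + 1
    if years <= 0:
        raise ValueError("end_year must not precede start_year")
    return weight / years

# 'Learning (Artificial Intelligence)' bursts in 2015-2017 with weight 39.29.
height = bar_height(39.29, 2015, 2017)
```

Under this reading, a long burst with a moderate weight is drawn thinner than a short burst with the same weight.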

Fig 2: Burst visualization using horizontal bar graphs.

Top organizations and funding

The Web of Science (WOS) online portal (Clarivate Analytics, 2018) supports searches for specific topics and facilitates the retrieval and examination of top funding agencies and top research organizations. Between September 11 and October 23, 2018, queries were run on the terms “artificial intelligence,” “internet of things,” “IoT,” and “robotics” using topic and title fields for the years 1998-2017. The ‘Organization-Enhanced’ field was used, which returns all records including name variants and preferred organization names. Organizations and funding agencies come with a variety of spellings. For example, the NSF and National Science Foundation show up as two different entities, making name unification necessary. For each of the three focus areas, we selected the top-10 research institutions and funding agencies, identified their country, and tabulated results (see Table 1 in S1 Appendix).
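Name unification of this kind is often handled with a lookup table of known variants. The sketch below shows the idea for the NSF example from the text; the alias table, the normalization rule, and the function names are assumptions, and the study's actual merges are documented in its appendix.

```python
from collections import Counter

# Hypothetical alias table; only the NSF example is taken from the text.
ALIASES = {
    "nsf": "National Science Foundation",
    "national science foundation": "National Science Foundation",
}

def unify(name):
    """Map a name variant to its preferred form, or return it unchanged."""
    key = " ".join(name.lower().replace(".", "").split())
    return ALIASES.get(key, name.strip())

def top_organizations(names, n=10):
    """Count acknowledgements per unified name and return the n most frequent."""
    return Counter(unify(name) for name in names).most_common(n)

acknowledged = ["NSF", "National Science Foundation", "N.S.F.", "Other Agency"]
top = top_organizations(acknowledged, n=1)
```

Without the unification step, the three NSF spellings would be counted as separate funders and could each fall out of a top-10 list.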

Co-Author networks

The Sci2 Tool was used to extract a co-author network using the co-author column. The resulting undirected, weighted network has one node type: authors. Each node represents a unique author. Edges represent co-occurrence of authors on a paper (i.e., whether or not a pair of authors are co-authors). Edge weights denote the number of papers two authors have co-authored. The example given in Fig 3 illustrates a table with five papers by a total of six authors. Authors A2 and A6, for example, co-occur on papers P2 and P6, so their link is twice as thick. All other edges have a weight of one. For this study, we used exact matching on names in the “Authors” field using the “Extract Co-Occurrence Network” functionality.

Fig 3: Co-authors listed on papers (left) are rendered as a co-author network (right).
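The co-occurrence extraction can be sketched in a few lines of Python. This is not the Sci2 implementation; the function name and the small paper table are hypothetical and do not reproduce the data shown in Fig 3.

```python
from collections import Counter
from itertools import combinations

def coauthor_network(papers):
    """Build an undirected, weighted co-author network.

    `papers` maps a paper id to its list of author names. Names are compared
    by exact string match, so spelling variants become separate nodes.
    """
    nodes, edges = set(), Counter()
    for authors in papers.values():
        unique = sorted(set(authors))
        nodes.update(unique)
        for pair in combinations(unique, 2):
            edges[pair] += 1  # weight = number of papers the pair shares
    return nodes, edges

# Hypothetical input for illustration.
papers = {
    "P1": ["A1", "A2"],
    "P2": ["A2", "A3"],
    "P3": ["A2", "A3", "A4"],
    "P4": ["A5"],
}
nodes, edges = coauthor_network(papers)
isolates = nodes - {author for pair in edges for author in pair}
```

Here A2 and A3 share two papers, so their edge has weight two, and A5 is an isolate because it never appears with a co-author.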

Force-directed network layout takes the dimensions of the available display area and a network as input. Using information on the similarity relationships among nodes (for example, co-author link weights), it calculates node positions so that similar nodes are in close spatial proximity. This layout is computationally expensive, as all node pairs have to be examined and layout optimization is performed iteratively. Running the Generalized Expectation Maximization (GEM) layout, available in Sci2 via GUESS, on a small network results in the layout shown in Fig 3 (right). This network displays the sub-networks, their size, density, and structure. Node attributes can be used to size-code, color-code, and label nodes (e.g., by the number of papers, citations, or co-authors). Edge attributes can be used to size- and color-code edges (e.g., by the number of times two authors have co-authored). The legend communicates the mapping of data to graphic variable types.

WOS publications provide affiliation data (addresses) for authors, making it feasible to generate network overlays on geospatial maps (see Fig 4). We use Make-a-Vis [CyberinfrastructureforNetworkScienceCenter2018] to generate latitude and longitude values for author addresses as well as co-author networks; authors with no US address are excluded from this network. If an author has multiple addresses, the most recent address is used. The co-author network with geospatial node positions is saved and read into Gephi [Bastian2009] for visualization using a Mercator projection, with node area indicating the number of citations, node color indicating the first year published, and edge thickness representing the number of times two authors are listed on a paper together.

Fig 4: Co-author network overlay on geospatial map of the US. See GitHub Appendix for more information on the co-author and geospatial network workflow - https://github.com/cns-iu/AICoEvolution

Science map and classification system

The UCSD Map of Science and Classification System was created using 2006–2008 data from Scopus and 2005–2010 data from the Web of Science [Borner2012g]. The map organizes more than 25,000 journal/conference venues into 554 (sub)disciplines that are further aggregated into thirteen main scientific disciplines (e.g., physics, biology), which are labeled and color-coded in the map. The network of 554 (sub)disciplines and their major similarity linkages was then laid out on the surface of a sphere. Subsequently, it was flattened using a Mercator projection resulting in a two-dimensional map (Fig 5). The UCSD Map of Science wraps around horizontally (i.e., the right side connects to the left side of the map).

Fig 5: Science mapping process—from journal names (left), to (sub)disciplines (middle), to 2D spatial positions (right).

In order to create proportional symbol data overlays, a new dataset is “science-coded” using the journals (or keywords) associated with each of the 554 (sub)disciplines. For example, a paper published in the journal Pharmacogenomics has the science-location ‘Molecular Medicine,’ as the journal is associated with this (sub)discipline of the discipline ‘Health Professionals.’ If non-journal data (e.g., patents, grants, or job advertisements) need to be science-located, then the keywords associated with each (sub)discipline can be used to identify the science location for each record based on textual similarity. In this study, multidisciplinary journals such as Science or Nature, which are fractionally assigned to multiple disciplines, were associated with a ‘Multidisciplinary’ discipline. Journals that cannot be classified are automatically associated with an ‘Unclassified’ discipline. Bar graph visualizations showing the number of papers and citations for the 15 disciplines are used to support comparisons (see Figs 1-3 in S1 Appendix).
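Journal-based science-coding amounts to a lookup from journal name to (sub)discipline with two fallbacks. The sketch below uses only the examples named in the text; the lookup table is a tiny hypothetical fragment, not the actual UCSD classification file, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical fragment: journal -> (subdiscipline, discipline).
# None marks a subdiscipline that the text does not specify.
JOURNAL_TO_SCIENCE_LOCATION = {
    "pharmacogenomics": ("Molecular Medicine", "Health Professionals"),
    "science": (None, "Multidisciplinary"),
    "nature": (None, "Multidisciplinary"),
}

def science_code(journal):
    """Return (subdiscipline, discipline); unknown journals are 'Unclassified'."""
    key = journal.strip().lower()
    return JOURNAL_TO_SCIENCE_LOCATION.get(key, (None, "Unclassified"))

def papers_per_discipline(journals):
    """Count papers per discipline, e.g. as input for a bar graph."""
    return Counter(science_code(journal)[1] for journal in journals)

counts = papers_per_discipline(["Nature", "Science", "Pharmacogenomics", "Some Journal"])
```

With the 13 main disciplines plus the ‘Multidisciplinary’ and ‘Unclassified’ categories, this yields the 15 categories used in the bar graphs.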

Results

Artificial intelligence (AI)

The field of artificial intelligence studies the interplay of computation and cognition. It is concerned with subjects such as knowledge representation and retrieval, decision-making, natural language processing, and human and animal cognition. AI research generates tools and artifacts to address problems involving complex computational models, real-world uncertainty, computational intractability, and large volumes of data. It also uses computational methods to better understand the foundations of natural intelligence and social behavior.

Top-funded AI awards include BEACON: An NSF Center for the Study of Evolution in Action led by Erik Goodman at Michigan State University, active 2010-2021, total amount awarded to date $43M; the Center for Research in Cognitive Science led by Aravind Joshi at the University of Pennsylvania, 1991-2002, $21M; and the Spatial Intelligence and Learning Center (SILC) led by Nora Newcombe at Temple University, 2011-2018, $18M.

WOS-top organizations and funding

The top-10 AI-funding organizations most frequently acknowledged in papers (seven were merged, see S1 File) and the top-10 research organizations are exhibited in Fig 6 (top right). The leading funders are the Natural Science Foundation of China and the National Science Foundation in the US, followed by agencies from the UK, Mexico, Europe, and Brazil. The top research organizations are the French National Center of Research and the French University of Cote-d’Azure. Institutions from the US, India, China, and the United Arab Emirates are also among the top-10 AI research organizations. The list of abbreviations for agencies and institutions and their name variations is exhibited in Table 5 in S1 Appendix.

Burst of activity

Within the 1,075 NSF awards that have the keyword ‘artificial intelligence,’ there are 161 bursts. There are no terms that burst twice. As for the 7,414 WOS publications with the keyword ‘artificial intelligence,’ there are 89 bursts in total, again with no terms bursting twice. There are eight overlapping bursts for NSF and WOS keywords: ‘Agents,’ ‘Big Data,’ ‘Component,’ ‘Control,’ ‘Deep Learning,’ ‘Expert Systems,’ ‘Machine Learning,’ and ‘Psychology.’ The top bursting term for NSF is ‘Machine Learning’ with a burst weight of 13.04, while for WOS it is ‘Learning (Artificial Intelligence)’ with a burst weight of 39.29. Among the top-10 bursting terms, ‘Big Data’ co-bursts in both sets in 2014-2017 and is rendered in gray (see Fig 6, top left). The other two co-bursting terms are ‘Deep Learning’ and ‘Machine Learning.’ Burst weight is indicated by bar thickness, with ‘Learning (Artificial Intelligence)’ having the strongest burst in 2015-2017. Interestingly, the bursting activity between 1998 and 2007 is predominantly present in WOS publications (orange color). Starting with the keyword ‘Web,’ NSF awards exhibit bursting activities, culminating in co-bursts with publications from 2014 in ‘Big Data,’ ‘Machine Learning,’ and ‘Deep Learning.’

Fig 6: Burst of activity and top funding organizations. Shown on the left are the top-15 keywords with the strongest bursts in funding awards (blue color) and publications (orange color). Gray color indicates a double burst. For example, ‘Machine Learning’ is bursting in both publications and funding awards in 2017-2018. Bar thickness indicates the strength of each burst (weight). Given on the right are the top-10 funders and research organizations associated with publications (see Table 5 in S1 Appendix).

Key authors and collaboration networks

The complete co-author network for AI has 17,316 unique author nodes. There are 437 authors with more than three papers, 235 authors with more than four papers, and 143 with more than five papers. Of these nodes, 901 are isolates (not connected to any other node), meaning that these 901 authors did not co-author with anyone during the 20 years. There are 31,476 co-author edges. The average degree is 3.64. The network has 4,299 weakly connected components, including the 901 isolates. The largest connected component consists of 1,914 nodes and is too large to visualize in a paper. The network was therefore filtered by times cited ≥ 1, resulting in 11,166 nodes, 21,280 edges, 2,763 weakly connected components, and 473 isolates. The largest connected component of this filtered network has 675 co-authors (with the 473 isolates removed) and is shown in Fig 7 (top left). Author nodes and node labels are size-coded by the number of citations. Links, which denote co-authorship relations, are thickness-coded by the number of joint publications. The network has 2,028 links, filtered by the number of co-authored papers (≥1). The labels are filtered by the number of times cited (≥100). Author ‘Zhang, Y’ has the largest number of citations (475) in this network. Zhang is also one of the top-10 cited US authors in the complete co-author network (see Fig 7 top table).
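
The network statistics above can be related to one another with a short sketch. It assumes the co-author network is undirected without self-loops, so the average degree is 2E/N, and it uses a generic union-find to count connected components and isolates on a toy graph. The function names and the toy graph are illustrative assumptions, not the tooling used in the study.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def average_degree(n_nodes: int, n_edges: int) -> float:
    """Average degree of an undirected graph: each edge adds 2 to the degree sum."""
    return 2 * n_edges / n_nodes


def component_sizes(
    nodes: Iterable[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> List[int]:
    """Sizes of connected components (largest first), via union-find."""
    nodes = list(nodes)
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return sorted(Counter(find(n) for n in nodes).values(), reverse=True)


# Node and edge counts reported in the text.
print(round(average_degree(17_316, 31_476), 2))  # AI network
print(round(average_degree(30_784, 96_982), 2))  # robotics network

# Hypothetical toy graph: one three-author component and two isolates.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("B", "C")]
sizes = component_sizes(toy_nodes, toy_edges)
isolates = sizes.count(1)
```

With the reported counts, the first call rounds to 3.64 and the second to 6.30, matching the average degrees given for AI and robotics.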

Fig 7: Co-author network and top-10 authors by #Citations for AI (top), robotics (middle), IoT (bottom).

When presented with key author and collaboration networks during the stakeholder needs analysis, users indicated that network diagrams overlaid on top of geographic maps provided greater insights than network diagrams not anchored to geographic space. In response to that feedback, we present here key authorship and collaboration network diagrams that use the United States as a base map.

Co-author US map

To overlay the resulting co-author network on a US map, we used Make-a-Vis. Fig 8 (top left) shows the co-author network for AI, with nodes representing the number of citations and a darker hue indicating the first year of publication for a given author. The network is noticeably concentrated in the eastern states, and Austin and Pittsburgh are the top-two cities in the mid-US, with AI research cited 910 and 888 times, respectively (see Fig 8, top right).

Fig 8: Co-author network extracted from WOS publications and overlaid on US map.
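
City-level totals such as those reported for Fig 8 can be obtained by summing author citation counts per city. The sketch below shows only that aggregation step; the function name `top_cities`, the city labels, and all counts are hypothetical, and geocoding of author addresses is assumed to have been done beforehand.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_cities(
    author_records: Iterable[Tuple[str, int]], k: int = 2
) -> List[Tuple[str, int]]:
    """Sum citations per city and return the k most-cited cities."""
    totals: Counter = Counter()
    for city, citations in author_records:
        totals[city] += citations
    return totals.most_common(k)


# Hypothetical (city, times cited) pairs, one per author.
records = [("City A", 40), ("City B", 25), ("City A", 10), ("City C", 5)]
ranking = top_cities(records)
print(ranking)
```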

Topical evolution

The topical distribution of 7,414 WOS publications on AI is shown for two 10-year time slices in Fig 9 (top). The UCSD Map of Science and Classification System is used with identical circle area size coding (see discussion in the Methods section). Most of the papers are in the ‘Electrical Engineering & Computer Science’ disciplines and in the ‘Chemical, Mechanical, and Civil Engineering’ disciplines. Note the increase of papers in the Social Sciences and Health Professionals (see also Fig 1 in S1 Appendix). The top-five most cited papers, along with their publication year and total number of citations, are shown in Table 4 in S1 Appendix.

Fig 9 illustrates the evolution of topical coverage for each of the key terms. The comparison between left (1998-2007) and right (2008-2017) panels indicates the evolution of each term within scientific disciplines. AI shows a steady increase within each of the 15 disciplines (see also Fig 1 in S1 Appendix).

Fig 9: Topical coverage of AI, robotics, IoT publications published in 1998-2007 (top) and 2008-2017 (bottom).

Robotics

WOS-top organizations and funding

The top-10 funding organizations (seven were merged; see S1 File) and the top-10 research organizations are given in Fig 6 (right panel). As can be seen, the US clearly accounts for the largest number of funded publications, while the strongest research institutions are in France. The Chinese funding organization NSFC is the second-ranked funding agency; however, no Chinese research institution is listed among the top-10. The University of California System (UC System), the University System of Georgia (USG), and MIT are among the top-10 research institutions with the most papers.

Burst of activity

In the 3,074 NSF awards, there are 654 bursts in total, including 28 double bursts and two triple bursts (“CPS” and “Impacts”). Summing the burst weights of double and triple bursts yields the top-15 bursts, rendered in blue in Fig 6. In the 13,931 WOS publications, there are 261 bursts in total, including seven double bursts. The top-15 bursts are rendered in orange in Fig 6 (middle). NSF and WOS share 47 bursting keywords; however, none of them appears in both top-15 lists.

Burst weight is indicated by bar thickness, with ‘Soft Robotics’ having the strongest burst for WOS in 2014-2017 with a weight of 39.3 and ‘Law’ the strongest for NSF with a weight of 31.6. ‘STEM’ is the longest of the recent NSF bursts, lasting from 2010 to 2015.
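
The ranking step described above, in which the weights of a term's repeated bursts are summed before selecting the top bursts, might look like the following sketch. The function name `top_bursts`, the placeholder terms, and all weights are assumptions for illustration; only the fact that “CPS” bursts three times comes from the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_bursts(
    bursts: Iterable[Tuple[str, float]], k: int = 15
) -> List[Tuple[str, float]]:
    """Sum burst weights per term, then return the k terms with the largest totals."""
    totals: Dict[str, float] = defaultdict(float)
    for term, weight in bursts:
        totals[term] += weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical weights: "CPS" bursts three times, the placeholders once each.
example = [
    ("CPS", 4.0), ("CPS", 3.0), ("CPS", 2.5),
    ("Placeholder X", 8.0),
    ("Placeholder Y", 1.0),
]
ranked = top_bursts(example, k=2)
print(ranked)
```

Here the three moderate “CPS” bursts together outrank a single stronger burst, which is the effect that summing has on the top-15 list.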

Key authors and collaboration networks

The original dataset has 30,784 unique authors. There are 2,363 authors with more than three papers, 1,531 authors with more than four papers, and 1,096 with more than five papers. There are 621 isolates, that is, authors who did not co-author with anyone during the 20 years. There are 96,982 co-author edges. The average degree is 6.30. The network has 3,255 weakly connected components, including the 621 isolates. The largest connected component consists of 18,545 nodes. The network was filtered by times cited ≥ 50, resulting in 2,644 nodes and 10,144 edges. The largest connected component of this network, shown in Fig 7 (middle), has 635 authors with 30 isolates removed. The figure uses the same size- and color-coding as the co-author network for AI (Fig 7 top). The co-author labels are filtered by the number of times an author was cited (≥ 700). Author ‘Menon, Mani’ has the largest number of citations (3,890) in this network. Menon is also the top-cited US author in the full network of 30,784 co-authors. Seven other authors (marked in bold in the Fig 7 middle table) likewise appear both in the largest connected component and among the top-cited US authors.
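
The filtering pipeline described above (keep authors at or above a times-cited threshold, drop isolates, and extract the largest connected component) can be sketched with a breadth-first search. This is a generic illustration under assumed data structures; the function name `largest_component` and the toy authors, citation counts, and edges are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def largest_component(
    times_cited: Dict[str, int],
    edges: Iterable[Tuple[str, str]],
    min_cited: int,
) -> Set[str]:
    """Largest connected component among authors cited at least min_cited times."""
    keep = {author for author, count in times_cited.items() if count >= min_cited}
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u in keep and v in keep:  # an edge survives only if both authors do
            adjacency[u].add(v)
            adjacency[v].add(u)

    best: Set[str] = set()
    seen: Set[str] = set()
    # Authors without surviving edges (isolates) never enter `adjacency`.
    for start in list(adjacency):
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy data.
cited = {"A": 120, "B": 80, "C": 60, "D": 10, "E": 55, "F": 90}
coauthor_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
core = largest_component(cited, coauthor_edges, min_cited=50)
print(sorted(core))
```

In the toy data, removing the lightly cited author "D" splits one chain of six authors into two components, which illustrates why the filtered component is much smaller than the unfiltered one.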

Co-author US map

Fig 8 (middle left) shows the co-author network for robotics, with nodes representing the number of citations and a darker hue indicating the first year of publication for a given author. The network shows a large concentration in the eastern states and in the mid-US. Pittsburgh and Cambridge are the top-two cities, with robotics research cited 6,486 and 5,672 times, respectively (see Fig 8, middle right).

Topical evolution

Electrical Engineering & Computer Science has been the front-runner discipline in robotics for two decades in both publication and citation counts. Fig 9 (middle) also shows that health-related disciplines (e.g., Brain Research, Health Specialties) increased over time. Similar to AI, robotics shows a steady increase within each of the 15 disciplines (see also Fig 2 in S1 Appendix). The top-five most cited papers, along with their publication year and total number of citations, are shown in Table 4 in S1 Appendix.

Internet of things (IoT)

The term internet of things (IoT) refers to the network of physical devices such as phones, vehicles, or home appliances that have embedded electronics, software, sensors, actuators, and connectivity allowing them to collect, exchange, and act upon data. The dataset used here starts in 2006. Hence, there are no bursts or authors active before that year.

WOS-top organizations and funding

The top-10 funding organizations and the top-10 research organizations are given in Fig 6 (bottom right). As can be seen, funding from Chinese institutions is most often acknowledged, and three out of the top-10 research institutions are from China. NSF in the US ranks second and two US institutions are listed in the top-10 list. Funding by two European institutions is cited frequently, and four top-10 research institutions are from France.

Burst of activity

In the 348 NSF awards, there are 77 bursts in total, with no term bursting more than once. The top-15 are shown in Fig 6 (bottom left). Similarly, for the 11,371 WOS publications there are 77 bursts in total with no double bursts. No term bursts in both sets. Burst weight is indicated by bar thickness, with “RFID” (Radio Frequency Identification) having the strongest burst of 62.2 in 2006-2013. Notably, there is a clear separation between an initial set of publication bursts, followed by several funding bursts, followed by a new set of publication bursts. The AI and robotics bursts were much more intermixed. Furthermore, the strongest burst weights differ considerably between NSF and WOS. The strongest bursting term for WOS was “RFID” (62.16), whereas the strongest NSF burst, “Vehicles,” had a much smaller weight (4.26).

Key authors and collaboration networks

The original IoT dataset has 23,691 nodes and 56,937 edges. In this network, there are 6,979 authors with more than three papers and 5,517 authors with more than five papers. The network has 3,345 weakly connected components, including 506 isolates. The largest connected component consists of 11,438 nodes. After filtering by times cited ≥ 5, the network has 6,939 nodes and 12,371 edges; its largest connected component, with 585 authors (98 isolates removed), is plotted in Fig 7 (bottom). The figure uses the same size- and color-coding as the co-author network for AI (Fig 7 top). The labels are filtered by times cited (≥ 400). The most cited author is Xu Ld with 3,358 citations in the filtered network.

Co-author US map

Fig 8 (bottom left) shows the co-author network for IoT. A lighter hue indicates more recent first publications by authors in this network. Kalamazoo and Norfolk are the top-two cities, with IoT research cited 902 and 635 times, respectively (see Fig 8, bottom right).

Topical evolution

The topical distribution of WOS publications on IoT is shown in Fig 9. Note that only seven papers were published in 1998-2007. The other 11,364 journal papers were published in the most recent decade. Most of the papers are in the “EE & CS” discipline, with some in “Chemical, Mechanical and Civil Engineering.” Note the large number of papers in “EE & CS” published in venues such as Wireless Personal Communications (WIRELESS PERS COMMUN) (74 papers) and International Journal of Distributed Sensor Networks (INT J DISTRIB SENS N) (58 papers), which deal with personal and complex implications of IoT. IoT also widens its topical coverage from three disciplines (1998-2007) to 13 disciplines (2008-2017), which supports the view that IoT is a nascent field (see also Fig 3 in S1 Appendix). The top-five most cited papers, along with their publication year and total number of citations, are shown in Table 4 in S1 Appendix.
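
Counting how many disciplines a topic covers in each time slice amounts to mapping each paper's journal to a discipline and collecting the distinct disciplines per slice. The sketch below assumes a simple journal-to-discipline lookup; the function name `coverage_by_slice`, the publication years, and the third journal are hypothetical, while the two named journals and their “EE & CS” assignment follow the text.

```python
from typing import Dict, Iterable, List, Set, Tuple

Slice = Tuple[int, int]  # inclusive (start_year, end_year)


def coverage_by_slice(
    papers: Iterable[Tuple[int, str]],
    journal_to_discipline: Dict[str, str],
    slices: List[Slice],
) -> Dict[Slice, Set[str]]:
    """Collect the distinct disciplines covered in each time slice."""
    coverage: Dict[Slice, Set[str]] = {s: set() for s in slices}
    for year, journal in papers:
        discipline = journal_to_discipline.get(journal)
        if discipline is None:
            continue  # journal not present in the lookup
        for start, end in slices:
            if start <= year <= end:
                coverage[(start, end)].add(discipline)
    return coverage


# Hypothetical lookup and (year, journal) records.
lookup = {
    "Wireless Personal Communications": "EE & CS",
    "International Journal of Distributed Sensor Networks": "EE & CS",
    "Hypothetical Civil Engineering Journal": "Chemical, Mechanical, & Civil Engineering",
}
iot_papers = [
    (2006, "Wireless Personal Communications"),
    (2012, "Wireless Personal Communications"),
    (2015, "International Journal of Distributed Sensor Networks"),
    (2016, "Hypothetical Civil Engineering Journal"),
]
slices = [(1998, 2007), (2008, 2017)]
covered = coverage_by_slice(iot_papers, lookup, slices)
counts = {s: len(d) for s, d in covered.items()}
print(counts)
```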

Convergence

Over the last 20 years, the three areas of “artificial intelligence,” “IoT,” and “robotics” have been merging. That is, there are more and more publications, funding awards, and keywords that are shared between pairs or even among all three of these areas. Fig 10 shows the increase in inter-citation linkages. Citations from papers in AI to papers in robotics and IoT are given in yellow; arrows are thickness-coded by the number of citations. Note that early arrows are rather thin while more recent citation links are thicker. As expected, only papers from earlier or the same year can be cited (i.e., arrows either point downwards or down-left). Citations from papers in robotics are given in red and many cite papers in AI.

Fig 10: Temporal convergence between AI (yellow), robotics (red), IoT (blue).

Citations from papers in IoT are given in blue, and they go back to AI and robotics papers as early as 2000. Because the IoT dataset only covers papers published in 2004-2017, IoT papers only start citing papers from other domains in more recent years (2012-2017), citing robotics particularly heavily in 2013 and 2015 and AI in 2009. Notably, there is a spike in NSF awards for IoT research during those years (see Fig 1B).
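
The inter-citation linkages behind Fig 10 can be thought of as counts of citations between areas, keyed by the citing and cited years; the count would then drive arrow thickness. The following is a minimal sketch under that assumption. The function name `cross_area_links` and the sample citation records are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# (citing_area, citing_year, cited_area, cited_year)
Citation = Tuple[str, int, str, int]


def cross_area_links(citations: Iterable[Citation]) -> Counter:
    """Count citations between different areas, keyed by areas and years."""
    links: Counter = Counter()
    for citing_area, citing_year, cited_area, cited_year in citations:
        if citing_area == cited_area:
            continue  # only links between areas indicate convergence
        if cited_year > citing_year:
            continue  # only papers from the same or an earlier year can be cited
        links[(citing_area, citing_year, cited_area, cited_year)] += 1
    return links


# Hypothetical citation records.
sample = [
    ("IoT", 2013, "robotics", 2000),
    ("IoT", 2013, "robotics", 2000),
    ("robotics", 2010, "AI", 2005),
    ("AI", 2012, "AI", 2008),        # same area, ignored
    ("AI", 2011, "robotics", 2014),  # cited year later than citing year, ignored
]
links = cross_area_links(sample)
print(links.most_common())
```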

Validation: User Studies

Decision makers from a variety of areas at NSWC Crane were invited to examine and help interpret initial versions of the visualizations related to AI in terms of readability, memorability, reproducibility, and utility.

This user study was administered as an online survey delivered via Qualtrics, which took participants 30-50 minutes to complete (see S1 Appendix for the data instrument). Participants were presented with five visualizations on the topic of artificial intelligence. They were asked to complete tasks demonstrating their ability to interpret the visualizations and to provide feedback on the utility of the visualizations in their particular line of work. Feedback collected during the user studies helped optimize algorithm and user interface implementations and improve documentation of the results. Three visualizations are relevant for the work presented here.

The first visualization is a burst diagram, showing bursts of terms in funding awards and in publications. Burst diagrams can be particularly useful for understanding temporal relationships between funding and publication streams. Users speculated that burst rates may be tied to larger economic conditions, which affect funding streams and R&D investment. The visualization can also confirm strategic direction and identify subtle shifts in focus. For example, one user described how the “earlier burst was related more to neural networks, algorithms, and knowledge/expert systems. The recent burst seems to be related to large datasets, computer vision, and deep learning (more “big data” topics).” One can gain insight into areas that are receiving funding now, and may therefore see research advances in the future, which can help drive the development of proposals that will be relevant to granting organizations.

The second visualization showed top-10 subnetworks with the largest number of authorships. Users noted that the ability to understand which researchers are most prolific and which are working across disciplines was valuable. In the words of one user, “having an understanding of major authors in each area and how they interrelate allows me to determine who to work with in a given topic area and who may have a larger breadth of knowledge.” The visualization could be used to identify potential collaborators as well as to deepen a general understanding of how researchers in this topic area are related.

The third visualization illustrated topical evolution visualized on a map of science. Users felt this map had the least direct application to their daily work. When asked to identify which visualizations were most relevant for their work, study participants identified both burst analysis and key authors and collaboration networks as highly relevant. This is consistent with results from the stakeholder needs analysis. Table 3 summarizes the feedback on the utility of each visualization for strategic planning, building potential partnerships, setting research agendas, determining national importance, and making hiring decisions.

Topic | Top Organization | Top Agencies | Burst of Terms | Co-author Network | Topical Evolution
Strategic planning | | | | |
Potential partnership | | | | |
Research | | | | |
National importance | | | | |
Hiring | | | | |
Table 3: Summary from the expert qualitative opinions for three types of visualizations.

Discussion and Outlook

This project used large-scale publication and funding data to support the analysis and visualization of key experts, institutions, publications, and funding in strategic areas. Results of the study can be used to identify leading experts and potential investigators or reviewers; to identify emerging or declining areas; and to understand the roles of other funding agencies or countries in various topic areas.

A stakeholder needs analysis identified key insight needs together with strategic areas of interest. A detailed analysis of topic bursts in publication and funding data; international research and funding organizations; major authors in the US; and disciplinary evolution was then performed for all three areas. Results were validated through feedback from professionals working in this area. Novel algorithms such as the double-burst visualizations led to actionable insights for experts who manage and evaluate research portfolios and seek to understand the evolution of research areas.

Going forward, we are interested in exploring additional datasets, such as Federal Business Opportunities (FBO), that would make it possible to gain a more comprehensive understanding of the S&T funding landscape.

Experts that participated in the survey and user study expressed a strong interest in interactive data visualizations that make it possible to zoom into specific subareas or to retrieve details on all authors or papers or funding awards. While technically feasible, a visual interface to near real-time datasets would require a different budget than what was available here.

Acknowledgments

The authors would like to thank Joseph Brightbill from Clarivate Analytics Web of Science for providing WOS accession numbers. The authors also thank Perla Brown for design support, Tenzin Choeden and Jimmy Huang for excellent research assistance and Todd Theriault for copyediting. This work uses Web of Science data by Clarivate Analytics provided by the Indiana University Network Science Institute. User studies were conducted under IRB protocol number 1809442778.

Supporting information

S1 Appendix.

This appendix contains Tables 1-5 and Figs 1-3.

S1 File.

This file contains the NSF Funding/Organizations Summary for AI, robotics, and IoT by country and type of funding.