Patent articles contain important research results that are valuable to industry, academia, business, and policy-making organizations. Patented technologies yield novel and industrially usable products that enhance an industry’s competitive edge. Therefore, industry giants spend extensively on research activities to retain and increase their competitive advantages in the corresponding technology groups. Multiple previous works have shown that R&D outcomes are important assets for any industry giant (Hall, 1999; Hall et al., 2007). The World Intellectual Property Organization (WIPO) reports that nearly 90–95% of the world’s R&D outcomes are covered in patent publications. Only the remaining 5–10% are included in the scientific literature in the form of essays and publications (Liu and Yang, 2008). Therefore, it is crucial to analyze patent information to understand industrial trends and to compare the research growth of several industries in the same technology group.
In this work, we study the effect of research outcomes, in the form of patents, on the revenue generation of top US industry giants. In contrast to previous works, we correlate patenting activities with the Fortune 500 ranks of companies instead of total revenue values (considering ranks instead of revenue values helps to avoid year-wise dollar inflation rates, global economic trends, etc.). The Fortune 500 is an annual ranked list of the 500 top-most United States corporations published by Fortune magazine. The ranking is based on the total revenue generated in each fiscal year. The list includes public and private US corporations whose revenues are publicly available, from the manufacturing, mining, energy exploration, banking, life insurance, retail, transportation, information technology and service domains. The current list is available at: http://fortune.com/fortune500/list/.
Limitations of the existing works: The existing works have several limitations. First, most of them study patents granted before the 1990s, so they do not consider recent data. Second, existing systems do not consider inter-industry research competition. Third, most of these studies do not take into account the age and field expertise of the companies. Fourth, existing systems use crude revenue values that should not be compared directly owing to year-wise dollar inflation rates, global economic trends, etc.
Our contributions: We address some of the above limitations in this work by introducing temporal buckets that group together companies based on their foundation year and present a thorough correlation study between patenting behavior and performance.
Towards this objective, we make the following contributions.
We downloaded a massive patent dataset consisting of more than 2.6 million full-text patent articles with nearly 94 million citations from the Reed Technology Index.
We invested extensive manual effort in extraction, cleaning, indexing, and other related preprocessing steps.
We conduct a rigorous empirical study on this manually curated dataset by first dividing it into buckets based on the company foundation year. Subsequently, we conduct extensive experiments to identify the correlations between the patenting dynamics of companies and their ranks.
As a next step, we dig deeper and identify temporal rank-shifts of the companies and show that they also correlate well with company R&D activities.
Finally, we further identify that inter-industry citations representing competition could lead to decay/rise in the overall growth of the companies.
2. Related Work
A considerable amount of literature has been published to better understand several parameters such as patent citations, the number of patent applications, and the number of patent grants. The first serious discussion and analysis of patent data emerged during the 1990s (Narin, 1994). For better organization, we broadly divide the related work into four subparts:
General patent analysis: Derek De Solla Price (de Solla Price, 1969) first showed the existence of a high positive correlation between a country’s scientific output (measured in terms of the number of research articles) and its gross domestic product (GDP). Two decades later, Narin (Narin, 1994) presented evidence of similarity between literature bibliometrics and patent bibliometrics, and showed that the number of granted patents from a country correlates positively with its GDP. James Bessen (Bessen, 2008) found that patents issued to small patentees are much less valuable than those issued to large corporations. Sampat and Ziedonis (Sampat and Ziedonis, 2005) found that citations to research papers are significantly related to the probability that a patent is licensed, but not to revenues conditional upon licensing. Daim et al. (Daim et al., 2006) produced forecasts for three emerging technology areas, namely fuel cell, food safety and optical storage technologies, by using bibliometrics and patent analysis.
has presented an overview of patenting ethics. They present real examples to show how certain patenting activities, such as broad claims and poor disclosures, badly affect social benefits and delay technological advancement. However, recent trends in patenting activities have led to more openness, and fierce competition has resulted in overall technological advancement and an inherent benefit to society. In 2014, Tesla opened all of its electric vehicle technology patents to accelerate the advent of sustainable transport and to support the open-source movement (tes, 2014).
Industry R&D and market value: Bronwyn H. Hall and colleagues conducted several interesting studies to understand market value and patenting output (Hall, 1999; Hall et al., 2005; Hall and Oriani, 2006; Hall et al., 2007, 2010). They showed that the market value of manufacturing corporations is strongly related to their knowledge assets, and that patenting activities capture this information well (Hall, 1999). In their later work (Hall et al., 2005), they studied patents and citations between 1963–1995 and claimed that an extra citation per patent boosts market value by 3%. They showed that, for European corporations as well, a firm’s Tobin’s q, defined as the ratio of market value to the replacement value of the firm’s physical assets, is positively and significantly associated with R&D and patent stocks (Hall et al., 2007). In their survey work (Hall et al., 2010), they observed that private returns to R&D are strongly positive and higher than those for ordinary capital. Toivanen et al. (Toivanen et al., 2002) studied the relationship between innovation and the market value of UK firms over a seven-year period (1989–1995). Hsu et al. (Hsu et al., 2014) showed that industries that are more dependent on external finance and that are more high-tech intensive exhibit a disproportionately higher innovation level in countries with better developed equity markets.
Relating revenue and patenting activities: Not much is known about the correlation between Fortune 500 ranks and patenting activities. Wang et al. (Xianwen et al., 2010) combined social network analysis with the patent co-citation network of Fortune 500 companies to evaluate the technology level of an enterprise and also identified its core technical competitive power. In their later work (Wang et al., 2011), they identified several technology groups based on the co-citation networks. They also studied the relationship between leading companies and technology groups. Zhu (Zhu, 2000) proposed several diverse measures for characterizing the financial performance of the Fortune 500 companies. Strikingly, that study found that only about 3% of the companies were operating on the best-practice frontier.
Unlike most of the previous studies, this paper differs in three aspects – (i) we present a correlation study between the temporal ranks (instead of crude revenue values) of companies and their patenting outputs, (ii) we meticulously overcome the bias of experience/age by introducing three temporal buckets, and (iii) as opposed to previous works, we present empirical evidence that inter-company in-citations correlate well with the rise/fall in ranks.
We compile two datasets for the current study.
Patent dataset: We construct a structured patent dataset by crawling full text articles indexed in Reed Technology Index (ree, 2018). The compiled patent dataset consists of patent metadata, such as the unique patent identifier (assigned after the patent granting process), the application year, the grant year, the patent title, the applicants’ name, the company name, etc., along with patent bibliography, such as the patent citations, the non-patent citations including scientific citations, urls, blogs, white papers, etc. Table 1 presents the detailed description of the compiled dataset. The processed dataset is available at: https://github.com/mayank4490/Innovation-and-revenue.
|Grant year range||2005–2017|
|Application year range||1965–2016|
|Number of citations||93,938,858|
|Number of patent citations||75,118,567|
|Number of non-patent citations||18,820,291|
F500 dataset: We compile the Fortune 500 rank lists published between 2005–2017. The major challenge in the compilation process was to normalize the company names present in different lists. Figure 1 shows the decay in the number of common companies as newer yearly rank lists are taken into consideration. Overall, we find 201 companies that are present across all the rank lists.
4. 50-year temporal buckets
In this section, we detail the construction procedure of the 50-year temporal buckets.
Filtering: Out of the 201 companies in the dataset, only 72 companies have at least 100 patents granted between 2005–2017. We also discard a few very old (before 1850) and new (after 2000) companies to remove “corner” cases. Overall, we find 68 companies that satisfy all the above criteria. The rest of the paper presents all the experiments on these 68 companies.
Bucketing: We divide these 68 companies into three 50-year temporal buckets to perform the subsequent experiments. The first bucket (bucket I) consists of companies founded between 1851–1900. Buckets II and III consist of companies founded between 1901–1950 and 1951–2000 respectively. The proposed bucketing scheme eliminates the normalization efforts needed to accommodate the company age. Buckets I, II and III consist of 16, 29 and 23 companies respectively (see Table 2). Bucket I mostly consists of consumer product companies, while bucket III mostly comprises information technology companies. Bucket II consists of a mixture of several groups including consumer product, information technology and automobile companies.
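The filtering and bucketing steps can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout `(name, foundation_year, patent_count)`, the function names, and the patent counts in `sample` are assumptions made for the example; only the bucket boundaries, the 100-patent threshold, and the foundation years of the three real companies come from the text and Table 2.

```python
from collections import defaultdict

# Bucket boundaries as stated in the text (inclusive foundation-year ranges).
BUCKETS = {"I": (1851, 1900), "II": (1901, 1950), "III": (1951, 2000)}
MIN_PATENTS = 100  # filtering threshold stated in the text


def assign_bucket(foundation_year):
    """Return the bucket label for a foundation year, or None if outside 1851-2000."""
    for label, (start, end) in BUCKETS.items():
        if start <= foundation_year <= end:
            return label
    return None


def build_buckets(companies):
    """Group companies into buckets after applying both filters.

    `companies` is an iterable of (name, foundation_year, patent_count)
    tuples; this record layout is an assumption made for illustration.
    """
    buckets = defaultdict(list)
    for name, year, patent_count in companies:
        label = assign_bucket(year)
        if patent_count >= MIN_PATENTS and label is not None:
            buckets[label].append(name)
    return dict(buckets)


# Foundation years are from Table 2; the patent counts are made-up placeholders.
sample = [
    ("Corning", 1851, 500),
    ("IBM", 1911, 500),
    ("Oracle", 1977, 500),
    ("Hypothetical OldCo", 1840, 500),   # dropped: founded before 1851
    ("Hypothetical SmallCo", 1960, 12),  # dropped: fewer than 100 patents
]
print(build_buckets(sample))
```

Because each company is compared only with others in its own 50-year range, no explicit age normalization term is needed in the later correlations.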
|Bucket I||Bucket II||Bucket III|
|Company name||Foundation year||Company name||Foundation year||Company name||Foundation year|
|Corning||1851||Archer Daniels Midland||1902||Comcast||1963|
|General Mills||1856||Ford Motor||1903||Nike||1964|
|Kimberly Clark||1872||Harley Davidson||1903||Applied Materials||1967|
|Conocophillips||1875||Rockwell Automation||1903||Quest Diagnostics||1967|
|PPG Industries||1883||Kellogg||1906||First Data||1971|
|Johnson & Johnson||1886||Baker Hughes||1907||Oracle||1977|
|Bristol Myers Squibb||1887||General Motors||1908||Micron Technology||1978|
|Abbott Laboratories||1888||IBM||1911||Boston Scientific||1979|
|General Electric||1892||Illinois Tool Works||1912||Cisco Systems||1984|
|General Dynamics||1899||Cummins||1919||Capital One Financial||1988|
| || ||Eastman Chemical||1920||Time Warner||1990|
| || ||Texas Instruments||1930||Verizon Communications||2000|
Name normalization: We find huge variations in the company names within the patent metadata. These variations arise from the organizational hierarchy of a company, such as research labs in different geographic locations, multiple technology teams, collaborations, etc. We also find a large number of variations resulting from spelling errors, acronyms, etc. Table 3 shows one representative example. We manually normalize the different names of all the 68 companies that we experiment on. AT&T has the maximum number of unnormalized variations (95 in total).
|Normalized name||Unnormalized variations|
|Stryker||stryker development llc, stryker biotech, stryker canadian management, stryker combo l l c, stryker coropration, stryker endoscopy, stryker ireland, stryker endo, stryker european holdings i llc, stryker france, stryker gi services c v, stryker stryker gi, stryker, stryker instruments stryker leibinger gmbh co kg, stryker leibinger gmbh co kg, stryker nv operations, stryker ortho pedics, stryker orthopaedics, safe orthopaedics, stryker puerto rico, stryker trauma ag, stryker trauma gmbh, stryker truama s a, stryker trauma sa, stryker trauma s a, stryker spine, styker spine|
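Once a manual mapping like the one in Table 3 exists, applying it can be sketched as a simple lookup. The snippet below is illustrative only: the dictionary `VARIATION_TO_NAME`, the helper names, and the cleaning rule (lowercasing and replacing punctuation with spaces) are assumptions; the source states only that the normalization was done manually. The four keys are variations listed in Table 3.

```python
import re

# Hypothetical hand-curated lookup: cleaned variation -> normalized name.
# The keys below are variations listed in Table 3.
VARIATION_TO_NAME = {
    "stryker biotech": "Stryker",
    "stryker coropration": "Stryker",  # misspelling as it appears in the data
    "stryker trauma s a": "Stryker",
    "styker spine": "Stryker",
}


def clean(raw_name):
    """Lowercase the name and replace runs of non-alphanumerics with one space."""
    return re.sub(r"[^a-z0-9]+", " ", raw_name.lower()).strip()


def normalize(raw_name):
    """Return the normalized company name, or None when the name is unmapped."""
    return VARIATION_TO_NAME.get(clean(raw_name))


print(normalize("Stryker Trauma S.A."))
print(normalize("STYKER SPINE"))
print(normalize("Unknown Assignee Inc."))
```

Returning `None` for unmapped names keeps unknown assignees visible for manual review instead of silently guessing a match.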
5. Correlating temporal ranks and patenting patterns
5.1. Experimental setup
We employ the standard Pearson correlation coefficient (Lee Rodgers and Nicewander, 1988) to compute the correlation between companies’ patenting activity parameters (e.g., grant count, application count, citation count, etc.) and the respective ranks. The next four experiments are categorized into two sets as follows:
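For concreteness, Pearson's coefficient between a list of patenting counts and a list of ranks can be computed as in the sketch below. The function name and the toy numbers are made up for illustration. Note that a numerically smaller rank is better, so raw ranks are anti-correlated with counts whenever better-ranked companies patent more; the text does not state which sign convention is used for the reported values.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values: more grants go with a numerically smaller (better) rank.
grant_counts = [100, 200, 300, 400]
ranks = [400, 300, 200, 100]
print(pearson(grant_counts, ranks))
```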
5.2. Effect of patenting on F500 ranks
In this experiment, we measure the correlation between the ‘current’ patent grant count of a company and its ranks in the next five years (denoted by ). For the 68 companies in our list, we draw the ‘current’ grant count from seven different years (2005–2011), call these ‘start’ years, and estimate the correlation of each of these start years with the ranks of the next five years. Figure 2 illustrates the average correlation over the seven different start years. Each bucket shows a different temporal characteristic.
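The start-year averaging described above can be sketched as follows. This is one possible reading, not the authors' code: it assumes that "the next five years" means lags of 1 to 5 years after the start year, and that the data are stored as `company -> {year: value}` dictionaries. The function names and toy numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient (assumes non-constant inputs)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_correlations(grants, ranks, start_years, max_lag=5):
    """For each lag in 1..max_lag, correlate start-year grant counts with the
    ranks `lag` years later, then average the result over all start years.

    `grants` and `ranks` map company -> {year: value} (an assumed layout).
    """
    companies = sorted(grants)
    averaged = {}
    for lag in range(1, max_lag + 1):
        per_start = []
        for start in start_years:
            xs = [grants[c][start] for c in companies]
            ys = [ranks[c][start + lag] for c in companies]
            per_start.append(pearson(xs, ys))
        averaged[lag] = sum(per_start) / len(per_start)
    return averaged


# Made-up toy data for three hypothetical companies.
grants = {"A": {2005: 10, 2006: 12}, "B": {2005: 20, 2006: 25}, "C": {2005: 30, 2006: 33}}
ranks = {
    "A": {2006: 300, 2007: 310, 2008: 290},
    "B": {2006: 200, 2007: 190, 2008: 210},
    "C": {2006: 100, 2007: 90, 2008: 120},
}
result = lagged_correlations(grants, ranks, start_years=[2005, 2006], max_lag=2)
print(result)
```

With the paper's setting, `start_years` would be `range(2005, 2012)` and `max_lag` would be 5, so the latest rank used is from 2016. Swapping the roles of the two dictionaries gives the reverse experiment of Section 5.3.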
Key observations: We observe higher correlation values for bucket I compared to the buckets II and III. Bucket I companies’ future revenue is therefore heavily dependent on the current patenting volume. For all the three buckets, an overall positive correlation indicates that companies with higher patenting volume tend to garner higher revenues. A further interesting point is that for bucket I companies, the effect of current patenting volume is more pronounced on the revenue garnered in the later years demonstrated by the overall increase in the correlation value. However, the correlation seems to remain stable for the two other buckets.
More experiments: We take a step further and report, in Figure 3, the correlation values for each of the seven start years separately. Similar to our earlier observations, for each start year, we find a higher correlation for bucket I as compared to buckets II and III. Bucket I shows that the correlation is above 0.8 for the majority (4 out of 7) of the start years at . This leads us to claim that for the bucket I companies, the patenting volume affects the ranks more sharply in the long run.
In Figure 3, bucket II shows an interesting trend. Initial start years exhibit a higher correlation as compared to subsequent start years. Interestingly, for the last two start years (2010 and 2011) the correlation is quite low for all the different . This leads us to conclude that the dependence of company revenue on the patenting volume is on a steady decline for the bucket II companies.
Lastly, bucket III shows the same trend for every individual value of . The two key observations here are: (i) the correlation remains invariant for different values, and (ii) as opposed to bucket II, the initial start years exhibit a significantly lower correlation than the subsequent start years. Therefore, for this bucket, as time progresses, there is a steady rise in the influence of patenting volume on the future ranks.
5.3. Effect of F500 ranks on patenting
In this section, we perform the reverse experiment. We correlate the ‘current’ rank of the companies with their respective patent application counts in the next five years (denoted by ). Once again, for the 68 companies in our list, we draw the ‘current’ ranks from seven different years (2005–2011), call these ‘start’ years, and estimate the correlation of each of these start years with the respective patent application counts in the next five years. Figure 4 illustrates this correlation, averaged over the seven start years (2005–2011). Similar to Figure 2, here also, each bucket shows a different temporal characteristic.
Key observations: We observe higher correlation values for bucket I compared to buckets II and III. For bucket I companies, current revenue seems to strongly drive future patenting volume. Companies with better ranks tend to produce higher overall research output and vice-versa. In Figure 5, we present correlation for each start year separately. All the three buckets exhibit a low correlation during the “global recession” period (2007–2009) (wik, 2018).
5.4. Effect of incoming citations on F500 ranks
This experiment is similar to the one outlined in Section 5.2 except that the grant count is replaced by the overall incoming citations to all the patents produced by the company. Interestingly, we observe very similar trends as noted in Section 5.2 (figure not shown). We observe higher correlation values for bucket I compared to the buckets II and III. Bucket I companies’ future revenue is therefore heavily dependent on the current incoming citation volume.
5.5. Effect of F500 ranks on incoming citations
This experiment is similar to the one discussed in Section 5.3 except that here the next five year application count is replaced by the next five year incoming citations to all the patents produced by the company. The results exhibit very similar trends to those in Section 5.3 (figure not shown). We observe higher correlation values for bucket I compared to buckets II and III. For bucket I companies, current revenue strongly drives future incoming citation volume.
5.6. Possible explanations
In this section, we attempt to explain the overall observations that we made in the last four sections. In particular, for each bucket, we study the incoming citations from the other two buckets between 2005–2017. Figure 6 shows the year-wise proportion of inter-bucket incoming citations. As can be noted, bucket I receives a marginal number of incoming citations from both buckets II and III. Bucket II receives fewer incoming citations from bucket I but a considerably large number of incoming citations from bucket III. Bucket III, on the other hand, receives fewer incoming citations from bucket I and a moderate volume of incoming citations from bucket II. A crucial point to stress here is that the volume of incoming citations from bucket III to bucket II is much larger than in the other direction (i.e., bucket II to bucket III). We term this a form of knowledge stealing, i.e., bucket III is able to ‘steal’ many more novel ideas from bucket II and build upon them than the other way round. Bucket I is self-sustained and witnesses the least competition; it neither cites nor receives a high volume of incoming citations from the other two buckets. Note that Figure 6 does not show the proportion of self-citations and citations coming from the rest of the companies not considered in this study. A natural analogy is that the bucket I companies behave like ‘cocoons’, the bucket II companies behave like ‘larva’ and the bucket III companies behave like ‘butterflies’.
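The year-wise inter-bucket proportions can be sketched as below. Several choices here are assumptions, not the authors' stated method: citations are given as `(year, citing_company, cited_company)` tuples, citations within the same bucket and citations involving companies outside the study are skipped, and each proportion is normalized over the remaining inter-bucket citations only. The source does not specify the denominator used in Figure 6.

```python
from collections import Counter, defaultdict


def inter_bucket_proportions(citations, bucket_of):
    """Yearly share of each cited bucket's incoming citations, by citing bucket.

    `citations` is an iterable of (year, citing_company, cited_company);
    `bucket_of` maps company -> bucket label. Both layouts are assumptions.
    Same-bucket citations and companies missing from `bucket_of` are skipped.
    """
    counts = defaultdict(Counter)  # (year, cited_bucket) -> Counter of citing buckets
    for year, citing, cited in citations:
        src, dst = bucket_of.get(citing), bucket_of.get(cited)
        if src is None or dst is None or src == dst:
            continue
        counts[(year, dst)][src] += 1
    return {
        key: {src: n / sum(counter.values()) for src, n in counter.items()}
        for key, counter in counts.items()
    }


# Bucket labels follow Table 2; the citation records are made up.
bucket_of = {"Corning": "I", "IBM": "II", "Oracle": "III", "Cisco Systems": "III"}
citations = [
    (2010, "Oracle", "IBM"),
    (2010, "Cisco Systems", "IBM"),
    (2010, "Corning", "IBM"),
    (2010, "IBM", "Oracle"),
    (2010, "IBM", "IBM"),                   # skipped: same bucket
    (2010, "Hypothetical Startup", "IBM"),  # skipped: not in the study
]
shares = inter_bucket_proportions(citations, bucket_of)
print(shares)
```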
We, next, select two representative companies from each of the three buckets and list the top 10 highly citing companies (see Table 4). Once again, it is apparent that a considerable fraction of incoming citations to bucket II companies arrive from bucket III. In contrast, bucket I companies have most of the incoming citations coming from the same bucket itself.
|Bucket I: the cocoon|
|Johnson & Johnson||Pepsico|
|Johnson & Johnson (14.3)||The Coca-Cola (27.5)|
|Abbott Laboratories (11.0)||Pepsico (17.9)|
|Novartis (10.7)||Meadwestvaco (8.3)|
|Brien Holden Vision Inst. (6.9)||Concentrate Mfg. (2.5)|
|Pixeloptics (2.6)||Kimberly Clark (1.7)|
|Coopervision Int. (2.3)||Food Equipment Tech. (1.7)|
|Google (2.3)||Crestovo (1.7)|
|Mcneil (2.1)||Givaudan (1.7)|
|The Procter & Gamble (1.6)||Bunn-o-matic (1.7)|
|E-vision (1.5)||Starbucks (1.7)|
|Bucket II: the larva|
|IBM (23.5)||IBM (8.5)|
|Microsoft (6.2)||Hewlett Packard (6.5)|
|Google (2.2)||Semiconductor Energy Lab. (5.7)|
|Apple (1.9)||Microsoft (4.5)|
|Oracle (1.7)||Google (2.0)|
|Taiwan Semiconductor (1.4)||Qualcomm (1.9)|
|Intel (1.4)||Apple (1.7)|
|Tela Innovation (1.4)||Intel (1.6)|
|Micron Technology (1.3)||Canon (1.4)|
|Hewlett Packard (1.1)||Samsung (1.1)|
|Bucket III: the butterfly|
|Intel (14.3)||Microsoft (20.1)|
|IBM (9.0)||IBM (6.1)|
|Taiwan Semiconductor (3.7)||Apple (4.7)|
|Microsoft (3.1)||Google (4.0)|
|Micron Technology (2.5)||Oracle (1.4)|
|Qualcomm (2.2)||Amazon (1.4)|
|Samsung (1.7)||AT&T (1.2)|
|Apple (1.5)||Qualcomm (1.0)|
|United Microelectronics (1.5)||Samsung (0.9)|
|Google (1.2)||SAP (0.9)|
6. Temporal rank-shifts
In this section, we consider a 13-year time period to understand the shifts in the rank profiles. The 68 companies are classified into four broad categories based on their rank-shift profiles (we apply a 5-year moving average to smooth the rank-shift profiles):
MonInc: The rank of a company improves monotonically (becomes numerically lower and lower) over time. The revenue of these companies is therefore on the rise as time progresses (hence the name MonInc). At least 80% of the consecutive-year rank differences are positive.
MonDec: The rank of a company worsens monotonically (becomes numerically higher and higher) over time. The revenue of these companies is therefore going down as time progresses (hence the name MonDec). At least 80% of the consecutive-year rank differences are negative.
Stable: Rank of a company remains stable over the years. This classification is carried out in two steps; first, we compute 13-year average () of ranks. Next, we select companies having at least 80% year ranks between . We also experiment with other variations like , , etc. The one we have chosen is able to produce the most clear separation from the other categories.
Others: Remaining companies are kept in this category. This category includes companies with rank profiles having multiple crests and troughs over time.
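The four categories above can be sketched as a small classifier. This is a hedged illustration: the sign convention (difference taken as previous minus current, so that an improving rank gives a positive value), the order in which the tests are applied, the use of unsmoothed ranks for the Stable test, and the `band` half-width around the mean are all assumptions, since the text does not pin them down.

```python
def moving_average(values, window=5):
    """Simple moving average computed over full windows only."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def classify(ranks, band, window=5, share=0.8):
    """Classify a yearly rank series as MonInc, MonDec, Stable, or Others.

    `band` is a hypothetical half-width around the mean rank used for the
    Stable test; its value is not taken from the source.
    """
    smooth = moving_average(ranks, window)
    # Previous minus current: an improving (numerically falling) rank gives
    # a positive difference. This sign convention is an assumption.
    diffs = [prev - cur for prev, cur in zip(smooth, smooth[1:])]
    if sum(d > 0 for d in diffs) >= share * len(diffs):
        return "MonInc"
    if sum(d < 0 for d in diffs) >= share * len(diffs):
        return "MonDec"
    mean_rank = sum(ranks) / len(ranks)
    within = sum(abs(r - mean_rank) <= band for r in ranks)
    if within >= share * len(ranks):
        return "Stable"
    return "Others"


# Made-up 13-year rank series, one per category.
improving = [300, 280, 260, 240, 220, 200, 180, 160, 140, 120, 100, 80, 60]
worsening = list(reversed(improving))
steady = [100, 102, 100, 102, 100, 102, 100, 102, 100, 102, 100, 102, 100]
wavy = [100, 150, 200, 250, 300, 250, 200, 150, 100, 150, 200, 250, 300]

for series in (improving, worsening, steady, wavy):
    print(classify(series, band=5))
```

With 13 yearly ranks and a 5-year window, each profile has 9 smoothed points and therefore 8 consecutive differences, so the 80% threshold requires at least 7 differences of the same sign.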
Key observations: Overall, we find 14, 5, 23 and 26 companies in the MonInc, MonDec, Stable and Others categories respectively. Figure 7 shows the rank profiles of two representative companies from each category. Table 5 groups together companies from different buckets in each category. It uses a color scheme to represent each bucket – green for bucket I, red for bucket II and blue for bucket III. Interestingly, the majority of the companies in MonInc are from bucket III, i.e., the ‘butterfly’ companies. This observation once again reinforces our earlier claim that these companies are improving their ranks by drawing knowledge (i.e., ‘stealing’) from bucket II companies and effectively building newer and more innovative ideas on it. In contrast, MonDec consists mostly of bucket II companies, i.e., the ‘larva’ companies. This indicates that the bucket II companies are not able to draw and build upon the knowledge generated in the other buckets to improve their ranks. This might be a potential sign of such companies ‘drying up’ in the near future.
|Corning||Bristol Myers Squibb||Kimberly Clark|
|General Mills||Texas Instruments||Johnson Johnson|
|Stryker||Morgan Stanley||General Electric|
|Capital One Financial||Archer Daniels Midland|
|Micron Technology||Ford Motor|
We conduct the first plausible correlation study between research output and Fortune 500 ranks. An interesting future direction would be to automatically predict the future revenue of companies based on the correlations established here.
- tes (2014) 2014. All Our Patent Are Belong To You. https://www.tesla.com/blog/all-our-patent-are-belong-you. Accessed: 2018-12-28.
- wik (2018) 2018. Global recession. https://en.wikipedia.org/wiki/Global_recession. Accessed: 2018-12-28.
- ree (2018) 2018. USPTO Data Sets. http://patents.reedtech.com/. Accessed: 2018-12-28.
- Bessen (2008) James Bessen. 2008. The value of U.S. patents by owner and patent characteristics. Research Policy 37, 5 (2008), 932 – 945. https://doi.org/10.1016/j.respol.2008.02.005
- Daim et al. (2006) Tugrul U. Daim, Guillermo Rueda, Hilary Martin, and Pisek Gerdsri. 2006. Forecasting emerging technologies: Use of bibliometrics and patent analysis. Technological Forecasting and Social Change 73, 8 (2006), 981 – 1012. https://doi.org/10.1016/j.techfore.2006.04.004 Tech Mining: Exploiting Science and Technology Information Resources.
- de Solla Price (1969) Derek J de Solla Price. 1969. Measuring the size of science. Israel Academy of Science.
- Gifford (2004) Daniel J Gifford. 2004. How Do the Social Benefits and Costs of the Patent System Stack up in Pharmeceuticals. J. Intell. Prop. L. 12 (2004), 75.
- Hall (1999) Bronwyn H. Hall. 1999. Innovation and Market Value. Working Paper 6984. National Bureau of Economic Research. https://doi.org/10.3386/w6984
- Hall et al. (2005) Bronwyn H. Hall, Adam Jaffe, and Manuel Trajtenberg. 2005. Market Value and Patent Citations. The RAND Journal of Economics 36, 1 (2005), 16–38. http://www.jstor.org/stable/1593752
- Hall et al. (2010) Bronwyn H. Hall, Jacques Mairesse, and Pierre Mohnen. 2010. Chapter 24 - Measuring the Returns to R&D. In Handbook of the Economics of Innovation, Volume 2, Bronwyn H. Hall and Nathan Rosenberg (Eds.). Handbook of the Economics of Innovation, Vol. 2. North-Holland, 1033 – 1082. https://doi.org/10.1016/S0169-7218(10)02008-3
- Hall and Oriani (2006) Bronwyn H. Hall and Raffaele Oriani. 2006. Does the market value R&D investment by European firms? Evidence from a panel of manufacturing firms in France, Germany, and Italy. International Journal of Industrial Organization 24, 5 (2006), 971 – 993. https://doi.org/10.1016/j.ijindorg.2005.12.001
- Hall et al. (2007) Bronwyn H. Hall, Grid Thoma, and Salvatore Torrisi. 2007. The market value of patents and R&D: Evidence from European firms. In Academy of Management Proceedings, Vol. 2007. Academy of Management, 1–6.
- Hsu et al. (2014) Po-Hsuan Hsu, Xuan Tian, and Yan Xu. 2014. Financial development and innovation: Cross-country evidence. Journal of Financial Economics 112, 1 (2014), 116 – 135. https://doi.org/10.1016/j.jfineco.2013.12.002
- Lee Rodgers and Nicewander (1988) Joseph Lee Rodgers and W Alan Nicewander. 1988. Thirteen ways to look at the correlation coefficient. The American Statistician 42, 1 (1988), 59–66.
- Liu and Yang (2008) Chen-Yuan Liu and James Chingyu Yang. 2008. Decoding patent information using patent maps. Data Science Journal 7 (2008), 14–22.
- Narin (1994) F. Narin. 1994. Patent bibliometrics. Scientometrics 30, 1 (01 May 1994), 147–155. https://doi.org/10.1007/BF02017219
- Sampat and Ziedonis (2005) Bhaven N. Sampat and Arvids A. Ziedonis. 2005. Patent Citations and the Economic Value of Patents. Springer Netherlands, Dordrecht, 277–298. https://doi.org/10.1007/1-4020-2755-9_13
- Spławiński (2005) Jacek Spławiński. 2005. Patents and ethics: Is it possible to be balanced? Science and engineering ethics 11, 1 (2005), 71–74.
- Toivanen et al. (2002) Otto Toivanen, Paul Stoneman, and Derek Bosworth. 2002. Innovation and the Market Value of UK Firms, 1989–1995*. Oxford Bulletin of Economics and Statistics 64, 1 (2002), 39–61. https://doi.org/10.1111/1468-0084.00002
- Wang et al. (2011) Xianwen Wang, Xi Zhang, and Shenmeng Xu. 2011. Patent co-citation networks of Fortune 500 companies. Scientometrics 88, 3 (01 Sep 2011), 761–770. https://doi.org/10.1007/s11192-011-0414-x
- Xianwen et al. (2010) Wang Xianwen, Liu Zeyuan, and Hou Haiyan. 2010. Technology development and technology competition of enterprises based on patent co-citation analysis: A study on industrial enterprises of Fortune 500 [J]. Science Research Management 4 (2010), 017.
- Zhu (2000) Joe Zhu. 2000. Multi-factor performance measure model with an application to Fortune 500 companies. European Journal of Operational Research 123, 1 (2000), 105 – 124. https://doi.org/10.1016/S0377-2217(99)00096-X