Innovation plays an essential role in the development of the modern global economy. It ranks among the most important of human traits, driving economic growth through the creation of job opportunities and new products and services, and motivating cities, regions and countries to create environments that foster it in order to improve their competitiveness in local and global markets (Lea, ). Innovation is also a key component of sustainable development, and a means to uplift humanity. The United Nations (UN) has set goals for sustainable development with the aim of ending poverty, protecting the planet and bringing prosperity to all (Sus, ). The UN states that one of the targets among these sustainable development goals is to “support domestic technology development, research and innovation in developing countries.” While innovation is easy to perceive, it is difficult to define, and consequently even more difficult to measure. Furthermore, it is not well understood what the economic, sociological or anthropological drivers of innovation are, and which outcomes or behaviors result from innovative actions.
Measuring innovation is an ongoing effort in the international community and currently there are many innovation indexes, surveys and reports. A comprehensive survey of these reports can be found in the WEF Leading Indicators of Innovation study (Eur, ). The study observes a significant diversity in how these various indexes, surveys and reports approach the topic of innovation. Some have updated their innovation index yearly to provide a basis for comparisons over time, while others have only published their innovation index once. Similarly, the geographical scope of innovation covered varies as well. The study also highlights several issues with the ongoing approaches. The first issue is that the existing indexes, surveys and reports examining innovation are predominantly focused on “input” indicators of innovation (e.g. research and development expenditures, education level of population), which measure the context, environment and enabler factors that facilitate innovation, rather than actual innovation output or performance. This heavy emphasis on input indicators limits our understanding of innovation performance; while input indicators contribute to innovation capacity, they, unlike output indicators, do not measure results of innovation. Therefore, there is a need for an approach that would allow for better quantification of the overall innovation performance or perceived level of innovation.
This work attempts to provide a step forward towards developing an innovation index, which would enable the measurement of innovation capabilities in an ongoing, dynamic, regional and action-oriented way. We utilize a data-driven approach to identify measurable drivers of innovation, based on predictive analysis between the input country-level metrics, innovation indicators and perceived innovation levels. Our hope is that this work will contribute to a better understanding of what makes a country innovative, that it will offer actionable guidance in improving innovation outcomes at global and country levels, and eventually lead towards the construction of an Open Innovation Index.
This paper is organized as follows. In Section 2 we provide an overview of the datasets used and describe the indicators and innovation scores considered. In Sections 3 and 4, we provide details and results of causal and predictive modeling, respectively. Conclusions and next steps can be found in Section 5.
Our analysis seeks to discover input/output relationships between historical data on numerous country-level metrics (input) and perceived levels of innovation (output). To do so, we consider data from the Global Competitiveness Report (GCR) (Schwab & Sala-i Martín, 2013), and country-level metrics in the World Development Indicators (WDI) (WDI, 2014).
|Higher education and training||Infrastructure|
|Goods market efficiency||Health and education|
|Labor market efficiency||Financial markets|
2.1 Global Competitiveness Report
To capture information on the perceived level of innovation (output) we use the Global Competitiveness Report (GCR, ), a yearly report published by the WEF since 2004. The report ranks countries based on the Global Competitiveness Index (GCI), which assesses the ability of countries to provide high levels of prosperity to their citizens. This in turn depends on how productively a country uses available resources. Therefore, the Global Competitiveness Index measures the set of institutions, policies, and factors that set the sustainable current and medium-term levels of economic prosperity (GCI, ). Over 110 variables contribute to the index; two thirds of them come from the Executive Opinion Survey, and one third comes from publicly available sources, such as the UN. The survey contains the responses of roughly 14,000 business leaders from 142 economies. The GCI variables are organized into twelve pillars (see Table 1), with each pillar representing an area considered an important determinant of competitiveness. Each of the pillars is further divided into several sub-components, which help measure that pillar. Of particular interest to this analysis is the 12th pillar: Innovation. We will use this pillar score as the ground truth for a country’s innovation score.
2.2 World Development Indicators
The World Development Indicators form the primary World Bank collection of development indicators, compiled from officially recognized international sources. They present the most current and accurate global development data available, and include national, regional and global estimates. This statistical reference includes over 1500 indicators covering more than 150 economies. The annual publication is released in April of each year, and the online database is updated three times a year. The World Bank’s Open Data site provides access to the WDI database free of charge to all users. A selection of the WDI data is featured at data.worldbank.org. We will use the statistics provided by these indicators as inputs to our analyses.
3 Causal Analysis
To identify indicators that are causally related to innovation measurements, we perform an analysis based on Granger causality (Granger, 1988). Granger causality is a notion studied in the statistics, econometrics, machine learning and data mining literatures. The method utilizes time series to understand which factors affect other factors in the future. The main argument is that if a time series $X$ significantly helps improve the prediction of the future values of a time series $Y$, then $X$ is a potential cause of $Y$.
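The Granger argument above can be illustrated with a minimal sketch that compares two least-squares fits: a restricted model that predicts the target from its own past, and an unrestricted model that also sees the past of a candidate series. The function names, the toy series and the single lag are illustrative assumptions and not the paper's implementation; a full test would turn the two residual sums of squares into an F-statistic rather than compare them by eye.

```python
def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit with an intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations A b = c, with A = X^T X and c = X^T y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    c = [sum(x[i] * t for x, t in zip(X, targets)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    M = [row + [ci] for row, ci in zip(A, c)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, x))) ** 2
               for x, t in zip(X, targets))


def granger_rss(x, y, lag=1):
    """Return (restricted RSS, unrestricted RSS) for predicting y[t]."""
    ts = range(lag, len(y))
    targets = [y[t] for t in ts]
    # Restricted model: only the target's own lagged values.
    own = [[y[t - l] for l in range(1, lag + 1)] for t in ts]
    # Unrestricted model: own lags plus the candidate series' lags.
    both = [o + [x[t - l] for l in range(1, lag + 1)] for o, t in zip(own, ts)]
    return ols_rss(own, targets), ols_rss(both, targets)


# Toy data (made up): y is driven by its own past and by the past of x.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 7.0, 5.0, 8.0]
y = [0.0]
for t in range(1, len(x)):
    y.append(0.5 * y[t - 1] + 2.0 * x[t - 1])

rss_restricted, rss_unrestricted = granger_rss(x, y, lag=1)
```

On this toy data the unrestricted fit leaves essentially no residual while the restricted fit does, which is the pattern that would mark the candidate series as a potential cause in the Granger sense.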
The country-level analysis considers a single country, and attempts to identify factors that are causally related to its innovation output. It sheds light on a lever a particular country might be applying (either favorably or adversely) that is causally related to its level of innovation. Note, however, that if for a given country there is no activity in a particular metric (for example, no changes in R&D investment, or no growth in Internet users), then despite its potential relevance to innovation, this metric will not be identified as a causal indicator for that country. In our analysis, we test whether the past WDI metrics are predictive of the future innovation score. To do so, we use the time series data from 2007 to 2014.
Let us consider a country with $p$ WDI metrics. Each metric $x_i$, $i = 1, \dots, p$, is represented as a time series with $T$ time points. Each country also has a time series $y$, which corresponds to the innovation score. At a time $t$, we would like to predict the innovation metric value $y_t$ using the past values $y_{t-1}$ to $y_{t-L}$ of the innovation metric and the past $L$ values of the WDI metrics, where $L$ denotes the lag. We also perform the same analysis using the non-innovation related GCI metrics. The prediction problem is solved using a sparse linear regression approach described by Sindhwani & Lozano (2010). Sparse linear regression approaches jointly perform variable selection and parameter estimation. The coefficients for the selected variables are indicative of the causal effect on the innovation score. We can further exploit the group structure among the lagged temporal variables, which is imposed by the time series they belong to. For example, the lagged variables $x_{i,t-1}$, $x_{i,t-2}$, etc., of the same time series $x_i$ can be considered to form a group of related variables, given that they are derived from the same metric. Leveraging the group structure information leads to a more faithful implementation of the Granger causality test, as we look at the complete time series of a metric to determine its coefficient, instead of looking at just one lagged value of a metric. The group structure is thus imposed in terms of variable selection: if a variable is selected, then all the lagged values of this variable are also selected, and vice versa.
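The grouping of lagged variables can be made concrete with a small sketch that builds the lagged design matrix for one country and records which columns belong to which metric. The function name, the `innovation_score` key and the toy numbers are hypothetical; a group-sparse solver such as the one cited above would then keep or drop all columns of a group together.

```python
def lagged_design(metrics, score, lag):
    """Build lagged predictors for one country.

    metrics: dict mapping metric name -> list of yearly values
    score:   list of yearly innovation scores (same length)
    Returns (rows, y, groups), where groups[name] lists the column
    indices that hold the lagged values of that metric.
    """
    series = dict(metrics)
    series["innovation_score"] = score  # the target's own past is a predictor too
    names = sorted(series)

    # One group of `lag` consecutive columns per time series.
    groups, col = {}, 0
    for name in names:
        groups[name] = list(range(col, col + lag))
        col += lag

    rows, y = [], []
    for t in range(lag, len(score)):
        row = []
        for name in names:
            # Values at t-1, t-2, ..., t-lag of this series.
            row.extend(series[name][t - l] for l in range(1, lag + 1))
        rows.append(row)
        y.append(score[t])
    return rows, y, groups


# Toy example with made-up numbers.
rows, y, groups = lagged_design(
    {"gdp": [1, 2, 3, 4, 5], "internet": [10, 20, 30, 40, 50]},
    [0.1, 0.2, 0.3, 0.4, 0.5],
    lag=2,
)
```

Each row predicts one year's score from the preceding `lag` years, so a series of eight yearly values with a lag of two would yield six training rows per country.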
In our analysis, the stopping point for group selection is tuned using an approximate criterion (Yuan & Lin, 2006) to maximize the regression performance. We found that in this work, the lag value provided optimal results considering the trade-off in sample size, the structure of the WDI and GCI data, and availability of metrics. Examples of causal relationships between the non-innovation GCI metrics and the overall innovation score for Nicaragua are given in Figure 1. Note that in the case of correlated features, the Granger causality test will typically select one (or a few) features from the group. Given that the objective of this work is to identify all levers of innovation, for each causal factor discovered we also include up to ten metrics that are strongly correlated with it. The correlations are computed at the country level; hence, depending on the country, the same metric can have different correlates.
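The step of attaching correlated metrics to each causal factor could look like the following sketch. It assumes Pearson correlation and a cutoff of 0.8 for "strongly correlated"; the text fixes only the limit of ten metrics, so the correlation measure, the threshold and the function names are illustrative assumptions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0  # correlation is undefined for a constant series; treat as 0
    return cov / sqrt(va * vb)


def top_correlates(factor, metrics, k=10, threshold=0.8):
    """Up to k metrics of one country most strongly correlated with `factor`."""
    scored = [(name, pearson(metrics[factor], values))
              for name, values in metrics.items() if name != factor]
    # Keep strong correlations of either sign (threshold is an assumption).
    strong = [(name, r) for name, r in scored if abs(r) >= threshold]
    strong.sort(key=lambda item: abs(item[1]), reverse=True)
    return strong[:k]


# Toy country-level series (made up).
country_metrics = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
    "d": [1, -1, 1, -1],
}
correlates = top_correlates("a", country_metrics)
```

Because the series are specific to one country, running the same function on another country's data can return a different set of correlates for the same factor.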
4 Predictors of Innovation
The most reliable measurements of innovation today are collected manually via executive surveys such as the one underlying the GCI. In this section we consider the problem of constructing an innovation index, a measurement that can be formed automatically from easily collectible metrics, thereby allowing us to benchmark countries without the burden of conducting opinion surveys. In order to enhance our understanding of how various indicators translate into the innovation scores, we formulate an appropriate prediction problem. We consider the overall innovation score from the GCI as our target variable and try to predict it using the country-level indicators from WDI and non-innovation related GCI metrics. The predictive model can tell us the relative importance of different indicators with regard to the innovation score, help develop intuition on what factors are important for innovation, and finally, aid in identifying the metrics that should contribute to the Open Innovation Index.
To predict the innovation score, we use the Random Forest (RF) algorithm (Breiman, 2001). For all countries in our data set we perform prediction of the GCI innovation score using the WDI indicators. Furthermore, we also consider the prediction of the GCI innovation score using the GCI non-innovation metrics. Although not necessarily important for constructing the index, the latter analysis is useful in further understanding innovation and its levers. A total of 462 WDI metrics and 162 GCI metrics that are consistently available through the years 2007 to 2014 are used for the predictive modeling. Each metric is standardized to have zero mean and unit variance. Missing values are filled with 0. In the case of GCI metrics, we achieve an $R^2$ value of 0.93. In the case of WDI metrics, we achieve an $R^2$ value of 0.88.
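The preprocessing and the goodness-of-fit measure described above can be sketched in a few lines. Two assumptions are made here: standardization is computed on the observed entries before the zero fill (so that filling with 0 amounts to mean imputation), and the reported fit values are read as the coefficient of determination. Neither detail is spelled out in the text, and the function names are illustrative.

```python
from math import sqrt


def standardize_fill(column):
    """Z-score the observed entries of one metric; missing (None) become 0."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    var = sum((v - mean) ** 2 for v in observed) / len(observed)
    std = sqrt(var) or 1.0  # guard against constant columns
    return [0.0 if v is None else (v - mean) / std for v in column]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy column with one missing entry (made-up values).
scaled = standardize_fill([1.0, None, 3.0])
```

Under this reading, a model that always predicts the mean score has a value of 0 and a perfect model has a value of 1.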
|Cluster 1||Algeria, Angola, Burundi, Gabon, Guinea, Haiti, Libya, Myanmar, Timor-Leste, Yemen|
|Cluster 2||Australia, Austria, Belgium, Canada, France, Iceland, Ireland, Luxembourg, Norway, Switzerland, United Kingdom, United States|
|Cluster 3||Bulgaria, Chile, Colombia, Costa Rica, Croatia, Ecuador, Lithuania, Panama, Peru|
|Cluster 4||Brunei Darussalam, Kuwait, Saudi Arabia|
|Cluster 5||Brazil, China, Hungary, India, Indonesia, Malaysia, Mexico, Poland, Russia, South Africa, Thailand, Turkey, Vietnam|
4.1 Contribution Analysis
We can further probe the learned RF model and look at the decision path for a particular country in predicting its innovation score. This can help us understand which metrics were crucial in predicting the innovation score for a particular country and allow us to do comparative analysis between a pair of countries, or among groups of countries. The decision path can be described in terms of the contribution made by each metric towards the innovation score. Let the number of metrics be $p$, and let $c_{ij}$ denote the contribution of the $j$-th metric towards the innovation score for a country $i$. Then the innovation score can be obtained as the sum of all the $c_{ij}$ over $j = 1, \dots, p$. Note that the contribution value of a variable, unlike in a linear model, is not global: it depends on the other variables and is specific to a particular data sample.
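One common way to obtain such per-metric contributions from a decision path is sketched below for a single regression tree: each split credits the change in node mean to the metric that was split on. The nested-dict tree format, the toy tree and the function name are hypothetical, and this sketch keeps the root mean as a separate baseline term; whether the paper folds that term into the contributions is not stated. For a forest, the contributions would be averaged over trees.

```python
def path_contributions(tree, sample):
    """Walk one regression tree and attribute each change in node mean
    to the metric used at that split. Returns (baseline, contributions)."""
    contributions = {}
    node = tree
    baseline = node["value"]  # mean target value at the root
    while "feature" in node:  # leaves carry only a "value"
        feature = node["feature"]
        branch = "left" if sample[feature] <= node["threshold"] else "right"
        child = node[branch]
        delta = child["value"] - node["value"]
        contributions[feature] = contributions.get(feature, 0.0) + delta
        node = child
    return baseline, contributions


# Hypothetical tree: every node stores the mean score of its training samples.
toy_tree = {
    "value": 3.5, "feature": "metric_a", "threshold": 0.0,
    "left": {"value": 3.0},
    "right": {
        "value": 4.0, "feature": "metric_b", "threshold": 1.0,
        "left": {"value": 3.8},
        "right": {"value": 4.5},
    },
}

baseline, contribs = path_contributions(toy_tree, {"metric_a": 0.7, "metric_b": 2.0})
prediction = baseline + sum(contribs.values())
```

A different sample can take a different path through the same tree, which is why the contribution of a metric is specific to the data sample rather than global.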
The representation of a country $i$ using the contributions towards its innovation score is denoted as the vector $c_i = (c_{i1}, \dots, c_{ip})$, with one entry per metric. This representation disentangles each WDI factor's influence over innovation and hence is highly informative and meaningful. We use this representation to cluster countries into groups of countries which are not only at the same innovation level but also have similar mechanisms in play while reaching that level. The k-means algorithm (Arthur & Vassilvitskii, 2007) is employed with the number of clusters set to . Table 2 shows the constituent countries of a few clusters. These groups of countries can then be used to compare a country within its group or across other groups to get a more insightful comparative analysis in terms of innovation drivers.
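Clustering countries by their contribution vectors can be sketched with a small k-means routine that uses the careful seeding of the cited reference. The number of clusters, the seed, the iteration count and the toy vectors are hypothetical choices for illustration; the value of k used in the paper is not recoverable from the text.

```python
import random


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans_pp(points, k, iters=50, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns one label per point.
    Assumes at least k distinct points."""
    rng = random.Random(seed)
    # Seeding: first center uniformly at random, each later center drawn
    # with probability proportional to squared distance to the nearest center.
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy contribution vectors for six hypothetical countries.
vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels = kmeans_pp(vectors, k=2)
```

Because the inputs are contribution vectors rather than raw scores, two countries land in the same cluster only if the same metrics push their predicted scores in similar ways.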
Figure 2 shows the WDI metrics with large differences in contribution values for pairs of similar countries. The first pair we consider is Kenya and Tanzania, and the second is Singapore and Hong Kong. In each pair, the constituent countries are similar in terms of WDI metrics and have relatively close predicted innovation scores. In the case of Kenya and Tanzania, quality of port infrastructure and burden of customs procedure have a more positive impact on the prediction of the innovation score for Kenya than for Tanzania. The same can be said for the merchandise exports metric of Singapore in comparison to Hong Kong. The large positive effects of other metrics are balanced by the household final consumption expenditure per capita metric in favor of Hong Kong. Such analysis can help decision makers obtain more targeted insights when they want to compare a country with a similar country chosen as a benchmark, or with a country in the same geographical location or at a similar development stage.
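A pairwise comparison of this kind reduces to ranking metrics by the gap between two countries' contribution values. The sketch below uses made-up contribution values and hypothetical metric names and function name; it does not reproduce the numbers behind Figure 2.

```python
def largest_contribution_gaps(contrib_a, contrib_b, top=3):
    """Metrics with the largest absolute difference in contribution
    between country A and country B (positive gap favors A)."""
    shared = set(contrib_a) & set(contrib_b)
    gaps = {m: contrib_a[m] - contrib_b[m] for m in shared}
    ranked = sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top]


# Made-up contribution values for two hypothetical countries.
country_a = {"ports": 0.30, "customs": 0.20, "exports": 0.05}
country_b = {"ports": 0.10, "customs": 0.25, "exports": 0.04}
gaps = largest_contribution_gaps(country_a, country_b, top=2)
```

The sign of each gap indicates which country the metric favors, which is the information a side-by-side plot of contributions conveys visually.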
In this work, we proposed a set of analyses that would contribute to a better understanding of innovation, how to measure it, and how to derive actionable insights for diverse countries and diverse innovation conditions around the world. Our approach is data-driven, and aims to produce innovation measurements that are repeatable, systematic and objective, and can lead to dynamic country-level benchmarks. This would allow policy makers and organizations like the WEF to more efficiently shape policy and interventions in under-developed countries in order to increase their developmental activity and capacity for innovation.
As future work, we plan to enrich our dataset by incorporating more indicators and a longer history, and to enhance our causal and predictive models to explicitly handle correlated indicators. This will improve the robustness of the models and make them more reliable. We also plan to build visualizations that would allow us to communicate actionable insights and country-level benchmarks in an easily consumable way.
- (1) European innovation scoreboards. http://ec.europa.eu/growth/industry/innovation/facts-figures/scoreboards/index_en.htm.
- (2) Global competitiveness index (GCI) - methodology. http://reports.weforum.org/global-competitiveness-report-2014-2015/methodology.
- (3) Global competitiveness reports (GCRs). http://reports.weforum.org/global-competitiveness-report-2015-2016.
- (4) The World Economic Forum Economics of Innovation Global Agenda Council evaluates leading indicators of innovation. Technical report, World Economic Forum.
- (5) Sustainable development goals. http://www.un.org/sustainabledevelopment/sustainable-development-goals.
- WDI (2014) World Development Indicators (WDI) 2014. World Bank Publications, 2014.
- Arthur & Vassilvitskii (2007) Arthur, D. and Vassilvitskii, S. K-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, 2007.
- Breiman (2001) Breiman, L. Random forests. Machine Learning, 45(1):5–32, 2001.
- Granger (1988) Granger, C. W. J. Some recent developments in a concept of causality. Journal of Econometrics, 39(1):199–211, 1988.
- Schwab & Sala-i Martín (2013) Schwab, K. and Sala-i Martín, X. The global competitiveness report 2013–2014: Full data edition. In World Economic Forum, pp. 551, 2013.
- Sindhwani & Lozano (2010) Sindhwani, V. and Lozano, A. C. Block variable selection in multivariate regression and high-dimensional causal inference. In Advances in Neural Information Processing Systems, pp. 1486–1494, 2010.
- Yuan & Lin (2006) Yuan, M. and Lin, Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49–67, 2006.