On the Impact of Micro-Packages: An Empirical Study of the npm JavaScript Ecosystem

09/14/2017
by Raula Gaikovina Kula, et al.
Osaka University

The rise of user-contributed Open Source Software (OSS) ecosystems demonstrates their prevalence in the software engineering discipline. Libraries work together by depending on each other across the ecosystem. From these ecosystems emerges a minimized library called a micro-package. Micro-packages become problematic when a break in a critical ecosystem dependency ripples its effects to unsuspecting users. In this paper, we investigate the impact of micro-packages in the npm JavaScript ecosystem. Specifically, we conducted an empirical investigation with 169,964 JavaScript npm packages to understand (i) the widespread phenomenon of micro-packages, (ii) the size of the dependencies inherited by a micro-package, and (iii) the developer usage cost (i.e., fetch, install, load times) of using a micro-package. The results show that micro-packages form a significant portion of the npm ecosystem. Apart from the ease of readability and comprehension, we show that some micro-packages have long dependency chains and incur usage costs just as high as those of other npm packages. We envision that this work motivates the need for developers to be aware of how sensitive their third-party dependencies are to critical changes in the software ecosystem.

I Introduction

User-contributed Open Source Software (OSS) ecosystems have become prevalent in the software engineering discipline, capturing the attention of both practitioners and researchers. In recent times, ‘collections of third-party software’ ecosystems such as the node package manager (npm) JavaScript [NPM] package ecosystem foster the development of huge numbers of server-side NodeJs and client-side JavaScript applications. According to a 2016 study [Wittern:2016], the npm ecosystem for the NodeJs platform hosts over 230 thousand packages with ‘hundreds of millions package installations every week’.

Jansen et al. [Jansen09ICSE] state that ecosystems emerge when ‘large computation tasks [are] split up and shared by collection of small, nearly independent, specialized units depending on each other’. Based on this concept, we conjecture that third-party software ecosystems like npm encourage the philosophy of specialized software within these self-organizing ecosystems. A micro-package results when a package becomes ‘minimalist’ in its size and performs a single task [Mens:2016]. For instance, the negative-zero package (https://www.npmjs.com/package/negative-zero) has the trivial task of determining whether or not an input number has a negative-zero value. Micro-packages function as a single unit by forming ‘transitive’ dependencies between dependent packages (i.e., dependency chains) across the ecosystem.

We conjecture that an influx of micro-packages will result in an ecosystem that is fragile to any critical dependency change: breaking one critical dependency will ripple its effect down the dependency chain to all dependent packages. For example, the removal of a tiny package called left-pad in March 2016 caused waves of dependency breakages throughout the npm ecosystem when it ‘broke the internet’ of web applications [npmIssue]. Although left-pad is tasked with simply ‘adding left space padding to a html page’, it was heavily relied upon in the ecosystem by thousands of unaware packages and applications, including the core Babel compiler and Node JavaScript environment.

In this paper, we investigate the impact of micro-packages within the npm JavaScript ecosystem. Specifically, we conducted an empirical investigation with 169,964 npm packages to understand (RQ1) the spread of micro-packages across the npm ecosystem. We then investigate the implications of using micro-packages, namely (RQ2) the size of their complex dependency chains and (RQ3) the developer usage costs (i.e., fetch, install, load times) incurred by using a micro-package. Findings show that micro-packages account for a significant portion of the ecosystem, with some micro-packages having dependency chains and reachability to other packages in the npm ecosystem that are just as long as those of other packages. Furthermore, our findings indicate that micro-packages show no statistically significant differences in developer usage costs compared to the rest of the npm packages. We envision that this work draws attention to micro-packages and to how their resilience to changes in the ecosystem can be increased. The study concludes that developers should be aware of how sensitive their third-party dependencies are to critical changes in their ecosystem.

The main contributions of this paper are three-fold and can be summarized as follows: (i) we quantitatively study the phenomenon of micro-packages and their impact on a software ecosystem, (ii) we motivate the need to revisit best practices to increase software resilience to changes in their ecosystem, and (iii) we motivate the need for developers to be aware of how sensitive their third-party dependencies are to changes in a software ecosystem.

II Basic Concepts & Definitions

In this section, we define npm micro-packages and motivate how they can be problematic for developers who use them. We then introduce and define two types of micro-packages that we believe create excessive dependencies.

II-A Micro-Packages and a Fragile Ecosystem

We define micro-packages as having a minimalist modular design that encapsulates and delegates complex tasks to other packages within the same ecosystem. Encapsulating complexity through abstraction or information hiding eases code comprehension and promotes code reuse. Parnas [Parnas:2002] first proposed the concept of information hiding through the formation of modular structures. Baldwin and Clark [Baldwin:1999] later proposed modularity theory, in which the elements of a design are split up and assigned to modules according to a formal architecture or plan. In this setting, ‘some of these modules remain hidden, while other modules are visible and embody design rules that hidden modules must obey if they are to work together’.

We conjecture that an influx of micro-packages will result in a fragile ecosystem. As packages evolve, so do their dependencies within this ecosystem. Sometimes these changing dependencies break a critical dependency, and the breakage ripples throughout the ecosystem. Mens et al. [Mens:2016] report experiences from both practitioners and researchers where managing the complexities of dependencies (colloquially known as Dependency Hell) leads to issues that ripple throughout the ecosystem. Our hypothesis is that a micro-package contributes to a fragile ecosystem by creating excessive dependencies across the ecosystem.

II-B Package Interoperability in the npm Ecosystem

JavaScript packages adhere to certain specifications of the CommonJS Group, whose goal is to build up the JavaScript ecosystem with specifications (documentation at http://wiki.commonjs.org/wiki/Modules/1.0). These specifications state how a package should be written in order to be ‘interoperable among a class of module systems that can be both client and server side, secure or insecure, implemented today or supported by future systems with syntax extensions’.

For the npm ecosystem, the npmjs documentation (official website of the npmjs repository at https://docs.npmjs.com/how-npm-works/packages) and the CommonJS Group (http://wiki.commonjs.org/wiki/Packages/1.0) specify that an npm package should meet at least the following requirements:

  • a package.json configuration file.

  • a JavaScript module.

  • an accessible url that contains a gzipped tarball of package.json and the module.

An npm module is any JavaScript file that can be loaded using the require() function. Typically, the package.json points to an entry point file. This entry point file is used to load any modules (i.e., both local and foreign) required by the package.

Importantly, the JavaScript require() function does not differentiate between a module and a package. Therefore, packages can load internal or external modules and packages that are installed in the NodeJs platform environment. For external dependencies, require() makes the functions exported from a package's entry point accessible to the caller as the package's Application Programming Interface (API).
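The sketch below illustrates the two points above using only Node's built-in modules. It writes two things into a temporary directory: a hypothetical local module, local.js, and a hypothetical package, demo-pkg, whose package.json main field names lib/entry.js as its entry point. It then loads both with require(). In both cases require() simply returns whatever the loaded file assigned to module.exports, so a local module and a package hand the caller the same kind of value. All file and package names here are illustrative assumptions.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Scratch directory holding one local module and one package (hypothetical names).
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'micro-pkg-demo-'));

// A local module: a single file that exports one function.
fs.writeFileSync(
  path.join(root, 'local.js'),
  "module.exports = function (s) { return 'local:' + s; };\n"
);

// A package: its package.json names lib/entry.js as the entry point.
const pkgDir = path.join(root, 'node_modules', 'demo-pkg');
fs.mkdirSync(path.join(pkgDir, 'lib'), { recursive: true });
fs.writeFileSync(
  path.join(pkgDir, 'package.json'),
  JSON.stringify({ name: 'demo-pkg', version: '1.0.0', main: 'lib/entry.js' })
);
fs.writeFileSync(
  path.join(pkgDir, 'lib', 'entry.js'),
  "module.exports = function (s) { return 'package:' + s; };\n"
);

// require() loads a file path directly and a package directory through the
// "main" field of its package.json; either way it returns module.exports.
const localFn = require(path.join(root, 'local.js'));
const packageFn = require(pkgDir);

console.log(typeof localFn, typeof packageFn); // function function
console.log(localFn('x'), packageFn('x'));     // local:x package:x
```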

II-C Micro-packages in the npm ecosystem

We now introduce two types of npm micro-packages: those that (i) perform trivial tasks and (ii) act as a facade to load foreign module (i.e., third-party) dependencies.

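\begin{lstlisting}[language=Python,
caption={The \texttt{negative-zero} package entry point source code.},
label=code:useful]
'use strict';
module.exports = function (x) {
  return x === 0 && 1 / x === -Infinity;
};\end{lstlisting}
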
Listing \ref{code:useful} shows an example of a micro-package that performs a trivial task.
In only four lines of code, the \texttt{negative-zero} package has the sole task of determining whether an input number has a value of negative zero.
Importantly, the definition of negative zero is not prone to change, so it is reasonable to assume that this package is unlikely to evolve in the near future.
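The \texttt{negative-zero} check needs two comparisons because JavaScript's strict equality treats $-0$ and $+0$ as equal. Only the sign of $1/x$, which is $-\infty$ for $-0$, tells the two zeros apart. The sketch below wraps the same expression in a function named \texttt{isNegativeZero} (our name, not the package's) and compares it with the built-in \texttt{Object.is}, which distinguishes the two zeros directly.

```javascript
'use strict';

// Same expression as the negative-zero entry point, in a named function.
function isNegativeZero(x) {
  // x === 0 is true for both +0 and -0; 1 / -0 is -Infinity, 1 / +0 is Infinity.
  return x === 0 && 1 / x === -Infinity;
}

console.log(-0 === 0);                            // true: strict equality ignores the sign
console.log(isNegativeZero(-0));                  // true
console.log(isNegativeZero(0));                   // false
console.log(isNegativeZero(-1));                  // false
console.log(Object.is(-0, -0), Object.is(0, -0)); // true false
```

For every input, \texttt{Object.is(x, -0)} returns the same result as the package's expression.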

\begin{lstlisting}[language=Python,
caption={The \texttt{user-home} package entry point source code. As shown, the package depends on the \texttt{os-homedir} package.},
label=code:userhome]
'use strict';
module.exports = require('os-homedir')();\end{lstlisting}

Listing \ref{code:userhome} is an example of a micro-package that forms a facade to another foreign package.
As shown in the listing, the \texttt{user-home} package consists of only two lines of code, yet it is tasked with determining the user home folder for any operating system.
Line 2 of the listing shows that \texttt{user-home} depends on and loads the \texttt{os-homedir} package (shown in Listing \ref{code:os-homedir}), which handles the complexities of the task.

\begin{lstlisting}[language=Python,
caption={\texttt{os-homedir} package entry point source code. It contains one function \texttt{homedir}.},
label=code:os-homedir]
'use strict';
var os = require('os');

function homedir() {
  var env = process.env;
  var home = env.HOME;
  var user = env.LOGNAME || env.USER || env.LNAME || env.USERNAME;

  if (process.platform === 'win32') {
    return env.USERPROFILE || env.HOMEDRIVE + env.HOMEPATH || home || null;
  }

  if (process.platform === 'darwin') {
    return home || (user ? '/Users/' + user : null);
  }

  if (process.platform === 'linux') {
    return home || (process.getuid() === 0 ? '/root' : (user ? '/home/' + user : null));
  }

  return home || null;
}
module.exports = typeof os.homedir === 'function' ? os.homedir : homedir;\end{lstlisting}


%In the JavaScript environment, it is possible to have a library without any function. Instead, this type of library acts as a simplified interface to encapsulate functional complexities. Specific to JavaScript, libraryes can use the \texttt{module.exports} function call to a hidden class or third-party library. This design is commonly known as the \textit{facade} design pattern and is used to encapsulate complexities and provide a simple interface for library users.

Listing \ref{code:os-homedir} shows the hidden \texttt{os-homedir} package that performs the computation for the \texttt{user-home} package.
Importantly, the \texttt{user-home} package is prone to evolutionary dependency changes, as operating systems (see Lines 9, 13, 17) may change their platform configurations in the future.
In this case, any application that depends on a package using \texttt{user-home} is unaware that it is indirectly affected by any evolutionary API changes to, or removal of, the \texttt{user-home} package.
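To make the platform dependence in Listing \ref{code:os-homedir} explicit, the sketch below restates the fallback logic of \texttt{homedir} as a pure function. Its platform, environment variables and user id are passed in as arguments rather than read from \texttt{process}. The function name \texttt{homedirFor} and all sample inputs are hypothetical. The branches mirror the listing, with one change: the Windows branch joins \texttt{HOMEDRIVE} and \texttt{HOMEPATH} only when both are set. Each branch encodes one operating system's conventions, so a change in any of them would propagate to \texttt{user-home} and its dependents.

```javascript
'use strict';

// Restatement of the os-homedir fallback as a pure function: platform,
// environment variables and user id are arguments instead of globals.
function homedirFor(platform, env, uid) {
  const home = env.HOME;
  const user = env.LOGNAME || env.USER || env.LNAME || env.USERNAME;

  if (platform === 'win32') {
    // Unlike the listing, only join HOMEDRIVE and HOMEPATH when both exist.
    const drivePath = env.HOMEDRIVE && env.HOMEPATH ? env.HOMEDRIVE + env.HOMEPATH : null;
    return env.USERPROFILE || drivePath || home || null;
  }

  if (platform === 'darwin') {
    return home || (user ? '/Users/' + user : null);
  }

  if (platform === 'linux') {
    // uid 0 is the root user on Linux.
    return home || (uid === 0 ? '/root' : (user ? '/home/' + user : null));
  }

  // Any other platform: fall back to HOME alone.
  return home || null;
}

console.log(homedirFor('win32', { USERPROFILE: 'C:\\Users\\ana' }, 1000)); // C:\Users\ana
console.log(homedirFor('darwin', { USER: 'ana' }, 501));                   // /Users/ana
console.log(homedirFor('linux', {}, 0));                                    // /root
console.log(homedirFor('linux', { USER: 'ana' }, 1000));                   // /home/ana
console.log(homedirFor('freebsd', {}, 1000));                               // null
```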

%\subsection{Identifying \texttt{npm} Micro-packages}
%The main challenge to identifying a micro-packages due to the ambiguity between
%%he \texttt{require()} function is unable to distinguish
%internal or external package interoperability.
%In this facade micro-package design,
We leverage two package size attributes, (i) global function APIs and (ii) physical size, to propose a method for identifying the two types of micro-packages defined in this study.
Specifically, we propose and rationalize the following package size metrics:

\begin{itemize}
  \item \textit{\# of Functions (FuncCount)} - is the count of API functions implemented in a package (\ie~functions in the \texttt{entry point file}).
  Our rationale is that a package with a single function is likely to be a trivial micro-package, while a facade micro-package is likely to use the \texttt{require()} function to directly load its dependencies.
  \item \textit{\# of Lines of Code (SLoC)} - is the count of source code lines in a package (\ie~lines in the \texttt{entry point file}).
  We consider both physical lines of code (which include code comments and spacing) and logical lines of code for a package. Our rationale is that trivial or facade micro-packages tend to have a smaller SLoC.
\end{itemize}
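The study extracts these metrics with the \texttt{esprima} parser. As a rough, parser-free illustration only, the sketch below approximates the three size metrics for an entry point file given as a string. Its counting rules are our assumptions, not the paper's definitions. Physical SLoC counts every line. Logical SLoC counts lines that are neither blank nor pure comments. FuncCount counts \texttt{function} keywords and arrow functions after comments and string literals are stripped. Because it relies on regular expressions rather than a syntax tree, it can miscount comment markers inside strings, template literals, or regular-expression literals. The helper names are hypothetical.

```javascript
'use strict';

// Replace block comments with the newlines they contained (keeping line
// positions), then drop line comments.
function stripComments(source) {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, (block) => block.replace(/[^\n]/g, ''))
    .replace(/\/\/[^\n]*/g, '');
}

// Heuristic size metrics for one entry point file (regex-based, not esprima).
function sizeMetrics(source) {
  // Physical SLoC: every line, ignoring a single trailing newline.
  const lines = source === '' ? [] : source.replace(/\r?\n$/, '').split(/\r?\n/);
  const physicalSLoC = lines.length;

  // Logical SLoC (assumed rule): non-blank lines once comments are removed.
  const logicalSLoC = stripComments(source)
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '').length;

  // Remove string literals so a quoted 'function' is not counted.
  const code = stripComments(source).replace(/'(?:[^'\\\n]|\\.)*'|"(?:[^"\\\n]|\\.)*"/g, '""');
  const funcCount =
    (code.match(/\bfunction\b/g) || []).length + (code.match(/=>/g) || []).length;

  return { funcCount, physicalSLoC, logicalSLoC };
}

const negativeZeroSrc =
  "'use strict';\nmodule.exports = function (x) {\n  return x === 0 && 1 / x === -Infinity;\n};\n";
console.log(sizeMetrics(negativeZeroSrc)); // { funcCount: 1, physicalSLoC: 4, logicalSLoC: 4 }
```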

We now use these metrics to formalize our definition of a micro-package.
Formally, for any given package $P$, $P$ is a micro-package when \texttt{FuncCount(P)} $\le 1$.
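The definition translates directly into a predicate over FuncCount. The sketch below applies it to a list of packages and reports the share of micro-packages, the kind of proportion reported for RQ1. The function names are hypothetical. The input combines the two example packages from this section with one invented package, \texttt{multi-fn-example}, purely to exercise the code; it is not study data.

```javascript
'use strict';

// P is a micro-package when FuncCount(P) <= 1
// (typically 1 for trivial packages and 0 for facades).
function isMicroPackage(funcCount) {
  return funcCount <= 1;
}

// Share of micro-packages in a list of { name, funcCount } records.
function microPackageShare(packages) {
  if (packages.length === 0) return 0;
  const micro = packages.filter((p) => isMicroPackage(p.funcCount));
  return micro.length / packages.length;
}

const sample = [
  { name: 'negative-zero', funcCount: 1 },    // trivial micro-package
  { name: 'user-home', funcCount: 0 },        // facade micro-package
  { name: 'multi-fn-example', funcCount: 5 }, // hypothetical regular package
];

console.log(sample.filter((p) => isMicroPackage(p.funcCount)).map((p) => p.name));
// [ 'negative-zero', 'user-home' ]
console.log(microPackageShare(sample)); // 0.6666666666666666
```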
%Alternatively, a package is classified as a regular-package if APICount(P)$>$ 1.

%Listing \ref{code:userhome} is another example of facade designed micro-library (\ie~without function). As shown in the listing, the \texttt{user-home} library is comprised of only two lines of code, which is tasked with determining the user-home folder for any operating system. However, as shown in Listing \ref{code:userhome} (Line 2) the \texttt{user-home} library relies on the \texttt{os-homedir} library (shown in Listing \ref{code:os-homedir} ) for complexities of the task. It is also important to note that unlike \texttt{negative-zero}, \texttt{user-home} is prone to dependency changes as operating systems (see Lines 9, 13, 17) may change their platform configurations. We consider libraries that encapsulate such complexities may be considered as a micro-library, since in most times these encapsulated complexities are unknown to the user.
%\label{sec:example}

\section{Empirical Study}
\label{sec:study}
Our main goal is to better understand the role and impact of micro-packages in the \texttt{npm} ecosystem.
We conducted an empirical investigation guided by three research questions regarding micro-packages.

RQ1 concerns how widespread the micro-package phenomenon is in the \texttt{npm} ecosystem.
This research question analyzes the size of \texttt{npm} packages to determine whether micro-packages form a significant portion of the ecosystem.

RQ2 and RQ3 evaluate common perceptions about using a micro-package.
From a dependency complexity viewpoint, RQ2 investigates the size of the dependency chains developers incur when using a micro-package.
For RQ3, we then examine the fetch, install, and load usage costs incurred by a web developer when using a micro-package as one of their dependencies.

\subsection{\textbf{RQ1: \RqOne}}
\subsubsection{\underline{Motivation}}
Our motivation for RQ1 is to understand how widespread the micro-package phenomenon is in the ecosystem.
In this research question, we analyze \texttt{npm} package size attributes to survey whether or not micro-packages form a significant portion of packages in the ecosystem.


\subsubsection{\underline{Research Method}}
To answer RQ1, we use statistical analysis and manual validation.
The method follows two steps: data collection, then analysis of the results.
In the first step, we collect and process \texttt{npm} packages to extract their size attributes (\ie~FuncCount and SLoC) for each package.

For the analysis step, we describe the distribution of the npm size metrics using descriptive statistics (\ie~Min., Mean ($\mu$), Median ($\bar{x}$), Max.) and plot them on a histogram.
We then manually investigate and pull out examples from the results to qualitatively explain the reasoning behind our conclusions.
Finally, we identify how many npm packages fit our definition of micro-packages.
We then use boxplots of the SLoC metric to compare micro-packages with the rest of the packages for any statistical differences.
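The descriptive statistics used in this step are simple to compute. The sketch below, with a hypothetical helper name, returns the minimum, median, mean, and maximum of one metric's values, for example every package's FuncCount. When the number of values is even, the median averages the two middle values. The inputs in the usage lines are made up for illustration.

```javascript
'use strict';

// Min., median, mean and max. of a non-empty array of numbers.
function summarize(values) {
  if (values.length === 0) throw new RangeError('summarize() needs at least one value');
  const sorted = [...values].sort((a, b) => a - b); // numeric sort; input left untouched
  const n = sorted.length;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = sorted.reduce((sum, v) => sum + v, 0) / n;
  return { min: sorted[0], median, mean, max: sorted[n - 1] };
}

console.log(summarize([0, 1, 2, 3, 10]));    // { min: 0, median: 2, mean: 3.2, max: 10 }
console.log(summarize([4, 1, 3, 2]).median); // 2.5
```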

%The second step involves analysis of our size metrics to understand the distribution of each  (\ie~FuncCount and SLoC).
%We then use our especially use the FuncCount metric to identify all micro-libraries.

%Then in the third step we compare the sLoC of both micro-libraries and multi-function libraries.

%Steps two and three requires the analysis of our collected dataset. Thus, we perform a statistical analysis of the two size metrics and use a histogram to visually show micro-library coverage. To capture all ranges of functions, the histogram groups libraries systematically. Finally, in step three, we analyze the size differences between micro-libraries and regular-libraries. To this end, we utilize boxplots plots with the sLoC metric to depict the differences between micro-libraries and regular-libraries.

\label{sec:rm1}

\label{sec:data1}
\begin{table}
  \begin{center}
    \caption{Collected \texttt{npm} dataset for RQ1}
    \label{tab:datasetRQ1}
    \begin{tabular}{lr}
      \hline
      & \textbf{Dataset statistics} \\
      \hline
      Snapshot of \texttt{npm} ecosystem & July-1st-2016 $\sim$ July-15th-2016 \\
      \# downloaded packages & 186,507 \\
      Size & 200\,GB \\
      \# JS entry point files analyzed & 169,964 \\
      \hline
    \end{tabular}
  \end{center}
\end{table}


\subsubsection{\underline{Dataset}}
Similar to the collection method of \cite{Wittern:2016}, we use the npm registry\footnote{\url{https://registry.npmjs.org/-/all}} to procure an offline copy of the publicly available packages in the \textit{npm} ecosystem.
%To extract our size metrics, we used source code analysis which involves the extraction of data from source code (\ie~\texttt{index.js}).

Table \ref{tab:datasetRQ1} presents statistics of the final collected dataset after quality pre-processing to remove invalid or erroneous \texttt{package.json} metafiles\footnote{A blog post on the official npmjs website details some of the perils of npm front-end packaging at \url{http://blog.npmjs.org/post/101775448305/npm-and-front-end-packaging}}.
As shown in the table, of the 186,507 \texttt{npm} packages downloaded, 169,964 packages remain for our analysis.
Since \texttt{npm} ecosystem policies allow unrestricted access, pre-processing served as a quality filter to remove invalid packages.

\begin{lstlisting}[language=xml,
caption={Snippet from the \textit{package.json}. Note that lines 14 and 15 correspond to the entry point file that is accessible to a client user.},
label=code:package.json]
{
  "name": "user-home",
  "version": "2.0.0",
  "description": "Get path to user home directory",
  "license": "MIT",
  "repository": "sindresorhus/user-home",
  "author": {
    "name": "Sindre Sorhus",
    "email": "sindresorhus@gmail.com",
    "url": "sindresorhus.com"
  },
  ….
  },
  "main": [
    "index.js"
  ],
  ….
  "dependencies": {
    "os-homedir": "^1.0.0"
  },

}\end{lstlisting}

\begin{figure*}
  \centering
  \subfigure[JavaScript libraries grouped by FuncCount
  ]{\label{fig:fc}
    \includegraphics[width=.9\columnwidth]{FC}
  }
  \subfigure[Comparing Lines of Code (sLoC) between micro-packages and other packages.
  ]{\label{fig:loc}
    \includegraphics[width=.9\columnwidth]{sLOC}
  }
  \caption{Results for RQ1 package size metrics (a) FuncCount and (b) sLoC}
  \label{fig:mod}
\end{figure*}

Listing \ref{code:package.json} depicts the \texttt{package.json} metadata file used to identify and extract the \texttt{entry point file}.
This entry point determines the available API of a package.
From this metafile, we extract the main field (Lines 14 and 15), which identifies the entry point.
Note that the \texttt{name} attribute is a unique identifier for any package.
%By default, most libraries point to the \texttt{index.js} in the root folder of the library.
We then use the \texttt{esprima} npm package \cite{NPM:esprima} to extract our package size metrics.
% functions (FuncCount) and lines of code (sLoC) of that specific library.
\texttt{esprima} is a high-performance and highly popular\footnote{As of Feb 10, it had recorded over 17,700,000 downloads from npm.} JavaScript analysis tool that constructs a syntax tree from JavaScript source code.
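Reading the entry point from the metadata can be sketched as below. The helper name \texttt{entryPointOf} is hypothetical. Two behaviors are assumptions about how such a step might work, not details reported in the study: a \texttt{package.json} that fails to parse or lacks a \texttt{name} is treated as invalid, and a missing \texttt{main} falls back to \texttt{index.js}, which is npm's documented default. Because the listing above shows \texttt{main} as an array, the sketch also accepts that form and takes its first element; npm itself expects a string.

```javascript
'use strict';

// Return the entry point file named by a package.json text, or null when the
// metadata is unusable (assumed validity rule: parseable JSON with a name).
function entryPointOf(packageJsonText) {
  let meta;
  try {
    meta = JSON.parse(packageJsonText);
  } catch {
    return null; // erroneous package.json: filter the package out
  }
  if (meta === null || typeof meta !== 'object' || typeof meta.name !== 'string') {
    return null;
  }
  // Accept the array form shown in the listing by taking its first entry.
  const main = Array.isArray(meta.main) ? meta.main[0] : meta.main;
  // npm's default entry point when "main" is absent is index.js.
  return typeof main === 'string' && main !== '' ? main : 'index.js';
}

console.log(entryPointOf('{"name": "user-home", "main": ["index.js"]}')); // index.js
console.log(entryPointOf('{"name": "demo", "main": "lib/entry.js"}'));    // lib/entry.js
console.log(entryPointOf('{"name": "demo"}'));                            // index.js
console.log(entryPointOf('{ not json'));                                  // null
```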




\begin{table}
  \begin{center}
    \caption{Library Design Size Metric Statistics for RQ1
    }
    \label{tab:RQ1Stats}
    \begin{tabular}{lrrrr}
      \hline
      & \textbf{Min.}
      & \textbf{Median ($\bar{x}$)}
      & \textbf{Mean ($\mu$)}
      & \textbf{Max.}\\
      \hline
      FuncCount     & 0.00 &  2.00 &   8.65 &   7,738.00 \\
      physical SLoC & 1.00 & 29.00 & 112.30 & 137,700.00 \\
      logical SLoC  & 0.00 & 17.00 &  63.63 & 129,100.00 \\
      \hline
    \end{tabular}
  \end{center}
\end{table}



\subsubsection{\underline{Findings}}
Table \ref{tab:RQ1Stats} and Figure \ref{fig:mod} report the results of RQ1.
Table \ref{tab:RQ1Stats} summarizes the statistics of our package size metrics, while the figure visualizes the distributions of the FuncCount and SLoC metrics.

From a topological viewpoint, we observe from Table \ref{tab:RQ1Stats} that packages have a median of two functions each.
The result shows that most \texttt{npm} packages expose a small number of API functions from the entry point file.
Furthermore, we observe that the median source code size is 17 lines of code (logical) and 29 lines of code (physical).
Looking at the largest packages, we find that the largest in terms of lines of code is the \texttt{tesseract.js-core}\footnote{Size of the code can be summarized by the description \textit{‘if you’re a big fan of manual memory management and slow, blocking computation, then you’re at the right place!’}  \url{https://github.com/naptha/tesseract.js-core}} package.
This package contains 5,951 functions and 137,700 lines of code.

%\begin{hassanbox}
% Findings suggest \texttt{npm} libraries have a median of 2 functions and up to 29 lines of code.
%\end{hassanbox}

A key observation from Figure \ref{fig:fc} is that micro-packages account for up to 47\% of all \texttt{npm} packages.
We can observe from the figure that 32\% of all packages have no functions (\ie~$FuncCount=0$).
Our manual validation shows that many of these packages belong to the \texttt{facade} micro-package type.
However, some of these micro-packages only load local modules and therefore do not bridge APIs to other external packages.
For example, the \texttt{12env} package contains only a single line:

\begin{lstlisting}[language=Python,
label=code:user-home]
module.exports = require('./lib/config')();\end{lstlisting}

which loads the internal \texttt{config.js} module that performs the computation of this package.
Further analysis reveals \texttt{config.js} to have a single API function with 71 lines of code.
Although the config module is part of the package, we conclude that, according to our micro-package definition, the package still has an excessive interoperability dependency.

% As mentioned in Section \ref{sec:example}, this is an example of the facade library.
%\begin{hassanbox}
% Findings suggest that 32\% of libraries implemented a facade design, hiding its complexity from the library user.
%\end{hassanbox}

Figure \ref{fig:loc} compares the physical sizes of micro-packages with those of the rest of the npm packages.
We observe that micro-packages are statistically smaller in size.
Micro-packages have a lower median number of lines of code (physical: $\bar{x}=5$ LoC; logical: $\bar{x}=3$ LoC).
This result is not surprising and matches our intuition about micro-packages.
From the figure, we also observe that not all micro-packages are small.
For instance, the \texttt{icao}\footnote{A library that looks up hard-coded International Civil Aviation Organization airport codes. The website is at \url{https://github.com/KenanY/icao/blob/master/index.js}} package has 8,255 lines of code.
Further analysis shows that most of this code consists of hard-coded constants.



\begin{hassanbox}
In summary for RQ1, we propose our package size metrics. To answer RQ1: \textit{our findings show that micro-packages account for a significant 47\% of all packages, with 32\% of all packages having no functions, many of which act as facades that create excessive dependencies within the ecosystem. Interestingly, most \texttt{npm} packages are designed to be small, with a median of 2 functions and 29 (physical) lines of code per package.
}
\end{hassanbox}


\subsection{\textbf{RQ2: \RqTwo}}
\subsubsection{\underline{Motivation}}

Our motivation for RQ2 is to understand the extent to which micro-packages form dependency chains across the ecosystem.
In this research question, we calculate the dependency chains of \texttt{npm} packages to survey whether or not micro-packages result in an increase in dependencies.
Therefore, we investigate whether or not micro-packages differ from other packages in dependency complexity.
RQ2 includes testing the hypothesis that \textit{micro-packages have lower dependency complexity than other packages in the npm ecosystem}.

%This is where breaking one critical dependency in the ecosystem will ripple down the dependency chain to all dependent packages.
%
%Results from RQ1 suggest that \texttt{npm} library developers do consider smaller design size in their libraries. Since library designers hide complexity through dependencies there is a concept that the \texttt{npm} libraries form an ecosystem of library dependencies. Therefore, for RQ2, we would like to investigate dependency relations, especially to understand the extent of complexity for micro-libraries. Due to the single-functionality, we generally assume that micro-libraries should include trivial dependency complexities.

\begin{figure}
  \centering
  \includegraphics[width=0.5\textwidth]{npmSNS}
  \caption{Modeling a chain of dependencies for the \texttt{eslint} package as a directed acyclic graph network. For illustration, we added the \texttt{os-hoge} and \texttt{fun-hoge} packages.}
  \label{fig:SUG}
\end{figure}


\subsubsection{\underline{Research Method}}
To answer RQ2, we use network analysis~\cite{wasserman1994social}, statistical analysis of the network-generated metrics, and manual validation to understand our results.
The method is performed in two steps.
The first step involves the construction of an \texttt{npm} network that models the dependency chains in the \texttt{npm} ecosystem.

The second step involves the analysis of the generated \texttt{npm} graph network.
To do this, we propose a set of dependency metrics derived from the network analysis to describe the dependency complexity of an npm package.
We then use box plots and violin plots of each dependency metric to depict the differences between micro-packages and the rest of the packages.
Finally, we test whether micro-packages have greater or lesser dependency complexity than other packages.
To assess the significance of the differences in dependency metrics, we use the Wilcoxon–Mann–Whitney test, which does not require the assumption of a normal distribution, together with Cohen’s \textit{d}~\cite{Romano:2006}.
The null hypothesis ($H_0$) states that \textit{micro-packages have the same dependency complexity as other packages}. The alternative hypothesis ($H_1$) states that \textit{micro-packages have either greater or lesser dependency complexity than other packages}.\footnote{We set the significance level $\alpha$ to 0.01.}
If the null hypothesis is rejected, we then determine which population has the higher mean to state our finding.
Furthermore, to assess the magnitude of the difference, we study the effect size based on Cohen’s \textit{d}. The effect size is considered: (1) small if 0.2 $\leqslant |d| <$ 0.5, (2) medium if 0.5 $\leqslant |d| <$ 0.8, or (3) large if $|d| \geqslant$ 0.8.
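The two statistics named above can be sketched in a few lines. In this sketch, \texttt{mannWhitneyU} pools both samples and ranks them, with tied values sharing their average rank, then returns the two U statistics. Converting U into a p-value, for instance through a normal approximation, is left to a statistics package. \texttt{cohensD} divides the difference of the sample means by the pooled standard deviation. \texttt{effectLabel} applies the thresholds above to $|d|$ and labels values below 0.2 as negligible, a band the text does not name. All function names and sample arrays are hypothetical, and each sample needs at least two values.

```javascript
'use strict';

// Mann-Whitney U: rank the pooled values (ties share their average rank).
function mannWhitneyU(a, b) {
  const pooled = [
    ...a.map((v) => ({ v, g: 0 })),
    ...b.map((v) => ({ v, g: 1 })),
  ].sort((x, y) => x.v - y.v);

  const ranks = new Array(pooled.length);
  for (let i = 0; i < pooled.length; ) {
    let j = i;
    while (j + 1 < pooled.length && pooled[j + 1].v === pooled[i].v) j++;
    const avgRank = (i + j + 2) / 2; // positions i..j hold 1-based ranks i+1..j+1
    for (let k = i; k <= j; k++) ranks[k] = avgRank;
    i = j + 1;
  }

  let rankSumA = 0;
  pooled.forEach((item, idx) => {
    if (item.g === 0) rankSumA += ranks[idx];
  });
  const uA = rankSumA - (a.length * (a.length + 1)) / 2;
  const uB = a.length * b.length - uA;
  return { uA, uB };
}

function mean(xs) {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Unbiased sample variance (needs at least two values).
function sampleVariance(xs) {
  const m = mean(xs);
  return xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
}

// Cohen's d using the pooled standard deviation of the two samples.
function cohensD(a, b) {
  const pooledVar =
    ((a.length - 1) * sampleVariance(a) + (b.length - 1) * sampleVariance(b)) /
    (a.length + b.length - 2);
  return (mean(a) - mean(b)) / Math.sqrt(pooledVar);
}

// Thresholds from the text; |d| < 0.2 labeled 'negligible' (our assumption).
function effectLabel(d) {
  const size = Math.abs(d);
  if (size < 0.2) return 'negligible';
  if (size < 0.5) return 'small';
  if (size < 0.8) return 'medium';
  return 'large';
}

const micro = [1, 2, 3];  // hypothetical metric values
const others = [4, 5, 6];
console.log(mannWhitneyU(micro, others));                                  // { uA: 0, uB: 9 }
console.log(cohensD(micro, others), effectLabel(cohensD(micro, others))); // -3 large
```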

\subsubsection{Modeling the \texttt{npm} ecosystem network}
%We leverage network analysis investigate the size of dependency chains that exist in the \texttt{npm} ecosystem.
Figure \ref{fig:SUG} serves as our graph-based model of dependencies in the \texttt{npm} ecosystem.
In this model, each node represents a unique package and each directed edge represents a dependency.
Formally, we define our \texttt{npm} ecosystem network as a directed graph $G = (V, E)$, where $V$ is a set of nodes (=packages) and $E \subseteq V\times V$ is a set of edges (=dependencies).
Since the graph is directed, edges are ordered pairs: an edge $(u,v) \in E$ means that package $u$ depends on package $v$, and $(u,v)\neq(v,u)$ for $u \neq v$.
For instance, the edge $(\textit{eslint}, \textit{user-home})$ represents the dependency of the \texttt{eslint} package on the \texttt{user-home} package.
We observe that the \texttt{eslint} package is transitively dependent on \texttt{os-homedir} through its dependence on the \texttt{user-home} package.
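The graph model can be stored as an adjacency list keyed by package name, where each entry lists the packages that the key package depends on. The sketch below encodes the edges of Figure \ref{fig:SUG} that the text names, including the illustrative \texttt{os-hoge} and \texttt{fun-hoge}. It finds a package's transitive dependencies with a breadth-first search, which confirms that \texttt{eslint} reaches \texttt{os-homedir} through \texttt{user-home}. The helper names are hypothetical, and the figure's other dependents of \texttt{user-home} are omitted. The visited set also keeps the search finite if a cycle ever appears.

```javascript
'use strict';

// Adjacency list: package -> packages it directly depends on (edges u -> v).
const deps = new Map([
  ['eslint', ['user-home']],
  ['user-home', ['os-homedir', 'os-hoge']],
  ['os-hoge', ['fun-hoge']],
  ['os-homedir', []],
  ['fun-hoge', []],
]);

// Breadth-first search over outgoing edges; returns every package reachable
// from `start` (its direct and transitive dependencies), excluding itself.
function transitiveDependencies(graph, start) {
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const next of graph.get(node) || []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  seen.delete(start);
  return seen;
}

const reach = transitiveDependencies(deps, 'eslint');
console.log([...reach]);              // [ 'user-home', 'os-homedir', 'os-hoge', 'fun-hoge' ]
console.log(reach.has('os-homedir')); // true
```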
305
Furthermore, we define dependency metrics to describe direct and transitive dependencies.
We first introduce the following direct dependency metrics:
\begin{itemize}
  \item \textit{Dependents (Incoming Degree)} - is the number of packages that directly depend on a package (incoming dependencies).
  In Figure \ref{fig:SUG}, \texttt{user-home} has three dependents (\ie~including the \texttt{eslint} package).
  Dependents is a useful measure of package reuse within the ecosystem.

  \item \textit{Dependencies (Outgoing Degree)} - is the number of packages on which a package directly depends (outgoing dependencies).
  In Figure \ref{fig:SUG}, \texttt{user-home} has two dependencies (\ie~\texttt{os-homedir} and \texttt{os-hoge}).

  % \item \textbf{Degree of Depend (InOut Degree)} - is the total Dependents and Dependencies metrics. Therefore, according to Figure \ref{fig:SUG}, \textit{userhome} has an InOut Degree of 2 (\ie~Incoming (\textit{eslint}) and Outgoing (\textit{os-homedir})).

\end{itemize}
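
The two direct metrics are the in-degree and out-degree of a node. The following minimal Python sketch computes both from an edge list that mirrors Figure \ref{fig:SUG}, assuming an edge $(u, v)$ means that package $u$ directly depends on package $v$. The names \texttt{pkg-a} and \texttt{pkg-b} are hypothetical placeholders for the two dependents of \texttt{user-home} that are not named in the text, and \texttt{direct\_metrics} is a hypothetical helper.

\begin{lstlisting}[language=Python]
from collections import Counter

# Toy graph mirroring the running example; "pkg-a" and "pkg-b" are
# hypothetical placeholders for user-home's two unnamed dependents.
edges = [
    ("eslint", "user-home"),
    ("pkg-a", "user-home"),
    ("pkg-b", "user-home"),
    ("user-home", "os-homedir"),
    ("user-home", "os-hoge"),
    ("os-hoge", "fun-hoge"),
]


def direct_metrics(edges):
    """Return (dependents, dependencies) counters keyed by package name.

    An edge (u, v) means u directly depends on v, so it adds one to v's
    dependents (in-degree) and one to u's dependencies (out-degree).
    """
    dependents = Counter(v for _, v in edges)
    dependencies = Counter(u for u, _ in edges)
    return dependents, dependencies


dependents, dependencies = direct_metrics(edges)
print(dependents["user-home"], dependencies["user-home"])  # 3 2
\end{lstlisting}

Because \texttt{Counter} returns 0 for missing keys, packages without incoming or outgoing edges (such as \texttt{eslint} for dependents) need no special handling.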

We then introduce our transitive dependency metrics:

%One of the most used concept in network analysis is centrality to measure network importance [cite]. Hence, we use network centrality measures to locate libraries that are have importance in the ecosystem. In an event of a disruption, our hypothesis is that libraries of central importance and has a shortest path are more likely to cause rippling effects into the ecosystem. Hence, to measure importance we use the following ranking network measures:

\begin{itemize}
  \item \textit{Chain Length (Eccentricity score)} - measures the longest chain of dependencies originating from a package.
  This metric counts the length of the longest path from a given graph node to any of its reachable nodes.
  As shown in Figure \ref{fig:SUG}, \texttt{eslint} has a chain length of three (\ie~$\textit{eslint} \rightarrow \textit{user-home} \rightarrow \textit{os-hoge} \rightarrow \textit{fun-hoge}$).

  %Related, the \textit{Average Chain Distance (ACD)} is a topology baseline measure to the number of chained dependencies for all possible pairs of pairs on a network. %For instance as shown in Figure \ref{fig:SUG}, the eccentricity score is 2 (\ie~$\textit{eslint} \rightarrow \textit{user-home} \rightarrow \textit{os-homedir}$)

  \item \textit{Reach Dependencies ($\#$ of reachable nodes)} - is the number of packages that are reachable from a package through its dependency chains.
  For instance, \texttt{eslint} has four reachable nodes (\ie~$\{$\texttt{user-home}, \texttt{os-homedir}, \texttt{os-hoge}, and \texttt{fun-hoge}$\}$).
  %Note, the library \textit{Open-in-editor} is not reachable in this path. This is due to the acyclic nature of the graph, since its edges are directed.
\end{itemize}
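
Both transitive metrics can be sketched in Python as follows, assuming an adjacency-list representation that mirrors Figure \ref{fig:SUG}. The sketch uses breadth-first search, so the chain length it reports is the graph-theoretic eccentricity over outgoing edges (the largest shortest-path distance to any reachable node); in this example that value coincides with the longest chain described above. The helper name \texttt{transitive\_metrics} is hypothetical.

\begin{lstlisting}[language=Python]
from collections import deque

# Adjacency list of the toy example: package -> direct dependencies.
graph = {
    "eslint": ["user-home"],
    "user-home": ["os-homedir", "os-hoge"],
    "os-homedir": [],
    "os-hoge": ["fun-hoge"],
    "fun-hoge": [],
}


def transitive_metrics(graph, source):
    """Return (chain_length, reach) for `source` using breadth-first search.

    chain_length: largest hop distance from `source` to any reachable node;
    reach: number of distinct packages reachable from `source`.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in dist:  # first visit yields the shortest distance
                dist[dep] = dist[node] + 1
                queue.append(dep)
    chain_length = max(dist.values())
    reach = len(dist) - 1  # exclude the source itself
    return chain_length, reach


print(transitive_metrics(graph, "eslint"))  # (3, 4)
\end{lstlisting}
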
\label{sec:motive2}

\subsubsection{\underline{Dataset}}
%Step one of the RQ2 research method involves the generation of the \texttt{npm} ecosystem network. In order to generate the \texttt{npm} network, we extract all dependencies for each library.
  \begin{lstlisting}[language=xml,
  caption={Snippet from the \textit{package.json} of the \texttt{user-home} package in RQ1 (See Section \ref{sec:rm1}) that shows the header and the dependency relation to the \texttt{os-homedir} library.},
  label=code:snippackagen]
  {
  "name": "user-home",
  "version": "2.0.0",
  ...
  "dependencies": {
  "os-homedir": "^1.0.0"
  },

  }\end{lstlisting}

As shown in Listing \ref{code:snippackagen}, we extract each dependency (\ie~Lines 5 and 6) from the \texttt{package.json} metadata file to generate our \texttt{npm} graph network.
For each package, we collect all runtime, development, and optional dependencies.
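
The extraction step can be sketched in Python as follows. The sketch assumes the standard \texttt{package.json} field names \texttt{dependencies}, \texttt{devDependencies}, and \texttt{optionalDependencies}, leaves out the fields elided in Listing \ref{code:snippackagen}, and uses a hypothetical helper \texttt{extract\_dependencies}; it keeps only package names and discards version ranges.

\begin{lstlisting}[language=Python]
import json

# The user-home manifest from the listing, without the elided fields.
manifest_text = """
{
  "name": "user-home",
  "version": "2.0.0",
  "dependencies": {
    "os-homedir": "^1.0.0"
  }
}
"""

# Runtime, development and optional dependency sections of package.json.
DEP_FIELDS = ("dependencies", "devDependencies", "optionalDependencies")


def extract_dependencies(manifest):
    """Return the sorted names of all packages the manifest depends on."""
    names = set()
    for field in DEP_FIELDS:
        names.update(manifest.get(field, {}).keys())  # version ranges ignored
    return sorted(names)


manifest = json.loads(manifest_text)
print(manifest["name"], extract_dependencies(manifest))  # user-home ['os-homedir']
\end{lstlisting}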

\begin{algorithm}
  \small
  \caption{\texttt{npm} Ecosystem Network Algorithm}\label{alg:genGraph}
  \KwData{$V$ is a package graph node,\\
    $E$ is a dependency edge,\\
    $packageList$ lists all collected packages,\\
    $getDeps(lib)$ lists all dependencies of package $lib$
  }

  Initialize $packageList$ with all packages\;
  Initialize npmGraph $G$\;

  \For{\textbf{each} package $lib$ \textbf{in} $packageList$}{
    \eIf{$lib$ \textbf{in} $G$}{
      node $V_{lib} \gets getNpmGraphNode(G,lib)$\;
    }{
      create node $V_{lib} \gets createNode(lib)$\;
      add node $V_{lib}$ to $G$\;
    }
    $dependencyList \gets getDeps(lib)$\;
    \For{\textbf{each} dependency $dep$ \textbf{in} $dependencyList$}{
      node $V_{dep} \gets getNpmGraphNode(G,dep)$ if $dep$ \textbf{in} $G$, else $createNode(dep)$\;
      add node $V_{dep}$ to $G$ if it is new\;
      create edge $E \gets createEdge(V_{dep},V_{lib})$\;
      add edge $E$ to $G$\;
    }
  }
\end{algorithm}

Algorithm \ref{alg:genGraph} details how we construct the \texttt{npm} graph network.
The key idea in building the \texttt{npm} network is to add each package as a graph node and connect it to the packages already in the network.
In this sense, a node is a package that either depends on, or is depended upon by, other package nodes in the network.
For each package, we first check whether it already exists in the graph (Step 4), for instance because it was added earlier as a dependency of another package.
If it does not exist, we create a new node for it (Steps 7-8). We then iterate over its dependencies, checking whether each of them already exists in the graph, and connect them to the package with dependency edges (Steps 11-15).
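
A minimal Python sketch of this construction is shown below, assuming that the collected packages are given as a mapping from each package name to its list of dependency names. The helper name \texttt{build\_npm\_graph} and the toy input are hypothetical, and the sketch orients each edge from a package to its dependency, matching the $\textit{eslint} \rightarrow \textit{user-home}$ direction used in the chain-length example.

\begin{lstlisting}[language=Python]
def build_npm_graph(packages):
    """Build a directed graph {package: set of direct dependencies}.

    A node is created only the first time a name is seen, whether as a
    package or as a dependency, so existing nodes are reused.
    """
    graph = {}
    for lib, deps in packages.items():
        graph.setdefault(lib, set())      # reuse the node if present, else create it
        for dep in deps:
            graph.setdefault(dep, set())  # dependency node created at most once
            graph[lib].add(dep)           # directed edge lib -> dep
    return graph


# Toy input mirroring the packages named in the running example.
packages = {
    "eslint": ["user-home"],
    "user-home": ["os-homedir", "os-hoge"],
    "os-hoge": ["fun-hoge"],
}
G = build_npm_graph(packages)
num_nodes = len(G)
num_edges = sum(len(deps) for deps in G.values())
print(num_nodes, num_edges)  # 5 4
\end{lstlisting}

Using a set of dependencies per node also prevents duplicate edges when a package lists the same dependency in more than one section.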

\begin{figure*}
  \centering
  \subfigure[Direct Dependents
  ]{\label{fig:DepIn}
    \includegraphics[width=0.45\columnwidth]{Dependents}
  }
  \subfigure[Direct Dependencies
  ]{\label{fig:DepOut}
    \includegraphics[width=0.45\columnwidth]{Dependencies}
  }
  \subfigure[Transitive Chain Length
  ]{\label{fig:ecc}
    \includegraphics[width=0.45\columnwidth]{Ecc}
  }
  \subfigure[Transitive Reach Dependencies
  ]{\label{fig:rd}
    \includegraphics[width=0.45\columnwidth]{RD}
  }
  \caption{Results for RQ2 direct and transitive dependency metrics for the \texttt{npm} ecosystem network}
  \label{fig:allDep}
\end{figure*}

\begin{table}
  \begin{center}
    \caption{Summary Statistics of the \texttt{npm} ecosystem network for RQ2}
    \label{tab:datasetRQ2}
    \begin{tabular}{lr}
      \hline
      & \textbf{\texttt{npm} ecosystem network} \\ \hline
      \# nodes & 169,964 \\
      \# edges & 986,075 \\
      Average Chain Length & 5.89 \\ \hline
    \end{tabular}
  \end{center}
\end{table}

Table \ref{tab:datasetRQ2} shows that our \texttt{npm} graph network comprises 169,964 graph nodes and 986,075 graph edges, with an average chain length of 5.89.
We implement the algorithm as Python scripts and store the \texttt{npm} network in the Neo4j graph database, accessed through \textit{py2neo} and \textit{neo4j} \cite{neo4j}.
We use the \textit{gephi} tool \cite{gephi} to compute the dependents, dependencies, and chain length metrics.
Furthermore, we use a Depth First Search (DFS) query in Neo4j to compute the Reach Dependencies metric.
For a package node $v$ with package name $v.name$, we use the following Neo4j Cypher query\footnote{To keep computation time tractable, we bound the search for reachable nodes to four chains of dependencies ($x=5$). As the average chain length is 5.89 (see Table \ref{tab:datasetRQ2}), this bound should have minimal effect on the results.}:
%\begin{quote}
  \texttt{MATCH (v \{name: v.name\})-[*..x]->(w) WHERE v <> w RETURN v.name, count(DISTINCT w)}
%\end{quote}
to find all packages that are within its dependency reach in the ecosystem.
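
To illustrate the effect of the traversal bound, the following Python sketch counts the packages reachable within at most \texttt{max\_hops} edges, analogous to the \texttt{[*..x]} bound of the query above. It is a hypothetical stand-in rather than our Neo4j implementation: it expands the graph level by level (breadth-first), so every package is found at its shortest hop distance, and the chain of packages \texttt{p0} to \texttt{p6} is a toy example.

\begin{lstlisting}[language=Python]
def bounded_reach(graph, source, max_hops):
    """Count distinct packages reachable from `source` within `max_hops` edges.

    Packages are expanded level by level (breadth-first), so each one is
    discovered at its shortest hop distance; packages reachable only through
    longer chains are not counted.
    """
    seen = {source}
    frontier = [source]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for dep in graph.get(node, []):
                if dep not in seen:
                    seen.add(dep)
                    next_frontier.append(dep)
        frontier = next_frontier
    return len(seen) - 1  # exclude the source itself


# Hypothetical chain p0 -> p1 -> ... -> p6 (six edges).
chain = {f"p{i}": [f"p{i + 1}"] for i in range(6)}
print(bounded_reach(chain, "p0", 5), bounded_reach(chain, "p0", 10))  # 5 6
\end{lstlisting}

In the toy chain, a bound of five hops misses only \texttt{p6}, which shows how a depth bound undercounts packages whose dependency chains are longer than the bound.
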
%Step two requires the analysis of our collected dataset. Thus, similar to RQ1, we perform statistical analysis of the metrics using a histogram. To capture all ranges of functions, the histogram groups libraries systematically.

\begin{table}
  \begin{center}
    \caption{Dependency metric statistics of the collected \texttt{npm} packages for RQ2}
    \label{tab:RQ2Ego}
    \begin{tabular}{lrrrr}
      \hline
      & \textbf{Min.} & \textbf{Median ($\tilde{x}$)} & \textbf{Mean ($\mu$)} & \textbf{Max.} \\ \hline
      Dependents & 0.00 & 3.00 & 5.81 & 22,750.00 \\
      Dependencies & 0.00 & 0.00 & 5.80 & 41,550.00 \\
      Chain Length & 0.00 & 0.00 & 1.03 & 20.00 \\
      Reach Dependencies & 0.00 & 0.00 & 563.90 & 121,600.00 \\ \hline
    \end{tabular}
  \end{center}
\end{table}

%\begin{table}
% \begin{center}
%   \caption{Global Chained Dependency statistics of the collected \texttt{npm}
%   }
%   \label{tab:RQ2Global}
%   \begin{tabular}{llrrrrc}
%     \hline
%     \multirow{1}{*}{}
%     & \multirow{1}{*}{\textbf{Metric}}
%     & \multirow{1}{*}{\textbf{Min.}}
%     & \multirow{1}{*}{\textbf{Median ($\bar{x})$}}
%     & \multirow{1}{*}{\textbf{Mean ($\mu$)}}
%     & \multirow{1}{*}{\textbf{Max.}}\\
%     Ego& Dependents   &  0.00 &     3.00  &   5.81 & 22,750.00 \\
%     &Dependencies &  0.00 &     0.00   &  5.80 & 41,550.00   \\\hline
%     Global &Eccentricity  Score&  0.00 &  0.00 &  1.03 & 20.00 \\
%     &Reachable Dep. & 0.00  &  0.00 &    563.90 &  121,600.0    \\\hline
%   \end{tabular}
% \end{center}
%\end{table}

%\begin{table*}
% \begin{center}
%   \caption{Dependency metric statistics of the collected \texttt{npm}
%   }
%   \label{tab:RQ2All}
%   \begin{tabular}{llrrrrc}
%     \hline
%     \multirow{1}{*}{}
%     & \multirow{1}{*}{\textbf{Metric}}
%     & \multirow{1}{*}{\textbf{Min.}}
%     & \multirow{1}{*}{\textbf{Median ($\bar{x})$}}
%     & \multirow{1}{*}{\textbf{Mean ($\mu$)}}
%     & \multirow{1}{*}{\textbf{Max.}}\\
%     micro-library & Dependents  &  0.00 &    7.00 &  10.84 &  276.00  \\
%     & Dependencies & 0.00  & 0.00  &  19.35 & 41,550.00   \\
%     &Eccentricity  Score & 0.00 &  0.00 &  0.81 & 19.00  \\
%     &Reachable Dep.  & 0.00 &  0.00 &  1,370.00 & 121,700.00 \\\hline
%     regular-library & Dependents  &  0.00 &   5.00  & 9.22 & 276.00 \\
%     & Dependencies &   0.00   &  0.00  &  16.91  & 41,550.00  \\
%     &Eccentricity  Score &0.00 & 0.00 &  0.85 & 20.00  \\
%     &Reachable Dep. & 0.00  &  0.00 &1,211.00 & 121,800 \\\hline
%   \end{tabular}
% \end{center}
%\end{table*}

\begin{figure*}
  \centering
  \subfigure[Direct Dependents
  ]{\label{fig:DeppComp}
    \includegraphics[width=0.4\columnwidth]{comDeps}
  }
  \subfigure[Direct Dependencies
  ]{\label{fig:DepComp}
    \includegraphics[width=0.4\columnwidth]{comDepcies}
  }
  \subfigure[Transitive Chain Length
  ]{\label{fig:EccComp}
    \includegraphics[width=0.4\columnwidth]{comEcc}
  }
  \subfigure[Transitive Reach Dependencies
  ]{\label{fig:RDComp}
    \includegraphics[width=0.4\columnwidth]{comRD}
  }
  \caption{Micro-package dependency metric comparisons against the other packages in the \texttt{npm} ecosystem network.}
  \label{fig:RQ2Comp}
\end{figure*}

\subsubsection{\underline{Findings}}
Table \ref{tab:RQ2Ego} shows the summary statistics for the dependency metrics of the \texttt{npm} network.
From a topological viewpoint, we observe that packages have a median of three dependents and no dependencies.
Through manual validation, we find that the \texttt{everything} package has the most dependents in the network.
Interestingly, many JS developers label this package a \textit{`hoarder'}\footnote{Discussion at \url{https://github.com/jfhbrook/hoarders/issues/2}} library.
With 22,750 incoming dependency relations from other libraries, this package offers no useful functionality other than providing access to every other package in the network.
Conversely, we observe that the widely used \texttt{mocha}\footnote{Website at \url{https://github.com/mochajs/mocha}} testing framework is the most depended-upon package (\ie~Dependencies = 41,570). This supports the notion that \texttt{npm} library developers do run tests on their packages.

Figure \ref{fig:allDep} shows the distributions of the dependency metrics across the \texttt{npm} ecosystem.
The key observation from Figure \ref{fig:DepIn} is that only 9\% of all packages have no dependents (\ie~Dependents = 0); conversely, almost 91\% of packages are depended upon by at least one other package in the ecosystem.
Furthermore, more than 12\% of the packages have about 11 to 15 dependents.
Figure \ref{fig:DepOut} confirms the results in Table \ref{tab:RQ2Ego}, showing that 65\% of packages in the \texttt{npm} ecosystem have no dependencies on other packages.

In Figures \ref{fig:ecc} and \ref{fig:rd}, our transitive dependency metrics show that around 60\% of packages are not transitively reachable from another package.
Through manual validation, we identify two packages that are critical to the \texttt{npm} ecosystem.
The network analysis shows that \texttt{fs-walk}\footnote{Website at \url{https://github.com/confcompass/fs-walk}} has the longest transitive dependency chain (\ie~Chain Length = 20).
This library provides synchronous and asynchronous recursive directory browsing.
Its long dependency chain provides evidence that it is a critical package within the \texttt{npm} ecosystem, as it is part of multiple dependency chains.
We also find that the popular AST-based pattern checker \texttt{eslint}\footnote{\url{https://www.npmjs.com/package/eslint}} is the package most reachable by other packages in the \texttt{npm} ecosystem.

Testing our hypothesis that \textit{micro-packages have either greater or lesser dependency complexity} shows that the dependency complexity of micro-packages is not trivial: some micro-packages have chain lengths and reach dependencies comparable to those of the rest of the packages.
Although Figure \ref{fig:RQ2Comp} shows no visible transitive dependency differences between micro-packages and the rest of the packages, the results of the Wilcoxon--Mann--Whitney tests show otherwise.
Table \ref{tab:RQ2TComp} shows that micro-packages are more likely to be reused, as indicated by statistically more dependents than the rest of the packages (\ie~H$_0$ accepted for Dependents, with the greater mean for micro-packages).
Furthermore, the results show that micro-packages are more likely to reach other packages (\ie~H$_0$ accepted for Reach Dependencies, with the greater mean for micro-packages) in the \texttt{npm} ecosystem.
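
For reference, the Wilcoxon--Mann--Whitney test can be sketched in Python with the normal approximation, as below. This is an illustrative sketch under simplifying assumptions rather than the implementation used in this study: tied values receive average ranks, the tie and continuity corrections to the variance are omitted, and the helper names \texttt{average\_ranks} and \texttt{mann\_whitney\_u} and the sample values are hypothetical. For populations as large as those compared here, a statistics library would normally be used instead.

\begin{lstlisting}[language=Python]
import math


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for sample x, p-value). No tie or continuity
    correction is applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                  # rank sum of sample x
    u1 = r1 - n1 * (n1 + 1) / 2           # U statistic for sample x
    mu = n1 * n2 / 2                      # mean of U if both samples share a distribution
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p


# Hypothetical samples, e.g. a metric for micro-packages vs. other packages.
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, p)
\end{lstlisting}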

%Furthermore, the results show micro-packages are more likely to have shorter dependency chains than the rest of the packages (\ie~Chain length accepted H$_0$ for rest of packages) while a micro-package is more likely to reach other packages (\ie~Reach Dependencies accepted H$_0$ for micro-packages) in the \texttt{npm} ecosystem.
%We report that the results are reported with a (S) small effect size for all metrics.
%Nevertheless, we conclude that the dependency complexities of micro libraries are not trivial as we had previously assumed.

\begin{table}
  \begin{center}
    \caption{Statistical test results of dependency metrics for micro-packages against the rest of the packages. The ($>\mu$) column shows the package type with the greater mean between the two populations.}
    \label{tab:RQ2TComp}
    \begin{tabular}{lcccc}
      \hline
      & \textbf{$H_0$} & \textbf{$>\mu$} & \textbf{p-value} & \textbf{\textit{d}} \\ \hline
      Dependents & \colorbox{green!25}{accept} & micro-package & $<0.01$ & 0.09 (S) \\
      Dependencies & \colorbox{red!25}{reject} & identical & $<0.13$ & 0.13 (S) \\
      Chain Length & \colorbox{green!25}{accept} & other packages & $<0.001$ & 0.01 (S) \\
      %Reach & & && \\
      Reach Dep. & \colorbox{green!25}{accept} & micro-package & $<0.002$ & 0.01 (S) \\ \hline
    \end{tabular}
  \end{center}
\end{table}

\begin{hassanbox}
In summary, for RQ2 we propose and build a dependency network model of the \texttt{npm} ecosystem.
To answer RQ2: \textit{our findings depict micro-package dependencies as