The success of machine learning (ML) is leading to an ever-increasing demand for data scientists. However, software developers who try to fill this gap, and who have already acquired the necessary scientific ML background, still face the challenge of learning to use the related ML APIs correctly. The importance of API learnability and usability has been emphasized in many influential studies (Robillard, 2009; Robillard and DeLine, 2011; de Souza and Bentolila, 2009; Wang and Godfrey, 2013; Zibran et al., 2011). Therefore, we aimed to find out whether and how the learning curve can be flattened and correct use can be fostered.
Using scikit-learn (Pedregosa et al., 2011), a widely adopted ML library, as a case study, we address the following research questions:
RQ1. Can the complexity of the API be reduced without affecting its existing users?
RQ2. Which deviations from established principles of good API design make learning and correct use of ML APIs hard, even for developers who have the required mathematical and software engineering skills?
RQ3. Can learnability and correct use be improved by a different design of the API?
RQ4. Given an improved API design, can an implementation be derived and maintained automatically, based on the official library? What precisely can be automated, and how much human effort is still needed?
2. API Complexity Reduction
Existing ML APIs, such as scikit-learn, follow a “one size fits all” approach, irrespective of user expertise. We questioned this approach, assuming that novices need just a subset of the API and are hence distracted and confused by additional functionalities (Zibran et al., 2011). We consider novices to be persons who aim to develop an ML application and have the required mathematical and software engineering background, but no experience using the particular ML API.
2.1. Empirical Results
To verify our assumption, we aimed to quantify the amount of functionality that novice users do not need. However, lacking a representative pool of programs written only by novices, we started with a broader analysis of all usages that we could find. We did this in two steps:
- API statistics:
We counted the number of API elements in scikit-learn version 0.24.2. In Fig. 1, the dark blue bars indicate the total numbers of classes, functions, and parameters, whereas the red bars show the respective numbers of public elements.
- Usage statistics:
By analyzing 92,402 programs contributed to Kaggle competitions (https://www.kaggle.com/competitions) by users of any skill level, we found 41,867 that used scikit-learn. Within this pool, we counted the number of uses of each class, function, and function parameter. Based on this raw data, we computed the additional two bars shown in Fig. 1. The green bars indicate the number of classes, functions, and parameters actually used in Kaggle competitions. The light blue bar indicates useful parameters, which are set to at least two different values. We call parameters that are always set to the same value useless, since they can be replaced by a local variable in the containing function and eliminated from the API. An example of a useless parameter is copy_X (an optional parameter with default True) of the Lasso model: it is set to True in all 748 cases in which the constructor is invoked. The parameter is not unused, though, since users explicitly passed the value True three times.
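The distinction between unused and useless parameters can be illustrated with a small analysis sketch. This is a hypothetical simplification of such usage mining, not the actual tooling: it parses call sites with Python's ast module and flags keyword parameters that are always passed the same literal value.

```python
import ast
from collections import defaultdict

def keyword_value_counts(source: str, callee: str):
    """Collect the distinct literal values passed for each keyword of `callee`."""
    values = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == callee):
            for kw in node.keywords:
                if isinstance(kw.value, ast.Constant):
                    values[kw.arg].add(kw.value.value)
    return values

def useless_parameters(values):
    """A parameter is 'useless' if every observed call passes the same value."""
    return {name for name, vals in values.items() if len(vals) == 1}

# Toy corpus standing in for the mined Kaggle programs.
corpus = """
Lasso(alpha=0.1, copy_X=True)
Lasso(alpha=1.0, copy_X=True)
Lasso(alpha=0.5, copy_X=True)
"""
counts = keyword_value_counts(corpus, "Lasso")
print(useless_parameters(counts))  # copy_X is always True -> flagged as useless
```

A real analysis would additionally resolve aliased imports and track non-literal arguments; the sketch only shows the core counting idea.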
From Fig. 1 we can infer how much the API's complexity would shrink if users were presented with just the used classes and functions and the useful parameters: even among the public elements, 56 classes (21%) and 597 functions (47%) are unused, while 2467 parameters (57%) are useless. These API elements can be eliminated from the user-visible API without affecting the functionality of the analysed applications. This benefits novices and experts alike: the reduction is huge and can be a significant time-saver when learning a new API.
2.2. Plausibility Check
We wanted to be sure that the above results are not biased by the choice of Kaggle competitions as sample data, but can be generalized to other ML applications. To this end, we inspected unused classes and functions, and useless parameters, in search of reasons why elements are generally not needed for application development.
- Unused classes:
We found that most unused classes are indeed clearly irrelevant for application programmers, being designed just for extending the API by additional models and internal code reuse. Examples include mixins, base classes, and a few unused, exotic ML models, like MultiTaskLasso.
- Unused functions:
Similarly, several groups of unused functions are intended for API extensions only, such as validation functions for inputs. Some are related to sample datasets (fetch_california_housing), which are clearly not used in applications that include their own data. However, there are also unused core ML functions, often composite ones (fit_transform, fit_predict, etc.) that exist for API consistency and which we do not want to remove for this reason. We suspect they are not called because users call the component functions separately (e.g., fit or transform). Finally, some introspection functions (e.g., decision_function) are not used either and also deserve further investigation.
- Unused and useless parameters:
Given the sheer number of parameters, we have not yet completed categorizing them. However, our findings to date confirm the generality of the empirical results, also for parameters.
Since our plausibility check suggests that the results obtained for Kaggle can be generalized, we conclude that, for scikit-learn, the complexity of the API can indeed be reduced without affecting existing users (RQ1), roughly by the numbers indicated in Fig. 1, through elimination of unused and useless elements from the public API.
3. API Redesign
The manual inspection done for our plausibility check revealed aspects that were not evident from the statistical data. We found that even the used and useful part of the API is subject to design and usability problems that deserve specific attention. This motivated our research questions RQ2 and RQ3. In the following, we address these questions in turn, enumerating the usability issues we found (RQ2) and proposing related design improvements (RQ3).
3.1. Proliferation of Primitives
One of the design principles of scikit-learn is non-proliferation of classes: classes are only added for learning algorithms, and all other concepts are expressed by primitive types or by types from other scientific libraries, like numpy or scipy (Buitinck et al., 2013). This, however, results in long lists of parameters of primitive types, contradicting established design principles. McConnell's guideline, based on psychological research, is to use at most seven parameters (McConnell, 2004, p. 178). Martin (Martin, 2009, p. 288) suggests keeping parameter lists as short as possible and avoiding more than three parameters. While the removal of unused parameters (Sec. 2.1) alleviates the problem, it does not eliminate it. This section discusses related issues and ways to address them without sacrificing the generality of the API:
- Tangled concerns:
The long parameter lists of the constructors of ML models hide the important hyperparameters among parameters that are unrelated to the core ML concern, such as debugging or performance options. For example, the constructor of the SVC model has 15 parameters (all of which are used) for a multitude of concerns: kernel is a hyperparameter, while verbose is for debugging, and cache_size concerns performance.
Such parameters can be removed from the constructors of ML models, leaving only the hyperparameters and thus drastically increasing their visibility. If needed, the non-ML attributes of the created object can still be set to non-default values before calling its fit method to start training.
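As a sketch of this idea (WrappedSVC is a made-up name, not part of scikit-learn or any library), the constructor exposes only hyperparameters, while the debugging and performance options become plain attributes that can be overridden before training:

```python
class WrappedSVC:
    """Illustrative wrapper: hyperparameters in the constructor, everything
    else as attributes with sensible defaults."""

    def __init__(self, kernel="rbf", C=1.0):  # hyperparameters only
        self.kernel = kernel
        self.C = C
        self.verbose = False      # debugging concern, not a hyperparameter
        self.cache_size = 200.0   # performance concern, not a hyperparameter

    def fit(self, X, y):
        # A real wrapper would forward everything to sklearn.svm.SVC here,
        # passing both the hyperparameters and the attribute values.
        return self

model = WrappedSVC(kernel="poly")
model.cache_size = 500.0  # tweak the non-ML option only when actually needed
model.fit([[0.0], [1.0]], [0, 1])
```

The constructor signature now communicates exactly which choices matter for the learning behaviour; everything else is discoverable but out of the way.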
- Implicit dependencies:
Flat lists of inter-dependent parameters allow combinations of parameter values that are semantically wrong. For instance, it is non-obvious that the degree parameter of the SVC constructor should only be set if the kernel parameter is set to 'poly'. The invocation SVC(kernel='linear', degree=3) is possible but almost surely an error, since the degree value will be ignored. However, SVC(kernel='poly', degree=3) is correct.
Grouping dependent parameters in one object, called a parameter object in (Fowler, 2018), improves understandability (Lacerda et al., 2020), makes dependencies clear, and prevents accidental misuse of the API. In our example, we create different variants of Kernel objects, with a degree parameter only in the 'poly' variant. Via SVC(Kernel.linear) and SVC(Kernel.poly(degree=3)) we can then create legal SVC configurations, while the erroneous example above becomes impossible to express.
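A minimal sketch of such parameter objects follows; the class names and their nesting are illustrative, not scikit-learn's API or the exact shape of our wrapper:

```python
from dataclasses import dataclass

class Kernel:
    """Hypothetical parameter object: each kernel variant carries only the
    parameters that are meaningful for it."""

    @dataclass(frozen=True)
    class Linear:
        pass  # no degree parameter exists here, so none can be ignored

    @dataclass(frozen=True)
    class Poly:
        degree: int = 3

class SVC:
    """Illustrative stand-in for a wrapped SVC, not the real class."""
    def __init__(self, kernel):
        self.kernel = kernel

SVC(Kernel.Linear())            # legal
SVC(Kernel.Poly(degree=3))      # legal: degree belongs to the poly kernel
# SVC(Kernel.Linear(degree=3))  # TypeError: Linear has no 'degree'
```

The erroneous combination now fails at the call site with a clear TypeError instead of being silently ignored.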
- Magic strings:
Scikit-learn often uses string parameters that only have a limited set of legal values. (In 2007, when scikit-learn was initially released, Python offered no alternative; later, the string types had to be kept for backwards compatibility.) This has the downside that statically ensuring use of a valid value is difficult. The problem is similar to the anti-pattern of using integers to encode such limited value sets (Bloch, 2017; Martin, 2009).
The problem is worsened by the fact that scikit-learn sometimes does not validate parameters. For example, the criterion parameter of the DecisionTreeClassifier should be either 'gini' or 'entropy'. However, the creation of an instance with the call DecisionTreeClassifier(criterion='giny') is silently accepted, but when we later call the fit method on the model, the program crashes. The error message KeyError: 'giny' gives only a vague hint about the source of the error.
Using enums in conjunction with type hints (https://docs.python.org/3/library/typing.html, added in Python 3.5) instead lets static analysis tools ensure that a correct value is passed.
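A sketch of this enum-based design is shown below; the wrapper class is illustrative (in real scikit-learn, criterion remains a string), and the runtime check is our own addition that makes misuse fail fast at construction time:

```python
from enum import Enum

class Criterion(Enum):
    GINI = "gini"
    ENTROPY = "entropy"

class DecisionTreeClassifier:
    """Illustrative wrapper: the type hint enables static checks, and the
    isinstance check fails at construction instead of later in fit()."""

    def __init__(self, criterion: Criterion = Criterion.GINI):
        if not isinstance(criterion, Criterion):
            raise TypeError(f"criterion must be a Criterion, got {criterion!r}")
        self.criterion = criterion

clf = DecisionTreeClassifier(criterion=Criterion.ENTROPY)
# DecisionTreeClassifier(criterion="giny")  # rejected immediately with a
#                                           # clear TypeError, not a KeyError
```

A type checker such as mypy would already flag the string argument before the program is ever run.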
The proposed introduction of parameter objects and enums would increase the number of classes, contrary to the non-proliferation of classes design of scikit-learn. It remains to be investigated which design is easier to learn and use correctly. We are confident that, with a careful redesign, (1) the improved error-resilience of the redesigned API is worth the effort and (2) the increase in the size of the API will be outweighed by the usage-based reduction discussed in Sec. 2.
3.2. Module Structure
Module structure is another target for API redesign: models for supervised learning are grouped by model category, such as Support Vector Machine (SVM) models. The task they solve is only indicated by a suffix of the class name. For example, SVC is an SVM model for classification, and SVR is one for regression. Similarly, metrics for model evaluation are grouped in sklearn.metrics, regardless of task, even though the sets of metrics applicable to different tasks are disjoint.
However, developers are interested in fulfilling a task rather than exploring a particular family of algorithms. Therefore, we suggest grouping models and metrics by task, as in Apache Spark MLlib (Meng et al., 2016). This speeds up the search for models applicable to a task, or for metrics suited to evaluate their performance, and makes it obvious which models and metrics can be used interchangeably. The importance of aligning software components with tasks has already been discussed in (Kersten and Murphy, 2006; Wang et al., 2013; Thung et al., 2013).
4. API Wrapping
The design improvements discussed so far should not be misunderstood as a proposal for redesigning scikit-learn, a high-quality, widely used ML library. Neither do we want to reinvent the wheel, nor break the huge number of programs using scikit-learn.
Our vision is to wrap the existing scikit-learn library in an API that implements the suggested improvements, making it better suited for novices. However, the size of the scikit-learn library prohibits manual wrapper creation and maintenance. Thus, our approach raises two challenges, summarized in RQ4: (1) automated initial wrapper creation and (2) automated update of the wrappers whenever a new version of the scikit-learn library is released.
4.1. Initial Wrapper Creation
As a basis for automation, the manually derived design improvements outlined previously need to be expressed in a machine-readable form that specifies precisely which changes should be performed on which API elements.
Table 1. Annotation types and their effects:
- @remove: Remove an unused or unnecessary API element.
- @attribute: Remove a parameter from the constructor of a model and keep it only as a model attribute.
- @group: Group dependent parameters as a parameter object.
- @enum: Replace a string type with an enum type.
- @move: Move a class or global function to another module.
We do this by annotating elements of the scikit-learn API. Each annotation type (Table 1) corresponds to one of the improvements outlined in Sec. 2 and 3. A user can attach these annotations to classes, functions, and parameters in a web-based annotation editor (https://github.com/lars-reimann/api-editor). To reduce the workload of annotators and guide them to relevant elements, an initial set of @remove annotations is created automatically from the usage data described in Sec. 2.1. Based on the annotations and the original scikit-learn API, the new API is inferred and corresponding wrappers (Gamma et al., 1994) are generated.
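To give a flavour of annotation-driven generation, a tiny sketch might turn an annotated parameter list into a reduced wrapper signature. All names and the annotation encoding here are made up for illustration; the real generator works on the full API model and emits complete wrapper code:

```python
def wrapper_signature(func_name, params, annotations):
    """Emit the signature of a wrapper, dropping @remove-annotated parameters."""
    kept = [p for p in params if annotations.get(p) != "@remove"]
    return f"def {func_name}({', '.join(kept)}):"

# copy_X was found to be useless (Sec. 2.1), so an @remove annotation
# was attached to it; the generated wrapper simply omits it.
params = ["alpha", "max_iter", "copy_X"]
annotations = {"copy_X": "@remove"}
print(wrapper_signature("lasso", params, annotations))
# -> def lasso(alpha, max_iter):
```

The generated body would then forward the kept parameters, plus the fixed value of each removed one, to the original scikit-learn function.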
4.2. Wrapper Update
The authors of scikit-learn follow a strict release policy aimed at not breaking existing clients. API elements scheduled for removal are deprecated two versions in advance. Renamings and moves are implemented by deprecating the existing element and adding a new one. Similarly, changes of legal parameter values are implemented by deprecating the current value and adding a new one. This addition of new elements and forwarding from old to new ones is basically a wrapping step. Thus, a lightweight evolution policy can reflect the deprecations, additions, and deletions from scikit-learn in our adapted API, delegating the task of updating client programs to users of our API. Additions and deletions can be identified by our code analysis (Sec. 2.1). Deprecation information can be extracted automatically from the scikit-learn documentation using a mix of parsing and rudimentary natural language processing (NLP). This avoids having to repeat the initial manual annotation work for future releases; only a fraction of the added elements might need new annotations.
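For instance, scikit-learn's numpydoc-style docstrings mark deprecations with a `.. deprecated::` directive, which the structured part of the extraction can detect without any NLP. A minimal sketch (the docstring below is abridged in the style used for the deprecated normalize parameter, not quoted verbatim):

```python
import re

# Sphinx/numpydoc deprecation directive: ".. deprecated:: <version>"
DEPRECATED = re.compile(r"\.\.\s+deprecated::\s*([\d.]+)\s*\n\s*(.*)")

docstring = """
normalize : bool, default=False
    Whether to normalize the regressors before fitting.

    .. deprecated:: 1.0
       `normalize` was deprecated in version 1.0 and will be removed in 1.2.
"""

match = DEPRECATED.search(docstring)
version, note = match.group(1), match.group(2)
print(version)  # version in which the element was deprecated
```

The free-text note after the directive (here, the planned removal version) is where rudimentary NLP comes in.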
5. Related Work
Among the rich body of literature about API usability, contributions that were not cited so far but have influenced our work include the empirical study of API usability problems by Piccioni et al. (2013), the evaluation of API usability using Human-Computer Interaction methods by Grill et al. (2012), general recommendations for API designers by Myers and Stylos (2016), and the JavaDoc alternative by Stylos et al. (2009), which takes usage data of an API as input to create better documentation. However, we found only one other assessment of APIs in the context of ML, namely by Király et al. (2021), who compiled the interfaces and design patterns underlying scikit-learn and other ML APIs and derived high-level guidelines for ML API design, but did not investigate their practical implications for a concrete framework such as scikit-learn. In addition, their scitype concept makes the notion of tasks explicit, which provides a formal basis for the task-oriented restructuring that we propose in Sec. 3.2.
ML for novices is the focus of the Simple-ML project, which uses the API proposed here as a back-end for a user-friendly Domain Specific Language (DSL) for ML. The Simple-ML API and DSL are embedded into an Integrated Development Environment (IDE) for ML workflows. The entire system is tailored to the needs of data science novices, to make ML more accessible.
Our API wrapping approach described in Sec. 4 is similar to the one of Hossny et al. (Hossny et al., 2016), who use semantic annotations to hide proprietary APIs of different cloud providers behind a generic one, in order to combat vendor lock-in. Wrappers are automatically created so they can easily be kept up-to-date.
The vast literature about techniques to tackle the evolution of an API and automatically keep client code up-to-date is compiled in a recent survey (Lamothe et al., 2021). However, our focus is not on updating client code but on updating our generated wrappers. For this, we need to be able to detect changes in scikit-learn and identify conflicts with respect to the changes implied by our annotations. An extensive review of related change detection and change merging approaches can be found in (Mens, 2002).
6. Threats to Validity
For now, we only pulled usage data from Kaggle programs that were linked to competitions. This can skew results, since we got groups of programs that solve the same problem and, therefore, might employ similar methods. Moreover, some competitions were very popular, so we could retrieve 1000 entries (the maximum that can be retrieved via Kaggle's REST API), whereas others only had a handful.
In the meantime, a new version of scikit-learn has been released. The few API changes it contains (https://scikit-learn.org/stable/whats_new/v1.0.html) slightly change the absolute numbers but not the percentages reported in Sec. 2.
7. Future Plans
To eliminate possible bias (Sec. 6), we want to extend the gathered Kaggle data with additional scikit-learn usage data from GitHub repositories. Afterwards, our plausibility check (Sec. 2.2) for parameters needs to be completed. The suggestions for API redesign (Sec. 3) initially had to be entered manually in the annotation editor. We have started automating this by extracting information from the documentation of scikit-learn, using a mix of parsing (for the structured parts of the documentation) and natural language processing (for the rest). Task-oriented restructuring of modules (Sec. 3.2) will be partially automated by checking the suffix of model names (e.g., Classifier vs. Regressor). The annotation editor (Sec. 4) is fully implemented. Annotation of scikit-learn is in progress, and the added annotations will be used as input for wrapper generation and as a test oracle for automated extraction of necessary design changes from the documentation. Finally, the resulting API needs to be carefully evaluated through usability studies to determine whether it really yields the conjectured improvements.
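The suffix-based assignment of models to task-oriented modules can be sketched as follows; the task names and the fallback handling are our own illustration of the heuristic, not the implemented tool:

```python
def task_of(model_name: str) -> str:
    """Guess the task of a supervised model from its class-name suffix."""
    if model_name.endswith("Classifier"):
        return "classification"
    if model_name.endswith("Regressor"):
        return "regression"
    return "unknown"  # e.g. SVC/SVR: such names still need manual annotation

print(task_of("DecisionTreeClassifier"))  # classification
print(task_of("RandomForestRegressor"))   # regression
print(task_of("SVC"))                     # unknown -> annotate by hand
```

This is why the automation is only partial: abbreviated names such as SVC and SVR carry the task in a non-uniform way and fall through to manual annotation.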
8. Conclusion
In this paper, we have presented a novel approach for easing the learning and correct use of a complex API. Unlike approaches to learnability that focus on improving the documentation of an API (Robillard and DeLine, 2011; Robillard, 2009), we investigated whether API complexity can be reduced in order to avoid overwhelming application developers. Based on an analysis of the API's usage in 41,867 programs, we showed that 21% of classes, 47% of functions, and 57% of parameters can be eliminated without affecting any of the analysed programs (Fig. 1).
Analysing the code and documentation of the remaining API elements, we found a proliferation of elements of primitive types, which impedes learnability and invites API misuse. Tangling of different concerns in long parameter lists hides the essential hyperparameters. Hard-to-debug errors arise from implicit dependencies between parameters and from hidden narrowing of parameter domains in code that behaves correctly only on a subset of the values that can be passed (Sec. 3).
We showed that design improvements inspired by refactorings can solve these issues. However, since we cannot refactor a third-party API, we proposed an alternative approach: semi-automated API wrapping. We showed how a layer of wrappers that implement an improved API can be built and kept up-to-date with minimal manual effort, based on analysis of the original library’s source code and documentation, usage statistics, and code generation.
The only ML-specific argument in our discussion is the distinction between hyperparameters and parameters for other concerns. This notion can be generalized to a distinction between domain-specific parameters and others. Hence, semi-automated API wrapping should be applicable to other domains and APIs, as a general way to reduce API complexity and improve API design.
Acknowledgements. We want to thank the reviewers for their very constructive comments and inspiring suggestions for future work. This work was partially funded by the German Federal Ministry of Education and Research (BMBF) under project Simple-ML (01IS18054).
References
- Bloch, J. Effective Java. 3rd edition, Addison-Wesley Professional, 2017.
- Buitinck, L., et al. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pp. 108–122, 2013.
- de Souza, C., and Bentolila, D. Automatic evaluation of API usability using complexity metrics and visualizations. In 2009 31st International Conference on Software Engineering – Companion Volume, pp. 299–302, 2009.
- Fowler, M. Refactoring: Improving the Design of Existing Code. 2nd edition, Addison-Wesley Professional, 2018.
- Gamma, E., Helm, R., Johnson, R., and Vlissides, J. Design Patterns: Elements of Reusable Object-Oriented Software. 1st edition, Addison-Wesley Professional, 1994.
- Simple-ML: towards a framework for semantic data analytics workflows. In Semantic Systems. The Power of AI and Knowledge Graphs, M. Acosta, P. Cudré-Mauroux, M. Maleshkova, T. Pellegrini, H. Sack, and Y. Sure-Vetter (Eds.), Cham, pp. 359–366.
- Grill, T., et al. Methods towards API usability: a structural analysis of usability problem categories. In Human-Centered Software Engineering, M. Winckler, P. Forbrig, and R. Bernhaupt (Eds.), Berlin, Heidelberg, pp. 164–180, 2012.
- Hossny, E., et al. Semantic-based generation of generic-API adapters for portable cloud applications. In Proceedings of the 3rd Workshop on CrossCloud Infrastructures & Platforms (CrossCloud '16), New York, NY, USA, 2016.
- Kersten, M., and Murphy, G. C. Using task context to improve programmer productivity. In SIGSOFT '06/FSE-14, pp. 1–11, New York, NY, USA, 2006.
- Király, F., et al. Designing machine learning toolboxes: concepts, principles and patterns. 2021.
- Lacerda, G., et al. Code smells and refactoring: a tertiary systematic review of challenges and observations. Journal of Systems and Software 167, 110610, 2020.
- Lamothe, M., et al. A systematic review of API evolution literature. ACM Computing Surveys 54(8), 2021.
- Martin, R. C. Clean Code: A Handbook of Agile Software Craftsmanship. 1st edition, Prentice Hall, 2009.
- McConnell, S. Code Complete. 2nd edition, Microsoft Press, 2004.
- Meng, X., et al. MLlib: machine learning in Apache Spark. Journal of Machine Learning Research 17(1), pp. 1235–1241, 2016.
- Mens, T. A state-of-the-art survey on software merging. IEEE Transactions on Software Engineering 28(5), pp. 449–462, 2002.
- Myers, B. A., and Stylos, J. Improving API usability. Communications of the ACM 59(6), pp. 62–69, 2016.
- Pedregosa, F., et al. Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830, 2011.
- Piccioni, M., et al. An empirical study of API usability. In 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 5–14, 2013.
- Achieving guidance in applied machine learning through software engineering techniques. In Conference Companion of the 4th International Conference on Art, Science, and Engineering of Programming (‹Programming› '20), pp. 7–12, New York, NY, USA, 2020.
- Robillard, M. P., and DeLine, R. A field study of API learning obstacles. Empirical Software Engineering 16, pp. 703–732, 2011.
- Robillard, M. P. What makes APIs hard to learn? Answers from developers. IEEE Software 26(6), pp. 27–34, 2009.
- Stylos, J., et al. Improving API documentation using API usage information. In 2009 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 119–126, 2009.
- Thung, F., et al. Automatic recommendation of API methods from feature requests. In ASE '13, pp. 290–300, 2013.
- Wang, J., et al. How developers perform feature location tasks: a human-centric and process-oriented exploratory study. Journal of Software: Evolution and Process 25(11), pp. 1193–1224, 2013.
- Wang, W., and Godfrey, M. W. Detecting API usage obstacles: a study of iOS and Android developer questions. In 2013 10th Working Conference on Mining Software Repositories (MSR), pp. 61–64, 2013.
- Zibran, M. F., et al. Useful, but usable? Factors affecting the usability of APIs. pp. 151–155, 2011.