OpenML-Python: an extensible Python API for OpenML

11/06/2019 ∙ by Matthias Feurer, et al.

OpenML is an online platform for open science collaboration in machine learning, used to share datasets and results of machine learning experiments. In this paper we introduce OpenML-Python, a client API for Python, opening up the OpenML platform for a wide range of Python-based tools. It provides easy access to all datasets, tasks and experiments on OpenML from within Python. It also provides functionality to conduct machine learning experiments, upload the results to OpenML, and reproduce results which are stored on OpenML. Furthermore, it comes with a scikit-learn plugin and a plugin mechanism to easily integrate other machine learning libraries written in Python into the OpenML ecosystem. Source code and documentation is available at






1 Introduction

OpenML is a collaborative online machine learning (ML) platform, meant for sharing and building on prior empirical machine learning research (Vanschoren et al., 2014).

It goes beyond open data repositories, such as UCI (Dua and Graff, 2017), PMLB (Olson et al., 2018), the ‘datasets’ submodules in scikit-learn (Pedregosa et al., 2011) and tensorflow (Abadi et al., 2016), and the closed-source data sharing platform at, since OpenML also collects millions of shared experiments on these datasets, linked to the exact ML pipelines and hyperparameter settings, and includes comprehensive logging and uploading functionalities which can be accessed programmatically via a REST API. However, sharing ML experiments adds significant complexity to most people’s workflows.

OpenML-Python is a seamless integration of OpenML into the popular Python ML ecosystem that takes away this complexity by providing easy programmatic access to all OpenML data and automating the sharing of new experiments. (Other clients already exist for R (Casalicchio et al., 2017) and Java (van Rijn, 2016).) In this paper, we introduce OpenML-Python’s core design, showcase its extensibility to new ML libraries, and give code examples for several common research tasks.

2 Use cases for the OpenML-Python API

OpenML-Python allows for easy dataset and experiment sharing by handling all communication with OpenML’s REST API. In this section, we briefly describe how the package can be used in several common machine learning tasks and highlight recent uses.

Working with datasets. OpenML-Python can retrieve the thousands of datasets on OpenML (all of them, or specific subsets) in a unified format, retrieve meta-data describing them, and search through them with filters. Datasets are converted from OpenML’s internal format into numpy, scipy or pandas data structures, which are standard for ML in Python. To facilitate contributions from the community, it allows people to upload new datasets in only two function calls, and to define new tasks on them (combinations of a dataset, train/test split and target attribute).
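A minimal sketch of this dataset workflow follows, assuming the `openml` package is installed and the OpenML server is reachable. The helper names `list_small_datasets` and `load_dataset`, the size threshold and the default dataset ID 61 (which we take to be ‘iris’) are illustrative choices rather than part of the paper, and the returned container types may differ between OpenML-Python releases.

```python
def list_small_datasets(max_instances=1000):
    """Return metadata for all datasets with at most `max_instances` rows.

    Hypothetical helper; requires network access to the OpenML server.
    """
    import openml  # third-party; imported lazily so this module loads without it

    # One row of meta-data per dataset (name, number of instances, ...).
    datasets = openml.datasets.list_datasets(output_format="dataframe")
    return datasets[datasets["NumberOfInstances"] <= max_instances]


def load_dataset(dataset_id=61):
    """Download one dataset and split it into features and target.

    Hypothetical helper; the default ID is an assumption for illustration.
    """
    import openml  # third-party; imported lazily so this module loads without it

    dataset = openml.datasets.get_dataset(dataset_id)
    # get_data returns the features, the target, a per-column categorical
    # flag and the feature names.
    X, y, categorical_indicator, attribute_names = dataset.get_data(
        target=dataset.default_target_attribute
    )
    return X, y, attribute_names
```

Calling `load_dataset()` would then yield the feature matrix, the target column and the feature names in one of the standard Python data structures mentioned above.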

Publishing and retrieving results. Sharing empirical results allows anyone to search and download them in order to reproduce and reuse them in their own research. One goal of OpenML is to simplify the comparison of new algorithms and implementations to existing approaches by comparing to the results on OpenML. To this end we also provide an interface for integrating new machine learning libraries with OpenML and we have already integrated scikit-learn. OpenML-Python can then be used to set up and conduct machine learning experiments for a given task and flow (an ML pipeline including hyperparameters and random states), and publish reproducible results.
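As a hedged illustration of the publishing step, the sketch below runs a scikit-learn decision tree on a single task and uploads the run only if an API key is supplied. The helper name `run_on_task` and its parameters are hypothetical; the sketch assumes that `openml` and `scikit-learn` are installed, that the installed OpenML-Python release ships the scikit-learn plugin, and that the server is reachable. Task 6 is the ‘letter’ task.

```python
def run_on_task(task_id=6, api_key=None):
    """Evaluate a decision tree on one OpenML task and optionally publish it.

    Hypothetical helper; `api_key` is the personal key of an OpenML account.
    """
    import openml  # third-party; imported lazily so this module loads without it
    from sklearn.tree import DecisionTreeClassifier

    task = openml.tasks.get_task(task_id)  # download task and its splits
    clf = DecisionTreeClassifier()
    # Train and predict on every split defined by the task.
    run = openml.runs.run_model_on_task(clf, task)

    if api_key is not None:
        openml.config.apikey = api_key  # authenticate against the server
        run.publish()  # upload flow, predictions and meta-data
    return run
```

Keeping the upload behind an explicit key makes it possible to try the experiment locally before anything is shared.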

Use cases in published works. OpenML-Python has already been used to scale up studies with hundreds of consistently formatted datasets (Feurer et al., 2015; Fusi et al., 2018), supply large amounts of meta-data for meta-learning (Perrone et al., 2018), answer questions about algorithms such as hyperparameter importance (van Rijn and Hutter, 2018) and facilitate large-scale comparisons of algorithms (Strang et al., 2018).

3 High-level Design of OpenML-Python

The OpenML platform is organized around several entity types which describe different aspects of a machine learning study. It hosts datasets, tasks that define how models should be evaluated on them, flows that record the structure and other details of ML pipelines, and runs that record the experiments evaluating specific flows on certain tasks. For instance, an experiment (run) shared on OpenML can show how a random forest (flow) performs on ‘iris’ (dataset) if evaluated with 10-fold cross-validation (task), and how to reproduce that result. In OpenML-Python, all these entities are represented by classes, each defined in their own submodule. This implements a natural mapping from OpenML concepts to Python objects. While OpenML is an online platform, we facilitate offline usage as well.
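The relationship between the four entity types can be sketched with plain dataclasses. This is a simplified illustration that uses only the standard library; the class names, fields and placeholder IDs are assumptions and do not mirror the actual OpenML-Python classes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Dataset:
    dataset_id: int
    name: str


@dataclass
class Task:
    """A dataset plus the rules for evaluating models on it."""
    task_id: int
    dataset: Dataset
    target: str
    estimation_procedure: str


@dataclass
class Flow:
    """An ML pipeline together with its hyperparameter settings."""
    flow_id: int
    name: str
    hyperparameters: Dict[str, object] = field(default_factory=dict)


@dataclass
class Run:
    """One experiment: a specific flow evaluated on a specific task."""
    task: Task
    flow: Flow
    scores: Dict[str, float] = field(default_factory=dict)

    def describe(self) -> str:
        return (f"{self.flow.name} on '{self.task.dataset.name}' "
                f"evaluated with {self.task.estimation_procedure}")


# Placeholder IDs; they do not refer to real OpenML entities.
iris = Dataset(dataset_id=1, name="iris")
task = Task(task_id=1, dataset=iris, target="class",
            estimation_procedure="10-fold cross-validation")
flow = Flow(flow_id=1, name="random forest",
            hyperparameters={"n_estimators": 100})
run = Run(task=task, flow=flow)
print(run.describe())
```

A run references exactly one task and one flow, which is what allows an experiment to be traced back to its data, evaluation procedure and pipeline.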

Plugins. To allow users to automatically run and share machine learning experiments with different libraries through the same OpenML-Python interface, we designed a plugin interface that standardizes the interaction between machine learning library code and OpenML-Python. We also created a plugin for scikit-learn (Pedregosa et al., 2011), as it is one of the most popular Python machine learning libraries. This plugin can be used for any library which follows the scikit-learn API (Buitinck et al., 2013).

A plugin’s responsibility is to convert between the libraries’ models and OpenML flows, interact with its training interface and format predictions. For example, the scikit-learn plugin can convert an OpenMLFlow to an Estimator (including hyperparameter settings), train models and produce predictions for a task, and create an OpenMLRun object to upload the predictions to the OpenML server. The plugin also handles advanced procedures, such as scikit-learn’s random search or grid search and uploading its traces (hyperparameters and scores of each model evaluated during search). We are working on more plugins, and anyone can contribute their own using the scikit-learn plugin implementation as a reference.
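The three responsibilities of a plugin can be sketched as an abstract interface with a toy implementation. This is an illustration under stated assumptions, not the actual OpenML-Python plugin base class: the names `MLPlugin`, `ToyPlugin` and `MajorityClassifier` and the dictionary-based flow representation are hypothetical.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any, Dict, List


class MLPlugin(ABC):
    """Hypothetical plugin contract; not the real OpenML-Python interface."""

    @abstractmethod
    def model_to_flow(self, model: Any) -> Dict[str, Any]:
        """Describe a library model as a serializable flow."""

    @abstractmethod
    def flow_to_model(self, flow: Dict[str, Any]) -> Any:
        """Rebuild a library model, including hyperparameters, from a flow."""

    @abstractmethod
    def run_model(self, model: Any, X_train, y_train, X_test) -> List[Any]:
        """Train the model and return predictions in a uniform format."""


class MajorityClassifier:
    """Toy model standing in for an estimator of some third-party library."""

    def __init__(self, fallback="unknown"):
        self.fallback = fallback
        self.majority_ = None

    def fit(self, X, y):
        # Predict the most frequent training label, or the fallback if empty.
        self.majority_ = Counter(y).most_common(1)[0][0] if len(y) else self.fallback
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


class ToyPlugin(MLPlugin):
    def model_to_flow(self, model):
        return {"name": type(model).__name__,
                "parameters": {"fallback": model.fallback}}

    def flow_to_model(self, flow):
        if flow["name"] != "MajorityClassifier":
            raise ValueError(f"unsupported flow: {flow['name']}")
        return MajorityClassifier(**flow["parameters"])

    def run_model(self, model, X_train, y_train, X_test):
        model.fit(X_train, y_train)
        return list(model.predict(X_test))


plugin = ToyPlugin()
flow = plugin.model_to_flow(MajorityClassifier(fallback="n/a"))
model = plugin.flow_to_model(flow)  # round trip: flow -> model
predictions = plugin.run_model(model, [[0], [1], [2]], ["a", "b", "b"], [[3], [4]])
print(flow, predictions)
```

Because the client code only talks to the abstract interface, supporting another library amounts to providing one more subclass.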

SVM hyperparameter contour plot generated by the code in Figure 1.

import matplotlib.pyplot as plt
import numpy as np
import openml

# Choose an SVM flow (e.g. 8353) and the dataset 'letter' (task 6).
# Argument names follow the paper; later releases may use `flows`/`tasks`.
df = openml.evaluations.list_evaluations_setups(
    'predictive_accuracy', flow=[8353], task=[6],
    output_format='dataframe', parameters_in_separate_columns=True,
)
hp_names = ['sklearn.svm.classes.SVC(16)_C', 'sklearn.svm.classes.SVC(16)_gamma']
# Use base-10 logarithms so that the values match the axis labels below.
df[hp_names] = df[hp_names].astype(float).apply(np.log10)
C, gamma, score = df[hp_names[0]], df[hp_names[1]], df['value']
cntr = plt.tricontourf(C, gamma, score, levels=12, cmap='RdBu_r')
plt.colorbar(cntr, label='accuracy')
plt.xlim((min(C), max(C)))
plt.ylim((min(gamma), max(gamma)))
plt.xlabel('C (log10)', size=16)
plt.ylabel('gamma (log10)', size=16)
plt.title('SVM performance landscape', size=20)
plt.show()
Figure 1: Code for retrieving the predictive accuracy of an SVM classifier on the ‘letter’ dataset and creating a contour plot with the results.

4 Examples

We show two example uses of OpenML-Python to demonstrate its API’s simplicity. First, we show how to retrieve results and evaluations from the OpenML server in Figure 1 (generating the SVM hyperparameter contour plot). Second, in Figure 2 we show how to conduct experiments on a benchmark suite (Bischl et al., 2019). Further examples, including how to create datasets and tasks and how OpenML-Python was used in previous publications, can be found in the online documentation. (Footnote 3: We provide documentation and code examples on and host the project on)

import openml
import sklearn.impute
import sklearn.pipeline
import sklearn.tree

# Obtain a benchmark suite.
benchmark_suite = openml.study.get_suite('OpenML-CC18')
# Build a scikit-learn classifier that imputes missing values first.
clf = sklearn.pipeline.Pipeline(steps=[
    ('imputer', sklearn.impute.SimpleImputer()),
    ('estimator', sklearn.tree.DecisionTreeClassifier()),
])
for task_id in benchmark_suite.tasks:  # iterate over all tasks
    task = openml.tasks.get_task(task_id)  # download the OpenML task
    run = openml.runs.run_model_on_task(clf, task)  # run classifier on splits
    # run.publish()  # upload the run to the server, optional
Figure 2: Training and evaluating a decision tree classifier from scikit-learn on each task of the OpenML-CC18 benchmark suite (Bischl et al., 2019).

5 Project development

The project has been set up for development through community effort from different research groups, and has received contributions from numerous individuals. The package is developed publicly on GitHub, which also provides an issue tracker for bug reports, feature requests and usage questions. To ensure a coherent and robust code base, we use continuous integration for Windows and Linux as well as automated type and style checking. Documentation is also rendered on continuous integration servers and consists of a mix of tutorials, examples and API documentation.

For ease of use and stability, we use well-known and established third-party packages where needed. For instance, we build documentation using the popular sphinx Python documentation generator, use an extension to automatically compile examples into documentation and Jupyter notebooks, and employ standard open-source packages for scientific computing such as numpy, scipy (Virtanen et al., 2019), and pandas (McKinney, 2010). The package is written in Python3 and open-sourced with a 3-Clause BSD License.

6 Conclusion

OpenML-Python allows easy interaction with OpenML from within Python. It makes it easy for people to share and reuse the data, meta-data, and empirical results which are generated as part of an ML study. This allows for better reproducibility, simpler benchmarking and easier collaboration on ML projects. Our software is shipped with a scikit-learn plugin and has a plugin mechanism to easily integrate other ML libraries written in Python.

MF, NM and FH acknowledge funding by the Robert Bosch GmbH. AK, JvR and FH acknowledge funding by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant no. 716721. JV and PG acknowledge funding by the Data Driven Discovery of Models (D3M) program run by DARPA and the Air Force Research Laboratory. The authors also thank Bilge Celik, Victor Gal and everyone listed at for their contributions.