The PHOTON Wizard – Towards Educational Machine Learning Code Generators

02/13/2020
by   Ramona Leenings, et al.
0

Despite the tremendous efforts to democratize machine learning, especially in applied-science, the application is still often hampered by the lack of coding skills. As we consider programmatic understanding key to building effective and efficient machine learning solutions, we argue for a novel educational approach that builds upon the accessibility and acceptance of graphical user interfaces to convey programming skills to an applied-science target group. We outline a proof-of-concept, open-source web application, the PHOTON Wizard, which dynamically translates GUI interactions into valid source code for the Python machine learning framework PHOTON. Thereby, users possessing theoretical machine learning knowledge gain key insights into the model development workflow as well as an intuitive understanding of custom implementations. Specifically, the PHOTON Wizard integrates the concept of Educational Machine Learning Code Generators to teach users how to write code for designing, training, optimizing and evaluating custom machine learning pipelines.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 3

07/28/2021

Developing Open Source Educational Resources for Machine Learning and Data Science

Education should not be a privilege but a common good. It should be open...
08/15/2019

Tracing Player Knowledge in a Parallel Programming Educational Game

This paper focuses on "tracing player knowledge" in educational games. S...
02/13/2020

PHOTON – A Python API for Rapid Machine Learning Model Development

This article describes the implementation and use of PHOTON, a high-leve...
04/03/2018

Vanlearning: A Machine Learning SaaS Application for People Without Programming Backgrounds

Although we have tons of machine learning tools to analyze data, most of...
03/27/2018

How Developers Iterate on Machine Learning Workflows -- A Survey of the Applied Machine Learning Literature

Machine learning workflow development is anecdotally regarded to be an i...
03/28/2019

Building Automated Survey Coders via Interactive Machine Learning

Software systems trained via machine learning to automatically classify ...
09/08/2020

Procedural Generation of STEM Quizzes

Electronic quizzes are used extensively for summative and formative asse...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The field of Machine Learning (ML) has drawn unparalleled interest from science and industry alike. Recent efforts to democratize the accessibility [1, 2, 3, 4, 5] yielded both free educational content and an increasing number of well-maintained, open-source ML software libraries [6, 7, 8, 9, 10, 11, 12]. However, their effective use still requires substantial programming skills, rendering many breakthroughs inaccessible to researchers in the applied sciences.

In general, educational resources for acquiring the requested skills already exist [13, 14, 15, 16]

. A plethora of online courses introducing different programming languages and basic programming concepts are freely available. Likewise, many online tutorials showcase function and syntax of ML software libraries and explain their usage. For real-life ML projects, however, the exemplary solutions are usually insufficient: In order to handle common obstacles such as class imbalance, small sample sizes, and the integration of different data modalities, suitable algorithms for the problem at hand must not only be identified, but integrated across diverse software libraries and coding paradigms. In addition, the basic ML workflow - including training, evaluation and hyperparameter optimization - needs to adhere to best-practice guidelines potentially requiring e.g. nested cross-validation and hyperparameter optimization. Finally, the data format must be adapted to suit different library-specific requirements. Thus, for an inexperienced user the task of implementing efficient and effective ML solutions suitable for the learning task at hand may appear insurmountable.

While providing an easy-to-use alternative, graphical user interfaces (GUIs) [17] come at a cost: First, in comparison to the vast number of options offered in common Python libraries, the functionality accessible via the GUI is usually very limited. Especially in an applied-science context, covering all suitable elements such as learning algorithms, modality-specific processing, performance metrics, hyperparameter optimization algorithms and cross-validation strategies in one particular application appears impossible. Second, as GUI-based ML tools are self-contained, users depend on the availability and maintenance of the software system. Furthermore, the implementation of methodological advances is at the sole discretion of the product developer, hampering innovation especially in a scientific context.

In contrast, we envision a system that utilizes the advantages of a GUI to convey the necessary coding skills. We propose an Educational Machine Learning Code Generator framework that mirrors existing GUI-based systems and enables instant access to ML, but in addition translates each GUI interaction into valid source code. Instant feedback lets users experience the impact of their choices on the underlying code, intuitively conveying insights into the emergence of custom solutions. While eleviating the burden of coding and structuring the ML workflow, a basic understanding of theoretical ML concepts - e.g. the choice of algorithms or cross-validation - is however required. The concept of Educational Machine Learning Code Generators aims at minimizing the entry barrier for users possessing theoretical knowledge about fundamental ML concepts, in order to enable them to conduct analysis right away. We thereby hope to increase user motivation and to mitigate the conflict of accessibility versus flexibility.

While generally possible with any ML library or toolbox, we choose to build our Educational Machine Learning Generator based on the Python library PHOTON [18]. PHOTON introduces a layer of abstraction to the ML model development workflow, enabling the design of advanced machine learning pipelines and encapsulating several existing machine learning toolboxes [6, 8, 9, 10, 11, 12] into a single and straightforward Application Programming Interface (API) [19]. Thereby it creates a framework already integrating the infrastructure needed for processing data from different modalities with various data processing and learning algorithms, different strategies for hyperparameter optimization and state-of-the-art performance evaluation using nested cross validation.

In the following, we will first establish objectives for an Educational Machine Learning Code Generator and then describe a proof-of-concept software implementation; the PHOTON Wizard. Finally, the notion of Educational Machine Learning Code Generators and the PHOTON Wizard implementation are discussed in the context of applied-science ML and future developments and conclusions are outlined.

2 Methods

2.1 Objectives for Educational Machine Learning Code Generators

Code generators are tools that translate input information into valid source code [20]. In the use case of educational Machine Learning infrastructure, we intend to utilize this process in order to lecture users on code implementation. In particular, GUI control operations shall be translated to appropriate source code capable of executing the desired action. While usability of GUIs in general and GUI control elements in particular is a widely researched field, we would like to add some objectives in regard to the educational component of such a code generating GUI system.

Structure. An object structure is needed which divides the problem into sub-tasks that can be presented as coherent GUI control elements and code units. In addition, the object structure educates the user on available choices, key components and best-practices.

Feedback. The key to successful learning is the direct connection between GUI interactions and code output. User GUI interactions are represented in form of appropriate changes to the generated code.

Transparency. While implicit (hidden) default settings can foster readability and increase convenience in GUIs, an educational approach requires all information to be explicit, i.e. the source code must be transparently mapped to the GUI control elements. This is true especially for e.g. imports or other non-trivial settings across the entire target scope.

Information. As the main goal of the system is to educate users, supplying explanations in the GUI is a key incentive. Specifically, the user should have direct access to background knowledge and further learning resources regarding the meaning of a particular concept, an algorithm or a parameter e.g. in form of explanative tooltips, educational texts, videos or links.

2.2 The PHOTON Wizard


Figure 1: The PHOTON Wizard web application. Here, step two is depicted collecting information about training, evaluation and optimization. The user chooses suitable hyperparameter optimization and cross validation strategies including their respective parameters. Simultaenously, Python source code is generated matched to the user’s choices and GUI interactions.

In the following, we will introduce the PHOTON Wizard, a proof-of-concept implementation showcasing the idea of Educational Machine Learning Code Generators.

The PHOTON Wizard is a web application that enables the design of custom ML pipelines based on the PHOTON framework, a Python library functioning as high-level API to rapid ML model development [18]. PHOTON automatizes the repetitive development, training, hyperparameter optimization, and evaluation workflow and integrates several state-of-the-art toolboxes [6, 7, 8, 9, 10, 11, 12] covering data processing and learning algorithms, modality-specific functions, hyperparameter optimization strategies and other useful concepts such as data augmentation and class balancing strategies. Due to the wide scope of PHOTON, it is perfectly suitable to educate users on the development of machine learning models.

In the PHOTON Wizard, we conceptualize an ML project as a basic scaffold that can be customized with building blocks, such as data processing and learning algorithms, hyperparameter optimization strategies, performance metrics etc., selectable from a variety of options. For example, hyperparameter optimization strategies are represented as control elements (SelectBoxes) and can be chosen from a predefined selection (see figure 1). In addition, data processing steps can be selected, customized, ordered (see figure 2) and finally, one or more learning algorithms can be added to the pipeline. For each building block, default settings or hyperparameters are provided, which are internally adjusted to previous user choices. For example, performance metrics are filtered according to the prior specification of the analysis type (regression or classification) and the number of cross-validation folds is adjusted in relation to the number of training data available.

In addition, we divide the design process in steps that subsequently request user input in form of ML design choices. In accordance with the common terminology for stepwise assisted setup approaches, we name the application PHOTON Wizard [21]. Implicitly, this type of user interface conveys information about how to break down complex tasks - here the development of a ML model - into a series of simpler steps.

Importantly, feedback is provided in form of changes in the generated source code reflecting the choice of GUI elements. In this way, the operational concepts of the GUI are directly translated into information on how to achieve the same action via PHOTON code. Each time the user finishes specifications in one of the steps, the source code adapts accordingly.

Finally, a complete and fully functional code script is generated, including Python import statements as well as Python statements to load training and test data using the Python library pandas [22].

In order to conveniently present educational information, each selectable item is associated with a tooltip containing a short explanation, helpful tips and links to further online resources. Also, aim and scope as well as important key considerations for each step of the ML workflow are shortly described at the top of the page.

The PHOTON Wizard is implemented in Flask [23], a Python micro framework for web development adhering to the MVC Pattern [24]. Data is stored in the open source document-oriented MongoDB database [25] and serialized using the pymodm library [26]. In order to facilitate the information exchange between GUI elements and database objects, we introduce a custom Binding layer [27] that extracts the user input from the submitted requests and stores them in the corresponding data entities. In order to enhance usability for the selection and ordering of the data processing algorithms within the pipeline, we use a javascript plugin (JQuery Sortable [28]) to dynamically drag and drop items.


Figure 2: The PHOTON Wizard allows to dynamically design a machine learning pipeline. Here, graphical control elements (SelectBoxes and TextBoxes) are used to select and customize a sequence of data processing steps. The order of the items can be changed via drag-and-drop. When hovering over a particular data processing algorithm or a respective parameter, tooltips showcase further information.

The content of the PHOTON Wizard, i.e. all available building blocks such as data processing and learning algorithms, hyperparameter optimization and cross validation strategies as well as performance metrics are easily adaptable via Excel files. Within these excel files, default parameters can be defined that nudge users to sensible default values according to prior selections. We use a tag system to identify both suitable elements and default parameters, e.g. to limit the available performance metrics to those suitable for a regression task. Additional elements can be integrated via the PHOTON registry.

A demo of the PHOTON Wizard can be seen online, while the source code is publicly available under the GNU General Public License v3.0 [29] on github. In order to facilitate hosting and deployment, we also include a docker image specification file.

3 Discussion

With the PHOTON Wizard, we introduce a GUI that generates valid Python source code to design, optimize and evaluate custom Machine Learning pipelines. We thereby enable non-programmers to apply Machine Learning and - most importantly - by showing how GUI actions translate to source code foster the capability of implementing custom ML solutions.

While other excellent educational resources exist, our approach more directly engages the users, helping to overcome typical limitations of a GUI and translate their often exceptional understanding of a problem domain and its technical concepts into practice.

In the future, we would like to extend the GUI to include additional ML elements; particularly those specific to data modalities.

While providing users with an easily accessible tool to generate valid source code, an advanced learning task with e.g. a GPU-based, distributed computation approach, still requires in-depth knowledge about computational infrastructure, potentially excluding certain groups. To simplify this, future additions may include dedicated docker containers with pre-installed libraries etc.

4 Conclusion

With the concept of Educational Machine Learning Code Generators, we aim to mitigate the trade-off between accessibility and flexibility in the context of ML software. Transforming GUI-based choices into valid, executable code in real-time combines easy accessibility with insights into the implementation of custom code. In addition, further content-related extensions are facilitated as they now only require changes to pre-generated code. Finally, seeing one’s choices in the GUI manifest into valid code in real-time has, in our experience, greatly increased user’s motivation to learn how to code. We introduce the PHOTON Wizard as a concrete, proof-of-concept web application for building custom machine learning pipelines on top of the Python PHOTON framework. In addition to a theoretical didactic concept, we thus provide a hands-on implementation for anyone to try. With this, we hope to foster code understanding, convey basic knowledge about the machine learning workflow, integrate current best-practice for valid performance estimations and, exemplified in PHOTON, contribute to an intuitive understanding of how to develop custom solutions.

References

  • [1] Jeffrey Dean. Bringing the benefits of AI to everyone, 2019.
  • [2] Kartik Hosangar. The Democratization of Machine Learning: What It Means for Tech Innovation, 2017.
  • [3] Bibb Allen, Sheela Agarwal, Jayashree Kalpathy-Cramer, and Keith Dreyer. Democratizing AI. Journal of the American College of Radiology, 16(7):961–963, jul 2019.
  • [4] Microsoft. Democratizing AI - For every person and every organization, 2016.
  • [5] Joaquin Vanschoren, Jan N van Rijn, Bernd Bischl, and Luis Torgo. OpenML: Networked Science in Machine Learning. SIGKDD Explorations, 15(2):49–60, 2013.
  • [6] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, and Others. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.
  • [7] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake Vanderplas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API Design For Machine Learning Software: Experiences From the scikit-learn Project. sep 2013.
  • [8] Mart’in Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, GregS.Corrado, AndyDavis, JeffreyDean, MatthieuDevin, SanjayGhemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng.

    {TensorFlow}: Large-Scale Machine Learning on Heterogeneous Systems, 2015.

  • [9] François Chollet and Others. Keras, 2015.
  • [10] Guillaume Lemaitre, Fernando Nogueira, and Christos K Aridas. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. Journal of Machine Learning Research, 18(17):1–5, 2017.
  • [11] Tim Head, MechCoder, Gilles Louppe, Iaroslav Shcherbatyi, Fcharras, Zé Vinícius, Cmmalone, Christopher Schröder, Nel215, Nuno Campos, Todd Young, Stefano Cereda, Thomas Fan, Rene-rex, Kejia (KJ) Shi, Justus Schwabedal, Carlosdanielcsantos, Hvass-Labs, Mikhail Pak, SoManyUsernamesTaken, Fred Callaway, Loïc Estève, Lilian Besson, Mehdi Cherti, Karlson Pfannschmidt, Fabian Linzberger, Christophe Cauet, Anna Gut, Andreas Mueller, and Alexander Fabisch. Scikit-optimize, 2018.
  • [12] Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. Sequential Model-Based Optimization for General Algorithm Configuration. In International Conference on Learning and Intelligent Optimization, pages 507–523. Springer, 2011.
  • [13] Andrew (Stanford University) Ng and Daphne Koller. Coursera - Machine Learning, 2011.
  • [14] Deeplearning.ai, 2020.
  • [15] Eren Bali, Oktay Caglar, and Gagan Biyani. Udemy DataScience, 2009.
  • [16] Jason Brownlee. Machine Learning Mastery, 2014.
  • [17] Steve Krug. Don’t Make Me Think: A Common Sense Approach to Web Usability, 2nd Edition. New Riders, 2005.
  • [18] Ramona Leenings, Nils Ralf Winter, Lucas Plagwitz, Vincent Holstein, Jan Ernsting, Julian Gebker, Jakob Steenweg, Kelvin Sarink, Daniel Emden, Dominik Grotegerd, Nils Opel, Benjamin Risse, Xiaoyi Jiang, Udo Dannlowski, and Tim Hahn. PHOTON - A high level Python API for Designing and Optimizing Machine Learning Pipelines., 2020.
  • [19] Daniel Jacobson. Apis: A Strategy Guide. O’Reilly Media, 2011.
  • [20] Steven Kelly and Juha-Pekka Tolvanen. Domain-Specific Modeling: Enabling Full Code Generation. Wiley-IEEE Computer Society Pr, 2008.
  • [21] Wizards (software), 2020.
  • [22] Wes McKinney. Data Structures for Statistical Computing in Python. In Stéfan van der Walt and Jarrod Millman, editors, Proceedings of the 9th Python in Science Conference, pages 51–56, 2010.
  • [23] Miguel Grinberg. Flask Web Development: Developing Web Applications with Python. O’Reilly Media, 2014.
  • [24] A Leff and J T Rayfield. Web-application development using the Model/View/Controller design pattern. In Proceedings Fifth IEEE International Enterprise Distributed Object Computing Conference, pages 118–127, 2001.
  • [25] Dwight Merriman, Eliot Horowitz, and Kevin Ryan. MongoDB. 2007.
  • [26] Luke Lovett and Shane Harvey. PyMODM.
  • [27] Shravan Kumar Kasagoni. Building Modern Web Applications Using Angular. Packt Publishing, 2017.
  • [28] JQuery Sortable, 2020.
  • [29] GNU General Public License.