In the 21st century, we take for granted our ability to harness electricity. It may seem incomprehensible that, in the early days, specialist engineers had to be hired to operate electric generators in situ at homes just to run light bulbs. It required a coming together of great minds, ripe business opportunities, technological breakthroughs, and an abundance of patience to turn electricity into what we now consider a mere utility. While Machine Learning (ML) and Artificial Intelligence (AI) have been around for a long time, with roots in disciplines such as Statistics, Computer Science, Neuroscience, Physics, and Philosophy, the ecosystem now seems ripe for deriving utilitarian value out of them. We are already witnessing their proven utility in Information Retrieval and e-commerce, to name a few. But this is just the tip of the iceberg. They are part of a much larger, sweeping 4th Industrial Revolution[Schwab], whose ramifications are too significant to ignore and too complex to fathom. As it stands today, AI (we use AI & ML, Explainability & Interpretability, and Models & Algorithms interchangeably for the sake of simplicity) is a specialized field, and only a select few have the expertise required to shape it for the benefit of society at large. A plausible solution is to democratize the very creation, distribution, and consumption of the technology. Concerted cooperation between public and private entities in creating both scientific and technological tools is necessary. Governments are reacting by creating national policies and task forces[niti], and new laws are being formed to protect the rights of data citizens[gdpr, futureai]. Many institutions, both academic and private[mit, OpenAI, AllenAI, google, h2o], are addressing this problem by investing heavily in advancing the science and technology. Below, we take a cursory look at some desirable attributes of AI that make it responsible and responsive.
Portable: Separate the creation of AI from its consumption
Java transformed enterprise software by making it portable across platforms: code, once written, can run everywhere. Likewise, intelligence (encoded in the form of models), once created, should run anywhere. Some existing approaches port models to the PMML[PMML] format or compile them to the ONNX[ONNX] format, as implemented in winML[WinML]. Another approach is to embrace write-once, scale-anyhow paradigms like Spark[spark] and Julia[bezanson2017julia]. Our approach is discussed in Section 2.1.
Explainable: Provide plausible explanations, along with predictions
When decisions are made by AI agents in high-stakes environments, such as those encountered in FinTech, EdTech, and Healthcare, it is imperative that a human being be able to interpret those decisions in the context of the situation, and explain how the agent came to that decision[somadot, Rudin]. Tools such as LIME[LIME], CORELS[CORELS], DoWhy[dowhy], and shapely[shapely] are some practical approaches to providing explanations. For an overview, see [molnar]. Our approach is outlined in Section 2.2.
Classical statistical models provide uncertainty estimates, along with predictions, in terms of standard errors and p-values. Their Bayesian counterparts produce credible intervals. The Statistics community places enormous importance on producing such inference summaries. Unfortunately, ML tools rarely provide such reports, as they are primarily concerned with prediction tasks alone. This problem is exacerbated in the Deep Learning space, where models can reportedly make confident mistakes[nguyen2015deep]. How can we make AI credible by not only making predictions, but also reporting coverage intervals and other uncertainty estimates around the predictions? For a curated list of papers and tools, see [uq-deep].
Fair: Make AI bias-free and equitable
It is said that we are what we eat, and we become what we think. AI is no different: what is predicted depends on the data the AI systems are fed. As a result, AI systems can make biased, unethical decisions, putting some sections of the community at risk. How can AI be made fair and equitable? Recent, ongoing work by IBM provides some alternatives for detecting bias and acting on it[aif360-oct-2018].
Decentralized and Distributable: Let models go where the data is, instead of the other way round
All internet-first commercial players rely on user-generated data to provide hyper-personalized products. Although such products provide tangible benefits to end users, they come at the cost of exposing private data and inviting security threats. A decentralized system, wherein the data remains within the users’ premises, can potentially avoid such privacy issues. This pivot has led to the popularization of techniques such as Secure Multiparty Computation (SMPC) and Federated Learning. SMPC restricts the exposure of model weights, while Federated Learning takes care of securely distributing the training process to multiple worker nodes. Frameworks like PySyft[Pysyft] make SMPC and Federated Learning intuitive and accessible to machine learning developers. Decentralizing the distribution channel of data and intelligence can mitigate some risks of centralized control.
Declarative: Say what the model needs and what it should do. Do not worry about how it should go about doing it
An important aspect of ML model development is getting the right data, and model management. Many practitioners estimate that more than 80% of developer time goes into ETL (Extract, Transform, Load) jobs. However, developers should only be concerned with what they need, and not with how they should get the data. This problem is largely addressed in the database world by standards like SQL. Many databases, including NoSQL databases, support SQL for querying and retrieving data. Project Morpheus[cypher] looks very interesting, as it makes graph and tabular data seamlessly interoperable. It is a practical alternative to associative arrays, which attempt to unify RDBMS, graph, and NoSQL databases[d4d]. This means that Data Scientists can use SQL/OpenCypher to ask for what they need and offload the “getting” part to the database’s compute engines. While such open standards are not available for model specification, the workhorse lm function in R allows models to be specified in the formula syntax, and SAS’s language looks like the SQL analogue for model specification. Can we make SQL/OpenCypher the de facto declarative language for ETL, and develop a standard to declaratively specify the entire pipeline, including model specification and data flow specification? For an outline of the proposal, see [dagger].
Reproducible: Reproduce any result, on-demand
Model building, and Data Science in general, is experimental in nature. Unless carefully managed, reproducing results is next to impossible, particularly when working in distributed environments. At the very least, all results should be reproducible. This is possible by ensuring that 1) Data, 2) Models, and 3) Run Time Environments are versioned. An open source ML-as-a-Service platform, daggit[daggit], promises to handle this responsibility by using DVC[dvc], git[git] and docker[docker] for versioning Data, Models, and Run Times, respectively.
The above list is neither comprehensive nor exhaustive. There are many other challenges to overcome, such as making AI systems scalable, auditable, actionable, governable, and discoverable, among others. But, as it appears, solutions are fragmented, and a holistic viewpoint is missing. At mlsquare, we are developing a single point of interface to bring several solutions together. In the following sections, we focus on the specific solutions we are building.
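The idea of versioning data, models, and run times together can be illustrated with a small sketch. It does not use DVC, git, or docker; it swaps in plain content hashing from the Python standard library, and the `fingerprint` helper and the toy `data`, `model`, and `runtime` values are assumptions made only for illustration.

```python
import hashlib
import json
import platform


def fingerprint(obj):
    """Stable SHA-256 digest of any JSON-serialisable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Toy stand-ins (hypothetical) for the three things that must be versioned.
data = [[5.1, 3.5], [4.9, 3.0]]
model = {"algorithm": "LinearRegression", "alpha": 0.1}
runtime = {"python": platform.python_version()}

# One record per run: if any digest changes, the result may not reproduce.
run_record = {
    "data": fingerprint(data),
    "model": fingerprint(model),
    "runtime": fingerprint(runtime),
}
print(json.dumps(run_record, indent=2))
```

Storing such a record next to every result makes it possible to tell which of the three ingredients changed between two runs.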
2 Democratizing AI at mlsquare
We provide an extensible Python framework that incorporates the above-mentioned tools where possible, and innovates where necessary. The following are the design goals:
Bring Your Own Spec First: Use existing, well-documented APIs, such as those in sklearn for ML. Only provide implementations or extend the functionality where it is lacking.
Bring Your Own Experience First: Minimal changes shall be introduced in the development workflow. For example, with just one additional line of code to standard model fitting in sklearn, the resulting model can run on a GPU.
Consistent: All the quality attributes of AI described earlier shall become first class methods of a model object with a consistent interface. It should not be necessary to stitch different, possibly incompatible, functionalities.
Compositional: Deep Learning, when looked at from a technology perspective, is a Lego-block computational framework for composing models. Such models can support inputs and outputs of varying shape, size, and nature. This allows many models to be expressed compositionally, without having to implement every piece of them. The Lego-block architecture gives a lot of expressive power to the model designer. With backing from all hardware vendors and framework developers, hardware acceleration is an added benefit.
Thanks to the inherent modularity of Deep Learning technology, we can exploit the inherent object-orientedness of many algorithms. For example, when developing a Decision Tree equivalent in a Deep Neural Network (DNN)[DNDT], write the Decision Tree as a DNN layer. Then, one could immediately use it in either a classification or a regression task. One could even develop a Kernel Decision Tree. This modularization amplifies the expressiveness.
Extensible: All the provided implementations shall be extensible, and default behaviours can be overridden if one chooses to.
In the following subsections, we describe our approach to providing portability and explainability.
As ML is practiced today, a Data Scientist or an ML Scientist writes the models in a particular language (e.g. Python), and a Production Engineer rewrites them in a different language or reimplements them in a different tech stack, with scalability being the primary concern. This is called the two-language problem. One way to deal with the two-language problem is to unify the tech stack. Languages like Julia claim that the same piece of code can scale from a laptop to a cluster. Frameworks like Apache Spark also support multiple languages, and scale from a single-CPU system to a cluster. But a developer is then tied to a single ecosystem. Another way to achieve portability is by means of an intermediate representation of the models, such as PMML and ONNX. Many Deep Learning frameworks, such as TensorFlow[TensorFlow], PyTorch[PyTorch] and MXNet[MXNet], support ONNX. Of interest is WinML, which allows saving sklearn and xgboost[XgBoost] models in ONNX. It does so by reimplementing the models in a target language like TensorFlow. While such exact reimplementations might provide high-fidelity ports, extending and supporting the models can be tedious and time consuming, if not impossible. Rather than providing such one-to-one, operator-level mappings of models, we suggest a more generic semantic mapping of models as an efficient alternative. We utilize three different approaches that enable this transpilation process, each with its own advantages and disadvantages. We focus on supervised tasks.
Exact Semantic Map
When an exact equivalence exists between a user-supplied model (referred to as the primal model) and its neural network counterpart (referred to as the proxy model), we can initialize the weights of the neural network with the parameters of the fitted model, and just save the model. This can be achieved for many classical models that fall under the Generalized Linear Models umbrella. An advantage of this approach is that the proxy model is faithful to the primal model.
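As a minimal sketch of an exact semantic map, consider logistic regression, which is equivalent to a single dense unit with a sigmoid activation. The coefficients below are made-up stand-ins for a fitted primal model, and `primal_predict_proba` and `DenseSigmoidNeuron` are illustrative names rather than mlsquare APIs.

```python
import math

# Hypothetical parameters of a fitted primal logistic regression.
COEF = [0.8, -1.5, 0.3]
INTERCEPT = -0.2


def primal_predict_proba(x):
    """Primal model: P(y=1 | x) from the fitted coefficients."""
    z = INTERCEPT + sum(w * xi for w, xi in zip(COEF, x))
    return 1.0 / (1.0 + math.exp(-z))


class DenseSigmoidNeuron:
    """Proxy model: a one-unit dense layer with a sigmoid activation."""

    def __init__(self, n_inputs):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0

    def set_weights(self, weights, bias):
        # Exact semantic map: initialise from the primal's parameters,
        # so no further training is needed.
        self.weights = list(weights)
        self.bias = bias

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))


proxy = DenseSigmoidNeuron(n_inputs=len(COEF))
proxy.set_weights(COEF, INTERCEPT)

samples = [[1.0, 0.5, -2.0], [0.0, 0.0, 0.0], [-3.0, 2.0, 1.0]]
max_gap = max(
    abs(primal_predict_proba(x) - proxy.predict_proba(x)) for x in samples
)
print(f"largest primal/proxy gap: {max_gap}")
```

Because both models evaluate the same function with the same parameters, the gap is zero up to floating-point error, which is what faithfulness means in this setting.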
Approximate Semantic Map
In this approach, the proxy model is trained on the same data as the primal model, but its target labels are the predictions from the primal model. Further, the proxy model’s architecture is chosen so as to closely fulfill the primal model’s intent. Together, they ensure a better semantic approximation. In some ways, the capacity of the proxy model is constrained by the primal model’s capacity. In Section 5, we compare the performance of such transpiled models. On the theoretical side, a Probably Approximately Faithful (PAF) framework remains to be developed.
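The approximate map can be sketched in a few lines of plain Python. The primal model here is a made-up fixed rule standing in for any fitted model, the proxy is a single sigmoid unit, and the names `primal_predict`, `proxy_proba`, and `fidelity` are assumptions for illustration only. The key point is that the proxy is trained on the primal model's predictions rather than on ground-truth labels.

```python
import math


def primal_predict(x):
    """Stand-in primal model: a fixed rule on two features."""
    return 1 if x[0] + 2.0 * x[1] > 1.0 else 0


# A small deterministic grid in place of a real training set.
X = [(i / 5.0, j / 5.0) for i in range(-5, 6) for j in range(-5, 6)]
# The proxy's targets are the primal's predictions, not the true labels.
primal_labels = [primal_predict(x) for x in X]

w, b, lr = [0.0, 0.0], 0.0, 0.5


def proxy_proba(x):
    """Proxy model: one sigmoid unit using the current w and b."""
    z = b + w[0] * x[0] + w[1] * x[1]
    return 1.0 / (1.0 + math.exp(-z))


for _ in range(2000):  # full-batch gradient descent on the log loss
    grad_w, grad_b = [0.0, 0.0], 0.0
    for x, t in zip(X, primal_labels):
        err = proxy_proba(x) - t
        grad_w[0] += err * x[0]
        grad_w[1] += err * x[1]
        grad_b += err
    w = [w[k] - lr * grad_w[k] / len(X) for k in range(2)]
    b -= lr * grad_b / len(X)

# Fidelity: how often the proxy agrees with the primal model.
agree = sum((proxy_proba(x) > 0.5) == bool(t) for x, t in zip(X, primal_labels))
fidelity = agree / len(X)
print(f"fidelity to the primal model: {fidelity:.2f}")
```

Fidelity to the primal model, rather than accuracy against the true labels, is the natural quantity to report for this kind of map.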
In the third approach, both the intent and the implementation can be delegated to the proxy model. In some ways, neural networks are used as generic black-box function approximators. This may be useful when the primal models cannot be fit on large data sets, or when there is no theoretical backing for a semantic map. Say a user is interested in training a sklearn Logistic Regression model on a 1TB dataset. This would be extremely hard with existing sklearn implementations. On the other hand, a proxy model can be easily scaled with the underlying hardware. The downside is that fidelity between the primal and proxy models may not be preserved.
Explainability of black-box machine learning models is critical while developing machine learning applications. Typically, explainability is focused on explaining what a model did or how a model was making predictions. While such explanations are certainly useful for a Data Scientist, they are not really the explanations that an end user needs. Instead, the explanations shall be about the predictions, and what an end user can do about them. We approach explainability from this perspective. In particular, we define, at an abstract level, explanations as predicates in a First Order Logic system, represented in Conjunctive Normal Form (CNF). A proxy model shall produce such predicates as its output. In that light, we see producing explanations as a synthetic language generation problem.
The explain method provides explanations of a model’s output. explain is model agnostic and provides interpretations as a decision tree path. We train recurrent neural networks on the outputs of a localized decision tree for the given training dataset. When fed a new, unseen data point, the RNN outputs the corresponding decision tree path. This path shows the decision taken by the model at each feature and acts as an interpretation of the prediction. explain is ongoing research and is only available as a prototype.
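To make the target output concrete, the sketch below walks a tiny hand-built decision tree and returns the path as a conjunction of predicates. It illustrates only the form of the explanation, not the RNN that is trained to emit it; the tree, the feature names, and the `explain_path` helper are all made up for illustration.

```python
# A tiny hand-built decision tree; feature names and thresholds are made up.
TREE = {
    "feature": "income", "threshold": 50.0,
    "left": {"label": "reject"},
    "right": {
        "feature": "debt_ratio", "threshold": 0.4,
        "left": {"label": "approve"},
        "right": {"label": "reject"},
    },
}


def explain_path(tree, point):
    """Return (prediction, predicates) for one data point.

    Each predicate records the test passed at a node, so their
    conjunction describes the path taken from the root to the leaf.
    """
    predicates = []
    node = tree
    while "label" not in node:
        name, thr = node["feature"], node["threshold"]
        if point[name] <= thr:
            predicates.append(f"{name} <= {thr}")
            node = node["left"]
        else:
            predicates.append(f"{name} > {thr}")
            node = node["right"]
    return node["label"], predicates


label, path = explain_path(TREE, {"income": 72.0, "debt_ratio": 0.55})
print(label, "because", " AND ".join(path))
```

A path of this kind also tells the end user which predicate would have to change (here, a lower debt ratio) for the prediction to change.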
3 Specifics of the Framework
The mlsquare framework utilizes multiple extensible components, as shown in Fig. 1. Portability in the framework is achieved with a single line of additional code. The user is expected to pass their standard machine learning model (the primal model) to the dope function. dope then handles the responsibility of detecting and triggering the corresponding neural network architecture with the help of adapters and optimizers. Internally, a neural architecture search (NAS) occurs, and dope returns the optimal neural network model. The Python object returned is wrapped in such a way that its interface remains identical to that of the primal model sent to dope. Below, we outline the components that act as the foundation for dope’ing.
dope accepts the primal model as an argument (see Fig. 2). Additional model configuration can be passed as keyword arguments. This function handles the responsibility of mapping the primal model to its neural network equivalent. Once the right architectural parameters are obtained, dope delegates the responsibility of wrapping the neural network model with the characteristics of the primal model to adapters. The adapter returns a Python object that behaves like the primal model but utilizes neural networks under the hood. As shown in Fig. 1, dope is assisted by two other components: registry and adapters.
Adapters act as a connector between the primal model and the proxy model (i.e. the neural network equivalent). We currently support Keras[Keras] as the backend for our proxy models and sklearn for primal models. Adapters provide methods (e.g. fit, score, predict) that extend the proxy model’s APIs. When the fit method is invoked, it triggers the Optimizer. The Optimizer receives model configurations and returns a trained model. The Adapter then returns the trained model to dope. The save method exports the trained model as Keras and ONNX models. The Adapter module can hold multiple Adapter classes, one for each type of "primal-proxy" mapping. For example, we currently have two classes in Adapters: SklearnKerasClassifier and SklearnKerasRegressor. A sklearn classifier model that needs to be mapped to a Keras model can utilize the SklearnKerasClassifier adapter. New adapters can be written for interacting with other scientific libraries.
The neural network architecture of the proxy model can depend on the dataset it is being trained on. Hence, we pass the proxy model through a neural architecture search (NAS) process. We use Tune[Tune], based on Ray[Ray], for our models’ NAS process. The optimizer component searches for the best model and returns it to the adapter. The configuration of Tune can be edited by passing parameters via the fit method of the proxy model.
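The adapter pattern described above can be sketched without any deep learning library. Everything in this block is hypothetical: `LinearProxy` stands in for a Keras proxy model, `optimizer` for the Optimizer component, and `RegressorAdapter` for an adapter class; none of them is mlsquare's actual implementation.

```python
class LinearProxy:
    """Stand-in proxy model: y = w * x + b, trained by gradient descent."""

    def __init__(self, lr=0.1, epochs=500):
        self.w, self.b, self.lr, self.epochs = 0.0, 0.0, lr, epochs

    def train(self, xs, ys):
        n = len(xs)
        for _ in range(self.epochs):
            errs = [self.w * x + self.b - y for x, y in zip(xs, ys)]
            self.w -= self.lr * 2 * sum(e * x for e, x in zip(errs, xs)) / n
            self.b -= self.lr * 2 * sum(errs) / n
        return self


def optimizer(config, xs, ys):
    """Stand-in for the Optimizer: build a proxy from config and train it."""
    return LinearProxy(**config).train(xs, ys)


class RegressorAdapter:
    """Exposes a sklearn-like fit/predict/score surface over the proxy."""

    def __init__(self, config):
        self.config = config
        self.model = None

    def fit(self, xs, ys):
        # Calling fit triggers the optimizer, which returns a trained model.
        self.model = optimizer(self.config, xs, ys)
        return self

    def predict(self, xs):
        return [self.model.w * x + self.model.b for x in xs]

    def score(self, xs, ys):
        # Mean squared error, chosen here only for simplicity.
        preds = self.predict(xs)
        return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
adapter = RegressorAdapter({"lr": 0.05, "epochs": 2000}).fit(xs, ys)
print(adapter.predict([4.0]), adapter.score(xs, ys))
```

The caller only ever touches fit, predict, and score, so the proxy backend can be swapped without changing user code.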
The registry object, as the name suggests, maintains a registry of the models supported by mlsquare. registry is a Python dictionary object that is initialized when the mlsquare library is imported. It expects a tuple of length two as the key: the name of the algorithm and the package name. registry then returns the corresponding proxy model and adapter. A model can be registered with the help of a decorator: just add the @registry.register decorator to the model’s class definition. The model should declare the adapter, the model name, and the primal model’s module name. Then, upon calling dope, the corresponding NAS can be triggered.
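A decorator-based registry of this kind can be sketched as follows. The `@registry.register` usage and the two-element key follow the description above; the attribute names `algorithm`, `package`, and `adapter`, and the placeholder classes, are assumptions and may differ from the library's own.

```python
class Registry(dict):
    """Maps (algorithm name, package name) -> (proxy model class, adapter)."""

    def register(self, model_cls):
        # Used as a class decorator; the class declares its own key parts.
        key = (model_cls.algorithm, model_cls.package)
        self[key] = (model_cls, model_cls.adapter)
        return model_cls


registry = Registry()


class SklearnKerasRegressor:
    """Placeholder adapter, named after the adapter class in the text."""


@registry.register
class LinearRegressionProxy:
    # Hypothetical attribute names for the required declarations.
    algorithm = "LinearRegression"
    package = "sklearn"
    adapter = SklearnKerasRegressor


proxy_cls, adapter_cls = registry[("LinearRegression", "sklearn")]
print(proxy_cls.__name__, adapter_cls.__name__)
```

Returning the class unchanged from `register` keeps the decorated model usable as an ordinary class elsewhere in the code.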
The architectures module in the mlsquare library provides the semantic mappings of the primal models. Each mapping is a Python class with various attributes that define it. These attributes include the corresponding adapter of the model, the module name of the primal model, the name of the algorithm, and the model parameters required to initialize the neural network. To keep track of multiple versions of a given proxy model, an additional attribute called version is supported. There are three levels of class declarations involved in creating a model mapping. BaseClass is the topmost abstract base class (ABC). This class ensures that the basic methods expected for the functioning of a proxy model are not left undeclared. The next level is a generic parent class from which multiple algorithms can inherit common attributes. For example, GeneralizedLinearModel is used as a parent class for LinearRegression, LogisticRegression, RidgeRegression, etc. The final level is declaring the proxy model itself as a class. This level includes declaring the key attributes required to create the model.
Taking a closer look at Fig. 3 reveals how the modular aspects of neural networks are utilized in mlsquare to accomplish semantic mappings. Tweaking the activation and loss function set in the model parameters of a proxy model that inherits from GeneralizedLinearModel can provide a range of different models. This gives developers a lot of flexibility in deciding how a model mapping should be declared.
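The three levels of class declarations can be sketched as below. The class names follow the text, but the method `model_params` and the attribute values are assumptions chosen for illustration, not the library's actual definitions.

```python
from abc import ABC, abstractmethod


class BaseClass(ABC):
    """Top level: forces every proxy model to declare the basics."""

    @abstractmethod
    def model_params(self):
        """Return the settings needed to build the neural network."""


class GeneralizedLinearModel(BaseClass):
    """Middle level: one dense unit; subclasses pick activation and loss."""

    activation = None
    loss = None
    version = "default"

    def model_params(self):
        return {"units": 1, "activation": self.activation, "loss": self.loss}


class LinearRegression(GeneralizedLinearModel):
    """Final level: identity link with squared-error loss."""

    activation = "linear"
    loss = "mse"


class LogisticRegression(GeneralizedLinearModel):
    """Final level: sigmoid link with cross-entropy loss."""

    activation = "sigmoid"
    loss = "binary_crossentropy"


for cls in (LinearRegression, LogisticRegression):
    print(cls.__name__, cls().model_params())
```

Only two attributes differ between the two leaf classes, which is the sense in which tweaking the activation and loss yields a family of models.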
Figure 4 shows a typical user workflow while utilizing mlsquare’s dope functionality. Such a workflow involves three touchpoints with dope while transpiling the model.
import the dope function
pass the primal model to the dope function
use the transpiled model to train, score, predict or save it for future use
The following subsections provide details on the different methods available on the transpiled proxy model.
4.1 Developer Experience
The code snippet below demonstrates a typical workflow of training and scoring a sklearn Linear Regression model.
As mentioned earlier, adding one additional line of code (line no. 18) converts the sklearn model into a neural network. The object returned by dope() can then be used like any other sklearn model. The API remains consistent, with the only difference being that the training and scoring are performed on a neural network. The capabilities of this extended model are explained in detail in the following sections.
4.1.1 fit() method
The code snippet below shows a call to the fit() method.
When the fit method is called, a neural architecture search is performed. If no additional parameters are given (line no. 2), the default mappings are used to create and train the neural network. dope provides the option to customize the hyperparameters used for the neural architecture search (line nos. 8 and 9). In the above example, we customize the "optimizer" of the neural network. This, in turn, does a grid search over the adam and nadam optimizers and returns the best performing model.
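The grid-search step itself can be sketched independently of any framework. This is not the dope or Tune API: learning rate and epoch count stand in for the optimizer choice because they need no deep learning library, and `train_and_score` and `grid` are hypothetical names.

```python
from itertools import product


def train_and_score(lr, epochs):
    """Fit y = w * x by gradient descent and return the final mean squared error."""
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Every combination in the grid is trained; the lowest loss wins.
grid = {"lr": [0.001, 0.01, 0.1], "epochs": [10, 50]}
trials = []
for lr, epochs in product(grid["lr"], grid["epochs"]):
    trials.append(({"lr": lr, "epochs": epochs}, train_and_score(lr, epochs)))

best_config, best_loss = min(trials, key=lambda t: t[1])
print(best_config, best_loss)
```

A search over optimizer names works the same way: each candidate is trained, scored on the same data, and the best performer is returned.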
4.1.2 score() and predict() methods
The dope’d model contains both the score and predict methods like a sklearn model.
In the above example, the score method returns a list with two values - loss and accuracy. The predict method returns an array of predicted values.
4.1.3 Saving dope’d model
The save method available on the dope’d model allows exporting the trained model in two formats - ONNX and HDF5. The method expects a file name while saving the model.
The ONNX file can then be loaded, converted, or used directly in different runtimes. For example, we could score a sklearn model in the Chrome browser.
The framework currently contains a collection of seven sklearn models mapped to their neural network equivalents. Table 1 shows the performance comparison of each of these models. The comparisons are done by training and scoring each algorithm on a relevant dataset. The training and scoring are done on the standard sklearn model and its transpiled equivalent dope’d model.
|Algorithm|Dataset Name|Dope (accuracy or score)|sklearn (accuracy or score)|
|---|---|---|---|
|Linear Regression|UCI Diabetes|0.42|0.40|
|Ridge Regression|UCI Diabetes|0.44|0.42|
|Lasso Regression|UCI Diabetes|0.43|0.43|
|ElasticNet Regression|UCI Diabetes|0.44|0.44|
|Decision Tree Classifier|Iris|0.96|0.93|
The datasets chosen were UCI Diabetes for regression algorithms and UCI Iris for classification algorithms[UCI]. The performance metrics used were accuracy for classification and score for regression. The results in Table 1 show that the models transpiled by dope perform on par with their sklearn counterparts.
A more exhaustive collection of trial runs and their corresponding results can be found in this link. This spreadsheet contains performance and configuration details of each new algorithm added to mlsquare. Once a new algorithm is added, a Python script trains and scores the algorithm with relevant datasets. Making the results openly available helps us maintain accountability and the quality of new algorithms added.
Table 2 depicts the available features[✓], ongoing work, and future research on the mlsquare framework. One of the goals of building this framework is to propose and demonstrate the feasibility of a machine learning framework that can ease the process of democratizing AI, utilizing existing tools with very minimal changes. The mlsquare framework currently supports the portability feature (the .save() method). It can convert the above-mentioned sklearn models to neural network models. We used Keras as our primary framework to create the transpiled neural network. So far, our focus has been on developing the right architecture that can support a wide range of existing frameworks, and on prototyping different custom features like save(), explain(), nas() and so forth.
|ML Square|Supported module|Algorithm|Status|
|---|---|---|---|
| | |Support Vector Machines|✓|
| | |Decision Tree Classifier|✓|
| | |Linear Discriminant Analysis| |
| | |Decision Tree Regressor| |
| | |Passive Aggressive Classifier| |
The following list provides a gist of our ongoing research:
Extending save() to XgBoost, IRT[irt], Surprise[Surprise] and DeepCTR[DeepCTR]: Supporting these widely used frameworks will help us make deep learning techniques more accessible to a broader audience.
Adding support for Deep Random Trails (DeRT) in explain(): DeRT is an in-house explainability technique. We are currently working on onboarding this tool to the explain method.
Adding support for PyTorch: As mentioned earlier, we currently use Keras to define the neural networks. We will soon be extending support to PyTorch as well.
Extending neural architecture search capabilities with AutoKeras[AutoKeras].
Bringing uncertainty quantification as a first class method on a model object.
mlsquare is an open-source framework built by committed and enthusiastic volunteers. We strongly believe in the power of community-driven solutions. If you find mlsquare’s mission exciting, you can contribute to the framework in the following ways:
Refer to this link to add new algorithms to dope that you think might be useful to other developers.
Please reach out to us at firstname.lastname@example.org, if you are interested in any of the ongoing research or yet-to-begin projects, marked as  and , respectively in Table 2.
The spreadsheet mentioned in Section 5 is an internal tool used by mlsquare to keep track of the performance of every algorithm available on the framework. We would like to add more datasets and create an openly accessible API to execute training and inference sessions for anyone interested in contributing. If you know of an interesting dataset or are interested in extending this tool, please do let us know by dropping an email at email@example.com.
We are open to contributions at all levels (from documentation to architectural design inputs). To suggest changes to the framework, you can raise an issue on our GitHub repository.
In this paper, the need for democratizing AI and several of its facets are introduced. An extensible Python framework is proposed towards that end. Details of the different components of the framework and their responsibilities are discussed. The framework currently provides support for porting a subset of sklearn models to their approximately faithful Deep Neural Network counterparts, represented in ONNX format. An important takeaway is that Deep Learning architectures can be constrained to reflect well-understood modeling paradigms. It debunks a common misconception that Deep Learning requires big data. In order to democratize the very creation of this framework, and to bring transparency into the process, we have created a leaderboard to access the quality metrics of the semantic maps. Several enhancements and extensions are in the pipeline. A process is to be put in place to allow the community to contribute.