Super Learning in the SAS system

by Alexander P Keil, et al.

Background and objective: Stacking is an ensemble machine learning method that averages predictions from multiple other algorithms, such as generalized linear models and regression trees. A recent iteration of stacking, called super learning, has been developed as a general approach to black box learning and has seen frequent usage, in part due to the availability of an R package. I develop super learning in the SAS software system using a new macro, and demonstrate its performance relative to the R package. Methods: I closely follow previous work using the R SuperLearner package and assess the performance of super learning in a number of domains. I compare the R package with the new SAS macro in three settings: a small set of simulations assessing curve fitting in a prediction model; a set of 14 publicly available datasets, used to assess cross-validated expected loss; and data from a randomized trial of job seekers' training, used to assess the utility of super learning in causal inference with inverse probability weighting. Results: Across the simulated data and the publicly available data, the macro performed similarly to the R package, even though the sets of algorithms available natively in R and SAS differ. The example with inverse probability weighting demonstrated the ability of the SAS macro to include algorithms developed in R. Conclusions: The super learner macro performs as well as the R package at a number of tasks. Further, by extending the macro to include the use of R packages, the macro can leverage both the robust, enterprise-oriented procedures in SAS and the nimble, cutting-edge packages in R. In the spirit of ensemble learning, this macro extends the potential library of algorithms beyond a single software system and provides a simple avenue into machine learning in SAS.
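
To make the idea of averaging predictions by cross-validated loss concrete, here is a minimal Python sketch of stacking with two toy candidate learners. It is an illustration only, not the SAS macro or the R SuperLearner package: all function names are hypothetical, the candidates (a mean predictor and a least-squares line) are stand-ins for a real algorithm library, and a simple grid search over one convex weight stands in for whatever constrained optimization an actual implementation uses.

```python
import random

def fit_mean(xs, ys):
    """Candidate 1 (hypothetical): always predict the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2 (hypothetical): least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def cv_predictions(fit, xs, ys, k=5):
    """Out-of-fold predictions: each point is predicted by a model that never saw it."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(n):
            if i % k == fold:
                preds[i] = model(xs[i])
    return preds

def mse(preds, ys):
    """Mean squared error, used here as the loss."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def super_learner(xs, ys, k=5, grid=101):
    """Choose the convex weight minimizing cross-validated loss, then refit on all data."""
    z_mean = cv_predictions(fit_mean, xs, ys, k)
    z_line = cv_predictions(fit_line, xs, ys, k)
    best_w, best_loss = 0.0, float("inf")
    for j in range(grid):
        w = j / (grid - 1)  # weight on the mean predictor; 1 - w goes to the line
        blended = [w * a + (1 - w) * b for a, b in zip(z_mean, z_line)]
        loss = mse(blended, ys)
        if loss < best_loss:
            best_w, best_loss = w, loss
    f_mean, f_line = fit_mean(xs, ys), fit_line(xs, ys)
    predict = lambda x: best_w * f_mean(x) + (1 - best_w) * f_line(x)
    return predict, best_w, best_loss

# Simulated data (an assumption for illustration): a noisy straight line.
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 + 3.0 * x + random.gauss(0, 1) for x in xs]
predict, w_mean, cv_loss = super_learner(xs, ys)
```

Because the grid includes the weights 0 and 1, the ensemble's cross-validated loss in this sketch can never exceed that of either candidate alone, which is the practical appeal of stacking over picking a single algorithm in advance.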

