
PKLM: A flexible MCAR test using Classification
We develop a fully nonparametric, fast, easy-to-use, and powerful test ...

ricu: R's Interface to Intensive Care Data
Providing computational infrastructure for handling diverse intensive ca...

Predicting sepsis in multisite, multinational intensive care cohorts using deep learning
Despite decades of clinical research, sepsis remains a global public hea...

Proper Scoring Rules for Missing Value Imputation
Given the prevalence of missing data in modern statistical research, a b...

Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression
We propose an adaptation of the Random Forest algorithm to estimate the ...

High Probability Lower Bounds for the Total Variation Distance
The statistics and machine learning communities have recently seen a gro...

Fair Data Adaptation with Quantile Preservation
Fairness of classification and regression has received much attention re...

A direct approach to detection and attribution of climate change
We present here a novel statistical learning approach for detection and ...

Causal discovery in heavy-tailed models
Causal questions are omnipresent in many scientific problems. While much...

Spectral Deconfounding and Perturbed Sparse Linear Models
Standard high-dimensional regression methods assume that the underlying ...

RSVP-graphs: Fast High-dimensional Covariance Matrix Estimation under Latent Confounding
In this work we consider the problem of estimating a high-dimensional p ...

Anchor regression: heterogeneous data meets causality
This is a preliminary draft of "Anchor regression: heterogeneous data me...

Grouping-By-ID: Guarding Against Adversarial Domain Shifts
When training a deep network for image classification, one can broadly d...

Symmetric Rank Covariances: a Generalised Framework for Nonparametric Measures of Dependence
The need to test whether two random vectors are independent has spawned ...

Preserving Differential Privacy Between Features in Distributed Estimation
Privacy is crucial in many applications of machine learning. Legal, ethi...

Scalable Adaptive Stochastic Optimization Using Random Projections
Adaptive stochastic gradient methods such as AdaGrad have gained popular...

The xyz algorithm for fast interaction search in high-dimensional data
When performing regression on a dataset with p variables, it is often of...

DUAL-LOCO: Distributing Statistical Estimation Using Random Projections
We present DUAL-LOCO, a communication-efficient algorithm for distribute...

backShift: Learning causal cyclic graphs from unknown shift interventions
We propose a simple method to learn linear causal cyclic models in the p...

On b-bit min-wise hashing for large-scale regression and classification with sparse data
Large-scale regression problems where both the number of variables, p, a...

Minimum Distance Estimation for Robust High-Dimensional Regression
We propose a minimum distance estimation method for robust regression in...

Random Intersection Trees
Finding interactions between variables in large and high-dimensional dat...

Node harvest
When choosing a suitable technique for regression and classification wit...

P-values for high-dimensional regression
Assigning significance in high-dimensional regression is challenging. Mo...