
Efficient computation and analysis of distributional Shapley values
Distributional data Shapley value (DShapley) has been recently proposed ...

Improving Adversarial Robustness via Unlabeled Out-of-Domain Data
Data augmentation by incorporating cheap unlabeled data from multiple do...

FrugalML: How to Use ML Prediction APIs More Accurately and Cheaply
Prediction APIs offered for a fee are a fast-growing industry and an imp...

MOPO: Model-based Offline Policy Optimization
Offline reinforcement learning (RL) refers to the problem of learning po...

Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation
Open Domain dialog system evaluation is one of the most important challe...

Improving Training on Noisy Structured Labels
Fine-grained annotations, e.g. dense image labels, image segmentation and...

A Distributional Framework for Data Valuation
Shapley value is a classic notion from game theory, historically used to...

Approximate Data Deletion from Machine Learning Models: Algorithms and Evaluations
Deleting data from a trained machine learning (ML) model is a critical t...

Neuron Shapley: Discovering the Responsible Neurons
We develop Neuron Shapley as a new framework to quantify the contributio...

Identifying Invariant Factors Across Multiple Environments with KL Regression
Many datasets are collected from multiple environments (e.g. different l...

Who's responsible? Jointly quantifying the contribution of the learning algorithm and training data
A fancy learning algorithm A outperforms a baseline method B when they a...

Learning transport cost from subset correspondence
Learning to align multiple datasets is an important problem with many ap...

Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems
In many real-world applications, e.g. recommendation systems, certain it...

LitGen: Genetic Literature Recommendation Guided by Human Explanations
As genetic sequencing costs decrease, the lack of clinical interpretatio...

Making AI Forget You: Data Deletion in Machine Learning
Intense recent discussions have focused on how to provide individuals wi...

Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild
Accessibility is a major challenge of machine learning (ML). Typical ML ...

Discovering Conditionally Salient Features with Statistical Guarantees
The goal of feature selection is to identify important features that are...

A Knowledge Graph-based Approach for Exploring the U.S. Opioid Epidemic
The United States is in the midst of an opioid epidemic with recent esti...

Data Shapley: Equitable Valuation of Data for Machine Learning
As data becomes the fuel driving technological and economic growth, a fu...

Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings
We provide an NLP framework to uncover four linguistic dimensions of pol...

Contrastive Variational Autoencoder Enhances Salient Features
Variational autoencoders are powerful algorithms for identifying dominan...

Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits
Monte Carlo (MC) permutation testing is considered the gold standard for...

Concrete Autoencoders for Differentiable Feature Selection and Reconstruction
We introduce the concrete autoencoder, an end-to-end differentiable meth...

Large-scale Generative Modeling to Improve Automated Veterinary Disease Coding
Supervised learning is limited both by the quantity and quality of the l...

Minimizing Close-k Aggregate Loss Improves Classification
In classification, the de facto method for aggregating individual losses...

Contrastive Multivariate Singular Spectrum Analysis
We introduce Contrastive Multivariate Singular Spectrum Analysis, a nove...

Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization
The Model-X knockoff procedure has recently emerged as a powerful approa...

Autowarp: Learning a Warping Distance from Unlabeled Time Series Using Sequence Autoencoders
Measuring similarities between unlabeled time series trajectories is an ...

Knockoffs for the mass: new feature importance statistics with false discovery guarantees
An important problem in machine learning and statistics is to identify f...

DeepTag: inferring all-cause diagnoses from clinical notes in under-resourced medical domain
In many under-resourced settings, clinicians lack time and expertise to ...

Multiaccuracy: Black-Box Post-Processing for Fairness in Classification
Machine learning predictors are successfully deployed in applications ra...

Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions
Generative Adversarial Networks (GANs) represent an attractive and novel...

Stochastic EM for Shuffled Linear Regression
We consider the problem of inference in a linear regression model in whi...

CoVeR: Learning Covariate-Specific Vector Representations with Tensor Decompositions
Word embedding is a useful approach to capture co-occurrence structures ...

Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes
Word embeddings use vectors to represent words such that the geometry be...

NeuralFDR: Learning Discovery Thresholds from Hypothesis Features
As datasets grow richer, an important challenge is to leverage the full ...

Interpretation of Neural Networks is Fragile
In order for machine learning to be deployed and trusted in many applica...

The Effects of Memory Replay in Reinforcement Learning
Experience replay is a key technique behind many recent advances in deep...

Contrastive Principal Component Analysis
We present a new technique called contrastive principal component analys...

Why adaptively collected data have negative bias and how to correct for it
From scientific experiments to online A/B testing, the previously observ...

Estimating the unseen from multiple populations
Given samples from a distribution, how many new elements should we expec...

Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context
Word embeddings, which represent a word as a point in a vector space, ha...

Linear Regression with Shuffled Labels
Is it possible to perform linear regression on datasets whose labels are...

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
The blind application of machine learning runs the risk of amplifying bi...

Quantifying and Reducing Stereotypes in Word Embeddings
Machine learning algorithms are optimized to model statistical propertie...

Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation
A good clustering can help a data analyst to explore and understand a da...

Quantifying the accuracy of approximate diffusions and Markov chains
Markov chains and diffusion processes are indispensable tools in machine...

How much does your data exploration overfit? Controlling bias via information usage
Modern data is messy and high-dimensional, and it is often not clear a p...

Intersecting Faces: Nonnegative Matrix Factorization With New Guarantees
Nonnegative matrix factorization (NMF) is a natural model of admixture ...