
A Higher-Order Kolmogorov-Smirnov Test
We present an extension of the Kolmogorov-Smirnov (KS) two-sample test, which can be more sensitive to differences in the tails. Our test statistic is an integral probability metric (IPM) defined over a higher-order total variation ball, recovering the original KS test as its simplest case. We give an exact representer result for our IPM, which generalizes the fact that the original KS test statistic can be expressed in equivalent variational and CDF forms. For small enough orders (k ≤ 5), we develop a linear-time algorithm for computing our higher-order KS test statistic; for all others (k ≥ 6), we give a nearly linear-time approximation. We derive the asymptotic null distribution for our test, and show that our nearly linear-time approximation shares the same asymptotic null. Lastly, we complement our theory with numerical studies.
03/24/2019 ∙ by Veeranjaneyulu Sadhanala, et al.
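For reference, the classical two-sample KS statistic (the simplest case of the IPM above, in its CDF form) can be computed directly from the empirical CDFs. This is an illustrative sketch, not the authors' code, and it does not implement the higher-order extension:

```python
import numpy as np

def ks_statistic(x, y):
    """Classical two-sample KS statistic: max |F_x(t) - F_y(t)| over all t.

    The maximum is attained at a pooled sample point, so it suffices to
    evaluate both empirical CDFs there.
    """
    x, y = np.sort(x), np.sort(y)
    pooled = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, pooled, side="right") / len(x)
    cdf_y = np.searchsorted(y, pooled, side="right") / len(y)
    return np.max(np.abs(cdf_x - cdf_y))

print(ks_statistic(np.array([0.1, 0.4, 0.7]), np.array([0.2, 0.5, 0.9])))
```

Sorting dominates the cost, so this runs in O(n log n); the paper's contribution is achieving comparable efficiency for the higher-order statistics.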

Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising
The Gaussian mechanism is an essential building block used in a multitude of differentially private data analysis algorithms. In this paper we revisit the Gaussian mechanism and show that the original analysis has several important limitations. Our analysis reveals that the variance formula for the original mechanism is far from tight in the high privacy regime (ε → 0) and that it cannot be extended to the low privacy regime (ε → ∞). We address these limitations by developing an analytic Gaussian mechanism whose variance is calibrated directly using the Gaussian cumulative distribution function instead of a tail bound approximation. We also propose to equip the Gaussian mechanism with a post-processing step based on adaptive denoising estimators, leveraging the fact that the variance of the perturbation is known. Our experiments show that analytical calibration removes at least a third of the variance of the noise compared to the classical Gaussian mechanism, and that denoising dramatically improves the accuracy of the Gaussian mechanism in the high-dimensional regime.
05/16/2018 ∙ by Borja Balle, et al.
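The calibration idea can be sketched numerically: instead of a tail-bound formula, invert the exact privacy curve δ(σ) of the Gaussian mechanism with a binary search. The sketch below assumes the tight Gaussian-CDF characterization of δ at a given ε; it is an illustration of the approach, not the authors' implementation:

```python
import math

def phi(t):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def gaussian_delta(sigma, eps, sens=1.0):
    """Exact delta achieved by the Gaussian mechanism at privacy level eps,
    expressed via the Gaussian CDF rather than a tail bound."""
    a = sens / (2 * sigma)
    b = eps * sigma / sens
    return phi(a - b) - math.exp(eps) * phi(-a - b)

def calibrate_sigma(eps, delta, sens=1.0, lo=1e-3, hi=1e3):
    """Binary-search the smallest sigma with gaussian_delta(sigma) <= delta."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if gaussian_delta(mid, eps, sens) > delta:
            lo = mid
        else:
            hi = mid
    return hi

# Compare against the classical tail-bound calibration sigma = sqrt(2 ln(1.25/delta)) / eps
sigma_classical = math.sqrt(2 * math.log(1.25 / 1e-5)) / 1.0
sigma_analytic = calibrate_sigma(1.0, 1e-5)
print(sigma_classical, sigma_analytic)
```

As the abstract claims, the analytically calibrated σ is noticeably smaller than the classical one at the same (ε, δ).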

Imitation-Regularized Offline Learning
We study the problem of offline learning in automated decision systems under the contextual bandits model. We are given logged historical data consisting of contexts, (randomized) actions, and (non-negative) rewards. A common goal is to evaluate what would happen if different actions were taken in the same contexts, so as to optimize the action policies accordingly. The typical approach to this problem, inverse probability weighted estimation (IPWE) [Bottou et al., 2013], requires logged action probabilities, which may be missing in practice due to engineering complications. Even when available, small action probabilities cause large uncertainty in IPWE, rendering the corresponding results insignificant. To solve both problems, we show how one can use policy improvement (PIL) objectives, regularized by policy imitation (IML). We motivate and analyze PIL as an extension to clipped-IPWE, by showing that both are lower-bound surrogates to the vanilla IPWE. We also formally connect IML to IPWE variance estimation [Swaminathan and Joachims, 2015] and to natural policy gradients. Without probability logging, our PIL-IML interpretations justify and improve, by reward-weighting, the state-of-the-art cross-entropy (CE) loss that predicts the action items among all action candidates available in the same contexts. With probability logging, our main theoretical contribution connects IML-underfitting to the existence of either confounding variables or model misspecification. We demonstrate the value and accuracy of our insights via simulations based on Simpson's paradox, standard UCI multiclass-to-bandit conversions, and the Criteo counterfactual analysis challenge dataset.
01/15/2019 ∙ by Yifei Ma, et al.
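As a point of reference, the clipped-IPWE baseline mentioned above is a one-liner over logged data: importance-weight each logged reward by the target-to-logged probability ratio, capped at a clipping constant. A hypothetical sketch (variable names are ours):

```python
import numpy as np

def clipped_ipwe(rewards, logged_probs, target_probs, clip=10.0):
    """Clipped inverse-probability-weighted estimate of a target policy's value
    from logged bandit data. Clipping trades a little bias for much lower
    variance, making it a lower-bound surrogate for vanilla IPWE."""
    weights = np.minimum(target_probs / logged_probs, clip)
    return np.mean(weights * rewards)

rewards = np.array([1.0, 0.0, 1.0])
logged_probs = np.array([0.5, 0.5, 0.25])   # probabilities the logger assigned
target_probs = np.array([1.0, 0.0, 1.0])    # deterministic target policy
print(clipped_ipwe(rewards, logged_probs, target_probs))
```

Note how the estimate requires `logged_probs`; the paper's PIL-IML approach is motivated precisely by settings where these are missing or tiny.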

Online Forecasting of Total-Variation-bounded Sequences
We consider the problem of online forecasting of sequences of length n with total variation at most C_n, using observations contaminated by independent σ-subgaussian noise. We design an O(n log n)-time algorithm that achieves a cumulative square error of Õ(n^{1/3} C_n^{2/3} σ^{4/3}) with high probability. The result is rate-optimal, as it matches the known minimax rate for offline nonparametric estimation of the same class [Mammen and van de Geer, 1997]. To the best of our knowledge, this is the first polynomial-time algorithm that optimally forecasts total-variation-bounded sequences. Our proof techniques leverage the special localized structure of the Haar wavelet basis and the adaptivity to the unknown smoothness parameter in classical wavelet smoothing [Donoho et al., 1998]. We also compare our model to the rich literature on dynamic regret minimization and nonstationary stochastic optimization, where our problem can be treated as a special case. We show that the workhorse in those settings, online gradient descent and its variants with a fixed restarting schedule, are instances of a class of linear forecasters that require a suboptimal regret of Ω̃(√n). This implies that the use of more adaptive algorithms is necessary to obtain the optimal rate.
06/08/2019 ∙ by Dheeraj Baby, et al.
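To illustrate the wavelet-smoothing ingredient, here is a one-level Haar transform with soft-thresholding of the detail coefficients. This is a generic sketch of Haar wavelet denoising, not the paper's online forecasting algorithm:

```python
import numpy as np

def haar_denoise(y, threshold):
    """One-level Haar transform, soft-threshold the detail coefficients,
    then invert. Small details (mostly noise) are zeroed; large details
    (genuine level shifts in a TV-bounded signal) survive.

    Assumes len(y) is even; a trailing odd sample would be dropped.
    """
    n = len(y) // 2 * 2
    approx = (y[0:n:2] + y[1:n:2]) / np.sqrt(2)   # pairwise averages
    detail = (y[0:n:2] - y[1:n:2]) / np.sqrt(2)   # pairwise differences
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty(n)
    out[0::2] = (approx + detail) / np.sqrt(2)    # inverse Haar transform
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out
```

With threshold 0 the transform is perfectly invertible; with a large threshold each pair collapses to its mean, the crudest piecewise-constant smoothing.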

Provably Efficient Q-Learning with Low Switching Cost
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change their exploration policy as infrequently as possible during regret minimization. This is motivated by the difficulty of running fully adaptive algorithms in real-world applications (such as medical domains), and we propose to quantify adaptivity using the notion of local switching cost. Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for H-step episodic MDPs that achieves sublinear regret with a local switching cost in K episodes of O(H^3 SA log K), and we provide a lower bound of Ω(HSA) on the local switching cost for any no-regret algorithm. Our algorithm can be naturally adapted to the concurrent setting, which yields nontrivial results that improve upon prior work in certain aspects.
05/30/2019 ∙ by Yu Bai, et al.
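A tabular Q-learning update with an optimism bonus can be sketched as follows. Note this uses a generic Hoeffding-style bonus and a standard episodic learning rate, not the paper's UCB2 switching schedule; the function, its parameters, and the constant `c` are all illustrative:

```python
import numpy as np

def ucb_q_update(Q, counts, s, a, r, s_next, H, c=1.0):
    """One optimistic Q-learning update for an H-step episodic MDP.

    Q      : (num_states, num_actions) value table, updated in place
    counts : visit counts per (state, action), updated in place
    The bonus c*sqrt(H/t) shrinks as (s, a) is visited more often,
    implementing optimism in the face of uncertainty.
    """
    counts[s, a] += 1
    t = counts[s, a]
    alpha = (H + 1) / (H + t)          # learning rate common in episodic analyses
    bonus = c * np.sqrt(H / t)         # exploration bonus
    target = r + Q[s_next].max() + bonus
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
    return Q[s, a]
```

The paper's contribution is around *when* the greedy policy derived from Q is allowed to change (the switching schedule), which this sketch does not model.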

Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
Time series forecasting is an important problem across many domains, including prediction of solar plant energy output, electricity consumption, and traffic congestion. In this paper, we propose to tackle such forecasting problems with the Transformer. Although impressed by its performance in our preliminary study, we found two major weaknesses: (1) locality-agnostic attention: the point-wise dot-product self-attention in the canonical Transformer architecture is insensitive to local context, which can make the model prone to anomalies in time series; (2) memory bottleneck: the space complexity of the canonical Transformer grows quadratically with sequence length L, making modeling of long time series infeasible. To solve these two issues, we first propose convolutional self-attention, producing queries and keys with causal convolution so that local context can be better incorporated into the attention mechanism. We then propose the LogSparse Transformer, with only O(L(log L)^2) memory cost, improving time series forecasting at finer granularity under a constrained memory budget. Our experiments on both synthetic data and real-world datasets show that it compares favorably to the state of the art.
06/29/2019 ∙ by Shiyang Li, et al.
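The sparse-attention idea can be illustrated with a boolean mask in which each position attends to itself and to positions at power-of-two offsets into the past, giving O(log L) keys per query instead of O(L). This is one plausible instantiation written by us for illustration, not necessarily the paper's exact pattern:

```python
import numpy as np

def logsparse_mask(L):
    """Causal attention mask where position i attends to itself and to
    positions i - 2^k for k = 0, 1, 2, ... while they stay non-negative.
    Each row therefore has at most 1 + floor(log2(L)) True entries."""
    mask = np.zeros((L, L), dtype=bool)
    for i in range(L):
        mask[i, i] = True
        k = 0
        while i - 2**k >= 0:
            mask[i, i - 2**k] = True
            k += 1
    return mask

print(logsparse_mask(8).astype(int))
```

Stacking a few such layers lets information flow between any two positions through short power-of-two hops, which is the intuition behind the O(L(log L)^2) memory cost.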

Subsampled Rényi Differential Privacy and Analytical Moments Accountant
We study the problem of subsampling in differential privacy (DP), a question that is the centerpiece behind many successful differentially private machine learning algorithms. Specifically, we provide a tight upper bound on the Rényi Differential Privacy (RDP) (Mironov, 2017) parameters for algorithms that: (1) subsample the dataset, and then (2) apply a randomized mechanism M to the subsample, in terms of the RDP parameters of M and the subsampling probability parameter. This result generalizes the classic subsampling-based "privacy amplification" property of (ϵ,δ)-differential privacy that applies to only one fixed pair of (ϵ,δ), to a stronger version that exploits properties of each specific randomized algorithm and satisfies an entire family of (ϵ(δ),δ)-differential privacy for all δ ∈ [0,1]. Our experiments confirm the advantage of using our techniques over keeping track of (ϵ,δ) directly, especially in the setting where we need to compose many rounds of data access.
07/31/2018 ∙ by Yu-Xiang Wang, et al.
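To illustrate why RDP-based accounting composes so cleanly, here is a bare-bones moments accountant for the plain (non-subsampled) Gaussian mechanism: RDP parameters add under composition, and the final (ϵ, δ) is obtained by optimizing the standard conversion over the Rényi order α. The paper's subsampling amplification bound is deliberately omitted here; this is only a sketch of the accounting machinery:

```python
import math

def gaussian_rdp(alpha, sigma):
    """RDP parameter of the Gaussian mechanism with sensitivity 1
    at Renyi order alpha: eps(alpha) = alpha / (2 sigma^2)."""
    return alpha / (2 * sigma**2)

def rdp_to_dp(rdp_eps, alpha, delta):
    """Standard conversion from (alpha, eps)-RDP to (eps', delta)-DP."""
    return rdp_eps + math.log(1 / delta) / (alpha - 1)

def moments_accountant(sigma, steps, delta, alphas=range(2, 64)):
    """Compose `steps` Gaussian mechanisms by adding RDP parameters,
    then take the best (eps, delta) over candidate orders."""
    return min(rdp_to_dp(steps * gaussian_rdp(a, sigma), a, delta)
               for a in alphas)

print(moments_accountant(sigma=1.0, steps=10, delta=1e-5))
```

Tracking the whole RDP curve and converting once at the end is exactly the "analytical moments accountant" pattern: it dominates tracking a single fixed (ϵ, δ) pair through composition.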

ProxQuant: Quantized Neural Networks via Proximal Operators
To make deep neural networks feasible in resource-constrained environments (such as mobile devices), it is beneficial to quantize models by using low-precision weights. One common technique for quantizing neural networks is the straight-through gradient method, which enables back-propagation through the quantization mapping. Despite its empirical success, little is understood about why the straight-through gradient method works. Building upon a novel observation that the straight-through gradient method is in fact identical to the well-known Nesterov's dual-averaging algorithm on a quantization-constrained optimization problem, we propose a more principled alternative approach, called ProxQuant, that instead formulates quantized network training as a regularized learning problem and optimizes it via the prox-gradient method. ProxQuant does back-propagation on the underlying full-precision vector and applies an efficient prox-operator between stochastic gradient steps to encourage quantizedness. For quantizing ResNets and LSTMs, ProxQuant outperforms state-of-the-art results on binary quantization and is on par with the state of the art on multi-bit quantization. For binary quantization, our analysis shows both theoretically and experimentally that ProxQuant is more stable than the straight-through gradient method (i.e., BinaryConnect), challenging the indispensability of the straight-through gradient method and providing a powerful alternative.
10/01/2018 ∙ by Yu Bai, et al.
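For binary quantization, a prox step toward {-1, +1} can be sketched as a soft projection: each weight moves toward its nearest sign by at most λ, reaching exactly ±1 once it is close enough. This is one plausible prox-operator for a regularizer of the form λ‖w − sign(w)‖₁, written by us for illustration rather than taken from the authors' code:

```python
import numpy as np

def prox_binary(w, lam):
    """Soft projection of weights toward the binary set {-1, +1}.

    Each coordinate moves toward its nearest point in {-1, +1} by at most
    lam; weights within lam of +-1 snap exactly onto it. Applied between
    SGD steps (with lam growing over training), this gradually enforces
    quantizedness while gradients still flow through full-precision weights.
    """
    target = np.sign(w)
    target[target == 0] = 1.0            # break ties at zero toward +1
    shift = np.clip(target - w, -lam, lam)
    return w + shift

w = np.array([0.3, 0.95, -0.3, -1.4])
print(prox_binary(w, 0.2))
```

Contrast with the straight-through method, which hard-quantizes in the forward pass and pretends the quantizer is the identity in the backward pass; the prox formulation makes the relaxation explicit.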

Nonstationary Stochastic Optimization with Local Spatial and Temporal Changes
We consider a nonstationary sequential stochastic optimization problem, in which the underlying cost functions change over time under a variation budget constraint. We propose an L_{p,q}-variation functional to quantify the change, which captures local spatial and temporal variations of the sequence of functions. Under the L_{p,q}-variation functional constraint, we derive both upper and matching lower regret bounds for smooth and strongly convex function sequences, which generalize previous results in (Besbes et al., 2015). Our results reveal some surprising phenomena under this general variation functional, such as the curse of dimensionality of the function domain. The key technical novelties in our analysis include an affinity lemma that characterizes the distance between the minimizers of two convex functions with bounded L_p difference, and a cubic-spline-based construction that attains matching lower bounds.
08/09/2017 ∙ by Xi Chen, et al.

Per-instance Differential Privacy and the Adaptivity of Posterior Sampling in Linear and Ridge Regression
Differential privacy (DP), ever since its advent, has been a controversial object. On the one hand, it provides strong provable protection for individuals in a data set; on the other hand, it has been heavily criticized as impractical, partially due to its complete independence from the actual data set it tries to protect. In this paper, we address this issue with a new and more fine-grained notion of differential privacy: per-instance differential privacy (pDP), which captures the privacy of a specific individual with respect to a fixed data set. We show that this is a strict generalization of standard DP that inherits all its desirable properties, e.g., composition, invariance to side information, and closedness under post-processing, except that they all hold for every instance separately. When the data is drawn from a distribution, we show that per-instance DP implies generalization. Moreover, we provide explicit calculations of the per-instance DP for output perturbation on a class of smooth learning problems. The result reveals an interesting and intuitive fact: an individual has stronger privacy if he/she has a small "leverage score" with respect to the data set and if he/she can be predicted more accurately using the leave-one-out data set. Using the developed techniques, we provide a novel analysis of the One-Posterior-Sample (OPS) estimator and show that when the data set is well-conditioned, it provides (ϵ,δ)-pDP for any target individual and matches the exact lower bound up to a 1 + Õ(n^{-1}ϵ^{-2}) multiplicative factor. We also propose AdaOPS, which uses adaptive regularization to achieve the same results with (ϵ,δ)-DP. Simulations show a privacy-utility trade-off that is several orders of magnitude more favorable when we consider the privacy of only the users in the data set.
07/24/2017 ∙ by Yu-Xiang Wang, et al.
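The "leverage score" that governs per-instance privacy in linear regression is the standard statistical one, h_i = x_iᵀ(XᵀX)⁻¹x_i, and computing it takes only a few lines. This snippet is a generic illustration of the quantity, not the paper's pDP calculation itself:

```python
import numpy as np

def leverage_scores(X):
    """Leverage score of each row of the design matrix X:
    h_i = x_i^T (X^T X)^{-1} x_i, i.e. the diagonal of the hat matrix.
    For full-rank X with d columns, the scores sum to d."""
    G_inv = np.linalg.inv(X.T @ X)
    return np.einsum('ij,jk,ik->i', X, G_inv, X)

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
print(leverage_scores(X))
```

Intuitively, a row with small leverage barely influences the fitted model, which matches the paper's finding that such individuals enjoy stronger per-instance privacy.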

Understanding the 2016 US Presidential Election using ecological inference and distribution regression with census microdata
We combine fine-grained spatially referenced census data with the vote outcomes of the 2016 US presidential election. Using this dataset, we perform ecological inference using distribution regression (Flaxman et al., KDD 2015) with multinomial-logit regression, modeling the vote outcome (Trump, Clinton, Other / didn't vote) as a function of demographic and socioeconomic features. Ecological inference allows us to estimate "exit poll"-style results (e.g., Trump's support among white women), but for entirely novel categories. We also perform exploratory data analysis to understand which census variables are predictive of voting for Trump, voting for Clinton, or not voting for either. All of our methods are implemented in Python and R and are available online for replication.
11/11/2016 ∙ by Seth Flaxman, et al.
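The multinomial-logit component maps each unit's features to a probability vector over the three outcome classes. A minimal sketch of that mapping (the feature matrix and weights here are made-up placeholders, not fitted values from the paper):

```python
import numpy as np

def multinomial_logit(X, W):
    """Multinomial-logit (softmax) probabilities over outcome classes,
    e.g. (Trump, Clinton, Other / didn't vote), from feature rows X
    and a (num_features x num_classes) weight matrix W."""
    z = X @ W
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

X = np.array([[1.0, 2.0],
              [0.5, -1.0]])                 # illustrative demographic features
W = np.array([[0.1, 0.2, 0.3],
              [0.0, 0.1, -0.2]])            # illustrative weights, 3 classes
print(multinomial_logit(X, W))
```

Each output row is a valid probability distribution over the three classes, which is what ecological inference then aggregates against observed precinct-level totals.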
Yu-Xiang Wang
Scientist, Amazon AI (AWS); Applied Scientist, Amazon AI; Machine Learning Department, Carnegie Mellon University; PhD student at Carnegie Mellon University.