
When LempelZivWelch Meets Machine Learning: A Case Study of Accelerating Machine Learning using Coding
In this paper we study the use of coding techniques to accelerate machine learning (ML). Coding techniques, such as prefix codes, have been extensively studied and used to accelerate lowlevel data processing primitives such as scans in a relational database system. However, there is little work on how to exploit them to accelerate ML algorithms. In fact, applying coding techniques for faster ML faces a unique challenge: one needs to consider both how the codes fit into the optimization algorithm used to train a model, and the interplay between the model structure and the coding scheme. Surprisingly and intriguingly, our study demonstrates that a slight variant of the classical LempelZivWelch (LZW) coding scheme is a good fit for several popular ML algorithms, resulting in substantial runtime savings. Comprehensive experiments on several realworld datasets show that our LZWbased ML algorithms exhibit speedups of up to 31x compared to a popular and stateoftheart ML library, with no changes to ML accuracy, even though the implementations of our LZW variants are not heavily tuned. Thus, our study reveals a new avenue for accelerating ML algorithms using coding techniques and we hope this opens up a new direction for more research.
02/22/2017 ∙ by Fengan Li, et al. ∙ 0 ∙ shareread it

Manifold Assumption and Defenses Against Adversarial Perturbations
In the adversarial perturbation problem of neural networks, an adversary starts with a neural network model F and a point x that F classifies correctly, and identifies another point x', which is nearby x, that F classifies incorrectly. In this paper we consider a defense method that is based on the semantics of F. Our starting point is the common manifold assumption, which states that natural data points lie on separate low dimensional manifolds for different classes. We then make a further postulate which states that (a good model) F is confident on natural points on the manifolds, but has low confidence on points outside of the manifolds, where a natural measure of "confident behavior" is F( x)_∞ (i.e. how confident F is about its prediction). Under this postulate, an adversarial example becomes a point that is outside of the low dimensional manifolds which F has learned, but is still close to at least one manifold under some distance metric. Therefore, defending against adversarial perturbations becomes embedding an adversarial point back to the nearest manifold where natural points are drawn from. We propose algorithms to formalize this intuition and perform a preliminary evaluation. Noting that the effectiveness of our method depends on both how well F satisfies the postulate and how effective we can conduct the embedding, we use a model trained recently by Madry et al., as the base model, and use gradient based optimization, such as the CarliniWagner attack (but now they are used for defense), as the embedding procedure. Our preliminary results are encouraging: The base model wrapped with the embedding procedure achieves almost perfect success rate in defending against attacks that the base model fails on, while retaining the good generalization behavior of the base model.
11/21/2017 ∙ by Xi Wu, et al. ∙ 0 ∙ shareread it

The Manifold Assumption and Defenses Against Adversarial Perturbations
In the adversarialperturbation problem of neural networks, an adversary starts with a neural network model F and a point x that F classifies correctly, and applies a small perturbation to x to produce another point x' that F classifies incorrectly. In this paper, we propose taking into account the inherent confidence information produced by models when studying adversarial perturbations, where a natural measure of "confidence" is F( x)_∞ (i.e. how confident F is about its prediction?). Motivated by a thought experiment based on the manifold assumption, we propose a "goodness property" of models which states that confident regions of a good model should be well separated. We give formalizations of this property and examine existing robust training objectives in view of them. Interestingly, we find that a recent objective by Madry et al. encourages training a model that satisfies well our formal version of the goodness property, but has a weak control of points that are wrong but with low confidence. However, if Madry et al.'s model is indeed a good solution to their objective, then good and bad points are now distinguishable and we can try to embed uncertain points back to the closest confident region to get (hopefully) correct predictions. We thus propose embedding objectives and algorithms, and perform an empirical study using this method. Our experimental results are encouraging: Madry et al.'s model wrapped with our embedding procedure achieves almost perfect success rate in defending against attacks that the base model fails on, while retaining good generalization behavior.
11/21/2017 ∙ by Xi Wu, et al. ∙ 0 ∙ shareread it

DRACO: Robust Distributed Training via Redundant Gradients
Distributed model training is vulnerable to worstcase system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To tolerate node failures and adversarial attacks, recent work suggests using variants of the geometric median to aggregate distributed updates at the PS, in place of bulk averaging. Although medianbased update rules are robust to adversarial nodes, their computational cost can be prohibitive in largescale settings and their convergence guarantees often require relatively strong assumptions. In this work, we present DRACO, a scalable framework for robust distributed training that uses ideas from coding theory. In DRACO, each compute node evaluates redundant gradients that are then used by the parameter server to eliminate the effects of adversarial updates. We present problemindependent robustness guarantees for DRACO and show that the model it produces is identical to the one trained in the adversaryfree setup. We provide extensive experiments on real datasets and distributed setups across a variety of largescale models, where we show that DRACO is several times to orders of magnitude faster than medianbased approaches.
03/27/2018 ∙ by Lingjiao Chen, et al. ∙ 0 ∙ shareread it

The Effect of Network Width on the Performance of Largebatch Training
Distributed implementations of minibatch stochastic gradient descent (SGD) suffer from communication overheads, attributed to the high frequency of gradient updates inherent in smallbatch training. Training with large batches can reduce these overheads; however, large batches can affect the convergence properties and generalization performance of SGD. In this work, we take a first step towards analyzing how the structure (width and depth) of a neural network affects the performance of largebatch training. We present new theoretical results which suggest thatfor a fixed number of parameterswider networks are more amenable to fast largebatch training compared to deeper ones. We provide extensive experiments on residual and fullyconnected neural networks which suggest that wider networks can be trained using larger batches without incurring a convergence slowdown, unlike their deeper variants.
06/11/2018 ∙ by Lingjiao Chen, et al. ∙ 0 ∙ shareread it

Modelbased Pricing for Machine Learning in a Data Marketplace
Data analytics using machine learning (ML) has become ubiquitous in science, business intelligence, journalism and many other domains. While a lot of work focuses on reducing the training cost, inference runtime and storage cost of ML models, little work studies how to reduce the cost of data acquisition, which potentially leads to a loss of sellers' revenue and buyers' affordability and efficiency. In this paper, we propose a modelbased pricing (MBP) framework, which instead of pricing the data, directly prices ML model instances. We first formally describe the desired properties of the MBP framework, with a focus on avoiding arbitrage. Next, we show a concrete realization of the MBP framework via a noise injection approach, which provably satisfies the desired formal properties. Based on the proposed framework, we then provide algorithmic solutions on how the seller can assign prices to models under different market scenarios (such as to maximize revenue). Finally, we conduct extensive experiments, which validate that the MBP framework can provide high revenue to the seller, high affordability to the buyer, and also operate on low runtime cost.
05/26/2018 ∙ by Lingjiao Chen, et al. ∙ 0 ∙ shareread it
Lingjiao Chen
is this you? claim profile