
On the Tunability of Optimizers in Deep Learning
There is no consensus yet on the question whether adaptive gradient meth...
10/25/2019 ∙ by Prabhu Teja Sivaprasad, et al.

A Distributed Second-Order Algorithm You Can Trust
Due to the rapid growth of data and computational resources, distributed...
06/20/2018 ∙ by Celestine Dünner, et al.

Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems
We propose a generic algorithmic building block to accelerate training o...
08/17/2017 ∙ by Celestine Dünner, et al.

Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees
Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolf...
05/31/2017 ∙ by Francesco Locatello, et al.

Learning Aerial Image Segmentation from Online Maps
This study deals with semantic segmentation of high-resolution (aerial) ...
07/21/2017 ∙ by Pascal Kaiser, et al.

A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe
Two of the most fundamental prototypes of greedy optimization are the ma...
02/21/2017 ∙ by Francesco Locatello, et al.

Faster Coordinate Descent via Adaptive Importance Sampling
Coordinate descent methods employ random partial updates of decision var...
03/07/2017 ∙ by Dmytro Perekrestenko, et al.

Screening Rules for Convex Problems
We propose a new framework for deriving screening rules for convex optim...
09/23/2016 ∙ by Anant Raj, et al.

Pursuits in Structured Non-Convex Matrix Factorizations
Efficiently representing real world data in a succinct and parsimonious ...
02/12/2016 ∙ by Rajiv Khanna, et al.

On the Global Linear Convergence of Frank-Wolfe Optimization Variants
The Frank-Wolfe (FW) optimization algorithm has lately regained popular...
11/18/2015 ∙ by Simon Lacoste-Julien, et al.

Convex Optimization without Projection Steps
For the general problem of minimizing a convex function over a compact c...
08/04/2011 ∙ by Martin Jaggi, et al.

An Equivalence between the Lasso and Support Vector Machines
We investigate the relation of two fundamental tools in machine learning...
03/05/2013 ∙ by Martin Jaggi, et al.

Block-Coordinate Frank-Wolfe Optimization for Structural SVMs
We propose a randomized block-coordinate variant of the classic Frank-Wo...
07/19/2012 ∙ by Simon Lacoste-Julien, et al.

A Combinatorial Algorithm to Compute Regularization Paths
For a wide variety of regularization methods, algorithms computing the e...
03/27/2009 ∙ by Bernd Gärtner, et al.

An Exponential Lower Bound on the Complexity of Regularization Paths
For a variety of regularized optimization problems in machine learning, ...
03/27/2009 ∙ by Bernd Gärtner, et al.

Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
The recent tremendous success of unsupervised word embeddings in a multi...
03/07/2017 ∙ by Matteo Pagliardini, et al.

Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification
This paper presents a novel approach for multilingual sentiment classif...
03/07/2017 ∙ by Jan Deriu, et al.

Unsupervised robust nonparametric learning of hidden community properties
We consider learning of fundamental properties of communities in large n...
07/11/2017 ∙ by Mikhail A. Langovoy, et al.

EmbedRank: Unsupervised Keyphrase Extraction using Sentence Embeddings
Keyphrase extraction is the task of automatically selecting a small set ...
01/13/2018 ∙ by Kamil Bennani-Smires, et al.

Revisiting First-Order Convex Optimization Over Linear Spaces
Two popular examples of first-order optimization methods over linear spa...
03/26/2018 ∙ by Francesco Locatello, et al.

End-to-End DNN Training with Block Floating Point Arithmetic
DNNs are ubiquitous datacenter workloads, requiring orders of magnitude ...
04/04/2018 ∙ by Mario Drumond, et al.

Generating Steganographic Text with LSTMs
Motivated by concerns for user privacy, we design a steganographic syste...
05/30/2017 ∙ by Tina Fang, et al.

Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients
We show that Newton's method converges globally at a linear rate for obj...
06/01/2018 ∙ by Sai Praneeth Karimireddy, et al.

Don't Use Large Mini-Batches, Use Local SGD
Minibatch stochastic gradient methods are the current state of the art ...
08/22/2018 ∙ by Tao Lin, et al.

Wasserstein is all you need
We propose a unified framework for building unsupervised representations...
08/29/2018 ∙ by Sidak Pal Singh, et al.

COLA: Communication-Efficient Decentralized Linear Learning
Decentralized machine learning is a promising emerging paradigm in view ...
08/13/2018 ∙ by Lie He, et al.

Sparsified SGD with Memory
Huge scale machine learning problems are nowadays tackled by distributed...
09/20/2018 ∙ by Sebastian U. Stich, et al.

Training DNNs with Hybrid Block Floating Point
The wide adoption of DNNs has given birth to unrelenting computing requi...
04/04/2018 ∙ by Mario Drumond, et al.

Error Feedback Fixes SignSGD and other Gradient Compression Schemes
Sign-based algorithms (e.g. signSGD) have been proposed as a biased grad...
01/28/2019 ∙ by Sai Praneeth Karimireddy, et al.

Overcoming Multi-Model Forgetting
We identify a phenomenon, which we refer to as multi-model forgetting, t...
02/21/2019 ∙ by Yassine Benyahia, et al.

Evaluating the Search Phase of Neural Architecture Search
Neural Architecture Search (NAS) aims to facilitate the design of deep n...
02/21/2019 ∙ by Christian Sciuto, et al.

Forecasting intracranial hypertension using multiscale waveform metrics
Objective: Intracranial hypertension is an important risk factor of seco...
02/25/2019 ∙ by Matthias Hüser, et al.

Structure Tree-LSTM: Structure-aware Attentional Document Encoders
We propose a method to create document representations that reflect thei...
02/26/2019 ∙ by Khalil Mrini, et al.

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication
We consider decentralized stochastic optimization with the objective fun...
02/01/2019 ∙ by Anastasia Koloskova, et al.

Efficient Greedy Coordinate Descent for Composite Problems
Coordinate descent with random coordinate selection is the current state...
10/16/2018 ∙ by Sai Praneeth Karimireddy, et al.

Unsupervised Scalable Representation Learning for Multivariate Time Series
Time series constitute a challenging data type for machine learning algo...
01/30/2019 ∙ by Jean-Yves Franceschi, et al.

SysML: The New Frontier of Machine Learning Systems
Machine learning (ML) techniques are enjoying rapidly increasing adoptio...
03/29/2019 ∙ by Alexander Ratner, et al.

On Linear Learning with Manycore Processors
A new generation of manycore processors is on the rise that offers dozen...
05/02/2019 ∙ by Eliza Wszola, et al.

Crosslingual Document Embedding as Reduced-Rank Ridge Regression
There has recently been much interest in extending vector-based word rep...
04/08/2019 ∙ by Martin Josifoski, et al.

Better Word Embeddings by Disentangling Contextual n-Gram Information
Pretrained word vectors are ubiquitous in Natural Language Processing a...
04/10/2019 ∙ by Prakhar Gupta, et al.

Correlating Twitter Language with Community-Level Health Outcomes
We study how language on social media is linked to diseases such as athe...
06/13/2019 ∙ by Arno Schneuwly, et al.

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
We study gradient compression methods to alleviate the communication bot...
05/31/2019 ∙ by Thijs Vogels, et al.

Decentralized Deep Learning with Arbitrary Communication Compression
Decentralized training of deep learning models is a key element for enab...
07/22/2019 ∙ by Anastasia Koloskova, et al.

Model Fusion via Optimal Transport
Combining different models is a widely used paradigm in machine learning...
10/12/2019 ∙ by Sidak Pal Singh, et al.

On the Relationship between Self-Attention and Convolutional Layers
Recent trends of incorporating attention mechanisms in vision have led r...
11/08/2019 ∙ by Jean-Baptiste Cordonnier, et al.
Martin Jaggi
Tenure-Track Assistant Professor at EPFL (École polytechnique fédérale de Lausanne)