Stochastic Structured Prediction under Bandit Feedback

06/02/2016
by Artem Sokolov, et al.

Stochastic structured prediction under bandit feedback follows a learning protocol in which, on each iteration of a sequence, the learner receives an input, predicts an output structure, and receives partial feedback in the form of a task loss evaluation of the predicted structure. We present applications of this learning scenario to convex and non-convex objectives for structured prediction and analyze them as stochastic first-order methods. We present an experimental evaluation on natural language processing problems with exponentially large output spaces, and compare convergence speed across objectives under two criteria: the practical criterion of optimal task performance on development data, and the optimization-theoretic criterion of minimal squared gradient norm. The best results under both criteria are obtained for a non-convex objective for pairwise preference learning under bandit feedback.
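The protocol described in the abstract (receive an input, predict a structure, observe only the task loss of that prediction, update) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a log-linear model over a toy output space of three candidates, and it uses an expected-loss objective with a score-function gradient estimate as one plausible instance of a stochastic first-order method. The names `bandit_step` and `run`, the one-hot features, the 0/1 loss, and the learning rate are all hypothetical choices made for illustration. In a real structured problem the output space is exponentially large, so the feature expectation would have to be approximated rather than enumerated.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bandit_step(w, feats, loss_fn, lr, rng):
    """One iteration: sample a structure, observe its loss, update w.

    feats[y] is the (assumed) feature vector of candidate output y.
    loss_fn(y) returns the task loss of y; it is the only feedback used.
    """
    probs = softmax([dot(w, f) for f in feats])
    # Predict by sampling from the current model distribution.
    y = rng.choices(range(len(feats)), weights=probs)[0]
    # Partial (bandit) feedback: the loss of the sampled output only.
    loss = loss_fn(y)
    # Expected feature vector under the model (enumerable in this toy case).
    expected = [sum(p * f[j] for p, f in zip(probs, feats))
                for j in range(len(w))]
    # Stochastic gradient estimate of the expected loss:
    # loss * (phi(y) - E[phi]).
    new_w = [wj - lr * loss * (feats[y][j] - expected[j])
             for j, wj in enumerate(w)]
    return new_w, y, loss


def run(iterations=2000, seed=0):
    """Toy run: three candidate outputs, candidate 2 has zero loss."""
    rng = random.Random(seed)
    feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    best = 2

    def loss_fn(y):
        return 0.0 if y == best else 1.0

    w = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        w, _, _ = bandit_step(w, feats, loss_fn, lr=0.1, rng=rng)
    return softmax([dot(w, f) for f in feats])


if __name__ == "__main__":
    print(run())
```

The learner never sees the correct output or the losses of unsampled candidates. In this toy setting the probability mass still shifts toward the zero-loss candidate, because every update triggered by a lossy sample lowers that sample's score relative to the others.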

research
04/21/2017

Bandit Structured Prediction for Neural Sequence-to-Sequence Learning

Bandit structured prediction describes a stochastic optimization framewo...
research
06/12/2018

Sparse Stochastic Zeroth-Order Optimization with an Application to Bandit Structured Prediction

Stochastic zeroth-order (SZO), or gradient-free, optimization allows to ...
research
01/18/2016

Bandit Structured Prediction for Learning from Partial Feedback in Statistical Machine Translation

We present an approach to structured prediction from bandit feedback, ca...
research
09/12/2017

Setpoint Tracking with Partially Observed Loads

We use online convex optimization (OCO) for setpoint tracking with uncer...
research
07/01/2020

Bandit Linear Control

We consider the problem of controlling a known linear dynamical system u...
research
06/05/2020

Learning Multiclass Classifier Under Noisy Bandit Feedback

This paper addresses the problem of multiclass classification with corru...
research
09/13/2021

Zeroth-order non-convex learning via hierarchical dual averaging

We propose a hierarchical version of dual averaging for zeroth-order onl...
