Active Imitation Learning with Noisy Guidance

May 26, 2020 · Kianté Brantley, et al. · University of Maryland

Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks by learning near-optimal search policies. Such algorithms assume training-time access to an expert that can provide the optimal action at any queried state; unfortunately, the number of such queries is often prohibitive, frequently rendering these approaches impractical. To reduce this query complexity, we consider an active learning setting in which the learning algorithm also has access to a much cheaper heuristic that provides noisy guidance. Our algorithm, LEAQI, learns a difference classifier that predicts when the expert is likely to disagree with the heuristic, and queries the expert only when necessary. We apply LEAQI to three sequence labeling tasks, demonstrating significantly fewer expert queries and comparable (or better) accuracy relative to a passive approach.
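The core querying rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`leaqi_label`, `predicts_disagreement`) are hypothetical, and the difference classifier is supplied as an oracle here, whereas in the paper it is itself learned online from the (dis)agreements observed on queried states.

```python
def leaqi_label(state, heuristic, expert, predicts_disagreement, query_log):
    """Return a training label for `state`, querying the expensive expert
    only when the difference classifier predicts it will disagree with
    the cheap heuristic."""
    if predicts_disagreement(state):
        query_log.append(state)   # pay for one expert query
        return expert(state)
    return heuristic(state)       # otherwise trust the noisy heuristic

# Toy usage: the heuristic is wrong exactly on odd-numbered states, and an
# (idealized) difference classifier flags exactly those states.
expert = lambda s: s % 3
heuristic = lambda s: s % 3 if s % 2 == 0 else 0
clf = lambda s: s % 2 == 1

log = []
labels = [leaqi_label(s, heuristic, expert, clf, log) for s in range(6)]
# labels match the expert everywhere, with only 3 of 6 states queried
```

When the difference classifier is accurate, the learner recovers expert-quality labels while paying for expert queries only on the states where the heuristic would have misled it.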





We thank Rob Schapire, Chicheng Zhang, and the anonymous ACL reviewers for very helpful comments and insights. This material is based upon work supported by the National Science Foundation under Grant No. 1618193 and an ACM SIGHPC/Intel Computational and Data Science Fellowship to KB. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation nor of the ACM.

