A Streaming Approach For Efficient Batched Beam Search

10/05/2020
by Kevin Yang, et al.

We propose an efficient batching strategy for variable-length decoding on GPU architectures. During decoding, when candidates terminate or are pruned according to heuristics, our streaming approach periodically "refills" the batch before proceeding with a selected subset of candidates. We apply our method to variable-width beam search on a state-of-the-art machine translation model. Our method decreases runtime by up to 71% compared to a fixed-width beam search baseline and 17% compared to a variable-width baseline, while matching the baselines' BLEU. Finally, experiments show that our method can speed up decoding in other domains, such as semantic and syntactic parsing.
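The core idea — refilling the batch from a queue of pending candidates whenever some candidates terminate, so the GPU batch stays full — can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: `step_fn` stands in for one batched decoder step, and the names `streaming_batched_decode`, `pending`, and `batch_size` are made up for this sketch.

```python
from collections import deque

def streaming_batched_decode(inputs, step_fn, batch_size):
    """Decode many candidates while keeping the batch full.

    Hypothetical sketch: `step_fn(candidate)` advances one candidate by
    one decoding step and returns (new_candidate, finished). When
    candidates finish (or would be pruned), the batch is "refilled" from
    the pending queue instead of shrinking.
    """
    pending = deque(inputs)   # candidates waiting to enter the batch
    batch, outputs = [], []

    # Initial fill up to the batch size.
    while pending and len(batch) < batch_size:
        batch.append(pending.popleft())

    while batch:
        next_batch = []
        for cand in batch:    # in practice: a single batched GPU call
            new_cand, finished = step_fn(cand)
            if finished:
                outputs.append(new_cand)
            else:
                next_batch.append(new_cand)
        # Refill the batch before the next step so GPU utilization
        # stays high even as candidates terminate at different lengths.
        while pending and len(next_batch) < batch_size:
            next_batch.append(pending.popleft())
        batch = next_batch
    return outputs
```

Compared with the usual approach of padding a fixed batch until its longest member finishes, refilling avoids wasted compute on already-terminated slots, which is where the runtime savings come from.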


Related research

A Simple Baseline for Beam Search Reranking (12/17/2022)
Reranking methods in machine translation aim to close the gap between co...

Later-stage Minimum Bayes-Risk Decoding for Neural Machine Translation (04/11/2017)
For extended periods of time, sequence generation models rely on beam se...

A Stable and Effective Learning Strategy for Trainable Greedy Decoding (04/21/2018)
As a widely used approximate search strategy for neural network decoders...

Conditional Poisson Stochastic Beam Search (09/22/2021)
Beam search is the default decoding strategy for many sequence generatio...

Machine Translation Decoding beyond Beam Search (04/12/2021)
Beam search is the go-to method for decoding auto-regressive machine tra...

The Implicit Length Bias of Label Smoothing on Beam Search Decoding (05/02/2022)
Label smoothing is ubiquitously applied in Neural Machine Translation (N...

A Quantum Search Decoder for Natural Language Processing (09/09/2019)
Probabilistic language models, e.g. those based on an LSTM, often face t...
