
A Streaming Approach For Efficient Batched Beam Search

10/05/2020
by Kevin Yang, et al.

We propose an efficient batching strategy for variable-length decoding on GPU architectures. During decoding, when candidates terminate or are pruned according to heuristics, our streaming approach periodically "refills" the batch before proceeding with a selected subset of candidates. We apply our method to variable-width beam search on a state-of-the-art machine translation model. Our method decreases runtime by up to 71% compared to a fixed-width beam search baseline and up to 17% compared to a variable-width baseline, while matching the baselines' BLEU. Finally, experiments show that our method can speed up decoding in other domains, such as semantic and syntactic parsing.
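The abstract's core idea, keeping the GPU batch full by refilling finished slots from a queue of pending inputs instead of waiting for the whole batch to drain, can be illustrated with a small toy simulation. The sketch below is not the authors' implementation; it counts decode steps (a rough proxy for runtime) for hypothetical sequences whose lengths are given up front, comparing static batching against streaming refills. The function names and the step-count model are assumptions for illustration only.

```python
from collections import deque

def decode_steps_static(lengths, batch_size):
    # Static batching: run a full batch until every sequence in it has
    # terminated, then load the next batch. Each batch therefore costs
    # as many steps as its longest member.
    queue = deque(lengths)
    steps = 0
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        steps += max(batch)
    return steps

def decode_steps_streaming(lengths, batch_size):
    # Streaming batching: whenever a sequence terminates, its slot is
    # refilled from the pending queue before the next decode step, so
    # short sequences never leave the batch partially idle.
    queue = deque(lengths)
    active = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
    steps = 0
    while active:
        steps += 1
        # Sequences with one step remaining finish; the rest advance.
        active = [n - 1 for n in active if n > 1]
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
    return steps

lengths = [3, 10, 4, 9, 2, 8, 5, 7]
print(decode_steps_static(lengths, batch_size=4))     # → 18
print(decode_steps_streaming(lengths, batch_size=4))  # → 16
```

In this toy example streaming needs 16 steps versus 18 for static batching; the gap widens as sequence lengths vary more within a batch, which is the regime variable-width beam search (where candidates are pruned at different times) tends to produce.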

