
Controllable Length Control Neural Encoder-Decoder via Reinforcement Learning

by Junyi Bian, et al.

Controlling output length in neural language generation is valuable in many scenarios, especially for tasks that have length constraints. A model with stronger length control capacity can produce sentences of more precise length, but it usually sacrifices the semantic accuracy of the generated sentences. Here, we introduce the concept of Controllable Length Control (CLC) for the trade-off between the length control capacity and the semantic accuracy of a language generation model. More specifically, CLC means altering the length control capacity of the model so as to generate sentences of corresponding quality. This is meaningful in real applications where length control capacity and output quality are requested with different priorities, or to overcome the instability of length control during model training. In this paper, we propose two reinforcement learning (RL) methods to adjust the trade-off between length control capacity and semantic accuracy of length control models. Results show that our RL methods improve scores across a wide range of target lengths and achieve the goal of CLC. Additionally, two models, LenMC and LenLInit, modified from previous length-control models, are proposed to obtain better performance on the summarization task while still maintaining the ability to control length.




Neural encoder-decoder models were first adopted for machine translation [Sutskever, Vinyals, and Le2014] and quickly spread to other domains such as image captioning [Vinyals et al.2015] and text summarization [Rush, Chopra, and Weston2015]. In this paper, we focus on text summarization, which aims to generate condensed summaries while retaining the main points of the source articles. Previous work [Rush, Chopra, and Weston2015, Nallapati et al.2016] has made remarkable progress, and the sequence-to-sequence (seq2seq) framework has become the mainstream approach to the summarization task. An issue with the original neural encoder-decoder is that it cannot generate a sequence of a specified length, i.e., it lacks length control (LC) capacity. Sentences with constrained length are required in many scenarios. For example, headlines and news items usually have a length limit, and articles and messages on different devices have different length demands. Generating sentences of various lengths also improves the diversity of outputs. However, studies of length control are scarce, and most research on neural encoder-decoders aims to improve evaluation scores.

To control the output length, Kikuchi et al. (2016) first proposed two learning-based models for the neural encoder-decoder, named LenInit and LenEmb. We observe that when two models have the same or similar structures, the evaluation score of the model with more precise length control is usually lower than that of the model with weaker length control. In other words, weaker LC capacity tends to come with better output quality. For instance, LenEmb generates sequences of more accurate length, but its evaluation scores are lower than those of LenInit. In most situations, when the sentence length falls in an adequate range, i.e., the length constraint is satisfied, people prefer to focus on the semantic accuracy of the produced sentence; in this case, LenInit seems a more appropriate choice than LenEmb. Therefore, it makes sense to study how to control the trade-off between LC capacity and sentence quality, which we call controllable length control (CLC).

To tackle this trade-off, we turn to reinforcement learning (RL) [Sutton and Barto2018]. Commonly, RL in neural language generation is used to overcome two issues: exposure bias [Ranzato et al.2015] and the inconsistency between the training objective and the evaluation metrics. Recently, great efforts have been devoted to solving these two problems [Ranzato et al.2015, Rennie et al.2017, Paulus, Xiong, and Socher2018]. In addition, RL brings two benefits specific to LC neural language generation. First, most datasets provide only one reference summary per source document, so under maximum likelihood (ML) training we can only learn a summary of one fixed length for each source document. With RL, we can assign various lengths as input to sample sentences for training, which makes the model more robust when generating sentences of different desired lengths. Second, length information can easily be incorporated into the reward design in RL to induce different LC capacities in the model; in this way, CLC can be achieved.

Normally, RL for sequence generation is applied to ML-trained models. However, we find that directly applying an RL algorithm to pre-trained models dramatically degrades LC capacity. In this paper, we design two RL methods for LC neural text generation: MTS-RL and SCD-RL. By adjusting the rewards in RL according to the score and length of the outputs, MTS-RL and SCD-RL improve summarization performance while also controlling LC capacity. Furthermore, we can modify previous models to improve the score by leveraging the trade-off. An intuitive approach is to add a “regulator” between the length input and the decoder to suppress or enhance the transmission of length information. Guided by this idea, we propose two models named LenLInit and LenMC. These two LC models significantly improve the evaluation score at a small cost in their ability to control length, under both ML and RL. The major contributions of our paper are four-fold:

  • To the best of our knowledge, this is the first work to apply reinforcement learning to length-control neural abstractive summarization, and we present the concept of CLC.

  • Two RL methods are developed that successfully control LC capacity and improve the scores significantly. Meanwhile, we find that RL for LC text generation alleviates the limitation caused by the inadequacy and imbalance of ground-truth references of different lengths.

  • Two models named LenLInit and LenMC are proposed based on previous neural LC models [Kikuchi et al.2016].

  • Extensive experiments are conducted to verify that the proposed models with the devised RL algorithms cover a wide range of LC ability and smoothly achieve CLC on the Gigaword summarization dataset.

Related Work

Abstractive Text Summarization

There is a growing body of work based on the encoder-decoder framework [Rush, Chopra, and Weston2015, Nallapati et al.2016]. DRGD, designed by Li et al. (2017), is a seq2seq-oriented model equipped with a deep recurrent generative decoder. See, Liu, and Manning (2017) proposed a hybrid pointer-generator network that uses a pointer to copy words from the article while producing other words with a generator. Cao et al. (2018) used OpenIE and a dependency parser to extract fact descriptions from the source text, then adopted a dual-attention model to enforce the faithfulness of outputs. Yang et al. (2019) explored a human-like reading strategy for abstractive summarization and leveraged it by training the model in a multi-task learning system.

Length Control Neural Encoder-Decoder

Kikuchi et al. (2016) first proposed two learning-based neural encoder-decoder models to control sequence length, named LenInit and LenEmb. LenEmb mixes the decoder inputs with an embedding of the remaining length at each time step, while LenInit initializes the memory cell state of the LSTM decoder with the whole length information. Before that, sentence length was controlled by ignoring “EOS” at certain time steps or by truncating the output sentence. Fan, Grangier, and Auli (2018) treated the lengths of ground-truth summaries in different ranges as independent properties and represented each range as a discrete marker in an embedding unit. Liu, Luo, and Zhu (2018) presented a convolutional neural network (CNN) encoder-decoder in which the inputs and length information are processed by a CNN before entering the decoder unit. Generally, length control models in neural encoder-decoders can be divided into two types: Whole Length Infusing (WLI) models and Remaining Length Infusing (RLI) models. A WLI model informs the decoder of the entire length of the target sentence, while an RLI model tells the decoder the remaining length of the sentence at each time step. LenInit [Kikuchi et al.2016], Fan [Fan, Grangier, and Auli2018] and LCCNN [Liu, Luo, and Zhu2018] all belong to the WLI models, while LenEmb [Kikuchi et al.2016] is a typical RLI model. Ordinarily, RLI models have better length control capacity but produce poorer sentence quality compared with WLI models. We follow Kikuchi et al. (2016) in defining the length of a sentence at the character level, which is more challenging than the word-level definition of Liu, Luo, and Zhu (2018).

Reinforcement learning in NLG

There have been several successful attempts to integrate the encoder-decoder with RL for neural language generation. Ranzato et al. (2015) applied an RL algorithm to directly optimize the non-differentiable evaluation metric, which raised scores considerably. Rennie et al. (2017) modified the RL algorithm by replacing the critic model with inference results to produce the baseline reward; this simple modification brought significant improvements on the image captioning task. Yu et al. (2017) rewarded Monte-Carlo sampled sentences with an adversarially trained discriminator. Paulus, Xiong, and Socher (2018) employed intra-temporal attention and combined supervised word prediction with RL to generate more readable summaries. Liu et al. (2018) designed an adversarial process for abstractive text summarization. Chen and Bansal (2018) first selected the salient sentences and then rewrote them into the summary, connecting the non-differentiable computation via policy gradient. However, none of the above work explored length control in RL.


Problem Definition

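The dataset for text summarization contains pairs of an input source sequence $X = (x_1, \dots, x_N)$ and the corresponding ground-truth summary $Y = (y_1, \dots, y_M)$, where $N$ and $M$ are the lengths of the input article and the reference, respectively. The goal of summarization is to find a transformation from $X$ to $Y$ using a parameterized policy $p_\theta$. This can be formalized as maximizing the conditional probability in Eq. (1), where $y_{<t} = (y_1, \dots, y_{t-1})$.

$$p_\theta(Y \mid X) = \prod_{t=1}^{M} p_\theta(y_t \mid y_{<t}, X) \qquad (1)$$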


Encoder-Decoder Attention Model

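An encoder-decoder with attention mechanism [Bahdanau, Cho, and Bengio2014] is selected as the basic framework in this work. The RNN encoder sequentially takes the word embedding of each token in the input sentence. Then the final hidden state of the encoder, which contains the whole information of the source sentence, is fed into the decoder as the initial state. We select a bi-directional Long Short-Term Memory (BiLSTM) network [Hochreiter and Schmidhuber1997] as the encoder to read the source sequence. Here we denote $\overrightarrow{h}_t$ as the hidden state of the BiLSTM encoder in the forward direction at time step $t$ and $\overleftarrow{h}_t$ for the backward direction. $\overrightarrow{c}_t$ and $\overleftarrow{c}_t$ are the memory cell states of the BiLSTM encoder:

$$\overrightarrow{h}_t, \overrightarrow{c}_t = \mathrm{LSTM}(x_t, \overrightarrow{h}_{t-1}, \overrightarrow{c}_{t-1}), \qquad \overleftarrow{h}_t, \overleftarrow{c}_t = \mathrm{LSTM}(x_t, \overleftarrow{h}_{t+1}, \overleftarrow{c}_{t+1})$$

The outputs of the encoder at time $t$ are concatenated as $h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$, which serves as the vector for attention, where $[\cdot\,;\cdot]$ denotes concatenation.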

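The decoder unrolls the output summary from the initial hidden state, predicting one word at a time. Neglecting length control, the initial hidden state $s_0$ and memory cell state $c_0$ of the decoder are set from the final states of the encoder, and the hidden state $s_t$ is calculated by:

$$s_t, c_t = \mathrm{LSTM}(y_{t-1}, s_{t-1}, c_{t-1})$$

A context vector $a_t$ measures which parts of the source the decoder attends to at time $t$; it is a weighted sum of the encoder outputs $h_i$ over the source positions, with attention weights $\alpha_{t,i}$:

$$a_t = \sum_{i} \alpha_{t,i} h_i$$

We then concatenate $a_t$ with the hidden state $s_t$ to predict the next word.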
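As a minimal sketch of the attention step described above, the following pure-Python snippet computes a context vector as a softmax-weighted sum of encoder outputs. The dot-product scoring function and the names `attention_context`, `enc_outputs` and `dec_state` are assumptions made for illustration only; the attention of [Bahdanau, Cho, and Bengio2014] uses a learned scoring network that is not reproduced here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(enc_outputs, dec_state):
    """Return (weights, context) for one decoder step.

    enc_outputs: list of encoder output vectors, one per source position.
    dec_state:   decoder hidden state with the same dimension.
    Scoring is a plain dot product (an assumption of this sketch).
    """
    scores = [sum(h_d * s_d for h_d, s_d in zip(h, dec_state)) for h in enc_outputs]
    weights = softmax(scores)
    dim = len(dec_state)
    context = [sum(w * h[d] for w, h in zip(weights, enc_outputs)) for d in range(dim)]
    return weights, context

# Toy example: three source positions, two-dimensional states.
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_context(enc, dec_state=[2.0, 0.0])
```

In this toy example the first and third source positions receive equal weight, and both outweigh the second, because their dot product with the decoder state is larger.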


Length Control Models

Figure 1: Decoder structure of the four neural length-control models. The four colors in the upper area represent the different modifications of the original LSTM in the four models; the lower area shows details of the corresponding model structures.

To control the length of the output, we need to feed the desired length information into the decoder. Hence, the training objective in supervised ML with “teacher forcing” [Williams and Zipser1989] becomes:

$$L_{ML} = -\sum_{t=1}^{M} \log p_\theta(y_t \mid y_{<t}, X, l_t)$$

Here, $l_t$ denotes the length information the decoder perceives at time $t$. As introduced before, LC models are classified into two groups. For an RLI model, the remaining length is updated at each time step by $l_{t+1} = l_t - \mathrm{len}(y_t)$, where $\mathrm{len}(y_t)$ is the length of the word $y_t$, while $l_1$ is set to the desired length $l$. In a WLI model, the decoder is only aware of the whole length of the sentence, so we set all $l_t$ to $l$.
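The difference between the two groups can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `length_inputs` and the clamping of the remaining length at zero are assumptions, and word length is counted in characters without spaces, as the paper does.

```python
def length_inputs(tokens, desired_length, mode):
    """Length signal seen by the decoder at each step.

    mode="WLI": every step sees the whole desired length.
    mode="RLI": each step sees the length still remaining, measured in
    characters; clamping at zero is an assumption of this sketch.
    """
    if mode == "WLI":
        return [desired_length] * len(tokens)
    if mode != "RLI":
        raise ValueError("mode must be 'WLI' or 'RLI'")
    signals, remaining = [], desired_length
    for tok in tokens:
        signals.append(remaining)
        remaining = max(0, remaining - len(tok))
    return signals

tokens = "arsenal fear henry will leave".split()
wli = length_inputs(tokens, 25, "WLI")  # [25, 25, 25, 25, 25]
rli = length_inputs(tokens, 25, "RLI")  # [25, 18, 14, 9, 5]
```

The example sentence has exactly 25 characters once spaces are ignored, so the remaining length reaches zero right after the last word.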

In this section, we introduce four models: LenInit, LenEmb, LenLInit and LenMC. The first two were proposed by [Kikuchi et al.2016]; we modify them to obtain the remaining two.


LenInit

This WLI model uses the memory cell to control the output length by rewriting the initial state as:

$$c_0 = l \cdot b_c$$

Here $l$ is the entire desired length of the output sentence, and $b_c$ is a learnable vector.


LenLInit

This model can be viewed as a variant of LenInit. In order to produce higher scores by leveraging the LC capacity, we simply add a linear transformation $W_l$ of the length information; the model is thus named Length Linear Initialization (LenLInit). Unlike in LenInit, the learnable vector $b_c$ is replaced by $b_g$, a Gaussian-sampled non-trainable vector, and the initial memory cell state of the decoder is:

$$c_0 = W_l (l \cdot b_g)$$



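LenEmb

For this RLI model, an embedding matrix $W_e \in \mathbb{R}^{D \times K}$ transforms the remaining length $l_t$ into a vector $e_t$, where $K$ is the number of possible length types. Then $e_t$ is concatenated with the word embedding vector as an additional input to the LSTM decoder:

$$s_t, c_t = \mathrm{LSTM}([y_{t-1} ; e_t], s_{t-1}, c_{t-1})$$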



LenMC

Unlike LenEmb, where the length information is concatenated as an additional input, we infuse the remaining length into the memory cell at each time step in the same way as LenLInit, and name this RLI model LenMC.
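The ways in which the length enters the decoder can be sketched with plain Python lists. This is a rough illustration under stated assumptions: the helper names are hypothetical, the tiny dimensions are chosen for readability, and the exact LenLInit form (a linear map applied to the length times a fixed Gaussian vector) is our reading of the description rather than a confirmed formula. LenMC, which repeats the LenLInit-style infusion at every step, is not shown.

```python
import random

def scale(vec, k):
    """Multiply every component of vec by the scalar k."""
    return [k * v for v in vec]

def matvec(mat, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

def leninit_cell(length, b_learned):
    """LenInit: initial memory cell = desired length times a learnable vector."""
    return scale(b_learned, length)

def lenlinit_cell(length, b_fixed, W):
    """LenLInit (assumed form): linear map W applied to length times a fixed Gaussian vector."""
    return matvec(W, scale(b_fixed, length))

def lenemb_input(word_emb, remaining, length_table):
    """LenEmb: concatenate the word embedding with the embedding of the remaining length."""
    return word_emb + length_table[remaining]

rng = random.Random(0)
dim = 4
b_learned = [0.1] * dim                              # stands in for a trained vector
b_fixed = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # sampled once, never trained
W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]  # identity as a stand-in
length_table = [[float(k)] * 2 for k in range(150)]  # 150 possible lengths, toy embeddings

c0_init = leninit_cell(45, b_learned)
c0_linit = lenlinit_cell(45, b_fixed, W)
x_t = lenemb_input([0.5, -0.5, 0.25], 18, length_table)
```

With the identity matrix as a stand-in for the learned transformation, `c0_linit` is simply the scaled Gaussian vector; a trained matrix would let the model amplify or damp individual components of the length signal, which is the “regulator” role the paper describes.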

Figure 2: Summary length distribution in parts of Gigaword

Length Control Reinforcement Learning

Models trained by maximum likelihood estimation with “teacher forcing” suffer from the problem of “exposure bias” [Ranzato et al.2015]. Moreover, the training process minimizes the cross-entropy loss, while at test time results are evaluated with language metrics. One way to resolve these conflicts is to learn a policy that directly maximizes the evaluation metric instead of the maximum-likelihood objective, for which RL is a natural choice.

From the perspective of RL for sequence generation, our LC model can be viewed as an agent, the parameters $\theta$ of the network form a policy $p_\theta$, and making a prediction at each step can be treated as an action. After generating a complete sentence, the agent receives a reward computed by the evaluation metrics. During training, the decoder can produce two types of output: $\hat{Y}$, obtained with greedy search, and $Y^s$, in which each word $y^s_t$ is sampled from the probability distribution $p_\theta(y^s_t \mid y^s_{<t}, X, l_t)$ at time $t$. We assign a random number within an appropriate range as the target summary length for each article and feed it into the LC model to sample a sentence $Y^s$; the reward $R(Y^s)$ is then evaluated between the ground-truth summary $Y$ and the sampled sentence $Y^s$. We adopt self-critical sequence training (SCST) [Rennie et al.2017] as our RL backbone, and the training objective of SCST becomes:

$$L_{RL} = -\left(R(Y^s) - R(\hat{Y})\right) \sum_{t} \log p_\theta(y^s_t \mid y^s_{<t}, X, l_t)$$

This shows that the goal of policy-gradient RL in sequence generation is equivalent to increasing the probability of generating high-scoring sentences.
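A small numerical sketch of the self-critical loss for a single sampled sentence is given below. The function name `scst_loss` and the toy numbers are illustrative assumptions; in practice the log-probabilities come from the decoder and the rewards from ROUGE.

```python
def scst_loss(sample_log_probs, sample_reward, greedy_reward):
    """Self-critical loss for one sampled sentence.

    sample_log_probs: log-probabilities of the sampled tokens.
    The reward of the greedy output serves as the baseline.
    """
    advantage = sample_reward - greedy_reward
    return -advantage * sum(sample_log_probs)

log_probs = [-0.5, -1.0, -0.25]
better = scst_loss(log_probs, sample_reward=0.8, greedy_reward=0.6)
worse = scst_loss(log_probs, sample_reward=0.4, greedy_reward=0.6)
```

When the sample beats the greedy output, the advantage is positive and minimizing the loss raises the log-probability of the sampled tokens; when it scores lower, the sign flips and the sample is made less likely.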

We encounter two additional problems in LC summarization. The first is that LC models are designed to generate summaries of different lengths, but existing datasets provide only one or a few ground-truth references for each article; worse still, the numbers of references of different lengths are severely unbalanced (see Figure 2). Consequently, models trained under ML on such a dataset tend to perform well only at particular lengths. By sampling sequences with a randomly assigned length in reinforcement training, sentences of uniformly distributed lengths serve as additional summaries to be judged by the RL system, which alleviates this issue.

The second problem is that directly applying SCST to LC models seriously diminishes LC capacity. Some of the sampled sentences deviate in length, and raising the generation probability of these sentences corrupts the LC capacity, which in turn forces the model to generate even more length-deviating sentences. This vicious cycle leads to a collapse of LC capacity. To save the model from length control collapse in RL, an intuitive idea is to adjust the reward according to the output length, especially for sentences with high scores and mismatched length. Accordingly, we propose two training approaches for length control RL: Manually Threshold Select (MTS) and Self-Critical Dropout (SCD). Additionally, both training algorithms can regulate the model by tuning a hyper-parameter, trading better LC capacity for lower sentence quality and vice versa, i.e., accomplishing CLC.

Manually Threshold Select

As a starting point, semantic accuracy is still the most critical indicator to be guaranteed. A sentence with a low score has its generation probability reduced during training even if it has the expected length. Among sentences with high scores, those that have the expected length should naturally retain their reward; thus, we only need to deal with the remaining high-scoring sentences of unqualified length.

Suppose the desired length for a sampled sentence is $l$ and the length of the output sequence is $l^s$. The length prediction error is the difference between the two lengths: $e = |l - l^s|$. We manually choose an error threshold $e_{th}$ and eliminate the reward of the sentence when $e$ exceeds $e_{th}$.

The LC capacity can be adjusted by setting different values of $e_{th}$: a larger $e_{th}$ yields better evaluation scores, while a smaller $e_{th}$ gives better length control.
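One possible reading of this rule is sketched below. It is an assumption, not the paper's confirmed formula: "eliminating" the reward is implemented here by resetting it to the greedy baseline so that the self-critical advantage becomes zero, and the function name `mts_reward` is hypothetical.

```python
def mts_reward(sample_reward, greedy_reward, desired_len, output_len, threshold):
    """Manually Threshold Select (assumed form).

    If a sampled sentence beats the greedy baseline but misses the desired
    length by more than `threshold`, its reward is reset to the baseline so
    that its advantage is zero. All other rewards are kept unchanged.
    """
    error = abs(desired_len - output_len)
    if error > threshold and sample_reward > greedy_reward:
        return greedy_reward
    return sample_reward

kept = mts_reward(0.8, 0.6, desired_len=45, output_len=47, threshold=4)     # within threshold
dropped = mts_reward(0.8, 0.6, desired_len=45, output_len=58, threshold=4)  # too far off
low = mts_reward(0.4, 0.6, desired_len=45, output_len=58, threshold=4)      # low score, untouched
```

Low-scoring samples are left alone because the self-critical update already pushes their probability down, which matches the reasoning given in the text.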

Self-Critical Dropout

MTS-RL has two drawbacks. First, sentences exceeding the limit are completely ignored even when they reach high evaluation scores and $e$ is only slightly larger than $e_{th}$. Second, $e_{th}$ can only take discrete values, which makes it hard to control models that already have precise length control, such as LenMC. Inspired by SCST [Rennie et al.2017], which approximates the baseline from the current training model, we propose the Self-Critical Dropout RL approach. In each iteration, a batch of sampled outputs $\{Y^s_1, \dots, Y^s_B\}$ is obtained, where $Y^s_i$ is the sentence sampled with desired length $l_i$ and $e_i$ is its length prediction error. The mean of $e$ is approximated by:

$$\bar{e} = \frac{1}{B} \sum_{i=1}^{B} e_i$$

We take $\bar{e}$ as the threshold. Unlike the previous method, which restrains the rewards of all sentences with $e$ larger than the threshold, we keep their rewards with a probability $p$. At the same time, rewards should be more likely to be retained when $e$ gets closer to $\bar{e}$.

The hyper-parameter $\lambda$ that enters this probability reflects the degree of length constraint on the output sequence and therefore controls the LC capacity. A larger $\lambda$ forces the model to generate sentences with more accurate lengths, while a smaller $\lambda$ exerts weaker control over length and so can improve performance.
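The stochastic variant can be sketched as follows. The keep probability used here, an exponential decay in the distance above the batch-mean error, is purely an assumption chosen to satisfy the two properties stated in the text (rewards closer to the mean are kept more often, and a larger hyper-parameter constrains length more strongly); the paper's actual formula may differ. The names `scd_rewards` and `lam` are hypothetical.

```python
import math
import random

def scd_rewards(sample_rewards, greedy_rewards, errors, lam, rng):
    """Self-Critical Dropout (assumed form).

    The batch-mean length error is the threshold. A high-scoring sample whose
    error exceeds it keeps its reward with probability exp(-lam * (e - mean));
    otherwise the reward is reset to the greedy baseline.
    """
    mean_error = sum(errors) / len(errors)
    adjusted = []
    for r_s, r_g, e in zip(sample_rewards, greedy_rewards, errors):
        if e > mean_error and r_s > r_g:
            keep_prob = math.exp(-lam * (e - mean_error))
            adjusted.append(r_s if rng.random() < keep_prob else r_g)
        else:
            adjusted.append(r_s)
    return adjusted

rng = random.Random(0)
samples, greedy, errors = [0.7, 0.5, 0.9], [0.6, 0.6, 0.6], [0, 2, 10]
no_constraint = scd_rewards(samples, greedy, errors, lam=0.0, rng=rng)  # everything kept
hard_constraint = scd_rewards(samples, greedy, errors, lam=1e9, rng=rng)  # third reward dropped
```

Because the threshold is the batch mean rather than a fixed integer, it adapts to how precise the current model already is, which is the motivation the text gives for models such as LenMC.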

Source article: arsenal chairman peter hill-wood revealed thursday that he fears french striker thierry henry will leave highbury at the end of the season .

Reference summary: arsenal boss fears losing henry

| model | desired length | sampled summary | true length |
|---|---|---|---|
| LenLInit | 25 | arsenal chief quits to leave | 24 |
| LenLInit | 45 | gunners chief fears french striker will leave the end | 45 |
| LenLInit | 65 | gunners chief says he will leave as he fears french striker will leave | 58 |
| LenInit | 25 | arsenal fears henry henry | 22 |
| LenInit | 45 | arsenal fears french striker henry will leave arsenal | 46 |
| LenInit | 65 | arsenal fears french striker henry will leave arsenal says arsenal chairman | 65 |
| LenMC | 25 | arsenal fear henry will leave | 25 |
| LenMC | 45 | arsenal ’s arsenal worried about henry ’s return home | 45 |
| LenMC | 65 | arsenal ’s arsenal worried about french striker henry will leave wednesday | 64 |
| LenEmb | 25 | arsenal ’s henry to quit again | 25 |
| LenEmb | 45 | arsenal chairman fears henry ’s fate of henry ’s boots | 45 |
| LenEmb | 65 | arsenal chairman fears french striker henry says he ’s will leave retirement | 65 |

Table 1: Example summaries of the four LC models. (Note that “gunners” is a nickname of arsenal.)


The experiments are divided into two parts. We first run basic experiments under ML to observe the accuracy gap between LC models and other summarization baselines; the trained models also serve as the initial state for RL. We then compare LC models under different RL methods in more depth; we pay more attention to this part and perform extensive experiments to demonstrate the effectiveness of controllable length control with the designed RL methods.

Experiment Setting

Gigaword Dataset

The Gigaword dataset is selected for our experiments. The corpus consists of pairs of collected news texts and their corresponding headlines [Napoles, Gormley, and Van Durme2012]. We use the standard train/valid/test splits of [Rush, Chopra, and Weston2015], which are pruned to improve data quality. The whole processed dataset contains nearly 3.8 million sentences for training, each with one summary. In the ML experiments, to compare with other summarization models under a unified standard, we train on the entire dataset. Results are reported on the standard Gigaword test set, which contains 1951 instances; we name it “test-1951”. For the RL experiments, we shrink the training set by sampling 600K pairs from it. The validation and test sets are rebuilt following infused_2018 (infused_2018): two non-overlapping sets, called “valid-10K” and “test-4k”, are sampled from the standard validation set for model selection and result evaluation, respectively.

Notice that the scores on “test-4k” are much higher than those on “test-1951”. This is because, in the standard test set, words in the summary sentences do not frequently occur in the source texts, which makes word prediction during decoding difficult. We build a dictionary containing the 50000 most frequent words, and all other words are replaced by the “unk” tag.
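The dictionary construction can be illustrated with a short sketch. The helper names `build_vocab` and `replace_oov` are hypothetical, the toy corpus stands in for Gigaword, and whether the “unk” tag itself counts toward the 50000 entries is not stated in the text (here it is kept as an extra entry).

```python
from collections import Counter

def build_vocab(sentences, max_size, unk="unk"):
    """Keep the max_size most frequent whitespace tokens, plus the unk tag."""
    counts = Counter(tok for s in sentences for tok in s.split())
    vocab = {unk}
    vocab.update(tok for tok, _ in counts.most_common(max_size))
    return vocab

def replace_oov(sentence, vocab, unk="unk"):
    """Replace every out-of-vocabulary token by the unk tag."""
    return " ".join(tok if tok in vocab else unk for tok in sentence.split())

corpus = ["arsenal fears losing henry", "arsenal chairman fears henry will leave"]
vocab = build_vocab(corpus, max_size=3)  # the paper uses 50000
masked = replace_oov("arsenal fears henry will leave", vocab)
```

With a vocabulary of three words, only the tokens that occur twice in the toy corpus survive, and the rest of the sentence is mapped to “unk”.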

| model name | R-1 | R-2 | R-L | svar |
|---|---|---|---|---|
| Summarization models | | | | |
| ABS | 29.55 | 11.32 | 26.42 | - |
| ABS+ | 29.76 | 11.88 | 26.96 | - |
| Luong-NMT | 33.10 | 14.45 | 30.71 | - |
| RAS-LSTM | 32.55 | 14.70 | 30.03 | - |
| RAS-Elman | 33.78 | 15.97 | 31.15 | - |
| seq2seq (our impl.) | 32.24 | 14.92 | 30.21 | 14.23 |
| Length-control models | | | | |
| LenLInit (our) | 30.47 | 13.35 | 28.40 | 2.15 |
| LenInit | 29.97 | 13.03 | 28.07 | 2.11 |
| LenMC (our) | 29.45 | 12.65 | 27.41 | 0.87 |
| LenEmb | 28.83 | 11.89 | 26.92 | 0.85 |

Table 2: Results of ML training on the standard “test-1951”

Evaluation Metric

Following other summarization work, we evaluate the quality of generated sentences by the F-1 scores of ROUGE-1 (R-1), ROUGE-2 (R-2), and ROUGE-L (R-L) [Lin2004].
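To make the n-gram variants of the metric concrete, here is a simplified sketch of a ROUGE-N F-1 computation on whitespace tokens. It is not the official ROUGE package (no stemming, no stopword handling, and no ROUGE-L), and the function names are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_f1(candidate, reference, n):
    """F-1 of clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "arsenal boss fears losing henry"
candidate = "arsenal fear henry will leave"
r1 = rouge_n_f1(candidate, reference, 1)  # two shared unigrams out of five on each side
r2 = rouge_n_f1(candidate, reference, 2)  # no shared bigram
```

The candidate shares only “arsenal” and “henry” with the reference, so unigram precision and recall are both 2/5, while no bigram matches.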

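To measure LC capacity, Liu, Luo, and Zhu (2018) use the variance of the summary lengths $l_i$ against the target length $l$. In this paper, we use the square root of the variance (svar) over $n$ generated summaries:

$$svar = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (l_i - l)^2}$$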
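A minimal sketch of this measure, assuming it is the root of the mean squared deviation of output lengths from the target length (the function name `svar` and the toy lengths are illustrative):

```python
import math

def svar(output_lengths, target_length):
    """Square root of the mean squared deviation from the target length."""
    n = len(output_lengths)
    return math.sqrt(sum((l - target_length) ** 2 for l in output_lengths) / n)

# Four toy outputs for a target of 45 characters; one overshoots by one.
value = svar([45, 46, 45, 45], 45)
```

A model that always hits the target exactly has an svar of zero; larger values indicate looser length control.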

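| training | model | parameter | R-1 (25) | R-2 (25) | R-L (25) | R-1 (45) | R-2 (45) | R-L (45) | R-1 (65) | R-2 (65) | R-L (65) | svar (std) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ML | LenLInit | | 39.03 | 17.68 | 37.46 | 42.04 | 20.47 | 39.87 | 39.40 | 18.71 | 36.96 | 3.96 |
| ML | LenInit | | 37.36 | 16.76 | 35.92 | 42.11 | 20.55 | 39.83 | 38.67 | 18.23 | 36.33 | 2.98 |
| ML | LenMC | | 37.10 | 16.68 | 35.72 | 41.38 | 19.98 | 38.99 | 37.93 | 17.87 | 35.51 | 1.05 |
| ML | LenEmb | | 34.77 | 14.85 | 33.41 | 40.00 | 18.43 | 37.74 | 36.96 | 16.89 | 34.49 | 0.97 |
| SCST | LenLInit | | 42.90 | 20.10 | 40.78 | 43.48 | 20.83 | 40.99 | 42.61 | 20.37 | 40.06 | 11.0 (2.57) |
| SCST | LenInit | | 39.55 | 17.45 | 37.85 | 42.75 | 20.20 | 40.35 | 40.79 | 19.01 | 38.37 | 8.90 (2.12) |
| SCST | LenMC | | 40.38 | 18.14 | 38.52 | 42.14 | 19.98 | 39.48 | 38.36 | 17.75 | 35.65 | 2.46 (0.47) |
| SCST | LenEmb | | 37.77 | 15.42 | 35.88 | 40.40 | 18.24 | 37.75 | 37.48 | 16.87 | 34.77 | 1.59 (0.12) |
| MTS-RL | LenLInit | | 42.64 | 20.13 | 40.50 | 43.12 | 20.80 | 40.62 | 41.44 | 19.81 | 38.91 | 8.54 (0.78) |
| MTS-RL | LenLInit | | 41.43 | 19.01 | 39.46 | 42.63 | 20.55 | 40.23 | 39.81 | 19.03 | 37.43 | 5.14 (0.60) |
| MTS-RL | LenLInit | | 40.66 | 18.43 | 38.85 | 42.46 | 20.45 | 40.02 | 39.13 | 18.61 | 36.70 | 3.87 (0.10) |
| MTS-RL | LenInit | | 40.22 | 17.88 | 38.42 | 42.77 | 20.36 | 40.31 | 40.32 | 18.83 | 37.69 | 6.17 (0.46) |
| MTS-RL | LenInit | | 39.52 | 17.75 | 37.79 | 42.42 | 20.19 | 39.95 | 39.16 | 18.28 | 36.57 | 3.50 (0.56) |
| MTS-RL | LenInit | | 38.62 | 17.31 | 36.98 | 42.26 | 20.29 | 39.82 | 38.52 | 18.00 | 36.04 | 2.79 (0.13) |
| MTS-RL | LenMC | | 38.56 | 16.53 | 36.89 | 41.33 | 19.67 | 39.04 | 37.83 | 17.66 | 35.39 | 1.01 (0.07) |
| MTS-RL | LenMC | | 38.60 | 16.98 | 36.98 | 41.89 | 20.08 | 39.37 | 38.18 | 17.93 | 35.68 | 0.89 (0.02) |
| SCD-RL | LenLInit | | 41.22 | 18.95 | 39.31 | 42.77 | 20.65 | 40.34 | 40.14 | 19.22 | 37.76 | 6.05 (0.45) |
| SCD-RL | LenLInit | | 40.12 | 18.15 | 38.41 | 42.25 | 20.46 | 39.93 | 39.12 | 18.59 | 36.67 | 3.84 (0.05) |
| SCD-RL | LenInit | | 40.45 | 17.88 | 38.48 | 42.88 | 20.08 | 40.30 | 40.26 | 18.47 | 37.38 | 4.52 (0.14) |
| SCD-RL | LenInit | | 38.39 | 17.07 | 36.75 | 42.14 | 20.21 | 39.75 | 38.45 | 17.98 | 35.88 | 2.64 (0.06) |
| SCD-RL | LenMC | | 39.86 | 17.69 | 38.13 | 42.27 | 20.19 | 39.78 | 38.33 | 18.02 | 35.78 | 1.46 (0.04) |
| SCD-RL | LenMC | | 38.87 | 17.28 | 37.28 | 41.64 | 19.95 | 39.20 | 37.75 | 17.83 | 35.41 | 1.15 (0.02) |

Table 3: Performance of length control RL on “test-4k” (ML results are included for comparison). Column groups correspond to desired lengths of 25, 45 and 65. Clearly highest scores (at least 0.4 above the second best) are in bold; scores in italics are significantly worse (at least 2 below the best score).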

Implementation details

The dimensions of the hidden states of our BiLSTM encoder and one-layer LSTM decoder are both fixed to 512. The size of the vectors $b_c$ and $b_g$ that incorporate the length input is 512, and the number of possible lengths in LenEmb is 150.

We first train our models in supervised ML using Adam [Kingma and Ba2014] as the optimizer and anneal the learning rate by a factor of 0.5 every four epochs. We also apply gradient clipping [Pascanu, Mikolov, and Bengio2013] with a range of [-10, 10], and the batch size is set to 64.
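The schedule and clipping described above can be sketched as follows. The base learning rate of 0.001 is a hypothetical value (the text does not give the ML learning rate), and element-wise value clipping is an assumption; the range could also refer to a different clipping scheme.

```python
def annealed_lr(base_lr, epoch, factor=0.5, every=4):
    """Learning rate after multiplying by `factor` once per `every` completed epochs."""
    return base_lr * factor ** (epoch // every)

def clip_gradients(grads, low=-10.0, high=10.0):
    """Element-wise clipping of gradient values into [low, high]."""
    return [min(high, max(low, g)) for g in grads]

lrs = [annealed_lr(0.001, epoch) for epoch in (0, 3, 4, 9)]  # hypothetical base rate
clipped = clip_gradients([-25.0, 3.0, 12.5])
```

Epochs 0 to 3 use the base rate, epochs 4 to 7 half of it, and so on; gradients outside the range are moved to the nearest bound.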

Then we run the RL algorithms on the previously trained LC models with an initial learning rate of 0.00001; the reward in RL is set as the sum of the R-1, R-2, and R-L scores. During RL, the desired length used to sample sentences is uniformly distributed over a preset interval. We evaluate the model on the validation set every 2000 iterations and select the model according to its cumulative score of R-1, R-2 and R-L.

Note that in our experiments, spaces are not counted in the sentence length, which differs slightly from [Kikuchi et al.2016].

Experiment Results Analysis

Length Control in ML

Although the evaluation score is not the only objective in this research, it is of interest how much the score is reduced by LC capacity. The results of the four LC models under ML are presented in Table 2, with ROUGE scores collected at a fixed desired length. To indicate the accuracy level of our LC models, we list several existing summarization baselines, including ABS, ABS+ [Rush, Chopra, and Weston2015], RAS-LSTM, RAS-Elman and Luong-NMT [Chopra, Auli, and Rush2016]. Comparing the two WLI models and the two RLI models separately, we find that the two proposed models, LenLInit and LenMC, slightly weaken LC capacity while clearly improving the scores.

In Table 1, we provide a representative example of the summaries generated by the LC models; the results demonstrate that these models are able to output well-formed sentences of various lengths. It is also observed that LenLInit and LenMC perform better on short summaries in this case.

RL for Length Control

Table 3 displays an overall comparison of all models under RL. We evaluate our models with sentence lengths of 25, 45 and 65, which represent short, medium and long sentences, respectively. Results may vary between training runs since RL is usually unstable, so we repeat training multiple times for each model and report averaged results.

We first present the results of the four LC models under ML. We then apply raw self-critical sequence training (SCST) on this basis, without any constraint on output length. We find that WLI models tend to lose control of length sharply but increase accuracy significantly, while RLI models still retain good LC ability. This is mainly because the lengths of the sentences sampled by RLI models are consistent with the input length in most cases; consequently, the training process is stable.

To further investigate the impact of RL on LC models, we evaluate the models on all expected lengths within the range [20, 70]. The results are reported in Figure 3, where the x-axis represents the length, and the ROUGE score and svar on the y-axes measure output quality and LC capacity, respectively. For convenience, we take the average of the R-1, R-2, and R-L values as the ROUGE score. RL improves scores across the whole range of lengths but loosens LC capacity. The gain in scores is significant on both short and long sentences for WLI models, as well as on short sentences for RLI models, which indicates that RL alleviates the problem caused by the unbalanced number of references of different lengths in the training corpus. In particular, LenLInit achieves the highest score among the four models but has poor LC on long sentences. It is worth noting that LenMC with SCST achieves an even higher score than LenInit on short summaries while still preserving excellent LC ability. Since SCST has a negligible effect on LenEmb, we exclude LenEmb from further comparison under length-control RL.

Controllable Length Control Analysis

Figure 3: Performance of SCST-trained LC models versus ML-trained LC models in terms of both LC capacity (bars) and output score (lines). Panels: (a) LenLInit, (b) LenInit, (c) LenEmb, (d) LenMC.

The results of MTS-RL and SCD-RL in Table 3 follow the SCST part. We run MTS-RL on three LC models. For the WLI models LenLInit and LenInit, accuracy and svar both rise as the selected error threshold increases, which means that this hyper-parameter in MTS-RL can be used to adjust LC capacity. However, for the RLI model LenMC, the results show no obvious difference in scores across different thresholds. Hence, we adopt the SCD-RL training algorithm for LenMC; the results show that SCD-RL can control LC capacity for the RLI model as MTS-RL does for the WLI models, and SCD-RL can also manage LC capacity for WLI models. Overall, the two RL training algorithms prevent the model from length control collapse and make this capacity controllable via their own hyper-parameters.

To make a comprehensive comparison considering all factors, we build a scatter plot (see Figure 4) to display the performance of the models under different training strategies. The x-axis is svar, which measures LC capacity. To evaluate the scores across different lengths, we take the average of the R-1, R-2 and R-L scores at the three evaluated lengths as the value on the y-axis. From Figure 4, we can draw some intuitive conclusions: (i) SCST as length control RL for WLI models is extremely unstable. (ii) Among models with similar average ROUGE scores, LenMC has strictly better LC capability than LenInit. (iii) Statistically, LenLInit achieves a higher score than LenInit when their svar values are relatively close. (iv) The models trained with the designed RL algorithms cover a wide range of LC capacity with accuracy in a reasonable range.
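As a small worked example of how one point of such a scatter plot could be computed, the sketch below averages the nine ROUGE values of the first LenLInit row of Table 3 (the ML-trained model) and pairs the result with its svar. The function name `scatter_point` is hypothetical.

```python
def scatter_point(rouge_by_length, svar_value):
    """Return (x, y): svar on the x-axis, mean of all ROUGE scores on the y-axis."""
    scores = [s for triple in rouge_by_length.values() for s in triple]
    return svar_value, sum(scores) / len(scores)

# (R-1, R-2, R-L) at desired lengths 25, 45 and 65, copied from Table 3.
lenlinit_ml = {
    25: (39.03, 17.68, 37.46),
    45: (42.04, 20.47, 39.87),
    65: (39.40, 18.71, 36.96),
}
x, y = scatter_point(lenlinit_ml, 3.96)
```

For this row the averaged ROUGE score is about 32.4 at an svar of 3.96.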

Figure 4: Overall experiment results of the four models in length control RL. (MTS is only applied to WLI models, with the error threshold chosen from [4, 8, 10, 16]. In SCD training, the hyper-parameter for WLI models is selected from [0.8, 0.4, 0.2, 0.1, 0.05], and for LenMC it is set to one of [0.8, 0.4, 0.1].)

Conclusion and Future Work

In this paper, we proposed LenLInit and LenMC, inspired by former work; our modified models improve length control summarization performance on the Gigaword dataset. The two RL algorithms we developed were successfully applied to length control models, significantly improving the scores on short, medium and long sentences and allowing users to select a model with the desired length control capacity. Because research in this field is still limited, further work needs to be pursued. We plan to perform experiments on other tasks, such as image captioning and dialogue systems, to further verify our RL algorithms. It is also valuable to investigate the mathematical relationship between length control capacity and evaluation scores, which can be beneficial for model selection. Furthermore, the controllable ability can be extended to other domains such as sentiment or style.


  • [Bahdanau, Cho, and Bengio2014] Bahdanau, D.; Cho, K.; and Bengio, Y. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR.
  • [Cao et al.2018] Cao, Z.; Wei, F.; Li, W.; and Li, S. 2018. Faithful to the original: Fact aware neural abstractive summarization. In AAAI.
  • [Chen and Bansal2018] Chen, Y.-C., and Bansal, M. 2018. Fast abstractive summarization with reinforce-selected sentence rewriting. In ACL, 675–686.
  • [Chopra, Auli, and Rush2016] Chopra, S.; Auli, M.; and Rush, A. M. 2016. Abstractive sentence summarization with attentive recurrent neural networks. In NAACL, 93–98.
  • [Fan, Grangier, and Auli2018] Fan, A.; Grangier, D.; and Auli, M. 2018. Controllable abstractive summarization. ACL 45.
  • [Hochreiter and Schmidhuber1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural computation 9(8):1735–1780.
  • [Kikuchi et al.2016] Kikuchi, Y.; Neubig, G.; Sasano, R.; Takamura, H.; and Okumura, M. 2016. Controlling output length in neural encoder-decoders. In EMNLP, 1328–1338.
  • [Kingma and Ba2014] Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [Li et al.2017] Li, P.; Lam, W.; Bing, L.; and Wang, Z. 2017. Deep recurrent generative decoder for abstractive text summarization. In EMNLP, 2091–2100.
  • [Lin2004] Lin, C.-Y. 2004. Rouge: A package for automatic evaluation of summaries. In ACL, 74–81.
  • [Liu et al.2018] Liu, L.; Lu, Y.; Yang, M.; Qu, Q.; Zhu, J.; and Li, H. 2018. Generative adversarial network for abstractive text summarization. In AAAI.
  • [Liu, Luo, and Zhu2018] Liu, Y.; Luo, Z.; and Zhu, K. 2018. Controlling length in abstractive summarization using a convolutional neural network. In EMNLP, 4110–4119.
  • [Nallapati et al.2016] Nallapati, R.; Zhou, B.; dos Santos, C.; Gulcehre, C.; and Xiang, B. 2016. Abstractive text summarization using sequence-to-sequence rnns and beyond. In CoNLL, 280–290.
  • [Napoles, Gormley, and Van Durme2012] Napoles, C.; Gormley, M.; and Van Durme, B. 2012. Annotated gigaword. In ACL, 95–100. Association for Computational Linguistics.
  • [Pascanu, Mikolov, and Bengio2013] Pascanu, R.; Mikolov, T.; and Bengio, Y. 2013. On the difficulty of training recurrent neural networks. In ICML, 1310–1318.
  • [Paulus, Xiong, and Socher2018] Paulus, R.; Xiong, C.; and Socher, R. 2018. A deep reinforced model for abstractive summarization. ICLR.
  • [Ranzato et al.2015] Ranzato, M.; Chopra, S.; Auli, M.; and Zaremba, W. 2015. Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732.
  • [Rennie et al.2017] Rennie, S. J.; Marcheret, E.; Mroueh, Y.; Ross, J.; and Goel, V. 2017. Self-Critical Sequence Training for Image Captioning. In CVPR, 1179–1195.
  • [Rush, Chopra, and Weston2015] Rush, A. M.; Chopra, S.; and Weston, J. 2015. A neural attention model for abstractive sentence summarization. In EMNLP, 379–389.
  • [See, Liu, and Manning2017] See, A.; Liu, P. J.; and Manning, C. D. 2017. Get to the point: Summarization with pointer-generator networks. In ACL, 1073–1083.
  • [Song, Zhao, and Liu2018] Song, K.; Zhao, L.; and Liu, F. 2018. Structure-infused copy mechanisms for abstractive summarization. In COLING, 1717–1729.
  • [Sutskever, Vinyals, and Le2014] Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to Sequence Learning with Neural Networks. In NIPS, 3104–3112.
  • [Sutton and Barto2018] Sutton, R. S., and Barto, A. G. 2018. Reinforcement learning: An introduction. MIT press.
  • [Vinyals et al.2015] Vinyals, O.; Toshev, A.; Bengio, S.; and Erhan, D. 2015. Show and tell: A neural image caption generator. In CVPR, 3156–3164.
  • [Williams and Zipser1989] Williams, R. J., and Zipser, D. 1989. A learning algorithm for continually running fully recurrent neural networks. Neural computation 1(2):270–280.
  • [Yang et al.2019] Yang, M.; Qu, Q.; Tu, W.; Shen, Y.; Zhao, Z.; and Chen, X. 2019. Exploring human-like reading strategy for abstractive text summarization. In AAAI, volume 33, 7362–7369.
  • [Yu et al.2017] Yu, L.; Zhang, W.; Wang, J.; and Yu, Y. 2017. Seqgan: Sequence generative adversarial nets with policy gradient. In AAAI.