Chinese Poetry Generation with Planning based Neural Network

10/31/2016 · by Zhe Wang, et al. · Baidu, Inc. and USTC

Chinese poetry generation is a very challenging task in natural language processing. In this paper, we propose a novel two-stage poetry generation method which first plans the sub-topics of the poem according to the user's writing intent, and then generates each line of the poem sequentially, using a modified recurrent neural network encoder-decoder framework. The proposed planning-based method ensures that the generated poem is coherent and semantically consistent with the user's intent. A comprehensive evaluation with human judgments demonstrates that our proposed approach outperforms state-of-the-art poetry generation methods and that the quality of the generated poems is, to some extent, comparable to that of human poets.


1 Introduction

This work is licensed under a Creative Commons Attribution 4.0 International License. License details: http://creativecommons.org/licenses/by/4.0/

Classical Chinese poetry is a great and important heritage of Chinese culture. Over a history of more than two thousand years, millions of poems have been written to praise heroic characters, beautiful scenery, love, friendship, etc. There are different kinds of Chinese classical poetry, such as Tang poetry and Song iambics. Each type of poetry has to follow specific structural, rhythmical and tonal patterns. Table 1 shows an example of a quatrain, one of the most popular genres of poetry in China. The principles of a quatrain include: the poem consists of four lines and each line has five or seven characters; every character has a particular tone, Ping (the level tone) or Ze (the downward tone); and the last characters of the second and last lines must belong to the same rhyme category [wang2002summary]. Under such strict restrictions, a well-written quatrain is full of rhythmic beauty.

In recent years, research on automatic poetry generation has received great attention. Most approaches employ rules or templates [tosa2008hitch, wu2009new, netzer2009gaiku, oliveira2009automatic, oliveira2012poetryme], genetic algorithms [manurung2004evolutionary, zhou2010genetic, manurung2012using], summarization methods [yan2013poet] or statistical machine translation methods [jiang2008generating, he2012generating] to generate poems. More recently, deep learning methods have emerged as a promising direction, treating poetry generation as a sequence-to-sequence generation problem [zhang2014chinese, Wang2016ChineseSI, yi2016generating]. These methods usually generate the first line by selecting one line from the dataset of poems according to the user's writing intent (usually a set of keywords), and the other three lines are generated based on the first line and the previously generated lines. The user's writing intent can therefore only affect the first line, and the remaining three lines may have no association with the main topic of the poem, which can lead to semantic inconsistency in the generated poems. In addition, the topics of poems are usually represented by words drawn from the poems in the training corpus. However, the words used in poems, especially poems written in ancient times, differ from those of the modern language. As a consequence, existing methods may fail to generate meaningful poems when a user wants to write a poem about a modern term (e.g., Barack Obama).

In this paper, we propose a novel poetry generation method which generates poems in a two-stage procedure: the content of the poem ("what to say") is first explicitly planned, and then surface realization ("how to say") is conducted. Given a user's writing intent, which can be a set of keywords, a sentence or even a document described in natural language, the first step is to determine a sequence of sub-topics for the poem using a poem planning model, with each line represented by one sub-topic. The poem planning model decomposes the user's writing intent into a series of sub-topics, each of which is related to the main topic and represents one aspect of the writing intent. The poem is then generated line by line, where each line is generated according to its corresponding sub-topic and the preceding generated lines, using a recurrent neural network based encoder-decoder model (RNN enc-dec). We modify the RNN enc-dec framework to support the encoding of both sub-topics and the preceding lines. The planning-based mechanism has two advantages over previous methods. First, every line of the generated poem has a closer connection to the user's writing intent. Second, the poem planning model can learn from extra knowledge sources besides the poem data, such as large-scale web data or knowledge extracted from encyclopedias. As a consequence, it can bridge modern concepts and the vocabulary covered by ancient poems. Take the term "Barack Obama" as an example: using knowledge from encyclopedias, the poem planning model can expand the user's query, Barack Obama, into a series of sub-topics such as outstanding, power, etc., thereby ensuring semantic consistency in the generated poem.

The contribution of this paper is two-fold. First, we propose a planning-based poetry generating framework, which explicitly plans the sub-topic of each line. Second, we use a modified RNN encoder-decoder framework, which supports encoding of both sub-topics and the preceding lines, to generate the poem line by line.

静夜思 Thoughts in a Still Night
床前明月光，(P P Z Z P) The luminous moonshine before my bed,
疑是地上霜。(* Z Z P P) Is thought to be the frost fallen on the ground.
举头望明月，(* Z P P Z) I lift my head to gaze at the cliff moon,
低头思故乡。(P P Z Z P) And then bow down to muse on my distant home.
Table 1: An example of Tang poetry. The tone pattern is shown after each line. P represents the level tone, Z represents the downward tone, and * indicates that the tone can be either. The last characters of the second and fourth lines (霜 and 乡) belong to the same rhyme category.

The rest of this paper is organized as follows. Section 2 describes previous work on poetry generation and compares our work with previous methods. Section 3 describes our planning-based poetry generation framework. We introduce the datasets and experimental results in Section 4. Section 5 concludes the paper.

2 Related Work

Poetry generation is a challenging task in NLP. Oliveira et al. [oliveira2009automatic, oliveira2012poetryme, Oliveira2014AdaptingAG] proposed a poem generation method based on semantic and grammar templates. Netzer et al. [netzer2009gaiku] employed a method based on word association measures. Tosa et al. [tosa2008hitch] and Wu et al. [wu2009new] used a phrase search approach for Japanese poem generation. Greene et al. [Greene2010AutomaticAO] applied statistical methods to analyze, generate and translate rhythmic poetry. Colton et al. [Colton2012FullFACEPG] described a corpus-based poetry generation system that uses templates to construct poems according to the given constraints. Yan et al. [yan2013poet] considered poetry generation as an optimization problem within a summarization framework with several constraints. Manurung [manurung2004evolutionary, manurung2012using] and Zhou et al. [zhou2010genetic] used genetic algorithms for generating poems. An important approach to poem generation is based on statistical machine translation (SMT). Jiang and Zhou [jiang2008generating] used an SMT-based model to generate Chinese couplets, which can be regarded as simplified regulated verses with only two lines: the first line is treated as the source language and translated into the second line. He et al. [he2012generating] extended this method to generate quatrains by translating the previous line into the next line sequentially.

Recently, deep learning methods have achieved great success in poem generation. Zhang et al. [zhang2014chinese] proposed a quatrain generation model based on recurrent neural networks (RNN). The approach generates the first line from the given keywords with a recurrent neural network language model (RNNLM) [mikolov2010recurrent], and then the subsequent lines are generated sequentially by accumulating the status of the lines generated so far. Wang et al. [Wang2016ChineseSI] generated Chinese Song iambics using an end-to-end neural machine translation model, in which the iambic is generated by translating the previous line into the next line sequentially. This procedure is similar to SMT, but the semantic relevance between sentences is better. However, [Wang2016ChineseSI] did not consider the generation of the first line: the first line must be provided by the user and must already be a well-written sentence of the poem. Yi et al. [yi2016generating] extended this approach to generate Chinese quatrains, resolving the problem of generating the first line with a separate neural machine translation (NMT) model that takes one keyword as input and translates it into the first line. Ghazvininejad et al. [Marjan2016topical] proposed a poetry generation algorithm that first generates the rhyme words related to the given keyword and then generates the whole poem according to the rhyme words with an encoder-decoder model [Sutskever2014].

Our work differs from the previous methods as follows. First, we do not constrain the user's input: it can be keywords, phrases, sentences or even documents, whereas previous methods can only accept a few keywords or require the user to provide the first line. Second, we use a planning-based method to determine the topic of the poem according to the user's input, with each line having one specific sub-topic. This guarantees that the generated poem is coherent and well organized, and avoids the problem of previous methods in which only the first line is guaranteed to be related to the user's intent while the following lines may drift away from it due to coherence decay [he2012generating, zhang2014chinese, Wang2016ChineseSI, yi2016generating]. Third, the rhythm or tone in [zhou2010genetic, yan2013poet, zhang2014chinese, yi2016generating, Marjan2016topical] is controlled by rules or extra structures, while our model automatically learns these constraints from the training corpus. Finally, our poem generation model has a simpler structure than those in [zhang2014chinese, yi2016generating].

3 Approaches

Figure 1: Illustration of the planning based poetry generation framework.

3.1 Overview

Inspired by the observation that a human poet usually makes an outline before writing a poem, we propose a planning-based poetry generation approach (PPG) that first generates an outline according to the user's writing intent and then generates the poem. Our PPG system takes the user's writing intent as input, which can be a word, a sentence or a document, and generates a poem in two stages: Poem Planning and Poem Generation. The two-stage procedure of PPG is illustrated in Figure 1.

Suppose we are writing a poem that consists of $N$ lines, with $l_i$ representing the $i$-th line of the poem. In the Poem Planning stage, the input query is transformed into $N$ keywords $\{k_1, k_2, \dots, k_N\}$, where $k_i$ is the $i$-th keyword and represents the sub-topic for the $i$-th line. In the Poem Generation stage, $l_i$ is generated by taking $k_i$ and $x_{1:i-1}$ as input, where $x_{1:i-1}$ is the sequence obtained by concatenating all the lines generated previously, from $l_1$ to $l_{i-1}$. In this way the poem is generated sequentially, and each line is generated according to one sub-topic and all the preceding lines.
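To make the two-stage procedure concrete, the following is a minimal sketch of the overall pipeline. The functions plan_keywords and generate_line are hypothetical placeholders for the Poem Planning model (Section 3.2) and the Poem Generation model (Section 3.3); they are not part of the original system's code.

```python
def generate_poem(query, plan_keywords, generate_line, num_lines=4):
    """Two-stage poem generation sketch (hypothetical interfaces).

    plan_keywords(query, n) -> [k_1, ..., k_n]   (Poem Planning)
    generate_line(k_i, preceding_text) -> l_i    (Poem Generation)
    """
    # Stage 1: decompose the writing intent into one sub-topic per line.
    keywords = plan_keywords(query, num_lines)

    # Stage 2: realize the lines sequentially; each line is conditioned on
    # its keyword k_i and the concatenation x_{1:i-1} of all previous lines.
    lines = []
    for k_i in keywords:
        preceding_text = "".join(lines)
        lines.append(generate_line(k_i, preceding_text))
    return lines
```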

3.2 Poem Planning

3.2.1 Keyword Extraction

The user's writing intent can be represented as a sequence of words. In the Poem Planning stage, we assume that the number of keywords extracted from the input query equals the number of lines in the poem, so that each line takes exactly one keyword as its sub-topic. If the user's input query is too long, we extract the most important words and keep their original order as the keyword sequence to satisfy this requirement.

We use the TextRank algorithm [Mihalcea2004TextRankBO] to evaluate the importance of words. It is a graph-based ranking algorithm based on PageRank [Brin1998TheAO]. Each candidate word is represented by a vertex in the graph, and edges are added between two words according to their co-occurrence; the edge weight is set according to the co-occurrence count of the two words. The TextRank score of each vertex is initialized to a default value (e.g. 1.0) and computed iteratively until convergence according to the following equation:

$$TR(V_i) = (1 - d) + d \cdot \sum_{V_j \in E(V_i)} \frac{w_{ji}}{\sum_{V_k \in E(V_j)} w_{jk}} \, TR(V_j) \qquad (1)$$

where $w_{ji}$ is the weight of the edge between nodes $V_j$ and $V_i$, $E(V_i)$ is the set of vertices connected with $V_i$, and $d$ is a damping factor usually set to 0.85 [Brin1998TheAO]; the initial score $TR(V_i)$ of each vertex is set to 1.0.
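For illustration, a minimal, self-contained version of this ranking step could look like the sketch below. The window size and convergence settings are assumptions made for the example, not the parameters reported in the paper.

```python
from collections import defaultdict

def textrank(words, window=2, d=0.85, iters=50, tol=1e-6):
    """Minimal TextRank over a word co-occurrence graph (illustrative only)."""
    # Build an undirected weighted graph from co-occurrence counts.
    weight = defaultdict(float)
    neighbors = defaultdict(set)
    for i, u in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            v = words[j]
            if u == v:
                continue
            weight[(u, v)] += 1.0
            weight[(v, u)] += 1.0
            neighbors[u].add(v)
            neighbors[v].add(u)
    if not neighbors:
        return {}

    # Iterate TR(V_i) = (1 - d) + d * sum_j [w_ji / sum_k w_jk] * TR(V_j).
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        new_score = {}
        for node in neighbors:
            total = 0.0
            for nb in neighbors[node]:
                out_sum = sum(weight[(nb, k)] for k in neighbors[nb])
                total += weight[(nb, node)] / out_sum * score[nb]
            new_score[node] = (1.0 - d) + d * total
        converged = max(abs(new_score[w] - score[w]) for w in score) < tol
        score = new_score
        if converged:
            break
    return score
```

The highest-scoring words, kept in their original order of appearance in the query, would then form the keyword sequence.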

3.2.2 Keyword Expansion

If the user's input query is too short to extract enough keywords, we need to expand new keywords until the required number of keywords is reached. We use two different methods for keyword expansion.

RNNLM-based method. We use a Recurrent Neural Network Language Model (RNNLM) [mikolov2010recurrent] to predict the subsequent keywords according to the preceding sequence of keywords, i.e. to model $P(k_i \mid k_{1:i-1})$, where $k_i$ is the $i$-th keyword and $k_{1:i-1}$ is the preceding keyword sequence.
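A hedged sketch of how such a model could be used for expansion is shown below; rnnlm is a hypothetical object exposing a score(history, candidate) method that returns $\log P(k \mid k_{1:i-1})$, and is not an actual interface from the paper.

```python
def expand_keywords_rnnlm(keywords, target_count, rnnlm, keyword_vocab):
    """Greedily append the most probable next sub-topic keyword until the
    sequence has one keyword per poem line (rnnlm is a placeholder model)."""
    keywords = list(keywords)
    while len(keywords) < target_count:
        # argmax_k P(k | k_1, ..., k_{i-1}) over the keyword vocabulary.
        next_kw = max(keyword_vocab, key=lambda k: rnnlm.score(keywords, k))
        keywords.append(next_kw)
    return keywords
```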

The training of the RNNLM needs a training set consisting of keyword sequences extracted from poems, with one keyword representing the sub-topic of one line. We automatically generate this training corpus from the collected poems. Specifically, given a poem consisting of $N$ lines, we first rank the words in each line according to the TextRank scores computed on the poem corpus. Then the word with the highest TextRank score is selected as the keyword for that line. In this way, we can extract a keyword sequence for every poem and generate a training corpus for the RNNLM-based keyword prediction model.
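The corpus construction step can be sketched as follows, assuming a precomputed textrank_score dictionary over the poem vocabulary and a segment() function that splits a line into words; both are illustrative placeholders rather than the authors' actual tooling.

```python
def extract_keyword_sequence(poem_lines, textrank_score, segment):
    """Select the highest-TextRank word of each line as that line's sub-topic."""
    keywords = []
    for line in poem_lines:
        words = segment(line)  # split the line into candidate words
        best = max(words, key=lambda w: textrank_score.get(w, 0.0))
        keywords.append(best)
    return keywords  # one keyword per line; such sequences train the RNNLM
```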

Knowledge-based method. The above RNNLM-based method is only suitable for generating sub-topics that are covered by the collected poems. It does not work when the user's query contains out-of-domain keywords, for example, a named entity not covered by the training corpus.

To solve this problem, we propose a knowledge-based method that employs extra sources of knowledge to generate sub-topics. Extra knowledge sources that can be used include encyclopedias, suggestions from search engines, lexical databases (e.g. WordNet), etc. Given a keyword $k$, the key idea of the method is to find words that best describe or interpret $k$. In this paper, we use encyclopedia entries as the source of knowledge to expand new keywords from $k$. We retrieve the words satisfying all of the following conditions as candidate keywords: (1) the word appears within a window of fixed size around $k$; (2) the part-of-speech of the word is adjective or noun; (3) the word is covered by the vocabulary of the poem corpus. Then the candidate words with the highest TextRank scores are selected as keywords.
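A sketch of this candidate filtering is given below. The window size, the coarse part-of-speech tags, and the pre-tokenized, pre-tagged encyclopedia entry are assumptions made for the example; the paper does not commit to these exact values.

```python
def expand_from_encyclopedia(keyword, entry_tokens, pos_tags, poem_vocab,
                             textrank_score, window=5, top_n=1):
    """Collect nearby adjectives/nouns that also occur in the poem vocabulary,
    then keep those with the highest TextRank score (illustrative sketch)."""
    candidates = set()
    for i, tok in enumerate(entry_tokens):
        if tok != keyword:
            continue
        lo, hi = max(0, i - window), min(len(entry_tokens), i + window + 1)
        for j in range(lo, hi):
            w, tag = entry_tokens[j], pos_tags[j]
            if w != keyword and tag in ("ADJ", "NOUN") and w in poem_vocab:
                candidates.add(w)
    ranked = sorted(candidates, key=lambda w: textrank_score.get(w, 0.0),
                    reverse=True)
    return ranked[:top_n]
```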

3.3 Poem Generation

Figure 2: An illustration of the poem generation model.

In the Poem Generation stage, the poem is generated line by line. Each line is generated by taking the keyword specified by the Poem Planning model and all the preceding text of the poem as input. This procedure can be considered as a sequence-to-sequence mapping problem, with the slight difference that the input consists of two different kinds of sequences: the keyword specified by the Poem Planning model and the previously generated text of the poem. We modify the framework of the attention-based RNN encoder-decoder (RNN enc-dec) [bahdanau2014neural] to support multiple sequences as input.

Given a keyword $k_i$ which has $T_k$ characters, i.e. $k_i = \{a_1, a_2, \dots, a_{T_k}\}$, and the preceding text $x_{1:i-1}$ which has $T_x$ characters, i.e. $x_{1:i-1} = \{x_1, x_2, \dots, x_{T_x}\}$, we first encode $k_i$ into a sequence of hidden states $[\mathbf{a}_1, \dots, \mathbf{a}_{T_k}]$, and $x_{1:i-1}$ into $[\mathbf{h}_1, \dots, \mathbf{h}_{T_x}]$, with bi-directional Gated Recurrent Unit (GRU) [cho2014learning] models. Then we integrate $[\mathbf{a}_1, \dots, \mathbf{a}_{T_k}]$ into a single vector $\mathbf{h}_0$ by concatenating the last forward state and the first backward state, where

$$\mathbf{h}_0 = [\overrightarrow{\mathbf{a}}_{T_k}; \overleftarrow{\mathbf{a}}_1] \qquad (2)$$

We set $\mathbf{h} = \{\mathbf{h}_0, \mathbf{h}_1, \dots, \mathbf{h}_{T_x}\}$; then the sequence of vectors $\mathbf{h}$ represents the semantics of both $k_i$ and $x_{1:i-1}$, as illustrated in Figure 2. Notice that when we are generating the first line, the length of the preceding text is zero, i.e. $T_x = 0$, so the vector sequence contains only one vector, i.e. $\mathbf{h} = \{\mathbf{h}_0\}$; therefore, the first line is actually generated from the first keyword alone.
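A minimal PyTorch-style sketch of this encoder is given below; the layer sizes, single-layer bi-directional GRUs and character embeddings are illustrative assumptions rather than the configuration used in the paper.

```python
import torch
import torch.nn as nn

class KeywordTextEncoder(nn.Module):
    """Encode the keyword k_i and the preceding text x_{1:i-1} with two
    bi-directional GRUs and prepend the keyword summary h_0 (illustrative)."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.keyword_gru = nn.GRU(emb_dim, hidden_dim, bidirectional=True,
                                  batch_first=True)
        self.text_gru = nn.GRU(emb_dim, hidden_dim, bidirectional=True,
                               batch_first=True)

    def forward(self, keyword_ids, text_ids):
        # keyword_ids: (batch, T_k); text_ids: (batch, T_x), where T_x may be 0.
        a, _ = self.keyword_gru(self.embed(keyword_ids))      # (batch, T_k, 2H)
        half = a.size(2) // 2
        # h_0 = [last forward state ; first backward state] of the keyword.
        h0 = torch.cat([a[:, -1, :half], a[:, 0, half:]], dim=-1).unsqueeze(1)
        if text_ids.size(1) == 0:
            return h0                                         # first line: only h_0
        h, _ = self.text_gru(self.embed(text_ids))            # (batch, T_x, 2H)
        return torch.cat([h0, h], dim=1)                      # (batch, 1 + T_x, 2H)
```

The returned sequence plays the role of $\mathbf{h} = \{\mathbf{h}_0, \mathbf{h}_1, \dots, \mathbf{h}_{T_x}\}$ consumed by the attention-based decoder described next.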

For the decoder, we use another GRU which maintains an internal status vector $\mathbf{s}_t$. For each generation step $t$, the most probable output $y_t$ is generated based on $\mathbf{s}_t$, the context vector $\mathbf{c}_t$ and the previously generated output $y_{t-1}$. This can be formulated as follows:

$$y_t = \arg\max_{y} P(y \mid y_{t-1}, \mathbf{s}_t, \mathbf{c}_t) \qquad (3)$$

After each prediction, $\mathbf{s}_t$ is updated by

$$\mathbf{s}_t = f(\mathbf{s}_{t-1}, y_{t-1}, \mathbf{c}_t) \qquad (4)$$

where $f$ is the activation function of the GRU, and $\mathbf{c}_t$ is recomputed at each step by the alignment model:

$$\mathbf{c}_t = \sum_{j=0}^{T_x} \alpha_{tj} \mathbf{h}_j \qquad (5)$$

where $\mathbf{h}_j$ is the $j$-th hidden state in the encoder's output. The weight $\alpha_{tj}$ is computed by

$$\alpha_{tj} = \frac{\exp(e_{tj})}{\sum_{k=0}^{T_x} \exp(e_{tk})} \qquad (6)$$

where

$$e_{tj} = \mathbf{v}_a^{\top} \tanh(\mathbf{W}_a \mathbf{s}_{t-1} + \mathbf{U}_a \mathbf{h}_j) \qquad (7)$$

is the attention score on $\mathbf{h}_j$ at time step $t$. The probability of the next word $y_t$ can then be defined as:

$$P(y_t \mid y_1, \dots, y_{t-1}, k_i, x_{1:i-1}) = g(y_{t-1}, \mathbf{s}_t, \mathbf{c}_t) \qquad (8)$$

where $g$ is a nonlinear function that outputs the probability of $y_t$.
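The decoder step with additive (Bahdanau-style) attention, as formulated in Eqs. (3) to (8), can be sketched as follows; the dimensions are illustrative, and the output layer here depends only on $\mathbf{s}_t$, which is a simplification of the nonlinear function $g$.

```python
import torch
import torch.nn as nn

class AttentionDecoder(nn.Module):
    """Single-step GRU decoder with additive attention over the encoder
    states h_0, ..., h_Tx (an illustrative sketch of Eqs. (3)-(8))."""

    def __init__(self, vocab_size, emb_dim=128, enc_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.GRUCell(emb_dim + enc_dim, hidden_dim)
        self.W_a = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.U_a = nn.Linear(enc_dim, hidden_dim, bias=False)
        self.v_a = nn.Linear(hidden_dim, 1, bias=False)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, y_prev, s_prev, enc_states):
        # y_prev: (batch,), s_prev: (batch, hidden), enc_states: (batch, L, enc_dim)
        # e_tj = v_a^T tanh(W_a s_{t-1} + U_a h_j)                     (Eq. 7)
        e = self.v_a(torch.tanh(self.W_a(s_prev).unsqueeze(1)
                                + self.U_a(enc_states))).squeeze(-1)
        alpha = torch.softmax(e, dim=-1)                               # (Eq. 6)
        c_t = torch.bmm(alpha.unsqueeze(1), enc_states).squeeze(1)     # (Eq. 5)
        # s_t = f(s_{t-1}, y_{t-1}, c_t)                               (Eq. 4)
        s_t = self.cell(torch.cat([self.embed(y_prev), c_t], dim=-1), s_prev)
        # Output distribution; here g is simplified to depend on s_t.  (Eq. 8)
        logits = self.out(s_t)
        return logits, s_t
```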

The parameters of the poem generation model are trained to maximize the log-likelihood of the training corpus:

$$\theta^{*} = \arg\max_{\theta} \sum_{n} \log P(y^{(n)} \mid x^{(n)}, k^{(n)}; \theta) \qquad (9)$$

where the sum runs over all (keyword, preceding text, line) training triples and $\theta$ denotes the model parameters.
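In practice, maximizing this log-likelihood with teacher forcing amounts to minimizing the per-character cross-entropy of the decoder outputs against the gold lines. A minimal sketch, assuming padded batches, is shown below; it is one standard realization of the objective, not necessarily the authors' exact training setup.

```python
import torch.nn as nn

def sequence_nll(logits, targets, pad_id=0):
    """Negative log-likelihood of a target line under the decoder outputs.

    logits:  (batch, T, vocab) scores from the decoder at each step
    targets: (batch, T) gold character ids; pad_id positions are ignored
    Minimizing this loss maximizes the corpus log-likelihood in Eq. (9).
    """
    loss_fn = nn.CrossEntropyLoss(ignore_index=pad_id, reduction="sum")
    return loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```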