Generating videos from text is a challenging task due to its high comput...
Recent neural models for image captioning usually employ an encoder-dec...
Attention mechanisms are widely used in current encoder/decoder framewor...
Exploring fine-grained relationships between entities (e.g. objects in ima...