Visual storytelling aims to generate a narrative based on a sequence of
...
Temporal language grounding (TLG) aims to localize a video segment in an...
In document image rectification, there exist rich geometric constraints
...
In this work, we propose a new framework, called Document Image Transfor...
Hand gestures play a critical role in sign language. Current
deep-le...
Temporal language grounding (TLG) is a fundamental and challenging probl...
Pre-training text representations has recently been shown to significant...