Grounded Situation Recognition (GSR), i.e., recognizing the salient acti...
We present MMOCR, an open-source toolbox that provides a comprehensive p...
Transformers with powerful global relation modeling abilities have been ...
Human vision is able to capture the part-whole hierarchical information ...
Key information extraction from document images is of paramount importan...
Scene graph generation aims to produce structured representations for im...
The attention-based encoder-decoder framework has recently achieved impr...
Large geometry (e.g., orientation) variances are the key challenges in t...