We present Kosmos-2.5, a multimodal literate model for machine reading o...
Current state-of-the-art models for natural language understanding requi...
We develop a diffusion-based approach for various document layout sequen...
The surge of pre-training has witnessed the rapid development of documen...
Image Transformer has recently achieved significant progress for natural...
We study the problem of recognizing structured text, i.e., text that foll...
Text recognition is a long-standing research problem for document digita...
Multimodal pre-training with text, layout, and image has achieved SOTA p...
Pre-training of text and layout has proved effective in a variety of vis...
In this paper, we propose Text-Aware Pre-training (TAP) for Text-VQA and...
Active speaker detection (ASD) and virtual cinematography (VC) can signi...
Fine-tuning through knowledge transfer from a pre-trained model on a lar...
Filter pruning has been shown to be effective for learning resource-constrain...
A well-trained Convolutional Neural Network can easily be pruned without...
Resource-efficient convolutional neural networks enable not only the intel...
Identity transformations, used as skip-connections in residual networks,...
Crowdsourcing has become a widely adopted scheme to collect ground trut...