We present Kosmos-2.5, a multimodal literate model for machine reading o...
Scaling sequence length has become a critical demand in the era of large...
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enablin...
A big convergence of language, multimodal perception, action, and world...
Large Transformers have achieved state-of-the-art performance across man...
A big convergence of model architectures across language, vision, speech...
Payment channels are a class of techniques designed to solve the scalabili...
A big convergence of language, vision, and multimodal pretraining is eme...
Foundation models have received much attention due to their effectivenes...
We introduce a vision-language foundation model called VL-BEiT, which is...
Recording fast motion in a high FPS (frame-per-second) requires expensiv...
Knowledge distillation (KD) methods compress large models into smaller s...
We propose a cross-modal attention distillation framework to train a dua...
We present a unified Vision-Language pretrained Model (VLMo) that jointl...
Pretrained bidirectional Transformers, such as BERT, have achieved signi...
Large pre-trained models have achieved great success in many natural lan...
Fine-tuning pre-trained cross-lingual language models can transfer task-...
We generalize deep self-attention distillation in MiniLM (Wang et al., 2...
In this work, we formulate cross-lingual language model pre-training as...
Question Answering (QA) has shown great success thanks to the availabili...
Neuromorphic data, recording frameless spike events, have attracted cons...
We propose to pre-train a unified language model for both autoencoding a...
Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its va...
In this work we focus on transferring supervision signals of natural lan...
Machine reading comprehension with unanswerable questions is a challengi...
This paper presents a new Unified pre-trained Language Model (UniLM) tha...