With the help of conditioning mechanisms, the state-of-the-art diffusion...
At present, backdoor attacks attract attention as they do great harm to ...
3D photography renders a static image into a video with appealing 3D vis...
Existing zero-shot cross-lingual transfer methods rely on parallel corpo...
Previous studies have proved that cross-lingual knowledge distillation c...
In this paper, we present NUWA-Infinity, a generative model for infinite...
This paper presents a unified multimodal pre-trained model called NÜWA t...
The task of video-based commonsense captioning aims to generate event-wi...
We propose Unicoder-VL, a universal encoder that aims to learn joint rep...