Image captioning is conventionally formulated as the task of generating ...
We present FIT: a transformer-based architecture with efficient self-att...
Panoptic segmentation assigns semantic and instance ID labels to every p...
While language tasks are naturally expressed in a single, unified, model...
We present Imagen, a text-to-image diffusion model with an unprecedented...
This paper presents Pix2Seq, a simple and generic framework for object d...
Contrastive loss and its variants have become very popular recently for ...
The Insertion Transformer is well suited for long form text generation d...
Increasing the batch size is a popular way to speed up neural network tr...