LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

11/03/2021
by Christoph Schuhmann, et al.

Multi-modal language-vision models trained on hundreds of millions of image-text pairs (e.g. CLIP, DALL-E) have seen a recent surge of interest, showing remarkable capability to perform zero- or few-shot learning and transfer even in the absence of per-sample labels on the target image data. Despite this trend, to date there have been no publicly available datasets of sufficient scale for training such models from scratch. To address this issue, in a community effort we build and release to the public LAION-400M, a dataset of 400 million CLIP-filtered image-text pairs, together with their CLIP embeddings and kNN indices that allow efficient similarity search.
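As a rough illustration of the CLIP-filtering step mentioned in the abstract, the sketch below scores a candidate image-text pair by the cosine similarity of its CLIP image and text embeddings and keeps only pairs above a threshold. The Hugging Face model name, the helper functions, and the 0.3 cutoff are illustrative assumptions, not the authors' exact pipeline.

# Minimal sketch of CLIP-based filtering of image-text pairs (assumed setup).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # assumed model choice
model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def clip_similarity(image: Image.Image, caption: str) -> float:
    """Cosine similarity between CLIP embeddings of an image and its caption."""
    inputs = processor(text=[caption], images=[image],
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    # Normalize embeddings so the dot product equals cosine similarity.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return float((img_emb @ txt_emb.T).item())

THRESHOLD = 0.3  # assumed filtering cutoff for illustration

def keep_pair(image: Image.Image, caption: str) -> bool:
    """Keep a crawled image-text pair only if its CLIP similarity clears the threshold."""
    return clip_similarity(image, caption) >= THRESHOLD

In the released dataset, the precomputed CLIP embeddings and kNN indices let users run this kind of similarity search over the 400 million pairs without re-encoding the images themselves.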

Related research

10/16/2022  LAION-5B: An open large-scale dataset for training next generation image-text models
Groundbreaking language-vision architectures like CLIP and DALL-E proved...

04/18/2021  Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation
Traditional computer vision models are trained to predict a fixed set of...

03/22/2022  WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models
Compared with the domain-specific model, the vision-language pre-trainin...

06/21/2023  OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Large multimodal models trained on natural documents, which interleave i...

01/21/2023  MTTN: Multi-Pair Text to Text Narratives for Prompt Generation
The increased interest in diffusion models has opened up opportunities f...

06/20/2023  Quilt-1M: One Million Image-Text Pairs for Histopathology
Recent accelerations in multi-modal applications have been made possible...

01/04/2019  MultiDEC: Multi-Modal Clustering of Image-Caption Pairs
In this paper, we propose a method for clustering image-caption pairs by...
