Solo-learn: A Library of Self-supervised Methods for Visual Representation Learning

This paper presents solo-learn, a library of self-supervised methods for visual representation learning. Implemented in Python, using PyTorch and PyTorch Lightning, the library fits both research and industry needs by featuring distributed training pipelines with mixed precision, faster data loading via Nvidia DALI, online linear evaluation for better prototyping, and many additional training tricks. Our goal is to provide an easy-to-use library comprising a large number of Self-supervised Learning (SSL) methods that can be easily extended and fine-tuned by the community. solo-learn opens up avenues for exploiting large-budget SSL solutions on inexpensive smaller infrastructures and seeks to democratize SSL by making it accessible to all. The source code is available at




1 Introduction

Deep networks trained with large annotated datasets have shown stunning capabilities in the context of computer vision. However, the need for human supervision is a strong limiting factor. Unsupervised learning aims to mitigate this issue by training models from unlabeled datasets. The most prominent paradigm for unsupervised visual representation learning is Self-supervised Learning (SSL), where the intrinsic structure of the data provides supervision for the model. Recently, the scientific community has devised increasingly effective SSL methods that match or surpass the performance of supervised methods. Nonetheless, implementing and reproducing such works turns out to be complicated. Official repositories of state-of-the-art SSL methods have very heterogeneous implementations or no implementation at all. Although a few SSL libraries (Goyal et al., 2021; Susmelj et al., 2020) are available, they either assume access to larger-scale infrastructure or lack some recent methods. When approaching SSL, it is hard to find a platform for experiments that allows running all current state-of-the-art methods with low engineering effort and that is, at the same time, effective and straightforward to train. This is especially problematic because, while SSL methods seem simple on paper, replicating published results can demand substantial time and effort from researchers. Official implementations of SSL methods are sometimes available; however, releasing standalone packages (often incompatible with each other) is not sufficient for the fast-paced progress in research and emerging real-world applications. There is no toolbox offering a genuine off-the-shelf catalog of state-of-the-art SSL techniques that is computationally efficient, which is essential for in-the-wild experimentation.

To address these problems, we present solo-learn, an open-source framework that provides standardized implementations for a large number of state-of-the-art SSL methods. We believe solo-learn will enable a trustworthy and reproducible comparison between state-of-the-art methods. The code that powers the library is written in Python, using PyTorch (Paszke et al., 2019) and PyTorch Lightning (Team, 2019) as back-ends and Nvidia DALI for fast data loading, and supports more modern methods than related libraries. The library is highly modular and can be used as a complete pipeline, from training to evaluation, or as standalone modules.

2 The solo-learn Library: An Overview

Currently, we are witnessing an explosion of works on SSL methods for computer vision. Their underlying idea is to learn feature representations without labels by enforcing similar representations across multiple views of the same image while enforcing diverse representations for other images. To give researchers a common testbed for reproducing different results, we present solo-learn, a library of self-supervised methods for visual representation learning. The library is implemented in PyTorch and provides state-of-the-art self-supervised methods, distributed training pipelines with mixed precision, faster data loading via Nvidia DALI, online linear evaluation for better prototyping, and many other training strategies and tricks presented in recent papers or made available via PyTorch Lightning. Our goal is to provide an easy-to-use library that can be easily extended by the community. The additional features also make it much easier for researchers and practitioners to train on smaller infrastructures.
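To make the underlying idea concrete, the following is a minimal, dependency-free sketch of one common way to express it, an InfoNCE-style contrastive loss of the kind used by SimCLR-like methods. It is an illustration only and not the code of solo-learn; the function names, the toy embeddings and the temperature of 0.2 are assumptions made for this example, and several of the methods in the library (e.g., BYOL or Barlow Twins) use different objectives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.2):
    """Average InfoNCE loss; view_a[i] and view_b[i] come from the same image.

    For each anchor view_a[i], view_b[i] is the positive and every other
    view_b[j] is a negative. The temperature value is an assumption.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-softmax evaluated at the positive pair.
        total += log_denominator - logits[i]
    return total / n


# Toy features for two images (hypothetical values).
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9]]    # views agree per image, differ across images
collapsed_b = [[1.0, 1.0], [1.0, 1.0]]  # every image maps to the same feature

loss_aligned = info_nce(view_a, aligned_b)
loss_collapsed = info_nce(view_a, collapsed_b)
```

In this toy setting the loss is small when the two views of each image agree and differ from the other image, and it equals log(2) when all features collapse to the same vector, since the positive can then no longer be told apart from the single negative.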

2.1 Self-supervised Learning Methods

We implemented 12 state-of-the-art methods, namely, Barlow Twins (Zbontar et al., 2021), BYOL (Grill et al., 2020), DINO (Caron et al., 2021b), MoCo V2+ (Chen et al., 2020b), NNCLR (Dwibedi et al., 2021), ReSSL (Zheng et al., 2021), SimCLR (Chen et al., 2020a), Supervised Contrastive Learning (Khosla et al., 2021), SimSiam (Chen and He, 2020), SwAV (Caron et al., 2021a), VICReg (Bardes et al., 2021) and W-MSE (Ermolov et al., 2021).

2.2 Architecture

Figure 1: Overview of solo-learn.

In Figure 1, we present an overview of how a training pipeline with solo-learn is carried out. At the bottom, we show the packages and external data used at each step, while at the top, we show all the defined variables on the left and an example of the newest defined variable on the right. First, the user interacts with solo.args, a subpackage that is responsible for handling all the parameters selected by the user and providing automatic setup. Then, solo.methods interacts with solo.losses to produce the selected self-supervised method. While solo.methods contains all implemented methods, solo.losses contains the loss functions for each method. Afterwards, solo.utils handles external data to produce the pretrain dataloader, and also provides the transformation pipelines, the model checkpointer, automatic UMAP visualization of the features and many other utility functionalities. Lastly, everything is given to a PyTorch Lightning Trainer, which provides hardware support and extra functionality, such as distributed training, automatic logging of results, mixed precision and much more. We note that although we show all subpackages working together, they can be used in a standalone fashion with minor modifications.

2.3 Comparison to Related Libraries

The libraries most related to ours are VISSL (Goyal et al., 2021) and Lightly (Susmelj et al., 2020), which lack some of our key features. First, we support more modern SSL methods, such as BYOL, NNCLR, SimSiam, VICReg and W-MSE, while we lack DeepCluster V2 (Caron et al., 2021a), which they support. Second, we target researchers with fewer resources, namely from 1 to 8 GPUs, allowing much faster data loading via DALI, while still being able to scale thanks to PyTorch Lightning. Lastly, we provide additional utilities, such as automatic linear evaluation, support for custom datasets and automatic generation of UMAP (McInnes et al., 2018) visualizations of the features during training.

3 Experiments


We benchmarked the available SSL methods on CIFAR-10, CIFAR-100 and ImageNet-100 and made the pretrained checkpoints public. For Barlow Twins, BYOL, MoCo V2+, NNCLR, SimCLR and VICReg, hyperparameters were heavily tuned, reaching higher performance than reported in the original papers or in third-party results. Table 1 presents the top-1 and top-5 accuracy values for the online linear evaluation. For ImageNet-100, traditional offline linear evaluation is also presented.
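For readers less familiar with the metrics, the sketch below shows how top-k accuracy (reported as Acc@1 and Acc@5) is commonly computed from per-class scores. It is a plain-Python illustration with hypothetical scores and labels, not the evaluation code of solo-learn; the function name top_k_accuracy is an assumption made for this example.

```python
def top_k_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Hypothetical classifier scores for four samples over three classes.
scores = [
    [0.10, 0.70, 0.20],  # highest score: class 1
    [0.50, 0.30, 0.20],  # highest score: class 0
    [0.20, 0.30, 0.50],  # highest score: class 2
    [0.40, 0.35, 0.25],  # highest score: class 0
]
labels = [1, 1, 2, 2]

acc_at_1 = top_k_accuracy(scores, labels, k=1)  # 50.0
acc_at_2 = top_k_accuracy(scores, labels, k=2)  # 75.0
```

With only three classes the example uses k=2 in place of k=5; the computation is otherwise identical.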

Nvidia DALI vs traditional data loading.

We compared the training speed and memory usage of traditional data loading via PyTorch Vision against data loading with DALI. For consistency, we ran three different methods (Barlow Twins, BYOL and NNCLR) for 20 epochs on ImageNet-100. Table 2 presents these results.

Method         CIFAR-10         CIFAR-100        ImageNet-100
               Acc@1   Acc@5    Acc@1   Acc@5    Acc@1 (offline)   Acc@5 (offline)
Barlow Twins   92.10   99.73    70.90   91.91    80.38 (80.16)     95.28 (95.14)
BYOL           92.58   99.79    70.46   91.96    79.76 (80.16)     94.80 (95.15)
DINO           89.52   99.71    66.76   90.34    74.84 (74.92)     92.92 (92.78)
MoCo V2+       92.94   99.79    69.89   91.65    78.20 (79.28)     95.50 (95.18)
NNCLR          91.88   99.78    69.62   91.52    79.80 (80.16)     95.28 (95.28)
ReSSL          90.63   99.62    65.92   89.73    76.92 (78.48)     94.20 (94.24)
SimCLR         90.74   99.75    65.78   89.04    77.04 (77.48)     94.02 (93.42)
SimSiam        90.51   99.72    66.04   89.62    74.54 (78.72)     93.16 (94.78)
SwAV           89.17   99.68    64.88   88.78    74.04 (74.28)     92.70 (92.84)
VICReg         92.07   99.74    68.54   90.83    79.22 (79.40)     95.06 (95.02)
W-MSE          88.67   99.68    61.33   87.26    67.60 (69.06)     90.94 (91.22)
Table 1: Online linear evaluation accuracy on CIFAR-10, CIFAR-100 and ImageNet-100. In brackets, offline linear evaluation accuracy is also reported for ImageNet-100.

Method         DALI   Total time for 20 epochs   Time for 1 epoch       GPU memory (per GPU)
Barlow Twins   no     1h 38m 27s                 4m 55s                 5097 MB
               yes    43m 2s                     2m 10s (56% faster)    9292 MB
BYOL           no     1h 38m 46s                 4m 56s                 5409 MB
               yes    50m 33s                    2m 31s (49% faster)    9521 MB
NNCLR          no     1h 38m 30s                 4m 55s                 5060 MB
               yes    42m 3s                     2m 6s (64% faster)     9244 MB
Table 2: Speed and memory comparison with and without DALI on ImageNet-100.
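
The table does not state how the "faster" percentages are defined. A plausible reading, assumed here, is the relative reduction in per-epoch time when switching to DALI. The short calculation below applies that assumption to the per-epoch times listed in Table 2; the helper names are hypothetical.

```python
def to_seconds(minutes, seconds):
    """Convert a 'Xm Ys' duration to seconds."""
    return 60 * minutes + seconds


def percent_time_saved(baseline_s, dali_s):
    """Relative reduction in epoch time (assumed meaning of 'faster')."""
    return 100.0 * (baseline_s - dali_s) / baseline_s


# (time without DALI, time with DALI) per epoch, taken from Table 2.
epoch_times = {
    "Barlow Twins": (to_seconds(4, 55), to_seconds(2, 10)),
    "BYOL": (to_seconds(4, 56), to_seconds(2, 31)),
    "NNCLR": (to_seconds(4, 55), to_seconds(2, 6)),
}

savings = {
    method: round(percent_time_saved(baseline, dali))
    for method, (baseline, dali) in epoch_times.items()
}
```

Under this assumption the calculation gives 56% for Barlow Twins and 49% for BYOL, in agreement with the table, and 57% for NNCLR, which differs from the 64% listed; the table may rely on a different definition or on unrounded timings for that entry. The table also shows that the speed gain comes with higher GPU memory usage per GPU.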

4 Conclusion

Here, we presented solo-learn, a library of self-supervised methods for visual representation learning, providing state-of-the-art self-supervised methods in PyTorch. The library supports distributed training using PyTorch Lightning and fast data loading via DALI, and provides many utilities for the end-user, such as online linear evaluation for better prototyping and faster development, many training tricks, and visualization techniques. We are continuously adding newer SSL methods and improving usability, documentation, and tutorials. Finally, we welcome contributors to help us at


References

  • A. Bardes, J. Ponce, and Y. LeCun (2021) VICReg: variance-invariance-covariance regularization for self-supervised learning. arXiv:2105.04906.
  • M. Caron, I. Misra, J. Mairal, P. Goyal, P. Bojanowski, and A. Joulin (2021a) Unsupervised learning of visual features by contrasting cluster assignments. arXiv:2006.09882.
  • M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin (2021b) Emerging properties in self-supervised vision transformers. arXiv:2104.14294.
  • T. Chen, S. Kornblith, M. Norouzi, and G. Hinton (2020a) A simple framework for contrastive learning of visual representations. arXiv:2002.05709.
  • X. Chen, H. Fan, R. Girshick, and K. He (2020b) Improved baselines with momentum contrastive learning. arXiv:2003.04297.
  • X. Chen and K. He (2020) Exploring simple siamese representation learning. arXiv:2011.10566.
  • D. Dwibedi, Y. Aytar, J. Tompson, P. Sermanet, and A. Zisserman (2021) With a little help from my friends: nearest-neighbor contrastive learning of visual representations. arXiv:2104.14548.
  • A. Ermolov, A. Siarohin, E. Sangineto, and N. Sebe (2021) Whitening for self-supervised representation learning. arXiv:2007.06346.
  • P. Goyal, Q. Duval, J. Reizenstein, M. Leavitt, M. Xu, B. Lefaudeux, M. Singh, V. Reis, M. Caron, P. Bojanowski, A. Joulin, and I. Misra (2021) VISSL.
  • J. Grill, F. Strub, F. Altché, C. Tallec, P. H. Richemond, E. Buchatskaya, C. Doersch, B. A. Pires, Z. D. Guo, M. G. Azar, B. Piot, K. Kavukcuoglu, R. Munos, and M. Valko (2020) Bootstrap your own latent: a new approach to self-supervised learning. arXiv:2006.07733.
  • P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y. Tian, P. Isola, A. Maschinot, C. Liu, and D. Krishnan (2021) Supervised contrastive learning. arXiv:2004.11362.
  • L. McInnes, J. Healy, and J. Melville (2018) UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. ArXiv e-prints. arXiv:1802.03426.
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala (2019) PyTorch: an imperative style, high-performance deep learning library. In NeurIPS.
  • I. Susmelj, M. Heller, P. Wirth, J. Prescott, and M. E. et al. (2020) Lightly. GitHub.
  • P. L. D. Team (2019) PyTorch lightning. GitHub.
  • J. Zbontar, L. Jing, I. Misra, Y. LeCun, and S. Deny (2021) Barlow twins: self-supervised learning via redundancy reduction. arXiv:2103.03230.
  • M. Zheng, S. You, F. Wang, C. Qian, C. Zhang, X. Wang, and C. Xu (2021) ReSSL: relational self-supervised learning with weak augmentation. arXiv:2107.09282.