Flood Segmentation on Sentinel-1 SAR Imagery with Semi-Supervised Learning

07/18/2021 ∙ by Sayak Paul, et al.

Floods wreak havoc throughout the world, causing billions of dollars in damages and uprooting communities, ecosystems, and economies. Accurate and robust flood detection, including delineating open water flood areas and identifying flood levels, can aid in disaster response and mitigation. Estimating flood levels remotely is essential, as physical access to flooded areas is limited and deploying instruments in potential flood zones can be dangerous. Aligning flood extent mapping with local topography can provide a plan of action that the disaster response team can consider. Thus, remote flood level estimation via satellites like Sentinel-1 can prove to be remedial. The Emerging Techniques in Computational Intelligence (ETCI) competition on Flood Detection tasked participants with predicting flooded pixels after training with synthetic aperture radar (SAR) images in a supervised setting. We use a cyclical approach involving two stages: (1) training an ensemble model of multiple UNet architectures with the available high and low confidence labeled data, and (2) generating pseudo labels, or low confidence labels, on the unlabeled test dataset, and then combining the generated labels with the previously available high confidence labeled dataset. This assimilated dataset is used for the next round of training ensemble models. This cyclical process is repeated until the performance improvement plateaus. Additionally, we post process our results with Conditional Random Fields. Our approach achieves 0.7654 IoU on the public leaderboard for the ETCI competition. We release all of our code, including trained models, on GitHub so that our method can also serve as an open science benchmark for the released Sentinel-1 dataset. To the best of our knowledge, this is among the first works to apply semi-supervised learning to improve flood segmentation models.



1 Introduction

The impact of floods is widespread, as a large proportion of the world’s population (40%) lives in close proximity (within 100 km) of a coast (https://nasadaacs.eos.nasa.gov/learn/toolkits/disasters-toolkit/floods-toolkit). Flooding events are on the rise due to climate change, rising sea levels, and increasingly extreme weather events. Additionally, the United Nations has included effective response and proactive risk assessment for disasters like flood events in its Sustainable Development Goals. Scientists and decision makers can use live Earth observation data from satellites like Sentinel-1 and MODIS, together with ground-based data such as precipitation, runoff, soil moisture, snow cover and snow water equivalent, topography, and land surface reflectance, not only to develop real-time response and mitigation tactics but also to understand flooding events and an area’s predisposition to flooding. Usually a combination of satellite and ground-based data is utilized for complete spatial and temporal coverage. Projects like NASA’s Precipitation Measurement Mission, Tropical Rainfall Measuring Mission, Global Precipitation Measurement mission, Soil Moisture Active Passive satellites, and LDAS provide data to determine open water flood areas and flood levels.

The Emerging Techniques in Computational Intelligence (ETCI) 2021 competition on Flood Detection aims to “promote innovation in the detection of flood events and water bodies, as well as to provide objective and fair comparisons among methods,” and is supported by NASA’s Interagency Implementation and Advanced Concepts Team (IMPACT), the Institute of Electrical and Electronics Engineers (IEEE) Geoscience and Remote Sensing Society (GRSS), and the Earth Science Informatics Technical Committee (ESI TC). The competition provides SAR Sentinel-1 imagery with labeled pixels for a geographic area prior to and after a flood event. Participants are tasked with semantic segmentation of flooded pixels, and submissions are evaluated using the standard Intersection over Union (IoU) metric.

We explore the competition from a limited domain knowledge perspective and aim to develop an understanding of the complex field of flood segmentation. In an effort to promote open science and cross-collaboration, we release all of our code, including trained models. Additionally, we aim to benchmark the inference pipeline and show that it can run in real time, aiding disaster mitigation efforts. To the best of our knowledge, this is among the first works to apply semi-supervised learning to improve flood segmentation models.

Figure 1: Raw data provided with the competition, with the corresponding VV and VH GeoTIFF files and the state of the water body before and after the flood. The composite RGB (colour) image is generated following the ESA Polarimetry guidelines, with the VV channel mapped to red, the VH channel to green, and their ratio to blue. All images are a random sample from the training set. We note the visible grains in different directions in the Bangladesh imagery, potentially due to recently harvested agricultural fields. North Alabama shows various artifacts, including potential swath gaps due to differences in satellite coverage, while the RGB color range in Nebraska is unique. The North Alabama image with swath gaps is kept because at least some positive ground truth (after flood) is available.

2 Data

The contest dataset consists of 66k tiled images from various geographic locations including Nebraska, North Alabama, Bangladesh, Red River North, and Florence. The training split has even distributions of Nebraska and North Alabama with approximately one-third of the distribution from Bangladesh. The validation dataset released for phase two is primarily Florence, while the test dataset is primarily the Red River North region. The three RGB channels of each tile are generated from its corresponding VV and VH GeoTIFF files (see raw images in Figure 1), obtained via the Hybrid Pluggable Processing Pipeline (“hyp3”) from Sentinel-1 C-band SAR imagery. Each tile also comes with a timestamp indicating when the snapshot was taken, which is useful in inferring the progress of flooding events. The data also contains swath gaps where, in some cases, less than 0.5% of the pixels in an image are present. Such images do not contribute substantial information and as such are removed.

Our processing pipeline includes combining the image channels to produce an RGB composite image (Figure 1) following the ESA Polarimetry guidelines (https://sentinel.esa.int/web/sentinel/user-guides/sentinel-1-sar/product-overview/polarimetry). On observing the RGB composites, some images are completely white (Figure 2); these are treated as noise and discarded from the pipeline. These white images are common when the VV and VH images do not align and have been confirmed as noise through communication with the competition organizers.
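For illustration, a minimal sketch of building such a composite from a pair of VV/VH GeoTIFFs is shown below; the rasterio-based loading and the clipping constants are our assumptions for readability, not the exact preprocessing used in the competition pipeline.

```python
import numpy as np
import rasterio

def rgb_composite(vv_path, vh_path):
    """Minimal sketch (assumptions noted above): VV -> red, VH -> green,
    VV/VH ratio -> blue, all clipped to [0, 1]."""
    with rasterio.open(vv_path) as src:
        vv = src.read(1).astype(np.float32)
    with rasterio.open(vh_path) as src:
        vh = src.read(1).astype(np.float32)
    eps = 1e-6
    ratio = np.clip(vv / (vh + eps), 0.0, 1.0)
    red, green = np.clip(vv, 0.0, 1.0), np.clip(vh, 0.0, 1.0)
    return np.dstack([red, green, ratio])  # (H, W, 3) composite
```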

Figure 2: Noisy images, either due to swath gaps or completely empty tiles that occur when the VV and VH images do not align, are filtered out. Note that these ground truth artifacts are unfavorable as they do not provide positive examples.

Because the data is available as a near-continuous sequence of timestamps, we experimented with and without random sampling, instead using ordered frames for additional temporal context, and with and without random shuffling, since shuffling breaks the region-timestamp pairing. The imbalance between segmentation masks containing flood regions and those that do not was prominent. Ultimately, we proceeded with stratified sampling, ensuring each training batch contained at least 50% of samples with some amount of flood region present.

Discussions with domain experts helped us note that different geographic locations, owing to urban and rural conditions, have varied backscatter, which could be relevant for performance optimization and generalizability to the test imagery. The Red River geographic area, which is predominant in the test set, is primarily an agricultural hub, and recently harvested fields can look similar to floods due to low backscatter in both VV and VH polarizations. Similarly, Florence, which comprises the validation set, has a primarily urban setting. Observing this potential uncertainty motivated us to combine different forms of ensembling with stacking and test-time augmentation. Both stacking and test-time augmentation combine predictions from the various trained models, helping model uncertainty and ultimately making the predictions more robust.

We experimented with preprocessing techniques like speckle removal but noted no visual difference between the original and processed images. Augmentation techniques include horizontal flips, rotations, and elastic transformations. We store the processed training and validation data as PyTorch data loaders Paszke et al. (2019) and apply test-time augmentation (TTA) to the test dataset. Test-time augmentation has proven to be a game changer, especially in Kaggle competitions (https://www.kaggle.com/competitions), as it not only performs augmentations on the test data but also averages the predictions of each augmented image into a single prediction, helping to capture uncertainty better. Our test-time augmentation transformations include horizontal and vertical flips, transpositions, and rotations by multiples of 90°. Collectively, these four transformations are also known as the Dihedral Group D4 Sharma (2020).
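As a concrete illustration of the D4 test-time augmentation described above, the sketch below wraps a segmentation model with the ttach package used in our implementation; the stand-in convolutional model and dummy input are placeholders.

```python
import torch
import torch.nn as nn
import ttach as tta

# Stand-in segmentation model (placeholder for a trained UNet).
model = nn.Conv2d(3, 2, kernel_size=3, padding=1)

# Wrap with Dihedral Group D4 transforms (flips, transposes, 90° rotations)
# and average the predictions over all augmented views.
tta_model = tta.SegmentationTTAWrapper(
    model, tta.aliases.d4_transform(), merge_mode="mean"
)

with torch.no_grad():
    tile = torch.randn(1, 3, 256, 256)   # dummy input tile
    averaged_prediction = tta_model(tile)
```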

3 Methodology

With the data loaders performing stratified sampling on the processed and filtered training data, we proceed with training. Our training strategy involved experimenting with UNet-inspired architectures and backbones, multiple loss functions such as focal loss and dice loss (as well as combining the two), various augmentation techniques, and regularization. Additionally, we leverage test-time augmentation. We set seeds at various framework and package levels to enable reproducibility, and present results as the average of multiple training runs. For additional experimental configurations, we refer the readers to our code repository on GitHub (https://git.io/JW3P8).

Sampling.

There is an imbalance problem in the training dataset, i.e., the number of satellite images containing flood regions is smaller than the number of images that do not contain flood regions. This is why we follow a stratified sampling strategy during data loading to ensure half of the images in any given batch always contain some flood regions. Empirically, we found that this sampling significantly helped convergence. Using this sampling strategy in our setup was motivated by a solution that won a prior Kaggle competition Anuar (2019).
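One way to realize this sampling scheme, shown below as a sketch rather than our exact implementation, is PyTorch's WeightedRandomSampler with weights chosen so that tiles with and without flood pixels are drawn with equal probability; the tiny synthetic dataset stands in for the real tiles.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Tiny synthetic stand-in for the real tile dataset: (image, mask) pairs,
# where half of the masks contain no flood pixels at all.
images = torch.randn(100, 3, 64, 64)
masks = (torch.rand(100, 64, 64) > 0.8).long()
masks[:50] = 0
train_dataset = TensorDataset(images, masks)

has_flood = torch.tensor(
    [train_dataset[i][1].any().item() for i in range(len(train_dataset))]
)
n_pos, n_neg = has_flood.sum().item(), (~has_flood).sum().item()

# Equal total probability mass for both groups, so roughly half of every
# batch contains at least some flood pixels.
weights = torch.where(
    has_flood, torch.tensor(1.0 / max(n_pos, 1)), torch.tensor(1.0 / max(n_neg, 1))
)

sampler = WeightedRandomSampler(weights, num_samples=len(train_dataset), replacement=True)
train_loader = DataLoader(train_dataset, batch_size=16, sampler=sampler)
```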

Encoder backbone.

Throughout this work we stick to using MobileNetV2 Sandler et al. (2018) as the encoder backbone, owing to its use of pointwise convolutions, which turn out to be a good fit for the problem. Since the boundary details present in the Sentinel-1 imagery are extremely fine, pointwise convolutions suit it well. We experimented with a number of different backbones, but none performed as consistently as the MobileNetV2 backbone.

Segmentation architecture.

We use UNet Ronneberger et al. (2015) and UNet++ Zhou et al. (2018). As before, our prioritization of pointwise convolutions still stands. We avoid architectures that rely on dilated convolutions, such as the DeepLab family Chen et al. (2018b). Other architectures we tried include LinkNet Chaurasia and Culurciello (2017) and MANet Zhao et al. (2020), but they did not produce good results.
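For reference, such models can be instantiated with the segmentation_models_pytorch package we use; the ImageNet initialization and two output classes below are illustrative settings rather than a guaranteed match of our exact configuration.

```python
import segmentation_models_pytorch as smp

# UNet and UNet++ with a MobileNetV2 encoder backbone (illustrative settings).
unet = smp.Unet(
    encoder_name="mobilenet_v2",
    encoder_weights="imagenet",  # assumed initialization
    in_channels=3,               # RGB composite tiles
    classes=2,                   # flood vs. no flood
)
unetpp = smp.UnetPlusPlus(
    encoder_name="mobilenet_v2",
    encoder_weights="imagenet",
    in_channels=3,
    classes=2,
)
```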

Loss function.

The Dice coefficient was introduced for medical imaging workflows Milletari et al. (2016) primarily to deal with data imbalance. Flood imagery, similar to organ or medical voxel segmentation, has a large amount of imbalance, with only a few pixels per image identified as flooded. Focal loss Lin et al. (2017) assigns more weight to the limited number of positive examples (flooded pixels in our case) while preventing the majority of non-flooded pixels from overwhelming the segmentation pipeline during training. We empirically noted a slight improvement when using Dice loss compared to Focal loss and to the two combined.
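A combined objective can be written, for instance, with the loss implementations shipped in segmentation_models_pytorch; the equal weighting below is an assumption for illustration only.

```python
import segmentation_models_pytorch as smp

dice_loss = smp.losses.DiceLoss(mode="multiclass")
focal_loss = smp.losses.FocalLoss(mode="multiclass")

def combined_loss(logits, targets, alpha=0.5):
    # Equal weighting is an illustrative assumption; empirically we observed
    # Dice alone performing slightly better than the combination.
    return alpha * dice_loss(logits, targets) + (1.0 - alpha) * focal_loss(logits, targets)
```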

In the following sections, we discuss the baselines and subsequent modifications we explored. To validate our approaches, we report the IoU scores on the test set obtained from the competition leaderboard.

Baseline model.

UNet with a MobileNetV2 backbone and test-time augmentation. No pseudo-labeling or Conditional Random Fields (CRFs) Krähenbühl and Koltun (2012) for post processing are utilized. This gets to an IoU of 0.57 on the test set leaderboard.

Modified Architecture.

With the exact same configuration as the baseline, we trained a UNet++ and got an IoU of 0.56.

Ensembling.

An ensemble of the baseline UNet and UNet++ produced a boost in performance, reaching 0.59 IoU. We follow a stacking-based ensembling approach where, after deriving predictions from each of the ensemble members, we simply take an average of those predictions.
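The averaging step amounts to the following sketch, where models is any collection of trained segmentation networks and batch is a tensor of input tiles (both placeholders):

```python
import torch

def ensemble_predict(models, batch):
    """Average per-pixel class probabilities across ensemble members."""
    with torch.no_grad():
        probs = [torch.softmax(m(batch), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)
```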

Post processing with CRFs.

Evidence indicates that post processing with CRFs Krähenbühl and Koltun (2012); Chen et al. (2018a); Arnab et al. (2018) may yield improved performance, especially for semantic segmentation-like tasks. CRFs are known to be computationally expensive, especially during inference, as they involve an array of preset parameters. Potential implementations Zheng et al. (2015) involve appending a CRF layer to the last layer of a trained neural network, freezing the trained network, and training only the final CRF layer, thus eliminating expensive computation at test time. This is additionally helpful if the final model is to be deployed to run in real time on satellites or on ground instruments, aiding disaster mitigation efforts. When we applied CRF to the ensemble predictions, it produced a significant increase in the IoU score: 0.68 (and eventually 0.7654 with the cyclical training strategy described in Section 3.1).
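For reference, a minimal dense-CRF refinement of a single prediction could look like the sketch below using the pydensecrf package from our pipeline; the pairwise kernel parameters are illustrative values, not our tuned settings.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(rgb_image, softmax_probs, n_iters=5):
    """rgb_image: uint8 array of shape (H, W, 3); softmax_probs: float array
    of shape (n_classes, H, W). Returns refined per-pixel labels."""
    n_classes, height, width = softmax_probs.shape
    d = dcrf.DenseCRF2D(width, height, n_classes)
    d.setUnaryEnergy(np.ascontiguousarray(unary_from_softmax(softmax_probs)))
    d.addPairwiseGaussian(sxy=3, compat=3)                  # spatial smoothness kernel
    d.addPairwiseBilateral(sxy=80, srgb=13, compat=10,
                           rgbim=np.ascontiguousarray(rgb_image))  # appearance kernel
    q = np.array(d.inference(n_iters)).reshape(n_classes, height, width)
    return q.argmax(axis=0)
```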

Experiments with Noisy Student Training.

In an effort to unify our iterative training procedure, we also experimented with techniques like the Noisy Student Training Xie et al. (2020) method, but it did not fare well. Following the recipes of Xie et al. (2020), we performed self-training with noise injected only into the training data (in Noisy Student Training, noise is injected into the models as well, in the form of Stochastic Depth Huang et al. (2016) and Dropout Srivastava et al. (2014)). We used the ensemble of the UNet and UNet++ models as the teacher and a UNet model (with MobileNetV2 backbone) as the student. When training the student model, the data consists of both the training and test data. This training pipeline is depicted in Figure 3. With this pipeline we obtained an IoU of 0.75 which, as we will see later, is inferior to the approach we ultimately followed. We also note that this method requires significantly less compute than the approach we ultimately settled on. So, if some IoU can be traded for limited compute requirements, this method still yields competitive results.

Figure 3: Our semi-supervised training pipeline based on Noisy Student Training Xie et al. (2020).

In regard to Figure 3, the overall loss $\mathcal{L}$ is defined as per (1):

$$\mathcal{L} = \mathcal{L}_{\mathrm{seg}}(p_s, y) + \alpha \, \mathrm{KL}\!\left(\mathrm{softmax}(p_t / \tau) \,\|\, \mathrm{softmax}(p_s / \tau)\right) \qquad (1)$$

where $p_s$ and $p_t$ denote predictions from the student and teacher networks respectively, $y$ is a vector containing the ground-truth segmentation maps, $\alpha$ is a scalar that controls the contributions from $\mathcal{L}_{\mathrm{seg}}$ and the KL-divergence term, and $\tau$ is a scalar denoting the temperature Hinton et al. (2015).

Note that for computing $\mathcal{L}_{\mathrm{seg}}$ in (1), we use the predictions obtained from the strongly augmented original training set and their ground-truth segmentation maps.
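A sketch of this objective in code, with placeholder values for α and τ, might look as follows; seg_loss_fn stands in for the supervised segmentation loss (e.g., the Dice/Focal combination above).

```python
import torch.nn.functional as F

def noisy_student_loss(student_logits, teacher_logits, targets, seg_loss_fn,
                       alpha=0.5, tau=1.0):
    """Illustrative instance of Eq. (1): supervised segmentation loss on the
    strongly augmented labeled data plus a temperature-scaled KL term between
    teacher and student predictions. alpha and tau are placeholder values."""
    seg = seg_loss_fn(student_logits, targets)
    teacher_probs = F.softmax(teacher_logits / tau, dim=1)
    student_log_probs = F.log_softmax(student_logits / tau, dim=1)
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return seg + alpha * kl
```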

Pseudo Labels.

Following a plethora of existing work on semi-supervised training Zhu (2005); Zhu and Goldberg (2009); Chapelle et al. (2009) that treats predicted labels with maximal predicted probability as ground truth, we apply this learning technique to segmentation. Seminal work includes the application of pseudo labels Lee (2013) as an entropy regularizer, eventually outperforming other conventional methods with a small subset of labeled data.

3.1 Final Model and Training Strategy

Figure 4: The best performing cyclical training strategy with pseudo labels on the test set and a final post processing with CRFs.

Ultimately, our best performing model was trained in multiple two-stage iterations, with the output of each stage feeding into the next, as we describe next. The training strategy is summarized in Figure 4. This workflow is closely inspired by Babakhin (2019).

Iteration 0, Stage 1: Training on available data, performing inference on test data, and generating Pseudo Labels.

As noted previously, the provided test data is only from the Red River North region, which does not occur in the training (Bangladesh + North Alabama + Nebraska) or validation datasets (only Florence); thus, out-of-distribution effects were expected. Such differences in distributions prompted us to utilize ensembling. We first train two models, a UNet and a UNet++, both with MobileNetV2 backbones and the combined dice and focal loss, on the available training data. We then create an ensemble of these two trained models. On performing test inference with the UNet, the UNet++, and the ensemble, we note a performance improvement of 2-3% on average for the averaged ensemble predictions at each step. As noted before, we also perform test-time augmentation on the test data to further reduce uncertainty.

Next, we filter those test predictions in which at least 90% of the pixels are predicted with high confidence (at least 90%) as either flood or no flood, and consider only those predictions as “weak labels” for pseudo-labeling. We apply this filter over the softmax output of the predictions. This is discussed in more detail in the following paragraphs.

In the zeroth training iteration, no pseudo labels are available and training uses only the provided training dataset. For the next step, i.e., training iteration 1, pseudo labels from iteration 0 can be used. As such, training iteration n can incorporate pseudo labels from iteration n-1.

Iteration 0, Stage 2: Combining Pseudo Labels + Original Training data.

Now, the filtered pseudo labels from the previous stage are incorporated into the training dataset. Thus, a new training dataset is created, composed of (1) the original training data with available ground truth, referred to as “high confidence” labels, and (2) the filtered pseudo labels, or “low confidence” labels, on the unlabeled test dataset. This assimilated dataset is used for the next round of training the individual UNet, UNet++, and ensemble models.

Iteration n, Stage 1: Training on available data, performing inference on test data, and generating Pseudo Labels.

With the training data now composed of the original training dataset and pseudo labels from the test dataset, we train the UNet and UNet++ models from scratch and fine-tune the UNet from the previous iteration. Training and fine-tuning all use the same dataset of original training data and pseudo-labeled test data. Note that the ensemble models are only used to generate predictions, not for fine-tuning. Now, with three trained models, as before, averaged predictions are generated and filtered to create the new set of “weak labels”. All the data for training the UNet, UNet++, and the fine-tuned UNet is processed through stratified sampling as before. This cyclical process is repeated until the performance improvement plateaus.

A note on filtering for good pseudo labels.

For standard image classification tasks, it is common to filter the softmaxed predictions with respect to a predefined confidence threshold Xie et al. (2020); Sohn et al. (2020); Zoph et al. (2020). In semantic segmentation, we extend the classification task to the per-pixel case, where each pixel of an image needs to be categorized. In order to filter out low-confidence predictions, we check whether a pre-specified proportion of the pixel values (in the range [0, 1]) in each individual predicted segmentation map is above a predefined threshold. Mathematically, we denote this in (2):

$$\frac{1}{H \cdot W} \sum_{i=1}^{H} \sum_{j=1}^{W} \mathbb{1}\!\left[\hat{y}_{ij} > t_c\right] \geq t_p \qquad (2)$$

where $\hat{y}$ is the prediction vector, $t_c$ and $t_p$ denote the confidence and pixel proportion thresholds respectively (both being 0.9 in our case), and $H$ and $W$ are the spatial resolutions of the predicted segmentation maps. We compute $\hat{y}$ using (3):

$$\hat{y} = \mathrm{softmax}(z) \qquad (3)$$

where $z$ is the logit vector.
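In code, this per-image filter reduces to a sketch like the following, where logits is the (n_classes, H, W) output for a single test tile:

```python
import torch

def keep_as_pseudo_label(logits, conf_thresh=0.9, prop_thresh=0.9):
    """Sketch of the filter in Eqs. (2)-(3): keep a predicted mask only if at
    least prop_thresh of its pixels are predicted above conf_thresh confidence."""
    probs = torch.softmax(logits, dim=0)        # Eq. (3): per-pixel class probabilities
    confidence, _ = probs.max(dim=0)            # highest class probability per pixel
    confident_fraction = (confidence > conf_thresh).float().mean()
    return bool(confident_fraction >= prop_thresh)   # Eq. (2)
```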

It is crucial to note that works like AdaMatch Berthelot et al. (2021) extend this line of work by using an adaptive thresholding mechanism.

3.2 Implementation Details

Our code is written in PyTorch 1.9 Paszke et al. (2019). We use a number of open-source packages to develop our training and inference workflows; here we list the major ones. For data augmentation, we use the albumentations package Buslaev et al. (2020). The segmentation_models_pytorch (smp for short) package Yakubovskiy (2020b) is used for developing the segmentation models. The timm package Wightman (2019) allowed us to rapidly experiment with different encoder backbones in smp. Test-time augmentation during inference is performed using the ttach package Yakubovskiy (2020a). For post processing the initial predictions, we apply CRFs leveraging the pydensecrf package Beyer (2015). To further accelerate post processing, we use the Ray framework Moritz et al. (2018) to parallelize the application of CRF to the individual predictions. Our hardware setup includes four NVIDIA Tesla V100 GPUs. By utilizing mixed-precision training Micikevicius et al. (2018) (via torch.cuda.amp) and a distributed training setup (via torch.nn.parallel.DistributedDataParallel), we obtain significant reductions in overall model training time. Our code and pre-trained model weights are available on GitHub (https://git.io/JW2B7).
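As an illustration of the mixed-precision setup, the minimal training step below uses torch.cuda.amp; the tiny convolutional model, random tensors, and loss are placeholders for the actual segmentation pipeline and assume a CUDA device is available.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Conv2d(3, 2, kernel_size=3, padding=1).cuda()  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
scaler = GradScaler()

images = torch.randn(4, 3, 256, 256, device="cuda")             # dummy batch
masks = torch.randint(0, 2, (4, 256, 256), device="cuda")

optimizer.zero_grad()
with autocast():                       # forward pass in mixed precision
    loss = criterion(model(images), masks)
scaler.scale(loss).backward()          # scale the loss to avoid FP16 underflow
scaler.step(optimizer)
scaler.update()
```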

For the initial training of the UNet and UNet++ (as per Section 3.1), we use Adam Kingma and Ba (2015) as the optimizer with a learning rate (LR) of 1e-3 (the rest of the hyperparameters were kept at their defaults as provided in torch.optim.Adam), and we train both networks for 15 epochs with a batch size of 384. For the second round of training with the initial training set and the generated pseudo-labeled dataset (as per Section 3.1), we keep all the settings the same except for the number of epochs and LR scheduling. We train the networks for 20 epochs (with the same batch size of 384) in this round to account for the larger dataset, and also use a cosine decay LR schedule Loshchilov and Hutter (2017) since we are fine-tuning the pre-trained weights. We do not make use of weight decay in any of these stages. For additional details on the other hyperparameters, we refer the readers to our code repository on GitHub (https://git.io/JW3P8).
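For completeness, the optimizer and LR schedule described above can be set up as in the following sketch; encoder weights are left uninitialized here purely to keep the snippet self-contained, and the training pass itself is omitted.

```python
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(encoder_name="mobilenet_v2", encoder_weights=None, classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # remaining args at defaults

epochs = 20  # second-round fine-tuning epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # one training pass over the assimilated dataset goes here (omitted)
    scheduler.step()
```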

4 Results

We report all the results obtained from the various approaches in Table 1. As discussed in Section 3.1, we fine-tune a UNet model from the previous iteration; this is denoted by (b) in Table 1. When trained from scratch, the same model resulted in worse performance, denoted by (a) in Table 1. The Noisy Student Training method is also quite competitive with respect to our best score while taking less compute. From our experiments, we believe that with additional tweaks inspired by Zou et al. (2021); Berthelot et al. (2021) it is possible to push this performance further, and we aim to explore this. The impact of using a UNet-based architecture with a MobileNetV2 encoder backbone is empirically reported in Table 2. We conclude that this combination is effective for data with extremely fine segments. Additionally, utilizing TTA during inference was motivated by the data distribution differences and the need to better model uncertainties; we empirically show its impact in Table 3. All scores are noted as of July 15, 2021, 11 p.m. GMT, at the end of Phase 2 of the competition.

Method Description                                            IoU
UNet (1)                                                      0.57
UNet++ (2)                                                    0.56
Ensemble of (1) and (2)                                       0.59
Ensemble of (1) and (2) with CRF post processing              0.68
Noisy Student                                                 0.7584
Pseudo labeling + Ensembles with CRF post processing (a)      0.7321
Pseudo labeling + Ensembles with CRF post processing (b)      0.7654
Table 1: Leaderboard results for the test set. By “Noisy Student”, we denote the methodology used to utilize the test data; we still apply ensembling and CRF post processing after training a model using this methodology. (a) denotes training from scratch, while (b) indicates fine-tuning a UNet model from the previous iteration.
Model Architecture + Encoder Backbone                         IoU
UNet + ResNet34 He et al. (2016)                              0.55
UNet + RegNetY-002 Radosavovic et al. (2020)                  0.56
DeepLabV3Plus + MobileNetV2                                   0.52
DeepLabV3Plus + RegNetY-002                                   0.46
UNet + MobileNetV2                                            0.57
Table 2: Comparing the impact of various combinations of model architectures and encoder backbones. A UNet with a MobileNetV2 encoder backbone outperforms all others under the same training configuration.
Method            IoU
UNet              0.52
UNet + TTA        0.57
Table 3: Using TTA during inference significantly helped boost performance in our case. The trained model in both cases is the same UNet architecture with a MobileNetV2 backbone.

5 Future Work

Our work suggests CRFs are a crucial element for post processing the predictions, as they provide substantial performance improvements (the ensemble model improves from 0.59 to 0.68, and the pseudo-labeled model improves to 0.7654). We did not notice any eventual plateauing of performance and believe improved performance is possible with broader hyperparameter ranges. However, running standalone CRFs can get computationally expensive, and we need to strike a balance, especially for a real-time use case such as flood detection where every second counts. Despite their usefulness, there does not exist an implementation of CRFs that can run on GPUs or even in batch mode. This could be mitigated if CRFs were implemented as a standalone layer inside our networks, but we leave this as future work. Additionally, we plan to explore uncertainty estimation in conjunction with the predicted labels to provide results over a spectrum of confidences. Combining the flood extent mapping with local topography is also of interest, as that can generate a plan of action with downstream results including predicting the direction of flow of water, redirecting flood waters, organizing resources for distribution, etc. Such a system can also recommend a path of least flood levels in real time that disaster response professionals can potentially adopt. Research on reducing the impact of swath gaps Chen et al. (2021) also exists, but due to limited time we will explore this in future work.

6 Acknowledgement

We would like to thank the NASA Earth Science Data Systems Program, NASA Digital Transformation AI/ML thrust, and IEEE GRSS for organizing the ETCI competition. We are grateful to the Google Developers Experts program (https://developers.google.com/programs/experts/) for providing Google Cloud Platform credits to support our experiments, and would like to thank Charmi Chokshi and domain experts Nick Leach and Veda Sunkara for insightful discussions.

References

  • A. Anuar (2019) SIIM-acr pneumothorax segmentation winning solution. Kaggle. Note: https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/discussion/107824 Cited by: §3.
  • A. Arnab, S. Zheng, S. Jayasumana, B. Romera-Paredes, M. Larsson, A. Kirillov, B. Savchynskyy, C. Rother, F. Kahl, and P. H. S. Torr (2018) Conditional random fields meet deep neural networks for semantic segmentation: combining probabilistic graphical models with deep learning for structured prediction. IEEE Signal Processing Magazine 35 (1), pp. 37–52. External Links: Document Cited by: §3.
  • Y. Babakhin (2019) How to cook pseudo-labels. Kaggle. Note: https://www.youtube.com/watch?v=SsnWM1xWDu4 Cited by: §3.1.
  • D. Berthelot, R. Roelofs, K. Sohn, N. Carlini, and A. Kurakin (2021) AdaMatch: a unified approach to semi-supervised learning and domain adaptation. External Links: 2106.04732 Cited by: §3.1, §4.
  • L. Beyer (2015) Pydensecrf. GitHub. Note: https://github.com/lucasb-eyer/pydensecrf Cited by: §3.2.
  • A. Buslaev, V. I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, and A. A. Kalinin (2020) Albumentations: fast and flexible image augmentations. Information 11 (2). External Links: Link, ISSN 2078-2489, Document Cited by: §3.2.
  • O. Chapelle, B. Scholkopf, and A. Zien (2009) Semi-supervised learning (chapelle, o. et al., eds.; 2006) [book reviews]. IEEE Transactions on Neural Networks 20 (3), pp. 542–542. External Links: Document Cited by: §3.
  • A. Chaurasia and E. Culurciello (2017) LinkNet: exploiting encoder representations for efficient semantic segmentation. 2017 IEEE Visual Communications and Image Processing (VCIP). External Links: ISBN 9781538604625, Link, Document Cited by: §3.
  • L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille (2018a) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4), pp. 834–848. External Links: Document Cited by: §3.
  • L. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam (2018b) Encoder-decoder with atrous separable convolution for semantic image segmentation. In Computer Vision – ECCV 2018, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss (Eds.), Cham, pp. 833–851. External Links: ISBN 978-3-030-01234-2 Cited by: §3.
  • S. Chen, E. Cao, A. Koul, S. Ganju, S. Praveen, and M. A. Kasam (2021) Reducing effects of swath gaps in unsupervised machine learning. Committee on Space Research Machine Learning for Space Sciences Workshop, Cross-Disciplinary Workshop on Cloud Computing. Cited by: §5.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. External Links: Document Cited by: Table 2.
  • G. Hinton, O. Vinyals, and J. Dean (2015) Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, External Links: Link Cited by: §3.
  • G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger (2016) Deep networks with stochastic depth. In Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling (Eds.), Cham, pp. 646–661. External Links: ISBN 978-3-319-46493-0 Cited by: footnote 7.
  • D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun (Eds.), External Links: Link Cited by: §3.2.
  • P. Krähenbühl and V. Koltun (2012) Efficient inference in fully connected crfs with gaussian edge potentials. Advances in Neural Information Processing Systems 24 (2011) 109-117 abs/1210.5644. External Links: Link, 1210.5644 Cited by: §3, §3.
  • D. Lee (2013) Pseudo-label : the simple and efficient semi-supervised learning method for deep neural networks. ICML 2013 Workshop : Challenges in Representation Learning (WREPL), pp. . Cited by: §3.
  • T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár (2017) Focal loss for dense object detection. CoRR abs/1708.02002. External Links: Link, 1708.02002 Cited by: §3.
  • I. Loshchilov and F. Hutter (2017) SGDR: stochastic gradient descent with warm restarts. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. External Links: Link Cited by: §3.2.
  • P. Micikevicius, S. Narang, J. Alben, G. Diamos, E. Elsen, D. Garcia, B. Ginsburg, M. Houston, O. Kuchaiev, G. Venkatesh, and H. Wu (2018) Mixed precision training. In International Conference on Learning Representations, External Links: Link Cited by: §3.2.
  • F. Milletari, N. Navab, and S. Ahmadi (2016) V-net: fully convolutional neural networks for volumetric medical image segmentation. CoRR abs/1606.04797. External Links: Link, 1606.04797 Cited by: §3.
  • P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan, and I. Stoica (2018) Ray: a distributed framework for emerging ai applications. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, OSDI’18, USA, pp. 561–577. External Links: ISBN 9781931971478 Cited by: §3.2.
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. dAlché-Buc, E. Fox, and R. Garnett (Eds.), pp. 8024–8035. External Links: Link Cited by: §2, §3.2.
  • I. Radosavovic, R. P. Kosaraju, R. Girshick, K. He, and P. Dollár (2020) Designing network design spaces. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 10425–10433. External Links: Document Cited by: Table 2.
  • O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi (Eds.), Cham, pp. 234–241. External Links: ISBN 978-3-319-24574-4 Cited by: §3.
  • M. Sandler, A. G. Howard, M. Zhu, A. Zhmoginov, and L. Chen (2018) Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 4510-4520 abs/1801.04381. External Links: Link, 1801.04381 Cited by: §3.
  • P. Sharma (2020) Dihedral group d4—a new feature extraction algorithm. Symmetry 12 (4). External Links: Link, ISSN 2073-8994, Document Cited by: §2.
  • K. Sohn, D. Berthelot, N. Carlini, Z. Zhang, H. Zhang, C. A. Raffel, E. D. Cubuk, A. Kurakin, and C. Li (2020) FixMatch: simplifying semi-supervised learning with consistency and confidence. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33, pp. 596–608. External Links: Link Cited by: §3.1.
  • N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15 (1), pp. 1929–1958. External Links: ISSN 1532-4435 Cited by: footnote 7.
  • R. Wightman (2019) PyTorch image models. GitHub. Note: https://github.com/rwightman/pytorch-image-models External Links: Document Cited by: §3.2.
  • Q. Xie, M. Luong, E. Hovy, and Q. V. Le (2020) Self-training with noisy student improves imagenet classification. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684–10695. External Links: Document Cited by: Figure 3, §3, §3.1.
  • P. Yakubovskiy (2020a) Image test time augmentation with pytorch. GitHub. Note: https://github.com/qubvel/ttach Cited by: §3.2.
  • P. Yakubovskiy (2020b) Segmentation models pytorch. GitHub. Note: https://github.com/qubvel/segmentation_models.pytorch Cited by: §3.2.
  • Y. Zhao, J. Jiao, and T. Zhang (2020) MANet: multimodal attention network based point- view fusion for 3d shape recognition. External Links: 2002.12573 Cited by: §3.
  • S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. S. Torr (2015) Conditional random fields as recurrent neural networks. CoRR abs/1502.03240. External Links: Link, 1502.03240 Cited by: §3.
  • Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang (2018) UNet++: A nested u-net architecture for medical image segmentation. 4th Deep Learning in Medical Image Analysis (DLMIA) Workshop abs/1807.10165. External Links: Link, 1807.10165 Cited by: §3.
  • X. Zhu and A. B. Goldberg (2009) Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 3 (1), pp. 1–130. External Links: Document, Link, https://doi.org/10.2200/S00196ED1V01Y200906AIM006 Cited by: §3.
  • X. Zhu (2005) Semi-supervised learning literature survey. Technical report Technical Report 1530, Computer Sciences, University of Wisconsin-Madison. External Links: Link Cited by: §3.
  • B. Zoph, G. Ghiasi, T. Lin, Y. Cui, H. Liu, E. D. Cubuk, and Q. Le (2020) Rethinking pre-training and self-training. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33, pp. 3833–3845. External Links: Link Cited by: §3.1.
  • Y. Zou, Z. Zhang, H. Zhang, C. Li, X. Bian, J. Huang, and T. Pfister (2021) PseudoSeg: designing pseudo labels for semantic segmentation. In International Conference on Learning Representations, External Links: Link Cited by: §4.