GAMMA Challenge: Glaucoma grAding from Multi-Modality imAges

02/14/2022
by Junde Wu, et al.

Color fundus photography and Optical Coherence Tomography (OCT) are the two most cost-effective tools for glaucoma screening. Both modalities contain prominent biomarkers that indicate suspected glaucoma. Clinically, it is often recommended to perform both examinations for a more accurate and reliable diagnosis. However, although numerous computer-aided diagnosis algorithms have been proposed based on fundus images or OCT volumes, few methods leverage both modalities for glaucoma assessment. Inspired by the success of the Retinal Fundus Glaucoma Challenge (REFUGE) we held previously, we set up the Glaucoma grAding from Multi-Modality imAges (GAMMA) Challenge to encourage the development of fundus & OCT-based glaucoma grading. The primary task of the challenge is to grade glaucoma from both 2D fundus images and 3D OCT scanning volumes. As part of GAMMA, we have publicly released a glaucoma-annotated dataset with both 2D color fundus photographs and 3D OCT volumes, which is the first multi-modality dataset for glaucoma grading. In addition, an evaluation framework was established to assess the performance of the submitted methods. During the challenge, 1272 results were submitted, and the top-10 teams were selected for the final stage. We analyze their results and summarize their methods in this paper. Since all of these teams submitted their source code, a detailed ablation study is also conducted to verify the effectiveness of the particular modules they proposed. We find that many of the proposed techniques are practical for the clinical diagnosis of glaucoma. As the first in-depth study of fundus & OCT multi-modality glaucoma grading, we believe the GAMMA Challenge will be an essential starting point for future research.



1 Introduction

Figure 1: An illustration of the GAMMA Challenge. The goal of the challenge is to predict each sample as normal, early-glaucoma, or progressive-glaucoma from fundus-OCT pairs.

Glaucoma is one of the leading causes of blindness among eye diseases. Clinically, function-based visual field testing is the gold standard for glaucoma screening, but it requires specialized perimetric equipment not ordinarily present in primary healthcare clinics. Optic nerve head (ONH) assessment, by contrast, is a convenient way to detect early glaucoma and is currently performed widely for glaucoma screening (jonas1999ophthalmoscopic; morgan2005digital; fu2017segmentation). As practical and noninvasive tools, fundus photography and optical coherence tomography (OCT) are the modalities most commonly used to evaluate the optic nerve structure in clinical practice.

The advantage of fundus photographs is that they clearly show the optic disc, optic cup, and blood vessels. Clinical parameters such as the vertical cup-to-disc ratio (CDR), the disc diameter, and the ratio of the blood vessel area on the inferior-superior side to that on the nasal-temporal side have been validated to be of great significance for glaucoma diagnosis (jonas2000ranking; hancox1999optic; nayak2009automated). OCT measures retinal nerve fiber layer (RNFL) thickness based on its optical properties. The RNFL thickness, computed from circumpapillary OCT images or OCT volumes acquired in cylindrical sections surrounding the optic disc, is often used to identify glaucoma suspects. OCT volumes and fundus photographs are both effective tools for diagnosing early glaucoma, but neither alone can be used to exclude it. Thus, it is often recommended to conduct both screenings for a more accurate and reliable diagnosis.

However, in terms of computer-aided glaucoma diagnosis, most algorithms are designed for a single input modality. Although fundus photographs and OCT volumes are the two mainstream screening tools in clinical practice, few algorithms are built on both modalities. There are two main reasons: a) there has been no publicly available dataset to train and evaluate such models; b) due to the large discrepancy between the two modalities, the task is technically challenging.

To overcome these issues, the Glaucoma grAding from Multi-Modality imAges (GAMMA) Challenge was held to encourage the development of fundus & OCT-based multi-modality glaucoma grading algorithms. Given a pair consisting of a fundus image and a 3D OCT volume, the submitted algorithms must predict the case as normal, early-glaucoma, or progressive-glaucoma (intermediate and advanced glaucoma); an illustration is shown in Figure 1. We also propose an evaluation framework to rank the participating teams. The top-10 teams were invited to share their technical reports and source code. In brief, the primary contribution of the GAMMA Challenge is two-fold:

a) The first publicly available multi-modality glaucoma grading dataset is released, consisting of paired fundus photographs and OCT volumes.

b) State-of-the-art (SOTA) machine learning methods are evaluated to encourage the development of novel technologies for fundus & OCT-based glaucoma grading.


Given the success of the GAMMA Challenge, we expect it to become one of the main benchmarks for this task in the future.

Besides glaucoma grading labels, the GAMMA dataset also provides optic disc & cup masks and fovea location labels. Participants could also submit algorithms for the optic disc & cup segmentation task and the fovea localization task on the GAMMA dataset; their performance was evaluated as the scores of these auxiliary tasks. Participants were encouraged to utilize the auxiliary tasks to improve glaucoma grading performance, as the auxiliary tasks were proposed to investigate the role of the optic disc & cup mask and the fovea location in glaucoma grading.

The GAMMA Challenge encouraged many participants to contribute SOTA machine learning techniques to this task. This manuscript summarizes the GAMMA Challenge, analyzes the results, and investigates the particular approaches. Since all top-10 teams submitted their source code, a detailed ablation study is also conducted to verify which techniques are effective for this task. We believe this investigation of SOTA machine learning methods will greatly benefit future algorithm design for this task.

2 The GAMMA challenge

The GAMMA challenge focuses on glaucoma grading from multi-modality images. We released a dataset of 300 pairs of fundus images and 3D OCT volumes, and we provided a uniform evaluation framework for a fair comparison of different models. The challenge consisted of a preliminary stage and a final stage. At the start of the preliminary stage, we released a training set for the participating teams to train their models. Teams could submit their methods and see their performance on the preliminary set. The preliminary stage lasted 30 days, with each team allowed at most 5 submissions a day. A total of 70 teams submitted 1272 valid results to the platform during the preliminary stage, and 10 teams, those with top rankings that were willing to participate in the OMIA8 workshop, were selected for the final stage. The 10 finalist teams were then ranked based on their performance on the final test set; during the final stage, teams were not allowed to modify their models anymore. Besides glaucoma grading, there are two auxiliary tasks: optic disc/cup segmentation and fovea localization on the fundus images. An illustration of the auxiliary tasks is shown in Figure 2. Participants were encouraged to leverage the OD/OC and fovea information to improve glaucoma grading performance.

2.1 GAMMA Database

The dataset released for GAMMA was provided by the Sun Yat-sen Ophthalmic Center, Sun Yat-sen University, Guangzhou, China. The dataset contains 300 samples, each with two clinical imaging modalities: a 2D color fundus image and a 3D Optical Coherence Tomography (OCT) volume, the two modalities most commonly used in clinical fundus examination. The examinations were performed in a standardized darkroom with the patients seated upright. The OCT volumes were all acquired with a Topcon DRI OCT Triton; each volume was centered on the macula with a 3 mm × 3 mm scan size and contained 256 two-dimensional cross-sectional OCT images. The fundus images were acquired using a KOWA camera and a Topcon TRC-NW400 camera. The fundus color images were centered on the macula or on the midpoint between the optic disc and the macula, with both the optic disc and the macula visible, and the image quality was checked manually. The samples in the GAMMA dataset correspond to Chinese patients (42% female), ranging in age from 19 to 77 years (mean 40.64±14.53). Glaucoma accounted for 50% of the samples; of the glaucomatous samples, 52% were early stage, 28.67% intermediate stage, and 19.33% advanced stage. The average ages of the early, intermediate, and advanced glaucoma patients were 43.47±15.49, 47.98±17.38, and 46.24±14.47 years, respectively. Intermediate and advanced glaucoma are combined into a progressive-glaucoma class in the machine learning task. We divided the collected samples equally into three sets of 100 pairs each, for training, the preliminary stage, and the final stage.

Figure 2: An illustration of the GAMMA auxiliary tasks: optic disc/cup (OD/OC) segmentation and fovea localization on fundus images.

The GAMMA dataset includes, for each sample, the glaucoma grade, the fovea coordinates, and the optic disc & cup masks. The following sections describe the annotation procedure for each of the three sub-tasks.

2.1.1 Glaucoma Grading

The ground truth of the glaucoma grading task for each sample was determined from the mean deviation (MD) values of visual field reports, following the criteria below: early stage, MD better than -6 dB; intermediate stage, MD between -6 and -12 dB; advanced stage, MD worse than -12 dB. These visual field reports were generated on the same day as the OCT examination and were reliable, with fixation losses under 2/13, false-positive rates under 15%, and false-negative rates under 25% (li2020development; xiong2021multimodal).

2.1.2 Fovea Localization

The initial fovea coordinate annotation of each fundus image was performed manually by four clinical ophthalmologists from the Sun Yat-sen Ophthalmic Center, Sun Yat-sen University, China, with an average of 8 years of experience in the field (range 5-10 years). All ophthalmologists independently located the fovea in the image using a cross marker, without access to any patient information or knowledge of disease prevalence in the data. The results of the four ophthalmologists were then fused by a more senior ophthalmologist (with more than ten years of experience in glaucoma), who checked the four markers and decided which of them should be retained and averaged for the final ground truth.

2.1.3 Optic Disc & Cup Segmentation

Similar to the fovea annotation, the four clinical ophthalmologists manually annotated the initial segmentation regions of the optic cup and disc for each fundus image. The senior ophthalmologist then fused the four initial segmentation results: they examined the initial segmentation marks and selected the intersection of the annotated results of the ophthalmologists as the final ground truth.

2.2 Challenge Evaluation

2.2.1 Glaucoma Grading

For each instance, the participants predict normal, early-glaucoma, or progressive-glaucoma. We use weighted Cohen's kappa as the evaluation metric for this ordinal ternary classification problem. Cohen's kappa coefficient is calculated from the confusion matrix and takes values between -1 and 1, where 0 denotes a random guess and higher kappa denotes higher prediction performance. Since our categories are ordered, the kappa is quadratically weighted to account for the extent of the error. The final score of glaucoma grading is represented as:

$$\kappa = \frac{p_o - p_e}{1 - p_e} \tag{1}$$

where $p_o$ is the relative observed correctness and $p_e$ is the hypothetical probability of chance correctness, calculated as the probability of randomly predicting the right categories.

For $k$ categories, let $n_{ij}$ denote the number of subjects for which the prediction is $i$ and the label is $j$, and let $N$ be the total number of subjects. Then:

$$p_o = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\, n_{ij} \tag{2}$$

$$p_e = \frac{1}{N^2}\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\, n_{i\cdot}\, n_{\cdot j} \tag{3}$$

where $n_{i\cdot} = \sum_j n_{ij}$ and $n_{\cdot j} = \sum_i n_{ij}$ are the marginal counts. $w_{ij} = 1 - (i-j)^2/(k-1)^2$ is the quadratic weight, assigned as 1 for a correct prediction, 0.75 for a prediction with a one-grade deviation, e.g., predicting early-glaucoma as normal, and 0 for a prediction with a two-grade deviation, e.g., predicting progressive-glaucoma as normal.
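As a concrete reference, the quadratically weighted kappa above can be computed directly from the predicted and ground-truth grades; a minimal sketch using scikit-learn's cohen_kappa_score with toy labels (0 = normal, 1 = early-glaucoma, 2 = progressive-glaucoma):

```python
# Quadratically weighted Cohen's kappa for the ordinal grading task.
# Grades: 0 = normal, 1 = early-glaucoma, 2 = progressive-glaucoma.
from sklearn.metrics import cohen_kappa_score

y_true = [0, 0, 1, 2, 2, 1, 0, 2]  # toy ground-truth grades
y_pred = [0, 1, 1, 2, 1, 1, 0, 2]  # toy predictions

kappa = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"quadratic weighted kappa: {kappa:.4f}")
```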

2.2.2 Fovea Localization

The fovea location is given by its X and Y coordinates. If the image does not contain a fovea, the estimated coordinate is supposed to be (0, 0). We use the average Euclidean distance (AED) between the estimated coordinates and the real coordinates as the evaluation criterion for this task. Note that the estimated and ground-truth coordinate values are normalized according to the image size. The final score is based on the reciprocal of the AED value:

$$\mathrm{Score}_{fovea} = \frac{10}{1 + 10 \times \mathrm{AED}} \tag{4}$$
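A minimal sketch of this score, assuming the coordinates are normalized by the image width and height before the distance is averaged; the helper name fovea_score is hypothetical:

```python
# Fovea localization score: Score = 10 / (1 + 10 * AED), where AED is the
# average Euclidean distance between normalized predicted and true coordinates.
import numpy as np

def fovea_score(pred_xy, gt_xy, image_sizes):
    """pred_xy, gt_xy: (N, 2) pixel coordinates; image_sizes: (N, 2) as (W, H)."""
    pred = np.asarray(pred_xy, dtype=float) / np.asarray(image_sizes, dtype=float)
    gt = np.asarray(gt_xy, dtype=float) / np.asarray(image_sizes, dtype=float)
    aed = np.linalg.norm(pred - gt, axis=1).mean()  # average Euclidean distance
    return 10.0 / (1.0 + 10.0 * aed)

# toy example: two images, predictions a few pixels off the ground truth
print(fovea_score([[1000, 980], [505, 512]],
                  [[1004, 978], [500, 500]],
                  [[2000, 2000], [1024, 1024]]))
```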

2.2.3 Optic Disc & Cup Segmentation

The Dice coefficient was calculated as the segmentation evaluation metric in the GAMMA challenge:

$$\mathrm{Dice} = \frac{2\,|P \cap G|}{|P| + |G|} \tag{5}$$

where $|P|$ and $|G|$ represent the pixel numbers of the prediction and the ground truth, and $|P \cap G|$ represents the pixel number of the overlap between the prediction and the ground truth. In addition, we used the Mean Absolute Error (MAE) to measure the difference in the vertical cup-to-disc ratio (vCDR) between the predicted results and the ground truth. vCDR has direct clinical relevance, as it is a measure used in ophthalmology and optometry to assess glaucoma progression; it is calculated as the ratio of the maximum vertical diameter of the optic cup region to that of the optic disc region. Each team was ranked on three metrics: the optic cup Dice coefficient, the optic disc Dice coefficient, and the vCDR MAE. The score for the optic disc & cup segmentation task is a weighted combination of these three metrics.
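A minimal sketch of the two metrics on binary masks (hypothetical helper names; the vertical diameter is taken as the vertical extent of the region):

```python
# Dice coefficient and vertical cup-to-disc ratio (vCDR) on binary masks.
import numpy as np

def dice(pred, gt):
    """pred, gt: boolean (H, W) masks."""
    overlap = np.logical_and(pred, gt).sum()
    return 2.0 * overlap / (pred.sum() + gt.sum())

def vertical_diameter(mask):
    """Vertical extent (in pixels) of the foreground region."""
    rows = np.where(mask.any(axis=1))[0]
    return rows.max() - rows.min() + 1 if rows.size else 0

def vcdr(cup_mask, disc_mask):
    return vertical_diameter(cup_mask) / vertical_diameter(disc_mask)

# toy masks: a 7-pixel-tall disc containing a 3-pixel-tall cup
disc = np.zeros((10, 10), bool); disc[2:9, 2:9] = True
cup = np.zeros((10, 10), bool); cup[4:7, 4:7] = True
print(dice(disc, disc), vcdr(cup, disc))  # 1.0 and 3/7; averaging
# |vCDR_pred - vCDR_gt| over the test set gives the vCDR MAE.
```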
Single-modality
    Color fundus photography only                       0.673
    3D OCT only                                         0.575
    Fundus + disc region                                0.677
    Fundus + ordinal regression                         0.732
Multi-modality (fundus + OCT)
    Dual-branch baseline                                0.702
    Dual-branch + disc region                           0.770
    Dual-branch + ordinal regression                    0.752
    Dual-branch + disc region + ordinal regression      0.812
Table 1: Performance of the baselines for glaucoma grading, measured by kappa.
Figure 3: Dual-branch network architecture for glaucoma grading. Blue blocks denote the OCT network branch, and red blocks denote the fundus network branch. The features of the two branches are concatenated for the final classification.

2.3 Baseline

Before the challenge, we implemented a baseline: a simple dual-branch network that learns glaucoma grading from fundus images and 3D OCT volumes in an end-to-end manner. An illustration of the architecture is shown in Figure 3. Specifically, two CNN-based encoders extract the features from the fundus image and the OCT volume, respectively. The two networks share the same architecture except for the first convolutional layer: in the fundus branch, the input channel of the first convolutional layer is set to 3, and in the OCT branch it is set to 256. The encoded features of the fundus branch and the OCT branch are concatenated and classified by a fully connected layer. The model is supervised by a cross-entropy loss in the training stage. We used the 100 pairs of images released in the GAMMA Challenge for training and report the performance on the test dataset.
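To make the design concrete, here is a minimal PyTorch sketch of such a dual-branch network, assuming ResNet34 encoders for both branches (an illustrative reimplementation, not the released baseline code):

```python
# Dual-branch baseline: the OCT volume is treated as a 256-channel 2D image.
import torch
import torch.nn as nn
from torchvision.models import resnet34

def make_branch(in_channels):
    net = resnet34()
    # only the first convolutional layer differs between the two branches
    net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                          padding=3, bias=False)
    net.fc = nn.Identity()  # keep the 512-d encoded feature
    return net

class DualBranchBaseline(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.fundus_branch = make_branch(in_channels=3)    # RGB fundus image
        self.oct_branch = make_branch(in_channels=256)     # 256 OCT slices
        self.classifier = nn.Linear(512 * 2, num_classes)

    def forward(self, fundus, oct_volume):
        feat = torch.cat([self.fundus_branch(fundus),
                          self.oct_branch(oct_volume)], dim=1)
        return self.classifier(feat)

model = DualBranchBaseline()
logits = model(torch.randn(2, 3, 256, 256), torch.randn(2, 256, 128, 128))
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 2]))
```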

Clinically, physicians combine fundus photographs and OCT volumes for a more accurate and reliable diagnosis. We find this logic still holds in deep learning-based computer-aided glaucoma diagnosis. We compare the performance of the single fundus branch, the single OCT branch, and the dual-branch baseline in Table 1. The dual-branch model outperforms the single-branch ones by a large margin, indicating that, despite the simple multimodal fusion strategy we adopt, multi-modality images improve glaucoma grading over either single modality. This motivated us to hold the GAMMA Challenge to encourage further exploration of SOTA machine learning methods on this multimodal fusion task.

In the implementation of the baseline, we also found some techniques that improve performance on this task (fang2021multi). The first is to utilize the local information of the optic disc. Clinically, glaucoma leads to lesions in the optic disc region, such as cup-to-disc ratio enlargement and optic disc hemorrhage (orlando2020refuge). Thus, we cropped the optic disc region of the fundus image as the network's input to make the network focus on the optic disc and cup. The optic disc region is obtained through a pre-trained optic disc segmentation network. According to the results in Table 1, this local information extraction gains about a 10% improvement in kappa over the baseline.

We also note that glaucoma grading is actually an ordinal classification task: the three classes, normal, early-glaucoma, and progressive-glaucoma, represent the progressive deterioration of glaucoma. Thus, in the training process, the loss should be smaller when the prediction is closer to the ground truth. For example, predicting early-glaucoma as normal should be punished less than predicting progressive-glaucoma as normal. Therefore, we adopt an ordinal regression strategy (niu2016ordinal) and perform two binary classifications, so that a severe error is double-penalized by both classifiers. Specifically, the first classifier decides whether the input image is a glaucoma sample; the second classifier identifies a glaucoma sample as early-glaucoma or progressive-glaucoma. The labels of the original ternary classification task are converted accordingly: normal samples become (0, 0), early-glaucoma samples become (1, 0), and progressive-glaucoma samples become (1, 1). The loss function used in training is the sum of the two binary cross-entropy losses. According to the results in Table 1, ordinal regression independently gains an average 4.5% improvement on the models.
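A minimal sketch of this label conversion and the summed binary cross-entropy supervision (illustrative names and shapes):

```python
# Ordinal regression for grading: two binary questions instead of one
# 3-way softmax. normal -> (0, 0), early -> (1, 0), progressive -> (1, 1).
import torch
import torch.nn.functional as F

ORDINAL_TARGETS = torch.tensor([[0., 0.], [1., 0.], [1., 1.]])

def ordinal_loss(logits, grades):
    """logits: (N, 2) raw outputs of the two binary classifiers;
    grades: (N,) integer labels in {0, 1, 2}."""
    targets = ORDINAL_TARGETS[grades]
    # sum of the two binary cross-entropy losses, averaged over the batch
    per_sample = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none").sum(dim=1)
    return per_sample.mean()

def decode(logits):
    """Map the two binary decisions back to a grade in {0, 1, 2}."""
    bits = (torch.sigmoid(logits) > 0.5).long()
    return bits[:, 0] * (1 + bits[:, 1])  # 0 = normal, 1 = early, 2 = progressive

loss = ordinal_loss(torch.randn(4, 2), torch.tensor([0, 1, 2, 1]))
```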

Note that we only released the basic dual-branch model (the dual-branch baseline in Table 1) as the baseline to the participants.

3 Methods

The methods of the top-10 teams in the GAMMA Challenge are summarized in Table 2. In this section, we introduce their methods in terms of data preprocessing, architecture, and ensemble strategy. We also conduct ablation studies on some unique techniques proposed in the challenge. The comparison and analysis help verify which techniques are effective for the task and will contribute to the future development of novel methods.

SmartDSP
  Architecture: Dual-branch ResNet.
  Preprocessing: Fundus: add Gaussian noise, resize to 512×512. OCT: crop height to 150-662, resize to 512×512. Default data augmentation.
  Ensemble: Pick the 3 models with the best accuracy on normal, early, and progressive cases, respectively; predict the results with different thresholds; ensemble the results by the priorities of early, progressive, and normal.
  Method: Extract the features of fundus images and OCT volumes with two encoders; concatenate the encoded features for the classification.

VoxelCloud
  Architecture: Dual-branch network implemented with 3D EfficientNet and EfficientNet.
  Preprocessing: Fundus: crop black margin, resize to 512×512. OCT: resize to 256×256, downsample channels to 128. Default data augmentation.
  Ensemble: Pick the 5 best models on 5 different validation folds; ensemble the results by averaging.
  Method: Extract the features of fundus images with EfficientNet and of OCT volumes with 3D EfficientNet; concatenate the encoded features for the classification.

EyeStar
  Architecture: Dual-branch network implemented with Swin Transformer (liu2021swin) and DENet.
  Preprocessing: Fundus: crop to the optic disc region with a pretrained segmentation network. OCT: randomly pick 10 successive slices between channels 113-153. Default data augmentation.
  Ensemble: During testing, successively feed 30 groups of 10 successive OCT slices into the network and take the average of the 30 predictions as the final prediction.
  Method: Extract the features of fundus images with the fundus disc-aware ensemble network and of OCT volumes with ResNet; concatenate the encoded features for the classification.

HZL
  Architecture: UNet with EfficientNet backbone.
  Preprocessing: Fundus: resize to 1024×1024. OCT: resize to 1024×1024. Default data augmentation.
  Ensemble: Pick the 5 best models on 5 different validation folds; ensemble the results by averaging.
  Method: A multi-task UNet jointly learns glaucoma grading, optic disc & cup segmentation, and fovea localization; the embedding of the UNet encoder is connected to a fully connected layer for glaucoma grading.

MedIPBIT
  Architecture: Dual-branch EfficientNet.
  Preprocessing: Fundus: crop to the optic disc region with a pretrained segmentation network, resize to 128×128. OCT: crop the black background with a gradient detector, resize to 128×128. Default data augmentation.
  Ensemble: Split the dataset for training and validation with three different strategies; pick the 2 best models in each split (6 models in total); ensemble the 6 models' results by averaging.
  Method: Extract the features of fundus images and OCT volumes with two encoders; concatenate the encoded features for the classification.

IBME
  Architecture: Dual-branch ResNet.
  Preprocessing: Fundus: resize to 256×256. OCT: resize to 512×512. Default data augmentation.
  Method: Extract the features of fundus images and OCT volumes with two encoders; concatenate the encoded features for the classification.

WZMedTech
  Architecture: Dual-branch ResNet.
  Preprocessing: Fundus: resize to 512×512, default data augmentation + image jitter. OCT: resize to 256×256.
  Ensemble: Pick the first- and second-best models. Predict normal when both models predict the case as normal; predict with the second model's OCT branch when either of the two models predicts glaucoma.
  Method: Predict glaucoma grading from the fundus image and the OCT volume with two networks; take the average of the two networks' results.

DIAGNOS-ETS
  Architecture: Dual-branch network implemented with 3D ResNet and ResNet.
  Preprocessing: Fundus: resize with the shorter spatial side randomly sampled in 224-480 and randomly crop to 224×224. OCT: downsample channels to 16; randomly pick one slice in training and pick specific slices in inference; crop width to 224-480; resize the original images with the shorter spatial side randomly sampled in 256-480. Default data augmentation.
  Ensemble: Ensemble multi-scale predictions by averaging them with temperature scaling.
  Method: Extract the features of fundus images and OCT volumes with ResNet and 3D ResNet, respectively; concatenate the encoded features for the classification. The encoded features of the two networks are aligned by minimizing the KL divergence.

MedICAL
  Architecture: Dual-branch EfficientNet.
  Preprocessing: Fundus: resize to 1024×1024, enhance with the optic disc & cup mask. OCT: transfer to a retinal thickness heatmap, resize to 400×400. Default data augmentation.
  Ensemble: Take the average of multiple models.
  Method: Extract the features of fundus images and OCT volumes with two encoders; concatenate the encoded features for the classification.

FATRI-AI
  Architecture: EfficientNet.
  Preprocessing: Fundus: crop black margin, resize to 224×224. OCT: randomly pick 3 slices, resize to 224×224. Default data augmentation.
  Ensemble: Stack two models; data with confidence over 0.7 under the first model are used as pseudo labels to train the second model.
  Method: Extract the features of fundus images and OCT volumes with two encoders; concatenate the encoded features for the classification.

Table 2: Summary of the glaucoma grading methods evaluated in the GAMMA Challenge.

3.1 Data Preprocessing

For the baseline, we provide a default data augmentation implemented with some commonly used techniques, including random crop, random flip, and random rotation. Most of the teams used this default augmentation for data preprocessing.

Besides the standard data augmentation, MedIPBIT cropped the fundus images to the optic disc region. In the training stage, they used the optic disc masks provided in the GAMMA dataset for this cropping; in the inference stage, they instead used masks estimated by a pre-trained segmentation network, namely the segmentation networks they trained on the GAMMA dataset for the auxiliary task. Besides MedIPBIT, MedICAL also utilized the optic cup & disc mask for data preprocessing: they enhanced the fundus image with the optic cup & disc mask. Specifically, the optic cup & disc region of the original image is multiplied by a factor of 0.05 and added to the original image. MedICAL also transferred the 3D OCT volume to a 2D retinal thickness heatmap using the Iowa Reference Algorithm (lawonn_ophthalvis_2016). An illustration of their process is shown in Figure 4.
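A minimal sketch of this enhancement step, assuming a uint8 fundus image and a binary disc & cup mask (the helper name is hypothetical):

```python
# Enhance the fundus image: scale the OD/OC region by 0.05 and add it back.
import numpy as np

def enhance_with_mask(fundus, odoc_mask, factor=0.05):
    """fundus: (H, W, 3) uint8 image; odoc_mask: (H, W) binary mask."""
    img = fundus.astype(np.float32)
    img += factor * img * odoc_mask[..., None].astype(np.float32)
    return np.clip(img, 0, 255).astype(np.uint8)
```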

Figure 4: Data preprocessing of MedICAL. They enhanced fundus images with the optic disc & cup mask and transferred the 3D OCT volume to a 2D retinal thickness heatmap.
Figure 5: Illustration of the OCT 3D network branch and the fundus DENet branch. The 3D network is adopted by teams VoxelCloud and DIAGNOS-ETS; DENet is adopted by team EyeStar. For the OCT 3D network branch, the encoded feature is flattened and concatenated with that of the fundus branch. For the fundus disc-aware ensemble branch, the features of the three subbranches are each concatenated with the OCT features for classification, and the final prediction is the average of the three subbranches.

3.2 Architecture

Toward fundus & OCT-based glaucoma grading, almost all teams adopted a dual-branch network structure: as in the baseline, two branches respectively extract the features of the fundus image and the OCT volume, and the encoded features are concatenated for the classification. Unlike this strategy, FATRI-AI used a single network fed with the concatenated fundus image and OCT volume. Besides FATRI-AI, HZL also used a single network branch: they proposed a multi-task UNet to jointly learn glaucoma grading, optic disc & cup segmentation, and fovea localization, with the glaucoma grading head connected after the UNet encoder and the segmentation and localization heads connected after the UNet decoder. Through this multi-task learning strategy, the correlated features of the different tasks are enhanced, improving the performance of all tasks.

Although most of the teams adopted a dual-branch network architecture, their implementations differ greatly. VoxelCloud and DIAGNOS-ETS adopted a 3D network (tran2015learning) in the OCT branch to extract features from the 3D OCT volume. EyeStar adopted the fundus Disc-aware Ensemble Network (DENet) (Fu2018DiscAware) in the fundus branch; DENet uses three networks to respectively process the raw fundus image, the optic disc region of the fundus image, and the polar-transformed optic disc region, and combines the predictions of the three networks to obtain the final prediction. An illustration of the 3D network and DENet is shown in Figure 5. WZMedTech used two independent networks to predict glaucoma grades from the fundus image and the OCT volume, respectively; the final result is the average of the two predictions.

To fairly verify the effectiveness of the proposed architectures, we conduct an ablation study over the vanilla baseline: we keep everything else the same and only change the architecture. The quantitative results are shown in Table 3. We measure the overall kappa as well as the accuracy of each class: N-Acc, E-Acc, and P-Acc denote the accuracy on normal, early-glaucoma, and progressive-glaucoma, respectively, and G-Acc denotes the accuracy over both glaucoma classes. DualRes is the dual-branch ResNet architecture adopted by SmartDSP, MedIPBIT, IBME, WZMedTech, and MedICAL. Res-3D denotes a dual-branch ResNet architecture with a 3D-ResNet OCT branch and a standard fundus branch, adopted by VoxelCloud and DIAGNOS-ETS. Res-DEN denotes a dual-branch ResNet architecture with a DENet fundus branch and a standard OCT branch, adopted by EyeStar. SinCat and SinMulti denote the single network with concatenated inputs and the multi-task learning strategy, adopted by FATRI-AI and HZL, respectively.

N-Acc E-Acc P-Acc G-Acc Kappa
DualRes 88.24 12.00 79.17 44.90 70.61
Res-3D 84.31 32.00 70.83 51.02 73.56
Res-DEN 94.12 16.00 83.33 48.98 77.26
SinCat 82.35 00.00 91.67 44.90 62.88
SinMulti 90.20 28.00 70.83 48.98 75.27
3D-DEN 98.04 28.00 58.33 42.86 79.24
Table 3: Comparison of the network architectures in the GAMMA Challenge

From Table 3, we can see, first, that awareness of the optic disc region is helpful for glaucoma grading: Res-DEN and SinMulti utilize the optic disc & cup segmentation mask, and their performance improves significantly. In addition, Res-3D outperforms DualRes, indicating that a 3D neural network works better than a standard 2D network in the OCT branch. We also tried to combine the two advantages by adopting a 3D network in the OCT branch and DENet in the fundus branch; the results are denoted as 3D-DEN in Table 3. The combined architecture outperforms either of the two approaches in kappa. In conclusion, in terms of architecture, a 3D neural network OCT branch with a DENet fundus branch is suggested for clinical application.

As for model supervision, most teams applied a cross-entropy loss on the final prediction. DIAGNOS-ETS used an extra loss to align the fundus and OCT features; to that end, they minimize the Kullback-Leibler (KL) divergence between the two encoded features. Instead of supervising the fused features of the two modalities, EyeStar and WZMedTech supervised the two branches independently and took the average of the independent predictions as the final result. In our experiments, we did not observe apparent differences among these supervision strategies.

3.3 Ensemble strategy

Ensemble strategies can significantly improve the quantitative results of glaucoma grading. A basic idea is to pick the best models on different validation folds and average their results; teams VoxelCloud, HZL, MedIPBIT, and MedICAL adopted this strategy.

Another strategy adopted by multiple teams in the GAMMA Challenge is the ordinal ensemble. Before the challenge, we had already noted the ordered nature of the grades, i.e., identifying progressive-glaucoma as normal is a more serious mistake than identifying early-glaucoma as normal (fang2021multi), and separating the ternary classification problem into two binary ones can improve performance. Both SmartDSP and WZMedTech adopted similar ideas in their ensemble strategies. WZMedTech discriminated early/progressive cases on the glaucoma cases agreed upon by two models: they double-checked the normal cases with two different models, first discriminating normal/glaucoma cases and then classifying progressive/early with the second model on the predicted glaucoma cases. SmartDSP followed the same high-level idea with a more sophisticated strategy: they first picked the three models with the best accuracy on normal, early, and progressive cases, respectively; they then discriminated glaucoma cases by thresholding the normal model at 0.5, progressive cases by thresholding the progressive model at 0.6, and early-glaucoma cases by thresholding the early model at 0.9. Samples rejected by all three models are classified as early-glaucoma by default.
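A minimal sketch of this cascaded decision rule, assuming p_normal, p_progressive, and p_early are the confidences of the three class-specialized models for their own class (the exact decision order is simplified here):

```python
# SmartDSP-style ordinal ensemble: three class-specialized models with
# per-class confidence thresholds, cascaded in priority order.
def ordinal_ensemble(p_normal, p_progressive, p_early):
    if p_normal >= 0.5:          # the "normal" expert is confident
        return "normal"
    if p_progressive >= 0.6:     # the "progressive" expert is confident
        return "progressive"
    if p_early >= 0.9:           # the "early" expert is confident
        return "early"
    # rejected by all three experts: suspected early glaucoma by default
    return "early"

print(ordinal_ensemble(0.2, 0.4, 0.3))  # -> "early" (default branch)
```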

Besides these strategies, DIAGNOS-ETS rescaled the input images to different sizes at inference and combined the multi-scale predictions by averaging them with temperature scaling. FATRI-AI stacked two models: instances receiving high confidence (over 0.7) from the first model are used as pseudo labels to train the second model.

Ensemble N-Acc E-Acc P-Acc G-Acc Kappa
Stacked 90.20 12.00 83.33 46.94 71.23
Rescale 92.16 24.00 54.17 38.78 72.92
3-fold-ave 92.16 36.00 58.33 46.94 74.18
5-fold-ave 90.20 28.00 70.83 48.98 75.41
2-ordinal 96.08 44.00 54.17 48.98 79.81
3-ordinal 98.04 52.00 66.67 59.18 82.79
Table 4: Performance of proposed ensemble strategies in the GAMMA Challenge

We conduct an ablation study over the vanilla baseline to verify the effectiveness of these ensemble methods; the quantitative results are shown in Table 4. 'Stacked' denotes the pseudo-label retraining strategy adopted by FATRI-AI; 'Rescale' denotes the multi-scale model ensemble strategy adopted by DIAGNOS-ETS; '3-fold-ave' and '5-fold-ave' denote averaging model predictions over 3-fold and 5-fold validation sets, respectively; '2-ordinal' denotes the dual-model ordinal ensemble strategy adopted by WZMedTech; '3-ordinal' denotes the triple-model ordinal ensemble strategy adopted by SmartDSP. A valuable conclusion from these results is that the multi-model ordinal ensemble, adopted by WZMedTech and SmartDSP, gains a noticeable improvement on the glaucoma grading task. The improvement comes from their divide-and-conquer strategy, i.e., separating the ternary classification task into multiple binary classification tasks; the models performing best on each sub-task are picked for the final ensemble. SmartDSP also classified a sample as early-glaucoma by default when none of the models was confident in its prediction. This ensemble strategy is actually very similar to clinical practice: when multiple experts give diverse opinions without confidence, the case is considered suspected early glaucoma for further screening. Due to this similarity, the ordinal ensemble strategy may also apply to real clinical scenarios.

4 Results

The top-10 teams ranked by glaucoma grading score are SmartDSP, VoxelCloud, EyeStar, HZL, MedIPBIT, IBME, WZMedTech, DIAGNOS-ETS, MedICAL, and FATRI-AI. The quantitative kappa scores of the glaucoma grading task are shown in Table 5. We report each team's performance in the preliminary stage (evaluated on the validation set) and the final stage (evaluated on the test set); the preliminary-stage rank is given in parentheses. SmartDSP, VoxelCloud, EyeStar, HZL, and IBME all kept or raised their rankings on the test dataset, indicating that they are more robust than the other methods, and there is no significant gap among their scores on the test set. The lower-ranked teams generally suffered from worse generalization capability; in particular, MedIPBIT, IBME, WZMedTech, DIAGNOS-ETS, and MedICAL show a dramatic decrease in performance on the final test set.

The confusion matrices calculated on the test set are shown in Figure 6. We note that there is no significant performance gap in the prediction of normal versus glaucoma: the error of predicting glaucoma as normal is generally within 4% to 8%. This rate is significantly lower than the reported misdiagnosis rate of junior ophthalmologists (trobe1980optic), indicating the high clinical application potential of the models.

Different approaches widened the gap in early/progressive-glaucoma classification performance. Higher-ranked teams generally achieve better accuracy on both early-glaucoma and progressive-glaucoma. It is also worth noting that early-glaucoma and progressive-glaucoma accuracy have different significance in clinical scenarios: predicting progressive-glaucoma as early-glaucoma is generally less tolerable than the reverse. Thus, among models with similar overall performance, the one with higher progressive-glaucoma accuracy is the better choice in clinical practice.

Figure 6: Glaucoma grading confusion matrix of each team. N, E, and P denote normal, early-glaucoma, and progressive-glaucoma, respectively.
Rank Team Preliminary Final
1 SmartDSP 93.38 (1) 85.49
2 VoxelCloud 90.71 (6) 85.00
3 EyeStar 88.28 (7) 84.77
4 HZL 89.89 (8) 84.01
5 IBME 87.60 (9) 82.56
6 MedIPBIT 93.43 (2) 80.48
7 WZMedTech 90.44 (5) 79.46
8 DIAGNOS-ETS 91.70 (3) 75.36
9 MedICAL 90.65 (4) 72.90
10 FATRI-AI 87.34 (10) 69.62
Table 5: Glaucoma grading results in the GAMMA Challenge, measured by kappa (%). The number in parentheses is each team's preliminary-stage rank. Teams are ranked by the final score.

5 Discussion

In the GAMMA Challenge, teams are ranked on a held-out test set to ensure the generalization ability of the models. However, the final ranking is also affected by model selection. We observed that models performing identically on the validation set could differ by 2-3% on the test set. This disturbance is mitigated to roughly 0.3-1% depending on the ensemble strategy. It indicates that teams separated by less than 1% can be regarded as equivalent.

We note that most multimodal fusion methods that achieved high performance in GAMMA are very straightforward, and many advanced multimodal fusion techniques proposed recently were not applied to this task. A plausible explanation is that participants tend to implement simple adaptations of the provided baseline. However, some evidence suggests this is not the reason. First, on the auxiliary tasks, most teams adopted models very different from the baseline (see the supplementary material for details). In addition, some teams reported several failed attempts in their technical reports. Ruling out this explanation, we think it is because this task differs substantially from other multimodal fusion tasks. Advanced multimodal fusion algorithms are generally proposed for image-language pairs, e.g., image captioning (xu2015show; chen2015mind; ding2019image) and visual question answering (antol2015vqa; shih2016look; lu2016hierarchical). Unlike image-language fusion, however, the regional relation information between fundus and OCT does not indicate the final prediction, so a large body of attention-based methods (vqa1; vqa2; vqa3; vqa4; vqa5) is invalid for this task. In medical image processing, multimodal fusion techniques are often proposed for Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) pairs. This setting is still different from fundus & OCT fusion: CT and MRI have spatial correspondence, whereas the OCT volume is supplementary information on another dimension of the fundus image. This makes domain-adaptation-based multimodal fusion models (9514499; 9627926; 10.5555/3304415.3304514) also invalid for this task. To our knowledge, few multimodal fusion techniques can be directly adopted for the fundus & OCT fusion task, which also explains why the straightforward dual-branch concatenation model was the first choice in the GAMMA Challenge. More task-specific multimodal fusion algorithms are required in this field.

Participants were also encouraged to utilize the optic disc & cup mask and fovea location information to improve glaucoma grading. In the GAMMA Challenge, we see that prior knowledge of the optic disc & cup mask can help improve glaucoma grading performance. EyeStar and MedIPBIT both cropped optic disc regions from the fundus images in data preprocessing; EyeStar additionally adopted DENet to individually process the optic disc region and the polar-transformed optic disc region; MedICAL utilized the optic disc & cup mask to enhance the fundus inputs. This is in line with previous studies (wu2020leveraging; zhao2019weakly) and with what we found before the challenge. In Table 1, we can also see that the optic disc region improves the dual-branch model more than the single fundus branch. This is because the OCT volume corresponds to the optic disc region of the fundus image, so cropping the optic disc region helps align the features of the two branches. However, we also note that this improvement decreases on the high-performance models, possibly because stronger models can extract the optic disc region by themselves and no longer need this prior knowledge.

6 Conclusion

Following the clinical glaucoma screening standard, we held a challenge on automated glaucoma grading from both fundus images and OCT volumes, the Glaucoma grAding from Multi-Modality imAges (GAMMA) Challenge. In this paper, we introduced the released GAMMA dataset, the organization of the challenge, the evaluation framework, and the top-ranked algorithms. Detailed comparisons and analyses of the proposed technologies were also conducted. As the first in-depth study of the fundus & OCT multi-modality glaucoma grading task, we believe GAMMA will be an essential starting point for further research on this task.

The data and evaluation framework are publicly accessible through the AI Studio platform at https://aistudio.baidu.com/aistudio/competition/detail/119/, https://aistudio.baidu.com/aistudio/competition/detail/120/, and https://aistudio.baidu.com/aistudio/competition/detail/121/. Future participants are welcome to use our dataset, submit their results on the website, and use it to benchmark their methods.

Acknowledgments

This research was supported by the High-level Hospital Construction Project, Zhongshan Ophthalmic Center, Sun Yat-sen University (303020104).

References

Supplementary Material

In the following sections, we briefly introduce the methods proposed for the auxiliary tasks. The final rank of the teams considering all three tasks is also reported.

Fovea Localization

The ranking of the fovea localization task is shown in Table 6. The results are evaluated by the fovea localization score (see Section 2.2) and the Euclidean distance (ED); teams are ranked by the fovea localization score. The methods of the teams are summarized in Table 9. As for the glaucoma grading task, we implemented a baseline for the fovea localization task, shown in Figure 7. The input of the network is the whole fundus image, and the output is a 2D vector indicating the coordinates of the fovea center. The backbone of the network is ResNet50, supervised by a combination of the Euclidean distance and MSE losses.

Rank Team Score ED
1 DIAGNOS-ETS 9.60294 0.00413
2 IBME 9.58847 0.00429
3 SmartDSP 9.57458 0.00444
4 MedIPBIT 9.53757 0.00485
5 Voxelcloud 9.53443 0.00488
6 EyeStar 9.51465 0.0051
7 WZMedTech 9.45846 0.00573
8 MedICAL 9.34639 0.00699
9 FATRI_AI 9.33749 0.0071
10 HZL 9.22303 0.00842
Table 6: Fovea localization ranking in the GAMMA Challenge.

The methods for the fovea localization task vary considerably. Among the top-10 teams, SmartDSP, MedIPBIT, and WZMedTech treated the task as coordinate regression, as in our baseline. VoxelCloud and DIAGNOS-ETS treated it as binary segmentation: they generated a circle centered on the fovea location as the segmentation target, and took the center of the segmented result as the fovea location. EyeStar, IBME, and MedICAL treated the task as heatmap prediction: they generated the ground-truth heatmap with a Gaussian kernel, a strategy similar to the binary segmentation except that it is supervised by a soft target, a normal distribution centered on the fovea location. Different from them, FATRI-AI treated the task as detection: they generated a 160×160 square centered on the fovea location and used a YOLO network (redmon2016you) to detect the region.
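A minimal sketch of the two dense target encodings described above, with hypothetical values for the circle radius and the Gaussian sigma:

```python
# Dense targets for fovea localization: a binary circle (segmentation target)
# and a Gaussian heatmap (soft target), both centered on the fovea (cx, cy).
import numpy as np

def circle_mask(h, w, cx, cy, radius=32):
    ys, xs = np.mgrid[0:h, 0:w]
    return ((xs - cx) ** 2 + (ys - cy) ** 2) <= radius ** 2

def gaussian_heatmap(h, w, cx, cy, sigma=16.0):
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

mask = circle_mask(512, 512, cx=260, cy=248)
heat = gaussian_heatmap(512, 512, cx=260, cy=248)
```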

Almost half of the ranked teams utilized a coarse-to-fine multi-stage strategy, including SmartDSP, EyeStar, MedIPBIT, WZMedTech, and MedICAL. Most of them cropped a Region Of Interest (ROI) based on the coarse-stage predictions; the cropped region is then refined by the later stage. EyeStar proposed a more sophisticated architecture based on this strategy, named the Two-Stage Self-Adaptive localization Architecture (TSSAA). They first cropped multi-scale ROIs based on the coarse predictions, then fused the multi-scale ROIs and coarse-level features using sequential ROI Align layers, concatenation, self-attention modules (vaswani2017attention), and a fuse layer. An illustration of TSSAA is shown in Figure 7.

OD/OC Segmentation

The ranking of the OD/OC segmentation task is shown in Table 7. The results are evaluated by the Dice coefficient, the vertical cup-to-disc ratio (vCDR) MAE, and the OD/OC segmentation score (see Section 2.2) in the GAMMA Challenge; teams are ranked by the OD/OC segmentation score. The methods of the teams are summarized in Table 10. A standard UNet was adopted as the baseline for this task.

Rank Team Score Dice-disc(%) Dice-cup(%) vCDR MAE
1 Voxelcloud 8.36384 96.25 87.84 0.04292
2 DIAGNOS-ETS 8.3275 95.96 87.74 0.04411
3 WZMedTech 8.31621 96.11 88.04 0.04538
4 HZL 8.30093 95.83 88.00 0.04562
5 SmartDSP 8.28488 95.79 88.01 0.04642
6 MedICAL 8.27264 95.75 87.57 0.0464
7 IBME 8.2309 95.79 87.66 0.04887
8 FATRI_AI 8.18773 95.40 86.69 0.04917
9 MedIPBIT 8.15502 95.49 87.67 0.05258
10 EyeStar 8.07253 94.77 85.83 0.05326
Table 7: OD/OC segmentation ranking in the GAMMA Challenge.
Rank | Team | Members | Institute | Score
1 | SmartDSP | Jiongcheng Li, Lexing Huang, Senlin Cai, Yue Huang, Xinghao Ding | Xiamen University | 8.88892
2 | VoxelCloud | Qinji Yu, Sifan Song, Kang Dang, Wenxiu Shi, Jingqi Niu | Shanghai Jiao Tong University; Xi’an Jiaotong-Liverpool University; VoxelCloud Inc. | 8.83127
3 | EyeStar | Xinxing Xu, Shaohua Li, Xiaofeng Lei, Yanyu Xu, Yong Liu | Institute of High Performance Computing, A*STAR | 8.72345
4 | IBME | Wensai Wang, Lingxiao Wang | Chinese Academy of Medical Sciences and Peking Union Medical College | 8.70783
5 | MedIPBIT | Shuai Lu, Zeheng Li, Hang Tian, Shengzhu Yang, Jiapeng Wu | Beijing Institute of Technology | 8.70561
6 | HZL | Shihua Huang, Zhichao Lu | Hong Kong Polytechnic University; Southern University of Science and Technology | 8.68781
7 | WZMedTech | Chubin Ou, Xifei Wei, Yong Peng, Zhongrong Ye | Southern Medical University; Tianjin Medical University; Xinjiang University | 8.65384
8 | DIAGNOS-ETS | Adrian Galdran, Bingyuan Liu, José Dolz, Waziha Kabir, Riadh Kobbi, Ismail Ben Ayed | ETS Montreal; DIAGNOS Inc. | 8.59884
9 | MedICAL | Li Lin, Huaqing He, Zhiyuan Cai | Southern University of Science and Technology | 8.43841
10 | FATRI_AI | Qiang Zhou, Hu Qiang, Cheng Zheng, Tieshan Liu, Dongsheng Lu, Xinting Xiao | Suixin (Shanghai) Technology Co., LTD. | 8.27601
Table 8: Final rank of the GAMMA challenge.

Like the fovea localization task, all teams except HZL adopted a coarse-to-fine multi-stage strategy. Generally speaking, an OD ROI is first obtained through a coarse OD segmentation stage; the cropped OD patches are then sent to a subsequent fine-grain OD/OC segmentation network to obtain the final result. Different from the others, VoxelCloud utilized blood vessel information to improve the OD/OC segmentation: they first used a pre-trained model to obtain blood vessel segmentation masks of the fundus images, then concatenated the vessel masks with the fundus images as the input. An illustration of their method is shown in Figure 8.

Final Rank of the GAMMA challenge

To encourage teams to participate in all three tasks of the GAMMA challenge, the official rank was calculated as a weighted combination of the scores of the three competition tasks.

The released rank is shown in Table 8. The detailed leaderboards can be accessed on the GAMMA challenge website at https://aistudio.baidu.com/aistudio/competition/detail/90/0/leaderboard.

Figure 7: An illustration of TSSAA proposed by EyeStar for fovea localization. TSSAA first predicts a coarse heatmap in the coarse stage. Then multi-scale ROI is cropped from the raw image as the input of the subsequent refine stage. In the refine stage, the coarse-level features will also be aligned and fused again for the final prediction.
Figure 8: The coarse stage of the OD/OC segmentation model proposed by VoxelCloud. The blood vessel segmentation results predicted by a pre-trained network are concatenated with the fundus image as the input for coarse OD segmentation.
SmartDSP
  Architecture: EfficientNet-b4.
  Preprocessing: (i) Center crop to 2000×2000, padding when the height or width is less than 2000; (ii) resize to 224×224; (iii) default data augmentation.
  Ensemble: 2-fold ensemble by averaging.
  Method: Two-stage coordinate regression: (i) coarse localization, crop to 512×512; (ii) fine-grain localization.

VoxelCloud
  Architecture: TransUNet-like architecture.
  Preprocessing: (i) Remove black background; (ii) pad and resize to 512×512; (iii) default data augmentation + blur + JPEG compression + GaussNoise + coarse dropout.
  Ensemble: Ensemble the predictions of 30 models trained with different hyper-parameters.
  Method: Binary segmentation of the fovea-centered circle, supervised by the sum of binary cross-entropy, SoftDice, SSIM, IoU, and L1 losses.

EyeStar
  Architecture: Proposed Two-Stage Self-Adaptive localization Architecture (TSSAA).
  Preprocessing: (i) Resize to 998×998; (ii) crop to 896×896; (iii) resize to 224×224; (iv) default data augmentation.
  Method: Two-stage heatmap prediction: (i) coarse heatmap prediction, crop to multi-scale ROI; (ii) fine-grain localization fusing the multi-scale ROI and coarse-level features.

HZL
  Architecture: UNet with EfficientNet backbone.
  Preprocessing: Fundus: resize to 1024×1024. OCT: resize to 1024×1024. Default data augmentation.
  Ensemble: Pick the 5 best models on 5 different validation folds; ensemble the results by averaging.
  Method: A multi-task UNet jointly learns glaucoma grading, OD/OC segmentation, and fovea localization; the model is run recurrently for coarse-to-fine localization.

MedIPBIT
  Architecture: ResNet50 for coarse localization; ResNet101 for fine-grain localization.
  Preprocessing: Resize to 512×512.
  Method: Three-stage coordinate regression: (i) coarse localization, crop to the ROI; (ii) sequential two-stage fine-grain localization.

IBME
  Architecture: UNet with EfficientNetB5 backbone.
  Preprocessing: (i) Padding to 2000×2992; (ii) default data augmentation.
  Method: End-to-end heatmap prediction with maximum likelihood for the localization.

WZMedTech
  Architecture: HDRNet (xie2020end) for the first and second stages; ResNet50 for the third stage.
  Preprocessing: (i) Center crop to 1920×1920; (ii) resize to 224×224; (iii) default data augmentation.
  Method: Three-stage coordinate regression; the predicted ROI of the last stage is cropped as the input of the next stage.

DIAGNOS-ETS
  Architecture: Double stacked W-Net.
  Preprocessing: (i) Resize to 512×512; (ii) default data augmentation + color normalization.
  Ensemble: 4-fold temperature ensemble.
  Method: End-to-end binary segmentation of the fovea-centered circle.

MedICAL
  Architecture: ResNet50 for the coordinate regression branch; EfficientNet-B0 for the heatmap prediction branch.
  Preprocessing: (i) Pick the G channel of the RGB image; (ii) histogram equalization; (iii) default data augmentation.
  Ensemble: Ensemble the results of the heatmap branch and the coordinate regression branch: if their Euclidean distance is larger than 30, trust the regression result; otherwise, take the average of the two results.
  Method: Two stages: (i) coarse OD/macula segmentation, crop ROIs to 128×128 and 256×256; (ii) feed the 128×128 and 256×256 patches to a heatmap prediction network and a coordinate regression network, respectively, and fuse the results of the two branches for the final prediction.

FATRI-AI
  Architecture: YOLOv5s (redmon2016you).
  Preprocessing: (i) Crop black background; (ii) default data augmentation + Mosaic (chen2020dynamic) + Cutout.
  Method: End-to-end macular region detection; the macular region is generated as a 160×160 square centered on the fovea location.

Table 9: Summary of the fovea localization methods in the GAMMA Challenge.
SmartDSP
  Architecture: DeepLabv3 with ResNet34 encoder for coarse segmentation; DeepLabv3 with EfficientNet-b2 encoder for fine-grain segmentation.
  Preprocessing: (i) Crop to 512×512 centered on the highest-brightness point; (ii) default data augmentation.
  Ensemble: 2-fold ensemble by averaging.
  Method: Two stages: (i) coarse OD segmentation, cropping; (ii) fine-grain OD/OC segmentation.

VoxelCloud
  Architecture: TransUNet-like architecture for coarse segmentation; CENet, TransUNet, and Segtran for fine-grain segmentation.
  Preprocessing: (i) Resize to 512×512; (ii) default data augmentation.
  Ensemble: 5-fold ensemble by averaging for coarse segmentation; for fine-grain segmentation, ensemble the predictions of five folds, three networks, and two kinds of input by averaging.
  Method: Two stages: (i) coarse OD segmentation taking the fundus image concatenated with the blood vessel mask as input, then cropping; (ii) fine-grain OD/OC segmentation taking the cropped patches and polar-transformed patches as inputs. The model is supervised by BCE loss + Dice loss.

EyeStar
  Architecture: Segtran (li2021medical) with EfficientNet-B4 backbone.
  Preprocessing: (i) Crop to the 576×576 disc region with MNet DeepCDR (fu2018joint); (ii) resize to 288×288; (iii) default data augmentation.
  Method: Two stages: (i) coarse OD segmentation using a CNN, cropping; (ii) fine-grain OD/OC segmentation using Segtran.

HZL
  Architecture: UNet with EfficientNet backbone.
  Preprocessing: Fundus: resize to 1024×1024. OCT: resize to 1024×1024. Default data augmentation.
  Ensemble: Pick the 5 best models on 5 different validation folds; ensemble the results by averaging.
  Method: A multi-task UNet jointly learns glaucoma grading, OD/OC segmentation, and fovea localization; FAM (huang2021fapn) is adopted for better segmentation.

MedIPBIT
  Architecture: CNN-Transformer mixed UNet with a ResNet34 CNN backbone.
  Preprocessing: Resize to 512×512.
  Method: Two stages: (i) coarse OD segmentation, cropping; (ii) fine-grain OC segmentation.

IBME
  Architecture: UNet with EfficientNetB3 backbone for OC center localization; UNet with EfficientNetB6 backbone for fine-grain segmentation.
  Preprocessing: Default data augmentation.
  Method: Two stages: (i) OC center localization, crop the ROI to 512×512; (ii) fine-grain OD/OC segmentation.

WZMedTech
  Architecture: DeepLabv3 for coarse segmentation; TransUNet for fine-grain segmentation.
  Preprocessing: (i) Center crop to 1920×1920; (ii) default data augmentation.
  Ensemble: In the fine-grain stage, ensemble the models supervised by cross-entropy loss + boundary loss + Dice loss and those supervised by focal loss + Dice loss, by taking the average.
  Method: Two stages: (i) coarse OD segmentation, crop the ROI to 512×512; (ii) fine-grain OD/OC segmentation.

DIAGNOS-ETS
  Architecture: Double stacked W-Net.
  Preprocessing: (i) Resize to 512×512; (ii) default data augmentation + color normalization.
  Ensemble: In coarse OD segmentation, 4-fold ensemble by averaging; in fine-grain OD/OC segmentation, 4-fold temperature ensemble.
  Method: Two stages: (i) coarse OD segmentation, crop the ROI to 512×512; (ii) fine-grain OD/OC segmentation.

MedICAL
  Architecture: UNet with EfficientNet-B4 backbone.
  Preprocessing: (i) Resize to 512×512; (ii) default data augmentation.
  Method: Three stages: (i) coarse OD/macula segmentation, crop the OD ROI to 448×448; (ii) fine-grain OD/OC segmentation, crop the OC ROI to 256×256; (iii) fine-grain OC segmentation.

FATRI-AI
  Architecture: YOLOv5s for coarse segmentation; HRNet for fine-grain segmentation.
  Preprocessing: (i) Resize to 608×608; (ii) default data augmentation + Mosaic (chen2020dynamic) + Cutout.
  Method: Two stages: (i) coarse OD segmentation, crop the ROI to 512×512; (ii) fine-grain OD/OC segmentation; the final results are smoothed as ellipses.

Table 10: Summary of the OD/OC segmentation methods in the GAMMA Challenge.