COVD- and COVD-SSL Results

In addition to the M2U-Net architecture, we also evaluated the larger DRIU network and a variant of it that contains batch normalization (DRIU BN) on COVD- and COVD-SSL. Perhaps surprisingly, for the majority of combinations the DRIU variants perform roughly on par with or worse than M2U-Net. We suspect that one reason for this is overparameterization of the large VGG16 models that are pretrained on ImageNet.

F1 Scores

Comparison of F1 micro scores of DRIU, DRIU BN, and M2U-Net on COVD- and COVD-SSL. The standard deviation across test images is given in parentheses.

=====================  =============  ================  ================
F1 score               DRIU/DRIUSSL   DRIUBN/DRIUBNSSL  M2UNet/M2UNetSSL
=====================  =============  ================  ================
COVD-DRIVE             0.788 (0.018)  0.797 (0.019)     0.789 (0.018)
COVD-DRIVE_SSL         0.785 (0.018)  0.783 (0.019)     0.791 (0.014)
COVD-STARE             0.778 (0.117)  0.778 (0.122)     0.812 (0.046)
COVD-STARE_SSL         0.788 (0.102)  0.811 (0.074)     0.820 (0.044)
COVD-CHASEDB1          0.796 (0.027)  0.791 (0.025)     0.788 (0.024)
COVD-CHASEDB1_SSL      0.796 (0.024)  0.798 (0.025)     0.799 (0.026)
COVD-HRF               0.799 (0.044)  0.800 (0.045)     0.802 (0.045)
COVD-HRF_SSL           0.799 (0.044)  0.784 (0.048)     0.797 (0.044)
COVD-IOSTARVESSEL      0.791 (0.021)  0.777 (0.032)     0.793 (0.015)
COVD-IOSTARVESSEL_SSL  0.797 (0.017)  0.811 (0.074)     0.785 (0.018)
=====================  =============  ================  ================
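
To make the "score (std)" entries concrete, here is a minimal sketch of how a mean F1 score and its standard deviation across test images could be computed from binarized vessel masks. It is not the evaluation code behind the table: the function names `f1_score` and `summarize` are hypothetical, the toy masks are invented for illustration, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def f1_score(pred, truth):
    """F1 score for two equal-length sequences of 0/1 pixel labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    # Convention chosen here: an image with no vessel pixels at all scores 0.
    return 2 * tp / denom if denom else 0.0


def summarize(per_image_scores):
    """Return (mean, std) across images; population std is an assumption."""
    return mean(per_image_scores), pstdev(per_image_scores)


# Toy flattened masks for three hypothetical test images (not real data).
preds = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
truths = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]

scores = [f1_score(p, t) for p, t in zip(preds, truths)]
avg, std = summarize(scores)
print(f"{avg:.3f} ({std:.3f})")  # same "score (std)" layout as the table
```

A large value in parentheses, as in the COVD-STARE rows, indicates that the score varies strongly from one test image to the next, which a single mean would hide.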

M2U-Net Precision vs. Recall Curves

Precision vs. recall curves for each evaluated dataset. Note that here the F1-score is calculated on a macro level (see paper for more details).
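
As an illustration of how such a curve can be traced, the sketch below sweeps a threshold over per-pixel vessel probabilities, averages precision and recall over the images at each threshold, and derives an F1 score from the averaged values. Treating "macro level" as averaging per-image precision and recall before computing F1 is an assumption made for this example; the paper defines the exact procedure. All function names and the toy data are hypothetical.

```python
def confusion_counts(probs, truth, threshold):
    """Count TP, FP, FN for one image after thresholding its probabilities."""
    tp = fp = fn = 0
    for p, t in zip(probs, truth):
        predicted = p >= threshold
        if predicted and t:
            tp += 1
        elif predicted and not t:
            fp += 1
        elif not predicted and t:
            fn += 1
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    """Precision and recall; precision is set to 1.0 when nothing is predicted."""
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1(precision, recall):
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def pr_curve(prob_maps, truths, thresholds):
    """One (threshold, precision, recall, F1) tuple per threshold.

    Precision and recall are averaged over images first (assumed reading
    of "macro level"), then F1 is computed from the averages.
    """
    curve = []
    for thr in thresholds:
        pairs = [
            precision_recall(*confusion_counts(p, t, thr))
            for p, t in zip(prob_maps, truths)
        ]
        prec = sum(a for a, _ in pairs) / len(pairs)
        rec = sum(b for _, b in pairs) / len(pairs)
        curve.append((thr, prec, rec, f1(prec, rec)))
    return curve


# Toy flattened probability maps for two hypothetical images (not real data).
prob_maps = [[0.9, 0.6, 0.2, 0.1], [0.8, 0.4, 0.3, 0.7]]
truths = [[1, 1, 0, 0], [1, 1, 0, 0]]

curve = pr_curve(prob_maps, truths, thresholds=[0.25, 0.5, 0.75])
best = max(curve, key=lambda row: row[3])  # operating point with highest F1
for thr, prec, rec, score in curve:
    print(f"thr={thr:.2f} precision={prec:.3f} recall={rec:.3f} F1={score:.3f}")
```

Plotting recall against precision for every threshold yields a curve of the kind shown in the figures, and the row with the highest F1 marks the operating point a figure legend would typically report.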

Fig. 1 CHASE_DB1: Precision vs Recall curve and F1 scores

Fig. 2 DRIVE: Precision vs Recall curve and F1 scores

Fig. 3 HRF: Precision vs Recall curve and F1 scores

Fig. 4 IOSTAR: Precision vs Recall curve and F1 scores

Fig. 5 STARE: Precision vs Recall curve and F1 scores