COVD- and COVD-SSL Results

In addition to the M2U-Net architecture, we also evaluated the larger DRIU network and a variant of it that uses batch normalization (DRIU BN) on COVD- and COVD-SSL. Perhaps surprisingly, for the majority of combinations the DRIU variants perform roughly on par with or worse than M2U-Net. We suspect that one reason for this is the overparameterization of the large VGG16 models that are pretrained on ImageNet.

F1 Scores

Comparison of micro-level F1 scores of DRIU, DRIU BN and M2U-Net on COVD- and COVD-SSL. The standard deviation across test images is given in parentheses.

F1 score	DRIU/DRIUSSL	DRIUBN/DRIUBNSSL	M2UNet/M2UNetSSL
COVD-DRIVE	0.788 (0.018)	0.797 (0.019)	0.789 (0.018)
COVD-DRIVE_SSL	0.785 (0.018)	0.783 (0.019)	0.791 (0.014)
COVD-STARE	0.778 (0.117)	0.778 (0.122)	0.812 (0.046)
COVD-STARE_SSL	0.788 (0.102)	0.811 (0.074)	0.820 (0.044)
COVD-CHASEDB1	0.796 (0.027)	0.791 (0.025)	0.788 (0.024)
COVD-CHASEDB1_SSL	0.796 (0.024)	0.798 (0.025)	0.799 (0.026)
COVD-HRF	0.799 (0.044)	0.800 (0.045)	0.802 (0.045)
COVD-HRF_SSL	0.799 (0.044)	0.784 (0.048)	0.797 (0.044)
COVD-IOSTARVESSEL	0.791 (0.021)	0.777 (0.032)	0.793 (0.015)
COVD-IOSTARVESSEL_SSL	0.797 (0.017)	0.811 (0.074)	0.785 (0.018)
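
Each table entry reports a mean score and its spread across test images. The following minimal sketch shows one way such a "mean (std)" pair could be obtained from per-image F1 scores. It is an illustration, not the evaluation code used for these results: the function names `f1_score` and `summarize`, the flat 0/1 pixel lists and the toy data are all hypothetical, and the choice of the population standard deviation is an assumption, since the page does not say which form was used.

```python
from statistics import mean, pstdev


def f1_score(pred, truth):
    """F1 score for one image; pred and truth are flat sequences of 0/1 pixel labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    denom = 2 * tp + fp + fn
    # Convention (assumption): an image with no vessel pixels at all scores 0.
    return 2 * tp / denom if denom else 0.0


def summarize(image_pairs):
    """Return (mean, std) of the per-image F1 scores over a test set."""
    scores = [f1_score(pred, truth) for pred, truth in image_pairs]
    # Population standard deviation is an assumption of this sketch.
    return mean(scores), pstdev(scores)


# Toy test set of two 4-pixel "images" (hypothetical data).
toy_set = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # one false positive
    ([1, 1, 0, 0], [1, 1, 0, 0]),  # perfect prediction
]
avg, std = summarize(toy_set)
print(f"{avg:.3f} ({std:.3f})")
```

A large spread, such as the values reported for COVD-STARE, then indicates that a few test images score much lower than the rest rather than a uniformly lower quality.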

M2U-Net Precision vs. Recall Curves

Precision vs. recall curves for each evaluated dataset. Note that here the F1-score is calculated on a macro level (see paper for more details).

Fig. 1 CHASE_DB1: Precision vs Recall curve and F1 scores

Fig. 2 DRIVE: Precision vs Recall curve and F1 scores

Fig. 3 HRF: Precision vs Recall curve and F1 scores

Fig. 4 IOSTAR: Precision vs Recall curve and F1 scores

Fig. 5 STARE: Precision vs Recall curve and F1 scores
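
A precision vs. recall curve of this kind can be traced by thresholding the predicted vessel probabilities at a range of values. The sketch below illustrates the idea under stated assumptions: it reads "macro level" as averaging precision and recall over the test images at each threshold and then computing F1 from those averages. That reading is an assumption, and the paper remains the authority on the exact definition. The names `precision_recall` and `pr_curve`, the zero-division conventions and the toy data are hypothetical.

```python
def precision_recall(prob, truth, threshold):
    """Precision and recall for one image at a given probability threshold."""
    pred = [p >= threshold for p in prob]
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    # Conventions (assumptions): precision is 1 when nothing is predicted
    # positive, recall is 0 when the ground truth has no positive pixels.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pr_curve(images, thresholds):
    """Return one (threshold, avg_precision, avg_recall, f1) tuple per threshold."""
    curve = []
    for th in thresholds:
        pairs = [precision_recall(prob, truth, th) for prob, truth in images]
        avg_p = sum(p for p, _ in pairs) / len(pairs)
        avg_r = sum(r for _, r in pairs) / len(pairs)
        # F1 from the averaged precision and recall (assumed "macro" reading).
        f1 = 2 * avg_p * avg_r / (avg_p + avg_r) if avg_p + avg_r else 0.0
        curve.append((th, avg_p, avg_r, f1))
    return curve


# Toy probability maps and ground truths (hypothetical data).
images = [
    ([0.9, 0.6, 0.2, 0.1], [1, 1, 0, 0]),
    ([0.8, 0.4, 0.7, 0.1], [1, 1, 0, 0]),
]
for th, p, r, f1 in pr_curve(images, [0.3, 0.5, 0.75]):
    print(f"threshold={th:.2f} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Under this reading, the F1 score attached to a curve would typically be the maximum F1 over all thresholds, which is why it can differ from the per-image averages in the table above.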