COVD- and COVD-SSL Results

In addition to the M2U-Net architecture, we also evaluated the larger DRIU network and a variant of it with batch normalization (DRIU+BN) on COVD- (Combined Vessel Dataset: all training data minus the target test set) and COVD-SSL (COVD- plus Semi-Supervised Learning). Perhaps surprisingly, for the majority of combinations, the performance of the DRIU variants is roughly equal to or worse than that of the much smaller M2U-Net. We suspect that one reason could be overparameterization of the large VGG-16 models that are pretrained on ImageNet.

F1 Scores

Comparison of micro-level F1 scores of DRIU, DRIU+BN, and M2U-Net on COVD- and COVD-SSL. The standard deviation across test images is given in parentheses.

F1 score	`DRIU`/`DRIU@SSL`	`DRIU+BN`/`DRIU+BN@SSL`	`M2U-Net`/`M2U-Net@SSL`
`COVD-DRIVE`	0.788 (0.018)	0.797 (0.019)	0.789 (0.018)
`COVD-DRIVE+SSL`	0.785 (0.018)	0.783 (0.019)	0.791 (0.014)
`COVD-STARE`	0.778 (0.117)	0.778 (0.122)	0.812 (0.046)
`COVD-STARE+SSL`	0.788 (0.102)	0.811 (0.074)	0.820 (0.044)
`COVD-CHASEDB1`	0.796 (0.027)	0.791 (0.025)	0.788 (0.024)
`COVD-CHASEDB1+SSL`	0.796 (0.024)	0.798 (0.025)	0.799 (0.026)
`COVD-HRF`	0.799 (0.044)	0.800 (0.045)	0.802 (0.045)
`COVD-HRF+SSL`	0.799 (0.044)	0.784 (0.048)	0.797 (0.044)
`COVD-IOSTAR-VESSEL`	0.791 (0.021)	0.777 (0.032)	0.793 (0.015)
`COVD-IOSTAR-VESSEL+SSL`	0.797 (0.017)	0.811 (0.074)	0.785 (0.018)
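
To make the "score (standard deviation)" entries in the table above concrete, the following is a minimal sketch of one way such a summary could be computed. It assumes each entry is the mean of per-image F1 scores with the spread taken across test images. The function names `f1_score` and `summarize_f1` are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption; the paper is the reference for the exact averaging used.

```python
import numpy as np


def f1_score(pred, target):
    """F1 score for one pair of binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.count_nonzero(pred & target)    # vessel pixels found
    fp = np.count_nonzero(pred & ~target)   # background marked as vessel
    fn = np.count_nonzero(~pred & target)   # vessel pixels missed
    denom = 2 * tp + fp + fn
    # Convention chosen here: an image with no positives at all scores 0.
    return 2 * tp / denom if denom else 0.0


def summarize_f1(preds, targets, ddof=1):
    """Mean and standard deviation of per-image F1 scores.

    ddof=1 (sample standard deviation) is an assumption of this sketch.
    """
    scores = np.array([f1_score(p, t) for p, t in zip(preds, targets)])
    return scores.mean(), scores.std(ddof=ddof)
```

Calling `summarize_f1` on the thresholded predictions and ground-truth masks of a test set returns a `(mean, std)` pair that can be formatted like the table cells. With `ddof=1` at least two test images are needed for the standard deviation to be defined.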

M2U-Net Precision vs. Recall Curves

Precision vs. recall curves for each evaluated dataset. Note that here the F1-score is calculated on a macro level (see paper for more details).
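
As an illustration of how a precision vs. recall curve is obtained from a probability map, the sketch below sweeps a set of thresholds over the predicted vessel probabilities of a single image. The function names `precision_recall_curve` and `best_f1` are hypothetical, as is the convention for empty predictions. How the curves are combined across images for the macro-level F1 score is described in the paper and is not reproduced here.

```python
import numpy as np


def precision_recall_curve(probs, target, thresholds):
    """Precision and recall of one probability map at each threshold."""
    probs = np.asarray(probs, dtype=float)
    target = np.asarray(target, dtype=bool)
    precision, recall = [], []
    for t in thresholds:
        pred = probs >= t                       # binarize at this threshold
        tp = np.count_nonzero(pred & target)
        fp = np.count_nonzero(pred & ~target)
        fn = np.count_nonzero(~pred & target)
        # Assumed conventions: precision is 1 when nothing is predicted,
        # recall is 0 when the ground truth has no positives.
        precision.append(tp / (tp + fp) if tp + fp else 1.0)
        recall.append(tp / (tp + fn) if tp + fn else 0.0)
    return np.array(precision), np.array(recall)


def best_f1(precision, recall):
    """Highest F1 along the curve and the index of its threshold."""
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(denom), where=denom > 0)
    i = int(np.argmax(f1))
    return float(f1[i]), i
```

Plotting `recall` against `precision` gives one curve per image, and `best_f1` marks the operating point with the highest F1 score on that curve.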

Fig. 18 CHASE_DB1: Precision vs. recall curve and F1 scores

Fig. 19 DRIVE: Precision vs. recall curve and F1 scores

Fig. 20 HRF: Precision vs. recall curve and F1 scores

Fig. 21 IOSTAR: Precision vs. recall curve and F1 scores

Fig. 22 STARE: Precision vs. recall curve and F1 scores