.. -*- coding: utf-8 -*-

.. _bob.ip.binseg.results.old:

.. todo:: This section is outdated and needs re-factoring.


============================
 COVD- and COVD-SSL Results
============================

In addition to the M2U-Net architecture, we also evaluated the larger DRIU
network and a variation of it that contains batch normalization (DRIU+BN) on
COVD- (Combined Vessel Dataset from all training data minus the target test
set) and COVD-SSL (COVD- plus Semi-Supervised Learning).  Perhaps
surprisingly, for the majority of combinations, the performance of the DRIU
variants is roughly equal to or worse than that of the much smaller M2U-Net.
We anticipate that one reason for this could be the over-parameterization of
large VGG-16 models that are pre-trained on ImageNet.


F1 Scores
---------

Comparison of F1 scores (micro-level) of DRIU and M2U-Net on COVD- and
COVD-SSL.  The standard deviation across test images is given in brackets.

.. list-table::
   :header-rows: 1

   * - F1 score
     - DRIU / DRIU@SSL
     - DRIU+BN / DRIU+BN@SSL
     - M2U-Net / M2U-Net@SSL
   * - COVD-DRIVE
     - 0.788 (0.018)
     - 0.797 (0.019)
     - 0.789 (0.018)
   * - COVD-DRIVE+SSL
     - 0.785 (0.018)
     - 0.783 (0.019)
     - 0.791 (0.014)
   * - COVD-STARE
     - 0.778 (0.117)
     - 0.778 (0.122)
     - 0.812 (0.046)
   * - COVD-STARE+SSL
     - 0.788 (0.102)
     - 0.811 (0.074)
     - 0.820 (0.044)
   * - COVD-CHASEDB1
     - 0.796 (0.027)
     - 0.791 (0.025)
     - 0.788 (0.024)
   * - COVD-CHASEDB1+SSL
     - 0.796 (0.024)
     - 0.798 (0.025)
     - 0.799 (0.026)
   * - COVD-HRF
     - 0.799 (0.044)
     - 0.800 (0.045)
     - 0.802 (0.045)
   * - COVD-HRF+SSL
     - 0.799 (0.044)
     - 0.784 (0.048)
     - 0.797 (0.044)
   * - COVD-IOSTAR-VESSEL
     - 0.791 (0.021)
     - 0.777 (0.032)
     - 0.793 (0.015)
   * - COVD-IOSTAR-VESSEL+SSL
     - 0.797 (0.017)
     - 0.811 (0.074)
     - 0.785 (0.018)


M2U-Net Precision vs. Recall Curves
-----------------------------------

Precision vs. recall curves for each evaluated dataset.  Note that here the
F1 score is calculated on a macro level (see the paper for more details).

.. figure:: pr_CHASEDB1.png
   :scale: 50 %
   :align: center
   :alt: model comparisons

   CHASE_DB1: precision vs. recall curve and F1 scores

.. figure:: pr_DRIVE.png
   :scale: 50 %
   :align: center
   :alt: model comparisons

   DRIVE: precision vs. recall curve and F1 scores

.. figure:: pr_HRF.png
   :scale: 50 %
   :align: center
   :alt: model comparisons

   HRF: precision vs. recall curve and F1 scores

.. figure:: pr_IOSTARVESSEL.png
   :scale: 50 %
   :align: center
   :alt: model comparisons

   IOSTAR: precision vs. recall curve and F1 scores

.. figure:: pr_STARE.png
   :scale: 50 %
   :align: center
   :alt: model comparisons

   STARE: precision vs. recall curve and F1 scores

.. include:: ../../links.rst
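The distinction between the micro-averaged F1 scores in the table above and the macro-averaged F1 scores used for the precision-recall curves can be sketched as follows.  This is an illustrative example only, not the package's evaluation code; the function names and the toy pixel data are hypothetical.

```python
# Micro- vs. macro-averaged F1 over a set of binarized test images.
# Each image is a pair (pred, gt) of flat 0/1 pixel lists.
# Illustrative sketch only -- names and data are hypothetical.

def counts(pred, gt):
    """Return (tp, fp, fn) for one image."""
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 1)
    return tp, fp, fn

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_f1(images):
    # Pool confusion counts over all images, then compute a single F1:
    # large images (more pixels) weigh more heavily in the result.
    tp = fp = fn = 0
    for pred, gt in images:
        t, p, n = counts(pred, gt)
        tp, fp, fn = tp + t, fp + p, fn + n
    return f1(tp, fp, fn)

def macro_f1(images):
    # Compute F1 per image, then average: every image weighs equally.
    scores = [f1(*counts(pred, gt)) for pred, gt in images]
    return sum(scores) / len(scores)
```

On imbalanced test sets the two averages can differ noticeably, which is why the table and the curves below are not directly comparable.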