Command-Line Interface (CLI)¶
This package provides a single entry point for all of its applications using Bob’s unified CLI mechanism. A list of available applications can be retrieved using:
$ bob binseg --help
Usage: bob binseg [OPTIONS] COMMAND [ARGS]...
Binary 2D Image Segmentation Benchmark commands.
Options:
-h, -?, --help Show this message and exit.
Commands:
analyze Runs a complete evaluation from prediction to comparison...
compare Compares multiple systems together
config Commands for listing, describing and copying configuration...
dataset Commands for listing and verifying datasets
evaluate Evaluates an FCN on a binary segmentation task.
experiment Runs a complete experiment, from training, to prediction and...
predict Predicts vessel map (probabilities) on input images...
significance Evaluates how significantly different are two models on the...
train Trains an FCN to perform binary segmentation...
Setup¶
This CLI application lists and checks installed (raw) datasets.
$ bob binseg dataset --help
Usage: bob binseg dataset [OPTIONS] COMMAND [ARGS]...
Commands for listing and verifying datasets
Options:
-h, -?, --help Show this message and exit.
Commands:
check Checks file access on one or more datasets
list Lists all supported and configured datasets
List available datasets¶
Lists supported and configured raw datasets.
$ bob binseg dataset list --help
Usage: bob binseg dataset list [OPTIONS]
Lists all supported and configured datasets
Options:
-v, --verbose Increase the verbosity level from 0 (only error messages) to
1 (warnings), 2 (log messages), 3 (debug information) by
adding the --verbose option as often as desired (e.g. '-vvv'
for debug).
-?, -h, --help Show this message and exit.
Examples:
1. To install a dataset, set up its data directory ("datadir"). For
example, to set up access to DRIVE files you downloaded locally at
the directory "/path/to/drive/files", do the following:
$ bob config set "bob.ip.binseg.drive.datadir" "/path/to/drive/files"
Notice this setting **is** case-sensitive.
2. List all raw datasets supported (and configured):
$ bob binseg dataset list
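After configuring a data directory, you can confirm the recorded value before running any checks. This is a minimal sketch; it assumes the ``get`` subcommand of the ``bob config`` application (from bob.extension) is available in your installation:
$ bob config set "bob.ip.binseg.drive.datadir" "/path/to/drive/files"
# echo the stored value back to confirm the setting (assumes 'bob config
# get' exists in your version of bob.extension):
$ bob config get "bob.ip.binseg.drive.datadir"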
Check available datasets¶
Checks if we can load all files listed for a given dataset (all subsets in all protocols).
$ bob binseg dataset check --help
Usage: bob binseg dataset check [OPTIONS] [DATASET]...
Checks file access on one or more datasets
Options:
-l, --limit INTEGER RANGE Limit check to the first N samples in each
dataset, making the check considerably
faster. Set it to zero to check everything. [required]
-v, --verbose Increase the verbosity level from 0 (only error
messages) to 1 (warnings), 2 (log messages), 3
(debug information) by adding the --verbose
option as often as desired (e.g. '-vvv' for
debug).
-h, -?, --help Show this message and exit.
Examples:
1. Check if all files of the DRIVE dataset can be loaded:
$ bob binseg dataset check -vv drive
2. Check if all files of multiple installed datasets can be loaded:
$ bob binseg dataset check -vv drive stare
3. Check if all files of all installed datasets can be loaded:
$ bob binseg dataset check
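For a quick sanity check on large datasets, you may combine the ``--limit`` option documented above with increased verbosity. This sketch uses only options listed in the help output:
# check only the first 2 samples of each subset, with warnings and log
# messages enabled:
$ bob binseg dataset check --limit=2 -vv drive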
Preset Configuration Resources¶
A CLI application allows one to list, inspect and copy available configuration resources exported by this package.
$ bob binseg config --help
Usage: bob binseg config [OPTIONS] COMMAND [ARGS]...
Commands for listing, describing and copying configuration resources
Options:
-?, -h, --help Show this message and exit.
Commands:
copy Copies a specific configuration resource so it can be modified...
describe Describes a specific configuration file
list Lists configuration files installed
Listing Resources¶
$ bob binseg config list --help
Usage: bob binseg config list [OPTIONS]
Lists configuration files installed
Options:
-v, --verbose Increase the verbosity level from 0 (only error messages) to
1 (warnings), 2 (log messages), 3 (debug information) by
adding the --verbose option as often as desired (e.g. '-vvv'
for debug).
-h, -?, --help Show this message and exit.
Examples:
1. Lists all configuration resources (type: bob.ip.binseg.config) installed:
$ bob binseg config list
2. Lists all configuration resources and their descriptions (notice this may
be slow as it needs to load all modules once):
$ bob binseg config list -v
Available Resources¶
Here is a list of all resources currently exported.
$ bob binseg config list -v
module: bob.ip.binseg.configs.datasets
chasedb1 CHASE-DB1 dataset for Vessel Segmentation (first-anno...
chasedb1-2nd CHASE-DB1 dataset for Vessel Segmentation (second-ann...
chasedb1-covd COVD-CHASEDB1 for Vessel Segmentation
chasedb1-mtest CHASE-DB1 cross-evaluation dataset with matched resol...
chasedb1-ssl COVD-CHASE-DB1 + SSL for Vessel Segmentation
chasedb1-xtest CHASE-DB1 cross-evaluation dataset
csv-dataset-example Example CSV-based custom filelist dataset
drionsdb DRIONS-DB for Optic Disc Segmentation (expert #1 anno...
drionsdb-2nd DRIONS-DB for Optic Disc Segmentation (expert #2 anno...
drishtigs1-cup DRISHTI-GS1 dataset for Cup Segmentation (agreed by a...
drishtigs1-cup-any DRISHTI-GS1 dataset for Cup Segmentation (agreed by a...
drishtigs1-disc DRISHTI-GS1 dataset for Optic Disc Segmentation (agre...
drishtigs1-disc-any DRISHTI-GS1 dataset for Optic Disc Segmentation (agre...
drive DRIVE dataset for Vessel Segmentation (default protoc...
drive-2nd DRIVE dataset for Vessel Segmentation (second annotat...
drive-covd COVD-DRIVE for Vessel Segmentation
drive-mtest DRIVE cross-evaluation dataset with matched resolutio...
drive-ssl COVD-DRIVE + SSL for Vessel Segmentation
drive-xtest DRIVE cross-evaluation dataset
hrf HRF dataset for Vessel Segmentation (default protocol...
hrf-covd COVD-HRF for Vessel Segmentation
hrf-highres HRF dataset for Vessel Segmentation (default protocol...
hrf-mtest HRF cross-evaluation dataset with matched resolution
hrf-ssl COVD-HRF + SSL for Vessel Segmentation
hrf-xtest HRF cross-evaluation dataset
iostar-disc IOSTAR dataset for Optic Disc Segmentation (default p...
iostar-vessel IOSTAR dataset for Vessel Segmentation (default proto...
iostar-vessel-covd COVD-IOSTAR for Vessel Segmentation
iostar-vessel-mtest IOSTAR vessel cross-evaluation dataset with matched r...
iostar-vessel-ssl COVD-IOSTAR + SSL for Vessel Segmentation
iostar-vessel-xtest IOSTAR vessel cross-evaluation dataset
refuge-cup REFUGE dataset for Optic Cup Segmentation (default pr...
refuge-disc REFUGE dataset for Optic Disc Segmentation (default p...
rimoner3-cup RIM-ONE r3 for Optic Cup Segmentation (expert #1 anno...
rimoner3-cup-2nd RIM-ONE r3 for Optic Cup Segmentation (expert #2 anno...
rimoner3-disc RIM-ONE r3 for Optic Disc Segmentation (expert #1 ann...
rimoner3-disc-2nd RIM-ONE r3 for Optic Disc Segmentation (expert #2 ann...
stare STARE dataset for Vessel Segmentation (annotator AH)
stare-2nd STARE dataset for Vessel Segmentation (annotator VK)
stare-covd COVD-STARE for Vessel Segmentation
stare-mtest STARE cross-evaluation dataset with matched resolutio...
stare-ssl COVD-STARE + SSL (training set) for Vessel Segmentati...
stare-xtest STARE cross-evaluation dataset
module: bob.ip.binseg.configs.models
driu DRIU Network for Vessel Segmentation
driu-bn DRIU Network for Vessel Segmentation with Batch Normalization
driu-bn-ssl DRIU Network for Vessel Segmentation using SSL and Batch Norm...
driu-od DRIU Network for Optic Disc Segmentation
driu-ssl DRIU Network for Vessel Segmentation using SSL
hed HED Network for image segmentation
m2unet MobileNetV2 U-Net model for image segmentation
m2unet-ssl MobileNetV2 U-Net model for image segmentation using SSL
resunet Residual U-Net for image segmentation
unet U-Net for image segmentation
Describing a Resource¶
$ bob binseg config describe --help
Usage: bob binseg config describe [OPTIONS] NAME...
Describes a specific configuration file
Options:
-v, --verbose Increase the verbosity level from 0 (only error messages) to
1 (warnings), 2 (log messages), 3 (debug information) by
adding the --verbose option as often as desired (e.g. '-vvv'
for debug).
-?, -h, --help Show this message and exit.
Examples:
1. Describes the DRIVE (training) dataset configuration:
$ bob binseg config describe drive
2. Describes the DRIVE (training) dataset configuration and lists its
contents:
$ bob binseg config describe drive -v
Copying a Resource¶
You may use this command to locally copy a resource file so you can change it.
$ bob binseg config copy --help
Usage: bob binseg config copy [OPTIONS] SOURCE DESTINATION
Copies a specific configuration resource so it can be modified locally
Options:
-v, --verbose Increase the verbosity level from 0 (only error messages) to
1 (warnings), 2 (log messages), 3 (debug information) by
adding the --verbose option as often as desired (e.g. '-vvv'
for debug).
-h, -?, --help Show this message and exit.
Examples:
1. Makes a copy of one of the stock configuration files locally, so it can be
adapted:
$ bob binseg config copy drive -vvv newdataset.py
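Once copied, the local file can be edited and passed back, in place of a preset resource name, to any command that accepts CONFIG arguments (e.g. ``bob binseg train``). A sketch, assuming you adapted ``newdataset.py`` to your own data:
$ bob binseg config copy drive newdataset.py
# edit newdataset.py to point to your data and transforms, then:
$ bob binseg train -vv m2unet ./newdataset.py --epochs=2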
Running and Analyzing Experiments¶
These applications run a combined set of steps in one go. They work well with our preset configuration resources.
Running a Full Experiment Cycle¶
This command can run training, prediction, evaluation and comparison from a single, multi-step application.
$ bob binseg experiment --help
Usage: bob binseg experiment [OPTIONS] [CONFIG]...
Runs a complete experiment, from training, to prediction and evaluation
This script is just a wrapper around the individual scripts for
training, running prediction, evaluating and comparing FCN model
performance. It organises the output in a preset way::
└─ <output-folder>/
   ├── model/            # the generated model will be here
   ├── predictions/      # the prediction outputs for the train/test set
   ├── overlayed/        # the overlayed outputs for the train/test set
   │   ├── predictions/  # predictions overlayed on the input images
   │   ├── analysis/     # predictions overlayed on the input images,
   │   │                 # including analysis of false positives, negatives
   │   │                 # and true positives
   │   └── second-annotator/  # if set, store overlayed images for the
   │                          # second annotator here
   └── analysis/         # the outputs of the analysis of both train/test
                         # sets; includes second-annotator "measures" as
                         # well, if configured
Training is performed for a configurable number of epochs, and
generates at least a final_model.pth. It may also generate a
number of intermediate checkpoints. Checkpoints are model files
(.pth files) that are stored during the training and useful to
resume the procedure in case it stops abruptly.
N.B.: The tool is designed to prevent analysis bias and allows one to
provide separate subsets for training and evaluation. Instead of
using simple datasets, datasets for full experiment running should
be dictionaries with specific subset names:
* ``__train__``: dataset used preferentially for training. It is
  typically the dataset containing data augmentation pipelines.
* ``__valid__``: dataset used for validation. It is typically
  disjoint from the training and test sets. In such a case, we
  checkpoint the model with the lowest loss on the validation set
  as well, throughout all the training, besides the model at the
  end of training.
* ``train`` (optional): a copy of the ``__train__`` dataset, without
  data augmentation, that will be evaluated alongside other sets
  available.
* ``*``: any other name, not starting with an underscore character
  (``_``), will be considered a test set for evaluation.
N.B.2: The threshold used for calculating the F1-score on the test
set, or overlay analysis (false positives, negatives and true
positives overprinted on the original image) also follows the
logic above.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names) as CONFIG arguments
to the command line. These files contain the parameters listed below as
Python variables. Options passed on the command line (see below) override
the values set in configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store experiment outputs
(created if does not exist) [required]
-m, --model TEXT A torch.nn.Module instance implementing the
network to be trained, and then evaluated
[required]
-d, --dataset TEXT A dictionary mapping string keys to
bob.ip.binseg.data.utils.SampleList2TorchDataset's.
At least one key named 'train' must be
available. This dataset will be used for
training the network model. All other
datasets will be used for prediction and
evaluation. Dataset descriptions include all
required pre-processing, including any data
augmentation, which may be excluded for
prediction and evaluation purposes
[required]
-S, --second-annotator TEXT A dataset or dictionary, like in --dataset,
with the same sample keys, but with
annotations from a different annotator that
is going to be compared to the one in
--dataset
--optimizer TEXT A torch.optim.Optimizer that will be used to
train the network [required]
--criterion TEXT A loss function to compute the FCN error for
every sample respecting the PyTorch API for
loss functions (see torch.nn.modules.loss)
[required]
--scheduler TEXT A learning rate scheduler that drives
changes in the learning rate depending on
the FCN state (see torch.optim.lr_scheduler)
[required]
-b, --batch-size INTEGER RANGE Number of samples in every batch (this
parameter affects memory requirements for
the network). If the number of samples in
the batch is larger than the total number of
samples available for training, this value
is truncated. If this number is smaller,
then batches of the specified size are
created and fed to the network until there
are no more new samples to feed (epoch is
finished). If the total number of training
samples is not a multiple of the batch-size,
the last batch will be smaller than the
first, unless --drop-incomplete-batch is
set, in which case this batch is not used.
[default: 2; required]
-D, --drop-incomplete-batch / --no-drop-incomplete-batch
If set, then may drop the last batch in an
epoch, in case it is incomplete. If you set
this option, you should also consider
increasing the total number of epochs of
training, as the total number of training
steps may be reduced [default: False;
required]
-e, --epochs INTEGER RANGE Number of epochs (complete training set
passes) to train for [default: 1000;
required]
-p, --checkpoint-period INTEGER RANGE
Number of epochs after which a checkpoint is
saved. A value of zero will disable
checkpointing. If checkpointing is enabled and
training stops, it is automatically resumed
from the last saved checkpoint if training
is restarted with the same configuration.
[default: 0; required]
-d, --device TEXT A string indicating the device to use (e.g.
"cpu" or "cuda:0") [default: cpu; required]
-s, --seed INTEGER RANGE Seed to use for the random number generator
[default: 42]
--ssl / --no-ssl Switch ON/OFF semi-supervised training mode
[default: False; required]
-r, --rampup INTEGER RANGE Ramp-up length in epochs (for SSL training
only) [default: 900; required]
-O, --overlayed / --no-overlayed
Creates overlayed representations of the
output probability maps, similar to
--overlayed in prediction-mode, except it
includes distinctive colours for true and
false positives and false negatives. If not
set, or empty then do **NOT** output
overlayed images. [default: False]
-S, --steps INTEGER This number is used to define the number of
threshold steps to consider when evaluating
the highest possible F1-score on test data.
[default: 1000; required]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-?, -h, --help Show this message and exit.
Examples:
1. Trains an M2U-Net model (MobileNetV2 backbone) with DRIVE (vessel
segmentation), on the CPU, for only two epochs, then runs inference and
evaluation on stock datasets, reporting performance as a table and a figure:
$ bob binseg experiment -vv m2unet drive --epochs=2
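A more realistic cycle typically pins the output folder and runs on a GPU. The following sketch composes only options documented above; all paths are placeholders:
$ bob binseg experiment -vv m2unet drive --epochs=600 --batch-size=4 --device="cuda:0" --output-folder=path/to/results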
Running Complete Experiment Analysis¶
This command can run prediction, evaluation and comparison from a single, multi-step application.
$ bob binseg analyze --help
Usage: bob binseg analyze [OPTIONS] [CONFIG]...
Runs a complete evaluation from prediction to comparison
This script is just a wrapper around the individual scripts for
running prediction and evaluating FCN models. It organises the
output in a preset way::
└─ <output-folder>/
   ├── predictions/      # the prediction outputs for the train/test set
   ├── overlayed/        # the overlayed outputs for the train/test set
   │   ├── predictions/  # predictions overlayed on the input images
   │   ├── analysis/     # predictions overlayed on the input images,
   │   │                 # including analysis of false positives, negatives
   │   │                 # and true positives
   │   └── second-annotator/  # if set, store overlayed images for the
   │                          # second annotator here
   └── analysis/         # the outputs of the analysis of both train/test
                         # sets; includes second-annotator "measures" as
                         # well, if configured
N.B.: The tool is designed to prevent analysis bias and allows one to
provide separate subsets for training and evaluation. Instead of
using simple datasets, datasets for full experiment running should
be dictionaries with specific subset names:
* ``__train__``: dataset used preferentially for training. It is
  typically the dataset containing data augmentation pipelines.
* ``train`` (optional): a copy of the ``__train__`` dataset, without
  data augmentation, that will be evaluated alongside other sets
  available.
* ``*``: any other name, not starting with an underscore character
  (``_``), will be considered a test set for evaluation.
N.B.2: The threshold used for calculating the F1-score on the test
set, or overlay analysis (false positives, negatives and true
positives overprinted on the original image) also follows the
logic above.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names) as CONFIG arguments
to the command line. These files contain the parameters listed below as
Python variables. Options passed on the command line (see below) override
the values set in configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store experiment outputs
(created if does not exist) [required]
-m, --model TEXT A torch.nn.Module instance implementing the
network to be trained, and then evaluated
[required]
-d, --dataset TEXT A dictionary mapping string keys to
bob.ip.binseg.data.utils.SampleList2TorchDataset's.
At least one key named 'train' must be
available. This dataset will be used for
training the network model. All other
datasets will be used for prediction and
evaluation. Dataset descriptions include all
required pre-processing, including any data
augmentation, which may be excluded for
prediction and evaluation purposes
[required]
-S, --second-annotator TEXT A dataset or dictionary, like in --dataset,
with the same sample keys, but with
annotations from a different annotator that
is going to be compared to the one in
--dataset
-b, --batch-size INTEGER RANGE Number of samples in every batch (this
parameter affects memory requirements for
the network). If the number of samples in
the batch is larger than the total number of
samples available for training, this value
is truncated. If this number is smaller,
then batches of the specified size are
created and fed to the network until there
are no more new samples to feed (epoch is
finished). If the total number of training
samples is not a multiple of the batch-size,
the last batch will be smaller than the
first. [default: 1; required]
-d, --device TEXT A string indicating the device to use (e.g.
"cpu" or "cuda:0") [default: cpu; required]
-O, --overlayed / --no-overlayed
Creates overlayed representations of the
output probability maps, similar to
--overlayed in prediction-mode, except it
includes distinctive colours for true and
false positives and false negatives. If not
set, or empty then do **NOT** output
overlayed images. [default: False]
-w, --weight TEXT Path or URL to pretrained model file (.pth
extension) [required]
-S, --steps INTEGER This number is used to define the number of
threshold steps to consider when evaluating
the highest possible F1-score on test data.
[default: 1000; required]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-h, -?, --help Show this message and exit.
Examples:
1. Re-evaluates a pre-trained M2U-Net model with DRIVE (vessel
segmentation), on the CPU, by running inference and evaluation on results
from its test set:
$ bob binseg analyze -vv m2unet drive --weight=model.pth
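To also generate overlay images, with distinctive colours for true and false positives and false negatives, add the documented ``--overlayed`` flag (the weight path is a placeholder):
$ bob binseg analyze -vv m2unet drive --weight=path/to/model_final.pth --overlayed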
Single-Step Applications¶
These applications allow finer control over the experiment cycle. They also work well with our preset configuration resources, but allow finer control on the input datasets.
Training FCNs¶
Training creates a new PyTorch model. This model can be used for evaluation tests or for inference.
$ bob binseg train --help
Usage: bob binseg train [OPTIONS] [CONFIG]...
Trains an FCN to perform binary segmentation
Training is performed for a configurable number of epochs, and generates
at least a final_model.pth. It may also generate a number of intermediate
checkpoints. Checkpoints are model files (.pth files) that are stored
during the training and useful to resume the procedure in case it stops
abruptly.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names) as CONFIG arguments
to the command line. These files contain the parameters listed below as
Python variables. Options passed on the command line (see below) override
the values set in configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store the generated model
(created if does not exist) [required]
-m, --model TEXT A torch.nn.Module instance implementing the
network to be trained [required]
-d, --dataset TEXT A torch.utils.data.dataset.Dataset instance
implementing a dataset to be used for
training the model, possibly including all
pre-processing pipelines required or,
optionally, a dictionary mapping string keys
to torch.utils.data.dataset.Dataset
instances. At least one key named ``train``
must be available. This dataset will be
used for training the network model. The
dataset description must include all
required pre-processing, including eventual
data augmentation. If a dataset named
``__train__`` is available, it is used
preferentially for training instead of
``train``. If a dataset named ``__valid__``
is available, it is used for model
validation (and automatic check-pointing) at
each epoch. [required]
--optimizer TEXT A torch.optim.Optimizer that will be used to
train the network [required]
--criterion TEXT A loss function to compute the FCN error for
every sample respecting the PyTorch API for
loss functions (see torch.nn.modules.loss)
[required]
--scheduler TEXT A learning rate scheduler that drives
changes in the learning rate depending on
the FCN state (see torch.optim.lr_scheduler)
[required]
-b, --batch-size INTEGER RANGE Number of samples in every batch (this
parameter affects memory requirements for
the network). If the number of samples in
the batch is larger than the total number of
samples available for training, this value
is truncated. If this number is smaller,
then batches of the specified size are
created and fed to the network until there
are no more new samples to feed (epoch is
finished). If the total number of training
samples is not a multiple of the batch-size,
the last batch will be smaller than the
first, unless --drop-incomplete-batch is
set, in which case this batch is not used.
[default: 2; required]
-D, --drop-incomplete-batch / --no-drop-incomplete-batch
If set, then may drop the last batch in an
epoch, in case it is incomplete. If you set
this option, you should also consider
increasing the total number of epochs of
training, as the total number of training
steps may be reduced [default: False;
required]
-e, --epochs INTEGER RANGE Number of epochs (complete training set
passes) to train for [default: 1000;
required]
-p, --checkpoint-period INTEGER RANGE
Number of epochs after which a checkpoint is
saved. A value of zero will disable
checkpointing. If checkpointing is enabled and
training stops, it is automatically resumed
from the last saved checkpoint if training
is restarted with the same configuration.
[default: 0; required]
-d, --device TEXT A string indicating the device to use (e.g.
"cpu" or "cuda:0") [default: cpu; required]
-s, --seed INTEGER RANGE Seed to use for the random number generator
[default: 42]
--ssl / --no-ssl Switch ON/OFF semi-supervised training mode
[default: False; required]
-r, --rampup INTEGER RANGE Ramp-up length in epochs (for SSL training
only) [default: 900; required]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-h, -?, --help Show this message and exit.
Examples:
1. Trains a U-Net model (VGG-16 backbone) with DRIVE (vessel segmentation),
on a GPU (``cuda:0``):
$ bob binseg train -vv unet drive --batch-size=4 --device="cuda:0"
2. Trains a HED model with HRF on a GPU (``cuda:0``):
$ bob binseg train -vv hed hrf --batch-size=8 --device="cuda:0"
3. Trains an M2U-Net model on the COVD-DRIVE dataset on the CPU:
$ bob binseg train -vv m2unet drive-covd --batch-size=8
4. Trains a DRIU model with SSL on the COVD-HRF dataset on the CPU:
$ bob binseg train -vv --ssl driu-ssl hrf-ssl --batch-size=1
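If you prefer to drive training from a configuration file instead of long command lines, the documented ``-H`` option generates a template in which the parameters listed above appear as Python variables. A sketch (the file name is arbitrary):
$ bob binseg train -H train_config.py
# edit train_config.py (e.g. choose the model, dataset, batch size and
# number of epochs), then run it as a CONFIG argument:
$ bob binseg train -vv train_config.py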
Prediction with FCNs¶
Inference takes a PyTorch model as input and generates output probabilities as HDF5 files. Each probability map has the same size as the input image; every pixel holds a floating-point value indicating the probability of a vessel at that location, from less probable (0.0) to more probable (1.0).
$ bob binseg predict --help
Usage: bob binseg predict [OPTIONS] [CONFIG]...
Predicts vessel map (probabilities) on input images
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names) as CONFIG arguments
to the command line. These files contain the parameters listed below as
Python variables. Options passed on the command line (see below) override
the values set in configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store the predictions (created
if does not exist) [required]
-m, --model TEXT A torch.nn.Module instance implementing the
network to be evaluated [required]
-d, --dataset TEXT A torch.utils.data.dataset.Dataset instance
implementing a dataset to be used for
running prediction, possibly including all
pre-processing pipelines required or,
optionally, a dictionary mapping string keys
to torch.utils.data.dataset.Dataset
instances. All keys that do not start with
an underscore (_) will be processed.
[required]
-b, --batch-size INTEGER RANGE Number of samples in every batch (this
parameter affects memory requirements for
the network) [default: 1; required]
-d, --device TEXT A string indicating the device to use (e.g.
"cpu" or "cuda:0") [default: cpu; required]
-w, --weight TEXT Path or URL to pretrained model file (.pth
extension) [required]
-O, --overlayed TEXT Creates overlayed representations of the
output probability maps on top of input
images (store results as PNG files). If
not set, or empty then do **NOT** output
overlayed images. Otherwise, the parameter
represents the name of a folder where to
store those
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-h, -?, --help Show this message and exit.
Examples:
1. Runs prediction on an existing dataset configuration:
$ bob binseg predict -vv m2unet drive --weight=path/to/model_final.pth --output-folder=path/to/predictions
2. To run prediction on a folder with your own images, you must first
specify resizing, cropping, etc, so that the image can be correctly
input to the model. Failing to do so will likely result in poor
performance. To figure out such specifications, you must consult the
dataset configuration used for **training** the provided model. Once
you have figured this out, do the following:
$ bob binseg config copy csv-dataset-example mydataset.py
# modify "mydataset.py" to include the base path and required transforms
$ bob binseg predict -vv m2unet mydataset.py --weight=path/to/model_final.pth --output-folder=path/to/predictions
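To visually inspect predictions, the documented ``--overlayed`` option additionally stores PNG overlays of the probability maps on top of the input images, in a folder you name. A sketch with placeholder paths:
$ bob binseg predict -vv m2unet drive --weight=path/to/model_final.pth --output-folder=path/to/predictions --overlayed=path/to/overlays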
FCN Performance Evaluation¶
Evaluation takes inference results and compares them to ground truth, generating a series of analysis figures that are useful for understanding model performance.
$ bob binseg evaluate --help
Usage: bob binseg evaluate [OPTIONS] [CONFIG]...
Evaluates an FCN on a binary segmentation task.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names) as CONFIG arguments
to the command line. These files contain the parameters listed below as
Python variables. Options passed on the command line (see below) override
the values set in configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store the analysis result
(created if does not exist) [required]
-p, --predictions-folder DIRECTORY
Path where predictions are currently stored
[required]
-d, --dataset TEXT A torch.utils.data.dataset.Dataset instance
implementing a dataset to be used for
evaluation purposes, possibly including all
pre-processing pipelines required or,
optionally, a dictionary mapping string keys
to torch.utils.data.dataset.Dataset
instances. All keys that do not start with
an underscore (_) will be processed.
[required]
-S, --second-annotator TEXT A dataset or dictionary, like in --dataset,
with the same sample keys, but with
annotations from a different annotator that
is going to be compared to the one in
--dataset. The same rules regarding dataset
naming conventions apply
-O, --overlayed TEXT Creates overlayed representations of the
output probability maps, similar to
--overlayed in prediction-mode, except it
includes distinctive colours for true and
false positives and false negatives. If not
set, or empty then do **NOT** output
overlayed images. Otherwise, the parameter
represents the name of a folder where to
store those
-t, --threshold TEXT This number is used to define positives and
negatives from probability maps, and report
F1-scores (a priori). It should either come
from the training set or a separate
validation set to avoid biasing the
analysis. Optionally, if you provide a
multi-set dataset as input, this may also be
the name of an existing set from which the
threshold will be estimated (highest
F1-score) and then applied to the subsequent
sets. This number is also used to report the
a priori F1-score performance on the test set
-S, --steps INTEGER This number is used to define the number of
threshold steps to consider when evaluating
the highest possible F1-score on test data.
[default: 1000; required]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-?, -h, --help Show this message and exit.
Examples:
1. Runs evaluation on an existing dataset configuration:
$ bob binseg evaluate -vv drive --predictions-folder=path/to/predictions --output-folder=path/to/results
2. To run evaluation on a folder with your own images and annotations, you
must first specify resizing, cropping, etc, so that the image can be
correctly input to the model. Failing to do so will likely result in
poor performance. To figure out such specifications, you must consult
the dataset configuration used for **training** the provided model.
Once you have figured this out, do the following:
$ bob binseg config copy csv-dataset-example mydataset.py
# modify "mydataset.py" to your liking
$ bob binseg evaluate -vv mydataset.py --predictions-folder=path/to/predictions --output-folder=path/to/results
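To avoid the a posteriori bias discussed in the ``--threshold`` help above, you can estimate the threshold on one subset and apply it to the others. A sketch, assuming a multi-set dataset configuration that includes a ``train`` subset:
$ bob binseg evaluate -vv drive --predictions-folder=path/to/predictions --output-folder=path/to/results --threshold=train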
Performance Comparison¶
Performance comparison takes the performance evaluation results and generates combined figures and tables that compare results of multiple systems.
$ bob binseg compare --help
Usage: bob binseg compare [OPTIONS] [LABEL_PATH]...
Compares multiple systems together
Options:
-f, --output-figure FILE Path where to write the output figure (any
extension supported by matplotlib is
possible). If not provided, does not
produce a figure.
-T, --table-format [fancy_grid|github|grid|html|jira|latex|latex_booktabs|latex_raw|mediawiki|moinmoin|orgtbl|pipe|plain|presto|psql|rst|simple|textile|tsv|youtrack]
The format to use for the comparison table
[default: rst; required]
-u, --output-table FILE Path where to write the output table. If not
provided, the table is only written to
stdout.
-t, --threshold TEXT This number is used to select which F1-score
to use for representing a system
performance. If not set, we report the
maximum F1-score in the set, which is
equivalent to threshold selection a
posteriori (biased estimator). You can
either set this value to a floating-point
number in the range [0.0, 1.0], or to a
string, naming one of the systems which will
be used to calculate the threshold leading
to the maximum F1-score and then applied to
all other sets.
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-h, -?, --help Show this message and exit.
Examples:
1. Compares system A and B, with their own pre-computed measure files:
$ bob binseg compare -vv A path/to/A/train.csv B path/to/B/test.csv
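To also record a comparison figure and table, and to fix the operating threshold from one of the systems (here, system A), combine the documented output options. Paths and file names below are placeholders:
$ bob binseg compare -vv A path/to/A/train.csv B path/to/B/test.csv --output-figure=comparison.pdf --output-table=comparison.rst --threshold=A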
Performance Difference Significance¶
Calculates the significance of differences between results obtained by two systems on the same dataset.
$ bob binseg significance --help
Usage: bob binseg significance [OPTIONS] [CONFIG]...
Evaluates how significantly different two models are on the same dataset
This application calculates the significance of results of two models
operating on the same dataset, and subject to a priori threshold tuning.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names) as CONFIG arguments
to the command line. These files contain the parameters listed below as
Python variables. Options passed on the command line (see below) override
the values set in configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-n, --names TEXT... Names of the two systems to compare
[required]
-p, --predictions DIRECTORY... Paths where predictions of the two systems
are currently stored. You may also input
predictions from a second-annotator. This
application will adequately handle it.
[required]
-d, --dataset TEXT A dictionary mapping string keys to
torch.utils.data.dataset.Dataset instances
[required]
-t, --threshold TEXT This number is used to define positives and
negatives from probability maps, and report
F1-scores (a priori). By default, we expect
a set named 'validation' to be available at
the input data. If that is not the case, we
use 'train', if available. You may provide
the name of another dataset to be used for
threshold tuning otherwise. If not set, or
a string is input, threshold tunning is done
per system, individually. Optionally, you
may also provide a floating-point number
between [0.0, 1.0] as the threshold to use
for both systems. [default: validation;
required]
-e, --evaluate TEXT Name of the dataset to evaluate [default:
test; required]
-S, --steps INTEGER This number is used to define the number of
threshold steps to consider when evaluating
the highest possible F1-score on train/test
data. [default: 1000; required]
-s, --size INTEGER... This is a tuple with two values indicating
the size of windows to be used for sliding
window analysis. The values represent
height and width respectively. [default:
128, 128; required]
-t, --stride INTEGER... This is a tuple with two values indicating
the stride of windows to be used for sliding
window analysis. The values represent
height and width respectively. [default:
32, 32; required]
-f, --figure TEXT The name of a performance figure (e.g.
f1_score, or jaccard) to use when comparing
performances [default: accuracy; required]
-o, --output-folder PATH Path where to store visualizations
-R, --remove-outliers / --no-remove-outliers
If set, removes outliers from both score
distributions before running statistical
analysis. Outlier removal follows a 1.5 IQR
range check from the difference in figures
between both systems and assumes most of the
distribution is contained within that range
(like in a normal distribution) [default:
False; required]
-R, --remove-zeros / --no-remove-zeros
If set, removes instances from the
statistical analysis in which both systems
had a performance equal to zero. [default:
False; required]
-x, --parallel INTEGER Set the number of parallel processes to use
when running using multiprocessing. A value
of zero uses all reported cores. [default:
1; required]
-k, --checkpoint-folder PATH Path where to store checkpointed versions of
sliding window performances
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-h, -?, --help Show this message and exit.
Examples:
1. Runs a significance test using as base the calculated predictions of two
different systems, on the **same** dataset:
$ bob binseg significance -vv drive --names system1 system2 --predictions=path/to/predictions/system-1 path/to/predictions/system-2
2. By default, we use a "validation" dataset if it is available, to infer
the a priori threshold for the comparison of two systems. Otherwise,
you may need to specify the name of a set to be used as validation set
for choosing a threshold. The same goes for the set to be used for
testing the hypothesis - by default we use the "test" dataset if it is
available; otherwise, specify one explicitly.
$ bob binseg significance -vv drive --names system1 system2 --predictions=path/to/predictions/system-1 path/to/predictions/system-2 --threshold=train --evaluate=alternate-test
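Sliding-window analysis can be slow on large images. The documented ``--parallel`` option distributes the work (a value of zero uses all reported cores), while ``--size`` and ``--stride`` control the analysis windows. A sketch using only options listed above:
$ bob binseg significance -vv drive --names system1 system2 --predictions=path/to/predictions/system-1 path/to/predictions/system-2 --parallel=0 --size 128 128 --stride 32 32 --figure=f1_score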