Command-Line Interface (CLI)¶
This package provides a single entry point for all of its applications using Bob’s unified CLI mechanism. A list of available applications can be retrieved using:
$ bob binseg --help
Usage: bob binseg [OPTIONS] COMMAND [ARGS]...
Binary 2D Image Segmentation Benchmark commands.
Options:
-h, -?, --help Show this message and exit.
Commands:
analyze Runs a complete evaluation from prediction to comparison...
compare Compares multiple systems together
config Commands for listing, describing and copying configuration...
dataset Commands for listing and verifying datasets
evaluate Evaluates an FCN on a binary segmentation task.
experiment Runs a complete experiment, from training, to prediction and...
predict Predicts vessel map (probabilities) on input images...
significance Evaluates how significantly different are two models on the...
train Trains an FCN to perform binary segmentation...
Setup¶
This CLI application lists and checks installed (raw) datasets.
$ bob binseg dataset --help
Usage: bob binseg dataset [OPTIONS] COMMAND [ARGS]...
Commands for listing and verifying datasets
Options:
-h, -?, --help Show this message and exit.
Commands:
check Checks file access on one or more datasets
list Lists all supported and configured datasets
List available datasets¶
Lists supported and configured raw datasets.
$ bob binseg dataset list --help
Usage: bob binseg dataset list [OPTIONS]
Lists all supported and configured datasets
Options:
-v, --verbose Increase the verbosity level from 0 (only error messages) to
1 (warnings), 2 (log messages), 3 (debug information) by
adding the --verbose option as often as desired (e.g. '-vvv'
for debug).
-?, -h, --help Show this message and exit.
Examples:
1. To install a dataset, set up its data directory ("datadir"). For
example, to set up access to DRIVE files you downloaded locally at
the directory "/path/to/drive/files", do the following:
$ bob config set "bob.ip.binseg.drive.datadir" "/path/to/drive/files"
Notice this setting **is** case-sensitive.
2. List all raw datasets supported (and configured):
$ bob binseg dataset list
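After configuring a data directory, you can confirm the recorded value before running any checks. This is a minimal sketch; it assumes the ``get`` subcommand of the ``bob config`` application (from bob.extension) is available in your installation:
$ bob config set "bob.ip.binseg.drive.datadir" "/path/to/drive/files"
# echo the stored value back to confirm the setting (assumes 'bob config
# get' exists in your version of bob.extension):
$ bob config get "bob.ip.binseg.drive.datadir"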
Check available datasets¶
Checks if we can load all files listed for a given dataset (all subsets in all protocols).
$ bob binseg dataset check --help
Usage: bob binseg dataset check [OPTIONS] [DATASET]...
Checks file access on one or more datasets
Options:
-l, --limit INTEGER RANGE Limit check to the first N samples in each
dataset, making the check considerably
faster. Set it to zero to check everything. [required]
-v, --verbose Increase the verbosity level from 0 (only error
messages) to 1 (warnings), 2 (log messages), 3
(debug information) by adding the --verbose
option as often as desired (e.g. '-vvv' for
debug).
-h, -?, --help Show this message and exit.
Examples:
1. Check if all files of the DRIVE dataset can be loaded:
$ bob binseg dataset check -vv drive
2. Check if all files of multiple installed datasets can be loaded:
$ bob binseg dataset check -vv drive stare
3. Check if all files of all installed datasets can be loaded:
$ bob binseg dataset check
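For a quick sanity check on large datasets, you may combine the ``--limit`` option documented above with increased verbosity. This sketch uses only options listed in the help output:
# check only the first 2 samples of each subset, with warnings and log
# messages enabled:
$ bob binseg dataset check --limit=2 -vv drive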
Preset Configuration Resources¶
A CLI application allows one to list, inspect and copy available configuration resources exported by this package.
$ bob binseg config --help
Usage: bob binseg config [OPTIONS] COMMAND [ARGS]...
Commands for listing, describing and copying configuration resources
Options:
-?, -h, --help Show this message and exit.
Commands:
copy Copies a specific configuration resource so it can be modified...
describe Describes a specific configuration file
list Lists configuration files installed
Listing Resources¶
$ bob binseg config list --help
Usage: bob binseg config list [OPTIONS]
Lists configuration files installed
Options:
-v, --verbose Increase the verbosity level from 0 (only error messages) to
1 (warnings), 2 (log messages), 3 (debug information) by
adding the --verbose option as often as desired (e.g. '-vvv'
for debug).
-h, -?, --help Show this message and exit.
Examples:
1. Lists all configuration resources (type: bob.ip.binseg.config) installed:
$ bob binseg config list
2. Lists all configuration resources and their descriptions (notice this may
be slow as it needs to load all modules once):
$ bob binseg config list -v
Available Resources¶
Here is a list of all resources currently exported.
$ bob binseg config list -v
module: bob.ip.binseg.configs.datasets
chasedb1 CHASE-DB1 dataset for Vessel Segmentation (first-anno...
chasedb1-2nd CHASE-DB1 dataset for Vessel Segmentation (second-ann...
chasedb1-covd COVD-CHASEDB1 for Vessel Segmentation
chasedb1-mtest CHASE-DB1 cross-evaluation dataset with matched resol...
chasedb1-ssl COVD-CHASE-DB1 + SSL for Vessel Segmentation
chasedb1-xtest CHASE-DB1 cross-evaluation dataset
csv-dataset-example Example CSV-based custom filelist dataset
drionsdb DRIONS-DB for Optic Disc Segmentation (expert #1 anno...
drionsdb-2nd DRIONS-DB for Optic Disc Segmentation (expert #2 anno...
drishtigs1-cup DRISHTI-GS1 dataset for Cup Segmentation (agreed by a...
drishtigs1-cup-any DRISHTI-GS1 dataset for Cup Segmentation (agreed by a...
drishtigs1-disc DRISHTI-GS1 dataset for Optic Disc Segmentation (agre...
drishtigs1-disc-any DRISHTI-GS1 dataset for Optic Disc Segmentation (agre...
drive DRIVE dataset for Vessel Segmentation (default protoc...
drive-2nd DRIVE dataset for Vessel Segmentation (second annotat...
drive-covd COVD-DRIVE for Vessel Segmentation
drive-mtest DRIVE cross-evaluation dataset with matched resolutio...
drive-ssl COVD-DRIVE + SSL for Vessel Segmentation
drive-xtest DRIVE cross-evaluation dataset
hrf HRF dataset for Vessel Segmentation (default protocol...
hrf-covd COVD-HRF for Vessel Segmentation
hrf-highres HRF dataset for Vessel Segmentation (default protocol...
hrf-mtest HRF cross-evaluation dataset with matched resolution
hrf-ssl COVD-HRF + SSL for Vessel Segmentation
hrf-xtest HRF cross-evaluation dataset
iostar-disc IOSTAR dataset for Optic Disc Segmentation (default p...
iostar-vessel IOSTAR dataset for Vessel Segmentation (default proto...
iostar-vessel-covd COVD-IOSTAR for Vessel Segmentation
iostar-vessel-mtest IOSTAR vessel cross-evaluation dataset with matched r...
iostar-vessel-ssl COVD-IOSTAR + SSL for Vessel Segmentation
iostar-vessel-xtest IOSTAR vessel cross-evaluation dataset
refuge-cup REFUGE dataset for Optic Cup Segmentation (default pr...
refuge-disc REFUGE dataset for Optic Disc Segmentation (default p...
rimoner3-cup RIM-ONE r3 for Optic Cup Segmentation (expert #1 anno...
rimoner3-cup-2nd RIM-ONE r3 for Optic Cup Segmentation (expert #2 anno...
rimoner3-disc RIM-ONE r3 for Optic Disc Segmentation (expert #1 ann...
rimoner3-disc-2nd RIM-ONE r3 for Optic Disc Segmentation (expert #2 ann...
stare STARE dataset for Vessel Segmentation (annotator AH)
stare-2nd STARE dataset for Vessel Segmentation (annotator VK)
stare-covd COVD-STARE for Vessel Segmentation
stare-mtest STARE cross-evaluation dataset with matched resolutio...
stare-ssl COVD-STARE + SSL (training set) for Vessel Segmentati...
stare-xtest STARE cross-evaluation dataset
module: bob.ip.binseg.configs.models
driu DRIU Network for Vessel Segmentation
driu-bn DRIU Network for Vessel Segmentation with Batch Normalization
driu-bn-ssl DRIU Network for Vessel Segmentation using SSL and Batch Norm...
driu-od DRIU Network for Optic Disc Segmentation
driu-ssl DRIU Network for Vessel Segmentation using SSL
hed HED Network for image segmentation
m2unet MobileNetV2 U-Net model for image segmentation
m2unet-ssl MobileNetV2 U-Net model for image segmentation using SSL
resunet Residual U-Net for image segmentation
unet U-Net for image segmentation
Describing a Resource¶
$ bob binseg config describe --help
Usage: bob binseg config describe [OPTIONS] NAME...
Describes a specific configuration file
Options:
-v, --verbose Increase the verbosity level from 0 (only error messages) to
1 (warnings), 2 (log messages), 3 (debug information) by
adding the --verbose option as often as desired (e.g. '-vvv'
for debug).
-?, -h, --help Show this message and exit.
Examples:
1. Describes the DRIVE (training) dataset configuration:
$ bob binseg config describe drive
2. Describes the DRIVE (training) dataset configuration and lists its
contents:
$ bob binseg config describe drive -v
Copying a Resource¶
You may use this command to locally copy a resource file so you can change it.
$ bob binseg config copy --help
Usage: bob binseg config copy [OPTIONS] SOURCE DESTINATION
Copies a specific configuration resource so it can be modified locally
Options:
-v, --verbose Increase the verbosity level from 0 (only error messages) to
1 (warnings), 2 (log messages), 3 (debug information) by
adding the --verbose option as often as desired (e.g. '-vvv'
for debug).
-h, -?, --help Show this message and exit.
Examples:
1. Makes a copy of one of the stock configuration files locally, so it can be
adapted:
$ bob binseg config copy drive -vvv newdataset.py
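Once copied, the local file can be edited and passed back, in place of a preset resource name, to any command that accepts CONFIG arguments (e.g. ``bob binseg train``). A sketch, assuming you adapted ``newdataset.py`` to your own data:
$ bob binseg config copy drive newdataset.py
# edit newdataset.py to point to your data and transforms, then:
$ bob binseg train -vv m2unet ./newdataset.py --epochs=2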
Running and Analyzing Experiments¶
These applications run a combined set of steps in one go. They work well with our preset configuration resources.
Running a Full Experiment Cycle¶
This command can run training, prediction, evaluation and comparison from a single, multi-step application.
$ bob binseg experiment --help
Usage: bob binseg experiment [OPTIONS] [CONFIG]...
Runs a complete experiment, from training, to prediction and evaluation
This script is just a wrapper around the individual scripts for
training, running prediction, evaluating and comparing FCN model
performance. It organises the output in a preset way::
└─ <output-folder>/
   ├── model/            # the generated model will be here
   ├── predictions/      # the prediction outputs for the train/test set
   ├── overlayed/        # the overlayed outputs for the train/test set
   │   ├── predictions/  # predictions overlayed on the input images
   │   ├── analysis/     # predictions overlayed on the input images,
   │   │                 # including analysis of false positives, negatives
   │   │                 # and true positives
   │   └── second-annotator/  # if set, store overlayed images for the
   │                          # second annotator here
   └── analysis/         # the outputs of the analysis of both train/test
                         # sets; includes second-annotator "measures" as
                         # well, if configured
Training is performed for a configurable number of epochs, and
generates at least a final_model.pth. It may also generate a
number of intermediate checkpoints. Checkpoints are model files
(.pth files) that are stored during the training and useful to
resume the procedure in case it stops abruptly.
N.B.: The tool is designed to prevent analysis bias and allows one to
provide separate subsets for training and evaluation. Instead of
using simple datasets, datasets for full experiment running should
be dictionaries with specific subset names:
* ``__train__``: dataset used preferentially for training. It is
  typically the dataset containing data augmentation pipelines.
* ``__valid__``: dataset used for validation. It is typically
  disjoint from the training and test sets. In such a case, we
  checkpoint the model with the lowest loss on the validation set
  as well, throughout all the training, besides the model at the
  end of training.
* ``train`` (optional): a copy of the ``__train__`` dataset, without
  data augmentation, that will be evaluated alongside other sets
  available.
* ``*``: any other name, not starting with an underscore character
  (``_``), will be considered a test set for evaluation.
N.B.2: The threshold used for calculating the F1-score on the test
set, or overlay analysis (false positives, negatives and true
positives overprinted on the original image) also follows the
logic above.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names) as CONFIG arguments
to the command line. These files contain the parameters listed below as
Python variables. Options passed on the command line (see below) override
the values set in configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store experiment outputs
(created if does not exist) [required]
-m, --model TEXT A torch.nn.Module instance implementing the
network to be trained, and then evaluated
[required]
-d, --dataset TEXT A dictionary mapping string keys to
bob.ip.binseg.data.utils.SampleList2TorchDataset's.
At least one key named 'train' must be
available. This dataset will be used for
training the network model. All other
datasets will be used for prediction and
evaluation. Dataset descriptions include all
required pre-processing, including any data
augmentation, which may be excluded for
prediction and evaluation purposes
[required]
-S, --second-annotator TEXT A dataset or dictionary, like in --dataset,
with the same sample keys, but with
annotations from a different annotator that
is going to be compared to the one in
--dataset
--optimizer TEXT A torch.optim.Optimizer that will be used to
train the network [required]
--criterion TEXT A loss function to compute the FCN error for
every sample respecting the PyTorch API for
loss functions (see torch.nn.modules.loss)
[required]
--scheduler TEXT A learning rate scheduler that drives
changes in the learning rate depending on
the FCN state (see torch.optim.lr_scheduler)
[required]
-b, --batch-size INTEGER RANGE Number of samples in every batch (this
parameter affects memory requirements for
the network). If the number of samples in
the batch is larger than the total number of
samples available for training, this value
is truncated. If this number is smaller,
then batches of the specified size are
created and fed to the network until there
are no more new samples to feed (epoch is
finished). If the total number of training
samples is not a multiple of the batch-size,
the last batch will be smaller than the
first, unless --drop-incomplete-batch is
set, in which case this batch is not used.
[default: 2; required]
-D, --drop-incomplete-batch / --no-drop-incomplete-batch
If set, then may drop the last batch in an
epoch, in case it is incomplete. If you set
this option, you should also consider
increasing the total number of epochs of
training, as the total number of training
steps may be reduced [default: False;
required]
-e, --epochs INTEGER RANGE Number of epochs (complete training set
passes) to train for [default: 1000;
required]
-p, --checkpoint-period INTEGER RANGE
Number of epochs after which a checkpoint is
saved. A value of zero will disable
checkpointing. If checkpointing is enabled and
training stops, it is automatically resumed
from the last saved checkpoint if training
is restarted with the same configuration.
[default: 0; required]
-d, --device TEXT A string indicating the device to use (e.g.
"cpu" or "cuda:0") [default: cpu; required]
-s, --seed INTEGER RANGE Seed to use for the random number generator
[default: 42]
--ssl / --no-ssl Switch ON/OFF semi-supervised training mode
[default: False; required]
-r, --rampup INTEGER RANGE Ramp-up length in epochs (for SSL training
only) [default: 900; required]
-O, --overlayed / --no-overlayed
Creates overlayed representations of the
output probability maps, similar to
--overlayed in prediction-mode, except it
includes distinctive colours for true and
false positives and false negatives. If not
set, or empty then do **NOT** output
overlayed images. [default: False]
-S, --steps INTEGER This number is used to define the number of
threshold steps to consider when evaluating
the highest possible F1-score on test data.
[default: 1000; required]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-?, -h, --help Show this message and exit.
Examples:
1. Trains an M2U-Net model (MobileNetV2 backbone) with DRIVE (vessel
segmentation), on the CPU, for only two epochs, then runs inference and
evaluation on stock datasets, reporting performance as a table and a figure:
$ bob binseg experiment -vv m2unet drive --epochs=2
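A more realistic cycle typically pins the output folder and runs on a GPU. The following sketch composes only options documented above; all paths are placeholders:
$ bob binseg experiment -vv m2unet drive --epochs=600 --batch-size=4 --device="cuda:0" --output-folder=path/to/results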
Running Complete Experiment Analysis¶
This command can run prediction, evaluation and comparison from a single, multi-step application.
$ bob binseg analyze --help
Usage: bob binseg analyze [OPTIONS] [CONFIG]...
Runs a complete evaluation from prediction to comparison
This script is just a wrapper around the individual scripts for
running prediction and evaluating FCN models. It organises the
output in a preset way::
└─ <output-folder>/
   ├── predictions/      # the prediction outputs for the train/test set
   ├── overlayed/        # the overlayed outputs for the train/test set
   │   ├── predictions/  # predictions overlayed on the input images
   │   ├── analysis/     # predictions overlayed on the input images,
   │   │                 # including analysis of false positives, negatives
   │   │                 # and true positives
   │   └── second-annotator/  # if set, store overlayed images for the
   │                          # second annotator here
   └── analysis/         # the outputs of the analysis of both train/test
                         # sets; includes second-annotator "measures" as
                         # well, if configured
N.B.: The tool is designed to prevent analysis bias and allows one to
provide separate subsets for training and evaluation. Instead of
using simple datasets, datasets for full experiment running should
be dictionaries with specific subset names:
* ``__train__``: dataset used preferentially for training. It is
  typically the dataset containing data augmentation pipelines.
* ``train`` (optional): a copy of the ``__train__`` dataset, without
  data augmentation, that will be evaluated alongside other sets
  available.
* ``*``: any other name, not starting with an underscore character
  (``_``), will be considered a test set for evaluation.
N.B.2: The threshold used for calculating the F1-score on the test
set, or overlay analysis (false positives, negatives and true
positives overprinted on the original image) also follows the
logic above.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names) as CONFIG arguments
to the command line. These files contain the parameters listed below as
Python variables. Options passed on the command line (see below) override
the values set in configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store experiment outputs
(created if does not exist) [required]
-m, --model TEXT A torch.nn.Module instance implementing the
network to be trained, and then evaluated
[required]
-d, --dataset TEXT A dictionary mapping string keys to
bob.ip.binseg.data.utils.SampleList2TorchDataset's.
At least one key named 'train' must be
available. This dataset will be used for
training the network model. All other
datasets will be used for prediction and
evaluation. Dataset descriptions include all
required pre-processing, including any data
augmentation, which may be excluded for
prediction and evaluation purposes
[required]
-S, --second-annotator TEXT A dataset or dictionary, like in --dataset,
with the same sample keys, but with
annotations from a different annotator that
is going to be compared to the one in
--dataset
-b, --batch-size INTEGER RANGE Number of samples in every batch (this
parameter affects memory requirements for
the network). If the number of samples in
the batch is larger than the total number of
samples available for training, this value
is truncated. If this number is smaller,
then batches of the specified size are
created and fed to the network until there
are no more new samples to feed (epoch is
finished). If the total number of training
samples is not a multiple of the batch-size,
the last batch will be smaller than the
first. [default: 1; required]
-d, --device TEXT A string indicating the device to use (e.g.
"cpu" or "cuda:0") [default: cpu; required]
-O, --overlayed / --no-overlayed
Creates overlayed representations of the
output probability maps, similar to
--overlayed in prediction-mode, except it
includes distinctive colours for true and
false positives and false negatives. If not
set, or empty then do **NOT** output
overlayed images. [default: False]
-w, --weight TEXT Path or URL to pretrained model file (.pth
extension) [required]
-S, --steps INTEGER This number is used to define the number of
threshold steps to consider when evaluating
the highest possible F1-score on test data.
[default: 1000; required]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-h, -?, --help Show this message and exit.
Examples:
1. Re-evaluates a pre-trained M2U-Net model with DRIVE (vessel
segmentation), on the CPU, by running inference and evaluation on results
from its test set:
$ bob binseg analyze -vv m2unet drive --weight=model.pth
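To also generate overlay images, with distinctive colours for true and false positives and false negatives, add the documented ``--overlayed`` flag (the weight path is a placeholder):
$ bob binseg analyze -vv m2unet drive --weight=path/to/model_final.pth --overlayed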
Single-Step Applications¶
These applications allow finer control over the experiment cycle. They also work well with our preset configuration resources, but allow finer control on the input datasets.
Training FCNs¶
Training creates a new PyTorch model. This model can be used for evaluation tests or for inference.
$ bob binseg train --help
Usage: bob binseg train [OPTIONS] [CONFIG]...
Trains an FCN to perform binary segmentation
Training is performed for a configurable number of epochs, and generates
at least a final_model.pth. It may also generate a number of intermediate
checkpoints. Checkpoints are model files (.pth files) that are stored
during the training and useful to resume the procedure in case it stops
abruptly.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names) as CONFIG arguments
to the command line. These files contain the parameters listed below as
Python variables. Options passed on the command line (see below) override
the values set in configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store the generated model
(created if does not exist) [required]
-m, --model TEXT A torch.nn.Module instance implementing the
network to be trained [required]
-d, --dataset TEXT A torch.utils.data.dataset.Dataset instance
implementing a dataset to be used for
training the model, possibly including all
pre-processing pipelines required or,
optionally, a dictionary mapping string keys
to torch.utils.data.dataset.Dataset
instances. At least one key named ``train``
must be available. This dataset will be
used for training the network model. The
dataset description must include all
required pre-processing, including eventual
data augmentation. If a dataset named
``__train__`` is available, it is used
preferentially for training instead of
``train``. If a dataset named ``__valid__``
is available, it is used for model
validation (and automatic check-pointing) at
each epoch. [required]
--optimizer TEXT A torch.optim.Optimizer that will be used to
train the network [required]
--criterion TEXT A loss function to compute the FCN error for
every sample respecting the PyTorch API for
loss functions (see torch.nn.modules.loss)
[required]
--scheduler TEXT A learning rate scheduler that drives
changes in the learning rate depending on
the FCN state (see torch.optim.lr_scheduler)
[required]
-b, --batch-size INTEGER RANGE Number of samples in every batch (this
parameter affects memory requirements for
the network). If the number of samples in
the batch is larger than the total number of
samples available for training, this value
is truncated. If this number is smaller,
then batches of the specified size are
created and fed to the network until there
are no more new samples to feed (epoch is
finished). If the total number of training
samples is not a multiple of the batch-size,
the last batch will be smaller than the
first, unless --drop-incomplete-batch is
set, in which case this batch is not used.
[default: 2; required]
-D, --drop-incomplete-batch / --no-drop-incomplete-batch
If set, then may drop the last batch in an
epoch, in case it is incomplete. If you set
this option, you should also consider
increasing the total number of epochs of
training, as the total number of training
steps may be reduced [default: False;
required]
-e, --epochs INTEGER RANGE Number of epochs (complete training set
passes) to train for [default: 1000;
required]
-p, --checkpoint-period INTEGER RANGE
Number of epochs after which a checkpoint is
saved. A value of zero will disable
checkpointing. If checkpointing is enabled and
training stops, it is automatically resumed
from the last saved checkpoint if training
is restarted with the same configuration.
[default: 0; required]
-d, --device TEXT A string indicating the device to use (e.g.
"cpu" or "cuda:0") [default: cpu; required]
-s, --seed INTEGER RANGE Seed to use for the random number generator
[default: 42]
--ssl / --no-ssl Switch ON/OFF semi-supervised training mode
[default: False; required]
-r, --rampup INTEGER RANGE Ramp-up length in epochs (for SSL training
only) [default: 900; required]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-h, -?, --help Show this message and exit.
Examples:
1. Trains a U-Net model (VGG-16 backbone) with DRIVE (vessel segmentation),
on a GPU (``cuda:0``):
$ bob binseg train -vv unet drive --batch-size=4 --device="cuda:0"
2. Trains a HED model with HRF on a GPU (``cuda:0``):
$ bob binseg train -vv hed hrf --batch-size=8 --device="cuda:0"
3. Trains an M2U-Net model on the COVD-DRIVE dataset on the CPU:
$ bob binseg train -vv m2unet drive-covd --batch-size=8
4. Trains a DRIU model with SSL on the COVD-HRF dataset on the CPU:
$ bob binseg train -vv --ssl driu-ssl hrf-ssl --batch-size=1
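If you prefer to drive training from a configuration file instead of long command lines, the documented ``-H`` option generates a template in which the parameters listed above appear as Python variables. A sketch (the file name is arbitrary):
$ bob binseg train -H train_config.py
# edit train_config.py (e.g. choose the model, dataset, batch size and
# number of epochs), then run it as a CONFIG argument:
$ bob binseg train -vv train_config.py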
Prediction with FCNs¶
Inference takes a PyTorch model as input and generates output probabilities as HDF5 files. Each probability map has the same size as the input image; every pixel holds a floating-point value indicating the probability of a vessel at that location, from less probable (0.0) to more probable (1.0).
$ bob binseg predict --help
Usage: bob binseg predict [OPTIONS] [CONFIG]...
Predicts vessel map (probabilities) on input images
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names) as CONFIG arguments
to the command line. These files contain the parameters listed below as
Python variables. Options passed on the command line (see below) override
the values set in configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store the predictions (created
if does not exist) [required]
-m, --model TEXT A torch.nn.Module instance implementing the
network to be evaluated [required]
-d, --dataset TEXT A torch.utils.data.dataset.Dataset instance
implementing a dataset to be used for
running prediction, possibly including all
pre-processing pipelines required or,
optionally, a dictionary mapping string keys
to torch.utils.data.dataset.Dataset
instances. All keys that do not start with
an underscore (_) will be processed.
[required]
-b, --batch-size INTEGER RANGE Number of samples in every batch (this
parameter affects memory requirements for
the network) [default: 1; required]
-d, --device TEXT A string indicating the device to use (e.g.
"cpu" or "cuda:0") [default: cpu; required]
-w, --weight TEXT Path or URL to pretrained model file (.pth
extension) [required]
-O, --overlayed TEXT Creates overlayed representations of the
output probability maps on top of input
images (store results as PNG files). If
not set, or empty then do **NOT** output
overlayed images. Otherwise, the parameter
represents the name of a folder where to
store those
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-h, -?, --help Show this message and exit.
Examples:
1. Runs prediction on an existing dataset configuration:
$ bob binseg predict -vv m2unet drive --weight=path/to/model_final.pth --output-folder=path/to/predictions
2. To run prediction on a folder with your own images, you must first
specify resizing, cropping, etc, so that the image can be correctly
input to the model. Failing to do so will likely result in poor
performance. To figure out such specifications, you must consult the
dataset configuration used for **training** the provided model. Once
you have figured this out, do the following:
$ bob binseg config copy csv-dataset-example mydataset.py
# modify "mydataset.py" to include the base path and required transforms
$ bob binseg predict -vv m2unet mydataset.py --weight=path/to/model_final.pth --output-folder=path/to/predictions
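To visually inspect predictions, the documented ``--overlayed`` option additionally stores PNG overlays of the probability maps on top of the input images, in a folder you name. A sketch with placeholder paths:
$ bob binseg predict -vv m2unet drive --weight=path/to/model_final.pth --output-folder=path/to/predictions --overlayed=path/to/overlays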
FCN Performance Evaluation¶
Evaluation takes inference results and compares them to ground truth, generating a series of analysis figures that are useful for understanding model performance.
$ bob binseg evaluate --help
Usage: bob binseg evaluate [OPTIONS] [CONFIG]...
Evaluates an FCN on a binary segmentation task.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names) as CONFIG arguments
to the command line. These files contain the parameters listed below as
Python variables. Options passed on the command line (see below) override
the values set in configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-o, --output-folder PATH Path where to store the analysis result
(created if does not exist) [required]
-p, --predictions-folder DIRECTORY
Path where predictions are currently stored
[required]
-d, --dataset TEXT A torch.utils.data.dataset.Dataset instance
implementing a dataset to be used for
evaluation purposes, possibly including all
pre-processing pipelines required or,
optionally, a dictionary mapping string keys
to torch.utils.data.dataset.Dataset
instances. All keys that do not start with
an underscore (_) will be processed.
[required]
-S, --second-annotator TEXT A dataset or dictionary, like in --dataset,
with the same sample keys, but with
annotations from a different annotator that
is going to be compared to the one in
--dataset. The same rules regarding dataset
naming conventions apply
-O, --overlayed TEXT Creates overlayed representations of the
output probability maps, similar to
--overlayed in prediction-mode, except it
includes distinctive colours for true and
false positives and false negatives. If not
set, or empty then do **NOT** output
overlayed images. Otherwise, the parameter
represents the name of a folder where to
store those
-t, --threshold TEXT This number is used to define positives and
negatives from probability maps, and report
F1-scores (a priori). It should either come
from the training set or a separate
validation set to avoid biasing the
analysis. Optionally, if you provide a
multi-set dataset as input, this may also be
the name of an existing set from which the
threshold will be estimated (highest
F1-score) and then applied to the subsequent
sets. This number is also used to report the
a priori F1-score performance on the test set
-S, --steps INTEGER This number is used to define the number of
threshold steps to consider when evaluating
the highest possible F1-score on test data.
[default: 1000; required]
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-?, -h, --help Show this message and exit.
Examples:
1. Runs evaluation on an existing dataset configuration:
$ bob binseg evaluate -vv drive --predictions-folder=path/to/predictions --output-folder=path/to/results
2. To run evaluation on a folder with your own images and annotations, you
must first specify resizing, cropping, etc, so that the image can be
correctly input to the model. Failing to do so will likely result in
poor performance. To figure out such specifications, you must consult
the dataset configuration used for **training** the provided model.
Once you have figured this out, do the following:
$ bob binseg config copy csv-dataset-example mydataset.py
# modify "mydataset.py" to your liking
$ bob binseg evaluate -vv mydataset.py --predictions-folder=path/to/predictions --output-folder=path/to/results
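To avoid the a posteriori bias discussed in the ``--threshold`` help above, you can estimate the threshold on one subset and apply it to the others. A sketch, assuming a multi-set dataset configuration that includes a ``train`` subset:
$ bob binseg evaluate -vv drive --predictions-folder=path/to/predictions --output-folder=path/to/results --threshold=train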
Performance Comparison¶
Performance comparison takes the performance evaluation results and generates combined figures and tables that compare results of multiple systems.
$ bob binseg compare --help
Usage: bob binseg compare [OPTIONS] [LABEL_PATH]...
Compares multiple systems together
Options:
-f, --output-figure FILE Path where to write the output figure (any
extension supported by matplotlib is
possible). If not provided, does not
produce a figure.
-T, --table-format [fancy_grid|github|grid|html|jira|latex|latex_booktabs|latex_raw|mediawiki|moinmoin|orgtbl|pipe|plain|presto|psql|rst|simple|textile|tsv|youtrack]
The format to use for the comparison table
[default: rst; required]
-u, --output-table FILE Path where to write the output table. If not
provided, the table is only written to
stdout.
-t, --threshold TEXT This number is used to select which F1-score
to use for representing a system
performance. If not set, we report the
maximum F1-score in the set, which is
equivalent to threshold selection a
posteriori (biased estimator). You can
either set this value to a floating-point
number in the range [0.0, 1.0], or to a
string, naming one of the systems which will
be used to calculate the threshold leading
to the maximum F1-score and then applied to
all other sets.
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-h, -?, --help Show this message and exit.
Examples:
1. Compares system A and B, with their own pre-computed measure files:
$ bob binseg compare -vv A path/to/A/train.csv B path/to/B/test.csv
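To also record a comparison figure and table, and to fix the operating threshold from one of the systems (here, system A), combine the documented output options. Paths and file names below are placeholders:
$ bob binseg compare -vv A path/to/A/train.csv B path/to/B/test.csv --output-figure=comparison.pdf --output-table=comparison.rst --threshold=A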
Performance Difference Significance¶
Calculates the significance of differences between results obtained by two systems on the same dataset.
$ bob binseg significance --help
Usage: bob binseg significance [OPTIONS] [CONFIG]...
Evaluates how significantly different two models are on the same dataset
This application calculates the significance of results of two models
operating on the same dataset, and subject to a priori threshold tuning.
It is possible to pass one or several Python files (or names of
``bob.ip.binseg.config`` entry points or module names) as CONFIG arguments
to the command line. These files contain the parameters listed below as
Python variables. Options passed on the command line (see below) override
the values set in configuration files. You can run this command with
``<COMMAND> -H example_config.py`` to create a template config file.
Options:
-n, --names TEXT... Names of the two systems to compare
[required]
-p, --predictions DIRECTORY... Paths where predictions of the two systems
are currently stored. You may also input
predictions from a second-annotator. This
application will adequately handle it.
[required]
-d, --dataset TEXT A dictionary mapping string keys to
torch.utils.data.dataset.Dataset instances
[required]
-t, --threshold TEXT This number is used to define positives and
negatives from probability maps, and report
F1-scores (a priori). By default, we expect
a set named 'validation' to be available at
the input data. If that is not the case, we
use 'train', if available. You may provide
the name of another dataset to be used for
threshold tuning otherwise. If not set, or
a string is input, threshold tunning is done
per system, individually. Optionally, you
may also provide a floating-point number
between [0.0, 1.0] as the threshold to use
for both systems. [default: validation;
required]
-e, --evaluate TEXT Name of the dataset to evaluate [default:
test; required]
-S, --steps INTEGER This number is used to define the number of
threshold steps to consider when evaluating
the highest possible F1-score on train/test
data. [default: 1000; required]
-s, --size INTEGER... This is a tuple with two values indicating
the size of windows to be used for sliding
window analysis. The values represent
height and width respectively. [default:
128, 128; required]
-t, --stride INTEGER... This is a tuple with two values indicating
the stride of windows to be used for sliding
window analysis. The values represent
height and width respectively. [default:
32, 32; required]
-f, --figure TEXT The name of a performance figure (e.g.
f1_score, or jaccard) to use when comparing
performances [default: accuracy; required]
-o, --output-folder PATH Path where to store visualizations
-R, --remove-outliers / --no-remove-outliers
If set, removes outliers from both score
distributions before running statistical
analysis. Outlier removal follows a 1.5 IQR
range check from the difference in figures
between both systems and assumes most of the
distribution is contained within that range
(like in a normal distribution) [default:
False; required]
-R, --remove-zeros / --no-remove-zeros
If set, removes instances from the
statistical analysis in which both systems
had a performance equal to zero. [default:
False; required]
-x, --parallel INTEGER Set the number of parallel processes to use
when running using multiprocessing. A value
of zero uses all reported cores. [default:
1; required]
-k, --checkpoint-folder PATH Path where to store checkpointed versions of
sliding window performances
-v, --verbose Increase the verbosity level from 0 (only
error messages) to 1 (warnings), 2 (log
messages), 3 (debug information) by adding
the --verbose option as often as desired
(e.g. '-vvv' for debug).
-H, --dump-config FILENAME Name of the config file to be generated
-h, -?, --help Show this message and exit.
Examples:
1. Runs a significance test using as base the calculated predictions of two
different systems, on the **same** dataset:
$ bob binseg significance -vv drive --names system1 system2 --predictions=path/to/predictions/system-1 path/to/predictions/system-2
2. By default, we use a "validation" dataset if it is available, to infer
the a priori threshold for the comparison of two systems. Otherwise,
you may need to specify the name of a set to be used as validation set
for choosing a threshold. The same goes for the set to be used for
testing the hypothesis - by default we use the "test" dataset if it is
available; otherwise, specify one explicitly.
$ bob binseg significance -vv drive --names system1 system2 --predictions=path/to/predictions/system-1 path/to/predictions/system-2 --threshold=train --evaluate=alternate-test
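Sliding-window analysis can be slow on large images. The documented ``--parallel`` option distributes the work (a value of zero uses all reported cores), while ``--size`` and ``--stride`` control the analysis windows. A sketch using only options listed above:
$ bob binseg significance -vv drive --names system1 system2 --predictions=path/to/predictions/system-1 path/to/predictions/system-2 --parallel=0 --size 128 128 --stride 32 32 --figure=f1_score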