Python API for bob.bio.base

Generic functions

Functions dealing with resources

bob.bio.base.valid_keywords The keywords for which resources are defined.

Miscellaneous functions

bob.bio.base.get_config() Returns a string containing the configuration information.

Tools to run recognition experiments

Command line generation

Controlling of elements

Preprocessing

Feature Extraction

Algorithm

Scoring

Details

bob.bio.base.valid_keywords

Valid keywords, for which resources are defined, are ('database', 'preprocessor', 'extractor', 'algorithm', 'grid')

bob.bio.base.get_config()

Returns a string containing the configuration information.

class bob.bio.base.Singleton(decorated)

A non-thread-safe helper class to ease implementing singletons. This should be used as a decorator – not a metaclass – to the class that should be a singleton.

The decorated class can define one __init__ function that takes an arbitrary list of parameters.

To get the singleton instance, use the instance() method. Trying to use __call__() will result in a TypeError being raised.

Limitations:

  • The decorated class cannot be inherited from.
  • The documentation of the decorated class is replaced with the documentation of this class.
__call__()

create(*args, **kwargs)

Creates the singleton instance, by passing the given parameters to the class’ constructor.

instance()

Returns the singleton instance. The function create() must have been called before.
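The decorator behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class name SingletonSketch, the example class Settings, and the RuntimeError raised when instance() is called before create() are assumptions made for illustration only.

```python
class SingletonSketch(object):
    """Illustrative decorator class; not the bob.bio.base implementation."""

    def __init__(self, decorated):
        # Keep the decorated class and defer building its only instance.
        self._decorated = decorated
        self._instance = None

    def create(self, *args, **kwargs):
        # Forward the given parameters to the constructor of the decorated class.
        self._instance = self._decorated(*args, **kwargs)

    def instance(self):
        # Assumption: fail loudly if create() has not been called yet.
        if self._instance is None:
            raise RuntimeError("create() must be called before instance()")
        return self._instance

    def __call__(self):
        # Direct construction is refused, as described in the text above.
        raise TypeError("Singletons must be accessed through instance().")


@SingletonSketch
class Settings(object):  # hypothetical example class
    def __init__(self, directory):
        self.directory = directory


Settings.create("results")
first = Settings.instance()
second = Settings.instance()
```

Because the name Settings now refers to the wrapper object rather than to a class, it cannot be subclassed, which matches the first limitation listed above.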

bob.bio.base.check_file(filename, force, expected_file_size=1)

Checks if the file with the given filename exists and has a size greater than or equal to expected_file_size. If the file is too small, or if the force option is set to True, the file is removed. This function returns True if the file exists (and has not been removed), otherwise False.
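A rough sketch of the logic described for this function, written only with the standard library, might look as follows. The name check_file_sketch and the example file name are hypothetical, and the real function may differ in detail.

```python
import os
import tempfile


def check_file_sketch(filename, force, expected_file_size=1):
    """Return True if the file exists and is kept, False otherwise."""
    if not os.path.exists(filename):
        return False
    # Remove files that are too small, or any file when force is set.
    if force or os.path.getsize(filename) < expected_file_size:
        os.remove(filename)
        return False
    return True


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "features.hdf5")
    missing = check_file_sketch(path, force=False)   # file does not exist yet
    with open(path, "wb") as f:
        f.write(b"data")
    kept = check_file_sketch(path, force=False)      # large enough, so kept
    forced = check_file_sketch(path, force=True)     # removed because of force
    exists_afterwards = os.path.exists(path)
```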

bob.bio.base.close_compressed(filename, hdf5_file, compression_type='bz2', create_link=False)

Closes the compressed hdf5_file that was opened with open_compressed. When the file was opened for writing (using the ‘w’ flag in open_compressed), the created HDF5 file is compressed into the given file name. To be able to read the data using the real tools, a link with the correct extension is created when create_link is set to True.

bob.bio.base.database_directories(strip=['dummy'], replacements=None)

Returns a dictionary of original directories for all registered databases.

bob.bio.base.extensions(keywords=valid_keywords) → extensions

Returns a list of packages that define extensions using the given keywords.

Parameters:

keywords : [str]
A list of keywords to load entry points for. Defaults to all valid_keywords.

bob.bio.base.list_resources(keyword, strip=['dummy'])

Returns a string containing a detailed list of resources that are registered with the given keyword.

bob.bio.base.load(file)

Loads data from file. The given file might be an HDF5 file open for reading or a string.

bob.bio.base.load_compressed(filename, compression_type='bz2')

Extracts the data to a temporary HDF5 file using HDF5 and reads its contents. Note that, though the file name is .hdf5, it contains compressed data! Accepted compression types are ‘gz’, ‘bz2’, ‘’

bob.bio.base.load_resource(resource, keyword, imports = ['bob.bio.base'], preferred_extension = None) → resource

Loads the given resource that is registered with the given keyword. The resource can be:

  1. a resource as defined in the setup.py
  2. a configuration file
  3. a string defining the construction of an object. If imports are required for the construction of this object, they can be given as list of strings.
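The third option, constructing an object from a string with the help of a list of imports, can be sketched as below. This is only an illustration of the idea; the helper name construct_from_string is hypothetical, the library may resolve such strings differently, and strings evaluated this way must come from a trusted source.

```python
def construct_from_string(resource, imports=()):
    """Evaluate a construction string after importing the listed modules."""
    namespace = {}
    for module_name in imports:
        # Make each requested module available to the evaluated expression.
        exec("import %s" % module_name, namespace)
    # Only evaluate strings from trusted sources: eval runs arbitrary code.
    return eval(resource, namespace)


created = construct_from_string(
    "collections.OrderedDict(a=1, b=2)", imports=["collections"]
)
```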

Parameters:

resource : str
Any string interpretable as a resource (see above).
keyword : str
A valid resource keyword, can be one of valid_keywords.
imports : [str]
A list of strings defining which modules to import, when constructing new objects (option 3).
preferred_extension : str or None
When several resources with the same name are found in different extensions (in different bob.bio packages), this specifies the preferred extension to load the resource from. If not specified, the extension that is not bob.bio.base is selected.

Returns:

resource : object
The resulting resource object is returned, either read from file or resource, or created newly.

bob.bio.base.open_compressed(filename, open_flag='r', compression_type='bz2')

Opens a compressed HDF5File with the given opening flags. For the ‘r’ flag, the given compressed file will be extracted to a local space. For ‘w’, an empty HDF5File is created. In any case, the opened HDF5File is returned, which needs to be closed using the close_compressed() function.

bob.bio.base.read_config_file(filename, keyword = None) → config

Use this function to read the given configuration file. If a keyword is specified, only the configuration according to this keyword is returned. Otherwise a dictionary of the configurations read from the configuration file is returned.
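The two return modes (a single variable versus the whole namespace) can be sketched with the standard library, assuming the configuration file is an ordinary Python file. The sketch uses runpy.run_path as a stand-in and is not how the library necessarily reads the file; read_config_sketch and the file contents are hypothetical.

```python
import os
import runpy
import tempfile
import types


def read_config_sketch(filename, keyword=None):
    """Execute a Python configuration file and return one variable or all."""
    variables = runpy.run_path(filename)
    if keyword is not None:
        # Only the object stored under the requested name is returned.
        return variables[keyword]
    # Otherwise return everything the file defined, as a simple namespace.
    public = {k: v for k, v in variables.items() if not k.startswith("__")}
    return types.SimpleNamespace(**public)


with tempfile.TemporaryDirectory() as directory:
    config_path = os.path.join(directory, "config.py")
    with open(config_path, "w") as f:
        f.write("preprocessor = 'my-preprocessor'\n")
        f.write("extractor = 'my-extractor'\n")
    only_extractor = read_config_sketch(config_path, keyword="extractor")
    whole_config = read_config_sketch(config_path)
```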

Parameters:

filename : str
The name of the configuration file to read.
keyword : str or None
If specified, only the contents of the variable with the given name is returned. If None, the whole configuration is returned (a local namespace)

Returns:

config : object or namespace
If keyword is specified, the object inside the configuration with the given name is returned. Otherwise, the whole configuration is returned (as a local namespace).

bob.bio.base.resource_keys(keyword, exclude_packages=[], strip=['dummy'])

Reads and returns all resources that are registered with the given keyword. Entry points from the given exclude_packages are ignored.

bob.bio.base.save(data, file, compression=0)

Saves the data to file using HDF5. The given file might be an HDF5 file open for writing, or a string. If the given data contains a save method, this method is called with the given HDF5 file. Otherwise the data is written to the HDF5 file using the given compression.

bob.bio.base.save_compressed(data, filename, compression_type='bz2', create_link=False)

Saves the data to a temporary file using HDF5. Afterwards, the file is compressed using the given compression method and saved using the given file name. Note that, though the file name will be .hdf5, it will contain compressed data! Accepted compression types are ‘gz’, ‘bz2’, ‘’
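The "write to a temporary file, then compress into the target name" pattern can be sketched as follows. The sketch deliberately stores plain bytes instead of HDF5 content and uses the bz2 module directly; the container format actually used by the library is not stated here, and both helper names are hypothetical.

```python
import bz2
import os
import shutil
import tempfile


def save_compressed_sketch(payload, filename):
    """Write payload (bytes) to a temporary file, then bz2-compress it."""
    handle, temporary = tempfile.mkstemp(suffix=".hdf5")
    try:
        with os.fdopen(handle, "wb") as f:
            f.write(payload)
        # The target keeps its given name although its content is compressed.
        with open(temporary, "rb") as source, bz2.open(filename, "wb") as target:
            shutil.copyfileobj(source, target)
    finally:
        os.remove(temporary)


def load_compressed_sketch(filename):
    """Decompress the file and return the original bytes."""
    with bz2.open(filename, "rb") as source:
        return source.read()


original = b"example payload" * 100
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "scores.hdf5")
    save_compressed_sketch(original, path)
    restored = load_compressed_sketch(path)
    with open(path, "rb") as f:
        magic = f.read(3)  # bz2 streams start with the bytes "BZh"
```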

bob.bio.base.score_fusion_strategy(strategy_name='avarage')

Returns a function to compute a fusion strategy between different scores.

Different strategies are employed:

  • 'average' : The averaged score is computed using the numpy.average() function.
  • 'min' : The minimum score is computed using the min() function.
  • 'max' : The maximum score is computed using the max() function.
  • 'median' : The median score is computed using the numpy.median() function.
  • None is also accepted, in which case None is returned.
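The mapping from strategy names to functions can be sketched as a simple lookup. To keep the example free of third-party dependencies, statistics.mean and statistics.median stand in for numpy.average() and numpy.median(); the function name and the ValueError for unknown names are assumptions.

```python
import statistics


def fusion_strategy_sketch(strategy_name="average"):
    """Return a function that fuses a list of scores, or None."""
    strategies = {
        "average": statistics.mean,   # stand-in for numpy.average
        "min": min,
        "max": max,
        "median": statistics.median,  # stand-in for numpy.median
        None: None,
    }
    if strategy_name not in strategies:
        # Assumption: unknown names are rejected explicitly.
        raise ValueError("unknown fusion strategy: %r" % (strategy_name,))
    return strategies[strategy_name]


scores = [0.2, 0.9, 0.4]
fused = {
    name: fusion_strategy_sketch(name)(scores)
    for name in ("average", "min", "max", "median")
}
```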
bob.bio.base.selected_elements(list_of_elements, desired_number_of_elements=None)

Returns a list of elements that are sub-selected from the given list (or the list itself, if its length is smaller). These elements are selected such that they are evenly spread over the whole list.

bob.bio.base.selected_indices(total_number_of_indices, desired_number_of_indices=None)

Returns a list of indices that will contain exactly the number of desired indices (or the number of total items in the list, if this is smaller). These indices are selected such that they are evenly spread over the whole sequence.
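One way to spread a fixed number of indices evenly over a sequence is to take the midpoint of each of the equally sized sub-ranges. The sketch below uses that rule as an assumption; the exact rounding used by the library is not specified here, and both function names are hypothetical.

```python
def selected_indices_sketch(total, desired=None):
    """Return at most `desired` indices spread evenly over range(total)."""
    if desired is None or desired >= total:
        return list(range(total))
    if desired <= 0:
        return []
    step = total / float(desired)
    # Pick the midpoint of each of the `desired` equally sized sub-ranges.
    return [int((i + 0.5) * step) for i in range(desired)]


def selected_elements_sketch(elements, desired=None):
    """Sub-select elements using the evenly spread indices."""
    return [elements[i] for i in selected_indices_sketch(len(elements), desired)]


five_of_ten = selected_indices_sketch(10, 5)
all_of_three = selected_indices_sketch(3, 10)
two_letters = selected_elements_sketch(list("abcdefghij"), 2)
```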

class bob.bio.base.tools.FileSelector

This class provides shortcuts for selecting different files for different stages of the verification process.

It communicates with the database and provides lists of file names for all steps of the tool chain.

Todo

Find a way that this class’ methods get correctly documented, instead of the bob.bio.base.Singleton wrapper class.

Parameters:

database : bob.bio.base.database.Database or derived
The database object that provides the list of files.
preprocessed_directory : str
The directory, where preprocessed data should be written to.
extractor_file : str
The filename, where the extractor should be written to (if any).
extracted_directory : str
The directory, where extracted features should be written to.
projector_file : str
The filename, where the projector should be written to (if any).
projected_directory : str
The directory, where projected features should be written to (if required).
enroller_file : str
The filename, where the enroller should be written to (if required).
model_directories : (str, str)
The directories, where models and t-norm models should be written to.
score_directories : (str, str)
The directories, where score files for no-norm and ZT-norm should be written to.
zt_score_directories : (str, str, str, str, str) or None
If given, specify the directories, where intermediate score files required to compute the ZT-norm should be written. The 5 directories are for 1: normal scores; 2: Z-scores; 3: T-scores; 4: ZT-scores; 5: ZT-samevalue scores.
default_extension : str
The default extension of all intermediate files.
compressed_extension : str
The extension for writing compressed score files. By default, no compression is performed.
__call__()

create(*args, **kwargs)

Creates the singleton instance, by passing the given parameters to the class’ constructor.

instance()

Returns the singleton instance. The function create() must have been called before.

bob.bio.base.tools.calibrate(compute_zt_norm, groups=['dev', 'eval'], prior=0.5, write_compressed=False)

Calibrates the score files by learning a linear calibration from the dev files (first element of the groups) and applying it to all groups.

This function is intended to compute the calibration parameters on the scores of the development set using the bob.learn.linear.CGLogRegTrainer. Afterward, both the scores of the development and evaluation sets are calibrated and written to file. For ZT-norm scores, the calibration is performed independently, if enabled. The names of the calibrated score files that should be written are obtained from the bob.bio.base.tools.FileSelector.

Parameters:

compute_zt_norm : bool
If set to True, also score files for ZT-norm are calibrated.
groups : some of ('dev', 'eval')
The list of groups, for which score files should be calibrated. The first of the given groups is used to train the logistic regression parameters, while the calibration is performed for all given groups.
write_compressed : bool
If enabled, calibrated score files are compressed as .tar.bz2 files.

bob.bio.base.tools.command_line(cmdline) → str

Converts the given options to a string that can be executed in a terminal. Parameters are enclosed into '...' quotes so that the command line can interpret them (e.g., if they contain spaces or special characters).
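A minimal sketch of such a conversion is given below. It assumes that option names starting with "-" are left unquoted while all other parameters are wrapped in single quotes; this rule and the name command_line_sketch are illustrative assumptions. The sketch does not escape single quotes inside a parameter; the standard library function shlex.quote handles that case.

```python
def command_line_sketch(cmdline):
    """Join command line options into one string, quoting the values."""
    parts = []
    for parameter in cmdline:
        if parameter.startswith("-"):
            # Assumption: option names are passed through unchanged.
            parts.append(parameter)
        else:
            # Values are quoted so that spaces survive copy-pasting.
            parts.append("'%s'" % parameter)
    return " ".join(parts)


text = command_line_sketch(["--database", "my db", "--verbose"])
```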

Parameters:

cmdline : [str]
A list of command line options to be converted into a string.

Returns:

str : str
The command line string that can be copy-pasted into the terminal.

bob.bio.base.tools.command_line_parser(description=__doc__, exclude_resources_from=[]) → parsers

Creates an argparse.ArgumentParser object that includes the minimum set of command line options (which is not so few). The description can be overwritten, but has a (small) default.

Included in the parser, several groups are defined. Each group specifies a set of command line options. For the configurations, registered resources are listed, which can be limited by the exclude_resources_from list of extensions.

It returns a dictionary, containing the parser object itself (in the 'main' keyword), and a list of command line groups.
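The shape of the returned dictionary, and how further options can be added to one of its groups, can be sketched with argparse alone. Only the 'main' key is taken from the text above; the group keys "config" and "dir" and all option names are hypothetical.

```python
import argparse


def parser_sketch(description="Runs a recognition experiment"):
    """Build a parser with option groups and return them in a dictionary."""
    parser = argparse.ArgumentParser(description=description)
    config_group = parser.add_argument_group("config")
    config_group.add_argument("--database", required=True)
    dir_group = parser.add_argument_group("directories")
    dir_group.add_argument("--result-directory", default="results")
    # The parser itself is stored under 'main'; group keys are hypothetical.
    return {"main": parser, "config": config_group, "dir": dir_group}


parsers = parser_sketch()
# Additional options can be attached to any group before parsing.
parsers["config"].add_argument("--my-option", type=int, default=3)
args = parsers["main"].parse_args(["--database", "example-db", "--my-option", "7"])
```

Options added to a group are parsed by the main parser; the groups only affect how the help text is organized.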

Parameters:

description : str
The documentation of the script.
exclude_resources_from : [str]
A list of extension packages, for which resources should not be listed.

Returns:

parsers : dict
A dictionary of parser groups, with the main parser under the ‘main’ key. Feel free to add more options to any of the parser groups.

bob.bio.base.tools.compute_scores(algorithm, compute_zt_norm, force=False, indices=None, groups=['dev', 'eval'], types=['A', 'B', 'C', 'D'], write_compressed=False)

Computes the scores for the given groups.

This function computes all scores for the experiment, and writes them to files, one per model. When compute_zt_norm is enabled, scores are computed for all four matrices, i.e. A: normal scores; B: Z-norm scores; C: T-norm scores; D: ZT-norm scores and ZT-samevalue scores. By default, scores are computed for both groups 'dev' and 'eval'.

Parameters:

algorithm : bob.bio.base.algorithm.Algorithm or derived
The algorithm, used for enrolling models and writing them to file.
force : bool
If given, files are regenerated, even if they already exist.
compute_zt_norm : bool
If set to True, also ZT-norm scores are computed.
indices : (int, int) or None

If specified, scores are computed only for the models in the given index range range(begin, end). This is usually given, when parallel threads are executed.

Note

The probe files are not limited by the indices.

groups : some of ('dev', 'eval')
The list of groups, for which scores should be computed.
types : some of ['A', 'B', 'C', 'D']
A list of score types to be computed. If compute_zt_norm = False, only the 'A' scores are computed.
write_compressed : bool
If enabled, score files are compressed as .tar.bz2 files.

bob.bio.base.tools.concatenate(compute_zt_norm, groups=['dev', 'eval'], write_compressed=False)

Concatenates all results into one (or two) score files per group.

Score files, which were generated per model, are concatenated into a single score file, which can be interpreted by bob.measure.load.split_four_column(). The score files are always re-computed, regardless of whether they exist or not.
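The concatenation step itself amounts to appending per-model text files into one file that is always overwritten. The sketch below assumes a whitespace-separated four-column layout (claimed id, real id, probe label, score); the file names, the layout details, and the function name are illustrative assumptions.

```python
import os
import tempfile


def concatenate_sketch(model_score_files, output_file):
    """Append all per-model score files into one file, overwriting it."""
    with open(output_file, "w") as target:
        for name in model_score_files:
            with open(name) as source:
                for line in source:
                    # Make sure every score line ends with a newline.
                    target.write(line if line.endswith("\n") else line + "\n")


with tempfile.TemporaryDirectory() as directory:
    inputs = []
    for model_id, score in (("m1", 0.9), ("m2", 0.1)):
        path = os.path.join(directory, model_id + ".txt")
        with open(path, "w") as f:
            # Assumed columns: claimed id, real id, probe label, score.
            f.write("%s m1 probe1 %f\n" % (model_id, score))
        inputs.append(path)
    combined = os.path.join(directory, "scores-dev")
    concatenate_sketch(inputs, combined)
    with open(combined) as f:
        lines = f.read().splitlines()
```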

Parameters:

compute_zt_norm : bool
If set to True, also score files for ZT-norm are concatenated.
groups : some of ('dev', 'eval')
The list of groups, for which score files should be concatenated.
write_compressed : bool
If enabled, concatenated score files are compressed as .tar.bz2 files.

bob.bio.base.tools.enroll(algorithm, extractor, compute_zt_norm, indices=None, groups=['dev', 'eval'], types=['N', 'T'], force=False)

Enrolls the models for the given groups, possibly for both models and T-norm models. This function uses the extracted or projected features to compute the models, depending on your setup of the given algorithm.

The given algorithm is used to enroll all models required for the current experiment. It writes the models into the directories specified by the bob.bio.base.tools.FileSelector. By default, if target files already exist, they are not re-created.

The extractor is only used to load features in a coherent way.

Parameters:

algorithm : bob.bio.base.algorithm.Algorithm or derived
The algorithm, used for enrolling models and writing them to file.
extractor : bob.bio.base.extractor.Extractor or derived
The extractor, used for reading the extracted features, if the algorithm enrolls models from unprojected data.
compute_zt_norm : bool
If set to True and ‘T’ is part of the types, also T-norm models are enrolled.
indices : (int, int) or None
If specified, only the models for the given index range range(begin, end) should be enrolled. This is usually given, when parallel threads are executed.
groups : some of ('dev', 'eval')
The list of groups, for which models should be enrolled.
force : bool
If given, files are regenerated, even if they already exist.

bob.bio.base.tools.extract(extractor, preprocessor, groups=None, indices=None, force=False)

Extracts features from the preprocessed data using the given extractor.

The given extractor is used to extract all features required for the current experiment. It writes the extracted data into the directory specified by the bob.bio.base.tools.FileSelector. By default, if target files already exist, they are not re-created.

The preprocessor is only used to load the data in a coherent way.

Parameters:

extractor : bob.bio.base.extractor.Extractor or derived
The extractor, used for extracting and writing the features.
preprocessor : bob.bio.base.preprocessor.Preprocessor or derived
The preprocessor, used for reading the preprocessed data.
groups : some of ('world', 'dev', 'eval') or None
The list of groups, for which the data should be extracted.
indices : (int, int) or None
If specified, only the features for the given index range range(begin, end) should be extracted. This is usually given, when parallel threads are executed.
force : bool
If given, files are regenerated, even if they already exist.

bob.bio.base.tools.groups(args) → groups

Returns the groups, for which the files must be preprocessed, and features must be extracted and projected. This function should be used in order to eliminate the training files (the 'world' group), when no training is required in this experiment.

Parameters:

args : namespace
The interpreted command line arguments as returned by the initialize() function.

Returns:

groups : [str]
A list of groups, for which data needs to be treated.

bob.bio.base.tools.indices(list_to_split, number_of_parallel_jobs, task_id=None)

This function returns the first and last index for the files for the current job ID. If no job id is set (e.g., because a sub-job is executed locally), it simply returns all indices.
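Splitting a list of files between parallel jobs can be sketched as below. The sketch assumes that job IDs start at 1, that each job receives a contiguous block of equal size (the last block may be shorter), and that the result is a (begin, end) pair usable as range(begin, end); these are assumptions for illustration, not a statement about the library's behaviour.

```python
import math


def indices_sketch(list_to_split, number_of_parallel_jobs, task_id=None):
    """Return the (begin, end) index pair handled by the given job."""
    total = len(list_to_split)
    if task_id is None:
        # No job ID set: the whole list is processed locally.
        return (0, total)
    per_job = int(math.ceil(total / float(number_of_parallel_jobs)))
    # Assumption: task IDs are 1-based, as in many grid engines.
    begin = min((task_id - 1) * per_job, total)
    end = min(task_id * per_job, total)
    return (begin, end)


files = ["file%02d" % i for i in range(10)]
ranges = [indices_sketch(files, 3, task_id) for task_id in (1, 2, 3)]
everything = indices_sketch(files, 3)
```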

bob.bio.base.tools.initialize(parsers, command_line_parameters = None, skips = []) → args

Parses the command line and arranges the arguments accordingly. Afterward, it loads the resources for the database, preprocessor, extractor, algorithm and grid (if specified), and stores the results into the returned args.

This function also initializes the FileSelector instance by arranging the directories and files according to the command line parameters.

If skips are given, an ‘--execute-only’ parameter is added to the parser, and the corresponding skips are selected.

Parameters:

parsers : dict
The dictionary of command line parsers, as returned from command_line_parser(). Additional arguments might have been added.
command_line_parameters : [str] or None
The command line parameters that should be interpreted. By default, the parameters specified by the user on command line are considered.
skips : [str]
A list of possible --skip-... options to be added and evaluated automatically.

Returns:

args : namespace

A namespace of arguments as read from the command line.

Note

The database, preprocessor, extractor, algorithm and grid (if specified) are actual instances of the corresponding classes.

bob.bio.base.tools.preprocess(preprocessor, groups=None, indices=None, force=False)

Preprocesses the original data of the database with the given preprocessor.

The given preprocessor is used to preprocess all data required for the current experiment. It writes the preprocessed data into the directory specified by the bob.bio.base.tools.FileSelector. By default, if target files already exist, they are not re-created.
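The interplay of the index range and the force option, which also appears in the extraction and projection steps, can be sketched generically. The function below is a hypothetical stand-in that writes text files; it does not use a preprocessor object or the FileSelector.

```python
import os
import tempfile


def process_files_sketch(inputs, outputs, process, indices=None, force=False):
    """Process inputs[begin:end], skipping existing outputs unless forced."""
    begin, end = indices if indices is not None else (0, len(inputs))
    written = []
    for i in range(begin, end):
        if os.path.exists(outputs[i]) and not force:
            # Target already exists and is not re-created by default.
            continue
        with open(outputs[i], "w") as f:
            f.write(process(inputs[i]))
        written.append(outputs[i])
    return written


with tempfile.TemporaryDirectory() as directory:
    inputs = ["a", "b", "c", "d"]
    outputs = [os.path.join(directory, name + ".txt") for name in inputs]
    first_run = process_files_sketch(inputs, outputs, str.upper, indices=(0, 2))
    second_run = process_files_sketch(inputs, outputs, str.upper)
    forced_run = process_files_sketch(inputs, outputs, str.upper, force=True)
    counts = (len(first_run), len(second_run), len(forced_run))
```

The first call handles only the first two files, the second call fills in the remaining two, and the forced call regenerates all four.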

Parameters:

preprocessor : bob.bio.base.preprocessor.Preprocessor or derived
The preprocessor, which should be applied to all data.
groups : some of ('world', 'dev', 'eval') or None
The list of groups, for which the data should be preprocessed.
indices : (int, int) or None
If specified, only the data for the given index range range(begin, end) should be preprocessed. This is usually given, when parallel threads are executed.
force : bool
If given, files are regenerated, even if they already exist.

bob.bio.base.tools.project(algorithm, extractor, groups=None, indices=None, force=False)

Projects the features for all files of the database.

The given algorithm is used to project all features required for the current experiment. It writes the projected data into the directory specified by the bob.bio.base.tools.FileSelector. By default, if target files already exist, they are not re-created.

The extractor is only used to load the data in a coherent way.

Parameters:

algorithm : bob.bio.base.algorithm.Algorithm or derived
The algorithm, used for projecting features and writing them to file.
extractor : bob.bio.base.extractor.Extractor or derived
The extractor, used for reading the extracted features, which should be projected.
groups : some of ('world', 'dev', 'eval') or None
The list of groups, for which the data should be projected.
indices : (int, int) or None
If specified, only the features for the given index range range(begin, end) should be projected. This is usually given, when parallel threads are executed.
force : bool
If given, files are regenerated, even if they already exist.

bob.bio.base.tools.read_features(file_names, extractor, split_by_client = False) → extracted

Reads the extracted features from file_names using the given extractor. If split_by_client is set to True, it is assumed that the file_names are already sorted by client.
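The effect of split_by_client on the shape of the input and output can be sketched with a plain callable in place of the extractor's reading method. The names read_all_sketch and fake_read are hypothetical.

```python
def read_all_sketch(file_names, read, split_by_client=False):
    """Read all files, keeping the per-client grouping if requested."""
    if split_by_client:
        # file_names is a list of lists, one inner list per client.
        return [[read(name) for name in client_files] for client_files in file_names]
    return [read(name) for name in file_names]


fake_read = str.upper  # hypothetical stand-in for reading a feature file
flat = read_all_sketch(["a", "b"], fake_read)
grouped = read_all_sketch([["a", "b"], ["c"]], fake_read, split_by_client=True)
```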

Parameters:

file_names : [str] or [[str]]
A list of names of files to be read. If split_by_client = True, file names are supposed to be split into groups.
extractor : bob.bio.base.extractor.Extractor or derived
The extractor, used for reading the extracted features.
split_by_client : bool
Indicates if the given file_names are split into groups.

Returns:

extracted : [object] or [[object]]
The list of extracted features, in the same order as in the file_names.

bob.bio.base.tools.read_preprocessed_data(file_names, preprocessor, split_by_client = False) → preprocessed

Reads the preprocessed data from file_names using the given preprocessor. If split_by_client is set to True, it is assumed that the file_names are already sorted by client.

Parameters:

file_names : [str] or [[str]]
A list of names of files to be read. If split_by_client = True, file names are supposed to be split into groups.
preprocessor : bob.bio.base.preprocessor.Preprocessor or derived
The preprocessor, which can read the preprocessed data.
split_by_client : bool
Indicates if the given file_names are split into groups.

Returns:

preprocessed : [object] or [[object]]
The list of preprocessed data, in the same order as in the file_names.

bob.bio.base.tools.train_enroller(algorithm, extractor, force=False)

Trains the model enroller using the extracted or projected features, depending on your setup of the algorithm.

This function should only be called, when the algorithm actually requires enroller training. The enroller of the given algorithm is trained using extracted or projected features. It writes the enroller to the file specified by the bob.bio.base.tools.FileSelector. By default, if the target file already exists, it is not re-created.

Parameters:

algorithm : bob.bio.base.algorithm.Algorithm or derived
The algorithm, in which the enroller should be trained. It is assured that the projector file is read (if required) before the enroller training is started.
extractor : bob.bio.base.extractor.Extractor or derived
The extractor, used for reading the training data, if unprojected features are used for enroller training.
force : bool
If given, the enroller file is regenerated, even if it already exists.

bob.bio.base.tools.train_extractor(extractor, preprocessor, force=False)

Trains the feature extractor using preprocessed data of the 'world' group, if the feature extractor requires training.

This function should only be called, when the extractor actually requires training. The given extractor is trained using preprocessed data. It writes the extractor to the file specified by the bob.bio.base.tools.FileSelector. By default, if the target file already exists, it is not re-created.

Parameters:

extractor : bob.bio.base.extractor.Extractor or derived
The extractor to be trained.
preprocessor : bob.bio.base.preprocessor.Preprocessor or derived
The preprocessor, used for reading the preprocessed data.
force : bool
If given, the extractor file is regenerated, even if it already exists.

bob.bio.base.tools.train_projector(algorithm, extractor, force=False)

Trains the feature projector using extracted features of the 'world' group, if the algorithm requires projector training.

This function should only be called, when the algorithm actually requires projector training. The projector of the given algorithm is trained using extracted features. It writes the projector to the file specified by the bob.bio.base.tools.FileSelector. By default, if the target file already exists, it is not re-created.

Parameters:

algorithm : bob.bio.base.algorithm.Algorithm or derived
The algorithm, in which the projector should be trained.
extractor : bob.bio.base.extractor.Extractor or derived
The extractor, used for reading the training data.
force : bool
If given, the projector file is regenerated, even if it already exists.

bob.bio.base.tools.write_info(args, command_line_parameters, executable)

Writes information about the current experimental setup into a file specified on command line.

Parameters:

args : namespace
The interpreted command line arguments as returned by the initialize() function.
command_line_parameters : [str] or None
The command line parameters that have been interpreted. If None, the parameters specified by the user on command line are considered.
executable : str
The name of the executable (such as './bin/verify.py') that is used to run the experiments.

bob.bio.base.tools.zt_norm(groups=['dev', 'eval'], write_compressed=False)

Computes ZT-Norm using the previously generated A, B, C, D and D-samevalue matrix files.

This function computes the ZT-norm scores for all model ids for all desired groups and writes them into files defined by the bob.bio.base.tools.FileSelector. It loads the A, B, C, D and D-samevalue matrix files that need to be computed beforehand.

Parameters:

groups : some of ('dev', 'eval')
The list of groups, for which ZT-norm should be applied.
write_compressed : bool
If enabled, score files are compressed as .tar.bz2 files.