Python API for bob.bio.base¶
Pipelines¶
Database¶
- Base class for PipelineSimple databases
- Returns
- Returns references to enroll biometric references
- Returns probes to score biometric references
Database implementations¶
Biometric Algorithm¶
- Describes a base biometric comparator for the PipelineSimple Biometric Algorithm.
- Creates enroll or probe templates from multiple sets of features.
- Computes the similarity score between all enrollment and probe templates.
Writing Scores¶
- Defines base methods to read, write and concatenate scores for bob.bio.base.pipelines.BioAlgorithm.
- Read and write scores using the four-column format.
- Read and write scores in CSV format, shipping all metadata with the scores.
Assembling the pipeline¶
- The simplest possible pipeline
- Apply Z, T or ZT score normalization on top of the simple pipeline
Creating Transformers from legacy constructs¶
- Scikit-learn transformer for legacy preprocessors
- Scikit-learn transformer for legacy extractors
Legacy Constructs¶
Base classes¶
- This is the base class for all preprocessors.
- This is the base class for all feature extractors.
Implementations¶
- A distance algorithm to compare feature vectors.
- Algorithm for computing UBM and Gaussian Mixture Models of the features.
- ISV transformer and bioalgorithm to be used in pipelines
- JFA transformer and bioalgorithm to be used in pipelines
- An OrdinalEncoder that converts reference_id strings to integers.
- The AT&T (aka ORL) database of faces (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html).
Generic functions¶
Functions dealing with resources¶
- Loads the given resource that is registered with the given keyword.
- Use this function to read the given configuration file.
- Reads and returns all resources that are registered with the given keyword.
- Returns a list of packages that define extensions using the given keywords.
- Keywords for which resources are defined.
Miscellaneous functions¶
- Returns a string containing the configuration information.
- Returns a function to compute a fusion strategy between different scores.
- Returns a list of elements that are sub-selected from the given list (or the list itself, if its length is smaller).
- Returns a list of indices that will contain exactly the number of desired indices (or the number of total items in the list, if this is smaller).
Loading data¶
- Opens the given score file for reading.
- Loads the scores from the given score file and yields its lines.
- Loads the scores from the given score file and splits them into positives and negatives.
- Loads scores to compute CMC curves.
- Loads a score set from a single file and yields its lines.
- Loads a score set from a single file and splits the scores.
- Loads scores to compute CMC curves from a file in four column format.
- Loads a score set from a single file and yields its lines.
- Loads a score set from a single file and splits the scores.
- Loads scores to compute CMC curves from a file in five column format.
Plotting¶
- Handles the plotting of CMC curves
- Handles the plotting of DIR curves
- Histograms for biometric scores
Details¶
- bob.bio.base.check_file(filename, force, expected_file_size=1)[source]¶
Checks if the file with the given filename exists and has a size greater than or equal to expected_file_size. If the file is too small, or if the force option is set to True, the file is removed. This function returns True if the file exists (and has not been removed), otherwise False.
- bob.bio.base.close_compressed(filename, hdf5_file, compression_type='bz2', create_link=False)[source]¶
Closes the compressed hdf5_file that was opened with open_compressed. When the file was opened for writing (using the ‘w’ flag in open_compressed), the created HDF5 file is compressed into the given file name. To be able to read the data using the real tools, a link with the correct extension is created when create_link is set to True.
- bob.bio.base.database_directories(strip=['dummy'], replacements=None, package_prefix='bob.bio.')[source]¶
Returns a dictionary of original directories for all registered databases.
- bob.bio.base.extensions(keywords=valid_keywords, package_prefix='bob.bio.') → extensions [source]¶
Returns a list of packages that define extensions using the given keywords.
Parameters:
- keywords : [str]
A list of keywords to load entry points for. Defaults to all bob.bio.base.utils.resources.valid_keywords.
- package_prefix : str
Package namespace, in which we search for entry points, e.g., bob.bio.
- bob.bio.base.filter_missing_files(file_names, split_by_client=False, allow_missing_files=True)[source]¶
This function filters out files that do not exist, but only if allow_missing_files is set to True; otherwise the list of file_names is returned unaltered.
- bob.bio.base.filter_none(data, split_by_client=False)[source]¶
This function filters out None values from the given list (or list of lists, when split_by_client is enabled).
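A minimal usage sketch of these filtering helpers, with hypothetical input values:
from bob.bio.base import filter_none

# Flat list: None entries are dropped.
clean = filter_none([0.2, None, 0.7])
# List of lists (one sub-list per client) when split_by_client is enabled.
clean_per_client = filter_none([[0.2, None], [0.5]], split_by_client=True)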
- bob.bio.base.is_argument_available(argument, method)[source]¶
Check if an argument (or keyword argument) is available in a method.
- Parameters
method – Pointer to the method
- bob.bio.base.list_resources(keyword, strip=['dummy'], package_prefix='bob.bio.', verbose=False, packages=None)[source]¶
Returns a string containing a detailed list of resources that are registered with the given keyword.
- bob.bio.base.load(file)[source]¶
Loads data from file. The given file might be an HDF5 file open for reading or a string.
- bob.bio.base.load_compressed(filename, compression_type='bz2')[source]¶
Extracts the data to a temporary HDF5 file using HDF5 and reads its contents. Note that, though the file name is .hdf5, it contains compressed data! Accepted compression types are ‘gz’, ‘bz2’, ‘’
- bob.bio.base.load_resource(resource, keyword, imports=['bob.bio.base'], package_prefix='bob.bio.', preferred_package=None)[source]¶
Loads the given resource that is registered with the given keyword. The resource can be:
a resource as defined in the setup.py
a configuration file
a string defining the construction of an object. If imports are required for the construction of this object, they can be given as list of strings.
Parameters:
- resource : str
Any string interpretable as a resource (see above).
- keyword : str
A valid resource keyword, can be one of bob.bio.base.utils.resources.valid_keywords.
- imports : [str]
A list of strings defining which modules to import, when constructing new objects (option 3).
- package_prefix : str
Package namespace, in which we search for entry points, e.g., bob.bio.
- preferred_package : str or None
When several resources with the same name are found in different packages (e.g., in different bob.bio or other packages), this specifies the preferred package to load the resource from. If not specified, the extension that is not from bob.bio is selected.
Returns:
- resource : object
The resulting resource object is returned, either read from file or resource, or created newly.
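A brief usage sketch; the resource name "my-database" and the keyword "database" are hypothetical and assume a resource registered via a setuptools entry point:
from bob.bio.base import load_resource

# Load a registered database interface by its entry-point name (hypothetical name).
database = load_resource("my-database", "database")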
- bob.bio.base.open_compressed(filename, open_flag='r', compression_type='bz2')[source]¶
Opens a compressed HDF5File with the given opening flags. For the ‘r’ flag, the given compressed file will be extracted to a local space. For ‘w’, an empty HDF5File is created. In any case, the opened HDF5File is returned, which needs to be closed using the close_compressed() function.
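A minimal sketch of the open/read/close cycle using the signatures documented above; the file name is hypothetical:
from bob.bio.base import open_compressed, close_compressed, load

# Open a bz2-compressed HDF5 file for reading (extracted to a temporary location),
# read its contents with load(), then close it again.
hdf5 = open_compressed("features.hdf5", open_flag="r", compression_type="bz2")
data = load(hdf5)
close_compressed("features.hdf5", hdf5, compression_type="bz2")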
- bob.bio.base.pretty_print(obj, kwargs)[source]¶
Returns a pretty-printed string of the parameters to the constructor of a class, which should be possible to copy-paste on the command line to re-create the object (with few exceptions).
- bob.bio.base.read_config_file(filenames, keyword=None)[source]¶
Use this function to read the given configuration file. If a keyword is specified, only the configuration according to this keyword is returned. Otherwise a dictionary of the configurations read from the configuration file is returned.
Parameters:
- filenames : [str]
A list (potentially empty) of configuration files or resources to read running options from.
- keyword : str or None
If specified, only the contents of the variable with the given name is returned. If None, the whole configuration is returned (a local namespace).
Returns:
- config : object or namespace
If keyword is specified, the object inside the configuration with the given name is returned. Otherwise, the whole configuration is returned (as a local namespace).
- bob.bio.base.read_original_data(biofile, directory, extension)[source]¶
This function reads the original data using the given biofile instance. It simply calls load(directory, extension) from bob.bio.base.database.BioFile or one of its derivatives.
- Parameters
biofile (bob.bio.base.database.BioFile or one of its derivatives) – The file from which to read the original data.
directory (str) – The base directory of the database.
extension (str or None) – The extension of the original data. Might be None if the biofile itself has the extension stored.
- Returns
Whatever biofile.load returns; usually a numpy.ndarray
- Return type
- bob.bio.base.resource_keys(keyword, exclude_packages=[], package_prefix='bob.bio.', strip=['dummy'])[source]¶
Reads and returns all resources that are registered with the given keyword. Entry points from the given exclude_packages are ignored.
- bob.bio.base.save(data, file, compression=0)[source]¶
Saves the data to file using HDF5. The given file might be an HDF5 file open for writing, or a string. If the given data contains a save method, this method is called with the given HDF5 file. Otherwise the data is written to the HDF5 file using the given compression.
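A minimal round-trip sketch using save() and load(); the file name is hypothetical:
import numpy
from bob.bio.base import save, load

# Write an array to an HDF5 file and read it back.
features = numpy.arange(10, dtype=float)
save(features, "features.hdf5")
restored = load("features.hdf5")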
- bob.bio.base.save_compressed(data, filename, compression_type='bz2', create_link=False)[source]¶
Saves the data to a temporary file using HDF5. Afterwards, the file is compressed using the given compression method and saved using the given file name. Note that, though the file name will be .hdf5, it will contain compressed data! Accepted compression types are ‘gz’, ‘bz2’, ‘’
- bob.bio.base.score_fusion_strategy(strategy_name='average')[source]¶
Returns a function to compute a fusion strategy between different scores.
Different strategies are employed:
- 'average': The averaged score is computed using the numpy.average() function.
- 'min': The minimum score is computed using the min() function.
- 'max': The maximum score is computed using the max() function.
- 'median': The median score is computed using the numpy.median() function.
None is also accepted, in which case None is returned.
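A short usage sketch with a few hypothetical scores:
from bob.bio.base import score_fusion_strategy

# Obtain the fusion function and apply it to a list of scores.
fuse = score_fusion_strategy("average")
fused_score = fuse([0.2, 0.5, 0.8])  # uses numpy.average internally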
- bob.bio.base.selected_elements(list_of_elements, desired_number_of_elements=None)[source]¶
Returns a list of elements that are sub-selected from the given list (or the list itself, if its length is smaller). These elements are selected such that they are evenly spread over the whole list.
- bob.bio.base.selected_indices(total_number_of_indices, desired_number_of_indices=None)[source]¶
Returns a list of indices that will contain exactly the number of desired indices (or the number of total items in the list, if this is smaller). These indices are selected such that they are evenly spread over the whole sequence.
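A short sketch of both selection helpers on hypothetical data:
from bob.bio.base import selected_elements, selected_indices

# Pick 3 evenly spread indices from a range of 10 items, and 3 evenly spread
# elements from a 10-element list.
indices = selected_indices(10, 3)
elements = selected_elements(list("abcdefghij"), 3)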
- class bob.bio.base.annotator.Annotator¶
Bases:
TransformerMixin
,BaseEstimator
Annotator class for all annotators. This class is meant to be used in conjunction with the bob bio annotate script or to be used in pipelines.
- transform(samples, **kwargs)[source]¶
Annotates a sample and returns annotations in a dictionary.
- Parameters
samples (numpy.ndarray) – The samples that are being annotated.
**kwargs – The extra arguments that may be passed.
- Returns
A dictionary containing the annotations of the biometric sample. If the program fails to annotate the sample, it should return an empty dictionary.
- Return type
- class bob.bio.base.annotator.Callable(callable, **kwargs)¶
Bases:
Annotator
A class that wraps a callable object that annotates a sample into a bob.bio.annotator object.
- callable¶
A callable with the following signature: annotations = callable(sample, **kwargs), that takes a numpy array and returns annotations in dictionary format for that biometric sample. Please see Annotator for more information.
- transform(sample, **kwargs)[source]¶
Annotates a sample and returns annotations in a dictionary.
- Parameters
samples (numpy.ndarray) – The samples that are being annotated.
**kwargs – The extra arguments that may be passed.
- Returns
A dictionary containing the annotations of the biometric sample. If the program fails to annotate the sample, it should return an empty dictionary.
- Return type
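A minimal sketch of wrapping a plain function with Callable, following the transform signature shown above; the annotation function, its keys and the input array are hypothetical:
import numpy as np
from bob.bio.base.annotator import Callable

# A hypothetical annotation function returning fixed landmark keys.
def dummy_annotator(sample, **kwargs):
    return {"topleft": (0, 0), "bottomright": (10, 10)}

annotator = Callable(dummy_annotator)
image = np.zeros((32, 32))  # hypothetical biometric sample
annotations = annotator.transform(image)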
- class bob.bio.base.annotator.FailSafe(annotators, required_keys, only_required_keys=False, **kwargs)¶
Bases:
Annotator
A fail-safe annotator. This annotator takes a list of annotators and tries them one after another until the required annotations are obtained. The annotations from previous annotators are passed on to the next one.
- required_keys¶
A list of keys that should be available in annotations to stop trying different annotators.
- Type
- transform(samples, **kwargs)[source]¶
Takes a batch of data and tries annotating them, continuing while unsuccessful.
Tries each annotator given at the creation of FailSafe when the previous one fails.
Each kwargs value is a list of parameters, with each element of those lists corresponding to each element of sample_batch (for example: with [s1, s2, ...] as samples_batch, kwargs['annotations'] should contain [{<s1_annotations>}, {<s2_annotations>}, ...]).
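A minimal construction sketch; the two annotator functions and the annotation keys are hypothetical:
from bob.bio.base.annotator import Callable, FailSafe

# Two hypothetical annotators: a primary detector that may fail (returns {})
# and a simple fallback that always produces the required keys.
def primary(sample, **kwargs):
    return {}

def fallback(sample, **kwargs):
    return {"topleft": (0, 0), "bottomright": (10, 10)}

annotator = FailSafe(
    annotators=[Callable(primary), Callable(fallback)],
    required_keys=["topleft", "bottomright"],
)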
- class bob.bio.base.pipelines.BetaCalibration¶
Bases:
TransformerMixin
,BaseEstimator
Implements score calibration using a pair of Beta PDFs defined in:
- class bob.bio.base.pipelines.BioAlgCheckpointWrapper(biometric_algorithm, base_dir, extension=None, save_func=None, load_func=None, group=None, force=False, hash_fn=None, **kwargs)¶
Bases:
BioAlgorithmBaseWrapper
Wrapper used to checkpoint enrolled and scoring samples.
- Parameters
biometric_algorithm (bob.bio.base.pipelines.BioAlgorithm) – An implemented bob.bio.base.pipelines.BioAlgorithm
base_dir (str) – Path to store biometric references and scores
extension (str) – Default extension of the enrolled references files. If None, will use the bob_checkpoint_extension tag in the estimator, or default to .h5.
save_func (callable) – Pointer to a customized function that saves an enrolled reference to disk. If None, will use the bob_enrolled_save_fn tag in the estimator, or default to h5py.
load_func (callable) – Pointer to a customized function that loads an enrolled reference from disk. If None, will use the bob_enrolled_load_fn tag in the estimator, or default to h5py.
force (bool) – If True, will recompute scores and biometric references no matter if a file exists
hash_fn – Pointer to a hash function. This hash function maps sample.key to a hash code, and this hash code corresponds to a relative directory where a single sample will be checkpointed. This is useful when it is desirable to keep file directories below a certain number of files.
Examples
>>> from bob.bio.base.algorithm import Distance
>>> from bob.bio.base.pipelines import BioAlgCheckpointWrapper
>>> biometric_algorithm = BioAlgCheckpointWrapper(Distance(), base_dir="./")
>>> biometric_algorithm.create_templates(samples, enroll=True)
- class bob.bio.base.pipelines.BioAlgDaskWrapper(biometric_algorithm: BioAlgorithm, **kwargs)¶
Bases:
BioAlgorithmBaseWrapper
Wrap bob.bio.base.pipelines.BioAlgorithm to work with Dask.
- create_templates_from_samplesets(list_of_samplesets, enroll)[source]¶
Creates enroll or probe templates from multiple SampleSets.
- Parameters
- Returns
templates – A list of Samples which has the same length as list_of_samplesets. Each Sample contains a template.
- Return type
- score_sample_templates(probe_samples, enroll_samples, score_all_vs_all)[source]¶
Computes the similarity score between all probe and enroll templates.
- Parameters
probe_samples (list) – A list (length N) of Samples containing probe templates.
enroll_samples (list) – A list (length M) of Samples containing enroll templates.
score_all_vs_all (bool) – If True, the similarity scores between all probe and enroll templates are computed. If False, the similarity scores between the probes and their associated enroll templates are computed.
- Returns
score_samplesets – A list of N SampleSets each containing a list of M score Samples if score_all_vs_all is True. Otherwise, a list of N SampleSets each containing a list of <=M score Samples depending on the database.
- Return type
- class bob.bio.base.pipelines.BioAlgorithm(probes_score_fusion='max', enrolls_score_fusion='max', **kwargs)¶
Bases:
BaseEstimator
Describes a base biometric comparator for the PipelineSimple Biometric Algorithm.
A biometric algorithm converts each SampleSet (which is a list of samples/features) into a single template. Template creation is done for both enroll and probe samples but the format of the templates can be different between enrollment and probe samples. After the creation of the templates, the algorithm computes one similarity score for comparison of an enroll template with a probe template.
Examples
>>> import numpy as np
>>> from bob.bio.base.pipelines import BioAlgorithm
>>> class MyAlgorithm(BioAlgorithm):
...
...     def create_templates(self, list_of_feature_sets, enroll):
...         # you cannot call np.mean(list_of_feature_sets, axis=1) because the
...         # number of features in each feature set may vary.
...         return [np.mean(feature_set, axis=0) for feature_set in list_of_feature_sets]
...
...     def compare(self, enroll_templates, probe_templates):
...         scores = []
...         for enroll_template in enroll_templates:
...             scores.append([])
...             for probe_template in probe_templates:
...                 similarity = 1 / np.linalg.norm(enroll_template - probe_template)
...                 scores[-1].append(similarity)
...         scores = np.array(scores, dtype=float)
...         return scores
- abstract compare(enroll_templates, probe_templates)[source]¶
Computes the similarity score between all enrollment and probe templates.
- Parameters
- Returns
scores – A matrix of shape (N, M) containing the similarity scores.
- Return type
- abstract create_templates(list_of_feature_sets, enroll)[source]¶
Creates enroll or probe templates from multiple sets of features.
The enroll template format can be different from the probe templates.
- Parameters
list_of_feature_sets (list) – A list of list of features with the shape of Nx?xD. N templates should be computed. Note that you cannot call np.array(list_of_feature_sets) because the number of features per set can be different depending on the database.
enroll (bool) – If True, the features are for enrollment. If False, the features are for probe.
- Returns
templates – A list of templates which has the same length as list_of_feature_sets.
- Return type
- create_templates_from_samplesets(list_of_samplesets, enroll)[source]¶
Creates enroll or probe templates from multiple SampleSets.
- Parameters
- Returns
templates – A list of Samples which has the same length as list_of_samplesets. Each Sample contains a template.
- Return type
- score_sample_templates(probe_samples, enroll_samples, score_all_vs_all)[source]¶
Computes the similarity score between all probe and enroll templates.
- Parameters
probe_samples (list) – A list (length N) of Samples containing probe templates.
enroll_samples (list) – A list (length M) of Samples containing enroll templates.
score_all_vs_all (bool) – If True, the similarity scores between all probe and enroll templates are computed. If False, the similarity scores between the probes and their associated enroll templates are computed.
- Returns
score_samplesets – A list of N SampleSets each containing a list of M score Samples if score_all_vs_all is True. Otherwise, a list of N SampleSets each containing a list of <=M score Samples depending on the database.
- Return type
- class bob.bio.base.pipelines.CSVScoreWriter(path, exclude_list=('data', 'samples', 'key', 'references', 'annotations'), **kwargs)¶
Bases:
ScoreWriter
Read and write scores in CSV format, shipping all metadata with the scores
- Parameters
- write(probe_sampleset)[source]¶
Writes scores and returns a bob.pipelines.DelayedSample containing the instruction to open the score file.
- class bob.bio.base.pipelines.CategoricalCalibration(field_name, field_values, score_selection_method='all', reduction_function=<function mean>, fit_estimator=None)¶
Bases:
TransformerMixin
,BaseEstimator
Implements an adaptation of the Categorical Calibration defined in:
Mandasari, Miranti Indar, et al. “Score calibration in face recognition.” Iet Biometrics 3.4 (2014): 246-256.
In that work the calibration is defined as:
\(s = \sum_{i=0}^{N} \text{calibrator}_i(X)\)
In this implementation, the categorical calibration is applied to the tails of the score distributions. For the impostor score distribution, the tail is defined between \(q_3(x)\) and \(q_3(x)+\alpha (q_3(x)-q_1(x))\), where \(q_n\) represents the n-th quartile and \(\alpha\) represents an offset. For the genuine score distribution, the tail is defined between \(q_1(x)\) and \(q_1(x)-\alpha (q_3(x)-q_1(x))\).
In this implementation one calibrator per category is fitted at training time. At test time, the maximum of the calibrated scores is returned.
- Parameters
field_name (str) – Reference field name in the CSV score file, e.g. race or gender.
field_values (list) – Possible values for field_name, e.g. [‘male’, ‘female’] or [‘black’, ‘white’].
score_selection_method (str) – Method to select the scores for fitting the calibration models:
- median-q3: selects the scores from the median to q3 of the impostor scores (q1 to the median for genuines)
- q3-outlier: selects the scores from q3 to the outlier boundary (q3+1.5*IQD) of the impostor scores (q1 to the outlier boundary for genuines)
- q1-median:
- all: selects all the scores.
Defaults to median-q3.
reduction_function – Pointer to a function to reduce the scores. Defaults to np.mean.
fit_estimator (None) – Estimator used for calibration. Defaults to LLRCalibration.
- class bob.bio.base.pipelines.Database(name, protocol, score_all_vs_all=False, annotation_type=None, fixed_positions=None, memory_demanding=False, **kwargs)¶
Bases:
object
Base class for PipelineSimple databases
- abstract all_samples(groups=None)[source]¶
Returns all the samples of the dataset
- Parameters
groups (list or None) – List of groups to consider (like ‘dev’ or ‘eval’). If None, will return samples from all the groups.
- Returns
samples – List of all the samples of the dataset.
- Return type
- abstract background_model_samples()[source]¶
Returns bob.pipelines.Sample's to train a background model.
- Returns
samples – List of samples for background model training.
- Return type
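A minimal sketch of a custom Database subclass. Beyond the methods documented above (all_samples, background_model_samples), the references, probes, groups and protocols methods, the group argument, and the references metadata field on probe SampleSets are assumptions about the expected interface, not taken from this page:
import numpy as np
from bob.pipelines import Sample, SampleSet
from bob.bio.base.pipelines import Database


class MyDatabase(Database):
    # A toy database with one reference identity and one probe identity (hypothetical data).
    def __init__(self, **kwargs):
        super().__init__(name="my_db", protocol="default", **kwargs)

    def background_model_samples(self):
        # Samples used to fit (train) the transformer, if it needs fitting.
        return [Sample(np.random.rand(10), key="train_0")]

    def references(self, group="dev"):
        # One SampleSet per identity to enroll.
        return [SampleSet([Sample(np.random.rand(10), key="ref_0")], reference_id="id_0")]

    def probes(self, group="dev"):
        # One SampleSet per probe; `references` lists the enrolled ids to compare against.
        return [SampleSet([Sample(np.random.rand(10), key="probe_0")],
                          reference_id="id_1", references=["id_0"])]

    def all_samples(self, groups=None):
        return [s for ss in self.references() + self.probes() for s in ss.samples]

    def groups(self):
        return ["dev"]

    def protocols(self):
        return ["default"]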
- class bob.bio.base.pipelines.FourColumnsScoreWriter(path, extension='.txt', **kwargs)¶
Bases:
ScoreWriter
Reads and writes scores using the four-column format bob.bio.base.score.load.four_column().
- write(probe_sampleset)[source]¶
Writes scores and returns a bob.pipelines.DelayedSample containing the instruction to open the score file.
- class bob.bio.base.pipelines.GammaCalibration¶
Bases:
TransformerMixin
,BaseEstimator
Implements score calibration using a pair of Gamma PDFs defined in:
- class bob.bio.base.pipelines.LLRCalibration¶
Bases:
TransformerMixin
,BaseEstimator
Implements the linear calibration using a logistic function defined in:
Mandasari, Miranti Indar, et al. “Score calibration in face recognition.” Iet Biometrics 3.4 (2014): 246-256.
- class bob.bio.base.pipelines.PipelineScoreNorm(pipeline_simple: PipelineSimple, post_processor)¶
Bases:
object
Apply Z, T or ZT score normalization on top of the simple pipeline (PipelineSimple)
Reference bibliography from: A Generative Model for Score Normalization in Speaker Recognition https://arxiv.org/pdf/1709.09868.pdf
Example
>>> from sklearn.preprocessing import FunctionTransformer
>>> from sklearn.pipeline import make_pipeline
>>> from bob.bio.base.algorithm import Distance
>>> from bob.bio.base.pipelines import PipelineSimple, PipelineScoreNorm, ZNormScores
>>> from bob.pipelines import wrap
>>> import numpy
>>> linearize = lambda samples: [numpy.reshape(x, (-1,)) for x in samples]
>>> transformer = wrap(["sample"], FunctionTransformer(linearize))
>>> transformer_pipeline = make_pipeline(transformer)
>>> biometric_algorithm = Distance()
>>> pipeline_simple = PipelineSimple(transformer_pipeline, biometric_algorithm)
>>> z_norm_postprocessor = ZNormScores()
>>> z_pipeline = PipelineScoreNorm(pipeline_simple, z_norm_postprocessor)
>>> z_pipeline(...)
- Parameters
pipeline_simple (PipelineSimple) – An instance of PipelineSimple to be wrapped with score normalization
post_processor (sklearn.pipeline.Pipeline or a sklearn.base.BaseEstimator) – Transformer that will post-process the scores
score_writer – A ScoreWriter to write the scores
- property biometric_algorithm¶
- property score_writer¶
- property transformer¶
- class bob.bio.base.pipelines.PipelineSimple(transformer: Pipeline, biometric_algorithm: BioAlgorithm, score_writer=None)¶
Bases:
object
The simplest possible pipeline
This is the backbone of most biometric recognition systems. It implements the following subpipelines:
- PipelineSimple.train_background_model: Initializes or trains your transformer. It will run sklearn.base.BaseEstimator.fit().
- PipelineSimple.enroll_templates: Creates enrollment templates. It will run sklearn.base.BaseEstimator.transform() followed by a sequence of bob.bio.base.pipelines.abstract_classes.BioAlgorithm.create_templates().
- PipelineSimple.probe_templates: Creates probe templates. It will run sklearn.base.BaseEstimator.transform() followed by a sequence of bob.bio.base.pipelines.abstract_classes.BioAlgorithm.create_templates().
- PipelineSimple.compute_scores: Computes scores. It will run bob.bio.base.pipelines.abstract_classes.BioAlgorithm.compare().
Example
>>> from sklearn.preprocessing import FunctionTransformer
>>> from sklearn.pipeline import make_pipeline
>>> from bob.bio.base.algorithm import Distance
>>> from bob.bio.base.pipelines import PipelineSimple
>>> from bob.pipelines import wrap
>>> import numpy
>>> linearize = lambda samples: [numpy.reshape(x, (-1,)) for x in samples]
>>> transformer = wrap(["sample"], FunctionTransformer(linearize))
>>> transformer_pipeline = make_pipeline(transformer)
>>> biometric_algorithm = Distance()
>>> pipeline = PipelineSimple(transformer_pipeline, biometric_algorithm)
>>> pipeline(samples_for_training_back_ground_model, samplesets_for_enroll, samplesets_for_scoring)
To run this pipeline using Dask, use the function dask_bio_pipeline().
Example
>>> from bob.bio.base.pipelines import dask_bio_pipeline
>>> pipeline = PipelineSimple(transformer_pipeline, biometric_algorithm)
>>> pipeline = dask_bio_pipeline(pipeline)
>>> pipeline(samples_for_training_back_ground_model, samplesets_for_enroll, samplesets_for_scoring).compute()
- Parameters
transformer (sklearn.pipeline.Pipeline or a sklearn.base.BaseEstimator) – Transformer that will preprocess your data
biometric_algorithm (bob.bio.base.pipelines.abstract_classes.BioAlgorithm) – Biometric algorithm object that implements the enroll and score methods
score_writer (bob.bio.base.pipelines.ScoreWriter) – Format to write scores. Defaults to bob.bio.base.pipelines.FourColumnsScoreWriter
- class bob.bio.base.pipelines.ScoreWriter(path, extension='.txt', **kwargs)¶
Bases:
object
Defines base methods to read, write and concatenate scores for bob.bio.base.pipelines.BioAlgorithm.
- class bob.bio.base.pipelines.TNormScores(top_norm=False, top_norm_score_fraction=0.8, **kwargs)¶
Bases:
TransformerMixin
,BaseEstimator
Apply T-Norm Score normalization on top of Simple Pipeline
Reference bibliography from: A Generative Model for Score Normalization in Speaker Recognition https://arxiv.org/pdf/1709.09868.pdf
- post_process_template = 'enroll'¶
- class bob.bio.base.pipelines.WeibullCalibration¶
Bases:
TransformerMixin
,BaseEstimator
Implements score calibration using a pair of Weibull PDFs defined in:
Macarulla Rodriguez, Andrea, Zeno Geradts, and Marcel Worring. “Likelihood Ratios for Deep Neural Networks in Face Comparison.” Journal of forensic sciences 65.4 (2020): 1169-1183.
- class bob.bio.base.pipelines.ZNormScores(top_norm=False, top_norm_score_fraction=0.8, **kwargs)¶
Bases:
TransformerMixin
,BaseEstimator
Apply Z-Norm Score normalization on top of Simple Pipeline
Reference bibliography from: A Generative Model for Score Normalization in Speaker Recognition https://arxiv.org/pdf/1709.09868.pdf
- post_process_template = 'probe'¶
- bob.bio.base.pipelines.checkpoint_pipeline_simple(pipeline, base_dir, biometric_algorithm_dir=None, hash_fn=None, force=False)[source]¶
Given a bob.bio.base.pipelines.PipelineSimple, wraps bob.bio.base.pipelines.PipelineSimple and bob.bio.base.pipelines.BioAlgorithm to be checkpointed.
- Parameters
pipeline (bob.bio.base.pipelines.PipelineSimple) – pipeline to be checkpointed
base_dir (str) – Path to store transformed input data and possibly biometric references and scores
biometric_algorithm_dir (str) – If set, it will checkpoint the biometric references and scores to this path. If not, base_dir will be used. This is useful when it is suitable to have the transformed data, and the biometric references and scores, in different paths.
hash_fn – Pointer to a hash function. This hash function will map sample.key to a hash code, and this hash code will be the relative directory where a single sample will be checkpointed. This is useful when it is desirable to keep file directories below a certain number of files.
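A short usage sketch; `pipeline` is assumed to be a PipelineSimple built as in the examples above, and the checkpoint directory is hypothetical:
from bob.bio.base.pipelines import checkpoint_pipeline_simple

# Wrap the pipeline so transformed data, references and scores are written under ./checkpoints.
pipeline = checkpoint_pipeline_simple(pipeline, base_dir="./checkpoints")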
- bob.bio.base.pipelines.dask_bio_pipeline(pipeline, npartitions=None, partition_size=None)[source]¶
Given a bob.bio.base.pipelines.PipelineSimple, wraps bob.bio.base.pipelines.PipelineSimple and bob.bio.base.pipelines.BioAlgorithm to be executed with Dask.
- Parameters
pipeline (bob.bio.base.pipelines.PipelineSimple) – pipeline to be dasked
npartitions (int) – Number of partitions for the initial dask.bag
partition_size (int) – Size of the partition for the initial dask.bag
- bob.bio.base.pipelines.execute_pipeline_score_norm(pipeline, database, dask_client, groups, output, write_metadata_scores, checkpoint, dask_partition_size, dask_n_workers, checkpoint_dir=None, top_norm=False, top_norm_score_fraction=0.8, score_normalization_type='znorm', force=False)[source]¶
Function that extends the capabilities of the PipelineSimple to run score normalization.
This is called when using the bob bio pipeline score-norm command.
This is also callable from a script without fear of interrupting the running Dask instance, allowing chaining multiple experiments while keeping the workers alive.
- Parameters
pipeline (Instance of bob.bio.base.pipelines.PipelineSimple) – A constructed PipelineSimple object.
database (Instance of bob.bio.base.pipelines.abstract_class.Database) – A database interface instance
dask_client (instance of dask.distributed.Client or None) – A Dask client instance used to run the experiment in parallel on multiple machines, or locally. Basic configs can be found in bob.pipelines.config.distributed.
groups (list of str) – Groups of the dataset that will be requested from the database interface.
output (str) – Path where the results and checkpoints will be saved to.
write_metadata_scores (bool) – Use the CSVScoreWriter instead of the FourColumnScoreWriter when True.
checkpoint (bool) – Whether checkpoint files will be created for every step of the pipelines.
dask_partition_size (int) – If using Dask, this option defines the size of each dask.bag.partition. Use this option if the current heuristic that sets this value doesn’t suit your experiment. (https://docs.dask.org/en/latest/bag-api.html?highlight=partition_size#dask.bag.from_sequence).
dask_n_workers (int) – If using Dask, this option defines the number of workers to start your experiment. Dask automatically scales up/down the number of workers due to the current load of tasks to be solved. Use this option if the current amount of workers set to start an experiment doesn’t suit you.
top_norm (bool) –
top_norm_score_fraction (float) – Sets the percentage of samples used for t-norm and z-norm. Sometimes you don’t want to use all the t/z samples for normalization
checkpoint_dir (str) – If checkpoint is set, this path will be used to save the checkpoints. If None, the content of output will be used.
- bob.bio.base.pipelines.execute_pipeline_simple(pipeline, database, dask_client, groups, output, write_metadata_scores, checkpoint, dask_n_partitions, dask_partition_size, dask_n_workers, checkpoint_dir=None, force=False)[source]¶
Function that executes the PipelineSimple.
This is called when using the bob bio pipeline simple command.
This is also callable from a script without fear of interrupting the running Dask instance, allowing chaining multiple experiments while keeping the workers alive.
When using Dask, something to keep in mind is that we want to split our data and processing time on multiple workers. There is no recipe to make everything work on any system. So if you encounter some balancing error (a few of all the available workers actually working while the rest waits, or the scheduler being overloaded trying to organise millions of tiny tasks), you can specify dask_n_partitions or dask_partition_size. The first will try to split any set of data into a given number of chunks (ideally, we would want one per worker), and the second creates similar-sized partitions in each set. If the memory on the workers is not sufficient, try reducing the size of the partitions (or increasing the number of partitions).
- Parameters
pipeline (Instance of bob.bio.base.pipelines.PipelineSimple) – A constructed PipelineSimple object.
database (Instance of bob.bio.base.pipelines.abstract_class.Database) – A database interface instance
dask_client (instance of dask.distributed.Client or None) – A Dask client instance used to run the experiment in parallel on multiple machines, or locally. Basic configs can be found in bob.pipelines.config.distributed.
dask_n_partitions (int or None) – Specifies a number of partitions to split the data into.
dask_partition_size (int or None) – Specifies a data partition size when using dask. Ignored when dask_n_partitions is set.
dask_n_workers (int or None) – Sets the starting number of Dask workers. Does not prevent Dask from requesting more or releasing workers depending on load.
groups (list of str) – Groups of the dataset that will be requested from the database interface.
output (str) – Path where the scores will be saved.
write_metadata_scores (bool) – Use the CSVScoreWriter instead of the FourColumnScoreWriter when True.
checkpoint (bool) – Whether checkpoint files will be created for every step of the pipelines.
checkpoint_dir (str) – If checkpoint is set, this path will be used to save the checkpoints. If None, the content of output will be used.
force (bool) – If set, it will force the generation of all the checkpoints of an experiment. This option doesn't work if --memory is set
- bob.bio.base.pipelines.is_biopipeline_checkpointed(pipeline)[source]¶
Check if a bob.bio.base.pipelines.PipelineSimple is checkpointed.
- Parameters
pipeline (bob.bio.base.pipelines.PipelineSimple) – pipeline to be checked
- class bob.bio.base.database.AtntBioDatabase(protocol='idiap_protocol', dataset_original_directory=None, **kwargs)[source]¶
Bases:
CSVDataset
The AT&T (aka ORL) database of faces (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html). This class defines a simple protocol for training, enrollment and probe by splitting the few images of the database in a reasonable manner. Due to the small size of the database, there is only a ‘dev’ group, and I did not define an ‘eval’ group.
- class bob.bio.base.database.BioDatabase(name, all_files_options={}, extractor_training_options={}, projector_training_options={}, enroller_training_options={}, check_original_files_for_existence=False, original_directory=None, original_extension=None, annotation_directory=None, annotation_extension=None, annotation_type=None, protocol='Default', training_depends_on_protocol=False, models_depend_on_protocol=False, **kwargs)¶
Bases:
FileDatabase
This class represents the basic API for database access. Please use this class as a base class for your database access classes. Do not forget to call the constructor of this base class in your derived class.
Parameters:
name : str A unique name for the database.
all_files_options : dict Dictionary of options passed to the
bob.bio.base.database.BioDatabase.objects()
database query when retrieving all data.extractor_training_options : dict Dictionary of options passed to the
bob.bio.base.database.BioDatabase.objects()
database query used to retrieve the files for the extractor training.projector_training_options : dict Dictionary of options passed to the
bob.bio.base.database.BioDatabase.objects()
database query used to retrieve the files for the projector training.enroller_training_options : dict Dictionary of options passed to the
bob.bio.base.database.BioDatabase.objects()
database query used to retrieve the files for the enroller training.check_original_files_for_existence : bool Enables to test for the original data files when querying the database.
original_directory : str The directory where the original data of the database are stored.
original_extension : str The file name extension of the original data.
annotation_directory : str The directory where the image annotations of the database are stored, if any.
annotation_extension : str The file name extension of the annotation files.
annotation_type : str The type of the annotation file to read, only json works.
protocol : str or
None
The name of the protocol that defines the default experimental setup for this database.training_depends_on_protocol : bool Specifies, if the training set used for training the extractor and the projector depend on the protocol. This flag is used to avoid re-computation of data when running on the different protocols of the same database.
models_depend_on_protocol : bool Specifies, if the models depend on the protocol. This flag is used to avoid re-computation of models when running on the different protocols of the same database.
kwargs : key=value pairs
The arguments of the Database base class constructor.
- all_files(groups=None) → files [source]¶
Returns all files of the database, respecting the current protocol. The files can be limited using the all_files_options in the constructor.
Parameters:
- groups : some of ('world', 'dev', 'eval') or None
The groups to get the data for. If None, data for all groups is returned.
kwargs: ignored
Returns:
- files : [bob.bio.base.database.BioFile]
The sorted and unique list of all files of the database.
- annotations(file)[source]¶
Returns the annotations for the given File object, if available. You need to override this method in your high-level implementation. If your database does not have annotations, it should return None.
Parameters:
- file : bob.bio.base.database.BioFile
The file for which annotations should be returned.
Returns:
- annots : dict or None
The annotations for the file, if available.
- arrange_by_client(files) → files_by_client [source]¶
Arranges the given list of files by client id. This function returns a list of lists of File objects.
Parameters:
- files : [bob.bio.base.database.BioFile]
A list of files that should be split up by BioFile.client_id.
Returns:
- files_by_client : [[bob.bio.base.database.BioFile]]
The list of lists of files, where each sub-list groups the files with the same BioFile.client_id.
- client_id_from_model_id(model_id, group='dev')[source]¶
Return the client id associated with the given model id. In this base class implementation, it is assumed that only one model is enrolled for each client and, thus, client id and model id are identical. All keyword arguments are ignored. Please override this function in derived class implementations to change this behavior.
- enroll_files(model_id, group='dev') → files [source]¶
Returns a list of File objects that should be used to enroll the model with the given model id from the given group, respecting the current protocol. If the model_id is None (the default), enrollment files for all models are returned.
Parameters:
- model_id : int or str
A unique ID that identifies the model.
- group : one of ('dev', 'eval')
The group to get the enrollment files for.
Returns:
- files : [bob.bio.base.database.BioFile]
The list of files used to enroll the model with the given model id.
- file_names(files, directory, extension) → paths [source]¶
Returns the full paths of the given File objects.
Parameters:
- files : [bob.bio.base.database.BioFile]
The list of File objects to retrieve the file names for.
- directory : str
The base directory, where the files can be found.
- extension : str
The file name extension to add to all files.
Returns:
- paths : [str] or [[str]]
The paths extracted for the files, in the same order. If this database provides file sets, a list of lists of file names is returned, one sub-list for each file set.
- groups(protocol=None)[source]¶
Returns the names of all registered groups in the database
Keyword parameters:
- protocol: str
The protocol for which the groups should be retrieved. If you do not have protocols defined, just ignore this field.
- model_ids(group='dev') → ids [source]¶
Returns a list of model ids for the given group, respecting the current protocol.
Parameters:
- group : one of ('dev', 'eval')
The group to get the model ids for.
Returns:
- ids : [int] or [str]
The list of (unique) model ids for models of the given group.
- abstract model_ids_with_protocol(groups=None, protocol=None, **kwargs) → ids [source]¶
Returns a list of model ids for the given groups and given protocol.
Parameters:
- groups : one or more of ('world', 'dev', 'eval')
The groups to get the model ids for.
- protocol : a protocol name
Returns:
- ids : [int] or [str]
The list of (unique) model ids for the given groups.
- object_sets(groups=None, protocol=None, purposes=None, model_ids=None, **kwargs)[source]¶
This function returns lists of FileSet objects, which fulfill the given restrictions.
Keyword parameters:
- groupsstr or [str]
The groups of which the clients should be returned. Usually, groups are one or more elements of (‘world’, ‘dev’, ‘eval’)
- protocol
The protocol for which the clients should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.
- purposesstr or [str]
The purposes for which File objects should be retrieved. Usually, purposes are one of (‘enroll’, ‘probe’).
- model_ids[various type]
The model ids for which the File objects should be retrieved. What defines a ‘model id’ is dependent on the database. In cases, where there is only one model per client, model ids and client ids are identical. In cases, where there is one model per file, model ids and file ids are identical. But, there might also be other cases.
- abstract objects(groups=None, protocol=None, purposes=None, model_ids=None, **kwargs)[source]¶
This function returns a list of bob.bio.base.database.BioFile objects or objects which inherit from this class. Returned files fulfill the given restrictions.
Keyword parameters:
- groupsstr or [str]
The groups of which the clients should be returned. Usually, groups are one or more elements of (‘world’, ‘dev’, ‘eval’)
- protocol
The protocol for which the clients should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.
- purposesstr or [str]
The purposes for which File objects should be retrieved. Usually, purposes are one of (‘enroll’, ‘probe’).
- model_ids[various type]
The model ids for which the File objects should be retrieved. What defines a ‘model id’ is dependent on the database. In cases, where there is only one model per client, model ids and client ids are identical. In cases, where there is one model per file, model ids and file ids are identical. But, there might also be other cases.
- probe_file_sets(model_id=None, group='dev') files [source]¶
Returns a list of probe FileSet objects, respecting the current protocol. If a model_id is specified, only the probe files that should be compared with the given model id are returned (for most databases, these are all probe files of the given group). Otherwise, all probe files of the given group are returned.
Parameters:
- model_id : int or str or None
A unique ID that identifies the model.
- group : one of ('dev', 'eval')
The group to get the probe files for.
Returns:
- files : [bob.bio.base.database.BioFileSet] or something similar
The list of file sets used to probe the model with the given model id.
- probe_files(model_id=None, group='dev') → files [source]¶
Returns a list of probe File objects, respecting the current protocol. If a model_id is specified, only the probe files that should be compared with the given model id are returned (for most databases, these are all probe files of the given group). Otherwise, all probe files of the given group are returned.
Parameters:
- model_id : int or str or None
A unique ID that identifies the model.
- group : one of ('dev', 'eval')
The group to get the probe files for.
Returns:
- files : [bob.bio.base.database.BioFile]
The list of files used to probe the model with the given model id.
- replace_directories(replacements=None)[source]¶
This helper function replaces the original_directory and the annotation_directory of the database with the directories read from the given replacement file.
This function is provided for convenience, so that the database configuration files do not need to be modified. Instead, this function uses the given dictionary of replacements to change the original directory and the original extension (if given).
The given replacements can be of type dict, including all replacements, or a file name (as a str), in which case the file is read. The structure of the file should be:
# Comments starting with # and empty lines are ignored
[YOUR_..._DATA_DIRECTORY] = /path/to/your/data
[YOUR_..._ANNOTATION_DIRECTORY] = /path/to/your/annotations
If no annotation files are available (e.g. when they are stored inside the database), the annotation directory can be left out.
Parameters:
- replacements : dict or str
A dictionary with replacements, or a name of a file to read the dictionary from. If the file name does not exist, no directories are replaced.
- test_files(groups=['dev']) → files [source]¶
Returns all test files (i.e., files used for enrollment and probing) for the given groups, respecting the current protocol. The files for the steps can be limited using the all_files_options defined in the constructor.
Parameters:
- groups : some of ('dev', 'eval')
The groups to get the data for.
Returns:
- files : [bob.bio.base.database.BioFile]
The sorted and unique list of test files of the database.
- training_files(step=None, arrange_by_client=False) → files [source]¶
Returns all training files for the given step, and arranges them by client, if desired, respecting the current protocol. The files for the steps can be limited using the ..._training_options defined in the constructor.
Parameters:
- step : one of ('train_extractor', 'train_projector', 'train_enroller') or None
The step for which the training data should be returned.
- arrange_by_client : bool
Should the training files be arranged by client? If set to True, training files will be returned in [[bob.bio.base.database.BioFile]], where each sub-list contains the files of a single client. Otherwise, all files will be stored in a simple [bob.bio.base.database.BioFile].
Returns:
- files : [bob.bio.base.database.BioFile] or [[bob.bio.base.database.BioFile]]
The (arranged) list of files used for the training of the given step.
- uses_probe_file_sets(protocol=None)[source]¶
Defines if, for the current protocol, the database uses several probe files to generate a score. Returns True if the given protocol specifies file sets for probes, instead of a single probe file. In this default implementation, False is returned throughout. If you need different behavior, please overload this function in your derived class.
- class bob.bio.base.database.BioFile(client_id, path, file_id=None, original_directory=None, original_extension=None, annotation_directory=None, annotation_extension=None, annotation_type=None, **kwargs)¶
Bases:
File
,_ReprMixin
A simple base class that defines basic properties of a File object for use in verification experiments.
- client_id¶
The id of the client this file belongs to. Its type depends on your implementation. If you use an SQL database, this should be an SQL type like Integer or String.
- original_extension¶
The extension of the original files. This attribute is deprecated. Please try to include the extension in the path attribute.
- Type
str or None
- annotation_type¶
The type of the annotation file, see bob.bio.base.utils.read_annotation_file(). Default is json.
- Type
str or None
- property annotations¶
- load(original_directory=None, original_extension=None)[source]¶
Loads the data at the specified location and using the given extension. Override it if you need to load differently.
- Parameters
- Returns
The loaded data (normally numpy.ndarray).
- Return type
- class bob.bio.base.database.BioFileSet(file_set_id, files, path=None, **kwargs)¶
Bases:
BioFile
This class defines the minimum interface of a set of database files that needs to be exported. Use this class whenever the database provides several files that belong to the same probe. Each file set has an id and a list of associated files, which are of type bob.bio.base.database.BioFile of the same client. The file set id can be anything hashable, but needs to be unique across the database.
- Parameters
file_set_id (str or int) – A unique ID that identifies the file set.
files ([bob.bio.base.database.BioFile]) – A non-empty list of BioFile objects that should be stored inside this file set. All files of that list need to have the same client ID.
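A minimal construction sketch using the signatures documented above; the client id, paths and file ids are hypothetical:
from bob.bio.base.database import BioFile, BioFileSet

# Two files of the same (hypothetical) client, grouped into one probe file set.
file_1 = BioFile(client_id=1, path="client1/sample1", file_id=101)
file_2 = BioFile(client_id=1, path="client1/sample2", file_id=102)
probe_set = BioFileSet(file_set_id="client1_probes", files=[file_1, file_2])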
- class bob.bio.base.database.CSVDataset(*, name, protocol, dataset_protocol_path, csv_to_sample_loader=None, is_sparse=False, score_all_vs_all=False, group_probes_by_reference_id=False, **kwargs)¶
Bases:
Database
Generic filelist dataset for the bob.bio.base.pipelines.PipelineSimple pipeline. Check PipelineSimple: Advanced features for more details about the PipelineSimple Dataset interface.
To create a new dataset, you need to provide a directory structure similar to the one below:
my_dataset/
my_dataset/my_protocol/norm/train_world.csv
my_dataset/my_protocol/dev/for_models.csv
my_dataset/my_protocol/dev/for_probes.csv
my_dataset/my_protocol/eval/for_models.csv
my_dataset/my_protocol/eval/for_probes.csv
...
In the directory structure above, my_dataset contains one directory per evaluation protocol of this dataset. The my_protocol directory should contain at least two CSV files:
for_models.csv
for_probes.csv
Each row of those CSV files should contain i) the path to the raw data and ii) the reference_id label for enrollment (bob.bio.base.pipelines.Database.references) and probing (bob.bio.base.pipelines.Database.probes). The structure of each CSV file should be as below:
PATH,reference_id
path_1,reference_id_1
path_2,reference_id_2
path_i,reference_id_j
...
You might want to ship metadata within your Samples (e.g. gender, age, annotations, ...). Doing so is simple; just structure the CSV as below:
PATH,reference_id,METADATA_1,METADATA_2,METADATA_k
path_1,reference_id_1,A,B,C
path_2,reference_id_2,A,B,1
path_i,reference_id_j,2,3,4
...
The files my_dataset/my_protocol/eval/for_models.csv and my_dataset/my_protocol/eval/for_probes.csv are optional and are used in case a protocol contains data for evaluation.
Finally, the content of the file my_dataset/my_protocol/norm/train_world.csv is used in case a protocol contains data for training (bob.bio.base.pipelines.Database.background_model_samples).
- Parameters
name (str) – Name of the dataset (root folder name containing the protocol folders)
protocol (str) – Name of the protocol (folder name containing the dev, eval and norm folders)
dataset_protocol_path (str) – Absolute path or a tarball of the dataset protocol description.
protocol_name (str) – The name of the protocol
csv_to_sample_loader (bob.pipelines.sample_loaders.CSVToSampleLoader) – Class whose objective is to generate bob.pipelines.Sample and/or bob.pipelines.SampleSet from CSV rows
is_sparse (bool) – If True, will look for a for_scores.lst file instead of a for_probes.lst (legacy format)
score_all_vs_all (bool) – Optimization trick for Dask. If True, all references will be passed for scoring against the probes.
group_probes_by_reference_id (bool) – If True, a probe SampleSet will contain all the samples with a given reference_id; otherwise, one SampleSet will be created for each sample.
- all_samples(groups=None)[source]¶
Reads and returns all the samples in groups.
- Parameters
groups (list or None) – Groups to consider (‘train’, ‘dev’, and/or ‘eval’). If None is given, returns the samples from all groups.
- Returns
samples – List of bob.pipelines.Sample objects.
- Return type
- background_model_samples()[source]¶
Returns bob.pipelines.Sample's to train a background model.
- Returns
samples – List of samples for background model training.
- Return type
- groups()[source]¶
This function returns the list of groups for this database.
- Returns
A list of groups
- Return type
[str]
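A minimal construction sketch for CSVDataset; the dataset name, protocol and path are hypothetical and follow the directory structure described above:
from bob.bio.base.database import CSVDataset

database = CSVDataset(
    name="my_dataset",
    protocol="my_protocol",
    dataset_protocol_path="/path/to/my_dataset",
)
train_samples = database.background_model_samples()
dev_samples = database.all_samples(groups=["dev"])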
- class bob.bio.base.database.CSVDatasetCrossValidation(*, name, protocol='Default', csv_file_name='metadata.csv', random_state=0, test_size=0.8, samples_for_enrollment=1, csv_to_sample_loader=None, score_all_vs_all=True, group_probes_by_reference_id=False, **kwargs)¶
Bases:
Database
Generic filelist dataset for the bob.bio.base.pipelines.PipelineSimple pipeline that handles cross-validation.
Check PipelineSimple: Advanced features for more details about the PipelineSimple Dataset interface.
This interface takes one csv_file as input and splits it into i) data for training and ii) data for testing. The data for testing is further split into data for enrollment and data for probing. The input CSV file should be formatted as follows:
PATH,reference_id
path_1,reference_id_1
path_2,reference_id_2
path_i,reference_id_j
...
- Parameters
csv_file_name (str) – CSV file containing all the samples from your database
random_state (int) – Pseudo-random number generator seed
test_size (float) – Percentage of the reference_ids used for testing
samples_for_enrollment (float) – Number of samples used for enrollment
csv_to_sample_loader (bob.pipelines.sample_loaders.CSVToSampleLoader) – Class whose objective is to generate bob.pipelines.Sample and/or bob.pipelines.SampleSet from CSV rows
- all_samples(groups=None)[source]¶
Reads and returns all the samples in groups.
- Parameters
groups (list or None) – Groups to consider (‘train’ and/or ‘dev’). If None is given, returns the samples from all groups.
- Returns
samples – List of bob.pipelines.Sample objects.
- Return type
- background_model_samples()[source]¶
Returns bob.pipelines.Sample's to train a background model.
- Returns
samples – List of samples for background model training.
- Return type
- class bob.bio.base.database.CSVDatasetZTNorm(**kwargs)¶
Bases:
CSVDataset
Generic filelist dataset for bob.bio.base.pipelines.PipelineSimple pipelines. Check PipelineSimple: Advanced features for more details about the PipelineSimple Dataset interface.
This dataset interface takes a CSVDataset as input and has two extra methods: CSVDatasetZTNorm.zprobes and CSVDatasetZTNorm.treferences.
To create a new dataset, you need to provide a directory structure similar to the one below:
my_dataset/
my_dataset/my_protocol/norm/train_world.csv
my_dataset/my_protocol/norm/for_znorm.csv
my_dataset/my_protocol/norm/for_tnorm.csv
my_dataset/my_protocol/dev/for_models.csv
my_dataset/my_protocol/dev/for_probes.csv
my_dataset/my_protocol/eval/for_models.csv
my_dataset/my_protocol/eval/for_probes.csv
- Parameters
database (CSVDataset) – The CSVDataset to be aggregated
- class bob.bio.base.database.CSVToSampleLoaderBiometrics(data_loader, dataset_original_directory='', extension='', reference_id_equal_subject_id=True)¶
Bases:
CSVToSampleLoader
Base class that converts the lines of a CSV file, like the one below, to bob.pipelines.DelayedSample or bob.pipelines.SampleSet:
PATH,REFERENCE_ID
path_1,reference_id_1
path_2,reference_id_2
path_i,reference_id_j
...
- Parameters
- class bob.bio.base.database.FileListBioDatabase(filelists_directory, name, protocol=None, bio_file_class=<class 'bob.bio.base.database.BioFile'>, original_directory=None, original_extension=None, annotation_directory=None, annotation_extension='.json', annotation_type='json', dev_sub_directory=None, eval_sub_directory=None, world_filename=None, optional_world_1_filename=None, optional_world_2_filename=None, models_filename=None, probes_filename=None, scores_filename=None, tnorm_filename=None, znorm_filename=None, use_dense_probe_file_list=None, keep_read_lists_in_memory=True, **kwargs)¶
Bases:
ZTBioDatabase
This class provides a user-friendly interface to databases that are given as file lists.
- Parameters
filelists_directory (str) – The directory that contains the filelists defining the protocol(s). If you use the protocol attribute when querying the database, it will be appended to the base directory, such that several protocols are supported by the same class instance of bob.bio.base.
name (str) – The name of the database
protocol (str) – The protocol of the database. This should be a folder inside
filelists_directory
.bio_file_class (
class
) – The class that should be used to return the files. This can bebob.bio.base.database.BioFile
,bob.bio.spear.database.AudioBioFile
,bob.bio.face.database.FaceBioFile
, or anything similar.original_directory (str or
None
) – The directory, where the original data can be found.original_extension (str or [str] or
None
) – The filename extension of the original data, or multiple extensions.annotation_directory (str or
None
) – The directory, where additional annotation files can be found.annotation_extension (str or
None
) – The filename extension of the annotation files.annotation_type (str or
None
) – The type of annotation that can be read. Currently, options are'eyecenter', 'named', 'idiap'
. Seeread_annotation_file()
for details.dev_sub_directory (str or
None
) – Specify a custom subdirectory for the filelists of the development set (default is'dev'
)eval_sub_directory (str or
None
) – Specify a custom subdirectory for the filelists of the evaluation set (default is
)world_filename (str or
None
) – Specify a custom filename for the training filelist (default is'norm/train_world.lst'
)optional_world_1_filename (str or
None
) – Specify a custom filename for the (first optional) training filelist (default is'norm/train_optional_world_1.lst'
)optional_world_2_filename (str or
None
) – Specify a custom filename for the (second optional) training filelist (default is'norm/train_optional_world_2.lst'
)models_filename (str or
None
) – Specify a custom filename for the model filelists (default is'for_models.lst'
)probes_filename (str or
None
) – Specify a custom filename for the probes filelists (default is'for_probes.lst'
)scores_filename (str or
None
) – Specify a custom filename for the scores filelists (default is'for_scores.lst'
)tnorm_filename (str or
None
) – Specify a custom filename for the T-norm scores filelists (default is'for_tnorm.lst'
)znorm_filename (str or
None
) – Specify a custom filename for the Z-norm scores filelists (default is'for_znorm.lst'
)use_dense_probe_file_list (bool or None) – Specify which list to use among
probes_filename
(dense) orscores_filename
. IfNone
it is tried to be estimated based on the given parameters.keep_read_lists_in_memory (bool) – If set to
True
(the default), the lists are read only once and stored in memory. Otherwise the lists will be re-read for every query (not recommended).
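A construction and query sketch; the directories, database name and protocol below are placeholders:

from bob.bio.base.database import FileListBioDatabase

database = FileListBioDatabase(
    filelists_directory="/path/to/filelists",  # contains <protocol>/dev/, norm/, ...
    name="my_database",
    protocol="my_protocol",
    original_directory="/path/to/original/data",
    original_extension=".hdf5",
)

# Model ids of the development set and the probe files to score against them
model_ids = database.model_ids_with_protocol(groups="dev")
probe_files = database.objects(groups="dev", purposes="probe")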
- all_files(groups=['dev'], add_zt_files=True)[source]¶
Returns all files for the given groups. The internally stored protocol is used throughout.
- Parameters
groups ([str]) – A list of groups to retrieve the files for.
add_zt_files (bool) – If selected, also files for ZT-norm scoring will be added. Please select this option only if this dataset provides ZT-norm files, see
implements_zt()
.
- Returns
A list of all files that fulfill your query.
- Return type
[BioFile]
- annotations(file)[source]¶
Reads the annotations for the given file id from file and returns them in a dictionary.
- client_id_from_model_id(model_id, group='dev')[source]¶
Returns the client id that is connected to the given model id.
- Parameters
model_id (str or
None
) – The model id for which the client id should be returned.groups (str or [str] or
None
) – (optional) the groups, the client belongs to. Might be one or more of('dev', 'eval', 'world', 'optional_world_1', 'optional_world_2')
. If groups are given, only these groups are considered.protocol (str or
None
) – The protocol to consider.
- Returns
The client id for the given model id, if found.
- Return type
- client_id_from_t_model_id(t_model_id, group='dev')[source]¶
Returns the client id that is connected to the given T-Norm model id.
- Parameters
model_id (str or
None
) – The model id for which the client id should be returned.groups (str or [str] or
None
) – (optional) the groups, the client belongs to. Might be one or more of('dev', 'eval')
. If groups are given, only these groups are considered.
- Returns
The client id for the given model id of a T-Norm model, if found.
- Return type
- client_ids(protocol=None, groups=None)[source]¶
Returns a list of client ids for the specific query by the user.
- Parameters
protocol (str or
None
) – The protocol to considergroups (str or [str] or
None
) – The groups to which the clients belong('dev', 'eval', 'world', 'optional_world_1', 'optional_world_2')
.
- Returns
A list containing all the client ids which have the given properties.
- Return type
[str]
- get_base_directory()[source]¶
Returns the base directory where the filelists defining the database are located.
- groups(protocol=None, add_world=True, add_subworld=True)[source]¶
This function returns the list of groups for this database.
- Parameters
- Returns
A list of groups
- Return type
[str]
- implements_zt(protocol=None, groups=None)[source]¶
Checks if the file lists for the ZT score normalization are available.
- Parameters
protocol (str or
None
) – The protocol for which the groups should be retrieved.groups (str or [str] or
None
) – The groups for which the ZT score normalization file lists should be checked('dev', 'eval')
.
- Returns
True
if the all file lists for ZT score normalization exist, otherwiseFalse
.- Return type
- model_ids_with_protocol(groups=None, protocol=None, **kwargs)[source]¶
Returns a list of model ids for the specific query by the user.
- Parameters
protocol (str or
None
) – The protocol to considergroups (str or [str] or
None
) – The groups to which the models belong('dev', 'eval', 'world', 'optional_world_1', 'optional_world_2')
.
- Returns
A list containing all the model ids which have the given properties.
- Return type
[str]
- objects(groups=None, protocol=None, purposes=None, model_ids=None, classes=None, **kwargs)[source]¶
Returns a set of
bob.bio.base.database.BioFile
objects for the specific query by the user.- Parameters
protocol (str or
None
) – The protocol to considerpurposes (str or [str] or
None
) – The purposes required to be retrieved('enroll', 'probe')
or a tuple with several of them. IfNone
is given (this is the default), it is considered the same as a tuple with all possible values. This field is ignored for the data from the'world', 'optional_world_1', 'optional_world_2'
groups.model_ids (str or [str] or
None
) – Only retrieves the files for the provided list of model ids (claimed client id). IfNone
is given (this is the default), no filter over the model_ids is performed.groups (str or [str] or
None
) – One of the groups('dev', 'eval', 'world', 'optional_world_1', 'optional_world_2')
or a tuple with several of them. IfNone
is given (this is the default), it is considered to be the existing subset of('world', 'dev', 'eval')
.classes (str or [str] or
None
) –The classes (types of accesses) to be retrieved
('client', 'impostor')
or a tuple with several of them. IfNone
is given (this is the default), it is considered the same as a tuple with all possible values.Note
Classes are not allowed to be specified when ‘probes_filename’ is used in the constructor.
- Returns
A list of
BioFile
objects considering all the filtering criteria.- Return type
[BioFile]
- original_file_name(file, check_existence=True)[source]¶
Returns the original file name of the given file.
This interface supports several original extensions, so that file lists can contain images of different data types.
When multiple original extensions are specified, this function will check the existence of any of these file names, and return the first one that actually exists. In this case, the
check_existence
flag is ignored.- Parameters
- Returns
The full path of the original data file.
- Return type
- set_base_directory(filelists_directory)[source]¶
Resets the base directory where the filelists defining the database are located.
- tclient_ids(protocol=None, groups=None)[source]¶
Returns a list of T-Norm client ids for the specific query by the user.
- Parameters
protocol (str or
None
) – The protocol to considergroups (str or [str] or
None
) – The groups to which the clients belong (“dev”, “eval”).
- Returns
A list containing all the T-Norm client ids which have the given properties.
- Return type
[str]
- tmodel_ids_with_protocol(protocol=None, groups=None, **kwargs)[source]¶
Returns a list of T-Norm model ids for the specific query by the user.
- Parameters
protocol (str or
None
) – The protocol to considergroups (str or [str] or
None
) – The groups to which the models belong('dev', 'eval')
.
- Returns
A list containing all the T-Norm model ids belonging to the given group.
- Return type
[str]
- tobjects(groups=None, protocol=None, model_ids=None, **kwargs)[source]¶
Returns a list of
bob.bio.base.database.BioFile
objects for enrolling T-norm models for score normalization.- Parameters
protocol (str or
None
) – The protocol to considermodel_ids (str or [str] or
None
) – Only retrieves the files for the provided list of model ids (claimed client id). IfNone
is given (this is the default), no filter over the model_ids is performed.groups (str or [str] or
None
) – The groups to which the models belong('dev', 'eval')
.
- Returns
A list of
BioFile
objects considering all the filtering criteria.- Return type
[BioFile]
- uses_dense_probe_file(protocol)[source]¶
Determines if a dense probe file list is used based on the existence of parameters.
- zclient_ids(protocol=None, groups=None)[source]¶
Returns a list of Z-Norm client ids for the specific query by the user.
- Parameters
protocol (str or
None
) – The protocol to considergroups (str or [str] or
None
) – The groups to which the clients belong (“dev”, “eval”).
- Returns
A list containing all the Z-Norm client ids which have the given properties.
- Return type
[str]
- zobjects(groups=None, protocol=None, **kwargs)[source]¶
Returns a list of
BioFile
objects to perform Z-norm score normalization.- Parameters
protocol (str or
None
) – The protocol to considergroups (str or [str] or
None
) – The groups to which the clients belong('dev', 'eval')
.
- Returns
A list of File objects considering all the filtering criteria.
- Return type
[BioFile]
- class bob.bio.base.database.LSTToSampleLoader(data_loader, dataset_original_directory='', extension='')¶
Bases:
CSVToSampleLoader
Simple mechanism that converts the lines of a LST file to
bob.pipelines.DelayedSample
orbob.pipelines.SampleSet
- transform(X)[source]¶
Transforms one CSV line into one
bob.pipelines.DelayedSample
- Parameters
X – CSV File Object (open file)
- class bob.bio.base.database.ZTBioDatabase(name, z_probe_options={}, **kwargs)¶
Bases:
BioDatabase
This class defines another set of abstract functions that need to be implemented if your database provides the interface for computing scores used for ZT-normalization.
- all_files(groups=None) files [source]¶
Returns all files of the database, including those for ZT norm, respecting the current protocol. The files can be limited using the
all_files_options
and the thez_probe_options
in the constructor.Parameters:
- groupssome of
('world', 'dev', 'eval')
orNone
The groups to get the data for. If
None
, data for all groups is returned.- add_zt_files: bool
If set (the default), files for ZT score normalization are added.
Returns:
- files[
bob.bio.base.database.BioFile
] The sorted and unique list of all files of the database.
- client_id_from_t_model_id(t_model_id, group='dev') client_id [source]¶
Returns the client id for the given T-Norm model id. In this base class implementation, we just use the
BioDatabase.client_id_from_model_id()
function. Overload this function if you need another behavior.
Parameters:
- t_model_idint or str
A unique ID that identifies the T-Norm model.
- groupone of
('dev', 'eval')
The group to get the client ids for.
Returns:
- client_id[int] or [str]
A unique ID that identifies the client, to which the T-Norm model belongs.
- t_enroll_files(t_model_id, group='dev') files [source]¶
Returns a list of File objects that should be used to enroll the T-Norm model with the given model id from the given group, respecting the current protocol.
Parameters:
- t_model_idint or str
A unique ID that identifies the model.
- groupone of
('dev', 'eval')
The group to get the enrollment files for.
Returns:
- files[
bob.bio.base.database.BioFile
] The sorted list of files used to enroll the model with the given model id.
- t_model_ids(group='dev') ids [source]¶
Returns a list of model ids of T-Norm models for the given group, respecting the current protocol.
Parameters:
- groupone of
('dev', 'eval')
The group to get the model ids for.
Returns:
- ids[int] or [str]
The list of (unique) model ids for T-Norm models of the given group.
- abstract tmodel_ids_with_protocol(protocol=None, groups=None, **kwargs)[source]¶
This function returns the ids of the T-Norm models of the given groups for the given protocol.
Keyword parameters:
- groupsstr or [str]
The groups of which the model ids should be returned. Usually, groups are one or more elements of (‘dev’, ‘eval’)
- protocolstr
The protocol for which the model ids should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.
- abstract tobjects(groups=None, protocol=None, model_ids=None, **kwargs)[source]¶
This function returns the File objects of the T-Norm models of the given groups for the given protocol and the given model ids.
Keyword parameters:
- groupsstr or [str]
The groups of which the model ids should be returned. Usually, groups are one or more elements of (‘dev’, ‘eval’)
- protocolstr
The protocol for which the model ids should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.
- model_ids[various type]
The model ids for which the File objects should be retrieved. What defines a ‘model id’ is dependent on the database. In cases, where there is only one model per client, model ids and client ids are identical. In cases, where there is one model per file, model ids and file ids are identical. But, there might also be other cases.
- z_probe_file_sets(group='dev') files [source]¶
Returns a list of probe FileSet objects used to compute the Z-Norm. This function needs to be implemented in derived classes.
Parameters:
- groupone of
('dev', 'eval')
The group to get the Z-norm probe files for.
Returns:
- files[
bob.bio.base.database.BioFileSet
] The unique list of file sets used to compute the Z-norm.
- z_probe_files(group='dev') files [source]¶
Returns a list of probe files used to compute the Z-Norm, respecting the current protocol. The Z-probe files can be limited using the
z_probe_options
in the query tobob.bio.base.database.ZTBioDatabase.z_probe_files()
Parameters:
- groupone of
('dev', 'eval')
The group to get the Z-norm probe files for.
Returns:
- files[
bob.bio.base.database.BioFile
] The unique list of files used to compute the Z-norm.
- abstract zobjects(groups=None, protocol=None, **kwargs)[source]¶
This function returns the File objects of the Z-Norm impostor files of the given groups for the given protocol.
Keyword parameters:
- groupsstr or [str]
The groups of which the model ids should be returned. Usually, groups are one or more elements of (‘dev’, ‘eval’)
- protocolstr
The protocol for which the model ids should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.
- class bob.bio.base.preprocessor.Preprocessor(writes_data=True, read_original_data=None, min_preprocessed_file_size=1000, **kwargs)¶
Bases:
object
This is the base class for all preprocessors. It defines the minimum requirements for all derived preprocessor classes.
Parameters:
- writes_databool
Select, if the preprocessor actually writes preprocessed images, or if it is simply returning values.
- read_original_data: callable or
None
This function is used to read the original data from file. It takes three inputs: A
bob.bio.base.database.BioFile
(or one of its derivatives), the original directory (asstr
) and the original extension (asstr
). IfNone
, the default functionbob.bio.base.read_original_data()
is used.- min_preprocessed_file_size: int
The minimum file size of saved preprocessed data in bytes. If the saved preprocessed data file size is smaller than this, it is assumed to be a corrupt file and the data will be processed again.
- kwargs
key=value
pairs A list of keyword arguments to be written in the __str__ function.
- read_data(data_file) data [source]¶
Reads the preprocessed data from file. In this base class implementation, it uses
bob.bio.base.load()
to do that. If you have a different format, please override this function.
Parameters:
- data_filestr or
h5py.File
The file open for reading or the name of the file to read from.
Returns:
- dataobject (usually
numpy.ndarray
) The preprocessed data read from file.
- write_data(data, data_file)[source]¶
Writes the given preprocessed data to a file with the given name. In this base class implementation, we simply use
bob.bio.base.save()
for that. If you have a different format (e.g. not images), please override this function.
Parameters:
- dataobject
The preprocessed data, i.e., what is returned from __call__.
- data_filestr or
h5py.File
The file open for writing, or the name of the file to write.
- class bob.bio.base.extractor.Extractor(requires_training=False, split_training_data_by_client=False, min_extractor_file_size=1000, min_feature_file_size=1000, **kwargs)¶
Bases:
object
This is the base class for all feature extractors. It defines the minimum requirements that a derived feature extractor class needs to implement.
If your derived class requires training, please register this here.
Parameters
- requires_trainingbool
Set this flag to
True
if your feature extractor needs to be trained. In that case, please override thetrain()
andload()
methods- split_training_data_by_clientbool
Set this flag to
True
if your feature extractor requires the training data to be split by clients. Ignored, ifrequires_training
isFalse
- min_extractor_file_sizeint
The minimum file size of a saved extractor file for extractors that require training in bytes. If the saved file size is smaller than this, it is assumed to be a corrupt file and the extractor will be trained again.
- min_feature_file_sizeint
The minimum file size of extracted features in bytes. If the saved file size is smaller than this, it is assumed to be a corrupt file and the features will be extracted again.
- kwargs
key=value
pairs A list of keyword arguments to be written in the __str__ function.
- load(extractor_file)[source]¶
Loads the parameters required for feature extraction from the extractor file. This function usually is only useful in combination with the
train()
function. In this base class implementation, it does nothing.
Parameters:
- extractor_filestr
The file to read the extractor from.
- read_feature(feature_file)[source]¶
Reads the extracted feature from file. In this base class implementation, it uses
bob.bio.base.load()
to do that. If you have a different format, please override this function.
Parameters:
- feature_filestr or
h5py.File
The file open for reading or the name of the file to read from.
Returns:
- featureobject (usually
numpy.ndarray
) The feature read from file.
- train(training_data, extractor_file)[source]¶
This function can be overwritten to train the feature extractor. If you do this, please also register the function by calling this base class constructor and enabling the training by
requires_training = True
.
Parameters:
- training_data[object] or [[object]]
A list of preprocessed data that can be used for training the extractor. Data will be provided in a single list, if
split_training_data_by_client = False
was specified in the constructor, otherwise the data will be split into lists, each of which contains the data of a single (training-)client.- extractor_filestr
The file to write. This file should be readable with the
load()
function.
- write_feature(feature, feature_file)[source]¶
Writes the given extracted feature to a file with the given name. In this base class implementation, we simply use
bob.bio.base.save()
for that. If you have a different format, please override this function.
Parameters:
- featureobject
The extracted feature, i.e., what is returned from __call__.
- feature_filestr or
h5py.File
The file open for writing, or the name of the file to write.
- class bob.bio.base.transformers.ExtractorTransformer(instance, model_path=None, **kwargs)¶
Bases:
TransformerMixin
,BaseEstimator
Scikit learn transformer for
bob.bio.base.extractor.Extractor
.- Parameters
instance (object) – An instance of
bob.bio.base.extractor.Extractor
model_path (
str
) – Model path in caseinstance.requires_training
is equal toTrue
.
- class bob.bio.base.transformers.PreprocessorTransformer(instance, **kwargs)¶
Bases:
TransformerMixin
,BaseEstimator
Scikit learn transformer for
bob.bio.base.preprocessor.Preprocessor
.- Parameters
instance (object) – An instance of bob.bio.base.preprocessor.Preprocessor
- class bob.bio.base.transformers.ReferenceIdEncoder(*, categories='auto', dtype=<class 'int'>, handle_unknown='use_encoded_value', unknown_value=-1, **kwargs)¶
Bases:
OrdinalEncoder
An OrdinalEncoder that converts reference_id strings to integers. This is used to prepare labels used in training supervised transformers like the ISV algorithm.
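A small usage sketch; the column-shaped input follows the scikit-learn OrdinalEncoder convention that this class inherits:

import numpy as np
from bob.bio.base.transformers import ReferenceIdEncoder

encoder = ReferenceIdEncoder()
# reference_id strings arranged as a single column
reference_ids = np.array([["subject_a"], ["subject_b"], ["subject_a"]])
labels = encoder.fit_transform(reference_ids)
# labels is an integer array, e.g. [[0], [1], [0]]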
- class bob.bio.base.transformers.defaultdict¶
Bases:
dict
defaultdict(default_factory=None, /, […]) –> dict with default factory
The default factory is called without arguments to produce a new value when a key is not present, in __getitem__ only. A defaultdict compares equal to a dict with the same items. All remaining arguments are treated the same as if they were passed to the dict constructor, including keyword arguments.
- copy() a shallow copy of D. ¶
- default_factory¶
Factory for default value called by __missing__().
- class bob.bio.base.algorithm.Distance(distance_function='cosine', factor=- 1, average_on_enroll=True, average_probes=False, probes_score_fusion='max', enrolls_score_fusion='max', **kwargs)¶
Bases:
BioAlgorithm
A distance algorithm to compare feature vectors. Many biometric algorithms are based on comparing feature vectors that are usually extracted by using deep neural networks. The most common distance function is the cosine similarity, which is the default in this class.
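A scoring sketch, assuming Distance exposes the same create_templates/compare interface as the other BioAlgorithm classes documented here; the feature shapes are arbitrary:

import numpy as np
from bob.bio.base.algorithm import Distance

algorithm = Distance(distance_function="cosine", factor=-1)

# One enrollment set with 3 feature vectors and one probe set with 2
enroll_templates = algorithm.create_templates([np.random.rand(3, 128)], enroll=True)
probe_templates = algorithm.create_templates([np.random.rand(2, 128)], enroll=False)

scores = algorithm.compare(enroll_templates, probe_templates)  # shape (1, 1)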
- class bob.bio.base.algorithm.GMM(n_gaussians: int, k_means_trainer=None, max_fitting_steps: int = 25, convergence_threshold: float = 0.0005, mean_var_update_threshold: float = 0.0005, update_means: bool = True, update_variances: bool = True, update_weights: bool = True, enroll_iterations: int = 1, enroll_update_means: bool = True, enroll_update_variances: bool = False, enroll_update_weights: bool = False, enroll_relevance_factor: ~typing.Optional[float] = 4, enroll_alpha: float = 0.5, scoring_function: ~typing.Callable = <function linear_scoring>, random_state: int = 5489, return_stats_in_transform: bool = False, **kwargs)¶
Bases:
GMMMachine
,BioAlgorithm
Algorithm for computing UBM and Gaussian Mixture Models of the features.
Features must be normalized to zero mean and unit standard deviation.
Models are MAP GMM machines trained from a UBM on the enrollment feature set.
The UBM is a ML GMM machine trained on the training feature set.
Probes are GMM statistics of features projected on the UBM.
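A construction sketch; the number of Gaussians and the feature dimensionality are illustrative, and training the UBM through the inherited GMMMachine fit method is an assumption:

import numpy as np
from bob.bio.base.algorithm import GMM

algorithm = GMM(n_gaussians=16)

# One 2D array of (zero mean, unit variance) feature vectors for UBM training
training_features = np.random.randn(1000, 60)
algorithm.fit(training_features)  # assumed to come from the GMMMachine base class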
- compare(enroll_templates, probe_templates)[source]¶
Computes the similarity score between all enrollment and probe templates.
- Parameters
- Returns
scores – A matrix of shape (N, M) containing the similarity scores.
- Return type
- create_templates(list_of_feature_sets, enroll)[source]¶
Creates enroll or probe templates from multiple sets of features.
The enroll template format can be different from the probe templates.
- Parameters
list_of_feature_sets (list) – A list of lists of features with the shape of Nx?xD. N templates should be computed. Note that you cannot call np.array(list_of_feature_sets) because the number of features per set can be different depending on the database.
enroll (bool) – If True, the features are for enrollment. If False, the features are for probe.
- Returns
templates – A list of templates which has the same length as
list_of_feature_sets
.- Return type
- enroll(data)[source]¶
Enrolls a GMM using MAP adaptation given a reference’s feature vectors
Returns a GMMMachine tuned from the UBM with MAP on a biometric reference data.
- project(array)[source]¶
Computes GMM statistics against a UBM, given a 2D array of feature vectors
This is applied to the probes before scoring.
- read_biometric_reference(model_file)[source]¶
Reads an enrolled reference model, which is a MAP GMMMachine.
- write_biometric_reference(model: GMMMachine, model_file)[source]¶
Write the enrolled reference (MAP GMMMachine) into a file.
- class bob.bio.base.algorithm.ISV(r_U, em_iterations=10, relevance_factor=4.0, random_state=0, ubm=None, ubm_kwargs=None, **kwargs)¶
Bases:
ISVMachine
,BioAlgorithm
ISV transformer and bioalgorithm to be used in pipelines
- compare(enroll_templates, probe_templates)[source]¶
Computes the similarity score between all enrollment and probe templates.
- Parameters
- Returns
scores – A matrix of shape (N, M) containing the similarity scores.
- Return type
- create_templates(list_of_feature_sets, enroll)[source]¶
Creates enroll or probe templates from multiple sets of features.
The enroll template format can be different from the probe templates.
- Parameters
list_of_feature_sets (list) – A list of lists of features with the shape of Nx?xD. N templates should be computed. Note that you cannot call np.array(list_of_feature_sets) because the number of features per set can be different depending on the database.
enroll (bool) – If True, the features are for enrollment. If False, the features are for probe.
- Returns
templates – A list of templates which has the same length as
list_of_feature_sets
.- Return type
- class bob.bio.base.algorithm.JFA(r_U, r_V, em_iterations=10, relevance_factor=4.0, random_state=0, ubm=None, ubm_kwargs=None, **kwargs)¶
Bases:
JFAMachine
,BioAlgorithm
JFA transformer and bioalgorithm to be used in pipelines
- compare(enroll_templates, probe_templates)[source]¶
Computes the similarity score between all enrollment and probe templates.
- Parameters
- Returns
scores – A matrix of shape (N, M) containing the similarity scores.
- Return type
- create_templates(list_of_feature_sets, enroll)[source]¶
Creates enroll or probe templates from multiple sets of features.
The enroll template format can be different from the probe templates.
- Parameters
list_of_feature_sets (list) – A list of lists of features with the shape of Nx?xD. N templates should be computed. Note that you cannot call np.array(list_of_feature_sets) because the number of features per set can be different depending on the database.
enroll (bool) – If True, the features are for enrollment. If False, the features are for probe.
- Returns
templates – A list of templates which has the same length as
list_of_feature_sets
.- Return type
A set of utilities to load score files with different formats.
- bob.bio.base.score.load.open_file(filename, mode='rt')[source]¶
Opens the given score file for reading.
Score files might be raw text files, or a tar-file including a single score file inside.
- Parameters
filename (
str
,file-like
) – The name of the score file to open, or a file-like object open for reading. If a file name is given, the according file might be a raw text file or a (compressed) tar file containing a raw text file.- Returns
A read-only file-like object as it would be returned by
open()
.- Return type
file-like
- bob.bio.base.score.load.four_column(filename)[source]¶
Loads a score set from a single file and yields its lines
Loads a score set from a single file and yield its lines (to avoid loading the score file at once into memory). This function verifies that all fields are correctly placed and contain valid fields. The score file must contain the following information in each line:
claimed_id real_id test_label score
- Parameters
filename (
str
,file-like
) – The file object that will be opened withopen_file()
containing the scores.- Yields
str – The claimed identity – the client name of the model that was used in the comparison
str: The real identity – the client name of the probe that was used in the comparison
str: A label of the probe – usually the probe file name, or the probe id
float: The result of the comparison of the model and the probe
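An iteration sketch; "scores-dev" is a placeholder file name:

from bob.bio.base.score.load import four_column

for claimed_id, real_id, test_label, score in four_column("scores-dev"):
    # a genuine (positive) trial is one where the claimed and real ids match
    if claimed_id == real_id:
        print(test_label, score)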
- bob.bio.base.score.load.split_four_column(filename)[source]¶
Loads a score set from a single file and splits the scores
Loads a score set from a single file and splits the scores between negatives and positives. The score file has to respect the 4 column format as defined in the method
four_column()
.This method avoids loading and allocating memory for the strings present in the file. We only keep the scores.
- Parameters
filename (
str
,file-like
) – The file object that will be opened withopen_file()
containing the scores.- Returns
- negatives, 1D float array containing the list of scores, for which
the
claimed_id
and thereal_id
are different (seefour_column()
)- array: positives, 1D float array containing the list of scores, for which
the
claimed_id
and thereal_id
are identical (seefour_column()
)
- Return type
array
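A typical use together with bob.measure, assuming its usual eer_threshold and farfrr helpers; the file name is a placeholder:

import bob.measure
from bob.bio.base.score.load import split_four_column

negatives, positives = split_four_column("scores-dev")

# estimate an operating point and its error rates
threshold = bob.measure.eer_threshold(negatives, positives)
far, frr = bob.measure.farfrr(negatives, positives, threshold)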
- bob.bio.base.score.load.get_split_dataframe(filename)[source]¶
Loads a score set that was written with
bob.bio.base.pipelines.CSVScoreWriter
Returns two dataframes, split between positives and negatives.
:param filename (
str
: opened withopen_file()
containing the scores. :type filename (str
: The file object that will be :paramfile-like
): opened withopen_file()
containing the scores. :typefile-like
): The file object that will be- Returns
dataframe (negatives, contains the list of scores (and metadata) for which) – the fields of the
bio_ref_subject_id
andprobe_subject_id
columns are different. (see PipelineSimple: Advanced features)dataframe (positives, contains the list of scores (and metadata) for which) – the fields of the
bio_ref_subject_id
andprobe_subject_id
columns are identical. (see PipelineSimple: Advanced features)
- bob.bio.base.score.load.split_csv_scores(filename)[source]¶
Loads a score set that was written with
bob.bio.base.pipelines.CSVScoreWriter
:param filename (
str
: opened withopen_file()
containing the scores. :type filename (str
: The file object that will be :paramfile-like
): opened withopen_file()
containing the scores. :typefile-like
): The file object that will be- Returns
array (negatives, 1D float array containing the list of scores, for which) – the fields of the
bio_ref_subject_id
andprobe_subject_id
columns are different. (see PipelineSimple: Advanced features)array (positives, 1D float array containing the list of scores, for which) – the fields of the
bio_ref_subject_id
andprobe_subject_id
columns are identical. (see PipelineSimple: Advanced features)
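A minimal sketch; the file name is a placeholder for a CSVScoreWriter output:

from bob.bio.base.score.load import split_csv_scores

negatives, positives = split_csv_scores("scores-dev.csv")
print(len(negatives), "impostor scores,", len(positives), "genuine scores")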
- bob.bio.base.score.load.cmc_four_column(filename)[source]¶
Loads scores to compute CMC curves from a file in four column format.
The four column file needs to be in the same format as described in
four_column()
, and thetest_label
(column 3) has to contain the test/probe file name or a probe id.This function returns a list of tuples. For each probe file, the tuple consists of a list of negative scores and a list of positive scores. Usually, the list of positive scores should contain only one element, but more are allowed. The result of this function can directly be passed to, e.g., the
bob.measure.cmc()
function.- Parameters
filename (
str
,file-like
) – The file object that will be opened withopen_file()
containing the scores.- Returns
A list of tuples, where each tuple contains the
negative
andpositive
scores for one probe of the database. Bothnegatives
andpositives
can be either an 1Dnumpy.ndarray
of typefloat
, orNone
.- Return type
- bob.bio.base.score.load.five_column(filename)[source]¶
Loads a score set from a single file and yields its lines
Loads a score set from a single file and yield its lines (to avoid loading the score file at once into memory). This function verifies that all fields are correctly placed and contain valid fields. The score file must contain the following information in each line:
claimed_id model_label real_id test_label score
- Parameters
filename (
str
,file-like
) – The file object that will be opened withopen_file()
containing the scores.- Yields
str – The claimed identity – the client name of the model that was used in the comparison
str: A label for the model – usually the model file name, or the model id
str: The real identity – the client name of the probe that was used in the comparison
str: A label of the probe – usually the probe file name, or the probe id
float: The result of the comparison of the model and the probe
- bob.bio.base.score.load.split_five_column(filename)[source]¶
Loads a score set from a single file and splits the scores
Loads a score set from a single file in five column format and splits the scores between negatives and positives. The score file has to respect the 5 column format as defined in the method
five_column()
.This method avoids loading and allocating memory for the strings present in the file. We only keep the scores.
- Parameters
filename (
str
,file-like
) – The file object that will be opened withopen_file()
containing the scores.- Returns
- negatives, 1D float array containing the list of scores, for which
the
claimed_id
and thereal_id
are different (seefour_column()
)- array: positives, 1D float array containing the list of scores, for which
the
claimed_id
and thereal_id
are identical (seefour_column()
)
- Return type
array
- bob.bio.base.score.load.cmc_five_column(filename)[source]¶
Loads scores to compute CMC curves from a file in five column format.
The five column file needs to be in the same format as described in
five_column()
, and thetest_label
(column 4) has to contain the test/probe file name or a probe id.This function returns a list of tuples. For each probe file, the tuple consists of a list of negative scores and a list of positive scores. Usually, the list of positive scores should contain only one element, but more are allowed. The result of this function can directly be passed to, e.g., the
bob.measure.cmc()
function.- Parameters
filename (
str
,file-like
) – The file object that will be opened withopen_file()
containing the scores.- Returns
A list of tuples, where each tuple contains the
negative
andpositive
scores for one probe of the database.- Return type
- bob.bio.base.score.load.scores(filename, ncolumns=None)[source]¶
Loads the scores from the given score file and yields its lines. Depending on the score file format, four or five elements are yielded, see
bob.bio.base.score.load.four_column()
andbob.bio.base.score.load.five_column()
for details.Parameters:
- filename:
str
,file-like
: The file object that will be opened with
open_file()
containing the scores.- ncolumns: any
ignored
Yields:
- bob.bio.base.score.load.split(filename, ncolumns=None, sort=False)[source]¶
Loads the scores from the given score file and splits them into positives and negatives. Depending on the score file format, it calls
bob.bio.base.score.load.split_four_column()
andbob.bio.base.score.load.split_five_column()
for details.- Parameters
filename (str) – The path to the score file.
ncolumns (int or
None
) – If specified to be4
or5
, the score file will be assumed to be in the given format. If not specified, the score file format will be estimated automaticallysort (
bool
, optional) – IfTrue
, will return sorted negatives and positives
- Returns
negatives (1D
numpy.ndarray
of type float) – This array contains the list of scores, for which theclaimed_id
and thereal_id
are different (seefour_column()
)positives (1D
numpy.ndarray
of type float) – This array contains the list of scores, for which theclaimed_id
and thereal_id
are identical (seefour_column()
)
- bob.bio.base.score.load.cmc(filename, ncolumns=None) list [source]¶
Loads scores to compute CMC curves.
Depending on the score file format, it calls
bob.bio.base.score.load.cmc_four_column()
and bob.bio.base.score.load.cmc_five_column(); see those functions for details.
filename (
str
orfile-like
) – The file object that will be opened withopen_file()
containing the scores.ncolumns – (
int
, Optional): If specified to be4
or5
, the score file will be assumed to be in the given format. If not specified, the score file format will be estimated automatically
Returns:
list
: [(neg,pos)] A list of tuples, where each tuple contains thenegative
andpositive
scores for one probe of the database.
- bob.bio.base.score.load.load_score(filename, ncolumns=None, minimal=False, **kwargs)[source]¶
Load scores using numpy.loadtxt and return the data as a numpy array.
- Parameters
filename (
str
,file-like
) – The file object that will be opened withopen_file()
containing the scores.ncolumns (
int
, optional) – 4, 5 or None (the default), specifying the number of columns in the score file. If None is provided, the number of columns will be guessed.minimal (
bool
, optional) – If True, only loadsclaimed_id
,real_id
, andscores
.**kwargs – Keyword arguments passed to
numpy.genfromtxt()
- Returns
An array which contains not only the actual scores but also the
claimed_id
,real_id
,test_label
and['model_label']
- Return type
array
- bob.bio.base.score.load.load_files(filenames, func_load)[source]¶
Load a list of score files and return a list of tuples of (neg, pos)
- Parameters
filenames (
list
) – list of file pathsfunc_load – function that can read files in the list
- Returns
list ([(neg, pos)]) – A list of tuples, where each tuple contains the negative and positive scores for each system/probe.
- bob.bio.base.score.load.get_negatives_positives(score_lines)[source]¶
Take the output of load_score and return negatives and positives. This function aims to replace split_four_column and split_five_column but takes a different input. It is up to you which one to use.
- bob.bio.base.score.load.get_negatives_positives_from_file(filename, **kwargs)[source]¶
Loads the scores first efficiently and then calls get_negatives_positives
- bob.bio.base.score.load.get_negatives_positives_all(score_lines_list)[source]¶
Take a list of outputs of load_score and return stacked negatives and positives.
- bob.bio.base.score.load.get_all_scores(score_lines_list)[source]¶
Take a list of outputs of load_score and return stacked scores
- bob.bio.base.score.load.dump_score(filename, score_lines)[source]¶
Dump scores that were loaded using
load_score()
The number of columns is automatically detected.
- bob.bio.base.score.load.split_csv_vuln(filename)[source]¶
Loads vulnerability scores from a CSV score file.
Returns the scores split between positive and negative as well as licit and presentation attack (spoof).
The CSV must contain a probe_attack_type column with each field either containing a str defining the attack type (spoof), or empty (licit).
- Parameters
filename (str) – The path to a CSV file containing all the scores
- Returns
split_scores – The licit negative and positive, and spoof scores for probes.
- Return type
dict of str: numpy.ndarray
Plots and measures for bob.bio.base
- class bob.bio.base.script.figure.Cmc(ctx, scores, evaluation, func_load)[source]¶
Bases:
PlotBase
Handles the plotting of Cmc
- compute(idx, input_scores, input_names)[source]¶
Plot CMC for dev and eval data using
bob.measure.plot.cmc()
- class bob.bio.base.script.figure.Dir(ctx, scores, evaluation, func_load)[source]¶
Bases:
PlotBase
Handles the plotting of DIR curve
- compute(idx, input_scores, input_names)[source]¶
Plot DIR for dev and eval data using
bob.measure.plot.detection_identification_curve()
- class bob.bio.base.script.figure.Metrics(ctx, scores, evaluation, func_load, names=('Failure to Acquire', 'False Match Rate', 'False Non Match Rate', 'False Accept Rate', 'False Reject Rate', 'Half Total Error Rate'))[source]¶
Bases:
Metrics
Compute metrics from score files
- class bob.bio.base.script.figure.MultiMetrics(ctx, scores, evaluation, func_load)[source]¶
Bases:
MultiMetrics
Compute metrics from score files
- class bob.bio.base.script.figure.Hist(ctx, scores, evaluation, func_load, nhist_per_system=2)[source]¶
Bases:
Hist
Histograms for biometric scores
Click commands for bob.bio.base
Generate random scores.
- bob.bio.base.script.gen.gen_score_distr(mean_neg, mean_pos, sigma_neg=10, sigma_pos=10, n_neg=5000, n_pos=5000, seed=0)[source]¶
Generate scores from normal distributions
- Parameters
mean_neg (float) – Mean for negative scores
mean_pos (float) – Mean for positive scores
sigma_neg (float) – STDev for negative scores
sigma_pos (float) – STDev for positive scores
n_pos (int) – The number of positive scores generated
n_neg (int) – The number of negative scores generated
seed (int) – A value to initialize the Random Number generator. Giving the same value (or not specifying ‘seed’) on two different calls will generate the same lists of scores.
- Returns
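A generation sketch, assuming the function returns the negative and positive score arrays in that order:

from bob.bio.base.script.gen import gen_score_distr

# reproducible, well-separated score distributions
neg, pos = gen_score_distr(mean_neg=-10.0, mean_pos=10.0, sigma_neg=5.0, sigma_pos=5.0, seed=0)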
- bob.bio.base.script.gen.write_scores_to_file(neg, pos, filename, n_subjects=5, n_probes_per_subject=5, n_unknown_subjects=0, neg_unknown=None, to_csv=True, five_col=False, metadata={'meta0': 'data0', 'meta1': 'data1'})[source]¶
Writes score distributions
- Parameters
neg (
numpy.ndarray
) – Scores for negative samples.pos (
numpy.ndarray
) – Scores for positive samples.filename (str) – The path to write the score to.
n_subjects (int) – Number of different subjects
n_probes_per_subject (int) – Number of different samples used as probe for each subject
n_unknown_subjects (int) – The number of unknown (no registered model) subjects
neg_unknown (None or list) – The scores of the unknown subjects
to_csv (bool) – If True, use the CSV format; otherwise the legacy 4- or 5-column format.
five_col (bool) – If True, use the 5-column format, else the 4-column format
- bob.bio.base.utils.score_fusion_strategy(strategy_name='average')[source]¶
Returns a function to compute a fusion strategy between different scores.
Different strategies are employed:
'average'
: The averaged score is computed using thenumpy.average()
function.'min'
: The minimum score is computed using themin()
function.'max'
: The maximum score is computed using themax()
function.'median'
: The median score is computed using thenumpy.median()
function.None
is also accepted, in which caseNone
is returned.
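A usage sketch, applied here to a plain list of scores:

from bob.bio.base.utils import score_fusion_strategy

fuse = score_fusion_strategy("average")
fused = fuse([0.2, 0.5, 0.8])  # 0.5 with the 'average' strategy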
- bob.bio.base.utils.selected_indices(total_number_of_indices, desired_number_of_indices=None)[source]¶
Returns a list of indices that will contain exactly the number of desired indices (or the number of total items in the list, if this is smaller). These indices are selected such that they are evenly spread over the whole sequence.
- bob.bio.base.utils.selected_elements(list_of_elements, desired_number_of_elements=None)[source]¶
Returns a list of elements that are sub-selected from the given list (or the list itself, if its length is smaller). These elements are selected such that they are evenly spread over the whole list.
- bob.bio.base.utils.pretty_print(obj, kwargs)[source]¶
Returns a pretty-print of the parameters to the constructor of a class, which you should be able to copy-paste on the command line to create the object (with few exceptions).
- bob.bio.base.utils.is_argument_available(argument, method)[source]¶
Check if an argument (or keyword argument) is available in a method
- bob.bio.base.utils.method¶
Pointer to the method
- bob.bio.base.utils.resources.valid_keywords = ('database', 'preprocessor', 'extractor', 'algorithm', 'grid', 'client', 'config', 'annotator', 'pipeline')¶
Keywords for which resources are defined.
- bob.bio.base.utils.resources.read_config_file(filenames, keyword=None)[source]¶
Use this function to read the given configuration file. If a keyword is specified, only the configuration according to this keyword is returned. Otherwise a dictionary of the configurations read from the configuration file is returned.
Parameters:
- filenames[str]
A list (potentially empty) of configuration files or resources to read running options from
- keywordstr or
None
If specified, only the content of the variable with the given name is returned. If
None
, the whole configuration is returned (a local namespace)
Returns:
- configobject or namespace
If
keyword
is specified, the object inside the configuration with the given name is returned. Otherwise, the whole configuration is returned (as a local namespace).
- bob.bio.base.utils.resources.load_resource(resource, keyword, imports=['bob.bio.base'], package_prefix='bob.bio.', preferred_package=None)[source]¶
Loads the given resource that is registered with the given keyword. The resource can be:
a resource as defined in the setup.py
a configuration file
a string defining the construction of an object. If imports are required for the construction of this object, they can be given as list of strings.
Parameters:
- resourcestr
Any string interpretable as a resource (see above).
- keywordstr
A valid resource keyword, can be one of
bob.bio.base.utils.resources.valid_keywords
.- imports[str]
A list of strings defining which modules to import, when constructing new objects (option 3).
- package_prefixstr
Package namespace, in which we search for entry points, e.g.,
bob.bio
.- preferred_packagestr or
None
When several resources with the same name are found in different packages (e.g., in different
bob.bio
or other packages), this specifies the preferred package to load the resource from. If not specified, the extension that is not frombob.bio
is selected.
Returns:
- resourceobject
The resulting resource object is returned, either read from file or resource, or created newly.
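A loading sketch; "gmm" is a hypothetical resource name that some bob.bio package would register under the "algorithm" keyword:

from bob.bio.base.utils.resources import load_resource

# the string may name a registered resource, a configuration file, or a
# constructor expression, as described above
algorithm = load_resource("gmm", "algorithm")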
- bob.bio.base.utils.resources.extensions(keywords=valid_keywords, package_prefix='bob.bio.') extensions [source]¶
Returns a list of packages that define extensions using the given keywords.
Parameters:
- keywords[str]
A list of keywords to load entry points for. Defaults to all
bob.bio.base.utils.resources.valid_keywords
.- package_prefixstr
Package namespace, in which we search for entry points, e.g.,
bob.bio
.
- bob.bio.base.utils.resources.resource_keys(keyword, exclude_packages=[], package_prefix='bob.bio.', strip=['dummy'])[source]¶
Reads and returns all resources that are registered with the given keyword. Entry points from the given
exclude_packages
are ignored.
- bob.bio.base.utils.resources.list_resources(keyword, strip=['dummy'], package_prefix='bob.bio.', verbose=False, packages=None)[source]¶
Returns a string containing a detailed list of resources that are registered with the given keyword.
- bob.bio.base.utils.resources.database_directories(strip=['dummy'], replacements=None, package_prefix='bob.bio.')[source]¶
Returns a dictionary of original directories for all registered databases.
- bob.bio.base.utils.resources.get_resource_filename(resource_name, group)[source]¶
Get the file name of a resource.
- bob.bio.base.utils.io.filter_missing_files(file_names, split_by_client=False, allow_missing_files=True)[source]¶
This function filters out files that do not exist, but only if
allow_missing_files
is set toTrue
, otherwise the list offile_names
is returned unaltered.
- bob.bio.base.utils.io.filter_none(data, split_by_client=False)[source]¶
This function filters out
None
values from the given list (or list of lists, whensplit_by_client
is enabled).
- bob.bio.base.utils.io.check_file(filename, force, expected_file_size=1)[source]¶
Checks if the file with the given
filename
exists and has size greater or equal toexpected_file_size
. If the file is to small, or if theforce
option is set toTrue
, the file is removed. This function returnsTrue
is the file exists (and has not been removed), otherwiseFalse
- bob.bio.base.utils.io.read_original_data(biofile, directory, extension)[source]¶
This function reads the original data using the given
biofile
instance. It simply callsload(directory, extension)
frombob.bio.base.database.BioFile
or one of its derivatives.- Parameters
biofile (
bob.bio.base.database.BioFile
or one of its derivatives) – The file to read the original data.directory (str) – The base directory of the database.
extension (str or
None
) – The extension of the original data. Might beNone
if thebiofile
itself has the extension stored.
- Returns
Whatever
biofile.load
returns; usually anumpy.ndarray
- Return type
- bob.bio.base.utils.io.load(file)[source]¶
Loads data from file. The given file might be an HDF5 file open for reading or a string.
- bob.bio.base.utils.io.save(data, file, compression=0)[source]¶
Saves the data to file using HDF5. The given file might be an HDF5 file open for writing, or a string. If the given data contains a
save
method, this method is called with the given HDF5 file. Otherwise the data is written to the HDF5 file using the given compression.
- bob.bio.base.utils.io.open_compressed(filename, open_flag='r', compression_type='bz2')[source]¶
Opens a compressed HDF5File with the given opening flags. For the ‘r’ flag, the given compressed file will be extracted to a local space. For ‘w’, an empty HDF5File is created. In any case, the opened HDF5File is returned, which needs to be closed using the close_compressed() function.
- bob.bio.base.utils.io.close_compressed(filename, hdf5_file, compression_type='bz2', create_link=False)[source]¶
Closes the compressed hdf5_file that was opened with open_compressed. When the file was opened for writing (using the 'w' flag in open_compressed), the created HDF5 file is compressed into the given file name. To be able to read the data using the real tools, a link with the correct extension is created when create_link is set to True.
- bob.bio.base.utils.io.load_compressed(filename, compression_type='bz2')[source]¶
Extracts the data to a temporary HDF5 file using HDF5 and reads its contents. Note that, though the file name is .hdf5, it contains compressed data! Accepted compression types are ‘gz’, ‘bz2’, ‘’
- bob.bio.base.utils.io.save_compressed(data, filename, compression_type='bz2', create_link=False)[source]¶
Saves the data to a temporary file using HDF5. Afterwards, the file is compressed using the given compression method and saved using the given file name. Note that, though the file name will be .hdf5, it will contain compressed data! Accepted compression types are ‘gz’, ‘bz2’, ‘’
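A round-trip sketch; the file name is a placeholder:

import numpy as np
from bob.bio.base.utils.io import load_compressed, save_compressed

data = np.arange(10, dtype=float)

# the file keeps the .hdf5 name but actually holds bz2-compressed content
save_compressed(data, "features.hdf5", compression_type="bz2")
restored = load_compressed("features.hdf5", compression_type="bz2")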