Python API for bob.pad.base¶
Generic functions¶
Tools to run PAD experiments¶
Command line generation¶
Creates an argparse.ArgumentParser object that includes the minimum set of command line options.
Parses the command line and arranges the arguments accordingly.
Converts the given options to a string that can be executed in a terminal.
Writes information about the current experimental setup into a file specified on command line.
This class provides shortcuts for selecting different files for different stages of the anti-spoofing process.
Algorithm¶
Trains the feature projector using extracted features of the 'train' group, if the algorithm requires projector training.
Projects the features for all files of the database.
Scoring¶
Computes the scores for the given groups. |
Details¶
-
bob.pad.base.
padfile_to_label
(padfile)[source]¶ Returns an integer representing the label of the current sample.
- Parameters
padfile (
bob.pad.base.database.PadFile
) – A PAD file.- Returns
True (1) if it is a bona-fide sample, False (0) otherwise.
- Return type
int
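A minimal standalone sketch of this mapping (not the library code), assuming a PadFile-like object whose attack_type attribute is None for bona-fide samples; the DummyPadFile class below is a hypothetical stand-in:

```python
class DummyPadFile:
    """Hypothetical stand-in for bob.pad.base.database.PadFile."""
    def __init__(self, attack_type=None):
        # attack_type is assumed None for bona-fide samples, a string for attacks.
        self.attack_type = attack_type

def padfile_to_label(padfile):
    # 1 for bona-fide, 0 for attack.
    return int(padfile.attack_type is None)

bona_fide = padfile_to_label(DummyPadFile())         # 1
attack = padfile_to_label(DummyPadFile("print"))     # 0
```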
-
bob.pad.base.
combinations
(input_dict)[source]¶ Obtain all possible key-value combinations in the input dictionary containing list values.
Parameters:
input_dict
dict
Input dictionary with list values.
Returns:
combinations
[dict
] A list of dictionaries containing the combinations.
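The described behavior can be sketched with itertools.product (a standalone re-implementation, not the library function itself):

```python
import itertools

def combinations(input_dict):
    # Expand the list-valued entries into all possible key-value combinations.
    keys = list(input_dict)
    return [dict(zip(keys, values))
            for values in itertools.product(*(input_dict[k] for k in keys))]

# e.g. a hyper-parameter grid:
grid = combinations({"kernel": ["rbf", "linear"], "C": [1, 10]})
# four dictionaries, one per (kernel, C) pair
```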
-
bob.pad.base.
convert_and_prepare_features
(features, dtype='float64')[source]¶ This function converts a list or a frame container of features into a 2D array of features. If the input is a list of frame containers, features from different frame containers (individuals) are concatenated into the same list. This list is then converted to an array. The rows are samples, the columns are features.
Parameters:
features
[2D numpy.ndarray
] or [FrameContainer] A list of 2D feature arrays or a list of Frame Containers, see
bob.bio.video.utils.FrameContainer
. Each Frame Container contains feature vectors for the particular individual/person.
Returns:
features_array
2D numpy.ndarray
An array containing features for all samples and frames.
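For the list-of-arrays case, the conversion amounts to row-wise stacking; a minimal numpy sketch of that behavior (not the library code, and it skips the Frame Container branch):

```python
import numpy as np

def convert_and_prepare_features(features, dtype="float64"):
    # Concatenate per-item 2D arrays row-wise into one (n_samples, n_features) array.
    return np.vstack(features).astype(dtype)

a = np.zeros((3, 4))
b = np.ones((2, 4))
out = convert_and_prepare_features([a, b])
# out.shape == (5, 4)
```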
-
bob.pad.base.
convert_array_to_list_of_frame_cont
(data)[source]¶ Convert an input 2D array to a list of FrameContainers.
Parameters:
data
2D numpy.ndarray
Input data array of dimensionality (N_samples x N_features).
Returns:
frame_container_list
[FrameContainer]A list of FrameContainers, see
bob.bio.video.utils.FrameContainer
for further details. Each frame container contains one feature vector.
-
bob.pad.base.
convert_frame_cont_to_array
(frame_container)[source]¶ This function converts a single Frame Container into an array of features. The rows are samples, the columns are features.
Parameters:
frame_container
object A Frame Container containing the features of an individual, see
bob.bio.video.utils.FrameContainer
.
Returns:
features_array
2D numpy.ndarray
An array containing features for all frames. The rows are samples, the columns are features.
-
bob.pad.base.
convert_list_of_frame_cont_to_array
(frame_containers)[source]¶ This function converts a list of Frame containers into an array of features. Features from different frame containers (individuals) are concatenated into the same list. This list is then converted to an array. The rows are samples, the columns are features.
Parameters:
frame_containers
[FrameContainer] A list of Frame Containers, see
bob.bio.video.utils.FrameContainer
. Each Frame Container contains feature vectors for the particular individual/person.
Returns:
features_array
2D numpy.ndarray
An array containing features for all frames of all individuals.
-
bob.pad.base.
mean_std_normalize
(features, features_mean=None, features_std=None, copy=True)[source]¶ The features in the input 2D array are mean-std normalized. The rows are samples, the columns are features. If
features_mean
and features_std
are provided, then these vectors will be used for normalization. Otherwise, the mean and std of the features are computed on the fly.Parameters:
features
2D numpy.ndarray
Array of features to be normalized.
features_mean
1D numpy.ndarray
Mean of the features. Default: None.
features_std
1D numpy.ndarray
Standard deviation of the features. Default: None.
Returns:
features_norm
2D numpy.ndarray
Normalized array of features.
features_mean
1D numpy.ndarray
Mean of the features.
features_std
1D numpy.ndarray
Standard deviation of the features.
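A minimal numpy sketch of mean-std normalization as described above (a standalone re-implementation without the copy parameter, not the library function itself):

```python
import numpy as np

def mean_std_normalize(features, features_mean=None, features_std=None):
    features = np.asarray(features, dtype="float64")
    # Compute the normalizers on the fly when they are not provided.
    if features_mean is None:
        features_mean = features.mean(axis=0)
    if features_std is None:
        features_std = features.std(axis=0)
    features_norm = (features - features_mean) / features_std
    return features_norm, features_mean, features_std

X = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
X_norm, mean, std = mean_std_normalize(X)
# the columns of X_norm now have zero mean and unit std
```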
-
bob.pad.base.
norm_train_cv_data
(real_train, real_cv, attack_train, attack_cv, one_class_flag=False)[source]¶ Mean-std normalization of train and cross-validation data arrays.
Parameters:
real_train
2D numpy.ndarray
Subset of train features for the real class.
real_cv
2D numpy.ndarray
Subset of cross-validation features for the real class.
attack_train
2D numpy.ndarray
Subset of train features for the attack class.
attack_cv
2D numpy.ndarray
Subset of cross-validation features for the attack class.
one_class_flag
bool
If set to
True
, only positive/real samples will be used to compute the mean and std normalization vectors. Set to True
if using a one-class SVM. Default: False.
Returns:
real_train_norm
2D numpy.ndarray
Normalized subset of train features for the real class.
real_cv_norm
2D numpy.ndarray
Normalized subset of cross-validation features for the real class.
attack_train_norm
2D numpy.ndarray
Normalized subset of train features for the attack class.
attack_cv_norm
2D numpy.ndarray
Normalized subset of cross-validation features for the attack class.
-
bob.pad.base.
norm_train_data
(real, attack)[source]¶ Mean-std normalization of input data arrays. The mean and std normalizers are computed using real class only.
Parameters:
real
2D numpy.ndarray
Training features for the real class.
attack
2D numpy.ndarray
Training features for the attack class.
Returns:
real_norm
2D numpy.ndarray
Mean-std normalized training features for the real class.
attack_norm
2D numpy.ndarray
Mean-std normalized training features for the attack class, or an empty list if
one_class_flag = True
.features_mean
1D numpy.ndarray
Mean of the features.
features_std
1D numpy.ndarray
Standard deviation of the features.
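The key point above is that the normalizers come from the real class only; a standalone numpy sketch of that behavior (not the library code):

```python
import numpy as np

def norm_train_data(real, attack):
    # Mean and std are computed from the real class only, then applied to both.
    features_mean = real.mean(axis=0)
    features_std = real.std(axis=0)
    real_norm = (real - features_mean) / features_std
    attack_norm = (attack - features_mean) / features_std
    return real_norm, attack_norm, features_mean, features_std

real = np.array([[0.0, 0.0], [2.0, 4.0]])
attack = np.array([[1.0, 2.0]])
real_norm, attack_norm, mean, std = norm_train_data(real, attack)
```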
-
bob.pad.base.
prepare_data_for_hyper_param_grid_search
(training_features, n_samples)[source]¶ This function converts a list of all training features returned by
read_features
method of the extractor to the subsampled train and cross-validation arrays for both real and attack classes.Parameters:
training_features
[[FrameContainer], [FrameContainer]] A list containing two elements: [0] - a list of Frame Containers with feature vectors for the real class; [1] - a list of Frame Containers with feature vectors for the attack class.
n_samples
int
Number of uniformly selected feature vectors per class.
Returns:
real_train
2D numpy.ndarray
Selected subset of train features for the real class. The number of samples in this set is n_samples/2, which is defined by the split_data_to_train_cv method of this class.
real_cv
2D numpy.ndarray
Selected subset of cross-validation features for the real class. The number of samples in this set is n_samples/2, which is defined by the split_data_to_train_cv method of this class.
attack_train
2D numpy.ndarray
Selected subset of train features for the attack class. The number of samples in this set is n_samples/2, which is defined by the split_data_to_train_cv method of this class.
attack_cv
2D numpy.ndarray
Selected subset of cross-validation features for the attack class. The number of samples in this set is n_samples/2, which is defined by the split_data_to_train_cv method of this class.
-
bob.pad.base.
select_quasi_uniform_data_subset
(features, n_samples)[source]¶ Quasi-uniformly selects N samples/feature vectors from the input array of samples. The rows in the input array are samples. The columns are features. Use this function if n_samples is close to the total number of samples.
Parameters:
features
2D numpy.ndarray
Input array with feature vectors. The rows are samples, columns are features.
n_samples
int
The number of samples to be selected uniformly from the input array of features.
Returns:
features_subset
2D numpy.ndarray
Selected subset of features.
-
bob.pad.base.
select_uniform_data_subset
(features, n_samples)[source]¶ Uniformly select N samples/feature vectors from the input array of samples. The rows in the input array are samples. The columns are features.
Parameters:
features
2D numpy.ndarray
Input array with feature vectors. The rows are samples, columns are features.
n_samples
int
The number of samples to be selected uniformly from the input array of features.
Returns:
features_subset
2D numpy.ndarray
Selected subset of features.
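Uniform selection can be sketched as picking evenly spaced row indices; this is one plausible reading of "uniform" here (the library may use a different scheme), implemented standalone in numpy:

```python
import numpy as np

def select_uniform_data_subset(features, n_samples):
    # Evenly spaced row indices from the first to the last sample.
    idx = np.linspace(0, features.shape[0] - 1, n_samples).astype(int)
    return features[idx]

X = np.arange(100).reshape(50, 2)
sub = select_uniform_data_subset(X, 5)
# sub.shape == (5, 2); rows are drawn evenly from start to end
```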
-
bob.pad.base.
split_data_to_train_cv
(features)[source]¶ This function splits the input array of features into two subsets, namely train and cross-validation. These subsets can be used to tune the hyper-parameters of the SVM. The split is 50/50: the first half of the samples in the input is selected as the train set, and the second half as the cross-validation set.
Parameters:
features
2D numpy.ndarray
Input array with feature vectors. The rows are samples, columns are features.
Returns:
features_train
2D numpy.ndarray
Selected subset of train features.
features_cv
2D numpy.ndarray
Selected subset of cross-validation features.
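The 50/50 split described above amounts to slicing the array in half; a minimal standalone sketch (not the library code):

```python
import numpy as np

def split_data_to_train_cv(features):
    # First half of the rows -> train, second half -> cross-validation.
    half = features.shape[0] // 2
    return features[:half], features[half:]

X = np.arange(20).reshape(10, 2)
features_train, features_cv = split_data_to_train_cv(X)
# each subset holds 5 of the 10 samples
```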
-
bob.pad.base.
vstack_features
(reader, paths, same_size=False)[source]¶ Stacks all features in a memory-efficient way.
- Parameters
reader (
collections.Callable
) – The function to load the features. The function should only take one argumentpath
and return loaded features. Usefunctools.partial
to accommodate your reader to this format. The features returned byreader
are expected to have the samenumpy.dtype
and the same shape except for their first dimension. First dimension should correspond to the number of samples.paths (
collections.Iterable
) – An iterable of paths to iterate on. Whatever is inside path is given toreader
so they do not need to be necessarily paths to actual files. Ifsame_size
isTrue
,len(paths)
must be valid.same_size (
bool
, optional) – IfTrue
, it assumes that arrays inside all the paths are the same shape. If you know the features are the same size in all paths, set this toTrue
to improve the performance.
- Returns
The read features with the shape
(n_samples, *features_shape[1:])
.- Return type
numpy.ndarray
Examples
Put simply, this function is equivalent to calling
numpy.vstack(reader(p) for p in paths)
.
>>> import numpy
>>> from bob.io.base import vstack_features
>>> def reader(path):
...     # in each file, there are 5 samples and features are 2 dimensional.
...     return numpy.arange(10).reshape(5, 2)
>>> paths = ['path1', 'path2']
>>> all_features = vstack_features(reader, paths)
>>> numpy.allclose(all_features, numpy.array(
...     [[0, 1],
...      [2, 3],
...      [4, 5],
...      [6, 7],
...      [8, 9],
...      [0, 1],
...      [2, 3],
...      [4, 5],
...      [6, 7],
...      [8, 9]]))
True
>>> all_features_with_more_memory = numpy.vstack(reader(p) for p in paths)
>>> numpy.allclose(all_features, all_features_with_more_memory)
True
You can allocate the array at once to improve the performance if you know that all features in paths have the same shape and you know the total number of the paths:
>>> all_features = vstack_features(reader, paths, same_size=True)
>>> numpy.allclose(all_features, numpy.array(
...     [[0, 1],
...      [2, 3],
...      [4, 5],
...      [6, 7],
...      [8, 9],
...      [0, 1],
...      [2, 3],
...      [4, 5],
...      [6, 7],
...      [8, 9]]))
True
Note
This function runs very slowly. Only use it when RAM is precious.
-
class
bob.pad.base.tools.
FileSelector
(decorated)[source]¶ This class provides shortcuts for selecting different files for different stages of the anti-spoofing process.
It communicates with the database and provides lists of file names for all steps of the tool chain.
Parameters:
- database
bob.pad.base.database.PadDatabase
or derived. The database object that provides the list of files.
- preprocessed_directory : str
The directory, where preprocessed data should be written to.
- extractor_file : str
The filename, where the extractor should be written to (if any).
- extracted_directory : str
The directory, where extracted features should be written to.
- projector_file : str
The filename, where the projector should be written to (if any).
- projected_directory : str
The directory, where projected features should be written to (if required).
- score_directories : (str, str)
The directories, where score files for no-norm should be written to.
- default_extension : str
The default extension of all intermediate files.
- compressed_extension : str
The extension for writing compressed score files. By default, no compression is performed.
-
class
bob.pad.base.tools.
PadDatabase
(name, protocol='Default', original_directory=None, original_extension=None, **kwargs)¶ Bases:
bob.bio.base.database.BioDatabase
This class represents the basic API for database access. Please use this class as a base class for your database access classes. Do not forget to call the constructor of this base class in your derived class.
Parameters:
name : str
A unique name for the database.
protocol : str or
None
The name of the protocol that defines the default experimental setup for this database.
original_directory : str
The directory where the original data of the database are stored.
original_extension : str
The file name extension of the original data.
kwargs :
key=value
pairs. The arguments of the bob.bio.base.database.BioDatabase
base class constructor.-
all_files
(groups=('train', 'dev', 'eval'), flat=False)[source]¶ Returns all files of the database, respecting the current protocol. The files can be limited using the
all_files_options
in the constructor.- Parameters
- Returns
files – The sorted and unique list of all files of the database.
- Return type
-
abstract
annotations
(file)[source]¶ Returns the annotations for the given File object, if available. You need to override this method in your high-level implementation. If your database does not have annotations, it should return
None
.Parameters:
- file
bob.pad.base.database.PadFile
The file for which annotations should be returned.
Returns:
- annots : dict or None
The annotations for the file, if available.
-
model_ids_with_protocol
(groups = None, protocol = None, **kwargs) → ids[source]¶ Client-based PAD is not implemented.
-
abstract
objects
(groups=None, protocol=None, purposes=None, model_ids=None, **kwargs)[source]¶ This function returns lists of File objects, which fulfill the given restrictions.
Keyword parameters:
- groups : str or [str]
The groups of which the clients should be returned. Usually, groups are one or more elements of ('train', 'dev', 'eval').
- protocol
The protocol for which the clients should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.
- purposes : str or [str]
The purposes for which File objects should be retrieved. Usually it is either 'real' or 'attack'.
- model_ids : [various type]
This parameter is not supported in PAD databases yet.
-
original_file_names
(files) → paths[source]¶ Returns the full paths of the real and attack data of the given PadFile objects.
Parameters:
- files : [[
bob.pad.base.database.PadFile
], [bob.pad.base.database.PadFile
]] The list of lists ([real, attack]) of file objects to retrieve the original data file names for.
Returns:
- paths : [str] or [[str]]
The paths extracted for the concatenated real+attack files, in the preserved order.
-
training_files
(step = None, arrange_by_client = False) → files[source]¶ Returns all training File objects. This function needs to be implemented in derived class implementations.
- Parameters:
The parameters are not applicable in this version of anti-spoofing experiments.
Returns:
- files : [
bob.pad.base.database.PadFile
] or [[bob.pad.base.database.PadFile
]] The (arranged) list of files used for the training.
-
-
bob.pad.base.tools.
command_line
(cmdline) → str[source]¶ Converts the given options to a string that can be executed in a terminal. Parameters are enclosed into
'...'
quotes so that the command line can interpret them (e.g., if they contain spaces or special characters).Parameters:
- cmdline : [str]
A list of command line options to be converted into a string.
Returns:
- str : str
The command line string that can be copy-pasted into the terminal.
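A simplified standalone sketch of the quoting described above; it wraps every option in '...' quotes, whereas the real function may quote more selectively:

```python
def command_line(cmdline):
    # Wrap each option in single quotes so spaces and shell
    # special characters survive when the string is pasted into a terminal.
    return " ".join("'%s'" % option for option in cmdline)

line = command_line(["./bin/spoof.py", "--database", "my db"])
# "'./bin/spoof.py' '--database' 'my db'"
```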
-
bob.pad.base.tools.
command_line_parser
(description=__doc__, exclude_resources_from=[]) → parsers[source]¶ Creates an
argparse.ArgumentParser
object that includes the minimum set of command line options (which is not so few). Thedescription
can be overwritten, but has a (small) default.Included in the parser, several groups are defined. Each group specifies a set of command line options. For the configurations, registered resources are listed, which can be limited by the
exclude_resources_from
list of extensions.It returns a dictionary, containing the parser object itself (in the
'main'
keyword), and a list of command line groups.Parameters:
- description : str
The documentation of the script.
- exclude_resources_from : [str]
A list of extension packages, for which resources should not be listed.
Returns:
- parsers : dict
A dictionary of parser groups, with the main parser under the ‘main’ key. Feel free to add more options to any of the parser groups.
-
bob.pad.base.tools.
compute_scores
(algorithm, extractor, force=False, groups=['dev', 'eval'], allow_missing_files=False, write_compressed=False)[source]¶ Computes the scores for the given groups.
This function computes all scores for the experiment and writes them to score files. By default, scores are computed for both groups
'dev'
and'eval'
.Parameters:
- algorithm : bob.pad.base.algorithm.Algorithm or derived
The algorithm, used for computing the scores and writing them to file.
- extractor : bob.bio.base.extractor.Extractor or derived
The extractor, used for reading the extracted features.
- force : bool
If given, files are regenerated, even if they already exist.
- groups : some of
('dev', 'eval')
The list of groups, for which scores should be computed.
- write_compressed : bool
If enabled, score files are compressed as
.tar.bz2
files.
-
bob.pad.base.tools.
extract
(extractor, preprocessor, groups=None, indices=None, allow_missing_files=False, force=False)[source]¶ Extracts features from the preprocessed data using the given extractor.
The given
extractor
is used to extract all features required for the current experiment. It writes the extracted data into the directory specified by thebob.pad.base.tools.FileSelector
. By default, if target files already exist, they are not re-created.The preprocessor is only used to load the data in a coherent way.
Parameters:
- extractor : bob.bio.base.extractor.Extractor or derived
The extractor, used for extracting and writing the features.
- preprocessor : bob.bio.base.preprocessor.Preprocessor or derived
The preprocessor, used for reading the preprocessed data.
- groups : some of
('train', 'dev', 'eval')
or None
The list of groups, for which the data should be extracted.
- indices : (int, int) or None
If specified, only the features for the given index range
range(begin, end)
should be extracted. This is usually given when parallel threads are executed.- allow_missing_files : bool
If set to
True
, preprocessed data files that are not found are silently ignored.- force : bool
If given, files are regenerated, even if they already exist.
-
bob.pad.base.tools.
groups
(args) → groups[source]¶ Returns the groups, for which the files must be preprocessed, and features must be extracted and projected. This function should be used in order to eliminate the training files (the
'train'
group), when no training is required in this experiment.Parameters:
- args : namespace
The interpreted command line arguments as returned by the
initialize()
function.
Returns:
- groups : [str]
A list of groups, for which data needs to be treated.
-
bob.pad.base.tools.
initialize
(parsers, command_line_parameters = None, skips = []) → args[source]¶ Parses the command line and arranges the arguments accordingly. Afterward, it loads the resources for the database, preprocessor, extractor, algorithm and grid (if specified), and stores the results into the returned args.
This function also initializes the
FileSelector
instance by arranging the directories and files according to the command line parameters.If the
skips
are given, a '--execute-only' parameter is added to the parser, and the according skips are selected.Parameters:
- parsers : dict
The dictionary of command line parsers, as returned from
command_line_parser()
. Additional arguments might have been added.- command_line_parameters : [str] or None
The command line parameters that should be interpreted. By default, the parameters specified by the user on command line are considered.
- skips : [str]
A list of possible
--skip-...
options to be added and evaluated automatically.
Returns:
- args : namespace
A namespace of arguments as read from the command line.
Note
The database, preprocessor, extractor, algorithm and grid (if specified) are actual instances of the according classes.
-
bob.pad.base.tools.
preprocess
(preprocessor, groups=None, indices=None, allow_missing_files=False, force=False)[source]¶ Preprocesses the original data of the database with the given preprocessor.
The given
preprocessor
is used to preprocess all data required for the current experiment. It writes the preprocessed data into the directory specified by thebob.pad.base.tools.FileSelector
. By default, if target files already exist, they are not re-created.Parameters:
- preprocessor : bob.bio.base.preprocessor.Preprocessor or derived
The preprocessor, which should be applied to all data.
- groups : some of
('train', 'dev', 'eval')
or None
The list of groups, for which the data should be preprocessed.
- indices : (int, int) or None
If specified, only the data for the given index range
range(begin, end)
should be preprocessed. This is usually given when parallel threads are executed.- allow_missing_files : bool
If set to
True
, files for which the preprocessor returns None
are silently ignored.- force : bool
If given, files are regenerated, even if they already exist.
-
bob.pad.base.tools.
project
(algorithm, extractor, groups=None, indices=None, allow_missing_files=False, force=False)[source]¶ Projects the features for all files of the database.
The given
algorithm
is used to project all features required for the current experiment. It writes the projected data into the directory specified by thebob.pad.base.tools.FileSelector
. By default, if target files already exist, they are not re-created.The extractor is only used to load the data in a coherent way.
Parameters:
- algorithm : bob.pad.base.algorithm.Algorithm or derived
The algorithm, used for projecting features and writing them to file.
- extractor : bob.bio.base.extractor.Extractor or derived
The extractor, used for reading the extracted features, which should be projected.
- groups : some of
('train', 'dev', 'eval')
or None
The list of groups, for which the data should be projected.
- indices : (int, int) or None
If specified, only the features for the given index range
range(begin, end)
should be projected. This is usually given when parallel threads are executed.- force : bool
If given, files are regenerated, even if they already exist.
-
bob.pad.base.tools.
read_features
(file_names, extractor, split_by_client = False) → extracted[source]¶ Reads the extracted features from
file_names
using the givenextractor
. Ifsplit_by_client
is set toTrue
, it is assumed that thefile_names
are already sorted by client.Parameters:
- file_names : [str] or [[str]]
A list of names of files to be read. If
split_by_client = True
, file names are supposed to be split into groups.- extractor : bob.bio.base.extractor.Extractor or derived
The extractor, used for reading the extracted features.
- split_by_client : bool
Indicates if the given
file_names
are split into groups.- allow_missing_files : bool
If set to
True
, extracted files that are not found are silently ignored.
Returns:
- extracted : [object] or [[object]]
The list of extracted features, in the same order as in the
file_names
.
-
bob.pad.base.tools.
read_preprocessed_data
(file_names, preprocessor, split_by_client = False) → preprocessed[source]¶ Reads the preprocessed data from
file_names
using the given preprocessor. Ifsplit_by_client
is set toTrue
, it is assumed that thefile_names
are already sorted by client.Parameters:
- file_names : [str] or [[str]]
A list of names of files to be read. If
split_by_client = True
, file names are supposed to be split into groups.- preprocessor : bob.bio.base.preprocessor.Preprocessor or derived
The preprocessor, which can read the preprocessed data.
- split_by_client : bool
Indicates if the given
file_names
are split into groups.- allow_missing_files : bool
If set to
True
, preprocessed data files that are not found are silently ignored.
Returns:
- preprocessed : [object] or [[object]]
The list of preprocessed data, in the same order as in the
file_names
.
-
bob.pad.base.tools.
train_extractor
(extractor, preprocessor, allow_missing_files=False, force=False)[source]¶ Trains the feature extractor using preprocessed data of the
'train'
group, if the feature extractor requires training.This function should only be called, when the
extractor
actually requires training. The givenextractor
is trained using preprocessed data. It writes the extractor to the file specified by thebob.pad.base.tools.FileSelector
. By default, if the target file already exist, it is not re-created.Parameters:
- extractor : bob.bio.base.extractor.Extractor or derived
The extractor to be trained.
- preprocessor : bob.bio.base.preprocessor.Preprocessor or derived
The preprocessor, used for reading the preprocessed data.
- allow_missing_files : bool
If set to
True
, preprocessed data files that are not found are silently ignored during training.- force : bool
If given, the extractor file is regenerated, even if it already exists.
-
bob.pad.base.tools.
train_projector
(algorithm, extractor, allow_missing_files=False, force=False)[source]¶ Trains the feature projector using extracted features of the
'train'
group, if the algorithm requires projector training.This function should only be called, when the
algorithm
actually requires projector training. The projector of the givenalgorithm
is trained using extracted features. It writes the projector to the file specified by thebob.pad.base.tools.FileSelector
. By default, if the target file already exist, it is not re-created.Parameters:
- algorithm : bob.pad.base.algorithm.Algorithm or derived
The algorithm, in which the projector should be trained.
- extractor : bob.bio.base.extractor.Extractor or derived
The extractor, used for reading the training data.
- force : bool
If given, the projector file is regenerated, even if it already exists.
-
bob.pad.base.tools.
write_info
(args, command_line_parameters, executable)[source]¶ Writes information about the current experimental setup into a file specified on command line.
Parameters:
- args : namespace
The interpreted command line arguments as returned by the
initialize()
function.- command_line_parameters : [str] or
None
The command line parameters that have been interpreted. If
None
, the parameters specified by the user on command line are considered.- executable : str
The name of the executable (such as
'./bin/spoof.py'
) that is used to run the experiments.