Python API for bob.pad.base¶
Generic functions¶
Tools to run PAD experiments¶
Command line generation¶
bob.pad.base.tools.command_line_parser (…) |
Creates an argparse.ArgumentParser object that includes the minimum set of command line options (which is not so few). |
bob.pad.base.tools.initialize (parsers, …) |
Parses the command line and arranges the arguments accordingly. |
bob.pad.base.tools.command_line (cmdline) -> str |
Converts the given options to a string that can be executed in a terminal. |
bob.pad.base.tools.write_info (args, …) |
Writes information about the current experimental setup into a file specified on command line. |
bob.pad.base.tools.FileSelector |
This class provides shortcuts for selecting different files for different stages of the anti-spoofing process. |
Algorithm¶
bob.pad.base.tools.train_projector (…[, …]) |
Trains the feature projector using extracted features of the 'train' group, if the algorithm requires projector training. |
bob.pad.base.tools.project (algorithm, extractor) |
Projects the features for all files of the database. |
Scoring¶
bob.pad.base.tools.compute_scores (algorithm, …) |
Computes the scores for the given groups. |
Details¶
-
bob.pad.base.
padfile_to_label
(padfile)[source]¶ Returns an integer representing the label of the current sample.
Parameters: padfile ( bob.pad.base.database.PadFile
) – A PAD file.Returns: True (1) if it is a bona-fide sample, False (0) otherwise. Return type: bool
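The mapping can be sketched as follows; this is an illustrative re-implementation, assuming the PAD file exposes an `attack_type` attribute that is `None` for bona-fide samples (as in `bob.pad.base.database.PadFile`), not necessarily the actual code:

```python
def padfile_to_label(padfile):
    # Bona-fide samples carry no attack type; attack samples do.
    # Returns True (1) for a bona-fide sample, False (0) for an attack.
    return padfile.attack_type is None
```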
-
bob.pad.base.
combinations
(input_dict)[source]¶ Obtain all possible key-value combinations in the input dictionary containing list values.
Parameters:
input_dict
:dict
- Input dictionary with list values.
Returns:
combinations
: [dict
]- A list of dictionaries containing the combinations.
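The behaviour can be sketched with `itertools.product`; a minimal equivalent, not necessarily the actual implementation:

```python
from itertools import product

def combinations(input_dict):
    # Expand a dict whose values are lists into a list of dicts,
    # one per key-value combination (Cartesian product of the values).
    keys = sorted(input_dict)
    return [dict(zip(keys, values))
            for values in product(*(input_dict[k] for k in keys))]
```

For example, `{'cost': [1, 10], 'kernel': ['rbf']}` yields two dictionaries, one per cost value: useful for hyper-parameter grid search.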
-
bob.pad.base.
convert_and_prepare_features
(features)[source]¶ This function converts a list or a frame container of features into a 2D array of features. If the input is a list of frame containers, features from different frame containers (individuals) are concatenated into the same list. This list is then converted to an array. The rows are samples, the columns are features.
Parameters:
features
: [2Dnumpy.ndarray
] or [FrameContainer]- A list of 2D feature arrays or a list of Frame Containers, see
bob.bio.video.utils.FrameContainer
. Each Frame Container contains feature vectors for the particular individual/person.
Returns:
features_array
: 2Dnumpy.ndarray
- An array containing features for all samples and frames.
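For plain arrays (leaving the Frame Container case aside), the conversion amounts to stacking; a minimal numpy sketch under that assumption:

```python
import numpy as np

def convert_and_prepare_features(features):
    # Stack a list of per-sample feature blocks into one 2D array:
    # rows are samples, columns are features.
    return np.vstack([np.atleast_2d(np.asarray(f)) for f in features])
```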
-
bob.pad.base.
convert_array_to_list_of_frame_cont
(data)[source]¶ Convert an input 2D array to a list of FrameContainers.
Parameters:
data
: 2Dnumpy.ndarray
Input data array of the dimensionality (N_samples X N_features ).
Returns:
frame_container_list
: [FrameContainer]- A list of FrameContainers, see
bob.bio.video.utils.FrameContainer
for further details. Each frame container contains one feature vector.
-
bob.pad.base.
convert_frame_cont_to_array
(frame_container)[source]¶ This function converts a single Frame Container into an array of features. The rows are samples, the columns are features.
Parameters:
frame_container
: object- A Frame Container containing the features of an individual,
see
bob.bio.video.utils.FrameContainer
.
Returns:
features_array
: 2Dnumpy.ndarray
- An array containing features for all frames. The rows are samples, the columns are features.
-
bob.pad.base.
convert_list_of_frame_cont_to_array
(frame_containers)[source]¶ This function converts a list of Frame containers into an array of features. Features from different frame containers (individuals) are concatenated into the same list. This list is then converted to an array. The rows are samples, the columns are features.
Parameters:
frame_containers
: [FrameContainer]- A list of Frame Containers, see
bob.bio.video.utils.FrameContainer
. Each Frame Container contains feature vectors for the particular individual/person.
Returns:
features_array
: 2Dnumpy.ndarray
- An array containing features for all frames of all individuals.
-
bob.pad.base.
mean_std_normalize
(features, features_mean=None, features_std=None)[source]¶ The features in the input 2D array are mean-std normalized. The rows are samples, the columns are features. If
features_mean
andfeatures_std
are provided, then these vectors will be used for normalization. Otherwise, the mean and std of the features is computed on the fly.Parameters:
features
: 2Dnumpy.ndarray
- Array of features to be normalized.
features_mean
: 1Dnumpy.ndarray
- Mean of the features. Default: None.
features_std
: 1Dnumpy.ndarray
- Standard deviation of the features. Default: None.
Returns:
features_norm
: 2Dnumpy.ndarray
- Normalized array of features.
features_mean
: 1Dnumpy.ndarray
- Mean of the features.
features_std
: 1Dnumpy.ndarray
- Standard deviation of the features.
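A minimal numpy sketch of the described behaviour (illustrative, not the actual implementation):

```python
import numpy as np

def mean_std_normalize(features, features_mean=None, features_std=None):
    # Normalize each column to zero mean and unit standard deviation.
    # If mean/std vectors are supplied (e.g. computed on the training
    # set), reuse them; otherwise compute them from the input on the fly.
    features = np.asarray(features, dtype=float)
    if features_mean is None or features_std is None:
        features_mean = features.mean(axis=0)
        features_std = features.std(axis=0)
    features_norm = (features - features_mean) / features_std
    return features_norm, features_mean, features_std
```

Passing the training-set mean and std when normalizing dev/eval data keeps all sets on the same scale.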
-
bob.pad.base.
norm_train_cv_data
(real_train, real_cv, attack_train, attack_cv, one_class_flag=False)[source]¶ Mean-std normalization of train and cross-validation data arrays.
Parameters:
real_train
: 2Dnumpy.ndarray
- Subset of train features for the real class.
real_cv
: 2Dnumpy.ndarray
- Subset of cross-validation features for the real class.
attack_train
: 2Dnumpy.ndarray
- Subset of train features for the attack class.
attack_cv
: 2Dnumpy.ndarray
- Subset of cross-validation features for the attack class.
one_class_flag
:bool
- If set to
True
, only positive/real samples will be used to compute the mean and std normalization vectors. Set toTrue
if using one-class SVM. Default: False.
Returns:
real_train_norm
: 2Dnumpy.ndarray
- Normalized subset of train features for the real class.
real_cv_norm
: 2Dnumpy.ndarray
- Normalized subset of cross-validation features for the real class.
attack_train_norm
: 2Dnumpy.ndarray
- Normalized subset of train features for the attack class.
attack_cv_norm
: 2Dnumpy.ndarray
- Normalized subset of cross-validation features for the attack class.
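The logic can be sketched as follows, assuming the normalizers are computed on the concatenated training data (real only when `one_class_flag` is set); an illustrative version, not the actual implementation:

```python
import numpy as np

def norm_train_cv_data(real_train, real_cv, attack_train, attack_cv,
                       one_class_flag=False):
    # Compute mean/std on the training portion: real + attack in the
    # two-class case, real only when one_class_flag is True
    # (e.g. when training a one-class SVM).
    if one_class_flag:
        train = np.asarray(real_train, dtype=float)
    else:
        train = np.vstack([real_train, attack_train]).astype(float)
    mean, std = train.mean(axis=0), train.std(axis=0)

    def norm(x):
        return (np.asarray(x, dtype=float) - mean) / std

    return norm(real_train), norm(real_cv), norm(attack_train), norm(attack_cv)
```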
-
bob.pad.base.
norm_train_data
(real, attack)[source]¶ Mean-std normalization of input data arrays. The mean and std normalizers are computed using real class only.
Parameters:
real
: 2Dnumpy.ndarray
- Training features for the real class.
attack
: 2Dnumpy.ndarray
- Training features for the attack class.
Returns:
real_norm
: 2Dnumpy.ndarray
- Mean-std normalized training features for the real class.
attack_norm
: 2Dnumpy.ndarray
- Mean-std normalized training features for the attack class.
. features_mean
: 1Dnumpy.ndarray
- Mean of the features.
features_std
: 1Dnumpy.ndarray
- Standard deviation of the features.
-
bob.pad.base.
prepare_data_for_hyper_param_grid_search
(training_features, n_samples)[source]¶ This function converts a list of all training features returned by
read_features
method of the extractor to the subsampled train and cross-validation arrays for both real and attack classes.Parameters:
training_features
: [[FrameContainer], [FrameContainer]]- A list containing two elements: [0] - a list of Frame Containers with feature vectors for the real class; [1] - a list of Frame Containers with feature vectors for the attack class.
n_samples
:int
- Number of uniformly selected feature vectors per class.
Returns:
real_train
: 2Dnumpy.ndarray
- Selected subset of train features for the real class. The number of samples in this set is n_samples/2, which is defined by split_data_to_train_cv method of this class.
real_cv
: 2Dnumpy.ndarray
- Selected subset of cross-validation features for the real class. The number of samples in this set is n_samples/2, which is defined by split_data_to_train_cv method of this class.
attack_train
: 2Dnumpy.ndarray
- Selected subset of train features for the attack class. The number of samples in this set is n_samples/2, which is defined by split_data_to_train_cv method of this class.
attack_cv
: 2Dnumpy.ndarray
- Selected subset of cross-validation features for the attack class. The number of samples in this set is n_samples/2, which is defined by split_data_to_train_cv method of this class.
-
bob.pad.base.
select_quasi_uniform_data_subset
(features, n_samples)[source]¶ Quasi-uniformly select N samples/feature vectors from the input array of samples. The rows in the input array are samples, the columns are features. Use this function when n_samples is close to the total number of samples.
Parameters:
features
: 2Dnumpy.ndarray
- Input array with feature vectors. The rows are samples, columns are features.
n_samples
:int
- The number of samples to be selected uniformly from the input array of features.
Returns:
features_subset
: 2Dnumpy.ndarray
- Selected subset of features.
-
bob.pad.base.
select_uniform_data_subset
(features, n_samples)[source]¶ Uniformly select N samples/feature vectors from the input array of samples. The rows in the input array are samples. The columns are features.
Parameters:
features
: 2Dnumpy.ndarray
- Input array with feature vectors. The rows are samples, columns are features.
n_samples
:int
- The number of samples to be selected uniformly from the input array of features.
Returns:
features_subset
: 2Dnumpy.ndarray
- Selected subset of features.
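Uniform selection can be sketched with evenly spaced row indices; an illustrative numpy version, not the actual implementation:

```python
import numpy as np

def select_uniform_data_subset(features, n_samples):
    # Pick n_samples rows at (approximately) evenly spaced positions
    # across the input array of samples.
    features = np.asarray(features)
    idx = np.linspace(0, features.shape[0] - 1, n_samples).astype(int)
    return features[idx]
```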
-
bob.pad.base.
split_data_to_train_cv
(features)[source]¶ This function splits the input array of features into two subsets, namely train and cross-validation. These subsets can be used to tune the hyper-parameters of the SVM. The split is 50/50: the first half of the samples in the input is selected as the train set, and the second half as the cross-validation set.
Parameters:
features
: 2Dnumpy.ndarray
- Input array with feature vectors. The rows are samples, columns are features.
Returns:
features_train
: 2Dnumpy.ndarray
- Selected subset of train features.
features_cv
: 2Dnumpy.ndarray
- Selected subset of cross-validation features.
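The described 50/50 split can be sketched in a few lines of numpy (illustrative, not the actual implementation):

```python
import numpy as np

def split_data_to_train_cv(features):
    # 50/50 split: the first half of the rows becomes the train set,
    # the second half the cross-validation set.
    features = np.asarray(features)
    half = features.shape[0] // 2
    return features[:half], features[half:]
```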
-
class
bob.pad.base.tools.
FileSelector
(decorated)[source]¶ This class provides shortcuts for selecting different files for different stages of the anti-spoofing process.
It communicates with the database and provides lists of file names for all steps of the tool chain.
Parameters:
- database :
bob.pad.base.database.PadDatabase
or derived. - The database object that provides the list of files.
- preprocessed_directory : str
- The directory, where preprocessed data should be written to.
- extractor_file : str
- The filename, where the extractor should be written to (if any).
- extracted_directory : str
- The directory, where extracted features should be written to.
- projector_file : str
- The filename, where the projector should be written to (if any).
- projected_directory : str
- The directory, where projected features should be written to (if required).
- score_directories : (str, str)
- The directories, where score files for no-norm should be written to.
- default_extension : str
- The default extension of all intermediate files.
- compressed_extension : str
- The extension for writing compressed score files. By default, no compression is performed.
- database :
-
class
bob.pad.base.tools.
PadDatabase
(name, protocol='Default', original_directory=None, original_extension=None, **kwargs)¶ Bases:
bob.bio.base.database.BioDatabase
This class represents the basic API for database access. Please use this class as a base class for your database access classes. Do not forget to call the constructor of this base class in your derived class.
Parameters:
name : str A unique name for the database.
protocol : str or
None
The name of the protocol that defines the default experimental setup for this database.original_directory : str The directory where the original data of the database are stored.
original_extension : str The file name extension of the original data.
kwargs :
key=value
pairs The arguments of thebob.bio.base.database.BioDatabase
base class constructor.-
all_files
(groups=('train', 'dev', 'eval'), flat=False)[source]¶ Returns all files of the database, respecting the current protocol. The files can be limited using the
all_files_options
in the constructor.Parameters: Returns: files – The sorted and unique list of all files of the database.
Return type:
-
annotations
(file)[source]¶ Returns the annotations for the given File object, if available. You need to override this method in your high-level implementation. If your database does not have annotations, it should return
None
.Parameters:
- file :
bob.pad.base.database.PadFile
- The file for which annotations should be returned.
Returns:
- annots : dict or None
- The annotations for the file, if available.
- file :
-
model_ids_with_protocol
(groups = None, protocol = None, **kwargs) → ids[source]¶ Client-based PAD is not implemented.
-
objects
(groups=None, protocol=None, purposes=None, model_ids=None, **kwargs)[source]¶ This function returns lists of File objects, which fulfill the given restrictions.
Keyword parameters:
- groups : str or [str]
- The groups of which the clients should be returned. Usually, groups are one or more elements of (‘train’, ‘dev’, ‘eval’)
- protocol
- The protocol for which the clients should be retrieved. The protocol is dependent on your database. If you do not have protocols defined, just ignore this field.
- purposes : str or [str]
- The purposes for which File objects should be retrieved. Usually it is either ‘real’ or ‘attack’.
- model_ids : [various type]
- This parameter is not supported in PAD databases yet.
-
original_file_names
(files) → paths[source]¶ Returns the full paths of the real and attack data of the given PadFile objects.
Parameters:
- files : [[
bob.pad.base.database.PadFile
], [bob.pad.base.database.PadFile
]] - The list of lists ([real, attack]) of file objects to retrieve the original data file names for.
Returns:
- paths : [str] or [[str]]
- The paths extracted for the concatenated real+attack files, in the preserved order.
- files : [[
-
training_files
(step = None, arrange_by_client = False) → files[source]¶ Returns all training File objects. This function needs to be implemented in derived class implementations.
- Parameters:
- The parameters are not applicable in this version of anti-spoofing experiments.
Returns:
- files : [
bob.pad.base.database.PadFile
] or [[bob.pad.base.database.PadFile
]] - The (arranged) list of files used for the training.
-
-
bob.pad.base.tools.
command_line
(cmdline) → str[source]¶ Converts the given options to a string that can be executed in a terminal. Parameters are enclosed into
'...'
quotes so that the command line can interpret them (e.g., if they contain spaces or special characters).Parameters:
- cmdline : [str]
- A list of command line options to be converted into a string.
Returns:
- str : str
- The command line string that can be copy-pasted into the terminal.
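The quoting behaviour can be approximated with `shlex.quote`; a minimal sketch (the actual implementation may quote slightly differently):

```python
import shlex

def command_line(cmdline):
    # Quote each option so the shell treats spaces and special
    # characters literally, then join into one executable string.
    return " ".join(shlex.quote(str(option)) for option in cmdline)
```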
-
bob.pad.base.tools.
command_line_parser
(description=__doc__, exclude_resources_from=[]) → parsers[source]¶ Creates an
argparse.ArgumentParser
object that includes the minimum set of command line options (which is not so few). Thedescription
can be overwritten, but has a (small) default.Included in the parser, several groups are defined. Each group specifies a set of command line options. For the configurations, registered resources are listed, which can be limited by the
exclude_resources_from
list of extensions.It returns a dictionary, containing the parser object itself (in the
'main'
keyword), and a list of command line groups.Parameters:
- description : str
- The documentation of the script.
- exclude_resources_from : [str]
- A list of extension packages, for which resources should not be listed.
Returns:
- parsers : dict
- A dictionary of parser groups, with the main parser under the ‘main’ key. Feel free to add more options to any of the parser groups.
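The returned structure can be sketched with `argparse` argument groups; the group names and options below are illustrative only, not the actual option set:

```python
import argparse

def command_line_parser(description="Run a PAD experiment",
                        exclude_resources_from=[]):
    # Build the main parser plus named argument groups, and return the
    # parser under the 'main' key so callers can add further options.
    parser = argparse.ArgumentParser(description=description)
    config = parser.add_argument_group("Configuration")
    config.add_argument("--database", help="The database to run on.")
    dirs = parser.add_argument_group("Directories")
    dirs.add_argument("--result-directory", help="Where to write results.")
    return {"main": parser, "config": config, "dirs": dirs}
```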
-
bob.pad.base.tools.
compute_scores
(algorithm, force=False, groups=['dev', 'eval'], allow_missing_files=False, write_compressed=False)[source]¶ Computes the scores for the given groups.
This function computes all scores for the experiment and writes them to score files. By default, scores are computed for both groups
'dev'
and'eval'
.Parameters:
- algorithm : py:class:bob.bio.base.algorithm.Algorithm or derived
- The algorithm, used for computing the scores and writing them to file.
- force : bool
- If given, files are regenerated, even if they already exist.
- groups : some of
('dev', 'eval')
- The list of groups, for which scores should be computed.
- write_compressed : bool
- If enabled, score files are compressed as
.tar.bz2
files.
-
bob.pad.base.tools.
extract
(extractor, preprocessor, groups=None, indices=None, allow_missing_files=False, force=False)[source]¶ Extracts features from the preprocessed data using the given extractor.
The given
extractor
is used to extract all features required for the current experiment. It writes the extracted data into the directory specified by thebob.pad.base.tools.FileSelector
. By default, if target files already exist, they are not re-created.The preprocessor is only used to load the data in a coherent way.
Parameters:
- extractor : py:class:bob.bio.base.extractor.Extractor or derived
- The extractor, used for extracting and writing the features.
- preprocessor : py:class:bob.bio.base.preprocessor.Preprocessor or derived
- The preprocessor, used for reading the preprocessed data.
- groups : some of
('train', 'dev', 'eval')
orNone
- The list of groups, for which the data should be extracted.
- indices : (int, int) or None
- If specified, only the features for the given index range
range(begin, end)
should be extracted. This is usually given, when parallel threads are executed. - allow_missing_files : bool
- If set to
True
, preprocessed data files that are not found are silently ignored. - force : bool
- If given, files are regenerated, even if they already exist.
-
bob.pad.base.tools.
groups
(args) → groups[source]¶ Returns the groups, for which the files must be preprocessed, and features must be extracted and projected. This function should be used in order to eliminate the training files (the
'train'
group), when no training is required in this experiment.Parameters:
- args : namespace
- The interpreted command line arguments as returned by the
initialize()
function.
Returns:
- groups : [str]
- A list of groups, for which data needs to be treated.
-
bob.pad.base.tools.
initialize
(parsers, command_line_parameters = None, skips = []) → args[source]¶ Parses the command line and arranges the arguments accordingly. Afterward, it loads the resources for the database, preprocessor, extractor, algorithm and grid (if specified), and stores the results into the returned args.
This function also initializes the
FileSelector
instance by arranging the directories and files according to the command line parameters.If the
skips
are given, an ‘--execute-only’ parameter is added to the parser, with which the steps to execute can be selected.Parameters:
- parsers : dict
- The dictionary of command line parsers, as returned from
command_line_parser()
. Additional arguments might have been added. - command_line_parameters : [str] or None
- The command line parameters that should be interpreted. By default, the parameters specified by the user on command line are considered.
- skips : [str]
- A list of possible
--skip-...
options to be added and evaluated automatically.
Returns:
- args : namespace
- A namespace of arguments as read from the command line.
Note
The database, preprocessor, extractor, algorithm and grid (if specified) are actual instances of the according classes.
-
bob.pad.base.tools.
preprocess
(preprocessor, groups=None, indices=None, allow_missing_files=False, force=False)[source]¶ Preprocesses the original data of the database with the given preprocessor.
The given
preprocessor
is used to preprocess all data required for the current experiment. It writes the preprocessed data into the directory specified by thebob.pad.base.tools.FileSelector
. By default, if target files already exist, they are not re-created.Parameters:
- preprocessor : py:class:bob.bio.base.preprocessor.Preprocessor or derived
- The preprocessor, which should be applied to all data.
- groups : some of
('train', 'dev', 'eval')
orNone
- The list of groups, for which the data should be preprocessed.
- indices : (int, int) or None
- If specified, only the data for the given index range
range(begin, end)
should be preprocessed. This is usually given, when parallel threads are executed. - allow_missing_files : bool
- If set to
True
, files for which the preprocessor returnsNone
are silently ignored. - force : bool
- If given, files are regenerated, even if they already exist.
-
bob.pad.base.tools.
project
(algorithm, extractor, groups=None, indices=None, allow_missing_files=False, force=False)[source]¶ Projects the features for all files of the database.
The given
algorithm
is used to project all features required for the current experiment. It writes the projected data into the directory specified by thebob.pad.base.tools.FileSelector
. By default, if target files already exist, they are not re-created.The extractor is only used to load the data in a coherent way.
Parameters:
- algorithm : py:class:bob.pad.base.algorithm.Algorithm or derived
- The algorithm, used for projecting features and writing them to file.
- extractor : py:class:bob.bio.base.extractor.Extractor or derived
- The extractor, used for reading the extracted features, which should be projected.
- groups : some of
('train', 'dev', 'eval')
orNone
- The list of groups, for which the data should be projected.
- indices : (int, int) or None
- If specified, only the features for the given index range
range(begin, end)
should be projected. This is usually given, when parallel threads are executed. - force : bool
- If given, files are regenerated, even if they already exist.
-
bob.pad.base.tools.
read_features
(file_names, extractor, split_by_client = False) → extracted[source]¶ Reads the extracted features from
file_names
using the givenextractor
. Ifsplit_by_client
is set toTrue
, it is assumed that thefile_names
are already sorted by client.Parameters:
- file_names : [str] or [[str]]
- A list of names of files to be read.
If
split_by_client = True
, file names are supposed to be split into groups. - extractor : py:class:bob.bio.base.extractor.Extractor or derived
- The extractor, used for reading the extracted features.
- split_by_client : bool
- Indicates if the given
file_names
are split into groups. - allow_missing_files : bool
- If set to
True
, extracted files that are not found are silently ignored.
Returns:
- extracted : [object] or [[object]]
- The list of extracted features, in the same order as in the
file_names
.
-
bob.pad.base.tools.
read_preprocessed_data
(file_names, preprocessor, split_by_client = False) → preprocessed[source]¶ Reads the preprocessed data from
file_names
using the given preprocessor. Ifsplit_by_client
is set toTrue
, it is assumed that thefile_names
are already sorted by client.Parameters:
- file_names : [str] or [[str]]
- A list of names of files to be read.
If
split_by_client = True
, file names are supposed to be split into groups. - preprocessor : py:class:bob.bio.base.preprocessor.Preprocessor or derived
- The preprocessor, which can read the preprocessed data.
- split_by_client : bool
- Indicates if the given
file_names
are split into groups. - allow_missing_files : bool
- If set to
True
, preprocessed data files that are not found are silently ignored.
Returns:
- preprocessed : [object] or [[object]]
- The list of preprocessed data, in the same order as in the
file_names
.
-
bob.pad.base.tools.
train_extractor
(extractor, preprocessor, allow_missing_files=False, force=False)[source]¶ Trains the feature extractor using preprocessed data of the
'train'
group, if the feature extractor requires training.This function should only be called, when the
extractor
actually requires training. The givenextractor
is trained using preprocessed data. It writes the extractor to the file specified by thebob.pad.base.tools.FileSelector
. By default, if the target file already exist, it is not re-created.Parameters:
- extractor : py:class:bob.bio.base.extractor.Extractor or derived
- The extractor to be trained.
- preprocessor : py:class:bob.bio.base.preprocessor.Preprocessor or derived
- The preprocessor, used for reading the preprocessed data.
- allow_missing_files : bool
- If set to
True
, preprocessed data files that are not found are silently ignored during training. - force : bool
- If given, the extractor file is regenerated, even if it already exists.
-
bob.pad.base.tools.
train_projector
(algorithm, extractor, allow_missing_files=False, force=False)[source]¶ Trains the feature projector using extracted features of the
'train'
group, if the algorithm requires projector training.This function should only be called, when the
algorithm
actually requires projector training. The projector of the givenalgorithm
is trained using extracted features. It writes the projector to the file specified by thebob.pad.base.tools.FileSelector
. By default, if the target file already exist, it is not re-created.Parameters:
- algorithm : py:class:bob.pad.base.algorithm.Algorithm or derived
- The algorithm, in which the projector should be trained.
- extractor : py:class:bob.bio.base.extractor.Extractor or derived
- The extractor, used for reading the training data.
- force : bool
- If given, the projector file is regenerated, even if it already exists.
-
bob.pad.base.tools.
write_info
(args, command_line_parameters, executable)[source]¶ Writes information about the current experimental setup into a file specified on command line.
Parameters:
- args : namespace
- The interpreted command line arguments as returned by the
initialize()
function. - command_line_parameters : [str] or
None
- The command line parameters that have been interpreted.
If
None
, the parameters specified by the user on command line are considered. - executable : str
- The name of the executable (such as
'./bin/spoof.py'
) that is used to run the experiments.