Python API
The YouTube Faces database protocol interface. Please refer to http://www.cs.tau.ac.il/~wolf/ytfaces for information on how to obtain a copy of the original data.
Note
Errata have been published for the database. These errata are not (yet) considered in the protocols.
The YouTube Faces database consists of 10 different splits, which are called “folds” here (to be consistent with the LFW database).
In the original protocol, 9/10 of the database are used for training and the remaining 1/10 for evaluation in each fold.
This implementation of the YouTube protocols further divides the training portion: up to 7/10 of the data are used for training (groups='world'), 2/10 are used for development (to estimate a threshold; groups='dev'), and the last 1/10 is used to evaluate the system (groups='eval').
To compute recognition results, please run experiments on all 10 protocols (protocol='fold1', ..., protocol='fold10') and average the resulting classification results (cf. http://vis-www.cs.umass.edu/lfw for details on scoring).
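Averaging over the folds can be sketched as follows. As we understand the LFW README, results are reported as the mean classification rate over the ten folds together with the standard error of that mean; please check the README for the authoritative definition. The helper summarize_folds is hypothetical, and its input is whatever per-fold accuracies your own experiments produce.

```python
import math
from statistics import mean, stdev


def summarize_folds(accuracies):
    """Return (mean, standard error of the mean) of per-fold accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need results from at least two folds")
    mu = mean(accuracies)
    # statistics.stdev uses the sample estimate (divides by n - 1).
    standard_error = stdev(accuracies) / math.sqrt(len(accuracies))
    return mu, standard_error
```

Call it with the ten accuracies obtained for protocol='fold1' to protocol='fold10'.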
The design of this implementation differs slightly from the one at http://www.cs.tau.ac.il/~wolf/ytfaces. Originally, the creators of the YouTube Faces database only provide lists of pairs. To be consistent with other Bob databases, these lists are split here into files to be enrolled and probe files: the first item of each pair is always enrolled, while the second item is used as probe.
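A minimal sketch of this enroll/probe split, assuming the pairs are available as plain (enroll_id, probe_id, is_match) tuples instead of Pair objects; the function name split_pairs is hypothetical:

```python
from collections import defaultdict


def split_pairs(pairs):
    """Split (enroll_id, probe_id, is_match) tuples into models and their probes.

    Returns the sorted list of enrollment (model) ids and a dict that maps each
    model id to the probe ids it is compared against, in pair order.
    """
    probes_per_model = defaultdict(list)
    for enroll_id, probe_id, _is_match in pairs:
        # First pair item -> enrolled model; second pair item -> probe.
        probes_per_model[enroll_id].append(probe_id)
    return sorted(probes_per_model), dict(probes_per_model)


models, probes = split_pairs([(1, 2, True), (1, 3, False), (4, 5, True)])
print(models)  # [1, 4]
print(probes)  # {1: [2, 3], 4: [5]}
```

Looking up probes per model id in this way corresponds to querying probe files for one specific model id, as recommended in the note below.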
Note
When querying probe files, always query them for a specific model id: objects(..., purposes='probe', model_ids=(model_id,)).
This way, you follow the default protocols given by the database.
With the default world_type='unrestricted' of the objects() function, querying training files via objects(..., groups='world') returns all files of the training identities, i.e., the “unrestricted configuration” (cf. README on http://vis-www.cs.umass.edu/lfw).
To respect the “image restricted configuration”, query only the files that belong to the training pairs, via objects(..., groups='world', world_type='restricted').
If you want to stick to the original protocol and use only the pairs for training and testing, use the pairs() function instead.
Note
The pairs provided by the pairs() function and the files provided by the objects() function (see the note above) correspond to identical model/probe pairs.
Hence, both approaches should give the same recognition results.
class bob.db.youtube.Client(id, name)
Bases: sqlalchemy.ext.declarative.api.Base
Information about the clients (identities) of the YouTube Faces database.
- id
- metadata = MetaData(bind=None)
- name
class bob.db.youtube.Database(original_directory=None, original_extension='/*.jpg', annotation_extension='.labeled_faces.txt')
Bases: bob.db.verification.utils.database.SQLiteDatabase
This class opens and maintains a connection to the database.
It provides many different ways to query the characteristics of the data and the data itself inside the database.
Keyword parameters:
- original_directory : str
  The directory where the original images (and annotations) can be found.
- original_extension : str
  The filename filter used to find the original images in the database; rarely changed.
- annotation_extension : str
  The filename extension of the annotation files; rarely changed.
- all_files(**kwargs)
  Returns the list of all File objects that satisfy your query. For possible keyword arguments, please check the objects() function.
- annotations(directory, image_names=None)
  Returns the annotations for the given Directory object as a dictionary of dictionaries, e.g. {'1.56.jpg': {'topleft': (y, x), 'bottomright': (y, x)}, '1.57.jpg': {'topleft': (y, x), 'bottomright': (y, x)}, ...}. Here, the keys of the outer dictionary are the image file names (without path, including the extension) of the original images.
  Keyword parameters:
  - directory : Directory
    The Directory object for which you want to retrieve the annotations.
  - image_names
    If given, only the annotations for the given image names (without path, but including the filename extension) are extracted and returned.
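A small sketch of consuming the dictionary returned by annotations(); the helper bounding_boxes and the example coordinates are hypothetical. It relies only on the documented {'topleft': (y, x), 'bottomright': (y, x)} layout and computes height and width as plain coordinate differences, since whether the bottom-right corner is inclusive is not stated here.

```python
def bounding_boxes(annotations):
    """Turn {'name': {'topleft': (y, x), 'bottomright': (y, x)}} into a list of
    (name, top, left, height, width) tuples, keeping the dictionary order."""
    boxes = []
    for name, corners in annotations.items():
        top, left = corners["topleft"]          # (y, x) order, as documented
        bottom, right = corners["bottomright"]
        boxes.append((name, top, left, bottom - top, right - left))
    return boxes


example = {"1.56.jpg": {"topleft": (10, 20), "bottomright": (110, 100)}}
print(bounding_boxes(example))  # [('1.56.jpg', 10, 20, 100, 80)]
```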
- assert_validity()
  Raises a RuntimeError if the database back-end is not available.
- check_parameter_for_validity(parameter, parameter_description, valid_parameters, default_parameter=None)
  Checks the given parameter for validity, i.e., whether it is contained in the set of valid parameters. If the parameter is None or empty, the default_parameter is returned if it is specified; otherwise, a ValueError is raised.
  This function returns the parameter after the check, or raises a ValueError.
  Keyword parameters:
  - parameter : str
    The single parameter to be checked. Might be a string or None.
  - parameter_description : str
    A short description of the parameter. This will be used to raise an exception in case the parameter is not valid.
  - valid_parameters : [str]
    A list/tuple of valid values for the parameter.
  - default_parameter : str or None
    The default parameter that will be returned in case parameter is None or empty. If omitted and parameter is empty, a ValueError is raised.
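The documented behavior of check_parameter_for_validity() can be re-created in a few lines. This is a stand-alone sketch of the described semantics (written as a plain function), not the library's source code:

```python
def check_parameter_for_validity(parameter, parameter_description,
                                 valid_parameters, default_parameter=None):
    """Sketch of the documented semantics for a single parameter."""
    if not parameter:  # None or empty string
        if default_parameter is None:
            raise ValueError(f"no {parameter_description} given and no default available")
        return default_parameter
    if parameter not in valid_parameters:
        raise ValueError(
            f"{parameter_description} {parameter!r} is not one of {tuple(valid_parameters)}")
    return parameter


print(check_parameter_for_validity("fold3", "protocol", ["fold1", "fold2", "fold3"]))  # fold3
print(check_parameter_for_validity(None, "group", ("dev", "eval"), "dev"))  # dev
```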
- check_parameters_for_validity(parameters, parameter_description, valid_parameters, default_parameters=None)
  Checks the given parameters for validity, i.e., whether they are contained in the set of valid parameters. It also ensures that the parameters form a tuple or a list. If parameters is None or empty, the default_parameters are returned (if default_parameters is omitted, all valid_parameters are returned).
  This function returns a tuple or list of parameters, or raises a ValueError.
  Keyword parameters:
  - parameters : str, [str] or None
    The parameters to be checked. Might be a string, a list/tuple of strings, or None.
  - parameter_description : str
    A short description of the parameter. This will be used to raise an exception in case the parameter is not valid.
  - valid_parameters : [str]
    A list/tuple of valid values for the parameters.
  - default_parameters : [str] or None
    The list/tuple of default parameters that will be returned in case parameters is None or empty. If omitted, all valid_parameters are used.
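Likewise, a stand-alone sketch of the documented check_parameters_for_validity() semantics (again not the library's code): a single string is wrapped into a tuple, None or empty input falls back to the defaults, and any invalid entry raises a ValueError.

```python
def check_parameters_for_validity(parameters, parameter_description,
                                  valid_parameters, default_parameters=None):
    """Sketch of the documented semantics for one or several parameters."""
    if not parameters:  # None, empty string or empty list/tuple
        return valid_parameters if default_parameters is None else default_parameters
    if isinstance(parameters, str):
        parameters = (parameters,)  # a single value becomes a 1-tuple
    elif not isinstance(parameters, (list, tuple)):
        parameters = tuple(parameters)  # e.g. a set or generator
    invalid = [p for p in parameters if p not in valid_parameters]
    if invalid:
        raise ValueError(
            f"invalid {parameter_description}: {invalid}; valid values are {tuple(valid_parameters)}")
    return parameters


groups = ("world", "dev", "eval")
print(check_parameters_for_validity("dev", "group", groups))  # ('dev',)
print(check_parameters_for_validity(None, "group", groups))   # ('world', 'dev', 'eval')
```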
- clients(protocol=None, groups=None, subworld='sevenfolds', world_type='unrestricted')
  Returns a list of Client objects for the specific query by the user.
  Keyword parameters:
  - protocol
    The protocol to consider; one of ('fold1', ..., 'fold10'), or None.
  - groups
    The groups to which the clients belong; one or several of ('world', 'dev', 'eval').
  - subworld
    The subset of the training data. Has to be specified if groups includes 'world' and protocol is one of 'fold1', ..., 'fold10'. Must be exactly one of ('onefolds', 'twofolds', ..., 'sevenfolds'). Ignored for the groups 'dev' and 'eval'.
  - world_type
    One of ('restricted', 'unrestricted'). Ignored.
  Returns: A list containing all Client objects that have the desired properties.
- enroll_files(protocol=None, model_id=None, groups='dev', **kwargs)
  Returns the list of enrollment File objects for the given model id of the given protocol and groups that satisfy your query. If model_id is None (the default), enrollment files for all models are returned. For possible keyword arguments, please check the objects() function.
- file_names(files, directory, extension)
  Returns the list of file names for the given list of File objects, built from the given directory and extension.
  Keyword parameters:
  - files : [File]
    The list of File objects for which the file names should be retrieved.
  - directory : str
    The base directory where the files are stored.
  - extension : str
    The file name extension of the files.
  Return value : [str]
    The file names for the given File objects, in the same order.
- files(ids, preserve_order=True)
  Returns a list of File objects with the given file ids.
  Keyword parameters:
  - ids : [various type]
    The ids of the objects in the database table “file”. This object should be a Python iterable (such as a tuple or list).
  - preserve_order : bool
    If True (the default), the order of elements is preserved, but the execution time increases.
  Returns a list (that may be empty) of File objects.
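Why preserve_order costs extra time: a database query (for instance one using an SQL IN clause) returns rows in no guaranteed order unless it is explicitly sorted, so the rows have to be re-sorted to match the requested ids. The sketch below shows one way to do that with a lookup table; rows are represented as plain dictionaries, and reorder_by_ids is a hypothetical name, not part of the API.

```python
def reorder_by_ids(ids, rows, key=lambda row: row["id"]):
    """Return the rows in the order of the requested ids.

    Rows may come back from the database in any order; ids without a matching
    row are skipped, so the result may be shorter than ids (or empty).
    """
    by_id = {key(row): row for row in rows}  # one pass to build a lookup table
    return [by_id[i] for i in ids if i in by_id]


rows = [{"id": 3}, {"id": 1}, {"id": 2}]   # e.g. an unordered query result
print(reorder_by_ids([1, 2, 3], rows))     # [{'id': 1}, {'id': 2}, {'id': 3}]
```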
- get_client_id_from_file_id(file_id, **kwargs)
  Returns the client_id (real client id) attached to the given file_id.
  Keyword parameters:
  - file_id
    The file_id to consider.
  Returns: The client_id attached to the given file_id.
- get_client_id_from_model_id(model_id, **kwargs)
  Returns the client_id (real client id) attached to the given model id.
  Keyword parameters:
  - model_id
    The model to consider.
  Returns: The client_id attached to the given model.
- is_valid()
  Returns whether a valid session has been opened for reading the database.
- model_ids(protocol=None, groups=None)
  Returns a list of model ids for the specific query by the user. For the 'dev' and 'eval' groups, the first element of each pair is extracted.
  Keyword parameters:
  - protocol
    The protocol to consider; one of ('fold1', ..., 'fold10'), or None.
  - groups
    The groups to which the models belong; one or several of ('dev', 'eval'). The 'eval' group does not exist for protocol 'view1'.
  Returns: A list containing all model ids that have the desired properties.
- models(protocol=None, groups=None)
  Returns a list of Directory objects (there are multiple models per client) for the specific query by the user. For the 'dev' and 'eval' groups, the first element of each pair is extracted.
  Keyword parameters:
  - protocol
    The protocol to consider; one of ('fold1', ..., 'fold10'), or None.
  - groups
    The groups to which the models belong; one or several of ('dev', 'eval').
  Returns: A list containing all Directory objects that have the desired properties.
- objects(protocol=None, model_ids=None, groups=None, purposes=None, subworld='sevenfolds', world_type='unrestricted')
  Returns a list of Directory objects for the specific query by the user.
  Keyword parameters:
  - protocol
    The protocol to consider ('fold1', ..., 'fold10'), or None.
  - groups
    The groups to which the objects belong ('world', 'dev', 'eval').
  - purposes
    The purposes of the objects ('enroll', 'probe').
  - subworld
    The subset of the training data. Has to be specified if groups includes 'world' and protocol is one of 'fold1', ..., 'fold10'. Must be exactly one of ('onefolds', 'twofolds', ..., 'sevenfolds').
  - world_type
    One of ('restricted', 'unrestricted'). If 'restricted', only the files that are used in one of the training pairs are returned. If 'unrestricted', all files of the training people are returned.
  - model_ids
    Only retrieves the objects for the provided list of model ids. If None is given (the default), no filtering on the model ids is performed. Note that combining the 'world' group with model_ids should be avoided.
  Returns: A list of Directory objects considering all the filtering criteria.
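The difference between the two world_type values can be illustrated on plain data. In this sketch, training_pairs holds (enroll, probe) directory names, directory_client maps every directory to its client, and the “training people” are taken to be the clients that occur in the training pairs; all of these names and that definition are assumptions for illustration, not the database's internals.

```python
def world_directories(training_pairs, directory_client, world_type="unrestricted"):
    """Select training directories for the two world_type settings.

    training_pairs:   iterable of (enroll_directory, probe_directory) tuples
    directory_client: dict mapping every directory to its client (identity)
    """
    in_pairs = {directory for pair in training_pairs for directory in pair}
    if world_type == "restricted":
        # Only the directories that occur in at least one training pair.
        return sorted(in_pairs)
    if world_type == "unrestricted":
        # All directories of every client that occurs in the training pairs.
        clients = {directory_client[d] for d in in_pairs}
        return sorted(d for d, c in directory_client.items() if c in clients)
    raise ValueError(f"unknown world_type {world_type!r}")


pairs = [("a1", "b1")]
owner = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
print(world_directories(pairs, owner, "restricted"))    # ['a1', 'b1']
print(world_directories(pairs, owner, "unrestricted"))  # ['a1', 'a2', 'b1']
```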
- original_file_name(directory, check_existence=None)
  Returns the list of original image names for the given directory, sorted by frame number. In contrast to other Bob databases, a list of file names is returned here.
  Keyword arguments:
  - directory : bob.db.youtube.Directory
    The Directory object to retrieve the list of file names for.
  - check_existence : bool
    Whether the existence of the files should be checked.
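Sorting by frame number has to be numeric: plain string sorting would place '1.100.jpg' before '1.56.jpg'. The sketch below assumes file names of the form <prefix>.<frame>.jpg, as in the annotation example '1.56.jpg' above; the helper name sort_by_frame_number is hypothetical.

```python
import os


def sort_by_frame_number(file_names):
    """Sort names such as '1.56.jpg' (optionally with a directory) by frame number."""
    def frame_number(name):
        stem = os.path.splitext(os.path.basename(name))[0]  # '1.56.jpg' -> '1.56'
        return int(stem.rsplit(".", 1)[-1])                 # '1.56' -> 56
    return sorted(file_names, key=frame_number)


names = ["1.100.jpg", "1.56.jpg", "1.9.jpg"]
print(sorted(names))                # ['1.100.jpg', '1.56.jpg', '1.9.jpg']
print(sort_by_frame_number(names))  # ['1.9.jpg', '1.56.jpg', '1.100.jpg']
```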
- original_file_names(files, check_existence=True)
  Returns the list of original file names for the given list of File objects.
  Keyword parameters:
  - files : [File]
    The list of File objects for which the file names should be retrieved.
  - check_existence : bool
    Whether to check that the original files exist.
  Return value : [str]
    The original file names for the given File objects, in the same order.
- pairs(protocol=None, groups=None, classes=None, subworld='sevenfolds')
  Queries a list of Pair objects.
  Keyword parameters:
  - protocol
    The protocol to consider ('fold1', ..., 'fold10').
  - groups
    The groups to which the pairs belong ('world', 'dev', 'eval').
  - classes
    The classes to which the pairs belong ('matched', 'unmatched').
  - subworld
    The subset of the training data. Has to be specified if groups includes 'world' and protocol is one of 'fold1', ..., 'fold10'. Must be exactly one of ('onefolds', 'twofolds', ..., 'sevenfolds').
  Returns: A list of Pair objects considering all the filtering criteria.
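Since the 'dev' group is meant for estimating a threshold that is then applied to 'eval', a minimal sketch of that step on scores of matched/unmatched pairs may help. It picks the threshold that maximizes accuracy on the development scores (one common choice; an equal-error-rate threshold is another) and assumes that higher scores mean more similar. The score lists are placeholders for the output of your own system, and the function names are hypothetical.

```python
def accuracy(scores, is_match, threshold):
    """Fraction of pairs classified correctly when accepting score >= threshold."""
    correct = sum((s >= threshold) == m for s, m in zip(scores, is_match))
    return correct / len(scores)


def best_threshold(scores, is_match):
    """Threshold among the observed scores (plus +inf) with the highest accuracy.

    max() keeps the first, i.e. lowest, of several equally good thresholds.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return max(candidates, key=lambda t: accuracy(scores, is_match, t))


# Placeholder scores; in practice they come from your system on the pairs of a fold.
dev_scores, dev_match = [0.1, 0.4, 0.35, 0.8], [False, False, True, True]
eval_scores, eval_match = [0.2, 0.9], [False, True]

threshold = best_threshold(dev_scores, dev_match)  # estimated on 'dev' only
print(threshold, accuracy(eval_scores, eval_match, threshold))  # 0.35 1.0
```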
- paths(ids, prefix=None, suffix=None, preserve_order=True)
  Returns the full file paths for the given file ids, built from the given prefix (directory) and suffix (extension).
  Keyword parameters:
  - ids : [various type]
    The ids of the objects in the database table “file”. This object should be a Python iterable (such as a tuple or list).
  - prefix : str or None
    The bit of path to be prepended to the filename stem.
  - suffix : str or None
    The extension that will be appended to the filename stem.
  - preserve_order : bool
    If True (the default), the order of elements is preserved, but the execution time increases.
  Returns a list (that may be empty) of the fully constructed paths for the given file ids.
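The path construction of paths() (and, per file, Directory.make_path()) roughly amounts to prepending a directory to the stored filename stem and appending an extension. A hypothetical stand-alone sketch of that string handling, without the database lookup (the stem "some_person/0" is made up):

```python
import os


def build_paths(stems, prefix=None, suffix=None):
    """Prepend an optional directory and append an optional extension to each stem."""
    paths = []
    for stem in stems:
        path = os.path.join(prefix, stem) if prefix else stem
        paths.append(path + (suffix or ""))
    return paths


print(build_paths(["some_person/0"], "/data/ytfaces", ".hdf5"))
# ['/data/ytfaces/some_person/0.hdf5'] on POSIX systems
```

With a suffix such as the default original_extension '/*.jpg', the same concatenation would yield a glob pattern over the frames of a directory rather than a single file name.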
- probe_files(protocol=None, model_id=None, groups='dev', **kwargs)
  Returns the list of probe File objects used to probe the model with the given model id of the given protocol and groups that satisfy your query. If model_id is None (the default), all possible probe files are returned. For possible keyword arguments, please check the objects() function.
- provides_file_set_for_protocol(protocol=None)
  Returns True if the given protocol specifies file sets for probes instead of a single probe file. This default implementation always returns False. If you need a different behavior, please override this function in your derived class.
- query(*args)
  Creates a query to the database using the given arguments.
- reverse(paths, preserve_order=True)
  Reverses the lookup: returns the list of File objects for the given paths.
  Keyword parameters:
  - paths : [str]
    The filename stems to query for. This object should be a Python iterable (such as a tuple or list).
  - preserve_order : bool
    If True (the default), the order of elements is preserved, but the execution time increases.
  Returns a list (that may be empty) of File objects.
- test_files(protocol=None, groups='dev', **kwargs)
  Returns the list of all test File objects of the given groups that satisfy your query. Test objects are all File objects that serve either for enrollment or probing. For possible keyword arguments, please check the objects() function.
- tmodel_ids(protocol, groups=None)
  Returns a list of T-Norm model ids that can be used for ZT-Norm. The model ids are taken from two other splits of the data, namely the last two of the training splits. Hence, to get training data that is independent of the ZT-Norm data, use at most subworld='fivefolds' in the world query.
  Keyword parameters:
  - protocol
    The protocol to consider; one of ('fold1', ..., 'fold10'), or None.
  - groups
    Ignored.
  Returns: A list containing all T-Norm model ids that have the desired properties.
- tmodels(protocol=None, groups=None)
  Returns a list of T-Norm models that can be used for ZT-Norm. The models are taken from two other splits of the data, namely the last two of the training splits. Hence, to get training data that is independent of the ZT-Norm data, use at most subworld='fivefolds' in the world query.
  Keyword parameters:
  - protocol
    The protocol to consider; one of ('fold1', ..., 'fold10'), or None.
  - groups
    Ignored.
  Returns: A list containing all Directory objects that have the desired properties.
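The advice to use at most subworld='fivefolds' can be checked with a small sketch. It assumes that a subworld with n folds uses the first n of the seven training splits (1 for 'onefolds', 7 for 'sevenfolds') and that the T-Norm models come from the last two; the split indices and function names are hypothetical.

```python
def subworld_splits(world_splits, num_folds):
    """First num_folds training splits (1 ~ 'onefolds', ..., 7 ~ 'sevenfolds')."""
    if not 1 <= num_folds <= len(world_splits):
        raise ValueError(f"num_folds must be between 1 and {len(world_splits)}")
    return list(world_splits[:num_folds])


def tnorm_splits(world_splits):
    """The last two training splits, from which the T-Norm models are taken."""
    return list(world_splits[-2:])


world = [3, 4, 5, 6, 7, 8, 9]  # hypothetical indices of the 7 training splits
for n in range(1, len(world) + 1):
    shared = sorted(set(subworld_splits(world, n)) & set(tnorm_splits(world)))
    print(n, shared)  # empty for n <= 5; n = 6 shares [8], n = 7 shares [8, 9]
```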
- tobjects(protocol, model_ids=None, groups=None)
  Returns a set of filenames for enrolling T-Norm models for score normalization.
  Keyword parameters:
  - protocol
    The protocol to consider ('fold1', ..., 'fold10'), or None.
  - model_ids
    Only retrieves the files for the provided list of model ids. If None is given (the default), no filtering on the model ids is performed.
  - groups
    Ignored.
  Returns: A set of Directory objects with the given properties.
- training_files(protocol=None, **kwargs)
  Returns the list of all training (world) File objects that satisfy your query. For possible keyword arguments, please check the objects() function.
- uniquify(file_list)
  Sorts the given list of File objects and removes duplicates from it.
  Keyword parameters:
  - file_list : [File]
    A list of File objects to be handled. Other objects can also be handled, as long as they are sortable.
  Returns
    A sorted copy of the given file_list with the duplicates removed.
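Because uniquify() only requires the items to be sortable, a sketch can deduplicate after sorting by comparing neighbors, which also works for unhashable items. This illustrates the documented behavior as a plain function; it is not the library code.

```python
def uniquify(file_list):
    """Return a sorted copy of file_list with duplicates removed."""
    result = []
    for item in sorted(file_list):
        # After sorting, duplicates are adjacent, so comparing with the last kept item suffices.
        if not result or item != result[-1]:
            result.append(item)
    return result


print(uniquify(["b", "a", "b", "c", "a"]))  # ['a', 'b', 'c']
```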
- zobjects(protocol, model_ids=None, groups=None)
  Returns a set of filenames for Z-Norm probing for score normalization.
  Keyword parameters:
  - protocol
    The protocol to consider ('fold1', ..., 'fold10'), or None.
  - model_ids
    Only retrieves the files for the provided list of model ids. If None is given (the default), no filtering on the model ids is performed.
  - groups
    Ignored.
  Returns: A set of Directory objects with the given properties.
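The T-Norm models from tobjects() and the Z-Norm probes from zobjects() feed a score normalization step. The sketch below shows the common ZT-norm formulation: Z-norm standardizes a raw score with the statistics of the model's scores against the Z-Norm probes, T-norm standardizes with the statistics of the T-Norm models' scores against the same probe, and ZT-norm applies Z-norm to everything before the final T-norm. All score lists are hypothetical inputs (each needs at least two values for statistics.stdev); this is not the implementation used by Bob.

```python
from statistics import mean, stdev


def znorm(score, model_vs_zprobes):
    """Standardize with the model's scores against the Z-Norm probes."""
    return (score - mean(model_vs_zprobes)) / stdev(model_vs_zprobes)


def tnorm(score, tmodels_vs_probe):
    """Standardize with the T-Norm models' scores against the same probe."""
    return (score - mean(tmodels_vs_probe)) / stdev(tmodels_vs_probe)


def ztnorm(score, model_vs_zprobes, tmodels_vs_probe, tmodels_vs_zprobes):
    """Z-norm the raw score and each T-Norm score, then T-norm the result.

    tmodels_vs_zprobes[i] holds the scores of T-Norm model i against the Z-Norm
    probes and is used to Z-normalize tmodels_vs_probe[i].
    """
    z_score = znorm(score, model_vs_zprobes)
    z_tscores = [znorm(t, zs) for t, zs in zip(tmodels_vs_probe, tmodels_vs_zprobes)]
    return tnorm(z_score, z_tscores)
```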
class bob.db.youtube.Directory(file_id, client_id, path)
Bases: sqlalchemy.ext.declarative.api.Base, bob.db.verification.utils.file.File
Information about the directories of the YouTube Faces database.
- client
- client_id
- id
- make_path(directory=None, extension=None)
  Wraps the current path so that a complete path is formed.
  Keyword parameters:
  - directory : str or None
    An optional directory name that will be prefixed to the returned result.
  - extension : str or None
    An optional extension that will be suffixed to the returned filename. The extension normally includes the leading . character, as in .jpg or .hdf5.
  Returns a string containing the newly generated file path.
- metadata = MetaData(bind=None)
- path
- save(data, directory=None, extension='.hdf5', create_directories=True)
  Saves the input data at the specified location using the given extension.
  Keyword parameters:
  - data : various types
    The data blob to be saved (normally a numpy.ndarray).
  - directory : str or None
    If not empty or None, this directory is prefixed to the final file destination.
  - extension : str or None
    The extension of the filename. This extension controls the type of output and the codec used for saving the input blob.
  - create_directories : bool
    Whether the directory structure should be created (if necessary) before writing the data.
- shot_id
class bob.db.youtube.Pair(protocol, enroll_id, probe_id, enroll_client_id, probe_client_id, is_match)
Bases: sqlalchemy.ext.declarative.api.Base
Information about the pairs (as given in the original pair lists) of the YouTube Faces database.
- enroll_client
- enroll_client_id
- enroll_directory
- enroll_directory_id
- id
- is_match
- metadata = MetaData(bind=None)
- probe_client
- probe_client_id
- probe_directory
- probe_directory_id
- protocol