Python API

The YouTube Faces database protocol interface. Please refer to http://www.cs.tau.ac.il/~wolf/ytfaces for information on how to obtain a copy of the original data.

Note

Errata have been published for the database. These errata are not (yet) considered in the protocols.

The YouTube database consists of 10 different splits, which are called “folds” here (to be consistent with the LFW database). In the original protocol, each fold uses 9/10 of the database for training and the remaining 1/10 for evaluation. In this implementation of the YouTube protocols, up to 7/10 of the data are used for training (groups='world'), 2/10 are used for development (to estimate a threshold; groups='dev'), and the last 1/10 is used to evaluate the system (groups='eval').
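The 7/2/1 partition described above can be sketched in a few lines. The cyclic assignment of fold numbers to groups used here is purely an assumption for illustration; the actual assignment is stored in the database and may differ. `split_folds` and `n_world_folds` are hypothetical names that are not part of this package.

```python
def split_folds(protocol, n_world_folds=7):
    """Return the fold numbers per group under an ASSUMED cyclic layout."""
    k = int(protocol[len('fold'):])  # 'fold3' -> 3
    if not 1 <= k <= 10:
        raise ValueError("unknown protocol: %r" % (protocol,))
    if not 1 <= n_world_folds <= 7:
        raise ValueError("between 1 and 7 folds can be used for training")
    # Rotate the fold numbers 1..10 so that fold k comes last.
    order = [(k + i) % 10 + 1 for i in range(10)]
    return {
        'world': order[:n_world_folds],  # up to 7/10 for training
        'dev': order[7:9],               # 2/10 for threshold estimation
        'eval': order[9:],               # 1/10 for the final evaluation
    }


folds = split_folds('fold3')
```

Whatever the real layout is, the property that matters is the one shown here: for a given protocol, the training, development, and evaluation folds never overlap.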

To compute recognition results, please execute experiments on all 10 protocols (protocol='fold1' ... protocol='fold10') and average the resulting classification results (cf. http://vis-www.cs.umass.edu/lfw for details on scoring).
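As a minimal sketch of the averaging step, the following helper computes the mean over the per-fold results together with the standard error of the mean. `summarize_folds` is a hypothetical name, and the numbers in the usage line are placeholders rather than measured results.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_folds(accuracies):
    """Return (mean, standard error of the mean) of per-fold accuracies."""
    if len(accuracies) < 2:
        raise ValueError("at least two folds are needed")
    average = mean(accuracies)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    standard_error = stdev(accuracies) / sqrt(len(accuracies))
    return average, standard_error


# Placeholder values only: one accuracy per protocol 'fold1' ... 'fold10'.
average, standard_error = summarize_folds([0.5] * 10)
```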

The design of this implementation differs slightly from the one at http://www.cs.tau.ac.il/~wolf/ytfaces. Originally, the creators of the YouTube database provide only lists of image pairs. To be consistent with other Bob databases, these lists are split here into files to be enrolled and probe files. The file to be enrolled is always the first item of a pair, while the second item is used as the probe.
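The conversion from a pair list to enrollment and probe files can be illustrated with plain Python. This is a sketch of the idea only, not the code used by the database; `split_pairs` and the item names are hypothetical.

```python
def split_pairs(pairs):
    """Group the second item of each pair (probe) under the first (enrolled)."""
    probes_per_model = {}
    for first, second in pairs:
        # The first item is enrolled as a model, the second one probes it.
        probes_per_model.setdefault(first, []).append(second)
    return probes_per_model


# Made-up identifiers standing in for the items of a pair list.
pair_list = [('video_a', 'video_b'), ('video_a', 'video_c'), ('video_d', 'video_b')]
probes_per_model = split_pairs(pair_list)
```

Iterating over the resulting models and scoring each one against its own probes visits exactly the comparisons of the original pair list.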

Note

When querying probe files, please always query probe files for a specific model id: objects(..., purposes = 'probe', model_ids = (model_id,)). In this case, you will follow the default protocols given by the database.
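The recommended query pattern might look as follows. The helper only uses `model_ids` and `objects` with the keyword arguments documented on this page; `probes_by_model` itself is a hypothetical name, and `_FakeDatabase` is a stand-in with made-up ids so that the sketch runs without the real package installed.

```python
def probes_by_model(db, protocol, group):
    """Query the probe files separately for each model id."""
    probes = {}
    for model_id in db.model_ids(protocol=protocol, groups=group):
        probes[model_id] = db.objects(
            protocol=protocol, groups=group,
            purposes='probe', model_ids=(model_id,))
    return probes


class _FakeDatabase(object):
    """Stand-in for the real Database class; ids and probes are made up."""
    _probes = {1: ['probe_a'], 2: ['probe_b', 'probe_c']}

    def model_ids(self, protocol=None, groups=None):
        return sorted(self._probes)

    def objects(self, protocol=None, model_ids=None, groups=None, purposes=None):
        return [p for m in model_ids for p in self._probes[m]]


result = probes_by_model(_FakeDatabase(), 'fold1', 'dev')
```

With the real package, an instance of bob.db.youtube.Database would take the place of `_FakeDatabase()`.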

When querying training files with objects(..., groups='world'), the world_type parameter selects the training configuration. With world_type='restricted', only the files that are used in the training pairs are returned (the “image restricted configuration”). With world_type='unrestricted', which is the default in the signature of objects, all files of the training identities are returned (the “unrestricted configuration”; cf. README on http://vis-www.cs.umass.edu/lfw).

If you want to stick to the original protocol and use only the pairs for training and testing, feel free to query the pairs function.

Note

The pairs that are provided using the pairs function, and the files provided by the objects function (see note above) correspond to the identical model/probe pairs. Hence, either of the two approaches should give the same recognition results.

bob.db.youtube.get_config()[source]

Returns a string containing the configuration information.

class bob.db.youtube.Client(id, name)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Information about the clients (identities) of the YouTube Faces database.

id
metadata = MetaData(bind=None)
name
class bob.db.youtube.Database(original_directory=None, original_extension='/*.jpg', annotation_extension='.labeled_faces.txt')[source]

Bases: bob.db.verification.utils.database.SQLiteDatabase

The dataset class opens and maintains a connection to the database.

It provides many different ways to probe for the characteristics of the data and for the data itself inside the database.

Keyword parameters

original_directory : str
The directory where the original images (and annotations) can be found
original_extension : str
The filename filter to find the original images in the database; rarely changed
annotation_extension : str
The filename extension of the annotation files; rarely changed
all_files(**kwargs)[source]

Returns the list of all File objects that satisfy your query. For possible keyword arguments, please check the objects() function.

annotations(directory, image_names=None)[source]

Returns the annotations for the given Directory object as a dictionary of dictionaries, e.g. {'1.56.jpg' : {'topleft':(y,x), 'bottomright':(y,x)}, '1.57.jpg' : {'topleft':(y,x), 'bottomright':(y,x)}, ...}. Here, the key of the outer dictionary is the file name of the original image.

Keyword parameters:

directory
The Directory object for which you want to retrieve the annotations
image_names
If given, only the annotations for the given image names (without path, but including filename extension) are extracted and returned
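To illustrate the returned structure, this sketch derives the height and width of each bounding box from a dictionary shaped like the one described above. The coordinate values are made up, and `box_sizes` is a hypothetical helper that is not part of the package.

```python
def box_sizes(annotations):
    """Map each image name to the (height, width) of its bounding box."""
    sizes = {}
    for image_name, box in annotations.items():
        # Positions are stored as (y, x), so the row comes first.
        top, left = box['topleft']
        bottom, right = box['bottomright']
        sizes[image_name] = (bottom - top, right - left)
    return sizes


# Made-up coordinates in the documented layout.
example = {
    '1.56.jpg': {'topleft': (10, 20), 'bottomright': (110, 100)},
    '1.57.jpg': {'topleft': (12, 22), 'bottomright': (112, 102)},
}
sizes = box_sizes(example)
```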
assert_validity()

Raise a RuntimeError if the database back-end is not available.

check_parameter_for_validity(parameter, parameter_description, valid_parameters, default_parameter=None)[source]

Checks the given parameter for validity, i.e., if it is contained in the set of valid parameters. If the parameter is ‘None’ or empty, the default_parameter will be returned, in case it is specified, otherwise a ValueError will be raised.

This function returns the parameter after the check, or raises a ValueError.

Keyword parameters:

parameter : str
The single parameter to be checked. Might be a string or None.
parameter_description : str
A short description of the parameter. This will be used to raise an exception in case the parameter is not valid.
valid_parameters : [str]
A list/tuple of valid values for the parameters.
default_parameter : str or None
The default parameter that will be returned in case parameter is None or empty. If omitted and parameter is empty, a ValueError is raised.
check_parameters_for_validity(parameters, parameter_description, valid_parameters, default_parameters=None)[source]

Checks the given parameters for validity, i.e., if they are contained in the set of valid parameters. It also assures that the parameters form a tuple or a list. If parameters is ‘None’ or empty, the default_parameters will be returned (if default_parameters is omitted, all valid_parameters are returned).

This function will return a tuple or list of parameters, or raise a ValueError.

Keyword parameters:

parameters : str, [str] or None
The parameters to be checked. Might be a string, a list/tuple of strings, or None.
parameter_description : str
A short description of the parameter. This will be used to raise an exception in case the parameter is not valid.
valid_parameters : [str]
A list/tuple of valid values for the parameters.
default_parameters : [str] or None
The list/tuple of default parameters that will be returned in case parameters is None or empty. If omitted, all valid_parameters are used.
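The behavior described for the two validity checks can be summarized in a short re-implementation. This is a sketch written from the descriptions above, not the code of the base class; the function names with the `_sketch` suffix and the error messages are assumptions.

```python
def check_parameter_sketch(parameter, parameter_description,
                           valid_parameters, default_parameter=None):
    """Single value: fall back to the default, or fail if there is none."""
    if not parameter:  # None or empty
        if default_parameter is None:
            raise ValueError("The %s must be one of %s."
                             % (parameter_description, valid_parameters))
        return default_parameter
    if parameter not in valid_parameters:
        raise ValueError("Invalid %s %r; valid values are %s."
                         % (parameter_description, parameter, valid_parameters))
    return parameter


def check_parameters_sketch(parameters, parameter_description,
                            valid_parameters, default_parameters=None):
    """Several values: always returns a tuple or list."""
    if not parameters:  # None or empty
        # Without explicit defaults, all valid parameters are returned.
        if default_parameters is not None:
            return default_parameters
        return valid_parameters
    if isinstance(parameters, str):
        parameters = (parameters,)  # a single string becomes a 1-tuple
    for parameter in parameters:
        if parameter not in valid_parameters:
            raise ValueError("Invalid %s %r; valid values are %s."
                             % (parameter_description, parameter, valid_parameters))
    return parameters


GROUPS = ('world', 'dev', 'eval')
single = check_parameter_sketch(None, 'group', GROUPS, 'dev')
several = check_parameters_sketch('eval', 'group', GROUPS)
```

The main difference between the two is the fallback: the single-value check needs an explicit default, whereas the multi-value check falls back to all valid values.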
clients(protocol=None, groups=None, subworld='sevenfolds', world_type='unrestricted')[source]

Returns a list of Client objects for the specific query by the user.

Keyword Parameters:

protocol
The protocol to consider; one of: (‘fold1’, ..., ‘fold10’), or None
groups
The groups to which the clients belong; one or several of: (‘world’, ‘dev’, ‘eval’)
subworld
The subset of the training data. Has to be specified if groups includes ‘world’ and protocol is one of ‘fold1’, ..., ‘fold10’. It might be exactly one of (‘onefolds’, ‘twofolds’, ..., ‘sevenfolds’). Ignored for group ‘dev’ and ‘eval’.
world_type
One of (‘restricted’, ‘unrestricted’). Ignored.

Returns: A list containing all Client objects which have the desired properties.

enroll_files(protocol=None, model_id=None, groups='dev', **kwargs)[source]

Returns the list of enrollment File objects from the given model id of the given protocol for the given groups that satisfy your query. If the model_id is None (the default), enrollment files for all models are returned. For possible keyword arguments, please check the objects() function.

file_names(files, directory, extension)[source]

This function returns the list of original file names for the given list of File objects.

Keyword parameters:

files : [File]
The list of File objects for which the file names should be retrieved
directory : str
The base directory where the files are stored
extension : str
The file name extension of the files
Return value : [str]
The file names for the given File objects, in the same order.
files(ids, preserve_order=True)

Returns a list of File objects with the given file ids

Keyword Parameters:

ids : [various type]
The ids of the object in the database table “file”. This object should be a python iterable (such as a tuple or list).
preserve_order : bool
If True (the default) the order of elements is preserved, but the execution time increases.

Returns a list (that may be empty) of File objects.

get_client_id_from_file_id(file_id, **kwargs)[source]

Returns the client_id (real client id) attached to the given file_id

Keyword Parameters:

file_id
The file_id to consider

Returns: The client_id attached to the given file_id

get_client_id_from_model_id(model_id, **kwargs)[source]

Returns the client_id (real client id) attached to the given model id

Keyword Parameters:

model_id
The model to consider

Returns: The client_id attached to the given model

groups()[source]

Returns the groups, which are available in the database.

is_valid()

Returns True if a valid session has been opened for reading the database.

model_ids(protocol=None, groups=None)[source]

Returns a list of model ids for the specific query by the user. For the ‘dev’ and ‘eval’ groups, the first element of each pair is extracted.

Keyword Parameters:

protocol
The protocol to consider; one of: (‘fold1’, ..., ‘fold10’), or None
groups
The groups to which the clients belong; one or several of: (‘dev’, ‘eval’)

Returns: A list containing all model ids which have the desired properties.

models(protocol=None, groups=None)[source]

Returns a list of Directory objects (there are multiple models per client) for the specific query by the user. For the ‘dev’ and ‘eval’ groups, the first element of each pair is extracted.

Keyword Parameters:

protocol
The protocol to consider; one of: (‘fold1’, ..., ‘fold10’), or None
groups
The groups to which the clients belong; one or several of: (‘dev’, ‘eval’)

Returns: A list containing all Directory objects which have the desired properties.

objects(protocol=None, model_ids=None, groups=None, purposes=None, subworld='sevenfolds', world_type='unrestricted')[source]

Returns a list of Directory objects for the specific query by the user.

Keyword Parameters:

protocol
The protocol to consider (‘fold1’, ..., ‘fold10’), or None
groups
The groups to which the objects belong (‘world’, ‘dev’, ‘eval’)
purposes
The purposes of the objects (‘enroll’, ‘probe’)
subworld
The subset of the training data. Has to be specified if groups includes ‘world’ and protocol is one of ‘fold1’, ..., ‘fold10’. It might be exactly one of (‘onefolds’, ‘twofolds’, ..., ‘sevenfolds’).
world_type
One of (‘restricted’, ‘unrestricted’). If ‘restricted’, only the files that are used in one of the training pairs are used. For ‘unrestricted’, all files of the training people are returned.
model_ids
Only retrieves the objects for the provided list of model ids. If ‘None’ is given (this is the default), no filter over the model_ids is performed. Note that the combination of ‘world’ group and ‘model_ids’ should be avoided.

Returns: A list of Directory objects considering all the filtering criteria.

original_file_name(directory, check_existence=None)[source]

Returns the list of original image names for the given directory, sorted by frame number. In contrast to other Bob databases, a list of file names is returned here.

Keyword arguments:

directory : bob.db.youtube.Directory
The Directory object to retrieve the list of file names for
check_existence : bool
Shall the existence of the files be checked?
original_file_names(files, check_existence=True)[source]

This function returns the list of original file names for the given list of File objects.

Keyword parameters:

files : [File]
The list of File objects for which the file names should be retrieved
check_existence : bool
Whether to check that the original files exist.
Return value : [str]
The original file names for the given File objects, in the same order.
pairs(protocol=None, groups=None, classes=None, subworld='sevenfolds')[source]

Returns a list of Pair objects for the specific query by the user.

Keyword Parameters:

protocol
The protocol to consider (‘fold1’, ..., ‘fold10’)
groups
The groups to which the objects belong (‘world’, ‘dev’, ‘eval’)
classes
The classes to which the pairs belong (‘matched’, ‘unmatched’)
subworld
The subset of the training data. Has to be specified if groups includes ‘world’ and protocol is one of ‘fold1’, ..., ‘fold10’. It might be exactly one of (‘onefolds’, ‘twofolds’, ..., ‘sevenfolds’).

Returns: A list of Pair objects considering all the filtering criteria.
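When working with the pairs directly, a typical first step is to separate matched from unmatched pairs. The sketch below relies only on the `is_match` attribute listed for the Pair class; `FakePair` is a stand-in with made-up values so that the example runs without the database, and `split_by_class` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in exposing three of the attributes listed for the Pair class.
FakePair = namedtuple('FakePair', ['enroll_directory', 'probe_directory', 'is_match'])


def split_by_class(pairs):
    """Separate a list of pairs into (matched, unmatched)."""
    matched = [pair for pair in pairs if pair.is_match]
    unmatched = [pair for pair in pairs if not pair.is_match]
    return matched, unmatched


# Made-up pairs for illustration.
pairs = [
    FakePair('dir_a', 'dir_b', True),
    FakePair('dir_a', 'dir_c', False),
    FakePair('dir_d', 'dir_e', True),
]
matched, unmatched = split_by_class(pairs)
```

The same separation can presumably be obtained from the database itself by passing the classes argument to the pairs function.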

paths(ids, prefix=None, suffix=None, preserve_order=True)

Returns the full file paths for the given file ids, optionally with a directory prefix and an extension suffix.

Keyword Parameters:

ids : [various type]
The ids of the object in the database table “file”. This object should be a python iterable (such as a tuple or list).
prefix : str or None
The bit of path to be prepended to the filename stem
suffix : str or None
The extension determines the suffix that will be appended to the filename stem.
preserve_order : bool
If True (the default) the order of elements is preserved, but the execution time increases.

Returns a list (that may be empty) of the fully constructed paths given the file ids.

probe_files(protocol=None, model_id=None, groups='dev', **kwargs)[source]

Returns the list of probe File objects to probe the model with the given model id of the given protocol for the given groups that satisfy your query. If the model_id is None (the default), all possible probe files are returned. For possible keyword arguments, please check the objects() function.

protocol_names()[source]

Returns the names of the valid protocols.

provides_file_set_for_protocol(protocol=None)[source]

Returns True if the given protocol specifies file sets for probes, instead of a single probe file. In this default implementation, False is returned, throughout. If you need different behavior, please overload this function in your derived class.

query(*args)

Creates a query to the database using the given arguments.

reverse(paths, preserve_order=True)

Reverses the lookup: from certain paths, return a list of File objects

Keyword Parameters:

paths : [str]
The filename stems to query for. This object should be a python iterable (such as a tuple or list)
preserve_order : bool
If True (the default) the order of elements is preserved, but the execution time increases.

Returns a list (that may be empty).

subworld_names(protocol=None)[source]

Returns all valid sub-worlds for the ‘fold1’, ..., ‘fold10’ protocols.

test_files(protocol=None, groups='dev', **kwargs)[source]

Returns the list of all test File objects of the given groups that satisfy your query. Test objects are all File objects that serve either for enrollment or probing. For possible keyword arguments, please check the objects() function.

tmodel_ids(protocol, groups=None)[source]

Returns a list of T-Norm model ids that can be used for ZT-Norm. These ids are taken from two other splits of the data, specifically the last two of the training splits. Hence, to keep the training data independent of the ZT-Norm data, use at most subworld='fivefolds' in the world query.

Keyword Parameters:

protocol
The protocol to consider; one of: (‘fold1’, ..., ‘fold10’), or None
groups
Ignored.

Returns: A list containing all T-Norm model ids which have the desired properties.

tmodels(protocol=None, groups=None)[source]

Returns a list of T-Norm models that can be used for ZT-Norm. These models are taken from two other splits of the data, specifically the last two of the training splits. Hence, to keep the training data independent of the ZT-Norm data, use at most subworld='fivefolds' in the world query.

Keyword Parameters:

protocol
The protocol to consider; one of: (‘fold1’, ..., ‘fold10’), or None
groups
Ignored.

Returns: A list containing all Directory objects which have the desired properties.

tobjects(protocol, model_ids=None, groups=None)[source]

Returns a set of filenames for enrolling T-norm models for score normalization.

Keyword Parameters:

protocol
The protocol to consider (‘fold1’, ..., ‘fold10’), or None
model_ids
Only retrieves the files for the provided list of model ids. If ‘None’ is given (this is the default), no filter over the model_ids is performed.
groups
Ignored.

Returns: A set of Directory objects with the given properties.

training_files(protocol=None, **kwargs)[source]

Returns the list of all training (world) File objects that satisfy your query. For possible keyword arguments, please check the objects() function.

uniquify(file_list)[source]

Sorts the given list of File objects and removes duplicates from it.

Keyword parameters:

file_list : [File]
A list of File objects to be handled. Also other objects can be handled, as long as they are sortable.
Returns
A sorted copy of the given file_list with the duplicates removed.
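The effect of this function can be sketched with built-in types. The real method operates on File objects; integers stand in for them here, and `uniquify_sketch` is a hypothetical name. The sketch assumes that the items are hashable as well as sortable.

```python
def uniquify_sketch(items):
    """Return a sorted copy of items without duplicates."""
    # set() drops the duplicates, sorted() returns a new ordered list.
    return sorted(set(items))


original = [3, 1, 3, 2, 1]
unique = uniquify_sketch(original)
```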
world_types()[source]

Returns the valid types of worlds: (‘restricted’, ‘unrestricted’).

zobjects(protocol, model_ids=None, groups=None)[source]

Returns a set of filenames for Z-norm probing for score normalization.

Keyword Parameters:

protocol
The protocol to consider (‘fold1’, ..., ‘fold10’), or None
model_ids
Only retrieves the files for the provided list of model ids. If ‘None’ is given (this is the default), no filter over the model_ids is performed.
groups
Ignored.

Returns: A set of Directory objects with the given properties.

class bob.db.youtube.Directory(file_id, client_id, path)[source]

Bases: sqlalchemy.ext.declarative.api.Base, bob.db.verification.utils.file.File

Information about the directories of the YouTube Faces database.

client
client_id
id
make_path(directory=None, extension=None)

Wraps the current path so that a complete path is formed

Keyword parameters:

directory : str or None
An optional directory name that will be prefixed to the returned result.
extension : str or None
An optional extension that will be suffixed to the returned filename. The extension normally includes the leading . character as in .jpg or .hdf5.

Returns a string containing the newly generated file path.
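The path construction described above amounts to prefixing a directory and appending an extension. The following stand-alone function mirrors that behavior under the assumption that the stored path is a relative stem; `make_path_sketch` and the example path are hypothetical.

```python
import os


def make_path_sketch(path, directory=None, extension=None):
    """Prefix an optional directory and append an optional extension."""
    # An empty first component leaves the stem unchanged in os.path.join.
    return os.path.join(directory or '', path + (extension or ''))


stem = 'some_client/0'  # made-up relative path stem
plain = make_path_sketch(stem)
full = make_path_sketch(stem, directory='features', extension='.hdf5')
```

The extension is appended verbatim, which is why it normally includes the leading dot.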

metadata = MetaData(bind=None)
path
save(data, directory=None, extension='.hdf5', create_directories=True)

Saves the input data at the specified location and using the given extension.

Keyword parameters:

data : various types
The data blob to be saved (normally a numpy.ndarray).
directory : str or None
If not empty or None, this directory is prefixed to the final file destination
extension : str or None
The extension of the filename. This extension will control the type of output and the codec for saving the input blob.
create_directories : bool
Should the directory structure be created (if necessary) before writing the data?
shot_id
class bob.db.youtube.Pair(protocol, enroll_id, probe_id, enroll_client_id, probe_client_id, is_match)[source]

Bases: sqlalchemy.ext.declarative.api.Base

Information about the pairs (as given in the pairs.txt files) of the YouTube Faces database.

enroll_client
enroll_client_id
enroll_directory
enroll_directory_id
id
is_match
metadata = MetaData(bind=None)
probe_client
probe_client_id
probe_directory
probe_directory_id
protocol