High Level Database Interface How-To Guide
The high level database interface (HLDI) is needed to run biometric experiments with non-filelist databases (e.g. when one wants to use an SQL-based database package).
This tutorial explains how to create a high level database interface, using the bob.pad.* frameworks (e.g. bob.pad.face) as an example. The process is similar for the bob.bio frameworks (e.g. bob.bio.face, bob.bio.vein).
The high level database interface is a link between a low level database interface/package (e.g. bob.db.replay) and the corresponding framework used to run biometric experiments (e.g. bob.pad.face). Generally speaking, the low level interface has lots of querying options, which are not always used in the corresponding biometric framework. The high level interface contains only the functionality that is needed to run biometric experiments. This must-have functionality is defined in the corresponding base classes and is discussed next.
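The relationship between the two layers can be illustrated with a minimal, self-contained sketch. Both classes below are hypothetical stand-ins and are not part of bob.db.replay or bob.pad.face; the records and the ``light`` option are invented for illustration. The low level class offers several query options, while the high level class exposes only what an experiment needs.

```python
class LowLevelDatabase:
    """Hypothetical stand-in for a low level database package."""

    def __init__(self):
        # (file name, group, class, lighting) records; invented for illustration
        self._records = [
            ("real_01", "train", "real", "controlled"),
            ("attack_01", "train", "attack", "adverse"),
            ("real_02", "test", "real", "adverse"),
        ]

    def objects(self, groups=None, cls=None, light=None):
        """Rich query interface: every argument narrows the selection."""
        selected = []
        for name, group, kind, lighting in self._records:
            if groups is not None and group not in groups:
                continue
            if cls is not None and kind not in cls:
                continue
            if light is not None and lighting not in light:
                continue
            selected.append(name)
        return selected


class HighLevelDatabase:
    """Hypothetical high level interface: only group-based queries are exposed."""

    def __init__(self):
        self.db = LowLevelDatabase()

    def objects(self, groups=None):
        # Delegate to the low level interface and leave out the options
        # that the experiment pipeline never uses (here: ``cls`` and ``light``)
        return self.db.objects(groups=groups)


train_files = HighLevelDatabase().objects(groups=("train",))
```

Here ``train_files`` holds the two training records, regardless of class or lighting.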
The first thing you need to do is create a *.py file containing your high level implementation, for example bob/pad/face/database/replay.py for the Replay database. This file must be placed into the corresponding biometric framework, which in this case is the bob.pad.face package. The file must contain the implementation of two classes:
- <YourDatabaseName><Bio/Pad/Other>File
- <YourDatabaseName><Bio/Pad/Other>Database
For example, the names of the above classes for the Replay database used in the bob.pad.face framework are ReplayPadFile and ReplayPadDatabase.
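The naming convention can be written down as a small helper. The function ``hldi_class_names`` is hypothetical and exists only to illustrate the pattern; the frameworks do not require or provide it.

```python
def hldi_class_names(database_name, framework_tag):
    """Return the (File, Database) class names of a high level interface.

    ``database_name`` is e.g. "Replay"; ``framework_tag`` is e.g. "Pad" or "Bio".
    """
    prefix = database_name + framework_tag
    return prefix + "File", prefix + "Database"


replay_names = hldi_class_names("Replay", "Pad")
biowave_names = hldi_class_names("BiowaveV1", "Bio")
```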
Implementation of the *File class
First of all, the *File class must inherit from the base file class of the corresponding biometric framework. For example:
- *File class for the Replay database used in PAD (Presentation Attack Detection) experiments: class ReplayPadFile(PadFile):
- *File class for the Biowave V1 database used in verification experiments: class BiowaveV1BioFile(BioFile):
The base class defines the elements that must be set in the derived class. For example, the implementation of the ReplayPadFile class must set the following elements of the base class: client_id, path, attack_type and file_id. The corresponding high level implementation of the ReplayPadFile class might look as follows:
import bob.bio.video
from bob.pad.base.database import PadFile


class ReplayPadFile(PadFile):

    def __init__(self, f):
        # ``f`` is an instance of the File class defined in the low level
        # database interface; it is kept for later use in load()
        self.f = f
        # Real accesses carry no attack type, everything else is an attack
        if f.is_real():
            attack_type = None
        else:
            attack_type = 'attack'
        super(ReplayPadFile, self).__init__(client_id=f.client, path=f.path,
                                            attack_type=attack_type, file_id=f.id)

    def load(self, directory=None, extension='.mov'):
        # Full path to the video file, resolved by the low level File object
        path = self.f.make_path(directory=directory, extension=extension)
        # Read all frames of the video
        frame_selector = bob.bio.video.FrameSelector(selection_style='all')
        video_data = frame_selector(path)
        # Face bounding boxes, taken from the same low level File object
        bbx_data = self.f.bbx(directory=directory)
        return_dictionary = {}
        return_dictionary["data"] = video_data
        return_dictionary["annotations"] = bbx_data
        return return_dictionary
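The constructor logic can be tried out without any bob package installed. The sketch below is an illustration under stated assumptions: ``PadFileStandIn`` and ``LowLevelFile`` are hypothetical stand-ins that only mimic the attributes used above, and the file records are invented.

```python
class PadFileStandIn:
    """Hypothetical stand-in for the base class; it only stores the four elements."""

    def __init__(self, client_id, path, attack_type=None, file_id=None):
        self.client_id = client_id
        self.path = path
        self.attack_type = attack_type
        self.file_id = file_id


class LowLevelFile:
    """Hypothetical stand-in for a File object of the low level interface."""

    def __init__(self, file_id, client, path, real):
        self.id = file_id
        self.client = client
        self.path = path
        self._real = real

    def is_real(self):
        return self._real


class DemoPadFile(PadFileStandIn):
    def __init__(self, f):
        self.f = f  # keep the low level object for later use
        # Real accesses get attack_type None, attacks get the label 'attack'
        attack_type = None if f.is_real() else "attack"
        super().__init__(client_id=f.client, path=f.path,
                         attack_type=attack_type, file_id=f.id)


real = DemoPadFile(LowLevelFile(1, "client_3", "train/real/clip_a", real=True))
attack = DemoPadFile(LowLevelFile(2, "client_3", "train/attack/clip_b", real=False))
```

After construction, ``real.attack_type`` is None and ``attack.attack_type`` is 'attack', while the remaining elements are copied from the low level object.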
Note that in our case ReplayPadFile also has a load() method. The load() method of the high level *File class is used by the preprocessor (the very first block in every biometric pipeline) to read the data from the database. Not all high level database interfaces require this method, but let’s try to understand why the ReplayPadFile class has it. The need for this method comes from the fact that the Replay database contains video files, not images. To understand why the load() method is needed for a video-based database, we need to take a look at the inheritance structure of the class. For the ReplayPadFile class it looks as follows:
ReplayPadFile -> bob.pad.base.database.PadFile -> bob.bio.base.database.BioFile -> bob.db.base.File
Here the notation A -> B means A inherits from B. The inheritance is fairly deep, but there is no need to worry about this. The class of interest for us is bob.db.base.File, which contains the default file managing methods that may be overridden if necessary. One of these methods is load(), which does not support video files by default. Since a different behavior is desired, we need to override it in the high level implementation of the *File class, ReplayPadFile in this case.
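The overriding mechanism itself is plain Python inheritance. The following sketch uses hypothetical stand-in classes (``BaseFile`` and ``VideoFile`` are not bob classes, and no real file is read) to show how a derived class replaces the default loader while reusing the inherited path handling.

```python
class BaseFile:
    """Hypothetical stand-in for a base file class with a default loader."""

    def __init__(self, path):
        self.path = path

    def make_path(self, directory=None, extension=None):
        # Join the optional directory, the stem and the optional extension
        prefix = directory.rstrip("/") + "/" if directory else ""
        return prefix + self.path + (extension or "")

    def load(self, directory=None, extension=".png"):
        # Default behavior: pretend to read a single image
        return "image:" + self.make_path(directory, extension)


class VideoFile(BaseFile):
    """Derived class that replaces the default loader."""

    def load(self, directory=None, extension=".mov"):
        # Overridden behavior: pretend to read a list of frames
        full_path = self.make_path(directory, extension)
        return ["frame %d of %s" % (i, full_path) for i in range(3)]


default = BaseFile("client01/session01").load(directory="/data")
frames = VideoFile("client01/session01").load(directory="/data")
```

Calling load() on a ``VideoFile`` instance runs the overridden method, so the caller receives a list of frames instead of a single image.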
In this example the load() method returns a dictionary that contains the video frames and the annotations defining the face bounding box in each frame. The preprocessor has to be “ready to deal” with that type of input. With this, we are done configuring the high level implementation of the *File class.
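What “ready to deal” means can be sketched with a toy preprocessor step. Everything here is an assumption made for illustration: ``crop_faces`` is a hypothetical function, frames are nested lists of pixel rows, and each annotation is a (top, left, height, width) tuple. The actual data and annotation formats depend on the database and the preprocessor in use.

```python
def crop_faces(loaded):
    """Hypothetical preprocessor step consuming the dictionary returned by load()."""
    frames = loaded["data"]
    boxes = loaded["annotations"]
    cropped = []
    for frame, (top, left, height, width) in zip(frames, boxes):
        # Keep only the rows and columns inside the bounding box
        rows = frame[top:top + height]
        cropped.append([row[left:left + width] for row in rows])
    return cropped


# Two tiny 3x4 "frames" with one bounding box per frame
loaded = {
    "data": [
        [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
        [[11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]],
    ],
    "annotations": [(0, 1, 2, 2), (1, 0, 2, 3)],
}
faces = crop_faces(loaded)
```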
Implementation of the *Database class
The second unit to be implemented in the HLDI is the *Database class. First of all, the *Database class must inherit from the base database class of the corresponding biometric framework. For example:
- *Database class for the Replay database used in PAD (Presentation Attack Detection) experiments: class ReplayPadDatabase(PadDatabase):
- *Database class for the Biowave V1 database used in verification experiments: class BiowaveV1BioDatabase(BioDatabase):
Let’s consider an example of the ReplayPadDatabase class. The implementation might look as follows, but don’t dive into the code yet:
from bob.pad.base.database import PadDatabase
# Low level database interface; assumed here to be the bob.db.replay package
from bob.db.replay import Database as LowLevelDatabase


class ReplayPadDatabase(PadDatabase):

    def __init__(
            self,
            all_files_options=None,
            check_original_files_for_existence=False,
            original_directory=None,
            original_extension=None,
            # 'grandtest' is the name of the default protocol for this database
            protocol='grandtest',
            **kwargs):
        # Avoid sharing one mutable default dictionary between instances
        if all_files_options is None:
            all_files_options = {}
        self.db = LowLevelDatabase()
        # Since the high level API expects different group names than what the low
        # level API offers, you need to convert them when necessary
        self.low_level_group_names = ('train', 'devel', 'test')  # group names in the low-level database interface
        self.high_level_group_names = ('train', 'dev', 'eval')  # names are expected to be like that in objects() function
        super(ReplayPadDatabase, self).__init__(
            'replay',
            all_files_options,
            check_original_files_for_existence,
            original_directory,
            original_extension,
            protocol,
            **kwargs)

    def objects(self, groups=None, protocol=None, purposes=None, model_ids=None, **kwargs):
        # Convert group names to low-level group names here.
        groups = self.convert_names_to_lowlevel(groups, self.low_level_group_names, self.high_level_group_names)
        files = self.db.objects(protocol=protocol, groups=groups, cls=purposes, **kwargs)
        # Wrap every low level file object into the high level *File class
        files = [ReplayPadFile(f) for f in files]
        return files

    def annotations(self, file):
        """
        Do nothing. In this particular implementation the annotations are returned in the *File class above.
        """
        return None
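The group name conversion used in objects() can be illustrated in isolation. The helper ``to_low_level`` below is a hypothetical re-implementation of the idea, written for this sketch; it is not the code of convert_names_to_lowlevel(), whose exact behavior may differ.

```python
def to_low_level(groups, low_level_names, high_level_names):
    """Translate high level group names to their low level counterparts.

    Names are matched by position in the two tuples. ``None`` is passed
    through so that "no filter" keeps its meaning.
    """
    if groups is None:
        return None
    mapping = dict(zip(high_level_names, low_level_names))
    if isinstance(groups, str):
        return mapping.get(groups)
    return [mapping[g] for g in groups if g in mapping]


low = ("train", "devel", "test")
high = ("train", "dev", "eval")
converted = to_low_level(("dev", "eval"), low, high)
```

With the two tuples from the example above, the high level names 'dev' and 'eval' map to the low level names 'devel' and 'test'.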
Instead, let’s try to understand why the implementation looks like this. Again, the methods to be implemented are defined by the corresponding base class of our *Database class.
In the case of the PAD *Database the inheritance structure is as follows:
ReplayPadDatabase -> bob.pad.base.database.PadDatabase -> bob.bio.base.database.BioDatabase -> bob.db.base.Database
For a verification database, such as the Biowave V1 example above, the inheritance would be:
BiowaveV1BioDatabase -> bob.bio.base.database.BioDatabase -> bob.db.base.Database
For other biometric experiments it might look different.
In the given example the behavior of the ReplayPadDatabase class is defined by the bob.pad.base.database.PadDatabase base class, which states that two methods must be implemented in the high level database implementation: objects() and annotations(). The objects() method returns a list of instances of the ReplayPadFile class. The annotations() method does nothing here, since the developer of the code decided to return the annotations in the *File class. Note: you are not obliged to do it that way; it’s just a matter of taste.
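The idea of a base class dictating which methods a derived class must provide can be sketched with Python's abc module. This is an illustration only: ``DatabaseBase`` and the two derived classes are hypothetical, and the sketch does not claim that the bob base classes enforce the requirement in exactly this way.

```python
from abc import ABC, abstractmethod


class DatabaseBase(ABC):
    """Hypothetical stand-in for a base database class."""

    @abstractmethod
    def objects(self, groups=None, protocol=None, purposes=None, **kwargs):
        """Return a list of file objects."""

    @abstractmethod
    def annotations(self, file):
        """Return the annotations of ``file`` or None."""


class IncompleteDatabase(DatabaseBase):
    # Only one of the two required methods is implemented
    def objects(self, groups=None, protocol=None, purposes=None, **kwargs):
        return []


class CompleteDatabase(IncompleteDatabase):
    def annotations(self, file):
        # Nothing to return: annotations are delivered elsewhere
        return None


try:
    IncompleteDatabase()
    error = None
except TypeError as exc:
    # Instantiation fails while an abstract method is still missing
    error = str(exc)

database = CompleteDatabase()
```

Instantiating ``IncompleteDatabase`` raises a TypeError, whereas ``CompleteDatabase`` can be created because both methods are defined.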
At this point, with all the necessary classes in place, we are done with the implementation of the high level database interface!
Just a few small things remain to be done to register our high level interface in the corresponding biometric framework.
- First, import your class in the __init__.py file located in the folder containing the implementation of the HLDI: from .replay import ReplayPadDatabase
- Next, create an instance of the *Database class with the default configuration. For example, for the ReplayPadDatabase class used in the bob.pad.face framework, the default configuration file /bob/pad/face/config/database/replay.py is as follows:
from bob.pad.face.database import ReplayPadDatabase

# The original_directory is taken from the .bob_bio_databases.txt file located in your home directory
original_directory = "[YOUR_REPLAY_ATTACK_DIRECTORY]"
original_extension = ".mov"  # extension of the data files

database = ReplayPadDatabase(
    protocol='grandtest',
    original_directory=original_directory,
    original_extension=original_extension,
    training_depends_on_protocol=True,
)
- Finally, in the setup.py file of the corresponding biometric framework, add the entry pointing to your default configuration. For the PAD example considered here, the code is:
# fragment of the setup(...) call in setup.py
entry_points = {
    'bob.pad.database': [
        'replay = bob.pad.face.config.database.replay:database',
    ],
},
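The entry point string has the form 'name = package.module:attribute', which is why the configuration file must define a module-level variable called database. The sketch below shows how such a string can be resolved by hand. The function ``resolve_entry_point`` is hypothetical, and a standard library object is used as a stand-in, since the bob packages are not assumed to be installed.

```python
from importlib import import_module


def resolve_entry_point(spec):
    """Split 'name = package.module:attribute' and load the attribute."""
    name, target = (part.strip() for part in spec.split("=", 1))
    module_name, attribute = target.split(":", 1)
    module = import_module(module_name)
    return name, getattr(module, attribute)


# Stand-in target: the ``pi`` attribute of the standard ``math`` module
name, obj = resolve_entry_point("pi_constant = math:pi")
```

Applied to the entry above, the same steps would import bob.pad.face.config.database.replay and return its database variable under the name 'replay'.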
That’s it! Now we are ready to use our database in the corresponding biometric framework.
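One way to check the registration after installing the package is to list the names declared under the entry point group. The helper below is a sketch using the standard importlib.metadata module; ``registered_names`` is a hypothetical function name, and the result depends on which packages are installed in the current environment.

```python
from importlib.metadata import entry_points


def registered_names(group):
    """Return the sorted names registered under an entry point group."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10 and later
        selected = all_entry_points.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain mapping of group -> entry points
        selected = all_entry_points.get(group, [])
    return sorted(entry_point.name for entry_point in selected)


pad_databases = registered_names("bob.pad.database")
```

If the registration succeeded, 'replay' should appear in ``pad_databases``; in an environment without the framework the list is simply empty.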