Extending packages as frameworks

You may want to extend your package so that it can act as a framework. bob.bio.base is a good example: it provides an API that other packages build upon. The utilities described on this page help in creating framework packages and building complex toolchains/pipelines.

Python-based Configuration System

This package also provides a configuration system that packages in the Bob ecosystem can use to load run-time configuration for applications (for package-level static variable configuration, use the Global Configuration System). It can be used to accept complex configurations from users through the command line. The run-time configuration system is deliberately simple: it uses Python itself to load and validate input files, imposing no a priori restrictions on the amount or complexity of the data that needs to be configured.

The configuration system is centered around a single function called bob.extension.config.load(). You call it to load the configuration objects from one or more configuration files, like this:

>>> import os
>>> from bob.extension.config import load
>>> # the variable `path` points to <path-to-bob.extension's root>/data
>>> configuration = load([os.path.join(path, 'basic_config.py')])

If the function bob.extension.config.load() succeeds, it returns an object whose attributes hold the loaded configuration values (of any kind), named after the variables defined in the file. For example, if the file basic_config.py contained:

Listing 1 “basic_config.py”

a = 1
b = a + 2

Then, the object configuration would look like this:

>>> print("a = %d\nb = %d"%(configuration.a, configuration.b))
a = 1
b = 3

Configuration files are not limited to simple assignments: you can import modules, define functions, and more.
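For instance, a configuration file could compute a value using the standard library. The file name and variables below are made up for illustration; they are not shipped with bob.extension:

```python
# hypothetical "advanced_config.py" -- configuration files are plain Python,
# so imports and function definitions work as usual
import math

def radius_to_area(r):
    # helper used only inside the configuration file
    return math.pi * r ** 2

radius = 2.0
area = radius_to_area(radius)  # would be loaded as `configuration.area`
```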

Chain Loading

It is possible to implement chain configuration loading and overriding by passing iterables with more than one filename to bob.extension.config.load(). Suppose we have two configuration files which must be loaded in sequence:

Listing 2 “basic_config.py” (first to be loaded)

a = 1
b = a + 2

Listing 3 “load_config.py” (loaded after basic_config.py)

# the b variable from the last config file is available here
c = b + 1
b = b + 3

Then, one can chain-load them like this:

>>> # the variable `path` points to <path-to-bob.extension's root>/data
>>> file1 = os.path.join(path, 'basic_config.py')
>>> file2 = os.path.join(path, 'load_config.py')
>>> configuration = load([file1, file2])
>>> print("a = %d \nb = %d"%(configuration.a, configuration.b)) 
a = 1
b = 6

It is up to the user to manage which values get overridden, by controlling the order in which the configuration files are loaded.
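Conceptually, chain loading runs each file's code in one shared namespace, so later files can read and override earlier values. This can be mimicked with plain exec(); the sketch below only illustrates the behaviour and is not bob.extension's actual implementation:

```python
# minimal sketch of chain loading: each file's code runs in the same
# namespace, so later files see -- and may override -- earlier values
context = {}
first = "a = 1\nb = a + 2"        # stands in for basic_config.py
second = "c = b + 1\nb = b + 3"   # stands in for load_config.py
for code in (first, second):
    exec(code, context)
# context["a"] == 1, context["b"] == 6, context["c"] == 4
```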

Entry Points

The function bob.extension.config.load() can also load config files through Setuptools entry points or module names. You only need to provide the group name of the entry points:

>>> group = 'bob.extension.test_config_load'  # the group name of entry points
>>> file1 = 'basic_config'  # an entry point name
>>> file2 = 'bob.extension.data.load_config' # module name
>>> configuration = load([file1, file2], entry_point_group=group)
>>> print("a = %d \nb = %d"%(configuration.a, configuration.b)) 
a = 1
b = 6

Stacked Processing

bob.extension.processors.SequentialProcessor and bob.extension.processors.ParallelProcessor are provided to help you build complex processing mechanisms. You can use these processors to apply a chain of operations to your data. For example, bob.extension.processors.SequentialProcessor accepts a list of callables and applies them to the data one by one, in sequence:

>>> import numpy as np
>>> from functools import partial
>>> from bob.extension.processors import SequentialProcessor
>>> raw_data = np.array([[1, 2, 3], [1, 2, 3]])
>>> seq_processor = SequentialProcessor(
...     [partial(np.asarray, dtype='float64'), lambda x: x / 2,
...      partial(np.mean, axis=1)])
>>> seq_processor(raw_data)
array([1., 1.])
>>> np.all(seq_processor(raw_data) ==
...        np.mean(np.asarray(raw_data, dtype='float64') / 2, axis=1))
True
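Conceptually, the sequential processor behaves like the following minimal sketch, which feeds each callable's result into the next. This is an illustration of the behaviour, not the actual bob.extension code:

```python
class MiniSequentialProcessor:
    """Applies a list of callables one after another, feeding each
    result into the next callable (a sketch, not bob's implementation)."""

    def __init__(self, callables):
        self.callables = callables

    def __call__(self, data):
        for func in self.callables:
            data = func(data)
        return data

# e.g. halve, then add one: (10 / 2) + 1 == 6.0
pipeline = MiniSequentialProcessor([lambda x: x / 2, lambda x: x + 1])
result = pipeline(10)  # 6.0
```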

bob.extension.processors.ParallelProcessor accepts a list of callables, applies each of them to the data independently, and returns all of the results. For example:

>>> import numpy as np
>>> from functools import partial
>>> from bob.extension.processors import ParallelProcessor
>>> raw_data = np.array([[1, 2, 3], [1, 2, 3]])
>>> parallel_processor = ParallelProcessor(
...     [partial(np.asarray, dtype='float64'), lambda x: x / 2.0])
>>> list(parallel_processor(raw_data))
[array([[1., 2., 3.],
       [1., 2., 3.]]), array([[0.5, 1. , 1.5],
       [0.5, 1. , 1.5]])]
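Its behaviour can likewise be sketched as applying every callable independently to the same input and yielding each result, which is why the example above wraps the call in list(). Again, this is an illustration, not the real implementation:

```python
class MiniParallelProcessor:
    """Applies each callable independently to the same input and
    yields every result (a sketch, not bob's implementation)."""

    def __init__(self, callables):
        self.callables = callables

    def __call__(self, data):
        for func in self.callables:
            yield func(data)

branches = MiniParallelProcessor([lambda x: x + 1, lambda x: x * 2])
results = list(branches(3))  # [4, 6]
```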

The data may be further processed using a bob.extension.processors.SequentialProcessor:

>>> from bob.extension.processors import SequentialProcessor
>>> total_processor = SequentialProcessor(
...     [parallel_processor, list, partial(np.concatenate, axis=1)])
>>> total_processor(raw_data)
array([[1. , 2. , 3. , 0.5, 1. , 1.5],
       [1. , 2. , 3. , 0.5, 1. , 1.5]])