Extending packages as frameworks¶
It is often required to extend the functionality of your package as a framework. bob.bio.base is a good example; it provides an API and other packages build upon it. The utilities provided in this page are helpful in creating framework packages and building complex toolchains/pipelines.
Python-based Configuration System¶
This package also provides a configuration system that can be used by packages in the Bob ecosystem to load run-time configuration for applications (for package-level static variable configuration use Global Configuration System). It can be used to accept complex configurations from users through the command line. The run-time configuration system is simple and uses Python itself to load and validate input files, making no a priori requirements on the amount or complexity of data that needs to be configured.
The configuration system is centered around a single function called bob.extension.config.load(). You call it to load the configuration objects from one or more configuration files, like this:
>>> import os
>>> from bob.extension.config import load
>>> #the variable `path` points to <path-to-bob.extension's root>/data
>>> configuration = load([os.path.join(path, 'basic_config.py')])
If the function bob.extension.config.load() succeeds, it returns an object whose attributes are the variables defined in the configuration files; the values can be objects of any kind. For example, if the file basic_config.py contained:
a = 1
b = a + 2
Then, the object configuration would look like this:
>>> print("a = %d\nb = %d"%(configuration.a, configuration.b))
a = 1
b = 3
The configuration file does not have to limit itself to simple Pythonic operations; you can import modules, define functions and more.
Chain Loading¶
It is possible to implement chain configuration loading and overriding by passing iterables with more than one filename to bob.extension.config.load(). Suppose we have two configuration files which must be loaded in sequence:
a = 1
b = a + 2

# the b variable from the last config file is available here
c = b + 1
b = b + 3
Then, one can chain-load them like this:
>>> #the variable `path` points to <path-to-bob.extension's root>/data
>>> file1 = os.path.join(path, 'basic_config.py')
>>> file2 = os.path.join(path, 'load_config.py')
>>> configuration = load([file1, file2])
>>> print("a = %d \nb = %d"%(configuration.a, configuration.b))
a = 1
b = 6
Files are loaded in the order given, so each file sees the variables set by the files before it and may override them. Users who want to override values are responsible for passing the files in the right order.
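The effect of the loading order can be pictured with a short sketch. This is not the implementation of bob.extension.config.load(); load_chain is a hypothetical helper that assumes each file is simply executed in a namespace shared with the files before it, and the two files reuse the contents shown above.

```python
import os
import tempfile
from types import SimpleNamespace


def load_chain(paths):
    """Execute each file in order inside one shared namespace (hypothetical helper)."""
    namespace = {}
    for path in paths:
        with open(path) as handle:
            source = handle.read()
        # Later files see, and may rebind, names set by earlier files.
        exec(compile(source, path, "exec"), namespace)
    # Drop the builtins entry that exec() adds and expose the rest as attributes.
    namespace.pop("__builtins__", None)
    return SimpleNamespace(**namespace)


with tempfile.TemporaryDirectory() as tmp:
    file1 = os.path.join(tmp, "basic_config.py")
    file2 = os.path.join(tmp, "load_config.py")
    with open(file1, "w") as handle:
        handle.write("a = 1\nb = a + 2\n")
    with open(file2, "w") as handle:
        handle.write("c = b + 1\nb = b + 3\n")
    configuration = load_chain([file1, file2])

print(configuration.a, configuration.b, configuration.c)  # 1 6 4
```

Swapping the order of the two files in this sketch fails with a NameError, because b is not yet defined when the second file runs first.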
Entry Points¶
The function bob.extension.config.load() can also load config files through Setuptools entry points and module names. You only need to provide the group name of the entry points:
>>> group = 'bob.extension.test_config_load' # the group name of entry points
>>> file1 = 'basic_config' # an entry point name
>>> file2 = 'bob.extension.data.load_config' # module name
>>> configuration = load([file1, file2], entry_point_group=group)
>>> print("a = %d \nb = %d"%(configuration.a, configuration.b))
a = 1
b = 6
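To make the three kinds of names more concrete, here is a rough sketch of how a name could be resolved using only the standard library. The helper resolve and the order of the checks (file path, then entry point, then module) are assumptions made for illustration; the source does not state the order that bob.extension.config.load() uses.

```python
import importlib
import os
from importlib.metadata import entry_points


def resolve(name, group):
    """Return (kind, object) for a config name (hypothetical helper)."""
    # 1. A path to an existing file is used as-is.
    if os.path.isfile(name):
        return "file", name
    # 2. An entry point registered under `group` with that name.
    found = entry_points()
    if hasattr(found, "select"):  # Python 3.10 and newer
        candidates = found.select(group=group, name=name)
    else:  # Python 3.8 and 3.9 return a plain dict of groups
        candidates = [ep for ep in found.get(group, []) if ep.name == name]
    for entry_point in candidates:
        return "entry_point", entry_point.load()
    # 3. Otherwise treat the name as an importable module.
    return "module", importlib.import_module(name)


# "json" is not a file here and not a registered entry point,
# so it falls through to the module import.
kind, obj = resolve("json", group="bob.extension.test_config_load")
print(kind, obj.__name__)
```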
Resource Loading¶
The function bob.extension.config.load() can also return a single variable from the loaded paths instead of the whole configuration. To do this, you need to provide an attribute_name. For example, given the following config file:
test_config_load = 1
b = 2
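By default the variable named by attribute_name is returned (1 here); appending :name to a path selects another variable from that file instead (b, which is 2):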
>>> group = 'bob.extension.test_config_load' # the group name of entry points
>>> attribute_name = 'test_config_load' # the common variable name
>>> value = load(['bob.extension.data.resource_config2'], entry_point_group=group, attribute_name=attribute_name)
>>> value == 1
True
>>> value = load(['bob.extension.data.resource_config2:b'], entry_point_group=group, attribute_name=attribute_name)
>>> value == 2
True
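The path:name convention used above can be sketched in a few lines. split_spec and pick are hypothetical helpers, not part of bob.extension, and the sketch ignores corner cases such as Windows drive letters, which also contain a colon.

```python
def split_spec(spec, attribute_name):
    """Split 'path:attribute' into its parts (hypothetical helper)."""
    path, separator, attribute = spec.partition(":")
    # Without an explicit ':attribute' suffix, fall back to the default name.
    return path, (attribute if separator else attribute_name)


def pick(namespace, spec, attribute_name):
    """Return the selected variable from an already loaded namespace."""
    _, attribute = split_spec(spec, attribute_name)
    return namespace[attribute]


loaded = {"test_config_load": 1, "b": 2}  # variables of the config file above
default = "test_config_load"
print(pick(loaded, "bob.extension.data.resource_config2", default))    # 1
print(pick(loaded, "bob.extension.data.resource_config2:b", default))  # 2
```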
Stacked Processing¶
bob.extension.processors.SequentialProcessor and bob.extension.processors.ParallelProcessor are provided to help you build complex processing mechanisms. You can use these processors to apply a chain of processes on your data. For example, bob.extension.processors.SequentialProcessor accepts a list of callables and applies them on the data one by one sequentially:
>>> import numpy as np; from numpy import array
>>> from functools import partial
>>> from bob.extension.processors import SequentialProcessor
>>> raw_data = np.array([[1, 2, 3], [1, 2, 3]])
>>> # np.cast was removed in NumPy 2.0; np.asarray with a dtype does the same job
>>> to_float64 = partial(np.asarray, dtype='float64')
>>> seq_processor = SequentialProcessor(
...     [to_float64, lambda x: x / 2, partial(np.mean, axis=1)])
>>> np.allclose(seq_processor(raw_data),
...             array([ 1., 1.]))
True
>>> np.all(seq_processor(raw_data) ==
...        np.mean(to_float64(raw_data) / 2, axis=1))
True
bob.extension.processors.ParallelProcessor accepts a list of callables, applies each of them to the data independently, and returns all the results. For example:
>>> from bob.extension.processors import ParallelProcessor
>>> raw_data = np.array([[1, 2, 3], [1, 2, 3]])
>>> to_float64 = partial(np.asarray, dtype='float64')  # replaces np.cast, removed in NumPy 2.0
>>> parallel_processor = ParallelProcessor(
...     [to_float64, lambda x: x / 2.0])
>>> np.allclose(list(parallel_processor(raw_data)),
...             [array([[ 1., 2., 3.],
...                     [ 1., 2., 3.]]),
...              array([[ 0.5, 1. , 1.5],
...                     [ 0.5, 1. , 1.5]])])
True
The data may be further processed using a bob.extension.processors.SequentialProcessor:
>>> total_processor = SequentialProcessor(
... [parallel_processor, list, partial(np.concatenate, axis=1)])
>>> np.allclose(total_processor(raw_data),
... array([[ 1. , 2. , 3. , 0.5, 1. , 1.5],
... [ 1. , 2. , 3. , 0.5, 1. , 1.5]]))
True
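For readers who want to see the idea without NumPy, the two processors can be approximated by the sketch below. MiniSequential and MiniParallel are hypothetical stand-ins written for this page, not the classes from bob.extension.processors; the parallel one is assumed to yield its results lazily, which matches the list(...) call used in the examples above.

```python
class MiniSequential:
    """Feed the output of each callable into the next one."""

    def __init__(self, processors):
        self.processors = list(processors)

    def __call__(self, data):
        for processor in self.processors:
            data = processor(data)
        return data


class MiniParallel:
    """Apply every callable to the same input and yield each result."""

    def __init__(self, processors):
        self.processors = list(processors)

    def __call__(self, data):
        for processor in self.processors:
            yield processor(data)


def halve(x):
    return x / 2


sequential = MiniSequential([float, halve])
parallel = MiniParallel([float, halve])

print(sequential(3))      # 1.5
print(list(parallel(3)))  # [3.0, 1.5]

# A parallel stage can sit inside a sequential chain, as in the example above.
combined = MiniSequential([parallel, list, sum])
print(combined(3))        # 4.5
```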
Unified Command Line Mechanism¶
Bob comes with a command line called bob which provides a set of commands by default:
$ bob --help
Usage: bob [OPTIONS] COMMAND [ARGS]...

  The main command line interface for bob. Look below for available
  commands.

Options:
  --help  Show this message and exit.

Commands:
  config  The manager for bob's global configuration.
  ...
Warning
This feature is experimental and most probably will break compatibility. If you are not willing to fix your code after changes are made here, please do not use this feature.
This command line is implemented using click. You can extend the commands of this script through setuptools entry points (this is implemented using click-plugins). To do so, implement your command line using click independently; then advertise it as a command under the bob script using the bob.cli entry point.
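As a rough illustration of these two steps, the sketch below defines a small click command and the entry point declaration that would advertise it. The command hello, the module path my.package.script, and the ENTRY_POINTS variable are hypothetical names chosen for this example; only the bob.cli group name comes from the text above.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("--name", default="world", show_default=True,
              help="Who to greet.")
def hello(name):
    """Prints a greeting (hypothetical example command)."""
    click.echo("Hello, {}!".format(name))


# Hypothetical value for the ``entry_points`` argument of setuptools.setup();
# ``my.package.script`` stands for the module that defines ``hello``.
ENTRY_POINTS = {
    "bob.cli": [
        "hello = my.package.script:hello",
    ],
}

# Exercise the command directly, without installing anything.
result = CliRunner().invoke(hello, ["--name", "bob"])
print(result.output)
```

Once a package declaring such an entry point is installed, the command would be expected to show up as bob hello next to the built-in commands.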
Note
If you are still not sure how this must be done, maybe you don’t know how to use click yet.
This feature is experimental and may change and break compatibility in the future.
For a best practice example, please look at how the bob config command is implemented:
"""The manager for bob's main configuration.
"""
from .. import rc
from ..rc_config import _saverc, _rc_to_str, _get_rc_path
from .click_helper import verbosity_option, AliasedGroup
import logging
import click

# Use the normal logging module. Verbosity and format of logging will be set by
# adding the verbosity_option from bob.extension.scripts.click_helper
logger = logging.getLogger(__name__)


@click.group(cls=AliasedGroup)
@verbosity_option()
def config(**kwargs):
    """The manager for bob's global configuration."""
    # Load the config file again. This may be needed since the environment
    # variable might change the config path during the tests. Otherwise, this
    # should not be important.
    logger.debug('Reloading the global configuration file.')
    from ..rc_config import _loadrc
    rc.clear()
    rc.update(_loadrc())


@config.command()
def show():
    """Shows the configuration.

    Displays the content of bob's global configuration file.
    """
    # always use click.echo instead of print
    click.echo("Displaying `{}':".format(_get_rc_path()))
    click.echo(_rc_to_str(rc))


@config.command()
@click.argument('key')
def get(key):
    """Prints a key.

    Retrieves the value of the requested key and displays it.

    \b
    Arguments
    ---------
    key : str
        The key to return its value from the configuration.

    \b
    Fails
    -----
    * If the key is not found.
    """
    value = rc[key]
    if value is None:
        # Exit the command line with ClickException in case of errors.
        raise click.ClickException(
            "The requested key `{}' does not exist".format(key))
    click.echo(value)


@config.command()
@click.argument('key')
@click.argument('value')
def set(key, value):
    """Sets the value for a key.

    Sets the value of the specified configuration key in bob's global
    configuration file.

    \b
    Arguments
    ---------
    key : str
        The key to set the value for.
    value : str
        The value of the key.

    \b
    Fails
    -----
    * If something goes wrong.
    """
    try:
        rc[key] = value
        _saverc(rc)
    except Exception:
        logger.error("Could not configure the rc file", exc_info=True)
        raise click.ClickException("Failed to change the configuration.")


@config.command()
@click.argument('substr')
@click.option('-c', '--contain', is_flag=True, default=False, type=click.BOOL, show_default=True)
@click.option('-f', '--force', is_flag=True, default=False, type=click.BOOL, show_default=True)
def unset(substr, contain=False, force=False):
    """Clear all variables starting (containing) with substring.

    Clear all the variables that starts with the provided substring.
    Each key/value pair for which the key starts with substring will be
    removed from bob's global configuration file.

    \b
    Arguments
    ---------
    substring : str
        The starting substring of one or several key(s)

    \b
    Parameters
    ----------
    contain : bool
        If set, check also for keys containing substring
    force : bool
        If set, unset values without confirmation
    """
    found = False
    to_delete = []
    for key in list(rc.keys()):
        # Register each matching key only once; a key that both starts with
        # and contains the substring would otherwise be deleted twice.
        if key.startswith(substr) or (contain and substr in key):
            found = True
            to_delete.append(key)

    if not found:
        if not contain:
            logger.error("The key starting with '{}' was not found in the rc file".format(substr))
        else:
            logger.error("The key containing '{}' was not found in the rc file".format(substr))
        raise click.ClickException("Failed to change the configuration.")

    if force:
        for key in to_delete:
            del rc[key]
    else:
        click.echo("Registered for deletion:")
        for key in to_delete:
            click.echo('- "{}" : "{}"'.format(key, rc[key]))
        delete = click.confirm("Are you sure you want to delete all this ?")
        if delete:
            for key in to_delete:
                del rc[key]
    _saverc(rc)
Command line interfaces with configurations¶
Sometimes your command line interface takes so many parameters that you want to accept them both as command-line options and through configuration files. Bob can help you with that. See below for an example:
"""A script to help annotate databases.
"""
import logging
import click
from bob.extension.scripts.click_helper import (
    verbosity_option, ConfigCommand, ResourceOption, log_parameters)

logger = logging.getLogger(__name__)

ANNOTATE_EPILOG = '''\b
Examples:
$ bob bio annotate -vvv -d <database> -a <annotator> -o /tmp/annotations
$ jman submit --array 64 -- bob bio annotate ... --array 64
'''


@click.command(entry_point_group='bob.bio.config', cls=ConfigCommand,
               epilog=ANNOTATE_EPILOG)
@click.option('--database', '-d', required=True, cls=ResourceOption,
              entry_point_group='bob.bio.database',
              help='''The database that you want to annotate.''')
@click.option('--annotator', '-a', required=True, cls=ResourceOption,
              entry_point_group='bob.bio.annotator',
              help='A callable that takes the database and a sample (biofile) '
              'of the database and returns the annotations in a dictionary.')
@click.option('--output-dir', '-o', required=True, cls=ResourceOption,
              help='The directory to save the annotations.')
@click.option('--force', '-f', is_flag=True, cls=ResourceOption,
              help='Whether to overwrite existing annotations.')
@click.option('--array', type=click.INT, default=1, cls=ResourceOption,
              help='Use this option alongside gridtk to submit this script as '
              'an array job.')
@verbosity_option(cls=ResourceOption)
def annotate(database, annotator, output_dir, force, array, **kwargs):
    """Annotates a database.

    The annotations are written in text file (json) format which can be read
    back using :any:`bob.db.base.read_annotation_file` (annotation_type='json')
    """
    log_parameters(logger)
This will produce the following help message to the users:
Usage: bob bio annotate [OPTIONS] [CONFIG]...

  Annotates a database.

  The annotations are written in text file (json) format which can be read
  back using :any:`bob.db.base.read_annotation_file`
  (annotation_type='json')

  It is possible to pass one or several Python files (or names of
  ``bob.bio.config`` entry points or module names) as CONFIG arguments to
  the command line which contain the parameters listed below as Python
  variables. The options through the command-line (see below) will override
  the values of configuration files. You can run this command with
  ``<COMMAND> -H example_config.py`` to create a template config file.

Options:
  -d, --database TEXT         The database that you want to annotate. Can
                              be a ``bob.bio.database`` entry point, a
                              module name, or a path to a Python file
                              which contains a variable named `database`.
  -a, --annotator TEXT        A callable that takes the database and a
                              sample (biofile) of the database and returns
                              the annotations in a dictionary. Can be a
                              ``bob.bio.annotator`` entry point, a module
                              name, or a path to a Python file which
                              contains a variable named `annotator`.
  -o, --output-dir TEXT       The directory to save the annotations.
  -f, --force                 Whether to overwrite existing annotations.
  --array INTEGER             Use this option alongside gridtk to submit
                              this script as an array job.
  -v, --verbose               Increase the verbosity level from 0 (only
                              error messages) to 1 (warnings), 2 (log
                              messages), 3 (debug information) by adding
                              the --verbose option as often as desired
                              (e.g. '-vvv' for debug).
  -H, --dump-config FILENAME  Name of the config file to be generated
  -?, -h, --help              Show this message and exit.

Examples:
$ bob bio annotate -vvv -d <database> -a <annotator> -o /tmp/annotations
$ jman submit --array 64 -- bob bio annotate ... --array 64
This script takes configuration files (CONFIG) and command line options (e.g. --force) as input and resolves the parameters from them. Command line options, if given, override the values of parameters that may exist in configuration files. Configuration files are loaded through the Python-based Configuration System mechanism, so chain loading is supported.
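The override rule can be summarized with a small standalone sketch. resolve_parameters is a hypothetical helper, not part of bob.extension, and it assumes that an option left out on the command line is represented by None.

```python
def resolve_parameters(config_values, cli_values):
    """Merge parameters; a CLI value of None means 'not given' (assumption)."""
    resolved = dict(config_values)
    for name, value in cli_values.items():
        if value is not None:
            # An explicitly given command line option wins over the config file.
            resolved[name] = value
    return resolved


from_config = {"database": "atnt", "output_dir": "/data/annotations", "force": False}
from_cli = {"database": None, "output_dir": "/tmp", "force": None}

resolved = resolve_parameters(from_config, from_cli)
print(resolved)
# {'database': 'atnt', 'output_dir': '/tmp', 'force': False}
```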
CONFIG can be a path to a file (e.g. /path/to/config.py), a module name (e.g. bob.package.config2), or the name of a setuptools entry point in a specified group. For example, in the annotate script given above, CONFIG can be the name of a bob.bio.config entry point.
Some command line options (e.g. --database in the example above) can be complex Python objects. You specify them on the command line as, for example, --database atnt, and this string is treated as a setuptools entry point (a bob.bio.database entry point in this example). The mechanism to load these options is the same as for loading CONFIG arguments, but the entry point group is different for each option.
By the time the code enters the annotate function, all variables are resolved and validated, and everything is ready to use.
Below you can see several ways that this script can be invoked:
# below, atnt is a bob.bio.database entry point
# below, face is a bob.bio.annotator entry point
$ bob bio annotate -d atnt -a face -o /tmp --force -vvv
# below, bob.db.atnt.config is a module name that resolves to a path to a config file
$ bob bio annotate -d bob.db.atnt.config -a face -o /tmp --force -vvv
# below, all parameters are inside a Python file and the path to that file is provided.
# If the configuration file has for example database defined as ``database = 'atnt'``
# the atnt name will be treated as a bob.bio.database entry point and will be loaded.
$ bob bio annotate /path/to/config_with_all_parameters.py
# below, the path of the config file is given as a module name
$ bob bio annotate bob.package.config_with_all_parameters
# below, the output will be /tmp even if there is an ``output_dir`` variable inside the config file.
$ bob bio annotate bob.package.config_with_all_parameters -o /tmp
# below, each resource option can be loaded through config loading mechanism too.
$ bob bio annotate -d /path/to/config/database.py -a bob.package.annotate.config --output-dir /tmp
# Using the command below users can generate a template config file
$ bob bio annotate -H example_config.py
As you can see, the command line interface can accept its inputs through several different mechanisms. Normally, to keep things simple, you would encourage users to just provide one or several configuration files as entry point names or as module names, and maybe have them provide simple options like --verbose or --force through the command line.
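A configuration file that carries all parameters, like the config_with_all_parameters.py used in the invocations above, might look like the sketch below. The variable names are assumed to match the parameter names of the annotate function, and all values are purely illustrative.

```python
# config_with_all_parameters.py (hypothetical contents)

# A string is resolved as a bob.bio.database entry point, like ``-d atnt``.
database = "atnt"

# Assumed to be resolved the same way, as a bob.bio.annotator entry point.
annotator = "face"

# Plain values are used as they are.
output_dir = "/tmp/annotations"
force = True
array = 1
```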