.. vim: set fileencoding=utf-8 :

.. Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/          ..
.. Contact: beat.support@idiap.ch                                             ..
..                                                                            ..
.. This file is part of the beat.core module of the BEAT platform.            ..
..                                                                            ..
.. Commercial License Usage                                                   ..
.. Licensees holding valid commercial BEAT licenses may use this file in      ..
.. accordance with the terms contained in a written agreement between you     ..
.. and Idiap. For further information contact tto@idiap.ch                    ..
..                                                                            ..
.. Alternatively, this file may be used under the terms of the GNU Affero     ..
.. Public License version 3 as published by the Free Software and appearing   ..
.. in the file LICENSE.AGPL included in the packaging of this file.           ..
.. The BEAT platform is distributed in the hope that it will be useful, but   ..
.. WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY ..
.. or FITNESS FOR A PARTICULAR PURPOSE.                                       ..
..                                                                            ..
.. You should have received a copy of the GNU Affero Public License along     ..
.. with the BEAT platform. If not, see http://www.gnu.org/licenses/.          ..


==========
Databases
==========

A database is a collection of data files, one for each output of the database.
These data are the inputs to BEAT toolchains. It is therefore important to
define evaluation protocols, which describe how a specific system must use the
raw data of a given database.

For instance, a recognition system will typically use a subset of the data to
train a recognition `model`, while another subset of data will be used to
evaluate the performance of this model.


Structure of a database
-----------------------

A database has the following structure on disk::

    database_name/
        output1_name.data
        output2_name.data
        ...
        outputN_name.data

For a given database, the BEAT platform typically stores information
about the root folder containing this raw data as well as a description of
it.
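
As an illustration of this layout, the outputs of a database can be discovered
by listing the ``.data`` files under its root folder. The following is a
minimal sketch only: the helper ``list_outputs`` is hypothetical and is not
part of the ``beat.core`` API.

.. code-block:: python

    from pathlib import Path


    def list_outputs(root):
        """Return the sorted output names found under a database root folder.

        Assumes the layout shown above: one ``<output_name>.data`` file per
        output, stored directly in the root folder.
        """
        return sorted(path.stem for path in Path(root).glob("*.data"))

Files that do not end in ``.data`` are ignored, so auxiliary files stored next
to the raw data do not show up as outputs.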


Evaluation protocols
--------------------

A BEAT evaluation protocol consists of several ``datasets``, each dataset
having several ``outputs`` with well-defined data formats. In practice,
each dataset will typically be used for a different purpose.

For instance, in the case of a simple face recognition protocol, the
database may be split into three datasets: one for training, one for enrolling
client-specific models, and one for testing these models.
The training dataset may have two outputs: grayscale images as two-dimensional
arrays of type `uint8` and client ids as `uint64` integers.

The BEAT platform is data-driven, which means that all the outputs of a given
dataset are synchronized. The way the data is generated by each template
is defined in a piece of code called the ``database view``. For
reproducibility, a database view must behave deterministically: given the
same raw data, it must always produce the same data in the same order.
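
The two properties above (synchronized outputs and deterministic behavior) can
be sketched as follows. This is an illustrative example under stated
assumptions, not the actual ``beat.core`` view interface: the class
``TrainingView``, its ``get`` method and the output names ``image`` and
``client_id`` are hypothetical, and a file name stands in for the image data.

.. code-block:: python

    class TrainingView:
        """Hypothetical view yielding one synchronized sample per index."""

        def __init__(self, entries):
            # entries: iterable of (filename, client_id) pairs.
            # Sorting makes the order independent of how entries were listed,
            # which keeps the view deterministic.
            self.entries = sorted(entries)

        def __len__(self):
            return len(self.entries)

        def get(self, index):
            filename, client_id = self.entries[index]
            # Both outputs are produced together for the same index, so they
            # always describe the same underlying sample (synchronization).
            return {"image": filename, "client_id": client_id}

Two views built from the same entries, listed in a different order, return the
same sequence of samples.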


Database set templates
----------------------

In practice, different databases used for the same purpose may have the exact
same datasets with the exact same outputs (and attached data formats). In this
case, it is interesting to abstract the definition of the database sets from
a given database. BEAT defines ``database set templates`` for this purpose.

For instance, the simple face recognition evaluation protocol described above,
which consists of three datasets and a few outputs, may be abstracted in a
database set template. This template defines the datasets, their outputs, and
the corresponding data formats. Next, if several databases
implement such a protocol, they may rely on the same `database set template`.
Similarly, evaluation protocols testing different conditions (such as
enrolling on clean and testing on clean data vs. enrolling on clean and
testing on noisy data) may rely on the same database set template.
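
The idea can be sketched with plain Python data structures. Everything below
is hypothetical and only meant as an illustration: the database and protocol
names are invented, the template is not written in the actual BEAT declaration
format, and the enrollment and testing datasets are assumed to expose the same
outputs as the training dataset, which the text above does not specify.

.. code-block:: python

    # Outputs of the training dataset, as described in the text:
    # output name -> data format.
    FACE_OUTPUTS = {"image": "2D array of uint8", "client_id": "uint64"}

    # Hypothetical database set template: dataset name -> outputs.
    # Assumption: "enroll" and "test" expose the same outputs as "train".
    TEMPLATE = {
        "train": FACE_OUTPUTS,
        "enroll": FACE_OUTPUTS,
        "test": FACE_OUTPUTS,
    }

    # Several (database, protocol) pairs relying on the same template.
    PROTOCOLS = {
        ("database_a", "clean_vs_clean"): TEMPLATE,
        ("database_a", "clean_vs_noisy"): TEMPLATE,
        ("database_b", "clean_vs_clean"): TEMPLATE,
    }


    def compatible(required, template):
        """Check that every (dataset, output) pair in `required` exists."""
        return all(
            output in template.get(dataset, {}) for dataset, output in required
        )

A toolchain that only requires ``("train", "image")`` and
``("train", "client_id")`` is compatible with every entry of ``PROTOCOLS``,
since they all share the same template.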

In practice, this reduces the amount of work needed to integrate new databases
and/or new evaluation protocols into the platform. Besides, at the experiment
level, this makes it possible to reuse a toolchain on a different database
with almost no configuration changes from the user.