Python API

This section includes information for using the pure Python API of bob.io.base.

Classes

class bob.io.base.File

Bases: object

File(filename, [mode='r', [pretend_extension='']]) -> new bob::io::File

Use this object to read data from and write data to files.

Constructor parameters:

filename
[str] The file path to the file you want to open
mode
[str] A single character (one of 'r', 'w', 'a'), indicating whether you'd like to read, write or append to the file. If you choose 'w' and the file already exists, it will be truncated. By default, the opening mode is read-only ('r').
pretend_extension
[str, optional] Normally we read the file by matching its extension to one of the available codecs installed with the present release of Bob. If you set this parameter, though, we will read the file as if it had the given extension. The value should start with a '.'. For example, '.hdf5' makes the file be treated like an HDF5 file.
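
A minimal usage sketch based only on the signature above; the file names are hypothetical:

    import bob.io.base

    # Open an existing file for reading (hypothetical path)
    f = bob.io.base.File('data.hdf5', 'r')

    # Read a file as if it were an HDF5 file, regardless of its extension
    g = bob.io.base.File('blob.bin', 'r', '.hdf5')
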
append(array) → int

Adds the contents of an object to the file.

Parameters:

array
[array] The array to be added to the file. It can be a numpy.ndarray, a bob.blitz.array or any other object which can be converted to either of them, as long as its number of dimensions and scalar type are supported by bob.blitz.array.

This method appends data to the file. If the file does not exist, a new file is created; otherwise, the method makes sure that the inserted array respects the previously established file structure.

Returns the current position of the newly written array.
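
A short sketch of repeated appends, with a hypothetical file name; the returned positions grow from 0:

    import numpy
    import bob.io.base

    f = bob.io.base.File('samples.hdf5', 'a')
    for k in range(3):
        pos = f.append(numpy.ones((5,), dtype='float64') * k)
        print(pos)  # prints 0, then 1, then 2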

codec_name

Name of the File class implementation – available for compatibility with previous versions of this library.

describe([all]) → tuple

Returns a description (dtype, shape, stride) of the data in the file.

Parameters:

all
[bool] If set to True, returns the shape and strides for reading the whole file contents in one go.
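
A sketch of both readout descriptions, assuming the 'samples.hdf5' file from the append() example above and that all may be passed positionally:

    import bob.io.base

    f = bob.io.base.File('samples.hdf5', 'r')
    print(f.describe())      # dtype/shape/stride of a single object
    print(f.describe(True))  # dtype/shape/stride of the whole-file readout
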
filename

The path to the file being read/written

read([index]) → numpy.ndarray

Reads a specific object in the file, or the whole file.

Parameters:

index
[int|long, optional] The index of the object one wishes to retrieve from the file. Negative indexing is supported. If not given, implies retrieval of the whole file contents.

This method reads data from the file. If you specified an index, it reads just the object indicated by the index, as you would do using the [] operator. If an index is not specified, reads the whole contents of the file into a numpy.ndarray.
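
A sketch of both access modes, assuming 'samples.hdf5' holds several equally-shaped objects:

    import bob.io.base

    f = bob.io.base.File('samples.hdf5', 'r')
    first = f.read(0)      # a single object, by index
    last = f.read(-1)      # negative indexing is supported
    everything = f.read()  # the whole contents as one numpy.ndarray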

write(array) → None

Writes the contents of an object to the file.

Parameters:

array
[array] The array to be written into the file. It can be a numpy.ndarray, a bob.blitz.array or any other object which can be converted to either of them, as long as its number of dimensions and scalar type are supported by bob.blitz.array.

This method writes data to the file. It behaves as if the given array were the only piece of data that will ever be written to the file. No further appending may happen after a call to this method.
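
A minimal sketch, with a hypothetical file name:

    import numpy
    import bob.io.base

    # 'w' truncates any existing contents before the single write
    f = bob.io.base.File('single.hdf5', 'w')
    f.write(numpy.arange(10, dtype='float64'))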

class bob.io.base.HDF5File

Bases: object

HDF5File(filename, [mode='r']) -> new bob::io::HDF5File

Reads and writes data to HDF5 files.

Constructor parameters:

filename
[str] The file path to the file you want to read from/write to
mode
[str, optional] The opening mode: use 'r' for read-only, 'a' for read/write/append, 'w' for read/write/truncate or 'x' for read/write/exclusive. This flag defaults to 'r'.

HDF5 stands for Hierarchical Data Format version 5. It is a flexible, binary file format that allows one to store and read data efficiently. It is a cross-platform, cross-architecture format.

Objects of this class allow users to read and write data from and to files in HDF5 format. For an introduction to HDF5, visit the HDF5 Website.
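
A minimal round-trip sketch using only the methods documented below (the file name is hypothetical):

    import numpy
    import bob.io.base

    out = bob.io.base.HDF5File('example.hdf5', 'w')
    out.set('array', numpy.zeros((2, 3), dtype='float64'))
    out.close()  # flushes contents to disk

    inp = bob.io.base.HDF5File('example.hdf5', 'r')
    data = inp.read('array')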

append(path, data[, compression=0]) → None

Appends a scalar or an array to a dataset

Parameters:

path
[str] The path to the dataset to append data to. Can be an absolute value (starting with a leading '/') or relative to the current working directory (cwd).
data
[scalar|numpy.ndarray] Object to append to the dataset. This value must be compatible with the typing information on the dataset, or an exception will be raised. You can also, optionally, set this to an iterable of scalars or arrays. This will cause this method to iterate over the elements and add each individually.
compression
This parameter is effective when appending arrays. Set this to a number between 0 (default) and 9 (maximum) to compress the contents of this dataset. This setting is only effective if the dataset does not yet exist; otherwise, the previous setting is respected.
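
A sketch of growing a dataset object by object, with hypothetical file and dataset names and assuming compression may be passed as a keyword; the compression setting only matters on the first call, when the dataset is created:

    import numpy
    import bob.io.base

    h5 = bob.io.base.HDF5File('log.hdf5', 'w')
    h5.append('features', numpy.random.rand(8), compression=9)
    h5.append('features', numpy.random.rand(8))  # previous setting respected
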
cd(path) → None

Changes the current prefix path.

Parameters:

path
[str] The path to change directories to

When this object is created, the prefix path is empty, which means all paths to data objects must be given in full. If you set this to a different value, it will be used as a prefix for any subsequent operation until you reset it. If path starts with '/', it is treated as an absolute path; '..' and '.' are supported. The value must be an str object. If the value is relative, it is appended to the current path; if it is absolute, the prefix is reset. Note that all operations taking a relative path, following a cd(), will be considered relative to the value of the cwd property of this object.
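
A sketch of the prefix-path semantics, with hypothetical file, group and dataset names:

    import bob.io.base

    h5 = bob.io.base.HDF5File('tree.hdf5', 'w')
    h5.create_group('/experiment')
    h5.create_group('/experiment/run1')
    h5.cd('/experiment')  # absolute path: resets the prefix
    h5.cd('run1')         # relative path: appended to the prefix
    h5.set('score', 0.5)  # stored at '/experiment/run1/score'
    h5.cd('..')           # back to '/experiment'
    print(h5.cwd)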

close() → None

Closes this file

This function closes the HDF5File after flushing all its contents to disk. After the HDF5File is closed, any operation on it will result in an exception.

copy(file) → None

Copies all accessible content to another HDF5 file

Parameters:

file
[HDF5File] The file (already opened), to copy the contents to. Unlinked contents of this file will not be copied. This can be used as a method to trim unwanted content in a file.
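
A sketch of the trimming use-case mentioned above (both file names are hypothetical):

    import bob.io.base

    src = bob.io.base.HDF5File('original.hdf5', 'r')
    dst = bob.io.base.HDF5File('trimmed.hdf5', 'w')
    src.copy(dst)  # unlinked contents of 'original.hdf5' are left behind
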
create_group(path) → None

Creates a new path (group) inside the file.

Parameters:

path
[str] The path to check

Creates a new directory (i.e., a group in HDF5 parlance) inside the file. A relative path is taken w.r.t. the current directory. If the directory already exists (check with HDF5File.has_group()), an exception will be raised.
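
A sketch guarding the creation with HDF5File.has_group() to avoid the exception (hypothetical names):

    import bob.io.base

    h5 = bob.io.base.HDF5File('groups.hdf5', 'a')
    if not h5.has_group('/results'):
        h5.create_group('/results')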

cwd

The current working directory set on the file

del_attribute(name[, path='.']) → None

Removes a given attribute at the named resource.

Parameters:

name
[str] The name of the attribute to delete. A RuntimeError is raised if the attribute does not exist.
path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to delete an attribute from. If the path does not exist, a RuntimeError is raised.
del_attributes([attrs=None[, path='.']]) → None

Removes attributes in a given (existing) path

Parameters:

attrs
[list] An iterable containing the names of the attributes to be removed. If not given or set to None, then remove all attributes at the named resource.
path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to delete attributes from. If the path does not exist, a RuntimeError is raised.
describe(path) → tuple

Describes a dataset type/shape, if it exists inside a file

Parameters:

path
[str] The dataset path to describe

If the given path to an HDF5 dataset exists inside the file, this method returns a type description of the objects recorded in that dataset; otherwise, it raises an exception. The returned value is a tuple of tuples (HDF5Type, number-of-objects, expandable) describing how the data may be read using these formats.

filename

str <– The name (and path) of the underlying file on hard disk

flush() → None

Flushes the content of the HDF5 file to disk

When the HDF5File is open for writing, this function synchronizes the contents on disk with the in-memory state of the file. When the file is open for reading, nothing happens.

get_attribute(name[, path='.']) → scalar|numpy.ndarray

Retrieve a given attribute from the named resource.

Parameters:

name
[str] The name of the attribute to retrieve. If the attribute is not available, a RuntimeError is raised.
path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to get an attribute from. If the path does not exist, a RuntimeError is raised.

This method returns a single value corresponding to what is stored inside the attribute container for the given resource. If you would like to retrieve all attributes at once, use HDF5File.get_attributes() instead.

get_attributes([path='.']) → dict

All attributes of the given path organized in dictionary

Parameters:

path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to get all attributes from. If the path does not exist, a RuntimeError is raised.

Attributes are returned in a dictionary in which each key corresponds to the attribute name and each value corresponds to the value stored inside the HDF5 file. To retrieve only a specific attribute, use HDF5File.get_attribute().

has_attribute(name[, path='.']) → bool

Checks existence of a given attribute at the named resource.

Parameters:

name
[str] The name of the attribute to check.
path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to check an attribute at. If the path does not exist, a RuntimeError is raised.
has_dataset(key) → bool

Checks if a dataset exists inside a file

Parameters:

key
[str] The dataset path to check

Checks if a dataset exists inside a file, on the specified path. If the given path is relative, it is taken w.r.t. the current working directory.

has_group(path) → bool

Checks if a path (group) exists inside a file

Parameters:

path
[str] The path to check

Checks if a path (i.e. a group in HDF5 parlance) exists inside a file. This method does not work for datasets, only for directories. If the given path is relative, it is taken w.r.t. the current working directory.

has_key()

x.has_dataset(key) -> bool

Checks if a dataset exists inside a file

Parameters:

key
[str] The dataset path to check

Checks if a dataset exists inside a file, on the specified path. If the given path is relative, it is taken w.r.t. the current working directory.

keys()

x.paths([relative=False]) -> tuple

Lists datasets available inside this file

Parameters:

relative
[bool, optional] if set to True, the returned paths are relative to the current working directory, otherwise they are absolute.

Returns all paths to datasets available inside this file, stored under the current working directory. If relative is set to True, the returned paths are relative to the current working directory, otherwise they are absolute.

lread(key[, pos=-1]) → list|numpy.ndarray

Reads some contents of the dataset.

Parameters:

key
[str] The path to the dataset to read data from. Can be an absolute value (starting with a leading '/') or relative to the current working directory (cwd).
pos
[int, optional] Returns a single object if pos >= 0, otherwise a list by reading all objects in sequence.

This method reads contents from a dataset, treating the N-dimensional dataset like a container for multiple objects with N-1 dimensions. It returns a single numpy.ndarray in case pos is set to a value >= 0, or a list of arrays otherwise.
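
A sketch contrasting lread() with read(), with hypothetical names; three 1D objects are appended, then retrieved:

    import numpy
    import bob.io.base

    h5 = bob.io.base.HDF5File('rows.hdf5', 'w')
    for k in range(3):
        h5.append('rows', numpy.full((4,), float(k)))

    one = h5.lread('rows', 0)  # a single (4,) array
    many = h5.lread('rows')    # a list of three (4,) arrays
    stacked = h5.read('rows')  # the whole dataset read in one go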

paths([relative=False]) → tuple

Lists datasets available inside this file

Parameters:

relative
[bool, optional] if set to True, the returned paths are relative to the current working directory, otherwise they are absolute.

Returns all paths to datasets available inside this file, stored under the current working directory. If relative is set to True, the returned paths are relative to the current working directory, otherwise they are absolute.

read(key[, pos=-1]) → numpy.ndarray

Reads whole datasets from the file.

Parameters:

key
[str] The path to the dataset to read data from. Can be an absolute value (starting with a leading '/') or relative to the current working directory (cwd).
rename(from, to) → None

Renames datasets in a file

Parameters:

from
[str] The path to the data being renamed
to
[str] The new name of the dataset
replace(path, pos, data) → None

Modifies the value of a scalar/array in a dataset.

Parameters:

path
[str] The path to the dataset to write data to. Can be an absolute value (starting with a leading '/') or relative to the current working directory (cwd).
pos
[int] Position, within the dataset, of the object to be replaced. The object position on the dataset must exist, or an exception is raised.
data
[scalar|numpy.ndarray] Object to replace the value with. This value must be compatible with the typing information on the dataset, or an exception will be raised.
set(path, data[, compression=0]) → None

Sets the scalar or array at position 0 to the given value.

Parameters:

path
[str] The path to the dataset to write data to. Can be an absolute value (starting with a leading '/') or relative to the current working directory (cwd).
data
[scalar|numpy.ndarray] Object to write to the dataset. This value must be compatible with the typing information on the dataset, or an exception will be raised. You can also, optionally, set this to an iterable of scalars or arrays. This will cause this method to iterate over the elements and add each individually.
compression
This parameter is effective when writing arrays. Set this to a number between 0 (default) and 9 (maximum) to compress the contents of this dataset. This setting is only effective if the dataset does not yet exist; otherwise, the previous setting is respected.

This method is equivalent to checking if the scalar or array at position 0 exists and then replacing it. If the path does not exist, we append the new scalar or array.
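
A sketch of set(), append() and replace() interacting on the same dataset (hypothetical names):

    import bob.io.base

    h5 = bob.io.base.HDF5File('scores.hdf5', 'w')
    h5.set('score', 1.0)         # creates the dataset; value at position 0
    h5.set('score', 2.0)         # replaces the value at position 0
    h5.append('score', 3.0)      # adds a value at position 1
    h5.replace('score', 1, 4.0)  # replaces the value at position 1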

set_attribute(name, value[, path='.']) → None

Sets a given attribute at the named resource.

Parameters:

name
[str] The name of the attribute to set.
value
[scalar|numpy.ndarray] A simple scalar to set for the given attribute on the named resource (path). Only simple scalars (booleans, integers, floats and complex numbers) and arrays of those are supported for the time being. You can use numpy scalars to set values with arbitrary precision (e.g. numpy.uint8).
path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to set an attribute at.

Warning

Attributes in HDF5 files are supposed to be small containers or simple scalars that provide extra information about the data stored on the main resource (dataset or group|directory). Attributes cannot be retrieved in chunks, contrary to data in datasets.

Currently, no limitation is imposed on the size of values stored in attributes.
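
A sketch of setting and reading back attributes on a dataset, with hypothetical names and assuming path may be passed as a keyword:

    import numpy
    import bob.io.base

    h5 = bob.io.base.HDF5File('annotated.hdf5', 'w')
    h5.set('data', numpy.zeros((4,)))
    h5.set_attribute('version', numpy.uint8(2), path='data')
    print(h5.get_attribute('version', path='data'))  # 2
    print(h5.get_attributes(path='data'))            # {'version': 2}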

set_attributes(attrs[, path='.']) → None

Sets attributes in a given (existing) path using a dictionary

Parameters:

attrs
[dict] A python dictionary containing pairs of strings and values. Each value in the dictionary should be a simple scalar (boolean, integer, float or complex number) or an array of those; only these are supported for the time being. You can use numpy scalars to set values with arbitrary precision (e.g. numpy.uint8).
path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to set attributes at.

Warning

Attributes in HDF5 files are supposed to be small containers or simple scalars that provide extra information about the data stored on the main resource (dataset or group|directory). Attributes cannot be retrieved in chunks, contrary to data in datasets.

Currently, no limitation is imposed on the size of values stored in attributes.

sub_groups([relative=False[, recursive=True]]) → tuple

Lists groups (directories) in the current file.

Parameters:

relative
[bool, optional] if set to True, the returned sub-groups are relative to the current working directory, otherwise they are absolute.
recursive
[bool, optional] if set to False, the returned sub-groups are only the ones in the current directory. Otherwise, recurse down the directory structure.

unlink(key) → None

Unlinks datasets inside the file, making them invisible.

Parameters:

key
[str] The dataset path to unlink

If a given path to an HDF5 dataset exists inside the file, unlinks it. Please note this will not remove the data from the file, just make it inaccessible. If you wish to clean up, save the reachable objects from this file to another HDF5File object using copy(), for example.

writable

bool <– Has this file been opened in writable mode?

Functions

bob.io.base.load(inputs)[source]

Loads the contents of a file, an iterable of files, or an iterable of bob.io.File objects into a numpy.ndarray.

Parameters:

inputs

This might represent several different entities:

  1. The name of a file (full path) from which to load the data. In this case, this function assumes the file contains an array and returns the loaded numpy.ndarray.
  2. An iterable of filenames to be loaded in memory. In this case, each file is assumed to contain a single 1D sample or a set of 1D samples, which are loaded in memory and concatenated into a single 2D numpy.ndarray, which is returned.
  3. An iterable of bob.io.File objects. In this case, each bob.io.File is assumed to contain a single 1D sample or a set of 1D samples, which are loaded in memory if required and concatenated into a single 2D numpy.ndarray, which is returned.
  4. An iterable mixing filenames and bob.io.File objects. In this case, a 2D numpy.ndarray is returned, as described by points 2 and 3 above.
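
A sketch of the four accepted input forms (all file names are hypothetical; merge() is documented next):

    import bob.io.base

    a = bob.io.base.load('sample.hdf5')           # case 1: one ndarray
    b = bob.io.base.load(['s1.hdf5', 's2.hdf5'])  # case 2: 2D ndarray
    files = bob.io.base.merge(['s1.hdf5', 's2.hdf5'])
    c = bob.io.base.load(files)                   # case 3: 2D ndarray
    d = bob.io.base.load(['s1.hdf5'] + list(files))  # case 4: mixed
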
bob.io.base.merge(filenames)[source]

Converts an iterable of filenames into an iterable of read-only bob.io.File objects.

Parameters:

filenames

This might represent:

  1. A single filename. In this case, an iterable with a single bob.io.File is returned.
  2. An iterable of filenames to be converted into an iterable of bob.io.File objects.
bob.io.base.save(array, filename, create_directories=False)[source]

Saves the contents of an array-like object to file.

Effectively, this is the same as creating a bob.io.File object with the mode flag set to 'w' (write with truncation) and calling bob.io.File.write() passing array as parameter.

Parameters:

array
The array-like object to be saved on the file
filename
The name of the file where the contents will be saved
create_directories
Automatically generate the directories if required
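
A minimal sketch (the output path is hypothetical):

    import numpy
    import bob.io.base

    data = numpy.random.rand(3, 4)
    bob.io.base.save(data, 'out/run1/data.hdf5', create_directories=True)
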
bob.io.base.append(array, filename)[source]

Appends the contents of an array-like object to file.

Effectively, this is the same as creating a bob.io.File object with the mode flag set to 'a' (append) and calling bob.io.File.append() passing array as parameter.

Parameters:

array
The array-like object to be saved on the file
filename
The name of the file where the contents will be saved
bob.io.base.peek(filename)[source]

Returns the type of array (frame or sample) saved in the given file.

Effectively, this is the same as creating a bob.io.File object with the mode flag set to 'r' (read-only) and returning bob.io.File.describe().

Parameters:

filename
The name of the file to peek information from
bob.io.base.peek_all(filename)[source]

Returns the type of array (for full readouts) saved in the given file.

Effectively, this is the same as creating a bob.io.File object with the mode flag set to 'r' (read-only) and returning bob.io.File.describe(all=True).

Parameters:

filename
The name of the file to peek information from
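
A sketch comparing both helpers, assuming 'samples.hdf5' holds several equally-shaped 1D objects:

    import bob.io.base

    print(bob.io.base.peek('samples.hdf5'))      # description of one object
    print(bob.io.base.peek_all('samples.hdf5'))  # description of the full readout
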
bob.io.base.create_directories_save(directory, dryrun=False)[source]

Creates a directory if it does not exist, with concurrent access support. This function will also create any parent directories that might be required. If the dryrun option is selected, it does not actually create the directory, but just writes the (Linux) command that would have been executed.

Parameters:

directory
The directory that you want to create.
dryrun
Only write the command, but do not execute it.
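
A minimal sketch (the directory names are hypothetical):

    import bob.io.base

    bob.io.base.create_directories_save('results/exp1')
    bob.io.base.create_directories_save('results/exp2', dryrun=True)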

C++ API Helpers

bob.io.base.get_include()[source]

Returns the directory containing the C/C++ API include directives
