Python API

This section includes information for using the pure Python API of bob.io.base.

Classes

class bob.io.base.File

Bases: object

File(filename, [mode='r', [pretend_extension='']]) -> new bob::io::File

Use this object to read data from and write data to files.

Constructor parameters:

filename
[str] The file path to the file you want to open
mode
[str] A single character (one of 'r', 'w', 'a'), indicating whether you'd like to read, write or append to the file. If you choose 'w' and the file already exists, it will be truncated. By default, the opening mode is read-only ('r').
pretend_extension
[str, optional] Normally we read the file by matching its extension to one of the available codecs installed with the present release of Bob. If you set this parameter, though, we will read the file as if it had the given extension. The value should start with a '.'. For example, '.hdf5' makes the file be treated like an HDF5 file.
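
A minimal usage sketch based only on the signature above; the file names are hypothetical:

    import bob.io.base

    # Open an existing file for reading (hypothetical path)
    f = bob.io.base.File('data.hdf5', 'r')

    # Read a file as if it were an HDF5 file, regardless of its extension
    g = bob.io.base.File('blob.bin', 'r', '.hdf5')
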
append(array) → int

Adds the contents of an object to the file.

Parameters:

array
[array] The array to be added to the file. It can be a numpy.ndarray, a bob.blitz.array or any other object which can be converted to either of them, as long as its number of dimensions and scalar type are supported by bob.blitz.array.

This method appends data to the file. If the file does not exist, a new file is created; otherwise, the method makes sure that the inserted array respects the previously established file structure.

Returns the current position of the newly written array.
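
A short sketch of repeated appends, with a hypothetical file name; the returned positions grow from 0:

    import numpy
    import bob.io.base

    f = bob.io.base.File('samples.hdf5', 'a')
    for k in range(3):
        pos = f.append(numpy.ones((5,), dtype='float64') * k)
        print(pos)  # prints 0, then 1, then 2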

codec_name

Name of the File class implementation – available for compatibility with previous versions of this library.

describe([all]) → tuple

Returns a description (dtype, shape, stride) of the data in the file.

Parameters:

all
[bool] If set to True, returns the shape and strides for reading the whole file contents in one go.
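
A sketch of both readout descriptions, assuming the 'samples.hdf5' file from the append() example above and that all may be passed positionally:

    import bob.io.base

    f = bob.io.base.File('samples.hdf5', 'r')
    print(f.describe())      # dtype/shape/stride of a single object
    print(f.describe(True))  # dtype/shape/stride of the whole-file readout
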
filename

The path to the file being read/written

read([index]) → numpy.ndarray

Reads a specific object in the file, or the whole file.

Parameters:

index
[int|long, optional] The index of the object one wishes to retrieve from the file. Negative indexing is supported. If not given, implies retrieval of the whole file contents.

This method reads data from the file. If you specified an index, it reads just the object indicated by the index, as you would do using the [] operator. If an index is not specified, reads the whole contents of the file into a numpy.ndarray.
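
A sketch of both access modes, assuming 'samples.hdf5' holds several equally-shaped objects:

    import bob.io.base

    f = bob.io.base.File('samples.hdf5', 'r')
    first = f.read(0)      # a single object, by index
    last = f.read(-1)      # negative indexing is supported
    everything = f.read()  # the whole contents as one numpy.ndarray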

write(array) → None

Writes the contents of an object to the file.

Parameters:

array
[array] The array to be written into the file. It can be a numpy.ndarray, a bob.blitz.array or any other object which can be converted to either of them, as long as its number of dimensions and scalar type are supported by bob.blitz.array.

This method writes data to the file. It behaves as if the given array were the only piece of data that will ever be written to the file. No further appending may happen after a call to this method.
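
A minimal sketch, with a hypothetical file name:

    import numpy
    import bob.io.base

    # 'w' truncates any existing contents before the single write
    f = bob.io.base.File('single.hdf5', 'w')
    f.write(numpy.arange(10, dtype='float64'))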

class bob.io.base.HDF5File

Bases: object

HDF5File(filename, [mode='r']) -> new bob::io::HDF5File

Reads and writes data to HDF5 files.

Constructor parameters:

filename
[str] The file path to the file you want to read from/write to
mode
[str, optional] The opening mode: use 'r' for read-only, 'a' for read/write/append, 'w' for read/write/truncate or 'x' for read/write/exclusive. This flag defaults to 'r'.

HDF5 stands for Hierarchical Data Format version 5. It is a flexible, binary file format that allows one to store and read data efficiently. It is a cross-platform, cross-architecture format.

Objects of this class allow users to read and write data from and to files in HDF5 format. For an introduction to HDF5, visit the HDF5 Website.
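
A minimal round-trip sketch using only the methods documented below (the file name is hypothetical):

    import numpy
    import bob.io.base

    out = bob.io.base.HDF5File('example.hdf5', 'w')
    out.set('array', numpy.zeros((2, 3), dtype='float64'))
    out.close()  # flushes contents to disk

    inp = bob.io.base.HDF5File('example.hdf5', 'r')
    data = inp.read('array')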

append(path, data[, compression=0]) → None

Appends a scalar or an array to a dataset

Parameters:

path
[str] The path to the dataset to append data to. Can be an absolute value (starting with a leading '/') or relative to the current working directory (cwd).
data
[scalar|numpy.ndarray] Object to append to the dataset. This value must be compatible with the typing information on the dataset, or an exception will be raised. You can also, optionally, set this to an iterable of scalars or arrays. This will cause this method to iterate over the elements and add each individually.
compression
This parameter is effective when appending arrays. Set this to a number between 0 (default) and 9 (maximum) to compress the contents of this dataset. This setting is only effective if the dataset does not yet exist; otherwise, the previous setting is respected.
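
A sketch of growing a dataset object by object, with hypothetical file and dataset names and assuming compression may be passed as a keyword; the compression setting only matters on the first call, when the dataset is created:

    import numpy
    import bob.io.base

    h5 = bob.io.base.HDF5File('log.hdf5', 'w')
    h5.append('features', numpy.random.rand(8), compression=9)
    h5.append('features', numpy.random.rand(8))  # previous setting respected
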
cd(path) → None

Changes the current prefix path.

Parameters:

path
[str] The path to change directories to

When this object is created, the prefix path is empty, which means all paths to data objects must be given in full. If you set this to a different value, it will be used as a prefix for any subsequent operation until you reset it. If path starts with '/', it is treated as an absolute path; '..' and '.' are supported. The value must be an str object. If the value is relative, it is appended to the current path; if it is absolute, the prefix is reset. Note that all operations taking a relative path, following a cd(), will be considered relative to the value of the cwd property of this object.
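
A sketch of the prefix-path semantics, with hypothetical file, group and dataset names:

    import bob.io.base

    h5 = bob.io.base.HDF5File('tree.hdf5', 'w')
    h5.create_group('/experiment')
    h5.create_group('/experiment/run1')
    h5.cd('/experiment')  # absolute path: resets the prefix
    h5.cd('run1')         # relative path: appended to the prefix
    h5.set('score', 0.5)  # stored at '/experiment/run1/score'
    h5.cd('..')           # back to '/experiment'
    print(h5.cwd)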

close() → None

Closes this file

This function closes the HDF5File after flushing all its contents to disk. After the HDF5File is closed, any operation on it will result in an exception.

copy(file) → None

Copies all accessible content to another HDF5 file

Parameters:

file
[HDF5File] The file (already opened), to copy the contents to. Unlinked contents of this file will not be copied. This can be used as a method to trim unwanted content in a file.
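
A sketch of the trimming use-case mentioned above (both file names are hypothetical):

    import bob.io.base

    src = bob.io.base.HDF5File('original.hdf5', 'r')
    dst = bob.io.base.HDF5File('trimmed.hdf5', 'w')
    src.copy(dst)  # unlinked contents of 'original.hdf5' are left behind
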
create_group(path) → None

Creates a new path (group) inside the file.

Parameters:

path
[str] The path to check

Creates a new directory (i.e., a group in HDF5 parlance) inside the file. A relative path is taken w.r.t. the current directory. If the directory already exists (check with HDF5File.has_group()), an exception will be raised.
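
A sketch guarding the creation with HDF5File.has_group() to avoid the exception (hypothetical names):

    import bob.io.base

    h5 = bob.io.base.HDF5File('groups.hdf5', 'a')
    if not h5.has_group('/results'):
        h5.create_group('/results')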

cwd

The current working directory set on the file

del_attribute(name[, path='.']) → None

Removes a given attribute at the named resource.

Parameters:

name
[str] The name of the attribute to delete. A RuntimeError is raised if the attribute does not exist.
path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to delete an attribute from. If the path does not exist, a RuntimeError is raised.
del_attributes([attrs=None[, path='.']]) → None

Removes attributes in a given (existing) path

Parameters:

attrs
[list] An iterable containing the names of the attributes to be removed. If not given or set to None, then remove all attributes at the named resource.
path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to delete attributes from. If the path does not exist, a RuntimeError is raised.
describe(path) → tuple

Describes a dataset type/shape, if it exists inside a file

Parameters:

path
[str] The dataset path to describe

If the given path to an HDF5 dataset exists inside the file, this method returns a type description of the objects recorded in that dataset; otherwise, it raises an exception. The returned value is a tuple of tuples (HDF5Type, number-of-objects, expandable) describing how the data may be read using these formats.

filename

str <– The name (and path) of the underlying file on hard disk

flush() → None

Flushes the content of the HDF5 file to disk

When the HDF5File is open for writing, this function synchronizes the contents on disk with the in-memory state of the file. When the file is open for reading, nothing happens.

get_attribute(name[, path='.']) → scalar|numpy.ndarray

Retrieve a given attribute from the named resource.

Parameters:

name
[str] The name of the attribute to retrieve. If the attribute is not available, a RuntimeError is raised.
path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to get an attribute from. If the path does not exist, a RuntimeError is raised.

This method returns a single value corresponding to what is stored inside the attribute container for the given resource. If you would like to retrieve all attributes at once, use HDF5File.get_attributes() instead.

get_attributes([path='.']) → dict

All attributes of the given path organized in dictionary

Parameters:

path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to get all attributes from. If the path does not exist, a RuntimeError is raised.

Attributes are returned in a dictionary in which each key corresponds to the attribute name and each value corresponds to the value stored inside the HDF5 file. To retrieve only a specific attribute, use HDF5File.get_attribute().

has_attribute(name[, path='.']) → bool

Checks existence of a given attribute at the named resource.

Parameters:

name
[str] The name of the attribute to check.
path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to check an attribute at. If the path does not exist, a RuntimeError is raised.
has_dataset(key) → bool

Checks if a dataset exists inside a file

Parameters:

key
[str] The dataset path to check

Checks if a dataset exists inside a file, on the specified path. If the given path is relative, it is taken w.r.t. the current working directory.

has_group(path) → bool

Checks if a path (group) exists inside a file

Parameters:

path
[str] The path to check

Checks if a path (i.e. a group in HDF5 parlance) exists inside a file. This method does not work for datasets, only for directories. If the given path is relative, it is taken w.r.t. the current working directory.

has_key()

x.has_dataset(key) -> bool

Checks if a dataset exists inside a file

Parameters:

key
[str] The dataset path to check

Checks if a dataset exists inside a file, on the specified path. If the given path is relative, it is taken w.r.t. the current working directory.

keys()

x.paths([relative=False]) -> tuple

Lists datasets available inside this file

Parameters:

relative
[bool, optional] if set to True, the returned paths are relative to the current working directory, otherwise they are absolute.

Returns all paths to datasets available inside this file, stored under the current working directory. If relative is set to True, the returned paths are relative to the current working directory, otherwise they are absolute.

lread(key[, pos=-1]) → list|numpy.ndarray

Reads some contents of the dataset.

Parameters:

key
[str] The path to the dataset to read data from. Can be an absolute value (starting with a leading '/') or relative to the current working directory (cwd).
pos
[int, optional] Returns a single object if pos >= 0, otherwise a list by reading all objects in sequence.

This method reads contents from a dataset, treating the N-dimensional dataset like a container for multiple objects with N-1 dimensions. It returns a single numpy.ndarray in case pos is set to a value >= 0, or a list of arrays otherwise.
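
A sketch contrasting lread() with read(), with hypothetical names; three 1D objects are appended, then retrieved:

    import numpy
    import bob.io.base

    h5 = bob.io.base.HDF5File('rows.hdf5', 'w')
    for k in range(3):
        h5.append('rows', numpy.full((4,), float(k)))

    one = h5.lread('rows', 0)  # a single (4,) array
    many = h5.lread('rows')    # a list of three (4,) arrays
    stacked = h5.read('rows')  # the whole dataset read in one go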

paths([relative=False]) → tuple

Lists datasets available inside this file

Parameters:

relative
[bool, optional] if set to True, the returned paths are relative to the current working directory, otherwise they are absolute.

Returns all paths to datasets available inside this file, stored under the current working directory. If relative is set to True, the returned paths are relative to the current working directory, otherwise they are absolute.

read(key[, pos=-1]) → numpy.ndarray

Reads whole datasets from the file.

Parameters:

key
[str] The path to the dataset to read data from. Can be an absolute value (starting with a leading '/') or relative to the current working directory (cwd).
rename(from, to) → None

Renames datasets in a file

Parameters:

from
[str] The path to the data being renamed
to
[str] The new name of the dataset
replace(path, pos, data) → None

Modifies the value of a scalar/array in a dataset.

Parameters:

path
[str] The path to the dataset to write data to. Can be an absolute value (starting with a leading '/') or relative to the current working directory (cwd).
pos
[int] Position, within the dataset, of the object to be replaced. The object position on the dataset must exist, or an exception is raised.
data
[scalar|numpy.ndarray] Object to replace the value with. This value must be compatible with the typing information on the dataset, or an exception will be raised.
set(path, data[, compression=0]) → None

Sets the scalar or array at position 0 to the given value.

Parameters:

path
[str] The path to the dataset to write data to. Can be an absolute value (starting with a leading '/') or relative to the current working directory (cwd).
data
[scalar|numpy.ndarray] Object to write to the dataset. This value must be compatible with the typing information on the dataset, or an exception will be raised. You can also, optionally, set this to an iterable of scalars or arrays. This will cause this method to iterate over the elements and add each individually.
compression
This parameter is effective when writing arrays. Set this to a number between 0 (default) and 9 (maximum) to compress the contents of this dataset. This setting is only effective if the dataset does not yet exist; otherwise, the previous setting is respected.

This method is equivalent to checking if the scalar or array at position 0 exists and then replacing it. If the path does not exist, we append the new scalar or array.
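
A sketch of set(), append() and replace() interacting on the same dataset (hypothetical names):

    import bob.io.base

    h5 = bob.io.base.HDF5File('scores.hdf5', 'w')
    h5.set('score', 1.0)         # creates the dataset; value at position 0
    h5.set('score', 2.0)         # replaces the value at position 0
    h5.append('score', 3.0)      # adds a value at position 1
    h5.replace('score', 1, 4.0)  # replaces the value at position 1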

set_attribute(name, value[, path='.']) → None

Sets a given attribute at the named resource.

Parameters:

name
[str] The name of the attribute to set.
value
[scalar|numpy.ndarray] A simple scalar to set for the given attribute on the named resource (path). Only simple scalars (booleans, integers, floats and complex numbers) and arrays of those are supported for the time being. You can use numpy scalars to set values with arbitrary precision (e.g. numpy.uint8).
path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to set an attribute at.

Warning

Attributes in HDF5 files are supposed to be small containers or simple scalars that provide extra information about the data stored on the main resource (dataset or group|directory). Attributes cannot be retrieved in chunks, contrary to data in datasets.

Currently, no limitation is imposed on the size of values stored in attributes.
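
A sketch of setting and reading back attributes on a dataset, with hypothetical names and assuming path may be passed as a keyword:

    import numpy
    import bob.io.base

    h5 = bob.io.base.HDF5File('annotated.hdf5', 'w')
    h5.set('data', numpy.zeros((4,)))
    h5.set_attribute('version', numpy.uint8(2), path='data')
    print(h5.get_attribute('version', path='data'))  # 2
    print(h5.get_attributes(path='data'))            # {'version': 2}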

set_attributes(attrs[, path='.']) → None

Sets attributes in a given (existing) path using a dictionary

Parameters:

attrs
[dict] A python dictionary containing pairs of strings and values. Each value in the dictionary should be a simple scalar (boolean, integer, float or complex number) or an array of those; only these are supported for the time being. You can use numpy scalars to set values with arbitrary precision (e.g. numpy.uint8).
path
[str, optional] The path leading to the resource (dataset or group|directory) you would like to set attributes at.

Warning

Attributes in HDF5 files are supposed to be small containers or simple scalars that provide extra information about the data stored on the main resource (dataset or group|directory). Attributes cannot be retrieved in chunks, contrary to data in datasets.

Currently, no limitation is imposed on the size of values stored in attributes.

sub_groups([relative=False[, recursive=True]]) → tuple

Lists groups (directories) in the current file.

Parameters:

relative
[bool, optional] if set to True, the returned sub-groups are relative to the current working directory, otherwise they are absolute.
recursive
[bool, optional] if set to False, the returned sub-groups are only the ones in the current directory. Otherwise, recurse down the directory structure.

unlink(key) → None

Unlinks datasets inside the file, making them invisible.

Parameters:

key
[str] The dataset path to unlink

If a given path to an HDF5 dataset exists inside the file, unlinks it. Please note this will not remove the data from the file, just make it inaccessible. If you wish to clean up, save the reachable objects from this file to another HDF5File object using copy(), for example.

writable

bool <– Has this file been opened in writable mode?

Functions

bob.io.base.load(inputs)[source]

Loads the contents of a file, an iterable of files, or an iterable of bob.io.File objects into a numpy.ndarray.

Parameters:

inputs

This might represent several different entities:

  1. The name of a file (full path) from which to load the data. In this case, this function assumes the file contains an array and returns the loaded numpy.ndarray.
  2. An iterable of filenames to be loaded in memory. In this case, each file is assumed to contain a single 1D sample or a set of 1D samples, which are loaded in memory and concatenated into a single 2D numpy.ndarray, which is returned.
  3. An iterable of bob.io.File objects. In this case, each bob.io.File is assumed to contain a single 1D sample or a set of 1D samples, which are loaded in memory if required and concatenated into a single 2D numpy.ndarray, which is returned.
  4. An iterable mixing filenames and bob.io.File objects. In this case, a 2D numpy.ndarray is returned, as described by points 2 and 3 above.
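
A sketch of the four accepted input forms (all file names are hypothetical; merge() is documented next):

    import bob.io.base

    a = bob.io.base.load('sample.hdf5')           # case 1: one ndarray
    b = bob.io.base.load(['s1.hdf5', 's2.hdf5'])  # case 2: 2D ndarray
    files = bob.io.base.merge(['s1.hdf5', 's2.hdf5'])
    c = bob.io.base.load(files)                   # case 3: 2D ndarray
    d = bob.io.base.load(['s1.hdf5'] + list(files))  # case 4: mixed
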
bob.io.base.merge(filenames)[source]

Converts an iterable of filenames into an iterable of read-only bob.io.File objects.

Parameters:

filenames

This might represent:

  1. A single filename. In this case, an iterable with a single bob.io.File is returned.
  2. An iterable of filenames to be converted into an iterable of bob.io.File objects.
bob.io.base.save(array, filename, create_directories=False)[source]

Saves the contents of an array-like object to file.

Effectively, this is the same as creating a bob.io.File object with the mode flag set to 'w' (write with truncation) and calling bob.io.File.write() passing array as parameter.

Parameters:

array
The array-like object to be saved on the file
filename
The name of the file where the contents will be saved
create_directories
Automatically generate the directories if required
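
A minimal sketch (the output path is hypothetical):

    import numpy
    import bob.io.base

    data = numpy.random.rand(3, 4)
    bob.io.base.save(data, 'out/run1/data.hdf5', create_directories=True)
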
bob.io.base.append(array, filename)[source]

Appends the contents of an array-like object to file.

Effectively, this is the same as creating a bob.io.File object with the mode flag set to 'a' (append) and calling bob.io.File.append() passing array as parameter.

Parameters:

array
The array-like object to be saved on the file
filename
The name of the file where the contents will be saved
bob.io.base.peek(filename)[source]

Returns the type of array (frame or sample) saved in the given file.

Effectively, this is the same as creating a bob.io.File object with the mode flag set to 'r' (read-only) and returning bob.io.File.describe().

Parameters:

filename
The name of the file to peek information from
bob.io.base.peek_all(filename)[source]

Returns the type of array (for full readouts) saved in the given file.

Effectively, this is the same as creating a bob.io.File object with the mode flag set to 'r' (read-only) and returning bob.io.File.describe(all=True).

Parameters:

filename
The name of the file to peek information from
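
A sketch comparing both helpers, assuming 'samples.hdf5' holds several equally-shaped 1D objects:

    import bob.io.base

    print(bob.io.base.peek('samples.hdf5'))      # description of one object
    print(bob.io.base.peek_all('samples.hdf5'))  # description of the full readout
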
bob.io.base.create_directories_save(directory, dryrun=False)[source]

Creates a directory if it does not exist, with concurrent access support. This function will also create any parent directories that might be required. If the dryrun option is selected, it does not actually create the directory, but just writes the (Linux) command that would have been executed.

Parameters:

directory
The directory that you want to create.
dryrun
Only write the command, but do not execute it.
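
A minimal sketch (the directory names are hypothetical):

    import bob.io.base

    bob.io.base.create_directories_save('results/exp1')
    bob.io.base.create_directories_save('results/exp2', dryrun=True)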

C++ API Helpers

bob.io.base.get_include()[source]

Returns the directory containing the C/C++ API include directives
