This section includes information for using the pure Python API of bob.learn.linear.
Bases: object
This machine is designed to classify image difference vectors to be either intrapersonal or extrapersonal
There are two possible implementations of the BIC:
Which kind of machine is used depends on how this class is trained via bob.learn.linear.BICTrainer.
[Teixeira2003] Marcio Luis Teixeira. The Bayesian intrapersonal/extrapersonal classifier, Colorado State University, 2003.
[Guenther2009] Manuel Guenther and Rolf P. Wuertz. Face detection and recognition using maximum likelihood classifiers on Gabor graphs, International Journal of Pattern Recognition and Artificial Intelligence, 23(3):433-461, 2009.
Constructor Documentation:
- bob.learn.linear.BICMachine ([use_DFFS])
- bob.learn.linear.BICMachine (bic)
- bob.learn.linear.BICMachine (hdf5)
Creates a BIC Machine
Parameters:
use_DFFS : bool
[default: False] Use the Distance From Feature Space measure as described in [Teixeira2003]
bic : bob.learn.linear.BICMachine
Another BICMachine to copy
hdf5 : bob.io.base.HDF5File
An HDF5 file open for reading to load the machine from
Class Members:
Computes the BIC or IEC score for the given input vector, which results from comparing two (facial) images
The resulting value is returned as a single float value. The score itself is the log-likelihood score of the given input vector belonging to the intrapersonal class.
Note
The __call__() function is an alias for this function.
Parameters:
input : array_like (float, 1D)
The input vector, which is the result of comparing two (facial) images
Returns:
score : float
The log-likelihood that the given input belongs to the intrapersonal class
Compares this BICMachine with the other one to be approximately the same
The optional values r_epsilon and a_epsilon refer to the relative and absolute precision, similarly to numpy.allclose().
Parameters:
other : bob.learn.linear.BICMachine
The other BICMachine to compare with
r_epsilon : float
[Default: 1e-5] The relative precision
a_epsilon : float
[Default: 1e-8] The absolute precision
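The text compares the tolerance handling to numpy.allclose(). As a rough illustration of that rule only, the sketch below assumes that r_epsilon and a_epsilon play the roles of NumPy's rtol and atol; the helper name `approximately_equal` is hypothetical and is not part of bob.learn.linear.

```python
import numpy as np

def approximately_equal(a, b, r_epsilon=1e-5, a_epsilon=1e-8):
    """Element-wise test |a - b| <= a_epsilon + r_epsilon * |b| (the numpy.allclose rule)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return bool(np.all(np.abs(a - b) <= a_epsilon + r_epsilon * np.abs(b)))

reference = np.array([1.0, 100.0, 1e-9])
# Deviations that stay inside the combined relative and absolute tolerance
close = reference + np.array([5e-6, 5e-4, 5e-9])
# A deviation in the first element that is far outside the tolerance
far = reference + np.array([1e-3, 0.0, 0.0])

same_close = approximately_equal(close, reference)
same_far = approximately_equal(far, reference)
```

The relative term dominates for large values, while the absolute term keeps the comparison meaningful for values near zero.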
Loads the BIC machine from the given HDF5 file
Parameters:
hdf5 : bob.io.base.HDF5File
An HDF5 file opened for reading
Saves the BIC machine to the given HDF5 file
Parameters:
hdf5 : bob.io.base.HDF5File
An HDF5 file open for writing
bool <– Use the Distance From Feature Space during forwarding?
Bases: object
A trainer for a bob.learn.linear.BICMachine
It trains either a BIC model (including projection matrix and eigenvalues) [Teixeira2003] or an IEC model (containing mean and variance only) [Guenther2009]. See bob.learn.linear.BICMachine for more details.
Constructor Documentation:
- bob.learn.linear.BICTrainer ()
- bob.learn.linear.BICTrainer (intra_dim, extra_dim)
Creates a BIC Trainer
There are two ways of creating a BIC trainer. When you specify the intra_dim and extra_dim subspaces, a BIC model will be created, otherwise an IEC model is created.
Parameters:
intra_dim : int
The subspace dimensionality of the intrapersonal class
extra_dim : int
The subspace dimensionality of the extrapersonal class
Class Members:
Trains the given machine to classify intrapersonal (image) difference vectors vs. extrapersonal ones
The given difference vectors may be the result of any (image) comparison function, e.g., the pixel difference of two images. In any case, all difference vectors must have the same length.
Parameters:
intra_differences : array_like (float, 2D)
The input vectors, which are the result of intrapersonal (facial image) comparisons, in shape (#features, length)
extra_differences : array_like (float, 2D)
The input vectors, which are the result of extrapersonal (facial image) comparisons, in shape (#features, length)
machine : bob.learn.linear.BICMachine
The machine to be trained
Returns:
machine : bob.learn.linear.BICMachine
A newly generated and trained BIC machine, where the bob.learn.linear.BICMachine.use_DFFS flag is set to False
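To make the IEC variant more concrete, here is a minimal NumPy sketch that assumes the model consists of a per-dimension mean and variance for each class, as the text states. The helper names `fit_iec` and `iec_score` are hypothetical, and the score shown is a log-likelihood ratio between the two classes, which may differ from the exact quantity the library computes.

```python
import numpy as np

def fit_iec(intra_differences, extra_differences):
    """Estimate per-dimension mean and variance for both classes (hypothetical helper)."""
    intra = np.asarray(intra_differences, dtype=float)
    extra = np.asarray(extra_differences, dtype=float)
    if intra.shape[1] != extra.shape[1]:
        raise ValueError("all difference vectors must have the same length")
    return {
        "intra": (intra.mean(axis=0), intra.var(axis=0, ddof=1)),
        "extra": (extra.mean(axis=0), extra.var(axis=0, ddof=1)),
    }

def diag_gaussian_loglik(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def iec_score(model, x):
    """Log-likelihood ratio; positive values favour the intrapersonal class."""
    x = np.asarray(x, dtype=float)
    return float(diag_gaussian_loglik(x, *model["intra"])
                 - diag_gaussian_loglik(x, *model["extra"]))

rng = np.random.default_rng(0)
intra = rng.normal(0.0, 0.5, size=(200, 4))  # same person: small differences
extra = rng.normal(0.0, 3.0, size=(200, 4))  # different persons: large differences
model = fit_iec(intra, extra)

score_small = iec_score(model, np.full(4, 0.1))  # looks intrapersonal
score_large = iec_score(model, np.full(4, 5.0))  # looks extrapersonal
```

In this toy setup a small difference vector receives a positive score and a large one a negative score.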
Bases: object
CGLogRegTrainer(other) -> new CGLogRegTrainer
Trains a linear machine to perform Linear Logistic Regression.
There are two initializers for objects of this class. In the first variant, the user passes the discrete training parameters, including the classes prior, convergence threshold and the maximum number of conjugate gradient (CG) iterations among other parameters. The second initialization form copy constructs a new trainer from an existing one.
The training stage will place the resulting weights (and bias) in a linear machine with a single output dimension. If the parameter mean_std_norm is set to True, then your input data will be mean/standard-deviation normalized and the corresponding values will be set as normalization factors in the resulting machine.
Keyword parameters:
References:
The convergence threshold for the conjugate gradient algorithm
The regularization factor lambda. If you set this to the value of 0.0 (the default), then the algorithm will apply no regularization whatsoever.
The maximum number of iterations for the conjugate gradient algorithm
Performs mean and standard-deviation normalization (whitening) of the input data before training the (resulting) Machine. Setting this to True is recommended for large data sets with significant amplitude variations between dimensions.
The synthetic prior (should lie in the open interval ]0., 1.[, that is, strictly between 0 and 1).
Trains a linear machine to perform linear logistic regression.
The resulting machine will have the same number of inputs as columns in negatives and positives and a single output.
Keyword parameters:
This method always returns a machine, which will be the same as the one provided (if the user passed one) or a new one allocated internally.
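The following is only a sketch of regularized linear logistic regression with optional mean/standard-deviation normalization. It uses plain gradient descent rather than the conjugate gradient method this trainer is named after, and it ignores the synthetic prior. The names `train_logreg`, `logit`, `reg`, and `learning_rate` are hypothetical.

```python
import numpy as np

def train_logreg(negatives, positives, reg=0.0, mean_std_norm=False,
                 learning_rate=0.1, max_iterations=2000,
                 convergence_threshold=1e-6):
    """Fit weights and a bias for a single-output linear model (gradient descent)."""
    neg = np.asarray(negatives, dtype=float)
    pos = np.asarray(positives, dtype=float)
    X = np.vstack([neg, pos])
    y = np.concatenate([np.zeros(len(neg)), np.ones(len(pos))])

    # Optional normalization; the factors are returned so they can be reused
    if mean_std_norm:
        subtract, divide = X.mean(axis=0), X.std(axis=0, ddof=1)
    else:
        subtract, divide = np.zeros(X.shape[1]), np.ones(X.shape[1])
    Z = (X - subtract) / divide

    weights, bias = np.zeros(X.shape[1]), 0.0
    for _ in range(max_iterations):
        p = 1.0 / (1.0 + np.exp(-(Z @ weights + bias)))
        grad_w = Z.T @ (p - y) / len(y) + reg * weights  # L2 penalty on weights
        grad_b = np.mean(p - y)
        weights -= learning_rate * grad_w
        bias -= learning_rate * grad_b
        if max(np.max(np.abs(grad_w)), abs(grad_b)) < convergence_threshold:
            break
    return weights, bias, subtract, divide

def logit(x, weights, bias, subtract, divide):
    """Single linear output for one sample (1D) or several samples (2D)."""
    return ((np.asarray(x, dtype=float) - subtract) / divide) @ weights + bias

rng = np.random.default_rng(1)
negatives = rng.normal(-2.0, 1.0, size=(100, 3))
positives = rng.normal(+2.0, 1.0, size=(100, 3))
params = train_logreg(negatives, positives, reg=0.01, mean_std_norm=True)
neg_scores = logit(negatives, *params)
pos_scores = logit(positives, *params)
```

As in the description above, the number of weights equals the number of columns in negatives and positives, and there is a single output per sample.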
Bases: object
FisherLDATrainer([use_pinv=False [, strip_to_rank=True]]) -> new FisherLDATrainer FisherLDATrainer(other) -> new FisherLDATrainer
Trains a bob.learn.linear.Machine to perform Fisher’s Linear Discriminant Analysis (LDA).
Objects of this class can be initialized in two ways. In the first variant, the user creates a new trainer from discrete flags indicating a couple of optional parameters:
use_pinv (bool) - defaults to False
If set to True, use the pseudo-inverse to calculate S_w^{-1} S_b and then perform eigen value decomposition (using LAPACK’s dgeev) instead of using (the more numerically stable) LAPACK’s dsygvd to solve the generalized symmetric-definite eigenproblem of the form S_b v=(\lambda) S_w v.
Note
Using the pseudo-inverse for LDA is only recommended if you cannot make it work using the default method (via dsygvd). It is slower and requires more machine memory to store partial values of the pseudo-inverse and the dot product S_w^{-1} S_b.
strip_to_rank (bool) - defaults to True
Specifies how to calculate the final size of the to-be-trained bob.learn.linear.Machine. The default setting (True) makes the trainer return only the K-1 eigen-values/vectors, limiting the output to the rank of S_w^{-1} S_b. If you set this value to False, then it returns all eigen-values/vectors of S_w^{-1} S_b, including the ones that are supposed to be zero.
The second initialization variant allows the user to deep copy an object of the same type creating a new identical object.
LDA finds the projection matrix W that allows us to linearly project the data matrix X to another (sub) space in which the between-class and within-class variances are jointly optimized: the between-class variance is maximized while the within-class variance is minimized. The (inverse) cost function for this criterion can be posed as the following:
J(W) = \frac{W^T S_b W}{W^T S_w W}
where:
W
the transformation matrix that converts X into the LD space
S_b
the between-class scatter; it has dimensions (X.shape[0], X.shape[0]) and is defined as S_b = \sum_{k=1}^K N_k (m_k-m)(m_k-m)^T, with K equal to the number of classes.
S_w
the within-class scatter; it also has dimensions (X.shape[0], X.shape[0]) and is defined as S_w = \sum_{k=1}^K \sum_{n \in C_k} (x_n-m_k)(x_n-m_k)^T, with K equal to the number of classes and C_k a set representing all samples for class k.
m_k
the class k empirical mean, defined as m_k = \frac{1}{N_k}\sum_{n \in C_k} x_n
m
the overall set empirical mean, defined as m = \frac{1}{N}\sum_{n=1}^N x_n = \frac{1}{N}\sum_{k=1}^K N_k m_k
Note
A scatter matrix equals the covariance matrix if we remove the division factor.
Because this cost function is convex, you can just find its maximum by solving dJ/dW = 0. This problem can be re-formulated as finding the eigen values (\lambda_i) that solve the following condition:
S_b v_i = \lambda_i S_w v_i \text{ or } (S_b - \lambda_i S_w) v_i = 0
The eigen vectors v_i that correspond to the eigen values \lambda_i form the columns of W.
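The scatter matrices and the generalized eigenproblem above can be sketched with NumPy alone. This sketch assumes that each class is a 2D array with samples as rows and that S_w is positive definite; it reduces the problem to a standard symmetric one through a Cholesky factor of S_w, which is neither of the two LAPACK paths described for this trainer. The function names are hypothetical.

```python
import numpy as np

def scatter_matrices(classes):
    """Return (S_w, S_b) for a sequence of 2D arrays, one per class, samples as rows."""
    classes = [np.asarray(c, dtype=float) for c in classes]
    overall_mean = np.vstack(classes).mean(axis=0)
    n_features = classes[0].shape[1]
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for c in classes:
        class_mean = c.mean(axis=0)
        centered = c - class_mean
        S_w += centered.T @ centered
        d = (class_mean - overall_mean)[:, None]
        S_b += len(c) * (d @ d.T)
    return S_w, S_b

def fisher_lda(classes):
    """Solve S_b v = lambda S_w v and return eigenvalues (descending) and vectors."""
    S_w, S_b = scatter_matrices(classes)
    L = np.linalg.cholesky(S_w)          # S_w = L L^T
    L_inv = np.linalg.inv(L)
    eigenvalues, U = np.linalg.eigh(L_inv @ S_b @ L_inv.T)
    order = np.argsort(eigenvalues)[::-1]
    W = L_inv.T @ U[:, order]            # back-substitution: v = L^{-T} u
    return eigenvalues[order], W

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0, 0.0],
                    [3.0, 0.0, 0.0, 0.0],
                    [0.0, 3.0, 0.0, 0.0]])
classes = [center + rng.normal(size=(50, 4)) for center in centers]
eigenvalues, W = fisher_lda(classes)
```

With K = 3 classes, only the first K-1 = 2 eigenvalues are meaningfully different from zero, which is the situation the strip_to_rank option addresses.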
Compares this FisherLDATrainer with the other one to be approximately the same.
The optional values r_epsilon and a_epsilon refer to the relative and absolute precision for the weights, biases and any other values internal to this machine.
Returns the expected size of the output (or the number of eigen-values returned) given the data.
This number could be either K-1 (where K is number of classes) or the number of columns (features) in X, depending on the setting of strip_to_rank.
This method should be used to setup linear machines and input vectors prior to feeding them into this trainer.
The value of X should be a sequence over as many 2D 64-bit floating point number arrays as classes in the problem. All arrays will be checked for conformance (identical number of columns). To accomplish this, either prepare a list with all your class observations organised in 2D arrays or pass a 3D array in which the first dimension (depth) contains as many elements as classes you want to discriminate.
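As an illustration of the size rule and the two accepted input layouts, the sketch below mirrors the description in plain NumPy. The function `lda_output_size` is a hypothetical stand-in written from the text, not the library method.

```python
import numpy as np

def lda_output_size(X, strip_to_rank=True):
    """Expected number of outputs: K-1 if strip_to_rank, else the number of features."""
    classes = [np.asarray(c, dtype=np.float64) for c in X]
    if any(c.ndim != 2 for c in classes):
        raise ValueError("every class must be a 2D array")
    n_features = classes[0].shape[1]
    if any(c.shape[1] != n_features for c in classes):
        raise ValueError("all classes must have the same number of columns")
    return len(classes) - 1 if strip_to_rank else n_features

# A list of 2D arrays: classes may have different numbers of samples
as_list = [np.zeros((10, 5)), np.zeros((7, 5)), np.zeros((12, 5))]
# A 3D array: the first dimension enumerates the classes
as_3d = np.zeros((3, 10, 5))

size_list = lda_output_size(as_list)
size_all = lda_output_size(as_list, strip_to_rank=False)
size_3d = lda_output_size(as_3d)
```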
If the use_svd flag is enabled, this flag will indicate which LAPACK SVD function to use (dgesvd if set to True, dgesdd otherwise). By default, this flag is set to False upon construction, which makes this trainer use the fastest possible SVD decomposition.
Trains a given machine to perform Fisher/LDA discrimination.
After this method has been called, an input machine (or one allocated internally) will have the eigen-vectors of the S_w^{-1} S_b product, arranged by decreasing energy. Each input data set represents data from a given input class. This method also returns the eigen values allowing you to implement your own compression scheme.
The user may optionally provide an object of type bob.learn.linear.Machine that will be set by this method. If provided, machine should have the correct number of inputs and outputs matching, respectively, the number of columns in the input data arrays X and the output of the method bob.learn.linear.FisherLDATrainer.output_size() (see help).
The value of X should be a sequence over as many 2D 64-bit floating point number arrays as classes in the problem. All arrays will be checked for conformance (identical number of columns). To accomplish this, either prepare a list with all your class observations organised in 2D arrays or pass a 3D array in which the first dimension (depth) contains as many elements as classes you want to discriminate.
Note
We set at most bob.learn.linear.FisherLDATrainer.output_size() eigen-values and vectors on the passed machine. You can compress the machine output further using bob.learn.linear.Machine.resize() if necessary.
If True, use the pseudo-inverse to calculate S_w^{-1} S_b and then perform the eigen value decomposition (using LAPACK’s dgeev) instead of using (the more numerically stable) LAPACK’s dsygvd to solve the generalized symmetric-definite eigenproblem of the form S_b v=(\lambda) S_w v
Bases: object
Machine([input_size=0, [output_size=0]]) Machine(weights) Machine(config) Machine(other)
A linear classifier. See C. M. Bishop, ‘Pattern Recognition and Machine Learning’, chapter 4 for more details. The basic matrix operation performed for projecting the input to the output is: o = w \times i (with w being the machine weights and i the input data vector). The weights matrix is therefore organized column-wise. In this scheme, each column of the weights matrix can be interpreted as a vector onto which the input is projected. The number of columns of the weights matrix determines the number of outputs this linear machine will have. The number of rows determines the number of inputs it can process.
Input and output is always performed on 1D arrays with 64-bit floating point numbers.
A linear machine can be constructed in different ways. In the first form, the user specifies optional input and output vector sizes. The machine remains uninitialized. With the second form, the user passes a 2D array with 64-bit floats containing the weight matrix to be used by the new machine. In the third form, the user passes a pre-opened HDF5 file pointing to the machine information to be loaded in memory. Finally, in the last form (copy constructor), the user passes another Machine that will be fully copied.
The activation function - by default, the identity function. The output provided by the activation function is passed, unchanged, to the user.
Bias to the output units of this linear machine, to be added to the output before activation.
Projects input through its internal weights and biases. If output is provided, place output there instead of allocating a new array.
The input (and output) arrays can be either 1D or 2D 64-bit float arrays. If one provides a 1D array, the output array, if provided, should also be 1D, matching the output size of this machine. If one provides a 2D array, it is considered a set of vertically stacked 1D arrays (one input per row) and a 2D array is produced or expected in output. The output array in this case shall have the same number of rows as the input array and as many columns as the output size for this machine.
Note
This method only accepts 64-bit float arrays as input or output.
Input division factor, before feeding data through the weight matrix W. The division is applied just after subtraction - by default, it is set to 1.0.
Input subtraction factor, before feeding data through the weight matrix W. The subtraction is the first applied operation in the processing chain - by default, it is set to 0.0.
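Putting the attribute descriptions together, the processing chain appears to be: subtract, divide, project through the weights, add the bias, then apply the activation. The NumPy sketch below follows that reading; the function `forward` and its argument names are illustrative assumptions rather than the library implementation.

```python
import numpy as np

def forward(x, weights, biases=None, input_subtract=None, input_divide=None,
            activation=None):
    """Project 1D or 2D float64 input through a (n_inputs, n_outputs) weight matrix."""
    x = np.asarray(x, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    n_inputs, n_outputs = weights.shape
    if x.ndim not in (1, 2) or x.shape[-1] != n_inputs:
        raise ValueError("input must be 1D or 2D with %d columns" % n_inputs)

    subtract = np.zeros(n_inputs) if input_subtract is None else np.asarray(input_subtract)
    divide = np.ones(n_inputs) if input_divide is None else np.asarray(input_divide)
    bias = np.zeros(n_outputs) if biases is None else np.asarray(biases)

    normalized = (x - subtract) / divide    # subtraction first, then division
    projected = normalized @ weights + bias  # bias is added before activation
    return projected if activation is None else activation(projected)

weights = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])            # 3 inputs (rows), 2 outputs (columns)
x = np.array([1.0, 2.0, 3.0])

single = forward(x, weights, biases=[0.5, -0.5],
                 input_subtract=[1.0, 1.0, 1.0], input_divide=[1.0, 1.0, 2.0])
batch = forward(np.vstack([x, x]), weights, biases=[0.5, -0.5],
                input_subtract=[1.0, 1.0, 1.0], input_divide=[1.0, 1.0, 2.0])
```

A 1D input yields a 1D output of length 2, while the 2D input with two rows yields a (2, 2) output with one result per row.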
Compares this LinearMachine with the other one to be approximately the same.
The optional values r_epsilon and a_epsilon refer to the relative and absolute precision for the weights, biases and any other values internal to this machine.
Loads itself from a bob.io.base.HDF5File
Resizes the machine. If either the input or output increases in size, the weights and other factors should be considered uninitialized. If the size is preserved or reduced, already initialized values will not be changed.
Note
Use this method to force data compression. This works as long as the most relevant factors to be preserved are organized at the top of the weight matrix. In this way, reducing the system size will suppress the less relevant projections.
Saves itself to a bob.io.base.HDF5File
A tuple that represents the size of the input vector followed by the size of the output vector in the format (input, output).
Weight matrix onto which the input is projected. The result of the projection is subject to bias and activation before being output.
Bases: object
PCATrainer([use_svd=True]) -> new PCATrainer
PCATrainer(other) -> new PCATrainer
Sets a linear machine to perform the Principal Component Analysis (a.k.a. Karhunen-Loeve Transform) on a given dataset using either Singular Value Decomposition (SVD, the default) or the Covariance Matrix Method.
The training stage will place the resulting principal components in the linear machine and set it up to extract the variable means automatically. As an option, you may preset the trainer so that the normalization performed by the resulting linear machine also divides the variables by the standard deviation of each variable ensemble.
There are two initializers for objects of this class. In the first variant, the user can pass a flag indicating if the trainer should use SVD (default) or the covariance method for PCA extraction. The second initialization form copy constructs a new trainer from an existing one.
The principal components correspond to the directions of the data along which its points are maximally spread.
Computing these principal components is equivalent to computing the eigen vectors U for the covariance matrix Sigma extracted from the data matrix X. The covariance matrix for the data is computed using the equation below:
\Sigma = \frac{(X-\mu_X)^T(X-\mu_X)}{m-1} \text{ with } \mu_X = \frac{1}{m}\sum_{i=1}^{m} x_i
where m is the number of rows in X (that is, the number of samples).
Once you are in possession of \Sigma, it suffices to compute the eigen vectors U, solving the linear equation:
(\Sigma - e I) U = 0
In this trainer, we make use of LAPACK’s dsyevd to solve the above equation, if you choose to use the Covariance Method for extracting the principal components of your data matrix X.
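A minimal sketch of the covariance method, assuming X holds one sample per row. It uses numpy.linalg.eigh for the symmetric eigenproblem; the function name `pca_covariance` is hypothetical and the code is an illustration, not the trainer's implementation.

```python
import numpy as np

def pca_covariance(X):
    """PCA through the eigen decomposition of the sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]                       # number of samples (rows)
    mean = X.mean(axis=0)
    centered = X - mean
    sigma = centered.T @ centered / (m - 1)
    eigenvalues, U = np.linalg.eigh(sigma)   # returned in ascending order
    order = np.argsort(eigenvalues)[::-1]    # re-sort to descending order
    return mean, eigenvalues[order], U[:, order]

rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
X = rng.normal(size=(50, 3)) @ mixing
mean, eigenvalues, U = pca_covariance(X)
```

Each column of U is one principal direction, ordered from the largest to the smallest eigenvalue.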
By default though, this class will perform PC extraction using SVD. SVD is a factorization technique that allows for the decomposition of a matrix X, with size (m,n) into 3 other matrices in this way:
X = U S V^*
where:
We can use this property to avoid the computation of the covariance matrix of the data matrix X, if we note the following:
X = U S V^* \text{, so}
XX^T = U S V^* V S U^*
XX^T = U S^2 U^*
If X has zero mean, we can conclude by inspection that the U matrix obtained by SVD contains the eigen vectors of the covariance matrix of X (XX^T) and S^2/(m-1) corresponds to its eigen values.
Note
Our implementation uses LAPACK’s dgesdd to compute the solution to this linear equation.
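The SVD route can be sketched in the same way. Note the layout assumption: with samples stored as rows, the covariance is proportional to X^T X, so the eigen vectors come from the right singular vectors (the rows of `Vt`); in the text's notation, where the covariance is written as XX^T, that role is played by U. The function name `pca_svd` is hypothetical.

```python
import numpy as np

def pca_svd(X):
    """PCA through the SVD of the mean-centered data, samples as rows."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    mean = X.mean(axis=0)
    # Singular values are returned in descending order
    _, singular_values, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigenvalues = singular_values ** 2 / (m - 1)
    return mean, eigenvalues, Vt.T

rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
X = rng.normal(size=(50, 3)) @ mixing
mean, eigenvalues, components = pca_svd(X)
```

The eigenvalues obtained this way agree with those of the covariance matrix, without that matrix ever being formed explicitly.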
The corresponding bob.learn.linear.Machine and returned eigen-values of \Sigma, are pre-sorted in descending order (the first eigen-vector - or column - of the weight matrix in the bob.learn.linear.Machine corresponds to the highest eigen value obtained).
Note
One question you should pose yourself is which of the methods to choose. Here is some advice: you should prefer the covariance method over SVD when the number of samples (rows of X) is greater than the number of features (columns of X). It provides a faster execution path in that case. Otherwise, use the default SVD method.
References:
Compares this PCATrainer with the other one to be approximately the same.
The optional values r_epsilon and a_epsilon refer to the relative and absolute precision for the weights, biases and any other values internal to this machine.
Calculates the maximum possible rank for the covariance matrix of X, given X.
Returns the maximum number of non-zero eigen values that can be generated by this trainer, given some data. This number (K) depends on the size of X and is calculated as follows K=\min{(S-1,F)}, with S being the number of rows in data (samples) and F the number of columns (or features).
This method should be used to setup linear machines and input vectors prior to feeding them into this trainer.
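The rank bound K=\min{(S-1,F)} is easy to check numerically. The sketch below uses a hypothetical helper, `pca_output_size`, written from the formula in the text, and compares it with the numerical rank of the covariance matrix of random data.

```python
import numpy as np

def pca_output_size(X):
    """Maximum number of non-zero eigen values: min(samples - 1, features)."""
    n_samples, n_features = np.asarray(X).shape
    return min(n_samples - 1, n_features)

rng = np.random.default_rng(0)
few_samples = rng.normal(size=(4, 10))    # fewer samples than features
many_samples = rng.normal(size=(100, 5))  # more samples than features

k_few = pca_output_size(few_samples)
k_many = pca_output_size(many_samples)
rank_few = int(np.linalg.matrix_rank(np.cov(few_samples, rowvar=False)))
rank_many = int(np.linalg.matrix_rank(np.cov(many_samples, rowvar=False)))
```

Mean subtraction removes one degree of freedom, which is why S samples can span at most S-1 directions.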
If the use_svd flag is enabled, this flag will indicate which LAPACK SVD function to use (dgesvd if set to True, dgesdd otherwise). By default, this flag is set to False upon construction, which makes this trainer use the fastest possible SVD decomposition.
Trains a linear machine to perform the KLT.
The resulting machine will have the same number of inputs as columns in X and K eigen-vectors, where K=\min{(S-1,F)}, with S being the number of rows in X (samples) and F the number of columns (or features). The vectors are arranged by decreasing eigen-value automatically. You don’t need to sort the results.
The user may optionally provide an object of type bob.learn.linear.Machine that will be set by this method. If provided, machine should have the correct number of inputs and outputs matching, respectively, the number of columns in the input data array X and the output of the method bob.learn.linear.PCATrainer.output_size() (see help).
The input data matrix X should correspond to a 64-bit floating point array organized in such a way that every row corresponds to a new observation of the phenomenon (i.e., a new sample) and every column corresponds to a different feature.
This method returns a tuple consisting of the trained machine and a 1D 64-bit floating point array containing the eigen-values calculated while computing the KLT. The eigen-value ordering matches that of eigen-vectors set in the machine.
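Because the eigen-values come back sorted, one possible use is to keep only as many components as are needed to retain a given fraction of the total variance. The sketch below shows that idea on a plain weight array; the helper `components_for_energy` and the 95% threshold are illustrative assumptions.

```python
import numpy as np

def components_for_energy(eigenvalues, fraction=0.95):
    """Smallest number of leading components whose eigen-values reach `fraction`."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    cumulative = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    count = int(np.searchsorted(cumulative, fraction)) + 1
    return min(count, len(eigenvalues))   # guard against rounding at fraction=1.0

eigenvalues = np.array([6.0, 3.0, 0.9, 0.1])  # already in descending order
weights = np.eye(4)                           # stand-in for a trained weight matrix

k = components_for_energy(eigenvalues, fraction=0.95)
compressed = weights[:, :k]                   # keep the first k columns (outputs)
```

Here the first two components carry about 90% of the variance, so three are needed to reach 95%.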
This flag determines if this trainer will use the SVD method (set it to True) to calculate the principal components or the Covariance method (set it to False).
Bases: object
WCCNTrainer() -> new WCCNTrainer
Trains a linear machine to perform Within-Class Covariance Normalisation (WCCN).
WCCN finds the projection matrix W that allows us to linearly project the data matrix X to another (sub) space such that:
(1/N) S_{w} = W W^T
where W is an upper triangular matrix computed using Cholesky Decomposition:
W = cholesky([(1/K) S_{w} ]^{-1})
where:
K
the number of classes
S_w
the within-class scatter; it also has dimensions (X.shape[0], X.shape[0]) and is defined as S_w = \sum_{k=1}^K \sum_{n \in C_k} (x_n-m_k)(x_n-m_k)^T, with C_k a set representing all samples for class k.
m_k
the class k empirical mean, defined as m_k = \frac{1}{N_k}\sum_{n \in C_k} x_n
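A sketch of the computation in NumPy follows. Two assumptions are made here: the scatter is normalized by the total number of samples N (the formulas above show both 1/N and 1/K), and the Cholesky factor is the lower-triangular one that numpy.linalg.cholesky returns. The function names are hypothetical.

```python
import numpy as np

def within_class_covariance(classes):
    """S_w divided by the total number of samples (normalization is an assumption)."""
    classes = [np.asarray(c, dtype=float) for c in classes]
    n_samples = sum(len(c) for c in classes)
    n_features = classes[0].shape[1]
    S_w = np.zeros((n_features, n_features))
    for c in classes:
        centered = c - c.mean(axis=0)    # subtract the class mean m_k
        S_w += centered.T @ centered
    return S_w / n_samples

def wccn_projection(classes):
    """Cholesky factor W of the inverse within-class covariance (W W^T = C^{-1})."""
    return np.linalg.cholesky(np.linalg.inv(within_class_covariance(classes)))

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.3, 0.5]])
centers = [np.zeros(3), np.array([4.0, 0.0, 0.0]), np.array([0.0, 4.0, 0.0])]
classes = [center + rng.normal(size=(30, 3)) @ mixing for center in centers]

W = wccn_projection(classes)
projected = [c @ W for c in classes]
normalized_cov = within_class_covariance(projected)
```

After projection, the within-class covariance computed with the same normalization is the identity matrix.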
References:
Compares this WCCNTrainer with the other one to be approximately the same.
The optional values r_epsilon and a_epsilon refer to the relative and absolute precision for the weights, biases and any other values internal to this machine.
Trains a linear machine using WCCN.
The resulting machine will have the same number of inputs and outputs as columns in any of X's matrices.
The user may optionally provide an object of type bob.learn.linear.Machine that will be set by this method. In such a case, the machine should have a shape that matches (X.shape[1], X.shape[1]). If the user does not provide a machine to be set, then a new one will be allocated internally. In both cases, the resulting machine is always returned by this method.
The value of X should be a sequence over as many 2D 64-bit floating point number arrays as classes in the problem. All arrays will be checked for conformance (identical number of columns). To accomplish this, either prepare a list with all your class observations organised in 2D arrays or pass a 3D array in which the first dimension (depth) contains as many elements as classes you want to train for.
Bases: object
WhiteningTrainer() -> new WhiteningTrainer
Trains a linear machine to perform Cholesky Whitening.
The whitening transformation is a decorrelation method that converts the covariance matrix of a set of samples into the identity matrix I. This effectively linearly transforms random variables such that the resulting variables are uncorrelated and have unit variance. This transformation is invertible. The method is called the whitening transform because it transforms the input matrix X closer towards white noise (let’s call it \tilde{X}):
Cov(\tilde{X}) = I
where:
\tilde{X} = X W
W is the projection matrix that allows us to linearly project the data matrix X to another (sub) space such that:
[Cov(X)]^{-1} = W W^T
W is computed using Cholesky Decomposition:
W = cholesky([Cov(X)]^{-1})
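The effect of this transformation can be checked numerically. The sketch below takes W as the Cholesky factor of the inverse covariance, as in the last formula, and assumes that the mean is subtracted before projecting, which the formulas do not state. The function name `cholesky_whitening` is hypothetical.

```python
import numpy as np

def cholesky_whitening(X):
    """Return the sample mean and W, the Cholesky factor of the inverse covariance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)                 # samples as rows
    W = np.linalg.cholesky(np.linalg.inv(cov))    # lower triangular, W W^T = cov^{-1}
    return mean, W

rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
X = rng.normal(size=(500, 3)) @ mixing + np.array([10.0, -5.0, 2.0])

mean, W = cholesky_whitening(X)
whitened = (X - mean) @ W
whitened_cov = np.cov(whitened, rowvar=False)
```

The covariance of the whitened data equals the identity matrix up to rounding error, and the original data can be recovered by multiplying with the inverse of W and adding the mean back.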
References:
Compares this WhiteningTrainer with the other one to be approximately the same.
The optional values r_epsilon and a_epsilon refer to the relative and absolute precision for the weights, biases and any other values internal to this machine.
The resulting machine will have the same number of inputs and outputs as columns in X.
The user may optionally provide an object of type bob.learn.linear.Machine that will be set by this method. In such a case, the machine should have a shape that matches (X.shape[1], X.shape[1]). If the user does not provide a machine to be set, then a new one will be allocated internally. In both cases, the resulting machine is always returned by this method.
The input data matrix X should correspond to a 64-bit floating point 2D array organized in such a way that every row corresponds to a new observation of the phenomenon (i.e., a new sample) and every column corresponds to a different feature.