Python API

This section includes information for using the pure Python API of bob.learn.libsvm.
class bob.learn.libsvm.File(path)

Bases: object
Loads a given LIBSVM data file. The data file format, as defined in the library README, is like this:

<label> <index1>:<value1> <index2>:<value2> ...
<label> <index1>:<value1> <index2>:<value2> ...
<label> <index1>:<value1> <index2>:<value2> ...
...
The labels are integer values, and so are the indices, which start from 1 (and not from zero, as a C programmer would expect). The values are floating point. Zero values are suppressed - LIBSVM uses a sparse format.

Upon construction, objects of this class inspect the input file so that the maximum sample size is computed. Once that job is performed, you can read the data at your own pace using the read() method.

This class is made available to you so you can input original LIBSVM files and convert them to another, better supported representation. You cannot, from this object, save data or extend the current set.
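For example, a minimal sketch of opening and inspecting a data file (the file name 'data.libsvm' is hypothetical):

>>> import bob.learn.libsvm
>>> f = bob.learn.libsvm.File('data.libsvm')  # hypothetical LIBSVM-format file
>>> print(f.samples, f.shape)                 # number of entries and per-sample shape
>>> label, features = f.read()                # first entry: integer label, 1D float64 array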
eof() → bool

Returns True if the file has reached its end. To start reading from the file again, you must call reset() before another read operation may succeed.
fail() → bool

Returns True if the file has its fail or bad bit set. It means the read operation has found a critical condition and you can no longer proceed reading from the file. Note this is not the same as eof(), which indicates that the file has ended but no errors were found during the read operations.
filename

The name of the file being read
good() → bool

Returns True if the file is in a good state for readout, that is, it has neither the eof, fail nor bad bits set, which means that the next read() operation may succeed.
read([values]) → (int, array)

Reads a single line from the file and returns a tuple containing the label and a numpy array of float64 elements. The numpy.ndarray has a shape as defined by the shape attribute of the current file. If the file has finished, this method returns None instead.

If the output array values is provided, it must be a 64-bit float array with a shape matching the file shape as defined by shape. Providing an output array avoids constant memory re-allocation.
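A sketch of a typical readout loop, reusing a pre-allocated buffer to avoid re-allocation (the file name is hypothetical):

>>> import numpy
>>> import bob.learn.libsvm
>>> f = bob.learn.libsvm.File('data.libsvm')     # hypothetical path
>>> buf = numpy.zeros(f.shape, dtype='float64')  # reused on every read
>>> entry = f.read(buf)
>>> while entry is not None:
...     label, features = entry
...     # process (label, features) here
...     entry = f.read(buf)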
read_all([labels[, values]]) → (array, array)

Reads all contents of the file into the output arrays labels (used for storing each entry's label) and values (used to store each entry's features). The array labels, if provided, must be a 1D numpy.ndarray with data type int64, containing as many positions as there are entries in the file, as returned by the attribute samples. The array values, if provided, must be a 2D array with data type float64, with as many rows as there are entries in the file and as many columns as there are features in each entry, as defined by the attribute shape.

If the output arrays labels and/or values are not provided, they will be allocated internally and returned.

Note

This method is intended to be used for reading the whole contents of the input file. The file will be reset, as by calling reset(), before the readout starts.
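A minimal sketch, assuming a file laid out as described above (the path is hypothetical):

>>> import numpy
>>> import bob.learn.libsvm
>>> f = bob.learn.libsvm.File('data.libsvm')  # hypothetical path
>>> labels = numpy.zeros((f.samples,), dtype='int64')
>>> values = numpy.zeros((f.samples,) + f.shape, dtype='float64')
>>> labels, values = f.read_all(labels, values)  # fills the pre-allocated arrays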
reset() → None

Resets the current file so it starts reading from the beginning once more.
samples

The number of samples in the file
shape

The size of each sample in the file, as a tuple with a single entry
class bob.learn.libsvm.Machine(path)

Bases: object

Machine(hdf5file)
This class can load and run an SVM generated by libsvm. Libsvm is a simple, easy-to-use, and efficient software for SVM classification and regression. It solves C-SVM classification, nu-SVM classification, one-class-SVM, epsilon-SVM regression, and nu-SVM regression. It also provides an automatic model selection tool for C-SVM classification. More information about libsvm can be found on its website. In particular, this class covers most of the functionality provided by the command-line utility svm-predict.
Input and output are always performed on 1D or 2D arrays with 64-bit floating-point numbers.
This machine can be initialized in two ways: the first is using an original SVM text file as produced by libsvm. The second option is to pass a pre-opened HDF5 file pointing to the machine information to be loaded in memory.

Using the first constructor, we build a new machine from a libsvm model file. When you load using the libsvm model loader, note that the scaling parameters will be set to defaults (subtraction of 0.0 and division by 1.0). If you need scaling to be applied, set it individually using the appropriate methods on the returned object.

Using the second constructor, we build a new machine from an HDF5 file containing not only the machine support vectors, but also the scaling factors. Using this constructor assures a 100% state recovery from previous sessions.
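A minimal sketch of both constructors (the file names are hypothetical; the second variant assumes bob.io.base is installed):

>>> import bob.learn.libsvm
>>> m = bob.learn.libsvm.Machine('svm.model')  # text model produced by svm-train
>>> import bob.io.base
>>> m = bob.learn.libsvm.Machine(bob.io.base.HDF5File('svm.hdf5'))  # full state, including scaling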
coef0

The coefficient 0 for 'POLY' (polynomial) or 'SIGMOID' (sigmoidal) kernels
degree

The polynomial degree, only valid if the kernel is 'POLY' (polynomial)
forward(input[, output]) → array

o.predict_class(input, [output]) -> array

o(input, [output]) -> array

Calculates the predicted class using this Machine, given one single feature vector or multiple ones.

The input array can be either a 1D or 2D 64-bit float array. The output array, if provided, must be of type int64 and always uni-dimensional. The output corresponds to the predicted classes for each of the input rows.

Note
This method only accepts 64-bit float arrays as input and 64-bit integers as output.
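A minimal prediction sketch (the model file is hypothetical):

>>> import numpy
>>> import bob.learn.libsvm
>>> m = bob.learn.libsvm.Machine('svm.model')    # hypothetical model file
>>> samples = numpy.random.randn(5, m.shape[0])  # 5 rows of float64 features
>>> classes = m.forward(samples)                 # int64 array with 5 predicted labels
>>> classes = m(samples)                         # the call operator is equivalent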
gamma

The \(\gamma\) parameter for 'POLY' (polynomial), 'RBF' (gaussian) or 'SIGMOID' (sigmoidal) kernels
input_divide

Input division factor, before feeding data through the weight matrix W. The division is applied just after subtraction - by default, it is set to 1.0.
input_subtract

Input subtraction factor, before feeding data through the weight matrix W. The subtraction is the first applied operation in the processing chain - by default, it is set to 0.0.
kernel_type

The type of kernel used by the support vectors in this machine
labels

The class labels this machine will output
machine_type

The type of SVM machine contained
n_support_vectors

Will output the number of support vectors per class
predict_class(input[, output]) → array

o.forward(input, [output]) -> array

o(input, [output]) -> array

Calculates the predicted class using this Machine, given one single feature vector or multiple ones.

The input array can be either a 1D or 2D 64-bit float array. The output array, if provided, must be of type int64 and always uni-dimensional. The output corresponds to the predicted classes for each of the input rows.

Note
This method only accepts 64-bit float arrays as input and 64-bit integers as output.
predict_class_and_probabilities(input[, cls[, prob]]) → (array, array)

Calculates the predicted class and output probabilities for the SVM using this Machine, given one single feature vector or multiple ones.

The input array can be either a 1D or 2D 64-bit float array. The cls array, if provided, must be of type int64 and always uni-dimensional. The cls output corresponds to the predicted classes for each of the input rows. The prob array, if provided, must be of type float64 (like input) and have as many rows as input and len(o.labels) columns, matching the number of classes for this SVM.

This method always returns a tuple composed of the predicted classes for each row in the input array, with data type int64, and of probabilities for each output of the SVM in a 1D or 2D float64 array. If you don't provide the arrays upon calling this method, new ones will be allocated internally and returned. If you are calling this method in a tight loop, it is recommended you pass the cls and prob arrays to avoid constant re-allocation.
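A minimal sketch with pre-allocated output arrays, assuming the loaded machine supports probability outputs (the model file is hypothetical):

>>> import numpy
>>> import bob.learn.libsvm
>>> m = bob.learn.libsvm.Machine('svm.model')  # hypothetical, trained with probability support
>>> samples = numpy.random.randn(4, m.shape[0])
>>> cls = numpy.zeros((4,), dtype='int64')
>>> prob = numpy.zeros((4, len(m.labels)), dtype='float64')
>>> cls, prob = m.predict_class_and_probabilities(samples, cls, prob)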
predict_class_and_scores(input[, cls[, score]]) → (array, array)

Calculates the predicted class and output scores for the SVM using this Machine, given one single feature vector or multiple ones.

The input array can be either a 1D or 2D 64-bit float array. The cls array, if provided, must be of type int64 and always uni-dimensional. The cls output corresponds to the predicted classes for each of the input rows. The score array, if provided, must be of type float64 (like input) and have as many rows as input and C columns, matching the number of combinations of the outputs 2-by-2. To score, LIBSVM will compare the SV outputs for each pair of classes in the machine and output 1 score. If there is only 1 output, then the problem is binary and only 1 score is produced (C = 1). If the SVM is multi-class, then the number of combinations C is the total number of possible output combinations. If N is the number of classes in this SVM, then \(C = N\cdot(N-1)/2\). If N = 3, then C = 3. If N = 5, then C = 10.

This method always returns a tuple composed of the predicted classes for each row in the input array, with data type int64, and of scores for each output of the SVM in a 1D or 2D float64 array. If you don't provide the arrays upon calling this method, new ones will be allocated internally and returned. If you are calling this method in a tight loop, it is recommended you pass the cls and score arrays to avoid constant re-allocation.
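A minimal sketch that computes the pairwise score width C = N(N-1)/2 up front (the model file is hypothetical):

>>> import numpy
>>> import bob.learn.libsvm
>>> m = bob.learn.libsvm.Machine('svm.model')  # hypothetical model file
>>> N = len(m.labels)          # number of classes
>>> C = N * (N - 1) // 2       # pairwise decision scores, as described above
>>> samples = numpy.random.randn(4, m.shape[0])
>>> cls = numpy.zeros((4,), dtype='int64')
>>> score = numpy.zeros((4, C), dtype='float64')
>>> cls, score = m.predict_class_and_scores(samples, cls, score)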
probability

Set to True if this machine supports probability outputs
save(path) → None

o.save(hdf5file) -> None

Saves itself to a LIBSVM model file or into a bob.io.base.HDF5File. Saving the SVM into a bob.io.base.HDF5File object has the advantage of storing the input normalization options together with the machine, which are automatically reloaded when you re-initialize it from the same bob.io.base.HDF5File.
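A minimal sketch of both save targets (file names are hypothetical; the second variant assumes bob.io.base is available and that the HDF5 file is opened writable):

>>> import bob.io.base
>>> import bob.learn.libsvm
>>> m = bob.learn.libsvm.Machine('svm.model')      # hypothetical model file
>>> m.save('svm-copy.model')                       # plain LIBSVM text format, no scaling info
>>> m.save(bob.io.base.HDF5File('svm.hdf5', 'w'))  # also stores input_subtract/input_divide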
shape

A tuple that represents the size of the input vector followed by the size of the output vector, in the format (input, output).
class bob.learn.libsvm.Trainer([machine_type='C_SVC'[, kernel_type='RBF'[, cache_size=100[, stop_epsilon=1e-3[, shrinking=True[, probability=False]]]]]]) → new Trainer

Bases: object

This class emulates the behavior of the command-line utility called svm-train, from LIBSVM. It allows you to create a parameterized LIBSVM trainer to fulfill a variety of needs and configurations. The constructor includes parameters which are global to all machine and kernel types. Specific parameters for specific machines or kernel types can be fine-tuned using object attributes (see help documentation). A construction sketch is given after the parameter list and notes below.

Parameters:
- machine_type, str

  The type of SVM to be trained. Valid options are:

  - 'C_SVC' (the default)
  - 'NU_SVC'
  - 'ONE_CLASS'
  - 'EPSILON_SVR' (unsupported regression)
  - 'NU_SVR' (unsupported regression)
- kernel_type, str

  The type of kernel to deploy on this machine. Valid options are:

  - 'LINEAR', for a linear kernel
  - 'POLY', for a polynomial kernel
  - 'RBF', for a radial-basis function kernel
  - 'SIGMOID', for a sigmoidal kernel
  - 'PRECOMPUTED', for a precomputed, user-provided kernel (please note this option is currently unsupported)
- cache_size, float

  The size of LIBSVM's internal cache, in megabytes

- stop_epsilon, float

  The epsilon value for the training stopping criterion

- shrinking, bool

  If set to True (the default), applies LIBSVM's shrinking heuristic.

- probability, bool

  If set to True, allows the machine produced by this trainer to output probabilities besides scores and class estimates. The default for this option is False.
Note
These bindings do not support:
- Precomputed Kernels
- Regression Problems
- Different weights for every label (-wi option in svm-train)
Feel free to implement those and remove these remarks.
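A minimal construction sketch using the documented defaults, then fine-tuning two of the attributes listed below (all values are illustrative):

>>> import bob.learn.libsvm
>>> trainer = bob.learn.libsvm.Trainer(machine_type='C_SVC', kernel_type='RBF',
...                                    cache_size=100, stop_epsilon=1e-3,
...                                    shrinking=True, probability=False)
>>> trainer.cost = 1.0   # C for C_SVC (see the cost attribute below)
>>> trainer.gamma = 0.5  # kernel width for 'RBF' (see the gamma attribute below)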
cache_size

Internal cache size to be used by LIBSVM (in megabytes)
coef0

The coefficient 0 for 'POLY' (polynomial) or 'SIGMOID' (sigmoidal) kernels
cost

The cost value for C_SVC, EPSILON_SVR or NU_SVR. This parameter is normally referred to simply as \(C\) in the literature. It should be a non-negative floating-point number.
degree

The polynomial degree, only used if the kernel is 'POLY' (polynomial)
gamma

The \(\gamma\) parameter for 'POLY' (polynomial), 'RBF' (gaussian) or 'SIGMOID' (sigmoidal) kernels
kernel_type

The type of kernel used by the support vectors in this machine
loss_epsilon_svr

For EPSILON_SVR, this is the \(\epsilon\) value in the loss function equation
machine_type

The type of SVM machine that will be trained
nu

The nu value for NU_SVC, ONE_CLASS or NU_SVR. This parameter should live in the range [0, 1].
probability

If set to True, output Machines will support outputting probability estimates
shrinking

If set to True, then use LIBSVM's shrinking heuristics
stop_epsilon

The epsilon value used as the stopping criterion for training
train(data[, subtract, divide]) → Machine

Trains a new machine for multi-class classification. If the number of classes in data is 2, then the assigned labels will be +1 and -1, in that order. If the number of classes is greater than 2, labels are picked starting from 1 (i.e., 1, 2, 3, 4, etc.). This convention follows what is done at the command line for LIBSVM.
The input object data must be an iterable (such as a Python list or tuple) containing 2D 64-bit float arrays, each representing data for one single class. The data in each array should be organized row-wise (i.e. 1 row represents 1 sample). All rows of all arrays should have exactly the same number of columns - this will be checked.

Optionally, you may also provide the input arrays subtract and divide, which will be used to normalize the input data before it is fed into the training code. If provided, both arrays should be 1D and contain 64-bit floats with the same width as the data in the input array data. The normalization is applied in the following way:

\[d' = \frac{d - \text{subtract}}{\text{divide}}\]
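A minimal end-to-end training sketch with synthetic two-class data and the optional normalization arrays (all data and values below are illustrative):

>>> import numpy
>>> import bob.learn.libsvm
>>> pos = numpy.random.randn(50, 2) + 2.0  # samples for the first class (label +1)
>>> neg = numpy.random.randn(50, 2) - 2.0  # samples for the second class (label -1)
>>> trainer = bob.learn.libsvm.Trainer(machine_type='C_SVC', kernel_type='RBF')
>>> subtract = numpy.zeros((2,), dtype='float64')  # per-feature offsets
>>> divide = numpy.ones((2,), dtype='float64')     # per-feature scales
>>> machine = trainer.train([pos, neg], subtract, divide)
>>> predictions = machine(numpy.array([[2.0, 2.0], [-2.0, -2.0]]))  # should predict +1 and -1 for well-separated data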