Several tasks can be achieved by a boosted classifier:
To achieve this goal, a strong classifier S is build out of a weighted list of I weak classifiers W_i:
S(\vec x) = \sum_{i=1}^I w_i \cdot W_i(\vec x)
Note
For the univariate case, both w_i and the weak classifier result W_i are floating point values. In the multivariate case, w_i is a vector of weights – one for each output dimension – and the weak classifier W_i returns a vector of floating point values as well.
Currently, two types of weak classifiers are implemented in this boosting framework.
The first classifier, which can only handle univariate classification tasks, is the bob.learn.boosting.StumpMachine. For a given input vector \vec x, the classifier bases its decision on a single element x_m of the input vector:
W(\vec x) = \left\{ \begin{array}{r@{\text{ if }}l} +1 & (x_m - \theta) * \phi >= 0 \\ -1 & (x_m - \theta) * \phi < 0 \end{array}\right.
Threshold \theta, polarity phi and index m are parameters of the classifier, which are trained using the bob.learn.boosting.StumpTrainer. For a given training set \{\vec x_p \mid p=1,\dots,P\} and according target values \{t_p \mid p=1,\dots,P\}, the threshold \theta_m is computed for each input index m, such that the lowest classification error is obtained, and the m with the lowest training classification error is taken. The polarity \phi is set to -1, if values lower than the threshold should be considered as positive examples, or to +1 otherwise.
To compute the classification error for a given \theta_m, the gradient of a loss function is taken into consideration. For the stump trainer, usually the bob.learn.boosting.ExponentialLoss is considered as the loss function.
The second classifier, which can handle univariate and multivariate classification and regression tasks, is the bob.learn.boosting.LUTMachine. This classifier is designed to handle input vectors with discrete values only. Again, the decision of the weak classifier is based on a single element of the input vector \vec x.
In the univariate case, for each of the possible discrete values of x_m, a decision {+1, -1} is selected:
W(\vec x) = LUT[x_m]
This look-up-table LUT and the feature index m is trained by the bob.learn.boosting.LUTTrainer.
In the multivariate case, each output W^o is handled independently, i.e., a separate look-up-table LUT^o and a separate feature index m^o is assigned for each output dimension o:
W^o(\vec x) = LUT^o[x_{m^o}]
Note
As a variant, the feature index m^o can be selected to be shared for all outputs, see bob.learn.boosting.LUTTrainer for details.
A weak look-up-table classifier is learned using the bob.learn.boosting.LUTTrainer.
The strong classifier, which is of type bob.learn.boosting.BoostedMachine, is a weighted combination of weak classifiers, which are usually of the same type. It can be trained with the bob.learn.boosting.Boosting trainer, which takes a list of training samples, and a list of univariate or multivariate target vectors. In several rounds, the trainer computes (here, only the univariate case is considered, but the multivariate case is similar – simply replace scores by score vectors.):
The classification results (the so-called scores) for the current strong classifier:
s_p = S(\vec x_p)
The derivative L' of the loss function, based on the current scores and the target values:
\nabla_p = L'(t_p, s_p)
This loss gradient is used to select a new weak machine W_i using a weak trainer (see above).
W_i = trainer.train([\vec x_p], [\nabla_p])
The scores of the weak machine are computed:
r_p = W_i(\vec x_p)
The weight for the new machine is optimized using scipy.optimize.fmin_l_bfgs_b. This call will use both the loss L and its derivative L' to compute the optimal weight for the new classifier:
w_i = scipy.optimize.fmin_l_bfgs_b(...)
The new weak machine is added to the strong classifier.
As shown above, the loss functions define, how well the currently predicted scores s_p fit to the target values t_p. Depending on the desired task, and on the type of classifier, different loss functions might be used:
Other loss functions, e.g., using the Euclidean distance for regression, should be easily implementable.