Multi-Layer Perceptron (MLP) Machines and Trainers
A multi-layer perceptron (MLP) is a neural network architecture that has some well-defined characteristics, such as a feed-forward structure. You can create a new MLP using one of the trainers described below. We start this tutorial by exemplifying how to actually use an MLP.
To instantiate a new (uninitialized) bob.learn.mlp.Machine, pass a shape descriptor as a tuple. The shape parameter should contain the input size as the first element and the output size as the last element. The elements in between define the number of neurons in the hidden layers of the MLP. For example, (3, 3, 1) defines an MLP with 3 inputs, a single hidden layer with 3 neurons and 1 output, whereas a shape like (10, 5, 3, 2) defines an MLP with 10 inputs, 5 neurons in the first hidden layer, 3 neurons in the second hidden layer and 2 outputs. Here is an example:
>>> mlp = bob.learn.mlp.Machine((3, 3, 2, 1))
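As a quick check (a sketch, assuming the machine exposes its topology through a shape attribute), you can read the geometry back:
>>> mlp.shape
(3, 3, 2, 1)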
As it is, the network is uninitialized. For the sake of demonstrating how to use MLPs, let's set the weights and biases manually (we would normally use a trainer for this):
>>> input_to_hidden0 = numpy.ones((3,3), 'float64')
>>> numpy.allclose(input_to_hidden0,[[ 1., 1., 1.], [ 1., 1., 1.],[ 1., 1., 1.]])
True
>>> hidden0_to_hidden1 = 0.5*numpy.ones((3,2), 'float64')
>>> numpy.allclose(hidden0_to_hidden1, [[ 0.5, 0.5],[ 0.5, 0.5],[ 0.5, 0.5]])
True
>>> hidden1_to_output = numpy.array([0.3, 0.2], 'float64').reshape(2,1)
>>> numpy.allclose(hidden1_to_output, [[ 0.3], [ 0.2]])
True
>>> bias_hidden0 = numpy.array([-0.2, -0.3, -0.1], 'float64')
>>> bias_hidden0
array([-0.2, -0.3, -0.1])
>>> bias_hidden1 = numpy.array([-0.7, 0.2], 'float64')
>>> bias_hidden1
array([-0.7, 0.2])
>>> bias_output = numpy.array([0.5], 'float64')
>>> numpy.allclose(bias_output, [ 0.5])
True
>>> mlp.weights = (input_to_hidden0, hidden0_to_hidden1, hidden1_to_output)
>>> mlp.biases = (bias_hidden0, bias_hidden1, bias_output)
At this point, a few things should be noted:

- Weights should always be 2D arrays, even if they are connecting 1 neuron to many (or many to 1). You can use the NumPy reshape() array method for this purpose, as shown above.
- Biases should always be 1D arrays.
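As a sanity check of both points, here is a sketch that reads the arrays back from the machine (assuming the weights and biases attributes return the arrays as assigned above) and inspects their shapes:
>>> [w.shape for w in mlp.weights]   # one 2D array per connection between layers
[(3, 3), (3, 2), (2, 1)]
>>> [b.shape for b in mlp.biases]    # one 1D array per layer after the input
[(3,), (2,), (1,)]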
By default, MLPs use the bob.learn.activation.HyperbolicTangent as activation function. There are currently 4 other activation functions available in Bob:

- The identity function: bob.learn.activation.Identity;
- The sigmoid function (also known as the logistic function): bob.learn.activation.Logistic;
- A scaled version of the hyperbolic tangent function: bob.learn.activation.MultipliedHyperbolicTangent; and
- A scaled version of the identity activation: bob.learn.activation.Linear.
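For reference, the logistic and hyperbolic tangent activations compute the values sketched below with plain NumPy arithmetic (this is only an illustration of the formulas, not a Bob API call):
>>> z = 0.0
>>> numpy.allclose(1.0 / (1.0 + numpy.exp(-z)), 0.5)   # logistic (sigmoid) at z = 0
True
>>> numpy.allclose(numpy.tanh(z), 0.0)                 # hyperbolic tangent at z = 0
True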
Let’s try changing all of the activation functions to a simpler one, just for this example:
>>> mlp.hidden_activation = bob.learn.activation.Identity()
>>> mlp.output_activation = bob.learn.activation.Identity()
Once the network weights and biases are set, we can feed an example forward through this machine. This is done using the () operator, like for a bob.learn.linear.Machine:
>>> numpy.allclose(mlp(numpy.array([0.1, -0.1, 0.2], 'float64')), [ 0.33])
True
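Because all activations are now the identity, the value above can be verified by hand, chaining the weight matrices and bias vectors set earlier with plain NumPy (a sketch of the arithmetic only, not of Bob internals):
>>> x = numpy.array([0.1, -0.1, 0.2], 'float64')
>>> h0 = numpy.dot(x, input_to_hidden0) + bias_hidden0      # first hidden layer (identity activation)
>>> h1 = numpy.dot(h0, hidden0_to_hidden1) + bias_hidden1   # second hidden layer
>>> y = numpy.dot(h1, hidden1_to_output) + bias_output      # output layer
>>> numpy.allclose(y, [0.33])
True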
MLPs can be trained through backpropagation [2], which is a supervised learning technique. This training procedure requires a set of features with labels (or targets). Using Bob, these are passed to the train() method of the available MLP trainers in two different 2D NumPy arrays, one for the input (features) and one for the output (targets), with one sample per row. The number of rows in those two arrays must match the batch size set when creating the trainer.
>>> d0 = numpy.array([[.3, .7, .5]]) # input
>>> t0 = numpy.array([[.0]]) # target
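More than one example can be packed into the same arrays, one sample per row. For instance, a hypothetical batch of two samples (the trainer below would then have to be created with a batch size of 2):
>>> d = numpy.array([[.3, .7, .5], [.1, .2, .9]])   # two input samples, one per row
>>> t = numpy.array([[.0], [.4]])                   # the two matching targets
>>> d.shape, t.shape
((2, 3), (2, 1))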
The class used to train an MLP [1] with backpropagation [2] is bob.learn.mlp.BackProp. An example is shown below.
>>> trainer = bob.learn.mlp.BackProp(1, bob.learn.mlp.SquareError(mlp.output_activation), mlp, train_biases=False) # Creates a BackProp trainer with a batch size of 1
>>> trainer.train(mlp, d0, t0) # Performs the Back Propagation
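Each call to train() performs one update over the batch you pass in; looping over it is left to you. A minimal sketch, reusing the single-sample batch above:
>>> for step in range(10):            # a few more backpropagation steps on the same batch
...     trainer.train(mlp, d0, t0)
>>> prediction = mlp(numpy.array([.3, .7, .5]))   # current prediction for the training input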
Note
The second parameter of the trainer defines the cost function to be used for the training. You can use two different types of pre-programmed costs in Bob: bob.learn.mlp.SquareError, like before, or bob.learn.mlp.CrossEntropyLoss (normally in association with bob.learn.activation.Logistic). You can also implement your own cost/loss functions. Nevertheless, to do so, you must do it using our C/C++ API and then bind it to Python in your own package.
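For instance, a trainer based on the cross-entropy cost could be set up along these lines (a sketch only; the second machine is hypothetical, and it is assumed that the cost constructor takes the output activation, in the same way SquareError did above):
>>> logistic_mlp = bob.learn.mlp.Machine((3, 3, 1))                   # hypothetical second machine
>>> logistic_mlp.output_activation = bob.learn.activation.Logistic()  # pair the cost with a logistic output
>>> cost = bob.learn.mlp.CrossEntropyLoss(logistic_mlp.output_activation)
>>> ce_trainer = bob.learn.mlp.BackProp(1, cost, logistic_mlp, train_biases=False)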
Backpropagation [2] requires a learning rate to be set. In the previous example, the default value 0.1 has been used. This can be updated using the bob.learn.mlp.BackProp.learning_rate attribute.
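For example, to use a smaller step size than the default for subsequent updates:
>>> trainer.learning_rate = 0.05   # replaces the default of 0.1
>>> trainer.learning_rate
0.05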
Another training alternative exists, referred to as resilient propagation (R-Prop) [3], which dynamically computes an optimal learning rate. The corresponding class is bob.learn.mlp.RProp, and the overall training procedure remains identical.
>>> trainer = bob.learn.mlp.RProp(1, bob.learn.mlp.SquareError(mlp.output_activation), mlp, train_biases=False)
>>> trainer.train(mlp, d0, t0)
Note
The trainers are not re-initialized when you call their train() method several times. This is done so as to allow you to implement your own stopping criteria. To reset an MLP trainer, use its reset method.
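For instance, a hand-rolled stopping criterion could look like the sketch below (the iteration bound and error threshold are arbitrary; only reset and train() are part of the trainer API discussed above):
>>> trainer.reset()                                   # clear the trainer's internal state
>>> for step in range(100):                           # upper bound on the number of updates
...     trainer.train(mlp, d0, t0)
...     prediction = mlp(numpy.array([.3, .7, .5]))   # re-evaluate the training sample
...     if abs(float(prediction[0])) < 1e-3:          # stop once close enough to the target 0.0
...         break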