As in most modern face detectors, we also apply a cascaded classifier for detecting faces. This package provides a pre-trained classifier for upright frontal faces, but the cascade can be re-trained using your own data.
The simplest face detection task is to detect a single face in an image. This task can be achieved with a single command:
>>> import bob.io.base
>>> import bob.ip.facedetect
>>> face_image = bob.io.base.load('testimage.jpg')
>>> bounding_box, quality = bob.ip.facedetect.detect_single_face(face_image)
>>> print (quality, bounding_box.topleft, bounding_box.size)
33.1136586165 (113, 84) (216, 180)
As you can see, the bounding box is not square as it is for some other face detectors, but has an aspect ratio of 5:6. The function bob.ip.facedetect.detect_single_face() has several optional parameters with sensible default values. The first optional parameter specifies the bob.ip.facedetect.Cascade, which contains the classifier cascade. We will see later how this cascade can be re-trained.
The minimum_overlap parameter defines the minimum overlap that patches of multiple detections of the same face must have. If set to 1 (or None), only the bounding box of the best detection is returned, while smaller values compute the average over several detections, which usually makes the detection more stable.
The second optional parameter is a bob.ip.facedetect.Sampler, which defines how the image is scanned. The scale_factor (a value between 0.5 and 1) defines the granularity of the scales at which the image is scanned. For higher scale factors such as the default 2^{-1/16}, many scales are tested and the detection time increases. For lower scale factors such as 2^{-1/4}, fewer scales are tested, which might reduce the stability of the detection.
The distance parameter defines the distance in pixels between two tested bounding boxes. A lower distance improves stability, but requires more time. In any case, distances larger than 4 pixels are not recommended.
The lowest_scale parameter defines the size of the smallest bounding box, relative to the size of the image. For example, for an image of resolution 640×480 and lowest_scale = 0.125 (the default), the smallest detected face would be 60 (i.e., 480*0.125) pixels high. Theoretically, this parameter could be set to None, in which case all possible scales are extracted, but this is not recommended.
Finally, the sampler has a given patch_size, which is tightly connected to the cascade and should not be changed.
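For example, a detection run with an explicitly chosen cascade, a coarser sampler and a stricter overlap criterion could look as follows. This is only a sketch; it assumes that the parameters discussed above can be passed to bob.ip.facedetect.detect_single_face() as keyword arguments with these names:

>>> cascade = bob.ip.facedetect.default_cascade()
>>> sampler = bob.ip.facedetect.Sampler(scale_factor=2.**(-1./4.), distance=4, lowest_scale=0.125)
>>> # assumption: cascade, sampler and minimum_overlap are keyword arguments of detect_single_face
>>> bounding_box, quality = bob.ip.facedetect.detect_single_face(face_image, cascade=cascade, sampler=sampler, minimum_overlap=0.5)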
The bob.ip.facedetect.Sampler can return an iterator of bounding boxes that will be tested:
>>> import math
>>> sampler = bob.ip.facedetect.Sampler(scale_factor=math.pow(2., -1./4.), distance=2, lowest_scale=0.125)
>>> patches = list(sampler.sample(face_image))
>>> print (face_image.shape)
(3, 531, 354)
>>> print (patches[0].topleft, patches[0].size)
(0, 0) (357, 298)
>>> print (patches[-1].topleft, patches[-1].size)
(463, 300) (63, 53)
>>> print (len(patches))
14493
As you can see, there are a lot of patches at different locations and scales that might contain faces. In fact, when given an image with several faces, you might want to get the bounding boxes for all faces at once. The classifiers in the cascade do not only provide a decision whether a given patch contains a face, but also return a quality value. For the pre-trained cascade, this quality value lies approximately between -100 and +100. Higher values indicate that there is a face, while patches with smaller values usually contain background.
To extract all faces in a given image, the function bob.ip.facedetect.detect_all_faces() requires that this threshold is given as well:
>>> bounding_boxes, qualities = bob.ip.facedetect.detect_all_faces(face_image, threshold=20)
>>> for i in range(len(bounding_boxes)):
...   print ("%3.4f"%qualities[i], bounding_boxes[i].topleft, bounding_boxes[i].size)
74.3045 (88, 66) (264, 220)
24.7024 (264, 192) (72, 60)
24.5685 (379, 126) (126, 105)
The returned list of detected bounding boxes is sorted according to the quality values. Again, cascade, sampler and minimum_overlap can be passed to the function.
Note
The strategies for merging overlapping detections differ between the two detection functions. While bob.ip.facedetect.detect_single_face() uses bob.ip.facedetect.best_detection() to merge detections, bob.ip.facedetect.detect_all_faces() simply uses bob.ip.facedetect.prune_detections() to keep only the detection with the highest quality in each overlapping area.
In case you want to implement your own strategy of merging overlapping bounding boxes, you can simply get the detection qualities for all sampled patches.
Note
For the low level functions, only grayscale images are supported.
>>> import bob.ip.color
>>> cascade = bob.ip.facedetect.default_cascade()
>>> gray_image = bob.ip.color.rgb_to_gray(face_image)
>>> for quality, patch in sampler.iterate_cascade(cascade, gray_image):
...   if quality > 40:
...     print ("%3.4f"%quality, patch.topleft, patch.size)
48.9983 (84, 84) (253, 210)
51.7809 (105, 63) (253, 210)
56.5325 (105, 84) (253, 210)
47.9453 (106, 88) (212, 177)
40.3316 (124, 71) (212, 177)
43.7717 (134, 104) (179, 149)
As you can see, most of the patches with high quality values overlap.
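Based on these raw detections, a custom merging strategy can be implemented. The following is only a sketch; it assumes that bob.ip.facedetect.best_detection() accepts a list of bounding boxes together with their qualities and returns the merged bounding box and its quality:

>>> detections = []
>>> qualities = []
>>> for quality, patch in sampler.iterate_cascade(cascade, gray_image):
...   if quality > 40:
...     detections.append(patch)
...     qualities.append(quality)
>>> # assumption: best_detection averages the overlapping bounding boxes into a single one
>>> bounding_box, quality = bob.ip.facedetect.best_detection(detections, qualities)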
As previously mentioned, a pre-trained classifier cascade is included in this package. However, this classifier is trained only to detect frontal or close-to-frontal upright faces, but not rotated or profile faces – let alone other objects. Nevertheless, it is possible to train a cascade for your own detection task.
The first thing that the cascade training requires is training data – the more the better. To ease the collection of positive and negative training data, a script ./bin/collect_training_data.py is provided. This script has several options:
To train the detector, both positive and negative training data need to be present. Positive data is defined by annotations of the images, which can be translated into bounding boxes. E.g., for frontal facial images, bounding boxes can be defined by the eye coordinates (see bob.ip.facedetect.bounding_box_from_annotation()) or directly by specifying the top-left and bottom-right coordinates. There are two different ways in which annotations can be read. One way is to read annotations from an annotation file using the bob.ip.facedetect.read_annotation_file() function, which can read various types of annotations. To use this function, simply specify the corresponding command line options of the ./bin/collect_training_data.py script:
The second way is to use one of our database interfaces (see https://github.com/idiap/bob/wiki/Packages), which have annotations stored internally:
Usually, it is also useful to include databases which do not contain target images at all. For these, obviously, no annotations are required/available. Hence, for pure background image databases, use the option:
For example, to collect training data from three different databases, you could call:
$ ./bin/collect_training_data.py --image-directory <...>/Yale-B/data --image-extension .pgm --annotation-directory <...>/Yale-B/annotations --annotation-type named --output-file Yale-B.txt
$ ./bin/collect_training_data.py --database xm2vts --image-directory <...>/xm2vtsdb/images --protocols lp1 lp2 darkened-lp1 darkened-lp2 --groups world dev eval --output-file XM2VTS.txt
$ ./bin/collect_training_data.py --image-directory <...>/FDHD-background/data --image-extension .jpeg --no-annotations --output-file FDHD.txt
The first call scans the Yale-B/data directory for .pgm images and the Yale-B/annotations directory for annotations of the named type, the second uses the bob.db.xm2vts interface to collect images, and the third collects only background .jpeg data from the FDHD-background/data directory.
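The same annotation helpers can also be used directly from Python. The following is only a sketch: the file name is a placeholder, and we assume that the annotation file contains named eye positions that bounding_box_from_annotation() accepts as keyword arguments:

>>> # hypothetical annotation file containing eye positions in the 'named' format
>>> annotations = bob.ip.facedetect.read_annotation_file('face.pos', 'named')
>>> bounding_box = bob.ip.facedetect.bounding_box_from_annotation(**annotations)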
Training the classifier is split into two steps. First, the ./bin/extract_training_features.py script can be used to extract training features from a list of database files as generated by the ./bin/collect_training_data.py script. Again, several options can be selected:
Since the detector will use the bob.ip.facedetect.Sampler to extract image patches, we follow a similar approach to generate training data. A sampler is used to iterate over the training images and extract image patches. Depending on the overlap of an image patch with the annotated bounding boxes, it is considered a positive sample, a negative sample, or it is ignored, i.e., when the overlap lies between the two thresholds:
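The overlap between a sampled patch and an annotated bounding box can be thought of as an intersection-over-union measure. Here is a small, self-contained sketch of such a measure; it is only an illustration and not necessarily the exact measure used by the package:

>>> def overlap(topleft1, size1, topleft2, size2):
...   # intersection-over-union of two boxes given as (top, left) and (height, width)
...   top = max(topleft1[0], topleft2[0])
...   left = max(topleft1[1], topleft2[1])
...   bottom = min(topleft1[0] + size1[0], topleft2[0] + size2[0])
...   right = min(topleft1[1] + size1[1], topleft2[1] + size2[1])
...   intersection = max(bottom - top, 0) * max(right - left, 0)
...   union = size1[0] * size1[1] + size2[0] * size2[1] - intersection
...   return float(intersection) / union
>>> print ("%.2f" % overlap((0, 0), (100, 100), (10, 10), (100, 100)))
0.68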
Since this sampling strategy would end up with a huge amount of negative samples, there are two options to limit them:
Now, the type of LBP features to be extracted has to be defined. Usually, LBP features in all possible sizes and aspect ratios that fit into the given --patch-size are generated. Several options can be used to select a combination of different kinds of LBP feature extractors; for more information please refer to [Atanasoaei2012]:
Interestingly, a quite limited number of different LBP feature extractors might already be sufficient. For example, the pre-trained cascade uses the following options:
$ ./bin/extract_training_features.py --file-lists Yale-B.txt XM2VTS.txt FDHD.txt ... --lbp-scale 1 --lbp-variant mct
Finally, the --parallel option can be used to run the feature extraction in parallel. Particularly in combination with the GridTK, processing can be sped up tremendously:
$ ./bin/jman submit --parallel 64 -- ./bin/extract_training_features.py ... --parallel 64
To finally train the face detector cascade, the ./bin/train_detector.py script is provided. This script reads the training features as extracted by the ./bin/extract_training_features.py script and generates a regular boosted cascade of weak classifiers. Again, the script has several options:
The training is done in several bootstrapping rounds. In the first round, a strong classifier is generated from 5000 randomly selected positive and 5000 randomly selected negative samples. After 8 weak classifiers have been selected, all remaining samples are classified with the current boosted machine. The 5000 positive and 5000 negative samples that are misclassified most strongly are added to the training samples. A new bootstrapping round starts, which now selects 8*2 = 16 weak classifiers, and so on, until the 7th round has selected 512 weak classifiers.
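The number of weak classifiers thus doubles in each bootstrapping round, which can be verified with a small sketch based on these default numbers:

>>> for bootstrap_round in range(7):
...   print (bootstrap_round + 1, 8 * 2**bootstrap_round)
1 8
2 16
3 32
4 64
5 128
6 256
7 512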
These numbers can be modified on command line with the command line options:
Finally, a regular cascade is created, which rejects patches with a quality value below the threshold -5 after every 25 weak classifiers have been evaluated. These numbers can be changed using the options:
This package also provides a script ./bin/validate_cascade.py to automatically adapt the steps and thresholds of the cascade based on a validation set. However, the use of this script is not encouraged, since I could not yet come up with a proper default configuration.
For completeness it is worth mentioning that the default pre-trained cascade was trained on the following databases: