.. vim: set fileencoding=utf-8 :
.. @author: Manuel Gunther
.. @date:   Fri Jan 22 09:08:25 MST 2016

.. py:currentmodule:: bob.db.celeba

==============
 User's Guide
==============

The CelebA database is a database of face images, which are labeled with several binary :py:meth:`Database.attributes`, such as "Big Nose", "Bushy Eyebrows", "Narrow Eyes" or "Smiling".
Additionally, five facial landmarks (:py:meth:`Database.annotations`) are labeled for each image, namely the two eyes, the nose, and the left and right mouth corners.

The database is split into three partitions: a training set, a validation set and a test set.
After launching the Python interpreter (assuming that the environment is properly set up), you can get the training set as follows:

.. doctest::

   >>> import bob.db.celeba
   >>> db = bob.db.celeba.Database()
   >>> files = db.objects(purposes='training')
   >>> file_names = [db.original_file_name(f) for f in files]

.. note::
   You can also use :py:meth:`Database.training_set` to obtain the list of training :py:class:`File` objects.
   Similarly, the validation and test sets can be obtained with :py:meth:`Database.validation_set` and :py:meth:`Database.test_set`.

The annotations can be used to align the face in the image, e.g., using the :py:class:`bob.ip.base.FaceEyesNorm`:

.. code-block:: py

   >>> import bob.io.base
   >>> import bob.io.image
   >>> import bob.ip.base
   >>> face_eyes_norm = bob.ip.base.FaceEyesNorm(image_size=(100,100), right_eye=(33,33), left_eye=(33,67))
   >>> image = bob.io.base.load(file_names[0])
   >>> annotations = db.annotations(files[0])
   >>> aligned_image = face_eyes_norm(image, right_eye=annotations['reye'], left_eye=annotations['leye'])

Different kinds of features can be extracted from the images, and a classifier can be trained to classify the attributes.
The attributes themselves are binary (the actual datatype is ``int``), where ``+1`` stands for the presence of the attribute, while ``-1`` indicates its absence:

.. doctest::

   >>> attributes = db.attributes(files[0])
   >>> for i in range(4):
   ...   print("Attribute '%s' is %spresent" % (db.attribute_names()[i], "" if attributes[i] == 1 else "not "))
   Attribute '5_o_Clock_Shadow' is not present
   Attribute 'Arched_Eyebrows' is present
   Attribute 'Attractive' is present
   Attribute 'Bags_Under_Eyes' is not present

The :py:meth:`Database.objects` function can also be used to filter the results by attribute.
For example, to select only the images that are labeled "Attractive" and "Young" but not "Male", you can call:

.. doctest::

   >>> attractive_young_women = db.objects(with_attributes=["Attractive", "Young"], without_attributes="Male")

Note that the dataset is highly imbalanced: for most attributes, one of the two labels strongly dominates, and in most cases it is the negative one.
You can get the share of positive labels per attribute (here, only for the training set) as follows:

.. doctest::

   >>> training_set = db.training_set()
   >>> training_attributes = [db.attributes(f) for f in training_set]
   >>> attribute_names = db.attribute_names()
   >>> counts = [sum(a[i] == 1 for a in training_attributes) for i in range(len(attribute_names))]
   >>> for n,c in zip(attribute_names, counts):
   ...   print("%6d of %d positives (%3.2f%%) for attribute %s" % (c, len(training_set), 100. * float(c) / float(len(training_set)), n))
    18177 of 162770 positives (11.17%) for attribute 5_o_Clock_Shadow
    43278 of 162770 positives (26.59%) for attribute Arched_Eyebrows
    83603 of 162770 positives (51.36%) for attribute Attractive
    33280 of 162770 positives (20.45%) for attribute Bags_Under_Eyes
     3713 of 162770 positives (2.28%) for attribute Bald
   ...
    68263 of 162770 positives (41.94%) for attribute Male
   ...
   126788 of 162770 positives (77.89%) for attribute Young

Try these commands out to get the full list of all 40 attributes.
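
The per-attribute count described above can also be tried without the database installed.
The following is a minimal sketch, not part of ``bob.db.celeba``: the helper ``positive_rates`` and the toy data are hypothetical and only mimic the ``+1`` / ``-1`` attribute vectors that :py:meth:`Database.attributes` is described as returning.

```python
def positive_rates(attribute_rows, attribute_names):
    """Return {name: fraction of rows in which the attribute equals +1}.

    ``attribute_rows`` is a list of equal-length sequences holding +1
    (attribute present) or -1 (attribute absent), one sequence per image.
    This helper is hypothetical and not part of bob.db.celeba.
    """
    if not attribute_rows:
        raise ValueError("need at least one attribute vector")
    total = len(attribute_rows)
    rates = {}
    for index, name in enumerate(attribute_names):
        # Count the images for which this attribute is marked as present.
        positives = sum(1 for row in attribute_rows if row[index] == 1)
        rates[name] = positives / total
    return rates


# Invented toy data standing in for [db.attributes(f) for f in training_set].
toy_names = ["Bald", "Male", "Young"]
toy_rows = [
    [-1, 1, 1],
    [-1, -1, 1],
    [-1, 1, 1],
    [1, 1, -1],
]

rates = positive_rates(toy_rows, toy_names)
for name in toy_names:
    # Multiply by 100 so that the printed number really is a percentage.
    print("%6.2f%% positives for attribute %s" % (100.0 * rates[name], name))
```

With the real database, the same helper could presumably be called on the list of training attribute vectors and ``db.attribute_names()``; the resulting fractions could then serve, for example, as a starting point for class weights when training an attribute classifier.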