# VFPAD

## Description

The in-Vehicle Face Presentation Attack Detection (VFPAD) dataset consists of bona-fide and 2D/3D attack presentations acquired for a subject (real or fake) in the driver's seat of a car. These presentations have been captured using an NIR camera (940 nm) mounted on the steering column of the car, while NIR illuminators have been fixed on both front pillars (adjacent to the windshield) of the car. The bona-fide videos represent 24 male and 16 female subjects of various ethnicities. The PAI (presentation attack instrument) species used to construct this dataset include photo prints, digital displays (for replay attacks), rigid 3D masks, and flexible 3D masks made of silicone.
## Data Collection
The videos comprising this dataset represent bona-fide and attack presentations under a range of variations:
- Environmental variations: presentations have been recorded in four sessions, each under different environmental conditions (outdoor sunny; outdoor cloudy; indoor dimly-lit; and indoor brightly-lit)
- Different scenarios: bona-fide presentations for each subject have been captured with a variety of appearances: with/without glasses, with/without hat, etc.
- Illumination variations: two illumination conditions have been used: ‘uniform’ (both NIR illuminators switched on), and ‘non-uniform’ (only the left NIR-illuminator switched on), and
- Pose variations: two poses (‘angles’) have been used: ‘front’: the subject looks ahead at the road; and ‘below’: the subject looks straight into the camera.
As Figure 1 shows, the camera is placed on the steering column, looking up at the subject’s face.
## Structure of the Dataset

Each presentation is recorded in a separate file in HDF5 format. The HDF5 files have the following internal directory structure:
/stream_0
/stream_0/recording_0
/stream_0/recording_1
The subdirectory recording_0 contains several frames that may be used for illumination-calibration. These frames represent a video, approximately 2 seconds long, that has been captured without the NIR 940nm illumination. Therefore, these frames capture the ambient natural light.
The subdirectory recording_1 contains frames of a 10-second long video, with the appropriate NIR illuminators switched on. These are the frames that are used for PAD experiments.
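The structure above can be read with standard HDF5 tooling. The sketch below, using h5py, is one possible way to load the frames of a recording; the names of the individual frame datasets inside each recording group are not documented here, so it simply iterates over the group's keys.

```python
# Minimal sketch for loading frames from a VFPAD HDF5 file.
# Assumes each frame is stored as a separate dataset under the
# recording groups; the actual per-frame dataset names are an
# assumption, so we iterate over whatever keys the group holds.
import h5py
import numpy as np

def load_frames(path, recording="recording_1"):
    """Return the frames of one recording as a list of arrays.

    recording_0 holds the ~2 s ambient-light calibration frames;
    recording_1 holds the 10 s NIR-illuminated frames used for PAD.
    """
    with h5py.File(path, "r") as f:
        group = f["stream_0"][recording]
        # Sort keys so frames come back in a deterministic order.
        return [np.asarray(group[k]) for k in sorted(group.keys())]
```

Passing `recording="recording_0"` instead returns the ambient-light calibration frames.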
## Overall Statistics

| | Number of videos |
|---|---|
| bona-fide | 4046 |
| PA | 1790 |
| Total | 5836 |
The dataset is divided into two folders: bf and pa, each of which contains one sub-folder per client (real subject or PAI). All recordings for a given client are stored, as HDF5 files, in the corresponding sub-folder. The filename encodes information about the type of presentation recorded, in the following format:
<presentation-type>_<session-id>_<angle-id>_<illumination-id>_<client-id>_<presenter-id>_<type-id>_<sub-category-id>_<pai-id>_<trial-id>.hdf5
The description of each field is provided below:

| # | Component | Length | Description |
|---|---|---|---|
| 1 | presentation-type | 2 char | bf or pa: string indicating whether the corresponding sample is bona-fide or a PA. |
| 2 | session-id | 2 digits | 01, 02, 03, or 04: indicates the session (S1, S2, S3, or S4, respectively) in which the data was captured. |
| 3 | angle-id | 1 digit | 1 or 2: indicates the angle between camera and face (below: 1; front: 2). |
| 4 | illumination-id | 1 digit | 1 or 2: indicates the light distribution over the face (non-uniform: 1; uniform: 2). |
| 5 | client-id | 4 digits | The identity assigned to the bona-fide subject or PAI in front of the camera. For bona-fide subjects, arbitrary numerical identities from 0001 to 0040 have been used. For PAIs, arbitrary strings have been used to create identities for each PAI. |
| 6 | presenter-id | 4 digits | Redundant in the present version of the dataset. Indicates who is presenting the face (real or fake) to the camera: 0000 for bf and 0001 for pa, in every file. |
| 7 | type-id | 2 digits | 00, 01, 02, 03, or 04: indicates the main category of presentation (bona-fide, 2D print attack, 2D replay attack, 3D silicone mask, and 3D rigid mask, respectively). |
| 8 | sub-category-id | 2 digits | Indicates the sub-category of the main category given by type-id. See the table below for the sub-category-id values for each type-id. |
| 9 | pai-id | 3 digits | A unique number assigned to each presentation attack instrument. For bona-fide presentations this number is always 000. |
| 10 | trial-id | 8 digits | An arbitrary numeric string that distinguishes separate captures of the same presentation under exactly the same recording scenario. |
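Since the ten fields are underscore-separated, a filename can be decoded with a few lines of Python. The helper below is a hypothetical convenience, not part of any official dataset tooling; the field names are chosen to match the table above.

```python
# Hypothetical helper that decodes a VFPAD filename into its ten
# underscore-separated fields, following the format described above.
import os

FIELDS = [
    "presentation_type", "session_id", "angle_id", "illumination_id",
    "client_id", "presenter_id", "type_id", "sub_category_id",
    "pai_id", "trial_id",
]

def parse_vfpad_name(filename):
    """Split a VFPAD .hdf5 filename into a dict of its ten fields."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))
```

For example, a bona-fide capture from session 1 would parse as `parse_vfpad_name("bf_01_1_2_0007_0000_00_03_000_00000042.hdf5")["presentation_type"] == "bf"` (the client, pai, and trial values here are illustrative).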
The details of sub-category-id are provided in the table below:

| Type ID | Sub-Category ID | Description |
|---|---|---|
| 00 (bona-fide) | 00 | Natural (no glasses or hat) |
| | 01 | Medical glasses (wherever applicable) |
| | 02 | Clear glasses |
| | 03 | Sunglasses |
| | 04 | Hat (no glasses) |
| | 05 | Hat + clear glasses |
| | 06 | Hat + sunglasses |
| 01 (Print) | 01 | Matte paper on laser printer |
| | 02 | Glossy paper on laser printer |
| | 03 | Matte paper on inkjet printer |
| | 04 | Glossy paper on inkjet printer |
| 02 (Replay attack) | 00 | – |
| 03 (3D silicone masks) | 00 | Generic flexible mask (G-Flex-3D-Mask) |
| | 01 | Custom flexible mask (C-Flex-3D-Mask) |
| 04 (3D rigid masks) | 00 | Custom rigid mask 1 |
| | 02 | Custom rigid mask 2 |
| | 03 | Custom rigid mask 3 |
| | 04 | Custom rigid mask 4 |
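For programmatic use, the two code tables above can be transcribed into plain lookup dictionaries. These mappings are a convenience transcription of the tables, not part of any official API:

```python
# Lookup tables transcribed from the type-id / sub-category-id
# tables above, mapping numeric codes to readable descriptions.
TYPE_ID = {
    "00": "bona-fide",
    "01": "2D print attack",
    "02": "2D replay attack",
    "03": "3D silicone mask",
    "04": "3D rigid mask",
}

SUB_CATEGORY = {
    ("00", "00"): "Natural (no glasses or hat)",
    ("00", "01"): "Medical glasses",
    ("00", "02"): "Clear glasses",
    ("00", "03"): "Sunglasses",
    ("00", "04"): "Hat (no glasses)",
    ("00", "05"): "Hat + clear glasses",
    ("00", "06"): "Hat + sunglasses",
    ("01", "01"): "Matte paper on laser printer",
    ("01", "02"): "Glossy paper on laser printer",
    ("01", "03"): "Matte paper on inkjet printer",
    ("01", "04"): "Glossy paper on inkjet printer",
    ("02", "00"): "Replay attack",
    ("03", "00"): "Generic flexible mask (G-Flex-3D-Mask)",
    ("03", "01"): "Custom flexible mask (C-Flex-3D-Mask)",
    ("04", "00"): "Custom rigid mask 1",
    ("04", "02"): "Custom rigid mask 2",
    ("04", "03"): "Custom rigid mask 3",
    ("04", "04"): "Custom rigid mask 4",
}
```

Note that the rigid-mask sub-categories skip 01, mirroring the table.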
## Experimental Protocol

The reference publication considers the experimental protocol named grandtest. For frame-level evaluation, 20 frames from each video have been used, except for print attacks. Because the VFPAD dataset contains relatively few print-attack videos, the grandtest protocol uses 80 frames per print-attack video to give print attacks a fair representation during experimentation. For the grandtest protocol, the videos were divided into fixed, disjoint groups: train, dev, and eval. Each group consists of a unique subset of subjects (subjects in one group are not present in the other two).
Details of the grandtest protocol are summarized below (the split ratio is each partition's share of its own class, bona-fide or PA):

| Partition | #Videos | Split ratio (%) |
|---|---|---|
| train bona-fide | 1503 | 37.15 |
| train PA | 595 | 33.24 |
| dev bona-fide | 1247 | 30.82 |
| dev PA | 666 | 37.20 |
| eval bona-fide | 1296 | 32.03 |
| eval PA | 529 | 29.56 |
| Total | 5836 | |
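The counts in the table can be cross-checked with a few lines of Python; the split-ratio column works out as each partition's percentage of its own class:

```python
# Sanity check of the grandtest split: per-class totals and the
# percentage of each class assigned to train/dev/eval.
counts = {
    ("train", "bf"): 1503, ("train", "pa"): 595,
    ("dev",   "bf"): 1247, ("dev",   "pa"): 666,
    ("eval",  "bf"): 1296, ("eval",  "pa"): 529,
}

# Per-class totals: 4046 bona-fide and 1790 PA videos.
totals = {
    "bf": sum(n for (_, c), n in counts.items() if c == "bf"),
    "pa": sum(n for (_, c), n in counts.items() if c == "pa"),
}

# Split ratio within each class, in percent.
ratios = {k: 100.0 * n / totals[k[1]] for k, n in counts.items()}
```

The per-class totals sum to the overall 5836 videos, matching the statistics above.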
## Citation

If you use this dataset, please cite the following publication:
@article{IEEE_TBIOM_2021,
  author    = {Kotwal, Ketan and Bhattacharjee, Sushil and Abbet, Philip and Mostaani, Zohreh and Wei, Huang and Wenkang, Xu and Yaxi, Zhao and Marcel, S\'{e}bastien},
  title     = {Domain-Specific Adaptation of CNN for Detecting Face Presentation Attacks in NIR},
  journal   = {IEEE Transactions on Biometrics, Behavior, and Identity Science},
  publisher = {{IEEE}},
  year      = {2022},
  volume    = {4},
  number    = {1},
  pages     = {135--147},
  doi       = {10.1109/TBIOM.2022.3143569}
}