Synthetic face recognition datasets are often generated using a face generator model. This raises an important question: do any of the images in a synthetic face recognition dataset contain important information from the training dataset that was used to train the face generator model in the first place?
We consider an exhaustive search approach that compares all possible pairs of images from the synthetic dataset and the training dataset of the generator model. To this end, we use an off-the-shelf face recognition model to extract a face embedding from each face image, and then compare the embeddings of every pair of images across the two datasets. We then sort the pairs of images by embedding similarity and consider the top-k pairs for visual comparison.
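The pairwise search described above can be sketched in a few lines. This is only a minimal illustration, not the released implementation: it assumes the embeddings have already been extracted by some face recognition model and stored as NumPy arrays, it assumes cosine similarity as the similarity measure, and the function name `top_k_similar_pairs` is hypothetical.

```python
import numpy as np


def top_k_similar_pairs(synthetic_emb, training_emb, k=5):
    """Return the k most similar (synthetic_idx, training_idx, similarity) triples.

    synthetic_emb: array of shape (n_synthetic, d), one embedding per synthetic image.
    training_emb:  array of shape (n_training, d), one embedding per training image.
    Similarity is cosine similarity (an assumption of this sketch).
    """
    # L2-normalize rows so that a dot product equals cosine similarity.
    syn = synthetic_emb / np.linalg.norm(synthetic_emb, axis=1, keepdims=True)
    trn = training_emb / np.linalg.norm(training_emb, axis=1, keepdims=True)

    # Exhaustive comparison: entry (i, j) scores synthetic image i against training image j.
    sim = syn @ trn.T

    # Sort all pairs by similarity, highest first, and keep the top k.
    k = min(k, sim.size)
    flat_order = np.argsort(sim, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, sim.shape)
    return [(int(r), int(c), float(sim[r, c])) for r, c in zip(rows, cols)]


if __name__ == "__main__":
    # Toy data standing in for real embeddings (hypothetical sizes).
    rng = np.random.default_rng(0)
    training = rng.normal(size=(1000, 512))
    synthetic = rng.normal(size=(200, 512))
    for syn_idx, trn_idx, score in top_k_similar_pairs(synthetic, training, k=3):
        print(syn_idx, trn_idx, round(score, 3))
```

The full similarity matrix has one entry per pair, so for large datasets the same idea would likely need to be applied in chunks, keeping a running top-k list, rather than materializing the whole matrix at once.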
The source code of our experiments, as well as metadata for sample leaked images, will be released soon.
@inproceedings{neurips2024unveiling,
author = {Hatef Otroshi Shahreza and S{\'e}bastien Marcel},
title = {Unveiling Synthetic Faces: How Synthetic Datasets Can Expose Real Identities},
booktitle = {NeurIPS Workshop on New Frontiers in Adversarial Machine Learning},
year = {2024}
}