By representing a synthetic dataset as a set of reference embeddings on the identity hypersphere, we can ask: “How should reference embeddings cover the identity hypersphere?” To answer this question, recall that the distances between reference embeddings reflect the inter-class variation in the synthetic face recognition dataset. Since we want high inter-class variation in the generated dataset, we need to maximize the distances between reference embeddings. We solve this optimization problem with an iterative approach based on gradient descent, and then use a face generator model to generate the HyperFace synthetic dataset.
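To make the idea of spreading reference embeddings concrete, the following is a minimal sketch and not the exact HyperFace objective. It assumes a simplified loss (the negative sum of each embedding's distance to its nearest neighbor) and applies plain gradient descent followed by re-projection onto the unit hypersphere. The function names `nearest_neighbor_distances` and `spread_reference_embeddings`, the step size, and the toy dimensions are illustrative assumptions. Any regularization and the face generator used in the paper are omitted.

```python
import numpy as np


def nearest_neighbor_distances(x):
    """Euclidean distance from each row of x to its closest other row."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # ignore self-distances
    return dist.min(axis=1)


def spread_reference_embeddings(x0, steps=300, lr=0.01, eps=1e-12):
    """Push unit-norm embeddings apart by projected gradient descent.

    Assumed toy loss: -sum_i ||x_i - x_nn(i)||, where nn(i) is the
    nearest neighbor of x_i. This is a simplification for illustration.
    """
    x = x0 / np.linalg.norm(x0, axis=1, keepdims=True)
    idx = np.arange(len(x))
    for _ in range(steps):
        diff = x[:, None, :] - x[None, :, :]           # (n, n, dim)
        dist = np.linalg.norm(diff, axis=2)
        np.fill_diagonal(dist, np.inf)
        nn = dist.argmin(axis=1)                       # nearest neighbor index
        # Unit vector pointing from each nearest neighbor towards x_i.
        away = diff[idx, nn] / (dist[idx, nn][:, None] + eps)
        grad = -away                                   # d(loss)/d(x_i), own term
        np.add.at(grad, nn, away)                      # d(loss)/d(x_nn(i))
        x = x - lr * grad                              # gradient descent step
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # back onto the sphere
    return x


# Toy example: 64 identities in a 16-dimensional embedding space.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(64, 16))
x0 /= np.linalg.norm(x0, axis=1, keepdims=True)
x = spread_reference_embeddings(x0)
print("mean nearest-neighbor distance before:", nearest_neighbor_distances(x0).mean())
print("mean nearest-neighbor distance after: ", nearest_neighbor_distances(x).mean())
```

In this sketch, each optimized row would play the role of one reference embedding, that is, one synthetic identity, which a face generator conditioned on embeddings could then turn into images.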
The following table compares the performance of face recognition models trained on our generated datasets with that of models trained on all publicly available versions (in particular, the larger-scale ones) of synthetic datasets in the literature. As the results show, at scale our generated datasets achieve performance competitive with these existing synthetic datasets.
The source code of our experiments, as well as different versions of the HyperFace dataset, is publicly available:
@inproceedings{shahreza2025hyperface,
title={HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere},
author={Hatef Otroshi Shahreza and S{\'e}bastien Marcel},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025}
}