SecretFan: Synthesizing Realistic Data without Breaking Privacy

arXiv CS, Monday 08 June 2026, 04:00 UTC. By Laura Plein, Alexi Turcotte, Arina Hallemans, Andreas Zeller

Key Points

arXiv:2602.05833v2. Abstract: There is a need for synthetic training and test datasets that replicate the statistical distributions of original datasets without compromising their confidentiality. Much research has leveraged Generative Adversarial Networks (GANs) for synthetic data generation; however, the resulting models are either not accurate enough or are still vulnerable to membership inference attacks (MIA) or dataset reconstruction attacks, since the original data has been leveraged in the training process. In this paper, we frame synthetic data generation as a guided test generation, or search-based testing, problem rather than a purely generative modeling task. Ours is a search-based, adequacy-guided input generation technique inspired by GANs, with a generation step and a discrimination step; as in a GAN, discrimination uses a discriminator model trained on the data, but instead of also using a model for generation, we use a fuzzer. This way, the original (private) data is only indirectly leveraged in the generation process, and by evolving samples and determining "good samples" with the discriminator, we can generate privacy-preserving data that follows the same statistical distributions as the original dataset, leading to utility similar to that of the original data. We evaluated our approach on eight datasets that have been used to evaluate state-of-the-art techniques, finding that synthetic data generated with our technique achieves good utility on average while also having good similarity scores, highlighting the potential of a mixed approach leveraging classical generation and model-driven discrimination for generating privacy-preserving, useful synthetic datasets.
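The generate-and-discriminate loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the per-column Gaussian scoring function, the acceptance threshold, the mutation widths, the stand-in "private" table, and the names `train_discriminator` and `fuzz` are all assumptions made here for illustration. The only property it tries to show is that the generator (a simple mutation-based fuzzer) never reads a private row and learns about the data solely through the discriminator's score.

```python
import random
import statistics


def train_discriminator(private_rows):
    """Fit per-column mean and stdev; return a realism score (higher is better)."""
    columns = list(zip(*private_rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]

    def score(row):
        # Negative squared z-distance from the column means (assumed toy model).
        return -sum(((v - m) / s) ** 2 for v, m, s in zip(row, means, stdevs))

    return score


def fuzz(score, threshold, n_samples, n_features, rng, max_rounds=20_000):
    """Mutate random seeds and collect candidates the discriminator accepts."""
    # Seeds are drawn blindly; the fuzzer never reads a private row.
    population = [[rng.uniform(0, 100) for _ in range(n_features)]
                  for _ in range(20)]
    accepted = []
    for _ in range(max_rounds):
        if len(accepted) >= n_samples:
            break
        i = rng.randrange(len(population))
        parent = population[i]
        child = [v + rng.gauss(0, 4) for v in parent]
        child_score = score(child)
        # Keep the child if it is realistic or at least improves on its parent.
        if child_score >= threshold or child_score >= score(parent):
            population[i] = child
        if child_score >= threshold:
            accepted.append(child)
    return accepted


rng = random.Random(0)
# Hypothetical stand-in for a private table with two numeric columns.
private = [[rng.gauss(40, 8), rng.gauss(60, 8)] for _ in range(500)]
score = train_discriminator(private)
# Calibrate the cutoff so that roughly 90% of the real rows would pass.
threshold = sorted(score(row) for row in private)[len(private) // 10]
synthetic = fuzz(score, threshold, n_samples=200, n_features=2, rng=rng)
```

A simple accept-or-reject rule like this only guarantees that each synthetic row looks plausible to the discriminator; it does not by itself reproduce the shape of the original distribution, which is the harder problem the paper's adequacy-guided search targets.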


Originally published by arXiv CS
