Synthetic Data

At Smart Data Foundry we're passionate about maintaining people's privacy. Synthetic datasets help to achieve this: they have the same essential patterns and relationships as the corresponding real datasets but contain only artificial people and events. Using synthetic data allows innovators and researchers to work safely with data without exposing sensitive information.

From ‘Crude Approximations’ to ‘Convincing Synthetic Alternatives’

There are multiple ways to manufacture synthetic data. The simplest, most pragmatic approaches build datasets with similar statistical properties to the original (similar average, range and variance, for example) but otherwise fill them with random values.
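As a rough illustration of the simple end of this spectrum, the sketch below draws random values that approximately match the mean, spread and range of a single numeric column. It is a minimal example under stated assumptions, not a description of Smart Data Foundry's own tooling: the function name `crude_synthetic_column`, the choice of a normal distribution and the sample ages are all hypothetical.

```python
import random
import statistics


def crude_synthetic_column(real_values, n, seed=None):
    """Draw n random values roughly matching the mean, spread and range of real_values."""
    rng = random.Random(seed)
    mean = statistics.mean(real_values)
    stdev = statistics.stdev(real_values)
    low, high = min(real_values), max(real_values)
    # Assumption: a normal distribution is good enough for this column.
    # Each draw is clipped so it never falls outside the observed range.
    return [min(high, max(low, rng.gauss(mean, stdev))) for _ in range(n)]


# Made-up "real" values, for illustration only.
real_ages = [23, 35, 41, 29, 52, 47, 38, 31, 60, 44]
synthetic_ages = crude_synthetic_column(real_ages, n=1000, seed=0)

print("real mean:", statistics.mean(real_ages))
print("synthetic mean:", round(statistics.mean(synthetic_ages), 1))
```

Each column is generated on its own here, so any relationship between columns in the real data is lost. That is the main reason this kind of output is only a crude approximation.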

The most sophisticated approaches get machine learning algorithms ‘duelling’ with each other, an approach known as generative adversarial networks (GANs): one machine tries to create convincingly realistic artificial data, while its opponent tries to spot the fakes among the real data.

At Smart Data Foundry we test a range of methodologies in order to build the most effective and useful, yet protected, datasets to work with. The ideal synthetic dataset looks just like a different sample from the same underlying population.

Testing the fine line between privacy and utility

To have confidence in synthetic data before you start to use it, you need to be able to assure yourself of two things: does it look like the real thing, and is it safe to use?

First, you need to know that the synthetic data behaves as much as possible like the real data it mimics. Second, you need to minimise the risk that any aspect of any real individual’s data can leak through. Building the best synthetic data means being skilled in testing both of these aspects to a high degree of confidence.
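To make the two checks concrete, here is a minimal sketch of one very basic test for each. The function names `utility_gaps` and `exact_match_rate`, and the sample rows, are hypothetical and chosen only for illustration. Real evaluations rely on far richer measures of both utility and disclosure risk.

```python
import statistics


def utility_gaps(real_values, synthetic_values):
    """Absolute gaps in mean and standard deviation between two numeric columns."""
    return {
        "mean_gap": abs(statistics.mean(real_values) - statistics.mean(synthetic_values)),
        "stdev_gap": abs(statistics.stdev(real_values) - statistics.stdev(synthetic_values)),
    }


def exact_match_rate(real_rows, synthetic_rows):
    """Fraction of synthetic rows that are identical to some real row."""
    real_set = set(real_rows)
    matches = sum(1 for row in synthetic_rows if row in real_set)
    return matches / len(synthetic_rows)


# Made-up rows of (age, postcode area), for illustration only.
real_rows = [(34, "EH1"), (51, "G12"), (27, "AB10"), (45, "EH8")]
synthetic_rows = [(36, "EH1"), (51, "G12"), (29, "G12"), (44, "AB10")]

# Utility check: how far apart are the age columns?
gaps = utility_gaps([r[0] for r in real_rows], [s[0] for s in synthetic_rows])
# Privacy check: how many synthetic rows copy a real row exactly?
leak_rate = exact_match_rate(real_rows, synthetic_rows)

print(gaps, leak_rate)
```

Small gaps suggest the synthetic column behaves like the real one on these two statistics only. A non-zero match rate is a prompt to investigate rather than proof of a leak, since rows with few attributes can coincide by chance.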

Working at the leading edge

This is new territory and Smart Data Foundry is rapidly gaining strength and depth in synthetic data. Our team recently entered and won a global competition organised by the United Nations Economic Commission for Europe (UNECE) High-Level Group for the Modernisation of Official Statistics (HLG-MOS) to explore and learn techniques for generating synthetic data. You can read more about the team and their approach here.

Get involved

If you need to work with complex data structures in a safe and secure way, and would like to understand more about how we can create synthetic data structures for you, please get in touch with Bryn Coulthard.