To make the best decisions, you need the best data. But what if that data doesn’t exist?
This was the problem faced by an early-stage Fintech looking to develop a new product that required a step change in the way individual identities are matched across multiple, disparate data siloes.
Smart Data Foundry were approached to help solve the challenge with our synthetic data. The task was to provide a dataset of individuals with varying iterations of their personal details, simulating disparate data held on multiple databases that could be out of date, poorly formatted, or simply incorrect.
Using our agent-based simulation data-generation product, we provided a synthetic dataset of 100,000 individuals, each with an average of 10 variations in their personal data, creating 1,000,000 rows of individual data. The data included multiple data ‘mutations’ (e.g. input error, transposition, truncation, concatenation, splitting, validation), replicating the complexity of real data.
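To make the idea of data ‘mutations’ concrete, here is a minimal Python sketch of three of the mutation types named above (transposition, truncation and input error) applied to a single record. It is an illustration only and not Smart Data Foundry's agent-based product: the function names, the field names and the `person_id` key are all hypothetical.

```python
import random

def transpose(value: str, rng: random.Random) -> str:
    """Swap two adjacent characters, e.g. 'Smith' -> 'Simth'."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]

def truncate(value: str, rng: random.Random) -> str:
    """Keep only a leading portion of the value."""
    if len(value) < 2:
        return value
    return value[: rng.randrange(1, len(value))]

def input_error(value: str, rng: random.Random) -> str:
    """Overwrite one character with a random lowercase letter."""
    if not value:
        return value
    i = rng.randrange(len(value))
    return value[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + value[i + 1:]

# Hypothetical subset of the mutation types listed in the article.
MUTATIONS = [transpose, truncate, input_error]

def make_variants(person_id, record, n, seed=0):
    """Return n mutated copies of `record`, each tagged with the
    ground-truth `person_id` so that matches can be verified later."""
    rng = random.Random(seed)
    fields = sorted(record)  # fixed order keeps the output reproducible
    rows = []
    for _ in range(n):
        variant = dict(record)
        field = rng.choice(fields)
        mutate = rng.choice(MUTATIONS)
        variant[field] = mutate(variant[field], rng)
        variant["person_id"] = person_id
        rows.append(variant)
    return rows

variants = make_variants(1, {"first_name": "Alice", "surname": "Smith"}, n=10)
```

Each variant differs from the original in one field, while the shared `person_id` records which individual it really belongs to.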
Because every record in the dataset carries a common identifier, the Fintech could match individuals across multiple data siloes and validate the results. This helped it test and improve its processes, and implement and iterate on error checking, validation, and imputation, all verified against ‘ground truth’ data.
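As a rough illustration of verifying a matcher against ‘ground truth’, the sketch below scores a naive name-similarity matcher using the common identifier. It is an assumption-laden toy and not the Fintech's proprietary algorithm: the `name` and `person_id` fields, the 0.8 threshold and the use of Python's standard `difflib.SequenceMatcher` are all choices made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def predict_pairs(rows, threshold=0.8):
    """Index pairs the naive matcher believes are the same person."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(rows), 2)
        if similarity(a["name"], b["name"]) >= threshold
    }

def true_pairs(rows):
    """Index pairs that share the ground-truth identifier."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(rows), 2)
        if a["person_id"] == b["person_id"]
    }

def precision_recall(predicted, actual):
    """Score predicted pairs against the ground-truth pairs."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(actual) if actual else 1.0
    return precision, recall

# Hypothetical records: the same two people as held in different siloes.
rows = [
    {"person_id": 1, "name": "Alice Smith"},
    {"person_id": 1, "name": "Alice Simth"},
    {"person_id": 2, "name": "Robert Jones"},
    {"person_id": 2, "name": "Rob Jones"},
]
precision, recall = precision_recall(predict_pairs(rows), true_pairs(rows))
```

Comparing every pair of rows is only practical for small samples; at the scale of 1,000,000 rows a real pipeline would need some way of narrowing down the candidate pairs first.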
This data is now being used to train and test the Fintech’s proprietary identity-matching algorithms, leading to new ways to identify individuals and to create better, cleaner data on which to base decisions for individuals.
David Tracy, Head of Data Science at Smart Data Foundry, said:
“This is a great example of the power of synthetic data. Not only did the real data not exist, but if it did, it would be locked away in separate siloes, behind difficult compliance and consent barriers, making it difficult for a Fintech to develop their new propositions.”
David continued: “We’re finding this is a common problem that we’re solving for innovators and start-ups, as well as more established Financial Institutions. Our agent-based approach to generating synthetic data doesn’t rely on having a real dataset to create a synthetic double. Instead, we can create integrated synthetic data that exactly matches the problem in hand, and empowers innovators to kickstart development and get to market with revolutionary new product propositions.”
To learn more about our Synthetic Data, get in touch with Dave Jennings to discuss how Smart Data Foundry can work with you.