Creating meaningful data visualisation from 5m de-identified bank accounts
Lise Wall, Senior Data Scientist, and Lucy Lloyd, Product Manager.
How do we make sure our data insights can have immediate impact upon viewing?
When dealing with large-scale datasets like our 5m de-identified current accounts, it’s imperative that we let the data do the talking as clearly as possible.
At Smart Data Foundry we use this complex and rich source data to create curated datasets and tools which make it easier for people to derive insights from this novel data source.
Map-based data visualisation
The Economic Wellbeing Explorer is our flagship tool. It is a map-based platform that enables users to zoom between national, regional and local insights and to apply contextual data to our financial indicators.
To make the data in the Explorer easier to interrogate, we need to think about how best to represent it visually. A common approach is to encode values by size or colour, which users can read instinctively. When users launch the dashboard, we want them to see at a glance the wealth of information available to them, and to feel excited about the potential of these insights to inform data-driven decisions that improve people’s lives.
However, different datasets might play by different rules: some do not form beautiful normal distribution curves, and others cluster together at different points across their range. We need to ensure our users understand that these data variations could have a significant impact on the lives of the people they represent and, alongside our financial indicators, can help influence changes to services and policies.
The data story needs to be clear.
Displaying contextual data
We will soon expand the Economic Wellbeing Explorer to cover communities across all of Scotland, Wales and England. Expanding the data to cover three nations within Great Britain of course has challenges. One we’ve faced is that some of the contextual datasets we display have long-tailed distributions, where the values for some areas are much higher or much lower than the median for all of GB. An example is average house price data, where the values for London are much higher than the median house prices for the rest of GB.
The solution we originally landed on was to use a linear colour scale, where a direct relationship exists between a numerical value and the intensity or lightness of the related colour.
(The national distribution being skewed by the central London house prices)
From the Scottish data it is clear where there are clusters of more expensive housing. Aligned with our Living Beyond Means financial metric, we can see how poverty might lurk within neighbourhoods perceived as wealthier.
However, when you want to look at an area such as Middlesbrough, this linear colour scale is of little use: the variation in local house prices isn’t visible because they are being compared to multi-million pound houses elsewhere in the country. This is because the linear colour scale is divided into a number of equally spaced buckets between the minimum and maximum house price.
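To illustrate why equally spaced buckets hide local variation, here is a minimal Python sketch. The prices are made-up numbers, and the helper names `linear_edges` and `bucket_index` are hypothetical; this is not the Explorer's actual code.

```python
from bisect import bisect_right


def linear_edges(lo, hi, n_buckets):
    """Return n_buckets + 1 equally spaced edges between lo and hi."""
    step = (hi - lo) / n_buckets
    return [lo + i * step for i in range(n_buckets + 1)]


def bucket_index(value, edges):
    """0-based index of the colour bucket that value falls into."""
    # Clip so that the maximum value lands in the last bucket.
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


# Illustrative prices in GBP (assumed, not real data):
# six typical areas plus one very expensive outlier.
prices = [95_000, 120_000, 140_000, 160_000, 185_000, 210_000, 2_500_000]

edges = linear_edges(min(prices), max(prices), 5)
buckets = [bucket_index(p, edges) for p in prices]
print(buckets)  # [0, 0, 0, 0, 0, 0, 4]
```

Every typical area falls into the first bucket and so receives the same colour, while the outlier alone occupies the top bucket and the three buckets in between stay empty.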
The solution
To address this, our Senior Data Scientist Lise Wall started to explore different non-linear scales, to find out how our users could make best use of this valuable dataset.
First, she tried a percentile-based scale. In this scale the colour buckets are spaced so that each contains an equal number of data points. This improved the visuals overall, but produced a concentration of similar colours within London.
Lise then tried a log scale. In this scale the spacing between the buckets gets exponentially larger as the prices get higher. This allowed a better distribution of colours across populations with different clustering of house prices. This scale was parameterised to control the rate of growth, and a grid search across parameters allowed us to select the scale that would be most informative to the user.
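A percentile-based scale can be sketched with Python's standard library as follows. The prices are again made-up numbers and `percentile_edges` is a hypothetical helper, so treat this as an illustration of the idea rather than the method used in the Explorer.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def percentile_edges(values, n_buckets):
    """Edges chosen so each bucket holds roughly the same number of points."""
    cuts = quantiles(values, n=n_buckets, method="inclusive")
    return [min(values)] + cuts + [max(values)]


def bucket_index(value, edges):
    """0-based index of the colour bucket that value falls into."""
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


# Illustrative prices in GBP (assumed): eight typical areas, two very expensive ones.
prices = [95_000, 120_000, 140_000, 160_000, 185_000,
          210_000, 240_000, 280_000, 900_000, 2_500_000]

edges = percentile_edges(prices, 5)
counts = Counter(bucket_index(p, edges) for p in prices)
print(sorted(counts.items()))  # two data points in each of the five buckets
```

Each bucket now holds the same number of areas, so the typical prices are spread across several colours. The cost is that the top bucket stretches across the whole expensive tail: in this toy data the 900,000 and 2,500,000 areas share one colour, which mirrors the concentration of similar colours seen within London.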
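The article does not give the exact parameterisation or the criterion used in the grid search, so the sketch below makes two assumptions. Bucket widths grow by a constant factor `growth` from one bucket to the next, and candidate scales are scored by the entropy of bucket occupancy, where higher means the colours are used more evenly. Both choices, and all names and prices, are hypothetical stand-ins.

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def geometric_edges(lo, hi, n_buckets, growth):
    """Edges whose bucket widths grow by a factor of `growth` (> 1) each step."""
    total = growth ** n_buckets - 1
    return [lo + (hi - lo) * (growth ** i - 1) / total
            for i in range(n_buckets + 1)]


def bucket_index(value, edges):
    """0-based index of the colour bucket that value falls into."""
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def occupancy_entropy(values, edges):
    """Assumed score: Shannon entropy (bits) of how values spread over buckets."""
    counts = Counter(bucket_index(v, edges) for v in values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


# Illustrative prices in GBP (assumed, not real data).
prices = [95_000, 120_000, 140_000, 160_000, 185_000,
          210_000, 240_000, 280_000, 900_000, 2_500_000]
lo, hi, n_buckets = min(prices), max(prices), 5

# Grid search over a small, arbitrary set of candidate growth rates.
candidates = [1.5, 2.0, 3.0, 8.0]
scores = {g: occupancy_entropy(prices, geometric_edges(lo, hi, n_buckets, g))
          for g in candidates}
best_growth = max(scores, key=scores.get)
print(best_growth, round(scores[best_growth], 3))
```

With this toy data a moderate growth rate scores best: too small and the scale behaves like the linear one, too large and most areas are squeezed into one middle bucket. A real selection criterion would likely also reflect how readable the map is for users, which a single number cannot fully capture.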
Summary
By optimising the scales we use for housing contextual data, we let our users see house price distribution across different neighbourhoods at a glance. This helps policymakers working at a more local level to understand the stresses different populations might be experiencing, whatever their context. This scaling was then introduced for some of our other contextual variables which suffered from the same challenges, such as GP driving time.
Our aim is to make the data we use speak clearly to those who can use it to make a difference to people’s lives.


