Respecting privacy with synthetically generated "look-alike" data sets

YOW! Data 2018

Safely handling data that contains sensitive or private information about people is a multi-million dollar problem at many companies. It adds time to the data engineering process, can cost a lot in software licenses for specialised tools, and brings a range of reputational and legal risks.

Recent advances in deep learning have prompted an interesting way to attack this problem. By fitting a certain class of model to a source data set that contains sensitive information, we can produce a generator that outputs a supply of synthetic "look-alike" data. This output data preserves many of the statistical relationships between fields found in the source, and the approach offers mathematical guarantees around the identifiability of individuals in the source data set.
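
To make the fit-then-generate workflow concrete, here is a minimal sketch in Python. It is not the deep learning method the talk covers: as a stand-in it fits a simple two-field Gaussian model, and it provides none of the privacy guarantees mentioned above. The field names (age, income), the data, and the helper functions `fit_gaussian` and `sample` are all hypothetical and exist only for illustration.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Estimate means, standard deviations and correlation of two numeric fields."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (len(rows) - 1)
    return {"mean": (mean_x, mean_y), "sd": (sd_x, sd_y), "corr": cov / (sd_x * sd_y)}


def sample(model, n, rng):
    """Draw n synthetic rows from the fitted bivariate Gaussian."""
    (mx, my), (sx, sy), rho = model["mean"], model["sd"], model["corr"]
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mix z1 into y so the synthetic fields share the fitted correlation.
        y = my + sy * (rho * z1 + residual * z2)
        out.append((x, y))
    return out


rng = random.Random(0)

# Hypothetical "sensitive" source data: age and a correlated income figure.
source = []
for _ in range(2000):
    age = rng.gauss(40, 10)
    income = 1000 * age + rng.gauss(0, 8000)
    source.append((age, income))

model = fit_gaussian(source)            # fit on the sensitive data
synthetic = sample(model, 2000, rng)    # generate look-alike rows
synthetic_model = fit_gaussian(synthetic)

print("source correlation:   ", round(model["corr"], 3))
print("synthetic correlation:", round(synthetic_model["corr"], 3))
```

No synthetic row is copied from the source, yet the relationship between the two fields carries over. A real system would swap the Gaussian for a richer generative model and add a formal privacy mechanism during fitting.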

This talk will provide an overview of the approach and show how it can speed up data engineering work and reduce risk.

Tim Garnsey


Verge Labs


Tim is a Director at Verge Labs, a new type of AI company focused on the applied side of machine learning. At Verge, Tim sifts through the daily firehose of theoretical research and builds a curated library of techniques and utilities ready to be switched on at companies today.
Prior to this, Tim spent six years absorbed in machine learning and data-related roles in financial services tech companies including Atlassian, Spaceship and Airtasker.