Lake, Swamp or Puddle: Data Quality at Scale
YOW! Hong Kong 2017
Data is a powerful tool. Data-driven systems that use modern analytical and predictive techniques can offer significant improvements over static or heuristic-driven systems.
The questions are:
- How much can you trust your data? Data collection, processing and aggregation are challenging tasks.
- How do we build confidence in our data?
- Where did the data come from? How was it generated?
- What checks have been, or should be, applied?
- What is affected when it all goes wrong?
This talk looks at the mechanics of maintaining data quality at scale. It first looks at bad data: what it is and where it comes from. It then dives into the techniques required to detect, avoid and ultimately deal with bad data. By the end of the talk, the audience should come away with an idea of how to design quality data-driven systems that build confidence and trust rather than inflate expectations.
Mark Hibberd spends his time working on data and sustainability problems for Kinesis. Mark takes software development seriously. Valuing correctness and reliability, he is constantly looking to learn tools and techniques to support these goals. This approach has led to a history of building teams that utilise purely functional programming techniques to help deliver robust products.