Lake, Swamp or Puddle: Data Quality at Scale
YOW! Hong Kong 2017
Data is a powerful tool. Data-driven systems leveraging modern analytical and predictive techniques can offer significant improvements over static or heuristic-driven systems.
The question is: how much can you trust your data? Data collection, processing and aggregation are challenging tasks, and they raise further questions:
- How do we build confidence in our data?
- Where did the data come from?
- How was it generated?
- What checks have been, or should be, applied?
- What is affected when it all goes wrong?
This talk looks at the mechanics of maintaining data quality at scale. It first examines bad data: what it is and where it comes from. It then dives into the techniques required to detect, avoid and ultimately deal with bad data. By the end of the talk, the audience should come away with an idea of how to design quality data-driven systems that build confidence and trust rather than inflate expectations.