Join Newsletter

Lake, Swamp or Puddle: Data Quality at Scale

YOW! Hong Kong 2017

Data is a powerful tool. Data-driven systems leveraging modern analytical and predictive techniques can offer significant improvements over static or heuristic driven systems.

The question is:

  • How much can you trust your data? Data collection, processing and aggregation is a challenging task.
  • How do we build confidence in our data? Where did the data come from?
  • How was it generated? What checks have or should be applied?
  • What is affected when it all goes wrong?

This talk looks at the mechanics of maintaining data-quality at scale. Firstly looking at bad-data, what it is and where it comes from. Then diving into the techniques required to detect, avoid and ultimately deal with bad-data. At the end of this talk the audience should come away with an idea of how to design quality data-driven systems that ultimately build confidence and trust rather than inflate expectations.