Image Classification in a Noisy Fraudulent World - A Journey of Computational and Statistical Performance
YOW! Data 2019
Formbay's fraud detection system relies on classification of photographic evidence to verify solar installations. Over the last 10 years, Formbay has amassed over 10 million labelled images of solar installations. Image classification over Formbay's dataset sounds easy. Lots of data, apply neural networks and profit from automation! However, with such a large dataset, there is plenty of room for noise: mislabelled images, overlapping classes, corrupted image data, imbalanced classes, rotational variance and more.
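As an illustration of one of these noise issues, imbalanced classes are often handled by weighting each class inversely to its frequency. The sketch below is a generic example, not Formbay's method; the function name `inverse_frequency_weights` and the labels `"panel"` and `"inverter"` are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class so that rare classes count for more.

    Uses the common heuristic n_samples / (n_classes * class_count).
    This is an assumed technique for illustration only.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 8 images of one class, 2 of another.
labels = ["panel"] * 8 + ["inverter"] * 2
weights = inverse_frequency_weights(labels)
```

Weights like these would typically be passed to a weighted loss function, so that errors on the rare class cost more during training.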
This presentation demonstrates how we built our image processing pipeline to tackle these noise issues while addressing class and concept drift. First we'll examine the state of Formbay's data when we started, along with our initial model. Then we'll walk through each statistical and computational problem we met and how we decided to address it, slowly evolving our data pipeline over time.
This presentation focuses on the complexities of engineering production-ready ML systems, which involves balancing statistical performance ("how accurate") against computational performance ("how fast").
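To make the two kinds of performance concrete, a minimal sketch might measure both for the same classifier in one pass. The helper `evaluate` and the toy classifier are hypothetical and stand in for a real model; nothing here reflects Formbay's actual tooling.

```python
import time


def evaluate(predict, images, labels):
    """Measure accuracy (statistical) and time per image (computational).

    `predict` is any callable mapping one image to one label.
    """
    if not images or len(images) != len(labels):
        raise ValueError("images and labels must be non-empty and equal in length")

    start = time.perf_counter()
    predictions = [predict(image) for image in images]
    elapsed = time.perf_counter() - start

    correct = sum(p == y for p, y in zip(predictions, labels))
    return {
        "accuracy": correct / len(labels),
        "seconds_per_image": elapsed / len(images),
    }


# Toy stand-in for a model: classifies an "image" (here just a number) by threshold.
def toy_predict(image):
    return "panel" if image > 0.5 else "inverter"


report = evaluate(toy_predict, [0.9, 0.2, 0.7, 0.4], ["panel", "inverter", "inverter", "inverter"])
```

Tracking both numbers side by side makes the trade-off visible: a larger model may raise `accuracy` while also raising `seconds_per_image`.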