
Building Rome Every Day - Scaling ML Model Building Infrastructure

YOW! Data 2019

"I want to reset my password". "I ordered the wrong size". "These are not the droids I was looking for". Every day, a support agent fields thousands of these queries. Multiply that by the thousands of agents a company might have, and the sheer volume of data being generated becomes hard to imagine. How can we make sense of it all? It seems a formidable task, but we have a formidable weapon in our arsenal: machine learning.

By combining deep learning, natural language processing and clustering techniques, we built a machine learning model that can take 100,000 tickets and efficiently cluster and summarise them into digestible topics. But that's only part of the challenge; we also had to scale it to build models for 30,000 customers, in production, every day.

In this talk I'll share the story of Content Cues, Zendesk's latest machine learning product. It's the story of how we leveraged the power of AWS Batch to scale a model-building platform. Of how we tackled challenges such as measuring how well an unsupervised model performs when it's not even clear what "well" means. Of how our team combined our pool of skills across data engineering, data science and product management to deliver a pipeline capable of building a thousand models for the price of a cup of coffee.

Dana Ma

Sr. Software Engineer



Dana is currently a Senior Software Engineer at Zendesk in the Data Products team, where she works on harnessing the power of data to help Zendesk's customers help their customers help themselves. Prior to that, she worked in London building pipelines and wrangling data in the financial sector. As a functional programming enthusiast with a background in pure mathematics, she's excited about the potential of data to help people, but also harbours a solid appreciation for the beautifully esoteric.