Automating Operations with Machine Learning
YOW! 2019 Melbourne
How much money would you save if AI could detect and fix your outages as soon as they happen? In a multi-billion-dollar business, outages are very expensive. MTTR has a direct effect on the bottom line, so every second counts in resolving issues. But with millions of metrics being generated by thousands of microservices, how do you choose which metrics to pay attention to? How do you make your alerts meaningful to avoid alert fatigue and desensitisation? How do you respond to those alerts in a timely manner?
In this talk, Matt covers how Expedia is using Machine Learning to "close the loop" involved in detecting, diagnosing and remediating outages post-release. You will learn how to use ML to build models for anomaly detection in metrics. You will also learn about "ML-Ops" and how to build a platform for training and deploying ML models.
Senior Software Development Engineer
Matt Callanan is a Senior Software Development Engineer at Expedia Group where he has worked in developer, lead, and management roles. He has 20 years of software experience in the finance, telecommunications, security, and travel industries. He is a well-received presenter to IT departments, executive teams, and DevOps conferences. His current focus is on Machine Learning and AI, where he is passionate about mentoring software engineers in the skills they need for the future.