Join Newsletter

Cloud Performance Root Cause Analysis at Netflix

YOW! 2018 Brisbane

At Netflix, improving the performance of our cloud means happier customers and lower costs, and involves root cause analysis of applications, runtimes, operating systems, and hypervisors, in an environment of 150k cloud instances that undergo numerous production changes each week. Apart from the developers who regularly optimize their own code, we also have a dedicated performance team to help with any issue across the cloud, and to build tooling to aid in this analysis. In this session we will summarize the Netflix environment, procedures, and tools we use and build to do root cause analysis on cloud performance issues. The analysis performed may be cloud-wide, using self-service GUIs such as our open source Atlas tool, or focused on individual instances, and use our open source Vector tool, flame graphs, Java debuggers, and tooling that uses Linux perf, ftrace, and bcc/eBPF. You can use these open source tools in the same way to find performance wins in your own environment.
m

Brendan Gregg

Sr. Performance Architect

Netflix

United States

Brendan Gregg is an industry expert in computing performance and cloud computing. He is a senior performance architect at Netflix, where he does performance design, evaluation, analysis, and tuning. He is the author of multiple technical books including Systems Performance published by Prentice Hall, and received the USENIX LISA Award for Outstanding Achievement in System Administration. He has also worked as a kernel engineer, and as a performance lead on storage and cloud products. Brendan has created performance analysis tools included in multiple operating systems, and visualizations and methodologies for performance analysis, including flame graphs.