Working with Large Numbers of Non-Trivial ETL Pipelines
YOW! Data 2019
Data pipelines need to be flexible, modular and easily monitored. They are not set-and-forget systems: the team that monitors a pipeline might not have developed it and may not be experts on the dataset, yet end users must still have confidence in the output.
This talk is a practical walkthrough of a suggested pipeline architecture on AWS using Step Functions, Spot Instances, AWS Batch, Glue, Lambda and Datadog.
I'll be covering techniques using AWS and Datadog, but many of the approaches are applicable in an Apache Airflow/Kibana environment.
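To make the orchestration concrete, here is a minimal sketch of the kind of Amazon States Language (ASL) definition a Step Functions pipeline like this might use: a Glue job feeding an AWS Batch job on a Spot-backed queue, with a Lambda-based failure notification that a monitoring tool such as Datadog could alert on. All job names, queues and ARNs below are hypothetical placeholders, not resources from the talk.

```python
import json

# Sketch of an ASL state machine: Glue ETL -> Batch transform,
# with retries and a catch-all failure notification state.
definition = {
    "Comment": "Sketch: ETL pipeline orchestrated by Step Functions",
    "StartAt": "RunGlueJob",
    "States": {
        "RunGlueJob": {
            "Type": "Task",
            # ".sync" makes Step Functions wait for the Glue job to finish
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "example-etl-job"},
            "Retry": [
                {"ErrorEquals": ["States.ALL"],
                 "IntervalSeconds": 60,
                 "MaxAttempts": 2}
            ],
            "Catch": [{"ErrorEquals": ["States.ALL"],
                       "Next": "NotifyFailure"}],
            "Next": "RunBatchJob",
        },
        "RunBatchJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": "example-transform",
                "JobDefinition": "example-job-definition",
                # hypothetical job queue backed by Spot Instances
                "JobQueue": "example-spot-queue",
            },
            "Catch": [{"ErrorEquals": ["States.ALL"],
                       "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Task",
            # hypothetical Lambda that emits a failure event for monitoring
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:notify-failure",
            "End": True,
        },
    },
}

# Serialise to the JSON that would be passed to CreateStateMachine
print(json.dumps(definition, indent=2))
```

Keeping failure handling inside the state machine (Retry/Catch) rather than inside each job is what lets a team that didn't build the pipeline still see where and why it failed.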
Chief Data Architect
I have 15 years of software development experience, with the last ten spent working with medium to large datasets. I have a passion for exploration, optimisation and best practices. I enjoy developing and mentoring teams and experimenting with new technology. I prefer to be technology agnostic, focused on finding the right solution for each problem.