The Three-Rs of Data-Science - Repeatability, Reproducibility, and Replicability
YOW! Data 2019
Adoption of data-science in industry has been phenomenal over the last five years. The primary focus of this adoption has been on combining the three dimensions of machine-learning, i.e. the ‘data’, the ‘model architecture’ and the ‘parameters’, to predict an outcome. A slight change in any of these dimensions has the potential to skew the predicted outcomes. So how do we build trust in our models? How do we manage the variance across multiple models trained on varied sets of data, model architectures and parameters? And why might the three Rs, i.e. “Repeatability, Reproducibility, and Replicability”, be relevant to industry applications of data-science?
This talk has the following goals:
- Justify (with demonstrations) why “Repeatability, Reproducibility, and Replicability” are important in data-science, even when the work goes beyond experimental research and is geared towards industry applications.
- Discuss in detail the requirements for ensuring “Repeatability, Reproducibility, and Replicability” in data-science.
- Discuss ways to observe repeatability, reproducibility, and replicability with provenance and automated model management.
- Present various approaches and available tooling for provenance and model management, and compare and contrast them.
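As a rough illustration of the provenance idea in the goals above, the sketch below fingerprints the three dimensions of a training run (data, model architecture, parameters) so that two runs can be checked for identical inputs. It is a minimal sketch, not a tool presented in the talk: the `fingerprint` function, the toy data rows and the example architecture and parameter values are all hypothetical, and only the Python standard library is used.

```python
import hashlib
import json


def fingerprint(data_rows, architecture, parameters):
    """Return a SHA-256 hex digest over the three inputs of a training run.

    All three arguments are hypothetical stand-ins and must be
    JSON-serializable (lists, dicts, numbers, strings).
    """
    record = {
        "data": data_rows,
        "architecture": architecture,
        "parameters": parameters,
    }
    # sort_keys makes the serialization independent of dict insertion order,
    # so logically identical inputs always yield the same digest.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative values only.
data = [[1.0, 2.0], [3.0, 4.0]]
arch = {"type": "mlp", "layers": [8, 1]}
params = {"lr": 0.01, "seed": 42}

baseline = fingerprint(data, arch, params)
repeat = fingerprint(data, arch, params)
changed = fingerprint(data, arch, {"lr": 0.02, "seed": 42})

print("repeat matches baseline:", repeat == baseline)
print("changed matches baseline:", changed == baseline)
```

Storing such a digest next to each trained model is one simple way to tell whether a difference in predictions can be traced to a change in the data, the architecture or the parameters. Dedicated model-management tooling typically records far more than this.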
Sr. Data Scientist
A software professional with experience in product design, development and leading teams. Enjoys architecting and developing next-generation, highly scalable, fail-fast distributed platforms that incorporate the latest engineering techniques and machine-learning models to make a positive difference to the overall end-user experience.