Kalyan is a Machine Learning Specialist at S&P Global Market Intelligence with 5 years of experience working at the intersection of Data Science & Automation for various domains. His work involves developing data extraction models and building human-in-the-loop pipelines. He spends his spare time listening to podcasts of Data Skeptic, Joe Rogan and Naval Ravikant.
Beyond the usual concerns in software development, machine learning development comes with additional challenges. These include trying multiple algorithms and parameters to get the best results, tracking these runs for reproducibility, and moving the model to diverse deployment environments. This talk demonstrates the use of an open-source platform called MLflow for managing the complete machine learning lifecycle with Python. The talk requires a basic understanding of Python and Machine Learning concepts.
In theory, the crux of machine learning (ML) development lies in data collection, model creation, model training, and deployment. In reality, machine learning projects are not so straightforward. They are a never-ending cycle of improving the data, the model, and the evaluation. Unlike in traditional software development, ML developers experiment with multiple algorithms, tools, and parameters to optimize performance, and they need to track these experiments to reproduce their work. Furthermore, developers need to use many distinct systems to productionize models.
In this talk, we introduce MLflow, an open-source platform that aims to simplify the entire ML lifecycle. It lets us use any ML library and development tool of our choice to reliably build and share ML applications. MLflow offers simple abstractions through lightweight APIs to package reproducible projects, track results, and encapsulate models in a form compatible with existing tools, thereby accelerating the ML lifecycle for projects of any size.
With the help of an example, we will show how MLflow can ease bookkeeping of experiment runs and results across frameworks, quickly reproduce runs on any platform (cloud or local execution), and productionize models on diverse deployment tools.
At the end of this talk, you will be familiar with:
- Key concepts, abstractions, and components of open-source MLflow
- How each component of MLflow addresses the challenges of the ML lifecycle
- How to use MLflow Tracking during model training to record experimental runs
- How to use the MLflow Tracking user interface to visualize experimental runs with different tuning parameters and evaluation metrics
- How to use MLflow Projects for packaging reusable and reproducible models
- How to use the general MLflow Models format to serve models through the MLflow REST API
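To give a flavor of the Tracking component mentioned above, here is a minimal sketch of recording experimental runs. It is not taken from the talk: the toy one-parameter model and the helper names (`fit_ridge_slope`, `rmse`, `run_experiment`) are hypothetical and only stand in for a real training step. The only MLflow calls used are `mlflow.start_run`, `mlflow.log_param`, and `mlflow.log_metric`.

```python
import math


def fit_ridge_slope(xs, ys, alpha):
    """Fit y = slope * x with an L2 penalty `alpha` (closed form, no intercept).

    Hypothetical toy model, used only so there is a parameter to tune.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def run_experiment(xs, ys, alphas):
    """Train once per `alpha` and record each attempt as a separate MLflow run."""
    # Imported here so the helpers above stay usable without MLflow installed.
    import mlflow

    results = {}
    for alpha in alphas:
        slope = fit_ridge_slope(xs, ys, alpha)
        error = rmse(ys, [slope * x for x in xs])
        # Each `with` block opens a new run and closes it on exit.
        with mlflow.start_run():
            mlflow.log_param("alpha", alpha)   # the tuning parameter
            mlflow.log_metric("rmse", error)   # the evaluation metric
        results[alpha] = error
    return results
```

Assuming MLflow is installed, calling `run_experiment([1, 2, 3], [2, 4, 6], [0.0, 1.0, 10.0])` would record three runs, one per value of `alpha`. With no tracking server configured, MLflow writes them to a local store, and running the `mlflow ui` command starts a local web interface where the runs can be compared side by side.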
The purpose of the session is to introduce the audience to MLflow and give a taste of the ML development lifecycle. It is intended to provide a survey of the MLflow platform that favors breadth over depth, and we leave the audience to experiment with it further through takeaway exercises.
- Basic knowledge of Python programming language
- Basic understanding of machine learning concepts
Data Science in Production, Machine Learning, Data Engineering or MLOps