Karishma is a Machine Learning Specialist at S&P Global Market Intelligence, India. Her work focuses on developing data extraction solutions that use machine learning and natural language processing and generate significant client revenue. She comes from an engineering background and has about 4 years of experience in data science. When she is not in front of a screen, she enjoys volunteering for women's empowerment and diversity causes.
Beyond the usual concerns in software development, machine learning development comes with additional challenges. These include trying multiple algorithms and parameters to get the best results, tracking these runs for reproducibility, and moving the model to diverse deployment environments. This talk demonstrates the use of an open-source platform called MLflow for managing the complete machine learning lifecycle with Python. The talk requires a basic understanding of Python and Machine Learning concepts.
In theory, the crux of machine learning (ML) development lies in data collection, model creation, model training, and deployment. In reality, machine learning projects are not so straightforward. They are a never-ending cycle of improving the data, the model, and the evaluation. Unlike in traditional software development, ML developers experiment with multiple algorithms, tools, and parameters to optimize performance, and they need to track these experiments to reproduce their work. Furthermore, developers need to use many distinct systems to productionize models.
In this talk, we introduce MLflow, an open-source platform that aims to simplify the entire ML lifecycle while letting us use any ML library and development tool of our choice to reliably build and share ML applications. MLflow offers simple abstractions through lightweight APIs to package reproducible projects, track results, and encapsulate models in a way that is compatible with existing tools, thereby accelerating the ML lifecycle at any scale.
With the help of an example, we will show how MLflow makes it easier to keep track of experiment runs and results across frameworks, quickly reproduce runs on any platform (cloud or local execution), and productionize models on diverse deployment tools.
By the end of this talk, you will be familiar with:
- Key concepts, abstractions, and components of open-source MLflow
- How each component of MLflow addresses the challenges of the ML lifecycle
- How to use MLflow Tracking during model training to record experimental runs
- How to use the MLflow Tracking user interface to visualize experimental runs with different tuning parameters and evaluation metrics
- How to use MLflow Projects for packaging reusable and reproducible models
- How to use the general MLflow Models format to serve models through the MLflow REST API
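To give a flavour of the Tracking component listed above, here is a minimal sketch of recording a small parameter sweep as MLflow runs. It is not the example used in the talk: the toy one-parameter ridge model, the helper names (`fit_slope`, `mse`, `sweep`, `log_to_mlflow`), and the experiment name `demo-sweep` are all hypothetical, chosen only for illustration. The logging step assumes MLflow is installed (`pip install mlflow`).

```python
"""Sketch: recording a small parameter sweep with MLflow Tracking."""


def fit_slope(xs, ys, alpha):
    """Ridge regression through the origin: w = sum(x*y) / (sum(x*x) + alpha)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


def mse(xs, ys, w):
    """Mean squared error of the predictions w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sweep(xs, ys, alphas):
    """Train once per alpha and return one (params, metrics) pair per run."""
    results = []
    for alpha in alphas:
        w = fit_slope(xs, ys, alpha)
        results.append(({"alpha": alpha}, {"slope": w, "mse": mse(xs, ys, w)}))
    return results


def log_to_mlflow(results, experiment="demo-sweep"):
    """Record each (params, metrics) pair as a separate MLflow run."""
    # Imported here so the toy training code above runs without MLflow.
    import mlflow

    # "demo-sweep" is an assumed experiment name, not one from the talk.
    mlflow.set_experiment(experiment)
    for params, metrics in results:
        with mlflow.start_run():
            mlflow.log_params(params)    # tuning parameters for this run
            mlflow.log_metrics(metrics)  # evaluation metrics for this run
```

Calling `log_to_mlflow(sweep([1, 2, 3], [2, 4, 6], [0.0, 0.1, 1.0]))` and then running `mlflow ui` from the same directory should list one run per `alpha` value, so the runs can be compared side by side in the Tracking user interface.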
The purpose of the session is to introduce the audience to MLflow and give a taste of the ML development lifecycle. It is intended to provide a survey of the MLflow platform that favors breadth over depth, and we leave the audience to experiment with it further through takeaway exercises.
- Basic knowledge of the Python programming language
- Basic understanding of machine learning concepts
Data Science in Production, Machine Learning, Data Engineering or MLOps