
Putting MLOps into practice effectively with the help of Databricks

In today's data-driven landscape, MLOps (machine learning operations) has become a critical practice for companies aiming to deploy reliable, maintainable and scalable machine learning models. With the increasing prevalence of data science across industries, the need for MLOps has grown exponentially. It ensures that machine learning models are not only deployed efficiently but are also maintained over time to continue delivering value.

 

Why MLOps today? 

 

MLOps is essential for addressing the operational challenges that arise from deploying machine learning models at scale. These challenges include managing model versions, ensuring reproducibility, monitoring model performance and maintaining data quality.

 

A range of tools and platforms can support MLOps adoption. One that we’ve used on a number of client engagements is Databricks. One benefit we’ve seen is that it simplifies embedding MLOps practices by bringing together a diverse range of features and capabilities.

 

Tackling core MLOps challenges

 

1. Code organization and management

 

Ensuring clean, maintainable and well-organized code is a major challenge in MLOps. Poorly organized code can lead to difficulties in collaboration, scalability and maintenance.

 

An end-to-end data science project usually involves common steps such as data extraction, model training and generating predictions. It’s therefore recommended to split these processes into separate files, according to their respective contexts. Concentrating everything in a single file makes the code harder to understand, collaborate on and integrate new features into.

 

We tackle this challenge by using separate files in Databricks; this makes orchestration and monitoring easier. This separation improves code readability and maintainability. Another benefit we have found using Databricks is that it integrates seamlessly with Git for version control, enabling better collaboration and code management.
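As a rough illustration of that split, the sketch below shows one way the stages might live in separate modules, with a notebook or Workflow task acting as a thin entry point. The file names, function names and churn example are ours, not a prescribed layout.

```python
# src/extract.py (hypothetical module): data extraction only
import pandas as pd

def load_raw_data(path: str) -> pd.DataFrame:
    """Read raw input data, keeping I/O concerns out of the modelling code."""
    return pd.read_csv(path)


# src/train.py (hypothetical module): model training only
from sklearn.linear_model import LogisticRegression

def train_model(X: pd.DataFrame, y: pd.Series) -> LogisticRegression:
    """Fit a model; swapping the algorithm only touches this file."""
    return LogisticRegression(max_iter=1000).fit(X, y)


# notebooks/train_task.py (hypothetical notebook / Workflow task): orchestration only
# from src.extract import load_raw_data
# from src.train import train_model
#
# df = load_raw_data("/dbfs/raw/customers.csv")
# model = train_model(df.drop(columns=["churned"]), df["churned"])
```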

 

2. Test-Driven Development (TDD)

 

Implementing tests early in the development cycle is crucial for maintaining code quality and reducing bugs. However, many teams struggle with integrating TDD into their workflows.

 

Testing frameworks like pytest can be used in Databricks to verify your functions, which in turn enhances code reliability and reduces the cost of bug fixes.
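As a minimal sketch, assuming a small feature-engineering helper of our own (the function and column names below are hypothetical), a pytest test might look like this. In a Databricks Repo, such tests can be run from CI or a dedicated test notebook.

```python
# src/features.py (hypothetical helper under test)
import pandas as pd

def add_ratio_column(df: pd.DataFrame) -> pd.DataFrame:
    """Add a 'ratio' column, guarding against division by zero."""
    out = df.copy()
    out["ratio"] = out["numerator"] / out["denominator"].replace(0.0, float("nan"))
    return out


# tests/test_features.py
def test_add_ratio_column_handles_zero_denominator():
    df = pd.DataFrame({"numerator": [1.0, 2.0], "denominator": [2.0, 0.0]})
    result = add_ratio_column(df)
    assert result.loc[0, "ratio"] == 0.5
    assert pd.isna(result.loc[1, "ratio"])
```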

 

3. Workflow automation

 

Managing complex machine learning pipelines with multiple interdependent tasks can be cumbersome and error-prone.

 

One way of tackling this challenge is by automating pipeline execution. Databricks Workflows helps by allowing us to define and manage the logic needed to execute different tasks. While there are, of course, alternative approaches to this problem, we also like the fact that the platform supports automated execution through scheduling, continuous execution and triggers, which optimizes resource usage and reduces the need for manual intervention.
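As a rough sketch, a multi-task job can be described declaratively in the Databricks Jobs API (2.1) format and then created via the API, the CLI or the Workflows UI. The notebook paths, job name and cron schedule below are placeholders.

```python
# Hypothetical multi-task job definition, expressed as a Python dict in the
# Jobs API 2.1 JSON format. Paths, names and the schedule are placeholders.
job_spec = {
    "name": "churn-training-pipeline",
    "tasks": [
        {
            "task_key": "extract",
            "notebook_task": {"notebook_path": "/Repos/project/notebooks/extract"},
        },
        {
            "task_key": "train",
            "depends_on": [{"task_key": "extract"}],  # runs only after 'extract' succeeds
            "notebook_task": {"notebook_path": "/Repos/project/notebooks/train"},
        },
        {
            "task_key": "predict",
            "depends_on": [{"task_key": "train"}],
            "notebook_task": {"notebook_path": "/Repos/project/notebooks/predict"},
        },
    ],
    "schedule": {  # automated daily execution instead of manual triggering
        "quartz_cron_expression": "0 0 6 * * ?",
        "timezone_id": "UTC",
    },
}
```

The dependency graph and the schedule are what remove the need for someone to run each notebook manually and in the right order.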

 

4. Experiment tracking and versioning

 

When creating a machine learning model, it’s important to record different experiments and version your models so you can compare them and decide which one should be used.

 

MLflow is integrated into Databricks and is useful for tracking experiments and implementing model versioning and management. It’s also helpful for logging a model’s hyperparameters, performance metrics and prediction statistics.
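A minimal sketch of that pattern is shown below; the synthetic dataset, run name and registered model name are placeholders, and on Databricks the run is captured automatically against a workspace experiment.

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the project's real feature set.
X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

params = {"n_estimators": 100, "max_depth": 5}

with mlflow.start_run(run_name="rf_baseline"):
    model = RandomForestClassifier(**params).fit(X_train, y_train)

    mlflow.log_params(params)                      # hyperparameters
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)        # performance metric
    mlflow.sklearn.log_model(                      # model artifact
        model,
        "model",
        registered_model_name="churn_classifier",  # creates a new registered version
    )
```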

 

5. Feature Store

 

Creating features can be computationally expensive and time-consuming, so having a centralized repository to store and share features among data scientists is another good practice in developing machine learning models.

 

In Databricks, the Feature Store helps us tackle this challenge by providing that centralized repository. It also makes it easy to check the lineage of a feature, tracking its origin, transformations and dependencies from upstream data sources to downstream consumers.
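A minimal sketch using the Feature Store Python client is shown below; the source table, columns and feature logic are illustrative, and newer Unity Catalog workspaces expose the same ideas through the FeatureEngineeringClient.

```python
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# In a Databricks notebook, 'spark' is the preconfigured SparkSession.
# 'raw.customers' and its columns are hypothetical source data.
customer_features_df = spark.table("raw.customers").selectExpr(
    "customer_id",
    "datediff(current_date(), signup_date) AS tenure_days",  # example engineered feature
    "total_orders / months_active AS orders_per_month",
)

# Register the features once; other teams and models can then reuse them,
# and the Feature Store records lineage back to the source table.
fs.create_table(
    name="ml.customer_features",
    primary_keys=["customer_id"],
    df=customer_features_df,
    description="Aggregated customer behaviour features for churn models",
)
```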

 

6. Monitoring

 

Once a model has been deployed, it’s important to monitor its performance metrics, its input data and its predictions. This kind of monitoring helps you identify signs of data or concept drift early.

 

Databricks Lakehouse Monitoring can track these metrics by profiling your input and prediction tables. To build a robust monitoring setup, you can surface the resulting metrics in Databricks SQL dashboards and hook them up to Databricks alerts.
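As a rough sketch of wiring that up, we assume the databricks.lakehouse_monitoring Python client below; the table, schema and column names are placeholders, and exact parameter names may vary between releases.

```python
import databricks.lakehouse_monitoring as lm  # available on Databricks clusters; assumed API

# 'ml.churn_predictions' is a hypothetical Delta table holding model inputs,
# predictions and (eventually) ground-truth labels.
lm.create_monitor(
    table_name="ml.churn_predictions",
    profile_type=lm.InferenceLog(
        problem_type="classification",  # drives which quality metrics are computed
        prediction_col="prediction",
        label_col="label",
        timestamp_col="scored_at",
        model_id_col="model_version",
        granularities=["1 day"],        # aggregate metrics per day to surface drift
    ),
    output_schema_name="ml.monitoring", # where the metric tables are written
)
```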

 

Conclusion

 

MLOps can be a valuable method of increasing the speed at which machine learning models are deployed into production. As demand increases for AI-driven products and features, it is only going to become more important for teams to embrace such an approach.

 

It’s important to note that MLOps cannot be accomplished with a single tool or platform: it is a practice and thus requires a particular mindset and set of skills across the relevant teams. However, vendors of MLOps products can significantly help teams looking to take a step forward into the practice. The right tool will ultimately be driven by your specific needs and organizational context, but our experience with Databricks has demonstrated how MLOps platforms can play a valuable role in embedding these techniques within your existing tooling.

 

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
