Embedding ML systems into production is still a hard thing to do (for most companies)

Photo by Glen Carrie @ Unsplash.com

Have you ever heard of a company that successfully integrated Machine Learning into their business processes overnight, completely transforming the way the organization operated from one day to the next?

Yup, me neither!

And did you know that most ML models never make it to production?

Embedding production-level systems into business processes is extremely hard. By production-level, I mean systems that have a certain level of reliability and that add value to the company’s top and bottom line. Embedding ML systems into organizations is not an overnight job and, honestly, Data Science and Machine Learning get a bad rep just because leaders get lost in the process. In particular, I see two types of mistakes when companies first experiment with ML:

  • Incorrect expectations: This one is extremely common, and the fault often lies with ML vendors. High expectations about ML and AI systems are normally created by people who want to sell those systems (or by media hype). But hear me out: every ML system makes errors, and there’s no way around it.
  • Data Scientists = Magicians: Another common error is when companies hire a bunch of data scientists without any plan or concrete goal for ML in the organization. Adding a couple of fresh math graduates to your organization won’t, by itself, make ML work.

What bums me out is that most organizations and leaders then lean on the “ML does not work” or “ML is not for me” arguments. That can be true but, more often than not, the root cause is that they’ve committed one of the two capital mistakes described above.

How should leaders approach integrating machine learning into their organizations? In the midst of the current hype surrounding this technology, how can they avoid being deceived by companies that are merely looking to make a quick profit?

In this blog post, we’ll look at some of the main properties and ideas behind embedding machine learning into companies, and at why it’s so hard to see immediate results.

To set the stage, it’s important to define what I mean by “incremental”. I define it as the stage where companies begin integrating machine learning (ML) and data science processes into non-critical operations via proofs of concept (POCs). The focus here is on getting these POCs into production rather than experimenting with ML indiscriminately. This approach accelerates future deployments and helps avoid underestimating the challenges of deploying ML models in production, which is often a complex task.

“It’s the data, stupid!”

Well, machine learning systems need to... learn. And they need to learn from quality data. You wouldn’t want to study mathematics from a 2,000-year-old textbook written on papyrus, would you?

The same applies to machine learning — it needs high-quality data. However, many companies overlook this crucial step and rush straight to implementing “fancy neural network algorithms.” For example:

  • Want to predict which clients will churn? Many companies hastily apply powerful neural networks to poorly recorded, unreliable data.
  • Want to predict the next best item for your customer? “Let’s fit that gradient boosting model and pray to the God of machine learning that the model understands that customers are not over 150 years old.”
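To make the “customers over 150 years old” problem concrete, here is a minimal sketch of the kind of sanity check that could run before any model sees the data. The `Customer` record, its fields and the `max_age` limit are all hypothetical and only there for illustration; a real pipeline would have its own schema and rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Customer:
    # Hypothetical record layout, for illustration only.
    customer_id: str
    age: Optional[int]
    monthly_spend: Optional[float]


def validate(customer: Customer, max_age: int = 120) -> List[str]:
    """Return the list of data-quality problems found in one record."""
    problems = []
    if customer.age is None:
        problems.append("missing age")
    elif not 0 <= customer.age <= max_age:
        problems.append(f"implausible age: {customer.age}")
    if customer.monthly_spend is None:
        problems.append("missing monthly_spend")
    elif customer.monthly_spend < 0:
        problems.append(f"negative monthly_spend: {customer.monthly_spend}")
    return problems


def split_clean_and_rejected(
    customers: List[Customer],
) -> Tuple[List[Customer], List[Tuple[str, List[str]]]]:
    """Separate records that pass every check from those that fail any."""
    clean, rejected = [], []
    for customer in customers:
        problems = validate(customer)
        if problems:
            rejected.append((customer.customer_id, problems))
        else:
            clean.append(customer)
    return clean, rejected


customers = [
    Customer("c1", 34, 120.0),
    Customer("c2", 157, 80.0),   # older than any living person
    Customer("c3", None, 45.5),  # age was never recorded
]
clean, rejected = split_clean_and_rejected(customers)
print(len(clean), "clean;", len(rejected), "rejected")
for customer_id, problems in rejected:
    print(customer_id, problems)
```

The point is not the specific rules but the habit: rejected records get counted and reported instead of being silently fed to a gradient boosting model.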

Data governance, data leadership and setting up proper data engineering pipelines are the first steps to embed ML into the organization and avoid falling into the eternal-POC stage. Ideally, this should come even before you start to hire any PhDs (nothing against them, they’re great) who will transform your organization top to bottom with the latest LLM released by OpenAI :-)

There is a specific stage of data maturity at which companies begin to see substantial benefits from Machine Learning and Data Science algorithms. Below this level, the old adage “garbage in, garbage out” still holds true.

Image by Scott Rodgerson @ Unsplash.com

Reliability, Scalability, Maintainability, Adaptability

From Chip Huyen’s Designing Machine Learning Systems, I really enjoy this checklist, which gives some guidance on how a machine learning model in production should behave.

Reliability: ML systems will ultimately fail in some cases. Knowing how they fail and understanding their weaknesses is deeply tied to their reliability. One common way for companies to assess the reliability of an ML model is to introduce a human in the loop first, during a testing stage.
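As a rough illustration of what a human in the loop can look like, the sketch below routes low-confidence predictions to a person and lets the rest go through automatically. The `route` function, the 0.9 threshold and the invoice examples are assumptions made up for this post, not a prescribed design.

```python
from typing import List, Tuple

# (item id, predicted label, model confidence between 0 and 1)
Prediction = Tuple[str, str, float]


def route(predictions: List[Prediction], threshold: float = 0.9):
    """Split predictions into automated decisions and human-review cases."""
    automated, needs_review = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            automated.append((item_id, label))
        else:
            # Keep the confidence so the reviewer can see why it was flagged.
            needs_review.append((item_id, label, confidence))
    return automated, needs_review


# Hypothetical model outputs, for illustration only.
predictions = [
    ("invoice-1", "approve", 0.97),
    ("invoice-2", "reject", 0.62),
    ("invoice-3", "approve", 0.91),
]
automated, needs_review = route(predictions)
review_rate = len(needs_review) / len(predictions)
print(f"automated: {automated}")
print(f"sent to a human: {needs_review}")
print(f"review rate: {review_rate:.0%}")
```

Tracking the review rate and how often the human disagrees with the model is one way to learn where the model fails before trusting it with more of the process.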

Scalability: ideally, we would like our algorithms to scale fast. You need to develop machine learning models that can handle possibly hundreds of thousands of requests without even blinking. For some organizations, these types of scalable processes are just not there. In the end, they blame ML, when the reality is that most of their IT processes don’t scale at all.

Maintainability: another important property. This tends to fail when companies just have data scientists develop the typical notebook that depends on a single machine or a single person. “It works on my machine” is a common saying amongst developers, and it gets especially difficult to overcome in the context of ML. Models with low maintainability lead not only to poor value added from ML, but also to unhappy data scientists and machine learning engineers.

Finally, there’s adaptability. This is often the last factor companies consider when integrating ML into their organization. Reaching this stage indicates a mature level of ML implementation, where models can be quickly retrained or applied to different scenarios and problems. In today’s fast-paced world, this adaptability is becoming increasingly crucial.

Incrementally incorporating ML projects into your organization allows you to understand and master the four essential properties that ML models should ideally have.

Senior Leadership

This is where I will contradict the title of the post. It’s the only scenario where, even with the incremental and step-by-step introduction of ML projects, they still won’t succeed. And it’s one of the most important topics related to ML strategy.

It doesn’t matter what you do as a data scientist: if senior leadership doesn’t believe in data-driven decision making, nothing can be done to embed analytics and machine learning into the organization.

The fear of using data can cascade through the organization, affecting middle management and ultimately influencing every decision-making process.

If leaders believe that their intuition is more accurate than data and can fully understand the company’s operations (newsflash: it can’t), then ML efforts will be futile. Analytical projects will falter, Machine Learning models will face persistent criticism, and AI will be relegated to mere buzzwords in the corridors:

  • We already tried that, but it didn’t work.

If you are a Machine Learning Engineer or Data Scientist in one of those organizations (and your goal is to become a good professional in the field), it’s time to leave :-)

Photo by khyta @Unsplash.com

Data Science has Science in its name

Another reason why Data Science (and ML, by extension) should be embedded incrementally is that it has science in its name.

A lot of traditional companies are not used to working with the scientific process or even to dealing with failure. Organizations that are too risk averse, or that don’t understand how they can use the scientific method to their advantage, will have trouble deploying production-level Machine Learning models and extracting value from them.

It takes time to incorporate these processes, particularly in companies that have low risk tolerance. Experimenting with non-critical use cases and understanding how AI and automation add value to them is normally a better approach than selecting use cases based on certain directors’ enthusiasm. Remember, perceived performance = reality - expectations, and setting expectations that are too high for processes that contain a non-trivial amount of error is extremely unfair to any Machine Learning process.

And that’s a wrap! I hope you’ve enjoyed reading this post. I’d love to hear your thoughts — do you agree or disagree with any of the topics I’ve described above? How has your organization integrated AI and ML into its processes? Was it a smooth transition?

The advice and guidance I’ve shared here are particularly relevant for larger companies where data processes aren’t central to the business. For example, retail companies can operate without focusing on data, though being competitive is another matter. In contrast, startups and tech companies often rely heavily on data for their core operations and revenue. Removing ML from companies like Meta or Google would probably mean a whole different business model.

This blog post is based mostly on my experience implementing these systems at different companies with DareData, where our goal is to democratize machine learning. From our experience, failing to follow these standards is one of the main reasons organizations struggle to integrate ML and AI into their business.