As hard as it is for data scientists to tag data and develop accurate machine learning models, managing models in production can be even more daunting. Recognizing model drift, retraining models with updating data sets, improving performance, and maintaining the underlying technology platforms are all important data science practices. Without these disciplines, models can produce erroneous results that significantly impact business.
Developing production-ready models is no easy feat. According to one machine learning study, 55 percent of companies had not deployed a model into production, and 40 percent or more required over 30 days to deploy a single model. Success brings new challenges: 41 percent of respondents acknowledged difficulty with versioning machine learning models and with reproducibility.
The lesson here is that new obstacles emerge once machine learning models are deployed to production and used in business processes.
Model management and operations were once challenges for the more advanced data science teams. Now tasks include monitoring production machine learning models for drift, automating the retraining of models, alerting when the drift is significant, and recognizing when models require upgrades. As more organizations invest in machine learning, there is a greater need to build awareness around model management and operations.
The good news is that platforms and libraries such as the open source MLflow and DVC, along with commercial tools from Alteryx, Databricks, Dataiku, SAS, DataRobot, ModelOp, and others, are making model management and operations easier for data science teams. The public cloud providers are also sharing practices such as implementing MLops with Azure Machine Learning.
There are several similarities between model management and devops. Many refer to model management and operations as MLops and define it as the culture, practices, and technologies required to develop and maintain machine learning models.
Understanding model management and operations
To better understand model management and operations, consider the union of software development practices with scientific methods.
As a software developer, you know that completing a version of an application and deploying it to production isn’t trivial. But an even greater challenge begins once the application reaches production: end-users expect regular enhancements, and the underlying infrastructure, platforms, and libraries require patching and maintenance.
Now let’s shift to the scientific world where questions lead to multiple hypotheses and repetitive experimentation. You learned in science class to maintain a log of these experiments and track the journey of tweaking different variables from one experiment to the next. Experimentation leads to improved results, and documenting the journey helps convince peers that you’ve explored all the variables and that results are reproducible.
Data scientists experimenting with machine learning models must incorporate disciplines from both software development and scientific research. Machine learning models are software code developed in languages such as Python and R, constructed with TensorFlow, PyTorch, or other machine learning libraries, run on platforms such as Apache Spark, and deployed to cloud infrastructure. The development and support of machine learning models require significant experimentation and optimization, and data scientists must prove the accuracy of their models.
Like software development, machine learning models need ongoing maintenance and enhancements. Some of that comes from maintaining the code, libraries, platforms, and infrastructure, but data scientists must also be concerned about model drift. In simple terms, model drift occurs as new data becomes available, and the predictions, clusters, segmentations, and recommendations provided by machine learning models deviate from expected outcomes.
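One common way to quantify this kind of drift is the population stability index (PSI), which compares the distribution a feature had at training time against what the model now sees in production. Below is a minimal sketch in Python; the bin count and the 0.25 threshold are conventional rules of thumb rather than a standard from any particular tool, and the sample data is invented for illustration.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare a feature's training-time distribution (expected)
    against its production distribution (actual)."""
    # Bin edges come from the training data so both samples are
    # scored against the same reference buckets.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    actual = np.clip(actual, edges[0], edges[-1])
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Guard against empty buckets before taking the log
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)  # distribution at training time
prod = rng.normal(1.0, 1.2, 10_000)   # shifted production distribution
psi = population_stability_index(train, prod)
# A common rule of thumb: PSI above 0.25 signals significant drift
print(f"PSI = {psi:.3f}")
```

Monitoring tools typically compute a metric like this on a schedule for every model input and prediction, and alert or trigger retraining when the value crosses a threshold.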
Successful model management starts with developing optimal models
I spoke with Alan Jacobson, chief data and analytics officer at Alteryx, about how organizations succeed and scale machine learning model development. “To simplify model development, the first challenge for most data scientists is ensuring strong problem formulation. Many complex business problems can be solved with very simple analytics, but this first requires structuring the problem in a way that data and analytics can help answer the question. Even when complex models are leveraged, the most difficult part of the process is typically structuring the data and ensuring the right inputs are being used at the right quality levels.”
I agree with Jacobson. Too many data and technology implementations start with poor or no problem statements and with inadequate time, tools, and subject matter expertise to ensure adequate data quality. Organizations must first start with asking smart questions about big data, investing in dataops, and then using agile methodologies in data science to iterate toward solutions.
Monitoring machine learning models for model drift
Getting a precise problem definition is critical for ongoing management and monitoring of models in production. Jacobson went on to explain, “Monitoring models is an important process, but doing it right takes a strong understanding of the goals and potential adverse effects that warrant watching. While most discuss monitoring model performance and change over time, what’s more important and challenging in this space is the analysis of unintended consequences.”
One easy way to understand model drift and unintended consequences is to consider the impact of COVID-19 on machine learning models developed with training data from before the pandemic. Machine learning models based on human behaviors, natural language processing, consumer demand, or fraud patterns have all been affected by behavior changes during the pandemic that their pre-pandemic training data never captured.
Technology providers are releasing new MLops capabilities as more organizations get value from and mature their data science programs. For example, SAS introduced a feature contribution index that helps data scientists evaluate models without a target variable. Cloudera recently announced an ML Monitoring Service that captures technical performance metrics and tracks model predictions.
MLops also addresses automation and collaboration
In between developing a machine learning model and monitoring it in production are additional tools, processes, collaborations, and capabilities that enable data science practices to scale. Some of the automation and infrastructure practices are analogous to devops and include infrastructure as code and CI/CD (continuous integration/continuous deployment) for machine learning models. Others include developer capabilities such as versioning models with their underlying training data and searching the model repository.
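The versioning idea can be illustrated with a toy model registry that fingerprints both the model artifact and the exact training data behind it. This is a hand-rolled sketch of the concept that tools like MLflow's model registry and DVC automate, not their actual APIs; the file layout and field names are invented.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path

def register_model(registry_dir, model_bytes, training_data_bytes, params):
    """Record a model version alongside a fingerprint of the exact
    training data that produced it, so results stay reproducible."""
    registry = Path(registry_dir)
    registry.mkdir(parents=True, exist_ok=True)
    entry = {
        "version": len(list(registry.glob("v*.json"))) + 1,
        "model_sha256": hashlib.sha256(model_bytes).hexdigest(),
        # Tying the data hash to the model version is what makes a
        # past prediction reproducible and auditable later.
        "training_data_sha256": hashlib.sha256(training_data_bytes).hexdigest(),
        "params": params,
        "registered_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    (registry / f"v{entry['version']}.json").write_text(json.dumps(entry, indent=2))
    return entry

registry_dir = tempfile.mkdtemp()  # stand-in for a shared, searchable store
entry = register_model(registry_dir, b"<serialized model>",
                       b"feature,label\n1.0,0\n", {"max_depth": 4})
print(entry["version"], entry["training_data_sha256"][:12])
```

In practice the registry would also store the serialized model and a pointer to the data snapshot itself, so a CI/CD pipeline can rebuild and redeploy any past version on demand.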
The more interesting aspects of MLops bring scientific methodology and collaboration to data science teams. For example, DataRobot enables a champion-challenger model that can run multiple experimental models in parallel to challenge the production version’s accuracy. SAS wants to help data scientists improve speed to market and data quality. Alteryx recently introduced Analytics Hub to aid collaboration and sharing among data science teams.
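The champion-challenger pattern itself is simple to sketch: every production input is scored by the champion and, in shadow mode, by one or more challengers, while accuracy is tracked side by side. The toy below illustrates the idea only; the threshold models and the data stream are invented, and this is not DataRobot's implementation.

```python
from collections import defaultdict

def champion_challenger(models, stream):
    """Score every incoming (features, label) record with the champion
    and each challenger, tracking accuracy side by side. Only the
    champion's prediction would be returned to the business process;
    challengers run in shadow mode."""
    correct = defaultdict(int)
    total = 0
    for features, label in stream:
        total += 1
        for name, predict in models.items():
            if predict(features) == label:
                correct[name] += 1
    return {name: correct[name] / total for name in models}

# Hypothetical models: champion and challenger use different thresholds.
models = {
    "champion": lambda x: int(x > 0.5),
    "challenger": lambda x: int(x > 0.4),
}
stream = [(0.45, 1), (0.6, 1), (0.3, 0), (0.42, 1), (0.7, 1), (0.2, 0)]
scores = champion_challenger(models, stream)
print(scores)  # the challenger outperforms the champion on this stream
```

When a challenger consistently beats the champion over a statistically meaningful window, it gets promoted, which is the operational version of the scientific method the article describes.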
All this shows that managing and scaling machine learning requires a lot more discipline and practice than simply asking a data scientist to code and test a random forest, k-means, or convolutional neural network in Python.