MLOps: Emerging best practices for deploying ML models in production

It comes as no surprise that machine learning models are running in production systems right now. How a model gets there, however, is a more complicated story. At the recently concluded MLOps (Machine Learning and IT Operations) conference, held virtually from June 15th to 18th, one thing was clear: getting the growing number of machine learning algorithms into production systems is still a challenge. Most of the talks centered on taking established software development and DevOps (software development and IT operations) practices and applying them to deploying machine learning models.


The conference gave meaningful insights into how an organization can adopt some of the platforms presented. If we want to leverage machine learning in software products, we need a platform to consistently develop, test, and deploy models into production.

Several platforms that can be used right now were presented, like Domino (https://www.dominodatalab.com/), MLflow, and Kubeflow. These platforms are mature enough to be used in production. I hope we do not build our own half-baked version of one of them in our organization: we would be years behind in functionality, and it would become a bottleneck for getting machine learning into our products. Building your own MLOps tool is also uneconomical in the long run. Besides adopting one of these platforms for your MLOps needs (or building one ourselves, which I hope we never do), there is also work to ensure that our product platforms can accommodate the delivery mechanisms these platforms provide.


Another issue that most of the sessions addressed, and a key reason to use an MLOps platform to manage production model delivery, is "model drift." A model's performance will eventually dip because the data it was trained on becomes dated. A model can encounter cyclical drift, which is seasonal (e.g., activities tied to the time of year); gradual drift, as the model meets new data; or sudden drift, caused by the model's own actions. The critical concept for mitigating model drift is continuous improvement and delivery, using AutoML techniques as well as an architecture that allows multiple versions of a model in production. Reflecting on some of the models we have in production, we need to establish how to address model drift.
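Not from any particular talk, but as a minimal sketch of what drift detection could look like in practice: the snippet below compares a live feature's distribution against its training-time distribution with SciPy's two-sample Kolmogorov-Smirnov test. The feature values, sample sizes, and significance threshold are all assumptions for illustration.

    # Minimal drift-detection sketch (assumes numpy and scipy are available).
    # Compares a live feature sample against the training-time sample with a
    # two-sample Kolmogorov-Smirnov test and flags drift when the p-value is small.
    import numpy as np
    from scipy.stats import ks_2samp

    def feature_drifted(train_values, live_values, alpha=0.01):
        """Return True if the live distribution differs significantly from training."""
        statistic, p_value = ks_2samp(train_values, live_values)
        return p_value < alpha

    # Hypothetical example: training data vs. a live sample whose mean has shifted.
    rng = np.random.default_rng(42)
    train = rng.normal(loc=0.0, scale=1.0, size=5_000)
    live = rng.normal(loc=0.5, scale=1.0, size=1_000)
    print("drift detected:", feature_drifted(train, live))

A check like this would run per feature on a schedule; when it fires, it can trigger the retraining or redeployment path that a continuous-delivery setup already provides.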


Highlights of sessions that I attended from the conference:


Walkthrough: Utterance Generator Fast Tracks Chatbot Training 

This session was one of the bonus workshops that required a separate ticket to reserve a seat. The presentation was about how the presenters trained a chatbot with utterances that share the same meaning. At first, I was a bit confused about why you would train a chatbot on statements that all mean the same thing; that would make for a boring chat. But a chatbot is essentially command-line parsing in disguise, so training it on utterances with the same meaning makes sense: a client using the chatbot can phrase a request in many different ways and still have the chatbot take the right action.

The utterances with the same meaning were found through agglomerative clustering, using cosine distance as the metric. Once the utterances were clustered, a quick check of their quality and quantity was performed to ensure that the phrases reflect real-user utterances. These clustered utterances were then used to train a language model so that the generated phrases would be syntactically diverse. The language model is now used to generate quality paraphrases for the chatbot to consume.
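As a rough reconstruction of the clustering step (my own sketch, not the presenters' code), utterances can be embedded and then grouped with scikit-learn's AgglomerativeClustering using cosine distance. The TF-IDF embedding, the distance threshold, and the example utterances are assumptions; the talk likely used richer sentence embeddings.

    # Sketch of clustering utterances by meaning (assumes scikit-learn >= 1.2,
    # where AgglomerativeClustering takes `metric=` instead of `affinity=`).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import AgglomerativeClustering

    utterances = [
        "reset my password",
        "I forgot my password",
        "how do I change my password",
        "cancel my subscription",
        "I want to end my subscription",
    ]

    # Simple TF-IDF embeddings stand in for whatever embedding the presenters used.
    vectors = TfidfVectorizer().fit_transform(utterances).toarray()

    # Agglomerative clustering with cosine distance; the threshold is a guess.
    clustering = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=0.8,
        metric="cosine",
        linkage="average",
    )
    labels = clustering.fit_predict(vectors)

    for label, utterance in zip(labels, utterances):
        print(label, utterance)

Grouping by a distance threshold rather than a fixed number of clusters fits this use case, since you do not know in advance how many distinct intents the utterances cover.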


Automating Production Level Machine Learning Operations on AWS 

  • The speaker was not able to connect, so the session turned into a Q&A with a quick impromptu presentation.


Simplify ML Pipeline Automation 

A presentation about MLRun.

The point was to present one of the most straightforward approaches to ML pipeline automation:

 https://github.com/mlrun/mlrun 


DevOps for Machine Learning and other Half-Truths 

As with most sessions at the conference, the fundamental question was, "How do we deploy a model?"

We should follow DevOps best practices. Following DevOps best practices means there are many parallels between the SDLC and the emerging machine learning life cycle. The machine learning life cycle, however, has situations that a typical SDLC has not dealt with before, such as versioning the data used to train models. Data is useless without algorithms, and models will not perform well without data. Model training also requires a lot of resources and long compute cycles, which are unfamiliar in a traditional SDLC.
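A minimal sketch of the data-versioning idea, using nothing beyond the Python standard library: record a content hash of the training data alongside the trained model, so any model artifact can be traced back to the exact dataset that produced it. The file names and metadata layout are made up for illustration; dedicated tools handle this more completely.

    # Sketch: tie a model artifact to the exact dataset it was trained on.
    # Only the standard library is used; file names are hypothetical.
    import hashlib
    import json
    from pathlib import Path

    def dataset_fingerprint(path: str) -> str:
        """Return a SHA-256 hash of the dataset file contents."""
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def record_training_run(data_path: str, model_path: str, metrics: dict) -> None:
        """Write a small metadata file next to the model artifact."""
        metadata = {
            "data_file": data_path,
            "data_sha256": dataset_fingerprint(data_path),
            "model_file": model_path,
            "metrics": metrics,
        }
        Path(model_path + ".meta.json").write_text(json.dumps(metadata, indent=2))

    # Hypothetical usage after a training run:
    # record_training_run("train.csv", "model.pkl", {"accuracy": 0.91})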

To make a long story short, deploying ML is expensive, which is exactly why best practices are needed. Some of the challenges are the following:

  1. Models and algorithms can get stuck in the lab - researchers may spend time and resources experimenting with a model that never gets deployed.
  2. Disconnected teams - along with a data scientist, organizations need a data engineer, a DevOps engineer, and a software engineer to bring models to production. Asking a data scientist to build infrastructure is just not reasonable. The first step of creating a model can be done on a laptop, but getting it into production is another story.
  3. Technology mismatch - choosing the technology stack for deploying machine learning in production. This is the point that resonates most deeply in our organization. We have a defined stack and best practices in our current product platforms, but they are severely mismatched with the technology that machine learning relies on. We need to consider adding Python services to our product platforms; a rough sketch of such a service follows this list.
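As a rough illustration of what such a Python service could look like (my own sketch, not something shown at the conference), here is a minimal Flask endpoint that loads a pickled model and serves predictions. The file name, route, and request format are assumptions.

    # Minimal Flask prediction service (sketch; model.pkl and the input format
    # are hypothetical). Run with: python serve.py
    import pickle

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Load a previously trained model; assumes it exposes a scikit-learn-style API.
    with open("model.pkl", "rb") as handle:
        model = pickle.load(handle)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expect a JSON body like {"features": [[1.0, 2.0, 3.0]]}.
        payload = request.get_json(force=True)
        predictions = model.predict(payload["features"]).tolist()
        return jsonify({"predictions": predictions})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)

A small service like this keeps the ML stack isolated behind an HTTP boundary, so the existing product platforms only need to know how to call an endpoint, not how to run Python models.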

Smart Data Products – From Prototype to Production 

The highlight of this session was the mistakes made when taking an experimental notebook to a production software artifact. A notebook is a suitable proof of concept, but a lot of work is needed before it can be used in production.

Some of the grave mistakes that should be avoided:

  1. Not incorporating data science artifacts as part of the CI/CD chain. Labeled and curated data is the necessary foundation of a machine learning model. This point is obvious but not easily implemented. Our organization should look into best practices for this (a small CI-style evaluation gate is sketched after this list).
  2. Assuming that data will remain the same - hence the need for number 1. Variables change, and a model needs to be refreshed because of drift.
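One way to fold a data-science artifact into a CI chain, as a hedged sketch of my own: re-evaluate the trained model against a held-out set on every build and fail the pipeline if the metric drops below a floor. The file paths, the label column, and the 0.85 threshold are all invented for illustration.

    # CI-style evaluation gate (sketch; paths and the threshold are hypothetical).
    # Exits with a non-zero status so a CI job fails when the model under-performs.
    import pickle
    import sys

    import pandas as pd
    from sklearn.metrics import accuracy_score

    ACCURACY_FLOOR = 0.85

    def main() -> int:
        with open("model.pkl", "rb") as handle:
            model = pickle.load(handle)

        holdout = pd.read_csv("holdout.csv")
        features = holdout.drop(columns=["label"])
        accuracy = accuracy_score(holdout["label"], model.predict(features))

        print(f"holdout accuracy: {accuracy:.3f}")
        return 0 if accuracy >= ACCURACY_FLOOR else 1

    if __name__ == "__main__":
        sys.exit(main())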


ML Monitoring ML: Scalable Monitoring of ML Models in Production 

Well, what can go wrong in production once we successfully deploy a model in a product? A lot. Mistakes in production can be costly, so monitoring your models in production is essential infrastructure that needs to be in place.

What do we need to monitor? Model metrics and unexpected changes.

Here is the tool that was introduced in this session: 

https://github.com/anodot/MLWatcher

The philosophy behind this tool is that dashboards don't scale really well; that is why the tool is designed to detect anomalies at scale.
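In the same spirit of alerts over dashboards, here is a hedged sketch (not MLWatcher itself) of what lightweight, programmatic monitoring of prediction scores could look like. The window size and the three-sigma alert rule are my own assumptions.

    # Sketch of automated model monitoring using only the standard library.
    from collections import deque
    from statistics import mean, stdev

    class PredictionMonitor:
        """Track recent prediction scores and flag a drift from the baseline."""

        def __init__(self, baseline_scores, window=1000, sigmas=3.0):
            self.baseline_mean = mean(baseline_scores)
            self.baseline_std = stdev(baseline_scores)
            self.recent = deque(maxlen=window)
            self.sigmas = sigmas

        def observe(self, score: float) -> bool:
            """Record a score; return True when the rolling mean drifts too far."""
            self.recent.append(score)
            if len(self.recent) < self.recent.maxlen:
                return False  # not enough data yet
            drift = abs(mean(self.recent) - self.baseline_mean)
            return drift > self.sigmas * self.baseline_std

    # Hypothetical usage: feed each production prediction score to the monitor.
    # monitor = PredictionMonitor(baseline_scores=validation_scores)
    # if monitor.observe(model_score):
    #     alert_on_call_engineer()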



After all of that, I found another session worth highlighting:

Building an AI Capability in the United States Army – Isaac Faber

I attended this one because, with an organization as big as the United States Military, I was interested in how they roll out AI capabilities across different platforms, with high reliability, at such a large scale. The United States Military is one huge enterprise that any organization can look to for best practices.

The first thing they did was launch an AI Task Force: a central organization that directs the effort of rolling out AI capabilities across the broader organization. We should mimic this task force in our own organization. The task force lets the individual military units drive their AI initiatives but intervenes to develop AI frameworks at scale. It also reviews policies that impede the deployment of AI capabilities in the smaller units.


Another session that I found interesting was

Lessons Learned Building NLP Systems in Healthcare

NLP in healthcare is a little bit like NLP in legal documents. The vocabulary and syntax are so unique to the domain that out-of-the-box solutions trained on general language corpora, like news text, do not work. Sentence disambiguation and named entity recognition are more challenging in the healthcare and legal domains because of the characteristics of those fields. The speaker enumerated the difficulties encountered while working in the domain. Applying state-of-the-art reading-comprehension methods, the speaker concluded that it is better to train NLP models for the healthcare domain on your own proprietary data; reusing a general NLP model did not work for the task the speaker was trying to solve.
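To make the point concrete with a small, hedged example of my own (not from the talk): a general-purpose spaCy pipeline trained on web and news text will tag dates, people, and quantities, but clinical terms tend to fall through as unrecognized or mislabeled entities. The sentence and the model name below are assumptions.

    # Sketch: running a general-purpose NER model on a clinical-style sentence.
    # Assumes spaCy and the en_core_web_sm model are installed
    # (python -m spacy download en_core_web_sm).
    import spacy

    nlp = spacy.load("en_core_web_sm")

    text = "Patient was prescribed 40 mg of atorvastatin after an MI in March."
    doc = nlp(text)

    # A general model tends to catch the date and the dosage quantity but not
    # the drug name or the clinical abbreviation, which is where
    # domain-specific training helps.
    for entity in doc.ents:
        print(entity.text, entity.label_)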

This conclusion about off-the-shelf NLP models reflects what we encounter when we use such models in the legal domain. To effectively solve NLP tasks in these fields, the general approach was to use multiple sources for the same document, taking into consideration both structured and unstructured data.
