The strategies you adopt when deploying software have the potential to save you from expensive and insidious mistakes. This is particularly true for machine learning systems, where detecting subtle data munging, feature engineering or model bugs in production can be very challenging, especially when the production data inputs are hard to replicate exactly. “Shadow Mode” is one such deployment strategy, and in this post I will examine this approach and its trade-offs.

## Why should you care about Machine Learning Deployments?

Jenny is building a new credit risk assessment model. This new model requires a dozen additional features that none of the bank’s other credit risk models use, but it is decided that the improvements in performance warrant the inclusion of these extra features. Jenny is happy that her model performs well, and passes it to one of her ML engineer colleagues, who proceeds to write some new code to perform the feature engineering steps in the production ML application. The features are all captured in config and all the tests pass. The model is released to customers, where it begins assessing loan applicant creditworthiness.

Somewhere between the data science department and the release, one of the features was created incorrectly. Perhaps it was a config typo, or maybe the pipeline code had an edge case, or it could have just been a simple breakdown in communication. Now, for example, instead of using a key feature which distinguishes student loan payments, an older version of the feature which doesn’t make the distinction is used. This makes it appear that a subset of bank customers are spending more on their lifestyle, which means the bank issues them a slightly worse credit score, and therefore a loan with a slightly higher interest rate than it usually would (the exact specifics don’t matter; the point is that there is a subtle error).

But there is no dramatic effect, so no alarms are sounded, and a few months go by. Parts of the loan book are sold to external parties and assurances are made to the industry regulator about the level of risk being taken. One day, three months later, someone discovers the bug. By this time, thousands of loans have been issued to people who should have been offered a better interest rate, meaning that all the bank’s future predictions of default rates, its promises to the regulatory authority, and its guarantees to companies which bought a part of the loan book, are inaccurate. The CEO of the bank spends a long time shouting at the CTO and the Chief Data Scientist, and then spends the next month apologizing to everyone. The business survives, if the regulatory authority is in a forgiving mood.

The fact that this story involves a bank doesn’t mean that these are risks faced only by ML practitioners in finance. Many industries using machine learning to make important predictions are at risk of similar disasters. Think of models in healthcare, agriculture, shipping and logistics, legal services; the list goes on. But this story could have had a very different ending if alternative deployment strategies had been in place, and these strategies are what we will consider in the upcoming sections.

“Shadow Mode”, or “Dark Launch” as Google calls it, is a technique where production traffic and data are run through a newly deployed version of a service or machine learning model, without that service or model actually returning the response or prediction to customers or other systems. Instead, the old version of the service or model continues to serve responses or predictions, and the new version’s results are merely captured and stored for analysis. Shadow mode should not be confused with feature flagging/toggling. There are a few ways to implement this functionality, which we will discuss in section 4.
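To make the idea concrete, here is a minimal sketch of shadow mode at the application level. The function name `predict_with_shadow` and the generic model objects are hypothetical, assumed for illustration only; the key properties are that only the live model's prediction reaches the caller, and that a shadow failure can never break the customer-facing response:

```python
import logging

log = logging.getLogger("shadow_comparison")

def predict_with_shadow(features, live_model, shadow_model):
    """Serve the live model's prediction; run the shadow model on the
    same input and only log its output for later offline analysis."""
    live_prediction = live_model.predict(features)
    try:
        shadow_prediction = shadow_model.predict(features)
        # Captured results are typically written to a log or datastore
        # so the two versions can be compared before promoting the new one.
        log.info(
            "live=%s shadow=%s features=%s",
            live_prediction, shadow_prediction, features,
        )
    except Exception:
        # The shadow model must never affect the response to customers.
        log.exception("shadow model failed; serving live prediction only")
    return live_prediction  # only the live result reaches the caller
```

In a real system the same split is often pushed down into the infrastructure (for example, mirroring requests at the load balancer), but the contract is the same: the new version sees real production inputs while the old version keeps serving.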