Whoever invents a complete automation tool for correcting model drift deserves a major award, or at least a few rounds of beer. Until then, data scientists across San Francisco are left to monitor and manage machine learning models on their own terms.
While model drift may be unavoidable if left unattended, Sun Hae Hong, a senior machine learning scientist at molecular and clinical data library Tempus, said there are steps that protect against it. To start, plan ahead. Hong recommends considering model drift at the early development stage, during validation and after deployment of every model. Below, Hong breaks down the steps machine learning scientists can take, and the questions developers should ask, to protect against model drift.
The Early Development Phase
Developers should spend ample time understanding the data itself. Visualization, univariate statistical tests and linear regression are good ways to identify simple trends.
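As a rough illustration of what that exploration can look like in Python, the sketch below summarizes a single numeric feature, fits a simple trend line and checks for a seasonal pattern. The file and column names are placeholders, not anything prescribed by Hong.

```python
# Exploratory checks on a hypothetical DataFrame with a `date` column and a
# numeric column `feature_x` (names are illustrative).
import pandas as pd
from scipy import stats

df = pd.read_csv("training_data.csv", parse_dates=["date"])  # hypothetical file

# Univariate summary: spread, obvious outliers and missingness.
print(df["feature_x"].describe())
print("missing fraction:", df["feature_x"].isna().mean())

# Simple time-dependent trend: regress the feature on elapsed days.
days = (df["date"] - df["date"].min()).dt.days
y = df["feature_x"].fillna(df["feature_x"].median())
slope, intercept, r, p, stderr = stats.linregress(days, y)
print(f"trend slope per day: {slope:.4g} (p={p:.3g})")

# Crude seasonal check: compare monthly means.
print(df.groupby(df["date"].dt.month)["feature_x"].mean())
```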
Questions developers should ask in the early development phase:
- Is the data clean and reliable?
- Are there highly correlated variables?
- Are there variables with frequently missing values?
- Are there seasonal effects?
- Are there any time-dependent trends?
Build simple and interpretable models. If you do choose a complex, hard-to-explain model, it may be prudent to also build a simple, explainable model that mimics its behavior, at least locally. If there are seasonal effects, models can be built to account for them.
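On the surrogate idea, one option is to train a shallow decision tree on the complex model's own predictions so that it approximates the complex model's behavior. The sketch below uses synthetic data and arbitrary model choices purely for illustration, and it builds a global surrogate rather than a strictly local one.

```python
# A minimal sketch of a surrogate model: a shallow decision tree trained to
# mimic a more complex model's predictions. Data and models are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

complex_model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Fit the interpretable surrogate on the complex model's outputs, not the labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, complex_model.predict(X))

# Fidelity: how often the surrogate agrees with the complex model.
print("surrogate fidelity:", surrogate.score(X, complex_model.predict(X)))
print(export_text(surrogate))
```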
Identify key features. It is important to understand how key features contribute to predictions. If the effect of a feature contradicts what is known, consider validating it. Counterintuitive features may simply be modeling artifacts caused by confounders, in which case the effect is not a novel finding but an issue to fix.
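One common way to measure how features contribute is permutation importance, which scores each feature by how much shuffling its values degrades performance. The sketch below uses synthetic data; it is one possible approach, not the only one.

```python
# Permutation importance on a held-out set, with illustrative synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Rank features by how much shuffling them hurts test performance.
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```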
Assess your model with various performance metrics. Performance metrics measure different aspects of a model, so developers may need to assess several of them to fully understand a model's performance and limitations. In binary classification with highly imbalanced classes, accuracy can look inflated, while the F1 score compensates for that limitation.
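The toy example below shows how the two metrics can diverge: a classifier that always predicts the majority class looks strong on accuracy and useless on F1. The class balance here is made up for illustration.

```python
# Why accuracy alone can mislead on imbalanced classes.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

y_true = np.array([0] * 950 + [1] * 50)   # 5% positive class
y_pred = np.zeros_like(y_true)            # "always predict negative"

print("accuracy:", accuracy_score(y_true, y_pred))               # 0.95, looks great
print("F1 score:", f1_score(y_true, y_pred, zero_division=0))    # 0.0, reveals the problem
```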
Develop a plan for tracking changes over time, in both the input data and the model outputs, using tools such as data metrics and dashboards. If available, use input data monitoring to ensure consistently high-quality data over time. Datasets that minimize “nuisance” variation, such as day-to-day variability, while preserving important variation are considered good for modeling.
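One way to monitor input data is to compare the live distribution of each feature against its training-time distribution, for example with a two-sample Kolmogorov-Smirnov test. The data and significance threshold below are assumptions for illustration, not part of Hong's recommendations.

```python
# Compare a feature's training distribution to its recent live distribution.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # stand-in for training data
live_feature = rng.normal(loc=0.3, scale=1.0, size=1000)   # stand-in for this week's inputs

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:  # illustrative threshold
    print(f"possible input drift: KS={stat:.3f}, p={p_value:.3g}")
else:
    print("input distribution looks stable")
```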
The Validation Phase
Document model performance and limitations. Communicating that information to stakeholders can result in an improved release experience.
Establish plans for model monitoring and retraining. Previously unseen data should be used to evaluate models. If temporal data is available, data from a different span of time can be reserved as validation data to assess potential model drift.
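A temporal holdout might look like the sketch below, which reserves the most recent slice of a labeled dataset for validation. The file name, column name and 80/20 split are placeholders.

```python
# Train on earlier data, hold out the most recent span to estimate how much
# performance degrades over time. Assumes a `date` column in a CSV file.
import pandas as pd

df = pd.read_csv("labeled_data.csv", parse_dates=["date"])  # hypothetical file
df = df.sort_values("date")

cutoff = df["date"].iloc[int(len(df) * 0.8)]  # roughly the last 20% of the timeline
train_df = df[df["date"] < cutoff]
holdout_df = df[df["date"] >= cutoff]

print("train span:", train_df["date"].min(), "to", train_df["date"].max())
print("holdout span:", holdout_df["date"].min(), "to", holdout_df["date"].max())
```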
Determine key metrics to be monitored. Metrics can include model performance indicators, such as accuracy and F1 score, as well as feature-based metrics. Developers may want to track the observed distribution of their metrics and define ranges for warnings and failures. If there is a mathematical model for a metric, document both the expected distribution and the observed distribution.
Determine rules for when to retrain and reevaluate models. For example, if accuracy or F1 scores drop below a designated threshold, the model may need to be retrained. Even if everything seems to be running well, it can be prudent to retrain and reevaluate models periodically to minimize risks of failures. The periodicity can be determined before deployment.
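Put together, this part of the monitoring plan can be as simple as a table of metrics with warning and failure ranges, plus a rule for what each range triggers. The metric names and thresholds in the sketch below are illustrative, not recommendations.

```python
# Threshold-based status checks that feed retraining decisions.
MONITORED_METRICS = {
    "accuracy": {"warn_below": 0.85, "fail_below": 0.80},
    "f1": {"warn_below": 0.70, "fail_below": 0.60},
}

def evaluate_metrics(observed: dict) -> dict:
    """Map each observed metric to 'ok', 'warning', or 'failure'."""
    status = {}
    for name, thresholds in MONITORED_METRICS.items():
        value = observed[name]
        if value < thresholds["fail_below"]:
            status[name] = "failure"   # retrain and reevaluate
        elif value < thresholds["warn_below"]:
            status[name] = "warning"   # investigate, consider retraining
        else:
            status[name] = "ok"
    return status

print(evaluate_metrics({"accuracy": 0.83, "f1": 0.72}))
# {'accuracy': 'warning', 'f1': 'ok'}
```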
The Post-Deployment Phase
Babysit the model right after deployment. Anticipate failures on edge cases in this period, when the model suddenly sees new data. To ensure the model is working as expected, perform retrospective analyses. Depending on the context and the sample size, the one-week, one-month and three-month marks may be good times for retrospective analyses.
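A retrospective analysis can be as simple as joining predictions logged at serving time to outcomes observed later and recomputing the key metrics. The file and column names below are hypothetical.

```python
# Join logged predictions to later-observed outcomes and recompute metrics.
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

preds = pd.read_csv("logged_predictions.csv")    # columns: example_id, prediction
outcomes = pd.read_csv("observed_outcomes.csv")  # columns: example_id, label

joined = preds.merge(outcomes, on="example_id", how="inner")
print("examples with known outcomes:", len(joined))
print("accuracy:", accuracy_score(joined["label"], joined["prediction"]))
print("F1 score:", f1_score(joined["label"], joined["prediction"]))
```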
Monitor key metrics identified during validation. Consider adding a built-in alert system that flags model drift. Real-time tracking with business intelligence (BI) dashboards is highly recommended. BI tools facilitate a timely response if there is an urgent issue with the model.
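A bare-bones version of such an alert is a check that compares each monitored metric against its expected range and raises a warning when it falls outside. The ranges below are illustrative; in practice the alert would feed a dashboard, email or paging system rather than a log on stdout.

```python
# Raise a warning when a monitored metric leaves its expected range.
import logging

logging.basicConfig(level=logging.WARNING)

EXPECTED_RANGES = {"f1": (0.70, 1.00), "accuracy": (0.85, 1.00)}  # illustrative

def check_metric(name: str, value: float) -> None:
    low, high = EXPECTED_RANGES[name]
    if not (low <= value <= high):
        logging.warning("possible model drift: %s=%.3f outside [%.2f, %.2f]",
                        name, value, low, high)

check_metric("f1", 0.62)        # triggers the alert
check_metric("accuracy", 0.91)  # within range, no alert
```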
Questions developers should ask in the post-deployment phase:
- Are the input features stable over time?
- Are the targets stable over time?
- Are the key features contributing to the model in the same way as during training?
Plan not only to retrain but to revisit the model. Instead of blindly retraining with newer data, consider reassessing the data and the model. Update warning ranges and retraining trigger rules if needed.