
Proactive Data Pipeline Maintenance via Machine Learning-Driven Anomaly Detection

greg
July 21, 2025

About

This canvas replicates the workflow from the paper by Akash Vijayrao Chaudhari and Pallavi Ashokrao Charate, generating a synthetic data pipeline throughput dataset with injected anomalies: spikes, drops, and schema drift. It uses Isolation Forest as the primary anomaly detection model, tuning its contamination parameter via grid search to optimize detection performance (accuracy, precision, recall, and F1).
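A minimal sketch of that setup, assuming a pandas/scikit-learn stack; the column name, anomaly rate, and contamination grid below are illustrative rather than the paper's exact values:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score

rng = np.random.default_rng(42)
n = 1000

# Baseline throughput with injected spikes and drops (labels mark anomalies)
throughput = rng.normal(500, 30, n)
labels = np.zeros(n, dtype=int)
anomaly_idx = rng.choice(n, size=50, replace=False)
for i in anomaly_idx:
    throughput[i] *= rng.choice([3.0, 0.1])  # spike (3x) or drop (0.1x)
    labels[i] = 1

df = pd.DataFrame({"throughput": throughput})

# Grid search over contamination, scoring each fit by F1 against known labels
best = None
for c in [0.01, 0.03, 0.05, 0.1]:
    model = IsolationForest(contamination=c, random_state=0)
    pred = (model.fit_predict(df) == -1).astype(int)  # -1 means outlier
    f1 = f1_score(labels, pred)
    if best is None or f1 > best[1]:
        best = (c, f1)

print(f"best contamination={best[0]}, F1={best[1]:.3f}")
```

Because the anomalies are synthetic, ground-truth labels are available, which is what makes tuning contamination against accuracy, precision, recall, and F1 possible in the first place.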
The canvas includes detailed evaluation blocks such as confusion matrix heatmaps, performance metrics, time series anomaly overlays, and visualizations of throughput and schema version changes over time. Custom thresholds are applied to balance sensitivity, particularly to detect schema drift events.
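One way to sketch the evaluation and custom-threshold step, assuming scikit-learn's anomaly scores are cut at a hand-picked quantile rather than the model's default; the 0.97 quantile and the toy data here are illustrative:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (500, 1))
labels = np.zeros(500, dtype=int)
X[::50] += 6          # inject an anomaly every 50th row
labels[::50] = 1

model = IsolationForest(random_state=0).fit(X)
scores = -model.score_samples(X)  # negate so higher = more anomalous

# Custom threshold on the raw scores instead of the default cutoff,
# tuned to balance sensitivity (e.g. to subtler schema-drift events)
threshold = np.quantile(scores, 0.97)
pred = (scores > threshold).astype(int)

cm = confusion_matrix(labels, pred)
print(cm)
```

The resulting 2x2 matrix is what the canvas renders as a heatmap (e.g. with seaborn's `heatmap`), alongside the time series overlays of throughput and schema version.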

Overall, the canvas reproduces the core anomaly detection and evaluation methodology described in the paper, providing a clear, extensible environment for experimentation and further research in proactive data pipeline maintenance.
