Proactive Data Pipeline Maintenance via Machine Learning-Driven Anomaly Detection

Last Updated about 17 hours ago

About

This canvas replicates the workflow from the paper by Akash Vijayrao Chaudhari and Pallavi Ashokrao Charate. It generates a synthetic data pipeline throughput dataset with injected anomalies: spikes, drops, and schema drift. Isolation Forest is the primary anomaly detection model, and its contamination parameter is tuned via grid search to optimize detection performance (accuracy, precision, recall, and F1). Evaluation blocks include confusion matrix heatmaps, performance metrics, time series anomaly overlays, and visualizations of throughput and schema version changes over time. Custom thresholds are applied to balance sensitivity, particularly for detecting schema drift events. Overall, the canvas reproduces the core anomaly detection and evaluation methodology described in the paper and provides an extensible environment for experimentation and further research in proactive data pipeline maintenance.
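The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the canvas's or the paper's actual code: the function names (make_dataset, grid_search_contamination), the series length, the anomaly magnitudes, the schema change positions, the two-column feature layout, and the contamination grid are all assumptions chosen for the example. It uses scikit-learn's IsolationForest and selects the contamination value with the highest F1 against the injected labels.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def make_dataset(n=1000, seed=0):
    """Build a synthetic throughput series with labeled, injected anomalies.

    All sizes and magnitudes here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    throughput = rng.normal(loc=100.0, scale=5.0, size=n)
    labels = np.zeros(n, dtype=int)

    # Inject 20 spikes and 20 drops at distinct random positions.
    idx = rng.choice(n, size=40, replace=False)
    spikes, drops = idx[:20], idx[20:]
    throughput[spikes] += rng.uniform(40, 80, size=20)
    throughput[drops] -= rng.uniform(40, 80, size=20)
    labels[idx] = 1

    # Schema drift: the schema version increments at a few fixed steps.
    schema_version = np.ones(n, dtype=int)
    for step in (300, 600, 900):
        schema_version[step:] += 1
        labels[step] = 1

    # Feature 2 flags the steps where the schema version just changed.
    schema_changed = np.diff(schema_version, prepend=schema_version[0]) != 0
    X = np.column_stack([throughput, schema_changed.astype(float)])
    return X, labels


def grid_search_contamination(X, y, grid=(0.01, 0.02, 0.05, 0.1)):
    """Fit one Isolation Forest per contamination value; keep the best F1."""
    best = None
    for c in grid:
        model = IsolationForest(contamination=c, n_estimators=100, random_state=42)
        # fit_predict returns -1 for outliers and 1 for inliers.
        pred = (model.fit_predict(X) == -1).astype(int)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y, pred, average="binary", zero_division=0
        )
        result = {
            "contamination": c,
            "accuracy": accuracy_score(y, pred),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
        if best is None or result["f1"] > best["f1"]:
            best = result
    return best


if __name__ == "__main__":
    X, y = make_dataset()
    print(grid_search_contamination(X, y))
```

Because the labels are only available for synthetic data, this kind of supervised tuning of an unsupervised detector applies to the benchmark setting; on a real pipeline the contamination value would have to be chosen by other means.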
