
How to Detect Rare Instances of Fraud Automatically, at a Vast Scale
Detecting fraud is tough because fraud cases make up less than 0.2% of all transactions. Several of our customers have recently been working on similar use cases, and I wanted to share how they are doing it at massive scale with an agentic workflow.
To demonstrate this, I built a detailed anomaly detection system to analyze the Credit Card Fraud Detection dataset from Kaggle. The goals were to test different anomaly detection methods, find the best one, and set up an easy process for retraining and deployment with little manual work.
You can follow along in this Zerve Canvas.
The Challenge: Extreme Class Imbalance in Anonymized Data
The dataset contains 284,807 transactions from European cardholders, of which only 492 (0.172%) are fraudulent. Features V1 to V28 are PCA projections, so they carry no direct business meaning. The dataset also includes the raw Time and Amount values and a Class label, where 1 marks fraud and 0 a normal transaction. High dimensional, anonymized, and severely imbalanced, this data is ideal for testing anomaly detection methods.
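For reference, here is a minimal sketch of loading the dataset and confirming the imbalance, assuming the Kaggle CSV has been downloaded locally as creditcard.csv:

```python
import pandas as pd

# Load the Kaggle "Credit Card Fraud Detection" dataset (local path is an assumption).
df = pd.read_csv("creditcard.csv")

# Class = 1 marks fraud, 0 marks a normal transaction.
fraud_rate = df["Class"].mean()
print(f"Transactions: {len(df):,}")                 # 284,807
print(f"Fraud cases:  {int(df['Class'].sum()):,}")  # 492
print(f"Fraud rate:   {fraud_rate:.4%}")            # ~0.1727%
```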
Why Automate the Workflow?
Doing this project manually means repeating preprocessing steps, adjusting thresholds by hand, collecting metrics inconsistently, and risking mistakes.
Building an automated workflow with Zerve's agent for Data Science allowed me to:
Apply the same preprocessing for both training and inference
Test multiple models in a consistent way
Record every change, threshold, and metric
Retrain and redeploy with just one pipeline run
Data Understanding & Preprocessing
All numerical features were standardized with z-score scaling up front so that every model sees consistent inputs. The data was then split into training and test sets with stratified sampling, 80% for training and 20% for testing, to preserve the rare fraud cases in both splits. These steps were captured in a reusable Zerve pipeline that processes every new batch of transactions in exactly the same way.
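A minimal sketch of that preprocessing step, continuing from the loading snippet above; the split ratio mirrors the description, while the random seed and variable names are illustrative:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Feature columns from the Kaggle dataset: the PCA components plus Time and Amount.
features = [f"V{i}" for i in range(1, 29)] + ["Time", "Amount"]
X, y = df[features], df["Class"]

# Stratify on the label so the ~0.172% fraud rate is preserved in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the z-score scaler on training data only, then reuse it for test and inference.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```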
Model Portfolio and Training Approach
We tested different unsupervised and supervised models to detect anomalies:
Isolation Forest – Uses trees to score anomalies based on path length
One-Class SVM – Learns a boundary that encloses normal transactions
Autoencoder – Detects anomalies by measuring reconstruction errors
Deep SVDD-style AE – Learns a tight feature space around normal data
Elliptic Envelope – Assumes normal data follows a multivariate Gaussian distribution
HBOS – Scores outliers using histograms
We trained the models using only Class=0 (normal) transactions so they could learn typical patterns. For neural network models, we used a validation set to choose anomaly thresholds dynamically.
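As an illustration, here is how the non-neural detectors in that portfolio could be instantiated and fit on normal transactions only, continuing from the preprocessing sketch. Isolation Forest, One-Class SVM, and Elliptic Envelope come from scikit-learn and HBOS from the PyOD library; the contamination and nu values are assumptions rather than the tuned settings, and the autoencoder variants are omitted for brevity:

```python
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from pyod.models.hbos import HBOS

# Train only on normal transactions so each model learns what "typical" looks like.
X_normal = X_train_scaled[y_train.values == 0]

models = {
    "isolation_forest": IsolationForest(n_estimators=200, contamination=0.002, random_state=42),
    "one_class_svm": OneClassSVM(kernel="rbf", nu=0.002, gamma="scale"),
    "elliptic_envelope": EllipticEnvelope(contamination=0.002, random_state=42),
    "hbos": HBOS(contamination=0.002),
}

for name, model in models.items():
    model.fit(X_normal)
```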

Automated Benchmarking and Selection
The pipeline used the stratified test set to make predictions and calculated:
Precision
Recall
F1-score (main metric due to class imbalance)
Area Under the Precision-Recall Curve (AUPRC)
Confusion matrix
Zerve compiled all results into a single versioned report, making it easy to compare models side by side. Selection ranked candidates by F1-score first, using AUPRC or recall to break ties. The winning model was saved as a versioned .pkl file for production.
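A sketch of the benchmarking and selection logic under those rules, building on the earlier sketches. Scikit-learn detectors label outliers as -1 and give higher decision_function scores to normal points, while PyOD labels outliers as 1 and gives higher scores to anomalies, so both are normalized to "1 = fraud, higher score = more suspicious" before computing metrics; the output file name and version suffix are illustrative:

```python
import pickle
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             average_precision_score, confusion_matrix)

results = {}
for name, model in models.items():
    if name == "hbos":  # PyOD convention: predict() already returns 0/1
        y_pred = model.predict(X_test_scaled)
        scores = model.decision_function(X_test_scaled)   # higher = more anomalous
    else:               # scikit-learn convention: -1 = outlier
        y_pred = (model.predict(X_test_scaled) == -1).astype(int)
        scores = -model.decision_function(X_test_scaled)  # flip so higher = more anomalous

    results[name] = {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "auprc": average_precision_score(y_test, scores),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

# Rank by F1 first, break ties with AUPRC, then persist the winner with its scaler.
best_name = max(results, key=lambda n: (results[n]["f1"], results[n]["auprc"]))
with open(f"fraud_model_{best_name}_v1.pkl", "wb") as f:
    pickle.dump({"model": models[best_name], "scaler": scaler}, f)
```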
Automated Inference and Retraining
The production system loads the selected model and applies the same preprocessing steps before scoring new data. Scoring can run as a daily batch job or on demand.
Here's how it works:
Load the chosen model.
Preprocess data just like during training.
Score incoming data using the preprocessed information.
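A minimal sketch of that scoring path, assuming the versioned artifact saved above; the file name is an illustrative placeholder:

```python
import pickle
import pandas as pd

# 1. Load the chosen model together with the scaler it was trained with.
with open("fraud_model_isolation_forest_v1.pkl", "rb") as f:
    artifact = pickle.load(f)
model, scaler = artifact["model"], artifact["scaler"]

features = [f"V{i}" for i in range(1, 29)] + ["Time", "Amount"]

def score_batch(new_transactions: pd.DataFrame) -> pd.Series:
    """Apply the training-time preprocessing, then flag suspected fraud (1 = fraud)."""
    # 2. Preprocess exactly as during training.
    X_new = scaler.transform(new_transactions[features])
    # 3. Score: scikit-learn detectors return -1 for outliers.
    return pd.Series((model.predict(X_new) == -1).astype(int), index=new_transactions.index)
```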
When new labels become available, the system calculates and records performance metrics with timestamps. If performance drops, whether from data drift or analyst feedback, the pipeline retrains automatically, keeping the system current with evolving fraud patterns.
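And a sketch of the monitoring hook that feeds retraining, assuming labeled feedback arrives later; the F1 floor and log format are assumptions:

```python
from datetime import datetime, timezone
from sklearn.metrics import f1_score

def log_and_check_performance(y_true, y_pred, f1_floor: float = 0.70) -> bool:
    """Record a timestamped F1 score and report whether a retrain should be triggered."""
    f1 = f1_score(y_true, y_pred)
    print(f"{datetime.now(timezone.utc).isoformat()} F1={f1:.3f}")
    if f1 < f1_floor:
        print("Performance drop detected: trigger the retraining pipeline")
        return True
    return False
```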

Productivity Gains
Switching from manual coding to automated orchestration offers many benefits, such as:
Time Saving – No need to repeat setup for each retrain
Reproducibility – Complete logs of transformations and metrics for every run
Consistency – Same rules and thresholds applied during training and inference
Scalability – Easy to add more models or new data sources
Thoughts on Why This Matters for Fraud Detection
Effective fraud detection means identifying fraudulent activity accurately, acting fast enough to limit losses, and doing both consistently. With Zerve's automation tools, we built a system that surfaces unusual activity quickly with minimal human intervention. That combination of accuracy and speed is critical in high risk areas like fraud prevention.
Manual vs. Automated Fraud Detection
| Aspect | Without Zerve | With Zerve |
|---|---|---|
| Preprocessing | Manual coding of scaling and splitting for each run; prone to inconsistencies. | Automated z-score scaling and stratified splitting applied identically across training and inference. |
| Model Benchmarking | Models tested individually with separate scripts; inconsistent metrics and comparisons. | Multiple anomaly detection models benchmarked in one pipeline with standardized metrics and versioned reports. |
| Metric Collection | Metrics gathered manually; risk of missing or misreporting results. | Automated calculation of precision, recall, F1, AUPRC, and confusion matrices, fully logged and reproducible. |
| Model Selection | Manual review to choose the best model; selection criteria may vary. | Automated selection based on primary metric (F1), with AUPRC or recall tie breakers applied consistently. |
| Deployment | Requires manual packaging and setup for each deployment. | Best model automatically exported as a versioned .pkl file ready for production. |
| Inference | New data processing may not match training pipeline, increasing risk of drift. | Identical preprocessing applied to new data before scoring, ensuring consistency. |
| Retraining | Time-consuming; requires re-running and re-validating the entire workflow manually. | Automatic retraining triggered by performance drop or data drift, with minimal intervention. |
| Scalability | Adding new models or data sources requires major code changes. | Pipeline easily extended to include more models or integrate new data feeds. |
| Audit & Compliance | Limited or inconsistent logs of past runs. | Full transformation, metric, and threshold logs version controlled for compliance. |
| Engineering Effort | High – repeated manual coding, tuning, and documentation. | Low – retraining and redeployment reduced to a single pipeline run. |
Experience Zerve Now
Join our free tier and explore what is possible.
FAQs
What makes fraud detection such a difficult problem?
Fraud cases make up less than 0.2% of transactions, creating extreme class imbalance. Detecting them requires precision, consistency, and scalable automation.
How does Zerve automate the fraud detection workflow?
Zerve standardizes preprocessing, benchmarks multiple models, tracks every metric, and automates retraining when performance drops, ensuring continuous improvement.
Which models were tested for fraud detection?
The workflow compared Isolation Forest, One-Class SVM, Autoencoder, Deep SVDD-style AE, HBOS, and Elliptic Envelope models, selecting the best performer automatically.
How does Zerve maintain accuracy over time?
By monitoring model performance and retraining automatically when results degrade, Zerve adapts to evolving fraud patterns without human intervention.
What are the main advantages of automating fraud detection with Zerve?
Automation improves speed, reproducibility, and scalability while reducing manual effort and ensuring consistent preprocessing, evaluation, and deployment.

