Abstract digital image showing a grid of dark circular nodes with one glowing orange diamond-shaped node in the center, symbolizing focus, activation, or a key process within a system.

How to Detect Rare Instances of Fraud Automatically, at a Vast Scale

See how automated orchestration cuts manual effort and makes anomaly detection faster, reproducible, and scalable.

Detecting fraud is tough because fraud cases typically make up less than 0.2% of all transactions. Recently, several of our customers have been working on similar use cases, and I wanted to share how they are doing it at a massive scale with an agentic workflow.

To demonstrate this, I built an end-to-end anomaly detection system to analyze the Credit Card Fraud Detection dataset from Kaggle. The goals were to test different anomaly detection methods, find the best one, and set up a retraining and deployment process that needs little manual work.

You can follow along in this Zerve Canvas.

The Challenge: Extreme Class Imbalance in Anonymized Data

Screenshot of a project documentation notebook titled “Credit Card Fraud Detection Project: End-to-End Overview.” The text describes the project’s context and dataset characteristics, detailing that it uses 284,807 transactions with 31 columns, features V1–V28, Time, Amount, and Class. It highlights data imbalance with only 0.17% fraud cases and confirms there is no missing data.

The dataset has 284,807 transactions from European cardholders. Only 492 of these are fraud, just 0.172%. Features V1 to V28 are PCA projections, so they carry no direct business meaning. The dataset also includes the raw Time and Amount values, and a Class label where 1 means fraud and 0 means a normal transaction. This data is high dimensional, anonymized, and highly imbalanced, making it ideal for testing anomaly detection methods.
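For reference, the class imbalance can be confirmed in a few lines of pandas. This is a minimal sketch that assumes the Kaggle CSV has been downloaded locally as creditcard.csv:

```python
import pandas as pd

# Load the Kaggle "Credit Card Fraud Detection" dataset
# (assumes the CSV has been saved locally as creditcard.csv).
df = pd.read_csv("creditcard.csv")

print(df.shape)                                  # (284807, 31)
print(df["Class"].value_counts())                # 0: 284315, 1: 492
print(f"Fraud ratio: {df['Class'].mean():.4%}")  # ~0.1727%
print("Missing values:", df.isna().sum().sum())  # 0
```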

Why Automate the Workflow?

Doing this project manually means repeating preprocessing steps, adjusting thresholds by hand, collecting metrics inconsistently, and risking mistakes.

Using Zerve's agent for Data Science to build an automated workflow allowed me to:

  • Apply the same preprocessing for both training and inference

  • Test multiple models in a consistent way

  • Record every change, threshold, and metric

  • Retrain and redeploy with just one pipeline run

Data Understanding & Preprocessing

We first applied z-score scaling to standardize all numerical features. The data was then split into training and testing sets with stratified sampling (80% training, 20% testing) so the rare fraud cases appear at the same rate in both splits. These steps were wrapped in a reusable Zerve pipeline that processes every new batch of transactions in the same way.
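A minimal sketch of these two steps with scikit-learn might look like the following (the random seed is illustrative; the actual pipeline lives in the Canvas linked above):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Separate features and label.
X = df.drop(columns=["Class"])
y = df["Class"]

# Stratified 80/20 split keeps the ~0.17% fraud rate in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Z-score scaling: fit on the training data, then reuse the same fitted
# scaler for the test set and for inference so every batch is treated
# identically.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```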

Screenshot of a Zerve workflow showing connected blocks for loading and preprocessing a credit card dataset. The blocks include steps for loading data, checking for missing values, scaling features, stratified train-test splitting, and exploring class imbalance. The visual flow illustrates the data pipeline structure and outputs such as fraud ratio and dataset shapes.

Model Portfolio and Training Approach

We tested different unsupervised and supervised models to detect anomalies:

  • Isolation Forest – Uses trees to score anomalies based on path length

  • One-Class SVM – Learns a boundary that encloses normal transactions

  • Autoencoder – Detects anomalies by measuring reconstruction errors

  • Deep SVDD-style AE – Learns a tight feature space around normal data

  • Elliptic Envelope – Assumes normal data follows a multivariate Gaussian distribution

  • HBOS – Scores outliers using histograms

We trained the models using only Class=0 (normal) transactions so they could learn typical patterns. For neural network models, we used a validation set to choose anomaly thresholds dynamically.
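As an illustration of this training approach, here is a minimal sketch using Isolation Forest, with the threshold chosen on a held-out slice of normal transactions (the percentile and hyperparameters are illustrative, not the values used in the Canvas):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# Train only on normal (Class = 0) transactions so the model learns
# what typical behaviour looks like.
X_normal = X_train_scaled[y_train.values == 0]

# Hold out a slice of the normal data to pick the anomaly threshold.
X_fit, X_val = train_test_split(X_normal, test_size=0.1, random_state=42)

iso = IsolationForest(n_estimators=200, random_state=42)
iso.fit(X_fit)

# Higher score = more anomalous (flip scikit-learn's sign convention).
val_scores = -iso.score_samples(X_val)

# Choose the threshold dynamically from the validation scores,
# e.g. the 99.8th percentile (roughly matching the fraud rate).
threshold = np.percentile(val_scores, 99.8)

# Flag test transactions whose anomaly score exceeds the threshold.
test_scores = -iso.score_samples(X_test_scaled)
y_pred = (test_scores > threshold).astype(int)
```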

Screenshot of a Zerve notebook showing Python code for training and evaluating an Isolation Forest model with Scikit-learn. The output panel displays performance metrics including AUPRC 0.16108, precision 0.03534, recall 0.83673, and F1-score 0.06782, along with a confusion matrix and a precision-recall curve labeled “Isolation Forest Fraud Detection.”
Screenshot of a Zerve notebook block showing Python code for training and evaluating a One-Class SVM model using Scikit-learn. The output includes fraud detection metrics such as AUPRC (0.35654), precision (0.12), recall (0.83), F1-score (0.21), and a confusion matrix, along with a plotted precision-recall curve labeled “One-Class SVM Fraud Detection.”
Screenshot of a Zerve notebook showing Python code for training and evaluating an autoencoder using Scikit-learn’s MLPRegressor. The output panel lists iteration numbers and corresponding loss values, illustrating the model’s convergence over time during training on legitimate (non-fraud) data.
Screenshot of a Zerve notebook showing Python code for training and evaluating an Elliptic Envelope model using Scikit-learn. The output panel includes metrics with AUPRC of 0.01587, zero precision, recall, and F1-score, along with a confusion matrix and a precision-recall curve indicating poor model performance on fraud detection.
Screenshot of a Zerve notebook showing Python code for training and evaluating an HBOS (Histogram-Based Outlier Score) model using PyOD and Scikit-learn. The output includes AUPRC 0.28802, precision 0.01522, recall 0.90816, and F1-score 0.02995, with a confusion matrix and a precision-recall curve labeled “HBOS Fraud Detection.”

Automated Benchmarking and Selection

The pipeline used the stratified test set to make predictions and calculated:

  • Precision

  • Recall

  • F1-score (main metric due to class imbalance)

  • Area Under the Precision-Recall Curve (AUPRC)

  • Confusion matrix

Zerve created a versioned report that gathered all results into one document, making it easier to compare models. Selection was based primarily on the F1-score, with AUPRC or recall used as tie-breakers. The best model was saved as a versioned .pkl file for production.
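A simplified version of that benchmarking and selection logic might look like this (the evaluate helper, version scheme, and file names are illustrative):

```python
import joblib
from datetime import datetime, timezone
from sklearn.metrics import (
    precision_score, recall_score, f1_score,
    average_precision_score, confusion_matrix,
)

def evaluate(name, y_true, y_pred, scores):
    """Collect the benchmark metrics for one model."""
    return {
        "model": name,
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auprc": average_precision_score(y_true, scores),
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
    }

models = {"isolation_forest": iso}   # the other trained models go here too
results = [evaluate("isolation_forest", y_test, y_pred, test_scores)]

# Pick the winner: F1 first, AUPRC as the tie-breaker.
best = max(results, key=lambda r: (r["f1"], r["auprc"]))

# Export the chosen model as a versioned .pkl file for production.
version = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
joblib.dump(models[best["model"]], f"fraud_model_{best['model']}_{version}.pkl")
```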

Automated Inference and Retraining

The production system loads the selected model and applies the same preprocessing steps before scoring new data. Scoring can be done either daily in batches or whenever needed.

Here's how it works:

  • Load the chosen model.

  • Preprocess data just like during training.

  • Score the preprocessed data with the selected model (sketched below).
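A minimal sketch of that scoring path, assuming the scaler and threshold were persisted alongside the model during training (all file names here are illustrative):

```python
import joblib
import pandas as pd

# Restore the selected model plus the preprocessing artefacts saved with it.
model = joblib.load("fraud_model_latest.pkl")
scaler = joblib.load("scaler.pkl")
threshold = joblib.load("threshold.pkl")

# New batch of transactions with the same columns as training (no Class).
new_batch = pd.read_csv("new_transactions.csv")

# Apply exactly the same preprocessing as during training.
X_new = scaler.transform(new_batch)

# Score the batch: higher = more anomalous; flag anything above threshold.
scores = -model.score_samples(X_new)
flags = (scores > threshold).astype(int)
print(f"{flags.mean():.2%} of the batch flagged as potential fraud")
```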

When new labels become available, the system calculates and records performance metrics with timestamps. If it detects a drop in performance, whether from data drift or from new label feedback, it automatically retrains. This keeps the system up to date with evolving fraud patterns.
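One simple way to express that trigger, assuming the selection-time F1 is kept as a baseline and a retraining callable is wired to the pipeline (the tolerance value is illustrative):

```python
from datetime import datetime, timezone
from sklearn.metrics import f1_score

def log_and_maybe_retrain(y_true, y_pred, baseline_f1, history, retrain_fn,
                          tolerance=0.8):
    """Record timestamped metrics and retrain on a performance drop.

    Here a "drop" is illustratively defined as the current F1 falling
    below tolerance * the F1 measured when the model was selected.
    """
    current_f1 = f1_score(y_true, y_pred)
    history.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "f1": current_f1,
    })
    if current_f1 < tolerance * baseline_f1:
        retrain_fn()  # e.g. kick off the full training-and-selection pipeline

metrics_history = []
# Once true labels for a scored batch arrive:
# log_and_maybe_retrain(labels, flags, baseline_f1=best["f1"],
#                       history=metrics_history, retrain_fn=run_pipeline)
```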

Screenshot of a Zerve notebook block showing Python code that aggregates model metrics and outputs a comparison table. The table lists six unsupervised models: Isolation Forest, One-Class SVM, Autoencoder, Deep SVDD-like, HBOS, and Elliptic Envelope, along with their AUPRC, Precision, Recall, and F1 scores. The Autoencoder and Deep SVDD-like models show the strongest overall performance.
A composite image showing six confusion matrices comparing model performance: Isolation Forest, One-Class SVM, Autoencoder, Deep SVDD-like, HBOS, and Elliptic Envelope. Each matrix visualizes true versus predicted labels for the test set, with most true negatives concentrated in the top-left corner and minimal false positives.
Screenshot of three connected Zerve blocks showing a machine learning workflow for loading test data, restoring a trained model, and evaluating predictions. The outputs display dataset shape, model type (MLPRegressor), and prediction summary including total samples and percentage predicted as fraud.

Productivity Gains

Switching from manual coding to automated orchestration offers many benefits, such as:

  • Time savings – No need to repeat setup for each retrain

  • Reproducibility – Complete logs of transformations and metrics for every run

  • Consistency – The same rules and thresholds are applied during training and inference

  • Scalability – More models or new data sources can be added easily

Thoughts on Why This Matters for Fraud Detection

Detecting fraud well means spotting fraudulent activity accurately, acting fast to limit losses, and doing both consistently. Using Zerve's automation tools, we built a system that surfaces unusual activity quickly with minimal human intervention. That mix of accuracy and speed is essential in high-risk domains like fraud prevention.

Manual vs. Automated Fraud Detection

Aspect | Without Zerve | With Zerve
--- | --- | ---
Preprocessing | Manual coding of scaling and splitting for each run; prone to inconsistencies. | Automated z-score scaling and stratified splitting applied identically across training and inference.
Model Benchmarking | Models tested individually with separate scripts; inconsistent metrics and comparisons. | Multiple anomaly detection models benchmarked in one pipeline with standardized metrics and versioned reports.
Metric Collection | Metrics gathered manually; risk of missing or misreporting results. | Automated calculation of precision, recall, F1, AUPRC, and confusion matrices, fully logged and reproducible.
Model Selection | Manual review to choose the best model; selection criteria may vary. | Automated selection based on the primary metric (F1), with AUPRC or recall tie-breakers applied consistently.
Deployment | Requires manual packaging and setup for each deployment. | Best model automatically exported as a versioned .pkl file ready for production.
Inference | New data processing may not match the training pipeline, increasing risk of drift. | Identical preprocessing applied to new data before scoring, ensuring consistency.
Retraining | Time-consuming; requires re-running and re-validating the entire workflow manually. | Automatic retraining triggered by performance drops or data drift, with minimal intervention.
Scalability | Adding new models or data sources requires major code changes. | Pipeline easily extended to include more models or integrate new data feeds.
Audit & Compliance | Limited or inconsistent logs of past runs. | Full transformation, metric, and threshold logs version-controlled for compliance.
Engineering Effort | High: repeated manual coding, tuning, and documentation. | Low: retraining and redeployment reduced to a single pipeline run.

Experience Zerve Now

Join our free tier and explore what is possible.

FAQs

What makes fraud detection such a difficult problem?

Fraud cases make up less than 0.2% of transactions, creating extreme class imbalance. Detecting them requires precision, consistency, and scalable automation.

How does Zerve automate the fraud detection workflow?

Zerve standardizes preprocessing, benchmarks multiple models, tracks every metric, and automates retraining when performance drops, ensuring continuous improvement.

Which models were tested for fraud detection?

The workflow compared Isolation Forest, One-Class SVM, Autoencoder, Deep SVDD-style AE, HBOS, and Elliptic Envelope models, selecting the best performer automatically.

How does Zerve maintain accuracy over time?

By monitoring model performance and retraining automatically when results degrade, Zerve adapts to evolving fraud patterns without human intervention.

What are the main advantages of automating fraud detection with Zerve?

Automation improves speed, reproducibility, and scalability while reducing manual effort and ensuring consistent preprocessing, evaluation, and deployment.

Kreshnaa Raam
Kreshnaa is Lead Data Scientist at Zerve.