๐Ÿ€Zerve chosen as NCAA's Agentic Data Platform for 2026 Hackathonยท๐ŸงฎMeet the Zerve Team at Data Decoded Londonยท๐Ÿ“ˆWe're hiring โ€” awesome new roles just gone live!
Back
Scikit-learn

ValueError: Input Contains NaN, Infinity or Value Too Large - How to Fix It

Answer

This sklearn error means your data contains missing values, infinite values, or numbers too large for float64. Fix it by detecting NaN with np.isnan() and infinities with np.isinf(), then either imputing the problem values or dropping the affected rows before fitting your model.

Why This Happens

Scikit-learn models can't handle NaN or infinite values. Unlike pandas, which tolerates missing data, sklearn requires clean numeric arrays. This commonly occurs after division by zero (which produces inf), failed type conversions (which produce NaN), or when your data pipeline doesn't handle missing values before model training.
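To see how these values sneak in, here is a minimal sketch of the two most common sources: float division by zero and coerced numeric conversion. The array values are illustrative, not from any real dataset.

```python
import numpy as np
import pandas as pd

# Division by zero on float arrays yields inf (with a warning), not an error
a = np.array([1.0, 2.0])
b = np.array([0.0, 2.0])
with np.errstate(divide='ignore'):
    ratio = a / b
print(ratio)  # first element is inf

# A failed type conversion with errors='coerce' silently yields NaN
s = pd.to_numeric(pd.Series(['3.5', 'oops']), errors='coerce')
print(s.tolist())  # second element is nan
```

Both results flow downstream without raising anything, so the error only surfaces later inside sklearn's input validation.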

Solution

The rule: always check for NaN and inf before fitting sklearn models. Use SimpleImputer for systematic missing value handling, or drop/clip problematic values.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

X = np.array([[1, 2], [np.nan, 3], [7, np.inf], [4, 5]])
y = [0, 1, 0, 1]

# โŒ Problematic: data contains NaN and inf
model = RandomForestClassifier()
model.fit(X, y)
# ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

# โœ… Debug: find the problems
print(f"NaN count: {np.isnan(X).sum()}")
print(f"Inf count: {np.isinf(X).sum()}")
print(f"NaN locations: {np.argwhere(np.isnan(X))}")
print(f"Inf locations: {np.argwhere(np.isinf(X))}")

# โœ… Fix 1: Replace inf with NaN, then impute
X_clean = np.where(np.isinf(X), np.nan, X)
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X_clean)
model.fit(X_imputed, y)

# โœ… Fix 2: Drop rows with any NaN or inf
mask = ~(np.isnan(X).any(axis=1) | np.isinf(X).any(axis=1))
X_clean = X[mask]
y_clean = np.array(y)[mask]
model.fit(X_clean, y_clean)

# โœ… Fix 3: For pandas DataFrames
df = pd.DataFrame(X, columns=['a', 'b'])
df = df.replace([np.inf, -np.inf], np.nan)
df = df.fillna(df.mean())
model.fit(df.values, y)

# โœ… Fix 4: Clip extreme values
X_clipped = np.clip(X, -1e10, 1e10)
X_clipped = np.nan_to_num(X_clipped, nan=0.0)
model.fit(X_clipped, y)

# โœ… Validate before fitting
def check_data(X):
    issues = []
    if np.isnan(X).any():
        issues.append(f"NaN: {np.isnan(X).sum()} values")
    if np.isinf(X).any():
        issues.append(f"Inf: {np.isinf(X).sum()} values")
    return issues if issues else ["Data is clean"]

print(check_data(X))

Better Workflow

In Zerve, your cleaned data persists in its block's state. Create a cleaning block that handles NaN and inf values, then connect training blocks downstream. The visual DAG makes it obvious if a cleaning step is missing from the path to your model. Variables flow automatically between connected blocks, so you can't accidentally fit on raw, uncleaned data from a different part of the pipeline.
