Pandas

How to Handle Missing Values in Pandas

Answer

Use df.isna() to find missing values, df.dropna() to remove rows/columns with nulls, and df.fillna() to replace them with a value. Choose your strategy based on the data: drop if missing values are rare, fill with mean/median for numeric columns, or forward-fill for time series.
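Forward-filling carries the last observed value forward in time. The sketch below uses a hypothetical daily series to show two details: a leading NaN stays missing because it has no earlier value to copy, and the `limit` parameter caps how many consecutive gaps get filled.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with gaps (illustrative data only)
readings = pd.Series(
    [np.nan, 10.0, np.nan, np.nan, 13.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Carry the last observed value forward in time
filled = readings.ffill()
print(filled.tolist())  # [nan, 10.0, 10.0, 10.0, 13.0]

# limit=1 fills at most one consecutive NaN in each gap
limited = readings.ffill(limit=1)
print(limited.tolist())  # [nan, 10.0, 10.0, nan, 13.0]
```

If a leading gap also needs filling, a backward-fill (`.bfill()`) after the forward-fill is one common option, though it copies a future value into the past.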

Why This Happens

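Missing values (NaN or None) cause both loud and quiet failures: element-wise arithmetic propagates NaN, aggregations such as mean() silently skip missing entries by default, and many machine-learning tools (most scikit-learn estimators, for example) raise an error on NaN input. You need to either remove missing values or fill them with sensible ones. The right approach depends on why the data is missing and how much is gone: dropping 5% of rows is usually fine, while dropping 50% throws away half your dataset.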
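A small illustration of the difference between propagation and silent skipping, using a throwaway Series:

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, np.nan, 35, 40])

# Element-wise arithmetic propagates NaN: the missing entry stays missing
print((ages + 1).tolist())  # [26.0, nan, 36.0, 41.0]

# Aggregations skip NaN by default (skipna=True), so this is the mean
# of only 3 of the 4 rows (25, 35, 40)
print(ages.mean())

# Turning skipping off makes the gap visible
print(ages.mean(skipna=False))  # nan

# count() reports non-missing values only
print(ages.count(), len(ages))  # 3 4
```

The silent-skip case is the one that skews analysis: the result looks valid even though a quarter of the data was ignored.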

Solution

A rule of thumb: check how much is missing with isna().sum(), then decide. Drop rows if fewer than about 5% are affected, fill numeric columns with the mean or median, and fill categorical columns with the mode or a placeholder such as 'Unknown'.
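The rule of thumb can be sketched as a single function. `handle_missing` and its `drop_threshold=0.05` default are hypothetical names chosen for illustration, and the sketch assumes "how much is missing" means the share of rows containing at least one null. It uses the median for numeric columns and 'Unknown' for everything else.

```python
import numpy as np
import pandas as pd


def handle_missing(df: pd.DataFrame, drop_threshold: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: drop rows if few are affected, otherwise fill."""
    # Share of rows that contain at least one missing value
    share_missing_rows = df.isna().any(axis=1).mean()
    if share_missing_rows < drop_threshold:
        return df.dropna()

    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            # Median is less sensitive to outliers than the mean
            out[col] = out[col].fillna(out[col].median())
        else:
            # 'Unknown' keeps the gap visible; out[col].mode().iloc[0]
            # would fill with the most frequent category instead
            out[col] = out[col].fillna("Unknown")
    return out


df = pd.DataFrame({
    "name": ["Alice", "Bob", None, "David"],
    "age": [25, np.nan, 35, 40],
    "salary": [50000, 60000, np.nan, 80000],
})
print(df.isna().sum())

# Half the rows have a gap, so this fills instead of dropping
cleaned = handle_missing(df)
print(cleaned)
```

A fixed 5% cut-off is a heuristic, not a guarantee: if the missing rows differ systematically from the rest, even a small drop can bias results.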

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['Alice', 'Bob', None, 'David'],
    'age': [25, np.nan, 35, 40],
    'salary': [50000, 60000, np.nan, 80000]
})

# ✅ Find missing values
df.isna()              # boolean mask
df.isna().sum()        # count per column
df.isna().sum().sum()  # total missing

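# ✅ Drop rows with any missing values
# Like the other calls below, this returns a new object; assign it
# (e.g. df = df.dropna()) to keep the result
df.dropna()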

# ✅ Drop rows only if specific column is null
df.dropna(subset=['name'])

# ✅ Fill with a constant
df['age'].fillna(0)

# ✅ Fill with mean/median (numeric columns)
df['age'].fillna(df['age'].mean())
df['salary'].fillna(df['salary'].median())

# ✅ Forward-fill (good for time series)
# fillna(method='ffill') is deprecated since pandas 2.1; use .ffill()
df['salary'].ffill()

# ✅ Fill different columns with different values
df.fillna({'name': 'Unknown', 'age': df['age'].mean(), 'salary': 0})
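The answer also mentions removing columns, which the examples above do not show. The sketch below adds a hypothetical, mostly empty `bonus` column to the same data and uses `dropna` with `axis=1` and `thresh` (the minimum number of non-missing values to keep), plus a boolean mask that drops columns more than half empty.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", None, "David"],
    "age": [25, np.nan, 35, 40],
    "salary": [50000, 60000, np.nan, 80000],
    "bonus": [np.nan, np.nan, np.nan, 1000],  # hypothetical, mostly empty
})

# Keep only columns with at least 3 non-missing values
print(df.dropna(axis=1, thresh=3).columns.tolist())  # ['name', 'age', 'salary']

# Keep only rows with at least 3 non-missing values
print(df.dropna(thresh=3).index.tolist())  # [0, 3]

# Drop columns where more than half the values are missing
print(df.loc[:, df.isna().mean() <= 0.5].columns.tolist())  # ['name', 'age', 'salary']
```

Note that plain `df.dropna(axis=1)` would remove every column here, since each one has at least one gap; `thresh` or a missing-share mask gives finer control.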

Better Workflow

Zerve shows null counts and data summaries inline, so you can assess missing data patterns immediately after loading and iterate on your imputation strategy.