MemoryError When Reading Large CSV: How to Fix It
Answer
This error means your CSV is too large to fit in memory. Fix it by reading the file in chunks with chunksize, loading only the columns you need with usecols, or switching to a more memory-efficient library like Polars or Dask. You can also downcast data types with the dtype parameter to shrink the memory footprint.
Why This Happens
When you call pd.read_csv(), pandas loads the entire file into RAM, and the parsed DataFrame is often larger than the file on disk. If your file is 10GB and your machine has 8GB of RAM, the read fails with a MemoryError. Pandas also defaults to memory-heavy dtypes (e.g., int64 instead of int32, object instead of category), which makes the problem worse.
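Before picking a fix, it helps to see where the memory actually goes. A minimal sketch, assuming a sample of the file fits in RAM (the file name, sample size, and column names here are placeholders):

import pandas as pd

# Read only the first 100,000 rows so the inspection itself stays cheap
sample = pd.read_csv('huge_file.csv', nrows=100000)
print(sample.memory_usage(deep=True))  # bytes used per column
print(sample.dtypes)                   # int64/object columns are downcast candidates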
Solution
A good rule of thumb: for files over 1GB, or any file approaching your available RAM, never use vanilla pd.read_csv() without chunking, column selection, or dtype optimization.
import pandas as pd
# ❌ Problematic: loading entire file at once
df = pd.read_csv('huge_file.csv')
# MemoryError
# ✅ Fixed: read in chunks and process iteratively
chunks = pd.read_csv('huge_file.csv', chunksize=100000)
for chunk in chunks:
    # process each chunk here (filter, aggregate, write out), then discard it
    pass
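# For example, a sketch of the loop above that keeps a running total, so
# only one chunk is ever in memory ('value' is a placeholder column name)
total = 0
for chunk in pd.read_csv('huge_file.csv', chunksize=100000):
    total += chunk['value'].sum()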
# ✅ Fixed: load only the columns you need
df = pd.read_csv('huge_file.csv', usecols=['col1', 'col2'])
# ✅ Fixed: optimize dtypes to reduce memory
df = pd.read_csv('huge_file.csv', dtype={'id': 'int32', 'category': 'category'})
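# Alternative (sketch): Dask, mentioned above, splits the CSV into lazy
# partitions and streams through them at .compute()
# ('col1' is a placeholder column name)
import dask.dataframe as dd
ddf = dd.read_csv('huge_file.csv')
print(ddf['col1'].mean().compute())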
# ✅ Alternative: use Polars for large files (much faster, lower memory)
import polars as pl
df = pl.read_csv('huge_file.csv')

Better Workflow
Zerve runs in the cloud with multiple compute options (Lambda, Fargate, GPU, Kubernetes) that can execute serverlessly or persistently. So instead of fighting your laptop's RAM limits, you can run large CSV jobs on infrastructure that can handle them.