
ParserError: Error Tokenizing Data - How to Fix It

Answer

This error means pandas could not parse your CSV: rows have inconsistent column counts, a line is malformed, or the delimiter is wrong. Fix it by passing on_bad_lines='skip' (pandas 1.3 or later) to drop problem rows, setting the correct delimiter, or using engine='python' for more flexible parsing.

Why This Happens

CSVs in the wild are messy. Common causes: some rows have extra commas, quoted fields contain unescaped delimiters, the file uses a semicolon or tab instead of a comma, there's a corrupted line mid-file, or the header row doesn't match the data rows. Pandas' default C parser is fast but strict: it raises as soon as a row has more fields than it expects.
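To make the failure mode concrete, here is a minimal sketch that feeds pandas two small in-memory CSV strings. The sample data and the try_parse helper are made up for illustration. One string has a row with an extra field, the other has a row with a missing field.

```python
import io

import pandas as pd

# Hypothetical sample data: line 3 has one field too many.
EXTRA_FIELD_CSV = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"
# Hypothetical sample data: line 3 has one field too few.
SHORT_ROW_CSV = "a,b,c\n1,2,3\n4,5\n8,9,10\n"


def try_parse(text):
    """Return (DataFrame, None) on success or (None, error message) on failure."""
    try:
        return pd.read_csv(io.StringIO(text)), None
    except pd.errors.ParserError as exc:
        return None, str(exc)


extra_df, extra_err = try_parse(EXTRA_FIELD_CSV)
short_df, short_err = try_parse(SHORT_ROW_CSV)

print("extra field:", extra_err)  # the tokenizing error for line 3
print(short_df)                   # parses; the missing value becomes NaN
```

In this sketch only the row with too many fields triggers the ParserError. The short row is padded with NaN, which is worth knowing because it can hide problems that never raise.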

Solution

The rule: when you hit a parser error, first check the delimiter, then try on_bad_lines='warn' to see what's actually broken.
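One way to do the delimiter check is to let Python's standard csv.Sniffer guess it from a sample of the text. The sketch below is an illustration, not the only approach: guess_delimiter is a hypothetical helper, the candidate list is an assumption, and the sample is an inline string where you would normally read the first few kilobytes of your file.

```python
import csv
import io

import pandas as pd


def guess_delimiter(sample, candidates=",;\t|"):
    """Return the delimiter csv.Sniffer detects in sample, or None if it can't decide."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return None


# Hypothetical sample; in practice, read the head of the real file instead.
sample = "id;name;score\n1;Ann;3.5\n2;Bob;4.0\n"

delim = guess_delimiter(sample)
# Fall back to a comma if the sniffer could not decide.
df = pd.read_csv(io.StringIO(sample), sep=delim or ",")
print(repr(delim), list(df.columns))
```

The sniffer is a heuristic, so treat its answer as a hint and confirm it by looking at the first few lines yourself.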

import pandas as pd

# ❌ Problematic: default parser fails on messy CSV
df = pd.read_csv('messy_file.csv')
# ParserError: Error tokenizing data. C error: Expected 5 fields in line 47, saw 6

# ✅ Fixed: skip bad lines
df = pd.read_csv('messy_file.csv', on_bad_lines='skip')

# ✅ Fixed: use python engine (slower but more forgiving)
df = pd.read_csv('messy_file.csv', engine='python', on_bad_lines='skip')

# ✅ Fixed: specify correct delimiter if not comma
df = pd.read_csv('messy_file.csv', delimiter=';')

# ✅ Debug: find the bad lines first
df = pd.read_csv('messy_file.csv', on_bad_lines='warn')  # prints which lines fail
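If you want to keep the rejected rows for inspection instead of only printing a warning, on_bad_lines also accepts a callable. This needs pandas 1.4 or later and works only with engine='python'. The sketch below uses made-up in-memory data, and record_bad_line is a hypothetical helper name.

```python
import io

import pandas as pd

# Hypothetical sample data: line 3 has one field too many.
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

bad_rows = []


def record_bad_line(fields):
    """Remember the offending row (a list of strings) and tell pandas to drop it."""
    bad_rows.append(fields)
    return None  # returning None skips the row


df = pd.read_csv(io.StringIO(raw), engine="python", on_bad_lines=record_bad_line)

print(df)        # the two well-formed rows
print(bad_rows)  # the row pandas rejected, split into fields
```

Collecting the rows this way lets you decide afterwards whether to repair them, log them, or discard them, rather than silently losing data with 'skip'.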

Better Workflow

Zerve lets you iterate on parsing logic quickly (run a cell, see what breaks, adjust parameters, re-run) without restarting your whole environment. That gives you a faster feedback loop for wrangling messy files.
