ParserError: Error Tokenizing Data - How to Fix It
Answer
This error means pandas can't parse your CSV, usually because a row has more fields than expected, a line is malformed, or the delimiter is wrong. Fix it by passing on_bad_lines='skip' (pandas 1.3+) to skip problem rows, setting the correct delimiter, or using engine='python' for more flexible parsing.
Why This Happens
CSVs in the wild are messy. Common causes: some rows have extra commas, quoted fields contain unescaped delimiters, the file uses a semicolon or tab instead of a comma, there's a corrupted line mid-file, or the header row doesn't match the data rows. Pandas' default C parser is fast but strict: by default it raises as soon as a row has more fields than it expects.
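To make the failure concrete, here is a minimal sketch that reproduces it without a file on disk. The in-memory CSV in `raw` is made up for illustration: the header declares three columns, and the last row carries a stray fourth field.

```python
import io

import pandas as pd

# Hypothetical sample data: 3 header fields, but the last row has 4.
raw = "a,b,c\n1,2,3\n4,5,6\n7,8,9,10\n"

try:
    pd.read_csv(io.StringIO(raw))  # default C engine, default on_bad_lines='error'
    error_message = None
except pd.errors.ParserError as exc:
    error_message = str(exc)

# The message names the offending line and the expected vs. seen field counts.
print(error_message)
```

The line number in the message is usually the quickest pointer to what went wrong, so it is worth opening the file at that line before changing any parser options.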
Solution
The rule: when you hit a parser error, first check the delimiter, then try on_bad_lines='warn', which skips bad rows but reports each one, to see what's actually broken.
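One way to follow that order in code, as a rough sketch rather than a definitive recipe: guess the delimiter from the first couple of lines with the standard library's `csv.Sniffer`, then collect the malformed rows for inspection instead of only warning about them. The sample data and the helper name `remember_bad_line` are hypothetical. Passing a callable to `on_bad_lines` assumes pandas 1.4 or later and requires `engine='python'`.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: semicolon-delimited, with one row that has an extra field.
raw = "id;name;score\n1;Ada;9.5\n2;Linus;8.0;oops\n3;Grace;9.9\n"

# Step 1: guess the delimiter from the header and first data row only,
# restricted to a few likely candidates.
sample = "\n".join(raw.splitlines()[:2])
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")

# Step 2: collect rows with too many fields instead of failing on them.
bad_rows = []

def remember_bad_line(fields):
    """Record the already-split bad row; returning None drops it from the result."""
    bad_rows.append(fields)
    return None

df = pd.read_csv(
    io.StringIO(raw),
    sep=dialect.delimiter,
    engine="python",  # callables for on_bad_lines need the python engine
    on_bad_lines=remember_bad_line,
)

print(dialect.delimiter)
print(bad_rows)
print(df)
```

Sniffing is a heuristic and can fail or guess wrong on unusual files, so treat its answer as a hint to confirm by eye. pandas can also attempt the same detection itself if you pass sep=None together with engine='python'.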
import pandas as pd

# ❌ Problematic: default parser fails on messy CSV
df = pd.read_csv('messy_file.csv')
# ParserError: Error tokenizing data. C error: Expected 5 fields in line 47, saw 6

# ✅ Fixed: skip bad lines
df = pd.read_csv('messy_file.csv', on_bad_lines='skip')

# ✅ Fixed: use python engine (slower but more forgiving)
df = pd.read_csv('messy_file.csv', engine='python', on_bad_lines='skip')

# ✅ Fixed: specify correct delimiter if not comma
df = pd.read_csv('messy_file.csv', delimiter=';')

# ✅ Debug: find the bad lines first
df = pd.read_csv('messy_file.csv', on_bad_lines='warn')  # warns about each bad line and skips it

Better Workflow
Zerve lets you iterate on parsing logic quickly (run a cell, see what breaks, adjust parameters, re-run) without restarting your whole environment. That gives you a faster feedback loop for wrangling messy files.